BETTER PROGRAMMING

How to Use Generators and yield in Python

Work with large datasets or files using Python generators

What are generators in Python?

Have you ever needed to read a dataset or file that was too large to load into memory? Or maybe you wanted to build an iterator, but the producer function was so simple that most of your code went into the iterator machinery rather than into producing the desired values? These are some of the scenarios where generators can be really useful and simple. A generator is a function that uses yield to produce its values one at a time, pausing between values instead of building the whole result up front.
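To make the second scenario concrete, here is a minimal sketch that compares a hand-written iterator class with a generator function producing the same values. The names CountUpTo and count_up_to are hypothetical and exist only for this illustration.

```python
class CountUpTo:
    """Hand-written iterator (hypothetical example): produces 1..limit."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        # We have to track state and signal the end ourselves
        if self.current >= self.limit:
            raise StopIteration
        self.current += 1
        return self.current


def count_up_to(limit):
    """Generator version (hypothetical example): same values, less boilerplate."""
    current = 0
    while current < limit:
        current += 1
        yield current


class_values = list(CountUpTo(3))
generator_values = list(count_up_to(3))
print(class_values)
print(generator_values)
```

Both calls print [1, 2, 3]. The generator version leaves the state tracking and the StopIteration handling to Python.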

Some use cases of generators

Reading large files

A common use case for generators is working with large files or data streams, such as CSV files. Let’s say we need to count how many rows there are in a text file. Our code could look something like this:

csv_gen = csv_reader("some_file.txt")
row_count = 0

for row in csv_gen:
    row_count += 1

print(f"Row count is {row_count}")

A first attempt at csv_reader might read the whole file at once:

def csv_reader(file_name):
    # Naive version: loads the entire file into memory
    with open(file_name) as file:
        result = file.read().split("\n")
    return result

With a large enough file, this version runs out of memory:

Traceback (most recent call last):
  File "ex1_naive.py", line 22, in <module>
    main()
  File "ex1_naive.py", line 13, in main
    csv_gen = csv_reader("file.txt")
  File "ex1_naive.py", line 6, in csv_reader
    result = file.read().split("\n")
MemoryError
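
A generator version yields one row at a time instead:

def csv_reader(file_name):
    # The with block closes the file once the generator is exhausted or closed
    with open(file_name, "r") as file:
        for row in file:
            yield row

Row count is 65123455

The same generator can also be written as a generator expression:

csv_gen = (row for row in open(file_name))

  • Using return instead of yield in csv_reader would give you only the first line of the file.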
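If you want to try the row-counting idea without a huge file at hand, the sketch below builds a tiny sample file and counts its rows lazily. The helper read_rows and the sample contents are assumptions made for this illustration; read_rows simply mirrors the generator version of csv_reader.

```python
import os
import tempfile


def read_rows(file_name):
    """Yield rows one at a time (hypothetical helper mirroring csv_reader)."""
    with open(file_name, "r") as file:
        for row in file:
            yield row


# Create a small sample file so the example runs anywhere
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("id,name\n1,Ada\n2,Grace\n")
    sample_path = tmp.name

# sum() pulls rows from the generator one by one, so the rows are never
# collected into a list
row_count = sum(1 for _ in read_rows(sample_path))
print(f"Row count is {row_count}")

# Clean up the sample file
os.remove(sample_path)
```

For this three-line sample the script prints "Row count is 3". The same code works unchanged on a file that does not fit in memory.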

Generating an infinite sequence

Another common scenario for generators is generating an infinite sequence. In Python, when you need a finite sequence, you can simply call range() and evaluate it in a list context, for example:

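a = range(5)
print(list(a))

[0, 1, 2, 3, 4]

An infinite sequence cannot be stored in a list, because it would need infinite memory. A generator can produce it one value at a time:

def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1

Looping over it prints numbers until you interrupt the program:

for i in infinite_sequence():
    print(i, end=" ")

You can also step through the values manually with next():

>>> gen = infinite_sequence()
>>> next(gen)
0
>>> next(gen)
1
>>> next(gen)
2
....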
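In practice you rarely want a loop that never ends. A minimal sketch of two common ways to take a bounded slice of an infinite generator follows. It reuses infinite_sequence() from above and relies on itertools.islice from the standard library; the variable names are just for illustration.

```python
from itertools import islice


def infinite_sequence():
    num = 0
    while True:
        yield num
        num += 1


# islice stops after 5 values, so we never wait on the infinite generator
first_five = list(islice(infinite_sequence(), 5))
print(first_five)  # [0, 1, 2, 3, 4]

# A plain loop with an explicit break works too
evens = []
for i in infinite_sequence():
    if i >= 10:
        break
    if i % 2 == 0:
        evens.append(i)
print(evens)  # [0, 2, 4, 6, 8]
```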

More on yielding

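So far we have looked at simple cases for generators and the yield statement. As with most things in Python, it doesn't end there, although the core idea is what you have learned so far. For example, a generator function can contain more than one yield: each call to next() resumes execution right after the previous yield, and once the function body finishes, next() raises StopIteration.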

>>> def multiple_yield():
...     value = "I'm here for the first time"
...     yield value
...     value = "My Second time here"
...     yield value
...
>>> multi_gen = multiple_yield()
>>> print(next(multi_gen))
I'm here for the first time
>>> print(next(multi_gen))
My Second time here
>>> print(next(multi_gen))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
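You usually don't have to deal with StopIteration yourself. As a small sketch, the snippet below shows two ways to avoid the traceback: a for loop (or comprehension), which ends quietly when the generator is exhausted, and the optional default argument of next(). The "nothing left" string is an arbitrary placeholder.

```python
def multiple_yield():
    yield "I'm here for the first time"
    yield "My Second time here"


# A for loop (here inside a comprehension) handles StopIteration for you
collected = [message for message in multiple_yield()]
print(collected)

# next() accepts a default that is returned instead of raising StopIteration
multi_gen = multiple_yield()
next(multi_gen)
next(multi_gen)
leftover = next(multi_gen, "nothing left")
print(leftover)
```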

Advanced generator methods

So far we have covered the most common uses and constructions of generators, but there are a few more things to cover. Over time, Python added some extra methods to generators, and I’d like to discuss the following here:

  • .send()
  • .throw()
  • .close()

The examples for these methods use a small prime number generator:

def isPrime(n):
    # Reject values below 2 and non-integers
    if n < 2 or n % 1 > 0:
        return False
    elif n == 2 or n == 3:
        return True
    for x in range(2, int(n**0.5) + 1):
        if n % x == 0:
            return False
    return True

def getPrimes():
    value = 0
    while True:
        if isPrime(value):
            yield value
        value += 1

How to use .send()

.send() resumes the generator and passes a value into it. That value becomes the result of the yield expression the generator was paused on. Let's say you want to generate only the prime numbers from 1000 onward. That's where .send() comes in handy. Let's take a look at an example:

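prime_gen = getPrimes()
print(next(prime_gen))
print(prime_gen.send(1000))
print(next(prime_gen))

2
3
5

The value we sent was ignored, because getPrimes() does nothing with the result of yield. To make it work, we capture that result and use it to update value:

def getPrimes():
    value = 0
    while True:
        if isPrime(value):
            # yield evaluates to the value passed to .send(), or None for next()
            i = yield value
            if i is not None:
                value = i
        value += 1

Running the same code again now gives the primes after 1000:

prime_gen = getPrimes()
print(next(prime_gen))
print(prime_gen.send(1000))
print(next(prime_gen))

2
1009
1013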
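To see .send() in a different setting, here is a minimal sketch of a generator that keeps a running average of the numbers sent into it. The name running_average is hypothetical and not part of the examples above. Note the initial next() call: a generator has to be advanced to its first yield before it can receive a value.

```python
def running_average():
    """Hypothetical generator: receives numbers via .send(), yields their average."""
    total = 0.0
    count = 0
    average = None
    while True:
        # Hand back the current average and wait for the next number
        number = yield average
        total += number
        count += 1
        average = total / count


averager = running_average()
next(averager)  # advance to the first yield before sending anything
first = averager.send(10)
second = averager.send(20)
third = averager.send(60)
print(first, second, third)
```

This prints 10.0 15.0 30.0. Each .send() call both delivers a value and returns whatever the generator yields next.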

How to use .throw()

.throw(), as you probably guessed, lets you raise an exception inside the generator, at the point where it is paused. This can be useful, for example, to end the iteration at a certain value.

prime_gen = getPrimes()

for x in prime_gen:
    if x > 10:
        # Since Python 3.12 this (type, value) form is deprecated;
        # passing an instance, throw(ValueError("...")), is preferred
        prime_gen.throw(ValueError, "I think it was enough!")
    print(x)

2
3
5
7
Traceback (most recent call last):
  File "test.py", line 25, in <module>
    prime_gen.throw(ValueError, "I think it was enough!")
  File "test.py", line 15, in getPrimes
    i = yield value
ValueError: I think it was enough!
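The exception doesn't have to end the iteration. Because it is raised at the paused yield, the generator itself can catch it and carry on. The sketch below assumes a hypothetical resettable_counter generator that treats ValueError as a signal to start over.

```python
def resettable_counter():
    """Hypothetical generator: counts up, and restarts from 0 on ValueError."""
    count = 0
    while True:
        try:
            yield count
            count += 1
        except ValueError:
            # The exception passed to .throw() surfaces at the paused yield
            count = 0


counter = resettable_counter()
values = [next(counter), next(counter), next(counter)]

# throw() resumes the generator with the exception and returns
# the next value it yields
after_reset = counter.throw(ValueError("start over"))
following = next(counter)
print(values, after_reset, following)
```

This prints [0, 1, 2] 0 1. If the generator does not catch the exception, it propagates to the caller, which is what happened in the getPrimes() example.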

How to use .close()

In the previous example, we stopped the iteration by raising an exception. However, that’s not very elegant. A better way to end the iteration is to use .close().

prime_gen = getPrimes()

for x in prime_gen:
    if x > 10:
        prime_gen.close()
    print(x)

2
3
5
7
11
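Under the hood, .close() raises GeneratorExit at the paused yield, which gives the generator a chance to clean up in a finally block. Here is a minimal sketch of that behavior. The tracked_numbers generator and the events list are hypothetical and only there to make the cleanup visible.

```python
events = []


def tracked_numbers():
    """Hypothetical generator that records when it is cleaned up."""
    try:
        num = 0
        while True:
            yield num
            num += 1
    finally:
        # Runs when .close() raises GeneratorExit at the paused yield
        events.append("cleaned up")


gen = tracked_numbers()
seen = [next(gen), next(gen)]
gen.close()
print(seen, events)

# A closed generator is exhausted: next() raises StopIteration
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)
```

This also explains why the prime loop above still prints 11: .close() only finishes the generator, and the for loop ends when it asks for the next value.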

Conclusion

Generators, whether written as generator functions or generator expressions, can be really useful for optimizing the performance and memory usage of our Python applications, especially when we work with large datasets or files. They also bring clarity to your code by avoiding complicated iterator implementations or handling the data on your own by other means.

I’m an entrepreneur, developer, author, speaker, and doer of things. I write about JavaScript, Python, AI, and programming in general.
