Iterators and Generators in Python

1. Iterators in Python

An iterator is an object that can be iterated (looped) over. It implements two key methods, known as the iterator protocol:

  • __iter__(): Returns the iterator object itself.
  • __next__(): Returns the next item in the sequence. If no items are left, it raises a StopIteration exception.
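
You can see this protocol at work with any built-in iterable before writing a custom one; the snippet below simply calls iter() and next() on a list to show what a for loop does under the hood:

# The same protocol drives every for loop: iter() obtains an iterator,
# and next() pulls values until StopIteration is raised.
items = [10, 20, 30]
it = iter(items)       # calls items.__iter__()

print(next(it))  # 10  (calls it.__next__())
print(next(it))  # 20
print(next(it))  # 30
# Calling next(it) again would raise StopIteration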

Example: Creating an Iterator

class MyNumbers:
    def __init__(self):
        self.num = 1

    def __iter__(self):
        return self  # Returning the iterator object itself

    def __next__(self):
        if self.num <= 5:
            current = self.num
            self.num += 1
            return current
        else:
            raise StopIteration  # Stop when numbers exceed 5

# Using the iterator
numbers = MyNumbers()
for number in numbers:
    print(number)
        

Output

1
2
3
4
5
        

In this example:

  • MyNumbers is a custom iterator class.
  • Each time you call __next__(), it returns the next number.
  • Once self.num exceeds 5, __next__() raises StopIteration, which the for loop catches to end the iteration.
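
The for loop is just syntactic sugar for calling these methods yourself. A quick sketch of the same class driven manually:

numbers = MyNumbers()
iterator = iter(numbers)   # returns the object itself via __iter__()

print(next(iterator))  # 1
print(next(iterator))  # 2
# ... after 5 has been returned, the next call raises StopIteration,
# which is exactly the signal a for loop uses to stop.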


When to Use Iterators?

1. Custom Iterable Objects

  • Scenario: You need to create a custom data structure (like a linked list, tree, or a circular buffer) and allow users to traverse through it using a for loop.
  • Example: A custom LinkedList class can implement the iterator protocol so that each node can be visited one by one.

class LinkedListIterator:
    def __init__(self, head):
        self.current = head

    def __iter__(self):
        return self

    def __next__(self):
        if self.current:
            data = self.current.data
            self.current = self.current.next
            return data
        else:
            raise StopIteration

# Assuming we have a LinkedList class with head node
# linked_list = LinkedList([1, 2, 3])
# for item in linked_list:  # Using the iterator to loop over the nodes
#     print(item)
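
The LinkedList class referenced in the comments is not shown above; a minimal sketch of what it might look like (assuming a simple Node with data and next attributes) is:

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self, values):
        self.head = None
        # Build the chain of nodes from the given values
        for value in reversed(values):
            node = Node(value)
            node.next = self.head
            self.head = node

    def __iter__(self):
        # Hand traversal off to the iterator defined above
        return LinkedListIterator(self.head)

linked_list = LinkedList([1, 2, 3])
for item in linked_list:
    print(item)  # 1, 2, 3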
        

Usage:

  • Useful when creating complex data structures that need controlled traversal.

2. Database Query Results

  • Scenario: You query a database, and the results might not fit into memory all at once. You want to load one record at a time.

class DatabaseIterator:
    def __init__(self, cursor):
        self.cursor = cursor

    def __iter__(self):
        return self

    def __next__(self):
        row = self.cursor.fetchone()
        if row:
            return row
        else:
            raise StopIteration

# Usage: Fetch rows from a cursor one by one without loading everything into memory.
# for row in DatabaseIterator(cursor):
#     print(row)
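
For a concrete, runnable illustration, the sketch below pairs the iterator with Python's built-in sqlite3 module; the in-memory table and sample rows are invented purely for this example:

import sqlite3

# In-memory database with a few sample rows, purely for illustration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Carol")])

cursor = conn.execute("SELECT id, name FROM users")
for row in DatabaseIterator(cursor):   # rows are fetched one at a time
    print(row)                         # (1, 'Alice'), (2, 'Bob'), (3, 'Carol')

conn.close()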
        

Usage:

  • Useful for database pagination or large result sets, since rows are fetched on demand instead of loading everything into memory at once.



2. Generators in Python

A generator is a simpler way to create iterators using the yield keyword.

How Generators Work:

  • A generator function returns an iterator, but instead of handing back all its data at once with return, it uses yield to produce one value at a time.
  • It remembers the function's local state between yield calls, which makes it memory-efficient.

Example: Creating a Generator

def my_generator():
    num = 1
    while num <= 5:
        yield num  # Yield pauses the function and returns a value
        num += 1

# Using the generator
for number in my_generator():
    print(number)
        

Output

1
2
3
4
5
        

In this example:

  • my_generator() is a generator function.
  • Each time it yields a value, it pauses; on the next iteration it resumes right where it left off.
  • Unlike a regular function, it keeps its local state between calls, which makes it well suited to large data sequences.
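
You can watch this pause-and-resume behaviour by driving the generator manually; calling the function returns a generator object, and each next() call runs the body up to the next yield:

gen = my_generator()   # nothing runs yet; this just creates a generator object

print(next(gen))  # 1  -> runs until the first yield, then pauses
print(next(gen))  # 2  -> resumes right after the yield, with num preserved
print(next(gen))  # 3
# When the while loop finishes, the generator raises StopIteration automatically.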


When to Use Generators?

1. Reading Large Files Line-by-Line

  • Scenario: Reading a huge text file where loading the entire content into memory isn't feasible. Instead, you process one line at a time.

def read_large_file(filename):
    with open(filename) as file:
        for line in file:
            yield line.strip()

# Usage
for line in read_large_file('large_text_file.txt'):
    print(line)  # Process each line without loading entire file into memory
        

Usage:

  • Ideal for log files or data streams, where only part of the data is needed at a time.
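
For example, you might layer a filter on top of the same generator. The sketch below (using a hypothetical app.log file and an "ERROR" marker) still holds only one line in memory at a time:

def error_lines(filename, marker="ERROR"):
    # Reuses read_large_file, so only one line is in memory at a time
    for line in read_large_file(filename):
        if marker in line:
            yield line

# Hypothetical log file and marker, shown only to illustrate the pattern
for line in error_lines("app.log"):
    print(line)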

2. Infinite Data Streams (Lazy Evaluation)

  • Scenario: You need to generate a never-ending sequence of data, like random numbers, Fibonacci series, or sensor data readings.

def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Usage
fib = fibonacci()
for _ in range(10):
    print(next(fib))  # Prints the first 10 Fibonacci numbers
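
Because the sequence is lazy, you can also slice it with itertools.islice instead of calling next() in a loop; the sketch below takes the first ten values without ever materialising the infinite stream:

from itertools import islice

# islice consumes the generator lazily and stops after 10 values
first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]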
        

3. Chunking Data for Batch Processing

  • Scenario: You want to process data in fixed-size chunks (e.g., for machine learning models) without loading the entire dataset into memory at once.

def chunk_data(data, chunk_size):
    for i in range(0, len(data), chunk_size):
        yield data[i:i + chunk_size]

# Usage
data = [1, 2, 3, 4, 5, 6, 7, 8]
for chunk in chunk_data(data, 3):
    print(chunk)  # Output: [1, 2, 3], [4, 5, 6], [7, 8]
        

Usage:

  • Useful in machine learning or data preprocessing where you train models on small batches.
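
The slicing version above relies on len(), so it only works on sequences. A variant that chunks any iterable, including another generator, is sketched below using itertools.islice:

from itertools import islice

def chunk_iterable(iterable, chunk_size):
    iterator = iter(iterable)
    while True:
        # Pull up to chunk_size items; an empty chunk means we're done
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk

# Works on generators and other lazy iterables, not just lists
for chunk in chunk_iterable(range(1, 9), 3):
    print(chunk)  # [1, 2, 3], [4, 5, 6], [7, 8]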

4. Web Scraping Pagination

  • Scenario: When scraping websites with multiple pages, instead of making all requests upfront, you can generate URLs on the fly.

def url_generator(base_url, pages):
    for page in range(1, pages + 1):
        yield f"{base_url}?page={page}"

# Usage
for url in url_generator('https://example.com/data', 5):
    print(url)
        

Usage:

  • Helps reduce network load and control API rate limits.
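
To make the rate-limiting point concrete, the sketch below pairs the URL generator with a delay between requests. fetch_page is a stand-in stub for whatever HTTP client you actually use, so the example runs without a network:

import time

def fetch_page(url):
    # Stand-in for a real HTTP call (e.g. urllib.request or requests);
    # returns a fake payload so this sketch runs without a network.
    return f"<html>content of {url}</html>"

def fetch_pages(base_url, pages, delay=1.0):
    for url in url_generator(base_url, pages):
        # Each request happens only when the consumer asks for the next page,
        # so a delay here throttles the whole scrape.
        yield fetch_page(url)
        time.sleep(delay)

for page in fetch_pages('https://example.com/data', 3, delay=0.5):
    print(page[:60])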

5. Pipeline Data Processing

  • Scenario: You want to transform data in stages (like in ETL pipelines). Generators allow you to process and pass data between stages efficiently.

def generate_numbers():
    for i in range(1, 6):
        yield i

def square_numbers(numbers):
    for number in numbers:
        yield number * number

# Usage: Chain two generators together
squared_numbers = square_numbers(generate_numbers())
for num in squared_numbers:
    print(num)  # Output: 1, 4, 9, 16, 25
        

Usage:

  • Common in data pipelines (e.g., Apache Airflow, data preprocessing steps).
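
Extending the same idea, stages can be stacked arbitrarily; the sketch below adds a filtering step between the two generators above:

def keep_even(numbers):
    # Middle stage: pass through only even values
    for number in numbers:
        if number % 2 == 0:
            yield number

# Three-stage pipeline: generate -> filter -> square
pipeline = square_numbers(keep_even(generate_numbers()))
for num in pipeline:
    print(num)  # 4, 16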


Difference Between Iterators and Generators

  • An iterator is a class that implements __iter__() and __next__() explicitly; a generator is a function that uses yield and gets both methods automatically.
  • With an iterator you manage state yourself (e.g., self.num) and raise StopIteration manually; a generator remembers its local state and raises StopIteration for you when the function ends.
  • Generators are shorter and more memory-efficient for streaming data; iterators give you finer control over how traversal works.

When to Use Iterators vs Generators


In summary, iterators are great when you need custom control over how data is accessed, such as with custom data structures or database queries. On the other hand, generators shine in scenarios involving large datasets, real-time streaming, or infinite data streams because they are more memory-efficient and simpler to implement.


"Have you explored this topic? Share your insights or drop your questions in the comments!"

