How to Read a Large File Line by Line in Python

Question

I want to read a large file line by line without loading the entire file into memory.

The file is too large to fully read at once, and when I try, I get out-of-memory errors.

The file size is about 1 GB.

How can I process it efficiently line by line in Python?

Short Answer

Open the file in a with block and loop over the file object directly (for line in file). Python then reads the file lazily, so memory use stays small even for a 1 GB file, as long as individual lines are reasonably short. The rest of this page explains why reading the whole file at once causes memory problems, how Python handles files lazily, and which patterns are commonly used in real programs to work with large logs, CSVs, and text datasets.

Concept

When working with large files, the main idea is streaming rather than loading everything at once.

If you use methods like read() or readlines() on a very large file, Python tries to bring all file contents into memory. For a 1 GB file, that can easily exceed available RAM, especially because the data in memory may use more space than the raw file size on disk.

A better approach is to read one line at a time. Python file objects are iterable, which means you can loop over them directly. In this pattern, Python only keeps a small part of the file in memory at any given moment.
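To see this lazy behaviour directly, the sketch below writes a tiny sample file to a temporary directory (the name sample.txt is only for illustration) and pulls lines from it on demand. A text-mode file object is its own iterator, so both next() and a for loop ask it for exactly one more line at a time.

```python
import os
import tempfile

# Write a tiny sample file in a temporary directory (name chosen for illustration).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\nthird\n")

with open(path, "r", encoding="utf-8") as file:
    # A file object is its own iterator: iter(file) hands back the same object.
    is_own_iterator = iter(file) is file
    # next() reads exactly one more line, trailing newline included.
    first = next(file)
    # A for loop (here inside a list comprehension) keeps calling next()
    # until the file is exhausted.
    rest = [line for line in file]

print(is_own_iterator, repr(first), rest)
```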

This matters because real programs often process:

server logs
CSV exports
JSON lines files
system reports
imported datasets

Reading line by line makes your program more memory-efficient and often simpler to reason about. It also lets you start processing immediately instead of waiting for the whole file to load.

Mental Model

Imagine a huge book in a library.

Loading the whole file into memory is like photocopying the entire book onto your desk before reading it.
Reading line by line is like keeping the book open and reading one line at a time.

You do not need every page in front of you at once if your job is to process the book sequentially. Python can act like that careful reader, bringing in just enough data to continue.

Syntax and Examples

In Python, the most common way to read a file line by line is:

with open("large_file.txt", "r", encoding="utf-8") as file:
    for line in file:
        print(line.rstrip())

Why this works

open(...) opens the file.
with ensures the file is closed automatically.
for line in file reads the file lazily, one line at a time.
rstrip() removes trailing whitespace, including the newline, before the line is printed.

Example: counting lines

count = 0

with open("large_file.txt", "r", encoding="utf-8") as file:
    for line in file:
        count += 1

print("Total lines:", count)

This is memory-efficient because only the current line (plus a small internal read buffer) is held in memory at a time.
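If all you need is the count, a generator expression passed to sum() expresses the same loop more compactly. This is a style alternative, not a different technique: the file is still read one line at a time. The sample file is created only so the sketch runs on its own.

```python
import os
import tempfile

# Small stand-in for large_file.txt so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "large_file.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("alpha\nbeta\ngamma\n")

with open(path, "r", encoding="utf-8") as file:
    # sum() consumes the generator lazily, adding 1 for each line read.
    count = sum(1 for _ in file)

print("Total lines:", count)  # Total lines: 3
```

A final line without a trailing newline is still yielded as a line, so it is counted too.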

Example: find matching lines
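A minimal sketch, assuming the goal is to collect every line that contains a keyword. The keyword "ERROR" and the sample contents are illustrative assumptions.

```python
import os
import tempfile

# Sample input standing in for a large file (contents are illustrative).
path = os.path.join(tempfile.mkdtemp(), "large_file.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")

keyword = "ERROR"  # assumed search term
matches = []

with open(path, "r", encoding="utf-8") as file:
    for line in file:
        if keyword in line:
            # Collected here for the demo; with many matches, print or write
            # each one instead so memory stays flat.
            matches.append(line.rstrip("\n"))

print(matches)  # ['ERROR disk full', 'ERROR timeout']
```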

Step by Step Execution

Consider this code:

with open("numbers.txt", "r", encoding="utf-8") as file:
    for line in file:
        number = int(line.strip())
        print(number * 2)

Suppose numbers.txt contains:

10
20
30

Step by step

Python opens numbers.txt.
The for line in file loop starts.
Python reads the first line: "10\n".
line.strip() removes the newline, giving "10".
int("10") converts it to the integer 10.
print(number * 2) prints 20.
Python reads the next line: "20\n".
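The same steps repeat for "20\n" and "30\n", printing 40 and then 60.
No lines remain, so the loop ends and the with block closes the file.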

Real World Use Cases

Reading large files line by line is common in many practical situations:

Log processing: scan server logs for warnings, errors, or suspicious activity.
CSV imports: process very large exports one row at a time.
ETL scripts: transform data before loading it into a database.
Batch jobs: clean or validate records from text files.
Monitoring tools: inspect output files generated by other systems.
Data migration: move old records into a new system without memory spikes.
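For the CSV case, the standard library's csv module wraps a file object and also streams it one row at a time. A minimal sketch, assuming a hypothetical export with id and amount columns; the csv documentation recommends opening the file with newline="".

```python
import csv
import os
import tempfile

# Hypothetical export with a header row (columns assumed for illustration).
path = os.path.join(tempfile.mkdtemp(), "export.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("id,amount\n1,10.50\n2,4.25\n3,7.00\n")

total = 0.0
# newline="" lets the csv module handle line endings inside quoted fields.
with open(path, "r", encoding="utf-8", newline="") as file:
    reader = csv.DictReader(file)  # yields one row (a dict) per iteration
    for row in reader:
        total += float(row["amount"])

print("Total amount:", total)  # Total amount: 21.75
```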

Example: simple log counter

error_count = 0

with open("app.log", "r", encoding="utf-8") as file:
    for line in file:
        if "ERROR" in line:
            error_count += 1

print("Errors found:", error_count)

Example: filtering rows into another file

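with open("input.txt", "r", encoding="utf-8") as source, open("filtered.txt", "w", encoding="utf-8") as target:
    for line in source:
        # Example filter condition (assumed): keep only lines containing "ERROR".
        if "ERROR" in line:
            target.write(line)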

Real Codebase Usage

In real projects, developers usually combine line-by-line reading with a few important patterns.

Guard clauses

Skip bad or empty lines early:

with open("data.txt", "r", encoding="utf-8") as file:
    for line in file:
        line = line.strip()
        if not line:
            continue
        print(line)

Validation

Validate each line before processing:

with open("ids.txt", "r", encoding="utf-8") as file:
    for line in file:
        value = line.strip()
        if not value.isdigit():
            continue
        print(int(value))

Error handling

Handle malformed lines without crashing the whole job:

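with open("numbers.txt", "r", encoding="utf-8") as file:
    for line_number, line in enumerate(file, start=1):
        try:
            number = int(line.strip())
            print(number)
        except ValueError:
            # Report the bad line and keep going instead of crashing.
            print(f"Skipping line {line_number}: not a valid integer")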

Common Mistakes

1. Using `read()` on a huge file

Broken approach:

with open("large_file.txt", "r", encoding="utf-8") as file:
    data = file.read()

Why it is a problem:

It loads the whole file into memory.
Large files can cause memory errors.

Use this instead:

def process(line):
    # Placeholder: replace with your own per-line logic.
    print(line.rstrip())

with open("large_file.txt", "r", encoding="utf-8") as file:
    for line in file:
        process(line)

2. Using `readlines()` for large input

Broken approach:

with open("large_file.txt", "r", encoding="utf-8") as file:
    lines = file.readlines()

Why it is a problem:

It creates a full list of lines in memory.

3. Forgetting to strip newlines
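Each line produced by the loop keeps its trailing newline (except possibly the last line of the file), so comparing it directly against a plain string silently fails. The sketch below uses an illustrative users.txt file to show the bug and the fix.

```python
import os
import tempfile

# Illustrative file: one username per line.
path = os.path.join(tempfile.mkdtemp(), "users.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("alice\nbob\n")

found_without_strip = False
found_with_strip = False

with open(path, "r", encoding="utf-8") as file:
    for line in file:
        if line == "bob":               # bug: the line is actually "bob\n"
            found_without_strip = True
        if line.rstrip("\n") == "bob":  # fix: drop the newline before comparing
            found_with_strip = True

print(found_without_strip, found_with_strip)  # False True
```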

Comparisons

Approach	Memory usage	Good for large files?	Notes
`file.read()`	High	No	Loads the entire file as one string
`file.readlines()`	High	No	Loads all lines into a list
`for line in file`	Low	Yes	Best general approach
`file.readline()` in a loop	Low	Yes	Works, but usually less clean

`for line in file` vs `readline()`
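Both styles stream the file. A minimal side-by-side sketch (file name and contents are illustrative) shows the extra end-of-file check a readline() loop needs: readline() returns "" only at end of file, while a blank line comes back as "\n".

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("one\n\nthree\n")  # note the blank middle line

# Style 1: readline() in a loop; "" means end of file, "\n" is a blank line.
with_readline = []
with open(path, "r", encoding="utf-8") as file:
    while True:
        line = file.readline()
        if line == "":
            break
        with_readline.append(line.rstrip("\n"))

# Style 2: iterate over the file object; the loop stops at end of file by itself.
with_for = []
with open(path, "r", encoding="utf-8") as file:
    for line in file:
        with_for.append(line.rstrip("\n"))

print(with_readline == with_for)  # True
```

The for loop removes the manual termination check, which is why it is usually the cleaner choice.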

Cheat Sheet

with open("file.txt", "r", encoding="utf-8") as file:
    for line in file:
        print(line.strip())

Best practice

Use with open(...) to auto-close files.
Use for line in file for memory-efficient reading.
Use strip() or rstrip() when needed.
Validate lines before converting them.

Avoid these for huge files

file.read()
file.readlines()

Useful patterns

Count lines:

count = 0
with open("file.txt", "r", encoding="utf-8") as file:
    for _ in file:
        count += 1

Track line numbers:
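A minimal sketch using enumerate(), which pairs each line with a counter as the file streams by, so nothing extra is loaded. start=1 is chosen to match how editors number lines; the sample file is illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("header\nvalue\n")

numbered = []
with open(path, "r", encoding="utf-8") as file:
    for line_number, line in enumerate(file, start=1):
        # Record human-friendly 1-based line numbers.
        numbered.append(f"{line_number}: {line.rstrip()}")

print("\n".join(numbered))
```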

FAQ

Why is reading line by line more memory-efficient in Python?

Because Python only keeps a small portion of the file in memory at a time instead of storing the entire file contents.

Is `for line in file` better than `readlines()` for large files?

Yes. for line in file streams the file, while readlines() loads all lines into memory.

Can Python handle a 1 GB text file?

Yes, if you process it incrementally. Reading it line by line is a common and safe approach.

Should I use `readline()` or a `for` loop?

Usually use a for loop. It is cleaner and is the standard Python style for iterating through a file.

What if the file has invalid text encoding?

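Pass the correct encoding to open(). If the file still contains bytes that cannot be decoded, pass errors="replace" to substitute the U+FFFD replacement character, or errors="ignore" to drop the bad bytes silently (which hides the data loss).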
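A sketch of the errors= option, assuming a file that is mostly UTF-8 but contains one invalid byte (written deliberately below). With the default strict handling the loop raises UnicodeDecodeError; with errors="replace" it keeps going.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "mixed.txt")
# b"\xff" is never valid in UTF-8, so the second line cannot be decoded strictly.
with open(path, "wb") as f:
    f.write(b"good line\nbad \xff byte\n")

# Default (strict) decoding raises UnicodeDecodeError during the loop.
try:
    with open(path, "r", encoding="utf-8") as file:
        for line in file:
            pass
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for the bad byte and continues.
with open(path, "r", encoding="utf-8", errors="replace") as file:
    lines = [line.rstrip("\n") for line in file]

print(strict_failed, lines)
```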

Does line-by-line reading mean only one line is ever in memory?

No. Python reads the file in buffered chunks behind the scenes and hands you one line at a time from that buffer, so memory usage stays low compared with reading the entire file.
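The default chunk size is exposed as io.DEFAULT_BUFFER_SIZE, and open() accepts a buffering argument to change it. A minimal sketch; the 1 MiB value is an arbitrary example, and on a given system the default buffer may follow the underlying device's block size rather than this constant.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("x\n" * 1000)

print("Default buffer size:", io.DEFAULT_BUFFER_SIZE, "bytes")

# buffering=1024 * 1024 requests a 1 MiB buffer (value chosen for illustration);
# iteration still yields one line at a time.
count = 0
with open(path, "r", encoding="utf-8", buffering=1024 * 1024) as file:
    for line in file:
        count += 1

print("Lines read:", count)
```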

What if one line in the file is extremely long?
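Iterating with for line in file still loads each complete line, so a single enormous line (for example, a minified JSON file with no newlines) can use as much memory as the whole file. One workaround, sketched below under the assumption that your processing can work on partial text, is to read fixed-size chunks with read(size) instead of lines. The chunk size and sample data are illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "one_long_line.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("a" * 200_000)  # 200,000 characters and no newline at all

CHUNK_SIZE = 64 * 1024  # characters per read in text mode (illustrative value)

total_chars = 0
chunks = 0
with open(path, "r", encoding="utf-8") as file:
    # read(n) returns at most n characters; "" signals end of file.
    while chunk := file.read(CHUNK_SIZE):
        total_chars += len(chunk)
        chunks += 1

print(total_chars, chunks)
```

Memory use is then bounded by the chunk size rather than by the length of the longest line.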

Related Concepts

File handling — the broader topic of opening, reading, writing, and closing files.
Context managers — with statements safely manage resources like files.
Iteration — file objects are iterable, which is why for line in file works.
Generators — another memory-efficient way to process data lazily.
CSV processing — large CSV files are often read row by row.
Exception handling — useful for skipping malformed lines without crashing.
Text encoding — important when reading files with different character formats.
Buffering — explains how file reading stays efficient internally.
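To connect the Generators entry back to file reading, here is a minimal sketch of a lazy pipeline in which each stage is a generator pulling one item at a time from the previous stage. The function names and sample data are hypothetical.

```python
import os
import tempfile

def read_lines(path):
    """Yield lines from a file one at a time, without trailing newlines."""
    with open(path, "r", encoding="utf-8") as file:
        for line in file:
            yield line.rstrip("\n")

def non_empty(lines):
    """Skip blank lines."""
    for line in lines:
        if line.strip():
            yield line

def to_ints(lines):
    """Convert each line to int, skipping lines that are not integers."""
    for line in lines:
        try:
            yield int(line)
        except ValueError:
            continue

path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("10\n\n20\nabc\n30\n")

# Nothing is read until sum() starts pulling values through the pipeline.
total = sum(to_ints(non_empty(read_lines(path))))
print(total)  # 60
```

Each stage holds only the current item, so the pipeline's memory use does not grow with the file size.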

Mini Project

Description

Build a small log scanner that reads a large log file line by line and extracts useful information without loading the whole file into memory. This demonstrates the most common real-world use of streaming file input: searching, counting, and filtering large text files safely.

Goal

Create a Python script that scans a log file, counts matching lines, and writes those matches to a separate output file.

Requirements

Read the input file line by line using a memory-efficient approach.
Count how many lines contain the word ERROR.
Write matching lines to a new file named errors.txt.
Print the total number of matching lines at the end.
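One possible solution, sketched under these assumptions: a line "contains the word ERROR" if the substring ERROR appears anywhere in it, and a small sample app.log is generated in a temporary directory so the script runs on its own. Point log_path and errors_path at your real files instead; scan_log is a hypothetical helper name.

```python
import os
import tempfile

def scan_log(input_path, output_path, keyword="ERROR"):
    """Stream input_path, copy lines containing keyword to output_path, return the count."""
    matches = 0
    with open(input_path, "r", encoding="utf-8") as source, \
         open(output_path, "w", encoding="utf-8") as target:
        for line in source:          # one line at a time, never the whole file
            if keyword in line:
                target.write(line)   # keep the original newline
                matches += 1
    return matches

# Demo setup: a small sample log in a temporary directory (contents illustrative).
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "app.log")
with open(log_path, "w", encoding="utf-8") as f:
    f.write("INFO boot\nERROR db down\nWARN slow\nERROR retry failed\n")

errors_path = os.path.join(workdir, "errors.txt")
error_count = scan_log(log_path, errors_path)
print("Lines containing ERROR:", error_count)  # Lines containing ERROR: 2
```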
