Question
I want to read a large file line by line without loading the entire file into memory.
The file is too large to fully read at once, and when I try, I get out-of-memory errors.
The file size is about 1 GB.
How can I process it efficiently line by line in Python?
Short Answer
By the end of this page, you will understand how to process very large files in Python safely and efficiently using file iteration. You will learn why reading the whole file at once causes memory problems, how Python handles files lazily, and which patterns are commonly used in real programs to work with large logs, CSVs, and text datasets.
Concept
When working with large files, the main idea is streaming rather than loading everything at once.
If you use methods like read() or readlines() on a very large file, Python tries to bring all file contents into memory. For a 1 GB file, that can easily exceed available RAM, especially because the data in memory may use more space than the raw file size on disk.
A better approach is to read one line at a time. Python file objects are iterable, which means you can loop over them directly. In this pattern, Python only keeps a small part of the file in memory at any given moment.
This matters because real programs often process:
- server logs
- CSV exports
- JSON lines files
- system reports
- imported datasets
Reading line by line makes your program more memory-efficient and often simpler to reason about. It also lets you start processing immediately instead of waiting for the whole file to load.
Mental Model
Imagine a huge book in a library.
- Loading the whole file into memory is like photocopying the entire book onto your desk before reading it.
- Reading line by line is like keeping the book open and reading one line at a time.
You do not need every page in front of you at once if your job is to process the book sequentially. Python can act like that careful reader, bringing in just enough data to continue.
Syntax and Examples
In Python, the most common way to read a file line by line is:
with open("large_file.txt", "r", encoding="utf-8") as file:
    for line in file:
        print(line.rstrip())
Why this works
- open(...) opens the file.
- with ensures the file is closed automatically, even if an error occurs inside the block.
- for line in file reads the file lazily, one line at a time.
- rstrip() removes trailing whitespace, including the newline, before printing.
Example: counting lines
count = 0
with open("large_file.txt", "r", encoding="utf-8") as file:
    for line in file:
        count += 1
print("Total lines:", count)
This is memory-efficient because only the current line, plus a small internal read buffer, is held in memory at any moment.
Example: find matching lines
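A minimal sketch of this pattern, assuming a plain substring match is what you need. The helper name find_matching_lines, the keyword "needle", and the small demo file are hypothetical and exist only so the example runs on its own.

```python
import os
import tempfile


def find_matching_lines(path, keyword):
    """Yield (line_number, text) for each line that contains keyword."""
    with open(path, "r", encoding="utf-8") as file:
        for line_number, line in enumerate(file, start=1):
            if keyword in line:
                # Drop only the trailing newline so the text prints cleanly.
                yield line_number, line.rstrip("\n")


# Small demo file so the sketch is self-contained; use your real path instead.
demo_path = os.path.join(tempfile.mkdtemp(), "large_file.txt")
with open(demo_path, "w", encoding="utf-8") as demo:
    demo.write("alpha\nneedle here\nbeta\nanother needle\n")

# list() is fine for a tiny demo; on a huge file, loop over the generator
# directly so the matches are not all collected in memory.
matches = list(find_matching_lines(demo_path, "needle"))
for line_number, text in matches:
    print(f"{line_number}: {text}")
```

Because the helper is a generator, it still reads the file lazily: each line is examined and then discarded unless it matches.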
Step by Step Execution
Consider this code:
with open("numbers.txt", "r", encoding="utf-8") as file:
    for line in file:
        number = int(line.strip())
        print(number * 2)
Suppose numbers.txt contains:
10
20
30
Step by step
- Python opens numbers.txt.
- The for line in file loop starts.
- Python reads the first line: "10\n".
- line.strip() becomes "10".
- int("10") becomes 10.
- print(number * 2) prints 20.
- Python reads the next line: "20\n".
- The same process repeats and prints 40, then 60 for the third line.
- When there are no more lines, the loop ends and with closes the file.
Real World Use Cases
Reading large files line by line is common in many practical situations:
- Log processing: scan server logs for warnings, errors, or suspicious activity.
- CSV imports: process very large exports one row at a time.
- ETL scripts: transform data before loading it into a database.
- Batch jobs: clean or validate records from text files.
- Monitoring tools: inspect output files generated by other systems.
- Data migration: move old records into a new system without memory spikes.
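For the CSV case in the list above, the standard library's csv module can wrap an open file and yield one row at a time, so the same streaming idea applies. This is a sketch under assumptions: the helper name sum_column, the column name "amount", and the demo data are all hypothetical.

```python
import csv
import os
import tempfile


def sum_column(path, column):
    """Add up one numeric column while reading the CSV row by row."""
    total = 0.0
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, "r", encoding="utf-8", newline="") as file:
        reader = csv.DictReader(file)  # uses the first row as the header
        for row in reader:
            total += float(row[column])
    return total


# Tiny demo export so the sketch runs on its own.
demo_path = os.path.join(tempfile.mkdtemp(), "export.csv")
with open(demo_path, "w", encoding="utf-8", newline="") as demo:
    demo.write("id,amount\n1,10.5\n2,4.5\n3,5\n")

total = sum_column(demo_path, "amount")
print("Total amount:", total)
```

Using csv rather than splitting on commas yourself also handles quoted fields that contain commas or line breaks.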
Example: simple log counter
error_count = 0
with open("app.log", "r", encoding="utf-8") as file:
    for line in file:
        if "ERROR" in line:
            error_count += 1
print("Errors found:", error_count)
Example: filtering rows into another file
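with open("input.txt", "r", encoding="utf-8") as source, \
     open("filtered.txt", "w", encoding="utf-8") as target:
    for line in source:
        # Keep only the lines you care about ("ERROR" is an example term).
        if "ERROR" in line:
            target.write(line)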
Real Codebase Usage
In real projects, developers usually combine line-by-line reading with a few important patterns.
Guard clauses
Skip bad or empty lines early:
with open("data.txt", "r", encoding="utf-8") as file:
    for line in file:
        line = line.strip()
        if not line:
            continue
        print(line)
Validation
Validate each line before processing:
with open("ids.txt", "r", encoding="utf-8") as file:
    for line in file:
        value = line.strip()
        if not value.isdigit():
            continue
        print(int(value))
Error handling
Handle malformed lines without crashing the whole job:
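with open("numbers.txt", "r", encoding="utf-8") as file:
    for line_number, line in enumerate(file, start=1):
        try:
            number = int(line.strip())
            print(number)
        except ValueError:
            # Report the bad line and keep going (message wording is illustrative).
            print(f"Skipping invalid value on line {line_number}")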
Common Mistakes
1. Using read() on a huge file
Broken approach:
with open("large_file.txt", "r", encoding="utf-8") as file:
    data = file.read()
Why it is a problem:
- It loads the whole file into memory.
- Large files can cause memory errors.
Use this instead:
with open("large_file.txt", "r", encoding="utf-8") as file:
    for line in file:
        process(line)  # process() stands in for your own per-line logic
2. Using readlines() for large input
Broken approach:
with open("large_file.txt", "r", encoding="utf-8") as file:
    lines = file.readlines()
Why it is a problem:
- It creates a full list of lines in memory.
3. Forgetting to strip newlines
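Each line read from a file keeps its trailing newline, so an exact comparison against a plain string silently fails. A small sketch of the problem and the fix, using io.StringIO as a stand-in for a real file (the sample text and variable names are made up for illustration):

```python
import io

# io.StringIO behaves like an open text file, so the sketch runs anywhere.
source = io.StringIO("start\nstop\n")

raw_hits = 0
stripped_hits = 0
for line in source:
    # Broken: line is "stop\n", so this comparison is never true.
    if line == "stop":
        raw_hits += 1
    # Fixed: remove the trailing newline before comparing.
    if line.rstrip("\n") == "stop":
        stripped_hits += 1

print("Without stripping:", raw_hits)
print("With stripping:", stripped_hits)
```

The same issue affects dictionary lookups and set membership tests, so strip the line before using it as a key.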
Comparisons
| Approach | Memory usage | Good for large files? | Notes |
|---|---|---|---|
| file.read() | High | No | Loads the entire file as one string |
| file.readlines() | High | No | Loads all lines into a list |
| for line in file | Low | Yes | Best general approach |
| file.readline() in a loop | Low | Yes | Works, but usually less clean |
for line in file vs readline()
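Both approaches stream the file; the difference is mostly how much bookkeeping you write yourself. The sketch below counts lines both ways, with io.StringIO standing in for a real file and hypothetical function names:

```python
import io


def count_with_for(file):
    """Iterate directly; the loop stops by itself at end of file."""
    count = 0
    for _ in file:
        count += 1
    return count


def count_with_readline(file):
    """Call readline() manually; an empty string signals end of file."""
    count = 0
    while True:
        line = file.readline()
        if not line:  # "" means EOF; a blank line is "\n", which is truthy
            break
        count += 1
    return count


text = "a\n\nb\n"
for_count = count_with_for(io.StringIO(text))
readline_count = count_with_readline(io.StringIO(text))
print(for_count, readline_count)
```

The readline() version is mainly useful when you need to read a header line first or pull lines from inside some other control flow.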
Cheat Sheet
with open("file.txt", "r", encoding="utf-8") as file:
    for line in file:
        print(line.strip())
Best practice
- Use with open(...) to auto-close files.
- Use for line in file for memory-efficient reading.
- Use strip() or rstrip() when needed.
- Validate lines before converting them.
Avoid for huge files
file.read()
file.readlines()
Useful patterns
Count lines:
count = 0
with open("file.txt", "r", encoding="utf-8") as file:
    for _ in file:
        count += 1
Track line numbers:
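One way to do this is with the built-in enumerate(), which pairs each line with a counter without reading ahead. In this sketch io.StringIO stands in for an open file, and the sample text is invented:

```python
import io

# Stand-in for: open("file.txt", "r", encoding="utf-8")
source = io.StringIO("first\nsecond\nthird\n")

numbered = []
# start=1 makes the counter match the line numbers shown in an editor.
for line_number, line in enumerate(source, start=1):
    numbered.append((line_number, line.rstrip("\n")))

print(numbered)
```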
FAQ
Why is reading line by line more memory-efficient in Python?
Because Python only keeps a small portion of the file in memory at a time instead of storing the entire file contents.
Is for line in file better than readlines() for large files?
Yes. for line in file streams the file, while readlines() loads all lines into memory.
Can Python handle a 1 GB text file?
Yes, if you process it incrementally. Reading it line by line is a common and safe approach.
Should I use readline() or a for loop?
Usually use a for loop. It is cleaner and is the standard Python style for iterating through a file.
What if the file has invalid text encoding?
Pass the correct encoding to open(). If needed, handle decoding errors with options such as errors="ignore" or errors="replace".
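As a rough illustration of errors="replace", the sketch below writes a demo file containing one byte that is not valid UTF-8 and then reads it back line by line. The file name and contents are made up for the example.

```python
import os
import tempfile

# Demo file with a stray 0xFF byte, which is never valid in UTF-8.
demo_path = os.path.join(tempfile.mkdtemp(), "mixed.txt")
with open(demo_path, "wb") as demo:
    demo.write(b"ok line\nbad \xff byte\n")

# errors="replace" swaps undecodable bytes for U+FFFD instead of raising
# UnicodeDecodeError partway through the file.
with open(demo_path, "r", encoding="utf-8", errors="replace") as file:
    lines = [line.rstrip("\n") for line in file]

print(lines)
```

Note that errors="ignore" would drop the bad byte silently, which can hide data problems; "replace" at least leaves a visible marker.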
Does line-by-line reading mean only one line is ever in memory?
In practice, Python uses internal buffering, but memory usage stays low compared with reading the entire file.
What if one line in the file is extremely long?
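Iterating by line reads each full line into memory, so a file with one enormous line (or no newlines at all) loses the benefit of streaming. One option, sketched here rather than prescribed, is to read fixed-size chunks with file.read(size). The helper name iter_chunks and the chunk sizes are illustrative assumptions.

```python
import io


def iter_chunks(file, chunk_size=64 * 1024):
    """Yield successive pieces of at most chunk_size characters."""
    while True:
        chunk = file.read(chunk_size)
        if not chunk:  # empty string means end of file
            break
        yield chunk


# io.StringIO stands in for a file that holds one long line.
source = io.StringIO("x" * 10)
chunk_lengths = [len(chunk) for chunk in iter_chunks(source, chunk_size=4)]
print(chunk_lengths)
```

With chunked reading, a record may be split across two chunks, so any parsing logic has to carry leftover text from one chunk into the next.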
Mini Project
Description
Build a small log scanner that reads a large log file line by line and extracts useful information without loading the whole file into memory. This demonstrates the most common real-world use of streaming file input: searching, counting, and filtering large text files safely.
Goal
Create a Python script that scans a log file, counts matching lines, and writes those matches to a separate output file.
Requirements
- Read the input file line by line using a memory-efficient approach.
- Count how many lines contain the word ERROR.
- Write matching lines to a new file named errors.txt.
- Print the total number of matching lines at the end.
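One possible solution sketch that covers the requirements above. The function name scan_log, the input name app.log, and the sample log lines are assumptions made so the example runs on its own in a temporary directory.

```python
import os
import tempfile


def scan_log(input_path, output_path, keyword="ERROR"):
    """Copy lines containing keyword to output_path and return the count."""
    matches = 0
    with open(input_path, "r", encoding="utf-8") as source, \
         open(output_path, "w", encoding="utf-8") as target:
        for line in source:  # streams the file one line at a time
            if keyword in line:
                target.write(line)  # line already ends with its newline
                matches += 1
    return matches


# Build a tiny sample log; point these paths at real files in practice.
work_dir = tempfile.mkdtemp()
log_path = os.path.join(work_dir, "app.log")
errors_path = os.path.join(work_dir, "errors.txt")
with open(log_path, "w", encoding="utf-8") as demo:
    demo.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")

total = scan_log(log_path, errors_path)
print("Matching lines:", total)
```

Memory use stays roughly constant regardless of the log size, because only the current line is held and matches go straight to disk instead of into a list.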