Question
How can I select rows from a Pandas DataFrame based on values in a specific column?
In SQL, this would typically look like:
SELECT *
FROM table
WHERE column_name = some_value;
What is the Pandas equivalent for filtering rows by one or more column values?
Short Answer
By the end of this page, you will understand how to filter rows in a Pandas DataFrame using column-based conditions. You will learn boolean indexing, how to combine multiple conditions, when to use .loc, and common mistakes to avoid when writing filters in Pandas.
Concept
In Pandas, selecting rows based on column values is usually done with boolean indexing. This means you create a condition that returns True or False for each row, then use that condition to keep only the rows where the condition is True.
For example, if a column contains names, prices, or statuses, you can ask Pandas questions like:
- "Keep rows where status is 'active'"
- "Keep rows where age is greater than 18"
- "Keep rows where city is 'London' and score is above 80"
This matters because filtering is one of the most common data operations in real programming. You use it when:
- cleaning datasets
- preparing reports
- selecting API results
- analyzing logs
- finding records that match business rules
If you know SQL, the mental mapping is simple:
- SQL WHERE -> Pandas boolean condition
- SQL SELECT * FROM table WHERE ... -> df[df["column"] == value]
Mental Model
Think of a DataFrame as a spreadsheet and a filter condition as a yes/no test applied to each row.
Each row is asked a question:
- Does this row have country == "USA"?
- Is price > 100?
- Is status == "paid"?
If the answer is:
- True -> keep the row
- False -> remove the row
So filtering in Pandas is like placing a transparent filter sheet over a table and only letting rows through if they pass the rule.
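The yes/no test can be made visible by storing it in a variable before using it. The sketch below uses a small made-up DataFrame (the column names and values are assumptions for illustration) and shows the mask, its inverse, and a count of the matching rows.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    "country": ["USA", "France", "USA"],
    "price": [120, 80, 95],
})

# The "question" asked of every row: one True/False answer per row
mask = df["country"] == "USA"

kept = df[mask]       # rows where the answer is True
removed = df[~mask]   # ~ flips the mask, selecting the rows that failed the test
n_matches = int(mask.sum())  # True counts as 1, so sum() counts matching rows

print(kept)
print(removed)
print(n_matches)
```

Naming the mask is optional, but it tends to help when the same test is reused or inverted later.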
Syntax and Examples
Basic syntax
df[df["column_name"] == some_value]
This creates a boolean Series such as:
0 True
1 False
2 True
dtype: bool
Pandas then returns only the rows where the value is True.
Example: exact match
import pandas as pd
df = pd.DataFrame({
"name": ["Alice", "Bob", "Charlie"],
"age": [25, 30, 35],
"city": ["London", "Paris", "London"]
})
result = df[df["city"] == "London"]
print(result)
Output:
name age city
0 Alice 25 London
2 Charlie 35 London
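Conditions can also be combined. The following sketch reuses the same three-row DataFrame and shows AND, OR, and a membership test; the particular thresholds and city lists are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["London", "Paris", "London"],
})

# AND: both conditions must hold; each condition gets its own parentheses
london_over_30 = df[(df["city"] == "London") & (df["age"] > 30)]

# OR: at least one condition must hold
paris_or_under_30 = df[(df["city"] == "Paris") | (df["age"] < 30)]

# Membership: the value must be one of several options (like SQL IN)
in_cities = df[df["city"].isin(["Paris", "Berlin"])]

print(london_over_30)
print(paris_or_under_30)
print(in_cities)
```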
Step by Step Execution
Consider this example:
import pandas as pd
df = pd.DataFrame({
"product": ["Book", "Pen", "Notebook", "Pencil"],
"price": [12, 3, 8, 2]
})
filtered = df[df["price"] > 5]
print(filtered)
Step 1: Create the DataFrame
The DataFrame looks like this:
product price
0 Book 12
1 Pen 3
2 Notebook 8
3 Pencil 2
Step 2: Evaluate the condition
df["price"] > 5
This produces:
0 True
1 False
2 True
3 False
Name: price, dtype: bool
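The remaining step is to pass that boolean Series back into df[...]. Here is a short sketch of the full sequence, using the same product data. The reset_index call at the end is an optional extra shown for illustration, not something the walkthrough requires.

```python
import pandas as pd

df = pd.DataFrame({
    "product": ["Book", "Pen", "Notebook", "Pencil"],
    "price": [12, 3, 8, 2],
})

mask = df["price"] > 5    # one True/False per row
filtered = df[mask]       # keeps rows 0 and 2, where the mask is True

# The surviving rows keep their original index labels (0 and 2).
# reset_index(drop=True) renumbers them from 0 if that is preferred.
renumbered = filtered.reset_index(drop=True)

print(filtered)
print(renumbered)
```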
Real World Use Cases
Filtering rows by column values appears everywhere in data work and application code.
Common examples
- E-commerce: find orders where status == "shipped"
- Analytics: keep users where country == "US"
- Finance: select transactions where amount > 1000
- Logging: extract rows where level == "ERROR"
- HR systems: find employees where department == "Engineering"
- APIs: keep only records where active == True
Example: filter successful API responses
logs = pd.DataFrame({
"endpoint": ["/users", "/orders", "/users"],
"status_code": [200, 500, 200]
})
success = logs[logs["status_code"] == 200]
Example: high-value transactions
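# The column name and threshold follow the Finance example above;
# the amounts are illustrative values assumed for this example
transactions = pd.DataFrame({
"user": ["A", "B", "C"],
"amount": [250, 1500, 4000]
})
large = transactions[transactions["amount"] >= 1000]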
Real Codebase Usage
In real projects, developers rarely filter just once. They often build filtering into reusable data-processing steps.
Common patterns
Guarded filtering
Check that a column exists before filtering:
if "status" in df.columns:
    df = df[df["status"] == "active"]
Validation before filtering
Make sure the data type is correct:
df["age"] = pd.to_numeric(df["age"], errors="coerce")
adults = df[df["age"] >= 18]
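To see why the conversion matters, consider an age column that arrived as text. In this sketch the sample values are invented for illustration. errors="coerce" turns anything unparseable into NaN, and NaN never satisfies >=, so those rows drop out of the result.

```python
import pandas as pd

# Hypothetical raw data: ages read as strings, one of them invalid
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cy"],
    "age": ["17", "unknown", "42"],
})

# Convert to numbers; values that cannot be parsed become NaN
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Comparisons against NaN evaluate to False, so "Ben" is excluded
adults = df[df["age"] >= 18]

print(df)
print(adults)
```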
Reusable filters
def filter_active_users(df):
    return df[df["active"] == True]
Filter and select columns together with .loc
result = df.loc[df["status"] == "active", ["name", "email"]]
Chained business rules
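One way such rules might be chained is to name each condition and combine them at the end. This is a sketch only: the columns (status, total, country) and the thresholds are hypothetical and stand in for whatever a real project defines.

```python
import pandas as pd

# Hypothetical orders table
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "status": ["paid", "paid", "refunded", "paid"],
    "total": [250.0, 40.0, 300.0, 120.0],
    "country": ["US", "US", "DE", "DE"],
})

# Name each rule so the final filter reads like the business requirement
is_paid = orders["status"] == "paid"
is_large = orders["total"] >= 100
is_supported_country = orders["country"].isin(["US", "DE"])

eligible = orders[is_paid & is_large & is_supported_country]

print(eligible)
```

Because each named condition is already a boolean Series, no extra parentheses are needed when combining them with &.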
Common Mistakes
1. Using = instead of ==
Broken code:
# Wrong
result = df[df["city"] = "London"]
Why it fails:
- = assigns a value
- == compares values
Correct version:
result = df[df["city"] == "London"]
2. Using and or or instead of & or |
Broken code:
# Wrong
result = df[(df["city"] == "London") and (df["age"] > 30)]
Why it fails:
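- The Python keywords and and or work with single boolean values. Each comparison here returns a whole Series of booleans, so Pandas raises a ValueError.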
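The failure and the fix can be reproduced in a few lines. The sketch below uses a made-up DataFrame. It catches the ValueError raised when and is applied to a Series, then uses the element-wise & operator instead.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["London", "Paris", "London"],
    "age": [25, 30, 35],
})

try:
    # `and` asks Python for a single True/False for the whole Series,
    # which pandas refuses to guess
    result = df[(df["city"] == "London") and (df["age"] > 30)]
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)

# `&` compares the two Series element by element
result = df[(df["city"] == "London") & (df["age"] > 30)]
print(result)
```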
Comparisons
Common filtering approaches in Pandas
| Approach | Example | Best for | Notes |
|---|---|---|---|
| Boolean indexing | df[df["age"] > 18] | Simple row filtering | Most common approach |
| .loc | df.loc[df["age"] > 18] | Filtering rows clearly, optionally selecting columns too | Very readable |
| .isin() | df[df["city"].isin(["London", "Paris"])] | Matching one of several values | Similar to SQL IN |
| .query() | df.query("age > 18") | | |
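For a simple condition, boolean indexing, .loc, and .query() select the same rows, while .isin() answers a membership question. A quick check, using invented sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [17, 30, 35],
    "city": ["London", "Paris", "Berlin"],
})

by_mask = df[df["age"] > 18]
by_loc = df.loc[df["age"] > 18]
by_query = df.query("age > 18")

# All three return the same rows for this condition
print(by_mask.equals(by_loc), by_mask.equals(by_query))

# .isin() tests whether each value belongs to a given set
by_isin = df[df["city"].isin(["London", "Paris"])]
print(by_isin)
```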
Cheat Sheet
Quick syntax
df[df["column"] == value]
df[df["column"] > value]
df[(df["a"] == 1) & (df["b"] > 2)]
df[(df["a"] == 1) | (df["b"] > 2)]
df[df["column"].isin([value1, value2])]
df[df["column"].isna()]
df[df["column"].notna()]
df.loc[df["column"] == value, ["col1", "col2"]]
Rules to remember
- Use == for equality
- Use & for AND
- Use | for OR
- Put each condition in parentheses
- Use .isin() for multiple possible matches
- Use .isna() for missing values
- Use .loc when filtering rows and choosing columns together
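The parentheses rule exists because & and | bind more tightly than comparison operators in Python. A small sketch with invented data shows what happens when the parentheses are left out:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [1, 3, 5]})

# With parentheses: compare first, then combine the two boolean Series
correct = df[(df["a"] == 1) & (df["b"] > 2)]
print(correct)

# Without parentheses Python evaluates 1 & df["b"] first, which turns the
# expression into a chained comparison that pandas cannot reduce to a
# single True/False, so it raises ValueError
try:
    df[df["a"] == 1 & df["b"] > 2]
    raised = False
except ValueError:
    raised = True

print(raised)
```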
SQL mapping
SELECT * FROM table WHERE column_name = some_value -> df[df["column_name"] == some_value]
FAQ
How do I filter rows in Pandas by column value?
Use boolean indexing:
df[df["column_name"] == value]
What is the Pandas equivalent of SQL WHERE?
The closest equivalent is a boolean condition inside df[...] or .loc[...].
df[df["age"] > 18]
How do I filter by multiple conditions in Pandas?
Use & for AND and | for OR, with parentheses around each condition.
df[(df["city"] == "London") & (df["age"] > 30)]
How do I select rows where a column matches one of several values?
Use .isin():
df[df["city"].isin(["London", "Paris"])]
Should I use df[...] or .loc[...]?
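A hedged rule of thumb, offered as an assumption rather than as this page's original answer: for row filtering alone, df[...] and .loc[...] give the same result, and .loc becomes useful when columns are selected or values are assigned in the same step. The sample data below is invented.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "email": ["alice@example.com", "bob@example.com"],
    "status": ["active", "inactive"],
})

# Row filtering only: both forms are equivalent
rows_plain = df[df["status"] == "active"]
rows_loc = df.loc[df["status"] == "active"]
print(rows_plain.equals(rows_loc))

# Rows and columns together: .loc takes both in one call
contacts = df.loc[df["status"] == "active", ["name", "email"]]
print(contacts)

# Updating matching rows: .loc assigns into the original DataFrame
df.loc[df["status"] == "inactive", "status"] = "archived"
print(df)
```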
Mini Project
Description
Build a small Pandas script that filters an orders dataset the way a real reporting task would. This project demonstrates how to select rows by exact value, by numeric comparison, and by multiple conditions together.
Goal
Create a filtered report of orders that keeps only high-value shipped orders.
Requirements
- Create a DataFrame with order ID, customer, status, and total columns.
- Filter rows where the status is "shipped".
- Filter rows where the total is greater than or equal to 100.
- Combine both conditions to return only shipped orders worth at least 100.
- Print the final filtered DataFrame.
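One possible solution is sketched below. The order data is invented for the exercise, and the variable names are illustrative choices; any dataset with the four required columns would work the same way.

```python
import pandas as pd

# Step 1: sample orders (values invented for this exercise)
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104, 105],
    "customer": ["Ana", "Ben", "Cara", "Dev", "Eli"],
    "status": ["shipped", "pending", "shipped", "shipped", "cancelled"],
    "total": [250.0, 120.0, 45.5, 100.0, 300.0],
})

# Step 2: exact match on status
shipped = orders[orders["status"] == "shipped"]

# Step 3: numeric comparison on total
high_value = orders[orders["total"] >= 100]

# Step 4: both conditions together
report = orders[(orders["status"] == "shipped") & (orders["total"] >= 100)]

# Step 5: print the final report
print(report)
```

Order 104 is included because the comparison uses >=, so a total of exactly 100 qualifies.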