Accessing Data with loc: Label-Based Indexing in Pandas
Welcome to the next article of the Pandas Data Structures module!
In this article, we focus on .loc, one of the most powerful tools in Pandas for accessing and manipulating data. Whether you're selecting rows and columns or applying conditional filters, .loc handles both with a single, consistent syntax.
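What You’ll Learn in This Edition: selecting rows and columns by label, filtering rows with one or more conditions, and updating values with .loc.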
Let’s explore .loc with practical examples using real-world datasets!
What is .loc?
.loc is a label-based indexer in Pandas. This means you can use row and column labels to access specific parts of a DataFrame. Additionally, .loc supports conditional filtering, making it incredibly useful for analyzing data.
Let’s explore its functionality using two real-world datasets: Toyota sales data and sales representatives data.
You can download the datasets from the following GitHub link: GitHub Datasets
Loading the Data
Start by loading the datasets:
import pandas as pd
# Load the Toyota sales and Sales Representatives datasets
toyota_data = pd.read_csv("data/car_sales/toyota_sales_data.csv")
sales_reps_data = pd.read_csv("data/car_sales/sales_reps_data.csv")
# Preview the Toyota Sales data
toyota_data.head()
With these datasets loaded, we’re ready to explore .loc in action.
Basic Usage of .loc
Selecting Rows and Columns by Labels
To extract specific rows and columns using .loc, put the row labels first and the column labels second inside square brackets, separated by a comma: df.loc[row_labels, column_labels].
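To see what "label-based" means before working with the real data, here is a minimal sketch on a tiny hypothetical DataFrame whose rows are labeled with strings. The labels s1 to s3 and all values are made up for illustration; the point is that .loc looks rows and columns up by name, and the shape of what you pass decides the shape of what you get back.

```python
import pandas as pd

# A tiny illustrative DataFrame whose row labels are strings, not positions
df = pd.DataFrame(
    {"car_model": ["Camry", "RAV4", "Corolla"], "sale_amount": [28000, 32000, 21000]},
    index=["s1", "s2", "s3"],
)

# One row label + one column label -> a single value
amount = df.loc["s2", "sale_amount"]
print(amount)  # 32000

# One row label -> that row as a Series
row = df.loc["s3"]
print(row["car_model"])  # Corolla

# Lists of labels -> a DataFrame with exactly those rows and columns
subset = df.loc[["s1", "s3"], ["car_model"]]
print(subset)
```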
Selecting Specific Rows
# Select rows labeled 0 through 4; .loc includes the end label, so with the default index this is the first 5 rows
toyota_selection = toyota_data.loc[0:4]
toyota_selection
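Two details of loc[0:4] are easy to miss: the slice includes the end label, and it matches labels rather than positions. The sketch below uses a handful of hypothetical sale amounts (not the real dataset) to compare .loc with .iloc, and then shows that after filtering, when the index labels are no longer 0, 1, 2, ..., loc[0:4] no longer means "the first 5 rows".

```python
import pandas as pd

# Hypothetical sample data with the default index (labels 0, 1, 2, ...)
df = pd.DataFrame({"sale_amount": [25000, 31000, 27000, 35000, 22000, 40000, 30500]})

# .loc slices by label and INCLUDES the end label: labels 0..4 -> 5 rows
print(len(df.loc[0:4]))   # 5
# .iloc slices by position and EXCLUDES the end position: positions 0..3 -> 4 rows
print(len(df.iloc[0:4]))  # 4

# After filtering, the surviving rows keep their original labels: 1, 2, 3, 5, 6
high = df.loc[df["sale_amount"] > 26000]

# Label-based: only rows whose labels lie between 0 and 4
print(high.loc[0:4].index.tolist())   # [1, 2, 3]
# Position-based: the first five rows, whatever their labels are
print(high.iloc[0:5].index.tolist())  # [1, 2, 3, 5, 6]
```

If you mean "the first N rows" regardless of labels, .iloc or head() expresses that intent directly.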
Selecting Specific Rows and Columns
# Select four columns for rows labeled 0 through 9 (the first 10 rows with the default index)
toyota_selection = toyota_data.loc[0:9, ['sale_rep_id', 'sale_date', 'car_model', 'sale_amount']]
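The column part of .loc follows the same rules as the row part. The sketch below, built on a few hypothetical rows that reuse some of the article's column names, shows that a slice of column labels is also inclusive, and that passing a list of columns returns a DataFrame while a single column label returns a Series.

```python
import pandas as pd

# Hypothetical rows using some of the article's column names
df = pd.DataFrame({
    "sale_rep_id": [101, 102, 101],
    "sale_date": ["2024-01-05", "2024-01-06", "2024-01-07"],
    "car_model": ["Camry", "RAV4", "Corolla"],
    "sale_amount": [28000, 32000, 21000],
})

# Column labels can be sliced too; the end label is included
by_range = df.loc[0:1, "sale_rep_id":"car_model"]
print(by_range.columns.tolist())  # ['sale_rep_id', 'sale_date', 'car_model']

# A list of column labels returns a DataFrame; a single label returns a Series
print(type(df.loc[0:1, ["car_model"]]).__name__)  # DataFrame
print(type(df.loc[0:1, "car_model"]).__name__)    # Series
```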
Conditional Filtering with .loc
One of .loc’s strengths is its ability to filter data based on conditions.
Filtering Rows by a Condition
Let’s find all Toyota sales with a sale amount greater than $30,000:
# Filter rows where sale_amount > 30,000
high_sales = toyota_data.loc[toyota_data['sale_amount'] > 30000]
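Under the hood, the condition inside .loc is a boolean Series: one True or False per row, aligned with the DataFrame's index. This sketch on hypothetical sample rows makes that mask visible and shows that the filtered result keeps the original row labels.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "car_model": ["Camry", "RAV4", "Corolla", "RAV4"],
    "sale_amount": [28000, 32000, 21000, 34500],
})

# The comparison produces a boolean Series aligned with df's index
mask = df["sale_amount"] > 30000
print(mask.tolist())  # [False, True, False, True]

# .loc keeps only the rows where the mask is True; original labels are preserved
high_sales = df.loc[mask]
print(high_sales.index.tolist())  # [1, 3]
```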
Refining Results with Column Selection
To display specific columns for the filtered data:
# Select specific columns for sales > 30,000
high_sales_models = toyota_data.loc[
toyota_data['sale_amount'] > 30000,
['sale_rep_id', 'car_model', 'sale_date', 'sale_amount']
]
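.loc also accepts a function in place of the row condition: pandas calls it with the DataFrame being indexed and uses the mask it returns. This is handy at the end of a method chain, where the intermediate DataFrame has no variable name. The rows below are hypothetical; the column names mirror the article's example.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "sale_rep_id": [101, 102, 103],
    "car_model": ["Camry", "RAV4", "Corolla"],
    "sale_amount": [28000, 32000, 31000],
})

# The lambda receives the sorted DataFrame, so the filter applies to that intermediate result
high_sales_models = (
    df.sort_values("sale_amount", ascending=False)
      .loc[lambda d: d["sale_amount"] > 30000, ["sale_rep_id", "car_model", "sale_amount"]]
)
print(high_sales_models["car_model"].tolist())  # ['RAV4', 'Corolla']
```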
Combining Multiple Conditions
Find all sales where the sale amount is greater than $30,000 and the car model is "RAV4":
# Combine conditions with & (wrap each condition in parentheses)
high_sales_models = toyota_data.loc[
(toyota_data['sale_amount'] > 30000) & (toyota_data['car_model'] == 'RAV4'),
['sale_rep_id', 'car_model', 'sale_date', 'sale_amount']
]
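Pandas combines element-wise conditions with & (and), | (or), and ~ (not); Python's and/or keywords raise a ValueError on a Series. Each comparison needs its own parentheses because & and | bind more tightly than > and ==. The sketch below, on hypothetical rows, stores each condition in a named variable first, which sidesteps the precedence issue and keeps the .loc call readable.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "car_model": ["Camry", "RAV4", "Corolla", "RAV4"],
    "sale_amount": [28000, 32000, 21000, 29000],
})

high = df["sale_amount"] > 30000
rav4 = df["car_model"] == "RAV4"

print(df.loc[high & rav4].index.tolist())   # [1]     both conditions
print(df.loc[high | rav4].index.tolist())   # [1, 3]  either condition
print(df.loc[rav4 & ~high].index.tolist())  # [3]     RAV4 but not a high sale
```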
Updating Data with .loc
You can also use .loc to update specific rows or columns.
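Assignment works with the same selectors used for reading: whatever rows and columns .loc selects on the left-hand side receive the new value. In this sketch the sale amounts are hypothetical and sale_tier is a made-up column added for illustration; it shows a conditional update of one column and an update of a single cell by row and column label.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({"sale_amount": [28000, 32000, 21000]})

# Start every row as "standard", then overwrite only the rows matching the condition
df["sale_tier"] = "standard"
df.loc[df["sale_amount"] > 30000, "sale_tier"] = "high"
print(df["sale_tier"].tolist())  # ['standard', 'high', 'standard']

# A single cell can be updated by its row label and column label
df.loc[2, "sale_amount"] = 21500
print(df.loc[2, "sale_amount"])  # 21500
```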
Handling Missing Values
Suppose the commission_pct column has missing values. You can set these to 0:
# Update null values in the commission_pct column
toyota_data.loc[toyota_data['commission_pct'].isnull(), 'commission_pct'] = 0.0
Verify the changes:
# Check the updated column
toyota_data['commission_pct'].isnull().sum() # Output: 0
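The reason to write the update as a single .loc call is that it selects the rows and the column together and writes into the original DataFrame. A chained form such as df[mask]["commission_pct"] = 0.0 first builds a new DataFrame from the boolean filter, so the assignment lands on that temporary object instead of df, and pandas usually warns about it. Here is a small reproduction with hypothetical commission values:

```python
import numpy as np
import pandas as pd

# Hypothetical commission values with two missing entries
df = pd.DataFrame({"commission_pct": [0.05, np.nan, 0.04, np.nan]})

# One .loc call: select the missing rows and the column, then write into df itself
df.loc[df["commission_pct"].isnull(), "commission_pct"] = 0.0
print(df["commission_pct"].tolist())  # [0.05, 0.0, 0.04, 0.0]

# Avoid chained indexing like:
#   df[df["commission_pct"].isnull()]["commission_pct"] = 0.0
# The first [...] returns a new DataFrame, so df itself is not updated.
```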
🔗 Practice Assignment
💡 Want to practice? Attempt the Accessing Data with loc — Label-Based Indexing Assignment 👉 Click here.
What’s Next?
In the next article, we’ll explore .iloc, which provides position-based indexing for DataFrames. You’ll learn how .iloc complements .loc and adds another powerful dimension to your data manipulation toolkit.
Conclusion
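In this article, we explored the basics of .loc and demonstrated how to select rows and columns by label, filter rows with single and combined conditions, and update values such as missing commission percentages.
.loc is an indispensable tool for working with labeled data in Pandas. Whether you’re cleaning data, running analyses, or updating records, it lets you select or modify exactly the rows and columns you need, by label or by condition, in a single expression.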
Engage with Us
✨ Authored by Siva Kalyan Geddada, Abhinav Sai Penmetsa
🔄 Share this newsletter with anyone interested in Python, data engineering, or data analysis.
💬 Comments and questions are welcome. Let's make this a collaborative learning experience!
Thank you for reading. Stay tuned for the next article, where we start exploring .iloc, which provides position-based indexing for DataFrames.