Accessing Data with loc: Label-Based Indexing in Pandas

Welcome to the next article of the Pandas Data Structures module!

In this article, we focus on .loc, one of Pandas' most useful indexers for accessing and modifying data. Whether you're selecting rows, selecting columns, or applying conditional filters, .loc gives you one consistent, label-based syntax.


What You’ll Learn in This Edition:

  • Understanding .loc: Learn what .loc is and how it compares to other indexing tools.
  • Basic Usage: Select rows and columns by labels.
  • Conditional Filtering: Filter rows based on one or multiple conditions.
  • Data Updates: Efficiently update specific rows or columns, such as handling missing values.

Let’s explore .loc with practical examples using real-world datasets!


What is .loc?

.loc is a label-based indexer in Pandas. This means you can use row and column labels to access specific parts of a DataFrame. Additionally, .loc supports conditional filtering, making it incredibly useful for analyzing data.
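
To make "label-based" concrete, here is a minimal sketch on a small made-up DataFrame. The data and the index labels "s1" to "s3" are hypothetical and are not taken from the article's datasets; only the column names mirror the ones used later.

```python
import pandas as pd

# Hypothetical toy data; the index holds string labels, not positions
df = pd.DataFrame(
    {
        "car_model": ["Camry", "RAV4", "Corolla"],
        "sale_amount": [28000, 34000, 22000],
    },
    index=["s1", "s2", "s3"],
)

# A single row label returns that row as a Series
row = df.loc["s2"]

# A row label plus a column label returns a single value
amount = df.loc["s2", "sale_amount"]

# A boolean mask keeps only the rows where the condition is True
expensive = df.loc[df["sale_amount"] > 25000]

print(row)
print(amount)
print(expensive)
```

Because the lookups go through labels, they keep working even if the rows are reordered.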

Let’s explore its functionality using two real-world datasets:

  • Toyota Sales Data
  • Sales Representatives Data


You can download the datasets from the following GitHub link: GitHub Datasets

Loading the Data

Start by loading the datasets:

import pandas as pd

# Load the Toyota sales and Sales Representatives datasets
toyota_data = pd.read_csv("data/car_sales/toyota_sales_data.csv")
sales_reps_data = pd.read_csv("data/car_sales/sales_reps_data.csv")

# Preview the Toyota Sales data
toyota_data.head()        

With these datasets loaded, we’re ready to explore .loc in action.


Basic Usage of .loc

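Selecting Rows and Columns by Labels

To extract specific rows and columns using .loc, pass the row labels first and the column labels second, separated by a comma, inside square brackets. If you omit the column part, .loc returns all columns.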

Selecting Specific Rows

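# Select rows labeled 0 through 4 (the first 5 rows with the default index).
# Unlike regular Python slices, .loc slices include the end label.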
toyota_selection = toyota_data.loc[0:4]
toyota_selection        
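
The slice 0:4 above returns five rows because .loc slices by label and includes both endpoints. The sketch below illustrates this on a hypothetical six-row DataFrame (the values are made up), and contrasts it with position-based slicing via .iloc, which excludes the end position.

```python
import pandas as pd

# Hypothetical data with the default integer index 0..5
df = pd.DataFrame({"sale_amount": [100, 200, 300, 400, 500, 600]})

by_label = df.loc[0:4]      # labels 0 through 4, end included
by_position = df.iloc[0:4]  # positions 0 through 3, end excluded

print(len(by_label), len(by_position))

# With a different index, .loc still follows the labels, not the positions
shifted = df.set_index(pd.Index([10, 11, 12, 13, 14, 15]))
subset = shifted.loc[11:13]
print(subset)
```

So "the first 5 rows" is only true for toyota_data.loc[0:4] when the DataFrame still has its default 0, 1, 2, ... index, which is what pd.read_csv produces unless an index column is specified.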

Selecting Specific Rows and Columns

# Select specific columns for rows labeled 0 through 9
# (the first 10 rows with the default index)
toyota_selection = toyota_data.loc[0:9, ['sale_rep_id', 'sale_date', 'car_model', 'sale_amount']]        
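
A few other row and column selection patterns follow the same rows-then-columns shape. This is a small sketch on made-up data; the column names match the ones used above, but the values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data using the article's column names
df = pd.DataFrame(
    {
        "sale_rep_id": [1, 2, 3],
        "sale_date": ["2024-01-05", "2024-01-06", "2024-01-07"],
        "car_model": ["Camry", "RAV4", "Corolla"],
        "sale_amount": [28000, 34000, 22000],
    }
)

# All rows (":") and a single column label returns a Series
models = df.loc[:, "car_model"]

# All rows and a range of column labels, both ends included
middle = df.loc[:, "sale_date":"sale_amount"]

# A list of row labels and a list of column labels
picked = df.loc[[0, 2], ["car_model", "sale_amount"]]

print(models)
print(middle)
print(picked)
```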

Conditional Filtering with .loc

One of .loc’s strengths is its ability to filter data based on conditions.

Filtering Rows by a Condition

Let’s find all Toyota sales with a sale amount greater than $30,000:

# Filter rows where sale_amount > 30,000
high_sales = toyota_data.loc[toyota_data['sale_amount'] > 30000]        

Refining Results with Column Selection

To display specific columns for the filtered data:

# Select specific columns for sales > 30,000
high_sales_models = toyota_data.loc[
    toyota_data['sale_amount'] > 30000,
    ['sale_rep_id', 'car_model', 'sale_date', 'sale_amount']
]        

Combining Multiple Conditions

Find all sales where the sale amount is greater than $30,000 and the car model is "RAV4":

# Combine conditions using & (each condition must be wrapped in parentheses)
high_sales_rav4 = toyota_data.loc[
    (toyota_data['sale_amount'] > 30000) & (toyota_data['car_model'] == 'RAV4'),
    ['sale_rep_id', 'car_model', 'sale_date', 'sale_amount']
]        
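
Besides & (and), boolean masks can be combined with | (or) and inverted with ~ (not), and Series.isin matches any value from a list. The following is a minimal sketch on hypothetical data; the thresholds and model names are chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data using the article's column names
df = pd.DataFrame(
    {
        "car_model": ["RAV4", "Camry", "Corolla", "RAV4"],
        "sale_amount": [35000, 31000, 21000, 27000],
    }
)

# OR: keep rows where at least one condition holds
either = df.loc[(df["sale_amount"] > 30000) | (df["car_model"] == "RAV4")]

# NOT: invert a mask with ~
not_rav4 = df.loc[~(df["car_model"] == "RAV4")]

# isin: match any of several values, then pick columns
selected = df.loc[
    df["car_model"].isin(["Camry", "Corolla"]),
    ["car_model", "sale_amount"],
]

print(either)
print(not_rav4)
print(selected)
```

The parentheses matter because &, | and ~ bind more tightly than comparison operators such as > and ==.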

Updating Data with .loc

You can also use .loc to update specific rows or columns.

Handling Missing Values

Suppose the commission_pct column has missing values. You can set these to 0:

# Update null values in the commission_pct column
toyota_data.loc[toyota_data['commission_pct'].isnull(), 'commission_pct'] = 0.0        

Verify the changes:

# Check the updated column
toyota_data['commission_pct'].isnull().sum()  # Output: 0        
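
The same mask-plus-column pattern works for any conditional update, not only for missing values. Here is a self-contained sketch on made-up data. The high_value column and the 30,000 threshold are hypothetical and are not part of the article's datasets.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two missing commission values
df = pd.DataFrame(
    {
        "sale_amount": [35000.0, 21000.0, 31000.0],
        "commission_pct": [0.05, np.nan, np.nan],
    }
)

missing_before = int(df["commission_pct"].isnull().sum())

# Fill missing values in place: the mask picks rows, the label picks the column
df.loc[df["commission_pct"].isnull(), "commission_pct"] = 0.0

# Hypothetical flag column, set only on rows that match a condition
df["high_value"] = False
df.loc[df["sale_amount"] > 30000, "high_value"] = True

missing_after = int(df["commission_pct"].isnull().sum())
print(missing_before, missing_after)
print(df)
```

Selecting rows and the target column in a single .loc call means the assignment is applied to the original DataFrame rather than to an intermediate selection.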

🔗 Practice Assignment

💡 Want to practice? Attempt the Accessing Data with loc — Label-Based Indexing Assignment 👉 Click here.


What’s Next?

In the next article, we’ll explore .iloc, which provides position-based indexing for DataFrames. You’ll learn how .iloc complements .loc and adds another powerful dimension to your data manipulation toolkit.


Conclusion

In this article, we explored the basics of .loc and demonstrated how to:

  • Select rows and columns using labels.
  • Filter data with conditional queries.
  • Combine multiple conditions for advanced queries.
  • Update specific rows or columns.

.loc is an indispensable tool for working with labeled data in Pandas. Whether you’re cleaning data, running analyses, or updating records, it lets you state exactly which rows and columns you mean in a single expression.


Engage with Us

Authored by Siva Kalyan Geddada, Abhinav Sai Penmetsa

🔄 Share this newsletter with anyone interested in Python, data engineering, or data analysis.

💬 Comments and questions are welcome—let's make this a collaborative learning experience!

Thank you for reading. Stay tuned for the next article, where we start exploring .iloc, which provides position-based indexing for DataFrames.

