How to deal with missing values in a Pandas DataFrame?

This recipe helps you deal with missing values in a Pandas DataFrame

Recipe Objective

In a dataset its very normal that we can get missing values and we can not use that missing values in models. So how to deal with missing values.

So this is the recipe on how we can deal with missing values in a Pandas DataFrame.

Step 1 - Import the library

import pandas as pd import numpy as np

We have imported numpy and pandas which will be needed for the dataset.

Step 2 - Setting up the Data

We have created a dataframe with different features like "first_name", "last_name", "age", "comedy_score" and "Rating_Score". raw_data = {"first_name": ["Sheldon", "Raj", "Leonard", "Howard", "Amy"], "last_name": ["Copper", "Koothrappali", "Hofstadter", "Wolowitz", "Fowler"], "age": [42, 38, np.nan, 41, 35], "Comedy_Score": [9, 7, np.nan, 8, 5], "Rating_Score": [25, 25, 49, np.nan, 70]} df = pd.DataFrame(raw_data, columns = ["first_name", "last_name", "age", "Comedy_Score", "Rating_Score"]) print(df)

Step 3 - Dealing with missing values

Here we will be using different methods to deal with missing values.

    • Droping missing observations

df_no_missing = df.dropna() print(df_no_missing)

    • Droping rows where all cells in that row is NA

df_cleaned = df.dropna(how="all") print(df_cleaned)

    • Creating a new column full of missing values

df3 = df.bfill(); print(df3)

    • Creating a new column full of missing values

df["location"] = np.nan print(df)

    • Droping column if they only contain missing values

print(df.dropna(axis=1, how="all"))

    • Droping rows that contain less than five observations

print(df.dropna(thresh=5))

    • Filling in missing data with zeros

print(df.fillna(0))

    • Filling in missing in Comedy_Score with the mean value of Comedy_Score

df["Comedy_Score"].fillna(df["Comedy_Score"].mean(), inplace=True) print(df)

    • Filling in missing in Comedy_Score with each age’s mean value of Comedy_Score

df["Comedy_Score"].fillna(df.groupby("age")["Comedy_Score"].transform("mean"), inplace=True) print(df)

    • Selecting the rows of df where age is not NaN and age is not NaN

print(df[df["age"].notnull() & df["Rating_Score"].notnull()]) print(df[df["age"].notnull() & df["Rating_Score"].notnull()].fillna(0))

So the output comes as:

  first_name     last_name   age  Comedy_Score  Rating_Score
0    Sheldon        Copper  42.0           9.0          25.0
1        Raj  Koothrappali  38.0           7.0          25.0
2    Leonard    Hofstadter   NaN           NaN          49.0
3     Howard      Wolowitz  41.0           8.0           NaN
4        Amy        Fowler  35.0           5.0          70.0

  first_name     last_name   age  Comedy_Score  Rating_Score
0    Sheldon        Copper  42.0           9.0          25.0
1        Raj  Koothrappali  38.0           7.0          25.0
4        Amy        Fowler  35.0           5.0          70.0

  first_name     last_name   age  Comedy_Score  Rating_Score
0    Sheldon        Copper  42.0           9.0          25.0
1        Raj  Koothrappali  38.0           7.0          25.0
2    Leonard    Hofstadter   NaN           NaN          49.0
3     Howard      Wolowitz  41.0           8.0           NaN
4        Amy        Fowler  35.0           5.0          70.0

  first_name     last_name   age  Comedy_Score  Rating_Score  location
0    Sheldon        Copper  42.0           9.0          25.0       NaN
1        Raj  Koothrappali  38.0           7.0          25.0       NaN
2    Leonard    Hofstadter   NaN           NaN          49.0       NaN
3     Howard      Wolowitz  41.0           8.0           NaN       NaN
4        Amy        Fowler  35.0           5.0          70.0       NaN

  first_name     last_name   age  Comedy_Score  Rating_Score
0    Sheldon        Copper  42.0           9.0          25.0
1        Raj  Koothrappali  38.0           7.0          25.0
2    Leonard    Hofstadter   NaN           NaN          49.0
3     Howard      Wolowitz  41.0           8.0           NaN
4        Amy        Fowler  35.0           5.0          70.0

  first_name     last_name   age  Comedy_Score  Rating_Score  location
0    Sheldon        Copper  42.0           9.0          25.0       NaN
1        Raj  Koothrappali  38.0           7.0          25.0       NaN
4        Amy        Fowler  35.0           5.0          70.0       NaN

  first_name     last_name   age  Comedy_Score  Rating_Score  location
0    Sheldon        Copper  42.0           9.0          25.0       0.0
1        Raj  Koothrappali  38.0           7.0          25.0       0.0
2    Leonard    Hofstadter   0.0           0.0          49.0       0.0
3     Howard      Wolowitz  41.0           8.0           0.0       0.0
4        Amy        Fowler  35.0           5.0          70.0       0.0

  first_name     last_name   age  Comedy_Score  Rating_Score  location
0    Sheldon        Copper  42.0          9.00          25.0       NaN
1        Raj  Koothrappali  38.0          7.00          25.0       NaN
2    Leonard    Hofstadter   NaN          7.25          49.0       NaN
3     Howard      Wolowitz  41.0          8.00           NaN       NaN
4        Amy        Fowler  35.0          5.00          70.0       NaN

  first_name     last_name   age  Comedy_Score  Rating_Score  location
0    Sheldon        Copper  42.0          9.00          25.0       NaN
1        Raj  Koothrappali  38.0          7.00          25.0       NaN
2    Leonard    Hofstadter   NaN          7.25          49.0       NaN
3     Howard      Wolowitz  41.0          8.00           NaN       NaN
4        Amy        Fowler  35.0          5.00          70.0       NaN

  first_name     last_name   age  Comedy_Score  Rating_Score  location
0    Sheldon        Copper  42.0           9.0          25.0       NaN
1        Raj  Koothrappali  38.0           7.0          25.0       NaN
4        Amy        Fowler  35.0           5.0          70.0       NaN

  first_name     last_name   age  Comedy_Score  Rating_Score  location
0    Sheldon        Copper  42.0           9.0          25.0       0.0
1        Raj  Koothrappali  38.0           7.0          25.0       0.0
4        Amy        Fowler  35.0           5.0          70.0       0.0
​


Download Materials


What Users are saying..

profile image

Gautam Vermani

Data Consultant at Confidential
linkedin profile url

Having worked in the field of Data Science, I wanted to explore how I can implement projects in other domains, So I thought of connecting with ProjectPro. A project that helped me absorb this topic... Read More

Relevant Projects

Forecasting Business KPI's with Tensorflow and Python
In this machine learning project, you will use the video clip of an IPL match played between CSK and RCB to forecast key performance indicators like the number of appearances of a brand logo, the frames, and the shortest and longest area percentage in the video.

AWS Project to Build and Deploy LSTM Model with Sagemaker
In this AWS Sagemaker Project, you will learn to build a LSTM model on Sagemaker for sales forecasting while analyzing the impact of weather conditions on Sales.

Natural language processing Chatbot application using NLTK for text classification
In this NLP AI application, we build the core conversational engine for a chatbot. We use the popular NLTK text classification library to achieve this.

Learn How to Build a Linear Regression Model in PyTorch
In this Machine Learning Project, you will learn how to build a simple linear regression model in PyTorch to predict the number of days subscribed.

Build a Speech-Text Transcriptor with Nvidia Quartznet Model
In this Deep Learning Project, you will leverage transfer learning from Nvidia QuartzNet pre-trained models to develop a speech-to-text transcriptor.

Ecommerce product reviews - Pairwise ranking and sentiment analysis
This project analyzes a dataset containing ecommerce product reviews. The goal is to use machine learning models to perform sentiment analysis on product reviews and rank them based on relevance. Reviews play a key role in product recommendation systems.

Learn to Build a Neural network from Scratch using NumPy
In this deep learning project, you will learn to build a neural network from scratch using NumPy

Time Series Python Project using Greykite and Neural Prophet
In this time series project, you will forecast Walmart sales over time using the powerful, fast, and flexible time series forecasting library Greykite that helps automate time series problems.

Image Classification Model using Transfer Learning in PyTorch
In this PyTorch Project, you will build an image classification model in PyTorch using the ResNet pre-trained model.

Image Segmentation using Mask R-CNN with Tensorflow
In this Deep Learning Project on Image Segmentation Python, you will learn how to implement the Mask R-CNN model for early fire detection.