dataframe_operations and various functions
Applying Arithmetic Operations
• Addition, subtraction, multiplication, and division
import pandas as pd
d = {'py_score': pd.Series([88, 79, 81], index=['a', 'b', 'c']),
     'sql_score': pd.Series([86, 81, 78, 88], index=['a', 'b', 'c', 'd']),
     'ca_score': pd.Series([71, 95, 88], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)
print("Dataframe is:")
print(df)
print("sum of python and sql score")
print(df['py_score'] + df['sql_score'])
df['total'] = 0.4 * df['py_score'] + 0.3 * df['sql_score'] + 0.3 * df['ca_score']
print(df)
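In the data above, sql_score has a row 'd' that the other two Series lack. The following is a minimal sketch of how arithmetic behaves on such mismatched labels. Filling the gap with 0 through Series.add is only one possible choice and is an assumption here, not something the slides prescribe.

```python
import pandas as pd

py_score = pd.Series([88, 79, 81], index=['a', 'b', 'c'])
sql_score = pd.Series([86, 81, 78, 88], index=['a', 'b', 'c', 'd'])

# The + operator aligns on index labels; 'd' has no py_score, so the sum is NaN there.
plain_sum = py_score + sql_score

# Series.add with fill_value=0 treats the missing py_score as 0 before adding.
filled_sum = py_score.add(sql_score, fill_value=0)

print(plain_sum)
print(filled_sum)
```

Whether a missing score should count as 0 depends on the analysis. Leaving it as NaN keeps the gap visible.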
Sorting a Pandas DataFrame
A DataFrame can be sorted with .sort_values():
df.sort_values(by='py_score', ascending=False)
• by sets the label of the row or column to sort by.
• ascending specifies whether you want to sort in ascending (True) or descending (False) order.
To sort by multiple columns, pass lists as arguments for by and ascending:
df.sort_values(by=['total', 'py_score'], ascending=[False, False])
In this case, the DataFrame is sorted by the column total, but if two
values are the same, then their order is determined by the values from
the column py_score.
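The tie-breaking behaviour can be sketched with a small table. The values below are hypothetical and chosen so that two rows share the same total.

```python
import pandas as pd

# Hypothetical scores: rows 'a' and 'c' tie on total.
df = pd.DataFrame(
    {'py_score': [88, 79, 81], 'total': [82.0, 84.0, 82.0]},
    index=['a', 'b', 'c'],
)

# Sort by total (descending); ties are broken by py_score (descending).
by_both = df.sort_values(by=['total', 'py_score'], ascending=[False, False])

# Same primary key, but ties are now broken by py_score ascending.
mixed = df.sort_values(by=['total', 'py_score'], ascending=[False, True])

print(by_both)
print(mixed)
```

sort_values returns a new DataFrame, so df itself keeps its original row order.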
Filtering Data
filter_score = df['sql_score'] >= 80
filter_score
The output is a Series, filter_score, filled with Boolean data.
The expression df[filter_score] returns a Pandas DataFrame with the rows from df that correspond to True in filter_score.
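A minimal, self-contained sketch of this two-step filter, using the first three rows of the scores from the earlier slide:

```python
import pandas as pd

df = pd.DataFrame(
    {'py_score': [88, 79, 81], 'sql_score': [86, 81, 78]},
    index=['a', 'b', 'c'],
)

# Step 1: the comparison produces a Boolean Series with one entry per row.
filter_score = df['sql_score'] >= 80

# Step 2: indexing with that Series keeps only the rows marked True.
passed = df[filter_score]

print(filter_score)
print(passed)
```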
Combining logical operations
df[(df['py_score'] >= 80) & (df['sql_score'] >= 80)]
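Besides & (and), conditions can be combined with | (or) and negated with ~ (not). The sketch below uses a small table in which row 'd' is a hypothetical addition.

```python
import pandas as pd

df = pd.DataFrame(
    {'py_score': [88, 79, 81, 70], 'sql_score': [86, 81, 78, 75]},
    index=['a', 'b', 'c', 'd'],
)

# Each comparison needs its own parentheses because & and | bind
# more tightly than >= in Python.
both = df[(df['py_score'] >= 80) & (df['sql_score'] >= 80)]
either = df[(df['py_score'] >= 80) | (df['sql_score'] >= 80)]
not_py = df[~(df['py_score'] >= 80)]

print(both)
print(either)
print(not_py)
```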
Handling Missing Data
• Pandas usually represents missing data with NaN (not a number) values.
• Missing data can occur when no information is provided for one or more items or for a whole unit.
Checking for missing values using isnull() and notnull()
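As a rough sketch, isnull() and notnull() each return a Boolean mask with the same shape as the DataFrame. The column names here follow the fillna() example in this deck.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First Score': [100, 90, np.nan, 95],
    'Second Score': [30, 45, 56, np.nan],
})

# True where a value is missing.
missing_mask = df.isnull()

# True where a value is present; the exact opposite of missing_mask.
present_mask = df.notnull()

# A mask on one column can be used to keep only rows that have a value there.
rows_with_first = df[df['First Score'].notnull()]

print(missing_mask)
print(rows_with_first)
```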
Filling missing values using fillna()
import pandas as pd
import numpy as np
# dictionary of lists (named scores, since dict would shadow the built-in)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score': [np.nan, 40, 80, 98]}
# creating a DataFrame from the dictionary
df = pd.DataFrame(scores)
print(df)
# filling missing values using fillna()
df.fillna(0)
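Note that fillna() returns a new DataFrame and leaves the original untouched unless the result is assigned. A small sketch follows. Filling one column with its mean is an illustrative assumption, not something the slides recommend.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First Score': [100, 90, np.nan, 95],
    'Second Score': [30, 45, 56, np.nan],
    'Third Score': [np.nan, 40, 80, 98],
})

# fillna returns a new DataFrame; df still contains its NaN values.
zero_filled = df.fillna(0)

# A dict fills only the named columns; mean() skips NaN when averaging.
first_mean = df['First Score'].mean()
mean_filled = df.fillna({'First Score': first_mean})

print(zero_filled)
print(mean_filled)
```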
Drop rows with at least one NaN value
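The slide text does not name the call, so the following sketch assumes it refers to pandas' dropna() method. It uses the same scores as the fillna() example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First Score': [100, 90, np.nan, 95],
    'Second Score': [30, 45, 56, np.nan],
    'Third Score': [np.nan, 40, 80, 98],
})

# Default behaviour: drop every row that has at least one NaN.
complete_rows = df.dropna()

# how='all' drops a row only when every value in it is NaN.
all_missing_dropped = df.dropna(how='all')

# subset limits the check to the listed columns.
has_first = df.dropna(subset=['First Score'])

print(complete_rows)
```

Like fillna(), dropna() returns a new DataFrame rather than changing df.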
Check for NaN in Pandas DataFrame
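One possible way to summarise missing values, sketched on the same scores: ask whether any NaN exists, then count them per column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First Score': [100, 90, np.nan, 95],
    'Second Score': [30, 45, 56, np.nan],
    'Third Score': [np.nan, 40, 80, 98],
})

# Is there any NaN anywhere in the DataFrame?
has_any = bool(df.isnull().values.any())

# Number of NaN values in each column (True counts as 1 when summed).
per_column = df.isnull().sum()

# Total number of NaN values in the whole DataFrame.
total = int(per_column.sum())

print(has_any)
print(per_column)
print(total)
```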
Import a CSV file into Google Colab session storage
Load Files into a DataFrame
If your data sets are stored in a file, Pandas can load them into a DataFrame.
CSV Files (Comma-Separated Values Files)
By default, when you print a large DataFrame, you will only get the first 5 rows and the last 5 rows. Use to_string() to print the entire DataFrame:
print(df.to_string())
The head() method returns the headers and a specified number of rows, starting from the top.
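The difference between default printing and to_string() can be sketched on a generated 100-row DataFrame. The data is hypothetical, and the truncation shown assumes pandas' default display options.

```python
import pandas as pd

# Hypothetical DataFrame with 100 rows.
df = pd.DataFrame({'value': range(100)})

# Default string form: pandas shortens long frames to the first and last rows.
truncated = str(df)

# to_string() renders every row regardless of the display options.
full = df.to_string()

print(truncated)
print(len(full.splitlines()))  # header line plus one line per row
```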
Data Processing with Pandas DataFrame
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head(3))  # first 3 rows
print(df.tail(6))  # last 6 rows
print(df['Age'].head())  # refer to the column Age
# another method: df.Age.head()
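To try these calls without a data.csv file on disk, the CSV text can be read from memory. The column names and rows below are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv.
csv_text = "Name,Age\nAsha,23\nRavi,31\nMeena,27\nJohn,45\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

first_two = df.head(2)   # first 2 rows
last_one = df.tail(1)    # last row
ages = df['Age']         # column access by label

print(first_two)
print(last_one)
print(ages.head())
```

Attribute access such as df.Age works only when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. df['Age'] works in every case.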
A common goal with data analysis is to visualize data.
• To do this, we'll need matplotlib, which is a popular data visualization library.
• To install it, execute the command pip install matplotlib
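Once matplotlib is installed, a first plot might look like the sketch below. The chart type, labels, and the non-interactive Agg backend are assumptions made so the example runs without a display; they are not taken from the slides.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'py_score': [88, 79, 81]}, index=['a', 'b', 'c'])

# One bar per student, labelled by the DataFrame index.
fig, ax = plt.subplots()
ax.bar(df.index, df['py_score'])
ax.set_xlabel('student')
ax.set_ylabel('py_score')
ax.set_title('Python scores')
```

In an interactive session, calling plt.show() afterwards displays the figure.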