dataframe_operations and various functions

Applying Arithmetic Operations
• Addition, subtraction, multiplication, and division
import pandas as pd
d = {'py_score': pd.Series([88, 79, 81], index=['a', 'b', 'c']),
     'sql_score': pd.Series([86, 81, 78, 88], index=['a', 'b', 'c', 'd']),
     'ca_score': pd.Series([71, 95, 88], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)
print("Dataframe is:")
print(df)
print("sum of python and sql score")
print(df['py_score'] + df['sql_score'])
# weighted total: 40% py_score, 30% sql_score, 30% ca_score
df['total'] = 0.4 * df['py_score'] + 0.3 * df['sql_score'] + 0.3 * df['ca_score']
print(df)
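Row d has no py_score or ca_score, so the arithmetic above produces NaN for that row. The following is a minimal sketch of that index alignment on two of the Series, together with the other three operators from the bullet. The names plain_sum, difference, product, ratio and filled_sum are illustrative, and treating a missing score as 0 is an assumption made only for this example.

```python
import pandas as pd

py = pd.Series([88, 79, 81], index=['a', 'b', 'c'])
sql = pd.Series([86, 81, 78, 88], index=['a', 'b', 'c', 'd'])

# Operators align on the index; 'd' is missing from py, so the result there is NaN.
plain_sum = py + sql
difference = py - sql
product = py * sql
ratio = py / sql

# Assumption for illustration: count a missing score as 0 instead of NaN.
filled_sum = py.add(sql, fill_value=0)

print(plain_sum)
print(filled_sum)
```

Whether a missing score should count as 0 depends on the grading policy; leaving it as NaN keeps the gap visible.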

Sorting a Pandas DataFrame
A DataFrame can be sorted with .sort_values()
df.sort_values(by='py_score', ascending=False)
• by sets the label of the row or column to sort by
• ascending specifies whether you want to sort in ascending (True) or descending (False) order
To sort by multiple columns, pass lists as arguments for by and ascending:
df.sort_values(by=['total', 'py_score'], ascending=[False, False])
In this case, the DataFrame is sorted by the column total, but if two
values are the same, then their order is determined by the values from
the column py_score.
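The tie-breaking rule can be seen in a small sketch. The data below is hypothetical (two rows share the same total) and the name ranked is illustrative.

```python
import pandas as pd

# Hypothetical scores: rows 'a' and 'c' tie on total.
df = pd.DataFrame({'total': [80.0, 85.0, 80.0],
                   'py_score': [70, 90, 75]},
                  index=['a', 'b', 'c'])

# Sort by total (descending); ties are broken by py_score (descending).
ranked = df.sort_values(by=['total', 'py_score'], ascending=[False, False])
print(ranked)

# sort_values returns a new DataFrame; df keeps its original row order.
print(df)
```

Here b comes first on total, then c precedes a because its py_score is higher.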

Filtering Data
filter_score = df['sql_score'] >= 80
filter_score
Output is a Series filter_score filled with Boolean data.
The expression df[filter_score] returns a Pandas DataFrame with the rows from df that correspond to True in filter_score.
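As a minimal sketch of this two-step filter, the example below rebuilds a three-row subset of the scores so that it runs on its own. The name passed is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'py_score': [88, 79, 81],
                   'sql_score': [86, 81, 78]},
                  index=['a', 'b', 'c'])

# Step 1: a Boolean Series, one True/False per row.
filter_score = df['sql_score'] >= 80
print(filter_score)

# Step 2: keep only the rows where the mask is True.
passed = df[filter_score]
print(passed)
```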

Combining logical operations
df[(df['py_score'] >= 80) & (df['sql_score'] >= 80)]
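Besides & (and), conditions can be combined with | (or) and negated with ~ (not). The sketch below uses hypothetical scores, and the names both, either and neither are illustrative. Each comparison needs its own parentheses because & and | bind more tightly than >=.

```python
import pandas as pd

# Hypothetical scores for illustration.
df = pd.DataFrame({'py_score': [88, 79, 81, 60],
                   'sql_score': [86, 81, 78, 70]},
                  index=['a', 'b', 'c', 'd'])

py_ok = df['py_score'] >= 80
sql_ok = df['sql_score'] >= 80

both = df[py_ok & sql_ok]          # rows passing both tests
either = df[py_ok | sql_ok]        # rows passing at least one test
neither = df[~(py_ok | sql_ok)]    # rows passing no test

print(both)
print(either)
print(neither)
```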

Handling Missing Data
• Pandas usually represents missing data with NaN (not a number) values.
• Missing data can occur when no information is provided for one or more items or for a whole unit.
Checking for missing values using isnull() and notnull()
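A minimal sketch of both checks, using the same score table that appears in the fillna() example. The names missing, present and rows_missing_first are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# isnull() marks missing cells with True; notnull() is its exact opposite.
missing = df.isnull()
present = df.notnull()
print(missing)

# A Boolean column can be used to select the rows where a value is missing.
rows_missing_first = df[df['First Score'].isnull()]
print(rows_missing_first)
```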

Filling missing values using fillna()
import pandas as pd
import numpy as np
# dictionary of lists (named data so it does not shadow the built-in dict)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
# creating a dataframe from the dictionary
df = pd.DataFrame(data)
print(df)
# filling missing values using fillna(); this returns a new DataFrame
df.fillna(0)
Drop rows with at least one NaN value using dropna()
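A minimal sketch of dropping rows with dropna(), again on the score table from the fillna() example. The names complete_rows, no_empty_rows and needs_first are illustrative, and the how and subset variants go beyond what the slide shows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# Default: drop every row that contains at least one NaN.
complete_rows = df.dropna()

# Drop a row only if all of its values are NaN (none are, here).
no_empty_rows = df.dropna(how='all')

# Drop a row only if the listed column is NaN.
needs_first = df.dropna(subset=['First Score'])

print(complete_rows)
```

Only the row at index 1 has all three scores, so it is the only one left in complete_rows. Like fillna(), dropna() returns a new DataFrame and leaves df unchanged.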

Check for NaN in Pandas DataFrame
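One possible way to answer "is there any NaN, and how many?" is to aggregate the result of isnull(). This is a sketch on the same score table; the names has_nan, nan_per_column and total_nan are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})

# True if at least one cell anywhere in the DataFrame is NaN.
has_nan = df.isnull().values.any()

# Number of NaN values in each column, and in the whole DataFrame.
nan_per_column = df.isnull().sum()
total_nan = int(nan_per_column.sum())

print(has_nan)
print(nan_per_column)
print(total_nan)
```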

Import a CSV file into Google Colab session storage

Load Files into a DataFrame
If your data sets are stored in a file, Pandas can load them into a DataFrame.
CSV Files (Comma-Separated Values files)
print(df.to_string())
to_string() prints the entire DataFrame. By default, when you print a large DataFrame, you will only get the first 5 rows and the last 5 rows.
The head() method returns the headers and a specified number of rows, starting from the top.
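The difference between the default printout and to_string() can be sketched with a DataFrame long enough to be truncated. The 100-row table and the names short_view, full_view and top_rows are hypothetical; the truncation applies once the row count exceeds the display.max_rows option, which is 60 by default.

```python
import pandas as pd

# Hypothetical data: 100 rows, more than the default display.max_rows of 60.
df = pd.DataFrame({'value': range(100)})

short_view = repr(df)        # the text print(df) shows: truncated in the middle
full_view = df.to_string()   # every row, no truncation
top_rows = df.head(10)       # headers plus the first 10 rows

print(short_view)
print(len(full_view.splitlines()))
```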

Data Processing with Pandas DataFrame
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head(3))  # first 3 rows
print(df.tail(6))  # last 6 rows
print(df['Age'].head())  # refer to the column Age
# another method: df.Age.head()
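Since data.csv is not included here, the following sketch reads a few hypothetical lines of CSV text through io.StringIO instead of a file. The Name column and all values are assumptions for illustration; only the Age column name comes from the code above.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for data.csv.
csv_text = """Name,Age
Asha,23
Ben,31
Chen,27
Dev,45
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head(3)   # first 3 rows
last_rows = df.tail(2)    # last 2 rows
ages = df['Age']          # bracket access to a column

print(first_rows)
print(last_rows)
print(ages.equals(df.Age))  # attribute access returns the same column
```

Bracket access works for any column name, while attribute access such as df.Age only works when the name is a valid Python identifier and does not clash with a DataFrame method.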

A common goal with data analysis is to visualize data.
• To do this, we'll need matplotlib, which is a popular data visualization library.
• To install it, execute the command pip install matplotlib
