3. Sorting a Pandas DataFrame
A DataFrame can be sorted with .sort_values():
df.sort_values(by='py_score', ascending=False)
• by sets the label of the row or column to sort by.
• ascending specifies whether you want to sort in ascending (True) or descending (False) order.
To sort by multiple columns, pass lists as arguments for by and ascending:
df.sort_values(by=['total', 'py_score'], ascending=[False, False])
In this case, the DataFrame is sorted by the column total, but if two
values are the same, then their order is determined by the values from
the column py_score.
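The tie-breaking behavior can be checked with a minimal sketch. The names and score values below are made up for illustration; only the column names py_score and total come from the slide.

```python
import pandas as pd

# Hypothetical data: only the column names py_score and total are from the slide.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee"],
        "py_score": [88.0, 79.0, 81.0, 80.0],
        "total": [170.0, 160.0, 170.0, 160.0],
    }
)

# Sort by total first (descending); rows with equal total are
# then ordered by py_score (also descending).
sorted_df = df.sort_values(by=["total", "py_score"], ascending=[False, False])
print(sorted_df)

# sort_values() returns a new DataFrame; df keeps its original row order.
print(df)
```

Ann and Cid share the highest total, so py_score decides which of the two comes first.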
4. Filtering Data
filter_score = df['sql_score'] >= 80
filter_score
The output is a Series, filter_score, filled with Boolean data.
The expression df[filter_score] returns a Pandas DataFrame with the rows from df that correspond to True in filter_score.
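A minimal sketch of the two steps, building the Boolean Series and then using it to select rows. The sample names and scores are assumptions for illustration; only the column name sql_score and the threshold 80 come from the slide.

```python
import pandas as pd

# Hypothetical data: only the column name sql_score is from the slide.
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee"],
        "sql_score": [86.0, 74.0, 80.0, 59.0],
    }
)

# Step 1: the comparison produces a Boolean Series, one value per row.
filter_score = df["sql_score"] >= 80
print(filter_score)

# Step 2: indexing with that Series keeps only the rows marked True.
high_scores = df[filter_score]
print(high_scores)
```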
6. Handling Missing Data
• Pandas usually represents missing data with NaN (not a
number) values.
• Missing data can occur when no information is provided for one or more
items or for a whole unit.
Checking for missing values using isnull() and notnull()
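As a rough sketch of how these checks look in practice: isnull() marks each missing cell with True, and notnull() is its opposite. The small score table below is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with two missing entries.
df = pd.DataFrame(
    {
        "First Score": [100, 90, np.nan, 95],
        "Second Score": [30, 45, 56, np.nan],
    }
)

# True where a value is missing, False elsewhere.
missing = df.isnull()
print(missing)

# True where a value is present, False where it is missing.
present = df.notnull()
print(present)

# Summing the Boolean mask counts the missing values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```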
7. Filling missing values using fillna()
import pandas as pd
import numpy as np
# dictionary of lists (named data to avoid shadowing the built-in dict)
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
# creating a dataframe from the dictionary
df = pd.DataFrame(data)
print(df)
# filling missing values using fillna(); this returns a new DataFrame
df.fillna(0)
Drop rows with at least one NaN value
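The slide does not name a method here; pandas provides dropna() for this purpose. A minimal sketch, reusing the score table from the fillna() example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "First Score": [100, 90, np.nan, 95],
        "Second Score": [30, 45, 56, np.nan],
        "Third Score": [np.nan, 40, 80, 98],
    }
)

# With default arguments, dropna() removes every row that
# contains at least one NaN and returns a new DataFrame.
cleaned = df.dropna()
print(cleaned)
```

Only the second row has a value in all three columns, so it is the only row that survives. The original df is not modified.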
9. Import a CSV file into Google Colab session storage
10. Load Files into a DataFrame
If your data sets are stored in a file, Pandas can load them into a DataFrame.
CSV Files (Comma-Separated Values Files)
By default, when you print a large DataFrame, you will only get the first 5 rows and the last 5 rows.
Use to_string() to print the entire DataFrame:
print(df.to_string())
The head() method returns the headers and a specified number of rows, starting from the top.
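The difference between the default printout, to_string(), and head() can be seen in a small sketch. The 100-row table is made up for illustration, and the two display options are set explicitly to what are, as far as I know, the pandas defaults (60 and 10), so the behavior does not depend on local settings.

```python
import pandas as pd

# Hypothetical 100-row table, large enough to be truncated when printed.
df = pd.DataFrame({"id": range(100), "square": [i * i for i in range(100)]})

# Assumed pandas defaults, set explicitly for reproducibility.
with pd.option_context("display.max_rows", 60, "display.min_rows", 10):
    short_text = repr(df)  # the text that print(df) would show

# to_string() renders every row, regardless of the display options.
full_text = df.to_string()

print(short_text)
print(len(full_text.splitlines()))  # header line plus one line per row

# head(3) returns only the first 3 rows (with the column headers).
first_three = df.head(3)
print(first_three)
```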
11. Data Processing with Pandas DataFrame
import pandas as pd
df = pd.read_csv('data.csv')
print(df.head(3))  # first 3 rows
print(df.tail(6))  # last 6 rows
print(df['Age'].head())  # refer to the column Age
# another method: df.Age.head()
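The snippet above needs a data.csv file on disk. For trying the same calls without one, here is a self-contained sketch that reads CSV text from memory instead. The file contents are invented; only the column name Age comes from the slide.

```python
import io

import pandas as pd

# Hypothetical contents standing in for data.csv.
csv_text = """Name,Age
Asha,23
Ben,31
Chen,27
Dina,45
Eli,38
Fay,29
Gus,52
"""

# read_csv accepts a file path or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))

first_rows = df.head(3)  # first 3 rows
last_rows = df.tail(6)   # last 6 rows
ages = df["Age"].head()  # first 5 values of the Age column
same_ages = df.Age.head()  # attribute access gives the same column

print(first_rows)
print(last_rows)
print(ages)
```

Attribute access (df.Age) only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so df['Age'] is the safer form in general.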
12. A common goal with data analysis is to visualize data
• To do this, we'll need matplotlib, which is a popular data visualization library.
• To install it, execute the command pip install matplotlib