Pandas csv

WHAT IS A CSV FILE?
• CSV (comma-separated values) files store tabular data as plain text.
• They are like greatly simplified spreadsheets (think Excel), except that only the content is stored, with no formatting or formulas.
• The csv module is part of Python's standard library and lets Python parse these types of files.
• The text inside a CSV file is laid out in rows, and each row has columns, all separated by commas.
• Every line in the file is a row in the spreadsheet, while the commas define and separate the cells.

CSV MODULE
• The csv module is useful for working with data exported from
spreadsheets and databases into text files formatted with fields and
records, commonly referred to as comma-separated value (CSV) format
because commas are often used to separate the fields in a record.
• To import or export spreadsheet and database data in Python, you can use the csv module, which reads and writes the comma-separated values (CSV) format.

STEPS
• First, save the Excel file with the ‘.csv’ extension.
• Second, save the CSV file in the same folder as the Python file.
• Then write the code to read and write the CSV file.

READING A CSV FILE
• There are two ways to read a CSV file with the csv module.
• You can use the reader function or the DictReader class.
• Using the DictReader class:
• Here we open the CSV file ‘mpg.csv’ and read it using the DictReader class.
• DictReader returns each row as a dictionary keyed by the column names.
• Here, m[:3] gives the first three rows.
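The slide's code is not reproduced in the text, so the following is only a minimal sketch of the DictReader approach. The contents of ‘mpg.csv’ are not given here; the small file written below and its column names are made-up stand-ins.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for 'mpg.csv': a tiny file with made-up rows.
sample = (
    "id,manufacturer,model,cty\n"
    "1,audi,a4,18\n"
    "2,audi,a4 quattro,16\n"
    "3,ford,mustang,15\n"
    "4,honda,civic,24\n"
)
path = os.path.join(tempfile.mkdtemp(), "mpg.csv")
with open(path, "w", newline="") as f:
    f.write(sample)

# DictReader uses the header line as keys and yields one dict per row.
with open(path, newline="") as csvfile:
    m = list(csv.DictReader(csvfile))

print(m[:3])        # first three rows, each a dict of strings
print(m[0].keys())  # column names taken from the header line
```

Note that every value comes back as a string, so numeric columns need an explicit conversion such as int() or float().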

READING A CSV FILE
• USING THE reader() FUNCTION:
Here, we read the file using the reader() function, which splits each row into a list of column values at the commas.
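As a rough illustration of the reader() approach, the sketch below reads a small file into a list of lists. The file name and its rows are invented for the example and are not the slide's data.

```python
import csv
import os
import tempfile

# Hypothetical sample file (made-up rows).
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", newline="") as f:
    f.write("id,manufacturer,model\n1,audi,a4\n2,ford,mustang\n")

# csv.reader yields each line as a list of strings split at the commas.
with open(path, newline="") as csvfile:
    rows = list(csv.reader(csvfile))

header, data = rows[0], rows[1:]
print(header)  # the column names
print(data)    # the remaining rows
```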

Writing a CSV File
• The csv module also offers two ways to write a CSV file: the writer function or the DictWriter class.
• USING THE DictWriter CLASS:
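The slide's DictWriter code is missing from the text. One possible sketch, assuming made-up field names and rows, writes a header and two records and then reads them back to check the result.

```python
import csv
import os
import tempfile

# Hypothetical rows to write; the field names are illustrative only.
rows = [
    {"model": "a4", "cty": 18},
    {"model": "mustang", "cty": 15},
]
path = os.path.join(tempfile.mkdtemp(), "out.csv")

with open(path, "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=["model", "cty"])
    writer.writeheader()    # writes the line "model,cty"
    writer.writerows(rows)  # writes one line per dictionary

# Read the file back to confirm what was written.
with open(path, newline="") as csvfile:
    written = list(csv.DictReader(csvfile))
print(written)
```

Opening the file with newline="" is what the csv documentation recommends, since the module handles line endings itself.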

LOOPING THROUGH ROWS
A for loop processes the file one row at a time: on each pass, the row variable holds the next row, and the indented line prints it.
We open the CSV file using open(‘filename.csv’) and then perform the operation.

LOOPING THROUGH ROWS
In this example, we create an empty list ‘model_no’.
• After creating the empty list, we append the value of row[2] from each row to the list and print the list.
• Once run, this code prints a single list.
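A minimal sketch of looping through the rows and collecting row[2] might look like this. The sample file is invented, and skipping the header line with next() is an assumption; the slide's code may have kept it.

```python
import csv
import os
import tempfile

# Hypothetical sample file; column 2 (counting from 0) holds the model.
path = os.path.join(tempfile.mkdtemp(), "mpg.csv")
with open(path, "w", newline="") as f:
    f.write("id,manufacturer,model\n1,audi,a4\n2,audi,a4\n3,ford,mustang\n")

model_no = []  # start with an empty list
with open(path, newline="") as csvfile:
    reader = csv.reader(csvfile)
    next(reader)                 # assumption: skip the header line
    for row in reader:           # row is a list of strings
        model_no.append(row[2])  # keep only the third column

print(model_no)
```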

EXTRACTING INFORMATION FROM CSV FILE
• If you want information about a particular column, extract it from each row using row[].
• In this code, we extract the values of the ‘model’ column.

CONVERTING LIST TO SETS IN CSV FILE
In this code, the set() function removes duplicate values, so each value is printed only once.
As with the other examples, we first import the csv module before working with the CSV file.
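The idea of extracting one column and removing duplicates with set() can be sketched as follows. The file and its ‘model’ values are made up, and DictReader is used here only as one convenient way to pick a column by name.

```python
import csv
import os
import tempfile

# Hypothetical sample file with a repeated model value.
path = os.path.join(tempfile.mkdtemp(), "mpg.csv")
with open(path, "w", newline="") as f:
    f.write("id,manufacturer,model\n1,audi,a4\n2,audi,a4\n3,ford,mustang\n")

# Collect the 'model' column, duplicates included.
with open(path, newline="") as csvfile:
    models = [row["model"] for row in csv.DictReader(csvfile)]

# A set keeps each distinct value once; its order is not guaranteed.
unique_models = set(models)
print(sorted(unique_models))
```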

PANDAS
• Pandas is an open source Python library for data analysis.

PANDAS DATA STRUCTURES
Pandas introduces two new data structures to Python:
• Series
• DataFrame

SERIES
• A Series is a one-dimensional labelled array capable of holding any data type.
• It is similar to an array, a list, or a column in a table.
• Pandas assigns a labelled index to each item in the Series.
• By default, the items receive index labels from 0 to N, where N is the length of the Series minus one.

SERIES
CREATE A SERIES WITH AN ARBITRARY LIST
You can turn the values of a list into a Series using the pd.Series() constructor.
In the output, the values of the list are shown alongside their assigned index.
The dtype in the output is ‘object’ because strings are stored with the object data type.
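A short sketch of building a Series from a list follows. The list values are made up for illustration.

```python
import pandas as pd

animals = ["Tiger", "Bear", "Moose"]  # hypothetical values
s = pd.Series(animals)
print(s)  # values with the default index labels 0, 1, 2

numbers = pd.Series([1, 2, 3])
print(numbers.dtype)  # an integer dtype rather than a text dtype
```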

SERIES
Alternatively, specify an index to use when creating the Series.
Here, we label the elements of the list ourselves by passing the labels through the index=[] argument, and then print the Series.

The Series constructor can also convert a dictionary, using the keys of the dictionary as its index.

SERIES EXAMPLE
If you want to output the index of the values in the Series, use the ‘index’ attribute.
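The three ideas above (an explicit index, a dictionary as input, and the index attribute) can be sketched together. All names and values here are illustrative.

```python
import pandas as pd

# Explicit labels passed through the index argument.
s1 = pd.Series(["Tiger", "Bear", "Moose"],
               index=["India", "America", "Canada"])

# A dictionary: its keys become the index, its values the data.
s2 = pd.Series({"Archery": "Bhutan", "Golf": "Scotland", "Sumo": "Japan"})

print(s1.index)  # the labels given explicitly
print(s2.index)  # the dictionary keys
```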

SERIES EXAMPLE
If one of the elements in a Series of strings is None, the output shows None as it is.
If one of the elements is None and all the other elements are numeric, the output shows it as NaN (not a number).
• NaN is not the same as the None keyword.
• NaN does not compare equal to itself, so in NumPy we use isnan() to check whether a value is NaN.
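A minimal sketch of the None and NaN behaviour described above. The data is made up, and dtype=object is passed explicitly as an assumption, so that None is kept as-is regardless of how a given pandas version infers string columns.

```python
import numpy as np
import pandas as pd

# Strings with a missing entry; dtype=object keeps None unchanged.
words = pd.Series(["Tiger", "Bear", None], dtype=object)

# Numbers with a missing entry: None is converted to NaN.
numbers = pd.Series([1, 2, None])

print(words)
print(numbers)  # float dtype, last value shown as NaN

# NaN is not None and is not even equal to itself,
# so an equality test cannot detect it.
print(np.nan == np.nan)
print(np.isnan(numbers.iloc[2]))
```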

QUERYING A SERIES
We can query a Series using two indexers:
• loc[]: used when we query by label.
• iloc[]: used when we query by numeric position.
To look up a particular element in a Series by its numeric position, use ‘iloc[]’.
To look up a particular element in a Series by its label, use ‘loc[]’.
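As a small illustration of the two indexers (note the square brackets rather than parentheses), with made-up data:

```python
import pandas as pd

# Hypothetical Series with text labels.
sports = pd.Series({"Archery": "Bhutan", "Golf": "Scotland", "Sumo": "Japan"})

by_position = sports.iloc[2]   # third element, counting from 0
by_label = sports.loc["Golf"]  # element whose label is 'Golf'

print(by_position)
print(by_label)
```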

DATAFRAME
• A DataFrame is a tabular data structure made up of rows and columns.
• A DataFrame can be thought of as a group of Series objects that share an index, with each Series forming one named column.
• A pandas DataFrame consists of three main components: the data, the index, and the columns.

DATAFRAME EXAMPLE
Here, the pd.DataFrame() constructor combines the different Series objects and outputs the result in two-dimensional form.
head() displays the first five records of the dataset.
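The slide's dataset is not shown in the text, so the sketch below builds a DataFrame from three made-up Series. The column names ‘student’, ‘item’ and ‘cost’ and the ‘shop’ labels are assumptions.

```python
import pandas as pd

# Hypothetical records, one Series per row.
r1 = pd.Series({"student": "Asha", "item": "Notebook", "cost": 22.5})
r2 = pd.Series({"student": "Ravi", "item": "Pen", "cost": 2.5})
r3 = pd.Series({"student": "Meera", "item": "Bag", "cost": 35.0})

# Each Series becomes a row; its keys become the column names.
df = pd.DataFrame([r1, r2, r3], index=["shop 1", "shop 1", "shop 2"])

print(df.head())  # at most the first five rows
```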

EXTRACTING VALUES FROM DATAFRAME
To extract rows by label, use the loc[] indexer.
In this code, we look up the customers recorded under the ‘shop 2’ index label.
To extract only a particular column for a given index label, pass two values to df.loc[]: the row label and the column name.
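A possible sketch of label-based extraction with loc[], using an invented DataFrame in which ‘shop 1’ appears twice:

```python
import pandas as pd

# Hypothetical data; the labels and columns are illustrative.
df = pd.DataFrame(
    {"student": ["Asha", "Ravi", "Meera"], "cost": [22.5, 2.5, 35.0]},
    index=["shop 1", "shop 1", "shop 2"],
)

shop2 = df.loc["shop 2"]               # the single row labelled 'shop 2'
shop2_cost = df.loc["shop 2", "cost"]  # row label, then column name
shop1_costs = df.loc["shop 1", "cost"] # two matching rows give a Series

print(shop2)
print(shop2_cost)
print(shop1_costs)
```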

EXTRACTING VALUES FROM DATAFRAME
Here, the ‘place’ column is added to the DataFrame. Any column can be added in the same way.
To display two or more columns along with the index, select them together. Here, only the cost and student columns are shown, for all index labels.
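The exact form used on the slide is not visible in the text. A common way to do both steps, assumed here, is to assign to a new column name and to select with a list of column names. The data is made up.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame(
    {"student": ["Asha", "Ravi", "Meera"], "cost": [22.5, 2.5, 35.0]},
    index=["shop 1", "shop 1", "shop 2"],
)

# Assigning to a new column name adds the column.
df["place"] = ["Pune", "Delhi", "Mumbai"]

# A list of column names selects several columns; the index is kept.
subset = df[["cost", "student"]]
print(subset)
```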

RENAME A COLUMN NAME
To rename a column, we use the ‘df.rename(columns={})’ syntax.
Inside the braces, the key is the existing column name to be renamed.
The value is the new column name you want to use.
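A minimal sketch of the rename syntax, with made-up column names:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"student": ["Asha", "Ravi"], "cost": [22.5, 2.5]})

# Key = old column name, value = new column name.
renamed = df.rename(columns={"cost": "price"})

print(renamed.columns)
print(df.columns)  # the original is unchanged without inplace=True
```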

INPLACE
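• In any method that accepts it, if inplace is False (the default), the operation returns a new object and does not affect the underlying data.
• If inplace is True, the underlying data is modified directly and the method returns None, so nothing is printed.
• The absence of output is a hint that the change happened in place.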

DROP
To drop a column, we use the drop() function with the name of the column to remove.
Here, we use inplace=True, which means the DataFrame is changed in place and nothing is printed.
• axis=1 is used to drop a column.
• axis=0 is used to drop a row.
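The effect of inplace and axis on drop() can be sketched as follows, using an invented DataFrame:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame(
    {"student": ["Asha", "Ravi"], "cost": [22.5, 2.5], "place": ["Pune", "Delhi"]},
    index=["shop 1", "shop 2"],
)

# inplace=False (default): a new DataFrame is returned, df is untouched.
without_place = df.drop("place", axis=1)

# axis=0 drops a row by its index label.
without_shop2 = df.drop("shop 2", axis=0)

# inplace=True: df itself is changed and the call returns None.
result = df.drop("place", axis=1, inplace=True)
print(result)
print(df.columns)
```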

QUERYING A DATAFRAME
Here, we test the condition cost > 20 on the DataFrame; it returns True or False for each row, depending on whether the row satisfies the condition.
where() takes the Boolean mask, applies it to the DataFrame or Series, and returns a new object of the same shape, with NaN wherever the condition is False.
Here, count() counts the non-missing values of cost in the DataFrame.
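A small sketch of the Boolean mask, where() and count() steps, assuming a made-up ‘cost’ column:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame(
    {"student": ["Asha", "Ravi", "Meera", "John"],
     "cost": [22.5, 2.5, 35.0, 18.0]}
)

mask = df["cost"] > 20    # Boolean Series, one value per row
masked = df.where(mask)   # same shape; failing rows become NaN
n_matching = masked["cost"].count()  # count() skips NaN values

print(mask)
print(masked)
print(n_matching)
```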

FILTER THE ROWS WITH NaN VALUE
The dropna() function removes the rows that contain NaN (missing) values.
We can also filter rows by indexing the DataFrame with the Boolean mask directly, which keeps only the rows where the condition is True.
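Both ways of filtering can be sketched side by side. The data is invented, and indexing with the mask is an assumption about what the slide's second form showed.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame(
    {"student": ["Asha", "Ravi", "Meera", "John"],
     "cost": [22.5, 2.5, 35.0, 18.0]}
)

# Option 1: mask with where(), then drop the rows that became NaN.
cleaned = df.where(df["cost"] > 20).dropna()

# Option 2: index with the Boolean mask directly.
direct = df[df["cost"] > 20]

print(cleaned)
print(direct)
```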

QUERYING DATAFRAME USING LOGICAL OPERATION
Here, the & (and) operator combines the two conditions and returns the rows that satisfy both conditions.
Here, the | (or) operator combines the two conditions and returns the rows that satisfy either condition.
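A minimal sketch of combining conditions, with made-up data and thresholds. Each condition is wrapped in parentheses because & and | bind more tightly than the comparison operators.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame(
    {"student": ["Asha", "Ravi", "Meera", "John"],
     "cost": [22.5, 2.5, 35.0, 18.0]}
)

# Rows where both conditions hold.
both = df[(df["cost"] > 20) & (df["cost"] < 30)]

# Rows where at least one condition holds.
either = df[(df["cost"] < 5) | (df["cost"] > 30)]

print(both)
print(either)
```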

USE THIS DATA FOR INDEXING A DATAFRAME

INDEXING A DATAFRAME
The index attribute displays the index (row labels) of the DataFrame.
set_index() sets a column as the index of the DataFrame.
reset_index() resets the index that was set using set_index(), moving it back into a column.
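The three operations can be sketched on a small invented DataFrame (the slide's own data is not available in the text):

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame(
    {"student": ["Asha", "Ravi", "Meera"], "cost": [22.5, 2.5, 35.0]}
)

print(df.index)  # default labels 0, 1, 2

indexed = df.set_index("student")  # 'student' becomes the index
restored = indexed.reset_index()   # 'student' is a column again

print(indexed.index)
print(restored.columns)
```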

HANDLE MISSING VALUES IN PANDAS
• The isnull() function returns True for a value if the value is null; otherwise it returns False.
• The tail() function displays the last five rows of the data.
• The notnull() function returns True if the value is not null and False when the value is null.
• fillna() fills the missing values read from the CSV file with a given value. Here, ‘Various’ is used to fill the missing values.
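A rough sketch of these functions on made-up data. The slide reads its data from a CSV file that is not shown; here a small DataFrame with missing entries is built directly, and the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
log = pd.DataFrame(
    {"artist": ["Asha", None, "Ravi"], "plays": [10.0, np.nan, 7.0]}
)

missing = log.isnull()   # True where a value is missing
present = log.notnull()  # the opposite of isnull()

# Fill the gaps in one column with a placeholder value.
filled = log["artist"].fillna("Various")

print(log.tail())  # at most the last five rows
print(missing)
print(filled)
```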

GROUPBY
• The groupby function is used whenever you want to analyse a pandas Series or DataFrame by some category.
Census.csv is a CSV file.
In this line of code, we find the mean of the BIRTHS2012 column for each value of the CTYNAME column.

GROUPBY EXAMPLE
In this code, we find the mean of the BIRTHS2012 column for the county name ‘Ada County’.
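Census.csv itself is not included in the text, so the sketch below uses a tiny DataFrame with the same column names and made-up numbers to show a grouped mean and the lookup of a single county.

```python
import pandas as pd

# Hypothetical stand-in for Census.csv (values are invented).
census = pd.DataFrame({
    "CTYNAME": ["Ada County", "Ada County", "Boise County", "Boise County"],
    "BIRTHS2012": [100, 200, 10, 30],
})

# Mean of BIRTHS2012 for each distinct CTYNAME.
births = census.groupby("CTYNAME")["BIRTHS2012"].mean()
print(births)

# The result is indexed by county name, so one county can be looked up.
ada_mean = births["Ada County"]
print(ada_mean)
```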

GROUPBY EXAMPLE
In this line of code, we calculate the mean across all the columns for each CTYNAME.
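A sketch of the all-columns version, again on made-up data. Passing numeric_only=True is an assumption added here so that text columns such as a state name are skipped rather than causing an error.

```python
import pandas as pd

# Hypothetical stand-in for Census.csv (values are invented).
census = pd.DataFrame({
    "CTYNAME": ["Ada County", "Ada County", "Boise County", "Boise County"],
    "STNAME": ["Idaho", "Idaho", "Idaho", "Idaho"],
    "BIRTHS2012": [100, 200, 10, 30],
    "BIRTHS2013": [110, 210, 20, 40],
})

# Mean of every numeric column for each CTYNAME.
means = census.groupby("CTYNAME").mean(numeric_only=True)
print(means)
```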

AGG() Function
• The agg() function lets you specify multiple aggregation functions at once.
In this line of code, agg() is used to compute the count, min, max and mean of the values.
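One way to sketch agg() with several functions at once, using invented census-style data:

```python
import pandas as pd

# Hypothetical stand-in for Census.csv (values are invented).
census = pd.DataFrame({
    "CTYNAME": ["Ada County", "Ada County", "Boise County", "Boise County"],
    "BIRTHS2012": [100, 200, 10, 30],
})

# One output column per aggregation function.
summary = census.groupby("CTYNAME")["BIRTHS2012"].agg(
    ["count", "min", "max", "mean"]
)
print(summary)
```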

NOMINAL SCALES EXAMPLE
.astype() converts the data from one data type to another.

ORDINAL SCALES EXAMPLE
If we want the resulting categories to be ordered, the ordered attribute is set to True.

SCALES EXAMPLE
Here, the dtype returned is the object type.
Here, the dtype returned is the category type, because we changed the dtype from ‘object’ to ‘category’ using astype().
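A sketch of nominal and ordinal categories follows. The values are made up, and pd.CategoricalDtype is used here as one current way to set ordered=True; the slide's own code may have used a different form.

```python
import pandas as pd

sizes = pd.Series(["low", "high", "medium", "low"])  # hypothetical data
print(sizes.dtype)  # text dtype ('object' in the slides)

# Nominal scale: categories without any order.
nominal = sizes.astype("category")
print(nominal.dtype)

# Ordinal scale: categories listed from lowest to highest.
levels = pd.CategoricalDtype(categories=["low", "medium", "high"], ordered=True)
ordinal = sizes.astype(levels)

# With an order defined, comparisons become meaningful.
above_low = ordinal > "low"
print(above_low)
```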

PIVOT TABLE
• A pivot table gives a better representation of the data, where the columns are the unique variables and an index of dates identifies individual observations.
• To reshape the data into this form, use the pivot function.

PIVOT TABLE
Here, we can use aggfunc=[] to pass a list of the aggregate operations we want to apply.
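Both reshaping steps can be sketched on invented long-format data: pivot() when each date and variable pair occurs once, and pivot_table() with a list in aggfunc when pairs repeat and must be aggregated.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, variable) pair.
long_df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
    "variable": ["A", "B", "A", "B"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# pivot: dates become the index, unique variables become the columns.
wide = long_df.pivot(index="date", columns="variable", values="value")
print(wide)

# Add a second observation per pair so that aggregation is needed.
repeated = pd.concat([long_df, long_df.assign(value=long_df["value"] + 10)])

# pivot_table: several aggregate operations passed as a list.
table = pd.pivot_table(repeated, index="date", columns="variable",
                       values="value", aggfunc=["mean", "max"])
print(table)
```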

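DATE FUNCTIONALITY IN PANDAS
• Timestamp: represents a single point in time.
• Period: represents a single time span.

DATE FUNCTIONALITY IN PANDAS
DatetimeIndex: the index of a Series or DataFrame whose labels are Timestamp values.
PeriodIndex: the index of a Series or DataFrame whose labels are Period values.
Here, the values ‘a’, ‘b’ and ‘c’ are labelled with Timestamp values, so the index is a DatetimeIndex.
Here, the values ‘a’, ‘b’ and ‘c’ are labelled with Period values, so the index is a PeriodIndex.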
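The slide's code is not visible in the text. The sketch below assumes that the letters of 'abc' are the values and that the Timestamp or Period objects are the index, with made-up dates.

```python
import pandas as pd

# Values 'a', 'b', 'c' labelled with points in time.
t1 = pd.Series(list("abc"), index=[
    pd.Timestamp("2016-09-01"),
    pd.Timestamp("2016-09-02"),
    pd.Timestamp("2016-09-03"),
])

# The same values labelled with monthly time spans.
t2 = pd.Series(list("abc"), index=[
    pd.Period("2016-09"),
    pd.Period("2016-10"),
    pd.Period("2016-11"),
])

print(type(t1.index))  # DatetimeIndex
print(type(t2.index))  # PeriodIndex
```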

CONVERTING TO DATETIME
To convert values into datetime format, use ‘to_datetime()’.
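A minimal sketch of to_datetime() on text values. The dates and the day/month/year format string are made up for the example.

```python
import pandas as pd

# Hypothetical dates stored as text in ISO format.
raw = pd.Series(["2016-09-01", "2016-09-02"])
dates = pd.to_datetime(raw)
print(dates)  # datetime dtype instead of text

# For other layouts, the format can be given explicitly.
custom = pd.to_datetime(pd.Series(["01/09/2016"]), format="%d/%m/%Y")
print(custom)
```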

TIMEDELTAS
• Timedeltas: differences in time.
Here, we find the difference between two timestamps.
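As a small illustration with made-up dates, subtracting two Timestamp values gives a Timedelta, and a Timedelta can be added back to a Timestamp:

```python
import pandas as pd

start = pd.Timestamp("2016-09-01")
end = pd.Timestamp("2016-09-03")

diff = end - start  # a Timedelta of two days
print(diff)

# Adding a Timedelta shifts a Timestamp forward.
later = start + pd.Timedelta(days=12, hours=3)
print(later)
```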

MERGING DATAFRAMES
Use this dataset to merge the
dataframes

OUTER JOIN
The merge() function is used to merge the two DataFrames. With an outer join, the result keeps every row from both DataFrames.
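The dataset shown on the slide is not available in the text, so this sketch of an outer join uses two invented DataFrames that share a ‘Name’ column.

```python
import pandas as pd

# Hypothetical data: some names appear in only one DataFrame.
staff = pd.DataFrame({
    "Name": ["Kelly", "Sally", "James"],
    "Role": ["Director", "Tutor", "Grader"],
})
students = pd.DataFrame({
    "Name": ["James", "Mike", "Sally"],
    "School": ["Business", "Law", "Engineering"],
})

# how='outer' keeps every name; missing fields are filled with NaN.
merged = pd.merge(staff, students, how="outer", on="Name")
print(merged)
```

Changing how to 'inner', 'left' or 'right' keeps only the matching rows, all staff rows, or all student rows respectively.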
