CSV Files
WHAT IS A CSV FILE?
• CSV (comma-separated values) files store tabular data as plain text.
• Think of them as greatly simplified spreadsheets (like Excel) in which only the content is stored, without formatting or formulas.
• The csv module is part of Python's standard library and lets Python parse these files.
• The text inside a CSV file is laid out in rows, and each row is divided into columns separated by commas.
• Every line in the file is a row of the spreadsheet, and the commas mark the boundaries between cells.
CSV MODULE
• The csv module is useful for working with data exported from spreadsheets and databases into text files formatted with fields and records. This is commonly called comma-separated values (CSV) format because commas are often used to separate the fields in a record.
• To import or export spreadsheet and database data in Python, you can rely on the csv module.
STEPS
• First, save the Excel file with the ‘.csv’ extension.
• Second, save the CSV file in the same folder as the Python file.
• Then write the code to read and write the CSV file.
READING A CSV FILE
• There are two ways to read a CSV file.
• You can use the csv module’s reader() function or the DictReader class.
• Using the DictReader class:
• Here we open the CSV file ‘mpg.csv’ and read it with the DictReader class.
• DictReader returns each row as a dictionary keyed by the column names in the header row.
• Once the rows are collected in a list m, m[:3] gives the first three rows.
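The DictReader approach described above can be sketched as follows. The sample data and its column names are assumptions standing in for ‘mpg.csv’, and io.StringIO is used only so the sketch runs without a file on disk.

```python
import csv
import io

# Hypothetical sample standing in for 'mpg.csv'; the columns are assumptions.
SAMPLE = """manufacturer,model,cyl
audi,a4,4
audi,a4 quattro,6
ford,mustang,8
honda,civic,4
"""

# With a real file you would write: open("mpg.csv", newline="")
with io.StringIO(SAMPLE) as csvfile:
    m = list(csv.DictReader(csvfile))  # one dictionary per data row

first_three = m[:3]
for row in first_three:
    print(dict(row))
```

Each dictionary maps a header name to the value in that row, so first_three[0]["model"] gives the model of the first record.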
READING A CSV FILE
• USING THE reader() FUNCTION:
Here, we read the file using the reader() function, which splits each row into a list of column values at the commas.
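A minimal sketch of reading with csv.reader, assuming a small in-memory sample instead of the original file (the data is hypothetical):

```python
import csv
import io

# Hypothetical sample data; a real script would open a file instead.
SAMPLE = "manufacturer,model,cyl\naudi,a4,4\nford,mustang,8\n"

with io.StringIO(SAMPLE) as csvfile:
    rows = list(csv.reader(csvfile))  # each row becomes a list of strings

header, data = rows[0], rows[1:]
print(header)
print(data)
```

Note that csv.reader does not treat the header specially and returns every value as a string.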
Writing a CSV File
• The csv module also offers two ways to write a CSV file: the writer() function or the DictWriter class.
• USING THE DictWriter CLASS:
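Writing with DictWriter might look like the sketch below. The field names and rows are assumptions for illustration, and the output goes to an in-memory buffer rather than a file.

```python
import csv
import io

# Hypothetical rows to write; the keys must match the fieldnames below.
rows = [
    {"manufacturer": "audi", "model": "a4", "cyl": 4},
    {"manufacturer": "ford", "model": "mustang", "cyl": 8},
]

# With a real file you would write: open("out.csv", "w", newline="")
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["manufacturer", "model", "cyl"])
writer.writeheader()    # first line: the column names
writer.writerows(rows)  # one line per dictionary

written = buffer.getvalue()
print(written)
```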
LOOPING THROUGH ROWS
A for loop assigns each element of the list to the row variable in turn and runs the indented lines that follow, here a single line that prints row.
We can open the CSV file with open(‘filename.csv’) and then perform the operation.
LOOPING THROUGH ROWS
• Here we create an empty list, ‘model_no’.
• We then append the value of row[2] from each row to the list and print the list.
• Once run, this code prints a single list.
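The loop described above can be sketched as follows, assuming the third column (row[2]) holds the value of interest. The sample data is hypothetical.

```python
import csv
import io

# Hypothetical sample; the third column plays the role of row[2].
SAMPLE = "manufacturer,model,model_no\naudi,a4,101\nford,mustang,202\nhonda,civic,303\n"

model_no = []  # start with an empty list
with io.StringIO(SAMPLE) as csvfile:
    reader = csv.reader(csvfile)
    next(reader)                 # skip the header row
    for row in reader:
        model_no.append(row[2])  # collect the third column of every row

print(model_no)
```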
EXTRACTING INFORMATION FROM CSV FILE
• To get the values of a particular column, index each row with row[].
• In this code, we extract the values of the ‘model’ column.
CONVERTING LIST TO SETS IN CSV FILE
First we import the csv module to work with the CSV file.
Here the set() function removes duplicate values, so each value is printed only once.
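As a sketch of the idea, the ‘model’ column can be collected into a list and passed to set() to drop the duplicates. The sample rows are assumptions.

```python
import csv
import io

# Hypothetical sample with a repeated model value.
SAMPLE = "manufacturer,model\naudi,a4\naudi,a4\nford,mustang\naudi,a6\n"

with io.StringIO(SAMPLE) as csvfile:
    models = [row["model"] for row in csv.DictReader(csvfile)]

unique_models = set(models)  # a set keeps each value only once
print(sorted(unique_models))
```

A set is unordered, so the sketch sorts the values before printing them.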
PANDAS
• Pandas is an open source Python library for data analysis.
PANDAS DATA STRUCTURES
Pandas introduces two new data structures to Python:
• Series
• DataFrame
SERIES
• Series is a one-dimensional labelled array capable of holding any data type.
• A Series is a one-dimensional object similar to an array, list, or column in a table.
• It will assign a labelled index to each item in the Series.
• By default, each item will receive an index label from 0 to N, where N is the length of
the Series minus one.
SERIES
CREATE A SERIES WITH AN ARBITRARY LIST
You can turn the values of a list into a Series using the pd.Series() constructor.
In the output, the values of the list are shown alongside their assigned index.
The dtype in the output is ‘object’ because strings are stored with the object data type (newer pandas versions may report a dedicated string dtype instead).
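A minimal sketch of building a Series from a list. The list contents are an assumption for illustration.

```python
import pandas as pd

# Hypothetical list of strings.
animals = ["Tiger", "Bear", "Moose"]

s = pd.Series(animals)  # default index labels are 0, 1, 2
print(s)
```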
SERIES
Alternatively, specify an index to use when creating the Series.
Here we name the index labels for the elements of the list by passing index=[] and then print the Series.
The Series constructor can convert a dictionary as well, using the keys of the dictionary as its index.
SERIES EXAMPLE
To output the index labels of a Series, use its ‘index’ attribute.
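The three ideas above (an explicit index, a dictionary as input, and the index attribute) can be sketched together. All labels and values here are hypothetical.

```python
import pandas as pd

# Explicit index labels passed with index=[...]
s1 = pd.Series(["Tiger", "Bear", "Moose"], index=["India", "America", "Canada"])

# A dictionary: its keys become the index labels
sports = {"Archery": "Bhutan", "Golf": "Scotland", "Sumo": "Japan"}
s2 = pd.Series(sports)

print(s1)
print(s2.index)  # the index attribute, not a method call
```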
SERIES EXAMPLE
If one of the elements in a Series of strings is None, the output typically shows it as ‘None’.
If one of the elements is None and all the other elements are numeric, pandas stores it as ‘NaN’ (not a number).
• NaN is not the same as the None keyword.
• NaN does not compare equal to itself, so in NumPy we use isnan() to check whether a value is NaN.
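A short sketch of the None versus NaN behaviour, using made-up values:

```python
import numpy as np
import pandas as pd

with_strings = pd.Series(["Tiger", "Bear", None])
with_numbers = pd.Series([1, 2, None])  # None is converted to NaN here

print(with_numbers)                      # dtype becomes float64
print(np.nan == np.nan)                  # False: NaN never equals itself
print(np.isnan(with_numbers.iloc[2]))    # True: the reliable check
```

Both Series report the missing entry through isna(), which is often more convenient than testing individual values.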
QUERYING A SERIES
We can query a Series using two attributes, both written with square brackets rather than parentheses:
• loc[]: used to query a particular element by its label.
• iloc[]: used to query a particular element by its numeric position.
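A minimal sketch of both lookups on a hypothetical Series:

```python
import pandas as pd

s = pd.Series({"Archery": "Bhutan", "Golf": "Scotland", "Sumo": "Japan"})

by_position = s.iloc[2]   # third element, counting from 0
by_label = s.loc["Golf"]  # element whose index label is "Golf"

print(by_position, by_label)
```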
DATAFRAME
• A DataFrame is a tabular data structure made up of rows and columns.
• A DataFrame can be thought of as a group of Series objects (the columns) that share an index (the row labels).
• A pandas DataFrame consists of three main components: the data, the index, and the columns.
DATAFRAME EXAMPLE
Here the pd.DataFrame() constructor combines the different Series objects and outputs the result in two-dimensional form.
head() displays the first five records of the dataset.
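One way to sketch this is to build each column as a Series with a shared index and pass them to pd.DataFrame(). The column names, index labels and values are assumptions.

```python
import pandas as pd

index = ["shop 1", "shop 2", "shop 3"]

# Each column is a Series; all of them share the same index.
customer = pd.Series(["Chris", "Kevyn", "Vinod"], index=index)
item = pd.Series(["Dog Food", "Kitty Litter", "Bird Seed"], index=index)
cost = pd.Series([22.5, 2.5, 5.0], index=index)

df = pd.DataFrame({"customer": customer, "item": item, "cost": cost})
print(df.head())  # at most the first five rows
```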
EXTRACTING VALUES FROM DATAFRAME
To extract elements by label, use the loc[] attribute.
In this code, we look up the customer at the ‘shop 2’ index label.
To extract only a particular column for a given index label, pass two values to df.loc[]: the row label and the column name.
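A sketch of both forms of df.loc[], using a hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"customer": ["Chris", "Kevyn", "Vinod"], "cost": [22.5, 2.5, 5.0]},
    index=["shop 1", "shop 2", "shop 3"],
)

row = df.loc["shop 2"]                # the whole row, returned as a Series
value = df.loc["shop 2", "customer"]  # a single cell: row label, column name

print(row)
print(value)
```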
EXTRACTING VALUES FROM DATAFRAME
Here the ‘place’ column is added to the DataFrame. Any column can be added in this form.
To display two or more columns along with the index, use this form. Here only the cost and student columns are shown, for all index labels.
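Adding a column and selecting several columns might look like this. The data is hypothetical; only the column names ‘place’, ‘cost’ and ‘student’ come from the slide text.

```python
import pandas as pd

df = pd.DataFrame(
    {"student": ["Asha", "Ben", "Chen"], "cost": [22.5, 2.5, 5.0]},
    index=["shop 1", "shop 2", "shop 3"],
)

# Assigning to a new column name adds that column.
df["place"] = ["Delhi", "Mumbai", "Pune"]

# A list of column names selects those columns and keeps the index.
subset = df[["cost", "student"]]
print(subset)
```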
RENAME A COLUMN NAME
To rename a column, use the ‘df.rename(columns={})’ syntax.
Each key of the dictionary is the existing column name that has to be renamed.
Each value is the new column name you want to use.
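A minimal sketch of rename(), with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({"student": ["Asha", "Ben"], "cost": [22.5, 2.5]})

# {old name: new name}; a new DataFrame is returned by default.
renamed = df.rename(columns={"cost": "price"})
print(renamed.columns.tolist())
```

Because inplace is not set, df itself keeps its original column names.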
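INPLACE
• In any method that accepts it, if inplace is False (the default) the operation does not modify the underlying data and returns a new object instead.
• If inplace is True, the object is modified directly and the method returns None, so nothing is printed.
• That empty output is a hint that the change happened in place.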
DROP
To drop a column, use the drop() function with the name of the column.
Here we use inplace=True, so the DataFrame is modified in place and nothing is printed.
• axis=1 is used to drop a column.
• axis=0 is used to drop a row.
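The effect of inplace and axis on drop() can be sketched as follows, using a hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"student": ["Asha", "Ben"], "cost": [22.5, 2.5], "place": ["Delhi", "Pune"]},
    index=["shop 1", "shop 2"],
)

# inplace=False (default): df is untouched, a new DataFrame is returned.
without_cost = df.drop("cost", axis=1)

# inplace=True: df itself changes and the call returns None.
result = df.drop("place", axis=1, inplace=True)

# axis=0 drops a row by its index label.
without_row = df.drop("shop 1", axis=0)

print(result)
print(df.columns.tolist())
```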
QUERYING A DATAFRAME
Here we test which values in the DataFrame have cost > 20; the comparison returns True or False for each row depending on whether it satisfies the condition.
where() takes the Boolean mask, applies it to the DataFrame or Series, and returns a new DataFrame or Series of the same shape, with NaN wherever the condition is False.
Here count() counts the non-missing values of cost in the DataFrame.
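A sketch of the mask, where() and count() steps on made-up data:

```python
import pandas as pd

df = pd.DataFrame(
    {"student": ["Asha", "Ben", "Chen"], "cost": [22.5, 2.5, 30.0]},
    index=["shop 1", "shop 2", "shop 3"],
)

mask = df["cost"] > 20           # Boolean Series: True, False, True
masked = df.where(mask)          # same shape; failing rows become NaN
n_cost = masked["cost"].count()  # count() ignores the NaN entries

print(mask)
print(masked)
print(n_cost)
```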
FILTER THE ROWS WITH NaN VALUE
The dropna() function removes the rows that contain NaN (missing) values.
We can also filter out rows directly by indexing the DataFrame with a Boolean condition.
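Both ways of filtering can be sketched on a hypothetical DataFrame with one missing cost. The direct form shown here is Boolean indexing, which is an assumption about what the original slide displayed.

```python
import pandas as pd

df = pd.DataFrame(
    {"student": ["Asha", "Ben", "Chen"], "cost": [22.5, None, 30.0]}
)

cleaned = df.dropna()          # drops the row whose cost is NaN
filtered = df[df["cost"] > 20]  # keeps only rows where the condition is True

print(cleaned)
print(filtered)
```

A comparison with NaN evaluates to False, so the row with the missing cost is excluded by the Boolean filter as well.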
QUERYING DATAFRAME USING LOGICAL OPERATION
Here the & (and) operator combines two conditions and returns the rows that satisfy both.
Here the | (or) operator combines two conditions and returns the rows that satisfy either one.
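A minimal sketch of combining conditions, with hypothetical data. Each condition needs its own parentheses because & and | bind more tightly than comparisons.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "student": ["Asha", "Ben", "Chen"],
        "cost": [22.5, 2.5, 5.0],
        "place": ["Delhi", "Mumbai", "Pune"],
    }
)

both = df[(df["cost"] > 20) & (df["place"] == "Delhi")]     # and
either = df[(df["cost"] > 20) | (df["place"] == "Mumbai")]  # or

print(both)
print(either)
```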
USE THIS DATA FOR INDEXING A DATAFRAME
INDEXING A DATAFRAME
The index attribute displays the index (row labels) of the DataFrame.
set_index() sets a column as the index of the DataFrame.
reset_index() undoes set_index(): the index becomes a regular column again and the default integer index is restored.
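The three indexing operations can be sketched like this, on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"student": ["Asha", "Ben", "Chen"], "cost": [22.5, 2.5, 5.0]})

print(df.index)                      # default integer index

indexed = df.set_index("student")    # 'student' becomes the index
restored = indexed.reset_index()     # 'student' becomes a column again

print(indexed)
print(restored)
```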
HANDLE MISSING VALUES IN PANDAS
• isnull() returns True for a value if it is null and False otherwise.
• tail() displays the last five rows of the data.
HANDLE MISSING VALUES IN PANDAS
notnull() returns True if the value is not null and False when it is null.
HANDLE MISSING VALUES IN PANDAS
fillna() replaces the missing values loaded from the CSV file with a given value. Here ‘Various’ is used to fill the missing values.
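A sketch of isnull(), notnull(), fillna() and tail() on a small in-memory CSV. The column names and rows are assumptions; only the fill value ‘Various’ comes from the slide.

```python
import io

import pandas as pd

# Hypothetical CSV text; the empty field in the second row is read as NaN.
SAMPLE = "title,genre\nA,Drama\nB,\nC,Comedy\n"
df = pd.read_csv(io.StringIO(SAMPLE))

null_mask = df["genre"].isnull()        # True where the value is missing
not_null_mask = df["genre"].notnull()   # the opposite mask
filled = df["genre"].fillna("Various")  # replace missing values

print(df.tail())  # at most the last five rows
print(filled)
```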
GROUPBY
• The groupby() function is used whenever you want to analyse a pandas Series or DataFrame by some category.
Census.csv is a CSV file.
In this line of code, we find the mean of the BIRTHS2012 column for each value of the CTYNAME column.
GROUPBY EXAMPLE
To find the mean of the BIRTHS2012 column for the CTYNAME value ‘Ada county’, use this form.
GROUPBY EXAMPLE
To calculate the mean across all the columns for each CTYNAME, use this line of code.
AGG() Function
• The agg() function lets you specify multiple aggregation functions at once.
In this line of code, agg() computes the count, min, max and mean of the values.
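The groupby and agg() steps above can be sketched as follows. The rows are invented stand-ins for Census.csv; only the column names CTYNAME and BIRTHS2012 come from the slides, and BIRTHS2013 is a hypothetical extra column.

```python
import pandas as pd

# Hypothetical stand-in for the census data.
df = pd.DataFrame(
    {
        "CTYNAME": ["Ada County", "Ada County", "Bear Lake County", "Bear Lake County"],
        "BIRTHS2012": [100, 200, 10, 30],
        "BIRTHS2013": [110, 210, 20, 40],
    }
)

# Mean of one column for each CTYNAME value
mean_births = df.groupby("CTYNAME")["BIRTHS2012"].mean()
ada_mean = mean_births.loc["Ada County"]

# Mean of every remaining column for each CTYNAME value
all_means = df.groupby("CTYNAME").mean()

# Several aggregations at once
summary = df.groupby("CTYNAME")["BIRTHS2012"].agg(["count", "min", "max", "mean"])

print(mean_births)
print(summary)
```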
SCALES
NOMINAL SCALES EXAMPLE
.astype() converts data from one dtype to another.
ORDINAL SCALES EXAMPLE
To give the resulting categories an order, use the ordered attribute.
SCALES EXAMPLE
Here, the dtype returned is object.
Here, the dtype returned is category, because we changed the dtype from ‘object’ to category using astype().
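A sketch of nominal and ordinal categories, using hypothetical grade values. The ordering C < B < A is an assumption chosen for illustration.

```python
import pandas as pd

grades = pd.Series(["B", "A", "C", "A"])

# Nominal: categories without any order
nominal = grades.astype("category")

# Ordinal: categories with an explicit order (lowest first)
grade_type = pd.CategoricalDtype(categories=["C", "B", "A"], ordered=True)
ordinal = grades.astype(grade_type)

above_c = ordinal > "C"  # comparisons only make sense on ordered categories

print(nominal.dtype)
print(ordinal)
print(above_c)
```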
PIVOT TABLE
• A pivot table gives a better representation in which the columns are the unique variables and an index of dates identifies the individual observations.
• To reshape the data into this form, use the pivot function.
PIVOT TABLE
Here we can use aggfunc=[] to pass a number of aggregate operations to apply.
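Reshaping with pivot() and aggregating with pivot_table() might look like the sketch below. The long-format data and its column names are assumptions.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, variable) pair.
df = pd.DataFrame(
    {
        "date": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"],
        "variable": ["A", "B", "A", "B"],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

# pivot(): dates become the index, unique variables become the columns.
wide = df.pivot(index="date", columns="variable", values="value")

# pivot_table(): aggfunc takes a list of aggregations to apply.
summary = df.pivot_table(index="variable", values="value", aggfunc=["mean", "max"])

print(wide)
print(summary)
```

pivot() only reshapes and needs each (index, column) pair to be unique, whereas pivot_table() aggregates whatever falls into the same cell.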
DATE FUNCTIONALITY IN PANDAS
• Timestamp: represents a single point in time.
• Period: represents a single time span.
DATE FUNCTIONALITY IN PANDAS
DatetimeIndex: the index type of a Series or DataFrame indexed by Timestamp values.
PeriodIndex: the index type of a Series or DataFrame indexed by Period values.
Here (‘abc’) is paired with Timestamp values, which gives a DatetimeIndex.
Here (‘abc’) is paired with Period values, which gives a PeriodIndex.
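A sketch of Timestamp, Period and the two index types. The dates are made up, and using the letters of ‘abc’ as the Series values is an assumption about the original example.

```python
import pandas as pd

ts = pd.Timestamp("2016-09-01 10:05")  # a single point in time
period = pd.Period("2016-09")          # a single time span (here a month)

# Timestamps as index labels give a DatetimeIndex.
t1 = pd.Series(
    list("abc"),
    index=[pd.Timestamp("2016-09-01"), pd.Timestamp("2016-09-02"), pd.Timestamp("2016-09-03")],
)

# Periods as index labels give a PeriodIndex.
t2 = pd.Series(
    list("abc"),
    index=[pd.Period("2016-09"), pd.Period("2016-10"), pd.Period("2016-11")],
)

print(type(t1.index).__name__)
print(type(t2.index).__name__)
```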
CONVERTING TO DATETIME
To convert values into datetime format, use ‘pd.to_datetime()’.
TIMEDELTAS
• Timedeltas: differences in time.
Here we find the difference between two timestamps.
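Converting strings with pd.to_datetime() and working with Timedelta values can be sketched as follows. All dates are hypothetical.

```python
import pandas as pd

# Strings converted to datetime values
dates = ["2016-06-02", "2014-08-10", "2012-01-15"]
parsed = pd.to_datetime(dates)

# Subtracting two Timestamps gives a Timedelta.
delta = pd.Timestamp("2016-09-03") - pd.Timestamp("2016-09-01")

# A Timedelta can also be added to a Timestamp.
later = pd.Timestamp("2016-09-02 08:10") + pd.Timedelta(days=12, hours=3)

print(parsed)
print(delta)
print(later)
```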
MERGING DATAFRAMES
Use this dataset to merge the DataFrames.
OUTER JOIN
The merge() function is used to merge the two DataFrames.
INNER JOIN
LEFT JOIN
RIGHT JOIN
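The four join types named above can be sketched with pd.merge() and its how parameter. The two DataFrames and their columns are assumptions, since the original dataset is not included in the text.

```python
import pandas as pd

# Hypothetical datasets sharing a 'Name' column.
staff = pd.DataFrame(
    {"Name": ["Kelly", "Sally", "James"], "Role": ["Director", "Tutor", "Grader"]}
)
students = pd.DataFrame(
    {"Name": ["James", "Mike", "Sally"], "School": ["Business", "Law", "Engineering"]}
)

outer = pd.merge(staff, students, how="outer", on="Name")  # everyone from both
inner = pd.merge(staff, students, how="inner", on="Name")  # only names in both
left = pd.merge(staff, students, how="left", on="Name")    # all staff rows
right = pd.merge(staff, students, how="right", on="Name")  # all student rows

print(outer)
print(inner)
```

Rows without a match on the other side get NaN in the columns that come from that side.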
