Comma Separated Files
This lesson focuses on CSV files. It explains how to read data from CSV files using the Pandas library for Python.
We'll cover the following:
• Introduction to CSV files
• Reading CSV files with Pandas
Introduction to CSV files
Comma-separated values (CSV) files are common in machine learning. Each line of the file holds one row of data, written as a comma-separated list in which each element belongs to one column. Pandas makes this data easy to read.
Reading CSV files with Pandas
The full list of options is in the pandas documentation for read_csv. Before reading a CSV file, you should know three parameters:
• sep - defaults to a comma, but you can specify any delimiter you want. For example, a comma is an awkward delimiter if some of your columns contain commas, because those fields then have to be quoted; a | might be a better choice.
• header - which row (if any) contains the column names.
• names - the column names to use.
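A minimal sketch of the three parameters working together, assuming a small made-up pipe-delimited sample held in memory with io.StringIO instead of a file on disk. The sample rows and the column names name and note are hypothetical.

```python
import io

import pandas as pd

# Hypothetical pipe-delimited data with no header row. The second field
# contains a comma, which is why "|" is used as the delimiter instead.
raw = "alice|likes apples, pears\nbob|likes plums\n"

# sep="|" splits on pipes, header=None says no row holds column names,
# and names supplies the column names to use instead.
df = pd.read_csv(io.StringIO(raw), sep="|", header=None, names=["name", "note"])
print(df)
```

Because the delimiter is "|", the comma inside "likes apples, pears" needs no quoting and stays part of a single field.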
If your CSV is well formatted, with the column names in the first row, the default parameters should work well.
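As a quick illustration of the defaults, here is a sketch that reads a hypothetical well-formatted CSV (made-up names and ages, held in memory via io.StringIO) without passing any extra parameters.

```python
import io

import pandas as pd

# Hypothetical well-formatted CSV: the first row holds the column names.
raw = "name,age\nalice,30\nbob,25\n"

# No extra parameters: the delimiter defaults to "," and the header
# is taken from the first row.
df = pd.read_csv(io.StringIO(raw))
print(df)
```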
Reading a CSV file without Pandas might sound simple, but CSV files are often messy, and reading them correctly can involve handling many edge cases. Pandas handles many of those edge cases out of the box and has many parameters you can adjust for messier CSV files.
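The sketch below shows a few of those edge cases on a small, hypothetical messy sample: a quoted field that contains a comma, a space after each delimiter, and "?" used as a missing-value marker. It uses two real read_csv parameters, skipinitialspace and na_values; the data itself is made up.

```python
import io

import pandas as pd

# Hypothetical messy CSV: spaces after delimiters, a quoted field
# containing a comma, and "?" marking a missing value.
raw = (
    "name,city,score\n"
    'alice, "Springfield, IL", 90\n'
    "bob, ?, 85\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    skipinitialspace=True,  # ignore spaces right after each delimiter
    na_values=["?"],        # treat "?" as a missing value (NaN)
)
print(df)
```

The quoted "Springfield, IL" stays a single field, the scores are parsed as integers despite the leading spaces, and bob's city becomes NaN.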
Let’s see an example with code.
import pandas as pd
# Define the column names as a list (the 15 columns of the UCI Adult dataset)
names = ['age', 'workclass', 'fnlwgt', 'education', 'educationnum', 'maritalstatus',
         'occupation', 'relationship', 'race', 'sex', 'capitalgain', 'capitalloss',
         'hoursperweek', 'nativecountry', 'label']
# Read the downloaded adult.data file; header=None because the file has no header row
df = pd.read_csv("adult.data", header=None, names=names)
# Print the first 5 rows
print(df.head())
Let’s deconstruct the code. First, the data comes from the Adult dataset page of the UCI Machine Learning Repository. Go to the page, click on Data Folder, and select the adult.data file to download this small dataset. Once it has downloaded, open it in Excel or any text editor. You will see rows of data with each column separated by commas.
You will notice that this CSV doesn’t have column names. Fortunately, we know what the names should be, so we supplied them through the names parameter. Since Pandas assumes by default that the first row is the header (the column names), we passed header=None so that it reads the first row as data.
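To see why header=None matters, here is a minimal sketch that reads the same hypothetical headerless sample twice, once with the default header handling and once with header=None plus names. The two rows and the column names age and job are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical headerless data: every line is a real data row.
raw = "39,teacher\n50,nurse\n"

# Default header handling: the first data row is consumed as column names.
wrong = pd.read_csv(io.StringIO(raw))

# header=None plus names: every line is kept as a data row.
right = pd.read_csv(io.StringIO(raw), header=None, names=["age", "job"])

print(wrong.columns.tolist(), len(wrong))
print(right.columns.tolist(), len(right))
```

With the defaults, the first person's record silently becomes the column labels and only one row remains; with header=None, both rows are kept.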
The data is read into a Pandas DataFrame.
A Pandas DataFrame is similar to a matrix of data, but with some additional benefits (usually at the cost of performance). You get labeled columns and rows, and each column can hold a different type of data: for example, one column could contain integers and another text.
You might imagine a DataFrame as an Excel sheet. For example, a sheet with a column of dates and a column of temperatures on those dates could easily be represented as a DataFrame.
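The spreadsheet analogy can be sketched directly: a DataFrame built from a hypothetical column of dates and a column of temperatures (all values made up). Each column keeps its own type, and columns are accessed by name.

```python
import pandas as pd

# Hypothetical readings: a column of dates and a column of temperatures.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
        "temperature": [3.5, 1.0, -2.0],
    }
)

# Each column has its own type: datetimes in one, floats in the other.
print(df.dtypes)

# Columns are accessed by name, like named columns in a spreadsheet.
print(df["temperature"].max())
```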
Calling the head() method on the DataFrame shows its first 5 rows. We will cover DataFrames in more depth in the next section, on describing data.
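A small sketch of head() on a hypothetical 8-row DataFrame: with no argument it returns the first 5 rows, and passing a number chooses how many rows to return.

```python
import pandas as pd

# A hypothetical DataFrame with 8 rows.
df = pd.DataFrame({"x": range(8)})

first_five = df.head()   # head() returns the first 5 rows by default
first_two = df.head(2)   # pass n to choose how many rows

print(len(first_five), len(first_two))
```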
Next, let’s look at reading in JSON files.