PROGRAMMING CRASH COURSE
KEDS BIODESIGNS
WEEK : II
CLASS : 3
SESSION :ML SYSTEM
MOST IMPORTANT THING?
• Data is the first thing we need to load when starting any ML project. The most common data format for ML projects is CSV (comma-separated values).
• CSV is a simple file format that stores tabular data (numbers and text), such as a spreadsheet, in plain text. Python offers several ways to load CSV data, but before loading it we need to take a few considerations into account.
CONSIDERATIONS WHILE LOADING CSV DATA
• CSV is the most common format for ML data, but we need to take care of the following major considerations when loading it into our ML projects.
File Header
• In a CSV data file, the header row names each field. The header row must use the same delimiter as the data rows, because the header specifies how the data fields should be interpreted.
• There are two cases to consider for a CSV file header:
• Case I: the data file has a header row. The column names are assigned automatically from it.
• Case II: the data file has no header row. We need to assign the column names manually.
• In both cases, we need to specify explicitly whether the CSV file contains a header or not.
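The two header cases can be sketched with the standard library's csv.DictReader. This is a minimal illustration, not a script from the slides: the sample text and the column names "id" and "score" are hypothetical.

```python
import csv
import io

# Hypothetical sample data: the same two records, with and without a header row.
with_header = "id,score\n1,0.5\n2,0.8\n"
without_header = "1,0.5\n2,0.8\n"

# Case I: the first row is a header, so DictReader takes the column names from it.
rows_with = list(csv.DictReader(io.StringIO(with_header)))

# Case II: there is no header row, so we supply the column names ourselves.
rows_without = list(
    csv.DictReader(io.StringIO(without_header), fieldnames=["id", "score"])
)

print(rows_with[0]["score"])
print(rows_with == rows_without)
```

Both calls yield the same records; the only difference is where the column names come from.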
Comments
• Comments in any data file have their significance. In a CSV data file, comments are indicated by a hash (#) at the start of the line. We need to consider comments when loading CSV data into ML projects: if the file contains comments, then, depending on the loading method we choose, we may need to indicate whether to expect them or not.
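How comments are handled depends on the loader. The csv module has no comment option, so one possible approach is to filter comment lines before parsing, whereas numpy.loadtxt() takes a comments argument and pandas.read_csv() takes a comment argument. The sketch below shows the filtering approach; the helper name skip_comments and the sample text are hypothetical.

```python
import csv
import io

# Hypothetical sample: data rows mixed with comment lines starting with '#'.
raw = "# exported for illustration\n1,2,3\n# another note\n4,5,6\n"

def skip_comments(lines, marker="#"):
    """Yield only the lines that do not start with the comment marker."""
    for line in lines:
        if not line.lstrip().startswith(marker):
            yield line

# csv.reader accepts any iterable of strings, so the filter can sit in front of it.
comment_free_rows = list(csv.reader(skip_comments(io.StringIO(raw))))
print(comment_free_rows)
```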
Delimiter
• In CSV data files, the comma (,) is the standard delimiter. The role of the delimiter is to separate the values in the fields. It is important to consider the delimiter when loading a CSV file into ML projects, because a file can also use a different delimiter such as a tab or white space. When a file uses a delimiter other than the standard one, we must specify it explicitly.
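As a small sketch of specifying a non-standard delimiter, the csv module's reader() accepts a delimiter argument. The tab-separated sample text below is hypothetical.

```python
import csv
import io

# Hypothetical sample: values separated by tabs instead of commas.
tab_data = "1\t2\t3\n4\t5\t6\n"

# Tell the reader which character separates the fields.
tab_rows = list(csv.reader(io.StringIO(tab_data), delimiter="\t"))
print(tab_rows)
```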
Quotes
• In CSV data files, the double quotation mark (") is the default quote character. It is important to consider the role of quotes when loading a CSV file into ML projects, because a file can also use a quote character other than the double quote. When it does, we must specify it explicitly.
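To see why the quote character matters, the sketch below parses one hypothetical line whose middle field is wrapped in single quotes and contains a comma. With the default quote character the field is split at the comma; passing quotechar to csv.reader() keeps it whole.

```python
import csv
import io

# Hypothetical sample: the middle field is single-quoted and contains a comma.
single_quoted = "1,'Hello, world',3\n"

# Default quote character ("): the single quotes are treated as ordinary text,
# so the comma inside them splits the field in two.
default_rows = list(csv.reader(io.StringIO(single_quoted)))

# Explicit quote character ('): the quoted field stays in one piece.
quoted_rows = list(csv.reader(io.StringIO(single_quoted), quotechar="'"))

print(len(default_rows[0]), len(quoted_rows[0]))
```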
METHODS TO LOAD CSV DATA FILE
Load CSV with Python Standard Library
• The first and most commonly used approach to loading a CSV data file is the Python standard library, which provides a variety of built-in modules, including the csv module and its reader() function. The following is an example of loading a CSV data file with it.
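The slide's own script is not included in this text, so here is a minimal sketch of the approach under stated assumptions: the file name sample.csv and its numeric contents are hypothetical, and the file is written to a temporary directory first so the example runs on its own.

```python
import csv
import os
import tempfile

# Write a small hypothetical numeric CSV file so the example is self-contained.
csv_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(csv_path, "w", newline="") as f:
    f.write("5.1,3.5,1.4\n4.9,3.0,1.4\n")

# Read the file row by row with csv.reader and convert each value to float.
with open(csv_path, newline="") as f:
    reader = csv.reader(f, delimiter=",")
    loaded_rows = [[float(value) for value in row] for row in reader]

print(len(loaded_rows), len(loaded_rows[0]))
```

Opening the file with newline="" follows the csv module's recommendation, so that line endings inside quoted fields are handled by the reader itself.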
Load CSV with NumPy
• Another approach to loading a CSV data file is NumPy and its numpy.loadtxt() function. The following is an example of loading a CSV data file with it.
Example
• In this example, we use the Pima Indians Diabetes dataset, which contains data on diabetic patients. It is a numeric dataset with no header. It can also be downloaded into our local directory. After loading the data file, we can convert it into a NumPy array and use it for ML projects. The following is the Python script for loading the CSV data file.
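The slide's script is not included in this text. A minimal sketch of the numpy.loadtxt() approach might look like the following; to keep it runnable without the downloaded file, it reads two rows in the dataset's nine-column numeric layout from an in-memory buffer. The buffer is a stand-in assumption, and in practice a file path would be passed instead.

```python
import io

import numpy as np

# Stand-in for the downloaded file: two header-less numeric rows, nine columns each.
pima_sample = io.StringIO(
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
)

# loadtxt assumes purely numeric data with no header row; only the delimiter
# has to be stated, because its default separator is whitespace.
pima_array = np.loadtxt(pima_sample, delimiter=",")
print(pima_array.shape)
```

The result is already a two-dimensional NumPy array of floats, so it can be passed to ML code without a further conversion step.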
SCRIPT
• The following is the Python script for loading the CSV data file and providing the header names, using Pandas on the Pima Indians Diabetes dataset.
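The slide's script is not included in this text. A minimal sketch using pandas.read_csv() could look like this; the short column names are assumed labels chosen for illustration, and the in-memory buffer stands in for the downloaded file.

```python
import io

import pandas as pd

# Assumed short labels for the nine columns; the file itself has no header row.
column_names = ["preg", "plas", "pres", "skin", "test", "mass", "pedi", "age", "class"]

# Stand-in for the downloaded file: two header-less numeric rows.
pima_buffer = io.StringIO(
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
)

# Passing names tells read_csv that the first line is data, not a header.
pima_frame = pd.read_csv(pima_buffer, names=column_names)
print(pima_frame.shape)
print(list(pima_frame.columns))
```

If the names argument were omitted here, read_csv would treat the first data row as the header, which is the mistake the File Header consideration warns about.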


Machine learning session 3
