Implementing a data_science_project (Python Version)_part1
Arthur Samuel (1959)
Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.
The Tools
Outline
Project Description & Checklist
Data Loading, Merging and Visualisation
Feature Cleaning, Selection & Transformation
Machine Learning Algorithm Adoption
Model Performance Evaluation
Model Validation, Fine-Tuning & Ensembling
1. Project Description & Checklist
The Description
To use machine learning techniques to perform exploratory and predictive analyses on crime data.
The Datasets
Dataset A: Data on crime reported across the country and the respective police stations (2015/2016).
Dataset B: Data on the names of the police stations and the population that falls under their jurisdiction.
Dataset C: Data on the location (i.e. geographical coordinates) of the police stations across the country.
Dataset D: Additional data (to be sourced later).
Checklist
Checklist 1: Is it a supervised, unsupervised or reinforcement machine learning project?
Unsupervised Learning: The computer learns by searching; it aims at finding patterns.
Supervised Learning
Outcome feature is known.
Task driven.
Fits data.
Its goal is to predict values in continuous (regression) or categorical (classification) format.
Example: In retail business, predict the creditworthiness of a potential customer.

Unsupervised Learning
Outcome feature is unknown.
Data driven.
Clusters data.
Its goal is to find patterns (clustering) in the data.
Example: Segment clients by socio-demographic characteristics.

Reinforcement Learning
Outcome feature is unknown.
Circumstance driven.
Decides on data.
Its goal is to learn how to decide under a given circumstance.
Example: In forex trading, adjust the stop-loss or take-profit based on the performance of the traded currency.
Supervised Learning (Labelled Data)
Label: Burglary
Id     Province    Police Station  Population  Burglary
AB123  Gauteng     Dunnottar       10479       141
AB123  North West  Mmabatho        134138      773

Label: Frequent Crime
Id     Province    Police Station  Population  Frequent Crime
AB123  Gauteng     Dunnottar       10479       Burglary
AB123  North West  Mmabatho        134138      Arson

Unsupervised Learning (Unlabelled Data)
Id     Province    Police Station  Population  Burglary  Crime Type
AB123  Gauteng     Dunnottar       10479       141       Burglary
AB123  North West  Mmabatho        134138      773       Arson
Checklist
Checklist 1: Is it a supervised or unsupervised machine learning project?
Checklist 2: Is it a classification or regression task?
Supervised Learning (Labelled Data)
Regression: The values are continuous.
Id     Province    Police Station  Population  Burglary
AB123  Gauteng     Dunnottar       10479       141
AB123  North West  Mmabatho        134138      773

Classification: The values are categorical.
Id     Province    Police Station  Population  Frequent Crime
AB123  Gauteng     Dunnottar       10479       Burglary
AB123  North West  Mmabatho        134138      Arson
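The regression-versus-classification check can be sketched in code. This is a minimal illustration rather than part of the original deck: `suggest_task` is a hypothetical helper, and the rule it applies (numeric target means regression, anything else means classification) is a simplifying assumption. The rows are the two sample records from the slide.

```python
import pandas as pd

# Sample rows copied from the slide's tables.
stations = pd.DataFrame({
    "Police Station": ["Dunnottar", "Mmabatho"],
    "Population": [10479, 134138],
    "Burglary": [141, 773],
    "Frequent Crime": ["Burglary", "Arson"],
})


def suggest_task(target: pd.Series) -> str:
    """Hypothetical helper: numeric target -> regression, otherwise classification."""
    if pd.api.types.is_numeric_dtype(target):
        return "regression"
    return "classification"


burglary_task = suggest_task(stations["Burglary"])
frequent_crime_task = suggest_task(stations["Frequent Crime"])
print(burglary_task, frequent_crime_task)
```

A numeric column can still hold category codes, so the result is only a first hint and the target should be inspected before settling on the task type.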
Checklist
Checklist 1: Is it a supervised, unsupervised or reinforcement machine learning project?
Checklist 2: Is it a classification or regression task?
Checklist 3: Identify the target feature or the features to be clustered.
Checklist 4: Can I get extra data or features to boost my project?
Checklist 5: What are the available solutions to the problem?
Checklist 6: How do I intend to measure the performance of my model?
Checklist 7: How will my solution be deployed and utilised?
2. Data Loading, Merging & Visualisation
Data Form
Alphanumeric: $1,000 | Male, Female | Yes, No | 2014-08-21 | 10-5 | 2.0 | 1
Text: "If you cannot do great things, do small things in a great way." This is a quote by Napoleon Hill.
Image | Audio | Video
Data Loading Checklist
Data Location: Where is the dataset located? Computer | Server | Web | Cloud.
Data Form: What form is the dataset in? Alphanumeric | Text | Image | Audio | Video.
Data Size: How big is the dataset? Is the size in kilobytes, megabytes, gigabytes or terabytes?
Data Flow: Is it real-time data? Does it come as a stream or in batches?
Analysis Platform: Can I analyse it on my computer, or do I need to engage the services of a cloud-based computing provider, e.g. Microsoft Azure, Amazon Web Services (AWS), Google Cloud, etc.?
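The data-size question can be answered before loading anything. The sketch below is an illustration, not the deck's own code: `human_size` is a hypothetical helper (assuming 1 KB = 1024 bytes), and the temporary file only stands in for a real dataset so the example runs anywhere.

```python
import os
import tempfile


def human_size(n_bytes: int) -> str:
    """Hypothetical helper: express a byte count in B, KB, MB, GB or TB."""
    size = float(n_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


# Throwaway file standing in for a dataset on disk.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write("Police_Station,population_estimate\n" * 100)
    demo_path = handle.name

demo_size = os.path.getsize(demo_path)  # size on disk, in bytes
print(human_size(demo_size))
os.remove(demo_path)
```

If the reported size is close to or larger than the available memory, that is usually the signal to read the data in batches or move the analysis to a larger machine.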
Data Loading Steps
Step 1: Start the Jupyter notebook or your …
Start Menu -> Anaconda Prompt -> Type jupyter notebook -> A webpage will open
LET’S DEMONSTRATE THIS
It is assumed that you have already installed Anaconda.
Anaconda
In your Windows Start Menu, type in Anaconda or browse to find Anaconda Prompt.
Click on Anaconda Prompt and a command prompt will appear.
Type jupyter notebook and press Enter. A webpage will come up.
Jupyter notebook
Click on New and select Python 3.
To change the title, click on the default title and type your own.
The cell is where you will enter your code. Each time you press Alt+Enter to run your code, a new cell will appear.
A cell can be in different modes: Code | Markdown | Raw NBConvert | Heading.
Code: Select this each time you want to write code.
Markdown: Select this each time you want to write comments. It supports HTML.
Raw NBConvert: This holds raw HTML, LaTeX or reST content that is passed through unchanged when the notebook is converted.
Heading: Select this each time you want to make a heading.
Data Loading Steps
Step 2: Import the Python module for checking & changing your directory
import os
os.getcwd()                # show the current working directory
os.chdir('C:/Anaconda3')   # change the working directory
LET’S SEE THE CODE ON JUPYTER NOTEBOOK
Step 3: Import the Python module for loading data, i.e. pandas
import pandas as pd
Step 4: Load the data
Dataset = pd.read_csv('C:/MyDataset.csv')
Other kinds of data format can be loaded as well.
The path is the folder where you put your dataset. Note the direction of the slash (/).
If you want to use backslashes (\), prefix the string with r:
r'C:\Anaconda\MyData.csv'
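The note about slash direction can be checked directly. This sketch assumes an illustrative Windows path (`C:/Anaconda/MyData.csv` is only an example location) and uses `pathlib.PureWindowsPath`, which compares Windows paths without touching the file system, so it runs on any operating system.

```python
from pathlib import PureWindowsPath

forward = "C:/Anaconda/MyData.csv"    # forward slashes need no escaping
raw = r"C:\Anaconda\MyData.csv"       # the r prefix keeps backslashes literal
escaped = "C:\\Anaconda\\MyData.csv"  # doubled backslashes also work

# All three spellings refer to the same Windows path.
same_path = PureWindowsPath(forward) == PureWindowsPath(raw)
same_string = raw == escaped
print(same_path, same_string)
```

Without the r prefix or doubling, a sequence such as \n or \t inside the path would be read as a newline or a tab, which is why the plain backslash form is risky.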
Project Data Loading
Dataset A - Crime Reported and Police Station
The dataset is in csv (comma delimited) format.
Viewing the top 5 Records
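Loading Dataset A and viewing its first records might look like the sketch below. The deck's own code for this step is not in the text, so this is an assumption about the approach: the CSV content is an in-memory stand-in built from the four sample rows shown in the deck, and in practice the argument to `pd.read_csv` would be the path to the real file.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file; rows come from the deck's sample.
csv_text = """Province,Police_Station,Crime_Category,Period_2015_2016
Eastern Cape,Aberdeen,All theft not mentioned elsewhere,51
Eastern Cape,Aberdeen,Theft out of or from motor vehicle,7
Eastern Cape,Aberdeen,Theft of motor vehicle and motorcycle,2
Eastern Cape,Aberdeen,Stock-theft,20
"""

dataset_a = pd.read_csv(io.StringIO(csv_text))
top_records = dataset_a.head()  # head() returns up to the first 5 rows
print(top_records)
```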
Dataset A: Reshaping the dataset
Long Format
Province      Police_Station  Crime_Category                         Period_2015_2016
Eastern Cape  Aberdeen        All theft not mentioned elsewhere      51
Eastern Cape  Aberdeen        Theft out of or from motor vehicle     7
Eastern Cape  Aberdeen        Theft of motor vehicle and motorcycle  2
Eastern Cape  Aberdeen        Stock-theft                            20

Wide Format
Province      Police_Station  All theft not mentioned elsewhere  Theft out of or from motor vehicle  Theft of motor vehicle and motorcycle  Stock-theft
Eastern Cape  Aberdeen        51                                 7                                   2                                      20
Dataset A: Reshaping (pivoting) the dataset from "long" to "wide" format
After pivoting, we need to flatten the data frame.
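One way to pivot from long to wide is `DataFrame.pivot_table`. The deck's actual code is not in the text, so the choice of `pivot_table` with a sum aggregation is an assumption; the rows are the Aberdeen sample from the long-format table.

```python
import pandas as pd

# Long format: one row per (station, crime category).
long_df = pd.DataFrame({
    "Province": ["Eastern Cape"] * 4,
    "Police_Station": ["Aberdeen"] * 4,
    "Crime_Category": [
        "All theft not mentioned elsewhere",
        "Theft out of or from motor vehicle",
        "Theft of motor vehicle and motorcycle",
        "Stock-theft",
    ],
    "Period_2015_2016": [51, 7, 2, 20],
})

# Wide format: one row per station, one column per crime category.
wide_df = long_df.pivot_table(
    index=["Province", "Police_Station"],
    columns="Crime_Category",
    values="Period_2015_2016",
    aggfunc="sum",
)
print(wide_df)
```

The result carries Province and Police_Station in a row MultiIndex rather than as ordinary columns, which is why a flattening step follows.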
Dataset A: Flattening the pivoted dataset
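Flattening can be sketched with `reset_index`, which turns the index levels back into ordinary columns. The small pivoted frame below is built by hand from the Aberdeen sample so the example stands on its own; whether the deck used exactly these calls is an assumption.

```python
import pandas as pd

# A pivoted frame: station identifiers live in the row MultiIndex.
pivoted = pd.DataFrame(
    {"Stock-theft": [20], "Theft of motor vehicle and motorcycle": [2]},
    index=pd.MultiIndex.from_tuples(
        [("Eastern Cape", "Aberdeen")], names=["Province", "Police_Station"]
    ),
)
pivoted.columns.name = "Crime_Category"  # left over from the pivot

flat = pivoted.reset_index()  # index levels become regular columns
flat.columns.name = None      # drop the leftover column-axis label
print(flat)
```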
Dataset A: Check the dataset for duplicates
This is an important check before merging this dataset with the other datasets.
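A duplicate check on the merge key could look like this sketch. The three rows are made up for illustration (the repeated Aberdeen row is deliberate), and checking on Police_Station alone is an assumption about which key matters for the later merge.

```python
import pandas as pd

# Toy data: the second Aberdeen row is a deliberate duplicate.
stations = pd.DataFrame({
    "Province": ["Eastern Cape", "Eastern Cape", "Gauteng"],
    "Police_Station": ["Aberdeen", "Aberdeen", "Dunnottar"],
})

# Flag every repeat of a station name after its first occurrence.
duplicate_mask = stations.duplicated(subset=["Police_Station"], keep="first")
n_duplicates = int(duplicate_mask.sum())

# Keep the first occurrence of each station.
deduplicated = stations.drop_duplicates(subset=["Police_Station"]).reset_index(drop=True)
print(n_duplicates, len(deduplicated))
```

Duplicated keys matter here because a merge repeats rows for every matching key, which would inflate the record count of the merged dataset.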
Dataset B - Police Station and the Population that they Cover
The dataset is in xlsx (MS Excel) format.
Viewing the top 5 Records
Dataset B: Viewing the attributes of the features
Dataset B: Check the dataset for duplicates
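Viewing feature attributes is commonly done with `DataFrame.info()` and `shape`. The sketch below assumes that is what the step refers to and uses a two-row stand-in for Dataset B, with the population figures taken from the sample table earlier in the deck.

```python
import pandas as pd

# Two-row stand-in for Dataset B.
dataset_b = pd.DataFrame({
    "Police_Station": ["Dunnottar", "Mmabatho"],
    "population_estimate": [10479, 134138],
})

dataset_b.info()                    # column names, non-null counts, dtypes
n_rows, n_cols = dataset_b.shape    # number of records and features
key_is_unique = dataset_b["Police_Station"].is_unique  # quick duplicate check on the key
print(n_rows, n_cols, key_is_unique)
```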
Dataset C - Police Station and their Geo-Coordinates
The dataset is in tsv (tab delimited) format.
Viewing the top 5 Records
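The three datasets arrive as csv, xlsx and tsv, and pandas has a reader for each. The sketch below is an illustration: `load_table` is a hypothetical helper, the file name `DatasetC.tsv` and the station row with zero coordinates are placeholders, and `read_excel` additionally needs an Excel engine such as openpyxl installed.

```python
import os
import tempfile

import pandas as pd


def load_table(path: str) -> pd.DataFrame:
    """Hypothetical helper: choose a pandas reader from the file extension."""
    extension = os.path.splitext(path)[1].lower()
    if extension == ".csv":
        return pd.read_csv(path)
    if extension == ".tsv":
        return pd.read_csv(path, sep="\t")  # tab-delimited text
    if extension in (".xlsx", ".xls"):
        return pd.read_excel(path)  # requires an Excel engine, e.g. openpyxl
    raise ValueError(f"Unsupported file type: {extension}")


# Demonstrate with a throwaway tab-delimited file (placeholder content).
with tempfile.TemporaryDirectory() as folder:
    tsv_path = os.path.join(folder, "DatasetC.tsv")
    with open(tsv_path, "w") as handle:
        handle.write("Police_Station\tLongitudeY\tLatitudeX\n")
        handle.write("Example Station\t0.0\t0.0\n")
    dataset_c = load_table(tsv_path)

print(dataset_c.head())
```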
Dataset C: Viewing the attributes of the features
Dataset C: Check the dataset for duplicates
Dataset A: Total Records = 1143
Features: Province, Police_Station, Crime_Category, Period_2015_2016
Dataset B: Total Records = 1140
Features: Police_Station, population_estimate
Dataset C: Total Records = 1142
Features: Police_Station, LongitudeY, LatitudeX
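Because the record counts differ, it can help to see which stations are missing from which dataset before merging. This sketch uses a set difference on the shared Police_Station feature; the station lists are toy values chosen for illustration, not the real contents of the datasets.

```python
import pandas as pd

# Toy station lists: Dunnottar is deliberately absent from the second one.
a_stations = pd.Series(["Aberdeen", "Dunnottar", "Mmabatho"], name="Police_Station")
b_stations = pd.Series(["Aberdeen", "Mmabatho"], name="Police_Station")

# Stations present in Dataset A but absent from Dataset B.
missing_from_b = sorted(set(a_stations) - set(b_stations))
print(missing_from_b)
```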
Datasets Merging
Dataset A (1143 records), Dataset B (1140 records) and Dataset C (1142 records) all contain the Police_Station feature.
Datasets Merging: Merging Dataset A & B
Note: Dataset A contains more records than Dataset B. Hence, Dataset A is the universal dataset.
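Treating Dataset A as the universal dataset corresponds to a left merge, which keeps every record of A whether or not B has a match. The deck's own merge code is not in the text, so the use of `how="left"` and `indicator=True` below is an assumption, and the toy frames deliberately leave Aberdeen out of Dataset B.

```python
import pandas as pd

dataset_a = pd.DataFrame({
    "Province": ["Eastern Cape", "Gauteng", "North West"],
    "Police_Station": ["Aberdeen", "Dunnottar", "Mmabatho"],
})
# Toy Dataset B: Aberdeen is deliberately missing.
dataset_b = pd.DataFrame({
    "Police_Station": ["Dunnottar", "Mmabatho"],
    "population_estimate": [10479, 134138],
})

# Left merge keeps all rows of Dataset A; indicator adds a _merge column.
dataset_a_b = dataset_a.merge(dataset_b, on="Police_Station", how="left", indicator=True)

# Stations of A that found no population record in B.
unmatched = dataset_a_b.loc[dataset_a_b["_merge"] == "left_only", "Police_Station"].tolist()
print(unmatched)
```

Unmatched stations end up with a missing population_estimate, which then has to be handled in the feature-cleaning stage.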
Datasets Merging: Merging Dataset A_B with Dataset C
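The second merge follows the same pattern, joining the combined A_B frame to the coordinates on Police_Station. This is a sketch under assumptions: the coordinates are zero placeholders rather than real locations, and `validate="one_to_one"` is an optional safeguard that makes pandas raise an error if a station name is repeated on either side.

```python
import pandas as pd

dataset_a_b = pd.DataFrame({
    "Police_Station": ["Dunnottar", "Mmabatho"],
    "population_estimate": [10479, 134138],
})
# Placeholder coordinates, not real station locations.
dataset_c = pd.DataFrame({
    "Police_Station": ["Dunnottar", "Mmabatho"],
    "LongitudeY": [0.0, 0.0],
    "LatitudeX": [0.0, 0.0],
})

merged = dataset_a_b.merge(
    dataset_c,
    on="Police_Station",
    how="left",
    validate="one_to_one",  # fail loudly if a key is duplicated
)
print(merged)
```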
