EDA Visualization Before Building a Model
Orozco Hsu
2023-10-31
About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data / ML / AIoT / AI columnist
Tutorial Content
Iris dataset summary
EDA and visualization
Homework
Code
• Download materials:
• https://drive.google.com/drive/folders/1Kaneenrtd2P2IWbo-PhMd3b6NvtT5FOc?usp=sharing
Table
• Load dataset
• Iris.tab
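For readers who want to inspect the table outside the GUI, the loading step can be sketched in plain Python. This is only a sketch under stated assumptions: it assumes the file follows the Orange-style tab-delimited layout with three header lines (names, types, roles), the header text in SAMPLE_TAB is a guess at what the course file contains, and load_tab is a hypothetical helper name. The three data rows are the first row of each class in the well-known Iris dataset.

```python
import csv
import io

# Assumed layout: three header lines (names, types, roles), then data rows.
# The header text below is an assumption about the course file, not a copy of it.
SAMPLE_TAB = (
    "sepal length\tsepal width\tpetal length\tpetal width\tiris\n"
    "c\tc\tc\tc\td\n"
    "\t\t\t\tclass\n"
    "5.1\t3.5\t1.4\t0.2\tIris-setosa\n"
    "7.0\t3.2\t4.7\t1.4\tIris-versicolor\n"
    "6.3\t3.3\t6.0\t2.5\tIris-virginica\n"
)


def load_tab(handle):
    """Read an Orange-style .tab stream into (feature_names, target_name, rows)."""
    reader = csv.reader(handle, delimiter="\t")
    names = next(reader)   # line 1: column names
    types = next(reader)   # line 2: c = continuous, d = discrete
    roles = next(reader)   # line 3: "class" marks the target column
    target_idx = roles.index("class")
    rows = []
    for record in reader:
        if not record:     # skip blank lines
            continue
        row = {}
        for name, kind, value in zip(names, types, record):
            # Continuous columns become floats; everything else stays text.
            row[name] = float(value) if kind in ("c", "continuous") else value
        rows.append(row)
    features = [n for i, n in enumerate(names) if i != target_idx]
    return features, names[target_idx], rows


features, target, rows = load_tab(io.StringIO(SAMPLE_TAB))
print("features:", features)
print("target:", target, "->", [r[target] for r in rows])
```

To read a real file instead of the inline sample, pass an open file handle, for example open("Iris.tab", newline="", encoding="utf-8"), in place of the io.StringIO object.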
The most recommended type of dataset
• The independent variables are numerical and the dependent variable is categorical
• Such a dataset is also easy to cluster
• The dependent variable is preferably binary, meaning it is either 'YES' or 'NO'
• When there are more than two independent variables, accuracy tends to decrease
• The most commonly used algorithm is logistic regression
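The kind of dataset described above (numerical inputs, binary target) can be illustrated with a small from-scratch logistic regression. This is a minimal sketch, not the implementation used by any particular tool: the function names, learning rate, epoch count, and the mean-centering step are illustrative choices, and the ten rows are the first five setosa and first five versicolor rows of the Iris dataset (petal length, petal width).

```python
import math

# (petal length, petal width) for the first five setosa and versicolor rows.
X = [(1.4, 0.2), (1.4, 0.2), (1.3, 0.2), (1.5, 0.2), (1.4, 0.2),
     (4.7, 1.4), (4.5, 1.5), (4.9, 1.5), (4.0, 1.3), (4.6, 1.5)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 0 = setosa, 1 = versicolor


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log loss, with mean-centered inputs."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(Xc, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b, means


def predict(model, row):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    w, b, means = model
    z = sum(wj * (xj - m) for wj, xj, m in zip(w, row, means)) + b
    return 1 if z >= 0 else 0


model = fit_logistic(X, y)
accuracy = sum(predict(model, row) == label for row, label in zip(X, y)) / len(y)
print("training accuracy:", accuracy)
```

The accuracy printed here is measured on the training rows themselves, so it says nothing about how the model generalizes; it only shows that a binary target with numerical inputs is straightforward to fit.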
Table pivot
Rank
Correlations
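The idea behind the Correlations step can be sketched as a pairwise Pearson correlation over the numerical features. This is an illustrative sketch only: the pearson helper is a hypothetical name, Pearson is assumed as the correlation measure, and the six rows are the first two rows of each class in the Iris dataset, far too few for real conclusions.

```python
import math
from itertools import combinations

COLUMNS = ["sepal length", "sepal width", "petal length", "petal width"]
ROWS = [
    (5.1, 3.5, 1.4, 0.2),  # setosa
    (4.9, 3.0, 1.4, 0.2),  # setosa
    (7.0, 3.2, 4.7, 1.4),  # versicolor
    (6.4, 3.2, 4.5, 1.5),  # versicolor
    (6.3, 3.3, 6.0, 2.5),  # virginica
    (5.8, 2.7, 5.1, 1.9),  # virginica
]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (v - mean_b) for x, v in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Turn the row tuples into one list of values per column.
by_column = {name: [row[i] for row in ROWS] for i, name in enumerate(COLUMNS)}

# Compute every feature pair once, then list the strongest relationships first.
pairs = [(pearson(by_column[a], by_column[b]), a, b)
         for a, b in combinations(COLUMNS, 2)]
for r, a, b in sorted(pairs, key=lambda p: abs(p[0]), reverse=True):
    print(f"{r:+.3f}  {a} vs {b}")
```

Sorting by absolute value keeps strong negative correlations near the top of the list alongside strong positive ones.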
Scatter plot
Distribution
Box plot
Feature Statistics
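A per-feature summary of the kind this step displays can be approximated with Python's statistics module. The sketch below is an assumption about which statistics are of interest (mean, median, minimum, maximum, sample standard deviation); the summarize helper is a hypothetical name, and the six rows are the first two rows of each class in the Iris dataset.

```python
import statistics

COLUMNS = ["sepal length", "sepal width", "petal length", "petal width"]
ROWS = [
    (5.1, 3.5, 1.4, 0.2),  # setosa
    (4.9, 3.0, 1.4, 0.2),  # setosa
    (7.0, 3.2, 4.7, 1.4),  # versicolor
    (6.4, 3.2, 4.5, 1.5),  # versicolor
    (6.3, 3.3, 6.0, 2.5),  # virginica
    (5.8, 2.7, 5.1, 1.9),  # virginica
]


def summarize(values):
    """Return basic descriptive statistics for one numerical feature."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# One summary dictionary per column, keyed by column name.
summary = {name: summarize([row[i] for row in ROWS])
           for i, name in enumerate(COLUMNS)}

for name, stats in summary.items():
    line = "  ".join(f"{key}={value:.2f}" for key, value in stats.items())
    print(f"{name:<13} {line}")
```

Comparing the spread of each feature in this way is a quick check on which columns vary most before moving on to plots.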
Homework
• Repeat the analysis on a dataset whose target feature is numeric
• Explain what each data visualization shows
• Dataset: housing.tab

202312 Exploration Data Analysis Visualization (English version)