202312 Exploration Data Analysis Visualization (English version)

EDA Visualization
before building a model
Orozco Hsu
2023-10-31

About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data / ML / AIoT / AI columnist

Tutorial Content
• Iris dataset summary
• EDA and visualization
• Homework

Code
• Download materials:
• https://drive.google.com/drive/folders/1Kaneenrtd2P2IWbo-PhMd3b6NvtT5FOc?usp=sharing

Table
• Load dataset
• iris.tab
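
The loading step can be sketched in plain Python. This is a minimal sketch, not the tool used in the slides: it assumes the file is tab-separated with three header rows (column names, column types, and flags such as "class"), as in the Orange data-mining .tab layout. The function name `load_tab` and the inline `SAMPLE` text are hypothetical stand-ins so the example runs without the real file.

```python
import csv
import io
from collections import Counter

def load_tab(handle):
    """Read a tab-separated table with three header rows into a list of dicts.

    Assumed layout: row 1 = column names, row 2 = column types
    ("c"/"continuous" for numeric, anything else kept as text),
    row 3 = flags such as "class" (ignored in this sketch).
    """
    reader = csv.reader(handle, delimiter="\t")
    names = next(reader)
    types = next(reader)
    next(reader)  # skip the flags row
    rows = []
    for record in reader:
        if not record:
            continue  # skip blank lines
        row = {}
        for name, kind, value in zip(names, types, record):
            if kind in ("c", "continuous"):
                # Treat empty or "?" cells as missing values
                row[name] = float(value) if value not in ("", "?") else None
            else:
                row[name] = value
        rows.append(row)
    return rows

# Inline stand-in for the file: one row from each iris class
SAMPLE = (
    "sepal length\tsepal width\tpetal length\tpetal width\tiris\n"
    "c\tc\tc\tc\td\n"
    "\t\t\t\tclass\n"
    "5.1\t3.5\t1.4\t0.2\tIris-setosa\n"
    "7.0\t3.2\t4.7\t1.4\tIris-versicolor\n"
    "6.3\t3.3\t6.0\t2.5\tIris-virginica\n"
)

rows = load_tab(io.StringIO(SAMPLE))
class_counts = Counter(row["iris"] for row in rows)
print(len(rows), "rows;", dict(class_counts))
```

To use it on the downloaded file, pass an open file handle instead of the sample, for example `load_tab(open("iris.tab", newline=""))`.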

The most recommended dataset
• The independent variables are numerical and the dependent variable is categorical
• Such a dataset also has the advantage of being easy to cluster
• The preferred dependent variable is binary, meaning it is either 'YES' or 'NO'
• When there are more than two independent variables, the accuracy will decrease
• The most commonly used algorithm is logistic regression
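
To make the setup above concrete, here is a minimal sketch of logistic regression on numeric independent variables with a binary 'YES'/'NO' dependent variable. It is a plain-Python illustration trained by batch gradient descent, not the implementation used in the tutorial. The toy data, the learning rate, and the names `fit_logistic` and `predict` are all assumptions made for this example.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one row: p(YES) minus the 0/1 label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 'YES' when the modelled probability is at least 0.5."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return "YES" if p >= 0.5 else "NO"

# Made-up toy data: two numeric features, binary target (1 = YES, 0 = NO)
X = [[1.0, 1.2], [1.5, 0.8], [1.2, 1.0], [3.0, 3.2], [3.5, 2.9], [3.2, 3.1]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
labels = [predict(w, b, x) for x in X]
print(labels)
```

The two toy classes are well separated, so the fitted model labels every training row correctly. Real data will rarely be this clean.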

Homework
• Change the dataset to one with a numeric target feature
• housing.tab
• Please explain the resulting data visualizations
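
One possible starting point for a numeric target, offered only as a sketch, is to measure how strongly each numeric feature moves with the target before plotting it. The example below computes the Pearson correlation in plain Python. The column names `rooms` and `price` and their values are hypothetical toy data, not taken from housing.tab.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical toy columns: one numeric feature and a numeric target
rooms = [4.0, 5.0, 6.0, 7.0, 8.0]
price = [15.0, 18.0, 22.0, 30.0, 38.0]

r = pearson(rooms, price)
print(round(r, 3))
```

A value near +1 or -1 suggests a scatter plot of that feature against the target will show a clear trend. A value near 0 suggests no linear trend, although a non-linear pattern may still be visible in the plot.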
