Unsupervised
Learning
Orozco Hsu
2023-11-21
About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data Innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data/ ML/ AIOT/ AI Columnist
Tutorial Content
What is unsupervised learning?
Getting started with unsupervised learning in
Orange3 (K-means and Association Rules)
Homework
Supervised learning vs. Unsupervised learning
• Supervised learning: Discover patterns in the data that relate data
attributes with a target (class) attribute.
• These patterns are then utilized to predict the values of the target attribute in
future data instances.
• Unsupervised learning: The data have no target attribute.
• We want to explore the data to find some intrinsic structures in them.
• Classic unsupervised learning algorithms
• Clustering algorithms (Inductive/ Transductive learning)
• Association rules (also called Market Basket Analysis)
2023 Supervised_Learning_Association_Rules
K-means
K-means
(Data observation: should we preprocess the data?)
Transformation
Transform data before K-means
• Many statistical tests make the assumption that datasets are normally
distributed.
• However, this is often NOT the case in practice.
• Transformations:
• Log Transformation: Transform the response variable from y to log(y).
• Square Root Transformation: Transform the response variable from y to y^(1/2).
• Cube Root Transformation: Transform the response variable from y to y^(1/3).
Log Transformation
Square Root Transformation
Cube Root Transformation
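The three transformations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data `y` is made up, and `skewness` is a helper written for this sketch (a plain population skewness), not something taken from the slides or from Orange3.

```python
import math

def skewness(values):
    """Population skewness: 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical right-skewed values (illustrative data only).
y = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]

log_y = [math.log(v) for v in y]     # log transformation, needs v > 0
sqrt_y = [math.sqrt(v) for v in y]   # square root transformation, needs v >= 0
cbrt_y = [v ** (1 / 3) for v in y]   # cube root transformation (v >= 0 here)

for name, data in [("raw", y), ("log", log_y), ("sqrt", sqrt_y), ("cube root", cbrt_y)]:
    print(f"{name:>9}: skewness = {skewness(data):.2f}")
```

On this sample every transformation lowers the skewness, and the log transformation lowers it the most. Other data sets may behave differently.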
Quiz 1:
• Why should we transform data?
• Answer 1: To avoid overfitting. OK, but what is overfitting?
Re-Scaling
Standardize Data
• Standardization (Z-scores) rescales a
dataset to have a mean of 0 and a
standard deviation of 1.
• We typically standardize data when we’d
like to know how many standard
deviations each value in a dataset lies
from the mean.
Normalize Data
• Normalization rescales a dataset so that
each value falls between 0 and 1.
• Typically we normalize data when
performing some type of analysis in
which we have multiple variables that
are measured on different scales and we
want each of the variables to have the
same range.
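A minimal sketch of both re-scaling methods, using only the Python standard library. The `income` and `age` values are hypothetical, and `standardize` and `normalize` are names chosen for this sketch.

```python
import statistics

def standardize(values):
    """Z-scores: subtract the mean, divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def normalize(values):
    """Min-max normalization: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Two hypothetical variables measured on very different scales.
income = [30000, 45000, 60000, 120000]
age = [22, 35, 47, 61]

z_income = standardize(income)
n_income = normalize(income)
n_age = normalize(age)

print("standardized income:", [round(v, 2) for v in z_income])
print("normalized income:  ", [round(v, 2) for v in n_income])
print("normalized age:     ", [round(v, 2) for v in n_age])
```

After normalization both variables share the range [0, 1], so neither dominates the distance computation in K-means simply because of its unit.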
Quiz 2:
• When running K-means, how should categorical variables be
handled?
• When running K-means on numerical variables with a severely
skewed distribution, how should we handle them?
• If we have segmented the re-scaled data into several groups, how do we
process new data and assign it to a group? (Using K-means)
• Answer 1: Union all the data and rebuild the model.
• Answer 2: ??
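One possible approach to the last question (not necessarily the answer the slides intend) is to keep the scaling parameters and the cluster centers from the original fit, re-scale each new record with those same parameters, and assign it to the nearest center. The sketch below assumes standardized data; `train_mean`, `train_std`, and `centroids` are hypothetical values standing in for whatever the original model produced.

```python
import math

# Hypothetical values saved from the original fit (assumptions for illustration).
train_mean = [40.0, 50000.0]  # per-feature mean of the training data (age, income)
train_std = [10.0, 20000.0]   # per-feature standard deviation of the training data
centroids = [                 # cluster centers, expressed in the re-scaled space
    [-1.0, -1.0],
    [0.0, 0.5],
    [1.5, 1.0],
]

def rescale(record):
    """Apply the SAME scaling that was used when the clusters were built."""
    return [(x - m) / s for x, m, s in zip(record, train_mean, train_std)]

def assign_cluster(record):
    """Return the index of the nearest centroid (Euclidean distance)."""
    scaled = rescale(record)
    distances = [math.dist(scaled, c) for c in centroids]
    return distances.index(min(distances))

new_customer = [52.0, 75000.0]  # age, income of a new record
print("assigned cluster:", assign_cluster(new_customer))
```

The key design choice is that the new record is never used to recompute the mean, the standard deviation, or the centers, so existing group assignments stay unchanged.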
Example of Clustering analysis
Example of Cluster Analysis
• Retail Marketing
• The company can then send personalized advertisements or sales letters to
each household based on how likely they are to respond to specific types
of advertisements.
Example of Cluster Analysis
• Streaming Services
• Using these metrics, a streaming service can perform cluster analysis
to identify high usage and low usage users so that they can know who
they should spend most of their advertising dollars on.
Example of Cluster Analysis
• Sports Science
• They can then feed these variables into a clustering algorithm to
identify players that are similar to each other so that they can have
these players practice with each other and perform specific drills
based on their strengths and weaknesses.
Example of Cluster Analysis
• Email Marketing
• Using these metrics, a business can perform
cluster analysis to identify consumers who use
email in similar ways and tailor the types of
emails and frequency of emails they send to
different clusters of customers.
https://email.uplers.com/blog/email-segmentation-recipe-great-email-marketing/
Example of Cluster Analysis
• Health Insurance
• An actuary can then feed these variables into a clustering algorithm
to identify households that are similar. The health insurance company
can then set monthly premiums based on how often they expect
households in specific clusters to use their insurance.
Association Rules
Association Rules
• In a transaction database with a large amount of data, look
for correlations between items.
• The classic story of Walmart diapers and beer.
• Selling these two seemingly unrelated products together can actually increase
sales.
In general, such correlations cannot be found by direct observation; they have to be discovered by algorithms.
Association Rules
• Mining association rules takes two steps.
• First, obtain the frequent item sets!
• A frequent item set is a collection of items that often appear together.
• The Apriori algorithm is used for this step.
• Second, generate association rules from the frequent item sets!
• Frequent item sets may contain strong correlations.
• A rule must meet thresholds such as Min Support and Min Confidence.
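The two steps can be sketched as follows. This is a small Apriori-style illustration in plain Python, not the implementation Orange3 uses; the four transactions are hypothetical, and `frequent_itemsets` and `generate_rules` are names chosen for this sketch.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Step 1: return {itemset: support} for every item set meeting min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose support reaches the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-item sets into (k+1)-item candidates, then prune any
        # candidate that has an infrequent k-item subset (the Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent

def generate_rules(frequent, min_confidence):
    """Step 2: split each frequent item set into antecedent -> consequent rules."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules

# Hypothetical transactions (illustrative data only).
transactions = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
frequent = frequent_itemsets(transactions, min_support=0.5)
rules = generate_rules(frequent, min_confidence=0.7)

for antecedent, consequent, support, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={support:.2f} confidence={confidence:.2f}")
```

With these four transactions and a minimum support of 50%, {B, C, E} comes out as a frequent item set, while item D is dropped in the first pass.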
Association Rules
• From the sales database, we find that the items {B, C, E} often appear
together. Such a set is called a frequent item set.
• How likely {B, E} are to be purchased together is
called the strength of association.
• To measure how strong an association is, we compute Support and
Confidence.
Association Rules
• Support
• If the transaction data has 200 records in total, and the item Sausage
appears in 50 of them, then its Support is 50/200 = 1/4, that is, the
support of sausage is 25%.
• Confidence range: [0, 1].
• Indicates the conditional probability that two items appear in the
same transaction. Simply put, Confidence(A -> B) is the probability of item B appearing
when item A has already appeared.
Confidence(A -> B) = P(B|A) = P(A ∩ B) / P(A)
• P(B|A): The probability that B will occur
given that A occurs
• P(A ∩ B) or P(A, B) or P(AB): The
probability that the two events will occur together
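A short worked example of both measures, using the standard definition Confidence(A -> B) = P(A ∩ B) / P(A). The sausage counts follow the 50/200 example; the `bread` and `both` counts are hypothetical numbers added for illustration.

```python
def support(count_itemset, total):
    """Share of all transactions that contain the item set."""
    return count_itemset / total

def confidence(count_both, count_antecedent):
    """Share of the antecedent's transactions that also contain the consequent."""
    return count_both / count_antecedent

total = 200    # all transactions
sausage = 50   # transactions containing sausage
bread = 80     # hypothetical: transactions containing bread
both = 40      # hypothetical: transactions containing sausage AND bread

print("support(sausage)            =", support(sausage, total))
print("confidence(sausage -> bread) =", confidence(both, sausage))
print("confidence(bread -> sausage) =", confidence(both, bread))
```

The two confidence values differ because the denominator changes with the antecedent, which is why a rule cannot simply be read in reverse.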
Association Rules
• Min Support and Min Confidence:
• For example, if we set Min Support to 50%, the purchased
product set {A, B} must appear in at least 50% of all transactions before it is
considered a frequent item set.
If Min Support/ Min Confidence is set too low, too many
association rules will appear in the results.
If it is set too high, there will be too few association rules,
which makes it hard to base decisions on
the association results.
Association Rules
• Outputs:
• Many rules are generated; we sort them by Support or
Confidence to find the ones that interest us.
Example of Association Rules
Association Rules (Item Bundle Sales)
• Mixed Bundling
Association Rules (Item Bundle Sales)
• Cross industry bundling
• Gund Teddy Bear and Amazon.com Gift Cards (Bundle)
Workflows
Have a look at the Food dataset
Metric       Description
Support      how often a rule is applicable to a given data set (rule/data)
Confidence   how frequently items in Y appear in transactions with X, or in other words how frequently the rule is true (support for a rule/support of antecedent)
Coverage     how often the antecedent item is found in the data set (support of antecedent/data)
Strength     (support of consequent/support of antecedent)
Lift         how frequently a rule is true per consequent item (data * confidence/support of consequent)
Leverage     the difference between the two items appearing together in a transaction and the two items appearing independently ((support * data - antecedent support * consequent support)/data^2)
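The formulas in the table can be checked with a small calculation. The sketch below reads "support" in the formulas as a raw transaction count and "data" as the total number of transactions, which is an assumption about the table's notation; the counts passed in are hypothetical.

```python
def rule_metrics(n, n_antecedent, n_consequent, n_both):
    """Compute the six metrics for a rule X -> Y from raw transaction counts.

    n            : total number of transactions ("data")
    n_antecedent : transactions containing X
    n_consequent : transactions containing Y
    n_both       : transactions containing X and Y (the rule)
    """
    confidence = n_both / n_antecedent
    return {
        "support": n_both / n,
        "confidence": confidence,
        "coverage": n_antecedent / n,
        "strength": n_consequent / n_antecedent,
        "lift": n * confidence / n_consequent,
        "leverage": (n_both * n - n_antecedent * n_consequent) / n ** 2,
    }

# Hypothetical counts: 200 transactions, X in 50, Y in 80, both in 40.
metrics = rule_metrics(n=200, n_antecedent=50, n_consequent=80, n_both=40)
for name, value in metrics.items():
    print(f"{name:>10}: {value:.3f}")
```

A lift above 1 and a leverage above 0 both indicate that X and Y appear together more often than they would if they were independent.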
Association Rules analysis
A logical step would be to place Wine closer to the (Nuts, Aspirin, Pancakes) section.
A rule holds only when read from the Antecedent on the left to the Consequent on the right, NOT in reverse!
Association Rules analysis
• If we are running a promotion for Wine, which products should we
emphasize?
Homework
• Convert the file (20231121_hw.csv) to a format compatible
with Orange 3 Association Rules.
• Please identify any interesting association rules
you have discovered.
The first row contains all the item names. Go
over each purchase and mark purchased items with the
value 1; otherwise mark them as ? (not 0)
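The target layout described above can be sketched as follows. The layout of 20231121_hw.csv is not shown in the slides, so this sketch assumes each input row lists the items of one purchase separated by commas; the sample data and the helper name `baskets_to_indicator_rows` are hypothetical.

```python
import csv
import io

def baskets_to_indicator_rows(baskets):
    """Build a header row of item names plus one row of 1/? marks per purchase."""
    items = sorted({item for basket in baskets for item in basket})
    rows = [items]
    for basket in baskets:
        purchased = set(basket)
        # Mark purchased items with 1 and everything else with ? (not 0).
        rows.append(["1" if item in purchased else "?" for item in items])
    return rows

# Hypothetical input: one purchase per line, items separated by commas.
raw = "Wine,Nuts\nBeer,Diapers,Nuts\nWine\n"
baskets = [
    [cell.strip() for cell in row if cell.strip()]
    for row in csv.reader(io.StringIO(raw))
]

rows = baskets_to_indicator_rows(baskets)
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue())
```

To process a real file, the in-memory `raw` string would be replaced by the opened CSV file, after checking that its layout matches the assumption above.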
