Project II
Data Mining a
Mushroom Dataset
Group 1
Raymond Borges
Jarilyn Hernandez
The Mushroom Dataset
Data Set Characteristics:    Multivariate   Number of Instances:    8124   Area:           Life
Attribute Characteristics:   Categorical    Number of Attributes:   22     Date Donated:   1987

This data set includes descriptions of hypothetical samples
corresponding to 23 species of gilled mushrooms in the
Agaricus and Lepiota Family.

Each species is identified as definitely edible, definitely
poisonous, or of unknown edibility and not recommended.
This latter class was combined with the poisonous one.
Mushroom Dataset
• 22 independent attributes
• 1 class attribute (Can you eat it?)
Edible (4,208) 51.8%
Poisonous (3,916) 48.2%
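
The class shares above follow directly from the two counts on the slide. The short sketch below redoes that arithmetic; the helper name `share` is only illustrative. It also shows why the split matters: a classifier that always answers "edible" would be right only 51.8% of the time, which is the baseline any rule has to beat.

```python
# Class counts as given on the slide.
EDIBLE = 4208
POISONOUS = 3916
TOTAL = EDIBLE + POISONOUS  # 8124 instances


def share(count, total=TOTAL):
    """Percentage of the dataset covered by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    print("edible:", share(EDIBLE), "%")
    print("poisonous:", share(POISONOUS), "%")
    # Always guessing the majority class gives the majority share as accuracy.
    print("majority-class baseline:", max(share(EDIBLE), share(POISONOUS)), "%")
```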
Mushroom Dataset
22 Attributes Total
18 Intrinsically on Mushroom
4 Others
  1 Habitat
  1 Population
  1 Bruises
  1 Odor
Odor attribute, 1R Learner
The Simplest Rule: 98.52% accuracy
A = almond             N = none
C = creosote           P = pungent
F = foul               S = spicy
L = anise              Y = fishy
M = musty
[Chart: odor values a, c, f, l, m, n, p, s, y]
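
A 1R learner builds one rule per attribute (each value predicts its majority class) and keeps the attribute whose rule makes the fewest errors. The sketch below is a minimal version of that idea, not the Weka implementation the slides presumably used. The six toy rows are hypothetical and exist only to make the example runnable; they are not taken from the dataset.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, target):
    """Return (attribute, rule, errors) for the single best attribute.

    `rule` maps each value of the chosen attribute to its majority class.
    """
    best = None
    for attr in attributes:
        # Count class labels for every value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # Each value predicts its most frequent class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are the rows that do not belong to the predicted class.
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical toy rows, for illustration only.
TOY_ROWS = [
    {"odor": "almond", "habitat": "woods", "class": "edible"},
    {"odor": "foul", "habitat": "woods", "class": "poisonous"},
    {"odor": "none", "habitat": "paths", "class": "edible"},
    {"odor": "foul", "habitat": "paths", "class": "poisonous"},
    {"odor": "none", "habitat": "woods", "class": "edible"},
    {"odor": "pungent", "habitat": "paths", "class": "poisonous"},
]

if __name__ == "__main__":
    attr, rule, errors = one_r(TOY_ROWS, ["odor", "habitat"], "class")
    print(attr, rule, errors)
```

On the toy rows, odor separates the classes without error while habitat misclassifies two rows, so odor is selected.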
J48 Tree: 100% Classification
E = Edible, P = Poisonous

odor
  almond: E
  creosote: P
  foul: P
  anise: E
  musty: P
  none: spore-print-color
    black: E
    brown: E
    buff: E
    chocolate: E
    green: P
    orange: E
    purple: E
    white: gill-size
      narrow: gill-spacing
        close: P
        crowded: population
          abundant: E
          clustered: P
          numerous: E
          scattered: E
          several: E
          solitary: E
        distant: E
      broad: E
    yellow: E
  pungent: P
  spicy: P
  fishy: P
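
The tree on this slide can be written down as nested data and walked with a few lines of code. The sketch below is one way to do that. The slide labels only the branch values, so the attribute names below odor (spore-print-color, gill-size, gill-spacing, population) are assumptions inferred from those values, and `TREE` and `classify` are illustrative names.

```python
# Each inner node is (attribute, {value: subtree or leaf}); leaves are "E" or "P".
# Attribute names below "odor" are inferred from the branch values on the slide.
TREE = ("odor", {
    "almond": "E", "creosote": "P", "foul": "P", "anise": "E", "musty": "P",
    "pungent": "P", "spicy": "P", "fishy": "P",
    "none": ("spore-print-color", {
        "black": "E", "brown": "E", "buff": "E", "chocolate": "E",
        "green": "P", "orange": "E", "purple": "E", "yellow": "E",
        "white": ("gill-size", {
            "broad": "E",
            "narrow": ("gill-spacing", {
                "close": "P",
                "distant": "E",
                "crowded": ("population", {
                    "abundant": "E", "clustered": "P", "numerous": "E",
                    "scattered": "E", "several": "E", "solitary": "E",
                }),
            }),
        }),
    }),
})


def classify(node, mushroom):
    """Follow the branches matching `mushroom` until a leaf label is reached."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[mushroom[attribute]]
    return node


if __name__ == "__main__":
    print(classify(TREE, {"odor": "foul"}))
    print(classify(TREE, {"odor": "none", "spore-print-color": "brown"}))
```

Only the attributes on the path actually taken need to be present in the input, which is why a smelly mushroom is classified from odor alone.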
Simplest rule-set (Benchmark)
A mushroom is poisonous if any of these rules matches.
Accuracy is cumulative: each figure covers that rule plus the ones before it.

1. odor is not almond, anise, or none
(120 poisonous cases missed, 98.52% accuracy)

2. spore-print-color = green
(48 poisonous cases missed, 99.41% accuracy)

3. odor = none and stalk-surface-below-ring = scaly
   and stalk-color-above-ring is not brown
(8 poisonous cases missed, 99.90% accuracy)

4. habitat = leaves and cap-color = white
   (alternatively: population = clustered and cap-color = white)
(100% accuracy)
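
The rule-set reads naturally as a chain of checks, and the quoted accuracies can be reproduced from the missed-case counts and the 8124 instances. The sketch below assumes a mushroom is a plain dictionary keyed by attribute name with spelled-out values. That representation, and the names `is_poisonous` and `accuracy_from_missed`, are illustrative choices rather than anything from the slides.

```python
TOTAL_INSTANCES = 8124  # dataset size given on the dataset slide


def is_poisonous(m):
    """Apply the four benchmark rules in order; any match means poisonous."""
    if m["odor"] not in {"almond", "anise", "none"}:
        return True  # rule 1
    if m["spore-print-color"] == "green":
        return True  # rule 2
    if (m["odor"] == "none"
            and m["stalk-surface-below-ring"] == "scaly"
            and m["stalk-color-above-ring"] != "brown"):
        return True  # rule 3
    if m["habitat"] == "leaves" and m["cap-color"] == "white":
        return True  # rule 4
    return False


def accuracy_from_missed(missed, total=TOTAL_INSTANCES):
    """Accuracy in percent when `missed` poisonous cases slip through."""
    return round(100 * (1 - missed / total), 2)


if __name__ == "__main__":
    for missed in (120, 48, 8, 0):
        print(missed, "missed ->", accuracy_from_missed(missed), "%")
```

With 120, 48, and 8 missed cases this gives 98.52%, 99.41%, and 99.90%, matching the slide.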
Habitat Insights
Waste is safe but stay away from paths
[Chart by habitat: Woods, Grasses, Leaves, Meadows, Paths, Urban, Waste]
Population Insights
Mushrooms travel safer in groups
[Chart by population: Abundant, Clustered, Numerous, Scattered, Several, Solitary]
Information → Knowledge

[Chart, left: Population Data; x-axis: Abundant, Clustered, Numerous, Scattered, Several, Solitary]
[Chart, right: %Rates vs. Mushrooms; y-axis: 0% to 120%; series: % Poisonous, % Edible]
Poisonous/Edible Ratio
vs. Mushroom Population Density
[Scatter plot: x-axis Mushroom Density (0 to 7); y-axis Poisonous/Edible Ratio (-50% to 300%); labeled points: solitary, several, scattered, numerous, clustered, abundant]
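
The ratio plotted here is the number of poisonous mushrooms divided by the number of edible ones for each population value, expressed as a percentage. The sketch below shows that calculation. The counts in `TOY_COUNTS` are hypothetical placeholders chosen for easy arithmetic, not figures from the dataset, and the function name is illustrative.

```python
def poisonous_edible_ratio(counts):
    """Map each population value to poisonous/edible as a percentage.

    `counts` maps a value to a (poisonous, edible) pair. The ratio is
    undefined when there are no edible samples, so None is returned then.
    """
    ratios = {}
    for value, (poisonous, edible) in counts.items():
        if edible == 0:
            ratios[value] = None
        else:
            ratios[value] = round(100 * poisonous / edible, 2)
    return ratios


# Hypothetical counts, for illustration only: (poisonous, edible).
TOY_COUNTS = {
    "solitary": (30, 60),
    "several": (50, 20),
    "abundant": (0, 40),
}

if __name__ == "__main__":
    for value, ratio in poisonous_edible_ratio(TOY_COUNTS).items():
        print(value, ratio)
```

A ratio above 100% means poisonous samples outnumber edible ones for that population value, and 0% means no poisonous samples were recorded.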
Conclusions
• If it stinks, don’t eat it: 98.52% accuracy

• If it doesn’t stink and its spore-print color is not
  green, then you have a 99.41% chance of
  survival

• Odor and spore-print color may be the best
  attributes statistically, but not in the field
Future Work
• Use more easily identified attributes to classify
  mushrooms, to produce a method of easier
  visual classification

• Eliminate nonvisual attributes
  Focus on visual-cue attributes, e.g.
  habitat, population, cap and stalk

• Compare the two methods
