Seminar On
DATA MINING
Presenting
YOGESH WAGHODE
J.T.MAHAJAN COLLEGE OF
ENGINEERING, FAIZPUR
(2022-2023)
Guided by
Prof. K. S. PATIL
DEPARTMENT OF SECOND YEAR
(COMPUTER) ENGINEERING
CONTENT
• Data Mining
• Data Mining Definition
• Data Mining – Two Main Components
• Data Mining vs. Data Analysis
• What is (not) Data Mining?
• Related Fields
• Data Mining Process
• Major Data Mining Tasks
• Uses of Data Mining
• Sources of Data for Mining
• Challenges of Data Mining
• Advantages
• Conclusion
• Reference
DATA MINING
• New buzzword, old idea.
• Inferring new information from data that has already been collected.
• Traditionally the job of data analysts.
• Computers have changed this: it is far more efficient to comb through data
with a machine than to eyeball statistical data.
DATA MINING DEFINITION
Data mining is the non-trivial process of identifying
• valid
• novel
• potentially useful
• and ultimately understandable
patterns in data.
DATA MINING VS. DATA
ANALYSIS
• In terms of software and its marketing,
Data Mining = Data Analysis.
• Data Mining implies that the software applies some intelligence beyond
simple grouping and partitioning of data to infer new information.
• Data Analysis is more in line with standard statistical software
(e.g. web stats). Such tools usually present information about subsets and
relations within the recorded data set (browser/search engine usage,
average visit time, etc.).
WHAT IS (NOT) DATA MINING?
WHAT IS NOT DATA MINING?
• Looking up a phone number in a phone directory
• Querying a Web search engine for information about “Amazon”
WHAT IS DATA MINING?
• Discovering that certain names are more prevalent in certain US locations
(O’Brien, O’Rourke, O’Reilly… in the Boston area)
• Grouping together similar documents returned by a search engine according
to their context (e.g. Amazon rainforest, Amazon.com)
DATA MINING TECHNIQUES
• Classification
• Clustering
• Regression
• Association Rules
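As an illustration of one of the techniques listed above, the following is a minimal sketch of clustering with k-means on one-dimensional data. The function name `kmeans_1d`, the way the starting centers are picked, and the sample visit times are assumptions made for this example and do not come from the slides.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster numbers into k groups with the k-means procedure."""
    # Assumption: start from k values spread evenly over the sorted data.
    ordered = sorted(values)
    step = max(1, len(ordered) // k)
    centers = [ordered[min(i * step, len(ordered) - 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable, so stop early
        centers = new_centers
    return centers, clusters


# Hypothetical data: minutes spent on a site by eight visitors.
visit_minutes = [1, 2, 2, 3, 20, 22, 25, 24]
centers, clusters = kmeans_1d(visit_minutes, k=2)
print(centers)
print(clusters)
```

No class labels are given to the function; the two groups (short visits and long visits) emerge from the data alone, which is what separates clustering from classification.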
WHY MINE DATA? SCIENTIFIC VIEWPOINT
• Data collected and stored at enormous speeds (GB/hour)
  o remote sensors on a satellite
  o telescopes scanning the skies
  o microarrays generating gene expression data
  o scientific simulations generating terabytes of data
• Traditional techniques infeasible for raw data
• Data mining may help scientists
  o in classifying and segmenting data
  o in hypothesis formation
DATA MINING ARCHITECTURE
[Figure: architecture diagram]
RELATED FIELDS
[Figure: Data Mining and Knowledge Discovery shown in relation to Statistics, Machine Learning, Databases, and Visualization]
DATA MINING PROCESS
[Figure: process diagram with the labels Data Warehouse, Raw Data, Target Data, Transformed Data, Patterns and Rules, and Knowledge, and the steps Integration, Understanding, and Interpretation & Evaluation]
MAJOR DATA MINING TASKS
• Classification: predicting an item class
• Associations: e.g. A & B & C occur frequently
• Visualization: to facilitate human discovery
• Estimation: predicting a continuous value
• Deviation Detection: finding changes
• Link Analysis: finding relationships...
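The difference between classification (predicting a class) and estimation (predicting a continuous value) can be sketched in a few lines. This is only an illustration: the nearest-neighbour rule, the least-squares line, the function names, and the sample numbers are all assumptions chosen for the example, not methods named in the slides.

```python
def classify_nearest(sample, labelled):
    """Classification: return the label of the closest labelled example."""
    _, label = min(labelled, key=lambda pair: abs(pair[0] - sample))
    return label


def fit_line(xs, ys):
    """Estimation: least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical labelled data: (monthly purchases, customer segment).
labelled = [(2, "low"), (3, "low"), (40, "high"), (55, "high")]
label = classify_nearest(35, labelled)  # the output is a class

# Hypothetical numeric data that happens to follow y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
predicted = slope * 5 + intercept  # the output is a continuous value
print(label, slope, intercept, predicted)
```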
USES OF DATA MINING
• AI/Machine Learning
Combinatorial/Game Data Mining
Good for analyzing winning strategies in games, and thus for developing
intelligent AI opponents (e.g. chess).
• Business Strategies
Market Basket Analysis
Identify customer demographics, preferences, and purchasing patterns.
• Risk Analysis
Product Defect Analysis
Analyze product defect rates for given plants and predict possible
complications (read: lawsuits) down the line.
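Market basket analysis, mentioned above, is commonly expressed with association rules scored by support and confidence. The sketch below counts item pairs over a handful of made-up shopping baskets; the function name `pair_rules`, the 0.5 support threshold, and the basket contents are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Confidence of "x -> y": share of baskets with x that also contain y.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical transactions.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]
rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket that contains butter also contains bread, so the rule "butter -> bread" has confidence 1.0, while the reverse rule is weaker.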
USES OF DATA MINING (CONT..)
• User Behavior Validation
Fraud Detection
In the realm of cell phones, comparing phone activity with calling records
can help detect calls made on cloned phones.
Similarly, with credit cards, comparing new purchases with historical
purchases can detect activity on stolen cards.
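One simple way to compare new purchases with historical purchases is to flag amounts that lie far from the historical average. The sketch below uses a z-score threshold; the function name `flag_unusual`, the threshold of 3 standard deviations, and the amounts are assumptions for illustration, and real fraud detection systems use far richer features.

```python
from statistics import mean, stdev


def flag_unusual(history, new_amounts, threshold=3.0):
    """Return the new amounts lying more than `threshold` deviations from the mean."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation of past purchases
    if sigma == 0:
        # All past purchases were identical: anything different stands out.
        return [a for a in new_amounts if a != mu]
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]


# Hypothetical card history and three new purchases.
history = [20, 25, 22, 30, 18, 27, 24, 26]
flagged = flag_unusual(history, [23, 31, 400])
print(flagged)
```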
USES OF DATA MINING (CONT..)
• Health and Science
Protein Folding
Predicting protein interactions and functionality within
biological cells. Applications of this research include
determining causes and possible cures for Alzheimer's,
Parkinson's, and some cancers (caused by protein
"misfolds").
Extra-Terrestrial Intelligence
Scanning radio telescope data for possible transmissions
from other planets.
• For more information see Stanford’s Folding@home and
UC Berkeley’s SETI@home projects. Both involve participation in a
widely distributed computer application.
SOURCES OF DATA FOR MINING
• Databases (most obvious)
• Text Documents
• Computer Simulations
• Social Networks
ADVANTAGES OF DATA MINING
• Marketing / Retail
• Finance / Banking
• Manufacturing
• Governments
CHALLENGES OF DATA MINING
• Scalability
• Dimensionality
• Complex and Heterogeneous Data
• Data Quality
• Data Ownership and Distribution
• Privacy Preservation
• Streaming Data
CONCLUSION
• Comprehensive data warehouses that integrate operational
data with customer, supplier, and market information have
resulted in an explosion of information.
• Competition requires timely and sophisticated analysis on an
integrated view of the data.
• However, there is a growing gap between more powerful
storage and retrieval systems and the users’ ability to
effectively analyze and act on the information they contain.
Thank You
For Your Attention
