Data Mining Input: Concepts, Instances, and Attributes
Input takes the following forms:
Concept: The thing that is to be learned is called the concept. A concept should be:
Intelligible in that it can be understood
Operational in that it can be applied to actual examples
Instances: The data consists of various instances of the class. E.g. a table with two rows holds 2 instances
Attributes: Each instance of the class has various attributes. E.g. a table with the columns {Name, Age} has two attributes
Types of learning in data mining
Classification learning:
The learning scheme is presented with a set of classified examples from which it is expected to learn a way of classifying unseen examples
Also called supervised learning
E.g. classification rules for the weather forecasting problem:
      If outlook = sunny and humidity = high then play = no
      If outlook = rainy and windy = true then play = no
      If outlook = overcast then play = yes
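The three weather rules above can be written as a small function. This is only a sketch: the name `classify_play` and the choice to return `None` when no rule fires are assumptions, since the slide lists no default rule.

```python
from typing import Optional


def classify_play(outlook: str, humidity: str, windy: bool) -> Optional[str]:
    """Apply the three slide rules in order; return None if none of them fires."""
    if outlook == "sunny" and humidity == "high":
        return "no"
    if outlook == "rainy" and windy:
        return "no"
    if outlook == "overcast":
        return "yes"
    # Assumption: the slide gives no rule for the remaining cases.
    return None


# An unseen example is classified by the first rule that matches it.
print(classify_play("sunny", "high", False))
print(classify_play("overcast", "normal", True))
```

A learned rule set would normally also need a default class for instances that match no rule; that choice is left open here.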
Numeric prediction
Same as classification learning, but the outcome to be predicted is not a discrete class but a numeric quantity
Clustering
Groups of examples that belong together are sought and grouped into clusters
E.g. a bank's customer data can be clustered on the relation between debt and income
Association rules
Any association among features is sought, not just ones that predict a particular class value
It predicts any attribute, not just the class
It can predict more than one attribute value at a time
E.g. from supermarket data it can be concluded: If milk and bread are bought, customers also buy butter
A few important terms…
Concept description: Output produced by a learning scheme
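The milk, bread and butter rule above can be checked against transaction data by counting baskets. The sketch below uses support and confidence, two standard measures for association rules that the slides do not name. The baskets are hypothetical and `rule_stats` is an illustrative helper, not part of WEKA.

```python
from typing import FrozenSet, List, Tuple


def rule_stats(
    baskets: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n_antecedent = sum(1 for b in baskets if antecedent <= b)
    n_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    support = n_both / len(baskets)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical supermarket baskets, for illustration only.
baskets = [
    frozenset({"milk", "bread", "butter"}),
    frozenset({"milk", "bread", "butter", "eggs"}),
    frozenset({"milk", "bread"}),
    frozenset({"bread", "jam"}),
]

support, confidence = rule_stats(
    baskets, frozenset({"milk", "bread"}), frozenset({"butter"})
)
print(support, confidence)
```

Here the rule holds in two of the three baskets that contain milk and bread, so a rule learner would report it only if that confidence passes whatever threshold is chosen.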

WEKA: Data Mining Input Concepts Instances And Attributes

  • 19. Flat file: Each dataset is represented as a matrix of instances versus attributes, which in database terms is a single relation, or a flat file
  • 20. Closed world assumption: The idea of specifying only positive examples and adopting a standing assumption that the rest are negative is called the closed world assumption.
       Steps to prepare data:
       1. Data assembly and aggregation
       2. Data integration
       3. Data cleaning
       4. General preparation
  • 21. Data assembly and aggregation: Instances in the input should be independent
  • 22. Independence can be achieved by de-normalization
  • 23. In database terms, take two relations and join them together to make one, a process of flattening that is technically called de-normalization
  • 24. De-normalization is possible with any finite set of finite relations. Example: the input is a family tree
  • 25. We are trying to find the ‘sister of’ relationship. Each row of the tree is mapped to an instance. On their own, these rows do not express the concept we want to learn. Therefore…
  • 26. We de-normalize these tables into a single table, in which the ‘sister of’ relationship can be seen clearly
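The flattening step can be sketched in plain Python as a join of two relations. The people, their parents and the `denormalize` helper are all hypothetical; the tables from the original slides are not reproduced here.

```python
from typing import Dict, List, Tuple

# Relation 1 (hypothetical): one row per person in the family tree.
people: Dict[str, Dict[str, str]] = {
    "Ann": {"gender": "female", "parent1": "Dee", "parent2": "Ed"},
    "Bob": {"gender": "male", "parent1": "Dee", "parent2": "Ed"},
    "Cara": {"gender": "female", "parent1": "Fay", "parent2": "Gus"},
}

# Relation 2 (hypothetical): (first, second, label), where label says
# whether the second person is the sister of the first.
pairs: List[Tuple[str, str, str]] = [
    ("Bob", "Ann", "yes"),
    ("Bob", "Cara", "no"),
]


def denormalize(
    people: Dict[str, Dict[str, str]], pairs: List[Tuple[str, str, str]]
) -> List[Dict[str, str]]:
    """Join both relations so every row carries all attributes of both persons."""
    rows = []
    for first, second, label in pairs:
        row = {"first": first, "second": second}
        for role, name in (("first", first), ("second", second)):
            for attr, value in people[name].items():
                row[f"{role}_{attr}"] = value
        row["sister_of"] = label
        rows.append(row)
    return rows


for row in denormalize(people, pairs):
    print(row)
```

Each flattened row is self-contained, so a learning scheme can compare the gender and parents of the two persons without consulting any other row.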
  • 27. Problems with de-normalization:
       If relationships between a large number of items are required, the tables become huge
       It can produce apparent regularities in the data that are completely spurious
       Relations might not be finite (in that case use inductive logic programming)
       Overlay data: Sometimes data relevant to the problem at hand needs to be collected from outside the organization. This is called overlay data.
  • 28. Data integration: Integrating system-wide databases is difficult because different departments use or have:
       Different styles of record keeping
       Different conventions
       Different degrees of data aggregation
       Different types of errors
       Different time periods
       Different primary keys
       These issues are addressed by a company-wide database, an idea known as data warehousing
  • 29. Data cleaning: Data cleaning is the careful checking of data. It helps in resolving many architectural issues with different databases. Data cleaning usually requires good domain knowledge
  • 30. Attribute-Relation File Format (ARFF)
       Definition: An ARFF file is an ASCII text file that describes a list of instances sharing a set of attributes
       Conventions used in ARFF:
       ARFF header section
       Lines beginning with % are comments
       To declare the relation: @relation <name of relation>
       To declare an attribute: @attribute <attribute> <data type>
       ARFF data section
       To start the actual data: @data, followed by row-wise comma-separated data
  • 31. Data types for ARFF:
       Numeric attributes can be real or integer numbers
       Nominal values are defined by providing a <nominal-specification> listing the possible values: {nm-value1, nm-value2, …}, e.g. {yes, no}
       Values that contain spaces must be quoted
       String attributes allow us to create attributes containing arbitrary textual values
       The date type is used as: @attribute <name> date [<date-format>]
       The default date format is the ISO-8601 combined date and time format: "yyyy-MM-dd'T'HH:mm:ss"
       Missing values are represented by ?
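The header and data conventions above can be seen together in a small generated file. The sketch below builds ARFF text in Python. The helper `to_arff`, the relation name and the rows are hypothetical, and the helper does not quote values that contain spaces.

```python
from typing import List, Optional, Sequence, Tuple


def to_arff(
    relation: str,
    attributes: List[Tuple[str, str]],
    rows: List[Sequence[Optional[object]]],
) -> str:
    """Build ARFF text: comment, @relation, @attribute lines, @data, then rows."""
    lines = ["% Hypothetical example data", f"@relation {relation}", ""]
    for name, data_type in attributes:
        lines.append(f"@attribute {name} {data_type}")
    lines += ["", "@data"]
    for row in rows:
        # None stands for a missing value, written as ? in ARFF.
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines)


attributes = [
    ("outlook", "{sunny, overcast, rainy}"),  # nominal
    ("humidity", "numeric"),                  # numeric
    ("play", "{yes, no}"),                    # nominal class
]
rows = [("sunny", 85, "no"), ("overcast", None, "yes")]

arff_text = to_arff("weather", attributes, rows)
print(arff_text)
```

The second data row is written as `overcast,?,yes`, which shows how a missing humidity value appears in the data section.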
  • 32. Sparse ARFF files
       Sparse ARFF files are very similar to ARFF files, but data with value 0 are not explicitly represented
       They have the same header as ARFF but a different data section. Instead of representing each value in order, like this:
       @data
       0, X, 0, Y, "class A"
       the non-zero attributes are explicitly identified by attribute number (starting from zero) and their value is stated, like this:
       @data
       {1 X, 3 Y, 4 "class A"}
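The conversion from a dense row to a sparse row can be sketched in a few lines of Python. The function name `to_sparse_row` is an assumption for illustration and is not a WEKA API.

```python
from typing import List, Union


def to_sparse_row(values: List[Union[int, float, str]]) -> str:
    """Keep only non-zero values, each prefixed by its zero-based attribute index."""
    parts = [f"{index} {value}" for index, value in enumerate(values) if value != 0]
    return "{" + ", ".join(parts) + "}"


# The dense row from the slide; the class value is already quoted.
dense_row = [0, "X", 0, "Y", '"class A"']
print(to_sparse_row(dense_row))
```

Values that are omitted from a sparse row are read back as 0, not as missing. A missing value still has to be written explicitly as ?.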
  • 33. Visit more self-help tutorials. Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net