Topic Maps for Association Rule Mining
Tomáš Kliegr, Jan Zemánek, Marek Ovečka
Department of Information and Knowledge Engineering
Faculty of Informatics and Statistics
University of Economics, Prague
Data Mining using CRISP-DM
  • The goal of data mining is to obtain useful non-trivial patterns from the data.
  • Analytical Report
Common data mining tasks
  • Association rules, e.g. Sex(M) and Salary(Low) and District(Havlickuv Brod) => Quality(Bad)
  • Clustering
  • Classification
Association Rule Mining
  • Unlike clustering and classification, association rules provide true “nuggets”: rules meeting selected interest measures.
EXAMPLE
  • Duration(2y+) and District(Prague) => Loan Quality(good)
  • Antecedent: Duration(2y+) and District(Prague)
  • Consequent: Loan Quality(good)
THE PROBLEM WITH INTEREST MEASURES
  • It is usually not possible to tweak the interest measure thresholds so that only the really interesting rules are output.
  • To be on the safe side, we often get (many!) more rules than desired.
THE QUEST FOR TOPIC MAPS
  • Select the really interesting rules from the rules output automatically.
  • Help searching through the results.
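
What "rules meeting selected interest measures" means can be illustrated with the two most common measures, support and confidence. The following is only a sketch: the four toy records, the attribute values and the function name are invented for illustration and are not taken from the slides or from the PKDD'99 data.

```python
def support_and_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    Records, antecedent and consequent are sets of (attribute, value) pairs.
    """
    matches_antecedent = [r for r in records if antecedent <= r]
    matches_both = [r for r in matches_antecedent if consequent <= r]
    support = len(matches_both) / len(records)
    confidence = (
        len(matches_both) / len(matches_antecedent) if matches_antecedent else 0.0
    )
    return support, confidence


# Hypothetical toy records (assumption, for illustration only).
records = [
    {("Duration", "2y+"), ("District", "Prague"), ("LoanQuality", "good")},
    {("Duration", "2y+"), ("District", "Prague"), ("LoanQuality", "good")},
    {("Duration", "2y+"), ("District", "Prague"), ("LoanQuality", "bad")},
    {("Duration", "1y"), ("District", "Brno"), ("LoanQuality", "good")},
]
antecedent = {("Duration", "2y+"), ("District", "Prague")}
consequent = {("LoanQuality", "good")}

support, confidence = support_and_confidence(records, antecedent, consequent)
```

A rule learner would output this rule only if both values exceed the chosen thresholds. Lowering the thresholds to stay "on the safe side" is what inflates the number of rules.
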
The quest
  • More precise tasks, or
  • Automatic rule filtering
  • The lingua franca for exchange of data mining models is PMML
Predictive Model Markup Language (PMML)
  • XML Schema
  • PMML is the leading standard for statistical and data mining models
  • Supported by over 20 vendors and organizations
  • Covers the technical part of the CRISP-DM Cycle
  • http://www.dmg.org/pmml_examples/index.html
PMML is “just” an XML Schema
  • Developed for deploying mining models
  • Good for migration from one data mining environment to another
But:
  • No explicit links between nodes
  • Verbose
  • Self-contained. Lacks support for:
      • Interlinking multiple PMML documents
      • Interlinking PMML with other information
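
The point about missing explicit links can be seen in how association rules are serialized: a rule refers to its itemsets, and itemsets to their items, only through id strings that the consumer has to resolve. The sketch below uses a hand-written fragment modelled on the PMML AssociationModel elements (Item, Itemset, ItemRef, AssociationRule). The fragment is simplified (no namespace, no MiningSchema), its values are made up, and it is not validated against the PMML schema.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written fragment (assumption: not a complete PMML document).
PMML_FRAGMENT = """
<AssociationModel>
  <Item id="1" value="Duration=2y+"/>
  <Item id="2" value="District=Prague"/>
  <Item id="3" value="LoanQuality=good"/>
  <Itemset id="10"><ItemRef itemRef="1"/><ItemRef itemRef="2"/></Itemset>
  <Itemset id="11"><ItemRef itemRef="3"/></Itemset>
  <AssociationRule antecedent="10" consequent="11" support="0.5" confidence="0.67"/>
</AssociationModel>
"""


def resolve_rules(xml_text):
    """Follow the id references by hand and return (antecedent, consequent) pairs."""
    root = ET.fromstring(xml_text)
    items = {item.get("id"): item.get("value") for item in root.findall("Item")}
    itemsets = {
        itemset.get("id"): [items[ref.get("itemRef")] for ref in itemset.findall("ItemRef")]
        for itemset in root.findall("Itemset")
    }
    return [
        (itemsets[rule.get("antecedent")], itemsets[rule.get("consequent")])
        for rule in root.findall("AssociationRule")
    ]


rules = resolve_rules(PMML_FRAGMENT)
```

In a topic map the same references would be associations between topics, so no lookup tables of this kind are needed.
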
Association Rule Mining Ontology
  • The ontology is a “semantization” of PMML XML Schema
DESIGN GUIDELINES
  • The key design principle was to allow easy transformation of data from PMML to AROn
SCOPE
  • The ontology is limited to the subset of PMML relevant to association rule mining.
  • 60 topic types, 50 association types and 20 occurrence types
USE
  • No automatic transformation is available yet, but we are working on one using the OKS framework.
  • Currently, data can be input using Ontopoly.
Design guidelines: Elements
  • xs:element is mapped to topic type
  • Topics are assigned the same names as PMML nodes, but respecting spaces between words and capitalization
  • Superclasses are introduced for semantically similar XML nodes
  • Named elements used as children in other elements that carry most of the semantics of their parents are merged with the parent
  • If an XML element has a directly corresponding topic type in the ontology, the URI of the XML element within the schema is used as subject identifier
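
The naming and subject-identifier guidelines for elements can be sketched in a few lines. This is an illustration under assumptions, not the authors' tooling: the slides do not say how the URI of an element within the schema is formed, so the fragment-style URI and the example.org schema location below are hypothetical.

```python
import re


def topic_name(element_name):
    """Turn a CamelCase PMML element name into a name with spaces between words."""
    # Insert a space wherever a lowercase letter or digit is followed by a capital.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", element_name)


def subject_identifier(schema_uri, element_name):
    """Build a subject identifier for the topic type of an element."""
    # Assumption: schema URI plus '#' plus element name; the slides do not specify this.
    return f"{schema_uri}#{element_name}"


# Hypothetical schema location, used only for the example.
SCHEMA_URI = "http://example.org/pmml.xsd"

name = topic_name("AssociationRule")
identifier = subject_identifier(SCHEMA_URI, "AssociationRule")
```

Acronyms such as PMML are left unchanged by the pattern, because it only splits where a capital follows a lowercase letter or a digit.
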
Design guidelines: Attributes
  • An enumeration restriction on an attribute is mapped as a topic type with an enumeration superclass (this is a workaround for missing TMCL support in OKS)
  • Attributes that could be interpreted as a reference to other elements become associations
  • Other attributes become occurrence types
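
The three attribute rules above can be read as a decision procedure over the schema. The sketch below applies them to a small invented XSD fragment. The element and attribute names are hypothetical, and the set of attributes treated as references is a hand-maintained assumption, since deciding whether an attribute "could be interpreted as a reference" is a modelling judgement rather than something the schema states.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Invented schema fragment (assumption: not copied from the PMML schema).
XSD_FRAGMENT = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="ExampleRule">
    <xs:complexType>
      <xs:attribute name="antecedent" type="xs:string"/>
      <xs:attribute name="support" type="xs:double"/>
      <xs:attribute name="kind">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="a"/>
            <xs:enumeration value="b"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Assumption: attributes a human modeller has marked as references to other elements.
REFERENCE_ATTRIBUTES = {"antecedent", "consequent"}


def classify_attributes(xsd_text):
    """Map each attribute name to the Topic Maps construct suggested by the guidelines."""
    root = ET.fromstring(xsd_text)
    result = {}
    for attr in root.iter(XS + "attribute"):
        name = attr.get("name")
        if attr.find(f".//{XS}enumeration") is not None:
            result[name] = "topic type with enumeration superclass"
        elif name in REFERENCE_ATTRIBUTES:
            result[name] = "association"
        else:
            result[name] = "occurrence type"
    return result


mapping = classify_attributes(XSD_FRAGMENT)
```
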
Design guidelines: Associations
  • Names for association types are chosen freely so that they are as descriptive as possible
  • Introducing fewer rather than more associations minimizes the effort when populating the ontology from PMML
  • Avoid unnecessary inflation of the topic map
  • Link only the semantically closest topics
  • Additional “soft” relations can be introduced with inference statements or derived with tolog
Design guidelines: Role types
  • Topic types used to map PMML elements are used as role types
  • Exception: if multiple topics are permitted in an association end, a superclass is used as the role type, or a new role type is introduced
Two alternative association rule representations
  • Apriori based (Item-Itemset)
  • GUHA based (Boolean Attributes)
Ongoing work
  • Support for background knowledge: “already known association rules”
  • Support for schema mapping: “linking of background knowledge with mining results”
  • Already in the ontology, distinguished by base of subject identifier:
      • Schema Mapping: http://keg.vse.cz/sma/XXX
      • Background Knowledge: http://keg.vse.cz/bko/xxx
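
Distinguishing the parts of the ontology "by base of subject identifier" amounts to a prefix test on the identifier. A minimal sketch follows. The two bases are the ones given on the slide; the local names in the examples and the label "other" for everything else are assumptions.

```python
# Bases as given on the slide.
SCHEMA_MAPPING_BASE = "http://keg.vse.cz/sma/"
BACKGROUND_KNOWLEDGE_BASE = "http://keg.vse.cz/bko/"


def ontology_part(subject_identifier):
    """Classify a topic by the base of its subject identifier."""
    if subject_identifier.startswith(SCHEMA_MAPPING_BASE):
        return "schema mapping"
    if subject_identifier.startswith(BACKGROUND_KNOWLEDGE_BASE):
        return "background knowledge"
    # Assumption: anything else is not part of the two extensions.
    return "other"


# Hypothetical local names, for illustration only.
part_a = ontology_part("http://keg.vse.cz/sma/ExampleMapping")
part_b = ontology_part("http://keg.vse.cz/bko/ExampleRule")
```
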
Data Mining Use case
PREDICT LOAN QUALITY
  • Find client characteristics that could be used to predict their attitude to paying back a loan.
BASED ON PAST RECORDS
  • Input data: records on already given loans
The data
  • 6181 clients in the PKDD’99 financial dataset
  • Data were preprocessed, i.e.
Preprocessed data → Association Rule Learner → … and perhaps 9997 other association rules
WE CAN’T PRESENT ALL 10,000 RULES TO THE CLIENT
ASK CLIENT WHAT HE KNOWS
  • “If loan duration is more than two years and the loan was given in Prague district, we can expect good loan quality.”
  • … background knowledge
Semantize the results
Formalize Background Knowledge
Schema Mapping
  • Background knowledge can use a different “vocabulary” than the data.
  • If we are to use background knowledge in querying, we need to interlink it with the data.
  • The same approach would apply if we interlink several mining models (PMMLs).
Deleting information with Topic Maps
  • Find association rules that subsume background knowledge
  • Visualization of a tolog query
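
The filtering step can be sketched outside the topic map as a plain set comparison. The slide performs it with a tolog query and does not define "subsume", so the definition used here is an assumption: a mined rule subsumes a background rule when both have the same consequent and the mined antecedent contains every condition of the background antecedent. Rule contents and function names are illustrative.

```python
def subsumes(mined, background):
    """Assumed definition: same consequent, and the mined antecedent
    contains all conditions of the background antecedent."""
    mined_antecedent, mined_consequent = mined
    bg_antecedent, bg_consequent = background
    return mined_consequent == bg_consequent and bg_antecedent <= mined_antecedent


def split_by_background(mined_rules, background_rules):
    """Return (known, remaining): rules covered by background knowledge, and the rest."""
    known, remaining = [], []
    for rule in mined_rules:
        if any(subsumes(rule, bg) for bg in background_rules):
            known.append(rule)
        else:
            remaining.append(rule)
    return known, remaining


# Hypothetical rules: (antecedent as frozenset of conditions, consequent).
background_rules = [
    (frozenset({"Duration(2y+)", "District(Prague)"}), "LoanQuality(good)"),
]
mined_rules = [
    (frozenset({"Duration(2y+)", "District(Prague)"}), "LoanQuality(good)"),
    (frozenset({"Duration(2y+)", "District(Prague)", "Sex(M)"}), "LoanQuality(good)"),
    (frozenset({"Salary(Low)"}), "LoanQuality(bad)"),
]

known, remaining = split_by_background(mined_rules, background_rules)
```

Only the rules in `remaining` would then need to be presented to the client.
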
Summary
  • Methodology for transferring XML Schema to Topic Maps
  • Association Rule Mining Ontology based on PMML
  • Easily extensible to other data mining algorithms
  • Initial attempts to formalize background knowledge
  • Initial attempts to use Topic Maps for schema mapping
  • AROn On-Line: http://maiana.topicmapslab.de/u/lmaicher/tm/kliegr
