Transformation and aggregation
preprocessing for top-k recommendation
GAP rules induction
Marta Vomlelova, Michal Kopecky and Peter Vojtas
Charles University Prague
Content
• Data
• Task
• Mining – heuristics, domain specific, …
• Some results
• Mining – transferable methods, data aggregations
• Some results
• Oracle DB Data Miner
• Second order logic GAP rules
• Conclusions
RuleML-2015 Challenge Rule-based RS
for the web of data
Task
• Run the Python script on the train data – the intermediate join result is big
and redundant (for each UserID, MovieID pair the 5003 movie attributes repeat)
• For each user u, find the 5 movies that best match the user profile: top5(u)
• Submit in CSV format: userId, movieId, score
• Observations
• The score does not affect the system response; only (unordered) sets are
compared
• P, R, F@5 are computed between top5(u) and a target of varying size (the
estimated average target size is 9.4 or 8, depending on assumptions)
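The set-based comparison described above can be sketched as follows. This is a minimal illustration, not the challenge's official evaluator: the function name and the choice to divide precision by k are assumptions, since the slides do not give the exact formula.

```python
def precision_recall_f_at_k(recommended, target, k=5):
    """Set-based P, R and F at k for one user; order and score are ignored.

    Assumption: precision is hits / k and recall is hits / |target|.
    """
    top_k = set(list(recommended)[:k])
    target = set(target)
    hits = len(top_k & target)
    precision = hits / k
    recall = hits / len(target) if target else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f


# Hypothetical user: 5 recommended movie ids, target set of size 8.
p, r, f = precision_recall_f_at_k([10, 11, 12, 13, 14],
                                  {10, 12, 20, 21, 22, 23, 24, 25})
print(p, r, round(f, 4))
```

With a target larger than 5, recall can never reach 1 for a top-5 list, which is why the target size matters for the achievable F@5.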
Mining – heuristics, domain specific, …
• 5003 DBpedia attributes – most frequent, clusters of properties; tried
mining, no relevant results (acquaintance with data)
• Per attribute:
• relative frequency in ratings, NLP extraction
MAKEUP,VISUAL,SMIX,SEDIT,SPIELBERG,NY,CALIF,NOVELS,CAMERON,LA,ARIZONA,WILLIAMS
• KSI: pure first order logic with weighted average, F = 0.05262 (our third)
• 0-1 order agreement with ratings (good properties)
• 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG
• SCS_CUNI “Spielberg”: F = 0.10681 (our best)
• Script downloaded table Xratings – DB ratings gave a surprise
• Disqualified (did not use only the training/test set): F = 0.6987
• Precision: 0.9994 * 5000 = 4997 – three users have a target set of size 4
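The weighted formula above can be turned into a top-k ranking along these lines. This is only a sketch: the dictionary layout, the function names and the sample values are hypothetical, and BayesAVG is treated as a precomputed per-movie number because the slides do not define it further.

```python
import heapq


def scs_cuni_score(movie):
    """Weighted score from the slide: 100*Spielberg + 50*Original + BayesAVG."""
    return 100 * movie["Spielberg"] + 50 * movie["Original"] + movie["BayesAVG"]


def top_k(movies, k=5):
    """Return the ids of the k highest-scoring movies."""
    best = heapq.nlargest(k, movies, key=scs_cuni_score)
    return [m["MovieID"] for m in best]


# Hypothetical movie rows; the flags are 0/1 and BayesAVG is an averaged rating.
movies = [
    {"MovieID": 1, "Spielberg": 1, "Original": 0, "BayesAVG": 3.9},
    {"MovieID": 2, "Spielberg": 0, "Original": 1, "BayesAVG": 4.5},
    {"MovieID": 3, "Spielberg": 0, "Original": 0, "BayesAVG": 4.8},
    {"MovieID": 4, "Spielberg": 1, "Original": 1, "BayesAVG": 3.0},
]
ranking = top_k(movies, k=3)
print(ranking)
```

As written, the formula uses only movie attributes, so the large weights act as tie-breaking tiers: any Spielberg movie outranks any non-Spielberg one as long as BayesAVG stays well below 50.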
Transferable methods, data aggregations
• GenreMatch (genres in a user's ratings versus movie genres) and a decision
tree with drastic pruning
• KTIML: data mining combined with first order logic, F = 0.10085 (our second)
Preference  Rule
0.11        R1: GoodProperty=1
0.25        R2: 113.5<CNT<400
0.29        R3: R1 and R2
0.58        R4: GoodProperty=0 & CNT>399
0.57        R5: GoodProperty=1 & CNT>399
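One way to read the rule table as an executable lookup is sketched below. The slide does not say how overlapping rules are resolved or what happens when no rule fires, so the function name, the precedence (most specific rule wins) and the `None` fallback are all assumptions; CNT is taken to be an integer count.

```python
def rule_preference(good_property, cnt):
    """Map (GoodProperty, CNT) to the preference of the most specific matching rule.

    Assumptions: R4/R5 are checked first, then R3, then R2, then R1;
    None is returned when no rule applies.
    """
    in_mid_range = 113.5 < cnt < 400           # condition of R2
    if cnt > 399:                              # R4 and R5
        return 0.57 if good_property == 1 else 0.58
    if good_property == 1 and in_mid_range:    # R3: R1 and R2
        return 0.29
    if in_mid_range:                           # R2
        return 0.25
    if good_property == 1:                     # R1
        return 0.11
    return None                                # no rule fires


for gp, cnt in [(1, 500), (0, 500), (1, 200), (0, 200), (1, 50), (0, 50)]:
    print(gp, cnt, rule_preference(gp, cnt))
```

Read this way, the table shows that a high rating count (CNT > 399) dominates the preference, while GoodProperty adds comparatively little.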
Oracle DB Data Miner
Second order logic GAP rules
• DB aggregations → second order logic
• “simple” queries can be transformed to rules. E.g.
SELECT UserID, MovieID, 5 FROM Ordered_Prediction WHERE OrdNr <= 5; …
… 100*Movies.Spielberg + 50*Movies.Original + Movies.BayesAVG
• corresponds to GAP rule
• SCS_CUNI_Movie(u,m) : 100*x1 + 50*x2 + x3 ← SPIELBERG(m) : x1 & ORIGINAL(m) : x2 & BAYESAVG(m) : x3
• Semantics so far:
• 2GAP - facts extended by atomic predicates corresponding to tables resulting
from database aggregations e.g. SPIELBERG(m), ORIGINAL(m), BAYESAVG(m)
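The query on this slide can be reproduced on toy data roughly as follows. This is a sketch under assumptions: the slides name Ordered_Prediction and OrdNr but not how they are built, so the table layouts, the `Users` table, the `ROW_NUMBER()` construction and the sample rows are hypothetical. It uses Python's built-in `sqlite3` (window functions need SQLite 3.25 or newer) rather than Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY);
    CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY,
                         Spielberg INTEGER, Original INTEGER, BayesAVG REAL);
""")
conn.executemany("INSERT INTO Users VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO Movies VALUES (?, ?, ?, ?)", [
    (1, 1, 0, 3.9), (2, 0, 1, 4.5), (3, 0, 0, 4.8), (4, 1, 1, 3.0),
    (5, 0, 0, 2.1), (6, 0, 1, 3.2), (7, 0, 0, 4.0),
])

# Assumed construction of Ordered_Prediction: score every (user, movie) pair
# with the weighted formula, then number the movies per user by score.
conn.execute("""
    CREATE VIEW Ordered_Prediction AS
    SELECT UserID, MovieID, Score,
           ROW_NUMBER() OVER (PARTITION BY UserID
                              ORDER BY Score DESC, MovieID) AS OrdNr
    FROM (SELECT u.UserID, m.MovieID,
                 100 * m.Spielberg + 50 * m.Original + m.BayesAVG AS Score
          FROM Users u CROSS JOIN Movies m)
""")

# The query from the slide, with an ORDER BY added for a stable printout.
rows = conn.execute("""
    SELECT UserID, MovieID, 5 FROM Ordered_Prediction
    WHERE OrdNr <= 5 ORDER BY UserID, OrdNr
""").fetchall()
for row in rows:
    print(row)
```

The constant 5 in the SELECT list presumably fills the score column of the submission, which is harmless because only the unordered top-5 sets are compared.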
Conclusions
• Data too big for rule induction tools – all processing done in a relational DB
• Transformation via NLP extraction; clustering and importance of
attributes
• Database aggregation – CNT, AVG, …
• “Simple” rules (in a second order logic GAP)
• Rules give explanations that are intuitive for humans
• Precision – in the ideal case, we gave 75% of users at least one correct
recommendation
• Future work – distribution of learning quality across users (not only
AVG)
