Using Graph Theory to understand Intent & Concepts – January 2013	
  



                               tumra.com	
  
UNDERSTANDING INTENT & CONCEPTS	
  
•  Use case:
    -  Enhancing Social TV user experience
    -  Matching users to content that interests them

•  Topics we’ll cover:
    -  Natural Language Processing
    -  Graph Theory
    -  Machine Learning


  
USE CASE ENHANCED SOCIAL TV	
  
•  Objectives:
    -  Increase engagement with content
    -  Enhance multi-channel user experience

•  We built a prototype solution:
    -  Mines unstructured data in real-time
    -  Understands:
      -  What interests individual users
      -  Entities & Concepts (People, Places, Events)


  
THE CHALLENGE	
  


  
 Help users to “follow the story” regardless of the
 news outlet, across integrated web and second-screen	
  




  
                                             Photo Credit: byrion on Flickr (cc)
THE PROBLEM	
  


Unstructured Data  ->  Magic?!?!  ->  Awesomeness!




  
THE PROBLEM	
  
•  Little useful data to work with…
    -  Streams of continuous live TV
    -  Have to create metadata

•  Where did we start?
    -  Ingest several live news channels
    -  Extract whatever data was available:
      -  In-video text using OCR
      -  Subtitles / Closed Captions


  
STEP 1 NAMED ENTITY RECOGNITION	
  


We used a simple N-Gram model for exact matches;
    then Apache Lucene for everything else…	
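
A minimal sketch of the exact-match half of this step, assuming a tiny in-memory dictionary in place of the real Knowledgebase. The `KNOWLEDGE_BASE` contents and both function names are illustrative and not taken from the talk; the fuzzy Lucene fallback is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical mini knowledge base: lowercased surface form -> entity type.
KNOWLEDGE_BASE: Dict[str, str] = {
    "david cameron": "Person",
    "angela merkel": "Person",
    "german chancellor": "Concept",
    "debt": "Concept",
    "eurozone": "Place",
}


def ngram_spans(token_count: int, max_n: int) -> List[Tuple[int, int]]:
    """Return (start, end) spans for every n-gram up to max_n, longest first."""
    spans = []
    for n in range(max_n, 0, -1):
        for start in range(token_count - n + 1):
            spans.append((start, start + n))
    return spans


def exact_match_entities(text: str, max_n: int = 3) -> List[Tuple[str, str]]:
    """Find knowledge-base entries in text, preferring longer n-grams."""
    tokens = [t.strip(".,;:!?\"'“”").lower() for t in text.split()]
    taken = set()   # token positions already claimed by a longer match
    found = []      # (start position, phrase, entity type)
    for start, end in ngram_spans(len(tokens), max_n):
        if any(i in taken for i in range(start, end)):
            continue
        phrase = " ".join(tokens[start:end])
        if phrase in KNOWLEDGE_BASE:
            found.append((start, phrase, KNOWLEDGE_BASE[phrase]))
            taken.update(range(start, end))
    found.sort()    # report matches in reading order
    return [(phrase, kind) for _, phrase, kind in found]


SENTENCE = ("David Cameron and the German Chancellor Angela Merkel meet to "
            "discuss the debt crisis and signal their approval for greater "
            "eurozone integration.")
MATCHES = exact_match_entities(SENTENCE)
```

Matching longest n-grams first keeps “German Chancellor” from being split into smaller matches; anything that finds no exact hit would be handed to the fuzzy search.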
  




  
EXAMPLE N.E.R.	
  

  “David Cameron and the German
Chancellor Angela Merkel meet to
 discuss the debt crisis and signal
their approval for greater eurozone
           integration.”	
  


  
INITIAL SOLUTION	
  

[Diagram: Unstructured Data -> NER + NoSQL -> Awesomeness!]




  
OH NO!!!
 *facepalm*	
  




     Photo Credit: cesarastudillo on Flickr (cc)
DISAMBIGUATION	
  
•  Which “David Cameron”?
    -  We have many in our Knowledgebase
    -  Sportsmen, actors, painters & characters…

•  Our initial simplistic approach was naïve
    -  Works great with unambiguous matches
    -  Best-case returns top-scoring entity

•  We needed a smarter approach
  
RECAP	
  
•  We have an effectively ‘flat’ KB of Entities
    -    “David Cameron” -> Politician (Person)
    -    “Angela Merkel” -> Politician (Person)
    -    “German Chancellor” -> Political office (Concept)
    -    “Debt” -> Economic concept (Concept)
    -    “Eurozone” -> Economic area (Place)


•  We needed a way to find relationships
   between Entities

  
THE BIG IDEA	
  




Graphs allow us to store relationships between entities, and
graph algorithms allow us to interrogate those connections…	
  
GRAPH DATABASES	
  
•  Neo4j
•  GraphLab
•  Apache Giraph
•  GoldenOrb


… of course there are many more open-source & proprietary ones	
  
  
SO, WHICH ONE?	
  


                       ???
… it had to be fast, scalable and under active development	
  

  
STEP 2 BUILDING RELATIONSHIPS	
  

We had 250 million Nodes, and 4 billion Edges…
great initial results but horrendously inefficient!

  Example: “David Cameron” & “Angela Merkel”	
  



  
Using Graph theory to understand Intent & Concepts - Neo4j User Group (January 2013)
INITIAL IMPROVEMENTS	
  
•  We didn’t need everything… just:
    -    People: “David Cameron”, “Angela Merkel”
    -    Places: “London”, “Downing Street”, “Eurozone”
    -    Concepts: “Debt”, “President”, “Eurozone”
    -    Things: Companies, Products etc.


•  Pruned the graph using Map/Reduce

•  This reduced the number of Entities…
    -  … but we still had billions of connections
  
EXAMPLE PEOPLE, PLACES, CONCEPTS	
  
                  	
  
             “David Cameron and the German
           Chancellor Angela Merkel meet to
            discuss the debt crisis and signal
           their approval for greater eurozone
                      integration.”	
  
Legend: Concepts, People, Places
  
DISAMBIGUATION	
  
[Diagram: four candidate entities, David Cameron (painter), David Cameron
(footballer), David Cameron (actor) and David Cameron (politician), shown
alongside Angela Merkel and the shared nodes “Living Person”, “Politician”
and “Head of State”]

Possibilities: shortest path, number of common connections etc.	
  
STEP 3 SIMPLIFYING THE GRAPH	
  

Sure all that extra metadata was tasty but we didn’t
           need it all to solve the use-case…

   So we used Map/Reduce to count the common
                  connections	
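
A minimal in-memory sketch of counting common connections in a map/reduce style. The `LINKS` data is hypothetical, chosen only to echo the node names on the earlier disambiguation slide; the real job ran over the full graph, and all function names here are our own.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[str, str]

# Hypothetical links: shared node -> entities connected to it.
LINKS: Dict[str, List[str]] = {
    "Living Person": [
        "Angela Merkel",
        "David Cameron (politician)",
        "David Cameron (actor)",
        "David Cameron (footballer)",
    ],
    "Politician": ["Angela Merkel", "David Cameron (politician)"],
    "Head of State": ["Angela Merkel", "David Cameron (politician)"],
}


def map_phase(members: List[str]) -> Iterator[Tuple[Pair, int]]:
    """Emit ((a, b), 1) for every pair of entities sharing one node."""
    for a, b in combinations(sorted(members), 2):
        yield (a, b), 1


def reduce_phase(emitted: Iterator[Tuple[Pair, int]]) -> Counter:
    """Sum the emitted ones per pair to get common-connection counts."""
    counts: Counter = Counter()
    for pair, one in emitted:
        counts[pair] += one
    return counts


def common_connections(links: Dict[str, List[str]]) -> Counter:
    """Run the map phase over every shared node, then reduce."""
    emitted = (kv for members in links.values() for kv in map_phase(members))
    return reduce_phase(emitted)


COUNTS = common_connections(LINKS)
```

With this toy data the politician shares three nodes with Angela Merkel, the actor and the footballer share one each, and the painter shares none, so only the pair counts need to be kept rather than the intermediate nodes.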
  


  
SIMPLIFIED	
  
[Diagram: the “David Cameron” candidates (painter, footballer, actor,
politician) linked to Angela Merkel, with the edges labelled by their number
of common connections: 1, 1 and 3]

       Whoa … that looks a lot like a Least Cost Routing problem	
  
LEAST COST PATH	
  
[Diagram: the same graph with each edge relabelled as a cost: 1/1, 1/1 and 1/3]




               1 / number of common connections = cost	
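
A minimal sketch of the cost rule above, assuming the common-connection counts are already available as a small weighted graph. The `COMMON` data and both function names are hypothetical; a textbook Dijkstra search stands in for whatever routing the real system ran inside Neo4j.

```python
import heapq
import math
from typing import Dict, List

# node -> {neighbour: number of common connections (always >= 1)}
Graph = Dict[str, Dict[str, int]]

# Hypothetical counts for the context entity "Angela Merkel".
COMMON: Graph = {
    "Angela Merkel": {
        "David Cameron (politician)": 3,
        "David Cameron (actor)": 1,
        "David Cameron (footballer)": 1,
    },
}


def least_costs(graph: Graph, source: str) -> Dict[str, float]:
    """Dijkstra's algorithm with edge cost = 1 / common connections."""
    costs = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, math.inf):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, common in graph.get(node, {}).items():
            new_cost = cost + 1.0 / common
            if new_cost < costs.get(neighbour, math.inf):
                costs[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return costs


def disambiguate(graph: Graph, context: str, candidates: List[str]) -> str:
    """Pick the candidate that is cheapest to reach from the context entity."""
    costs = least_costs(graph, context)
    return min(candidates, key=lambda c: costs.get(c, math.inf))


CANDIDATES = [
    "David Cameron (painter)",
    "David Cameron (footballer)",
    "David Cameron (actor)",
    "David Cameron (politician)",
]
BEST = disambiguate(COMMON, "Angela Merkel", CANDIDATES)
```

Inverting the count turns "most connections in common" into "cheapest route", so a candidate with no link to the context (the painter here) is treated as infinitely expensive.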
  
UPDATED SOLUTION	
  

[Diagram: Unstructured Data -> NER + Disambiguation (Neo4J, NoSQL) -> Awesomeness!]




  
RECAP	
  
•  Graphs allow us to interrogate relationships
    -  Disambiguate when faced with multiple possibilities
    -  Infer more about the context of what’s happening


•  Went through iterations of improvements
    -  Kept our Entity data in NoSQL = TBs
    -  Used the Graph as an index of sorts = GBs


•  Neo4j was a great fit for our needs

  
STEP 4 MAKING IT WORK REAL-TIME	
  

Some queries were taking ‘seconds’ and we needed
 to go a lot faster because TV won’t wait for us …

 Do we really need to check the Graph every time?	
  



  
ENTER MACHINE LEARNING	
  
•  We can use simple predictors to estimate
   the likelihood of Entities occurring
    -  e.g. every time we’ve looked for “David Cameron” in
       the past the best match was the Politician


•  Keeping a ‘probabilistic context’ of recent
   Entities allows us to detect shifts in topics
    -  Works especially well on News channels
    -  Reduces the demand on Graph lookups
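
One way such a simple predictor could look, as a sketch only: it counts how each name was resolved in the past and answers directly when history is decisive, otherwise signalling that a graph lookup is still needed. The class name, method names and both thresholds are assumptions for illustration, not values from the talk.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional


class PriorPredictor:
    """Remember past resolutions and skip the graph when history is decisive."""

    def __init__(self, min_observations: int = 20,
                 min_confidence: float = 0.9) -> None:
        # Both thresholds are illustrative assumptions.
        self.min_observations = min_observations
        self.min_confidence = min_confidence
        self.counts: DefaultDict[str, Counter] = defaultdict(Counter)

    def observe(self, surface: str, entity: str) -> None:
        """Record that `surface` was resolved to `entity` (e.g. by the graph)."""
        self.counts[surface][entity] += 1

    def predict(self, surface: str) -> Optional[str]:
        """Return the usual entity, or None if a graph lookup is still needed."""
        seen = self.counts.get(surface)
        if not seen:
            return None
        total = sum(seen.values())
        entity, hits = seen.most_common(1)[0]
        if total >= self.min_observations and hits / total >= self.min_confidence:
            return entity
        return None


predictor = PriorPredictor()
for _ in range(19):
    predictor.observe("David Cameron", "David Cameron (politician)")
predictor.observe("David Cameron", "David Cameron (actor)")
```

After these twenty observations the politician accounts for 95% of past matches, so `predictor.predict("David Cameron")` answers without touching the graph, while an unseen name returns `None` and falls through to the graph lookup.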

  
BAYES THEOREM	
  




Looks complicated, but it’s basically just counting & division	
  
                                                         Photo Credit: mattbuck007 on Flickr (cc)
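
For reference, the theorem named in the slide title can be written as below. The symbols are our own labels and do not come from the slides: E stands for a candidate entity and C for the observed context, such as the entities seen recently on a channel. The second line is one plausible reading of "counting & division", where the prior is estimated from past lookups.

```latex
P(E \mid C) = \frac{P(C \mid E)\, P(E)}{P(C)}

P(E) \approx \frac{\text{number of past lookups resolved to } E}{\text{total number of past lookups}}
```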
STEP 5 MAKING IT WORK WORLDWIDE	
  


 We solved the problem for English, but what about
                 other languages?	
  




  
LANGUAGE	
  
•  Our core Entities of ‘People’, ‘Places’, &
   ‘Concepts’ are language agnostic…

•  We needed a way to ditch ‘language’ and
   jump straight to entities…
    -  The colour ‘Red’ means the same thing whether you
       call it ‘Rot’, ‘Rouge’ or ‘赤’


•  Again, Graphs could solve the problem
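
A minimal sketch of the idea, using the colour example above: every language-specific label points at one language-agnostic entity node. The `LABELS` table, the entity identifier and the function names are illustrative assumptions, and a plain dictionary stands in for the graph.

```python
from typing import Dict, List, Optional

# Hypothetical entity node -> labels in several languages.
LABELS: Dict[str, List[str]] = {
    "Color: Red": ["Red", "Rouge", "Rot", "Röd", "Rojo", "赤", "紅"],
}


def build_index(labels: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert the table so each case-folded label points at its entity."""
    index: Dict[str, str] = {}
    for entity, names in labels.items():
        for name in names:
            index[name.casefold()] = entity
    return index


INDEX = build_index(LABELS)


def resolve(label: str) -> Optional[str]:
    """Jump from a word in any known language straight to the entity."""
    return INDEX.get(label.casefold())
```

Once a word resolves to the shared entity, the rest of the pipeline never needs to know which language it arrived in.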
  
LANGUAGE INDEPENDENT	
  
[Diagram: the entity “Color: Red” linked to its labels in several languages:
Red, Rouge, Rot, Röd, Rojo, 赤, 紅, أحمر]
PROBLEM SOLVED	
  


Typical response time ~30ms … relevancy improves over time
     and the system learns new entities ‘online’	
  




  
FINAL SOLUTION	
  

[Diagram: Unstructured Data -> Language Model, NER, Machine Learning and
Disambiguation (with Neo4J and NoSQL) -> Awesomeness!]




  
ABOUT US	
  
•  We’ve built a product…
    -  Our ‘Digital Marketing Optimization’ platform
       improves conversion rates & customer satisfaction
       for eCommerce & Marketing campaigns
    -  Launches Q1 2013

•  What else do we do?
    -  ‘Big Data’ & ‘Data Science’ professional services
    -  Bespoke prototype & solution development


         “TUMRA” is a transliteration of the Sanskrit word for “BIG”;
        we thought it was a great name … ( and the .COM was available )
  
TUMRA
                                   You?

THANKS FOR LISTENING	
  
         We’re hiring!
        Data Scientists & Developers
              work@tumra.com
  
THANKS FOR LISTENING
    Questions?	
  
          tumra.com
      hello@tumra.com
      twitter.com/tumra
  
