ECIR 2014 Industry Day
Content Discovery Through Entity Driven Search
Alessandro Benedetti
http://uk.linkedin.com/in/alexbenedetti
Antonio David Perez Morales
http://es.linkedin.com/in/adperezmorales
16th April 2014
• Experienced at building and delivering a wide range of enterprise solutions across the whole information life cycle
• Alfresco & Ephesoft certified Platinum Partner
• Red Hat Enterprise Linux Ready Partner
• Crafter & Varnish Gold Partners
• Search Solutions Consultant
Alfresco Partner of the Year 2012 and 2013
Working effectively together
Who We Are
Antonio David Pérez Morales
- R&D Senior Engineer
- Master's in Software Engineering and Technology
- Digital Identity and Security expert
- Enterprise Search background
- Semantic, NLP and ML technologies and Information Retrieval enthusiast
- Apache Stanbol committer
- Apache contributor
@adperezmorales
http://es.linkedin.com/in/adperezmorales/
Alessandro Benedetti
- R&D Senior Engineer
- Master's in Computer Science
- Information Retrieval background
- Enterprise Search specialist
- Semantic, NLP and ML technologies and Information Retrieval enthusiast
@AlexBenedetti
http://uk.linkedin.com/in/alexbenedetti
Agenda
• Context
• Problem
• Solution
• Demo
• Future Work
Zaizi R&D Department
• Making sense of content
  • Enriching it semantically
• Adding value to ECM/CMS
  • More structured content, easier to manage, link and search
• Improving search
  • Across different domains, data sources and user experience
• Applied Machine Learning research
  • Content organization, recommendation systems
Enterprise Search Problems
Challenge: Search within Big and Heterogeneous Repositories
• Heterogeneous Data Sources
  • Filesystem, DB, ECM/CMS, Email, …
• Unstructured Content
  • PDFs, plain text, Word, …
• Documents not linked to each other
• Federated Search needed
  • Search across data sources
  • Different permissions
  • Centralized endpoint
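The federated search requirements above (one centralized endpoint, many data sources, per-source permissions) can be sketched as a fan-out-and-merge loop. This is a minimal illustration under stated assumptions, not how Sensefy implements it: the `Hit` record, the group-based permission check and the two connector functions are all hypothetical, and results are merged naively on raw scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    source: str
    score: float
    allowed_groups: frozenset  # groups allowed to read this document (hypothetical ACL model)


def federated_search(sources, query, user_groups, top_k=10):
    """Send the query to every source, drop hits the user may not read,
    and merge the rest into one ranked list for a single endpoint."""
    merged = []
    for search_fn in sources.values():
        for hit in search_fn(query):
            # Each source keeps its own permissions; enforce them before merging.
            if hit.allowed_groups & user_groups:
                merged.append(hit)
    # Naive merge on raw scores; scores from different engines are rarely
    # comparable, so a real system would normalise them per source first.
    merged.sort(key=lambda h: h.score, reverse=True)
    return merged[:top_k]


# Hypothetical connectors standing in for an ECM repository and a mail server.
def ecm(query):
    return [Hit("ecm-1", "ecm", 0.9, frozenset({"legal"})),
            Hit("ecm-2", "ecm", 0.4, frozenset({"everyone"}))]


def email(query):
    return [Hit("mail-7", "email", 0.7, frozenset({"everyone"}))]


results = federated_search({"ecm": ecm, "email": email}, "contract", {"everyone"})
print([h.doc_id for h in results])  # ['mail-7', 'ecm-2']
```

The user in the example is not in the `legal` group, so `ecm-1` is filtered out even though it has the highest score.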
Current Enterprise Search Weaknesses
• Keyword-based
• Low precision
• Ambiguous terms not interpreted in context
• Inaccurate weighting when keywords are combined in a query
Entity Driven Search
• Move from keywords to entities
• More understandable to humans
• Process the unstructured text
• Enrich it
• Build specific indexes
• Use entities and concepts in searches
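To make the shift from keywords to entities concrete, here is a minimal sketch of an index keyed by entity rather than by token. It assumes an entity-linking step has already turned each document into a set of knowledge-base URIs; the `dbpedia:`-prefixed identifiers and the function names are illustrative only. A keyword query for "apple" would match both the company and the fruit, whereas the entity query below only returns documents linked to `dbpedia:Apple_Inc.`.

```python
from collections import defaultdict

# Hypothetical output of an entity-linking step: each document is reduced
# to the set of knowledge-base entities it mentions (URIs are illustrative).
annotated_docs = {
    "doc1": {"dbpedia:Apple_Inc.", "dbpedia:Steve_Jobs"},
    "doc2": {"dbpedia:Apple", "dbpedia:Orchard"},
    "doc3": {"dbpedia:Apple_Inc.", "dbpedia:IPhone"},
}


def build_entity_index(docs):
    """Map each entity URI to the ids of the documents that mention it."""
    index = defaultdict(set)
    for doc_id, entities in docs.items():
        for uri in entities:
            index[uri].add(doc_id)
    return index


def search_by_entities(index, query_entities):
    """Return documents mentioning all of the requested entities (AND semantics)."""
    postings = [index.get(uri, set()) for uri in query_entities]
    return set.intersection(*postings) if postings else set()


index = build_entity_index(annotated_docs)
print(sorted(search_by_entities(index, ["dbpedia:Apple_Inc."])))  # ['doc1', 'doc3']
```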
Sensefy
• Semantic Enterprise Search Engine
• Federated Search
• Evolved User Experience
• Based on cutting-edge Open Source Frameworks
Architecture
RedLink
• Semantic Cloud platform
• Providing Software as a Service
• Manage unstructured data
• Extract knowledge and intelligence
• Make sense of information
• Feed into business processes
• Based on open-source components
• Entity Linking using Knowledge Bases
NLP & Semantic Enrichment
• From unstructured to structured
• NLP analysis: POS tagging
• Named Entity Recognition
• Linked Data
• Entity Linking using Knowledge Bases
• Disambiguation
• Indexing in Solr
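The last step of the enrichment chain, indexing in Solr, can be illustrated by flattening entity annotations into a document. This is a hedged sketch: the `Annotation` fields, the 0.5 confidence cut-off used as a crude stand-in for disambiguation, and the `entity_uris`/`entity_types`/`entity_labels` field names are assumptions that would need matching multi-valued fields in a Solr schema. The output is a JSON array of documents, which is the shape Solr accepts for JSON document updates.

```python
import json
from dataclasses import dataclass


@dataclass
class Annotation:
    surface: str        # text span as it appears in the document
    entity_uri: str     # knowledge-base entity proposed by entity linking
    entity_type: str    # e.g. a DBpedia ontology class
    confidence: float   # linking confidence in [0, 1]


def to_solr_doc(doc_id, text, annotations, min_confidence=0.5):
    """Flatten enrichment output into a Solr-style document.

    Field names are hypothetical; low-confidence candidates are dropped,
    a simplistic placeholder for real disambiguation.
    """
    kept = [a for a in annotations if a.confidence >= min_confidence]
    return {
        "id": doc_id,
        "content": text,
        # dict.fromkeys deduplicates while preserving first-seen order.
        "entity_uris": list(dict.fromkeys(a.entity_uri for a in kept)),
        "entity_types": list(dict.fromkeys(a.entity_type for a in kept)),
        "entity_labels": list(dict.fromkeys(a.surface for a in kept)),
    }


anns = [
    Annotation("Paris", "dbpedia:Paris", "dbo:City", 0.92),
    Annotation("Paris", "dbpedia:Paris_Hilton", "dbo:Person", 0.08),
    Annotation("Seine", "dbpedia:Seine", "dbo:River", 0.81),
]
doc = to_solr_doc("doc-42", "Paris lies on the Seine.", anns)
print(json.dumps([doc], indent=2))  # a JSON array holding one document
```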
Smart Autocomplete
• Multi-phase suggestions
• Closer to natural language query formulation
• Named Entity infix suggestions
• Entity Type infix suggestions
• Multilingual entity type support
• Property-driven query approach
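A multi-phase, infix-based autocomplete could look roughly like the sketch below. Everything here is hypothetical: the in-memory dictionaries replace real entity and type indexes, and the two phases (first suggest entities and localized type labels that match anywhere in the string, then suggest properties of a chosen type) are one plausible reading of the slide rather than Sensefy's actual behaviour.

```python
# Hypothetical suggestion sources; a real system would query entity and type indexes.
ENTITIES = ["Barack Obama", "Michelle Obama", "Obama, Fukui"]
ENTITY_TYPES = {  # type id -> label per language
    "Person": {"en": "person", "es": "persona"},
    "City": {"en": "city", "es": "ciudad"},
}
TYPE_PROPERTIES = {"Person": ["born in", "works for"], "City": ["located in"]}


def infix_match(fragment, label):
    """Case-insensitive match anywhere in the label, not just at the start."""
    return fragment.lower() in label.lower()


def phase_one(text, lang="en", limit=5):
    """First phase: suggest named entities and entity types in the user's language."""
    entities = [e for e in ENTITIES if infix_match(text, e)]
    types = [t for t, labels in ENTITY_TYPES.items()
             if infix_match(text, labels.get(lang, labels["en"]))]
    return {"entities": entities[:limit], "types": types[:limit]}


def phase_two(entity_type, text=""):
    """Second phase: once a type is chosen, suggest property-driven query completions."""
    return [f"{entity_type} {p}" for p in TYPE_PROPERTIES.get(entity_type, [])
            if infix_match(text, p)]


print(phase_one("obam"))            # all three Obama entities, no types
print(phase_one("ciu", lang="es"))  # the City type via its Spanish label
print(phase_two("Person", "work"))  # ['Person works for']
```

Infix matching is what lets "obam" surface both people and the Japanese city, which the user can then disambiguate by picking a type.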
Smart Autocomplete Configuration
• Entity type properties
  • Those relevant to our use case and scenario
• Property inheritance through the type hierarchy
• Enhance type information from external resources
  • Freebase, DBpedia, Custom Data Set
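Property inheritance through the type hierarchy amounts to walking from a type up to its ancestors and collecting the relevant properties on the way. In this minimal sketch the single-parent hierarchy, the DBpedia-style type and property names, and the `RELEVANT` selection are all assumptions made for illustration; in practice they could be loaded from Freebase, DBpedia or a custom data set.

```python
# Hypothetical single-parent type hierarchy (child -> parent).
PARENT = {"Scientist": "Person", "Person": "Agent", "Agent": None}
# Properties each type declares directly (names in DBpedia style, illustrative).
OWN_PROPERTIES = {
    "Agent": ["name"],
    "Person": ["birthPlace", "birthDate"],
    "Scientist": ["doctoralAdvisor", "birthPlace"],
}
# Only properties judged relevant to the use case are exposed to autocomplete.
RELEVANT = {"name", "birthPlace", "doctoralAdvisor"}


def effective_properties(entity_type):
    """Collect relevant properties from a type and its ancestors, most specific first."""
    seen_props, result, visited = set(), [], set()
    current = entity_type
    # The visited set guards against accidental cycles in the hierarchy data.
    while current is not None and current not in visited:
        visited.add(current)
        for prop in OWN_PROPERTIES.get(current, []):
            if prop in RELEVANT and prop not in seen_props:
                seen_props.add(prop)
                result.append(prop)
        current = PARENT.get(current)
    return result


print(effective_properties("Scientist"))  # ['doctoralAdvisor', 'birthPlace', 'name']
```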
Semantic Search
• Search by Named Entity
• Search by Entity Type
• Search by Entity Type properties
• Grouping Results by Sense
• Contextualize Results Using Semantic Information
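Grouping results by sense means bucketing hits by the entity that an ambiguous query term was linked to in each document. The sketch below assumes each hit already carries that entity URI (the hits and URIs are made up) and orders the groups by their best-scoring document; it is an illustration, not Sensefy's ranking.

```python
from collections import defaultdict

# Hypothetical hits for the query "jaguar"; each records which entity the
# term was linked to inside that document.
hits = [
    {"doc": "d1", "score": 0.9, "entity": "dbpedia:Jaguar_Cars"},
    {"doc": "d2", "score": 0.8, "entity": "dbpedia:Jaguar"},
    {"doc": "d3", "score": 0.7, "entity": "dbpedia:Jaguar_Cars"},
    {"doc": "d4", "score": 0.5, "entity": "dbpedia:Jaguar"},
]


def group_by_sense(hits):
    """Bucket hits by linked entity; order groups by their best-scoring hit."""
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["entity"]].append(hit)
    ordered = sorted(groups.items(),
                     key=lambda kv: max(h["score"] for h in kv[1]),
                     reverse=True)
    # Within each group, list document ids from highest to lowest score.
    return [(uri, [h["doc"] for h in sorted(g, key=lambda h: h["score"], reverse=True)])
            for uri, g in ordered]


print(group_by_sense(hits))
# [('dbpedia:Jaguar_Cars', ['d1', 'd3']), ('dbpedia:Jaguar', ['d2', 'd4'])]
```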
Semantic More Like This
• Search for Similar Documents based on Entities and Entities’ categories
• Similarity Function based on Documents’ Sense
  • Not based on text tokens
  • Entity Frequency / Inverse Document Frequency
  • Entity Type Frequency / Inverse Document Frequency
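The slide names the ingredients of Semantic More Like This but not how they are combined. The sketch below is one possible reading: documents are represented by their entities and by their entity types, each weighted by frequency times IDF, compared with cosine similarity, and the two scores blended with a hypothetical weight `alpha`. The smoothed IDF formula `ln((1 + N) / (1 + df)) + 1` (the variant scikit-learn uses by default) and the toy data are assumptions.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed inverse document frequency for every feature in the corpus."""
    n = len(corpus)
    df = Counter(f for doc in corpus for f in set(doc))
    return {f: math.log((1 + n) / (1 + d)) + 1 for f, d in df.items()}


def weight(doc, idf):
    """Feature frequency times IDF (entity or entity-type frequency / IDF)."""
    return {f: c * idf.get(f, 0.0) for f, c in Counter(doc).items()}


def cosine(a, b):
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def semantic_mlt(query_id, entity_docs, type_docs, alpha=0.7, top_k=3):
    """Rank other documents by a blend of entity and entity-type similarity."""
    e_idf = idf_table(list(entity_docs.values()))
    t_idf = idf_table(list(type_docs.values()))
    qe, qt = weight(entity_docs[query_id], e_idf), weight(type_docs[query_id], t_idf)
    scores = {}
    for doc_id in entity_docs:
        if doc_id == query_id:
            continue
        se = cosine(qe, weight(entity_docs[doc_id], e_idf))
        st = cosine(qt, weight(type_docs[doc_id], t_idf))
        scores[doc_id] = alpha * se + (1 - alpha) * st
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


entity_docs = {  # entities mentioned in each document (repeats = frequency)
    "a": ["Lionel_Messi", "FC_Barcelona", "FC_Barcelona"],
    "b": ["FC_Barcelona", "Camp_Nou"],
    "c": ["Cristiano_Ronaldo", "Real_Madrid"],
    "d": ["Paris", "Seine"],
}
type_docs = {  # types of those same mentions
    "a": ["SoccerPlayer", "SoccerClub", "SoccerClub"],
    "b": ["SoccerClub", "Stadium"],
    "c": ["SoccerPlayer", "SoccerClub"],
    "d": ["City", "River"],
}
print(semantic_mlt("a", entity_docs, type_docs))
```

With this toy data, document `b` ranks first because it shares an entity (`FC_Barcelona`) with `a`, `c` comes second because it shares only entity types, and `d` scores zero. No text tokens are compared at any point.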
Future Work
• Semantic More Like This: new approach based on graph relations
• Machine Learning components: Classification, Topic annotation, Clustering
• Semantic facets
• Secured Entity Search
• Image and Media search
Conclusions
• Better user experience
• More precision in search results
• Closer to human language
Zaizi Headquarters
Brook House
4th Floor, North Wing
229-243 Shepherd’s Bush Road
London W6 7AN
United Kingdom
T: (+44) 20 3582 8330
Zaizi Iberia
Calle Gremios 13-15, Edificio Diseño
Planta 1, Oficina 5
41927 Mairena del Aljarafe
Sevilla
Spain
T: (+34) 666 42 43 64
Zaizi Asia
50 Flower Road
Colombo 07
Sri Lanka
T: (+94) 112 301 461
Zaizi Singapore
14 Robinson Road #13-00
Far East Finance Building
Singapore 048545
T: (+65) 3158 5886
F: (+65) 6323 1839
VAT Registration No GB 932 8855 89
Registered in England and Wales with registration number 6440931
www.zaizi.com
Thanks!

ECIR-2014: Multilanguage Content Discovery Through Entity Driven Search
