Linked Data at Globo.com
Tatiana Al-Chueyr Martins
tatiana.martins@corp.globo.com
@tati_alchueyr
September 18, 2013, Rio Info Symposium
globo.com
BROADCAST MOVIES PAY TV INTERNET
EVENTS MUSIC
PUBLISHING
NEW VENTURES NEWSPAPER RADIO NETWORK
Semantic Team
Andréia Bustamante
Ícaro Medeiros
Tatiana Al-Chueyr
Rodrigo Senra
Contributors
Franklin Amorim
Diogo Kiss
Motivation: Not only words
São Paulo?
● São Paulo state
● São Paulo city
● São Paulo saint
● São Paulo soccer team
Motivation: Multiple words for the same thing
Female
f
F
female
woman
...
http://data.globo.com/female
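The idea of collapsing many surface forms into one entity can be sketched in a few lines of Python. This is only an illustration: the lookup table and the `to_entity` helper are hypothetical, and only the target URI comes from the slide.

```python
# Canonical URI shown on the slide.
FEMALE_URI = "http://data.globo.com/female"

# Hypothetical lookup table: each normalized surface form points to one URI.
CANONICAL_URIS = {
    "female": FEMALE_URI,
    "f": FEMALE_URI,
    "woman": FEMALE_URI,
}


def to_entity(label):
    """Return the canonical URI for a label, or None if the label is unknown."""
    return CANONICAL_URIS.get(label.strip().casefold())


# All the spellings listed on the slide resolve to the same entity.
labels = ["Female", "f", "F", "female", "woman"]
uris = {to_entity(label) for label in labels}
```

Once every variant resolves to the same URI, content annotated with any of them can be found and linked through a single identifier.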
Motivation: Cross-link content from different web products
● Soccer player
● Politician
● Celebrity
Isabella Nardoni was killed on March 29, 2008, in the North Zone of São Paulo (Photo: Reproduction)
Isabella de Oliveira Nardoni, aged 5, was killed on the night of March 29, 2008. The forensic investigation concluded that the girl was thrown from the sixth floor of the building where her father, Alexandre Nardoni, her stepmother, Anna Carolina Jatobá, and the couple's two young children lived, in Vila Isolina Mazzei, in the North Zone of São Paulo.
Isabella's grave becomes a place of visitation in SP; the Nardoni couple is in prison.
The Isabella Nardoni case
Juliana Cardilli, G1 SP
Motivation: Semantic markup in web pages
RDF, FOAF, GEO, Dublin Core, SKOS
Motivation: Recommend annotations to the information producer
Motivation: Suggest related content to the information consumer
Changes
● Replacement of words by entities
http://data.globo.com/person/Person/santos_dumont
● Replacement of labels by qualified relationships
● Reorganization of data from tables into graphs
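As a rough sketch of the move from tables to graphs, one table row can be rewritten as a set of subject-predicate-object triples. The subject URI is the one on the slide; the predicate namespace, the column names, and the `row_to_triples` helper are assumptions made up for this example.

```python
# Entity URI shown on the slide.
SUBJECT = "http://data.globo.com/person/Person/santos_dumont"

# Hypothetical predicate namespace, used only for illustration.
PREDICATE_NS = "http://example.org/property/"


def row_to_triples(subject, row):
    """Turn one table row (column -> value) into (subject, predicate, object) triples.

    Empty cells (None) produce no triple: a graph simply omits missing facts.
    """
    return [
        (subject, PREDICATE_NS + column, value)
        for column, value in row.items()
        if value is not None
    ]


# A table row with one empty cell.
row = {"label": "Santos Dumont", "profession": "aviator", "nickname": None}
triples = row_to_triples(SUBJECT, row)
```

Each column becomes a relationship that can later point at another entity instead of a plain string, which is what makes the qualified relationships above possible.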
Outcomes
● Replacing words by entities improved:
○ Finding
○ Linking
○ Reconciling
○ Organizing
multiple layers of information
Outcomes
● Flexible ways to organize content
● Easier to find related issues
● Explicit relations derived from annotated content
● Up-to-date topic pages with little editorial effort
● Linking content across different web products
● Seamless navigation leading to flow state
Status Quo
Used by the main web products of Globo.com:
○ 18,485 organizations
○ 83,000 people
○ 9,129 places
○ 1,000,000+ annotated news
In total, 2,500,000+ entities,
from August 2010 to May 2013
Linked data problems
Legacy Architecture
[Diagram: multiple CDA and CMA applications connected to the triple store, search engine, and ontology]
Problems: Poor data management
○ direct access to the triple store (unmanaged)
○ difficulty sharing data (distributed DBs)
○ re-syncing the triple store and the search engine index
○ scalability of the triple store
○ high entropy in distributed ontology engineering
Problems: Ontology Engineering
[Diagram: product-driven ontologies (past) for G1 (news), GE (sports), EGO (gossip), and TVG (tv), versus domain-driven ontologies (current): Base, Upper, Person, Organization, Place, Music, Politics, Programme, Education, Sports]
Possible Solution: Upper Ontology
Problems: Semantics as a library
○ many different versions in production
○ programming language dependent
○ steep learning curve for RDF/OWL/SPARQL
Solution
Create an open semantic data management platform
● Scalable
● Mobile and Web friendly
● Interconnect Globo's data with external data sources
● Automate content extraction (including NER)
Brainiak: Linked Data RESTful API
Under Development
[Diagram: instead of the legacy architecture, CMA and CDA applications reach the triple store and search engine through the Brainiak API]
Requirements
● Indirect usage of SPARQL
● Programming language independent
● Data management with quality
● Finer-grained authorization and authentication
● Isolate applications from triplestore
● Improve triplestore performance
Task: list all sports teams
SPARQL query
DEFINE input:inference <http://data.globo.com/ruleset>
SELECT ?uri ?label
FROM <http://data.globo.com/sports/>
WHERE {
    ?uri a <http://data.globo.com/sports/Team> ;
         rdfs:label ?label .
}
LIMIT 10
OFFSET 0
Brainiak query
GET /sports/Team
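To make the contrast concrete, here is a minimal sketch of how a client might build that GET request without writing any SPARQL. The host name, the `page` and `per_page` parameter names, and both helper functions are assumptions for illustration, not Brainiak's documented interface; only the `/sports/Team` path and the `LIMIT 10` / `OFFSET 0` values come from the slides.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical host; the slides do not give the API's address.
BASE_URL = "http://brainiak.example.org"


def collection_url(base_url, context, class_name, page=1, per_page=10):
    """Build the URL that lists instances of a class, e.g. /sports/Team."""
    path = f"/{context}/{class_name}"
    query = urlencode({"page": page, "per_page": per_page})
    return urljoin(base_url, path) + "?" + query


def limit_offset(page, per_page):
    """Translate 1-based paging into the LIMIT/OFFSET pair a SPARQL query needs."""
    return per_page, (page - 1) * per_page


url = collection_url(BASE_URL, "sports", "Team")
limit, offset = limit_offset(page=1, per_page=10)
```

Under these assumptions, the first page of ten results corresponds to the `LIMIT 10` and `OFFSET 0` of the SPARQL version, while the graph name, class URI, and inference ruleset stay hidden behind the API.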
[Screenshots: SPARQL response and Brainiak response]
Task: retrieve all superclasses of a class
SPARQL query
SELECT DISTINCT ?class
WHERE {
    <http://data.globo.com/place/City> rdfs:subClassOf ?class
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
    ?class a owl:Class .
}
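What the transitive option computes can be sketched in plain Python as a walk over `rdfs:subClassOf` links. The toy hierarchy below is an assumption built for the example (the prefixed names are shorthand, and the real ontology may be shaped differently); the starting class is included in the result, mirroring `t_min (0)`.

```python
def superclasses(cls, sub_class_of):
    """Return cls followed by all its ancestors, following subClassOf transitively.

    sub_class_of maps each class to the list of its direct superclasses.
    """
    seen = []
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.append(current)
        stack.extend(sub_class_of.get(current, []))
    return seen


# Hypothetical hierarchy, for illustration only.
hierarchy = {
    "place:City": ["place:GeopoliticalDivision"],
    "place:GeopoliticalDivision": ["place:Place"],
    "place:Place": ["upper:Entity"],
}

ancestors = superclasses("place:City", hierarchy)
```

The triple store performs this traversal itself through a vendor-specific query option, which is one reason the slides argue for hiding such queries behind an API.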
Task: retrieve all properties of a group of classes
SPARQL query
SELECT DISTINCT ?predicate ?predicate_graph ?predicate_comment ?type ?range ?title ?range_graph ?range_label ?super_property
WHERE {
    {
        GRAPH ?predicate_graph { ?predicate rdfs:domain ?domain_class } .
    } UNION {
        GRAPH ?predicate_graph { ?predicate rdfs:domain ?blank } .
        ?blank a owl:Class .
        ?blank owl:unionOf ?enumeration .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?domain_class } .
    }
    FILTER (?domain_class IN (
        <http://data.globo.com/place/City>,
        <http://data.globo.com/place/GeopoliticalDivision>,
        <http://data.globo.com/place/Place>,
        <http://data.globo.com/upper/Object>,
        <http://data.globo.com/upper/Substance>,
        <http://data.globo.com/upper/ConcreteEntity>,
        <http://data.globo.com/upper/Entity>))
    { ?predicate rdfs:range ?range . }
    UNION {
        ?predicate rdfs:range ?blank .
        ?blank a owl:Class .
        ?blank owl:unionOf ?enumeration .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?range } .
    }
    FILTER (!isBlank(?range))
    ?predicate rdfs:label ?title .
    ?predicate rdf:type ?type .
    OPTIONAL { ?predicate rdfs:subPropertyOf ?super_property } .
    FILTER (?type in (owl:ObjectProperty, owl:DatatypeProperty)) .
    FILTER(langMatches(lang(?title), "en") OR langMatches(lang(?title), "")) .
    OPTIONAL { ?predicate rdfs:comment ?predicate_comment }
    FILTER(langMatches(lang(?predicate_comment), "en") OR langMatches(lang(?predicate_comment), "")) .
    OPTIONAL {
        GRAPH ?range_graph {
            ?range rdfs:label ?range_label .
            FILTER(langMatches(lang(?range_label), "en") OR langMatches(lang(?range_label), "")) .
        }
    }
}
Task: retrieve the cardinalities of all properties of a certain class
SPARQL query
SELECT DISTINCT ?predicate ?min ?max ?range ?enumerated_value ?enumerated_value_label
WHERE {
    <http://data.globo.com/place/City> rdfs:subClassOf ?s
        OPTION (TRANSITIVE, t_distinct, t_step('step_no') as ?n, t_min (0)) .
    ?s owl:onProperty ?predicate .
    OPTIONAL { ?s owl:minQualifiedCardinality ?min } .
    OPTIONAL { ?s owl:maxQualifiedCardinality ?max } .
    OPTIONAL {
        { ?s owl:onClass ?range }
        UNION { ?s owl:onDataRange ?range }
        UNION { ?s owl:allValuesFrom ?range }
        OPTIONAL { ?range owl:oneOf ?enumeration } .
        OPTIONAL { ?enumeration rdf:rest ?list_node OPTION(TRANSITIVE, t_min (0)) } .
        OPTIONAL { ?list_node rdf:first ?enumerated_value } .
        OPTIONAL {
            ?enumerated_value rdfs:label ?enumerated_value_label .
        } .
    }
}
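One plausible use of these cardinalities, in line with the "data management with quality" requirement, is checking an instance before it is stored. The sketch below assumes the query results have already been reduced to a mapping from predicate to a `(min, max)` pair; that shape, the predicate names, and the `cardinality_errors` helper are hypothetical.

```python
def cardinality_errors(instance, restrictions):
    """List the cardinality violations of an instance.

    instance maps each predicate to the list of its values.
    restrictions maps each predicate to (min, max); None means unbounded.
    """
    errors = []
    for predicate, (minimum, maximum) in restrictions.items():
        count = len(instance.get(predicate, []))
        if minimum is not None and count < minimum:
            errors.append(f"{predicate}: expected at least {minimum}, found {count}")
        if maximum is not None and count > maximum:
            errors.append(f"{predicate}: expected at most {maximum}, found {count}")
    return errors


# Hypothetical restrictions for a City-like class.
restrictions = {"rdfs:label": (1, None), "place:partOfState": (1, 1)}

valid_city = {"rdfs:label": ["Rio de Janeiro"], "place:partOfState": ["state:RJ"]}
invalid_city = {"rdfs:label": [], "place:partOfState": ["state:RJ", "state:SP"]}

valid_errors = cardinality_errors(valid_city, restrictions)
invalid_errors = cardinality_errors(invalid_city, restrictions)
```

Centralizing a check like this in an API avoids re-implementing it in every application that writes to the triple store.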
Brainiak query
GET /place/City/_schema
Next steps
● Enrich Globo.com search
● SEO (automatic schema.org)
● Improve annotator (DBpedia Spotlight)
● Richer content relationships (inference)
● Link to open data (e.g. DBpedia, dados.gov.br)
Stay tuned
@brainiak_api
... will soon be released
as an open source project!
http://www.slideshare.net/
@semantic_team
@alchueyr
Slides
tatiana.martins@corp.globo.com
semantica@corp.globo.com
globo.com
Thank you
for your attention!

Rio info 2013 - Linked Data at Globo.com