Linked Data for content analytics in Celi 
Semantics 2014 - Leipzig 
Alessio Bosca
Agenda 
ü Presentation of Celi 
ü Technologies (and what we do with 
them) 
ü Focus on LOD for content analytics 
in Celi 
ü … what we’d like to do 
2
1999: CELI srl was born
2002: Speech Technology
2006: BlogMeter
2010: Milan, Rome, Trento
2011: Cross Library
2013: Korean Market
4 Offices: Torino, Milano, Trento, Roma
6 Markets: Italy, Belgium, France, Spain, Korea, Poland
50 Employees + Collaborators
>100 Active clients
4 Business branches: NLP components, Speech technology, Social Media Intelligence, Digital Humanities
15 Years of experience
Relationships with the scientific community
• >50 Published papers
• 15 Research projects
• 6 Agreements with research centers: Scuola Normale Superiore, Università di Torino, Università di Pisa, Università di Trento, Fondazione Bruno Kessler, Politecnico di Milano
Core technology
• opinion mining, mood and sentiment analysis
• normalization
• language identification
• NSW processing
• tokenization
• morphological analysis
• disambiguation
• chunking and phrasing
• phonetic transcription with word stress
• semantic clustering
• automatic classification
• named entities
Techs 
Guava 
Kestrel 
Virtuoso 
OpenSource 
Clients
Semantic Solutions, Speech Technology, Social Media Monitoring
Linked (and/or Open) Data 
Linked Data 
Open Data 
? 
LOD 
Private Sector: how Celi exploits L(O)D
• as user: LOD as linguistic resources for NER, content enrichment, machine linking, discovery search…
• as provider for the PA (public administration): publishing, data integration
• internal use (e.g. asset management)
• crafting of RDF artifacts for custom projects and applications
LOD for NER
• GENDER GUESSER
• LOCATION GUESSER
• ENTITY LINKER
• ETC.
Diagram components: DUMP, INDEXER, INDEXES, CELI TRIPLE STORES, SPARQL QUERIES, SEARCHER, Linguistic Analysis, CUSTOM RDF, WEBAPPS
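As an illustration of the "dump, indexer, index" idea behind a component such as the gender guesser, here is a minimal sketch. It is not Celi's implementation: the inline dump, the example.org URIs, and the function names are assumptions made for this example. Only the FOAF property URIs (givenName, gender) are real vocabulary terms.

```python
import re
from collections import Counter, defaultdict

GIVEN_NAME = "http://xmlns.com/foaf/0.1/givenName"
GENDER = "http://xmlns.com/foaf/0.1/gender"

# Hypothetical N-Triples-style dump; subjects and values are illustrative only.
DUMP = """\
<http://example.org/p/1> <http://xmlns.com/foaf/0.1/givenName> "Anna" .
<http://example.org/p/1> <http://xmlns.com/foaf/0.1/gender> "female" .
<http://example.org/p/2> <http://xmlns.com/foaf/0.1/givenName> "Andrea" .
<http://example.org/p/2> <http://xmlns.com/foaf/0.1/gender> "male" .
<http://example.org/p/3> <http://xmlns.com/foaf/0.1/givenName> "Andrea" .
<http://example.org/p/3> <http://xmlns.com/foaf/0.1/gender> "female" .
<http://example.org/p/4> <http://xmlns.com/foaf/0.1/givenName> "Andrea" .
<http://example.org/p/4> <http://xmlns.com/foaf/0.1/gender> "male" .
"""

# Matches only simple triples with a plain string literal as object.
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"\s*\.\s*$')


def parse_triples(dump):
    """Yield (subject, predicate, literal) for every line the regex accepts."""
    for line in dump.splitlines():
        match = TRIPLE.match(line.strip())
        if match:
            yield match.groups()


def build_index(dump):
    """Indexer step: map each lowercased first name to a Counter of genders."""
    names, genders = {}, {}
    for subject, predicate, value in parse_triples(dump):
        if predicate == GIVEN_NAME:
            names[subject] = value.lower()
        elif predicate == GENDER:
            genders[subject] = value
    index = defaultdict(Counter)
    for subject, name in names.items():
        if subject in genders:
            index[name][genders[subject]] += 1
    return index


def guess_gender(index, first_name):
    """Return the most frequent gender for a name, or None if it is unknown."""
    counts = index.get(first_name.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


index = build_index(DUMP)
print(guess_gender(index, "Andrea"))
```

A real dump would need a proper RDF parser and far more data; the point is only that a LOD dump can be turned into a lookup index offline and then queried cheaply during linguistic analysis.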
Faceted Semantic Search
• Browse through documents and contents
• Relations between Facets
LOD for CLIR
The AGROVOC thesaurus has been used in the Organic.Lingua project for ontology-based CLIR (cross-language information retrieval)
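Ontology-based CLIR can be sketched as follows: queries and documents are both mapped to language-independent concept identifiers through the multilingual labels of a thesaurus, and matching happens on concepts rather than words. The tiny thesaurus, the "ex:" identifiers, and the documents below are assumptions made for this example; they are not AGROVOC data and this is not the Organic.Lingua implementation.

```python
import re

# Hypothetical multilingual thesaurus: concept id -> {language: preferred label}.
# Only single-word labels are handled in this sketch.
THESAURUS = {
    "ex:maize": {"en": "maize", "it": "mais", "fr": "maïs"},
    "ex:soil": {"en": "soil", "it": "suolo", "fr": "sol"},
}

# Hypothetical document collection: doc id -> (language, text).
DOCS = {
    "doc-en": ("en", "Organic maize needs fertile soil"),
    "doc-it": ("it", "Coltivazione biologica del mais"),
    "doc-fr": ("fr", "Le sol et le maïs"),
}


def annotate(text, lang):
    """Return the set of concept ids whose label in `lang` occurs in `text`."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return {
        concept
        for concept, labels in THESAURUS.items()
        if labels.get(lang) in tokens
    }


def search(query, query_lang):
    """Return sorted ids of documents sharing at least one concept with the query."""
    wanted = annotate(query, query_lang)
    return sorted(
        doc_id
        for doc_id, (lang, text) in DOCS.items()
        if wanted & annotate(text, lang)
    )


print(search("mais", "it"))
```

With this setup an Italian query retrieves English and French documents without any machine translation step, because all three languages meet at the concept level.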
Sem-web techs for internal models
Information in the CRUNCHED BOOK is represented using combinations of RDF and graph DBs
Public Sector: clear process …
acquire data → set open license → open formats → publish
Celi for the public sector (CSI Piemonte): the Homer project
(Public sector contd.) … but … hard problems
• LACK OF MONEY
• USE OF “STANDARDS”
• OPAQUE DATASETS
• LACK OF WILLINGNESS
• POOR RDF/SPARQL SUPPORT
Why companies’ RDF is not published
• It reflects customers’ needs
• It reflects internal data models
HENCE → OVERFITTING
Provocation: it would be neither interesting nor usable
WAYS OUT: having more standard models for particular micro-domains could permit their direct (re)use by private companies (and hence the publication of enhanced versions)
Recipes
• Public Sector: use “true” LOD technologies (RDF dumps and SPARQL endpoints)
• Private companies: use standard data models, internally and for their artifacts
• OpenData Community: please stress the “linked” in LOD!
The success of LOD is bound to the use of Linked Data (as a technology)
The use of LD in the Private Sector will feed back positively into the diffusion of the necessary expertise and awareness in the Public Sector too
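To make the "SPARQL endpoint" recommendation concrete, the sketch below prepares a query request the way the SPARQL 1.1 Protocol describes it for GET: the query text goes in a `query` URL parameter and the client asks for JSON results through the Accept header. The endpoint URL is a placeholder and the query is only an example; nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint, not a real service.
ENDPOINT = "http://example.org/sparql"

# Example query: ten English preferred labels from a SKOS vocabulary.
QUERY = """\
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept skos:prefLabel ?label .
  FILTER (lang(?label) = "en")
} LIMIT 10
"""


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request for `query`."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it against a live endpoint; any dataset published this way can be consumed with a few lines of standard-library code, which is much of the appeal of "true" LOD publishing.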
Thank You!

