G. Futia, F. Cairo, F. Morando, L. Leschiutta
Exploiting Linked Open Data and Natural Language Processing for Classification of Political Speech
Krems, 22nd May 2014
22nd May 2014 Giuseppe Futia – Politecnico di Torino
Introduction
● Our goal:
  ● assist anyone interested in the automatic categorization of political speeches in unambiguously identifying the main political trends addressed by the White House
● What we have to achieve our goal:
  ● TellMeFirst (http://tellmefirst.polito.it/), a topic extraction tool:
    – it leverages the DBpedia knowledge base and the English Wikipedia linguistic corpus
    – it exploits Linked Open Data (LOD) and Natural Language Processing (NLP) techniques
DBpedia
● A crowd-sourced community effort to extract
structured information from Wikipedia and a
central interlinking hub for the Linking Open Data
project.
● It is a suitable knowledge base for text classification
(Mendes et al., 2012; Hellmann et al., 2013; Steinmetz
et al., 2013)
Why DBpedia for US political speeches?
Comparison between the coverage of US politics and the coverage of politics of other countries
The coverage of politics in Wikipedia is “often very good for recent or
prominent topics but is lacking on older or more obscure topics”
(Brown, 2011).
Text Categorization Approach
● An instance-based approach: TellMeFirst assigns target documents to classes based on a local comparison between a set of pre-classified documents and the target document itself
● This training set consists of all the Wikipedia paragraphs where a wikilink occurs. These paragraphs are stored in a Lucene index, where each document represents a DBpedia resource
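The construction of such a training set can be sketched as follows. This is a minimal illustration, not TellMeFirst's actual code: the regular expression, the function names `dbpedia_uri` and `build_training_set`, and the sample paragraphs are all assumptions, and the URI mapping ignores redirects and percent-encoding. A plain dictionary stands in for the Lucene index.

```python
import re
from collections import defaultdict

# Matches [[Target]] and [[Target|label]] wikilinks; a "#section" anchor is skipped.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def dbpedia_uri(title: str) -> str:
    """Map a Wikipedia page title to a DBpedia resource URI (simplified)."""
    title = title.strip().replace(" ", "_")
    return "http://dbpedia.org/resource/" + title[:1].upper() + title[1:]


def build_training_set(paragraphs):
    """Group paragraphs by the DBpedia resource that each wikilink points to."""
    index = defaultdict(list)
    for paragraph in paragraphs:
        # A paragraph is stored once per distinct link target it contains.
        for target in set(WIKILINK.findall(paragraph)):
            index[dbpedia_uri(target)].append(paragraph)
    return dict(index)


# Hypothetical sample paragraphs in wikitext form.
sample = [
    "The [[New Deal]] was a series of programs enacted by [[Franklin D. Roosevelt|Roosevelt]].",
    "Critics compared the stimulus to the [[New Deal]].",
]
training_set = build_training_set(sample)
```

Each key of `training_set` plays the role of one indexed document: all the text in which a given resource is linked becomes the evidence for that resource.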
Success rate (%) of the TellMeFirst classification process on US Presidents profiles
Input text | 1st topic | Within the first 2 topics | Within the first 7 topics
Full text of the Presidents profiles | 95.4% | 100% | 100%
President profiles without name and surname | 45.4% | 61.3% | 90.9%
TellMeFirst provides as output the seven most relevant topics of the document (in the form of DBpedia URIs), sorted by relevance
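A success rate "within the first k topics" can be computed as in the sketch below. The function name `success_rate`, the document ids and the ranked outputs are hypothetical and only illustrate the metric; they are not the data behind the figures above.

```python
def success_rate(ranked_topics, expected, k):
    """Share of documents whose expected topic appears within the first k results."""
    hits = sum(
        1 for doc, topics in ranked_topics.items() if expected[doc] in topics[:k]
    )
    return hits / len(ranked_topics)


# Hypothetical classifier output: document id -> topics sorted by relevance.
ranked = {
    "profile_a": ["dbpedia:Barack_Obama", "dbpedia:Illinois"],
    "profile_b": ["dbpedia:Texas", "dbpedia:George_W._Bush"],
}
# Hypothetical gold standard: the president each profile describes.
gold = {
    "profile_a": "dbpedia:Barack_Obama",
    "profile_b": "dbpedia:George_W._Bush",
}

rate_top1 = success_rate(ranked, gold, 1)
rate_top2 = success_rate(ranked, gold, 2)
```

Widening k can only keep or raise the rate, which matches the left-to-right growth of each row in the results.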
whitehouse.gov
● 3173 videos in English were available on the White House website on 24 November 2013
● These videos are categorized according to a taxonomy unrelated to the subject of the speeches
● They need a semantic layer that points out the content of the speeches, so that questions such as “what is the First Lady talking about?” can be answered automatically
Not just a bag-of-words tool
Results obtained with TellMeFirst (on the left) and with TagCrowd (on the right)
«President Obama Speaks on the Affordable Care Act»
http://1.usa.gov/1jR4Ky2
Results (i)
Topic | Occ. | % overall | % 2013 | % 2012 | % 2011 | % 2010 | % 2009
Barack Obama | 607 | 4.88% | 5.68% | 4.52% | 5.51% | 4.45% | 3.88%
Patient Protection and Affordable Care Act | 286 | 2.30% | 3.06% | 1.35% | 1.91% | 2.47% | 2.71%
American Recovery and Reinvestment Act of 2009 | 278 | 2.23% | 1.09% | 1.82% | 2.88% | 2.84% | 1.88%
Social Security | 272 | 2.19% | 2.58% | 1.77% | 3.54% | 1.61% | 0.78%
Number and percentage of topic occurrences extracted with TellMeFirst
Results (ii)
● “New Deal” (141 occurrences) is probably used as a metaphor within the political speeches of President Obama
● “Libya” reaches 1.00% in 2011. This result can be related to the full-scale revolt that began in Libya on 17 February 2011
● “Deepwater Horizon oil spill” reaches 1.05% in 2010. This result is related to the marine oil spill in the Gulf of Mexico that began on 20 April 2010
Correlation among topics
A focus on the First Lady (i)
● According to Michelle Obama’s page on the White House
website, the First Lady “looks forward to continuing her work
on the issues close to her heart”:
● supporting military families
● helping working women balance career and family
● encouraging national service
● promoting the arts and arts education
● fostering healthy eating and healthy living for children and
families across the country
A focus on the First Lady (ii)
● We tested whether TellMeFirst confirms these claims, manually selecting nine Wikipedia categories which seemed to be related to these issues
● We then queried the DBpedia SPARQL endpoint to collect all the topics of these categories
● We then associated each topic with one or more of the nine high-level categories: these categories encompassed almost 75% of the topics
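A query of this kind could look like the sketch below. The slides do not show the query that was actually used, so the shape here is an assumption: it relies on DBpedia's `dct:subject` links between resources and categories, and on `skos:broader` to include one level of subcategories. The helper name `category_query` is hypothetical.

```python
from string import Template

# Assumed query shape: direct members of the category plus members of its
# immediate subcategories. Deeper levels are deliberately not traversed.
QUERY = Template("""\
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?topic WHERE {
  { ?topic dct:subject <$category> . }
  UNION
  { ?topic dct:subject ?sub . ?sub skos:broader <$category> . }
}
""")


def category_query(category_name: str) -> str:
    """Build a SPARQL query listing the DBpedia topics filed under a category."""
    uri = "http://dbpedia.org/resource/Category:" + category_name.replace(" ", "_")
    return QUERY.substitute(category=uri)


query = category_query("Gender equality")
```

The resulting string can be submitted to a SPARQL endpoint with any HTTP client; running it once per selected category yields the topic sets used for the mapping.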
A focus on the First Lady (iii)
Wikipedia Category | First Lady speeches (9 categories) | All speeches (9 categories)
Government of the United States | 26.68% | 32.68%
Education | 21.64% | 5.40%
Nutrition | 19.96% | 1.61%
Social issues | 14.71% | 28.38%
Barack Obama | 13.66% | 14.00%
Health care | 11.34% | 7.57%
Arts | 8.61% | 1.11%
Military personnel | 3.99% | 3.16%
Gender equality | 2.73% | 0.84%
Others (unclassified topics) | 25.63% | 38.34%
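One way to read these figures is to divide each First Lady share by the corresponding share over all speeches. The sketch below does this with the nine category values from the table; the ratio itself and the function name `over_representation` are illustrative additions, not a measure used in the slides.

```python
# Shares in percent, copied from the table: (First Lady speeches, all speeches).
shares = {
    "Government of the United States": (26.68, 32.68),
    "Education": (21.64, 5.40),
    "Nutrition": (19.96, 1.61),
    "Social issues": (14.71, 28.38),
    "Barack Obama": (13.66, 14.00),
    "Health care": (11.34, 7.57),
    "Arts": (8.61, 1.11),
    "Military personnel": (3.99, 3.16),
    "Gender equality": (2.73, 0.84),
}


def over_representation(shares):
    """Ratio of the First Lady share to the overall share, largest first."""
    ratios = {
        category: first_lady / overall
        for category, (first_lady, overall) in shares.items()
    }
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


ranking = over_representation(shares)
```

A ratio above 1 means the category is more frequent in the First Lady's speeches than in the corpus as a whole; Nutrition, Arts and Education come out on top, in line with the issues listed on her White House page.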
Conclusions (i)
● The ability of citizens to easily retrieve the content of political speeches and decisions is a crucial factor in e-participation
● This is not guaranteed by traditional keyword search, as offered by most public administration websites (the White House website included)
● Example: in a keyword-based system, typing the word "education" returns only the videos that have the word "education" in their title
● All other terms that belong to the semantic area of education are omitted
Conclusions (ii)
● When documents are semantically classified through DBpedia URIs, all synonyms, hypernyms and hyponyms of a lemma are traced to the same concept, making user search more effective
● Leveraging Wikipedia categories would make it possible to go a step further, taking advantage of the links between concepts as designed by the Wikipedia community
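The difference between the two kinds of search can be illustrated with a toy catalogue. Everything in this sketch is hypothetical: the video titles, the topic assignments and the function names are made up for the example and are not taken from the White House corpus.

```python
EDUCATION = "http://dbpedia.org/resource/Education"

# Hypothetical catalogue: video title -> DBpedia topics assigned by a classifier.
videos = {
    "Remarks at a high school commencement": [EDUCATION],
    "Weekly address on education reform": [
        EDUCATION,
        "http://dbpedia.org/resource/No_Child_Left_Behind_Act",
    ],
    "Remarks on the economy": [
        "http://dbpedia.org/resource/American_Recovery_and_Reinvestment_Act_of_2009",
    ],
}


def keyword_search(word):
    """Return the titles that literally contain the word."""
    return [title for title in videos if word.lower() in title.lower()]


def concept_search(uri):
    """Return the titles of videos classified under the given DBpedia URI."""
    return [title for title, topics in videos.items() if uri in topics]


by_keyword = keyword_search("education")
by_concept = concept_search(EDUCATION)
```

The keyword search finds only the title that contains the word, while the concept search also returns the commencement speech, whose title never mentions education.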
Next steps
● Building a content search/navigation layer around the
scraping/classification module
● Integration with other Linked Open Data repositories on the
Web, combining the extracted topics with other information
(President Obama's federal budget proposal?)
Thank you!
Giuseppe Futia (giuseppe.futia@polito.it)
This paper was drafted in the context of the Network of Excellence in Internet Science EINS (GA n°288021) and, in particular, in relation to the activities concerning Evidence and Experimentation (JRA3).
Appendix - The algorithm
● The classifier needs to hold all the instances of the training set in memory and, during the classification stage, calculate the vector distance between training documents and target documents.
● Specifically, the algorithm used by TellMeFirst (TMF) is k-Nearest Neighbor (kNN), a memory-based approach which selects the categories for a target document on the basis of the k most similar documents within the vector space.
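The idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than TellMeFirst's implementation: it uses cosine similarity over TF-IDF vectors with a simple smoothed IDF chosen for the sketch, each training document stands for exactly one topic, and the training texts and function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (dicts) plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def knn_topics(target_tokens, training, k=3):
    """Return the labels of the k training documents most similar to the target."""
    labels = list(training)
    vectors, idf = tfidf_vectors([training[label] for label in labels])
    # Terms unseen in the training set get weight 0 and do not affect the ranking.
    target = {t: tf * idf.get(t, 0.0) for t, tf in Counter(target_tokens).items()}
    scored = sorted(
        ((label, cosine(target, vec)) for label, vec in zip(labels, vectors)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [label for label, score in scored[:k] if score > 0]


# Hypothetical training documents, one per topic.
training = {
    "dbpedia:Patient_Protection_and_Affordable_Care_Act": "health insurance coverage reform act".split(),
    "dbpedia:Deepwater_Horizon_oil_spill": "oil spill gulf mexico rig".split(),
    "dbpedia:Social_Security": "retirement benefits insurance payroll".split(),
}
topics = knn_topics("the oil spill in the gulf".split(), training, k=2)
```

Because every training document is held in memory and compared with the target at query time, no model is fitted in advance; this is what makes the approach instance-based.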
Appendix - Scoring formula
● In a Lucene query, both the target document and the training set become weighted term vectors, where terms are weighted by means of the TF-IDF algorithm. The query returns a list of documents in the form of DBpedia URIs, ordered by similarity score. The scoring formula is:
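The formula on this slide is not available as text. As a point of reference, and assuming TellMeFirst relies on Lucene's classic TF-IDF similarity (the slides do not say which similarity class is configured), Lucene documents its practical scoring function for a query q and a document d as:

```latex
\mathrm{score}(q,d) \;=\; \mathrm{coord}(q,d)\cdot \mathrm{queryNorm}(q)\cdot
\sum_{t \in q} \Big( \mathrm{tf}(t \in d)\cdot \mathrm{idf}(t)^{2}\cdot \mathrm{boost}(t)\cdot \mathrm{norm}(t,d) \Big)
```

In that classic similarity, tf(t in d) is the square root of the term frequency in d, and idf(t) is 1 + log(numDocs / (docFreq + 1)); coord rewards documents that match more query terms, queryNorm makes scores comparable across queries, and norm(t, d) carries field length and index-time boosts.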
Appendix - Basic concepts
● Natural Language Processing - A field of computer science,
concerned with the interactions between computers and human
(natural) languages.
● Linked Data - A recommended best practice for exposing, sharing,
and connecting pieces of data, information, and knowledge on the
Semantic Web using URIs and RDF.
● DBpedia - A crowd-sourced community effort to extract
structured information from Wikipedia and a central interlinking hub
for the Linking Open Data project.
