Utilizing Linked Open Data
                                  (LOD) Resources for
                             Semantic Enhancement of
                              User-Generated Content
                             Dong-Po Deng1,2, Guan-Shuo Mai3, Cheng-Hsin Hsu3,
                        Chin-Lung Chang1,4, Tyng-Ruey Chuang1, and Kwang-Tsao Shao3

                                   1ITC, University of Twente, Enschede, the Netherlands

                              2Institute of Information Science & 3Biodiversity Research Center,
                                              Academia Sinica, Taipei, Taiwan
                               4Department of Computer Science and Information Engineering
                                   National Taiwan University of Science and Technology
                                                      Taipei, Taiwan



Thursday, February 7, 2013
Outline
                    Background
                    Motivation
                    Data Collection
                    LOD resources - LODE and LOD TGN
                    An approach for processing UGC
                        Information Extraction
                        Information Formalization
                        Information Reuse
                    Concluding remarks




                                                        JIST2012   2012/12/3   2


Background
                    Web 2.0 technologies enable people to contribute
                     their own content to the web, e.g. wikis, blogs, tagging
                    Social media use Web 2.0 technologies to
                     support social interaction on the web, e.g. Twitter,
                     Flickr, Facebook
                    Content contributed by people to the web (and/or
                     social media) is called “User-Generated
                     Content” (UGC)
                    UGC is mainly multimedia or textual data
                    UGC is considered a potential resource for
                     scientific projects, e.g. citizen science

Background (cont.)
                    Harvesting UGC for scientific purposes poses
                     several problems
                        Unstructured UGC is difficult to handle
                        The semantics of UGC are often ambiguous and/or poor
                        Social media are not designed for scientific purposes




                                     Image courtesy of http://www.datenform.de/mapeng.html

Motivation
                    LOD datasets as resources
                        LOD aims to make data available on the Web and
                         to interconnect data in order to increase its value for
                         users
                        LOD projects comprise about 300 datasets with over
                         31 billion RDF triples
                    Each entry representing a fact in an LOD dataset has
                     a Uniform Resource Identifier (URI), which is
                     referenceable and linkable on the Web.
                    The high interconnectivity between entries
                     potentially increases the discoverability, reusability,
                     and utility of information

Motivation (cont.)
                    Therefore, if named entities in UGC can be
                     identified and connected to LOD entries, their
                     semantics can be disambiguated, which makes
                     the UGC easier to process.
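
As a rough illustration of the idea above, the following Python sketch links mentions in a post to LOD entries by plain dictionary lookup. The index contents, the `example.org` URIs, and the function name `link_entities` are hypothetical placeholders, not identifiers from LODE or LOD TGN, and the slides do not say that the authors used this exact lookup.

```python
# Hypothetical label -> URI index; a real index would be built from LOD datasets.
LABEL_TO_URI = {
    "Papilio polytes": "http://example.org/lode/taxon/papilio-polytes",
    "Taipei": "http://example.org/tgn/place/taipei",
}

def link_entities(text, index=LABEL_TO_URI):
    """Return (label, uri) pairs for every indexed label found in text."""
    found = []
    # Longer labels are checked (and reported) first.
    for label in sorted(index, key=len, reverse=True):
        if label in text:
            found.append((label, index[label]))
    return found

post = "Saw a Papilio polytes near Taipei this morning."
links = link_entities(post)
```

Once a mention carries a URI, two posts that spell the same species or place differently can be recognised as referring to the same entry.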




Data collection
                    Two Facebook interest groups for ecological
                     observations in Taiwan




                    http://www.facebook.com/groups/roadkilled/
                    http://www.facebook.com/groups/enjoymoths/



Ecological Observations on Facebook




LOD Ecology
                    Linked Open Data of Ecology (LODE) is a validated
                     dataset from an LOD project.
                    LODE integrates five previously distributed
                     databases:




          TFRI: Taiwan Forestry Research Institute



LODE in Linked Open Data Cloud




LOD Taiwan Geographic Name (TGN)
                    LOD TGN is mainly converted from the Taiwan
                     Gazetteer following LOD principles
                    LOD TGN has 159,241 geographic name entries, of
                     which 17,442 are linked to geonames.org




An approach for processing UGC
                    [Figure: the three stages of the approach: Information Extraction,
                     Information Formalization, Information Reuse]

Problems in Chinese species names in
                   Facebook ecological observations

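               (1) Full names follow an adjective + noun pattern, and posts often
                   keep only the leading characters:
                       曙鳳蝶 (Atrophaneura horishana)  ->  曙鳳
                       玉帶鳳蝶 (Papilio polytes)  ->  玉帶
                       琉璃紋鳳蝶 (Papilio hermosanus)  ->  琉璃

               (2) A shortened form can be ambiguous: 細紋 (pronounced Si-Wen,
                   meaning “fine veined”) is the prefix of 15 species names, e.g.
                       細紋黃鉤蛾
                       細紋蠍蛉
                       細紋新蠍蛉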
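
The two problems can be seen in a small Python sketch that looks up full names by prefix. This is only an assumed matching strategy for illustration: the name list holds the six names quoted on the slide rather than the full LODE species list, and `candidates_for` is a hypothetical helper.

```python
# The six species names quoted on the slide; a full list would come from LODE.
SPECIES_NAMES = ["細紋黃鉤蛾", "細紋蠍蛉", "細紋新蠍蛉", "曙鳳蝶", "玉帶鳳蝶", "琉璃紋鳳蝶"]

def candidates_for(short_name, names=SPECIES_NAMES):
    """Return every full name that starts with the shortened form."""
    return [name for name in names if name.startswith(short_name)]

unique = candidates_for("曙鳳")      # a single candidate, so no ambiguity
ambiguous = candidates_for("細紋")   # several candidates, needs disambiguation
```

A shortened form with one candidate can be linked directly, while one with several candidates needs extra evidence before a species can be chosen.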

Identifying shortened
                             species names




                                 Confidence value =




Determine a species name for a thread
                    What if several species
                     names are mentioned in
                     one thread? We used three
                     criteria:
                        How many Likes do the post and
                         its comments get?
                        How prestigious are the people
                         who post or comment?
                        How many times does a species
                         name occur in the thread?
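
One way to combine the three criteria is a weighted sum per species name, sketched below in Python. The slides do not state how the criteria are combined, so the additive form, the equal weights, the prestige values, and the sample thread are all assumptions made for illustration.

```python
from collections import defaultdict

def pick_species(mentions, w_likes=1.0, w_prestige=1.0, w_count=1.0):
    """Pick the best-scoring species name among the mentions in one thread.

    mentions: list of (species_name, likes, author_prestige) tuples, one per
    post or comment that mentions a species name.
    The weights are assumed values, not taken from the slides.
    """
    scores = defaultdict(float)
    for name, likes, prestige in mentions:
        # Each mention adds its Likes, its author's prestige,
        # and one unit for the occurrence itself.
        scores[name] += w_likes * likes + w_prestige * prestige + w_count
    return max(scores, key=scores.get)

# Made-up thread: two mentions agree, one comment suggests another species.
thread = [
    ("Theretra nessus", 5, 0.9),
    ("Theretra clotho", 1, 0.2),
    ("Theretra nessus", 2, 0.5),
]
best = pick_species(thread)
```

Changing the weights shifts the balance between community approval, author reputation, and repetition.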




The problems of geographic names in
                   Facebook ecological observations

                  An example:
                  The Endemic Species Research Institute
                 特有生物研究保育中心
                 Te-You-Sheng-Wu-Yan-Jiou-Bao-Yu-Jhong-Sin

                             is shortened to

                    特生中心
                    Te-Sheng-Jhong-Sin

                  There are no fixed rules for shortening long geographic names
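
Because no fixed shortening rule exists, one possible heuristic is to accept a short form when its characters occur in the full name in the same order. The Python sketch below shows that check. It is an assumed technique, not necessarily the one used in this work, and the two-entry gazetteer is illustrative only.

```python
def is_abbreviation(short, full):
    """True if the characters of short appear in full in the same order."""
    remaining = iter(full)
    # `ch in remaining` consumes the iterator, so character order is enforced.
    return all(ch in remaining for ch in short)

# Illustrative gazetteer; the real entries would come from LOD TGN.
GAZETTEER = ["特有生物研究保育中心", "生物多樣性研究中心"]

matches = [name for name in GAZETTEER if is_abbreviation("特生中心", name)]
```

Such a check can return several candidates for a very short form, so in practice it would need a further ranking step.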



Identifying shortened geographic
                   names




The ontology...
                    is based on the Facebook thread, an entity
                     composed of social media content involving
                     people, places, time periods, photos, and links to
                     other content
                    uses standard vocabularies:
                        Semantically-Interlinked Online Communities (SIOC) to
                         represent the structure of Facebook posts,
                         comments, and threads
                        Friend of a Friend (FOAF) to describe content
                         creators
                        Dublin Core to describe the interlinked content they created
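
To make the vocabulary mix concrete, the Python sketch below writes a few N-Triples lines for one post in one thread. The SIOC, FOAF, and Dublin Core terms are real, but the choice of properties, the `example.org` base URI, and the identifiers are assumptions, since the ontology diagram itself is not reproduced here.

```python
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical base URI

def uri(value):
    return f"<{value}>"

def literal(value):
    # Escape backslashes and double quotes; other characters are
    # left as-is in this minimal sketch.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def describe_post(thread_id, post_id, author_id, author_name, created):
    """Return N-Triples lines describing one post inside one thread."""
    thread = uri(BASE + "thread/" + thread_id)
    post = uri(BASE + "post/" + post_id)
    author = uri(BASE + "person/" + author_id)
    triples = [
        (thread, uri(RDF_TYPE), uri(SIOC + "Thread")),
        (post, uri(RDF_TYPE), uri(SIOC + "Post")),
        (post, uri(SIOC + "has_container"), thread),      # thread structure
        (post, uri(FOAF + "maker"), author),              # content creator
        (author, uri(FOAF + "name"), literal(author_name)),
        (post, uri(DCT + "created"), literal(created)),   # Dublin Core metadata
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]

lines = describe_post("t1", "p1", "u1", "Example User", "2012-12-03")
```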




An ontology for formalizing the extracted
                   information from Facebook threads




Transfer ecological observations in
                   Facebook to RDF




                     http://140.109.28.64:2020/page/thread/177883715557195_440860179259546

The extracted species name from the
                   Facebook thread is linked to LOD resources




A taxon of Theretra nessus is the
                   extracted species name




                   This entry is connected to LODE via owl:sameAs
The extracted place name from the
                   Facebook thread is linked to LOD resources




The entry of LOD TGN transferred from
                   Taiwan Gazetteer




                             It is linked to geonames.org via owl:sameAs
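
Since owl:sameAs is symmetric and transitive, a consumer can follow such links to collect every identifier that denotes the same thing. The Python sketch below does this with a simple graph traversal. All URIs in it are made-up `example.org` placeholders rather than real LOD TGN or geonames.org identifiers.

```python
from collections import defaultdict

def same_as_closure(links, start):
    """Return every URI reachable from start through owl:sameAs links."""
    neighbours = defaultdict(set)
    for a, b in links:
        # owl:sameAs is symmetric, so store both directions.
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Hypothetical owl:sameAs statements between three datasets.
links = [
    ("http://example.org/tgn/123", "http://example.org/geonames/456"),
    ("http://example.org/geonames/456", "http://example.org/other/789"),
]
equivalent = same_as_closure(links, "http://example.org/tgn/123")
```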


Publish the processed Facebook
                   ecological observations




A semantic annotation plug-in for entering
                   geographic names in Facebook posts




Concluding remarks
                    This study reports our experience in converting Facebook
                     ecological observations to RDF and interlinking them with LOD
                     resources (LODE and LOD TGN)
                    With these information extraction tools and LOD
                     resources, we developed a tool for semantic
                     enhancement of user input.

                    LOD TGN is an ongoing project.
                    In the future, we will consolidate the feature types
                     of the geographic names, and we plan to make
                     LOD TGN a geospatial semantics reference
                     resource.
Thank you for your attention

                             Questions?

                             deng@itc.nl




JIST 2012

  • 1. Utilizing Linked Open Data (LOD) Resources for Semantic Enhancement of User-Generated Content Dong-Po Deng1,2, Guan-Shuo Mai3, Cheng-Hsin Hsu3, Chin-Lung Chang1,4, Tyng-Ruey Chuang1, and Kwang-Tsao Shao3 1ITC, University of Twente, Enschede, the Netherlands 2Institute of Information Science & 3Biodiversity Research Center, Academia Sinica, Taipei, Taiwan 4Department of Computer Science and Information Engineering National Taiwan University of Science and Technology Taipei, Taiwan Thursday, February 7, 2013
  • 2. Outline  Background  Motivation  Data Collection  LOD resources - LODE and LOD TGN  An approach for processing UGC  Information Extraction  Information Formalization  Information Reuse  Conclusion remarking JIST2012 2012/12/3 2 Thursday, February 7, 2013
  • 3. Outline  Background  Motivation  Data Collection  LOD resources - LODE and LOD TGN  An approach for processing UGC  Information Extraction  Information Formalization  Information Reuse  Conclusion remarking JIST2012 2012/12/3 3 Thursday, February 7, 2013
  • 4. Background  Web 2.0 technologies enable people to contribute their content on the web, e.g. wiki, blog, tagging  Social media utilize web 2.0 technologies to support social interactive on the web, e.g. twitter, flickr, facebook  The content on the web (or/and social media) contributed by people is called “User-Generated Content” (UGC)  UGC is mainly multimedia or textual data  UGC is considered as a potential resource for scientific projects, e.g. citizen science JIST2012 2012/12/3 4 Thursday, February 7, 2013
  • 5. Background(cont.)  There are several problems to harvest UGC to scientific purposes  The unstructured UGC is difficult to handle  The semantics of UGC is often ambiguous or/and poor  Social media is not designed for scientific purposes Courtesy from http://www.datenform.de/mapeng.html JIST2012 2012/12/3 5 Thursday, February 7, 2013
  • 6. Outline  Background  Motivation  Data Collection  LOD resources - LODE and LOD TGN  An approach for processing UGC  Information Extraction  Information Formalization  Information Reuse  Conclusion remarking JIST2012 2012/12/3 6 Thursday, February 7, 2013
  • 7. Motivation  LOD datasets as resources  LOD aims on how to make data available on the Web, and to interconnect data with the aim of increasing its value for users  about 300 datasets consisting of over 31 billion RDF triples within LOD projects.  Each entry representing a fact in LOD datasets has a Unique Resource Identifier (URI) which is referenceable and linkable on the Web.  The high interconnectivity between entries potentially increases discoverability, reusability, and the utility of information JIST2012 2012/12/3 7 Thursday, February 7, 2013
  • 8. Motivation (cont.)  Therefore, if named entities of UGC can be identified and connected to entries of LOD, the semantics of named entities would be disambiguated, so that the UGC could be easier to process. JIST2012 2012/12/3 8 Thursday, February 7, 2013
  • 9. Outline  Background  Motivation  Data Collection  LOD resources - LODE and LOD TGN  An approach for processing UGC  Information Extraction  Information Formalization  Information Reuse  Conclusion remarking JIST2012 2012/12/3 9 Thursday, February 7, 2013
  • 10. Data collection  Two Facebook interest groups for ecological observations in Taiwan http://www.facebook.com/groups/roadkilled/ http://www.facebook.com/groups/enjoymoths/ JIST2012 2012/12/3 10 Thursday, February 7, 2013
  • 11. Ecological Observations on Facebook JIST2012 2012/12/3 11 Thursday, February 7, 2013
  • 12. Outline  Background  Motivation  Data Collection  LOD resources - LODE and LOD TGN  An approach for processing UGC  Information Extraction  Information Formalization  Information Reuse  Conclusion remarking JIST2012 2012/12/3 12 Thursday, February 7, 2013
  • 13. LOD Ecology  Linked Open Data of Ecology (LODE) is a validated dataset from a LOD project.  LODE integrated 5 previously distributed databases: TFRI: Taiwan Forestry Research Institute JIST2012 2012/12/3 13 Thursday, February 7, 2013
  • 14. LODE in Linked Open Data Cloud JIST2012 2012/12/3 14 Thursday, February 7, 2013
  • 15. LODE in Linked Open Data Cloud JIST2012 2012/12/3 14 Thursday, February 7, 2013
  • 16. LOD Taiwan Geographic Name (TGN)  LOD TGN is mainly transferred from Taiwan Gazetteer via LOD principles  LOD TGN has 159,241 geographic name entries, in which 17,442 entries are linked to geonames.org JIST2012 2012/12/3 15 Thursday, February 7, 2013
  • 17. Outline  Background  Motivation  Data Collection  LOD resources - LODE and LOD TGN  An approach for processing UGC  Information Extraction  Information Formalization  Information Reuse  Conclusion remarking JIST2012 2012/12/3 16 Thursday, February 7, 2013
  • 18. An approach for processing UGC Information Extraction Information Reuse Information Formalization JIST2012 2012/12/3 17 Thursday, February 7, 2013
  • 19. Outline  Background  Motivation  Data Collection  LOD resources - LODE and LOD TGN  An approach for processing UGC  Information Extraction  Information Formalization  Information Reuse  Conclusion remarking JIST2012 2012/12/3 18 Thursday, February 7, 2013
  • 20. Problems in Chinese species names in Facebook ecological observations 曙鳳蝶 (Atrophaneura Horishana) 曙鳳 (1) 玉帶鳳蝶 (Papilio Polytes) 玉帶 琉璃紋鳳蝶 (Papilio Hermosanus) 琉璃 Adjective Noun 細紋 (pronounced Si-Wen, meaning “fine veined” 細紋黃鉤蛾 (2) 細紋蠍蛉 細紋新蠍蛉 ...15 species names with prefix name “細紋” JIST2012 2012/12/3 19 Thursday, February 7, 2013
  • 21. Identifying shortened species names Confidence value = JIST2012 2012/12/3 20 Thursday, February 7, 2013
  • 22. Determine a species name for a thread  What if several species names had mentioned in one thread? We used three criteria  How many Like does the post or the comments get?  How prestigious are the people who post or make comments?  How many times does a species name occur in a thread? JIST2012 2012/12/3 21 Thursday, February 7, 2013
  • 23. The problems of geographic names in Facebook ecological observations An example: The Endemic Species Research Institute 特有生物研究保育中心 Te-You-Sheng-Wu-Yan-Jiou-Bao-Yu-Jhong-Sin is shorten to 特生中心 Te-Sheng-Jhong-Sin JIST2012 2012/12/3 22 Thursday, February 7, 2013
  • 24. The problems of geographic names in Facebook ecological observations. An example: the Endemic Species Research Institute, 特有生物研究保育中心 (Te-You-Sheng-Wu-Yan-Jiou-Bao-Yu-Jhong-Sin), is shortened to 特生中心 (Te-Sheng-Jhong-Sin). There are no rules for shortening long geographic names.
  • 25. Identifying shortened geographic names
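The slide shows the identification procedure only as a figure, so the method itself is not recoverable from the text. One plausible heuristic, offered here purely as an assumption, is to treat a short form as a candidate abbreviation when its characters appear in the full name in the same order, as 特生中心 does in 特有生物研究保育中心. The `GAZETTEER` list is a hypothetical two-entry stand-in for LOD TGN.

```python
def is_abbreviation(short, full):
    """True if the characters of `short` occur in `full` in the same order."""
    remaining = iter(full)
    # `ch in remaining` advances the iterator, so order is enforced.
    return all(ch in remaining for ch in short)


# Hypothetical gazetteer entries; the first is the example from the slide.
GAZETTEER = ["特有生物研究保育中心", "生物多樣性研究中心"]


def expand(short, names=GAZETTEER):
    """Return gazetteer names that `short` could be an abbreviation of."""
    return [n for n in names if n != short and is_abbreviation(short, n)]


print(expand("特生中心"))  # ['特有生物研究保育中心']
```

Because there are no fixed shortening rules, a subsequence test like this can only propose candidates; it will also match unrelated names that happen to contain the same characters.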
  • 26. The ontology is based on the Facebook thread, an entity composed of social media content involving people, places, time periods, photos, and links to other content. It uses standard vocabularies: Semantically-Interlinked Online Communities (SIOC) to represent the structure of Facebook posts, comments, and threads; Friend of a Friend (FOAF) to describe content creators; and Dublin Core to describe the interlinked content they created.
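To make the vocabulary choice concrete, the following sketch emits a few N-Triples for one thread using SIOC, FOAF, and Dublin Core terms. It is not the authors' ontology: the `http://example.org/` base URI, the identifiers, and the particular properties chosen are assumptions for illustration.

```python
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/"  # hypothetical base URI, not the project's


def uri(u):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{u}>"


def lit(s):
    """Format a plain string literal, escaping backslashes and quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def thread_triples(thread_id, post_id, author_id, author_name, created):
    """Describe one post in one thread, with its creator and creation date."""
    thread = uri(BASE + "thread/" + thread_id)
    post = uri(BASE + "post/" + post_id)
    author = uri(BASE + "person/" + author_id)
    return [
        (thread, uri(RDF_TYPE), uri(SIOC + "Thread")),
        (post, uri(RDF_TYPE), uri(SIOC + "Post")),
        (post, uri(SIOC + "has_container"), thread),   # thread structure (SIOC)
        (post, uri(DCT + "creator"), author),          # link to creator (Dublin Core)
        (post, uri(DCT + "created"), lit(created)),
        (author, uri(RDF_TYPE), uri(FOAF + "Person")),  # creator description (FOAF)
        (author, uri(FOAF + "name"), lit(author_name)),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(thread_triples("t1", "p1", "u1", "Alice", "2012-12-03")))
```

Comments would be modelled the same way as further `sioc:Post` resources in the same container.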
  • 27. An ontology for formalizing the extracted information from Facebook threads
  • 29. Transferring ecological observations on Facebook to RDF: http://140.109.28.64:2020/page/thread/177883715557195_440860179259546
  • 31. The species name extracted from the Facebook thread is linked to LOD resources
  • 36. The extracted species name corresponds to the taxon Theretra nessus. This entry is connected to LODE via owl:sameAs.
  • 37. The place name extracted from the Facebook thread is linked to LOD resources
  • 42. The LOD TGN entry, converted from the Taiwan Gazetteer, is linked to geonames.org via owl:sameAs.
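Following owl:sameAs links is what lets a consumer move from a place mentioned in a thread to the LOD TGN entry and on to geonames.org. Here is a minimal sketch of that traversal over an in-memory list of triples. All three URIs are hypothetical placeholders, and the property linking the thread's place to TGN is assumed to be owl:sameAs only for this illustration.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_closure(start, triples):
    """Return every resource reachable from `start` through owl:sameAs,
    treating the property as symmetric and transitive."""
    neighbours = {}
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            neighbours.setdefault(s, set()).add(o)
            neighbours.setdefault(o, set()).add(s)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen - {start}


# Hypothetical URIs standing in for a thread's place, a TGN entry, and a GeoNames entry.
THREAD_PLACE = "http://example.org/thread/1#place"
TGN_ENTRY = "http://example.org/tgn/123"
GEONAMES_ENTRY = "http://sws.geonames.org/0000000/"

triples = [
    (THREAD_PLACE, OWL_SAME_AS, TGN_ENTRY),
    (TGN_ENTRY, OWL_SAME_AS, GEONAMES_ENTRY),
]
print(sorted(same_as_closure(THREAD_PLACE, triples)))
```

The same traversal applies to species names, where the extracted taxon entry points to LODE.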
  • 43. Publish the processed Facebook ecological observations
  • 45. A semantic annotation plug-in for entering geographic names in Facebook posts
  • 50. Concluding remarks: This study reports our experience in transforming Facebook ecological observations so that they interlink with LOD resources (LODE and LOD TGN). With these information extraction tools and LOD resources, we developed a tool for semantic enhancement of user input. The LOD TGN is an ongoing project. In the future, we will consolidate the feature types of the geographic names, and we plan to make the LOD TGN a geospatial semantics reference resource.
  • 51. Thank you for your attention. Questions? deng@itc.nl