Chalice – Linked Data and Historic Place-names
Jo Walsh, Kate Byrne, Richard Tobin, Claire Grover
Nostalgia ->
Overview of the Edinburgh Geoparser
A system to automatically recognise place names in text and disambiguate them with respect to a gazetteer (e.g. Athens, Springfield).
Patchy development over the past few years, funded by a variety of projects and applied to a range of data sets:
  GeoCrossWalk BOPCRIS
  GeoDigRef (Histpop, BOPCRIS, BL)
  Embedding GeoCrossWalk (Stormont Papers)
  SYNC3 (online news)
  Chalice (EPNS)
  Unlock
The main concern has been to keep it generally usable while applying it to specific data sets.
Overview of the Edinburgh Geoparser (pipeline diagram)
Input formats: .txt, .html, .xml
Geotagging: Format conversion, Tokenisation, POS tagging, Lemmatisation, Named Entity Recognition -> .geotagged.xml
Georesolution: Gazetteer lookup, Resolution -> .geotagged.xml, .gaz.xml
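The two stages in the diagram can be illustrated with a toy sketch. It is not the Edinburgh Geoparser's code: the gazetteer, its rough coordinates, the dictionary lookup standing in for named entity recognition, and the "prefer the candidate nearest to unambiguous places in the same text" heuristic are all assumptions made for illustration.

```python
import re

# Toy gazetteer: name -> candidate (label, latitude, longitude) tuples.
# Coordinates are rough illustrative values, not real gazetteer records.
TOY_GAZETTEER = {
    "Athens": [("Athens, Greece", 37.98, 23.73),
               ("Athens, Georgia, USA", 33.96, -83.38)],
    "Atlanta": [("Atlanta, Georgia, USA", 33.75, -84.39)],
}


def tokenise(text):
    """Split text into alphabetic tokens."""
    return re.findall(r"[A-Za-z]+", text)


def geotag(tokens):
    """Stand-in for NER: keep tokens that are gazetteer names."""
    return [t for t in tokens if t in TOY_GAZETTEER]


def georesolve(names):
    """Pick one candidate per name, preferring candidates close to the
    unambiguous places mentioned in the same text (assumed heuristic)."""
    anchors = [TOY_GAZETTEER[n][0] for n in names if len(TOY_GAZETTEER[n]) == 1]
    resolved = {}
    for name in names:
        candidates = TOY_GAZETTEER[name]
        if anchors:
            def spread(c):
                # Sum of squared degree differences to every anchor.
                return sum((c[1] - a[1]) ** 2 + (c[2] - a[2]) ** 2 for a in anchors)
            resolved[name] = min(candidates, key=spread)[0]
        else:
            # No context available: fall back to the first candidate.
            resolved[name] = candidates[0][0]
    return resolved


def geoparse(text):
    """Run the toy geotagging and georesolution steps in order."""
    return georesolve(geotag(tokenise(text)))


print(geoparse("The train left Atlanta for Athens"))
```

With Atlanta in the same sentence the sketch resolves Athens to the candidate in Georgia; with no context it falls back to the first gazetteer entry.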
 
 
Chalice
Connecting Historical Authorities with Linked Data, Contexts, and Entities.
Part of jiscEXPO - "exposing digital content for education and research".
The project is exploring the viability of creating a historical gazetteer from digitized volumes of the English Place-Name Society (EPNS).
Partners: CDDA, Queen's University Belfast; School of Informatics, Edinburgh; EDINA, Edinburgh; CeRch, King's College London
English Place-Name Survey
At the Institute for Name-Studies in Nottingham
80+ volumes covering English counties
Over 1000 years of place-name history
Started in 1925 and still going!
 
Archaeology and Place-names and History
"The first point, already noted repeatedly but so important that it cannot be too strongly emphasised, is that historical evidence is documentary and therefore direct evidence only of a state of mind; that archaeological evidence is material and therefore direct evidence only of practical skills, technological processes, aesthetic interests and physical sequences; and that place-name evidence is linguistic and therefore direct evidence only of language and speech habits. Indirect inferences may be drawn in each case, and the evidence of place-names may be used to throw light on the date, nature and extent of settlements, on the movements of peoples and their relationships to each other, on certain aspects of their organisation and on many of the other problems that concern the historian and the archaeologist. But in all these cases the inferences depend to some extent on assumptions and they must be examined carefully before they are accepted as valid." – F.T. Wainwright
Chalice data
Cheshire:
  Cheshire Part I. EPNS Volume 44, 1970
  Cheshire Part II. EPNS Volume 45, 1970
  Cheshire Part III. EPNS Volume 46, 1971
  Cheshire Part IV. EPNS Volume 47, 1972
  Cheshire Part V (1:i). EPNS Volume 48, 1981
  Cheshire Part V (1:ii). EPNS Volume 54, 1981
Small samples from: Berkshire, Buckinghamshire (Vol. 2), Cambridgeshire (Vol. 19), Derbyshire (Vols. 27-29), Hertfordshire (Vol. 15)
Shropshire: Pimhill Hundred (born digital)
EPNS
Parishes are organised by the hundred to which they belong.
Towns and villages are referred to as townships and are organised by the parish to which they belong.
Township descriptions often include buildings, bridges, lanes, woods and farms.
River and major road names are described separately from the inhabited places.
Entries give the names and spellings attested in historical sources, and the etymology of names or name parts.
In Chalice we focus on capturing parishes, townships, sub-townships and attestations.
 
The start of the entry for the township of Willaston in the parish of Neston in Wirral Hundred.
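The hundred, parish and township nesting described for EPNS could be modelled along the following lines. This is a minimal sketch, not the Chalice data model: the class and field names are hypothetical, and the attestation list is left empty because the entry text itself is not reproduced in the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attestation:
    """One historical spelling of a name, with its date and source."""
    form: str
    date: str
    source: str


@dataclass
class Place:
    """A node in the hundred > parish > township hierarchy."""
    name: str
    level: str  # "hundred", "parish", "township" or "sub-township"
    parent: Optional["Place"] = None
    attestations: List[Attestation] = field(default_factory=list)

    def path(self):
        """Walk up the hierarchy from this place to its hundred."""
        node, parts = self, []
        while node is not None:
            parts.append(f"{node.name} ({node.level})")
            node = node.parent
        return " < ".join(parts)


# The example named on the slide: Willaston in Neston in Wirral Hundred.
wirral = Place("Wirral", "hundred")
neston = Place("Neston", "parish", parent=wirral)
willaston = Place("Willaston", "township", parent=neston)

print(willaston.path())
```

Keeping a parent pointer on each place means a township can always be traced back to its parish and hundred, which is how the volumes themselves are organised.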
 
 
 
 
Turtle-ish version
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix gn: <http://www.geonames.org/ontology#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix chalice: <http://made.up.domain.name/chalice/> .
# Placeholder namespace for the local names below; it is not on the original slide.
@prefix : <http://made.up.domain.name/chalice/id/> .

:Bosley a chalice:Place ;
    dc:title "Bosley" ;
    owl:sameAs <http://sws.geonames.org/2655141/> .

:Boselega a chalice:PlaceName ;
    dc:title "Boselega" .

:attested a chalice:PlaceNameAttestation ;
    chalice:place :Bosley ;
    chalice:known_as :Boselega ;
    chalice:source :DB ;
    chalice:date 1086 .

:DB a chalice:Source ;
    dc:title "Domesday Book" .
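As a rough illustration of how such statements might be generated from extracted records, the sketch below builds the same pattern with plain string formatting. The function name and the naming scheme for the attestation node are hypothetical, the prefix declarations are omitted, and no escaping is done, so it only suits simple single-word names like the ones on the slide.

```python
def attestation_turtle(place, form, source_id, source_title, year):
    """Return Turtle-style statements for one attested name form.

    Uses the chalice: properties shown on the slide. The attestation
    node name (place_form_year) is an assumed convention.
    """
    node = f":{place}_{form}_{year}"
    return "\n".join([
        f":{place} a chalice:Place ;",
        f'    dc:title "{place}" .',
        f":{form} a chalice:PlaceName ;",
        f'    dc:title "{form}" .',
        f":{source_id} a chalice:Source ;",
        f'    dc:title "{source_title}" .',
        f"{node} a chalice:PlaceNameAttestation ;",
        f"    chalice:place :{place} ;",
        f"    chalice:known_as :{form} ;",
        f"    chalice:source :{source_id} ;",
        f"    chalice:date {year} .",
    ])


ttl = attestation_turtle("Bosley", "Boselega", "DB", "Domesday Book", 1086)
print(ttl)
```

A real pipeline would more likely use an RDF library to handle escaping and serialisation; the string version is only meant to show the shape of one attestation.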
Linking Data
A URI for each place-name
Links to information about each attestation
Links to nearby places
Links to other sources of place-name references:
  Geonames.org (variable quality, wide usage)
  Ordnance Survey Open Data (also variable quality)
Then links from and between documentary sources
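One way to give each place-name a URI is to derive it from the hundred, parish and township names. The sketch below assumes that approach; the base URI reuses the made-up placeholder domain from the slides, and the slug rules are hypothetical rather than anything Chalice is stated to have adopted.

```python
import re
import unicodedata

# Placeholder base URI (the made-up domain used in the slides).
BASE = "http://made.up.domain.name/chalice/"


def slug(name):
    """Lower-case a name, drop non-ASCII characters, hyphenate the rest."""
    ascii_name = (unicodedata.normalize("NFKD", name)
                  .encode("ascii", "ignore")
                  .decode("ascii"))
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def mint_uri(*hierarchy):
    """Build a URI from a sequence such as (hundred, parish, township)."""
    return BASE + "/".join(slug(part) for part in hierarchy)


print(mint_uri("Wirral Hundred", "Neston", "Willaston"))
```

Readable, hierarchy-based URIs are easy to inspect but change if a place is later reassigned to another parish, so opaque identifiers may be a better fit where long-term stability matters most.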
 
 
 
Issues
OCR quality needs to be high: not just recognising characters correctly but getting font and layout information right.
Layout and font are used inconsistently to indicate structure.
Different volumes reflect different decisions about where place-name information should be put.
Consider long-term preservation of URIs.
Need to share vocabularies with other projects (Pleiades, SPQR, geodataverse?)
Integrating (with) other sources
Series of use cases by Stuart Dunn at KCL:
  Victoria County History
  Clergy of the Church of England Database
  Archaeology Data Service
 
 
 
 
GAP & Ancient Place-names
Based on the Pleiades set of ancient place names but extended in two ways: by matching Pleiades place names against GeoNames place names in the same location and adding the GeoNames alternative names to the Pleiades+ list. For example, this adds three alternative names for the single Pleiades entry for "Autricum" ("Chartrez", "Chartres", "Shartr"), because "Autricum" is present in both Pleiades and GeoNames, with the same approximate location.
(We don't want to simply take places directly from GeoNames because, when we tried it, we were swamped with irrelevant modern places having names corresponding to ancient toponyms.)
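The matching step can be sketched as follows, assuming "same approximate location" means "within a fixed great-circle distance". The 20 km threshold, the record layout, the rough coordinates and the invented "Decoyville" record are all assumptions for illustration; only the Autricum names come from the slide.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def build_pleiades_plus(pleiades, geonames, max_km=20.0):
    """Attach GeoNames alternative names to Pleiades places that match
    on both name and approximate location."""
    plus = {}
    for p in pleiades:
        alternatives = set()
        for g in geonames:
            names = {g["name"], *g["alternatenames"]}
            close = haversine_km(p["lat"], p["lon"], g["lat"], g["lon"]) <= max_km
            if p["name"] in names and close:
                alternatives |= names - {p["name"]}
        plus[p["name"]] = sorted(alternatives)
    return plus


# Illustrative records; coordinates are approximate.
pleiades = [{"name": "Autricum", "lat": 48.45, "lon": 1.49}]
geonames = [
    {"name": "Chartres", "alternatenames": ["Autricum", "Chartrez", "Shartr"],
     "lat": 48.44, "lon": 1.48},
    # Invented far-away place that shares the ancient name.
    {"name": "Decoyville", "alternatenames": ["Autricum"],
     "lat": 40.0, "lon": -100.0},
]

print(build_pleiades_plus(pleiades, geonames))
```

Requiring the location to agree as well as the name is what keeps the invented far-away record out of the result, which mirrors the concern about being swamped by modern places with ancient-sounding names.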
Pleiades+(+)
Pleiades+: get alternative names for places that match in GeoNames.
Pleiades++ is a runtime supercharging bit: if place X isn't in Pleiades+, look at the "synonym ring" of alternative names in GeoNames and try all of those against Pleiades+.

mysql> select distinct p.name, p.plid, p.geonameId, p.fclass, p.fcode, p.country,
           p.latitude, p.longitude, p.population, p.normname
       from plplus p
       join geonames.alternatename a on p.name = a.alternatename
       join geonames.geoname g on a.geonameid = g.geonameid
       join geonames.alternatename a2 on a2.geonameid = g.geonameid
       where a2.alternatename = "Egypt";
+----------+---------+-----------+--------+-------+---------+------------+------------+------------+----------+
| name     | plid    | geonameId | fclass | fcode | country | latitude   | longitude  | population | normname |
+----------+---------+-----------+--------+-------+---------+------------+------------+------------+----------+
| Aegyptus |     766 |         0 |        |       |         | 32.5000000 | 32.5000000 |          0 | aegyptus |
| Aegyptus |  981503 |         0 |        |       |         | 27.5000000 | 26.5476190 |          0 | aegyptus |
| Aigyptos | 1001943 |         0 |        |       |         | 32.5000000 | 32.5000000 |          0 | aigyptos |
+----------+---------+-----------+--------+-------+---------+------------+------------+------------+----------+
3 rows in set (0.05 sec)
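The runtime fallback can be sketched in memory as below. The dictionary and list structures are assumptions standing in for the plplus and geonames tables; the names and plid values are the ones shown in the query output.

```python
def lookup_plusplus(name, pleiades_plus, synonym_rings):
    """Look a name up in Pleiades+; if absent, try every name in any
    GeoNames synonym ring that contains it."""
    if name in pleiades_plus:
        return list(pleiades_plus[name])
    hits = []
    for ring in synonym_rings:
        if name in ring:
            for synonym in ring:
                for plid in pleiades_plus.get(synonym, []):
                    if plid not in hits:
                        hits.append(plid)
    return hits


# Pleiades+ names mapped to plid values (taken from the query output).
pleiades_plus = {"Aegyptus": [766, 981503], "Aigyptos": [1001943]}

# One synonym ring: alternative names sharing a GeoNames record.
synonym_rings = [["Egypt", "Aegyptus", "Aigyptos"]]

print(lookup_plusplus("Egypt", pleiades_plus, synonym_rings))
```

"Egypt" is not itself a Pleiades+ name, but its ring contains two names that are, so the lookup returns the same three plid values as the SQL query.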
A Linked Data Focus on Open Scholarship
Thanks
http://chalice.blogs.edina.ac.uk
http://unlock.edina.ac.uk/text.html

Chalice / Edinburgh Geoparser at GISRUK with extra slides
