WP3: Linguistics – Text
Sjef Barbiers
Partners
INT, MI, RU, RUG, UU, VU

Today’s demos
Integrating Diachronous Conceptual Lexicons through Linked Open Data – VU
GrETEL, PaQu: Searching Treebanks – RUG / UU / Leuven
MIMORE in Nederlab: Morphosyntactic dialect research – Meertens

Integrating Diachronous Conceptual Lexicons through Linked Open Data

Isa Maks (1), Marieke van Erp (1), Piek Vossen (1), Rinke Hoekstra, Nicoline van der Sijs
1: Faculty of Humanities, Vrije Universiteit Amsterdam (WP3)
2: Computer Science Department, Vrije Universiteit Amsterdam (WP4)
3: Meertens Instituut Amsterdam (WP2)

Integration and enrichment of several existing historical conceptual lexicons, matching the ontologies, using linked open data principles.
Enables:
•  tracing changes in word meanings and concepts over time
•  query expansion
•  natural language processing of historical Dutch texts

[Diagram: data model of the lexicons]
Classes: ontolex:LexicalEntry, ontolex:Form, ontolex:LexicalSense, ontolex:LexicalConcept, lemon:SenseDefinition, lemon-cltl:Usage, lemon-cltl:SpatioTemporalScope, lexinfo:Register, penn:Tag, skos:Concept, dbo:Place, dbo:Thing
Properties: rdfs:label, olia:hasTag, ontolex:canonicalForm, ontolex:otherForm, ontolex:sense, ontolex:isSenseOf, ontolex:definition, ontolex:usage, ontolex:reference, lemon-cltl:scope, lemon-cltl:periodStart, lemon-cltl:periodEnd, lemon-cltl:geographicArea, lexinfo:register, skos:prefLabel, skos:related, dct:subject
Datatypes: xsd:string, xsd:date
Prefixes:
ontolex: http://www.w3.org/ns/lemon/ontolex#
lexinfo: http://www.lexinfo.net/ontology/2.0/lexinfo#
penn: http://purl.org/olia/penn.owl#
olia: http://purl.org/olia/olia.owl#
xsd: http://www.w3.org/2001/XMLSchema#
skos: http://www.w3.org/2004/02/skos/core#
dct: http://purl.org/dc/terms/
dbo: http://dbpedia.org/ontology/
lemon-cltl: additional modelling (in progress)

Ontology or classification
Which concept? Is it a plant, an occupation, an emotion, etc.?

Words
Which words can express these concepts? Part of speech, form variants, spelling variants?

When
In which period are these words used?

Where
In which part of the Netherlands or Belgium is this word used?

Provenance
Which source provided the information?

Modelling the lexicons as linked open data
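
The questions above can be made concrete with a few triples. The following is a minimal sketch, not the project's actual data: it writes one entry (the word heikeuter) as subject-predicate-object triples using class and property names from the data model, then expands the prefixed names to full IRIs. The `ex:` and `lemon-cltl:` namespace IRIs are hypothetical placeholders (the slides give no IRI for lemon-cltl), and the direction of the lemon-cltl properties is an assumption. Period dates and places would attach to the scope node, but are left out because the slides give no values.

```python
# Namespace prefixes. "ontolex" and "rdf" are the published namespaces;
# "lemon-cltl" and "ex" are HYPOTHETICAL placeholders for illustration only.
PREFIXES = {
    "ontolex": "http://www.w3.org/ns/lemon/ontolex#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "lemon-cltl": "http://example.org/lemon-cltl#",
    "ex": "http://example.org/lexicon/",
}

# One lexical entry as (subject, predicate, object) triples of prefixed names.
TRIPLES = [
    # Words: the entry and its canonical form
    ("ex:heikeuter", "rdf:type", "ontolex:LexicalEntry"),
    ("ex:heikeuter", "ontolex:canonicalForm", "ex:heikeuter-form"),
    ("ex:heikeuter-form", "rdf:type", "ontolex:Form"),
    # Which concept: the sense points at a concept (hypothetical identifier)
    ("ex:heikeuter", "ontolex:sense", "ex:heikeuter-sense"),
    ("ex:heikeuter-sense", "rdf:type", "ontolex:LexicalSense"),
    ("ex:heikeuter-sense", "ontolex:reference", "ex:small-farmer"),
    # When / where: usage with a spatio-temporal scope (direction assumed)
    ("ex:heikeuter-sense", "ontolex:usage", "ex:heikeuter-usage"),
    ("ex:heikeuter-usage", "rdf:type", "lemon-cltl:Usage"),
    ("ex:heikeuter-usage", "lemon-cltl:scope", "ex:heikeuter-scope"),
    ("ex:heikeuter-scope", "rdf:type", "lemon-cltl:SpatioTemporalScope"),
]


def expand(curie):
    """Turn a prefixed name such as 'ontolex:sense' into a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(triples):
    """Serialise prefixed-name triples as N-Triples lines (IRIs only)."""
    return [
        "<{}> <{}> <{}> .".format(*(expand(term) for term in triple))
        for triple in triples
    ]


if __name__ == "__main__":
    for line in to_ntriples(TRIPLES):
        print(line)
```

Keeping the triples as plain tuples avoids any dependency; a real pipeline would more likely use an RDF library and a triple store.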
Resources
1600    EmbodiedEmotions    https://www.esciencecenter.nl/project/from-sentiment-mining-to-mining-embodied-emotions    emotions
1650    Meijers    Meijers Woordenschat (1669)    all domains
1800    HISCO    http://historyofwork.iisg.nl/    occupation
1850    Brouwers    Brouwers Thesaurus (1987)    all domains
1885    Pland    https://www.meertens.knaw.nl/pland/    plants
1950    ODWN    http://www.cltl.nl/results/demos/open-source-dutch-wordnet/    all domains
Other resources will be added in the future.
Query expansion
Finding occupations in historic texts: ‘small farmers’

En van de schamelheid zijner plaggen had er de heikeuter nog eerst den langen weg te gaan tot de burgers van Venlo, eer hij de winst van zijn arbeid ingeruild zag tegen 't noodige voor een schraal bestaan. (Felix Rutten, 1918, Ons mooie Limburg, DBNL)

HISCO [occupation-65111-small farming]
kleinboer
kleinlandbouwer
keuterboer
…

Brouwers [concept?]
keuterboer
heikeuter
landbouwer
…
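
The expansion step above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the two lexicons are plain dictionaries holding only the words shown on the slide (the elided entries are left out), the Brouwers concept key is a placeholder because the slide leaves it unnamed, and words are linked across lexicons simply by identical spelling.

```python
# Word lists copied from the slide; elided entries ("…") are not filled in.
HISCO = {
    "occupation-65111-small farming": {"kleinboer", "kleinlandbouwer", "keuterboer"},
}
# The slide leaves the Brouwers concept unnamed; "concept?" is a placeholder key.
BROUWERS = {
    "concept?": {"keuterboer", "heikeuter", "landbouwer"},
}

SENTENCE = (
    "En van de schamelheid zijner plaggen had er de heikeuter nog eerst den "
    "langen weg te gaan tot de burgers van Venlo, eer hij de winst van zijn "
    "arbeid ingeruild zag tegen 't noodige voor een schraal bestaan."
)


def expand(term, lexicons):
    """Return the term plus every word reachable through shared concepts.

    A word found in a concept of any lexicon pulls in all other words of
    that concept; the search repeats until no new words turn up.
    """
    found = {term}
    frontier = [term]
    while frontier:
        word = frontier.pop()
        for lexicon in lexicons:
            for words in lexicon.values():
                if word in words:
                    new = words - found
                    found |= new
                    frontier.extend(new)
    return found


def matches(text, terms):
    """Return the expanded terms that occur as tokens in the text."""
    tokens = {tok.strip(".,;:!?'\"()").lower() for tok in text.split()}
    return sorted(tokens & terms)


if __name__ == "__main__":
    terms = expand("kleinboer", [HISCO, BROUWERS])
    print(sorted(terms))
    print(matches(SENTENCE, terms))
```

Starting from the HISCO word kleinboer, the shared word keuterboer bridges to the Brouwers concept, so the query also finds heikeuter in the 1918 sentence. Following every link transitively also pulls in broader words such as landbouwer, so a real system would probably need to limit or rank the expansion.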
GrETEL, PaQu
Gertjan van Noord (RUG) – Jan Odijk (UU)
•  Web applications: search in treebanks
–  Treebank = text in which each sentence has a syntactic parse
•  With interfaces designed for linguists
•  Enables syntactic research
•  Applications language-independent but need language-specific components
–  PaQu, GrETEL: Dutch only
–  PolyGrETEL: multiple languages
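
To illustrate what searching a treebank adds over plain text search, here is a minimal sketch using Python's standard XML library. The two parsed sentences are invented, and the XML layout (nested `node` elements with `rel`, `pos`, `word` and `lemma` attributes) is only loosely modelled on Alpino-style output; it is an assumption here, not the format GrETEL or PaQu necessarily expose.

```python
import xml.etree.ElementTree as ET

# Toy treebank with two invented sentences:
#   s1: "ze komen"     (they come)     -> "ze" is the subject
#   s2: "hij ziet ze"  (he sees them)  -> "ze" is the direct object
TREEBANK = """
<treebank>
  <sentence id="s1">
    <node cat="smain" rel="--">
      <node rel="su" pos="pron" word="ze" lemma="ze"/>
      <node rel="hd" pos="verb" word="komen" lemma="komen"/>
    </node>
  </sentence>
  <sentence id="s2">
    <node cat="smain" rel="--">
      <node rel="su" pos="pron" word="hij" lemma="hij"/>
      <node rel="hd" pos="verb" word="ziet" lemma="zien"/>
      <node rel="obj1" pos="pron" word="ze" lemma="ze"/>
    </node>
  </sentence>
</treebank>
"""


def sentences_with_subject(root, lemma):
    """Return ids of sentences in which `lemma` occurs with relation 'su'."""
    query = f".//node[@rel='su'][@lemma='{lemma}']"
    return [s.get("id") for s in root.iter("sentence") if s.find(query) is not None]


def sentences_with_word(root, lemma):
    """Return ids of sentences containing `lemma` in any syntactic role."""
    query = f".//node[@lemma='{lemma}']"
    return [s.get("id") for s in root.iter("sentence") if s.find(query) is not None]


root = ET.fromstring(TREEBANK)

if __name__ == "__main__":
    print(sentences_with_word(root, "ze"))     # any role
    print(sentences_with_subject(root, "ze"))  # subject only
```

A word search returns both sentences, while the query on the subject relation returns only the first. Counts of a pronoun "as subject" depend on this kind of distinction.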
Development Plan & Status

                      GrETEL       PaQu
Base                  CLARIN-NL    CLARIN-NL
Own Corpus            CLARIAH      CLARIN-NL
Metadata              CLARIAH      CLARIAH
Analysis Component    CLARIAH      CLARIN-NL + CLARIAH
More formats          CLARIAH      CLARIAH
Interface             CLARIAH      CLARIN-NL
More Corpora          CLARIAH      CLARIAH

GREEN = done    ORANGE = partial    RED = to do
Research Done
•  PhD on verb clustering in Dutch: Augustinus (2015)
•  acquisition of the words zeer, heel, erg (‘very’): Odijk (2015, 2016)
•  normative and non-normative variants of 12 Dutch constructions: Odijk (2015), van Noord & Odijk (2016)
•  agreement in copular constructions: Van Eynde et al. (2016)
Hun – Zij/Ze as subject

Per million words    Written    Spoken
hun                  0          20
zij (pl.)            343        360
ze (pl.)             1481       4107

•  Hun very rare, only in spoken corpus, only in NL, only in unprepared speech (a-i)
Hem/’m – Hij/ie as subject
•  ’m rare, only in spoken corpus, only in Flanders, only in unprepared speech (a-h)

Per million words    Written    Spoken
hem/’m               0          101
hij                  2703       2686
ie                   55         1919
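
To put the two frequency tables in proportion, the per-million figures can be turned into shares. The sketch below only does arithmetic on the numbers shown on the slides; treating the three listed forms in each table as the complete set of competing subject forms is an assumption made for illustration.

```python
# Frequencies per million words as (written, spoken), copied from the slides.
PLURAL_SUBJECT = {"hun": (0, 20), "zij": (343, 360), "ze": (1481, 4107)}
MASCULINE_SUBJECT = {"hem/'m": (0, 101), "hij": (2703, 2686), "ie": (55, 1919)}

WRITTEN, SPOKEN = 0, 1


def share(table, form, column):
    """Percentage of `form` among all forms listed in `table` for one column."""
    total = sum(freqs[column] for freqs in table.values())
    return 100 * table[form][column] / total


if __name__ == "__main__":
    for table, form in ((PLURAL_SUBJECT, "hun"), (MASCULINE_SUBJECT, "hem/'m")):
        written = share(table, form, WRITTEN)
        spoken = share(table, form, SPOKEN)
        print(f"{form}: written {written:.2f}%, spoken {spoken:.2f}%")
```

On these figures, hun accounts for roughly 0.45% of the listed plural subject forms in the spoken corpus and hem/'m for roughly 2.15% of the listed masculine forms, against 0% for both in the written corpus.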
References
Augustinus, L. (2015). Complement Raising and Cluster Formation in Dutch: A treebank-supported investigation. PhD thesis, KU Leuven, Belgium.
Odijk, J. (2015). ‘Linguistic Research with PaQu’. Computational Linguistics in the Netherlands Journal 5, pp. 3-14.
Odijk, J. (2016). ‘A Use Case for Linguistic Research on Dutch with CLARIN’. In K. De Smedt (ed.), Selected Papers from the CLARIN Annual Conference 2015, pp. 45-61.
Odijk, J. (2015). ‘Zoeken naar Constructies’. Presentation and poster at the DRONGO Language Festival, Utrecht, 26 September 2015.
Noord, G. van & J. Odijk (2016). ‘Goed of Fout: Wat gebruikt men feitelijk?’. Presentation at the ‘Grote Taaldag’ (TIN), Utrecht, 6 February 2016.
Van Eynde, F., L. Augustinus & V. Vandeghinste (2016). ‘Number agreement in copular constructions: A treebank-based investigation’. To appear in Lingua. doi:10.1016/j.lingua.2016.02.001
Upgrade MIMORE
Marc Kemps-Snijders - Sjef Barbiers (Meertens)
•  Morphosyntactic research tool for three Dutch dialect databases (CLARIN)
•  Integration into Nederlab: Interface - MTAS
•  Integration of the SAND maps from SAND Volumes I and II
•  Workspace with operations on virtual collections
Search for subject doubling in the Syntactic Atlas of the Dutch Dialects (SAND)
Show the data underlying the map of subject doubling, second person singular
Save as corpus
Search for a potentially correlating property in a different database (DIDDD): article-demonstrative sequences
POS tag specification of the search: D(art,def) followed by D(dem,def)
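
A search of this kind amounts to matching a sequence of POS tags against tagged tokens. The sketch below shows the idea on a single invented utterance; the tokens, the tags other than D(art,def) and D(dem,def), and the function name are hypothetical and not taken from DIDDD or MIMORE.

```python
# Hypothetical tagged utterance: (token, POS tag) pairs. Only the two D(...)
# tags come from the slide; the rest is invented for illustration.
TAGGED = [
    ("de", "D(art,def)"),
    ("die", "D(dem,def)"),
    ("man", "N"),
    ("ziet", "V"),
    ("de", "D(art,def)"),
    ("hond", "N"),
]

PATTERN = ["D(art,def)", "D(dem,def)"]


def find_tag_sequences(tagged, pattern):
    """Return (start index, tokens) for every run of tokens whose tags
    match `pattern` exactly, scanning with a sliding window."""
    hits = []
    width = len(pattern)
    for start in range(len(tagged) - width + 1):
        window = tagged[start:start + width]
        if [tag for _, tag in window] == pattern:
            hits.append((start, [token for token, _ in window]))
    return hits


if __name__ == "__main__":
    for start, tokens in find_tag_sequences(TAGGED, PATTERN):
        print(start, " ".join(tokens))
```

Only the first "de" is reported, because the second one is followed by a noun rather than a demonstrative.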
Result list with, among other things, KWIC concordance and POS tags
Save as corpus
Saved data sets in workspace
Geographic distribution of the two phenomena

2016 05-20-clariah-wp3