Paul Walk
Director, Antleaf
Managing Director, Dublin Core Metadata Initiative (DCMI)
Web: http://www.paulwalk.net
Email: paul@paulwalk.net
Twitter: @paulwalk
www.antleaf.com www.dublincore.org
Sharing profiles: Documenting profiles and
vocabularies on the Web
is it more important that
application profiles are
machine-friendly, or user-friendly?
the specific challenge:
how to manage & publish the Dublin Core
technical documentation in a more
efficient & sustainable way, making it
as user-friendly as possible while
maintaining its machine-readability
context
• DCMI publishes important technical
documentation (vocabularies,
specifications, models) on the Web
• until recently, managed in a sophisticated
bespoke system:
• sources edited as XML files
• maintained in a Subversion
repository
• assembled & converted with shell
scripts and 'Ant'
• FTP to a 'staging server'
• deployed to the live server by the
server admin, on request
• essentially a "closed" system
three technologies which make the difference
1. Git
• stable, sophisticated, free version control technology which is ubiquitously
supported
• GitHub: global-scale infrastructure providing Git as a service
• invites contribution by 'pull request'
2. Markdown
• simple, parseable but easily readable plain text format
3. Static website generators
• a new class of content management system where sources are managed
locally and compiled into webpages which are then uploaded to a server
(like we used to do it in the early 90s!)
• supports distributed content-management via git
• supports long-term preservation by requiring only simple text-based
formats
• supports use of desktop authoring tools - e.g. text-editors
we are exploring how these three
technologies:
* Git/GitHub
* Markdown (with metadata “front matter”)
* static-site generators
can be harnessed together to address
our challenge
what are static site generators?
• a different kind of web-content management system, designed to publish
content as static content to a bog-standard web-server.
• content is processed during the publishing operation, rather than when the
user requests content (although client-side JavaScript is still supported)
• simple command-line application to generate content and serve pages
• no database - content in semi-structured text files
components - standard to most systems
1. content-model
• folder hierarchy, text files
2. content pages
• (markdown, front-matter)
• blog type content is also often supported
3. templates (& themes)
• (with some level of basic scripting)
4. generator software
• typically a command-line script or application
5. configuration file
1. content-model
• text files arranged in folder
hierarchy
• folder hierarchy relates to URL path
structure
• filename relates to URL
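As a rough illustration of the mapping described above, the following Python sketch derives a URL path from the location of a content file. The `content` root folder, the `.md` extension and the `index` convention are assumptions made for this example; each real generator has its own rules.

```python
from pathlib import PurePosixPath


def url_for(source_path: str, content_root: str = "content") -> str:
    """Map a content file path to the URL path it would be published at.

    Assumed conventions (hypothetical, for illustration only):
    - folders under `content_root` become URL path segments
    - the filename without its extension becomes the last segment
    - a file named `index` stands for its containing folder
    """
    relative = PurePosixPath(source_path).relative_to(content_root)
    segments = list(relative.parent.parts)
    if relative.stem != "index":
        segments.append(relative.stem)
    return "/" + "/".join(segments) + ("/" if segments else "")
```

Under these assumptions, a file at `content/specifications/dublin-core/dcmi-terms.md` would be published at `/specifications/dublin-core/dcmi-terms/`.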
2. content pages
• "front-matter" metadata
• often in YAML format like here
• main body in Markdown, arbitrary
HTML also accepted where necessary
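To make the shape of a content page concrete, here is a minimal Python sketch that separates the front-matter block from the Markdown body. It is deliberately not a YAML parser: it only handles flat `key: value` lines between two `---` delimiters. The sample page and its field names (`title`, `term`, `status`) are invented for the example and are not taken from the DCMI sources.

```python
from __future__ import annotations


def split_front_matter(text: str) -> tuple[dict[str, str], str]:
    """Split a content page into (metadata, body).

    This is NOT a YAML parser: it only understands flat `key: value`
    lines between two `---` delimiter lines, which is enough to show
    the idea. A real site would rely on its generator's YAML support.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter: the whole file is body
    metadata: dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return metadata, body
        if ":" in line:
            key, _, value = line.partition(":")
            metadata[key.strip()] = value.strip().strip('"')
    return {}, text  # unterminated front matter: treat as plain body


# A hypothetical term page; the field names are invented for the example.
PAGE = """---
title: "Creator"
term: creator
status: recommended
---
An entity primarily responsible for making the resource.

Free-text guidance in **Markdown** goes here.
"""
```

Calling `split_front_matter(PAGE)` gives the structured part as a dictionary and the free-text part as a string, which is the split that templates rely on.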
3. templates
• can reference metadata (e.g. 'page title') from content page
• can re-use 'partial' templates (e.g. a common 'header' & 'footer')
• often in a common templating language such as HAML
• (example below is in Go's templating syntax)
= include partials/header.html .
div.row-fluid
  div class="col-xs-12"
    h1.page-title {{if .Draft}}[**draft**]{{end}}{{.Title}}
    h2.page-title
      i {{.Params.author}}, {{.Date.Format "Monday, January 02, 2006"}}
    {{.Content}}
= include partials/share_buttons.html .
= include _internal/disqus.html .
= include partials/footer.html .
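The same two ideas, referencing page metadata and re-using partials, can be sketched in a few lines of Python with the standard library's `string.Template`. This is only an analogy to show the mechanism, not how any particular generator works; the partial contents and the metadata keys are hypothetical, and no HTML escaping is attempted.

```python
from string import Template

# Hypothetical partials shared by every page on the site.
PARTIALS = {
    "header": "<header>Example site</header>",
    "footer": "<footer>Example footer</footer>",
}

# A page template: $-placeholders are filled from the page's metadata,
# its rendered body, and the shared partials.
PAGE_TEMPLATE = Template(
    "$header\n"
    '<h1 class="page-title">$draft_marker$title</h1>\n'
    '<h2 class="page-title"><i>$author</i></h2>\n'
    "$content\n"
    "$footer"
)


def render_page(metadata: dict, content_html: str) -> str:
    """Fill the page template from front-matter metadata and body HTML."""
    return PAGE_TEMPLATE.substitute(
        header=PARTIALS["header"],
        footer=PARTIALS["footer"],
        draft_marker="[draft] " if metadata.get("draft") else "",
        title=metadata["title"],
        author=metadata.get("author", ""),
        content=content_html,
    )
```

Changing a partial once changes every page on the next build, which is the main attraction of the template layer.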
4. generator software
• used to generate new content
• also used to run a local server to see how the site will look
deployment options
• SFTP
• Rsync (over SSH)
• git commit hooks (or GitHub webhooks)
• requires the site to be built on the server, so a little more infrastructure (a
simple CGI) is required
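If a webhook is used to trigger the rebuild, the receiving script should check that a delivery really comes from GitHub before acting on it. The sketch below shows one way to do that in Python, using the HMAC-SHA256 signature that GitHub sends in the `X-Hub-Signature-256` header. How the server behind www.dublincore.org handles this is not described here, so treat the function name and the surrounding flow as assumptions.

```python
import hashlib
import hmac


def is_valid_signature(secret: bytes, payload: bytes, signature_header: str) -> bool:
    """Check a GitHub webhook signature before triggering a rebuild.

    GitHub sends an `X-Hub-Signature-256` header of the form
    `sha256=<hex digest>`: an HMAC-SHA256 of the raw request body,
    keyed with the secret configured for the webhook.
    """
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

A receiving script would call this with the raw request body and the header value, and only run the site build when it returns `True`.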
436 known generators
https://staticsitegenerators.net
workflow
‘flipping’ the approach
old approach (single source file)
new approach (many source files, one per term)
pros and cons
• old approach (source in XML file
or similar)
• pros:
• easy to track source files (few in
number)
• easy to transform into other
machine-readable formats
• cons:
• difficult to maintain the source -
not user-friendly
• poor support for extensive free
text description
• new approach (source in
Markdown+YAML)
• pros:
• easier for humans to read and
maintain
• good support for extensive free
text description
• easy to re-use
(partially/completely)
• cons:
• may not suit very complex
vocabularies or profiles
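One way to keep the machine-readable side of the new approach is to generate it from the same per-term sources. The Python sketch below turns the metadata and description of a single term into a small JSON-LD document. The field names (`term`, `label`), the `base_uri` value and the choice of RDFS properties are assumptions made for this example, not the DCMI's actual mapping.

```python
import json


def term_to_jsonld(metadata: dict, description: str, base_uri: str) -> str:
    """Serialise one term's metadata as a small JSON-LD document.

    The front-matter field names (`term`, `label`) and the choice of
    RDFS properties are assumptions made for this example.
    """
    document = {
        "@context": {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"},
        "@id": base_uri + metadata["term"],
        "rdfs:label": metadata["label"],
        "rdfs:comment": description.strip(),
    }
    return json.dumps(document, indent=2, ensure_ascii=False)
```

Each additional output format would need its own small transformation of this kind, applied to every term file at build time.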
simplifying curation and preservation
• version control and redundancy
• synchronised repositories & distributed version control via Git
• active curation
• ease of access and contribution to sources via Git
• simple & readable plain text formats (Markdown)
• "one click" deployment
• minimal deployment infrastructure
• standard web-server
• text files, open formats, no database or server-side 'logic', static site
generators
• reduces broken websites
issues & challenges
1. is this still too technical for
some people who may need
to maintain a metadata
profile or vocabulary?
2. will this approach be
sophisticated enough to
document the majority of
candidate
profiles/vocabularies?
3. can we generalise this
approach to provide a
useful, re-usable tool kit for
others to adopt?
4. how do we handle
versioning? By term, or by
‘collection’ - e.g. vocabulary
or profile?
versioning by term
Thank you!

Documenting metadata application profiles and vocabularies

Editor's Notes

  • #3: I’ll start with a provocative question! This is, essentially, what this presentation will be about - so this is the question or issue to keep in mind. I suggest that in our enthusiasm for linked data we have given more thought to machines and, perhaps, not enough to human users
  • #4: I believe we make better progress if we have a genuine, concrete challenge to address.
  • #5: until very recently this was the approach, but we have introduced a new approach based on some interesting and relatively new technologies.
  • #7: You will almost certainly know what Git and GitHub are, and you’ve probably encountered Markdown. You may not have worked with static site generators, so I will describe these next.
  • #9: contrasted with a 'Content Management System', which typically assembles and pre-processes content on request. Not a new idea (this is where we started with the Web!), but it is much better supported now that we have things like distributed version control (e.g. Git) and usable markup and presentation languages (e.g. Markdown, HAML etc.)
  • #11: text files arranged in a folder hierarchy. The folder hierarchy normally conveys some meaning, and relates directly to URL structures.
  • #12: can include HTML where Markdown is not sophisticated enough to deal with some particular markup or structure.
  • #13: eagle-eyed will spot that this is using CSS from 'bootstrap'
  • #14: the new content will use the appropriate 'archetype' according to the path, in this case a 'post'. The -w flag means watch for changes. This is extremely fast in Hugo: the browser refreshes the content as soon as you save any file (content or template).
  • #15: we are using GitHub’s webhooks to cause the server running www.dublincore.org to rebuild the site every time a change is ‘committed’ to GitHub.
  • #16: We are using Hugo (Go) for the main www.dublincore.org website. Jekyll is a Ruby system, very well known and used to power GitHub Pages.
  • #17: the workflow becomes editing simple, easy to read and write documents, and committing changes with standard Git commands. This triggers a GitHub webhook which causes a web server to re-generate the resulting website.
  • #19: here the source is the XML, which is then transformed into a variety of formats for both humans and machines (e.g. RDF/XML or, nowadays more likely, JSON-LD). This approach was used for Dublin Core and, initially, for RIOXX.
  • #20: The source is semi-structured, with YAML metadata for the structured part, and Markdown for the less-structured part. We know we can produce HTML from this, & we think we can embed RDFa. If we can do these, then we can do the rest. It’s simply a matter of writing a transformation template for each format.
  • #24: A spreadsheet is, for many people, a tool which is more familiar than a text editor, GitHub and Hugo or Jekyll.
  • #26: The idea of using the Jekyll built into Github is appealing in this respect
  • #27: I tend to favour the latter, because then we can have a ‘release’.
  • #28: looks something like this. This is from the changelogs and records of decisions which Tom maintained in the old site. We’re now not sure that this is actually useful.