The ARK Identifier Scheme at Ten Years Old
7 May 2012
John Kunze
University of California Curation Center
California Digital Library
  
California Digital Library

Serving the University of California
•  10 campuses
•  360K students, faculty, and staff
•  100s of museums, art galleries, observatories, marine centers, botanical gardens
•  5 medical centers
•  5 law schools
•  3 National Laboratories

CDL supports the research lifecycle
•  Collections
•  Digital Special Collections
•  Discovery & Delivery
•  Publishing Group
•  UC Curation Center (UC3)
  
California Digital Library (CDL)
  
Today’s journey
•  What are ARKs?
•  Separation of concerns
    –  Naming ≠ hosting
    –  Scheme ≠ resolution
    –  Syntax ≠ persistence
•  Inflections and metadata
•  EZID (easy identifiers) and N2T (name-to-thing)
•  Data citation, passthrough
  
What’s an ARK identifier?
ARK = Archival Resource Key
ARKs support long-term access to information objects
ARKs identify objects of any type:
•  digital objects – data, documents, images, software, ...
•  physical objects – books, bones, statues, ...
•  groups & living beings – people, animals, orchestras, ...
•  intangibles – places, chemicals, diseases, terms, ...
  
The URL is dead, long live the URL!
Fallacy #1: URLs are unreliable, so instead use this... um... well... ah... (shhh!) “URL”
Some of your best friends are URLs:
    http://dx.doi.org/10.1234/98765
    http://hdl.handle.net/10.1234/98765
    http://purl.org/10.1234/98765
    http://n2t.net/ark:/101234/98765
  
Persistence is about service
•  Imagine the “perfect” golden identifier
•  Apply bankruptcy, disk crash, human error, or war, and there’s nothing that syntax, scheme, or resolver can do to prevent identifier breakage.
  
What’s an ARK identifier? (take 2)
An ARK is a URL, with some extra rules
ARK reserves / and . for what we often assume
•  A/B/C means C is contained in A/B, and B in A
•  A.pdf, A.html, and A.docx are all variants of A
Could drastically improve search result display
•  No need to look up relationships
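
The two rules above can be sketched in a few lines of Python. This is only an illustration, assuming names are plain strings like the slide's A/B/C and A.pdf; the helper names (`parent`, `base`, `is_variant_of`, `is_contained_in`) are hypothetical and not part of any ARK library.

```python
from typing import Optional


def parent(name: str) -> Optional[str]:
    """Container of a name under the '/' rule, or None for a top-level name."""
    head, sep, _ = name.rpartition("/")
    return head if sep else None


def base(name: str) -> str:
    """Drop a '.variant' ending from the last path component."""
    head, sep, last = name.rpartition("/")
    stem = last.split(".", 1)[0]
    return head + sep + stem


def is_variant_of(a: str, b: str) -> bool:
    """True when both names are variants of the same base name."""
    return base(a) == base(b)


def is_contained_in(child: str, ancestor: str) -> bool:
    """True when child sits somewhere below ancestor in the '/' hierarchy."""
    return child.startswith(ancestor + "/")
```

Because both relationships are read directly off the string, a search result list could group A.pdf and A.html under A without querying any database.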
  
ARK inflections (declinations)
An ARK is a special URL with access to 3 things
1.  An information object
2.  Its metadata, by appending a ‘?’ inflection
3.  A provider’s promise, by appending a ‘??’
An inflection changes a name ending for a purpose
•  Reduces the number of different names needed
•  Use semantic web without hiring a programmer
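
As a rough sketch of how one name yields three access points, the Python below builds and recognizes the ‘?’ and ‘??’ endings. The function names `inflect` and `classify` are hypothetical, and the example ARK is borrowed from elsewhere in this deck; how a real server answers each request is outside this sketch.

```python
def inflect(ark_url: str) -> dict:
    """Return the three URLs derived from a single ARK."""
    return {
        "object": ark_url,           # the information object itself
        "metadata": ark_url + "?",   # its metadata
        "promise": ark_url + "??",   # the provider's commitment statement
    }


def classify(request_url: str) -> tuple:
    """Split a requested URL into (kind, base ARK URL)."""
    # Check the longer ending first, since '??' also ends with '?'.
    if request_url.endswith("??"):
        return "promise", request_url[:-2]
    if request_url.endswith("?"):
        return "metadata", request_url[:-1]
    return "object", request_url
```

For example, `classify("http://n2t.net/ark:/12025/654xz321??")` yields the kind `"promise"` together with the uninflected ARK URL.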
  
‘?’ inflection returns Dublin Kernel
Same machine-readable information as before:
erc:
who:       National Research Council
what:      The Digital Dilemma
when:      2000
where:     http://books.nap.edu/html/digital%5Fdilemma

Even shorter:
erc: National Research Council
     | The Digital Dilemma | 2000
     | http://books.nap.edu/html/digital%5Fdilemma

See http://dublincore.org/groups/kernel/ for more information
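
To show that the long and short forms carry the same four kernel elements, here is a small Python sketch that reads both. It handles only the two layouts shown on this slide; `parse_erc` and `KERNEL` are hypothetical names, and the full ERC syntax has more features than this covers.

```python
KERNEL = ("who", "what", "when", "where")


def parse_erc(text: str) -> dict:
    """Parse the long (labelled) or short ('|'-separated) form shown above."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("erc:"):
        raise ValueError("not an ERC record")
    rest = lines[0][len("erc:"):].strip()
    if rest:
        # Short form: values follow 'erc:' in who/what/when/where order.
        joined = " ".join([rest] + lines[1:])
        values = [v.strip() for v in joined.split("|")]
        return dict(zip(KERNEL, values))
    # Long form: one 'label: value' pair per line; split on the first colon
    # only, so the colon inside the URL is left alone.
    record = {}
    for ln in lines[1:]:
        label, _, value = ln.partition(":")
        record[label.strip()] = value.strip()
    return record
```

Feeding either record from the slide to `parse_erc` gives the same dictionary keyed by who, what, when, and where.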
Why use ARKs?
ARKs are assigned for a variety of reasons:
•  affordability – there are no fees to assign or use ARKs
•  self-sufficiency – can host ARKs on your own web server
•  portability – can move ARKs without change of identity
       http://cdlib.org/ark:/12025/654xz321
       http://rutgers.edu/ark:/12025/654xz321
       http://n2t.net/ark:/12025/654xz321
•  global resolvability – can host ARKs at N2T resolver
•  density – mixed case means CD, Cd, cD, cd are all distinct
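
The portability and density points can be illustrated with a short Python sketch, assuming identity is simply everything from "ark:" onward. The function name `ark_identity` is hypothetical.

```python
def ark_identity(url: str) -> str:
    """Return the host-independent part of an ARK URL (from 'ark:' onward)."""
    start = url.find("ark:")
    if start == -1:
        raise ValueError(f"no ARK found in {url!r}")
    # No case folding: mixed case is significant, so CD and cd stay distinct.
    return url[start:]
```

Under this sketch the three URLs listed under portability reduce to the same string, `ark:/12025/654xz321`, so moving an object between hosts does not change what it is called.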
  
Some unique advantages of ARKs
•  simplicity – uses only ordinary "redirects" & "get" requests
•  versatility – with "inflections" (different endings), an ARK should access data, metadata, promises, and more
•  transparency – no identifier can guarantee stability, and ARK inflections help users make informed judgments
•  visibility – syntax rules make ARKs easy to extract and to compare for containment and variant relationships
•  reserved characters: - (hyphen), / (slash), . (period)
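
A hedged Python sketch of the visibility point: pulling ARKs out of running text with one regular expression. The pattern and the names `extract_arks` and `comparable` are assumptions made for illustration. Dropping hyphens before comparison reflects my reading of the ARK specification, which treats hyphens as insignificant; the slide itself says only that the hyphen is reserved.

```python
import re

# Assumed shape: 'ark:/', a numeric NAAN, '/', then a run of non-space characters.
ARK_PATTERN = re.compile(r"ark:/\d+/\S+")


def extract_arks(text: str) -> list:
    """Find ARKs in free text, trimming trailing sentence punctuation."""
    return [match.rstrip(".,;:)") for match in ARK_PATTERN.findall(text)]


def comparable(ark: str) -> str:
    """Form used for comparison: hyphens removed, case preserved."""
    return ark.replace("-", "")
```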
  
What’s an ARK identifier? (take 3)
ARK is a collection of good ideas
•  Separates scheme syntax from resolver rules
    –  Resolution is a process of mapping an id to a thing
•  Separates name assigning from name mapping
•  All schemes encouraged to use these ideas, even ordinary URLs
•  N2T resolver can support them for any scheme
  
Identifier schemes are highly parallel
                           Scheme
                              :
Name Mapping Authority        : Name Assigning Authority
          :     (NMA)         :    :        Number (NAAN)
          v                   v    v
|..........................|....+..................|
           http://dx.doi.org/doi:10.30/tqb3kh97gh8w
       http://hdl.handle.net/hdl:13030/tqb3kh97gh8w
                       http://purl.org/tqb3kh97gh8w
                       ...   urn:13030:tqb3kh97gh8w
             http://n2t.net/ark:/13030/tqb3kh97gh8w
 http://OwlBike.example.org/ark:/13030/tqb3kh97gh8w
|..........................|.......................|......
    Branded or neutral          Base identifier     Suffix
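
The parallel structure in the diagram can be sketched as one Python parser that covers the ark, doi, hdl, and urn rows. This is an illustration only: the regular expression, the `Parts` type, and `split_identifier` are hypothetical, and the PURL row is reported as unparsed because it shows no scheme or NAAN field.

```python
import re
from typing import NamedTuple, Optional


class Parts(NamedTuple):
    nma: Optional[str]   # Name Mapping Authority (the host), if present
    scheme: str          # ark, doi, hdl, or urn
    naan: str            # Name Assigning Authority Number
    name: str            # the rest of the base identifier


PATTERN = re.compile(
    r"^(?:(?P<nma>https?://[^/]+)/)?"   # optional NMA
    r"(?P<scheme>ark|doi|hdl|urn):/?"   # scheme label
    r"(?P<naan>[^/:]+)[/:]"             # NAAN, ended by '/' or ':'
    r"(?P<name>.+)$"                    # assigned name
)


def split_identifier(identifier: str) -> Optional[Parts]:
    """Split an identifier into the parts named in the diagram, or None."""
    match = PATTERN.match(identifier.strip())
    if match is None:
        return None
    return Parts(match["nma"], match["scheme"], match["naan"], match["name"])
```

Applied to the rows above, every parsed scheme returns the same `name` of `tqb3kh97gh8w`, which is the sense in which the schemes are parallel.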
Locksmith jargon: shoulder, blade, tip, bow, cover
  
       _____    slips on     _____
    .-' ,_,'-.. ----> .-'          '-.
   /    (o,o)         /             
  :     {`"'} ||       :               `____
 / .-. -"-"- ||       / .-.                 '--^.   .^--^.        .^.
{ (    )       ||    { (     )                   `-'      `-^--^-'   '--^.
  `-'    _o   ||      '-'            ===================================}
  :     _|<,_ ||       :               __________________________________/
      (*)/(*) /                     /
    `-._____.-'          `-._____.-'
|....................|...............|....|..........................|..|
          ^                      ^        ^                ^            ^
          :                      :        :                :            :
        Cover=                Bow=     Shoulder .------ Blade          Tip
         NMA              Scheme+NAAN     :       : .-------------------'
          :                   :     :     :      : :
          v                    v    v     v       v v
|..........................|....+.....|...|......|.|
 http://OwlBike.example.org/ark:/13030/tqb3kh97gh8w     <---- Example Key
                              doi:10.30/tqb3kh97gh8w          with parallel
                              hdl:13030/tqb3kh97gh8w         parts in other
                              urn:13030:tqb3kh97gh8w           id schemes.
|..........................|.......................|....
   Name Mapping Authority         Base identifier      ...
ARK usage in 10 years
•  In 2001-2011, ~100 organizations registered for ARKs
•  Registry is replicated at BnF and NLM
•  Some of the largest users are
    –  The California Digital Library
    –  The Internet Archive
    –  Bibliothèque nationale de France
    –  Portico Digital Preservation Service
    –  University of California Berkeley
    –  University of Chicago
  
Some other ARK registrants
    12025    US National Library of Medicine
    86077    Cornell Institute for Social and Economic Research
    26677    Library and Archives Canada
    77635    Humboldt-Universität zu Berlin
    13038    World Intellectual Property Organization
    78319    Google
    61001    University of Chicago
    28722    University of California Berkeley
    64269    UK Digital Curation Centre
    87895    Centre Informatique National de l'Enseignement Supérieur
    61903    Family Search
    52327    National Library and Archives of Quebec
    10261    Jüdisches Museum Berlin
    71479    Spanish National Research Council
    32833    Massachusetts Institute of Technology
    81055    British Library
    80713    Biblioteca Nacional de Portugal
  
Immersion vs landing page
What do you mean by “get the data”?
What inflections might distinguish these?
•  Immersion – a consumptive experience, or
•  Landing page – a menu-study experience?
  
Vision for a “data paper”
•  Wrap the unfamiliar in a familiar façade
•  A “data paper” is minimally a cover sheet and a set of links to archived artifacts
•  Cover sheet contains familiar elements: title, date, authors, abstract, and persistent identifier (DOI, ARK, etc.)
•  Just enough to permit basic exposure and discovery
    –  Building a basic data citation
    –  Indexing by services such as Web of Science, Google Scholar
    –  Instilling confidence in the identifier’s stability
  	
  
New distributed framework
Flexible, scalable, sustainable network
Coordinating Nodes
•  retain complete metadata catalog
•  subset of all data
•  perform basic indexing
•  provide network-wide services
•  ensure data availability (preservation)
•  provide replication services
Member Nodes
•  diverse institutions
•  serve local community
•  provide resources for managing their data
  
ARKs – coming soon
•  Community forum
•  Standardization as an Internet RFC
•  New inflections for landing page & immersion
  
N2T/EZID – coming soon
•  Indexing by A&I vendors
•  Suffix pass-through
   –  Register Name -> target T
   –  Resolve Name/a/b/c -> T/a/b/c automatically
   –  Greatly reduce number of ids to manage
•  URNs
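
Suffix pass-through, as described in the bullets above, can be sketched in Python as a longest-registered-prefix lookup. This is a guess at the behavior from the slide alone, not N2T's actual implementation; `resolve`, the registry dictionary, and the example target URL are all hypothetical.

```python
from typing import Optional


def resolve(name: str, registry: dict) -> Optional[str]:
    """Map a name to its target, passing any unregistered suffix through."""
    candidate = name
    while candidate:
        if candidate in registry:
            # Append whatever followed the registered name to its target.
            return registry[candidate] + name[len(candidate):]
        # Drop the last '/component' and try the shorter prefix.
        head, sep, _ = candidate.rpartition("/")
        candidate = head if sep else ""
    return None
```

With only `ark:/12025/654xz321` registered against a target T, a request for `ark:/12025/654xz321/a/b/c` resolves to T/a/b/c, so the three deeper names never need their own registry entries.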
  
