Will it last?
How secure is the longevity of archaeological data?
Ahmad Alam
Supervisor: Professor Andy Brass
Co-Supervisor: Professor Robert Stevens
Bio-Health Informatics Group
Will it last?
• Major recurrent issue
• Competition: vellum, with a life span of 1000+ years
• Loss: Several near misses – some systems ‘stillborn’
• Information systems ‘decay’ in progress
• Totally unlike ‘big data’ – this is ‘small data’
• Longevity is key
Hanging by a thread
• Almost lost once
• No published paper version
• Currently held as web pages by a volunteer
• No suitable ‘free’ host
• Hence a good ‘use case’
Skeletal db: Dying – slowly
• 100K+ detailed records of skeletal remains
• Written in Delphi (obsolete language)
• No source code
• No data dictionary
• Sybase back end (a minor SAP product)
• Dated interface, on the verge of breaking
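To illustrate what the missing data dictionary would have provided, here is a minimal sketch in Python. The field names (`skeleton_id`, `site_code`, `estimated_age_years`) and the `check_record` helper are hypothetical and are not taken from the skeletal database; the point is only that a short, plain description of each field can travel with the data and outlive the software that produced it.

```python
# Hypothetical data dictionary: every field name below is invented for
# illustration and does not come from the real skeletal database.
DATA_DICTIONARY = {
    "skeleton_id": {
        "type": int,
        "description": "Unique identifier for one set of remains",
    },
    "site_code": {
        "type": str,
        "description": "Short code for the excavation site",
    },
    "estimated_age_years": {
        "type": float,
        "description": "Estimated age at death, in years",
    },
}


def check_record(record):
    """Return a list of problems found when comparing a record to the dictionary."""
    problems = []
    # Every documented field must be present and of the documented type.
    for field, spec in DATA_DICTIONARY.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], spec["type"]):
            problems.append(
                f"wrong type for {field}: expected {spec['type'].__name__}"
            )
    # Any field that is not documented is flagged as well.
    for field in record:
        if field not in DATA_DICTIONARY:
            problems.append(f"undocumented field: {field}")
    return problems


good = {"skeleton_id": 1, "site_code": "X", "estimated_age_years": 35.0}
bad = {"skeleton_id": "1", "site_code": "X"}
good_problems = check_record(good)
bad_problems = check_record(bad)
```

Even without the original source code, a dictionary like this would let a successor system reinterpret the stored values.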
IADB: Contemporary, but orphaned
• JavaScript – MySQL
• Comprehensive system
• No wide-scale adoption
• Several users now off-line
• Can data be trusted to it when there’s no support?
Manchester Mummy Tissue database: stillborn
• PHP – MySQL
• Caught up in deployment ‘issues’
• Original team of authors ‘gone’
• Sat in an inbox for a year (!)
• What is its long-term future?
Linked Open Data
• Evaluated
• More effort than a relational DB with a web framework
• Geared towards electronic data
• Doesn’t help classical information
Ask the Question!
I have an interesting challenge, and just want to hear your two pennies'
worth.
Put simply, where can one host a few dozen records (cave archaeology)
in perpetuity, i.e. where it would last the longest without being deleted,
or becoming inaccessible (e.g. dBase, 8" floppy disks etc.), for free?
'Longest' in this case means competing with vellum, calf skin, proven
life span of at least 1000 years, in the news recently due to a review of
costs as Acts of Parliament are recorded on it.
If you're still interested, the bonus question is: where could microCT
scans of mummies (70 GB each) be similarly hosted, for the least
cost possible? Free would be unfeasible for this volume of data!
Vellum is out of the question, that would be a lot of calves.
Hang on, 70 GB Mummy Records?!
• ADS York : £120,000 – A very big number!
• Zenodo CERN : data only, no manipulation, volume an issue
• Morphosource : Not EU based, breaches funding condition
• UK Data Archive : Said No, as wrong type of data
Challenges
• Are huge and diverse
• … and largely down to funding
• Preserve the artefact, the data, both?
• Is the rush to the ‘Cloud’ safe, as services come and go?
• But small steps can help
• Why not start with data capture at the earliest stage to help with long
term ‘longevity’?
Using spreadsheet programs for scientific data
• Good data organization is the foundation of any research project
• Most researchers have data or do data entry in spreadsheets
• Spreadsheet programs are very useful graphical interfaces for
designing data tables and handling very basic data quality
control functions
• Good data entry practices - formatting data tables in spreadsheets
• How to avoid common formatting mistakes
• Dates as data - beware!
• Basic quality control and data manipulation in spreadsheets
• Exporting data from spreadsheets
• Overall good data practices
• Much of a researcher’s time will be spent in the 'data wrangling' stage.
It's not the most fun, but it's necessary. This can help teach how to
think about data organization and some practices for more effective
data wrangling
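The points above on dates and on exporting can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed workflow: the record fields (`record_id`, `site`, `recorded_on`) and the helper names are hypothetical. It stores each date as separate year, month and day columns, which avoids spreadsheet programs silently reinterpreting date cells, and writes plain CSV, a text format that does not depend on any one program.

```python
import csv
import io
from datetime import date


def split_date(d):
    """Break a date into unambiguous integer parts."""
    return {"year": d.year, "month": d.month, "day": d.day}


def export_records(records, out):
    """Write records as plain CSV with the date split into three columns."""
    fieldnames = ["record_id", "site", "year", "month", "day"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        row = {"record_id": rec["record_id"], "site": rec["site"]}
        row.update(split_date(rec["recorded_on"]))
        writer.writerow(row)


# Hypothetical example records; the values are invented for illustration.
records = [
    {"record_id": 1, "site": "Cave A", "recorded_on": date(1987, 6, 3)},
    {"record_id": 2, "site": "Cave B", "recorded_on": date(1991, 11, 20)},
]

buffer = io.StringIO()
export_records(records, buffer)
csv_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch self-contained; passing an open file object instead would produce a `.csv` file that can be read without any spreadsheet software.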
Resources
• http://www.datacarpentry.org/spreadsheet-ecology-lesson/00-intro.html
• www.ahmadalam.net
• http://www.presentpasts.info/articles/10.5334/pp.58/




