Open Data
Peter Murray-Rust*,
Open Knowledge and University of Cambridge
European Bioinformatics Institute, UK, 2014-05-15
*Shuttleworth Fellow 2014-5
Overview
• Most scientific data is lost; costs many billions…
• … AND LIVES. Closed Data Means People Die
• Human problem; lack of vision + active opposition.
• Fully open data can change this
• Appreciation of Jean-Claude Bradley’s work
• Panton Fellows (Ross Mounce, Sophie Kershaw)
• Content Mining as partial solution (Hargreaves UK)
• WHAT YOU MUST DO
Elsevier wants to control Open Data
Open Knowledge and University of Cambridge European Bioinformatics Institute
Award of Blue Obelisk
Jean-Claude Bradley Egon Willighagen
Conventional Research
[Diagram labels: “Lab” work; Write; rewrite; Re-experiment; paper/thesis; publish; ???; Validation??; DATA]
All your data are belong to publisher
Free/Open Software Development
[Diagram labels: Engineered repository; World community; CODE; rewrite; validate; fork; Re-use; inspires; OSI]
Github, BitBucket, Stackoverflow, Apache
e.g. Chem4Word (M-R group): Outercurve repository, now developed by ex-pharma s/w and interfaced to ChemDoodle
Open Source software inspires Open Science
Jean-Claude Bradley 2006
Open Notebook Science, ONS
Jean-Claude Bradley 2006
And spectra were included as well
Jean-Claude Bradley 2006
https://www.youtube.com/watch?v=BN8UjULNG9A&feature=youtube_gdata
Jean-Claude Bradley talking in 2013
Open Science
[Diagram labels: TOOLS; INSTRUMENT; MODEL; CODE; DATA; knowledge; open engineered repository; world community; validate; merge; calibrate]
Problems are solved communally; nothing is needlessly duplicated; “publication” is continuous; data are SEMANTIC
Machines and humans working together
Mat Todd, University of Sydney
• JC was a pioneer in open science, and uncompromising
about its importance. We had so many productive
interactions over the years, starting from the end of
January 2006, when we started our open chemistry project
on The Synaptic Leap (JC was the first to comment!) and JC
posted his very first experiment online at Usefulchem. I
remember starting to think about how to do completely
open projects, looking around the web in 2005 to see if
anything open was going on in chemistry, and coming
across JC's lone voice, and I thought "Wow, who is this
guy?" He had dedication and integrity - we'll all miss him.
2014-05-15 (Mail to PM-R)
Mat Todd, University of Sydney: Antimalarial
The economic value of data
• I believe that we spend globally ca 400 billion
USD / yr on public research.
• The outputs include:
– Knowledge / papers / patents
– Organizations
– People
– materials
– Data – many billions/year and much is lost
http://michaelnielsen.org/blog/reinventing-discovery/
https://en.wikipedia.org/wiki/Reinventing_Discovery
Michael Nielsen
• Kasparov versus the World, The Wisdom of Crowds, various online collaborative projects
• InnoCentive, collective intelligence, Paul Seabright's economic theory, online chat
• History of Linux, Open Architecture Network, Wikipedia, MathWorks' computer programming contest
• Communication in small groups, particularly as studied by Stasser and Titus; praxis of science; a discussion of communication among scientists
• Don R. Swanson and literature-based discovery, predicting influenza with Google searches, Sloan Digital Sky Survey, Allen Institute for Brain Science, Ocean Observatories Initiative, Human Genome Project, Google Translate
• Democratizing Science: Galaxy Zoo, Foldit, citizen science, eBird, open access, arXiv, PLoS
• The Challenge of Doing Science in the Open: Complexity Zoo, academic publishing
• The Open Science Imperative: open science, academic journal publishing reform, SPIRES
• Appendix: the problem solved by the Polymath Project
“Free” and “Open”
• “Free software is a matter of liberty, not price. ‘Free speech’, not ‘free beer’.” (RMS)
• “A piece of data or content is open if anyone is free to use,
reuse, and redistribute it”
(OKFN) http://opendefinition.org/
• “open” (access) has multiple incompatible “definitions”.
Major split is “human eyeballs” vs copying and machine
“reusability”
• “Open” is a marketing term for publishers, who frequently
(often deliberately) do not grant full Openness.
4 Freedoms (Richard Stallman)
• Freedom 0: The freedom to run the program for any purpose.
• Freedom 1: The freedom to study how the program works, and
change it to make it do what you wish.
• Freedom 2: The freedom to redistribute copies so you can help
your neighbor.
• Freedom 3: The freedom to improve the program, and release
your improvements (and modified versions in general) to the
public, so that the whole community benefits.
“I’ve spent a third of my life building software based on Stallman’s four freedoms, and
I’ve been astonished by the results. WordPress wouldn’t be here if it weren’t for those
freedoms, and it couldn’t have evolved the way it has.”
- Matt Mullenweg, co-creator of WordPress
Critical Historical Open Events
• Free Software Foundation (RMS,
1985) and Linux (Torvalds, 1991)
• The World Wide Web (TBL, 1991)
• The human genome (1990-2001)
The life of Aaron Swartz (1986-2013)
https://en.wikipedia.org/wiki/Bermuda_Principles
• Automatic release of sequence assemblies larger than 1
kb (preferably within 24 hours).
• Immediate publication of finished annotated
sequences.
• Aim to make the entire sequence freely available in the
public domain for both research and development in
order to maximise benefits to society.
http://www.budapestopenaccessinitiative.org/read
… an unprecedented public good. …
… completely free and unrestricted access to [peer-
reviewed literature] by all scientists, scholars, teachers,
students, and other curious minds. …
…Removing access barriers to this literature will
accelerate research, enrich education, share the
learning of the rich with the poor and the poor with
the rich, make this literature as useful as it can be, and
lay the foundation for uniting humanity in a common
intellectual conversation and quest for knowledge.
(BOAI, 2002)
Where to put the data?
Mendeley
From Wikipedia, the free encyclopedia
• Mendeley – a social media site used by many
scientists to store metadata …
• … purchased by Elsevier in 2013
• David Dobbs, in The New Yorker, described
motive as:
– to acquire its user data,
– to destroy or coöpt an open-science icon that
threatens its business model.
• PM-R: Mendeley can also Snoop and Control
Authors don’t deposit data (Ross Mounce)
NOTE: RSC have always published raw crystal data as
“CC0” and the enhanced data is openly
available
Restrictions on Re-use of Crystallographic data
NOTE: The CCDC is based on data contributed by
scientists as part of publication and validation
(auth: Mark Hahnel in response to our debates)
Panton Principles for Open Data in Science (2010)
• …make an explicit and robust statement of your
wishes.
• Use a recognized waiver or license that is appropriate
for data.
• open as defined by the Open Knowledge/Data
Definition (… NOT non-commercial)
• Explicit dedication of data … into the public domain
via PDDL or CCZero
Panton Authors and Fellows
Sophie Kershaw, Panton Fellow :
Doctoral Training in Oxford
Sophie Kershaw, Panton Fellow
Reproducibility?
Begley & Ellis (2012)
Nature 483, 531-533
Image shown is from front page of Begley & Ellis
(2012), produced by the Nature Publishing Group
“Train a new generation of data scientists
and broaden public understanding”
“Riding The Wave”
European Commission
October 2010
Rotation-Based Learning (RBL)
Phase 1: Initiator
• No communication
permitted between groups
• Attempt to reproduce
existing literature
• Deliver a coherent research
story by the end of Phase 1
Phase 2: Successor
• Communication between
groups still prohibited
• Validate and develop the
inherited research story
• Critique your predecessors
• Role of research producer vs. research user
• Can this approach help to foster awareness of reproducibility issues?
Throughout Phases 1 & 2:
• Daily lectures on open
science culture & techniques
• First-hand application to own
research work
• Version control using GitHub
• Daily group supervision
“Do you think you would be
more confident in the future
about trying to apply Open
techniques to your work..?”
• 50% Yes, by myself
• 41% Yes, with help/guidance
• 9% No opinion/neutral
• 0% No
Ross Mounce (Bath), Panton Fellow
• Sharing research data:
http://www.slideshare.net/rossmounce
• How to figures from PLOS/One [link]:
Ross shows how to bring figures to life:
• PLOSOne at http://bit.ly/PLOStrees
• PLOS at http://bit.ly/phylofigs (demo)
Open Notebook Science
[Diagram labels: TOOLS; INSTRUMENT; MODEL; CODE; DATA; knowledge; open engineered repository; world community; validate; merge; calibrate]
Problems are solved communally; nothing is needlessly duplicated; “publication” is continuous
Machines and humans working together
CC-BY
Content Mining
[Diagram labels: “Lab” work; Write; paper/thesis; publish; ???; DATA; Intelligent software to read scientific papers; DATA]
Despite the inefficiency and loss, much unused data remains in published articles. Publishers have tried to stop us mining it.
On 2014-06-01 IT WILL BE LEGAL IN UK!
Content Mining
• 1,000,000 papers/year => 3,000 / day => 2 /min
• 10,000+ phylogenetic trees (Ross Mounce, BBSRC)
• 20,000 chemical reactions / day
• >> 1 million graphs, plots, bar charts, statistics
• Possible on a laptop
• http://contentmine.org
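The throughput figures in the list above are rounded. As a quick sanity check, the sketch below redoes the arithmetic from the slide's round figure of 1,000,000 papers per year; the class and method names are made up for illustration. It gives roughly 2,740 papers per day and about 1.9 per minute, consistent with the slide's "3,000 / day" and "2 /min".

```java
import java.util.Locale;

// Back-of-envelope check of the slide's throughput numbers.
// The input (1,000,000 papers/year) is the slide's own round figure.
final class MiningRate {
    static final int DAYS_PER_YEAR = 365;
    static final int MINUTES_PER_DAY = 24 * 60;

    // Average number of papers appearing per day.
    static double perDay(long papersPerYear) {
        return (double) papersPerYear / DAYS_PER_YEAR;
    }

    // Average number of papers appearing per minute.
    static double perMinute(long papersPerYear) {
        return perDay(papersPerYear) / MINUTES_PER_DAY;
    }

    public static void main(String[] args) {
        long papersPerYear = 1_000_000L;
        System.out.println(String.format(Locale.ROOT,
                "%.0f papers/day, %.1f papers/min",
                perDay(papersPerYear), perMinute(papersPerYear)));
    }
}
```

At about two papers a minute, a miner that handles each paper in well under thirty seconds keeps pace with the literature, which is what makes the "possible on a laptop" claim plausible.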
Anyone interested in data from clinical trials papers?
AMI2: High-throughput extraction of
semantic chemistry from the scientific
literature
Andy Howlett, Mark Williamson, Peter Murray-Rust,
Unilever Centre, Cambridge
AMI2 is a framework that can extract
semantic data from the scientific
literature.
AMI2 architecture
Visitor Design Pattern/Example
Visitor= something that extracts a specific type of data
SpeciesVisitor, ChemVisitor, PhylogeneticTreeVisitor,
GeoLocationVisitor, ClinicalTrialVisitor …
Visitable= something that can have specific data extracted
PDF, SVG, Table
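The Visitor/Visitable split described above can be sketched in a few lines of Java, the language the speaker notes say the codebase uses. This is a minimal illustration only: the interfaces, the `SvgDocument` and `TableDocument` classes, and the toy species matcher are all hypothetical and are not taken from AMI2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the Visitor/Visitable idea; none of these
// signatures come from the AMI2 codebase.
interface Visitable {
    // First dispatch: the concrete document type picks the visit overload.
    List<String> accept(Visitor visitor);
}

interface Visitor {
    // Second dispatch: the concrete visitor decides what to extract.
    List<String> visit(SvgDocument svg);
    List<String> visit(TableDocument table);
}

final class SvgDocument implements Visitable {
    final List<String> textElements;
    SvgDocument(List<String> textElements) { this.textElements = textElements; }
    @Override public List<String> accept(Visitor visitor) { return visitor.visit(this); }
}

final class TableDocument implements Visitable {
    final List<String> cells;
    TableDocument(List<String> cells) { this.cells = cells; }
    @Override public List<String> accept(Visitor visitor) { return visitor.visit(this); }
}

// Toy extractor: treats any "Genus species" shaped string as a species name.
final class ToySpeciesVisitor implements Visitor {
    private static List<String> binomials(List<String> candidates) {
        List<String> hits = new ArrayList<>();
        for (String s : candidates) {
            if (s.matches("[A-Z][a-z]+ [a-z]+")) {
                hits.add(s);
            }
        }
        return hits;
    }
    @Override public List<String> visit(SvgDocument svg) { return binomials(svg.textElements); }
    @Override public List<String> visit(TableDocument table) { return binomials(table.cells); }
}

final class VisitorDemo {
    public static void main(String[] args) {
        Visitable figure = new SvgDocument(
                Arrays.asList("Homo sapiens", "Figure 2", "Mus musculus"));
        // prints [Homo sapiens, Mus musculus]
        System.out.println(figure.accept(new ToySpeciesVisitor()));
    }
}
```

The point of the pattern is that adding a new kind of extraction means writing one new visitor class, without touching the document types.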
ChemistryVisitor
Can interpret diagram or look up chemistry in PubChem or ChEBI
PhylogeneticTreeVisitor
1) SpeciesVisitor
2) ChemistryVisitor
C) What’s the problem with this spectrum?
Org. Lett., 2011, 13 (15), pp 4084–4087
Original thanks to ChemBark
After AMI2 processing…..
… AMI2 has detected a square
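The slides do not say how the square was recognised. One simple way to test a four-vertex outline, assuming the vertices have already been recovered from the vector graphics and are listed in order around the shape, is to check that all four sides and both diagonals are equal within a tolerance. The class below is a hypothetical sketch of that check, not AMI2's detection code.

```java
// Hypothetical geometry check, not AMI2's actual detection code.
final class SquareCheck {
    // Euclidean distance between two points given as {x, y}.
    private static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    /**
     * Vertices must be listed in order around the outline.
     * tol is an absolute tolerance on lengths, in the same units as the coordinates.
     */
    static boolean isSquare(double[][] v, double tol) {
        if (v.length != 4) {
            return false;
        }
        double side = dist(v[0], v[1]);
        if (side <= tol) {
            return false; // degenerate outline
        }
        for (int i = 1; i < 4; i++) {
            if (Math.abs(dist(v[i], v[(i + 1) % 4]) - side) > tol) {
                return false; // sides differ: not even a rhombus
            }
        }
        // Four equal sides give a rhombus; equal diagonals make it a square.
        return Math.abs(dist(v[0], v[2]) - dist(v[1], v[3])) <= tol;
    }
}
```

A regular shape like this has no business inside a measured spectrum, which is why flagging it is useful for validation.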
Thanks
• BBSRC for PLUTo project (Bath)
• Unilever Research for PhD (Andy Howlett)
• TSB / Cambridge IP (PDRA Mark Williamson)
• Shuttleworth Foundation (Fellowship PM-R)
• Julian Huppert MP and David Willetts (support for
Hargreaves copyright reform)
• Christoph Steinbeck (EBI) Metabolights
• The ContentMine team (Michelle Brook, Ross Mounce,
Jenny Molloy, Richard Smith-Unna, CottageLabs)
• The Blue Obelisk
• Open Knowledge
• Apache PDFBox and all F/LOSS software authors
• Unilever Centre and University of Cambridge
CLOSED ACCESS MEANS PEOPLE DIE
• Create Open Notebook Science in your discipline
• Actively release data into Public Domain.
• Actively campaign against any re-use restrictions
(including CC-BY-NC)
• Refuse to work with closed organizations
CLOSED DATA MEANS PEOPLE DIE
Some references
http://usefulchem.blogspot.co.uk/2011/06/quest-to-determine-melting-point-of-4.html
http://www.slideshare.net/jcbradley/minisymp2011-bradley
https://impactstory.org/BlueObelisk
http://www.slideshare.net/rossmounce/sharing-reusable-phylogenetic-data-were-not-there-yet
http://footnote1.com/the-exploitative-economics-of-academic-publishing/
http://web.ornl.gov/sci/techresources/Human_Genome/publicat/BattelleReport2011.pdf
https://www.youtube.com/watch?v=BN8UjULNG9A&feature=youtube_gdata (mins 5-9)
Open Science
[Diagram labels: TOOLS; INSTRUMENT; MODEL; CODE; DATA; knowledge; open engineered repository; world community; validate; merge; calibrate]
“Publication” is continuous and all “curious minds” can be involved.
3) PhylogeneticTreeVisitor


Editor's Notes

  • #2: Hi, I’m here to talk about AMI, a data extraction framework and tool. First, I just want to highlight some of the key contributors to the project: Andy for his work on the ChemistryVisitor and Peter for the overall architecture. In this talk, I’m going to impress upon you the importance of data in a specific format and its utility to automated machine processing. Then I’m going to demonstrate AMI’s architecture and the transformation of data as it flows through the process. I’m going to dwell a little on a core format used, Scalable Vector Graphics (SVG), before introducing the concept of visitors, which are pluggable, context-specific data extractors. Next, I’m going to introduce Andy’s ChemVisitor, for extracting semantic chemistry data, along with a few other visitors that can process non-chemistry-specific data. Finally, I will demonstrate some uses of the ChemVisitor within the realm of validation and metabolism.
  • #57: As scientists, we publish our findings and data, and these generally manifest as PDFs on journals’ websites; ~60% of all documents are PDF. However, during this process, much information has been lost. Hence a vast amount of scientific knowledge has been rendered inaccessible. AMI2 is a tool that attempts to extract data, primarily (but not exclusively) from PDFs, to produce a format that is useful for automated processing. AMI is derived from the term amanuensis: someone who copies manuscripts. In a perverse way, this is attempting to make cows from beef burgers. Dial-a-molecule is now involved
  • #58: This is an overview of how the framework looks. The entire codebase is written in Java and is released under the quite generous Apache 2 license. One pass over the information flow
  • #59: “Do the right thing” based on the type of two objects. Talk more about design patterns Two Abstract classes Picked up via reflection
  • #62: Latin species name
  • #63: Thus a ChemVisitor knows how to create a CMLMolecule from an SVGVisitable if it contains a picture of the molecule. OSRA counter-argument: not duplication, diversity; AMI has plugins, works with Java
  • #64: ChemBark
  • #72: Ross’ talk for example pictures Phylogenetic trees to NeXML