Databasing the World: Biodiversity and the 2000s
Written by Bowker, G. C.
Presented by Chen Zhang (Mike)
Four Key Aspects
  • Database Infrastructure
      - Standards: flexible, stable
      - Technology: stable
      - Communication
  • Data Sharing
      - Ownership
      - Disarticulation
      - Data collection
  • Distributed Collective Practice
      - Collaborative work
      - New Knowledge Economy
  • Accounting for Life
      - Development of Classification
      - Cladistics
      - The Future
Database Infrastructure
Standards
  • Why do we need standards?
      - Example from the air-conditioner industry: the diameter of a screw must match the hole in the panel
  • Reasons for databases
      - Need a "handshake" among various media
      - MIME (Multipurpose Internet Mail Extensions) protocol
      - Each layer of infrastructure requires its own set of standards
      - Need standardized categories
Standards: the best standard will not always win
  • Some best-known standards
      - QWERTY keyboard
      - VHS (Video Home System) standard
      - DOS computing system
  • Why?
      - The best standard may not have the best market
      - Standards setting is a key site of political work
      - An inferior standard may be favoured by a political agency (such as a standards-setting body)
Standards: Interoperability
  • Continuum of strategies for standards setting
      - At one end: one standard fits all
      - At the other end: let a thousand standards bloom
Standards: Interoperability
  • Some related standards: 1. ANSI/NISO Z39.50
      - ANSI/NISO Z39.50 is the American National Standard Information Retrieval Application Service Definition and Protocol Specification for Open Systems Interconnection.
      - It makes it easier to use large information databases by standardizing the procedures and features for searching and retrieving information.
      - It allows a single enquiry over multiple databases.
      - It is widely adopted in the library world.
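The idea of "a single enquiry over multiple databases" can be sketched in a few lines. This is only an illustration of the concept, not the Z39.50 protocol: the catalog names, the records, and the `federated_search` helper are all hypothetical, and each "database" is just an in-memory list.

```python
# Hypothetical in-memory "databases"; a real Z39.50 client would query remote servers.
CATALOGS = {
    "library_a": [
        {"title": "On the Origin of Species", "creator": "Darwin"},
    ],
    "library_b": [
        {"title": "Systema Naturae", "creator": "Linnaeus"},
        {"title": "The Descent of Man", "creator": "Darwin"},
    ],
}


def federated_search(catalogs, field, value):
    """Run one query against every catalog and merge the hits."""
    hits = []
    for name, records in catalogs.items():
        for record in records:
            # Case-insensitive exact match on a single field.
            if record.get(field, "").lower() == value.lower():
                hits.append((name, record["title"]))
    return hits


results = federated_search(CATALOGS, "creator", "darwin")
for source, title in results:
    print(f"{source}: {title}")
```

The point of the standard is that the caller writes one query and does not need to know how each catalog stores its records; here that is reduced to agreeing on the field name `creator`.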
Standards: Interoperability
  • Some related standards: 2. XML
      - Extensible Markup Language (XML) is a set of rules for encoding documents in machine-readable form.
  • Two extremes
      - a. Colonial model
      - b. Democratic model (won out): works with people's established computing environment
Technology
  • Technology must be stable
      - There is nothing to guarantee the stability of vast data sets
      - Failure of Paul Otlet's well-catalogued microfiches
      - Development of computer memory
      - Hard to retrieve information
Technology
  • Technology must be stable
      - Data must stay accessible and usable
      - Infrastructure will require a continued maintenance effort
  • Reasons
      - a. Data is passed from one medium to another
      - b. Data is analyzed by successive generations of database technology
Issues of Communication
  • Problem of reliable metadata
      - Metadata: data about data
      - (Slide image: the blue lines are metadata)
Issues of Communication
  • Problem of reliable metadata
      - Metadata gives a standard name to certain kinds of data
      - Searchable: easy to search over multiple databases
  • Issue: how detailed should the name of the data be?
      - Too little detail: the information about the data is useless
      - Too much detail: longer time, more work
Issues of Communication
  • Dublin Core
      - The Dublin Core set of metadata elements provides a small and fundamental group of text elements through which most resources can be described and cataloged.
      - The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights
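As a rough illustration of what a Dublin Core description looks like in practice, the sketch below stores a record as a plain dictionary and checks its keys against the 15 element names listed on the slide. The field values and the `unknown_elements` helper are assumptions made for this example; real systems usually serialize Dublin Core as XML or RDF rather than as a Python dict.

```python
# The 15 elements of the Simple Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Illustrative record; every element is optional and the values are examples only.
record = {
    "title": "Databasing the World: Biodiversity and the 2000s",
    "creator": "Bowker, G. C.",
    "type": "Text",
    "language": "en",
}


def unknown_elements(rec):
    """Return the keys of rec that are not Dublin Core element names."""
    return sorted(set(rec) - DC_ELEMENTS)


print(unknown_elements(record))              # no unknown keys in the example record
print(unknown_elements({"colour": "blue"}))  # 'colour' is not a Dublin Core element
```

A small, fixed vocabulary like this is the "not too little, not too much detail" compromise described on the previous slide: enough fields to search across databases, few enough that people will actually fill them in.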
Data Sharing
Ownership
  • Control of knowledge
      - Mid-nineteenth century: knowledge came only from professionally trained scientists and doctors
      - New information economy: knowledge comes from many people
      - Example: patient groups
Ownership
  • Privacy
      - Keeping data private is difficult
      - Example: data is compiled by a third-party company to generate a new, marketable form of knowledge
  • New patterns of ownership
      - Science has frequently been analyzed as a "public good"
      - Increasing privatization of knowledge: it is unclear to what extent the vaunted openness of the scientific community will last
Disarticulation
  • Ideal database
      - According to most practitioners it should be theory-neutral, yet serve as a common basis on which a number of scientific disciplines can progress.
      - Example: the genome databank as a new kind of science. Constructing arguments about genetic causation is not the same as the process of mapping the genome.
  • Data must be reusable by scientists
      - The data in a database should be easily manipulated by other scientists.

Data Collection
  • Biodiversity
      - Large-scale databases are being developed for a diverse array of animal and plant groups
      - Worldwide effort: IUBS, CODATA, IUMS
  • Dealing with old data
      - A theory into which data has been rolled should "remember" all its own data
      - Potentially it should also remember data that had not yet been collected
Data Collection
  • Dealing with old data: difficulties
      - Scientific papers do not, in general, offer enough information to allow an experiment or procedure to be repeated.
      - The distributed database is becoming a new model form of scientific publication in its own right
  • Issues of update
      - No automatic update from one field to a cognate one
      - Scientists are not able to share information across disciplinary divides
Data Collection
  • International technoscience
      - Purpose: narrow the gaps between countries
  • Issues
      - People do not have equal knowledge
      - Access is never really equal
      - Governments have doubts about the usefulness of opening databases to the internet
Distributed Collective Practice
Collaborative Work
  • Management structures in universities and industry still tend to support the heroic myth of the individual researcher.
  • What value do the large publishing houses add to journal production?
  • Great attention must be paid to the social and organizational setting of technoscientific work
New Knowledge Economy
  • Three central issues
      - The development of flexible, stable data standards
      - The generation of protocols for data sharing
      - The restructuring of scientific careers
Accounting For Life
Development of Classification
  • Introduction: PANDORA taxonomic database
Development of Classification
  • Importance of classification
      - 18th-19th centuries: a botanist had to know all genera and commit their names to memory, but could not be expected to remember all specific names (A. J. Cain, 1958)
      - Later part of the 19th century: new information technologies were developed that permitted the easy storage and coding of larger amounts of data than could previously be easily manipulated (Chandler, 1977; Yates, 1989)
Development of Classification
  • Example of classification: paper-based archival practice
      - Issue: hard to reclassify
      - Type specimens had to be physically relocated
      - So did series of articles or books
Development of Classification
  • Example of classification: multifaceted classification system
      - Improvement: enables the classifications to be ordered in multiple ways, rather than in a single one
      - Example: a collection of books might be classified using an author facet, a subject facet, and a date facet
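The book example above can be sketched directly: the same collection is reordered by whichever facet the reader asks for, with no physical relocation of the items. The three books and the `order_by` helper are made up for this illustration.

```python
# A made-up collection; each book carries several independent facets.
books = [
    {"title": "Book One", "author": "Miller", "subject": "zoology", "year": 1990},
    {"title": "Book Two", "author": "Adams", "subject": "botany", "year": 2005},
    {"title": "Book Three", "author": "Young", "subject": "geology", "year": 1975},
]


def order_by(collection, facet):
    """Return the titles of the collection ordered by the given facet."""
    return [b["title"] for b in sorted(collection, key=lambda b: b[facet])]


by_author = order_by(books, "author")
by_subject = order_by(books, "subject")
by_year = order_by(books, "year")
print(by_author, by_subject, by_year, sep="\n")
```

Contrast this with the paper-based practice on the previous slide, where each item can sit in only one place on the shelf and reclassifying it means moving it.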
Development of Classification
  • Example of classification: hierarchical classification (for reading the past)
      - E. F. Codd, in the early 1970s: separated the physical storage of data in the computer from the representation of that data
      - Disadvantage: it becomes awkward to introduce other levels of taxonomic category as an afterthought
      - Improved method: one record for every name, regardless of its taxonomic level
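A minimal sketch of the "one record for every name" idea follows. It is not PANDORA's actual schema, which the slides do not show; the field names `rank` and `parent` are assumptions, and the lineage skips several intermediate ranks for brevity. Because rank is just a field on each record rather than a fixed column per level, inserting a new level later means adding one record and re-pointing one parent.

```python
# One record per name, whatever its taxonomic level (assumed field names).
names = {
    "Plantae": {"rank": "kingdom", "parent": None},
    "Rosaceae": {"rank": "family", "parent": "Plantae"},
    "Rosa": {"rank": "genus", "parent": "Rosaceae"},
    "Rosa canina": {"rank": "species", "parent": "Rosa"},
}


def lineage(name):
    """Walk parent links upward and return (rank, name) pairs from the top down."""
    chain = []
    while name is not None:
        chain.append((names[name]["rank"], name))
        name = names[name]["parent"]
    return list(reversed(chain))


before = lineage("Rosa canina")

# Introducing a level as an afterthought: one new record, one changed parent.
names["Rosoideae"] = {"rank": "subfamily", "parent": "Rosaceae"}
names["Rosa"]["parent"] = "Rosoideae"

after = lineage("Rosa canina")
print(before)
print(after)
```

In a layout with one fixed column per rank, the same change would require altering the table structure and touching every row.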
Cladistics
  • Definition
      - A method of classifying species of organisms into groups called clades, which consist of 1) all the descendants of an ancestral organism and 2) the ancestor itself.
  • Features
      - Gives a more regular algorithm for determining phylogeny
      - Focuses attention on shared, derived characteristics of a set of organisms
      - Uses "outgroup" comparisons to develop the classification system
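The definition of a clade (an ancestor plus all of its descendants) translates into a short recursive function. The tiny tree below and the helper names `clade` and `is_clade` are hypothetical; the sketch only checks group membership on a tree that is already known and does not infer the tree from characters.

```python
# A tiny hypothetical tree: each node maps to its direct descendants.
children = {
    "root": ["A", "X"],
    "X": ["B", "C"],
    "A": [],
    "B": [],
    "C": [],
}


def clade(node):
    """Return the ancestor itself together with all of its descendants."""
    members = {node}
    for child in children[node]:
        members |= clade(child)
    return members


def is_clade(group):
    """A group is a clade if it equals the clade of some node in the tree."""
    return any(clade(node) == set(group) for node in children)


print(clade("X"))
print(is_clade({"X", "B", "C"}))
print(is_clade({"A", "B"}))
```

The group {A, B} fails the test because its only common ancestor is the root, and the root's clade also contains X and C.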
Cladistics
  • Tree of life
      - Cladists use cladograms, diagrams that show ancestral relations between taxa, to represent the evolutionary tree of life
      - Charles Darwin (1809–1882) was among the first to produce an evolutionary tree of life
Cladistics
  • Computer programs in cladistics
      - Analyses were undertaken using Swofford's (1985) package PAUP: version 2.4 installed on a Cyber mainframe computer and version 2.4.1 on an Amstrad 1512 PC
      - David Swofford's PAUP is a software package for inference of evolutionary trees
      - Purpose: follow a given algorithm for generating and testing cladograms
  • Issues
      - The packages produce variable results and cannot possibly look at all the possibilities, because the underlying search problem is NP-complete
      - Algorithm issues
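To give a feel for why a program cannot look at all the possibilities, the sketch below counts the distinct rooted, fully bifurcating trees for n labelled taxa using the standard formula (2n − 3)!!, the product of the odd numbers from 1 to 2n − 3. The function name is an assumption for this example, and the count says nothing about how PAUP itself searches.

```python
def rooted_tree_count(n_taxa):
    """Number of distinct rooted binary trees on n labelled taxa: (2n - 3)!!"""
    if n_taxa < 2:
        return 1
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for k in range(3, 2 * n_taxa - 2, 2):
        count *= k
    return count


for n in (3, 4, 5, 10, 20):
    print(n, rooted_tree_count(n))
```

Ten taxa already allow more than 34 million trees, so exhaustive comparison quickly becomes impractical and packages fall back on heuristic searches, which is one reason different runs can give different results.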
The Future
  • Storing life
      - Life itself is described as a program, with DNA as the code.
      - If everything is information, then life can equally well be "stored"
THANK YOU !
