A story of data won, data lost and data re-found: the realities of ecological data preservation
Alison Specht
School of Earth and Environmental Sciences
with collaborators Matt Bolton, Lee Belbin and Bryn Kingsford.
The story
Introduction
Chapter 1
Chapter 2
Chapter 3
Reflections
Specht A. Nov. 2020
Introduction: global changes in science
• Integrating the internationalisation of science
• The race for the moon and stars (the International Geophysical Year, IGY)
• The International Biological Program (IBP)
Specht A. Nov. 2020
Introduction: global changes in science
• Integrating the internationalisation of science
• The race for the moon and stars (the IGY)
• The International Biological Program (IBP)
Specht A. Nov. 2020
Introduction: global changes in science
• Integrating the internationalisation of science
• The race for the moon and stars (the IGY)
• The International Biological Program (IBP)
Specht A. Nov. 2020
Introduction: The IBP, 1964-1974
Its aim was to ensure a world-wide study of¹:
(a) organic production on the land, in freshwaters and in the seas, and the potentialities and uses of new as well as of existing natural resources, and
(b) human adaptability to changing conditions.
In addition, it aimed:
• to strengthen scientific support for developing nations through international collaboration, and
• to ensure biological productivity for the benefit of humans.
¹ Worthington E.B. (1965) Nature 208: 223-226.
Introduction: Organisation of the IBP
Seven scientific areas, each with a chair and section
committee, were established:
PT: productivity of terrestrial communities
PP: production processes
CT: conservation of terrestrial communities*
PF: productivity of freshwater communities
PM: productivity of marine communities
HA: human adaptability
UM: use and management of biological resources.
• The Royal Society agreed to host the Special Committee of the IBP (SCIBP).
• No international funding was allocated (except for the SCIBP).
* Max Nicholson, International Chair of CT, was a co-founder of the WWF, IUCN, etc.
Introduction: In Australia
A national IBP committee was established under the Australian Academy of Science (AAS), and the section structure was rationalised¹.
PT: productivity of terrestrial communities
PP: production processes
CT: conservation of terrestrial communities
PF: productivity of freshwater communities
PM: productivity of marine communities
HA: human adaptability
UM: use and management of biological resources.
¹ Frankel O.H. (1966) Aust. J. Science 28(8): 324-326.
Section PCT chair: Ray Specht
Chair: G. Humphrey
Chapter 1: IBP-CT conservation of terrestrial communities
• 1964: Max Nicholson visited Australia.
• 1966: R.L. Specht¹ outlined the CT plan for a conservation assessment of major plant communities in Australia and Papua New Guinea (PNG). No national funding was made available.
• 1974: the 'Conservation survey of major plant communities'² was published. It was supported by a team of state and territory convenors and Dr Geoff Mosley of the Australian Conservation Foundation.
¹ Specht R.L. (1966) Aust. J. Science 28(10): 377-380.
² Specht, Roe and Boughton (1974) Aust. J. Bot. Suppl. No. 7.
Chapter 2: IBP-CT conservation of terrestrial communities
1975: make an objective assessment.
Aims:
(i) collate and harmonise all the existing vegetation survey data across Australia,
(ii) convert the paper records to digital form using the new, sophisticated computer systems,
(iii) use the new non-parametric analyses available through CSIRONET to define broad plant formations / vegetation complexes, and
(iv) assess their conservation adequacy.
• Outcome: a basis for decisions on comprehensive, adequate and representative (CAR) reserves.
Chapter 2: Collation and extraction of data
Chapter 2: Data organization and entry
Because of limited computing capacity, the data were organized into state × formation datasets, and codes rather than names were used for many items.
State: N = New South Wales, V = Victoria, T = Tasmania, etc.
Formation: closed forests, chenopod shrubland, desert Acacia, etc.
LINE ID  Information
800000 N
503200 LOCATION N032 = CENTRAL COAST: SYDNEY (PIDGEON 1940)
903200 33 51 151 13
503201 COMMUNITY 01 = FRESHWATER RIVER (COMBINED LIST)
003201 UTRIAUST UTRIEXOL UTRIBILO VALLGIGA POTAOCHR POTAPERF POTATRIC BRASSCHR #
003201 NAJAMARI MYRIPROP PHRAAUST ELEOCHAR* TYPHORIE TYPHDOMI TRIGPROC TRIGSTRI #
003201 JUNCPAUC JUNCPALL JUNCPLAN AGROAVEN GAHNIA__* CASUCUNN MELALINA MELASTYP #
003201 CALLSALI EUCAROBU EUCAAMPL CAREX___* ISOLPROL VILLRENI ALISPLAN RANURIVU #
003201 GRATPUBE GOODPANI HYDRPEDU CENTASIA VIOLHEDE PRUNVULG STELFLAC SCHOAPOG #
003201 OPLIIMBE BLECINDI ADIAAETH PHILLANU #
503202 COMMUNITY 02 = FRESHWATER SWAMPS ON WIND BLOWN SAND (PORT STEPHENS)
003202 BAUMTERE BAUMARTI TRIGPROC TRIGSTRI PHILLANU LEPIARTI MELAQUIN EUCAROBU #
003202 ISOLINUN GRATPEDU DROSSPAT VILLRENI BAUMJUNC SCHOBREV RESTAUST LEPTTENA #
003202 RESTTETR SPREINCA BOROPARV EPACOBTU GONOMICR BLECINDI HYDRTRIP SPHAGNUM* #
003202 VIOLHEDE #
500000 -------------------------------
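• Programs of the time (e.g. BASIC, FORTRAN) could not handle long names such as species names, so codes were required. In this example, EUCAROBU = Eucalyptus robusta (at the time).
• Line 903200: latitude and longitude (degrees).
• LOCATION line: the source of the records (Pidgeon 1940).
• COMMUNITY 01 and 02: two species lists for this location.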
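To make the record layout concrete, here is a minimal Python sketch of a reader for the listing above. The layout is inferred from this one example, not taken from the CAVE procedures manual: we assume the first digit of the line ID gives the record type (8 = state, 5 = location or community header, 9 = coordinates, 0 = species codes), the next three digits the location number and the last two the community number. We also assume the coordinate line holds degrees and minutes of south latitude and east longitude (33°51′ S, 151°13′ E fits Sydney), although the slide labels it only as degrees. The class and function names are hypothetical, and the sample is a shortened excerpt of the listing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Community:
    number: str
    description: str
    species_codes: List[str] = field(default_factory=list)


@dataclass
class Location:
    number: str
    description: str
    lat_lon: Optional[Tuple[float, float]] = None
    communities: Dict[str, Community] = field(default_factory=dict)


def dm_to_decimal(degrees: int, minutes: int, negative: bool) -> float:
    """Convert degrees and minutes to signed decimal degrees."""
    value = degrees + minutes / 60.0
    return -value if negative else value


def parse_records(lines: List[str]) -> Tuple[Optional[str], Dict[str, Location]]:
    """Parse the (assumed) line-ID layout into locations and communities."""
    state: Optional[str] = None
    locations: Dict[str, Location] = {}
    for raw in lines:
        line_id, _, rest = raw.strip().partition(" ")
        if len(line_id) != 6 or not line_id.isdigit():
            continue  # skip headers, blank lines and anything unrecognised
        kind, loc, comm = line_id[0], line_id[1:4], line_id[4:6]
        rest = rest.strip()
        if kind == "8":
            state = rest  # state code, e.g. N = New South Wales
        elif kind == "5" and rest.startswith("LOCATION"):
            locations[loc] = Location(loc, rest.split("=", 1)[1].strip())
        elif kind == "5" and rest.startswith("COMMUNITY"):
            desc = rest.split("=", 1)[1].strip()
            locations[loc].communities[comm] = Community(comm, desc)
        elif kind == "9":
            lat_d, lat_m, lon_d, lon_m = (int(x) for x in rest.split())
            # Assumption: southern (negative) latitude, eastern longitude.
            locations[loc].lat_lon = (
                dm_to_decimal(lat_d, lat_m, negative=True),
                dm_to_decimal(lon_d, lon_m, negative=False),
            )
        elif kind == "0":
            # Species codes; "#" marks the end of each list line.
            codes = [c for c in rest.split() if c != "#"]
            locations[loc].communities[comm].species_codes.extend(codes)
        # Other type-5 lines (e.g. the 500000 separator) are ignored.
    return state, locations


SAMPLE = """\
800000 N
503200 LOCATION N032 = CENTRAL COAST: SYDNEY (PIDGEON 1940)
903200 33 51 151 13
503201 COMMUNITY 01 = FRESHWATER RIVER (COMBINED LIST)
003201 OPLIIMBE BLECINDI ADIAAETH PHILLANU #
503202 COMMUNITY 02 = FRESHWATER SWAMPS ON WIND BLOWN SAND (PORT STEPHENS)
003202 VIOLHEDE #
500000 -------------------------------
""".splitlines()

state, locations = parse_records(SAMPLE)
site = locations["032"]
print(state, site.lat_lon, site.communities["01"].species_codes)
```

Keying the parsed structure by location and community number mirrors the way the listing nests each species list under its LOCATION and COMMUNITY headers.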
Formation | Locations | Communities | Species*
Closed forests | n/a | 644 | 1,418
Dry scrubs – SE Queensland | 232 | 232 | 475
Dry scrubs – Northern Territory | n/a | 1,219 | 559
Eucalypt open-forests and woodlands (tree species) | 201 | 1,275 | 276
Sclerophyll vegetation – SW Western Australia | 64 | 172 | 1,761
Sclerophyll vegetation – Central and Eastern Australia | 188 | 549 | 2,581**
Sclerophyll vegetation – heathland and tall shrubland | 136 | 312 | 2,071**
Alpine vegetation | 73 | 61 | 556
Savanna understorey | 56 | 198 | 1,313
Mallee open-scrub | 28 | 41 | 395
Desert Acacia | 54 | 148 | 1,229
Chenopod shrubland | 30 | 68 | 410
Forested wetlands (including brigalow) | 31 | 36 | 193
Arid wetlands | 20 | 42 | 642
Freshwater swamp vegetation | 80 | 80 | 139
Coastal dune vegetation | 45 | 56 | 315
Coastal wetland vegetation (mangroves and saltmarshes) | n/a | 15 | 74
* Not including introduced species or singletons within the formation.
** Not including tree species > 10 m tall.
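The species codes in the record listing (e.g. EUCAROBU = Eucalyptus robusta) appear to follow a simple convention: the first four letters of the genus plus the first four of the epithet, with genus-only entries cut or padded to eight characters with underscores and flagged with an asterisk (GAHNIA__*, CAREX___*, ELEOCHAR*). This rule is inferred from the examples only (ACAKEMP, for instance, does not fit it), and the helper names below are hypothetical. The sketch also shows why converting codes back to names needs care: distinct names can collapse onto the same code.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def make_code(genus: str, epithet: Optional[str] = None) -> str:
    """Build an 8-character code under the inferred convention."""
    if epithet:
        return (genus[:4] + epithet[:4]).upper()
    # Genus-only entries: up to 8 letters, padded with "_", flagged with "*".
    return genus[:8].upper().ljust(8, "_") + "*"


def find_collisions(names: List[str]) -> Dict[str, List[str]]:
    """Group names by generated code and keep codes shared by 2+ names."""
    by_code: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        parts = name.split()
        epithet = parts[1] if len(parts) > 1 else None
        by_code[make_code(parts[0], epithet)].append(name)
    return {code: group for code, group in by_code.items() if len(group) > 1}


print(make_code("Eucalyptus", "robusta"))  # EUCAROBU
print(make_code("Gahnia"))                 # GAHNIA__*
print(find_collisions(["Acacia longifolia", "Acacia longissima", "Eucalyptus robusta"]))
```

The collision between two wattles is only an illustration of the risk; how the original project disambiguated such cases is not described here.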
Chapter 2: Data handling
• Entry and storage
  • Desktop computers were used for data entry to the University of Queensland's (UQ) PDP-10. Nine-track magnetic tapes were used for regular backup.
• Analysis
  • Analyses were run on the CSIRONET mainframe computer (TAXON and TWINSPAN).
  • Hard copies (print-outs for proofing, and run outputs) were obtained throughout.
• Data processing
  • Described in a procedures manual (CAVE: M.P. Bolton).
Chapter 2: Result
1995: 911 objectively defined plant communities, mapped, with keys for their identification, their conservation status, and a biogeographic regionalization…
Specht R.L., Specht A., Whelan M. and Hegarty E. (1995) Conservation Atlas of Plant Communities in Australia. Centre for Coastal Management and Southern Cross University Press. (2 kg)
Specht R.L. and Specht A. (2002) Objective classification of plant communities in tropical and subtropical Australia. Proceedings of the Royal Society of Queensland 110: 65-82.
Specht R.L. and Specht A. (2013) Australia: Biodiversity of Ecosystems. In: Encyclopedia of Biodiversity, Vol. 1 (ed. S.A. Levin et al.), pp. 291-306. Waltham, MA: Academic Press.
1995: 911 objectively-defined plant
communities, mapped, keys for their
identification, their conservation status,
and biogeographic regionalization…
Specht A. Nov. 2020
Chapter 2: Result
Specht R.L., Specht A., Whelan M. Hegarty E. (1995) Conservation Atlas of Plant Communities in Australia. Centre for Coastal Management
and Southern Cross University Press. (2kg)
Specht R.L. and Specht A. (2002) Objective classification of plant communities in tropical and subtropical Australia. Proceedings of the
Royal Society of Queensland 110: 65-82.
Specht R.L.. and Specht A. (2013) Australia: Biodiversity of Ecosystems. In, The Encyclopedia of Biodiversity Vol. 1 (ed. B. Levin, et al.) pp
291-306. Waltham, MA: Academic Press.
1995: 911 objectively-defined plant
communities, mapped, keys for their
identification, their conservation status,
and biogeographic regionalization…
Specht A. Nov. 2020
The story
Introduction
Chapter 1
Chapter 2
Chapter 3
Reflections
Specht A. Nov. 2020
But what about the data?
• In 1995 there was no ‘home’ for the data, so the data were ‘saved’ on the magnetic tapes, and subsequently on Exabyte tapes when the mainframe tape reader was decommissioned. The print-outs were conserved.
• High-level data for biogeographical analysis (with PATN) were saved in Excel.
• So there they sat… until someone cared…
Why should we care?
Value proposition
These are heritage data. They were collected on field trips between 1879 and 1989 and provide unique records for comparison.
Repeating the initial project work would be painful, if not impossible.
Opportunities in the 2010s
New data initiatives were emerging and becoming globally linked.
And key members of the original team were still alive and personally invested, and new team members had been identified.
👉 Can we save the data and finally make it available?
Chapter 3: What do we need to do?
• Recover available data
• Design an appropriate structure
• Update the species codes/names to current nomenclature
• Update georeferencing and check for errors
• Map the fields used in the Conservation Atlas project to the Darwin Core standard
• Publish the data in an open repository
Terrestrial Ecosystem Research Network
Chapter 3: Recover available data
• Challenge: find (a) the Exabyte tapes and (b) an Exabyte tape reader.
• While waiting, the team started on the print-outs:
  • Master sites file (location, formation and community data)
  • Reference file (source data)
• Finally, the last Exabyte tape reader in captivity was found (just as it was about to be decommissioned)!
• Two major challenges remained: updating georeferences and species names.
Georeferencing
Original locations were accurate to only about half a degree, so the team did four things:
• Reviewed original documents and, where possible, contacted authors to update locations. Original articles were traced through online resources such as the Biodiversity Heritage Library and the National Library of Australia, although maps in appendices were often not scanned in the digital copies of old journals.
• Checked locations on Google Maps.
• Checked locations on the Atlas of Living Australia's (ALA) Spatial Portal, so that vegetation and soil type could be displayed for checking.
• Mapped the data repeatedly on the ALA sandbox site.
Coordinate precision was then estimated to reflect confidence in the range of the community.
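One way to turn "accurate to half a degree" into a coordinate precision estimate is to take the radius of a circle that encloses a ±p° box around the recorded point. This is a minimal Python sketch on a spherical Earth (an assumption; the slides do not say how the team computed precision), with a hypothetical function name. The result is the kind of value that would populate the Darwin Core term coordinateUncertaintyInMeters.

```python
import math

# Mean Earth radius in metres; a spherical approximation (assumption).
EARTH_RADIUS_M = 6_371_008.8
METRES_PER_DEGREE = 2 * math.pi * EARTH_RADIUS_M / 360  # roughly 111.2 km


def uncertainty_radius_m(lat_deg: float, precision_deg: float) -> float:
    """Radius (m) of the circle enclosing a +/- precision_deg box at lat_deg."""
    half_height = precision_deg * METRES_PER_DEGREE
    # Degrees of longitude shrink with the cosine of latitude.
    half_width = half_height * math.cos(math.radians(lat_deg))
    return math.hypot(half_width, half_height)


# Half a degree of error either way at Sydney's latitude (about 33.85 S).
print(round(uncertainty_radius_m(-33.85, 0.5)))
```

At Sydney's latitude a ±0.5° box gives a radius of roughly 72 km; reading "half a degree" as the full box width (±0.25°) would halve that.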
Species names
Sequential row number | Validity and growth habit flag | Species code | Original scientific name | Scientific name updated during Conservation Atlas project
2 | L G | ABELMOSC | Abelmoschus moschatus | –
19 | LMG | ACACARGY | Acacia argyrodendron | –
20 | SZG | ACACARMA -> ACACPARA | Acacia armata | Acacia paradoxa
21 | MLG | ACACASHA -> ACACOSHA | Acacia ashanesii | Acacia oshanesii
174 | S G | ACAKEMP | Acacia sp. aff. A. sibirica | Acacia sp. aff. A. kempeana
466 | S G | BORRCARP/ -> SPERSTEN/ | Borreria sp. aff. carpentariae | Spermacoce sp. aff. stenophylla
704 | S G | CARPAEQU -> CARPMODE | Carpobrotus aequilaterus | Carpobrotus modestus
705 | L G | CARPMODE | Carpobrotus modestus | –
Codes to names:
• apply the master species conversion file
• blend across formations (with caution, as some species names are location- and formation-specific)
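The two conversion steps can be sketched in Python as follows. The tables are hypothetical excerpts built from the example rows above (the real master species conversion file is not shown), and `resolve_code` and `code_to_name` are illustrative names. Old codes are followed through any "->" chain to a final code, and a (formation, code) override is checked first so that a formation-specific meaning wins over the blended master list; unknown codes return None, flagging them for manual checking.

```python
from typing import Dict, Optional, Tuple

# Hypothetical excerpt of a conversion file: old code -> replacement code.
CONVERSIONS: Dict[str, str] = {
    "ACACARMA": "ACACPARA",
    "ACACASHA": "ACACOSHA",
    "CARPAEQU": "CARPMODE",
}

# Hypothetical excerpt of a master name table: final code -> current name.
MASTER_NAMES: Dict[str, str] = {
    "ABELMOSC": "Abelmoschus moschatus",
    "ACACPARA": "Acacia paradoxa",
    "ACACOSHA": "Acacia oshanesii",
    "CARPMODE": "Carpobrotus modestus",
}


def resolve_code(code: str, conversions: Dict[str, str]) -> str:
    """Follow old -> new code chains to the final code, rejecting cycles."""
    seen = set()
    while code in conversions:
        if code in seen:
            raise ValueError(f"conversion cycle involving {code}")
        seen.add(code)
        code = conversions[code]
    return code


def code_to_name(code: str, formation: str,
                 overrides: Optional[Dict[Tuple[str, str], str]] = None
                 ) -> Optional[str]:
    """Prefer a (formation, code) override; None means 'check manually'."""
    overrides = overrides or {}
    if (formation, code) in overrides:
        return overrides[(formation, code)]
    return MASTER_NAMES.get(resolve_code(code, CONVERSIONS))


print(code_to_name("CARPAEQU", "Coastal dune vegetation"))  # Carpobrotus modestus
print(code_to_name("ZZZZZZZZ", "Coastal dune vegetation"))  # None: check manually
```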
Update to current nomenclature
Stage 1. Current name check
Because of the size of the data set, the Atlas of Living Australia name lookup web service (BIE) was used, and each name was allocated a code marking whether it needed follow-up:
CODE | Meaning | Action
MATCH | Near-exact match or better | accept
PARTIAL-L and PARTIAL-R | A significant substring match | manual check
FUZZY | Fuzzy matching algorithm built on the score from the web service, using 'letter-pair similarity' | manual check
WEAK | A weak match falling below thresholds; the best match is retained | manual check
TAXM | No match, or a major problem with the original or subsequent species name | refer to expert
Stage 2. Validation
Stage 3. Reference to an expert
Resources used included:
1. On-line national species records
2. State species records
3. Books and papers
4. Experts
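The ALA web service supplied the scores used in Stage 1. The Python sketch below instead computes a common formulation of letter-pair similarity locally (twice the number of shared adjacent letter pairs, taken word by word, divided by the total number of pairs) and assigns MATCH, FUZZY, WEAK or TAXM codes. The thresholds 0.95 and 0.7 are hypothetical, the PARTIAL-L/PARTIAL-R codes are omitted, and the scores will not necessarily agree with the service's.

```python
from collections import Counter
from typing import List, Optional, Tuple


def letter_pairs(text: str) -> List[str]:
    """Adjacent letter pairs within each word (pairs never span a space)."""
    pairs: List[str] = []
    for word in text.upper().split():
        pairs.extend(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def similarity(a: str, b: str) -> float:
    """2 * shared pairs / total pairs, matching each pair at most once."""
    pairs_a, pairs_b = letter_pairs(a), letter_pairs(b)
    total = len(pairs_a) + len(pairs_b)
    if total == 0:
        return 0.0
    remaining = Counter(pairs_b)
    shared = 0
    for pair in pairs_a:
        if remaining[pair] > 0:
            shared += 1
            remaining[pair] -= 1
    return 2.0 * shared / total


def classify(name: str, candidates: List[str],
             match_at: float = 0.95, fuzzy_at: float = 0.7
             ) -> Tuple[str, Optional[str], float]:
    """Return (code, best candidate, score) for one original name."""
    if not candidates:
        return ("TAXM", None, 0.0)  # nothing to compare: refer to expert
    best = max(candidates, key=lambda c: similarity(name, c))
    score = similarity(name, best)
    if score >= match_at:
        code = "MATCH"
    elif score >= fuzzy_at:
        code = "FUZZY"
    elif score > 0:
        code = "WEAK"  # below thresholds, but the best match is retained
    else:
        return ("TAXM", None, 0.0)
    return (code, best, score)


print(classify("Acacia ashanesii", ["Acacia oshanesii", "Acacia paradoxa"]))
```

Here the original spelling "ashanesii" scores about 0.92 against "Acacia oshanesii" and is flagged FUZZY for manual checking, in line with the updated name in the species table.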
Chapter 3: Map the fields used in the Conservation Atlas to the Darwin Core standard
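A minimal Python sketch of a field mapping of this kind follows. The left-hand field names are hypothetical stand-ins for the Conservation Atlas fields, the right-hand names are Darwin Core terms, and the particular choices (community description to habitat, source to associatedReferences, a constant basisOfRecord of HumanObservation, the record ID format) are assumptions rather than the project's published mapping. The output is CSV, one of the formats in which the recovered data were delivered.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical internal field names (left) mapped to Darwin Core terms (right).
FIELD_MAP: Dict[str, str] = {
    "record_id": "occurrenceID",
    "species_name": "scientificName",
    "lat_decimal": "decimalLatitude",
    "lon_decimal": "decimalLongitude",
    "uncertainty_m": "coordinateUncertaintyInMeters",
    "location_name": "locality",
    "state": "stateProvince",
    "community": "habitat",
    "source_ref": "associatedReferences",
}


def to_darwin_core(rows: Iterable[Dict[str, object]]) -> List[Dict[str, object]]:
    """Rename mapped fields to Darwin Core terms; unmapped fields are dropped."""
    records = []
    for row in rows:
        dwc = {FIELD_MAP[k]: v for k, v in row.items() if k in FIELD_MAP}
        # Assumed constant: records derive from compiled field survey lists.
        dwc["basisOfRecord"] = "HumanObservation"
        records.append(dwc)
    return records


def to_csv(records: List[Dict[str, object]]) -> str:
    """Serialise records as CSV with a sorted union of all columns."""
    columns = sorted({key for rec in records for key in rec})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


row = {
    "record_id": "N032-01-EUCAROBU",  # hypothetical identifier scheme
    "species_name": "Eucalyptus robusta",
    "lat_decimal": -33.85,
    "lon_decimal": 151.2167,
    "location_name": "Central Coast: Sydney",
    "state": "New South Wales",
    "community": "Freshwater river (combined list)",
    "source_ref": "Pidgeon 1940",
    "internal_note": "dropped: no Darwin Core mapping",
}
print(to_csv(to_darwin_core([row])))
```

Keeping the mapping in a single table makes it easy to review against the Darwin Core term list and to extend as further fields are mapped.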
Chapter 3: spatial view
Chapter 3: Data delivery
• Ingested into the Atlas of Living Australia as a collection, discoverable through species records with associated metadata:
  • https://collections.ala.org.au/public/show/dr8212
• Delivered as CSV and Excel files, with associated code for replication, in the Knowledge Network for Biocomplexity:
  • http://doi.org/10.5063/F1QC01QK
• Perhaps one day discoverable as survey information.
Reflection: How did we do?
• Data saved, updated and deposited for future use in two stable repositories.
• 9,450 taxa found in 1,390 communities at 461 locations across the continent of Australia, recorded between 1879 and 1989 from 714 sources. This is a lot!
Sadly, this represents only around half of the material used in Chapter 2 (the Atlas). Why?
Primary cause: loss of data on transfer from magnetic tape to Exabyte tape back in 1991. In some instances, it appears those data cannot be found elsewhere.
Reflection: take home messages
• Researchers and field workers need access to, or expertise in, data science skills (FAIR).
• Data (with metadata) need to be deposited as soon as possible after creation.
• We need to think ahead: today’s technology will be outmoded in 5-10 years.
• We need repositories that are open, secure and properly managed (TRUST).
• Team with others, especially those with diverse but relevant skills.
Without this, more data will be lost than were ever gathered.
Thank you!
Contact: a.specht@uq.edu.au
School of Earth and Environmental Sciences
Ecosystem Research Analyst, TERN
The University of Queensland, Australia
Biodiversity Data Journal https://bdj.pensoft.net/article/28073/
Knowledge Network for Biocomplexity https://knb.ecoinformatics.org/#view/doi:10.5063/F1QC01QK
Atlas of Living Australia https://collections.ala.org.au/public/show/dr8212
Retrospective Analysis of Antarctic Tracking Data
Community assembly on remote islands: does equilibrium theory apply?
African rainforest dynamics: interactions between ecological processes and co...
Community resistance to biological invasions : role of diversity and network ...
Origin and congruence of taxonomic, phylogenetic and functional diversity in ...
How local-scale processes build up the large-scale response of butterflies to...
NETSEED : a cross-disciplinary project to analyse how small farms contribute ...
The linkages between biodiversity and the transmission of emerging infectious...
Macroecology of species pools: insights from network theory
Reef fish, newcomers to macro-ecology
Feedback of a couple of eco-informatic tools for soil invertebrate functional...
Global patterns of insect diiversity, distribution and evolutionary distinctness
Biodiversity of intermittent rivers: analysis & synthesis
Coreids massol 171107
Stockwell aslo2017 session024_final
Data sharing archiving discovery, Bill Michener
Michener workshop montpellier
Ad

Recently uploaded (20)

PPTX
"One Earth Celebrating World Environment Day"
PDF
Tree Biomechanics, a concise presentation
PPTX
Disposal Of Wastes.pptx according to community medicine
PDF
2-Reqerwsrhfdfsfgtdrttddjdiuiversion 2.pdf
PPTX
Environmental Ethics: issues and possible solutions
PDF
Ornithology-Basic-Concepts.pdf..........
DOCX
Epoxy Coated Steel Bolted Tanks for Leachate Storage Securely Contain Landfil...
PDF
Insitu conservation seminar , national park ,enthobotanical significance
PPTX
Topic Globalisation and Lifelines of National Economy (1).pptx
PPTX
Green and Cream Aesthetic Group Project Presentation.pptx
PDF
Effective factors on adoption of intercropping and it’s role on development o...
PPT
PPTPresentation3 jhsvdasvdjhavsdhsvjcksjbc.jasb..ppt
DOCX
Epoxy Coated Steel Bolted Tanks for Beverage Wastewater Storage Manages Liqui...
PDF
Earthquake, learn from the past and do it now.pdf
PDF
Effect of salinity on biochimical and anatomical characteristics of sweet pep...
PPTX
Envrironmental Ethics: issues and possible solution
PDF
Effect of anthropisation and revegetation efforts on soil bacterial community...
PPTX
UN Environmental Inventory User Training 2021.pptx
PDF
School Leaders Revised Training Module, SCB.pdf
PPTX
FIRE SAFETY SEMINAR SAMPLE FOR EVERYONE.pptx
"One Earth Celebrating World Environment Day"
Tree Biomechanics, a concise presentation
Disposal Of Wastes.pptx according to community medicine
2-Reqerwsrhfdfsfgtdrttddjdiuiversion 2.pdf
Environmental Ethics: issues and possible solutions
Ornithology-Basic-Concepts.pdf..........
Epoxy Coated Steel Bolted Tanks for Leachate Storage Securely Contain Landfil...
Insitu conservation seminar , national park ,enthobotanical significance
Topic Globalisation and Lifelines of National Economy (1).pptx
Green and Cream Aesthetic Group Project Presentation.pptx
Effective factors on adoption of intercropping and it’s role on development o...
PPTPresentation3 jhsvdasvdjhavsdhsvjcksjbc.jasb..ppt
Epoxy Coated Steel Bolted Tanks for Beverage Wastewater Storage Manages Liqui...
Earthquake, learn from the past and do it now.pdf
Effect of salinity on biochimical and anatomical characteristics of sweet pep...
Envrironmental Ethics: issues and possible solution
Effect of anthropisation and revegetation efforts on soil bacterial community...
UN Environmental Inventory User Training 2021.pptx
School Leaders Revised Training Module, SCB.pdf
FIRE SAFETY SEMINAR SAMPLE FOR EVERYONE.pptx

Data recovery of archival data: a temporal story

  • 1. A story of data won, data lost and data re-found: the realities of ecological data preservation Alison Specht School of Earth and Environmental Sciences with collaborators Matt Bolton, Lee Belbin and Bryn Kingsford.
  • 2. The story Introduction Chapter 1 Chapter 2 Chapter 3 Reflections Specht A. Nov. 2020
  • 5. Introduction: global changes in science
    • The internationalisation of science
    • The race for the moon and stars (the International Geophysical Year, IGY)
    • The International Biological Program (IBP)
  • 8. Introduction: The IBP, 1964-1974
    The aim was to ensure a world-wide study of¹:
    (a) organic production on the land, in freshwaters and in the seas, and the potentialities and uses of new as well as existing natural resources; and
    (b) human adaptability to changing conditions.
    In addition, the IBP aimed:
    • to strengthen scientific support for developing nations through international collaboration; and
    • to ensure biological productivity for the benefit of humans.
    ¹ Worthington E.B. (1965) Nature 208: 223-226.
  • 12. Introduction: Organisation of the IBP
    Seven scientific areas were established, each with a chair and a section committee:
    PT: productivity of terrestrial communities
    PP: production processes
    CT: conservation of terrestrial communities*
    PF: productivity of freshwater communities
    PM: productivity of marine communities
    HA: human adaptability
    UM: use and management of biological resources
    • The Royal Society agreed to host the Special Committee of the IBP (SCIBP).
    • No international funding was allocated, except for the SCIBP.
    * Max Nicholson, the international Chair of CT, was a co-founder of the WWF, IUCN, etc.
  • 16. Introduction: In Australia
    A national IBP committee was established under the Australian Academy of Science (AAS), and the section structure was rationalised¹:
    PT: productivity of terrestrial communities
    PP: production processes
    CT: conservation of terrestrial communities
    PF: productivity of freshwater communities
    PM: productivity of marine communities
    HA: human adaptability
    UM: use and management of biological resources
    Section PCT: Chair Ray Specht. Chair: G. Humphrey.
    ¹ Frankel O.H. (1966) Aust. J. Science 28(8): 324-326.
  • 19. Chapter 1: IBP-CT conservation of terrestrial communities
    • 1964: Max Nicholson visited Australia.
    • 1966: R.L. Specht¹ outlined the CT plan for a conservation assessment of major plant communities in Australia and Papua New Guinea (PNG). No national funding was made available.
    • 1974: the 'Conservation survey of major plant communities'² was published, supported by a team of state and territory convenors and Dr Geoff Mosley of the Australian Conservation Foundation.
    ¹ Specht (1966) Aust. J. Science 28(10): 377-380.
    ² Specht, Roe and Boughton (1974) Aust. J. Bot. Suppl. no. 7.
  • 23. Chapter 2: IBP-CT conservation of terrestrial communities
    1975: make an objective assessment.
    Aims:
    (i) collate and harmonise all existing vegetation survey data across Australia;
    (ii) convert paper records to digital form using the new, sophisticated computer systems;
    (iii) use the new non-parametric analyses available through CSIRONET to define broad plant formations / vegetation complexes; and
    (iv) assess their conservation adequacy.
    • Outcome: a basis for decisions on CAR (comprehensive, adequate and representative) reserves.
  • 26. Chapter 2: Collation and extraction of data
  • 34. Chapter 2: Data organization and entry
    Because of limited computing capacity, the data were organized into state x formation datasets, and codes were used for many items instead of names.
    • State: N = New South Wales, V = Victoria, T = Tasmania, etc.
    • Formation: closed forests, chenopod shrubland, desert acacia, etc.
    • Programs of the day (e.g. BASIC, FORTRAN) could not handle long names such as species names, so codes were required. In this example, EUCAROBU = Eucalyptus robusta (as named at the time).
    In the example below, the source is given in brackets on the LOCATION line, the 903200 line holds latitude and longitude in degrees and minutes (33°51′ S, 151°13′ E), and there are two community lists for this location.

    LINE ID  Information
    800000   N
    503200   LOCATION N032 = CENTRAL COAST: SYDNEY (PIDGEON 1940)
    903200   33 51 151 13
    503201   COMMUNITY 01 = FRESHWATER RIVER (COMBINED LIST)
    003201   UTRIAUST UTRIEXOL UTRIBILO VALLGIGA POTAOCHR POTAPERF POTATRIC BRASSCHR #
    003201   NAJAMARI MYRIPROP PHRAAUST ELEOCHAR* TYPHORIE TYPHDOMI TRIGPROC TRIGSTRI #
    003201   JUNCPAUC JUNCPALL JUNCPLAN AGROAVEN GAHNIA__* CASUCUNN MELALINA MELASTYP #
    003201   CALLSALI EUCAROBU EUCAAMPL CAREX___* ISOLPROL VILLRENI ALISPLAN RANURIVU #
    003201   GRATPUBE GOODPANI HYDRPEDU CENTASIA VIOLHEDE PRUNVULG STELFLAC SCHOAPOG #
    003201   OPLIIMBE BLECINDI ADIAAETH PHILLANU #
    503202   COMMUNITY 02 = FRESHWATER SWAMPS ON WIND BLOWN SAND (PORT STEPHENS)
    003202   BAUMTERE BAUMARTI TRIGPROC TRIGSTRI PHILLANU LEPIARTI MELAQUIN EUCAROBU #
    003202   ISOLINUN GRATPEDU DROSSPAT VILLRENI BAUMJUNC SCHOBREV RESTAUST LEPTTENA #
    003202   RESTTETR SPREINCA BOROPARV EPACOBTU GONOMICR BLECINDI HYDRTRIP SPHAGNUM* #
    003202   VIOLHEDE #
    500000   -------------------------------

    Formation | Locations | Communities | Species*
    Closed forests | n/a | 644 | 1,418
    Dry scrubs – SE Queensland | 232 | 232 | 475
    Dry scrubs – Northern Territory | n/a | 1,219 | 559
    Eucalypt open-forests and woodlands (tree species) | 201 | 1,275 | 276
    Sclerophyll vegetation SW Western Australia | 64 | 172 | 1,761
    Sclerophyll vegetation Central and Eastern Australia | 188 | 549 | 2,581**
    Sclerophyll vegetation – heathland and tall shrubland | 136 | 312 | 2,071**
    Alpine vegetation | 73 | 61 | 556
    Savanna understorey | 56 | 198 | 1,313
    Mallee open-scrub | 28 | 41 | 395
    Desert Acacia | 54 | 148 | 1,229
    Chenopod shrubland | 30 | 68 | 410
    Forested wetlands (including brigalow) | 31 | 36 | 193
    Arid wetlands | 20 | 42 | 642
    Freshwater swamp vegetation | 80 | 80 | 139
    Coastal dune vegetation | 45 | 56 | 315
    Coastal wetland vegetation (mangroves and saltmarshes) | n/a | 15 | 74
    * Not including introduced species or singletons within the formation.
    ** Not including tree species > 10 m tall.
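For anyone facing similar legacy files, the line-ID layout in the example above can be read back with a short parser. This is a minimal Python sketch, not the project's own code: the meaning of each line-ID prefix (800000 = state, 5xxxxx = LOCATION or COMMUNITY header, 9xxxxx = coordinates, 0xxxxx = species codes ending in '#', 500000 = end of block) is inferred from this single example, the last four digits are assumed to key a location or community, and coordinates are assumed to be degrees and minutes south and east. `SAMPLE` is an excerpt of the records shown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Community:
    label: str
    species: list[str] = field(default_factory=list)


@dataclass
class Location:
    state: str
    label: str
    lat: Optional[float] = None
    lon: Optional[float] = None
    communities: dict[str, Community] = field(default_factory=dict)


def dm_to_decimal(deg: int, minutes: int, negative: bool) -> float:
    """Degrees and minutes to signed decimal degrees."""
    value = deg + minutes / 60.0
    return -value if negative else value


def parse_records(lines: list[str]) -> list[Location]:
    """Group line-ID records into locations and their community species lists."""
    locations: list[Location] = []
    state = ""
    for raw in lines:
        line_id, _, info = raw.strip().partition(" ")
        info = info.strip()
        if len(line_id) != 6 or not line_id.isdigit():
            continue  # column header or malformed line
        kind, key = line_id[0], line_id[2:]  # assumed: last 4 digits are the key
        if line_id == "800000":
            state = info  # state code, e.g. N = New South Wales
        elif line_id == "500000":
            continue  # block separator
        elif kind == "5" and info.startswith("LOCATION"):
            locations.append(Location(state=state, label=info))
        elif kind == "9":
            lat_d, lat_m, lon_d, lon_m = (int(x) for x in info.split())
            # Assumed southern latitude and eastern longitude (Australia).
            locations[-1].lat = dm_to_decimal(lat_d, lat_m, negative=True)
            locations[-1].lon = dm_to_decimal(lon_d, lon_m, negative=False)
        elif kind == "5" and info.startswith("COMMUNITY"):
            locations[-1].communities[key] = Community(label=info)
        elif kind == "0":
            codes = [c for c in info.split() if c != "#"]  # '#' ends each line
            locations[-1].communities[key].species.extend(codes)
    return locations


SAMPLE = """\
LINE ID  Information
800000   N
503200   LOCATION N032 = CENTRAL COAST: SYDNEY (PIDGEON 1940)
903200   33 51 151 13
503201   COMMUNITY 01 = FRESHWATER RIVER (COMBINED LIST)
003201   UTRIAUST UTRIEXOL UTRIBILO VALLGIGA POTAOCHR POTAPERF POTATRIC BRASSCHR #
003201   OPLIIMBE BLECINDI ADIAAETH PHILLANU #
503202   COMMUNITY 02 = FRESHWATER SWAMPS ON WIND BLOWN SAND (PORT STEPHENS)
003202   VIOLHEDE #
500000   -------------------------------
"""

for loc in parse_records(SAMPLE.splitlines()):
    print(loc.state, loc.label, round(loc.lat, 3), round(loc.lon, 3))
    for key, community in loc.communities.items():
        print(" ", key, community.label, len(community.species), "codes")
```

Keeping the raw codes untouched at this stage, and translating them to names in a separate step, mirrors the way the project kept code-to-name conversion as its own task.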
  • 35. Chapter 2: Data handling
    • Entry and storage: desktop computers were used for data entry to UQ's (University of Queensland's) PDP-10; 9-track magnetic tapes were used for regular backups.
    • Analysis: on the CSIRONET mainframe computer (TAXON and TWINSPAN). Hard copies (print-outs for proofing, and run outputs) were kept throughout.
    • Data processing: described in a procedures manual (CAVE: M.P. Bolton).
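TAXON and TWINSPAN are not reproduced here. As a hedged illustration of the presence/absence comparison that such classifications start from, this Python sketch compares the two community lists in the data-entry example above using the Jaccard and Sørensen coefficients. The choice of coefficients is an assumption for illustration, not a description of what the CSIRONET runs used.

```python
# Species codes copied from the two community lists in the data-entry example.
COMMUNITY_01 = set("""
UTRIAUST UTRIEXOL UTRIBILO VALLGIGA POTAOCHR POTAPERF POTATRIC BRASSCHR
NAJAMARI MYRIPROP PHRAAUST ELEOCHAR* TYPHORIE TYPHDOMI TRIGPROC TRIGSTRI
JUNCPAUC JUNCPALL JUNCPLAN AGROAVEN GAHNIA__* CASUCUNN MELALINA MELASTYP
CALLSALI EUCAROBU EUCAAMPL CAREX___* ISOLPROL VILLRENI ALISPLAN RANURIVU
GRATPUBE GOODPANI HYDRPEDU CENTASIA VIOLHEDE PRUNVULG STELFLAC SCHOAPOG
OPLIIMBE BLECINDI ADIAAETH PHILLANU
""".split())

COMMUNITY_02 = set("""
BAUMTERE BAUMARTI TRIGPROC TRIGSTRI PHILLANU LEPIARTI MELAQUIN EUCAROBU
ISOLINUN GRATPEDU DROSSPAT VILLRENI BAUMJUNC SCHOBREV RESTAUST LEPTTENA
RESTTETR SPREINCA BOROPARV EPACOBTU GONOMICR BLECINDI HYDRTRIP SPHAGNUM*
VIOLHEDE
""".split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared species / species in either list (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def sorensen(a: set[str], b: set[str]) -> float:
    """2 x shared species / (size of a + size of b)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


shared = COMMUNITY_01 & COMMUNITY_02
print(sorted(shared))
print(f"Jaccard {jaccard(COMMUNITY_01, COMMUNITY_02):.3f}, "
      f"Sorensen {sorensen(COMMUNITY_01, COMMUNITY_02):.3f}")
```

The two lists share 7 codes out of 62 distinct ones, so the communities come out as quite dissimilar, which is what one would expect for a river list and a swamp list.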
  • 40. Chapter 2: Result
    1995: 911 objectively defined plant communities, mapped, with keys for their identification, their conservation status, and a biogeographic regionalization…
    • Specht R.L., Specht A., Whelan M. and Hegarty E. (1995) Conservation Atlas of Plant Communities in Australia. Centre for Coastal Management and Southern Cross University Press. (2 kg)
    • Specht R.L. and Specht A. (2002) Objective classification of plant communities in tropical and subtropical Australia. Proceedings of the Royal Society of Queensland 110: 65-82.
    • Specht R.L. and Specht A. (2013) Australia: Biodiversity of Ecosystems. In: The Encyclopedia of Biodiversity, Vol. 1 (ed. S.A. Levin et al.), pp. 291-306. Waltham, MA: Academic Press.
  • 43. But what about the data?
    • In 1995 there was no 'home' for the data, so they were 'saved' on the magnetic tapes, and later on exabyte tapes when the mainframe tape reader was decommissioned. The print-outs were kept.
    • High-level data for the biogeographical analysis (using PATN) were saved in Excel.
    • So there they sat… until someone cared…
  • 44. Why should we care?
    Value proposition
    • These are heritage data. They were collected on field trips from 1879 to 1989 and provide unique records for comparison.
    • Repeating the initial project work would be painful, if not impossible.
    Opportunities
    • In the 2010s, new data initiatives were emerging and becoming globally linked.
    • AND key members of the team were still alive and personally invested, and new team members had been identified.
    👉 Can we save the data and finally make them available?
  • 47. Chapter 3: what do we need to do?
    • Recover the available data
    • Design an appropriate structure
    • Update the species codes/names to current nomenclature
    • Update georeferencing and check for errors
    • Map the fields used in the Conservation Atlas project to the Darwin Core standard
    • Publish the data in an open repository
    Terrestrial Ecosystem Research Network
  • 48. Chapter 3: Recover available data
    • Challenge: find (a) the exabyte tapes and (b) an exabyte tape reader.
    • While waiting, the team started on the print-outs:
      • master sites file (location, formation and community data)
      • reference file (source data)
    • Finally, the last exabyte tape reader in captivity was found (and it was about to be decommissioned)!
    • Two major challenges remained: updating georeferences and species names.
  • 49. Georeferencing
    Original locations were accurate only to half a degree, so the team did four things:
    • reviewed the original documents (articles) and, where possible, contacted authors to update locations;
    • checked locations on Google Maps;
    • checked locations on the Atlas of Living Australia (ALA) Spatial Portal, so that vegetation and soil type could be displayed for checking;
    • mapped the data repeatedly on the ALA sandbox site.
    Co-ordinate precision was then estimated to reflect confidence in the range of the community.
    Resources included on-line sources such as the Biodiversity Heritage Library and the National Library of Australia, although maps in appendices were often not scanned in digital copies of old journals.
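One way to put a number on "accurate to half a degree" is to treat each original point as lying somewhere in a 0.5° x 0.5° cell and report half the cell diagonal as the uncertainty radius, which is the kind of value Darwin Core records as coordinateUncertaintyInMeters. The Python sketch assumes a spherical Earth and that interpretation of the original precision; `cell_uncertainty_m` is a hypothetical helper, and the team's actual estimates were made case by case and may differ.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation
M_PER_DEG = math.pi * EARTH_RADIUS_M / 180  # about 111.2 km per degree of latitude


def cell_uncertainty_m(lat_deg: float, cell_deg: float = 0.5) -> float:
    """Distance (m) from the centre of a cell_deg x cell_deg cell to a corner.

    Hypothetical helper: treats 'accurate to half a degree' as 'somewhere in a
    0.5-degree grid cell' and returns half the cell diagonal.
    """
    dy = cell_deg * M_PER_DEG  # north-south extent of the cell
    dx = cell_deg * M_PER_DEG * math.cos(math.radians(lat_deg))  # east-west shrinks with latitude
    return 0.5 * math.hypot(dx, dy)


for lat in (-12.5, -33.85, -42.9):
    print(f"lat {lat:6.2f}: about {cell_uncertainty_m(lat) / 1000:.0f} km")
```

At Sydney's latitude this gives roughly 36 km, a useful reminder of how coarse the original positions were before the manual checks.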
  • 50. Species names
    Sequential row number | Validity and growth habit flag | Species code | Original scientific name | Scientific name updated during the Conservation Atlas project
    2 | L G | ABELMOSC | Abelmoschus moschatus |
    19 | LMG | ACACARGY | Acacia argyrodendron |
    20 | SZG | ACACARMA -> ACACPARA | Acacia armata | Acacia paradoxa
    21 | MLG | ACACASHA -> ACACOSHA | Acacia ashanesii | Acacia oshanesii
    174 | S G | ACAKEMP | Acacia sp. aff. A. sibirica | Acacia sp. aff. A. kempeana
    466 | S G | BORRCARP/ -> SPERSTEN/ | Borreria sp. aff. carpentariae | Spermacoce sp. aff. stenophylla
    704 | S G | CARPAEQU -> CARPMODE | Carpobrotus aequilaterus | Carpobrotus modestus
    705 | L G | CARPMODE | Carpobrotus modestus |
    Codes to names:
    • apply the master species conversion file;
    • blend across formations, with caution, as some species names are location- and formation-specific.
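The "codes to names" step amounts to a lookup table plus redirects (the "->" entries in the species table). This Python sketch uses rows from that table as a hypothetical excerpt of the master species conversion file; the dictionary layout and the `resolve` helper are illustrative assumptions, not the project's file format.

```python
from typing import Optional

# Hypothetical excerpt of a master species conversion file, built from rows of
# the species table: code -> (name as entered, replacement code or None).
MASTER: dict[str, tuple[str, Optional[str]]] = {
    "ABELMOSC": ("Abelmoschus moschatus", None),
    "ACACARGY": ("Acacia argyrodendron", None),
    "ACACARMA": ("Acacia armata", "ACACPARA"),
    "ACACPARA": ("Acacia paradoxa", None),
    "ACACASHA": ("Acacia ashanesii", "ACACOSHA"),
    "ACACOSHA": ("Acacia oshanesii", None),
    "CARPAEQU": ("Carpobrotus aequilaterus", "CARPMODE"),
    "CARPMODE": ("Carpobrotus modestus", None),
}


def resolve(code: str) -> str:
    """Follow '->' redirects to the current code and return its name.

    Raises KeyError for an unknown code (a candidate for manual checking)
    and ValueError if the redirects loop back on themselves.
    """
    seen: set[str] = set()
    while True:
        if code in seen:
            raise ValueError(f"redirect cycle involving {code}")
        seen.add(code)
        name, replacement = MASTER[code]
        if replacement is None:
            return name
        code = replacement


for legacy_code in ("ACACARMA", "ACACASHA", "CARPAEQU", "ABELMOSC"):
    print(legacy_code, "->", resolve(legacy_code))
```

Because the slide warns that some names are location- and formation-specific, a fuller version might key the table by (formation, code) rather than by code alone before blending across formations.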
  • 51. Update to current nomenclature
    Stage 1. Current name check. Because of the size of the data set, the Atlas of Living Australia web service lookup (BIE) was used, with codes allocated for follow-up (or not):
    Code | Meaning | Action
    MATCH | Near-exact match or better | accept
    PARTIAL-L and PARTIAL-R | A significant substring match | manual check
    FUZZY | Fuzzy match built on the web service's 'letter-pair similarity' score | manual check
    WEAK | A weak match falling below thresholds; the best match is retained | manual check
    TAXM | No match, or a major problem with the original or subsequent species name | refer to expert
    Stage 2. Validation.
    Stage 3. Reference to an expert.
    Resources used included: (1) on-line national species records; (2) state species records; (3) books and papers; (4) experts.
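The "letter-pair similarity" score can be sketched after Simon White's well-known string-matching metric: split each name into words, collect adjacent letter pairs, and score 2 x shared pairs / total pairs. This Python sketch assumes that variant. The ALA service's exact scoring and thresholds are not given on the slide, so the cut-offs in the hypothetical `match_code` helper are illustrative only, and PARTIAL-L/PARTIAL-R are not modelled.

```python
from collections import Counter


def letter_pairs(text: str) -> Counter:
    """Adjacent letter pairs within each word, upper-cased, as a multiset."""
    pairs: Counter = Counter()
    for word in text.upper().split():
        pairs.update(word[i:i + 2] for i in range(len(word) - 1))
    return pairs


def letter_pair_similarity(a: str, b: str) -> float:
    """2 x shared pairs / total pairs, in [0, 1]."""
    pa, pb = letter_pairs(a), letter_pairs(b)
    total = sum(pa.values()) + sum(pb.values())
    if total == 0:
        return 0.0
    shared = sum((pa & pb).values())  # multiset intersection: each pair used once
    return 2 * shared / total


def match_code(original: str, candidate: str) -> tuple[str, str]:
    """Map a score to (code, action); thresholds are hypothetical."""
    score = letter_pair_similarity(original, candidate)
    if score >= 0.95:
        return "MATCH", "accept"
    if score >= 0.70:
        return "FUZZY", "manual check"
    if score >= 0.40:
        return "WEAK", "manual check"
    return "TAXM", "refer to expert"


print(match_code("Acacia ashanesii", "Acacia oshanesii"))
```

With these thresholds, "Acacia ashanesii" against "Acacia oshanesii" scores 24/26 (about 0.92) and lands in FUZZY, i.e. a manual check, which matches how that row was resolved in the species table.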
  • 52. Chapter 3: Map the fields used to the Darwin Core standard
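The slide names the target standard but not the field mapping itself. The minimal sketch below assumes hypothetical source column names and renames each record's fields to Darwin Core terms. The terms on the right-hand side, such as scientificName, decimalLatitude, coordinateUncertaintyInMeters and associatedReferences, are genuine Darwin Core terms. Any column left unmapped is reported so it can be reviewed rather than silently dropped.

```python
# Hypothetical source column names (assumption) -> Darwin Core term names.
FIELD_MAP = {
    "record_id": "occurrenceID",
    "current_name": "scientificName",
    "latitude": "decimalLatitude",
    "longitude": "decimalLongitude",
    "uncertainty_m": "coordinateUncertaintyInMeters",
    "site": "locality",
    "survey_year": "eventDate",
    "formation": "habitat",
    "source": "associatedReferences",
}


def to_darwin_core(row, field_map=FIELD_MAP, constants=None):
    """Rename mapped fields of one record to Darwin Core terms.

    constants: values applied to every record (e.g. basisOfRecord); mapped
    fields from the row override them if both supply the same term.
    Returns (dwc_record, unmapped_field_names).
    """
    record = dict(constants or {})
    unmapped = []
    for field, value in row.items():
        term = field_map.get(field)
        if term is None:
            unmapped.append(field)
        else:
            record[term] = value
    return record, unmapped


if __name__ == "__main__":
    row = {"record_id": "20", "current_name": "Acacia paradoxa", "species_code": "ACACPARA"}
    print(to_darwin_core(row, constants={"basisOfRecord": "HumanObservation"}))
```

In a real deposit, occurrenceID should be a persistent, globally unique identifier rather than a bare row number. The basisOfRecord value shown is likewise only an illustrative choice.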
  • 53. Chapter 3: spatial view
  • 61. Chapter 3: Data delivery
    • Ingested into the Atlas of Living Australia as a collection, discoverable through species records with associated metadata:
      https://collections.ala.org.au/public/show/dr8212
    • Delivered as CSV and Excel files, with associated code for replication, in the Knowledge Network for Biocomplexity:
      http://doi.org/10.5063/F1QC01QK
    • Perhaps one day discoverable as survey information
  • 63. Reflection: How did we do?
    • Data saved, updated and deposited for future use in two stable repositories.
    • 9450 taxa found in 1390 communities at 461 locations across the continent of Australia, recorded between 1879 and 1989 from 714 sources. This is a lot!
    Sadly, this represents only around half of the material used in Chapter 2 (the Atlas). Why?
    Primary cause: loss of data on transfer from magnetic tape to Exabyte tape back in 1991. In some instances, it appears those data cannot be found elsewhere.
  • 66. Reflection: take-home messages
    • Researchers and field workers need access to, or expertise in, data science skills (FAIR).
    • Data (with metadata) need to be deposited as soon as possible after creation.
    • We need to think ahead: today’s technology will be outmoded in 5-10 years.
    • We need repositories that are open, secure and properly managed (TRUST).
    • Team up with others, especially those with diverse but relevant skills.
    Without this, more data will be lost than were ever gathered.
  • 67. Thank you!
    Contact: a.specht@uq.edu.au
    School of Earth and Environmental Sciences
    Ecosystem Research Analyst, TERN
    The University of Queensland, Australia
    Biodiversity Data Journal: https://bdj.pensoft.net/article/28073/
    Knowledge Network for Biocomplexity: https://knb.ecoinformatics.org/#view/doi:10.5063/F1QC01QK
    Atlas of Living Australia: https://collections.ala.org.au/public/show/dr8212

Editor's Notes

  • #20-22: Roe and Boughton,