MLIM7350 PROJECT
DATA CURATION WORKSHOP
The University of Hong Kong Ernest LAM
Apr 27, 2017
Outline
1. What is Data Curation?
2. Why Data Curation?
3. How to Start Data Curation?
4. How to Organize Data?
5. Which Data Formats to Use?
6. Where to Preserve and Share Data?
1. What is Data Curation?
“Data Curation is maintaining and adding value to a trusted body of digital information for current and future use; it encompasses the active management of data throughout the research lifecycle.”
Digital Curation Centre (DCC)
http://www.dcc.ac.uk/about-us/dcc-charter/dcc-charter-and-statement-principles
Data Curation Model
A process of creation, preservation, and reuse
DCC Lifecycle Model: http://www.dcc.ac.uk/sites/default/files/documents/publications/DCCLifecycle.pdf
DataONE Model: http://www.dataone.org/sites/all/documents/L02_DataSharing.ppt
2. Why Data Curation?
80% of data are unavailable after 20 years
http://www.nature.com/news/scientists-losing-data-at-a-rapid-rate-1.14416
A story about a data sharing request that may happen to researchers...
New York University, Health Sciences Library
https://youtu.be/N2zK3sAtr-4
To ensure the use and reuse of data
● Case study: an ecologist was unable to collect the useful data of an agricultural researcher after his death
http://www.sciencemag.org/careers/2014/04/chasing-down-data-you-need
To meet local requirements and policy
● HKU’s Policy on the Management of Research Data and Records
http://www.rss.hku.hk/integrity/research-data-records-management
3. How to Start Data Curation?
Data Management Planning Tool
A tool for researchers to start managing their data or writing a proposal for funding
https://dmp.cdlib.org/
● A list of templates to choose from
● Visibility setting: Public, Institutional, Private
● Co-workers can edit, view and download
● Guidance to help with the planning
● Preview
● Export to PDF / DOCX / Print
4. How to Organize Data?
Metadata Standard: Dublin Core
15 standard elements for describing data resources
http://wiki.dublincore.org/index.php/User_Guide
http://seopressor.com/wp-content/uploads/2015/11/dublin-core-elements-2.jpg
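As a minimal sketch of what describing a dataset with Dublin Core can look like, the Python below builds a small XML record from the 15 elements. The wrapper element name `metadata`, the function `build_dc_record`, and the sample values are assumptions made for illustration; a repository may expect a different wrapper or additional fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)


def build_dc_record(fields):
    """Return a Dublin Core record as an XML string.

    `fields` maps element names to values; names outside the
    15-element set are rejected. (Hypothetical helper.)
    """
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper name is an assumption
    for name in DC_ELEMENTS:
        if name in fields:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = str(fields[name])
    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset description, for illustration only.
record = build_dc_record({
    "title": "Survey responses 2017",
    "creator": "Example Researcher",
    "date": "2017-04-27",
    "format": "text/csv",
    "language": "en",
})
print(record)
```

Not every element has to be filled in; the sketch simply skips the ones that are not supplied.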
Tips for File Renaming
https://library.stanford.edu/research/data-management-services/data-best-practices/best-practices-file-naming
✓ Date format - YYYYMMDD or YYMMDD
✗ Overly long file names
✗ Special characters, e.g. ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ' " |
Use leading zeros for clarity and to ensure files sort in sequential order
✓ "001, 002, ... 010, 011 ... 100, 101, etc."
✗ "1, 2, ... 10, 11 ... 100, 101, etc."
Avoid spaces: file names with spaces must be enclosed in quotes
✓ Underscores, e.g. file_name.xxx
✓ Dashes, e.g. file-name.xxx
✓ No separation, e.g. filename.xxx
✓ Camel case, e.g. FileName.xxx
✗ Spaces, e.g. file name.xxx
Tools                 OS                       Free?
Bulk Rename Utility   Windows                  Yes
Renamer 4             Mac
PSRenamer             Linux, Mac, or Windows   Yes
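The naming tips above can be sketched in a few lines of Python. The function name `make_file_name`, the date-label-number ordering, and the choice to lowercase the label are assumptions made for this example, not part of the Stanford guidance.

```python
import re
from datetime import date


def make_file_name(label, when, sequence, extension, width=3):
    """Build a name such as 20170427_field_notes_draft_007.csv.

    Hypothetical helper: date first (YYYYMMDD), then a cleaned
    label, then a zero-padded sequence number.
    """
    # Replace runs of spaces and special characters with one underscore.
    safe = re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_").lower()
    if not safe:
        raise ValueError("label has no usable characters")
    stamp = when.strftime("%Y%m%d")      # YYYYMMDD
    number = str(sequence).zfill(width)  # leading zeros, e.g. 007
    return f"{stamp}_{safe}_{number}.{extension.lstrip('.')}"


name = make_file_name("Field notes (draft)!", date(2017, 4, 27), 7, "csv")
print(name)  # 20170427_field_notes_draft_007.csv
```

Because the number is padded to a fixed width, a plain alphabetical sort of the resulting names matches the numeric order.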
Tips for Organizing Spreadsheets
Be consistent
✓ Use consistent codes for categorical variables
Fill in all of the cells
✓ Use “NA” or “-” to fill the blank cells for missing data
Create a data dictionary
✓ Use a separate file to describe the data
No calculations in the raw data files
✗ Calculations and graphs in the raw data file
Don’t use font color or highlighting as data
✓ Use an additional column to flag the outliers
Make backups
✓ Make a copy of the file with a new version number, e.g. file_v1.xlsx, file_v2.xlsx
✓ Write-protect the file when finished entering the data
For more details: http://kbroman.org/dataorg/
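Several of these tips can be combined in one short script. The sketch below assumes the data are kept as CSV and uses only the Python standard library; the records, column names, and file names are hypothetical.

```python
import csv
import os
import stat
import tempfile

# Hypothetical records; None marks a missing measurement.
rows = [
    {"id": "001", "sex": "F", "weight_kg": 61.5, "outlier": "no"},
    {"id": "002", "sex": "M", "weight_kg": None, "outlier": "no"},
    {"id": "003", "sex": "F", "weight_kg": 250.0, "outlier": "yes"},
]

# Data dictionary kept in a separate file that describes each column.
dictionary = [
    {"variable": "id", "description": "Participant number, zero-padded"},
    {"variable": "sex", "description": "F = female, M = male"},
    {"variable": "weight_kg", "description": "Body weight in kilograms"},
    {"variable": "outlier", "description": "yes/no flag instead of highlighting"},
]


def write_csv(path, records, missing="NA"):
    """Write records to CSV, filling empty cells with `missing`."""
    fieldnames = list(records[0])
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        for record in records:
            writer.writerow(
                {k: (missing if v is None else v) for k, v in record.items()}
            )


folder = tempfile.mkdtemp()
data_path = os.path.join(folder, "weights_v1.csv")
write_csv(data_path, rows)
write_csv(os.path.join(folder, "weights_dictionary.csv"), dictionary)

# Write-protect the raw data once data entry is finished.
os.chmod(data_path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
```

Later corrections would then go into a new copy such as `weights_v2.csv` rather than into the protected file.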
Data Cleaning Tools: OpenRefine
“A free, open source, powerful tool for working with messy data”
http://openrefine.org/
https://github.com/OpenRefine
https://github.com/OpenRefine/OpenRefine/wiki/Sample-Datasets
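The workshop uses OpenRefine to merge values that are the same word written with different spacing, capital letters, or articles (a/an/the). The Python below is a simplified stand-in for that idea, not OpenRefine's own algorithm; the function names and the sample column are made up for illustration.

```python
import re
from collections import defaultdict

ARTICLES = {"a", "an", "the"}


def normalize(value):
    """Return a comparison key: lowercased, single-spaced, no leading article."""
    words = re.sub(r"\s+", " ", value.strip().lower()).split(" ")
    if len(words) > 1 and words[0] in ARTICLES:
        words = words[1:]
    return " ".join(words)


def cluster(values):
    """Group raw values that share the same normalized key."""
    groups = defaultdict(list)
    for value in values:
        groups[normalize(value)].append(value)
    return dict(groups)


# Hypothetical messy column, for illustration only.
messy = [
    "The University of Hong Kong",
    "university of  hong kong",
    " University Of Hong Kong",
    "Stanford University",
]
for key, variants in cluster(messy).items():
    print(key, "<-", variants)
```

Each group can then be reviewed by a person before the variants are replaced by one agreed spelling, which is also how such merges are usually confirmed in an interactive tool.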
Network and Graph Visualization Tools: Gephi
“Interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs.”
https://gephi.org/
https://gephi.org/images/screenshots/preview2.png
Data Visualization Tools: Silk
“Create interactive data visualizations, publish websites, and tell interactive stories.”
https://www.silk.co/home
https://www.silk.co/help/charts-tutorial/
5. Which Data Formats to Use?
Forgotten Technologies...
Image: cc The Wolf Law Library - https://www.flickr.com/photos/wolflawlibrary/8747894458/
Tabular data
● SPSS portable format (.por)
● comma-separated values (.csv)
● SPSS (.sav), Stata (.dta), MS Access (.mdb/.accdb)
● MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dBase (.dbf), OpenDocument Spreadsheet (.ods)
Geospatial data
● ESRI Shapefile (.shp, .shx, .dbf, .prj, .sbx, .sbn optional)
● CAD data (.dwg)
● ESRI Geodatabase format (.mdb)
● Adobe Illustrator (.ai), CAD data (.dxf or .svg)
Textual data
● Rich Text Format (.rtf)
● plain text, ASCII (.txt)
● eXtensible Mark-up Language (.xml)
● Hypertext Mark-up Language (.html)
● MS Word (.doc/.docx)
Image data
● TIFF 6.0 uncompressed (.tif)
● JPEG (.jpeg, .jpg, .jp2)
● GIF (.gif)
● TIFF other versions (.tiff)
● RAW image format (.raw)
● Photoshop files (.psd)
● BMP (.bmp)
● PNG (.png)
Audio data
● Free Lossless Audio Codec (FLAC) (.flac)
● MPEG-1 Audio Layer 3 (.mp3)
● Audio Interchange File Format (.aif)
● Waveform Audio Format (.wav)
Video data
● MPEG-4 (.mp4)
● OGG video (.ogv, .ogg)
● motion JPEG 2000 (.mj2)
● AVCHD video (.avchd)
Documentation and scripts
● Rich Text Format (.rtf)
● PDF (.pdf)
● plain text (.txt)
● MS Word (.doc/.docx)
https://www.ukdataservice.ac.uk/manage-data/format/recommended-formats
Better! Recommended formats for preservation, reuse and sharing
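A small sketch of how such preferences could be applied when reviewing a folder of files. The mapping covers only the three comparisons given in the workshop notes (CSV over XLS/XLSX, TXT over DOC/DOCX, TIFF over JPG); the function name and sample file names are assumptions, and the script only flags files, it does not convert them.

```python
from pathlib import Path

# Preferences stated in the workshop notes; illustrative, not exhaustive.
PREFERRED = {
    ".xls": ".csv", ".xlsx": ".csv",
    ".doc": ".txt", ".docx": ".txt",
    ".jpg": ".tif", ".jpeg": ".tif",
}


def suggest_format(file_name):
    """Return the preferred extension, or None if no change is suggested."""
    return PREFERRED.get(Path(file_name).suffix.lower())


# Hypothetical file names, for illustration only.
for name in ["results.XLSX", "notes.docx", "plot.tif", "readings.csv"]:
    target = suggest_format(name)
    print(name, "->", target or "keep as is")
```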
5 ★ OPEN DATA
★ Any format available on the web but with an open licence, to be Open Data
★★ Available as machine-readable structured data
★★★ As (2) + non-proprietary format
★★★★ All the above + use URIs to identify things, so that people can point at your stuff
★★★★★ All the above + link your data to other data to provide context
For more details: http://5stardata.info/en/
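Because each star builds on the ones before it, the rating can be sketched as a short cumulative check. The function and parameter names below are hypothetical; the five criteria are the ones listed in the scheme.

```python
def open_data_stars(open_licence, machine_readable, non_proprietary,
                    uses_uris, linked_to_other_data):
    """Return the number of stars (0-5) earned under the cumulative scheme."""
    criteria = [open_licence, machine_readable, non_proprietary,
                uses_uris, linked_to_other_data]
    stars = 0
    for met in criteria:
        if not met:
            break  # each level requires all the levels below it
        stars += 1
    return stars


# A CSV file published on the web under an open licence.
print(open_data_stars(True, True, True, False, False))   # 3
# An Excel file under an open licence: structured but proprietary.
print(open_data_stars(True, True, False, False, False))  # 2
```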
6. Where to Preserve and Share Data?
Institutional Repository
● HKU Scholars Hub
● Enhances visibility of HKU authors and their research
● Opportunities for collaboration
● ~325 datasets
● http://hub.hku.hk/
Disciplinary Repository
● Global online archiving platforms for particular subjects
● Some provide free storage
● GitHub: open source code and software
● https://github.com
● Figshare: reserve a DOI for publication
● https://figshare.com
● Dryad: research data in science and medicine
● http://datadryad.org
● GigaDB: research data in biology and biomedicine
● http://gigadb.org/site/index
● Dataverse: largest collection of science datasets
● http://dataverse.org
REFERENCES
Mallery, M. (2014). DMPTool: Guidance and Resources for Your Data Management Plan; https://dmp.cdlib.org. Technical Services Quarterly, 31(2), 197-199.
Wilkinson, M. D. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018. doi: 10.1038/sdata.2016.18
THANKS!
Any questions?
You can find me at lernest@hku.hk
CREDITS
Special thanks to all the people who made and released these awesome resources for free:
▸ Presentation template by SlidesCarnival

HKU Data Curation MLIM7350 Student Project: Data Curation Workshop


Editor's Notes

  • #3: This data curation workshop aims to provide the fundamental concepts of data curation and practical tools for data management. To help researchers follow the workshop, it is organized around the following 6 questions: What is Data Curation? Why Data Curation? How to Start Data Curation? How to Organize Data? Which Data Formats to Use? Where to Preserve and Share Data?
  • #5: Keywords: adding value; current and future use; active management; lifecycle
  • #6: Two examples of data curation models: the DCC lifecycle and DataONE. The main idea of data curation is that it is a continuous process of creation, preservation, and reuse. Data curation is more than just preservation: it organizes the data through metadata and enhances the re-usability of the data.
  • #7: A few examples are used to show that data management is important for the preservation and re-use of data
  • #8: A statistic showing that 80% of data are unavailable after 20 years; scientists are losing their data at a rapid rate
  • #9: An interesting cartoon video explaining how a researcher cannot use the data because of poor data management, such as a data format that no longer works and poorly organized data names
  • #10: A case study of a researcher who could not collect the useful data of an agricultural researcher after his death
  • #11: Data curation is also needed to satisfy local requirements or policies set by the institution or government. In HK, there is only institutional policy.
  • #13: Data Management Planning Tool - a very simple and useful tool for researchers to start managing their data or writing a proposal for funding
  • #14: A wide variety of useful templates to choose from
  • #15: Visibility setting; co-worker editing function
  • #16: Guideline for the planning
  • #17: Preview and export of data
  • #19: Dublin Core for metadata standard
  • #20: File renaming tools and tips (in particular, do not use spaces in file names)
  • #22: OpenRefine - for data cleaning. A sample dataset is used to demonstrate that it is a handy tool when there is a lot of data and we need to merge the same word written in different formats, such as different spacing, capital letters, or articles (a/an/the)
  • #23: Gephi - for social network analysis and visualization
  • #24: Silk - for data publishing and visualization
  • #26: Some data storage technologies, such as floppy disks and cassette tapes, are already out of date.
  • #27: Some formats are a better choice for preservation; a list of recommended formats is provided. For example, CSV is better than XLSX/XLS; TXT is better than DOCX/DOC; TIFF is better than JPG.
  • #28: The recommendation is similar for data sharing: the 5-star open data scheme is a simple indicator of which format is better for data sharing. Most researchers use PDF and XLS for sharing; however, CSV is a better option. RDF - Resource Description Framework: a globally accepted framework for data and knowledge representation that is intended to be read and interpreted by machines. (http://www.nature.com/articles/sdata201618#ref1) LOD - Linked Open Data: linked data that is released under an open licence, which does not impede its reuse for free (https://www.w3.org/DesignIssues/LinkedData.html)
  • #30: A data repository is suggested for the preservation and sharing of data. At HKU, there is an institutional repository, HKU Scholars Hub, which holds approximately 325 datasets at the moment.
  • #31: Disciplinary repositories are online platforms for archiving data on particular subjects, and most of them are free. GitHub is a repository for open source code and software, for example OpenRefine. Figshare lets researchers reserve a DOI for a publication (a DOI is a persistent link to a specific publication). Dryad is a repository for research data in science and medicine. Dataverse Network is a repository containing all kinds of scientific data; it has one of the largest collections of social science data.