Towards Reusable
Research Software
Daniel Garijo Verdejo
@dgarijov
daniel.garijo@upm.es
Ontology Engineering Group
Departamento de Inteligencia Artificial
Facultad de Informática
Universidad Politécnica de Madrid
Reproducibility: Open Research Data, Software and Methods
Scientific publication
• Research Data
• Research Software
• Research Methods
EOSC Symposium: Infrastructure for quality research software
Challenges for (Re)using and Sharing Research Software
Software user
• What does the software component do? Which of its methods should I use?
• How to transform my data to use the software component?
• How to interpret the results produced by the software component?
• How to invoke the software component?
• How to configure the software component with the right parameters?
• How to compare against similar methods?
Software designer
• How to ease capturing the dependencies and installation instructions of my software?
• How to encapsulate my software so it can be used with other data?
• How to describe my software so it can be used by others?
• How to test if my software is ready to be used by others?
Community Initiatives and Standards
• Describing Research Software
• Schema.org & Codemeta
• Common Workflow Language (I/O)
• Packaging Research Artefacts (incl. software)
• Research Objects (RO-Crate)
• Aggregators (OpenAIRE, EOSC)
• General (e.g., Zenodo) &
domain-specific registries
• Scicodes (https://scicodes.net/)
Nine Best Practices for Research Software Registries and Repositories: A Concise Guide https://arxiv.org/abs/2012.13117
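As a rough illustration of describing research software with Schema.org and Codemeta terms, the sketch below builds a minimal JSON-LD record in Python. The tool name, repository URL and author are hypothetical placeholders, and the helper `build_codemeta` is an assumption made for this example, not part of any library.

```python
import json


def build_codemeta(name, description, repository, license_url, authors):
    """Return a minimal Codemeta-style description as a plain dict."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "description": description,
        "codeRepository": repository,
        "license": license_url,
        "author": [{"@type": "Person", "name": a} for a in authors],
    }


# All values below are hypothetical and only illustrate the shape of a record.
record = build_codemeta(
    name="demo-tool",
    description="A hypothetical research software component.",
    repository="https://example.org/demo-tool",
    license_url="https://spdx.org/licenses/Apache-2.0",
    authors=["Jane Doe"],
)

print(json.dumps(record, indent=2))
```

A file like this is typically stored as `codemeta.json` at the root of a code repository so that registries and aggregators can read it.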
Adopting annotation vocabularies: where are we at?
Machine-readable software metadata is not abundant
Can you please describe your
software component with metadata?
I already did! Did you read the
project readme?
Did you see the online
documentation?
Perhaps you saw the paper?
Many domain-specific registries are curated by
hand by experts
Automated Software Metadata Extraction
SOMEF
SOftware Metadata
Extraction Framework
https://github.com/KnowledgeCaptureAndDiscovery/somef/
[Mao et al 2019]: SoMEF: A Framework for Capturing Software Metadata from its Documentation. 2019 IEEE BigData REU Symposium. Los
Angeles, 2019
Code repository
(readme)
Machine-readable file with software metadata:
• > 20 common metadata fields
• Installation instructions, description, invocation
command, license, author, citation, requirements,
examples, documentation, notebooks, etc.
• Analysis of readme and supplementary files (e.g., notebooks,
Dockerfiles)
• JSON, RDF(graph), Codemeta, RO (in progress)
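To make the idea of turning a readme into a machine-readable file concrete, here is a deliberately naive Python sketch that maps Markdown section headers to metadata fields. It is not SOMEF's actual method or API: the keyword table, the function `extract_sections` and the sample readme are all assumptions made for illustration.

```python
import json
import re

# Hypothetical keyword table: header keyword -> metadata field name.
HEADER_KEYWORDS = {
    "install": "installation",
    "usage": "invocation",
    "license": "license",
    "citation": "citation",
    "requirement": "requirements",
}


def extract_sections(readme_text):
    """Collect the text under recognised Markdown headers, keyed by field."""
    collected = {}
    current_field = None
    for line in readme_text.splitlines():
        header = re.match(r"^#{1,6}\s+(.*)$", line)
        if header:
            title = header.group(1).lower()
            # Pick the first field whose keyword occurs in the header title.
            current_field = next(
                (f for key, f in HEADER_KEYWORDS.items() if key in title),
                None,
            )
            continue
        if current_field and line.strip():
            collected.setdefault(current_field, []).append(line.strip())
    return {f: "\n".join(lines) for f, lines in collected.items()}


# Hypothetical readme used only to exercise the sketch.
SAMPLE_README = """# demo-tool
A hypothetical tool.

## Installation
pip install demo-tool

## Usage
demo-tool --input data.csv

## License
Apache-2.0
"""

metadata = extract_sections(SAMPLE_README)
print(json.dumps(metadata, indent=2))
```

Header matching alone misses metadata that lives in free text or in supplementary files, and it would misread `#` comment lines inside code blocks as headers, which is one reason a dedicated extraction framework is useful.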
Leveraging Software Metadata to create Knowledge Graphs
Explore input/output variables (interoperability)
Explore Software I/O files
(composition)
Knowledge Graphs can link research software (RS) and its
components.
OKG-Soft: machine-readable Software Metadata:
• (From Schema.org) Attribution, license, funding,
usage examples...
• Executable software components
• Software invocation
• Input & output files, variables and units
• Containers used to encapsulate and run software
components
• Parameter validation and suggestion
[Garijo et al 2019]: OKG-Soft: An Open Knowledge Graph with Machine Readable Scientific Software Metadata. International
Conference on eScience, San Diego, USA. 2019
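The composition idea above (exploring input/output variables and units to see which components can be chained) can be sketched in a few lines of Python. This is a toy in-memory model, not OKG-Soft's data model or API; the `Component` class, `find_links` and the three example models are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A software component with named input/output variables and units."""
    name: str
    inputs: dict = field(default_factory=dict)   # variable -> unit
    outputs: dict = field(default_factory=dict)  # variable -> unit


def find_links(components):
    """Return (producer, consumer, variable) triples where an output of one
    component matches an input of another in both name and unit."""
    links = []
    for producer in components:
        for consumer in components:
            if producer is consumer:
                continue
            for variable, unit in producer.outputs.items():
                if variable in consumer.inputs and consumer.inputs[variable] == unit:
                    links.append((producer.name, consumer.name, variable))
    return links


# Hypothetical components; crop_model expects a different unit on purpose.
components = [
    Component("rainfall_model", outputs={"precipitation": "mm/day"}),
    Component(
        "flood_model",
        inputs={"precipitation": "mm/day", "elevation": "m"},
        outputs={"flood_depth": "m"},
    ),
    Component("crop_model", inputs={"precipitation": "m/day"}),
]

links = find_links(components)
print(links)
```

Recording units alongside variable names is what lets the sketch reject the `crop_model` link: the variable matches but the unit does not, so a conversion step would be needed before composing the two.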
Conclusions
Research Software Metadata should be actionable and useful for:
• Understanding the differences between two or more software
components
• Helping portability (ROs)
• Adding components to workflows (CWL + ROs)
• Helping link similar software methods
• Building automated comparison benchmarks
• Reducing the time needed to understand and adopt an existing
software component
• Author credit
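As a small example of metadata being actionable, the following Python sketch reports the fields on which two software metadata records differ. The function `compare_metadata` and both records are hypothetical and only illustrate the first point in the list above.

```python
def compare_metadata(first, second):
    """Return {field: (value_in_first, value_in_second)} for differing fields.

    A field missing from one record is reported with None on that side.
    """
    fields = sorted(set(first) | set(second))
    return {
        f: (first.get(f), second.get(f))
        for f in fields
        if first.get(f) != second.get(f)
    }


# Hypothetical metadata records for two similar components.
tool_a = {"license": "Apache-2.0", "language": "Python", "container": "docker"}
tool_b = {"license": "MIT", "language": "Python"}

differences = compare_metadata(tool_a, tool_b)
print(differences)
```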
Questions?
Let's create machine-actionable software metadata
Image credit: https://icons8.com/icons/
[Figure: code + documentation, automated extraction and Knowledge Graphs, leading to software that is findable, portable, comparable, executable and reusable]
Acknowledgements: Yolanda Gil, Deborah Khider, Varun Ratnakar, Maximiliano Osorio,
Hernan Vargas, Oscar Corcho
