Towards reproducible research
and maximally-open data
Pablo Bernabeu
OSCG Open Scholarship Prize Competition, 14th May 2021
Psycholinguistics
> Conceptual processing
Reanalysis of data from
Hutchison et al. (2013)
How does the brain process the
meaning of words?
Statistical regularities in language
as well as perceptual, motor and
emotional information.
How does this process vary in
different contexts and across
different people?
Open information at each stage of research
Design: preregistration of theoretical background and methodological
protocol.
Development: procedural issues (corrections, errors, other changes)
that bear on the final materials.
Completion: data collection software, raw data, processed data, final
data set, analysis code and results.
Shared in my PhD
Shared in my MA
Application of
open science
My research
Community: open-source code workshops and software
Open data
Bernabeu et al. (2017), Bernabeu (2018)
• All materials from the completion
stage: experiment administration
software, raw data, processed
data, final data sets, analysis code
and results.
• Development stage proceedings
reported.
• Readme files describing the data
sets and linking to resources such
as data dashboards.
Maximally-open data
Bernabeu et al. (2017), Bernabeu (2018)
• R-based web applications open to scientists and the general public.
• Easy visualization of the variance and inspection of procedural aspects such as
trimming, adjustments, changes. Quicker usage of the data (see blog post).
Preregistration, power analysis and open data
Chen et al. (2018)
• Prereg.: https://psyarxiv.com/t2pjv/
• Video demonstration of the lab
procedure: https://osf.io/h36wr/
• Data: https://osf.io/waf48/
Preregistration
and open data
Bernabeu et al. (2021)
• Specification of the theoretical
background and the methodological
protocol for a forthcoming study.
• Integration of FAIR data from several
large-sample studies.
• Large, secondary data = valuable
alternative to small, noisy samples
(see Loken & Gelman, 2017).
Power analysis: How many participants required?
For the next study, the preregistration will include a power analysis based on two large-sample
pilots (combined, FAIR data sets), using power curves based on Monte Carlo
simulations (simr R package).
Preliminary curves below (pending more simulations for greater accuracy).
Y axis = power for a certain effect; X axis = 1 to 312 participants.
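The idea behind a simulation-based power curve can be sketched in a few lines. The example below is not the simr analysis described on this slide, which works with mixed-effects models in R. It is a simplified stand-in in Python, assuming a single standardised effect tested with a z-test on participant-level scores. The function names, the effect size of 0.2 and the number of simulations are all hypothetical; only the upper sample size of 312 comes from the slide.

```python
import random
from statistics import NormalDist, fmean


def simulate_power(n_participants, effect_size, n_sims=1000, alpha=0.05, seed=1):
    """Estimate power by Monte Carlo simulation (illustrative stand-in, not simr).

    Each simulated participant contributes one score drawn from a normal
    distribution with mean `effect_size` and standard deviation 1.
    """
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    significant = 0
    for _ in range(n_sims):
        scores = [rng.gauss(effect_size, 1.0) for _ in range(n_participants)]
        # z statistic for H0: mean = 0, with the SD known to be 1.
        z = fmean(scores) * n_participants ** 0.5
        if abs(z) > z_crit:
            significant += 1
    # Power = proportion of simulated studies that detect the effect.
    return significant / n_sims


def power_curve(sample_sizes, effect_size, **kwargs):
    """Return a mapping from sample size to estimated power."""
    return {n: simulate_power(n, effect_size, **kwargs) for n in sample_sizes}


if __name__ == "__main__":
    # Hypothetical effect size; sample sizes range up to the 312 on the X axis.
    curve = power_curve([10, 50, 100, 200, 312], effect_size=0.2, n_sims=500)
    for n, power in curve.items():
        print(f"{n:>4} participants: estimated power = {power:.2f}")
```

Increasing n_sims narrows the uncertainty around each point on the curve, which is why more simulations yield a more accurate curve.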
R workshops
Workshops and presentations on data
visualisation and analysis in R since 2018,
mostly in the context of a fellowship from
the Software Sustainability Institute.
• http://pablobernabeu.github.io/#workshops
• https://github.com/pablobernabeu/Data-is-present
Blogging
Several blog posts on psycholinguistic research,
open science and statistics.
http://pablobernabeu.github.io/blog
More open-source web applications
for research and teaching
Experimental data simulation
https://github.com/pablobernabeu/Experimental-data-simulation
WebVTT caption transcription
https://github.com/pablobernabeu/VTT-Transcription-App
Concluding thoughts
• Attainable for early-career researchers: individual and community applications
Some win-win benefits
• Open science is a framework, rather than an all-or-nothing result.
Design stage: my preregistrations could be even more precise (see Bakker et al., 2020).
Development stage: my procedures could be even more open.
Completion stage: my materials could be even more easily accessible.
• Let’s not eschew studies that report adjustments or errors.
If errare humanum est, by definition (see Bakker et al., 2020), how many spotless studies should
there naturally be in journals?
• Reward structures (e.g., promotion) still often prioritise number of publications.
References
Bakker, M., Veldkamp, C. L., van Assen, M. A., Crompvoets, E. A., Ong, H. H., Nosek, B. A., Soderberg, C. K., Mellor, D., & Wicherts, J. M. (2020). Ensuring the quality and specificity of preregistrations. PLoS Biology, 18(12), e3000937. https://doi.org/10.1371/journal.pbio.3000937
Bernabeu, P. (2018). Dutch modality exclusivity norms for 336 properties and 411 concepts. PsyArXiv. https://psyarxiv.com/s2c5h
Bernabeu, P., Lynott, D., & Connell, L. (2021). Preregistration: The interplay between linguistic and embodied systems in conceptual processing. OSF. https://osf.io/ftydw/
Bernabeu, P., Willems, R. M., & Louwerse, M. M. (2017). Modality switch effects emerge early and increase throughout conceptual processing: evidence from ERPs. In G. Gunzelmann, A. Howes, T. Tenbrink, & E. J. Davelaar (Eds.), Proceedings of the 39th Annual Conference of the Cognitive Science Society (pp. 1629–1634). Austin, TX: Cognitive Science Society. https://mindmodeling.org/cogsci2017/papers/0318
Chen, S., Szabelska, A., Chartier, C. R., Kekecs, Z., Lynott, D., Bernabeu, P., … Schmidt, K. (2018). Investigating object orientation effects across 14 languages. PsyArXiv. https://doi.org/10.31234/osf.io/t2pjv/
Hutchison, K. A., Balota, D. A., Neely, J. H., Cortese, M. J., Cohen-Shikora, E. R., Tse, C.-S., Yap, M. J., Bengson, J. J., Niemeyer, D., & Buchanan, E. (2013). The semantic priming project. Behavior Research Methods, 45, 1099–1114. https://doi.org/10.3758/s13428-012-0304-z
Loken, E., & Gelman, A. (2017). Measurement error and the replication crisis. Science, 355(6325), 584–585. https://doi.org/10.1126/science.aal3618
Thank you to OSCG, the sponsors and the audience!
Also, thank you to my mentors and everyone else who has
contributed to my research.

