May 2016
© 2016 IEEE
Importance and Challenges of
Reproducible Research
Vladimir Kanchev
vladimir.kanchev@ieee.org
* http://www.software.ac.uk/blog/2014-03-21-reproducible-research-impossible-dream
Slide 2
Slide 3
Agenda
1. Personal introduction
2. Introduction to Reproducible Research (RR)
3. Software tools
4. The context – personal experience
5. The situation in Bulgaria and abroad
6. Additional resources for RR
7. Discussion
Slide 5
Personal Introduction
• Defense of my Ph.D. thesis at TU-Sofia is pending
• Research in image/MR image segmentation
• Publications in peer-reviewed journals
• Some experience in industry
Slide 7
Introduction to Reproducible Research
Definitions
Reproducible Research (RR) is an
approach aiming at complementing classical
printed scientific articles with everything required
to independently reproduce the results they
present *. "Everything" covers:
• data
• computer codes
• a precise description of how the code was
applied to the data
* Delescluse, Matthieu, et al. "Making neurophysiological data
analysis reproducible: Why and how?" Journal of Physiology-Paris
106.3 (2012): 159-170.
Introduction to Reproducible Research
Definitions
Another definition (Signal Processing):
An article about computational science in a
scientific publication is not the scholarship itself, it
is merely advertising of the scholarship. The
actual scholarship is the complete software
development environment and the complete set of
instructions which generated the figures*.
D. Donoho
* D. Donoho et al., “Reproducible Research in Computational
Harmonic Analysis,” Computing in Science & Eng., vol. 11, no. 1,
2009, pp. 8–18
Slide 8
Slide 9
Introduction to Reproducible Research
Definitions
• Replication – independent people collect new
data to verify the research* (Roger Peng). It is
considered the scientific gold standard.
• Reproduction – independent people analyze the
same data and obtain the same result*. The focus
is on the validity of the data analysis (Roger Peng).
* http://simplystatistics.org/2011/12/02/reproducible-research-in-computational-science/
Introduction to Reproducible Research
Definitions
* Peng, R. D. (2011). Reproducible research in computational
science. Science, 334(6060), 1226.
Slide 10
Slide 11
Introduction to Reproducible Research
History
The RR "movement" grew out of what
economists have called replication since
the early 1980s and evolved into what is now called
reproducible research in computational data
analysis. Today it is influenced by the open
science and open source movements.
Slide 12
Introduction to Reproducible Research
Relation to scientific method
Steps of a scientific method *:
1. Define a question
2. Observe – gather information and resources
3. Form an explanatory hypothesis
4. Test the hypothesis by performing an experiment and
collecting data in a reproducible manner
5. Analyze the data
6. Interpret the data and draw a conclusion
7. Publish results
8. Retest (reproduce) by other researchers
* Crawford S, Stucki L (1990), "Peer review and the changing research
record", J Am Soc Info Science, vol. 41, pp. 223–228
The steps related to Reproducible Research are in italic type
* https://scischol102.wordpress.com/category/science/
Slide 13
Slide 14
Introduction to Reproducible Research
Relation to scientific method
Principles of a scientific method:
1. Empirically testable
2. Replicable
3. Objective
4. Transparent
5. Falsifiable
6. Logically consistent
Slide 15
Introduction to Reproducible Research
Scheme
* http://www.biostat.jhsph.edu/~rpeng/research.html (modified)
Slide 16
Introduction to Reproducible Research
Current situation
Current situation with RR in different fields:
• Medicine (cancer research), social sciences
(psychology), etc.
Replication/reproducibility crisis – the results of
many scientific experiments are difficult or
impossible to replicate
• Natural sciences
• Computer science
* Baker, M. (2016). 1,500 scientists lift the lid on
reproducibility. Nature, 533(7604), 452-454.
Slide 17
Slide 18
Introduction to Reproducible Research
Current situation
Reproducibility in Medical imaging &
Computer vision & Machine learning:
• Public test sets are available
• Code for most methods is available (papers from
major conferences and journals)
• High pressure/workload on researchers to
make their work reproducible
Slide 19
Introduction to Reproducible Research
Current situation
Reproducibility in Medical imaging &
Computer vision & Machine learning (cont.):
• Benchmark comparison with other methods is
compulsory
• Experiment automation
• Differences between the medical imaging and the
computer vision & machine learning fields
Example: IPOL journal
Slide 20
Introduction to Reproducible Research
Reasons
Reasons for the reproducibility/replication crisis:
• “Publish or perish” culture – pressure to obtain
publishable results
• Reluctance to make method code public –
improving its quality takes additional time and effort
• Most non-CS graduate students are not taught
software engineering or statistics
* Baker, M. (2016). 1,500 scientists lift the lid on
reproducibility. Nature, 533(7604), 452-454.
Slide 21
Slide 22
Introduction to Reproducible Research
Reasons
Other problems:
• Insufficient description of the experiment in the
publications
• Test datasets and method code not publicly
available – common in social sciences
• The statistical methods used are prone to
malpractice – p-hacking (data dredging), failure to
report non-significant tests, inclusion/exclusion of
data points until the desired result is achieved
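A minimal Python sketch of why p-hacking inflates false findings. It is an illustration only and is not taken from the cited sources: both groups are drawn from the same distribution, so every "significant" result is a false positive. The function names, the sample size of 30, the 500 simulated studies and the large-sample z approximation for the p-value are all assumptions made for this example.

```python
import random
from statistics import NormalDist, mean, variance


def p_value(a, b):
    """Two-sided p-value for a difference in means (large-sample z approximation)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rate(tests_per_study, studies=500, n=30, alpha=0.05, seed=0):
    """Fraction of null studies that report at least one 'significant' test."""
    rng = random.Random(seed)  # seeded so the simulation itself is reproducible
    hits = 0
    for _ in range(studies):
        for _ in range(tests_per_study):
            # Both groups come from the same distribution: no real effect exists.
            a = [rng.gauss(0.0, 1.0) for _ in range(n)]
            b = [rng.gauss(0.0, 1.0) for _ in range(n)]
            if p_value(a, b) < alpha:
                hits += 1
                break  # the study stops at its first "finding"
    return hits / studies


if __name__ == "__main__":
    print("1 test per study:  ", false_positive_rate(1))
    print("20 tests per study:", false_positive_rate(20))
```

With one test per study the rate should stay close to the nominal 5%. With 20 tests per study it should rise to roughly 1 − 0.95^20 ≈ 0.64, the value expected for independent tests, which is why unreported tests make published results hard to reproduce.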
Slide 23
Introduction to Reproducible Research
Reasons
Problems with method code:
• Reproducibility issues – missing method data
and code, method code errors, not all figures
and tables are reproduced
• Documentation issues – missing README file,
bad code documentation
• Programming style issues – bad coding style
* Wolkovich, E. M., Regetz, J., & O'Connor, M. I. (2012). Advances in
global change research require open science by individual
researchers. Global Change Biology, 18(7), 2102-2110.
Slide 24
Introduction to Reproducible Research
Guidance (Biostatistics journal)
Authors should provide all data and code
needed to reproduce all results, images and
tables, with:
• README file
• Consistent coding style and documentation
• Test data sets
• Simulations and random numbers
• General advice
* Peng, R. D. (2009). Reproducible research and
biostatistics. Biostatistics, 10(3), 405-408.
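The "simulations and random numbers" item can be sketched as follows. This is an assumed illustration, not code from the Biostatistics guidance: a simulation that takes its seed as an explicit argument returns the same number on every run, so a reader can regenerate a table entry exactly. The function name and its defaults are hypothetical.

```python
import random


def simulate_mean(seed, n=1000):
    """Return the sample mean of n standard-normal draws from a seeded generator."""
    rng = random.Random(seed)  # local generator: no hidden global state
    return sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n


if __name__ == "__main__":
    # Same seed, same result; report the seed together with the result.
    print(simulate_mean(seed=42) == simulate_mean(seed=42))
```

Passing the seed in, rather than calling a global `random.seed` somewhere in the script, keeps each simulation independent of the order in which the others run.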
Slide 25
Slide 27
Software tools
Recommended tools for achieving
reproducibility:
• LaTeX (typesetting system)
• Version control systems – e.g. Git
• Make – automation of the analysis pipeline
Literate programming concept (Knuth).
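The idea behind Make can be sketched in a few lines of Python: a result is rebuilt only if it is missing or older than the files it depends on. This is a simplified illustration of that rule, not a replacement for Make, and `is_stale` and `build` are hypothetical names chosen for the example.

```python
import os


def is_stale(target, sources):
    """True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_time for src in sources)


def build(target, sources, action):
    """Run action() only when target is out of date; return whether it ran."""
    if is_stale(target, sources):
        action()
        return True
    return False
```

A pipeline is then a list of such rules (raw data to cleaned data, cleaned data to figures, figures to paper), so that one command regenerates every result that depends on a changed file.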
Slide 28
Software tools
MATLAB programming language:
• MATLAB File Exchange
• Proprietary MATLAB toolboxes – a disadvantage
• Examples of RR toolboxes – WaveLab,
SparseLab
• MATLAB publish – no literate programming
support
Slide 29
Software tools
R programming language:
• RStudio – development environment for the R
programming language
• Graphics packages, such as ggplot2
• Packages such as knitr or rmarkdown – literate
programming support
Slide 30
Software tools
Python programming language:
• Many open scientific libraries available – SciPy,
NumPy, etc.
• IPython notebook
• Sumatra package – saves parameter values,
code state, output results and files
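The kind of record that a tool such as Sumatra keeps can be sketched with the Python standard library. The code below does not use Sumatra's API; it only illustrates, under assumed names (`record_run`, `sha256_of`) and an assumed JSON-lines log format, what saving parameter values, code state and outputs for one run might look like.

```python
import hashlib
import json
import sys
import time


def sha256_of(path):
    """Hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def record_run(params, code_files, output_files, log_path):
    """Append one JSON line describing a run: parameters, code state, outputs."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "python": sys.version.split()[0],
        "parameters": params,
        "code": {p: sha256_of(p) for p in code_files},
        "outputs": {p: sha256_of(p) for p in output_files},
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

If a figure in the paper cannot be regenerated later, the log shows which parameters and which version of the code produced the original file.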
* ISMB/ECCB 2013 Keynote
Slide 31
Slide 33
The context – personal experience
Making a current research project reproducible only
at the end of the process is not the best approach…
* http://www.idiap.ch/~marcel/professional/BTAS_SS_2015.html
The context – personal experience
Difficulties with:
• Exact reproduction of all figures and results
• Setting exact parameter values
• Finding time to improve code quality and add
documentation
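One way to check a reproduction attempt is to compare the regenerated numbers with the published ones key by key, allowing a small tolerance, because floating-point results can differ in the last digits between machines and library versions. The sketch below is an assumed illustration; the function name, the tolerance and the metric name in the usage example are hypothetical.

```python
import math


def compare_results(reference, reproduced, rel_tol=1e-9):
    """Return the keys whose values are missing or differ beyond rel_tol."""
    mismatched = []
    for key in sorted(set(reference) | set(reproduced)):
        if key not in reference or key not in reproduced:
            mismatched.append(key)  # a result is missing on one side
        elif not math.isclose(reference[key], reproduced[key], rel_tol=rel_tol):
            mismatched.append(key)  # the values disagree
    return mismatched


if __name__ == "__main__":
    published = {"dice_score": 0.1 + 0.2}  # hypothetical metric and values
    rerun = {"dice_score": 0.3}
    print(compare_results(published, rerun))  # an empty list means a match
```

An empty list means every reported value was reproduced within the tolerance; any key in the list points to a figure or table entry that needs attention.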
Slide 34
Slide 35
The context – personal experience
Motivation for achieving reproducibility:
• Better visibility of research
• More citations and higher impact
• Increased trust in research quality (outside
academia, e.g. from industry)
• Help from readers of the publication in
improving the developed method
Slide 37
The situation in Bulgaria and abroad
RR in Bulgaria:
• Its adoption by the scientific community is still
at an early stage
• Its principles need to be taught at undergraduate
and graduate level
• Paper code and test datasets are generally
not available online in most fields
Slide 38
The situation in Bulgaria and abroad
Advancing RR adoption would:
• Increase the impact of research conducted by
Bulgarian researchers abroad
• Improve its reputation and applicability – especially
for people from industry
• Allow faster recognition of quality work and steady
improvement of lower-quality papers
Slide 39
The situation in Bulgaria and abroad
Advancing RR adoption would (cont.):
• Let researchers benefit from the fast development of
scientific computing, machine learning, data science,
and AI
• Attract more bright young people to research
(open source movement and open data)
Slide 40
The situation in Bulgaria and abroad
RR abroad:
• A major issue in social and biomedical sciences
• An important criterion for manuscript evaluation
by reviewers in many CS fields
• One of the major requirements of funding agencies
abroad for the evaluation of project proposals
Slide 42
Additional resources for research
and RR methods
MOOCs:
1. Data Science Specialization (www.coursera.org) (Johns
Hopkins University) – course 5, Reproducible Research
2. Methods and Statistics in Social Sciences Specialization
(www.coursera.org) (University of Amsterdam)
3. Research Methods: An Engineering Approach
(www.edx.org) (Wits University)
4. Research Data Management and Sharing
(www.coursera.org) (The University of North Carolina at
Chapel Hill & The University of Edinburgh)
Slide 43
Additional resources for research
and RR methods
Software tools for RR:
1. Software Carpentry (www.software-carpentry.org) – basic
computing skills for researchers
2. Bootcamps – one- or two-day courses teaching coding
and professional skills for researchers
3. MOOCs – www.coursera.org, www.edx.org,
www.udacity.com – for programming skills in R, Python,
MATLAB
Slide 44
Additional resources for research
and RR methods
Books:
1. Stodden, V., Leisch, F., & Peng, R. D. (Eds.)
(2014). Implementing Reproducible Research. CRC Press
2. Gandrud, C. (2013). Reproducible Research with R and
RStudio. CRC Press
3. Subramanian, G. (2015). Python Data Science Cookbook.
Packt Publishing Ltd
4. Milovanovic, I., Foures, D., & Vettigli, G. (2015). Python Data
Visualization Cookbook. Packt Publishing Ltd
Slide 46
Discussion
Topics for discussion:
• What do you think about reproducibility
in general?
• Have you already encountered RR in your work?
• How might applying reproducibility practices
affect your work as researchers, engineers, or
programmers?
Slide 47
End



Editor's Notes

  • #8: IEEE members are still changing the world we live in.