OpenML
Making machine learning more reproducible
(and easier) by bringing it online
Joaquin Vanschoren
Reproducibility in Machine Learning, ICML 2017
@open_ml
Research different.
Polymaths: Solve math problems
by massive online collaboration
Broadcast question, combine
many minds to solve it
Networked Science
Designed serendipity: what’s hard for one person is easy for another
Collaboration only scales if all friction is eliminated
Friction in machine learning
• Data hard to find, in different formats
• Code hard to find, difficult to use
• Published experiments are often not reproducible
• Reproducing my own results can be difficult
• No easy overview of state-of-the-art
• Benchmarks are often incomparable
• Experimentation is mostly solitary, offline
• Only afterwards, we share aggregated results
• Limited interaction, visibility in early stage
• Fear of getting scooped
Make machine learning as simple as possible
(but not simpler)
Use any (open-source) tool to try many algorithms,
Share code and experiments online
Find richly annotated datasets, import easily.
Or share your own.
Clear problem descriptions, evaluation protocols,
overview of state of the art.
Reproducible, transparent, reusable results
Organized for easy analysis and reuse
Frictionless online collaboration, track contributions
Easy sharing of data and results
What if we can analyse data
collaboratively
on web scale in real time
Collaborative machine learning
Easy to use: Integrated in many ML tools/environments
Modular contribution: Share data, code, models, evaluations
Organized data: Meta-data, reproducible models, link to people
Reward structure: Track your impact, build reputation
Self-learning: Learn from many experiments to help people
OpenML
It starts with data
Data (ARFF) uploaded or referenced (URL)
auto-versioned, analysed, meta-data
extracted, organised online
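
To make "meta-data extracted" concrete, here is a minimal sketch of pulling a few simple properties out of an ARFF file. It is not OpenML's own code: the function name `extract_meta_data`, the chosen properties, and the tiny weather dataset are all assumptions made for illustration, and the parser ignores quoted attribute names and the sparse ARFF format.

```python
from collections import Counter

# Tiny made-up ARFF document, used only for illustration.
ARFF_TEXT = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
rainy,?,no
"""


def extract_meta_data(arff_text):
    """Return simple meta-data for an ARFF string (hypothetical helper)."""
    attributes = []   # (name, type) pairs in declaration order
    rows = []         # parsed data rows
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            kind = "nominal" if kind.startswith("{") else kind.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    # ARFF marks a missing value with a question mark.
    missing = sum(value == "?" for row in rows for value in row)
    return {
        "n_instances": len(rows),
        "n_features": len(attributes),
        "n_missing_values": missing,
        "feature_types": dict(Counter(kind for _, kind in attributes)),
    }


meta = extract_meta_data(ARFF_TEXT)
print(meta)
```

Properties like these are what make datasets searchable and comparable once they are organised online.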
Tasks contain data, goals, procedures.
Readable by tools, automates experimentation
Train-test
splits
Classify target X
All results organized online: realtime overview
Collaborate online
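
A task that bundles data, goal, and procedure can be sketched as a small, machine-readable record. The class `Task`, its fields, and the seeded fold assignment below are assumptions for illustration, not the OpenML schema; a server could equally store the index lists themselves. The point is that every tool reading the same task gets exactly the same train-test splits, so results are comparable.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task description: data, goal, and evaluation procedure."""
    dataset_id: int   # which dataset version to use
    target: str       # the attribute to predict ("Classify target X")
    n_folds: int      # cross-validation procedure
    seed: int         # fixes the splits so everyone gets the same folds


def train_test_splits(task, n_instances):
    """Return a list of (train_indices, test_indices), one pair per fold."""
    indices = list(range(n_instances))
    random.Random(task.seed).shuffle(indices)   # seeded, hence repeatable
    folds = [indices[i::task.n_folds] for i in range(task.n_folds)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


task = Task(dataset_id=1, target="play", n_folds=5, seed=42)
splits = train_test_splits(task, n_instances=20)
for fold, (train, test) in enumerate(splits):
    print(fold, len(train), len(test))
```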
Flows (workflows, scripts) can run anywhere (locally)
Tool integrations + APIs (REST, R, Python, Java,…)
Use any ML library
Run wherever you want
reproducible, linked to data, flows, authors
and all other experiments
Experiments auto-uploaded, evaluated online
Analyse results objectively
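
One way to picture a run that is "linked to data, flows, authors" and evaluated the same way for everyone is sketched below. The `Run` record, the `setup_key` hash, and the `accuracy` function are hypothetical names chosen for this example; they are not OpenML's data model. The key covers only what determines the result (task, flow, parameters), so identical experiments by different people can be recognised as the same setup.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Run:
    """Hypothetical record of one experiment."""
    task_id: int
    flow_name: str
    flow_version: str
    parameters: dict = field(default_factory=dict)
    author: str = "unknown"


def setup_key(run):
    """Hash everything that determines the result, but not who ran it."""
    payload = {
        "task_id": run.task_id,
        "flow": [run.flow_name, run.flow_version],
        "parameters": run.parameters,
    }
    canonical = json.dumps(payload, sort_keys=True)   # stable key order
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def accuracy(predictions, truth):
    """Shared evaluation code: the same metric is applied to every upload."""
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


run_a = Run(1, "decision_tree", "1.0", {"max_depth": 3}, author="alice")
run_b = Run(1, "decision_tree", "1.0", {"max_depth": 3}, author="bob")
run_c = Run(1, "decision_tree", "1.0", {"max_depth": 5}, author="alice")

print(setup_key(run_a) == setup_key(run_b))   # same setup, different author
print(setup_key(run_a) == setup_key(run_c))   # different hyperparameter
```

Computing the metric in one shared place, from uploaded predictions, is what lets results from different tools be compared objectively.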
Publish, and track your impact
Demo
OpenML Community
Nov-Dec 2016
3000+ registered,
2500 30-day active users
Studies
Online collaboration environment
Link to GitHub, Jupyter notebooks, wiki, Overleaf,…
Sharing settings
Share datasets, flows, studies with certain people.
Easily publish them later.
Code submissions, GitHub integration
Sharing versioned code, docker images, archiving
In progress…
Benchmarking suites
Curated datasets + all reproducible results
Meta-learning, learning to learn
Learn from all prior experiments -> bots
Learn across datasets, use OpenML as ‘memory’
Deep learning integration
Native integration in deep learning libraries
Store/analyse models (architecture, parameters)
In progress (all help welcome!)
Daily research
Support other machine learning tasks, tools
Support ML in other scientific domains
Auto-experimentation bot
Join us!
Open initiative, contribute via GitHub/Slack
Regular week-long hackathons
Bring your own data, tools
Resources, ideas welcome
Thank You
@open_ml
OpenML
Now try it yourself :)
www.openml.org
