Hacking academia for fun and profit
Thoughts on succeeding in academia despite doing good software
Gaël Varoquaux
Strong is the power
of the dark side
with Python
despite good
software?
Publish or perish
Want a career?
Broken value system
Only Science matters
Wrong incentives
G Varoquaux 2
Publish or perish
Want a career? Hack academia!
Publishing scientific software matters
[Pradal, Varoquaux, Langtangen,
J Computational Science]
G Varoquaux 2
[TL;DR]*
Choose your battles
keeping science in the target
Win them
software production
* Too Long; Didn’t Read
G Varoquaux 3
Growing up as a geek scientist
I did a PhD in
quantum physics
Vacuum (leaks)
Electronics (shorts)
Lasers (misalignment)
Best training ever
for agile project
management
Computers were only one
of the many moving parts
Matlab
Instrument control
Shaped my vision
of computing as a
means to an end
G Varoquaux 7
Success
2011
Tenured researcher
in computer science
Today
Growing team with
data science
rock stars
How / why did I switch?
Fernando Perez (IPython), Prabhu
Ramachandran (Mayavi), Eric Jones
(Enthought), Travis Oliphant (Numpy)...
Learning fast is more important
than knowing something
Bye bye physics
G Varoquaux 8
And now...
What I do nowadays:
Machine learning to understand brain function
Cognitive neuroscience:
Link neural activity to thoughts and cognition
Learn a bilateral link
between brain activity and cognitive function
G Varoquaux 9
Software is the new medium of scientific
method
Galileo’s notes
G Varoquaux 10
The scientific method
1. Make conjectures
2. Derive predictions
3. Carry out experiments
4. Confirm or refute conjectures
Software is everywhere
Data mining
Computational models
Computer-controlled experiments
Data analysis
Code is often the very language in
which predictions are expressed
Models are now more complex than a simple
formula or sentence
G Varoquaux 11
Enabling falsification: reproducible science
Replicating
A 3rd party redoing the work
Code and data made available
Reproducing
New analysis on different data / code coming to the
same conclusion
Reusing
Applying the approach to a new problem
Let us enable reusable research
Arguments for BSD license
No strings attached
Can tinker with it
G Varoquaux 12
Reusable science ⇒ evidence accumulation
Accumulation of scientific knowledge
and learning formal representations
Akin to a review paper of the field
But a mathematical model is more testable
Machine learning:
engineering knowledge from data
“A theory is a good theory if it satisfies two requirements:
It must accurately describe a large class of observations
on the basis of a model that contains only a few arbitrary
elements, and it must make definite predictions about the
results of future observations.”
Stephen Hawking, A Brief History of Time.
G Varoquaux 13
The sweet spots across science and software
G Varoquaux 14
The advancement of knowledge
Imagine a circle that contains human knowledge
By the time you finish elementary school, you know a little
High school takes you a little bit further
With a bachelor’s degree, you gain a speciality
A master’s degree deepens this speciality
Research papers take you to the edge of human knowledge
Once you are at the boundary, you focus
You push at the boundary for a few years
And one day it yields
That dent you’ve made is called a PhD
Of course, the world looks different to you now
But don’t forget the big picture
Courtesy of Matt Might, via Stéfan van der Walt
This is an optimistic view
Biology
Maths
Computer science
Physics
Economics
Literature
History
I want to
be there
G Varoquaux 15
Translational computational science
Computational science
The use of computers and mathematical models to
address scientific research
Translational science
In medicine: bring bench science to medical practice
Translational
computational science?
G Varoquaux 16
Pick a problem to work on
Take the “easy” route
There needs to be a market screaming for the
software (in academia and in industry)
Refine your vision
Pull, not push
Design driven by need
G Varoquaux 17
Having an impact
G Varoquaux 18
Pick the right battles: viable projects
Project idea
A software implementing:
i) machine learning
and ii) neuroimaging
and iii) a graphical user interface
and iv) 3D plotting
Define project scope and vision
Break down projects by expertise
Don’t solve hard problems
Know the software landscape
Don’t target markets that will not
yield contributors
Need a vision = elevator pitch
Your research (PhD) probably does not qualify
⇒ need to cherry-pick contributions
G Varoquaux 19
Open source and community development
Code maintenance too expensive to be alone
scikit-learn ~ 300 emails/month    nipy ~ 45 emails/month
joblib ~ 45 emails/month    mayavi ~ 30 emails/month
“Hey Gael, I take it you’re too
busy. That’s okay, I spent a day
trying to install XXX and I think
I’ll succeed myself. Next time
though please don’t ignore my
emails, I really don’t like it. You
can say, ‘sorry, I have no time to
help you.’ Just don’t ignore.”
Your “benefits” come from a fraction of the code
Data loading? Maybe?
Standard algorithms? Nah
Share the common code...
...to avoid dying under code
Code becomes less precious with time
And somebody might contribute features
G Varoquaux 20
Community development in scikit-learn
Huge feature set:
benefits of a large team
Project growth:
More than 200 contributors
~ 12 core contributors
1 full-time INRIA programmer
from the start
Estimated cost of development: $6 million
COCOMO model,
http://www.ohloh.net/p/scikit-learn
G Varoquaux 21
Communities: many eyes make code fast
L. Buitinck, O. Grisel, A. Joly, G. Louppe, J. Nothman, P. Prettenhofer
G Varoquaux 22
Having an impact
You need a community
G Varoquaux 23
What’s in a scientific-computing environment
G Varoquaux 24
The scientific workflow agile
Interaction...
→ script...
→ module...
↩ interaction again...
Consolidation,
progressively
Low tech and short
turn-around times
G Varoquaux 25
Choose your weapons
Python, what else?
Interactive language
Easy to read / write
General purpose
Old virtual machine /
compiler
Younger languages
promising (Julia)
but will they get
adoption beyond science?
+ NumPy arrays
Shoehorn your data into a
NumPy array, and you’ve won
Personally disappointed
that pandas drifted away
G Varoquaux 26
Software architecture for science
“Scriptability” is paramount
In an application: MVC (model, view, controller)
Model
Numerical or
data-processing
core
View
Output: graphs,
or files
Must enable
headless use
Controller
Input: dialogs,
or an API
Avoid input as files:
not expressive
Dialogs should never be far from the code
Dialog generation: traits, IPython widgets
Reactive programming:
dialogs modify object, and the model updates
Don’t own the main
In Mayavi: script generation for free
G Varoquaux 27
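A minimal sketch of the layering on the slide above, in plain Python. All names (`smooth`, `render_text`, `run`) are hypothetical, chosen for illustration; this is not Mayavi's code. It assumes the model is an ordinary function, the view can work headless by returning text, and the controller is a thin API call rather than an input file, so the library never owns the main.

```python
from statistics import mean


def smooth(values, window=3):
    """Model: the numerical core, a trailing moving average over `values`."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [mean(values[max(0, i - window + 1):i + 1])
            for i in range(len(values))]


def render_text(values, width=20):
    """View: headless output, one bar of '#' per value, returned as text."""
    top = max(values) or 1
    return "\n".join("#" * round(width * v / top) for v in values)


def run(values, window=3, view=render_text):
    """Controller: an API entry point; a dialog could call this too."""
    return view(smooth(values, window))


if __name__ == "__main__":
    # Scripts and GUIs both call run(); the library does not own the main.
    print(run([1, 4, 2, 8, 5], window=2))
```

Because the view is just a parameter, a script can swap in another renderer or ask for the raw numbers, which is one way to keep dialogs close to the code.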
Quality is free*
* This is a book, by Philip Crosby
G Varoquaux 28
You need quality
Quality will give you users
Bugs give you a bad rap
Quality will give you developers
Contribute to learn and improve
Quality will make your developers happy
People need to be proud of their work
Do less, do better
Goes against the grant-system incentive
G Varoquaux 29
Quality: what & how
Great documentation
Simplify, but don’t dumb down
Focus on what the user is trying to solve
Great APIs
Example-based development
If something is hard to explain, rethink the concepts
Limit the number of different concepts and objects
Consistency, consistency, consistency
Good numerics
Write tests based on mathematical properties
When a user finds an instability, write a new test
Quality enables reuse
Beyond mere reproducibility
G Varoquaux 30
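One way to read "write tests based on mathematical properties" is sketched below using only the standard library. The function `stable_variance` and the two test names are hypothetical, written for this illustration; they are not taken from any project mentioned in the talk. The tests check invariances (variance is unchanged by a shift and scales quadratically). The shift test uses a large offset, the kind of case a user-reported instability could turn into a new test.

```python
import math
import random


def stable_variance(xs):
    """Population variance via Welford's one-pass update."""
    mean = 0.0
    m2 = 0.0
    for n, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / len(xs)


def test_shift_invariance():
    # Property: var(x + c) == var(x), checked with a large offset c.
    rng = random.Random(0)
    xs = [rng.random() for _ in range(1000)]
    shifted = [x + 1e6 for x in xs]
    assert math.isclose(stable_variance(shifted), stable_variance(xs),
                        rel_tol=1e-6)


def test_scaling():
    # Property: var(a * x) == a**2 * var(x).
    rng = random.Random(1)
    xs = [rng.gauss(0, 1) for _ in range(1000)]
    scaled = [3.0 * x for x in xs]
    assert math.isclose(stable_variance(scaled), 9.0 * stable_variance(xs),
                        rel_tol=1e-9)


test_shift_invariance()
test_scaling()
```

Such tests need no reference dataset: the expected behaviour follows from the mathematics, so they stay valid when the implementation changes.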
Be productive
“If you spend too much time thinking about a
thing, you’ll never get it done.” — Bruce Lee
G Varoquaux 31
Limited resources
Limited resources are good
Need success in the short term, not the long term
The startup culture: fail fast
Quickly identify non-viable projects
The simplest solution that works is the best
G Varoquaux 32
Short cycles, limited ambitions
Keep coming back to your users
Release early, release often
G Varoquaux 33
Simplicity
Complexity increases superlinearly
[An Experiment on Unit Increase in Problem Complexity,
Woodfield 1979]
25% increase in problem complexity
⇒ 100% increase in code complexity
The 80/20 rule
80% of the use cases can be solved
with 20% of the lines of code
Avoid feature creep
Use objects sparingly
Don’t use classes for the sake of it
G Varoquaux 34
Software engineering
G Varoquaux 35
Software engineering good practices
Version control
Use git + github
Unit testing
If it’s not tested, it’s broken or soon will be.
Make a package,
with controlled dependencies and compilation
...
G Varoquaux 36
Research ≠ production
Need to adapt software-engineering principles
↓
Good naming is free
Use functions, not scripts
Version control is very cheap
Tests are more expensive... Consider them once goals
stabilize
Build chains are hard
Go down the chain as your research progresses
You can think of shipping software only if it is
viable to go completely down the chain
G Varoquaux 37
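"Use functions, not scripts" can be illustrated with a small sketch. The names `mean_by_group` and `load_rows` and the sample data are hypothetical, invented for this example. The assumption is that the analysis logic lives in functions with explicit, well-named arguments, and the script shrinks to a few lines that call them.

```python
import csv
import io


def mean_by_group(rows, key, value):
    """Average `value` per distinct `key` over an iterable of dict rows."""
    totals = {}
    for row in rows:
        total, count = totals.get(row[key], (0.0, 0))
        totals[row[key]] = (total + float(row[value]), count + 1)
    return {group: total / count for group, (total, count) in totals.items()}


def load_rows(text):
    """Parse CSV text into dict rows; kept separate so the core stays testable."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    # The script part is only glue around the functions.
    data = "subject,score\na,1\na,3\nb,10\n"
    print(mean_by_group(load_rows(data), key="subject", value="score"))
```

Once the goals stabilize, the same functions are what the more expensive steps down the chain (tests, packaging) attach to.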
Things we did right (maybe)
G Varoquaux 38
Mayavi: 3D visualization in Python
Success factors
Building upon VTK: great power
Component model (UI)
Internals open to the world
⇒ from interaction to scripting
Limiting factors
Building upon VTK: a lot of complexity
Codebase too complex and object-oriented
(bound to VTK)
Users of GUIs do not turn into developers
Composition is an API killer
G Varoquaux 39
joblib: computational workflow patterns
Parallel for loop
>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(sqrt)(i**2)
...                    for i in range(8))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
On-demand dispatch to ease memory consumption
Threading and processes backends
Memoize pattern
import joblib
mem = joblib.Memory(location='.')  # older joblib releases named this argument cachedir
g = mem.cache(f)
b = g(a)  # computes f(a) and stores the result
c = g(a)  # retrieves the result from the store
G Varoquaux 40
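To make the memoize pattern concrete without depending on joblib, here is a standard-library sketch of caching results on disk, keyed by a hash of the arguments. `disk_cache` and `slow_square` are hypothetical names, and this is not how joblib implements `Memory`. It only illustrates the idea: it ignores changes to the function's code and assumes picklable arguments and results.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path


def disk_cache(directory):
    """Decorator factory: store each result in `directory`, keyed by arguments."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            payload = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            path = directory / (hashlib.sha256(payload).hexdigest() + ".pkl")
            if path.exists():
                return pickle.loads(path.read_bytes())  # cache hit
            result = func(*args, **kwargs)
            path.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator


calls = []  # records real evaluations, for demonstration only


@disk_cache(tempfile.mkdtemp())
def slow_square(x):
    calls.append(x)
    return x * x


first = slow_square(12)   # computed, then written to disk
second = slow_square(12)  # read back from disk; the function body is not run
```

A real tool has to handle much more (large arrays, code changes, concurrent writers), which is part of why this pattern is worth sharing as a library rather than rewriting per project.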
joblib: computational workflow patterns
Success factors
Simplicity of use
Patterns we really, really need (pull not push)
Limiting factors
Vision of the project unclear
Positioning with regard to the landscape unclear
(IPython, where are you headed?)
Tricky code inside
G Varoquaux 41
scikit-learn: machine learning in Python
Success factors
Right project vision
Machine learning without learning the machinery
Black box that can be opened
Right trade-off between “just works” and versatility
(think Apple vs Linux)
We’re not going to solve all the problems for you
I don’t solve hard problems
Feature-engineering, domain-specific cases...
Python is a programming language. Use it.
Cover all the 80% use cases in one package
High-level programming
- Optimize algorithms, not for loops
- Know NumPy and SciPy perfectly
All significant data should be in arrays
Avoid memory copies, rely on BLAS/LAPACK
- Use Cython, not C/C++
Good API design
- separate data from operations
[Figure: array of digits]
- Object API exposes a data-processing language
fit, predict, transform, score, partial_fit
- Instantiated without data but with all parameters
Great community
- Github + code review
Great documentation
Limiting factors
Tricky numerical code
Our own success ⇒ huge volume
G Varoquaux 42
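The API design points for scikit-learn (objects instantiated with parameters only, data passed to `fit`, then `transform` or `predict`) can be sketched with a toy class in plain Python. `MeanCenterer` is a hypothetical example written for this note, not a scikit-learn class. It only mimics the convention, including the trailing underscore that scikit-learn uses for learned attributes.

```python
class MeanCenterer:
    """Toy transformer following a fit/transform convention."""

    def __init__(self, with_mean=True):
        # Parameters only: no data is seen at construction time.
        self.with_mean = with_mean

    def fit(self, X):
        """Learn per-column means from X, a list of equal-length rows."""
        n_rows = len(X)
        self.means_ = [sum(col) / n_rows for col in zip(*X)]
        return self  # returning self allows chaining

    def transform(self, X):
        """Apply the learned means to any data with the same columns."""
        if not self.with_mean:
            return [list(row) for row in X]
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


train = [[1.0, 10.0], [3.0, 30.0]]
centerer = MeanCenterer().fit(train)
centered_train = centerer.transform(train)
centered_new = centerer.transform([[2.0, 20.0]])
```

Keeping data out of the constructor is what separates data from operations: the same configured object can be fitted on one dataset and applied to another.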
@GaelVaroquaux
Succeeding in academia despite doing good software
1 Game the system
It’s about convincing a tenure committee
Code must contribute to a scientific problem
2 Not all battles can be fought
Make sure that there is a market
Don’t solve hard problems
Problems that matter for science and industry
3 Make good software
That actually answers scientists’ needs
With quality, software engineering
Relying on a community
Usability matters
Now go out, and code!

Succeeding in academia despite doing good_software

  • 1. Hacking academia for fun and profit Thoughts on succeeding in academia despite doing good software Varoquaux Ga¨el Strong is the power of the dark side
  • 2. Hacking academia for fun and profit Thoughts on succeeding in academia despite doing good software Varoquaux Ga¨el Strong is the power of the dark side with Python
  • 3. Hacking academia for fun and profit Thoughts on succeeding in academia despite doing good software Varoquaux Ga¨el Strong is the power of the dark side despite good software?
  • 4. Publish or perish Want a career? G Varoquaux 2
  • 5. Publish or perish Want a career? Broken value system Only Science matters Wrong incentives G Varoquaux 2
  • 6. Publish or perish Want a career? Hack academia! G Varoquaux 2
  • 7. Publish or perish Want a career? Hack academia! G Varoquaux 2
  • 8. Publish or perish Want a career? Hack academia! Publishing scientific software matters [Pradal, Varoquaux, Langtangen, J Computational Science] G Varoquaux 2
  • 9. [TL;DR]‹ Choose your battles keeping science in the target Win them software production ‹ Too Long, Didn’t Read G Varoquaux 3
  • 10. Growing up as a geek scientist
  • 11. Growing up as a geek scientist I did a PhD in quantum physics
  • 12. Growing up as a geek scientist I did a PhD in quantum physics Vacuum (leaks) Electronics (shorts) Lasers (mis-alignment) Best training ever for agile project management
  • 13. Growing up as a geek scientist I did a PhD in quantum physics Vacuum (leaks) Electronics (shorts) Lasers (mis-alignment) Computers were only one of the many moving parts Matlab Instrument control
  • 14. Growing up as a geek scientist I did a PhD in quantum physics Vacuum (leaks) Electronics (shorts) Lasers (mis-alignment) Computers were only one of the many moving parts Matlab Instrument control Shaped my vision of computing as a means to an end
  • 16. Success 2011 Tenured researcher in computer science Today Growing team with data science rock stars
  • 17. Success 2011 Tenured researcher in computer science Today Growing team with data science rock stars How / why did I switch? Fernando Perez (IPython), Prabhu Ramachandran (Mayavi), Eric Jones (Enthought), Travis Oliphant (NumPy)...
  • 18. Success 2011 Tenured researcher in computer science Today Growing team with data science rock stars How / why did I switch? Fernando Perez (IPython), Prabhu Ramachandran (Mayavi), Eric Jones (Enthought), Travis Oliphant (NumPy)... Learning fast is more important than knowing something Bye bye physics
  • 19. And now... What I do nowadays: Machine learning to understand brain function Cognitive neuroscience: Link neural activity to thoughts and cognition
  • 20. And now... What I do nowadays: Machine learning to understand brain function Learn a bilateral link between brain activity and cognitive function
  • 21. Software is the new medium of scientific method Galileo’s notes
  • 22. The scientific method 1. Make conjectures 2. Derive predictions 3. Carry out experiments 4. Confirm or refute conjectures
  • 23. The scientific method 1. Make conjectures 2. Derive predictions 3. Carry out experiments 4. Confirm or refute conjectures Software is everywhere Data-mining Computational models Computer-controlled experiments Data analysis
  • 24. The scientific method 1. Make conjectures 2. Derive predictions 3. Carry out experiments 4. Confirm or refute conjectures Code is often the very language in which predictions are expressed Models are now more complex than a simple formula or sentence
  • 25. Enabling falsification: reproducible science Replicating: a 3rd party redoing the work, with code and data made available Reproducing: a new analysis on different data / code coming to the same conclusion Reusing: applying the approach to a new problem Let us enable reusable research
  • 26. Enabling falsification: reproducible science Replicating: a 3rd party redoing the work, with code and data made available Reproducing: a new analysis on different data / code coming to the same conclusion Reusing: applying the approach to a new problem Let us enable reusable research Arguments for BSD license: no strings attached, can tinker with it
  • 27. Reusable science ⇒ evidence accumulation Accumulation of scientific knowledge and learning formal representations
  • 28. Reusable science ⇒ evidence accumulation Accumulation of scientific knowledge and learning formal representations Akin to a review paper of the field But a mathematical model is more testable Machine learning: engineering knowledge from data
  • 29. Reusable science ⇒ evidence accumulation Accumulation of scientific knowledge and learning formal representations Akin to a review paper of the field But a mathematical model is more testable “A theory is a good theory if it satisfies two requirements: It must accurately describe a large class of observations on the basis of a model that contains only a few arbitrary elements, and it must make definite predictions about the results of future observations.” Stephen Hawking, A Brief History of Time.
  • 30. The sweet spots across science and software
  • 31. The advancement of knowledge Imagine a circle that contains human knowledge Courtesy of Matt Might, via Stéfan van der Walt
  • 32. The advancement of knowledge By the time you finish elementary school, you know a little Courtesy of Matt Might, via Stéfan van der Walt
  • 33. The advancement of knowledge High school takes you a little bit further Courtesy of Matt Might, via Stéfan van der Walt
  • 34. The advancement of knowledge With a bachelor’s degree, you gain a speciality Courtesy of Matt Might, via Stéfan van der Walt
  • 35. The advancement of knowledge A master’s degree deepens this speciality Courtesy of Matt Might, via Stéfan van der Walt
  • 36. The advancement of knowledge Research papers take you to the edge of human knowledge Courtesy of Matt Might, via Stéfan van der Walt
  • 37. The advancement of knowledge Once you are at the boundary, you focus Courtesy of Matt Might, via Stéfan van der Walt
  • 38. The advancement of knowledge You push at the boundary for a few years Courtesy of Matt Might, via Stéfan van der Walt
  • 39. The advancement of knowledge And one day it yields Courtesy of Matt Might, via Stéfan van der Walt
  • 40. The advancement of knowledge That dent you’ve made is called a PhD Courtesy of Matt Might, via Stéfan van der Walt
  • 41. The advancement of knowledge Of course, the world looks different to you now Courtesy of Matt Might, via Stéfan van der Walt
  • 42. The advancement of knowledge But don’t forget the big picture PhD Courtesy of Matt Might, via Stéfan van der Walt
  • 43. The advancement of knowledge This is an optimistic view Biology Maths Computer science Physics Economics Literature History
  • 44. The advancement of knowledge This is an optimistic view Biology Maths Computer science Physics Economics Literature History I want to be there
  • 45. Translational computational science Computational science: the use of computers and mathematical models to address scientific research
  • 46. Translational computational science Computational science: the use of computers and mathematical models to address scientific research Translational science: in medicine, bring bench science to medical practice
  • 47. Translational computational science Computational science: the use of computers and mathematical models to address scientific research Translational science: in medicine, bring bench science to medical practice Translational computational science?
  • 48. Pick a problem to work on Take the “easy” route There needs to be a market screaming for the software (in academia and in industry) Refine your vision Pull, not push Design driven by need
  • 49. Having an impact
  • 51. Pick the right battles: viable projects Project idea Software implementing: i) machine learning and ii) neuroimaging and iii) a graphical user interface and iv) 3D plotting
  • 53. Pick the right battles: viable projects Project idea Software implementing: i) machine learning and ii) neuroimaging and iii) a graphical user interface and iv) 3D plotting Define project scope and vision Break down projects by expertise Don’t solve hard problems Know the software landscape Don’t target markets that will not yield contributors Need a vision = elevator pitch
  • 54. Pick the right battles: viable projects Project idea Software implementing: i) machine learning and ii) neuroimaging and iii) a graphical user interface and iv) 3D plotting Define project scope and vision Break down projects by expertise Don’t solve hard problems Know the software landscape Don’t target markets that will not yield contributors Need a vision = elevator pitch Your research (PhD) probably does not qualify ⇒ need to cherry-pick contributions
  • 55. Open source and community development Code maintenance too expensive to be alone scikit-learn ~300 emails/month nipy ~45 emails/month joblib ~45 emails/month mayavi ~30 emails/month “Hey Gael, I take it you’re too busy. That’s okay, I spent a day trying to install XXX and I think I’ll succeed myself. Next time though please don’t ignore my emails, I really don’t like it. You can say, ‘sorry, I have no time to help you.’ Just don’t ignore.”
  • 56. Open source and community development Code maintenance too expensive to be alone scikit-learn ~300 emails/month nipy ~45 emails/month joblib ~45 emails/month mayavi ~30 emails/month Your “benefits” come from a fraction of the code Data loading? Maybe? Standard algorithms? Nah Share the common code... ...to avoid dying under code Code becomes less precious with time And somebody might contribute features
  • 57. Community development in scikit-learn Huge feature set: benefits of a large team Project growth: More than 200 contributors, ~12 core contributors, 1 full-time INRIA programmer from the start Estimated cost of development: $6 million (COCOMO model, http://www.ohloh.net/p/scikit-learn)
  • 58. Communities: many eyes make code fast L. Buitinck, O. Grisel, A. Joly, G. Louppe, J. Nothman, P. Prettenhofer
  • 59. Having an impact You need a community
  • 60. What’s in a scientific-computing environment
  • 61. The scientific workflow: agile Interaction... → script... → module... ↺ interaction again... Consolidation, progressively Low tech and short turn-around times
  • 62. Choose your weapons Python, what else? Interactive language Easy to read / write General purpose
  • 63. Choose your weapons Python, what else? Interactive language Easy to read / write General purpose Old virtual machine / compiler Younger languages promising (Julia) but will they get adoption beyond science?
  • 64. Choose your weapons Python, what else? + NumPy arrays Shoe-horn your data into a NumPy array, and you’ve won Personally disappointed that pandas drifted away
  • 65. Software architecture for science “Scriptability” is paramount In an application: MVC (model, view, controller) Model: numerical or data-processing core View: output (graphs or files); must enable headless use Controller: input (dialogs or an API); avoid input as files, which is not expressive Dialogs should never be far from the code Dialog generation: traits, IPython widgets Reactive programming: dialogs modify the object, and the model updates Don’t own the main
  • 66. Software architecture for science “Scriptability” is paramount In an application: MVC (model, view, controller) Model: numerical or data-processing core View: output (graphs or files); must enable headless use Controller: input (dialogs or an API); avoid input as files, which is not expressive Dialogs should never be far from the code Dialog generation: traits, IPython widgets Reactive programming: dialogs modify the object, and the model updates Don’t own the main In Mayavi: script generation for free
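The model / view / controller split on the slide above could be sketched as follows in plain Python. This is only an illustration under assumptions: the names `simulate_decay`, `write_report`, and `run` are hypothetical and do not come from Mayavi or any other library mentioned in the talk. The point is that the numerical core knows nothing about output, the view can run headless by writing to any stream, and the controller is an ordinary function that scripts can call, so the code never needs to "own the main".

```python
import io


def simulate_decay(n_steps, rate):
    """Model: pure numerical core, no input/output of any kind."""
    values = [1.0]
    for _ in range(n_steps):
        values.append(values[-1] * (1.0 - rate))
    return values


def write_report(values, stream):
    """View: headless output, writes a tab-separated table to any stream."""
    for step, value in enumerate(values):
        stream.write(f"{step}\t{value:.4f}\n")


def run(n_steps=3, rate=0.5, stream=None):
    """Controller: a plain API; a dialog or a script can both call it."""
    values = simulate_decay(n_steps, rate)
    if stream is not None:
        write_report(values, stream)
    return values


# Scripted, headless use: no GUI and no input file required.
buffer = io.StringIO()
result = run(n_steps=2, rate=0.5, stream=buffer)
```

Because `run` takes its inputs as arguments rather than from a file, the same call works from an interactive session, a batch script, or a callback attached to a dialog.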
  • 67. Quality is free* * This is a book by Philip Crosby
  • 68. You need quality Quality will give you users Bugs give you a bad rap Quality will give you developers Contribute to learn and improve Quality will make your developers happy People need to be proud of their work Do less, do better Goes against the grant-system incentive
  • 69. Quality: what & how Great documentation Simplify, but don’t dumb down Focus on what the user is trying to solve Great APIs Example-based development If something is hard to explain, rethink the concepts Limit the number of different concepts and objects Consistency, consistency, consistency Good numerics Write tests based on mathematical properties When a user finds an instability, write a new test
  • 70. Quality: what & how Great documentation Simplify, but don’t dumb down Focus on what the user is trying to solve Great APIs Example-based development If something is hard to explain, rethink the concepts Limit the number of different concepts and objects Consistency, consistency, consistency Good numerics Write tests based on mathematical properties When a user finds an instability, write a new test Quality enables reuse Beyond mere reproducibility
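As an illustration of "write tests based on mathematical properties", here is a minimal sketch using only the standard library. The function `logsumexp` and the three test functions are hypothetical examples chosen for this note, not code from the talk or from scikit-learn. The first two tests check identities that must hold for any input; the third is the kind of regression test one would add after a user reports an instability, here the overflow that a naive `log(sum(exp(x)))` hits on large inputs.

```python
import math


def logsumexp(values):
    """Return log(sum(exp(v))) computed stably by shifting by the maximum."""
    shift = max(values)
    return shift + math.log(sum(math.exp(v - shift) for v in values))


def test_shift_property():
    # Identity: logsumexp(x + c) == logsumexp(x) + c for any constant c.
    xs = [0.1, 2.0, -3.5]
    c = 7.0
    assert math.isclose(logsumexp([x + c for x in xs]), logsumexp(xs) + c)


def test_bounds():
    # Identity: max(x) <= logsumexp(x) <= max(x) + log(n).
    xs = [0.1, 2.0, -3.5]
    assert max(xs) <= logsumexp(xs) <= max(xs) + math.log(len(xs))


def test_large_inputs_regression():
    # A naive implementation would call math.exp(1000.0) and overflow.
    assert math.isclose(logsumexp([1000.0, 1000.0]), 1000.0 + math.log(2))
```

Property tests like these do not need reference values computed by hand, which makes them cheap to write for numerical code.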
  • 72. Be productive “If you spend too much time thinking about a thing, you’ll never get it done.” Bruce Lee
  • 73. Limited resources Limited resources are good Need success in the short term, not the long term The startup culture: fail fast Quickly identify non-viable projects The simplest solution that works is the best
  • 74. Short cycles, limited ambitions Keep coming back to your users Release early, release often
  • 75. Simplicity Complexity increases superlinearly [An Experiment on Unit Increase in Problem Complexity, Woodfield 1979] 25% increase in problem complexity ⇒ 100% increase in code complexity
  • 76. Simplicity Complexity increases superlinearly [An Experiment on Unit Increase in Problem Complexity, Woodfield 1979] 25% increase in problem complexity ⇒ 100% increase in code complexity The 80/20 rule: 80% of the use cases can be solved with 20% of the lines of code Avoid feature creep
  • 77. Simplicity Complexity increases superlinearly [An Experiment on Unit Increase in Problem Complexity, Woodfield 1979] 25% increase in problem complexity ⇒ 100% increase in code complexity The 80/20 rule: 80% of the use cases can be solved with 20% of the lines of code Avoid feature creep Use objects sparingly Don’t use classes for the sake of it
  • 79. Software engineering good practices Version control: use Git + GitHub Unit testing: if it’s not tested, it’s broken or soon will be Make a package, with controlled dependencies and compilation ...
  • 80. Research ≠ production Need to adapt software-engineering principles ↓ Good naming is free Use functions, not scripts Version control is very cheap Tests are more expensive... Consider them if goals stabilize Build chains are hard Go down the chain as your research progresses You can think of shipping software only if it was viable to go completely down the chain
  • 81. Things we did right (maybe)
  • 82. Mayavi: 3D visualization in Python Success factors Building upon VTK: great power Component model (UI) Internals open to the world ⇒ from interaction to scripting Limiting factors Building upon VTK: a lot of complexity Codebase too complex and object-oriented (bound to VTK) Users of GUIs do not turn into developers Composition is an API killer
  • 84. joblib: computational workflow patterns Parallel for loop >>> from math import sqrt >>> from joblib import Parallel, delayed >>> Parallel(n_jobs=2)(delayed(sqrt)(i**2) ... for i in range(8)) [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0] On-demand dispatch to ease memory consumption Threading and processes backends
  • 85. joblib: computational workflow patterns Parallel for loop >>> from math import sqrt >>> from joblib import Parallel, delayed >>> Parallel(n_jobs=2)(delayed(sqrt)(i**2) ... for i in range(8)) [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0] Memoize pattern mem = joblib.Memory(location='.') g = mem.cache(f) b = g(a) # computes f(a) c = g(a) # retrieves the result from the store
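To make the two patterns on the joblib slides concrete without depending on joblib itself, here is a standard-library sketch. It is not joblib's implementation: `parallel_map` and `disk_cache` are hypothetical helpers written for this note, threads stand in for joblib's backends, and the cache key is a simple hash of the pickled arguments, which is an assumption that only works for picklable inputs.

```python
import hashlib
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor
from math import sqrt
from pathlib import Path


def parallel_map(func, items, n_jobs=2):
    """Parallel for loop: apply func to each item, keeping input order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(func, items))


def disk_cache(func, cache_dir):
    """Memoize pattern: store results on disk, keyed by the arguments."""
    cache_dir = Path(cache_dir)

    def wrapper(*args):
        key = hashlib.sha256(pickle.dumps((func.__name__, args))).hexdigest()
        path = cache_dir / f"{key}.pkl"
        if path.exists():
            # Cache hit: load the stored result instead of recomputing.
            return pickle.loads(path.read_bytes())
        result = func(*args)
        path.write_bytes(pickle.dumps(result))
        return result

    return wrapper


calls = []  # records every real computation, to show the cache at work


def slow_square(x):
    calls.append(x)
    return x * x


roots = parallel_map(sqrt, [i ** 2 for i in range(8)])

with tempfile.TemporaryDirectory() as tmp:
    cached_square = disk_cache(slow_square, tmp)
    first = cached_square(3)   # computed by slow_square
    second = cached_square(3)  # read back from the on-disk store
```

After this runs, `calls` holds a single entry even though `cached_square` was called twice, which is the behavior the memoize slide describes.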
  • 86. joblib: computational workflow patterns Success factors Simplicity of use Patterns we really, really need (pull not push)
  • 87. joblib: computational workflow patterns Success factors Simplicity of use Patterns we really, really need (pull not push) Limiting factor Vision of the project unclear Positioning with regards to landscape unclear (IPython, where are you headed?) Tricky code inside
  • 88. scikit-learn: machine learning in Python Success factors Right project vision Machine learning without learning the machinery Black box that can be opened Right trade-off between “just works” and versatility (think Apple vs Linux) We’re not going to solve all the problems for you I don’t solve hard problems Feature-engineering, domain-specific cases... Python is a programming language. Use it. Cover all the 80% use cases in one package
  • 89. scikit-learn: machine learning in Python Success factors Right project vision High-level programming - Optimize algorithms, not for loops - Know NumPy and SciPy perfectly All significant data should be in arrays Avoid memory copies, rely on BLAS/LAPACK - Use Cython, not C/C++
  • 90. scikit-learn: machine learning in Python Success factors Right project vision High-level programming Good API design - separate data from operations [Figure: data shown as arrays of numbers]
  • 91. scikit-learn: machine learning in Python Success factors Right project vision High-level programming Good API design - separate data from operations - Object API exposes a data-processing language: fit, predict, transform, score, partial_fit - Instantiated without data but with all parameters
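The API design points above (data separate from operations, objects instantiated with parameters only, a small vocabulary of `fit`, `predict`, `score`) can be sketched with a toy class. `ThresholdClassifier` is a hypothetical example written in plain Python for this note; it is not part of scikit-learn and only mimics the convention on one-dimensional data with labels 0 and 1.

```python
class ThresholdClassifier:
    """Toy 1D classifier following the fit / predict / score convention."""

    def __init__(self, margin=0.0):
        # Only parameters here: the object is created without any data.
        self.margin = margin

    def fit(self, X, y):
        # Learn a threshold halfway between the two class means.
        zeros = [x for x, label in zip(X, y) if label == 0]
        ones = [x for x, label in zip(X, y) if label == 1]
        midpoint = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
        self.threshold_ = midpoint + self.margin
        return self  # returning self allows chaining: Model().fit(X, y)

    def predict(self, X):
        return [int(x > self.threshold_) for x in X]

    def score(self, X, y):
        # Fraction of correct predictions.
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


model = ThresholdClassifier(margin=0.0).fit([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1])
predictions = model.predict([0.5, 10.0])
accuracy = model.score([1.5, 8.5], [0, 1])
```

Keeping data out of the constructor means the same configured object can be fitted on different datasets, and any object with the same methods can be swapped in without changing the calling code.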
  • 92. scikit-learn: machine learning in Python Success factors Right project vision High-level programming Good API design Great community - GitHub + code review
  • 93. scikit-learn: machine learning in Python Success factors Right project vision High-level programming Good API design Great community Great documentation
  • 94. scikit-learn: machine learning in Python Success factors Right project vision High-level programming Good API design Great community Great documentation Limiting factors Tricky numerical code Our own success ⇒ huge volume
  • 95. @GaelVaroquaux Succeeding in academia despite doing good software 1 Game the system It’s about convincing a tenure committee Code must contribute to a scientific problem
  • 96. Succeeding in academia despite doing good software 1 Game the system 2 Not all battles can be fought Make sure that there is a market Don’t solve hard problems Problems that matter for science and industry
  • 97. Succeeding in academia despite doing good software 1 Game the system 2 Not all battles can be fought 3 Make good software That actually answers scientists’ needs With quality, software engineering Relying on a community Usability matters
  • 98. Succeeding in academia despite doing good software 1 Game the system 2 Not all battles can be fought 3 Make good software Now go out, and code!