Analysing Python Machine Learning Notebooks with Moose
1. Analysing Python Machine Learning Notebooks with Moose
Evref
Marius Mignard, Steven Costiou, Nicolas Anquetil, Anne Etien
Univ. Lille, CNRS, Inria, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France
5. Notebooks drawbacks
- Used by people without software engineering knowledge
- Lack of understanding of the underlying mechanisms
6. Machine learning (ML) usage
McKinney SM, Sieniek M, Godbole V, Godwin J, Antropova N, Ashrafian H, Back T, Chesus M, Corrado GC, Darzi A, Etemadi M, Garcia-Vicente F, Gilbert FJ, Halling-Brown M, Hassabis D, Jansen S, Karthikesalingam A, Kelly CJ, King D, Ledsam JR, Melnick D, Mostofi H, Peng L, Reicher JJ, Romera-Paredes B, Sidebottom R, Suleyman M, Tse D, Young KC, De Fauw J, Shetty S. International evaluation of an AI system for breast cancer screening. Nature. 2020 Jan;577(7788):89-94.
(A) A sample cancer case that was missed by all six readers in the US reader study, but correctly identified by the AI system.
(B) A sample cancer case that was caught by all six readers in the US reader study, but missed by the AI system.
14. Rules example
Python – Keep the code clean
Context: all code cells; all imports
Condition: is reimported
Notebook – Enforce a modular design
Context: all cells
Condition: Lines < 50
ML – Type inference error
Context: all code cells; read_csv() invocations
Condition: presence of required parameters
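The three rules above each pair a context (which notebook elements to look at) with a condition (what to check on them). As a minimal sketch of that idea, and not the Moose implementation, the Python below checks the same three rules on a simplified notebook using only the standard library. The function names, the list-of-dicts notebook layout, and the choice of `dtype` as the "required parameter" of read_csv() are assumptions made for illustration.

```python
import ast

# Assumption: a notebook is modelled as a list of cells shaped like
# {"cell_type": "code" | "markdown", "source": "<text>"}, a simplification
# of the .ipynb JSON layout. Cells containing IPython magics (%...) would
# not parse with ast and are not handled in this sketch.


def code_cells(cells):
    """Context: all code cells."""
    return [c for c in cells if c["cell_type"] == "code"]


def reimported_names(cells):
    """Python rule: names imported more than once across code cells."""
    seen, repeated = set(), []
    for cell in code_cells(cells):
        for node in ast.walk(ast.parse(cell["source"])):
            if isinstance(node, (ast.Import, ast.ImportFrom)):
                for alias in node.names:
                    key = (getattr(node, "module", None), alias.name)
                    if key in seen:
                        repeated.append(alias.name)
                    seen.add(key)
    return repeated


def long_cells(cells, max_lines=50):
    """Notebook rule: indices of cells that violate 'Lines < max_lines'."""
    return [i for i, c in enumerate(cells)
            if len(c["source"].splitlines()) >= max_lines]


def read_csv_missing(cells, required=("dtype",)):
    """ML rule: read_csv() calls lacking the keyword arguments in `required`.

    `required` defaults to ("dtype",) as a hypothetical choice.
    """
    hits = []
    for i, cell in enumerate(cells):
        if cell["cell_type"] != "code":
            continue
        for node in ast.walk(ast.parse(cell["source"])):
            if not isinstance(node, ast.Call):
                continue
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
            if name == "read_csv":
                given = {kw.arg for kw in node.keywords}
                missing = [p for p in required if p not in given]
                if missing:
                    hits.append((i, missing))
    return hits


notebook = [
    {"cell_type": "markdown", "source": "# Load data"},
    {"cell_type": "code", "source": "import pandas as pd\ndf = pd.read_csv('data.csv')"},
    {"cell_type": "code", "source": "import pandas as pd\ndf2 = pd.read_csv('other.csv', dtype={'id': int})"},
]

print(reimported_names(notebook))   # ['pandas']
print(long_cells(notebook))         # []
print(read_csv_missing(notebook))   # [(1, ['dtype'])]
```

Keeping the context (cell selection) separate from the condition (the check) means a new rule only needs a new condition function over an existing context.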
15. Python rules – Literature extraction
Most common ML code violations detected by Pylint
[Figure legend: Error, Convention, Warning, Refactor]
16. Python rules – Literature mint
Van Oort, B., Cruz, L., Aniche, M., & Van Deursen, A. (2021, May). The prevalence of code smells in machine learning projects. In 2021 IEEE/ACM 1st workshop on AI engineering-software engineering for AI (WAIN) (pp. 1-8). IEEE.
Siddik, M. S., & Bezemer, C. P. (2023, October). Do Code Quality and Style Issues Differ Across (Non-) Machine Learning Notebooks? Yes! In 2023 IEEE 23rd International Working Conference on Source Code Analysis and Manipulation (SCAM) (pp. 72-83). IEEE.
[Figure legend: Convention, Warning, Refactor]
19. Python rules
Script / Notebook
a = 1 + 1
a
pointless-statement (W0104)
Pylint rule used when a statement does not have (or at least does not seem to have) any effect.
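The script/notebook contrast can be made concrete. In a script, the bare expression `a` is evaluated and its value is discarded, while Jupyter displays the value of an expression that ends a cell. The sketch below is a rough approximation of the rule written with the standard `ast` module, not Pylint's own implementation. The function name and the `notebook_cell` flag are hypothetical, and reading the slide as "the rule needs a notebook-aware variant" is an assumption.

```python
import ast


def pointless_statements(source, notebook_cell=False):
    """Return line numbers of top-level expression statements whose value is discarded.

    With notebook_cell=True the last statement of the cell is exempt,
    because Jupyter displays the value of a trailing expression.
    Only top-level statements are inspected in this sketch.
    """
    body = ast.parse(source).body
    flagged = []
    for index, stmt in enumerate(body):
        if not isinstance(stmt, ast.Expr):
            continue
        # Calls, awaits and yields may have side effects, so they are skipped.
        if isinstance(stmt.value, (ast.Call, ast.Await, ast.Yield, ast.YieldFrom)):
            continue
        # Bare strings are often docstrings, so they are skipped here as well.
        if isinstance(stmt.value, ast.Constant) and isinstance(stmt.value.value, str):
            continue
        if notebook_cell and index == len(body) - 1:
            continue
        flagged.append(stmt.lineno)
    return flagged


code = "a = 1 + 1\na"
print(pointless_statements(code))                      # [2]
print(pointless_statements(code, notebook_cell=True))  # []
```

The same source is flagged when treated as a script and accepted when treated as a notebook cell, which is why a rule taken from script-oriented linters may need adapting before it is applied to notebooks.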