Maintainability Challenges in ML: A SLR

KARTHIK SHIVASHANKAR, ANTONIO MARTINI
UNIVERSITY OF OSLO
DEPARTMENT OF INFORMATICS

Study Objective
Our study aims to identify and synthesise the maintainability
challenges in different stages of the ML workflow and understand
how these stages are interdependent and impact each other’s
maintainability.

Maintainability
Software maintainability means “the ease with which a software system or component can be modified to correct faults, improve performance or other attributes, and adapt to a changing environment.”

Method
A replication package with all the details and metadata of this SLR study is available at
https://doi.org/10.5281/zenodo.6400559

Research Questions
(RQ1) What are the Data Engineering maintainability challenges?
(RQ2) What are the Model Engineering maintainability challenges?
(RQ3) What are the current maintainability challenges when Building an ML system?

RQ1 Key takeaways
• Data is messy and error-prone, and it lacks transparency and ownership.
• There is no guarantee that pre-processing can handle all types of data-quality errors, bias, and adversarial data.
• Most data pipelines are tested in a trial-and-error manner. Pipelines also change and evolve, making them difficult to validate and maintain on an ongoing basis.
Courtesy Randall Munroe of XKCD
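As an illustration of replacing trial-and-error checks with explicit, repeatable ones, here is a minimal sketch that assumes records arrive as Python dicts. The `Expectation` class, the column names, and the rules are hypothetical; dedicated data-validation tools go much further, but the idea is the same: encode quality rules once so every pipeline change is checked automatically.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Expectation:
    """One hypothetical data-quality rule applied to a single column."""
    column: str
    check: Callable[[Any], bool]
    description: str


def validate(rows: list[dict[str, Any]], expectations: list[Expectation]) -> list[str]:
    """Return a human-readable error for every rule that a row violates."""
    errors = []
    for i, row in enumerate(rows):
        for exp in expectations:
            if exp.column not in row:
                errors.append(f"row {i}: missing column '{exp.column}'")
            elif not exp.check(row[exp.column]):
                errors.append(f"row {i}: {exp.column}={row[exp.column]!r} violates {exp.description}")
    return errors


# Hypothetical rules for a toy customer table.
EXPECTATIONS = [
    Expectation("age", lambda v: isinstance(v, int) and 0 <= v <= 120, "0 <= age <= 120"),
    Expectation("email", lambda v: isinstance(v, str) and "@" in v, "email contains '@'"),
]

batch = [
    {"age": 34, "email": "a@example.com"},
    {"age": -3, "email": "b@example.com"},
    {"email": "not-an-email"},
]

for err in validate(batch, EXPECTATIONS):
    print(err)
```

Running the same rules on every new batch turns silent data errors into explicit, reviewable failures.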

RQ2 Key takeaways
• Entanglement among hyperparameters directly affects model performance and the training pipeline.
• The stochastic nature of ML and rapidly changing inputs and expected outputs create a moving target, making ML testing an open challenge.
• Data seasonality and fluctuations in data collection may lead to model staleness and degraded performance.
Image credits:
https://matthewmcateer.me/blog/machine-learning-technical-debt/
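One common way to test stochastic training without exact-match assertions is to check a tolerance band across several seeds and to confirm that a fixed seed is reproducible. The sketch assumes a toy linear model fitted with plain SGD on synthetic data drawn from y = 2x + 1 plus noise; the function name `train_sgd` and the tolerances (0.3 and 0.2) are illustrative choices, not values from the study.

```python
import random
import statistics


def train_sgd(seed: int, epochs: int = 200, lr: float = 0.05) -> tuple[float, float]:
    """Fit y = w*x + b with plain SGD; the data noise and sample order depend on `seed`."""
    rng = random.Random(seed)
    # Synthetic data from a known line y = 2x + 1 with Gaussian noise (assumption for illustration).
    data = [(x / 10, 2 * (x / 10) + 1 + rng.gauss(0, 0.1)) for x in range(20)]
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # stochastic sample order in each epoch
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def test_training_is_stable_across_seeds() -> None:
    # Assert a tolerance band over several seeds instead of exact equality with one run.
    slopes = [train_sgd(seed)[0] for seed in range(5)]
    assert all(abs(s - 2.0) < 0.3 for s in slopes), slopes
    assert statistics.pstdev(slopes) < 0.2, slopes


def test_same_seed_is_reproducible() -> None:
    # A fixed seed should give bit-identical results, which makes regressions debuggable.
    assert train_sgd(42) == train_sgd(42)


if __name__ == "__main__":
    test_training_is_stable_across_seeds()
    test_same_seed_is_reproducible()
    print("ok")
```

The two tests separate concerns: the first tolerates legitimate randomness, while the second catches unintended nondeterminism.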

RQ3 Key takeaways
• In general, most cloud providers do not offer a common programming model. They typically expose ML through either a black box or a complex runtime environment, which tightly couples the modelling and infrastructure layers.
• Although AutoML alleviates some challenges by automating model selection and hyperparameter tuning, minimising expert intervention remains hard with current tools.
• Engineers spend significant effort writing ad hoc programs to connect components from different software libraries, process various forms of raw input, and interface with external systems, which leads to pipeline jungles and glue code in an MLOps-like set-up.
Credits: https://towardsdatascience.com/seven-signs-you-might-be-creating-ml-technical-debt-1a96a840fd80
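Pipeline jungles tend to grow when stages are wired together ad hoc. A minimal sketch, assuming every stage is a plain Python function that takes one value and returns one value: a small `Pipeline` class (hypothetical, not a specific library's API) gives each stage a name, a uniform interface, and an error message that says which stage failed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Stage = Callable[[Any], Any]


@dataclass
class Pipeline:
    """A minimal named-stage pipeline; every stage takes and returns one value."""
    stages: list[tuple[str, Stage]] = field(default_factory=list)

    def add(self, name: str, fn: Stage) -> "Pipeline":
        self.stages.append((name, fn))
        return self  # allow chaining

    def run(self, data: Any) -> Any:
        for name, fn in self.stages:
            try:
                data = fn(data)
            except Exception as exc:
                # Report which stage failed instead of a bare traceback from glue code.
                raise RuntimeError(f"stage '{name}' failed") from exc
        return data


# Hypothetical stages for CSV-like raw text.
def parse(text: str) -> list[list[str]]:
    return [line.split(",") for line in text.strip().splitlines()]


def to_floats(rows: list[list[str]]) -> list[list[float]]:
    return [[float(v) for v in row] for row in rows]


def normalise(rows: list[list[float]]) -> list[list[float]]:
    # Min-max scale each column to [0, 1]; constant columns map to 0.0.
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)] for row in rows]


pipeline = Pipeline().add("parse", parse).add("to_floats", to_floats).add("normalise", normalise)
print(pipeline.run("1,10\n2,20\n3,30"))
```

Because stages share one interface, each can be unit-tested on its own and swapped without touching the others.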

Interdependence of ML challenges
ML systems raise unique quality-attribute concerns during development, such as:
• data-dependent behaviour,
• detecting and responding to drift over time,
• handling bias and quality issues,
• timely capture of ground truth for retraining a model, so that it remains a quality ML system,
• and many more.
Image credits:
https://matthewmcateer.me/blog/machine-learning-technical-debt/

Interdependence of maintainability challenges in different stages

If you try to use ML to give fashion advice, know that fashion changes over time.

CREDITS:
https://towardsdatascience.com/how-to-attack-machine-learning-evasion-poisoning-inference-trojans-backdoors-a7cb5832595c
https://medium.com/thelaunchpad/how-to-protect-your-machine-learning-product-from-time-adversaries-and-itself-ff07727d6712
ML systems are data-dependent and complex, making them susceptible to data and concept drift, which quickly makes their inputs and expected outputs obsolete.

Credits: https://towardsdatascience.com/machine-learning-in-production-why-is-it-so-difficult-28ce74bfc732
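Data drift is often detected by comparing a feature's distribution in production against a reference sample. This sketch uses the Population Stability Index, PSI = sum over bins of (c_i - r_i) * ln(c_i / r_i), where r_i and c_i are the reference and current bin proportions. The equal-width binning, the epsilon floor, and the synthetic Gaussian data are assumptions for illustration. A PSI above roughly 0.25 is a commonly quoted rule of thumb for a significant shift, not a threshold from the study.

```python
import math
import random


def psi(reference: list[float], current: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference sample and a current sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for v in sample:
            # Values outside the reference range are clipped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm stays finite.
        return [max(c / len(sample), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


rng = random.Random(0)
reference = [rng.gauss(0, 1) for _ in range(5000)]
same = [rng.gauss(0, 1) for _ in range(5000)]     # no drift
shifted = [rng.gauss(1, 1) for _ in range(5000)]  # mean shifted by one standard deviation

print(f"PSI, no drift: {psi(reference, same):.3f}")
print(f"PSI, shifted:  {psi(reference, shifted):.3f}")
```

PSI only covers shifts in input features (data drift); concept drift, where the input-output relationship changes, additionally needs ground-truth labels to detect.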

Implications for developers
▪ There is a lack of standard tools and methods for provenance tracking, publishing ML models and their artefacts, tracking data transformations, and querying and storing intermediate steps.
▪ Many ML projects fail at the prototyping stage because setting up infrastructure for deployment and maintenance requires integrating and managing glue code, ad hoc pipelines, and data monitoring.
▪ In collaborative or multi-organisational projects, monitoring is complex because different teams have different metrics and requirements, especially regarding governance and regulation, and because there are no standards for communicating about ML issues and their quality.
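In the absence of standard tooling, provenance tracking can start as simply as fingerprinting each transformation's inputs and outputs. A minimal sketch, assuming the data is JSON-serialisable; the `Lineage` class, its record format, and the example steps are hypothetical. Since each step's `input_hash` should equal the previous step's `output_hash`, the records form a verifiable chain from raw data to the final dataset.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable


def fingerprint(obj: Any) -> str:
    """Stable SHA-256 digest of a JSON-serialisable object (dict keys sorted)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class Lineage:
    """Hypothetical append-only record of the transformations applied to a dataset."""
    steps: list[dict[str, Any]] = field(default_factory=list)

    def apply(self, name: str, fn: Callable[..., Any], data: Any, **params: Any) -> Any:
        out = fn(data, **params)
        self.steps.append({
            "step": name,
            "params": params,
            "input_hash": fingerprint(data),
            "output_hash": fingerprint(out),
        })
        return out


lineage = Lineage()
raw = [3, 1, None, 2]
clean = lineage.apply("drop_nulls", lambda d: [x for x in d if x is not None], raw)
scaled = lineage.apply("scale", lambda d, factor: [x * factor for x in d], clean, factor=10)
print(json.dumps(lineage.steps, indent=2))
```

Storing these records next to a trained model makes it possible to answer later which data and which parameters produced it.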

Implications for researchers
• It is unclear, even to experienced developers, how to choose among several data-processing steps and how each choice will affect the model’s performance.
• ML systems constantly adapt to new data, creating a moving target that makes maintaining unit and regression tests a different challenge than in traditional software.
• Better validation algorithms and monitoring techniques are needed to track key data and model metrics over time.
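Monitoring a key model metric over time can be sketched as a rolling window compared against a baseline. The `MetricMonitor` class, the baseline, tolerance, window size, and daily accuracy values are all hypothetical. Waiting for a full window before alerting is a design choice that trades some detection speed for fewer false alarms.

```python
from collections import deque


class MetricMonitor:
    """Flag when the rolling mean of a model metric drops below baseline - tolerance."""

    def __init__(self, baseline: float, tolerance: float, window: int = 5) -> None:
        self.threshold = baseline - tolerance
        self.values = deque(maxlen=window)  # keeps only the most recent observations

    def update(self, value: float) -> bool:
        """Record one observation; return True if the full window's mean breaches the threshold."""
        self.values.append(value)
        if len(self.values) < self.values.maxlen:
            return False  # not enough history yet
        return sum(self.values) / len(self.values) < self.threshold


monitor = MetricMonitor(baseline=0.90, tolerance=0.05, window=3)
daily_accuracy = [0.91, 0.90, 0.89, 0.86, 0.82, 0.80]  # hypothetical values
alerts = [monitor.update(a) for a in daily_accuracy]
print(alerts)
```

With these values, only the last observation raises an alert: the rolling mean of 0.86, 0.82 and 0.80 (about 0.827) is the first to fall below the 0.85 threshold.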

Presented at SEAA 2022.