Composition in ML:
in Models, Tools, and Teams
ODSC West - Nov 16, 2021
Dr. Bryan Bischof
– Head of Data Science @ Weights and Biases –
1
In collaboration with Dr. Eric Bunch
Email: bryan.bischof@gmail.com
What is composition?
2
Definition
Compositionality, also known as Fregeʼs principle, states that the
meaning of a complex expression is determined by
1. the meanings of its constituent parts,
and
2. the rules for how those parts are combined.
3
c.f. Fong, Spivak, 2018
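As a toy illustration of my own (not from the reference): in code, the ʻconstituent partsʼ can be plain functions, and ʻthe rules for how those parts are combinedʼ can be function composition itself.

```python
from functools import reduce

def compose(*fs):
    """Compose right-to-left: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), fs)

double = lambda x: 2 * x
increment = lambda x: x + 1

# The meaning of the whole is fixed by the parts *and* the combination rule:
print(compose(double, increment)(3))  # 2 * (3 + 1) == 8
print(compose(increment, double)(3))  # (2 * 3) + 1 == 7
```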
Model Composition
4
Examples
Matrix Factorization–or more specifically Singular Value Decomposition–is an
extremely popular latent factor model for recommendation systems. Recall that we are
given a user-item matrix R with rating elements r_ui. We wish to approximate this
matrix via training; our approximation technique is to factorize the matrix into three
parts, R ≈ U𝚺Vᵀ:
- U: representing the relationship between users and latent factors
- 𝚺: describing the strength of each latent factor
- V: indicating the similarity between items and latent factors
5
c.f. Koren, Bell, Volinsky, 2009
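A minimal numpy sketch of this composition (a toy ratings matrix; the numbers are purely illustrative):

```python
import numpy as np

# Toy user-item rating matrix R (4 users x 5 items); zeros stand in for
# unobserved ratings here -- real systems treat missing entries explicitly.
R = np.array([
    [5, 3, 0, 1, 4],
    [4, 0, 0, 1, 3],
    [1, 1, 0, 5, 4],
    [0, 1, 5, 4, 0],
], dtype=float)

# Full SVD: R = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep the top-k latent factors: the composed approximation R ~= U_k S_k V_k^T
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(R_hat, 2))  # low-rank scores for user-item affinity
```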
Definition
So sometimes,
ʻconstituent partsʼ are metric embeddings
and
ʻhow they are combinedʼ is linear-algebraic.
6
Examples
Seasonal Average Pooling–like other composite forecasting methods–is an extremely
simple forecasting method utilizing repeated model fitting on residuals-of-residuals.
For example, letʼs build a univariate forecasting model for a series using only seasonal
components: during the training sequence f(t), consider Month-of-year, Week-of-month,
and Day-of-week as categorical features on each day; pool (average) the series within
the first seasonal grouping, then fit each subsequent pool on the residuals of the one
before it.
7
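A minimal pandas sketch of the idea, assuming each seasonal ʻmodelʼ is just a group average fit on the residuals of the previous one (synthetic data, illustrative names):

```python
import numpy as np
import pandas as pd

# Hypothetical daily series.
idx = pd.date_range("2020-01-01", "2021-10-31", freq="D")
df = pd.DataFrame(
    {"y": np.random.default_rng(0).normal(100, 10, len(idx))}, index=idx
)
df["month"] = df.index.month          # Month-of-year
df["wom"] = (df.index.day - 1) // 7   # Week-of-month
df["dow"] = df.index.dayofweek        # Day-of-week

# Fit one pooled (group-average) model per seasonal component,
# each on the residuals of the composition so far.
resid, fitted = df["y"], 0.0
for col in ["month", "wom", "dow"]:
    pool = resid.groupby(df[col]).transform("mean")
    fitted = fitted + pool
    resid = resid - pool

print(resid.abs().mean())  # what remains after the composed seasonal pools
```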
Definition
So sometimes,
ʻconstituent partsʼ are pooling layers
and
ʻhow they are combinedʼ is a recursive residual additive process.
8
Examples
Boosted Trees are an ensemble fit by sequentially training decision trees on the
residuals of the iteratively composed model. In particular, the model at each iteration
is the previous model plus the weighted iʼth tree, fit on the residuals of the (i-1)ʼth
model:
F_i(x) = F_{i-1}(x) + γ_i · h_i(x)
with learnable weighting parameters γ_i we get a powerful learner!
9
c.f. Friedman, 2009
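A sketch of that recursive addition, hand-rolled with scikit-learn trees rather than a production GBM (a constant learning rate stands in for the learnable weights):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 500)

# F_i = F_{i-1} + lr * h_i, with h_i fit on the residuals of the composed model.
lr, F = 0.1, np.zeros_like(y)
for _ in range(100):
    h = DecisionTreeRegressor(max_depth=2).fit(X, y - F)
    F = F + lr * h.predict(X)

print(np.mean((y - F) ** 2))  # training MSE shrinks as the trees compose
```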
Definition
So sometimes,
ʻconstituent partsʼ are weighted learners
and
ʻhow they are combinedʼ is recursive addition.
10
Examples
Foundational models–pretrained models combined with downstream task-specific
training–are becoming ubiquitous in deep learning research and applications.
11
c.f. Standley et al., 2020; Li, Hoiem, 2017
There are numerous architectures for model transfer. Some of the most exciting, in my
opinion, are those which jointly train multiple downstream tasks, e.g. optimizing
overall network performance via minimization of the aggregate loss over all tasks:
L_total = Σ_t w_t · L_t, for per-task losses L_t and weights w_t.
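A minimal PyTorch sketch of that aggregate loss, assuming a shared trunk, two hypothetical task heads, and hand-picked task weights:

```python
import torch
import torch.nn as nn
import torch.nn.functional as Fn

trunk = nn.Sequential(nn.Linear(128, 64), nn.ReLU())  # stands in for a pretrained body
head_a = nn.Linear(64, 10)                            # e.g. a classification task
head_b = nn.Linear(64, 1)                             # e.g. a regression task
w = {"a": 1.0, "b": 0.5}                              # per-task loss weights

x = torch.randn(32, 128)
y_a = torch.randint(0, 10, (32,))
y_b = torch.randn(32, 1)

z = trunk(x)  # shared representation feeding every task head
loss = w["a"] * Fn.cross_entropy(head_a(z), y_a) + w["b"] * Fn.mse_loss(head_b(z), y_b)
loss.backward()  # gradients from both tasks flow through the shared trunk
```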
Definition
So sometimes,
ʻconstituent partsʼ are parameters trained via learning
and
ʻhow they are combinedʼ is a layer composition and loss sharing.
12
Examples
Equivariant Globally Natural DL–or Graph DL with invariance up to graph
isomorphism–pushes the emerging domain of graph learning to accommodate not only
global isomorphisms, but also those built from local mappings.
13
c.f. Haan, Cohen, Welling, 2021
In this example the compositional structure is more
obvious, but nonetheless essential to the
formulation.
Node features may be embedded onto edge features, and passed into convolution as in
normal GNNs.
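To make the node-to-edge step concrete, here is a toy numpy message-passing pass; it is emphatically not the paperʼs equivariant construction, just the plain compositional skeleton it refines:

```python
import numpy as np

# Toy directed graph on 4 nodes; all weights are random and illustrative.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
H = np.random.default_rng(0).normal(size=(4, 8))   # node features
W = np.random.default_rng(1).normal(size=(16, 8))  # stand-in learned projection

# 1. Embed node features onto edges (concatenate endpoint features)...
E = np.stack([np.concatenate([H[u], H[v]]) for u, v in edges])

# 2. ...then convolve: project each edge message and aggregate onto targets.
H_new = np.zeros_like(H)
for (u, v), msg in zip(edges, E @ W):
    H_new[v] += msg
H_new = np.tanh(H_new)  # composed: embed -> project -> aggregate -> nonlinearity
```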
Definition
So sometimes,
ʻconstituent partsʼ are action mappings on the data structure
and
ʻhow they are combinedʼ is function composition and kernel
convolution.
14
Ok, ok. So composition is very much a part of the structural modeling we do as
Machine Learning practitioners.
But Iʼm more on the applied side...
15
Compositional tools
16
YAFPT? YAMST?
I want to sell you an ML pipeline:
- Itʼs composed of pure components, i.e. they return the same output every time
from the same input, and have no side effects
- It is higher order–each component provides APIs for a function
- It is composable, i.e. components are easily combined via knowledge only of the
types of their inputs and outputs
- It is curryable–providing a fixed set of parameters and inputs allows you to execute
the entire pipeline.
Are you buying? These happen to align with the core principles of Functional
Programming, but also of Micro-services. Why does MLOps care about these?
17
c.f. fklearn
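Here is a minimal sketch of those properties in the fklearn spirit (the learner names are hypothetical, not fklearnʼs actual API): each learner is pure and higher order, returning a predict function whose output type matches the next learnerʼs input.

```python
import pandas as pd

def mean_imputer_learner(df: pd.DataFrame, col: str):
    fill = df[col].mean()  # learned from training data, then frozen
    def predict(new_df: pd.DataFrame) -> pd.DataFrame:
        return new_df.assign(**{col: new_df[col].fillna(fill)})
    return predict

def scaler_learner(df: pd.DataFrame, col: str):
    denom = df[col].max()
    def predict(new_df: pd.DataFrame) -> pd.DataFrame:
        return new_df.assign(**{col: new_df[col] / denom})
    return predict

train = pd.DataFrame({"age": [20.0, None, 60.0]})

# Composable because the types line up: DataFrame in, DataFrame out.
p1 = mean_imputer_learner(train, "age")
p2 = scaler_learner(p1(train), "age")
pipeline = lambda df: p2(p1(df))  # pure: same input, same output, no side effects

print(pipeline(pd.DataFrame({"age": [None, 30.0]})))
```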
Models? Data? No!
Andrew Ng has recently been proselytizing the gains of a data-centric approach to AI.
He rightly recognizes both the effectiveness of data improvement and preparation, and
the value of systematic attention to the data that your product is built on.
In particular he identifies, correctly, that one formulation of the data pipeline is,
roughly: scope the project → collect data → train the model → deploy in production,
with backward arrows feeding what you learn in later stages back into earlier ones.
And he rightly identifies the importance of those backwards arrows in this flow. But...
18
c.f. From Model-centric to Data-centric AI
Right answer; wrong test.
Dr. Ngʼs recommendation:
Donʼt: hold the data fixed and iteratively improve the model.
Do: hold the code fixed and iteratively improve the data.
While I deeply appreciate this suggestion to be modular and flexible, it aims too low!
The recommendation from compositional thinking:
Hold the (composition) fixed and iteratively improve (one component).
i.e. Pipeline-centric AI!
19
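A toy sketch of pipeline-centric iteration with hypothetical components: the composition rule never changes while one slot is swapped out.

```python
from functools import partial, reduce
import pandas as pd

def impute_mean(df, col):   return df.assign(**{col: df[col].fillna(df[col].mean())})
def impute_median(df, col): return df.assign(**{col: df[col].fillna(df[col].median())})
def scale(df, col, by):     return df.assign(**{col: df[col] / by})

def run(steps, df):
    """The fixed composition: thread the DataFrame through each step."""
    return reduce(lambda d, step: step(d), steps, df)

steps = [partial(impute_mean, col="age"), partial(scale, col="age", by=100.0)]
df = pd.DataFrame({"age": [10.0, None, 20.0, 90.0]})

print(run(steps, df))                          # mean-imputed variant
steps[0] = partial(impute_median, col="age")   # iterate on one component only
print(run(steps, df))                          # the composition never changed
```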
Itʼs about the process
Data changes, but so do the other components!
The needs of the data change, the expectations of the model change, the objective
functions change, the sources change, etc. If you focus only on the data, youʼre
focusing too closely on the short-term goals, and over-constraining your solution.
By instead making primary the data transformations, data assumptions, and
compositions (input and output types), you can rapidly iterate at multiple locations
across the stack, wherever you see the most opportunity.
20
YAAICP
Letʼs bring in yet another AI catch-phrase:
the data flywheel.
What makes sense about this analogy is the implication
that the inertia of the spinning wheel ramps up.
In the data flywheel strategy, data products provide
personalization and insight to drive more customer
interactions which may be converted back into
learnable structures.
Notice here the focus on composition!
21
c.f. Matt Turck, Building an AI Startup
Letʼs look at a real ML system architecture
Consider this incredible
overview of just about
every RecSys out there.
This diagram is
data-structure,
infrastructure, and model
architecture agnostic!
And yet, via only the
composition rules, we
have a full system design.
22
c.f. Higley, Oldridge, 2021, Yan, 2021
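Assuming the staging in that diagram is the familiar retrieval → filtering → scoring → ordering decomposition, here is a toy sketch of a system specified purely by its composition rules:

```python
# Hypothetical stand-ins: a static catalog of item scores and seen-item sets.
CATALOG = {1: 0.9, 2: 0.4, 3: 0.7, 4: 0.1}
SEEN = {"u1": {2}}

def retrieve(user):          return list(CATALOG)                 # candidate ids
def filter_seen(user, ids):  return [i for i in ids if i not in SEEN.get(user, set())]
def score(user, ids):        return {i: CATALOG[i] for i in ids}  # stand-in model
def order(scores):           return sorted(scores, key=scores.get, reverse=True)

def recommend(user):
    return order(score(user, filter_seen(user, retrieve(user))))

print(recommend("u1"))  # [1, 3, 4]
```

Swap in a real retrieval index or scoring model and nothing else moves, because only the input/output types are load-bearing.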
Is there anything that can help?
MLOps is a somewhat nascent field focused on the overall structure of ML products
and pipelines.
Technology is beginning to be developed around these needs, both to manage the
components of a pipeline-centric system and to execute the type alignment.
People are starting to align on explicit composition coherence.
23
c.f. Shreya Shankar, 2021
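One way to make that coherence explicit in Python, as a sketch (a structural Protocol plus a runtime contract check, not any particular vendorʼs API):

```python
from typing import Protocol
import pandas as pd

class Transform(Protocol):
    """The contract every pipeline component agrees to honor."""
    def __call__(self, df: pd.DataFrame) -> pd.DataFrame: ...

def compose_checked(*steps: Transform):
    """Wire components together and enforce the contract at every seam."""
    def pipeline(df: pd.DataFrame) -> pd.DataFrame:
        for step in steps:
            df = step(df)
            assert isinstance(df, pd.DataFrame), f"{step} broke the contract"
        return df
    return pipeline

drop_nulls = lambda df: df.dropna()
add_flag = lambda df: df.assign(flagged=True)

print(compose_checked(drop_nulls, add_flag)(pd.DataFrame({"x": [1.0, None]})))
```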
And some of us are building the platform
Like these compositional pipelines, our platform is built of components, and our
platform handles the coherence.
24
c.f. Weights and Biases
In practice
Machine learning engineers can avoid writing glue code, and assert statements, and
drift monitors, and hard-coded url-slugs, and reading local data into dataloaders, and
training loops, and ensemble DAGs, and can get back to focusing on the data, the
models, and the tasks.
25
c.f. W&B Launch
Donʼt start from scratch
In their landmark Dota 2 paper, the OpenAI team described an essential component in
their mission to train better and better models:
“In order to train without restarting from the beginning after each change, we
developed a collection of tools to resume training with minimal loss in performance
which we call surgery.... we performed approximately one surgery per two weeks.”
If your dog hasnʼt learned to catch a frisbee by the time theyʼre six weeks old, donʼt get a
new dog–get a new training methodology. 🐕
Composable tools allow you to swap in and out your strategies wherever necessary.
26
c.f. OpenAI, Dota 2, 2019, OpenAI & Weights and Biases
My ML products donʼt look like this!
Well, Andrej Karpathyʼs do!
27
c.f. Karpathy, ICML 2019
Composable teams
28
Thereʼs more?
An even bigger challenge than building effective ML systems is building effective
team structures to support the people who can build those systems.
- What is the right team architecture to enable people to do their best work, and
yet provide opportunities for growth?
- How do you create robustness to team departures, vacations, or burnout?
29
Take from engineering
In much of the above, we took engineeringʼs learnings as a foundation and built on
top of them. Here too, we can take away important lessons:
- Atomic tasks, clearly specced
- Assignee agnostic tasks
- PR processes
- Component expertise
While being a full-stack data scientist creates plenty of opportunity for innovation,
over time that stack owns you, and buckles you into a full-time maintenance role.
30
c.f. Eric Colson, 2019
Focus on the relationships
Like our components and interfaces throughout this talk, we as ML practitioners
should–at any given time–focus on executing one task.
We should be given clear inputs and expectations for our outputs.
And we should understand how to communicate and exchange with others.
When it comes time for someone else to work on this task, it should be frictionless
and context rich, with clear documentation of whatʼs been done and a system of
record for how to reproduce it.
31
c.f. Collaborative Reports
One more Karpathy reference
Karpathyʼs team on self-driving was distributed
over many components of a
massively-multitask problem. In addition to
adversarial collaboration, he generally found it difficult to optimize how to compose
their efforts.
Maybe he should try back-propagation to learn
a better weighting.
32
c.f. Karpathy, ICML 2019
Thanks!
Check out W&Bʼs composable tools at:
Wandb.ai
Totally free for individuals & academics.
Come chat with us at our booth today and tomorrow, or email contact@wandb.ai.
33
