Open Source ML - from pretrained models to production
Run State of the Art Open Source LLMs in Production
INTERFACE by apidays 2023 - Open Source ML, Omar Sanseviero, Hugging Face
1. Models
What exists out there?
The Hugging Face Hub
Models: Access over 200k models shared by the community.
Spaces: Build ML apps and demos to showcase how models work.
Datasets: Share, access and collaborate on over 45k datasets.
The Hugging Face Hub: growth
Models: 99k -> 200k
Spaces: 19k -> 60k
Datasets: 16k -> 45k
The Model Hub
● Models across modalities (Computer Vision, NLP, Audio, multimodal, RL, tabular)
● Multiple libraries (PyTorch, Keras, fastai, spaCy, NeMo, PaddlePaddle, Stanza, timm)
● 180+ supported languages
● Model cards for documentation
○ Metrics reporting
○ CO2 emissions
○ TensorBoard hosting
○ Interactive widgets
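Model cards are README files whose YAML front matter carries the machine-readable metadata behind items such as license, languages and CO2 emissions. The following is a minimal sketch with a hypothetical model and made-up values. The keys shown (`license`, `language`, `library_name`, `tags`, `co2_eq_emissions`) reflect the Hub's model card metadata as best I recall it, so check them against the current Hub documentation before relying on them.

```python
def render_model_card(metadata: dict, body: str) -> str:
    """Render a README.md string: YAML front matter followed by Markdown."""
    lines = ["---"]
    for key, value in metadata.items():
        if isinstance(value, list):
            # Lists become YAML sequences, one "- item" line per entry.
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body.strip() + "\n"


# Hypothetical values, for illustration only.
card = render_model_card(
    {
        "license": "apache-2.0",
        "language": ["en"],
        "library_name": "transformers",
        "tags": ["text-generation"],
        "co2_eq_emissions": 1200,  # grams of CO2-eq; a made-up number
    },
    "# my-demo-model\n\nA placeholder description.",
)
print(card)
```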
2. Inference
How to do inference of LLMs?
Recent popular models
StarCoder
● Code generation
● 15.5B parameters
● OpenRAIL license
● 80+ programming languages
● 1 trillion tokens
LLaMA
● Large ecosystem
● 7B to 65B parameters
● Non-commercial license
● 1-1.4 trillion tokens
Falcon
● Best open-source model
● 7B to 40B parameters
● Apache 2.0
● Multilingual
● 1 trillion tokens
Challenges
Evaluation
Existing benchmarks don’t fully capture real world use cases
(e.g. multi-turn).
Customizability
Users want models tuned to their own data or use cases
while preserving privacy.
Model size
LLMs require lots of memory, might not fit into a single
machine, require complex parallelism and communication.
Optimization
Because of model size, latency and throughput often suffer,
so optimized models are required.
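The model-size challenge can be made concrete with back-of-the-envelope arithmetic. The sketch below estimates the memory needed for the weights alone at different precisions. It is a lower bound under stated assumptions: it ignores activations, the KV cache, framework overhead, and any layers kept in higher precision, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Parameter counts taken from the model sizes mentioned in this deck.
for name, n_params in [("7B", 7e9), ("40B", 40e9), ("65B", 65e9)]:
    row = ", ".join(
        f"{bits}-bit: {weight_memory_gb(n_params, bits):.1f} GB"
        for bits in (16, 8, 4)
    )
    print(f"{name} -> {row}")
```

At 16-bit precision a 40B model needs roughly 80 GB for weights alone, which is why it may not fit on a single GPU without quantization or sharding.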
Some things you can do
Loading
● Load in 4-bit or 8-bit mode (bitsandbytes, accelerate)
● Falcon 40B with 45GB (8-bit) or 27GB (4-bit) of RAM
Multi-GPU
● Distribute among GPUs (accelerate)
● Set device_map="auto" or even offload layers to CPU (slow)
Inference Libraries
● Use tools optimized for LLMs (text-generation-inference)
● Used by HF in production!
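A minimal sketch of quantized loading with automatic device placement, assuming `transformers`, `accelerate` and `bitsandbytes` are installed and a CUDA GPU is available. The helper names `quantization_flags` and `load_quantized` are hypothetical; `BitsAndBytesConfig`, `quantization_config` and `device_map="auto"` are the library pieces I would expect to use here, but exact behavior varies between versions.

```python
def quantization_flags(bits: int) -> dict:
    """Flags for BitsAndBytesConfig: exactly one of 4-bit or 8-bit loading."""
    if bits not in (4, 8):
        raise ValueError("bits must be 4 or 8")
    return {"load_in_4bit": bits == 4, "load_in_8bit": bits == 8}


def load_quantized(model_id: str, bits: int = 4):
    """Load a causal LM in 4-bit or 8-bit mode, spread over available devices."""
    # Imported lazily so the helper above works without the heavy dependencies.
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(**quantization_flags(bits)),
        device_map="auto",  # accelerate places layers on GPUs, then CPU if needed
    )
    return tokenizer, model
```

Calling `load_quantized("tiiuae/falcon-7b", bits=4)` would download the weights on first use. Older `transformers` releases may additionally need `trust_remote_code=True` for Falcon.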
Text-generation-inference (TGI)
Features: tensor parallelism, token streaming, metrics and monitoring, quantization, optimizations, security
TGI supports most popular LLMs, such as StarCoder, SantaCoder, Falcon, LLaMA, Galactica, OPT and GPT-NeoX
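A minimal client sketch for a running TGI server, using only the standard library. It assumes a server is already listening at `http://localhost:8080` (an assumption, adjust to your deployment) and uses the `/generate` route with an `inputs` string and a `parameters` object. The function names are hypothetical.

```python
import json
import urllib.request

TGI_URL = "http://localhost:8080"  # assumption: a TGI server is running here


def build_generate_request(
    prompt: str, max_new_tokens: int = 64, base_url: str = TGI_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request for TGI's /generate route."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, **kwargs) -> str:
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["generated_text"]
```

For token streaming, TGI also exposes a `/generate_stream` route that returns server-sent events. Parsing that stream is left out of this sketch.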
Some users
HuggingChat, OpenAssistant, nat.dev
3. Training
How to adjust models to your own use cases?
Training
● $$$
● Lots and lots of data
● Lots of expertise
Fine-tuning
● $$
● Much less data and compute
PEFT
● $
● Even less compute
Recent popular models overview
(PEFT = Parameter-Efficient Fine-Tuning)
You can fine-tune Whisper or Falcon-7b in a free Colab
Example: Whisper
● Only 1% of the parameters are trainable, which allows a 5x larger batch size
● Fine-tune a 1.6B parameter model with less than 8GB of GPU VRAM
● The resulting checkpoints were less than 1% of the size of the original model
Full fine-tuning: results in OOM (out of memory)
LoRA: fits in memory, with the results listed above
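The small share of trainable parameters follows from how LoRA works: the original weight matrix W stays frozen, and only two low-rank matrices B and A are trained, with W + B·A used in the forward pass. A minimal sketch of the parameter arithmetic for a single matrix, with illustrative dimensions and rank (assumptions, not figures from the talk):

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable parameter counts for one weight matrix.

    Full fine-tuning updates all d_out * d_in entries of W.
    LoRA freezes W and trains B (d_out x rank) and A (rank x d_in).
    """
    return d_out * d_in, rank * (d_out + d_in)


# Illustrative sizes only.
full, lora = lora_param_counts(d_out=1280, d_in=1280, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

Because adapters are usually attached to only some of a model's matrices, the share of trainable parameters over the whole model is typically lower than this per-matrix ratio. The same reasoning explains the small checkpoints: only B and A need to be saved.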
Example: Stable Diffusion
Images generated with a "dog" adapter, a "toy" adapter, and the combined "toy" + "dog" adapters
QLoRA
4-bit Quantization
4-bit quantized pretrained LM
RLHF
Base model with multiple adapters
Efficient
Fine-tune 65B parameter model on a single 48GB GPU
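To illustrate what storing weights in 4 bits means, here is a minimal sketch of block-wise absmax quantization to integers in [-7, 7]. This is a deliberately simplified uniform scheme swapped in for illustration. QLoRA itself uses a different 4-bit data type (NF4) and additional tricks, which this sketch does not implement.

```python
def quantize_block(values: list[float]) -> tuple[list[int], float]:
    """Map floats to integer codes in [-7, 7] plus one float scale per block."""
    absmax = max(abs(v) for v in values)
    scale = absmax / 7 if absmax > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dequantize_block(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the integer codes and the block scale."""
    return [c * scale for c in codes]


weights = [0.7, -0.3, 0.12, 0.0]  # made-up weights for one block
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
print(codes, restored)
```

Each weight is stored as a 4-bit code, and each block stores one extra scale. The rounding error per weight is at most half the scale, which is the price paid for the memory savings.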
4. Building demos
How to build and share my ML apps?
Why demos?
● Easily present to a wide audience
● Increase reproducibility of research
● Diverse users can identify and debug failure points
Gradio: typical usage
import gradio

# classify_image is your own function: it takes an image and returns label scores
app = gradio.Interface(
    classify_image,
    inputs="image",
    outputs="label")
app.launch()
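The snippet above leaves `classify_image` undefined. A minimal sketch of what such a function could look like, assuming Gradio passes the image as a NumPy-like array and accepts a dictionary of label to confidence for the "label" output. The brightness rule is a toy stand-in for a real model, and `build_app` is a hypothetical helper.

```python
def classify_image(image) -> dict:
    """Toy 'classifier': score by mean pixel brightness on a 0-255 scale."""
    p_bright = float(image.mean()) / 255.0
    return {"bright": p_bright, "dark": 1.0 - p_bright}


def build_app():
    """Wrap the function in a Gradio interface (requires gradio installed)."""
    import gradio  # imported lazily so classify_image is usable without gradio

    return gradio.Interface(classify_image, inputs="image", outputs="label")


# To serve the demo locally:
# build_app().launch()
```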
Turning point in usage of ML
From ML/software engineers to anyone who can use a GUI/browser
Thanks!
omar@huggingface.co
Omar Sanseviero
@osanseviero
CREDITS: This presentation template was created by Slidesgo,
and includes icons by Flaticon, infographics & images by
Freepik and illustrations by Storyset and Chunte Lee
