INTERFACE by apidays 2023 - Open Source ML, Omar Sanseviero, Hugging Face

Open Source ML - from pretrained models to production
Run State of the Art Open Source LLMs in Production

1. Models
What exists out there?

The Hugging Face Hub
Models
Access over 200k models shared by the community.
Spaces
Build ML apps and demos to showcase how models work.
Datasets
Share, access and collaborate on over 45k datasets.

The Hugging Face Hub
Models: 99k -> 200k
Spaces: 19k -> 60k
Datasets: 16k -> 45k

The Model Hub
● Models across modalities (Computer Vision, NLP, Audio, multimodal, RL, tabular)
● Multiple libraries (PyTorch, Keras, fastai, spaCy, NeMo, PaddlePaddle, Stanza, timm)
● 180+ supported languages
● Model cards for documentation
○ Metrics reporting
○ CO2 emissions
○ TensorBoard hosting
○ Interactive widgets

2. Inference
How to do inference of LLMs?

Recent popular models
StarCoder
● Code generation
● 15.5B parameters
● OpenRAIL license
● 80+ programming languages
● 1 trillion tokens
LLaMA
● Large ecosystem
● 7B to 65B parameters
● Non-commercial
● 1-1.4 trillion tokens
Falcon
● Best open source model
● 7B to 40B parameters
● Apache 2.0
● Multilingual
● 1 trillion tokens

Challenges
Evaluation
Existing benchmarks don’t fully capture real-world use cases
(e.g. multi-turn).
Customizability
Users want models tuned to their own data or use cases
while preserving privacy.
Model size
LLMs require lots of memory, might not fit into a single
machine, and require complex parallelism and communication.
Optimization
Because of model size, latency and throughput often suffer,
so optimized models are required.

Some things you can do
Loading
Load in 4-bit or 8-bit mode (bitsandbytes, accelerate).
Falcon 40B with 45GB (8-bit) or 27GB (4-bit) of RAM.
Multi-GPU
Distribute among GPUs (accelerate).
Set device_map="auto" or even offload layers to CPU (slow).
Inference Libraries
Use tools optimized for LLMs (text-generation-inference).
Used by HF in production!
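
The memory figures on this slide can be sanity-checked with simple arithmetic: the weights alone need roughly parameters x bits / 8 bytes. The sketch below does that calculation, then shows one way the slide's advice might map onto the `transformers` API, combining `device_map="auto"` with a `BitsAndBytesConfig`. The function names are hypothetical, and the loader assumes `transformers`, `accelerate` and `bitsandbytes` are installed and a CUDA GPU is available; it is a sketch, not the setup used in the talk.

```python
def weight_memory_gb(num_params: float, bits: int) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits / 8 / 1e9


def load_quantized(model_id: str, bits: int = 8):
    """Load a causal LM in 8-bit or 4-bit, placed automatically across devices.

    Assumption: transformers, accelerate and bitsandbytes are installed.
    They are imported lazily so the arithmetic above runs anywhere.
    """
    if bits not in (4, 8):
        raise ValueError("bits must be 4 or 8")
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_8bit=(bits == 8),
        load_in_4bit=(bits == 4),
    )
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",  # let accelerate spread layers over GPUs, then CPU
        quantization_config=quant_config,
    )


if __name__ == "__main__":
    # Lower bounds for a 40B-parameter model; real usage is higher.
    for bits in (16, 8, 4):
        print(f"{bits}-bit: ~{weight_memory_gb(40e9, bits):.0f} GB")
```

These estimates (about 40 GB at 8-bit and 20 GB at 4-bit) are lower bounds for the weights only. The slide's 45GB and 27GB are higher, presumably because some layers stay in higher precision and the runtime adds overhead. Older `transformers` releases may also need `trust_remote_code=True` for Falcon checkpoints.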

Text-generation-inference (TGI)
● Tensor parallelism
● Token streaming
● Metrics and monitoring
● Quantization
● Optimizations
● Security
TGI supports most popular LLMs, such as StarCoder and SantaCoder,
Falcon, LLaMA, Galactica and OPT, and GPT-NeoX.
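
To make the serving side concrete, here is a minimal client sketch for a TGI server's `/generate` route, using only the Python standard library. The helper names and the server address are assumptions for illustration; the talk does not show client code.

```python
import json
import urllib.request


def build_generate_request(base_url: str, prompt: str, max_new_tokens: int = 64):
    """Build a POST request for a TGI server's /generate route."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Send the prompt to the server and return the generated text."""
    request = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["generated_text"]
```

Assuming a server is already running at `http://localhost:8080`, `generate("http://localhost:8080", "def fibonacci(n):")` would return the completion as a string. Token streaming uses a separate `/generate_stream` route that is not covered here.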

Some users
HuggingChat, OpenAssistant, nat.dev

3. Training
How to adjust models to your own use cases?

Training
● $$$
● Lots and lots of data
● Lots of expertise
Fine-tuning
● $$
● Much less data and compute
PEFT (Parameter-Efficient Fine-Tuning)
● $
● Even less compute
● You can fine-tune Whisper or Falcon-7b in a free Colab

Example: Whisper
● 1% of trainable params, 5x larger batch size
● Fine-tune a 1.6B parameter model with less
than 8GB GPU VRAM
● The resulting checkpoints were less than
1% of the size of the original model
Figure labels: Full-Tuning (results in OOM) vs. LoRA
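
The small trainable fraction follows from how LoRA works: the original weight matrix stays frozen, and only two low-rank factors are trained. A rough sketch of the arithmetic, with layer sizes and rank chosen purely for illustration (they are not taken from the talk):

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """LoRA parameters relative to the frozen d_out x d_in weight matrix."""
    return lora_trainable_params(d_in, d_out, rank) / (d_in * d_out)


if __name__ == "__main__":
    d, r = 1280, 8  # illustrative sizes only
    print(lora_trainable_params(d, d, r))        # 20480
    print(f"{trainable_fraction(d, d, r):.2%}")  # 1.25%
```

Because only the factors are saved, an adapter checkpoint scales with the rank rather than with the full matrix, which is consistent with the checkpoints on this slide being under 1% of the original model size.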

Example: Stable Diffusion
“dog” adapter, “toy” adapter, “toy” + “dog” adapter

QLoRA
4-bit Quantization
4-bit quantized pretrained LM
RLHF
Base model with multiple adapters
Efficient
Fine-tune a 65B parameter model on a single 48GB GPU
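
At 4 bits per weight, 65B parameters take about 32.5 GB, which is why the frozen base model can fit on a 48GB GPU with room left for adapters and activations. The sketch below illustrates the general idea of 4-bit storage with a deliberately simplified uniform absmax scheme. This is not the method used by QLoRA, which relies on the NF4 data type and double quantization; the function names are hypothetical.

```python
def quantize_4bit(values):
    """Map floats to integer codes in [-8, 7] plus one shared scale."""
    largest = max((abs(v) for v in values), default=0.0)
    scale = largest / 7 if largest > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_4bit(codes, scale):
    """Recover approximate floats from the 4-bit codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [1.75, -0.5, 0.0, 0.25]
    codes, scale = quantize_4bit(weights)
    print(codes, scale)                  # [7, -2, 0, 1] 0.25
    print(dequantize_4bit(codes, scale)) # [1.75, -0.5, 0.0, 0.25]
```

The example values were picked so they round-trip exactly; in general each value is recovered only to within about half a scale step.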

4. Building demos
How to build and share my ML apps?

Why demos?
● Easily present to a wide audience
● Increase reproducibility of research
● Diverse users can identify and debug failure points

Gradio: typical usage
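import gradio

def classify_image(image):
    # Hypothetical placeholder: return {label: confidence} from your own model.
    return {"cat": 0.7, "dog": 0.3}

app = gradio.Interface(
    classify_image,
    inputs="image",
    outputs="label")
app.launch()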

Turning point in usage of ML
ML/software engineers -> anyone who can
use a GUI/browser

Thanks!
omar@huggingface.co
Omar Sanseviero
@osanseviero
CREDITS: This presentation template was created by Slidesgo,
and includes icons by Flaticon, infographics & images by
Freepik and illustrations by Storyset and Chunte Lee
