Thursday, 24 October 2024
@PyDataVenice #17 #Meetup #PyData
In person and streamed
at 19:00
Sara Ferro
Post Doc
@ CCHT
Alessandra Bilardi
Data / Automation Specialist
@ Corley Cloud
#HuggingFace and #libraries
How to fine-tune a Transformer to transcribe historical documents
Hugging Face Overview
Venice
Promoters of PyData Venice #17
Proposals
Agenda
Speech
Upcoming events
Networking
🎉
Hugging Face Overview
@PyDataVenice #17 #Meetup #PyData
Alessandra Bilardi - Data & Automation Specialist @ Corley Cloud
Agenda
Hugging Face in numbers
Hugging Face packages and third party tools
The platform
Hugging Face in numbers
🤗 Hugging Face history
https://en.wikipedia.org/wiki/Hugging_Face
Event | When | Notes
Foundation | 2016 | Clément Delangue (CEO), Julien Chaumond (CTO), Thomas Wolf (CSO)
BLOOM (BigScience Large Open-science Open-access Multilingual language model) | 2021 - 2022 | Open LLM built with research groups, 176 billion parameters
Private Hub | August 2022 | Supports SaaS or on-prem deployment
Gradio | December 2021 | The fastest way to demo your ML model with a friendly web interface
Partnerships | 2023 - 2024 | AWS, Google, Nvidia, AMD, Intel, IBM, ...
May open source be with you
🤗 Hugging Face sections
https://huggingface.co/
Section (Hugging Face) | Hugging Face | Kaggle | Section (Kaggle)
Datasets | 230K | 389K (298K) | Datasets
Models | 1M | 9,200 (2,300) | Models
Spaces (Gradio & Streamlit) | over 500K | 1.2M (1M) | Code
Posts, Blog, Articles, Discussions and Learn | - | - | Discussions and Learn
Hardware for spaces | - | - | Hardware for notebooks
Enterprise Hub / Competitions | - | - | Competitions
Hugging Face Hub
https://github.com/huggingface/course
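The model and dataset counts above can also be explored from Python. A minimal sketch, assuming the `huggingface_hub` package is installed and the Hub is reachable: `HfApi.list_models` yields `ModelInfo` objects, and only their `id` and `downloads` attributes are used here. The helper names `top_by_downloads` and `fetch_top_models` are hypothetical; the ranking helper is kept separate so it can be checked offline with stand-in objects.

```python
from types import SimpleNamespace


def top_by_downloads(infos, n=3):
    """Return (repo_id, downloads) pairs for the n most-downloaded entries.

    `infos` is any iterable of objects exposing `.id` and `.downloads`,
    such as the ModelInfo objects yielded by HfApi.list_models.
    A missing download count (None) is treated as 0.
    """
    pairs = [(info.id, info.downloads or 0) for info in infos]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:n]


def fetch_top_models(search, n=3):
    """Query the Hugging Face Hub (requires network access)."""
    # Imported lazily so top_by_downloads works without the package.
    from huggingface_hub import HfApi

    api = HfApi()
    # direction=-1 requests descending order; the local re-sort in
    # top_by_downloads keeps the result correct even for unsorted input.
    infos = api.list_models(search=search, sort="downloads", direction=-1, limit=n)
    return top_by_downloads(infos, n)


if __name__ == "__main__":
    # Offline check with stand-in objects instead of real Hub results.
    fake = [
        SimpleNamespace(id="org/model-a", downloads=10),
        SimpleNamespace(id="org/model-b", downloads=None),
        SimpleNamespace(id="org/model-c", downloads=250),
    ]
    print(top_by_downloads(fake, n=2))
    # → [('org/model-c', 250), ('org/model-a', 10)]
```

With network access, `fetch_top_models("bert")` would return the most-downloaded matching repositories; the numbers naturally change over time.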
🤗 Hugging Face resources
https://huggingface.co/docs
Resource | Quantity / cost
huggingface_hub[cli] (manager of spaces) | datasets and files of spaces
huggingface_hub Python library | everything in the Hugging Face Hub
Spaces | unlimited
Web interfaces for spaces (Docker, Gradio or Streamlit) | free
Hardware for spaces (2 vCPU / 16 GB, or ZeroGPU) | free
Inference API (serverless) | 1000 requests / day
Organization | free
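The free serverless Inference API tier above is listed at 1000 requests per day. A minimal sketch of a client-side guard that stops issuing calls once a daily budget is spent, assuming the budget resets at midnight UTC; the class `DailyQuota` is hypothetical, and the real limit is enforced server-side by Hugging Face and may change.

```python
import datetime as dt


class DailyQuota:
    """Hypothetical client-side counter capping calls per calendar day.

    Assumes the budget resets at midnight UTC. This only mirrors a
    published limit locally; it does not query Hugging Face.
    """

    def __init__(self, limit=1000, today=None):
        self.limit = limit
        # `today` is injectable so the behaviour can be tested with a fake clock.
        self._today = today or (lambda: dt.datetime.now(dt.timezone.utc).date())
        self._day = self._today()
        self._used = 0

    def _roll_over(self):
        # Reset the counter when the calendar day changes.
        current = self._today()
        if current != self._day:
            self._day = current
            self._used = 0

    def try_acquire(self):
        """Count one call and return True if budget remains, else return False."""
        self._roll_over()
        if self._used >= self.limit:
            return False
        self._used += 1
        return True

    @property
    def remaining(self):
        self._roll_over()
        return self.limit - self._used
```

Before each Inference API request, a caller would check `quota.try_acquire()` and skip or queue the request when it returns False.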
Hugging Face packages and third party tools
🤗 Hugging Face packages on pypi.org
Package | Notes
accelerate, optimum, peft | Hardware / software optimization
datasets, tokenizers, evaluate | Basic packages
diffusers, transformers | JavaScript counterpart: https://github.com/huggingface/transformers.js
gradio | The fastest way to demo your ML model with a web interface
huggingface_hub | JavaScript counterpart: https://github.com/huggingface/huggingface.js
https://huggingface.co/docs/huggingface_hub/index
https://github.com/huggingface
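The table describes gradio as the fastest way to demo an ML model with a web interface. A minimal sketch, assuming `gradio` is installed: `gr.Interface` wraps a Python function with a text input and a `label` output, which displays a dict of class confidences. The `predict` function is a hypothetical keyword scorer standing in for a real model, so the example runs without downloading weights.

```python
def predict(text):
    """Hypothetical stand-in for a model: returns label -> confidence."""
    positive = {"good", "great", "excellent"}
    negative = {"bad", "poor", "awful"}
    words = text.lower().split()
    pos = sum(word in positive for word in words)
    neg = sum(word in negative for word in words)
    total = pos + neg
    if total == 0:
        # No sentiment keywords found: report an even split.
        return {"positive": 0.5, "negative": 0.5}
    return {"positive": pos / total, "negative": neg / total}


def build_demo():
    # Imported lazily so predict() can be used and tested without gradio.
    import gradio as gr

    # A textbox as input, a label component with confidences as output.
    return gr.Interface(fn=predict, inputs="text", outputs="label")
```

Calling `build_demo().launch()` would serve the demo locally; the same script can be pushed to a Gradio Space. Swapping `predict` for a `transformers` pipeline is the usual next step.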
🤗 Client use cases
🤗 Use case tools
https://github.com/ggerganov/llama.cpp
Command line tools (GitHub contributors)
● ggerganov/llama.cpp (915)
● vllm-project/vllm (601)
● ggerganov/whisper.cpp (419)
Packages (GitHub contributors)
● 🤗/transformers (2799), 🤗/diffusers (816)
● gradio-app/gradio (339), streamlit (239)
● 🤗/huggingface_hub (206)
User interfaces (GitHub contributors)
● oobabooga/text-generation-webui (332)
● ollama/ollama (316)
● open-webui/open-webui (236)
● nomic-ai/gpt4all (115)
Frameworks (GitHub contributors)
● tensorflow/tensorflow (3604)
● langchain-ai/langchain (3177)
● run-llama/llama_index (1275)
https://blog.apify.com/langchain-alternatives/
The platform
Questions?
@PyDataVenice #17 #Meetup #PyData
How to fine-tune a Transformer to transcribe historical documents
@PyDataVenice #17 #Meetup #PyData
Sara Ferro - Research Fellow @ CCHT
Questions?
@PyDataVenice #17 #Meetup #PyData
Excursus
● 10 👥 21/09/19, PyDataVE #0
● 9 👥 13/12/19, PyDataVE #1 - Introduction to Notebooks
● 23 👥 31/03/20, PyDataVE #2 - DataViz (geo and non-geo)
● ..
● 26 👥 19/04/24, PyDataVE #15 - OpenCV Use Cases
● 66 👥 27/06/24, PyDataVE #16 - ComputerVision & DecisionMaking
● 59 👥 24/10/24, PyDataVE #17 - HuggingFace and libraries
YouTube: @pydatavenice/videos & @pydatavenice/streams
@PyDataVenice #17 #Meetup #PyData
Upcoming events
● 3 - 5 December, https://pydata.org/global2024
● 5 - 6 December, https://pydata.org/global2024/impact-program
● YouTube: @PyDataTV/playlists
● Thursday 12 December at 19:00 (English)
Proposals
Thanks for listening.
@PyDataVenice #17 #Meetup #PyData

Overview of Hugging Face platform - 2024-10-24
