From OpenAI to Open Source AI
Navigating Between Commercial Ownership and Collaborative Openness
https://stateofopencon.com/ #stateofopencon #soocon24 #openuk
https://hachyderm.io/@openuk
Raphaël Semeteys (and Luxin Zhang) - Worldline
Introduction
Raphaël Semeteys
• Open source since 1997, professionally since 2004
• Yoga Teacher, Creator of the QSOS method
• Head of DevRel at Worldline
• 7000+ engineers in over 40 countries
• Managing 43+ billion transactions per year
• €250M spent in R&D every year
• Handling 150+ payment methods
We design payments technology that powers the growth of millions of businesses around the world
The early days of LLMs
From rule-based and simpler statistical models to LLMs
• 2010s: word embeddings such as Word2Vec and GloVe
• 2017-2018: “Attention Is All You Need”, Transformers, BERT
• 2020s: generative AI, ChatGPT, responsibility concerns
GenAI is having its Linux Moment
• Just like open source and the Internet, but much faster!
• Dynamics between collaborative openness and commercial ownership
• Need for clarity on licenses
Figure (open source, Internet & GenAI evolution): Labs & Universities → Individuals → Enterprises → Commodities
Defining the Openness of an LLM
Components assessed:
• Pre-training dataset
• Fine-tuning dataset
• Reward model
• Model
• Data processing code
Defining the Openness of an LLM: scoring grid
Each component (Model (weights), Pre-training Dataset, Fine-tuning Dataset, Reward model, Data Processing Code) receives a score from 0 to 4:
• 0 Closed: no access to any public information, data or asset
• 1 Published research only: research paper(s) published, but no further information, data or asset
• 2 Restricted access: access to the asset is possible only with a special agreement (commercial, research…)
• 3 Open with limitations: access to and reuse of the asset are possible, but with certain limitations on usage (e.g. Open RAIL)
• 4 Totally open: access to and reuse of the asset are possible without restriction (e.g. open source license)
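The scoring grid maps naturally onto a small data structure. The following is a minimal Python sketch, not part of the original talk: the enum member names, the snake_case component keys and the choice to treat an unscored component as level 0 are all assumptions made for illustration.

```python
from enum import IntEnum


class Openness(IntEnum):
    """Openness levels from the scoring grid (0 = closed, 4 = totally open)."""
    CLOSED = 0
    PUBLISHED_RESEARCH_ONLY = 1
    RESTRICTED_ACCESS = 2
    OPEN_WITH_LIMITATIONS = 3
    TOTALLY_OPEN = 4


# The five components the grid scores for each LLM (key names are illustrative).
COMPONENTS = (
    "model_weights",
    "pretraining_dataset",
    "finetuning_dataset",
    "reward_model",
    "data_processing_code",
)


def make_profile(**scores: int) -> dict[str, Openness]:
    """Build a per-component openness profile.

    Unknown component names are rejected, and out-of-range scores raise
    ValueError via the enum. Components that are not given default to CLOSED
    (an assumption: no public information maps to level 0).
    """
    unknown = set(scores) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return {c: Openness(scores.get(c, 0)) for c in COMPONENTS}


# Example: the Alpaca/Vicuna scores from the talk (reward model not scored there).
alpaca = make_profile(model_weights=3, pretraining_dataset=1,
                      finetuning_dataset=2, data_processing_code=4)
print(alpaca["finetuning_dataset"].name)  # RESTRICTED_ACCESS
```

Using `IntEnum` keeps the levels comparable as plain integers, so profiles can be sorted or compared component by component.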
Market-Leading Player: OpenAI
Deviation from original vision of research transparency & openness
Non/For-profit (US)
Scores by model (Component: score, level description):
• GPT-1 & 2: Model 4 (Totally open), Dataset 1 (Published research only), Code 1 (Published research only)
• GPT-3 & 4, ChatGPT: 0 (Closed); research paper only; no training of other commercial LLMs
Excerpt from OpenAI's terms of use: “You may not: […] Use Output to develop models that compete with OpenAI.”
Market-Leading Player: Google
Transition from open research to proprietary commercial approach
Enterprise (US)
Scores by model (Component: score, level description):
• BERT: Model 4 (Totally open), Dataset 2 (Restricted access), Code 4 (Totally open)
• PaLM 2 & Gemini: scores 1 (Published research only), 1 (Published research only), 0 (Closed)
Market-Leading Player: Meta
Journey to openness
Enterprise (US)
Scores by model (Component: score, level description):
• RoBERTa: Model 4 (Totally open), Dataset 3 (Open with limitations), Code 4 (Totally open)
• Llama 2: scores 3 (Open with limitations), 1 (Published research only), 1 (Published research only)
Usage restriction: products or services with more than 700 million monthly active users require a separate license from Meta
Excerpt from the Llama 2 Community License Agreement:
“Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.”
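To make the threshold in this clause concrete, here is a minimal Python sketch. It is illustrative only and not legal advice: the function name is hypothetical, and summing the user counts of the licensee and its affiliates is one possible reading of the clause, assumed here.

```python
from typing import Iterable

# 700 million monthly active users, the threshold stated in the Llama 2 clause.
LLAMA2_MAU_THRESHOLD = 700_000_000


def needs_separate_meta_license(mau_by_entity: Iterable[int]) -> bool:
    """Hypothetical check of the Llama 2 "Additional Commercial Terms".

    `mau_by_entity` holds the monthly active users, for the calendar month
    preceding the Llama 2 version release date, of the licensee and each of
    its affiliates. Summing them is an assumption about how the clause applies.
    """
    counts = list(mau_by_entity)
    if any(c < 0 for c in counts):
        raise ValueError("monthly active user counts cannot be negative")
    # The clause says "greater than", so exactly 700 million does not trigger it.
    return sum(counts) > LLAMA2_MAU_THRESHOLD


print(needs_separate_meta_license([650_000_000, 60_000_000]))  # True
print(needs_separate_meta_license([700_000_000]))  # False
```

The strict comparison mirrors the wording “greater than 700 million”; the date of measurement (the month before the release date) is left to the caller because the clause fixes it, not the code.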
Llama offspring: Alpaca and Vicuna
Models fine-tuned by universities from Meta's LLaMA (Alpaca) and Llama 2 (Vicuna v1.5)
Research (US)
• Model: 3, Open with limitations
• Pre-training dataset: 1, Published research only
• Fine-tuning dataset: 2, Research use only
• Code: 4, under the Apache 2.0 license
Restrictions inherited from both the Llama license and OpenAI's terms (ShareGPT data)
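Alpaca and Vicuna show how restrictions propagate from upstream assets. One simple way to reason about this, assumed here rather than taken from the talk, is to treat a derived model's effective openness as the lowest score found anywhere in its lineage. The artifact names and dependency graph below are hypothetical; the scores (3, 2, 4) are the Model, Fine-tuning Dataset and Code scores from this slide.

```python
def effective_score(artifact: str,
                    own_score: dict[str, int],
                    depends_on: dict[str, list[str]]) -> int:
    """Lowest 0-4 openness score reachable from `artifact` via its dependencies.

    Assumption: the most restrictive upstream asset caps the derived one.
    """
    seen: set[str] = set()
    stack = [artifact]
    lowest = 4  # start at "Totally open", the top of the grid
    while stack:
        node = stack.pop()
        if node in seen:  # guard against shared or cyclic dependencies
            continue
        seen.add(node)
        lowest = min(lowest, own_score[node])
        stack.extend(depends_on.get(node, []))
    return lowest


# Hypothetical lineage: Apache-2.0 code fine-tuning Llama 2 weights on ShareGPT data.
scores = {"vicuna_code": 4, "llama2_weights": 3, "sharegpt_data": 2}
lineage = {"vicuna_code": ["llama2_weights", "sharegpt_data"]}
print(effective_score("vicuna_code", scores, lineage))  # 2
```

Taking the minimum is deliberately conservative: even fully open code (4) cannot lift the result above the research-use-only fine-tuning data (2).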
Collaborative foundational LLMs
Dataset fuzziness: please refer to the specific license depending on the subset you use
Notion of responsible usage
• EleutherAI GPT-J (Non-profit, US): Model 4 (access and reuse without restriction), Dataset 3 (open with limitations), Code 4 (completely open)
• Falcon (Research, UAE): Model 3 (open with limitations), Dataset 4 (access and reuse without restriction), Code 1 (general instructions)
• BLOOM (Research, EU): Model 3 (Open RAIL license), Dataset 3 (open with limitations), Code 4 (completely open)
• OpenLLaMA (Research, US): Model 4 (access and reuse without restriction), Dataset 4 (access and reuse without restriction), Code 1 (just examples)
• Mistral/Mixtral (Enterprise, FR): Model 4 (access and reuse without restriction), Dataset 0 (no public information or access), Code 4 (completely open)
Excerpt from the Falcon 180B license:
“This license is, in part, based on the Apache License Version 2.0, with a series of modifications. The contribution of the Apache License 2.0 to the framing of this document is acknowledged. Please read this license carefully, as it is different to other ‘open access’ licenses you may have encountered previously. Use of Falcon180B for hosted services may require a separate license.”
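With per-component scores for each foundational model, a quick way to spot each model's weakest link is to find its lowest-scoring component. The Python sketch below transcribes the Model, Dataset and Code scores from this slide; the data layout and helper name are illustrative assumptions.

```python
# Model, Dataset and Code scores (0-4 grid) transcribed from the slide.
FOUNDATIONAL = {
    "GPT-J": {"model": 4, "dataset": 3, "code": 4},
    "Falcon": {"model": 3, "dataset": 4, "code": 1},
    "BLOOM": {"model": 3, "dataset": 3, "code": 4},
    "OpenLLaMA": {"model": 4, "dataset": 4, "code": 1},
    "Mistral/Mixtral": {"model": 4, "dataset": 0, "code": 4},
}


def weakest_components(scores: dict[str, int]) -> tuple[int, list[str]]:
    """Return the lowest score and the (sorted) components that share it."""
    lowest = min(scores.values())
    return lowest, sorted(c for c, s in scores.items() if s == lowest)


for name, model_scores in FOUNDATIONAL.items():
    level, components = weakest_components(model_scores)
    print(f"{name}: {level} ({', '.join(components)})")
```

Run on these scores, Mistral/Mixtral's weakest component is its dataset (0), while Falcon and OpenLLaMA are held back by their code (1).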
Collaborative fine-tuned LLMs
Impact of foundational model or pre-training datasets
• Dolly (Enterprise, US): Model 4 (based on GPT-J), Pre-training dataset 3 (based on GPT-J), Fine-tuning dataset 4 (access and reuse without restriction), Reward model 0 (no public information available), Code 4 (open source)
• BLOOMChat (Enterprise, US): Model 3 (based on BLOOM), Pre-training dataset 3 (based on BLOOM), Fine-tuning dataset 4 (Dolly and LAION), Reward model 0 (no public information available), Code 3 (OpenRAIL)
• Zephyr (Enterprise, US): Model 4 (based on Mistral), Pre-training dataset 0 (based on Mistral), Fine-tuning dataset 2 (research use only, OpenAI), Reward model 3 (paper and code examples), Code 3 (example code available)
• LLM360 (Consortium, UAE/US): Model 4 (open source), Pre-training dataset 4 (RedPajama, Falcon, StarCoder), Fine-tuning dataset 2 (research use only, OpenAI), Reward model 0 (no public information available), Code 4 (open source)
BLOOMChat use restrictions (excerpt):
l. To provide medical advice and medical results interpretation; or
m. To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment.
Collaboration platform: Hugging Face
• Startup and ecosystem dedicated to democratizing AI
• Open source Transformers library
• LLM leaderboard: upload and assess models
• The “GitHub of AI”
• Collaborative space for exploring, sharing and experimenting with AI
• Hosts thousands of models, datasets, and demo applications
Enabler for collaboration and reuse
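Because Hugging Face model cards carry their license as metadata, reading it is a natural first step before reusing a model. The sketch below assumes a model card's `README.md` starts with a YAML front-matter block delimited by `---` lines and containing a top-level `license:` key, which is the common convention on the Hub. It parses only simple `key: value` lines, so a real YAML parser is preferable beyond a quick check; the function name is hypothetical.

```python
from typing import Optional


def card_license(readme_text: str) -> Optional[str]:
    """Return the top-level `license:` value from a model card's YAML front matter.

    Minimal sketch: handles only simple `key: value` lines, not full YAML.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # no front-matter block at the top of the file
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the metadata block
        if line.startswith((" ", "\t")):
            continue  # skip nested keys; only a top-level `license:` counts
        key, sep, value = line.partition(":")
        if sep and key.strip() == "license":
            return value.strip().strip("'\"") or None
    return None


sample_card = """---
license: apache-2.0
tags:
  - text-generation
---
# Example model
"""
print(card_license(sample_card))  # apache-2.0
```

A license identifier is only a starting point: as the Falcon and BLOOMChat excerpts show, custom licenses can carry usage restrictions that must be read in full.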
Hosting and resource paradigms
• Big players invest billions (Microsoft/OpenAI, AWS/Anthropic)
• Cloud service providers (CSPs) are selling the shovels in the AI gold rush
Figure: evolution of LLM parameter counts (source: numind.ai)
Closed models are centralized and resource-consuming
Hosting and resource paradigms
• Democratizing AI Computing
• Quantization, AI Chips
• Run models locally, in containers
• Emergence of smaller models for edge and mobile
• Small/tiny language models: Gemini Nano, Microsoft Phi-2, Huawei TinyBERT
• Domain-specific language models: BloombergGPT, Harvey (law)
• Mixture of experts: Mixtral 8x7B, OpenMoE → mixture of licenses?
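Quantization is one of the techniques that make running models locally practical: storing weights as 8-bit integers instead of 32-bit floats cuts their memory footprint roughly fourfold. The pure-Python sketch below shows symmetric per-tensor int8 quantization, one common scheme among several; production toolchains use more elaborate variants (per-channel scales, 4-bit formats, calibration), and the function names here are illustrative.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: each w is approximated by scale * q."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 values and the shared scale."""
    return [v * scale for v in q]


weights = [0.42, -1.3, 0.07, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Rounding to the nearest step bounds the per-weight error by scale / 2.
print(q, max(abs(a - b) for a, b in zip(weights, restored)))
```

A single scale per tensor keeps the scheme simple; the trade-off is that one large outlier weight widens the step size for every other weight in the tensor.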
Key takeaways
• Hyper-centralization leads to black boxes and closed solutions
• Openness:
  • fosters collaboration and fuels community-driven innovation
  • enables inclusivity
• Just as with open source software, beware of licenses and restrictions
• AI's democratization continually reshapes the landscape
Thank you
Raphaël Semeteys - Worldline
@RaphaelSemeteys
https://dev.to/raphiki
Check the two-part article co-written with Luxin Zhang
Image credits
• Opensource, Internet & GenAI evolution image generated with DALL-E
• Robot evolution from Freepik
• Evolution of LLM parameter counts from numind.ai
• Shovels in Gold rush image generated with DALL-E
• Logos from official websites
• Coffee cups from Freepik

