Payments to grow your world
Navigating between
Commercial Ownership
and Collaborative Openness
Raphaël Semeteys
Head of DevRel
Open Source Expert
Senior Architect at Worldline
19 June 2024
Paris, France
From OpenAI to Open Source AI
We design payments technology
that powers the growth of millions
of businesses around the world.
7000+ engineers
in over 40 countries
Managing 43+ billion
transactions per year
€250M spent in R&D
every year
Handling 150+
payment methods
The early days of LLMs
From rule-based and simpler statistical models to LLMs
2010s: Word embeddings (Word2Vec, GloVe)
2017-2018: “Attention Is All You Need”, Transformers
2020s: GenAI, ChatGPT, Responsibility concerns
Tomorrow? Small Language Models, Mobile, Agents & LAMs
GenAI is having its Linux Moment
• Just like open source and the Internet, but much faster!
• Dynamics between collaborative openness and commercial ownership
• Need for clarity on licenses
Labs & Universities
Individuals
Enterprises
Commodities
Defining Openness of a Model
Components assessed: Model (weights), Pre-training Dataset, Fine-tuning Dataset, Reward Model, Data Processing Code
Score | Level | Description
0 | Closed | No access to any public information, data or asset
1 | Published research only | Research paper(s) published but with no more information, data or asset
2 | Restricted access | Access to asset is possible only with special agreement (commercial, research…)
3 | Open with limitations | Access and reuse of asset is possible but with certain limitations on usage
4 | Totally open | Access and reuse of asset is possible without restriction on usage (e.g. open source license)
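The five-level scale and the five assessed components map naturally onto a small data structure. The sketch below is one possible encoding, not something taken from the talk: the class and field names are hypothetical, and the example values are the Dolly scores from the fine-tuned LLMs comparison later in this deck.

```python
from dataclasses import dataclass
from enum import IntEnum


class Openness(IntEnum):
    """The five levels of the scale, ordered from closed to open."""
    CLOSED = 0
    PUBLISHED_RESEARCH_ONLY = 1
    RESTRICTED_ACCESS = 2
    OPEN_WITH_LIMITATIONS = 3
    TOTALLY_OPEN = 4


@dataclass(frozen=True)
class ModelAssessment:
    """One score per assessed component (field names are hypothetical)."""
    model_weights: Openness
    pretraining_dataset: Openness
    finetuning_dataset: Openness
    reward_model: Openness
    data_processing_code: Openness

    def least_open(self) -> Openness:
        # The weakest component shows where openness stops.
        return min(
            self.model_weights,
            self.pretraining_dataset,
            self.finetuning_dataset,
            self.reward_model,
            self.data_processing_code,
        )


# Dolly scores as listed in the deck: 4, 3, 4, 0, 4.
dolly = ModelAssessment(
    model_weights=Openness.TOTALLY_OPEN,
    pretraining_dataset=Openness.OPEN_WITH_LIMITATIONS,
    finetuning_dataset=Openness.TOTALLY_OPEN,
    reward_model=Openness.CLOSED,
    data_processing_code=Openness.TOTALLY_OPEN,
)
print(dolly.least_open().name)
```

Using an ordered enum keeps the scores comparable, so the same structure can be reused for every player reviewed in the following slides.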
Market-Leading Player: OpenAI
Deviation from original vision of research transparency & openness
Non/For-profit (US)
GPT-1 & 2: Model 4 Totally open; Dataset 1 Published research only; Code 1 Published research only
→ GPT-3.x & 4.x/o, ChatGPT: 0 Closed (research paper only)
No training of other commercial LLMs (GPT-3.x & 4.x/o):
“You may not: […] Use Output to develop models that compete with OpenAI.”
Market-Leading Player: Google
Transition from open research to a pragmatic approach
Enterprise (US)
Component | Score and level over time
Model | 4 Totally open → 1 Published research only → 3 Open with limitations
Dataset | 2 Restricted access → 1 Published research only → 1 Published research only
Code | 4 Totally open → 0 Closed → 4 Toolchain available
You may not use nor allow others to use Gemma or Model Derivatives to: [illegal activities, unlicensed practice of professions, abuse, security bypass and promotion of hatred, abuse, violence, monitoring people without consent, misinformation/defamation, automated decisions concerning human rights and well-being, etc.]
Responsible AI contradicts Open Source Definition
Other Big Players
Catching up and making their mark in the GenAI Gold Rush
Partner for Infrastructure (inference and training)
Create their own (open) models
Market-Leading Player: Meta
Journey to openness
Enterprise (US)
Component | Score and level (first value: RoBERTa)
Model | 4 Totally open → 3 Open with limitations
Dataset | 3 Open with limitations → 1 Published research only
Code | 4 Totally open → 1 Published research only
Restriction on usage: license for platforms with 700+ M users
“Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.”
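The threshold in the clause above reduces to a single comparison. The following is a minimal sketch under simplifying assumptions: the function and constant names are hypothetical, and the caller is assumed to already know the combined monthly active users of the licensee and its affiliates for the calendar month preceding the Llama 2 release date.

```python
# "greater than 700 million monthly active users", as quoted in the clause.
LLAMA2_MAU_THRESHOLD = 700_000_000


def must_request_meta_license(monthly_active_users: int) -> bool:
    """Return True when the quoted clause would require a separate license.

    monthly_active_users: combined MAU of the licensee and its affiliates
    in the calendar month preceding the Llama 2 version release date
    (an input the caller must supply; this sketch does not compute it).
    """
    # The clause says "greater than", so exactly 700 million does not trigger it.
    return monthly_active_users > LLAMA2_MAU_THRESHOLD


print(must_request_meta_license(50_000_000))
print(must_request_meta_license(2_000_000_000))
```

The strict comparison mirrors the wording "greater than"; everything else about the clause (who counts as an affiliate, what Meta may grant) is outside what code can express.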
Llama 3 now more restrictive on redistribution and reuse
“Redistribution and Use. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service that uses any of them, including another AI model, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Meta Llama 3” on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama 3” at the beginning of any such AI model name.”
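The two concrete obligations in this clause (a name prefix and a displayed notice) can be checked mechanically. This is an illustrative sketch only: the function name and its inputs are hypothetical, and it covers just the two strings quoted above, not the requirement to ship a copy of the agreement.

```python
REQUIRED_PREFIX = "Llama 3"                 # quoted in the clause
REQUIRED_NOTICE = "Built with Meta Llama 3"  # quoted in the clause


def naming_issues(model_name: str, displayed_text: str) -> list[str]:
    """List which of the two quoted requirements a derivative model misses.

    model_name: name of the distributed derivative model.
    displayed_text: text of the website, UI, or documentation page where
    the notice is expected (hypothetical input for this sketch).
    """
    issues = []
    if not model_name.startswith(REQUIRED_PREFIX):
        issues.append(f'model name must begin with "{REQUIRED_PREFIX}"')
    if REQUIRED_NOTICE not in displayed_text:
        issues.append(f'"{REQUIRED_NOTICE}" must be prominently displayed')
    return issues


print(naming_issues("Llama 3 Finance Chat", "Built with Meta Llama 3"))
print(naming_issues("FinanceChat", "Powered by our own model"))
```

A check like this could run in a release pipeline, although whether a notice is "prominent" remains a human judgment.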
Llama 2 offspring: Alpaca and Vicuna
Fine-tuned models from Llama 2 by universities
Research (US)
Component | Score | Level description
Model | 3 | Open with limitations
Pre-training Dataset | 1 | Published research only
Fine-tuning Dataset | 2 | Research use only
Code | 4 | Under Apache 2 license
Restrictions from both Llama 2 and OpenAI (ShareGPT)
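One way to picture how upstream restrictions carry over to a fine-tuned model is to take the most restrictive score in its lineage. This "weakest link" rule is a modelling assumption made for illustration, not a rule stated in the talk, and the function name is hypothetical. The example values are the scores listed above: code 4, base model 3, fine-tuning dataset 2.

```python
def effective_openness(own_score: int, upstream_scores: list[int]) -> int:
    """Return the lowest score among a component and everything it builds on.

    Assumption: a derivative cannot be more open than its most restricted
    upstream asset. Scores use the deck's 0 to 4 scale.
    """
    return min([own_score, *upstream_scores])


# Apache 2 training code (4) on top of a base model scored 3 and a
# fine-tuning dataset scored 2 (research use only).
print(effective_openness(4, [3, 2]))
```

Under this assumption, permissively licensed code does not lift the restrictions inherited from the base model or from the data used for fine-tuning.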
Collaborative foundational LLMs
Component | EleutherAI GPT-J | Falcon | BLOOM | OpenLLaMa | Mistral
Origin | Non-profit (US) | Research (UAE) | Research (EU) | Research (US) | Enterprise (FR)
Model | 4 Access and reuse without restriction | 3 Open with limitations | 3 Open RAIL license | 4 Access and reuse without restriction | 4 Access and reuse without restriction
Dataset | 3 Open with limitations | 4 Access and reuse without restriction | 3 Open with limitations | 4 Access and reuse without restriction | 0 No public information or access
Code | 4 Completely open | 1 General instructions | 4 Completely open | 1 Just examples | 4 Completely open
Dataset fuzziness: please refer to the specific license depending on the subset you use
Notion of responsible usage
Modified open source licenses
“This license is, in part, based on the Apache License Version 2.0, with a series of modifications. The contribution of the Apache License 2.0 to the framing of this document is acknowledged. Please read this license carefully, as it is different to other ‘open access’ licenses you may have encountered previously. Use of Falcon180B for hosted services may require a separate license.”
Mistral AI’s French sauce
Navigating both open and closed waters
Just like with Open Source, rise of Community vs. Enterprise
Mix of AI Models
• Mixture-of-Experts (SMoE): Mixtral 8x7B, 8x22B
• Foundational and fine-tuned models
Mix of Business Models & Licenses
• “Open Source” models, mistral-finetune SDK
• Commercial: optimized Small, Large & Embed Models
• Sustainable openness: new non-production license for Codestral
Just like with Open Source, revisiting Open in the Cloud era
MNPL - 3.2. Usage Limitation
- You shall only use the Mistral Models and Derivatives (whether or not created by Mistral AI) for testing, research, Personal, or evaluation purposes in Non-Production Environments;
- Subject to the foregoing, You shall not supply the Mistral Models or Derivatives in the course of a commercial activity, whether in return for payment or free of charge, in any medium or form, including but not limited to through a hosted or managed service (e.g. SaaS, cloud instances, etc.), or behind a software layer.
Collaborative fine-tuned LLMs
Impact of foundational models or pre-training datasets
Component | Dolly | BLOOMChat | Zephyr | LLM360 | OLMo-Instruct
Origin | Enterprise (US) | Enterprise (US) | Enterprise (US) | Consortium (UAE/US) | Research (US)
Model | 4 Based on GPT-J | 3 Based on BLOOM | 4 Based on Mistral | 4 Open source | 4 Open source
Pre-training Dataset | 3 Based on GPT-J | 3 Based on BLOOM | 0 Based on Mistral | 4 RedPajama, Falcon, StarCoder | 3 Dolma (ImpACT MR)
Fine-tuning Dataset | 4 Access and reuse without restriction | 4 Dolly and LAION | 2 Research use only (OpenAI) | 2 Research use only (OpenAI) | 3 Tülu 2 (ImpACT LR)
Reward model | 0 No public information available | 0 No public information available | 3 Paper and code examples | 0 No public information available | 4 UltraFeedback (MIT)
Code | 4 Open source | 3 OpenRAIL | 3 Example code available | 4 Open source | 4 Open source
AI2 ImpACT Licenses - Restrictions
[…] a. military weapons purposes […]
b. purposes of military surveillance […]
c. purposes of generating or disseminating information or content […] without expressly and intelligibly disclaiming that the text is machine generated;
d. purposes of ‘real time’ remote biometric processing […]
e. fully automated decision-making without a human in the loop […] as spreading misinformation […]
f. purposes of the predictive administration of justice, law enforcement, immigration, or asylum processes, such as predicting an individual will commit fraud/crime
Responsible AI contradicts Open Source Definition
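The tension named here can be made concrete with criterion 6 of the Open Source Definition, "No Discrimination Against Fields of Endeavor": any license that forbids specific uses fails it. The sketch below is an assumption-laden illustration; the class, the function, and the free-text labels are hypothetical, and a real license review involves far more than one criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    name: str
    # Uses the license forbids (hypothetical free-text labels).
    restricted_uses: tuple[str, ...] = ()


def restricts_fields_of_endeavor(terms: LicenseTerms) -> bool:
    """True when the license forbids at least one use, which conflicts
    with OSD criterion 6 regardless of how well-intentioned the limit is."""
    return len(terms.restricted_uses) > 0


# Labels taken from items a. and b. of the restrictions listed above.
impact_style = LicenseTerms(
    "ImpACT-style license",
    ("military weapons purposes", "military surveillance"),
)
apache = LicenseTerms("Apache-2.0")

print(restricts_fields_of_endeavor(impact_style))
print(restricts_fields_of_endeavor(apache))
```

This is why responsible-use licenses score 3 ("Open with limitations") rather than 4 on the scale used throughout the deck.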
Other aspects of GenAI’s Linux Moment
Democratize and Decentralize (re)use and innovation
Notebooks
Communities
New Business Models
Collaborative Tools
& Ecosystems
AI Chips
Quantization
Decentralization
Hardware
Optimization
Do One Thing Well
Interoperable Standards
Beyond Python
Opensource Tools
& Frameworks
Key takeaways
• Closed APIs → Open Weights → Free AI (as in freedom)
• Datasets and upstream transitivity
• Competitive clauses
• Responsible AI restrictions
• Open Research → Competitive Market → Coopetitive Ecosystem
• Openness fosters reuse and collaboration
• Collaboration brings commoditization and innovation
Just like Open Source!
Thank you
Raphaël Semeteys - Worldline
@RaphaelSemeteys
raphiki.github.io

AI_dev Europe 2024 - From OpenAI to Opensource AI
