H2O Gen AI Ecosystem Overview - Level 1 - Slide Deck
H2O.ai Confidential
Intro to
Andreea Turcu
Head of Global Training @H2O.ai
Table of Contents
1. What are Large Language Models (LLMs)?
2. Steps in Building LLMs
3. Importance of Data Cleaning for LLMs
4. What is LLM DataStudio? (+ Interface Demo)
5. Generate a Clean Dataset from a PDF File (Doc2QA)
Quizzes
Q&As
1. Definition of LLMs
Large Language Models (LLMs) are sophisticated artificial intelligence models specifically designed to understand and generate human-like language on an extensive scale.
1. Training, Patterns and Parameters
● Extensive Training Data: Trained on massive textual datasets from diverse sources
● Pattern and Meaning Learning: They absorb knowledge of words, sentence patterns and meanings
● Significant Parameters: Referred to as "large" due to a substantial number of parameters
1. Generative AI vs. LLMs
Source: gpt.h2o.ai
1. LLMs vs. Foundation Models
[Diagram: two parallel pipelines. Foundation models: unlabeled training data, transformer algorithm, foundation model. Large language models (LLMs): additional text-based data, transformer algorithm, LLM.]
Foundation Model:
- Large machine learning model trained on
unlabeled data.
- Enhanced through transformer algorithms and
fine-tuning.
- Adaptable to various applications.
Large Language Model (LLM):
- Specific type of foundation model.
- Tailored for natural language processing tasks.
- Examples include GPT models (e.g., GPT-3).
1. Benefits of LLMs
1. Natural Language Processing (NLP)
2. Versatility Across Diverse Domains
3. Elevated Creativity in Content Generation
4. Facilitating Global Communication Breakthroughs
5. Information Extraction Efficiency
1. Challenges of LLMs
1. Computational Resources
2. Energy Consumption
3. Fine-tuning Complexity
4. Data Privacy Concerns
5. Interpretable Output
2. The LLMs Lifecycle
1. Data Collection
2. Preprocessing
3. Model Architecture Design
4. Training the Model
5. Fine-tuning
6. Validation and Evaluation
7. Deployment
8. Monitoring
2. Preprocessing is important
2. Gold in - Gold Out
2. Building Steps for LLMs
01 Foundation: Powerful language models trained on extensive text data, forming the basis for various language tasks.
02 DataPrep: Converting documents into instruction pairs, like QA pairs, facilitating fine-tuning and tasks.
03 Fine-tuning: Refining pre-trained models using task-specific data, enhancing their performance on targeted tasks.
04 Eval LLMs: Thoroughly assessing and comparing LLMs is increasingly vital due to their heightened significance and complexity.
05 Database & Applications: Optimize data usage by seamlessly integrating new PDFs into the database, eliminating the need for model retraining. Improve user experiences through advanced language comprehension and LLM-driven response generation.
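The DataPrep step (documents converted into QA pairs) can be pictured with a minimal sketch. The chunk texts, the questions, and the field names `question` and `answer` are assumptions made for illustration; they are not the format LLM DataStudio emits, and in a real workflow the questions would be generated by a model rather than written by hand.

```python
import json

# Hypothetical document chunks; in practice these would come from a parsed PDF.
chunks = [
    "LLMs are trained on massive textual datasets from diverse sources.",
    "Fine-tuning refines a pre-trained model using task-specific data.",
]


def make_qa_pair(chunk: str, question: str) -> dict:
    """Pair a question with the chunk that answers it (field names are assumed)."""
    return {"question": question, "answer": chunk}


qa_pairs = [
    make_qa_pair(chunks[0], "What are LLMs trained on?"),
    make_qa_pair(chunks[1], "What does fine-tuning do?"),
]

# One JSON object per line (JSONL) is a common interchange format for such pairs.
jsonl = "\n".join(json.dumps(pair) for pair in qa_pairs)
print(jsonl)
```

Each line of the resulting JSONL is one instruction pair that a fine-tuning job could consume.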
2. Emphasis on the DataPrep Stage
3. Key Benefits of Data Cleanliness in Language Models
1. Improved Model Performance
2. Mitigated Bias and Unwanted Influences
3. Consistency and Coherence
4. Enhanced Generalization
5. Ethical Considerations
6. Improved User Experience and Trust
3. Key Aspects in DataPrep for LLMs
4. Definition of LLM DataStudio
H2O AI and GenAI Ecosystem
[Diagram: components of the ecosystem. Documents and data sources; LLM DataStudio (ETL for LLMs, data to QA pairs); myGPT (LLM fine-tuning, custom GPT); LLM EvalStudio (continuous eval, feedback); vector DB (embeddings) and alternative datasets; "Talk to Your Data" over query + documents (context) using contextual similarity: question answering, context search, document retrieval, similar documents, personalization; Gen AI App Store (datasets + AI engines + AI apps, LLM integration, models); API; enterprise end users.]
4. Enhancing LLM Data with LLM DataStudio
LLM DataStudio features:
● Q&A generation from text and audio data
● Text Cleaning
● Data Quality Issue Detection
● Tokenization
● Text Length Control
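To make the cleaning, tokenization, and length-control features concrete, here is a minimal sketch in plain Python. The function names, the tag-stripping regex, and the token thresholds are all assumptions for illustration; they do not reproduce LLM DataStudio's actual implementation.

```python
import re


def clean_text(text: str) -> str:
    """Strip HTML-like tags and collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", no_tags).strip()


def tokenize(text: str) -> list[str]:
    """Naive whitespace tokenizer; real LLM tokenizers use subword units."""
    return text.split()


def within_length(text: str, min_tokens: int = 3, max_tokens: int = 50) -> bool:
    """Keep only records whose token count falls inside the chosen window."""
    n = len(tokenize(text))
    return min_tokens <= n <= max_tokens


# Hypothetical raw records with markup debris and one record that is too short.
raw_records = [
    "<p>LLMs   are trained on\nmassive text datasets.</p>",
    "ok",
    "<div>Fine-tuning refines a pre-trained model.</div>",
]

cleaned = [clean_text(r) for r in raw_records]
kept = [c for c in cleaned if within_length(c)]
print(kept)
```

The too-short record is dropped by the length filter, while the other two survive with markup and stray whitespace removed.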
4. Interface Demo
4. Demo - Curate
H2O.ai LLM Studio Website
https://h2o.ai/platform/ai-cloud/make/llm-studio/
Structured Data Preparation Workflow in LLM DataStudio
LLM DataStudio follows a structured data preparation process.
The process includes several stages:
❏ Data intake
❏ Workflow construction
❏ Configuration
❏ Assessment
❏ Result generation
5. The Workflow Builder - Demo
5. Demo - Generate a Clean Dataset from a PDF File (Doc2QA)
A Comprehensive Overview of Large Language Models
https://arxiv.org/pdf/2307.06435.pdf
Thank you!
LLM Studio
Overview
Andreea Turcu
Customer Data Scientist
@H2O.ai
Table of Contents
What are LLMs?
Foundation vs. Fine-tuning
LLM Studio Intro
Demo / Follow along:
Connect to LLM Studio
The LLM Studio GUI
Launching an Experiment
Monitoring the Experiment
Next Steps with LLM Studio (model export)
A large language model is a type of AI
algorithm trained on huge amounts of text
data that can understand and generate
text.
LLMs can be characterized by 4 parameters:
● size of the training dataset
● cost of training
● size of the model
● performance after training
Let’s follow along!
Intro to h2oGPT
by Andreea Turcu
Agenda
A bit of context
What are GPTs?
Why know what LLMs are?
LLMs origins
What is h2oGPT?
Boosting your productivity with h2oGPT
Limitations of Existing models
Benefits of Open Source models
Demo of h2oGPT
What are GPTs?
Why should I know what LLMs are?
Large language models like GPT have diverse business uses:
● automating content creation,
● extracting insights from data,
● personalizing marketing,
● enabling virtual assistants,
● analyzing data,
● facilitating voice-based interactions and translations, etc.
What are LLMs?
- LLMs (large language models) are computational models for understanding and generating human language.
- They are trained on vast amounts of text data.
- LLMs learn grammar, vocabulary, and contextual relationships.
- They can generate coherent and contextually relevant text based on given prompts.
- Collaboration with AI systems becomes more efficient.
- Responsible use and enhanced user experiences can be achieved.
LLM Origins
Transformers are deep feed-forward neural networks that leverage a machine learning mechanism called (self-)attention and have seen wild success in natural language processing problems.
● 2023, h2oGPT: The world’s best completely open source LLM and permissible for commercial use
● 2022, ChatGPT: Interactive interface for users to interact directly with GPT3 and GPT4 modeling frameworks
● 2020, GPT: Auto-regressive language modeling where the goal is to predict the next token
● 2019, BERT: Bidirectional Encoder Representations from Transformers. Model designed to recover masked tokens
● 2017, Encoder-Decoder (Seq2Seq): Original Transformer architecture for machine translation or sequence-to-sequence problems
Reference: https://arxiv.org/pdf/2207.09238.pdf
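The (self-)attention mechanism mentioned on this slide can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: queries, keys, and values are the raw input vectors (a real Transformer applies learned projection matrices and multiple heads), and the toy vectors are made up. The `causal` flag mimics the GPT-style setup where each position only attends to earlier positions, which is what makes next-token prediction possible.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(queries, keys, values, causal=False):
    """Scaled dot-product attention: softmax(q . k / sqrt(d_k)) weighted sum of values."""
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # With a causal mask, position i only sees positions 0..i.
        limit = i + 1 if causal else len(keys)
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys[:limit]]
        weights = softmax(scores)
        out = [
            sum(w * v[j] for w, v in zip(weights, values[:limit]))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy token vectors (assumed); Q = K = V = x, so no learned projections here.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(x, x, x, causal=True)
print(out)
```

With the causal mask, the first position can only attend to itself, so its output equals its own value vector.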
What is h2oGPT?
AI Will Boost Productivity by 10x
Timeline shown on the slide: up to 2021, 2022, 2023, 2024, 2025
● Continuous but slow improvements in automation and productivity. Productivity in the US has increased by 250% in 70 years.*
● In addition to small specialized models, LLMs are supporting employees in their daily tasks: brainstorming, coding, summarization, analysis.
● No Code and AutoML enable all companies to build and use highly accurate models for specialized tasks.
● 1-Click to solve complex business goals.
● AI is used in automated mode. Employees are supervising their AI co-workers. Robotics leaps forward by incorporating LLMs.
*2020 | MIT Work of the Future
Limitations of Existing Models
Popular hosted models such as OpenAI's ChatGPT/GPT-4, Anthropic's Claude, Microsoft's Bing AI Chat, Google's Bard, and Cohere are powerful and effective, but they have certain limitations compared to open-source LLMs:
1. Data Privacy and Security: Using hosted LLMs requires sending data to external servers. This can raise concerns about data privacy, security, and compliance, especially for sensitive information or industries with strict regulations.
2. Dependency and Customization: Hosted LLMs often limit the extent of customization and control, as users rely on the service provider's infrastructure and predefined models.
3. Cost and Scalability: Hosted LLMs usually come with usage fees, which can increase significantly with large-scale applications.
4. Access and Availability: Hosted LLMs may be subject to downtime or limited availability, affecting users' access to the models.
Benefits of Open Source Models
1. Cost-effective: Users can scale the models on their own infrastructure without incurring additional costs from the service provider.
2. Flexible: Can be deployed on-premises or on private clouds, ensuring uninterrupted access and reducing reliance on external providers.
3. Tunable: Users can tailor the models to their specific needs, deploy on their own infrastructure, and even modify the underlying code.
Overall, open-source LLMs offer greater flexibility, control, and cost-effectiveness, while addressing data privacy and security concerns. They foster a competitive landscape in the AI industry and empower users to innovate and customize models to suit their specific needs.
h2oGPT
The world’s best open source GPT
● Released as open source under the Apache-2.0 license
● Active development: h2oai/h2ogpt
● See a demo
○ gpt.h2o.ai
○ 🤗 Hugging Face Spaces
What is it?
● Commercially usable code, data, and models
● Prompt engineering: ability to prepare open-source datasets for tuning LLMs
● Tuning: code for fine-tuning large language models (currently up to 20B parameters) on commodity hardware and enterprise GPU servers (single or multi-node)
● Optimizations:
■ LoRA (low-rank adaptation)
■ 4-bit and 8-bit quantization for memory-efficient fine-tuning and generation
● Deployable: chatbot with UI and Python API
● Evaluation: LLM performance evaluation
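A back-of-the-envelope sketch shows why LoRA and low-bit quantization make fine-tuning feasible on smaller hardware. The layer size (4096 x 4096) and LoRA rank (8) are assumed values chosen for illustration; only the 20B parameter figure comes from the slide. The memory estimate covers the weights alone and ignores activations, optimizer state, and quantization overhead.

```python
def full_params(d: int, k: int) -> int:
    """Trainable parameters when updating a full d x k weight matrix."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """LoRA trains two low-rank factors, B (d x r) and A (r x k), instead of the full matrix."""
    return r * (d + k)


def weight_memory_gb(n_params: float, bits: int) -> float:
    """Memory needed to store the weights alone at a given precision."""
    return n_params * bits / 8 / 1e9


# Assumed layer shape and rank, for illustration only.
d, k, r = 4096, 4096, 8
full = full_params(d, k)
lora = lora_params(d, k, r)
print(f"full: {full:,} trainable, LoRA: {lora:,} trainable ({lora / full:.2%})")

# Weight storage for a 20B-parameter model at different precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(20e9, bits):.0f} GB")
```

Under these assumptions the LoRA factors hold well under 1% of the parameters of the full matrix, and moving from 16-bit to 4-bit storage cuts weight memory by a factor of four.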
https://gpt-gm.h2o.ai/
https://gpt.h2o.ai/
Demo of h2oGPT!
Disclaimer:
subject to modification
and updates
Make Your Own GPT
Thank you!
GENAI AppStudio
DEMO
Explore
H2O GenAI App Store
Andreea Turcu
Head of Global Training @H2O.ai
Table of Contents
Introduction
1. Why Generative AI?
2. H2O Generative AI Ecosystem
3. Generative AI Applications
4. H2O GenAI App Store Demo
Wrapping Up
Introduction
Why Generative AI?
● Society
● Company
● Individual
Benefits of Generative AI
1. Content Creation
2. Creative Assistance
3. Natural Language Understanding
4. Personalization
5. Data Augmentation
6. Automation of Repetitive Tasks
7. Language Translation
etc.
H2O Generative AI
Ecosystem
H2O.ai Enterprise GenAI Platform
[Diagram: platform components. Documents as data sources; ingestion; LLM DataStudio (ETL for LLMs, data to QA pairs); myGPT (LLM fine-tuning); EvalStudio (continuous eval, feedback); vector DB (embeddings); RAG "Talk to your Data" using contextual similarity: question answering, context search, information retrieval, similar documents, user personalization; AI for Documents; GenAI AppStore (datasets + AI engines + AI apps, LLMs, integration, models, training, deployment); GenAI AppStudio; Prompt Studio; LLMOps; API; end users.]
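The vector DB, contextual similarity, and "Talk to your Data" boxes in the diagram describe retrieval-augmented generation (RAG): find the documents closest to a query, then pass them to the LLM as context. A minimal sketch, assuming a toy word-count "embedding" and an in-memory list in place of a real embedding model and vector database; the documents, function names, and prompt layout are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. Real systems use learned dense vectors."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical document store standing in for the vector DB.
documents = [
    "fine-tuning refines a pre-trained model on task-specific data",
    "a vector database stores embeddings for similarity search",
    "elo ranking compares models through pairwise comparisons",
]
index = [(doc, embed(doc)) for doc in documents]


def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


query = "where are embeddings stored for search"
context = retrieve(query)
# The retrieved text is placed in the prompt so the LLM answers from it.
prompt = f"Context: {context[0]}\nQuestion: {query}"
print(prompt)
```

Because the answer is drawn from retrieved context, new documents can be added to the index without retraining the model.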
Generative AI
Applications
Possible Applications
● Content Generation
● Layout Design
● Image and Icon Generation
● Auto-Completion and Suggestions
● Personalization
● Chatbots and Conversational UIs
● Adaptive UIs
● Dynamic Theming
● Accessibility Features
● Prototyping and Design Exploration
H2O.ai Enterprise GenAI Platform
[Diagram: GenAI App Store (datasets + AI engines + AI apps, LLMs, integration, models, training, deployment) and GenAI AppStudio.]
Why GenAI Apps?
What does it take to solve a specific
problem?
● Custom inputs
● Custom prompts
● Custom LLMs (when needed)
● Custom data
● Management of all of the above
H2O GenAI App Store
● Investment, Scam Shield: LLM-based scam prevention service
● Investment, Virtual Advisor: LLM-based conversation support services
● Sales, Strategy Engine: LLM-based strategy generator
● Sales, Report Generator: LLM-based report generator
● Trading, Language Assist: LLM-based multilingual assistance
● Asset Management, Risk Manager: Gen AI risk assessment and allocation
● Asset Management, Recommender: Gen AI product recommendations
● Legal, Regulator: LLMs for regulatory filings
● Legal, Legal Assist: Automated regulatory reporting using GenAI
● Operations, Credit Scorer: LLMs for credit scoring and underwriting
● Operations, Transaction Monitor: LLM for transaction monitoring
● Security, Guard Rails: LLM in security
Gen AI App Store
Apps Powered by LLMs (h2oGPT + myGPT) | Demos
Gen AI applications powered by LLMs provide faster information retrieval and search from complex datasets, models, and their outputs.
LLMs are blended with typical statistical and traditional models to provide rich outputs enhanced by LLM capabilities.
This includes: summarization, question answering, Talk to your Data + Documents, and generating feature stories.
REPEATABLE AI / DATA USE-CASES
Packaged as AI Apps
Apps: Multiple AI Data Science Use Cases
Scam Shield
H2O.ai Entity Extraction in Legal Documents
Customer Churn Detection
Anomaly Detection
Fraud Analysis
Know Your Customer
H2O Document Insights
Next Best Conversation
Customer Profiling
Market Basket Analysis
Customer 360
GenAI App Store
powered by H2O AI Cloud
H2O GenAI App Store made public
Demo Time!
genai.h2o.ai
Wrapping Up
Thank you!
Explore
H2O LLM EvalGPT
Andreea Turcu
Head of Global Training @H2O.ai
Table of Contents
1. What are LLMs?
2. Why Evaluate LLMs?
3. What is H2O EvalGPT?
4. H2O EvalGPT User Interface
5. Conclusion
What are LLMs?
Why are LLMs important?
1. Natural Language Understanding
2. Text Generation
3. Automation and Efficiency
4. Advancements in AI Research
5. Ethical Consideration
LLMs are reshaping society!
● Transforming Communication
● Augmenting Human Abilities
● Ethical and Societal Implications
● Economic Impact
Why Evaluate LLMs?
Key aspects of LLM evaluation (I)
1. Performance Metrics
2. Benchmarking
3. Fine-Tuning and Transfer Learning
4. Robustness and Generalization
5. Bias and Fairness
Key aspects of LLM evaluation (II)
6. Computational Efficiency
7. Interpretability and Explainability
8. Domain-Specific Evaluation
9. User Feedback and Human Evaluation
What is H2O
EvalGPT?
Foundations of a GenAI Ecosystem
[Diagram: ecosystem components. Unstructured datasets and documents; LLM Data Studio (ETL / prep for LLMs, documents → QA pairs); fine-tuning LLMs (& prompts), prompt tuning; EvalStudio and EvalGPT with continuous feedback; vector DB (embeddings) with parsing, chunking, indexing, embeddings; myGPT; RAG "Talk to your Data": document QA, document chat, image/video chat, LLM query; LLM agents, chat / QA, prompt engineering, LLM workers; GenAI Apps (datasets + AI engines + AI apps + LLMs, integration); GenAI AppStudio; LLMOps, MLOps, API; end users.]
Steps shown on the diagram:
1. Problem Definition
2. Data Collection
3. Data Preprocessing
4. Predictive ML
5. Fine Tuning
6. Evaluation
7. Integrations
8. GenAI Apps
H2O EvalGPT:
● Assess and compare Large Language Models (LLMs) across tasks.
● Get detailed leaderboard results to streamline workflows.
We evaluate LLMs using business data and will offer model submissions soon.
Key Features
● Relevance
● Transparency
● Speed and Currency
● Scope
● Interactivity and Alignment
H2O EvalGPT User
Interface
evalgpt.ai
Elo Ranking
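For readers unfamiliar with Elo, the sketch below shows the standard Elo update applied to pairwise (A/B) comparisons between two models. It illustrates the general rating scheme only; the starting rating of 1000, the K-factor of 32, the model names, and the match outcomes are assumptions, and this is not claimed to be the exact procedure H2O EvalGPT uses.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32):
    """Return new ratings after one comparison; score_a is 1 for a win, 0 for a loss."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - exp_a))
    return new_a, new_b


# Hypothetical models and A/B outcomes, recorded as (winner, loser).
ratings = {"model_x": 1000.0, "model_y": 1000.0}
matches = [
    ("model_x", "model_y"),
    ("model_x", "model_y"),
    ("model_y", "model_x"),
]

for winner, loser in matches:
    ratings[winner], ratings[loser] = update(ratings[winner], ratings[loser], 1.0)

leaderboard = sorted(ratings.items(), key=lambda item: item[1], reverse=True)
print(leaderboard)
```

Each comparison moves rating points from the loser to the winner, with larger swings when the result is an upset, so the total rating pool stays constant.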
Evaluation Method
A/B Testing
Prompts
Responses
Wrapping Up
Thank you!
LLMs from A to Z
(data prep, building & deployment)
with H2O.ai
Audrey Létévé, Senior Customer Data Scientist
21st May 2024
Agenda
• Intro
– H2O Gen AI ecosystem and our AI-powered search assistant: Enterprise h2oGPTe
– When and why to fine-tune your own LLM?
• Preparing your data for fine-tuning
• Using open-source H2O LLM Studio to train your own LLM
• Deployment with H2O MLOps
Foundations of a GenAI Ecosystem
[Diagram: ecosystem components. Unstructured datasets and documents; LLM Data Studio (ETL / prep for LLMs, documents → QA pairs); fine-tuning LLMs; EvalStudio with continuous feedback; vector DB (embeddings) with parsing, chunking, indexing, embeddings; myGPT; RAG system "Talk to your Data": document QA, document chat, image/video chat, LLM query; chat / QA, prompt engineering, LLM workers; LLMOps, MLOps, API; end users.]
Do you need to train an LLM for your task?
Rules of thumb:
1. Don’t train to memorise facts
2. Start by trying an off-the-shelf LLM
3. Train to improve on desired task, domain, or style
Options, in order of increasing technical effort:
1. Use “off-the-shelf” LLM as-is
- In many cases, your use case may work well with an “off-the-shelf” LLM without any changes
2. Use prompt engineering
- Experiment with how you ask an LLM to solve your use case
- Small tweaks in phrasing may boost task performance dramatically
3. Fine-tune an LLM
- Further train an LLM with a dataset of task prompt-answers
- Start with a dataset size in the hundreds and increase if necessary
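The prompt engineering option can be illustrated with two ways of phrasing the same request. This is a minimal sketch: the sentiment task, the example reviews, and the function names are hypothetical, and no particular LLM API is assumed, so the code only builds the prompt strings that would be sent to a model.

```python
def zero_shot_prompt(review: str) -> str:
    """Plain instruction with no examples."""
    return f"Classify the sentiment of this review: {review}"


def few_shot_prompt(review: str, examples: list[tuple[str, str]]) -> str:
    """Same task, but with labeled examples and an explicit answer format."""
    lines = ["Classify the sentiment of each review as positive or negative."]
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    # End with an open slot so the model completes the label.
    lines.append(f"Review: {review}\nSentiment:")
    return "\n\n".join(lines)


# Hypothetical labeled examples.
examples = [
    ("Great battery life.", "positive"),
    ("Stopped working after a week.", "negative"),
]

prompt = few_shot_prompt("The screen is sharp and bright.", examples)
print(prompt)
```

Comparing model outputs for the zero-shot and few-shot variants is a cheap experiment to run before investing in fine-tuning; the same labeled examples can later seed a prompt-answer dataset if fine-tuning turns out to be necessary.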
Demo
Documentation
LLM
Data Studio
Thank you!

More Related Content

PDF
Generative AI: Shifting the AI Landscape
PPTX
Google Cloud GenAI Overview_071223.pptx
PPTX
Databricks MLflow Object Relationships
PDF
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
PDF
Big Data Analytics (ML, DL, AI) hands-on
PDF
Large Language Models Bootcamp
PPTX
Big Data Stockholm v 7 | "Federated Machine Learning for Collaborative and Se...
PDF
Artificial Intelligence for Project Managers: Are You Ready?
Generative AI: Shifting the AI Landscape
Google Cloud GenAI Overview_071223.pptx
Databricks MLflow Object Relationships
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Big Data Analytics (ML, DL, AI) hands-on
Large Language Models Bootcamp
Big Data Stockholm v 7 | "Federated Machine Learning for Collaborative and Se...
Artificial Intelligence for Project Managers: Are You Ready?

What's hot (20)

PPTX
Agentic AI: The Future of Intelligent Automation
PDF
GPT : Generative Pre-Training Model
PDF
LLM Learning Path Level 1 - Presentation Slides
PDF
Production machine learning: Managing models, workflows and risk at scale
PDF
Large Language Models, Data & APIs - Integrating Generative AI Power into you...
PDF
Exploring Opportunities in the Generative AI Value Chain.pdf
PDF
GenAi LLMs Zero to Hero: Mastering GenAI
PDF
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
PDF
Building and deploying LLM applications with Apache Airflow
PDF
Big data and analytics
PDF
Prompt Engineering - an Art, a Science, or your next Job Title?
PDF
The importance of model fairness and interpretability in AI systems
PPTX
Big data architectures and the data lake
PPTX
Creating an Enterprise AI Strategy
PDF
Automated Workflows and AI Agents with Amazon Bedrock
PDF
AI in healthcare - SF Bay ACM chapter
PDF
Machine Learning Ml Overview Algorithms Use Cases And Applications
PDF
LLMs in Production: Tooling, Process, and Team Structure
PPTX
Why JSON API?
PDF
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
Agentic AI: The Future of Intelligent Automation
GPT : Generative Pre-Training Model
LLM Learning Path Level 1 - Presentation Slides
Production machine learning: Managing models, workflows and risk at scale
Large Language Models, Data & APIs - Integrating Generative AI Power into you...
Exploring Opportunities in the Generative AI Value Chain.pdf
GenAi LLMs Zero to Hero: Mastering GenAI
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
Building and deploying LLM applications with Apache Airflow
Big data and analytics
Prompt Engineering - an Art, a Science, or your next Job Title?
The importance of model fairness and interpretability in AI systems
Big data architectures and the data lake
Creating an Enterprise AI Strategy
Automated Workflows and AI Agents with Amazon Bedrock
AI in healthcare - SF Bay ACM chapter
Machine Learning Ml Overview Algorithms Use Cases And Applications
LLMs in Production: Tooling, Process, and Team Structure
Why JSON API?
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
Ad

Similar to H2O Gen AI Ecosystem Overview - Level 1 - Slide Deck (20)

PDF
LLM Learning Path Level 2 - Presentation Slides
PDF
Large Language Models (LLMs) - Level 3 Slides
PPTX
LangChain + Docugami Webinar
PDF
PHPFrameworkDay 2020 - Different software evolutions from Start till Release ...
PDF
"Different software evolutions from Start till Release in PHP product" Oleksa...
PPTX
GPT, LLM, RAG, and RAG in Action: Understanding the Future of AI-Powered Info...
PPT
PDF Generation in Rails with Prawn and Prawn-to: John McCaffrey
PDF
Gilbane 2009 -- How Can Content Management Software Keep Pace?
PDF
Enterprise h2o GPTe Learning Path Slide Deck
PDF
Start Getting Your Feet Wet in Open Source Machine and Deep Learning
PPTX
IBM Developer Model Asset eXchange
PDF
Scalable and Automatic Machine Learning with H2O
PPTX
Feature Store as a Data Foundation for Machine Learning
PDF
Class 12th IP project on buisness management
PPTX
Smart modeling of smart software
PPTX
Sharepoint Document Conversion
PPTX
The Information Governance Headache - SharePoint ECM
PPT
Edi text
PDF
Java Developers - What Lies Ahead in the AI era
PPTX
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
LLM Learning Path Level 2 - Presentation Slides
Large Language Models (LLMs) - Level 3 Slides
LangChain + Docugami Webinar
PHPFrameworkDay 2020 - Different software evolutions from Start till Release ...
"Different software evolutions from Start till Release in PHP product" Oleksa...
GPT, LLM, RAG, and RAG in Action: Understanding the Future of AI-Powered Info...
PDF Generation in Rails with Prawn and Prawn-to: John McCaffrey
Gilbane 2009 -- How Can Content Management Software Keep Pace?
Enterprise h2o GPTe Learning Path Slide Deck
Start Getting Your Feet Wet in Open Source Machine and Deep Learning
IBM Developer Model Asset eXchange
Scalable and Automatic Machine Learning with H2O
Feature Store as a Data Foundation for Machine Learning
Class 12th IP project on buisness management
Smart modeling of smart software
Sharepoint Document Conversion
The Information Governance Headache - SharePoint ECM
Edi text
Java Developers - What Lies Ahead in the AI era
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Ad


H2O Gen AI Ecosystem Overview - Level 1 - Slide Deck

  • 2. H2O.ai Confidential Intro to Andreea Turcu Head of Global Training @H2O.ai
  • 3. Table of Contents: 1. What are Large Language Models (LLMs)? 2. Steps in Building LLMs. 3. Importance of Data Cleaning for LLMs. 4. What is LLM DataStudio? (+ Interface Demo). 5. Generate a Clean Dataset from a PDF File (Doc2QA). Quizzes. Q&As.
  • 5. 1. Definition of LLMs: Large Language Models (LLMs) are sophisticated artificial intelligence models specifically designed to understand and generate human-like language on an extensive scale.
  • 6. 1. Training, Patterns and Parameters. Extensive Training Data: trained on massive textual datasets from diverse sources. Pattern and Meaning Learning: they absorb knowledge of words, sentence patterns and meanings. Significant Parameters: referred to as "large" due to a substantial number of parameters.
  • 7. 1. Generative AI vs. LLMs. Source: gpt.h2o.ai
  • 8. 1. LLMs vs. Foundation Models. Diagram labels: Foundation Models (Unlabeled Training Data, Transformer Algorithm, Foundation Model); Large Language Models (Additional Text-Based Data, Transformer Algorithm, LLM). Foundation Model: large machine learning model trained on unlabeled data; enhanced through transformer algorithms and fine-tuning; adaptable to various applications. Large Language Model (LLM): specific type of foundation model; tailored for natural language processing tasks; examples include GPT models (e.g., GPT-3).
  • 9. 1. Benefits of LLMs: 1. Natural Language Processing (NLP). 2. Versatility Across Diverse Domains. 3. Elevated Creativity in Content Generation. 4. Facilitating Global Communication Breakthroughs. 5. Information Extraction Efficiency.
  • 10. 1. Challenges of LLMs: 1. Computational Resources. 2. Energy Consumption. 3. Fine-tuning Complexity. 4. Data Privacy Concerns. 5. Interpretable Output.
  • 12. 2. The LLMs Lifecycle: 1. Data Collection. 2. Preprocessing. 3. Model Architecture Design. 4. Training the Model. 5. Fine-tuning. 6. Validation and Evaluation. 7. Deployment. 8. Monitoring.
  • 13. 2. Preprocessing is important: 1. Data Collection. 2. Preprocessing. 3. Model Architecture Design. 4. Training the Model. 5. Fine-tuning. 6. Validation and Evaluation. 7. Deployment. 8. Monitoring.
  • 14. 2. Gold in - Gold Out: Data Collection; Preprocessing; Model Architecture Design; Training the Model; Fine-tuning; Validation and Evaluation; Deployment; Monitoring.
  • 15. 2. Building Steps for LLMs. 01 Foundation: Powerful language models trained on extensive text data, forming the basis for various language tasks. 02 DataPrep: Converting documents into instruction pairs, like QA pairs, facilitating fine-tuning and tasks. 03 Fine-tuning: Refining pre-trained models using task-specific data, enhancing their performance on targeted tasks. 04 Eval LLMs: Thoroughly assessing and comparing LLMs is increasingly vital due to their heightened significance and complexity. 05 Database & Applications: Optimize data usage by seamlessly integrating new PDFs into the database, eliminating the need for model retraining. Improve user experiences through advanced language comprehension and LLM-driven response generation.
  • 16. 2. Emphasis on the DataPrep Stage. 01 Foundation: Powerful language models trained on extensive text data, forming the basis for various language tasks. 02 DataPrep: Converting documents into instruction pairs, like QA pairs, facilitating fine-tuning and tasks. 03 Fine-tuning: Refining pre-trained models using task-specific data, enhancing their performance on targeted tasks. 04 Eval LLMs: Thoroughly assessing and comparing LLMs is increasingly vital due to their heightened significance and complexity. 05 Database & Applications: Optimize data usage by seamlessly integrating new PDFs into the database, eliminating the need for model retraining. Improve user experiences through advanced language comprehension and LLM-driven response generation.
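
The DataPrep step described above (converting documents into instruction pairs such as QA pairs) can be pictured with a short sketch. This is an illustration under stated assumptions, not how LLM DataStudio works internally: the word-count chunking rule, the `stub_generate_qa` helper and the `instruction`/`output` field names are hypothetical, and in a real pipeline the question and answer would come from an LLM rather than the placeholder used here.

```python
import json
from typing import Callable, Dict, List, Tuple


def chunk_text(text: str, max_words: int = 50) -> List[str]:
    """Split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def stub_generate_qa(chunk: str) -> Tuple[str, str]:
    """Hypothetical stand-in for an LLM call: generic question, chunk as answer."""
    first_words = " ".join(chunk.split()[:5])
    return (f"What does the passage starting '{first_words}' say?", chunk)


def document_to_qa_pairs(
    text: str,
    generate_qa: Callable[[str], Tuple[str, str]] = stub_generate_qa,
    max_words: int = 50,
) -> List[Dict[str, str]]:
    """Turn one document into a list of instruction/output records."""
    records = []
    for chunk in chunk_text(text, max_words):
        question, answer = generate_qa(chunk)
        records.append({"instruction": question, "output": answer})
    return records


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

Calling `to_jsonl(document_to_qa_pairs(text))` yields one JSON object per chunk, a common shape for fine-tuning datasets; the chunk size and field names would be adapted to the fine-tuning tool in use.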
  • 18. 3. Key Benefits of Data Cleanliness in Language Models: 1. Improved Model Performance. 2. Mitigated Bias and Unwanted Influences. 3. Consistency and Coherence. 4. Enhanced Generalization. 5. Ethical Considerations. 6. Improved User Experience and Trust.
  • 19. 3. Key Aspects in DataPrep for LLMs
  • 22. H2O AI and GenAI Ecosystem. Diagram labels: Documents; Data Sources; LLM DataStudio; myGPT; LLM EvalStudio; Vector DB (Embeddings); Alternative Datasets; Query + Documents (Context); Talk to Your Data (Ques Answers, Context Search, Doc Retrieval, Similar Doc, Personalization); Contextual Similarity; Continuous Eval (feedback); Gen AI App Store; Datasets; AI Engines; AI Apps; LLM Integration; Models; Data to QA pairs; ETL for LLMs; LLM Fine Tuning; Custom GPT; API; End User; Enterprise.
  • 23. 4. Enhancing LLM Data with LLM DataStudio. LLM DataStudio features: Q&A generation from text and audio data; Text Cleaning; Data Quality Issue Detection; Tokenization; Text Length Control.
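
Three of the features listed above (text cleaning, tokenization, text length control) can be sketched in a few lines. This is a minimal illustration, not LLM DataStudio's implementation: the regular expressions, the whitespace tokenizer (real LLM tokenizers work on subword units) and the `min_tokens`/`max_tokens` thresholds are all assumptions chosen for the example.

```python
import html
import re
from typing import Iterable, List

TAG_RE = re.compile(r"<[^>]+>")        # HTML/XML tags
URL_RE = re.compile(r"https?://\S+")   # bare URLs
SPACE_RE = re.compile(r"\s+")          # runs of whitespace


def clean_text(text: str) -> str:
    """Decode HTML entities, drop tags and URLs, collapse whitespace."""
    text = html.unescape(text)
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def tokenize(text: str) -> List[str]:
    """Whitespace tokenization, a simplification of subword tokenizers."""
    return text.split()


def within_length(text: str, min_tokens: int = 3, max_tokens: int = 512) -> bool:
    """Length control: keep rows whose token count lies in [min_tokens, max_tokens]."""
    return min_tokens <= len(tokenize(text)) <= max_tokens


def prepare(rows: Iterable[str], min_tokens: int = 3, max_tokens: int = 512) -> List[str]:
    """Clean every row, then drop rows that are too short or too long."""
    cleaned = (clean_text(row) for row in rows)
    return [row for row in cleaned if within_length(row, min_tokens, max_tokens)]
```

Cleaning runs before the length filter so that markup and URLs do not inflate the token count of a row that is otherwise too short to be useful.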
  • 25. 4. Demo - Curate. H2O.ai LLM Studio Website: https://h2o.ai/platform/ai-cloud/make/llm-studio/
  • 27. Structured Data Preparation Workflow in LLM DataStudio. LLM DataStudio follows a structured data preparation process. The process includes several stages: data intake; workflow construction; configuration; assessment; result generation.
  • 28. 5. The Workflow Builder - Demo
  • 29. 5. Demo - Generate a Clean Dataset from a PDF File (Doc2QA): A Comprehensive Overview of Large Language Models, https://arxiv.org/pdf/2307.06435.pdf
  • 32. LLM Studio Overview. Andreea Turcu, Customer Data Scientist @H2O.ai
  • 33. Table of Contents: What are LLMs? Foundation vs. Fine-tuning; LLM Studio Intro; Demo / Follow along: Connect to LLM Studio; The LLM Studio GUI; Launching an Experiment; Monitoring the Experiment; Next Steps with LLM Studio (model export).
  • 34. A large language model is a type of AI algorithm trained on huge amounts of text data that can understand and generate text.
  • 35. LLMs can be characterized by 4 parameters: size of the training dataset; cost of training; size of the model; performance after training.
  • 47. Intro to h2oGPT by Andreea Turcu
  • 48. Agenda: A bit of context; What are GPTs? Why know what LLMs are? LLMs origins; What is h2oGPT? Boosting your productivity with h2oGPT; Limitations of Existing models; Benefits of Open Source models; Demo of h2oGPT.
  • 51. Why should I know what LLMs are?
  • 53. Why should I know what LLMs are? Large language models like GPT have diverse business uses: automating content creation, extracting insights from data, personalizing marketing, enabling virtual assistants, analyzing data, facilitating voice-based interactions and translations, etc.
  • 54. What are LLMs? LLMs (Large Language Models) are computational models for understanding and generating human language. They are trained on vast amounts of text data. LLMs learn grammar, vocabulary, and contextual relationships. They can generate coherent and contextually relevant text based on given prompts. Collaboration with AI systems becomes more efficient. Responsible use and enhanced user experiences can be achieved.
  • 55. LLM Origins. Transformers are deep feed-forward neural networks that leverage a machine learning mechanism called (self-)attention and have seen wild success in natural language processing problems. Timeline: 2017, Encoder-Decoder (Seq2Seq): original Transformer architecture for machine translation or sequence-to-sequence problems. 2019, BERT: Bidirectional Encoder Representations from Transformers, a model designed to recover masked tokens. 2020, GPT: auto-regressive language modeling where the goal is to predict the next token. 2022, ChatGPT: interactive interface for users to interact directly with GPT3 and GPT4 modeling frameworks. 2023, h2oGPT: the world’s best completely open source LLM, permitted for commercial use. Reference: https://arxiv.org/pdf/2207.09238.pdf
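
The GPT entry above describes auto-regressive language modeling: predict the next token, append it, and repeat. The toy sketch below shows that loop only. It swaps the Transformer for a bigram count table, so it is an assumption-laden stand-in for how a real GPT model scores tokens, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigram(tokens: List[str]) -> Dict[str, Counter]:
    """Count, for every token, which tokens follow it in the training text."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts: Dict[str, Counter], prev: str) -> Optional[str]:
    """Greedy prediction: the most frequent follower of prev, or None if unseen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


def generate(counts: Dict[str, Counter], start: str, max_new_tokens: int = 5) -> List[str]:
    """Auto-regressive loop: each predicted token becomes the next input."""
    output = [start]
    for _ in range(max_new_tokens):
        nxt = predict_next(counts, output[-1])
        if nxt is None:
            break
        output.append(nxt)
    return output
```

A real LLM conditions on the whole preceding context and samples from a probability distribution over a subword vocabulary; the loop structure (predict, append, repeat) is the part this sketch shares with it.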
  • 58. AI Will Boost Productivity by 10x. Continuous but slow improvements in automation and productivity: productivity in the US has increased by 250% in 70 years.* In addition to small specialized models, LLMs are supporting employees in their daily tasks: brainstorming, coding, summarization, analysis. No Code and AutoML enable all companies to build and use highly accurate models for specialized tasks: 1-Click to solve complex business goals. AI is used in automated mode; employees are supervising their AI co-workers. Robotics leaps forward by incorporating LLMs. Timeline labels: up to 2021, 2022, 2023, 2024, 2025. *2020 | MIT Work of the Future
  • 61. Limitations of Existing Models. Although popular models such as OpenAI's ChatGPT/GPT-4, Anthropic's Claude, Microsoft's Bing AI Chat, Google's Bard, and Cohere are powerful and effective, they have certain limitations compared to open-source LLMs: 1. Data Privacy and Security: Using hosted LLMs requires sending data to external servers. This can raise concerns about data privacy, security, and compliance, especially for sensitive information or industries with strict regulations. 2. Dependency and Customization: Hosted LLMs often limit the extent of customization and control, as users rely on the service provider's infrastructure and predefined models. 3. Cost and Scalability: Hosted LLMs usually come with usage fees, which can increase significantly with large-scale applications. 4. Access and Availability: Hosted LLMs may be subject to downtime or limited availability, affecting users' access to the models.
  • 62. Benefits of Open Source Models. 1. Cost-effective: users can scale the models on their own infrastructure without incurring additional costs from the service provider. 2. Flexible: deployed on-premises or on private clouds, ensuring uninterrupted access and reducing reliance on external providers. 3. Tunable: users can tailor the models to their specific needs, deploy on their own infrastructure, and even modify the underlying code. Overall, open-source LLMs offer greater flexibility, control, and cost-effectiveness, while addressing data privacy and security concerns. They foster a competitive landscape in the AI industry and empower users to innovate and customize models to suit their specific needs.
  • 63. h2oGPT: The world’s best open source GPT. Released as open source under Apache-2.0 license. Active development: h2oai/h2ogpt. See a demo: gpt.h2o.ai; 🤗 Hugging Face Spaces. What is it? Commercially usable code, data, and models. Prompt engineering: ability to prepare open-source datasets for tuning LLMs. Tuning: code for fine-tuning large language models (currently up to 20B parameters) on commodity hardware and enterprise GPU servers (single or multi node). Optimizations: LoRA (low-rank adaptation); 4-bit and 8-bit quantization for memory-efficient fine-tuning and generation. Deployable: chatbot with UI and Python API. Evaluation: LLM performance evaluation.
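
LoRA, listed under Optimizations above, keeps a pretrained weight matrix W frozen and learns a low-rank update, so the effective weight is W + (alpha / r) * B @ A with B of shape d_out x r and A of shape r x d_in. The sketch below illustrates that formula and the resulting parameter savings in plain Python; it is not h2oGPT's code, and the function names and the example values of `alpha` and `r` are assumptions.

```python
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def lora_effective_weight(w: Matrix, a: Matrix, b: Matrix, alpha: float, r: int) -> Matrix:
    """Return W + (alpha / r) * B @ A; W stays frozen, only A and B are trained."""
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    scale = alpha / r
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def trainable_params(d_out: int, d_in: int, r: int) -> Dict[str, int]:
    """Compare parameter counts for full fine-tuning and a rank-r LoRA update."""
    return {"full": d_out * d_in, "lora": r * (d_in + d_out)}
```

For a 4096 x 4096 layer with r = 8, `trainable_params` reports 16,777,216 weights for full fine-tuning against 65,536 for the LoRA factors, which is why the technique fits on smaller GPUs. B is conventionally initialized to zero so that training starts from the unchanged pretrained model.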
  • 102. Explore H2O GenAI App Store. Andreea Turcu, Head of Global Training @H2O.ai
  • 103. Table of Contents: Introduction. 1. Why Generative AI? 2. H2O Generative AI Ecosystem. 3. Generative AI Applications. 4. H2O GenAI App Store Demo. Wrapping Up.
  • 105. Why Generative AI? Society; Company; Individual.
  • 106. Benefits of Generative AI: 1. Content Creation. 2. Creative Assistance. 3. Natural Language Understanding. 4. Personalization. 5. Data Augmentation. 6. Automation of Repetitive Tasks. 7. Language Translation, etc.
  • 108. H2O.ai Enterprise GenAI Platform. Diagram labels: Documents as Data Sources; LLM DataStudio; myGPT; EvalStudio; Vector DB (Embeddings); RAG; Talk to your Data (Question Answering, Context Search, Information Retrieval, Similar Documents, User Personalization); Contextual Similarity; Continuous Eval (feedback); GenAI AppStore; Datasets; AI Engines; AI Apps; LLMs Integration; Models; ETL for LLMs; Data to QA pairs; LLM Fine Tuning; End Users; AI for Documents; Training; Deployment; GenAI AppStudio; Prompt Studio; LLMOps; API; Ingestion.
  • 110. Possible Applications: Content Generation; Layout Design; Image and Icon Generation; Auto-Completion and Suggestions; Personalization; Chatbots and Conversational UIs; Adaptive UIs; Dynamic Theming; Accessibility Features; Prototyping and Design Exploration.
  • 111. H2O.ai Enterprise GenAI Platform. Diagram labels: GenAI App Store; Datasets; AI Engines; AI Apps; LLMs Integration; Models; Training; Deployment; GenAI AppStudio.
  • 112. Why GenAI Apps? What does it take to solve a specific problem? Custom inputs; custom prompts; custom LLMs (when needed); custom data; management of all of the above.
  • 114. Gen AI App Store: Apps Powered by LLMs (h2oGPT + myGPT) | Demos. Investment: Scam Shield (LLM based Scam Prevention Service); Virtual Advisor (LLM based conversation support services). Sales: Strategy Engine (LLM based strategy generator); Report Generator (LLM based Report Generator). Trading: Language Assist (LLM based multilingual assistance). Asset Management: Risk Manager (Gen AI Risk Assessment and Allocation); Recommender (Gen AI Product Recommendations). Legal: Regulator (LLMs for Regulatory Filings); Legal Assist (Automated Regulatory Reporting using GenAI). Operations: Credit Scorer (LLMs Credit Scoring and Underwriting); Transaction Monitor (LLM for Transaction Monitoring). Security: Guard Rails (LLM in Security). Gen AI Applications powered by LLMs to provide faster information retrieval and search from complex datasets, models, and the outputs. LLMs blended with typical statistical and traditional models to provide rich outputs enhanced by LLMs capabilities. This includes: Summarization, Question Answering, Talk to your Data + Documents, Generating Feature stories.
  • 115. REPEATABLE AI / DATA USE-CASES Packaged as AI Apps. Apps: Multiple AI Data Science Use Cases. Scam Shield; H2O.ai Entity Extraction in Legal Documents; Customer Churn Detection; Anomaly Detection; Fraud Analysis; Know Your Customer; H2O Document Insights; Next Best Conversation; Customer Profiling; Market Basket Analysis; Customer 360. GenAI App Store powered by H2O AI Cloud. H2O GenAI App Store made public.
  • 120. Explore H2O LLM EvalGPT. Andreea Turcu, Head of Global Training @H2O.ai
  • 121. Table of Contents: 1. What are LLMs? 2. Why Evaluate LLMs? 3. What is H2O EvalGPT? 4. H2O EvalGPT User Interface. 5. Conclusion.
  • 123. Why are LLMs important? 1. Natural Language Understanding. 2. Text Generation. 3. Automation and Efficiency. 4. Advancements in AI Research. 5. Ethical Consideration.
  • 124. LLMs are reshaping society! Transforming Communication; Augmenting Human Abilities; Ethical and Societal Implications; Economic Impact.
  • 126. Key aspects of LLM evaluation (I): 1. Performance Metrics. 2. Benchmarking. 3. Fine-Tuning and Transfer Learning. 4. Robustness and Generalization. 5. Bias and Fairness.
  • 127. Key aspects of LLM evaluation (II): 6. Computational Efficiency. 7. Interpretability and Explainability. 8. Domain-Specific Evaluation. 9. User Feedback and Human Evaluation.
  • 129. Foundations of a GenAI Ecosystem. Lifecycle steps: 1. Problem Definition. 2. Data Collection. 3. Data Preprocessing. 4. Predictive ML. 5. Fine Tuning. 6. Evaluation. 7. Integrations. 8. GenAI Apps. Diagram labels: GenAI AppStudio; Datasets; Unstructured Datasets; Documents; ETL / Prep for LLMs; Documents → QA Pairs; Fine Tuning LLMs (& Prompts); End Users; Vector DB (Embeddings); myGPT; RAG; Talk to your Data; Document QA; Document Chat; Image/Video Chat; LLM Query; GenAI Apps; LLM Data Studio; AI Engines; EvalStudio; AI Apps; LLMs Integration; LLMOps; API; Prompt Tuning; Parsing; Chunking; Indexing; Embeddings; LLM Agents; Chat / QA; Prompt Engineering; LLM Workers; MLOps; Continuous Feedback; EvalGPT.
  • 130. Foundations of a GenAI Ecosystem (same diagram as slide 129, without the numbered lifecycle steps).
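
The retrieval path in the diagram above (chunking, indexing, embeddings, a vector DB, then an LLM query with retrieved context) can be sketched end to end with toy components. Everything here is an assumption made for illustration: the bag-of-words `embed` function stands in for a learned embedding model, the in-memory list stands in for a vector database, and the prompt wording is hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

Index = List[Tuple[str, Counter]]


def embed(text: str) -> Counter:
    """Toy embedding: lower-cased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def build_index(chunks: List[str]) -> Index:
    """Indexing: store every chunk next to its embedding."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: Index, query: str, k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query: str, context_chunks: List[str]) -> str:
    """Assemble the text sent to the LLM: retrieved context plus the question."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"
```

New documents only need to be chunked and added to the index, which is why this pattern avoids retraining the model when the document collection changes.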
  • 131. H2O EvalGPT: Assess and compare Large Language Models (LLMs) across tasks. Get detailed leaderboard results to streamline workflows. We evaluate LLMs using business data and will offer model submissions soon.
  • 132. Key Features: Relevance; Transparency; Speed and Currency; Scope; Interactivity and Alignment.
  • 133. H2O EvalGPT User Interface: evalgpt.ai
  • 141. LLMs from A to Z (data prep, building & deployment) with H2O.ai. Audrey Létévé, Senior Customer Data Scientist, 21st May 2024
  • 142. Agenda: Intro (H2O Gen AI ecosystem and our AI-powered search assistant, Enterprise h2oGPTe; when and why to fine-tune your own LLM). Preparing your data for fine-tuning. Using open source H2O LLM Studio to train your own LLM. Deployment with H2O MLOps.
  • 143. Foundations of a GenAI Ecosystem. Diagram labels: Unstructured Datasets; Documents; ETL / Prep for LLMs; Documents → QA Pairs; Fine Tuning LLMs; End Users; Vector DB (Embeddings); myGPT; RAG; Talk to your Data; Document QA; Document Chat; Image/Video Chat; LLM Query; LLM Data Studio; EvalStudio; LLMOps; API; Continuous Feedback; Parsing; Chunking; Indexing; Embeddings; Chat / QA; Prompt Engineering; LLM Workers; RAG System; MLOps.
  • 144. Do you need to train an LLM for your task? Options, in order of increasing technical effort: 1. Use an “off-the-shelf” LLM as-is: in many cases, your use case may work well with an “off-the-shelf” LLM without any changes. 2. Use prompt engineering: experiment with how you ask an LLM to solve your use case; small tweaks in phrasing may boost task performance dramatically. 3. Fine-tune an LLM: further train an LLM with a dataset of task prompt-answers; start with a dataset size in the hundreds and increase if necessary. Rules of thumb: 1. Don’t train to memorise facts. 2. Start by trying an off-the-shelf LLM. 3. Train to improve on desired task, domain, or style.