From Traction to Production: Maturing your LLMOps step by step
Maxim Salnikov
Digital & App Innovation Business Lead at Microsoft
I’m Maxim Salnikov
• Building on the web platform since the 90s
• Organizing developer communities and technical conferences
• Speaking, training, blogging: Webdev, Cloud, Generative AI, Prompt Engineering
Helping developers succeed with Dev Tools, Cloud & AI at Microsoft
For every $1 a company invests in AI, it realizes an average return of $3.50.
14 months: the average time it takes organizations to realize a return on their AI investment.
Source: IDC, The Business Opportunity of AI, November 2023
What slows down Generative AI adoption?
Getting started: The state of the art evolves so quickly that it is hard to decide what to use, and guidance and documentation are difficult to find.
Development: Applications often require multiple cutting-edge products and frameworks, which demands specialized expertise and new tools to stitch the components together.
Context: Large Language Models don't know anything about your data.
Evaluation: It is hard to figure out which model to use and how to optimize it for your use case.
Operationalization: Concerns around privacy, security, and grounding. Developers lack the experience and tools to evaluate, improve, and validate their proofs of concept, and to scale and operate them in production.
Introducing LLMOps == how to bring LLM apps to production
Bring together people, process, and platform to automate LLM-infused software delivery and provide continuous value to users.
People, Process, Platform
LLMOps benefits
Automation + Collaboration + Reproducibility == VELOCITY and SECURITY (for LLMs)
The paradigm shift: from MLOps to LLMOps
Traditional MLOps → LLMOps
• Audiences: ML Engineers, Data Scientists → ML Engineers, App Developers
• Assets to share: model, data, environments, features → LLMs, agents, plugins, prompts, chains, APIs
• Metrics/evaluations: accuracy → quality (accuracy, similarity), harm (bias, toxicity), honesty (groundedness), cost (tokens per request), latency (response time, RPS)
• Models: built from scratch → pre-built or fine-tuned, served as an API (MaaS)
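The cost and latency dimensions above are straightforward to measure per request. Below is a minimal sketch, assuming the openai v1 Python SDK against an Azure OpenAI deployment; the endpoint, key, deployment name, and per-1K-token prices are placeholders, not real values.

```python
# Rough sketch of tracking the new LLMOps metrics (cost per request, latency)
# for a single call. Endpoint, key, deployment name, and prices are placeholders.
import time

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<api-key>",
    api_version="2024-06-01",
)

PRICE_PER_1K_PROMPT = 0.005       # hypothetical rates; use your contracted pricing
PRICE_PER_1K_COMPLETION = 0.015

start = time.perf_counter()
response = client.chat.completions.create(
    model="gpt-4o",               # your Azure OpenAI deployment name
    messages=[{"role": "user", "content": "Summarize LLMOps in one sentence."}],
)
latency = time.perf_counter() - start

usage = response.usage
cost = (usage.prompt_tokens * PRICE_PER_1K_PROMPT
        + usage.completion_tokens * PRICE_PER_1K_COMPLETION) / 1000

print(f"latency={latency:.2f}s tokens={usage.total_tokens} cost=${cost:.4f}")
```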
LLM Lifecycle in the real world
Managing
Prototyping
Find LLMs
Hypo
t
h
e
s
i
s
T
r
y
p
r
o
m
p
t
s
BUSINESS
NEED
Ideating/
exploring
PREPARE FOR APP DEPLOYMENT
SEND FEEDBACK
D
e
p
l
o
y
L
L
M
A
p
p
/
U
I
Q
u
o
ta and cost management
C
ontent Filtering
M
o
n
i
t
o
r
i
n
g
Deploying /
Monitoring
Operationalizing
Saf
e
R
o
l
l
o
u
t
/
S
t
a
g
i
n
g
Prompt Engine
e
r
i
n
g
o
r
F
i
n
e
-
t
u
n
i
n
g
Optimizing
ADVANCE PROJECT
REVERT PROJECT
Evalua
t
i
o
n
E
x
c
e
p
t
i
o
n
Handling
Building/
augmenting
R
e
t
r
i
e
v
a
l
A
u
g
m
e
n
t
e
d
Generation
1. Ideating/exploring: identify the business use case → discover your model → test sample prompts → compare different models.
2. Building/augmenting: connect to your data and build LLM flows → run the flow against sample data → evaluate the prompt flow → satisfied? If not, modify the flow (prompts, tools, etc.) and repeat; if yes, run the flow against a larger dataset → evaluate again → satisfied? If not, keep modifying; if yes, move on (a minimal sketch of this loop follows below).
3. Operationalizing: deploy the endpoint → integrate into the application → add monitoring and alerts.
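The "run → evaluate → satisfied?" loop in step 2 can be captured in a tiny harness. This is a minimal sketch with stand-in run_flow and score helpers (hypothetical names); in practice these would be a Prompt Flow batch run and an evaluation flow or evaluator model.

```python
# Sketch of the "run flow -> evaluate -> satisfied?" loop. `run_flow` and `score`
# are hypothetical stand-ins for your flow runner and evaluator.
from statistics import mean

def run_flow(question: str) -> str:
    # Placeholder: invoke your LLM flow here (e.g. a Prompt Flow standard flow).
    return f"Stub answer to: {question}"

def score(question: str, answer: str) -> float:
    # Placeholder: plug in an evaluator (groundedness, relevance, similarity...).
    return 4.2

def evaluate(dataset: list[dict], threshold: float = 4.0) -> bool:
    scores = [score(row["question"], run_flow(row["question"])) for row in dataset]
    average = mean(scores)
    print(f"{len(dataset)} rows, average score {average:.2f}")
    return average >= threshold

sample_data = [{"question": "What is LLMOps?"}]   # small, hand-picked set
larger_data = sample_data * 50                    # stand-in for a bigger dataset

if evaluate(sample_data) and evaluate(larger_data):
    print("Satisfied on both datasets: proceed to operationalizing")
else:
    print("Not satisfied: modify the flow (prompts, tools, etc.) and re-run")
```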
01 INITIAL
The Foundation of Explorations
Discovery of models and testing prompts.
Basic evaluation and monitoring.
02 DEFINED
Systematizing LLM Apps Development
Iterative model augmentation with prompt engineering and RAG.
Structured deployment and prompt-based evaluations.
03 MANAGED
Advanced LLM Workflows and Proactive Monitoring
Comprehensive prompt management, evaluation, and real-time deployment.
Advanced monitoring and automated alerts.
04 OPTIMIZED
Operational Excellence and Continuous Improvement
Seamless, collaborative environment for CI/CD.
Fully automated monitoring and model/prompt refinement.
LLMOps Maturity Model
Achieve generative AI operational excellence with the LLMOps maturity model | Microsoft Azure Blog
Get to know Azure AI
Azure AI Foundry (ex-Studio), https://ai.azure.com: one place for building and deploying AI solutions (model catalog, complete AI toolchain, responsible AI practices, enterprise-grade production at scale).
Azure Machine Learning: full-lifecycle tools for designing and managing responsible AI models (responsible model design, Prompt Flow orchestration, model fine-tuning, model training).
Cutting-edge models: access to the latest foundation and open-source models.
Azure AI Services: pre-trained, turnkey solutions for intelligent applications (Azure OpenAI Service, Azure AI Search, Azure AI Speech, Azure AI Vision, Azure AI Content Safety, Azure AI Document Intelligence, Azure AI Language, Azure AI Translator).
Azure AI Infrastructure: state-of-the-art silicon and systems for AI workloads (high-bandwidth networking, microfluidic cooling, Azure Maia silicon).
LLM Lifecycle in the real world (diagram recap): business need → ideating/exploring → building/augmenting → operationalizing → managing.
Discover the power of small language models: Orca (Orca 1, Orca 2), Phi (Phi-1, Phi-2, Phi-3)
Access a catalog full of pre-built and customizable frontier and open-source models; explore 1800+ foundation models. Choose, compare, test.
• Azure OpenAI Service: GPT-4o, GPT-4o-mini, GPT-4o-realtime, o1-preview, DALL-E 3
• Mistral AI: Mistral Large, Mixtral 8x7B (Mixture of Experts), Mistral 7B
• Meta: Llama-3, Llama-2, CodeLlama
• Cohere: Command R+, Command R, Embed-v3
• Open-source and partner models: Falcon (TII), Stable Diffusion (Stability AI), Dolly (Databricks), CLIP (OpenAI)
Several of these models are also available as Model as a Service (serverless APIs).
Comprehensive comparison view for benchmarking foundation models across metrics: accuracy, groundedness, relevance, coherence, fluency, cost, latency, throughput.
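A simple way to feed such a comparison is to run the same prompts against candidate deployments and record latency and token usage; quality metrics need reference answers or an evaluator. A sketch, assuming two Azure OpenAI deployments and the openai v1 SDK (names and keys are placeholders):

```python
# Compare candidate deployments on the same prompts: latency and token usage
# are measured directly; quality scoring is left as a placeholder.
import time

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<api-key>",
    api_version="2024-06-01",
)

DEPLOYMENTS = ["gpt-4o", "gpt-4o-mini"]   # your deployment names
PROMPTS = ["Explain RAG in two sentences.", "What is a prompt flow?"]

for deployment in DEPLOYMENTS:
    latencies, total_tokens = [], 0
    for prompt in PROMPTS:
        start = time.perf_counter()
        response = client.chat.completions.create(
            model=deployment,
            messages=[{"role": "user", "content": prompt}],
        )
        latencies.append(time.perf_counter() - start)
        total_tokens += response.usage.total_tokens
        # Accuracy/groundedness/fluency would be scored here against references
        # or with an evaluator model.
    print(f"{deployment}: avg latency {sum(latencies) / len(latencies):.2f}s, "
          f"tokens {total_tokens}")
```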
LLM Lifecycle in the real world (diagram recap): business need → ideating/exploring → building/augmenting → operationalizing → managing.
Retrieval Augmented Generation (RAG): anatomy of the workflow
The user asks a question in the app UX; the orchestrator queries a retriever over the knowledge base (e.g. Azure AI Search) for relevant content. Data sources (files, databases, etc.) have been transformed into embeddings and indexed beforehand. The search results are combined with the prompt (R: retrieve), the augmented prompt plus knowledge is sent to the model (A: augment, e.g. Azure OpenAI Service), and the grounded response is returned to the user as the answer (G: generate).
Retrieval techniques: chunking, vectorization, indexing, ranking.
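A minimal end-to-end sketch of the R-A-G steps, assuming an Azure AI Search index named "docs" that already contains chunked, vectorized content in a "content" field, plus the azure-search-documents and openai SDKs (endpoints, keys, and names are placeholders):

```python
# R-A-G in ~20 lines: retrieve from Azure AI Search, augment the prompt,
# generate with Azure OpenAI. Index name, field name, endpoints, and keys are
# placeholders for illustration.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search = SearchClient(
    endpoint="https://<your-search>.search.windows.net",
    index_name="docs",
    credential=AzureKeyCredential("<search-key>"),
)
llm = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<aoai-key>",
    api_version="2024-06-01",
)

question = "How do I rotate my API keys?"

# R: retrieve relevant chunks (keyword search here; vector/hybrid also possible)
results = search.search(search_text=question, top=3)
knowledge = "\n".join(doc["content"] for doc in results)

# A: augment the prompt with the retrieved knowledge
messages = [
    {"role": "system", "content": "Answer only from the provided sources.\n" + knowledge},
    {"role": "user", "content": question},
]

# G: generate a grounded answer
completion = llm.chat.completions.create(model="gpt-4o", messages=messages)
print(completion.choices[0].message.content)
```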
Prompt Flow for LLMOps!
Create and iteratively develop flow
• Create executable flows that link LLMs, prompts,
Python code and other tools together.
• Debug and iterate your flows, especially the interaction with LLMs, with ease.
Evaluate flow quality and performance
• Evaluate your flow's quality and performance with
larger datasets.
• Integrate the testing and evaluation into your
CI/CD system to ensure quality of your flow.
Streamlined development cycle for production
• Deploy your flow to the serving platform you
choose or integrate into your app's code base
easily.
• Collaborate with your team by leveraging the
cloud version of Prompt flow in Azure AI.
Code-first!
https://github.com/microsoft/promptflow
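For illustration, here is what a single Python tool node of a flow might look like, plus the kind of pf CLI calls used to test and batch-run it locally. This is a sketch: the import path and CLI flags are assumptions that vary between promptflow versions.

```python
# A minimal Python tool node for a Prompt Flow flow (sketch; the import path
# differs between promptflow versions, e.g. `from promptflow import tool`).
from promptflow.core import tool

@tool
def format_context(search_results: list, max_chars: int = 4000) -> str:
    """Concatenate retrieved chunks into the context passed to the LLM node."""
    joined = "\n---\n".join(item["content"] for item in search_results)
    return joined[:max_chars]

# Typical local workflow with the pf CLI (assumed syntax):
#   pf flow test --flow ./my_flow --inputs question="What is LLMOps?"
#   pf run create --flow ./my_flow --data ./eval_data.jsonl
```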
Create your LLM flows
Develop your LLM flow from scratch
• Orchestrate executable flows with LLMs,
prompts, Python tools, and other APIs through
a visualized graph and code-first experiences
• Add new files, edit existing files, and import files locally for authoring flows
• Set conditional controls for the execution of
any node in a flow
Manage API connections
Manage APIs and external data sources
• Seamless integration with pre-built LLMs like
Azure OpenAI Service, Mistral Large, the Llama
family, and the Phi family.
• Built-in safety system with Azure AI Content
Safety
• Effectively manage credentials or secrets for
APIs
• Create your own connections in Python tools
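As a sketch of how a connection keeps credentials out of flow code, a Python tool can receive a custom connection as a parameter; the import paths, attribute names, and CLI command below are assumptions to check against your promptflow version.

```python
# Sketch: a Python tool that reads endpoint and key from a Prompt Flow custom
# connection instead of hard-coding them (import paths/attributes assumed).
import requests
from promptflow.core import tool
from promptflow.connections import CustomConnection

@tool
def call_weather_api(city: str, conn: CustomConnection) -> dict:
    resp = requests.get(
        conn.configs["endpoint"],        # non-secret settings live in configs
        params={"q": city},
        headers={"Authorization": f"Bearer {conn.secrets['api_key']}"},  # secrets stay managed
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

# The connection itself is created once, e.g. (assumed CLI syntax):
#   pf connection create --file ./weather_connection.yaml
```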
Compare different prompt variants
Test different variants
• Create dynamic prompts using external data and few-shot samples
• Edit your complex prompts in full screen
• Quickly tune prompt and LLM configuration
with variants
• Run all variants with a single row of data and
check output
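Variants can also be exercised programmatically; a sketch assuming the promptflow SDK's PFClient and a node named summarize_node with two prompt variants (all names, and the exact SDK signatures, are assumptions):

```python
# Sketch: run the same flow and data against two prompt variants and inspect
# the row-level outputs. PFClient import path and signatures may vary by version.
from promptflow.client import PFClient

pf = PFClient()
runs = {}
for variant in ("variant_0", "variant_1"):
    runs[variant] = pf.run(
        flow="./my_flow",
        data="./eval_data.jsonl",
        variant="${summarize_node." + variant + "}",  # node_name.variant_name
    )

for variant, run in runs.items():
    print(f"--- {variant} ---")
    print(pf.get_details(run).head())   # outputs per input row, as a DataFrame
```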
Fine-tune models in the Azure AI model catalog
• Ready-to-use fine-tuning pipelines to get started quickly: no need to spend time installing frameworks and dependencies
• Optimizations to reduce fine-tuning resources and time
• Fine-tune using the UI, a Notebook (Python SDK), or the CLI (YAML)
• Serverless fine-tuning available in Models as a Service
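Whichever entry point you use, fine-tuning a chat model starts from training data, typically a JSONL file of chat-formatted examples. A small sketch of preparing such a file (the exact schema requirements depend on the model you fine-tune):

```python
# Prepare a chat-format JSONL training file, the usual input for fine-tuning
# chat models. Example content is illustrative only.
import json

examples = [
    {"q": "How do I reset my password?", "a": "Go to Settings > Security > Reset password."},
    {"q": "Where can I find my invoices?", "a": "Open Billing > Invoices in the portal."},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        record = {
            "messages": [
                {"role": "system", "content": "You are the Contoso support assistant."},
                {"role": "user", "content": ex["q"]},
                {"role": "assistant", "content": ex["a"]},
            ]
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# The resulting file is then referenced from the fine-tuning UI, the Python SDK,
# or the CLI/YAML pipeline mentioned above.
```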
LLM Lifecycle in the real world (diagram recap): ideating/exploring → building/augmenting → operationalizing → managing.
Code-first LLMOps for production with developer tools
Use code to define flows: file-based flows organized in a well-defined folder structure, with CLI/SDK support.
Smooth transition between cloud and local: download flows to local, import flows to the cloud; develop, test, debug, and deploy locally; submit runs from local to the cloud; manage runs and evaluations in the cloud.
Integrate with OSS frameworks: LangChain, Semantic Kernel, AutoGen.
Automate with CI/CD pipelines: SDK/CLI to init, execute, evaluate, and visualize flows and metrics; AZD template integration.
Local development with the VS Code extension: flow editor, local connection management, tracing and run history.
Collaboration on experiment management and productivity: submit flow runs to the cloud from your repo (anywhere), consume cloud resources (compute, data, storage, etc.), and transition iterative local development into a code base for version control.
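In a CI/CD pipeline the same SDK can act as a quality gate: run the app flow over an evaluation dataset, score it with an evaluation flow, and fail the build if metrics regress. A sketch assuming the promptflow SDK's PFClient and a metric named groundedness (names, signatures, and the threshold are assumptions):

```python
# CI quality gate sketch: batch-run the flow, evaluate the outputs, block the
# deployment if a metric drops below a threshold. Adapt names to your setup.
import sys

from promptflow.client import PFClient

pf = PFClient()

base_run = pf.run(flow="./chat_flow", data="./eval_data.jsonl")
eval_run = pf.run(
    flow="./eval_flow",                              # an evaluation flow producing metrics
    data="./eval_data.jsonl",
    run=base_run,                                    # evaluate the base run's outputs
    column_mapping={"answer": "${run.outputs.answer}"},
)

metrics = pf.get_metrics(eval_run)                   # e.g. {"groundedness": 4.3, ...}
print(metrics)

if metrics.get("groundedness", 0) < 4.0:             # hypothetical gate threshold
    sys.exit("Groundedness below threshold: blocking deployment")
```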
Generative AI monitoring
Helps you track and improve the performance of LLM applications in production.
Key benefits
• Utilize the Azure AI instrumentation SDK for effortless production data logging
• Enable monitoring of operational metrics (error rate, latency), token and cost metrics (token usage), and quality metrics (groundedness, coherence, etc.) for prompt flow deployments
• View extensive monitoring results for your prompt flow deployment within a comprehensive UI
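If you are not using the Azure AI instrumentation SDK, a generic fallback is to emit the same operational and token metrics as OpenTelemetry spans exported to Application Insights. A sketch using the azure-monitor-opentelemetry package (the connection string, attribute names, and the stubbed LLM call are placeholders):

```python
# Generic telemetry sketch: record latency and token usage for each LLM call as
# OpenTelemetry span attributes exported to Application Insights.
import time

from azure.monitor.opentelemetry import configure_azure_monitor
from opentelemetry import trace

configure_azure_monitor(connection_string="<app-insights-connection-string>")
tracer = trace.get_tracer("llm-app")

def answer(question: str) -> str:
    with tracer.start_as_current_span("chat_completion") as span:
        start = time.perf_counter()
        response_text, total_tokens = "stubbed answer", 123   # replace with a real LLM call
        span.set_attribute("gen_ai.usage.total_tokens", total_tokens)
        span.set_attribute("latency_seconds", time.perf_counter() - start)
        return response_text

print(answer("What is LLMOps?"))
```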
Mature LLMOps to accelerate Generative AI adoption
• Assess current maturity stage
• Review LLM lifecycle for your solution
• Pick the right tools
Thank you! I kindly prompt you:
+ Special Offer: Identify where you are in the LLMOps journey in 5 minutes!