SlideShare a Scribd company logo
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
2
AI 3-in-1: Agents, RAG, and Local Models
Presented by Brent Laster &
Tech Skills Transformations LLC
© 2025 Brent C. Laster & Tech Skills Transformations LLC
All rights reserved
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
3
About me
• Founder, Tech Skills Transformations LLC
• https://getskillsnow.com
• info@getskillsnow.com
• Long career in corporate as dev, manager,
and director in DevOps and other areas
• Author
• O'Reilly "reports"
• Books
• Professional Git
• Jenkins 2 – Up and
Running
• Learning GitHub
Actions
• Learning GitHub
Copilot
• AI-Enabled SDLC
• Speaker
• Social media
q LinkedIn: brentlaster
q X: @BrentCLaster
q Bluesky:
brentclaster.bsky.social
q GitHub: brentlaster
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
4
|
Running models locally
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
5
Why run models locally?
• Privacy - no need to share data
• Gives you control over setup, configuration, and customization options
§ Can tailor LLM to your needs, experiment with settings, integrate into your infra
• Can easily swap between different models for different tasks
• Work in offline mode
• Cost savings
§ No charges for subscriptions or API calls
• No censoring of results
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
6
Where to get models +
http://huggingface.co/models
http://kaggle.com/models
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
7
Options for running LLMs locally
• GPT4All - https://github.com/nomic-
ai/gpt4all
• LM Studio - https://lmstudio.ai
• Jan AI - https://jan.ai
• llama.cpp -
https://github.com/ggerganov/llama.cpp
• LlamaFile - https://github.com/Mozilla-
Ocho/llamafile
• Ollama - https://ollama.com/
• HuggingFace Transformers -
https://huggingface.co/docs/transformers
• More!
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
8
Ollama
• Command line tool for downloading,
exploring and using LLMs on local machine
• open source
• supports most of Hugging Face's popular
models
• allows uploading new ones
• Links:
§ main site: https://ollama.com
§ GitHub: https://github.com/ollama/
• Advantages
§ speeds up and simplifies
» model selection and download
» configuring endpoints
» integration with Python or JavaScript codebase
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
9
Working with Ollama #1
llama3.2
ollama pull
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
10
Working with Ollama #2
llama3.2
ollama run
>>> query
>>> Briefly explain what
an AI model is
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
12
Working with Ollama #3
llama3.2
ollama serve
http://localhost:11434/v1
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
13
|
Demo #1 – Simple program to work with local model
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
14
|
Agents
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
15
What is an AI Agent?
• A system that operates within an environment by using sensors to
perceive information, a decision-making mechanism to process and
reason about the data, and actuators to take actions that influence or
update/respond to the environment
• This interaction enables the agent to achieve specific goals
autonomously while continuously learning and adapting over time
• Agents use LLMs to identify key data, drive decisions, and communicate
naturally
User
LLM
Prompt “how to
think”
Tools +
Memory
Relevant
data and
decisions
about what
to do next
Response
and/or
action in
environment
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
16
Architectural Features of AI Agents
• AI autonomously outlines and
executes a logical series of
steps for accomplishing a
given objective.
• Provides the AI with a way to
dynamically adapt its
approach based on real-time
data and feedback..
• Might employ reflection to
evaluate and improve
responses
• Example: A research agent
plans search → summarize →
generate report.
• AI agents interact with
external APIs, databases,
and functions.
• Enhances LLMs by
providing access to real-
world knowledge.
• Reduces hallucinations
by using retrieval-
augmented generation
(RAG).
• Example: Calling a
Python function to
perform complex
calculations.
• Short-term handles tasks;
long term stores knowledge
and experience
• Memory ensures
consistency and efficiency in
multi-step decisions
• Memory recalls preferences
to enhance personalization
and user experience
• Example: Storing user
preferences for future
reference or personalized
responses
Planning Tool Use Memory
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
17
Agent Example
LLM
AI Agent
Weather
Search Tool
Initialize LLM with prompt
system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your primary
goal is to provide precise, helpful, and clear responses.
You have access to the following tools:
Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude: float,
Outputs: string
You should think step by step in order to fulfill the objective with a reasoning process divided into
Thought/Action/Observation. This cycle can repeat multiple times if needed.
You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a tool
with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with the prefix
“Final Answer:”“””
system_message=“””You are an AI assistant designed to help users
accurately and efficiently. Your primary goal is to provide precise, helpful,
and clear responses.
You have access to the following tools:
Tool Name: find_weather, Description: Get weather for a location.,
Arguments: latitude: float, longitude: float, Outputs: string
You should think step by step in order to fulfill the objective with a reasoning
process divided into Thought/Action/Observation. This cycle can repeat
multiple times if needed.
You should first reflect with“Thought: {your_thoughts}”on the current
query, then (if necessary), call a tool with the proper JSON formatting
“Action: {JSON_BLOB}”, or else print your final answer starting with the
prefix“Final Answer:”“””
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
18
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
Chain of Thought – Step 1: Interpret User Query
Thought: ”The user is asking about the weather
in Paris. I need to extract ’Paris’ as the location.
Action: Extracted location = “Paris”
AI Agent
system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your
primary goal is to provide precise, helpful, and clear responses.
You have access to the following tools:
Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude:
float, Outputs: string
You should think step by step in order to fulfill the objective with a reasoning process divided into
Thought/Action/Observation. This cycle can repeat multiple times if needed.
You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a
tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with
the prefix “Final Answer:”“””
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
19
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
Chain of Thought – Step 2: Decide to use tool
Thought: ”I need real-time data, so I will call
the ‘find_weather’ tool. First, I need to get the
latitude and longitude for the tool call.
AIResponse(
tool_calls=[{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}]
)
system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your
primary goal is to provide precise, helpful, and clear responses.
You have access to the following tools:
Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude:
float, Outputs: string
You should think step by step in order to fulfill the objective with a reasoning process divided into
Thought/Action/Observation. This cycle can repeat multiple times if needed.
You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a
tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with
the prefix “Final Answer:”“””
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
20
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
AIResponse(
tool_calls=[{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}]
)
{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}
Agent parses LLM output
identifies JSON tool call,
parses it, forms it into
actual tool call
system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your
primary goal is to provide precise, helpful, and clear responses.
You have access to the following tools:
Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude:
float, Outputs: string
You should think step by step in order to fulfill the objective with a reasoning process divided into
Thought/Action/Observation. This cycle can repeat multiple times if needed.
You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a
tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with
the prefix “Final Answer:”“””
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
21
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
AIResponse(
tool_calls=[{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}]
)
{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}
system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your
primary goal is to provide precise, helpful, and clear responses.
You have access to the following tools:
Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude:
float, Outputs: string
You should think step by step in order to fulfill the objective with a reasoning process divided into
Thought/Action/Observation. This cycle can repeat multiple times if needed.
You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a
tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with
the prefix “Final Answer:”“””
Agent executes tool call
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
22
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
Weather tool returns result
ToolResponse(
content=“53 and
rainy”,
name=“find_weather”,
tool_invoke_id:
“call_tool123”
)
AIResponse(
tool_calls=[{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}]
)
{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}
system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your
primary goal is to provide precise, helpful, and clear responses.
You have access to the following tools:
Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude:
float, Outputs: string
You should think step by step in order to fulfill the objective with a reasoning process divided into
Thought/Action/Observation. This cycle can repeat multiple times if needed.
You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a
tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with
the prefix “Final Answer:”“””
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
23
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
ToolResponse(
content=“53 and
rainy”,
name=“find_weather”,
tool_invoke_id:
“call_tool123”
)
AIResponse(
tool_calls=[{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}]
)
{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}
system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your
primary goal is to provide precise, helpful, and clear responses.
You have access to the following tools:
Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude:
float, Outputs: string
You should think step by step in order to fulfill the objective with a reasoning process divided into
Thought/Action/Observation. This cycle can repeat multiple times if needed.
You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a
tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with
the prefix “Final Answer:”“””
Agent includes tool
output in
message/prompt back
to model
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
24
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
ToolResponse(
content=“53 and
rainy”,
name=“find_weather”,
tool_invoke_id:
“call_tool123”
)
Chain of Thought – Step 3 : Interpret JSON Response
Thought: ”The tool returned weather data for Paris. I
will summarize the information concisely.
AIResponse(
tool_calls=[{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}]
)
{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}
system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your
primary goal is to provide precise, helpful, and clear responses.
You have access to the following tools:
Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude:
float, Outputs: string
You should think step by step in order to fulfill the objective with a reasoning process divided into
Thought/Action/Observation. This cycle can repeat multiple times if needed.
You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a
tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with
the prefix “Final Answer:”“””
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
25
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
ToolResponse(
content=“53 and
rainy”,
name=“find_weather”,
tool_invoke_id:
“call_tool123”
)
AIFinalResponse(
content=“The current
weather in Paris is 53
degrees Celsius with
light rain.”
)
AIResponse(
tool_calls=[{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}]
)
{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}
system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your
primary goal is to provide precise, helpful, and clear responses.
You have access to the following tools:
Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude:
float, Outputs: string
You should think step by step in order to fulfill the objective with a reasoning process divided into
Thought/Action/Observation. This cycle can repeat multiple times if needed.
You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a
tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with
the prefix “Final Answer:”“””
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
26
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
AIResponse(
tool_calls=[{
name: “find_weather”
parameters: {
location: “Paris”,
},
id: “call_tool123”,
type: “tool_invoke”
}]
)
ToolResponse(
content=“53 and
rainy”,
name=“find_weather”,
tool_invoke_id:
“call_tool123”
)
AIFinalResponse(
content=“The current
weather in Paris is 53
degrees Celsius with
light rain.”
)
system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your
primary goal is to provide precise, helpful, and clear responses.
You have access to the following tools:
Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude:
float, Outputs: string
You should think step by step in order to fulfill the objective with a reasoning process divided into
Thought/Action/Observation. This cycle can repeat multiple times if needed.
You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a
tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with
the prefix “Final Answer:”“””
{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
27
|
Demo #2 – Adding agency to our code
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
28
|
RAG
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
29
What is RAG and how does it work?
Source: https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/
• Combination of retrieval and generation: RAG combines information retrieval (like a search engine) with text generation (like a
language model).
• Uses external knowledge: Instead of relying solely on pre-trained knowledge, RAG retrieves relevant documents or data from an
external source (like a database or private knowledge bases) to generate more accurate and up-to-date responses.
• Improves factual accuracy: By pulling in real-time data or documents, RAG reduces the risk of generating factually incorrect or
outdated information.
• Two-step process:
• Retrieve: The model searches for relevant information from a knowledge source.
• Generate: It then uses the retrieved data to create a coherent, contextually accurate answer.
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
30
How is RAG setup?
Retrieve
Documents
Embedding
Model Data Store
Documents /
Knowledge Base
Document
Embeddings
Doc Ingestion and Retrieval
• You provide data sources and point application to them
• Info is retrieved from the data sources and tokenized, embedded and stored in a data store
• For queries/prompts, application gathers results (most relevant ones) from the vector
database with your data
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
31
Embeddings
• Embeddings represent text as sets of numeric data - tensors (lots of
dimensions)
• Each dimension stores some info about the text's meaning, context,
or syntactical aspects
• Words or sentences with similar meanings are stored closer together
in the vector space
§ If two pieces of text are similar syntactically, they will have
similar embeddings (smaller distance between their vectors)
• During training, models learn to place text with similar meanings
closer together in the embedding space
• Common pre-trained models used for generating embeddings
include BERT and variants (RoBERTa, DistilBERT)
• Once you have embeddings, you can use them for NLP tasks like
semantic search, text classification, sentiment analysis
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
32
Understanding vectors in AI
• Collection of data points that encapsulate an item's relationship to
other items
Distance to Raleigh, NC USA
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
33
Distance to Raleigh, NC USA
3,960
Understanding vectors in AI
• Collection of data points that encapsulate an item's relationship to
other items
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
34
Distance to Raleigh, NC USA
3,960
6,609
Understanding vectors in AI
• Collection of data points that encapsulate an item's relationship to
other items
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
35
Distance to Raleigh, NC USA
3,960
6,609
2,839.4
Understanding vectors in AI
• Collection of data points that encapsulate an item's relationship to
other items
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
36
Distance to Raleigh, NC USA
3,960
6,609
2,839.4
6,001
Understanding vectors in AI
• Collection of data points that encapsulate an item's relationship to
other items
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
37
Distance to Raleigh, NC USA
3,960
6,609
2,839.4
6,001
507.6
Understanding vectors in AI
• Collection of data points that encapsulate an item's relationship to
other items
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
38
Distance to Raleigh, NC USA
3,960
6,609
2,839.4
6,001
507.6
3,872
Understanding vectors in AI
• Collection of data points that encapsulate an item's relationship to
other items
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
39
Distance to Raleigh, NC USA
3,960
6,609
2,839.4
6,001
507.6
3,872
7,679
Understanding vectors in AI
• Collection of data points that encapsulate an item's relationship to
other items
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
40
Distance to Raleigh, NC USA
3,960
6,609
2,839.4
6,001
507.6
3,872
7,679
2,870.1
Understanding vectors in AI
• Collection of data points that encapsulate an item's relationship to
other items
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
41
Semantic meaning / relationships
• Suppose we have 3
words
• King and Queen are
more similar to each
other than they are to
lunch
• In order for neural net to
understand the
relationships, each word
needs to be represented
as a vector
• Suppose each word is
represented by a 2-
dimensional vector
King
Queen
Lunch
- 130.16
89.5
- 115.43
95.2
- 89.5
34.3
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
42
Embedding space
• Plotting in 2-dimensional embedding space
shows relationships
• Way to let NN understand relationships
between words
• We want the NN to learn that King and
Queen are more similar to each other than
they are to lunch
2-dimensional space for word embeddings
Dimension
2
40
50
60
70
80
90
100
-140 -130 -120 -110 -100 -90 -80
Dimension 1
King
Queen
Lunch
- 130.16
89.5
- 115.43
95.2
- 89.5
34.3
King
Queen
Lunch
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
43
Searching for Vectors - similarity metrics
• 3 metrics commonly used to determine similarity of two vectors (2-dimensional representation)
Cosine similarity - measure the angle between two vectors; values from -
1 to 1; 1 = both point in same direction; -1 point in opposite directions; 0 =
orthogonal (perpendicular)
Dot product / inner product - measures how well 2 vectors align with
each other; values from - ∞ to ∞; positive values indicate vectors are in
same direction; negative values indicate opposite directions; 0 = orthogonal
Euclidean distance - measures the distance between two vectors; values
from 0 to ∞; 0 = identical; larger numbers farther apart credit: https://towardsdatascience.com/similarity-metrics-in-nlp-acc0777e234c
imagine 3 vectors - a,b,c
Cosine similarity
Dot product / inner product
Euclidean distance
0.0141
0.0167
0.9998
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
44
Visualizing Embeddings and Vector Similarity
source: https://projector.tensorflow.org/?config=https://gist.githubusercontent.com/martin-
labrecque/4483ff5a104f0b56417585c3bc9a12f1/raw/57348e12a70c8d70c2c573d3dbc0122ac077556b/journaux_config.json
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
45
Vectors and relationships example
• Query - what words are related to "dog" in model "English Wikipedia"?
Source: http://vectors.nlpl.eu/explore/embeddings/en/MOD_enwiki_upos_skipgram_300_2_2021/dog_NOUN/
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
46
|
Vector Databases
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
47
Vector Databases
• Specialized database that index and
stores vector embeddings
• Useful for
§ fast retrieval
§ similarity search
• Offer comprehensive data management
capabilities
§ metadata storage
§ filtering
§ dynamic querying based on associate
metadata
• Scalable and can handle large volumes
of vector data
• Support real-time updates
• Play key role in AI and ML applications
Vector Database
Vector Database
Vector Database
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
48
How data gets into Vector Databases
0.1, 1.2, ..., ..., - 0.5, 3.17
-0.57, 1.0, ..., ..., 2.15,1.1
1.1, 0.74, ..., ..., - 0.2, 1.7
2.1, 0.12, ..., ..., -1.50, 0.3
0.6, -0.71, ..., ..., 0.35, -1.2
1.1, -2.15, ..., ..., 2.1, 0.35
0.4, 0.36, ..., ..., -0.7, -2.45
• Data is input, converted to embeddings (vectors) and stored
• Queries are input, converted to embeddings (vectors) and then similarity metrics are used to find results ("nearest neighbors")
Vector Database
Audio
Images
Documents
embedding models
NLP Transformer
Image Transformer
Audio Transformer
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
49
How does RAG work?
Embedding
Model Data Store
LLM
User
Interface
Prompt
Document
Embeddings
embedded
query Prompt + enhanced
context
response (generative)
User Query and Response Generation
Prompt
Prompt
Original prompt +
matching "docs" (aka
"enhanced context")
LLM Response
-----------
------------------
---------
-------------
---------------------
• For queries/prompts, application gathers
results (most relevant ones) from the
vector database with your data
• Adds results to your regular LLM
query/prompt
• Asks the LLM to answer based on the
augmented/enriched query/prompt
• NOTE: Items returned via RAG search are
existing items from the data store, not
generated content
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
50
|
Demo #3 – Adding RAG to our code
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
51
DIY – github.com/brentlaster/3in1
• Fork if desired
• Click on button in README
to start codespace
• Follow guide.md
techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
52
That’s all - thanks!
techskillstransformations.com
getskillsnow.com
Contact: training@getskillsnow.com
qLinkedIn: brentlaster
qX: @BrentCLaster
qBluesky: brentclaster.bsky.social
qGitHub: brentlaster

More Related Content

PDF
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
PPTX
AWS Lake Formation Deep Dive
PPT
Managed Services Marketing
PDF
Getting Started: Data Factory in Microsoft Fabric (Microsoft Fabric Community...
PPTX
Data Lakehouse, Data Mesh, and Data Fabric (r1)
PPTX
Introduction to AWS Lake Formation.pptx
PPTX
unleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptx
PDF
Weaviate Air #3 - New in AI segment.pdf
Enterprise Integration Is Dead! Long Live AI-Driven Integration with Apache C...
AWS Lake Formation Deep Dive
Managed Services Marketing
Getting Started: Data Factory in Microsoft Fabric (Microsoft Fabric Community...
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Introduction to AWS Lake Formation.pptx
unleshing the the Power Azure Open AI - MCT Summit middle east 2024 Riyhad.pptx
Weaviate Air #3 - New in AI segment.pdf

What's hot (20)

PPTX
Getting your enterprise ready for Microsoft 365 Copilot
PPTX
Real-time personalization at scale by Salesforce CDP and Interaction Studio, ...
PDF
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
PDF
MLflow Model Serving
PPTX
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
PPTX
Capgemini Cloud Assessment - A Pathway to Enterprise Cloud Migration
PDF
AWS Data Analytics on AWS
PDF
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
PPTX
Data Lakehouse, Data Mesh, and Data Fabric (r2)
PDF
PDF
Mainframe Integration, Offloading and Replacement with Apache Kafka | Kai Wae...
PDF
Data Warehouse or Data Lake, Which Do I Choose?
PDF
stackconf 2022: Introduction to Vector Search with Weaviate
PDF
Introducing JIRA Service Desk
PPTX
Azure Synapse Analytics Overview (r1)
PDF
Leveraging the Power of Conversational AI for ITSM
PDF
Speed up data preparation for ML pipelines on AWS
PPSX
Dynamics 365
PDF
Microsoft Defender and Azure Sentinel
PPTX
Benefits of the Azure cloud
Getting your enterprise ready for Microsoft 365 Copilot
Real-time personalization at scale by Salesforce CDP and Interaction Studio, ...
Microsoft Build 2023 Updates – Copilot Stack and Azure OpenAI Service (Machin...
MLflow Model Serving
글로벌 기업들의 효과적인 데이터 분석을 위한 Data Lake 구축 및 분석 사례 - 김준형 (AWS 솔루션즈 아키텍트)
Capgemini Cloud Assessment - A Pathway to Enterprise Cloud Migration
AWS Data Analytics on AWS
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi...
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Mainframe Integration, Offloading and Replacement with Apache Kafka | Kai Wae...
Data Warehouse or Data Lake, Which Do I Choose?
stackconf 2022: Introduction to Vector Search with Weaviate
Introducing JIRA Service Desk
Azure Synapse Analytics Overview (r1)
Leveraging the Power of Conversational AI for ITSM
Speed up data preparation for ML pipelines on AWS
Dynamics 365
Microsoft Defender and Azure Sentinel
Benefits of the Azure cloud
Ad

Similar to AI 3-in-1: Agents, RAG, and Local Models - Brent Laster (20)

PDF
Gen AI: AI Agents - Making LLMs work together in an organized way - Brent Las...
PDF
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
PDF
clicks2conversations.pdf
PPTX
Code instrumentation
PPTX
#SPFestSEA Introduction to #MicrosoftGraph
PPTX
Machine Learning with GraphLab Create
DOC
RamaRaju_Profile
PPTX
Transferring Software Testing Tools to Practice
PPTX
Cloudera Data Science Challenge
PPTX
Data Science Challenge presentation given to the CinBITools Meetup Group
PDF
Are API Services Taking Over All the Interesting Data Science Problems?
PPTX
#SPSOttawa introduction to the #microsoftGraph
PDF
Sacrificing the golden calf of "coding"
PDF
TechSEO Boost 2021 - Rendering Strategies: Measuring the Devil’s Details in C...
PDF
Should I Bug You? Identifying Domain Experts in Software Projects Using Code...
PPTX
SF Architect Interview questions v1.3.pptx
PDF
System Design Interview - from both sides of the table.pdf
PPTX
Recruiting for Drupal #Hiring
PPT
Integris Security - Hacking With Glue ℠
Gen AI: AI Agents - Making LLMs work together in an organized way - Brent Las...
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
clicks2conversations.pdf
Code instrumentation
#SPFestSEA Introduction to #MicrosoftGraph
Machine Learning with GraphLab Create
RamaRaju_Profile
Transferring Software Testing Tools to Practice
Cloudera Data Science Challenge
Data Science Challenge presentation given to the CinBITools Meetup Group
Are API Services Taking Over All the Interesting Data Science Problems?
#SPSOttawa introduction to the #microsoftGraph
Sacrificing the golden calf of "coding"
TechSEO Boost 2021 - Rendering Strategies: Measuring the Devil’s Details in C...
Should I Bug You? Identifying Domain Experts in Software Projects Using Code...
SF Architect Interview questions v1.3.pptx
System Design Interview - from both sides of the table.pdf
Recruiting for Drupal #Hiring
Integris Security - Hacking With Glue ℠
Ad

More from All Things Open (20)

PDF
Agentic AI for Developers and Data Scientists Build an AI Agent in 10 Lines o...
PPTX
Big Data on a Small Budget: Scalable Data Visualization for the Rest of Us - ...
PDF
Let's Create a GitHub Copilot Extension! - Nick Taylor, Pomerium
PDF
Leveraging Pre-Trained Transformer Models for Protein Function Prediction - T...
PDF
You Don't Need an AI Strategy, But You Do Need to Be Strategic About AI - Jes...
PPTX
DON’T PANIC: AI IS COMING – The Hitchhiker’s Guide to AI - Mark Hinkle, Perip...
PDF
Fine-Tuning Large Language Models with Declarative ML Orchestration - Shivay ...
PDF
Leveraging Knowledge Graphs for RAG: A Smarter Approach to Contextual AI Appl...
PPTX
Artificial Intelligence Needs Community Intelligence - Sriram Raghavan, IBM R...
PDF
Don't just talk to AI, do more with AI: how to improve productivity with AI a...
PPTX
Open-Source GenAI vs. Enterprise GenAI: Navigating the Future of AI Innovatio...
PDF
The Death of the Browser - Rachel-Lee Nabors, AgentQL
PDF
Making Operating System updates fast, easy, and safe
PDF
Reshaping the landscape of belonging to transform community
PDF
The Unseen, Underappreciated Security Work Your Maintainers May (or may not) ...
PDF
Integrating Diversity, Equity, and Inclusion into Product Design
PDF
The Open Source Ecosystem for eBPF in Kubernetes
PDF
Open Source Privacy-Preserving Metrics - Sarah Gran & Brandon Pitman
PDF
Open-Source Low-Code - Craig St. Jean, Xebia
PDF
How I Learned to Stop Worrying about my Infrastructure and Love [Open]Tofu
Agentic AI for Developers and Data Scientists Build an AI Agent in 10 Lines o...
Big Data on a Small Budget: Scalable Data Visualization for the Rest of Us - ...
Let's Create a GitHub Copilot Extension! - Nick Taylor, Pomerium
Leveraging Pre-Trained Transformer Models for Protein Function Prediction - T...
You Don't Need an AI Strategy, But You Do Need to Be Strategic About AI - Jes...
DON’T PANIC: AI IS COMING – The Hitchhiker’s Guide to AI - Mark Hinkle, Perip...
Fine-Tuning Large Language Models with Declarative ML Orchestration - Shivay ...
Leveraging Knowledge Graphs for RAG: A Smarter Approach to Contextual AI Appl...
Artificial Intelligence Needs Community Intelligence - Sriram Raghavan, IBM R...
Don't just talk to AI, do more with AI: how to improve productivity with AI a...
Open-Source GenAI vs. Enterprise GenAI: Navigating the Future of AI Innovatio...
The Death of the Browser - Rachel-Lee Nabors, AgentQL
Making Operating System updates fast, easy, and safe
Reshaping the landscape of belonging to transform community
The Unseen, Underappreciated Security Work Your Maintainers May (or may not) ...
Integrating Diversity, Equity, and Inclusion into Product Design
The Open Source Ecosystem for eBPF in Kubernetes
Open Source Privacy-Preserving Metrics - Sarah Gran & Brandon Pitman
Open-Source Low-Code - Craig St. Jean, Xebia
How I Learned to Stop Worrying about my Infrastructure and Love [Open]Tofu

Recently uploaded (20)

PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PPTX
Big Data Technologies - Introduction.pptx
PDF
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
PDF
Empathic Computing: Creating Shared Understanding
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
Network Security Unit 5.pdf for BCA BBA.
PPT
Teaching material agriculture food technology
PPTX
Cloud computing and distributed systems.
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Advanced Soft Computing BINUS July 2025.pdf
PDF
Machine learning based COVID-19 study performance prediction
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPTX
MYSQL Presentation for SQL database connectivity
PDF
cuic standard and advanced reporting.pdf
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Unlocking AI with Model Context Protocol (MCP)
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Reach Out and Touch Someone: Haptics and Empathic Computing
Diabetes mellitus diagnosis method based random forest with bat algorithm
Big Data Technologies - Introduction.pptx
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
Empathic Computing: Creating Shared Understanding
NewMind AI Weekly Chronicles - August'25 Week I
Network Security Unit 5.pdf for BCA BBA.
Teaching material agriculture food technology
Cloud computing and distributed systems.
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Advanced Soft Computing BINUS July 2025.pdf
Machine learning based COVID-19 study performance prediction
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
MYSQL Presentation for SQL database connectivity
cuic standard and advanced reporting.pdf
The AUB Centre for AI in Media Proposal.docx
Per capita expenditure prediction using model stacking based on satellite ima...
Unlocking AI with Model Context Protocol (MCP)

AI 3-in-1: Agents, RAG, and Local Models - Brent Laster

  • 1. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 2 AI 3-in-1: Agents, RAG, and Local Models Presented by Brent Laster & Tech Skills Transformations LLC © 2025 Brent C. Laster & Tech Skills Transformations LLC All rights reserved
  • 2. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 3 About me • Founder, Tech Skills Transformations LLC • https://getskillsnow.com • info@getskillsnow.com • Long career in corporate as dev, manager, and director in DevOps and other areas • Author • O'Reilly "reports" • Books • Professional Git • Jenkins 2 – Up and Running • Learning GitHub Actions • Learning GitHub Copilot • AI-Enabled SDLC • Speaker • Social media q LinkedIn: brentlaster q X: @BrentCLaster q Bluesky: brentclaster.bsky.social q GitHub: brentlaster
  • 3. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 4 | Running models locally
  • 4. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 5 Why run models locally? • Privacy - no need to share data • Gives you control over setup, configuration, and customization options § Can tailor LLM to your needs, experiment with settings, integrate into your infra • Can easily swap between different models for different tasks • Work in offline mode • Cost savings § No charges for subscriptions or API calls • No censoring of results
  • 5. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 6 Where to get models + http://huggingface.co/models http://kaggle.com/models
  • 6. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 7 Options for running LLMs locally • GPT4All - https://github.com/nomic- ai/gpt4all • LM Studio - https://lmstudio.ai • Jan AI - https://jan.ai • llama.cpp - https://github.com/ggerganov/llama.cpp • LlamaFile - https://github.com/Mozilla- Ocho/llamafile • Ollama - https://ollama.com/ • HuggingFace Transformers - https://huggingface.co/docs/transformers • More!
  • 7. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 8 Ollama • Command line tool for downloading, exploring and using LLMs on local machine • open source • supports most of Hugging Face's popular models • allows uploading new ones • Links: § main site: https://ollama.com § GitHub: https://github.com/ollama/ • Advantages § speeds up and simplifies » model selection and download » configuring endpoints » integration with Python or JavaScript codebase
  • 8. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 9 Working with Ollama #1 llama3.2 ollama pull
  • 9. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 10 Working with Ollama #2 llama3.2 ollama run >>> query >>> Briefly explain what an AI model is
  • 10. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 12 Working with Ollama #3 llama3.2 ollama serve http://localhost:11434/v1
  • 11. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 13 | Demo #1 – Simple program to work with local model
  • 12. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 14 | Agents
  • 13. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 15 What is an AI Agent? • A system that operates within an environment by using sensors to perceive information, a decision-making mechanism to process and reason about the data, and actuators to take actions that influence or update/respond to the environment • This interaction enables the agent to achieve specific goals autonomously while continuously learning and adapting over time • Agents use LLMs to identify key data, drive decisions, and communicate naturally User LLM Prompt “how to think” Tools + Memory Relevant data and decisions about what to do next Response and/or action in environment
  • 14. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 16 Architectural Features of AI Agents • AI autonomously outlines and executes a logical series of steps for accomplishing a given objective. • Provides the AI with a way to dynamically adapt its approach based on real-time data and feedback.. • Might employ reflection to evaluate and improve responses • Example: A research agent plans search → summarize → generate report. • AI agents interact with external APIs, databases, and functions. • Enhances LLMs by providing access to real- world knowledge. • Reduces hallucinations by using retrieval- augmented generation (RAG). • Example: Calling a Python function to perform complex calculations. • Short-term handles tasks; long term stores knowledge and experience • Memory ensures consistency and efficiency in multi-step decisions • Memory recalls preferences to enhance personalization and user experience • Example: Storing user preferences for future reference or personalized responses Planning Tool Use Memory
  • 15. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 17 Agent Example LLM AI Agent Weather Search Tool Initialize LLM with prompt system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your primary goal is to provide precise, helpful, and clear responses. You have access to the following tools: Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude: float, Outputs: string You should think step by step in order to fulfill the objective with a reasoning process divided into Thought/Action/Observation. This cycle can repeat multiple times if needed. You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with the prefix “Final Answer:”“”” system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your primary goal is to provide precise, helpful, and clear responses. You have access to the following tools: Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude: float, Outputs: string You should think step by step in order to fulfill the objective with a reasoning process divided into Thought/Action/Observation. This cycle can repeat multiple times if needed. You should first reflect with“Thought: {your_thoughts}”on the current query, then (if necessary), call a tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with the prefix“Final Answer:”“””
  • 16. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 18 Agent Example User What’s the weather in Paris? LLM Weather Search Tool Chain of Thought – Step 1: Interpret User Query Thought: ”The user is asking about the weather in Paris. I need to extract ’Paris’ as the location. Action: Extracted location = “Paris” AI Agent system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your primary goal is to provide precise, helpful, and clear responses. You have access to the following tools: Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude: float, Outputs: string You should think step by step in order to fulfill the objective with a reasoning process divided into Thought/Action/Observation. This cycle can repeat multiple times if needed. You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with the prefix “Final Answer:”“””
  • 17. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 19 Agent Example User What’s the weather in Paris? LLM Weather Search Tool AI Agent Chain of Thought – Step 2: Decide to use tool Thought: ”I need real-time data, so I will call the ‘find_weather’ tool. First, I need to get the latitude and longitude for the tool call. AIResponse( tool_calls=[{ name: “find_weather” parameters: { latitude: “48.8566”, longitude: “2.3522”, }, id: “call_tool123”, type: “tool_invoke” }] ) system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your primary goal is to provide precise, helpful, and clear responses. You have access to the following tools: Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude: float, Outputs: string You should think step by step in order to fulfill the objective with a reasoning process divided into Thought/Action/Observation. This cycle can repeat multiple times if needed. You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with the prefix “Final Answer:”“””
  • 18. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 20 Agent Example User What’s the weather in Paris? LLM Weather Search Tool AI Agent AIResponse( tool_calls=[{ name: “find_weather” parameters: { latitude: “48.8566”, longitude: “2.3522”, }, id: “call_tool123”, type: “tool_invoke” }] ) { name: “find_weather” parameters: { latitude: “48.8566”, longitude: “2.3522”, }, id: “call_tool123”, type: “tool_invoke” } Agent parses LLM output identifies JSON tool call, parses it, forms it into actual tool call system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your primary goal is to provide precise, helpful, and clear responses. You have access to the following tools: Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude: float, Outputs: string You should think step by step in order to fulfill the objective with a reasoning process divided into Thought/Action/Observation. This cycle can repeat multiple times if needed. You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with the prefix “Final Answer:”“””
  • 19. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 21 Agent Example User What’s the weather in Paris? LLM Weather Search Tool AI Agent AIResponse( tool_calls=[{ name: “find_weather” parameters: { latitude: “48.8566”, longitude: “2.3522”, }, id: “call_tool123”, type: “tool_invoke” }] ) { name: “find_weather” parameters: { latitude: “48.8566”, longitude: “2.3522”, }, id: “call_tool123”, type: “tool_invoke” } system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your primary goal is to provide precise, helpful, and clear responses. You have access to the following tools: Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude: float, Outputs: string You should think step by step in order to fulfill the objective with a reasoning process divided into Thought/Action/Observation. This cycle can repeat multiple times if needed. You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with the prefix “Final Answer:”“”” Agent executes tool call
  • 20. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 22 Agent Example User What’s the weather in Paris? LLM Weather Search Tool AI Agent Weather tool returns result ToolResponse( content=“53 and rainy”, name=“find_weather”, tool_invoke_id: “call_tool123” ) AIResponse( tool_calls=[{ name: “find_weather” parameters: { latitude: “48.8566”, longitude: “2.3522”, }, id: “call_tool123”, type: “tool_invoke” }] ) { name: “find_weather” parameters: { latitude: “48.8566”, longitude: “2.3522”, }, id: “call_tool123”, type: “tool_invoke” } system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your primary goal is to provide precise, helpful, and clear responses. You have access to the following tools: Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude: float, Outputs: string You should think step by step in order to fulfill the objective with a reasoning process divided into Thought/Action/Observation. This cycle can repeat multiple times if needed. You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with the prefix “Final Answer:”“””
  • 21. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 23 Agent Example User What’s the weather in Paris? LLM Weather Search Tool AI Agent ToolResponse( content=“53 and rainy”, name=“find_weather”, tool_invoke_id: “call_tool123” ) AIResponse( tool_calls=[{ name: “find_weather” parameters: { latitude: “48.8566”, longitude: “2.3522”, }, id: “call_tool123”, type: “tool_invoke” }] ) { name: “find_weather” parameters: { latitude: “48.8566”, longitude: “2.3522”, }, id: “call_tool123”, type: “tool_invoke” } system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your primary goal is to provide precise, helpful, and clear responses. You have access to the following tools: Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude: float, Outputs: string You should think step by step in order to fulfill the objective with a reasoning process divided into Thought/Action/Observation. This cycle can repeat multiple times if needed. You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with the prefix “Final Answer:”“”” Agent includes tool output in message/prompt back to model
  • 22. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 24 Agent Example User What’s the weather in Paris? LLM Weather Search Tool AI Agent ToolResponse( content=“53 and rainy”, name=“find_weather”, tool_invoke_id: “call_tool123” ) Chain of Thought – Step 3 : Interpret JSON Response Thought: ”The tool returned weather data for Paris. I will summarize the information concisely. AIResponse( tool_calls=[{ name: “find_weather” parameters: { latitude: “48.8566”, longitude: “2.3522”, }, id: “call_tool123”, type: “tool_invoke” }] ) { name: “find_weather” parameters: { latitude: “48.8566”, longitude: “2.3522”, }, id: “call_tool123”, type: “tool_invoke” } system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your primary goal is to provide precise, helpful, and clear responses. You have access to the following tools: Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude: float, Outputs: string You should think step by step in order to fulfill the objective with a reasoning process divided into Thought/Action/Observation. This cycle can repeat multiple times if needed. You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with the prefix “Final Answer:”“””
  • 23. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 25 Agent Example User What’s the weather in Paris? LLM Weather Search Tool AI Agent ToolResponse( content=“53 and rainy”, name=“find_weather”, tool_invoke_id: “call_tool123” ) AIFinalResponse( content=“The current weather in Paris is 53 degrees Celsius with light rain.” ) AIResponse( tool_calls=[{ name: “find_weather” parameters: { latitude: “48.8566”, longitude: “2.3522”, }, id: “call_tool123”, type: “tool_invoke” }] ) { name: “find_weather” parameters: { latitude: “48.8566”, longitude: “2.3522”, }, id: “call_tool123”, type: “tool_invoke” } system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your primary goal is to provide precise, helpful, and clear responses. You have access to the following tools: Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude: float, Outputs: string You should think step by step in order to fulfill the objective with a reasoning process divided into Thought/Action/Observation. This cycle can repeat multiple times if needed. You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with the prefix “Final Answer:”“””
  • 24. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 26 Agent Example User What’s the weather in Paris? LLM Weather Search Tool AI Agent AIResponse( tool_calls=[{ name: “find_weather” parameters: { location: “Paris”, }, id: “call_tool123”, type: “tool_invoke” }] ) ToolResponse( content=“53 and rainy”, name=“find_weather”, tool_invoke_id: “call_tool123” ) AIFinalResponse( content=“The current weather in Paris is 53 degrees Celsius with light rain.” ) system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your primary goal is to provide precise, helpful, and clear responses. You have access to the following tools: Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude: float, Outputs: string You should think step by step in order to fulfill the objective with a reasoning process divided into Thought/Action/Observation. This cycle can repeat multiple times if needed. You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with the prefix “Final Answer:”“”” { name: “find_weather” parameters: { latitude: “48.8566”, longitude: “2.3522”, }, id: “call_tool123”, type: “tool_invoke” }
  • 25. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 27 | Demo #2 – Adding agency to our code
  • 26. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 28 | RAG
  • 27. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 29 What is RAG and how does it work? Source: https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/ • Combination of retrieval and generation: RAG combines information retrieval (like a search engine) with text generation (like a language model). • Uses external knowledge: Instead of relying solely on pre-trained knowledge, RAG retrieves relevant documents or data from an external source (like a database or private knowledge bases) to generate more accurate and up-to-date responses. • Improves factual accuracy: By pulling in real-time data or documents, RAG reduces the risk of generating factually incorrect or outdated information. • Two-step process: • Retrieve: The model searches for relevant information from a knowledge source. • Generate: It then uses the retrieved data to create a coherent, contextually accurate answer.
  • 28. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 30 How is RAG setup? Retrieve Documents Embedding Model Data Store Documents / Knowledge Base Document Embeddings Doc Ingestion and Retrieval • You provide data sources and point application to them • Info is retrieved from the data sources and tokenized, embedded and stored in a data store • For queries/prompts, application gathers results (most relevant ones) from the vector database with your data
  • 29. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 31 Embeddings • Embeddings represent text as sets of numeric data - tensors (lots of dimensions) • Each dimension stores some info about the text's meaning, context, or syntactical aspects • Words or sentences with similar meanings are stored closer together in the vector space § If two pieces of text are similar syntactically, they will have similar embeddings (smaller distance between their vectors) • During training, models learn to place text with similar meanings closer together in the embedding space • Common pre-trained models used for generating embeddings include BERT and variants (RoBERTa, DistilBERT) • Once you have embeddings, you can use them for NLP tasks like semantic search, text classification, sentiment analysis
  • 30. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 32 Understanding vectors in AI • Collection of data points that encapsulate an item's relationship to other items Distance to Raleigh, NC USA
  • 31. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 33 Distance to Raleigh, NC USA 3,960 Understanding vectors in AI • Collection of data points that encapsulate an item's relationship to other items
  • 32. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 34 Distance to Raleigh, NC USA 3,960 6,609 Understanding vectors in AI • Collection of data points that encapsulate an item's relationship to other items
  • 33. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 35 Distance to Raleigh, NC USA 3,960 6,609 2,839.4 Understanding vectors in AI • Collection of data points that encapsulate an item's relationship to other items
  • 34. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 36 Distance to Raleigh, NC USA 3,960 6,609 2,839.4 6,001 Understanding vectors in AI • Collection of data points that encapsulate an item's relationship to other items
  • 35. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 37 Distance to Raleigh, NC USA 3,960 6,609 2,839.4 6,001 507.6 Understanding vectors in AI • Collection of data points that encapsulate an item's relationship to other items
  • 36. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 38 Distance to Raleigh, NC USA 3,960 6,609 2,839.4 6,001 507.6 3,872 Understanding vectors in AI • Collection of data points that encapsulate an item's relationship to other items
  • 37. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 39 Distance to Raleigh, NC USA 3,960 6,609 2,839.4 6,001 507.6 3,872 7,679 Understanding vectors in AI • Collection of data points that encapsulate an item's relationship to other items
  • 38. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 40 Distance to Raleigh, NC USA 3,960 6,609 2,839.4 6,001 507.6 3,872 7,679 2,870.1 Understanding vectors in AI • Collection of data points that encapsulate an item's relationship to other items
  • 39. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 41 Semantic meaning / relationships • Suppose we have 3 words • King and Queen are more similar to each other than they are to lunch • In order for neural net to understand the relationships, each word needs to be represented as a vector • Suppose each word is represented by a 2- dimensional vector King Queen Lunch - 130.16 89.5 - 115.43 95.2 - 89.5 34.3
  • 40. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 42 Embedding space • Plotting in 2-dimensional embedding space shows relationships • Way to let NN understand relationships between words • We want the NN to learn that King and Queen are more similar to each other than they are to lunch 2-dimensional space for word embeddings Dimension 2 40 50 60 70 80 90 100 -140 -130 -120 -110 -100 -90 -80 Dimension 1 King Queen Lunch - 130.16 89.5 - 115.43 95.2 - 89.5 34.3 King Queen Lunch
  • 41. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 43 Searching for Vectors - similarity metrics • 3 metrics commonly used to determine similarity of two vectors (2-dimensional representation) Cosine similarity - measure the angle between two vectors; values from - 1 to 1; 1 = both point in same direction; -1 point in opposite directions; 0 = orthogonal (perpendicular) Dot product / inner product - measures how well 2 vectors align with each other; values from - ∞ to ∞; positive values indicate vectors are in same direction; negative values indicate opposite directions; 0 = orthogonal Euclidean distance - measures the distance between two vectors; values from 0 to ∞; 0 = identical; larger numbers farther apart credit: https://towardsdatascience.com/similarity-metrics-in-nlp-acc0777e234c imagine 3 vectors - a,b,c Cosine similarity Dot product / inner product Euclidean distance 0.0141 0.0167 0.9998
  • 42. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 44 Visualizing Embeddings and Vector Similarity source: https://projector.tensorflow.org/?config=https://gist.githubusercontent.com/martin- labrecque/4483ff5a104f0b56417585c3bc9a12f1/raw/57348e12a70c8d70c2c573d3dbc0122ac077556b/journaux_config.json
  • 43. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 45 Vectors and relationships example • Query - what words are related to "dog" in model "English Wikipedia"? Source: http://vectors.nlpl.eu/explore/embeddings/en/MOD_enwiki_upos_skipgram_300_2_2021/dog_NOUN/
  • 44. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 46 | Vector Databases
  • 45. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 47 Vector Databases • Specialized database that index and stores vector embeddings • Useful for § fast retrieval § similarity search • Offer comprehensive data management capabilities § metadata storage § filtering § dynamic querying based on associate metadata • Scalable and can handle large volumes of vector data • Support real-time updates • Play key role in AI and ML applications Vector Database Vector Database Vector Database
  • 46. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 48 How data gets into Vector Databases 0.1, 1.2, ..., ..., - 0.5, 3.17 -0.57, 1.0, ..., ..., 2.15,1.1 1.1, 0.74, ..., ..., - 0.2, 1.7 2.1, 0.12, ..., ..., -1.50, 0.3 0.6, -0.71, ..., ..., 0.35, -1.2 1.1, -2.15, ..., ..., 2.1, 0.35 0.4, 0.36, ..., ..., -0.7, -2.45 • Data is input, converted to embeddings (vectors) and stored • Queries are input, converted to embeddings (vectors) and then similarity metrics are used to find results ("nearest neighbors") Vector Database Audio Images Documents embedding models NLP Transformer Image Transformer Audio Transformer
  • 47. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 49 How does RAG work? Embedding Model Data Store LLM User Interface Prompt Document Embeddings embedded query Prompt + enhanced context response (generative) User Query and Response Generation Prompt Prompt Original prompt + matching "docs" (aka "enhanced context") LLM Response ----------- ------------------ --------- ------------- --------------------- • For queries/prompts, application gathers results (most relevant ones) from the vector database with your data • Adds results to your regular LLM query/prompt • Asks the LLM to answer based on the augmented/enriched query/prompt • NOTE: Items returned via RAG search are existing items from the data store, not generated content
  • 48. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 50 | Demo #3 – Adding RAG to our code
  • 49. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 51 DIY – github.com/brentlaster/3in1 • Fork if desired • Click on button in README to start codespace • Follow guide.md
  • 50. techupskills.com | techskillstransformations.com © 2025 Brent C. Laster & @techupskills 52 That’s all - thanks! techskillstransformations.com getskillsnow.com Contact: training@getskillsnow.com qLinkedIn: brentlaster qX: @BrentCLaster qBluesky: brentclaster.bsky.social qGitHub: brentlaster