AI 3-in-1: Agents, RAG, and Local Models - Brent Laster

techupskills.com | techskillstransformations.com
© 2025 Brent C. Laster &
@techupskills
2
AI 3-in-1: Agents, RAG, and Local Models
Presented by Brent Laster &
Tech Skills Transformations LLC
© 2025 Brent C. Laster & Tech Skills Transformations LLC
All rights reserved

@techupskills
3
About me
• Founder, Tech Skills Transformations LLC
• https://getskillsnow.com
• info@getskillsnow.com
• Long career in corporate as dev, manager,
and director in DevOps and other areas
• Author
• O'Reilly "reports"
• Books
• Professional Git
• Jenkins 2 – Up and
Running
• Learning GitHub
Actions
• Learning GitHub
Copilot
• AI-Enabled SDLC
• Speaker
• Social media
q LinkedIn: brentlaster
q X: @BrentCLaster
q Bluesky:
brentclaster.bsky.social
q GitHub: brentlaster

@techupskills
4
|
Running models locally

@techupskills
5
Why run models locally?
• Privacy - no need to share data
• Gives you control over setup, configuration, and customization options
§ Can tailor LLM to your needs, experiment with settings, integrate into your infra
• Can easily swap between different models for different tasks
• Work in offline mode
• Cost savings
§ No charges for subscriptions or API calls
• No censoring of results

@techupskills
6
Where to get models +
http://huggingface.co/models
http://kaggle.com/models

@techupskills
7
Options for running LLMs locally
• GPT4All - https://github.com/nomic-
ai/gpt4all
• LM Studio - https://lmstudio.ai
• Jan AI - https://jan.ai
• llama.cpp -
https://github.com/ggerganov/llama.cpp
• LlamaFile - https://github.com/Mozilla-
Ocho/llamafile
• Ollama - https://ollama.com/
• HuggingFace Transformers -
https://huggingface.co/docs/transformers
• More!

@techupskills
8
Ollama
• Command line tool for downloading,
exploring and using LLMs on local machine
• open source
• supports most of Hugging Face's popular
models
• allows uploading new ones
• Links:
§ main site: https://ollama.com
§ GitHub: https://github.com/ollama/
• Advantages
§ speeds up and simplifies
» model selection and download
» configuring endpoints
» integration with Python or JavaScript codebase

@techupskills
9
Working with Ollama #1
llama3.2
ollama pull

@techupskills
10
llama3.2
ollama run
>>> query
>>> Briefly explain what
an AI model is

@techupskills
12
llama3.2
ollama serve
http://localhost:11434/v1

@techupskills
13
|
Demo #1 – Simple program to work with local model

@techupskills
14
|
Agents

@techupskills
15
What is an AI Agent?
• A system that operates within an environment by using sensors to
perceive information, a decision-making mechanism to process and
reason about the data, and actuators to take actions that influence or
update/respond to the environment
• This interaction enables the agent to achieve specific goals
autonomously while continuously learning and adapting over time
• Agents use LLMs to identify key data, drive decisions, and communicate
naturally
User
LLM
Prompt “how to
think”
Tools +
Memory
Relevant
data and
decisions
about what
to do next
Response
and/or
action in
environment

@techupskills
16
Architectural Features of AI Agents
• AI autonomously outlines and
executes a logical series of
steps for accomplishing a
given objective.
• Provides the AI with a way to
dynamically adapt its
approach based on real-time
data and feedback..
• Might employ reflection to
evaluate and improve
responses
• Example: A research agent
plans search → summarize →
generate report.
• AI agents interact with
external APIs, databases,
and functions.
• Enhances LLMs by
providing access to real-
world knowledge.
• Reduces hallucinations
by using retrieval-
augmented generation
(RAG).
• Example: Calling a
Python function to
perform complex
calculations.
• Short-term handles tasks;
long term stores knowledge
and experience
• Memory ensures
consistency and efficiency in
multi-step decisions
• Memory recalls preferences
to enhance personalization
and user experience
• Example: Storing user
preferences for future
reference or personalized
responses
Planning Tool Use Memory

@techupskills
17
Agent Example
LLM
AI Agent
Weather
Search Tool
Initialize LLM with prompt
system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your primary
goal is to provide precise, helpful, and clear responses.
You have access to the following tools:
Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude: float,
Outputs: string
You should think step by step in order to fulfill the objective with a reasoning process divided into
Thought/Action/Observation. This cycle can repeat multiple times if needed.
You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a tool
with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with the prefix
“Final Answer:”“””
system_message=“””You are an AI assistant designed to help users
accurately and efficiently. Your primary goal is to provide precise, helpful,
and clear responses.
Tool Name: find_weather, Description: Get weather for a location.,
Arguments: latitude: float, longitude: float, Outputs: string
You should think step by step in order to fulfill the objective with a reasoning
process divided into Thought/Action/Observation. This cycle can repeat
multiple times if needed.
You should first reflect with“Thought: {your_thoughts}”on the current
query, then (if necessary), call a tool with the proper JSON formatting
“Action: {JSON_BLOB}”, or else print your final answer starting with the
prefix“Final Answer:”“””

@techupskills
18
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
Chain of Thought – Step 1: Interpret User Query
Thought: ”The user is asking about the weather
in Paris. I need to extract ’Paris’ as the location.
Action: Extracted location = “Paris”
AI Agent
system_message=“””You are an AI assistant designed to help users accurately and efficiently. Your
primary goal is to provide precise, helpful, and clear responses.
Tool Name: find_weather, Description: Get weather for a location., Arguments: latitude: float, longitude:
float, Outputs: string
You should first reflect with “Thought: {your_thoughts}” on the current query, then (if necessary), call a
tool with the proper JSON formatting “Action: {JSON_BLOB}”, or else print your final answer starting with
the prefix “Final Answer:”“””

@techupskills
19
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
Chain of Thought – Step 2: Decide to use tool
Thought: ”I need real-time data, so I will call
the ‘find_weather’ tool. First, I need to get the
latitude and longitude for the tool call.
AIResponse(
tool_calls=[{
name: “find_weather”
parameters: {
latitude: “48.8566”,
longitude: “2.3522”,
},
id: “call_tool123”,
type: “tool_invoke”
}]
)

@techupskills
20
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
AIResponse(
tool_calls=[{
parameters: {
},
}]
)
{
parameters: {
},
}
Agent parses LLM output
identifies JSON tool call,
parses it, forms it into
actual tool call

@techupskills
21
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
AIResponse(
tool_calls=[{
parameters: {
},
}]
)
{
parameters: {
},
}
Agent executes tool call

@techupskills
22
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
Weather tool returns result
ToolResponse(
content=“53 and
rainy”,
name=“find_weather”,
tool_invoke_id:
“call_tool123”
)
AIResponse(
tool_calls=[{
parameters: {
},
}]
)
{
parameters: {
},
}

@techupskills
23
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
ToolResponse(
content=“53 and
rainy”,
tool_invoke_id:
“call_tool123”
)
AIResponse(
tool_calls=[{
parameters: {
},
}]
)
{
parameters: {
},
}
Agent includes tool
output in
message/prompt back
to model

@techupskills
24
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
ToolResponse(
content=“53 and
rainy”,
tool_invoke_id:
“call_tool123”
)
Chain of Thought – Step 3 : Interpret JSON Response
Thought: ”The tool returned weather data for Paris. I
will summarize the information concisely.
AIResponse(
tool_calls=[{
parameters: {
},
}]
)
{
parameters: {
},
}

@techupskills
25
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
ToolResponse(
content=“53 and
rainy”,
tool_invoke_id:
“call_tool123”
)
AIFinalResponse(
content=“The current
weather in Paris is 53
degrees Celsius with
light rain.”
)
AIResponse(
tool_calls=[{
parameters: {
},
}]
)
{
parameters: {
},
}

@techupskills
26
Agent Example
User
What’s the
weather in
Paris?
LLM
Weather
Search Tool
AI Agent
AIResponse(
tool_calls=[{
parameters: {
location: “Paris”,
},
}]
)
ToolResponse(
content=“53 and
rainy”,
tool_invoke_id:
“call_tool123”
)
AIFinalResponse(
content=“The current
weather in Paris is 53
degrees Celsius with
light rain.”
)
{
parameters: {
},
}

@techupskills
27
|
Demo #2 – Adding agency to our code

@techupskills
28
|
RAG

@techupskills
29
What is RAG and how does it work?
Source: https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/
• Combination of retrieval and generation: RAG combines information retrieval (like a search engine) with text generation (like a
language model).
• Uses external knowledge: Instead of relying solely on pre-trained knowledge, RAG retrieves relevant documents or data from an
external source (like a database or private knowledge bases) to generate more accurate and up-to-date responses.
• Improves factual accuracy: By pulling in real-time data or documents, RAG reduces the risk of generating factually incorrect or
outdated information.
• Two-step process:
• Retrieve: The model searches for relevant information from a knowledge source.
• Generate: It then uses the retrieved data to create a coherent, contextually accurate answer.

@techupskills
30
How is RAG setup?
Retrieve
Documents
Embedding
Model Data Store
Documents /
Knowledge Base
Document
Embeddings
Doc Ingestion and Retrieval
• You provide data sources and point application to them
• Info is retrieved from the data sources and tokenized, embedded and stored in a data store
• For queries/prompts, application gathers results (most relevant ones) from the vector
database with your data

@techupskills
31
Embeddings
• Embeddings represent text as sets of numeric data - tensors (lots of
dimensions)
• Each dimension stores some info about the text's meaning, context,
or syntactical aspects
• Words or sentences with similar meanings are stored closer together
in the vector space
§ If two pieces of text are similar syntactically, they will have
similar embeddings (smaller distance between their vectors)
• During training, models learn to place text with similar meanings
closer together in the embedding space
• Common pre-trained models used for generating embeddings
include BERT and variants (RoBERTa, DistilBERT)
• Once you have embeddings, you can use them for NLP tasks like
semantic search, text classification, sentiment analysis

@techupskills
32
Understanding vectors in AI
• Collection of data points that encapsulate an item's relationship to
other items
Distance to Raleigh, NC USA

@techupskills
33
3,960
other items

@techupskills
34
3,960
6,609
other items

@techupskills
35
3,960
6,609
2,839.4
other items

@techupskills
36
3,960
6,609
2,839.4
6,001
other items

@techupskills
37
3,960
6,609
2,839.4
6,001
507.6
other items

@techupskills
38
3,960
6,609
2,839.4
6,001
507.6
3,872
other items

@techupskills
39
3,960
6,609
2,839.4
6,001
507.6
3,872
7,679
other items

@techupskills
40
3,960
6,609
2,839.4
6,001
507.6
3,872
7,679
2,870.1
other items

@techupskills
41
Semantic meaning / relationships
• Suppose we have 3
words
• King and Queen are
more similar to each
other than they are to
lunch
• In order for neural net to
understand the
relationships, each word
needs to be represented
as a vector
• Suppose each word is
represented by a 2-
dimensional vector
King
Queen
Lunch
- 130.16
89.5
- 115.43
95.2
- 89.5
34.3

@techupskills
42
Embedding space
• Plotting in 2-dimensional embedding space
shows relationships
• Way to let NN understand relationships
between words
• We want the NN to learn that King and
Queen are more similar to each other than
they are to lunch
2-dimensional space for word embeddings
Dimension
2
40
50
60
70
80
90
100
-140 -130 -120 -110 -100 -90 -80
Dimension 1
King
Queen
Lunch
- 130.16
89.5
- 115.43
95.2
- 89.5
34.3
King
Queen
Lunch

@techupskills
43
Searching for Vectors - similarity metrics
• 3 metrics commonly used to determine similarity of two vectors (2-dimensional representation)
Cosine similarity - measure the angle between two vectors; values from -
1 to 1; 1 = both point in same direction; -1 point in opposite directions; 0 =
orthogonal (perpendicular)
Dot product / inner product - measures how well 2 vectors align with
each other; values from - ∞ to ∞; positive values indicate vectors are in
same direction; negative values indicate opposite directions; 0 = orthogonal
Euclidean distance - measures the distance between two vectors; values
from 0 to ∞; 0 = identical; larger numbers farther apart credit: https://towardsdatascience.com/similarity-metrics-in-nlp-acc0777e234c
imagine 3 vectors - a,b,c
Cosine similarity
Dot product / inner product
Euclidean distance
0.0141
0.0167
0.9998

@techupskills
44
Visualizing Embeddings and Vector Similarity
source: https://projector.tensorflow.org/?config=https://gist.githubusercontent.com/martin-
labrecque/4483ff5a104f0b56417585c3bc9a12f1/raw/57348e12a70c8d70c2c573d3dbc0122ac077556b/journaux_config.json

@techupskills
45
Vectors and relationships example
• Query - what words are related to "dog" in model "English Wikipedia"?
Source: http://vectors.nlpl.eu/explore/embeddings/en/MOD_enwiki_upos_skipgram_300_2_2021/dog_NOUN/

@techupskills
46
|
Vector Databases

@techupskills
47
Vector Databases
• Specialized database that index and
stores vector embeddings
• Useful for
§ fast retrieval
§ similarity search
• Offer comprehensive data management
capabilities
§ metadata storage
§ filtering
§ dynamic querying based on associate
metadata
• Scalable and can handle large volumes
of vector data
• Support real-time updates
• Play key role in AI and ML applications
Vector Database
Vector Database
Vector Database

@techupskills
48
How data gets into Vector Databases
0.1, 1.2, ..., ..., - 0.5, 3.17
-0.57, 1.0, ..., ..., 2.15,1.1
1.1, 0.74, ..., ..., - 0.2, 1.7
2.1, 0.12, ..., ..., -1.50, 0.3
0.6, -0.71, ..., ..., 0.35, -1.2
1.1, -2.15, ..., ..., 2.1, 0.35
0.4, 0.36, ..., ..., -0.7, -2.45
• Data is input, converted to embeddings (vectors) and stored
• Queries are input, converted to embeddings (vectors) and then similarity metrics are used to find results ("nearest neighbors")
Vector Database
Audio
Images
Documents
embedding models
NLP Transformer
Image Transformer
Audio Transformer

@techupskills
49
How does RAG work?
Embedding
Model Data Store
LLM
User
Interface
Prompt
Document
Embeddings
embedded
query Prompt + enhanced
context
response (generative)
User Query and Response Generation
Prompt
Prompt
Original prompt +
matching "docs" (aka
"enhanced context")
LLM Response
-----------
------------------
---------
-------------
---------------------
• For queries/prompts, application gathers
results (most relevant ones) from the
vector database with your data
• Adds results to your regular LLM
query/prompt
• Asks the LLM to answer based on the
augmented/enriched query/prompt
• NOTE: Items returned via RAG search are
existing items from the data store, not
generated content

@techupskills
50
|
Demo #3 – Adding RAG to our code

@techupskills
51
DIY – github.com/brentlaster/3in1
• Fork if desired
• Click on button in README
to start codespace
• Follow guide.md

@techupskills
52
That’s all - thanks!
techskillstransformations.com
getskillsnow.com
Contact: training@getskillsnow.com
qLinkedIn: brentlaster
qX: @BrentCLaster
qBluesky: brentclaster.bsky.social
qGitHub: brentlaster

AI 3-in-1: Agents, RAG, and Local Models - Brent Laster

More Related Content

What's hot (20)

Similar to AI 3-in-1: Agents, RAG, and Local Models - Brent Laster (20)

More from All Things Open (20)

Recently uploaded (20)

AI 3-in-1: Agents, RAG, and Local Models - Brent Laster