apidays Australia 2023 - How We Built Our Generative AI Assistant: New Relic Grok, Peter Marelas, New Relic

© 2023 New Relic, Inc. All rights reserved
How we built our
Generative AI assistant
New Relic Grok
Peter Marelas
Chief Architect, APJ
New Relic

New Relic Cloud Observability Platform
Collect: Applications, Web, Mobile, Cloud, IoT, etc.
Store: Filter, enrich, build relationships (system, software, users, topology maps, etc.)
Visualise: Real-time dashboards, service maps, query builders, curated experiences
Analyse: Correlation, causal analysis, trends, anomaly detection, real-time alerting, health indicators
Infrastructure, Security, DevOps, Web, AI/ML, Mobile, Network, SRE, Back-End, Full-Stack, Cloud
Kubernetes, Synthetics, Serverless, APM, Model Performance, Network, Browser, Mobile, Infrastructure, Distributed Tracing, Log Management, AIOps
Full Stack O11y
Telemetry Data Platform
✓ No Team Silos: Only pricing model for ubiquity and scale
✓ No Data Silos: Only purpose-built telemetry data cloud
✓ No Tool Silos: All monitoring and security tools in one connected experience

Motivation
How do I ….?

Motivation
What is ….?

Motivation
Peak of the hype cycle creates customer expectations.
* Gartner Hype Cycle for Artificial Intelligence, 2023

Grok has 4 specific skills
Skill | Common NL prefix | Tool | Source of knowledge
Answer questions about New Relic | How do I … | NL 2 Docs | New Relic documentation
Answer questions about the user's data | What … / How many … | NL 2 NRQL | NRDB
Check for problems or anomalies in the user's environment | Are … | NL 2 Anomalies | NRDB
Interpret the user's dashboards | What is … | NL 2 Dashboards | Dashboard definition, NRDB

How does Grok decide which skill (tool) to use?
1. User asks: “What is my transaction count?”
2. Ask the LLM to pick a tool for the instruction, given a description of each tool: NL 2 Docs, NL 2 NRQL, NL 2 Dashboards, NL 2 Anomalies
3. Selected tool for this question: NL 2 NRQL
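
The tool-selection step can be sketched roughly as follows. This is a minimal illustration, not Grok's actual code: the tool descriptions, the prompt wording, the `pick_tool` helper, the fallback to `NL2Docs` and the `fake_llm` stand-in are all assumptions made for the example. Only the four tool names come from the slides.

```python
from typing import Callable

# Tool names come from the slides; the descriptions are illustrative wording.
TOOLS = {
    "NL2Docs": "Answer 'How do I ...' questions from New Relic documentation.",
    "NL2NRQL": "Answer 'What ...' or 'How many ...' questions about the user's data with NRQL.",
    "NL2Anomalies": "Check whether the user's environment has problems or anomalies.",
    "NL2Dashboards": "Interpret the user's dashboards.",
}


def build_router_prompt(question: str) -> str:
    """List every tool with its description and ask for exactly one tool name."""
    lines = [f"- {name}: {desc}" for name, desc in TOOLS.items()]
    return (
        "Pick the single best tool for the instruction below.\n"
        "Tools:\n" + "\n".join(lines) + "\n"
        f"Instruction: {question}\n"
        "Reply with the tool name only."
    )


def pick_tool(question: str, llm: Callable[[str], str], default: str = "NL2Docs") -> str:
    """Ask the model for a tool; fall back to a default if the reply is not a known tool."""
    reply = llm(build_router_prompt(question)).strip()
    return reply if reply in TOOLS else default


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call, so the sketch runs offline."""
    return "NL2NRQL" if "transaction count" in prompt else "unknown"


print(pick_tool("What is my transaction count?", fake_llm))
```

Validating the reply against the known tool names keeps a free-text model answer from selecting a tool that does not exist.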

How Grok processes NL 2 NRQL requests
User asks: “What is my transaction count?”
1. Ask the LLM to pick the most relevant tables relating to the user's question
2. Get the schema for these tables as metadata
3. Retrieve similar examples of Q/NRQL pairs from the vector database
4. Combine metadata, examples and the user's question into a prompt
5. Ask the LLM to generate NRQL from the prompt
6. Validate that the query is syntactically correct; if yes (Y), continue, otherwise ask the LLM to correct the query
7. Execute the NRQL
8. Pass the response to the LLM and ask it to render a natural text response
9. Render the chart and the natural language response
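
The validate-and-correct part of this flow might look something like the sketch below. It is an assumption-laden illustration: `generate_valid_nrql`, `toy_is_valid`, the correction prompt wording and the retry limit are hypothetical, and the keyword check is only a placeholder for a real NRQL syntax validator.

```python
from typing import Callable


def generate_valid_nrql(
    prompt: str,
    llm: Callable[[str], str],
    is_valid: Callable[[str], bool],
    max_corrections: int = 2,
) -> str:
    """Generate a query, then ask the model to correct it until it validates."""
    query = llm(prompt)
    for _ in range(max_corrections):
        if is_valid(query):
            return query
        # Feed the rejected query back so the model can repair it.
        query = llm(f"{prompt}\nThis query is not valid NRQL, correct it:\n{query}")
    if is_valid(query):
        return query
    raise ValueError("no valid query after corrections")


def toy_is_valid(query: str) -> bool:
    """Toy check only: a real validator would parse the NRQL properly."""
    upper = query.upper()
    return all(keyword in upper for keyword in ("SELECT", "FROM", "SINCE"))


# Hypothetical model replies: the first lacks SINCE, the second is corrected.
replies = iter([
    "SELECT count(*) FROM Transaction",
    "SELECT count(*) FROM Transaction SINCE 1 hour ago",
])


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return next(replies)


result = generate_valid_nrql("What is my transaction count?", fake_llm, toy_is_valid)
print(result)
```

Capping the number of corrections bounds both latency and token spend when the model keeps producing invalid queries.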

How Grok processes NL 2 Docs requests
User asks: How do I….?
1. Convert the question to embeddings using the LLM
2. Search the vector database for similar embeddings
3. Extract the passages of text associated with the similar embeddings
4. Generate a prompt with the question and the relevant text passages
5. Pass the prompt to the LLM to render a natural response from the passages in the prompt
6. Render the response to the user
In-context learning with Retrieval Augmented Generation (RAG)
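
The prompt-generation step of this RAG flow could be sketched as below, assuming the passages have already been retrieved and ranked. The `build_docs_prompt` helper, the instruction wording, the character budget and the placeholder passages are all hypothetical and are not taken from the talk.

```python
from typing import List


def build_docs_prompt(question: str, passages: List[str], max_chars: int = 2000) -> str:
    """Number the retrieved passages and tell the model to answer only from them."""
    kept: List[str] = []
    used = 0
    for passage in passages:
        # Stop adding passages once the (assumed) character budget is exhausted.
        if used + len(passage) > max_chars:
            break
        kept.append(passage)
        used += len(passage)
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(kept, start=1))
    return (
        "Answer the question using only the passages below. "
        "If the passages do not contain the answer, say so.\n"
        f"Passages:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder passages standing in for text returned by the vector search.
passages = [
    "Placeholder passage one about the topic of the question.",
    "Placeholder passage two with further detail.",
]
prompt = build_docs_prompt("How do I ...?", passages)
print(prompt)
```

Passing the passages in ranked order means that, when the budget runs out, the least relevant text is what gets dropped.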

Challenges & Solutions

What we want from an AI assistant:
Natural Language Instructions → Specific Knowledge DBs, Specific Rule Interpreters, General Output Formats → Deterministic Output

What we want from an AI assistant:
Natural Language Instructions → Specific Knowledge DBs, Specific Rule Interpreters, General Output Formats → Deterministic Output
Foundational LLMs:
Natural Language Instructions → Generic Knowledge DBs, Generic Rule Interpreters, Generic Output Formats → Creative Output

What we want from an AI assistant:
Natural Language Instructions → Specific Knowledge DBs, Specific Rule Interpreters, General Output Formats → Deterministic Output
Foundational LLMs:
Natural Language Instructions → Generic Knowledge DBs, Generic Rule Interpreters, Generic Output Formats → Creative Output
Foundational LLM + Retrieval Augmented Generation:
Natural Language Instructions → Generic Knowledge DBs, Generic Rule Interpreters, Generic Output Formats + Specific Knowledge DB, Specific Rules → * Deterministic Output

What questions do our users want to ask?
User Study
79% said they wanted to learn something about a capability or get insights from their own dataset.

Finding the right prompts (Prompt Engineering)
▪ Ongoing refinement (edge cases)
▪ Add examples to prompts (few-shot)
▪ Add rules to prompt
▪ Feedback mechanism
▪ Robust test harness
▪ ROUGE & BERT scores
▪ 2nd LLM to assess quality
<Context information>:
You are an AI assistant specialized in translating user questions into New Relic Query Language (NRQL),
with no knowledge of SQL. Given a user's question, information about the user, descriptions of event schemas,
and examples of questions and answers, your task is to generate an appropriate NRQL query.
The provided event schemas contain only the most
relevant ones and you need to use only one.
In the context of New Relic, an entity is a basic data reporting element, such as an application, host, or database
service; each entity has a unique Guid, which is a base64-encoded unique identifier; and if a user references an
entity by its Guid, you should use it in the NRQL you generate, but if entity guid is not explicitly referenced, you
should not use in the query that you will generate.
The wording of the question should tell you whether the user wants totals or data over a time interval. Use
TIMESERIES clause in the NRQL query that you generate only if the user requests data over time or per
day/hour. Otherwise, do not use it.
<How to select time range in NRQL queries>:
Every NRQL query should contain a SINCE and may contain an UNTIL clause, as this is the only viable way to
select a time range in NRQL. If the SINCE clause is not used, the query uses the last 1 hour of data by default,
but you should always use the SINCE clause in the query you generate, and if the time range is not explicitly
specified, use SINCE 1 hour ago.
<Examples of valid NRQL queries with time range selections>:
<User question>: How many transactions happened today?
<NRQL query>: SELECT count(*) FROM Transaction SINCE TODAY
<User question>: How many transactions happened on 25th of April?
<NRQL query>: FROM Transaction SELECT count(*) SINCE '2023-04-25 00:00:00' UNTIL '2023-04-25 23:59:59'
<User question>: How many transactions happened in the previous calendar week?
<NRQL query>: FROM Transaction SELECT COUNT(*) SINCE LAST WEEK UNTIL THIS WEEK
<User question>: How many transactions happened on Monday?
<NRQL query>: FROM Transaction SELECT count(*) SINCE MONDAY until TUESDAY
<User question>: How many transactions per day occurred this year until 10 days ago
<NRQL query>: FROM Transaction SELECT count(*) SINCE THIS YEAR until 10 days ago TIMESERIES 1 day
Prompt components: System instruction, Examples, Rules, User question
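
A prompt of this shape can be assembled programmatically. The sketch below reuses the section markers and two of the example pairs from the prompt above. The `build_nrql_prompt` function, its parameters and the shortened system and rules strings are assumptions for illustration only.

```python
from typing import List, Tuple

# Two question/NRQL pairs copied from the prompt shown above.
EXAMPLES: List[Tuple[str, str]] = [
    ("How many transactions happened today?",
     "SELECT count(*) FROM Transaction SINCE TODAY"),
    ("How many transactions happened on Monday?",
     "FROM Transaction SELECT count(*) SINCE MONDAY until TUESDAY"),
]


def build_nrql_prompt(system: str, rules: str,
                      examples: List[Tuple[str, str]], question: str) -> str:
    """Join system instruction, rules, few-shot examples and the user question."""
    shots = "\n".join(
        f"<User question>: {q}\n<NRQL query>: {nrql}" for q, nrql in examples
    )
    return (
        f"<Context information>:\n{system}\n"
        f"<How to select time range in NRQL queries>:\n{rules}\n"
        f"<Examples of valid NRQL queries with time range selections>:\n{shots}\n"
        f"<User question>: {question}\n"
        "<NRQL query>:"
    )


prompt = build_nrql_prompt(
    system="You translate user questions into NRQL.",    # shortened for the sketch
    rules="Always include a SINCE clause.",               # shortened for the sketch
    examples=EXAMPLES,
    question="What is my transaction count?",
)
print(prompt)
```

Ending the prompt on an open `<NRQL query>:` marker mirrors the few-shot pattern, so the model's completion is the query itself.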

Performance
▪ High variance
▪ Time to intermediate token
▪ Time to first token
▪ Time to last token
▪ Intermediate messages
▪ Distribute requests
▪ Cache some answers

Microsoft Azure OpenAI Service
Cost of LLM
Model | Prompt (per 1,000 tokens) | Completion (per 1,000 tokens)
GPT-4 | $0.03 | $0.06
Ada (embeddings) | $0.0001 |

Query | Avg prompt tokens | Avg completion tokens | Avg cost per e2e request
NL2Docs | 3016 | 568 | $0.13
NL2NRQL | 6516 | 118 | $0.20

New Relic Grok
Users | Daily docs requests | Daily NL2NRQL requests | Monthly cost
1 user | 5 | 5 | $49
100 users | 500 | 500 | $4,900
10,000 users | 50,000 | 50,000 | $490,000
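
The arithmetic behind these figures can be reproduced with a short calculation. The prices used below ($0.03 and $0.06 per 1,000 tokens) are an assumption: they are the GPT-4 rates that reproduce the per-request averages in the table. The 30-day month and the function names are also assumptions. With these inputs NL2Docs comes out at about $0.12 per request, slightly under the $0.13 on the slide, which may simply reflect rounding or embedding cost.

```python
PROMPT_PRICE_PER_1K = 0.03       # assumed GPT-4 prompt price, USD per 1,000 tokens
COMPLETION_PRICE_PER_1K = 0.06   # assumed GPT-4 completion price, USD per 1,000 tokens

# Average token counts per end-to-end request, taken from the table above.
NL2DOCS_TOKENS = (3016, 568)
NL2NRQL_TOKENS = (6516, 118)


def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD of one request at the assumed per-1,000-token prices."""
    return (prompt_tokens / 1000 * PROMPT_PRICE_PER_1K
            + completion_tokens / 1000 * COMPLETION_PRICE_PER_1K)


def monthly_cost(users: int, docs_per_day: int = 5,
                 nrql_per_day: int = 5, days: int = 30) -> float:
    """Monthly cost in USD for a number of users with a fixed daily request mix."""
    per_user_per_day = (docs_per_day * request_cost(*NL2DOCS_TOKENS)
                        + nrql_per_day * request_cost(*NL2NRQL_TOKENS))
    return users * per_user_per_day * days


for users in (1, 100, 10_000):
    print(f"{users:>6} users: ${monthly_cost(users):,.0f} per month")
```

Because prompt tokens dominate both request types, trimming the prompt is the main lever on cost in this model.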

What’s next for Grok?
▪ Improve the quality of responses
▪ Experiment with our own models
▪ Develop a new skill (NL2Config)

Peter Marelas
Chief Architect, APJ
https://www.linkedin.com/in/peter-marelas

Deprecation of LLMs
▪ Robust test harness
▪ ROUGE: quantifies the overlap of words between generated output and reference text
▪ BERTScore: semantic similarity
▪ Use GPT-4 to evaluate (costs $)
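
As an illustration of the word-overlap idea behind ROUGE, here is a minimal ROUGE-1 (unigram) sketch. It uses whitespace tokenisation and lowercasing, which is a simplification of what ROUGE libraries do, and the candidate and reference queries are made-up examples rather than test data from the talk.

```python
from collections import Counter
from typing import Dict


def rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """Unigram-overlap precision, recall and F1 between two texts."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical generated query compared against a hypothetical reference query.
scores = rouge1(
    "SELECT count(*) FROM Transaction SINCE 1 day ago",
    "SELECT count(*) FROM Transaction SINCE today",
)
print(scores)
```

A score like this is cheap to compute in a test harness, which is why it pairs well with a costlier LLM-based evaluation used more sparingly.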

LLM Rate Limits
▪ 40,000 tokens/min ÷ 5,200 tokens/request ≈ 7 requests/min
▪ Multiple endpoints
▪ Queue
▪ Distribute requests
▪ Limit max completion tokens (counts towards token-per-minute limit)
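
The rate-limit arithmetic can be written out as below. It assumes that 40,000 is a tokens-per-minute quota for one endpoint and that 5,200 is the token budget of a typical request. The target of 50 requests per minute and both function names are hypothetical.

```python
import math


def requests_per_minute(tokens_per_minute: int, tokens_per_request: int) -> int:
    """Whole requests that fit in one endpoint's per-minute token quota."""
    return tokens_per_minute // tokens_per_request


def endpoints_needed(target_rpm: int, tokens_per_minute: int,
                     tokens_per_request: int) -> int:
    """Endpoints required to sustain a target request rate, spreading load evenly."""
    per_endpoint = requests_per_minute(tokens_per_minute, tokens_per_request)
    return math.ceil(target_rpm / per_endpoint)


print(requests_per_minute(40_000, 5_200))
print(endpoints_needed(50, 40_000, 5_200))
```

Since the completion limit counts towards the quota, lowering the maximum completion tokens reduces `tokens_per_request` and raises the achievable rate per endpoint.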

LLM Context Length Limits
▪ GPT-4: 8,192-token context length
▪ Keep prompt + completion within the context length to avoid hallucinations
▪ Transform prompt
  ▪ Remove extra spaces
  ▪ Remove pronouns
  ▪ Convert JSON to CSV
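
Two of these prompt transformations are easy to sketch with the standard library. The schema rows below are a made-up example, the function names are hypothetical, and pronoun removal is left out because it needs language-aware processing.

```python
import csv
import io
import json
import re


def collapse_spaces(text: str) -> str:
    """Replace every run of whitespace (including newlines) with one space."""
    return re.sub(r"\s+", " ", text).strip()


def json_rows_to_csv(payload: str) -> str:
    """Convert a JSON array of flat objects with the same keys into CSV text."""
    rows = json.loads(payload)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical schema metadata, as it might be passed in a prompt.
schema_json = json.dumps([
    {"name": "duration", "type": "numeric"},
    {"name": "appName", "type": "string"},
])
schema_csv = json_rows_to_csv(schema_json)
print(schema_csv)
print(len(schema_json), "characters as JSON,", len(schema_csv), "as CSV")
```

CSV states each key once in the header instead of once per row, so the saving grows with the number of rows.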

LLMs have no knowledge after 2021
▪ In-context learning
▪ Pass question + relevant docs
▪ Only as good as the algorithm used to find relevant docs
▪ Cross-encoder re-ranking

How are similar documents / examples found?
Indexing: doc → passages → convert to Text Embedding [0.354, 0.234, … , 0.87] (1536 dimensions) → store in Vector DB, indexed by embedding
Search: text → convert to Text Embedding [0.354, 0.234, … , 0.87] (1536 dimensions) → search Vector DB → Return TopK (maximal marginal relevance)
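
A top-k selection with maximal marginal relevance (MMR) can be sketched in plain Python as below. This is the generic MMR idea, not the vector database's own implementation. The 2-dimensional vectors stand in for 1536-dimensional embeddings, and the `lambda_` weights and function names are assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_top_k(query: Sequence[float], candidates: List[Sequence[float]],
              k: int, lambda_: float = 0.5) -> List[int]:
    """Pick k candidate indices, trading relevance against redundancy."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        # Redundancy is the similarity to the closest already-selected item.
        redundancy = max((cosine(candidates[i], candidates[j]) for j in selected),
                         default=0.0)
        return lambda_ * relevance - (1 - lambda_) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy vectors: candidates 0 and 1 are near-duplicates, candidate 2 differs.
query = [1.0, 0.0]
candidates = [[1.0, 0.1], [1.0, 0.12], [0.6, 0.8]]
print(mmr_top_k(query, candidates, k=2, lambda_=1.0))  # pure relevance
print(mmr_top_k(query, candidates, k=2, lambda_=0.3))  # favours diversity
```

With `lambda_=1.0` the selection is ordinary nearest-neighbour ranking, while lower values skip near-duplicate passages so the prompt is not filled with repeated text.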

What tools do we use?
LLM + Vector Database + LLM Logic
