© 2023 New Relic, Inc. All rights reserved
How we built our Generative AI assistant
New Relic Grok
Peter Marelas
Chief Architect, APJ
New Relic
apidays Australia 2023 - How We Built Our Generative AI Assistant: New Relic Grok, Peter Marelas, New Relic
New Relic Cloud Observability Platform
Infrastructure, Security, DevOps, Web, AI/ML, Mobile, Network, SRE, Back-End, Full-Stack, Cloud
Kubernetes, Synthetics, Serverless, APM, Model Performance, Network, Browser, Mobile, Infrastructure, Distributed Tracing, Log Management, AIOps
Full Stack O11y
Telemetry Data Platform
✓ No Team Silos: only pricing model for ubiquity and scale
✓ No Data Silos: only purpose-built telemetry data cloud
✓ No Tool Silos: all monitoring and security tools in one connected experience
Collect: Applications, Web, Mobile, Cloud, IoT, etc.
Store: Filter, enrich, build relationships - system, software, users, topology maps, etc.
Visualise: Real-time dashboards, service maps, query builders, curated experiences
Analyse: Correlation, causal analysis, trends, anomaly detection, real-time alerting, health indicators
Motivation
How do I ….?
Motivation
What is ….?
Motivation
Peak of hype cycle creates customer expectation.
* Gartner Hype Cycle for Artificial Intelligence, 2023
Grok has 4 specific skills
Skill | Common NL prefix | Tool | Source of knowledge
Answer questions about New Relic | How do I … | NL 2 Docs | New Relic Documentation
Answer questions about the user's data | What …, How many … | NL 2 NRQL | NRDB
Check for problems or anomalies in the user's environment | Are … | NL 2 Anomalies | NRDB
Interpret the user's dashboards | What is … | NL 2 Dashboards | Dashboard definition, NRDB
How does Grok decide which skill (tool) to use?
User question: “What is my transaction count?”
▪ Ask LLM to pick a tool for the instruction, given a description of each tool
▪ Available tools: NL 2 Docs, NL 2 NRQL, NL 2 Dashboards, NL 2 Anomalies
▪ Selected tool for this question: NL 2 NRQL
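The routing step above can be sketched roughly as follows. This is a minimal illustration, not Grok's actual implementation: the prompt wording, the `pick_tool` helper, and the `fake_llm` stand-in are all assumptions, and the tool descriptions are paraphrased from the skills table.

```python
from typing import Callable, Dict

# Tool names follow the slide; descriptions are paraphrased (assumption).
TOOLS: Dict[str, str] = {
    "NL2Docs": "Answer questions about New Relic using the documentation.",
    "NL2NRQL": "Answer questions about the user's data by querying NRDB.",
    "NL2Anomalies": "Check the user's environment for problems or anomalies.",
    "NL2Dashboards": "Interpret the user's dashboards.",
}


def build_routing_prompt(question: str, tools: Dict[str, str]) -> str:
    """Describe every tool and ask the model to reply with one tool name."""
    lines = ["Pick the single best tool for the user's instruction.", "Tools:"]
    lines += [f"- {name}: {desc}" for name, desc in tools.items()]
    lines += [f"Instruction: {question}", "Answer with the tool name only."]
    return "\n".join(lines)


def pick_tool(question: str, llm: Callable[[str], str],
              tools: Dict[str, str] = TOOLS) -> str:
    """Call a (hypothetical) LLM client and reject answers that are not tools."""
    answer = llm(build_routing_prompt(question, tools)).strip()
    if answer not in tools:
        raise ValueError(f"model returned unknown tool: {answer!r}")
    return answer


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs offline."""
    return "NL2NRQL"


print(pick_tool("What is my transaction count?", fake_llm))
```

Validating the model's answer against the known tool names keeps a malformed reply from being dispatched to a tool that does not exist.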
How Grok processes NL 2 NRQL requests
User question: “What is my transaction count?”
▪ Ask LLM to pick most relevant tables relating to user’s question
▪ Get schema for these tables as metadata
▪ Retrieve similar examples of Q/NRQL pairs from vector database
▪ Combine metadata, examples and user’s question into prompt
▪ Ask LLM to generate NRQL from prompt
▪ Validate query is syntactically correct
▪ If not valid: ask LLM to correct query
▪ If valid (Y): execute NRQL
▪ Pass response to LLM and ask to render natural text response
▪ Render chart and natural language response
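The generate, validate, and correct steps in this flow might look like the sketch below. The function names, the retry limit, and the toy validator are assumptions for illustration; a real validator would use an NRQL parser rather than a substring check.

```python
from typing import Callable, Optional


def generate_valid_nrql(
    prompt: str,
    generate: Callable[[str], str],
    validate: Callable[[str], Optional[str]],
    correct: Callable[[str, str], str],
    max_corrections: int = 2,
) -> str:
    """Generate a query, validate it, and ask for a correction on failure.

    `validate` returns None for a valid query, otherwise an error message.
    """
    query = generate(prompt)
    for attempt in range(max_corrections + 1):
        error = validate(query)
        if error is None:
            return query
        if attempt < max_corrections:
            query = correct(query, error)
    raise RuntimeError("no valid query after corrections")


# Toy stand-ins (assumptions) so the sketch runs without an LLM.
def toy_generate(prompt: str) -> str:
    return "SELECT count(*) FROM Transaction"


def toy_validate(query: str) -> Optional[str]:
    return None if "SINCE" in query else "missing SINCE clause"


def toy_correct(query: str, error: str) -> str:
    return query + " SINCE 1 hour ago"


result = generate_valid_nrql("What is my transaction count?",
                             toy_generate, toy_validate, toy_correct)
print(result)
```

Bounding the number of corrections is a design choice made here to keep latency and token spend predictable; the slide does not say how many retries Grok allows.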
How Grok processes NL 2 Docs requests
User question: How do I….?
▪ Convert question to embeddings using LLM
▪ Search vector database for similar embeddings
▪ Extract passages of text associated with similar embeddings
▪ Generate prompt with question and relevant text passages
▪ Pass prompt to LLM to render natural response from passages in prompt
▪ Render response to user
In-context learning with Retrieval Augmented Generation (RAG)
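A toy version of this retrieval flow is sketched below. The word-count `embed` function stands in for a real embedding model and the in-memory list stands in for a vector database; both are assumptions, as are the sample passages and the prompt wording.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages most similar to the question."""
    q = embed(question)
    return sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)[:k]


def build_prompt(question: str, passages: List[str]) -> str:
    """Put the retrieved passages and the question into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only these passages:\n{context}\nQuestion: {question}"


# Hypothetical documentation passages, for illustration only.
PASSAGES = [
    "To create an alert condition open Alerts and choose a policy",
    "NRQL queries select data from event types such as Transaction",
    "Dashboards can be shared with other users in your account",
]

top = retrieve("How do I create an alert condition", PASSAGES, k=1)
print(build_prompt("How do I create an alert condition", top))
```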
Challenges & Solutions
What we want from an AI assistant..
Natural Language Instructions → Specific Knowledge DBs, Specific Rule Interpreters, General Output Formats → Deterministic Output
Foundational LLMs
Natural Language Instructions → Generic Knowledge DBs, Generic Rule Interpreters, Generic Output Formats → Creative Output
Foundational LLM + Retrieval Augmented Generation
Natural Language Instructions + Specific Knowledge DB + Specific Rules → Generic Knowledge DBs, Generic Rule Interpreters, Generic Output Formats → * Deterministic Output
What questions do our users want to ask?
User Study
79% said they wanted to learn something about a capability or get insights from their own dataset.
Finding the right prompts (Prompt Engineering)
▪ Ongoing refinement (edge cases)
▪ Add examples to prompts (few-shot)
▪ Add rules to prompt
▪ Feedback mechanism
▪ Robust test harness
▪ ROUGE & BERTScore
▪ 2nd LLM to assess quality
<Context information>:
You are an AI assistant specialized in translating user questions into New Relic Query Language (NRQL),
with no knowledge of SQL. Given a user's question, information about the user, descriptions of event schemas,
and examples of questions and
answers, your task is to generate an appropriate NRQL query. The provided event schemas contain only the most
relevant ones and you need to use only one.
In the context of New Relic, an entity is a basic data reporting element, such as an application, host, or database
service; each entity has a unique Guid, which is a base64-encoded unique identifier; and if a user references an
entity by its Guid, you should use it in the NRQL you generate, but if entity guid is not explicitly referenced, you
should not use in the query that you will generate.
The wording of the question should tell you whether the user wants totals or data over a time interval. Use
TIMESERIES clause in the NRQL query that you generate only if the user requests data over time or per
day/hour. Otherwise, do not use it.
<How to select time range in NRQL queries>:
Every NRQL query should contain a SINCE and may contain an UNTIL clause, as this is the only viable way to
select a time range in NRQL. If the SINCE clause is not used, the query uses the last 1 hour of data by default,
but you should always use the SINCE clause in the query you generate, and if the time range is not explicitly
specified, use SINCE 1 hour ago.
<Examples of valid NRQL queries with time range selections>:
<User question>: How many transactions happened today?
<NRQL query>: SELECT count(*) FROM Transaction SINCE TODAY
<User question>: How many transactions happened on 25th of April?
<NRQL query>: FROM Transaction SELECT count(*) SINCE '2023-04-25 00:00:00' UNTIL '2023-04-25 23:59:59'
<User question>: How many transactions happened in the previous calendar week?
<NRQL query>: FROM Transaction SELECT COUNT(*) SINCE LAST WEEK UNTIL THIS WEEK
<User question>: How many transactions happened on Monday?
<NRQL query>: FROM Transaction SELECT count(*) SINCE MONDAY until TUESDAY
<User question>: How many transactions per day occurred this year until 10 days ago
<NRQL query>: FROM Transaction SELECT count(*) SINCE THIS YEAR until 10 days ago TIMESERIES 1 day
Prompt sections: System instruction, Rules, Examples, User question
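One way to assemble a prompt from those four sections is sketched below. The `build_nrql_prompt` helper and its shortened system and rules strings are illustrative assumptions; the section tags and the two example pairs are taken from the prompt shown on the slide.

```python
from typing import List, Tuple


def build_nrql_prompt(system: str, rules: str,
                      examples: List[Tuple[str, str]], question: str) -> str:
    """Join system instruction, rules, few-shot examples and the user question."""
    parts = ["<Context information>:", system, rules,
             "<Examples of valid NRQL queries with time range selections>:"]
    for example_question, nrql in examples:
        parts.append(f"<User question>: {example_question}")
        parts.append(f"<NRQL query>: {nrql}")
    # End with an open NRQL slot so the model completes the query.
    parts.append(f"<User question>: {question}")
    parts.append("<NRQL query>:")
    return "\n".join(parts)


# Shortened stand-ins for the full system instruction and rules (assumption).
SYSTEM = "You translate user questions into New Relic Query Language (NRQL)."
RULES = "Always use a SINCE clause; default to SINCE 1 hour ago."
EXAMPLES = [
    ("How many transactions happened today?",
     "SELECT count(*) FROM Transaction SINCE TODAY"),
    ("How many transactions happened on Monday?",
     "FROM Transaction SELECT count(*) SINCE MONDAY until TUESDAY"),
]

prompt = build_nrql_prompt(SYSTEM, RULES, EXAMPLES, "What is my transaction count?")
print(prompt)
```

Keeping the sections as separate inputs makes it easier to swap in examples retrieved per question without touching the system instruction or rules.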
Performance
▪ High variance
▪ Time to intermediate token
▪ Time to first token
▪ Time to last token
▪ Intermediate messages
▪ Distribute requests
▪ Cache some answers
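The "cache some answers" idea could be sketched as below. The `AnswerCache` class and its normalization rule (lowercase, collapsed whitespace) are assumptions, not a description of Grok's cache; caching probably suits documentation-style answers better than answers computed from live telemetry, which change over time.

```python
from typing import Callable, Dict


class AnswerCache:
    """Cache answers keyed by a normalized form of the question."""

    def __init__(self, answer_fn: Callable[[str], str]):
        self._answer_fn = answer_fn      # the slow path, e.g. an LLM call
        self._store: Dict[str, str] = {}
        self.misses = 0                  # how often the slow path was used

    @staticmethod
    def _key(question: str) -> str:
        # Lowercase and collapse whitespace so trivial variants share a key.
        return " ".join(question.lower().split())

    def ask(self, question: str) -> str:
        key = self._key(question)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._answer_fn(question)
        return self._store[key]


def slow_answer(question: str) -> str:
    """Stand-in for an expensive LLM request."""
    return f"answer to: {question}"


cache = AnswerCache(slow_answer)
cache.ask("How do I create an alert?")
cache.ask("how do I   create an alert?")  # served from the cache
print(cache.misses)
```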
Cost of LLM
Microsoft Azure OpenAI Service
Model | Prompt (per 1,000 tokens) | Completion (per 1,000 tokens)
GPT-4 | $0.03 | $0.06
Ada (embeddings) | $0.0001 |
Query | Avg prompt tokens | Avg completion tokens | Avg cost per e2e request
NL2Docs | 3016 | 568 | $0.13
NL2NRQL | 6516 | 118 | $0.20
New Relic Grok
Users | Daily docs requests | Daily NL2NRQL requests | Monthly cost
1 user | 5 | 5 | $49
100 users | 500 | 500 | $4,900
10,000 users | 50,000 | 50,000 | $490,000
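The arithmetic behind these figures can be checked with a short sketch. It assumes GPT-4 rates of $0.03 (prompt) and $0.06 (completion) per 1,000 tokens, which are the rates that reproduce the per-request costs in the table, and a 30-day month; the function names are illustrative. The token-only estimate lands slightly under the slide's end-to-end NL2Docs figure, which may include other calls.

```python
# Assumed GPT-4 rates in USD per 1,000 tokens (see lead-in).
GPT4_PROMPT_PER_1K = 0.03
GPT4_COMPLETION_PER_1K = 0.06


def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Token cost of one request in USD."""
    return (prompt_tokens / 1000 * GPT4_PROMPT_PER_1K
            + completion_tokens / 1000 * GPT4_COMPLETION_PER_1K)


# Average token counts from the table.
docs_cost = request_cost(3016, 568)   # NL2Docs
nrql_cost = request_cost(6516, 118)   # NL2NRQL


def monthly_cost(users: int, docs_per_day: int = 5,
                 nrql_per_day: int = 5, days: int = 30) -> float:
    """Monthly cost for a user base; 30 days per month is an assumption."""
    per_user_day = docs_per_day * docs_cost + nrql_per_day * nrql_cost
    return users * days * per_user_day


print(f"NL2Docs per request: ${docs_cost:.4f}")
print(f"NL2NRQL per request: ${nrql_cost:.4f}")
print(f"1 user per month:    ${monthly_cost(1):.2f}")
```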
What’s next for Grok?
▪ Improve quality of responses
▪ Experiment with our own models
▪ Develop a new skill (NL2Config)
Peter Marelas
Chief Architect, APJ
https://www.linkedin.com/in/peter-marelas
Deprecation of LLMs
▪ Robust Test Harness
▪ ROUGE – quantify overlap of words between generated output and reference text
▪ BERTScore – semantic similarity
▪ Use GPT-4 to evaluate ($’s)
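As a rough illustration of the word-overlap idea behind ROUGE, the sketch below computes ROUGE-1 (unigram) precision, recall, and F1 with whitespace tokenization. It is a simplified stand-in, not the official ROUGE implementation, and the sample queries are only illustrative of how a test harness might compare a new model's output against a reference.

```python
from collections import Counter
from typing import Tuple


def rouge1(candidate: str, reference: str) -> Tuple[float, float, float]:
    """Unigram overlap between candidate and reference: (precision, recall, F1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


reference = "SELECT count(*) FROM Transaction SINCE 1 hour ago"
candidate = "SELECT count(*) FROM Transaction SINCE today"
precision, recall, f1 = rouge1(candidate, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Word overlap says nothing about whether two differently worded outputs mean the same thing, which is the gap the slide's BERTScore and GPT-4 evaluation bullets address.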
LLM Rate Limits
▪ 40,000 tokens/min / 5,200 tokens per request ≈ 7 requests/min
▪ Multiple endpoints
▪ Queue
▪ Distribute requests
▪ Limit max completion tokens (counts towards token-per-minute limit)
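The rate-limit arithmetic and the "multiple endpoints" idea can be sketched as follows, reading 40,000 as a tokens-per-minute quota and 5,200 as average tokens per request (an interpretation of the slide, not something it states). The endpoint names and the round-robin policy are hypothetical.

```python
import itertools
from typing import List


def requests_per_minute(tokens_per_minute: int, tokens_per_request: int) -> int:
    """Whole requests that fit into the per-minute token quota."""
    return tokens_per_minute // tokens_per_request


class RoundRobinEndpoints:
    """Spread requests evenly over several endpoints, each with its own quota."""

    def __init__(self, endpoints: List[str]):
        self.count = len(endpoints)
        self._cycle = itertools.cycle(endpoints)

    def next_endpoint(self) -> str:
        return next(self._cycle)


per_endpoint = requests_per_minute(40_000, 5_200)
# Hypothetical endpoint names, for illustration only.
pool = RoundRobinEndpoints(["endpoint-a", "endpoint-b", "endpoint-c"])
combined = per_endpoint * pool.count

print(per_endpoint, combined)
print([pool.next_endpoint() for _ in range(4)])
```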
LLM Context Length Limits
▪ GPT-4: 8,192-token context length
▪ Prompt + completion within context length to avoid hallucinations
▪ Transform prompt
▪ Remove extra spaces
▪ Remove pronouns
▪ Convert JSON to CSV
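Two of these prompt transforms, plus the context-length check, might be sketched like this. The helper names are illustrative, and `json_to_csv` assumes the JSON is a non-empty list of flat objects that share the same keys; token counts are passed in rather than computed, since counting them needs a tokenizer.

```python
import csv
import io
import json
import re


def collapse_spaces(text: str) -> str:
    """Replace every run of whitespace with a single space."""
    return re.sub(r"\s+", " ", text).strip()


def json_to_csv(json_text: str) -> str:
    """Convert a JSON list of flat objects to CSV so keys appear only once."""
    rows = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fits_in_context(prompt_tokens: int, max_completion_tokens: int,
                    context_length: int = 8192) -> bool:
    """True if prompt plus completion budget fit in the context window."""
    return prompt_tokens + max_completion_tokens <= context_length


sample = '[{"name": "a", "count": 1}, {"name": "b", "count": 2}]'
compact = json_to_csv(sample)
print(compact)
print(len(sample), "->", len(compact), "characters")
```

CSV states each key once in the header instead of once per record, which is where the saving comes from as the number of rows grows.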
LLMs have no knowledge after 2021
▪ In-context learning
▪ Pass question + relevant docs
▪ Only as good as the algorithm used to find relevant docs
▪ Cross-encoder re-ranking
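Re-ranking takes the candidates returned by the first-pass search and re-scores each one together with the question. The sketch below shows only that control flow: `overlap_score` is a toy stand-in for a real cross-encoder model, and the function names and sample documents are assumptions.

```python
from typing import Callable, List


def rerank(question: str, candidates: List[str],
           score_pair: Callable[[str, str], float], top_n: int = 3) -> List[str]:
    """Score each (question, document) pair jointly and keep the best top_n."""
    scored = [(score_pair(question, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def overlap_score(question: str, doc: str) -> float:
    """Toy scorer: fraction of question words that appear in the document.

    A real cross-encoder would feed both texts through one model instead.
    """
    q_words = set(question.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


candidates = [
    "dashboards show charts",
    "how alerts notify you",
    "set up alerts in the alerts ui",
]
best = rerank("how do i set up alerts", candidates, overlap_score, top_n=2)
print(best)
```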
How are similar documents / examples found?
Indexing: doc → passages → Text Embedding (convert) → [0.354, 0.234, … , 0.87] (1536 dimensions) → Vector DB (store), indexed by embedding
Search: text → Text Embedding (convert) → [0.354, 0.234, … , 0.87] (1536 dimensions) → Vector DB (search) → Return TopK (maximal marginal relevance)
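Maximal marginal relevance (MMR) picks results that are relevant to the query but not redundant with results already chosen. A minimal pure-Python sketch follows; the two-dimensional vectors and the `lam` trade-off value are illustrative assumptions, whereas the slide's embeddings have 1536 dimensions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query: Sequence[float], docs: List[Sequence[float]],
        k: int, lam: float = 0.5) -> List[int]:
    """Return indices of k docs, trading relevance against redundancy.

    lam=1.0 ranks purely by relevance; lower values favour diversity.
    """
    selected: List[int] = []
    remaining = list(range(len(docs)))

    def score(i: int) -> float:
        relevance = cosine(query, docs[i])
        redundancy = max((cosine(docs[i], docs[j]) for j in selected),
                         default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 1.0]
docs = [[1.0, 0.9], [1.0, 0.8], [0.0, 1.0]]  # docs 0 and 1 are near-duplicates

diverse = mmr(query, docs, k=2, lam=0.5)
relevance_only = mmr(query, docs, k=2, lam=1.0)
print(diverse, relevance_only)
```

With `lam=0.5` the second pick skips the near-duplicate of the first result in favour of a less similar document, which helps keep retrieved prompt context from repeating itself.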
What tools do we use?
▪ LLM
▪ Vector Database
▪ LLM Logic
