Cheat Sheet: What Most Teams Miss When Building with LLMs

Lesson 2 now free: RAG, Structured Outputs, Fine-Tuning

Everyone starts with prompts.

But if you've ever built beyond a toy project, you've probably hit this wall:

The model sounds fluent but the answers are off.

The fix? It’s not always fine-tuning. In fact, it’s almost never the first step.

That’s exactly what we walk through in Session 2 of our 10-Hour LLM Video Primer, now free to watch.

🎥 Watch the full session

Too busy to sit through two hours? Here's the distilled cheat sheet:

LLM Stack Cheat Sheet: What to Use, When

1. Prompting: Your starting point

Shape behavior and logic using well-structured prompts (a minimal sketch follows this list).

  • Start with: Zero-shot, few-shot, instruction formatting
  • Use when: General tasks, exploration, lightweight workflows
  • Next step if it fails: Move to RAG, not fine-tuning
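
A minimal sketch of few-shot prompting, assuming the OpenAI Python SDK; the model name and sentiment examples are placeholders, and any chat-style API works the same way:

```python
# Few-shot sentiment classification via the OpenAI chat API.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

messages = [
    {"role": "system", "content": "Classify each review as positive or negative. Reply with one word."},
    # Few-shot examples: show the exact input/output format you expect.
    {"role": "user", "content": "Review: The battery died within a week."},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Review: Setup took two minutes and it just works."},
    {"role": "assistant", "content": "positive"},
    # The real query follows the same format as the examples.
    {"role": "user", "content": "Review: Support never answered my ticket."},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)  # expected: "negative"
```

The pattern generalizes: show two or three input/output pairs in the exact format you want, then append the real query.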

2. RAG: Inject real, dynamic knowledge

Ground your model in external information it wasn’t trained on (a retrieval sketch follows this list).

  • Tools: LangChain, LlamaIndex, vector DBs (FAISS, Pinecone, Chroma)
  • Core strategies: Chunking, embedding, retrieval, reranking
  • Why it matters: Reduces hallucinations and brings domain context
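
To make the retrieval step concrete, a minimal sketch assuming LangChain with FAISS and OpenAI embeddings (tools from the list above); the documents and query are invented:

```python
# Minimal RAG: embed documents, retrieve by similarity, answer from context.
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

docs = [
    "Our refund window is 30 days from delivery.",
    "Enterprise plans include SSO and audit logs.",
]

# Embed the documents once and index them for similarity search.
vectorstore = FAISS.from_texts(docs, OpenAIEmbeddings())

query = "How long do customers have to request a refund?"
context = vectorstore.similarity_search(query, k=1)[0].page_content

# Ground the answer in retrieved text instead of the model's training data.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(ChatOpenAI(model="gpt-4o-mini").invoke(prompt).content)
```

The key design choice: the model answers from retrieved text, so updating knowledge means re-indexing documents, not retraining.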

3. Structured Outputs: Make answers reliable

Turn freeform generation into predictable, parsable formats (a validation sketch follows this list).

  • Use when: Your system depends on clean integration or automation
  • Techniques: JSON mode, function calling, constrained decoding, schema validation
  • Tools: Outlines, Pydantic (Python), Zod (JS/TS)
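
A sketch of the validation half, assuming Pydantic v2; the Invoice schema and the raw string standing in for a model response are hypothetical:

```python
# Validate LLM output against a typed schema before anything downstream uses it.
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):
    vendor: str
    total: float
    currency: str

# Imagine `raw` is what the model returned when asked for JSON matching Invoice.
raw = '{"vendor": "Acme Corp", "total": 1280.5, "currency": "USD"}'

try:
    invoice = Invoice.model_validate_json(raw)  # parses and type-checks in one step
    print(invoice.total)  # 1280.5 as a float, not a string
except ValidationError as err:
    # A failed parse becomes a signal to retry or repair the model's output.
    print("Output did not match the schema:", err)
```

A ValidationError becomes a programmatic retry signal rather than a silent downstream failure.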

4. Fine-Tuning: Only when everything else falls short

Use it for narrow tasks, tone control, or domain-specific behavior, and only if you already have high-quality data (a LoRA sketch follows this list).

  • Approaches: SFT, LoRA, QLoRA, RLHF, DPO, GRPO
  • Use cases: Tone and persona control, domain-specific formats, consistent classification or extraction
  • Consider: Time, compute, eval pipeline; ROI must be clear
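
For a sense of what LoRA involves, a minimal adapter-setup sketch assuming Hugging Face transformers and peft; gpt2 and the hyperparameters are illustrative, and the training loop is omitted:

```python
# Wrap a base model with LoRA adapters; only the small adapter matrices train.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the updates
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Because only the adapters train, the compute and data bar is far lower than full fine-tuning, but the eval-pipeline and ROI questions above still apply.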

Bonus: Real-World Infrastructure

These aren't extras — they're essentials once you ship:

  • Evaluation: Use BLEU, ROUGE, perplexity, plus human-in-the-loop tests. Measure continuously (quick sketch after this list).
  • Cost & Latency: Use context caching (sometimes called cache-augmented generation, CAG) to avoid paying for redundant tokens; supported natively by the OpenAI and Gemini APIs
  • Tool Orchestration: Chain LLMs with APIs, agents, and conditional logic
  • Model Selection: Match model size, cost, and latency to the task; don’t default to the largest model
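
For the evaluation bullet above, a quick automated-metric sketch assuming Hugging Face's evaluate library; the prediction/reference pair is invented:

```python
# Score generations against references with ROUGE.
import evaluate

rouge = evaluate.load("rouge")

predictions = ["The refund window is 30 days."]
references = ["Customers may request refunds within 30 days of delivery."]

print(rouge.compute(predictions=predictions, references=references))
# e.g. {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```

Automated scores like these are cheap enough to run on every release; pair them with human-in-the-loop checks for the quality issues the metrics miss.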

We’ve expanded this material into a full 10-hour course covering the entire production pipeline, built for developers and builders working on real-world LLM applications. In the next sessions, you’ll walk through:

  • Evaluating LLMs with automated metrics (BLEU, ROUGE, perplexity) and human-in-the-loop testing
  • Understanding agent workflows, tool use, orchestration, and how to manage cost/latency trade-offs
  • Applying core optimization and safety practices like quantization, distillation, RLHF, and injection mitigation

By the end, you’ll know how to build, evaluate, automate, and maintain LLM systems that hold up in production, not just in a notebook.

“Outstanding resource to master LLM development.”
“Helped me debug and design with confidence.”
“Gave me the mental model I didn’t know I was missing.”

The full course is available now at launch pricing ($199).

Check it out here

Or watch lesson 2 for free

P.S. If you missed it, lesson 1 is also still free.
