LAI #86: LLM Gaps, Agent Design, and Smarter Semantic Caching
Open-source eval tools, ReAct-based agents, vector databases, and why model weaknesses are often where the best tools get built.
Good morning, AI enthusiasts,
This week’s issue focuses on a recurring theme: most breakthroughs don’t happen in spite of model limitations; they happen because of them. In What’s AI, we break down key LLM weaknesses (reasoning, memory, retrieval) and explore how smart tooling, such as ReAct agents, structured search, and caching layers, turns those gaps into new capabilities.
We also highlight a fully open-source eval toolkit from the community, a tutorial on building fast local agents, a deep dive on vector DBs, and a PhD-led research survey on how real people interact with AI. If you’ve ever found yourself duct-taping fixes around your favorite model, this issue is for you.
Let’s get into it.
What’s AI Weekly
This week in What’s AI, I dive into something we have been discussing since the rise of LLMs: their limitations. This time, though, I don’t just highlight these weaknesses; I also walk you through what you can actually do about them, and how each of these gaps is an opportunity for anyone who wants to build on top of LLMs. Read the article to find out how you can make LLMs more reliable, or watch the video on YouTube.
— Louis-François Bouchard, Towards AI Co-founder & Head of Community
Learn AI Together Community Section!
Featured Community post from the Discord
Cyber.crat2711 has created an evaluation stack designed to help GenAI teams measure and optimize their LLM pipelines with minimal overhead. It ships with dozens of evaluation templates covering safety, summarization, retrieval, behavior, and structure, plus a wide spectrum of metrics across text, image, and audio modalities. Check it out on GitHub and support a fellow community member. If you have any questions or feedback, share them in the thread!
AI poll of the week!
This week, we’re doing something a little different. We’re helping out a community member who’s researching how people interact with AI. She's running a short, anonymous survey (just 6 minutes) to better understand user expectations of generative AI.
If you:
…you’re exactly who she’s looking for. Start the survey here!
Collaboration Opportunities
The Learn AI Together Discord community is full of collaboration opportunities. If you are excited to dive into applied AI, want a study partner, or even want to find a partner for your passion project, join the collaboration channel! Keep an eye on this section, too; we share cool opportunities every week!
1. Superuser666_sigil began work on an AI project called SigilDERG and has landed a Lambda AI Research Grant to take the next step: bootstrapping a training Codex starting with the Rust ecosystem. Currently, he is looking for people who can help with crate analysis / Rust OSS surfacing, metadata scraping & enrichment, building structured AI training sets (Rust-focused), pipeline logic (Python, async, llama.cpp, ONNX), and model evaluation/scoring logic (or even rule-based IRL scaffolds). If this sounds interesting, contact him in the thread!
2. Teddybrown117_45661 has built several automation projects, including a voice agent for WhatsApp, Facebook, and Instagram, a chatbot for patients, and a regular news scraper and reporter. He has more in the pipeline and is looking for collaborators. If this sounds like a relevant opportunity, reach out in the thread!
3. Llsmokell is new to AI and is looking for others who are also learning. If you are open to learning together, sharing resources, and exploring different areas of AI, feel free to connect in the thread!
Meme of the week!
Meme shared by superuser666_sigil
TAI Curated Section
Article of the week
For applications requiring efficiency and on-device deployment, Small Language Models (SLMs) present a compelling alternative to their larger counterparts. This analysis details the advantages of SLMs, such as lower operational costs and improved privacy, and covers the architectural and compression techniques used in their development, including pruning and knowledge distillation. It further argues that SLMs are well suited to agentic AI systems thanks to their economic benefits and task-specific performance, and presents a practical algorithm for migrating agentic systems from resource-intensive LLMs to more efficient, specialized SLMs.
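To make one of those compression techniques concrete, here is a minimal, self-contained PyTorch sketch of knowledge distillation. It is not the article’s code; the temperature and loss-weighting values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend the soft-target KL loss (teacher) with standard cross-entropy (labels)."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # T^2 rescaling keeps gradient magnitudes comparable across temperatures
    kl = F.kl_div(soft_student, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1 - alpha) * ce

# Usage: compute the loss on a batch, then backprop through the student only
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)  # frozen teacher outputs in practice
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```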
Our must-read articles
This article presents a method for integrating Large Language Models (LLMs) and Graph Neural Networks (GNNs). The process starts with an LLM extracting entities and relationships from unstructured text to build a knowledge graph. This graph is then converted into a numerical, GNN-compatible format using libraries like PyTorch Geometric. A case study analyzing Wikipedia articles demonstrates how a GNN can be trained on this structure to predict links and generate embeddings. These embeddings capture complex semantic and structural patterns that are not apparent from text analysis alone, showing a way to enhance AI reasoning.
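As a rough illustration of the text-to-graph step, the sketch below turns a handful of LLM-extracted triples into a PyTorch Geometric Data object. The triples and random node features are placeholders, not the article’s dataset; in practice, the features would be text embeddings.

```python
import torch
from torch_geometric.data import Data

# Placeholder triples standing in for LLM-extracted entities and relations
triples = [("Ada Lovelace", "collaborated_with", "Charles Babbage"),
           ("Charles Babbage", "designed", "Analytical Engine")]

# Map each unique entity to an integer node id
entities = sorted({e for h, _, t in triples for e in (h, t)})
idx = {name: i for i, name in enumerate(entities)}

# Build the GNN-compatible edge list (shape: 2 x num_edges)
edge_index = torch.tensor(
    [[idx[h] for h, _, _ in triples],
     [idx[t] for _, _, t in triples]], dtype=torch.long)

# Random node features as a stand-in for real embeddings
x = torch.randn(len(entities), 16)

graph = Data(x=x, edge_index=edge_index)
print(graph)  # Data(x=[3, 16], edge_index=[2, 2])
```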
A practical guide for building ReAct agents with LangGraph demonstrates how to create AI workflows that merge reasoning with action. It explains how to structure a graph where nodes represent model calls or tool executions. The process starts with a simple model, then integrates a custom tool, and establishes a loop for the agent to decide when to act. To improve functionality, a feedback mechanism is added for more conversational responses, and memory is implemented using a checkpointer. This enables the agent to maintain context and recall information from previous interactions in a conversation.
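Here is a minimal sketch of that loop using LangGraph’s prebuilt ReAct helper, with a MemorySaver checkpointer for conversational memory; the model, tool, and thread ID below are illustrative assumptions rather than the article’s exact setup.

```python
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver

@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

# The checkpointer lets the agent recall earlier turns in the same thread
agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"),
                           tools=[word_count],
                           checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "demo"}}
result = agent.invoke(
    {"messages": [("user", "How many words are in this sentence?")]}, config)
print(result["messages"][-1].content)
```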
3. Stop Wasting LLM Tokens: Build a Smart Semantic Cache with FAISS + HuggingFace By Sai Bhargav Rallapalli
To address latency and high costs in LLM applications, this article details the construction of a semantic cache using FAISS and HuggingFace. The technique avoids redundant API calls by storing previous query-answer pairs as vector embeddings, then uses similarity search to serve cached answers for new, semantically similar questions without hitting the LLM. The article provides a step-by-step implementation guide, including cache-expiration logic to keep data fresh, discusses suitable use cases such as chatbots and RAG systems, and weighs the benefits against potential drawbacks.
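The core cache logic fits in a few lines. In this sketch, the embedding model and the 0.9 similarity threshold are assumptions, and the article’s expiration logic is omitted for brevity.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
index = faiss.IndexFlatIP(384)  # inner product on unit vectors = cosine similarity
cached_answers: list[str] = []

def embed(text: str) -> np.ndarray:
    return np.asarray(
        embedder.encode([text], normalize_embeddings=True), dtype="float32")

def lookup(query: str, threshold: float = 0.9):
    """Return a cached answer if a semantically similar query was seen before."""
    if index.ntotal == 0:
        return None
    scores, ids = index.search(embed(query), 1)
    return cached_answers[ids[0][0]] if scores[0][0] >= threshold else None

def store(query: str, answer: str) -> None:
    index.add(embed(query))
    cached_answers.append(answer)

# Usage: check the cache first and only call the LLM on a miss
question = "What is semantic caching?"
answer = lookup(question)
if answer is None:
    answer = "..."  # placeholder for the real LLM API call
    store(question, answer)
```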
4. Building Smart Agents: LangGraph + Perplexity with Memory for Developers By Sai Bhargav Rallapalli
Leveraging Perplexity AI's Sonar models and the LangGraph library, this guide details how to build a smart agent with conversational memory. It explains the process of setting up a stateful graph using a MemorySaver to retain interaction history, managed by a unique user ID. The author presents the core logic, including a customizable system prompt to define agent behavior, and demonstrates how to expose this functionality through a FastAPI endpoint. The result is a functional, context-aware agent ready for interaction and further development, making it a practical tutorial for developers.
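A minimal sketch of that endpoint might look like the following; the model name, system prompt, and route shape are assumptions, not the author’s exact code.

```python
from fastapi import FastAPI
from pydantic import BaseModel
from langchain_community.chat_models import ChatPerplexity
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver

SYSTEM_PROMPT = "You are a concise research assistant."  # customizable behavior

# MemorySaver persists each conversation, keyed by thread_id
agent = create_react_agent(ChatPerplexity(model="sonar"), tools=[],
                           checkpointer=MemorySaver())

app = FastAPI()

class ChatRequest(BaseModel):
    user_id: str
    message: str

@app.post("/chat")
def chat(req: ChatRequest):
    # The user_id doubles as the LangGraph thread_id, so each user keeps
    # their own interaction history across requests
    config = {"configurable": {"thread_id": req.user_id}}
    result = agent.invoke(
        {"messages": [("system", SYSTEM_PROMPT), ("user", req.message)]}, config)
    return {"reply": result["messages"][-1].content}
```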
If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards.