Last week, I described four design patterns for AI agentic workflows that I believe will drive significant progress: Reflection, Tool use, Planning, and Multi-agent collaboration. Instead of having an LLM generate its final output directly, an agentic workflow prompts the LLM multiple times, giving it opportunities to build step by step to higher-quality output. Here, I'd like to discuss Reflection. It's relatively quick to implement, and I've seen it lead to surprising performance gains.

You may have had the experience of prompting ChatGPT/Claude/Gemini, receiving unsatisfactory output, delivering critical feedback to help the LLM improve its response, and then getting a better response. What if you automate the step of delivering critical feedback, so the model automatically criticizes its own output and improves its response? This is the crux of Reflection.

Take the task of asking an LLM to write code. We can prompt it to generate the desired code directly to carry out some task X. Then, we can prompt it to reflect on its own output, perhaps as follows:

"Here’s code intended for task X: [previously generated code]
Check the code carefully for correctness, style, and efficiency, and give constructive criticism for how to improve it."

Sometimes this causes the LLM to spot problems and come up with constructive suggestions. Next, we can prompt the LLM with context including (i) the previously generated code and (ii) the constructive feedback, and ask it to use the feedback to rewrite the code. This can lead to a better response. Repeating the criticism/rewrite process might yield further improvements.

This self-reflection process allows the LLM to spot gaps and improve its output on a variety of tasks including producing code, writing text, and answering questions. And we can go beyond self-reflection by giving the LLM tools that help evaluate its output; for example, running its code through a few unit tests to check whether it generates correct results on test cases, or searching the web to double-check text output. Then it can reflect on any errors it found and come up with ideas for improvement.

Further, we can implement Reflection using a multi-agent framework. I've found it convenient to create two agents, one prompted to generate good outputs and the other prompted to give constructive criticism of the first agent's output. The resulting discussion between the two agents leads to improved responses.

Reflection is a relatively basic type of agentic workflow, but I've been delighted by how much it improved my applications’ results. If you’re interested in learning more about Reflection, I recommend:
- Self-Refine: Iterative Refinement with Self-Feedback, by Madaan et al. (2023)
- Reflexion: Language Agents with Verbal Reinforcement Learning, by Shinn et al. (2023)
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing, by Gou et al. (2024)

[Original text: https://lnkd.in/g4bTuWtU ]
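To make the critique/rewrite loop above concrete, here is a minimal Python sketch. It assumes a generic `llm(prompt)` callable standing in for whatever chat-completion API you use; the prompts and the fixed number of rounds are illustrative choices, not part of the original post.

```python
def reflect_and_revise(llm, task: str, rounds: int = 2) -> str:
    """Generate code for `task`, then alternate critique and rewrite prompts.

    `llm` is any callable mapping a prompt string to a completion string
    (e.g. a thin wrapper around your chat API); it is assumed, not provided here.
    """
    draft = llm(f"Write code to accomplish the following task:\n{task}")
    for _ in range(rounds):
        # Ask the model to criticize its own output.
        critique = llm(
            f"Here's code intended for task: {task}\n\n{draft}\n\n"
            "Check the code carefully for correctness, style, and efficiency, "
            "and give constructive criticism for how to improve it."
        )
        # Feed the previous code plus the critique back in and ask for a rewrite.
        draft = llm(
            f"Task: {task}\n\nPrevious code:\n{draft}\n\n"
            f"Reviewer feedback:\n{critique}\n\n"
            "Rewrite the code, addressing the feedback."
        )
    return draft
```

Swapping the second prompt for a separate "critic" agent gives the two-agent variant described in the post.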
Self-Updating Feedback Loops in LLM Protocols
Explore top LinkedIn content from expert professionals.
Summary
Self-updating feedback loops in LLM protocols are systems that allow large language models (LLMs) to automatically review and improve their own outputs in real time, learning from mistakes and adapting without needing constant human oversight. This process helps LLMs become more reliable and smarter over time by using their own "reflection" and experience to guide future responses.
- Encourage self-critique: Set up workflows where your AI models can analyze their own responses and make improvements based on detected errors or weaknesses.
- Automate revision cycles: Use tools that let models repeatedly refine their outputs, storing feedback and progress to gradually increase accuracy and quality.
- Monitor and filter learning: Regularly review the self-generated feedback to prevent models from reinforcing unwanted behavior and to ensure they stay aligned with safety guidelines.
-
For years, fine-tuning LLMs has required large amounts of data and human oversight. Small improvements can disrupt existing systems, requiring humans to go through and flag errors in order to fit the model to pre-existing workflows. This might work for smaller use cases, but it is clearly unsustainable at scale.

However, recent research suggests that everything may be about to change. I have been particularly excited about two papers, from Anthropic and the Massachusetts Institute of Technology, which propose new methods that enable LLMs to reflect on their own outputs and refine performance without waiting for humans. Instead of passively waiting for correction, these models create an internal feedback loop, learning from their own reasoning in a way that could match, or even exceed, traditional supervised training in certain tasks.

If these approaches mature, they could fundamentally reshape enterprise AI adoption. From chatbots that continually adjust their tone to better serve customers to research assistants that independently refine complex analyses, the potential applications are vast. In today’s AI Atlas, I explore how these breakthroughs work, where they could make the most immediate impact, and what limitations we still need to overcome.
-
Excited to announce my new (free!) white paper: “Self-Improving LLM Architectures with Open Source” – the definitive guide to building AI systems that continuously learn and adapt. If you’re curious how Large Language Models can critique, refine, and upgrade themselves in real time using fully open source tools, this is the resource you’ve been waiting for.

I’ve put together a comprehensive deep dive on:
- Foundation Models (Llama 3, Mistral, Google Gemma, Falcon, MPT, etc.): How to pick the right LLM as your base and unlock reliable instruction-following and reasoning capabilities.
- Orchestration & Workflow (LangChain, LangGraph, AutoGen): Turn your model into a self-improving machine with step-by-step self-critiques and automated revisions.
- Knowledge Storage (ChromaDB, Qdrant, Weaviate, Neo4j): Seamlessly integrate vector and graph databases to store semantic memories and advanced knowledge relationships.
- Self-Critique & Reasoning (Chain-of-Thought, Reflexion, Constitutional AI): Empower LLMs to identify errors, refine outputs, and tackle complex reasoning by exploring multiple solution paths.
- Evaluation & Feedback (LangSmith Evals, RAGAS, W&B): Monitor and measure performance continuously to guide the next cycle of improvements.
- ML Algorithms & Fine-Tuning (PPO, DPO, LoRA, QLoRA): Transform feedback into targeted model updates for faster, more efficient improvements—without catastrophic forgetting.
- Bias Amplification: Discover open source strategies for preventing unwanted biases from creeping in as your model continues to adapt.

In this white paper, you’ll learn how to:
- Architect a complete self-improvement workflow, from data ingestion to iterative fine-tuning.
- Deploy at scale with optimized serving (vLLM, Triton, TGI) to handle real-world production needs.
- Maintain alignment with human values and ensure continuous oversight to avoid rogue outputs.

Ready to build the next generation of AI? Download the white paper for free and see how these open source frameworks come together to power unstoppable, ever-learning LLMs. Drop a comment below or send me a DM for the link! Let’s shape the future of AI—together.

#AI #LLM #OpenSource #SelfImproving #MachineLearning #LangChain #Orchestration #VectorDatabases #GraphDatabases #SelfCritique #BiasMitigation #Innovation #aiagents
-
Most LLM agents stop learning after fine-tuning. They can replay expert demos but can’t adapt when the world changes. That’s because we train them with imitation learning—they copy human actions without seeing what happens when they fail. It’s reward-free but narrow. The next logical step, reinforcement learning, lets agents explore and learn from rewards, yet in real settings (e.g. websites, APIs, operating systems) reliable rewards rarely exist or appear too late. RL becomes unstable and costly, leaving LLMs stuck between a method that can’t generalize and one that can’t start.

Researchers from Meta and Ohio State propose a bridge called Early Experience. Instead of waiting for rewards, agents act, observe what happens, and turn those future states into supervision. It’s still reward-free but grounded in real consequences. They test two ways to use this data:
1. Implicit World Modeling: for every state–action pair, predict the next state. The model learns how the world reacts—what actions lead where, what failures look like.
2. Self-Reflection: sample a few alternative actions, execute them, and ask the model to explain in language why the expert’s move was better. These reflections become new training targets, teaching decision principles that transfer across tasks.

Across eight benchmarks, from home simulations and science labs to APIs, travel planning, and web navigation, both methods beat imitation learning. In WebShop, success jumped from 42% to 60%; in long-horizon planning, gains reached 15 points. When later fine-tuned with RL, these checkpoints reached higher final performance and needed half (or even one-eighth) of the expert data. The gains held from 3B to 70B-parameter models.

To use this yourself, here is what you need to do (see the sketch after this post):
• Log each interaction and store a short summary of the next state: success, error, or side effect.
• Run a brief next-state prediction phase before your normal fine-tune so the model learns transitions.
• Add reflection data: run two to four alternative actions, collect the results, and prompt the model to explain why the expert step was better. Train on those reflections plus the correct action.
• Keep compute constant: replace part of imitation learning, don't add more.

This approach makes agent training cheaper, less dependent on scarce expert data, and more adaptive. As models learn from self-generated experience, the skill barrier for building capable agents drops dramatically. In my opinion, the new challenge is governance and ensuring agents don’t learn the wrong lessons. That means filtering unsafe traces, constraining environments to safe actions, and auditing reflections before they become training data.

When rewards are scarce and demonstrations costly, let the agent learn from what it already has: its own experience. That shift turns LLMs from static imitators into dynamic learners and moves us closer to systems that truly improve through interaction, safely and at scale.
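As a rough illustration of the two data-construction strategies above, here is a minimal Python sketch. The `env.step`, `policy.sample_action`, and `llm_explain` interfaces are assumptions made for the example; the paper's actual data formats and pipelines will differ.

```python
def build_early_experience(env, policy, llm_explain, expert_traj, k_alts: int = 3):
    """Reward-free data construction sketched from the two strategies described above.

    `env.step(state, action) -> next_state`, `policy.sample_action(state) -> action`,
    and `llm_explain(...) -> str` are assumed interfaces, not a specific library's API.
    """
    world_model_data, reflection_data = [], []
    for state, expert_action in expert_traj:
        # 1) Implicit world modeling: learn to predict the next state of executed actions.
        next_state = env.step(state, expert_action)
        world_model_data.append(
            {"state": state, "action": expert_action, "next_state": next_state}
        )

        # 2) Self-reflection: execute a few alternative actions, observe their outcomes,
        #    and have the model explain in language why the expert's move was better.
        alternatives = [policy.sample_action(state) for _ in range(k_alts)]
        outcomes = [env.step(state, a) for a in alternatives]
        rationale = llm_explain(state, expert_action, list(zip(alternatives, outcomes)))
        reflection_data.append(
            {"state": state, "rationale": rationale, "action": expert_action}
        )
    return world_model_data, reflection_data
```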
-
Recursive Introspection: an LLM fine-tuning approach that teaches models how to self-improve.

LLMs usually do not exhibit the ability to continually improve their responses sequentially, even in scenarios where they are explicitly told that they are making a mistake. In this paper, the authors present RISE (Recursive IntroSpEction), an iterative fine-tuning procedure that attempts to teach the model how to alter its response after previously unsuccessful attempts to solve a hard test-time problem, optionally with additional environment feedback.

𝗥𝗜𝗦𝗘 𝗔𝗽𝗽𝗿𝗼𝗮𝗰𝗵 𝗢𝘃𝗲𝗿𝘃𝗶𝗲𝘄
i) Problem formulation: convert single-turn problems into multi-turn Markov decision processes (MDPs).
- The state is given by the prompt, the history of prior attempts, and optional feedback from the environment.
- An action is a response generated by the LLM given the state of the multi-turn interaction so far.
ii) Data collection: unroll the current model k − 1 times, followed by an improved version of the response, obtained by either (1) self-distillation: sample multiple responses from the current model and use the best one, or (2) distillation: obtain oracle responses by querying a more capable model. In either case, RISE then trains on the generated data.

𝗥𝗜𝗦𝗘 𝗜𝗻𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗮𝘁 𝗗𝗲𝗽𝗹𝗼𝘆𝗺𝗲𝗻𝘁 𝗧𝗶𝗺𝗲
i) With an oracle: each time the model improves its response, it is allowed to check its answer against the environment and terminate early as soon as a correct answer is found.
ii) Without an oracle: ask the model to sequentially revise its own responses j times, and perform majority voting over all candidate outputs from the different turns to obtain the final response. If the turn number j is larger than the iteration number k, the agent keeps only the most recent k interactions of history to avoid test-time distribution shift.

𝗘𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻
i) Metrics used:
- With oracle, “p1@t5”: the rollout terminates as soon as the response is correct.
- Without oracle, “m1@t5”: the rollout is not terminated before five turns, and maj@1 performance is computed over the candidates produced in each turn.
ii) Results:
- RISE attains the biggest performance improvement between 1-turn (m5@t1) and 5-turn (m1@t5) performance without an oracle on both GSM8K and MATH.
- Prompting-only self-refine largely degrades performance across the board.
- Using RISE on top of Mistral-7B exceeds even state-of-the-art math models such as Eurus-7B-SFT.

𝗟𝗶𝗺𝗶𝘁𝗮𝘁𝗶𝗼𝗻𝘀
- Improving with self-generated supervision will likely require more computation and more iterations, since it is slower than using an off-the-shelf expert model.
- RISE requires running manual iterations; a more “online” variant of RISE is the likely solution in the long run.

𝗕𝗹𝗼𝗴: https://lnkd.in/eAcCi99S
𝗣𝗮𝗽𝗲𝗿: https://lnkd.in/eP8VwHrz
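For illustration, here is a small Python sketch of the oracle-free deployment procedure described above (sequential self-revision plus majority voting). The `model` callable and the answer-extraction helper are assumptions; the exact prompts and parsing are not specified in the post.

```python
from collections import Counter

def extract_final_answer(response: str) -> str:
    # Assumed helper: real parsing depends on your answer format (e.g. a "#### ..." line).
    return response.strip().splitlines()[-1]

def rise_infer_without_oracle(model, problem: str, turns: int = 5) -> str:
    """Oracle-free inference in the spirit of the "m1@t5" setup described above.

    `model` is an assumed callable taking the multi-turn history
    (a list of (role, text) pairs) and returning the next response string.
    """
    history = [("user", problem)]
    candidates = []
    for _ in range(turns):
        response = model(history)
        candidates.append(extract_final_answer(response))
        # Ask the model to revise its own response on the next turn.
        history.append(("assistant", response))
        history.append(("user", "Your previous answer may be incorrect. Please revise it."))
    # Majority vote over the candidate answers produced at each turn.
    return Counter(candidates).most_common(1)[0][0]
```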
-
Tired of your LLM just repeating the same mistakes when retries fail? Simple retry strategies often just multiply costs without improving reliability when models fail in consistent ways.

You've built validation for structured LLM outputs, but when validation fails and you retry the exact same prompt, you're essentially asking the model to guess differently. Without feedback about what went wrong, you're wasting compute and adding latency while hoping for random success. A smarter approach feeds errors back to the model, creating a self-correcting loop.

Effective AI Engineering #13: Error Reinsertion for Smarter LLM Retries 👇

The Problem ❌
Many developers implement basic retry mechanisms that blindly repeat the same prompt after a failure:
[Code example - see attached image]
Why this approach falls short:
- Wasteful Compute: Repeatedly sending the same prompt when validation fails just multiplies costs without improving the chances of success.
- Same Mistakes: LLMs tend to be consistent; if they misunderstand your requirements the first time, they'll likely make the same errors on retry.
- Longer Latency: Users wait through multiple failed attempts with no adaptation strategy.
- No Learning Loop: The model never receives feedback about what went wrong, missing the opportunity to improve.

The Solution: Error Reinsertion for Adaptive Retries ✅
A better approach is to reinsert error information into subsequent retry attempts, giving the model context to improve its response:
[Code example - see attached image]
Why this approach works better:
- Adaptive Learning: The model receives feedback about specific validation failures, allowing it to correct its mistakes.
- Higher Success Rate: By feeding error context back to the model, retry attempts become increasingly likely to succeed.
- Resource Efficiency: Instead of hoping for random variation, each retry has a higher probability of success, reducing the overall attempt count.
- Improved User Experience: Faster resolution of errors means less waiting for valid responses.

The Takeaway
Stop treating LLM retries as mere repetition and implement error reinsertion to create a feedback loop. By telling the model exactly what went wrong, you create a self-correcting system that improves with each attempt. This approach makes your AI applications more reliable while reducing unnecessary compute and latency.
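Since the attached code images are not reproduced here, the following is a minimal sketch of the error-reinsertion pattern, assuming a generic `llm(prompt)` callable and a `validate(output)` function that returns a list of error messages (empty when the output passes).

```python
def generate_with_error_reinsertion(llm, prompt: str, validate, max_attempts: int = 3) -> str:
    """Retry loop that feeds validation errors back into the next attempt,
    rather than blindly resending the same prompt.

    Assumed interfaces: `llm(prompt) -> str` and `validate(output) -> list[str]`
    (an empty list means the output is acceptable).
    """
    feedback = ""
    errors: list[str] = []
    for attempt in range(max_attempts):
        output = llm(prompt + feedback)
        errors = validate(output)
        if not errors:
            return output
        # Reinsert the specific failures so the next attempt can correct them.
        feedback = (
            "\n\nYour previous response:\n" + output +
            "\n\nIt failed validation for these reasons:\n- " + "\n- ".join(errors) +
            "\nPlease fix these issues and respond again."
        )
    raise ValueError(f"No valid output after {max_attempts} attempts; last errors: {errors}")
```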
-
Most Retrieval-Augmented Generation (RAG) pipelines today stop at a single task — retrieve, generate, and respond. That model works, but it’s 𝗻𝗼𝘁 𝗶𝗻𝘁𝗲𝗹𝗹𝗶𝗴𝗲𝗻𝘁. It doesn’t adapt, retain memory, or coordinate reasoning across multiple tools. That’s where 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗔𝗜 𝗥𝗔𝗚 changes the game.

𝗔 𝗦𝗺𝗮𝗿𝘁𝗲𝗿 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲 𝗳𝗼𝗿 𝗔𝗱𝗮𝗽𝘁𝗶𝘃𝗲 𝗥𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴
In a traditional RAG setup, the LLM acts as a passive generator. In an 𝗔𝗴𝗲𝗻𝘁𝗶𝗰 𝗥𝗔𝗚 system, it becomes an 𝗮𝗰𝘁𝗶𝘃𝗲 𝗽𝗿𝗼𝗯𝗹𝗲𝗺-𝘀𝗼𝗹𝘃𝗲𝗿 — supported by a network of specialized components that collaborate like an intelligent team. Here’s how it works:
- 𝗔𝗴𝗲𝗻𝘁 𝗢𝗿𝗰𝗵𝗲𝘀𝘁𝗿𝗮𝘁𝗼𝗿 — The decision-maker that interprets user intent and routes requests to the right tools or agents. It’s the core logic layer that turns a static flow into an adaptive system.
- 𝗖𝗼𝗻𝘁𝗲𝘅𝘁 𝗠𝗮𝗻𝗮𝗴𝗲𝗿 — Maintains awareness across turns, retaining relevant context and passing it to the LLM. This eliminates “context resets” and improves answer consistency over time.
- 𝗠𝗲𝗺𝗼𝗿𝘆 𝗟𝗮𝘆𝗲𝗿 — Divided into Short-Term (session-based) and Long-Term (persistent or vector-based) memory, it allows the system to 𝗹𝗲𝗮𝗿𝗻 𝗳𝗿𝗼𝗺 𝗲𝘅𝗽𝗲𝗿𝗶𝗲𝗻𝗰𝗲. Every interaction strengthens the model’s knowledge base.
- 𝗞𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗲 𝗟𝗮𝘆𝗲𝗿 — The foundation. It combines similarity search, embeddings, and multi-granular document segmentation (sentence, paragraph, recursive) for precision retrieval.
- 𝗧𝗼𝗼𝗹 𝗟𝗮𝘆𝗲𝗿 — Includes the Search Tool, Vector Store Tool, and Code Interpreter Tool — each acting as a functional agent that executes specialized tasks and returns structured outputs.
- 𝗙𝗲𝗲𝗱𝗯𝗮𝗰𝗸 𝗟𝗼𝗼𝗽 — Every user response feeds insights back into the vector store, creating a continuous learning and improvement cycle.

𝗪𝗵𝘆 𝗜𝘁 𝗠𝗮𝘁𝘁𝗲𝗿𝘀
Agentic RAG transforms an LLM from a passive responder into a 𝗰𝗼𝗴𝗻𝗶𝘁𝗶𝘃𝗲 𝗲𝗻𝗴𝗶𝗻𝗲 capable of reasoning, memory, and self-optimization. This shift isn’t just technical — it’s strategic. It defines how AI systems will evolve inside organizations: from one-off assistants to adaptive agents that understand context, learn continuously, and execute with autonomy.
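As a toy sketch of how the orchestrator, context manager, and tool layer might fit together, here is a short Python example. The class, tool names, and intent classifier are illustrative assumptions, not the API of any particular framework.

```python
from typing import Callable, Dict

class AgentOrchestrator:
    """Minimal routing layer in the spirit of the architecture described above:
    interpret intent, pick a tool, pass shared context along, and feed results
    back into memory. All names here are assumptions for illustration.
    """

    def __init__(self, tools: Dict[str, Callable[[str, dict], str]], classify_intent):
        self.tools = tools                      # e.g. {"search": ..., "vector_store": ..., "code": ...}
        self.classify_intent = classify_intent  # assumed callable: query -> tool name
        self.context: dict = {"history": []}    # stands in for the context manager / memory layer

    def handle(self, user_query: str) -> str:
        intent = self.classify_intent(user_query)              # e.g. "code" or "search"
        tool = self.tools.get(intent, self.tools["search"])    # fall back to retrieval
        result = tool(user_query, self.context)
        self.context["history"].append((user_query, result))   # feedback loop into memory
        return result
```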
-
Agents in production tend to fail for one or more of the following reasons: context bloat, brittle recovery, unverifiable answers, and weak observability. The R⁵ operating model — Relax, Reflect, Reference, Retry, Report — turns those failure modes into engineering disciplines:
- Relax: actively manage context and latency so agents stay coherent under load. Evidence shows LLMs degrade when relevant information is buried in the middle of long inputs — so compaction and structure are essential.
- Reflect: inject deliberate checkpoints and self-critique so agents improve mid-run without retraining. The Reflexion framework demonstrates how lightweight, verbal self-feedback boosts downstream task success.
- Reference: surface provenance (citations, retrieval traces) so outputs are attributable, auditable, and defensible. This reduces perceived and actual hallucination risk and supports human review. (See also Report below.)
- Retry: make error handling adaptive and reasoned, not blind repetition — Reflexion-style feedback loops are effective beyond games and coding tasks.
- Report: quantify factuality/consistency to close the loop. SelfCheckGPT shows simple sampling-based checks can detect hallucinations in black-box LLMs — exactly the kind of metric an agent runtime should expose.

Google ADK 1.16 ships concrete capabilities that implement R⁵: context compaction, pause/resume, citation metadata, reflective retries, unified search, and observability/eval hooks. R⁵ is an operational contract between your agents and production-grade environments.
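For the Report step, a simplified, SelfCheckGPT-inspired consistency check might look like the sketch below. This is a crude keyword-overlap heuristic rather than the paper's actual scoring method, and `llm_sample` is an assumed stochastic sampler returning one completion per call.

```python
def consistency_score(llm_sample, prompt: str, answer: str, n_samples: int = 5) -> float:
    """Rough factuality signal: sample the model several times and measure how many
    sentences of `answer` are (loosely) supported by the samples.

    Simplified heuristic in the spirit of sampling-based self-checking;
    not the exact SelfCheckGPT algorithm.
    """
    samples = [llm_sample(prompt) for _ in range(n_samples)]
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    supported = 0
    for sentence in sentences:
        keywords = {w.lower() for w in sentence.split() if len(w) > 4}
        hits = sum(1 for s in samples if keywords and keywords & set(s.lower().split()))
        if hits >= n_samples // 2:     # supported by at least half the samples
            supported += 1
    return supported / max(len(sentences), 1)   # 1.0 = fully consistent with samples
```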
-
💸 This will save you money on LLM fine-tuning 💰

If you fine-tune a large model for every new domain or agent, you likely run into tens of thousands of dollars of compute expenses -- and most of that cost comes from retraining weights that don’t actually need to change. A new paper from Stanford University, SambaNova, and the University of California, Berkeley proposes a way to avoid modifying the LLM altogether: Agentic Context Engineering (ACE) [1], a framework for self-improving LLMs that evolve through context, not weight updates.

ACE treats system prompts and memory as living playbooks that grow over time via a structured generate → reflect → curate loop. Instead of collapsing into ever-shorter "optimized" prompts, ACE accumulates reusable reasoning traces and domain heuristics through incremental delta updates.

The results:
⬆️ +10.6% on multi-turn agent benchmarks
⬆️ +8.6% on financial reasoning tasks
⬇️ ~87% lower adaptation latency and 80% lower token cost vs. existing adaptive methods
Matches GPT-4.1-based agents on AppWorld using a smaller open-source model (DeepSeek-V3.1)
No labeled data, no re-training -- just intelligent context evolution.

Compared to its closest alternative, GEPA [2], which iteratively rewrites a single compact prompt, ACE decomposes adaptation into Generator-Reflector-Curator roles and performs incremental delta updates instead of full rewrites. This preserves detailed domain heuristics and prevents context collapse.

If your adaptation loop still involves fine-tuning, this paper shows you can get most of the gains -- and save most of the cost -- by teaching the context to learn instead of the model.

Links to the papers in the comment section ⏬

#LLMResearch #ContextEngineering #AIAgents #LLMCostReduction #PromptOptimization #MetaLearning #LLM #LargeLanguageModel #AI #ArtificialIntelligence #Stanford #SambaNova #UCBerkeley
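A hypothetical sketch of what a generate → reflect → curate cycle with incremental delta updates could look like is shown below. The prompts and the playbook format are assumptions for illustration, not the paper's actual implementation, and `llm(prompt)` is an assumed generic completion callable.

```python
def ace_style_update(llm, playbook: list[str], task: str) -> list[str]:
    """One generate -> reflect -> curate cycle over a context "playbook".

    The playbook is modeled here as a simple list of reusable heuristics that
    gets prepended to the prompt; the real framework's roles and formats differ.
    """
    context = "\n".join(f"- {rule}" for rule in playbook)

    # Generator: attempt the task with the current playbook as context.
    attempt = llm(f"Playbook:\n{context}\n\nTask:\n{task}\n\nSolve the task.")

    # Reflector: extract lessons from the attempt instead of rewriting the whole prompt.
    lessons = llm(
        f"Task:\n{task}\n\nAttempt:\n{attempt}\n\n"
        "List any new, reusable heuristics learned from this attempt, one per line."
    )

    # Curator: apply incremental delta updates (append new rules, skip duplicates)
    # rather than collapsing the context into a single rewritten prompt.
    new_rules = [line.strip("- ").strip() for line in lessons.splitlines() if line.strip()]
    return playbook + [rule for rule in new_rules if rule not in playbook]
```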
-
Self-search reinforcement learning (SSRL) enables language models to reach high performance on search-based question answering using only internal knowledge. Experiments show that models can simulate query generation and information retrieval within a single reasoning trajectory, revealing substantial latent world knowledge whose extractability scales with sampling but remains difficult to reliably select.

SSRL strengthens this capability by rewarding structured reasoning and format adherence, improving accuracy over external search-based RL baselines and reducing hallucinations. Models trained in this setting act as both reasoner and retriever, transfer effectively to real web search at inference, and require fewer external queries when guided by entropy.

Repeated sampling narrows the gap between small and large models, while excessive reflection, long chains of thought, and multi-turn self-search hurt performance. Information masking and format rewards stabilize training, and on-policy retrieval proves essential. Overall, this approach supports more autonomous, scalable, and cost-efficient LLM agents.

https://lnkd.in/gNK83RCv
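To illustrate the general idea of rewarding format adherence alongside answer correctness, here is a toy reward function in Python. The tag names and weights are illustrative assumptions, not the paper's actual reward definition.

```python
import re

def format_reward(trajectory: str, gold_answer: str) -> float:
    """Toy reward combining format adherence with answer correctness, in the
    spirit of the training signal described above (illustrative only).
    """
    # Reward structured self-search: reasoning, simulated search, and a final answer block.
    has_think = bool(re.search(r"<think>.*?</think>", trajectory, re.S))
    has_search = bool(re.search(r"<search>.*?</search>", trajectory, re.S))
    answer_match = re.search(r"<answer>(.*?)</answer>", trajectory, re.S)

    # Up to 0.5 for following the expected structure.
    format_score = 0.5 * sum([has_think, has_search, answer_match is not None]) / 3

    # Remaining 0.5 for containing the gold answer in the answer block.
    correct = answer_match is not None and gold_answer.lower() in answer_match.group(1).lower()
    return format_score + (0.5 if correct else 0.0)
```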