I rarely do public speaking, but earlier this year I stepped on stage at Haystack US (Charlottesville, VA) to share how we at AMBOSS evaluate AI-powered search features: our AI Shortcuts, Vector Search, and more.
I'm curious: Are you using LLMs to evaluate features? How aligned are offline & online evaluations in your experience? Any cool tips & tricks you've recently learned?
My take on why evaluation matters: 🔍 60% of US med students and hundreds of thousands of clinicians trust AMBOSS at the bedside. In a high-stakes domain like medicine, a shiny feature that looks smart can't ship without proof that it actually helps.
Here's our playbook:
1️⃣ Outcome-first metrics. There are no baskets or purchases here, so we redefined "conversion" around knowledge gained and task-completion signals.
2️⃣ CTR ≠ relevance (always). A high click rate might mean the snippet was wrong and forced users to dig deeper. Context is everything.
3️⃣ Hybrid evaluation loop. Offline evaluation plus an online A/B framework built around engagement heuristics tells us in days (not quarters) whether a new algorithm moves the needle.
Huge kudos to the Haystack crew & OpenSource Connections for curating a room full of search nerds, and to my team (Mehdi, Ágnes, Johannes, Serdar, Hanhan, Boaz, Matteo, Daniele, Sergii), who make me look smarter than I am. 🤓 https://lnkd.in/dgABBbz7
Valentin von Seggern’s Post
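The post doesn't say which offline metrics AMBOSS uses. As one common example of an offline relevance metric that, unlike CTR, scores a ranking against judged relevance rather than clicks, here is a minimal sketch of nDCG@k. The function names and the 0 to 3 grading scale are assumptions for illustration only.

```python
import math


def dcg_at_k(gains: list[float], k: int) -> float:
    """Discounted cumulative gain over the top-k graded relevance labels."""
    # Exponential-gain form: (2^rel - 1) / log2(position + 1), positions starting at 1
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(gains: list[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (best possible) ordering; 0.0 if nothing is relevant."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Hypothetical judgments (0 = irrelevant ... 3 = perfect) for one query's ranked results
ranked_labels = [3, 2, 0, 1]
print(round(ndcg_at_k(ranked_labels, k=4), 3))
```

A score of 1.0 means the system returned results in the ideal order; averaging over a fixed judged query set gives a number that can be compared across algorithm versions before any A/B test runs.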
-
Wrapped up back-to-back courses on LangGraph and Letta for long-term agentic memory.
LangGraph focuses on building stateful agent workflows as graphs. It lets you implement semantic, episodic, and procedural memory types to retain context across interactions.
Letta functions as an AI operating system for stateful agents and was developed by the creators of MemGPT. Letta abstracts memory into managed modules, including core memory, external memory, and a file system. This enables agents to self-edit, persist, and evolve their memory over time, with robust built-in support for structured, long-term memory.
In short, LangGraph is graph-based and flexible for custom workflows, while Letta's OS-like approach is designed for persistent, self-improving AI agents with comprehensive memory management.
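To make the core-vs-external memory split a little more concrete, here is a toy sketch of a self-editing agent memory: a small, size-capped "core" block that would always go into the prompt, plus an archive that is searched on demand. The class name, the fact-count budget, and the eviction rule are hypothetical; this is not Letta's or LangGraph's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgentMemory:
    """Hypothetical two-tier memory: a small 'core' block that is always placed in the
    prompt, plus an unbounded 'archive' searched on demand. Not Letta's or LangGraph's API."""

    core_limit: int = 3  # assumed in-context budget, counted in facts for simplicity
    core: list[str] = field(default_factory=list)
    archive: list[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        # Self-edit: append to core memory; when over budget, evict the oldest fact to the archive
        self.core.append(fact)
        while len(self.core) > self.core_limit:
            self.archive.append(self.core.pop(0))

    def recall(self, keyword: str) -> list[str]:
        # Naive case-insensitive keyword search over archived facts (real systems would use embeddings)
        return [f for f in self.archive if keyword.lower() in f.lower()]

    def context(self) -> str:
        # Text that would be injected into the LLM prompt on every turn
        return "\n".join(self.core)


mem = ToyAgentMemory(core_limit=2)
mem.remember("User's name is Ana")
mem.remember("Ana prefers Python")
mem.remember("Ana is learning LangGraph")
print(mem.context())
print(mem.recall("name"))
```

The point of the design is that the prompt stays bounded while nothing is lost: older facts move out of the context window but remain retrievable.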
-
Interesting analysis from G. Elliott Morris and Verasight on the use of "synthetic panels" (i.e., AI) to replicate survey responses. They found substantial variation, with some high error rates. When the analysis looked into crosstabs and out-of-sample questions, the error rates got worse... A warning to anyone thinking about using synthetic panels and digital twins for research. https://lnkd.in/ey6tfDKr
-
📢 Just 2 weeks to go until CoLLAs 2025, and we're excited to highlight two in-depth tutorials that will be featured at the conference!
🎓 𝗠𝗮𝘁𝗵𝗲𝗺𝗮𝘁𝗶𝗰𝘀 𝗼𝗳 𝗖𝗼𝗻𝘁𝗶𝗻𝘂𝗮𝗹 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴
Speakers: Liangzu Peng & René Vidal
This tutorial explores the mathematical foundations of continual learning through the lens of adaptive filtering, a well-established field in signal processing. The speakers will bridge theory and practice, highlighting how insights from adaptive filtering can enhance our understanding of continual learning.
📄 Read the paper: https://lnkd.in/ey37wQGk
🧠 𝗜𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻-𝗧𝗵𝗲𝗼𝗿𝗲𝘁𝗶𝗰 𝗠𝗲𝗮𝘀𝘂𝗿𝗲𝘀 𝗳𝗼𝗿 𝗠𝘂𝗹𝘁𝗶-𝗘𝘅𝗽𝗲𝗿𝘁 𝗙𝗼𝘂𝗻𝗱𝗮𝘁𝗶𝗼𝗻 𝗠𝗼𝗱𝗲𝗹 𝗔𝗱𝗮𝗽𝘁𝗮𝘁𝗶𝗼𝗻
Speakers: Yang Li & Shao-Lun Huang
This session addresses a timely challenge: how to adapt multiple foundation models (or experts) to new tasks with limited data or compute. The tutorial introduces a suite of information-theoretic tools (KL divergence, conditional entropy, Chi-square distance, etc.) to evaluate expert utility and guide transfer decisions in multi-source, domain adaptation, and continual learning scenarios.
Both tutorials are designed to be accessible yet rigorous, perfect for researchers and practitioners interested in the theoretical underpinnings and practical strategies of lifelong learning.
📍 Join us in Philadelphia, August 11–14:
🔗 https://lnkd.in/enjRx8z2
#CoLLAs2025 #AI #MachineLearning #ContinualLearning #LifelongLearning #FoundationModels #Tutorials #Research #DeepLearning #AdaptiveLearning
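The post doesn't give the tutorial's exact formulations, so as a rough illustration here are textbook definitions of the three named measures for discrete distributions. "Chi-square distance" has several variants; this sketch uses the Pearson chi-square divergence, which may differ from what the speakers use. The example of comparing a target task's label distribution with an expert's predictions is a hypothetical use, not the tutorial's method.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(P || Q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def chi_square_divergence(p: list[float], q: list[float]) -> float:
    """Pearson chi-square divergence sum((p - q)^2 / q); assumes every q[i] > 0."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def conditional_entropy(joint: list[list[float]]) -> float:
    """H(Y | X) in nats from a joint table joint[x][y] = P(X = x, Y = y)."""
    h = 0.0
    for row in joint:
        px = sum(row)  # marginal P(X = x)
        for pxy in row:
            if pxy > 0:
                h -= pxy * math.log(pxy / px)  # -P(x, y) * log P(y | x)
    return h


# Hypothetical: a target task's label distribution vs. one expert's average prediction
target = [0.5, 0.5]
expert = [0.9, 0.1]
print(kl_divergence(target, expert), chi_square_divergence(target, expert))
print(conditional_entropy([[0.25, 0.25], [0.25, 0.25]]))  # X tells us nothing about Y: ln 2
```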
-
Artificial intelligence is changing the seafood industry. Diego Lages, Global Service Director of Fish at JBT Marel, explains how AI helps monitor both production and machine health, turning processing equipment into smart data providers. New #Fish4Thought: https://lnkd.in/eeGiG2-h
Fish 4 Thought with Diego Lages
https://www.youtube.com/
-
My latest video dives into K-Nearest Neighbors (KNN), one of the simplest yet most powerful algorithms in machine learning! In this video, I break down:
- What KNN is
- How KNN works for classification, predicting categories like a pro
- How KNN works for regression, estimating values from neighbors
- How to choose the best value of K
- Distance metrics explained (Euclidean, Manhattan, Minkowski, Hamming, Chebyshev), with examples for clarity
Fun fact: KNN doesn't "learn" in the traditional sense. It stores the training data and makes decisions by comparing new points against it, which makes it both simple and surprisingly powerful!
If you've ever wondered how your model "decides" what's closest or how to pick the perfect K, this video will answer those questions.
Watch here: https://lnkd.in/gQUxk9mi
#machinelearning #machinelearningengineer #datascience #datascientist #ml #mlalgorithms #mltutorial #knn #knearestneighbours
K Nearest Neighbours | How to choose k | 5 Distance Metrics Used in KNN | Simple Explanation
https://www.youtube.com/
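For readers who want to see the mechanics in code, here is a minimal from-scratch sketch of KNN classification and regression with the five distance metrics listed above. It is not the video's code. The toy dataset and the default `k=3` are assumptions; in practice k is usually tuned on held-out data (e.g. cross-validation), which this sketch leaves out.

```python
import math
from collections import Counter


# The five distance metrics listed in the video description
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p=3):
    # Generalizes Manhattan (p=1) and Euclidean (p=2)
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def chebyshev(a, b):
    # Largest single-coordinate difference
    return max(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    # Number of positions that differ; suited to categorical or string features
    return sum(x != y for x, y in zip(a, b))


def k_nearest(train_x, query, k, dist):
    # No training step: rank every stored example by its distance to the query
    return sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]


def knn_classify(train_x, train_y, query, k=3, dist=euclidean):
    # Classification: majority vote among the k nearest labels
    votes = Counter(train_y[i] for i in k_nearest(train_x, query, k, dist))
    return votes.most_common(1)[0][0]


def knn_regress(train_x, train_y, query, k=3, dist=euclidean):
    # Regression: mean of the k nearest target values
    idx = k_nearest(train_x, query, k, dist)
    return sum(train_y[i] for i in idx) / len(idx)


X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]
prices = [10, 12, 11, 50, 52, 51]
print(knn_classify(X, labels, (2, 2)))
print(knn_regress(X, prices, (6, 6), k=3))
```

Swapping the `dist` argument (e.g. `dist=manhattan`) changes which neighbors count as "closest" without touching the rest of the algorithm.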
-
I've been diving into the internals of AI systems lately: not just calling APIs, but actually understanding how retrieval-augmented generation (RAG) works. Here's my beginner takeaway:
Chunking splits text into small, overlapping pieces that fit within model limits.
Embeddings turn each chunk into an array of floats that represents its meaning in a high-dimensional space.
Vector databases store those vectors and find the closest ones to a query using approximate nearest-neighbor indexing (like HNSW).
Cosine similarity measures closeness by angle, not length, so it's all about direction in semantic space.
End result: query in → vector search → relevant chunks → LLM answer.
Once you break it down, it feels a lot less like magic and more like just another set of tools to build with. Full write-up here: https://lnkd.in/dW6juSud
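The retrieval half of that pipeline fits in a few lines. This is a toy sketch under loud assumptions: the "embedding" is a bag-of-words count vector standing in for a learned embedding model, and retrieval is a brute-force scan standing in for an ANN index such as HNSW. The chunk size, overlap, and function names are illustrative, not from the write-up.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Stand-in embedding: word counts. A real system would call an embedding model returning floats."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Dot product divided by both vector lengths, so only direction matters, not magnitude."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Brute-force nearest neighbours by cosine similarity (a vector DB would use an ANN index)."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


docs = [
    "HNSW builds a layered graph for fast vector search",
    "Cosine similarity compares the angle between two vectors",
    "Chunking splits long documents into overlapping pieces",
]
print(retrieve("how does cosine similarity compare vectors", docs, top_k=1))
```

The retrieved chunks would then be pasted into the LLM prompt as context; that generation step is left out here.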
-
What if you had an on-demand strategic analyst? 🤔
I've been building Hellin, an autonomous AI agent that tackles complex challenges. You give it a goal, and it executes the entire project. Instead of just answering a question, it:
1. Breaks down the problem.
2. Researches the latest data.
3. Analyzes documents.
4. Synthesizes everything into an actionable brief with clear next steps.
I tested it by asking: "Develop a market entry strategy for a new eVTOL (flying car) company in North America."
The results were stunning. It delivered a full competitor analysis, a regulatory overview, and a phased action plan autonomously.
This isn't just a chatbot. It's a force multiplier for strategy, research, and analysis.
#AI #AgenticAI #Innovation #Strategy #MarketResearch #Productivity #Tech
Hellin - Lindy - 20 August 2025
https://www.loom.com
-
GPT-5 marks another shift in capability density; however, it was not the step change in performance that many were expecting. In this piece, I explore where we seem to be, some of the improvements we may need to reach AGI, and (stargazing a bit) what could be required to get to superintelligence. https://lnkd.in/eTS3NFcG
-
Just saw a great short demo from Gary Stafford showing how Strands Agents can coordinate semantic searches across a media library on Amazon Web Services (AWS) — combining OpenSearch and TwelveLabs on Bedrock. It’s a practical look at how these tools can work together for richer, more relevant search results — especially for large video and media collections. 🎥 Watch the demo here: https://lnkd.in/gX6_HT6j #SemanticSearch #AWS #OpenSearch #TwelveLabs #Bedrock #AI #CloudComputing
Demonstration: Video Search Agent with TwelveLabs on Bedrock, Strands Agents, and OpenSearch
https://www.youtube.com/
-
Your Brain on GPT
Remember your ability to navigate, the one that is now gone because we rely on GPS? Well, the same will happen to our writing ability if we don't use these tools right. This MIT report came across my radar through Jim Nightingale. Worth taking a look at. https://lnkd.in/guu4VZF7
-