Jonathan Staude’s Post

Jonathan Staude

Mathematician and Entrepreneur | AI & Data Strategy | Software Developer

LLMs hallucinate, by design. This is something OpenAI itself now openly communicates. For everyone in Analytics this means: LLMs cannot "analyze" data in the sense of solving mathematical equations reliably. Every analytical system built on LLMs therefore needs an intermediate layer (a SQL generator, a Python code generator, or similar) that ensures deterministic results.
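
A minimal sketch of what such an intermediate layer can look like, assuming a hypothetical generate_sql wrapper around the LLM (everything here is illustrative, not a specific product): the model only drafts the query, while the arithmetic is done by a deterministic database engine.

```python
import sqlite3

def generate_sql(question: str, schema: str) -> str:
    """Placeholder for the LLM call (hypothetical): the model only
    translates a natural-language question into SQL, nothing more."""
    # A real system would prompt the LLM with the question and the schema here.
    return "SELECT region, SUM(revenue) AS total FROM sales GROUP BY region"

def answer(question: str, conn: sqlite3.Connection, schema: str):
    sql = generate_sql(question, schema)   # non-deterministic step (LLM)
    return conn.execute(sql).fetchall()    # deterministic step (database)

# Toy usage: the summation is done by SQLite, not by the LLM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EU", 120.0), ("EU", 80.0), ("US", 200.0)])
print(answer("Total revenue per region?", conn, "sales(region, revenue)"))
```

The point of the split: even if the LLM drafts a slightly different but equivalent query next time, the numbers come from the database engine and stay reproducible.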

Nam Nguyen

Technical Support Engineer at Tek Experts

A new paper from OpenAI partially supports some of my longstanding views on large language models (LLMs):

- LLMs will inevitably hallucinate, even when the training data is entirely error-free.
- Benchmarks are not a reliable measure of "intelligence" in LLMs.

The authors are correct in pointing out that hallucinations stem from the operational mechanics of LLMs and from their training feedback loops. However, this only describes statistical tendencies. It does not fully address the deeper question: why do LLMs hallucinate at all? This gap limits the true value of the paper.

More concerning is their unsubstantiated claim that it is possible to build a "non-hallucinating" model by connecting it to a Q&A database, adding a calculator, and forcing it to respond "I don't know" whenever uncertain. There are two major flaws here:

- Such a system reduces the model to a rigid program of conditional statements, rather than a generative AI.
- LLMs cannot genuinely recognize what they do not know. They lack self-awareness or calibrated confidence, and thus will always appear to know everything.

It is surprising to see the world's most valuable AI company, with some of the brightest minds, present such a simplistic and unsupported proposal. The remainder of the paper is filled with elegant mathematical formulations, but without grounding they add little substance.

#artificialintelligence #LLM #hallucination https://lnkd.in/gfgNetkR
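
For context, a minimal sketch of the abstention idea criticized above, assuming the model exposes token log-probabilities for its answer (all names and numbers are illustrative): the answer is returned only when the model's average token confidence clears a threshold, otherwise the system says "I don't know".

```python
import math

def answer_or_abstain(answer: str, token_logprobs: list[float],
                      threshold: float = 0.85) -> str:
    """Return the answer only if the average token probability clears the
    threshold; otherwise abstain. `token_logprobs` would come from the LLM."""
    avg_prob = math.exp(sum(token_logprobs) / len(token_logprobs))
    return answer if avg_prob >= threshold else "I don't know"

# Toy usage with made-up log-probabilities for two candidate answers.
print(answer_or_abstain("Paris", [-0.01, -0.02]))       # confident -> "Paris"
print(answer_or_abstain("42 km", [-1.2, -0.9, -1.5]))   # uncertain -> "I don't know"
```

Whether this helps in practice hinges on exactly the objection raised above: the probabilities have to be reasonably calibrated for the threshold to separate what the model knows from what it does not.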
