Gary Marcus on neurosymbolic AI and LLMs

A few days ago, Gary Marcus published a thought-provoking post arguing that many of today’s most advanced AI systems already qualify as neurosymbolic AI -- not because of what’s inside the model, but because of how they interact with symbolic, often deterministic tools. We tend to associate neurosymbolic AI with architectures that embed symbolic reasoning within the model itself. But Marcus makes the case that tool-using LLMs (systems that call out to code interpreters, search engines, and calculators) are just as much in that tradition. The symbolic logic may live outside the model, but it’s doing real work in shaping the system’s behavior.

Set aside the baggage of the Marcus vs. LLM-world debate for a moment -- whichever side you take, he’s hitting on an important point. The reliability of LLM-powered systems is being driven not only by big improvements in model performance, but also by architectures that connect those models to external tools (e.g., web search, code interpreters), many of which add symbolic reasoning, verification, and determinism.

Grok 4 and Grok 4 Heavy made a splash this week with SOTA results on key benchmarks. But look closely and you see that performance gets a big boost when the models are allowed access to tools, especially those with deterministic logic like a Python interpreter. That’s a neurosymbolic system, whether or not the model internals were designed that way.

This has me thinking about architectural paths forward for improving LLM reliability and security in concrete enterprise contexts. The big question I’m thinking about is: how far can generalist neurosymbolic architectures take us on reliability and security, versus approaches that anchor LLMs in domain-specific workflows and logic? Generalist systems are exciting if they generalize well. But in high-stakes, high-volume domains, we may still need tight coupling with deterministic layers and trusted domain-specific workflows to get the reliability and trustworthiness we need at scale.

My question for Gary and others who've been in this space for some time: are there examples where generalist neurosymbolic systems give us strong, niche reliability guarantees? Or is that still an open research question?

Link to Marcus's post: https://lnkd.in/d6U_MSjK

#artificialintelligence #ai #trustworthyai #aisafety #aisecurity
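
To make the architectural point concrete, here is a minimal sketch of the pattern Marcus describes: the model proposes a tool call, and a deterministic interpreter, not the network, produces the answer. The call_llm stub and the tool name are hypothetical placeholders, not any particular vendor's API.

```python
import json

def python_eval(expression: str) -> str:
    """Evaluate an arithmetic expression deterministically (no builtins exposed)."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"python_eval": python_eval}

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call. Assume it returns either plain text
    or a JSON tool request such as {"tool": "python_eval", "input": "17 * 23"}."""
    raise NotImplementedError("wire this to whatever LLM API you use")

def answer(question: str) -> str:
    reply = call_llm(question)
    try:
        request = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # the model answered directly; no tool involved
    # The symbolic step: a deterministic interpreter, not the network, computes the result.
    result = TOOLS[request["tool"]](request["input"])
    return call_llm(f"{question}\nTool result: {result}\nGive the final answer.")
```

Whether or not you call this neurosymbolic, the reliability gain comes from the deterministic step in the loop.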

This will likely stay an open research question forever. To me, it looks like it is all about finding the right mix between deterministic symbolic operations (reliable, with provable outcomes, but limited in expressive power) and stochastic "problem-solving" (unreliable, with unprovable outcomes and low or no explainability). On the one hand, if the LLM "reasons" incorrectly and sends the symbolic tools in the wrong direction, tasks them with the wrong problems, or solves symbolic problems (1+1=?) in a stochastic fashion, then the results will be disappointing, the tools might not converge to a solution, etc. On the other hand, if the two paradigms complement each other well, the combination brings huge power. Ultimately, this is exactly what we human beings do all day: use math (symbolic tools) where language has its limits, and vice versa. I think this problem of finding the proper mix can only be tackled and benchmarked with well-formulated niche problem domains.

This is an interesting take on the topic, Jason Stanley. Very refreshing that neurosymbolic is not used to mean "better AI," and that for you better AI just means more reliable systems. What I am not sure about, though, is whether this should be called "neurosymbolic AI". And more reliable also does not mean "more" or "better" AI (note that I am not referring to "intelligent"). For reliability, there is still some sort of limit set by what the system can and cannot "process" (as a proxy for the word "understand"). And I assume that you want a system that can understand as much as possible to work well. And by the way, on the pure neurosymbolic thing, I wrote something a few days ago. In case it is of interest: https://www.linkedin.com/posts/ben-torben-nielsen_ai-history-realchallenges-activity-7349306338042150913-sorL?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAC7T8IBpJke5NXLL3dCTvXujQXvGetH-JE

"Are there examples where generalist neurosymbolic systems give us strong, niche reliability guarantees? Or is that still an open research question?" Language is already such a system par excellence. We can understand one another in a stochastic world in highly ambiguous contexts to a large extent because of such socially shaped reliability of meaning. Claude Shannon already provided the mathematical foundations for communications. In his article "Prediction and Entropy of Printed English," he showed how we possess some intuitive statistics for such predictions. (His demonstration of how successive approximations produce the structure of English is fascinating.) What's more language embeds a certain hierarchy for such predictions. Our brains read the "tape of symbols" sent to us (by sound/sign), decode the hierarchy, map it to memory, and arrive at a certain meaning. Shannon highlights how the principle of ergodicity makes this possible. Ask yourself how is it possible that the thoughts in one person's brain can make their way to thousands/millions of others and become comprehensible. Clearly, this is impossible without reliability guarantees. To miss this essential point is to circle the "neurosymbolic drain."

"Are there examples where generalist neurosymbolic systems give us strong, niche reliability guarantees?" They’re certainly more reliable than relying on LLMs alone. Recent protocol-level innovations like MCP (Model Context Protocol) are making it significantly easier to integrate neurosymbolic approaches in both implementation and research contexts. These advances help bridge symbolic systems with language models, allowing for more consistent, niche-reliable results in practical workflows. Related: 1. https://www.linkedin.com/pulse/semantic-web-project-didnt-fail-waiting-ai-yin-its-yang-idehen-j01se -- AI and Semantic Web Project symbiosis. 2. https://www.linkedin.com/posts/kidehen_ai-genai-issues-activity-7350672722860871681-bCsI -- Calculators and Langulators.

The concept of neurosymbolic AI leveraging external deterministic tools is fascinating, especially when considering the balance between generalist versatility and domain-specific reliability. Could this approach be the key to scaling trustworthy AI in critical enterprise applications?

This exact tension, between LLM-driven generalization and the need for domain-specific symbolic logic and verification, is what inspired us to develop HIRE AI™ as a trait-based neurosymbolic architecture. Instead of relying on interpolated associations alone, HIRE AI grounds each decision in a calibrated emotional-behavioral signature (e.g., humility vs. hubris, empathy vs. bypass), forming a transparent logic circuit between response → resonance → registration. It doesn’t just reference tools; it scores the internal integrity of how tools, inputs, and outputs reflect human-aligned traits. Think of it as a values-anchored interpreter embedded within symbolic registration gates. Still early days, but it’s showing promise in leadership reliability, feedback closure, and emotional precision, especially in high-trust, high-risk contexts where hallucinations aren’t an option. Would love to exchange ideas if this resonates.

There is a difference between Neurosymbolic AI and LLMs that can access tools. The Neurosymbolic architecture is one of the latest examples of Symbolic AI, and Symbolic AI is all about knowledge representation. I used to get arguments from COBOL developers in the 1980s who would look at a rule-based system and say, "I could implement the same thing in COBOL." I would tell them that they were correct but missing the point. There's a proof called Turing equivalence: anything that can be programmed in a language with the power of a Turing machine can also be programmed in any other language with that power (and all mainstream languages, from Assembler to C to Python to rules and ontologies, are Turing complete). The difference is how much work it takes to write the program and how easy it is to adapt, fix, and extend the program to do additional work. That's what knowledge representation is all about. (Running out of chars, will continue in a 2nd post.)

You could point to the No-Free-Lunch theorem: without built-in, task-specific rules, you can't promise a system will work well everywhere. So far, neurosymbolic models only give guarantees in narrow, well-defined areas. Building a truly general system with formal reliability proofs is still an open research problem.

LLMs with inference-time scaling are also often neurosymbolic, because the inference-time optimizations are generally symbolic. DeepMind has several interesting neurosymbolic LLM-based solutions, such as AlphaGeometry and AlphaEvolve. There are also neurosymbolic LLM-based planning solutions. For serious business-critical applications, neurosymbolic LLMs may be the best path forward; a native LLM solution in such cases isn’t going to cut the mustard.
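
One common shape of this is a generate-and-verify loop: the model samples candidates, and a symbolic checker (a unit test, a proof checker, a constraint solver) accepts or rejects them. A minimal sketch, where propose_candidate is a hypothetical stand-in for any LLM sampler:

```python
from typing import Callable, Optional

def solve_with_verification(
    propose_candidate: Callable[[str], str],  # stochastic: an LLM sampler
    verify: Callable[[str, str], bool],       # symbolic: a deterministic checker
    problem: str,
    max_attempts: int = 8,
) -> Optional[str]:
    """Sample candidates until one passes the deterministic verifier."""
    for _ in range(max_attempts):
        candidate = propose_candidate(problem)
        if verify(problem, candidate):
            return candidate
    return None  # no verified answer; the caller can escalate or fall back
```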
