Gary Marcus on neurosymbolic AI and LLMs

A few days ago, Gary Marcus published a thought-provoking post arguing that many of today’s most advanced AI systems already qualify as neurosymbolic AI -- not because of what’s inside the model, but because of how they interact with symbolic, often deterministic tools. We tend to associate neurosymbolic AI with architectures that embed symbolic reasoning within the model itself. But Marcus makes the case that tool-using LLMs (systems that call out to code interpreters, search engines, and calculators) are just as much in that tradition. The symbolic logic may live outside the model, but it’s doing real work in shaping the system’s behavior.

Set aside the baggage of the Marcus vs. LLM-world debate for a moment -- whichever side you take, he’s hitting on an important point. The reliability of LLM-powered systems is being driven not only by big improvements in model performance, but also by architectures that connect those models to external tools (e.g., web search, code interpreters), many of which add symbolic reasoning, verification, and determinism.

Grok 4 and Grok 4 Heavy made a splash this week with SOTA results on key benchmarks. But look closely and you see that performance gets a big boost when the models are allowed access to tools, especially those with deterministic logic like a Python interpreter. That’s a neurosymbolic system, whether or not the model internals were designed that way.

This has me thinking about architectural paths forward for improving LLM reliability and security in concrete enterprise contexts. The big question I’m thinking about is: how far can generalist neurosymbolic architectures take us on reliability and security, versus approaches that anchor LLMs in domain-specific workflows and logic? Generalist systems are exciting if they generalize well. But in high-stakes, high-volume domains, we may still need tight coupling with deterministic layers and trusted domain-specific workflows to get the reliability and trustworthiness we need at scale.

My question for Gary and others who've been in this space for some time: are there examples where generalist neurosymbolic systems give us strong, niche reliability guarantees? Or is that still an open research question?

Link to Marcus's post: https://lnkd.in/d6U_MSjK

#artificialintelligence #ai #trustworthyai #aisafety #aisecurity
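
To make the architectural point concrete, here is a minimal sketch of the pattern Marcus describes: the model proposes a tool call, and a deterministic interpreter, not the network, produces the answer. The call_llm stub and the tool name are hypothetical placeholders, not any particular vendor's API.

```python
import json

def python_eval(expression: str) -> str:
    """Evaluate an arithmetic expression deterministically (no builtins exposed)."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"python_eval": python_eval}

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call. Assume it returns either plain text
    or a JSON tool request such as {"tool": "python_eval", "input": "17 * 23"}."""
    raise NotImplementedError("wire this to whatever LLM API you use")

def answer(question: str) -> str:
    reply = call_llm(question)
    try:
        request = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # the model answered directly; no tool involved
    # The symbolic step: a deterministic interpreter, not the network, computes the result.
    result = TOOLS[request["tool"]](request["input"])
    return call_llm(f"{question}\nTool result: {result}\nGive the final answer.")
```

Whether or not you call this neurosymbolic, the reliability gain comes from the deterministic step in the loop.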

This will likely stay an open research question forever. To me, it looks like it is all about finding the right mix between deterministic symbolic operations (reliable, with provable outcomes, but limited in expressive power) and stochastic "problem-solving" (unreliable, with unprovable outcomes and low or no explainability). On the one hand, if the LLM "reasons" incorrectly and sends the symbolic tools in the wrong direction, tasks them with the wrong problems, or solves symbolic problems (1+1=?) in a stochastic fashion, then the results will be disappointing, the tools might not converge to a solution, etc. On the other hand, if the two paradigms complement each other well, the combination brings huge power. Ultimately, this is exactly what we human beings do all day: use math (symbolic tools) where language has its limits, and vice versa. I think this problem of finding the proper mix can only be tackled and benchmarked with well-formulated niche problem domains.

This is an interesting take on the topic, Jason Stanley. Very refreshing that neurosymbolic is not used to mean "better AI," and that for you better AI just means more reliable systems. What I am not sure about, though, is whether this should be called "neurosymbolic AI". And more reliable also does not mean "more" or "better" AI (note that I am not referring to "intelligent"). For reliability, there is still some sort of limit set by what the system can and cannot "process" (as a proxy for the word "understand"). And I assume that you want a system that can understand as much as possible to work well. And by the way, on the pure neurosymbolic thing, I wrote something a few days ago. In case it is of interest: https://www.linkedin.com/posts/ben-torben-nielsen_ai-history-realchallenges-activity-7349306338042150913-sorL?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAC7T8IBpJke5NXLL3dCTvXujQXvGetH-JE

"Are there examples where generalist neurosymbolic systems give us strong, niche reliability guarantees? Or is that still an open research question?" Language is already such a system par excellence. We can understand one another in a stochastic world in highly ambiguous contexts to a large extent because of such socially shaped reliability of meaning. Claude Shannon already provided the mathematical foundations for communications. In his article "Prediction and Entropy of Printed English," he showed how we possess some intuitive statistics for such predictions. (His demonstration of how successive approximations produce the structure of English is fascinating.) What's more language embeds a certain hierarchy for such predictions. Our brains read the "tape of symbols" sent to us (by sound/sign), decode the hierarchy, map it to memory, and arrive at a certain meaning. Shannon highlights how the principle of ergodicity makes this possible. Ask yourself how is it possible that the thoughts in one person's brain can make their way to thousands/millions of others and become comprehensible. Clearly, this is impossible without reliability guarantees. To miss this essential point is to circle the "neurosymbolic drain."

"Are there examples where generalist neurosymbolic systems give us strong, niche reliability guarantees?" They’re certainly more reliable than relying on LLMs alone. Recent protocol-level innovations like MCP (Model Context Protocol) are making it significantly easier to integrate neurosymbolic approaches in both implementation and research contexts. These advances help bridge symbolic systems with language models, allowing for more consistent, niche-reliable results in practical workflows. Related: 1. https://www.linkedin.com/pulse/semantic-web-project-didnt-fail-waiting-ai-yin-its-yang-idehen-j01se -- AI and Semantic Web Project symbiosis. 2. https://www.linkedin.com/posts/kidehen_ai-genai-issues-activity-7350672722860871681-bCsI -- Calculators and Langulators.

The concept of neurosymbolic AI leveraging external deterministic tools is fascinating, especially when considering the balance between generalist versatility and domain-specific reliability. Could this approach be the key to scaling trustworthy AI in critical enterprise applications?

This exact tension, between LLM-driven generalization and the need for domain-specific symbolic logic and verification, is what inspired us to develop HIRE AI™ as a trait-based neurosymbolic architecture. Instead of relying on interpolated associations alone, HIRE AI grounds each decision in a calibrated emotional-behavioral signature (e.g., humility vs. hubris, empathy vs. bypass), forming a transparent logic circuit between response → resonance → registration. It doesn’t just reference tools; it scores the internal integrity of how tools, inputs, and outputs reflect human-aligned traits. Think of it as a values-anchored interpreter embedded within symbolic registration gates. Still early days, but it’s showing promise in leadership reliability, feedback closure, and emotional precision, especially in high-trust, high-risk contexts where hallucinations aren’t an option. Would love to exchange ideas if this resonates.

There is a difference between Neurosymbolic AI and LLMs that can access tools. The Neurosymbolic architecture is one of the latest examples of Symbolic AI, and Symbolic AI is all about knowledge representation. I used to get arguments from COBOL developers in the 1980s who would look at a rule-based system and say, "I could implement the same thing in COBOL." I would tell them that they were correct but missing the point. There's a proof called Turing equivalence: anything that can be programmed in a language with the power of a Turing machine can also be programmed in any other language with that power (and all mainstream languages, from Assembler to C to Python to rules and ontologies, are Turing complete). The difference is how much work it takes to write the program and how easy it is to adapt, fix, and extend the program to do additional work. That's what knowledge representation is all about. (Running out of chars, will continue in a 2nd post.)

You could point to the No-Free-Lunch theorem: without built-in, task-specific rules, you can't promise a system will work well everywhere. So far, neurosymbolic models only give guarantees in narrow, well-defined areas. Building a truly general system with formal reliability proofs is still an open research problem.

LLMs with inference-time scaling are also often neurosymbolic, because the inference-time optimizations are generally symbolic. DeepMind has several interesting neurosymbolic LLM-based solutions, such as AlphaGeometry and AlphaEvolve. There are also neurosymbolic LLM-based planning solutions. For serious business-critical applications, neurosymbolic LLMs may be the best path forward; a native LLM solution in such cases isn’t going to cut the mustard.
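
One common shape of this is a generate-and-verify loop: the model samples candidates, and a symbolic checker (a unit test, a proof checker, a constraint solver) accepts or rejects them. A minimal sketch, where propose_candidate is a hypothetical stand-in for any LLM sampler:

```python
from typing import Callable, Optional

def solve_with_verification(
    propose_candidate: Callable[[str], str],  # stochastic: an LLM sampler
    verify: Callable[[str, str], bool],       # symbolic: a deterministic checker
    problem: str,
    max_attempts: int = 8,
) -> Optional[str]:
    """Sample candidates until one passes the deterministic verifier."""
    for _ in range(max_attempts):
        candidate = propose_candidate(problem)
        if verify(problem, candidate):
            return candidate
    return None  # no verified answer; the caller can escalate or fall back
```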
