In Defense of the Illusion: Why Chain-of-Thought (CoT) Still Matters in High-Stakes AI Reasoning.
By J.S. Patel, Barrister and PhD Student, Victoria, BC
Chain-of-thought (CoT) reasoning has become one of the most recognizable techniques in modern large language model (LLM) practice. By prompting models to "think step by step," researchers have seen significant gains in problem-solving performance, from arithmetic and logic puzzles to medical diagnostics and legal prediction tasks. Yet a growing chorus of AI scholars is sounding the alarm. In a provocative new paper, Barez et al. (2025), a team that includes Professor Yoshua Bengio, argue that CoT explanations are not just imperfect but often fundamentally unfaithful to how models actually compute. According to them, many researchers mistakenly equate step-by-step rationales with interpretability, despite mounting evidence that these rationales are post-hoc narratives rather than true reflections of internal logic.
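For readers less familiar with the mechanics, the technique is little more than a prompting convention. The short Python sketch below is a minimal illustration, with a hypothetical llm_generate helper standing in for whatever model API one happens to use; it simply contrasts a direct prompt with the same prompt plus an instruction to reason in steps.

```python
# Minimal illustration of chain-of-thought (CoT) prompting.
# llm_generate is a hypothetical placeholder for any LLM completion call
# (an SDK or REST client would go here); it is stubbed so the sketch runs.

def llm_generate(prompt: str) -> str:
    """Placeholder: replace with a real model call."""
    return "<model output for: " + prompt[:40] + "...>"

question = (
    "A defendant has three prior convictions; two were overturned on appeal. "
    "How many remain on the record?"
)

# Direct prompt: the model answers with no visible rationale.
direct = llm_generate(question)

# CoT prompt: the same question, plus an instruction to reason in steps
# before giving a final answer. Those visible steps are what this article defends.
cot = llm_generate(question + "\nLet's think step by step, then give the final answer.")

print(direct)
print(cot)
```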
This article offers a critical but respectful challenge to that position. I argue that CoT remains vital in high-stakes domains not because it perfectly mirrors model computation, but because it offers something equally valuable: a structured, contestable interface between machine reasoning and human norms of justification. Dismissing CoT as mere illusion obscures its unique role as a bridge between opaque architectures and institutional epistemic demands.
1. CoT as Cognitive Alignment, Not Mechanistic Fidelity
The central critique of Barez et al. is empirical and mechanistic. CoT, they argue, often fails to reflect the actual paths of computation inside a model, especially in transformer-based systems. This is likely true. Transformers operate via distributed, parallel activations across attention heads and layers. Mapping that into linear text is a lossy compression. But in fields like law, education, and medicine, fidelity to internal activations is not the only metric that matters.
For instance, interpretability, especially in legal settings, involves alignment with procedural norms, transparency requirements, and the right to reasoned justification. A model that outputs "Guilty" without reasons is worse than one that outputs "Guilty, because of prior record, motive, and opportunity," even if those reasons are partially heuristic. CoT rationales enable what I call epistemic docking: they align AI outputs with the justificatory conventions of human reasoning (Floridi, 2011).
2. CoT in Institutional Practice: Reasonableness Over Replication
In judicial contexts, reasoning is not evaluated purely on computational fidelity but on the reasonableness and contestability of its form. Administrative law in Canada, for instance, emphasizes "reasonableness review", where a decision is upheld if its rationale is internally coherent and procedurally grounded, even if it is not the only or best outcome (Canada (Minister of Citizenship and Immigration) v. Vavilov, 2019 SCC 65).
Similarly, CoT enables human experts to interrogate AI reasoning. If a sentencing model outputs "High Risk" but its CoT shows over-reliance on static factors or misapplied precedent, this can be contested. The very form of CoT invites adversarial engagement, which is foundational to due process.
3. The Post-Hoc Fallacy Is Not Fatal
Barez et al. cite strong evidence that CoTs often rationalize rather than reveal. Prompt bias, silent corrections, and shortcut heuristics are all cited as ways models decouple computation from narrative. However, this critique parallels long-standing debates in philosophy and psychology. Humans, too, rationalize after the fact. Nisbett and Wilson (1977) famously showed that people often confabulate reasons for their actions. Courts do not dismiss human justifications because they may contain unconscious bias. Instead, they cross-examine, contextualize, and corroborate.
In that sense, CoT is better seen not as a transcript but as a hypothesis. It is an interpretable output that can be tested, falsified, and refined—precisely the kind of reasoning format that is manageable in regulatory environments (Doshi-Velez and Kim, 2017).
4. Faithfulness Versus Usefulness: The Need for Dual Metrics
We should distinguish two kinds of faithfulness. First, internal faithfulness, which refers to alignment between the CoT and the model’s computational path. Second, institutional faithfulness, which refers to how well a rationale aligns with the expectations and norms of a particular domain.
In medicine, a CoT that reflects clinical guidelines—even if internally shortcut—still supports decision auditing. In law, a CoT that references case law and statute may be more valuable than one that truthfully reflects an obscure latent attention pattern. Barez et al. rightly warn against mistaking CoT for mechanistic explanation, but they underappreciate its conventional intelligibility, which is precisely where its value lies in settings where trust and justification are co-produced.
5. Where the Paper Is Right—and Where It Overreaches
Of course, I fully endorse several of the authors' core proposals, not simply because of their experience in the domain but for the cogency of the reasoning that underpins their analysis. Their call for causal validation of CoT explanations through ablation, verifier models, or counterfactual testing is both urgent and overdue. Their roadmap for improving CoT through cognitive science and mechanistic tracing is robust.
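To make the counterfactual-testing idea concrete, the sketch below shows one simple form such a check could take. It is my own illustration of the general idea rather than the authors' protocol, and it assumes a hypothetical llm_generate call and a convention that the model ends its output with "Answer: <verdict>".

```python
# Toy counterfactual test of a CoT rationale. If the model's conclusion does
# not change when a premise its CoT relies on is altered, the rationale is
# suspect and should be routed for human review. llm_generate is a
# hypothetical stand-in; the "Answer:" convention is an assumption.

def llm_generate(prompt: str) -> str:
    return "<model output>"  # placeholder: replace with a real model call

def final_answer(output: str) -> str:
    # Assumes the model was instructed to end with "Answer: <verdict>".
    return output.rsplit("Answer:", 1)[-1].strip()

instruction = "\nThink step by step, then end with 'Answer: <verdict>'."

original = "The accused had motive, opportunity, and a prior record."
counterfactual = "The accused had no motive and no prior record."

ans_original = final_answer(llm_generate(original + instruction))
ans_counterfactual = final_answer(llm_generate(counterfactual + instruction))

if ans_original == ans_counterfactual:
    # The premises the CoT cites apparently did not drive the conclusion.
    print("Flag: conclusion is insensitive to the premises the CoT cites.")
```

The point is not the code but the posture: the rationale is treated as a testable claim, the "hypothesis" framing defended earlier in this article.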
But where the paper overreaches is in framing CoT as an illusion that should not be trusted. CoT is not a lie. It is a communicative scaffold. To treat it as epistemically worthless because it is not causally grounded is to misunderstand the nature of high-stakes institutional reasoning. In these contexts, communicability is not an optional feature—it is a foundational one.
6. Toward a CoT+ Paradigm
Rather than abandon CoT, we should pursue what might be termed CoT+, a paradigm that combines:
i. Layered explanation, where CoT is supplemented with causal influence maps
ii. Counterfactual chains, where models show how different premises would change conclusions
iii. Dissent mechanisms, where adversarial prompts generate critiques of the original CoT
iv. Human oversight dashboards, where legal or medical professionals annotate and flag dubious reasoning
These ideas mirror how reasoning is structured in real-world institutions. Judicial reasoning is not presented as a single truth but as an exchange of arguments, counterarguments, and dissenting views. CoT can serve this function in AI if we treat it not as ground truth but as participatory scaffolding.
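As one concrete illustration, the sketch below shows how the dissent mechanism from the list above might be wired up: a second, adversarial prompt is asked to attack the first rationale, and both texts are handed to a human reviewer. The llm_generate helper is again a hypothetical stand-in, not any particular vendor's API.

```python
# Sketch of a "dissent mechanism": a second, adversarial prompt critiques the
# first rationale, and both texts go to a human reviewer. llm_generate is a
# hypothetical stand-in for any model call.

def llm_generate(prompt: str) -> str:
    return "<model output>"  # placeholder: replace with a real model call

case_facts = "Summary of the case facts goes here."

# Step 1: the primary ("majority") rationale.
majority_cot = llm_generate(
    case_facts + "\nThink step by step and recommend an outcome."
)

# Step 2: the dissent. The same or a second model is asked to find weaknesses:
# unsupported premises, misapplied precedent, shortcut heuristics.
dissent = llm_generate(
    "Critique the following reasoning. Identify unsupported premises, "
    "misapplied precedent, or steps that do not follow:\n" + majority_cot
)

# The reviewer treats the CoT as a contestable hypothesis, not a transcript.
print(majority_cot, dissent, sep="\n\n")
```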
7. A Note on Temporal Relevance: CoT in the Pre-AGI Epoch
It is important to recognize that the significance of CoT reasoning is, in part, a byproduct of our current architectural paradigm. LLMs, particularly transformer-based systems, lack persistent internal world models, grounded memory, or modular reasoning agents. CoT operates as an external scaffold for simulated deliberation, allowing these models to mimic multi-step thought without actually possessing it. Should AGI arrive, whether through neuro-symbolic architectures, embodied agents, or quantum-enhanced models, CoT may lose its centrality. An artificial general intelligence with robust causal reasoning and explicit memory may not need to simulate its thoughts through linear text, nor would it benefit from CoT's current communicative crutches. In that future, CoT may serve a different function entirely: not as a mirror of reasoning, but as a translation layer for human interpretability. For now, however, in the LLM-dominated present, CoT remains an indispensable bridge between deep computation and normative justification.
8. Conclusion: Bridging, Not Peering
Chain-of-thought is not a microscope into AI cognition. It is a bridge. And in high-stakes environments where explanation, fairness, and contestability matter, that bridge is indispensable.
If we are to build AI systems that interact meaningfully with human institutions, we must value not only what models compute, but how they justify. CoT may not show us everything that happens inside the model, but it shows us enough to ask questions, challenge outputs, and co-produce trustworthy systems. That is not an illusion. That is progress.
Limited References
Barez, F., Wu, T.-Y., Arcuschin, I., Lan, M., Wang, V., Siegel, N., Collignon, N., Neo, C., Lee, I., Paren, A., Bibi, A., Trager, R., Fornasiere, D., Yan, J., Elazar, Y., & Bengio, Y. (2025). Chain-of-Thought is Not Explainability. arXiv preprint arXiv:2506.15213.
Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.
Floridi, L. (2011). The Philosophy of Information. Oxford University Press.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84(3), 231–259.
Supreme Court of Canada. (2019). Canada (Minister of Citizenship and Immigration) v. Vavilov, 2019 SCC 65, [2019] 4 S.C.R. 653.
The full article can be found here: https://papers-pdfs.assets.alphaxiv.org/2025.02v2.pdf