Evaluating LLMs: From Intelligence to Practical Usefulness

LLMs: Beyond illusions, toward usefulness

Two critiques dominate the AI debate.

🔹 The Illusion of Thinking says LLMs don't really think: they just mimic patterns, and benchmarks like MMLU or AIME exaggerate their intelligence.

🔹 The Illusion of the Illusion of Thinking pushes back: dismissing LLMs as parrots ignores the fact that, in practice, their outputs function like reasoning.

Both circle around the idea of "thinking." A new paper, Evaluating LLM Metrics Through Real-World Capabilities (2025), reframes the question: not "are LLMs intelligent?" but "are they useful?"

Drawing on surveys and usage logs, the paper identifies six core capabilities people rely on: summarization, reviewing work, technical assistance, information retrieval, generation, and data structuring. It proposes human-centered evaluation criteria: coherence, accuracy, clarity, relevance, and efficiency.

The results are clear: most benchmarks miss these everyday capabilities, leaving high-value tasks like reviewing or structuring work unevaluated. Current evaluations inflate abstract "intelligence" while overlooking practical value.

The real measure of LLMs is not whether they think, but how well they help us write, review, retrieve, generate, and structure knowledge.

Read the full paper: https://lnkd.in/ecFhbPSE

#AI #LLM #AGI #generativeAI #futureofwork #AIevaluation

Anthony Eri

AI Engineer | Data Scientist


Shifting the focus from "does it think?" to "is it useful?" is the right move. Benchmarks need to measure real-world tasks like reviewing and structuring work, not just abstract knowledge. This is how we build truly helpful AI.

