peter purgathofer
des computers neue kleider (the computer's new clothes)
30 november 2022
Energy, water, and land consumption
Hype machine Big Tech
Resources used in manufacturing
Postcolonial exploitation
Surveillance, manipulation
Non-consensual use of data
Potential for abuse
Deskilling, Ironies of Automation
Loss of reality & fake everything
All kinds of bias
Human: Solve a riddle for me.
ChatGPT: Sure, happy to help! What kind of
riddle should I solve?
Human: A man and a goat are on one side of a
river. They have a boat. How can they both
get across?
ChatGPT: The man first takes the goat across
the river and leaves the boat on the
original side. Then the man returns alone
with the boat. He leaves the goat on the
other side and travels back to the original
side with the boat. Finally, he takes the
cabbage across the river.
Source: Douglas Hofstadter, author of
Gödel, Escher, Bach, 2024
The Illusion of Thinking:
Understanding the Strengths and Limitations of Reasoning Models
via the Lens of Problem Complexity
Parshin Shojaee∗† Iman Mirzadeh∗ Keivan Alizadeh
Maxwell Horton Samy Bengio Mehrdad Farajtabar
Apple
Abstract
Recent generations of frontier language models have introduced Large Reasoning Models
(LRMs) that generate detailed thinking processes before providing answers. While these models
demonstrate improved performance on reasoning benchmarks, their fundamental capabilities,
scaling properties, and limitations remain insufficiently understood. Current evaluations primarily
focus on established mathematical and coding benchmarks, emphasizing final answer accuracy.
However, this evaluation paradigm often suffers from data contamination and does not provide insights
into the reasoning traces’ structure and quality. In this work, we systematically investigate these
gaps with the help of controllable puzzle environments that allow precise manipulation of
compositional complexity while maintaining consistent logical structures. This setup enables the analysis
of not only final answers but also the internal reasoning traces, offering insights into how LRMs
“think”. Through extensive experimentation across diverse puzzles, we show that frontier LRMs
face a complete accuracy collapse beyond certain complexities. Moreover, they exhibit a
counterintuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then
declines despite having an adequate token budget. By comparing LRMs with their standard LLM
counterparts under equivalent inference compute, we identify three performance regimes: (1)
low-complexity tasks where standard models surprisingly outperform LRMs, (2) medium-complexity
tasks where additional thinking in LRMs demonstrates advantage, and (3) high-complexity tasks
where both models experience complete collapse. We found that LRMs have limitations in exact
computation: they fail to use explicit algorithms and reason inconsistently across puzzles. We
also investigate the reasoning traces in more depth, studying the patterns of explored solutions
and analyzing the models’ computational behavior, shedding light on their strengths, limitations,
and ultimately raising crucial questions about their true reasoning capabilities.
1 Introduction
Large Language Models (LLMs) have recently evolved to include specialized variants explicitly
designed for reasoning tasks—Large Reasoning Models (LRMs) such as OpenAI’s o1/o3 [1, 2],
DeepSeek-R1 [3], Claude 3.7 Sonnet Thinking [4], and Gemini Thinking [5]. These models are new
artifacts, characterized by their “thinking” mechanisms such as long Chain-of-Thought (CoT) with
self-reflection, and have demonstrated promising results across various reasoning benchmarks. Their
∗ Equal contribution.
[…] fail to develop
generalizable reasoning
capabilities beyond certain
complexity thresholds […]
These insights challenge
prevailing assumptions
about LRM capabilities and
suggest that current
approaches may be
encountering fundamental
barriers to generalizable
reasoning.
Evening lecture talk: Des Computers neue Kleider
»There is no internal technical, no
functional, and no operational
difference between hallucinations and
non-hallucinations«
»The surprising thing is not that chatbots
sometimes produce nonsense, but
that they answer correctly so often. We
suggest that this is better understood as
a side effect of the fact that
true statements are more plausible than
false ones.«
That’s not too hard. I asked ChatGPT for some citations of papers that “indicate
the importance of silicon for the gut microbiome”, for example, and it happily
gave me results.
Liao, Y., Luo, Z., Liu, J., Luo, H., Wu, Q., Wang, T., & Cheng, W. (2024).
Silicon-enriched functional meat enhances colonic barrier function by inhibiting inflammatory
response and oxidative stress in diabetic rats. Journal of Functional Foods, 112,
105241. https://doi.org/10.1016/j.jff.2024.105241
Kolodziejczak, M., Sienkiewicz, K., Kulik, D., & Szybinski, P. (2020). Postoperative
supplementation of propionate and silicon dioxide in patients after total gastrectomy
—Preliminary report. Przegląd Gastroenterologiczny, 15(3), 218–223.
https://doi.org/10.5114/pg.2020.100058
Ravaioli, S., Tentor, F., Caputo, F., Tomasi, A., Merolle, L., Faldini, C., & Papi, M.
(2024). Amorphous silica nanoparticles and the human gut microbiota: A relationship
with multiple implications. Journal of Nanobiotechnology, 22(1), 1–20.
https://doi.org/10.1186/s12951-024-02305-x […]
https://westreich.substack.com/p/why-chatgpt-creates-scientific-citations
»In the Make America Healthy Again Commission’s report,
they had a number of false (hallucinated) studies.
[…]
Some of them simply don’t exist.
And yet, as described by media outlet NOTUS, these
studies are claimed to be ›broadly illustrative‹ of how
America’s children are getting too many ADHD
prescriptions.«
https://westreich.substack.com/p/why-chatgpt-creates-scientific-citations
$160,–
Springer!!!
… concerns about potential decrease
in programmers’ agency and
productivity with Generative AI are
justified …misleading and irresponsible
…reliance on ChatGPT is linked
to procrastination, memory
loss, and a decline in academic
performance.
…higher confidence in GenAI is
associated with less critical
thinking, while higher
self-confidence is associated with
more critical thinking.
These results highlight the potential
cognitive costs of AI tool reliance…
brain
Use chatGPT!
chatGPT
My personal guideline
› remember: it is not science!
› always think for yourself first
› never address it as a person
› distrust all results
› use it as little as possible
Human: decode L PZ AOL TVZA MYLXBLUA SLAALY
PU AOPZ ZLUALUJL
[…]
ChatGPT: Using a Caesar cipher with a forward shift
of 7 on the given text "L PZ AOL TVZA MYLXBLUA
SLAALY PU AOPZ ZLUALUJL" results in the decoded
message:
"I AM THE KING OF WISHFUL THINKING IN THE
LAND OF DREAMS"
»E IS THE MOST FREQUENT LETTER
IN THIS SENTENCE«
Source: personal experience
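The decoding on this slide can be checked mechanically. The following is a minimal sketch, assuming the ciphertext was produced by shifting each letter 7 positions forward in the alphabet, so that decoding means shifting 7 positions back. The function name `caesar_shift` and the constant `CIPHERTEXT` are illustrative and not part of the talk.

```python
import string
from collections import Counter


def caesar_shift(text: str, shift: int) -> str:
    """Shift every A-Z letter by `shift` positions (negative = backwards).

    Characters outside A-Z (here: spaces) are passed through unchanged.
    """
    result = []
    for ch in text:
        if ch in string.ascii_uppercase:
            # Map A..Z to 0..25, shift with wrap-around, map back to a letter.
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)
    return "".join(result)


# Ciphertext exactly as given to ChatGPT on the slide.
CIPHERTEXT = "L PZ AOL TVZA MYLXBLUA SLAALY PU AOPZ ZLUALUJL"

# Assumption: encoding shifted letters forward by 7, so decoding shifts back by 7.
decoded = caesar_shift(CIPHERTEXT, -7)
print(decoded)

# Count letters only (no spaces) to check the claim the sentence makes about itself.
letter_counts = Counter(ch for ch in decoded if ch.isalpha())
print(letter_counts.most_common(1))
```

Run this way, the shift of 7 yields the sentence quoted in guillemets on the slide rather than the text ChatGPT returned, and the letter count confirms that E is indeed the most frequent letter in it.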
