Chain-of-Thought Reasoning with Granite
Written by Bryan C. Bentley Hargrave
As large language models (LLMs) continue to revolutionize how we interact with artificial intelligence (AI), prompt engineering has emerged as a critical skill.
One of the most effective and widely discussed prompting techniques in the world of generative AI is chain of thought prompting (also known as CoT prompting). This method improves a model’s ability to solve complex problems by encouraging it to generate intermediate reasoning steps, much as a human would when thinking through a problem.
Unlike standard prompting, where machine learning models are asked to provide direct answers, chain of thought prompting works by guiding the model to “think out loud” in natural language to reach a conclusion. This shift leads to significant gains in problem-solving accuracy across a variety of benchmarks, especially in tasks that require multistep reasoning or logical inference.
Chain of thought reasoning in Granite
This notebook demonstrates the use of chain of thought prompting to unlock the reasoning capabilities of the IBM® Granite® Instruct large language models.
Unlike traditional AI models, Granite Instruct LLMs have commonsense reasoning embedded directly through fine-tuning, allowing them to perform complex reasoning tasks without relying on external modules. The Granite Instruct internal reasoning process can be toggled on or off (see Reasoning when you want it for additional information) to optimize compute usage depending on the reasoning tasks involved.
This process makes it possible to observe the step-by-step reasoning path as Granite tackles complex tasks. This view reveals how the model forms connections, processes natural language and arrives at the final answer, similar to watching an expert’s thought process unfold. Furthermore, self-consistency improves chain of thought prompting with Granite by sampling multiple reasoning paths and selecting the most consistent answer, boosting reliability and accuracy.
This tutorial will guide you through the fundamentals of CoT prompting with Granite Instruct models. We’ll also explore how different datasets and prompt engineering techniques affect performance, and why chain of thought reasoning often outperforms standard prompting on real-world benchmarks involving complex problems. Explore the open source IBM Granite Community project powering this tutorial on GitHub.
Steps
Step 1. Install dependencies
Install the Python package dependencies for this notebook. The Granite utils package provides some helpful functions for these recipes.
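A minimal sketch of the install cell, assuming a LangChain plus Replicate setup and the community utils package (exact package names and versions may differ from the notebook):

```python
# Notebook install cell (package names assumed; adjust to the notebook's requirements file).
%pip install "git+https://github.com/ibm-granite-community/utils" \
    langchain_community \
    replicate \
    transformers
```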
Step 2. Select your model
Select a Granite model from the ibm-granite org on Replicate. While a smaller model (Granite-3.2:2b-instruct) is available, Granite-3.3:8b-instruct is the default for this tutorial. It is important to note that model size plays a role in the ability to handle tasks such as logic and math without being explicitly trained to do so, also referred to as emergent reasoning. This ability tends to appear naturally as models scale.
Here we use the Replicate LangChain client to connect to the model.
To get set up with Replicate, see Getting Started with Replicate.
To connect to a model on a provider other than Replicate, substitute this code cell with one from the LLM component recipe.
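A minimal sketch of that connection, assuming the langchain_community Replicate client, an API token in the REPLICATE_API_TOKEN environment variable, and an illustrative choice of generation parameters:

```python
from langchain_community.llms import Replicate

# The Replicate client reads the API token from the REPLICATE_API_TOKEN
# environment variable; set it before running this cell.
model = Replicate(
    model="ibm-granite/granite-3.3-8b-instruct",          # model identifier on Replicate
    model_kwargs={"max_tokens": 1024, "min_tokens": 100},  # illustrative generation settings
)
```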
Step 3. Set up two prompts
Next, create two prompt chains. The first chain uses the model’s normal (non-chain-of-thought) response mode, which is the default prompt mode for Granite. The second chain is configured to use the chain of thought reasoning response mode. This is done by passing thinking=True to the chat template, which adds specific instructions to the system prompt; those instructions activate the model's internal reasoning process, so the response contains the reasoning steps. By exploring variants of chain of thought prompting, you can experiment with how the models approach decision-making, making them more adaptable to a wide range of tasks. A sketch of this setup follows.
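A minimal sketch of building the two prompt variants, assuming the Granite chat template is rendered through the transformers tokenizer and that the template accepts a thinking flag (the notebook may instead wrap this in a LangChain prompt template):

```python
from transformers import AutoTokenizer

# Load the tokenizer so we can render the Granite chat template locally.
tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.3-8b-instruct")

def build_prompt(question: str, thinking: bool) -> str:
    """Render a single-turn user message with or without the reasoning instructions."""
    messages = [{"role": "user", "content": question}]
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        thinking=thinking,  # True switches on the chain of thought reasoning response mode
    )
```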
Now that the prompts have been created, take a look at the difference between them to see the additional text in the reasoning prompt that activates the Granite model’s internal reasoning process.
NOTE: This additional prompt text is specific to the chat template for the version of Granite used and can change in future releases of Granite.
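One way to inspect that difference, assuming the build_prompt helper sketched above (the notebook may simply print both prompts side by side):

```python
import difflib

question = "Which is larger, 9.9 or 9.11?"  # hypothetical sample question

# Show only the lines that differ between the two rendered prompts.
diff = difflib.unified_diff(
    build_prompt(question, thinking=False).splitlines(),
    build_prompt(question, thinking=True).splitlines(),
    fromfile="normal prompt",
    tofile="reasoning prompt",
    lineterm="",
)
print("\n".join(diff))
```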
Step 4. Compare the responses of the two prompts
First, we define a helper function that takes a question and uses both prompts to respond to it. The function displays the question, then the response from the normal prompt (without CoT), followed by the step-by-step response from the chain of thought reasoning prompt. A sketch of such a helper follows.
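A minimal sketch of that helper, assuming the model client and build_prompt function from the earlier cells (names are illustrative):

```python
def compare_responses(question: str) -> None:
    """Query the model with and without chain of thought reasoning and print both answers."""
    print(f"QUESTION: {question}\n")

    print("NORMAL RESPONSE (no CoT):")
    print(model.invoke(build_prompt(question, thinking=False)), "\n")

    print("CHAIN OF THOUGHT REASONING RESPONSE:")
    print(model.invoke(build_prompt(question, thinking=True)))
```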
Step 5. Chain of thought reasoning use cases
In this example, chain of thought prompting supports logical problem-solving by having the model summarize the given relationships before analyzing them in detail. This helps ensure that each part of the problem is clearly understood and leads to an accurate conclusion.
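For example (the question below is a hypothetical stand-in for the notebook's own relationship puzzle):

```python
compare_responses(
    "Alice is older than Bob. Bob is older than Carol. Carol is older than Dave. "
    "Who is the youngest, and how do you know?"
)
```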
The following example demonstrates how chain of thought prompting helps large language models handle basic decision-making and comparison-based problem-solving. This capability makes the model's reasoning path more transparent and its answer more accurate, turning a simple question into a short exercise in decision-making.
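A hypothetical comparison question of this kind:

```python
compare_responses(
    "A 500 g box of pasta costs $2.50 and a 750 g box costs $3.60. "
    "Which box is the better value per gram?"
)
```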
This next example highlights how chain of thought prompting allows large language models to work through basic numerical comparisons with greater clarity. By encouraging step-by-step reasoning, even simple math-based questions become transparent exercises in evaluating magnitude and numerical relationships.
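For instance, a hypothetical numerical comparison:

```python
compare_responses("Which number is larger: 9.9 or 9.11?")
```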
Building on the previous example of comparing decimal numbers, this question explores how the context of versioning can change the interpretation of similar-looking values. Chain of thought prompting helps clarify the subtle difference between numerical and version-based comparisons, guiding the model to apply reasoning that's sensitive to real-world conventions.
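A hypothetical version-comparison variant of the same question:

```python
compare_responses(
    "Considering software versions, which is newer: Python 3.9 or Python 3.11?"
)
```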
Continuing the exploration of version comparisons, this example introduces Maven versioning and the impact of prerelease identifiers such as -rc1 (release candidate). Chain of thought prompting allows the model to navigate domain-specific rules—such as semantic version precedence—making it easier to reason about which version is considered "greater" in practical software versioning contexts.
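A hypothetical Maven versioning question:

```python
compare_responses(
    "In Maven versioning, which is greater: 2.0.0-rc1 or 2.0.0?"
)
```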
Chain of thought prompting helps models solve math word problems by breaking them down into clear, step-by-step reasoning. Instead of jumping to the final answer, the model explains how quantities and percentages relate, mimicking how a student might logically work through a mixture problem.
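A hypothetical mixture word problem of the kind described:

```python
compare_responses(
    "A chemist has 10 liters of a 30% acid solution. How many liters of pure acid "
    "must be added to obtain a 50% acid solution?"
)
```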
The final example demonstrates how chain of thought prompting can support geometric reasoning by breaking down shape properties and applying fundamental rules, such as angle sums in triangles. It shows how a model can translate a brief problem statement into a structured logical process, leading to a clear and correct conclusion.
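And a hypothetical geometry question exercising the triangle angle-sum rule:

```python
compare_responses(
    "In a triangle, one angle measures 90 degrees and another measures 35 degrees. "
    "What is the measure of the third angle, and what type of triangle is it?"
)
```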
References
Wang, Boshi, Sewon Min, et al. 2022. “Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters.” 2717–2739. https://arxiv.org/abs/2212.10001.
“IBM Granite 3.2 Documentation – IBM Granite.” 2024. IBM.com. https://www.ibm.com/granite/docs/models/granite/.
Wang, Xuezhi, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2022. “Self-Consistency Improves Chain of Thought Reasoning in Language Models.” arXiv:2203.11171 [cs]. https://arxiv.org/abs/2203.11171.
Tian, Jacob-Junqi, Omkar Dige, D. B. Emerson, and Faiza Khattak. 2023. “Using Chain-of-Thought Prompting for Interpretable Recognition of Social Bias.” OpenReview. https://openreview.net/forum?id=QyRganPqPz.
Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc V. Le, and Denny Zhou. 2022. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2022/file/9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf.