Chain-of-Thought Reasoning with Granite
Written by Bryan C. Bentley Hargrave
As large language models (LLMs) continue to revolutionize how we interact with artificial intelligence (AI), prompt engineering has emerged as a critical skill.
One of the most effective and widely discussed prompting techniques in the world of generative AI is chain of thought prompting (also known as CoT prompting). This method improves a model’s ability to solve complex problems by encouraging it to generate intermediate reasoning steps, much as a human would when thinking through a problem.
Unlike standard prompting, where machine learning models are asked to provide direct answers, chain of thought prompting works by guiding the model to “think out loud” in natural language to reach a conclusion. This shift leads to significant gains in problem-solving accuracy across a variety of benchmarks, especially in tasks that require multistep reasoning or logical inference.
Chain of thought reasoning in Granite
This notebook demonstrates the use of chain of thought prompting to unlock the reasoning capabilities of the IBM® Granite® Instruct large language models.
Unlike traditional AI models, Granite Instruct LLMs have commonsense reasoning embedded directly through fine-tuning, allowing them to perform complex reasoning tasks without relying on external modules. The Granite Instruct internal reasoning process can be toggled on or off (see Reasoning when you want it for additional information) to optimize compute usage depending on the reasoning tasks involved.
This process makes it possible to observe the step-by-step reasoning path as Granite tackles complex tasks. This view reveals how the model forms connections, processes natural language and arrives at the final answer, similar to watching an expert’s thought process unfold. Furthermore, self-consistency improves chain of thought prompting with Granite by sampling multiple reasoning paths and selecting the most consistent answer, boosting reliability and accuracy.
This tutorial will guide you through the fundamentals of CoT prompting with Granite Instruct models. We’ll also explore how different datasets and prompt engineering techniques affect performance, and why chain of thought reasoning often outperforms standard prompting on real-world benchmarks involving complex problems. Explore the open source IBM Granite Community project powering this tutorial on GitHub.
Steps
Step 1. Install dependencies
Install the Python package dependencies for this notebook. The Granite utils package provides some helpful functions for these recipes.
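A minimal sketch of the install cell, assuming a LangChain plus Replicate setup and the community utils package (exact package names and versions may differ from the notebook):

```python
# Notebook install cell (package names assumed; adjust to the notebook's requirements file).
%pip install "git+https://github.com/ibm-granite-community/utils" \
    langchain_community \
    replicate \
    transformers
```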
Step 2. Select your model
Select a Granite model from the ibm-granite org on Replicate. While a smaller model (Granite-3.2:2b-instruct) is available, Granite-3.3:8b-instruct is the default for this tutorial. It is important to note that model size plays a role in the ability to handle tasks such as logic and math without being explicitly trained to do so, also referred to as emergent reasoning. This ability tends to appear naturally as models scale.
Here we use the Replicate LangChain client to connect to the model.
To get set up with Replicate, see Getting Started with Replicate.
To connect to a model on a provider other than Replicate, substitute this code cell with one from the LLM component recipe.
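A minimal sketch of that connection, assuming the langchain_community Replicate client, an API token in the REPLICATE_API_TOKEN environment variable, and an illustrative choice of generation parameters:

```python
from langchain_community.llms import Replicate

# The Replicate client reads the API token from the REPLICATE_API_TOKEN
# environment variable; set it before running this cell.
model = Replicate(
    model="ibm-granite/granite-3.3-8b-instruct",          # model identifier on Replicate
    model_kwargs={"max_tokens": 1024, "min_tokens": 100},  # illustrative generation settings
)
```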
Step 3. Set up two prompts
Next, create two prompt chains. The first chain uses the model’s normal (non-chain-of-thought) response mode, which is the default prompt mode for Granite. The second chain is configured to use the chain of thought reasoning response mode. This is done by passing thinking=True to the chat template, which adds specific instructions to the system prompt; those instructions activate the model's internal reasoning process, so the response contains the reasoning steps. By exploring variants of chain of thought prompting, you can experiment with how the models approach decision-making, making them more adaptable to a wide range of tasks. A sketch of this setup follows.
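A minimal sketch of building the two prompt variants, assuming the Granite chat template is rendered through the transformers tokenizer and that the template accepts a thinking flag (the notebook may instead wrap this in a LangChain prompt template):

```python
from transformers import AutoTokenizer

# Load the tokenizer so we can render the Granite chat template locally.
tokenizer = AutoTokenizer.from_pretrained("ibm-granite/granite-3.3-8b-instruct")

def build_prompt(question: str, thinking: bool) -> str:
    """Render a single-turn user message with or without the reasoning instructions."""
    messages = [{"role": "user", "content": question}]
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        thinking=thinking,  # True switches on the chain of thought reasoning response mode
    )
```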
Now that the prompts have been created, take a look at the difference between them to see the additional text in the reasoning prompt that activates the Granite model’s internal reasoning process.
NOTE: This additional prompt text is specific to the chat template for the version of Granite used and can change in future releases of Granite.
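One way to inspect that difference, assuming the build_prompt helper sketched above (the notebook may simply print both prompts side by side):

```python
import difflib

question = "Which is larger, 9.9 or 9.11?"  # hypothetical sample question

# Show only the lines that differ between the two rendered prompts.
diff = difflib.unified_diff(
    build_prompt(question, thinking=False).splitlines(),
    build_prompt(question, thinking=True).splitlines(),
    fromfile="normal prompt",
    tofile="reasoning prompt",
    lineterm="",
)
print("\n".join(diff))
```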
Step 4. Compare the responses of the two prompts
First, we define a helper function that takes a question and uses both prompts to respond to it. The function displays the question, then the response from the normal prompt (without CoT), followed by the step-by-step response from the chain of thought reasoning prompt. A sketch of such a helper follows.
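A minimal sketch of that helper, assuming the model client and build_prompt function from the earlier cells (names are illustrative):

```python
def compare_responses(question: str) -> None:
    """Query the model with and without chain of thought reasoning and print both answers."""
    print(f"QUESTION: {question}\n")

    print("NORMAL RESPONSE (no CoT):")
    print(model.invoke(build_prompt(question, thinking=False)), "\n")

    print("CHAIN OF THOUGHT REASONING RESPONSE:")
    print(model.invoke(build_prompt(question, thinking=True)))
```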
Step 5. Chain of thought reasoning use cases
In this example, chain of thought prompting supports logical problem-solving by having the model summarize the given relationships before analyzing them in detail. This helps ensure that each part of the problem is clearly understood and leads to an accurate conclusion.
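For example (the question below is a hypothetical stand-in for the notebook's own relationship puzzle):

```python
compare_responses(
    "Alice is older than Bob. Bob is older than Carol. Carol is older than Dave. "
    "Who is the youngest, and how do you know?"
)
```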
The following example demonstrates how chain of thought prompting helps large language models handle basic decision-making and comparison-based problem-solving. This capability makes the model's reasoning path more transparent and its answer more accurate, turning a simple question into a short exercise in decision-making.
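A hypothetical comparison question of this kind:

```python
compare_responses(
    "A 500 g box of pasta costs $2.50 and a 750 g box costs $3.60. "
    "Which box is the better value per gram?"
)
```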
This next example highlights how chain of thought prompting allows large language models to work through basic numerical comparisons with greater clarity. By encouraging step-by-step reasoning, even simple math-based questions become transparent exercises in evaluating magnitude and numerical relationships.
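For instance, a hypothetical numerical comparison:

```python
compare_responses("Which number is larger: 9.9 or 9.11?")
```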
Building on the previous example of comparing decimal numbers, this question explores how the context of versioning can change the interpretation of similar-looking values. Chain of thought prompting helps clarify the subtle difference between numerical and version-based comparisons, guiding the model to apply reasoning that's sensitive to real-world conventions.
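A hypothetical version-comparison variant of the same question:

```python
compare_responses(
    "Considering software versions, which is newer: Python 3.9 or Python 3.11?"
)
```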
Continuing the exploration of version comparisons, this example introduces Maven versioning and the impact of prerelease identifiers such as -rc1 (release candidate). Chain of thought prompting allows the model to navigate domain-specific rules—such as semantic version precedence—making it easier to reason about which version is considered "greater" in practical software versioning contexts.
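A hypothetical Maven versioning question:

```python
compare_responses(
    "In Maven versioning, which is greater: 2.0.0-rc1 or 2.0.0?"
)
```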
Chain of thought prompting helps models solve math word problems by breaking them down into clear, step-by-step reasoning. Instead of jumping to the final answer, the model explains how quantities and percentages relate, mimicking how a student might logically work through a mixture problem.
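A hypothetical mixture word problem of the kind described:

```python
compare_responses(
    "A chemist has 10 liters of a 30% acid solution. How many liters of pure acid "
    "must be added to obtain a 50% acid solution?"
)
```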
The final example demonstrates how chain of thought prompting can support geometric reasoning by breaking down shape properties and applying fundamental rules, such as angle sums in triangles. It shows how a model can translate a brief problem statement into a structured logical process, leading to a clear and correct conclusion.
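And a hypothetical geometry question exercising the triangle angle-sum rule:

```python
compare_responses(
    "In a triangle, one angle measures 90 degrees and another measures 35 degrees. "
    "What is the measure of the third angle, and what type of triangle is it?"
)
```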
References
Wang, Boshi, Sewon Min, et al. 2022. “Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters.” 2717–2739. https://arxiv.org/abs/2212.10001.
“IBM Granite 3.2 Documentation – IBM Granite.” 2024. IBM.com. https://www.ibm.com/granite/docs/models/granite/.
Wang, Xuezhi, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2022. “Self-Consistency Improves Chain of Thought Reasoning in Language Models.” arXiv:2203.11171 [cs]. https://arxiv.org/abs/2203.11171.
Tian, Jacob-Junqi, Omkar Dige, D. B. Emerson, and Faiza Khattak. 2023. “Using Chain-of-Thought Prompting for Interpretable Recognition of Social Bias.” OpenReview. https://openreview.net/forum?id=QyRganPqPz.
Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc V. Le, and Denny Zhou. 2022. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” Advances in Neural Information Processing Systems. https://proceedings.neurips.cc/paper_files/paper/2022/file/9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf.