How LLMs Use Monte Carlo Tree Search to Improve Their Own Thinking
The paper Planning of Heuristics, published on 17th February, 2025 by Chaoxu Mu, Xufeng Zhang and Hui Wang, talks about something really fascinating! The authors introduce an interesting idea: pairing a self-reflection-driven LLM with Monte Carlo Tree Search. I know it sounds complicated, but don't worry, this article will help you deeply understand (or at least try to) what the paper is all about!
Basic Terminologies
So let's get our basics down and understand the key terms of the paper before we dive deeper.
1) Combinatorial Optimization Problems (COPs) :- These are problems where we try to find the best solution from a huge number of possible combinations. Imagine trying to deliver 100 packages all over a big city; going at it randomly would be a waste of time, but having a method, or heuristic, lets you find a good route much faster.
2) Heuristics :- A heuristic is a rule of thumb that guides decision making instead of leaving it to blind choices. Algorithms like A* Search or Greedy Best-First Search each have a heuristic attached to them, which is why they deal with the same problem in different ways. In layman's terms, it is a "shortcut" that helps the program make better decisions.
3) Monte Carlo Tree Search (MCTS) :- It is a method for making decisions by simulating many possible outcomes and using the results to judge which choice fits best. Imagine how a chess engine like AlphaZero works: it explores promising moves for black and white, runs simulations to see how good each move is, and updates its rankings as it learns. MCTS has 4 main steps (a small code sketch follows this list) :-
Selection - Picking the most promising path explored so far.
Expansion - Extending that path with a new, untried move.
Simulation - Playing out how the new path might unfold.
Backpropagation - Feeding the result back up the tree so earlier choices remember how the path performed.
4) Optimization :- It is simply about making something work as well as possible with the least cost or effort, like how a washing machine saves time and energy compared to washing clothes by hand.
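To make those four steps concrete, here is a minimal sketch of MCTS building a tour for a tiny Traveling Salesman instance (a problem we will meet again below). This is purely my own illustration, not code from the paper; the city coordinates, the Node class and the ucb helper are all invented for the example.

```python
import math
import random

# Toy city coordinates; a real instance would have many more cities.
CITIES = {0: (0, 0), 1: (2, 5), 2: (6, 1), 3: (5, 6)}

def dist(a, b):
    (x1, y1), (x2, y2) = CITIES[a], CITIES[b]
    return math.hypot(x1 - x2, y1 - y2)

def tour_length(tour):
    # Length of the full round trip, returning to the starting city.
    return sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

class Node:
    def __init__(self, tour, parent=None):
        self.tour = tour          # the partial tour built so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.reward = 0.0         # accumulated (negative) tour lengths

    def untried(self):
        return [c for c in CITIES if c not in self.tour]

def ucb(child, parent):
    # Upper Confidence Bound: trades off exploiting good paths vs exploring new ones.
    return (child.reward / child.visits
            + math.sqrt(2 * math.log(parent.visits) / child.visits))

def mcts(iterations=500):
    root = Node([0])
    for _ in range(iterations):
        node = root
        # 1) Selection: descend through the most promising explored paths.
        while not node.untried() and node.children:
            node = max(node.children, key=lambda c: ucb(c, node))
        # 2) Expansion: try one new, untried move from this node.
        if node.untried():
            child = Node(node.tour + [random.choice(node.untried())], parent=node)
            node.children.append(child)
            node = child
        # 3) Simulation: finish the tour randomly and score the result.
        rest = node.untried()
        random.shuffle(rest)
        reward = -tour_length(node.tour + rest)   # shorter tour => higher reward
        # 4) Backpropagation: pass the score back up so ancestors remember it.
        while node:
            node.visits += 1
            node.reward += reward
            node = node.parent
    # Read off the most-visited path; with enough iterations it is a full tour.
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.visits)
    return node.tour, round(tour_length(node.tour), 2)

print(mcts())
```

Each iteration runs Selection, Expansion, Simulation and Backpropagation exactly once, so over many iterations the tree concentrates its visits on the shorter tours.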
What is the paper about?
The paper introduces a new idea called Planning of Heuristics (PoH), which uses Large Language Models (LLMs) and Monte Carlo Tree Search (MCTS) to automatically improve heuristics. Now let's see how this actually works :-
1) Generating Heuristics :- First, the system starts with a heuristic, a decision-making shortcut for the problem at hand. Imagine you want to travel to point A; your heuristic might be to always take the most direct path towards it.
2) Evaluating Heuristics :- Next, this heuristic is checked to see how it fares in simulations run by MCTS, which test how it would perform in different scenarios.
3) Refining Heuristics :- Based on the results, the system refines its heuristic to make it as efficient as possible. For example, if the direct path to point A turns out to take more time than an indirect one, the heuristic is updated to prefer the indirect path. (A rough sketch of this loop follows below.)
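Putting the three steps together, here is a rough sketch of what such a generate-evaluate-refine loop could look like. Everything here is a placeholder of my own: llm() stands in for any chat-completion API, evaluate() for a benchmark runner, and my loop greedily keeps the best candidate, whereas the paper uses MCTS to decide which refinement branch to explore.

```python
# A rough sketch of the generate -> evaluate -> refine idea behind PoH.
# The prompts and structure are my guesses at the concept, not the
# paper's exact method.

def llm(prompt: str) -> str:
    """Placeholder: call your favourite LLM API here."""
    raise NotImplementedError

def evaluate(heuristic_code: str, instances) -> float:
    """Placeholder: run the heuristic on benchmark instances and
    return its average solution cost (lower is better)."""
    raise NotImplementedError

def plan_heuristics(instances, rounds=5):
    heuristic = llm("Write a heuristic, as Python code, for this problem: ...")
    best, best_cost = heuristic, evaluate(heuristic, instances)
    for _ in range(rounds):
        # Self-reflection: the LLM critiques its own heuristic...
        reflection = llm(f"This heuristic scored {best_cost}. "
                         f"What are its weaknesses?\n{best}")
        # ...and proposes a refined version based on that critique.
        candidate = llm(f"Improve the heuristic using this feedback:\n"
                        f"{reflection}\n{best}")
        cost = evaluate(candidate, instances)
        if cost < best_cost:            # keep the refinement only if it helps
            best, best_cost = candidate, cost
    return best
```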
What's the importance?
Traditionally, heuristics have been handcrafted, which takes time and may not always produce the best solution. What this paper does with the introduction of PoH is automate that improvement: the system keeps refining the heuristic on its own, without human input, learning from itself instead.
Key Findings of the Paper
The paper tests PoH on two classical problems :-
Traveling Salesman Problem (TSP) :- Imagine you're a salesperson trying to visit several cities while travelling the least total distance.
Flow Shop Scheduling Problem (FSSP) :- Imagine scheduling jobs that must pass through several machines in the same order, arranging them so the total completion time is as small as possible (as sketched below).
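To see what FSSP actually asks, here is a tiny sketch with made-up processing times: three jobs must pass through two machines in the same order, and we brute-force the job order with the smallest makespan (total completion time). The numbers and helper names are mine, not from the paper.

```python
from itertools import permutations

# Toy FSSP instance: (time on machine 1, time on machine 2) for each job.
jobs = [(3, 2), (2, 4), (4, 1)]

def makespan(order):
    m1_done = m2_done = 0
    for j in order:
        t1, t2 = jobs[j]
        m1_done += t1                          # machine 1 runs jobs back to back
        m2_done = max(m2_done, m1_done) + t2   # machine 2 waits for the job and for itself
    return m2_done

# With only 3 jobs we can try every ordering; heuristics matter when we can't.
best = min(permutations(range(len(jobs))), key=makespan)
print(best, makespan(best))
```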
PoH improves upon previous methods by reducing the optimality gap (how far the system's solution is from the best-known solution) from 0.483% to 0.233% on these problems. In other words, it gets measurably closer to the optimal answers. (The small example below shows how such a gap is computed.)
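For clarity, here is how an optimality gap is computed. The instance, tours and resulting percentage are toy values of my own, chosen only to show the formula; they are not the paper's numbers.

```python
import itertools
import math

# Same tiny TSP instance as earlier in the article.
CITIES = {0: (0, 0), 1: (2, 5), 2: (6, 1), 3: (5, 6)}

def dist(a, b):
    (x1, y1), (x2, y2) = CITIES[a], CITIES[b]
    return math.hypot(x1 - x2, y1 - y2)

def tour_length(tour):
    return sum(dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

# Brute force: with 4 cities we can simply try every ordering.
optimal = min(tour_length([0] + list(p)) for p in itertools.permutations([1, 2, 3]))

# Suppose some heuristic produced this (possibly suboptimal) tour.
found = tour_length([0, 2, 1, 3])

# Optimality gap: how far the found solution is from the optimum, in %.
gap = (found - optimal) / optimal * 100
print(f"optimal={optimal:.2f}, found={found:.2f}, gap={gap:.2f}%")
```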
Key Takeaways from the Paper
The paper combines two things: Large Language Models, which understand and generate language, and Monte Carlo Tree Search, which makes decisions through simulations. Combined, they can automatically improve heuristics for solving optimization problems.
It also focuses on Self-Reflection in LLMs, meaning the model evaluates what it has done and, on that basis, deduces whether its choice was optimal. This approach can be applied to a wide range of optimization problems where finding an optimal solution is difficult or time-consuming.
Final Thoughts
Planning of Heuristics is a powerful example of how LLMs can evolve from mere language tools into genuine problem solvers. By teaching them to plan, reflect and optimize, we can apply them to real-world scenarios for better results.