Agents in The Terminal: Exploring AI-Powered CLIs

In recent months, a new category of development tools has started to gain prominence: code agents that run directly in the terminal (CLIs). Powered by large language models (LLMs) like GPT-4o, Claude 4, and Gemini 2.5, these agents transform the terminal into an interactive environment where developers can talk to the AI about their code, generate new snippets, navigate files, automate tasks, and much more.

Unlike assistants like GitHub Copilot, which operate within IDEs, these agents run locally via a CLI and connect directly to the project repository, offering a faster and more contextualized experience.

OpenAI, Google, and Anthropic have already launched their own versions of these tools, and the race for more powerful and accessible functionality is just beginning. In this post, we’ll compare the leading options on the market: Codex CLI, Gemini CLI, and Claude Code.

What Do These CLI Code Agents Do?

These CLI-based agents function as conversational interfaces capable of understanding and interacting with your local project. With simple commands like codex, claude, or gemini, developers can launch the tools in the terminal and ask questions, request code suggestions, browse files, ask for explanations, generate unit tests, and even automate build and deployment tasks.
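Since all three tools ship as ordinary commands, a quick way to see which ones are available on a machine is to probe the PATH. This is a minimal sketch; `agents.txt` is just a throwaway file name chosen for the example.

```shell
# Quick check: which of the three agents is already on your PATH
# (their entry-point commands are `codex`, `claude`, and `gemini`).
for agent in codex claude gemini; do
  if command -v "$agent" >/dev/null 2>&1; then
    echo "$agent: installed"
  else
    echo "$agent: not installed"
  fi
done > agents.txt

cat agents.txt
```

Each tool is installed separately (typically via npm or the vendor's installer); running the bare command from the project root then starts an interactive session with the repository as context.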

They read your local source code and respond based on the current context of the project, maintaining the conversational state between commands. This form of interaction reduces friction across development steps, enabling a smoother collaboration with the assistant.

The full potential of these agents is still being explored by the community. Integrations with MCP servers and other programs running on the user’s machine open up a new horizon of possibilities for applying these agents in real-world scenarios.

Comparing the CLIs

Codex CLI

  • Context window: 200,000 tokens
  • Default model: o4‑mini
  • MCP support: No
  • Free tier: No

Gemini CLI

  • Context window: 1,000,000 tokens
  • Default model: Gemini 2.5 Pro
  • MCP support: Yes
  • Free tier: Yes

Claude Code

  • Context window: 200,000 tokens
  • Default model: Claude Sonnet 4
  • MCP support: Yes
  • Free tier: Yes

Running a Small Comparative Test

To make a more practical comparison between the three tools, we conducted a real-world test using a project repository. We were inspired by this post that highlights some interesting use cases for CLI agents. In our case, we decided to test the agents' ability to generate documentation for a project using only the source code available in the repository as input. The ultimate goal was to create a web page with project information that could be used, for example, to onboard a new developer.

For the test, we selected a small internal project developed by Novatics called esg-score, which was originally created as a solution for a hackathon. The project is a frontend application that generates a dashboard with metrics and charts related to a user's ESG score. This score is calculated based on the user’s credit card spending. The project is built with React and uses Material UI to render most of the components.

To generate the documentation, we followed the same sequence of steps for each agent:

1. We created a prompt asking the agent to generate documentation for the project and save it in markdown format in a file called CODEBASE.md. The prompt was as follows:

You are an AI developer assistant. Your task is to analyze the codebase located in the current directory and write a comprehensive summary in a markdown file called CODEBASE.md        

2. Based on this, we asked the agent to convert CODEBASE.md into a presentation file about the project, also in markdown:

Convert the CODEBASE.md file into a presentation and save it as PRESENTATION.md file        

3. Finally, we asked the agent to create a web page using the content from the PRESENTATION.md file:

Now create a webpage page for the project using the information in PRESENTATION.md, make the page as a single HTML file and use a cool look and feel with a theme related to the project description        
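For reference, the three steps above can also be scripted rather than typed interactively. The sketch below uses Claude Code's non-interactive print mode (`claude -p "<prompt>"`); Gemini CLI offers a similar `-p` flag, and Codex uses `codex exec` instead, so treat this as an illustration to adapt rather than a recipe.

```shell
# The three documentation prompts, saved to a file and fed to the agent in
# order, so that later steps can build on the files produced by earlier ones.
cat > prompts.txt <<'EOF'
You are an AI developer assistant. Your task is to analyze the codebase located in the current directory and write a comprehensive summary in a markdown file called CODEBASE.md
Convert the CODEBASE.md file into a presentation and save it as PRESENTATION.md file
Now create a webpage page for the project using the information in PRESENTATION.md, make the page as a single HTML file and use a cool look and feel with a theme related to the project description
EOF

# Run each prompt sequentially (skipped entirely if the CLI is not installed).
if command -v claude >/dev/null 2>&1; then
  while IFS= read -r prompt; do
    claude -p "$prompt"
  done < prompts.txt
fi
```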

The results generated by each of the agents are available below:

Codex CLI

https://esg-score-ui-codex.s3.sa-east-1.amazonaws.com/index.html

Gemini CLI

https://esg-score-ui-gemini.s3.sa-east-1.amazonaws.com/index.html

Claude Code

https://esg-score-ui-claude.s3.sa-east-1.amazonaws.com/index.html

Our Analysis and the Best Option

All three tools produced good results. However, the output generated by Claude was superior to the others. In addition to creating information blocks about the stack and architecture — which the others also did — it went further and generated sections describing the project's goals and main functionalities, even though we hadn’t explicitly asked for that in the prompt.

The experience of interacting with Claude was also slightly better; it seemed to carry out tasks in a more organized and direct way. Gemini, by contrast, was a negative point here: while executing the first step and creating the CODEBASE.md file, it entered an infinite loop, repeatedly deleting and recreating the file until we stopped the execution.

As for Codex, it completed the task in a more concise manner, and judging by the look and feel of the generated page, it delivered a slightly less polished result than either Gemini or Claude. Additionally, in one of our tests, it used the temporary directory (/tmp) to create a patch file for the repository and then requested that the patch be applied. While not incorrect, this step was unnecessary for the task at hand, and since it happened in only one run, it gave a slight sense of inconsistency between executions.

In other tests we conducted, the conclusions were similar, with Claude Code producing slightly better results than the others. This aligns with many posts and discussions available online comparing these tools. There are even suggestions to use Gemini together with Claude, leveraging Gemini’s large context window (1 million tokens) alongside Claude’s superior “reasoning” abilities and overall output quality (in this case, Gemini could be used to analyze and summarize a project’s codebase, then pass that result to Claude to perform the desired actions).
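The "Gemini summarizes, Claude executes" combination mentioned above could be wired together with a couple of non-interactive calls. The prompts and file names here (SUMMARY.md, the two prompt files) are our own illustration; the `-p` flags are the tools' non-interactive modes.

```shell
# Stage the prompts so the hand-off between the two agents is explicit.
printf '%s\n' "Summarize this codebase: architecture, stack, and key modules. Be thorough but concise." > gemini_prompt.txt
printf '%s\n' "Using the project summary in SUMMARY.md, write onboarding documentation in CODEBASE.md." > claude_prompt.txt

if command -v gemini >/dev/null 2>&1 && command -v claude >/dev/null 2>&1; then
  # Step 1: let Gemini's 1M-token context window digest the whole repository.
  gemini -p "$(cat gemini_prompt.txt)" > SUMMARY.md
  # Step 2: hand the distilled context to Claude to perform the actual work.
  claude -p "$(cat claude_prompt.txt)"
fi
```

The trade-off is latency and cost (two model calls instead of one) in exchange for giving Claude a compact, repo-wide summary it could not fit in its own window.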

Conclusion

Claude Code seems to be the best option at the moment when compared to Gemini CLI and Codex CLI. However, there are other aspects to consider, such as cost and context window size. Gemini, for instance, offers a very generous free tier and will likely cost less than the others overall. So, if cost is not a factor, the first choice — at least for now — would be Claude Code. If cost and the free tier do matter, Gemini CLI might take the top spot.

It’s worth noting that there are other CLI agents not directly tied to one of the major players. One example is Goose, which also acts as a CLI agent and allows the use of multiple LLMs (we’ve already mentioned it in this post).

Finally, it’s important to emphasize that regardless of which agent is used, adopting these tools can represent a leap in productivity — especially for tasks like maintenance, generating repetitive code, and understanding legacy codebases.

Let’s go! 🚀
