Thanks for building and sharing! I can see how this MCP might be useful if you're already using Storybook, to help you build your app using existing stories, maintain your stories, etc. But I'm curious if you think it'll help someone like me who is new to Storybook. Do you think this MCP would or could help in the following contexts that don't currently use Storybook?
Thanks!
🤖 Problem
When you're building UI using agentic programming, you lack visibility into the UI that's being created, and it's hard to verify that things work properly. LLMs can generate UI components and code, but developers need a reliable way to see, test, and validate the visual output before integrating it into their applications.
Storybook is a tool that provides those capabilities for humans, but it doesn’t integrate very well with LLMs today.
Note
We’ve identified two separate uses of Storybook in relation to LLMs.
@storybook/addon-mcp and this RFC focus on improving the development experience when using Storybook in a project - that is, on users who are building components with Storybook. The other use case focuses on consumers of Storybook: not necessarily users who are building with Storybook (although they could be), but users who are building UI and using a Storybook as documentation. We think that is a separate workflow to investigate, and we're discussing it in the Storybook Design Systems with Agents RFC.
🧩 Proposed Solution
The experimental MCP addon in this repository is an MCP server that users can add to their agents to improve the agent's understanding of Storybook. It is packaged as a regular Storybook addon and supports Storybook 7 and up, though (for now) only in Vite-based projects.
See the documentation on how to set up and get started with the addon.
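The documentation is the authoritative source for setup, but since it ships as a regular addon, registering it follows the usual Storybook pattern. A minimal sketch, assuming a React + Vite project (the framework choice here is just an example):

```ts
// .storybook/main.ts - a minimal sketch of registering the addon.
// "@storybook/react-vite" is an assumption; any Vite-based framework
// should work, per the compatibility note above.
import type { StorybookConfig } from "@storybook/react-vite";

const config: StorybookConfig = {
  stories: ["../src/**/*.stories.@(ts|tsx)"],
  framework: { name: "@storybook/react-vite", options: {} },
  addons: [
    "@storybook/addon-mcp", // spawns the MCP server alongside the dev server
  ],
};

export default config;
```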
Along with the MCP server that the addon spawns, we also have a prompt that we recommend users add to their system prompts:
When users start up their Storybook dev server, the MCP addon starts an MCP server at http://localhost:PORT/mcp that agents can connect to.
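Agents like Cursor or Claude Code handle this connection themselves once the endpoint is configured, but as a rough illustration of what happens under the hood, here is a sketch using the official MCP TypeScript SDK and the default Storybook port 6006 (both assumptions, not part of the addon's docs):

```ts
// Illustrative sketch only: agents normally do this for you.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const transport = new StreamableHTTPClientTransport(
  new URL("http://localhost:6006/mcp") // the addon's MCP endpoint on the dev server
);
const client = new Client({ name: "storybook-mcp-probe", version: "0.0.1" });

await client.connect(transport);

// List the tools the addon's MCP server exposes (the two described below).
const { tools } = await client.listTools();
console.log(tools.map((tool) => tool.name));
```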
The MCP server exposes two tools for the agent to use:
UI Building Instructions Tool
This tool currently responds with instructions about:
These instructions tend to nudge the LLM to write stories when relevant, teach the LLM how to use up-to-date syntax and APIs - because its baseline knowledge from training data is generally okay, just outdated - and reliably provide relevant story links so the user can inspect the result of the work.
The instructions are “hidden” behind a tool call, so they don’t pollute the context when the agent is working on non-UI-related tasks.
Story URL Tool
This tool provides the LLM with a programmatic way to turn any story path + name into a valid Storybook URL. This enables the LLM to provide correct links to stories in the running Storybook based on its own selection of which stories are relevant for a user to inspect.
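The RFC doesn't pin down the tool's exact name or parameter schema, so the identifiers below are hypothetical, but conceptually the agent's call looks something like this (continuing the SDK sketch above):

```ts
// Hypothetical tool and argument names, shown only to illustrate the flow;
// the real schema is whatever the addon advertises via listTools().
const result = await client.callTool({
  name: "get-story-url",
  arguments: {
    storyFilePath: "./src/Button.stories.ts",
    storyName: "WithLongLabel",
  },
});
// The response contains a valid link into the running Storybook that the
// agent can pass on to the user.
console.log(result);
```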
🔮 Future Work
We’ve just scratched the surface of what’s possible here, and we’ve already identified key areas to focus on.
We need to have a clearer definition of what a “good” story is. If we are to write clear instructions for the LLM to improve its story-writing capabilities, we first need to align on a clear set of best practices to follow. E.g. it’s not clear-cut how much should be asserted in play functions, how exhaustively stories should cover prop combinations, etc.
Prompt Engineering on the UI Building Instructions. We could spend a lot more time fine-tuning the instructions and the wording to make the LLM understand them better. It’s possible that migration notes don’t produce the best outcome, and that a distilled, LLM-friendly version of the documentation (like how Svelte-LLM does it) produces better results instead. There are also more obvious improvements, like not including Storybook 9 migration docs if we detect that the current project is running Storybook 8.
Build an automated evaluation setup. Our research methodology was rigorous and included clear experiments with expected outputs and structured findings. However, it was a manual process of trying out each scenario with each prompt, and that doesn’t scale well. Changing the wording of a prompt required re-running a matrix of 15 experiments and logging the results. We need to improve this process so that we can evaluate our iterations in a more automated setup, like what a tool such as evalite provides.
We also have an abundance of ideas for how the MCP server can improve the workflow in other areas (and maybe you do too? 🙏). For now we’ve focused on the first-shot attempt at writing stories and UI, and on improving the LLM’s general knowledge. Other ideas include:
💬 Request For Comments
We’re asking for input and feedback here. We hope you’ll try this out, see what works for you and what doesn’t. Please share thoughts and ideas on what could be improved, or suggest changes to the prompts, etc. 🙏
🧪 Research and Findings
If you’re curious about how we got here, we’ve written about our methodology and research findings too, but it’s not required reading for this RFC.
❓Questions
To research how we could improve an agent-based development workflow with Storybook, we defined a set of goals:
To reach these goals we defined a set of questions we needed to answer:
How do we make the LLM consistently write and update stories when working with UI?
Can we change an LLM’s behavior so the user doesn’t need to explicitly ask the LLM to write stories, but instead the LLM just knows that it’s a core part of any task that requires UI work?
How can we make an LLM write “good” stories?
How can we make an LLM send valid story links to the user when relevant?
Investigating these questions and trying out different solutions is what ultimately led to the experimental release of @storybook/addon-mcp.
📖 UI Building Instructions Tool
We found that “hiding” these instructions behind an MCP tool was an effective way of managing the LLM’s context. In general, minimising the usage of the LLM’s context window is critical to getting good results: not only are there hard limits and costs associated with having too many tokens in the context, LLMs also perform worse and worse the more context they have - similar to how humans struggle to focus on a task when bombarded with irrelevant information. Our findings show that the LLM will generally only request the UI instructions when it’s working on UI, ensuring that it doesn’t consider them when working on other tasks, e.g. backend-only work.
The combination of the system prompt, the tool’s description, and the paragraph about when to write stories has proven to be an effective way to nudge the LLM to write and update stories when working on UI components, without requiring a specific user prompt for it. However, AIs are unreliable - models, agents, and clients all behave differently - so it’s not a surefire way to achieve this. We believe the reliability can be improved in the future by iterating on the prompts (feedback and ideas welcome! 🙏).
We found that, generally speaking, LLMs already have an okay understanding of CSF and how to write stories embedded in their training data, thanks to all the publicly available content about Storybook that exists today. However, they struggle to stay up to date, so providing the migration guide and instructions on new features was a way to ensure they wrote valid and modern stories. It had a noticeable impact on the validity of the output; however, there’s still much to be desired when it comes to writing “good” stories. More Prompt Engineering™ is needed to make the LLM understand best practices better, e.g. relevant usage of assertions in play functions and useful args combinations.
Finally, including clear instructions on when and how to provide story links to the users proved to be very effective. Our findings indicate that with the MCP server, the LLM will consistently end tasks with a set of links that the user can visit to inspect the UI work the LLM has performed - not only when creating new stories, but also when working on UI without any story modifications.
🔗 Story URL Tool
An exact URL to a story requires a few things:
The origin of the running Storybook, e.g. http://localhost:6006
The path to the story file, e.g. ./src/Button.stories.ts
The story’s export name, e.g. WithLongLabel, or its custom name if one is set, e.g. Long Label
With this information it is possible for the tool to programmatically look up the story index and find the stories’ IDs. From there it’s just about combining the IDs with the origin correctly.
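As a sketch of that idea (not the addon's actual implementation), the lookup can be done against the story index that Storybook 7+ serves as index.json:

```ts
// Sketch: resolve a story file path + display name to a URL via the story index.
// Assumes the index.json shape served by Storybook 7+.
type IndexEntry = {
  id: string; // the story ID used in Storybook URLs
  title: string; // e.g. "Example/Button"
  name: string; // display name, e.g. "With Long Label" or a custom "Long Label"
  importPath: string; // e.g. "./src/Button.stories.ts"
  type: "story" | "docs";
};

async function storyUrl(
  origin: string, // e.g. "http://localhost:6006"
  importPath: string,
  displayName: string
): Promise<string | undefined> {
  const res = await fetch(`${origin}/index.json`);
  const index: { entries: Record<string, IndexEntry> } = await res.json();

  const entry = Object.values(index.entries).find(
    (e) => e.type === "story" && e.importPath === importPath && e.name === displayName
  );
  // Stories are addressed by their ID in the ?path query parameter.
  return entry ? `${origin}/?path=/story/${entry.id}` : undefined;
}
```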
To get an understanding of the LLM’s knowledge and capabilities, we first explored whether it could generate these URLs on its own, without assistance from a tool. We found that it worked in the most basic scenarios, where story files had explicit meta titles, no custom story names, and Storybook was running on localhost:6006. That was too limiting for practical use, but interesting nonetheless that it understood the Storybook URL structure.
We then gave the LLM information about how Storybook internally constructs the story IDs, by giving it the raw code that does it, or alternatively an LLM-generated description of the code. This was very reliable: the LLM correctly gathered all the necessary information and produced valid links in all advanced scenarios. But it struggled to get the origin right when it was not localhost:6006, which led to the creation of the tool that does it programmatically and reliably.
In future iterations we need to make sure the LLM always passes stories’ custom names into the tool. Currently it’s a two-shot process where the tool returns an error reminding the LLM to pass in a story’s custom name, and the LLM then gets it right the second time.
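For context, the ID construction the LLM was taught boils down to roughly the following (a rough approximation; the real logic lives in @storybook/csf and handles many more edge cases):

```ts
// Rough approximation of how Storybook derives story IDs.
function sanitize(value: string): string {
  return value
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything non-alphanumeric into dashes
    .replace(/^-+|-+$/g, ""); // trim leading/trailing dashes
}

// The ID is the sanitized title and story name joined by a double dash,
// e.g. "Example/Button" + "Primary" -> "example-button--primary".
function toId(title: string, storyName: string): string {
  return `${sanitize(title)}--${sanitize(storyName)}`;
}
```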
🤔 Why an addon
There are many ways to distribute an MCP server. Here’s a quick overview of what we explored and the pros and cons:
A standalone MCP server, installed and run with an npx command or similar. This is the regular way to distribute MCP servers, and it is capable of accessing anything locally. The downside is that if it needs access to anything from the Storybook runtime - stories, the index, statuses, etc. - it would have to request that from a running Storybook dev server, and we would have to expose that via API endpoints in Storybook core. This would be cumbersome to maintain, slower to iterate on and release, and the MCP server would only be compatible with specific Storybook versions.
A Storybook addon that runs the MCP server as part of the Storybook dev server, giving it direct access to the runtime - stories, the index, statuses, afterEach-hooks, etc. The MCP server will be tied to a specific instance, and in the future we’d need to figure out how that would play out in monorepo situations where there might be multiple Storybooks running at the same time.
🔍 Related Work