Automating Bug Reproduction with LLM, Playwright, and Python: A Practical Experiment

Ameya N.

Published Jul 21, 2025

Why I Built This

Reproducing bugs is one of the most repetitive and time-consuming parts of software development and testing. When a bug is reported, someone needs to interpret the report, write a test case, execute it, and confirm whether the bug is still present.

In fast-moving environments, especially during hotfixes or regression cycles, this process can slow down teams and introduce uncertainty into the release process.

I started asking:

Can we convert bug reports directly into executable tests?
Can we use LLMs to interpret vague bug descriptions reliably?
Can we automate the repetitive aspects of bug triage while maintaining test quality?

This led me to build an AI-powered Bug Reproduction Assistant that connects LLMs with Playwright (for UI testing) and Python Requests (for API testing) to automate this workflow.

How It Works

UI Bug Reproduction

API Bug Reproduction

This means faster triage, clearer validation, and less manual effort during testing cycles.
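
The article does not show the tool's internals, so the following is only a minimal sketch of how a bug report might be turned into a prompt and then into a test script. The names BugReport, build_prompt, and generate_test are hypothetical, and the LLM is passed in as a plain callable so that any provider (or a stub) can be used.

```python
from dataclasses import dataclass
from typing import Callable

FENCE = "`" * 3  # markdown code fence that LLMs often wrap code in


@dataclass
class BugReport:
    """Hypothetical structured form of an incoming bug report."""
    title: str
    steps: list[str]
    expected: str
    actual: str
    kind: str = "ui"  # "ui" -> Playwright, "api" -> Python Requests


def build_prompt(report: BugReport) -> str:
    """Turn a bug report into an explicit, structured prompt."""
    tool = ("Playwright (Python, sync API)" if report.kind == "ui"
            else "the Python requests library")
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(report.steps, start=1))
    return (
        f"Write a pytest test using {tool} that reproduces this bug.\n"
        f"Title: {report.title}\n"
        f"Steps:\n{steps}\n"
        f"Expected: {report.expected}\n"
        f"Actual: {report.actual}\n"
        "Assert the EXPECTED behaviour, so the test fails while the bug exists.\n"
        "Return only Python code."
    )


def generate_test(report: BugReport, llm: Callable[[str], str]) -> str:
    """Ask the LLM for a test script and strip any surrounding code fence."""
    code = llm(build_prompt(report)).strip()
    if code.startswith(FENCE):
        lines = code.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]         # drop the closing fence line
        code = "\n".join(lines)
    return code
```

Keeping the LLM behind a callable makes the pipeline testable without network access: a stub function that returns a canned script is enough to exercise the prompt and parsing logic.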

Advantages of This Approach

Saves Time: Reduces the manual cycle of interpreting, writing, and running scripts for each bug report.
Structured Validation: Using assertions ensures the bug is truly reproduced or verified as fixed, avoiding “false green” test passes.
Supports Both UI and API Bugs: Flexible across front-end and back-end workflows.
Integrates with CI/CD: Can plug into pipelines to automatically validate bugs against new builds.
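
One way to make the "Structured Validation" point concrete is to map each test run to an explicit status instead of a bare pass/fail. This is a sketch under my own assumptions, not the tool's actual code: BugStatus and classify are hypothetical names, and the check is assumed to assert the expected (fixed) behaviour.

```python
from enum import Enum
from typing import Callable


class BugStatus(Enum):
    REPRODUCED = "reproduced"      # assertion failed: the bug is still present
    FIXED = "fixed"                # assertion passed: expected behaviour observed
    INCONCLUSIVE = "inconclusive"  # the test broke for an unrelated reason


def classify(check: Callable[[], None]) -> BugStatus:
    """Run a check that asserts the EXPECTED behaviour and classify the outcome."""
    try:
        check()
    except AssertionError:
        return BugStatus.REPRODUCED
    except Exception:
        # Timeouts, missing selectors, connection errors and similar failures
        # say nothing about the bug itself, so they are neither green nor red.
        return BugStatus.INCONCLUSIVE
    return BugStatus.FIXED
```

Separating INCONCLUSIVE from REPRODUCED keeps environment failures from being reported as confirmed bugs, and a run only counts as FIXED when an assertion was actually evaluated and held.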

Limitations and Challenges

Selector Stability: LLM-generated selectors can fail if the UI changes frequently or if the initial report lacks details.
Prompt Sensitivity: The quality of generated tests depends on prompt structure and clarity.
Not a Replacement for Critical Reviews: Critical bugs still require human judgment, especially for high-risk issues.
Needs Visual Context: Screenshots help but cannot replace thorough exploratory testing in some cases.
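
The selector-stability problem can be partly caught before a test ever runs. The sketch below flags selectors that tend to break when the UI changes; the rules and the name brittle_reasons are illustrative assumptions rather than part of the tool, and the patterns would need tuning for a real application.

```python
import re

# Hypothetical heuristics for selectors that tend to break on UI changes.
BRITTLE_RULES = [
    ("xpath", re.compile(r"^(?:xpath=|//)")),                 # XPath selectors
    ("positional", re.compile(r":nth-(?:child|of-type)\(")),  # depends on element order
    ("deep-chain", re.compile(r"(?:>[^>]*){3,}")),            # three or more child hops
]


def brittle_reasons(selector: str) -> list[str]:
    """Return the names of every brittleness rule the selector matches."""
    cleaned = selector.strip()
    return [name for name, pattern in BRITTLE_RULES if pattern.search(cleaned)]
```

A selector that comes back with any reasons could be sent back to the LLM with a request to use a test ID or a role-based locator instead, which is the style the Playwright documentation recommends.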

What I Learned

Building this project taught me several important lessons:

Prompt Engineering is Practical: Designing effective prompts for LLMs is a skill that directly affects output quality.
Assertions Matter: Simply running tests isn’t enough; clear assertions are necessary to accurately confirm the bug status.
LLM + Existing Frameworks are Powerful: Combining AI with proven tools like Playwright creates practical workflow improvements.
Automation Needs Guardrails: Even as we automate, we need mechanisms to catch false positives and maintain trust in test results.

Future Scope

While the current version is already saving time, I see several clear next steps:

Jira and GitHub Integration: Automatically fetch bug reports and post reproduction results back to issues.
Visual Diffing: To catch subtle UI changes that assertions may miss.
Video Recording: To capture the reproduction steps for stakeholder visibility.

Why This Matters

Shipping quality software quickly is a balancing act. Manual, repetitive testing tasks slow teams down, while skipping steps introduces risk. By automating bug reproduction, we can speed up validation while maintaining confidence in our fixes.

This project is a step toward making QA workflows more efficient, reducing cognitive load for testers, and allowing teams to focus on high-value exploratory testing and quality advocacy.

AI can write tests, but can it understand why a bug matters?

While building this tool, I realised the hardest part isn’t generating test scripts; it’s ensuring those tests validate what matters to users and the business.

Automating bug reproduction is a step toward faster, more reliable testing, but it also forces us to reflect:

What signals confirm a bug is fixed?
How do we ensure we aren’t automating the wrong things?
Can we balance speed with the judgment and context only humans bring?

As we advance with AI in testing, these questions will shape the next wave of impactful tools in quality engineering.

Let’s Connect

If you are working on AI for QA, agentic testing, or workflow automation, I would love to hear how you are tackling these challenges and explore opportunities to learn from each other.

Would you use an AI-powered tool for bug reproduction in your workflow? What limitations or concerns do you see with this approach?

I look forward to hearing your thoughts.
