ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
Yilei Jiang1*, Yaozhi Zheng1*, Yuxuan Wan2*, Jiaming Han1, Qunzhong Wang1,
Michael R. Lyu2, Xiangyu Yue1✉
1CUHK MMLab, 2CUHK ARISE Lab
*Equal contribution ✉Corresponding author
ScreenCoder is an intelligent UI-to-code generation system that transforms any screenshot or design mockup into clean, production-ready HTML/CSS code. Built with a modular multi-agent architecture, it combines visual understanding, layout planning, and adaptive code synthesis to produce accurate and editable front-end code.
It also supports customized modifications, allowing developers and designers to tweak layout and styling with ease. Whether you're prototyping quickly or building pixel-perfect interfaces, ScreenCoder bridges the gap between design and development — just copy, customize, and deploy.
- Try our Hugging Face demo at Demo
- Run the demo locally (download from the Hugging Face Space):
  python app.py
A showcase of how ScreenCoder transforms UI screenshots into structured, editable HTML/CSS code using a modular multi-agent framework.
Demo videos: youtube_demo.MP4, ins_demo.MP4, draft_demo.MP4
We present qualitative examples to illustrate the improvements achieved by our method over existing approaches. The examples below compare the output of a baseline method with ours on the same input.
As shown above, our method produces results that are more accurate, visually aligned, and semantically faithful to the original design.
- main.py: The main script to generate final HTML code for a single screenshot.
- UIED/: Contains the UIED (UI Element Detection) engine for analyzing screenshots and detecting components.
  - run_single.py: Python script to run UI component detection on a single image.
- html_generator.py: Takes the detected component data and generates a complete HTML layout with generated code for each module.
- image_replacer.py: A script to replace placeholder divs in the final HTML with actual cropped images.
- mapping.py: Maps the detected UIED components to logical page regions.
- requirements.txt: Lists all the necessary Python dependencies for the project.
- doubao_api.txt: API key file for the Doubao model (should be kept private and is included in .gitignore).
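To make the role of the mapping step more concrete, here is a minimal sketch of one way detected UI elements could be matched to placeholder regions: pick, for each placeholder, the detected box with the highest intersection-over-union (IoU). This is an illustrative stand-in, not the algorithm in mapping.py. The names `iou` and `match_placeholders`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are all assumptions, and the sketch assumes both sets of boxes share one coordinate system.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed format: (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_placeholders(
    placeholders: Dict[str, Box],
    elements: List[Box],
    threshold: float = 0.5,  # hypothetical cutoff for accepting a match
) -> Dict[str, Optional[int]]:
    """Map each placeholder name to the index of its best-overlapping element.

    A placeholder maps to None when no element reaches the threshold.
    """
    result: Dict[str, Optional[int]] = {}
    for name, pbox in placeholders.items():
        best, best_score = None, threshold
        for idx, ebox in enumerate(elements):
            score = iou(pbox, ebox)
            if score >= best_score:
                best, best_score = idx, score
        result[name] = best
    return result
```

A greedy best-overlap match like this can assign the same element to two placeholders; a one-to-one assignment would need an extra constraint.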
- Clone the repository:
  git clone https://github.com/leigest519/ScreenCoder.git
  cd ScreenCoder
- Create a virtual environment:
  python3 -m venv .venv
  source .venv/bin/activate
- Install dependencies:
  pip install -r requirements.txt
- Configure the model and API key:
  - Choose a generation model: Set the desired model in block_parsor.py and html_generator.py. Supported options: Doubao (default), Qwen, GPT, Gemini.
  - Add the API key: Create a plain-text file (doubao_api.txt, qwen_api.txt, gpt_api.txt, or gemini_api.txt) in the project root directory that corresponds to your selected model, and paste your API key inside.
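The key-file convention above can be sketched as a small lookup from model name to file. This is a hypothetical helper for illustration only: `load_api_key` and `KEY_FILES` are not names from the repository, and the project's own scripts may read the key differently.

```python
from pathlib import Path

# Model name -> key file, following the file names listed above.
KEY_FILES = {
    "doubao": "doubao_api.txt",
    "qwen": "qwen_api.txt",
    "gpt": "gpt_api.txt",
    "gemini": "gemini_api.txt",
}


def load_api_key(model: str = "doubao", root: Path = Path(".")) -> str:
    """Return the API key stored in the key file for `model` (hypothetical helper)."""
    try:
        key_path = root / KEY_FILES[model.lower()]
    except KeyError:
        raise ValueError(f"Unsupported model: {model!r}") from None
    if not key_path.is_file():
        raise FileNotFoundError(f"Expected an API key file at {key_path}")
    # Strip surrounding whitespace so a trailing newline does not end up in the key.
    key = key_path.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"{key_path} is empty")
    return key
```

Failing early with a clear error when the file is missing or empty tends to be easier to debug than an authentication failure from the model API later on.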
The typical workflow is a multi-step process as follows:
- Initial Generation with Placeholders: Run the Python scripts to generate the initial HTML code for a given screenshot.
  - Block Detection:
    python block_parsor.py
  - Generation with Placeholders (Gray Image Blocks):
    python html_generator.py
- Final HTML Code: Run the Python scripts to generate the final HTML code with cropped images from the original screenshot.
  - Placeholder Detection:
    python image_box_detection.py
  - UI Element Detection:
    python UIED/run_single.py
  - Mapping Alignment Between Placeholders and UI Elements:
    python mapping.py
  - Placeholder Replacement:
    python image_replacer.py
- Simple Run: Run the Python script to generate the final HTML code:
  python main.py
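The step-by-step workflow above can also be driven from a single script. The following is a minimal sketch, assuming each script is run from the project root with no arguments, as in the commands listed. `run_pipeline` and `PIPELINE` are hypothetical names, and this wrapper is not a description of what main.py does internally.

```python
import subprocess
import sys

# Script order taken from the workflow steps above.
PIPELINE = [
    "block_parsor.py",         # block detection
    "html_generator.py",       # HTML with gray placeholder blocks
    "image_box_detection.py",  # placeholder detection
    "UIED/run_single.py",      # UI element detection
    "mapping.py",              # align placeholders with UI elements
    "image_replacer.py",       # swap placeholders for cropped images
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order with the current interpreter.

    With check=True, subprocess.run raises CalledProcessError on a
    non-zero exit code, so the pipeline stops at the first failing step.
    """
    for script in scripts:
        print(f"Running {script} ...")
        runner([sys.executable, script], check=True)
```

Calling `run_pipeline()` from the project root runs all six steps. The `runner` parameter exists only so the function can be exercised without launching real processes.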
- WebPAI (Web Development Powered by AI) released a set of research resources and datasets for webpage generation studies, aiming to build an AI platform for more reliable and practical automated webpage generation.
- Awesome-Multimodal-LLM-for-Code maintains a comprehensive list of papers on methods, benchmarks, and evaluation for code generation under multimodal scenarios.
This project builds upon several outstanding open-source efforts. We would like to thank the authors and contributors of the following projects: UIED, DCGen, and Design2Code.