X-PLUG/MobileAgent

Mobile-Agent: The Powerful GUI Agent Family

🤗 GUI-Owl-32B | GUI-Owl-32B | 🤗 GUI-Owl-7B | GUI-Owl-7B

📢News

  • [2025.8.20]🔥 The all-new GUI-Owl and Mobile-Agent-v3 have been released! The technical report is available on arXiv (2508.15144), and the model checkpoints will be released as GUI-Owl-7B and GUI-Owl-32B.
    • GUI-Owl is a multi-modal cross-platform GUI VLM with GUI perception, grounding, and end-to-end operation capabilities.
    • Mobile-Agent-v3 is a cross-platform multi-agent framework based on GUI-Owl. It provides capabilities such as planning, progress management, reflection, and memory.
  • [2025.8.14]🔥 Mobile-Agent-v3 won the best demo award at the 24th China National Conference on Computational Linguistics (CCL 2025).
  • [2025.3.17] PC-Agent has been accepted by the ICLR 2025 Workshop.
  • [2024.9.26] Mobile-Agent-v2 has been accepted by The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024).
  • [2024.7.29] Mobile-Agent won the best demo award at the 23rd China National Conference on Computational Linguistics (CCL 2024).
  • [2024.3.10] Mobile-Agent has been accepted by the ICLR 2024 Workshop.

📊Results

👀Features

GUI-Owl

  • State-of-the-art results among 7B-scale models.
  • A native end-to-end multimodal agent designed as a foundational model for GUI automation.
  • Unifying perception, grounding, reasoning, planning, and action execution within a single policy network.
  • Robust cross-platform interaction and multi-turn decision making with explicit intermediate reasoning.
  • GUI-Owl can be instantiated as different specialized agents within Mobile-Agent-v3.

Mobile-Agent-v3

  • Dynamic task decomposition, planning and progress management.
  • A highly integrated operating space reduces how often the model has to perceive the screen and issue an operation.
  • Extensive exception handling and reflection capabilities provide more stable performance in scenarios such as pop-ups and advertisements.
  • Recording key information along the way enables cross-application tasks.
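
The features above can be pictured as a simple control loop. The sketch below is only an illustration under stated assumptions and is not the Mobile-Agent-v3 code or API: `AgentState`, `run_episode`, and the `toy_*` functions are hypothetical names, and the "device" is a stub that shows a pop-up on the first search attempt. It shows planning into subgoals, progress tracking, reflection-driven retries, and notes carried across steps.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentState:
    """Shared state for the hypothetical roles in this sketch."""
    instruction: str
    subgoals: List[str] = field(default_factory=list)   # remaining plan
    completed: List[str] = field(default_factory=list)  # progress record
    notes: List[str] = field(default_factory=list)      # recorded key information


def run_episode(
    instruction: str,
    plan: Callable[[str], List[str]],
    act: Callable[[str, List[str]], str],
    reflect: Callable[[str, str], bool],
    max_steps: int = 10,
) -> AgentState:
    """Plan once, then act and reflect until the plan is done or steps run out."""
    state = AgentState(instruction=instruction, subgoals=plan(instruction))
    steps = 0
    while state.subgoals and steps < max_steps:
        steps += 1
        subgoal = state.subgoals[0]
        observation = act(subgoal, state.notes)
        if reflect(subgoal, observation):
            # Success: mark the subgoal as done and keep the observation as a note.
            state.completed.append(state.subgoals.pop(0))
            state.notes.append(observation)
        # On failure the subgoal stays at the front of the plan and is retried.
    return state


# Toy stand-ins (assumptions) for the model calls and the device.
attempts: Dict[str, int] = {}


def toy_plan(instruction: str) -> List[str]:
    # Split "A then B" into two subgoals.
    return [part.strip() for part in instruction.split(" then ")]


def toy_act(subgoal: str, notes: List[str]) -> str:
    attempts[subgoal] = attempts.get(subgoal, 0) + 1
    if attempts[subgoal] == 1 and "search" in subgoal:
        return "popup"  # simulate an unexpected pop-up on the first try
    return f"done: {subgoal}"


def toy_reflect(subgoal: str, observation: str) -> bool:
    # Treat a pop-up as a failed step that must be retried.
    return observation != "popup"


state = run_episode("search stock price then fill table", toy_plan, toy_act, toy_reflect)
print(state.completed)
print(state.notes)
```

In this toy run the first subgoal is attempted twice because the reflection step rejects the pop-up observation, and the note recorded after the search stays available when the second subgoal runs, which is the role key-information recording plays in cross-application tasks.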

📝Series of Work

📺Demo

💻PC + 🌐Web

Search for Alibaba's stock price in the Edge browser. Then create a new table in WPS, fill in the company name in the first column and the stock price in the second column.

Search_WPS.mp4

💻PC

Create a new blank PPT, and then insert a piece of text in the form of Word Art into the first slide, with the content being "Alibaba".

PPT.mp4

🌐Web

Please help me search for flights from Beijing to Paris on Skyscanner departing on September 18th and returning on September 21st.

Skyscanner.mp4

Go to bilibili, check out Jun Lei’s videos, and like the first one.

bilibili.mp4

📱Phone

Please help me search for Jinan travel guides on Xiaohongshu, sort them by the number of collections, and save the first note.

default.mp4

Please help me search for details of Jinan Daming Lake Scenic Area on Ctrip, including address and ticket price, etc.

default.mp4

📑Citation

If you find Mobile-Agent useful for your research and applications, please cite using this BibTeX:

@misc{ye2025mobileagentv3foundamentalagentsgui,
      title={Mobile-Agent-v3: Fundamental Agents for GUI Automation},
      author={Jiabo Ye and Xi Zhang and Haiyang Xu and Haowei Liu and Junyang Wang and Zhaoqing Zhu and Ziwei Zheng and Feiyu Gao and Junjie Cao and Zhengxi Lu and Jitong Liao and Qi Zheng and Fei Huang and Jingren Zhou and Ming Yan},
      year={2025},
      eprint={2508.15144},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.15144},
}

@article{wanyan2025look,
  title={Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation},
  author={Wanyan, Yuyang and Zhang, Xi and Xu, Haiyang and Liu, Haowei and Wang, Junyang and Ye, Jiabo and Kou, Yutong and Yan, Ming and Huang, Fei and Yang, Xiaoshan and others},
  journal={arXiv preprint arXiv:2506.04614},
  year={2025}
}

@article{liu2025pc,
  title={PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC},
  author={Liu, Haowei and Zhang, Xi and Xu, Haiyang and Wanyan, Yuyang and Wang, Junyang and Yan, Ming and Zhang, Ji and Yuan, Chunfeng and Xu, Changsheng and Hu, Weiming and Huang, Fei},
  journal={arXiv preprint arXiv:2502.14282},
  year={2025}
}

@article{wang2025mobile,
  title={Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks},
  author={Wang, Zhenhailong and Xu, Haiyang and Wang, Junyang and Zhang, Xi and Yan, Ming and Zhang, Ji and Huang, Fei and Ji, Heng},
  journal={arXiv preprint arXiv:2501.11733},
  year={2025}
}

@article{wang2024mobile2,
  title={Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration},
  author={Wang, Junyang and Xu, Haiyang and Jia, Haitao and Zhang, Xi and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
  journal={arXiv preprint arXiv:2406.01014},
  year={2024}
}

@article{wang2024mobile,
  title={Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception},
  author={Wang, Junyang and Xu, Haiyang and Ye, Jiabo and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
  journal={arXiv preprint arXiv:2401.16158},
  year={2024}
}