This document describes a distributed, staged architecture for multimodal applications on mobile devices. The architecture aims to provide device and domain independence. It follows a client-server model in which a staged pipeline of XML transformations connects three sub-architectures: adaptation, dialog management, and the domain-specific documentation application. The architecture was implemented as an XML transformation pipeline and evaluated on mobile maintenance documentation applications, where it demonstrated device and domain independence. The key open research question is how to better apply the multimodal interaction framework to concrete architectures.
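
As a rough illustration of the staged idea, the following minimal sketch chains three XML-to-XML transformations, one per sub-architecture named above. It is not the implementation described here: the element names (`procedure`, `step`, `item`, `turn`, `prompt`, `label`), the function names, and the two modalities are all hypothetical, and plain Python functions stand in for whatever transformation language the actual pipeline uses.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# A stage takes one XML tree and returns a new one (assumed interface).
Stage = Callable[[ET.Element], ET.Element]


def domain_stage(root: ET.Element) -> ET.Element:
    """Hypothetical domain stage: map domain-specific <step> elements
    to generic, domain-independent <item> elements."""
    out = ET.Element("content")
    for step in root.iter("step"):
        ET.SubElement(out, "item").text = step.text
    return out


def dialog_stage(root: ET.Element) -> ET.Element:
    """Hypothetical dialog stage: turn generic items into numbered turns."""
    out = ET.Element("dialog")
    for i, item in enumerate(root.iter("item"), start=1):
        ET.SubElement(out, "turn", {"id": str(i)}).text = item.text
    return out


def make_adaptation_stage(modality: str) -> Stage:
    """Hypothetical adaptation stage: choose device-specific output
    elements, spoken prompts for voice and labels otherwise."""
    tag = "prompt" if modality == "voice" else "label"

    def adapt(root: ET.Element) -> ET.Element:
        out = ET.Element("output", {"modality": modality})
        for turn in root.iter("turn"):
            ET.SubElement(out, tag).text = turn.text
        return out

    return adapt


def run_pipeline(doc: ET.Element, stages: List[Stage]) -> ET.Element:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        doc = stage(doc)
    return doc


# Made-up maintenance procedure used only for this example.
SOURCE = "<procedure><step>Open panel</step><step>Check valve</step></procedure>"


def render(modality: str) -> str:
    """Run the full pipeline for one target modality and serialize the result."""
    stages = [domain_stage, dialog_stage, make_adaptation_stage(modality)]
    result = run_pipeline(ET.fromstring(SOURCE), stages)
    return ET.tostring(result, encoding="unicode")


if __name__ == "__main__":
    print(render("voice"))
    print(render("screen"))
```

In this sketch only the first stage knows about the domain vocabulary and only the last stage knows about the target device, which is one way to read the device and domain independence claimed above: retargeting to a new device or a new domain means swapping a single stage while the others stay unchanged.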