This document summarizes the AdaMix paper, which proposes a parameter-efficient fine-tuning (PEFT) method of the same name. AdaMix uses a mixture of adaptation modules: during training, inputs are randomly routed to different adaptation modules, so the model learns multiple views of the task. By tuning only 0.1-0.2% of the model parameters, AdaMix outperforms both full model fine-tuning and other state-of-the-art PEFT methods on a range of NLU and NLG tasks, in experiments on benchmarks such as GLUE, E2E, WebNLG, and DART. Concretely, AdaMix introduces a set of adaptation modules in each transformer layer, applies a stochastic routing policy during training, and uses consistency regularization together with adaptation module merging so that inference costs no more than a single adaptation module.
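The routing-and-merging mechanism can be illustrated with a short sketch. The PyTorch code below is illustrative only and not taken from the AdaMix release; names such as MixtureOfAdapters, num_adapters, and bottleneck_dim are assumptions. It shows a set of bottleneck adapters attached to a transformer layer, random routing of each training pass through one adapter, and weight averaging of the adapters for single-module inference.

```python
# Minimal sketch of a mixture of adaptation modules, assuming bottleneck
# adapters as the adaptation type (the paper also supports other variants).
import copy
import random

import torch
import torch.nn as nn


class MixtureOfAdapters(nn.Module):
    """Several adaptation modules per layer; one is randomly chosen per
    training pass, and they can be merged into a single module for inference."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 16, num_adapters: int = 4):
        super().__init__()
        self.adapters = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_dim, bottleneck_dim),
                nn.ReLU(),
                nn.Linear(bottleneck_dim, hidden_dim),
            )
            for _ in range(num_adapters)
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Stochastic routing: each forward pass uses one randomly chosen adapter.
            adapter = random.choice(self.adapters)
        else:
            # After training, call merge_adapters() and use the merged module instead.
            adapter = self.adapters[0]
        # Residual connection keeps the frozen backbone representation intact.
        return hidden_states + adapter(hidden_states)

    @torch.no_grad()
    def merge_adapters(self) -> nn.Module:
        """Average the adapters' weights into one module, so inference cost
        matches that of a single adaptation module."""
        merged = copy.deepcopy(self.adapters[0])
        for name, param in merged.named_parameters():
            stacked = torch.stack(
                [dict(a.named_parameters())[name] for a in self.adapters]
            )
            param.copy_(stacked.mean(dim=0))
        return merged
```

In this sketch, the consistency regularization mentioned above would correspond to running two forward passes (hence two random routings) on the same batch and penalizing the divergence between their output distributions; only the adapter parameters would be trained while the backbone stays frozen.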