[[["เข้าใจง่าย","easyToUnderstand","thumb-up"],["แก้ปัญหาของฉันได้","solvedMyProblem","thumb-up"],["อื่นๆ","otherUp","thumb-up"]],[["ไม่มีข้อมูลที่ฉันต้องการ","missingTheInformationINeed","thumb-down"],["ซับซ้อนเกินไป/มีหลายขั้นตอนมากเกินไป","tooComplicatedTooManySteps","thumb-down"],["ล้าสมัย","outOfDate","thumb-down"],["ปัญหาเกี่ยวกับการแปล","translationIssue","thumb-down"],["ตัวอย่าง/ปัญหาเกี่ยวกับโค้ด","samplesCodeIssue","thumb-down"],["อื่นๆ","otherDown","thumb-down"]],["อัปเดตล่าสุด 2025-06-30 UTC"],[],[],null,["Gemma 3n is a generative AI model optimized for use in everyday devices, such as\nphones, laptops, and tablets. This model includes innovations in\nparameter-efficient processing, including Per-Layer Embedding (PLE) parameter\ncaching and a MatFormer model architecture that provides the flexibility to\nreduce compute and memory requirements. These models feature audio input\nhandling, as well as text and visual data.\n\nGemma 3n includes the following key features:\n\n- **Audio input** : Process sound data for speech recognition, translation, and audio data analysis. [Learn more](/gemma/docs/core/huggingface_inference#audio)\n- **Visual and text input** : Multimodal capabilities let you handle vision, sound, and text to help you understand and analyze the world around you. [Learn more](/gemma/docs/core/huggingface_inference#vision)\n- **Vision encoder:** High-performance MobileNet-V5 encoder substantially improves speed and accuracy of processing visual data. [Learn more](https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/#mobilenet-v5:-new-state-of-the-art-vision-encoder)\n- **PLE caching** : Per-Layer Embedding (PLE) parameters contained in these models can be cached to fast, local storage to reduce model memory run costs. [Learn more](#ple-caching)\n- **MatFormer architecture:** Matryoshka Transformer architecture allows for selective activation of the models parameters per request to reduce compute cost and response times. [Learn more](#matformer)\n- **Conditional parameter loading:** Bypass loading of vision and audio parameters in the model to reduce the total number of loaded parameters and save memory resources. [Learn more](#conditional-parameter)\n- **Wide language support**: Wide linguistic capabilities, trained in over 140 languages.\n- **32K token context**: Substantial input context for analyzing data and handling processing tasks.\n\n[Try Gemma 3n](https://aistudio.google.com/prompts/new_chat?model=gemma-3n-e4b-it)\n[Get it on Kaggle](https://www.kaggle.com/models/google/gemma-3n)\n[Get it on Hugging Face](https://huggingface.co/collections/google/gemma-3n-685065323f5984ef315c93f4)\n\nAs with other Gemma models, Gemma 3n is provided with open weights and\nlicensed for responsible [commercial use](/gemma/terms), allowing you to tune\nand deploy it in your own projects and applications.\n| **Tip:** If you are interested in building generative AI solutions for Android mobile applications, check out Gemini Nano. For more information, see the Android [Gemini Nano](https://developer.android.com/ai/gemini-nano) developer docs.\n\nModel parameters and effective parameters\n\nGemma 3n models are listed with parameter counts, such as **`E2B`** and\n**`E4B`** , that are *lower* than the total number of parameters contained in the\nmodels. The **`E`** prefix indicates these models can operate with a reduced set\nof Effective parameters. This reduced parameter operation can be achieved using\nthe flexible parameter technology built into Gemma 3n models to help them run\nefficiently on lower resource devices.\n\nThe parameters in Gemma 3n models are divided into 4 main groups: text, visual,\naudio, and per-layer embedding (PLE) parameters. With standard execution of the\nE2B model, over 5 billion parameters are loaded when executing the model.\nHowever, using parameter skipping and PLE caching techniques, this model can be\noperated with an effective memory load of just under 2 billion (1.91B)\nparameters, as illustrated in Figure 1.\n\n**Figure 1.** Gemma 3n E2B model parameters running in standard execution\nversus an effectively lower parameter load using PLE caching and parameter\nskipping techniques.\n\nUsing these parameter offloading and selective activation techniques, you can\nrun the model with a very lean set of parameters or activate additional\nparameters to handle other data types such as visual and audio. These features\nenable you to ramp up model functionality or ramp down capabilities based on\ndevice capabilities or task requirements. The following sections explain more\nabout the parameter efficient techniques available in Gemma 3n models.\n\nPLE caching\n\nGemma 3n models include Per-Layer Embedding (PLE) parameters that are used\nduring model execution to create data that enhances the performance of each\nmodel layer. The PLE data can be generated separately, outside the operating\nmemory of the model, cached to fast storage, and then added to the model\ninference process as each layer runs. This approach allows PLE parameters to be\nkept out of the model memory space, reducing resource consumption while still\nimproving model response quality.\n\nMatFormer architecture\n\nGemma 3n models use a Matryoshka Transformer or *MatFormer* model architecture\nthat contains nested, smaller models within a single, larger model. The nested\nsub-models can be used for inferences without activating the parameters of the\nenclosing models when responding to requests. This ability to run just the\nsmaller, core models within a MatFormer model can reduce compute cost, and\nresponse time, and energy footprint for the model. In the case of Gemma 3n, the\nE4B model contains the parameters of the E2B model. This architecture also\nlets you select parameters and assemble models in intermediate sizes\nbetween 2B and 4B. For more details on this approach, see the\n[MatFormer research paper](https://arxiv.org/pdf/2310.07707).\nTry using MatFormer techniques to reduce the size of a Gemma 3n model with the\n[MatFormer Lab](https://goo.gle/gemma3n-matformer-lab)\nguide.\n\nConditional parameter loading\n\nSimilar to PLE parameters, you can skip loading of some parameters into memory,\nsuch as audio or visual parameters, in the Gemma 3n model to reduce memory load.\nThese parameters can be dynamically loaded at runtime if the device has the\nrequired resources. Overall, parameter skipping can further reduce the required\noperating memory for a Gemma 3n model, enabling execution on a wider range of\ndevices and allowing developers to increase resource efficiency for less\ndemanding tasks.\n\n\u003cbr /\u003e\n\n\nReady to start building?\n[Get started](/gemma/docs/get_started)\nwith Gemma models!"]]