
2025

Reproducing Llama-Nemotron-Super-49B-V1.5 Evals

In this tutorial, we will reproduce the evals for the Llama-3.3-Nemotron-Super-49B-v1.5 model using NeMo-Skills. For an introduction to the NeMo-Skills framework, we recommend going over our introductory tutorial.

We assume you have /workspace defined in your cluster config and are executing all commands locally from that folder. Adjust the commands accordingly if you are running on Slurm or using different paths.
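To make the setup concrete, here is a rough sketch of how such an eval could be launched through the NeMo-Skills Python API. It is illustrative rather than the exact command used in the tutorial: the model path, `server_gpus` count, and `aime24` benchmark name are assumptions, and the real reproduction may need additional generation arguments, so check the NeMo-Skills eval documentation for the exact invocation.

```python
# Illustrative sketch only -- the paths, GPU count, and benchmark name below
# are assumptions, not the tutorial's exact settings.
from nemo_skills.pipeline.cli import eval, wrap_arguments

eval(
    # Extra ++-style overrides for the generation step go into wrap_arguments.
    ctx=wrap_arguments(""),
    cluster="local",                                   # name of your cluster config
    model="/workspace/Llama-Nemotron-Super-49B-v1.5",  # placeholder local path
    server_type="vllm",
    server_gpus=8,
    benchmarks="aime24",                               # assumed benchmark identifier
    output_dir="/workspace/nemotron-super-evals",
)
```

The same job can also be launched from the command line with the `ns eval` entrypoint using the equivalent flags.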

A Simple Pipeline to Improve Math Reasoning Accuracy

This tutorial walks you through a simplified version of the pipeline that we used to win the AIMO2 Kaggle competition. We will start with the Qwen2.5-14B-Instruct model, which scores only ~10% on the AIME24 benchmark, and improve it to ~30% through a series of NeMo-Skills jobs.
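As a reference point, the ~10% baseline could be measured with an eval job along the same lines as the sketch above. Again, this is only a sketch and not the tutorial's exact command: the `aime24:8` benchmark spec (averaging over 8 generations per problem), the GPU count, and the output path are assumptions.

```python
# Sketch of a baseline AIME24 eval for Qwen2.5-14B-Instruct (assumed settings).
from nemo_skills.pipeline.cli import eval, wrap_arguments

eval(
    ctx=wrap_arguments(""),
    cluster="local",
    model="Qwen/Qwen2.5-14B-Instruct",  # or a local copy under /workspace
    server_type="vllm",
    server_gpus=8,
    benchmarks="aime24:8",  # assumed syntax: 8 sampled generations per problem
    output_dir="/workspace/qwen14b-aime24-baseline",
)
```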

If you’re following along, you’ll need access to either an NVIDIA DGX box with eight NVIDIA A100 (or newer) GPUs or a Slurm cluster with similarly configured nodes. All of the commands combined should take only about 2 hours to run.