TensorRT

Aug 05, 2025
Delivering 1.5M TPS Inference on NVIDIA GB200 NVL72, NVIDIA Accelerates OpenAI gpt-oss Models from Cloud to Edge
NVIDIA and OpenAI began pushing the boundaries of AI with the launch of NVIDIA DGX back in 2016. The collaborative AI innovation continues with the OpenAI...
6 MIN READ

Aug 01, 2025
Optimizing LLMs for Performance and Accuracy with Post-Training Quantization
Quantization is a core tool for developers aiming to improve inference performance with minimal overhead. It delivers significant gains in latency, throughput,...
14 MIN READ

Jul 24, 2025
Double PyTorch Inference Speed for Diffusion Models Using Torch-TensorRT
NVIDIA TensorRT is an AI inference library built to optimize machine learning models for deployment on NVIDIA GPUs. TensorRT targets dedicated hardware in...
8 MIN READ

Jul 07, 2025
LLM Inference Benchmarking: Performance Tuning with TensorRT-LLM
This is the third post in the large language model latency-throughput benchmarking series, which aims to instruct developers on how to benchmark LLM inference...
11 MIN READ

Jul 02, 2025
Optimizing FLUX.1 Kontext for Image Editing with Low-Precision Quantization
FLUX.1 Kontext, the recently released model from Black Forest Labs, is a fascinating addition to the repertoire of community image generation models. The open...
10 MIN READ

Jun 25, 2025
Check Out Sovereign AI in Practice Through an NVIDIA Webinar
Join NVIDIA experts and leading European model builders on July 8 for a webinar on building and deploying multilingual large language models.
1 MIN READ

Jun 25, 2025
How to Streamline Complex LLM Workflows Using NVIDIA NeMo-Skills
A typical recipe for improving LLMs involves multiple stages: synthetic data generation (SDG), model training through supervised fine-tuning (SFT) or...
10 MIN READ

Jun 24, 2025
Introducing NVFP4 for Efficient and Accurate Low-Precision Inference
To get the most out of AI, optimizations are critical. When developers think about optimizing AI models for inference, model compression techniques—such as...
11 MIN READ

Jun 12, 2025
Run High-Performance AI Applications with NVIDIA TensorRT for RTX
NVIDIA TensorRT for RTX is now available for download as an SDK that can be integrated into C++ and Python applications for both Windows and Linux. At...
7 MIN READ

May 30, 2025
NVIDIA Deep Learning Institute Offers Multilingual AI Training at GTC Paris
Large language models (LLMs) are capable of recognizing, summarizing, translating, predicting, and generating content. Yet even the most powerful LLMs face...
6 MIN READ

May 22, 2025
Blackwell Breaks the 1,000 TPS/User Barrier With Meta’s Llama 4 Maverick
NVIDIA has achieved a world-record large language model (LLM) inference speed. A single NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs can achieve over...
9 MIN READ

May 19, 2025
NVIDIA TensorRT for RTX Introduces an Optimized Inference AI Library on Windows 11
AI experiences are rapidly expanding on Windows in creativity, gaming, and productivity apps. There are various frameworks available to accelerate AI inference...
9 MIN READ

May 18, 2025
Spotlight: Perfect Corp. Delivers Personalized Digital Beauty Experiences Using NVIDIA TensorRT and NVENC
Augmented reality (AR) and AI are revolutionizing the beauty and fashion industry by offering hyperpersonalized experiences, from virtual try-ons to AI-driven...
4 MIN READ

May 14, 2025
NVIDIA TensorRT Unlocks FP4 Image Generation for NVIDIA Blackwell GeForce RTX 50 Series GPUs
The launch of the NVIDIA Blackwell platform ushered in a new era of improvements in generative AI technology. At its forefront is the newly launched GeForce RTX...
11 MIN READ

May 02, 2025
Integrate and Deploy Tongyi Qwen3 Models into Production Applications with NVIDIA
Alibaba recently released Tongyi Qwen3, a family of open-source hybrid-reasoning large language models (LLMs). The Qwen3 family consists of two MoE models,...
7 MIN READ

Apr 24, 2025
Benchmarking Agentic LLM and VLM Reasoning for Gaming with NVIDIA NIM
This is the first post in the LLM Benchmarking series, which shows how to use GenAI-Perf to benchmark the Meta Llama 3 model when deployed with NVIDIA NIM...
7 MIN READ