What Is NVIDIA Nemotron 3 Super
NVIDIA has introduced Nemotron 3 Super, a powerful new open model built to handle very large and complex AI tasks. It has 120 billion parameters in total, with 12 billion active during inference, making it a highly efficient choice for multi-agent AI systems that need to reason through long workflows.
Nemotron 3 Super focuses on advanced reasoning, accuracy, and speed for autonomous agents. It is already being used in real products and platforms, from AI search engines to code review tools and enterprise software, which shows it is ready for practical deployment, not just a research project.
One of the key ideas behind Nemotron 3 Super is to support the next generation of applications that go far beyond simple chatbots. These apps use multiple agents that talk to each other, call tools, and manage long-running tasks like software development, deep research, or complex financial analysis.
Why Multi-Agent Systems Need Models Like Nemotron 3 Super
When companies start building multi-agent systems, they quickly run into two big challenges: context explosion and the thinking tax.
Context explosion happens because each agent in the system needs access to the full history of the conversation and the tools that have been used. Every step often requires resending a growing amount of text, logs, and intermediate results. Research has shown that this can create up to fifteen times more tokens than a normal chat-style conversation, which drives up cost and can cause the system to lose track of the original goal.
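To make that blow-up concrete, here is a toy cost model. It is not NVIDIA's methodology: the step size and step count are hypothetical, and the simplifying assumption is that each agent step resends the full accumulated history, which is one common way the fifteen-times multiplier can arise.

```python
# Toy cost model for context explosion in a multi-agent workflow.
# Simplifying assumption: every agent step resends the full accumulated
# history, so total tokens grow quadratically with the number of steps.

def total_tokens_resent(step_tokens: int, steps: int) -> int:
    """Tokens processed when each step resends all prior steps' text."""
    return sum(k * step_tokens for k in range(1, steps + 1))

STEP_TOKENS = 2_000   # text produced per step (hypothetical)
STEPS = 29            # agent steps in one workflow (hypothetical)

baseline = STEPS * STEP_TOKENS                   # send each chunk once
agentic = total_tokens_resent(STEP_TOKENS, STEPS)

print(agentic // 1_000, "k tokens,", agentic / baseline, "x blow-up")
# 870 k tokens, 15.0 x blow-up
```

Under these assumptions a 29-step workflow processes fifteen times the tokens of simply sending each chunk once, which is why keeping the whole workflow resident in one long context matters.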
The thinking tax appears when developers try to use very large models for every single subtask. If the system has to call a large model repeatedly for small steps, the whole workflow becomes slow and expensive, which makes it difficult to scale agentic applications to real-world production use.
Nemotron 3 Super attacks both of these problems. It offers a one-million-token context window, so agents can keep an entire workflow in memory. That helps prevent goal drift because the system does not need to repeatedly trim or regenerate context. At the same time, the model architecture is tuned for efficiency, so developers can run complex reasoning without blowing their budget.
The model already sits at the top of the Artificial Analysis leaderboard for its mix of efficiency, openness, and accuracy among models of similar size. It also powers the NVIDIA AI-Q research agent, which holds the number one spot on DeepResearch Bench and DeepResearch Bench II. These benchmarks test how well an AI system can perform multi-step research over large document collections while staying coherent and accurate.
Inside The Hybrid Architecture
Nemotron 3 Super uses a hybrid mixture-of-experts architecture that blends several innovations to boost speed and accuracy.
Mamba layers are used for memory and compute efficiency. They can deliver around four times better efficiency than standard approaches, which is crucial when working with a million-token context window.
Transformer layers provide the heavy lifting for advanced reasoning and language understanding, keeping the model competitive on complex tasks.
Mixture of experts means that only 12 billion of the 120 billion parameters are active at any one time during inference. This allows the model to stay large in capacity while keeping runtime costs under control.
Latent mixture of experts is a newer technique in which four specialist experts are effectively activated for the compute cost of one when generating the next token. This boosts accuracy while keeping the compute budget in check.
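As a rough illustration of how mixture-of-experts routing keeps only a small slice of parameters active, here is a minimal top-k gating sketch in plain Python. The gate is hand-rolled and the "experts" are trivial scalar functions standing in for feed-forward blocks; real models use learned gating networks, and this does not model the latent-expert grouping described above.

```python
import math

# Minimal mixture-of-experts routing sketch: a gate scores every
# expert, but only the top-k experts actually run for a given token.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(gate_scores, experts, token, k=2):
    """Run only the top-k experts, weighted by their gate probability."""
    probs = softmax(gate_scores)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over selected experts
    return sum(probs[i] / norm * experts[i](token) for i in top)

# Toy "experts": each scalar transform stands in for a full FFN expert.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
out = route([0.1, 2.0, 1.5, -1.0], experts, token=3.0, k=2)
# Only 2 of 4 experts ran: active compute is k/len(experts) of the total.
```

The same principle scales up: with 12 billion of 120 billion parameters active, roughly a tenth of the network's compute runs per token while the full capacity stays available to the router.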
Multi-token prediction lets the model predict several tokens ahead at once instead of one at a time. This can make inference up to three times faster in real workloads.
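Nemotron's exact multi-token prediction head is not detailed here, but the general draft-and-verify idea behind predicting several tokens per step can be sketched with toy deterministic stand-ins for the real networks:

```python
# Draft-and-verify sketch of multi-token prediction: propose k tokens
# per step, keep the longest prefix the "main model" agrees with, and
# fall back to the main model's token at the first mismatch.

def main_model(context):
    # Ground truth for this toy: the next token of a fixed sentence.
    target = "hello world from nemotron".split()
    return target[len(context)] if len(context) < len(target) else None

def draft_model(context, k):
    # A cheaper head guessing k tokens ahead; deliberately wrong on
    # one token ("the" instead of "from") to exercise verification.
    guesses, ctx = [], list(context)
    for _ in range(k):
        nxt = main_model(ctx)
        if nxt is None:
            break
        if nxt == "from":
            nxt = "the"
        guesses.append(nxt)
        ctx.append(nxt)
    return guesses

def generate(k=3):
    out = []
    while True:
        drafted = draft_model(out, k)
        if not drafted:
            break
        for tok in drafted:  # one verification pass accepts a prefix
            if main_model(out) == tok:
                out.append(tok)
            else:
                out.append(main_model(out))  # correct the first mismatch
                break
    return " ".join(out)

print(generate())  # -> hello world from nemotron
```

Each loop iteration can accept several tokens at once instead of exactly one, which is where the speedup comes from when the drafts are usually right.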
On NVIDIA Blackwell hardware the model runs in NVFP4 precision, which significantly cuts memory needs and can make inference up to four times faster than FP8 on previous-generation NVIDIA Hopper GPUs, without hurting accuracy. This combination of architecture and hardware support is designed for very high-throughput environments.
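A quick back-of-the-envelope calculation shows why 4-bit weights matter at this scale. This counts raw weight storage only; activations, KV cache, and the small per-block scaling factors that formats like NVFP4 carry are ignored for simplicity.

```python
# Approximate weight-memory footprint of a 120B-parameter model at
# different precisions (raw weights only; block-format scaling
# factors, activations, and KV cache are deliberately ignored).

PARAMS = 120e9  # total parameters

def weight_gb(bits_per_param: float) -> float:
    return PARAMS * bits_per_param / 8 / 1e9  # bytes -> gigabytes

fp16 = weight_gb(16)  # 240 GB
fp8  = weight_gb(8)   # 120 GB
fp4  = weight_gb(4)   #  60 GB

print(fp16, fp8, fp4)  # 240.0 120.0 60.0
```

Halving the bits per weight from FP8 to 4-bit roughly halves the memory the weights occupy, which is what lets a 120-billion-parameter model fit in far fewer GPUs.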
Open Weights, Training Data And Real World Uses
NVIDIA is releasing Nemotron 3 Super with open weights under a permissive license. Developers can deploy and customize it on workstations, in their own data centers, or in the cloud. This openness extends beyond the weights. The company is also sharing its full training methodology.
The model was trained on synthetic data generated using strong reasoning models. NVIDIA is publishing more than ten trillion tokens of pre-training and post-training data, along with fifteen reinforcement learning environments and evaluation recipes. Researchers who want to adapt the model can use the NVIDIA NeMo platform to fine-tune it or even build their own models with similar techniques.
Nemotron 3 Super is built specifically for use inside agentic systems where it can handle difficult subtasks:
In software development, an agent can load an entire codebase in one shot, enabling end-to-end code generation and debugging without slicing the project into many small chunks.
In finance, the model can keep thousands of pages of reports in context, avoiding repeated re-reasoning and speeding up long analytical workflows.
In cybersecurity, it offers high-accuracy tool calling so autonomous agents can safely navigate very large function libraries without causing dangerous execution errors.
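The cybersecurity point hinges on agents emitting tool calls that match a large function library exactly. A minimal validation layer, of the kind an agent framework might put in front of model output before anything executes, could look like this. The registry schema and call format here are hypothetical illustrations, not a Nemotron or NVIDIA API.

```python
# Minimal guardrail: validate a model-emitted tool call against a
# registry of allowed functions before anything is executed.
# Registry schema and call format are illustrative only.

REGISTRY = {
    "scan_host": {"required": {"ip"}, "optional": {"ports"}},
    "quarantine_file": {"required": {"path"}, "optional": set()},
}

def validate_call(call: dict) -> tuple[bool, str]:
    name, args = call.get("name"), set(call.get("args", {}))
    if name not in REGISTRY:
        return False, f"unknown tool: {name!r}"
    spec = REGISTRY[name]
    missing = spec["required"] - args
    if missing:
        return False, f"missing args: {sorted(missing)}"
    extra = args - spec["required"] - spec["optional"]
    if extra:
        return False, f"unexpected args: {sorted(extra)}"
    return True, "ok"

ok, msg = validate_call({"name": "scan_host", "args": {"ip": "10.0.0.5"}})
print(ok, msg)   # True ok
bad, why = validate_call({"name": "rm_rf", "args": {}})
print(bad, why)  # False unknown tool: 'rm_rf'
```

The more accurate the model's tool calling, the less often this guardrail has to reject a call and force a costly retry.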
How To Access Nemotron 3 Super
Nemotron 3 Super is part of the broader Nemotron 3 family and is already available on several platforms. You can find it on build.nvidia.com, on Perplexity, OpenRouter, and Hugging Face. Dell Technologies is bringing the model to the Dell Enterprise Hub on Hugging Face, tuned for on-premises deployment as part of the Dell AI Factory. HPE is integrating Nemotron into its agents hub for enterprise-level agentic AI.
Cloud access is broad. The model is available through Google Cloud Vertex AI and Oracle Cloud Infrastructure, with support coming to Amazon Web Services via Amazon Bedrock and to Microsoft Azure. NVIDIA cloud partners such as CoreWeave, Crusoe, Nebius, and Together AI offer hosted inference. Additional inference providers like Baseten, Cloudflare, DeepInfra, Fireworks AI, Inference.net, Lightning AI, Modal, and FriendliAI also support deployments.
Nemotron 3 Super is packaged as an NVIDIA NIM microservice. This means enterprises can deploy the model consistently from on-premises systems to cloud environments, making it easier to plug into existing infrastructure and multi-agent applications.
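NIM microservices expose an OpenAI-compatible HTTP API, so an agent can talk to a local deployment the same way it would talk to a hosted endpoint. The sketch below only builds the request body; the model identifier, port, and endpoint path are typical values to check against your own deployment, not confirmed ones.

```python
import json

# Build a chat-completions request for an OpenAI-compatible endpoint
# of the kind an NVIDIA NIM deployment exposes. The model name and
# URL are placeholders - query your deployment's model list first.

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed port

payload = {
    "model": "nvidia/nemotron-3-super",  # hypothetical identifier
    "messages": [
        {"role": "system", "content": "You are a code-review agent."},
        {"role": "user", "content": "Review this diff for bugs: ..."},
    ],
    "max_tokens": 1024,
    "temperature": 0.2,
}

body = json.dumps(payload)
# POST `body` to ENDPOINT with any HTTP client, e.g. via
# urllib.request.Request(ENDPOINT, data=body.encode(), ...).
```

Because the request shape matches the OpenAI chat-completions convention, existing agent frameworks can usually point at a NIM endpoint by changing only the base URL and model name.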
For developers and researchers who want to go deeper into agentic AI and Nemotron, NVIDIA provides documentation, news, community channels and a collection of self paced video tutorials and livestreams.
Original article and image: https://blogs.nvidia.com/blog/nemotron-3-super-agentic-ai/
