NVIDIA’s Blackwell Platform Sets New Benchmark for Agentic AI Performance
Artificial Analysis has unveiled AgentPerf, the first benchmark specifically designed for agentic AI systems, and its initial results showcase the NVIDIA Blackwell Ultra NVL72 platform. According to those results, the Blackwell-based system can handle up to 20 times more agents per megawatt than systems built on its predecessor architecture, NVIDIA Hopper, establishing a new standard in performance for agentic workloads.
Understanding Agentic AI
Agentic AI differs significantly from traditional conversational AI. While conversational AI typically involves a single large language model (LLM) call to generate a response, agentic AI operates more like a relay system. It breaks complex tasks into multiple steps and continues processing until the goal is achieved. This methodology requires chaining together numerous LLM calls and tool interactions, allowing agents to gather context, observe their environment, reason about their actions, and execute tasks.
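The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's agent framework: `fake_llm`, `TOOLS`, and `run_agent` are hypothetical names, and the "model" is a stub that simply asks for three file reads before finishing.

```python
# Hypothetical stand-ins: a real agent would call an LLM endpoint and real tools.
def fake_llm(context):
    """Return the next action; finish once the context holds three tool results."""
    observations = [c for c in context if c.startswith("observation:")]
    if len(observations) >= 3:
        return {"type": "finish", "answer": f"done after {len(observations)} tool calls"}
    return {"type": "tool", "name": "read_file", "arg": f"file_{len(observations)}.py"}


# Hypothetical tool registry; the tool only returns a placeholder string.
TOOLS = {
    "read_file": lambda path: f"contents of {path}",
}


def run_agent(task, max_steps=10):
    """Loop: reason (LLM call), act (tool call), observe, until the model finishes."""
    context = [f"task: {task}"]
    llm_calls = 0
    for _ in range(max_steps):
        action = fake_llm(context)
        llm_calls += 1
        if action["type"] == "finish":
            return action["answer"], llm_calls
        result = TOOLS[action["name"]](action["arg"])
        context.append(f"observation: {result}")  # context grows with every step
    return "step limit reached", llm_calls


answer, calls = run_agent("fix the failing test")
print(answer, calls)
```

Even in this toy version, one task turns into several LLM calls, and each call sees a longer context than the one before it.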
The nature of these tasks makes their cost multiplicative rather than additive: every extra step adds another LLM call, and each call typically has to work with the context accumulated so far. Existing benchmarks primarily measure the speed of individual LLM calls and a system’s capacity for simultaneous requests. They fall short for agentic workloads, where chained LLM calls and tool delays together determine how long a task takes and how much capacity it consumes.
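One way to read the multiplicative claim is with a rough back-of-the-envelope model. The sketch below assumes that every step re-reads the whole context and that the context grows by a fixed amount per step; real serving stacks can reuse cached prefixes, so treat the numbers as illustrative only. The function names and all token and latency values are assumptions, not AgentPerf figures.

```python
def total_prompt_tokens(steps, base_tokens, tokens_added_per_step):
    """Prompt tokens summed over all LLM calls when the context grows each step."""
    return sum(base_tokens + i * tokens_added_per_step for i in range(steps))


def task_latency_s(steps, llm_latency_s, tool_delay_s):
    """End-to-end latency when steps run one after another."""
    return steps * (llm_latency_s + tool_delay_s)


# Placeholder values chosen only to show the shape of the growth.
single_call = total_prompt_tokens(steps=1, base_tokens=2000, tokens_added_per_step=1500)
agent_task = total_prompt_tokens(steps=20, base_tokens=2000, tokens_added_per_step=1500)

print(single_call)            # 2000
print(agent_task)             # 325000
print(agent_task / single_call)  # 162.5
print(task_latency_s(steps=20, llm_latency_s=2.0, tool_delay_s=0.5))  # 50.0
```

Under these assumptions, a task with 20 times as many steps processes about 160 times as many prompt tokens, which is why per-call speed alone says little about agentic capacity.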
NVIDIA GB300 NVL72: A Performance Leader
In its inaugural results, AgentPerf assessed agentic performance using DeepSeek V4 Pro, a large mixture-of-experts (MoE) model known for powering advanced agents. The findings reveal that the NVIDIA GB300 NVL72 excels in this benchmark by running up to 20 times more agents per megawatt than the NVIDIA HGX H200 system.
This performance leap is attributed to an integrated design that connects 72 GPUs within a single rack-scale system, allowing large MoE models to execute efficiently at scale. CUDA kernels add to this efficiency by overlapping communication with computation, which reduces the latency of coordinating work across multiple experts.
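The benefit of overlapping communication with computation can be illustrated with a toy timing model. This is not CUDA code and does not describe NVIDIA's kernels; it assumes an idealized case in which each layer's communication is fully hidden behind its computation (or the reverse), and the layer count and timings are placeholder values.

```python
def serial_time_us(layers, compute_us, comm_us):
    """Communication waits for computation to finish in every layer."""
    return layers * (compute_us + comm_us)


def overlapped_time_us(layers, compute_us, comm_us):
    """Idealized overlap: each layer costs whichever of the two takes longer."""
    return layers * max(compute_us, comm_us)


# Placeholder values: 60 layers, 300 us of compute and 200 us of
# expert-to-expert communication per layer.
serial = serial_time_us(60, 300, 200)
overlapped = overlapped_time_us(60, 300, 200)

print(serial)      # 30000
print(overlapped)  # 18000
```

In this model the communication cost disappears entirely whenever it is shorter than the compute it runs alongside; real systems only approach that bound.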
Furthermore, NVIDIA’s TensorRT LLM inference library maintains efficiency as the number of concurrent agent sessions increases. It separates input processing (prefill) from output generation (decode), so each phase can be optimized independently without compromising overall system performance.
AgentPerf: Real-World Applications
The AgentPerf benchmark is built on actual coding workflows in which an agent carries out tasks such as reading files, writing code, executing commands, and iterating based on outcomes. The tasks are derived from real public code repositories spanning more than 12 programming languages. This design ensures that long sequence lengths and tool call patterns reflect typical coding environments.
AgentPerf measures how many agentic tasks a platform can handle simultaneously while meeting specific performance criteria for responsiveness and output token rates. Notably, tool calls are simulated rather than executed, so that differences in results can be attributed solely to accelerated computing performance.
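The general idea of "maximum concurrency under performance criteria" can be sketched as a simple sweep. This is not AgentPerf's actual methodology: `Slo`, `simulated_platform`, and `max_concurrent_agents` are hypothetical names, the thresholds are placeholders, and the platform is a made-up model in which responsiveness and per-agent token rate degrade steadily as more agents are added.

```python
from dataclasses import dataclass


@dataclass
class Slo:
    """Hypothetical service-level targets for each agent session."""
    max_ttft_ms: int          # responsiveness: time to first token
    min_tokens_per_s: float   # per-agent output token rate


def simulated_platform(concurrency):
    """Made-up performance model, returns (ttft_ms, tokens_per_s per agent)."""
    ttft_ms = 500 + 10 * concurrency
    tokens_per_s = min(100.0, 6000 / concurrency)
    return ttft_ms, tokens_per_s


def max_concurrent_agents(measure, slo, limit=1024):
    """Largest concurrency that still meets both targets.

    Assumes performance only gets worse as concurrency rises, so the
    sweep can stop at the first failure.
    """
    best = 0
    for n in range(1, limit + 1):
        ttft_ms, tokens_per_s = measure(n)
        if ttft_ms <= slo.max_ttft_ms and tokens_per_s >= slo.min_tokens_per_s:
            best = n
        else:
            break
    return best


print(max_concurrent_agents(simulated_platform, Slo(2000, 50.0)))  # 120
```

Here the token-rate target is the binding constraint; tightening the responsiveness target instead (for example `Slo(1000, 50.0)`) lowers the result to 50.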
The implications of these metrics are significant for enterprises aiming to deploy AI agents at scale. Understanding how many concurrent tasks can be executed per accelerator and per megawatt directly influences infrastructure investment decisions and productivity outcomes.
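As a rough illustration of how such metrics are normalized, the sketch below converts a concurrency figure into agents per accelerator and agents per megawatt. The function names are hypothetical and every number is a placeholder chosen for easy arithmetic; none of them is a measured result for any NVIDIA system.

```python
def agents_per_accelerator(concurrent_agents, accelerators):
    """Concurrent agent sessions sustained per accelerator."""
    return concurrent_agents / accelerators


def agents_per_megawatt(concurrent_agents, power_kw):
    """Concurrent agent sessions sustained per megawatt of system power."""
    return concurrent_agents * 1000 / power_kw


# Placeholder systems, not real hardware figures.
system_a = {"agents": 3000, "accelerators": 72, "power_kw": 150}
system_b = {"agents": 40, "accelerators": 8, "power_kw": 10}

a_per_mw = agents_per_megawatt(system_a["agents"], system_a["power_kw"])
b_per_mw = agents_per_megawatt(system_b["agents"], system_b["power_kw"])

print(a_per_mw)             # 20000.0
print(b_per_mw)             # 4000.0
print(a_per_mw / b_per_mw)  # 5.0
print(agents_per_accelerator(system_b["agents"], system_b["accelerators"]))  # 5.0
```

Normalizing by power rather than by accelerator count matters for planning because data center capacity is often limited by available megawatts rather than by rack space.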
NVIDIA Partners Leverage Blackwell’s Capabilities
Several leading inference providers are already utilizing the NVIDIA Blackwell platform to support agentic workloads on frontier models like DeepSeek V4 Pro. Companies such as Baseten, DeepInfra, and Together AI are actively deploying production applications powered by this architecture.
Together AI has implemented real-time inference capabilities for Cursor, an AI-driven coding platform that assists developers by debugging issues and generating features while they continue their work. Similarly, DeepInfra powers Pam.ai—a workforce management solution for car dealerships—using NVIDIA Blackwell to deploy agents capable of handling service bookings and outbound sales campaigns autonomously.
As NVIDIA collaborates with the open-source ecosystem to enhance inference software further, improvements in performance and efficiency for agentic workloads are expected to accelerate. The introduction of the Vera Rubin architecture marks another step forward in meeting the growing demands of scalable agentic AI solutions.
What This Means
The launch of AgentPerf, together with NVIDIA’s results on it, gives enterprises planning to deploy agentic AI at scale a more relevant yardstick than single-call benchmarks. With metrics that reflect real-world workflows, such as concurrent agents per accelerator and per megawatt, organizations can make better-informed decisions about infrastructure investments. As the technology continues to evolve rapidly, tracking these results will matter for businesses aiming to use AI agents effectively in their operations.