NVIDIA Blackwell Tops First Benchmark for Agentic AI Infrastructure

NVIDIA Blackwell Ultra NVL72 Sets New Benchmark for Agentic AI Performance

Artificial Analysis has launched AgentPerf, the first benchmark designed specifically for agentic AI systems. Its initial results show NVIDIA’s Blackwell Ultra NVL72 platform outperforming systems built on the previous-generation Hopper architecture, running up to 20 times more agents per megawatt than the NVIDIA HGX H200.

Understanding Agentic AI and Its Workloads

Agentic AI differs fundamentally from traditional conversational AI. A conversational exchange is typically a single interaction: one large language model (LLM) call followed by one response. An agent, by contrast, breaks a goal down into multiple steps and keeps executing until the task is complete, chaining together numerous LLM calls and tool interactions to gather context, reason, and act.
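The loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not how any particular agent framework works: `fake_llm`, `TOOLS`, and `run_agent` are hypothetical names, and the "model" here is a stub that asks for two tool calls before finishing.

```python
from typing import Callable

# Hypothetical stand-in for a model call. Given the transcript so far, it
# returns either a tool request or a final answer. A real agent would call
# an LLM here instead of using this fixed rule.
def fake_llm(transcript: list[str]) -> dict:
    tool_results = [t for t in transcript if t.startswith("tool:")]
    if len(tool_results) < 2:
        return {"action": "tool", "name": "search",
                "arg": f"query {len(tool_results) + 1}"}
    return {"action": "finish",
            "answer": f"done after {len(tool_results)} tool calls"}

# Hypothetical tool registry: tool name -> function taking one string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda arg: f"results for {arg}",
}

def run_agent(goal: str, max_steps: int = 10) -> tuple[str, int]:
    """Run the plan/act loop; return the final answer and the LLM call count."""
    transcript = [f"goal: {goal}"]
    llm_calls = 0
    for _ in range(max_steps):
        decision = fake_llm(transcript)   # one LLM call per step
        llm_calls += 1
        if decision["action"] == "finish":
            return decision["answer"], llm_calls
        # Otherwise run the requested tool and feed its result back as context.
        result = TOOLS[decision["name"]](decision["arg"])
        transcript.append(f"tool: {result}")
    return "step limit reached", llm_calls

print(run_agent("fix the failing test"))
```

Even this toy task takes three model calls and two tool calls to produce one answer, which is the pattern that separates an agentic workload from a single request and response.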

The complexity of agentic tasks is multiplicative rather than additive, resulting in dozens or even hundreds of LLM calls in a single operation. Each call builds upon the context provided by previous interactions and may involve various tool calls such as code execution or database searches. This intricate web of tasks necessitates a different approach to performance measurement compared to existing benchmarks that focus solely on single LLM call efficiency.
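One way to read "multiplicative rather than additive" is to count the input tokens a chain of calls has to process when each call re-reads everything that came before it. The sketch below does that arithmetic under stated assumptions: the token counts are made-up illustrative values, not AgentPerf data, and the model ignores context caching, which real serving stacks often use to avoid reprocessing earlier tokens.

```python
def tokens_processed(num_calls: int, base_context: int, added_per_call: int) -> int:
    """Total input tokens across a chain of LLM calls.

    Assumption: call i (starting at 0) re-reads the base context plus
    everything the i earlier steps added, with no caching of prior context.
    """
    return sum(base_context + i * added_per_call for i in range(num_calls))

# Illustrative numbers only: a 1,000-token starting context that grows
# by 500 tokens per step.
single = tokens_processed(1, base_context=1_000, added_per_call=500)
chain = tokens_processed(100, base_context=1_000, added_per_call=500)

print(single)          # 1000
print(chain)           # 2575000
print(chain / single)  # 2575.0
```

Under these assumptions, 100 times as many calls means 2,575 times as many input tokens, because the work per call grows along with the number of calls.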

NVIDIA GB300 NVL72: A Performance Powerhouse

In its first round of testing, AgentPerf evaluated agentic performance using DeepSeek V4 Pro, a large mixture-of-experts (MoE) model that represents cutting-edge capabilities in agentic AI. The results showed NVIDIA’s GB300 NVL72, the Blackwell Ultra rack-scale system, running up to 20 times more agents per megawatt than the NVIDIA HGX H200 system.

The GB300 NVL72’s performance stems from its design, which integrates 72 GPUs into a single rack-scale system. This architecture allows the execution of large MoE models like DeepSeek V4 Pro to be distributed efficiently across the rack. In addition, CUDA kernels overlap communication with computation, reducing the latency of coordinating across experts.

NVIDIA’s TensorRT LLM further optimizes performance as the number of concurrent agent sessions scales up. It separates input processing (often called prefill) from output generation (decode), allowing each stage to be tuned independently. The benchmark’s methodology, described in the next section, is intended to reflect real-world operating conditions for agentic AI systems.

AgentPerf: Tailored for Real-World Applications

AgentPerf is built on authentic coding agent trajectories in which an agent receives a task, reads files, writes and edits code, executes commands, and iterates on the outcomes. All trajectories are derived from actual public code repositories spanning more than 12 programming languages. The benchmark simulates tool calls using representative CPU processing times rather than executing them directly, so that measured performance differences reflect the accelerated computing platform rather than variation in tool execution.
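The idea of simulating tool calls rather than executing them can be sketched as a small trajectory replayer. This is a minimal illustration of the technique, not AgentPerf's actual harness: `Step`, `replay`, and the durations are hypothetical, and the LLM call is passed in as an arbitrary function so the example stays self-contained.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Step:
    kind: str                 # "llm" or "tool"
    duration_s: float = 0.0   # for "tool": representative CPU time to simulate

def replay(trajectory: Iterable[Step],
           llm_call: Callable[[], None],
           sleep: Callable[[float], None] = time.sleep) -> tuple[float, float]:
    """Replay a recorded trajectory; return (measured LLM time, simulated tool time)."""
    llm_time = 0.0
    tool_time = 0.0
    for step in trajectory:
        if step.kind == "llm":
            # LLM steps run for real and are timed: this is the part under test.
            start = time.perf_counter()
            llm_call()
            llm_time += time.perf_counter() - start
        elif step.kind == "tool":
            # Tool steps are not executed; we only wait for a fixed duration,
            # so tool cost is identical on every system being compared.
            sleep(step.duration_s)
            tool_time += step.duration_s
        else:
            raise ValueError(f"unknown step kind: {step.kind!r}")
    return llm_time, tool_time

# Hypothetical trajectory: model call, file read, model call, test run.
trajectory = [Step("llm"), Step("tool", 0.05), Step("llm"), Step("tool", 0.10)]
llm_s, tool_s = replay(trajectory, llm_call=lambda: None)
print(f"LLM time: {llm_s:.4f}s, simulated tool time: {tool_s:.2f}s")
```

Because the tool durations are fixed inputs, any difference in end-to-end time between two systems replaying the same trajectory comes from the LLM steps alone.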

This approach provides valuable insights for enterprises looking to deploy AI agents at scale. By measuring how many concurrent agentic tasks can be executed per accelerator and per megawatt of power consumed, businesses can make informed decisions about their infrastructure investments and maximize productivity.
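The two metrics named above are simple ratios, and a short sketch makes the units explicit. The function names and all input figures below are placeholders chosen for easy arithmetic; they are not measurements from AgentPerf or specifications of any NVIDIA system.

```python
def agents_per_accelerator(concurrent_agents: int, num_accelerators: int) -> float:
    """Concurrent agentic tasks sustained per accelerator."""
    if num_accelerators <= 0:
        raise ValueError("num_accelerators must be positive")
    return concurrent_agents / num_accelerators

def agents_per_megawatt(concurrent_agents: int, power_kw: float) -> float:
    """Concurrent agentic tasks sustained per megawatt of power drawn."""
    if power_kw <= 0:
        raise ValueError("power_kw must be positive")
    # 1 MW = 1,000 kW, so scale the agent count before dividing.
    return concurrent_agents * 1000.0 / power_kw

# Placeholder inputs: a hypothetical system sustaining 400 agents
# on 8 accelerators while drawing 10 kW.
print(agents_per_accelerator(400, 8))   # 50.0
print(agents_per_megawatt(400, 10.0))   # 40000.0
```

Comparing two systems on agents per megawatt then comes down to dividing one result by the other, which is the form in which the "up to 20 times" figure is reported.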

NVIDIA Partners Leverage Blackwell’s Superior Performance

Several leading inference providers are already utilizing NVIDIA’s Blackwell architecture to support agentic workloads on advanced models like DeepSeek V4 Pro. Companies such as Baseten, DeepInfra, and Together AI are actively deploying production applications powered by this new technology.

Together AI is enhancing real-time inference for Cursor, an AI-driven coding platform that employs agents to debug issues and generate features while developers maintain their workflow. Similarly, DeepInfra supports Pam.ai, an AI workforce platform designed for car dealerships that deploys agents to manage service appointments and sales campaigns entirely on NVIDIA Blackwell infrastructure.

As NVIDIA continues to refine its inference software alongside the open-source community, improvements in both performance and efficiency for agentic workloads are anticipated. The introduction of the Vera Rubin architecture is expected to further elevate infrastructure capacity to meet the increasing demands of scalable agentic AI applications.

What This Means

The launch of AgentPerf marks a pivotal moment in evaluating the performance of agentic AI systems. With NVIDIA’s Blackwell Ultra NVL72 leading in efficiency and scalability, enterprises can better assess how their infrastructure investments translate into productive outcomes when deploying complex AI agents at scale. As the technology matures and is more widely adopted, it could reshape how organizations use artificial intelligence across sectors.

For more information, read the original report here.

Neil S