NVIDIA Dynamo 1.0, unveiled at NVIDIA GTC, is now available to DigitalOcean customers. The release delivers up to a 7x improvement in inference performance on NVIDIA GB200 NVL systems, and pairing Dynamo 1.0 with DigitalOcean’s Agentic Inference Cloud gives customers higher performance at lower cost with a straightforward deployment path.
DigitalOcean, in collaboration with NVIDIA, has already achieved a 67% cost reduction for customers like Workato. With the introduction of Dynamo 1.0, businesses running production-grade agentic workflows can unlock even greater benefits. NVIDIA Dynamo 1.0 is available as a container image that can run on a Droplet or be deployed directly on DigitalOcean Kubernetes with an inference runtime (vLLM, SGLang, or NVIDIA TensorRT-LLM).
NVIDIA Dynamo is an inference serving framework designed to accelerate and scale large generative AI and inference models. Acting as an orchestration layer above engines like vLLM, SGLang, and NVIDIA TensorRT-LLM, Dynamo manages GPU and memory resources across a cluster, reducing bottlenecks by intelligently routing requests.
Key technical advancements offered by Dynamo 1.0 include:
– 7x Performance Boost: When paired with NVIDIA Blackwell Ultra GPUs, Dynamo can significantly increase inference performance, resulting in lower costs per token.
– KV-Aware Routing: Dynamo intelligently routes requests to specific GPUs that already possess relevant memory from previous interactions, enhancing efficiency.
– Disaggregated Serving: Dynamo divides the prefill and decode phases across different GPUs to maximize utilization and reduce latency.
– Memory Offloading: The KV Block Manager (KVBM) facilitates data movement between high-speed GPU memory and lower-cost storage tiers, enabling the handling of massive context windows without memory limitations.
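To make the KV-aware routing idea above concrete, here is a minimal, purely illustrative sketch (this is not Dynamo’s actual API; the class and method names are hypothetical): each worker remembers which prompt prefixes it has already served, and a new request is sent to the worker with the longest cached-prefix overlap, so that GPU’s existing KV-cache blocks can be reused.

```python
# Illustrative sketch of KV-aware routing (hypothetical names, not the
# real Dynamo API): route each request to the worker whose cached
# prefixes overlap it most, so existing KV-cache blocks are reused.

def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common character prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

class KVAwareRouter:
    def __init__(self, workers):
        # worker name -> prompts assumed cached on that worker's GPU
        self.cache = {w: [] for w in workers}

    def route(self, prompt: str) -> str:
        # Pick the worker with the longest cached-prefix overlap.
        best, best_overlap = None, -1
        for worker, prompts in self.cache.items():
            overlap = max(
                (shared_prefix_len(prompt, p) for p in prompts), default=0
            )
            if overlap > best_overlap:
                best, best_overlap = worker, overlap
        self.cache[best].append(prompt)  # this prefix is now cached there
        return best
```

In this toy model, two requests that share a long system prompt land on the same GPU, which is the effect the routing policy is after; a production router would also weigh load and cache capacity.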
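The memory-offloading bullet can be sketched the same way. The toy class below is inspired by, but much simpler than, Dynamo’s KV Block Manager (all names here are hypothetical): hot KV blocks live in a small "GPU" tier, and when capacity is exceeded the least-recently-used block is moved to a larger "host" tier rather than discarded, so long contexts survive beyond GPU memory.

```python
from collections import OrderedDict

# Toy sketch of tiered KV-cache offloading (hypothetical, not the real
# KVBM): evict least-recently-used blocks from a bounded "GPU" tier into
# a cheaper "host" tier instead of dropping them.

class TieredKVCache:
    def __init__(self, gpu_capacity: int):
        self.gpu_capacity = gpu_capacity
        self.gpu = OrderedDict()  # block_id -> data, kept in LRU order
        self.host = {}            # overflow tier (cheaper, slower)

    def put(self, block_id, data):
        self.gpu[block_id] = data
        self.gpu.move_to_end(block_id)  # mark as most recently used
        while len(self.gpu) > self.gpu_capacity:
            evicted_id, evicted = self.gpu.popitem(last=False)  # LRU block
            self.host[evicted_id] = evicted  # offload, don't discard

    def get(self, block_id):
        if block_id in self.gpu:
            self.gpu.move_to_end(block_id)
            return self.gpu[block_id]
        # GPU-tier miss: promote the block back from host memory.
        return self.put(block_id, self.host.pop(block_id)) or self.gpu[block_id]
```

A real block manager moves fixed-size tensor blocks over NVLink/PCIe and tracks them per sequence, but the tiering policy follows the same shape: offload on pressure, promote on reuse.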
Customers running NVIDIA Dynamo on DigitalOcean benefit from strong price-to-performance ratios, easy setup, and an environment that aligns well with Dynamo’s architecture. DigitalOcean has previously collaborated with Workato’s AI Research Lab to scale agentic AI capabilities across its platform, processing over 1 trillion automated workloads. By deploying NVIDIA Dynamo with vLLM on DigitalOcean Managed Kubernetes (DOKS), Workato achieved significant improvements:
– 67% higher throughput per GPU, 79% lower end-to-end latency, and 77% lower time-to-first-token compared to other configurations on identical hardware.
– 33% lower hardware cost using an NVIDIA H200 GPU vs. an NVIDIA A100 GPU for equivalent performance.
– 67% lower model cost while utilizing half the GPUs.
The introduction of Dynamo 1.0, alongside the availability of NVIDIA HGX B300s, is expected to drive even greater performance and cost efficiencies for customers like Workato.
As part of this year’s NVIDIA GTC, DigitalOcean has unveiled various product releases and updates aimed at enhancing the capabilities of the Agentic Inference Cloud. These include the new AI-first Richmond Data Center, support for NVIDIA Agent Toolkit and NemoClaw, compatibility with NVIDIA Nemotron 3 Super and other high-performance models, and more. Users can learn more about these developments directly from DigitalOcean’s CTO by visiting the latest DigitalOcean and NVIDIA GTC announcements.
Overall, the integration of NVIDIA Dynamo 1.0 with DigitalOcean’s platform showcases the commitment to advancing AI capabilities, driving performance enhancements, and optimizing cost efficiency for businesses running production-grade agentic workflows.
For more information, refer to this article.