In a significant advancement for cloud computing and artificial intelligence (AI), Amazon Web Services (AWS) has announced the general availability of its Amazon Elastic Compute Cloud (EC2) P6e-GB200 UltraServers. This latest offering is powered by NVIDIA GB200 NVL72 technology and is designed to deliver the highest graphics processing unit (GPU) performance for AI training and inference tasks. The introduction of these UltraServers is a game-changer for businesses and developers who rely on high-performance computing for AI applications.
The Amazon EC2 UltraServers are engineered to connect multiple EC2 instances using a dedicated accelerator interconnect. This setup offers high bandwidth and low latency, ensuring efficient processing across the interconnected instances. At the heart of these UltraServers is the NVIDIA Grace Blackwell Superchip, which combines two high-performance NVIDIA Blackwell Tensor Core GPUs with an Arm-based NVIDIA Grace CPU. The interconnection is handled by NVIDIA's NVLink-C2C, a technology that ensures seamless communication between the GPU and CPU.
Each Grace Blackwell Superchip is capable of delivering an impressive 10 petaflops of FP8 compute power. For those unfamiliar with the term, "petaflop" is a measure of a computer’s processing speed, with one petaflop equating to a thousand trillion (10^15) floating-point operations per second. This immense computing power is complemented by up to 372 GB of HBM3e memory, a high-bandwidth memory type crucial for handling large datasets and complex computations.
The architecture of the superchip colocates the GPU and CPU within a single compute module. This arrangement significantly boosts the bandwidth between the GPU and CPU, surpassing what is currently available in the EC2 P5en instances. This improvement is particularly beneficial for AI workloads, where the speed of data transfer between processing units is often a bottleneck.
With the EC2 P6e-GB200 UltraServers, users can access up to 72 NVIDIA Blackwell GPUs within a single NVLink domain. This configuration provides a staggering 360 petaflops of FP8 compute power and 13.4 terabytes of high-bandwidth memory. Such specifications make these UltraServers ideal for the most compute-intensive AI tasks, including training models with trillions of parameters and performing complex inference tasks.
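As a quick sanity check, the aggregate figures follow directly from the per-superchip specifications quoted above, since 72 Blackwell GPUs correspond to 36 Grace Blackwell Superchips (the arithmetic below uses only numbers stated in this article):

```python
# Aggregate compute and memory for a full 72-GPU NVLink domain,
# derived from the per-superchip figures quoted above.

GPUS_PER_SUPERCHIP = 2          # two Blackwell GPUs per Grace Blackwell Superchip
FP8_PFLOPS_PER_SUPERCHIP = 10   # petaflops of FP8 compute per superchip
HBM3E_GB_PER_SUPERCHIP = 372    # GB of HBM3e memory per superchip

total_gpus = 72
superchips = total_gpus // GPUS_PER_SUPERCHIP              # 36 superchips
total_pflops = superchips * FP8_PFLOPS_PER_SUPERCHIP       # aggregate FP8 compute
total_hbm_tb = superchips * HBM3E_GB_PER_SUPERCHIP / 1000  # aggregate HBM3e in TB

print(total_pflops, round(total_hbm_tb, 1))  # 360 13.4
```

This confirms the headline figures: 360 petaflops of FP8 compute and roughly 13.4 TB of high-bandwidth memory per NVLink domain.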
Furthermore, the UltraServers are integrated with the AWS Nitro System, a collection of building blocks that enable high performance, high availability, and high security of the AWS platform. This integration allows the UltraServers to scale securely and reliably to tens of thousands of GPUs, accommodating the needs of large-scale AI projects.
In terms of networking, the EC2 P6e-GB200 UltraServers deliver up to 28.8 terabits per second (Tbps) of total Elastic Fabric Adapter (EFAv4) networking bandwidth. The Elastic Fabric Adapter is a network interface designed for high-performance computing applications, enabling low-latency communication between GPUs across different servers without operating-system involvement.
Here are the specifications for the EC2 P6e-GB200 UltraServers:

| Specification | u-p6e-gb200x36 | u-p6e-gb200x72 |
|---|---|---|
| GPUs | 36 | 72 |
| GPU memory | 6660 GB | 13320 GB |
| vCPUs | 1296 | 2592 |
| Instance memory | 8640 GiB | 17280 GiB |
| Instance storage | 202.5 TB | 405 TB |
| Aggregate EFA network bandwidth | 14400 Gbps | 28800 Gbps |
| EBS bandwidth | 540 Gbps | 1080 Gbps |
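The two sizes scale linearly: every resource on the u-p6e-gb200x72 is exactly double its u-p6e-gb200x36 counterpart, and aggregate EFA bandwidth works out to 400 Gbps per GPU on either size. A small sketch makes the relationship explicit (values copied from the specifications above; Python is used only for the arithmetic):

```python
# Specifications quoted above, keyed by UltraServer type.
specs = {
    "u-p6e-gb200x36": {"gpus": 36, "gpu_mem_gb": 6660, "vcpus": 1296,
                       "mem_gib": 8640, "storage_tb": 202.5,
                       "efa_gbps": 14400, "ebs_gbps": 540},
    "u-p6e-gb200x72": {"gpus": 72, "gpu_mem_gb": 13320, "vcpus": 2592,
                       "mem_gib": 17280, "storage_tb": 405,
                       "efa_gbps": 28800, "ebs_gbps": 1080},
}

# The x72 doubles the x36 across every listed resource.
small, large = specs["u-p6e-gb200x36"], specs["u-p6e-gb200x72"]
assert all(large[k] == 2 * small[k] for k in small)

# Aggregate EFA bandwidth is 400 Gbps per GPU on either size.
for s in specs.values():
    assert s["efa_gbps"] / s["gpus"] == 400
```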
These UltraServers are particularly suited for AI workloads that demand significant compute and memory resources, such as training and inference of advanced AI models like mixture of experts models and reasoning models at a trillion-parameter scale.
The EC2 P6e-GB200 UltraServers can be used to build a wide range of generative AI applications. These applications include question answering systems, code generation tools, video and image generation software, and speech recognition systems. The potential applications are vast, offering opportunities for innovation across various industries.
For those looking to deploy these UltraServers, AWS offers them in the Dallas Local Zone through EC2 Capacity Blocks for machine learning (ML). The Dallas Local Zone (us-east-1-dfw-2a) is an extension of the US East (N. Virginia) Region, enabling users to take advantage of local resources for their AI projects.
To reserve EC2 Capacity Blocks, users can navigate to the "Capacity Reservations" section in the Amazon EC2 console. Here, they can purchase Capacity Blocks for ML, selecting their desired capacity and specifying the duration for which they need the EC2 Capacity Block, whether for the u-p6e-gb200x36 or the u-p6e-gb200x72 UltraServers.
Once a Capacity Block is successfully scheduled, it is charged upfront, and its price remains fixed after purchase. Billing occurs within 12 hours of purchasing the EC2 Capacity Blocks. For more information, users can refer to the "Capacity Blocks for ML" section in the Amazon EC2 User Guide.
To run instances within a purchased Capacity Block, users can utilize the AWS Management Console, AWS Command Line Interface (CLI), or AWS Software Development Kits (SDKs). On the software side, AWS provides Deep Learning Amazon Machine Images (AMIs) preconfigured with popular frameworks and tools like PyTorch and JAX, making it easier for developers to get started with their AI projects.
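Launching into a purchased Capacity Block works like launching into any other targeted Capacity Reservation: the request names the reservation explicitly. As an illustrative sketch only — the reservation ID, AMI ID, and instance-type string below are placeholders, not real values — the boto3 `run_instances` parameters might be assembled like this:

```python
# Sketch: assemble an EC2 RunInstances request that targets a purchased
# Capacity Block. All IDs below are placeholders for illustration only.

def build_run_instances_params(capacity_block_id, ami_id, instance_type, count):
    """Return keyword arguments for boto3's EC2 run_instances call,
    pinned to a specific Capacity Block reservation."""
    return {
        "ImageId": ami_id,                      # e.g. a Deep Learning AMI
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
        # Target the Capacity Block so the instances land in the reserved,
        # NVLink-connected UltraServer capacity.
        "CapacityReservationSpecification": {
            "CapacityReservationTarget": {
                "CapacityReservationId": capacity_block_id,
            },
        },
    }

params = build_run_instances_params(
    capacity_block_id="cr-0123456789abcdef0",   # placeholder reservation ID
    ami_id="ami-0123456789abcdef0",             # placeholder Deep Learning AMI
    instance_type="<p6e-gb200 instance type>",  # placeholder instance type
    count=1,
)
# An actual launch would then be:
#   boto3.client("ec2").run_instances(**params)
```

The `CapacityReservationSpecification` block is the standard EC2 mechanism for targeting a specific reservation; the same structure applies whether the request is made through boto3, the AWS CLI, or another SDK.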
The EC2 P6e-GB200 UltraServers can be seamlessly integrated with various AWS managed services. For instance, Amazon SageMaker HyperPod offers managed, resilient infrastructure that automatically handles the provisioning and management of P6e-GB200 UltraServers. This service replaces faulty instances with preconfigured spare capacity within the same NVLink domain, ensuring consistent performance.
Amazon Elastic Kubernetes Service (Amazon EKS) allows for automated provisioning and lifecycle management within Kubernetes clusters, with one managed node group spanning multiple P6e-GB200 UltraServers. EKS topology-aware routing for P6e-GB200 UltraServers enables optimal placement of tightly coupled components of distributed workloads within a single UltraServer's NVLink-connected instances.
For data access, Amazon FSx for Lustre file systems provide the high throughput and input/output operations per second (IOPS) required for large-scale high-performance computing (HPC) and AI workloads. Users can also utilize up to 405 terabytes of local NVMe SSD storage or virtually unlimited cost-effective storage with Amazon Simple Storage Service (Amazon S3).
Developers and businesses are encouraged to try out the Amazon EC2 P6e-GB200 UltraServers via the Amazon EC2 console. For more detailed information, they can visit the Amazon EC2 P6e instances page and provide feedback through AWS re:Post for EC2 or their usual AWS Support channels.
For those interested in learning more, the official announcement and further details are available on the Amazon EC2 P6e instances page on the AWS website.
This latest development from AWS underscores the company’s commitment to providing cutting-edge technology solutions that empower businesses and developers to tackle the most demanding computational challenges efficiently.