Oracle and AMD have announced a major partnership aimed at delivering large-scale, high-performance AI infrastructure on Oracle Cloud Infrastructure (OCI). Central to this collaboration is the availability of AMD’s latest Instinct MI355X GPUs, designed to double the price-performance ratio for AI training and inference workloads compared to the previous generation.
OCI will host zettascale AI clusters powered by as many as 131,072 MI355X GPUs, making it one of the most ambitious AI cloud offerings to date. The move targets a growing demand from enterprises developing complex AI models, including large language models and emerging agentic AI applications.
Mahesh Thiagarajan, Executive Vice President of Oracle Cloud Infrastructure, emphasized Oracle’s focus on expanding its AI infrastructure to meet the needs of customers operating high-intensity AI workloads. “AMD Instinct GPUs, paired with OCI’s performance, advanced networking, flexibility, security, and scale, will help our customers meet their inference and training needs,” Thiagarajan said.
The new offering is backed by AMD’s latest architectural advancements. The Instinct MI355X GPUs are built to deliver nearly three times the computing performance of their predecessors, with a 50% improvement in high-bandwidth memory capacity. This enables organizations to train and deploy larger models more efficiently and with reduced latency.
Forrest Norrod, EVP and GM of AMD’s Data Center Solutions Business Group, noted the long-standing alignment between Oracle and AMD in supporting open, high-efficiency systems. “The MI355X, combined with OCI’s infrastructure and AMD’s Pollara NICs, provides the scale and flexibility needed to power the next wave of AI innovation,” Norrod said.
The collaboration introduces several technical innovations aimed at high-volume AI deployment. AMD’s MI355X shapes on OCI will support up to 288GB of HBM3E memory and 8TB/s of memory bandwidth, allowing large models to be loaded fully into memory and dramatically improving performance for training and inference tasks. Support for the FP4 floating-point format further enables cost-efficient execution of generative AI workloads.
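To illustrate what FP4 means in practice, the sketch below rounds weights to the nearest value representable in a 4-bit E2M1 floating-point format (the magnitude set used by the OCP MX FP4 specification). This is a simplified illustration only; real hardware pipelines also apply per-block scaling factors, which are omitted here.

```python
# Illustrative FP4 (E2M1) quantization: each value is stored in 4 bits,
# so only a small grid of magnitudes is representable.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_GRID = sorted({v for m in FP4_MAGNITUDES for v in (m, -m)})

def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable FP4 (E2M1) value."""
    return min(FP4_GRID, key=lambda v: abs(v - x))

weights = [0.1, -2.7, 5.9, 0.74]
quantized = [quantize_fp4(w) for w in weights]
print(quantized)  # [0.0, -3.0, 6.0, 0.5]
```

Halving the bits per weight roughly halves memory footprint and bandwidth needs relative to FP8, which is where the cost-efficiency claim for generative inference comes from.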
To address performance density, the new infrastructure employs a dense, liquid-cooled design supporting 64 GPUs per rack at 1,400 watts each, with total rack power of up to 125 kW. This setup is engineered for both high throughput and lower time-to-first-token (TTFT), essential for real-time AI applications.
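The quoted figures can be sanity-checked with simple arithmetic: 64 GPUs at 1,400 W account for 89.6 kW, and the remainder of the 125 kW rack budget presumably covers host CPUs, NICs, and other rack components (an inference from the stated numbers, not a breakdown Oracle has published).

```python
# Back-of-the-envelope rack power check using the announced figures.
gpus_per_rack = 64
watts_per_gpu = 1_400

gpu_power_kw = gpus_per_rack * watts_per_gpu / 1_000
print(gpu_power_kw)  # 89.6 kW drawn by the GPUs alone

rack_budget_kw = 125  # quoted per-rack figure
headroom_kw = rack_budget_kw - gpu_power_kw
print(headroom_kw)  # ~35.4 kW for CPUs, NICs, and other components
```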
AMD Turin CPUs
OCI’s AI platform will also benefit from a high-performance head node powered by AMD Turin CPUs, offering up to 3TB of system memory to enhance orchestration and data handling. These nodes act as the central coordination point for GPU resources, ensuring optimal utilization across large-scale deployments.
Crucially, the partnership continues Oracle’s commitment to open-source software. AMD’s ROCm software stack allows developers to migrate existing AI code seamlessly to OCI’s infrastructure, avoiding vendor lock-in. ROCm supports widely adopted AI frameworks, compilers, and libraries, enabling faster development cycles and broader accessibility.
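The portability claim rests on ROCm builds of frameworks such as PyTorch exposing the same device API as CUDA builds, so existing model code runs unchanged. A minimal sketch (assuming a PyTorch installation, whether CUDA, ROCm, or CPU-only):

```python
# Sketch: the same PyTorch code runs on ROCm and CUDA backends, because
# ROCm builds of PyTorch expose the familiar torch.cuda API surface.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# torch.version.hip is set on ROCm builds and None otherwise.
backend = "ROCm" if getattr(torch.version, "hip", None) else "CUDA/CPU"
print(f"Running on {device} ({backend} build)")

x = torch.randn(4, 4, device=device)
y = x @ x.T  # identical code path on AMD and NVIDIA accelerators
print(y.shape)
```

No source changes are needed to move between vendors; only the installed framework build differs, which is the lock-in avoidance the article describes.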
Network architecture also receives a significant boost through AMD’s Pollara AI NICs, which support advanced RoCE (RDMA over Converged Ethernet) capabilities. As the first cloud provider to integrate these NICs, Oracle gains a competitive edge in reducing network latency and increasing throughput for hyperscale AI workloads. Support for the Ultra Ethernet Consortium’s open industry standards ensures interoperability with future networking innovations.
The launch of MI355X GPUs on OCI is expected in the fall of 2025, positioning Oracle and AMD to meet accelerating demand for AI infrastructure. As AI adoption scales rapidly across industries, this collaboration sets a new benchmark in cloud-based, high-performance computing solutions.