Dynamic Resource Allocation (DRA): Kubernetes Device Management for AI Workloads
Google Cloud and NVIDIA have donated the Dynamic Resource Allocation (DRA) driver to the Kubernetes community, enabling more efficient management of accelerators like GPUs and TPUs for AI workloads.
Why it matters
DRA is a critical innovation that enables Kubernetes to better support the growing demand for AI/ML workloads, which often require specialized hardware like GPUs and TPUs.
Key Points
- DRA represents a paradigm shift from static hardware assignments to a flexible, request-based model for Kubernetes
- DRA eliminates the need for manual node pinning, offers flexible hardware parameterization, and abstracts hardware via DeviceClasses
- DRA's ResourceSlice and ResourceClaim APIs allow the Kubernetes scheduler to better match workload requirements to available hardware inventory
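The DeviceClass abstraction mentioned above is typically a short manifest with a CEL selector that matches devices advertised by a driver. A minimal sketch, assuming a hypothetical driver named `gpu.example.com`; field names follow the `resource.k8s.io/v1beta1` schema, and the exact API version to use depends on what your cluster serves:

```yaml
# DeviceClass: cluster-scoped abstraction over a family of devices.
# Workloads reference the class name instead of node labels.
apiVersion: resource.k8s.io/v1beta1
kind: DeviceClass
metadata:
  name: gpu.example.com        # hypothetical class/driver name
spec:
  selectors:
  - cel:
      # Match any device published by this (hypothetical) DRA driver
      expression: device.driver == "gpu.example.com"
```

Administrators define DeviceClasses once; workload authors then request devices by class name without knowing which nodes host them.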
Details
The explosion of large language models (LLMs) has increased demand for high-performance accelerators like GPUs and TPUs. Kubernetes is becoming the de facto platform for running LLMs in the enterprise. DRA, which reached stable status in Kubernetes 1.34, solves several pain points of the previous Device Plugin framework, such as the need for manual node pinning and the inability to express granular hardware requirements. DRA introduces the concepts of ResourceSlice (to describe hardware availability) and ResourceClaim (to express workload requirements). This allows the Kubernetes scheduler to make more intelligent decisions in matching workloads to the right hardware, improving the efficiency of expensive AI compute resources.
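As a concrete illustration of the request-based model described above, a workload expresses its hardware requirement as a ResourceClaim and references it from a Pod. This is a sketch, not a definitive manifest: the DeviceClass name `gpu.example.com`, claim name, and image are hypothetical, and the field names follow the `resource.k8s.io/v1beta1` schema (DRA reached stable in Kubernetes 1.34, so newer clusters may serve a later version):

```yaml
# ResourceClaim: "allocate me a device from the gpu.example.com class"
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com   # hypothetical DeviceClass
---
# Pod: consume the claim instead of pinning to a labeled node
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference
spec:
  resourceClaims:
  - name: gpu
    resourceClaimName: single-gpu
  containers:
  - name: server
    image: my-llm-server:latest          # hypothetical image
    resources:
      claims:
      - name: gpu                        # binds the claim to this container
```

The scheduler matches the claim against the ResourceSlices published by DRA drivers and places the Pod on a node whose inventory can satisfy it, which is what replaces manual node pinning.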