Why RDMA for S3 Storage Is a Big Deal for AI Workloads

The AI Data Explosion And Why Storage Is Struggling

AI is hungry for data. By 2028, enterprises are expected to generate close to 400 zettabytes of data every year. On top of that, around 90 percent of new data is unstructured. Think audio, video, images, PDFs and logs instead of neat rows in a database.

This unstructured data is exactly what powers modern AI models, from large language models to recommendation engines and vision systems. But feeding all that data to thousands of GPUs is becoming a serious bottleneck.

Traditional object storage built on the S3 API has been the go-to option for cheap, scalable storage. It is great for archives, backups, data lakes and analytics logs. The big drawback is performance. Object storage over standard TCP networking is usually not fast or efficient enough for large-scale AI training and inference, where GPUs need a constant stream of data.

On top of that, companies want their data and AI stacks to be portable. They want to run workloads both in their own data centers and in cloud or neocloud environments without rewriting everything. That means using a common storage API like S3, but with far better performance.

This is where RDMA for S3-compatible storage comes in.

What RDMA For S3-Compatible Storage Actually Does

RDMA stands for remote direct memory access. It is a networking technology that lets one computer read or write main memory on another computer directly, bypassing the CPU as much as possible. That cuts latency and reduces CPU load, which is perfect when you have GPUs waiting for data.

NVIDIA has brought RDMA to the world of S3-compatible object storage. Instead of using the usual TCP-based path, data transfers between compute nodes and storage nodes use RDMA underneath, while still exposing a familiar S3 API at the application level.

In simple terms, you keep speaking S3, but the data itself moves much faster and more efficiently.
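
Because the acceleration sits below the S3 API, application code does not change. Here is a minimal sketch of what that means in practice, using standard boto3 calls; the endpoint URL, bucket, key and credentials are hypothetical placeholders, and the calls would look identical whether the transport underneath is TCP or RDMA.

    import boto3

    # Standard S3 client code; the endpoint URL, bucket and key below are
    # hypothetical placeholders for an on-prem S3-compatible object store.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://objectstore.example.internal",
        aws_access_key_id="EXAMPLE_KEY",
        aws_secret_access_key="EXAMPLE_SECRET",
    )

    # The same GET a training data loader would issue. If the storage system
    # accelerates transfers with RDMA underneath, this call does not change.
    resp = s3.get_object(Bucket="training-data", Key="shards/shard-00042.tar")
    payload = resp["Body"].read()
    print(f"fetched {len(payload):,} bytes over an unchanged S3 API")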

This approach delivers several clear benefits for AI-heavy environments:

  • Higher throughput per terabyte: More data can be read and written from each unit of storage capacity.
  • Higher throughput per watt: Networking and storage hardware deliver more work for the same power budget, which matters at AI-factory scale.
  • Lower cost per terabyte: Because performance goes up, you can often do more with the same infrastructure and keep storage costs down for AI projects.
  • Much lower latency compared with TCP: GPUs spend less time idle waiting for data to arrive, which boosts overall utilization (see the measurement sketch after this list).
  • Reduced CPU utilization: Since RDMA bypasses most CPU involvement in data transfer, CPU cores are freed up to run AI frameworks, orchestration and other services instead of just shoveling data.
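
Throughput and latency claims like these are easy to sanity-check against your own endpoint. Below is a minimal micro-benchmark sketch, assuming a hypothetical endpoint, bucket and key, with credentials from the usual AWS configuration chain; a serious evaluation would use many parallel clients and objects sized like real training shards.

    import time
    import boto3

    # Hypothetical endpoint and object; credentials come from the standard
    # AWS environment/configuration chain.
    s3 = boto3.client("s3", endpoint_url="https://objectstore.example.internal")

    BUCKET, KEY, RUNS = "training-data", "shards/shard-00042.tar", 10

    latencies, total_bytes = [], 0
    for _ in range(RUNS):
        start = time.perf_counter()
        body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
        latencies.append(time.perf_counter() - start)
        total_bytes += len(body)

    elapsed = sum(latencies)
    print(f"mean GET latency: {elapsed / RUNS * 1000:.1f} ms")
    print(f"effective throughput: {total_bytes / elapsed / 1e9:.2f} GB/s")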

NVIDIA provides RDMA client and server libraries that plug into existing object storage systems. Storage vendors integrate the server-side libraries into their products, enabling accelerated S3-compatible access. The client libraries run on GPU compute nodes, so AI applications can pull data from object storage significantly faster than over standard TCP.
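
NVIDIA has not published these client library interfaces, so the sketch below is purely hypothetical: RdmaTransport and AcceleratedS3Client are invented names meant only to illustrate the architectural idea of keeping the S3 call shape while delegating bulk data movement to an RDMA data path.

    # Hypothetical illustration only. The actual NVIDIA client libraries are
    # not public; RdmaTransport and AcceleratedS3Client are invented names.

    class RdmaTransport:
        """Stand-in for a vendor RDMA data path to the storage nodes."""

        def read_object(self, bucket: str, key: str) -> bytes:
            # A real implementation would register local buffers and issue
            # one-sided RDMA reads against the storage server's memory,
            # keeping the CPU out of the bulk transfer. Stub for illustration.
            raise NotImplementedError("illustrative stub only")

    class AcceleratedS3Client:
        """Keeps the familiar S3 call shape while moving bytes over RDMA."""

        def __init__(self, transport: RdmaTransport) -> None:
            self._transport = transport

        def get_object(self, Bucket: str, Key: str) -> dict:
            # Same call shape as a classic S3 client, so data loaders and
            # AI frameworks on the GPU nodes do not have to change.
            data = self._transport.read_object(Bucket, Key)
            return {"Body": data, "ContentLength": len(data)}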

While the initial implementation is tuned for NVIDIA GPUs and NVIDIA networking, the design is open. Other vendors and customers can contribute to the libraries, integrate them into their own stacks or build custom software that uses the RDMA for S3-compatible APIs.

That openness is important because it encourages a wider ecosystem. It makes it easier for tooling, frameworks and platforms to support high-speed object storage without locking into a single proprietary path.

Standardization And Who Is Shipping It

For this technology to really matter, it has to become a standard that many vendors adopt instead of a single-vendor trick. NVIDIA is working with partners to push RDMA for S3-compatible storage toward standardization and broad availability.

Several big object storage players are already on board and integrating the RDMA-based libraries into their products:

  • Cloudian HyperStore: Cloudian positions object storage as the future of scalable data management for AI and is collaborating with NVIDIA to standardize RDMA for S3-compatible storage. The goal is to boost performance and efficiency while keeping full S3 compatibility so thousands of existing apps and tools can benefit without major rewrites.
  • Dell ObjectScale: Dell has worked with NVIDIA to add RDMA acceleration to ObjectScale. This brings end-to-end RDMA to object storage, targeting environments where thousands of GPUs are reading and writing data at the same time. The latest ObjectScale software is aimed at being a core storage layer for AI factories and AI data platforms.
  • HPE Alletra Storage MP X10000: HPE has integrated RDMA for S3-compatible storage into its Alletra Storage MP X10000 system. According to HPE, this setup accelerates throughput, reduces latency and lowers total cost of ownership for unstructured and AI-driven workloads.

This early vendor support shows that RDMA-accelerated object storage is not just a lab experiment. It is being built into real products that enterprises can deploy on premises or connect to their cloud-centered AI infrastructure.

NVIDIA is also tying this work into its broader ecosystem. The RDMA for S3-compatible storage libraries are available today for select partners and are expected to become generally available through the NVIDIA CUDA Toolkit. That makes it easier for developers and platform vendors who already rely on CUDA to experiment with and adopt the technology.

In parallel, NVIDIA is launching an object storage certification inside the NVIDIA-Certified Storage program. This kind of certification is meant to give customers confidence that a storage solution will play nicely with NVIDIA-based AI environments and deliver the performance AI workloads demand.

For teams building AI factories or scaling up generative AI, the big takeaway is that object storage is evolving. With RDMA for S3-compatible storage, you get the scalability and portability of S3-style object stores plus the speed and efficiency needed to keep GPU clusters busy instead of waiting around for data.

Original article and image: https://blogs.nvidia.com/blog/s3-compatible-ai-storage/
