## RDMA Overview

RDMA (Remote Direct Memory Access) lets a network adapter read and write memory on a remote server directly, bypassing the operating system kernel and the remote CPU on the data path. This significantly reduces network latency and CPU overhead.

## Core Value

- **Ultra-low latency**: End-to-end latency can drop to roughly 1-2 microseconds
- **Low CPU usage**: The network card moves data to and from memory directly via DMA; the CPU only posts requests and handles completions, and stays off the data path
- **High bandwidth**: Throughput can approach the line rate of the physical link
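
As a rough illustration of how latency and bandwidth combine, the sketch below models a one-way transfer as a fixed base latency plus the time to serialize the payload onto the link. The function name `transfer_time_us` and the sample figures (2 µs latency, 100 Gbit/s link) are assumptions chosen for illustration, not measurements.

```python
def transfer_time_us(size_bytes: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Estimate one-way transfer time in microseconds.

    Simplified model (an assumption, not a benchmark):
    time = base latency + payload bits / link rate.
    """
    bits = size_bytes * 8
    bits_per_us = bandwidth_gbps * 1e3  # 1 Gbit/s equals 1e3 bits per microsecond
    return latency_us + bits / bits_per_us


if __name__ == "__main__":
    # Hypothetical link: 2 us base latency, 100 Gbit/s.
    for size in (4 * 1024, 1024 * 1024):
        t = transfer_time_us(size, latency_us=2.0, bandwidth_gbps=100.0)
        print(f"{size} bytes: {t:.2f} us")
```

Under this model, small messages are dominated by the base latency and large messages by the link rate, which is why the two properties are listed separately.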

## Mainstream RDMA Protocols

### RoCEv2 (RDMA over Converged Ethernet)

A protocol that carries RDMA traffic over UDP/IP and typically relies on a lossless Ethernet fabric; currently the most widely deployed option:
- Runs on existing Ethernet infrastructure
- Switches need to support DCB/QoS features such as PFC and ECN
- Widely supported by Chinese vendors such as Huawei and H3C
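
Because RoCEv2 wraps the InfiniBand transport header in UDP/IP, each packet carries a fixed number of header bytes on top of its payload. The sketch below adds them up to estimate payload efficiency per packet. It assumes IPv4 without options, no VLAN tag, and only the 12-byte Base Transport Header; extended headers that some operations add, the Ethernet preamble, and the inter-frame gap are ignored. The constant and function names are illustrative.

```python
# UDP destination port registered for RoCEv2.
ROCEV2_UDP_DPORT = 4791

# Per-packet bytes around the RDMA payload (simplified; see assumptions above).
HEADER_BYTES = {
    "ethernet": 14,  # destination MAC, source MAC, EtherType
    "ipv4": 20,      # IPv4 header without options
    "udp": 8,
    "bth": 12,       # InfiniBand Base Transport Header
}
TRAILER_BYTES = {
    "icrc": 4,       # invariant CRC covering the RDMA packet
    "fcs": 4,        # Ethernet frame check sequence
}


def per_packet_overhead() -> int:
    """Total header and trailer bytes per packet under the stated assumptions."""
    return sum(HEADER_BYTES.values()) + sum(TRAILER_BYTES.values())


def payload_efficiency(payload_bytes: int) -> float:
    """Fraction of frame bytes that carry RDMA payload."""
    return payload_bytes / (payload_bytes + per_packet_overhead())


if __name__ == "__main__":
    for mtu in (1024, 4096):  # common RDMA path MTU values
        print(f"path MTU {mtu}: {payload_efficiency(mtu):.1%} payload")
```

Larger path MTUs amortize the fixed overhead over more payload, which is one reason jumbo frames are often enabled on RoCEv2 fabrics.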

### InfiniBand

A purpose-built high-performance interconnect with its own protocol stack:
- Lowest latency of the mainstream options
- Requires dedicated InfiniBand switches
- Closely integrated with NVIDIA GPUs since NVIDIA acquired Mellanox

## AI Training Scenario Applications

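RDMA is an essential technology for AI training clusters:

- GPUDirect RDMA: The network card reads and writes GPU memory directly, so data moves between GPUs on different servers without being staged in host memory
- Collective communication optimization: AllReduce and other collective operations run faster over RDMA transports
- Lossless network: Ethernet (RoCEv2) fabrics require DCB/QoS configuration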
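
To make the AllReduce pattern concrete, here is a minimal pure-Python simulation of the ring variant that is commonly used in training clusters. It models only the data movement, with a reduce-scatter phase followed by an all-gather phase; it does not use RDMA, GPUs, or any communication library. The function name `ring_allreduce` and the requirement that the buffer length be divisible by the number of workers are assumptions of this sketch.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum ring AllReduce; buffers[r] is the local vector of rank r."""
    n = len(buffers)
    length = len(buffers[0])
    assert all(len(b) == length for b in buffers), "all ranks need equal-size buffers"
    assert length % n == 0, "sketch assumes length divisible by number of ranks"
    chunk = length // n
    data = [list(b) for b in buffers]  # work on copies

    def span(c: int) -> range:
        """Index range covered by chunk c."""
        return range(c * chunk, (c + 1) * chunk)

    def exchange(step: int, offset: int, accumulate: bool) -> None:
        # Snapshot outgoing chunks first so all sends in a step are simultaneous.
        sends = []
        for rank in range(n):
            c = (rank + offset - step) % n
            sends.append((c, [data[rank][i] for i in span(c)]))
        for rank, (c, payload) in enumerate(sends):
            dst = (rank + 1) % n  # each rank sends to its right-hand neighbour
            for off, i in enumerate(span(c)):
                if accumulate:
                    data[dst][i] += payload[off]
                else:
                    data[dst][i] = payload[off]

    # Phase 1, reduce-scatter: after n-1 steps rank r holds the full sum of chunk (r+1) % n.
    for step in range(n - 1):
        exchange(step, offset=0, accumulate=True)

    # Phase 2, all-gather: circulate the finished chunks until every rank has all of them.
    for step in range(n - 1):
        exchange(step, offset=1, accumulate=False)

    return data


if __name__ == "__main__":
    print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

In this scheme each rank sends 2(N-1) chunks of size S/N, so the per-rank traffic stays close to 2S regardless of the number of workers N. That makes the operation bandwidth-bound, which is where the high-throughput, low-overhead transfers of RDMA help.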