Basic Knowledge and Differences of RoCE, IB, and TCP Networks – Huawei

A distributed storage network uses the RoCE, InfiniBand (IB), and TCP/IP protocols. RoCE and IB are both Remote Direct Memory Access (RDMA) technologies. How do RoCE and IB differ from each other and from traditional TCP/IP? Let's compare them in detail.

RDMA and TCP/IP

Applications with high I/O concurrency and strict latency requirements, such as high-performance computing (HPC) and big data analytics, need more than the existing TCP/IP software and hardware architecture can deliver. Traditional TCP/IP communication sends and receives messages through the operating system kernel. This incurs heavy overhead from moving and copying data between application memory and kernel buffers. RDMA was developed to eliminate this server-side processing latency during network transmission. As shown in Figure 1-1, RDMA lets the network adapter read and write application memory directly, without involving the operating system kernel. The result is high-throughput, low-latency communication, which is especially valuable in large-scale parallel computer clusters.

Figure 1-1 Comparison between RDMA and traditional TCP/IP
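To make the contrast in Figure 1-1 concrete, the following Python sketch is a schematic bookkeeping model. It is not a measurement and not an RDMA API. It assumes a conventional socket path without zero-copy offloads, and an RDMA WRITE between buffers already registered with the NIC. The hop lists and the `Hop` and `summarize` names are illustrative assumptions, and the wire transfer between the two NICs is omitted from both paths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    src: str
    dst: str
    by_cpu: bool     # True if the CPU copies the bytes; False if a NIC moves them by DMA
    in_kernel: bool  # True if this step is handled inside the OS kernel


# Assumed, simplified TCP socket path (no zero-copy offloads), sender and receiver.
TCP_PATH = [
    Hop("sender app buffer", "sender kernel socket buffer", by_cpu=True, in_kernel=True),
    Hop("sender kernel socket buffer", "sender NIC", by_cpu=False, in_kernel=True),
    Hop("receiver NIC", "receiver kernel socket buffer", by_cpu=False, in_kernel=True),
    Hop("receiver kernel socket buffer", "receiver app buffer", by_cpu=True, in_kernel=True),
]

# Assumed, simplified RDMA WRITE path: the NICs DMA directly to and from registered user buffers.
RDMA_PATH = [
    Hop("sender app buffer", "sender NIC", by_cpu=False, in_kernel=False),
    Hop("receiver NIC", "receiver app buffer", by_cpu=False, in_kernel=False),
]


def summarize(path: list[Hop]) -> dict[str, int]:
    """Count hops, CPU copies, and kernel-handled steps in a data path."""
    return {
        "hops": len(path),
        "cpu_copies": sum(h.by_cpu for h in path),
        "kernel_steps": sum(h.in_kernel for h in path),
    }


for name, path in (("TCP/IP", TCP_PATH), ("RDMA", RDMA_PATH)):
    print(name, summarize(path))
```

In this model the TCP path has two CPU copies and every step passes through the kernel, while the RDMA path has neither. That is the kernel-bypass, zero-copy property described above, reduced to counting.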

RDMA Types

Currently, there are three types of RDMA networks: InfiniBand, RDMA over Converged Ethernet (RoCE), and iWARP.

InfiniBand was designed specifically for RDMA and guarantees reliable transmission in hardware. The technology is advanced but costly. RoCE and iWARP are both Ethernet-based RDMA technologies. They bring RDMA's high speed, ultra-low latency, and very low CPU usage to Ethernet, the most widely deployed network technology.

As shown in Figure 1-2, RoCE has two versions: RoCEv1 and RoCEv2. RoCEv1 implements RDMA directly on the Ethernet link layer, so its traffic is confined to a single Layer 2 domain, and switches must support flow control technologies such as Priority-based Flow Control (PFC) to provide lossless transmission at the link layer. RoCEv2 runs over UDP in the TCP/IP protocol stack. Because it adds an IP header, RoCEv2 traffic can be routed across IP networks, which solves RoCEv1's scalability problem.

Figure 1-2 RDMA network types
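The difference between the two RoCE versions is easiest to see in the packet layout. The Python sketch below packs the 12-byte InfiniBand Base Transport Header (BTH), which both versions carry. It also lists the layers that wrap the BTH in each version: EtherType 0x8915 with an IB Global Route Header for RoCEv1, and IP plus UDP destination port 4791 for RoCEv2. The helper names `build_bth` and `layer_stack` are hypothetical, and the header is deliberately simplified. Optional fields such as MigReq, PadCnt, and TVer are left at zero, and no ICRC is computed. This is an illustration of the encapsulation, not a usable packet generator.

```python
import struct

ROCE_V1_ETHERTYPE = 0x8915  # RoCEv1: IB GRH + BTH carried directly in an Ethernet frame
ROCE_V2_UDP_DPORT = 4791    # RoCEv2: IB BTH carried in a UDP datagram over IP


def build_bth(opcode: int, dest_qp: int, psn: int, p_key: int = 0xFFFF,
              solicited: bool = False, ack_req: bool = False) -> bytes:
    """Pack a simplified 12-byte InfiniBand Base Transport Header (network byte order)."""
    if not (0 <= dest_qp < 1 << 24 and 0 <= psn < 1 << 24):
        raise ValueError("dest_qp and psn are 24-bit fields")
    # Byte 1: SE(1) | MigReq(1) | PadCnt(2) | TVer(4); only SE is set in this sketch.
    flags = int(solicited) << 7
    # Bytes 4-7: reserved(8) | DestQP(24). Packing dest_qp as 32 bits leaves the reserved byte 0.
    # Bytes 8-11: AckReq(1) | reserved(7) | PSN(24).
    ack_psn = (int(ack_req) << 31) | psn
    return struct.pack("!BBHII", opcode, flags, p_key, dest_qp, ack_psn)


def layer_stack(version: int) -> list[str]:
    """Return the outer-to-inner header layers of a RoCE packet."""
    if version == 1:
        return [f"Ethernet (EtherType 0x{ROCE_V1_ETHERTYPE:04X})", "IB GRH", "IB BTH", "payload", "ICRC"]
    if version == 2:
        return ["Ethernet", "IPv4/IPv6", f"UDP (dst port {ROCE_V2_UDP_DPORT})", "IB BTH", "payload", "ICRC"]
    raise ValueError("RoCE version must be 1 or 2")


# Opcode 0x04 is the RC "SEND Only" opcode; queue pair 0x12 and PSN 1 are arbitrary example values.
bth = build_bth(opcode=0x04, dest_qp=0x12, psn=1)
print(len(bth), bth.hex())
for v in (1, 2):
    print(f"RoCEv{v}:", " / ".join(layer_stack(v)))
```

The only structural change between the versions is what sits below the BTH. Replacing the link-layer EtherType with IP and UDP headers is what makes RoCEv2 routable.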

Table 1-1 Comparison of InfiniBand, iWARP, and RoCE

Item          InfiniBand   iWARP                                              RoCE
Performance   Excellent    Slightly worse than InfiniBand (affected by TCP)   Equivalent to InfiniBand
Cost          High         Medium                                             Low
Stability     Excellent    Poor                                               Good
Switch        IB switch    Ethernet switch                                    Ethernet switch

As shown in Table 1-1, the three RDMA networks have the following characteristics:

  • InfiniBand: designed for RDMA from the outset, it guarantees reliable transmission in hardware and provides higher bandwidth and lower latency. However, it is costly because it requires dedicated IB NICs and switches.
  • RoCE: Ethernet-based RDMA that consumes fewer resources than iWARP and supports more features. It requires RoCE-capable NICs but can run over common Ethernet switches.
  • iWARP: TCP-based RDMA that relies on TCP for reliable transmission. On a large-scale network, iWARP's many TCP connections consume a large amount of memory, so iWARP places higher demands on system specifications than RoCE. It requires iWARP-capable NICs but can run over common Ethernet switches.
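To see why connection count becomes a memory problem at scale, consider a worked example. It assumes a full mesh in which every process on a node keeps one connection to every process on every other node. The per-node connection count is then P × P × (N − 1) for N nodes and P processes per node, which grows roughly with N times P squared. The Python sketch below computes this. The per-connection state size `HYPOTHETICAL_BYTES_PER_CONN` is an illustrative placeholder, not a measured iWARP figure; the actual footprint depends on the NIC and software stack.

```python
def full_mesh_connections(nodes: int, procs_per_node: int) -> int:
    """Connections one node holds if each local process connects to every remote process."""
    return procs_per_node * procs_per_node * (nodes - 1)


def per_node_state_bytes(nodes: int, procs_per_node: int, bytes_per_conn: int) -> int:
    """Connection state one node must keep in memory under the full-mesh assumption."""
    return full_mesh_connections(nodes, procs_per_node) * bytes_per_conn


# Illustrative placeholder only; not a measured per-connection cost for any product.
HYPOTHETICAL_BYTES_PER_CONN = 64 * 1024

for n in (8, 64, 512):
    conns = full_mesh_connections(n, procs_per_node=16)
    mib = per_node_state_bytes(n, 16, HYPOTHETICAL_BYTES_PER_CONN) / 2**20
    print(f"{n:4d} nodes: {conns:7d} connections/node, ~{mib:,.0f} MiB of connection state/node")
```

Whatever the true per-connection cost is, the total scales linearly with it. The connection count alone grows by roughly 73 times from 8 to 512 nodes in this example. That growth is why a heavier per-connection footprint, such as full TCP state, matters more on large networks.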

Common Network Protocols Used in Distributed Storage

  • IB: used for the front-end storage network in the DPC scenario.
  • RoCE: used for the back-end storage network.
  • TCP/IP: used for the service network.