In-Network Computing

Introduction

In-network computing is a new research area that has emerged over the last few years. In-network computing, also known as in-network computation or NetCompute, refers to executing, within network devices, programs that typically run on end hosts.

In-network computing is focused on computing within the network, using devices that already exist within the networked system and are already used to forward traffic. It differs from the historic use of “network computing”, which referred to networked systems or to computers located within the network. The definition also excludes, for example, network-attached accelerators. This definition is chosen for its implications from a systems perspective: in-network computing means that you don’t add new devices to your network, as you already use switches and NICs. Consequently, the overhead of in-network computing is minimal, as no extra space, cost, or idle power is required. Furthermore, in-network computing reduces the load on the network, rather than increasing it, by terminating transactions as they traverse the network. To date, in-network computing has been implemented on three classes of devices: FPGAs, SmartNICs, and switch ASICs.

The introduction of programmable switch ASICs and the rise of SmartNICs have been the enablers of in-network computing. In the past, network devices were fixed-function and supported only the functionality defined by their manufacturer. In contrast, programmable network devices allow users to implement their own functionality by writing code in high-level languages. Today, the dominant language in this field is P4, an open-source, domain-specific language defined by the P4 Language Consortium. In the beginning, the language was used mainly to define new protocols and networking-related functionality (e.g., in-band network telemetry). However, researchers quickly started to build upon the language and platforms to port more complex functionality to the network.

Use Cases

In-network computing has been applied to several classes of applications. The first class is, as one might expect, network functions such as a load balancer, NAT, or DNS server, implemented within a network device. A second class of applications that has proven successful is caching: using the network device to reply quickly with cached values, for example for key-value store applications. A third class is data reduction and data aggregation, meaning the use of the network device to aggregate or batch data from a number of sources, as well as to reduce the amount of data sent from the network device onward. The usage of in-network computing is not limited to these intuitive use cases. An interesting class of applications is coordination, such as consensus algorithms implemented within the network, including the Paxos algorithm and its various roles. Several projects have applied in-network computing to cross-disciplinary use cases, such as accelerating stream processing, query processing, or load balancing for storage systems.
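
The caching use case can be illustrated with a toy model: a device that answers GET requests for keys it holds and forwards everything else to the storage server. This is only a sketch under stated assumptions: the Packet header, the CachingSwitch class, its fixed capacity, and the choice to invalidate a cached key on writes are all hypothetical. Real systems express such logic as match-action tables in a language such as P4, and handle consistency in various ways.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Packet:
    """Hypothetical application-layer header carried in a packet."""
    op: str                      # "GET" or "PUT"
    key: int
    value: Optional[bytes] = None
    is_reply: bool = False


class CachingSwitch:
    """Toy model of a key-value cache held in a network device (not a real switch API)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity          # device memory is small and fixed
        self.cache: Dict[int, bytes] = {}
        self.hits = 0
        self.misses = 0

    def install(self, key: int, value: bytes) -> None:
        """Control plane inserts a (hot) key if there is room or the key is already cached."""
        if key in self.cache or len(self.cache) < self.capacity:
            self.cache[key] = value

    def process(self, pkt: Packet) -> Tuple[str, Packet]:
        """Return ("reply", response) when the request is answered in the network,
        or ("forward", pkt) when it must continue to the storage server."""
        if pkt.op == "GET":
            if pkt.key in self.cache:
                self.hits += 1
                return "reply", Packet("GET", pkt.key, self.cache[pkt.key], is_reply=True)
            self.misses += 1
        elif pkt.op == "PUT":
            # Invalidate the cached copy so later GETs go to the server, which applies the write.
            self.cache.pop(pkt.key, None)
        return "forward", pkt
```

A GET for a cached key is terminated at the device and never reaches the server, which is exactly where the latency and throughput benefits come from; every miss or write still takes the full path.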

The Benefits of In-Network Computing

The main promise of in-network computing is performance, in terms of both throughput and latency. Today, many network devices support sub-microsecond latency, with low variance in non-oversubscribed scenarios. This is, however, not the main source of latency savings. Because in-network computing refers to processing within the network, transactions are terminated along their path rather than reaching an end host, saving the latency introduced by the end host and by any network devices along the way from the in-network computing node to the end host. Reduced latency is especially important in cloud environments, where providers fight to tame tail latency.
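
The latency argument can be made concrete with a simple round-trip model, in notation introduced here for illustration: the path from client to server has N segments, segment i adds one-way latency d_i (link plus device traversal), the device at the end of segment k < N performs the in-network computation in time t_k, and T_s is the server's processing time, including its network stack. Assuming a symmetric path and ignoring queueing:

```latex
T_{\text{host}} = 2\sum_{i=1}^{N} d_i + T_s,
\qquad
T_{\text{in-net}} = 2\sum_{i=1}^{k} d_i + t_k,
\qquad
\Delta T = T_{\text{host}} - T_{\text{in-net}} = 2\sum_{i=k+1}^{N} d_i + \left(T_s - t_k\right)
```

The saving has two parts: the round trip over the segments beyond the computing node, and the difference between end-host and in-network processing time.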

The second performance advantage, throughput, follows from the packet processing rate. Switch ASICs nowadays process up to ten billion packets per second, and therefore potentially support billions of operations per second per offloaded application. These switches are designed as pipelines, continuously moving data without stalls. In most cases, even if one operation (packet) is stalled (queued), e.g., while competing for shared resources (congestion), other packets continue to be served. Applications implemented using in-network computing have demonstrated a 10,000× performance improvement compared with their host-based counterparts.
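
Why a pipeline sustains such rates can be seen in a toy cycle-level model: the latency of each packet grows with the number of stages, but once the pipeline is full, one packet completes every cycle. The one-cycle-per-stage assumption and the parameters of packets_per_second are illustrative only, not the specification of any particular switch ASIC, and the model does not capture queueing.

```python
from collections import deque
from typing import Deque, List, Optional


def simulate_pipeline(num_packets: int, num_stages: int) -> int:
    """Cycles needed to push num_packets through a num_stages-deep pipeline,
    assuming every stage takes exactly one cycle and a new packet may enter each cycle."""
    pending: Deque[int] = deque(range(num_packets))
    stages: List[Optional[int]] = [None] * num_stages
    done = 0
    cycles = 0
    while done < num_packets:
        cycles += 1
        # Every packet advances one stage; a new packet (if any) enters stage 0.
        incoming = pending.popleft() if pending else None
        stages = [incoming] + stages[:-1]
        # A packet occupying the last stage completes at the end of this cycle.
        if stages[-1] is not None:
            done += 1
    return cycles


def packets_per_second(clock_hz: float, num_pipelines: int) -> float:
    """Peak rate when each pipeline accepts one packet per clock cycle (hypothetical parameters)."""
    return clock_hz * num_pipelines
```

For example, 1,000 packets through a 12-stage pipeline take 1,011 cycles: each packet spends 12 cycles in the pipeline, yet throughput approaches one packet per cycle, independent of pipeline depth.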

An unexpected benefit of in-network computing is power efficiency. While the performance-per-Watt benefit of accelerators is well known, network switches are notoriously regarded as power hungry. Furthermore, they are not power proportional, drawing significant power even when idle. However, if you consider operations per Watt, network switches are a lot more attractive, supporting millions of operations per second per Watt, meaning, for some applications, 1,000× higher efficiency than software-based solutions. To illustrate, serving a million key-value store queries per second will “cost” less than one Watt on a switch. Since network switches are already part of a user’s network, most of their power consumption is already paid for by packet forwarding, and the overhead of in-network computing is small, on the order of several percent of the overall switch power consumption.
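
The operations-per-Watt argument is plain arithmetic, and a few helper functions make the two viewpoints in the paragraph explicit: efficiency over a device's whole power draw, and efficiency charged only for the small overhead that in-network computing adds on top of forwarding. The function names are hypothetical, and any numbers plugged into them below are placeholders, not measurements of a real device.

```python
def ops_per_watt(ops_per_second: float, power_watts: float) -> float:
    """Operations per second per Watt, i.e. operations per Joule."""
    return ops_per_second / power_watts


def marginal_ops_per_watt(ops_per_second: float, device_power_watts: float,
                          overhead_fraction: float) -> float:
    """Efficiency charging the application only for the extra power it adds,
    since forwarding already pays for the rest of the device's power draw."""
    return ops_per_second / (device_power_watts * overhead_fraction)


def efficiency_ratio(offload_ops_per_watt: float, host_ops_per_watt: float) -> float:
    """How many times more operations per Joule the offloaded version achieves."""
    return offload_ops_per_watt / host_ops_per_watt
```

With placeholder figures of a billion operations per second on a 500 W device, ops_per_watt gives 2 million operations per Joule; charging only a 5% overhead, marginal_ops_per_watt gives 40 million.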

Challenges

In-network computing is promising, but there are still many challenges ahead. Two important open questions are the benefits of in-network computing when traffic is encrypted, and the security risks it presents. In addition, the architecture of network devices does not easily lend itself to machine learning applications. While systems running machine learning can definitely benefit from acceleration within the network, running the training itself within the network has so far proven difficult, and only early prototypes have been implemented.

Beyond these, in-network computing also faces several large technical challenges. The biggest challenge is probably abstracting the network hardware away from programmers. While P4 is a declarative language, it still operates at the packet level. Ideally, programmers would be able to code using higher-level abstractions. The language also currently lacks support for stateful operations, with current solutions being target-specific. Furthermore, to achieve high performance today, programmers must be aware of the hardware target and leverage its capabilities in their code. Porting between different network hardware targets is not an easy task, and often requires significant changes to the code. Porting the same code between heterogeneous targets (e.g., CPU, GPU, switch ASIC) is a step further still. Debugging tools will play a crucial role in any future success of in-network computing. While there are several formal verification tools, building debuggers that fit network-device architectures, whose pipelines move data rather than instructions, is hard.
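
To make “packet-level” concrete: a P4 program is invoked once per packet, so any state kept across packets (here, a per-flow packet counter) must be read, modified, and written back explicitly in a register array. The Register class below is a hypothetical Python stand-in, loosely modeled on the register externs that some P4 architectures provide (such as the register extern in v1model); the exact API and the constraints on register access differ by target.

```python
import zlib


class Register:
    """Fixed-size array of fixed-width cells, loosely modeled on a P4 register extern.
    Hypothetical Python stand-in; real targets expose target-specific APIs."""

    def __init__(self, size: int, width_bits: int = 32) -> None:
        self.cells = [0] * size
        self.mask = (1 << width_bits) - 1

    def read(self, index: int) -> int:
        return self.cells[index]

    def write(self, index: int, value: int) -> None:
        # Values wrap around like a fixed-width hardware register.
        self.cells[index] = value & self.mask


def flow_index(src: str, dst: str, size: int) -> int:
    """Hash a flow identifier to a register slot; distinct flows may collide."""
    return zlib.crc32(f"{src}->{dst}".encode()) % size


def count_packet(reg: Register, src: str, dst: str) -> int:
    """Per-packet program: the only state that survives across packets is
    what the program explicitly reads from and writes back to the register."""
    idx = flow_index(src, dst, len(reg.cells))
    reg.write(idx, reg.read(idx) + 1)   # read-modify-write on a single slot
    return reg.read(idx)
```

What would be a one-line dictionary update in a host program becomes explicit hashing, bounded memory, and a read-modify-write here, and flows that hash to the same slot are silently merged.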

As in-network computing evolves, more challenges arise, such as virtualization. Is it possible to run multiple applications over the same network device? How do you isolate resources? And what is the difference between virtualization on a CPU and on a network device?

Summary

In-network computing brings a lot of promise, but also faces a fair number of challenges. To date, most in-network computing research has emerged from the networking community, and there is a need for participation from other research communities. From compilers and abstractions, through scheduling and resource allocation, to virtualization and new use cases, there is still a lot to innovate and discover before in-network computing takes its place as a first-class citizen in the heterogeneous computing environment.

About the Author: Noa Zilberman is a Fellow and an Affiliated Lecturer at the Department of Computer Science and Technology, University of Cambridge.

Disclaimer: These posts are written by individual contributors to share their thoughts on the Computer Architecture Today blog for the benefit of the community. Any views or opinions represented in this blog are personal, belong solely to the blog author and do not represent those of ACM SIGARCH or its parent organization, ACM.