Where does the "Reliable Data Transfer" (RDT) concept come from?

Note – the actual answer to this question is (hopefully) in the last section.

Apologies to Zack67, but since I completely disagree with his answer, I want to post mine.

Bit Errors

Every network design has to deal with one problem: the network is not reliable.

Here I consider layer 1. The common model is that sender and receiver communicate over a channel:

S — | channel | — R

This channel is NEVER free from disturbance – the most common ones are noise (just random signal distortion) and interference (from other communication channels that share the same medium). The amount of disturbance depends on the channel – communication over wires usually has less noise than wireless. Note also that there can be temporary sources of disturbance that are of very high magnitude.

So, R never receives exactly the same signal that S sends. It is the law of physics.

There are ways to deal with a finite amount of disturbance. For example, digital signals have discrete values. If the signal can be either 0V or 5V, then when R receives 4.8V it is very likely that S sent 5V. Digital modulation on wireless channels does something similar (see QPSK or QAM modulation). Noise above a given threshold causes bit errors (the receiver reads a one instead of a zero or vice versa), e.g., if the receiver sees something around 2.5V, it could be either 0 or 1. There are also so-called error-correcting codes, which can correct a certain predefined number of bit errors. As I said, there is no guarantee that the disturbance stays below these limits, which means that the received data can be incorrect.
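To make the threshold idea concrete, here is a minimal sketch in Python. It assumes the 0V/5V levels from the example above, a decision threshold at the 2.5V midpoint, and additive Gaussian noise as the channel model. The noise model and all function names are my own assumptions for illustration, not a description of any real line code.

```python
import random

HIGH_VOLTS = 5.0   # level sent for a 1 (the 0V/5V example above)
THRESHOLD = 2.5    # assumed decision point: midpoint between the two levels


def transmit(bits, noise_sigma, rng):
    """Map bits to voltages and add Gaussian noise (an assumed channel model)."""
    return [b * HIGH_VOLTS + rng.gauss(0.0, noise_sigma) for b in bits]


def decide(voltages):
    """Receiver decision: anything above the threshold is read as 1."""
    return [1 if v > THRESHOLD else 0 for v in voltages]


def count_bit_errors(noise_sigma, n_bits=10_000, seed=1):
    """Send random bits through the noisy channel and count wrong decisions."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1) for _ in range(n_bits)]
    received = decide(transmit(bits, noise_sigma, rng))
    return sum(sent != got for sent, got in zip(bits, received))


if __name__ == "__main__":
    for sigma in (0.2, 1.0, 2.0):
        print(f"noise sigma {sigma} V: {count_bit_errors(sigma)} bit errors")
```

With small noise the receiver recovers every bit. Once the noise is comparable to the distance between a level and the threshold, bit errors start to appear, and no choice of threshold removes them completely.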

Neither circuit switched networks nor packet switched networks are immune to bit errors.

Packet Errors

As Zac pointed out, there are some types of errors that happen in packet switched networks because (a) each packet is independently routed and (b) there are no preallocated resources.

A first example would be a CSMA/CD bus. On collisions the affected stations send jamming signals which effectively destroy the frame. The frame is not delivered.

Also, there are all kinds of routing errors, which cause packets to be either dropped or reordered. There are also duplicated packets, but I think this is more of a consequence of having an ARQ.

Note: If you receive a “frame” (or packet or any other type of chunk of data) and it carries a checksum, then any bit error becomes a packet error, because a frame that fails the check is discarded as a whole.
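A minimal sketch of that note, assuming a made-up frame layout (payload followed by a 4-byte CRC-32 trailer). `zlib.crc32` is the standard-library CRC; the function names and the layout are hypothetical and only serve the illustration.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload (this frame layout is an assumption)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Simulate a single bit error at the given bit position."""
    corrupted = bytearray(frame)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def receive(frame: bytes):
    """Return the payload if the CRC matches, else None (the frame is discarded)."""
    payload, trailer = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != trailer:
        return None
    return payload


if __name__ == "__main__":
    frame = make_frame(b"ls -l")
    print(receive(frame))               # intact frame: payload comes back
    print(receive(flip_bit(frame, 3)))  # one flipped bit: whole frame is dropped
```

From the point of view of the layer above, the corrupted frame simply never arrived, which is why a bit error and a lost packet can be handled by the same mechanism.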

Is this a problem?

Actually it depends. And this is the place where I disagree with Zac. How to deal with errors is a decision for the application (or at least for OSI layer 5 and above).

First, note that if the receiver has to receive an exact copy of what the sender sent, then the (only known) way to deal with errors (beyond the predictable ones) is the same whether your network is circuit switched, reliable, or best effort: retransmissions (aka ARQ).

Here it is important to understand that different application layer traffic has different requirements with respect to error handling and retransmissions, and voice is very different from data.

Interactive voice conversations cannot deal with retransmissions (they take too long). On the other hand, voice can tolerate bit errors. Thus, voice in circuit switched networks is not transmitted reliably. Instead it uses reliability mechanisms on layer 8: the human brain is in most cases capable of understanding noisy speech, and if not, the human can always ask for a retransmission :).

This is not the case if you transfer files or some kind of instructions which computers are supposed to interpret – e.g., remote logins over telnet/ssh. You would probably understand bit errors in emails (wrong words), but if you type one command in your telnet and the receiver receives another command, then, best case scenario, it won’t execute. So, in this case you have to deal with errors.

So, the problem of reliable data transfer comes from the fact that the data has to be received exactly as it was transmitted, i.e., reliably, over an (always) unreliable network. The problem occurs in any kind of network; this is the law of physics.

References

AFAIK people started to connect computers with each other around the same time as they started designing packet switched networks. That implies that the problem of transmitting data reliably appeared around the same time. Before that there were mostly phone networks, which connected humans, so errors were not such a big problem. [There was also the telegraph, and I do not know what reliability mechanisms telegrams used.]

That is, sometime in the 1970s, people started solving the problem of reliable transmission over an unreliable channel, in particular by designing ARQ schemes, which are basically a combination of sequence numbers, acknowledgements, and retransmissions.

The problem statement is more or less: given a channel model that describes what kinds of packet errors can occur (e.g., the channel only drops packets, or it can drop and reorder packets, etc.), design a protocol for reliable data transfer over such a channel.
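As an illustration of that problem statement, here is a minimal stop-and-wait sketch in Python for the simplest channel model: the channel may drop packets and ACKs but never reorders or corrupts them. A dropped packet is modelled as an immediate timeout. `LossyChannel`, `Receiver`, and `stop_and_wait` are hypothetical names of mine, and this is a toy simulation, not the protocol from any particular paper or book.

```python
import random


class LossyChannel:
    """Assumed channel model: each packet is dropped with probability p_loss."""

    def __init__(self, p_loss, seed=0):
        self.p_loss = p_loss
        self.rng = random.Random(seed)

    def send(self, packet):
        """Return the packet, or None if the channel dropped it."""
        return None if self.rng.random() < self.p_loss else packet


class Receiver:
    def __init__(self):
        self.expected_seq = 0
        self.delivered = []

    def on_packet(self, seq, data):
        """Deliver new data exactly once; always ACK the sequence number seen."""
        if seq == self.expected_seq:
            self.delivered.append(data)
            self.expected_seq ^= 1   # alternate between 0 and 1
        return seq                   # the ACK carries the received sequence number


def stop_and_wait(messages, channel, receiver, max_tries=1000):
    """Send each message, retransmitting until its ACK arrives."""
    transmissions = 0
    seq = 0
    for data in messages:
        for _ in range(max_tries):
            transmissions += 1
            arrived = channel.send((seq, data))
            if arrived is None:
                continue             # data packet lost: timeout, retransmit
            ack = channel.send(receiver.on_packet(*arrived))
            if ack == seq:
                break                # ACK received: move on to the next message
            # ACK lost: timeout, retransmit (the receiver will see a duplicate)
        else:
            raise RuntimeError("gave up after max_tries")
        seq ^= 1
    return transmissions


if __name__ == "__main__":
    receiver = Receiver()
    sent = stop_and_wait(list("hello"), LossyChannel(0.3, seed=42), receiver)
    print(receiver.delivered, "delivered using", sent, "transmissions")
```

All three ingredients are visible: the one-bit sequence number lets the receiver recognise a duplicate after a lost ACK, the acknowledgement tells the sender when to advance, and the retransmission covers a loss in either direction. A one-bit sequence number is only enough because this channel model excludes reordering.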

There are some papers that I have found (note – you can probably find a PDF by searching paper name in Google Scholar):

There are references to other ARQ schemes in the papers.

Now, as I said, schemes like stop-and-wait, go-back-n, and selective repeat are more or less universally known. Their origins are most likely in these works. ARQ (Automatic Repeat reQuest) is the common term for these schemes.

The “version numbers” of reliable data transfer protocols are again AFAIK from the book by Kurose. Kurose is one of the most popular books in network education and it is used as a reference in most networking courses.

Different ARQs have different tradeoffs between complexity and performance (e.g., overhead of recovering missing packets). I believe that these version numbers were intended to show exactly this – the benefits of having schemes of gradually increased complexity and also efficiency. This part is actually pretty important to understand. (And that Wikipedia page fails to show exactly this, which is unfortunate).
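One way to see the performance side of this tradeoff is the textbook-style sender utilization estimate for a loss-free link: a stop-and-wait sender is busy only for one packet transmission time per round trip, while a pipelined sender (go-back-n or selective repeat) with a window of N packets is busy N times as long, up to 100%. The sketch below assumes that simplified timing model and ignores losses and ACK transmission time; the link numbers are purely illustrative.

```python
def utilization(window, packet_bits, rate_bps, rtt_s):
    """Fraction of time the sender is busy transmitting, assuming no losses.

    window=1 models stop-and-wait; window>1 models a pipelined sender
    that may have `window` unacknowledged packets in flight.
    """
    t_tx = packet_bits / rate_bps   # time to push one packet onto the link
    return min(1.0, window * t_tx / (rtt_s + t_tx))


if __name__ == "__main__":
    # Illustrative numbers only: 8000-bit packets, 1 Gbit/s link, 30 ms RTT.
    for window in (1, 10, 1000, 10000):
        print(window, utilization(window, 8000, 1e9, 0.030))
```

The extra complexity of the pipelined schemes (buffering, window bookkeeping, per-packet timers in selective repeat) is the price for that gain in utilization.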

AFAIK these schemes are not used precisely as they are defined in any existing protocol, but the concepts are based on these schemes and are recognizable. This is why they are always taught.