P2P(Peer To Peer) File Sharing - GeeksforGeeks - EU-Vietnam Business Network (EVBN)

Introduction
In Computer Networking, P2P is a file-sharing technology, allowing the users to access mainly the multimedia files like videos, music, e-books, games, etc. The individual users in this network are referred to as peers. The peers request files from other peers by establishing TCP or UDP connections.

How P2P works(Overview)
A peer-to-peer network allows computer hardware and software to communicate without the need for a server. Unlike client-server architecture, there is no central server for processing requests in a P2P architecture. The peers directly interact with one another without the requirement of a central server.

Now, when one peer makes a request, it is possible that multiple peers have a copy of that requested object. Now the problem is how to get the IP addresses of all those peers. This is decided by the underlying architecture supported by the P2P systems. By means of one of these methods, the client peer can get to know about all the peers which have the requested object/file and the file transfer takes place directly between these two peers.

Three such Architectures exist:

Centralized Directory
Query Flooding
Exploiting Heterogeneity

1. Centralized Directory

It is somewhat similar to client-server architecture in the sense that it maintains a huge central server to provide directory service.
All the peers inform this central server of their IP address and the files they are making available for sharing.
The server queries the peers at regular intervals to make sure if the peers are still connected or not.
So basically this server maintains a huge database regarding which file is present at which IP addresses.

Working

Now whenever a requesting peer comes in, it sends its query to the server.
Since the server has all the information of its peers, so it returns the IP addresses of all the peers having the requested file to the peer.
Now the file transfer takes place between these two peers.

The first system which made use of this method was Napster, for the purpose of Mp3 distribution.

The major problem with such an architecture is that there is a single point of failure. If the server crashes, the whole P2P network crashes. Also, since all of the processing is to be done by a single server so a huge amount of the database has to be maintained and regularly updated.

2. Query Flooding

Unlike the centralized approach, this method makes use of distributed systems.
In this, the peers are supposed to be connected to an overlay network. It means if a connection/path exists from one peer to another, it is a part of this overlay network.
In this overlay network, peers are called nodes, and the connection between peers is called an edge between the nodes, thus resulting in a graph-like structure.

Working

Now when one peer requests for some file, this request is sent to all its neighboring nodes i.e. to all nodes which are connected to this node. If those nodes don’t have the required file, they pass on the query to their neighbors and so on. This is called query flooding.
When the peer with the requested file is found (referred to as query hit), the query flooding stops and it sends back the file name and file size to the client, thus following the reverse path.
If there are multiple query hits, the client selects from one of these peers.

Gnutella was the first decentralized peer-to-peer network.

This method also has some disadvantages like, the query has to be sent to all the neighboring peers unless a match is found. This increases traffic in the network.

3. Exploiting heterogeneity

This P2P architecture makes use of both the above-discussed systems.
It resembles a distributed system like Gnutella because there is no central server for query processing.
But unlike Gnutella, it does not treat all its peers equally. The peers with higher bandwidth and network connectivity are at a higher priority and are called group leaders/supernodes. The rest of the peers are assigned to these supernodes.
These supernodes are interconnected and the peers under these supernodes inform their respective leaders about their connectivity, IP address, and the files available for sharing.

KaZaA technology is such an example that makes use of Napster and Gnutella.
Thus, the individual group leaders along with their child peers from a Napster-like structure. These group leaders then interconnect among themselves to resemble a Gnutella-like structure.

Working

This structure can process the queries in two ways.
The first one is that the supernodes could contact other super nodes and merge their databases with their own database. Thus, this supernode now has information of a large number of peers.
Another approach is that when a query comes in, it is forwarded to the neighboring super nodes until a match is found, just like in Gnutella. Thus query flooding exists but with limited scope as each supernode has many child peers. Hence, such a system exploits the heterogeneity of the peers by designating some of them as group leaders/supernodes and others as their child peers.

References- Computer Networking: A Top-down Approach By James F. Kurose

This article is contributed by Arushi Dhamija. If you like GeeksforGeeks and would like to contribute, you can also write an article using write.geeksforgeeks.org or mail your article to [email protected]. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please write comments if you find anything incorrect, or you want to share more information about the topic discussed above.

My Personal Notes