Linux Raw Sockets

Recently I wrote a userspace implementation of the
Host Identity Protocol (HIPv2, RFC 7401) with the upcoming
Diet Exchange (HIP DEX, IETF draft 6). Doing so, I’ve learnt a lot about raw
socket programming under Linux and here I want to share a few things with you.

So, I assume you have already worked with network sockets before – if not, don’t
fear, it’s not that hard and there are plenty of nice introductions out there. I
can for example recommend Beej’s Guide to Network Programming. For this
article I’ll start with a normal UDP/TCP based socket and work my way down the
layers. So we open a traditional socket by:

sockfd = socket(AF_INET, SOCK_DGRAM, 0);

This opens a UDP datagram socket over IPv4. The first argument of
socket() specifies the domain of your socket; in our case that’s the Internet
Protocol. Sometimes you will see AF… here and sometimes PF…; this doesn’t
matter, they are the same. PF stands for protocol family, AF is short for
address family. Historically it was thought that in the future there might be
multiple protocol families sharing the same address family, but this never
happened. So the correct way would be to use PF_INET in the socket call and
AF_INET in your struct sockaddr_in, but most people nowadays use the
address family everywhere. With the second argument, type, we specify whether we
want to use a connection-based protocol like TCP (SOCK_STREAM) or a protocol
without connections like UDP (SOCK_DGRAM). The third argument, protocol,
specifies which protocol we actually want to use. We could set UDP or TCP here
(IPPROTO_UDP, IPPROTO_TCP), but setting 0 works too: this selects the
default protocol for the combination of domain and type. For AF_INET the
default is UDP with SOCK_DGRAM and TCP with SOCK_STREAM. You might also see
IPPROTO_IP as protocol, which is simply defined as 0. But the above variant
seems to be the most common one.

But hey, it’s 2018, so why the heck should we limit ourselves to IPv4?
Luckily it’s easy enough to support IPv6: just replace AF_INET by AF_INET6 and
on Linux the socket will by default work with both IPv4 and IPv6 (IPv4 peers
show up as IPv4-mapped IPv6 addresses)! So don’t you dare to ever use AF_INET
anymore without a good excuse.
By the way: if you want IPv6 only you can set the socket option IPV6_V6ONLY.
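A small sketch of how that option can be set. The helper name open_udp6 is my own invention; the only assumption is a Linux system with IPv6 enabled.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open an IPv6 UDP socket and set IPV6_V6ONLY to
 * `v6only` (0 = also talk IPv4 via mapped addresses, 1 = IPv6 only).
 * Returns the socket descriptor, or -1 on error. */
int open_udp6(int v6only)
{
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Set the option right after socket(), before any bind(). */
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY,
                   &v6only, sizeof(v6only)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling open_udp6(0) makes the dual-stack behaviour explicit instead of relying on the system default, open_udp6(1) restricts the socket to IPv6.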

But we don’t want to talk about ordinary TCP/UDP sockets here! So let’s dig down
into the mysterious world of raw sockets.

The first thing I want to note is: you’ll need superuser rights for creating a
raw socket, or more precisely the CAP_NET_RAW capability; otherwise you’ll get
the error “Operation not permitted” (EPERM).

sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
sockfd = socket(AF_INET6, SOCK_RAW, IPPROTO_UDP);

The first kind of raw socket we look at is what you get by setting type to
SOCK_RAW while still setting protocol to TCP or UDP. You will still only receive
the type of packet specified (here UDP), but this time you will not only
receive the data but also the layer 4 (TCP/UDP) header, and when sending you’re
responsible for building the layer 4 header yourself.
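Here is a rough sketch of opening such a socket with the permission problem handled explicitly. The helper name open_raw_udp4 and the error messages are my own; only socket() and the errno values come from the system.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: open an IPv4 raw socket that sees UDP packets.
 * Returns the descriptor on success or -errno on failure. */
int open_raw_udp4(void)
{
    int fd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
    if (fd < 0) {
        int err = errno;
        if (err == EPERM)
            fprintf(stderr, "raw socket needs root or CAP_NET_RAW\n");
        else
            fprintf(stderr, "socket: %s\n", strerror(err));
        return -err;
    }
    return fd;
}
```

Run as a normal user this should take the EPERM branch; run as root (or with CAP_NET_RAW granted to the binary) it should return a usable descriptor.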

Contrary to above, here the choice of domain matters a lot. First of all,
AF_INET6 will here only receive IPv6 and not both! Second, what you get when you
read from the socket differs: if you read from the first variant with AF_INET
you will get the IPv4 header, the UDP/TCP header and the data; in the second
variant your read will instead return only the UDP/TCP header and the data, but
not the IPv6 header!
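For the AF_INET case this means you have to skip the IPv4 header yourself before you reach the UDP header. A minimal sketch of that step, working on a plain byte buffer so it doesn’t depend on any particular header struct; the names udp_view and split_ipv4_udp are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: ports in host byte order plus the payload. */
struct udp_view {
    uint16_t src_port;
    uint16_t dst_port;
    const uint8_t *payload;
    size_t payload_len;
};

/* Read a 16-bit big-endian (network byte order) value. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Split a buffer read from an AF_INET raw UDP socket into its parts.
 * Returns 0 on success, -1 if the buffer is too short or not IPv4. */
int split_ipv4_udp(const uint8_t *buf, size_t len, struct udp_view *out)
{
    if (len < 20 || (buf[0] >> 4) != 4)
        return -1;

    /* The low nibble of the first byte is the header length in 32-bit words. */
    size_t ip_hlen = (size_t)(buf[0] & 0x0f) * 4;
    if (ip_hlen < 20 || len < ip_hlen + 8)
        return -1;

    const uint8_t *udp = buf + ip_hlen;   /* UDP header is 8 bytes */
    out->src_port = be16(udp);
    out->dst_port = be16(udp + 2);
    out->payload = udp + 8;
    /* Uses the buffer length, not the UDP length field, for simplicity. */
    out->payload_len = len - ip_hlen - 8;
    return 0;
}
```

For an AF_INET6 raw socket the same buffer would start directly at the UDP header, so the IPv4 part of this function would simply be dropped.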

The third important difference between AF_INET and AF_INET6 for raw sockets
is the byte order: RFC 3542 requires that all data sent via IPv6 raw sockets
is in network byte order and that all data received via them is in network
byte order as well. For IPv4 raw sockets no byte order was ever specified, and
some implementations used host byte order for certain IP header fields.

If you want to send something through the socket, your packet has to include
the layer 4 header but not the IP header. (Note: this is unspecified in POSIX,
but I focus on Linux here.) But what if we want to change something in the
IP header? For IPv4 there are two options: you can set the desired
field(s) via calls to setsockopt, or, if you want to build the full header on
your own, you can use the socket option IP_HDRINCL to tell the kernel that you
will construct the header and write both header and payload to the socket:

sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
int on = 1;
setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &on, sizeof(on));

Even with IP_HDRINCL you don’t have to deal with the source address and the
packet ID: the kernel will fill them in for you if you leave them zero. The IP
checksum and the total length field are always set by the kernel, whether you
want it or not.
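To make this concrete, here is a sketch of filling a minimal IPv4 header for an IP_HDRINCL socket while leaving the fields mentioned above to the kernel. The helper name fill_ipv4_header and its parameters are my own; struct iphdr comes from netinet/ip.h on Linux.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill a minimal 20-byte IPv4 header.
 * `dst` is the destination address in network byte order,
 * `payload_len` is the size of everything after the IP header. */
void fill_ipv4_header(struct iphdr *ip, uint8_t protocol, uint8_t ttl,
                      uint32_t dst, uint16_t payload_len)
{
    memset(ip, 0, sizeof(*ip));
    ip->version  = 4;
    ip->ihl      = 5;                 /* 5 * 4 = 20 bytes, no options */
    ip->ttl      = ttl;
    ip->protocol = protocol;
    ip->daddr    = dst;
    /* The kernel overwrites the total length anyway; set it for clarity. */
    ip->tot_len  = htons((uint16_t)(sizeof(*ip) + payload_len));
    /* id, saddr and check stay zero so the kernel fills them in. */
}
```

The filled header goes at the start of the buffer, followed by the layer 4 header and the data, and the whole thing is then passed to sendto() on the IP_HDRINCL socket.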

What’s important here: IPv6 doesn’t have IP_HDRINCL or a direct equivalent,
as per RFC 3542 section 3. You can, however, still set various parameters via
setsockopt. Alternatively, the IPv6 advanced socket API employs another
framework called “ancillary data”. For outgoing packets one can set the
majority of the header fields as well as supported extension headers via
ancillary data, and for received packets the majority of the fields and
extension headers can be read with the same framework. A full description of
ancillary data is out of the scope of this article, but the basic idea is: you
specify which values you want to set via a call to setsockopt, then you write
the values for the header fields and the actual data into a struct msghdr and
send this via sendmsg().
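Just to give a taste of it, here is a minimal sketch that attaches one piece of ancillary data, the hop limit, to a single outgoing packet. The function names are invented for this example. As far as I can tell from RFC 3542, a per-packet hop limit needs no prior setsockopt call; the control message alone is enough.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: store one IPV6_HOPLIMIT control message in `ctrl`
 * and hook it into `msg`. Returns 0, or -1 if `ctrl` is too small. */
int set_hoplimit_cmsg(struct msghdr *msg, void *ctrl, size_t ctrl_len,
                      int hoplimit)
{
    if (ctrl_len < CMSG_SPACE(sizeof(int)))
        return -1;

    memset(ctrl, 0, ctrl_len);
    msg->msg_control = ctrl;
    msg->msg_controllen = CMSG_SPACE(sizeof(int));

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = IPPROTO_IPV6;
    cmsg->cmsg_type = IPV6_HOPLIMIT;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &hoplimit, sizeof(hoplimit));
    return 0;
}

/* Hypothetical helper: send `len` bytes to `dst` with the given hop limit. */
ssize_t send_with_hoplimit(int fd, const struct sockaddr_in6 *dst,
                           const void *data, size_t len, int hoplimit)
{
    /* The union keeps the control buffer aligned for struct cmsghdr. */
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = (void *)data, .iov_len = len };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_name = (void *)dst;
    msg.msg_namelen = sizeof(*dst);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    if (set_hoplimit_cmsg(&msg, ctrl.buf, sizeof(ctrl.buf), hoplimit) < 0)
        return -1;
    return sendmsg(fd, &msg, 0);
}
```

The same pattern (one cmsghdr per value, level IPPROTO_IPV6) applies to the other fields and extension headers, each with its own cmsg_type.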

If you want to send data with a transport protocol which has no user interface
you can set the protocol field to raw too:

sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);

This will automatically set IP_HDRINCL and allows you to send your data with
arbitrary layer 4 protocols. The most common use: sending ICMP packets.
Receiving data is, however, not possible with this type of socket!
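Since nobody builds the layer 4 header for you here, an ICMP packet sent this way needs its checksum computed in userspace. A small sketch of the Internet checksum (RFC 1071) and an ICMP echo request header using it; both function names are made up, and the IP header that also has to precede it on an IPPROTO_RAW socket is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum over `len` bytes, treating the data as big-endian
 * 16-bit words. The result is the value to store high byte first. */
uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)p[0] << 8;        /* pad the odd byte with zero */

    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: build an 8-byte ICMP echo request header
 * (type 8, code 0) without payload. */
void build_icmp_echo(uint8_t out[8], uint16_t id, uint16_t seq)
{
    out[0] = 8;                             /* type: echo request */
    out[1] = 0;                             /* code */
    out[2] = 0;                             /* checksum, zero while summing */
    out[3] = 0;
    out[4] = (uint8_t)(id >> 8);
    out[5] = (uint8_t)(id & 0xff);
    out[6] = (uint8_t)(seq >> 8);
    out[7] = (uint8_t)(seq & 0xff);

    uint16_t csum = inet_checksum(out, 8);
    out[2] = (uint8_t)(csum >> 8);
    out[3] = (uint8_t)(csum & 0xff);
}
```

A handy property for debugging: running inet_checksum over a header that already contains a correct checksum yields 0.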

So far we got full control over layer 4 and partial control over layer 3. It’s
time to step down one further level into the dungeon.

sockfd = socket(AF_PACKET, SOCK_DGRAM, htons(ETHERTYPE_IPV6));

This is called a packet socket; it allows you to receive and send raw
packets at the device driver level (layer 2). In the above version we used the
protocol argument to specify that we only want to receive IPv6 packets. We can
drop this requirement to receive all packets, no matter if they are IPv4, IPv6
or something else:

sockfd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_ALL));

By default, a packet socket will receive all packets matching the protocol.
You can use bind() to bind the packet socket to an interface.
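Binding works via a struct sockaddr_ll that carries the interface index. A minimal sketch, with helper names of my own choosing; which interface name you pass in (for example "eth0") is of course specific to your machine.

```c
#include <arpa/inet.h>
#include <linux/if_packet.h>
#include <net/ethernet.h>
#include <net/if.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill a sockaddr_ll for bind().
 * Returns 0 on success, -1 if the interface does not exist. */
int make_bind_addr(struct sockaddr_ll *sll, const char *ifname,
                   uint16_t ethertype)
{
    unsigned int ifindex = if_nametoindex(ifname);
    if (ifindex == 0)
        return -1;

    memset(sll, 0, sizeof(*sll));
    sll->sll_family = AF_PACKET;
    sll->sll_protocol = htons(ethertype);  /* network byte order */
    sll->sll_ifindex = (int)ifindex;
    return 0;
}

/* Hypothetical helper: bind an existing packet socket to one interface. */
int bind_packet_socket(int fd, const char *ifname, uint16_t ethertype)
{
    struct sockaddr_ll sll;

    if (make_bind_addr(&sll, ifname, ethertype) < 0)
        return -1;
    return bind(fd, (struct sockaddr *)&sll, sizeof(sll));
}
```

After something like bind_packet_socket(sockfd, "eth0", ETH_P_ALL) the socket should only deliver packets that arrived on that interface.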

Setting type to SOCK_DGRAM results in the cooked mode: when reading
from the socket you get the packet without the MAC header, but you can obtain
the MAC addresses comfortably via recvfrom(), and likewise you can use
sendto() to specify the destination in a struct sockaddr_ll.
Alternatively we can set type to SOCK_RAW:

sockfd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

This is the lowest we can get: this way Ethernet frames are passed from the
device driver to your application without any changes, including the full
layer 2 header. Likewise, when writing to the socket the user-supplied buffer
has to contain all the headers of layers 2 to 4.
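So with this kind of socket the first thing in your read buffer is the Ethernet header. A small sketch of picking it apart; the function name parse_ether_frame is invented, struct ether_header comes from net/ethernet.h, and VLAN tags are ignored for simplicity.

```c
#include <arpa/inet.h>
#include <net/ethernet.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the source MAC of a raw frame into `mac`
 * (at least 18 bytes) and return the EtherType in host byte order,
 * or -1 if the frame is shorter than an Ethernet header. */
int parse_ether_frame(const uint8_t *frame, size_t len, char mac[18])
{
    struct ether_header eh;

    if (len < sizeof(eh))
        return -1;
    memcpy(&eh, frame, sizeof(eh));        /* avoid unaligned access */

    snprintf(mac, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
             eh.ether_shost[0], eh.ether_shost[1], eh.ether_shost[2],
             eh.ether_shost[3], eh.ether_shost[4], eh.ether_shost[5]);

    /* The EtherType is transmitted in network byte order. */
    return ntohs(eh.ether_type);
}
```

The layer 3 header then starts sizeof(struct ether_header) bytes into the buffer, and the return value (for example ETHERTYPE_IPV6) tells you how to interpret it.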

This is the deepest we can go in userspace – at this point we have full control
of the complete ethernet frame. I hope you enjoyed our journey into the rabbit
hole.

Sources and further readings:

  • Beej’s Guide to Network Programming
  • socket(7)
  • raw(7)
  • packet(7)
  • sendto(2), recvfrom(2)
  • UNIX Network Programming, Volume 1 by W. Richard Stevens
  • IPv6 Core Protocols Implementation by Qing Li, Tatuya Jinmei, Keiichi Shima
  • IPv6 Socket API Extensions: Programmer’s Guide by Qing Li, Tatuya Jinmei, Keiichi Shima
  • Linux Kernel source code
