Network Layer

Network Layer

Back to ece358

Overview

  • Sending side, chunks segments into datagrams
  • Receiving side, delivers segments to transport layer
  • Key functions:
    • Forwarding: move packets from router’s input to appropriate output router (getting packet through single interchange)
    • Routing: determine route taken by packets from source to destination (planning packet route from source to destination)
  • Connection setup
    • Before sending datagrams, two hosts and intervening routers establish virtual connection
    • Distinction between network and transport:
      • network: between two hosts (may involve intervening routers)
      • transport: between two processes
  • Services, per datagram
    • guaranteed delivery
    • guaranteed delivery with less than 40 msec delay
  • Services for a flow of datagrams
    • in order delivery
    • guaranteed minimum bandwidth
    • restrictions on inter-packet spacing

Connection, connectionless service

  • datagram network provides network layer connectionless service
  • virtual circuit network provides network layer connection service
  • analogous to TCP/UDP connection-oriented/connectionless transport layer services, but:
    • service: host-to-host
    • no choice: network provides one or the other
    • implementation: in network core

Virtual Circuit

  • source-to-destination path behaves like a phone circuit
  • call setup, teardown for each call before data can flow
  • each packet carries VC identifier, not destination host address
  • every router on source-to-destination path has state for each passing connection
  • link, router resources (bandwidth and buffers) may be allocated to VC
    • dedicated resources gives predictable service

Datagram network

  • no call setup at network layer
  • no state about connections in routers
  • packets forwarded using destination host address
  • forwarding table has ranges of addresses instead of individual addresses
    • Longest prefix matching: looks for forwarding table entry for a destination address that has the longest matching address prefix

Router architecture

  • Input, output ports
  • Switching fabric
    • forwards data
    • Transfer packet from input buffer to output buffer
    • switching rate: rate at which packets can transfer from input to output
      • often measured as multiple of i/o line rate
      • \(n\)

        inputs: switching rate

        \(n\)

        times line rate is desirable

    • types: memory, bus, crossbar
  • routing processor
    • Runs routing algorithms and protocols
  • Has line termination, link layer protocol receiver, lookup forwarding and queueing

Input port functions

Decentralized switching

  • given datagram destination, look up output port using forwarding table in input port memory (match plus action)
  • goal: complete input port processing at “line speed”
  • queueing: if datagrams arrive faster than can be outputted, queue them up

Switching factors

  • transfer packet from input buffer to appropriate output buffer
  • Switching via memory
    • Traditional computers with switching under direct control of CPU
    • packet copied to system’s memory
    • spreed limited by memory bandwidth (2 bus crossings per datagram)
  • Switching via bus
    • datagram from input port memory to output port memory via shared bus
    • bus contention: switching speed limited by bandwidth
    • 32 Gbps: sufficient for access and enterprise routers
  • Switching via interconnection network
    • Overcome bus bandwidth limitation
    • banyan networks, crossbar, other interconnection nets initially developed to connect processors in multiprocessors
    • fragments datagram into fixed cell lengths, switch cells through fabric
    • switches 60 Gbps through the interconnection network

Output port functions

  • Same as input ports but in opposite order
  • buffering required for datagrams that arrive faster from fabric than transmission rate
  • scheduling discipline chooses among queued datagrams for transmission
  • Avg buffering equal to typical RTT (e.g. 250 msec) times link capacity

    \(C\)

    • e.g.

      \(C\)

      = 10 Gbps, link is 2.5 Gbit buffer

    • Recent recommendation: with

      \(N\)

      flows, buffering is equal to

      \(\frac{RTT \cdot C}{\sqrt{N}}\)

Queueing

  • input queueing happens when transmission rate of input is faster than fabric rate
  • Head-of-the-line (HOL) blocking: queued datagram at front of queue prevents other in queue from moving forward

Internet Protocol

  • IP has addressing conventions, datagram format, and packet handling conventions
  • ICMP (Internet Control Message Protocol) is for error reporting and router signalling

Datagram format

32 bits

version
head len
type of service
length

16-bit identifier
flags
fragment offset

time to live
upper layer
header checksum

32 bit source IP address
32 bit destination IP address
options, if any
data (variable lenfth, typically a TCP or UDP segment)

  • Time to live is the max number of remaining hops, decremented at each router
  • Overhead: 20 bytes of TCP header + 20 bytes of IP header = 40 bytes, plus app layer overhead

Fragmentation, reassembly

  • network links have maximum transfer size MTU: largest possible link level frame
    • different link types have different MTUs
  • large IP datagram becomes divided (“fragmented”) within network
    • one datagram becomes several
    • reassembled only at final destination
    • IP header bits used to identify, order related fragments
  • Have length, ID, frag flag, offset
    • fragments are specified in units of 8 byte offsets
    • fragflag = 0 implies that it is the last one, so when received, the datagram can be reassembled

IP Addressing

  • 32 bit identifier for host, router interface
  • interface: connection between router/host and physical link
    • routers typically have multiple interfaces
    • host typically has one or two interfaces (e.g. wired ethernet, wireless 802.11)
  • IP addresses associated with each interface

Subnets

  • A subnet includes device interfaces with same subnet part of IP address
    • can physically reach other without a router
  • Address
    • Subnet part: high order bits
    • Host part: low order bits
  • subnet mask: e.g. /24 at end of address means that the most significant 24 bits are the fixed subnet part, and the rest are the host part.
    • Class A: /8
    • Class B: /16
    • Class C: /24

CIDR

  • Classless InterDomain Routing
  • Subnet portion of address can be arbitrary length
  • format: a.b.c.d/x, where x is the number of bits in the subnet portion
  • All ones in the host part is used for broadcast
  • All zeros is used for network identification

Getting an IP address

  • hard-coded sometimes
  • DHCP

DHCP

  • Goal is to allow any computer to automatically get an IP address
  • leases an address, can be renewed
  • allows reuse of addresses (only holds while connected)
  • host broadcasts DHCP discover message
  • DHCP server responds with DHCP offer
  • host requests IP address with DHCP request msg
  • DHCP server replies with DHCP acknowledgement
  • Port 67 reserved for DHCP server, Port 68 reserved for incoming DHCP client
  • Server uses broadcast to reply because the client doesn’t have an address yet
  • After receiving a reply from the server, the client knows the server’s address, and can reply directly to the server

  • How servers pick IPs
    • network gets allocated a portion of its provider ISP’s address space
    • ISP gets block from ICANN: Internet Corporation for Assigned Names and Numbers

Hierarchical Routing

  • “send me anything with addresses beginning (e.g.) 200.23.16.0/20”
  • Gets progressively more specific

NAT: Network Address Translation

  • All datagrams leaving local network have the same single source NAT IP address, assigned by ISP, different source port numbers
  • Datagrams with source or destination in this network have 10.0.0/24 address for sources, destination
  • Outside IP address can change without affecting local IP addresses and vice versa (e.g. changing an ISP)
  • Router has NAT translation table from WAN side address to LAN side address
  • Talking to computers on another local network:
    • manual configuration
    • IGD
    • both connect to relay

ICMP: Internet Control Message Protocol

  • used by hosts and routers to communicate network-level information
  • ICMP message: type, code, plus first 8 bytes of datagram causing error

Traceroute

  • Send series of UDP segments to destination
    • First has TTL=1, second with TTL=2, etc
    • unlikely port number
  • When nth set of datagrams arrives to nth router:
    • router discards datagrams
    • sends source ICMP message for TTL expired (Type 11, code 0)
    • ICMP messages include name of router and IP address
  • When ICMP messages arrive, source records RTTs
  • Stops when:
    • UDP segment eventually arrives at destination host
    • destination returns ICMP “port unreachable” (type 3, code 3), because it arrived, but can’t do anything with the port specified
    • source stops

IPv6

  • 32-bit address space completely allocated
  • In IPv6, there is a fixed 40 byte header, with no fragmentation allowed

ver
pri
flow label

payload len
next hdr
hop limit

source address (128 bits)
destination address (128 bits)
data

  • No checksum
  • No options field, but is allowed outside of header, indicated by “next header” field
  • ICMPv6
    • additional types like “packet too big”

Transition from IPv4

  • not all routers can be upgraded simultaneously
  • Tunneling: IPv6 datagram carried as a payload in IPv4 datagram among IPv4 routers

Routing algorithms

Graph abstraction

A graph \(G = (N, E)\). \(c(x, x’)\) is the cost of the link \((x, x’)\).

Question: what is the least cost path from the source to the destination?

  • global:
    • all routers have complete info on topology
    • “link state” algorithms
  • decentralized
    • router knows physically connected neighbours, link costs to neughbours
    • iterative process of computation, exchange of info with neighbours
    • “distance vector” algorithms
  • static
    • routes change slowly over time
  • dynamic
    • routes change more quickly
    • periodic update
    • in response to link cost changes
  • e.g. Dijkstra’s algorithm
  • \(c(x, x’)\)

    edge weights

  • \(D(v)\)

    current value of cost path from source to dest

    \(v\)

  • \(p(v)\)

    : predecessor

  • \(N’\)

    set of routers who we know the least cost path for

Distance vector

Bellman-Ford equation (DP)
Let \(d_x(y) :=\) cost of least cost path from \(x\) to \(y\)
Then, \(d_x(y) = \min\{c(x,v) + d_v(y)\}\)

  • \(D_x(y)\)

    is the estimate of least cost from

    \(x\)

    to

    \(y\)

  • \(x\)

    maintains the distance vector

    \(D_x = [D_x(y): y \in N]\)

  • For each node

    \(x\)

    it knows the distance to each of its neighbours

  • From time to time, each node sends its own distance bector estimate to neghbours
  • when

    \(x\)

    receives new DV estimate from neighbour, it updates its own DV using B-F equation:

    • \(D_x(y) \leftarrow \min_v\{x(x,v) + D_v(y)\} \quad \forall y \in N\)

  • poisoned reverse:
    • if Z routes through Y to get to X:
      • X tells Y its distance to X is infinite
  • message complexity
    • LS: with n nodes, E links,

      \(O(nE)\)

    • DV: exchange between neighbours only. Convergence time varies
  • Speed of convergence
    • LS:

      \(O(n^2)\)

      , requires

      \(O(nE)\)

      messages. May have oscillations

    • DV: convergence time varies. May be routing loops, count-to-infinity problem
  • Robustness
    • LS: node can advertise incorrect link cost, each node computes only its own table
    • DV: DV node can advertise incorrect path cost
    • each node’s table is used by others so the error propagates through the network

Hierarchical Routing

  • aggregate routers into regions, “autonomous systems”
  • routers in same AS run same routing protocol
    • intra-AS routing protocol
    • routers in different AS can run different intra-AS routing protocols
  • gateway router:
    • at edge of its own AS
    • has link to router in another AS
  • forwarding table configured by both intra- and inter-AS routing algorithms

Inter-AS tasks

  • Router in AS1 receives datagram destined for outside AS1
  • AS1 must learn which dests are reachable through AS2, and which through AS3
  • AS1 must propagate this reachability info to all routers in AS1
  • Use internal routing to find shortest path to exit nodes in AS1, add those to forwarding table
  • At output nodes, the connecting nodes in other ASes give info to AS1 that they can both reach the final destination
    • Hot potato routing: pick closest one to forward to that will still get to the destination

Internet Intra-AS Routing

  • AKA interior gateway protocols (IGP)

RIP: Routing Information Protocol

  • included in BSD-UNIX since 1982
  • distance vector algorithm
    • distance metric: number of hops (up to 15 max), each link has cost 1
    • DVs exchanged with neighbours every 30 seconds in response message (this is advertisement)
    • each advertisement: list of up to 25 destination subnets and the hops to get there
  • Link failure
    • If no advertisement heard after 180s, neighbour/link declared dead
    • routes via neighbour invalidated
    • new advertisements sent to neighbours
    • neighbours in turn send out new advertisements
    • poison reverse used to stop infinite loops: max 16 hops
  • RIP routing tables managed by application level process called routed
    • Advertisements sent as UDP packets

OSPF: Open Shortest Path First

  • uses link state
    • topology map at each node, route computation using Dijkstra’s
  • advertisement carries one entry per neighbour
    • broadcast link state every 30 minutes even if there is no change
  • advertisements flooded to entire AS
    • carried in OSPF messages directly over IP, not TCP or UDP
  • IS-IS routing protocol is nearly identical to OSPF
  • Adds security: messages authenticated
  • multiple same-cost paths allowed (compared to only one in RIP)
  • for each link, multiple cost metrics for different types of service (e.g. satellite link cost set to “low” for best effort ToS, high for real time ToS)
  • integrated uni- and multi-cast support: Multicast OSPF (MOSPF) uses same topology data base as OSPF
  • hierarchical OSPF in large domains

BGP: Border Gateway Protocol

  • the de facto inter-domain routing protocol
  • BGP provides each AS as a means to:
    • eBGP: (external) obtain subnet reachability information form neighbour ASes
    • iBGP: (internal) propagate reachability info to all AS-internal routers
    • determine good routes to other networks based on reachability info and policy
  • Allows subnet to advertise its existence on internet
  • Messages, sent over TCP:
    • OPEN: opens TCP connection to peer and authenticates sender
    • UPDATE: advertises new path or withdraws old
    • KEEPALIVE: keeps connection open in apsence of UPDATEs, also ACKs OPEN request
    • NOTIFICATION: reports errors in previous msg, also used to close connections
  • BGP session: two BGP peers echange BGP messages:
    • advertising paths to different destination network prefices (“path vector” protocol)
    • exchanged over semi permanent TCP connections
  • Advertised prefix includes BGP attributes:
    • AS-PATH: contains ASes through which prefix advertisement has passed
    • NEXT-HOP: indicates internal-AS router to next-hop AS
  • Route selection
    • local preference value attribute: policy decision
    • shortest AS-PATH
    • closest NEXT-HOP router
    • additional criteria
  • Routing policies
    • inter-AS: admin wants control over how its traffic is routed, who routes through its network
    • intra-AS: single admin, so no policy decisions needed
  • Scale
    • hierarchical routing saves table size, reduced update traffic
  • Performance
    • intra-AS: can focus on performance
    • inter-AS: policy may dominate over performance

Broadcast Routing

  • Deliver packets from source to all other nodes
  • Source transmit

    \(n\)

    copies to the

    \(n\)

    destinations using unicast routing.

    \(n\)

    -way-unicast (source duplication)

    • Simple but inefficient
    • how does source determine recipient addresses?
      • need additional protocol mechanisms (e.g. broadcast membership.) Adds overhead
  • Uncontrolled flooding
    • When node receives broadcast, it sends to all neighbours except from the source it came from
      • Problem: cycles lead to endless flooding
      • Problem: broadcast storm. When a node is connected to more than two other nodes, it will create and forward multiple copies of the broadcast, resulting in endless multiplication of packets
  • Controlled flooding
    • Only broadcast if it hasn’t broadcast the same packet before
    • Sequence number controlled flooding: source puts its address and a sequence number into the packet, and sends that to all neighbours
    • Each node maintains a list of source address and sequence number of each broadcast packed it has received and forwarded
    • When a node receives a broadcast packet, it first checks whether the packed is in the list. If so, it is dropped. Otherwise, it is duplicated and forwarded and added to the list
  • Reverse path forwarding (RPF)
    • only forward packet if it arrived on the shortest path between node and source
    • RPF does not require that a router knows the complete shortest path. Only needs to know the next neighbour on its unicast shortest path to the sender
  • Spanning tree
    • No redundant packets received by any node
    • first, construct spanning tree
      • Each node sends unicast join message to centre node
      • message forwarded until it arrives at a node already in tree
    • nodes then forward or make copies along spanning tree