 0             7              15                              31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ver  | hlen  |      TOS      |         Total Length          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Identification         |Flags|     Fragment Offset     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      TTL      |   Protocol    |          IP Checksum          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source IP Address                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination IP Address                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    IP Options (if any)                     ...|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The ver field is set to 4 to indicate IPv4. The hlen field specifies
the length of the IP header in 32-bit words. The TOS (Type-Of-Service)
field indicates the type of service the packet should receive from
routers. This field is broken up into several pieces.
+-+-+-+-+-+-+-+-+
|Prec |D|T|R|M|0|
+-+-+-+-+-+-+-+-+
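The Prec field is unused today, but once meant precedence. The last
field is unused and is set to 0. The other fields are single-bit flags:
D (minimize delay), T (maximize throughput), R (maximize reliability),
and M (minimize monetary cost).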
The total length field contains the total length, in bytes, of the IP header and the data it carries. IP performs fragmentation and reassembly and uses the flags, the Identification field, and the fragment offset field to do so. The identification field is a unique value for each datagram, and each fragment of the same datagram uses the same identification value. One of the flags is a "more fragments" flag. It indicates that more fragments of this datagram are forthcoming; the last fragment of a datagram does not have this flag set. The "don't fragment" flag specifies that a datagram cannot be fragmented by routers along its path. The offset field specifies, in units of 8 bytes, how far from the beginning of the original datagram a particular fragment's data starts.
Routers will fragment a datagram if the MTU of the outgoing media requires it. Likewise, a sender will fragment a datagram if the media it must send over has an MTU that requires it. Reassembly is done only at the final destination of a datagram. Reassembly uses a timer that is set when the first fragment arrives. If it expires before all the fragments have been received, the whole datagram is discarded.
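To make the fragmentation bookkeeping concrete, here is a minimal sketch of how a datagram's data might be split for a given MTU. The function name `fragment` and its parameters are made up for illustration. It relies on the fact (from RFC 791) that the offset field counts 8-byte units, so every fragment except the last carries a multiple of 8 data bytes, and it assumes a 20-byte header with no options.

```python
def fragment(payload_len, mtu, header_len=20):
    """Split payload_len data bytes into (offset_units, data_len, more_fragments) tuples."""
    # All fragments but the last must carry a multiple of 8 data bytes,
    # because the offset field counts 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU leaves no room for data")
    fragments = []
    offset = 0
    while offset < payload_len:
        data_len = min(max_data, payload_len - offset)
        more_fragments = offset + data_len < payload_len
        fragments.append((offset // 8, data_len, more_fragments))
        offset += data_len
    return fragments


for frag in fragment(4000, 1500):
    print(frag)
# (0, 1480, True)
# (185, 1480, True)
# (370, 1040, False)
```

Only the last tuple has the "more fragments" flag cleared, which is how the destination knows it has seen the end of the datagram.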
The TTL (Time-To-Live) field is used to keep routing loops from allowing datagrams to stay on the network indefinitely. This value is decremented by 1 by each router the datagram passes through. When the value reaches 0, the datagram is discarded and is not forwarded any further.
The IP checksum is only calculated over the IP header. It is not calculated over the data.
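As an illustration, the header checksum is the 16-bit one's complement of the one's complement sum of the header's 16-bit words, computed with the checksum field itself set to 0. The sketch below shows one way to compute it; the function name `ip_checksum` and the sample header bytes are illustrative, not taken from these notes.

```python
import struct


def ip_checksum(header: bytes) -> int:
    """16-bit one's complement of the one's complement sum of the header."""
    if len(header) % 2:
        header += b"\x00"  # pad to a whole number of 16-bit words
    words = struct.unpack("!%dH" % (len(header) // 2), header)
    total = sum(words)
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample 20-byte header with the checksum field (bytes 10-11) zeroed.
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(ip_checksum(sample)))  # 0xb861
```

A receiver can run the same function over the header as received, checksum field included; a result of 0 means the header arrived intact.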
IP provides numerous options, including security, timestamps, route recording, route specification (source routing), and Router Alert.
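Pulling the header fields together, the following sketch unpacks the fixed 20-byte part of an IPv4 header with Python's `struct` module and treats any bytes between offset 20 and hlen*4 as options. The function name `parse_ipv4_header`, the dictionary keys, and the sample bytes are all illustrative choices.

```python
import socket
import struct


def parse_ipv4_header(packet: bytes) -> dict:
    """Unpack the fixed 20-byte IPv4 header; options (if any) follow it."""
    (ver_hlen, tos, total_len, ident, flags_off,
     ttl, proto, cksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    hlen_bytes = (ver_hlen & 0x0F) * 4  # hlen is counted in 32-bit words
    return {
        "ver": ver_hlen >> 4,
        "hlen_bytes": hlen_bytes,
        "tos": tos,
        "total_length": total_len,
        "identification": ident,
        "dont_fragment": bool(flags_off & 0x4000),
        "more_fragments": bool(flags_off & 0x2000),
        "fragment_offset": (flags_off & 0x1FFF) * 8,  # converted to bytes
        "ttl": ttl,
        "protocol": proto,
        "checksum": cksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "options": packet[20:hlen_bytes],
    }


hdr = parse_ipv4_header(bytes.fromhex("45000073000040004011b861c0a80001c0a800c7"))
print(hdr["ver"], hdr["hlen_bytes"], hdr["ttl"], hdr["src"], hdr["dst"])
# 4 20 64 192.168.0.1 192.168.0.199
```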
 0             7              15                              31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     type      |     code      |         ICMP checksum         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          contents and format depend on type and code          |
ICMP messages are either queries or error messages. Most ICMP error
messages contain the IP header and the first 8 bytes of data from
the IP datagram that generated the error. Notice that this will include
the UDP or TCP ports if those protocols were used.
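To see why the first 8 data bytes are enough to recover the ports: both UDP and TCP put the source and destination ports in the first 4 bytes of their headers. The sketch below pulls them out of the quoted datagram. The function name `quoted_ports` is hypothetical, and it assumes the caller has already stripped the 8-byte ICMP header from the error message.

```python
import struct


def quoted_ports(quoted: bytes):
    """quoted: the offending IP header plus its first 8 data bytes."""
    hlen = (quoted[0] & 0x0F) * 4      # header length of the offending datagram
    protocol = quoted[9]               # 6 = TCP, 17 = UDP
    if protocol not in (6, 17):
        return None
    src_port, dst_port = struct.unpack("!HH", quoted[hlen:hlen + 4])
    return protocol, src_port, dst_port


# Illustrative input: a 20-byte IP header (protocol 17) followed by a UDP header.
ip_header = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
udp_header = struct.pack("!HHHH", 5000, 53, 95, 0)
print(quoted_ports(ip_header + udp_header))  # (17, 5000, 53)
```

This is how a host can match an ICMP error back to the socket that sent the offending datagram.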
ICMP error messages are never generated for:
 0             7              15                              31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     type      |     code      |         ICMP checksum         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Identification         |        sequence number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Optional Data                         |
The code field is set to 0. The type field is set to 8 for an echo
request and 0 for an echo reply. The Identification field is used
to identify which application is sending the ping if multiple pings
are running. It is usually set to the process ID of the application.
The sequence number field is used to indicate which request a reply
is for. The sequence of events is that a ping application sends an echo
request and hopefully gets an echo reply from the destination. The ping
application can compute the RTT by recording the time the echo request
was transmitted and the time the echo reply was received.
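As a rough sketch of the echo format described above, the code below builds an echo request and checks whether a received message is the matching reply. The function names are invented for illustration. The ICMP checksum uses the same one's complement sum as the IP header checksum, but it covers the whole ICMP message, data included. Actually sending the packet needs a raw socket and is left out.

```python
import struct


def checksum(data: bytes) -> int:
    """One's complement checksum over the whole ICMP message."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    # type 8 = echo request, code 0; checksum computed with the field zeroed.
    # ident must fit in 16 bits (e.g. os.getpid() & 0xFFFF).
    unsummed = struct.pack("!BBHHH", 8, 0, 0, ident, seq) + payload
    return struct.pack("!BBHHH", 8, 0, checksum(unsummed), ident, seq) + payload


def is_reply_for(message: bytes, ident: int, seq: int) -> bool:
    # type 0 = echo reply; Identification and sequence number must match.
    msg_type, code, _, msg_ident, msg_seq = struct.unpack("!BBHHH", message[:8])
    return (msg_type, code, msg_ident, msg_seq) == (0, 0, ident, seq)


request = build_echo_request(ident=1234, seq=1, payload=b"ping")
print(request[:8].hex())
```

A ping program would note `time.monotonic()` just before sending `request` and again when `is_reply_for` first returns true; the difference is the RTT.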
| Destination | Gateway | Flags | Interface |
|---|---|---|---|
| localhost | localhost | host | lo0 |
| 157.182.194.0 | naur | none | hme0 |
| default | 157.182.194.1 | gateway | hme0 |
The route command is used on most hosts to update the
routing table at system initialization time.
Once a message is received by a router, it consults its routing table and sees if it can get the message closer to its eventual destination by sending it to other routers.
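A table lookup like the one described can be sketched with Python's `ipaddress` module, using the three entries from the sample table above. The prefix lengths are assumptions (the table does not show netmasks): /32 for the localhost entry, /24 for 157.182.194.0, and /0 for default. The most specific matching entry wins.

```python
import ipaddress

# (destination network, gateway or None if directly reachable, interface)
ROUTES = [
    (ipaddress.ip_network("127.0.0.1/32"), None, "lo0"),
    (ipaddress.ip_network("157.182.194.0/24"), None, "hme0"),   # /24 is assumed
    (ipaddress.ip_network("0.0.0.0/0"), "157.182.194.1", "hme0"),  # default
]


def lookup(dst: str):
    """Return (next hop, interface) for dst using the most specific route."""
    addr = ipaddress.ip_address(dst)
    matches = [route for route in ROUTES if addr in route[0]]
    network, gateway, interface = max(matches, key=lambda r: r[0].prefixlen)
    # With no gateway the destination is on the attached network itself.
    return (gateway or dst, interface)


print(lookup("157.182.194.20"))  # ('157.182.194.20', 'hme0')
print(lookup("8.8.8.8"))         # ('157.182.194.1', 'hme0')
```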
Routing protocols are protocols designed to be used by routers to dynamically update their routing tables. Routers communicate by exchanging information and update their routing tables. Routing daemons, such as routed and gated, are the mechanisms that control the routing protocols and update the routing table.
Routing policy is the process of determining which routes to place in a table based on social, contractual, and technical agreements. To make routing scalable, a hierarchical approach is required. Each entity that administers a set of routers is called an Autonomous System (AS). Routing within an AS is controlled by Interior Gateway Protocols (IGPs). Routing between ASes is controlled by Exterior Gateway Protocols (EGPs). IGPs are intradomain routing protocols and include RIP and OSPF. EGPs are interdomain routing protocols and include EGP and BGP. Common routing daemons are routed and gated. Routed supports RIPv1. Gated v2 supports RIPv1, EGP, and BGPv1. Gated v3 supports RIPv1, RIPv2, OSPFv2, EGP, BGPv2, and BGPv3.
A distance vector routing algorithm maintains, at each router, the distance from that router to each possible destination. Distances are computed using the information in the neighbors' distance vectors. For example, suppose I am a router and one neighbor says that home.net is 10 hops away, another says it is 5 hops away, another says it is 4 hops away, and another says it is 3 hops away. If I need to send something to home.net, I would send it to the neighbor that says it is 3 hops away.
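The home.net example boils down to taking the minimum over the neighbors' advertised distances and adding the cost of reaching that neighbor. Here is a minimal sketch; the neighbor names `n1` to `n4` and the assumption that every neighbor is exactly one hop away are illustrative.

```python
def best_route(advertised, link_cost=1):
    """advertised: {neighbor: distance the neighbor reports to the destination}."""
    neighbor = min(advertised, key=advertised.get)
    # My own distance is the neighbor's distance plus the hop to reach it.
    return neighbor, advertised[neighbor] + link_cost


home_net = {"n1": 10, "n2": 5, "n3": 4, "n4": 3}
print(best_route(home_net))  # ('n4', 4)
```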
Distance vector routing has one big problem. It is called "counting to infinity", and it increases how long the algorithm takes to converge after a change. Imagine a simple routing setup that is a chain: A is directly connected to B, and B is directly connected to C. Initially, A believes C is 2 hops away and B believes that C is 1 hop away. Now imagine that the link connecting B and C breaks. B consults its information and sees that A is 2 hops away from C (B does not know that A calculated its distance based on B). So B calculates its distance to C as 2+1=3. This new information causes A to recalculate its distance to C as 3+1=4. This continues until both A and B reach the predefined number of hops called infinity (meaning not connected).
The two ways of fixing this are to include hop information in the distance vectors or to use "split horizon". Split horizon doesn't fix the problem in the general case, but it does help in most cases. Split horizon follows a simple rule: if R forwards traffic for destination D through neighbor N, then R reports to N that R's distance to D is infinity.
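The A-B-C chain can be simulated in a few lines to watch the count to infinity happen. This is a simplified sketch: it assumes infinity is 16 (the value RIP uses), that A and B take turns exchanging vectors, and that each round is one B update followed by one A update. The function name `simulate` is made up.

```python
INFINITY = 16


def simulate(split_horizon, max_rounds=20):
    """Track (A's distance to C, B's distance to C) after the B-C link fails."""
    dist_a, dist_b = 2, INFINITY   # A still believes 2 (via B); B lost its link
    history = [(dist_a, dist_b)]
    for _ in range(max_rounds):
        # A routes to C through B, so split horizon makes A report infinity to B.
        adv_a = INFINITY if split_horizon else dist_a
        dist_b = min(INFINITY, adv_a + 1)
        # B's route (if any) goes through A, so the same rule applies to B.
        adv_b = INFINITY if split_horizon else dist_b
        dist_a = min(INFINITY, adv_b + 1)
        history.append((dist_a, dist_b))
        if dist_a == INFINITY and dist_b == INFINITY:
            break
    return history


print(simulate(split_horizon=False))
# [(2, 16), (4, 3), (6, 5), (8, 7), (10, 9), (12, 11), (14, 13), (16, 15), (16, 16)]
print(simulate(split_horizon=True))
# [(2, 16), (16, 16)]
```

Without split horizon it takes eight rounds for both routers to give up on C; with it, they agree after one.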
Link-state routing is more complicated. Each router must actively test the links to its neighbors and advertise their status to other routers. This dissemination can be tricky. After getting this information, each router can then calculate the distance and path to every other router. Let's look at a graphical example below.
    6       2      5
A ----- B ----- C ---\
|2      |1      |2     G
D ----- E ----- F ---/
    2       4      1
Link State Database:
A: B/6, D/2
B: A/6, C/2, E/1
C: B/2, F/2, G/5
D: A/2, E/2
E: B/1, D/2, F/4
F: C/2, E/4, G/1
G: C/5, F/1
The database contains the link state for each node. Each node's entry
contains a list of its neighbors and their distances. Each node can
compute the path to every other node by using a modified version of
Dijkstra's shortest-path algorithm. Take node C, for example: it would
compute its routes as the following tree. The numbers in parentheses are
the total path costs from C to each destination.
   F --- G
  /(2)   (3)
 C
  \(2)   (3)   (5)   (7)
   B --- E --- D --- A
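The tree for C can be reproduced by running Dijkstra's algorithm over the link state database listed above. The sketch below is a plain textbook version using a heap, not the modified variant a real routing daemon would run; the names `LSDB` and `shortest_paths` are illustrative.

```python
import heapq

LSDB = {
    "A": {"B": 6, "D": 2},
    "B": {"A": 6, "C": 2, "E": 1},
    "C": {"B": 2, "F": 2, "G": 5},
    "D": {"A": 2, "E": 2},
    "E": {"B": 1, "D": 2, "F": 4},
    "F": {"C": 2, "E": 4, "G": 1},
    "G": {"C": 5, "F": 1},
}


def shortest_paths(lsdb, source):
    """Return (cost to each node, parent of each node in the shortest-path tree)."""
    dist = {source: 0}
    parent = {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for neighbor, cost in lsdb[node].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, parent


dist, parent = shortest_paths(LSDB, "C")
print(", ".join(f"{n}={dist[n]}" for n in sorted(dist)))
# A=7, B=2, C=0, D=5, E=3, F=2, G=3
```

The `parent` map gives the tree edges: G hangs off F, and A is reached through B, E, and D rather than over the direct but expensive A-B link.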
 0             7              15                              31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    command    |      ver      |               0               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ---
|        address family         |               0               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          IP address                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ route
|                               0                               | (20 bytes)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               0                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         metric (1-16)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ---
A RIP message carries at most 25 routes of 20 bytes each.
The address family field is set to 2 for IP. The command field can take
on the value 1 for request, 2 for reply, 5 for poll, or 6 for pollentry.
Values of 3 and 4 are obsolete, and 5 and 6 are undocumented. Ver is
set to 1 in this case.
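The layout above maps directly onto `struct` format strings: a 4-byte header followed by 20-byte route entries. The sketch below packs and unpacks a RIPv1 reply; the function names and the example addresses are hypothetical, and no validation beyond the 25-route limit is attempted.

```python
import socket
import struct

HEADER = "!BBH"      # command, ver, must-be-zero
ROUTE = "!HH4sIII"   # address family, 0, IP address, 0, 0, metric (20 bytes)


def build_rip_reply(routes, version=1):
    """routes: list of (dotted-quad IP string, metric) pairs, at most 25."""
    if len(routes) > 25:
        raise ValueError("a RIP message holds at most 25 routes")
    message = struct.pack(HEADER, 2, version, 0)  # command 2 = reply
    for ip, metric in routes:
        message += struct.pack(ROUTE, 2, 0, socket.inet_aton(ip), 0, 0, metric)
    return message


def parse_rip(message):
    command, version, _ = struct.unpack(HEADER, message[:4])
    routes = []
    for offset in range(4, len(message), 20):
        _, _, addr, _, _, metric = struct.unpack(ROUTE, message[offset:offset + 20])
        routes.append((socket.inet_ntoa(addr), metric))
    return command, version, routes


msg = build_rip_reply([("10.1.0.0", 3), ("10.2.0.0", 16)])
print(len(msg), parse_rip(msg))
# 44 (2, 1, [('10.1.0.0', 3), ('10.2.0.0', 16)])
```

With the full 25 routes the message comes to 4 + 25 * 20 = 504 bytes.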
RIP's operation begins with a broadcast request out all interfaces. When a request is received, the router checks the address family of the request. If it is 0 and the metric is 16, the router responds with its entire routing table. If the address family is not 0, the router responds with the value it has in its table for each requested IP address: if it has the address, it sends the metric it has; if it doesn't, it sends a response with the metric set to 16 (infinity). When a response is received by a router, it validates the entries and then updates its routing table. This validation is usually very informal. A router regularly (every 30 seconds or so) broadcasts its entire routing table to its neighbors. When the metric for a route changes, a router also sends the changed routes to its neighbors.
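The request-handling rule can be written down as a small function. This is only a sketch of the decision logic under simplifying assumptions: requests are already parsed into `(address family, IP, metric)` tuples, the routing table is a plain dictionary from IP to metric, and a "whole table" request is a single entry with family 0 and metric 16. The name `handle_request` is made up.

```python
INFINITY = 16


def handle_request(entries, table):
    """entries: list of (family, ip, metric); table: {ip: metric}."""
    whole_table = (
        len(entries) == 1
        and entries[0][0] == 0
        and entries[0][2] == INFINITY
    )
    if whole_table:
        return sorted(table.items())
    # Otherwise answer entry by entry; unknown addresses get metric 16.
    return [(ip, table.get(ip, INFINITY)) for _, ip, _ in entries]


table = {"10.1.0.0": 3, "10.2.0.0": 5}
print(handle_request([(0, "0.0.0.0", 16)], table))
# [('10.1.0.0', 3), ('10.2.0.0', 5)]
print(handle_request([(2, "10.1.0.0", 0), (2, "10.9.0.0", 0)], table))
# [('10.1.0.0', 3), ('10.9.0.0', 16)]
```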
In RIP, each route has a lifetime of about 3 minutes. If no update for a route is received in 3 minutes, the metric for the route is set to 16 and the route is marked for deletion. If no update is received for an additional 60 seconds, the route is then deleted.
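The two timers can be sketched as a periodic aging pass over the table. The table layout (a dictionary with `metric` and `updated` keys per destination) and the function name `age_routes` are assumptions made for this example; times are in seconds.

```python
INFINITY = 16
TIMEOUT = 180   # about 3 minutes without an update
GARBAGE = 60    # additional wait before the route is deleted


def age_routes(table, now):
    """table: {dest: {"metric": int, "updated": float}}; modified in place."""
    for dest in list(table):
        age = now - table[dest]["updated"]
        if age >= TIMEOUT + GARBAGE:
            del table[dest]                     # expired and never refreshed
        elif age >= TIMEOUT:
            table[dest]["metric"] = INFINITY    # marked for deletion
    return table


routes = {
    "10.1.0.0": {"metric": 3, "updated": 0},
    "10.2.0.0": {"metric": 5, "updated": 100},
}
print(age_routes(routes, now=200))
# {'10.1.0.0': {'metric': 16, 'updated': 0}, '10.2.0.0': {'metric': 5, 'updated': 100}}
print(age_routes(routes, now=250))
# {'10.2.0.0': {'metric': 5, 'updated': 100}}
```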
Notice that RIPv1 does not carry a subnet mask. This is because RIPv1 assumes that the subnet mask in use is the same as the interface's subnet mask. This is flawed, but works in some cases. RIPv2 adds subnet masks, a list of next-hop routers, and route tags (for ASes), as well as simple authentication, and supports multicast so that broadcasts can be avoided.
BGP uses four message types. An open message is sent when a link comes up. An update message is sent to exchange routing information. A notification message is sent as the final message before a link is disconnected. Keepalive messages are sent to reassure a neighbor that everything is OK (in the absence of routing updates).
BGPv4 supports CIDR (Classless Inter-Domain Routing). CIDR allows a subnet mask to be shorter than the classful network mask, so that it cuts into the network ID portion of an address. This reduces the size of the routing table EGPs must support by allowing many Class C addresses to be collapsed into a few entries. This only works as well as the policy for allocating Class C addresses is enforced, though.
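To illustrate the collapsing, Python's `ipaddress.collapse_addresses` merges contiguous networks into the shortest list of prefixes that covers them. The eight /24 networks below are made-up examples standing in for Class C allocations handed out as one aligned block.

```python
import ipaddress

# Eight contiguous Class C sized networks: 192.168.8.0/24 .. 192.168.15.0/24
class_c_blocks = [ipaddress.ip_network(f"192.168.{i}.0/24") for i in range(8, 16)]

aggregated = list(ipaddress.collapse_addresses(class_c_blocks))
print(aggregated)  # [IPv4Network('192.168.8.0/21')]
```

Eight routing table entries become one. If the blocks had been allocated non-contiguously, or not aligned on a power-of-two boundary, they would not collapse this cleanly, which is why the allocation policy matters.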