The BGP protocol is implemented in three parts: bgp.c
which takes care of the
connection and most of the interface with BIRD core, packets.c
handling
both incoming and outgoing BGP packets and attrs.c
containing functions for
manipulation with BGP attribute lists.
As opposed to the other existing routing daemons, BIRD has a sophisticated core architecture which is able to keep all the information needed by BGP in the primary routing table, therefore no complex data structures like a central BGP table are needed. This increases memory footprint of a BGP router with many connections, but not too much and, which is more important, it makes BGP much easier to implement.
Each instance of BGP (corresponding to a single BGP peer) is described by a bgp_proto structure to which are attached individual connections represented by bgp_connection (usually, there exists only one connection, but during BGP session setup, there can be more of them). The connections are handled according to the BGP state machine defined in the RFC with all the timers and all the parameters configurable.
In incoming direction, we listen on the connection's socket and each time we receive some input, we pass it to bgp_rx(). It decodes packet headers and the markers and passes complete packets to bgp_rx_packet() which distributes the packet according to its type.
In outgoing direction, we gather all the routing updates and sort them to buckets (bgp_bucket) according to their attributes (we keep a hash table for fast comparison of rta's and a fib which helps us to find if we already have another route for the same destination queued for sending, so that we can replace it with the new one immediately instead of sending both updates). There also exists a special bucket holding all the route withdrawals which cannot be queued anywhere else as they don't have any attributes. If we have any packet to send (due to either new routes or the connection tracking code wanting to send a Open, Keepalive or Notification message), we call bgp_schedule_packet() which sets the corresponding bit in a packet_to_send bit field in bgp_conn and as soon as the transmit socket buffer becomes empty, we call bgp_fire_tx(). It inspects state of all the packet type bits and calls the corresponding bgp_create_xx() functions, eventually rescheduling the same packet type if we have more data of the same type to send.
The processing of attributes consists of two functions: bgp_decode_attrs() for checking of the attribute blocks and translating them to the language of BIRD's extended attributes and bgp_encode_attrs() which does the converse. Both functions are built around a bgp_attr_table array describing all important characteristics of all known attributes. Unknown transitive attributes are attached to the route as EAF_TYPE_OPAQUE byte streams.
void bgp_start_timer (timer * t, int value) -- start a BGP timer
timer
time to fire (0 to disable the timer)
This functions calls tm_start() on t with time value and the amount of randomization suggested by the BGP standard. Please use it for all BGP timers.
void bgp_close_conn (struct bgp_conn * conn) -- close a BGP connection
connection to close
This function takes a connection described by the bgp_conn structure, closes its socket and frees all resources associated with it.
If the connection is being closed due to a protocol error, adjust the connection restart timer as well according to the error recovery policy set in the configuration.
If the connection was marked as primary, it shuts down the protocol as well.
void bgp_connect (struct bgp_proto * p) -- initiate an outgoing connection
BGP instance
The bgp_connect() function creates a new bgp_conn and initiates a TCP connection to the peer. The rest of connection setup is governed by the BGP state machine as described in the standard.
int bgp_incoming_connection (sock * sk, int dummy) -- handle an incoming connection
TCP socket
unused
This function serves as a socket hook for accepting of new BGP connections. It searches a BGP instance corresponding to the peer which has connected and if such an instance exists, it creates a bgp_conn structure, attaches it to the instance and either sends an Open message or (if there already is an active connection) it closes the new connection by sending a Notification message.
void bgp_error (struct bgp_conn * c, unsigned code, unsigned subcode, byte * data, int len) -- report a protocol error
connection
error code (according to the RFC)
error sub-code
data to be passed in the Notification message
length of the data
bgp_error() sends a notification packet to tell the other side that a protocol error has occurred (including the data considered erroneous if possible) and closes the connection.
int bgp_fire_tx (struct bgp_conn * conn) -- transmit packets
connection
Whenever the transmit buffers of the underlying TCP connection are free and we have any packets queued for sending, the socket functions call bgp_fire_tx() which takes care of selecting the highest priority packet queued (Notification > Keepalive > Open > Update), assembling its header and body and sending it to the connection.
void bgp_schedule_packet (struct bgp_conn * conn, int type) -- schedule a packet for transmission
connection
packet type
Schedule a packet of type type to be sent as soon as possible.
void bgp_rx_packet (struct bgp_conn * conn, byte * pkt, unsigned len) -- handle a received packet
BGP connection
start of the packet
packet size
bgp_rx_packet() takes a newly received packet and calls the corresponding packet handler according to the packet type.
int bgp_rx (sock * sk, int size) -- handle received data
socket
amount of data received
bgp_rx() is called by the socket layer whenever new data arrive from the underlying TCP connection. It assembles the data fragments to packets, checks their headers and framing and passes complete packets to bgp_rx_packet().
unsigned int bgp_encode_attrs (byte * w, ea_list * attrs, int remains) -- encode BGP attributes
buffer
a list of extended attributes
remaining space in the buffer
The bgp_encode_attrs() function takes a list of extended attributes and converts it to its BGP representation (a part of an Update message).
Length of the attribute block generated.
struct rta * bgp_decode_attrs (struct bgp_conn * conn, byte * attr, unsigned int len, struct linpool * pool, int mandatory) -- check and decode BGP attributes
connection
start of attribute block
length of attribute block
linear pool to make all the allocations in
1 iff presence of mandatory attributes has to be checked
This function takes a BGP attribute block (a part of an Update message), checks its consistency and converts it to a list of BIRD route attributes represented by a rta.
The OSPF protocol is quite complicated and its complex implemenation is
split to many files. In ospf.c
, you can find mostly interface
for communication with the core (e.g., reconfiguration hooks, shutdown
and initialisation and so on). In packet.c
, you can find various
functions for sending and receiving of generic OSPF packets. There are
also routines for autentication and checksumming. File iface.c
contains
the interface state machine, allocation and deallocation of OSPF's
interface data structures. Source neighbor.c
includes the neighbor state
machine and functions for election of Designed Router and Backup
Designed router. In hello.c
, there are routines for sending
and receiving of hello packets as well as functions for maintaining
wait times and the inactivity timer. Files lsreq.c
, lsack.c
, dbdes.c
contain functions for sending and receiving of link-state requests,
link-state acknoledges and database descriptions respectively.
In lsupd.c
, there are functions for sending and receiving
of link-state updates and also the flooding algorithm. Source topology.c
is
a place where routines for searching LSA's in the link-state database,
adding and deleting them reside, there also are functions for originating
of various types of LSA's (router LSA, net LSA, external LSA). File rt.c
contains routines for calculating the routing table. lsalib.c
is a set
of various functions for working with the LSA's (endianity conversions,
calculation of checksum etc.).
One instance of the protocol is able to hold LSA databases for multiple OSPF areas, to exchange routing information between multiple neighbors and to calculate the routing tables. The core structure is proto_ospf to which multiple ospf_area and ospf_iface structures are connected. To ospf_area is also connected top_hash_graph which is a dynamic hashing structure that describes the link-state database. It allows fast search, addition and deletion. Each LSA is kept in two pieces: header and body. Both of them are kept in endianity of the CPU.
Every area has its own area_disp() which is responsible for late originating of router LSA, calculating of the routing table and it also ages and flushes the LSA's. This function is called in regular intervals. To every ospf_iface, we connect one or more ospf_neighbor's -- a structure containing many timers and queues for building adjacency and for exchange of routing messages.
BIRD's OSPF implementation respects RFC2328 in every detail, but some of internal algorithms do differ. The RFC recommends to make a snapshot of the link-state database when a new adjacency is forming and send the database description packets based on information of this snapshot. The database can be quite large in some networks, so we rather walk through a slist structure which allows us to continue even if the actual LSA we were worked with is deleted. New LSA's are added at the tail of this slist.
We also don't keep a separate OSPF routing table, because the core helps us by being able to recognize when a route is updated to an identical one and it suppresses the update automatically. Due to this, we can flush all the routes we've recalculated and also those we're deleted to the core's routing table and the core will take care of the rest. This simplifies the process and conserves memory.
void area_disp (timer * timer) -- invokes link-state database aging, originating of
it's called every ospf_area->tick seconds
router LSA and routing table calculation
It ivokes aging and when ospf_area->origrt is set to 1, start function for origination of router LSA. It also start routing table calculation when ospf_area->calcrt is set.
int ospf_import_control (struct proto * p, rte ** new, ea_list ** attrs, struct linpool * pool) -- accept or reject new route from nest's routing table
current instance of protocol
the new route
list of arttributes
pool for alloction of attributes
Its quite simple. It does not accept our own routes and decision of import leaves to the filters.
int ospf_shutdown (struct proto * p) -- Finnish of OSPF instance
current instance of protocol
RFC does not define any action that should be taken befor router shutdown. To make my neighbors react as fast as possible, I send them hello packet with empty neighbor list. They should start theirs neighbor state machine with event NEIGHBOR_1WAY.
int ospf_reconfigure (struct proto * p, struct proto_config * c) -- reconfiguration hook
current instance of protocol (with old configuration)
new configuration requested by user
This hook tries to be a little bit inteligent. Instance of OSPF will survive change of many constants like hello interval, password change, addition of deletion of some neighbor on nonbroadcast network, cost of interface, etc.
void originate_rt_lsa (struct ospf_area * oa) -- build new instance of router LSA
ospf_area which is LSA built to
It builds router LSA walking through all OSPF interfaces in specified OSPF area. This function is mostly called from area_disp(). Builds new LSA, increases sequence number (if old instance exists) and sets age of LSA to zero.
void originate_net_lsa (struct ospf_iface * ifa) -- originates of deletes network LSA
interface which is LSA originated for
Interface counts number of adjacent neighbor. If this number is lower then one or interface is not in state OSPF_IS_DR it deletes and premature ages instance of network LSA for specified interface. In other case, new instance of network LSA is originated.
void originate_ext_lsa (net * n, rte * e, struct proto_ospf * po, struct ea_list * attrs) -- new route recived from nest and filters
network prefix and mask
rte
current instance of OSPF
list of extended attributes
If I receive message that new route is installed, I try to originate an external LSA. LSA header of such LSA does not contain information about prefix lenght, so if I have to originate multiple LSAs for route with different prefixes I try to increment prefix id to find a "free" one.
The function also set flag ebit. If it's first time, the new router lsa origination is necessary.
struct top_graph * ospf_top_new (struct proto_ospf * p) -- allocated new topology database
current instance of OSPF
This dynamically hashed structure is often used for keeping LSAs. Mainly its used in ospf_area structute.
void neigh_chstate (struct ospf_neighbor * n, u8 state) -- handles changes related to new or lod state of neighbor
OSPF neighbor
new state
Many actions has to be taken acording to state change of neighbor. It starts rxmt timers, call interface state machine etc.
void ospf_neigh_sm (struct ospf_neighbor * n, int event) -- ospf neighbor state machine
neighor
actual event
This part implements neighbor state machine as described in 10.3 of RFC 2328. the only difference is that state NEIGHBOR_ATTEMPT is not used. We discover neighbors on nonbroadcast networks using the same ways as on broadcast networks. The only difference is in sending hello packets. These are send to IPs listed in ospf_iface->nbma_list .
void bdr_election (struct ospf_iface * ifa) -- (Backup) Designed Router election
actual interface
When wait time fires, it time to elect (Backup) Designed Router. Structure describing me is added to this list so every electing router has the same list. Backup Designed Router is elected before Designed Router. This process is described in 9.4 of RFC 2328.
void iface_chstate (struct ospf_iface * ifa, u8 state) -- handle changes of interface state
OSPF interface
new state
Many action must be taken acording to iterface state change. New networks LSA must be originated, flushed, new multicast socket to listen messages for ALLDROUTERS has to be opened, etc.
void ospf_int_sm (struct ospf_iface * ifa, int event) -- OSPF interface state machine
OSPF interface
event comming to state machine
This fully respect 9.3 of RFC 2328 except we don't use LOOP state of interface.
int ospf_rx_hook (sock * sk, int size)
socket we recived the packet. Its ignored.
size of the packet
This in entry point for messages from neighbors. Many checks (like autnetication, checksums, size) are done before packet is passed to non generic functions.
void ospf_age (struct ospf_area * oa)
ospf area
This function is periodicaly invoked from area_disp(). It computes new age of all LSAs and old (age is higher than LSA_MAXAGE) are flushed when ever possible. If some LSA originated by router itself is older than LSREFRESHTIME new instance is originated.
RFC says, that router should check checksum of every LSA to detect some hardware problem. BIRD does not do it to minimalize CPU utilization.
If routing table calculation is scheduled, it also invalidates old routing table calculation results.
struct top_hash_entry * lsa_install_new (struct ospf_lsa_header * lsa, void * body, struct ospf_area * oa) -- install new LSA into database
LSA header
pointer to LSA body
current ospf_area
This function ensures installing new LSA into LSA database. Old instance is replaced. Several actions are taken to detec if new routing table calculation is necessary. This is described in 13.2 of RFC 2328.
void ospf_dbdes_tx (struct ospf_neighbor * n) -- transmit database description packet
neighbor
Sending of database description packet is described in 10.6 of RFC 2328. Reception of each packet is acknoledged in sequence number of another. When I send a packet to neighbor I keep a copy in buffer. If neighbor does not reply, I don't create new packet but I just send content of buffer.
void ospf_rt_spfa (struct ospf_area * oa) -- calculate internal routes
OSPF area
Calculation of internal paths in area is described in 16.1 of RFC 2328. It's based on Dijkstra shortest path tree algorithmus. RFC recommends to add ASBR routers into routing table. I don't do this and latter parts of routing table calculation looks directly into LSA Database. This function is invoked from area_disp().
void ospf_ext_spfa (struct proto_ospf * po) -- calculate external paths
protocol
After routing table for any area is calculated, calculation of external path is invoked. This process is described in 16.6 of RFC 2328. Inter- and Intra-area paths are always prefered over externals.
The Pipe protocol is very simple. It just connects to two routing tables using proto_add_announce_hook() and whenever it receives a rt_notify() about a change in one of the tables, it converts it to a rte_update() in the other one.
To avoid pipe loops, Pipe keeps a `being updated' flag in each routing table.
RIP is a pretty simple protocol, so about a half of its code is interface with the core.
We maintain our own linked list of rip_entry structures -- it serves as our small routing table. RIP never adds to this linked list upon packet reception; instead, it lets the core know about data from the packet and waits for the core to call rip_rte_notify().
Within rip_tx(), the list is walked and a packet is generated using rip_tx_prepare(). This gets tricky because we may need to send more than one packet to one destination. Struct rip_connection is used to hold context information such as how many of rip_entry's we have already sent and it's also used to protect against two concurrent sends to one destination. Each rip_interface has at most one rip_connection.
We are not going to honor requests for sending part of routing table. That would need to turn split horizon off etc.
About triggered updates, RFC says: when a triggered update was sent, don't send a new one for something between 1 and 5 seconds (and send one after that). We do something else: each 5 seconds, we look for any changed routes and broadcast them.
void rip_timer (timer * t)
timer
Broadcast routing tables periodically (using rip_tx) and kill routes that are too old. RIP keeps a list of its own entries present in the core table by a linked list (functions rip_rte_insert() and rip_rte_delete() are responsible for that), it walks this list in the timer and in case an entry is too old, it is discarded.
struct rip_interface * new_iface (struct proto * p, struct iface * new, unsigned long flags, struct iface_patt * patt)
myself
interface to be created or NULL if we are creating a magic socket. The magic socket is used for listening and also for sending requested responses.
interface flags
pattern this interface matched, used for access to config options
Create an interface structure and start listening on the interface.
The Static protocol is implemented in a straightforward way. It keeps two lists of static routes: one containing interface routes and one holding the remaining ones. Interface routes are inserted and removed according to interface events received from the core via the if_notify() hook. Routes pointing to a neighboring router use a sticky node in the neighbor cache to be notified about gaining or losing the neighbor. Special routes like black holes or rejects are inserted all the time.
The only other thing worth mentioning is that when asked for reconfiguration, Static not only compares the two configurations, but it also calculates difference between the lists of static routes and it just inserts the newly added routes and removes the obsolete ones.
The Direct protocol works by converting all ifa_notify() events it receives to rte_update() calls for the corresponding network.