BGP, Part I
In the previous tutorial, we discovered how to let OSPF dynamically configure routing inside a network. This tutorial provides an introduction to another routing protocol, which is BGP, the Border Gateway Protocol. As the name implies, this protocol acts on the border of a network. Where OSPF is well suited to keep track of all tiny details of what's happening in our internal network, BGP will be talking to the outside world to interconnect our network with other networks, managed by someone else.
BGP Essentials
When routers talk BGP to each other, they essentially just claim that network ranges are reachable via them:
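In words, the claims in this example boil down to something like the following (the prefixes shown are the ones we'll actually configure later in this tutorial):

    AS64080 -> AS65033:  "10.40.0.0/22 and 10.40.216.0/21 are reachable via me"
    AS65033 -> AS64080:  "10.40.32.0/19 is reachable via me"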
Let's look at the same picture again, hiding less information:
The picture shows two networks, which are interconnected through routers `R3` and `R10`.
- A complete network under the control of a single party has an AS (Autonomous System) number. This number will be used later in the BIRD BGP configuration.
- The routes that are published to another network are aggregated as much as possible, to minimize their number. While the internal routing table in, for example, `AS64080` might contain dozens of prefixes, one for each little vlan, and probably a number of single host routes (`/32` for IPv4 and `/128` for IPv6), they're advertised to the outside as just three routes in total.
- If neighbouring routers of different networks are directly connected, they often interconnect using a minimal sized network range. For IPv4, this is usually a `/30`, and for IPv6 a `/120` or a `/126` prefix, containing only the two routers. In the example above, the small network ranges are taken from the network of `AS64080`.
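To make this concrete: with the prefixes we'll configure later in this tutorial, the IPv4 part of the aggregation looks like this:

    Inside AS64080 (known to OSPF): many specific routes, e.g.
      10.40.2.0/24, 10.40.3.0/24, 10.40.216.0/28, 10.40.217.1/32, ...
    Advertised to AS65033 (over BGP): only two covering aggregates
      10.40.0.0/22
      10.40.216.0/21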
OSPF vs. BGP
While the title of this section might suggest a contest, since we're looking at BGP right after having spent quite some time on OSPF, there's actually no "versus" here: OSPF and BGP are two very different routing protocols, which are used to get different things done. Nonetheless, let's look at some differences:
OSPF:
- Routes in the network are originated by just putting IP addresses on a network interface of a router, and letting the routing protocol pick them up automatically.
- The routes in OSPF are addresses and subnets that are actually in use.
- Every router that participates in the OSPF protocol has a full, detailed view of the network, built from link state updates that are flooded through the network.
BGP:
- Routes that are published to other networks are "umbrella ranges", which are as big as possible and are defined manually.
- There is no actual proof that the addresses which are advertised are actually in use inside the network.
- A neighbouring BGP router only knows that some prefix is reachable via another network. Where OSPF shortest path calculations deal with knowledge about all separate routers, links and weights, BGP looks at a higher level: it determines the shortest path in terms of networks, with a complete network (AS) counting as one step.
So, OSPF is an IGP (Interior Gateway Protocol) and BGP is an EGP (Exterior Gateway Protocol). BGP can connect OSPF networks to each other, hiding a lot of detail inside them.
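For example, a route learned over BGP from a directly connected neighbour network carries an AS path containing just that network's AS number, no matter how many routers and internal hops hide behind it. We'll see this later in this tutorial, in the birdc output on R10:

    10.40.216.0/21 via 10.40.217.17 on vlan217 [ebgp_r3 00:48:32] * (100) [AS64080i]

The `[AS64080i]` at the end shows the AS path: seen from `R10`, everything behind `R3` is one step away, inside AS64080.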
BGP and OSPF with BIRD: Setting up the containers and networks
In the second half of this tutorial we'll configure a network, using OSPF, BGP and the BIRD routing software. BGP-wise, it's kept simple, using just a single connection between two networks.
It's starting to look serious now! Thankfully, most of the configuration is provided already, so we can quickly set up this whole network using our LXC environment. Just like in the previous tutorial, the birdbase container can be cloned, after which the lxc network information can be set up and the configuration inside the containers can be copied into place.
- Clone this git repository somewhere, so you can use the files from the `bgp-intro/lxc/` directory inside it.
- lxc-clone the birdbase container several times:

    lxc-clone -s birdbase R0
    lxc-clone -s birdbase R1
    lxc-clone -s birdbase R3
    lxc-clone -s birdbase R10
    lxc-clone -s birdbase R11
    lxc-clone -s birdbase R12
    lxc-clone -s birdbase H6
    lxc-clone -s birdbase H7
    lxc-clone -s birdbase H19
    lxc-clone -s birdbase H34
- Set up the network interfaces in the lxc configuration. This can be done by removing all network related configuration that remains from the cloned birdbase container, and then appending all needed interface configuration by running the fixnetwork.sh script that can be found in `bgp-intro/lxc/` in this git repository. Of course, have a look at the contents of the script first, before executing it. Since this example is only using IPv4 and single IP addresses on the interfaces, I simply added them to the lxc configuration instead of the network/interfaces file inside the container.

    . ./fixnetwork.sh
- Copy extra configuration into the containers. The `bgp-intro/lxc/` directory inside this git repository contains a little file hierarchy that can just be copied over the configuration of the containers. For each router, it's a network/interfaces configuration file, which adds an IP address that corresponds with the Router ID to the loopback interface, and a simple BIRD configuration file that serves as a starting point for our next steps.
- Start all containers:

    lxc-start -d -n R0
    lxc-start -d -n R1
    lxc-start -d -n R3
    lxc-start -d -n R10
    lxc-start -d -n R11
    lxc-start -d -n R12
    lxc-start -d -n H6
    lxc-start -d -n H7
    lxc-start -d -n H19
    lxc-start -d -n H34
- Verify connectivity and look around a bit. Here's an example for R1:

    lxc-attach -n R1

    root@R1:/# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet 10.40.217.1/32 scope global lo
           valid_lft forever preferred_lft forever
        inet6 2001:db8:40::1/128 scope global
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    109: vlan216: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 02:00:0a:28:d8:03 brd ff:ff:ff:ff:ff:ff
        inet 10.40.216.3/28 brd 10.40.216.15 scope global vlan216
           valid_lft forever preferred_lft forever
        inet6 2001:db8:40:d8::3/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::aff:fe28:d803/64 scope link
           valid_lft forever preferred_lft forever
    111: vlan3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 02:00:0a:28:03:01 brd ff:ff:ff:ff:ff:ff
        inet 10.40.3.1/24 brd 10.40.3.255 scope global vlan3
           valid_lft forever preferred_lft forever
        inet6 2001:db8:40:3::1/120 scope global
           valid_lft forever preferred_lft forever
        inet6 fe80::aff:fe28:301/64 scope link
           valid_lft forever preferred_lft forever

    root@R1:/# ip r
    10.40.2.0/24 via 10.40.216.2 dev vlan216 proto bird
    10.40.3.0/24 dev vlan3 proto kernel scope link src 10.40.3.1
    10.40.216.0/28 dev vlan216 proto kernel scope link src 10.40.216.3
    10.40.217.0 via 10.40.216.2 dev vlan216 proto bird
    10.40.217.3 via 10.40.216.1 dev vlan216 proto bird
    10.40.217.16/30 via 10.40.216.1 dev vlan216 proto bird

    root@R1:/# birdc show route
    BIRD 1.4.5 ready.
    10.40.217.16/30 via 10.40.216.1 on vlan216 [ospf1 22:58:02] * I (150/20) [10.40.217.3]
    10.40.216.0/28 dev vlan216 [ospf1 22:58:02] * I (150/10) [10.40.217.3]
    10.40.217.0/32 via 10.40.216.2 on vlan216 [ospf1 22:58:02] * I (150/10) [10.40.217.0]
    10.40.217.1/32 dev lo [ospf1 22:57:42] * I (150/0) [10.40.217.1]
    10.40.217.3/32 via 10.40.216.1 on vlan216 [ospf1 22:58:02] * I (150/10) [10.40.217.3]
    10.40.2.0/24 via 10.40.216.2 on vlan216 [ospf1 22:58:02] * I (150/20) [10.40.217.0]
    10.40.3.0/24 dev vlan3 [ospf1 22:57:42] * I (150/10) [10.40.217.1]

    root@R1:/# ip -6 r
    2001:db8:40:: via fe80::aff:fe28:d802 dev vlan216 proto bird metric 1024
    unreachable 2001:db8:40::1 dev lo proto kernel metric 256 error -101
    2001:db8:40::3 via fe80::aff:fe28:d801 dev vlan216 proto bird metric 1024
    2001:db8:40:2::/120 via fe80::aff:fe28:d802 dev vlan216 proto bird metric 1024
    2001:db8:40:3::/120 dev vlan3 proto kernel metric 256
    2001:db8:40:d8::/120 dev vlan216 proto kernel metric 256
    2001:db8:40:d910::/120 via fe80::aff:fe28:d801 dev vlan216 proto bird metric 1024
    fe80::/64 dev vlan216 proto kernel metric 256
    fe80::/64 dev vlan3 proto kernel metric 256

    root@R1:/# birdc6 show route
    BIRD 1.4.5 ready.
    2001:db8:40:d8::/120 dev vlan216 [ospf1 22:58:08] * I (150/10) [10.40.217.3]
    2001:db8:40::/128 via fe80::aff:fe28:d802 on vlan216 [ospf1 22:58:08] * I (150/20) [10.40.217.0]
    2001:db8:40:2::/120 via fe80::aff:fe28:d802 on vlan216 [ospf1 22:58:08] * I (150/20) [10.40.217.0]
    2001:db8:40:3::/120 dev vlan3 [ospf1 22:57:41] * I (150/10) [10.40.217.1]
    2001:db8:40::3/128 via fe80::aff:fe28:d801 on vlan216 [ospf1 22:58:08] * I (150/20) [10.40.217.3]
    2001:db8:40:d910::/120 via fe80::aff:fe28:d801 on vlan216 [ospf1 22:58:08] * I (150/20) [10.40.217.3]
As you can see, OSPF is running for IPv4 and IPv6, and has discovered the complete internal network of AS64080.
Now make sure you can do the following, and answer the following questions:
- From H6, `traceroute -n` and `traceroute6 -n` to a few destinations in AS64080 to get acquainted with the network topology.
- Look at the BIRD logging. A fun way to follow the logging is to do `tail -F R*/rootfs/var/log/bird/*.log` from outside the containers, and then start all of them.
- Find out why `10.40.217.18` or `2001:db8:40:d910::2` on `R10` cannot be pinged from `R1`, while the routes to `10.40.217.16/30` and `2001:db8:40:d910::/120` are actually present in the routing tables of `R1` and `R3`.
BIRD BGP configuration
Let's zoom in a bit first, and focus on the connection between `R3` and `R10`. This section will show how to configure the actual BGP connection between those two routers, so they will learn about each other's networks.
The routing table of `R3` contains information about the internal network of its own AS, `AS64080`. As you can see, routes to the ranges in `AS65033` are missing.
root@R3:/# ip r
10.40.2.0/24 via 10.40.216.2 dev vlan216 proto bird
10.40.3.0/24 via 10.40.216.3 dev vlan216 proto bird
10.40.216.0/28 dev vlan216 proto kernel scope link src 10.40.216.1
10.40.217.0 via 10.40.216.2 dev vlan216 proto bird
10.40.217.1 via 10.40.216.3 dev vlan216 proto bird
10.40.217.16/30 dev vlan217 proto kernel scope link src 10.40.217.17
root@R3:/# ip -6 r
2001:db8:40:: via fe80::aff:fe28:d802 dev vlan216 proto bird metric 1024
2001:db8:40::1 via fe80::aff:fe28:d803 dev vlan216 proto bird metric 1024
unreachable 2001:db8:40::3 dev lo proto kernel metric 256 error -101
2001:db8:40:2::/120 via fe80::aff:fe28:d802 dev vlan216 proto bird metric 1024
2001:db8:40:3::/120 via fe80::aff:fe28:d803 dev vlan216 proto bird metric 1024
2001:db8:40:d8::/120 dev vlan216 proto kernel metric 256
2001:db8:40:d910::/120 dev vlan217 proto kernel metric 256
fe80::/64 dev vlan216 proto kernel metric 256
fe80::/64 dev vlan217 proto kernel metric 256
Now, add the following configuration to `bird.conf` of `R3`:
##############################################################################
# eBGP R10
#
table t_r10;
protocol static originate_to_r10 {
table t_r10;
import all; # originate here
route 10.40.0.0/22 blackhole;
route 10.40.216.0/21 blackhole;
}
protocol bgp ebgp_r10 {
table t_r10;
local 10.40.217.17 as 64080;
neighbor 10.40.217.18 as 65033;
import filter {
if net ~ [ 10.0.0.0/8{19,24} ] then accept;
reject;
};
import keep filtered on;
export where source = RTS_STATIC;
}
protocol pipe p_master_to_r10 {
table master;
peer table t_r10;
import where source = RTS_BGP;
export none;
}
Let me explain a bit about what's going on here. So far, we've used the BIRD protocol types `kernel`, `device` and `ospf`. This configuration snippet introduces three other ones: `static`, `bgp` and `pipe`. Besides that, there's also a table definition at the top.
- By issuing `table t_r10`, we tell BIRD that we'd like to use an extra internal routing table with the name `t_r10`. By default, BIRD always has a routing table named `master`, and now we added a second one. Routing tables in BIRD are just collections of routes with some attributes.
- The static protocol is used to generate a collection of static routes. In this case, we define a static protocol named `originate_to_r10`, and connect it to table `t_r10`. The import statement causes the routes that are generated by this static protocol to be imported into the `t_r10` table. Static routes usually point at a neighbouring router, using a via statement, but in this case we don't care about a next hop, since this is just a collection of prefixes that will be exported via BGP. The blackhole won't actually be used for anything here.
- The bgp protocol is named after the router it's talking to, `R10`, and is also connected to the `t_r10` routing table inside BIRD. It has a local and a remote IP address and AS number. The import rules are a bit more complex than a simple `import all`, which would also have been sufficient here. I'll explain more about that later, although you can probably already guess what they do (see the example right after this list). The export rule contains a simple filter that tells BIRD to push all routes from table `t_r10` that originate from a static protocol to the outside, to `R10`.
- The pipe protocol is a simple protocol that can move routes around between internal BIRD routing tables. In this case, the pipe protocol `p_master_to_r10` is connected to the central `master` routing table and is looking at table `t_r10`. From table `t_r10`, all routes that originate from an external BGP peer are imported into the master table. Doing so makes sure that the routes learned from the remote network end up in the routing table of the Linux kernel (via the kernel protocol, which exports them from the BIRD master table to the Linux kernel), while the routes that are only meant to be exported to the BGP peer (generated by the static protocol) stay in `t_r10`.
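About that import filter, as promised: `10.0.0.0/8{19,24}` is a BIRD prefix set pattern, matching any prefix that lies within `10.0.0.0/8` and has a prefix length between 19 and 24, inclusive. A few examples of how incoming routes would be treated by it:

    if net ~ [ 10.0.0.0/8{19,24} ] then accept;

    10.40.32.0/19  -> accepted (inside 10.0.0.0/8, length 19)
    10.40.36.0/24  -> accepted (inside 10.0.0.0/8, length 24)
    10.0.0.0/8     -> rejected (length 8 is too short)
    10.40.2.6/32   -> rejected (length 32 is too long)
    192.0.2.0/24   -> rejected (not inside 10.0.0.0/8)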
Remember, the internal BIRD routing tables are not used to actually do packet forwarding. During the OSPF tutorial, we already discussed this difference between the "Control Plane" and "Forwarding Plane". Actually, the routing table inside the control plane is usually called the "RIB" (Routing Information Base), while the routing table that is used in the forwarding plane is called the "FIB" (Forwarding Information Base). Just look up all those terms on the internet to see what everyone is saying about them.
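In BIRD terms: `birdc show route` shows the RIB, while `ip route` shows the FIB. Comparing the two on any of the routers makes the difference visible:

    root@R3:/# birdc show route   # control plane (RIB): everything BIRD knows
    root@R3:/# ip route           # forwarding plane (FIB): what the kernel uses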
After adding the configuration, make BIRD load it (the `configure` command inside the interactive BIRD console does this without restarting the daemon), and look around a bit using `birdc`:
root@R3:/# birdc
BIRD 1.4.5 ready.
bird> show protocols
name proto table state since info
kernel1 Kernel master up 2015-06-14
device1 Device master up 2015-06-14
ospf1 OSPF master up 2015-06-14 Running
originate_to_r10 Static t_r10 up 23:54:16
p_master_to_r10 Pipe master up 23:54:16 => t_r10
ebgp_r10 BGP t_r10 start 00:34:16 Active Socket: Connection refused
bird> show route table t_r10
10.40.216.0/21 blackhole [originate_to_r10 23:54:16] * (200)
10.40.0.0/22 blackhole [originate_to_r10 23:54:16] * (200)
Well, the routes are waiting in the `t_r10` table to be pushed to `R10`, and no routes from `AS65033` are visible yet. There's only an ugly "Connection refused"... reminding you that the other end of the BGP connection still needs to be configured. Now it's up to you to configure `R10` with the opposite part of the configuration, and make it talk to `R3`!
When successful, the output of the commands above should look like this on `R10`:
bird> show protocols
name proto table state since info
kernel1 Kernel master up 2015-06-14
device1 Device master up 2015-06-14
ospf1 OSPF master up 2015-06-14 Running
originate_to_r3 Static t_r3 up 00:48:27
ebgp_r3 BGP t_r3 up 00:48:32 Established
p_master_to_r3 Pipe master up 00:48:27 => t_r3
bird> show route table t_r3
10.40.216.0/21 via 10.40.217.17 on vlan217 [ebgp_r3 00:48:32] * (100) [AS64080i]
10.40.32.0/19 blackhole [originate_to_r3 00:48:27] * (200)
10.40.0.0/22 via 10.40.217.17 on vlan217 [ebgp_r3 00:48:32] * (100) [AS64080i]
bird> show route
10.40.217.16/30 dev vlan217 [ospf1 2015-06-14] * I (150/10) [10.40.32.10]
10.40.216.0/21 via 10.40.217.17 on vlan217 [ebgp_r3 00:48:32] * (100) [AS64080i]
10.40.33.0/26 dev vlan33 [ospf1 2015-06-14] * I (150/10) [10.40.32.12]
10.40.36.0/24 via 10.40.33.3 on vlan33 [ospf1 2015-06-14] * I (150/20) [10.40.32.12]
10.40.48.0/21 via 10.40.33.2 on vlan33 [ospf1 2015-06-14] * I (150/20) [10.40.32.11]
10.40.32.10/32 dev lo [ospf1 2015-06-14] * I (150/0) [10.40.32.10]
10.40.32.11/32 via 10.40.33.2 on vlan33 [ospf1 2015-06-14] * I (150/10) [10.40.32.11]
10.40.0.0/22 via 10.40.217.17 on vlan217 [ebgp_r3 00:48:32] * (100) [AS64080i]
10.40.32.12/32 via 10.40.33.3 on vlan33 [ospf1 2015-06-14] * I (150/10) [10.40.32.12]
bird> show route export kernel1
10.40.216.0/21 via 10.40.217.17 on vlan217 [ebgp_r3 00:48:32] * (100) [AS64080i]
10.40.36.0/24 via 10.40.33.3 on vlan33 [ospf1 2015-06-14] * I (150/20) [10.40.32.12]
10.40.48.0/21 via 10.40.33.2 on vlan33 [ospf1 2015-06-14] * I (150/20) [10.40.32.11]
10.40.32.11/32 via 10.40.33.2 on vlan33 [ospf1 2015-06-14] * I (150/10) [10.40.32.11]
10.40.0.0/22 via 10.40.217.17 on vlan217 [ebgp_r3 00:48:32] * (100) [AS64080i]
10.40.32.12/32 via 10.40.33.3 on vlan33 [ospf1 2015-06-14] * I (150/10) [10.40.32.12]
Since there is no explicit name given for the kernel protocol in the configuration, BIRD just names it `kernel1`. As you can see, the table `t_r3` contains the static routes which are exported via BGP to `R3`, and it also contains two routes that were learned from `R3`. Yay! The default `show route` command shows the routes that are in the BIRD `master` table. You can see that the `p_master_to_r3` pipe protocol correctly copied the routes that were learned from `R3` from table `t_r3` to the `master` table. The kernel protocol then exports all dynamically learned routes to the actual forwarding table in the Linux kernel, where they show up in the output of `ip route`, labeled with `proto bird`:
root@R10:/# ip r
10.40.0.0/22 via 10.40.217.17 dev vlan217 proto bird
10.40.32.11 via 10.40.33.2 dev vlan33 proto bird
10.40.32.12 via 10.40.33.3 dev vlan33 proto bird
10.40.33.0/26 dev vlan33 proto kernel scope link src 10.40.33.1
10.40.36.0/24 via 10.40.33.3 dev vlan33 proto bird
10.40.48.0/21 via 10.40.33.2 dev vlan33 proto bird
10.40.216.0/21 via 10.40.217.17 dev vlan217 proto bird
10.40.217.16/30 dev vlan217 proto kernel scope link src 10.40.217.18
And there was much rejoicing! Yayyyy! Well, let's have a look at what we can do with this result. Since both networks are now aware of each other's routes, I'd expect to be able to do some tracerouting into the remote network now!
root@R10:/# traceroute -n 10.40.2.6
traceroute to 10.40.2.6 (10.40.2.6), 30 hops max, 60 byte packets
1 10.40.217.17 0.356 ms 0.319 ms 0.324 ms
2 10.40.216.2 0.430 ms 0.427 ms 0.378 ms
3 10.40.2.6 0.781 ms 0.724 ms 0.716 ms
W00t! `R10` now knows the route to the IPv4 ranges used in `AS64080`, and apparently `H6` also knows a route back to `R10`.
Let's try it from `H34`!
root@H34:/# traceroute -n 10.40.2.6
traceroute to 10.40.2.6 (10.40.2.6), 30 hops max, 60 byte packets
1 10.40.36.1 0.296 ms !N 0.091 ms !N *
Meh, that doesn't look too good. The `!N` means "network unreachable": the first router on the path has no route at all towards `10.40.2.6`, since only `R10` has learned the external BGP routes so far. Apparently there's more work to do.
Some assignments
Now make sure you can do the following, and answer the following questions:
- Configure the IPv6 BGP connection between `R3` and `R10`. IPv4 and IPv6 are handled separately by BIRD now, but the configuration for IPv6 is very similar to the configuration I showed here. Just use `import all` for the bgp protocol if you don't want to learn more about filtering now. A sketch to get you started follows below this list.
- Explain why `10.40.217.18` or `2001:db8:40:d910::2` on `R10` can be pinged from `R1` now:

    root@R1:/# ping6 2001:db8:40:d910::2
    PING 2001:db8:40:d910::2(2001:db8:40:d910::2) 56 data bytes
    64 bytes from 2001:db8:40:d910::2: icmp_seq=1 ttl=63 time=0.399 ms
    64 bytes from 2001:db8:40:d910::2: icmp_seq=2 ttl=63 time=0.099 ms
    ^C
    --- 2001:db8:40:d910::2 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1000ms
    rtt min/avg/max/mdev = 0.099/0.249/0.399/0.150 ms

- Try to export a route outside of `10.0.0.0/8` over BGP and notice that the filter will actually stop that route from being propagated, while accepting the other routes. Using the `show route filtered protocol ebgp_r3` command on `R10`, the filtered route should be visible, if it was pushed from `R3`.
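As a head start for the first assignment, here's a minimal sketch of what the IPv6 counterpart on `R3` could look like, in `bird6.conf`. Note that the umbrella prefix `2001:db8:40::/48` and the link address `2001:db8:40:d910::1` for `R3` are my assumptions, based on the addresses used in this tutorial; verify them against what's actually configured in your containers:

    # Sketch for /etc/bird/bird6.conf on R3 (assumed prefixes, verify first!)
    table t_r10;

    protocol static originate_to_r10 {
        table t_r10;
        import all; # originate here
        route 2001:db8:40::/48 blackhole; # assumed umbrella range of AS64080
    }

    protocol bgp ebgp_r10 {
        table t_r10;
        local 2001:db8:40:d910::1 as 64080;    # assumed R3 address on vlan217
        neighbor 2001:db8:40:d910::2 as 65033; # R10, as seen earlier
        import all; # keep it simple, no filtering yet
        export where source = RTS_STATIC;
    }

    protocol pipe p_master_to_r10 {
        table master;
        peer table t_r10;
        import where source = RTS_BGP;
        export none;
    }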
Intermezzo: BIRD tables, protocols, import, export
BIRD concepts:
- table: BIRD can have multiple routing tables (not like the Linux kernel: a BIRD table is just a collection of routes, so there may for example be multiple paths to the same network present in one table)
- protocol: move routes around between tables (pipe) or into / out of BIRD (ospf, bgp)
- import: draw routes closer to BIRD
- export: push routes away from BIRD to the outside world
- table "master": central route table, import = pull to it, export = push away
picture:
master <-- p_master_to_ebgp_r10 <-- t_ebgp_r10 <-> ebgp_r10
\ <-- originate_to_r10
- t_ebgp_r10 holds route information about routes originated to r10 and routes that are learned from r10
- originate_to_r10 is a collection of routes that are advertised by this network
- pipe to master forwards all routes that are learned from the ebgp session
- master can just export all to kernel
Let's do it
- show BIRD config for pipe and table and ebgp
- reload, show proto, show route export p_blah etc
- see that routes are imported from the other end
- see that they end up in master table and in ip r
Connecting the internal network, not only edge
- now ebgp router knows route to external network
- how to tell the ospf network?
- export the bgp routes to ospf? no!
- explain ibgp -> tell non-edge-routers about external connectivity
- ibgp: all routers must have ibgp session with edge routers
- KEEP BGP IN BGP
- do it! configure ibgp session between R0, R3
- R0 can just import the info and put it into kernel table
- wait what... bgp nexthop? -> first router outside the network
- BGP needs OSPF to provide the internal path towards the external nexthop -> recap: BGP relies on OSPF behaviour
- small link nets for ebgp sessions need to be in IGP!
- make it work!
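As a head start for these notes, here's a minimal, untested sketch of what such an iBGP session on R3 towards R0 might look like, using the loopback/Router ID addresses seen earlier (`10.40.217.3` for R3, `10.40.217.0` for R0):

    # Sketch for bird.conf on R3: iBGP session towards R0
    protocol bgp ibgp_r0 {
        local 10.40.217.3 as 64080;
        neighbor 10.40.217.0 as 64080;  # same AS on both ends: iBGP
        import none;                    # R0 has no external routes to offer us
        export where source = RTS_BGP;  # pass along what we learned over eBGP
        # depending on your topology, a 'multihop' option may be needed here
    }

R0 would run the mirrored session with `import all`, so its kernel protocol can install the learned routes. And as the notes above say: this only works when the eBGP nexthop (`10.40.217.18`) is resolvable via OSPF, which is why the small link nets need to be in the IGP.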
What about the third network?
- build R1,R2,R3 with 3 containers
- show AS-path
- disable link between routers, see traffic being redirected
- just put some address from the umbrella range on loopback, yolo