Networking

Software networks
The Dark Lord of Network Operations, Tim Armstrong, sheds some light on software-defined networking and open network switches. Fear his power!
The rise of open network switches and Software Defined Networking (SDN) has begun a paradigm shift in datacentres, and with a global shortage of network engineers, now is the time to get on the bandwagon.
Deep in the heart of both the Linux kernel team and the engineering department at Mellanox, engineers have been working hard to bring the biggest change to Linux networking since Intel opened the source code for the e1000 driver. And just as that driver changed the face of Linux networking 10 years ago, SwitchDev does it again. It brings the beginnings of true SDN to the Linux community, enabling complete control of a switch’s hardware with the tools you use every day for managing any Linux machine’s network stack.
SwitchDev provides an abstraction layer over the switch’s hardware, making it possible to configure your switches as if they’re nothing more than a Linux server with an enormous number of NICs. All the heavy lifting is handled for you by the kernel: bridges are converted into FIBs, interfaces into ports, VLANs are… well, VLANs, and all the forwarding and routing is off-loaded onto the hardware automatically.
Network emulation with GNS3
To open this tutorial up to everyone who doesn’t happen to have an £8,000 Mellanox switch just sitting around doing nothing, we’re going to use GNS3 and some virtualised appliances. Because Mellanox’s drivers have been upstreamed to the Linux kernel tree, our Open Network Switch appliance and the kernel we’ll be using are up to date and compatible with the majority of features in the Mellanox Spectrum ASIC.
Specifically, we configured the switch appliance to be as close to a Mellanox SN2100 as is currently possible in GNS3. As for the distro that we’re going to use on the appliances, we’re going with Devuan because it’s lightweight, stable and secure.
Instructions for installing GNS3 on pretty much any OS are available at www.gns3.com, but for simplicity we’ve included the Ubuntu instructions here:
$ sudo add-apt-repository ppa:gns3/ppa
$ sudo apt-get update
$ sudo apt-get install gns3-gui gns3-server
Because the images we’re using are KVM/QEMU, you’ll also need to install QEMU and libvirt. We’ve included the instructions for Ubuntu below, but you can find instructions for your preferred distro quite easily online.
$ sudo apt-get install qemu-kvm qemu-system-x86 qemu
$ sudo apt-get install libvirt-bin
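Before going further it’s worth confirming that the host CPU actually exposes the virtualisation extensions KVM needs, otherwise QEMU will silently fall back to much slower software emulation. A minimal sketch, checking the CPU flags in /proc/cpuinfo (kvm_ok is our own illustrative name, not a packaged tool):

```shell
# Check for Intel (vmx) or AMD (svm) hardware virtualisation support.
# kvm_ok is an illustrative helper name for this tutorial.
kvm_ok() {
    if grep -q -E 'vmx|svm' /proc/cpuinfo 2>/dev/null; then
        echo "KVM hardware acceleration available"
    else
        echo "no vmx/svm flags - QEMU will fall back to software emulation"
    fi
}

kvm_ok
```

If you see the fallback message inside an existing VM, check whether your hypervisor supports nested virtualisation.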
Now that GNS3 is installed, we need to import our appliances: a GNS3-compatible facsimile of the Mellanox SN2100, a simple Devuan server, and a Devuan desktop. All three will be downloaded automatically. When you open GNS3 for the first time you’ll be prompted with a wizard that guides you through importing appliances; because this is incredibly straightforward we won’t duplicate the instructions here.
Plugging and playing
For this tutorial we want to keep things simple so as to keep the focus on Linux, or rather more specifically SwitchDev, without going off into an overwhelming dissertation on the merits and issues of different topologies. To this end we’ll use only one switch, two clients and one server.
To do this simply select the switch drawer on the left and drag an OpenNetworkSwitch into the middle of the work area. Next, drag a couple of Devuan Desktops and a Devuan Server from the End devices drawer into the work area. Arrange your devices around the switch however you feel comfortable.
To create a connection from the server to the switch, select the Add a Link tool from the bar on the left, then click the server, select eno1, and then click the switch and select SW1P1. Now do the same for Desktops 1 and 2, selecting SW1P11 and SW1P12 respectively. Then click the Add a Link tool again to disable it.
Once all the devices are connected correctly we can start them up: press the play button on the top toolbar. The VMs will take a short time to launch; you can monitor the progress of the boot by opening the OpenNetworkSwitch’s console (right-click the device icon and select Console). Once booted, you can log in with the user root and the password nauved. Once logged in you can see how the interfaces are presented with the ip command:
$ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: mgmt0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:75:f0:86:64:00 brd ff:ff:ff:ff:ff:ff
3: sw1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:75:f0:86:64:01 brd ff:ff:ff:ff:ff:ff
...
18: sw1p16: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:75:f0:86:64:10 brd ff:ff:ff:ff:ff:ff
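On a 16-port switch that output gets long. A small awk filter can cut it down to just the switch ports and their link state. This is a sketch of our own (list_ports is an illustrative name); the sample text below is captured output so it runs anywhere, but on the appliance you’d pipe ip link straight into it:

```shell
# Extract "port state" pairs from `ip link` output: keep only the sw* lines.
list_ports() {
    awk -F': ' '/^[0-9]+: sw/ {
        split($2, a, "@")                          # strip any @parent suffix
        state = ($0 ~ /state UP/) ? "UP" : "DOWN"
        print a[1], state
    }'
}

# Captured sample so the sketch runs without a switch; on the appliance use:
#   ip link | list_ports
sample='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
3: sw1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UP
18: sw1p16: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN'

printf '%s\n' "$sample" | list_ports
```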
Because Mellanox’s Spectrum series ASIC supports everything from 100Gbps down to 1Gbps, it’s normally helpful to set the port speed before activating any interfaces or including them in a bridge. In this case we’re using a virtualised environment so this isn’t required. However, if for example we were going to use a 1Gbps connection for each desktop and a 10Gbps connection for the server, then we could configure that as follows (ethtool takes speeds in Mb/s, so 10Gbps is 10000):
$ ethtool -s sw1p1 speed 10000 autoneg off
$ ethtool -s sw1p11 speed 1000 autoneg off
$ ethtool -s sw1p12 speed 1000 autoneg off
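With dozens of ports this gets repetitive, and it’s easy to fat-finger a speed. One approach is to keep the port-to-speed map as data and generate the ethtool commands from it, reviewing the plan before piping it to sh. A sketch, assuming nothing beyond POSIX sh (plan_speeds is our own illustrative name):

```shell
# Read "port speed" pairs from stdin and print the matching ethtool commands.
# Speeds are in Mb/s, as ethtool expects: 10000 = 10Gbps, 1000 = 1Gbps.
plan_speeds() {
    while read -r port speed; do
        [ -n "$port" ] && echo "ethtool -s $port speed $speed autoneg off"
    done
}

plan_speeds <<'EOF'
sw1p1 10000
sw1p11 1000
sw1p12 1000
EOF
```

Once the printed plan looks right, run it for real with `plan_speeds < ports.txt | sh`.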
Bridging the gap
Now that the ports are all set to the right speeds we can create the bridge. This is largely the same as creating any Linux bridge on Debian-based systems (such as Devuan). This is achieved by editing /etc/network/interfaces.
In order to associate a switch port with a bridge, you need to decide what type of association it has with the bridge. At the time of writing, Linux’s VLAN-aware bridge system isn’t particularly user-friendly when it comes to attaching access ports, so it’s recommended that you use classic bridges. If a given interface needs to be a trunk port, you also need to define each VLAN at the port level and attach it to each of the VLANs (bridges). In this example, however, we’re keeping it simple, so we won’t be using any trunk ports.
Let’s set up one bridge for each of the ports we’re using, so that we have the option to attach more devices to each subnet at a later date without having to bring existing connections down. Append the following to the /etc/network/interfaces file:
auto br0
iface br0 inet static
    bridge-ports sw1p1
    bridge-stp off

auto br1
iface br1 inet static
    bridge-ports sw1p11
    bridge-stp off

auto br2
iface br2 inet static
    bridge-ports sw1p12
    bridge-stp off
Each bridge acts as a virtual segment in the switch and enslaves the ports we’re using to the segment. This automatically alters the port-mapping and FIB.
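Stanzas this regular are easy to generate, which pays off once you’re configuring more than a handful of ports. A sketch (bridge_stanza is our own illustrative helper, not a standard tool) that prints one classic-bridge block per port:

```shell
# Print one /etc/network/interfaces stanza per bridge/port pair.
bridge_stanza() {    # bridge_stanza <bridge> <port>
    printf 'auto %s\niface %s inet static\n    bridge-ports %s\n    bridge-stp off\n\n' \
        "$1" "$1" "$2"
}

# One bridge per port we're using; extend this list to add more segments.
i=0
for port in sw1p1 sw1p11 sw1p12; do
    bridge_stanza "br$i" "$port"
    i=$((i + 1))
done
```

Review the output, then append it with `... >> /etc/network/interfaces`.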
Bringing each bridge up now would enable Layer 2 switching on that segment, which would be acceptable if all we wanted was a Layer 2 solution, but Layer 2 networks aren’t very scalable, and no-one buys an £8,000 switch to do something an £800 switch can do. Because we want to work with modern architectures, we’ll need to enable Layer 3 switching and inter-VLAN/cross-segment routing. On a SwitchDev platform this is as simple as adding the relevant IP definitions on the bridges. To do this, just modify /etc/network/interfaces:
auto br0
iface br0 inet static
    bridge-ports sw1p1
    bridge-stp off
    address 10.1.0.254
    netmask 24

auto br1
iface br1 inet static
    bridge-ports sw1p11
    bridge-stp off
    address 10.10.0.254
    netmask 24

auto br2
iface br2 inet static
    bridge-ports sw1p12
    bridge-stp off
    address 10.20.0.254
    netmask 24
Now we can start bringing the interfaces up:
$ ifup br0
$ ifup br1
$ ifup br2
Now the switch is configured, it’s time to configure the server and the clients. This is fairly standard boilerplate, but for completeness it bears repeating. On the server you’ll want to edit /etc/network/interfaces, setting the eth0 IP in the range of 10.1.0.1 to 10.1.0.253 and the gateway to 10.1.0.254. In CIDR notation this subnet is 10.1.0.0/24 (excluding the network and broadcast addresses). This results in a config block similar to the following:
auto eth0
iface eth0 inet static
    address 10.1.0.1
    netmask 24
    gateway 10.1.0.254
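If you want to sanity-check a range like 10.1.0.0/24 without reaching for a subnet calculator, the network, broadcast and usable addresses fall out of a little bitwise arithmetic, which POSIX sh can do natively. A sketch (cidr_info is an illustrative name, not an existing utility):

```shell
# Derive network, broadcast and usable host range for an IPv4 address/prefix.
cidr_info() {    # cidr_info <A.B.C.D> <prefix>
    IFS=. read -r a b c d <<EOF
$1
EOF
    ip=$(( (a << 24) + (b << 16) + (c << 8) + d ))
    mask=$(( 0xFFFFFFFF << (32 - $2) & 0xFFFFFFFF ))
    net=$(( ip & mask ))                       # host bits cleared
    bcast=$(( net | (~mask & 0xFFFFFFFF) ))    # host bits set
    fmt() { echo "$(( $1 >> 24 & 255 )).$(( $1 >> 16 & 255 )).$(( $1 >> 8 & 255 )).$(( $1 & 255 ))"; }
    echo "network   $(fmt $net)"
    echo "broadcast $(fmt $bcast)"
    echo "usable    $(fmt $((net + 1))) - $(fmt $((bcast - 1)))"
}

cidr_info 10.1.0.27 24
```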
Next, we need to do the same for the two desktops, modifying the IP addresses appropriately. For example, the config block for Desktop 1 would be as follows:
auto eth0
iface eth0 inet static
    address 10.10.0.1
    netmask 24
    gateway 10.10.0.254

Once we have both the server and the two desktops configured we can bring the networks up as normal:
$ ifup eth0

Finally, we can test the connection from each end device to the switch. From the server to the switch:
$ ping 10.1.0.254
PING 10.1.0.254 (10.1.0.254) 56(84) bytes of data.
64 bytes from 10.1.0.254: icmp_seq=1 ttl=64 time=0.359 ms
64 bytes from 10.1.0.254: icmp_seq=2 ttl=64 time=0.493 ms

From Desktop 1 to the switch:
$ ping 10.10.0.254
PING 10.10.0.254 (10.10.0.254) 56(84) bytes of data.
64 bytes from 10.10.0.254: icmp_seq=1 ttl=64 time=0.513 ms
64 bytes from 10.10.0.254: icmp_seq=2 ttl=64 time=0.471 ms

From Desktop 2 to the switch:
$ ping 10.20.0.254
PING 10.20.0.254 (10.20.0.254) 56(84) bytes of data.
64 bytes from 10.20.0.254: icmp_seq=1 ttl=64 time=0.314 ms
64 bytes from 10.20.0.254: icmp_seq=2 ttl=64 time=0.499 ms

And finally, from the server to each desktop:
$ ping 10.10.0.1
PING 10.10.0.1 (10.10.0.1) 56(84) bytes of data.
64 bytes from 10.10.0.1: icmp_seq=1 ttl=64 time=0.712 ms
64 bytes from 10.10.0.1: icmp_seq=2 ttl=64 time=0.520 ms
$ ping 10.20.0.1
PING 10.20.0.1 (10.20.0.1) 56(84) bytes of data.
64 bytes from 10.20.0.1: icmp_seq=1 ttl=64 time=0.849 ms
64 bytes from 10.20.0.1: icmp_seq=2 ttl=64 time=0.591 ms
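Those five pings can also be swept in one pass, which is handy every time you change the topology. A sketch (sweep is our own illustrative name); the PING variable can be overridden, so PING=true gives a dry run with no network at all:

```shell
# Ping each target once and report a one-line verdict per address.
# Override PING (e.g. PING=true) to dry-run without a network.
sweep() {
    for ip in "$@"; do
        if ${PING:-ping -c 1 -W 1} "$ip" >/dev/null 2>&1; then
            echo "$ip ok"
        else
            echo "$ip FAILED"
        fi
    done
}

sweep 10.1.0.254 10.10.0.254 10.20.0.254 10.10.0.1 10.20.0.1
```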
You should now see all the connections are functioning correctly and our switch is handling the routing for us nicely.
So what’s really happening here? Well, in our simulation, nothing much: we effectively just created a Linux-based router, but had this been a SwitchDev compatible switch it would be a very different story. We just did something that a few years back would have required two NDAs and an SLA just to get the SDK. We configured a switch, with Linux, using the tools we’ve known for years, without a single binary blob in sight. Not even a shim, just pure open source Linux. Time to stand back and admire your handiwork.
Entering the real world
For those who have configured Linux bridges before, all this will have seemed very familiar, but that’s the point. SwitchDev is supposed to make things easy, by reducing complicated switch CLIs, APIs and SDKs to simple Linux commands that just work. You install your distro of choice and then get on with it.
What’s more, you can fully automate the process using your favourite tools such as Chef, Ansible or Puppet. If it can configure a Linux network stack it can configure a SwitchDev compatible switch. Thanks to the engineers at Mellanox and the Linux Kernel Developers, SDN is no longer complicated, expensive or reliant on specialised black box controller units.
Unlike other SDN technologies such as OpenFlow, SAI and OpenNSL, SwitchDev keeps the brains in the box, resulting in network latencies in the range of tenths of a millisecond, all while simplifying the implementation of massively redundant network topologies.
Furthermore, thanks to SwitchDev the convergence of switches and routers becomes possible, resulting in a simplified network edge where one device supports both functions. Adding BGP support to your switch is as simple as installing Quagga and, hey presto, you have all the most popular routing protocols available.
With any luck the pioneering work done by the Linux Kernel developers and the engineers at Mellanox will be the beginning of the end for the layers of bureaucracy and licensing required to gain access to Switch APIs.
Beyond the basics
Now that all the basics are up and running, let’s tackle a more realistic network, splitting the responsibilities of the server(s) and the desktops into a closer approximation of a mid-size company network (not quite as complex as an Enterprise network, but it’s got most of the building blocks in place).
To start off, because we’re using KVM/QEMU as our hypervisor, we need to shut down our appliances before we can rewire our network in GNS3. To do this, hit the big stop button on the top toolbar in GNS3. Next, drag another switch from the appropriate drawer on the left onto the workspace.
Now click the path between the first switch and the server and press the delete key. Then pull up the Add a Link tool, connect the server to the new switch on sw1p10, and connect sw1p1 on the first switch to sw1p1 on the second switch. Then start the appliances again with the big play button.
After the switches have booted up, it’s time to reconfigure the interfaces. First we need to adjust the IP range of the VLAN associated with sw1p1 on the original switch, so that we don’t have to reconfigure the server as well. Let’s have the switches talk to each other over the 172.16.0.0/24 range. To do this, edit the br0 definition in /etc/network/interfaces:
auto br0
iface br0 inet static
    bridge-ports sw1p1
    bridge-stp off
    address 172.16.0.254
    netmask 24
Then simply restart the interface:
$ ifdown br0; ifup br0
Now let’s take a look at the configuration for the new server switch. We connected sw1p1 from the desktop switch to the sw1p1 port on the server switch, so we need to add a config for that port in the same range, for example:
auto br0
iface br0 inet static
    bridge-ports sw1p1
    bridge-stp off
    address 172.16.0.253
    netmask 24
Next we need to configure the server-facing switch port. We’ll create a bridge and enslave the port so that the server is connected to it, then enable Layer 3 on that segment:
auto br1
iface br1 inet static
    bridge-ports sw1p10
    bridge-stp on
    address 10.1.0.254
    netmask 24
Once we’ve saved that config we can bring those interfaces up and test the connection to the server:
$ ifup br0
$ ifup br1
$ ping 10.1.0.1
PING 10.1.0.1 (10.1.0.1) 56(84) bytes of data.
64 bytes from 10.1.0.1: icmp_seq=1 ttl=64 time=0.312 ms
64 bytes from 10.1.0.1: icmp_seq=2 ttl=64 time=0.314 ms
You might have noticed we haven’t put any routes in the switches; as a result, we can’t ping from the server to the desktops or vice versa. Because we don’t want to end up managing a growing mess of static routes, we’re going to use OSPF to safely distribute and synchronise our internal routing tables. As we aren’t using VRFs in this tutorial, setting up OSPF via Quagga is quite simple. Start by entering the Quagga terminal:
$ vtysh
This brings us into a read-only mode, where we can use the show command to inspect the various routing tables and links. To enter a write mode so that we can add our OSPF config, run the following:
$ configure terminal
Now that we’re in configure mode, enabling OSPF and distributing our locally attached routes is as simple as:
$ router ospf
$ ospf router-id 172.16.0.1
$ network 0.0.0.0/0 area 0.0.0.0
$ redistribute connected
Then we need to exit the router config, configure mode and vty, so run the exit command three times:
Devuan-NOS(config-router)# exit
Devuan-NOS(config)# exit
Devuan-NOS# exit
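The same settings can also live in Quagga’s ospfd.conf rather than being typed into vtysh, which makes them easy to template per switch, since only the router-id varies. A sketch that emits the config fragment (ospf_conf is our own illustrative helper; the config file path and service name vary by distro):

```shell
# Emit an ospfd.conf fragment matching the vtysh session above.
ospf_conf() {    # ospf_conf <router-id>
    cat <<EOF
router ospf
 ospf router-id $1
 network 0.0.0.0/0 area 0.0.0.0
 redistribute connected
EOF
}

ospf_conf 172.16.0.254
```

Redirect the output into Quagga’s ospfd configuration file and restart the daemon to apply it.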
Now repeat this on the client switch, replacing the router-id with the IP address of the br0 bridge, and we should be able to ping from the server to the desktops again.
This may seem like overkill for a network with only one server and two switches, but we have laid the groundwork for building a fully meshed Clos network. This set-up, with only a few minor changes, scales to even the most demanding enterprise environments without significant increases in latency or decreases in throughput.
After around 30 seconds the routing table should be synchronised and you’ll be able to test the connections. This is best done with a nice cuppa and a few choice biscuits.
Tim Armstrong is the network architect at Nerdalize. He designs and implements datacentre and ISP networks. He’s a real control freak when it comes to bits.
Double-clicking a device that’s booted up opens the appropriate terminal.