The great escape: how to move from one firewall to another (or use several at once)
Come close. This is a private chat between firewall administrators, while the rest of this month is a private chat to home internet workers. I know, I’ve argued that communications on this topic are better when they are open – but that’s a perfect-world aspiration, and I’ve not met a more beleaguered group in 2020 than the small tribe of firewall administrators.
Besides, this was my proudest hack of the lockdown. You may remember the beginning of this tale from four months ago, at the start of the lockdown, when I shared the story of a client whose ancient SonicWall firewall was maxed out in terms of home user VPN connections (see issue 310, p122). It was also a royal pain to live with, having strange behaviours around user accounts and passwords stored inside the device that almost guaranteed a visit to the office three times a week.
This device and the pool of VPN users it served had been adequate to the needs of the business for the previous five years. Only once lockdown started did it become apparent that extra connection licences, or indeed updates, were really not going to be forthcoming. That special kind of misery that comes from having to follow stupid, mean-spirited and arbitrary rules to do with licences seemed to beckon, especially considering that the client had the replacement firewall – a WatchGuard – ready.
The problem was that in normal times, it could have gone without VPN capability for up to a week, maybe even two, while we swapped the old device out and inserted the WatchGuard. In lockdown, though, that rule was toast. Any interruption to VPN services could be a material threat to the business.
So how do you drop in a new firewall, which in most deployments must take over as the default outbound gateway and the incoming general traffic direction device? It looks like a catch-22. You can’t change the incoming address for services without creating an open loop, where the outbound packets traverse a different device from the inbound ones. A firewall that sees a reply to a conversation it never saw begin tends to treat it as suspect, so this arrangement is frowned upon by firewall coders, frequently causing those precious external packets to head straight for the bit-bucket.
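For anyone who wants to see the open loop in miniature, here is a toy sketch in Python. It is not how any real firewall is built: the StatefulFirewall class and the addresses are invented for illustration, and it only models the one idea that a device will accept a reply only if it saw the matching request go out.

```python
class StatefulFirewall:
    """Toy model (hypothetical): tracks outbound conversations only."""

    def __init__(self, name):
        self.name = name
        self.connections = set()  # (inside address, outside address) pairs

    def outbound(self, src, dst):
        # Remember that an inside host opened a conversation.
        self.connections.add((src, dst))

    def inbound(self, src, dst):
        # Accept a reply only if this device saw the original request.
        return (dst, src) in self.connections


old_fw = StatefulFirewall("old")
new_fw = StatefulFirewall("new")

# An internal server (made-up address) sends a request out via the old firewall.
old_fw.outbound("192.168.1.50", "203.0.113.9")

# The reply is fine if it comes back the same way...
reply_via_old = old_fw.inbound("203.0.113.9", "192.168.1.50")  # True

# ...but dropped if it arrives at a device that never saw the request.
reply_via_new = new_fw.inbound("203.0.113.9", "192.168.1.50")  # False
```

The second result is the bit-bucket case: nothing is wrong with the packet, it has simply turned up at the wrong door.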
The eventual solution was a strange mixture. In a field of IT filled with wizards and walkthroughs, it was two completely separate settings that came together to allow multi-gateway, multi-firewall, multi-VPN capability. No extra licences required, nor any baroque entanglements with certificates. The first simple setting was static routes: each firewall could declare that a particular network (public, private, whatever) was being handled by some other router. Fill in the form to define the network and name the router, and henceforth any packets arriving on that device intended for a routed subnet would be sent on their way, no more interference required. The customer already had a few branch offices, so this feature was actively in use, keeping the inter-server traffic flowing up to the branch, and back out again.
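What that static route form boils down to can be sketched in a few lines of Python using the standard ipaddress module. This is an illustration of the lookup, not either vendor's implementation; the next_hop function, the branch subnet and the router addresses are all assumptions made up for the example.

```python
import ipaddress


def next_hop(routes, default_gateway, destination):
    """Return the router that should receive a packet for `destination`.

    `routes` maps a network to the router that handles it. The most
    specific matching network wins; anything unmatched goes to the default.
    """
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routes.items() if dest in net]
    if not matches:
        return default_gateway
    _, hop = max(matches, key=lambda item: item[0].prefixlen)
    return hop


# Hypothetical static route: a branch office subnet reached via another router.
routes = {
    ipaddress.ip_network("192.168.20.0/24"): "192.168.1.254",
}

branch_hop = next_hop(routes, "192.168.1.1", "192.168.20.14")
internet_hop = next_hop(routes, "192.168.1.1", "198.51.100.7")
```

Here branch_hop comes out as the branch router and internet_hop falls through to the default gateway, which is all a static route really promises.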
Part two of the fix was realising that each pool of single-PC VPN clients, on either firewall, would look like another branch to the rest of the system. Each firewall allocates an essentially private address to any VPN client machine that connects successfully: that is, an address that helps to keep the routing tables together inside the firewall, but which is nonetheless routable by the rest of the wider network.
Combine these two features and suddenly you have a seamless migration. If each firewall’s posse of laptop VPN machines is in effect a virtual branch office, irrespective of where those machines are located, then they can be addressed as a group by a static route on the firewall that’s not responsible for their management. Suddenly, even internal servers using the old firewall can route traffic to migrated VPN users presented via the new firewall. It didn’t even matter whether the two cooperating firewalls were from the same vendor or were using the same connection protocols for their clients: the two tribes of client machines were identified solely by the synthetic subnet assigned by the firewall config, and could have been as different as Macs and PCs, or Android phones and Windows 8 phones. They were sorted by subnet alone.
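To make the combination concrete, here is a rough Python model of the two firewalls treating each other's VPN pool as a branch office. Every address and name in it is hypothetical (the client's real subnets aren't given here), and hop_for is a made-up helper, not a command on either device.

```python
import ipaddress

# Hypothetical LAN addresses and VPN client pools for the two firewalls.
OLD_FW = {"lan_ip": "192.168.1.1", "vpn_pool": ipaddress.ip_network("10.10.10.0/24")}
NEW_FW = {"lan_ip": "192.168.1.2", "vpn_pool": ipaddress.ip_network("10.10.20.0/24")}


def cross_routes(peer):
    """The static route a firewall needs to reach the peer's VPN clients."""
    return {peer["vpn_pool"]: peer["lan_ip"]}


def hop_for(local, peer, destination):
    """Where `local` sends a packet addressed to `destination`."""
    dest = ipaddress.ip_address(destination)
    if dest in local["vpn_pool"]:
        return "local VPN tunnel"  # one of this firewall's own clients
    for net, hop in cross_routes(peer).items():
        if dest in net:
            return hop  # hand it to the other firewall
    return "default route"


# The two pools must not overlap, or sorting by subnet stops working.
pools_distinct = not OLD_FW["vpn_pool"].overlaps(NEW_FW["vpn_pool"])

# Traffic arriving at the old firewall for a migrated user goes to the new one.
to_migrated_user = hop_for(OLD_FW, NEW_FW, "10.10.20.7")

# And the reverse holds for users still on the old firewall.
to_legacy_user = hop_for(NEW_FW, OLD_FW, "10.10.10.7")
```

Nothing in the model knows or cares what kind of client sits behind each address, which is the point: the subnet is the only label that matters.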
I have to report that the in-house IT manager is a long-time subscriber, and I’ve been popping in to see him for nearly two decades now. Never before have I seen him so unable to hide his amazement that this combination of settings and architecture actually worked, first try and without an annoying midday device reboot. That’s what elevates it from a noteworthy idea to a great hack.