STEVE CASSIDY
In an unlikely first, and probable last, Steve channels the wisdom of Donald Rumsfeld before heading off to meet DAFNI – and asking firewall vendors to break their promises.
It’s difficult to admit that you’re a fan of Donald Rumsfeld. Most people remember him as a warmonger, but it’s his comment about “unknown unknowns” and the nature of knowledge that will always stick in my mind. He even called his memoirs “Known and Unknown”, a nod to the mental tongue-twister he came out with to describe the hunt for weapons of mass destruction in the defeated state of Iraq.
To me, that appreciation of the division of everything in the world into things you can understand and affect, and things you can’t, is an admission of humility. I realise that a column in PC Pro isn’t where you’d normally come for analysis of late 20th-century political figures, but there is a link, I promise. This month, I’m looking at a part of the analytics business, having just walked through its hot breath, counted its cores in their tens of thousands, and tried to make sense of the promises I was hearing.
This is part of my increasing sense of unease about, and therefore deeper investigation of, the promises made on behalf of artificial intelligence. Long-term readers might expect me to trot out my early career experiences within this field, as an apple-cheeked youth attending the meetings of the finance industry club within the Alvey Directorate – a 1980s government project to appraise the scale of the threat posed by alleged “fifth-generation” Japanese expert systems. Not only was the scale of that threat overestimated, across all participating industries and sectors, but many of the clubs couldn’t get much interest in the systems they produced.
Against the background of that project, most of today’s press releases making claims about AI look a bit thin. The Alvey Directorate clubs worked on Intel 80286 PCs, a good five or seven generations back from the modern day: the idea that immense horsepower is required to deliver AI is hard to justify when the process of making decisions within the AI can be fitted into something as small and slow as a 286 CPU.
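If that sounds implausible, consider how small the machinery of a classic expert system really is. Here’s a sketch – in modern Python, with rules I’ve invented for illustration; the real Alvey-era systems were built in the likes of Prolog, but the principle is identical – of the forward-chaining engine at the heart of such a system:

```python
# A minimal forward-chaining rule engine - the core of a 1980s expert
# system - in a couple of dozen lines. The rules below are invented
# for illustration; production systems carried hundreds of them, but
# the machinery itself was no bigger than this.

rules = [
    # (conditions that must all hold, fact to conclude)
    ({"payment_missed", "account_new"}, "high_risk"),
    ({"payment_missed", "account_old"}, "medium_risk"),
    ({"high_risk"}, "refer_to_underwriter"),
]

def infer(facts: set[str]) -> set[str]:
    """Keep applying rules until no new facts emerge."""
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer({"payment_missed", "account_new"}))
# -> {'payment_missed', 'account_new', 'high_risk',
#     'refer_to_underwriter'}  (set order may vary)
```

Everything above runs happily in a few kilobytes of memory; the value was always in the rules, never in the horsepower.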
I’ve already managed to insert myself into the invite stream for a lot of supercomputing projects and organisations. Quite apart from the unlimited nerd appeal of data centres at the upper end of the budget scale, I’m intrigued by the comparison between what’s considered to be a “desktop” project, a “laptop” project, a “cloud” project, and then a “supercomputer” project.
For comparing a problem (or, if you want to be businesslike about it, a budget) across fields that appear to use the same resources, there’s nothing better than running analysis over chunks of data. The overall lessons remain the same across all the industries that are now producing ever more data, ever more tediously, without let-up. Suddenly, people who saw their future in sewage management are having to re-learn lessons from computer science as practised in the late 1960s.
So, even before I figured out where I’d be visiting, the invite to look at DAFNI from the STFC at RAL slotted neatly into one of my big interests for this year. That’s the Science and Technology Facilities Council, whose
operations are at Rutherford Appleton Laboratory, known to an older generation as Harwell. DAFNI stands for Data and Analytics Facility for National Infrastructure – a combination of hardware, software and skilled assistance that allows researchers to take their data analytics problem and give it a shot in the arm.
And it’s quite a shot. Going round RAL’s data suite soon after visiting the Swiss Supercomputing Centre (see issue 276, p120) was a fascinating comparison. The Swiss like large, water-cooled standalone machines hooked up to rather smaller industry-standard, diverse pools of storage devices. The British are almost the other way around, with a vast array of commodity, general-purpose servers (mostly Dell, from what I saw) dependent on a massive deployment of cloud hypervisors and management tools (Nagios featured heavily).
This is hooked up to a highly specific storage pool, which gives them the ability to load and unload datasets from different disciplines as cloud demand dictates. While the Swiss talked in terms of megawatts of capacity, the English had different benchmarks. They keep 18 petabytes available, and normally move about 3 petabytes a week. I saw management screens showing 23,000 cores spread across the estate of servers, and a wee box that said the average CPU load was 77% – although I saw nothing that linked those two statistics.
For anyone whose head is already spinning, I refer you to two pictures. One is of a handy chart (see left) of the relevant prefixes for scales of data, left on one of the cabinet doors in the data centre. There were many of these cheery signs on all sorts of topics, because they’d just had a visiting group come through, and signs make a better impression than the usual ranks of whirring machines.
The other picture is a bit of a sharp intake of breath for bandwidth geeks (see above right). It’s actually the main architecture cabinet, where all those Dell PowerEdges are linked up through 40Gbits/sec fibre switches to talk to each other and to the whole world. The whole world arrives on a little cluster of four fibres, more or less centre-bottom in my picture: that’s their 40Gbits/sec link to the rest of JANET, the UK’s intra-academic network. All the other fibres in the picture are linking servers with each other and with storage. I was going to take more pictures of the racks themselves, but the fact is these are Dell PowerEdges (they looked a bit like R-series 1Us to my eye) and you can see those anywhere.
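As it happens, those two numbers – 3 petabytes a week, and 40Gbits/sec to the outside world – line up more neatly than you might expect. A quick back-of-envelope check (my arithmetic, not theirs, assuming decimal petabytes and a perfectly even flow) shows why bulk dataset shuffling on that scale has to stay on the internal fabric rather than the JANET link:

```python
# Back-of-envelope: what sustained bandwidth does 3PB/week imply?
# Assumptions (mine, not DAFNI's): decimal petabytes, even flow.

bits_to_move = 3 * 10**15 * 8        # 3PB expressed in bits
seconds_per_week = 7 * 24 * 3600     # 604,800 seconds

gbits_per_sec = bits_to_move / seconds_per_week / 10**9
print(f"{gbits_per_sec:.1f}Gbits/sec sustained")
# ~39.7Gbits/sec - enough to saturate the entire 40Gbits/sec JANET
# link all week long, with nothing left over for anybody else.
```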
The implicit disagreement between the two supercomputer houses over which architecture suits best isn’t easily resolved. Both are recipients of raw data dumps from CERN, so both have a guaranteed background workload that’s already onerous. However, the Swiss seem happy to parcel out the workloads by allocating not just a core pool, but also an OS, to a particular requirement. The Brits, by contrast, treat the entire installed base of servers as one vast fabric of computation, from which people can request more or less arbitrary slices to do work on their particular requirement.
It’s a measure of the nature of the Harwell Campus that DAFNI’s work is actually a bit less glamorous, if anything, than that done by the Diamond Light Source or any of a dozen other resident UK research projects. Driving in through the ancient maze of airfield roads, passing buildings from every decade since the site was taken over, you half-expect to see M taking Bond round an autogyro, or Austin Powers having the Union Jack on his Jag touched in.
Let’s stick to DAFNI, though. The trail of logic from its announcement (read it at pcpro.link/277dafni) is that answering tough questions about systems is best done by stopping workers in the various fields from developing their own measurements, algorithms and reporting. Instead, it’s better to get your data to be DAFNI-ready, and then benchmark not just your particular data sources, but your entire segment’s overall disaster and extreme event resilience, by submitting that data to a standardised and centralised appraisal.
This isn’t quite what I was expecting. I’m reasonably au fait with the way that large urban projects are modelled, at least when it comes to their financial requirements, but clearly here the net was being thrown far wider. For one thing, the assertion is that smart cities, and sensor-orientated design and analysis in many quite narrow fields of research, are all uniformly appraisable using the same basic chunks of logic. For another, the DAFNI team threw out a casual aside that made me want to stand up and cheer. They said that quite a lot of researchers around the subject of infrastructure do their work on a single desktop PC.
Seldom have I found a snippet of validation so satisfying. When writing about such big projects, I always try to relate the challenges and the lessons back to phenomena you can experience on your own machine. Here’s a pre-baked community of people who are being approached because the work they need done can be scaled up, from that one humble machine to an array of CPUs counted in the tens of thousands, running at about 1.2 megawatts.
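To put that scale-up in desktop terms, a naive division is revealing – though note these are my own rough numbers, and the 1.2 megawatts presumably covers storage, networking and cooling as well as CPUs:

```python
# Rough scale comparison: one researcher's desktop vs the whole estate.
# Assumptions (mine): the ~1.2MW figure covers the entire estate, so
# watts-per-core is an upper bound; 150W is a guess at one desktop
# under load.

estate_watts = 1.2e6
cores = 23_000
desktop_watts = 150

print(f"~{estate_watts / cores:.0f}W per core")            # ~52W
print(f"~{estate_watts / desktop_watts:,.0f} desktop PCs")  # ~8,000
```

Call it eight thousand desktops’ worth of electricity available to do one humble machine’s sums.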
Oddly, the DAFNI team was slightly shy about how many of its applicants were in this category, but I think this is par for the course. My own experience of the emotional consequences of a sudden performance boost is similar. I once nudged a researcher to give up on his Pentium D HP workstation – running beautifully, but by then eight years old – and have a go with a Nehalem-series Xeon. His run times went from 15 minutes per scenario to no detectable delay. Incredibly, he didn’t do a victory lap of the office when this result came in: he hid it away, still using the hot old Pentium D as his email and web-surfing machine.
The same reaction came much earlier in my career, when we took a chunk of BASIC that ran overnight to produce a single output number on the standard IBM PC, and adapted it to compile on a somewhat larger VAX cluster minicomputer. Run times were mostly within the time it takes to say “one two three GO”; unfortunately, there were up to 60 other users on the VAX at the same time. So the next meeting at the Treasury on this subject was enlightened not by 20 or so runs off a wheezing IBM PC, but by well over 2,000, courtesy of the VAX. Again, sheepishness was the predominant reaction.
The burning question posed by the headline of DAFNI’s introductory piece is: does more horsepower equate to better resilience to extreme events? I think the answer, in this era of intolerance for expertise, has to be frank and transparent: “not directly”. There will be a bureaucratic benefit: a kind of model benchmarking becomes possible, giving researchers some idea of whether they’re under- (or over-) achieving on the completeness of their model, or on how it compares with other models.
It’s a long road from a model of a city’s water supply and sewerage system, to a plan that keeps them separate during catastrophic flooding. Likewise, a view of the entire domestic UK internet may have certain philosophical similarities to the water/sewage design – but that doesn’t mean that the weaknesses popping out of the model are easy, obvious or even cheap to fix.
Which is why I always bear Rumsfeld’s quote in mind when it comes to extreme events and extreme computing. It isn’t the known knowns that are the problem, nor even the known unknowns. Those can be put in the model and given some numbers. It’s the unknown unknowns that play the largest part in determining the extreme status of an event, and no amount of computer power is going to help with those.
Promises in security
Another visit this month was to the new City of London offices of Juniper, the firewall and networking company. Juniper has been encroaching on Cisco’s home turf for 10 to 15 years now: a fight up in the enterprise network border security world, which we humble toilers in the small-to-medium business sector don’t really have to listen to much.
Except that vendor-to-vendor isn’t the only fight these guys are in. They have the challenge of being the perceived front line of defence against each hack attack that comes along, which entails a certain economy with the truth. If you’re a firewall OS vendor, then your customers will divide into constituencies when it comes to the announcement of a new hole. There will always be those who fly into an instantaneous panic; those who immediately replace your equipment with that of your main competitor; those who sit tight; and those who utterly ignore your announcement. It’s this spread of responses that makes the whole topic of security so deadpan, so cautiously worded, that the necessary impact never quite arrives.
The problem Juniper seemed to understand best, though, is the matter of the complete industry right-turn. Many firewall vendors make promises; they have to – it’s the main method by which they can demonstrate their ability to compete. The problem is that the security threats have no obligation to keep their attacks within the terms of reference of a promise made by their opponents.
In short, this produces a series of statements from the vendor that have a shelf life of about half a year. Yet customers still make their security bod’s life a misery by pointing out that, last year, the hot protection tech was all about unsolicited, possibly half-complete IP packets bashing on the firewall from outside. “Here’s the sheet of paper,” says the CEO. “You told me before Christmas that product X was the bee’s knees and that we’d be okay. And now you want something else…”
This is the tough part of communicating how much of a paradigm shift this business has been through. For one example, the Juniper spokespeople said as an aside that they now have to cope with gangsters rich enough, and well enough protected, that they can afford to buy example firewalls at list price, take out support contracts, and then bash on those machines at their leisure in their labs, looking for ways in. So the firewall software has to take a leaf out of the Volkswagen book and carry extra code that figures out when it’s being tested, and whether to call home and cry for help.
Or call elsewhere. A long conversation about the difficulty of sizing a bit of kit to do traffic analysis for ransomware ended up with a reference to IBM Watson: buy Juniper’s smart analysis suite and start asking it questions, and what you’re actually talking to is a network-trained variety of Watson. There’s a shift in the basis of function for you, right there. And Juniper’s own reaction to this is symptomatic of the problem – it’s at once pleased to be able to do such a thing, and somewhat embarrassed that the earlier iterations of its product line let the bad guys advance so far that it became necessary.
I think the entire industry will have to draw a line under all those earlier promises, just to make the conversations easier and clearer for the much larger audience that must now pay attention. Jargon and hype don’t secure your bank account, your personal data or your communications.