Linux Format

The bot hunter

Chris Thornett has unplugged his IP camera and buried it in the bottom of his garden after meeting Pascal Geenens…


Anyone working in cybersecurity will say that the attackers have an unfair advantage. This is why the use of machine learning and more advanced deep-learning systems is seen as the Holy Grail by some in the field. Back in September 2018, Pascal Geenens was at the Linux Foundation's Open Networking Summit Europe in Amsterdam to talk about automating the detection and mitigation of attacks as EMEA Cyber Security Evangelist for Radware. He's also, as we discovered, an avid botnet hunter and honeypot keeper.

Linux Format: You mentioned the use of machine learning by cybercriminals in your talk. Is there a sense that security automation can restore the balance?

Pascal Geenens: Looking at automation in general terms (and I include scripting in that), automation has been used for a long time by cybercriminals. It's always been a problem. If you look at our area in DDoS protection, for example, when you have a DDoS attack the idea is to detect it, then characterise it, then you create a signature, you put it in place and block the attack. Now if the attack stays the same for the next two to three hours that's a good thing, but through automation hackers can change the attack vector. They'll write a script and change the port, they change from UDP to TCP, they use other attack mechanisms and switch from amplification to reflection and other attacks, so they change the attack vector continuously.

You can have detection with something like NetFlow (traffic profile monitoring technology), for example, and based on that I have a security operations centre that sits there, which is going to put in place the signature to block that attack. But by the time they investigate – they look in the pcap, the packet capture itself, characterise the signature and put it in place – the attack has already moved on to another vector.

More recently, because of all the IoT botnets that have been going around, we see a lot more burst attacks. These are very small attacks, but with a high amplitude and a short timeframe: they last six seconds. They bring in a spike, they let it drop off and the whole infrastructure is on the verge of falling over, and by the time it recovers, they bring in another spike.

Now the reason they do that is because, typically, some customers might be using SNMP, for example, for detection. But if you use SNMP you're polling every five minutes, and a small spike of six seconds gets averaged over those five minutes, so you don't see that threshold of 80 per cent of your bandwidth being consumed, even when the packets per second are high enough to cause a denial of service. So you can't detect it, and if you can't detect it, you also can't divert it to the cloud to actually block the attacks.
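To see why five-minute polling averages away a six-second burst, here's a back-of-the-envelope sketch. The link speed, the quiet baseline and the exact burst amplitude are illustrative assumptions, not Radware figures; only the six-second burst, five-minute poll and 80 per cent threshold come from the interview:

```python
# Hypothetical numbers: a 1 Gbit/s link, a quiet 5% baseline, and one
# 6-second burst at 80% of line rate inside a 5-minute SNMP polling window.
link_bps = 1_000_000_000
baseline_bps = 0.05 * link_bps
burst_bps = 0.80 * link_bps
burst_s = 6
poll_s = 300  # SNMP polled every five minutes

# Bits transferred during the window: quiet traffic plus the one burst.
bits = baseline_bps * (poll_s - burst_s) + burst_bps * burst_s
avg_util = bits / (poll_s * link_bps)

# The averaged counter stays far below an 80% alert threshold,
# even though the link really did hit 80% during the burst.
print(f"peak: 80.0%  five-minute average: {avg_util:.1%}")  # → 6.5%
```

The burst saturates the link, yet the polled average barely moves, so a threshold-based monitor never fires.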

That's one thing, but it also forced us to change our protections because, while they're completely automated, what we do is characterise the attack. It takes us 15 seconds between detecting an attack and putting the automated signature in place. Of those 15 seconds, the attack only lasts six: it starts the whole cycle and then it drops off. And because we're DDoS we're stateless. We don't want to save state, because resource consumption is one of the typical attacks.

The next wave comes in, we start characterising and before we get there… boom! It's gone again. So now we have to change that and put up defences: whenever we detect it, we start to rate limit and we characterise it for six seconds. We save the state, and the next time that attack comes in we continue where we left off, until we have a full 15 seconds. That's something that's done through automation. What they do is they have their script for their botnet, they have their central control, they write a script – 'Okay, do an attack for six seconds' – and they send it to the bots. All the bots attack the same target at the same time for six seconds and then they drop off.
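The stateful approach Pascal describes (rate-limit immediately, keep the partial characterisation, resume on the next burst) might be sketched like this. The 15-second figure comes from the interview; the class, method names and burst pattern are assumptions made for illustration:

```python
SIGNATURE_NEEDS_S = 15  # seconds of attack traffic needed for a full signature

class BurstCharacteriser:
    """Toy model: accumulate observation time across short bursts."""

    def __init__(self):
        self.observed_s = 0          # state carried between bursts
        self.signature_ready = False

    def on_burst(self, duration_s):
        # In the real system the traffic would be rate-limited here; we
        # only model carrying the characterisation state forward.
        self.observed_s += duration_s
        if self.observed_s >= SIGNATURE_NEEDS_S:
            self.signature_ready = True

c = BurstCharacteriser()
for burst_duration in (6, 6, 6):   # three six-second bursts
    c.on_burst(burst_duration)
print(c.signature_ready)  # the third burst completes the 15 seconds
```

A stateless version would restart from zero on every burst and never reach a full signature, which is exactly the failure mode described above.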

LXF: Is automation restoring the balance?

PG: Yes, because without automation we would never be able to survive those kinds of attacks, even from script kiddies. It doesn't have to be a seasoned attacker, because they can send any attack anywhere. They just put it in the script and perform the attacks, especially on DDoS. It's very easy. So automation in that area is massively important.

LXF: One of the things you talked about was good training data…

PG: There are some open source projects out there that have data, but that's not relevant for your network. You could have a log of events, and it also depends on what kind of machine learning or what kind of AI you want to build. If you want to base it on events from Windows, or if it's going to be network traffic logs – pcaps, for example, or NetFlow information – all that information looks different. And that's one of the painful things about AI and the whole of data science: you need to normalise your data.

That's not the fun part of the modelling. It's the normalisation that takes most of the work: identifying the data and then putting it in the same format, so that you can push it through one model and do the actual work with it. That's the hard part, and then you need to label it.

There are two things I didn't talk about in my presentation: supervised and unsupervised learning. With supervised, you need labelled data. You have images that say this is a cat, this is a dog. You feed it into the network for learning by example and, if the fit is good, it'll generalise: you can show it a new image and it will tell you whether that's a cat or a dog. Now unsupervised, that's unlabelled data. Here, you're asking the machine learning to find structure in the data, which can be very interesting – especially for networks.

Typically, that's not deep learning; that's more the traditional machine learning that you're using: k-nearest neighbours, for example. It's a very simple algorithm. What it does is take each point and find the k nearest points in the space. By doing that on every sample you find the nearest points, and you start to cluster the data into multiple clusters. You don't have a label for each cluster, but you do know that when a new point comes in it's close to one of the existing clusters, and you can say which cluster it's closest to.

This means it can help you understand the data; it can help you find anomalies; and it can help you build a newer model where you can label the data. Using a cluster you can say, for example, label all those points as good, and that label is, for example, Office 365. Then you have labelled data, and that can enable supervised learning to take over.
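As a toy sketch of that unsupervised-to-supervised hand-off (the clusters, points and the "Office 365" label are all invented for the example; assignment is by nearest cluster centroid, a deliberate simplification):

```python
# Two clusters of 2-D "flow features", as a clustering pass might produce.
import math

clusters = {
    "A": [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8)],
    "B": [(8.0, 7.5), (7.8, 8.1), (8.3, 7.9)],
}

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

centroids = {name: centroid(pts) for name, pts in clusters.items()}

def nearest_cluster(point):
    # A new sample inherits the cluster whose centroid is closest.
    return min(centroids, key=lambda name: math.dist(point, centroids[name]))

# An analyst labels cluster "A" as known-good traffic.
labels = {"A": "Office 365", "B": "unlabelled"}

print(labels[nearest_cluster((1.0, 1.1))])  # → Office 365
```

Once every cluster has a label, the same points can be fed straight into a supervised model as training data.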

LXF: So if readers are interested in trying it for themselves, are there any particular frameworks you would suggest?

PG: TensorFlow is probably the best known and Azure [ML] is pretty elaborate, but the interesting part is the playground at TensorFlow. If you go to https://playground.tensorflow.org you can see a visual representation: you can feed in data and then you'll see how the model evolves through different epochs, because the model goes through training, and that training is you actually giving it samples. Let's take again the example of using images. You give an image to a model, it calculates and it comes out with some measure. The first time


you put an image into a virgin model, of course, that measure will be wrong. So what you need to do is change and tune the different weights on the links between the perceptrons. Those weights… you're going to tune them so that the outcome is correct.
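That weight-tuning loop can be sketched with a single perceptron and plain gradient descent. All the numbers are toy values and the single-sample training set is a deliberate simplification; the point is only that a virgin model's first answer is wrong, and repeated weight nudges drive it to the target:

```python
# One "perceptron" with two inputs and a bias, trained on one sample.
import random

random.seed(0)
w = [random.uniform(-1, 1), random.uniform(-1, 1)]  # untrained weights
b = 0.0
lr = 0.1  # learning rate

def predict(x):
    return w[0] * x[0] + w[1] * x[1] + b

sample, target = (0.5, -0.3), 1.0
first_guess = predict(sample)  # the virgin model: almost certainly wrong

for _ in range(200):  # each pass plays the role of an epoch
    error = predict(sample) - target
    w[0] -= lr * error * sample[0]  # nudge each weight against the error
    w[1] -= lr * error * sample[1]
    b -= lr * error

print(round(first_guess, 3), round(predict(sample), 3))
```

After a couple of hundred updates the output sits on the target; the playground animates exactly this process over many samples at once.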

LXF: If you had a small Linux-based network, what kind of data would be good to feed in? You're going to say it depends, aren't you?

PG: [Laughs] I know, but if you want to play around with it… For me at home, the way I started was through IoT and botnets, and now more machine learning and AI, because those are the algorithms I need to analyse the data. I started out writing my own honeypot, then back in October 2016 the Dyn attack happened. Somebody told me every two minutes there's a botnet knocking on every IP address in the world. I said, if that's true, I need to see that at home. So I started listening – simple telnet in the beginning – and I saw it's true. It was every three minutes, give or take.
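A first listener of the kind Pascal describes can be very small. This sketch is an assumption on our part, not his code: it accepts one TCP connection on an unprivileged port and logs where it came from. A real honeypot would loop forever, speak enough telnet to capture attempted credentials, and store everything:

```python
# Minimal connection logger: bind a TCP port, accept one connection,
# and record the scanner's source address. Port 2323 is used because
# binding real telnet (port 23) needs root privileges.
import socket

def listen_once(host="0.0.0.0", port=2323):
    """Accept a single connection and return the peer's IP address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(1)
        conn, addr = srv.accept()
        conn.close()
        print(f"connection attempt from {addr[0]}")
        return addr[0]
```

Running something like `listen_once()` on an internet-facing box and timing how quickly strangers connect is essentially the experiment described above.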

I started getting more elaborate with the honeypot, putting in more protocols, and now I'm sitting on large amounts of data, which is still restricted because it's my home. But we took that same idea and put it into the company, and we built honeypot networks; now we have more than 300 honeypots across the world.

LXF: The same thing but on a bigger scale.

PG: Yes, there we get, like, 10 million events per day and those events must be analysed. You can do some simple training, but you're talking big data. So from there we started looking into machine learning and AI. I was already doing it before, but now I'm doing it more actively, and at home I use my honeypot data, which is interesting data because it's all bad data. However, that's not something you want to use for your security per se, because it's more about finding structure in the data: finding what is the Mirai botnet, what are the Hajime botnets, what kind of other attacks are they using, or new URLs they're using with new vulnerabilities. So that's more about classification, right? You want to classify botnets so that you can track, for example, how big is Mirai today? Or how many variants of Mirai did we see in the past two years?

LXF: Sounds like an excellent Message of the Day opportunit­y.

PG: There are so many Mirais, because it was open sourced. So we've had hundreds of variants in the past two years, and the latest one came out with 16 exploits built in. So that's one area where it's very interesting to look at the data and use clustering, but I typically use standard machine learning; I don't go into deep learning for the AI, because there are so many surprises you can get when you just run it without knowing what you're looking at. There are so many pitfalls. Overfitting is the number one issue that you see: I want to train my model, here's my example data, fit it in the model, and you see your error rate go down.

Your error's 0.001 per cent – great model! The first time you give it a real-life sample the error explodes! Why? Because of overfit. Your model is going through all the points perfectly, so when you take a point that's not exactly in your data set, it will give you a big error. That's the problem with overfitting, but when do you know that you're overfitting? Those are all the things I didn't talk about in detail.
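A deliberately silly caricature shows the shape of the trap: a "model" that memorises its contrived training set scores perfectly on those points and falls over on the first unseen one. Everything here is invented for illustration:

```python
# Three labelled training points and a model that fits them perfectly
# by memorising them - the extreme case of overfitting.
train = {(0.0, 0.0): "good", (1.0, 1.0): "bad", (0.1, 0.9): "bad"}

def memorising_model(point):
    if point in train:   # every training point: a perfect answer
        return train[point]
    return "good"        # anything unseen: a blind guess

train_errors = sum(memorising_model(p) != label for p, label in train.items())
print("training errors:", train_errors)                    # → 0, looks great
print("unseen (0.9, 0.9):", memorising_model((0.9, 0.9)))  # wrongly "good"
```

The point (0.9, 0.9) sits right next to a "bad" training point, yet the memoriser calls it "good": zero training error told us nothing about real-life error.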

I showed you [in the talk] one deep-learning model which was multistage, one stage connected to the other, but you have a chart with all the different models: some are circular models, others have feedback loops. You have the LSTM, for example – long short-term memory. Typically, this is used for anything that's a sentence, where you have multiple words: you need to take into account a little bit of history of the previous words to be able to interpret the next word, whether that comes in a command or a sentence. You need that feedback loop that feeds little bits of history back into the perceptron, so that it can make a decision based not only on the data you present now, but also on the previous data. That's typical in speech, and everything that's time-series will more or less use LSTM.

Okay, those are little things that you need to know. If you don't have the experience, you start with your standard neural network and then it's not working. Why is it not working? Or how do you get your data into features? An image is easy: you take all the points in the image – every point has an RGB value – and you make every point a feature, so every point is a link into a perceptron, and from there you just convolve your network down into something that comes out of it.

Now, you can do it like that, but if you do then you'll have a very expensive network in computing terms, because if you have 1,000 by 1,000 points times three values, [laughs] then that's a lot of nodes. So everything is computed in a matrix. However, that's also the funny thing about those networks: in your mind you see it as something that goes from one perceptron to the other, but actually it's calculated in just one step. They just make a matrix out of this.

The input is an input vector, you multiply it by the weight matrix and you get the output. It's this matrix multiplication that makes it a hard mathematical problem. To do that on one million points… well, that's going to take a while if you want to do it at home on your Raspberry Pi [laughs].
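That "one step" is just a matrix-vector product. With toy numbers (nothing here is real network data), a layer of two perceptrons over three inputs collapses into a single multiplication instead of being evaluated perceptron by perceptron:

```python
# A layer of two perceptrons with three inputs each, computed in one
# matrix-vector multiplication.
def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

x = [1.0, 0.5, -1.0]        # one input vector of three features
W = [
    [0.2, 0.4, 0.1],        # weights into perceptron 1
    [0.7, -0.3, 0.5],       # weights into perceptron 2
]

layer_output = matvec(W, x)  # both perceptron outputs at once
print(layer_output)
```

Scale the same operation up to a 1,000×1,000×3 image and millions of weights and you see why GPUs, which are built for exactly this multiplication, matter so much more than a Raspberry Pi.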

You can actually do it in 10 lines of code – that's what I'm saying. It's so accessible, but at the same time it's such a black art when you don't know what's happening.

For identification of a cat or dog, or facial recognition on Facebook, who cares if it messes up – nobody really cares. But if it's facial recognition like they used in China to put people in jail, to arrest people in the street because they're criminals – they had 80 per cent false positives.

LXF: Do you feel manufacturers in particular have learnt their lesson when it comes to improving IoT security?

PG: I think the manufacturers have caught up – some of them at least have tried to put security in their IoT devices – but the problem is that today we're still sitting on the legacy. We have all those older devices: all the modems with old firmware that don't get updated; IP cameras that aren't being updated.


LXF: There's a strong sense that this is to do with legacy devices?

PG: If [new devices] ship with new firmware, and if users update the firmware on their existing devices, then most of the vulnerabilities being exploited today will be fixed. When a security researcher releases a vulnerability, typically they will have gone through talking with the manufacturer and waiting 90 days before there's a patch out. So when the patch is out, that's actually the window where the hackers come in. They just copy the vulnerability and put it in their bots, because they know that most users won't update – certainly not within one day. So if there's a new big vulnerability, it's not more than 24 hours before a new botnet is abusing it. We'll see it in our honeypots trying to exploit that vulnerability.

We see vulnerabilities from D-Link routers from four years ago still being used, and they have some success. So a vulnerability that's only a day old is a gold mine in IoT terms, because it will probably stay active for two or three years – until people throw away their modem, or their modem breaks, or for some reason they do a firmware update because their new iPhone doesn't work anymore. They need an incentive to update, right?

LXF: Not getting their identity stolen should be a good incentive…

PG: If your identity is stolen, is the first thing you do to update your IP cameras or your modem?

LXF: Last year you covered the use of a telnet exploit and how old technologies are often used as a way in.

PG: That's what the original Mirai used. That was probably about BrickerBot, and BrickerBot is something that can get people to care, because it was a botnet but not a typical worm. Mirai is more like a worm: it infects a device. So you have your IP camera, and the IP camera is open on the internet. How does it become open? Well, either someone installs it on a public hotspot, or people install it at home, but there's a convenient protocol called UPnP that automatically opens a pinhole in the firewall so that you can use your phone from outside and access your IP camera; that way it's publicly accessible. What Mirai did was telnet on to publicly accessible devices, mostly IP cameras and DVRs. What Mirai used was simply 61 default passwords.

With just 61 passwords, Mirai was able to infect hundreds of thousands of devices around the internet. That was just telnet. Now, when Mirai infects a device it starts to scan for new victims and tries to exploit them through Telnet. If it finds that it can access one with Telnet, it sends the IP to a loader server, and that loader then downloads Mirai onto that device. That's the reason why every two minutes every IP on the internet gets scanned: every infected device on the internet starts scanning for new victims, which means you have exponential growth in the number of infected devices.
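The technique is as blunt as it sounds. The harmless simulation below walks a short credential list against a fake login check. Credential pairs of this kind were widely reported in Mirai's 61-entry list, but the login function and "device" here are invented for the example; nothing is contacted over the network:

```python
# Simulated default-credential sweep - no networking, nothing attacked.
DEFAULT_CREDS = [
    ("root", "xc3511"),   # pairs of this kind were in Mirai's list
    ("root", "vizxv"),
    ("admin", "admin"),
]

def fake_device_login(user, password):
    # Stand-in for a telnet login on a camera still on factory settings.
    return (user, password) == ("admin", "admin")

def sweep(login):
    """Return the first working credential pair, as a bot would report it."""
    for user, password in DEFAULT_CREDS:
        if login(user, password):
            return (user, password)
    return None

print(sweep(fake_device_login))  # → ('admin', 'admin')
```

In the real worm, a hit like this is what gets forwarded to the loader server so it can push the Mirai binary onto the device.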

Now BrickerBot – and indeed the actor behind BrickerBot, who called himself the 'Janitor' and who I spotted using my honeypots when he came out to one of the journalists – what he does is not actively attack devices, but listen. What he wanted to do was purge the internet of infected devices, so he was listening, and when he saw somebody scanning him, trying to access him over Telnet with one of those passwords, he knew that was an IoT device infected with Mirai, right? So he was counter-attacking it.

He'd use a list of exploits to try to get on to the device, and then run a set of commands to corrupt the flash. Most of the flash in an IoT device is read-only; however, there's a small partition that needs to be writable, where you save the settings. If that one gets corrupted, most of the cameras can't recover from it.

LXF: What do you think of the likes of the Janitor and what they did?

PG: Hajime is one of those bots that existed before the Dyn attack; two days before the Dyn attacks, the first report on Hajime came out. Hajime is a botnet like Mirai, but Mirai is very unsophisticated. Its command and control is just TCP with fixed IP addressing: just two strings in the Mirai binary give you the IP address and the port of the command-and-control server, and you Telnet to the command-and-control server.

Hajime is much more sophisticated, because it was using the BitTorrent network with dynamic info_hashes. Every day it changed the info_hash, and the info_hash is how you locate a file. You had dynamic info_hashes building an overlay that moves every day on top of BitTorrent, with RC4 and private-public key encryption. So it has a distributed command and control that could update the botnet itself and exchange configuration.

A typical botnet can be taken down by eliminating its command and control, but not so with Hajime. It's distributed, so you can't just take it down. It's still out there and at some point had 300,000 modems under its control. But it also protects itself from other bots. Why? Because it's like a war out there.

You have hundreds of hackers trying to build a botnet for their booter and stresser services, for DDoS attacks or mining. There's lots of competition – and luckily for us, because that makes the "market" very fragmented, which means that bot sizes are like 100,000 or 200,000. Now coming back to BrickerBot and Hajime…

LXF: I was going to ask about the Janitor.

PG: I understand why he wants to do it, but I don't agree with his methods, because he's breaking things and impacting businesses, and he doesn't care about that. There was a smaller Californian ISP that was impacted by BrickerBot because its modems got infected by Mirai and started to search for other victims; BrickerBot found them and started to attack those modems.

Customers called saying, "Hey, I can't access the internet." They investigate: "Oh, your firmware is corrupted, bring in your modem. We'll give you a new one" – until so many users came in that their stocks were depleted. BrickerBot also apparently impacted Argentina – millions of mobile devices – and in India 60,000 modems were corrupted. So I don't really agree with his practices. Also, he has a number of blogs where he calls his project 'internet chemotherapy' [...] purging the internet of all the bad IoT devices, which is… mmm, we have a problem, that's true, but going about it like this is not the right way in my opinion.

Pascal stressed the importance of training data at the conference.

Pascal doesn't approve of the Janitor's antibot methods.
