Linux Format

Part 2: ELK stack

Carrying on from last issue, Jolyon Brown looks at visualising his captured data with Kibana and improving its quality with Logstash.

-

Last issue, I set up a simple ELK stack to capture operating system logs in a central repository. The aim of this is to assist sysadmin teams – having logs gathered in one place makes it much easier to quickly take a look at what’s going on, rather than having to SSH into every single system trying to track down a problem. It’s also very handy from an audit perspective (and indeed, it’s a prerequisite in a lot of secured environments). So now that we have the ability, how do we actually do anything with it? And can we do anything to improve the quality of the data we take in?

Back to the Copa… Copa Kibana

Going back to the Kibana screen I set up last month (on the Discover tab), the first thing to note is the set of date and time options at the top right-hand side of the screen. It can be easy to get confused with Kibana at first when no data is being displayed; these options are your friends. When you’re looking to get to the bottom of an issue, being able to quickly choose the last 15 minutes or so of logs is very handy. The text search box is where we can input searches – this is a free text field, so you can enter terms like “reboot” to see which idiot rebooted your production server (me, in this case). Down the left-hand side the available fields will be shown (this sidebar can be collapsed and expanded). In my example case these will be things like beat.hostname, message, source etc. I can click on any of these – source, for example – and Kibana will display which logs make up the data it knows about in the time range specified. It also shows what percentage each log contributes to the overall picture. I can click on the magnifying glass icon with a plus sign to drill down into these logs (and the one with a minus sign to remove them from the search). Quite often tracking down an issue is a case of filtering like this and performing iterative steps. This is similar to (although a bit quicker than) the long pipelines consisting of Awk, grep and cut commands I often find myself creating when investigating logs that way.

There is a whole set of syntax for searching with these fields. For anyone coming from Splunk (broadly the proprietary equivalent of ELK), these might seem quite verbose and obscure. For anyone with experience of Lucene (the Java search engine), they’ll seem very familiar. Rather than list out individual bits of syntax here, it’s worth spending 10 minutes or so going through the information available in the online Elasticsearch Query String Syntax reference resource at http://bit.ly/ElasticQuery. This actually shows the power available via Lucene, which can do all kinds of fuzzy, proximity and range-based searching.
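To give a flavour of what the query string syntax looks like in practice, here are a few illustrative searches (the hostname web01 is invented for the example; beat.hostname, message and source are the Filebeat fields mentioned above):

  message:reboot AND beat.hostname:web01
  beat.hostname:web*
  message:"session opened"
  source:"/var/log/auth.log" OR source:"/var/log/syslog"
  message:rebot~

The first combines terms with a boolean operator, the second uses a wildcard, the third matches an exact phrase, the fourth restricts results to particular logs and the last is a fuzzy search that will catch near-misses. All of these can be typed straight into the search box on the Discover tab.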

Visualise this…

Of course the real, pointy-haired-boss-impressing power of Kibana comes from the visualisation functions available. From the obviously-named Visualise tab you can find a handy ‘Create a new visualisation’ wizard. This brings up a range of charts, tables and maps which can be applied directly to whatever data is available in the Elasticsearch store. Clicking on one of these, choosing New Search and having an experiment with the options available is as good a way as any to get familiar with this.

Aggregations within Elasticsearch affect the presentation here; there are two listed on the left-hand side – metrics and buckets. Metrics are, as you might expect, numerical in nature (count, average, min, max), while buckets sort data in the manner we require (for example, grouping data across a range of values).

For a quick example, I might want to know how many uses of the sudo command occurred per day over the last week. By choosing Vertical Bar Chart, New Search (ensuring that ‘last 7 days’ is chosen at the top right-hand side) and entering sudo in the search bar, I get by default a large green square with a count up the y-axis. This is because I haven’t defined any buckets yet. By choosing ‘X-axis’, followed by ‘Date Histogram’ for the Aggregation selection and ‘@timestamp/daily’ in the following two option fields, I can get a count of how many times ‘sudo’ crops up in my available data, split by day. With some tinkering it’s possible to get some really interesting and useful data from these visualisations. When I’m happy with the reporting I’ve got, I can save my efforts. I can then use this (and also searches I’ve saved from the Discover tab) in a dashboard.
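Behind the scenes, that chart is just a query plus a date_histogram aggregation. For the curious, a rough equivalent can be sent to Elasticsearch directly – a sketch only, assuming Elasticsearch is listening on localhost:9200 and the logs live in the logstash-* indices used throughout this article:

  curl -s 'localhost:9200/logstash-*/_search?pretty' -d '{
    "size": 0,
    "query": { "query_string": { "query": "sudo" } },
    "aggs": {
      "per_day": {
        "date_histogram": { "field": "@timestamp", "interval": "day" }
      }
    }
  }'

The "size": 0 suppresses the individual hits, so the response contains just the per-day counts that Kibana turns into bars.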

Clicking on the Dashboard tab and creating a new dashboard, I can add my saved visualisations very easily. I can also add saved searches, if I want to (which I suspect would be for significant terms only). Once I save and name my dashboard, I can amend the time and date settings as per any other screen, and also use auto-refresh (which again is available throughout Kibana, but comes in very handy on dashboards). I can share dashboards with links provided from the application. My own efforts on this front are pretty poor and certainly not worth wasting precious ink on, but have an image search for ‘Kibana dashboard’ in your favourite search engine: this should whet your appetite for what is possible with this excellent tool. A pay rise is surely only a few clicks away!

There are a couple of other things to note on the Kibana front. There’s a ‘status’ page which basically shows whether everything is working as expected. It’s worth a quick look if things are not performing as you’d expect with regard to discovery etc. Kibana also provides the facility to use plugins, but I struggled to find any – the version of Kibana here (4.3) is pretty new and I believe APIs have undergone quite a lot of change, which might have discouraged development. There are some examples on the Elastic GitHub repo (take a look at https://github.com/elastic/sense and /timelion) but I think there is an opportunity here for Elastic (or someone else) to establish a marketplace. Perhaps one does exist and I just didn’t manage to find it.

Back to the source

All this browser-based graphing makes me somewhat squeamish, so I think it’s time to retreat to the comfort of the command line. Having imported some operating system logs, I want to understand what I’d need to do to get logs from, say, my web server into Elasticsearch as well. What steps do I need to take? It’s worth having a quick recap here of how Logstash (the ‘L’ in ‘ELK stack’) works. I could dump data directly to Elasticsearch from a Filebeat agent running on a client, but having Logstash in the mix enables me to take advantage of its pipelining capabilities. Logstash takes input from various sources and can interpret them through the use of plugins. It can then filter this stream of data and output it – both steps have their own plugins available. In my case my output target is simply Elasticsearch, but there could be others. Being able to edit the stream like this is very useful. It saves my Elasticsearch store from being filled with redundant junk, but can also work well from an audit requirement perspective.

I once worked on what might be kindly termed a ‘legacy financial platform’ – which is to say, several generations of developers had been and gone and the hardware was on ‘best endeavours’ type extended contracts with vendors. The application logged bucketloads of data out per second, much of which was used centrally in order to keep a metric of what the system was actually doing. Unfortunately, mixed in with this stream was an egregious brew of non-PCI-compliant material. Getting rid of it was a real effort (which I’m glad to say did eventually succeed). Having Logstash available at that time would have made the job so much easier. I could have configured it to discard all the offending data (or mask it) in a fraction of the time it took to rewrite the logging element of the antique application code.
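Had Logstash been around back then, a filter along these lines would have done the job. This is only a sketch – the 16-digit pattern and the replacement text are invented for illustration:

  filter {
    # Mask anything in the message that looks like a 16-digit card number
    mutate {
      gsub => [ "message", "\d{16}", "[MASKED]" ]
    }
    # Or, to discard such events entirely rather than masking them:
    # if [message] =~ /\d{16}/ {
    #   drop { }
    # }
  }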

I have a bunch of data from my Apache web server that I want to parse through Logstash. I can easily add another log section to /etc/filebeat/filebeat.yml – in fact there is an Apache example section in it, which I can uncomment (including the first line, which contains just a hyphen – this denotes a new ‘prospector’, or source of data). I amended this to look as follows. I found it’s important to ensure that a document_type directive is added:

      -
        paths:
          - /var/log/apache/*.log
        type: log
        document_type: apache
        # Ignore files which are older than 24 hours
        ignore_older: 24h
        # Additional fields which can be freely defined
        fields:
          type: apache
          server: localhost

In my Ubuntu setup the path needs to be amended to /var/log/apache2, but apart from that the work was largely done for me. I then amended my /etc/logstash/conf.d/config.yml file to look like this:

  input {
    beats {
      port => 5044
    }
  }
  filter {
    if [type] == "apache" {
      grok {
        match => { "message" => "%{COMBINEDAPACHELOG}" }
      }
    }
  }
  output {
    elasticsearch { }
  }
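Before restarting anything, Logstash can check a configuration for syntax errors. Assuming the package has installed itself under /opt/logstash (the path may differ on your system), something along these lines should report that the configuration is OK:

  sudo /opt/logstash/bin/logstash --configtest -f /etc/logstash/conf.d/

It’s a worthwhile habit – a stray brace in a filter block will otherwise only show up when the service fails to start.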

The additional filter clause here checks the value from the document_type variable, and then grok – one of the default filter plugins available in Logstash – is used to mark the ‘message’ field as an Apache combined log type. Grok has many built-in patterns, which can be viewed at http://bit.ly/grok-patterns (this code was split out from the main Logstash repo on GitHub relatively recently). Restarting both filebeat and Logstash will cause Apache logs to get dragged into Elasticsearch via Logstash. I found I had to use Kibana to recreate the logstash-* index pattern via Settings/Indices in order for this to work correctly. Now Kibana will show Apache logs with the message content broken down into its constituent parts. I can search on the agent type, referrer and so on.
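Once %{COMBINEDAPACHELOG} has done its work, those constituent parts are ordinary fields, and the query syntax from earlier applies to them too. A few illustrative searches (the field names come from the standard COMBINEDAPACHELOG pattern; the values are made up):

  response:404
  verb:POST AND response:500
  clientip:192.168.1.*
  agent:*Googlebot*

Wildcards at the start of a term (as in the last example) do work, but they are relatively expensive for Elasticsearch to evaluate, so use them sparingly on big indices.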

While the data I’m working with here is pretty innocuous (although, thinking about it, this is exactly the sort of data that the government envisages being collected in the Investigatory Powers Bill going through Parliament at the time of writing), there might be a need to encrypt some fields (as per my earlier example). Logstash can filter that easily enough. Adding this to the ‘filter’ section of my file is all it takes:

  anonymize {
    algorithm => "SHA1"
    fields => ["source"]
    key => "encryptionkeygoeshere"
  }

This takes the source field and replaces it with a hash before passing it on to the Elasticsearch data store. In a real-world case there’d be much more likelihood of taking output from a bespoke application log and splitting sensitive data off into its own fields. The mutate plugin can take care of this, using fairly easy-to-follow regex operations. For this and many more Logstash plugins, see http://bit.ly/logstash-plugins.

Scaling up

Next let’s suppose that my ELK stack has been such a success that more types of data are now being thrown into it. As these things have a habit of doing, it becomes business critical by stealth (that is, it gets rebooted or goes offline and generates a ton of complaints). This is generally a situation to be avoided (a subject for another column at some time in the future, perhaps). Perhaps the volume of queries is such that the system begins to get overwhelmed. Time to get some resilience and scaling in place.

Adding extra nodes to the Elasticsearch level is very simple – nodes should auto-discover other nodes on the same network via multicast, so simply start one up. Nodes communicate with each other via port 9300. Elasticsearch should automatically distribute data between nodes, balancing it across them for resilience. Last month, with a single node, we didn’t bother setting a cluster name, but this and other config options can be found in /etc/elasticsearch/elasticsearch.yml. In larger sites, some nodes can be used as ‘client’ nodes, which participate in the cluster but hold no data – rather, they load-balance requests across the nodes that do.
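The settings in question live in elasticsearch.yml. A sketch for a second node might look like the following – the cluster and node names are invented, and the commented lines show how a dedicated ‘client’ node would be configured:

  # /etc/elasticsearch/elasticsearch.yml – illustrative values only
  cluster.name: lxf-elk        # must match on every node in the cluster
  node.name: elk-node-2
  # For a client (load-balancing) node, disable the data and master roles:
  # node.master: false
  # node.data: false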

Logstash too can be scaled horizontally. An issue with larger clusters can occur when ‘back pressure’ builds up. This is the condition when the event throughput rate (for example, the number of logs coming through the system) becomes greater than the ability of the Elasticsearch cluster to process them. This might be down to CPU or IO limitations (probably the latter). Queues then build up from the Logstash layer and head upwards towards the Filebeat client (which does a pretty good job of handling this). The best way to deal with this kind of issue can be through architecture. Thinking of Filebeat and Logstash as consisting of several functions – a shipper (Filebeat), a collector and a processor (Logstash) – the thing to do here is to split these into individual elements (separate VMs, say) and introduce a dedicated message queue between the collector and processor layer (such as Apache Kafka, Redis or RabbitMQ). These can then scale horizontally as well.
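With Redis as the broker, for example, the split is largely a matter of pointing the two Logstash tiers at the same queue. A sketch only – the hostname and key name are invented:

  # Collector tier: accept events from Filebeat and push them onto a Redis list
  output {
    redis {
      host => "redis01"
      data_type => "list"
      key => "logstash"
    }
  }

  # Processor tier: pull events off the same list, filter them, then index
  input {
    redis {
      host => "redis01"
      data_type => "list"
      key => "logstash"
    }
  }

The collector tier keeps the beats input from earlier; the processor tier keeps the grok filter and the elasticsearch output.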

The ELK stack is a powerful collection of software. I hope that hard-pressed admins out there can take advantage of it and put it to good use. Good luck!

These date range options are key to using Kibana, which can be quite confusing for a new user.
Kibana makes visualisations easy. Save this, add it to a dashboard, and – BOOM – a pay rise is yours! Donations care of LXF please.
Everyone knows that the best infrastructure diagrams include at least one cloud. Ta-da!
