Linux Format

Monitor systems and Docker deployments

Even when locked down, Mihalis Tsoukalos can still keep a close eye on all of his Linux systems and Docker images with Netdata.

- Mihalis Tsoukalos is a DataOps engineer and a technical writer. He is also the author of Go Systems Programming and Mastering Go, 2nd edition.


Welcome to Netdata, software for distributed real-time performance and health monitoring of UNIX machines. Don’t you dare turn that page! A key advantage of Netdata is that it collects all of its metrics without putting much load on the Linux machine that it runs on. In fact, most of the time you’ll forget that Netdata is running on a Linux machine – it’s only after you look at its impressive visualisations that you’ll remember the software is collecting, processing and visualising all these metrics! Another Netdata advantage is that it carries out real-time monitoring, so you can see what’s happening on the Linux machine as it happens.

Install Netdata using your package manager. On a Debian or Ubuntu Linux system you can install Netdata by executing apt install netdata. The configuration directory of Netdata is /etc/netdata, but you’ll also find plenty of useful files inside /usr/lib/netdata. By default, Netdata listens on http://localhost:19999, which is the first thing that you should try in your browser, to confirm that the Netdata installation was successful.
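If you’d rather check from a script than from a browser, the following is a minimal sketch that simply asks whether anything answers HTTP 200 on the default port mentioned above. The function names are our own, and the sketch assumes an unauthenticated Netdata on localhost; it isn’t part of Netdata itself.

```python
import urllib.error
import urllib.request


def netdata_url(host="localhost", port=19999, path="/"):
    """Build the URL of a Netdata agent; 19999 is the default port from the text."""
    return f"http://{host}:{port}{path}"


def netdata_is_up(host="localhost", port=19999, timeout=2.0):
    """Return True if something answers HTTP 200 on the given host and port."""
    try:
        with urllib.request.urlopen(netdata_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Netdata reachable:", netdata_is_up())
```

A result of False only tells you that nothing useful answered on that port; check systemctl status netdata for the reason.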

The screenshot (right) shows the initial default screen, which is full of information. If you’re running Netdata on your own Linux machine, take the time to look at the Netdata visualisations. If you want to make Netdata available over your local network or the internet, change the value of “bind socket to IP” in the /etc/netdata/netdata.conf configuration file to the external IP of the machine, then restart Netdata by executing systemctl restart netdata for the change to take effect. You can identify the version of Netdata you’re using by looking at the lower right-hand corner of the Netdata screen. We’re using Netdata version 1.12.0 in this tutorial.
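The edit to “bind socket to IP” can also be scripted. The sketch below rewrites that one setting in the text of a configuration file and leaves everything else alone. The sample configuration and the 192.168.1.10 address are made up for illustration, and the helper name is our own. Try it on a copy of netdata.conf before touching the real file.

```python
import ipaddress
import re

# Matches an uncommented "bind socket to IP = ..." line, keeping its indentation.
BIND_LINE = re.compile(r"^([ \t]*)bind socket to IP[ \t]*=.*$", re.MULTILINE)

# Hypothetical sample, not copied from a real installation.
SAMPLE_CONF = "[global]\n\trun as user = netdata\n\tbind socket to IP = 127.0.0.1\n"


def set_bind_ip(conf_text, ip):
    """Return conf_text with the 'bind socket to IP' value replaced by ip."""
    ipaddress.ip_address(ip)  # raises ValueError on a malformed address
    new_text, count = BIND_LINE.subn(
        lambda match: f"{match.group(1)}bind socket to IP = {ip}", conf_text
    )
    if count == 0:
        raise ValueError("no 'bind socket to IP' setting found")
    return new_text


if __name__ == "__main__":
    print(set_bind_ip(SAMPLE_CONF, "192.168.1.10"))
```

The function refuses to guess when the setting is missing or commented out, which is safer than silently appending a new line in the wrong section.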

Measure the metrics

Netdata can collect and display a plethora of metrics – more than 1,000. The list of metrics includes those on the entire system, the CPUs, the memory, the disks, TCP/IP networking, systemd, specific applications, users, the firewall, running containers as well as the operation of Netdata itself.

The good news is that if you want to monitor something that’s not directly supported by Netdata, you can create your own metric collector using the Netdata plugin API. You can find more about that capability at https://github.com/netdata/netdata/tree/master/collectors/plugins.d. The plugin API won’t be discussed in much detail in this tutorial.

Finally, bear in mind that not every metric will help you solve a specific performance issue you might have. You’ll need to understand the metrics, select the ones that are related to your situation and observe them before trying to troubleshoot your machines.

What you already know is that Netdata reads metrics in real time from the machine that it’s running on and automatically creates visualisations with that data. Logically speaking, Netdata can be divided into four components. The first component is the metrics collector, whereas the second one is an in-memory time series database that stores the metrics.

Note that the metrics aren’t written to disk, which means they’ll be lost if you reboot your machine or restart Netdata. However, using computer memory speeds things up. The third component is the metrics visualiser and the final component is the alarms notification engine. All these components, when combined, make up Netdata.

Netdata keeps its metrics in memory using a time series database. However, it also stores data related to the health of the system at /var/lib/netdata/health/health-log.db. The operation log files of Netdata are kept in /var/log/netdata. The kind of entry that you’ll find in /var/log/netdata/access.log looks like the following:

2020-01-11 19:47:19: 42: 1556 '[2.86.21.11]:49813' 'DATA' (sent/all = 744/1501 bytes -50%, prep/sent/total = 0.14/0.27/0.42 ms) 200 '/api/v1/alarms?active&_=1578764050928'
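If you want to process access.log entries in a script, a regular expression is enough. The sketch below is written against the single sample entry shown above, with plain single quotes around the quoted fields. The field names, in particular request_id and thread_id for the two numbers after the timestamp, are our own guesses at what those columns mean.

```python
import re

ACCESS_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?P<request_id>\d+): (?P<thread_id>\d+) "
    r"'\[(?P<client_ip>[^\]]+)\]:(?P<client_port>\d+)' "
    r"'(?P<mode>[^']*)' "
    r"\(sent/all = (?P<sent_bytes>\d+)/(?P<all_bytes>\d+) bytes (?P<saved_percent>-?\d+)%, "
    r"prep/sent/total = (?P<prep_ms>[\d.]+)/(?P<sent_ms>[\d.]+)/(?P<total_ms>[\d.]+) ms\) "
    r"(?P<status>\d{3}) '(?P<url>[^']*)'$"
)

INT_FIELDS = ("request_id", "thread_id", "client_port", "sent_bytes",
              "all_bytes", "saved_percent", "status")
FLOAT_FIELDS = ("prep_ms", "sent_ms", "total_ms")


def parse_access_line(line):
    """Return a dict of typed fields, or None if the line has another shape."""
    match = ACCESS_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in INT_FIELDS:
        fields[key] = int(fields[key])
    for key in FLOAT_FIELDS:
        fields[key] = float(fields[key])
    return fields
```

Lines that don’t match come back as None rather than raising an error, so you can feed the whole log through the function and skip whatever it doesn’t recognise.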

Netdata keeps its web and visualisation files inside /usr/share/netdata/web. This is what you see when you connect to the Netdata web interface.

Let’s delve into the Netdata UI and its various options. The screenshot (right) shows another output from the Netdata web interface. The Netdata screen is divided into three parts: the top menu bar, the left column and the right column. The top menu bar contains options related to Alarms, Settings and the export functionalities of Netdata.

The left column, which is the main area of the Netdata UI, contains the visualisations. You can zoom in and out of each visualisation to obtain a more detailed output, using your mouse wheel while pressing Shift. Additionally, if you put your mouse on a graph point, you’ll be shown the actual value of the metric that’s being visualised. Moreover, you can select an area in a visualisation by clicking with your mouse while pressing Shift. Finally, the right column shows the available sets of metrics: the elements of the active set of metrics are expanded so that you can select what you want.

Health monitoring

Health monitoring is implemented using the alarm and notification systems of Netdata. The first thing you should do is execute /etc/netdata/edit-config health_alarm_notify.conf as root, which will set up the notification system. This command will create or make changes to the /etc/netdata/health_alarm_notify.conf file. Note that you can test the notification system by executing /usr/lib/netdata/plugins.d/alarm-notify.sh test with root privileges.

Let’s create a new notification related to the CPU usage of the current Linux machine. The CPU utilisation thresholds will be set pretty low, to make testing easier. On a production system, you’d most likely choose different values. For reasons of simplicity, we’re going to use one of the Netdata preconfigured alarms. Go to the Netdata UI, press the Alarms button on the top menu and select the All tab. The source row for the “system.cpu” alarm has the /usr/lib/netdata/conf.d/health.d/cpu.conf value.

Now execute /etc/netdata/edit-config health.d/cpu.conf with root privileges to edit the file that defines that alarm. On the 10min_cpu_usage template, change the warn and crit lines as follows:

warn: $this > (($status >= $WARNING) ? (25) : (35))
crit: $this > (($status == $CRITICAL) ? (45) : (45))
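The ternary in the warn line gives the alarm some hysteresis: it takes more than 35 per cent to raise a warning, but once raised the warning stays until the value drops to 25 per cent or below. The following sketch models that logic in Python so you can try values by hand. It’s a simplified illustration with our own names and only three states, not Netdata’s actual evaluator.

```python
from enum import IntEnum


class Status(IntEnum):
    # Illustrative ordering only; Netdata defines more states than these.
    CLEAR = 0
    WARNING = 1
    CRITICAL = 2


def warn_raised(value, status):
    """Model of: $this > (($status >= $WARNING) ? (25) : (35))"""
    threshold = 25 if status >= Status.WARNING else 35
    return value > threshold


def crit_raised(value, status):
    """Model of: $this > (($status == $CRITICAL) ? (45) : (45))"""
    threshold = 45 if status == Status.CRITICAL else 45
    return value > threshold


def next_status(value, status):
    """Combine both checks; critical takes precedence over warning."""
    if crit_raised(value, status):
        return Status.CRITICAL
    if warn_raised(value, status):
        return Status.WARNING
    return Status.CLEAR


if __name__ == "__main__":
    status = Status.CLEAR
    for cpu in (30, 36, 30, 46, 40, 24):
        status = next_status(cpu, status)
        print(f"cpu={cpu}% -> {status.name}")
```

Because both branches of the crit line use 45, the critical level has no hysteresis: it clears as soon as the value is 45 or lower.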

Save the configuration file and restart Netdata by executing systemctl restart netdata. The changes you made should be visible in the Alarms section of the Netdata UI.

To intentionally increase the CPU utilisation of the Linux machine you can compile your Linux kernel, run some heavy Docker images or write a C program that creates a large number of threads. You’re free to try other things – just make sure you don’t experiment on a production system.
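If you don’t fancy writing C, a few lines of Python will do the same job. Note that this sketch swaps threads for processes, because Python threads won’t keep several cores busy at once. It uses the fork start method, so it’s Linux-only, and the function names are our own.

```python
import multiprocessing
import os
import time


def burn(seconds):
    """Spin in a tight loop for roughly `seconds`; return the loop count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def load_cores(seconds, workers=None):
    """Run one busy process per core (or `workers` processes) and wait for them."""
    context = multiprocessing.get_context("fork")  # Linux-only start method
    count = workers or os.cpu_count() or 1
    processes = [context.Process(target=burn, args=(seconds,)) for _ in range(count)]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
    return count
```

Calling load_cores(600) keeps every core busy for about ten minutes. That matters here because the alarm is based on a 10-minute CPU average, so a short burst may not be enough to trigger it.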

After you increase the CPU load on your Linux machine, you’ll see a new alarm in the Alarms Active tab. You’ll also be informed about alarms by Netdata badges. Note that the Netdata Health system, when configured, can send notifications to Slack channels or via email, which is a handy feature. Finally, bear in mind that setting the right values in an alert might require some experimentation.

IPv4 traffic

Let’s discuss the IPv4 networking-related metrics captured by Netdata. The available subsections for IPv4 networking are sockets, packets, errors, icmp, tcp and udp. The screenshot (page 72, left-hand side) shows part of the Netdata visualisations related to network traffic, displaying the sockets, packets and errors visualisations. The good thing is that there are no errors.

You can also find sections on the networking stack, IPv6 networking, network interfaces and the firewall.

Apache metrics

Netdata can help you to monitor an Apache web server. The main sections of the Apache-related visualisations are requests, connections, bandwidth, workers and statistics. The screenshot (this page, bottom right) shows the Netdata data related to the Apache web server. Note that during the monitoring period, the ab utility was used for generating traffic by sending requests to the Apache web server of the local machine. Netdata can also monitor the Nginx web server.

Docker images

Finally, let’s learn how to monitor Docker images using Netdata. Although you might think that this is going to be a difficult task, the use of Docker images simplifies things. The technique will be illustrated in a docker-compose.yml file. The part of the docker-compose.yml file related to Netdata is the following:

netdata:
  container_name: netdata
  image: netdata/netdata
  hostname: a_host_name.com
  ports:
    - 19999:19999
  networks:
    - linuxformat
  cap_add:
    - SYS_PTRACE
  security_opt:
    - apparmor:unconfined
  volumes:
    - /etc/passwd:/host/etc/passwd:ro
    - /etc/group:/host/etc/group:ro
    - /proc:/host/proc:ro
    - /sys:/host/sys:ro
    - /var/run/docker.sock:/var/run/docker.sock:ro
  environment:
    - PGID=998
elasticsearch:
  ...
kafka:
  ...

Let’s look at the contents of docker-compose.yml. There are three images in there: kafka, elasticsearch and netdata. The first two are the images that we want to monitor using Netdata, whereas the third image is Netdata itself – the presented setup also enables you to monitor the Netdata Docker image. The docker-compose.yml file wraps all Docker images and enables the Netdata container to communicate with and obtain metrics from the other two. The reason for using the PGID variable, which is the group ID of the UNIX group assigned to the Docker image, is for Netdata to be able to resolve the container names and display them in its web interface – you need this when monitoring multiple Docker images. You’ll most likely need to find the value of PGID on your own, and the easiest way to do that is by executing the following command on the Linux machine on which you’re about to run the presented docker-compose.yml configuration file:

grep docker /etc/group | cut -d ':' -f 3
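The same lookup can be done in Python, which is handy if you generate docker-compose.yml from a script. The sketch below parses text in /etc/group format. Unlike the grep command, it matches the group name exactly instead of any line that contains the word docker. The function name and the sample entries are made up for illustration.

```python
def group_id_from_text(group_text, name="docker"):
    """Return the numeric ID of group `name` from /etc/group-style text, or None."""
    for line in group_text.splitlines():
        fields = line.strip().split(":")
        # Format: group_name:password:GID:comma-separated members
        if len(fields) >= 3 and fields[0] == name:
            return int(fields[2])
    return None


if __name__ == "__main__":
    # Hypothetical entries; read the real /etc/group on your own machine instead.
    sample = "root:x:0:\ndockerusers:x:1500:\ndocker:x:998:mihalis\n"
    print("PGID =", group_id_from_text(sample))
```

On a live system you can skip the parsing altogether: grp.getgrnam("docker").gr_gid from the standard library returns the same number, and raises KeyError if the group doesn’t exist.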

We’ll now concentrate on the Netdata block. You can find a working docker-compose.yml file in the Linux Format archives (www.linuxformat.com/archives).

Apart from the expected parts in the Netdata block, which are the Docker image that’s going to be used, the port that’s going to be exposed to the outside world and the name of the container, there are some important definitions. The most important block is the volumes block, where you make system information and data available to the Netdata Docker image. This part of docker-compose.yml gives the Netdata container read-only access to host OS information using the /sys and /proc folders as well as the /etc/passwd and /etc/group system files. The SYS_PTRACE option on cap_add starts the container with the ptrace capability, which tools such as strace depend on. Finally, the apparmor:unconfined option starts the container without an AppArmor profile. Application Armor (AppArmor) is a Linux security module that protects the operating system. You can learn more about AppArmor at https://en.wikipedia.org/wiki/AppArmor.

Note that if you’re already running Netdata on your Linux machine as a regular server process or in another container, you’ll need to replace the 19999:19999 line with something like 20000:19999, because port number 19999 will already be in use on your local Linux machine, which will cause the docker-compose up command to fail. Note that the first port number is the external one, which should be unique across the entire Linux machine, whereas the second port number is the internal one, which should be unique in the running Docker image only. Because Netdata listens on port number 19999 inside the container, if you change the second port number you won’t be able to communicate with the Netdata process.
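To avoid finding out about a port clash only when docker-compose up fails, you can test the external port first. The sketch below splits a simple HOST:CONTAINER mapping and tries to bind the host port. It assumes the plain two-number form used in this tutorial, and it doesn’t handle mappings that include an IP address or a protocol suffix. The helper names are our own.

```python
import socket


def parse_port_mapping(mapping):
    """Split 'HOST:CONTAINER' into two integers (external port, internal port)."""
    host_port, container_port = mapping.split(":")
    return int(host_port), int(container_port)


def host_port_is_free(port, host="0.0.0.0"):
    """Return True if we can bind the port, meaning nothing else is using it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def pick_mapping(container_port=19999, first_host_port=19999, attempts=10):
    """Return the first 'HOST:CONTAINER' mapping whose external port is free."""
    for candidate in range(first_host_port, first_host_port + attempts):
        if host_port_is_free(candidate):
            return f"{candidate}:{container_port}"
    raise RuntimeError("no free host port found in the range tried")


if __name__ == "__main__":
    print("Suggested ports entry:", pick_mapping())
```

Only the first number ever changes: the internal port stays at 19999, for the reason given above.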

You can learn more about using Netdata for monitoring Docker images at https://docs.netdata.cloud/packaging/docker.

Netdata output

You’ll need to visit http://localhost:19999 to see the data collected by Netdata. The important thing is that you can select the Docker image that interests you and find out more about its performance. The screenshot (right) shows more on this. All this data and these visualisations will help you understand the performance level and the bottlenecks of your running container, which will enable you to change its running parameters to improve its performance, especially when the container is used in a production environment.

Next, we use a different docker-compose.yml file with Netdata. That new docker-compose.yml file uses Elasticsearch, Kibana and Logstash as well as a Kafka server. The output reveals how each running container performs in relation to the overall system performance. The Kafka container requires more CPU, whereas the Netdata container doesn’t require many system resources. Finally, the Elasticsearch container writes heavily to disk, which makes sense because Elasticsearch stores its data on a hard disk.
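The per-container numbers can also be read from the Netdata REST API instead of the web page. The sketch below picks the charts that belong to one container out of the JSON returned by /api/v1/charts, and builds an /api/v1/data URL for one of them. It assumes that container charts have IDs starting with cgroup_ followed by the container name, so check the chart names your own Netdata reports. The sample payload is invented for illustration.

```python
from urllib.parse import urlencode


def container_chart_ids(charts_payload, container):
    """Return sorted chart IDs for one container from an /api/v1/charts payload.

    Assumes IDs of the form 'cgroup_<container>.<metric>'.
    """
    prefix = f"cgroup_{container}."
    charts = charts_payload.get("charts", {})
    return sorted(chart_id for chart_id in charts if chart_id.startswith(prefix))


def data_url(chart, seconds=60, host="localhost", port=19999):
    """Build a URL asking for the last `seconds` of one chart as JSON."""
    query = urlencode({"chart": chart, "after": -seconds, "format": "json"})
    return f"http://{host}:{port}/api/v1/data?{query}"


if __name__ == "__main__":
    # Hypothetical payload shaped like the 'charts' object of /api/v1/charts.
    sample = {"charts": {"system.cpu": {}, "cgroup_kafka.cpu": {},
                         "cgroup_kafka.mem": {}, "cgroup_netdata.cpu": {}}}
    for chart_id in container_chart_ids(sample, "kafka"):
        print(data_url(chart_id))
```

Fetching each printed URL with curl or urllib returns the raw values behind the corresponding graph, which you can then compare across containers.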

Onboard containers

Finally let’s see how to manually install Netdata on a running Docker image that’s using Debian Linux. You might need to do that in case you have a running container that you can’t restart, but you want to check its performanc­e. First, you should download the Debian Docker image by executing docker pull debian:latest . Then you’ll need to execute the following commands:

# docker run -it --name=debian debian:latest bash

# apt update

# apt install curl

# bash <(curl -Ss https://my-netdata.io/kickstart.sh)

The first command executes the Debian Linux Docker image and gives you a bash shell with root privileges in the container. The second and third commands update the package lists and install the latest version of curl respectively. The fourth command downloads and runs a bash script offered by Netdata that automates the entire installation – this also includes the installation of quite a few Debian packages in multiple stages. Note that the script compiles Netdata from source code and installs the latest Netdata release. After a successful installation, you’ll get an update script located at /usr/libexec/netdata/netdata-updater.sh and an uninstall script located at /usr/libexec/netdata/netdata-uninstaller.sh. You’ll also get some handy instructions that will help you run Netdata on this Debian machine.

Performance monitoring, metrics and visualisation are difficult subjects because there isn’t a single and deterministic technique that can help you solve every bottleneck or problem. The most important thing is understanding the meaning and the importance of the metrics you’re using. Netdata is here to help you but you’ll need to spend some time with it and get used to the data and the visualisations that it offers, in order to be able to use it productively on your test machines or on production machines.

The initial screen of the Netdata web interface, which contains a plethora of information and is divided into three main parts.
This shows visualisations related to the memory utilisation of the Linux machine. The Netdata UI is divided into three virtual sections: the menu bar, the main screen and the sections list.
This screenshot shows some of the Netdata metrics that are related to the monitoring of IPv4 traffic. Note that Netdata has support for IPv6 traffic as well.
This screenshot shows how Netdata monitors Docker containers. There are three running containers named kafka, elasticsearch and netdata.
