Linux Format

eBPF Kernel-level monitor

Mihalis Tsoukalos shows you how to use existing eBPF scripts that are written in Python, then explains how to develop your own.

- Mihalis Tsoukalos is a UNIX administrator, a programmer, a DBA and a mathematician who enjoys writing articles and learning new things. He’s the author of Go Systems Programming.

Learn how to use existing eBPF scripts written in Python and how to develop your own. Mihalis Tsoukalos is your expert guide.

We’re going to start monitoring Linux systems at the kernel level using eBPF, which comes with all recent Linux kernels, so it’s not even a third-party solution. Bear in mind that to effectively use eBPF you’ll have to know what you’re doing (that’s me out! – Ed).

This tutorial will concentrate on ready-to-use eBPF command line utilities that you can easily install and modify, because they’re written in Python. We’ll also talk about developing your own eBPF scripts for those of you who prefer to live dangerously!

About eBPF

eBPF stands for extended Berkeley Packet Filter and is an in-kernel virtual machine. It’s integrated into the Linux kernel and can be used for Linux tracing. Put simply, eBPF lets you check what’s going on behind the scenes on your Linux machines so that you can discover and solve bugs and performance issues.

eBPF requires a kernel compiled with the CONFIG_BPF_SYSCALL option, which is automatically turned on in Ubuntu Linux. You’ll also need a Linux system with kernel 4.4 or newer, which shouldn’t be a problem if your Linux system is up to date. Please note that this tutorial uses Ubuntu Linux with the following kernel:

    $ uname -r -v -p
    4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64
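If you’d rather check these two requirements from a script, here’s a minimal Python sketch. The function names are made up for illustration, and the location of the kernel configuration file (/boot/config-<release>) is an assumption that holds on Ubuntu but not necessarily on other distributions.

```python
import os
import platform
import re


def kernel_version(release):
    """Return (major, minor) parsed from a release string such as '4.4.0-89-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def bpf_syscall_enabled(config_text):
    """Return True if the kernel config text contains CONFIG_BPF_SYSCALL=y."""
    return any(line.strip() == "CONFIG_BPF_SYSCALL=y"
               for line in config_text.splitlines())


def check_host():
    """Print a short report for the running kernel."""
    release = platform.release()
    print("kernel %s new enough: %s" % (release, kernel_version(release) >= (4, 4)))
    # Assumption: the distribution ships its config as /boot/config-<release>.
    config_path = "/boot/config-" + release
    if os.path.exists(config_path):
        with open(config_path) as handle:
            print("CONFIG_BPF_SYSCALL set:", bpf_syscall_enabled(handle.read()))
    else:
        print("no kernel config found at", config_path)
```

Calling check_host() prints the report for the machine you’re on; the two helper functions can also be used on their own with any release string or config text.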

Installing eBPF

On an Ubuntu Linux system, you should execute the following commands, which use sudo to gain root privileges, in order to install the eBPF command line utilities:

    $ echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" | sudo tee /etc/apt/sources.list.d/iovisor.list
    $ sudo apt-get update
    $ sudo apt-get install bcc-tools

Note that eBPF tools are installed inside the /usr/share/bcc/tools directory. The screenshot (above) shows the contents of the /usr/share/bcc/tools directory – as you can see there is a plethora of eBPF utilities in there. All of them are plain text Python scripts that you can edit and modify, provided that you know what you’re doing.

Bear in mind that the eBPF tools are practical utilities and the only way to learn them is by using them on a daily basis.

You’re now ready to start using eBPF and its tools, which have been created by Brendan Gregg. You can find out more about the author and his projects at www.brendangregg.com.

Using eBPF utilities

All the eBPF utilities are located inside /usr/share/bcc/tools and they need root privileges to run. If the /usr/share/bcc/tools directory isn’t present in your PATH environment variable, you can either use the full path of a utility or change to that directory using the cd command and execute the utility from there. If you’re using the bash shell and decide to change the PATH variable, then you should execute the following command:

    $ export PATH=/usr/share/bcc/tools:$PATH

If you want to make this change of the PATH environment variable permanent, you should add the previous command inside ~/.bashrc or ~/.profile.

The eBPF tools can be divided into single- and multiple-purpose tools. Single-purpose tools do one thing well. In contrast, multiple-purpose tools can achieve many things, but you’ll have to feed them with the right parameters, which is the price you pay for their increased flexibility.

Standard scripts

This section will help you understand the structure and some of the Python code of the eBPF utilities that can be found in /usr/share/bcc/tools. This is important for two reasons: first, you’ll be able to make small changes to existing scripts; and second, you’ll be able to understand the generated output of a script much better.

The name of the script that will be inspected is bashreadline, which deals with the bash shell, and contains a very handy preamble:

    #!/usr/bin/python
    #
    # bashreadline  Print entered bash commands from all running shells.
    #               For Linux, uses BCC, eBPF. Embedded C.
    #
    # This works by tracing the readline() function using a uretprobe (uprobes).

So, each tool begins with a small synopsis, comments that inform you about how the tool works and a major change history, which isn’t shown here. This is the first place that you should check for important information and usage instructions for an eBPF tool.

After the comments section, each eBPF Python script has code that uses the bcc module. This has a Python front end that enables you to use eBPF. Therefore, strictly speaking, this tutorial is about using eBPF with the help of bcc. As you can see from the forthcoming code, there’s a pretty large amount of low-level stuff in bcc scripts, but the good thing is that you don’t need to fully understand the way Python talks to the kernel and traces C functions:

    bpf_text = """
    #include <uapi/linux/ptrace.h>

    struct str_t {
        u64 pid;
        char str[80];
    };
    ...
    int printret(struct pt_regs *ctx) {
        ...
    };
    """

Among other things, the preceding C code defines a C program that will be used later on. However, the really interesting part of the tool is the next portion:

    b.attach_uretprobe(name="/bin/bash", sym="readline", fn_name="printret")

Here, you tell eBPF to watch /bin/bash for the readline symbol, which is a function. When a match is found, the printret() function – which is a C function that was defined previously – will be called to capture the desired information. The last part of the program, which isn’t shown here, deals with the presentation of the output.
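To see how the pieces fit together, here’s a stripped-down sketch in the same spirit. It is not the code of bashreadline itself: it swaps the tool’s own output handling for bcc’s simpler bpf_trace_printk() and trace_print() calls, and it assumes that the bcc Python module is installed, that you run it as root, and that /bin/bash exports a readline symbol. The trace_readline() name is made up for this example.

```python
# Sketch only. Assumes the bcc Python module is installed, the script runs as
# root, and /bin/bash exports a readline symbol (as in this tutorial's setup).
BPF_TEXT = r"""
#include <uapi/linux/ptrace.h>

int printret(struct pt_regs *ctx) {
    char line[80] = {};
    /* readline() returns a pointer to the entered line; skip NULL returns. */
    if (!PT_REGS_RC(ctx))
        return 0;
    bpf_probe_read(&line, sizeof(line), (void *)PT_REGS_RC(ctx));
    bpf_trace_printk("%s\n", &line);
    return 0;
}
"""


def trace_readline(binary="/bin/bash"):
    """Attach printret() to the return of readline() and stream the trace output."""
    from bcc import BPF  # imported here so the file loads even without bcc

    b = BPF(text=BPF_TEXT)
    b.attach_uretprobe(name=binary, sym="readline", fn_name="printret")
    print("Tracing readline() returns... Ctrl-C to stop")
    try:
        b.trace_print()
    except KeyboardInterrupt:
        pass
```

Call trace_readline() from a root Python session, type a command in another bash shell and the line should appear in the trace output. The sketch truncates each command to 80 bytes, the same buffer size as in the struct shown earlier.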

It’s now time to see some eBPF tools in action, starting with opensnoop, which is explained in the next section.

Useful eBPF utilities

There are some handy and easy-to-use eBPF command line utilities that will make your lives much easier. The first tool is called opensnoop and enables you to trace file opens. Executing opensnoop on a Linux machine will generate the following kind of output:

    PID    COMM       FD ERR PATH
    1      systemd    24   0 /proc/239/cgroup
    2256   postgres    7   0 /var/run/postgresql/9.5-main.pg_stat_tmp/global.stat

The open system call is used for opening and creating files for reading and writing, and can tell you a lot about how a program works behind the scenes. The opensnoop tool traces the open(2) system call and prints a line for each call that’s found.

Now, let’s discuss the output of opensnoop a little more. The first column is the process ID of the process that called open(2). The second column displays the name of the process whereas the third column displays the file descriptor returned by the open(2) call, which is an integer number. The fourth column is the error value returned from open(2) – according to UNIX philosophy, an error value of 0 means that there was no error. The final column is the path of the file used in the open(2) system call.
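The column layout just described is easy to process in a script. The following Python sketch splits opensnoop lines into named fields and picks out the failed opens. The OpenEvent and parse_opensnoop_line names are made up for this example, and the code assumes that the process name contains no whitespace.

```python
from collections import namedtuple

OpenEvent = namedtuple("OpenEvent", "pid comm fd err path")


def parse_opensnoop_line(line):
    """Split one opensnoop output line into its five columns."""
    # Assumption: the process name contains no whitespace; the path is kept whole.
    pid, comm, fd, err, path = line.split(None, 4)
    return OpenEvent(int(pid), comm, int(fd), int(err), path.rstrip("\n"))


# Sample lines taken from the opensnoop output shown in this tutorial.
sample = [
    "1      systemd            24   0 /proc/239/cgroup",
    "2256   postgres            7   0 /var/run/postgresql/9.5-main.pg_stat_tmp/global.stat",
]
events = [parse_opensnoop_line(line) for line in sample]
failed = [event for event in events if event.err != 0]
print(len(events), "opens,", len(failed), "failed")
```

Splitting at most four times keeps a path that contains spaces in one piece, which is why the path has to be the last column handled.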

If you’re executing opensnoop on a busy Linux system then you’re going to see lots of output coming from it, which means that if you don’t know what you’re looking for, you won’t be able to take advantage of the generated output. Sometimes, filtering the output using grep will make your work a lot easier. Another good idea is to filter using the process ID of the process that interests you or the path of the file that you want to inspect.

The screenshot (previous page) shows more output from opensnoop when executed on an idle Linux machine.

Another useful eBPF tool is tcpconnect, which enables you to inspect active TCP connections by watching all connect(2) system calls:

    PID    COMM   IP SADDR      DADDR            DPORT
    26149  https  4  10.0.2.15  104.199.116.191  443
    26151  http   4  10.0.2.15  91.189.91.26     80

The preceding output is pretty simple to interpret. The first column is the process ID of the process that called connect(2), whereas the second column is the name of that process. The third column specifies whether the connection uses IPv4 or IPv6. The fourth and fifth columns are the source IP and destination IP of the connection, respectively. The final column is the port number of the destination address. What’s really important here is that with just a single command you can obtain an overview of your entire Linux machine.
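Because every line has the same six columns, summarising tcpconnect output takes only a few lines of Python. The sketch below counts connections per destination port; parse_tcpconnect_line is a made-up helper name and the sample lines are the ones shown in this tutorial.

```python
from collections import Counter


def parse_tcpconnect_line(line):
    """Return a dict for one tcpconnect output line (six whitespace-separated columns)."""
    pid, comm, ip_version, saddr, daddr, dport = line.split()
    return {"pid": int(pid), "comm": comm, "ip": int(ip_version),
            "saddr": saddr, "daddr": daddr, "dport": int(dport)}


sample = [
    "26149  https        4  10.0.2.15        104.199.116.191  443",
    "26151  http         4  10.0.2.15        91.189.91.26     80",
]
connections = [parse_tcpconnect_line(line) for line in sample]
by_port = Counter(conn["dport"] for conn in connections)
for port, count in sorted(by_port.items()):
    print("port %d: %d connection(s)" % (port, count))
```

The same approach works for grouping by process name or destination address; only the key passed to Counter changes.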

The tcpaccept utility traces the accept(2) system call, which enables a process to accept a connection on a socket. As such, it’s useful for debugging server TCP processes. Once again, if you don’t know the way TCP/IP server applications work, you’ll have trouble choosing the right eBPF tool and interpreting its output. Although netstat can do the job of both tcpconnect and tcpaccept, the two eBPF tools are more versatile and they make it possible for you to watch specific processes when used with the -p option. Additionally, both tools print events as they happen, which means that you do not lose anything. Finally, Python scripts such as tcpconnect and tcpaccept can be easily modified.

In the following two sections of this tutorial you’re going to see two eBPF utilities that can do more than one job, depending on their parameters.

Feel the func

The funccount utility is a multi-purpose tool that enables you to count kernel function calls per second. funccount is multi-purpose because you can choose the function or functions that you want to trace.

Look at the following uses of funccount:

    $ sudo /usr/share/bcc/tools/funccount 'write*'
    $ sudo /usr/share/bcc/tools/funccount 'read*'
    $ sudo /usr/share/bcc/tools/funccount 'vfs_*'

In the first example, you trace all kernel functions whose names begin with write, whereas in the second example you trace all functions whose names begin with read. The last example makes it possible to trace all functions whose names begin with vfs_.
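If you’re unsure what a pattern will select, you can try it out on a list of names first. The following sketch uses Python’s fnmatch module to mimic shell-style matching; this is an approximation for illustration and not necessarily how funccount matches names internally. The list of kernel function names is a small sample picked for this example.

```python
from fnmatch import fnmatchcase


def matching_functions(pattern, function_names):
    """Return the names that a shell-style pattern such as 'vfs_*' selects."""
    return [name for name in function_names if fnmatchcase(name, pattern)]


# Small illustrative sample of kernel function names.
names = ["vfs_read", "vfs_write", "vfs_open", "read_pages",
         "write_cache_pages", "tcp_connect"]

for pattern in ("write*", "read*", "vfs_*"):
    print(pattern, "->", matching_functions(pattern, names))
```

Note that the pattern is anchored at the start of the name, so 'read*' doesn’t select vfs_read.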

The screenshot (below left) shows some of the output generated by the previous funccount commands. Should you wish to trace a single process only, you can use the -p option followed by the process ID of the process you want to trace.

If you try to trace too many functions then you’ll receive the following types of error message:

    $ sudo /usr/share/bcc/tools/funccount '*'
    maximum of 1000 probes allowed, attempted 42480
    $ sudo /usr/share/bcc/tools/funccount 'se*'
    maximum of 1000 probes allowed, attempted 1046

The biolatency utility is used for showing the latency of block device I/O using histograms. So, in order to capture some data, you should first execute the next biolatency command:

    $ sudo /usr/share/bcc/tools/biolatency -D 10 2

The -D option instructs biolatency to print separate information for each block device. The first numeric value is the time interval, in seconds, for printing each summary, whereas the second numeric value informs biolatency of the total number of times it should collect information, after which point biolatency will automatically exit. Therefore, the previous command instructs biolatency to print two groups of histograms: the first one after 10 seconds and the second one after 20 seconds from the time you started it.

The screenshot (above left) demonstrates some of the generated output from the preceding biolatency command when executed on an Ubuntu Linux system with a plethora of hard disks.
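The arithmetic behind the interval and count arguments can be written down as a tiny Python helper. The summary_times name is made up for this example; it only computes when the summaries appear and doesn’t talk to biolatency at all.

```python
def summary_times(interval, count):
    """Return the elapsed seconds at which each summary is printed."""
    return [interval * n for n in range(1, count + 1)]


# The "10 2" arguments used in this tutorial: two summaries, 10 seconds apart.
print(summary_times(10, 2))
```

The last value in the list is also the total running time, so "5 12" would keep biolatency busy for a full minute.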

Living on the bashreadline

The last eBPF tool that’s presented in this tutorial is called bashreadline, and its job is to trace the readline() function of the bash shell. As a result, it prints all bash commands from all the bash shells running on a Linux system, which gives you a useful way to see what the users of a Linux system do behind your back. Its output is pretty easy to understand:

    $ sudo /usr/share/bcc/tools/bashreadline
    TIME      PID    COMMAND
    23:21:35  29418  gdf
    23:22:52  29418  less /usr/share/bcc/tools/bashreadline
    23:27:36  29418  ll

The first column shows the time the command was issued, the second column is the process ID of the bash shell used and the third column is the actual command. Now, let’s talk a little more about the commands. The first command didn’t exist, the second command was executed just fine and the third command is an alias – put simply, bashreadline catches everything that was given to a bash shell without checking its validity or resolving any aliases.

In conclusion, this tutorial has presented some eBPF tools implemented in Python that make it possible for you to inspect many areas of a Linux system and find out what’s going on in the background. However, performing Linux tracing with eBPF is an art that needs time to master. Therefore, you should start using the tools to solve simple problems on a test machine before trying to apply your knowledge on a production Linux system that supports thousands of users!

This screenshot reveals the kind of output that you should expect from the funccount utility.
Here’s the output of the biolatency utility that shows the latency of block device I/O using histograms. The -D option prints informatio­n about each block device separately.
Here’s the output of opensnoop when executed on a not-so active Linux machine. Most of the output is from a running PostgreSQL database.
Here we can see the contents of the /usr/share/bcc/tools directory. This is where ready-to-use handy eBPF Python utilities are installed.
This screenshot displays the Python code of the myEBPF eBPF script that traces the sys_open() system call.
