eBPF Ker­nel-level mon­i­tor

Mi­halis Tsouka­los shows you how to use ex­ist­ing eBPF scripts that are writ­ten in Python, then ex­plains how to de­velop your own.

Linux Format

Mihalis Tsoukalos is a UNIX administrator, a programmer, a DBA and a mathematician who enjoys writing articles and learning new things. He’s the author of Go Systems Programming.


We’re going to start monitoring Linux systems at the kernel level using eBPF, which comes with all recent Linux kernels, so it’s not even a third-party solution. Bear in mind that to use eBPF effectively you’ll have to know what you’re doing (that’s me out! – Ed).

This tu­to­rial will con­cen­trate on ready-to-use eBPF com­mand line util­i­ties that you can eas­ily in­stall and mod­ify, be­cause they’re writ­ten in Python. We’ll also talk about de­vel­op­ing your own eBPF scripts for those of you who pre­fer to live dan­ger­ously!

About eBPF

eBPF stands for extended Berkeley Packet Filter and is an in-kernel virtual machine. It’s integrated into the Linux kernel and can be used for Linux tracing. Put simply, eBPF lets you check what’s going on behind the scenes on your Linux machines so that you can discover and solve performance bugs and other performance issues.

eBPF requires a kernel compiled with the CONFIG_BPF_SYSCALL option, which is enabled by default in Ubuntu Linux. You’ll also need a Linux system with kernel 4.4 or newer, which shouldn’t be a problem if your system is up to date. This tutorial uses Ubuntu Linux with the following kernel version:

$ uname -r -v -p
4.4.0-89-generic #112-Ubuntu SMP Mon Jul 31 19:38:41 UTC 2017 x86_64
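The version requirement above can also be checked from Python. The following is a minimal sketch rather than part of the bcc tools: the helper names parse_kernel_release() and kernel_is_recent_enough() are made up for this example, and the check only compares the major and minor numbers against the 4.4 minimum quoted in the text.

```python
import platform
import re

MIN_KERNEL = (4, 4)  # minimum kernel version quoted in the text


def parse_kernel_release(release):
    """Return (major, minor) from a release string such as '4.4.0-89-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kernel_is_recent_enough(release=None):
    """True if the release is at least MIN_KERNEL (defaults to the running kernel)."""
    if release is None:
        release = platform.release()  # same string that 'uname -r' prints
    return parse_kernel_release(release) >= MIN_KERNEL
```

Calling kernel_is_recent_enough() with no argument tests the running kernel. It says nothing about whether CONFIG_BPF_SYSCALL is enabled, which still has to be checked separately.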

In­stalling eBPF

On an Ubuntu Linux system, execute the following commands with root privileges to install the eBPF command line utilities:

# echo "deb [trusted=yes] https://repo.iovisor.org/apt/xenial xenial-nightly main" | sudo tee /etc/apt/sources.list.d/iovisor.list
# sudo apt-get update
# sudo apt-get install bcc-tools

Note that the eBPF tools are installed inside the /usr/share/bcc/tools directory. The screenshot (above) shows the contents of that directory – as you can see, there’s a plethora of eBPF utilities in there. All of them are plain text Python scripts that you can edit and modify, provided that you know what you’re doing.

Bear in mind that the eBPF tools are prac­ti­cal util­i­ties and the only way to learn them is by us­ing them on a daily ba­sis.

You’re now ready to start us­ing eBPF and its tools, which have been cre­ated by Bren­dan Gregg. You can find out more about the au­thor and his projects at www.bren­dan­gregg.com.

Us­ing eBPF util­i­ties

All the eBPF utilities are located inside /usr/share/bcc/tools and they need root privileges to run. If the /usr/share/bcc/tools directory isn’t present in your PATH environment variable, you can either use the full path of a utility or change to that directory using the cd command and execute the utility from there. If you’re using the bash shell and decide to change the PATH variable, then you should execute the following command:

$ export PATH=/usr/share/bcc/tools:$PATH

If you want to make this change of the PATH en­vi­ron­ment vari­able per­ma­nent, you should add the pre­vi­ous com­mand in­side ~/.bashrc or ~/.pro­file.

The eBPF tools can be divided into single-purpose and multiple-purpose tools. Single-purpose tools do one thing well. In contrast, multiple-purpose tools can achieve many things, but you’ll have to feed them the right parameters, which is the price you pay for their increased flexibility.

Stan­dard scripts

This sec­tion will help you un­der­stand the struc­ture and some of the Python code of the eBPF util­i­ties that can be found in /usr/share/bcc/tools. This is im­por­tant for two rea­sons: first, you’ll be able to make small changes to ex­ist­ing scripts; and sec­ond, you’ll be able to un­der­stand the gen­er­ated out­put of a script much bet­ter.

The script that will be inspected is bashreadline, which deals with the bash shell and contains a very handy preamble:

#!/usr/bin/python
#
# bashreadline  Print entered bash commands from all running shells.
#               For Linux, uses BCC, eBPF. Embedded C.
#
# This works by tracing the readline() function using a uretprobe (uprobes).

So, each tool be­gins with a small synop­sis, comments that in­form you about how the tool works and a ma­jor change his­tory, which isn’t shown here. This is the first place that you should check for im­por­tant in­for­ma­tion and us­age in­struc­tions for an eBPF tool.

After the comments section, each eBPF Python script has code that uses the bcc module, which provides a Python front end that enables you to use eBPF. Therefore, strictly speaking, this tutorial is about using eBPF with the help of bcc. As you can see from the following code, there’s a fair amount of low-level stuff in bcc scripts, but the good thing is that you don’t need to fully understand the way Python talks to the kernel and traces C functions:

bpf_text = """
#include <uapi/linux/ptrace.h>
struct str_t {
    u64 pid;
    char str[80];
};
...
int printret(struct pt_regs *ctx) {
    ...
};
"""

Among other things, the preceding C code defines a C program that will be used later on. However, the really interesting part of the tool is the next portion:

b.attach_uretprobe(name="/bin/bash", sym="readline", fn_name="printret")

Here, you tell eBPF to watch /bin/bash for the readline symbol, which is a function. Because this is a uretprobe, each time readline() returns, the printret() function – which is the C function that was defined previously – will be called to capture the desired information. The last part of the program, which isn’t shown here, deals with the presentation of the output.
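To make the pieces above concrete, here’s a stripped-down sketch of the same idea. It isn’t the bashreadline source: the trace_readline() wrapper is a name invented for this example, the C function prints through bpf_trace_printk() instead of filling the str_t structure, and running it assumes that the bcc Python module is installed and that you have root privileges.

```python
# Sketch only: requires the bcc Python module and root privileges to run.
bpf_text = r"""
#include <uapi/linux/ptrace.h>

int printret(struct pt_regs *ctx)
{
    char line[80] = {};

    /* readline() returns a pointer to the entered line, or NULL. */
    if (!PT_REGS_RC(ctx))
        return 0;
    /* Copy up to 80 bytes of the returned string into the BPF stack. */
    bpf_probe_read(&line, sizeof(line), (void *)PT_REGS_RC(ctx));
    bpf_trace_printk("%s\n", &line);
    return 0;
}
"""


def trace_readline(binary="/bin/bash"):
    """Attach printret() to the return of readline() and print what it reports."""
    from bcc import BPF  # imported here so the file loads even without bcc

    b = BPF(text=bpf_text)
    b.attach_uretprobe(name=binary, sym="readline", fn_name="printret")
    print("Tracing readline() in %s... Ctrl-C to end." % binary)
    try:
        b.trace_print()  # prints lines from the kernel trace pipe until interrupted
    except KeyboardInterrupt:
        pass
```

Calling trace_readline() as root should print one trace line per command entered in any running bash shell. On newer kernels bpf_probe_read_user() is generally the preferred helper for reading user-space memory.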

It’s now time to see some eBPF tools in action, starting with opensnoop, which is explained in the next section.

Use­ful eBPF util­i­ties

There are some handy and easy-to-use eBPF command line utilities that will make your life much easier. The first tool is called opensnoop and enables you to trace file opens. Executing opensnoop on a Linux machine will generate the following kind of output:

PID    COMM       FD  ERR  PATH
1      systemd    24  0    /proc/239/cgroup
2256   postgres   7   0    /var/run/postgresql/9.5-main.pg_stat_tmp/global.stat

The open system call is used for opening and creating files for reading and writing, and can tell you a lot about how a program works behind the scenes. The opensnoop tool traces the open(2) system call and prints a line for each call that’s found.

Now, let’s discuss the output of opensnoop a little more. The first column is the process ID of the process that called open(2). The second column displays the name of the process, whereas the third column displays the file descriptor returned by the open(2) call, which is an integer. The fourth column is the error value returned from open(2) – by UNIX convention, an error value of 0 means that there was no error. The final column is the path of the file used in the open(2) system call.

If you’re ex­e­cut­ing open­snoop on a busy Linux sys­tem then you’re go­ing to see lots of out­put com­ing from it, which means that if you don’t know what you’re look­ing for, you won’t be able to take ad­van­tage of the gen­er­ated out­put. Some­times, fil­ter­ing the out­put us­ing grep will make your work a lot eas­ier. An­other good idea is to fil­ter us­ing the process ID of the process that in­ter­ests you or the path of the file that you want to in­spect.
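The filtering described above can also be done with a few lines of Python instead of grep. This is only a sketch: the function names are invented for the example, and it assumes the five-column layout shown earlier (PID, COMM, FD, ERR, PATH) with a process name that contains no spaces.

```python
def parse_opensnoop_line(line):
    """Turn one opensnoop output line into a dict; return None for headers."""
    # PATH is the last column and may contain spaces, so split at most 4 times.
    fields = line.strip().split(None, 4)
    if len(fields) != 5 or not fields[0].isdigit():
        return None
    pid, comm, fd, err, path = fields
    return {"pid": int(pid), "comm": comm, "fd": int(fd),
            "err": int(err), "path": path}


def filter_events(lines, pid=None, path_prefix=None):
    """Yield only the events that match the given PID and/or path prefix."""
    for line in lines:
        event = parse_opensnoop_line(line)
        if event is None:
            continue
        if pid is not None and event["pid"] != pid:
            continue
        if path_prefix is not None and not event["path"].startswith(path_prefix):
            continue
        yield event


sample = [
    "PID    COMM       FD  ERR  PATH",
    "1      systemd    24  0    /proc/239/cgroup",
    "2256   postgres   7   0    /var/run/postgresql/9.5-main.pg_stat_tmp/global.stat",
]
for event in filter_events(sample, pid=2256):
    print(event["comm"], event["path"])
```

To use it on live data, you could pipe the output of opensnoop into a script that passes sys.stdin to filter_events() in place of the sample list.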

The screenshot (previous page) shows more output from opensnoop when executed on an idle Linux machine.

Another useful eBPF tool is tcpconnect, which enables you to inspect active TCP connections by watching all connect(2) system calls:

PID    COMM   IP  SADDR      DADDR            DPORT
26149  https  4   10.0.2.15  104.199.116.191  443
26151  http   4   10.0.2.15  91.189.91.26     80

The preceding output is pretty simple to interpret. The first column is the process ID of the process that called connect(2), whereas the second column is the name of that process. The third column specifies whether the connection uses IPv4 or IPv6. The fourth and fifth columns are the source IP and destination IP of the connection, respectively. The final column is the port number of the destination address. What’s really important here is that with just a single command you can obtain an overview of your entire Linux machine.
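Once you know what the six columns mean, summarising the output is straightforward. The sketch below counts connections per destination address and port. The count_destinations() name is made up for this example, and it assumes exactly the column layout shown above.

```python
from collections import Counter


def count_destinations(lines):
    """Count tcpconnect events per (destination address, destination port)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip the header and anything that isn't a six-column event line.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        pid, comm, ip_version, saddr, daddr, dport = fields
        counts[(daddr, int(dport))] += 1
    return counts


sample = [
    "PID    COMM   IP  SADDR      DADDR            DPORT",
    "26149  https  4   10.0.2.15  104.199.116.191  443",
    "26151  http   4   10.0.2.15  91.189.91.26     80",
]
for (daddr, dport), hits in count_destinations(sample).items():
    print(daddr, dport, hits)
```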

The tcpaccept utility traces the accept(2) system call, which enables a process to accept a connection on a socket. As such, it’s useful for debugging server TCP processes. Once again, if you don’t know the way TCP/IP server applications work, you’ll have trouble choosing the right eBPF tool and interpreting its output. Although netstat can do the job of both tcpconnect and tcpaccept, the two eBPF tools are more versatile and they make it possible for you to watch specific processes when used with the -p option. Additionally, both tools print events as they happen, which means that you don’t miss anything. Finally, Python scripts such as tcpconnect and tcpaccept can be easily modified.

In the fol­low­ing two sec­tions of this tu­to­rial you’re go­ing to see two eBPF util­i­ties that can do more than one job, de­pend­ing on their pa­ram­e­ters.

Feel the func

The funccount utility is a multi-purpose tool that enables you to count kernel function calls. It’s multi-purpose because you can choose the function or functions that you want to trace.

Look at the following uses of funccount:

$ sudo /usr/share/bcc/tools/funccount 'write*'
$ sudo /usr/share/bcc/tools/funccount 'read*'
$ sudo /usr/share/bcc/tools/funccount 'vfs_*'

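In the first example, you count calls to all kernel functions whose names begin with write, whereas in the second example you count calls to those whose names begin with read. The last example counts calls to all functions whose names begin with vfs_. Note that these patterns match kernel function names, which aren’t necessarily system calls.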
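The wildcard patterns above behave like shell globs applied to function names. The sketch below illustrates the matching with Python’s fnmatch module. It isn’t how funccount is implemented, matching_functions() is a name invented for the example, and the list of function names is a small hand-picked sample rather than a real kernel symbol list.

```python
import fnmatch


def matching_functions(pattern, names):
    """Return the names that match a glob pattern such as 'vfs_*'."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


# A hand-picked sample of names, for illustration only.
names = ["vfs_read", "vfs_write", "read_pages", "write_cache_pages", "sys_open"]

print(matching_functions("vfs_*", names))
print(matching_functions("write*", names))
print(matching_functions("*", names))
```

The last call shows why a bare '*' is a bad idea: it matches every name in the list, and on a real kernel that means tens of thousands of functions.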

The screenshot (below left) shows some of the output generated by the previous funccount commands. Should you wish to trace a single process only, you can use the -p option followed by the process ID of the process you want to trace.

If you try to trace too many functions then you’ll receive the following types of error message:

$ sudo /usr/share/bcc/tools/funccount '*'
maximum of 1000 probes allowed, attempted 42480
$ sudo /usr/share/bcc/tools/funccount 'se*'
maximum of 1000 probes allowed, attempted 1046

The biolatency utility shows the latency of block device I/O using histograms. To capture some data, first execute the following biolatency command:

$ sudo /usr/share/bcc/tools/biolatency -D 10 2

The -D option instructs biolatency to print separate information for each block device. The first numeric value is the time interval, in seconds, between summaries, whereas the second numeric value is the total number of summaries to print, after which biolatency will automatically exit. Therefore, the previous command instructs biolatency to print two groups of histograms: the first one 10 seconds and the second one 20 seconds after you started it.
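The interval and count arithmetic can be spelled out in a couple of lines of Python. This is purely illustrative: summary_times() and biolatency_command() are hypothetical helper names, and the second one merely builds the argument list for the command shown in the text without running it.

```python
BIOLATENCY = "/usr/share/bcc/tools/biolatency"  # path used throughout the tutorial


def summary_times(interval, count):
    """Seconds after start at which each histogram summary is printed."""
    return [interval * n for n in range(1, count + 1)]


def biolatency_command(interval, count, per_disk=True):
    """Build the argument list for a biolatency run (it is not executed here)."""
    args = ["sudo", BIOLATENCY]
    if per_disk:
        args.append("-D")  # separate histograms for each block device
    args += [str(interval), str(count)]
    return args


print(" ".join(biolatency_command(10, 2)))
print(summary_times(10, 2))
```

The resulting list could be passed to subprocess.run() if you wanted to launch the tool from a script.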

The screenshot (above left) demonstrates some of the output generated by the preceding biolatency command when executed on an Ubuntu Linux system with a plethora of hard disks.

Living on the bashread­line

The last eBPF tool that’s presented in this tutorial is called bashreadline, and its job is to trace the readline() function of the bash shell. As a result, it prints all bash commands from all the bash shells running on a Linux system, which gives you a useful way to see what the users of a Linux system do behind your back. Its output is pretty easy to understand:

$ sudo /usr/share/bcc/tools/bashreadline
TIME      PID    COMMAND
23:21:35  29418  gdf
23:22:52  29418  less /usr/share/bcc/tools/bashreadline
23:27:36  29418  ll

The first column shows the time the command was issued, the second column is the process ID of the bash shell used and the third column is the actual command. Now, let’s talk a little more about the commands. The first command didn’t exist, the second command was executed just fine and the third command is an alias – put simply, bashreadline catches everything that was given to a bash shell without checking its validity or resolving any aliases.

In con­clu­sion, this tu­to­rial has pre­sented some eBPF tools im­ple­mented in Python that make it pos­si­ble for you to in­spect many ar­eas of a Linux sys­tem and find out what’s go­ing on in the back­ground. How­ever, per­form­ing Linux trac­ing with eBPF is an art that needs time to master. There­fore, you should start us­ing the tools to solve sim­ple prob­lems on a test ma­chine be­fore try­ing to ap­ply your knowl­edge on a pro­duc­tion Linux sys­tem that sup­ports thou­sands of users!

Here we can see the con­tents of the /usr/share/bcc/tools di­rec­tory. This is where ready-to-use handy eBPF Python util­i­ties are in­stalled.

Here’s the out­put of open­snoop when ex­e­cuted on a not-so ac­tive Linux ma­chine. Most of the out­put is from a run­ning Post­greSQL data­base.

This screen­shot re­veals the kind of out­put that you should ex­pect from the func­count util­ity.

Here’s the out­put of the bi­o­la­tency util­ity that shows the la­tency of block de­vice I/O us­ing his­tograms. The -D op­tion prints in­for­ma­tion about each block de­vice sep­a­rately.

This screen­shot dis­plays the Python code of the myEBPF eBPF script that traces the sys_open() sys­tem call.
