The ins and outs of Osquery
Intrusion detection and compliance testing are easy with SQL queries. Sounds weird? Read on and you’ll see that it’s really not…
Linux can reveal a lot about itself. As an administrator, you probably have your favourite spots in /proc or /sys. Tools such as ps or top aggregate this data to build a higher-level overview. Others, such as ip, rely on Netlink and speak to the kernel directly. And there are other places for you to explore from time to time: say, a package manager database.
Wouldn’t it be great to have a unified interface to query them all? Imagine you want to know which hosts on your network have a vulnerable software package installed, and which hosts have it running. This may seem like a straightforward task… until you realise there’s a mix of Linux distributions (and maybe even Windows) to consider. A unified interface hides these differences, and so builds a solid foundation for automation.
Law of the instrument
First, you need a declarative language. You don’t tell the system how to obtain the data; you tell it which data you want. This language should also be easy to understand, yet powerful enough to express complex queries. And it should be a commodity: you don’t want to drop yet another thing into the mix. There’s (at least) one perfect fit for this: SQL.
Osquery (http://osquery.io), a free software tool created by Facebook, embraces and extends this idea. If all you have is a hammer, everything looks like a nail. With SQL, everything is a table. You use a familiar SQLite dialect (actually a superset of it) to obtain information on files, processes, sockets and pretty much everything else on your Linux, Windows or Mac OS X system.
The product consists of two major components: osqueryi and osqueryd. The latter is a daemon that runs scheduled queries in the background and pushes logs somewhere for you to analyse. Osqueryi is an interactive tool that supports the same query language but runs queries in real time. It’s meant to be a testbed, but it’s also a great introspection tool.
It’s important to note that osqueryi doesn’t talk to osqueryd in any way. In other words, osqueryi isn’t a client of osqueryd. They are separate yet related tools that come as one package, often called “universal” for this very reason.
Osquery may not be in your distribution’s repositories, but the project ships binary packages for all major operating systems, so you’ll rarely need to build the tool from source. The installation process is described in the documentation at https://osquery.readthedocs.io, and it’s fairly straightforward: you add a remote repository and import the GPG key used to sign the packages. For RPM and DEB, osquery claims to support any Linux released since 2011. We can’t speak for everyone, but our Kubuntu 16.04 LTS box is covered.
Carefully selected
It’s query time! Osqueryi works best for practice purposes, so fire up a terminal and run:
$ osqueryi
Osqueryi builds on the SQLite interactive shell, so if you’ve ever used SQLite, it should feel like home. Just keep a few things in mind. First, SELECTs only, please! Other verbs such as UPDATE or DELETE yield an error. Indeed, it would be odd to delete an open socket or a USB device, wouldn’t it? Second, some tables can’t be queried without a WHERE clause; osquery calls them “tables with arguments”. This makes sense for tables such as “hash” (more on it later): without a path, what do you expect osquery to hash? Wherever a column is required in the WHERE clause, you’ll see a pin next to it on the Schema page (https://osquery.io/schema). You can also obtain a table’s schema with the “.schema” command from within the interactive shell, as you would in SQLite. Similarly, “.tables” lists all the tables osquery supports on your system.
We’re good to go now. For starters, here is the equivalent of the ps ax command: select * from processes;
It spits out many rows, so narrowing the query down is a good idea. On the command line, you would probably use pgrep. With osquery, you would run the following: select pid from processes where name like '%bash%';
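Here is a minimal Python sketch that makes the “everything is a table” idea concrete. It shows roughly what a processes table boils down to. The sketch walks the numeric directories in /proc, reads each process’s name from its comm file, loads the rows into an in-memory SQLite table and runs the same LIKE query.

It assumes a Linux-style /proc layout and fills only two columns (pid and name). Osquery’s real table is far richer and isn’t built this way. The proc_root parameter is our own addition, so the function can be pointed at a fake tree for testing.

```python
import os
import sqlite3


def read_processes(proc_root: str = "/proc") -> list[tuple[int, str]]:
    """Collect (pid, name) pairs: every numeric directory in /proc is a process."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo or /proc/sys
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                rows.append((int(entry), f.read().strip()))
        except OSError:
            continue  # the process exited between listdir() and open()
    return rows


def processes_table(proc_root: str = "/proc") -> sqlite3.Connection:
    """Load the process list into an in-memory SQLite table called 'processes'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processes (pid INTEGER, name TEXT)")
    conn.executemany("INSERT INTO processes VALUES (?, ?)", read_processes(proc_root))
    return conn


if __name__ == "__main__" and os.path.isdir("/proc"):
    conn = processes_table()
    # The pgrep-like query from the text, run against our home-made table.
    for (pid,) in conn.execute("SELECT pid FROM processes WHERE name LIKE '%bash%'"):
        print(pid)
```

SQLite’s LIKE is case-insensitive for ASCII by default, so '%bash%' would also match a process called “Bash”.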
Pgrep matches against process names by default, yet you can use switches to search command lines or process IDs instead. You can do the same with osquery: just specify the appropriate column. Note, however, that pgrep accepts regular expressions, and there appears to be no easy way to do something similar with osquery. This is all the more surprising given that osquery provides the regex_split() function to split arbitrary data using a regex pattern.
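For comparison, stock SQLite has a REGEXP operator but ships no implementation of it. The expression X REGEXP Y is simply a call to a user-defined function regexp(Y, X). The Python sketch below registers such a function through the standard sqlite3 module and runs it against made-up rows. It demonstrates plain SQLite behaviour only; we’re not claiming osqueryi lets you register functions this way.

```python
import re
import sqlite3


def regexp(pattern: str, value) -> bool:
    """SQLite evaluates `X REGEXP Y` as regexp(Y, X): pattern first, value second."""
    return value is not None and re.search(pattern, str(value)) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)
conn.execute("CREATE TABLE processes (pid INTEGER, name TEXT)")
# Hypothetical rows standing in for a real process list.
conn.executemany("INSERT INTO processes VALUES (?, ?)",
                 [(101, "bash"), (202, "zsh"), (303, "sshd"), (404, "dbus-daemon")])

# Roughly what `pgrep '^(ba|z)sh$'` would match.
pids = [pid for (pid,) in conn.execute(
    "SELECT pid FROM processes WHERE name REGEXP '^(ba|z)sh$' ORDER BY pid")]
print(pids)  # [101, 202]
```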
Okay, those two were easy ones. How about looking for SUID binaries in your filesystem? You can achieve this with a reasonably sophisticated find command; I can’t easily think of anything you can’t do with find! But for osquery, it’s just another SELECT: select count(*) from suid_bin; This yields 31 binaries on our system.
In the shell, you use pipes to glue commands together. In SQL, you have JOINs. We find it particularly annoying to grep netstat for connections that a particular process created. Here’s an alternative: select process_open_sockets.* from process_open_sockets join processes using (pid) where name = 'dnsmasq';
Remember that osqueryi operates on live OS data. While the command doesn’t require root privileges to run, it needs them to fill the “pid” and “fd” columns from /proc. Otherwise, you get a humble -1 in them, which makes JOINs impossible.
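The following Python sketch reproduces the dnsmasq JOIN against two tiny in-memory SQLite tables. The rows are made up, and only a handful of the real tables’ columns are modelled. The last socket row mimics an unprivileged run, where pid comes back as -1. Even if that socket really belonged to dnsmasq, JOIN … USING (pid) has nothing to match it against, so it silently drops out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processes (pid INTEGER, name TEXT);
    CREATE TABLE process_open_sockets (pid INTEGER, fd INTEGER,
                                       remote_address TEXT, remote_port INTEGER);
    INSERT INTO processes VALUES (1200, 'dnsmasq'), (1300, 'sshd');
    -- Hypothetical sockets; the last row mimics a run without root,
    -- where the pid and fd columns could not be filled and read -1.
    INSERT INTO process_open_sockets VALUES
        (1200, 5, '0.0.0.0', 0),
        (1300, 3, '192.0.2.10', 51514),
        (-1, -1, '198.51.100.7', 53);
""")

rows = conn.execute("""
    SELECT process_open_sockets.*
    FROM process_open_sockets
    JOIN processes USING (pid)
    WHERE name = 'dnsmasq'
""").fetchall()
print(rows)  # [(1200, 5, '0.0.0.0', 0)]: the pid = -1 socket is gone
```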
Another typical use case for JOIN is checksumming. The “hash” table does just this: you SELECT a row with the given path and get the md5, sha1 and sha256 for the filesystem object: select suid_bin.path, hash.sha256 from hash join suid_bin using (path);
Save the output somewhere sensible, and you’ll have a good indicator of whether anything sensitive in your filesystem changes unexpectedly.
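If you’d like the same baseline outside osquery, the Python sketch below mirrors the suid_bin/hash JOIN. It walks a directory tree, keeps regular files with the set-UID bit and records their SHA-256 digests. It can then diff a new scan against a saved baseline. The function names and the JSON baseline format are our own invention, not part of osquery.

```python
import hashlib
import json
import os
import stat


def suid_hashes(root: str) -> dict[str, str]:
    """Map every set-UID regular file under root to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: don't follow symlinks
                if not stat.S_ISREG(st.st_mode) or not st.st_mode & stat.S_ISUID:
                    continue
                sha = hashlib.sha256()
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(65536), b""):
                        sha.update(chunk)
                digests[path] = sha.hexdigest()
            except OSError:
                continue  # unreadable or vanished file: skip it
    return digests


def save_baseline(digests: dict[str, str], path: str) -> None:
    with open(path, "w") as f:
        json.dump(digests, f, indent=2, sort_keys=True)


def load_baseline(path: str) -> dict[str, str]:
    with open(path) as f:
        return json.load(f)


def compare(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """List SUID files that appeared, disappeared or changed since the baseline."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(p for p in current.keys() & baseline.keys()
                          if current[p] != baseline[p]),
    }
```

For instance, run save_baseline(suid_hashes("/usr"), "suid-baseline.json") once. Later, run compare(load_baseline("suid-baseline.json"), suid_hashes("/usr")); any non-empty list deserves a closer look. Scanning the whole filesystem takes a while and needs root to read everything.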
Getting trickier
Now, imagine you wanted to check whether the wpasupplicant (or hostapd) package on your 16.04 LTS box has received the patch against KRACK. Here you go: select * from deb_packages where name in ('wpasupplicant', 'hostapd') and version >= '2.4-0ubuntu6.2';
RPM and DEB packages may have different names, so there’s no generic “packages” table. Moreover, from osqueryi’s perspective, version is just a string: it knows nothing about the Debian Policy Manual and its version numbering rules. For instance, '2.4-0ubuntu10' compares as less than '2.4-0ubuntu6.2', because the character 1 sorts before 6. Keep this in mind when comparing versions.
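If you post-process deb_packages output yourself, you need a real Debian version comparison. Below is a simplified Python port of the rules dpkg applies:

- the epoch is compared numerically;
- the upstream version and the revision are compared by alternating non-digit runs and numeric runs;
- within non-digit runs, letters sort before other symbols, and ~ sorts before everything, even the end of the string.

It’s a sketch for illustration; on a Debian-based box, dpkg --compare-versions remains the authoritative check. The installed version in the usage part is hypothetical.

```python
import string


def _order(c: str) -> int:
    """Weight of one character in a non-digit run; '' marks the end of the string."""
    if c == "" or c in string.digits:
        return 0
    if c in string.ascii_letters:
        return ord(c)
    if c == "~":
        return -1  # tilde sorts before everything, even the end of the string
    return ord(c) + 256  # other symbols sort after letters


def _compare_part(a: str, b: str) -> int:
    """Compare an upstream version or revision run by run; return -1, 0 or 1."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare character by character.
        while (i < len(a) and a[i] not in string.digits) or \
              (j < len(b) and b[j] not in string.digits):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return -1 if ac < bc else 1
            i += 1
            j += 1
        # Digit run: compare numerically; an empty run counts as 0.
        si, sj = i, j
        while i < len(a) and a[i] in string.digits:
            i += 1
        while j < len(b) and b[j] in string.digits:
            j += 1
        na, nb = int(a[si:i] or 0), int(b[sj:j] or 0)
        if na != nb:
            return -1 if na < nb else 1
    return 0


def _split(version: str) -> tuple[int, str, str]:
    """Split '[epoch:]upstream[-revision]' into its three parts."""
    epoch, upstream = 0, version
    if ":" in version:
        head, upstream = version.split(":", 1)
        epoch = int(head)
    revision = ""
    if "-" in upstream:
        upstream, revision = upstream.rsplit("-", 1)
    return epoch, upstream, revision


def compare_debian_versions(v1: str, v2: str) -> int:
    """Return -1, 0 or 1 depending on how v1 orders against v2."""
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return -1 if e1 < e2 else 1
    return _compare_part(u1, u2) or _compare_part(r1, r2)


if __name__ == "__main__":
    installed = "2.4-0ubuntu10"  # hypothetical value from deb_packages.version
    print(installed >= "2.4-0ubuntu6.2")  # False: plain string comparison
    print(compare_debian_versions(installed, "2.4-0ubuntu6.2") >= 0)  # True
```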
Packages aside, you can also query browsers for installed add-ons. This may seem surprising, but since malware sometimes disguises itself as an add-on, it makes sense. Mozilla Firefox, Google Chrome and Opera are all supported on Linux, but you need to supply the UID of the user whose profile you want to inspect. Typically you do this via a JOIN with the “users” table: select * from firefox_addons join users using (uid) where username = 'val';
Even things as ephemeral as events are tables in osquery! Linux provides many event sources: udev, inotify, syslog and the auditing subsystem, to name a few. Given their dynamic nature, some preparation is needed to attach them to osquery.
Osquery disables event sources by default: check this with the “.features” command. You’ll need to respawn the tool with --disable_events=false to fix that. Moreover, for filesystem events (inotify) you’ll need to tell osquery which locations to monitor. There is no way to do this from the command line, so create a configuration file (/etc/osquery/osquery.conf) and make it look like this: {"file_paths": {"home": ["/home/*/"]}}
“home” is just a category label, so it can be anything meaningful. Both shell-style (*) and SQL-style (%) globs are recognised. Now, start osqueryi like this:
$ osqueryi --disable_events=false --config_path=/etc/osquery/osquery.conf
Do something in your home directory, then SELECT from the “file_events” table. You should see some events under the “home” category you configured. Note that an inotify watch covers only a directory’s immediate contents, so files in subdirectories of each home directory won’t be monitored. If this is not what you meant, use a recursive pattern, such as /home/%%, instead.
Adding support for audit events is a bit trickier. First, you’ll need to stop the auditd daemon if you have it running. After that, you can start osqueryi with:
$ sudo osqueryi --disable_events=false --disable_audit=false --audit_allow_config=true --verbose --audit_debug
You need sudo because talking to the audit framework generally requires root privileges. The next flags enable both eventing and audit, and tell osqueryi that it can change the audit rules. This is required so the tool can install its own rules to listen to the events of interest.
The two final flags, --verbose and --audit_debug, help you see what goes on behind the curtain, including the raw events osqueryi receives. You may notice it hooks into the execve() syscall to fill the “process_events” table. User events such as authentication attempts go to the “user_events” table. Keep osqueryi running for some time, and you’ll be surprised how much goes on in your Linux box unnoticed.
You may think that configuring event sources in osqueryi is not straightforward, and you’re probably right. The reason is that they were designed to be used with another osquery component: osqueryd. Let’s cover it briefly.
Daemon stuff
Osqueryd is a daemon. It sits in the background, executing your scheduled queries and sending the results somewhere. The idea is that you deploy it on your hosts, be they servers or desktops, and gather the information network-wide. If a careless colleague opens a malicious email attachment, you’ll notice a suspicious connection, hopefully before it does much harm. Sure, running something that phones home on end users’ laptops raises privacy concerns, but that’s outside osquery’s scope.
Remember that osqueryd is installed along with osqueryi. However, it’s not enabled by default. Before you enable it, you’ll need to create a configuration file; /usr/share/osquery/osquery.example.conf is a good starting point. It’s just JSON, so copy it to /etc/osquery/osquery.conf, open it in your favourite editor (Vim) and alter it to suit.
The configuration file conveys daemon options, such as where to get configuration beyond the initial file and where to store the results. By default, osqueryd reads /etc/osquery/osquery.conf and everything under /etc/osquery/osquery.conf.d/, but it can also fetch remote JSON over HTTPS. The configuration is also where you define the query schedule, which makes up the bulk of the osqueryd configuration.
In a nutshell, a schedule item is just an SQL query plus an interval (in seconds) telling osqueryd how often to execute it. There are also query packs, which act as higher-level aggregates. Osquery comes with several packs bundled, and you can also make your own if needed. Packs are smart enough to detect whether they should run on a given host using discovery queries, and their output goes to the results log, as with ordinary queries.
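Since the configuration is plain JSON, you can also generate it. The Python sketch below builds a minimal config with:

- one scheduled query: the suid_bin/hash JOIN from earlier, run hourly;
- a recursive variant of the file_paths section from the osqueryi example;
- one bundled pack.

The query name, the interval and the pack path /usr/share/osquery/packs/osquery-monitoring.conf are assumptions. We also assume the options section accepts the same flag names as the command line. Check what your package actually ships before relying on any of this.

```python
import json

# Example values only: the query name, interval and pack path are assumptions.
CONFIG = {
    "options": {
        # Assumed to mirror --disable_events=false on the command line.
        "disable_events": "false",
    },
    "schedule": {
        "suid_hashes": {
            "query": "SELECT suid_bin.path, hash.sha256 "
                     "FROM hash JOIN suid_bin USING (path);",
            "interval": 3600,  # seconds between runs: once an hour
        },
    },
    "file_paths": {
        "home": ["/home/%%"],  # recursive, unlike /home/*/
    },
    "packs": {
        "osquery-monitoring": "/usr/share/osquery/packs/osquery-monitoring.conf",
    },
}


def write_config(path: str) -> None:
    """Serialise the configuration as pretty-printed JSON."""
    with open(path, "w") as f:
        json.dump(CONFIG, f, indent=2)
        f.write("\n")


if __name__ == "__main__":
    print(json.dumps(CONFIG, indent=2))
```

Writing to /etc/osquery/osquery.conf needs root; printing the JSON first and reviewing it by eye is the safer habit.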
The default location for the results log is /var/log/osquery/osqueryd.results.log, but results can also go to an HTTP endpoint, syslog or a Kafka topic. Log entries are JSON too, which makes it easy to extract structured data from them. Typically, you don’t do this yourself but rather feed osquery logs to Kibana or something similar. Result logs are differential: they record changes only. There are also snapshot logs that contain the whole result set: they are larger and typically stored separately.
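To get a feel for the differential format, here is a Python sketch that groups results-log entries by query name and action. It assumes the default one-row-per-line layout, where each entry carries name, action (added or removed) and columns fields; snapshot entries are skipped. As we understand it, a row that changed between runs shows up as a removed/added pair. The changed_paths() helper relies on that behaviour and on the query having a path column. The sample lines are made up.

```python
import json
import os
from collections import defaultdict

# Made-up entries in the one-row-per-line differential format.
SAMPLE = [
    '{"name": "suid_hashes", "hostIdentifier": "web01", "action": "removed", '
    '"columns": {"path": "/usr/bin/sudo", "sha256": "aaaa"}}',
    '{"name": "suid_hashes", "hostIdentifier": "web01", "action": "added", '
    '"columns": {"path": "/usr/bin/sudo", "sha256": "bbbb"}}',
    '{"name": "suid_hashes", "hostIdentifier": "web01", "action": "added", '
    '"columns": {"path": "/usr/local/bin/helper", "sha256": "cccc"}}',
]


def summarize(lines) -> dict:
    """Group differential entries by query name, then by action."""
    summary = defaultdict(lambda: {"added": [], "removed": []})
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("action") in ("added", "removed"):  # skip snapshots
            summary[entry["name"]][entry["action"]].append(entry["columns"])
    return dict(summary)


def changed_paths(changes: dict) -> set:
    """Paths both removed and added in one batch: rows whose other columns changed."""
    removed = {row["path"] for row in changes["removed"]}
    added = {row["path"] for row in changes["added"]}
    return removed & added


if __name__ == "__main__":
    log_path = "/var/log/osquery/osqueryd.results.log"
    if os.access(log_path, os.R_OK):
        with open(log_path) as log:
            result = summarize(log)
    else:
        result = summarize(SAMPLE)
    for name, changes in result.items():
        print(name, "added:", len(changes["added"]), "removed:", len(changes["removed"]))
```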
Osquery is somewhat similar to GoboLinux (www.gobolinux.org) in that it takes a fresh look at well-known objects. You may like it, you may not, but it’s still useful to know that such an option exists. Someday it may save you from having to reinvent the wheel.