Linux Format

Package check

Jonni Bidwell meets packagecloud founder Joe Damato. Cue package management illusions shattered…


Joe Damato is a low-level computologist and the creator of packagecloud.io, a site that offers free and enterprise repository hosting for packages of all formats, shapes and sizes. They range from the .debs and .rpms we’re all familiar with to more exotic things like Maven repos. It can talk to all manner of build/orchestration/continuous integration systems, including but not limited to Chef, Puppet, Jenkins and TravisCI, and generally takes the pain out of software distribution.

We caught up with Joe at OSCON Europe 2016 in London, where he gave a fascinating talk entitled “Infrastructure as code might be literally impossible”, in which he talked about the myth of high-level code and all the terrible things that go on behind the scenes when ostensibly nice-looking code runs. He also regaled us with some epic failures in well-known package managers and their poor implementations of GPG verification.

Linux Format: What does a low-level computologist do?

Joe Damato: Before starting packagecloud I was mostly a systems programmer. So I worked on device drivers, debuggers, kernels, stuff like that. I actually still write about a lot of that stuff on the packagecloud blog. And there’s older stuff on my personal blog, timetobleed.com, but that hasn’t been updated in a while. I had a friend that said computing should be more of an -ology sort of subject, so that’s where computology came from. It’s actually the name of packagecloud’s parent company.

LXF: How did you get into computers, programming and all these other bad habits?

JD: I guess I learned to program in high school, maybe a little earlier. Neither of my parents was really into computers or software. My dad worked at a recycling plant in New Jersey; someone was throwing out an old Apple IIe clone, which he duly rescued. That was the first time I ever saw a computer. Whoever had thrown out the machine had thrown it out with a BASIC programming book for kids. So I spent some time copying out program listings from there. I didn’t really understand what each line of code did, but it was nice to see the end result. Most of it was simple games, so there was a fun aspect to it as well.

LXF: We still have code listings in our magazines. I daresay people still copy them in without fully understanding things. It can be a double-edged sword, but it’s probably a better way of learning than just copying and pasting from the web.

JD: On one level it seems a little strange in this day and age to still do it. But for a lot of people, especially younger people learning to code, it’s still a good way to get into it.

LXF: And eventually you went on to found packagecloud, a site where people can build their own repositories and house their applications in any and all packaging formats imaginable. Why did you set this up?

JD: I started packagecloud because a lot of my previous jobs involved distributing software to customers. Typically they would be buying something like an agent that would run on their servers. The problem is that you always need a way to set up a build pipeline. So you push some changes, then compile this agent for all the different versions of CentOS or Ubuntu or Debian. Those builds would then end up in a repository that customers could access and install from. Basically I ended up rebuilding that pipeline many times, and every time it was necessary to deal with authentication and revoking access when people stopped paying.

After rebuilding that three or four times I thought, “There’s gotta be a better way to do this” and that’s why I built packagecloud, because of these frustrations with the lack of tooling for doing this kind of thing automatically.
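To picture the pipeline Joe describes, here is a minimal Python sketch of that loop: build the same agent for several distro targets, then push each artefact to a hosted repository. The build-package and push-to-repo commands are hypothetical stand-ins, not packagecloud’s actual tooling.

```python
import subprocess

# Hypothetical distro targets the agent must be built for.
TARGETS = [
    ("centos", "7"),
    ("ubuntu", "xenial"),
    ("debian", "jessie"),
]

def build_and_publish(source_dir: str, repo: str) -> None:
    """Build one package per target and push it to a hosted repository.

    'build-package' and 'push-to-repo' are placeholder commands standing
    in for whatever build tool and repository CLI a real pipeline uses.
    """
    for distro, version in TARGETS:
        artefact = f"agent-{distro}-{version}.pkg"
        # Compile the agent for this distro/version combination.
        subprocess.run(
            ["build-package", source_dir, "--distro", distro,
             "--release", version, "--output", artefact],
            check=True,
        )
        # Upload the result so paying customers can install it from the repo.
        subprocess.run(
            ["push-to-repo", f"{repo}/{distro}/{version}", artefact],
            check=True,
        )
```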

LXF: Packagecloud can host all kinds of packages, not just the .debs and .rpms that people work with on the distro level, but things like Python wheels/eggs, RubyGems and Java/Android archives as well. How do you handle these things in a unified manner?

JD: Well, on that level you can’t really treat everything the same. But the good news is that the tooling for all of these packaging systems is open source and well documented. So by studying that you can figure out how the repositories are supposed to be set up, and how the metadata’s supposed to be generated. Each packaging system has its quirks, but they all have a lot in common, too. Generally, most of them have basic properties like names, versions, dependencies, stuff like that. So there’s a nice layer of abstraction you can apply across the top of those more generic parts.
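As a rough illustration of that abstraction layer (ours, not packagecloud’s code), here is a Python sketch of the generic properties most formats share, plus one example of mapping a format-specific file, a Debian control stanza, onto them.

```python
from dataclasses import dataclass, field

@dataclass
class Package:
    """Format-agnostic view of a package: the fields nearly every
    packaging system (deb, rpm, gem, wheel, jar...) has in common."""
    name: str
    version: str
    dependencies: list[str] = field(default_factory=list)
    fmt: str = "unknown"  # e.g. "deb", "rpm", "gem"

def parse_deb_control(control_text: str) -> Package:
    """Very rough parser for a Debian control stanza, to show how one
    format maps onto the generic model. Real metadata is much richer."""
    fields = {}
    for line in control_text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    deps = [d.strip() for d in fields.get("Depends", "").split(",") if d.strip()]
    return Package(
        name=fields.get("Package", ""),
        version=fields.get("Version", ""),
        dependencies=deps,
        fmt="deb",
    )
```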

LXF: You just gave a talk entitled “Infrastructure as code might be literally impossible”, and in that talk you highlighted a few scary things. One of those things was distro package managers not doing a very good job of checking GPG signatures. Could you explain that in a little more detail?

JD: This was an issue with pygpgme on earlier CentOS systems; I think it’s fixed in CentOS 7 and later versions of CentOS 6. For a while, though, the yum package itself didn’t depend on pygpgme, which meant that people who had yum installed wouldn’t necessarily have the facilities to verify GPG signatures on their system. Essentially, the result was that anyone installing a GPG-signed package wouldn’t be aware that this verification wasn’t happening.
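One defensive habit that follows from this, sketched below on the assumption that the rpm tool and the vendor’s signing key are already on the system: check a downloaded package’s signature explicitly rather than trusting that the package manager did it. The filename is just an example.

```python
import subprocess

def rpm_signature_ok(path: str) -> bool:
    """Ask rpm to verify the digests and GPG signature of a package file.

    Only meaningful if the signing key has been imported beforehand
    (e.g. rpm --import RPM-GPG-KEY-...); otherwise rpm reports a missing
    key rather than a valid signature.
    """
    result = subprocess.run(
        ["rpm", "--checksig", path],
        capture_output=True,
        text=True,
    )
    # rpm exits non-zero when the check fails, and the output should say OK.
    return result.returncode == 0 and "OK" in result.stdout

if __name__ == "__main__":
    print(rpm_signature_ok("some-agent-1.0-1.el6.x86_64.rpm"))
```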

LXF: Oh dear. Well, that’s a little bit unfortunate, especially as Linux people tend to be very proud of the whole package management ideology, as opposed to downloading and installing arbitrary binaries from the web. How can we assuage wary package users, besides getting them to do everything on packagecloud?

JD: There are a lot of aspects to this question, I think. A lot of folks are doing work on something called The Update Framework (https://theupdateframework.github.io), which tries to solve a lot of the problems that existing packaging systems have with securely distributing software and updates. I think part of it might be that you have a lot of people reinventing, or trying to reinvent, the same thing but applying it to different tools or libraries or programming languages. People will rebuild the same things over and over again and make the same mistakes each time. Part of that might be due to a lack of research, or a lack of initiative to sit down and map out what it takes to do this.

I think that The Update Framework, at least as far as I know, is one of the more complete, established pictures of how someone can distribute software securely. I, too, used to be one of those Linux users who praised the package manager way of doing things; I thought Apt and Yum were fine. But then I read this paper – I think it was by some people at the University of Arizona – where they illustrated all these vulnerabilities in package managers. It was quite scary to read and that’s more or less what motivated me to include that in the talk.
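The core idea behind TUF-style designs, refusing to install anything whose digest doesn’t match the signed repository metadata, can be shown with a deliberately simplified sketch. It assumes the metadata’s own signature has already been verified, which is where the genuinely hard problems live.

```python
import hashlib

def sha256_of(path: str) -> str:
    """Hash a downloaded package file in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def safe_to_install(path: str, signed_metadata: dict) -> bool:
    """Accept the file only if its digest matches the (already
    signature-verified) repository metadata. A mirror that tampers with
    the package fails this check; defending against replayed, stale
    metadata additionally needs freshness/expiry checks on the metadata."""
    expected = signed_metadata.get("sha256")
    return expected is not None and sha256_of(path) == expected
```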

LXF: In your talk you also touched on the issue of, for example, people programming in Python thinking that it’ll take care of all the gory low-level stuff for them, but sometimes it doesn’t, so all kinds of nasty bugs or bottlenecks arise at scale. Does this mean that budding programmers should always defer to greybeards if they’re thinking about doing something that might grow?

JD: If you’re talking about starting out and building a new project, then I would definitely be the first advocate for doing the simplest things first and worrying about optimisations much later on. But when you get to that part later on, to actually be able to write code that works well, works really fast and has the performance characteristics that you want, you have to know your operating environment really, really well. To the point where you would actually have to be a senior systems programmer to have the required understanding of how the underlying operating system works.

I think a lot of people have complained over the years about X language being slow, where X is Ruby or Python or whatever. But it turns out that if you understand how the virtual machine or interpreter works, and how to write code that works really well with it, then you can get relatively performant code. It just requires this deep understanding of how things fit together.
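A small, generic example of the kind of interpreter-aware change he means (our illustration, not one from the talk): in CPython, building a big string with repeated += can copy the buffer over and over, while str.join sizes the result once, so knowing the runtime changes how you write the “same” code.

```python
import timeit

def concat_naive(parts: list[str]) -> str:
    # Each += may have to copy the whole accumulated string again.
    out = ""
    for p in parts:
        out += p
    return out

def concat_join(parts: list[str]) -> str:
    # join computes the final size once and copies each piece once.
    return "".join(parts)

if __name__ == "__main__":
    parts = ["x" * 10] * 100_000
    for fn in (concat_naive, concat_join):
        t = timeit.timeit(lambda: fn(parts), number=10)
        print(f"{fn.__name__}: {t:.3f}s")
```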

LXF: Lately, there’s been something of a shift towards containers, orchestration and microservices. These promise all kinds of benefits, but are also pretty complicated (I still don’t understand them). Do you think these will bring about new problems as well as solving old ones?

JD: My first real job out of college was at VMware, where I worked on the ESX hypervisor. Virtualisation and containerisation are different, but hopefully the point I’m going to make carries over. Basically, my thoughts on virtualisation are that you just have to be really careful, because you’re adding a lot more software to the part of the system stack which, generally speaking, is the most difficult to understand, debug and add code to. Ultimately, you’re layering multiple operating systems on top of each other, so you have to deal with bugs not only in the hypervisor, but also in the underlying operating systems and in how they interact with the hypervisor.

I think the same can probably be said of containerisation. You’re adding more code to the part of the stack that’s the most difficult for most people to understand and debug. Whether or not the trade-offs there actually make sense varies on a case-by-case basis, so this is something that you as a software engineer or whatever have to assess.

I was at a conference recently where some people were discussing the fact that a lot of people are using containers for development, but very few are using them in production – relatively speaking anyway. Most of the places I’ve worked aren’t really using them in production yet. Maybe that’ll pick up in the future. But in my mind anyway, I find it hard to get away from this idea that you’re adding to the most complicated layer, and there are necessarily trade-offs that go with that.

LXF: You talked about removing some of these layers, but where do we start with that? Are the layers even well-defined at the moment?

JD: I think there’s a lot of interesting research going on at the moment with unikernels. I have no idea what the viability of using them will be, or whether there are any systems out there that enable you to take advantage of unikernels now, for example to build a production system on. But from a theoretical perspective they sound really interesting, and perhaps some good work will come out of that which will allow us to simplify our system stacks.

LXF: Can you tell our readers about the bug with bento’s Vagrant images that you mentioned in your talk? If I understand correctly, it meant that people using one of their CentOS images ended up with a machine that didn’t trust Amazon. That’s the kind of bug LXF can get behind.

JD: Sure. There was a virtual machine image that was created, I believe, by a bunch of scripts. In the process they were updating the CA certificates bundle directly from the main host – curl.haxx.se, which, by the way, is where everyone should remember their list of trusted CAs comes from. The issue was that the folks who run that service had rerun the script to regenerate that bundle, but for whatever reason, maybe a bug or something [it was to do with Mozilla marking the certificate as weak, see https://blog.chef.io/2015/02/26/bentobox-update-for-centos-and-fedora], the Amazon Web Services CA was removed from it. The result was that there was a virtual machine image floating around that was unable to talk to AWS or S3 at all. That’s a big problem for most people, because the first thing you’d think is not that your SSL certificates are broken; instead you’d start debugging your AWS library, your application code and your networking, only to find out much later and after many tears where the problem actually lies.

LXF: Just to be clear, this didn’t affect the ca-certificates bundle that most readers will have installed on their machines?

JD: Right. This would only affect you if you were using this particular image or pulling directly from https://curl.haxx.se. Otherwise the certificate bundle you have probably doesn’t change so frequently, so there’s less room for error.
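A quick way to test whether a given CA bundle can still validate an AWS endpoint, the check that would have caught the bento problem early, is to attempt a TLS handshake against S3 with that bundle pinned. The bundle path below is just an example; point it at whichever file you want to test.

```python
import socket
import ssl

def bundle_trusts(host: str, cafile: str, port: int = 443) -> bool:
    """Return True if a TLS handshake to host verifies against cafile."""
    ctx = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except (ssl.SSLError, OSError):
        return False

if __name__ == "__main__":
    # Example bundle path; substitute the bundle you actually want to check.
    print(bundle_trusts("s3.amazonaws.com", "/etc/ssl/certs/ca-certificates.crt"))
```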

LXF: You also showed that not only is it possible for someone to set up a malicious repository, but it’s also possible to get such a rogue repository accepted as an official mirror. Are there safeguards in place to stop other people from doing this again?

JD: I don’t really know. I think that’d be a good experiment to repeat, and to repeat it with other packaging systems as well. Again, this all came from that University of Arizona paper [see https://www2.cs.arizona.edu/stork/packagemanagersecurity/papers.html] I mentioned, and I don’t know if people have mitigated the shortcomings it pointed out. But it was definitely a very surprising result, given that users tend to take the security of these things for granted.

One example I didn’t include in the talk: when people think about building a website, for example, they need to store lists of usernames and passwords. All the advice warns you against rolling your own encryption or re-inventing the wheel. I would use that same argument against rolling your own system for storing and distributing software.

Encryption and verification are difficult to do correctly, and history has shown that even respected projects have had difficulties here: the security issues I talked about, replay attacks and DDoS attacks have all been published. So maybe those people working on the package management side of things should look to things like TUF or other research in that field, instead of trying to reinvent it themselves. Because it’s difficult to get right.
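To make the “don’t reinvent it” point concrete for the password example, here is a minimal sketch that leans on an established primitive, PBKDF2 from Python’s standard library, rather than home-grown hashing. The iteration count is a placeholder, not a recommendation from the interview.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # placeholder work factor; tune for your hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 from the stdlib."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)
```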

“Do the simplest things first and then worry about optimisati­ons much later on.”

