OSTree via Git
From web development to the Linux kernel, Git is pretty much everywhere. What if we take it to our root filesystems?
Do you remember the premise of The Joel Test (www.joelonsoftware.com/2000/08/09/the-joel-test-12-steps-to-better-code)? Back in 2000, it was, among other things, a good way to rate the developer culture in your company. Its first question reads: “Do you use source control?” Eighteen years later, with GitHub and friends everywhere, hardly anyone sane would start a software project without a VCS such as Git. So why not extend this practice to OS filesystem trees?
Arguably, the most important feature a VCS provides is the ability to “rewind time”. If you break the code, you check out the previous version and move forward from there. In OS terms, if the latest update messes up the system, you just roll it back and carry on. The idea isn’t new: snapshots do this already, but they’re either a bit too coarse (LVM) or need filesystem support (btrfs). On the other hand, tools such as etckeeper (https://etckeeper.branchable.com) are filesystem-agnostic but limited in scope: they only handle configuration files.
OSTree (http://ostree.readthedocs.io) tries to wear all these hats. Describing itself as “Git for operating system binaries”, it’s really a content-addressable filesystem that runs on top of ext4, btrfs or anything else, and atomically switches whole root filesystem trees. It integrates with bootloaders, so you can choose which tree to boot, and with package managers, which you use to build those trees. It also speaks the Filesystem Hierarchy Standard (FHS) and suggests that you keep all OS binaries in /usr (but doesn’t enforce it). /var is shared, so state such as databases or custom applications is preserved across trees. For /etc, a traditional three-way merge is performed when you switch trees, so your local configuration changes carry over to the new tree.
Git revisited
What is a three-way merge, you ask? To answer this question (and to better understand OSTree), let’s have a quick Git recap. A Git repository is really just a bunch of files under the .git directory in your working copy. Objects in .git are named after the SHA-1 hashes of their contents, which is why Git is dubbed “content-addressable”: an object’s contents give you its key in the Git database, so each object is stored only once.
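To make “content-addressable” concrete, here is a minimal Python sketch of how Git names a blob: it hashes a short header plus the file’s bytes with SHA-1. ToyObjectStore is a hypothetical in-memory stand-in. Real Git also zlib-compresses each object and files it under .git/objects.

```python
import hashlib
from typing import Dict


def git_blob_id(data: bytes) -> str:
    """Return the object ID Git assigns to `data` when stored as a blob."""
    # Git hashes a short header ("blob <size>" plus a NUL byte), then the bytes.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


class ToyObjectStore:
    """Hypothetical in-memory store: each object's key is derived from its content."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = git_blob_id(data)
        # Storing identical content twice is a no-op: same bytes, same key.
        self.objects.setdefault(key, data)
        return key


store = ToyObjectStore()
first = store.put(b"Hello, World\n")
second = store.put(b"Hello, World\n")
print(first == second, len(store.objects))  # True 1
print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

You can cross-check the last value with echo 'test content' | git hash-object --stdin.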
Git operates on several object types. There are blobs, which hold file contents. There are trees, which are roughly directories: they list names and point to blobs (and to other trees). And there are commits, which tie a tree together with metadata such as the author’s name or the commit message.
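The three object types can be modelled in a few lines. This is a simplified sketch. The tree and commit serialisations are our own toy formats: real Git trees are binary and include file modes, and real commits carry author and committer lines. The naming principle is the same, though. The sketch shows why changing one file gives you a new tree and a new commit, while untouched subtrees keep their IDs.

```python
import hashlib
from typing import Dict, Optional


def object_id(kind: str, payload: bytes) -> str:
    # Git-style naming: SHA-1 over "<kind> <size>\0" followed by the payload.
    header = b"%s %d\x00" % (kind.encode(), len(payload))
    return hashlib.sha1(header + payload).hexdigest()


def blob(content: bytes) -> str:
    return object_id("blob", content)


def tree(entries: Dict[str, str]) -> str:
    # A tree maps names to object IDs. Sorting makes the ID independent of
    # insertion order. (Toy text format; real Git trees are binary.)
    payload = "".join(f"{name} {oid}\n" for name, oid in sorted(entries.items()))
    return object_id("tree", payload.encode())


def commit(tree_id: str, parent: Optional[str], message: str) -> str:
    # A commit ties a tree to metadata and (optionally) to its parent commit.
    lines = [f"tree {tree_id}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines += ["", message]
    return object_id("commit", "\n".join(lines).encode())


docs = tree({"README": blob(b"docs\n")})
root1 = tree({"file": blob(b"Hello, World\n"), "docs": docs})
c1 = commit(root1, None, "Initial commit")

root2 = tree({"file": blob(b"Bye, World\n"), "docs": docs})
c2 = commit(root2, c1, "Say goodbye")

# One blob changed, so the root tree and the commit get new IDs, while the
# untouched "docs" subtree keeps its ID and would be stored only once.
print(root1 != root2, c1 != c2)  # True True
```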
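Each commit points to its parent (or parents), so commits form chains. There are also references, or “refs” for short, which are more or less symbolic names for commit hashes. When you create a branch, you start a new chain of commits, and a ref tracks its end. When you merge two branches, Git creates a merge commit whose parents are the ends of both chains and moves the current branch’s ref to it (or, if one chain already contains the other, simply fast-forwards the ref). This is where the three-way merge comes in: Git compares both ends against their common ancestor, takes each side’s changes, and flags a conflict only where both sides changed the same thing differently. It’s fun to play with Git at this level, and if you feel like it, Chapter 10 of the free Pro Git book (https://git-scm.com/book/en/v2) contains all you need to know.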
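A three-way merge compares two versions against their common ancestor (the “base”). A path changed on only one side takes that side’s version, and only paths changed differently on both sides conflict. The sketch below works at whole-file granularity (Git goes down to individual lines) and is our own illustration, not OSTree’s code. The ours_wins flag approximates what OSTree’s documentation describes for /etc: files you modified locally are kept, and everything else follows the new defaults.

```python
from typing import Dict, List, Optional, Tuple

Files = Dict[str, str]  # toy tree: path -> file contents


def three_way_merge(base: Files, ours: Files, theirs: Files,
                    ours_wins: bool = False) -> Tuple[Files, List[str]]:
    """Combine changes made in `ours` and `theirs` since their common `base`."""
    merged: Files = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        # None means "this path does not exist on that side".
        b: Optional[str] = base.get(path)
        o: Optional[str] = ours.get(path)
        t: Optional[str] = theirs.get(path)
        if o == t:          # both sides agree (including "neither changed")
            result = o
        elif o == b:        # only `theirs` changed this path
            result = t
        elif t == b:        # only `ours` changed this path
            result = o
        else:               # both changed it differently
            if not ours_wins:
                conflicts.append(path)
            result = o      # keep our version (and flag it, unless ours_wins)
        if result is not None:
            merged[path] = result
    return merged, conflicts


# /etc-style use: old defaults as base, local /etc as ours, new defaults as theirs.
old_defaults = {"hostname": "localhost\n", "motd": "Welcome\n"}
local_etc = {"hostname": "myhost\n", "motd": "Welcome\n"}
new_defaults = {"hostname": "localhost\n", "motd": "Welcome to v2\n", "new.conf": "x=1\n"}
etc, conflicts = three_way_merge(old_defaults, local_etc, new_defaults, ours_wins=True)
print(etc)
# {'hostname': 'myhost\n', 'motd': 'Welcome to v2\n', 'new.conf': 'x=1\n'}
```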
Back to OSTree. It’s very similar to Git, but not identical. Just like Git, OSTree has the notion of a repository, which is just a directory that stores objects. There’s one system-level repository residing in /ostree/repo. For any other location, you’ll need to pass the --repo switch to the tools or set the $OSTREE_REPO environment variable. Next, OSTree uses SHA-256 hashes, which are longer and safer than SHA-1 (a practical SHA-1 collision was published in 2017). However, OSTree doesn’t support abbreviations (such as 5fe1c78 for 5fe1c78faf2b430ab937db1cfaf9f3e16592aca3), so you always have to type hashes in full. There are also branches, but no merges. Where Git sports a tricky revision mini-language (see git-rev-parse(1)), OSTree only understands the caret (^), which refers to the previous (parent) commit.
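To illustrate these resolution rules, here is a toy resolver in Python. refs and parents are hypothetical in-memory stand-ins for what the repository keeps on disk. We assume carets can be stacked (playground^^), and, matching the behaviour described above, abbreviated hashes are simply not found.

```python
import hashlib
import re
from typing import Dict, Optional

FULL_CHECKSUM = re.compile(r"[0-9a-f]{64}")  # SHA-256 as 64 lowercase hex digits


def resolve(rev: str, refs: Dict[str, str], parents: Dict[str, Optional[str]]) -> str:
    """Resolve 'branch', 'branch^', 'branch^^' or a full checksum to a checksum."""
    base = rev.rstrip("^")
    carets = len(rev) - len(base)
    if FULL_CHECKSUM.fullmatch(base):
        commit = base
    elif base in refs:
        commit = refs[base]
    else:
        # No abbreviation lookup: a 7-digit prefix is simply not found.
        raise KeyError(f"cannot resolve {rev!r}")
    for _ in range(carets):
        parent = parents.get(commit)
        if parent is None:
            raise KeyError(f"{rev!r}: commit has no parent")
        commit = parent
    return commit


# Hypothetical two-commit history on the "playground" branch.
c1 = hashlib.sha256(b"first commit").hexdigest()
c2 = hashlib.sha256(b"second commit").hexdigest()
refs = {"playground": c2}
parents: Dict[str, Optional[str]] = {c2: c1, c1: None}

print(resolve("playground^", refs, parents) == c1)  # True
try:
    resolve(c2[:7], refs, parents)
except KeyError as err:
    print("error:", err)
```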
Object types in OSTree are also similar to Git’s. Both have commits and content (blob) objects. What Git calls a tree object is split into dirtree and dirmeta objects in OSTree. A dirtree maps file names to hashes, while a dirmeta holds a directory’s associated metadata, such as its UID, GID and permissions. Being Git for operating system binaries, OSTree needs to store more metadata than Git does, and it keeps directory metadata separate for efficiency: if many directories share the same ownership, mode and extended attributes (which is often the case), that metadata is stored only once.
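The following sketch shows why the dirtree/dirmeta split saves space. It uses canonical JSON as a toy serialisation (OSTree itself uses GVariant, so the checksums won’t match a real repository), and the directory names and content checksum are made up. Every subdirectory here has the same ownership and mode, so they all share a single dirmeta object.

```python
import hashlib
import json
from typing import Dict, Tuple

objects: Dict[str, Tuple[str, object]] = {}  # checksum -> (object type, value)


def store(kind: str, value: object) -> str:
    # Toy serialisation (canonical JSON) hashed with SHA-256; OSTree itself
    # serialises objects as GVariants, so real checksums will differ.
    key = hashlib.sha256(json.dumps([kind, value], sort_keys=True).encode()).hexdigest()
    objects.setdefault(key, (kind, value))
    return key


def dirmeta(uid: int = 0, gid: int = 0, mode: int = 0o40755) -> str:
    return store("dirmeta", {"uid": uid, "gid": gid, "mode": mode, "xattrs": []})


def dirtree(files: Dict[str, str], dirs: Dict[str, Tuple[str, str]]) -> str:
    # files: name -> content checksum; dirs: name -> (dirtree, dirmeta) checksums
    return store("dirtree", {"files": files,
                             "dirs": {n: list(p) for n, p in dirs.items()}})


ls_checksum = hashlib.sha256(b"pretend this is /usr/bin/ls").hexdigest()
bin_tree = dirtree({"ls": ls_checksum}, {})
lib_tree = dirtree({}, {})
usr_tree = dirtree({}, {"bin": (bin_tree, dirmeta()), "lib": (lib_tree, dirmeta())})
root_tree = dirtree({}, {"usr": (usr_tree, dirmeta())})

kinds = [kind for kind, _ in objects.values()]
print(kinds.count("dirtree"), kinds.count("dirmeta"))  # 4 1
```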
Storing blobs
The repository format differs a bit, too. In Git, blobs are stored compressed. In OSTree, it depends. In so-called “bare” repositories, files are stored uncompressed, and OSTree checks them out via hard links. “Archive” repositories store files compressed, which, together with static deltas, is useful for serving OS images over HTTP. There are a few other storage modes as well (bare-user, for instance), but for the most part they are variations of these two.
And of course, OSTree provides its own tool, called ostree, which (you guessed it) manages these repositories. This is where the fun begins, and it’s what we’re going to discuss next. Before we start, make sure you have OSTree installed.
OSTree is a GNOME/Red Hat-backed project, so those on Fedora or CentOS are probably in luck. If it’s not installed, don’t worry: OSTree should still be in your distribution’s repositories. On Ubuntu, you’ll need 16.10 or later. For the latest and freshest version, you can compile it from source. However, we ran into dependency issues on this route, which are certainly solvable but a bit of a pain, so it’s better left as a fallback option.
Assuming you have the tool installed, this is how you create your first OSTree repo to play with:
$ ostree --repo=/path/to/repo --mode=bare init
$ ls /path/to/repo
config extensions objects refs state tmp
Here we initialise a bare repository, which is the default mode. Note that in real-world cases OSTree typically runs as root, but as we aren’t operating on the OS filesystem tree here, a normal user will do for now.
The repository is initially empty. Let’s create some files and then commit them:
$ mkdir folder
$ echo 'Hello, World' > file
$ ostree commit --branch=playground --repo=/path/to/repo
There are a few things to note here compared to Git. First, the branch is required, because there is no “master” equivalent in OSTree. Branches typically carry path-like names, say gnome-continuous/buildmaster/x86_64-runtime, but let’s keep things simple here. On the other hand, the commit’s subject and body (the short and long commit message, in Git parlance) are optional by default, and it’s fine to store an empty directory. Also note that there’s no intermediate staging step like git add.
Now change something in the directory:
$ echo 'Bye, World' > file
and commit it once again to the same branch. This time, add a descriptive subject with the -s (--subject) option. Now you can see the log of your changes with:
$ ostree log --repo=/path/to/repo playground
The output looks much like Git’s. Note that the ref is required again, because there’s no master. You can check out any version of this tiny tree you like:
$ ostree checkout --repo=/path/to/repo <hash from ostree log>
Just remember that you need the full hash. Note that the tree isn’t switched in place as in Git, but rather checked out into a separate directory. In real-world deployments, bootloader integration and systemd tricks are used to make the tree you want the active root at boot time.
Checking out into the current directory is also supported: just add . as the last command-line argument. It doesn’t work the same way as in Git, however. In fact, if you try this with our example setup, OSTree will complain:
$ ostree checkout --repo=/path/to/repo <hash from ostree log> .
error: File exists
This is because we already have both “file” and “folder” in the current directory, and OSTree is all about immutable, read-only trees. There’s a union option (--union) that asks OSTree to stack filesystem trees on top of one another: it keeps existing files and directories and overwrites any files that also exist in the commit. It’s mainly used for layering trees, much like Docker does with container images. Adding --union makes the above error go away.
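The --union semantics described above can be mimicked with ordinary file copies in Python. This is only an illustration. A real ostree checkout from a bare repository uses hard links rather than copies, and union_checkout is a hypothetical helper, not part of OSTree.

```python
import shutil
import tempfile
from pathlib import Path


def union_checkout(commit_tree: Path, dest: Path) -> None:
    """Overlay `commit_tree` onto an existing `dest` directory (hypothetical helper).

    Existing directories are kept, files that also exist in the commit are
    overwritten, and files present only in `dest` are left untouched.
    """
    shutil.copytree(commit_tree, dest, dirs_exist_ok=True)  # Python 3.8+


with tempfile.TemporaryDirectory() as tmp:
    commit_tree, cwd = Path(tmp, "commit"), Path(tmp, "cwd")
    for root in (commit_tree, cwd):
        (root / "folder").mkdir(parents=True)
    (commit_tree / "file").write_text("Bye, World\n")
    (cwd / "file").write_text("Hello, World\n")
    (cwd / "notes.txt").write_text("local only\n")

    union_checkout(commit_tree, cwd)
    names = sorted(p.name for p in cwd.iterdir())
    file_text = (cwd / "file").read_text()

print(names)            # ['file', 'folder', 'notes.txt']
print(repr(file_text))  # 'Bye, World\n'
```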
Remember, though, that OSTree wasn’t meant for mutable filesystem trees. If you check out anything from a bare repository, you really get a hard link: think of it as a second name for the same blocks on disk. If you change anything in place under this name, you effectively modify the repository directly and overwrite your history. This is not what you want, so in real-world deployments OSTree integration scripts create a read-only bind mount for OSTree-managed directories. This is also why /etc and /var, the two typical locations for mutable files in Linux, are effectively out of OSTree’s direct control.
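The hard-link pitfall is easy to reproduce with plain files. In this sketch, repo-object and checkout-file are hypothetical stand-ins for a repository object and its checked-out name. An in-place edit through the checkout also changes the “repository” copy. Writing a new file and renaming it over the old name breaks the link instead and leaves the object alone. The demo assumes a filesystem with hard-link support, such as ext4.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    repo_object = Path(tmp, "repo-object")   # stand-in for an object in a bare repo
    checkout = Path(tmp, "checkout-file")    # stand-in for a checked-out file
    repo_object.write_text("Hello, World\n")
    os.link(repo_object, checkout)           # a bare-repo checkout hard-links like this

    same_inode = os.stat(repo_object).st_ino == os.stat(checkout).st_ino
    links = os.stat(repo_object).st_nlink    # one inode, two names

    # Editing the file in place through the checkout changes the "repo" too.
    with open(checkout, "a") as f:
        f.write("oops\n")
    repo_after_edit = repo_object.read_text()

    # Replacing the file (write a new one, rename it over) breaks the link instead.
    new_file = Path(tmp, "checkout-file.new")
    new_file.write_text("Bye, World\n")
    os.replace(new_file, checkout)
    repo_after_replace = repo_object.read_text()

print(same_inode, links)         # True 2
print(repr(repo_after_edit))     # 'Hello, World\noops\n'
print(repr(repo_after_replace))  # 'Hello, World\noops\n' (untouched by the replace)
```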
It’s also possible to commit a tree from a tar archive: just pass --tree=tar=something.tar to the ostree commit command. This comes in handy when you integrate OSTree with build systems. Let’s have a quick look at how that’s done.
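As a sketch of the build-system side, the Python below packs a build output directory into a tarball and prepares the matching ostree commit command. The directory layout, file names and branch are hypothetical. The command is only printed as a dry run, not executed.

```python
import shlex
import tarfile
import tempfile
from pathlib import Path
from typing import List


def make_tarball(build_root: Path, tar_path: Path) -> None:
    """Pack a build output directory so its top-level entries sit at the tar root."""
    with tarfile.open(str(tar_path), "w") as tar:
        for child in sorted(build_root.iterdir()):
            tar.add(str(child), arcname=child.name)  # recurses into directories


def ostree_commit_from_tar(repo: str, branch: str, tar_path: Path) -> List[str]:
    return ["ostree", "commit", f"--repo={repo}", f"--branch={branch}",
            f"--tree=tar={tar_path}"]


with tempfile.TemporaryDirectory() as tmp:
    build_root = Path(tmp, "build-output")          # hypothetical build output
    (build_root / "usr" / "bin").mkdir(parents=True)
    (build_root / "usr" / "bin" / "hello").write_text("#!/bin/sh\necho hello\n")
    tar_path = Path(tmp, "build-output.tar")
    make_tarball(build_root, tar_path)
    with tarfile.open(str(tar_path)) as tar:
        members = tar.getnames()
    cmd = ostree_commit_from_tar("/path/to/repo", "playground", tar_path)

print(members)          # ['usr', 'usr/bin', 'usr/bin/hello']
print(shlex.join(cmd))  # dry run: print the command instead of running it
```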
A real thing
OSTree has found its way into many projects. Flatpak (https://flatpak.org) uses it to distribute and manage both applications and runtimes, and touts content deduplication and rollback as driving features. OSTree also serves as the update mechanism in Endless OS (https://endlessos.com) and forms the foundation of Project Atomic (www.projectatomic.io). The de facto standard build and continuous delivery integration for OSTree is perhaps Gnome Continuous (https://build.gnome.org).
That said, Gnome Continuous is experimental, and it seemed to be broken at the time of writing: at least, all the latest builds were marked as failed. We also had some issues with the year-old image that was available for download. However, the integration scripts are still with us, so we can see what they do and how a typical OSTree-managed system is organised.
The build system part is quite straightforward, OSTree-wise. It takes sources from Git, compiles them and then commits the result into the OSTree repo. An important thing to note here is that Git, not OSTree, is deemed to be the ultimate source of truth in this scheme. In OSTree, there’s a common assumption that the files you store in a repo can be regenerated if necessary. There’s also a metadata mechanism you can use to store additional information to aid this regeneration, such as the Git commit hash or tag.
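Recording provenance could look like the sketch below, which builds an ostree commit command carrying the source Git hash via --add-metadata-string. The metadata key git-commit is our own hypothetical naming. In a real build script, you would pass in the value from current_git_rev() rather than the hard-coded example hash. The command is printed, not run.

```python
import shlex
import subprocess
from typing import List


def current_git_rev(src_dir: str) -> str:
    """Return the full hash of the Git commit checked out in `src_dir`."""
    result = subprocess.run(["git", "-C", src_dir, "rev-parse", "HEAD"],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


def commit_with_provenance(repo: str, branch: str, tree_dir: str,
                           git_rev: str) -> List[str]:
    """Build an `ostree commit` command that records where the tree came from."""
    return ["ostree", "commit", f"--repo={repo}", f"--branch={branch}",
            f"--subject=Build from {git_rev[:12]}",
            # "git-commit" is a hypothetical key name; pick your own convention.
            f"--add-metadata-string=git-commit={git_rev}",
            tree_dir]


cmd = commit_with_provenance("/path/to/repo", "playground", "build-output",
                             "5fe1c78faf2b430ab937db1cfaf9f3e16592aca3")
print(shlex.join(cmd))
```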
Typically, you keep the few latest OSTree commits around and prune everything else. Gnome Continuous doesn’t seem to do this, but it’s worth considering for your own deployments. The ostree-prune(1) man page has all the details: with ostree prune, you can delete (garbage-collect) unreachable objects, everything older than a threshold, or just specific commits.
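Pruning is essentially mark-and-sweep garbage collection. The sketch below models only the commit graph and is loosely based on how we read ostree prune’s --refs-only and --depth options: depth counts the parents kept behind each ref’s tip, with -1 meaning unlimited. A real prune also deletes the dirtree, dirmeta and content objects that only the deleted commits referenced. Commit names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set


def commits_to_keep(refs: Dict[str, str], parents: Dict[str, Optional[str]],
                    depth: int) -> Set[str]:
    """Mark phase: each ref's tip plus up to `depth` ancestors (-1 = unlimited)."""
    keep: Set[str] = set()
    for tip in refs.values():
        commit: Optional[str] = tip
        steps = 0
        while commit is not None and (depth < 0 or steps <= depth):
            keep.add(commit)
            commit = parents.get(commit)
            steps += 1
    return keep


def prune(all_commits: Iterable[str], refs: Dict[str, str],
          parents: Dict[str, Optional[str]], depth: int = -1) -> List[str]:
    """Sweep phase: return the commits that would be deleted."""
    keep = commits_to_keep(refs, parents, depth)
    return sorted(c for c in all_commits if c not in keep)


# Hypothetical history: c1 <- c2 <- c3 <- c4 (tip of "playground"), plus an
# orphan commit "old" that no ref reaches any more.
parents: Dict[str, Optional[str]] = {"c4": "c3", "c3": "c2", "c2": "c1",
                                     "c1": None, "old": None}
refs = {"playground": "c4"}
print(prune(parents, refs, parents))           # ['old']
print(prune(parents, refs, parents, depth=1))  # ['c1', 'c2', 'old']
```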
With the required binaries in the OSTree repo, build scripts start constructing a so-called deployment. A deployment is just an OSTree checkout and behaves much like a chroot. A family of commands under ostree admin is used for this. First, ostree admin os-init is called to prepare a named OS, with its own shared /var, for deployments. Then ostree pull-local pumps data over from the build system’s repo. Finally, ostree admin deploy checks out the target ref as the new default deployment, which becomes effective after a reboot.
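The three steps can be written down as a dry-run command list. This Python sketch only builds and prints the commands. The sysroot path, OS name, build repository path and ref are hypothetical. These commands modify a system root, so check option placement against the ostree-admin(1) man pages before running anything for real.

```python
import shlex
from typing import List


def deployment_commands(sysroot: str, osname: str, build_repo: str,
                        ref: str) -> List[List[str]]:
    """Return the os-init / pull-local / deploy sequence as argument lists."""
    system_repo = f"{sysroot}/ostree/repo"
    return [
        # 1. Prepare a named OS: its shared /var and a place for its deployments.
        ["ostree", "admin", "os-init", f"--sysroot={sysroot}", osname],
        # 2. Copy the ref (and the objects it needs) from the build repo.
        ["ostree", "pull-local", f"--repo={system_repo}", build_repo, ref],
        # 3. Check out the ref as the new default deployment for the next boot.
        ["ostree", "admin", "deploy", f"--sysroot={sysroot}", f"--os={osname}", ref],
    ]


for cmd in deployment_commands("/path/to/sysroot", "myos",
                               "/path/to/build-repo", "playground"):
    print(shlex.join(cmd))  # dry run: print instead of executing
```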
Gnome Continuous ships as a virtual machine image (qcow2), and its filesystem layout is also typical for OSTree. /bin, /lib and the like are symlinks to their /usr counterparts, and /usr is mounted read-only. Similarly, /home and /root are symlinks into /var, so all OSTree deployments share them. There’s also a /sysroot directory, which points to the real filesystem root. This is mainly to give you access to the OSTree system repository so you can work with it.
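To explore such a layout, a small Python helper can report which top-level entries are symlinks and where they lead. The demo builds a fake root in a temporary directory. The targets we use (usr/bin, var/home, var/roothome) are assumptions modelled on typical OSTree systems, so the real ones may differ. On an actual OSTree machine, you could call describe_layout(Path("/")).

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def describe_layout(root: Path,
                    names: Iterable[str] = ("bin", "lib", "home", "root", "sysroot")
                    ) -> Dict[str, str]:
    """Report whether top-level entries are symlinks (and where to) or directories."""
    report: Dict[str, str] = {}
    for name in names:
        path = root / name
        if path.is_symlink():
            report[name] = "symlink -> " + os.readlink(path)
        elif path.is_dir():
            report[name] = "directory"
        else:
            report[name] = "missing"
    return report


# Build a tiny fake root with relative symlinks, roughly like an OSTree system.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for d in ("usr/bin", "usr/lib", "var/home", "var/roothome", "sysroot"):
        (root / d).mkdir(parents=True)
    for link, target in (("bin", "usr/bin"), ("lib", "usr/lib"),
                         ("home", "var/home"), ("root", "var/roothome")):
        os.symlink(target, root / link)
    layout = describe_layout(root)

print(layout["bin"], "|", layout["home"])  # symlink -> usr/bin | symlink -> var/home
```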
Hopefully, this gives you an understanding of what OSTree is all about. You don’t often have to interact with it directly, but it may well serve as a silent workhorse in a larger system. In fact, the project is now called libostree, to emphasise that you’d want to integrate it into your own code, in Python or something similar. But if you need a good old CLI, don’t panic: it’s still present and functional.
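As an example of using the library directly, here is a minimal sketch that lists a repository’s refs through libostree’s GObject introspection bindings. It assumes PyGObject and the OSTree typelib (OSTree-1.0) are installed. The imports happen inside the function, so merely loading the module doesn’t require them.

```python
from typing import Dict


def list_ostree_refs(repo_path: str) -> Dict[str, str]:
    """Return {ref name: commit checksum} for the OSTree repository at repo_path."""
    import gi
    gi.require_version("OSTree", "1.0")
    from gi.repository import Gio, OSTree

    repo = OSTree.Repo.new(Gio.File.new_for_path(repo_path))
    repo.open(None)  # None: no GCancellable; raises GLib.Error on failure
    _ok, refs = repo.list_refs(None, None)  # (ref prefix filter, cancellable)
    return dict(refs)
```

With the repository from our earlier examples, list_ostree_refs("/path/to/repo") should return a one-entry dictionary mapping playground to its latest commit checksum.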