Part 1: DC/OS
The ‘Datacentre Operating System’ is a contender for running containers as well as big data applications.
Another issue, another column on orchestrating containers? I can hear the exasperation from the LXF readership already. But wait: before you flip the page and go looking for Hot FOSS (hint: see p48) and whatnot, stay just a little while (i.e. don’t go to p48) and indulge me while I talk about DC/OS, one of the bigger contenders in this crowded market. DC/OS is relatively new as an open source project (having only been released in April 2016), but it has its roots in the Apache Mesos project (which in turn came out of UC Berkeley as an open source take on Google’s Borg system) and is backed by a company called Mesosphere (the original authors, who sell an enterprise version). Anyone who used Twitter a few years ago might remember the company being almost as famous for the regular appearance of the ‘fail whale’ as it was for being the hottest social media company of the day. Mesos was apparently the technology that finally cracked Twitter’s scaling issues. Not that Twitter doesn’t have plenty of other issues left to deal with, of course, but I digress.
DC/OS positions itself as being able to handle really large – data centre-sized, in fact – workloads. It feels a lot more heavyweight than the likes of Rancher, which we’ve covered here in the past, although many of the principles are the same. A big mark of faith in just how heavyweight and robust it is came from Microsoft, which has chosen elements of it to run the Azure Container Service (part of its AWS rival).
Pooling of resources
DC/OS is actually a bundling together of several open source projects into a cohesive whole. As well as Mesos itself, these include Zookeeper for service discovery, Marathon (written by Mesosphere originally) to provide orchestration and Exhibitor (used to install and configure Zookeeper, originally written by Netflix). The project believes that this bundling and associated polish mean that the software is greater than the sum of its parts. However, Mesos is undoubtedly the core on which everything else stands and is worth considering on its own for just a moment.
Classing itself as a distributed kernel, Mesos actually consists of a few elements, the first of which is a master daemon that manages agent daemons. There can be multiple masters, which discover each other via Zookeeper and elect one of their number as the leader. The remaining masters act as standby systems in case the leader becomes unavailable. All pretty standard stuff. Agents try to connect to the leader (via the Zookeeper discovery service) and register with it. The agents report back to the master about their available resources (number of available CPUs, amount of memory and such like). As well as the masters and agents, Mesos has the concept of a framework that runs tasks using the reported resources. This needs a scheduler to examine resource offers and an executor to actually start tasks on nodes. Basically, the scheduler is offered resources by the master, which it declines until an end user triggers a job of some kind on the system. At that point, it compares the available resources to the job’s requirements and asks the master to send the tasks to the agents which, in turn, use the executor to actually start them. As a task completes, its status is reported back up this chain to the initiating client. Don’t worry, it’s a lot simpler than it sounds!
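To make that master/agent split concrete, here’s a rough sketch of how the two daemons might be started by hand on a small cluster. The Zookeeper address, quorum size and work directory are purely illustrative (DC/OS handles all of this for you during installation):

```shell
# Start a master; multiple masters coordinate and elect a leader via Zookeeper
mesos-master --zk=zk://10.0.0.5:2181/mesos --quorum=1 \
    --work_dir=/var/lib/mesos

# Start an agent on another box, pointing it at the masters through
# the same Zookeeper path; it will register and advertise its resources
mesos-agent --master=zk://10.0.0.5:2181/mesos \
    --work_dir=/var/lib/mesos
```

The `--quorum` flag tells the masters how many of them must agree before state changes are accepted, which is why production setups run an odd number of masters.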
Mesos (and by extension, DC/OS) states that this separation of resource management from task scheduling means that multiple workloads can be collocated efficiently, getting the most out of compute resource. Indeed, the Apache project claims that Mesos can scale (and has done) to tens of thousands of nodes. While the average LXF reader won’t have that many machines lying around (although my garage could probably pass for a data centre given the number of old computers in it) most cloud providers do—and it’s here that DC/OS aims to make its mark.
As well as being scalable, mixed workloads – containers, web applications, big batch jobs and analytics-type tasks – can be run on the same hardware. DC/OS uses Marathon to schedule some of these, but other schedulers (e.g. Hadoop) can also run alongside it. This is quite a difference from container-only platforms. That’s not to say DC/OS is a slouch on that front: Marathon handles container orchestration quite happily and comes with all kinds of grouping and placement rules that can be applied (more on this next month).
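For a flavour of what Marathon actually consumes, here’s a minimal app definition. The app id, image and resource numbers are all made up for illustration, and the Marathon hostname in the commented-out submission step is hypothetical:

```shell
# Write a minimal Marathon app definition (all values illustrative)
cat > hello-app.json <<'EOF'
{
  "id": "/hello",
  "cmd": "python3 -m http.server $PORT0",
  "cpus": 0.1,
  "mem": 64,
  "instances": 2,
  "container": {
    "type": "DOCKER",
    "docker": { "image": "python:3-alpine" }
  }
}
EOF

# Sanity-check the JSON locally before handing it over
python3 -m json.tool hello-app.json > /dev/null && echo "valid JSON"

# Submit it to Marathon's REST API (hostname is a placeholder):
# curl -X POST -H 'Content-Type: application/json' \
#      -d @hello-app.json http://marathon.example.com:8080/v2/apps
```

Marathon treats the `cpus` and `mem` figures as the resource request it matches against the offers coming up from Mesos, which is exactly the offer cycle described above.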
What else does DC/OS provide besides running jobs across multiple machines? Well, there are some options on storage with ephemeral and persistent volumes available (generally locally based). There are a bunch of packages that can be installed via the fancily titled ‘Mesosphere Universe Package Repository’. This is a common feature set for this kind of software, though. There are a number of ways to get up and running with DC/OS itself, such as public cloud options, local hardware and/or virtual machine setups and hooks into config management systems.
Scaling can be done in a manner similar to other platforms. Adding extra nodes to DC/OS is a piece of cake, while horizontal and vertical scaling is handled via Marathon (in some cases automatically in response to load). The whole platform has high availability designed into it (so long as multiple masters are used) and the services running on it are monitored by Marathon and restarted should they fail. Upgrades to DC/OS are ‘zero downtime’ with multiple deployment options available (such as rolling, blue/green and canary type scenarios). Finally, in addition to Zookeeper-based service discovery, load balancing elements are built into DC/OS with Layer 4 (transport) and Layer 7 (application) choices available, automatic DNS endpoint generation, a handy CLI and a well-designed web front-end to show off to your boss.
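As a hedged sketch of what that horizontal scaling looks like in practice, here are the two usual routes: the DC/OS CLI and Marathon’s REST API. The app id `/hello` and the Marathon hostname are placeholders, not from a real cluster:

```shell
# Scale an app to four instances via the DC/OS CLI
dcos marathon app update /hello instances=4

# Or talk to Marathon's REST API directly (hostname is a placeholder)
curl -X PUT -H 'Content-Type: application/json' \
     -d '{"instances": 4}' \
     http://marathon.example.com:8080/v2/apps/hello
```

Either way, Marathon works out the difference between the desired and running instance counts and launches (or kills) tasks to close the gap.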
To the clouds
Well, I’m sure you’re convinced by now that DC/OS is the answer to your dreams and can replace whatever hoary old stack you spend your days supporting at the moment. But don’t start destroying things just yet. Let’s get an example stack stood up and ready to play with ahead of next month. For ease of use, I decided to use Azure this time around (other cloud providers are available) because I had some credit with them as a result of a subscription that I needed to use up! The things I do for the readers of LXF…
As it happens it was pretty straightforward to set up DC/OS in this manner. Azure has a marketplace where it can be selected. After that, it was a couple of wizard-type screens, similar to most other cloud providers if I’m honest. The DC/OS install guide suggested choosing five agents in order to be able to use all the services it had available, and I had to supply an SSH public key (needed to access the system afterward). The whole operation took around eight minutes (at the second attempt—I tried to stand it up in the newer UK South location at first only to get an error that a particular machine type was unavailable at that location). But then I was presented with a variety of objects in my dashboard—virtual machines, load balancers, a new public IP address and some network security groups and interfaces.
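If you’d rather script the deployment than click through the portal wizard, the Azure CLI can (at the time of writing) drive the same Azure Container Service setup. The resource group, cluster name and key path below are placeholders, and the exact `az acs` options may differ on your CLI version:

```shell
# Create a resource group, then a DC/OS-flavoured ACS cluster in it
az group create --name dcos-demo --location westeurope

az acs create --resource-group dcos-demo --name my-dcos \
    --orchestrator-type DCOS --agent-count 5 \
    --ssh-key-value @~/.ssh/id_rsa.pub
```

Five agents matches the install guide’s suggestion mentioned above; the SSH public key plays the same role as the one supplied in the portal wizard.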
From there I had to retrieve the value of ‘MASTERFQDN’ from the output of the deployment. This was the name of the system I had to tunnel to in order to finally see the DC/OS dashboard (and was a typically horrible cloud system type name: dcosmasterwqhhnwxtdytst.westeurope.cloudapp.azure.com or some such). Having had enough Microsoft-branded screens for one day, I could now switch to good old fashioned SSH:
$ ssh azureuser@dcosmasterwqhhnwxtdytst.westeurope.cloudapp.azure.com -p 2200 -L 8000:localhost:80
Firing up my browser and connecting to port 8000 on localhost brought up a nice login screen. Since I chose to enable OAuth authentication I was asked for a Google, GitHub or Microsoft account to use at this point. Entering my details (and two-factor details) I was presented with the initial dashboard screen, which can be seen on this page. A brand new shiny DC/OS installation ready for use! I took a quick flick through the screens, which showed me my nodes, details of my newly formed cluster, a handful of packages I could install (such as Jenkins, GitLab, Cassandra and Spark) and the individual components available for me to try out.
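Those Universe packages can also be driven from the terminal once the DC/OS CLI is installed and pointed at the cluster. The package names here are ones listed in my dashboard; default installation options are assumed:

```shell
dcos package search cassandra    # browse the Universe repository
dcos package install cassandra   # install with default options
dcos package install jenkins
dcos marathon app list           # see what ended up running
```

Behind the scenes each package install becomes one or more Marathon apps, which is why the last command is a handy way to check on progress.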
Next month I’ll be doing just that—putting DC/OS through its paces to see how it compares to some of the other systems we’ve looked at over the last year.