Part 1: Kubernetes
Jolyon Brown shows you how to take control of your containers before they control you with the open source version of Google’s management software.
This issue I’m going to start looking at Kubernetes, beginning with some of the concepts around it, and next month I’ll look at building some infrastructure with it. Since being open-sourced by Google, Kubernetes has become a very popular project. It automates the operations around containers, such as deployment and scaling, and it runs across clusters of hosts. With the current trend for containers, agile development and microservices architectures, Kubernetes is definitely worth knowing about and taking a look at.
From where I’m sitting at least, the whole world has gone container mad in the last twelve to eighteen months. I’ve only covered them on a couple of occasions here in the hallowed pages of Linux Format (I worried that ‘container fatigue’ might become a problem). Out in the wider world though, almost everyone I’m working with has either a fairly advanced container evaluation project on the go or has plans for one. In most of these cases this means working with Docker, and there’s a healthy ecosystem around that project as well as in the wider container space. This is driven to a certain extent by the ever-evolving idea of ‘The Cloud’ and the expected role that containers will have to play in it. “There’s money in them thar clouds”, one might say. As many readers of this column will undoubtedly be working with containers by now, I thought it was time to start looking at them on a more regular basis.
With any infrastructure technology, one of the main worries for administrators is how they are going to manage it. No-one wants to be firefighting their systems day in and day out and having to manually fix issues. From physical systems to virtual machines to containers – the number of discrete elements to control (and maintain) is going up. Naturally, solutions for taking control of this situation have begun to spring up. Plus containers are cool! Everyone wants to impress their boss/customer/significant other by being able to provide an autoscaled, resilient and redundant service using them.
Enter Google. In infrastructure terms, the huge global ‘warehouse scale’ computing facilities built by the internet giant have achieved somewhat mythical status. This was in large part due to the secretive nature of the company when it came to discussing its internal architecture (with academic papers often being the only glimpse outsiders got). As its systems evolved, Google began to hit issues with running and managing virtual machines at massive scale. The company’s solution was to reconsider the problem and ensure that failures of individual components wouldn’t cause failures in other areas, that workloads could be distributed across different compute resources and that failures could be handled automatically. Google’s internal orchestration tool was called Borg (and there is another called Omega). Kubernetes is a descendant of Borg, which has been packaged up and open sourced for external consumption.
Pods and nodes
Kubernetes currently supports Docker and Rocket (from CoreOS) containers, with more promised in the near future. It runs these containers in collections known as ‘pods’, which are the basic building block for Kubernetes. A pod can consist of one or more containers. When multiple containers exist inside a pod, they will all be located on the same physical host. Kubernetes can deploy multiple copies of pods (based on the same configuration) and handle bringing up replacements for pods that go offline. The pod is the level at which Kubernetes performs its scheduling and orchestration tasks. This is different to, say, a vanilla Docker installation, where the container itself is the common building block. The pod is assigned an IP address, rather than each container getting an individual one, with ports being used by applications in the pod for communication. It’s typical for a pod to consist of a single complete instance of a microservice. Kubernetes provides service discovery so that IP addresses and DNS can be used to point at a collection of such services. A pod can also contain a volume which is visible to all containers in the pod, allowing the sharing of temporary (ephemeral) data. For more persistent data, NFS mounts can be used.
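To give a flavour of what this looks like in practice, here is a minimal sketch of a pod definition file with two containers sharing an ephemeral volume. The names and the second (sidecar) container are purely illustrative:

```yaml
# pod.yaml -- a hypothetical two-container pod sharing an ephemeral volume
apiVersion: v1
kind: Pod
metadata:
  name: web-pod              # illustrative name
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
  - name: content-sync       # hypothetical sidecar in the same pod
    image: busybox
    volumeMounts:
    - name: shared-data
      mountPath: /data
  volumes:
  - name: shared-data
    emptyDir: {}             # ephemeral volume, visible to both containers
```

Both containers land on the same host and can exchange data through the shared volume, which lives and dies with the pod.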
A node is a physical machine on which one or more pods reside. Certain nodes act as masters, running control software. This includes etcd – which might be familiar from the CoreOS project covered in an earlier Administeria – a key-value store for shared configuration and service discovery. An API server handles the calls which all the components of Kubernetes make and receive during cluster operations. A scheduler and controller deal with the pods on regular nodes: they make sure the correct number of pods are running, that new pods are brought up on nodes with the capacity for them and so on. Regular nodes run something called the kubelet (which maintains the pods themselves) and kube-proxy, which is (surprise!) a simple network proxy and load balancer that passes traffic to services running in the pods. This all hangs together as shown in the diagram (see below).
Controlling Kubernetes: kubectl
The easiest way to interact with Kubernetes is via the kubectl command-line interface. With it, simple standalone examples of pods can be created with a single-line command (this example is from the Kubernetes documentation, which can be found at http://kubernetes.io):
$ kubectl run my-nginx --image=nginx --replicas=2 --port=80
CONTROLLER   CONTAINER(S)   IMAGE(S)   SELECTOR       REPLICAS
my-nginx     my-nginx       nginx      run=my-nginx   2
For anything other than trivial setups though, it’s more usual to create YAML or JSON format files which kubectl will read and act upon. These, of course, have the added advantage of being able to be versioned and stored in a code repository (which is always a good thing). These definition files can be quite simple, but can also include definitions for health checks of a service (which help Kubernetes decide whether a container is in a working state or not) and user-defined key/value fields known as labels and annotations. Labels are handy for assigning identifying data to containers for the cluster administrator. These would typically be single-word notes, eg environment: prod or environment: dev . Annotations are intended for longer notes – perhaps Git commit-related information, the name of the creator of a particular service or URLs for relevant documentation. Labels in particular will be used for sorting and searching of services, and the advice is to get off on the right foot by using them right from the beginning.
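By way of illustration, labels and annotations slot into the metadata section of a definition file like this (all the values here are made up):

```yaml
# Fragment of a definition file -- illustrative values only
metadata:
  name: my-nginx
  labels:
    environment: prod           # single-word identifying data, easy to search on
    tier: frontend
  annotations:
    git-commit: "abc1234, hypothetical commit reference"
    docs-url: "http://wiki.example.com/my-nginx"
```

A file like this would be handed to the cluster with kubectl create -f my-nginx.yaml , after which the labels can be used to select and filter resources.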
Replication controllers are created in Kubernetes by the same method as regular pods, with the addition of extra keywords, such as replicas: 2 . With that setting, Kubernetes would spin up two instances of the requested pod and maintain that number should any of them crash, be killed and so on.
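A sketch of such a replication controller definition, reusing the nginx image from the earlier kubectl example, might look like this:

```yaml
# rc.yaml -- replication controller keeping two nginx pods running
apiVersion: v1
kind: ReplicationController
metadata:
  name: my-nginx
spec:
  replicas: 2                 # Kubernetes maintains two copies of the pod
  selector:
    run: my-nginx             # manage pods carrying this label
  template:                   # the pod definition to replicate
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        ports:
        - containerPort: 80
```

Kill one of the resulting pods and the controller notices the count has dropped below two and schedules a replacement.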
In contrast to a regular Docker setup, Kubernetes handles networking a bit differently. An admin doesn’t have to worry about allocating ports to work around host-private networking, which limits communication between containers to those on the same node. All pods can speak to one another, even across nodes. However, as pods are restarted, IP address allocations change. Services should be defined (again, using config files) which handle this situation automatically for the administrator. We could easily set up a service which targets pods labelled as Apache webservers and forwards traffic to them on port 80 or 443. As the pods go through their lifecycles, the service endpoint will be updated with the new IP addresses. To have external (eg internet-based) clients access services on a Kubernetes cluster, those services must have public IP addresses and be connected to what Kubernetes calls NodePorts or LoadBalancers. These two similar methods manage external-facing IP addresses, which might be provided by the underlying cloud provider, for instance. Another popular method is to run HAProxy (which remains one of my favourite pieces of open source software).
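As a sketch, a service definition targeting pods labelled as Apache webservers could look something like the following; the name and the app: apache label are illustrative and assume the pods were created carrying that label:

```yaml
# service.yaml -- hypothetical service in front of Apache pods
apiVersion: v1
kind: Service
metadata:
  name: apache-web
spec:
  type: NodePort              # expose the service on a port on every node
  selector:
    app: apache               # traffic goes to pods carrying this label
  ports:
  - port: 80                  # port the service listens on
    targetPort: 80            # port the containers listen on
```

As Apache pods come and go, the service keeps track of their current IP addresses, so clients only ever need to know about the service itself.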
Phew! Hopefully this wasn’t too much information to take in all at once. There’s a lot to Kubernetes, which is a really powerful system with a lot of production, real-world experience baked into it. Next issue, we’ll get Kubernetes installed and get down to some working examples, including some monitoring and examples of replication.