An open source system for automating deployment, scaling, and operations of applications.

Wednesday, September 28, 2016

How we made Kubernetes insanely easy to install

Editor's note: Today’s post is by Luke Marsden, Head of Developer Experience at Weaveworks, showing the Special Interest Group Cluster-Lifecycle’s recent work on kubeadm, a tool to make installing Kubernetes much simpler.

Over at SIG-cluster-lifecycle, we've been hard at work for the last few months on kubeadm, a tool that makes Kubernetes dramatically easier to install. We've heard from users that installing Kubernetes is harder than it should be, and we want folks to be focused on writing great distributed apps, not wrangling with infrastructure!

There are three stages in setting up a Kubernetes cluster, and we decided to focus on the last two (to begin with):
  1. Provisioning: getting some machines
  2. Bootstrapping: installing Kubernetes on them and configuring certificates
  3. Add-ons: installing necessary cluster add-ons like DNS and monitoring services, a pod network, etc.
We realized early on that there's enormous variety in the way that users want to provision their machines.

They use lots of different cloud providers, private clouds, bare metal, or even Raspberry Pis, and almost always have their own preferred tools for automating machine provisioning: Terraform or CloudFormation, Chef, Puppet or Ansible, or even PXE booting bare metal. So we made an important decision: kubeadm would not provision machines. Instead, the only assumption it makes is that the user has some computers running Linux.

Another important constraint was that we didn't want to just build another tool that "configures Kubernetes from the outside, by poking all the bits into place". There are many external projects out there for doing this, but we wanted to aim higher. We chose to actually improve the Kubernetes core itself to make it easier to install. Luckily, a lot of the groundwork for making this happen had already been started.

We realized that if we made Kubernetes insanely easy to install manually, it should be obvious to users how to automate that process using any tooling.

So, enter kubeadm. It has no infrastructure dependencies, and satisfies the requirements above. It's easy to use and should be easy to automate. It's still in alpha, but it works like this:
  • You install Docker and the official Kubernetes packages for your distribution.
  • Select a master host and run kubeadm init.
  • This sets up the control plane and outputs a kubeadm join [...] command which includes a secure token.
  • On each host selected to be a worker node, run the kubeadm join [...] command from above.
  • Install a pod network. Weave Net is a great place to start here. Install it using just kubectl apply -f
Presto! You have a working Kubernetes cluster! Try kubeadm today
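To illustrate the claim that a manual kubeadm install should be easy to automate with any tooling, here is a minimal Python sketch of the steps above. It assumes Docker and the Kubernetes packages are already installed on every host, and it takes a hypothetical `run(host, command)` callable (for example, a wrapper around ssh) that returns the command's output. The join command is read back from the `kubeadm init` output rather than built by hand, and the pod network manifest location is left to the caller.

```python
from typing import Callable, List

# Hypothetical executor: run(host, command) returns the command's stdout.
# In practice this might wrap ssh, Ansible, or another remote-execution tool.
Runner = Callable[[str, str], str]


def extract_join_command(init_output: str) -> str:
    """Return the first line of `kubeadm init` output that starts with 'kubeadm join'."""
    for line in init_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("kubeadm join"):
            return stripped
    raise ValueError("no 'kubeadm join' command found in kubeadm init output")


def bootstrap(master: str, workers: List[str], network_manifest: str, run: Runner) -> None:
    # 1. Set up the control plane and capture the join command (with its secure token).
    join_command = extract_join_command(run(master, "kubeadm init"))
    # 2. Run that same join command on every worker.
    for host in workers:
        run(host, join_command)
    # 3. Install a pod network add-on; the manifest location is supplied by the caller.
    run(master, f"kubectl apply -f {network_manifest}")


if __name__ == "__main__":
    def dry_run(host: str, command: str) -> str:
        print(f"[{host}] {command}")
        # Pretend output; the real token and arguments come from kubeadm itself.
        return "kubeadm join [...]" if command == "kubeadm init" else ""

    bootstrap("master-1", ["node-1", "node-2"], "<pod-network-manifest>", dry_run)
```

Injecting `run` keeps the sequence independent of any particular tool, so the same three steps could be driven from Chef, Puppet, Ansible or a shell loop.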

For a video walkthrough, check this out:

Follow the kubeadm getting started guide to try it yourself, and please give us feedback on GitHub, mentioning @kubernetes/sig-cluster-lifecycle!

Finally, I want to give a huge shout-out to the many people in SIG-cluster-lifecycle without whom this wouldn't have been possible. I'll mention just a few here:

  • Joe Beda kept us focused on keeping things simple for the user.
  • Mike Danese at Google has been an incredible technical lead and always knows what's happening. Mike also tirelessly kept up with the many necessary code reviews.
  • Ilya Dmitrichenko, my colleague at Weaveworks, wrote most of the kubeadm code and also kindly helped other folks contribute.
  • Lucas Käldström from Finland has got to be the youngest contributor in the group and was merging last-minute pull requests on the Sunday night before his school math exam.
  • Brandon Philips and his team at CoreOS led the development of TLS bootstrapping, an essential component which we couldn't have done without.
  • Devan Goodwin from Red Hat built the JWS discovery service that Joe imagined and sorted out our RPMs.
  • Paulo Pires from Portugal jumped in to help out with external etcd support and picked up lots of other bits of work.
  • And many other contributors!

This truly has been an excellent cross-company and cross-timezone achievement, with a lovely bunch of people. There's lots more work to do in SIG-cluster-lifecycle, so if you’re interested in these challenges, join our SIG. Looking forward to collaborating with you all!

--Luke Marsden, Head of Developer Experience at Weaveworks

  • Try kubeadm to install Kubernetes today
  • Get involved with the Kubernetes project on GitHub 
  • Post questions (or answer questions) on Stack Overflow 
  • Connect with the community on Slack
  • Follow us on Twitter @Kubernetesio for latest updates

Tuesday, September 27, 2016

How Qbox Saved 50% per Month on AWS Bills Using Kubernetes and Supergiant

Editor’s Note: Today’s post is by the team at Qbox, a hosted Elasticsearch provider, sharing their experience with Kubernetes and how it helped them cut their cloud bill by fifty percent.

A little over a year ago, we at Qbox faced an existential problem. Just about all of the major IaaS providers either launched or acquired services that competed directly with our Hosted Elasticsearch service, and many of them started offering it for free. The race to zero was afoot unless we could re-engineer our infrastructure to be more performant, more stable, and less expensive than the VM approach we had before, which is also the approach our IaaS brethren use. With the help of Kubernetes, Docker, and Supergiant (our own hand-rolled layer for managing distributed and stateful data), we were able to deliver 50% savings, a mid-five-figure sum each month. At the same time, support tickets plummeted. We were so pleased with the results that we decided to open source Supergiant as its own standalone product. This post will demonstrate how we accomplished it.

Back in 2013, when not many were even familiar with Elasticsearch, we launched our as-a-service offering with a dedicated, direct VM model. We hand-selected certain instance types optimized for Elasticsearch, and users configured single-tenant, multi-node clusters running on isolated virtual machines in any region. We added a markup on the per-compute-hour price for the DevOps support and monitoring, and all was right with the world for a while as Elasticsearch became the global phenomenon that it is today.

As we grew to thousands of clusters, and many more thousands of nodes, it wasn’t just our AWS bill getting out of hand. We had 4 engineers replacing dead nodes and answering support tickets all hours of the day, every day. What made matters worse was the volume of resources allocated compared to the usage. We had thousands of servers with a collective CPU utilization under 5%. We were spending too much on processors that were doing absolutely nothing. 

How we got there was no great mystery. VMs are a finite resource, and with a very compute-intensive, burstable application like Elasticsearch, we were constantly juggling users who either undersized their clusters to save money or over-provisioned and overspent. When the aforementioned competitive pressures forced our hand, we had to re-evaluate everything.

Adopting Docker and Kubernetes
Our team avoided Docker for a while, probably on the vague assumption that the network and disk performance we had with VMs wouldn't be possible with containers. That assumption turned out to be entirely wrong.

To run performance tests, we had to find a system that could manage networked containers and volumes. That's when we discovered Kubernetes. It was alien to us at first, but by the time we had familiarized ourselves and built a performance testing tool, we were sold. It was not just as good as before, it was better.

The performance improvement we observed was due to the number of containers we could “pack” on a single machine. Ironically, we began the Docker experiment wanting to avoid “noisy neighbor,” which we assumed was inevitable when several containers shared the same VM. However, the isolation of dedicated VMs also acted as a bottleneck, both in performance and cost. To use a real-world example, if a machine has 2 cores and you need 3 cores, you have a problem. It’s rare to come across a public-cloud VM with 3 cores, so the typical solution is to buy 4 cores and not utilize them fully.
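The 2-versus-3-core problem can be made concrete with a little arithmetic. The sketch below assumes a hypothetical menu of VM sizes (1, 2, 4 and 8 cores), gives each tenant its own smallest VM that fits, and totals the cores paid for against the cores actually needed.

```python
from typing import List, Tuple


def smallest_fitting_vm(needed_cores: float, vm_sizes: List[int]) -> int:
    """Return the smallest VM size (in cores) that can hold `needed_cores`."""
    for size in sorted(vm_sizes):
        if size >= needed_cores:
            return size
    raise ValueError(f"no VM size can hold {needed_cores} cores")


def dedicated_vm_cost(needs: List[float], vm_sizes: List[int]) -> Tuple[int, float]:
    """Cores paid for vs. cores needed when every tenant gets a dedicated VM."""
    paid = sum(smallest_fitting_vm(n, vm_sizes) for n in needs)
    return paid, sum(needs)


# Hypothetical VM menu; real providers offer different shapes.
SIZES = [1, 2, 4, 8]
print(smallest_fitting_vm(3, SIZES))                # 4: one core sits idle
print(dedicated_vm_cost([3, 3, 1.5, 0.5], SIZES))   # (11, 8.0)
```

In this made-up mix, three of the eleven paid-for cores do nothing, purely because tenants have to be rounded up to a VM shape.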

This is where Kubernetes really starts to shine. It has the concept of requests and limits, which provides granular control over resource sharing. Multiple containers can share an underlying host VM without the fear of “noisy neighbors”. They can request exclusive control over an amount of RAM, for example, and they can define a limit in anticipation of overflow. It’s practical, performant, and cost-effective multi-tenancy. We were able to deliver the best of both the single-tenant and multi-tenant worlds.
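A simplified model of how requests and limits interact: the Kubernetes scheduler places a pod on a node only if the requests already placed there plus the new pod's request fit within the node's capacity, while the CPU limit is a runtime cap (usage above it is throttled) and is not checked at scheduling time. That is what allows the sum of limits on a host to exceed its capacity, so tenants can burst into each other's idle headroom. The pod names and numbers are hypothetical, and the model ignores memory, system reservations and other scheduler checks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    name: str
    cpu_request: float  # cores reserved for scheduling decisions
    cpu_limit: float    # runtime cap: usage above this is throttled


def fits(node_cores: float, placed: List[Pod], candidate: Pod) -> bool:
    """Scheduling check: only requests count against node capacity."""
    reserved = sum(p.cpu_request for p in placed)
    return reserved + candidate.cpu_request <= node_cores


def limit_overcommit(node_cores: float, placed: List[Pod]) -> float:
    """Sum of limits divided by capacity; above 1.0, tenants share burst headroom."""
    return sum(p.cpu_limit for p in placed) / node_cores


node_cores = 8.0
placed: List[Pod] = []
for i in range(5):
    pod = Pod(f"es-{i}", cpu_request=2.0, cpu_limit=4.0)
    if fits(node_cores, placed, pod):
        placed.append(pod)

print([p.name for p in placed])              # ['es-0', 'es-1', 'es-2', 'es-3']
print(limit_overcommit(node_cores, placed))  # 2.0
```

Four pods are admitted on their 2-core requests, yet each may burst to 4 cores when its neighbors are idle.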

Kubernetes + Supergiant
We built Supergiant originally for our own Elasticsearch customers. Supergiant solves Kubernetes complications by allowing pre-packaged and re-deployable application topologies. In more specific terms, Supergiant lets you use Components, which are somewhat similar to a microservice. Components represent an almost-uniform set of Instances of software (e.g., Elasticsearch, MongoDB, your web application, etc.). They roll up all the various Kubernetes and cloud operations needed to deploy a complex topology into a compact entity that is easy to manage.

For Qbox, we went from needing one node per instance (1:1) to approximately one node per 11 instances (1:11). Sure, the nodes were larger, but the utilization made a substantial difference. As in the picture below, we could cram a whole bunch of little instances onto one big instance and not lose any performance. Smaller users would get the added benefit of higher network throughput by virtue of being on bigger resources, and they would also get greater CPU and RAM bursting.
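The post does not describe Supergiant's packing algorithm, so as a stand-in here is first-fit decreasing, a common bin-packing heuristic, applied to hypothetical per-instance core requests. It shows how few large hosts a set of small instances needs once they can share machines, compared with one machine per instance.

```python
from typing import List


def first_fit_decreasing(requests: List[int], host_capacity: int) -> List[List[int]]:
    """Pack requests (e.g. cores) onto hosts; returns the requests placed on each host."""
    hosts: List[List[int]] = []
    free: List[int] = []  # remaining capacity of each host
    for r in sorted(requests, reverse=True):  # place the biggest items first
        if r > host_capacity:
            raise ValueError(f"request {r} exceeds host capacity {host_capacity}")
        for i, room in enumerate(free):
            if r <= room:  # the first host with enough room wins
                hosts[i].append(r)
                free[i] -= r
                break
        else:  # no existing host fits: open a new one
            hosts.append([r])
            free.append(host_capacity - r)
    return hosts


# 22 one-core instances on 11-core hosts: 2 hosts instead of 22 VMs.
print(len(first_fit_decreasing([1] * 22, 11)))      # 2
print(first_fit_decreasing([6, 5, 4, 3, 2, 1], 7))  # [[6, 1], [5, 2], [4, 3]]
```

Packing by request rather than by VM shape is what makes a high instances-per-node ratio possible; a production placer would also weigh memory, spreading and failure domains.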


Adding Up the Cost Savings
The packing algorithm in Supergiant, with its increased utilization, resulted in an immediate 25% drop in our infrastructure footprint. Remember, this came with better performance and fewer support tickets. We could dial up the packing algorithm and probably save even more money. Meanwhile, because our nodes were larger and far more predictable, we could much more fully leverage the economic goodness that is AWS Reserved Instances. We went with 1-year partial RI’s, which cut the remaining costs by 40%, give or take. Our customers still had the flexibility to spin up, down, and out their Elasticsearch nodes, without forcing us to constantly juggle, combine, split, and recombine our reservations. At the end of the day, we saved 50%. That is $600k per year that can go towards engineering salaries instead of enriching our IaaS provider. 
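The two reductions compound rather than add: the reserved-instance discount applies only to the footprint left after packing. A quick check using the post's round numbers, treating the 25% and the approximate 40% as exact:

```python
def combined_savings(footprint_cut: float, ri_discount: float) -> float:
    """Fraction saved when a discount applies to the footprint remaining after a cut."""
    return 1 - (1 - footprint_cut) * (1 - ri_discount)


print(round(combined_savings(0.25, 0.40), 2))  # 0.55

# The post reports $600k/year as 50% of the bill.
annual_savings = 600_000
print(annual_savings / 12)   # 50000.0 saved per month
print(annual_savings / 0.5)  # 1200000.0: implied original annual bill
```

Roughly 55% from these inputs is in the same range as the reported 50%; the post's figures are explicitly approximate ("give or take"). The monthly figure of about $50k matches the "mid-five-figure" savings mentioned at the start of the post.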

Monday, September 26, 2016

Kubernetes 1.4: Making it easy to run on Kubernetes anywhere

Today we’re happy to announce the release of Kubernetes 1.4.

Since the release to general availability just over 15 months ago, Kubernetes has continued to grow and achieve broad adoption across the industry. From brand new startups to large-scale businesses, users have described how big a difference Kubernetes has made in building, deploying and managing distributed applications. However, one of our top user requests has been making Kubernetes itself easier to install and use. We’ve taken that feedback to heart, and 1.4 has several major improvements.

These setup and usability enhancements are the result of concerted, coordinated work across the community - more than 20 contributors from SIG-Cluster-Lifecycle came together to greatly simplify the Kubernetes user experience, covering improvements to installation, startup, certificate generation, discovery, networking, and application deployment.

Additional product highlights in this release include simplified cluster deployment on any cloud, easy installation of stateful apps, and greatly expanded Cluster Federation capabilities, enabling straightforward deployment across multiple clusters and multiple clouds.

What’s new:

Cluster creation with two commands - To get started with Kubernetes a user must provision nodes, install Kubernetes and bootstrap the cluster. A common request from users is to have an easy, portable way to do this on any cloud (public, private, or bare metal).

  • Kubernetes 1.4 introduces ‘kubeadm’, which reduces bootstrapping to two commands, with no complex scripts involved. Once Kubernetes is installed, kubeadm init starts the master while kubeadm join joins the nodes to the cluster.
  • Installation is also streamlined by packaging Kubernetes with its dependencies, for most major Linux distributions including Red Hat and Ubuntu Xenial. This means users can now install Kubernetes using familiar tools such as apt-get and yum.
  • Add-on deployments, such as for an overlay network, can be reduced to one command by using a DaemonSet.
  • Enabling this simplicity is a new certificates API and its use for kubelet TLS bootstrap, as well as a new discovery API.

Expanded stateful application support - While cloud-native applications are built to run in containers, many existing applications need additional features to make it easy to adopt containers. Most commonly, these include stateful applications such as batch processing, databases and key-value stores. In Kubernetes 1.4, we have introduced a number of features simplifying the deployment of such applications, including: 

  • ScheduledJob is introduced as Alpha so users can run batch jobs at regular intervals.
  • Init-containers are Beta, addressing the need to run one or more containers before starting the main application, for example to sequence dependencies when starting a database or multi-tier app.
  • Dynamic PVC Provisioning moved to Beta. This feature now enables cluster administrators to expose multiple storage provisioners and allows users to select them using a new Storage Class API object.  
  • Curated and pre-tested Helm charts for common stateful applications such as MariaDB, MySQL and Jenkins will be available for one-command launches using version 2 of the Helm Package Manager.

Cluster federation API additions - One of the most requested capabilities from our global customers has been the ability to build applications with clusters that span regions and clouds. 

  • Federated Replica Sets Beta - replicas can now span some or all clusters enabling cross region or cross cloud replication. The total federated replica count and relative cluster weights / replica counts are continually reconciled by a federated replica-set controller to ensure you have the pods you need in each region / cloud.
  • Federated Services are now Beta, and secrets, events and namespaces have also been added to the federation API.
  • Federated Ingress Alpha - starting with Google Cloud Platform (GCP), users can create a single L7 globally load balanced VIP that spans services deployed across a federation of clusters within GCP. With Federated Ingress in GCP, external clients point to a single IP address and are sent to the closest cluster with usable capacity in any region or zone of the federation in GCP.
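As a rough illustration of weight-based spreading, the sketch below divides a total replica count across clusters in proportion to hypothetical weights, handing leftover replicas to the clusters with the largest remainders. This is a simplified stand-in, not the federated replica-set controller's actual algorithm, which handles additional constraints and keeps reconciling as cluster state changes.

```python
from typing import Dict


def distribute(total: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split `total` replicas across clusters proportionally to integer weights.

    Uses the largest-remainder method with exact integer arithmetic; ties are
    broken by cluster name so the result is deterministic.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    # Integer share and remainder of total * w / weight_sum for each cluster.
    shares = {c: divmod(total * w, weight_sum) for c, w in weights.items()}
    result = {c: q for c, (q, _) in shares.items()}
    leftover = total - sum(result.values())
    # Give one extra replica to each of the clusters with the largest remainders.
    by_remainder = sorted(shares, key=lambda c: (-shares[c][1], c))
    for cluster in by_remainder[:leftover]:
        result[cluster] += 1
    return result


print(distribute(10, {"us-east": 2, "europe-west": 1}))  # {'us-east': 7, 'europe-west': 3}
```

The cluster names and weights are made up; the point is that the per-cluster counts always add back up to the federated total.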

Container security support - Administrators of multi-tenant clusters require the ability to provide varying sets of permissions among tenants, infrastructure components, and end users of the system.

  • Pod Security Policy is a new object that enables cluster administrators to control the creation and validation of security contexts for pods/containers. Admins can associate service accounts, groups, and users with a set of constraints to define a security context.
  • AppArmor support is added, enabling admins to run a more secure deployment, and provide better auditing and monitoring of their systems. Users can configure a container to run in an AppArmor profile by setting a single field.

Infrastructure enhancements - We continue adding to the scheduler, storage and client capabilities in Kubernetes based on user and ecosystem needs.

  • Scheduler - introducing inter-pod affinity and anti-affinity Alpha for users who want to customize how Kubernetes co-locates or spreads their pods. This release also adds priority scheduling for cluster add-ons such as DNS, Heapster, and the Kube Dashboard.
  • Disruption SLOs - Pod Disruption Budget is introduced to limit how many pods cluster management operations (such as node upgrades) can delete at any one time.
  • Storage - New volume plugins for Quobyte and Azure Data Disk have been added.
  • Clients - Swagger 2.0 support is added, enabling non-Go clients.
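Conceptually, a Pod Disruption Budget with a minimum-available count lets a voluntary eviction proceed only while the number of healthy pods would stay at or above that minimum. The sketch below models draining a node under that rule, assuming an integer `min_available` (percentages are not modeled) and that evicted pods are not replaced before the drain finishes; the functions and numbers are hypothetical.

```python
from typing import Tuple


def disruptions_allowed(healthy: int, min_available: int) -> int:
    """How many more pods may be evicted right now without breaking the budget."""
    return max(0, healthy - min_available)


def drain(pods_on_node: int, healthy: int, min_available: int) -> Tuple[int, int]:
    """Evict pods one at a time; return (evicted, blocked) for this pass."""
    evicted = 0
    for _ in range(pods_on_node):
        if disruptions_allowed(healthy, min_available) == 0:
            break  # eviction refused; a real drain would retry later
        healthy -= 1
        evicted += 1
    return evicted, pods_on_node - evicted


# 5 healthy replicas, budget requires 3, and 4 of them sit on the node being upgraded.
print(drain(pods_on_node=4, healthy=5, min_available=3))  # (2, 2)
```

The blocked pods stay put until replacements elsewhere become healthy, which is how the budget limits impact at any one time.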

Kubernetes Dashboard UI - Lastly, a great-looking Kubernetes Dashboard UI with 90% CLI parity for at-a-glance management.

For a complete list of updates, see the release notes on GitHub. Apart from features, the most impressive aspect of Kubernetes development is the community of contributors. This is particularly true of the 1.4 release, the full breadth of which will unfold in upcoming weeks.

Kubernetes 1.4 is available for download at and via the open source repository hosted on GitHub. To get started with Kubernetes try the Hello World app.

To get involved with the project, join the weekly community meeting or start contributing to the project here (marked help). 

Users and Case Studies
Over the fifteen months since the Kubernetes 1.0 GA release, adoption of and enthusiasm for this project have surpassed everyone's imagination. Kubernetes runs in production at hundreds of organizations, and thousands more are in development. Here are a few unique highlights of companies running Kubernetes:

  • Box -- reduced the time to launch a service from six months to less than a week. Read more on how Box runs mission critical production services on Kubernetes.
  • Pearson -- minimized complexity and increased their engineer productivity. Read how Pearson is using Kubernetes to reinvent the world’s largest educational company. 
  • OpenAI -- a non-profit artificial intelligence research company, built infrastructure for deep learning with Kubernetes to maximize productivity for researchers allowing them to focus on the science.

We’re very grateful to our community of over 900 contributors who contributed more than 5,000 commits to make this release possible. To get a closer look at how the community is using Kubernetes, join us at the user conference KubeCon to hear directly from users and contributors.


Thank you for your support! 

-- Aparna Sinha, Product Manager, Google

Wednesday, September 21, 2016

High performance network policies in Kubernetes clusters

Editor's note: today’s post is by Juergen Brendel, Pritesh Kothari and Chris Marino, co-founders of Pani Networks, the sponsor of the Romana project, the network policy software used for these benchmark tests.

Network Policies

Since the release of Kubernetes 1.3 back in July, users have been able to define and enforce network policies in their clusters. These policies are firewall rules that specify permissible types of traffic to, from and between pods. If requested, Kubernetes blocks all traffic that is not explicitly allowed. Policies are applied to groups of pods identified by common labels. Labels can then be used to mimic traditional segmented networks often used to isolate layers in a multi-tier application: You might identify your front-end and back-end pods by a specific “segment” label, for example. Policies control traffic between those segments and even traffic to or from external sources.

Segmenting traffic

What does this mean for the application developer? At last, Kubernetes has gained the necessary capabilities to provide "defense in depth". Traffic can be segmented and different parts of your application can be secured independently. For example, you can easily protect each of your services with its own network policy: the pods behind a service (for example, those managed by a Replication Controller) already share a specific label, so you can use that same label to apply a policy to them.

Defense in depth has long been recommended as best practice. This kind of isolation between different parts or layers of an application is easily achieved on AWS and OpenStack by applying security groups to VMs. 

However, prior to network policies, this kind of isolation for containers was not possible. VXLAN overlays can provide simple network isolation, but application developers need more fine-grained control over the traffic accessing pods. As you can see in this simple example, Kubernetes network policies can manage traffic based on source, protocol and port.

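apiVersion: extensions/v1beta1
kind: NetworkPolicy
metadata:
  name: pol1
spec:
  podSelector:          # the pods this policy protects
    matchLabels:
      role: backend
  ingress:
  - from:               # allowed sources
    - podSelector:
        matchLabels:
          role: frontend
    ports:              # allowed destination protocol/port
    - protocol: TCP
      port: 80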
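To make the semantics of the policy above concrete, here is a toy evaluator in Python. It assumes the namespace has default-deny isolation turned on (the "if requested" case described earlier), treats selectors as exact `matchLabels` matches, and models only the fields used in `pol1`. Real enforcement happens in the network backend, not in code like this.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Labels = Dict[str, str]


@dataclass
class Policy:
    pod_selector: Labels          # which pods the policy protects
    from_selectors: List[Labels]  # which source pods may connect
    ports: List[Tuple[str, int]]  # allowed (protocol, port) pairs


def matches(selector: Labels, labels: Labels) -> bool:
    """matchLabels semantics: every key/value in the selector must be present."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(src: Labels, dst: Labels, protocol: str, port: int,
                    policies: List[Policy], isolated: bool = True) -> bool:
    if not isolated:
        return True  # without default-deny isolation, all traffic is allowed
    for p in policies:
        if (matches(p.pod_selector, dst)
                and any(matches(s, src) for s in p.from_selectors)
                and (protocol, port) in p.ports):
            return True
    return False  # isolated and not explicitly allowed: drop


pol1 = Policy({"role": "backend"}, [{"role": "frontend"}], [("TCP", 80)])
frontend, backend = {"role": "frontend"}, {"role": "backend"}
print(ingress_allowed(frontend, backend, "TCP", 80, [pol1]))           # True
print(ingress_allowed(frontend, backend, "TCP", 22, [pol1]))           # False
print(ingress_allowed({"role": "batch"}, backend, "TCP", 80, [pol1]))  # False
```

Note that an empty selector matches every pod here, because `all()` over no conditions is true.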

Not all network backends support policies

Network policies are an exciting feature, which the Kubernetes community has worked on for a long time. However, they require a networking backend that is capable of applying the policies. By themselves, simple routed networks or the commonly used flannel network driver, for example, cannot apply network policy.

There are only a few policy-capable networking backends available for Kubernetes today: Romana, Calico, and Canal, with Weave indicating support in the near future. Red Hat’s OpenShift includes network policy features as well.

We chose Romana as the back-end for these tests because it configures pods to use natively routable IP addresses in a full L3 configuration. Network policies, therefore, can be applied directly by the host in the Linux kernel using iptables rules. This results in a high-performance, easy-to-manage network.

Testing performance impact of network policies

After network policies have been applied, network packets need to be checked against those policies to verify that this type of traffic is permissible. But what is the performance penalty for applying a network policy to every packet? Can we use all the great policy features without impacting application performance? We decided to find out by running some tests.

Before we dive deeper into these tests, it is worth mentioning that ‘performance’ is a tricky thing to measure, network performance especially so. 

Throughput (i.e. data transfer speed measured in Gbps) and latency (time to complete a request) are common measures of network performance. The performance impact of running an overlay network on throughput and latency has been examined previously here and here. What we learned from these tests is that Kubernetes networks are generally pretty fast, and servers have no trouble saturating a 1G link, with or without an overlay. It's only when you have 10G networks that you need to start thinking about the overhead of encapsulation.

This is because during a typical network performance benchmark, there’s no application logic for the host CPU to perform, leaving it available for whatever network processing is required. For this reason we ran our tests in an operating range that did not saturate the link, or the CPU. This has the effect of isolating the impact of processing network policy rules on the host. For these tests we measured latency as the average time required to complete an HTTP request across a range of response sizes.

Test setup
  • Hardware: Two servers with Intel Core i5-5250U CPUs (2 cores, 2 threads per core) running at 1.60GHz, 16GB RAM and 512GB SSD. NIC: Intel Ethernet Connection I218-V (rev 03)
  • Ubuntu 14.04.5
  • Kubernetes 1.3 for data collection (verified samples on v1.4.0-beta.5)
  • Romana v0.9.3.1
  • Client and server load test software
For the tests we had a client pod send 2,000 HTTP requests to a server pod. HTTP requests were sent by the client pod at a rate that ensured that neither the server nor network ever saturated. We also made sure each request started a new TCP session by disabling persistent connections (i.e. HTTP keep-alive). We ran each test with different response sizes and measured the average request duration time (how long does it take to complete a request of that size). Finally, we repeated each set of measurements with different policy configurations. 
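The post does not name its load-test software. As a hypothetical stand-in, the Python sketch below follows the same method: it opens a new TCP connection for every request (no keep-alive), optionally pauses between requests, and reports the average request duration in milliseconds. It runs against a throwaway local server over loopback, so its numbers say nothing about Romana or the hardware above.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_handler(body_size: int):
    """Build a handler that answers every GET with `body_size` bytes."""
    body = b"x" * body_size

    class Handler(BaseHTTPRequestHandler):
        # The default protocol_version is HTTP/1.0, so the server closes
        # the connection after each response.
        def do_GET(self):
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence per-request logging

    return Handler


def average_request_ms(host: str, port: int, n_requests: int, pause_s: float = 0.0) -> float:
    """Average time, in ms, to complete one request on a fresh TCP connection."""
    total = 0.0
    for _ in range(n_requests):
        start = time.perf_counter()
        conn = http.client.HTTPConnection(host, port)  # new connection each time
        conn.request("GET", "/", headers={"Connection": "close"})  # no keep-alive
        conn.getresponse().read()
        conn.close()
        total += time.perf_counter() - start
        time.sleep(pause_s)  # pacing, excluded from the timing
    return 1000.0 * total / n_requests


if __name__ == "__main__":
    for size in (512, 1024, 10_240):  # hypothetical response sizes in bytes
        server = ThreadingHTTPServer(("127.0.0.1", 0), make_handler(size))
        threading.Thread(target=server.serve_forever, daemon=True).start()
        avg = average_request_ms("127.0.0.1", server.server_address[1], 200, pause_s=0.001)
        print(f"{size:>6} bytes: {avg:.3f} ms average")
        server.shutdown()
        server.server_close()
```

Repeating the same run with different policy sets in place, and comparing each average with the no-policy baseline, gives the kind of comparison reported below.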

Romana detects Kubernetes network policies when they’re created, translates them to Romana’s own policy format, and then applies them on all hosts. Currently, Kubernetes network policies only apply to ingress traffic. This means that outgoing traffic is not affected.
First, we conducted the test without any policies to establish a baseline. We then ran the test again with increasing numbers of policies for the test's network segment. The policies were of the common “allow traffic for a given protocol and port” format. To ensure packets had to traverse all the policies, we created a number of policies that did not match the packet, and finally a policy that would result in acceptance of the packet.

The table below shows the results, measured in milliseconds for different request sizes and numbers of policies:

[Table: average request duration in milliseconds, by response size and number of policies]

What we see here is that, as the number of policies increases, processing network policies introduces a very small delay, never more than 0.2ms, even after applying 200 policies. For all practical purposes, no meaningful delay is introduced when network policy is applied. Also worth noting is that doubling the response size from 0.5k to 1.0k had virtually no effect. This is because for very small responses, the fixed overhead of creating a new connection dominates the overall response time (i.e. the same number of packets are transferred).

Note: the 0.5k and 1k lines overlap at ~0.8 ms in the chart above.

Even as a percentage of baseline performance, the impact is still very small. The table below shows that for the smallest response sizes, the worst-case delay remains at 7% or less, up to 200 policies. For the larger response sizes the delay drops to about 1%.

[Table: delay relative to the no-policy baseline (%), by response size and number of policies]

What is also interesting in these results is that as the number of policies increases, we notice that larger requests experience a smaller relative (i.e. percentage) performance degradation.

This is because when Romana installs iptables rules, it ensures that packets belonging to established connections are evaluated first. The full list of policies only needs to be traversed for the first packets of a connection. After that, the connection is considered ‘established’ and the connection’s state is stored in a fast lookup table. For larger requests, therefore, most packets of the connection are processed with a quick lookup in the ‘established’ table, rather than a full traversal of all rules. This iptables optimization results in performance that is largely independent of the number of network policies.

Such ‘flow tables’ are common optimizations in network equipment and it seems that iptables uses the same technique quite effectively. 
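The effect described above can be reproduced with a toy packet filter. The Python sketch below is neither Romana's code nor iptables itself: it walks a list of (protocol, port) rules in order for the first packet of a connection, then records the connection's 5-tuple so later packets are accepted by a single set lookup. In iptables setups this role is typically played by a rule near the top of the chain that matches the connection-tracking state `ESTABLISHED,RELATED`. The rule layout mirrors the benchmark: many non-matching rules followed by one that accepts.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# (src_ip, src_port, dst_ip, dst_port, protocol) identifies one connection.
Flow = Tuple[str, int, str, int, str]


@dataclass
class Rule:
    protocol: str
    port: int


@dataclass
class ToyFilter:
    rules: List[Rule]
    established: Set[Flow] = field(default_factory=set)
    rule_checks: int = 0  # counts rule comparisons to expose the cost

    def accept(self, flow: Flow) -> bool:
        # Fast path: packets of an already-accepted connection skip the rules.
        if flow in self.established:
            return True
        # Slow path: linear traversal, paid only by a connection's first packet.
        _, _, _, dst_port, protocol = flow
        for rule in self.rules:
            self.rule_checks += 1
            if rule.protocol == protocol and rule.port == dst_port:
                self.established.add(flow)
                return True
        return False  # no rule matched: drop


# 199 policies that never match, then one allowing TCP/80 (200 in total).
rules = [Rule("TCP", 10_000 + i) for i in range(199)] + [Rule("TCP", 80)]
fw = ToyFilter(rules)
flow: Flow = ("10.0.1.5", 40312, "10.0.2.7", 80, "TCP")
for _ in range(100):  # one connection carrying 100 packets
    assert fw.accept(flow)
print(fw.rule_checks)  # 200: only the first packet walked the rule list
```

A larger response means more packets per connection, so the one-time traversal is spread over more packets, which is consistent with the smaller relative slowdown reported for larger responses. The model ignores connection teardown and timeouts.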

It's also worth noting that in practice, a reasonably complex application may configure a few dozen rules per segment. It is also true that common network optimization techniques like Websockets and persistent connections will improve the performance of network policies even further (especially for small request sizes), since connections are held open longer and therefore can benefit from the established connection optimization.

These tests were performed using Romana as the backend policy provider and other network policy implementations may yield different results. However, what these tests show is that for almost every application deployment scenario, network policies can be applied using Romana as a network back end without any negative impact on performance.

If you wish to try it for yourself, we invite you to check out Romana. In our GitHub repo you can find an easy-to-use installer, which works with AWS, Vagrant VMs or any other servers. You can use it to get started quickly with a Romana-powered Kubernetes or OpenStack cluster.

Friday, September 9, 2016

Creating a PostgreSQL Cluster using Helm

Editor’s note: Today’s guest post is by Jeff McCormick, a developer at Crunchy Data, showing how to deploy a PostgreSQL cluster using Helm, a Kubernetes package manager.

Crunchy Data supplies a set of open source PostgreSQL and PostgreSQL-related containers. The Crunchy PostgreSQL Container Suite includes containers that deploy, monitor, and administer the open source PostgreSQL database; for more details, view this GitHub repository.

In this post we’ll show you how to deploy a PostgreSQL cluster using Helm, a Kubernetes package manager. For reference, the Crunchy Helm Chart examples used within this post are located here, and the pre-built containers can be found on DockerHub at this location.

This example will create the following in your Kubernetes cluster:
  • postgres master service
  • postgres replica service
  • postgres 9.5 master database (pod)
  • postgres 9.5 replica database (replication controller)


This example creates a simple Postgres streaming replication deployment with a master (read-write), and a single asynchronous replica (read-only). You can scale up the number of replicas dynamically.


The example is made up of various Chart files as follows:

This file contains values which you can reference within the database templates, allowing you to specify values like database passwords in one place.
The postgres master database pod definition.  This file causes a single postgres master pod to be created.
The postgres master database has a service created to act as a proxy.  This file causes a single service to be created to proxy calls to the master database.
The postgres replica database is defined by this file.  This file causes a replication controller to be created which allows the postgres replica containers to be scaled up on-demand.
This file causes the service proxy for the replica database container(s) to be created.


Install Helm according to their GitHub documentation and then install the examples as follows:

helm init
cd crunchy-containers/examples/kubehelm
helm install ./crunchy-postgres


After installing the Helm chart, you will see the following services:

kubectl get services
crunchy-master   <none>        5432/TCP   1h
crunchy-replica    <none>        5432/TCP   1h
kubernetes     <none>        443/TCP    1h

It takes about a minute for the replica to begin replicating with the master. To test replication, check whether it is underway with this command, entering "password" when prompted for the password:

psql -h crunchy-master -U postgres postgres -c 'table pg_stat_replication'

If you see a line returned from that query, the master is replicating to the replica. Try creating some data on the master:

psql -h crunchy-master -U postgres postgres -c 'create table foo (id int)'
psql -h crunchy-master -U postgres postgres -c 'insert into foo values (1)'

Then verify that the data has been replicated to the replica:

psql -h crunchy-replica -U postgres postgres -c 'table foo'

You can scale up the number of read-only replicas by running the following Kubernetes command:

kubectl scale rc crunchy-replica --replicas=2

It takes about 60 seconds for each new replica to start and begin replicating from the master.

The Kubernetes Helm and Charts projects provide a streamlined way to package up complex applications and deploy them on a Kubernetes cluster.  Deploying PostgreSQL clusters can sometimes prove challenging, but the task is greatly simplified using Helm and Charts.

--Jeff McCormick, Developer, Crunchy Data