An open source system for automating deployment, scaling, and operations of applications.

Monday, July 25, 2016

Why OpenStack's embrace of Kubernetes is great for both communities

Today, Mirantis, the leading contributor to OpenStack, announced that it will re-write its private cloud platform to use Kubernetes as its underlying orchestration engine. We think this is a great step forward for both the OpenStack and Kubernetes communities. With Kubernetes under the hood, OpenStack users will benefit from the tremendous efficiency, manageability and resiliency that Kubernetes brings to the table, while positioning their applications to use more cloud-native patterns. The Kubernetes community, meanwhile, can feel confident in their choice of orchestration framework, while gaining the ability to manage both container- and VM-based applications from a single platform.

The Path to Cloud Native

Google spent over ten years developing, applying and refining the principles of cloud native computing. A cloud-native application is:

  • Container-packaged. Applications are composed of hermetically sealed, reusable units across diverse environments;
  • Dynamically scheduled, for increased infrastructure efficiency and decreased operational overhead; and 
  • Microservices-based. Loosely coupled components significantly increase the overall agility, resilience and maintainability of applications.

These principles have enabled us to build the largest, most efficient, most powerful cloud infrastructure in the world, which anyone can access via Google Cloud Platform. They are the same principles responsible for the recent surge in popularity of Linux containers. Two years ago, we open-sourced Kubernetes to spur adoption of containers and scalable, microservices-based applications, and the recently released Kubernetes version 1.3 introduces a number of features to bridge enterprise and cloud native workloads. We expect that adoption of cloud-native principles will drive the same benefits within the OpenStack community, as well as smoothing the path between OpenStack and the public cloud providers that embrace them.

Making OpenStack better

We hear from enterprise customers that they want to move towards cloud-native infrastructure and application patterns. Thus, it is hardly surprising that OpenStack would also move in this direction [1], with large OpenStack users such as eBay and GoDaddy adopting Kubernetes as key components of their stack. Kubernetes and cloud-native patterns will improve OpenStack lifecycle management by enabling rolling updates, versioning, and canary deployments of new components and features. In addition, OpenStack users will benefit from self-healing infrastructure, making OpenStack easier to manage and more resilient to the failure of core services and individual compute nodes. Finally, OpenStack users will realize the developer and resource efficiencies that come with a container-based infrastructure.

OpenStack is a great tool for Kubernetes users

Conversely, incorporating Kubernetes into OpenStack will give Kubernetes users access to a robust framework for deploying and managing applications built on virtual machines. As users move to the cloud-native model, they will be faced with the challenge of managing hybrid application architectures that contain some mix of virtual machines and Linux containers. The combination of Kubernetes and OpenStack means that they can do so on the same platform using a common set of tools.

We are excited by the ever increasing momentum of the cloud-native movement as embodied by Kubernetes and related projects, and look forward to working with Mirantis, its partner Intel, and others within the OpenStack community to brings the benefits of cloud-native to their applications and infrastructure.


--Martin Buhr, Product Manager, Strategic Initiatives, Google

[1] Check out the announcement of Kubernetes-OpenStack Special Interest Group here, and a great talk about OpenStack on Kubernetes by CoreOS CEO Alex Polvi at the most recent OpenStack summit here.


Thursday, July 21, 2016

The Bet on Kubernetes, a Red Hat Perspective

Editor’s note: Today’s guest post is from a Kubernetes contributor Clayton Coleman, Architect on OpenShift at Red Hat, sharing their adoption of the project from its beginnings.

Two years ago, Red Hat made a big bet on Kubernetes. We bet on a simple idea: that an open source community is the best place to build the future of application orchestration, and that only an open source community could successfully integrate the diverse range of capabilities necessary to succeed. As a Red Hatter, that idea is not far-fetched - we’ve seen it successfully applied in many communities, but we’ve also seen it fail, especially when a broad reach is not supported by solid foundations. On the one year anniversary of Kubernetes 1.0, two years after the first open-source commit to the Kubernetes project, it’s worth asking the question:

Was Kubernetes the right bet?

The success of software is measured by the successes of its users - whether that software enables for them new opportunities or efficiencies. In that regard, Kubernetes has succeeded beyond our wildest dreams. We know of hundreds of real production deployments of Kubernetes, in the enterprise through Red Hat’s multi-tenant enabled OpenShift distribution, on Google Container Engine (GKE), in heavily customized versions run by some of the world's largest software companies, and through the education, entertainment, startup, and do-it-yourself communities. Those deployers report improved time to delivery, standardized application lifecycles, improved resource utilization, and more resilient and robust applications. And that’s just from customers or contributors to the community - I would not be surprised if there were now thousands of installations of Kubernetes managing tens of thousands of real applications out in the wild.

I believe that reach to be a validation of the vision underlying Kubernetes: to build a platform for all applications by providing tools for each of the core patterns in distributed computing. Those patterns:

  • simple replicated web software
  • distributed load balancing and service discovery
  • immutable images run in containers
  • co-location of related software into pods
  • simplified consumption of network attached storage
  • flexible and powerful resource scheduling
  • running batch and scheduled jobs alongside service workloads
  • managing and maintaining clustered software like databases and message queues


Allow developers and operators to move to the next scale of abstraction, just like they have enabled Google and others in the tech ecosystem to scale to datacenter computers and beyond. From Kubernetes 1.0 to 1.3 we have continually improved the power and flexibility of the platform while ALSO improving performance, scalability, reliability, and usability. The explosion of integrations and tools that run on top of Kubernetes further validates core architectural decisions to be composable, to expose open and flexible APIs, and to deliberately limit the core platform and encourage extension.

Today Kubernetes has one of the largest and most vibrant communities in the open source ecosystem, with almost a thousand contributors, one of the highest human-generated commit rates of any single-repository project on GitHub, over a thousand projects based around Kubernetes, and correspondingly active Stack Overflow and Slack channels. Red Hat is proud to be part of this ecosystem as the largest contributor to Kubernetes after Google, and every day more companies and individuals join us. The idea of Kubernetes found fertile ground, and you, the community, provided the excitement and commitment that made it grow.

So, did we bet correctly? For all the reasons above, and hundreds more: Yes.

What’s next?

Happy as we are with the success of Kubernetes, this is no time to rest! While there are many more features and improvements we want to build into Kubernetes, I think there is a general consensus that we want to focus on the only long term goal that matters - a healthy, successful, and thriving technical community around Kubernetes. As John F. Kennedy probably said: 

> Ask not what your community can do for you, but what you can do for your community

In a recent post to the kubernetes-dev list, Brian Grant laid out a great set of near term goals - goals that help grow the community, refine how we execute, and enable future expansion. In each of the Kubernetes Special Interest Groups we are trying to build sustainable teams that can execute across companies and communities, and we are actively working to ensure each of these SIGs is able to contribute, coordinate, and deliver across a diverse range of interests under one vision for the project.

Of special interest to us is the story of extension - how the core of Kubernetes can become the beating heart of the datacenter operating system, and enable even more patterns for application management to build on top of Kubernetes, not just into it. Work done in the 1.2 and 1.3 releases around third party APIs, API discovery, flexible scheduler policy, external authorization and authentication (beyond those built into Kubernetes) is just the start. When someone has a need, we want them to easily find a solution, and we also want it to be easy for others to consume and contribute to that solution. Likewise, the best way to prove ideas is to prototype them against real needs and to iterate against real problems, which should be easy and natural.

By Kubernetes’ second birthday, I hope to reflect back on a long year of refinement, user success, and community participation. It has been a privilege and an honor to contribute to Kubernetes, and it still feels like we are just getting started. Thank you, and I hope you come along for the ride!

-- Clayton Coleman, Contributor and Architect on Kubernetes and OpenShift at Red Hat. Follow him on Twitter and GitHub: @smarterclayton

Happy Birthday Kubernetes. Oh, the places you’ll go!

Editor’s note, Today’s guest post is from an independent Kubernetes contributor, Justin Santa Barbara, sharing his reflection on growth of the project from inception to its future.

Dear K8s,

It’s hard to believe you’re only one - you’ve grown up so fast. On the occasion of your first birthday, I thought I would write a little note about why I was so excited when you were born, why I feel fortunate to be part of the group that is raising you, and why I’m eager to watch you continue to grow up!

--Justin

You started with an excellent foundation - good declarative functionality, built around a solid API with a well defined schema and the machinery so that we could evolve going forwards. And sure enough, over your first year you grew so fast: autoscaling, HTTP load-balancing support (Ingress), support for persistent workloads including clustered databases (PetSets). You’ve made friends with more clouds (welcome Azure & OpenStack to the family), and even started to span zones and clusters (Federation). And these are just some of the most visible changes - there’s so much happening inside that brain of yours!

I think it’s wonderful you’ve remained so open in all that you do - you seem to write down everything on Github - for better or worse. I think we’ve all learned a lot about that on the way, like the perils of having engineers make scaling statements that are then weighed against claims made without quite the same framework of precision and rigor. But I’m proud that you chose not to lower your standards, but rose to the challenge and just ran faster instead - it might not be the most realistic approach, but it is the only way to move mountains!

And yet, somehow, you’ve managed to avoid a lot of the common dead-ends that other open source software has fallen into, particularly as those projects got bigger and the developers end up working on it more than they use it directly. How did you do that? There’s a probably-apocryphal story of an employee at IBM that makes a huge mistake, and is summoned to meet with the big boss, expecting to be fired, only to be told “We just spent several million dollars training you. Why would we want to fire you?”. Despite all the investment google is pouring into you (along with Redhat and others), I sometimes wonder if the mistakes we are avoiding could be worth even more. There is a very open development process, yet there’s also an “oracle” that will sometimes course-correct by telling us what happens two years down the road if we make a particular design decision. This is a parent you should probably listen to!

And so although you’re only a year old, you really have an old soul. I’m just one of the many people raising you, but it’s a wonderful learning experience for me to be able to work with the people that have built these incredible systems and have all this domain knowledge. Yet because we started from scratch (rather than taking the existing Borg code) we’re at the same level and can still have genuine discussions about how to raise you. Well, at least as close to the same level as we could ever be, but it’s to their credit that they are all far too nice ever to mention it!

If I would pick just two of the wise decisions those brilliant people made:

  • Labels & selectors give us declarative “pointers”, so we can say “why” we want things, rather than listing the things directly. It’s the secret to how you can scale to great heights; not by naming each step, but saying “a thousand more steps just like that first one”.
  • Controllers are state-synchronizers: we specify the goals, and your controllers will indefatigably work to bring the system to that state. They work through that strongly-typed API foundation, and are used throughout the code, so Kubernetes is more of a set of a hundred small programs than one big one. It’s not enough to scale to thousands of nodes technically; the project also has to scale to thousands of developers and features; and controllers help us get there.

And so on we will go! We’ll be replacing those controllers and building on more, and the API-foundation lets us build anything we can express in that way - with most things just a label or annotation away! But your thoughts will not be defined by language: with third party resources you can express anything you choose. Now we can build Kubernetes without building in Kubernetes, creating things that feel as much a part of Kubernetes as anything else. Many of the recent additions, like ingress, DNS integration, autoscaling and network policies were done or could be done in this way. Eventually it will be hard to imagine you before these things, but tomorrow’s standard functionality can start today, with no obstacles or gatekeeper, maybe even for an audience of one.

So I’m looking forward to seeing more and more growth happen further and further from the core of Kubernetes. We had to work our way through those phases; starting with things that needed to happen in the kernel of Kubernetes - like replacing replication controllers with deployments. Now we’re starting to build things that don’t require core changes. But we’re still still talking about infrastructure separately from applications. It’s what comes next that gets really interesting: when we start building applications that rely on the Kubernetes APIs. We’ve always had the Cassandra example that uses the Kubernetes API to self-assemble, but we haven’t really even started to explore this more widely yet. In the same way that the S3 APIs changed how we build things that remember, I think the k8s APIs are going to change how we build things that think.

So I’m looking forward to your second birthday: I can try to predict what you’ll look like then, but I know you’ll surpass even the most audacious things I can imagine. Oh, the places you’ll go!


-- Justin Santa Barbara, Independent Kubernetes Contributor

A Very Happy Birthday Kubernetes

Last year at OSCON, I got to reconnect with a bunch of friends and see what they have been working on. That turned out to be the Kubernetes 1.0 launch event. Even that day, it was clear the project was supported by a broad community -- a group that showed an ambitious vision for distributed computing. 

Today, on the first anniversary of the Kubernetes 1.0 launch, it’s amazing to see what a community of dedicated individuals can do. Kubernauts have collectively put in 237 person years of coding effort since launch to bring forward our most recent release 1.3. However the community is much more than simply coding effort. It is made up of people -- individuals that have given their expertise and energy to make this project flourish. With more than 830 diverse contributors, from independents to the largest companies in the world, it’s their work that makes Kubernetes stand out. Here are stories from a couple early contributors reflecting back on the project:


The community is also more than online GitHub and Slack conversation; year one saw the launch of KubeCon, the Kubernetes user conference, which started as a grassroot effort that brought together 1,000 individuals between two events in San Francisco and London. The advocacy continues with users globally. There are more than 130 Meetup groups that mention Kubernetes, many of which are helping celebrate Kubernetes’ birthday. To join the celebration, participate at one of the 20 #k8sbday parties worldwide: Austin, Bangalore, Beijing, Boston, Cape Town, Charlotte, Cologne, Geneva, Karlsruhe, Kisumu, Montreal, Portland, Raleigh, Research Triangle, San Francisco, Seattle, Singapore, SF Bay Area, or Washington DC.


The Kubernetes community continues to work to make our project more welcoming and open to our kollaborators. This spring, Kubernetes and KubeCon moved to the Cloud Native Compute Foundation (CNCF), a Linux Foundation Project, to accelerate the collaborative vision outlined only a year ago at OSCON …. lifting a glass to another great year.



-- Sarah Novotny, Kubernetes Community Wonk

Monday, July 18, 2016

Bringing End-to-End Kubernetes Testing to Azure (Part 2)

Editor’s Note: Today’s guest post is Part II from a series by Travis Newhouse, Chief Architect at AppFormix, writing about their contributions to Kubernetes.

Historically, Kubernetes testing has been hosted by Google, running e2e tests on Google Compute Engine (GCE) and Google Container Engine (GKE). In fact, the gating checks for the submit-queue are a subset of tests executed on these test platforms. Federated testing aims to expand test coverage by enabling organizations to host test jobs for a variety of platforms and contribute test results to benefit the Kubernetes project. Members of the Kubernetes test team at Google and SIG-Testing have created a Kubernetes test history dashboard that publishes the results from all federated test jobs (including those hosted by Google).

In this blog post, we describe extending the e2e test jobs for Azure, and show how to contribute a federated test to the Kubernetes project.

END-TO-END INTEGRATION TESTS FOR AZURE

After successfully implementing “development distro” scripts to automate deployment of Kubernetes on Azure, our next goal was to run e2e integration tests and share the results with the Kubernetes community.

We automated our workflow for executing e2e tests of Kubernetes on Azure by defining a nightly job in our private Jenkins server. Figure 2 shows the workflow that uses kube-up.sh to deploy Kubernetes on Ubuntu virtual machines running in Azure, then executes the e2e tests. On completion of the tests, the job uploads the test results and logs to a Google Cloud Storage directory, in a format that can be processed by the scripts that produce the test history dashboard. Our Jenkins job uses the hack/jenkins/e2e-runner.sh and hack/jenkins/upload-to-gcs.sh scripts to produce the results in the correct format.

Kubernetes on Azure - Flow Chart - New Page.png
Figure 2 - Nightly test job workflow

HOW TO CONTRIBUTE AN E2E TEST


Throughout our work to create the Azure e2e test job, we have collaborated with members of SIG-Testing to find a way to publish the results to the Kubernetes community. The results of this collaboration are documentation and a streamlined process to contribute results from a federated test job. The steps to contribute e2e test results can be summarized in 4 steps.

  1. Create a Google Cloud Storage bucket in which to publish the results.
  2. Define an automated job to run the e2e tests. By setting a few environment variables, hack/jenkins/e2e-runner.sh deploys Kubernetes binaries and executes the tests.
  3. Upload the results using hack/jenkins/upload-to-gcs.sh.
  4. Incorporate the results into the test history dashboard by submitting a pull-request with modifications to a few files in kubernetes/test-infra.

The federated tests documentation describes these steps in more detail. The scripts to run e2e tests and upload results simplifies the work to contribute a new federated test job. The specific steps to set up an automated test job and an appropriate environment in which to deploy Kubernetes are left to the reader’s preferences. For organizations using Jenkins, the jenkins-job-builder configurations for GCE and GKE tests may provide helpful examples.

RETROSPECTIVE

The e2e tests on Azure have been running for several weeks now. During this period, we have found two issues in Kubernetes. Weixu Zhuang immediately published fixes that have been merged into the Kubernetes master branch.

The first issue happened when we wanted to bring up the Kubernetes cluster using SaltStack on Azure using Ubuntu VMs. A commit (07d7cfd3) modified the OpenVPN certificate generation script to use a variable that was only initialized by scripts in the cluster/ubuntu. Strict checking on existence of parameters by the certificate generation script caused other platforms that use the script to fail (e.g. our changes to support Azure). We submitted a pull-request that fixed the issue by initializing the variable with a default value to make the certificate generation scripts more robust across all platform types.

The second pull-request cleaned up an unused import in the Daemonset unit test file. The import statement broke the unit tests with golang 1.4. Our nightly Jenkins job helped us find this error and we promptly pushed a fix for it.

CONCLUSION AND FUTURE WORK

The addition of a nightly e2e test job for Kubernetes on Azure has helped to define the process to contribute a federated test to the Kubernetes project. During the course of the work, we also saw the immediate benefit of expanding test coverage to more platforms when our Azure test job identified compatibility issues.

We want to thank Aaron Crickenberger, Erick Fejta, Joe Finney, and Ryan Hutchinson for their help to incorporate the results of our Azure e2e tests into the Kubernetes test history. If you’d like to get involved with testing to create a stable, high quality releases of Kubernetes, join us in the Kubernetes Testing SIG (sig-testing).


--Travis Newhouse, Chief Architect at AppFormix



Sunday, July 17, 2016

Update on Kubernetes for Windows Server Containers

Today's post is written by Jitendra Bhurat, Product Manager at Apprenda, and Cesar Wong, Principal Software Engineer at Red Hat; describing the progress made to bring Kubernetes on Windows Server. 

Large organizations have significant investments and long-term roadmaps for both Linux and Windows Server. With Microsoft adopting Docker in Windows Server 2016, organizations are looking for a simplified means to orchestrating containers in both their Linux and Windows environments. 

As the adoption and contribution curve of Kubernetes is unmatched, there has been a lot of interest in making Kubernetes compatible with the Microsoft ecosystem. At Apprenda, we have been building .NET distributed systems for the better part of a decade and kicked off the porting of Kubernetes to Windows in June.

Our current goal is to develop a minimally viable proof of concept (POC) of Kubernetes on Windows Server 2016 so that the community can learn about the pitfalls that will need to be overcome for future production environments. As the community drives this project to its first milestone, a number of lessons have been learned, work is ongoing, and we are currently investigating a number of networking options on Windows Server.

GETTING STARTED

In June, Apprenda and Kismatic (acquired by Apprenda), formed the Kubernetes Windows SIG team, and along with Red Hat, formed a partnership to create a Windows version of the Kubernetes Kubelet and Kube-Proxy, the two key components for operating Kubernetes on Windows Server. For a minimum viable POC, the initial work was divided into the following logical areas to help facilitate project management:


Focus Area
Architectural Notes
Status
Part of POC
Container Runtime
Expanding Kubernetes container runtimes to support Windows Server 2016 Docker containers
Work for POC complete
Yes
cAdvisor
Resource usage and performance characteristics in Kubernetes for Docker containers. Other architectural features in Kubernetes can run without this component.  
Research will start after POC milestone
No
Pod Architecture
Fundamental unit of container bundling in Kubernetes. Current area of research ongoing on ultimate architecture in Windows Server.   
Work for POC close to complete and will be finished when networking work is completed.
Yes
Networking and Kube-Proxy
Networking and communications among components (e.g. services to pod, pod to pod, etc.).
Still active area of research for POC. For some parts of Kubernetes there are no direct parallels in Windows Server - e.g. IPTables. Currently investigating Open vSwitch for container networking.
Yes
OOM Score
Frees up memory by sacrificing processes when all else fails. There is no direct comparison for OOM in Windows Server.
Research will start after POC milestone
No

Cesar Wong from Red Hat offered to contribute to Container Runtime and Pod Architecture sections and Apprenda has been working on Kubelet Integration and Kube-Proxy. 

It is important to mention that for Windows Server environments, the Kubernetes control plane (API server, schedulers, etcd, etc) would continue to run on Linux and there would not be a Windows-only version of Kubernetes. Given the vast majority of organizations are running both Linux and Windows Server instances, this requirement is not a technical roadblock to adoption. For example, vSphere has a similar requirement

Container Runtime 

The current Kubelet implementation for Linux relies on an infrastructure container per pod to hold on to an IP address while other containers may be killed/restarted. This requires that containers be able to share their network stack (--net=container:id). In Docker for Windows, it is not possible to share the network stack across containers. It is also not possible to set a container’s DNS servers. In the container runtime POC for Windows, we created a new container runtime that removed the requirement of an infrastructure container. It also uses the IP of the first container in the pod as the pod’s IP address. With some limitations, however, it was possible to stand up pods with Windows-specific containers. 

It is worth noting that a refactor of the container runtime in the Kubelet is under way and the POC code will need to be updated to reflect this new runtime architecture.

Pod Architecture

Pods in Kubernetes can include multiple containers, all sharing the same network, PID and volume namespace. Since Windows containers cannot share namespaces, multiple containers inside a pod are more isolated from each other. Furthermore, each container gets its own IP address, while the pod will only expose a single IP address for all containers. 

At least initially, this means that Windows pods in Kubernetes should be limited to a single container. This also reflects the fact that containers in Windows are more monolithic than their Linux counterparts, including more of the underlying operating system pieces, including a service manager and other processes.
Kubelet Integration

The goal here was to have Kubelet running on Windows Server Container and be capable of accepting requests/commands from the Kubernetes API Server running on Linux. The Kubelet code base is, unfortunately, closely coupled with the Linux OS. For example, even for trivial things like finding the hostname, the Kubelet code assumes Linux as the operating system. Thanks to the versatility of Kubernetes, many of the OS dependencies (like hostname)  were fixable using command line flags to provide a default value.  

Thus far, we have been able to disjoin parts of the Kubelet from its dependencies which allow it to run on Windows with the proper abstraction layers. While the current status of the Kubelet is good enough for a POC, there is more work that needs to be done to get it into a state of general availability. For one, instead of using flags and environmental variables, it would be best to have code changes upstream. There are also a number of bugs that need fixing. For example, we encountered a Golang bug where moving a directory fails in Windows and had to provide an alternate implementation.  
Networking / Kube-Proxy
Organizations using containers need the ability to deploy multiple containers and pods on a single host, have them share an IP address and be able to easily talk to other containers and pods on that same host. To accomplish this design goal, we conducted a lot of research on Windows Container Networking, as this held the keys to both the Kubernetes pod and service level networking. 

We looked in-depth at the different networking modes supported by Windows Containers and found L2 Bridge Networking mode to be the most appropriate in our case, as it allowed inter-container communication across different hosts. Working closely with the Microsoft networking team, we were able to identify and resolve issues we were having setting up the L2 Bridge networking mode in our environment using the TP5 release of Windows Server 2016.  
As we dug deep into networking, two options were clear for the Kube-Proxy implementation:
  • Implement Kube-Proxy natively on Windows  
  • Run a Linux version of the Kube-Proxy on a Hyper-V VM using the same bridge as the other containers running on Windows and have the Kube-Proxy forward requests to the other containers and have the Windows host forward requests to the proxy

For the POC, we decided to run Kube-Proxy on a Hyper-V Linux virtual machine and configure L2 Bridge mode networking with a private subnet. This implementation would enable us to forward traffic from the Kube-Proxy to containers running on Windows. Unfortunately, it did not work as it did in theory. 

On further investigation, with the help of Microsoft, we determined that for this to work, we would have to configure the L2 Bridge networking mode with an externally accessible gateway and on the same subnet as the container host. Such a requirement goes against the networking isolation boundary that the pod currently enjoys because each container on that host and other hosts can communicate with each other and all the other container hosts. This means any container can talk to any other container regardless of pod membership.  
We are currently looking at using Open vSwitch (OVS) to configure overlay networking to overcome the issue described above. Cloudbase, which is also a member of the Kubernetes Windows SIG, is actively involved in this effort. Their team has successfully implemented OVS in Windows Server and their work is promising for this effort. The community is also currently engaged with the Microsoft lead on Windows Server networking to find an alternative in case this route does not pan out. 

As we continue to make progress on the POC, we welcome ideas from the community to help us advance this vision. You can connect with us in the following ways:

--Jitendra Bhurat, Product Manager at Apprenda. Container Runtime and Pod Architecture sections contributed by Cesar Wong, Principal Software Engineer at Red Hat

Friday, July 15, 2016

Steering an Automation Platform at Wercker with Kubernetes

Editor’s note: today’s guest post is by Andy Smith, the CTO of Wercker, sharing how Kubernetes helps them save time and speed up development.  

At Wercker we run millions of containers that execute our users’ CI/CD jobs. The vast majority of them are ephemeral and only last as long as builds, tests and deploys take to run, the rest are ephemeral, too -- aren't we all --, but tend to last a bit longer and run our infrastructure. As we are running many containers across many nodes, we were in need of a highly scalable scheduler that would make our lives easier, and as such, decided to implement Kubernetes.

Wercker is a container-centric automation platform that helps developers build, test and deploy their applications. We support any number of pipelines, ranging from building code, testing API-contracts between microservices, to pushing containers to registries, and deploying to schedulers. All of these pipeline jobs run inside Docker containers and each artifact can be a Docker container.

And of course we use Wercker to build Wercker, and deploy itself onto Kubernetes!

Overview

Because we are a platform for running multi-service cloud-native code we've made many design decisions around isolation. On the base level we use CoreOS and cloud-init to bootstrap a cluster of heterogeneous nodes which I have named Patricians, Peasants, as well as controller nodes that don't have a cool name and are just called Controllers. Maybe we should switch to Constables.

k8s-architecture.jpg


Patrician nodes are where the bulk of our infrastructure runs. These nodes have appropriate network interfaces to communicate with our backend services as well as be routable by various load balancers. This is where our logging is aggregated and sent off to logging services, our many microservices for reporting and processing the results of job runs, and our many microservices for handling API calls.

On the other end of the spectrum are the Peasant nodes where the public jobs are run. Public jobs consist of worker pods reading from a job queue and dynamically generating new runner pods to handle execution of the job. The job itself is an incarnation of our open source CLI tool, the same one you can run on your laptop with Docker installed. These nodes have very limited access to the rest of the infrastructure and the containers the jobs themselves run in are even further isolated.

Controllers are controllers, I bet ours look exactly the same as yours.

Dynamic Pods
Our heaviest use of the Kubernetes API is definitely our system of creating dynamic pods to serve as the runtime environment for our actual job execution. After pulling job descriptions from the queue we define a new pod containing all the relevant environment for checking out code, managing a cache, executing a job and uploading artifacts. We launch the pod, monitor its progress, and destroy it when the job is done.

Ingresses
In order to provide a backend for HTTP API calls and allow self-registration of handlers we make use of the Ingress system in Kubernetes. It wasn't the clearest thing to set up, but reading through enough of the nginx example eventually got us to a good spot where it is easy to connect services to the frontend.

Upcoming Features in 1.3

While we generally treat all of our pods and containers as ephemeral and expect rapid restarts on failures, we are looking forward to Pet Sets and Init Containers as ways to optimize some of our processes. We are also pleased with official support for Minikube coming along as it improves our local testing and development. 

Conclusion

Kubernetes saves us the non-trivial task of managing many, many containers across many nodes. It provides a robust API and tooling for introspecting these containers, and it includes much built in support for logging, metrics, monitoring and debugging. Service discovery and networking alone saves us so much time and speeds development immensely.
Cheers to you Kubernetes, keep up the good work :)

-- Andy Smith, CTO, Wercker