An open source system for automating deployment, scaling, and operations of applications.

Thursday, October 19, 2017

Introducing Software Certification for Kubernetes


Editor's Note: Today's post is by William Denniss, Product Manager, Google Cloud on the new Kubernetes Software Conformance Certification program.


Over the last three years, Kubernetes® has seen wide-scale adoption by a vibrant and diverse community of providers. In fact, there are now more than 60 known Kubernetes platforms and distributions. From the start, one goal of Kubernetes has been consistency and portability.

In order to better serve this goal, today the Kubernetes community and the Cloud Native Computing Foundation® (CNCF®) announce the availability of the beta Kubernetes Software Conformance Certification program. The Kubernetes conformance certification program gives users the confidence that when they use a Certified Kubernetes™ product, they can rely on a high level of common functionality. Certification provides Independent Software Vendors (ISVs) confidence that if their customer is using a Certified Kubernetes product, their software will behave as expected.

CNCF and the Kubernetes Community invite all vendors to run the conformance test suite, and submit conformance testing results for review and certification by the CNCF. When the program graduates to GA (generally available) later this year, all vendors receiving certification during the beta period will be listed in the launch announcement.

Just like Kubernetes itself, conformance certification is an evolving program managed by contributors in our community. Certification is versioned alongside Kubernetes, and certification requirements receive updates with each version of Kubernetes as features are added and the architecture changes. The Kubernetes community, through SIG Architecture, controls changes and oversees what it means to be Certified Kubernetes. The Testing SIG works on the mechanics of conformance tests, while the Conformance Working Group develops process and policy for the certification program.

Once the program moves to GA, certified products can proudly display the new Certified Kubernetes logo mark with stylized version information on their marketing materials. Certified products can also take advantage of a new combination trademark rule the CNCF adopted for Certified Kubernetes providers that keep their certification up to date.

Products must complete a recertification each year for the current or previous version of Kubernetes to remain certified. This ensures that when you see the Certified Kubernetes™ mark on a product, you’re getting something that’s not only proven conformant but also contains the latest features and improvements from the community.

Visit https://github.com/cncf/k8s-conformance for more information about Kubernetes Software Conformance Certification, and learn how you can include your product in a growing list of Certified Kubernetes providers.

“Cloud Native Computing Foundation”, “CNCF” and “Kubernetes” are registered trademarks of The Linux Foundation in the United States and other countries. “Certified Kubernetes” and the Certified Kubernetes design are trademarks of The Linux Foundation in the United States and other countries.

Tuesday, October 10, 2017

Request Routing and Policy Management with the Istio Service Mesh

Editor's note: Today’s post by Frank Budinsky, Software Engineer, IBM, Andra Cismaru, Software Engineer, Google, and Israel Shalom, Product Manager, Google, is the second post in a three-part series on Istio. It offers a closer look at request routing and policy management.

In a previous article, we looked at a simple application (Bookinfo) that is composed of four separate microservices. The article showed how to deploy an application with Kubernetes and an Istio-enabled cluster without changing any application code. The article also outlined how to view Istio-provided L7 metrics on the running services.

This article follows up by taking a deeper look at Istio using Bookinfo. Specifically, we’ll look at two more features of Istio: request routing and policy management.

Running the Bookinfo Application

As before, we run the v1 version of the Bookinfo application. After installing Istio in our cluster, we start the app defined in bookinfo-v1.yaml using the following command:


kubectl apply -f <(istioctl kube-inject -f bookinfo-v1.yaml)
We created an Ingress resource for the app:


cat <<EOF | kubectl create -f -
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: bookinfo
  annotations:
    kubernetes.io/ingress.class: "istio"
spec:
  rules:
  - http:
      paths:
      - path: /productpage
        backend:
          serviceName: productpage
          servicePort: 9080
      - path: /login
        backend:
          serviceName: productpage
          servicePort: 9080
      - path: /logout
        backend:
          serviceName: productpage
          servicePort: 9080
EOF
Then we retrieved the NodePort address of the Istio Ingress controller:

export BOOKINFO_URL=$(kubectl get po -n istio-system -l istio=ingress -o jsonpath='{.items[0].status.hostIP}'):$(kubectl get svc -n istio-system istio-ingress -o jsonpath='{.spec.ports[0].nodePort}')
Finally, we pointed our browser to http://$BOOKINFO_URL/productpage to see the running v1 application.




HTTP request routing

Existing container orchestration platforms like Kubernetes, Mesos, and other microservice frameworks allow operators to control when a particular set of pods/VMs should receive traffic (e.g., by adding/removing specific labels). Unlike existing techniques, Istio decouples traffic flow and infrastructure scaling. This allows Istio to provide a variety of traffic management features that reside outside the application code, including dynamic HTTP request routing for A/B testing, canary releases, gradual rollouts, failure recovery using timeouts, retries, circuit breakers, and fault injection to test compatibility of failure recovery policies across services.

To demonstrate, we’ll deploy v2 of the reviews service and use Istio to make it visible only for a specific test user. We can create a Kubernetes deployment, reviews-v2, with this YAML file:


apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: reviews-v2
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: reviews
        version: v2
    spec:
      containers:
      - name: reviews
        image: istio/examples-bookinfo-reviews-v2:0.2.3
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9080
From a Kubernetes perspective, the v2 deployment adds additional pods that the reviews service selector includes in the round-robin load balancing algorithm. This is also the default behavior for Istio.

Before we start reviews:v2, we’ll start the last of the four Bookinfo services, ratings, which is used by the v2 version to provide ratings stars corresponding to each review:

kubectl apply -f <(istioctl kube-inject -f bookinfo-ratings.yaml)
If we were to start reviews:v2 now, we would see browser responses alternating between v1 (reviews with no corresponding ratings) and v2 (reviews with black rating stars). This will not happen, however, because we’ll use Istio’s traffic management feature to control traffic.

With Istio, new versions don’t need to become visible based on the number of running pods. Version visibility is controlled instead by rules that specify the exact criteria. To demonstrate, we start by using Istio to specify that we want to send 100% of reviews traffic to v1 pods only.

Immediately setting a default rule for every service in the mesh is an Istio best practice. Doing so avoids accidental visibility of newer, potentially unstable versions. For the purpose of this demonstration, however, we’ll only do it for the reviews service:


cat <<EOF | istioctl create -f -
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
 name: reviews-default
spec:
 destination:
   name: reviews
 route:
 - labels:
     version: v1
   weight: 100
EOF
This command directs the service mesh to send 100% of traffic for the reviews service to pods with the label “version: v1”. With this rule in place, we can safely deploy the v2 version without exposing it.


kubectl apply -f <(istioctl kube-inject -f bookinfo-reviews-v2.yaml)
Refreshing the Bookinfo web page confirms that nothing has changed.

At this point we have all kinds of options for how we might want to expose reviews:v2. If for example we wanted to do a simple canary test, we could send 10% of the traffic to v2 using a rule like this:


apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
 name: reviews-default
spec:
 destination:
   name: reviews
 route:
 - labels:
     version: v2
   weight: 10
 - labels:
     version: v1
   weight: 90
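To make the meaning of the weights concrete, here is a minimal shell sketch of a 10/90 split. It is an illustration only: the pick_version function and its deterministic counter are assumptions made for this example, whereas the mesh makes its own per-request choice. The point is that the share of traffic follows the weights, not the number of pods running each version.

```shell
# Hypothetical helper, not part of Istio: choose a version for request number $1
# given the weight (0-100) assigned to v2. The remainder goes to v1.
pick_version() {
  request_number=$1
  v2_weight=$2
  if [ $(( request_number % 100 )) -lt "$v2_weight" ]; then
    echo v2
  else
    echo v1
  fi
}

# Count how many of 1000 simulated requests land on v2 with a weight of 10.
i=0
v2_hits=0
while [ "$i" -lt 1000 ]; do
  if [ "$(pick_version "$i" 10)" = "v2" ]; then
    v2_hits=$(( v2_hits + 1 ))
  fi
  i=$(( i + 1 ))
done
echo "v2 received $v2_hits of 1000 requests"
```

Changing the second argument of pick_version changes the split without touching anything that represents pod counts.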
A better approach for early testing of a service version is to instead restrict access to it much more specifically. To demonstrate, we’ll set a rule to only make reviews:v2 visible to a specific test user. We do this by setting a second, higher priority rule that will only be applied if the request matches a specific condition:


cat <<EOF | istioctl create -f -
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
name: reviews-test-v2
spec:
destination:
  name: reviews
precedence: 2
match:
  request:
    headers:
      cookie:
        regex: "^(.*?;)?(user=jason)(;.*)?$"
route:
- labels:
    version: v2
  weight: 100
EOF
Here we’re specifying that the request headers need to include a user cookie with value “tester” as the condition. If this rule is not matched, we fall back to the default routing rule for v1.
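As a rough illustration of the match condition, the following shell sketch checks sample Cookie header values against an extended-regex adaptation of the rule's pattern. The matches_test_user helper and the sample cookie strings are made up for this example, and the lazy quantifier from the rule is dropped because grep -E does not need it to decide whether a match exists. In the mesh, the proxy's own regex engine is what applies.

```shell
# Hypothetical helper: succeeds when the Cookie header value in $1 contains
# a cookie that is exactly "user=tester", delimited by ";" or the string ends.
matches_test_user() {
  printf '%s\n' "$1" | grep -Eq '^(.*;)?user=tester(;.*)?$'
}

for cookie in 'user=tester' 'session=abc;user=tester;theme=dark' 'user=testers' 'superuser=tester'; do
  if matches_test_user "$cookie"; then
    echo "$cookie -> reviews v2"
  else
    echo "$cookie -> default rule (reviews v1)"
  fi
done
```

Only the first two sample values satisfy the condition; the others fall through to the default rule.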

If we log in to the Bookinfo UI with the user name “tester” (no password needed), we will now see version v2 of the application (each review includes 1-5 black rating stars). Every other user is unaffected by this change.



Once the v2 version has been thoroughly tested, we can use Istio to proceed with a canary test using the rule shown previously, or we can simply migrate all of the traffic from v1 to v2, optionally in a gradual fashion by using a sequence of rules with weights less than 100 (for example: 10, 20, 30, ... 100). This traffic control is independent of the number of pods implementing each version. If, for example, we had autoscaling in place, and high traffic volumes, we would likely see a corresponding scale up of v2 and scale down of v1 pods happening independently at the same time. For more about version routing with autoscaling, check out "Canary Deployments using Istio".
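One way to script a gradual migration like this is to generate each intermediate rule from a template. The sketch below is an illustration under stated assumptions: render_reviews_rule is a hypothetical helper, and it only prints the RouteRule. Applying it (for instance by piping the output to istioctl replace -f -, as in the commands in this post) and pausing between steps to watch the service is left to the operator.

```shell
# Hypothetical helper: print the reviews-default RouteRule with $1 percent of
# traffic going to v2 and the rest to v1. At 100 the v1 route is omitted.
render_reviews_rule() {
  v2_weight=$1
  v1_weight=$(( 100 - v2_weight ))
  cat <<EOF
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
  name: reviews-default
spec:
  destination:
    name: reviews
  route:
  - labels:
      version: v2
    weight: $v2_weight
EOF
  if [ "$v1_weight" -gt 0 ]; then
    cat <<EOF
  - labels:
      version: v1
    weight: $v1_weight
EOF
  fi
}

# Print the rule for a 30/70 split.
render_reviews_rule 30
```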

In our case, we’ll send all of the traffic to v2 with one command:


cat <<EOF | istioctl replace -f -
apiVersion: config.istio.io/v1alpha2
kind: RouteRule
metadata:
 name: reviews-default
spec:
 destination:
   name: reviews
 route:
 - labels:
     version: v2
   weight: 100
EOF
We should also remove the special rule we created for the tester so that it doesn’t override any future rollouts we decide to do:


istioctl delete routerule reviews-test-v2
In the Bookinfo UI, we’ll see that we are now exposing the v2 version of reviews to all users.

Policy enforcement

Istio provides policy enforcement functions, such as quotas, precondition checking, and access control. We can demonstrate Istio’s open and extensible framework for policies with an example: rate limiting.

Let’s pretend that the Bookinfo ratings service is an external paid service--for example, Rotten Tomatoes®--with a free quota of 1 request per second (req/sec). To make sure the application doesn’t exceed this limit, we’ll specify an Istio policy to cut off requests once the limit is reached. We’ll use one of Istio’s built-in policies for this purpose.

To set a 1 req/sec quota, we first configure a memquota handler with rate limits:


cat <<EOF | istioctl create -f -
apiVersion: "config.istio.io/v1alpha2"
kind: memquota
metadata:
  name: handler
  namespace: default
spec:
  quotas:
  - name: requestcount.quota.default
    maxAmount: 5000
    validDuration: 1s
    overrides:
    - dimensions:
        destination: ratings
      maxAmount: 1
      validDuration: 1s
EOF
Then we create a quota instance that maps incoming attributes to quota dimensions, and create a rule that uses it with the memquota handler:


cat <<EOF | istioctl create -f -
apiVersion: "config.istio.io/v1alpha2"
kind: quota
metadata:
  name: requestcount
  namespace: default
spec:
  dimensions:
    source: source.labels["app"] | source.service | "unknown"
    sourceVersion: source.labels["version"] | "unknown"
    destination: destination.labels["app"] | destination.service | "unknown"
    destinationVersion: destination.labels["version"] | "unknown"
---
apiVersion: "config.istio.io/v1alpha2"
kind: rule
metadata:
  name: quota
  namespace: default
spec:
  actions:
  - handler: handler.memquota
    instances:
    - requestcount.quota
EOF
To see the rate limiting in action, we’ll generate some load on the application:


wrk -t1 -c1 -d20s http://$BOOKINFO_URL/productpage
In the web browser, we’ll notice that while the load generator is running (i.e., generating more than 1 req/sec), requests to the ratings service are cut off. Instead of the black stars next to each review, the page now displays a message indicating that ratings are not currently available.

Stopping the load generator means the limit will no longer be exceeded: the black stars return when we refresh the page.
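The behavior seen here can be approximated with a small fixed-window counter. This is a deliberately simplified sketch, not the memquota adapter's actual algorithm: the allow_request helper, its variable names, and the use of whole-second timestamps are assumptions made for illustration. It admits at most MAX_AMOUNT requests per VALID_DURATION seconds, mirroring the ratings override of 1 request per 1s.

```shell
MAX_AMOUNT=1        # requests allowed per window (the ratings override)
VALID_DURATION=1    # window length in seconds
window_start=-1     # start of the window currently being counted
window_count=0      # requests admitted in that window

# Hypothetical helper: returns 0 if a request at timestamp $1 (seconds) is
# admitted, 1 if the quota for its window is already used up.
allow_request() {
  now=$1
  start=$(( now - now % VALID_DURATION ))
  if [ "$start" -ne "$window_start" ]; then
    window_start=$start
    window_count=0
  fi
  if [ "$window_count" -lt "$MAX_AMOUNT" ]; then
    window_count=$(( window_count + 1 ))
    return 0
  fi
  return 1
}

# Three requests: two in the same second, one in the next.
for ts in 100 100 101; do
  if allow_request "$ts"; then
    echo "t=$ts admitted"
  else
    echo "t=$ts rejected (quota exhausted)"
  fi
done
```

The second request in the same second is rejected, which corresponds to the page showing that ratings are unavailable while the load generator is running.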

Summary

We’ve shown you how to introduce advanced features like HTTP request routing and policy injection into a service mesh configured with Istio without restarting any of the services. This lets you develop and deploy without worrying about the ongoing management of the service mesh; service-wide policies can always be added later.

In the next and last installment of this series, we’ll focus on Istio’s security and authentication capabilities. We’ll discuss how to secure all interservice communications in a mesh, even against insiders with access to the network, without any changes to the application code or the deployment.

Thursday, October 5, 2017

Kubernetes Community Steering Committee Election Results

Beginning with the announcement of Kubernetes 1.0 at OSCON in 2015, there has been a concerted effort to share the power and burden of leadership across the Kubernetes community.

With the work of the Bootstrap Governance Committee, consisting of Brandon Philips, Brendan Burns, Brian Grant, Clayton Coleman, Joe Beda, Sarah Novotny and Tim Hockin - a cross section of long-time leaders representing 5 different companies with major investments of talent and effort in the Kubernetes Ecosystem - we wrote an initial Steering Committee Charter and launched a community-wide election to seat a Kubernetes Steering Committee.

To quote from the Charter -

The initial role of the steering committee is to instantiate the formal process for Kubernetes governance. In addition to defining the initial governance process, the bootstrap committee strongly believes that it is important to provide a means for iterating the processes defined by the steering committee. We do not believe that we will get it right the first time, or possibly ever, and won’t even complete the governance development in a single shot. The role of the steering committee is to be a live, responsive body that can refactor and reform as necessary to adapt to a changing project and community.

This is our largest step yet toward making an implicit governance structure explicit. The Kubernetes vision has been one of an inclusive and broad community seeking to build software which empowers our users with the portability of containers. The Steering Committee will be a strong leadership voice guiding the project toward success.

The Kubernetes Community is pleased to announce the results of the 2017 Steering Committee Elections. Please congratulate Aaron Crickenberger, Derek Carr, Michelle Noorali, Phillip Wittrock, Quinton Hoole and Timothy St. Clair, who will be joining the members of the Bootstrap Governance committee on the newly formed Kubernetes Steering Committee. Derek, Michelle, and Phillip will serve for 2 years. Aaron, Quinton, and Timothy will serve for 1 year.

This group will meet regularly in order to clarify and streamline the structure and operation of the project. Early work will include electing a representative to the CNCF Governing Board, evolving project processes, refining and documenting the vision and scope of the project, and chartering and delegating to more topical community groups.

Please see the full Steering Committee backlog for more details.

Thursday, September 28, 2017

Kubernetes 1.8: Security, Workloads and Feature Depth

Editor's note: today's post is by Aparna Sinha, Group Product Manager, Kubernetes, Google; Ihor Dvoretskyi, Developer Advocate, CNCF; Jaice Singer DuMars, Kubernetes Ambassador, Microsoft; and Caleb Miles, Technical Program Manager, CoreOS on the latest release of Kubernetes 1.8.

We’re pleased to announce the delivery of Kubernetes 1.8, our third release this year. Kubernetes 1.8 represents a snapshot of many exciting enhancements and refinements underway. In addition to functional improvements, we’re increasing project-wide focus on maturing process, formalizing architecture, and strengthening Kubernetes’ governance model. The evolution of mature processes clearly signals that sustainability is a driving concern, and helps to ensure that Kubernetes is a viable and thriving project far into the future.

Spotlight on security


Kubernetes 1.8 graduates support for role-based access control (RBAC) to stable. RBAC allows cluster administrators to dynamically define roles to enforce access policies through the Kubernetes API. Beta support for filtering outbound traffic through network policies augments existing support for filtering inbound traffic to a pod. RBAC and Network Policies are two powerful tools for enforcing organizational and regulatory security requirements within Kubernetes.


Transport Layer Security (TLS) certificate rotation for the Kubelet graduates to beta. Automatic certificate rotation eases secure cluster operation.

Spotlight on workload support


Kubernetes 1.8 promotes the core Workload APIs to beta with the apps/v1beta2 group and version. The beta contains the current version of Deployment, DaemonSet, ReplicaSet, and StatefulSet. The Workloads APIs provide a stable foundation for migrating existing workloads to Kubernetes as well as developing cloud native applications that target Kubernetes natively.

For those considering running Big Data workloads on Kubernetes, the Workloads API now enables native Kubernetes support in Apache Spark.

Batch workloads, such as nightly ETL jobs, will benefit from the graduation of CronJobs to beta.

Custom Resource Definitions (CRDs) remain in beta for Kubernetes 1.8. A CRD provides a powerful mechanism to extend Kubernetes with user-defined API objects. One use case for CRDs is the automation of complex stateful applications such as key-value stores, databases and storage engines through the Operator Pattern. Expect continued enhancements to CRDs such as validation as stabilization continues.

Spoilers ahead


Volume snapshots, PV resizing, automatic taints, priority pods, kubectl plugins, oh my!

In addition to stabilizing existing functionality, Kubernetes 1.8 offers a number of alpha features that preview new functionality.

Each Special Interest Group (SIG) in the community continues to deliver the most requested user features for their area. For a complete list, please visit the release notes.

Availability


Kubernetes 1.8 is available for download on GitHub. To get started with Kubernetes, check out these interactive tutorials.

Release team


The Release team for 1.8 was led by Jaice Singer DuMars, Kubernetes Ambassador at Microsoft, and comprised 14 individuals responsible for managing all aspects of the release, from documentation to testing, validation, and feature completeness.

As the Kubernetes community has grown, our release process has become an amazing demonstration of collaboration in open source software development. Kubernetes continues to gain new users at a rapid clip. This growth creates a positive feedback cycle where more contributors commit code creating a more vibrant ecosystem.

User Highlights


According to Redmonk, 54 percent of Fortune 100 companies are running Kubernetes in some form with adoption coming from every sector across the world. Recent user stories from the community include:

  • Ancestry.com currently holds 20 billion historical records and 90 million family trees, making it the largest consumer genomics DNA network in the world. With the move to Kubernetes, its deployment time for its Shaky Leaf icon service was cut down from 50 minutes to 2 or 5 minutes.
  • Wink, provider of smart home devices and apps, runs 80 percent of its workloads on a unified stack of Kubernetes-Docker-CoreOS, allowing it to continually innovate and improve its products and services.
  • Pear Deck, a teacher communication app for students, ported their Heroku apps into Kubernetes, allowing them to deploy the exact same configuration in lots of different clusters in 30 seconds.
  • Buffer, social media management for agencies and marketers, has a remote team of 80 spread across a dozen different time zones. Kubernetes has provided the kind of liquid infrastructure where a developer could create an app and deploy it and scale it horizontally as necessary.


Is Kubernetes helping your team? Share your story with the community.

Ecosystem updates


Announced on September 11, Kubernetes Certified Service Providers (KCSPs) are pre-qualified organizations with deep experience helping enterprises successfully adopt Kubernetes. Individual professionals can now register for the new Certified Kubernetes Administrator (CKA) program and exam, which requires passing an online, proctored, performance-based exam that tests one’s ability to solve multiple issues in a hands-on, command-line environment.
CNCF also offers online training that teaches the skills needed to create and configure a real-world Kubernetes cluster.

KubeCon


Join the community at KubeCon + CloudNativeCon in Austin, December 6-8 for the largest Kubernetes gathering ever. The premier Kubernetes event will feature technical sessions, case studies, developer deep dives, salons and more! A full schedule of events and speakers will be available here on September 28. Discounted registration ends October 6.

Open Source Summit EU


Ihor Dvoretskyi, Kubernetes 1.8 features release lead, will present new features and enhancements at Open Source Summit EU in Prague, October 23. Registration is still open.

Get involved


The simplest way to get involved with Kubernetes is by joining one of the many Special Interest Groups (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly community meeting, and through the channels below.


Tuesday, September 26, 2017

Kubernetes StatefulSets & DaemonSets Updates

Editor's note: today's post is by Janet Kuo and Kenneth Owens, Software Engineers at Google.

This post talks about recent updates to the DaemonSet and StatefulSet API objects for Kubernetes. We explore these features using Apache ZooKeeper and Apache Kafka StatefulSets and a Prometheus node exporter DaemonSet.


In Kubernetes 1.6, we added the RollingUpdate update strategy to the DaemonSet API Object. Configuring your DaemonSets with the RollingUpdate strategy causes the DaemonSet controller to perform automated rolling updates to the Pods in your DaemonSets when their spec.template is updated.


In Kubernetes 1.7, we enhanced the DaemonSet controller to track a history of revisions to the PodTemplateSpecs of DaemonSets. This allows the DaemonSet controller to roll back an update. We also added the RollingUpdate strategy to the StatefulSet API Object, and implemented revision history tracking for the StatefulSet controller. Additionally, we added the Parallel pod management policy to support stateful applications that require Pods with unique identities but not ordered Pod creation and termination.

StatefulSet rolling update and Pod management policy

First, we’re going to demonstrate how to use StatefulSet rolling updates and Pod management policies by deploying a ZooKeeper ensemble and a Kafka cluster.

Prerequisites

To follow along, you’ll need to set up a Kubernetes 1.7 cluster with at least 3 schedulable nodes. Each node needs 1 CPU and 2 GiB of memory available. You will also need either a dynamic provisioner to allow the StatefulSet controller to provision 6 persistent volumes (PVs) with 10 GiB each, or you will need to manually provision the PVs prior to deploying the ZooKeeper ensemble or deploying the Kafka cluster.

Deploying a ZooKeeper ensemble

Apache ZooKeeper is a strongly consistent, distributed system used by other distributed systems for cluster coordination and configuration management.


Note: You can create a ZooKeeper ensemble using this zookeeper_mini.yaml manifest. You can learn more about running a ZooKeeper ensemble on Kubernetes here, as well as a more in-depth explanation of the manifest and its contents.


When you apply the manifest, you will see output like the following.


$ kubectl apply -f zookeeper_mini.yaml
service "zk-hs" created
service "zk-cs" created
poddisruptionbudget "zk-pdb" created
statefulset "zk" created


The manifest creates an ensemble of three ZooKeeper servers using a StatefulSet, zk; a Headless Service, zk-hs, to control the domain of the ensemble; a Service, zk-cs, that clients can use to connect to the ready ZooKeeper instances; and a PodDisruptionBudget, zk-pdb, that allows for one planned disruption. (Note that while this ensemble is suitable for demonstration purposes, it isn’t sized correctly for production use.)


If you use kubectl get to watch Pod creation in another terminal you will see that, in contrast to the OrderedReady strategy (the default policy that implements the full version of the StatefulSet guarantees), all of the Pods in the zk StatefulSet are created in parallel.


$ kubectl get po -lapp=zk -w
NAME      READY     STATUS     RESTARTS   AGE
zk-0      0/1       Pending    0          0s
zk-0      0/1       Pending   0          0s
zk-1      0/1       Pending   0          0s
zk-1      0/1       Pending   0          0s
zk-0      0/1       ContainerCreating    0          0s
zk-2      0/1       Pending    0          0s
zk-1      0/1       ContainerCreating   0          0s
zk-2      0/1       Pending    0          0s
zk-2      0/1       ContainerCreating    0          0s
zk-0      0/1       Running   0          10s
zk-2      0/1       Running   0          11s
zk-1      0/1       Running    0          19s
zk-0      1/1       Running    0          20s
zk-1      1/1       Running    0          30s
zk-2      1/1       Running    0          30s


This is because the zookeeper_mini.yaml manifest sets the podManagementPolicy of the StatefulSet to Parallel.


apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
 name: zk
spec:
 serviceName: zk-hs
 replicas: 3
 updateStrategy:
   type: RollingUpdate
 podManagementPolicy: Parallel
...


Many distributed systems, like ZooKeeper, do not require ordered creation and termination for their processes. You can use the Parallel Pod management policy to accelerate the creation and deletion of StatefulSets that manage these systems. Note that, when Parallel Pod management is used, the StatefulSet controller will not block when it fails to create a Pod. Ordered, sequential Pod creation and termination is performed when a StatefulSet’s podManagementPolicy is set to  OrderedReady.
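The difference between the two policies can be pictured as creation "waves". The following shell sketch is purely illustrative: creation_waves is a hypothetical helper, not a kubectl command, and it ignores failures and readiness timing. With OrderedReady each Pod is its own wave, because the controller waits for the previous Pod to be running and ready; with Parallel all Pods are launched in one wave.

```shell
# Hypothetical helper: print one line per creation wave for a StatefulSet
# named $1 with $2 replicas under pod management policy $3.
creation_waves() {
  name=$1
  replicas=$2
  policy=$3
  ordinal=0
  line=""
  while [ "$ordinal" -lt "$replicas" ]; do
    if [ "$policy" = "Parallel" ]; then
      # All Pods are launched together, so they share a single line.
      line="${line:+$line }$name-$ordinal"
    else
      # OrderedReady: each Pod waits for its predecessor to be ready.
      echo "$name-$ordinal"
    fi
    ordinal=$(( ordinal + 1 ))
  done
  if [ -n "$line" ]; then
    echo "$line"
  fi
}

creation_waves zk 3 Parallel
creation_waves zk 3 OrderedReady
```

For the zk StatefulSet, the Parallel call prints a single wave containing all three Pods, while the OrderedReady call prints three waves of one Pod each.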

Deploying a Kafka Cluster

Apache Kafka is a popular distributed streaming platform. Kafka producers write data to partitioned topics which are stored, with a configurable replication factor, on a cluster of brokers. Consumers consume the produced data from the partitions stored on the brokers.


Note: Details of the manifest's contents can be found here. You can learn more about running a Kafka cluster on Kubernetes here.


To create a cluster, you only need to download and apply the kafka_mini.yaml manifest. When you apply the manifest, you will see output like the following:


$ kubectl apply -f kafka_mini.yaml
service "kafka-hs" created
poddisruptionbudget "kafka-pdb" created
statefulset "kafka" created


The manifest creates a three-broker cluster using the kafka StatefulSet; a Headless Service, kafka-hs, to control the domain of the brokers; and a PodDisruptionBudget, kafka-pdb, that allows for one planned disruption. The brokers are configured to use the ZooKeeper ensemble we created above by connecting through the zk-cs Service. As with the ZooKeeper ensemble deployed above, this Kafka cluster is fine for demonstration purposes, but it’s probably not sized correctly for production use.


If you watch Pod creation, you will notice that, like the ZooKeeper ensemble created above, the Kafka cluster uses the Parallel podManagementPolicy.


$ kubectl get po -lapp=kafka -w
NAME      READY     STATUS     RESTARTS   AGE
kafka-0   0/1       Pending    0          0s
kafka-0   0/1       Pending    0          0s
kafka-1   0/1       Pending    0          0s
kafka-1   0/1       Pending    0          0s
kafka-2   0/1       Pending    0          0s
kafka-0   0/1       ContainerCreating   0          0s
kafka-2   0/1       Pending    0          0s
kafka-1   0/1       ContainerCreating   0          0s
kafka-1   0/1       Running   0          11s
kafka-0   0/1       Running   0          19s
kafka-1   1/1       Running   0          23s
kafka-0   1/1       Running   0          32s

Producing and consuming data

You can use kubectl run to execute the kafka-topics.sh script to create a topic named test.


$ kubectl run -ti --image=gcr.io/google_containers/kubernetes-kafka:1.0-10.2.1 createtopic --restart=Never --rm -- kafka-topics.sh --create \
> --topic test \
> --zookeeper zk-cs.default.svc.cluster.local:2181 \
> --partitions 1 \
> --replication-factor 3


Now you can use kubectl run to execute the kafka-console-consumer.sh command to listen for messages.


$ kubectl run -ti --image=gcr.io/google_containers/kubernetes-kafka:1.0-10.2.1 consume --restart=Never --rm -- kafka-console-consumer.sh --topic test --bootstrap-server kafka-0.kafka-hs.default.svc.cluster.local:9093


In another terminal, you can run the kafka-console-producer.sh command.


$ kubectl run -ti --image=gcr.io/google_containers/kubernetes-kafka:1.0-10.2.1 produce --restart=Never --rm \
>  -- kafka-console-producer.sh --topic test --broker-list kafka-0.kafka-hs.default.svc.cluster.local:9093,kafka-1.kafka-hs.default.svc.cluster.local:9093,kafka-2.kafka-hs.default.svc.cluster.local:9093


Output from the second terminal appears in the first terminal. If you continue to produce and consume messages while updating the cluster, you will notice that no messages are lost. You may see error messages as the leader for the partition changes when individual brokers are updated, but the client retries until the message is committed. This is due to the ordered, sequential nature of StatefulSet rolling updates which we will explore further in the next section.


Updating the Kafka cluster
StatefulSet updates are like DaemonSet updates in that they are both configured by setting the spec.updateStrategy of the corresponding API object. When the update strategy is set to OnDelete, the respective controllers will only create new Pods when a Pod in the StatefulSet or DaemonSet has been deleted. When the update strategy is set to RollingUpdate, the controllers will delete and recreate Pods when a modification is made to the spec.template field of a DaemonSet or StatefulSet. You can use rolling updates to change the configuration (via environment variables or command line parameters), resource requests, resource limits, container images, labels, and/or annotations of the Pods in a StatefulSet or DaemonSet. Note that all updates are destructive, always requiring that each Pod in the DaemonSet or StatefulSet be destroyed and recreated. StatefulSet rolling updates differ from DaemonSet rolling updates in that Pod termination and creation are ordered and sequential.


You can patch the kafka StatefulSet to reduce the CPU resource request to 250m.


$ kubectl patch sts kafka --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value":"250m"}]'
statefulset "kafka" patched


If you watch the status of the Pods in the StatefulSet, you will see that each Pod is deleted and recreated in reverse ordinal order (starting with the Pod with the largest ordinal and progressing to the smallest). The controller waits for each updated Pod to be running and ready before updating the subsequent Pod.


$ kubectl get po -lapp=kafka -w
NAME      READY     STATUS    RESTARTS   AGE
kafka-0   1/1       Running   0          13m
kafka-1   1/1       Running   0          13m
kafka-2   1/1       Running   0          13m
kafka-2   1/1       Terminating   0         14m
kafka-2   0/1       Terminating   0         14m
kafka-2   0/1       Terminating   0         14m
kafka-2   0/1       Terminating   0         14m
kafka-2   0/1       Pending   0         0s
kafka-2   0/1       Pending   0         0s
kafka-2   0/1       ContainerCreating   0         0s
kafka-2   0/1       Running   0         10s
kafka-2   1/1       Running   0         21s
kafka-1   1/1       Terminating   0         14m
kafka-1   0/1       Terminating   0         14m
kafka-1   0/1       Terminating   0         14m
kafka-1   0/1       Terminating   0         14m
kafka-1   0/1       Pending   0         0s
kafka-1   0/1       Pending   0         0s
kafka-1   0/1       ContainerCreating   0         0s
kafka-1   0/1       Running   0         11s
kafka-1   1/1       Running   0         21s
kafka-0   1/1       Terminating   0         14m
kafka-0   0/1       Terminating   0         14m
kafka-0   0/1       Terminating   0         14m
kafka-0   0/1       Terminating   0         14m
kafka-0   0/1       Pending   0         0s
kafka-0   0/1       Pending   0         0s
kafka-0   0/1       ContainerCreating   0         0s
kafka-0   0/1       Running   0         10s
kafka-0   1/1       Running   0         22s


Note that unplanned disruptions will not lead to unintentional updates during the update process. That is, the StatefulSet controller will always recreate the Pod at the correct version to ensure the ordering of the update is preserved. If a Pod is deleted, and if it has already been updated, it will be created from the updated version of the StatefulSet’s spec.template. If the Pod has not already been updated, it will be created from the previous version of the StatefulSet’s spec.template. We will explore this further in the following sections.

Staging an update

Depending on how your organization handles deployments and configuration modifications, you may want or need to stage updates to a StatefulSet prior to allowing the roll out to progress. You can accomplish this by setting a partition for the RollingUpdate. When the StatefulSet controller detects a partition in the updateStrategy of a StatefulSet, it will only apply the updated version of the StatefulSet’s spec.template to Pods whose ordinal is greater than or equal to the value of the partition.
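
The partition rule can be sketched as a small shell function. This is only an illustrative model of the ordinal comparison described above, not the controller's implementation, and the helper name pods_to_update is made up for this example.

```shell
# List the Pods that a partitioned RollingUpdate would move to the new
# spec.template: those whose ordinal is >= the partition, largest first.
# Arguments: StatefulSet name, replica count, partition.
# (pods_to_update is a hypothetical helper, not part of kubectl.)
pods_to_update() (
  name=$1
  ordinal=$(( $2 - 1 ))
  partition=$3
  while [ "$ordinal" -ge "$partition" ]; do
    echo "${name}-${ordinal}"
    ordinal=$(( ordinal - 1 ))
  done
)

pods_to_update kafka 3 3   # prints nothing: the update is only staged
pods_to_update kafka 3 2   # prints kafka-2: a single canary
pods_to_update kafka 3 0   # prints kafka-2, kafka-1, kafka-0: full roll out
```

With three replicas, a partition of 3 selects no Pods, which is why the patch that follows stages the update without starting it.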


You can patch the kafka StatefulSet to add a partition to the RollingUpdate update strategy. If you set the partition to a number greater than or equal to the StatefulSet’s spec.replicas (as below), any subsequent updates you perform to the StatefulSet’s spec.template will be staged for roll out, but the StatefulSet controller will not start a rolling update.


$ kubectl patch sts kafka -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":3}}}}'
statefulset "kafka" patched


If you patch the StatefulSet to set the requested CPU to 0.3, you will notice that none of the Pods are updated.


$ kubectl patch sts kafka --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/requests/cpu", "value":"0.3"}]'
statefulset "kafka" patched


Even if you delete a Pod and wait for the StatefulSet controller to recreate it, you will notice that the Pod is recreated with the current CPU request (250m) rather than the staged value of 0.3.


$ kubectl delete po kafka-1
pod "kafka-1" deleted

$ kubectl get po kafka-1 -w
NAME      READY     STATUS              RESTARTS   AGE
kafka-1   0/1       ContainerCreating   0          10s
kafka-1   0/1       Running   0         19s
kafka-1   1/1       Running   0         21s

$ kubectl get po kafka-1 -o yaml
apiVersion: v1
kind: Pod
metadata:
 ...
   resources:
     requests:
       cpu: 250m
       memory: 1Gi

Rolling out a canary

Often, we want to verify an image update or configuration change on a single instance of an application before rolling it out globally. If you modify the partition created above to be 2, the StatefulSet controller will roll out a canary that can be used to verify that the update is working as intended.


$ kubectl patch sts kafka -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'
statefulset "kafka" patched


You can watch the StatefulSet controller update the kafka-2 Pod and pause after the update is complete.


$ kubectl get po -lapp=kafka -w
NAME      READY     STATUS    RESTARTS   AGE
kafka-0   1/1       Running   0          50m
kafka-1   1/1       Running   0          10m
kafka-2   1/1       Running   0          29s
kafka-2   1/1       Terminating   0         34s
kafka-2   0/1       Terminating   0         38s
kafka-2   0/1       Terminating   0         39s
kafka-2   0/1       Terminating   0         39s
kafka-2   0/1       Pending   0         0s
kafka-2   0/1       Pending   0         0s
kafka-2   0/1       Terminating   0         20s
kafka-2   0/1       Terminating   0         20s
kafka-2   0/1       Pending   0         0s
kafka-2   0/1       Pending   0         0s
kafka-2   0/1       ContainerCreating   0         0s
kafka-2   0/1       Running   0         19s
kafka-2   1/1       Running   0         22s

Phased roll outs

Similar to rolling out a canary, you can roll out updates based on a phased progression (e.g. linear, geometric, or exponential roll outs).


If you patch the kafka StatefulSet to set the partition to 1, the StatefulSet controller updates one more broker.


$ kubectl patch sts kafka -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":1}}}}'
statefulset "kafka" patched


If you set it to 0, the StatefulSet controller updates the final broker and completes the update.


$ kubectl patch sts kafka -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":0}}}}'
statefulset "kafka" patched


Note that you don’t have to decrement the partition by one. For a larger StatefulSet--for example, one with 100 replicas--you might use a progression more like 100, 99, 90, 50, 0. In this case, you would stage your update, deploy a canary, roll out to 10 instances, update fifty percent of the Pods, and then complete the update.
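
As a rough sketch of that progression, the script below prints (but does not run) the patch command for each stage, along with the number of Pods that would be on the new template once the stage completes. The helper names updated_count and phased_rollout_plan are invented for this example. In practice you would verify the health of the cluster between stages before applying the next patch.

```shell
# Number of Pods on the new template once the controller has caught up
# with a given partition: every ordinal >= partition.
# (updated_count and phased_rollout_plan are hypothetical helpers.)
updated_count() (
  replicas=$1
  partition=$2
  if [ "$partition" -ge "$replicas" ]; then
    echo 0
  else
    echo $(( replicas - partition ))
  fi
)

# Print the patch command for each stage of a phased roll out.
# Arguments: StatefulSet name, replica count, then the partition sequence.
phased_rollout_plan() (
  name=$1
  replicas=$2
  shift 2
  for partition in "$@"; do
    printf '# partition=%s -> %s of %s Pods updated\n' \
      "$partition" "$(updated_count "$replicas" "$partition")" "$replicas"
    printf 'kubectl patch sts %s -p %s\n' "$name" \
      "'{\"spec\":{\"updateStrategy\":{\"type\":\"RollingUpdate\",\"rollingUpdate\":{\"partition\":$partition}}}}'"
  done
)

phased_rollout_plan kafka 100 100 99 90 50 0
```

For the sequence above, the stages cover 0, 1, 10, 50, and 100 Pods respectively.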

Cleaning up

To delete the API Objects created above, you can use kubectl delete on the two manifests you used to create the ZooKeeper ensemble and the Kafka cluster.


$ kubectl delete -f kafka_mini.yaml
service "kafka-hs" deleted
poddisruptionbudget "kafka-pdb" deleted
statefulset "kafka" deleted

$ kubectl delete -f zookeeper_mini.yaml
service "zk-hs" deleted
service "zk-cs" deleted
poddisruptionbudget "zk-pdb" deleted
statefulset "zk" deleted


By design, the StatefulSet controller does not delete any persistent volume claims (PVCs): the PVCs created for the ZooKeeper ensemble and the Kafka cluster must be manually deleted. Depending on the storage reclamation policy of your cluster, you may also need to manually delete the backing PVs.
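
PVCs created from a StatefulSet's volumeClaimTemplates are named after the template, the StatefulSet, and the Pod ordinal, so the claims left behind can be enumerated with a short helper. This is a sketch only: sts_pvc_names is a made-up function, and the template name datadir is an assumption. Check the actual claim names with kubectl get pvc before deleting anything.

```shell
# PVCs created from a volumeClaimTemplate are named
# <template name>-<StatefulSet name>-<ordinal>.
# Arguments: template name, StatefulSet name, replica count.
# (sts_pvc_names is a hypothetical helper, not part of kubectl.)
sts_pvc_names() (
  template=$1
  name=$2
  replicas=$3
  ordinal=0
  while [ "$ordinal" -lt "$replicas" ]; do
    echo "${template}-${name}-${ordinal}"
    ordinal=$(( ordinal + 1 ))
  done
)

# "datadir" is an assumed template name; substitute the one in your manifest.
sts_pvc_names datadir kafka 3
# To delete the claims: kubectl delete pvc $(sts_pvc_names datadir kafka 3)
```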

DaemonSet rolling update, history, and rollback

In this section, we’re going to show you how to perform a rolling update on a DaemonSet, look at its history, and then perform a rollback after a bad rollout. We will use a DaemonSet to deploy a Prometheus node exporter on each Kubernetes node in the cluster. These node exporters export node metrics to the Prometheus monitoring system. For the sake of simplicity, we’ve omitted the installation of the Prometheus server and the service for communication with DaemonSet pods from this blog post.

Prerequisites

To follow along with this section of the blog, you need a working Kubernetes 1.7 cluster and kubectl version 1.7 or later. If you followed along with the first section, you can use the same cluster.

DaemonSet rolling update: Prometheus node exporters

First, prepare the node exporter DaemonSet manifest to run a v0.13 Prometheus node exporter on every node in the cluster:


$ cat > node-exporter-v0.13.yaml <<EOF
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
 name: node-exporter
spec:
 updateStrategy:
   type: RollingUpdate
 template:
   metadata:
     labels:
       app: node-exporter
     name: node-exporter
   spec:
     containers:
     - image: prom/node-exporter:v0.13.0
       name: node-exporter
       ports:
       - containerPort: 9100
         hostPort: 9100
         name: scrape
     hostNetwork: true
     hostPID: true
EOF


Note that you need to enable the DaemonSet rolling update feature by explicitly setting DaemonSet .spec.updateStrategy.type to RollingUpdate.


Apply the manifest to create the node exporter DaemonSet:


$ kubectl apply -f node-exporter-v0.13.yaml --record
daemonset "node-exporter" created


Wait for the first DaemonSet rollout to complete:


$ kubectl rollout status ds node-exporter
daemon set "node-exporter" successfully rolled out


You should see that each of your nodes runs one copy of the node exporter pod:


$ kubectl get pods -l app=node-exporter -o wide


To perform a rolling update on the node exporter DaemonSet, prepare a manifest that includes the v0.14 Prometheus node exporter:


$ cat node-exporter-v0.13.yaml | sed "s/v0.13.0/v0.14.0/g" > node-exporter-v0.14.yaml


Then apply the v0.14 node exporter DaemonSet:


$ kubectl apply -f node-exporter-v0.14.yaml --record
daemonset "node-exporter" configured


Wait for the DaemonSet rolling update to complete:


$ kubectl rollout status ds node-exporter
...
Waiting for rollout to finish: 3 out of 4 new pods have been updated...
Waiting for rollout to finish: 3 of 4 updated pods are available...
daemon set "node-exporter" successfully rolled out


We just triggered a DaemonSet rolling update by updating the DaemonSet template. By default, one old DaemonSet pod will be killed and one new DaemonSet pod will be created at a time.
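
The one-at-a-time default corresponds to the DaemonSet's spec.updateStrategy.rollingUpdate.maxUnavailable field, which defaults to 1. If you want the rollout to replace more pods in parallel, a patch along the following lines should work. The helper name ds_max_unavailable_patch is invented for this sketch, and it only handles integer values (a percentage would need to be quoted as a JSON string).

```shell
# Build a patch that sets how many DaemonSet pods may be unavailable
# at once during a rolling update. Argument: an integer.
# (ds_max_unavailable_patch is a hypothetical helper, not part of kubectl.)
ds_max_unavailable_patch() (
  printf '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":%s}}}}\n' "$1"
)

# Print the patch that allows two pods to be replaced at a time.
ds_max_unavailable_patch 2
# Apply it with:
#   kubectl patch ds node-exporter -p "$(ds_max_unavailable_patch 2)"
```

A larger value speeds up the rollout, but it also lets a bad template reach more nodes before the rollout stalls.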


Now we’ll cause a rollout to fail by updating the image to an invalid value:


$ cat node-exporter-v0.13.yaml | sed "s/v0.13.0/bad/g" > node-exporter-bad.yaml

$ kubectl apply -f node-exporter-bad.yaml --record
daemonset "node-exporter" configured


Notice that the rollout never finishes:


$ kubectl rollout status ds node-exporter
Waiting for rollout to finish: 0 out of 4 new pods have been updated...
Waiting for rollout to finish: 1 out of 4 new pods have been updated...
# Use ^C to exit


This behavior is expected. We mentioned earlier that a DaemonSet rolling update kills and creates one pod at a time. Because the new pod never becomes available, the rollout is halted, preventing the invalid specification from propagating to more than one node. StatefulSet rolling updates implement the same behavior with respect to failed deployments. An unsuccessful update stays blocked until it is corrected, either by rolling back or by rolling forward with a corrected specification.


$ kubectl get pods -l app=node-exporter
NAME                  READY     STATUS         RESTARTS   AGE
node-exporter-f2n14   0/1       ErrImagePull   0          3m
...

# N = number of nodes
$ kubectl get ds node-exporter
NAME            DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
node-exporter   N         N         N-1       1            N           <none>          46m


DaemonSet history, rollbacks, and rolling forward

Next, perform a rollback. Take a look at the node exporter DaemonSet rollout history:


$ kubectl rollout history ds node-exporter
daemonsets "node-exporter"
REVISION        CHANGE-CAUSE
1               kubectl apply --filename=node-exporter-v0.13.yaml --record=true
2               kubectl apply --filename=node-exporter-v0.14.yaml --record=true
3               kubectl apply --filename=node-exporter-bad.yaml --record=true


Check the details of the revision you want to roll back to:


$ kubectl rollout history ds node-exporter --revision=2
daemonsets "node-exporter" with revision #2
Pod Template:
 Labels:       app=node-exporter
 Containers:
  node-exporter:
   Image:      prom/node-exporter:v0.14.0
   Port:       9100/TCP
   Environment:        <none>
   Mounts:     <none>
 Volumes:      <none>


You can quickly roll back to any DaemonSet revision you found through kubectl rollout history:


# Roll back to the last revision
$ kubectl rollout undo ds node-exporter
daemonset "node-exporter" rolled back

# Or use --to-revision to roll back to a specific revision
$ kubectl rollout undo ds node-exporter --to-revision=2
daemonset "node-exporter" rolled back


A DaemonSet rollback is done by rolling forward. Therefore, after the rollback, DaemonSet revision 2 becomes revision 4 (current revision):


$ kubectl rollout history ds node-exporter
daemonsets "node-exporter"
REVISION        CHANGE-CAUSE
1               kubectl apply --filename=node-exporter-v0.13.yaml --record=true
3               kubectl apply --filename=node-exporter-bad.yaml --record=true
4               kubectl apply --filename=node-exporter-v0.14.yaml --record=true
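
The renumbering seen in this history can be mimicked with a toy model. The sketch below is not how kubectl or the controller is implemented. It only reproduces the bookkeeping shown in this section, and history_after_undo is a made-up name.

```shell
# Toy model of "rollback by rolling forward": the target revision is
# removed from the history and re-recorded as (highest revision + 1).
# Arguments: target revision, then the existing revision numbers.
# (history_after_undo is a hypothetical helper, not part of kubectl.)
history_after_undo() (
  target=$1
  shift
  max=0
  found=no
  kept=""
  for rev in "$@"; do
    if [ "$rev" -gt "$max" ]; then max=$rev; fi
    if [ "$rev" -eq "$target" ]; then found=yes; else kept="$kept $rev"; fi
  done
  if [ "$found" = no ]; then
    echo "revision $target not in history" >&2
    exit 1
  fi
  if [ "$target" -eq "$max" ]; then
    # The target is already the current revision: nothing changes.
    echo "$@"
    exit 0
  fi
  # Word splitting is intentional here to normalize the spacing.
  echo $kept $(( max + 1 ))
)

history_after_undo 2 1 2 3   # prints: 1 3 4
```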


The node exporter DaemonSet is now healthy again:


$ kubectl rollout status ds node-exporter
daemon set "node-exporter" successfully rolled out

# N = number of nodes
$ kubectl get ds node-exporter
NAME            DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
node-exporter   N         N         N         N            N           <none>          46m


If the current DaemonSet revision is specified while performing a rollback, the rollback is skipped:


$ kubectl rollout undo ds node-exporter --to-revision=4
daemonset "node-exporter" skipped rollback (current template already matches revision 4)


If the specified DaemonSet revision is not found, kubectl reports an error:


$ kubectl rollout undo ds node-exporter --to-revision=10
error: unable to find specified revision 10 in history


Note that kubectl rollout history and kubectl rollout status support StatefulSets, too!

Cleaning up

$ kubectl delete ds node-exporter


What’s next for DaemonSet and StatefulSet

Rolling updates and roll backs close an important feature gap for DaemonSets and StatefulSets. As we plan for Kubernetes 1.8, we want to continue to focus on advancing the core controllers to GA. This likely means that some advanced feature requests (e.g. automatic roll back, infant mortality detection) will be deferred in favor of ensuring the consistency, usability, and stability of the core controllers. We welcome feedback and contributions, so please feel free to reach out on Slack, to ask questions on Stack Overflow, or open issues or pull requests on GitHub.


  • Post questions (or answer questions) on Stack Overflow
  • Join the community portal for advocates on K8sPort
  • Follow us on Twitter @Kubernetesio for latest updates
  • Connect with the community on Slack
  • Get involved with the Kubernetes project on GitHub