An open source system for automating deployment, scaling, and operations of applications.

Wednesday, April 29, 2015

Weekly Kubernetes Community Hangout Notes - April 24 2015


Every week the Kubernetes contributing community meets virtually over Google Hangouts. We want anyone who's interested to know what's discussed in this forum.

Agenda:
  • Flocker and Kubernetes integration demo
Notes:
  • Flocker and Kubernetes integration demo
    • Cool demo by Kai Davenport
  • Flocker Q/A
    • Does the file still exist on node1 after migration?
      • Luke: It still exists, but is unmounted and cannot be written to. Data persists and can be used. Working on support for multiple storage backends.
    • Brendan: Any plan to make this a volume, so we don't need Powerstrip?
      • Luke: Need to figure out interest to decide if we want to make it a first-class persistent disk provider in kube.
      • Brendan: Removing the need for Powerstrip would make it simpler to use. Totally go for it.
      • Tim: Should take no more than 45 minutes to add it to Kubernetes :)
    • Derek: Contrast this with persistent volumes and claims?
      • Luke: Not much difference, except for the novel ZFS-based backend. Makes workloads really portable.
      • Tim: Very different from network-based volumes. It's interesting that it is the only offering that allows upgrading media.
      • Brendan: On claims, how does it look for replicated claims? E.g. Cassandra wants to have replicated data underneath. It would be efficient to scale up and down. Create storage on the fly dynamically based on load. It's a step beyond taking snapshots - programmatically creating replicas with preallocation.
      • Tim: helps with auto-provisioning.
    • Brian: Does Flocker require any other components?
      • Kai: The Flocker control service is co-located with the master (diagram on blog post). Powerstrip + Powerstrip Flocker. Very interested in persisting state in etcd. It keeps metadata about each volume.
      • Brendan: In the future, Flocker can be a plugin and we'll take care of persistence. Post v1.0.
      • Brian: Interested in adding a generic plugin for services like Flocker.
      • Luke: ZFS can become really valuable when scaling to a lot of containers on a single node.
    • Alex: Can the Flocker service be run as a pod?
      • Kai: Yes, the only requirement is that the Flocker control service be able to talk to the ZFS agent. The ZFS agent needs to be installed on the host and the ZFS binaries need to be accessible.
      • Brendan: In theory, all the ZFS bits can be put into a container with devices.
      • Luke: Yes, still working through the cross-container mounting issue.
      • Tim: pmorie is working through it to make kubelet work in a container. Possible re-use.
    • Kai: Cinder support is coming. A few days away.
  • Bob: What’s the process of pushing kube to GKE? Need more visibility for confidence.

Thursday, April 23, 2015

Borg: The Predecessor to Kubernetes

Google has been running containerized workloads in production for more than a decade. Whether it's service jobs like web front-ends and stateful servers, infrastructure systems like Bigtable and Spanner, or batch frameworks like MapReduce and MillWheel, virtually everything at Google runs as a container. Today, we took the wraps off of Borg, Google’s long-rumored internal container-oriented cluster-management system, publishing details at the academic computer systems conference EuroSys. You can find the paper here.


Kubernetes traces its lineage directly from Borg. Many of the developers at Google working on Kubernetes were formerly developers on the Borg project. We've incorporated the best ideas from Borg in Kubernetes, and have tried to address some pain points that users identified with Borg over the years.


To give you a flavor, here are four Kubernetes features that came from our experiences with Borg:


1) Pods. A pod is the unit of scheduling in Kubernetes. It is a resource envelope in which one or more containers run. Containers that are part of the same pod are guaranteed to be scheduled together onto the same machine, and can share state via local volumes.


Borg has a similar abstraction, called an alloc (short for “resource allocation”). Popular uses of allocs in Borg include running a web server that generates logs alongside a lightweight log collection process that ships the log to a cluster filesystem (not unlike fluentd or logstash); running a web server that serves data from a disk directory that is populated by a process that reads data from a cluster filesystem and prepares/stages it for the web server (not unlike a Content Management System); and running user-defined processing functions alongside a storage shard. Pods not only support these use cases, but they also provide an environment similar to running multiple processes in a single VM -- Kubernetes users can deploy multiple co-located, cooperating processes in a pod without having to give up the simplicity of a one-application-per-container deployment model.
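
To make this concrete, here is a minimal sketch of a pod manifest with two cooperating containers sharing a local volume, written in the style of the v1beta3 API discussed later in this post. The image and volume names are illustrative, not taken from the Borg paper or the Kubernetes examples:

    apiVersion: v1beta3
    kind: Pod
    metadata:
      name: web-with-log-shipper        # hypothetical name
    spec:
      volumes:
      - name: logs
        emptyDir: {}                    # scratch volume shared by both containers
      containers:
      - name: web
        image: nginx                    # serves traffic and writes access logs
        volumeMounts:
        - name: logs
          mountPath: /var/log/nginx
      - name: log-shipper
        image: example.com/log-shipper  # hypothetical sidecar that ships logs elsewhere
        volumeMounts:
        - name: logs
          mountPath: /logs

Both containers are guaranteed to land on the same machine and see the same "logs" volume, mirroring the Borg alloc pattern of a server plus a log-collection helper.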


2) Services. Although Borg’s primary role is to manage the lifecycles of tasks and machines, the applications that run on Borg benefit from many other cluster services, including naming and load balancing. Kubernetes supports naming and load balancing using the service abstraction: a service has a name and maps to a dynamic set of pods defined by a label selector (see next section). Any container in the cluster can connect to the service using the service name. Under the covers, Kubernetes automatically load-balances connections to the service among the pods that match the label selector, and keeps track of where the pods are running as they get rescheduled over time due to failures.
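
As a rough sketch (v1beta3-style YAML; names are illustrative and field names are approximate for the API of this era), a service that fronts all pods carrying the label job: frontend might look like this:

    apiVersion: v1beta3
    kind: Service
    metadata:
      name: frontend              # clients connect using this name
    spec:
      selector:
        job: frontend             # dynamic set of pods backing the service
      ports:
      - port: 80                  # port exposed by the service

Clients simply connect to "frontend"; Kubernetes keeps the set of backing pods up to date as they are rescheduled.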


3) Labels. A container in Borg is usually one replica in a collection of identical or nearly identical containers that correspond to one tier of an Internet service (e.g. the front-ends for Google Maps) or to the workers of a batch job (e.g. a MapReduce). The collection is called a Job, and each replica is called a Task. While the Job is a very useful abstraction, it can be limiting. For example, users often want to manage their entire service (composed of many Jobs) as a single entity, or to uniformly manage several related instances of their service, for example separate canary and stable release tracks. At the other end of the spectrum, users frequently want to reason about and control subsets of tasks within a Job -- the most common example is during rolling updates, when different subsets of the Job need to have different configurations.


Kubernetes supports more flexible collections than Borg by organizing pods using labels, which are arbitrary key/value pairs that users attach to pods (and in fact to any object in the system). Users can create groupings equivalent to Borg Jobs by using a “job:<jobname>” label on their pods, but they can also use additional labels to tag the service name, service instance (production, staging, test), and in general, any subset of their pods. A label query (called a “label selector”) is used to select which set of pods an operation should be applied to. Taken together, labels and replication controllers allow for very flexible update semantics, as well as for operations that span the equivalent of Borg Jobs.
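
For example (v1beta3-style YAML, hypothetical names), a replication controller for the canary track of a frontend service can select exactly the pods that carry both the job and track labels:

    apiVersion: v1beta3
    kind: ReplicationController
    metadata:
      name: frontend-canary
    spec:
      replicas: 1
      selector:                    # label selector: which pods this controller manages
        job: frontend
        track: canary
      template:                    # pod template used to create replicas
        metadata:
          labels:
            job: frontend          # equivalent of the Borg Job grouping
            track: canary          # extra dimension: canary vs. stable
        spec:
          containers:
          - name: frontend
            image: example.com/frontend:canary   # hypothetical image

A stable-track controller would differ only in its track label, while a label query like "job=frontend" still spans both tracks at once.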


4) IP-per-Pod. In Borg, all tasks on a machine use the IP address of that host, and thus share the host’s port space. While this means Borg can use a vanilla network, it imposes a number of burdens on infrastructure and application developers: Borg must schedule ports as a resource; tasks must pre-declare how many ports they need, and take as start-up arguments which ports to use; the Borglet (node agent) must enforce port isolation; and the naming and RPC systems must handle ports as well as IP addresses.


Thanks to the advent of software-defined overlay networks such as flannel or those built into public clouds, Kubernetes is able to give every pod and service its own IP address. This removes the infrastructure complexity of managing ports, and allows developers to choose any ports they want rather than requiring their software to adapt to the ones chosen by the infrastructure. The latter point is crucial for making it easy to run off-the-shelf open-source applications on Kubernetes--pods can be treated much like VMs or physical hosts, with access to the full port space, oblivious to the fact that they may be sharing the same physical machine with other pods.
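
The practical effect is that pods no longer compete for the host's port space. As a hedged sketch (v1beta3-style YAML, illustrative names), two copies of a pod like this can both listen on port 80, even on the same machine, because each pod has its own IP:

    apiVersion: v1beta3
    kind: Pod
    metadata:
      name: web-a                  # a second pod "web-b" could be identical
    spec:
      containers:
      - name: web
        image: nginx
        ports:
        - containerPort: 80        # no hostPort: the port belongs to the pod's own IP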


With the growing popularity of container-based microservice architectures, the lessons Google has learned from running such systems internally have become of increasing interest to the external DevOps community. By revealing some of the inner workings of our cluster manager Borg, and building our next-generation cluster manager as both an open-source project (Kubernetes) and a publicly available hosted service (Google Container Engine), we hope these lessons can benefit the broader community outside of Google and advance the state-of-the-art in container scheduling and cluster management.  

Wednesday, April 22, 2015

Kubernetes and the Mesosphere DCOS


Today Mesosphere announced the addition of Kubernetes as a standard part of their DCOS offering.  This is a great step forward in bringing cloud native application management to the world, and it should lay to rest many of the questions we hear about ‘Kubernetes or Mesos, which one should I use?’.  Now you can have your cake and eat it too: use both.  Today’s announcement extends the reach of Kubernetes to a new class of users, and adds some exciting new capabilities for everyone.
By way of background, Kubernetes is a cluster management framework that was started by Google nine months ago, inspired by the internal system known as Borg.  You can learn a little more about Borg by checking out this paper.  At the heart of it, Kubernetes offers what has been dubbed ‘cloud native’ application management.  To us, there are three things that together make something ‘cloud native’:

  • Container oriented deployments.  Package up your application components with all their dependencies and deploy them using technologies like Docker or Rocket.  Containers radically simplify the deployment process, making rollouts repeatable and predictable.
  • Dynamically managed.  Rely on modern control systems to make moment-to-moment decisions about the health management and scheduling of applications, radically improving reliability and efficiency.  There are some things that machines simply do better than people, and actively running applications is one of those things.  
  • Micro-services oriented.  Tease applications apart into small semi-autonomous services that can be consumed easily so that the resulting systems are easier to understand, extend and adapt.

Kubernetes was designed from the start to make these capabilities available to everyone, and was built by the same engineers who built the system internally known as Borg.  For many users the promise of ‘Google-style app management’ is interesting, but they want to run these new classes of applications on the same set of physical resources as their existing workloads like Hadoop, Spark, Kafka, etc.  Now they will have access to a commercially supported offering that brings the two worlds together.

Mesosphere, one of the earliest supporters of the Kubernetes project, has been working closely with the core Kubernetes team to create a natural experience for users looking to get the best of both worlds, adding Kubernetes to every Mesos deployment they instantiate, whether it be in the public cloud, private cloud, or in a hybrid deployment model.  This is well aligned with the overall Kubernetes vision of creating a ubiquitous management framework that runs anywhere a container can.  It will be interesting to see how you blend together the old world and the new on a commercially supported, versatile platform.

Craig McLuckie
Product Manager, Google and Kubernetes co-founder

Friday, April 17, 2015

Weekly Kubernetes Community Hangout Notes - April 17 2015


Every week the Kubernetes contributing community meets virtually over Google Hangouts. We want anyone who's interested to know what's discussed in this forum.

Agenda
  • Mesos Integration
  • High Availability (HA)
  • Adding performance and profiling details to e2e to track regressions
  • Versioned clients

Notes

  • Mesos integration
  • HA
    • Proposal should land today.
    • Etcd cluster.
    • Load-balance apiserver.
    • Cold standby for controller manager and other master components.
  • Adding performance and profiling details to e2e to track regression
  • Versioned clients
  • Security context
  • Discussing upstreaming of users, etc. into Kubernetes, at least as optional
  • 1.0 Roadmap
    • Focus is performance, stability, cluster upgrades
    • TJ has been making some edits to roadmap.md but hasn’t sent out a PR yet
  • Kubernetes UI
    • Dependencies broken out into third-party
    • @lavalamp is reviewer

Thursday, April 16, 2015

Introducing Kubernetes API Version v1beta3

We've been hard at work on cleaning up the API over the past several months (see https://github.com/GoogleCloudPlatform/kubernetes/issues/1519 for details). The result is v1beta3, which is considered to be the release candidate for the v1 API.

We would like you to move to this new API version as soon as possible. v1beta1 and v1beta2 are deprecated, and will be removed by the end of June, shortly after we introduce the v1 API.

As of the latest release, v0.15.0, v1beta3 is the primary, default API. We have changed the default kubectl and client API versions as well as the default storage version (which means objects persisted in etcd will be converted from v1beta1 to v1beta3 as they are rewritten). 

You can take a look at v1beta3 examples such as:

To aid the transition, we've also created a conversion tool and put together a list of the most important API changes; a brief before/after sketch follows the list.

  • The resource id is now called name.
  • name, labels, annotations, and other metadata are now nested in a map called metadata
  • desiredState is now called spec, and currentState is now called status
  • /minions has been moved to /nodes, and the resource has kind Node
  • The namespace is required (for all namespaced resources) and has moved from a URL parameter to the path: /api/v1beta3/namespaces/{namespace}/{resource_collection}/{resource_name}
  • The names of all resource collections are now lowercased - instead of replicationControllers, use replicationcontrollers.
  • To watch for changes to a resource, open an HTTP or WebSocket connection to the collection URL and provide the ?watch=true URL parameter along with the desired resourceVersion parameter to watch from.
  • The container entrypoint has been renamed to command, and command has been renamed to args.
  • Container, volume, and node resources are expressed as nested maps (e.g., resources: {cpu: 1}) rather than as individual fields, and resource values support scaling suffixes rather than fixed scales (e.g., milli-cores).
  • Restart policy is represented simply as a string (e.g., "Always") rather than as a nested map (e.g., {always: {}}).
  • The volume source is inlined into volume rather than nested.
  • Host volumes have been changed from hostDir to hostPath to better reflect that they can be files or directories
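
To illustrate several of these changes together, here is a rough before/after sketch of the same pod. It is illustrative only; field placement in the v1beta1 form is approximate, so treat the conversion tool and the linked examples as the authoritative reference:

    # v1beta1 (deprecated)
    kind: Pod
    apiVersion: v1beta1
    id: web                                  # "id" becomes metadata.name
    labels:
      tier: frontend                         # labels move under metadata
    desiredState:                            # desiredState becomes spec
      manifest:
        version: v1beta1
        id: web
        containers:
        - name: web
          image: nginx
          entrypoint: ["nginx"]              # entrypoint becomes command
          command: ["-g", "daemon off;"]     # command becomes args

    # v1beta3
    kind: Pod
    apiVersion: v1beta3
    metadata:
      name: web
      labels:
        tier: frontend
    spec:
      containers:
      - name: web
        image: nginx
        command: ["nginx"]
        args: ["-g", "daemon off;"]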

And the most recently generated Swagger specification of the API is here:

More details about our approach to API versioning and the transition can be found here:

Another consequence of the change to the default API version in kubectl is that commands that use "-o template" will break unless you specify "--api-version=v1beta1" or update your templates to v1beta3 syntax. An example of such a change can be seen here:

If you use "-o template", I recommend always explicitly specifying the API version rather than relying upon the default. We may add this setting to kubeconfig in the future.

Let us know if you have any questions. As always, we're available on IRC (#google-containers) and GitHub issues.

Kubernetes Release: 0.15.0

Release Notes:

  • Enables v1beta3 API and sets it to the default API version (#6098)
  • Added multi-port Services (#6182)
  • New Getting Started Guides
    • Multi-node local startup guide (#6505)
    • Mesos on Google Cloud Platform (#5442)
    • Ansible Setup instructions (#6237)
  • Added a controller framework (#5270, #5473)
  • The Kubelet now listens on a secure HTTPS port (#6380)
  • Made kubectl errors more user-friendly (#6338)
  • The apiserver now supports client cert authentication (#6190)
  • The apiserver now limits the number of concurrent requests it processes (#6207)
  • Added rate limiting to pod deleting (#6355)
  • Implement Balanced Resource Allocation algorithm as a PriorityFunction in scheduler package (#6150)
  • Enabled log collection from master (#6396)
  • Added an api endpoint to pull logs from Pods (#6497)
  • Added latency metrics to scheduler (#6368)
  • Added latency metrics to REST client (#6409)
  • etcd now runs in a pod on the master (#6221)
  • nginx now runs in a container on the master (#6334)
  • Began creating Docker images for master components (#6326)
  • Updated GCE provider to work with gcloud 0.9.54 (#6270)
  • Updated AWS provider to fix Region vs Zone semantics (#6011)
  • Record event when image GC fails (#6091)
  • Add a QPS limiter to the kubernetes client (#6203)
  • Decrease the time it takes to run make release (#6196)
  • New volume support
    • Added iscsi volume plugin (#5506)
    • Added glusterfs volume plugin (#6174)
    • AWS EBS volume support (#5138)
  • Updated heapster to v0.10.0 (#6331)
  • Updated to etcd 2.0.9 (#6544)
  • Updated Kibana to v1.2 (#6426)
  • Bug Fixes
    • Kube-proxy now updates iptables rules if a service's public IPs change (#6123)
    • Retry kube-addons creation if the initial creation fails (#6200)
    • Make kube-proxy more resilient to running out of file descriptors (#6727)

To download, please visit https://github.com/GoogleCloudPlatform/kubernetes/releases/tag/v0.15.0