An open source system for automating deployment, scaling, and operations of applications.

Tuesday, March 28, 2017

Kubernetes 1.6: Multi-user, Multi-workloads at Scale

Today we’re announcing the release of Kubernetes 1.6.

In this release the community’s focus is on scale and automation, to help you deploy multiple workloads to multiple users on a cluster. We are announcing that 5,000 node clusters are supported. We moved dynamic storage provisioning to stable. Role-based access control (RBAC), kubefed, kubeadm, and several scheduling features are moving to beta. We have also added intelligent defaults throughout to enable greater automation out of the box.

What’s New

Scale and Federation: Large enterprise users looking for proof of at-scale performance will be pleased to know that Kubernetes’ stringent scalability SLO now supports 5,000 node (150,000 pod) clusters. This 150% increase in total cluster size, powered by a new version of etcd v3 by CoreOS, is great news if you are deploying applications such as search or games which can grow to consume larger clusters.

For users who want to scale beyond 5,000 nodes or spread across multiple regions or clouds, federation lets you combine multiple Kubernetes clusters and address them through a single API endpoint. In this release, the kubefed command line utility graduated to beta - with improved support for on-premise clusters. kubefed now automatically configures kube-dns on joining clusters and can pass arguments to federated components.

Security and Setup: Users concerned with security will find that RBAC, now beta adds a significant security benefit through more tightly scoped default roles for system components. The default RBAC policies in 1.6 grant scoped permissions to control-plane components, nodes, and controllers. RBAC allows cluster administrators to selectively grant particular users or service accounts fine-grained access to specific resources on a per-namespace basis. RBAC users upgrading from 1.5 to 1.6 should view the guidance here

Users looking for an easy way to provision a secure cluster on physical or cloud servers can use kubeadm, which is now beta. kubeadm has been enhanced with a set of command line flags and a base feature set that includes RBAC setup, use of the Bootstrap Token system and an enhanced Certificates API.

Advanced Scheduling: This release adds a set of powerful and versatile scheduling constructs to give you greater control over how pods are scheduled, including rules to restrict pods to particular nodes in heterogeneous clusters, and rules to spread or pack pods across failure domains such as nodes, racks, and zones.

Node affinity/anti-affinity, now in beta, allows you to restrict pods to schedule only on certain nodes based on node labels. Use built-in or custom node labels to select specific zones, hostnames, hardware architecture, operating system version, specialized hardware, etc. The scheduling rules can be required or preferred, depending on how strictly you want the scheduler to enforce them.

A related feature, called taints and tolerations, makes it possible to compactly represent rules for excluding pods from particular nodes. The feature, also now in beta, makes it easy, for example, to dedicate sets of nodes to particular sets of users, or to keep nodes that have special hardware available for pods that need the special hardware by excluding pods that don’t need it.

Sometimes you want to co-schedule services, or pods within a service, near each other topologically, for example to optimize North-South or East-West communication. Or you want to spread pods of a service for failure tolerance, or keep antagonistic pods separated, or ensure sole tenancy of nodes. Pod affinity and anti-affinity, now in beta, enables such use cases by letting you set hard or soft requirements for spreading and packing pods relative to one another within arbitrary topologies (node, zone, etc.).

Lastly, for the ultimate in scheduling flexibility, you can run your own custom scheduler(s) alongside, or instead of, the default Kubernetes scheduler. Each scheduler is responsible for different sets of pods. Multiple schedulers is beta in this release. 

Dynamic Storage Provisioning: Users deploying stateful applications will benefit from the extensive storage automation capabilities in this release of Kubernetes.

Since its early days, Kubernetes has been able to automatically attach and detach storage, format disk, mount and unmount volumes per the pod spec, and do so seamlessly as pods move between nodes. In addition, the PersistentVolumeClaim (PVC) and PersistentVolume (PV) objects decouple the request for storage from the specific storage implementation, making the pod spec portable across a range of cloud and on-premise environments. In this release StorageClass and dynamic volume provisioning are promoted to stable, completing the automation story by creating and deleting storage on demand, eliminating the need to pre-provision.

The design allows cluster administrators to define and expose multiple flavors of storage within a cluster, each with a custom set of parameters. End users can stop worrying about the complexity and nuances of how storage is provisioned, while still selecting from multiple storage options.

In 1.6 Kubernetes comes with a set of built-in defaults to completely automate the storage provisioning lifecycle, freeing you to work on your applications. Specifically, Kubernetes now pre-installs system-defined StorageClass objects for AWS, Azure, GCP, OpenStack and VMware vSphere by default. This gives Kubernetes users on these providers the benefits of dynamic storage provisioning without having to manually setup StorageClass objects. This is a change in the default behavior of PVC objects on these clouds. Note that default behavior is that dynamically provisioned volumes are created with the “delete” reclaim policy. That means once the PVC is deleted, the dynamically provisioned volume is automatically deleted so users do not have the extra step of ‘cleaning up’.

In addition, we have expanded the range of storage supported overall including:
  • ScaleIO Kubernetes Volume Plugin enabling pods to seamlessly access and use data stored on ScaleIO volumes.
  • Portworx Kubernetes Volume Plugin adding the capability to use Portworx as a storage provider for Kubernetes clusters. Portworx pools your server capacity and turns your servers or cloud instances into converged, highly available compute and storage nodes.
  • Support for NFSv3, NFSv4, and GlusterFS on clusters using the COS node image 
  • Support for user-written/run dynamic PV provisioners. A golang library and examples can be found here.
  • Beta support for mount options in persistent volumes
Container Runtime Interface, etcd v3 and Daemon set updates: while users may not directly interact with the container runtime or the API server datastore, they are foundational components for user facing functionality in Kubernetes’. As such the community invests in expanding the capabilities of these and other system components.
  • The Docker-CRI implementation is beta and is enabled by default in kubelet. Alpha support for other runtimes, cri-o, frakti, rkt, has also been implemented.
  • The default backend storage for the API server has been upgraded to use etcd v3 by default for new clusters. If you are upgrading from a 1.5 cluster, care should be taken to ensure continuity by planning a data migration window. 
  • Node reliability is improved as Kubelet exposes an admin configurable Node Allocatable feature to reserve compute resources for system daemons.
  • Daemon set updates lets you perform rolling updates on a daemon set

Alpha features: this release was mostly focused on maturing functionality, however, a few alpha features were added to support the roadmap

  • Out-of-tree cloud provider support adds a new cloud-controller-manager binary that may be used for testing the new out-of-core cloud provider flow
  • Per-pod-eviction in case of node problems combined with tolerationSeconds, lets users tune the duration a pod stays bound to a node that is experiencing problems
  • Pod Injection Policy adds a new API resource PodPreset to inject information such as secrets, volumes, volume mounts, and environment variables into pods at creation time.
  • Custom metrics support in the Horizontal Pod Autoscaler changed to use 
  • Multiple Nvidia GPU support is introduced with the Docker runtime only

These are just some of the highlights in our first release for the year. For a complete list please visit the release notes.

Community

This release is possible thanks to our vast and open community. Together, we’ve pushed nearly 5,000 commits by some 275 authors. To bring our many advocates together, the community has launched a new program called K8sPort, an online hub where the community can participate in gamified challenges and get credit for their contributions. Read more about the program here.


Release Process
A big thanks goes out to the release team for 1.6 (lead by Dan Gillespie of CoreOS) for their work bringing the 1.6 release to light. This release team is an exemplar of the Kubernetes community’s commitment to community governance. Dan is the first non-Google release manager and he, along with the rest of the team, worked throughout the release (building on the 1.5 release manager, Saad Ali’s, great work) to uncover and document tribal knowledge, shine light on tools and processes that still require special permissions, and prioritize work to improve the Kubernetes release process. Many thanks to the team.

User Adoption
We’re continuing to see rapid adoption of Kubernetes in all sectors and sizes of businesses. Furthermore, adoption is coming from across the globe, from a startup in Tennessee, USA to a Fortune 500 company in China. 

  • JD.com, one of China's largest internet companies, uses Kubernetes in conjunction with their OpenStack deployment. They’ve move 20% of their applications thus far on Kubernetes and are already running 20,000 pods daily. Read more about their setup here
  • Spire, a startup based in Tennessee, witnessed their public cloud provider experience an outage, but suffered zero downtime because Kubernetes was able to move their workloads to different zones. Read their full experience here.
    “With Kubernetes, there was never a moment of panic, just a sense of awe watching the automatic mitigation as it happened.”
  • Share your Kubernetes use case story with the community here.
Availability
Kubernetes 1.6 is available for download here on GitHub and via get.k8s.io. To get started with Kubernetes, try one of the these interactive tutorials

Get Involved
CloudNativeCon + KubeCon in Berlin is this week March 29-30, 2017. We hope to get together with much of the community and share more there!

Share your voice at our weekly community meeting
Many thanks for your contributions and advocacy!

-- Aparna Sinha, Senior Product Manager, Kubernetes, Google










Friday, March 24, 2017

The K8sPort: Engaging Kubernetes Community One Activity at a Time

Editor's note: Today’s post is by Ryan Quackenbush, Advocacy Programs Manager at Apprenda, showing a new community portal for Kubernetes advocates: the K8sPort. 

The K8sPort is a hub designed to help you, the Kubernetes community, earn credit for the hard work you’re putting forth in making this one of the most successful open source projects ever. Back at KubeCon Seattle in November, I presented a lightning talk of a preview of K8sPort. 

This hub, and our intentions in helping to drive this initiative in the community, grew out of a desire to help cultivate an engaged community of Kubernetes advocates. This is done through gamification in a community hub full of different activities called “challenges,” which are activities meant to help direct members of the community to attend various events and meetings, share and provide feedback on important content, answer questions posed on sites like Stack Overflow, and more. 

By completing these challenges, you collect points and can redeem them for different types of rewards and experiences, examples of which include charitable donations, gift certificates, conference tickets and more. As advocates complete challenges and gain points, they’ll earn performance-related badges, move up in community tiers and participate in a fun community leaderboard. 

My presentation at KubeCon, simply put, was a call for early signups. Those who’ve been piloting the program have, for the most part, had positive things to say about their experiences.
  • “Great way of improving the community and documentation. The gamification of Kubernetes gave me more insight into the stack as well.”
         - Jonas Kint, Devops Engineer at Showpad
  • “A great way to engage with the kubernetes project and also help the community. Fun stuff.” 
         - Kevin Duane, Systems Engineer at The Walt Disney Company
  • “K8sPort seems like an awesome idea for incentivising giving back to the community in a way that will hopefully cause more valuable help from more people than might usually be helping.”
         - William Stewart, Site Reliability Engineer at Superbalist
Today I am pleased to announce that the Cloud Native Computing Foundation (CNCF) is making the K8sPort generally available to the entire contributing community! We’ve simplified the signup process by allowing would-be advocates to authenticate and register through the use of their existing GitHub accounts.

Screen Shot 2017-03-22 at 10.59.04 AM.png

If you’re a contributing member of the Kubernetes community and you have an active GitHub account tied to the Kubernetes repository at GitHub, you can authenticate using your GitHub credentials and gain access to the K8sPort.

Screen Shot 2017-03-22 at 12.48.08 PM.png
Beyond the challenges that get posted regularly, community members will be recognized and compile points for things they’re already doing today. This will be accomplished through the K8sPort’s full integration with GitHub and the core Kubernetes repository. Once you authenticate, you’ll automatically begin earning points and recognition for various contributions -- including logging issues, making pull requests, code commits & more.

If you’re interested in joining the advocacy hub, please join us at k8sport.org! We hope you’re as excited about what you see as we are to continue to build it and present it to you.


For a quick walkthrough on K8sPort authentication and the hub itself, see this quick demo, below.



--Ryan Quackenbush, Advocacy Programs Manager, Apprenda

Friday, February 24, 2017

Deploying PostgreSQL Clusters using StatefulSets

Editor’s note: Today’s guest post is by Jeff McCormick, a developer at Crunchy Data, showing how to build a PostgreSQL cluster using the new Kubernetes StatefulSet feature.

In an earlier post, I described how to deploy a PostgreSQL cluster using Helm, a Kubernetes package manager. The following example provides the steps for building a PostgreSQL cluster using the new Kubernetes StatefulSets feature. 

StatefulSets Example

Step 1 - Create Kubernetes Environment

StatefulSets is a new feature implemented in Kubernetes 1.5 (prior versions it was known as PetSets). As a result, running this example will require an environment based on Kubernetes 1.5.0 or above.  

The example in this blog deploys on Centos7 using kubeadm. Some instructions on what kubeadm provides and how to deploy a Kubernetes cluster is located here.

Step 2 - Install NFS

The example in this blog uses NFS for the Persistent Volumes, but any shared file system would also work (ex: ceph, gluster).  

The example script assumes your NFS server is running locally and your hostname resolves to a known IP address. 

In summary, the steps used to get NFS working on a Centos 7 host are as follows:

sudo setsebool -P virt_use_nfs 1
sudo yum -y install nfs-utils libnfsidmap
sudo systemctl enable rpcbind nfs-server
sudo systemctl start rpcbind nfs-server rpc-statd nfs-idmapd
sudo mkdir /nfsfileshare
sudo chmod 777 /nfsfileshare/
sudo vi /etc/exports
sudo exportfs -r


The /etc/exports file should contain a line similar to this one except with the applicable IP address specified:

/nfsfileshare 192.168.122.9(rw,sync)

After these steps NFS should be running in the test environment.

Step 3 - Clone the Crunchy PostgreSQL Container Suite

The example used in this blog is found at in the Crunchy Containers GitHub repo here. Clone the Crunchy Containers repository to your test Kubernertes host and go to the example:

cd $HOME
git clone https://github.com/CrunchyData/crunchy-containers.git
cd crunchy-containers/examples/kube/statefulset

Next, pull down the Crunchy PostgreSQL container image:

docker pull crunchydata/crunchy-postgres:centos7-9.5-1.2.6

Step 4 - Run the Example

To begin, it is necessary to set a few of the environment variables used in the example:

export BUILDBASE=$HOME/crunchy-containers
export CCP_IMAGE_TAG=centos7-9.5-1.2.6

BUILDBASE is where you cloned the repository and CCP_IMAGE_TAG is the container image version we want to use.

Next, run the example:

./run.sh

That script will create several Kubernetes objects including:
  •  Persistent Volumes (pv1, pv2, pv3)
  •  Persistent Volume Claim (pgset-pvc)
  •  Service Account (pgset-sa)
  •  Services (pgset, pgset-master, pgset-replica)
  •  StatefulSet (pgset)
  •  Pods (pgset-0, pgset-1)
At this point, two pods will be running in the Kubernetes environment: 

$ kubectl get pod
NAME      READY     STATUS    RESTARTS   AGE
pgset-0   1/1       Running   0          2m
pgset-1   1/1       Running   1          2m

Immediately after the pods are created, the deployment will be as depicted below:

Step 5 - What Just Happened?

This example will deploy a StatefulSet, which in turn creates two pods.

The containers in those two pods run the PostgreSQL database. For a PostgreSQL cluster, we need one of the containers to assume the master role and the other containers to assume the replica role. 

So, how do the containers determine who will be the master, and who will be the replica?

This is where the new StateSet mechanics come into play. The StateSet mechanics assign a unique ordinal value to each pod in the set.

The StatefulSets provided unique ordinal value always start with 0. During the initialization of the container, each container examines its assigned ordinal value. An ordinal value of 0 causes the container to assume the master role within the PostgreSQL cluster. For all other ordinal values, the container assumes a replica role. This is a very simple form of discovery made possible by the StatefulSet mechanics.

PostgreSQL replicas are configured to connect to the master database via a Service dedicated to the master database. In order to support this replication, the example creates a separate Service for each of the master role and the replica role. Once the replica has connected, the replica will begin replicating state from the master.  

During the container initialization, a master container will use a Service Account (pgset-sa) to change it’s container label value to match the master Service selector.  Changing the label is important to enable traffic destined to the master database to reach the correct container within the Stateful Set.  All other pods in the set assume the replica Service label by default.

Step 6 - Deployment Diagram

The example results in a deployment depicted below:
In this deployment, there is a Service for the master and a separate Service for the replica.  The replica is connected to the master and replication of state has started.

The Crunchy PostgreSQL container supports other forms of cluster deployment, the style of deployment is dictated by setting the PG_MODE environment variable for the container.  In the case of a StatefulSet deployment, that value is set to: PG_MODE=set

This environment variable is a hint to the container initialization logic as to the style of deployment we intend.

Step 7 - Testing the Example

The tests below assume that the psql client has been installed on the test system.  If if not, the psql client has been previously installed, it can be installed as follows:

sudo yum -y install postgresql

In addition, the tests below assume that the tested environment DNS resolves to the Kube DNS and that the tested environment DNS search path is specified to match the applicable Kube namespace and domain. The master service is named pgset-master and the replica service is named pgset-replica.

Test the master as follows (the password is password):

psql -h pgset-master -U postgres postgres -c 'table pg_stat_replication'

If things are working, the command above will return output indicating that a single replica is connecting to the master.

Next, test the replica as follows:

psql -h pgset-replica -U postgres postgres  -c 'create table foo (id int)'

The command above should fail as the replica is read-only within a PostgreSQL cluster.

Next, scale up the set as follows:

kubectl scale statefulset pgset --replicas=3

The command above should successfully create a new replica pod called pgset-2 as depicted below:


Step 8 - Persistence Explained

Take a look at the persisted PostgreSQL data files on the resulting NFS mount path:

$ ls -l /nfsfileshare/
total 12
drwx------ 20   26   26 4096 Jan 17 16:35 pgset-0
drwx------ 20   26   26 4096 Jan 17 16:35 pgset-1
drwx------ 20   26   26 4096 Jan 17 16:48 pgset-2

Each container in the stateful set binds to the single NFS Persistent Volume Claim (pgset-pvc) created in the example script.  

Since NFS and the PVC can be shared, each pod can write to this NFS path.  

The container is designed to create a subdirectory on that path using the pod host name for uniqueness.

Conclusion

StatefulSets is an exciting feature added to Kubernetes for container builders that are implementing clustering. The ordinal values assigned to the set provide a very simple mechanism to make clustering decisions when deploying a PostgreSQL cluster.  


--Jeff McCormick, Developer, Crunchy Data

Tuesday, February 21, 2017

Containers as a Service, the foundation for next generation PaaS


Today’s post is by Brendan Burns, Partner Architect, at Microsoft & Kubernetes co-founder.

Containers are revolutionizing the way that people build, package and deploy software. But what is often overlooked is how they are revolutionizing the way that people build the software that builds, packages and deploys software. (it’s ok if you have to read that sentence twice…) Today, and in a talk at Container World tomorrow, I’m taking a look at how container orchestrators like Kubernetes form the foundation for next generation platform as a service (PaaS). In particular, I’m interested in how cloud container as a service (CaaS) platforms like Azure Container Service, Google Container Engine and others are becoming the new infrastructure layer that PaaS is built upon.

To see this, it’s important to consider the set of services that have traditionally been provided by PaaS platforms:

  • Source code and executable packaging and distribution
  • Reliable, zero-downtime rollout of software versions
  • Healing, auto-scaling, load balancing

When you look at this list, it’s clear that most of these traditional “PaaS” roles have now been taken over by containers. The container image and container image build tooling has become the way to package up your application. Container registries have become the way to distribute your application across the world. Reliable software rollout is achieved using orchestrator concepts like Deployment in Kubernetes, and service healing, auto-scaling and load-balancing are all properties of an application deployed in Kubernetes using ReplicaSets and Services.

What then is left for PaaS? Is PaaS going to be replaced by container as a service? I think the answer is “no.” The piece that is left for PaaS is the part that was always the most important part of PaaS in the first place, and that’s the opinionated developer experience. In addition to all of the generic parts of PaaS that I listed above, the most important part of a PaaS has always been the way in which the developer experience and application framework made developers more productive within the boundaries of the platform. PaaS enables developers to go from source code on their laptop to a world-wide scalable service in less than an hour. That’s hugely powerful. 

However, in the world of traditional PaaS, the skills needed to build PaaS infrastructure itself, the software on which the user’s software ran, required very strong skills and experience with distributed systems. Consequently, PaaS tended to be built by distributed system engineers rather than experts in a particular vertical developer experience. This means that PaaS platforms tended towards general purpose infrastructure rather than targeting specific verticals. Recently, we have seen this start to change, first with PaaS targeted at mobile API backends, and later with PaaS targeting “function as a service”. However, these products were still built from the ground up on top of raw infrastructure.

More recently, we are starting to see these platforms build on top of container infrastructure. Taking for example “function as a service” there are at least two (and likely more) open source implementations of functions as a service that run on top of Kubernetes (fission and funktion). This trend will only continue. Building a platform as a service, on top of container as a service is easy enough that you could imagine giving it out as an undergraduate computer science assignment. This ease of development means that individual developers with specific expertise in a vertical (say software for running three-dimensional simulations) can and will build PaaS platforms targeted at that specific vertical experience. In turn, by targeting such a narrow experience, they will build an experience that fits that narrow vertical perfectly, making their solution a compelling one in that target market.

This then points to the other benefit of next generation PaaS being built on top of container as a service. It frees the developer from having to make an “all-in” choice on a particular PaaS platform. When layered on top of container as a service, the basic functionality (naming, discovery, packaging, etc) are all provided by the CaaS and thus common across multiple PaaS that happened to be deployed on top of that CaaS. This means that developers can mix and match, deploying multiple PaaS to the same container infrastructure, and choosing for each application the PaaS platform that best suits that particular platform. Also, importantly, they can choose to “drop down” to raw CaaS infrastructure if that is a better fit for their application. Freeing PaaS from providing the infrastructure layer, enables PaaS to diversify and target specific experiences without fear of being too narrow. The experiences become more targeted, more powerful, and yet by building on top of container as a service, more flexible as well.

Kubernetes is infrastructure for next generation applications, PaaS and more. Given this, I’m really excited by our announcement today that Kubernetes on Azure Container Service has reached general availability. When you deploy your next generation application to Azure, whether on a PaaS or deployed directly onto Kubernetes itself (or both) you can deploy it onto a managed, supported Kubernetes cluster.

Furthermore, because we know that the world of PaaS and software development in general is a hybrid one, we’re excited to announce the preview availability of Windows clusters in Azure Container Service. We’re also working on hybrid clusters in ACS-Engine and expect to roll those out to general availability in the coming months.

I’m thrilled to see how containers and container as a service is changing the world of compute, I’m confident that we’re only scratching the surface of the transformation we’ll see in the coming months and years.


--Brendan Burns, Partner Architect, at Microsoft and co-founder of Kubernetes



Friday, February 10, 2017

Inside JD.com's Shift to Kubernetes from OpenStack

Editor's note: Today’s post is by the Infrastructure Platform Department team at JD.com about their transition from OpenStack to Kubernetes. JD.com is one of China’s largest companies and the first Chinese Internet company to make the Global Fortune 500 list.



History of cluster building

The era of physical machines (2004-2014)

Before 2014, our company's applications were all deployed on the physical machine. In the age of physical machines, we needed to wait an average of one week for the allocation to application coming on-line. Due to the lack of isolation, applications would affected each other, resulting in a lot of potential risks. At that time, the average number of tomcat instances on each physical machine was no more than nine. The resource of physical machine was seriously wasted and the scheduling was inflexible. The time of application migration took hours due to the breakdown of physical machines. And the auto-scaling cannot be achieved. To enhance the efficiency of application deployment, we developed compilation-packaging, automatic deployment, log collection, resource monitoring and some other systems.

Containerized era (2014-2016)

The Infrastructure Platform Department (IPD) led by Liu Haifeng--Chief Architect of JD.COM, sought a new resolution in the fall of 2014. Docker ran into our horizon. At that time, docker had been rising, but was slightly weak and lacked of experience in production environment. We had repeatedly tested docker. In addition, docker was customized to fix a couple of issues, such as system crash caused by device mapper and some Linux kernel bugs. We also added plenty of new features into docker, including disk speed limit, capacity management, and layer merging in image building and so on.

To manage the container cluster properly, we chose the architecture of OpenStack + Novadocker driver. Containers are managed as virtual machines. It is known as the first generation of JD container engine platform--JDOS1.0 (JD Datacenter Operating System). The main purpose of JDOS 1.0 is to containerize the infrastructure. All applications run in containers rather than physical machines since then. As for the operation and maintenance of applications, we took full advantage of existing tools. The time for developers to request computing resources in production environment reduced to several minutes rather than a week. After the pooling of computing resources, even the scaling of 1,000 containers would be finished in seconds. Application instances had been isolated from each other. Both the average deployment density of applications and the physical machine utilization had increased by three times, which brought great economic benefits.

We deployed clusters in each IDC and provided unified global APIs to support deployment across the IDC. There are 10,000 compute nodes at most and 4,000 at least in a single OpenStack distributed container cluster in our production environment. The first generation of container engine platform (JDOS 1.0) successfully supported the “6.18” and “11.11” promotional activities in both 2015 and 2016. There are already 150,000 running containers online by November 2016.

“6.18” and “11.11” are known as the two most popular online promotion of JD.COM, similar to the black Friday promotions. Fulfilled orders in November 11, 2016 reached 30 million. 

In the practice of developing and promoting JDOS 1.0, applications were migrated directly from physical machines to containers. Essentially, JDOS 1.0 was an implementation of IaaS. Therefore, deployment of applications was still heavily dependent on compilation-packaging and automatic deployment tools. However, the practice of JDOS1.0 is very meaningful. Firstly, we successfully moved business into containers. Secondly, we have a deep understanding of container network and storage, and know how to polish them to the best. Finally, all the experiences lay a solid foundation for us to develop a brand new application container platform.

New container engine platform (JDOS 2.0)

Platform architecture

When JDOS 1.0 grew from 2,000 containers to 100,000, we launched a new container engine platform (JDOS 2.0). The goal of JDOS 2.0 is not just an infrastructure management platform, but also a container engine platform faced to applications. On the basic of JDOS 1.0 and Kubernetes, JDOS 2.0 integrates the storage and network of JDOS 1.0, gets through the process of CI/CD from the source to the image, and finally to the deployment. Also, JDOS 2.0 provides one-stop service such as log, monitor, troubleshooting, terminal and orchestration. The platform architecture of JDOS 2.0 is shown below.


D:\百度云同步盘\徐新坤-新人培训计划\docker\MAE\分享\arc.png

Function
Product
Source Code Management
Gitlab
Container Tool
Docker
Container Networking
Cane
Container Engine
Kubernetes
Image Registry
Harbor
CI Tool
Jenkins
Log Management
Logstash + Elastic Search
Monitor
Prometheus

In JDOS 2.0, we define two levels, system and application. A system consists of several applications and an application consists of several Pods which provide the same service. In general, a department can apply for one or more systems which directly corresponds to the namespace of Kubernetes. This means that the Pods of the same system will be in the same namespace.


Most of the JDOS 2.0 components (GitLab / Jenkins / Harbor / Logstash / Elastic Search / Prometheus) are also containerized and deployed on the Kubernetes platform.

One Stop Solution


D:\百度云同步盘\徐新坤-新人培训计划\docker\MAE\分享\cicd.png



  1. JDOS 2.0 takes docker image as the core to implement continuous integration and continuous deployment.
  2. Developer pushes code to git.
  3. Git triggers the jenkins master to generate build job.
  4. Jenkins master invokes Kubernetes to create jenkins slave Pod.
  5. Jenkins slave pulls the source code, compiles and packs.
  6. Jenkins slave sends the package and the Dockerfile to the image build node with docker.
  7. The image build node builds the image.
  8. The image build node pushes the image to the image registry Harbor.
  9. User creates or updates app Pods in different zone.

The docker image in JDOS 1.0 consisted primarily of the operating system and the runtime software stack of the application. So, the deployment of applications was still dependent on the auto-deployment and some other tools. While in JDOS 2.0, the deployment of the application is done during the image building. And the image contains the complete software stack, including App. With the image, we can achieve the goal of running applications as designed in any environment.

D:\百度云同步盘\徐新坤-新人培训计划\docker\MAE\分享\image.png

Networking and External Service Load Balancing

JDOS 2.0 takes the network solution of JDOS 1.0, which is implemented with the VLAN model of OpenStack Neutron. This solution enables highly efficient communication between containers, making it ideal for a cluster environment within a company. Each Pod occupies a port in Neutron, with a separate IP. Based on the Container Network Interface standard (CNI) standard, we have developed a new project Cane for integrating kubelet and Neutron.

D:\百度云同步盘\徐新坤-新人培训计划\docker\MAE\分享\network.png


At the same time, Cane is also responsible for the management of LoadBalancer in Kubernetes service. When a LoadBalancer is created / deleted / modified, Cane will call the creating / removing / modifying interface of the lbaas service in Neutron. In addition, the Hades component in the Cane project provides an internal DNS resolution service for the Pods.

The source code of the Cane project is currently being finished and will be released on GitHub soon.

Flexible Scheduling


D:\百度云同步盘\徐新坤-新人培训计划\docker\MAE\分享\schedule.pngJDOS 2.0 accesses applications, including big data, web applications, deep learning and some other types, and takes more diverse and flexible scheduling approaches. In some IDCs, we experimentally mixed deployment of online tasks and offline tasks. Compared to JDOS 1.0, overall resource utilization increased by about 30%.

Summary

The rich functionality of Kubernetes allows us to pay more attention to the entire ecosystem of the platform, such as network performance, rather than the platform itself. In particular, the SREs highly appreciated the functionality of replication controller. With it, the scaling of the applications is achieved in several seconds. JDOS 2.0 now has accessed about 20% of the applications, and deployed 2 clusters with about 20,000 Pods running daily. We plan to access more applications of our company, to replace the current JDOS 1.0. And we are also glad to share our experience in this process with the community.

Thank you to all the contributors of Kubernetes and the other open source projects.


--Infrastructure Platform Department team at JD.com