Introducing machine learning into your product involves creating and training a model on your dataset, and then pushing the model to production to serve requests. In this blog post, we’ll show how to use Kubernetes with TensorFlow Serving, a high-performance, open-source serving system for machine learning models, to meet the scaling demands of your application.
Let’s use image classification as an example. Suppose your application needs to assign an image to the correct category from a fixed set. For example, given the cute puppy image below, your system should classify it as a retriever.
|Image via Wikipedia|
Once the model is trained and exported, TensorFlow Serving uses it to perform inference, that is, to make predictions based on new data presented by its clients. In our example, clients submit image classification requests over gRPC, a high-performance, open-source RPC framework from Google.
Inference can be very resource-intensive. Our server executes the following TensorFlow graph to process every classification request it receives. The Inception-v3 model has over 27 million parameters and runs 5.7 billion floating-point operations per inference.
|Schematic diagram of Inception-v3|
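To get a feel for what those numbers mean for scaling, here is a rough back-of-the-envelope sketch in Python. Only the parameter count and the operations per inference come from this post. The weight precision (float32), the sustained throughput per replica and the target request rate are illustrative assumptions, so treat the result as an order-of-magnitude estimate rather than a benchmark.

```python
import math

# Figures quoted in this post.
PARAMS = 27_000_000            # "over 27 million parameters"
FLOPS_PER_INFERENCE = 5.7e9    # "5.7 billion floating-point operations"

# Assumptions for illustration only (not from the post).
BYTES_PER_PARAM = 4                   # assumes float32 weights
SUSTAINED_FLOPS_PER_REPLICA = 100e9   # hypothetical 100 GFLOP/s per serving replica
TARGET_QPS = 500                      # hypothetical peak requests per second


def model_size_mb(params=PARAMS, bytes_per_param=BYTES_PER_PARAM):
    """Approximate size of the weights in megabytes (10^6 bytes)."""
    return params * bytes_per_param / 1e6


def max_qps_per_replica(sustained_flops=SUSTAINED_FLOPS_PER_REPLICA,
                        flops_per_inference=FLOPS_PER_INFERENCE):
    """Upper bound on inferences per second for one compute-bound replica."""
    return sustained_flops / flops_per_inference


def replicas_needed(target_qps=TARGET_QPS,
                    sustained_flops=SUSTAINED_FLOPS_PER_REPLICA,
                    flops_per_inference=FLOPS_PER_INFERENCE):
    """Minimum replica count to absorb target_qps, ignoring batching and overhead."""
    return math.ceil(target_qps * flops_per_inference / sustained_flops)


if __name__ == "__main__":
    print(f"weights: ~{model_size_mb():.0f} MB")
    print(f"per-replica ceiling: ~{max_qps_per_replica():.1f} requests/s")
    print(f"replicas for {TARGET_QPS} requests/s: {replicas_needed()}")
```

Under these assumptions a single replica tops out at well under 20 requests per second, which is why being able to add and remove serving replicas on demand matters.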
To help you try this out yourself, we’ve written a step-by-step tutorial, which shows you how to create the TensorFlow Serving Docker container to serve the Inception-v3 image classification model, configure a Kubernetes cluster and run classification requests against it. We hope this will make it easier for you to integrate machine learning into your own applications and scale it with Kubernetes! To learn more about TensorFlow Serving, check out tensorflow.github.io/serving.
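As a rough illustration of the Kubernetes side, the sketch below builds a Deployment and a Service for a serving container and prints them as JSON, which `kubectl apply -f` accepts. This is not the tutorial's actual configuration: the image name, the object name `inception-serving`, the replica count and port 9000 are all hypothetical placeholders.

```python
import json


def serving_manifests(image, replicas=3, port=9000, name="inception-serving"):
    """Build a Deployment plus a LoadBalancer Service for a gRPC serving container.

    All defaults are hypothetical placeholders; substitute your own values.
    """
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,  # number of identical serving pods
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                    }],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",  # exposes the pods behind one external address
            "selector": labels,
            "ports": [{"port": port, "targetPort": port}],
        },
    }
    # A v1 List lets both objects be applied from a single document.
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


if __name__ == "__main__":
    # Placeholder image reference; replace with the image you built.
    manifests = serving_manifests("example.invalid/inception-serving:latest")
    print(json.dumps(manifests, indent=2))
```

If you save this as, say, `manifests.py`, you could pipe its output to `kubectl apply -f -`. Changing `replicas` (or running `kubectl scale`) is then the knob for matching capacity to request volume.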
- Fangwei Li, Software Engineer, Google