Getting Started with Kubeflow - A Composable, Portable, and Scalable Machine Learning Stack for Kubernetes

[Editor's note] This article is from David Aronchick, product manager of the Kubeflow project, and Jeremy Lewi, chief engineer. It introduces their new open source project, Kubeflow, which is dedicated to making the machine learning stack on Kubernetes simple, fast, and scalable.

Kubernetes has rapidly become the hybrid solution for deploying complicated workloads anywhere. While it started with only stateless services, customers have begun to move complex workloads to the platform, taking advantage of the rich APIs, reliability, and performance that Kubernetes provides. One of the fastest-growing uses is Kubernetes as the deployment platform of choice for machine learning.

Building a production machine learning system involves a variety of components, often mixing vendor tools with homegrown solutions. Connecting and managing these services through relatively complex configurations presents a huge barrier to adopting machine learning. Infrastructure engineers often spend a significant amount of time on manual tweaking and deployment before a single model can be tested.

To make matters worse, these deployments are so tied to the clusters they were deployed on that the stacks are immobile: moving a model from a laptop to a highly scalable cloud cluster is effectively impossible without significant re-architecture. All these differences waste enormous effort, and each transition creates opportunities to introduce bugs.

Getting Started with Kubeflow

To address these issues, we decided to create the Kubeflow project, a new open source GitHub repository dedicated to making the ML (machine learning) stack on Kubernetes simple, fast, and scalable.

This repository contains:

  • JupyterHub, for creating and managing interactive Jupyter notebooks
  • A TensorFlow Custom Resource Definition (CRD) that can be configured to use CPUs or GPUs and adjusted to the size of a cluster with a single setting
  • A TensorFlow Serving container


Because this solution is based on Kubernetes, it can be used anywhere Kubernetes is running. Just start a cluster and it's ready to use.

Using Kubeflow

Suppose you are working with two different Kubernetes clusters: a local minikube cluster and a GKE cluster with GPUs, and that you have two kubectl contexts named minikube and gke.
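
If you want to confirm that both contexts exist before starting (a quick sanity check, assuming kubectl is already configured for both clusters):

# Both "minikube" and "gke" should appear in the output
kubectl config get-contexts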

First, initialize ksonnet and install the Kubeflow packages (ksonnet must be installed on your system first; see the ksonnet documentation).

ks init my-kubeflow
cd my-kubeflow
ks registry add kubeflow \
  github.com/google/kubeflow/tree/master/kubeflow
ks pkg install kubeflow/core
ks pkg install kubeflow/tf-serving
ks pkg install kubeflow/tf-job
ks generate core kubeflow-core --name=kubeflow-core


Next, define ksonnet environments corresponding to the two clusters.

kubectl config use-context minikube
ks env add minikube

kubectl config use-context gke
ks env add gke


Now deploy the Kubeflow core components to each cluster, first on minikube:

ks apply minikube -c kubeflow-core


Then on the multi-node GKE cluster for faster training:

ks apply gke -c kubeflow-core



With that, a rich ML stack is deployed in two different environments with minimal modification.

Either deployment can be accessed by running:

kubectl port-forward tf-hub-0 8100:8000


Then open http://127.0.0.1:8100 to visit JupyterHub. To switch which cluster kubectl talks to, run:

# To access minikube
kubectl config use-context minikube

# To access GKE
kubectl config use-context gke


When you run apply, the following are started on Kubernetes (you can verify with the check after this list):

  • JupyterHub, to start and manage Jupyter notebooks
  • A TensorFlow CRD
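
To confirm that everything came up (a quick check, assuming the components were deployed to the default namespace):

# The JupyterHub pod (tf-hub-0) and the tf-job operator pod should be Running
kubectl get pods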


Suppose you want to submit a training job. Kubeflow provides ksonnet prototypes that make it easy to define components. The tf-job prototype makes it easy to create a job for your code, but for this training job we will use the tf-cnn prototype, which runs the TensorFlow CNN benchmark.

Before submitting the training job, first generate a new job from the prototype:

ks generate tf-cnn cnn --name=cnn


The tf-cnn prototype uses a single worker by default and does not use a GPU, which is perfect for the minikube cluster; we can submit it directly.

ks apply minikube -c cnn
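
Once submitted, you can watch the job through the custom resource it creates (a sketch: tfjobs is the resource name the Kubeflow CRD registers, though it may differ across versions):

# The tf-job operator creates a TFJob custom resource for the job
kubectl get tfjobs

# Training output appears in the logs of the worker pods it spawns
kubectl get pods | grep cnn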


For GKE, we'll adjust the prototype's parameters to take advantage of the cluster's multiple nodes and GPUs. First, list all the available parameters:

# To see a list of parameters
ks prototype list tf-job


Then adjust the parameters:

ks param set --env=gke cnn num_gpus 1
ks param set --env=gke cnn num_workers 1  

ks apply gke -c cnn


Note that the variables only take effect when deployed to GKE; the parameters in minikube are unchanged (Editor's note: use --env to specify the environment whose parameters you want to modify).
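
You can confirm the per-environment scoping yourself (ks param list is a standard ksonnet command):

# Parameters as they will be applied to GKE
ks param list cnn --env=gke

# Parameters as they will be applied to minikube (still the defaults)
ks param list cnn --env=minikube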

Once trained, the model can be moved into serving.

Kubeflow also includes support for serving. In a separate example, we trained a standard Inception model and stored the trained model in the bucket gs://kubeflow-models at the path /inception.

You can deploy the trained model with the following commands:

ks generate tf-serving inception --name=inception \
  --namespace=default --model_path=gs://kubeflow-models/inception
ks apply gke -c inception


This also highlights another feature of Kubeflow - taking inputs at deployment time. This command creates a tf-serving service on the GKE cluster, ready for your applications to use.
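
As a sketch of how an application might reach the service (assuming it is named inception and exposes TensorFlow Serving's default gRPC port 9000; check the actual name and port with kubectl get svc):

# Inspect the service created by the tf-serving component
kubectl get svc inception

# Forward the gRPC port locally so a TensorFlow Serving client can connect
kubectl port-forward svc/inception 9000:9000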

For more information on deploying and monitoring TensorFlow training jobs and TensorFlow models, see the user guide.

Kubeflow + ksonnet

You may have noticed our use of ksonnet above. We think multi-environment (dev, test, prod) development will be the norm for most Kubeflow users. By treating environments as a first-class concept, ksonnet makes it easy for Kubeflow users to move workloads between environments.
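
For example, you can list the environments defined earlier (ks env list is a built-in ksonnet command):

# Each environment records the cluster endpoint and namespace it targets
ks env list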

We feel ksonnet is a good choice, especially now that Helm is planning to integrate ksonnet into the next version of its platform. More information can be found in the ksonnet documentation.

We would also like to thank the Heptio team for expediting support for some of the key ksonnet features used by Kubeflow.

What's next?

We are working hard to build a community and look forward to your contributions! We have already collaborated with many teams, including CaiCloud, Red Hat & OpenShift, Canonical, Weaveworks, Container Solutions, and many more. CoreOS, for example, has high hopes for Kubeflow:

"Kubeflow has made significant strides in simplifying the provisioning and production of machine learning workloads on Kubernetes, and we believe this will lead to greater adoption of the platform by more enterprises. We look forward to working with the Kubeflow team to provide Kubeflow with enterprise-grade Kubernetes Platform -- Tectonic's tight integration." -- Reza Shafii, VP of CoreOS Product

If you want to try Kubeflow in your browser right now, we have partnered with Katacoda - click here to try it!

We are just getting started and would love for you to get involved and follow along!


Original link: Introducing Kubeflow - A Composable, Portable, Scalable ML Stack Built for Kubernetes (Translation: Li Jiaqing)
