GPU support example

Overview

Kubernetes can use GPUs to speed up parallel processing, and can auto-scale GPU nodes in environments that support it. This feature is commonly used for Machine Learning and Data Science applications. Kublr automatically detects GPUs on AWS and Azure instances and configures the environment to use GPUs with Kubernetes.

This document explains:

  • which GPU instances and operating systems Kublr supports;
  • how Kublr enables GPU support;
  • how to use GPU instances in Kublr.

What GPU instances and OSes are supported by Kublr?

On AWS, Kublr supports GPUs on Ubuntu 16.04 and RedHat 7.5 for the following GPU instance types:

  • p2.xlarge;
  • p2.8xlarge;
  • p2.16xlarge;
  • p3.2xlarge;
  • p3.8xlarge;
  • p3.16xlarge.

On Azure, Kublr supports GPUs on Ubuntu 16.04 for the following GPU instance types:

  • Standard_NC6s_v2;
  • Standard_NC12s_v2;
  • Standard_NC24s_v2;
  • Standard_NC24rs_v2;
  • Standard_NC6s_v3;
  • Standard_NC12s_v3;
  • Standard_NC24s_v3;
  • Standard_NC24rs_v3;
  • Standard_ND6s;
  • Standard_ND12s;
  • Standard_ND24s;
  • Standard_ND24rs.

How can I use GPU instances?

  1. Log in to the Kublr platform.
  2. Click “Add cluster”.
  3. Choose a cloud provider (AWS or Azure).
  4. Choose an appropriate GPU instance type (see the lists above):

(Screenshots: selecting the instance type on AWS and on Azure)

  5. Click ‘Create Cluster’ and wait until the cluster is created.
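
Once the cluster is running and you have downloaded its KubeConfig file (see the Helm demo section below), you can optionally check that the GPU worker nodes advertise a GPU resource to Kubernetes. This is only a sketch: it assumes GPUs are exposed through the standard nvidia.com/gpu extended resource (for example via the NVIDIA device plugin); the exact resource name may differ depending on the Kubernetes version in your Kublr release.

$ kubectl describe nodes | grep -i "nvidia.com/gpu"

Each GPU worker node should list this resource under its Capacity and Allocatable sections.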

How can I make sure that the GPU is working?

Check the NVIDIA GPU controller and the Docker configuration on the worker node.

  1. Log in to the worker node via SSH.
  2. Make sure that the NVIDIA GPU controller is present:

    # lspci -m -nn
    ....
    00:1e.0 "3D controller [0302]""NVIDIA Corporation [10de]""GK210GL [Tesla K80] [102d]"-ra1 "NVIDIA Corporation [10de]""Device [106c]"
    ....
    #
    #
    # nvidia-smi
    Mon Jun 18 10:15:50 2018
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
    | N/A   50C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    #
    # nvidia-smi -L
    GPU 0: Tesla K80 (UUID: GPU-860ba7bf-e331-54b4-6e2c-322fb389597b)
    
  3. Make sure that Docker is configured correctly:

    # docker run --rm nvidia/cuda nvidia-smi
    Mon Jun 18 10:15:50 2018
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
    | N/A   50C    P8    27W / 149W |      0MiB / 11441MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    
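You can also verify that Kubernetes itself can schedule a GPU workload. The following is a minimal sketch, run from a workstation where kubectl is configured for the cluster; it assumes the GPU is exposed as the nvidia.com/gpu resource, and the Pod name gpu-smoke-test is arbitrary:

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
$ kubectl logs gpu-smoke-test
$ kubectl delete pod gpu-smoke-test

Once the Pod completes, its logs should show the same nvidia-smi table as above.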

Run a demo GPU application using a Helm chart

Prerequisites:

  • a cluster with a GPU node has been created;
  • kubectl is installed;
  • Helm is installed;
  • ‘demo7-gpu’ has been downloaded from https://github.com/kublr/demos (the dockerGPUHelm directory contains the demo chart).
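
You can quickly confirm that the client tools are available before proceeding:

$ kubectl version --client
$ helm version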

Process default video

Copy the KubeConfig file from the cluster’s Overview page and move it to the .kube directory:

$ cp ~/Downloads/config.yaml ~/.kube/config

Check that kubectl is working and using the right config file:

$ kubectl config view
$ kubectl cluster-info

Change directory to ../demo7-gpu and install the chart:

$ helm install dockerGPUHelm --name demo7
NAME:   demo7
LAST DEPLOYED: Thu Aug  9 13:09:01 2018
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/Pod(related)
NAME                            READY  STATUS             RESTARTS  AGE
demo7-demo-gpu-5cc6795c4-sgnz9  0/1    ContainerCreating  0         1s
==> v1/Service
NAME            TYPE       CLUSTER-IP     EXTERNAL-IP  PORT(S)  AGE
demo7-demo-gpu  ClusterIP  100.70.66.123  <none>       80/TCP   1s
==> v1beta2/Deployment
NAME            DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
demo7-demo-gpu  1        1        1           0          1s

Run the port forwarding command (replace <POD_NAME> with the Pod name from the previous console output):

$ kubectl port-forward <POD_NAME> 8000:8000
Forwarding from 127.0.0.1:8000 -> 8000

Get the video stream: open http://localhost:8000/ in a browser.

(Screenshot: video processing example)

Change video

  1. Open the file ../demo7-gpu/dockerGPUHelm/templates/deployment.yaml

  2. Change the value of the VIDEO_LINK parameter (a hypothetical example of this environment block is sketched after this list). Note: you can also change the value of VIDEO_OUTPUT_COMPRESSION to set the desired video quality level.

  3. Upgrade the Helm release. Note: you should wait approximately a minute until the previous Pod is terminated.

    $ helm upgrade demo7 dockerGPUHelm
    Release "demo7" has been upgraded. Happy Helming!
    LAST DEPLOYED: Thu Aug  9 13:42:54 2018
    NAMESPACE: default
    STATUS: DEPLOYED
    
  4. Get the new Pod’s name:

    $ kubectl get pods --all-namespaces |grep demo7
    
  5. Run the port forwarding command again, using the new Pod’s name.

  6. Open http://localhost:8000/ in a browser to get the video stream.

(Screenshot: video processing example)
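
For reference, VIDEO_LINK and VIDEO_OUTPUT_COMPRESSION from step 2 are container environment variables in the chart’s deployment.yaml. The snippet below is only a hypothetical illustration; the video URL and compression value are made up, and the actual structure is defined in the demo repository:

    env:
      - name: VIDEO_LINK
        value: "https://example.com/sample-video.mp4"   # hypothetical source video URL
      - name: VIDEO_OUTPUT_COMPRESSION
        value: "75"                                      # hypothetical compression/quality value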

How does it work?

During cluster installation, Kublr checks for the presence of GPU devices and, if any are detected, does the following:

  1. Installs the NVIDIA drivers.
  2. Installs the NVIDIA container runtime for Docker.
  3. Configures Docker to use the NVIDIA runtime:
    1. On Ubuntu 16.04, by adding a runtime section to the Docker config file (daemon.json), as illustrated below.
    2. On RedHat 7.5, by adding an OCI hook (/usr/libexec/oci/hooks.d/nvidia).
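
For illustration, on Ubuntu the resulting Docker daemon configuration typically looks like the sketch below; the exact contents and the runtime path on a Kublr node may differ:

    # cat /etc/docker/daemon.json
    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "/usr/bin/nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }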

Questions? Suggestions? Need help? Contact us.