Jupyter Node
This app chart will deploy a JupyterLab server accessible through a web interface, with the home directory on persistent storage. It can be configured to:
- Use up to eight Nvidia GPUs per instance (Nvidia GTX 1080 Ti or Nvidia RTX 2080 Ti).
- Access Jupyter through a web interface on a subdomain, e.g. `myname.icedc.se`.
- Run Python, R, Julia, and other programming languages to perform data analysis, machine learning, or scientific computing.
- Provide external SSH access through a NodePort.
- Install software through Ubuntu/Debian `apt` or Python `pip`.
- Deploy long-running workloads, such as training machine learning models over several days.
- Access S3 storage with Python using boto3, and from the command line with rclone.
- Attach pre-existing persistent storage that is shared with other users in the same namespace.
Installation
Install the app through Rancher.
Supported Docker images
Requirements
The app supports images built on Debian and Ubuntu, since `apt` is used to automatically install required software libraries such as `ssh`.
If you want to use an Nvidia GPU, the image must have Nvidia CUDA pre-installed.
Jupyter not pre-installed
If you want to use a Docker image that does not have Jupyter pre-installed, you can install JupyterLab through the Entrypoint override. This requires Python `pip` to be installed in the image.
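For example, the following commands (the same ones shown under Entrypoint override below) install and start JupyterLab:

```bash
pip install jupyterlab
jupyter lab --ip=0.0.0.0 --port=8888 --allow-root --no-browser
```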
JupyterLab or Jupyter Notebook pre-installed
You can use the app with a Docker image that has Jupyter pre-installed, such as:
- tensorflow/tensorflow:latest-gpu-jupyter

If you do not need a GPU, you can use a smaller image, such as:
- jupyter/datascience-notebook:latest
- jupyter/pyspark-notebook:latest
- jupyter/r-notebook:latest
- jupyter/scipy-notebook:latest
You can also use older images by specifying the tag, e.g.
- tensorflow/tensorflow:1.14.0-gpu-py3-jupyter.
Custom Docker image
If you need specific versions of CUDA, TensorFlow, PyTorch, etc., create your own Docker image and publish it on Harbor. This Dockerfile example can be used as a starting point.
You can use the image by specifying its address on Harbor, e.g.
- registry.ice.ri.se/myproject/custom-jupyter:0.0.1
Configure
Jupyter
Authentication token (required): The token used to access Jupyter through the web UI. It can be updated later to change the password (a container restart is required).

Alternative Jupyter home directory (optional): If you want to use a different home directory path than `/home/jovyan` or `/tf`, specify it here. This is useful if you want to use a Docker image that has a different home directory, such as `/root`.
Web access
Subdomain for icedc.se (required): Access Jupyter through a subdomain of `icedc.se`, for example `myname.icedc.se`. The subdomain must be unique, so check with your browser that it is not already in use.
External port (required): The port to expose the Jupyter server on. The default is `8888`. If you want to use a different port, you must also change the port in the Entrypoint override.
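For example, to expose Jupyter on port 9999 instead (the port number is only illustrative), the Entrypoint override must start Jupyter with a matching `--port` flag:

```bash
pip install jupyterlab
jupyter lab --ip=0.0.0.0 --port=9999 --allow-root --no-browser
```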
SSH access
SSH public key (optional): If you want to access the Jupyter server through SSH, provide your public key from `~/.ssh/id_rsa.pub` here. This will allow you to access the server through a NodePort. Read the EKC usage guide for more information.
Commands
Autostart script (optional): A script to install your workload and start it. It runs detached at container post-start and does not block Jupyter from starting.
- The script is run from the file `~/autostart.sh`.
- Output is saved to the file `~/autostart.log`.
In this example, we clone a repository and autostart a Python script. You can clone private `git` repositories using access tokens, as shown after the example.

```bash
git clone https://github.com/myname/ml-project.git || true # ignore errors if repo already exists
cd ml-project
python ./my-training-script.py
```
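For example, a hypothetical clone of a private GitHub repository using a personal access token (the username, token, and repository name are placeholders, and the exact URL format varies between git hosts):

```bash
git clone https://myname:<ACCESS_TOKEN>@github.com/myname/private-repo.git
```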
Entrypoint override (optional): This field can be used to override the Docker image entrypoint. The command must block the container from exiting. For example, to install JupyterLab and start it:

```bash
pip install jupyterlab
jupyter lab --ip=0.0.0.0 --port=8888 --allow-root --no-browser
```

or leave it blank to use the default Docker image entrypoint.
Resources
Requested CPU (required): The minimum number of CPU cores to request for the Jupyter server, in milli-CPU units (m). The default is `1000m`, which is equivalent to one CPU core. More cores will be available freely depending on the server load, with a maximum of 64 cores.
Memory limit (required): The maximum amount of memory to request for the Jupyter server, in Gibibytes (Gi).
Storage size (required): The amount of persistent storage to request for the Jupyter server, in Gibibytes (Gi). The storage volume will be mounted in the Jupyter home directory.
Existing shared volumes (optional): If you want to use existing persistent storage that is shared with other deployments in the same namespace, select the volumes here as a comma-separated list. A volume named `vol1` will be mounted at `/mnt/vol1`, and so on.
Attach GPU (optional): If you want to use an Nvidia GPU, enable this option. Select the GPU type and the number of GPUs to request. You must use a Docker image that has Nvidia CUDA pre-installed.
S3 storage
S3 storage (optional): S3 buckets are useful for storing large files, such as datasets, models, and checkpoints. Unlike persistent storage, S3 buckets can be dynamically resized and shared with other software platforms.
To access S3 storage through boto3 or rclone, provide your credentials:
- S3 endpoint: `s3.ice.ri.se`
- S3 access key: 20 characters
- S3 secret key: 40 characters

See the documentation for S3 Access keys for more information. The credentials will be saved to `/root/.aws/credentials`.
You can access S3 storage from Python, for example to save a PyTorch model:

```python
import boto3, torch, io

# ... Create and train your model

s3 = boto3.resource("s3", endpoint_url="https://s3.ice.ri.se")
bucket = s3.Bucket("my-bucket")
buffer = io.BytesIO()
torch.save(model, buffer)
bucket.put_object(Key="my_model_file.json", Body=buffer.getvalue())
```
The model will be saved to `s3://my-bucket/my_model_file.json`. You can then load the PyTorch model in a similar manner.
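For example, a minimal sketch for loading the model back (assuming the same bucket and key as above):

```python
import boto3, torch, io

s3 = boto3.resource("s3", endpoint_url="https://s3.ice.ri.se")

# Download the serialized model into an in-memory buffer and deserialize it
obj = s3.Object("my-bucket", "my_model_file.json")
buffer = io.BytesIO(obj.get()["Body"].read())
model = torch.load(buffer)
```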
Your S3 credentials will also be saved to `/root/.config/rclone/rclone.conf`, allowing you to access S3 storage from the command line:
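For example (this assumes the remote in the generated configuration is named `s3`; run `rclone listremotes` to see the actual name):

```bash
rclone ls s3:my-bucket                              # list files in a bucket
rclone copy ./checkpoints s3:my-bucket/checkpoints  # upload a directory
```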
Usage
Access the Jupyter web interface through the subdomain you specified, e.g. `myname.icedc.se`, and log in with the authentication token.
See the JupyterLab or Jupyter Notebook documentation for more information.
Access an HTTP server
Suppose that you have developed a web application that you want to access through the web.
You can run an HTTP server in the same pod as Jupyter and proxy web requests to it using the extension jupyter-server-proxy. The proxy requires you to log in to Jupyter, so it is not suitable for public access.
Install the extension with `pip install jupyter-server-proxy`.

As an example, we will start a Python HTTP server on port 8000 and access it through the subdomain. Add the following lines to your JupyterLab configuration file, `~/.jupyter/jupyter_server_config.py`:
```python
c.ServerProxy.servers = {
    'web': {  # name of the proxy path
        'port': 8000,  # port to proxy '/web' path to
        'command': [  # optional command to run
            'python', '-m', 'http.server', '8000'
        ]
    }
}
```
If you are using classic Jupyter Notebook, edit `~/.jupyter/jupyter_notebook_config.py` instead.
Restart (Shut Down) the JupyterLab server from the web interface to apply the settings.
This will restart the container and interrupt any running workloads.
Read more about settings in the jupyter-server-proxy documentation.
Model serving through ASGI
Modern machine-learning applications are often served using ASGI servers, such as:
- Uvicorn
- Gradio
Follow the previous instructions to access an HTTP server through the Jupyter server subdomain. You must configure the ASGI server to use the correct root path, i.e. `/web`, for CSS and JavaScript files to be served correctly.
Uvicorn
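A minimal sketch of an ASGI app served with Uvicorn behind the `/web` proxy path (FastAPI and the port are illustrative; any ASGI application works the same way):

```python
import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def index():
    return {"message": "Hello through jupyter-server-proxy"}

if __name__ == "__main__":
    # root_path makes generated URLs resolve under the /web proxy path
    uvicorn.run(app, host="0.0.0.0", port=8000, root_path="/web")
```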
Gradio
```python
import gradio as gr

def greet(name):
    return "Hello " + name + "!"

demo = gr.Interface(fn=greet, inputs="text", outputs="text")
demo.launch(server_name="0.0.0.0", root_path="/web", server_port=8000)
```
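With the proxy configuration above, the Gradio app should then be reachable under the `/web` path of your subdomain, e.g. `myname.icedc.se/web`.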
Long-running workloads
If you want to run long-running workloads, such as training AI models that take several days to complete, you should use checkpoints to regularly save the state of your model to persistent storage (the Jupyter home directory).
If/when the Jupyter container is restarted, the execution of your workload will be interrupted. Any work that has not been saved to persistent storage will be lost. By using checkpoints, you can resume your work from the last saved state.
After the container is restarted, the Autostart script will always be executed. You can use the script to automatically resume your work from the last checkpoint.
Read more about running `.ipynb` notebooks from the terminal with nbconvert, and using checkpoints in TensorFlow and PyTorch.
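As an illustration, a minimal PyTorch checkpointing sketch (the model, file name, and training step are placeholders for your own code):

```python
import os
import torch
import torch.nn as nn

CKPT = os.path.expanduser("~/checkpoint.pt")  # saved on the persistent home volume

model = nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Resume from the last checkpoint if the container was restarted
start_epoch = 0
if os.path.exists(CKPT):
    state = torch.load(CKPT)
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    start_epoch = state["epoch"] + 1

for epoch in range(start_epoch, 100):
    loss = model(torch.randn(32, 10)).pow(2).mean()  # placeholder training step
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Save a checkpoint after every epoch
    torch.save({"model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "epoch": epoch}, CKPT)
```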
Connect with Visual Studio Code
If you do not want to use the Jupyter web interface, you can connect to the Jupyter server using Visual Studio Code.
- When installing/upgrading the app in Rancher, provide your SSH public key from `~/.ssh/id_rsa.pub` in the SSH public key field.
- Follow the EKC usage page for Visual Studio Code.
- Optionally, you can install the Jupyter extension for Visual Studio Code to run Jupyter Notebooks directly in the editor.
Troubleshooting
My Jupyter app is not starting
If the Jupyter app is not starting, click on the app in Rancher and check the logs. If the logs are empty, the app is probably waiting for a large Docker image to be downloaded, or persistent storage to be created. If the logs contain errors, the app may not be configured correctly. Common issues are:
- The subdomain is already in use by another app.
- You are trying to mount a volume that does not exist or is not configured to be shared with multiple deployments.
- The Docker image is not available or does not have Jupyter pre-installed.
My Notebook execution has stopped
If your Notebook workload suddenly stops executing, the container has likely been restarted. This can happen if you have set a memory limit that is too low, or if the Kubernetes server running Jupyter has crashed.
- If the container runs out of memory, it will be automatically restarted on the same Kubernetes server. Increase the Memory limit to avoid this.
- When the Kubernetes server crashes, the Jupyter container will be automatically restarted on a working server. Save checkpoints to persistent storage, and use the Autostart script to automatically resume your work.
Changelog
v0.3.6 - Add support for configuring cluster monitoring with the Ray compute framework.
v0.3.5 - Nginx ingress now sets `X-Frame-Options: SAMEORIGIN` for using Grafana iframes with `jupyter-server-proxy`.
v0.3.4 - Remove `-lab` suffix from pod name.
v0.3.3 - Choose port to expose in Rancher questions (default 8888).
v0.3.2 - Add entrypoint override to Rancher questions.
v0.3.1 - Fix issue that prevented app upgrade with a new GPU type.
v0.3.0 - Add support for S3 configuration of boto3.
v0.2.9 - Set time zone from Rancher questions. Disable non-working Open Data Cube configuration.
v0.2.8 - Improve Autostart script.
v0.2.7 - Improve S3 rclone configuration.
v0.2.6 - Make `bash` the default shell, make Rancher questions more informative.
v0.2.5 - Let's Encrypt signed HTTPS certificate.
v0.2.4 - Fix shared memory issue.
v0.2.3 - Improve Rancher install questions.
v0.2.2 - Remove CPU limits, use HTTPS, and update ingress to v2.
Created: 2022-12-27