Deploy JupyterHub

This guide is based on the Zero to JupyterHub guide, with additions specific to the Nautilus cluster. You must be an admin of the namespace you are deploying to.

Initial Setup

Start by choosing a name for your project. Your hub will be reachable at a hostname of the form your_name.nrp-nautilus.io.

Register a CILogon application

Set the callback URL to https://your_name.nrp-nautilus.io/hub/oauth_callback

Client Type: Confidential

Scopes: org.cilogon.userinfo,openid,profile,email

Refresh Tokens: No

Save the client ID and secret; you will need both for the configuration below.

Create the namespace

Create a namespace for your project on the Nautilus portal and annotate it with all the requested information.

Configuring

Install helm and download the helm chart

Follow the install guide, and use this template for the config:

Run openssl rand -hex 32 and replace the secret_token in the yaml file with the generated key.
At a minimum, set the client_id, client_secret, admin_users, secret_token, oauth_callback_url, and ingress.hosts fields.
Add security with either allowed_idps or allowed_users. Do NOT leave your JupyterHub instance open for anyone to sign in; this may result in your namespace being locked. You can find your IDP at https://cilogon.org/idplist
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/ && helm repo update
helm upgrade --cleanup-on-fail --install jhub jupyterhub/jupyterhub --namespace <namespace> --version=3.3.7 --values config.yaml
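
As a rough sketch of how the minimal fields listed above might be laid out in config.yaml: the key placement below assumes the layout of the Zero to JupyterHub chart (hub.config, proxy.secretToken, ingress) and is not copied from the template, so treat the template as authoritative. Every value in angle brackets is a placeholder.

```yaml
# Sketch only: key placement assumes the Zero to JupyterHub chart layout.
# Compare against the template before deploying.
hub:
  config:
    JupyterHub:
      authenticator_class: cilogon
      admin_access: true
      admin_users:
        - <admin_email>
    CILogonOAuthenticator:
      client_id: <cilogon_client_id>
      client_secret: <cilogon_client_secret>
      oauth_callback_url: https://your_name.nrp-nautilus.io/hub/oauth_callback
      # Restrict sign-in to known identity providers; never leave this open.
      allowed_idps:
        <your_idp_entity_id>:
          default: true
          username_derivation:
            username_claim: email   # assumption: users are identified by email
proxy:
  secretToken: <output of openssl rand -hex 32>
ingress:
  enabled: true
  hosts:
    - your_name.nrp-nautilus.io
```

The template may set further ingress options (for example TLS or an ingress class) that this sketch leaves out.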

Once the pods start, you should be able to reach the installation at the hostname you selected.

Automatic deployment

You can put your JupyterHub configuration in GitLab and automatically redeploy the application on repository changes. Please refer to this guide for details.

Customization of your JupyterHub instance

The basic installation of JupyterHub does not always provide the packages or settings needed for your particular use case and will more than likely need to be customized.

The template mentioned earlier in the guide provides a good base to work from.

Extending Existing Images

The first way to create a custom software environment for your JupyterHub instance is to build your own Docker image. The easiest way to get started is to take a pre-existing image, such as jupyter/minimal-notebook:latest for a minimal Jupyter image or one of the Scientific Images, and extend it with the packages you need. If you have an existing image that you would like to make usable within JupyterHub, you will need to install the jupyterhub and notebook Python packages in it.

The general format would be:

FROM <registry_url>/<organization>/<your_project>:<optional_tag>

# Install packages (shell commands in a Dockerfile must be run with RUN)
RUN pip install --no-cache-dir <packages>

An example of this would be:

FROM jupyter/minimal-notebook:latest

# Install packages (shell commands in a Dockerfile must be run with RUN)
RUN pip install --no-cache-dir pandas matplotlib scipy

Once you have created your Dockerfile, you can build your image locally and push it to a container registry such as gitlab-registry.nrp-nautilus.io, or have GitLab build and push the image for you automatically.

Allowing custom Anaconda environments

Sometimes you may want to be able to create custom environments on the fly in your JupyterHub instance and allow them to persist across sessions. This can be useful for development, lab workflows, or exploration assignments in classes.

For this, you will need to complete two steps.

Make sure your Jupyter image has nb_conda_kernels installed in the environment.
Create a .condarc file in your $HOME directory and add the config below.
```
envs_dirs:
  - /home/jovyan/my-conda-envs/
```

After those steps are complete, Anaconda environments can be created within the Jupyter session and persist across sessions after they close.
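
As an illustration, an environment defined like the sketch below would be stored under the envs_dirs path configured above. The file name, environment name, channel, and package list are all hypothetical. Note that nb_conda_kernels only lists a Python environment as a kernel if that environment contains ipykernel.

```yaml
# environment.yml (hypothetical file name and contents)
name: my-analysis-env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - ipykernel   # required for nb_conda_kernels to offer this environment as a kernel
  - pandas
```

Running conda env create -f environment.yml from a terminal in the Jupyter session creates the environment; it should then appear in the kernel list, possibly after the page is refreshed.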

Adding images to your configuration

In other cases, you may want to define specific environments for different tasks, assignments, or labs, to avoid having many redundant environments.

The example values from earlier in this guide already have a few environments defined that cover a broad range of applications. To add your image to the list of available images, add the values below to your profileList:

- display_name: Name To Show
  kubespawner_override:
    image_spec: <registry_url>/<organization>/<your_project>:<optional_tag>

If it is to be the default image, add default: True.
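
Putting the entry and the default flag together, a profileList with two images might look like the sketch below. It assumes profileList sits under singleuser, as in the Zero to JupyterHub chart; the display names and descriptions are illustrative only.

```yaml
singleuser:
  profileList:
    - display_name: Minimal notebook
      description: Plain Jupyter environment   # illustrative text
      default: true                            # pre-selected on the spawn page
      kubespawner_override:
        image_spec: jupyter/minimal-notebook:latest
    - display_name: Name To Show
      description: Custom course image         # illustrative text
      kubespawner_override:
        image_spec: <registry_url>/<organization>/<your_project>:<optional_tag>
```

Only one entry should carry the default flag.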

Shared Storage

If you are working with others on the same project or distributing data for a class, you can add a PersistentVolumeClaim as a shared location across all of the pods in the JupyterHub instance.

For example, following the Zero to JupyterHub guide, we can change the storage: section of our example values to:

  storage:
    type: dynamic
    extraLabels: {}
    # Change starts here
    extraVolumes:
      - name: jupyterhub-shared
        persistentVolumeClaim:
          claimName: jupyterhub-shared-volume
    extraVolumeMounts:
      - name: jupyterhub-shared
        mountPath: /home/shared
    # Change Ends
    capacity: 5Gi
    homeMountPath: /home/jovyan
    dynamic:
      storageClass: rook-ceph-block
      pvcNameTemplate: claim-{username}{servername}
      volumeNameTemplate: volume-{username}{servername}
      storageAccessModes: [ReadWriteOnce]

This mounts the shared storage at /home/shared using the jupyterhub-shared-volume PVC. Note that for the PVC to be used across multiple pods, the volume needs an appropriate access mode, such as ReadOnlyMany or ReadWriteMany.
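
The claim named in extraVolumes has to exist in the namespace before user pods can start; pods that reference a missing claim stay pending. A minimal sketch of such a claim is shown below. The storage class rook-cephfs and the 50Gi size are assumptions made for illustration; pick a storage class on your cluster that supports the access mode you need.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jupyterhub-shared-volume   # must match claimName in extraVolumes
spec:
  storageClassName: rook-cephfs    # assumption: a class that supports ReadWriteMany
  accessModes:
    - ReadWriteMany                # lets many user pods mount the volume at once
  resources:
    requests:
      storage: 50Gi                # hypothetical size
```

The manifest can be applied with kubectl apply -n <namespace> -f followed by the file name.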

Authentication

Limit access to your University

As a basic step to limit access to your JupyterHub instance rather than leaving it wide open, set allowed_idps to only your university and any other universities that your collaborators are from.

The available identity providers (IdPs) are listed by CILogon at https://cilogon.org/idplist/

On that page, search for your university by name or URL. The value to use in allowed_idps is the one listed under “EntityID”. The example values use the University of Nebraska-Lincoln, whose “EntityID” is https://shib.unl.edu/idp/shibboleth.

Under each identity provider, you will also need to add the allowed_domains, which will often be your university’s URL. For example, the University of Nebraska-Lincoln’s allowed_domains would be unl.edu.
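
For the University of Nebraska-Lincoln example, the resulting block might look like the sketch below. It assumes allowed_idps is set on CILogonOAuthenticator under hub.config and that usernames are derived from the email claim; check both assumptions against the example values.

```yaml
hub:
  config:
    CILogonOAuthenticator:
      allowed_idps:
        # EntityID copied from https://cilogon.org/idplist/
        https://shib.unl.edu/idp/shibboleth:
          default: true
          username_derivation:
            username_claim: email   # assumption: users are identified by email
          allowed_domains:
            - unl.edu
```

To admit collaborators from another university, add a second EntityID key alongside the first, with its own allowed_domains list.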

Admin Users

Admin users can be set in the admin_users list under JupyterHub. Admins are identified by the email address they use to log in to Nautilus.

JupyterHub:
      admin_access: true
      admin_users: ["<admin1_email>","<admin2_email>"]

With the example config, admin users can access other users’ notebooks. If you want to disable this, set admin_access to false.

Allowed Users

Allowed users can be set in the allowed_users list under JupyterHub. Users are identified by the email address they use to log in to Nautilus.

JupyterHub:
      allowed_users: ["<user1_email>","<user2_email>"]


Good Practices

When setting up a custom JupyterHub there are a couple of good practices you can implement to help keep the environment sustainable and secure.

Limit who has access

When deploying a JupyterHub instance, restrict the service to the people who should have access rather than leaving the instance open.

At a minimum, access should be limited to your university. Further steps can be taken to limit it to specific individuals, which is recommended for labs and small classes.

Use Git

While developing your configuration, use Git to keep track of the changes you make. If something goes wrong, the Git history lets you revert the changes that caused the issue. When working with the Nautilus cluster, you can use the hosted GitLab instance to track changes and deploy them automatically.

Documentation

As you start and continue to use your JupyterHub instance, you are strongly encouraged to keep a running document of how the instance is set up and of any workflows or assignments that run on it. This will help others use and maintain the instance, and will help with future development and debugging.

This work was supported in part by National Science Foundation (NSF) awards CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, OAC-2112167, CNS-2100237, CNS-2120019.