Create a New Single User Image

You might need to create a new user image when deploying a new hub, or changing from a shared single user server image. We use repo2docker to generate our images.

There are two approaches to creating a repo2docker image:

  1. Use a repo2docker-style image template (environment.yaml, etc)
  2. Use a Dockerfile (useful for larger/more complex images)

Generally, we prefer to use the former approach, unless we need to install specific packages or utilities outside of python/apt as root. If that is the case, only a Dockerfile format will work.

As always, create a feature branch for your changes, and submit a PR when done.

Use an existing image as a template

Browse through our image repos to find a hub that is similar to the one you are trying to create. This will give you a good starting point.

Create the image repos

Create a new image repo from the hub-user-image-template. Click “Use this template” > “Create a new repository”.

Be sure to follow convention and name the repo <hubname>-user-image, and the owner needs to be berkeley-dsep-infra. When that is done, create your own fork of the new repo.

Configuring the root image repo

There are now a few steps to set up the CI/CD for the new image repo. In the berkeley-dsep-infra image repo, click on Settings, and under General, scroll down to Pull Requests and check the box labeled Automatically delete head branches.

Scroll back up to the top of the settings, and in the left menu bar, click on Secrets and variables, and then Actions.

From there, click on the Variables tab and then New repository variable. We will be adding two new variables:

  1. HUB: the name of the hub (eg: datahub)

  2. IMAGE: the Google Artifact Registry path and image name. The path will always be ucb-datahub-2018/user-images/<image-name> and the image name will always be the same as the repo: <hubname>-user-image.

Configure your fork

Now you will want to disable Github Actions for your fork of the image repo. If you don’t, whenever you push PRs to the root repo the workflows in your fork will attempt to run, but don’t have the proper permissions to successfully complete. This will then send you a nag email about a workflow failure.

To disable this for your fork, click on Settings, Actions and General. Check the Disable actions box and click save.

Add the root image repo to the list of allowed repos in the berkeley-dsep-infra secrets.

Now, go to the berkeley-dsep-infra Secrets and Variables. You will need to give your repo permissions to push to the Artifact Registry, as well as to push a branch to the datahub repo.

Edit both DATAHUB_CREATE_PR and GAR_SECRET_KEY, and click on the gear icon, search for your repo name, check the box and save.

Update your deployment’s hubploy.yaml and add the image to the primary list of repos.

You need to let hubploy know the specifics of the image. Change the name of the image in deployments/<hubname>/hubploy.yaml to point to your new image name, and after the name add :PLACEHOLDER in place of the image sha. This will be automatically updated after your new image is built and pushed to the Artifact Registry.

Example:

images:
  images:
    - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/fancynewhub-user-image:PLACEHOLDER

cluster:
  provider: gcloud
  gcloud:
    project: ucb-datahub-2018
    service_key: gke-key.json
    cluster: spring-2024
    zone: us-central1

Next, add the ssh clone path of the root image repo to repos.txt.

Create a PR and merge to staging. You can cancel the Deploy staging and prod hubs job in Actions, or just let it fail.

Add a github bot notification in Slack

Go to the #ucb-datahubs-bots channel, and run the following command:

/github subscribe berkeley-dsep-infra/<your repo name>

Modify the image configuration as necessary

This step is straightforward: create a feature branch, edit/modify/delete/add any files in the image repo to configure the image as needed.

We also strongly recommend copying README-template.md over the default README.md, and modifying it to replace all occurrences of <HUBNAME> with the name of your image.

Submitting a pull request

Familiarize yourself with pull requests and repo2docker , and create a fork of the datahub staging branch.

  1. Set up your git/dev environment by following the instructions here. : - This guide is also located in your image repo!

  2. Test the changes locally using repo2docker, then submit a PR to staging.
    • To use repo2docker, be sure that you are inside the image repo directory on your device, and then run repo2docker ..
  3. Commit and push your changes to your fork of the image repo, and create a new pull request at https://github.com/berkeley-dsep-infra/.

  4. After the build passes, merge your PR in to main and the image will be built again and pushed to the Artifact Registry. If that succeeds, then a commit will be crafted that will update the PLACEHOLDER field in hubploy.yaml with the image’s SHA and pushed to the datahub repo. You can check on the progress of this workflow in your root image repo’s Actions tab.

  5. After 4 is completed successfully, go to the Datahub repo and click on the New pull request button. Next, click on the compare: staging drop down, and you should see a branch named something like update-<hubname>-image-tag-<SHA>. Select that, and create a new pull request.

  6. Once the checks has passed, merge to staging and your new image will be deployed! You can watch the progress here.