Create a New Single User Image

You might need to create a new user image when deploying a new hub, or changing from a shared single user server image. We use repo2docker to generate our images.

There are two approaches to creating a repo2docker image:

  1. Use a repo2docker-style image template (environment.yaml, etc)
  2. Use a Dockerfile (useful for larger/more complex images)

Generally, we prefer to use the former approach, unless we need to install specific packages or utilities outside of python/apt as root. If that is the case, only a Dockerfile format will work.

As always, create a feature branch for your changes, and submit a PR when done.

There are two approaches to pre-populate the image’s assets:

Image Repository Settings

There are now a few steps to set up the CI/CD for the new image repo. In the berkeley-dsep-infra image repo, click on Settings, and under General, scroll down to Pull Requests and check the box labeled Automatically delete head branches.

Scroll back up to the top of the settings, and in the left menu bar, click on Secrets and variables, and then Actions.

From there, click on the Variables tab and then New repository variable. We will be adding two new variables:

  1. HUB: the name of the hub (eg: datahub)

  2. IMAGE: the Google Artifact Registry path and image name. The path will always be ucb-datahub-2018/user-images/<image-name> and the image name will always be the same as the repo: <hubname>-user-image.

Your Fork’s Repository Settings

Now you will want to disable Github Actions for your fork of the image repo. If you don’t, whenever you push PRs to the root repo the workflows in your fork will attempt to run, but don’t have the proper permissions to successfully complete. This will then send you a nag email about a workflow failure.

To disable this for your fork, click on Settings, Actions and General. Check the Disable actions box and click save.

Enable Artifact Registry Pushing

The image repository needs to be added to the list of allowed repositories in the berkeley-dsep-infra secrets. Go to the berkeley-dsep-infra Secrets and Variables. Give your repository permissions to push to the Artifact Registry, as well as to push a branch to the datahub repo.

Edit both DATAHUB_CREATE_PR and GAR_SECRET_KEY, and click on the gear icon, search for your repo name, check the box and save.

Configure hubploy

You need to let hubploy know the specifics of the image by updating your deployment’s hubploy.yaml. Change the name of the image in deployments/<hubname>/hubploy.yaml to point to your new image name, and after the name add :PLACEHOLDER in place of the image sha. This will be automatically updated after your new image is built and pushed to the Artifact Registry.

Example:

images:
  images:
    - name: us-central1-docker.pkg.dev/ucb-datahub-2018/user-images/fancynewhub-user-image:PLACEHOLDER

cluster:
  provider: gcloud
  gcloud:
    project: ucb-datahub-2018
    service_key: gke-key.json
    cluster: spring-2024
    zone: us-central1

Next, add the ssh clone path of the root image repo to repos.txt.

Create a PR and merge to staging. You can cancel the Deploy staging and prod hubs job in Actions, or just let it fail.

Subscribe to GitHub Repo in Slack

Go to the #ucb-datahubs-bots channel, and run the following command:

/github subscribe berkeley-dsep-infra/<your repo name>

Modify the Image

This step is straightforward: create a feature branch, and edit, delete, or add any files to configure the image as needed.

We also strongly recommend copying README-template.md over the default README.md, and modifying it to replace all occurrences of <HUBNAME> with the name of your image.

Submit Pull Requests

Familiarize yourself with pull requests and repo2docker, and create a fork of the datahub staging branch.

  1. Set up your git/dev environment by following the image templat’s contributing guide.

  2. Test the image locally using repo2docker.

  3. Submit a PR to staging.

  4. Commit and push your changes to your fork of the image repo, and create a new pull request at https://github.com/berkeley-dsep-infra/.

  5. After the build passes, merge your PR in to main and the image will be built again and pushed to the Artifact Registry. If that succeeds, then a commit will be crafted that will update the PLACEHOLDER field in hubploy.yaml with the image’s SHA and pushed to the datahub repo. You can check on the progress of this workflow in your root image repo’s Actions tab.

  6. After the previous step is completed successfully, go to the Datahub repo and click on the New pull request button. Next, click on the compare: staging drop down, and you should see a branch named something like update-<hubname>-image-tag-<SHA>. Select that, and create a new pull request.

  7. Once the checks has passed, merge to staging and your new image will be deployed! You can watch the progress in the deploy-hubs workflow.