JupyterHub ORM Maintenance

Performance

JupyterHub performance sometimes scales with the total number of users in its ORM database, rather than the number of running users. Reducing the user count enables the hub to restart much faster. While this issue should be addressed, we can work around it by deleting inactive users from the hub database once in a while. Note that this does not delete the user’s storage.

The script scripts/delete-unused-users.py will delete anyone who hasn’t registered any activity in a given period of time, double checking to make sure they aren’t active right now. This will require users to log in again the next time they use the hub.

This should be done before the start of each semester, particularly on hubs with a lot of users.

Running the script

./delete-unused-users.py --help
usage: delete-unused-users.py [-h] [-c CREDENTIALS] [-H HUB_URL] [--dry_run]
                              [--inactive_since INACTIVE_SINCE] [-v] [-d]

options:
  -h, --help            show this help message and exit
  -c CREDENTIALS, --credentials CREDENTIALS
                        Path to a json file containing hub url and api keys.
                        Format is: {"hub1_url": "hub1_key", "hub2_url":, "hub2_key"}
  -H HUB_URL, --hub_url HUB_URL
                        Fully qualified URL to the JupyterHub. You must also
                        set the JUPYTERHUB_API_TOKEN environment variable with
                        the API key.
  --dry_run             Dry run without deleting users.
  --inactive_since INACTIVE_SINCE
                        Period of inactivity after which users are considered
                        for deletion (literal string constructor values for
                        timedelta objects).
  -v, --verbose         Set info log level.
  -d, --debug           Set debug log level.

The ‘best’ way to run this script is to log in to each hub and in the Admin page, generate a token. The URL will be {hub_url}/hub/token. You can store the tokens in a json-like configuration file on your device with the following format:

{
  "https://a11y.datahub.berkeley.edu": "XXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "https://astro.datahub.berkeley.edu": "XXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "https://biology.datahub.berkeley.edu": "XXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "https://cee.datahub.berkeley.edu": "XXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "https://data8.datahub.berkeley.edu": "XXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "https://data100.datahub.berkeley.edu": "XXXXXXXXXXXXXXXXXXXXXXXXXXX",
  "https://data101.datahub.berkeley.edu": "XXXXXXXXXXXXXXXXXXXXXXXXXXX",
}

Then you can execute the script as such:

./delete-unused-users.py -c ~/.datahub/hub-api-tokens.json -v --inactive_since=days=30