Syncing JupyterHub Users to CalGroups (Grouper)
This document describes how to configure and operate the jupyterhub_grouper_sync service, which synchronizes lists of users who have logged into our JupyterHubs into a campus Grouper instance (CalGroups).
Overview
The jupyterhub_grouper_sync service is a JupyterHub-managed service that periodically syncs the list of users from a JupyterHub instance to a specified Grouper group. This keeps group membership in CalGroups up to date based on actual hub logins.
Configuration for Hub Admins
The service is configured via traitlets, and can be set up using either command-line arguments or environment variables. The recommended approach is to configure it as a JupyterHub service in your hub’s configuration (e.g., in your hubploy.yaml or config.yaml), and to provide secrets via environment variables.
Required Configuration Variables
You must specify the following configuration variables for the sync service:
- JUPYTERHUB_API_URL: The URL of the JupyterHub API (e.g., http://localhost:8081/hub/api).
- JUPYTERHUB_API_TOKEN: API token for authenticating with the JupyterHub API. This should be provided as a secret.
- GROUPER_BASE_URL: The base URL of the Grouper REST API (e.g., https://calgroups.berkeley.edu/gws/servicesRest/json/v2_2_100).
- GROUPER_USER: Username for authenticating with Grouper (service account).
- GROUPER_PASSWORD: Password for Grouper authentication (should be stored as a secret).
- GROUPER_ID_PATH: The Grouper group path to sync with (e.g., edu:berkeley:app:datahub:datahub-users).
These can be set as environment variables in your hub deployment’s secrets file (e.g., secrets.yaml).
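As a rough pre-flight check, the required variables listed above can be verified before the service starts. The following is a minimal sketch, not part of jupyterhub_grouper_sync itself; the helper name missing_config is hypothetical, and only the variable names come from this document.

```python
import os

# Variable names taken from the "Required Configuration Variables" list.
REQUIRED_VARS = (
    "JUPYTERHUB_API_URL",
    "JUPYTERHUB_API_TOKEN",
    "GROUPER_BASE_URL",
    "GROUPER_USER",
    "GROUPER_PASSWORD",
    "GROUPER_ID_PATH",
)


def missing_config(environ=None):
    """Return the required variable names that are unset or empty.

    `missing_config` is a hypothetical helper for illustration; it reads
    os.environ unless another mapping is passed in.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_config() inside the service's environment returns an empty list when everything is set, and otherwise names the variables that still need a value.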
Example (in a JupyterHub service definition)
    c.JupyterHub.services = [
        {
            'name': "grouper-sync",
            'command': ["grouper-sync"],
            'environment': {
                'JUPYTERHUB_API_URL': 'http://localhost:8081/hub/api',
                'GROUPER_BASE_URL': 'https://calgroups.berkeley.edu/gws/servicesRest/json/v2_2_100',
                'GROUPER_USER': '<grouper user>',
                'GROUPER_PASSWORD': '<grouper password>',
                'GROUPER_ID_PATH': 'edu:berkeley:app:datahub:datahub-users',
            },
        }
    ]
The JUPYTERHUB_API_TOKEN should be provided securely, for example via a Kubernetes secret or your deployment’s secrets management system.
Required JupyterHub Role and Scopes
The service account running the sync must have the following JupyterHub scopes:
- list:users
- read:users
- admin:auth_state
Example role assignment:
    c.JupyterHub.load_roles = [
        {
            "name": "grouper-sync",
            "scopes": [
                "list:users",
                "read:users",
                "admin:auth_state",
            ],
            "services": ["grouper-sync"],
        }
    ]
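To catch a missing scope before deploying, the role list can be checked against the three scopes this document requires. This is a sketch only: missing_scopes is a hypothetical helper, it compares scope names literally, and it does not expand broader JupyterHub scopes that may imply narrower ones.

```python
# Scopes required by the sync service, as listed in this document.
REQUIRED_SCOPES = {"list:users", "read:users", "admin:auth_state"}


def missing_scopes(roles, service_name="grouper-sync"):
    """Return required scopes not granted to `service_name`.

    `roles` has the same shape as c.JupyterHub.load_roles: a list of
    dicts with optional "scopes" and "services" keys. Scope names are
    matched literally (an assumption made to keep the sketch simple).
    """
    granted = set()
    for role in roles:
        if service_name in role.get("services", []):
            granted.update(role.get("scopes", []))
    return REQUIRED_SCOPES - granted
```

Passing the example role assignment to missing_scopes yields an empty set; any scope names it returns are the ones still to be added to the role.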
How and When the Script Runs
The sync service runs as a long-lived process managed by JupyterHub. It performs an initial sync immediately on startup, and then continues to sync at a regular interval (default: every hour, configurable in seconds via the sync_every parameter).
- Startup: The sync runs once immediately when the service starts.
- Periodic Sync: The service then syncs every sync_every seconds (default: 3600 seconds = 1 hour).
- Manual Run: You can also run the script manually if needed for testing or troubleshooting.
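The schedule described above (one sync at startup, then one every sync_every seconds) can be pictured with the loop below. This is an illustrative sketch rather than the service's actual implementation; run_sync_loop, sync_once, max_runs, and the injectable sleep argument are all hypothetical names, with max_runs added only so the loop can be stopped in a test.

```python
import time


def run_sync_loop(sync_once, sync_every=3600, max_runs=None, sleep=time.sleep):
    """Call `sync_once` immediately, then again every `sync_every` seconds.

    `max_runs` bounds the number of syncs (None means run forever), and
    `sleep` can be swapped out so the loop is testable without waiting.
    Returns the number of syncs performed.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        sync_once()  # the first iteration is the startup sync
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break  # skip the trailing wait once the run budget is spent
        sleep(sync_every)
    return runs
```

With max_runs=3 and sync_every=5, the callback runs three times with two five-second waits in between.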
Security and Secrets
- Passwords and tokens should always be stored in your deployment’s secrets (never in version control).
- The Grouper service account should have only the minimum permissions required.
Google Groups Provisioning
The Grouper groups are configured to provision to Google Groups. We manually configure the following settings on the Google Groups:
- General > Who can post > Group managers
- General > Who can view members > Group managers
- Member privacy > Who can view member email addresses > Group managers
If a Google Group is ever deprovisioned and then reprovisioned, or if a new Google Group is provisioned, these settings must be applied again in that group's Group settings in Google Groups.
Troubleshooting
- Check the logs of the grouper-sync service for errors or warnings.
- Ensure all required environment variables and secrets are set.
- Verify that the JupyterHub service account has the correct scopes.
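One way to test the token and scopes directly is to call the JupyterHub REST API's users endpoint with the service's credentials. The sketch below assumes the API accepts an "Authorization: token ..." header and returns a JSON list of user objects with a "name" field; the function names are hypothetical, and on hubs with many users the response may be paginated, so the list may be partial.

```python
import json
import urllib.request


def build_users_request(api_url, token):
    """Build a GET request for the hub's users endpoint.

    `api_url` is the value of JUPYTERHUB_API_URL and `token` the value of
    JUPYTERHUB_API_TOKEN. Hypothetical helper for troubleshooting only.
    """
    url = api_url.rstrip("/") + "/users"
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})


def list_user_names(api_url, token):
    """Return user names from the hub; raises HTTPError (e.g. 403) on failure."""
    request = build_users_request(api_url, token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return [user["name"] for user in json.load(response)]
```

A 403 response from list_user_names usually points at the role or scopes, while a connection error points at JUPYTERHUB_API_URL.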