Dynamic Cloud-Init Configuration in OpenCHAMI
Introduction: Automating Node Bootstrapping with Cloud-Init
Managing a High-Performance Computing (HPC) cluster requires automated, secure, and scalable provisioning. OpenCHAMI leverages Cloud-Init to dynamically configure compute and IO nodes at boot, ensuring each node receives the right software, security policies, and storage setup.
This post walks through a real-world example using Cloud-Init’s group-based configurations to set up two different node types, each with its own automated bootstrapping.
The Use Case: Multi-Group Node Provisioning
We'll configure two nodes, each belonging to a different set of Cloud-Init groups:
Node 1: Compute Node
- Groups: slurm, tenant-foo
- Config: Installs Slurm from OpenHPC, adds a root-foo user with sudo and SSH.

Node 2: IO Node
- Groups: tenant-foo, ephemeral-storage
- Config: Adds root-foo with sudo and SSH (the same tenant-foo group config as the compute node), ensures /dev/sda1 is partitioned and formatted, then mounts it to /opt/ephemeral.

Each group contributes specific configurations, and nodes receive the combined settings from all groups they belong to.
1. Setting Up Group-Based Cloud-Init Configurations
Slurm Group: Installing OpenHPC’s Slurm Client
SLURM_CLOUD_CONFIG_CONTENT=$(cat <<EOF
#cloud-config
package_update: true
package_upgrade: true
yum_repos:
  OpenHPC:
    baseurl: https://repos.openhpc.community/OpenHPC/3/EL_9/
    enabled: true
    gpgcheck: false
    name: OpenHPC
packages:
  - ohpc-slurm-client
EOF
)
SLURM_JSON_PAYLOAD=$(cat <<EOF
{
  "name": "slurm",
  "description": "Nodes in this group install Slurm from OpenHPC",
  "file": {
    "content": "$(echo "$SLURM_CLOUD_CONFIG_CONTENT" | base64 -w 0)",
    "encoding": "base64"
  }
}
EOF
)
curl -X PUT http://localhost:27777/cloud-init/admin/groups/slurm \
  -H "Content-Type: application/json" \
  -d "$SLURM_JSON_PAYLOAD"
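Before sending a group to the server, a quick local check can catch two easy mistakes: a missing #cloud-config header (cloud-init only treats a file as cloud-config when that is its first line) and content that does not survive the Base64 round trip. The sketch below is one way to do that. The function names are hypothetical and not part of OpenCHAMI, and it assumes GNU coreutils base64, as the examples in this post do.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds only if the first line is exactly "#cloud-config".
is_cloud_config() {
  local first_line
  first_line=$(printf '%s\n' "$1" | head -n 1)
  [ "$first_line" = "#cloud-config" ]
}

# Hypothetical helper: encodes the text the same way the payloads in this post do,
# decodes it again, and succeeds only if the result matches the input.
# Trailing newlines are ignored, as they are by $(...) when the config is captured.
roundtrips_base64() {
  local decoded
  decoded=$(printf '%s\n' "$1" | base64 -w 0 | base64 -d)
  [ "$decoded" = "$1" ]
}

# Example use before the PUT request:
#   is_cloud_config "$SLURM_CLOUD_CONFIG_CONTENT" || echo "missing #cloud-config header" >&2
#   roundtrips_base64 "$SLURM_CLOUD_CONFIG_CONTENT" || echo "base64 round trip failed" >&2
```

Neither check validates the YAML itself; they only guard against the header and encoding slips that are hardest to spot once the content is wrapped in JSON.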
Tenant-Foo Group: Adding a Privileged User with SSH Access
TENANT_FOO_CLOUD_CONFIG_CONTENT=$(cat <<EOF
#cloud-config
users:
  - name: root-foo
    gecos: "Tenant Foo User"
    sudo: "ALL=(ALL) NOPASSWD:ALL"
    shell: /bin/bash
    ssh_authorized_keys:
      - "ecdsa-sha2-nistp256 AAAAE2...user@domain.com"
EOF
)
TENANT_FOO_JSON_PAYLOAD=$(cat <<EOF
{
  "name": "tenant-foo",
  "description": "Adds root-foo with sudo and SSH key",
  "file": {
    "content": "$(echo "$TENANT_FOO_CLOUD_CONFIG_CONTENT" | base64 -w 0)",
    "encoding": "base64"
  }
}
EOF
)
curl -X PUT http://localhost:27777/cloud-init/admin/groups/tenant-foo \
  -H "Content-Type: application/json" \
  -d "$TENANT_FOO_JSON_PAYLOAD"
Ephemeral Storage Group: Formatting and Mounting /dev/sda1
EPHEMERAL_STORAGE_CLOUD_CONFIG_CONTENT=$(cat <<EOF
#cloud-config
disk_setup:
  /dev/sda:
    table_type: gpt
    layout: true
    overwrite: false
fs_setup:
  # XFS labels are limited to 12 characters, so the label is shorter than the group name.
  - label: ephemeral
    filesystem: xfs
    device: /dev/sda1
    partition: auto
mounts:
  - [ "/dev/sda1", "/opt/ephemeral", "xfs", "defaults,nofail", "0", "2" ]
EOF
)
EPHEMERAL_STORAGE_JSON_PAYLOAD=$(cat <<EOF
{
  "name": "ephemeral-storage",
  "description": "Ensures /dev/sda1 is partitioned, formatted as XFS, and mounted at /opt/ephemeral",
  "file": {
    "content": "$(echo "$EPHEMERAL_STORAGE_CLOUD_CONFIG_CONTENT" | base64 -w 0)",
    "encoding": "base64"
  }
}
EOF
)
curl -X PUT http://localhost:27777/cloud-init/admin/groups/ephemeral-storage \
  -H "Content-Type: application/json" \
  -d "$EPHEMERAL_STORAGE_JSON_PAYLOAD"
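All three groups use the same payload shape, so the repeated heredocs could be folded into a small helper. This is a minimal sketch rather than anything OpenCHAMI ships: build_group_payload is a hypothetical name, and it does not escape quotes or backslashes, so the name and description need to stay plain text.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the JSON body for a group PUT request.
# Arguments: <group name> <description> <cloud-config text>
build_group_payload() {
  local name="$1" description="$2" content="$3"
  local encoded
  # -w 0 (GNU coreutils) disables line wrapping so the value stays on one line.
  encoded=$(printf '%s\n' "$content" | base64 -w 0)
  cat <<EOF
{
  "name": "$name",
  "description": "$description",
  "file": {
    "content": "$encoded",
    "encoding": "base64"
  }
}
EOF
}

# Example use, with the endpoint from this post:
#   curl -X PUT http://localhost:27777/cloud-init/admin/groups/slurm \
#     -H "Content-Type: application/json" \
#     -d "$(build_group_payload slurm "Nodes in this group install Slurm from OpenHPC" "$SLURM_CLOUD_CONFIG_CONTENT")"
```

Keeping the encoding in one place means a change to the payload format only has to be made once, however many groups you define.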
2. What Happens at Boot?
At boot, each node requests cloud-init information in a standard order:
1. Requests /meta-data:
   - Retrieves inventory information, including a unique instance-id for each boot.
   - Includes hostname, location, and other identity details when available.
2. Requests /user-data:
   - Reserved for future use, currently empty in OpenCHAMI.
3. Requests /vendor-data:
   - Returns a list of cloud-config YAML files, one for each group the node belongs to.
   - Example: If a node is part of io and tenant-foo, it receives a list of: /io.yaml, /tenant-foo.yaml
4. Processes Cloud-Config Files:
   - The cloud-init client fetches each listed YAML file, parses them, and applies configurations.
5. Sends phone-home Confirmation:
   - After processing all #cloud-config files, the node sends a status update to the cloud-init server indicating it has fully booted.
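The request order and the vendor-data file list described above can be sketched in a few lines of shell. This only illustrates the naming pattern from the example (one /<group>.yaml entry per group). The function names are hypothetical, and the exact response format of the server is not shown here.

```shell
#!/usr/bin/env bash

# Paths a node requests at boot, in order, as described above.
# They are relative to the cloud-init server's base URL for nodes, which this sketch does not assume.
boot_request_paths() {
  printf '%s\n' /meta-data /user-data /vendor-data
}

# Hypothetical helper: given a node's groups, print one cloud-config file path per group.
vendor_data_files() {
  local group
  for group in "$@"; do
    printf '/%s.yaml\n' "$group"
  done
}

# Example:
#   vendor_data_files io tenant-foo
# prints:
#   /io.yaml
#   /tenant-foo.yaml
```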
Example Boot Configurations
Compute Node (Groups: slurm, tenant-foo)
- Installs Slurm
- Adds root-foo user with SSH and sudo access

IO Node (Groups: tenant-foo, ephemeral-storage)
- Adds root-foo user with SSH and sudo access
- Partitions and formats /dev/sda1 if necessary
- Mounts /dev/sda1 to /opt/ephemeral
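Once a node has booted, these outcomes can be spot-checked from a shell on the node. The sketch below is one possible approach: the check and verify_* function names are hypothetical, while id, rpm -q, and findmnt are the standard tools they call. It assumes an RPM-based image, which matches the EL_9 repository used for the slurm group.

```shell
#!/usr/bin/env bash

# Hypothetical helper: runs a command quietly and reports OK or FAIL with a label.
check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $label"
  else
    echo "FAIL $label"
    return 1
  fi
}

# One check per group used in this post.
verify_tenant_foo()        { check "root-foo user exists" id root-foo; }
verify_slurm()             { check "ohpc-slurm-client installed" rpm -q ohpc-slurm-client; }
verify_ephemeral_storage() { check "/opt/ephemeral is mounted" findmnt /opt/ephemeral; }

# Example, on the compute node:
#   verify_slurm; verify_tenant_foo
# Example, on the IO node:
#   verify_tenant_foo; verify_ephemeral_storage
```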
Next Steps
- Explore OpenCHAMI's Cloud-Init Repo (GitHub: OpenCHAMI/cloud-init)
- Learn About Secure Bootstrapping with WireGuard (Security Docs)
- Define Custom Configurations for Your Nodes
Why This Matters
- Uses OpenCHAMI's API as intended: each cloud-config file is Base64-encoded and wrapped in a JSON payload.
- Automates cluster-wide provisioning without manual intervention.
- Ensures each node gets exactly what it needs based on its role.
By leveraging Cloud-Init with OpenCHAMI, HPC admins can securely and automatically configure compute and IO nodes at scale, without managing per-node configurations manually.
Want to see more Cloud-Init examples? Join the OpenCHAMI community and help shape the future of HPC automation!