🔧 Dynamic Cloud-Init Configuration in OpenCHAMI

Introduction: Automating Node Bootstrapping with Cloud-Init

Managing a High-Performance Computing (HPC) cluster requires automated, secure, and scalable provisioning. OpenCHAMI leverages Cloud-Init to dynamically configure compute and IO nodes at boot, ensuring each node receives the right software, security policies, and storage setup.

This post walks through a real-world example using Cloud-Init’s group-based configurations to set up two different node types, each with its own automated bootstrapping.

🔹 The Use Case: Multi-Group Node Provisioning

We’ll configure two nodes, each belonging to different Cloud-Init groups:

  • Node 1: Compute Node

    • Groups: slurm, tenant-foo
    • Config: Installs Slurm from OpenHPC, adds a root-foo user with sudo and SSH.
  • Node 2: IO Node

    • Groups: tenant-foo, ephemeral-storage
    • Config: Adds root-foo with SSH, ensures /dev/sda1 is partitioned and formatted, then mounts it to /opt/ephemeral.

Each group contributes specific configurations, and nodes receive the combined settings from all groups they belong to.
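To make the group-to-node relationship concrete, here is a minimal sketch that lists which group configurations each node would pick up. The node names `compute-1` and `io-1` and both function names are hypothetical, chosen only for illustration; the group lists come from the use case above. How membership is actually stored in OpenCHAMI is not covered here.

```shell
#!/bin/sh
# Hypothetical node names; the group lists mirror the use case above.
groups_for_node() {
  case "$1" in
    compute-1) echo "slurm tenant-foo" ;;
    io-1)      echo "tenant-foo ephemeral-storage" ;;
    *)         return 1 ;;
  esac
}

# Print one line per group whose cloud-config the node would receive.
configs_for_node() {
  node_groups="$(groups_for_node "$1")" || { echo "unknown node: $1" >&2; return 1; }
  for group in $node_groups; do
    echo "$1 <- $group"
  done
}

configs_for_node compute-1
configs_for_node io-1
```

Both nodes share tenant-foo, so a change to that one group would reach both of them on their next boot.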

1️⃣ Setting Up Group-Based Cloud-Init Configurations

Slurm Group: Installing OpenHPC’s Slurm Client

SLURM_CLOUD_CONFIG_CONTENT=$(cat <<'EOF'
#cloud-config
package_update: true
package_upgrade: true
yum_repos:
  OpenHPC:
    baseurl: https://repos.openhpc.community/OpenHPC/3/EL_9/
    enabled: true
    gpgcheck: false
    name: OpenHPC
packages:
  - ohpc-slurm-client
EOF
)

SLURM_JSON_PAYLOAD=$(cat <<EOF
{
  "name": "slurm",
  "description": "Nodes in this group install Slurm from OpenHPC",
  "file": {
    "content": "$(echo "$SLURM_CLOUD_CONFIG_CONTENT" | base64 -w 0)",
    "encoding": "base64"
  }
}
EOF
)

curl -X PUT http://localhost:27777/cloud-init/admin/groups/slurm \
     -H "Content-Type: application/json" \
     -d "$SLURM_JSON_PAYLOAD"
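Before sending a payload like this, it can help to confirm that the Base64 content decodes back to the cloud-config you intended. The following is a minimal sketch, assuming GNU coreutils `base64` (the same `-w 0` flag used above) and a shortened sample config. `encode_config` and `decode_config` are hypothetical helper names, not part of OpenCHAMI.

```shell
#!/bin/sh
# Hypothetical helpers; they only wrap the base64 calls used in this post.
encode_config() { printf '%s\n' "$1" | base64 -w 0; }
decode_config() { printf '%s' "$1" | base64 -d; }

# Shortened sample; substitute your real cloud-config here.
SAMPLE_CONFIG='#cloud-config
packages:
  - ohpc-slurm-client'

ENCODED="$(encode_config "$SAMPLE_CONFIG")"
DECODED="$(decode_config "$ENCODED")"

# Command substitution strips the trailing newline, so the two should match.
if [ "$DECODED" = "$SAMPLE_CONFIG" ]; then
  echo "round-trip OK"
else
  echo "round-trip mismatch" >&2
fi
```

The `-w 0` flag matters because it keeps the encoded content on a single line, which is what allows it to sit inside a JSON string.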

Tenant-Foo Group: Adding a Privileged User with SSH Access

TENANT_FOO_CLOUD_CONFIG_CONTENT=$(cat <<'EOF'
#cloud-config
users:
  - name: root-foo
    gecos: "Tenant Foo User"
    sudo: "ALL=(ALL) NOPASSWD:ALL"
    shell: /bin/bash
    ssh_authorized_keys:
      - "ecdsa-sha2-nistp256 AAAAE2...user@domain.com"
EOF
)

TENANT_FOO_JSON_PAYLOAD=$(cat <<EOF
{
  "name": "tenant-foo",
  "description": "Adds root-foo with sudo and SSH key",
  "file": {
    "content": "$(echo "$TENANT_FOO_CLOUD_CONFIG_CONTENT" | base64 -w 0)",
    "encoding": "base64"
  }
}
EOF
)

curl -X PUT http://localhost:27777/cloud-init/admin/groups/tenant-foo \
     -H "Content-Type: application/json" \
     -d "$TENANT_FOO_JSON_PAYLOAD"

Ephemeral Storage Group: Formatting and Mounting /dev/sda1

EPHEMERAL_STORAGE_CLOUD_CONFIG_CONTENT=$(cat <<'EOF'
#cloud-config
disk_setup:
  /dev/sda:
    table_type: gpt
    layout: true
    overwrite: false
fs_setup:
  # XFS labels are limited to 12 characters
  - label: ephemeral
    filesystem: xfs
    device: /dev/sda1
    partition: auto
mounts:
  - [ "/dev/sda1", "/opt/ephemeral", "xfs", "defaults,nofail", "0", "2" ]
EOF
)

EPHEMERAL_STORAGE_JSON_PAYLOAD=$(cat <<EOF
{
  "name": "ephemeral-storage",
  "description": "Ensures /dev/sda1 is partitioned, formatted as XFS, and mounted at /opt/ephemeral",
  "file": {
    "content": "$(echo "$EPHEMERAL_STORAGE_CLOUD_CONFIG_CONTENT" | base64 -w 0)",
    "encoding": "base64"
  }
}
EOF
)

curl -X PUT http://localhost:27777/cloud-init/admin/groups/ephemeral-storage \
     -H "Content-Type: application/json" \
     -d "$EPHEMERAL_STORAGE_JSON_PAYLOAD"
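All three group uploads share one shape: a name, a description, and Base64-encoded cloud-config sent with PUT to the group's URL. That repetition could be folded into a small helper. This is a sketch, not an OpenCHAMI tool: `build_group_payload`, `put_group`, and the `DRY_RUN` switch are hypothetical names, and only the endpoint and JSON fields come from the calls above. It assumes GNU `base64` and does no JSON escaping.

```shell
#!/bin/sh
# Base URL taken from the curl calls in this post.
CLOUD_INIT_ADMIN_URL="http://localhost:27777/cloud-init/admin/groups"

# build_group_payload NAME DESCRIPTION CLOUD_CONFIG
# Prints the JSON body. NAME and DESCRIPTION must not contain quotes or
# backslashes, because this sketch does no JSON escaping.
build_group_payload() {
  content="$(printf '%s\n' "$3" | base64 -w 0)"
  printf '{"name": "%s", "description": "%s", "file": {"content": "%s", "encoding": "base64"}}\n' \
    "$1" "$2" "$content"
}

# put_group NAME DESCRIPTION CLOUD_CONFIG
# With DRY_RUN=1 (the default here) the request is printed instead of sent.
put_group() {
  payload="$(build_group_payload "$1" "$2" "$3")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "PUT $CLOUD_INIT_ADMIN_URL/$1"
    echo "$payload"
  else
    curl -X PUT "$CLOUD_INIT_ADMIN_URL/$1" \
         -H "Content-Type: application/json" \
         -d "$payload"
  fi
}

# Shortened example config; prints the request without contacting a server.
put_group "tenant-foo" "Adds root-foo with sudo and SSH key" '#cloud-config
users:
  - name: root-foo'
```

Setting `DRY_RUN=0` in the environment would send the request for real. Keeping the dry run as the default is a deliberate choice here, so that pasting the snippet never modifies a live server by accident.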

2️⃣ What Happens at Boot?

At boot, each node requests cloud-init information in a standard order:

  1. Requests /meta-data:

    • Retrieves inventory information, including a unique instance-id for each boot.
    • Includes hostname, location, and other identity details when available.
  2. Requests /user-data:

    • Reserved for future use; currently empty in OpenCHAMI.
  3. Requests /vendor-data:

    • Returns a list of cloud-config YAML files, one for each group the node belongs to.
    • Example: a node that is part of io and tenant-foo receives a two-entry list, one cloud-config file per group.
  4. Processes Cloud-Config Files:

    • The cloud-init client fetches each listed YAML file, parses it, and applies its configuration.
  5. Sends phone-home Confirmation:

    • After processing all #cloud-config files, the node sends a status update to the cloud-init server indicating it has fully booted.
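The request order above can be replayed by hand when debugging what a node would be served. The sketch below makes one loud assumption: that the three endpoints live under `http://localhost:27777/cloud-init`, which is inferred from the admin URL used earlier and not confirmed by this post. `boot_request_order` and `fetch_boot_data` are hypothetical names.

```shell
#!/bin/sh
# Assumed base URL (not confirmed by this post); override via the environment.
CLOUD_INIT_BASE_URL="${CLOUD_INIT_BASE_URL:-http://localhost:27777/cloud-init}"

# Print the endpoints in the order a node requests them.
boot_request_order() {
  for endpoint in meta-data user-data vendor-data; do
    echo "$CLOUD_INIT_BASE_URL/$endpoint"
  done
}

# Fetch each endpoint in order and report how many characters came back.
# Not called automatically: it needs a reachable cloud-init server.
fetch_boot_data() {
  for url in $(boot_request_order); do
    body="$(curl -fsS "$url")" || { echo "FAILED $url" >&2; return 1; }
    echo "$url: ${#body} characters"
  done
}

boot_request_order
```

Running `fetch_boot_data` from a machine that can reach the server would stop at the first endpoint that fails, which narrows down where in the sequence a boot is going wrong.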

🚀 Example Boot Configurations

  • Compute Node (Groups: slurm, tenant-foo)
    ✅ Installs Slurm
    ✅ Adds root-foo user with SSH and sudo access

  • IO Node (Groups: tenant-foo, ephemeral-storage)
    ✅ Adds root-foo user with SSH access
    ✅ Partitions and formats /dev/sda1 if necessary
    ✅ Mounts /dev/sda1 to /opt/ephemeral
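Once a node is up, the outcomes listed above can be spot-checked from a shell on that node. This is a minimal sketch rather than an OpenCHAMI feature: `check`, `verify_compute_node`, and `verify_io_node` are hypothetical names, and it assumes an RPM-based image with `cloud-init`, `rpm`, `id`, and `mountpoint` available.

```shell
#!/bin/sh
FAILURES=0

# check DESCRIPTION COMMAND [ARGS...]
# Runs the command quietly and records whether it succeeded.
check() {
  description="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $description"
  else
    echo "FAIL  $description"
    FAILURES=$((FAILURES + 1))
  fi
}

# Checks for the compute node (groups: slurm, tenant-foo).
verify_compute_node() {
  check "cloud-init finished" cloud-init status --wait
  check "Slurm client package installed" rpm -q ohpc-slurm-client
  check "root-foo user exists" id root-foo
}

# Checks for the IO node (groups: tenant-foo, ephemeral-storage).
verify_io_node() {
  check "cloud-init finished" cloud-init status --wait
  check "root-foo user exists" id root-foo
  check "/opt/ephemeral is a mount point" mountpoint -q /opt/ephemeral
}

# Nothing runs automatically. On a booted node, call verify_compute_node
# or verify_io_node, then inspect $FAILURES.
```

After calling one of the verify functions, a non-zero `FAILURES` count means at least one group's configuration did not take effect.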

🔗 Next Steps

💡 Why This Matters

✔️ Uses OpenCHAMI’s API as intended (JSON payloads carrying Base64-encoded cloud-config).
✔️ Automates cluster-wide provisioning without manual intervention.
✔️ Ensures each node gets exactly what it needs based on its role.

By leveraging Cloud-Init with OpenCHAMI, HPC admins can securely and automatically configure compute and IO nodes at scale, without managing per-node configurations manually.

🚀 Want to see more Cloud-Init examples? Join the OpenCHAMI community and help shape the future of HPC automation!