January 24, 2024 by Alex Lovell-Troy
At Los Alamos National Laboratory, we’re on the brink of launching a turnkey version of OpenCHAMI, specifically tailored for summer intern teams at the Supercomputer Institute. This initiative underscores our commitment to fostering the collaborative development of HPC (High-Performance Computing) systems with cloud-like management capabilities. This post will explore our motivations for this project, detail the features we’ve chosen to include (and exclude), and discuss the impact of these decisions on OpenCHAMI’s evolution.
We are proud to offer an 11-week paid internship at Los Alamos National Laboratory, aimed at introducing students from diverse backgrounds to HPC techniques and practices. Beginning in May 2024, the program includes a two-week boot camp designed to equip students with the skills necessary to construct and manage HPC clusters, regardless of their previous experience or academic background.
Learn more at Los Alamos National Laboratory’s website.
The boot camp, led by experienced sysadmins from our HPC group, covers everything from managing HPC hardware to assembling functional clusters capable of running HPC jobs with Slurm. With a curriculum that evolves yearly, we emphasize hands-on manual operations before progressing to automation, ensuring students gain a solid understanding of HPC system components.
At the Supercomputing Conference in 2023, we showcased how OpenCHAMI could manage a small HPC cluster, using hardware identical to what’s found in the Supercomputer Institute (SI). This demonstration not only highlighted OpenCHAMI’s potential as a teaching tool but also confirmed our belief in the educational value of exposing students to core system management protocols and modern computing concepts.
Despite the success of our demonstration, we recognize the need for further simplification to make OpenCHAMI more accessible and user-friendly for intern teams. Our aim is to enhance ease of use, speed, and safety in the coming months.
Feedback from our demonstration highlighted a need for better documentation. Teams eager to replicate our setup encountered challenges with customization due to limited access to microservice settings and complex system interfaces. This feedback has been invaluable, prompting us to focus on enhancing documentation and system extensibility to support the SI’s educational goals.
The SI’s blend of novice users and expert mentors will create an environment where “all bugs are shallow,” facilitating rapid learning and system improvement.
Our development team is diligently updating our GitHub Roadmap, focusing on simplifying OpenCHAMI deployment for the SI, enhancing the instruction manual, and improving microservices usability.
As we prepare each student to manage their own cluster, complete with a dedicated head node and several compute nodes all isolated within their own network, we are focusing on a deployment strategy that simplifies the process while maintaining robust functionality. Based on our experience and the feedback we received, we aim to consolidate services with docker-compose for ease of management and scalability. Some example core components that may be part of the OpenCHAMI deployment recipe are outlined in the note and sketch below.
Note: OpenCHAMI is designed to be modular, and each service listed provides a specific piece of functionality. If we can achieve the same goal with a different component, we'll switch it up. Based on discussion in our issues, we may switch to dnsmasq for DHCP services in the SI while preserving the option for coredhcp or other solutions in the future. You can follow the discussion about what to include and what to exclude in our roadmap issue.
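To make that consolidation concrete, here is a minimal docker-compose sketch of what such a recipe could look like. The service names, container images, and ports are illustrative assumptions rather than the confirmed component list; follow the roadmap issue for the services we actually settle on.

```yaml
# Minimal, illustrative docker-compose sketch for a turnkey OpenCHAMI head node.
# Service names, image tags, and ports are placeholders, not the final SI recipe;
# the roadmap issue tracks which components are actually included.
services:
  postgres:
    image: postgres:16                     # backing store for inventory/boot state
    environment:
      POSTGRES_USER: ochami                # placeholder credentials for a lab setup
      POSTGRES_PASSWORD: changeme
    volumes:
      - pgdata:/var/lib/postgresql/data

  smd:                                     # hardware state/inventory service (placeholder image path)
    image: ghcr.io/openchami/smd:latest
    depends_on:
      - postgres
    ports:
      - "27779:27779"

  bss:                                     # boot script service (placeholder image path)
    image: ghcr.io/openchami/bss:latest
    depends_on:
      - smd
    ports:
      - "27778:27778"

  dhcp:                                    # dnsmasq or coredhcp, per the roadmap discussion
    image: local/dnsmasq:latest            # placeholder; substitute the chosen DHCP image
    network_mode: host                     # DHCP needs direct access to the cluster's isolated network
    cap_add:
      - NET_ADMIN

volumes:
  pgdata:
```

With a file like this on each head node, a single `docker compose up -d` brings the management services online, and students can tear the stack down and rebuild it as they experiment.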
By incorporating these components into our docker-compose files, we ensure that each cluster is equipped with the necessary services to operate efficiently and securely. This approach not only streamlines the deployment process but also provides a modular and flexible framework for managing HPC clusters.
The Turnkey OpenCHAMI project for the Supercomputer Institute marks a significant advance in HPC education and collaborative development. By addressing essential feedback and focusing on user-centric improvements, we’re not only enhancing OpenCHAMI but also enriching the educational journey for future HPC professionals. As we progress, we look forward to the innovative contributions this initiative will spark, furthering our mission to democratize high-performance computing.
Engage with us on GitHub, Slack, or through our Contact Page to be part of this exciting journey.