About This Opportunity
HPC Specialist – Role Overview Arca dion is seeking an HPC (High-Performance Computing) Specialist responsible for the design, deployment, optimization, and management of high-performance computing systems, often used in scientific research, engineering simulations, AI/ML workloads, and large-scale data analytics.
Core Responsibilities
Architecture & System Design
Design scalable HPC clusters (on-prem, cloud, or hybrid)
Choose appropriate CPUs, GPUs, interconnects (e.g., InfiniBand), and storage
Configure Slurm, PBS, or OpenHPC job schedulers
Cluster Deployment & Maintenance
Install and manage Linux-based compute nodes
Maintain job schedulers and resource managers
Integrate monitoring tools (Prometheus, Grafana, Nagios)
Performance Tuning & Optimization
Benchmark workloads and tune for performance (e.g., MPI, CUDA, OpenMP)
Optimize I/O and inter-node communication
Ensure...