Cerebras Systems achieves 130x performance boost in nuclear simulation over Nvidia A100 GPUs.

November 15, 2023

Monte Carlo particle transport is a cornerstone of High-Performance Computing (HPC): it provides high-fidelity simulations of radiation transport and plays a crucial role in the design of fission and fusion reactors. In a collaborative research effort, the Cerebras CS-2 system outperformed a highly optimized GPU implementation on the most demanding segment of the Monte Carlo neutron particle transport algorithm: the macroscopic cross-section lookup kernel. This kernel is the most computationally intensive part of the simulation and can account for up to 85% of total runtime in many nuclear energy applications. The achievement not only underscores the efficacy of Argonne's ALCF AI Testbed program, which aims to elevate AI accelerators in U.S. supercomputing infrastructure, but also signals a potential shift challenging the dominance of GPUs in HPC simulation.

John R. Tramm, Assistant Computational Scientist at Argonne National Laboratory, shared his perspective: "I've implemented this kernel in a half dozen different programming models and have run it on just about every HPC architecture over the last decade. The performance numbers we were able to get out of the Cerebras machine impressed our team—a clear advancement over what has been possible on CPU or GPU architectures to-date. Our team's work adds to growing evidence that AI accelerators have serious potential to disrupt GPU dominance in the field of HPC simulation."

The Monte Carlo neutron particle transport simulation provides a detailed representation of radiation transport, with the macroscopic cross-section lookup kernel playing a pivotal role in assembling statistical distribution data. Scientists at Argonne National Laboratory used the Cerebras SDK and the CSL programming language to optimize this kernel for the Cerebras CS-2's wafer-scale architecture, which offers 850,000 cores and 40 GB of on-chip SRAM. This architecture provides a combination of extreme bandwidth and low latency that is well suited to Monte Carlo particle simulations. The study validates the Cerebras architecture's potential for external researchers to develop high-performance HPC applications and highlights the efficiency of the CS-2's design. Andrew Feldman, CEO and co-founder of Cerebras Systems, emphasized, "These published results highlight not only the incredible performance of the CS-2 but also its architectural efficiency." The CS-2, powered by the WSE-2 processor, demonstrated a 130x speedup while using 48x more transistors than the A100, a 2.7x gain in architectural efficiency on a problem widely optimized for GPUs.
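The article does not show the CSL kernel itself, but the general shape of a macroscopic cross-section lookup is well known from Monte Carlo transport benchmarks: for a given neutron energy, look up each nuclide's microscopic cross section by searching its energy grid and interpolating, then sum the density-weighted results. The following Python sketch illustrates that pattern with made-up grids, cross sections, and number densities; it is not the Argonne or Cerebras implementation.

```python
import bisect

def micro_xs(energy_grid, xs_values, E):
    """Linearly interpolate a nuclide's microscopic cross section at energy E.

    energy_grid is a sorted list of energies; xs_values holds the
    cross section (barns) tabulated at each grid point.
    """
    # Binary search for the grid interval containing E, clamped to the table.
    i = bisect.bisect_right(energy_grid, E) - 1
    i = max(0, min(i, len(energy_grid) - 2))
    f = (E - energy_grid[i]) / (energy_grid[i + 1] - energy_grid[i])
    return xs_values[i] + f * (xs_values[i + 1] - xs_values[i])

def macro_xs(nuclides, densities, E):
    """Sum density-weighted microscopic cross sections over all nuclides
    in a material to form the macroscopic cross section (units: 1/cm)."""
    return sum(N * micro_xs(grid, xs, E)
               for (grid, xs), N in zip(nuclides, densities))

# Hypothetical two-nuclide material (illustrative numbers only).
nuclides = [
    ([1.0, 2.0, 3.0], [10.0, 8.0, 6.0]),  # nuclide A: energy grid, sigma
    ([1.0, 2.0, 3.0], [4.0, 4.0, 4.0]),   # nuclide B
]
densities = [0.5, 0.25]  # atom densities (atoms/barn-cm)

print(macro_xs(nuclides, densities, 1.5))  # 0.5*9.0 + 0.25*4.0 = 5.5
```

In a production code the inner search and interpolation run once per particle per nuclide, which is why this memory-bound lookup dominates runtime and why an architecture with large on-chip SRAM benefits it.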

Furthermore, the Cerebras CS-2 exhibited strong scaling, delivering high performance in both small- and large-scale simulations. The researchers noted that even in smaller-scale simulations, no parallel arrangement of GPUs could match the performance of a single CS-2. Designed for generative AI and scientific applications, the CS-2, powered by the WSE-2, has delivered impressive results in scientific computing, often characterized by "100x" improvements. Examples include a seismic processing project at the King Abdullah University of Science and Technology and computational fluid dynamics work at the National Energy Technology Laboratory, where the CS-2 outperformed traditional solutions by significant margins.
