HPC File Systems Fail for Deep Learning at Scale
When the team ran their deep learning job on over 27,000 GPUs and all of those graphics engines wanted to read data from the file system on Summit …