Parallel and Distributed Systems Group

Computer Science Department of Telecom SudParis

A system for massively parallel hyperparameter tuning

Reading group: Ali Mammadov presented "A system for massively parallel hyperparameter tuning" (MLSys'20) via videoconference on 15 January 2021 at 10:00.

You can find the video of the presentation here.

Abstract

Modern learning models are characterized by large hyperparameter spaces and long training times. These properties, coupled with the rise of parallel computing and the growing demand to productionize machine learning workloads, motivate the need to develop mature hyperparameter optimization functionality in distributed computing settings. We address this challenge by first introducing a simple and robust hyperparameter optimization algorithm called ASHA, which exploits parallelism and aggressive early-stopping to tackle large-scale hyperparameter optimization problems. Our extensive empirical results show that ASHA outperforms existing state-of-the-art hyperparameter optimization methods; scales linearly with the number of workers in distributed settings; and is suitable for massive parallelism, as demonstrated on a task with 500 workers. We then describe several design decisions we encountered, along with our associated solutions, when integrating ASHA in Determined AI’s end-to-end production-quality machine learning system that offers hyperparameter tuning as a service.
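To give an intuition for the asynchronous promotion rule at the heart of ASHA (Asynchronous Successive Halving), here is a minimal Python sketch. It assumes a reduction factor eta and rungs of increasing training budget; the function and variable names (get_job, report, rungs) are illustrative only and are not taken from the paper's or Determined AI's implementation.

```python
import random

# Sketch of ASHA's promotion logic: rungs[k] holds (config, loss) results
# for runs trained at budget r0 * ETA**k. A configuration is promotable
# once it ranks in the top 1/ETA of its rung and has not been promoted yet.
ETA = 4            # reduction factor
MAX_RUNG = 3       # highest rung (largest training budget)

rungs = [[] for _ in range(MAX_RUNG + 1)]       # rungs[k]: list of (config, loss)
promoted = [set() for _ in range(MAX_RUNG + 1)]  # configs already promoted from rung k

def get_job():
    """Called whenever a worker frees up: promote if possible, else grow the base rung."""
    # Scan rungs top-down so the most advanced promotable config runs first.
    for k in reversed(range(MAX_RUNG)):
        results = sorted(rungs[k], key=lambda cl: cl[1])   # ascending loss
        top = results[: len(results) // ETA]               # top 1/ETA of the rung
        candidates = [c for c, _ in top if c not in promoted[k]]
        if candidates:
            config = candidates[0]
            promoted[k].add(config)
            return config, k + 1        # train this config at the next rung
    # No promotion possible: sample a new configuration at the bottom rung.
    return ("lr", 10 ** random.uniform(-4, -1)), 0

def report(config, rung, loss):
    """A worker reports a finished run; the result joins its rung."""
    rungs[rung].append((config, loss))
```

Because get_job never blocks waiting for a rung to fill up, idle workers can always either promote a promising configuration or start a fresh one, which is what lets the method scale to hundreds of workers.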