FLOAT: Federated Learning Optimizations with Automated Tuning

Reading group: Brian Ooi presented "FLOAT: Federated Learning Optimizations with Automated Tuning" (EuroSys'24) in room 4A312 on 17 May 2024 at 10:00.

Abstract

Federated Learning (FL) has emerged as a powerful approach that enables collaborative distributed model training without the need for data sharing. However, FL grapples with inherent heterogeneity challenges, leading to issues such as stragglers, dropouts, and performance variations. Selection of clients to run an FL instance is crucial, but existing strategies introduce biases and participation issues and do not consider resource efficiency. Communication and training acceleration solutions proposed to increase client participation also fall short due to the dynamic nature of system resources. We address these challenges in this paper by designing FLOAT, a novel framework that boosts FL client resource awareness. FLOAT dynamically optimizes resource utilization to meet training deadlines, and mitigates stragglers and dropouts through various optimization techniques, leading to enhanced model convergence and improved performance. FLOAT leverages multi-objective Reinforcement Learning with Human Feedback (RLHF) to automate the selection of the optimization techniques and their configurations, tailoring them to individual client resource conditions. Moreover, FLOAT seamlessly integrates into existing FL systems, remaining non-intrusive and versatile for both asynchronous and synchronous FL settings. In our evaluations, FLOAT increases accuracy by up to 53%, reduces client dropouts by up to 78×, and improves communication, computation, and memory utilization by up to 81×, 44×, and 20× respectively.
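The abstract describes FLOAT choosing, per client, which optimization technique to apply based on that client's resource conditions, using a multi-objective learned policy. The sketch below is not FLOAT's algorithm; it is a minimal illustration of the general idea under toy assumptions: discretize a client's resource state, collapse several objectives (accuracy, dropouts, resource savings) into one reward with a weighted sum, and learn which technique scores best in each state. Everything here is hypothetical: the action names (`quantize_8bit`, `prune_50pct`, `partial_training`), the thresholds in `discretize`, the weights in `scalarize`, and the made-up `toy_reward` environment. It is also a one-step contextual bandit rather than full multi-step RL, and it omits the human-feedback component entirely.

```python
import random
from collections import defaultdict

# Hypothetical per-client optimization techniques (names are illustrative only).
ACTIONS = ["none", "quantize_8bit", "prune_50pct", "partial_training"]
ACC_COST = {"none": 0.0, "quantize_8bit": 0.10, "prune_50pct": 0.20, "partial_training": 0.15}
SAVING = {"none": 0.0, "quantize_8bit": 0.5, "prune_50pct": 0.6, "partial_training": 0.4}


def discretize(cpu_free, bw_mbps, mem_free):
    """Map raw resource readings to a (cpu, bandwidth, memory) tuple of levels 0=low..2=high."""
    def level(x, lo, hi):
        return 0 if x < lo else (1 if x < hi else 2)
    return (level(cpu_free, 0.3, 0.7), level(bw_mbps, 5.0, 50.0), level(mem_free, 0.3, 0.7))


def scalarize(acc_gain, dropped, saving, weights=(1.0, 2.0, 0.5)):
    """Weighted-sum scalarization of three objectives: accuracy, dropouts, resource savings."""
    w_acc, w_drop, w_res = weights
    return w_acc * acc_gain - w_drop * float(dropped) + w_res * saving


def toy_reward(state, action):
    """Made-up environment: a client with any resource at level 0 drops out unless it optimizes."""
    name = ACTIONS[action]
    constrained = min(state) == 0
    if constrained and name == "none":
        return scalarize(0.0, True, 0.0)
    # Resource savings only count when the client is actually constrained.
    saving = SAVING[name] if constrained else 0.0
    return scalarize(1.0 - ACC_COST[name], False, saving)


class BanditAgent:
    """One-step tabular value learner (a contextual bandit) with epsilon-greedy exploration."""

    def __init__(self, epsilon=0.1, seed=0):
        self.q = defaultdict(lambda: [0.0] * len(ACTIONS))
        self.n = defaultdict(lambda: [0] * len(ACTIONS))
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def greedy(self, state):
        values = self.q[state]
        return max(range(len(ACTIONS)), key=lambda a: values[a])

    def act(self, state):
        counts = self.n[state]
        untried = [a for a in range(len(ACTIONS)) if counts[a] == 0]
        if untried:  # try every action once per state before exploiting
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        return self.greedy(state)

    def update(self, state, action, reward):
        # Incremental sample-average estimate of the expected scalarized reward.
        self.n[state][action] += 1
        self.q[state][action] += (reward - self.q[state][action]) / self.n[state][action]


def train(episodes=2000, seed=0):
    agent = BanditAgent(seed=seed)
    rng = random.Random(seed + 1)
    for _ in range(episodes):
        # Sample a client resource state uniformly over the 27 discrete levels.
        state = (rng.randrange(3), rng.randrange(3), rng.randrange(3))
        action = agent.act(state)
        agent.update(state, action, toy_reward(state, action))
    return agent


if __name__ == "__main__":
    agent = train()
    print(ACTIONS[agent.greedy(discretize(0.1, 2.0, 0.2))])   # resource-constrained client
    print(ACTIONS[agent.greedy(discretize(0.9, 80.0, 0.9))])  # well-resourced client
```

With these invented numbers, the learned policy picks `quantize_8bit` for the constrained client (it avoids the dropout penalty at the smallest accuracy cost) and `none` for the well-resourced one. A weighted sum is only one way to combine objectives; changing the weights in `scalarize` shifts which trade-off the policy prefers.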


Parallel and Distributed Systems Group
