In-Network Leaderless Replication for Distributed Data Stores

Reading group: Bastien Gastaldi presented "In-Network Leaderless Replication for Distributed Data Stores" (VLDB'22) in 1A312 on 19/1/2024 at 11h15.


Leaderless replication allows any replica to handle any type of request to achieve read scalability and high availability for distributed data stores. However, this entails burdensome coordination overhead of replication protocols, degrading write throughput. In addition, the data store still requires coordination for membership changes, making it hard to resolve server failures quickly. To this end, we present NetLR, a replicated data store architecture that supports high performance, fault tolerance, and linearizability simultaneously. The key idea of NetLR is moving the entire replication functions into the network by leveraging the switch as an on-path in-network replication orchestrator. Specifically, NetLR performs consistency-aware read scheduling, high-performance write coordination, and active fault adaptation in the network switch. Our in-network replication eliminates inter-replica coordination for writes and membership changes, providing high write performance and fast failure handling. NetLR can be implemented using programmable switches at line rate with only 5.68% of additional memory usage. We implement a prototype of NetLR on an Intel Tofino switch and conduct extensive testbed experiments. Our evaluation results show that NetLR is the only solution that achieves high throughput and low latency and is robust to server failures.
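
To make the three switch roles named in the abstract more concrete, here is a toy Python simulation. It is only a sketch under loud assumptions: the class ToySwitch, its methods, and the per-key "acked" bookkeeping are hypothetical names invented for illustration, and none of this is taken from the NetLR design or its Tofino implementation. It shows one plausible way a central on-path component could order writes, send reads only to replicas known to hold the newest value, and stop using a failed replica without any replica-to-replica messages.

```python
class ToySwitch:
    """Hypothetical on-path orchestrator; an illustration, not NetLR itself."""

    def __init__(self, names):
        self.replicas = {n: {} for n in names}  # name -> {key: (seq, value)}
        self.live = set(names)                  # replicas the switch still uses
        self.seq = 0                            # switch-assigned write order
        self.latest = {}                        # key -> newest (seq, value)
        self.acked = {}                         # key -> replicas holding the newest write

    def write(self, key, value):
        # Write coordination: the switch orders the write itself, so
        # replicas never have to agree on an order among themselves.
        self.seq += 1
        self.latest[key] = (self.seq, value)
        self.acked[key] = set()
        return self.seq

    def deliver(self, name, key):
        # Simulates a replica applying and acknowledging the newest write.
        if name in self.live and key in self.latest:
            self.replicas[name][key] = self.latest[key]
            self.acked[key].add(name)

    def read(self, key):
        # Consistency-aware read scheduling: only a live replica that has
        # acknowledged the newest write may answer the read.
        if key not in self.latest:
            return None
        candidates = sorted(self.acked[key] & self.live)
        if not candidates:
            return None  # a real system would wait or retry here
        name = candidates[0]
        return name, self.replicas[name][key][1]

    def fail(self, name):
        # Fault adaptation: the switch simply stops forwarding to the replica.
        self.live.discard(name)


switch = ToySwitch(["a", "b", "c"])
switch.write("k", "v1")
for replica in ("a", "b", "c"):
    switch.deliver(replica, "k")
switch.write("k", "v2")
switch.deliver("b", "k")        # only b has the newest value so far
first = switch.read("k")        # served by b, never by the stale a or c
switch.fail("b")
during_failure = switch.read("k")
switch.deliver("c", "k")
after_catch_up = switch.read("k")
```

In this toy model a read is never served from a replica that lags behind the newest write, which is the intuition behind pairing leaderless reads with linearizability; how NetLR achieves this within switch memory and at line rate is the subject of the paper.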