Parallel and Distributed Systems Group

Computer Science Department of Telecom SudParis

téléGC: Remote Garbage Collection using Memory Disaggregation

Team work: Adam Chader presented "téléGC: Remote Garbage Collection using Memory Disaggregation" at 1D19 on 8/9/2023 at 11:00.


Memory disaggregation makes it possible to run memory-intensive applications, such as memory-based big data frameworks or in-memory databases, on the large amount of memory already available in the rack, without modifying any code and without paying the serialization cost of transferring data that a distributed-systems setting would incur. The idea of disaggregation is to build a computing system out of resource-specific nodes: some machines focus on computing, others on memory. The whole system is managed so as to preserve the abstraction of a monolithic machine. Memory accesses on such systems thus become remote, and to mitigate this overhead, disaggregated systems rely heavily on caching on the compute nodes.

Unfortunately, many applications have poor locality and thus perform badly with disaggregation. The most notable example is garbage collection (GC), a key feature of managed languages. A GC has to scan the entire memory to figure out which objects can be collected, which pollutes the cache and hinders both GC and application performance. Garbage collection performance scales very poorly with memory size and has terrible cache locality, which makes it an antipattern for memory disaggregation.
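The cache-pollution effect described above can be illustrated with a toy simulation. This is a minimal sketch, not a model of téléGC or of any real disaggregated system: the `LRUCache` class, the `app_misses` function, and all sizes (heap pages, cache pages, working set) are hypothetical values chosen for illustration. It counts how many cache misses the application suffers on its own working set, with and without a full-heap scan between rounds.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache of fixed capacity, standing in for a compute node's local cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.misses = 0

    def access(self, page):
        if page in self.pages:
            # Hit: mark the page as most recently used.
            self.pages.move_to_end(page)
        else:
            # Miss: fetch the page (remotely, in a disaggregated setting).
            self.misses += 1
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                # Evict the least recently used page.
                self.pages.popitem(last=False)


def app_misses(with_gc_scan, heap_pages=1000, cache_pages=100,
               working_set=50, rounds=10):
    """Return the number of misses caused by application accesses only."""
    cache = LRUCache(cache_pages)
    count = 0
    for _ in range(rounds):
        # The application touches its small working set.
        for page in range(working_set):
            before = cache.misses
            cache.access(page)
            count += cache.misses - before
        if with_gc_scan:
            # A collector running on the compute node walks the whole heap
            # through the same cache, evicting the working set.
            for page in range(heap_pages):
                cache.access(page)
    return count


if __name__ == "__main__":
    print("application misses without GC scan:", app_misses(False))
    print("application misses with GC scan:   ", app_misses(True))
```

With these assumed sizes, the working set fits in the cache, so without the scan the application misses only during the first round. With the scan, every round starts from a cache filled with pages the application does not need.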

We propose téléGC, a modified JVM runtime that offers full transparency for legacy JVM programs accessing far memory, together with unobtrusive collection of objects on large heaps. This is made possible by offloading collection to the memory nodes, where the data is located, rather than running it on the compute nodes. Doing so reduces communication and data movement between nodes during collection.

Moreover, by exploiting the cache-coherency discrepancy between compute and memory nodes, we can perform a snapshot-at-the-beginning (SATB) collection, which allows collection to run concurrently with the application at no additional cost. The implementation of our GC is completely separate from the JVM and language-agnostic, meaning téléGC could be used to collect memory for languages other than Java.
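The general principle of snapshot-at-the-beginning marking can be sketched as follows. This is a minimal illustration of SATB in general, not téléGC's implementation: the heap is modelled as a hypothetical dictionary from object names to outgoing references, and the explicit copy made by `take_snapshot` merely stands in for whatever consistent view the memory node holds (how téléGC obtains that view is not described here). All function names are assumptions introduced for the example.

```python
def take_snapshot(heap):
    """Copy the object graph as it is when collection starts (illustrative only)."""
    return {obj: list(refs) for obj, refs in heap.items()}


def mark(heap, roots):
    """Return the set of objects reachable from roots in the given object graph."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked or obj not in heap:
            continue
        marked.add(obj)
        stack.extend(heap[obj])
    return marked


def satb_collect(snapshot, snapshot_roots, live_heap):
    """Free objects that were already unreachable when the snapshot was taken.

    Objects allocated after the snapshot are absent from it and are never
    freed in this cycle. Objects that became unreachable after the snapshot
    survive until the next cycle (floating garbage).
    """
    reachable = mark(snapshot, snapshot_roots)
    garbage = set(snapshot) - reachable
    for obj in garbage:
        live_heap.pop(obj, None)
    return garbage


if __name__ == "__main__":
    heap = {"a": ["b"], "b": [], "c": ["d"], "d": []}
    roots = ["a"]

    snapshot = take_snapshot(heap)

    # The application keeps running while the collector works on the snapshot:
    heap["a"] = ["e"]   # drops the reference to b, points to a new object
    heap["e"] = []      # e is allocated after the snapshot

    freed = satb_collect(snapshot, roots, heap)
    print("freed:", sorted(freed))
    print("remaining:", sorted(heap))
```

The design point is that an object unreachable at snapshot time can never become reachable again, so freeing it is safe even though the application has continued to mutate the heap in the meantime.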