Evaluating the Performance of Apache Cassandra 5
Context
Apache Cassandra is one of the most prominent distributed storage systems. It offers a complex data model and a rich API (the Cassandra Query Language) for writing distributed applications. It can store petabytes of data and is used by many industry-leading companies as a key building block of their application stack [a]. In particular, Cassandra is extremely robust: it replicates data consistently across several geographical locations despite network failures, or even the outage of an entire datacenter.
A new version of Apache Cassandra (version 5) was recently announced [b]. It provides full support for ACID transactions, as traditional databases such as PostgreSQL do. To execute a transaction, Cassandra 5 relies on a new consensus protocol called Accord [c,d]. Accord leverages recent advances in leaderless state-machine replication to execute transactions quickly across a set of geo-distributed sites.
Objectives
The goal of this internship is to evaluate the performance of Apache Cassandra 5. To this end, we will employ standard benchmarks such as the Yahoo! Cloud Serving Benchmark (YCSB) and the TPC benchmark suite. The evaluation will be run at scale, in a geo-distributed setting, on a public cloud infrastructure (e.g., GCP or AWS). In particular, the evaluation campaign aims to answer the following questions:
- Is Accord more efficient than Paxos (the consensus protocol used in Cassandra 4)?
- How does Accord behave in the event of a failure of one or more sites?
- How efficient are the mechanisms to bound metadata usage (e.g., garbage collection)?
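As a rough illustration of how the first question might be approached, the sketch below summarizes the per-operation latencies collected during a benchmark run into throughput and tail-latency figures, so that two runs can be compared side by side. The function names, the choice of metrics, and the sample values are assumptions made for illustration; they are not part of the planned methodology and the numbers are placeholders, not measurements.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list, with p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms, duration_s):
    """Summarize one run: operations per second, median and tail latency."""
    return {
        "throughput_ops_s": len(latencies_ms) / duration_s,
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
    }


# Placeholder latencies (in milliseconds) for two hypothetical runs.
# These values are made up and do not reflect any real system.
run_a = [12.0, 15.0, 11.0, 40.0, 13.0, 14.0, 12.5, 90.0, 13.5, 12.2]
run_b = [10.0, 11.5, 10.2, 22.0, 10.8, 11.0, 10.4, 35.0, 10.9, 10.1]

for name, run in (("run_a", run_a), ("run_b", run_b)):
    print(name, summarize(run, duration_s=1.0))
```

Reporting tail percentiles alongside throughput matters in a geo-distributed setting, where a small fraction of slow cross-site operations can dominate the user-perceived latency.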
If the internship is successful, the intern will be offered the opportunity to continue with a PhD.
To apply
We are looking for a Master's student to run the experiments. This work is done in close cooperation with Apple Inc., which is leading the development of Accord, and with the IMDEA Software Institute.
If interested, please contact Prof. Pierre Sutra with a CV and a cover letter.
[a] CloudKit: Structured Storage for Mobile Applications, A. Shraer et al., VLDB ’18.
[b] https://www.cassandrasummit.org/cassandra-forward
[c] https://github.com/apache/cassandra-accord
[d] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15%3A+General+Purpose+Transactions?preview=/188744725/188744736/Accord.pdf