Adaptive Random Forests with Resampling for Imbalanced Data Streams
Reading group: Ewa Turska presented "Adaptive Random Forests with Resampling for Imbalanced Data Streams" (IJCNN'19) in 4A312 on 19/11/2021 at 10h30.
Abstract
The large volume of real-time data generated by computer networks, smartphones, wearables and a wide range of sensors is only useful if it can be processed efficiently, so that individuals can make timely decisions based on it. In this context, machine learning techniques are widely used. While they perform better than humans in such tasks, every machine learning algorithm has a certain intrinsic bias: it assumes that the data have specific characteristics, such as a balanced distribution between classes. As many real-world applications present imbalanced traits in their data, this topic is gaining attention over time. In this work, we present the Adaptive Random Forest with Resampling (ARFRE), a classifier designed to deal with imbalanced datasets. ARFRE resamples the instances based on the current class label distribution. We show through a set of extensive experiments on seven datasets that the proposed method can considerably improve performance on the minority class(es) while avoiding degrading performance on the majority class. On top of that, ARFRE is more efficient in terms of execution time than the standard ARF algorithm.
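To make the idea of resampling "based on the current class label distribution" concrete, here is a minimal sketch. It is not taken from the paper: the class name ClassAwareResampler, the scaling rule (base rate times one minus the class share seen so far), and the min_lambda floor are assumptions made for illustration. The only borrowed element is the general online-bagging pattern, in which each tree trains on an incoming instance a Poisson-distributed number of times; the base rate of 6 is the value usually associated with ARF.

```python
import math
import random
from collections import Counter


class ClassAwareResampler:
    """Hypothetical helper: scales the Poisson rate by how rare a class has been so far."""

    def __init__(self, base_lambda=6.0, min_lambda=0.1, seed=0):
        self.base_lambda = base_lambda  # rate used when nothing is known yet
        self.min_lambda = min_lambda    # assumed floor so no class is starved entirely
        self.counts = Counter()         # running class counts over the stream
        self.rng = random.Random(seed)

    def rate(self, label):
        """Poisson rate for an instance of `label`, given the counts seen so far."""
        total = sum(self.counts.values())
        if total == 0:
            return self.base_lambda
        share = self.counts[label] / total
        # Assumed rule: the rarer the class, the closer the rate is to base_lambda.
        return max(self.min_lambda, self.base_lambda * (1.0 - share))

    def _poisson(self, lam):
        """Draw from Poisson(lam) with Knuth's multiplication method."""
        threshold = math.exp(-lam)
        k, p = 0, 1.0
        while True:
            p *= self.rng.random()
            if p <= threshold:
                return k
            k += 1

    def weight(self, label):
        """Return how many times to train on this instance, then record its label."""
        w = self._poisson(self.rate(label))
        self.counts[label] += 1
        return w


if __name__ == "__main__":
    resampler = ClassAwareResampler(seed=42)
    stream = ["neg"] * 90 + ["pos"] * 10
    for label in stream:
        resampler.weight(label)
    print("rate for minority class:", resampler.rate("pos"))
    print("rate for majority class:", resampler.rate("neg"))
```

In a full ensemble, each tree would call weight(label) for every incoming instance and train on it that many times, so minority-class instances are replayed more often than majority-class ones. Because majority-class instances receive a lower rate, less training work is spent on them overall, which is one plausible reading of the execution-time gain mentioned in the abstract.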