A comprehensive analysis of concept drift locality in data streams

Paper · arXiv 2311.06396 · Published November 10, 2023

“Modern data sources continuously generate information characterized by both volume and velocity, flooding learning systems with a constant flow of data. Such continuously arriving data are commonly referred to as data streams [1, 2]. Traditional classification methods, designed for static data, struggle to keep up with the ever-changing characteristics of these incoming instances [1, 3]. Given the dynamic nature of data streams, learning methods must adapt and acquire knowledge about emerging concepts over time. A change in the underlying data distribution over time is known as concept drift [4], and it can manifest in various ways, including shifts in class distribution and decision boundaries [5], and the emergence of new features or classes [6]. If not detected and addressed effectively, concept drift can significantly degrade predictive performance, as knowledge learned from older concepts may no longer be useful for classifying recent instances [7].

In recent years, concept drift has garnered significant attention within the research community across various domains, including sensors, robotics, system monitoring, and anomaly detection [8]. Current research is tackling increasingly complex challenges: accurately detecting concept drift within unstructured and noisy datasets [9], providing understandable explanations for concept drift [10], and effectively responding to drift by adapting relevant knowledge [11]. Extending these concerns to multi-class scenarios, which occur in many real-life applications, makes the problem considerably harder: detecting concept drift becomes exceptionally demanding, as we must account for the evolving nature of multiple classes [6, 12]. In addition, the location of concept drift within the feature space significantly influences both the performance of classifiers and the effectiveness of drift detection methods [6, 13]. However, there is a lack of studies that evaluate drift detectors under varying degrees of drift locality or provide benchmark datasets to support research in this crucial area.”
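
To make the notion of drift locality concrete, the following is a minimal, purely illustrative sketch and not the paper's method or benchmark. It assumes a hypothetical one-dimensional feature in [0, 1), a hypothetical threshold concept, and two invented drift variants: one that inverts labels everywhere (global) and one that inverts them only in a narrow region (local). It then measures how often a classifier that still follows the old concept is wrong under each variant.

```python
# Illustrative sketch only: the 1-D feature, the concepts, and all names
# below are hypothetical and are not taken from the paper.

def old_concept(x: float) -> int:
    """Concept before drift: class 1 iff x >= 0.5."""
    return 1 if x >= 0.5 else 0


def global_drift(x: float) -> int:
    """Global drift: labels are inverted over the whole feature space."""
    return 1 - old_concept(x)


def local_drift(x: float) -> int:
    """Local drift: labels are inverted only inside the region [0.4, 0.5)."""
    if 0.4 <= x < 0.5:
        return 1 - old_concept(x)
    return old_concept(x)


def stale_error(concept, n: int = 1000) -> float:
    """Error rate of a classifier that still predicts with old_concept,
    measured against `concept` on n evenly spaced points in [0, 1)."""
    xs = [i / n for i in range(n)]
    wrong = sum(old_concept(x) != concept(x) for x in xs)
    return wrong / n


if __name__ == "__main__":
    variants = [("none", old_concept), ("local", local_drift), ("global", global_drift)]
    for name, concept in variants:
        print(f"{name:>6} drift: stale-model error = {stale_error(concept):.3f}")
```

In this toy setup the local variant changes labels on only a tenth of the evaluation grid, so the stale model's error rate moves far less than under the global variant. A detector that monitors only the overall error rate would therefore see a much weaker signal for local drift, which is one intuition for why drift locality matters when evaluating detectors.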