Easy, Accurate, and Fast Machine Learning on Very Large Time Series Collections: Similarity Search and Subsequence Anomaly Detection
10/10/2023
IPGP - Îlot Cuvier
14:00
Séminaires de Sismologie
Salle 310
Themis Palpanas
LIPADE, Université Paris Cité
Applications in many diverse domains have an increasingly pressing need for techniques that can manage and analyze very large collections of sequences, or data series. Examples include monitoring applications, such as those in power utility companies, where machine learning techniques must be applied for knowledge extraction. It is not unusual for these applications to involve hundreds of millions to billions of data series, which are often not analyzed in full detail because of their sheer size. Yet no existing data management solution offers native support for sequences and for the operators that complex analytics require. In this talk, we describe our efforts to design techniques for indexing and analyzing truly massive collections of data series, enabling scientists to run complex analytics on their data. These techniques are orders of magnitude faster than the state of the art and have been applied to datasets from several disciplines, including seismology. We also present our recent work on (essentially parameter-free) subsequence anomaly detection and explanation, which is both more accurate and faster than competing approaches.