Robust nonparametric nearest neighbor random process clustering
AuthorsMichael Tschannen and Helmut Bölcskei
ReferenceIEEE Transactions on Signal Processing, 2017, to appear.
AbstractWe consider the problem of clustering noisy finite-length observations of stationary ergodic random processes according to their generative models without prior knowledge of the model statistics and the number of generative models. Two algorithms, both using the L1-distance between estimated power spectral densities (PSDs) as a measure of dissimilarity, are analyzed. The first one, termed nearest neighbor process clustering (NNPC), relies on partitioning the nearest neighbor graph of the observations via spectral clustering. The second algorithm, simply referred to as k-means (KM), consists of a single k-means iteration with farthest point initialization and was considered before in the literature, albeit with a different dissimilarity measure and with asymptotic performance results only. We prove that both algorithms succeed with high probability in the presence of noise and missing entries, and even when the generative process PSDs overlap significantly, all provided that the observation length is sufficiently large. Our results quantify the tradeoff between the overlap of the generative process PSDs, the observation length, the fraction of missing entries, and the noise variance. Furthermore, we prove that treating the finite-length observations of stationary ergodic random processes as vectors in Euclidean space and clustering them using the thresholding-based subspace clustering (TSC) algorithm, the subspace clustering cousin of NNPC, results in performance strictly inferior to that of NNPC. We argue that the underlying cause is to be found in TSC employing spherical distance as dissimilarity measure, thereby ignoring the stationary process structure of the observations. Finally, we provide extensive numerical results for synthetic and real data and find that NNPC outperforms state-of-the-art algorithms in human motion sequence clustering.
KeywordsClustering, stationary random processes, time series, nonparametric, k-means, nearest neighbors
Code to reproduce the figures and tables in this paper is available here.
Download this document:
Copyright Notice: © 2017 M. Tschannen and H. Bölcskei.
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.