Noisy subspace clustering via matching pursuits


Michael Tschannen and Helmut Bölcskei


IEEE Transactions on Information Theory, 2018, to appear.

[BibTeX, LaTeX, and HTML Reference]


Sparsity-based subspace clustering algorithms have attracted significant attention thanks to their excellent performance in practical applications. A prominent example is the sparse subspace clustering (SSC) algorithm by Elhamifar and Vidal, which performs spectral clustering based on an adjacency matrix obtained by sparsely representing each data point in terms of all the other data points via the Lasso. When the number of data points is large or the dimension of the ambient space is high, the computational complexity of SSC quickly becomes prohibitive. Dyer et al. observed that SSC-OMP obtained by replacing the Lasso by the greedy orthogonal matching pursuit (OMP) algorithm results in significantly lower computational complexity, while often yielding comparable performance. The central goal of this paper is an analytical performance characterization of SSC-OMP for noisy data. Moreover, we introduce and analyze the SSC-MP algorithm, which employs matching pursuit (MP) in lieu of OMP. Both SSC-OMP and SSC-MP are proven to succeed even when the subspaces intersect and when the data points are contaminated by severe noise. The clustering conditions we obtain for SSC-OMP and SSC-MP are similar to those for SSC and for the thresholding-based subspace clustering (TSC) algorithm due to Heckel and Bölcskei. Analytical results in combination with numerical results indicate that both SSC- OMP and SSC-MP with a data-dependent stopping criterion automatically detect the dimensions of the subspaces underlying the data. Experiments on synthetic and on real data show that SSC-MP often matches or exceeds the performance of the computationally more expensive SSC-OMP algorithm. Moreover, SSC-MP compares very favorably to SSC, TSC, and the nearest subspace neighbor (NSN) algorithm, both in terms of clustering performance and running time. In addition, we find that, in contrast to SSC-OMP, the performance of SSC-MP is very robust with respect to the choice of parameters in the stopping criteria.


Subspace clustering, matching pursuit algorithms, sparse signal representations, unions of subspaces, spectral clustering, noisy data

Code to reproduce the figures and tables in this paper is available here. A Python implementation of SSC-OMP and SSC-MP is available here.

Download this document:


Copyright Notice: © 2018 M. Tschannen and H. Bölcskei.

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.