Title: Efficient methods for detecting low-rank substructure
Abstract: A common goal in data analysis is to capture some subset of the data using a reduced number of degrees of freedom. For example, when analyzing genomic data one is often interested in discovering subgroups of genes that exhibit correlated activity across a subset of patients. This goal can be rephrased as follows: given a large data matrix in a high-dimensional space, how can one efficiently determine whether some submatrix is well captured using only a few principal components? Naive methods for solving this problem are either very slow or do not scale well as the size of the matrix increases. In this talk I will present a method that is quite fast and remains practical even when the data sets are very large.
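
To make the question concrete, the sketch below shows one way to measure how well a given submatrix is captured by a few principal components. It is an illustration only, not the method presented in the talk: the function name explained_variance_fraction, the synthetic data, the planted block, and the choice of k = 1 are all assumptions introduced here. It also assumes the candidate rows and columns are already known, which is the easy part; the naive approach would have to repeat this check over a combinatorial number of row and column subsets.

```python
import numpy as np


def explained_variance_fraction(block, k):
    """Fraction of a block's total variance captured by its top-k principal components.

    Hypothetical helper for illustration; not taken from the talk.
    """
    # Center each column so that singular values correspond to principal components.
    centered = block - block.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    total = np.sum(singular_values ** 2)
    if total == 0.0:
        return 1.0  # a constant block is trivially captured
    return float(np.sum(singular_values[:k] ** 2) / total)


# Synthetic data (assumed): Gaussian noise with one planted, nearly rank-1 block.
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 100))

planted_rows = np.arange(30)
planted_cols = np.arange(20)
signal = 3.0 * np.outer(rng.standard_normal(30), rng.standard_normal(20))
noise = 0.1 * rng.standard_normal((30, 20))
data[np.ix_(planted_rows, planted_cols)] = signal + noise

# Compare the planted block with a same-sized block of pure noise.
planted_score = explained_variance_fraction(
    data[np.ix_(planted_rows, planted_cols)], k=1
)
background_score = explained_variance_fraction(
    data[np.ix_(np.arange(100, 130), np.arange(50, 70))], k=1
)

print(f"planted block:    {planted_score:.3f}")
print(f"background block: {background_score:.3f}")
```

A score close to 1 means the block is nearly rank k after centering, while a block of unstructured noise spreads its variance over many components and scores much lower.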