PhD Proposal: Sparse Representations and Feature Learning for Image Set Classification and Correspondence Estimation
Effective features are a key component of solving many computer vision tasks, including image (set) classification and correspondence estimation. Much research has focused on finding good features for the task under consideration, traditionally by hand-crafting and more recently by machine learning. In the proposed research, we study feature learning when the data under consideration has, or needs to have, a special geometric configuration. As applications, we consider the problem of image set classification, with a focus on sparse representation-based methods, as well as correspondence estimation, with a focus on situations where the points to be matched lie on a ground plane.
We start by benchmarking various image set classification methods on a mobile video dataset that we have collected and made public. The videos were acquired while the users were performing a number of tasks under three different ambient conditions, in order to capture the types of variation caused by the 'mobility' of the devices. An inspection of these videos reveals a combination of favorable and challenging properties unique to smartphone face videos. In addition to the variations caused by the mobility of the device, other challenges in the dataset include partial faces, occasional pose changes, blur, and face and fiducial point localization errors. The evaluation shows that recognition rates drop dramatically when enrollment and test videos come from different sessions.
We then propose to represent image sets as dictionaries of Symmetric Positive Definite (SPD) matrices, which are more robust to local deformations and fiducial point localization errors. Next, we learn a tangent map that transforms the SPD matrix logarithms into a lower-dimensional Log-Euclidean space such that the transformed gallery atoms adhere to a more discriminative subspace structure. A query image set is classified by first mapping its SPD descriptors into the computed Log-Euclidean tangent space and then using its sparse representation over the transformed gallery atoms to decide a label for the image set. Experiments on three public video datasets show that sparse representation-based classification using the proposed features outperforms many state-of-the-art methods.
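As a rough illustration of the Log-Euclidean step only, the sketch below builds a covariance-type SPD descriptor from a set of feature vectors, takes its matrix logarithm, and vectorizes the result so that ordinary Euclidean distances between vectors match Log-Euclidean distances between matrices. The function names, the covariance choice of SPD descriptor, and the regularization constant are assumptions made for this example; the learned tangent map and the sparse coding stage are not shown.

```python
import numpy as np

def covariance_descriptor(X, eps=1e-6):
    """Assumed SPD descriptor: regularized covariance of features X (n_samples, d)."""
    C = np.cov(X, rowvar=False)
    # A small ridge keeps the matrix strictly positive definite.
    return C + eps * np.eye(C.shape[0])

def spd_log(S):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def log_euclidean_vector(S):
    """Vectorize log(S); off-diagonal entries are scaled by sqrt(2) so that
    the Euclidean norm of the vector equals the Frobenius norm of log(S)."""
    L = spd_log(S)
    iu = np.triu_indices(L.shape[0], k=1)
    return np.concatenate([np.diag(L), np.sqrt(2.0) * L[iu]])
```

A learned linear map applied to these vectors would then play the role of the tangent map described above; its form here would be a further assumption, so it is left out.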
Finally, we present Bayesian Representation-based Classification (BRC), an approach to image set classification based on sparse Bayesian regression and subspace clustering. We use a Bayesian statistical framework to compare BRC with similar existing approaches, such as Collaborative Representation-based Classification (CRC) and Sparse Representation-based Classification (SRC), and show that BRC employs precision hyperpriors that are less informative than those of CRC/SRC. Furthermore, we present a robust strategy for handling probe image sets that balances efficiency and accuracy. Experiments on three datasets illustrate the effectiveness of our algorithm compared to state-of-the-art set-based methods.
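For context, the CRC baseline mentioned above can be sketched in a few lines: the query is coded over all gallery atoms by ridge regression (the MAP estimate under a zero-mean Gaussian prior with a fixed precision), and the label is the class whose atoms give the smallest reconstruction residual. This is a minimal sketch of the baseline, not of BRC; the function name, the argument layout, and the value of `lam` are assumptions made for illustration.

```python
import numpy as np

def crc_classify(D, labels, y, lam=0.01):
    """Assumed interface: D is (d, n) with gallery atoms as columns,
    labels is (n,) with one class label per atom, y is a (d,) query."""
    labels = np.asarray(labels)
    n = D.shape[1]
    # Ridge-regularized least squares: alpha = (D^T D + lam I)^{-1} D^T y
    alpha = np.linalg.solve(D.T @ D + lam * np.eye(n), D.T @ y)
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        mask = labels == c
        # Reconstruct the query using only the atoms of class c.
        residuals.append(np.linalg.norm(y - D[:, mask] @ alpha[mask]))
    return classes[int(np.argmin(residuals))]
```

Replacing the fixed `lam` with precisions inferred from the data is the kind of change a sparse Bayesian regression formulation would make, which is where the hyperprior comparison becomes relevant.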
As future work, we propose to extend our shallow feature learning method into a deep one with the same objective of enhancing the discriminative subspace arrangement of the visual data. In addition, we propose to use convolutional neural networks (CNNs) to learn local feature descriptors for matching ground points in road images. This would allow more robust estimation of ground plane parameters, as needed in the structure-from-motion problems that arise in monocular visual odometry.
Chair: Dr. Rama Chellappa
Dept rep: Dr. Ashok Agrawala
Member: Dr. David Jacobs