Face recognition applications to date have fallen into roughly two categories. On one hand, face recognition has recently seen considerable success in a family of less demanding applications such as online image search and family photo album organization (e.g., Google Picasa, Microsoft Photo Gallery, and Apple iPhoto). At the other end of the tractability spectrum are the terrorist watchlist and mass surveillance applications that have for the most part dominated face recognition research. However, many face recognition applications fall roughly between these extremes: very high recognition performance is desired, but the users in the gallery are allies of the system rather than adversaries. These applications include access control for secure facilities (e.g., prisons and office buildings), computer systems, automobiles, or automatic teller machines, where controlled gallery images can be obtained in advance. Such applications are very interesting due to their potential sociological impact. Because the gallery subjects are allies rather than opponents of the recognition system, the acquisition of the training data can be carefully controlled. While the same can be said for other biometrics such as fingerprint and iris recognition, face recognition has the potential of working with much less controlled test data, allowing the access control system to be less intrusive to its users.

Many existing face recognition techniques are still far from meeting the performance demands of real-world applications, and there have been several high-profile failures when their application domain is stretched into security. In fact, almost all commercial image-based face recognition systems today still rely on rather primitive pattern recognition techniques: they essentially find instances similar to the query face based on the similarity of certain local features of the face image, a task that can be solved with reasonable accuracy and speed (for instance, finding duplicated or near-duplicate images on the Internet). The performance of such recognition systems, however, degrades rapidly whenever the input image differs from the training data in any of the following aspects:

  • Taken under significantly different lighting conditions;
  • Partially occluded or corrupted;
  • With a significant pose or viewpoint change.

Our goal is to systematically study and develop new computational methods that can harness all the information encoded in an entire face image (instead of a small set of features) and make face recognition truly robust to illumination, occlusion, and pose variation. We call this a “holistic approach” to face recognition. (A discussion of why these three factors make face recognition so difficult can be found at Challenges.) To this end, we have been investigating the fundamental minimum requirements on the training data, in terms of the number of samples and their resolution and acquisition conditions, for guaranteed recognition performance. In addition, for this new approach to be practical for modern Internet-scale recognition tasks, we have been studying the computational complexity of the associated recognition problems to ensure that they admit truly scalable solutions.


Scalable and Robust Face Recognition System

Face recognition has recently gained increasing popularity, largely due to the ease of acquisition and dissemination of family photos over the Internet. Almost all major IT companies have recently released consumer face recognition software, e.g., Google's Picasa, Apple's iPhoto, and Microsoft's Photo Gallery. Fortunately, recent breakthroughs in the study of high-dimensional statistics, geometry, and convex optimization have offered exciting new insights and effective tools that show great promise for addressing the robustness and scalability issues in face recognition in a principled way. We express the query image y, up to a domain transform (e.g., affine), as a sparse linear superposition x of the training image gallery A plus a sparse corrupting error e, as shown in the following figure.
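Concretely, with the query stacked as a vector y and the gallery images as the columns of A, the model y = Ax + e can be recovered by L1 minimization. The sketch below (variable names and gallery sizes are illustrative, not those of the actual system) casts min ||x||_1 + ||e||_1 subject to y = Ax + e as a linear program, using the standard split of each variable into its positive and negative parts:

```python
import numpy as np
from scipy.optimize import linprog

def sparse_represent(y, A):
    """Solve  min ||x||_1 + ||e||_1  s.t.  y = A x + e  as a linear program.

    Returns the sparse gallery coefficients x and the sparse error e.
    """
    m, n = A.shape
    B = np.hstack([A, np.eye(m)])   # append identity so the error e is a variable too
    k = B.shape[1]
    # Split w = u - v with u, v >= 0; then ||w||_1 = sum(u) + sum(v).
    c = np.ones(2 * k)
    A_eq = np.hstack([B, -B])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    w = res.x[:k] - res.x[k:]
    return w[:n], w[n:]             # (coefficients x, corrupting error e)
```

In the recognition step, the query is then assigned to the class whose training columns best explain y, i.e., the class with the smallest residual after removing the recovered error e.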


A face acquisition system, face detection and recognition modules, and an extensive, realistic test image dataset have been built at the Perception and Decision Laboratory @ UIUC. The recognition system has achieved by far the best recognition performance reported on the largest public face recognition databases, such as CMU Multi-PIE (up to about 350 subjects), and on the database captured at UIUC (up to about 120 subjects). The system achieves above a 95% recognition rate across all tested lighting conditions and scales, with or without glasses. At ADSC, we are creating the next generation of robust face recognition systems by solving the scalability, cost, portability, and remaining robustness issues associated with the UIUC approach, while preserving its best features.


Object Recognition with Less Controlled Training Data

The sparse-representation-based recognition methods above rely on training data that are well captured and well aligned, which is a stringent condition. However, if one believes that the training images (after proper alignment and transformation) are linearly correlated and span a low-dimensional subspace, then a strategy similar to the one used for face recognition (mentioned above) allows us to simultaneously align all the training images despite severe corruption in individual images. The following figure shows the simultaneous alignment of 40 images of a person’s face under different illuminations, poses, expressions, and occlusions, obtained by jointly minimizing the rank of the aligned image matrix and the sparsity of the error.
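The rank-plus-sparsity decomposition at the core of this alignment step is Robust PCA (principal component pursuit): given the stacked image matrix D, find D = L + S with L low-rank and S sparse by minimizing ||L||_* + λ||S||_1. A minimal sketch using the standard inexact augmented Lagrange multiplier iteration follows; the parameter choices are common defaults from the RPCA literature, not necessarily those used in the actual system:

```python
import numpy as np

def robust_pca(D, lam=None, tol=1e-7, max_iter=1000):
    """Decompose D into a low-rank part L and a sparse part S.

    Solves  min ||L||_* + lam * ||S||_1  s.t.  D = L + S
    via an inexact augmented Lagrange multiplier (ALM) iteration.
    """
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))       # standard weight for RPCA
    shrink = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0.0)
    norm_D = np.linalg.norm(D)
    mu = 1.25 / np.linalg.norm(D, 2)         # step size from the spectral norm
    Y = np.zeros_like(D)                     # Lagrange multipliers
    S = np.zeros_like(D)
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding.
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * shrink(sig, 1.0 / mu)) @ Vt
        # Sparse update: entrywise soft thresholding.
        S = shrink(D - L + Y / mu, lam / mu)
        R = D - L - S
        Y += mu * R
        mu = min(mu * 1.2, 1e7)              # gradual continuation on mu
        if np.linalg.norm(R) / norm_D < tol:
            break
    return L, S
```

In the alignment setting, each column of D is a (transformed) training image; the optimization over L and S is interleaved with updates to the per-image transformations, so that the aligned images form the low-rank part while occlusions and corruptions are absorbed into S.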


Based on new computational tools for recovering sparse signals and low-rank matrices, we are developing new algorithms for face recognition, which we will also apply to text and landmark recognition. These algorithms not only handle changes of viewpoint and partial occlusion of the object, but also scale up to thousands or even millions of object classes without degrading performance. The figure below gives representative examples of objects for recognition: human faces, license plates (or text), and landmarks.


[Top row] input images; [Bottom row] rectified output by our algorithms.