A Holistic Approach to Robust and Scalable Object Recognition

This project is conducted at Advanced Digital Sciences Center (ADSC), as part of the Interactive Digital Media (IDM) sub-program. The PI leading our research efforts is Dr. Yi Ma from UIUC and MSRA.


[Left plot] Training acquisition system: Four projectors and two cameras controlled by one computer. [Top-right plot] 38 traning images of one subject captured by the training acquisition system. [Bottom-right plot] Representative testing examples, including indoor/outdoor images (1st row), images of subjects with eyeglasses (2nd row), and image of subjects with sunglasses (3rd row).


Such exciting applications as augmented reality also pose tremendous challenges for the research on object recognition. Many of the existing object recognition techniques are still far from meeting the performance demand of many real-world applications. For instance, existing commercial face recognition software do not yet meet the high performance requirements of security systems, and have resulted in several high-profile failures when their application domain gets stretched into security. Also, most of the commercial text recognition systems or barcode readers require very precise alignment of the image orientation and often cannot tolerate any corruption in the input. In fact, the performance of almost all commercial image-based object recognition systems degrades rapidly whenever the input image is different from the training data in the following aspects:

  1. Taken under significantly different lighting conditions;

The illumination variations among images of the same face are larger than image changes in face identity [1]. Chen et al. [2] illustrated this complexity by showing that there is no discriminative illumination invariant for Lambertian objects under distant point light sources; i.e., it is not possible to determine whether two images were created by the same object under two distant point light sources or by different objects. Hence, given a test image of an object, it is difficult to predict anything definite about this object or how it will appear under different lighting conditions. While the test image can be expressed by a linear combination of an appropriate set of training images [3], [4], the test image taken under significantly different lighting conditions may not be well represented by the training images.

  1. Partially occluded or corrupted;

Partial occlusion poses a significant obstacle to robust real-world face recognition [5], [6], [7]. Handling occlusion is difficult mainly due to the unpredictable nature of the errors: it may affect any part of the image and may be arbitrarily large in magnitude. A face recognition system can confront occluded faces in real world applications very often due to use of accessories, such as scarf or sunglasses, hands on the face, the objects that persons carry, and external sources that partially occlude the camera view. Therefore, the face recognition system has to be robust to occlusion in order to guarantee reliable real-world operation.

  1. With a significant pose or viewpoint change.

Another major challenge encountered by current face recognition techniques lies in the difficulties of handling varying poses, i.e., recognition of faces in arbitrary in-depth rotations. The face image differences caused by rotations are often larger than the inter-person differences used in distinguishing identities. Face recognition across pose, on the other hand, has great potentials in many applications dealing with uncooperative subjects, in which the full power of face recognition being a passive biometric technique can be implemented and utilised. A long line of research exists on using Active Appearance Models [8], and the closely related Active Shape Models [9] to register images against a relatively high-dimensional model of plausible face appearances, often leveraging face specific contours. While these model-based techniques have advantages in dealing with variations in expression and pose, they may add unnecessary complexity to applications where subjects normally present a neutral face or only have moderate expression.

In recent years, from the lessons learned from face recognition, people have come to realize that in order for any object recognition system to function reliably under real-world conditions, it has to be made robust to at least the above three factors. Precisely, the goal of this project is to systematically study and develop new computational methods that can harness all the information encoded in the entire object image (instead of a small set of features) and make object recognition truly robust to illumination, occlusion, and pose variation. We call this a “holistic approach” to object recognition.


[1] Y. Adini, Y. Moses, and S. Ullman, "Face recognition: The problem of compensating for changes in illumination direction," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 721-732, July 1997.
[2] H. Chen, P. Belhumeur, and D. Jacobs, "In search of illumination invariants," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recog., 2000, vol 1, pp. 254-261.
[3] P. Belhumeur and G. Hager, “Tracking in 3D: Image variability decomposition for recovering object pose and illumination,” Pattern Analysis and Applications, vol. 2, pp. 82–91, 1999.
[4] H. Murase and S. Nayar, “Visual learning and recognition of 3D objects from appearance,” International Journal of Computer Vision, vol. 14, pp. 5–24, 1995.
[5] A. Martinez, “Recognizing imprecisely localized, partially occluded, and expression variant faces from a single sample per class,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 6, pp. 748-763, June 2002.
[6] A. Leonardis and H. Bischof, “Robust recognition using Eigenimages,” Computer Vision and Image Understanding, vol. 78, no. 1, pp. 99-118, 2000.
[7] F. Sanja, D. Skocaj, and A. Leonardis, “Combining reconstructive and discriminative subspace methods for robust classification and regression by subsampling,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 3, Mar. 2006.
[8] T. Cootes, G. Edwards, and C. Taylor, “Active appearance models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681–685, 2001.
[9] T. Cootes and C. Taylor, “Active shape models – ‘smart snakes’,” in Proceedings of British Machine Vision Conference, 1992.