Robust Face Recognition via Sparse Representation: A Q&A about the recent advances in face recognition and how to protect your facial identity

Allen Y. Yang (yang@eecs.berkeley.edu)

Department of EECS, UC Berkeley

July 21, 2008

Q: What is this technique all about?

A: The technique, called robust face recognition via sparse representation, provides a new way for a computer program to classify human identity from frontal facial images, i.e., the well-known problem of face recognition.

Face recognition has been one of the most extensively studied problems in artificial intelligence and computer vision. Its applications include human-computer interaction, multimedia data compression, and security, to name a few. The significance of face recognition is also highlighted by the contrast between humans' high accuracy in recognizing faces under various conditions and computers' historically poor accuracy.

This technique proposes a highly accurate recognition framework. Extensive experiments have shown that the method can, for the first time, achieve recognition accuracy similar to that of human vision. In some cases, the method has outperformed human vision in face recognition.

Q: Who are the authors of this technique?

A: The technique was developed in 2007 by Mr. John Wright, Dr. Allen Y. Yang, Dr. S. Shankar Sastry, and Dr. Yi Ma.

The technique is jointly owned by the University of Illinois and the University of California, Berkeley. A provisional US patent was filed in 2008. The technique is also being published in the IEEE Transactions on Pattern Analysis and Machine Intelligence [Wright 2008].

Q: Why is face recognition difficult for computers?

A: There are several issues that have historically hindered the improvement of face recognition in computer science.

1. High dimensionality: the data size of each face image is large.

When we take a picture of a face, the image is stored on a computer as a file in some color format, e.g., the image shown in Figure 1. Because the human brain is a massively parallel processor, it can quickly process a 2-D image and match it against other images learned in the past. Modern computer algorithms, however, process a 2-D image sequentially, i.e., pixel by pixel. Hence, although an image file usually takes less than 100 KB to store, if we treat each image as a sample point, it lies in a space of 10K to 100K dimensions or more (each pixel owns an individual dimension). Pattern recognition problems in high-dimensional spaces (more than about 100 dimensions) are known in the literature to be difficult.

Fig. 1. A frontal face image from the AR database [Martinez 1998]. The size of a JPEG file for this image is typically about 60 KB.
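To make the "one dimension per pixel" view concrete, the following Python sketch (using NumPy) flattens an image array into a single vector. The 165 x 120 crop size is a hypothetical choice for illustration, not the size of the image in Figure 1.

```python
import numpy as np

def image_to_vector(image):
    """Flatten an H x W (or H x W x C) pixel array into one 1-D vector,
    so that every pixel value becomes one coordinate of a sample point."""
    return np.asarray(image).reshape(-1)

# Hypothetical 165 x 120 grayscale face crop (size chosen only for illustration).
face = np.zeros((165, 120), dtype=np.uint8)
v = image_to_vector(face)
print(v.shape)  # (19800,)

# A color image has one coordinate per channel per pixel, tripling the count.
rgb_face = np.zeros((165, 120, 3), dtype=np.uint8)
print(image_to_vector(rgb_face).shape)  # (59400,)
```

Even this modest crop lies in a space of roughly 20,000 dimensions, far beyond the 100-dimensional regime the text describes as already difficult.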

2. The number of identities to classify is large.

To make the situation worse, an adult can learn to recognize thousands, if not tens of thousands, of different faces over his or her lifetime. For a computer to match this ability, it must first store tens of thousands of learned face images, which the literature calls training images. Then, using whatever algorithm, the computer has to process this massive data and quickly identify the correct person from a new face image, called the test image.

Fig. 2. An ensemble of 28 individuals in the Yale B database [Lee 2005]. A typical face recognition system needs to recognize 10 to 100 times more individuals. Arguably, an adult can recognize thousands of times more individuals in daily life.

Combining the above two problems, we must solve a pattern recognition problem that carefully partitions a high-dimensional data space into thousands of domains, each representing the possible appearances of one individual's face images.
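The training side of this problem can be sketched as a simple data structure: each flattened training image becomes one column of a matrix, and a parallel array records which identity (domain) it belongs to. The function name and all gallery sizes below, including the 10,000 identities, 10 images per person, and 100 x 100 pixels in the storage estimate, are hypothetical numbers chosen only to show the arithmetic.

```python
import numpy as np

def build_training_matrix(images, labels):
    """Stack flattened training images as the columns of a d x n matrix.

    Column j holds training image j; labels[j] is the identity it belongs to.
    """
    columns = [np.asarray(img, dtype=np.float32).reshape(-1) for img in images]
    return np.stack(columns, axis=1), np.asarray(labels)

# Hypothetical toy gallery: 3 identities x 2 images each, 4 x 5 pixels per image.
rng = np.random.default_rng(0)
images = [rng.random((4, 5)) for _ in range(6)]
A, class_labels = build_training_matrix(images, [0, 0, 1, 1, 2, 2])
print(A.shape)  # (20, 6): 20 pixel dimensions, 6 training images

# Back-of-envelope storage for a larger, equally hypothetical gallery.
identities, per_person, pixels = 10_000, 10, 100 * 100
print(identities * per_person * pixels * 4 / 1e9, "GB as float32")  # 4.0 GB as float32
```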

3. Face recognition has to be performed under various real-world conditions.

When you walk into a drugstore to take a passport photo, you are usually asked to face the camera with a neutral expression so that the photo qualifies. The store associate also controls the resolution, background, and lighting by using a uniform color screen and a flash. In the real world, however, a computer program is asked to identify people without any of these constraints. Although past solutions achieve recognition when the constraints are relaxed only slightly, to this day no algorithm, including the technique we present, can meet all the possible challenges.

To further motivate the issue, human vision can accurately recognize familiar faces under different expressions, backgrounds, poses, and resolutions [Sinha 2006]. With professional training, humans can also identify faces behind facial disguises. Figure 3 demonstrates this ability using images of Abraham Lincoln.

Fig. 3. Images of Abraham Lincoln under various conditions (available online). Arguably, humans can recognize Lincoln from each of these images.

A natural question arises: are we simply asking too much of a computer algorithm? For some applications, such as security checkpoints, we can require individuals to present a frontal, neutral face in order to be identified. In most other applications, however, this requirement is simply not practical. For example, we may want to search our photo albums for all the images that contain our best friends under normal indoor or outdoor conditions, or we may need to identify, from murky, low-resolution hidden-camera footage, a criminal suspect who would naturally try to disguise his identity. Therefore, the study of face recognition under real-world conditions is motivated not only by scientific interest but also by urgent demands from practical applications.

Q: What is the novelty of this technique? How is the method related to sparse representation?

A: The method is built on a novel pattern recognition framework that relies on a scientific concept called sparse representation. In fact, sparse representation is not a new topic in many scientific areas. In human perception in particular, scientists have discovered that accurate low-level and mid-level visual perception results from sparse representation of visual patterns using highly redundant visual neurons [Olshausen 1997, Serre 2006].

Without diving into technical detail, let us consider an analogy. Assume that an ordinary person, Tom, is very good at identifying different types of fruit juice such as orange juice, apple juice, lemon juice, and grape juice. Now he is asked to identify the ingredients of a fruit punch, which contains an unknown mixture of drinks. Tom discovers that when the punch is highly concentrated on a single type of juice (e.g., 95% orange juice), he has no difficulty identifying the dominant ingredient. On the other hand, when the punch is a roughly even mixture of multiple drinks (e.g., 33% orange, 33% apple, and 33% grape), he has the most difficulty identifying the individual ingredients. In this example, a fruit punch can be represented as a sum of amounts of individual fruit juices. We say such a representation is sparse if the majority of the juice comes from a single fruit type; otherwise, we say the representation is not sparse. Clearly, in this example, a sparse representation leads to easier and more accurate recognition than a non-sparse one.
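The analogy can be made numerical. The article does not define a sparsity score, so the function `concentration` below is an illustrative choice: it measures how much of the mixture comes from the single largest ingredient, rescaled so that 1.0 means one pure juice and 0.0 means a perfectly even mix over all k ingredients.

```python
import numpy as np

def concentration(amounts):
    """Illustrative (hypothetical) sparsity score for a mixture.

    Returns 1.0 when everything comes from one ingredient and 0.0 when the
    amounts are spread evenly over all k ingredients. Requires k >= 2.
    """
    a = np.abs(np.asarray(amounts, dtype=float))
    k = a.size
    share = a.max() / a.sum()          # fraction contributed by the dominant ingredient
    return (k * share - 1.0) / (k - 1.0)

# Ingredients in order: orange, apple, lemon, grape
print(round(concentration([0.95, 0.05, 0.0, 0.0]), 3))   # 0.933: sparse, easy to name
print(round(concentration([0.33, 0.33, 0.0, 0.33]), 3))  # 0.111: even mix, hard to name
```

The rescaling by k makes scores comparable across mixtures with different numbers of candidate ingredients.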

The human brain turns out to be an excellent machine for calculating sparse representations from biological sensors. In face recognition, when a new image is presented to the eyes, the visual cortex immediately calculates a representation of the face image based on all the prior face images it remembers. Importantly, this representation is believed to be sparse in the human visual cortex. For example, although Tom remembers thousands of individuals, when he is given a photo of his friend Jerry, he will simply assert that the photo is an image of Jerry. His perception does not attempt to calculate the similarity of Jerry's photo to all the images of other individuals. On the other hand, with the help of image-editing software such as Photoshop, an engineer can now seamlessly combine facial features from multiple individuals into a single new image. In this case, a typical person would say that he or she cannot recognize the new image, rather than analytically calculating percentages of similarity to multiple individuals (e.g., 33% Tom, 33% Jerry, 33% Tyke) [Sinha 2006].

Q: What are the conditions that the technique applies to?

A: So far, the technique has been demonstrated to classify frontal face images under different expressions, lighting conditions, and resolutions, as well as under severe facial disguise and image distortion. We believe it is one of the most comprehensive solutions in face recognition and one of the most accurate.

Further study is required to establish a relation, if any, between sparse representation and face images with pose variations.

Q: More technically, how does the algorithm estimate a sparse representation using face images? Why do the other methods fail in this respect?

A: This technique is the first solution in the literature to explicitly calculate a sparse representation for image-based pattern recognition. It is hard to say that other existing methods have failed in this respect. Why? Simply because previous investigators did not realize the importance of sparse representation in human and computer vision for classification. For example, a well-known solution to face recognition is the nearest-neighbor method. It compares the test image with each training image separately; Figure 4 illustrates this similarity measurement. The method assigns the test image the identity of the training image most similar to it, hence the name nearest neighbor. We can easily observe that the representation so estimated is not sparse, because a single face image can be similar to many training images in terms of its RGB pixel values. Therefore, accurate classification based on this type of metric is known to be difficult.
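As a point of comparison, here is a minimal nearest-neighbor classifier in Python. It assumes Euclidean distance between flattened pixel vectors as the similarity metric (the article does not say which metric Figure 4 uses), and the toy data and function name are hypothetical. Every training image receives its own nonzero distance, which is the dense, non-sparse picture described above.

```python
import numpy as np

def nearest_neighbor(A, labels, y):
    """Assign y the label of the closest training column of A (Euclidean distance).

    A: d x n matrix, one flattened training image per column.
    labels: length-n array of identities. y: flattened test image of length d.
    """
    dists = np.linalg.norm(A - y[:, None], axis=0)  # one distance per training image
    best = int(np.argmin(dists))
    return labels[best], dists

# Synthetic toy data: 2 identities x 3 training images, 50 "pixels" each.
rng = np.random.default_rng(1)
centers = rng.normal(size=(50, 2))
A = np.column_stack([centers[:, c] + 0.1 * rng.normal(size=50) for c in (0, 0, 0, 1, 1, 1)])
labels = np.array([0, 0, 0, 1, 1, 1])
test_img = centers[:, 1] + 0.1 * rng.normal(size=50)

label, dists = nearest_neighbor(A, labels, test_img)
print(label)  # 1
print(dists)  # six distances, none of them zero
```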

Fig. 4. A similarity metric (the y-axis) between a test face image and about 1200 training images. The smaller the metric value, the more similar the two images.

Our technique abandons the conventional wisdom of comparing the test image with individual training images or individual training classes. Rather, the algorithm attempts to calculate a representation of the input image using all available training images as a whole. Furthermore, the method imposes one extra constraint: the optimal representation should use the smallest number of training images. Hence, the majority of the coefficients in the representation should be zero, and the representation is sparse (as shown in Figure 5).
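The idea of representing a test image y with all training images at once can be sketched as follows, with the training images as columns of a matrix A and a coefficient vector x such that A x approximates y. The article asks for the fewest nonzero coefficients; counting nonzeros is hard to optimize directly, so this sketch swaps in a common convex surrogate, the l1 penalty (the Lasso objective 0.5*||A x - y||^2 + lam*||x||_1), solved with the iterative shrinkage-thresholding algorithm (ISTA). Whether [Wright 2008] uses this exact objective and solver is not stated in this excerpt. The parameter values `lam` and `iters`, the per-class residual decision rule in `classify_by_residual`, and the toy data are all assumptions for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise shrinkage: the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code(A, y, lam=0.01, iters=1000):
    """Approximately minimize 0.5*||A x - y||^2 + lam*||x||_1 with ISTA (assumed solver)."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient of the smooth term
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - y)         # gradient of 0.5*||A x - y||^2
        x = soft_threshold(x - grad / L, lam / L)
    return x

def classify_by_residual(A, labels, y, x):
    """Hypothetical decision rule: keep one identity's coefficients at a time
    and pick the identity whose coefficients reconstruct y with the smallest error."""
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - A @ np.where(labels == c, x, 0.0)) for c in classes]
    return classes[int(np.argmin(residuals))]

# Synthetic toy gallery: 3 identities x 4 images, 50-dimensional unit-norm columns.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 12))
A /= np.linalg.norm(A, axis=0)
labels = np.repeat([0, 1, 2], 4)
y = A[:, 7].copy()                       # test image identical to a training image of identity 1

x = sparse_code(A, y)
print(np.count_nonzero(np.abs(x) > 1e-3), "of", x.size, "coefficients are nonzero")
print("predicted identity:", classify_by_residual(A, labels, y, x))
```

In this idealized toy setup, one would expect almost all of the coefficient mass to land on column 7, so the estimated representation is sparse, in contrast to the dense list of distances produced by nearest neighbor.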
