Monday, May 30, 2011

Viola-Jones object detection framework

The Viola-Jones object detection framework, proposed in 2001 by Paul Viola and Michael Jones, was the first object detection framework to provide competitive detection rates in real time.

Introduction
Object detection is the task of finding instances of a specified object class, such as cars, faces, or plates, in a given image or video sequence. It has many applications in computer vision, such as object tracking, object recognition, and scene surveillance.

The technique relies on simple Haar-like features that are evaluated quickly through a new image representation, the “Integral Image”. It generates a large set of features and uses the boosting algorithm AdaBoost to reduce this over-complete set to a small number of useful ones. A degenerate tree of the boosted classifiers then provides robust and fast inference. The detector is applied in a scanning fashion to gray-scale images; the scanned window can be scaled, as can the features evaluated.

Only simple rectangular (Haar-like) features are used, reminiscent of Haar basis functions. These features are equivalent to intensity difference readings and are quite easy to compute. There are three feature types with varying numbers of sub-rectangles: two two-rectangle features, one three-rectangle feature, and one four-rectangle feature. Using rectangular features instead of the raw pixels of an image provides a number of benefits: a sort of ad-hoc domain knowledge is encoded, and there is a speed increase over pixel-based systems. The calculation of the features is facilitated by an “integral image”. The integral image can be computed in a single pass over the sample image, which is one of the keys to the speed of the system. An integral image is similar to a “summed-area table” used in computer graphics, but here it is applied to evaluating pixel areas.
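To make the idea of an "intensity difference reading" concrete, here is a minimal sketch of one two-rectangle feature. It sums pixels directly rather than through an integral image, so it shows what is being measured and not how the framework computes it quickly. The function names and the sign convention (left half minus right half) are assumptions made for this illustration.

```python
def region_sum(image, top, left, height, width):
    """Sum pixel values in a rectangle by visiting every pixel (slow but simple)."""
    return sum(
        image[y][x]
        for y in range(top, top + height)
        for x in range(left, left + width)
    )


def two_rect_feature(image, top, left, height, width):
    """Left half minus right half of the rectangle; width is assumed to be even."""
    half = width // 2
    left_sum = region_sum(image, top, left, height, half)
    right_sum = region_sum(image, top, left + half, height, half)
    return left_sum - right_sum


# A tiny gray-scale patch: bright on the left, dark on the right.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
print(two_rect_feature(image, 0, 0, 2, 4))  # 32
```

A large value means a strong vertical edge inside the rectangle; a value near zero means the two halves are about equally bright.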

It was noted that a system using such features would have a feature set that was far too large, so the feature set must be restricted to a small number of critical features. This is done with the boosting algorithm AdaBoost. AdaBoost selects a small set of features from the large set and, in doing so, forms a strong hypothesis, in this case a strong classifier. Simply having a reduced set of features was not enough to reduce the vast amount of computation in a detection task, since the task is naturally a probabilistic one; hence Viola and Jones proposed the use of a degenerate tree of classifiers.
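The selection step can be sketched as follows. This is a toy version of textbook discrete AdaBoost with threshold classifiers on single features, not the exact procedure from the Viola-Jones paper; the names `best_stump`, `adaboost` and `predict`, the labels +1/-1, and the "value >= threshold" polarity convention are all assumptions made for this illustration.

```python
import math


def stump_predict(row, j, threshold, polarity):
    """Classify one sample from a single feature value: +1 or -1."""
    return 1 if polarity * row[j] >= polarity * threshold else -1


def best_stump(features, labels, weights):
    """Return (error, feature index, threshold, polarity) with lowest weighted error."""
    best = None
    for j in range(len(features[0])):
        for threshold in sorted({row[j] for row in features}):
            for polarity in (1, -1):
                error = sum(
                    w
                    for row, label, w in zip(features, labels, weights)
                    if stump_predict(row, j, threshold, polarity) != label
                )
                if best is None or error < best[0]:
                    best = (error, j, threshold, polarity)
    return best


def adaboost(features, labels, rounds):
    """Pick one feature per round and return the weighted ensemble."""
    n = len(labels)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, j, threshold, polarity = best_stump(features, labels, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, j, threshold, polarity))
        # Raise the weight of misclassified samples, lower the rest, renormalise.
        weights = [
            w * math.exp(-alpha * label * stump_predict(row, j, threshold, polarity))
            for row, label, w in zip(features, labels, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    """Strong classifier: sign of the weighted vote of the selected stumps."""
    score = sum(a * stump_predict(row, j, t, p) for a, j, t, p in ensemble)
    return 1 if score >= 0 else -1


# Four samples with three feature values each; only feature 1 separates the classes.
features = [[5, 1, 3], [2, 2, 8], [7, 8, 1], [1, 9, 6]]
labels = [-1, -1, 1, 1]
ensemble = adaboost(features, labels, rounds=2)
print([j for _, j, _, _ in ensemble])  # [1, 1]
```

Each round keeps only the single most useful feature, which is how boosting doubles as feature selection: features that are never picked never need to be evaluated at detection time.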

Described by Viola and Jones as a degenerate decision tree, and more commonly known as a cascade, this structure also speeds up the detection process. It is a daisy chain of general-to-specific classifiers: the first few classifiers are general enough to discard an image sub-window cheaply, which saves the time that the more specific classifiers further down the chain would otherwise spend on it. This can save a large amount of computation.
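The early-rejection behaviour of the chain can be sketched in a few lines. The stage functions below are hypothetical stand-ins for boosted classifiers, chosen only to show the control flow; a real stage would evaluate a set of Haar-like features.

```python
def run_cascade(window, stages):
    """Return (accepted, stages_evaluated) for one sub-window.

    Stages are tried in order, cheapest and most general first; the
    first stage that rejects the window ends the evaluation.
    """
    evaluated = 0
    for stage in stages:
        evaluated += 1
        if not stage(window):
            return False, evaluated
    return True, evaluated


# Hypothetical toy stages operating on a flat list of pixel values.
def has_contrast(window):
    return max(window) - min(window) >= 50


def bright_centre(window):
    return window[len(window) // 2] >= 128


stages = [has_contrast, bright_centre]
print(run_cascade([10, 12, 11], stages))   # (False, 1)
print(run_cascade([10, 200, 30], stages))  # (True, 2)
```

The flat window is rejected after a single stage, so the later and more expensive stages only run on the few windows that look promising.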

Integral Image
In order to be successful, a face detection algorithm must possess two key features: accuracy and speed. There is generally a trade-off between the two. Through the use of a new image representation, termed the integral image, Viola and Jones describe a means for fast feature evaluation, and this proves to be an effective way to speed up the classification task of the system.

Integral images are easy to understand: each entry is the sum of the luminance values above and to the left of a pixel, including the pixel itself. Viola and Jones note that the integral image is effectively the double integral of the sample image, first along the rows and then along the columns. Integral images are equivalent to summed-area tables, although here they are not used for texture mapping; as a result, their implementation is quite well documented.

Sample image:

1 1 1
1 1 1
1 1 1

Integral image:

1 2 3
2 4 6
3 6 9

The brilliance of using an integral image to speed up feature extraction lies in the fact that the sum over any rectangle in an image can be calculated from that image's integral image with only four lookups. This makes the otherwise exhaustive process of summing luminances quite rapid. In fact, an image's integral image can be computed in only one pass over the image, and Matlab experiments have shown that the integral images of a large set of images (12000) can be calculated in less than 2 seconds.
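The one-pass construction can be sketched as follows. This is a minimal illustration in plain Python, assuming the image is a list of rows of pixel values; the function name `integral_image` is chosen for this example. It reproduces the 3 x 3 example of an image filled with ones.

```python
def integral_image(image):
    """Return the integral image of a 2-D list of pixel values.

    ii[y][x] is the sum of all pixels at or above row y and at or to
    the left of column x, so the pixel itself is included.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(width):
            row_sum += image[y][x]
            above = ii[y - 1][x] if y > 0 else 0
            ii[y][x] = row_sum + above
    return ii


ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(integral_image(ones))  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
```

Each pixel is visited exactly once, and each entry needs only the running row sum and the entry directly above it.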

Integral application
Given a rectangle specified by four corner points, from A(x1,y1) at the upper left to D(x4,y4) at the lower right, evaluating the sum over the rectangle takes only four references to the integral image. This represents a huge performance increase in terms of feature extraction.

Sum of grey rectangle = D - (B + C) + A

Since the integral image values at both B and C include the region covered by A, that region is subtracted twice, so the value at A has to be added back once.
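The formula can be checked with a short sketch. It assumes an integral image in which each entry includes its own pixel, and it takes A, B, C and D to be the entries just outside the top-left corner, just above the top-right corner, just left of the bottom-left corner, and at the bottom-right corner of the rectangle. The function name `rect_sum` and its argument order are choices made for this example.

```python
def rect_sum(ii, top, left, bottom, right):
    """Sum of the pixels in rows top..bottom and columns left..right (inclusive).

    ii is an integral image whose entries include their own pixel.
    Lookups that fall outside the image count as zero.
    """
    def at(y, x):
        return ii[y][x] if y >= 0 and x >= 0 else 0

    a = at(top - 1, left - 1)  # region above and to the left of the rectangle
    b = at(top - 1, right)     # everything above the rectangle
    c = at(bottom, left - 1)   # everything to the left of the rectangle
    d = at(bottom, right)      # everything up to the bottom-right corner
    return d - (b + c) + a


# Integral image of a 3 x 3 image filled with ones.
ii = [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
print(rect_sum(ii, 1, 1, 2, 2))  # 4
```

The lower-right 2 x 2 block of an all-ones image sums to 4, and the cost is the same four lookups however large the rectangle is.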


Source:
http://www.codeproject.com/Articles/85113/Efficient-Face-Detection-Algorithm-using-Viola-Jon.aspx
