Fast example-based pose estimation

The objective of articulated pose estimation is to infer location and orientation of artuculated object's body parts (the level of refinement and the accuracy of description depends on the task at hand). The most common, and perhaps the most important, instance of this problem is the estimation of human articulated pose: the goal here is to understand the "posture" of the person. The images below illustrate this: basically, our goal is to have a computer look at the image on the left and infer all the information it needs to render the figure on the right, showing a synthetic character in the same body posture as the person on the left.

Parameter-Sensitive Hashing (PSH) is an algorithm for fast similarity search for regression tasks, when the goal is to find examples in the database which have values of some parameter(s) similar to those of the query example, but the parameters can not be directly computed from the examples.

Locality Sensitive Hashing

PSH is essentially an adaptation of Locality-Sensitive Hashing (LSH) algorithm by Indyk and Motwani. Below is a very coarse summary of LSH; please refer to the LSH website and the original LSH papers for more complete explanations.

This is of course an approximate similarity search, in the sense that we may miss the answer even if it exists in the database, however this happens with probability which can in practice be made arbitrarily small. Please see the LSH literature to learn about the guarantees on running time and probability of success of this algorithm.

Parameter Sensitive Hashing

It is clear how LSH can be applied to search in a space where all we care about is the

The main idea of PSH is to find a feature space (i.e., a transformation of the image) in which the similarity in terms of L1 distance would be closely related to similarity in parameter (in our case, pose) space. Seen in this way, illustrated below, a decision stump at the value T on feature φ is acting as a paired classifier - applied on a pair of images, it "classifies" the two images as having a similar pose if both of them fall on the same side (blue dots), or as having sufficiently different poses if they are separated (red dots).

Given a training set of images with known poses (our database), we automatically also have a huge set of image pairs with known labels: we can label a pair as positive if the poses in the two images are close enough. Just what is considered close enough depends on the application needs. In our experiment, we decided that poses which correspond distances less than 5cm between key body joints (in 3D) are close enough. Below is an example of a positive pair and a negative pair.


As you can see, these are not real people. We used Poser® to generate a databes of 500,000 images like the ones above, with a variety of clothing, hair, face and lighting conditions.

Now we are faced with a straighforward learning task: select the decision stumps that have precision better than some minimum on this classification task. Parameter-Sensitive Hashing then proceeds with

Combining a number of such decision stumps into a k-bit hash table makes a more complex classifier, which assigns a positive label to a pair such that both of the images fall in the same bucket. To those who are familiar with the idea of boosting (building a strong ensemble classifier from a set of so-called weak learners, each of which may be only slightly better than random) this may sound familiar. This analogy led us to develop a boosting-based approach to PSH: we learn the decision stumps in a greedy fashion, explicitly increasing the performance of the hashing. This is described in the CMU tech. report linked below.