A Scale-Invariant Image Representation: the CDT Graph

Introduction

 Many empirical studies, including our work on boundary contours, have confirmed the intuition that natural images are scale-invariant. It is our belief that scale-invariance will become more and more important as we move to tackle vision problems in more realistic settings, where we cannot assume the knowledge of object/scene scales. How can we cope with scale-invariance in natural images? One possible approach, being top-down and brute-force, is to explicitly search over all possible scales. This becomes expensive if one has to cover a large range of scale and a large number of object categories. An alternative approach, as we choose here, is to develop a scale-invariant representation from bottom-up, and build on this representation later stages of visual processing.

Piecewise Linear Approximation of Contours

 Suppose that we are given a boundary contour, some parts of which are straight and some of which are curved. How can we represent this contour? A common way is to parametrize it by arclength, i.e. to sample it uniformly. Such a representation is rather inefficient: we would like to put enough sample points at fine-scale details, but only a few would suffice for straight lines. Attneave had his theory that information along contours is mostly concentrated at high-curvature locations. How about we break up a contour at high-curvature locations, as done in our test of Markov assumption? Such a decomposition has the desired property that straight line segments remain undivided, and fine-scale details are sufficiently sampled. Of course, curvature is not a scale-independent measure. We can easily fix this problem by using a scale-invariant measure, i.e. angle, as the criterion of decomposition. The result is a piecewise linear approximation of the input contour.

Constrained Delaunay Triangulation

The piecewise linear representation above is yet incomplete, as there may exist low-contrast contours, or gaps, that are not detected by the local edge operator. We use the constrained Delaunay triangulation algorithm to complete the piecewise linear approximations. The constrained Delaunay triangulation (CDT) is a variant of of the standard Delaunay Triangulation in which a set of user-specified edges (in our case, the edges in the piecewise linear approximation) must lie in the triangulation.

 (a) (b)
 Figure 1: a scale-invariant representation: (a) an object at two scales; (b) the Constrained Delaunay Triangulation (CDT) graph, based on a piecewise linear approximation of low-level edges (G-edges, in black). The internal structure of the object is independent of scale.
 Figure 2: an example of the CDT graph. We use the Pb (Probability of Boundary) edge detector as the input. G-edges (gradient edges detected by Pb) are in black and C-edges (completed by CDT) in green. Note how the CDT manages to complete gaps on the front legs of the elephant (red highlight on the inset at right). These gaps are commonly formed when an object contour passes in front of a background whose appearance (brightness/texture) is similar to that of the object.

Empirical Validation

 The CDT graph can be viewed as a boundary-based, scale-invariant superpixel map. Before using it, we need to ask two questions: As a discrete image representation, how much structure is lost when we move from the image to the CDT graph? How good is the CDT graph at completing gradient-less gaps? These questions can only be answered by empirical validation on large datasets of natural images.
 Figure 3: empirical validation of CDT graphs on the Berkeley Segmentation Dataset (BSDS). The blue curve is obtained by averaging Pb values on each CDT edge, and then project the averages back to the pixel-grid and benchmark. There is little loss by using the CDT edges instead of pixels. The green curve shows the upper-bound (generated from matching to groundtruth boundaries) using CDT graphs. The precision is close to 100% (again little loss of structure); and the asymptotic recall rate is significantly increased (completions at gradient-less locations).

 Figure 4: relative merits of the CDT versus an alternate completion scheme based on connecting each vertex to the k-nearest visible vertices for k={1,3,5,7}. The plot shows the asymptotic recall rate (i.e. the number of illusory contours found) versus the number of potential completions which need to be considered. An ideal algorithm would achieve asymptotic recall of 1 with very few potential completions. The single filled marker shows performance of the CDT based completion while the curve shows performance over a range of choices of k. For each dataset, we find that the CDT based completion gives better recall rate at a given number of potential completions the than k-nearest visible neighbor algorithm.

Applications

 The CDT graph is a generic image representation that can be applied to many vision problems. It has all the nice properties of the superpixels. In addition, it is scale-invariant and can be computed very efficiently. It is a nice representation to model and exploit mid-level visual cues, such as curvilinear continuity, region grouping, or figure/ground organization. We have developed probabilistic models of mid-level vision using conditional random fields on top of the CDT graphs. We can easily incorporate high-level object knowledge into these models. The CDT graph, being small and scale-invariant, has also enabled us to efficiently find human body parts using parallelism in a scale-independent way.

References

1. Scale-Invariant Contour Completion using Conditional Random Fields.   [abstract] [pdf] [ps] [bibtex]
Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, to appear in ICCV '05, Beijing 2005.

2. Cue Integration in Figure/Ground Labeling.   [abstract] [draft] [bibtex]
Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, to appear in NIPS '05, Vancouver 2005.

3. Recovering Human Body Configurations using Pairwise Constraints between Parts.   [abstract] [pdf] [bibtex]
Xiaofeng Ren, Alex Berg and Jitendra Malik, to appear in ICCV '05, Beijing 2005.

4. Learning and Matching Line Aspects for Articulated Objects.   [abstract] [pdf] [ps] [bibtex]
Xiaofeng Ren, in CVPR '07, Minneapolis 2007.

5. Mid-level Cues Improve Boundary Detection.   [abstract] [pdf] [ps] [bibtex]
Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, Berkeley Technical Report 05-1382, CSD 2005.