Scale-Invariant Contour Completion Using Conditional Random Fields

Introduction

Finding the boundaries of objects and surfaces in a scene is a fundamental problem in computer vision, with immediate applications in higher-level tasks such as object recognition. Traditionally, edge detection is done locally: the "boundariness" of each pixel is decided from a small neighborhood around it, independently of other locations in the image. This approach is limited by the amount of information available in such small patches, and there is evidence from psychophysical experiments that the performance of local edge detectors is approaching this limit.

 Figure 1: purely local boundary detection can easily be fooled. Utilizing more context could greatly improve boundary detection.

One way to incorporate more contextual information is to use shapemes, which capture shape/appearance context in a generic way. A more specific, well-known source of contextual information is curvilinear continuity: the fact that contours in natural images are extended and smooth.

In this work we develop probabilistic models of continuity (as well as junction type) and show that curvilinear continuity is quantitatively useful for a large variety of natural images.

Using the CDT Graph

We build our approach on top of the CDT graph, a discrete scale-invariant image representation based on the Constrained Delaunay Triangulation. For every edge e in the CDT graph, we associate a random variable Xe, where Xe=1 if e corresponds to a true boundary contour and Xe=0 otherwise. The variables {Xe} interact with each other through vertices, or junctions, in the graph. Our goal is to build a probabilistic model of continuity and junction frequency on the CDT graph and make inferences about {Xe}.
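To make the representation concrete, here is a minimal sketch of a CDT graph with per-edge variables and junction degrees. The class and field names are illustrative assumptions, not taken from the original system:

```python
from dataclasses import dataclass, field

@dataclass
class CDTEdge:
    # Hypothetical container for one CDT edge e and its variable X_e.
    eid: int
    endpoints: tuple   # (vertex_a, vertex_b)
    pb: float          # average low-level Pb contrast along the edge
    x: int = 0         # X_e: 1 if the edge lies on a true boundary

@dataclass
class CDTGraph:
    edges: dict = field(default_factory=dict)     # edge id -> CDTEdge
    vertices: dict = field(default_factory=dict)  # vertex id -> set of edge ids

    def add_edge(self, eid, a, b, pb):
        self.edges[eid] = CDTEdge(eid, (a, b), pb)
        for v in (a, b):
            self.vertices.setdefault(v, set()).add(eid)

    def degree(self, v):
        # Junction degree: 1 = line ending, 2 = continuation, 3 = T-junction.
        return len(self.vertices.get(v, set()))

g = CDTGraph()
g.add_edge(0, 'a', 'b', 0.8)
g.add_edge(1, 'b', 'c', 0.6)
g.add_edge(2, 'b', 'd', 0.3)
print(g.degree('b'))  # 3
```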

A Baseline Model of Local Continuity

In our baseline model, each Xe is estimated independently in its local context, including: neighboring edges, their low-level contrast (measured by Pb), and the continuity of their connections.

 (a) (b)
 Figure 2: (a) A simple 2-edge model of local curvilinear continuity. (b) Evidence of continuity come from both ends.

The simplest model of local context consists of two edges (see Figure 2(a)). Each edge has associated features: Pb (average low-level contrast), G (whether it is a G-edge or C-edge), and continuity (measured by the angle θ). A logistic classifier is trained to predict P(X0=1 and X1=1) and P(X0=0 and X1=0).

To compute a new continuity-enhanced "probability of boundary" Pblocal, recall that contours are extended, so evidence of continuity comes from both ends of edge e0. We therefore compute Pblocal as the product of the 2-edge model on both pairs (e0,e1) and (e0,e2).
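A sketch of this combination step follows. The logistic weights and the feature set are made-up placeholders (the trained classifier in the paper uses more features, including the G/C edge type); only the structure, a 2-edge logistic model applied at both ends and multiplied, reflects the text:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_pair(pb0, pb1, theta, w=(-2.0, 3.0, 3.0, -1.5)):
    # P(X0=1, X1=1 | features) from a logistic classifier over the
    # contrast of both edges and the continuity angle theta.
    # The weights w are hypothetical, standing in for trained values.
    w0, w_pb0, w_pb1, w_theta = w
    return logistic(w0 + w_pb0 * pb0 + w_pb1 * pb1 + w_theta * abs(theta))

def pb_local(pb0, left, right):
    # left = (pb1, theta1): continuation at one end of e0;
    # right = (pb2, theta2): continuation at the other end.
    # Evidence from both ends combines as a product.
    return p_pair(pb0, *left) * p_pair(pb0, *right)
```

A straight, high-contrast continuation (theta near 0) yields a much higher Pblocal than a sharp turn, which is the intended behavior of the continuity cue.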

Global Continuity with Conditional Random Fields

The edges {e} in the CDT graph interact with each other through the graph structure. A natural probabilistic model for {e} is therefore a random field defined on the CDT graph. We choose to use conditional random fields (CRF), a discriminative graphical model with exponential potential functions. Originally developed for natural language parsing tasks, conditional random fields have been shown to be superior to traditional Markov random fields (again, discriminative vs. generative).

 Figure 3: the factor graph representing our conditional random field (CRF) model for global continuity. The CDT provides both the geometric and graphical structure for our model. Each edge in the CDT corresponds to a variable Xe (represented by a circle), which is 1 if the pixels corresponding to that edge constitute a true boundary. The potential functions (squares) consist of a singleton potential for each edge, encoding the underlying low-level edge contrast, and continuity/closure potentials at the vertices, whose values depend on both the angles between incoming segments and the number of edges (degree) entering a junction. The degree encodes various junction types: line endings (deg=1), contour continuations (deg=2), and T-junctions (deg=3).

We use loopy belief propagation to perform inference on this probabilistic structure. Although the graphical structure of the CDT is more complex than a regular grid, empirical studies show that loopy belief propagation converges quickly on this structure (<10 iterations, <1 second). This enables us to train and test our model on large datasets of natural images.
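For illustration, here is a minimal sum-product loopy belief propagation routine on a pairwise binary model. This is a simplified stand-in for inference on the CDT factor graph: the real junction potentials depend on angles and degrees, whereas here the potentials are plain lookup tables supplied by the caller:

```python
import math

def loopy_bp(unary, pairwise, iters=10):
    # unary: {node: [phi(0), phi(1)]}
    # pairwise: {(i, j): 2x2 table psi[x_i][x_j]}
    nbrs = {n: [] for n in unary}
    for (i, j) in pairwise:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # Directed messages m[(i, j)][x_j], initialized uniform.
    m = {}
    for (i, j) in pairwise:
        m[(i, j)] = [1.0, 1.0]
        m[(j, i)] = [1.0, 1.0]

    def psi(i, j, xi, xj):
        # Look up the pairwise potential regardless of edge orientation.
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    for _ in range(iters):
        new = {}
        for (i, j) in m:
            out = [0.0, 0.0]
            for xj in (0, 1):
                for xi in (0, 1):
                    prod = unary[i][xi]
                    for k in nbrs[i]:
                        if k != j:
                            prod *= m[(k, i)][xi]
                    out[xj] += psi(i, j, xi, xj) * prod
            z = sum(out)
            new[(i, j)] = [v / z for v in out]
        m = new

    beliefs = {}
    for n in unary:
        b = [unary[n][x] * math.prod(m[(k, n)][x] for k in nbrs[n])
             for x in (0, 1)]
        z = sum(b)
        beliefs[n] = [v / z for v in b]
    return beliefs
```

On tree-structured fragments this recovers exact marginals; on the loopy CDT graph the same message updates are simply iterated to convergence.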

We train the parameters (weights for potential functions) by gradient descent with momentum. Because the potential functions are all in the exponential family (i.e. log-linear feature combinations), the gradient can be computed easily for CRF models. Training converges in a few hundred iterations.
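The optimization step can be sketched as follows. For simplicity this trains a single log-linear (logistic) factor, where the gradient is the familiar "empirical minus expected features" form; in the full CRF the expected-feature term would come from loopy BP rather than a closed-form sigmoid. The data, learning rate, and momentum constant are all illustrative assumptions:

```python
import math

def train_momentum(data, lr=0.5, mu=0.9, iters=200):
    # data: list of (feature_vector, binary_label).
    # Gradient ascent on the log-likelihood of a log-linear model,
    # with a classical momentum (velocity) term, as described above.
    dim = len(data[0][0])
    w = [0.0] * dim
    v = [0.0] * dim
    for _ in range(iters):
        grad = [0.0] * dim
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            for d in range(dim):
                # Empirical feature count minus model expectation.
                grad[d] += (y - p) * x[d]
        for d in range(dim):
            v[d] = mu * v[d] + lr * grad[d] / len(data)
            w[d] += v[d]
    return w

# Toy data: a bias feature plus one contrast-like feature (hypothetical).
data = [([1.0, 0.2], 0), ([1.0, 0.9], 1), ([1.0, 0.1], 0), ([1.0, 0.8], 1)]
w = train_momentum(data)
```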

Results: Continuity Improves Boundary Detection

We use three human-segmented datasets: a set of 30 new photos of baseball players, 344 images of horses, and 300 images from the Berkeley Segmentation Dataset (BSDS). For quantitative analysis, we use the Precision-Recall framework for boundary detector evaluation, as developed together with the BSDS dataset.
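The core quantities of this framework can be sketched as below. This is a deliberately simplified version using exact pixel matching; the actual BSDS benchmark matches predicted and ground-truth boundary pixels with a small spatial tolerance via a bipartite assignment:

```python
def precision_recall(predicted, ground_truth):
    # predicted, ground_truth: sets of boundary pixel coordinates.
    # Precision: fraction of predicted boundary pixels that are correct.
    # Recall: fraction of true boundary pixels that were found.
    tp = len(predicted & ground_truth)
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(ground_truth) if ground_truth else 1.0
    return precision, recall

pred = {(0, 1), (0, 2), (1, 2), (5, 5)}
gt = {(0, 1), (0, 2), (1, 2), (2, 2)}
p, r = precision_recall(pred, gt)
print(p, r)  # 0.75 0.75
```

Sweeping a threshold on the boundary probability and recomputing these two numbers at each setting traces out the precision-recall curves shown in Figures 4 and 5.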

 Figure 4: precision-recall evaluation for the horse dataset. Both the local and global models of continuity significantly improve boundary detection, with the global CRF model being the best. The improvement is most noticeable in the low-recall/high-precision range, which corresponds to boosting the most prominent boundaries and suppressing background noise. These boundaries are typically smooth; thus continuity helps suppress false positives in the background. Our models also push the asymptotic recall rate much closer to 100% (without loss in precision), reflecting their ability to complete gradient-less gaps.
 Figure 5: precision-recall curves for the baseball player and the BSDS datasets. The results are qualitatively similar to the horse dataset, with the continuity models improving boundary detection and the global CRF model being the best.

References

1. Xiaofeng Ren, Charless Fowlkes and Jitendra Malik. Scale-Invariant Contour Completion Using Conditional Random Fields. In ICCV '05, Beijing, 2005.