This project takes the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produces a color image with as few visual artifacts as possible. In order to do this, I extracted the three color channel images, placed them on top of each other, and aligned them so they formed a single RGB color image.
My program took a glass plate image as input and produced a single color image as output. First, I divided the image into three equal parts, since each plate is essentially the three channels stacked vertically as separate images. Then, I aligned the second and third parts (the G and R channels) to the first (B). I aligned the images by exhaustively searching over a window of possible displacements (I chose [-15, 15] pixels), scoring each alignment with an image matching metric, and taking the displacement with the best score for the final image output.
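A minimal sketch of this exhaustive search, assuming the channels are grayscale NumPy arrays; the function name `align` and the use of SSD as the score here are illustrative choices, not the exact implementation:

```python
import numpy as np

def align(channel, reference, window=15):
    """Exhaustively search shifts in [-window, window] along both axes
    and return the (dy, dx) displacement that best matches reference."""
    best_shift, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - reference) ** 2)  # SSD, lower is better
            if score < best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

With a [-15, 15] window this scores 31 × 31 = 961 candidate alignments per channel, which is why this approach only stays affordable on small images.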
There are different metrics to score how well the images match. In this case, I used Euclidean distance (L2 norm) and Normalized Cross-Correlation (NCC) on a cropped version of the images (so that the chromatic aberration on the boundaries of the image wouldn't affect the results).
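The cropping step can be sketched as below; the 10% border fraction and the name `crop_interior` are assumptions for illustration, not values from the writeup:

```python
def crop_interior(img, frac=0.1):
    """Drop a fraction of the image from each border so plate edges
    and boundary artifacts don't dominate the matching score."""
    h, w = img.shape[:2]
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]
```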
This is the simplest metric to use on the images (formula shown below). It worked well on the smaller images, but performance worsened on the larger ones. To compute the Euclidean distance, first take the per-element difference between the two images at each index i. Square each difference, then sum the squared values to obtain the Sum of Squared Differences (SSD). Finally, take the square root to get the Euclidean distance.
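The steps above can be written directly in NumPy; this is a sketch, with the function name chosen for illustration:

```python
import numpy as np

def euclidean_distance(a, b):
    """L2 norm of the per-pixel differences: sqrt(sum_i (a_i - b_i)^2)."""
    diff = a.astype(float) - b.astype(float)
    return np.sqrt(np.sum(diff ** 2))
```

Since the square root is monotonic, minimizing the Euclidean distance over displacements is equivalent to minimizing the SSD, so the final square root can be skipped when only the ranking matters.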
The Normalized Cross-Correlation metric is simply a dot product between two normalized vectors. This metric performed slightly better than the L2 norm, and the effect was most noticeable on the larger images combined with the pyramid speedup. To compute it, first subtract the mean intensity from each image so both have zero mean; this centers the data and makes the measure invariant to brightness shifts. Then, compute the dot product of the zero-mean image vectors, which measures how much the patterns align. Finally, compute the magnitude of each vector using the L2 norm and divide the dot product by the product of the magnitudes to normalize. The result lies between -1 and 1, where 1 indicates perfect alignment. Unlike plain Euclidean distance, this adjusts for differences in brightness and contrast between the channels.
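The procedure above maps directly to a few lines of NumPy; a minimal sketch, with the function name `ncc` chosen for illustration:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two images: the dot product of
    their zero-mean, flattened pixel vectors divided by the product of
    the vectors' L2 norms. Returns a value in [-1, 1]."""
    a = a.astype(float).ravel() - a.mean()  # center: zero-mean vector
    b = b.astype(float).ravel() - b.mean()
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

Note that `ncc` is unchanged if one image is rescaled or offset in brightness (e.g. `2*a + 5` still correlates perfectly with `a`), which is exactly the invariance described above.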
For larger images, exhaustive search becomes too expensive: the true displacements grow with image size, so the [-15, 15] pixel window I set is either too small to find the right shift or too costly to enlarge. In this case, I implemented a faster search procedure, namely an image pyramid, which represents the image at multiple scales and processes from the coarsest (smallest) level to the finest, updating the displacement estimate at each level.
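A coarse-to-fine pyramid can be sketched as follows. This is an illustrative version, not the exact implementation: the function names, the stride-based downsampling, the base size of 64 pixels, and the refinement window of 2 are all assumptions.

```python
import numpy as np

def exhaustive_align(channel, reference, window):
    """Brute-force search over [-window, window] shifts using SSD."""
    best, best_score = (0, 0), np.inf
    for dy in range(-window, window + 1):
        for dx in range(-window, window + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = np.sum((shifted - reference) ** 2)
            if score < best_score:
                best_score, best = score, (dy, dx)
    return best

def pyramid_align(channel, reference, min_size=64):
    """Coarse-to-fine alignment: recurse on half-resolution images,
    double the coarse estimate, then refine with a small local search."""
    if min(channel.shape) <= min_size:
        return exhaustive_align(channel, reference, window=15)
    # Naive downsampling by striding (a real implementation would
    # low-pass filter first to avoid aliasing).
    dy, dx = pyramid_align(channel[::2, ::2], reference[::2, ::2], min_size)
    dy, dx = 2 * dy, 2 * dx
    shifted = np.roll(channel, (dy, dx), axis=(0, 1))
    ry, rx = exhaustive_align(shifted, reference, window=2)
    return (dy + ry, dx + rx)
```

The full [-15, 15] search runs only at the coarsest level, where the image is tiny; every finer level just corrects the doubled estimate within a few pixels, which is what makes the large plates tractable.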
I found a couple of images from the Prokudin-Gorskii collection online and ran my algorithm on them. These runs use the normalized cross-correlation metric without the coarse-to-fine pyramid speedup, since the images are smaller.
Here is the gallery of my algorithm run on all of the example images provided (other than Emir, which is shown up top) using the NCC image metric and the image pyramid speedup.