Project 3: Image Warping and Mosaicing

Part A1: Shoot the Pictures

Three of these image pairs were taken in SF on a cloudy day with my digicam. The last was taken of a corner of my room on my phone (hi Laufey!!).

Part A2: Recover Homographies

I use the hints provided in lecture notes in order to set up the calculation of the homography matrix. As the discussion so eloquently explains, homographies are defined up to scale, so we can fix h9 = 1. With 8 unknowns (we fix i to be 1, as the scaling factor), we need at least 4 sets of points (each providing an x, y value) to completely solve the equation. However, it is helpful to have more than 4 sets of points, for greater accuracy, as overdetermining the system can help make up for correspondence errors. Then, we can use least squares to minimize the inaccuracies in the homography matrix. I change the shape of H to make it easier to solve for a-h. Discussion 5 does a great way of explaining the process of solving and simplifying equations to get the resulting homography matrix.

Part A3: Warp the Images

For both of these functions, I implement a helper function to get the bounds of the new image, after applying the homography matrix. To implement the nearest neighbor code, I use the inverse of the homography matrix to get the corresponding source pixel for each output pixel, rounding coordinates to the nearest pixel data. I make sure to normalize the result of the coordinates (as they are all multiplied by w), round to the nearest whole number index, and ensure the corresponding source pixel is within bounds of the original image. We actually already had to implement bilinear interpolation during the first 189 homework (which caused me quite some grief) so I got to implement it again! We still get the coordinates in the source pixel that correspond with each output pixel, but instead, use the weighted averaging of the four neighboring pixels. It took some time for me to figure out the weights of each pixel (it can be somewhat counterintuitive) and construct the image based on these points and weights. I also set the alpha mask for each image to be 1 where the image exists, and 0 in other places. This becomes helpful later in the project! Having a part of the image that doesn’t exist being interpreted as black vs. clear could definitely cause annoying issues. To test out my functions, I perform rectification with images that have known squares. I select the four points of the square clockwise for im1points and map to [0,0],[100,0],[100,100],[0,100]. These were both taken in the Netherlands!
original
selected points
nearest neighbor
bilinear
original
selected points
nearest neighbor
bilinear
It takes about 20-25% more time to do bilinear interpolation vs nearest neighbor interpolation (2.22 vs 2.86 seconds for the window and 2.69 vs 3.00 seconds for the floor). This might be because we need to consider four points instead of one and thus more computations to weight each point. In terms of quality, the bilinear interpolation takes the cake. The lines around the window are so much less grainy than nearest neighbor interpolation because it takes into account many pixels, instead of a lumpy look where edges that are black get somewhat warped. Some of the lines in the floor also have a jagged look for nearest neighbor interpolation, particularly with the dark brown in the upper right-hand corner.

Part A4: Blend the Images into a Mosaic

First, to make sure that my image overlapping was working well, I implemented a naive algorithm that just weights images by 0.5 in the overlap region and 1 elsewhere. You can see that the edges of the images are very clear, so this was a sign to try something fancier.
  1. I calculate the dimensions of the output image
  2. I pad both images with 0s to match the dimensions of the output image.
  3. I use scipy.ndimage.distance_transform_edt to create a distance map for each image
  4. Utilizing my code from project 2, I create Gaussian and Laplacian pyramids of two levels and create the final image with weighted averaging determined by the distance_transform_edt function.
You can still see a little bit of the seam.
It’s almost impossible to see the seam for this indoor image!
You can see the seam if you look close enough.
This other outdoor image taken from the roof of an apartment building does quite well.

Part B1: Harris Corner Detection

I used the harris.py provided as part of the project to detect Harris corners. I tried implementing ANMS after that, but my kernel kept crashing. Turns out calculating pairwise distance between 195k points crashes your kernel! After doing some light filtering on the Harris points, I calculated ANMS (Adaptive Non-Maximal Suppression) as follows, with c_robust = 0.9 and num_corners = 500.
  1. Obtained the Harris score at each corner
  2. Constructed an array of distances for all pairs of corners
  3. Created a mask to figure out if f(x_i) < c_robust * f(x_j). This is the specific line of code I used mask=scores[:, np.newaxis] < (c * scores[np.newaxis, :])
  4. I set 0 values to infinity, because we sort by min values later
  5. Find the minimum distance for each point
  6. Sort indices by min distance, then reverse the array to get the largest radius. I then selected the top 500 points after sorting by radius.
Funny thing is I also heavily relied upon the Laufey poster to find correspondences.
Harris corners (with prefiltering)
ANMS corners

Part B2: Feature Descriptor Extraction

To extract features, I pass in the points associated with good Harris corners, as found earlier. For each one of these points, I extract the 40x40 pixel patch around it, if the full 40x40 pixel patch falls within the image. Then, I downsample the patch to 8x8 pixels and normalize (which allows us to compare patches between images). I keep track of which points correspond to extractable features (ones that fall within the image bounds).

Part B3: Feature Matching

To match features between images, I use the procedure described in section 5 of the paper. First, I calculate feature pairwise differences. Then I sort each row in ascending order of pixelwise difference, to get the first closest and second closest feature match. I calculate the Lowe ratio as the error of the first nearest neighbor / error of the second nearest neighbor. I only keep matches that are less than my Lowe threshold.

Part B4: RANSAC for Robust Homography

I implement the RANSAC (random sample consensus) algorithm as described in lecture and discussion.
  1. I randomly choose a set of 4 corresponding points
  2. I calculate the homography matrix between these 4 points
  3. I use this to transform the full set of input source points
  4. I compute the number of inliers -> source points that when transformed, fall within 0.8 pixels of the predicted point
  5. I save this mask of inliers if the number of inliers from this homography is greater than seen before
  6. I repeat this process num_iterations (1000) times
  7. I calculate the homography from the maximum set of inliers.
manually stitched
automatically stitched
manually stitched
automatically stitched
There seems to be slight jittering near the L in the poster :( shaky hands perhaps
manually stitched
automatically stitched
The seam seems a little less prevalent in the automatically stitched one. Some reflections! This project was really interesting because it showed me how something that humans find so intuitive (finding matching points between images) requires so much computation for computers to achieve. It also made me really interested in how the iPhone panorama works. This also illustrated the importance of understanding your coordinate system--many of my bugs came from switched y, x or x, y. I presume this will also be a common source of bugs in the next project :p