Accurate Image Alignment and Registration using OpenCV

The image alignment and registration pipeline takes two input images that contain the same scene from slightly different viewing angles. The picture above shows both input images side by side; the common scene is the painting Las Meninas (1656) by Velázquez, currently held at the Museo del Prado in Madrid, Spain.

The first step is computing the projection that establishes the mathematical relationship mapping pixel coordinates from one image to the other [1]. The most general planar 2D transformation is the eight-parameter perspective transform, or homography, denoted by a general $ 3×3 $ matrix $ \mathbf{H} $. It operates on 2D homogeneous coordinate vectors, $\mathbf{x'} = (x',y',1)$ and $\mathbf{x} = (x,y,1)$, as follows:

$$ \mathbf{x'} \sim \mathbf{Hx} $$
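
Since $\mathbf{H}$ is only defined up to scale (hence the eight free parameters), the mapping in ordinary pixel coordinates is obtained by dividing out the third homogeneous component:

$$ x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}} $$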

Afterwards, we take the homography matrix and use it to warp the perspective of one of the images over the other, aligning the two images. With the task clearly defined and the pipeline introduced, the next sections describe how this can be achieved using OpenCV.

Feature Detection

To compute the perspective transform matrix $ \mathbf{H} $, we need to link both input images and assess which regions are the same. We could manually select the corners of each painting and use them to compute the homography (as sketched below), but this approach has several problems: the corners of a painting could be occluded in one of the scenes, not every scene is a rectangular painting, and it would require manual work per scene, which is not ideal if we want to process numerous scenes automatically.
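
For illustration only, the manual alternative would look roughly like this; the corner coordinates below are made up, and cv.getPerspectiveTransform simply solves for the homography from four point correspondences:

corners0 = np.float32([[120, 95], [980, 80], [990, 760], [110, 770]])   # painting corners in img0 (hypothetical)
corners1 = np.float32([[150, 130], [940, 90], [960, 740], [140, 780]])  # the same corners in img1 (hypothetical)
H_manual = cv.getPerspectiveTransform(corners0, corners1)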

Therefore, a feature detection and matching process is used to link common regions in both images. The only limitation of this technique is that the scene must contain enough evenly distributed features. The method used here is ORB [2], but other feature extraction methods are also available; for brevity, the code of the class FeatureExtraction is presented at the end of the post.

# load both views as 3-channel BGR images
img0 = cv.imread("lasmeninas0.jpg", cv.IMREAD_COLOR)
img1 = cv.imread("lasmeninas1.jpg", cv.IMREAD_COLOR)
features0 = FeatureExtraction(img0)
features1 = FeatureExtraction(img1)
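
If needed, the detected keypoints can be inspected before matching; the short check below only uses the kps and img_kps attributes that the FeatureExtraction class (listed at the end of the post) already exposes, and the output filenames are arbitrary:

print(len(features0.kps), len(features1.kps))    # number of ORB keypoints per image
cv.imwrite("keypoints0.jpg", features0.img_kps)  # keypoints drawn on top of img0
cv.imwrite("keypoints1.jpg", features1.img_kps)  # keypoints drawn on top of img1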

Feature Matching

The aforementioned class computed the keypoints (positions of features) and descriptors (descriptions of those features) for both images, so now we have to pair them up and remove the outliers. First, FLANN (Fast Library for Approximate Nearest Neighbors) computes the pairs of matching features while taking into account the nearest neighbours of each feature. Then, the best matches are selected with Lowe's ratio test, which aims to eliminate the false matches from the previous phase [3]. The code is presented below, and the full function at the end of the post. Right after the code, the picture presents both input images side by side with the matching pairs of features.

matches = feature_matching(features0, features1)
matched_image = cv.drawMatches(img0, features0.kps, \
    img1, features1.kps, matches, None, flags=2)
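
As a quick sanity check, one can verify that enough matches survived the ratio test before moving on (the threshold mirrors the MIN_MATCHES constant used in aux.py, and the filename is arbitrary):

assert len(matches) > 50, f"only {len(matches)} matches, alignment may fail"
cv.imwrite("matches.jpg", matched_image)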

Homography Computation

After computing the pairs of matching features of the input images, it is possible to compute the homography matrix. The function takes as input the matching points on each image and, using RANSAC (random sample consensus), efficiently estimates the projective matrix. Although the feature pairs were already filtered in the previous phase, they are filtered again so that only the inliers are used to compute the homography. This removes the outliers from the calculation, which reduces the error associated with the homography computation.

H, _ = cv.findHomography( features0.matched_pts, \
    features1.matched_pts, cv.RANSAC, 5.0)
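
The second return value, discarded above, is the RANSAC inlier mask; keeping it is an easy way to see how many matches were actually used (an illustrative variation, not part of the original pipeline):

H, mask = cv.findHomography(features0.matched_pts, \
    features1.matched_pts, cv.RANSAC, 5.0)
print(int(mask.sum()), "inliers out of", len(mask), "matches")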

For our input images, this function outputs the following $ 3×3 $ matrix:

$$ \mathbf{H} = \begin{bmatrix} 7.85225708 \times 10^{-1} & -1.28373989 \times 10^{-2} & 4.06705815 \times 10^{2} \cr -4.21741196 \times 10^{-3} & 7.76450089 \times 10^{-1} & 8.15665534 \times 10^{1} \cr -1.20903215 \times 10^{-6} & -2.34464498 \times 10^{-5} & 1.00000000 \cr \end{bmatrix} $$
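
As a rough check of these numbers, $\mathbf{H}$ can be applied to a pixel of the first image with cv.perspectiveTransform; the top-left corner $(0, 0)$, for instance, should land near $(406.7, 81.6)$ in the second image, i.e. the last column of the matrix divided by $h_{33}$:

pt = np.float32([[[0, 0]]])            # perspectiveTransform expects shape (N, 1, 2)
print(cv.perspectiveTransform(pt, H))  # ~ [[[406.7  81.6]]]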

Perspective Warping & Overlay

Now that we have computed the transformation matrix that maps pixel coordinates from one image to the other, we can perform the image registration itself. This step warps the perspective of one of the input images so that it overlaps the other. The area outside the warped image is filled with transparency, which then allows us to overlay the result on the other image and verify the alignment.

h, w = img1.shape[:2]
# give img0 an alpha channel so that the area outside the warp stays transparent
img0_bgra = cv.cvtColor(img0, cv.COLOR_BGR2BGRA)
warped = cv.warpPerspective(img0_bgra, H, (w, h), \
    borderMode=cv.BORDER_CONSTANT, borderValue=(0, 0, 0, 0))
output = np.zeros((h, w, 3), np.uint8)
alpha = warped[:, :, 3] / 255.0
output[:, :, 0] = (1. - alpha) * img1[:, :, 0] + alpha * warped[:, :, 0]
output[:, :, 1] = (1. - alpha) * img1[:, :, 1] + alpha * warped[:, :, 1]
output[:, :, 2] = (1. - alpha) * img1[:, :, 2] + alpha * warped[:, :, 2]
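
The same blend can also be written as a single vectorised expression, which avoids repeating the per-channel assignments (a stylistic alternative with identical results; the output filename is arbitrary):

output = ((1. - alpha[..., None]) * img1[:, :, :3]
          + alpha[..., None] * warped[:, :, :3]).astype(np.uint8)
cv.imwrite("aligned.jpg", output)
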
For completeness, the full scripts used throughout the post are listed below.

main.py
import cv2 as cv
import numpy as np
from aux import FeatureExtraction, feature_matching


# load both views as 3-channel BGR images
img0 = cv.imread("lasmeninas0.jpg", cv.IMREAD_COLOR)
img1 = cv.imread("lasmeninas1.jpg", cv.IMREAD_COLOR)
features0 = FeatureExtraction(img0)
features1 = FeatureExtraction(img1)

matches = feature_matching(features0, features1)
# matched_image = cv.drawMatches(img0, features0.kps, \
#     img1, features1.kps, matches, None, flags=2)

H, _ = cv.findHomography( features0.matched_pts, \
    features1.matched_pts, cv.RANSAC, 5.0)

h, w = img1.shape[:2]
# give img0 an alpha channel so that the area outside the warp stays transparent
img0_bgra = cv.cvtColor(img0, cv.COLOR_BGR2BGRA)
warped = cv.warpPerspective(img0_bgra, H, (w, h), \
    borderMode=cv.BORDER_CONSTANT, borderValue=(0, 0, 0, 0))

output = np.zeros((h, w, 3), np.uint8)
alpha = warped[:, :, 3] / 255.0
output[:, :, 0] = (1. - alpha) * img1[:, :, 0] + alpha * warped[:, :, 0]
output[:, :, 1] = (1. - alpha) * img1[:, :, 1] + alpha * warped[:, :, 1]
output[:, :, 2] = (1. - alpha) * img1[:, :, 2] + alpha * warped[:, :, 2]
aux.py
import cv2 as cv
import numpy as np
import copy


orb = cv.ORB_create(
    nfeatures=10000,
    scaleFactor=1.2,
    scoreType=cv.ORB_HARRIS_SCORE)

class FeatureExtraction:
    def __init__(self, img):
        self.img = copy.copy(img)
        self.gray_img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
        self.kps, self.des = orb.detectAndCompute( \
            self.gray_img, None)
        self.img_kps = cv.drawKeypoints( \
            self.img, self.kps, None, \
            flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
        self.matched_pts = []


LOWES_RATIO = 0.7
MIN_MATCHES = 50
index_params = dict(
    algorithm = 6, # FLANN_INDEX_LSH
    table_number = 6,
    key_size = 10,
    multi_probe_level = 2)
search_params = dict(checks=50)
flann = cv.FlannBasedMatcher(
    index_params,
    search_params)

def feature_matching(features0, features1):
    matches = [] # good matches as per Lowe's ratio test
    # both images need descriptors before attempting k-NN matching
    if(features0.des is not None and len(features0.des) > 2 \
            and features1.des is not None and len(features1.des) > 2):
        all_matches = flann.knnMatch( \
            features0.des, features1.des, k=2)
        for pair in all_matches:
            # knnMatch may return fewer than 2 neighbours for some descriptors
            if len(pair) < 2:
                continue
            m, n = pair
            if m.distance < LOWES_RATIO * n.distance:
                matches.append(m)
        if(len(matches) > MIN_MATCHES):
            features0.matched_pts = np.float32(
                [features0.kps[m.queryIdx].pt for m in matches]
                ).reshape(-1, 1, 2)
            features1.matched_pts = np.float32(
                [features1.kps[m.trainIdx].pt for m in matches]
                ).reshape(-1, 1, 2)
    return matches
requirements.txt
opencv-python==4.2.0.34
numpy==1.19.2

  1. Szeliski, R. (2006). Image Alignment and Stitching. In: Paragios, N., Chen, Y., Faugeras, O. (eds) Handbook of Mathematical Models in Computer Vision. Springer, Boston, MA. https://doi.org/10.1007/0-387-28831-7_17

  2. Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011, November). ORB: An efficient alternative to SIFT or SURF. In 2011 International Conference on Computer Vision (pp. 2564-2571). IEEE. http://doi.org/10.1109/ICCV.2011.6126544

  3. Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91-110. https://doi.org/10.1023/B:VISI.0000029664.99615.94