Real-Time Video Convolution Using WebGL

A convolution is a mathematical operation that replaces each pixel of an image with a weighted sum of its own color value and those of its neighboring pixels. The small matrix that defines the weights is called the kernel or convolution matrix. Convolutions are used to apply effects to an image such as blurring, sharpening, and outlining, where each effect uses a distinct kernel.

The same technique is used in deep learning by convolutional neural networks (CNNs). In that context, the weights of each element of the convolution matrix are learned in order to improve the score of the model. This post, however, focuses on understanding how convolutions work.

A convolution can be defined by the following formula [1]:

$$ g(x, y)=\omega * f(x, y)=\sum_{d x=-a}^{a} \sum_{d y=-b}^{b} \omega(d x, d y) f(x+d x, y+d y) $$

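where $g(x, y)$ is the filtered output image, $f(x, y)$ is the input image, and $\omega$ is the kernel, whose indices run from $-a$ to $a$ horizontally and from $-b$ to $b$ vertically. The process is pictured in the 2D convolution animation by Michael Plotke [2].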
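To make the formula concrete before moving to the GPU, here is a minimal CPU sketch in JavaScript for a single-channel image. The function name `convolve`, the array-of-rows layout, and the clamp-to-edge border handling are assumptions for illustration; the formula itself does not say what happens at the image border.

```javascript
// Reference (CPU) version of the formula for a single-channel image.
// `image` is an array of rows; `kernel` is a (2b+1) x (2a+1) array of rows.
// Assumption: coordinates outside the image are clamped to the nearest edge.
function convolve(image, kernel) {
  const height = image.length;
  const width = image[0].length;
  const b = (kernel.length - 1) / 2;    // vertical radius
  const a = (kernel[0].length - 1) / 2; // horizontal radius
  const clamp = (v, lo, hi) => Math.min(Math.max(v, lo), hi);

  const output = [];
  for (let y = 0; y < height; y++) {
    const row = [];
    for (let x = 0; x < width; x++) {
      let sum = 0;
      for (let dy = -b; dy <= b; dy++) {
        for (let dx = -a; dx <= a; dx++) {
          const sx = clamp(x + dx, 0, width - 1);
          const sy = clamp(y + dy, 0, height - 1);
          // omega(dx, dy) * f(x + dx, y + dy)
          sum += kernel[dy + b][dx + a] * image[sy][sx];
        }
      }
      row.push(sum);
    }
    output.push(row);
  }
  return output;
}

const image = [
  [0, 0, 0],
  [0, 9, 0],
  [0, 0, 0],
];
const identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]];
const laplacian = [[1, 1, 1], [1, -8, 1], [1, 1, 1]];

console.log(convolve(image, identity));        // same values as `image`
console.log(convolve(image, laplacian)[1][1]); // -72
```

The four nested loops are exactly what the GPU version avoids: the outer two (over `x` and `y`) are run in parallel by the fragment shader, one invocation per pixel.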

For something more interactive, I highly recommend visiting “Image Kernels” by Victor Powell. Several existing examples already display the process of performing convolutions, so I’m taking a different approach: applying the kernels in real time to a video feed using WebGL.


In this approach we will be using WebGL, a rasterization engine that draws points, lines, and triangles based on code we supply. To compute what an image looks like and where it is placed, we need to write a program made of two functions, called shaders:

  • Vertex Shader, responsible for the positions
  • Fragment Shader, responsible for the colors

For more material on this topic, visit “WebGL2 Fundamentals”. Since we want to compute each pixel’s new color, our logic belongs in the fragment shader. All we do is “translate” the previous formula into code:

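precision mediump float;
uniform sampler2D image;   // current video frame
uniform vec2 resolution;   // frame size in pixels
uniform mat3 kernel;       // 3x3 convolution weights
varying vec2 uv;           // texture coordinates of this fragment

void main(){
    vec2 cellSize = 1.0 / resolution; // size of one pixel in texture space
    vec4 sum = vec4(0.0);             // gl_FragColor starts undefined, so accumulate here
    for(int i=-1; i<=1; i++){
        for(int j=-1; j<=1; j++){
            vec2 offset = cellSize * vec2(float(i), float(j));
            // shift the loop indices from -1..1 into the valid 0..2 range of the mat3
            sum += texture2D(image, uv + offset) * kernel[i + 1][j + 1];
        }
    }
    gl_FragColor = vec4(sum.rgb, 1.0); // alpha correction
}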
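On the JavaScript side, the kernel has to be uploaded into the `mat3` uniform. GLSL stores matrices column by column, so a kernel written as rows needs to be reordered if the shader's first index is meant to follow the horizontal offset, as in the loop above. The sketch below is an assumption about how that upload could look; `toColumnMajor` and `setKernel` are hypothetical helper names, and `gl` and `program` stand for an existing WebGL context and a linked program.

```javascript
// Flatten a 3x3 kernel given as rows into the column-major order GLSL uses
// for mat3, so that kernel[column][row] in the shader reads rows[row][column].
function toColumnMajor(rows) {
  const data = new Float32Array(9);
  for (let r = 0; r < 3; r++) {
    for (let c = 0; c < 3; c++) {
      data[c * 3 + r] = rows[r][c];
    }
  }
  return data;
}

// Hypothetical helper: `gl` is a WebGL context, `program` a linked program
// whose fragment shader declares `uniform mat3 kernel`.
function setKernel(gl, program, rows) {
  gl.useProgram(program);
  const location = gl.getUniformLocation(program, "kernel");
  // WebGL 1 requires the transpose argument to be false.
  gl.uniformMatrix3fv(location, false, toColumnMajor(rows));
}
```

For symmetric kernels such as a box blur the ordering makes no visible difference; it only matters once a kernel is asymmetric, for example a horizontal edge detector.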

In this example, the vec2 uv variable holds the texture coordinates of the current pixel, and the vec2 cellSize is the size of one pixel, both horizontally and vertically, in those coordinates.
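The `uv` value does not appear out of nowhere: it is written by the vertex shader and interpolated for every fragment. The sketch below shows one way this could be set up, assuming a quad that covers the whole canvas. The attribute name `position` and the helper `compileShader` are hypothetical, not taken from the page source.

```javascript
// Minimal vertex shader (GLSL ES 1.00) for a quad covering the canvas.
// Assumption: `position` holds the quad corners in clip space, [-1, 1].
const vertexShaderSource = `
attribute vec2 position;
varying vec2 uv;

void main() {
    // Map clip space [-1, 1] to texture space [0, 1]. y is flipped because,
    // by default, the top row of a video frame is uploaded at t = 0.
    uv = vec2(position.x, -position.y) * 0.5 + 0.5;
    gl_Position = vec4(position, 0.0, 1.0);
}
`;

// Compile one shader stage and surface the compiler log on failure.
// `type` is gl.VERTEX_SHADER or gl.FRAGMENT_SHADER.
function compileShader(gl, type, source) {
  const shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);
  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    const log = gl.getShaderInfoLog(shader);
    gl.deleteShader(shader);
    throw new Error("Shader compilation failed: " + log);
  }
  return shader;
}
```

With this mapping, a 640 by 480 frame gives a `cellSize` of (1/640, 1/480), so adding `cellSize * vec2(1.0, 0.0)` to `uv` moves the sample exactly one pixel along the horizontal axis.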

Afterwards, we just need to perform this computation for each frame of the video feed, and there you have it: a real-time convolution over a video using WebGL. For more details on how this works under the hood, you can check the page source code.
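The per-frame step can be sketched roughly as follows. This is not the page's actual code, only one plausible shape for it: `startRenderLoop` is a hypothetical name, the quad is assumed to be two triangles (6 vertices) already bound, and `texture` is assumed to be configured for video (clamped edges, no mipmaps). The optional `requestFrame` parameter exists only so the loop can be driven without a browser.

```javascript
// Upload the current video frame and redraw, once per animation frame.
function startRenderLoop(gl, video, texture, requestFrame) {
  // Fall back to the browser scheduler when none is injected.
  const schedule = requestFrame || ((callback) => requestAnimationFrame(callback));

  function drawFrame() {
    // readyState >= 2 (HAVE_CURRENT_DATA) means a frame is available.
    if (video.readyState >= 2) {
      gl.bindTexture(gl.TEXTURE_2D, texture);
      // Copy the current frame of the <video> element into the texture.
      gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, video);
      // Run the fragment shader for every pixel of the quad.
      gl.drawArrays(gl.TRIANGLES, 0, 6);
    }
    schedule(drawFrame);
  }

  schedule(drawFrame);
}

// In a browser: startRenderLoop(gl, document.querySelector("video"), texture);
```

Because the texture is re-uploaded every frame, switching kernels only requires updating the `kernel` uniform; the next frame is drawn with the new weights.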

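Visualization

[Interactive demo: live video feed with the selected kernel applied]

Test a different kernel: Identity, Laplacian Edge Detection, Gaussian Blur, Box Blur, Sharpen, Unsharpen.

Kernel values shown (Laplacian edge detection):

1 1 1
1 -8 1
1 1 1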
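For reference, the kernels in the list can be written down as plain JavaScript arrays. Only the Laplacian values come from this page; the other entries are common textbook 3x3 choices and are an assumption about what the demo uses. Unsharpen is left out because its values are not given here. `kernelSum` is a hypothetical helper that makes a useful property visible: kernels whose weights sum to 1 preserve overall brightness, while an edge detector sums to 0.

```javascript
// Scale every weight of a kernel by a constant factor.
const scale = (kernel, factor) => kernel.map((row) => row.map((v) => v * factor));

const kernels = {
  identity: [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
  // Values shown on this page.
  laplacian: [[1, 1, 1], [1, -8, 1], [1, 1, 1]],
  // Assumed: common 3x3 approximations, not read from the page source.
  gaussianBlur: scale([[1, 2, 1], [2, 4, 2], [1, 2, 1]], 1 / 16),
  boxBlur: scale([[1, 1, 1], [1, 1, 1], [1, 1, 1]], 1 / 9),
  sharpen: [[0, -1, 0], [-1, 5, -1], [0, -1, 0]],
};

// Sum of all weights in a kernel.
function kernelSum(kernel) {
  return kernel.flat().reduce((sum, v) => sum + v, 0);
}

for (const [name, kernel] of Object.entries(kernels)) {
  console.log(name, kernelSum(kernel).toFixed(2));
}
```

A kernel that sums to 0 turns flat regions black, which is why edge detection output is mostly dark with bright outlines.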

  1. Song Ho Ahn (안성호). Convolution. Digital Signal Processing.

  2. Michael Plotke. 2D Convolution Animation. Wikimedia Commons.