Deep Learning from Scratch in Modern C++: [5/8] Convolution

1. Description

        In the last story, we covered some of the most relevant coding aspects of machine learning, such as functional programming, vectorization, and linear algebra programming.

        Now, let's move forward and start coding an actual building block of deep learning models: the 2D convolution. Let's start.

2. About this series

        We'll learn how to code must-know deep learning algorithms such as convolutions, backpropagation, activation functions, optimizers, deep neural networks, and more, using only plain and modern C++.

The story is: Coding 2D Convolution in C++

Check out other stories:

0 — Basics of Modern C++ Deep Learning Programming

2 — Cost function using Lambda

3 — Implementing Gradient Descent

4 — Activation Functions

...and more coming soon.

3. Convolution

        Convolution is an old friend of the signal processing field. Originally, it was defined as follows:

F(x) = (I * K)(x) = ∫ I(t) K(x - t) dt

        In machine learning terms:

  • I(...) is often referred to as the input,
  • K(...) as the kernel, and
  • F(...) as the feature map of I(x) given K.

Considering a multidimensional discrete domain, we can transform the integral into the following summation:

F(i, j) = (I * K)(i, j) = Σ_m Σ_n I(m, n) K(i - m, j - n)

Finally, for 2D digital images, we can rewrite this as:

F(i, j) = Σ_m Σ_n I(i + m, j + n) K(m, n)

This is the cross-correlation form, without the kernel flip, and it is what deep learning code (including ours below) actually computes.
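As a concrete check, take the 6 x 6 input and 3 x 3 kernel from the example code further below. The top-left output coefficient is

F(0, 0) = 3·(-1) + 1·0 + 0·1 + 4·(-1) + 2·0 + 1·1 + 5·(-1) + 4·0 + 0·1 = -11

which is exactly what the loop implementation in section 4 produces for result(0, 0).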

A simpler way to understand convolution is the following diagram:

Valid Convolution — Author Image

        We can easily see the kernel slide over the input matrix, producing another matrix as output. This is the simplest case of convolution, known as a valid convolution. In this case, the dimensions of the output matrix are given by:

dim(Output) = (m-k+1, n-k+1)

        where:

  • m and n are the number of rows and columns in the input matrix, respectively, and
  • k is the size of the square kernel.
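For example, the 6 x 6 input and 3 x 3 kernel used in the code below give dim(Output) = (6 - 3 + 1, 6 - 3 + 1) = (4, 4).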

        Now, let's code our first 2D convolution.

4. Coding 2D convolutions using loops

        The most intuitive way to implement convolution is to use a loop:

auto Convolution2D = [](const Matrix &input, const Matrix &kernel)
{
    const int kernel_rows = kernel.rows();
    const int kernel_cols = kernel.cols();
    // output dimensions of a valid convolution: (m - k + 1) x (n - k + 1)
    const int rows = (input.rows() - kernel_rows) + 1;
    const int cols = (input.cols() - kernel_cols) + 1;

    Matrix result = Matrix::Zero(rows, cols);

    for (int i = 0; i < rows; ++i) 
    {
        for (int j = 0; j < cols; ++j) 
        {
            // element-wise product of the kernel and the tile under it, then sum
            double sum = input.block(i, j, kernel_rows, kernel_cols).cwiseProduct(kernel).sum();
            result(i, j) = sum;
        }
    }

    return result;
};

        There are no secrets here. We slide the kernel over columns and rows, applying an inner product for each step. Now, we can simply use it like this:

#include <iostream>
#include <Eigen/Core>

using Matrix = Eigen::MatrixXd;

auto Convolution2D = ...;

int main(int, char **) 
{
    Matrix kernel(3, 3);
    kernel << 
        -1, 0, 1,
        -1, 0, 1,
        -1, 0, 1;

    std::cout << "Kernel:\n" << kernel << "\n\n";

    Matrix input(6, 6);
    input << 3, 1, 0, 2, 5, 6,
        4, 2, 1, 1, 4, 7,
        5, 4, 0, 0, 1, 2,
        1, 2, 2, 1, 3, 4,
        6, 3, 1, 0, 5, 2,
        3, 1, 0, 1, 3, 3;

    std::cout << "Input:\n" << input << "\n\n";

    auto output = Convolution2D(input, kernel);
    std::cout << "Convolution:\n" << output << "\n";

    return 0;
}
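        As a quick, hedged sanity check (a hand-computed example, not part of the original article), a small 3 x 3 input and a 2 x 2 kernel make the output easy to verify by hand. The snippet below assumes it is placed inside main() after the definitions above:

Matrix small_input(3, 3);
small_input << 1, 2, 3,
               4, 5, 6,
               7, 8, 9;

Matrix small_kernel(2, 2);
small_kernel << 1, 0,
                0, 1;

// each output coefficient is the sum of the two input values under the 1s:
// expected = [1+5, 2+6; 4+8, 5+9] = [6, 8; 12, 14]
Matrix expected(2, 2);
expected << 6,  8,
            12, 14;

auto check = Convolution2D(small_input, small_kernel);
std::cout << "Sanity check passed: " << std::boolalpha << check.isApprox(expected) << "\n";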

        This is our first implementation of 2D convolution, designed to be easy to understand. For now, we are not concerned with performance or input validation. Let's move on for more insights.

In a later story, we will learn how to implement convolutions using the Fast Fourier Transform and Toeplitz matrices.

5. Padding

        In the previous example, we noticed that the output matrix is always smaller than the input matrix. Sometimes this reduction is desirable, and sometimes it is not. We can avoid the reduction by adding padding around the input matrix:

        Input image padded with 1

        The result of the padding in the convolution looks like this:

        Padded Convolution — Author Image
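        As the figures suggest, with padding p the dimension formula from section 3 becomes dim(Output) = (m - k + 2p + 1, n - k + 2p + 1). For example, with the 6 x 6 input, the 3 x 3 kernel, and p = 1 used below, the output is 6 x 6 again.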

        A simple (and brute force) way to implement a padded convolution is as follows:

auto Convolution2D = [](const Matrix &input, const Matrix &kernel, int padding)
{
    int kernel_rows = kernel.rows();
    int kernel_cols = kernel.cols();
    int rows = input.rows() - kernel_rows + 2*padding + 1;
    int cols = input.cols() - kernel_cols + 2*padding + 1;

    Matrix padded = Matrix::Zero(input.rows() + 2*padding, input.cols() + 2*padding);
    padded.block(padding, padding, input.rows(), input.cols()) = input;

    Matrix result = Matrix::Zero(rows, cols);

    for(int i = 0; i < rows; ++i) 
    {
        for(int j = 0; j < cols; ++j) 
        {
            double sum = padded.block(i, j, kernel_rows, kernel_cols).cwiseProduct(kernel).sum();
            result(i, j) = sum;
        }
    }

    return result;
};

This code is simple, but very expensive in terms of memory usage. Note that we are making a full copy of the input matrix to create a padded version:

Matrix padded = Matrix::Zero(input.rows() + 2*padding, input.cols() + 2*padding);
padded.block(padding, padding, input.rows(), input.cols()) = input;
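For example, a hypothetical 1000 x 1000 input with padding 1 would require a second 1002 x 1002 matrix of doubles, roughly 8 MB of extra memory allocated and filled on every call, before any convolution arithmetic happens.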

A better solution controls the slice and kernel bounds directly, without building a padded copy of the input:

auto Convolution2D_v2 = [](const Matrix &input, const Matrix &kernel, int padding)
{
    const int input_rows = input.rows();
    const int input_cols = input.cols();
    const int kernel_rows = kernel.rows();
    const int kernel_cols = kernel.cols();

    if (input_rows < kernel_rows) throw std::invalid_argument("The input has fewer rows than the kernel");
    if (input_cols < kernel_cols) throw std::invalid_argument("The input has fewer columns than the kernel");
    
    const int rows = input_rows - kernel_rows + 2*padding + 1;
    const int cols = input_cols - kernel_cols + 2*padding + 1;

    Matrix result = Matrix::Zero(rows, cols);

    // Given a position in the (virtually padded) output, compute where the
    // overlapping region starts in the input, where it starts in the kernel,
    // and how large it is along this dimension, clipping at the input borders.
    auto fit_dims = [&padding](int pos, int k, int length) 
    {
        int input = pos - padding; // start position in the unpadded input
        int kernel = 0;            // first kernel index that overlaps the input
        int size = k;              // size of the overlapping region
        if (input < 0) 
        {
            kernel = -input; // skip the kernel entries hanging over the top/left border
            size += input;
            input = 0;
        }
        if (input + size > length) 
        {
            size = length - input; // clip the overlap at the bottom/right border
        }
        return std::make_tuple(input, kernel, size);
    };

    for(int i = 0; i < rows; ++i) 
    {
        const auto [input_i, kernel_i, size_i] = fit_dims(i, kernel_rows, input_rows);
        for(int j = 0; size_i > 0 && j < cols; ++j) 
        {
            const auto [input_j, kernel_j, size_j] = fit_dims(j, kernel_cols, input_cols);
            if (size_j > 0) 
            {
                auto input_tile = input.block(input_i, input_j, size_i, size_j);
                auto input_kernel = kernel.block(kernel_i, kernel_j, size_i, size_j);
                result(i, j) = input_tile.cwiseProduct(input_kernel).sum();
            }
        }
    }
    return result;
};

        This new code is much better because here we are not allocating temporary memory to hold the padded input. However, it can still be improved: the calls to input.block(…) and kernel.block(…) also have non-negligible time and memory costs.

One solution to the block(…) calls is to replace them with CwiseNullaryOp.
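The details are beyond this story, but the rough idea can be sketched. The code below is only a sketch, not part of the original article, and make_padded_view is a hypothetical helper name: Eigen's NullaryExpr builds a CwiseNullaryOp whose coefficients are computed on demand, so a zero-padded "view" of the input can be expressed without allocating a padded copy.

// Sketch: a lazy zero-padded view of the input using Eigen's NullaryExpr
// (a CwiseNullaryOp). Coefficients are computed on demand: zero outside the
// original bounds, input(i - padding, j - padding) inside. The input matrix
// must outlive the returned expression.
auto make_padded_view = [](const Matrix &input, int padding)
{
    const Eigen::Index rows = input.rows() + 2 * padding;
    const Eigen::Index cols = input.cols() + 2 * padding;
    return Matrix::NullaryExpr(rows, cols,
        [&input, padding](Eigen::Index i, Eigen::Index j) -> double
        {
            const Eigen::Index r = i - padding;
            const Eigen::Index c = j - padding;
            const bool inside = r >= 0 && c >= 0 && r < input.rows() && c < input.cols();
            return inside ? input(r, c) : 0.0;
        });
};

With such a view, the loop body of the first padded version could read padded_view.block(i, j, kernel_rows, kernel_cols) without the up-front allocation and copy, although each coefficient access now pays for a bounds test.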

        We can run a padded convolution in the following way:

#include <iostream>
#include <stdexcept> // std::invalid_argument, used by Convolution2D_v2
#include <tuple>     // std::make_tuple, used by the fit_dims helper

#include <Eigen/Core>

using Matrix = Eigen::MatrixXd;

auto Convolution2D = ...; // or Convolution2D_v2

int main(int, char **) 
{
    Matrix kernel(3, 3);
    kernel << 
        -1, 0, 1,
        -1, 0, 1,
        -1, 0, 1;
    std::cout << "Kernel:\n" << kernel << "\n\n";

    Matrix input(6, 6);
    input << 
        3, 1, 0, 2, 5, 6,
        4, 2, 1, 1, 4, 7,
        5, 4, 0, 0, 1, 2,
        1, 2, 2, 1, 3, 4,
        6, 3, 1, 0, 5, 2,
        3, 1, 0, 1, 3, 3;
    std::cout << "Input:\n" << input << "\n\n";

    const int padding = 1;
    auto output = Convolution2D(input, kernel, padding);
    std::cout << "Convolution:\n" << output << "\n";

    return 0;
}

        Note that now the input and output matrices have the same dimensions. That is why this is called same padding. The default mode, with no padding at all, is often called valid padding. Our code allows same, valid, or any other non-negative padding.
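For odd square kernels, the amount of padding needed for same output dimensions follows directly from the dimension formula. A small sketch, assuming the Convolution2D_v2 lambda and the matrices from the example above:

// n - k + 2*((k - 1) / 2) + 1 == n for odd k, so (k - 1) / 2 gives "same" padding,
// while padding 0 gives the "valid" (n - k + 1) x (n - k + 1) output.
const int k = kernel.rows();             // 3 in the example above
const int same_padding = (k - 1) / 2;    // 1
auto same_output  = Convolution2D_v2(input, kernel, same_padding); // 6 x 6
auto valid_output = Convolution2D_v2(input, kernel, 0);            // 4 x 4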

6. Kernel

        In deep learning models, the kernel is usually an odd-order square matrix, such as 3x3, 5x5, or 11x11. Some kernels are very famous, such as the Sobel filters:

Sobel filters Gx and Gy

        It's easier to see the effect of each Sobel filter on the image:

Apply Sobel filter  

The code to use the Sobel filter is here.

        Gy highlights horizontal edges, Gx highlights vertical edges. Therefore, the Sobel kernels Gx and Gy are often referred to as "edge detectors".
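        Since the filter values may not be visible in the figures above, here is a minimal sketch with the standard 3x3 Sobel kernels (well-known values, not taken from this article's images), applied with the Convolution2D lambda from section 4. A real image matrix would replace the small 6x6 input used earlier:

Matrix sobel_gx(3, 3), sobel_gy(3, 3);
sobel_gx << -1, 0, 1,
            -2, 0, 2,
            -1, 0, 1;
sobel_gy << -1, -2, -1,
             0,  0,  0,
             1,  2,  1;

// responds to vertical edges
auto vertical_edges = Convolution2D(input, sobel_gx);
// responds to horizontal edges
auto horizontal_edges = Convolution2D(input, sobel_gy);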

        Edges, like texture, brightness, and color, are fundamental features of an image. A key point of modern computer vision is the use of algorithms to find kernels, such as the Sobel filters, automatically from the data, or, to use a better term, to fit kernels through an iterative training process.

        It turns out that this training process teaches computer programs how to perform complex tasks, such as recognizing and detecting objects, understanding natural language, and so on. The training of kernels will be covered in the coming stories.

7. Conclusion and next steps

        In this story, we coded our first 2D convolution and used the Sobel filters as an illustrative example of applying this convolution to an image. Convolutions play a central role in deep learning: they are heavily used in every real-world machine learning model today. We will revisit convolutions to learn how to improve our implementation and to cover features such as strides.

        In the next story, we will discuss the most central problem in machine learning: the cost function.

References

A guide to convolution arithmetic for deep learning, Dumoulin & Visin

Deep Learning, Goodfellow et al.

Neural Networks and Deep Learning: A Textbook, Aggarwal

Computer Vision: Algorithms and Applications, Szeliski

Signals and Systems, Roberts
