Feature Engineering Using Image Data


When we think of feature engineering, we usually think of tabular data. However, we can also extract features from image data. The goal is to extract the most important aspects of an image, which makes it easier to find the mapping between the data and the target variable.

This means you can use less data and smaller models for training. Smaller models reduce the time it takes to make predictions. This is especially useful when deployed on edge devices. Another benefit is that you can be more certain about what your model is using to make predictions.

We'll demonstrate this with some image feature engineering using Python:

  • Crop

  • Grayscale

  • Select RGB channels

  • Intensity threshold

  • Edge detection

  • Color filters (i.e. extract pixels within a given color range)

To keep it interesting, we'll be doing it for autonomous cars. As shown below, we want to train the model using images of the track. This model will then be used to make predictions to guide the car. Finally, we discuss the limitations of feature engineering from image data.

[Image: example track images used to train the model]

Feature Engineering and Augmentation

Before we dive in, it's worth discussing data augmentation. This approach has similar goals to feature engineering, but achieves them in different ways.

What is data augmentation?

Data augmentation is when we alter data either systematically or randomly using code. For images, this includes methods such as flipping, adjusting color, and adding random noise. These methods allow us to artificially introduce noise and increase the size of the dataset. If you want to learn about image augmentation in more detail, please refer to this article:

https://towardsdatascience.com/augmenting-images-for-deep-learning-3f1ea92a891c

In production, the model will need to run under different conditions. These conditions are determined by variables such as lighting, the angle of the camera, and the color of the room or objects in the background.

The purpose of data augmentation is to create a model that is robust to changes in these conditions. It does this by adding noise that mimics real-world variation. For example, changing the brightness of an image is similar to collecting data at different times of day.
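
As a rough illustration (this is not from the original project), a few of these augmentations could be applied with PIL and NumPy as follows, assuming img is an image loaded with Image.open:

#Illustrative augmentation sketch (assumes img is a PIL image)
import numpy as np
from PIL import Image, ImageEnhance, ImageOps

#Flip the image horizontally
flipped = ImageOps.mirror(img)

#Adjust the brightness (factor < 1 darkens, > 1 brightens)
darker = ImageEnhance.Brightness(img).enhance(0.6)

#Add random noise
arr = np.array(img).astype(np.float32)
arr = arr + np.random.normal(0, 10, arr.shape)
noisy = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))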

Data augmentation also allows us to train more complex architectures by increasing the size of the dataset. In other words, it helps in the convergence of model parameters.

Image Feature Engineering

The goal of image feature engineering is similar: we want to create a more robust model. But now we do this by removing noise that is not needed for accurate predictions. In other words, we remove the variation that comes from different conditions.

By extracting the most important aspects of an image, we are also simplifying the problem. This allows us to rely on simpler model structures. This means we can use smaller datasets to find the mapping between input and target.

An important difference is how these methods are handled in production. Your model does not make predictions on augmented images. However, with feature engineering, your model needs to make predictions on the same features it was trained on. This means you must be able to do feature engineering in production.
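
For example, a common pattern is to wrap all of the feature engineering in a single function that is used both when creating the training set and when the deployed model receives a new image. The sketch below is purely illustrative; the steps are placeholders based on the methods covered later in this article:

#Illustrative sketch: the same feature engineering is applied in training and in production
import numpy as np
import cv2

def engineer_features(img):
    """Placeholder pipeline - the real steps depend on your application"""
    img = np.array(img)
    img = img[25:, ]                             #crop out the background
    img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)  #convert to grayscale
    return img

#Training:   X = [engineer_features(img) for img in training_images]
#Production: model.predict(engineer_features(new_frame))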

Feature Engineering with Python

Ok, with that in mind, let's start feature engineering.

We'll walk through the code step by step, and you can also find the project on GitHub.

https://github.com/conorosully/medium-articles/blob/master/src/image_tools/image_features.ipynb

First, we use the following imports. We have some standard packages (lines 2-3). glob is used to handle file paths (line 5). We also have some packages for working with images (lines 7-8).

#Imports 
import numpy as np
import matplotlib.pyplot as plt

import glob

from PIL import Image
import cv2

As mentioned, we will be using images for self-driving cars. You can find examples of these on Kaggle.

https://www.kaggle.com/datasets/conorsully1/jatracer-images?select=object_detection

We use the following code to load one of the images. We start by loading the file paths of all images (lines 2-3). The image for the first path is then loaded (line 8) and displayed (line 9). You can see this image in Figure 1.

#Load image paths
read_path = "../../data/direction/"
img_paths = glob.glob(read_path + "*.jpg")

fig = plt.figure(figsize=(10,10))

#Display image
img = Image.open(img_paths[0])
plt.imshow(img)

[Figure 1: an example image from the track]

Crop

A simple approach is to crop the image to remove unwanted outer regions. The goal is to remove only parts of the image that are not necessary for prediction. For our self-driving car, we can remove pixels from the background.

To achieve this, we first load an image (Line 2). This image is then converted to an array (line 5). The array will have dimensions 224 x 224 x 3.

The image has a height and width of 224 pixels, and each pixel has an RGB channel.

To crop the image, we only select pixels starting at the 25th position on the y-axis (line 8). You can see the result in Figure 2.

#Load image
img = Image.open(img_paths[609])

#Convert to array
img = np.array(img)

#Simple crop
crop_img = img[25:,]

[Figure 2: the cropped image]

You probably want to keep the aspect ratio. In this case, you can achieve a similar result by turning unwanted pixels black (line 3).

#Change pixels to black
crop_img = np.array(img)
crop_img[:25,] = [0,0,0]

[Figure 3: the image with unwanted pixels turned black]

By cropping, we remove unnecessary pixels. We also avoid the model overfitting to the training data. For example, chairs in the background may always appear in the left-hand corner, and the model may therefore associate them with its predictions.

Looking at the image above, you may be tempted to crop it further. That is, you could crop the left-hand side of this image without removing any of the track. However, as you can see in Figure 4, for other images we would be removing significant parts of the track.

#Additional cropping 
crop_img = np.array(img)
crop_img[:25,] = [0,0,0]
crop_img[:,:40] = [0,0,0]

[Figure 4: additional cropping applied to other images]

The point is that you do not know which image the model will be shown at prediction time. The same cropping function must be applied to all images, so you need to make sure it never removes important parts of any image.
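
For example, you could wrap the crop in a small helper and apply it to every image (an illustrative sketch using the paths and crop region from above):

#Apply the same crop to every image
def crop(img):
    """Turn the top 25 rows of pixels black"""
    img = np.array(img)
    img[:25, ] = [0, 0, 0]
    return img

crop_imgs = [crop(Image.open(path)) for path in img_paths]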

Grayscale

For some applications, the color of the image is not important. In this case, we can convert the image to grayscale. We do this using OpenCV's cvtColor function (line 2).

#Gray scale
gray_img = cv2.cvtColor(img,cv2.COLOR_RGB2GRAY)

[Figure 5: the grayscale image]

Grayscaling works by capturing the intensity of the colors in an image. It does this by taking a weighted average of the RGB channels. Specifically, the function above uses this formula:

Y = 0.299*R + 0.587*G + 0.114*B

We can see the benefit of this by considering the number of input values per image. Using all RGB channels, each image contains 150,528 values (224*224*3). For grayscale images, we only have 50,176 values (224*224). A simpler input means we need less data and simpler models.
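
As a quick sanity check (not part of the original code), you can compute this weighted average yourself with NumPy; the result should match the cv2 output up to small rounding differences:

#Manual grayscale: weighted average of the R, G and B channels
manual_gray = (0.299*img[:, :, 0] + 0.587*img[:, :, 1] + 0.114*img[:, :, 2]).astype(np.uint8)

print(manual_gray.shape)  #(224, 224)
print(gray_img.shape)     #(224, 224)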

RGB Channels

One of the channels may be more important than the others. Instead of grayscale, we could use only that channel. Below, we select the R (line 6), G (line 7) and B (line 8) channels. The dimensions of each resulting array will be 224 x 224. You can see the respective images in Figure 6.

#Load image
img = Image.open(img_paths[700])
img = np.array(img)

#Get rgb channels
r_img = img[:, :, 0]
g_img = img[:, :, 1]
b_img = img[:, :, 2]

[Figure 6: the R, G and B channels]

You could also use the channel_filter function below. Here, the channel parameter (c) takes the value 0, 1 or 2 depending on which channel you want. Keep in mind that some packages load channels in a different order. We are using PIL, which loads images as RGB. However, if you load an image using cv2.imread(), the channels will be in BGR order.

def channel_filter(img,c=0):
    """Returns given channel from image pixels"""
    img = np.array(img)
    c_img = img[:, :, c]

    return c_img
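
For example, the channel arrays above could be reproduced with (a usage sketch):

#Usage: c = 0, 1 or 2 selects the R, G or B channel
img = Image.open(img_paths[700])

r_img = channel_filter(img, c=0)
g_img = channel_filter(img, c=1)
b_img = channel_filter(img, c=2)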

When using these transforms, you need to consider whether important information is being removed from the image. For our application, the track is orange. In other words, the color of the track can help differentiate it from the rest of the image.

Intensity Threshold

With grayscaling, each pixel will have a value between 0 and 255. We can further simplify the input by converting it to a binary value. If the gray value exceeds the threshold, the pixel value is 1, otherwise it is 0. We refer to this cutoff as the intensity threshold.

The function below is used to apply this threshold. We first grayscale the image (line 5). If a pixel value exceeds the cutoff, it is given the value 1000 (line 8); otherwise it is set to 0 (line 9). We use 1000 rather than 1 because a value of 1 would itself fall below the cutoff and be set to 0 by the next step. Finally, we scale all pixels so that they take a value of 0 or 1 (line 11).

def threshold(img,cutoff=80):
    """Apply intesity thresholding"""
    
    img = np.array(img)
    img = cv2.cvtColor(img,cv2.COLOR_RGB2GRAY)
    
    #Apply cutoff
    img[img>cutoff] = 1000 #above cutoff, will become 1 (white)
    img[img<=cutoff] = 0 #below cutoff, black
    
    img = img/1000

    return img

Part of the self-driving car project is avoiding obstacles, which are tin cans painted black. In Figure 7, you can see how applying the intensity threshold function isolates the can from the rest of the image. This is only possible because the can is black. In other words, its intensity is lower than the rest of the image.

[Figure 7: intensity thresholding used to isolate the black can]

The cutoff can be thought of as a hyperparameter. As can be seen in Figure 7, a larger cutoff means we include less noise from the background. The downside is that we capture less of the can.
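
To explore this, you could display the output for a few cutoff values side by side (a sketch; the image index and cutoff values are arbitrary):

#Compare different intensity cutoffs on the same image
img = Image.open(img_paths[0])

fig = plt.figure(figsize=(15, 5))
for i, cutoff in enumerate([60, 80, 100]):
    fig.add_subplot(1, 3, i + 1)
    plt.imshow(threshold(img, cutoff=cutoff), cmap='gray')
    plt.title("cutoff = {}".format(cutoff))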

Edge Detection

If we want to isolate the track, we can use Canny edge detection. This is a multi-stage algorithm for detecting edges in images. If you want to understand how it works, I recommend reading Sofiane Sahir's article on Canny edge detection.

https://towardsdatascience.com/canny-edge-detection-step-by-step-in-python-computer-vision-b49c3a2d8123

We apply the algorithm using the cv2.Canny function. The threshold1 and threshold2 parameters are used in the hysteresis procedure, the final step of the edge detection algorithm, which determines which lines are actually edges.

#Apply canny edge detection
edge_img = cv2.Canny(img,threshold1 = 50, threshold2 = 80)

You can see some examples in Figure 8. Similar to intensity thresholding, what we get is a binary map - white for edges, black otherwise. Hopefully now the track is easier to distinguish from the rest of the image. However, you will find that the edges in the background are also detected.

[Figure 8: Canny edge detection applied to several images]
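
A comparison like Figure 8 can be produced by looping over a few images (a sketch; the number of images shown is arbitrary):

#Apply Canny edge detection to a few images
fig = plt.figure(figsize=(15, 5))

for i, path in enumerate(img_paths[:3]):
    im = np.array(Image.open(path))
    edge_img = cv2.Canny(im, threshold1=50, threshold2=80)

    fig.add_subplot(1, 3, i + 1)
    plt.imshow(edge_img, cmap='gray')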

Color Filter

We may have better luck isolating the track using the pixel colors. We do this with the pixel_filter function below, which converts the image to a binary map using cv2.inRange (line 10).

This function checks whether each pixel falls within the range given by the lists of lower bounds (line 5) and upper bounds (line 6). Specifically, each RGB channel must fall within its respective range (e.g. 134-t ≤ R ≤ 192+t).

def pixel_filter(img, t=0):
    
    """Filter pixels within range"""
    
    lower = [134-t,84-t,55-t]
    upper = [192+t,121+t,101+t]

    img = np.array(img)
    orange_thresh = 255 - cv2.inRange(img, np.array(lower), np.array(upper))

    return orange_thresh

In simple terms, this function determines whether a pixel's color is close enough to the orange of the track. You can see the result in Figure 9. The parameter t introduces some flexibility: higher values capture more of the track, but also retain more noise, because pixels in the background also fall within the range.

[Figure 9: color filter results for different values of t]
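
For example (a usage sketch; the values of t are arbitrary):

#A stricter and a looser color filter
strict_map = pixel_filter(img, t=0)
loose_map = pixel_filter(img, t=20)

plt.imshow(loose_map, cmap='gray')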

You might be wondering how we arrived at the lower and upper bounds. That is, how do we know that the channels of the track fall between [134,84,55] and [192,121,101]? We created a color picker using Python, and we explain how to build it in the article below.

https://towardsdatascience.com/building-a-color-picker-with-python-55e8357539e7

In Figure 10, you can see the color picker in action. We select pixels from multiple images, trying to select them at different positions on the track. This way, we get the full range of pixel values under different conditions.

[Figure 10: selecting track colors with the color picker]

We selected 60 colors in total. You can see all of these colors (with an added optical illusion) in Figure 11. The RGB channels of all these colors are stored in a list called "colours".

[Figure 11: the 60 selected colors]

Finally, we take the min and max values for each RGB channel. This gives us the lower and upper bounds.

#"colours" is the list of sampled (R, G, B) tuples from the color picker
lower = [min(x[0] for x in colours),
         min(x[1] for x in colours),
         min(x[2] for x in colours)]

upper = [max(x[0] for x in colours),
         max(x[1] for x in colours),
         max(x[2] for x in colours)]

Limitations of Feature Engineering

After all of this, you may still not be convinced. A major advantage of deep learning is that it can identify complex patterns without the need for feature engineering, and this is a fair point.

Feature engineering requires critical thought. You need to figure out which aspects of an image are important and then write code to extract them. For some applications, the time it takes to do all of this may not be worth it.

Also, with some of the methods, we saw that we could not remove all of the noise. For example, the dark background pixels that remained after intensity thresholding. Arguably, this remaining noise is now harder to distinguish from the important elements, since the noise and object pixels take the same binary value.

Really, these benefits are only seen when dealing with relatively simple computer vision problems. Our track never changes, and the obstacles are always the same color. For more complex problems, you need more data. Alternatively, you could fine-tune a pretrained model on a smaller dataset.

Dataset

JatRacer Images (CC0: Public Domain) https://www.kaggle.com/datasets/conorsully1/jatracer-images

References

OpenCV, Color conversions https://docs.opencv.org/3.4/de/d25/imgproc_color_conversions.html
