[Computer Vision] An introduction to simple starter code for computer vision (with source code)

1. Introduction

Computer vision is the discipline that studies how computers understand and interpret images and video. Its goal is to enable computers to mimic the human visual system, allowing them to recognize, analyze and understand visual input.

Computer vision utilizes computer algorithms and techniques to process image and video data. These algorithms can extract features from images and videos, recognize and classify objects, detect and track motion, measure and estimate object properties, and generate and process images and videos.

The basic steps of computer vision include image acquisition, preprocessing, feature extraction, object detection and recognition, and image segmentation and understanding. The image acquisition phase involves capturing image or video data using a camera or other acquisition device. The preprocessing stage includes denoising, enhancement, and adjustment to prepare the image for further processing. The feature extraction phase involves extracting meaningful information from images, such as edges, corners, or textures. Object detection and recognition involves identifying a specific object or scene in an image, usually using machine learning and deep learning algorithms. The image segmentation stage divides the image into different regions or objects for further analysis and understanding.

Computer vision has applications in various fields, including medical image analysis, autonomous driving, surveillance, facial recognition, image search and retrieval, augmented reality, and more. With advances in deep learning and neural networks, computer vision has made major breakthroughs in tasks such as image classification, object detection, and semantic segmentation.

Despite many advances in computer vision, it still faces challenges such as object recognition in complex scenes, image understanding, visual reasoning, and adversarial attacks. In the future, computer vision is expected to further develop and promote the application of artificial intelligence and computer technology in visual understanding and perception.

2. Project code

2.1 Importing third-party packages

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

2.2 Reading and displaying images

img = cv.imread('/kaggle/input/images-for-computer-vision/horse.jpg')
plt.imshow(img)


print(type(img))
print(img.shape)


OpenCV reads images in 'BGR' channel order by default, so we have to convert the image to 'RGB' before displaying it with Matplotlib.

img_convert = cv.cvtColor(img, cv.COLOR_BGR2RGB)
plt.imshow(img_convert)


2.3 Drawing on images

Using OpenCV, you can draw rectangles, circles, or any other shape you want on an image.

img = cv.imread('/kaggle/input/images-for-computer-vision/tiger1.jpg')

# Rectangle
# Color of the rectangle
color = (240,150,240)
## For filled rectangle, use thickness = -1
cv.rectangle(img, (100,100),(300,300),color,thickness=10, lineType=8)

## (100,100) are the (x,y) coordinates of the rectangle's top-left corner and (300,300) are the (x,y) coordinates of its bottom-right corner

# Circle
color=(150,250,50) ## Channel values must be in the range 0-255
cv.circle(img, (650,350),100, color,thickness = 10) ## For filled circle, use thickness = -1

## (650, 350) are the (x,y) coordinates of the circle's center and 100 is the radius

# Text
color=(50,200,100)
font=cv.FONT_HERSHEY_SCRIPT_COMPLEX
cv.putText(img, 'Save Tigers',(200,150), font, 5, color,thickness=5, lineType=cv.LINE_AA) ## lineType must be LINE_4, LINE_8 or LINE_AA

# Converting BGR to RGB
img_convert=cv.cvtColor(img, cv.COLOR_BGR2RGB)

plt.imshow(img_convert)
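
The same pattern extends to OpenCV's other drawing primitives. Here is a minimal hedged sketch using cv.line and cv.ellipse; the coordinates and colors are illustrative, not from the original example:

# Draw a straight line and an ellipse on the same image
cv.line(img, (50, 600), (900, 600), color=(255, 0, 0), thickness=5)
## (50,600) and (900,600) are the (x,y) endpoints of the line

cv.ellipse(img, (500, 400), (150, 80), 0, 0, 360, color=(0, 255, 255), thickness=5)
## (500,400) is the center, (150,80) are the half-axes, and 0/0/360 are the rotation, start and end angles

plt.imshow(cv.cvtColor(img, cv.COLOR_BGR2RGB))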


2.4 Blending images

Blending images refers to creating a new image by combining two or more images, merging certain characteristics or information of the originals. This compositing can be achieved through simple pixel-level manipulations or more complex image processing techniques.

The core idea behind blending images is to combine the pixel values of two images to generate a new image. Typically, this involves taking a weighted average of the pixel values of the two images, where the weight of each pixel determines its contribution to the final image. By adjusting the weight of the pixel values, you can control the visibility of each original image in the blended image.

In addition to simple pixel-level blending, there are more advanced blending techniques that allow for more complex effects. For example, blending images can utilize techniques such as image fusion, image compositing, multi-layer blending, etc. to achieve fine control and effects. These techniques allow the selective synthesis of specific parts or features from different images, creating images with novel visual effects.

Blending images has a wide range of applications in computer graphics, image processing, and computer vision. It can be used for artistic creation, image editing, advertising design, digital entertainment, and virtual and augmented reality. Blending is also widely used in applications such as special effects, image fusion, style transfer, and face synthesis.

In summary, image blending is a technique for creating new images by combining multiple images. It combines the features and information of different images through pixel-level operations or advanced image processing techniques to produce images with unique visual effects.

def myplot(images, titles):
    # Show a row of images side by side with matching titles
    fig, axs = plt.subplots(1, len(images), sharey=True)
    fig.set_figwidth(15)
    for img, ax, title in zip(images, axs, titles):
        if img.ndim == 3: # color image: convert OpenCV's BGR to RGB for Matplotlib
            img = cv.cvtColor(img, cv.COLOR_BGR2RGB)
        else: # single-channel image: replicate into three channels
            img = cv.cvtColor(img, cv.COLOR_GRAY2RGB)
        ax.imshow(img)
        ax.set_title(title)

img1 = cv.imread('/kaggle/input/images-for-computer-vision/horse.jpg')
img2 = cv.imread('/kaggle/input/images-for-computer-vision/tiger1.jpg')

# Resizing the img1
img1_resize = cv.resize(img1, (img2.shape[1], img2.shape[0]))

# Adding, Subtracting, Multiplying and Dividing Images
img_add = cv.add(img1_resize, img2)
img_subtract = cv.subtract(img1_resize, img2)
img_multiply = cv.multiply(img1_resize, img2)
img_divide = cv.divide(img1_resize, img2)

# Blending Images
img_blend = cv.addWeighted(img1_resize, 0.3, img2, 0.7, 0)

myplot([img1_resize, img2], ['Horse','Tiger'])
myplot([img_add, img_subtract, img_multiply, img_divide, img_blend], ['Addition', 'Subtraction', 'Multiplication', 'Division', 'Blending'])
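
Under the hood, cv.addWeighted computes the per-pixel sum dst = src1*alpha + src2*beta + gamma, saturated to the 0-255 range. As a hedged sanity check, the blend above can be reproduced by hand with NumPy (results may differ from OpenCV's by at most 1 due to rounding):

# Reproduce cv.addWeighted(img1_resize, 0.3, img2, 0.7, 0) manually
manual_blend = np.clip(img1_resize.astype(np.float32) * 0.3
                       + img2.astype(np.float32) * 0.7 + 0, 0, 255).astype(np.uint8)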


2.5 Image transformation

Image transformation refers to the process of changing the appearance or characteristics of an image by applying various mathematical operations or techniques. The goal of image transformation is to manipulate an image in a way that enhances its visual quality, extracts specific information, or achieves a desired visual effect.

img = cv.imread('/kaggle/input/images-for-computer-vision/horse.jpg')

height, width, _ = img.shape # img.shape is (rows, cols, channels)

# Translating
M_translate = np.float32([[1, 0, 200],[0, 1, 100]]) # 200 => translation along the x-axis, 100 => along the y-axis
img_translate = cv.warpAffine(img, M_translate, (width, height)) # dsize is (width, height)

# Rotating
center = (width / 2, height / 2) # (x,y) coordinates of the image center
M_rotate = cv.getRotationMatrix2D(center, angle = 90, scale = 1)
img_rotate = cv.warpAffine(img, M_rotate, (width, height))

# Scaling
scale_percent = 50
scaled_dim = (int(img.shape[1] * scale_percent / 100), int(img.shape[0] * scale_percent / 100))
img_scale = cv.resize(img, scaled_dim, interpolation = cv.INTER_AREA) # INTER_AREA suits downscaling

# Flipping
img_flip = cv.flip(img, 1) # 0: flip around the x-axis, 1: around the y-axis, -1: around both axes

# Shearing
srcTri = np.array( [[0, 0], [img.shape[1] - 1, 0], [0, img.shape[0] - 1]] ).astype(np.float32)
dstTri = np.array( [[0, img.shape[1] * 0.33], [img.shape[1] * 0.85, img.shape[0] * 0.25], [img.shape[1] * 0.15, img.shape[0] * 0.7]] ).astype(np.float32)
warp_mat = cv.getAffineTransform(srcTri, dstTri)
img_warp = cv.warpAffine(img, warp_mat, (width, height))


myplot([img, img_translate, img_rotate, img_scale, img_flip, img_warp],
       ['Original Image', 'Translated Image', 'Rotated Image', 'Scaled Image', 'Flipped Image', 'Sheared Image'])


2.6 Image processing

Image processing refers to the processing and analysis of digital images using various techniques and algorithms. It involves applying mathematical operations to images to improve image quality, extract useful information, or perform specific tasks. Image processing can be performed on 2D images or 3D volumes in different domains such as grayscale, color or multispectral.

There are two broad categories of image processing: analog and digital. Analog image processing involves manipulating a physical photograph or print using techniques such as filtering, cropping, and adjusting exposure. On the other hand, digital image processing involves processing images in digital format using computer algorithms.

Digital image processing encompasses a wide range of operations and techniques, including:

  1. Image Enhancement: Improve the visual quality of images through techniques such as contrast adjustment, brightness correction, noise reduction, and sharpening (a short sketch follows this list).
  2. Image Restoration: Image restoration techniques aim to restore images degraded by factors such as noise, blur, or compression artifacts.
  3. Image Compression: Compression techniques reduce the size of image data while preserving important visual information. Lossless and lossy compression algorithms are used for efficient storage and transmission of images.
  4. Image Segmentation: Image segmentation involves dividing an image into meaningful regions or objects. It is used for tasks such as object detection, image understanding, and computer vision applications.
  5. Feature extraction: Feature extraction techniques identify and extract specific patterns or features from images. These features can be used for tasks such as object recognition, image classification, and image retrieval.
  6. Image registration: Image registration techniques align two or more images of the same scene for comparison, fusion or analysis. It is useful in applications such as medical imaging, remote sensing, and computer graphics.
  7. Object detection and tracking: Techniques such as edge detection, contour analysis, and template matching are used to detect and track objects in images or videos.
  8. Image Analysis: Image analysis involves extracting quantitative information from images, such as measuring object properties, computing image statistics, or performing pattern recognition.
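
As a minimal sketch of item 1 above, image enhancement can be as simple as a linear contrast/brightness adjustment with cv.convertScaleAbs, which computes dst = saturate(alpha*src + beta); the alpha and beta values below are illustrative assumptions, not tuned settings:

# Simple contrast (alpha > 1) and brightness (beta > 0) boost
img = cv.imread('/kaggle/input/images-for-computer-vision/simple_shapes.png')
img_enhanced = cv.convertScaleAbs(img, alpha=1.5, beta=20)
myplot([img, img_enhanced], ['Original Image', 'Enhanced Image'])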

Image processing has applications in various fields, including medical imaging, surveillance, robotics, remote sensing, forensic analysis, industrial inspection, and digital entertainment. It plays a vital role in interpreting, understanding and processing visual information in digital form.

Overall, image processing is a fundamental discipline that involves applying mathematical operations and algorithms to digital images for various purposes, from enhancing visual quality to extracting valuable information. It is a key component of many technologies and applications that rely on digital image data.

import plotly.graph_objects as go
from plotly.subplots import make_subplots

def plot_3d(img1, img2, titles):
    # Plot two images side by side as 3D surfaces of their first channel
    fig = make_subplots(rows=1, cols=2,
                        specs=[[{'is_3d': True}, {'is_3d': True}]],
                        subplot_titles=[titles[0], titles[1]])
    x, y = np.mgrid[0:img1.shape[0], 0:img1.shape[1]]
    fig.add_trace(go.Surface(x=x, y=y, z=img1[:, :, 0]), row=1, col=1)
    fig.add_trace(go.Surface(x=x, y=y, z=img2[:, :, 0]), row=1, col=2)
    fig.update_traces(contours_z=dict(show=True, usecolormap=True,
                                      highlightcolor="limegreen", project_z=True))
    fig.show()

Threshold:

img = cv.imread('/kaggle/input/images-for-computer-vision/simple_shapes.png')

# Converting BGR to RGB
img_convert = cv.cvtColor(img, cv.COLOR_BGR2RGB)
plt.imshow(img_convert)


img = cv.imread('/kaggle/input/images-for-computer-vision/simple_shapes.png')
# Pixel value less than threshold becomes 0 and more than threshold becomes 255
_, img_threshold = cv.threshold(img,150,255,cv.THRESH_BINARY)

plot_3d(img, img_threshold, ['Original Image', 'Threshold Image=150'])
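
A hedged variant, if picking the cutoff by hand is undesirable: Otsu's method derives the threshold automatically from the image histogram (it requires a single-channel image, so we convert to grayscale first):

# Let OpenCV choose the threshold with Otsu's method
img_gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
otsu_t, img_otsu = cv.threshold(img_gray, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
print('Otsu threshold:', otsu_t)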

Gaussian Filter:

A Gaussian filter smooths the image by convolving it with a Gaussian kernel, weighting nearby pixels more heavily than distant ones; passing 0 as the third argument lets OpenCV derive sigma from the kernel size.

img = cv.imread('../input/images-for-computer-vision/simple_shapes.png')
# Gaussian Filter
ksize = (11, 11) # Both dimensions should be odd numbers
img_gaussian = cv.GaussianBlur(img, ksize, 0)

myplot([img, img_gaussian],['Original Image', 'Gaussian Image'])


plot_3d(img, img_gaussian, ['Original Image','Gaussian Image'])

Median Filter:

A median filter replaces each pixel with the median value of its neighborhood, which makes it particularly effective against salt-and-pepper noise.

img = cv.imread('../input/images-for-computer-vision/simple_shapes.png')
# Median Filter
ksize = 11
img_medianblur = cv.medianBlur(img, ksize)
myplot([img, img_medianblur], ['Original Image', 'Median blur Image'])


plot_3d(img, img_medianblur, ['Original Image','Median blur'])

Bilateral Filter:

A bilateral filter smooths the image while preserving edges by weighting neighboring pixels both by spatial distance (sigmaSpace) and by intensity difference (sigmaColor).

img = cv.imread('../input/images-for-computer-vision/simple_shapes.png')
# Bilateral Filter
img_bilateralblur = cv.bilateralFilter(img, d = 5, sigmaColor = 50, sigmaSpace = 5)
myplot([img, img_bilateralblur],['Original Image', 'Bilateral blur Image'])


plot_3d(img, img_bilateralblur, ['Original Image','Bilateral blur'])

2.7 Feature Detection

Feature detection is an important task in the field of computer vision, which aims to automatically identify and locate salient features with specific attributes or structures from images or image sequences. These features can be corners, edges, textures, color regions, etc. in the image. The goal of feature detection is to extract unique, stable and distinguishable feature points or feature descriptors from the input image data.

Feature detection plays a key role in many computer vision tasks, such as object recognition, image registration, motion tracking, image stitching, etc. It provides an effective way to describe and represent the key information in the image, so as to realize the understanding and analysis of the image content.

Feature detection algorithms usually include the following steps:

  1. Scale space construction: By using scale space (such as Gaussian pyramid) to detect image features at different scales, it can cope with the scale changes of objects in the image.
  2. Response Computation: Compute feature responses on the image using appropriate filters or operators to find potential feature point locations.
  3. Feature point screening: Apply non-maximum suppression or other filtering to the response image to select the most representative feature points.
  4. Feature description: Describe the selected feature points and generate feature descriptors that can describe their local appearance and structure.

Common feature detection algorithms include Harris corner detection, SIFT (Scale Invariant Feature Transform), SURF (Speeded Up Robust Feature) and ORB (Oriented FAST and Rotated BRIEF), etc. These algorithms use different methods to find, describe and match the feature points in the image, and choose the appropriate algorithm according to the needs of specific tasks and application scenarios.

The advantage of feature detection is that it has certain invariance to changes in illumination, scale, rotation, and viewing angle, and can provide robust feature representation, which is widely used in the field of computer vision.
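
As a minimal hedged sketch of one of the detectors named above, ORB can be run in a few lines; the sample image and the nfeatures value are illustrative:

# Detect ORB keypoints and draw them on the image
img = cv.imread('../input/images-for-computer-vision/simple_shapes.png')
img_gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
orb = cv.ORB_create(nfeatures=500) # detect up to 500 keypoints
keypoints, descriptors = orb.detectAndCompute(img_gray, None)
img_kp = cv.drawKeypoints(img, keypoints, None, color=(0, 255, 0))
plt.imshow(cv.cvtColor(img_kp, cv.COLOR_BGR2RGB))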

Canny edge detection (with and without prior smoothing):

img = cv.imread('../input/images-for-computer-vision/simple_shapes.png')
img_canny1 = cv.Canny(img, 50, 200)

# Smoothing the image before feeding it to Canny
filter_img = cv.GaussianBlur(img, (7, 7), 0)
img_canny2 = cv.Canny(filter_img, 50, 200)

myplot([img, img_canny1, img_canny2],
       ['Original Image', 'Canny Edge Detector(Without Smoothing)', 'Canny Edge Detector(With Smoothing)'])


img = cv.imread('../input/images-for-computer-vision/simple_shapes.png')
img_copy = img.copy()
img_gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
_, img_binary = cv.threshold(img_gray, 50, 200, cv.THRESH_BINARY)

# Eroding and dilating for smoother contours (the kernel must be an array, not a tuple)
kernel = np.ones((10, 10), np.uint8)
img_binary_erode = cv.erode(img_binary, kernel, iterations=5)
img_binary_dilate = cv.dilate(img_binary, kernel, iterations=5)

contours, hierarchy = cv.findContours(img_binary, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
cv.drawContours(img, contours, -1, (0,0,255), 3) # Draws the contours on the original image

myplot([img_copy, img], ['Original Image', 'Contours in the Image'])
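
As a hedged follow-up, the detected contours can also be measured; cv.contourArea and cv.boundingRect are standard OpenCV calls, while the printed format is just an illustration:

# Measure basic properties of each detected contour
for c in contours:
    area = cv.contourArea(c)        # enclosed area in pixels
    x, y, w, h = cv.boundingRect(c) # upright bounding box
    print(f'area={area:.0f}, bounding box {w}x{h} at ({x},{y})')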


img = cv.imread('../input/images-for-computer-vision/simple_shapes.png', 0) # 0 reads the image as grayscale
_, threshold = cv.threshold(img, 50, 255, cv.THRESH_BINARY)
contours, hierarchy = cv.findContours(threshold, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)
hulls = [cv.convexHull(c) for c in contours]

# Convert to BGR so the hulls can be drawn in color
img_bgr = cv.cvtColor(img, cv.COLOR_GRAY2BGR)
img_hull = cv.drawContours(img_bgr, hulls, -1, (0, 0, 255), 2)
plt.imshow(cv.cvtColor(img_hull, cv.COLOR_BGR2RGB))


Origin: blog.csdn.net/wzk4869/article/details/131321956