[Computer Vision] CS131 Study Notes #0

CS131 Study Note #0

1. Getting Started with Numpy

At its core, image recognition comes down to matrix operations, and Python's NumPy library is built for exactly such operations, so learning NumPy is a necessary first step before working with images.

The package is conventionally imported with import numpy as np.

1.1 General ways to create a matrix

  • Unlike a Python list, an array cannot start empty and be grown in place; its size is fixed when it is created
  • The general way to create an array: y = np.array([[1,2,3,4,5], [6,7,8,9,10]])
  • Read the size: y.shape
  • Create a zero matrix: np.zeros((3,3))  # creates a 3x3 matrix of zeros
  • Create an identity matrix: identity = np.identity(3)
  • Create an all-ones matrix: ones = np.ones((2,2))
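As a quick check of the constructors listed above, the following sketch builds each kind of array and inspects its shape and dtype. The variable names are only illustrative.

```python
import numpy as np

# A 2x5 array built from nested lists
y = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(y.shape)                # (2, 5)

# The constructors take the shape as a tuple
zeros = np.zeros((3, 3))      # 3x3 matrix of zeros
identity = np.identity(3)     # 3x3 identity matrix
ones = np.ones((2, 2))        # 2x2 matrix of ones

# These constructors return floating-point arrays by default
print(zeros.dtype)            # float64
print(identity.diagonal())    # [1. 1. 1.]
```

Note that np.zeros and np.ones expect the shape as a single tuple argument, which is why the double parentheses are needed.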

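1.2 Using broadcasting and np.mean

import numpy as np
# Suppose we want to shift every row of a matrix so that its mean is 0:
matrix = 10*np.random.rand(4,5)
row_means = matrix.mean(axis = 1).reshape((4,1))
matrix = matrix - row_means  # (4,5) - (4,1): row_means is broadcast across the columns
print(matrix)
# axis not set: the mean is taken over all m*n elements and a single scalar is returned
# axis = 0: collapses the rows, giving the mean of each column
# axis = 1: collapses the columns, giving the mean of each row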
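The same centering can be written without the explicit reshape. The sketch below is one possible variant, not the notes' original code: it uses the keepdims=True option of mean and a seeded np.random.default_rng generator (chosen here only so the run is repeatable), and it centers both rows and columns to show how the two shapes broadcast.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded only so the run is repeatable
matrix = 10 * rng.random((4, 5))

# keepdims=True keeps the reduced axis with length 1, so no reshape is needed
row_means = matrix.mean(axis=1, keepdims=True)    # shape (4, 1)
col_means = matrix.mean(axis=0, keepdims=True)    # shape (1, 5)

row_centered = matrix - row_means   # (4, 5) - (4, 1): broadcast across columns
col_centered = matrix - col_means   # (4, 5) - (1, 5): broadcast across rows

print(np.allclose(row_centered.mean(axis=1), 0))  # True
print(np.allclose(col_centered.mean(axis=0), 0))  # True
```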

1.3 Using numpy.random

  • Using numpy.random.randint
# Takes three parameters: low, high and size. high defaults to None; if only low is given, the range is [0, low). If high is given, the range is [low, high).
# Returns random integers from the half-open interval [low, high).
>>> np.random.randint(2, size=10)
array([1, 0, 0, 0, 1, 1, 0, 0, 1, 0])

>>> np.random.randint(1, size=10)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

>>> np.random.randint(5, size=(2, 4))
array([[4, 0, 2, 1],
       [3, 2, 2, 0]])
  • Using numpy.random.rand
# Returns one or more random samples drawn from the uniform distribution over [0, 1); the value 1 is excluded.
>>> np.random.rand(3,2)
array([[ 0.14022471,  0.96360618],
       [ 0.37601032,  0.25528411],
       [ 0.49313049,  0.94909878]])
  • Using numpy.random.randn
# randn returns one or more samples drawn from the standard normal distribution.
>>> np.random.randn(2,4)
array([[ 0.27795239, -2.57882503,  0.3817649 ,  1.42367345],
       [-1.16724625, -0.22408299,  0.63006614, -0.41714538]])
# The standard normal distribution, also called the u-distribution, is the normal distribution with mean 0 and standard deviation 1, written N(0,1).
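The ranges and distributions described above can be checked empirically. This is a small sketch under the assumption that fixing the seed with np.random.seed is acceptable for the experiment; the sample sizes are arbitrary.

```python
import numpy as np

np.random.seed(0)   # fixing the seed makes the draws repeatable

ints = np.random.randint(2, 5, size=1000)   # integers in [2, 5)
unif = np.random.rand(1000)                 # uniform samples in [0, 1)
norm = np.random.randn(100000)              # standard normal samples

print(ints.min(), ints.max())               # 2 4
print(unif.min() >= 0, unif.max() < 1)      # True True
print(norm.mean(), norm.std())              # close to 0 and 1
```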

1.4 Using boolean masks

  • Basic comparisons
import numpy as np
array = np.array(range(20)).reshape((4,5))  # 4x5 matrix containing 0 to 19
print(array)

output = array > 10
output
#out:
array([[False, False, False, False, False],
       [False, False, False, False, False],
       [False,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]])

array[output]
#out:
array([11, 12, 13, 14, 15, 16, 17, 18, 19])

# Several conditions can be combined
mask = (array < 5) | (array > 15)
#mask = array < 5 | array > 15  # wrong: | binds more tightly than < and >, so the parentheses are required
mask
#out:
array([[ True,  True,  True,  True,  True],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [False,  True,  True,  True,  True]])

  • Practical use
#Given a matrix, change all of the negative values to zero
matrix = 2*np.random.rand(5, 5) - 1  # random matrix, uniformly distributed in [-1, 1)
### SOLUTION ###
mask = matrix < 0
print(mask)
matrix[mask] = 0  # set every element where mask is True to 0
print(matrix)
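Mask assignment modifies the matrix in place. When the original values are still needed, the same result can be obtained without mutation; the sketch below compares the mask approach with np.where and np.maximum on a small hand-written matrix (the values are made up for illustration).

```python
import numpy as np

matrix = np.array([[-0.5, 0.2], [0.7, -0.1]])

# Boolean-mask assignment, applied to a copy so the input stays intact
masked = matrix.copy()
masked[masked < 0] = 0

# Non-mutating alternatives that return a new array
via_where = np.where(matrix < 0, 0, matrix)
via_maximum = np.maximum(matrix, 0)

print(np.array_equal(masked, via_where))     # True
print(np.array_equal(masked, via_maximum))   # True
print(matrix.min() < 0)                      # True: the original still has negatives
```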

1.5 Using reshape

# when you reshape, by default the new array is filled row by row
x = np.linspace(1, 12, 6)
print(x)
#[ 1.   3.2  5.4  7.6  9.8 12. ]

x = x.reshape((3,2))  # reshape is not in place; it returns a new array, so assign the result
print(x)
#[[ 1.   3.2]
# [ 5.4  7.6]
# [ 9.8 12. ]]

print(x.reshape(-1))  # -1 means this dimension is worked out automatically
#[ 1.   3.2  5.4  7.6  9.8 12. ]

print(x.reshape(2,-1))
#[[ 1.   3.2  5.4]
# [ 7.6  9.8 12. ]]
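The behaviour of -1 and the row-by-row fill can be checked on a simple integer range. The following sketch also shows what happens when the requested shape does not match the number of elements; the array contents are arbitrary.

```python
import numpy as np

x = np.arange(12)              # [0, 1, ..., 11]

a = x.reshape((3, -1))         # -1 is inferred as 4
print(a.shape)                 # (3, 4)
print(a[1, 0])                 # 4: rows are filled first

# reshape returns a new array object; x itself keeps its shape
print(x.shape)                 # (12,)

# The total number of elements must match, otherwise NumPy raises ValueError
try:
    x.reshape((5, -1))         # 12 is not divisible by 5
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```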

1.6 Deep copies in NumPy

Assigning an array with '=' in NumPy does not copy any data; only the reference (the address) is copied. For example:

array = np.linspace(1, 10, 10)
array
#out
#array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

dup = array
dup
#out
#array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

array[0] = 100
dup
#out
#array([100.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.,  10.])

print(id(array))
print(id(dup))
#out
#120645422176
#120645422176

After assigning with '=', array and dup refer to the same address, so modifying one also changes the other. To avoid this, use one of the deep-copy methods.

#using copy
import copy
array = np.linspace(1, 10, 10)
dup = copy.deepcopy(array)
# this can also be written as dup = np.copy(array) or dup = array.copy()
print(id(array))
print(id(dup))
array[0] = 100
dup
#out
#120649253152
#120664256640
#array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

The wrong way: using the slicing syntax [:]

#slicing
array = np.linspace(1, 10, 10)
dup = array[:]
print(id(array))
print(id(dup))
array[0] = 100
print(dup)
#out
#2552119240816
#2552119240336
#[100.   2.   3.   4.   5.   6.   7.   8.   9.  10.]

Although the two ids are different, dup and array still change together: slicing returns a view, a new array object that shares the same underlying data.
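Comparing id values cannot tell a view from a real copy, since a view is a distinct object. One way to check directly is np.shares_memory; the sketch below contrasts the three cases discussed above (the names alias, view and copied are chosen here for illustration).

```python
import numpy as np

array = np.linspace(1, 10, 10)

alias = array          # same object, no copy at all
view = array[:]        # new array object that shares the same data buffer
copied = array.copy()  # independent data

print(alias is array)                    # True
print(view is array)                     # False
print(np.shares_memory(array, view))     # True
print(np.shares_memory(array, copied))   # False

array[0] = 100
print(view[0], copied[0])                # 100.0 1.0
```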

2. Getting started with Pyplot

2.1 Basic plots with pyplot

import matplotlib.pyplot as plt

x = np.arange(10)**2
print(x)
plt.plot(x)
plt.show()

The resulting plot is as follows:

[Figure: line plot of x, the squares of 0 through 9]

Of course, many details can also be added:

plt.figure(figsize = (15,15))
plt.plot(x)
plt.title("This is a graph")
plt.xlabel("this is the x label")
plt.ylabel("this is the y label")
plt.show()

[Figure: the same plot with a title and axis labels]

2.2 Scatter plot

x = np.concatenate((np.linspace(1, 5, 10).reshape(10, 1), np.ones(10).reshape(10, 1)), axis = 1)
print(x)
y = x[:,0].copy() + 2*np.random.rand(10) - 0.5
print(y)
plt.scatter(x[:,0], y)  # scatter plot
plt.show()

3. Image reading

3.1 Basic Composition of Pictures

A color image is made up of three color layers: red, green and blue (RGB). Such an image can be represented by a matrix of shape (h, w, 3), where h and w are the height and width of the picture and 3 is the number of basic color channels. The numbers stored in each channel are the grayscale values of that color of light, and each pixel combines three such values; together the pixels make up the full-color image.

The gray value here is not a literal "black and white" value; it is the brightness of one particular color. For example, for a 400 x 300 picture, the (400, 300, 1) layer for the red channel stores the red gray value of every pixel.

Take any point in the picture: when it is displayed, its red gray value goes to the R channel, its green gray value to the G channel and its blue gray value to the B channel. The three values are then blended, much like mixing the three primary colors, to produce the color seen at that point.

In short, channels correspond to different colors (there are also special channels, such as the alpha channel, which stores transparency information), and the grayscale value is the brightness of a color.
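To make the (h, w, 3) layout concrete, here is a tiny hand-built image. It is only an illustration: the 2x2 size and the pixel colors are made up, and the values are assumed to lie in [0, 1].

```python
import numpy as np

# A 2x2 RGB image with values in [0, 1]:
# top row: red, green; bottom row: blue, white
img = np.array([
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]],
])

print(img.shape)       # (2, 2, 3) -> (height, width, channels)

red = img[:, :, 0]     # brightness of red at every pixel
print(red.shape)       # (2, 2)
print(img[1, 1])       # [1. 1. 1.] -> all three channels at full brightness: white
```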

3.2 Code implementation of image reading

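import numpy as np
import matplotlib.pyplot as plt
from skimage import io

def display(img):
    plt.figure(figsize = (5,5))
    plt.imshow(img)  # show the image
    plt.axis('off')  # hide the axes
    plt.show()

def load(image_path):
    # read the image; the second parameter of imread defaults to False, and True loads the image as grayscale
    out = io.imread(image_path)
    # scale the pixel values to [0, 1] (assumes an 8-bit image)
    out = out.astype(np.float64) / 255
    return out

img = load('image1.jpg')
display(img)

def rgb_exclusion(image, channel):
    # return a copy of the image with one of the RGB channels switched off
    out = image.copy()
    if channel == 'R':
        out[:, :, 0] = 0
    elif channel == 'G':
        out[:, :, 1] = 0
    elif channel == 'B':
        out[:, :, 2] = 0
    return out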
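The channel-exclusion idea can be tried without an image file. The sketch below is a variant written for illustration, not the notes' function: zero_channel is a hypothetical helper that looks the channel index up in a dictionary, so unlike rgb_exclusion it raises KeyError for an unknown channel label instead of returning an unchanged copy.

```python
import numpy as np

def zero_channel(image, channel):
    """Return a copy of image with one of the 'R', 'G', 'B' channels set to 0."""
    index = {'R': 0, 'G': 1, 'B': 2}[channel]   # hypothetical lookup table
    out = image.copy()                           # leave the input untouched
    out[:, :, index] = 0
    return out

# A 4x4 image filled with yellow (red and green at full brightness)
yellow = np.zeros((4, 4, 3))
yellow[:, :, 0] = 1.0
yellow[:, :, 1] = 1.0

no_red = zero_channel(yellow, 'R')
print(no_red[0, 0])    # [0. 1. 0.] -> only green remains
print(yellow[0, 0])    # [1. 1. 0.] -> the original is unchanged
```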

Note: scikit-image is an image processing package built on SciPy that treats images as NumPy arrays. It is a very capable tool for digital image processing and worth studying further. Its main submodules are listed below for reference.

Submodule      Main functions
io             Reading, saving and displaying pictures or videos
data           Test pictures and sample data
color          Color space conversion
filters        Image enhancement, edge detection, rank filters, automatic thresholding, etc.
draw           Basic drawing on NumPy arrays, including lines, rectangles, circles and text
transform      Geometric and other transformations, such as rotation, stretching and the Radon transform
morphology     Morphological operations, such as opening and closing, skeleton extraction, etc.
exposure       Image intensity adjustment, such as brightness adjustment, histogram equalization, etc.
feature        Feature detection and extraction
measure        Measurement of image properties, such as similarity or contours
segmentation   Image segmentation
restoration    Image restoration
util           General-purpose utility functions

References

https://zhuanlan.zhihu.com/p/360220467

https://www.jianshu.com/p/be7af337ffcd

4. Linear Algebra

4.1 Solving linear equations:

For example, say we wanted to solve the linear system $Ax = b$:

A = np.array([[1, 1], [2, 1]])
b = np.array([[1], [0]])
#This function takes parameters A, b, and returns x such that Ax =b. 
x = np.linalg.solve(A, b)
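A simple way to gain confidence in the result is to substitute it back into the system. The sketch below repeats the example and checks that A times x reproduces b; for this system the solution works out by hand to x1 = -1, x2 = 2.

```python
import numpy as np

A = np.array([[1, 1], [2, 1]])
b = np.array([[1], [0]])

x = np.linalg.solve(A, b)      # x has the same shape as b: (2, 1)
print(x.shape)                 # (2, 1)

# Substituting back should reproduce b up to floating-point error
print(np.allclose(A @ x, b))                 # True
print(np.allclose(x, [[-1.0], [2.0]]))       # True
```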

4.2 Finding the line of best fit:

Linear regression finds the “line of best fit” by minimizing the residual sum of squares.

If we have n datapoints $\{(x_1, y_1), \dots, (x_n, y_n)\}$, the objective function takes the form $loss(X) = \sum_{i=1}^n (y_i - f(x_i))^2$, where $f(x_i) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$.

It turns out that the parameters minimizing the loss function are given by the closed-form solution $\theta = (X^T X)^{-1} X^T y$.

For this algorithm, recall the method of least squares from linear algebra.

Let the error be $E(x) = ||b - Ax||^2$. We want the x that minimizes E, where A has full column rank and p is the projection of b onto the column space of A.

Since $Ax - p$ lies in the column space of A and $b - p$ is orthogonal to it, the Pythagorean theorem gives:

$||Ax - p||^2 + ||b - p||^2 = ||b - Ax||^2$

So, for any x:

$||b - Ax||^2 \geq ||b - p||^2$

Therefore, E is minimized if and only if x is chosen such that $Ax = p$. At that point the residual $b - A\hat{x}$ is orthogonal to the columns of A, i.e. $A^T(b - A\hat{x}) = 0$, which gives $A^TA\hat{x} = A^Tb$. Since A has full column rank, $A^TA$ is invertible and the equation has a unique solution:

$\hat{x} = (A^TA)^{-1}A^Tb$
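The geometric argument above can be checked numerically. This is a sketch on made-up random data (a seeded np.random.default_rng generator and a 10x2 matrix are assumptions of the example): it computes the closed-form solution, then verifies the orthogonality of the residual, the Pythagorean identity and the inequality for one other choice of x.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((10, 2))          # tall random matrix, assumed to have full column rank
b = rng.random(10)

x_hat = np.linalg.inv(A.T @ A) @ A.T @ b    # closed-form least-squares solution
p = A @ x_hat                               # projection of b onto the column space of A

# The residual b - p is orthogonal to every column of A
print(np.allclose(A.T @ (b - p), 0))        # True

# Pythagorean identity and the resulting inequality for some other x
x_other = x_hat + np.array([0.1, -0.2])
lhs = np.sum((A @ x_other - p) ** 2) + np.sum((b - p) ** 2)
rhs = np.sum((b - A @ x_other) ** 2)
print(np.isclose(lhs, rhs))                 # True
print(rhs >= np.sum((b - p) ** 2))          # True
```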

Next, let us try this out in Python.

First, generate some points:

x = np.concatenate((np.linspace(1, 5, 10).reshape(10, 1), np.ones(10).reshape(10, 1)), axis = 1)  # axis=1 concatenates along the columns
print(x)
y = x[:,0].copy() + 2*np.random.rand(10) - 0.5
print(y)
plt.scatter(x[:,0], y)
plt.show()

[Figure: scatter plot of the generated points]

Find the coefficients $\theta$:

theta = np.linalg.lstsq(x, y, rcond=None)[0]
# solve the least-squares problem with the built-in function
print(theta)

[0.72037691 1.55604653]

or:

theta = np.linalg.inv(x.T.dot(x)).dot(x.T).dot(y)
# solve the least-squares problem with the closed-form formula
print(theta)

This gives the same result: [0.72037691 1.55604653]
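For a straight-line fit, np.polyfit with degree 1 should give the same coefficients. The sketch below compares the three approaches on freshly generated data; it uses a seeded np.random.default_rng generator and np.column_stack, which are choices made for this example rather than the code used in the notes, so the numbers differ from the output shown above.

```python
import numpy as np

rng = np.random.default_rng(0)                   # seeded for repeatability
xs = np.linspace(1, 5, 10)
X = np.column_stack((xs, np.ones(10)))           # columns: [x, 1]
y = xs + 2 * rng.random(10) - 0.5                # noisy line, as in the notes

theta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
theta_normal = np.linalg.inv(X.T @ X) @ X.T @ y
theta_polyfit = np.polyfit(xs, y, 1)             # returns [slope, intercept]

print(np.allclose(theta_lstsq, theta_normal))    # True
print(np.allclose(theta_lstsq, theta_polyfit))   # True
```

Because the column of ones comes second in X, theta[0] is the slope and theta[1] is the intercept, which matches the order returned by np.polyfit.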

Finally draw the line:

plt.scatter(x[:,0], y)
plt.plot(x[:,0], x[:,0]*theta[0] + theta[1])  # theta[0] is the slope, theta[1] the intercept
plt.show()

[Figure: scatter plot of the points with the fitted line]

Origin blog.csdn.net/qq_56199570/article/details/119705626