Image classification and KNN

1 Image classification

1.1 What is image classification

Image classification is the task of assigning an input image one label from a fixed set of categories. It may seem simple, but it is one of the core problems in computer vision and has a large variety of practical applications. Moreover, many seemingly distinct computer vision tasks (such as object detection and segmentation) can be reduced to image classification.

Example:

In the figure below, an image classification model takes a single image and assigns probabilities to each of four labels, {cat, dog, hat, mug}. Keep in mind that, to a computer, an image is just a large 3-dimensional array of numbers. In this example, the cat image is 248 pixels wide and 400 pixels tall and has three color channels: red, green and blue (RGB for short). The image therefore consists of 248 x 400 x 3 = 297,600 numbers, each an integer in the range 0 (black) to 255 (white). Our task is to turn these roughly 300,000 numbers into a single label, such as "cat".
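The minimal NumPy sketch below makes the "big array of numbers" view concrete. It builds a random stand-in for the 248 x 400 cat image, so the pixel values are random and do not come from the real photo. NumPy conventionally stores images as height x width x channels, so the shape reads (400, 248, 3).

```python
import numpy as np

# A random stand-in for the 248-wide, 400-tall RGB cat image (not real pixel data).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(400, 248, 3), dtype=np.uint8)  # height x width x channels

print(image.shape)  # (400, 248, 3)
print(image.size)   # 297600 numbers in total
print(image.dtype)  # uint8: every value is an integer in [0, 255]
```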

In short, image classification means predicting the category label of a given image, or producing a probability for each of a set of labels. The image is a 3-dimensional array of integers in the range 0 to 255, of size width x height x 3. The 3 corresponds to the red, green and blue color channels.

1.2 Difficulties of the image classification task

Recognizing a visual concept such as "cat" is trivially easy for a human, but the problem looks very different from the perspective of a computer vision algorithm. Below we list some of the challenges that image recognition algorithms face. Keep in mind that the image is represented as a 3-dimensional array of brightness values.

Viewpoint variation: a single instance of an object can be oriented in many ways with respect to the camera.
Scale variation: visual classes often vary in size, both in their extent within the image and in the real world.
Deformation: many objects are not rigid bodies and can deform in extreme ways.
Occlusion: the object of interest may be partly hidden. Sometimes only a small portion of the object (as little as a few pixels) is visible.
Illumination conditions: at the pixel level, the effects of lighting are drastic.
Background clutter: the object may blend into its environment, making it hard to identify.
Intra-class variation: the members of a class can differ greatly in appearance. Chairs, for example, come in many different kinds, each with its own shape.

1.3 Image classification: method and process

1.3.1 Method

How might we write an image classification algorithm? The task is quite different from writing, say, a sorting algorithm, because it is not obvious how to write code that recognizes a cat in an image. So instead of trying to specify directly in code what every category of interest looks like, we take an approach similar to how we teach a child. We give the computer many examples of each class, then develop a learning algorithm that looks at these examples and learns the visual appearance of each class. This is called the data-driven approach.

1.3.2 Process

Image classification takes an array of pixel values representing an image as input and assigns a label to it. The complete process is as follows:

Input: a set of N images, each labeled with one of K different classes. This set is called the training set.
Learning: use the training set to learn what each class looks like. This step is referred to as training a classifier or learning a model.
Evaluation: measure the quality of the classifier by asking it to predict labels for a new set of images that it has never seen before, then compare the true labels of these images with the predicted labels. Naturally, we want as many predictions as possible to match the true labels (the ground truth).
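The three steps above map naturally onto a train/predict interface. The following is only an illustrative sketch. `MajorityClassifier` is a hypothetical, deliberately trivial model that always predicts the most common training label, and `accuracy` is a helper introduced here. Neither of them appears in the original post.

```python
import numpy as np

class MajorityClassifier:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def train(self, images, labels):
        # Learning step: here the whole "model" is the most common label.
        self.label = np.bincount(labels).argmax()

    def predict(self, images):
        # Prediction step: one label per input image.
        return np.full(len(images), self.label)

def accuracy(predicted, true_labels):
    # Evaluation step: fraction of predictions that match the ground truth.
    return np.mean(predicted == true_labels)

# Input step: N = 6 tiny fake "images" (2x2 pixels) labeled with one of K = 3 classes.
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(6, 2, 2))
train_labels = np.array([0, 1, 1, 2, 1, 0])
test_images = rng.integers(0, 256, size=(4, 2, 2))
test_labels = np.array([1, 0, 1, 2])

model = MajorityClassifier()
model.train(train_images, train_labels)
print(accuracy(model.predict(test_images), test_labels))  # 0.5
```

A real classifier would replace the body of `train` and `predict`, but the input/learn/evaluate structure stays the same.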

2 Nearest neighbor classifier

As a first approach, we implement a Nearest Neighbor classifier. It has nothing to do with convolutional neural networks and is rarely used in practice. Implementing it will, however, give the reader a basic idea of how to approach the image classification problem.

2.1 CIFAR-10

A popular image classification dataset is CIFAR-10. It consists of 60,000 small images, each 32 x 32 pixels. Each image is labeled with one of 10 classes. The 60,000 images are split into a training set of 50,000 images and a test set of 10,000 images. The figure below shows 10 random example images from each of the 10 classes.

Left: example images from the CIFAR-10 dataset. Right: the first column shows test images. Next to each one are the 10 most similar training images, retrieved by the Nearest Neighbor algorithm based on pixel-wise difference.

2.2 Nearest Neighbor: classifying images by simple similarity

Suppose we use the 50,000 CIFAR-10 training images (5,000 per class) as the training set and want to label the remaining 10,000 test images. The Nearest Neighbor algorithm compares each test image with every image in the training set. It then predicts the label of the most similar training image. The right-hand side of the figure above shows such results. Notice that an image of the same class is retrieved in only about 3 of the 10 rows. For example, in the 8th row the nearest training image to the horse is a red car, presumably because of the strong black background, so the horse is misclassified as a car.

How exactly do we compare two images? Here each image is a 32 x 32 x 3 block of pixels. The simplest approach is to compare the images pixel by pixel and add up all the differences. In other words, we stretch the two images into vectors $I_1$ and $I_2$ and compute their L1 distance:

$d_{1}\left(I_{1}, I_{2}\right)=\sum_{p}\left|I_{1}^{p}-I_{2}^{p}\right|$

Here the sum runs over all pixels $p$. The whole comparison procedure is illustrated below:

An example with a single color channel: two images are compared with the L1 distance by taking the pixel-wise absolute differences and adding them up into a single number. If the two images are identical, the L1 distance is 0. If they are very different, the L1 distance is large.
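As a worked example of the L1 distance, the sketch below compares two made-up 4 x 4 single-channel images. The pixel values are illustrative only. The arrays are cast to a signed integer type first, because subtracting `uint8` arrays directly would wrap around instead of going negative.

```python
import numpy as np

# Two made-up 4x4 single-channel images (illustrative values only).
test_image = np.array([[56, 32, 10, 18],
                       [90, 23, 128, 133],
                       [24, 26, 178, 200],
                       [2, 0, 255, 220]], dtype=np.uint8)
train_image = np.array([[10, 20, 24, 17],
                        [8, 10, 89, 100],
                        [12, 16, 178, 170],
                        [4, 32, 233, 112]], dtype=np.uint8)

# Stretch both images into vectors and cast to int64 so differences can be negative.
I1 = test_image.reshape(-1).astype(np.int64)
I2 = train_image.reshape(-1).astype(np.int64)

# d1(I1, I2) = sum over pixels p of |I1^p - I2^p|
d1 = np.sum(np.abs(I1 - I2))
print(d1)                       # 456
print(np.sum(np.abs(I1 - I1)))  # 0: identical images have distance 0
```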

There are many other ways to compute the distance between two vectors. Another common choice is the L2 distance, which has the geometric interpretation of the Euclidean distance between the two vectors. It is computed as follows:

$d_{2}\left(I_{1}, I_{2}\right)=\sqrt{\sum_{p}\left(I_{1}^{p}-I_{2}^{p}\right)^{2}}$

The difference between the two metrics is interesting. The L2 distance is much less forgiving than the L1 distance when two vectors differ. Because each difference is squared, the L2 distance prefers many moderate disagreements to one large one.
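The following numeric check illustrates this claim with two hypothetical difference patterns that have the same total L1 size. One vector disagrees with the reference by 4 in a single coordinate. The other disagrees by 1 in each of four coordinates.

```python
import numpy as np

def l1(a, b):
    return np.sum(np.abs(a - b))

def l2(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

ref = np.zeros(4)
one_big = np.array([4.0, 0.0, 0.0, 0.0])      # one large disagreement
many_medium = np.array([1.0, 1.0, 1.0, 1.0])  # several moderate disagreements

print(l1(ref, one_big), l1(ref, many_medium))  # 4.0 4.0 -> L1 sees no difference
print(l2(ref, one_big), l2(ref, many_medium))  # 4.0 2.0 -> L2 penalizes the big one more
```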

Implementing the nearest neighbor algorithm on the CIFAR-10 dataset:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import os
import pickle
import numpy as np

def load_CIFAR_batch(filename):
    """
    CIFAR-10 is stored in several batch files; this loads a single batch.

    @param filename: path of the CIFAR batch file
    @return: X, Y: the data and labels of the batch
    """

    with open(filename,'rb') as f:
        datadict=pickle.load(f,encoding='bytes')

        X=datadict[b'data']
        Y=datadict[b'labels']

        X=X.reshape(10000, 3, 32, 32).transpose(0,2,3,1).astype("float")
        Y=np.array(Y)

        return X, Y


def load_CIFAR10(ROOT):
    """
    Load the entire CIFAR-10 dataset.

    @param ROOT: root directory of the dataset
    @return: X_train, Y_train: training data and labels
             X_test, Y_test: test data and labels
    """

    xs=[]
    ys=[]

    for b in range(1,6):
        f=os.path.join(ROOT, "data_batch_%d" % (b, ))
        X, Y=load_CIFAR_batch(f)
        xs.append(X)
        ys.append(Y)

    X_train=np.concatenate(xs)
    Y_train=np.concatenate(ys)

    del X, Y

    X_test, Y_test=load_CIFAR_batch(os.path.join(ROOT, "test_batch"))

    return X_train, Y_train, X_test, Y_test

# load the training and test sets
X_train, Y_train, X_test, Y_test = load_CIFAR10('data/cifar10/')
# flatten each 32x32x3 image into a single 3072-dimensional row
Xtr_rows = X_train.reshape(X_train.shape[0], 32 * 32 * 3) # Xtr_rows : 50000 x 3072
Xte_rows = X_test.reshape(X_test.shape[0], 32 * 32 * 3) # Xte_rows : 10000 x 3072

class NearestNeighbor:
  def __init__(self):
    pass

  def train(self, X, y):
    """
    For nearest neighbor, "training" just means storing all the training data.
    """
    # the nearest neighbor classifier simply remembers all the training data
    self.Xtr = X
    self.ytr = y

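  def predict(self, X):
    """
    Prediction: for each test image, scan all training images, compute the
    distances, and return the label of the closest training image.
    """
    num_test = X.shape[0]
    # make sure the output type matches the input label type
    Ypred = np.zeros(num_test, dtype = self.ytr.dtype)

    # loop over all test rows
    for i in range(num_test):
      # compute the L1 distances to all training images
      distances = np.sum(np.abs(self.Xtr - X[i,:]), axis = 1)
      min_index = np.argmin(distances) # index of the nearest training image
      Ypred[i] = self.ytr[min_index] # predict the label of the nearest example

    return Ypred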

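nn = NearestNeighbor() # create a Nearest Neighbor classifier object
nn.train(Xtr_rows, Y_train) # "train" the classifier, i.e. store the training set
Yte_predict = nn.predict(Xte_rows) # predict labels for the test images
# compare with the ground truth and print the classification accuracy
print('accuracy: %f' % (np.mean(Yte_predict == Y_test)))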

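If you run this code on CIFAR-10, you will get an accuracy of about 38.6%. That beats the 10% expected from random guessing, but it falls far short of human performance (estimated at about 94%) and of convolutional neural networks, which reach about 95%.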

Note: the code above is written for Python 3.
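Switching the classifier above to the L2 distance only changes the line that computes `distances`. The sketch below uses a self-contained, hypothetical helper, `nn_predict`, which is not part of the original code and takes the metric as a parameter. It is applied to a toy 2-D example in which the two metrics pick different nearest neighbors.

```python
import numpy as np

def nn_predict(Xtr, ytr, Xte, metric="l1"):
    """Hypothetical 1-NN helper; metric is "l1" or "l2"."""
    Ypred = np.zeros(Xte.shape[0], dtype=ytr.dtype)
    for i in range(Xte.shape[0]):
        diff = Xtr - Xte[i, :]
        if metric == "l1":
            distances = np.sum(np.abs(diff), axis=1)
        elif metric == "l2":
            distances = np.sqrt(np.sum(np.square(diff), axis=1))
        else:
            raise ValueError("unknown metric: %s" % metric)
        Ypred[i] = ytr[np.argmin(distances)]  # label of the closest training point
    return Ypred

# Toy data: A = (3, 0) with label 0, B = (2, 2) with label 1, query at the origin.
Xtr = np.array([[3.0, 0.0], [2.0, 2.0]])
ytr = np.array([0, 1])
Xte = np.array([[0.0, 0.0]])

print(nn_predict(Xtr, ytr, Xte, "l1"))  # [0]  (L1: A = 3, B = 4)
print(nn_predict(Xtr, ytr, Xte, "l2"))  # [1]  (L2: A = 3, B is about 2.83)
```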

3 K-nearest neighbor classifier

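You may have noticed that it is strange to use only the label of the single most similar image as the label of the test image. Indeed, a k-Nearest Neighbor classifier can do better. The idea is simple. Instead of finding only the single closest image, we find the k most similar images and let their labels vote on the test image. The label with the most votes becomes the prediction. When k = 1, the k-Nearest Neighbor classifier reduces to the Nearest Neighbor classifier. Intuitively, higher values of k have a smoothing effect that makes the classifier more resistant to outliers.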
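The voting idea can be sketched on 2-D points. In this hypothetical toy dataset, label 0 means "blue" and label 1 means "green". A single green outlier sits inside a cluster of blue points. A query right next to the outlier gets the outlier's label with k = 1, but the majority label with k = 5. `knn_predict_one` is a name introduced for this sketch.

```python
import numpy as np

def knn_predict_one(Xtr, ytr, x, k):
    """Predict the label of a single point x by majority vote among its k nearest neighbors (L2)."""
    distances = np.sqrt(np.sum((Xtr - x) ** 2, axis=1))
    nearest = np.argsort(distances)[:k]  # indices of the k closest training points
    votes = np.bincount(ytr[nearest])    # number of votes per label
    return np.argmax(votes)              # most-voted label (ties go to the smallest label)

# Blue cluster (label 0) around (0, 0), plus one green outlier (label 1) at (0.1, 0.1).
Xtr = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0], [0.1, 0.1]])
ytr = np.array([0, 0, 0, 0, 0, 1])
query = np.array([0.15, 0.15])

print(knn_predict_one(Xtr, ytr, query, k=1))  # 1: the outlier decides on its own
print(knn_predict_one(Xtr, ytr, query, k=5))  # 0: the blue majority outvotes it
```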

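The example figure above shows the difference between a Nearest Neighbor classifier and a 5-Nearest Neighbor classifier. It uses 2-dimensional points divided into 3 classes (red, blue and green). The colored regions show the decision boundaries of the classifier with the L2 distance. The white regions show points that are classified ambiguously, meaning the class votes are tied for at least two classes. With the NN classifier, outlier data points (for example, a green point inside the blue region) create small islands of likely incorrect predictions. The 5-NN classifier smooths over these irregularities, which likely gives better generalization on test data (not shown in the example). Note that the 5-NN image also contains some gray regions. These come from ties in the votes among the nearest neighbors (for example, 2 neighbors are red, 2 are blue and 1 is green).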
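Tie regions come from votes like the 2 red / 2 blue / 1 green example in the paragraph above. The sketch below assumes a hypothetical encoding of red = 0, green = 1 and blue = 2. It counts such a vote and shows that `np.argmax` silently breaks a tie in favor of the smallest label index. The `bincount`/`argmax` voting in the `predict` code of this post behaves the same way.

```python
import numpy as np

# Labels of the 5 nearest neighbors: 2 red, 2 blue, 1 green (red=0, green=1, blue=2).
neighbor_labels = np.array([0, 0, 2, 2, 1])

votes = np.bincount(neighbor_labels, minlength=3)  # votes per class: [2, 1, 2]
tied = np.flatnonzero(votes == votes.max())         # classes sharing the top vote count

print(votes)             # [2 1 2]
print(tied)              # [0 2] -> red and blue are tied
print(np.argmax(votes))  # 0 -> argmax returns the first maximum, so red wins the tie
```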

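The k-NN classifier requires a setting for k, but which value of k works best? We can also choose among different distance functions, such as the L1 and L2 norms, and it is not clear which to pick. There are many other choices we have not even considered (for example, dot products). All of these choices are called hyperparameters. Hyperparameters are very common in the design of machine learning algorithms that learn from data, and it is often not obvious what values or settings to choose.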

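You might suggest trying many different values and keeping whichever works best. That is a good idea, and it is exactly what we will do, but it must be done very carefully. In particular, never use the test set to tune hyperparameters. Whenever you design a machine learning algorithm, treat the test set as a precious resource that is not touched until the very last step. If you tune on the test set and the algorithm looks good, the real danger is that its performance after deployment could be far worse than expected. We call this overfitting to the test set. Put another way, tuning on the test set effectively turns it into a training set. An algorithm trained that way will naturally look good when it is run on the same test set. That estimate is overly optimistic, and real-world performance will be much worse. If you use the test set only once, at the very end, it remains a good proxy for the generalization performance of your classifier.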

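Fortunately, there is a way to tune hyperparameters without touching the test set. The idea is to take part of the training set aside for tuning. We call this part the validation set. Using CIFAR-10 as an example, we could use 49,000 images for training and 1,000 images for validation. The validation set essentially acts as a fake test set for tuning. Here is the code: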

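# assume we already have Xtr_rows, Y_train, Xte_rows, Y_test; Xtr_rows is a 50000 x 3072 matrix
Xval_rows = Xtr_rows[:1000, :] # take the first 1000 rows as the validation set
Yval = Y_train[:1000]
Xtr_rows = Xtr_rows[1000:, :] # keep the remaining 49000 rows for training
Ytr = Y_train[1000:]

# try several values of k
validation_accuracies = []
for k in [1, 3, 5, 7, 10, 20, 50, 100]:

    nn = NearestNeighbor() # create a Nearest Neighbor classifier object
    nn.train(Xtr_rows, Ytr) # "train", i.e. store the reduced training set
    Yval_predict = nn.predict(Xval_rows, k=k) # predict with the k-NN version of predict
    # compare with the validation labels and compute the accuracy
    acc = np.mean(Yval_predict == Yval)
    print('k = %d, accuracy: %f' % (k, acc))

    # record how well this value of k works on the validation set
    validation_accuracies.append((k, acc))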

Because the prediction step in the code above takes a new parameter k, the predict method must be modified as follows:

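def predict(self, X, k=1):
    """
    For each test image, compute the distance to every training image,
    take the k closest ones, and predict the label that wins the majority vote.
    """
    num_test = X.shape[0]
    # make sure the output type matches the input label type
    Ypred = np.zeros(num_test, dtype = self.ytr.dtype)

    # loop over all test rows
    for i in range(num_test):
        # compute the L1 distances to all training images
        distances = np.sum(np.abs(self.Xtr - X[i,:]), axis = 1)
        nearest = np.argsort(distances)[:k] # indices of the k closest training images
        closest_y = self.ytr[nearest] # labels of those k neighbors
        votes = np.bincount(closest_y) # votes per label (labels must be non-negative integers)
        Ypred[i] = np.argmax(votes) # most-voted label; ties go to the smallest label

    return Ypred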

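When the loop finishes, we can plot the results to see which value of k performs best. We then use that value to run the real test set once and evaluate the algorithm.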
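The whole validation procedure can also be sketched end to end without CIFAR-10. This illustration rests on several assumptions. It uses two synthetic Gaussian clusters instead of images and a hypothetical `knn_predict` helper with the L1 distance. It holds out part of the training data as a validation set, chooses k on that set only, and uses the test set exactly once at the end.

```python
import numpy as np

def knn_predict(Xtr, ytr, X, k):
    """Hypothetical k-NN helper (L1 distance, majority vote)."""
    preds = np.zeros(X.shape[0], dtype=ytr.dtype)
    for i in range(X.shape[0]):
        distances = np.sum(np.abs(Xtr - X[i]), axis=1)
        nearest = np.argsort(distances)[:k]
        preds[i] = np.argmax(np.bincount(ytr[nearest]))
    return preds

# Synthetic data: two overlapping 2-D Gaussian clusters (labels 0 and 1), shuffled.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(300, 2)), rng.normal(1.5, 1.0, size=(300, 2))])
y = np.array([0] * 300 + [1] * 300)
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

# Split: 400 training, 100 validation, 100 test points.
Xtr, ytr = X[:400], y[:400]
Xval, yval = X[400:500], y[400:500]
Xte, yte = X[500:], y[500:]

# Tune k on the validation set only.
validation_accuracies = []
for k in [1, 3, 5, 7, 10, 20, 50]:
    acc = np.mean(knn_predict(Xtr, ytr, Xval, k) == yval)
    validation_accuracies.append((k, acc))

best_k, best_val_acc = max(validation_accuracies, key=lambda pair: pair[1])

# Touch the test set once, with the chosen k.
test_acc = np.mean(knn_predict(Xtr, ytr, Xte, best_k) == yte)
print("best k = %d, validation accuracy = %.2f, test accuracy = %.2f"
      % (best_k, best_val_acc, test_acc))
```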

4 Pros and cons of the nearest neighbor classifier

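First, the Nearest Neighbor classifier is easy to understand and simple to implement. Second, it takes no time to train, since training only stores the training data. Testing, however, is expensive, because each test image must be compared with every stored training image. This is clearly a drawback. In practice, we care much more about efficiency at test time than at training time. The convolutional neural networks we will study later sit at the opposite extreme of this trade-off. They take a long time to train, but once trained, they classify new test data very quickly. That mode of operation fits practical needs much better.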
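A back-of-the-envelope count shows why test time dominates. Assuming the brute-force L1 classifier above, every prediction touches every pixel of every training image:

```python
num_train = 50_000  # CIFAR-10 training images
num_test = 10_000   # CIFAR-10 test images
dims = 32 * 32 * 3  # pixel values per flattened image

# Training only stores the data; testing computes one |difference| per pixel
# for every (test image, training image) pair.
test_ops = num_test * num_train * dims

print(dims)      # 3072
print(test_ops)  # 1536000000000, about 1.5 trillion absolute differences
```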

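The Nearest Neighbor classifier may be a good choice in some specific settings, for example when the data is low-dimensional. It is rarely used in practical image classification, however. Images are high-dimensional objects (they typically contain many pixels), and distances in high-dimensional spaces are often counter-intuitive.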

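In the figure above, each of the three images on the right has the same L2 distance to the original image on the left. Clearly, pixel-based similarity is very different from perceptual or semantic similarity.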
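This effect can be reproduced numerically. In the sketch below, random pixels stand in for a real photo, and both transformations were chosen for illustration. Shifting the image by one pixel and uniformly brightening it by a carefully chosen amount give exactly the same L2 distance from the original, even though the two changes look completely different.

```python
import numpy as np

rng = np.random.default_rng(0)
original = rng.uniform(0, 255, size=(32, 32, 3))  # random stand-in for a 32x32 RGB image

def l2(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

# Transformation 1: shift the image one pixel to the right (np.roll wraps the last column around).
shifted = np.roll(original, 1, axis=1)
d_shift = l2(original, shifted)

# Transformation 2: add the same constant c to every value (not clipped to [0, 255] here).
# Its L2 distance is c * sqrt(N), so c = d_shift / sqrt(N) makes the two distances equal.
c = d_shift / np.sqrt(original.size)
brightened = original + c
d_bright = l2(original, brightened)

print(d_shift, d_bright)              # equal up to floating-point rounding
print(np.isclose(d_shift, d_bright))  # True
```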


Source: www.cnblogs.com/Terrypython/p/10971866.html