Starfruit Python Machine Learning 5-Data Visualization 1: Scatterplot

My CSDN blog column: https://blog.csdn.net/yty_7

Github address: https://github.com/yot777/

 

Visualize data using Matplotlib

Matplotlib can create many kinds of charts and is supported by a rich ecosystem of Python tools. To learn it in depth, see the following tutorial:

https://blog.csdn.net/zw0Pi8G5C1x/article/details/79186024

If you don't plan to study in depth, you can use the following diagram to briefly understand some important terms of Matplotlib:

Building on the previous section, we now demonstrate how to use Matplotlib to visualize data in Python.

Step 1: Import the Matplotlib library with the following import statement, which gives the pyplot module the conventional short name plt.

import matplotlib.pyplot as plt

If Python reports that the matplotlib module is not installed, install it with pip (pip install matplotlib).
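As a minimal sketch of checking this before running the rest of the code, the standard library can tell you whether a module is importable. The helper name module_available is hypothetical and only used for illustration here.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be imported."""
    # find_spec returns None when no module of that name is found
    return importlib.util.find_spec(name) is not None


if module_available("matplotlib"):
    print("matplotlib is installed")
else:
    print("matplotlib is missing; install it with: pip install matplotlib")
```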

Step 2: Decide which kind of chart to draw. The data used in the previous section is:

1    2    1
4    5    0
2    1    1
4    2    1
6    1    0
3    3    1
5    2    0
4    5    0
2    7    0
2    6    1

The first two columns are features and the last column is the label, for a total of 10 samples. A scatter plot suits this data: each sample becomes one point, and points with different labels are drawn in different colors.

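scatter() is the Matplotlib function that draws a scatter plot. Its prototype, as given for the Matplotlib version in use when this article was written (newer releases have removed the verts parameter), is as follows: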

matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, verts=None, edgecolors=None, *, data=None, **kwargs)

For now we use only the simplest form of the scatter() function: scatter(x, y)

The parameters x and y hold the x-axis coordinates and y-axis coordinates of the data points to be drawn.

Note that x and y are each a sequence of length n (a tuple, list, or NumPy array all work) and must have the same length. For example:

import matplotlib.pyplot as plt

# Draw 3 points at (1,1), (2,4), and (3,9)
plt.scatter((1,2,3), (1,4,9))
plt.show()

The results are shown in the figure:

For a detailed explanation of the scatter() function, see:

https://blog.csdn.net/m0_37393514/article/details/81298503
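To give a feel for a few of the optional parameters in the prototype, here is a small sketch that redraws the same three points with an explicit marker size, color, marker shape, and transparency. The particular values (80, "tab:red", "^", 0.7) are arbitrary choices for illustration, not recommendations from the original article.

```python
import matplotlib.pyplot as plt

# Same three points as before: (1,1), (2,4), (3,9)
points = plt.scatter(
    [1, 2, 3], [1, 4, 9],
    s=80,          # marker area, in points squared
    c="tab:red",   # one color for every point
    marker="^",    # triangle markers instead of the default circles
    alpha=0.7,     # slight transparency
)
plt.xlabel("x")
plt.ylabel("y")
```

Calling plt.show() afterwards displays the figure, exactly as in the basic example.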

Step 3: Split the original data by label: put the samples with label 1 into one group and the samples with label 0 into another.

Step 4: Pass the x-axis coordinates and y-axis coordinates of the samples labeled 1 to the scatter() function.

Step 5: Pass the x-axis coordinates and y-axis coordinates of the samples labeled 0 to the scatter() function.

Step 6: Display the finished scatter plot.
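The grouping step can be sketched on its own, without any plotting. This is only an illustration with the ten rows typed in directly. The helper split_by_label is a hypothetical name, and it uses a dictionary so that it would also work with more than two labels, which differs from the two explicit lists used in the complete code later in this section.

```python
from collections import defaultdict

# The ten samples from the previous section: (feature 1, feature 2, label)
rows = [
    (1, 2, 1), (4, 5, 0), (2, 1, 1), (4, 2, 1), (6, 1, 0),
    (3, 3, 1), (5, 2, 0), (4, 5, 0), (2, 7, 0), (2, 6, 1),
]


def split_by_label(rows):
    """Group the (x, y) pairs of each sample under its label."""
    groups = defaultdict(list)
    for x, y, label in rows:
        groups[label].append((x, y))
    return dict(groups)


groups = split_by_label(rows)
for label, pts in sorted(groups.items()):
    # These two lists are what would be passed to scatter(x, y) for this label
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    print(label, xs, ys)
```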

The code follows. Note one small difference from the code in the previous section.

The code in the previous section:

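        # featureMat is really a 2-D list, so elements are appended slightly differently than for a 1-D list
        featureMat.append([lineArr[0], lineArr[1]])
        # Append the last element of the current line lineArr to the label vector labelMat
        labelMat.append(lineArr[-1])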

Code in this section:

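        # featureMat is really a 2-D list, so elements are appended slightly differently than for a 1-D list
        featureMat.append([float(lineArr[0]), float(lineArr[1])])
        # Append the last element of the current line lineArr to the label vector labelMat
        labelMat.append(float(lineArr[-1]))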

Note: In this section every element of the feature matrix featureMat and the label vector labelMat is explicitly converted to float. split() returns strings, while scatter() needs numeric x-axis and y-axis coordinates: coordinates must be numbers, not strings.
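A short sketch of why the conversion matters, using one line of the data file written out as a string literal (assuming the file is Tab-separated, as the code in this section expects):

```python
# One line as it would be read from the data file
line = "4\t5\t0\n"

fields = line.strip().split("\t")
print(fields)     # ['4', '5', '0']  (strings, not numbers)

numbers = [float(f) for f in fields]
print(numbers)    # [4.0, 5.0, 0.0]

# Strings compare character by character, so they do not behave like numbers
print("10" < "9")    # True
print(10.0 < 9.0)    # False
```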

The complete code is as follows:

import matplotlib.pyplot as plt
import numpy as np

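def loadDataSet(fileName):
    # Create an empty feature matrix
    featureMat = []
    # Create an empty label vector
    labelMat = []
    # Open the file; the with statement closes it automatically when the block ends
    with open(fileName) as fr:
        # Read the file line by line
        for line in fr:
            # Skip blank lines (for example a trailing empty line at the end of the file)
            if not line.strip():
                continue
            # Strip the newline, then split the line into elements using Tab as the separator
            lineArr = line.strip().split('\t')
            # Append elements 0 and 1 of the current line to the feature matrix featureMat, as floats
            # featureMat is really a 2-D list, so elements are appended slightly differently than for a 1-D list
            featureMat.append([float(lineArr[0]), float(lineArr[1])])
            # Append the last element of the current line to the label vector labelMat, as a float
            labelMat.append(float(lineArr[-1]))
    # Return the feature matrix featureMat and the label vector labelMat
    return featureMat, labelMat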

def showDataSet(featureMat, labelMat):
    # List of samples whose label is 1
    data_one = []
    # List of samples whose label is 0
    data_zero = []
    # Walk through the feature matrix featureMat; i is the index of the current row
    # The two feature columns of featureMat are exactly the x and y coordinates of the scatter points
    for i in range(len(featureMat)):
        # If the label labelMat[i] of the current row i is 1
        if labelMat[i] == 1:
            # Add the current row featureMat[i] to the data_one list
            data_one.append(featureMat[i])
        # If the label labelMat[i] of the current row i is 0
        elif labelMat[i] == 0:
            # Add the current row featureMat[i] to the data_zero list
            data_zero.append(featureMat[i])
    # Convert the data_one list to the NumPy array data_one_np
    data_one_np = np.array(data_one)
    # Convert the data_zero list to the NumPy array data_zero_np
    data_zero_np = np.array(data_zero)
    # Plot the samples labeled 1: x is column 0 of data_one_np, y is column 1
    plt.scatter(data_one_np[:,0], data_one_np[:,1])
    # Plot the samples labeled 0: x is column 0 of data_zero_np, y is column 1
    plt.scatter(data_zero_np[:,0], data_zero_np[:,1])
    # Display the finished scatter plot
    plt.show()


if __name__ == '__main__':
    # Call loadDataSet() to read the features and labels
    X, y = loadDataSet('test.txt')
    # Call showDataSet() to draw the scatter plot
    showDataSet(X, y)

The results are shown in the figure:

The scatter plot shows at a glance:

The blue dots (label 1) are concentrated mostly in the lower left of the figure, while the orange dots (label 0) are concentrated mostly in the upper right.
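One way to back up this visual impression with numbers is to compare the mean position (centroid) of each group. The sketch below is not part of the original code: it types the ten rows in directly and uses NumPy boolean masks instead of the explicit loop in showDataSet().

```python
import numpy as np

# The ten samples: feature 1, feature 2, label
data = np.array([
    [1, 2, 1], [4, 5, 0], [2, 1, 1], [4, 2, 1], [6, 1, 0],
    [3, 3, 1], [5, 2, 0], [4, 5, 0], [2, 7, 0], [2, 6, 1],
], dtype=float)

features = data[:, :2]   # first two columns
labels = data[:, 2]      # last column

# A boolean mask such as (labels == 1) selects only the matching rows
centroid_one = features[labels == 1].mean(axis=0)
centroid_zero = features[labels == 0].mean(axis=0)

print("label 1 centroid:", centroid_one)    # about (2.4, 2.8)
print("label 0 centroid:", centroid_zero)   # about (4.2, 4.0)
```

The label 1 centroid lies below and to the left of the label 0 centroid, which agrees with what the plot suggests.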

Now suppose we add two more points, A and B, to the scatter plot. What should their labels be? This is exactly the problem that KNN, the first machine learning algorithm in this series, sets out to solve.
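As a rough sketch of what such a plot could look like, the code below draws the two labeled groups and then two extra points. The coordinates chosen for A and B are made up purely for illustration, and the black "x" marker and text labels are just one possible way to make the new points stand out.

```python
import matplotlib.pyplot as plt

# Samples labeled 1 and 0, taken from the data table in this section
one_x, one_y = [1, 2, 4, 3, 2], [2, 1, 2, 3, 6]
zero_x, zero_y = [4, 6, 5, 4, 2], [5, 1, 2, 5, 7]

# Hypothetical unlabeled points; these coordinates are assumptions
new_points = {"A": (2.5, 2.0), "B": (4.5, 4.5)}

plt.scatter(one_x, one_y, label="label 1")
plt.scatter(zero_x, zero_y, label="label 0")
for name, (px, py) in new_points.items():
    # Draw each new point with a distinct marker and write its name next to it
    plt.scatter(px, py, c="black", marker="x")
    plt.annotate(name, (px, py), textcoords="offset points", xytext=(5, 5))
plt.legend()
```

Calling plt.show() afterwards displays the figure.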

Summary

Matplotlib can create many kinds of charts. The drawing steps are:

Step 1: Import the Matplotlib library with import matplotlib.pyplot as plt

Step 2: Decide which kind of chart to draw. In this section it is a scatter plot.

For now, only the simplest form of the scatter() function is used: scatter(x, y)

The parameters x and y hold the x-axis coordinates and y-axis coordinates of the data points to be drawn.

Note that x and y are each a sequence of length n (a tuple, list, or NumPy array all work) and must have the same length.

Step 3: Split the original data by label: put the samples with label 1 into one group and the samples with label 0 into another.

Step 4: Pass the x-axis coordinates and y-axis coordinates of the samples labeled 1 to the scatter() function.

Step 5: Pass the x-axis coordinates and y-axis coordinates of the samples labeled 0 to the scatter() function.

Step 6: Display the finished scatter plot.

Note: Every element of the feature matrix featureMat and the label vector labelMat must be explicitly converted to a floating-point number.

 

 


If you found this chapter helpful, you are welcome to follow, comment, and like! Follows and Stars on Github are welcome too!


Origin blog.csdn.net/yty_7/article/details/105164521