Introduction to machine learning (1): algorithm categories and loading data sets

1. Classification of machine learning algorithms:

1. Supervised learning:
      Target value is a category (discrete data): classification problem (for example, telling cats from dogs in Figure 1)
      Classification algorithms: k-nearest neighbors, Bayesian classification, decision trees and random forests, logistic regression
      Target value is continuous data (house prices, etc.): regression problem (for example, the house price prediction in Figure 2)
      Regression algorithms: linear regression, ridge regression
2. Unsupervised learning:
      No target value (for example, Figure 3 groups a set of people without any target value)
      Clustering algorithm: k-means

[Figure 1: classification example (cats and dogs)]
[Figure 2: regression example (house price prediction)]
[Figure 3: clustering example (grouping people without a target value)]

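To sum up: supervised learning has a target value (classification when the target is discrete, regression when it is continuous), while unsupervised learning has no target value.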
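
As a rough illustration of the three categories above, the sketch below fits one sklearn estimator of each kind. The use of the iris data, the toy regression data (y = 2x + 1), and all parameter values (n_neighbors=3, n_clusters=3, n_init=10, random_state=0) are assumptions made for this example and are not taken from the original post.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Supervised, discrete target: classification with k-nearest neighbors
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
class_pred = clf.predict(X[:5])

# Supervised, continuous target: regression on toy data
# (y = 2x + 1 is an assumption chosen for illustration)
x_reg = np.arange(10, dtype=float).reshape(-1, 1)
y_reg = 2.0 * x_reg.ravel() + 1.0
reg = LinearRegression().fit(x_reg, y_reg)
reg_pred = reg.predict(np.array([[10.0]]))

# Unsupervised, no target: k-means clustering sees only the features
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_labels = km.labels_

print('Predicted classes:', class_pred)
print('Predicted value at x=10:', reg_pred)
print('Cluster labels of the first 5 samples:', cluster_labels[:5])
```

Note that the two supervised estimators are fitted with both features and targets, while k-means is fitted with the features only.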

2. Using sklearn data sets

Common frameworks:
machine learning framework: sklearn
deep learning frameworks: TensorFlow, PyTorch, Caffe2, Theano, Chainer

Commonly used sources of public data sets: sklearn, Kaggle, UCI

1. Loading a data set from the sklearn library
    Use datasets.load_*() or datasets.fetch_*(data_home=None) to load a data set.
    Return value: the return type is datasets.base.Bunch (a dictionary-like object; sklearn.utils.Bunch in recent versions), which includes five key-value pairs:
        data: feature data array
        target: label array
        DESCR: data set description
        feature_names: feature names (not available for the news, handwritten digits, and regression data sets)
        target_names: label names

from sklearn.datasets import load_iris


def datasets_demo():
    # Load the iris data set bundled with sklearn
    iris = load_iris()
    print('Iris data set:\n', iris)
    print('Data set description:\n', iris['DESCR'])
    print('Feature names:\n', iris.feature_names)
    print('Feature values:\n', iris.data, iris.data.shape)


if __name__ == '__main__':
    datasets_demo()
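
The demo above prints only the features. As a small complementary sketch, the labels and the remaining Bunch keys can be read the same way, either as dictionary keys or as attributes. The variable names are chosen for illustration; return_X_y=True is an existing load_iris option that returns just the (data, target) pair.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Bunch supports both key access and attribute access
labels = iris['target']
label_names = iris.target_names

print('Keys:', list(iris.keys()))
print('Label array shape:', labels.shape)
print('Label names:', list(label_names))

# Map the first sample's numeric label to its name
first_name = label_names[labels[0]]
print('First sample is:', first_name)

# Skip the Bunch and get only the features and labels
X, y = load_iris(return_X_y=True)
print('X shape:', X.shape, 'y shape:', y.shape)
```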

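2. Splitting the data set
Use sklearn.model_selection.train_test_split(*arrays, **options)
    arrays: the feature array and the label array to split
    test_size: proportion of the samples placed in the test set
    random_state: random seed; the same seed gives the same split
    return value: training features, test features, training labels, test labels (in that order)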

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split


def datasets_demo():
    # Load the data set
    iris = load_iris()
    print('Iris data set:\n', iris)
    print('Data set description:\n', iris['DESCR'])
    print('Feature names:\n', iris.feature_names)
    print('Feature values:\n', iris.data, iris.data.shape)
    # Split the data set: 80% training, 20% test
    x_train, x_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.2, random_state=22)
    print('Training set features:\n', x_train, x_train.shape)


if __name__ == '__main__':
    datasets_demo()
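
To make the effect of test_size and random_state concrete, the sketch below checks the shapes of all four returned arrays and confirms that repeating the call with the same random_state reproduces the same split. The stratify=iris.target argument at the end is an optional extra that the original demo does not use; it keeps the class proportions equal in the training and test sets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# 150 samples with test_size=0.2 gives 120 training and 30 test samples
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=22)
print('Train:', x_train.shape, y_train.shape)
print('Test:', x_test.shape, y_test.shape)

# Same random_state, same split
x_train2, x_test2, y_train2, y_test2 = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=22)
same_split = (np.array_equal(x_test, x_test2)
              and np.array_equal(y_test, y_test2))
print('Reproducible:', same_split)

# Optional: stratify keeps the class ratio the same in both sets
_, _, _, y_test_strat = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=22,
    stratify=iris.target)
strat_counts = np.bincount(y_test_strat)
print('Test samples per class (stratified):', strat_counts)
```

Without stratify, the number of test samples per class depends on the random shuffle, which can matter for small or imbalanced data sets.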


Origin blog.csdn.net/qq_45234219/article/details/114636009