1. Categories of machine learning algorithms:
1. Supervised learning:
Target value is a category (discrete data) → classification problem (for example, telling cats from dogs, Figure 1)
Classification algorithms: k-nearest neighbors, Bayesian classification, decision trees and random forests, logistic regression
Target value is continuous data (house prices, etc.) → regression problem (for example, house price prediction, Figure 2)
Regression algorithms: linear regression, ridge regression
2. Unsupervised learning:
No target value → unsupervised learning (for example, Figure 3: dividing a group of people into groups when no target value is given)
Clustering algorithm: k-means
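Figure 1: Classification (cats vs. dogs)
Figure 2: Regression (house price prediction)
Figure 3: Unsupervised learning (grouping people without target values)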
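To sum up: a discrete target value means a classification problem, a continuous target value means a regression problem, and no target value means unsupervised learning (e.g. clustering).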
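As a rough sketch of the three task types, the following uses one sklearn estimator for each: a decision tree for classification on the discrete iris labels, linear regression on a small continuous target, and k-means, which only ever sees the features. The regression data (the line y = 3x + 2) is a made-up illustration, and the model choices and parameters (random_state=0, n_clusters=3, n_init=10) are assumptions, not the only options.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

iris = load_iris()
X, y = iris.data, iris.target  # y holds discrete class labels 0, 1, 2

# Classification: the target values are categories
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
class_pred = clf.predict(X[:5])  # the first five iris samples are all class 0

# Regression: the target values are continuous
# (hypothetical, noise-free data on the line y = 3x + 2)
x_reg = np.linspace(0, 10, 50).reshape(-1, 1)
y_reg = 3 * x_reg.ravel() + 2
reg = LinearRegression().fit(x_reg, y_reg)

# Unsupervised learning: no target values, k-means groups the features only
km = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = km.fit_predict(X)  # y is never passed in

print(class_pred, reg.coef_, reg.intercept_, np.unique(cluster_labels))
```

The cluster numbers that k-means assigns are arbitrary ids, so they need not match the iris class labels even when the groups roughly coincide.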
2. Using sklearn datasets
Common frameworks:
Machine learning framework: sklearn (scikit-learn)
Deep learning frameworks: TensorFlow, PyTorch, Caffe2, Theano, Chainer
Commonly used public datasets: sklearn (built-in datasets), Kaggle, UCI Machine Learning Repository
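1. Loading datasets from the sklearn library
Use datasets.load_*() to load the small datasets bundled with sklearn, and datasets.fetch_*(data_home=None) to download larger datasets from the internet (data_home is the download directory; None means the default, ~/scikit_learn_data).
Return value:
The return type is a Bunch, a dictionary subclass whose keys can also be read as attributes (iris['data'] and iris.data are the same thing). Older sklearn versions exposed it as datasets.base.Bunch, newer ones as sklearn.utils.Bunch. It includes the following five key-value pairs: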
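data: feature data array
target: target (label) array
DESCR: dataset description
feature_names: feature names (not provided by every dataset, e.g. the 20 newsgroups news text data)
target_names: target (label) names (absent for regression datasets such as diabetes, whose targets are continuous)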
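A minimal sketch of reading these keys, assuming a reasonably recent sklearn version. It shows dictionary and attribute access on the same Bunch, and compares iris with the regression dataset load_diabetes (used here only as a contrast), which has no target_names. The return_X_y=True flag, which returns just the (data, target) pair instead of a Bunch, is an extra option not covered above.

```python
from sklearn.datasets import load_iris, load_diabetes
from sklearn.utils import Bunch

iris = load_iris()          # classification dataset bundled with sklearn
diabetes = load_diabetes()  # regression dataset bundled with sklearn

# A Bunch is a dict subclass, so keys also work as attributes
print(isinstance(iris, dict), isinstance(iris, Bunch))  # True True
print(iris["data"] is iris.data)                        # True

# The core fields
print(iris.data.shape, iris.target.shape)  # (150, 4) (150,)
print(iris.feature_names)
print(iris.target_names)

# The regression dataset has continuous targets and no target_names key
print("target_names" in iris, "target_names" in diabetes)  # True False

# return_X_y=True skips the Bunch and returns (data, target) directly
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)  # (150, 4) (150,)
```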
from sklearn.datasets import load_iris

def datasets_demo():
    # Load the iris dataset (returned as a Bunch)
    iris = load_iris()
    print('Iris dataset:\n', iris)
    print('Dataset description:\n', iris['DESCR'])
    print('Feature names:\n', iris.feature_names)
    print('Feature values:\n', iris.data, iris.data.shape)

if __name__ == '__main__':
    datasets_demo()
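2. Splitting the dataset
Use sklearn.model_selection.train_test_split(*arrays, **options). It returns a training part and a test part for every array passed in, so passing the features and the targets gives x_train, x_test, y_train, y_test, in that order. test_size sets the fraction of samples used for testing (0.25 by default when neither test_size nor train_size is given), and random_state fixes the random seed so the split is reproducible.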
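The following sketch shows the effect of test_size and random_state on the 150-sample iris dataset. It also uses the stratify option, which is not mentioned above: stratify=y keeps the class proportions the same in the training and test parts.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# test_size=0.2 -> 20% of the 150 samples (30) go to the test set
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=22)
print(x_train.shape, x_test.shape)  # (120, 4) (30, 4)

# Reusing the same random_state reproduces exactly the same split
again = train_test_split(X, y, test_size=0.2, random_state=22)
print(np.array_equal(x_test, again[1]))  # True

# stratify=y keeps the three classes (50 samples each) balanced in the test set
_, _, _, y_test_strat = train_test_split(X, y, test_size=0.2, random_state=22, stratify=y)
print(np.bincount(y_test_strat))  # [10 10 10]
```

Without stratify, the class counts in a small test set can drift noticeably from the overall proportions, which makes evaluation results less stable.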
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def datasets_demo():
    # Load the dataset
    iris = load_iris()
    print('Iris dataset:\n', iris)
    print('Dataset description:\n', iris['DESCR'])
    print('Feature names:\n', iris.feature_names)
    print('Feature values:\n', iris.data, iris.data.shape)
    # Split the dataset: 80% training, 20% test; random_state fixes the shuffle
    x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=22)
    print('Training set features:\n', x_train, x_train.shape)

if __name__ == '__main__':
    datasets_demo()
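After the split, the training part is used to fit a model and the test part to evaluate it. A minimal sketch with the k-nearest neighbor classifier listed in section 1; n_neighbors=3 is an arbitrary illustrative choice, and the exact accuracy depends on the split, so it is printed rather than stated here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=22
)

# Fit the model on the training split only
knn = KNeighborsClassifier(n_neighbors=3)  # n_neighbors=3 is an assumption
knn.fit(x_train, y_train)

# Predict and measure accuracy on the held-out test split
y_pred = knn.predict(x_test)
accuracy = knn.score(x_test, y_test)  # fraction of correct test predictions
print('Predictions:\n', y_pred)
print('Accuracy on the test set:', accuracy)
```

Keeping the test samples out of fit is the point of the split: scoring on the training data would overstate how well the model generalizes.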