Python machine learning - data preprocessing & algorithm preliminary


insert image description here

data preprocessing

1. Get data

insert image description here
insert image description here

from sklearn.datasets import load_iris
li=load_iris()
print("获取特征值")
print(li.data)
print("目标值",li.target)#分类数据集

Large Datasets for Classification

insert image description here

from sklearn.datasets import load_iris,fetch_20newsgroups
news=fetch_20newsgroups(subset="all")
print(news.data)
print(news.target)
from sklearn.datasets import load_boston
lb=load_boston()
print(lb.data)
print(lb.target)#回归数据集

2. Handling missing values

insert image description here
insert image description here

3. Divide the dataset

Training set and test set division: 70% 30%; 80% 20%; 75% 25%
insert image description here

from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()##将数据集赋值给cancer变量
cancer_data = cancer['data']
cancer_target = cancer['target']
from sklearn.model_selection import train_test_split
cancer_data_train, cancer_data_test,cancer_target_train, cancer_target_test = \
train_test_split(cancer_data, cancer_target,test_size=0.2, random_state=42) # test_size表示测试集在总数中的占比

4. Data preprocessing and PCA dimensionality reduction

insert image description here

#离差标准化
from sklearn.preprocessing import MinMaxScaler
Scaler = MinMaxScaler().fit(cancer_data_train) ##生成规则
##将规则应用于训练集
cancer_trainScaler = Scaler.transform(cancer_data_train) 
##将规则应用于测试集
cancer_testScaler = Scaler.transform(cancer_data_test)

insert image description here

from sklearn.decomposition import PCA
pca_model = PCA(n_components=10).fit(cancer_trainScaler) ##生成规则
cancer_trainPca = pca_model.transform(cancer_trainScaler) ##将规则应用于训练集
cancer_testPca = pca_model.transform(cancer_testScaler) ##将规则应用于测试集

5. Algorithm Implementation: Estimator

insert image description here
insert image description here

Guess you like

Origin blog.csdn.net/Pireley/article/details/131362333