Machine learning in practice: one-hot encoding in code

1. The experiment

The task is to predict a car's selling price from a CSV file with the columns Car Model, Mileage, Sell Price($), and Age(yrs). Car Model is a text column, so it has to be one-hot encoded before it can be used as a feature. Two one-hot encoding approaches are shown below; both use LinearRegression as the model.

Dataset: Autodata
Password: 7izi

2. Training and prediction

2.1. get_dummies method

import pandas as pd

df = pd.read_csv('carprices.csv')  
dummies = pd.get_dummies(df['Car Model'])  # one-hot encode the Car Model column with get_dummies
dummies

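To make the shape of the get_dummies output concrete, here is a minimal sketch on a few made-up rows. The `toy` DataFrame and its values are assumptions for illustration only; they are not the contents of carprices.csv.

```python
import pandas as pd

# Hypothetical stand-in rows; the real carprices.csv is not reproduced here.
toy = pd.DataFrame({
    'Car Model': ['BMW X5', 'Audi A5', 'Mercedez Benz C class', 'Audi A5'],
    'Mileage': [69000, 59000, 52000, 72000],
})

# get_dummies creates one indicator column per distinct value,
# with the columns in sorted order of the category names.
toy_dummies = pd.get_dummies(toy['Car Model'], dtype=int)

print(toy_dummies.columns.tolist())
print(toy_dummies.sum(axis=1).tolist())  # every row contains exactly one 1
```

Each row has a 1 in the column of its own car model and 0 everywhere else, which is what "one-hot" means.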

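merged = pd.concat([df,dummies],axis='columns')  # append the dummy columns to the original data
# Drop the original Car Model column and one dummy column (Mercedez Benz C class).
# Dropping one dummy column avoids the dummy variable trap: the full set of dummy columns always sums to 1.
final = merged.drop(['Car Model','Mercedez Benz C class'],axis='columns')

X = final.drop('Sell Price($)',axis='columns')  # training features
y = final['Sell Price($)']    # training target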
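The dummy variable trap can be checked numerically. The sketch below is an illustration on made-up car models, not part of the original experiment: it compares the rank of the design matrix with and without one dummy column dropped. The `with_intercept` helper is a hypothetical name introduced here. Note that `drop_first=True` drops the first category in sorted order, whereas the code above drops Mercedez Benz C class; any single category works as the baseline.

```python
import numpy as np
import pandas as pd

# Hypothetical car models standing in for the 'Car Model' column.
models = pd.Series(['Audi A5', 'BMW X5', 'Mercedez Benz C class', 'BMW X5'])

full = pd.get_dummies(models, dtype=float)                      # all 3 dummy columns
reduced = pd.get_dummies(models, drop_first=True, dtype=float)  # first category dropped

def with_intercept(d):
    # Prepend the constant column that LinearRegression fits as its intercept.
    return np.column_stack([np.ones(len(d)), d.to_numpy()])

# Full dummies plus intercept: 4 columns, but the dummies sum to the constant
# column, so one column is redundant.
rank_full = np.linalg.matrix_rank(with_intercept(full))
# Reduced dummies plus intercept: 3 columns, none redundant.
rank_reduced = np.linalg.matrix_rank(with_intercept(reduced))

print(full.shape[1] + 1, rank_full)
print(reduced.shape[1] + 1, rank_reduced)
```

When the rank is lower than the number of columns, the regression coefficients are not uniquely determined, which is the problem that dropping one dummy column removes.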
 
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X,y)    # fit LinearRegression to the training data
model.score(X,y)  # R^2 score on the training data

Prediction:
[Figure: prediction example; image not available]
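As a minimal sketch of the prediction step for the get_dummies approach, the following repeats the same preparation on a handful of made-up rows and predicts the price of one new car. The `toy` data and the `new_car` values are assumptions for illustration, so the predicted number has no real meaning.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical rows using the article's column names; not the real carprices.csv.
toy = pd.DataFrame({
    'Car Model': ['BMW X5', 'BMW X5', 'Audi A5', 'Audi A5',
                  'Mercedez Benz C class', 'Mercedez Benz C class'],
    'Mileage': [69000, 35000, 59000, 52000, 72000, 46000],
    'Sell Price($)': [18000, 34000, 29400, 32000, 19300, 31000],
    'Age(yrs)': [6, 3, 5, 5, 6, 4],
})

dummies = pd.get_dummies(toy['Car Model'], dtype=int)
final = pd.concat([toy, dummies], axis='columns').drop(
    ['Car Model', 'Mercedez Benz C class'], axis='columns')
X = final.drop('Sell Price($)', axis='columns')
y = final['Sell Price($)']
model = LinearRegression().fit(X, y)

# A new car must be encoded with the same columns, in the same order, as X.
# Mercedez Benz C class is the dropped baseline, so both dummy columns are 0.
new_car = pd.DataFrame([[45000, 4, 0, 0]], columns=X.columns)
predicted = model.predict(new_car)
print(X.columns.tolist())
print(predicted)
```

The key point is the column order printed first: Mileage, Age(yrs), then the remaining dummy columns. A Mercedez Benz C class is represented by zeros in both dummy columns.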

2.2. OneHotEncoder method

from sklearn.preprocessing import LabelEncoder   # import LabelEncoder

le = LabelEncoder()  # create the encoder
dfle = df.copy()     # work on a copy so the original df is left unchanged
dfle['Car Model'] = le.fit_transform(dfle['Car Model']) # convert the Car Model column to integer codes
dfle

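The integer codes that LabelEncoder assigns can be inspected through its `classes_` attribute. The sketch below uses a short made-up list of car models rather than the real column.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical values standing in for the 'Car Model' column.
models = ['BMW X5', 'Audi A5', 'Mercedez Benz C class', 'Audi A5']

le = LabelEncoder()
codes = le.fit_transform(models)

# classes_ holds the distinct labels in sorted order; a label's index is its code.
mapping = {label: code for code, label in enumerate(le.classes_)}

print(codes.tolist())
print(mapping)
print(le.inverse_transform([2]).tolist())
```

These codes are only identifiers. Fed directly into a linear model they would imply an ordering and spacing between car models that does not exist, which is why the codes are one-hot encoded in the next step.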

X = dfle[['Car Model','Mileage','Age(yrs)']].values
y = dfle['Sell Price($)'].values

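from sklearn.preprocessing import OneHotEncoder  # import OneHotEncoder
from sklearn.compose import ColumnTransformer

# The categorical_features argument was removed from OneHotEncoder in scikit-learn 0.22;
# ColumnTransformer selects the column to encode instead.
ct = ColumnTransformer([('car_model', OneHotEncoder(), [0])], remainder='passthrough', sparse_threshold=0)
X = ct.fit_transform(X)   # one-hot encode column 0; Mileage and Age(yrs) pass through unchanged
X = X[:,1:]   # drop the first dummy column (the category coded 0 by LabelEncoder) to avoid the dummy variable trap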
X
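As an alternative sketch, recent scikit-learn versions let OneHotEncoder work on the text column directly and drop the first category itself, so neither LabelEncoder nor the manual column slice is needed. This assumes a scikit-learn version that supports the `drop` parameter (0.21 or later); the `toy` rows are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows; column names follow the article.
toy = pd.DataFrame({
    'Car Model': ['BMW X5', 'Audi A5', 'Mercedez Benz C class', 'Audi A5'],
    'Mileage': [69000, 59000, 52000, 72000],
    'Age(yrs)': [6, 5, 5, 6],
})

ct = ColumnTransformer(
    [('car_model', OneHotEncoder(drop='first'), ['Car Model'])],
    remainder='passthrough',   # keep Mileage and Age(yrs) unchanged
    sparse_threshold=0,        # always return a dense array
)
X_toy = ct.fit_transform(toy)

# Output columns: the two remaining dummy columns first, then Mileage and Age(yrs).
print(X_toy.shape)
print(X_toy[0].tolist())
```

With `drop='first'` the baseline is the first category in sorted order, so the remaining dummy columns here correspond to BMW X5 and Mercedez Benz C class.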

model.fit(X,y)
model.score(X,y)

Prediction:
[Figure: prediction example; image not available]
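For the OneHotEncoder approach, one way to keep training and prediction consistent is to wrap the encoder and the model in a scikit-learn Pipeline, so a new car is encoded automatically at predict time. This is a sketch under that assumption, not the original post's prediction code; the `toy` rows and the `new_car` values are made up.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical rows using the article's column names; not the real carprices.csv.
toy = pd.DataFrame({
    'Car Model': ['BMW X5', 'BMW X5', 'Audi A5', 'Audi A5',
                  'Mercedez Benz C class', 'Mercedez Benz C class'],
    'Mileage': [69000, 35000, 59000, 52000, 72000, 46000],
    'Sell Price($)': [18000, 34000, 29400, 32000, 19300, 31000],
    'Age(yrs)': [6, 3, 5, 5, 6, 4],
})
features = ['Car Model', 'Mileage', 'Age(yrs)']

pipe = Pipeline([
    # One-hot encode Car Model (dropping one category), pass the rest through.
    ('encode', ColumnTransformer(
        [('car_model', OneHotEncoder(drop='first'), ['Car Model'])],
        remainder='passthrough', sparse_threshold=0)),
    ('regress', LinearRegression()),
])
pipe.fit(toy[features], toy['Sell Price($)'])

# The new car is given in its raw form; the pipeline applies the fitted encoding.
new_car = pd.DataFrame([['Mercedez Benz C class', 45000, 4]], columns=features)
predicted = pipe.predict(new_car)
print(predicted)
```

Because the encoder is fitted once inside the pipeline, there is no need to remember which dummy column was dropped when building the input for a prediction.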


Origin blog.csdn.net/weixin_37763870/article/details/105360462