Mark a mistake (1)

Today, when I used sklearn's KMeans algorithm to fit my training set, I ran into an error.

Here is my code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn import linear_model
import csv
import os

file = pd.read_excel(io="D://influence.xlsx",encoding='UTF-8',sheet_name= 'movie_metadata')

print(file.head())

# Compute summary statistics for every feature that affects a movie (count, unique, top, freq, mean, min, and so on)
file_data = file.describe(include = 'all').T
print(file_data)

# Read from the original dataset the movie names (name1) and a DataFrame made of box-office gross and IMDB score (data1)
name1 = pd.read_excel(io="D://influence.xlsx",encoding='UTF-8',sheet_name= 'movie_metadata',usecols=[0,19])
data1 = pd.read_excel(io="D://influence.xlsx",encoding='UTF-8',sheet_name= 'movie_metadata',usecols=[19,25])

# Data cleaning: drop the rows that contain NaN so they do not affect the clustering below
data2 = data1.dropna(axis=0)
name2 = name1.dropna(axis=0)
print(data2)
print(name2)
name = name1['电影名']

# Create the KMeans object and set its initial parameters
km = KMeans(n_clusters=3)
label = km.fit_predict(data2)

print(label)
category = np.sum(km.cluster_centers_,axis=1)

print(category)
print(sum(label))

MoviesCluster = [[],[],[]]
for i in range(len(name)):
    MoviesCluster(label[i].append(name[i]))
for i in range(len(MoviesCluster)):
    print("Category:%.2f" % category[i])
    print(MoviesCluster[i])

Error:


I want a result like this:
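
The error screenshot is not available here, so the following is only a guess at the cause. The line `MoviesCluster(label[i].append(name[i]))` calls the list as if it were a function and calls `.append` on an integer label. Assuming the intent was to index the outer list by the label and append the name to that inner list, a minimal sketch looks like this. The helper name `group_names_by_label` and the toy values are hypothetical and are not part of the original code.

```python
def group_names_by_label(names, labels, n_clusters):
    """Collect each name into the list that belongs to its cluster label."""
    clusters = [[] for _ in range(n_clusters)]
    for name, lab in zip(names, labels):
        # Index the outer list with the label, then append to that inner list.
        clusters[int(lab)].append(name)
    return clusters


# Toy stand-ins (hypothetical values) for the movie names and for the array
# returned by km.fit_predict(data2); both must describe the same rows.
toy_names = ["Movie A", "Movie B", "Movie C", "Movie D"]
toy_labels = [0, 2, 0, 1]

movies_cluster = group_names_by_label(toy_names, toy_labels, n_clusters=3)
for i, members in enumerate(movies_cluster):
    print("Cluster %d: %s" % (i, members))
```

With the real data, the same helper would be called with the movie names and `label`. This only works if the names come from exactly the rows that were clustered. In the listing above, `name` is taken from `name1`, which has not had its NaN rows dropped, so it may be longer than `label`. Selecting the name column and the two numeric columns in one DataFrame, then calling `dropna` once, is one way to keep them aligned.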





Reposted from blog.csdn.net/weixin_40896352/article/details/80863303