Counting the number of movies under each genre label (pandas tutorial notes, following the Dark Horse tutorial)

Task: in the raw data, each movie's Genre field lists one or more genres. Count how many movies fall under each genre, for example the number of movies labeled Action.
Approach:
First build a DataFrame filled with zeros, with one row per movie and one column per genre. Then, for each movie, change the 0 to 1 in the columns of the genres it belongs to.

import pandas as pd 
from matplotlib import pyplot as plt 
import numpy as np
file_path="./IMDB-Movie-Data.csv"

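df = pd.read_csv(file_path)
print(df["Genre"].head(3))
# Build the list of genres
temp_list = df["Genre"].str.split(",").tolist()  # nested list of the form [[...], [...], ...]

genre_list = list(set([i for j in temp_list for i in j]))
# A set holds only unique elements, so converting the flattened list to a set removes duplicates
# and leaves each genre exactly once. A plain list has no unique() method, but np.unique()
# or pandas Series.unique() can also be used to deduplicate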

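# Build a DataFrame filled with zeros
zeros_df = pd.DataFrame(np.zeros((df.shape[0], len(genre_list))), columns=genre_list)
# df.shape[0] is the number of rows in df; columns sets the column labels (here genre_list);
# index, which sets the row labels, is left at its default of 0, 1, 2, 3, ...

# print(zeros_df)
# Set a 1 wherever a movie belongs to a genre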
for i in range(df.shape[0]):
    # e.g. zeros_df.loc[0, ["Sci-Fi", "Musical"]] = 1
    zeros_df.loc[i, temp_list[i]] = 1
    # loc selects by label: i is the row label (0, 1, 2, ...) and temp_list[i] is a list
    # of column labels, so this locates the matching cells in row i and sets them to 1
    # print(zeros_df.head(3))
# Count the movies in each genre. axis=0 sums down the rows, giving one total per column (genre);
# axis=1 would sum across the columns, giving one total per row (movie)
genre_count = zeros_df.sum(axis=0)
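
The steps above can be tried without the CSV file. The following is a minimal sketch on a small made-up table: the four sample rows, the name toy_df, and the helper count_genres are all hypothetical and only stand in for the real data. It repeats the zeros-matrix approach and, as a cross-check, compares the result with pandas' built-in Series.str.get_dummies, which is a different technique that builds the same 0/1 table in one call.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows standing in for IMDB-Movie-Data.csv
toy_df = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Genre": ["Action,Sci-Fi", "Action,Drama", "Drama", "Comedy,Drama"],
})


def count_genres(frame):
    """Count movies per genre using a zeros matrix with one column per genre."""
    split_genres = frame["Genre"].str.split(",").tolist()
    # Sorted so the column order is stable (a bare set has no guaranteed order)
    genres = sorted({g for row in split_genres for g in row})
    flags = pd.DataFrame(
        np.zeros((frame.shape[0], len(genres))),
        columns=genres,
        index=frame.index,
    )
    # Mark each movie's genres with 1, row by row
    for label, row_genres in zip(frame.index, split_genres):
        flags.loc[label, row_genres] = 1
    # Sum down the rows: one total per genre column
    return flags.sum(axis=0)


genre_count = count_genres(toy_df)
print(genre_count.sort_values(ascending=False))

# Cross-check with the built-in one-hot helper (not the method used in this post)
builtin_count = toy_df["Genre"].str.get_dummies(sep=",").sum(axis=0)
print(builtin_count)
```

On a large table the explicit loop is the slow part, so the built-in helper may be worth considering once the idea behind the zeros matrix is clear.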

When each record belongs to exactly one category, a simpler method works: add a category column, then use the groupby function.
Example: counting 119 emergency calls by cause category (three kinds: fire, emergency, traffic accident).
The first approach is the original method: construct an array of zeros.
The second approach is the new method: add a new column, then use groupby.
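
A minimal sketch of the groupby approach, assuming each call record has a text field whose prefix names the category. The sample rows, the column names title and cate, and the "Category: detail" layout are all assumptions made for illustration, not the actual dataset.

```python
import pandas as pd

# Hypothetical call records; the "title" column and its "Category: detail" layout are assumptions
calls = pd.DataFrame({
    "title": [
        "Fire: GAS ODOR",
        "EMS: BACK PAIN",
        "Traffic: VEHICLE ACCIDENT",
        "EMS: CARDIAC EMERGENCY",
        "Traffic: DISABLED VEHICLE",
        "EMS: FALL VICTIM",
    ],
})

# Add a category column: the text before the first ":" in each title
calls["cate"] = calls["title"].str.split(":").str[0].str.strip()

# Each row has exactly one category, so groupby + count gives the totals directly
cate_count = calls.groupby("cate")["title"].count()
print(cate_count)
```

Because every row carries a single label, no zeros matrix is needed: the new column is the grouping key.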

Origin blog.csdn.net/Alden_Wei/article/details/105213528