This article records Python code that uses the pandas module to count values in Excel table data: the number of times a given value or string occurs in the whole table, and the number of times it occurs in a given row or column. The main function used is .value_counts().
1. Function introduction
.value_counts() counts how many times each distinct value occurs.
value_counts( values, sort=True, ascending=False, normalize=False, bins=None, dropna=True)
Parameter | Effect |
---|---|
sort=True | Whether to sort by count (default: sort) |
ascending=False | Whether to sort in ascending order (default: descending) |
normalize=False | Whether to return relative frequencies instead of counts (default: False) |
bins=None | Group numeric values into intervals instead of counting each distinct value (default: None, no binning) |
dropna=True | Whether to exclude missing values (NaN) from the counts (default: exclude) |
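The effect of each parameter can be seen on a small Series. The following is a minimal sketch; the data is made up for illustration and is not taken from the Excel table used in this article.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), including one missing value
s = pd.Series([221, 222, 221, 220, 221, np.nan, 222])

counts = s.value_counts()                    # sorted by count, descending; NaN dropped
ascending = s.value_counts(ascending=True)   # smallest count first
shares = s.value_counts(normalize=True)      # proportions instead of counts
with_nan = s.value_counts(dropna=False)      # NaN is counted as its own value
binned = s.value_counts(bins=2)              # counts per interval, not per value

print(counts)
print(shares)
print(binned)
```

Because of the NaN the Series has a float dtype, so the index labels are 221.0, 222.0 and 220.0.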
2. Using .value_counts() to count a column of data
When the result is long, print() abbreviates the middle rows with "...", so the displayed data is truncated. Writing the result to a CSV file lets you see every row. Note that mode='a' makes to_csv append to an existing file instead of overwriting it; all rows are written with or without it.
df = df["RA_C"].value_counts()
print(df)
df.to_csv("df.csv",mode='a')
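The truncation can be reproduced on made-up data. This is only a sketch: the column contents are hypothetical, and an in-memory buffer stands in for the file df.csv so that nothing is written to disk.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Excel data: a column with 100 distinct values
df = pd.DataFrame({"RA_C": [i % 100 for i in range(500)]})
counts = df["RA_C"].value_counts()

print(counts)              # long output is abbreviated with "..."
print(counts.to_string())  # prints every row

buffer = io.StringIO()     # in-memory stand-in for a file such as "df.csv"
counts.to_csv(buffer)      # every row is written, whatever the mode
n_lines = len(buffer.getvalue().splitlines())  # header line + one line per value
```

If the goal is only to read the full result on screen, to_string() avoids the intermediate file.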
3. Counting repeated values across multiple columns (all columns hold the same attribute)
Count all the data in an Excel table, not just one column.
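pieces = []
for col in df.columns:
    # Count the values of one column and label the result with the column name
    tmp_series = df[col].value_counts()
    tmp_series.name = col
    pieces.append(tmp_series)
# Align the per-column counts on the value index; missing combinations become NaN
df_value_counts = pd.concat(pieces, axis=1)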
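To make the shape of df_value_counts concrete, here is a small self-contained sketch of the same loop. The three-column table is invented for illustration.

```python
import pandas as pd

# Hypothetical table whose columns all hold the same kind of value
df = pd.DataFrame({
    "A": [221, 222, 221],
    "B": [221, 220, 222],
    "C": [222, 221, 221],
})

# One value_counts() Series per column, renamed to the column it came from
pieces = [df[col].value_counts().rename(col) for col in df.columns]

# Outer-join the Series on their index: one row per value, one column per source column
df_value_counts = pd.concat(pieces, axis=1)
print(df_value_counts)
```

A value that never occurs in a column (here 220 in columns A and C) shows up as NaN in that column.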
The result is shown in the figure:
Each column of the result gives the number of times a value occurs in that column. To get the number of occurrences in the whole table, sum across the columns.
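# Fill NaN with 0
df = df_value_counts.fillna(0)
# Add a new column: the sum of each row
df["Total"] = df.sum(axis=1)
# Keep only the index and the Total column
df = df[['Total']]
# Show only the first 5 rows of df
print(df.head())
# Write the result to a CSV file (mode='a' appends if the file already exists)
df.to_csv("result_3.csv", mode='a')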
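As a cross-check, the whole-table totals can also be computed by flattening the table into one long Series and counting once. The sketch below compares both routes on an invented table; the flattening route is an alternative for illustration, not the method used above.

```python
import pandas as pd

# Hypothetical table whose columns all hold the same kind of value
df = pd.DataFrame({
    "A": [221, 222, 221],
    "B": [221, 220, 222],
    "C": [222, 221, 221],
})

# Route 1: per-column counts, then sum across the columns
per_column = pd.concat(
    [df[c].value_counts().rename(c) for c in df.columns], axis=1
)
total_by_sum = per_column.fillna(0).sum(axis=1).astype(int)

# Route 2: flatten all cells into one Series and count once
total_by_flatten = pd.Series(df.to_numpy().ravel()).value_counts()

print(total_by_sum.sort_index())
print(total_by_flatten.sort_index())
```

Both routes should give the same totals; the second one skips the intermediate per-column table.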
The results are as follows. The value 221 appears 132825 times in the Excel table, 222 appears 94924 times, 220 appears 9276 times, and so on. After sorting the output in Excel, you can read off the number of occurrences of each value.
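4. Counting repeated values across multiple columns (columns hold different attributes)
Input: an Excel table whose columns hold different attributes.
Output: the value counts of each column, written side by side into a new Excel file.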
import os

import pandas as pd
import xlwt

os.chdir(r'C:\Users\Administrator\Desktop')
df = pd.read_excel('数据.xls')


def fun(df):
    # Map each column name to the value counts of that column
    dic = {}
    for i in df.columns:
        dic[i] = df[i].value_counts()
    return dic


dd = fun(df)

# Write to Excel
f = xlwt.Workbook()  # create the workbook
sheet1 = f.add_sheet(u'sheet1', cell_overwrite_ok=True)  # create the sheet
pattern = xlwt.Pattern()
pattern.pattern = xlwt.Pattern.SOLID_PATTERN
pattern.pattern_fore_colour = 5
style = xlwt.XFStyle()
style.pattern = pattern
al = xlwt.Alignment()
al.horz = 0x02  # center horizontally
al.vert = 0x01  # center vertically
style.alignment = al

# Get the keys of the dictionary (the column names)
list_ = [k for k in dd]
k = 0
l = 0
for s in range(len(dd)):
    l = k + 1
    # Write the header: the column name, merged over two rows and two columns
    sheet1.write_merge(0, 1, k, l, list_[s], style)
    # Write the contents
    j = 2
    for v, h in zip(dd[list_[s]], dd[list_[s]].index):
        sheet1.write(j, k, h)  # value, written downwards
        sheet1.write(j, l, v)  # count, written downwards
        j = j + 1
    # Leave one empty column before the next block
    k = k + 3
f.save('统计数据.xls')  # save the file
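If the styled side-by-side layout is not required, the same per-column counts can be collected into one long table with pandas alone. The following is a sketch under that assumption; the sample data, the helper name count_columns and the output column names are all hypothetical.

```python
import pandas as pd

# Hypothetical table whose columns hold different attributes
df = pd.DataFrame({
    "city": ["Paris", "Rome", "Paris", "Oslo"],
    "grade": ["A", "B", "A", "A"],
})


def count_columns(frame):
    """Return {column name: value_counts() Series} for every column."""
    return {col: frame[col].value_counts() for col in frame.columns}


dd = count_columns(df)

# One row per (column, value) pair instead of side-by-side blocks
rows = [
    {"column": col, "value": value, "count": int(n)}
    for col, counts in dd.items()
    for value, n in counts.items()
]
long_table = pd.DataFrame(rows, columns=["column", "value", "count"])
print(long_table)
```

A table in this shape can be written with to_csv and filtered or pivoted in Excel afterwards.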