Python: counting values with pandas .value_counts(), for one column and for a whole DataFrame

This article records Python code that uses the pandas module to count the values in Excel table data: the number of times a given value or string occurs in the entire table, and the number of times it occurs in a particular column. The main function used is .value_counts().



1. Function introduction

.value_counts() counts how many times each distinct value occurs in a Series.

Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

Parameter          Effect
sort=True          Whether to sort the result by count (default: sorted)
ascending=False    Whether to sort in ascending order (default: descending)
normalize=False    If True, return relative frequencies instead of counts (default: False)
bins=None          Group numeric values into intervals rather than counting each value (default: no binning)
dropna=True        Whether to exclude missing values (NaN) from the counts (default: exclude)
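
The effect of these parameters is easiest to see on a small Series. The following is a minimal sketch on made-up sample data (the Series `s` and its values are assumptions for illustration, not data from this article):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the article
s = pd.Series([3, 1, 3, 2, 3, 1, np.nan])

# Defaults: NaN is dropped and counts are sorted in descending order
default_counts = s.value_counts()

# normalize=True returns relative frequencies instead of counts
freqs = s.value_counts(normalize=True)

# dropna=False keeps NaN as its own entry
with_nan = s.value_counts(dropna=False)

# bins=2 groups the numeric values into 2 equal-width intervals
binned = s.dropna().value_counts(bins=2)

print(default_counts)
print(freqs)
print(with_nan)
print(binned)
```

With normalize=True the returned values sum to 1, which is convenient when the share of each value matters more than the raw count.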

2. Counting one column of data with .value_counts()

Sometimes the printed result is not shown in full: pandas leaves out the middle rows of a long result, which is called "truncated data". Writing the result to a csv file with to_csv makes every row visible. Note that mode='a' appends to an existing file instead of overwriting it, so running the script twice writes the counts into df.csv a second time.

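# df is the DataFrame loaded from the Excel file; keep it intact
# and store the counts under a separate name
counts = df["RA_C"].value_counts()

print(counts)

counts.to_csv("df.csv", mode='a')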
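
If the goal is only to read the full result on screen, writing a file is not strictly required. The sketch below shows two ways to print every row; the 100-value Series is a made-up stand-in for a long result and is not data from this article:

```python
import pandas as pd

# Hypothetical data: 100 distinct values, more rows than pandas prints by default
counts = pd.Series(range(100)).value_counts()

# Option 1: render every row as a single string
full_text = counts.to_string()
print(full_text)

# Option 2: lift the row limit only inside this block
with pd.option_context("display.max_rows", None):
    print(counts)
```

The option_context form restores the previous display setting when the block ends, so other output in the same session keeps its usual truncation.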

3. Counting repeated values across multiple columns (all columns hold the same kind of values)

Count all data in an Excel table, not just one column.

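pieces = []

for col in df.columns:
    # Count the values of one column and label the result with the column name
    tmp_series = df[col].value_counts()
    tmp_series.name = col
    pieces.append(tmp_series)

# Combine the per-column counts once, after the loop
df_value_counts = pd.concat(pieces, axis=1)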

The result is a table with one column of counts per original column, indexed by the distinct values (figure not preserved). To get the number of occurrences in the whole table, sum across each row.


# Fill NaN with 0
df = df_value_counts.fillna(0)

# Add a new column: the sum of each row
df["Total"] = df.sum(axis=1)

# Keep only the index and the Total column
df = df[['Total']]

# Show only the first 5 rows of df
print(df.head())

# Write the result to a csv file
df.to_csv("result_3.csv", mode='a')
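
The per-column counting followed by a row-wise sum can be cross-checked against a shorter route: flatten all cells into one Series and call value_counts once. This is a sketch on a made-up two-column table (the names `A`, `B`, `total_by_sum` and `total_by_flatten` are assumptions for illustration); it is an alternative to the steps above, not the method used in this article.

```python
import pandas as pd

# Hypothetical small table; both columns hold the same kind of values
df = pd.DataFrame({"A": [221, 222, 221, 220],
                   "B": [221, 221, 222, 222]})

# Route described above: per-column counts, then a row-wise sum
per_column = pd.concat(
    [df[col].value_counts().rename(col) for col in df.columns], axis=1
)
total_by_sum = per_column.fillna(0).sum(axis=1).astype(int)

# Alternative: flatten every cell into one Series and count once
total_by_flatten = pd.Series(df.to_numpy().ravel()).value_counts()

print(total_by_sum.sort_index())
print(total_by_flatten.sort_index())
```

Both routes give the same totals here. The flattened version is shorter, while the per-column version also keeps the breakdown by column.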

The results are as follows: the value 221 appears in the Excel table 132825 times, 222 appears 94924 times, 220 appears 9276 times, and so on. Sorting the output in Excel then shows how often each value occurs.



4. Counting repeated values across multiple columns (columns hold different attributes)

Input: the Excel table to be counted (figure not preserved).

Output: the count result (figure not preserved).

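import os

import pandas as pd

# Work from the folder that contains the Excel file
os.chdir(r'C:\Users\Administrator\Desktop')
df = pd.read_excel('数据.xls')  # the file name means "data.xls"


def fun(df):
    """Return a dict that maps each column name to that column's value counts."""
    dic = {}
    for i in df.columns:
        dic[i] = df[i].value_counts()
    return dic


dd = fun(df)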

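# Write to Excel
import xlwt
f = xlwt.Workbook()  # create the workbook
sheet1 = f.add_sheet(u'sheet1', cell_overwrite_ok=True)  # create the sheet
pattern = xlwt.Pattern()
pattern.pattern = xlwt.Pattern.SOLID_PATTERN
pattern.pattern_fore_colour = 5

style = xlwt.XFStyle()
style.pattern = pattern

al = xlwt.Alignment()
al.horz = 0x02  # center horizontally
al.vert = 0x01  # center vertically
style.alignment = al
# Get the keys of the dict (the column names)
list_ = list(dd)
k = 0
l = 0
for s in range(len(dd)):
    l = k + 1
    # Write the header: the column name, merged over rows 0-1 and columns k..l
    sheet1.write_merge(0, 1, k, l, list_[s], style)
    # Write the contents
    j = 2
    for v, h in zip(dd[list_[s]], dd[list_[s]].index):
        sheet1.write(j, k, h)  # the value, written down the column
        sheet1.write(j, l, v)  # its count, written down the next column
        j = j + 1
    # Leave one empty column before the next block
    k = k + 3
f.save('统计数据.xls')  # save the file; the name means "statistics.xls"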
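
The same side-by-side layout (one value/count pair of columns per attribute) can also be assembled as a DataFrame before anything is written to disk. This is a minimal sketch on made-up data: the columns `city` and `grade` and the names `blocks` and `layout` are assumptions for illustration, and this is an alternative to the xlwt code above rather than a replacement for it.

```python
import pandas as pd

# Hypothetical table whose columns hold different attributes
df = pd.DataFrame({"city": ["A", "B", "A", "C"],
                   "grade": [1, 2, 2, 2]})

blocks = []
for col in df.columns:
    counts = df[col].value_counts()
    # Two columns per attribute: the distinct values and their counts
    block = pd.DataFrame({"value": counts.index, "count": counts.to_numpy()})
    blocks.append(block)

# keys= puts the original column name above each value/count pair;
# shorter blocks are padded with NaN
layout = pd.concat(blocks, axis=1, keys=list(df.columns))

print(layout)
```

From here, layout.to_csv with a file name of your choice writes the table in one call, and the two-level header keeps track of which pair belongs to which attribute.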

Origin blog.csdn.net/qq_35591253/article/details/115639032