python excel file read, do pivot (cell merging)

The demand described first, as shown a

             Figure I

To count the number of each user id, nicknames and their corresponding word (row 1 1 count data), and its corresponding teacher process (ps: there is at least one teacher process), and the number and proportion of each teacher process,

Finally, results obtained under II in FIG.

              Figure II

"""
数据处理
1. 读取数据中的 用户id,用户昵称,字,老师id,老师昵称
2. 统计用户id及其字的个数(多少行),以及老师id的数量
3. 然后统计该用户下各老师处理的数量及占比
"""

import xlrd,xlwt
from collections import Counter
workbook=xlrd.open_workbook('1月正式课情况.xlsx') # 打开文件
# sheet_name=workbook.sheet_names()  #所有sheet的名字
sheets=workbook.sheets()#返回可迭代的sheets对象
sh = sheets[0] # 此处0,表示取第一个sheet,0为索引,也可以通过sheet表名取sheet,方法是: workbook.sheet_by_name("sheet名")

# 动态获取 用户id 等所在的列索引
row_1 = sh.row_values(0) # 获取第1行内容
user_id_index = row_1.index("用户id")
user_nick_index = row_1.index("用户昵称")
teacher_id_index = row_1.index("老师id")
teacher_nick_index = row_1.index("老师昵称")
# 取用户id集合
col_user_id = sh.col_values(user_id_index)[1:]
# 统计用户id列表的值及个数
user_id_and_count = dict(Counter(col_user_id))
user_ids = list(set(col_user_id))
rx = sh.nrows # 表格总行数

# 新建一个文件,准备写入数据
wb = xlwt.Workbook()
new_sheet = wb.add_sheet('用户-老师技术表') # sheet名,不填默认sheet1
row = ['用户id', '用户昵称', '老师计数', '计数项:字', '老师id','老师昵称','老师处理量','占比']

# 生成第一行
for i in range(0, len(row)):
    new_sheet.write(0, i, row[i])

cursor = 1

# 获取用户id和用户昵称,以及老师id及昵称对应关系
user_id_nicks = {}
teacher_nick_dict = {}
for i in range(1,rx):
    id = int(sh.row_values(i)[user_id_index])
    nick = sh.row_values(i)[user_nick_index]
    teacher_id = sh.row_values(i)[teacher_id_index]
    teacher_nick = sh.row_values(i)[teacher_nick_index]
    if user_id_nicks.__contains__(id): # 判断字典是否有该键,没有则添加
        user_id_nicks[id] = nick
    if teacher_nick_dict.__contains__(id):
        teacher_nick_dict[teacher_id] = teacher_nick


def write_data(sheet, item, cursor):
    item_head_4 = item[0]
    item_tail_3 = item[1]
    len = item_head_4[2]  # 老师计数
    for i, v in enumerate(item_head_4): # 合并单元格就需要一列列写入
        sheet.write_merge(cursor,cursor+len-1,i, i, v)
    flag = 0
    for m in item_tail_3:
        for i, v in enumerate(m):
            sheet.write(cursor+flag, i+4, v)
        flag += 1
    cursor = cursor + len
    return cursor


for user_id in user_ids:
    # 遍历所有行
    teacher_id_list = []
    for i in range(rx):
        if user_id == sh.row_values(i)[user_id_index]:
            # 取出该user_id下的所有老师id及数量统计
            teacher_id_list.append(sh.row_values(i)[teacher_id_index])
    user_teacher_id_and_count = dict(Counter(teacher_id_list))
    user_teacher_count = len(user_teacher_id_and_count)
    char_num = user_id_and_count[user_id]
    # '用户id', '用户昵称','老师计数', '处理量计数',  '老师id', '老师昵称''老师处理量','占比'
    item_head_4 = [int(user_id),user_id_nicks.get(int(user_id)),user_teacher_count,char_num]
    item_tail_3 = []
    for k, v in user_teacher_id_and_count.items():
        item_tail_3.append([int(k), teacher_nick_dict.get(int(k)), v, "%.2f%%" % (v/char_num * 10000/100)])
    item = [item_head_4,item_tail_3]
    cursor = write_data(new_sheet,item,cursor)

# 要保存的文件名
wb.save('用户-老师技术表.xls')

 

最后,想额外说的一点是上图二中合并单元格得操作, 即上图中write_data函数中使用的write_merge函数,因为对部分列的部分行单元格进行了合并,所有写入数据时,不能和之前一样标准地一行一列写入,而是先写入合并了行单元格的列数据,然后再一行行写入未合并的行的列数据;

write_merge(x, x + m, y, y + n, string, sytle)
x表示行,y表示列,m表示跨行个数,n表示跨列个数,string表示要写入的单元格内容,style表示单元格样式。其中,x,y,m,n,都是以0开始计算的。

如:sheet.write_merge(21,21,0,1,u'合计',set_style('Times New Roman',220,True))

即在22行合并第1,2列,合并后的单元格内容为"合计",并设置了style。

本文参考资料: https://www.jb51.net/article/60510.htm

Guess you like

Origin www.cnblogs.com/jason-Gan/p/12311430.html