Python in practice: customizing spreadsheets freely

Many developers say that since picking up Python and pandas they hardly use Excel any more, because processing and visualizing tables with these tools is so quick.

Let me give you a few examples.

1. Remove duplicates and blank lines

We pass the current data straight to dict.fromkeys to build a dictionary. Every value defaults to None, which does not matter because we never use the values. We then convert the result back to a list with list().

In [135]:
for row in rows4:
    print(row)
('name', 'address')
('tom li', 'beijing')
('tom li', 'beijing')
('',)
('mary wang', 'shandong')
('mary wang', 'shandong')
('',)
('de8ug', 'guangzhou')
In [148]:
dict.fromkeys(rows4)
Out[148]:
{('name', 'address'): None,
 ('tom li', 'beijing'): None,
 ('',): None,
 ('mary wang', 'shandong'): None,
 ('de8ug', 'guangzhou'): None}
In [137]:
list(dict.fromkeys(rows4))
Out[137]:
[('name', 'address'),
 ('tom li', 'beijing'),
 ('',),
 ('mary wang', 'shandong'),
 ('de8ug', 'guangzhou')]

At this point the duplicates are gone. Note that this relies on dict preserving insertion order, which recent versions of Python 3 do. If you are still on Python 2 or on Python 3.5 or earlier, upgrading is recommended.

Next, handle the empty rows. Notice that ('',) is a tuple holding a single empty string, so such rows can be filtered out in a loop. Here we use a generator expression, a piece of Python syntactic sugar, to do it in one line: it keeps only the rows whose first field is longer than one character, and list() turns the result back into a list. Be aware that this test would also drop a row whose first field is a single character.
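For interpreters where dict ordering cannot be relied on, the same result can be reached with an explicit loop. The following is only a sketch; the helper name dedupe_keep_order is made up for illustration and is not part of the original code.

```python
def dedupe_keep_order(rows):
    """Return rows without duplicates, keeping the first occurrence of each."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:  # rows must be hashable, e.g. tuples
            seen.add(row)
            result.append(row)
    return result


rows = [('name', 'address'), ('tom li', 'beijing'), ('tom li', 'beijing'), ('',)]
print(dedupe_keep_order(rows))
```

On a Python version where dict keeps insertion order, this gives the same list as list(dict.fromkeys(rows)).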

In [179]:
list(x for x in dict.fromkeys(rows4) if len(x[0])>1)
Out[179]:
[('name', 'address'),
 ('tom li', 'beijing'),
 ('mary wang', 'shandong'),
 ('de8ug', 'guangzhou')]
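The len(x[0])>1 test works for this sample, but it would drop a row whose first field is a single character and would raise TypeError if the first field were None, which is how openpyxl reports an empty cell. A more defensive filter might look like the sketch below; is_blank is a hypothetical helper name, and the sample rows are invented for illustration.

```python
def is_blank(row):
    """True when every cell in the row is None or whitespace-only text."""
    return all(
        cell is None or (isinstance(cell, str) and cell.strip() == '')
        for cell in row
    )


rows = [('name', 'address'), ('',), (None, None), ('a', 'x'), ('a', 'x')]
kept = [row for row in dict.fromkeys(rows) if not is_blank(row)]
print(kept)
```

Here the single-character row ('a', 'x') survives, while both ('',) and (None, None) are dropped.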

With that worked out, we can put the result straight into a function that removes duplicate rows and empty rows.

Note that we are working with whole rows here, so there is no need to loop over columns. Also, processing the sheet in place would shift or delete rows while we iterate. So we first collect every row with old_rows = [x for x in sheet.values]. Note that sheet.values yields the cell values directly rather than cell objects. old_rows is a list, so the duplicate and empty-row removal worked out earlier applies to it directly.

Next, delete all the rows with sheet.delete_rows(1, sheet.max_row): the first argument is the row to start from (row 1), and the second is the number of rows to delete, here the maximum row number. Finally, loop over the new rows and append each one to the current sheet.

In [189]:
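def handle_duplicate(wb, sheetname):
    """
    Remove duplicate rows and empty rows.
    Read every row, clear the sheet, then write the processed rows back.
    """
    print(f'Processing sheet: {sheetname}'.center(40, '-'))
    sheet = wb[sheetname]
    old_rows = [x for x in sheet.values]
    print('before:', old_rows)
    # keep rows with at least one non-empty cell (openpyxl reports empty cells as None)
    new_rows = [x for x in dict.fromkeys(old_rows) if any(v not in (None, '') for v in x)]
    print('after ->', new_rows)

    # delete all rows
    sheet.delete_rows(1, sheet.max_row)
    # write the new data
    for row in new_rows:
        sheet.append(row)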

Run a test and check the result. Always remember to test! If something fails, read the error message, review the code, and retest until the bugs are gone.

In [190]:
wb = load_data()
handle_duplicate(wb, '重复行')  # sheet name means "duplicate rows"
save_as(wb)

2. Remove spaces

Removing spaces also relies on string methods, so here is a quick look. To remove the spaces inside a string, split it with split(), which splits on whitespace by default, and then join the pieces with ''.join. Note that the string in front of join is empty. The strip method, which removes whitespace at both ends, is not needed here, because split() with no arguments already discards leading and trailing whitespace and returns only the non-empty pieces.

In [192]:
a="a b c   "
In [194]:
a.strip()
Out[194]:
'a b c'
In [195]:
a.split()
Out[195]:
['a', 'b', 'c']
In [196]:
''.join(a.split())
Out[196]:
'abc'
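Cells in a real sheet are not always text, so it may help to wrap the split/join idiom in a small helper that leaves other values alone. This is a sketch only; remove_spaces is a hypothetical name. The last line shows why split() is called without an argument: split(' ') keeps empty pieces.

```python
def remove_spaces(value):
    """Remove all whitespace from strings; return any other value unchanged."""
    if isinstance(value, str):
        return ''.join(value.split())
    return value


print(remove_spaces(' a b\tc \n'))  # abc
print(remove_spaces(None))          # None
print(remove_spaces(3.5))           # 3.5
print(' a  b '.split(' '))          # ['', 'a', '', 'b', '']
```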

With that worked out, write the function and name it handle_blank.

In [197]:
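def handle_blank(wb, sheetname):
    """
    Loop by column; use parameters to select the target.
    """
    print(f'Processing sheet: {sheetname}'.center(40, '-'))
    sheet = wb[sheetname]
    for col in sheet.iter_cols():  # no arguments: loop over all columns
        for cell in col:
            if not isinstance(cell.value, str):  # skip empty cells and non-text values
                continue
            print('before:', cell.value, end='')
            cell.value = ''.join(cell.value.split())
            print(' after ->', cell.value)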
In [198]:
handle_blank(wb, '空格')  # sheet name means "spaces"

3. Modify the date and time format

Sometimes we need to change the format of time-related cells. For this we use Python's datetime module: assemble the format string we need and convert with strftime.

Suppose the dates were stored in a simple month/day form such as 1/11, and we want a full date style with / or - as the separator. For that we use "%x" or "%Y-%m-%d". Note that the letters after % are the officially defined format codes; we only combine them into a format string and pass it to the function.

Here is how the formats look in practice:

In [199]:
import datetime
In [209]:
d=datetime.datetime(2019,1,11)
In [203]:
d.strftime("%x")
Out[203]:
'01/11/19'
In [205]:
d.strftime("%Y-%m-%d")
Out[205]:
'2019-01-11'

With that worked out, we write the function.

First split the simple date on '/' to get m and d, the month and the day, then build a datetime object, day, from them. Note that m, d = cell.value.split('/') yields strings, so convert them with int. Finally, format day for output. After writing the function, remember to test it.

In [218]:
def handle_time(wb, sheetname):
    """
    Loop by column; use parameters to select the target.
    """
    print(f'Processing sheet: {sheetname}'.center(40, '-'))
    sheet = wb[sheetname]
    for col in sheet.iter_cols(max_col=1, min_row=2):  # the time column: first column, starting from the second row
        for cell in col:
            print('before:', cell.value, end='')
            m, d = cell.value.split('/')
            day = datetime.datetime(2019, int(m), int(d))
            cell.value = day.strftime("%Y-%m-%d")
            print(' after ->', cell.value)

In [220]:
wb = load_data()
handle_time(wb, '时间')  # sheet name means "time"
save_as(wb)
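The split-and-convert step inside handle_time can also be tried on plain strings before touching a workbook. The sketch below assumes the year is supplied by the caller instead of being fixed at 2019; month_day_to_iso is a hypothetical helper name.

```python
import datetime


def month_day_to_iso(text, year):
    """Turn 'M/D' text into 'YYYY-MM-DD'; raises ValueError on bad input."""
    m, d = text.split('/')
    return datetime.date(year, int(m), int(d)).strftime('%Y-%m-%d')


print(month_day_to_iso('1/11', 2019))  # 2019-01-11
print(month_day_to_iso('12/3', 2019))  # 2019-12-03

try:
    month_day_to_iso('2/30', 2019)  # there is no 30 February
except ValueError as exc:
    print('rejected:', exc)
```

Because datetime.date validates its arguments, impossible dates are rejected rather than written to the sheet.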

4. Fix numbers and symbols

Next come numbers and symbols. Suppose the prices have many decimal places, and we want to keep two of them and add the RMB symbol as a prefix. This calls for another round of exploration.

For the decimals, two things matter: first, the number of places must be fixed, two in our case; second, the extra digits must be rounded away. There are two ways to do this: Decimal or round. The difference is that Decimal's quantize(Decimal("0.00")) pads the result with zeros up to the specified number of places, whereas round drops trailing zeros. The rounding rule used by round is also a little unusual; see the official documentation for details.

Here we use Decimal inside a function to do the job. Remember to test!

In [227]:
from decimal import Decimal
In [240]:
a = 3.1
b=Decimal(a).quantize(Decimal("0.00"))
print(b)
3.10
In [244]:
round(a,2)  # trailing zeros are dropped
Out[244]:
3.1
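The rounding differences are easier to see on a value that sits exactly on a half, such as 2.675. The sketch below is one possible way to format a price, not the function used in this article: format_price is a hypothetical name, and the choice of ROUND_HALF_UP and of going through str() are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_price(value, symbol='¥'):
    """Format a number or numeric string with two decimals and a currency prefix."""
    # Going through str() avoids carrying over the float's binary expansion.
    amount = Decimal(str(value)).quantize(Decimal('0.00'), rounding=ROUND_HALF_UP)
    return symbol + str(amount)


print(format_price(3.1))                         # ¥3.10
print(format_price('2.675'))                     # ¥2.68
print(Decimal(2.675).quantize(Decimal('0.00')))  # 2.67
print(round(2.675, 2))                           # 2.67
```

The float 2.675 is stored as a value slightly below 2.675, which is why both Decimal(2.675) and round(2.675, 2) end up at 2.67.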

In [247]:
def handle_num(wb, sheetname):
    """
    Loop by column; use parameters to select the target.
    """
    print(f'Processing sheet: {sheetname}'.center(40, '-'))
    sheet = wb[sheetname]
    for col in sheet.iter_cols(min_col=3, max_col=3, min_row=2):  # the price column: third column, starting from the second row
        for cell in col:
            print('before:', cell.value, end='')
            # alternative using round: cell.value = round(float(cell.value), 3)
            cell.value = '¥' + str(Decimal(cell.value).quantize(Decimal("0.00")))
            print(' after ->', cell.value)
In [249]:
wb = load_data()
handle_num(wb, '数字符号')  # sheet name means "numbers and symbols"
save_as(wb)

Origin blog.csdn.net/wulishinian/article/details/104991119