[Python] Using pandas to quickly extract Tencent Questionnaire data and find who hasn't filled it in

Background

With the epidemic going on, the school requires every student to fill out a questionnaire, so I set up a simple one on Tencent Questionnaire. Today the counselor asked me to count which students had not filled it in, and told me to do this count every day from now on.

There are about 40 people in our class, and about 30 of them fill in the questionnaire on a given day. Finding the people who haven't filled it in the traditional way, by comparing the list of responses against the full class list one name at a time, would be hard on the eyes. And since the counselor said this has to be done again every day, doing it by hand is simply a waste of time.

I have always believed in one principle: every repetitive task should be handed to a program, even if writing the program takes longer than doing the task by hand once!

Picking out the people who haven't filled it in is essentially a set difference: the full class list minus the people who have responded. In Python this takes microseconds, whereas by hand it takes a long time. So I started writing a program to solve this practical problem.
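To make the set-difference idea concrete, here is a tiny sketch with made-up names (they stand in for the real class list and are purely illustrative):

```python
# Toy data: these names are invented for illustration only
all_staff = {"Alice", "Bob", "Carol", "Dave"}
filled_in = {"Alice", "Carol"}

# Set difference: everyone on the full list who is not in the filled-in set
not_filled = all_staff - filled_in  # equivalent to all_staff.difference(filled_in)
print(sorted(not_filled))  # ['Bob', 'Dave']
```

Sorting is only there to make the printed output stable, since sets have no fixed order.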

Obtaining the questionnaire data

[Figure: the Tencent Questionnaire page showing all collected response data]
I use Tencent Questionnaire, which lets you view all of the collected response data. Since I have been using the same questionnaire for many days, the data covers all of those days, and my task is to find out, for each day, whose names are missing.

To process the data with Python, we first need to get hold of it. A web scraper would work, but Tencent Questionnaire already provides an export feature that saves the data as a .csv file (which can be opened like an Excel sheet).

There is a Python library called pandas that is designed for working with Excel and CSV files. I had used it before when processing stock market data, so it was the first thing that came to mind, and it is a very good fit for this task.

The CSV exported from Tencent Questionnaire looks like this:

[Figure: the exported CSV file]

Processing the data with pandas

import pandas as pd
csv = pd.read_csv('2.csv')  # the questionnaire export, renamed to 2.csv

After renaming the exported file to '2.csv', reading it is very simple: the read_csv function loads the contents of the CSV directly.
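If you want to try the reading step without a real export at hand, the sketch below feeds read_csv an in-memory string instead of '2.csv'. The column names follow the ones used later in this article, but the rows and the time format are invented. If a real export fails to decode, read_csv's encoding parameter (for example encoding='utf-8-sig' or encoding='gbk') may help, depending on how the file was saved.

```python
import io

import pandas as pd

# Hypothetical stand-in for the exported file: the column names match the
# article, but the rows and the time format are made up.
sample_csv = """开始答题时间,2.姓名
25 08:12:30,Alice
25 09:40:02,Carol
24 21:05:11,Bob
"""

# read_csv accepts a file path such as '2.csv' or any file-like object
csv = pd.read_csv(io.StringIO(sample_csv))
print(csv.shape)          # (3, 2)
print(list(csv.columns))  # ['开始答题时间', '2.姓名']
```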

[Figure: the loaded data as displayed in PyCharm]
PyCharm abbreviates the middle of the output here, but you can see that pandas has read the CSV into a table (a DataFrame).

The first task is to extract the list of people who completed the questionnaire on a given day.

The time format in the Tencent Questionnaire export looks like this:

[Figure: example values of the answer start time]
We don't actually need the exact time, only the date. So the first step is to process this column with a simple function:

a = lambda m: m[:2]  # keep only the first two characters (the day of the month)
csv['开始答题时间'] = csv['开始答题时间'].apply(a)  # 开始答题时间 = answer start time

Here I define a lambda that takes the first two characters of a string, then use apply to run it on every value in the "开始答题时间" (answer start time) column.
After this, each start time keeps only its first two characters, which is the day of the month: for a response submitted on February 25, the value becomes '25'. This is what we filter on next.
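The same truncation can also be written with pandas' .str accessor, which slices every value in the column without a Python-level lambda. A minimal sketch, assuming (as above) that the first two characters of each start time are the day; the sample times are made up:

```python
import pandas as pd

# Hypothetical start times; assumes the first two characters are the day of the month
times = pd.Series(["25 08:12:30", "25 09:40:02", "24 21:05:11"])

# The article's approach: apply a lambda to each value
via_apply = times.apply(lambda m: m[:2])

# Vectorized equivalent via the .str accessor; astype(str) makes sure the
# slice runs on strings even if pandas read the column as another type
via_str = times.astype(str).str[:2]

print(via_str.tolist())  # ['25', '25', '24']
```

Both versions give the same result here; the .str form is simply the more idiomatic pandas spelling.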

today = '25'  # day of the month to check, as a string
today_finish = csv['2.姓名'][csv['开始答题时间'] == today]  # names of those who answered today

For example, to find the people who completed it today (February 25), I set today to '25'. The second statement takes the "2.姓名" (name) column of the CSV and keeps only the rows whose "开始答题时间" (answer start time) equals '25'. The first pair of brackets selects the column to extract, and the second pair applies the filter condition, so the result is the list of people whose answer start time is today.
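The bracket-then-bracket filter is boolean indexing. The sketch below spells it out with an explicit mask and .loc, on a small hypothetical DataFrame whose start-time column has already been cut down to the day:

```python
import pandas as pd

# Hypothetical data, already reduced to day-of-month strings
csv = pd.DataFrame({
    "开始答题时间": ["25", "25", "24"],
    "2.姓名": ["Alice", "Carol", "Bob"],
})

today = "25"
mask = csv["开始答题时间"] == today      # boolean Series, one entry per row
today_finish = csv.loc[mask, "2.姓名"]  # rows where mask is True, name column only
print(today_finish.tolist())  # ['Alice', 'Carol']
```

Keeping both today and the column as strings matters: if the column held the integer 25, comparing it with the string '25' would match no rows.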

Next, I got the class roster from the counselor, also in Excel format. For convenience, I named it "1.xlsx".

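s = pd.read_excel('1.xlsx')  # class roster; reading .xlsx needs the openpyxl package
total = s['姓名'].values  # 姓名 = name, the full class list
not_finish = set(total).difference(set(today_finish))  # on the roster but not among today's responses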

This file is also read with pandas, and the second line extracts the "姓名" (name) column, which is the full class list. Finally, set() converts both the full list and the list of today's respondents we just extracted into sets, and the set method difference gives the set difference between them. That difference is the list of people who have not filled in the questionnaire today.
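One side effect of set() is that the result has no particular order, so the printed names may come out shuffled. If you would rather keep the roster's order, pandas' isin gives the same difference as a filtered Series. A sketch with hypothetical names:

```python
import pandas as pd

# Hypothetical roster and today's respondents
total = pd.Series(["Alice", "Bob", "Carol", "Dave"])
today_finish = pd.Series(["Alice", "Carol"])

# Set difference, as in the article (result order is arbitrary)
not_finish_set = set(total).difference(set(today_finish))

# pandas alternative: keep roster order with a negated isin mask
not_finish = total[~total.isin(today_finish)].tolist()
print(not_finish)  # ['Bob', 'Dave']
```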

Postscript

import pandas as pd

csv = pd.read_csv('2.csv')


a = lambda m:m[:2]
csv['开始答题时间'] = csv['开始答题时间'].apply(a)

today = '25'
today_finish = csv['2.姓名'][csv['开始答题时间']==today]

s = pd.read_excel('1.xlsx')
total = s['姓名'].values

not_finish = set(total).difference(set(today_finish))
print(not_finish)

Overall, the code is very simple and runs quickly, and replacing manual work with it saves a lot of time. If you want to be lazy to the extreme, you could write a scraper to export the questionnaire data automatically; as it stands, the data still has to be exported and saved by hand, so the process is not fully automated. Even so, it already saves plenty of time. I don't think code should chase advanced syntax for its own sake: the purpose of all code is the same, to handle repetitive work more efficiently.
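Since the same check has to be repeated every day, it can help to wrap the steps into one function that takes the day as a parameter. This is only a sketch: find_unfinished is a hypothetical name, it assumes the column names used in this article and that the first two characters of the start time are the day, and it additionally strips stray spaces from names so that 'Carol ' and 'Carol' still match.

```python
import pandas as pd


def find_unfinished(responses: pd.DataFrame, roster: pd.DataFrame, day: str) -> list:
    """Return roster names with no response on `day`, in roster order.

    Hypothetical helper. Assumes the article's column names ('开始答题时间',
    '2.姓名', '姓名') and that the start time begins with the day of the month.
    """
    # Day of the month for every response
    days = responses["开始答题时间"].astype(str).str[:2]
    # Names that answered on the requested day, with stray spaces removed
    done = responses.loc[days == day, "2.姓名"].astype(str).str.strip()
    names = roster["姓名"].astype(str).str.strip()
    # Keep roster names that do not appear among today's respondents
    return names[~names.isin(done)].tolist()


# With real files this would be called as:
# find_unfinished(pd.read_csv('2.csv'), pd.read_excel('1.xlsx'), '25')
responses = pd.DataFrame({
    "开始答题时间": ["25 08:12:30", "25 09:40:02", "24 21:05:11"],
    "2.姓名": ["Alice", "Carol ", "Bob"],
})
roster = pd.DataFrame({"姓名": ["Alice", "Bob", "Carol", "Dave"]})
print(find_unfinished(responses, roster, "25"))  # ['Bob', 'Dave']
```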



Source: blog.csdn.net/weixin_39274659/article/details/104495801