Background: Our association and Teach for China needed to recruit volunteers, so we posted a questionnaire on Questionnaire Star (Wenjuanxing). It had many questions, including one multiple-choice question about each volunteer's free time. In the end the volunteers had to be scheduled according to their availability, but the downloaded spreadsheet was a headache: we needed to pick out everyone who was free in the same time period. So I wrote a small Python script in Jupyter Notebook and am recording it here.
import pandas as pd

# Read the original data. An .xlsx file is not a text file, so no encoding
# argument is needed (recent pandas versions reject encoding= in read_excel).
data = pd.read_excel('C:\\Users\\dell\\Desktop\\aaa.xlsx')

# Delete duplicate submissions (same QQ account and mobile phone number), keeping the last one.
# drop_duplicates returns a new DataFrame, so the result has to be assigned back.
data = data.drop_duplicates(['5, please enter your QQ account', '4, please enter your mobile phone number:'], keep='last')

# Combine name and intended subject into one string: name(subject)
data["1, your name:"] = data["1, your name:"] + "(" + data["6, the subject you want to teach"] + ")"

# Use name + intended subject as the row index
data.set_index(["1, your name:"], inplace=True)
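The deduplication step is easy to get wrong, because `drop_duplicates` returns a new DataFrame instead of changing the original. A minimal sketch on made-up data (the column names `qq`, `phone`, `subject` and all values are hypothetical, not from the real questionnaire):

```python
import pandas as pd

# Hypothetical toy submissions; the second row is a resubmission by the same person
toy = pd.DataFrame({
    "qq": ["111", "111", "222"],
    "phone": ["138", "138", "139"],
    "subject": ["Math", "Art", "Music"],
})

# keep="last" keeps the most recent submission of each (qq, phone) pair.
# The result is a new DataFrame; toy itself is left unchanged.
deduped = toy.drop_duplicates(["qq", "phone"], keep="last")
print(deduped)
```

Here `toy` still has three rows afterwards, while `deduped` has two, with the later "Art" submission kept.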
# data_ is the df used for processing. The multiple-choice answers sit in one large column,
# joined by the special character '┋'; split it into several small columns.
data_ = data["8. The teaching time you can participate in (you must enter the classroom twenty minutes before the class)"].str.split('┋', expand=True)
data_
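To see what the split produces, here is a small sketch on made-up data. The volunteer names, the time strings and the shortened column name `time` are all hypothetical. Each answer is cut at '┋', and rows with fewer choices are padded with missing values.

```python
import pandas as pd

# Hypothetical toy answers: multiple choices joined by '┋'
toy = pd.DataFrame(
    {"time": ["Mon AM┋Tue PM", "Tue PM", "Mon AM┋Wed AM┋Tue PM"]},
    index=["Ann(Math)", "Bob(Art)", "Cat(Music)"],
)

# expand=True spreads the pieces over columns labeled 0, 1, 2, ...
toy_split = toy["time"].str.split("┋", expand=True)
print(toy_split)
```

The number of resulting columns equals the largest number of choices any single volunteer made, so it depends on the data.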
# Find all unique time periods and store them in the list
thelist = []
for index, row in data_.iterrows():  # traverse each row
    for value in row:  # each column
        # skip empty cells (None/NaN) and time periods already recorded
        if pd.notna(value) and value != "None" and value not in thelist:
            thelist.append(value)
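The same collection of unique time periods can be written more compactly. This is only a sketch on hypothetical toy data, not the approach used in the script: it flattens the table row by row and relies on `dict.fromkeys` to keep the first occurrence of each value in order.

```python
import pandas as pd

# Hypothetical split result: one row per volunteer, one column per choice
toy_split = pd.DataFrame(
    [["Mon AM", "Tue PM", None],
     ["Tue PM", None, None],
     ["Mon AM", "Wed AM", "Tue PM"]],
    index=["Ann(Math)", "Bob(Art)", "Cat(Music)"],
)

# ravel() walks the cells row by row; pd.notna filters out the empty ones
cells = toy_split.to_numpy().ravel()
periods = list(dict.fromkeys(v for v in cells if pd.notna(v)))
print(periods)
```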
# For each time period, collect the names + intentions of the volunteers who chose it
n_split = data_.shape[1]  # number of split columns (labeled 0, 1, 2, ...)
y = []  # names + intentions of the volunteers in one time period
for period in thelist:  # every time period found above (13 in this survey)
    for index, row in data_.iterrows():  # traverse each row
        for i in range(n_split):  # each split column
            if row[i] == period:  # this volunteer chose this time period
                y.append(index)  # the index is name + intention
    cc = pd.DataFrame({period: y})  # one-column df named after the time period
    data_ = data_.reset_index()  # reset to the default index 0, 1, 2, ... so the rows line up
    data_ = pd.concat([cc, data_], axis=1)  # merge the new column into data_
    data_.set_index(["1, your name:"], inplace=True)  # put name + intention back as the index
    y = []  # start again with an empty list for the next time period
data_
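In the result, each time-period column lists the name + intention of the volunteers available then. The columns 0, 1, 2, 3, ... still hold the original split time values, so they are deleted.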
# Delete the leftover split columns one by one
# (I was too lazy to look up a shorter way; copy and paste is simple enough)
del data_[0]
del data_[1]
del data_[2]
del data_[3]
del data_[4]
del data_[5]
del data_[6]

# Set the index back to the default index and do not keep the old one (drop=True)
data_ = data_.reset_index(drop=True)

# Save to the desktop without the index column (index=False).
# to_excel needs no encoding argument; recent pandas versions no longer accept one.
data_.to_excel('C:\\Users\\dell\\Desktop\\bbb.xlsx', index=False)
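For comparison, the whole grouping step can also be sketched with `Series.explode`, which avoids the helper columns and the deletion afterwards. This is an alternative under stated assumptions, not the script above: the toy names and time strings are hypothetical, and every volunteer is assumed to have answered the question.

```python
import pandas as pd

# Hypothetical toy data: the index is name(subject), one answer string per volunteer
answers = pd.Series(
    ["Mon AM┋Tue PM", "Tue PM", "Mon AM┋Wed AM"],
    index=["Ann(Math)", "Bob(Art)", "Cat(Music)"],
)

# One row per (volunteer, time period) pair
pairs = answers.str.split("┋").explode()

# Collect the volunteers available in each time period, in order of first appearance
by_period = {}
for name, period in pairs.items():
    by_period.setdefault(period, []).append(name)

# The columns have different lengths, so wrap each list in its own Series;
# shorter columns are padded with NaN
table = pd.DataFrame({period: pd.Series(names) for period, names in by_period.items()})
print(table)
```

The resulting `table` has one column per time period, ready for `to_excel(..., index=False)`.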