[Study Notes] Python Office Automation - Task 01 Automatic File Processing & Automatic Email Sending

In this learning process, I mainly try to use practice instead of copying, and use and be proficient in various basic operation methods in the process of handwriting codes to solve specific problems.
This note mainly records the ideas, problems encountered and solutions when coding to solve the exercises.

Problem 1: Generating Random Quiz Question Papers
Suppose you are a geography teacher with 35 students in your class and you want to give a quiz on the US state capitals. Unfortunately, with a few bad guys in the class, you can't be sure that students won't cheat. You want to shuffle the order of the questions so that each test is unique so that no one can copy answers from anyone else. Of course, doing this by hand is time-consuming and boring. Fortunately, you know some Python.

import os
import random

#America 各州对应首府字典
capitals = {
    
    'Alabama': 'Montgomery', 'Alaska': 'Juneau', 'Arizona': 'Phoenix',
'Arkansas': 'Little Rock', 'California': 'Sacramento', 'Colorado': 'Denver',
'Connecticut': 'Hartford', 'Delaware': 'Dover', 'Florida': 'Tallahassee',
'Georgia': 'Atlanta', 'Hawaii': 'Honolulu', 'Idaho': 'Boise', 'Illinois':
'Springfield', 'Indiana': 'Indianapolis', 'Iowa': 'Des Moines', 'Kansas':
'Topeka', 'Kentucky': 'Frankfort', 'Louisiana': 'Baton Rouge', 'Maine':
'Augusta', 'Maryland': 'Annapolis', 'Massachusetts': 'Boston', 'Michigan':
'Lansing', 'Minnesota': 'Saint Paul', 'Mississippi': 'Jackson', 'Missouri':
'Jefferson City', 'Montana': 'Helena', 'Nebraska': 'Lincoln', 'Nevada':
'Carson City', 'New Hampshire': 'Concord', 'New Jersey': 'Trenton', 'New Mexico': 'Santa Fe', 'New York': 'Albany', 'North Carolina': 'Raleigh',
'North Dakota': 'Bismarck', 'Ohio': 'Columbus', 'Oklahoma': 'Oklahoma City',
'Oregon': 'Salem', 'Pennsylvania': 'Harrisburg', 'Rhode Island': 'Providence',
'South Carolina': 'Columbia', 'South Dakota': 'Pierre', 'Tennessee':
'Nashville', 'Texas': 'Austin', 'Utah': 'Salt Lake City', 'Vermont':
'Montpelier', 'Virginia': 'Richmond', 'Washington': 'Olympia', 'West Virginia': 'Charleston', 'Wisconsin': 'Madison', 'Wyoming': 'Cheyenne'}

# 获取各州和对应首府的名称列表
questions = list(capitals.keys())
answers = list(capitals.values())

#创建存放试卷和答案的文件夹
if os.path.exists('./papers')==False:
    os.makedirs('./papers')
if os.path.exists('./keys')==False:
    os.makedirs('./keys')

#为第stu(1-35)位同学生成试卷
for stu in range(1,36):
    random.shuffle(questions)
    #新建试卷文件paper[stu]和对应答案文件key
    paper_name = 'paper['+str(stu)+'].txt'
    key_name = 'key['+str(stu)+'].txt'
    paper = open(os.path.join('./papers',paper_name),'w',encoding='utf-8')
    key = open(os.path.join('./keys',key_name),'w',encoding='utf-8')
    #向paper写入50道试题和对应选项
    for i in range(1,51):
        paper.write(str(i)+'.where is the capital of '+questions[i-1]+'?\n')  #第i个题目
        sele_num = ['A','B','C','D']                #选项序号
        selections = []                             #选项内容列表
        right_ansewer = capitals[questions[i-1]]    #正确答案
        selections.append(right_ansewer)            #将正确答案加入选项列表

        # # 在控制台输出题目和正确答案
        # print(str(i)+'.where is the capital of '+questions[i-1]+'?')
        # print('answer is:'+right_ansewer+'\n')
		
		#随机打乱答案顺序
		random.shuffle(answers)
        #添加3个随机错误答案进入选项列表
        for _ in range(0,50):
            if len(selections) == 4:
                break
            if answers[_] != right_ansewer:
                selections.append(answers[_])
        random.shuffle(selections)  #打乱答案排序
        right_ansewer_num = selections.index(right_ansewer)     #保存正确答案序号
        #输出4个选项
        for _ in range(0,4):
            paper.write(sele_num[_]+'.'+selections[_]+'\n')
        #向答案文件中写入第i题的答案
        key.write('第'+str(i)+'道题答案为:'+sele_num[right_ansewer_num]+'\t'+right_ansewer+'\n')
    #关闭试卷和答案文件
    paper.close()
    key.close()
1. Topic idea:
(1) First create two lists of questions (state names) and answers (capital names).
(2) Before generating each test paper, shuffle the list of questions, and then generate 50 questions in order.
(3) Every time a question is generated, shuffle the answer list, create an option list (including 4 answers, find out one correct one through the dictionary, and find 3 wrong ones according to the answer list), and scramble the option list.
(4) Find the serial number of the correct answer from the answer list and write it into the answer file of the corresponding test paper.

2. The use of slashes and backslashes as local path separators:
(1) Slash: [/] [can be recorded as a slash with a positive slope]      can only be used in windows
         backslash: [\] [can be recorded as a slash with a negative slope]     can be used in windows and unix
(2) In Python programming, we can directly use all slashes [/] to divide the address path, such as `D:/Python/data`, which can be used on both windows and unix platforms. If you use a backslash [\] to mark the path in the windows environment, you need to use double backslashes such as `D:\\Python\\data`, the first of which is used for escaping.

Topic 2: Generate Random Quiz Files
Write a program that walks a directory tree looking for files with a specific extension (such as .pdf or .jpg). Wherever these files are located, copy them to a new folder.

import os
import shutil

rootPath = 'F:\\pycharm\\work\\OfficeAutomation'    #设置遍历根路径
fileType = 'txt'                                    #设置遍历文件后缀类型
destination = 'F:\\copy'                            #设置文件复制后地址
for curFolder,subFolder,files in os.walk(rootPath):
    # print('The current folder is ' + curFolder)
    for file in files:
        if(file.split('.')[-1]==fileType):
            print("copy "+os.path.join(curFolder,file)+" to "+destination+"...")
            shutil.copy(os.path.join(curFolder,file),destination)



1. Topic idea:
(1) Use os.walk() to traverse each file in the root directory and subdirectories
(2) Use the string split method split to split the suffix name, if necessary, copy the file to the corresponding file through the shutil.copy method target position

2. The target folder in the destination is best manually generated in advance, otherwise it will become an unknown document without a suffix after copying it.
3. Be careful not to set the destination under the rootPath, otherwise the files that have just been copied in will also be traversed, resulting in the error of shutil.SameFileError being reported when copying itself to the original location


Topic 3: It is not uncommon for some unnecessary and huge files or folders to take up hard disk space. If you're trying to free up space on your computer, deleting unwanted huge files works best. But first you have to find them. Write a program that walks a directory tree looking for particularly large files or folders, say, over 100MB (recall that to get the size of a file, you can use the os module's) os.path.getsize(). Print the absolute paths of these files to the screen.

import os
import send2trash

rootPath = 'F:\\pycharm\\work\\OfficeAutomation'    #设置遍历根路径
threshold = pow(2,20)*100                           #删除大于阈值(100M)的文件

for curFolder,subFolder,files in os.walk(rootPath):
    for file in files:
        fileSize = os.path.getsize(os.path.join(curFolder,file))
        if(fileSize>threshold):
            print("The size of "+os.path.join(curFolder,file)+" is "+str(fileSize)+",which > "+str(threshold)+" , deleting...")
            send2trash.send2trash(os.path.join(curFolder,file))



1. Topic idea:
(1) Use os.walk() to traverse each file in the root directory and subdirectories
(2) Use the os.path.getsize() method to judge the relationship between the file size and the set threshold, if it is greater than the threshold, print and delete

2. The unit of the file size obtained by the os.path.getsize() method is Byte. The unit we usually use is 1MB = 2^10KB = 2 ^20Byte. Pay attention to the conversion relationship when comparing.


Topic 4: Write a program to find all files with a specified prefix in a folder, such as spam001.txt, spam002.txt, etc., and locate the missing number (for example, spam001.txt and spam003.txt exist, but not spam002.txt). Have the program rename all subsequent files, eliminating missing numbers. As an added challenge, write another program that, among consecutively numbered files, leaves some numbers empty so new files can be added.

import os
import shutil

rootPath = 'F:\\pycharm\\work\\OfficeAutomation\\files'    #设置遍历根路径
prefix = 'spam'                                            #设置文件名前缀
fileList = []                                              #旧文件名列表
fileList_New = []                                          #新文件名列表

# #生成测试文件
# for i in range(11,21):
#     file = open(os.path.join(rootPath,'spam{:03d}.txt'.format(i)),'w')
#     file.write('Name of this file is spam{:03d}.txt'.format(i))
# file.close()

count = 1

for name in os.listdir(rootPath):                           #遍历rootPath下所有文件或文件夹
    if os.path.isfile(os.path.join(rootPath,name)):         #找到文件类型的文件(非文件夹)
        if name.startswith(prefix):                         #将所有含有前缀名的文件名加入旧列表
            fileList.append(name)
            newName = prefix+'{:03d}'.format(count)+'.txt'  #将当前文件的正确文件名加入新列表
            fileList_New.append(newName)
            count+=1
fileList.sort()                                             #将旧文件名列表排序

#将错误序号之后的所有文件夹重命名为正确的文件名
for old,new in zip(fileList,fileList_New):
    if(old!=new):
        print("change the name ["+old+"] to ["+new+"]")
        shutil.move(os.path.join(rootPath,old),os.path.join(rootPath,new))



1. Topic idea:
(1) Use os.listdir() to get every file in the current path (excluding folders)
(2) If the file matches the prefix name, it will be added to the old naming list, and the correct name corresponding to the file will be added to the new naming list
( 3) Traverse the old and new file lists, and use the shutil.move method to correct the name if they are not equal.

2. for a,b in zip(X,Y) is a common traversal method that can improve efficiency and can be used more in the future.


Guess you like

Origin blog.csdn.net/qq_41794040/article/details/117966446