My company was applying for software copyrights. I saw a colleague copying source code line by line, so I solved it for him in 2 minutes with Python

Foreword

The company has recently been applying for software copyrights (软著) and patents for several projects. Anyone who has been through it knows that a software copyright application requires submitting the source code as a Word document.

When I arrived in the morning, I saw my colleague at:

Run_申请软著();

When I went to get some water, my colleague was still at:

Run_申请软著();

When I passed by on my way to the restroom, my colleague was still at:

Run_申请软著();

When I passed by on my way to a meeting, my colleague was still at:

Run_申请软著();

At noon, my colleague was still at:

Run_申请软著();

(Omitted 99+ times)

(Omitted 99+ times)

When I left work in the evening, my colleague was still at:

Run_申请软著();

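For simplicity, I've factored out the common part:

Run_申请软著():{
	项目=项目0    // 项目 = the current project

	loop(1):
	{
	start:

		//1. Open the source directory of [项目];
		//2. Open the other subdirectories in the [项目] source tree;
		//3. Find the {.c,.cpp,.h,...} source files in [项目];
		//3-1. Open each {.c,.cpp,.h,...} source file found;
		//3-2. Copy the contents of each {.c,.cpp,.h,...} source file found;
		//3-3. Paste the contents of each {.c,.cpp,.h,...} source file into Word;
		//4. Adjust the Word formatting;
		//5. Delete the line breaks, one line at a time;
		//6. Delete the blank lines, one at a time;
		//7. Save

	end:
		jump start;
	}
}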

Python: It's my turn

Python is awesome.

I prefer to analyze a problem before acting, so that I make fewer mistakes when I actually do the work.

The process, step by step, is:

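Run_申请软著():{
	项目=项目0    // 项目 = the current project

	loop(1):
	{
	start:

		//1. Open the source directory of [项目];
		//2. Open the other subdirectories in the [项目] source tree;
		//3. Find the {.c,.cpp,.h,...} source files in [项目];
		//3-1. Open each {.c,.cpp,.h,...} source file found;
		//3-2. Copy the contents of each {.c,.cpp,.h,...} source file found;
		//3-3. Paste the contents of each {.c,.cpp,.h,...} source file into Word;
		//4. Adjust the Word formatting;
		//5. Delete the line breaks, one line at a time;
		//6. Delete the blank lines, one at a time;
		//7. Save

	end:
		jump start;
	}
}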
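
Steps 5 and 6 of the loop are the easiest to automate. A minimal sketch, assuming a hypothetical helper named `compact_lines` (it is not part of the script shown later, which does the same filtering inline):

```python
def compact_lines(text):
    """Return the lines of text with line endings removed and blank lines dropped."""
    # splitlines() splits on \n, \r\n and \r (among others) and drops the line ending;
    # strip() is empty for lines that hold only whitespace, so those are filtered out
    return [line for line in text.splitlines() if line.strip()]


sample = "int a;\r\n\r\n   \nint b;\n"
print(compact_lines(sample))
```

Doing this in code removes exactly the part of the manual routine that takes the longest in Word.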

The directory structure is roughly as follows:

zhenghui@zh-pc:/软著代码$ tree ./ |grep -E -v ".txt|.c|.h"
./
├── 项目A
│   ├── master
│   │   ├── a.c
│   │   ├── a.h
│   │   ├── b.cpp
│   ├── slave
│   │   ├── a.c
│   │   ├── a.h
│   │   ├── b.cpp
│   └── ui
│   │   ├── a.c
│   │   ├── a.h
│   │   ├── b.cpp
├── 项目B
│   ├── master
│   │   ├── a.c
│   │   ├── a.h
│   │   ├── b.cpp
│   ├── slave
│   │   ├── a.c
│   │   ├── a.h
│   │   ├── b.cpp
│   └── ui
│   │   ├── a.c
│   │   ├── a.h
│   │   ├── b.cpp
├── 项目C
│   ├── master
│   │   ├── a.c
│   │   ├── a.h
│   │   ├── b.cpp
│   ├── slave
│   │   ├── a.c
│   │   ├── a.h
│   │   ├── b.cpp
│   └── ui
│   │   ├── a.c
│   │   ├── a.h
│   │   ├── b.cpp
└── └── 
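
With a layout like this, one way to gather the source files per project is `os.walk`. The following is only a sketch: `collect_sources` and `SOURCE_SUFFIXES` are hypothetical names, and the full script further down uses its own recursive function instead.

```python
import os

SOURCE_SUFFIXES = {".c", ".cpp", ".h"}  # assumed set of wanted extensions


def collect_sources(root):
    """Map each top-level project directory under root to its sorted source file paths."""
    projects = {}
    for name in sorted(os.listdir(root)):
        project_dir = os.path.join(root, name)
        if not os.path.isdir(project_dir):
            continue  # stray files next to the project directories are ignored
        found = []
        # os.walk visits project_dir and every subdirectory below it
        for current, _dirs, files in os.walk(project_dir):
            for file_name in files:
                if os.path.splitext(file_name)[1] in SOURCE_SUFFIXES:
                    found.append(os.path.join(current, file_name))
        projects[name] = sorted(found)
    return projects
```

Calling `collect_sources("./软著代码")` would then return one entry per project directory, each listing the files under master, slave, and ui.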

Key points:

1. The files use different encodings. The code style was never unified, so the sources are a mix of UTF-8, GB2312, Windows-1254, Windows-1252, GBK, and so on. Python cannot decode some of them unless the encoding is specified explicitly. With this many files, specifying each one by hand would be exhausting, so detect it automatically instead.

First, detect each file's encoding from its contents:

import chardet

# Detect the character encoding of a file
def get_files_encoding_type(file_dir):
    with open(file_dir, 'rb') as file:
        # Let chardet guess the encoding from the raw bytes
        encoding_message = chardet.detect(file.read())
    # chardet reports None when it cannot guess (for example, an empty file)
    enc = encoding_message['encoding'] or ""
    # GB2312, GBK and GB18030 are compatible; character coverage: GB2312 < GBK < GB18030
    # "Windows-1254" and "Windows-1252" are also handed to gb18030; decoding errors are suppressed later
    if enc in ("GB2312", "GBK", "Windows-1254", "Windows-1252"):
        enc = "gb18030"
    return enc
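
To see why a single codec can stand in for the GB family, here is a small check using only the standard library. The sample string is my own. Note that this covers GB2312 and GBK only; Windows-1252 and Windows-1254 are not subsets of gb18030, and the script appears to treat those detections as misidentified Chinese text.

```python
text = "中文注释"

# Bytes produced by the two older encodings
gb2312_bytes = text.encode("gb2312")
gbk_bytes = text.encode("gbk")

# Both decode correctly with gb18030, so one codec covers all three
print(gb2312_bytes.decode("gb18030") == text)
print(gbk_bytes.decode("gb18030") == text)
```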

Then, when reading the file, suppress decoding errors:

# Read the source file; errors='ignore' skips bytes that cannot be decoded
file = open(read_dir, 'r', encoding=enc, errors='ignore')
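
As a rough illustration of what `errors='ignore'` does, the sketch below writes a temporary file containing one byte that is never valid in UTF-8 and reads it back. The file contents are made up for the demonstration.

```python
import os
import tempfile

# Write a file containing one byte (0xFF) that is never valid in UTF-8
with tempfile.NamedTemporaryFile(delete=False, suffix=".c") as tmp:
    tmp.write(b"int a;\xff\n")
    path = tmp.name

try:
    # Without errors="ignore" this read would raise UnicodeDecodeError
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        content = f.read()
finally:
    os.remove(path)

print(repr(content))
```

The undecodable byte is dropped silently. That is acceptable here because the goal is a readable listing, but it would be the wrong choice where every byte matters.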

2. The source tree may also contain .ini, .txt, makefile, and similar configuration files. I don't need those, so each file's type has to be checked:

import os

# Check whether the file has one of the wanted extensions
def verify_file_type(file_path):
    # Extract the file suffix, e.g. .c/.h/.cpp
    file_suffix = os.path.splitext(file_path)[-1]
    return file_suffix in (".h", ".c", ".cpp")
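
If more extensions are needed later, a set makes the check easier to extend. This is a variation rather than the original function: `is_source_file` and `SOURCE_SUFFIXES` are hypothetical names, and unlike the original it also matches upper-case suffixes.

```python
import os

# Assumed set of wanted extensions; extend as needed
SOURCE_SUFFIXES = {".c", ".cpp", ".h"}


def is_source_file(file_path):
    """Return True if file_path ends with one of SOURCE_SUFFIXES (case-insensitive)."""
    suffix = os.path.splitext(file_path)[1].lower()
    return suffix in SOURCE_SUFFIXES


for name in ("a.c", "B.CPP", "config.ini", "Makefile"):
    print(name, is_source_file(name))
```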

Full code:


```python
# -*- coding: UTF-8 -*-
import os
import chardet

# Whether to write each source file's name before its contents
# _printf_src_name = False
_printf_src_name = True

# Detect the character encoding of a file
def get_files_encoding_type(file_dir):
    with open(file_dir, 'rb') as file:
        # Let chardet guess the encoding from the raw bytes
        encoding_message = chardet.detect(file.read())
    # chardet reports None when it cannot guess (for example, an empty file)
    enc = encoding_message['encoding'] or ""
    # GB2312, GBK and GB18030 are compatible; character coverage: GB2312 < GBK < GB18030
    # "Windows-1254" and "Windows-1252" are also handed to gb18030; decoding errors are suppressed later
    if enc in ("GB2312", "GBK", "Windows-1254", "Windows-1252"):
        enc = "gb18030"
    return enc

# Read a source file and append its non-blank lines to the output file
def read_and_write_file(read_dir, enc, save_file):

    # Optionally write the source file's name as the first line
    if _printf_src_name:
        # os.path.basename also handles Windows path separators
        curr_save_file_name = os.path.basename(read_dir)
        save_file.write(("//" + curr_save_file_name + ":\n").encode())

    # Read the source file; errors='ignore' skips bytes that cannot be decoded
    with open(read_dir, 'r', encoding=enc, errors='ignore') as file:
        for line in file:
            # Skip blank lines
            if line.strip():
                # Strip the trailing line break, then write the line with a single "\n"
                save_file.write((line.rstrip('\r\n') + "\n").encode())

# Check whether the file has one of the wanted extensions
def verify_file_type(file_path):
    # Extract the file suffix, e.g. .c/.h/.cpp
    file_suffix = os.path.splitext(file_path)[-1]
    return file_suffix in (".h", ".c", ".cpp")

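# Recursively walk the directory
def traversal_files_save_txt(dir_path, save_file):
    for name in os.listdir(dir_path):
        path = os.path.join(dir_path, name)
        # Regular file: check its type and copy it
        if os.path.isfile(path):

            # Check the file suffix
            if verify_file_type(path):
                # Detect the file's encoding
                enc = get_files_encoding_type(path)
                # Skip files whose encoding could not be detected
                if not enc:
                    continue

                # Append the file's contents to the output
                read_and_write_file(path, enc, save_file)

            else:
                print(path, "is not supported: not a source code file")

        # Directory: recurse into it
        elif os.path.isdir(path):
            traversal_files_save_txt(path, save_file)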


def traversal_files_to_txt(dir_path):
    for name in os.listdir(dir_path):
        # Each project directory gets one txt file, named after it, holding its code
        save_file_name = name + ".txt"
        if _printf_src_name:
            # The suffix means "first line carries the source file name"
            save_file_name = name + "-首行带源文件名.txt"

        save_dir = os.path.join(dir_path, name)
        if os.path.isdir(save_dir):
            save_txt_file = os.path.join(dir_path, save_file_name)
            print("save_path=", save_dir, save_txt_file)

            # Open the output file in binary mode; the with block closes it
            with open(save_txt_file, 'wb') as save_file:
                # Write the .c/.cpp/.h files to the txt file
                traversal_files_save_txt(save_dir, save_file)


if __name__ == '__main__':
    dir_path = r'./软著代码/'
    traversal_files_to_txt(dir_path)
```

Origin blog.csdn.net/qq_17623363/article/details/127893950