[Python Programming] Counting keyword occurrences in Java source files under a directory

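Every time I learn a new language, I want to re-implement the course projects I have already done =,=
This post implements the second part of the "Comprehensive Experiment: Java Source Code Analysis Program".
For the first part, see: [Python Basics] Counting the number and total size of files in a specified directory
The original Java implementation is here:
[Java] Counting keyword occurrences in Java source files under a directory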

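Problem

Count how many times certain strings occur in the body of each file.

1. Storing the keywords

Java has 50 keywords in total. Store them in a dictionary, so that keywords['class'] returns the number of occurrences.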

keywords = {key: 0 for key in ["abstract", "assert", "boolean", "break", "byte",
                               "case", "catch", "char", "class", "const",
                               "continue", "default", "do", "double", "else",
                               "enum", "extends", "final", "finally", "float",
                               "for", "goto", "if", "implements", "import",
                               "instanceof", "int", "interface", "long", "native",
                               "new", "package", "private", "protected", "public",
                               "return", "strictfp", "short", "static", "super",
                               "switch", "synchronized", "this", "throw", "throws",
                               "transient", "try", "void", "volatile", "while"]}

2. Searching for Java files under the directory

This reuses the code from the first part, with a small modification.

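from os import listdir
import os.path as op

filelist = []  # paths of all .java files found so far
def search(path):
    """Recursively collect every .java file under path into filelist."""
    files = listdir(path)
    for file in files:
        filepath = op.join(path, file)  # join with the platform's path separator
        if op.isdir(filepath):
            search(filepath)  # descend into the subdirectory
        elif filepath.endswith('.java'):
            filelist.append(filepath)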
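
As an alternative to the hand-written recursion, the same search can be sketched with os.walk from the standard library. The function name find_java_files is a hypothetical helper introduced here for illustration; it is not part of the original program, and it returns a list instead of appending to the global filelist.

```python
import os

def find_java_files(root):
    """Return the paths of all .java files under root (hypothetical helper)."""
    found = []
    # os.walk visits root and every subdirectory, yielding the file names in each
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith('.java'):
                found.append(os.path.join(dirpath, name))
    return found
```

Returning the list keeps the function free of global state, so it can be called on several roots without clearing filelist in between.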

3. Counting

import re

def keyword_analyze(filepath):
    with open(filepath) as file:
        lines = file.read().split('\n')  # read the whole file at once and split it into lines
        for line in lines:
            # match comment lines: those starting with /*, * or //, or ending with */
            noteline = re.match(r'^/(.*)|^\*(.*)|(.*)\*/$', line.strip(), flags=0)
            if noteline is None:  # not a comment line, so treat it as code
                # strip trailing comments and string literals from the code line
                codeline = re.sub(r'//(.*)$|/\*(.*)|\"(.*)\"', '', line)
                filterline = re.sub(r'\W', ' ', codeline)  # replace characters such as '{ , } . + - =' with spaces
                for key in filterline.split(' '):
                    if key in keywords:
                        keywords[key] = keywords[key] + 1
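
To see what the filters do to a single line, here is a minimal sketch that applies the same three regular expressions outside the file loop. The helper name tokens_from_line and the two sample lines are assumptions made for illustration. It also calls split() with no argument, so empty strings are dropped and the result is easier to read.

```python
import re

def tokens_from_line(line):
    """Apply the same filters as keyword_analyze to one line (hypothetical helper)."""
    if re.match(r'^/(.*)|^\*(.*)|(.*)\*/$', line.strip()):
        return []  # the line is treated as a comment and skipped entirely
    # remove a trailing // comment, an opening /* comment, and any string literal
    code = re.sub(r'//(.*)$|/\*(.*)|\"(.*)\"', '', line)
    # turn every non-word character into a space, then split into tokens
    return re.sub(r'\W', ' ', code).split()

print(tokens_from_line('    // return the size'))
# []
print(tokens_from_line('    if (x > 0) return "positive"; // check sign'))
# ['if', 'x', '0', 'return']
```

In the second line, the word "return" in the code is kept, while the string literal and the trailing comment are removed before tokenizing, so keywords that appear only inside them are not counted.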

4. Testing

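root = r'E:\java\util'  # root directory
search(root)  # find the files and add them to filelist
for f in filelist:
    keyword_analyze(f)  # count the keywords in each file
# print the dictionary items sorted by count, in descending order
print(sorted(keywords.items(), key=lambda x: x[1], reverse=True))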
[('if', 9021), ('int', 7994), ('return', 7994), ('public', 6709), ('new', 4486), ('final', 4214), ('private', 3315), ('static', 3175), ('this', 3175), ('long', 2632), ('boolean', 2312), ('else', 2173), ('void', 2171), ('throw', 1893), ('super', 1611), ('import', 1449), ('for', 1349), ('extends', 1348), ('class', 1142), ('while', 833), ('break', 760), ('try', 592), ('throws', 586), ('case', 482), ('implements', 447), ('synchronized', 420), ('byte', 419), ('double', 403), ('instanceof', 402), ('catch', 374), ('package', 364), ('char', 321), ('abstract', 299), ('finally', 245), ('assert', 214), ('do', 203), ('default', 174), ('transient', 171), ('volatile', 168), ('continue', 134), ('interface', 134), ('float', 131), ('short', 104), ('switch', 65), ('native', 54), ('enum', 12), ('const', 0), ('goto', 0), (' protected', 0), ('strictfp', 0)]

Note: the (' protected', 0) entry in this output comes from a key stored with a leading space. Such a key can never equal a token, so this run did not record a count for protected.

Download the util folder used for testing: Baidu Netdisk

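Improvements

filterline.split(' ') yields many empty strings, because every run of consecutive spaces produces an empty string between each pair of spaces. Removing them before checking whether each token is a key should make the loop faster.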
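
One way to do this, sketched below, is to call split() with no argument, which splits on runs of whitespace and never returns empty strings. The helper name count_keywords, the sample line, and the two-entry dictionary are assumptions for illustration only.

```python
import re

filtered = re.sub(r'\W', ' ', 'int count = 0;')
with_empty = filtered.split(' ')   # ['int', 'count', '', '', '0', '']
without_empty = filtered.split()   # ['int', 'count', '0']

def count_keywords(line, counts):
    """Count keyword tokens in one code line (hypothetical helper)."""
    for token in re.sub(r'\W', ' ', line).split():  # split() drops empty strings
        if token in counts:
            counts[token] += 1

sample_counts = {"int": 0, "for": 0}  # shortened keyword dictionary for the demo
count_keywords('for (int i = 0; i < n; i++) { int j = i; }', sample_counts)
print(sample_counts)  # {'int': 2, 'for': 1}
```

The counts are unchanged, because an empty string is never a keyword. The only difference is that fewer tokens are tested against the dictionary.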

Reposted from blog.csdn.net/xHibiki/article/details/83788597