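Every time I learn a new language, I want to re-implement the course projects I have already done =,=
This post implements the second part of the "Comprehensive Experiment: Java Source Code Analysis Program".
For the first part, see: [Python Basics] Counting the number and total size of files in a given directory
The original Java implementation is here:
[Java] Counting keyword occurrences in the Java source files under a directory
Problem
Count how many times certain strings occur in the body of the files.
1. Storing the keywords
Java has 50 keywords. They are stored in a dict, so keywords['class'] gives the number of times class occurs.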
keywords = {key: 0 for key in [
    "abstract", "assert", "boolean", "break", "byte",
    "case", "catch", "char", "class", "const",
    "continue", "default", "do", "double", "else",
    "enum", "extends", "final", "finally", "float",
    "for", "goto", "if", "implements", "import",
    "instanceof", "int", "interface", "long", "native",
    "new", "package", "private", "protected", "public",
    "return", "strictfp", "short", "static", "super",
    "switch", "synchronized", "this", "throw", "throws",
    "transient", "try", "void", "volatile", "while"]}
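A hand-typed list of 50 strings is easy to get subtly wrong (a stray space inside a string literal is enough to make a key unmatchable). The following is a minimal sketch of a sanity check that could be run on such a list before building the dict. The helper name check_keywords and its expected_count parameter are made up for this illustration and are not part of the original program.

```python
def check_keywords(words, expected_count=50):
    """Return a list of problems found in a keyword list (empty if none)."""
    problems = []
    for word in words:
        # A key with leading or trailing whitespace can never equal a token
        if word != word.strip():
            problems.append(f"stray whitespace in {word!r}")
    if len(set(words)) != len(words):
        problems.append("duplicate entries")
    if len(words) != expected_count:
        problems.append(f"expected {expected_count} entries, got {len(words)}")
    return problems


# Hypothetical three-word sample with one bad entry
sample = ["abstract", " protected", "public"]
print(check_keywords(sample, expected_count=3))
```

An empty result means the list passed all three checks.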
2. Searching the directory for Java files
This reuses the code from the first part, with a small modification.
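from os import listdir
import os.path as op

filelist = []  # paths of every .java file found

def search(path):
    """Recursively collect the .java files under path into filelist."""
    for file in listdir(path):
        filepath = op.join(path, file)  # portable, unlike a hard-coded "\\"
        if op.isdir(filepath):
            search(filepath)
        elif filepath.endswith('.java'):
            filelist.append(filepath)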
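As an alternative to the hand-written recursion, the standard library's os.walk already visits every subdirectory. The sketch below shows the same search written that way. It is not the code used for the results in this post, and the function name find_java_files is hypothetical.

```python
import os


def find_java_files(root):
    """Return the paths of all .java files under root, searched recursively."""
    matches = []
    # os.walk yields one (directory, subdirectories, files) triple per directory
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".java"):
                matches.append(os.path.join(dirpath, name))
    return matches
```

Returning a list instead of appending to a module-level filelist makes the function safe to call more than once.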
3. Counting
import re

def keyword_analyze(filepath):
    with open(filepath) as file:
        lines = file.read().split('\n')  # read the whole file at once and split it into lines
    for line in lines:
        # A line that starts with "/" (covers /* and //) or "*", or ends with "*/", is treated as a comment line
        noteline = re.match(r'^/(.*)|^\*(.*)|(.*)\*/$', line.strip(), flags=0)
        if noteline is None:  # this is a code line
            # Remove trailing comments and string literals from the code line
            codeline = re.sub(r'//(.*)$|/\*(.*)|\"(.*)\"', '', line)
            # Replace characters such as { , } . + - = with spaces
            filterline = re.sub(r'\W', ' ', codeline)
            for key in filterline.split(' '):
                if key in keywords:
                    keywords[key] += 1
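To see what the three regular expressions in keyword_analyze do to individual lines, here is a minimal sketch that applies them to one line at a time and returns the remaining tokens. The patterns are copied from the function above. The names tokens_in_line, COMMENT_LINE and TRAILING_NOISE are made up for this illustration.

```python
import re

# Same patterns as in keyword_analyze, compiled once
COMMENT_LINE = re.compile(r'^/(.*)|^\*(.*)|(.*)\*/$')
TRAILING_NOISE = re.compile(r'//(.*)$|/\*(.*)|\"(.*)\"')


def tokens_in_line(line):
    """Return the word tokens left after dropping comments and string literals."""
    if COMMENT_LINE.match(line.strip()):
        return []  # the whole line is treated as a comment
    codeline = TRAILING_NOISE.sub('', line)
    filterline = re.sub(r'\W', ' ', codeline)
    return filterline.split()  # split() with no argument drops empty strings


print(tokens_in_line('    public static final int x = 0; // new value'))
# ['public', 'static', 'final', 'int', 'x', '0']
print(tokens_in_line(' * return the value'))
# []
print(tokens_in_line('String s = "if else";'))
# ['String', 's']
```

The second and third calls show why the filtering matters: without it, return, if and else would be counted even though they appear only in a comment and a string literal.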
4. Testing
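root = r'E:\java\util'  # root directory
search(root)  # search for files and add them to filelist
for f in filelist:
    keyword_analyze(f)  # count the keywords in each file
# print the dict items sorted by count, highest first
print(sorted(keywords.items(), key=lambda x: x[1], reverse=True))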
[('if', 9021), ('int', 7994), ('return', 7994), ('public', 6709), ('new', 4486), ('final', 4214), ('private', 3315), ('static', 3175), ('this', 3175), ('long', 2632), ('boolean', 2312), ('else', 2173), ('void', 2171), ('throw', 1893), ('super', 1611), ('import', 1449), ('for', 1349), ('extends', 1348), ('class', 1142), ('while', 833), ('break', 760), ('try', 592), ('throws', 586), ('case', 482), ('implements', 447), ('synchronized', 420), ('byte', 419), ('double', 403), ('instanceof', 402), ('catch', 374), ('package', 364), ('char', 321), ('abstract', 299), ('finally', 245), ('assert', 214), ('do', 203), ('default', 174), ('transient', 171), ('volatile', 168), ('continue', 134), ('interface', 134), ('float', 131), ('short', 104), ('switch', 65), ('native', 54), ('enum', 12), ('const', 0), ('goto', 0), (' protected', 0), ('strictfp', 0)]
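Note that the output above comes from a run in which the key was written as ' protected' (with a leading space), so it never matched any token and its count stayed at 0.
The util folder used for testing can be downloaded from Baidu Netdisk (百度网盘).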
Improvement
filterline.split(' ') produces many empty strings. Removing them before checking whether each token is a key should speed things up.
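One way to get rid of the empty strings is to call split() with no argument, which treats any run of whitespace as a single separator. The sketch below shows the difference and counts keywords with collections.Counter. It is only an illustration of the idea, not a measured optimization, and the function name count_keywords is hypothetical.

```python
from collections import Counter


def count_keywords(filterline, keywords):
    """Count the tokens of filterline that are keys of keywords."""
    tokens = filterline.split()  # no argument: whitespace runs collapse, no empty strings
    return Counter(tok for tok in tokens if tok in keywords)


line = "  if  x   return  if "
print(line.split(' '))  # contains empty strings between consecutive spaces
print(line.split())     # only the real tokens
print(count_keywords(line, {"if": 0, "return": 0}))
```

The per-line Counter could then be added into a running total with Counter.update.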