# Extract mainland-China mobile numbers and 18-character ID card numbers from test2.html.
# Requires GNU sed (uses \| alternation and \n in the replacement text).
sed -e 's/\(^\|[^0-9]\)\(13[0-9][0-9]\{8\}\|14[579][0-9]\{8\}\|15[0-35-9][0-9]\{8\}\|16[6][0-9]\{8\}\|17[0135678][0-9]\{8\}\|18[0-9][0-9]\{8\}\|19[89][0-9]\{8\}\)\($\|[^0-9]\)/\nfind_phone:\2\n/g' test2.html |
sed -e 's/\(^\|[^0-9]\)\([0-9]\{6\}[1-2][0-9]\{3\}\(\(0[1-9]\)\|\(10\|11\|12\)\)\(\([0-2][1-9]\)\|10\|20\|30\|31\)[0-9]\{3\}[0-9Xx]\)\($\|[^0-9]\)/\nfind_idcard:\2\n/g' |
awk '/^find_/{printf "%s\t", $1}'
Contents of the test file test2.html:
dddd
bbb131102198910084421ccc eee13611112222fff13133334444
h15855556666j
aaaa
13177778888
13199990000
18611112222
370785199507319527
Test result (all matches on one line, separated by tabs):
find_idcard:131102198910084421 find_phone:13611112222 find_phone:13133334444 find_phone:15855556666 find_phone:13177778888 find_phone:13199990000 find_phone:18611112222 find_idcard:370785199507319527
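
The command above depends on GNU sed. As a rough alternative, and not the method used above, the same idea can be sketched by first splitting the input into maximal digit runs and then classifying each run. The function name extract_numbers is hypothetical, and the sketch assumes a grep that supports -o (GNU and BSD grep do). It prints one match per line instead of a single tab-separated line.

```shell
# Hypothetical helper: list phone and ID card candidates found in the file "$1".
extract_numbers() {
    # Step 1: emit every maximal run of digits, with an optional trailing X/x
    # (the last character of an ID card number may be X).
    grep -oE '[0-9]+[Xx]?' "$1" | awk '
        {
            # A run such as 13611112222x is a phone number followed by a letter.
            digits = $0
            sub(/[Xx]$/, "", digits)
        }
        # Phone: exactly 11 digits with one of the prefixes from the sed pattern.
        length(digits) == 11 && digits ~ /^1(3[0-9]|4[579]|5[0-35-9]|66|7[0135678]|8[0-9]|9[89])/ {
            print "find_phone:" digits
            next
        }
        # ID card: 18 characters; characters 7-14 must look like a date YYYYMMDD.
        length($0) == 18 && substr($0, 7) ~ /^[12][0-9][0-9][0-9](0[1-9]|1[0-2])(0[1-9]|[12][0-9]|3[01])/ {
            print "find_idcard:" $0
        }
    '
}

# Usage: extract_numbers test2.html
```

Because each token is a complete digit run, no explicit check of the neighbouring characters is needed. Like the sed version, this only checks the shape of the number and does not verify the ID card check digit.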