Python Challenge Level 2 Walkthrough: ocr
Puzzle URL
http://www.pythonchallenge.com/pc/def/ocr.html
Puzzle content
recognize the characters. maybe they are in the book,
but MAYBE they are in the page source.
General tips:
- Use the hints. They are helpful, most of the times.
- Investigate the data given to you.
- Avoid looking for spoilers.
Forums: Python Challenge Forums, read before you post.
IRC: irc.freenode.net #pythonchallenge
To see the solutions to the previous level, replace pc with pcc, i.e. go to: http://www.pythonchallenge.com/pcc/def/ocr.html
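Solution
The hint tells us to look at the page source.
The source contains a hint followed by a long block of jumbled characters. The hint reads: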
find rare characters in the mess below:
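The script below fetches the page, uses a regular expression to pull out the jumbled block, then extracts the letters from it and prints them: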
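from urllib import request
import re
# Fetch the HTML source and decode the bytes into text
url = 'http://www.pythonchallenge.com/pc/def/ocr.html'
response = request.urlopen(url)
text = response.read().decode('utf-8')
# Collect the HTML comments; re.DOTALL lets '.' match across newlines
pattern = re.compile(r'<!--(.+?)-->', re.DOTALL)
result = pattern.findall(text)
# The first comment is the hint, the second is the jumbled block
result = result[1]
# Find the letters (newlines are skipped because they are not letters)
characters = re.findall(r'[a-zA-Z]+', result)
msg = ''.join(characters)
print(msg)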
The output is equality. As in the previous level, substitute it into the URL and open it in the browser:
http://www.pythonchallenge.com/pc/def/equality.html
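The hint "find rare characters" can also be read literally, as a frequency count rather than a letter filter. The following is a minimal sketch of that alternative, not the method used in the script above. The helper name `rare_characters`, the `max_count` threshold, and the sample string are all made up for illustration; the real block would come from the page source.

```python
from collections import Counter


def rare_characters(mess, max_count=1):
    """Return the characters that occur at most `max_count` times, in original order."""
    counts = Counter(mess)
    return ''.join(ch for ch in mess if counts[ch] <= max_count)


# Hypothetical stand-in for the jumbled block: every symbol repeats,
# while each hidden letter appears exactly once.
sample = "#$%^" * 3 + "h" + "&*()" * 3 + "i" + "#$%^" * 2

print(rare_characters(sample))
```

This approach makes no assumption that the hidden message consists of letters, but it does depend on choosing a sensible threshold for "rare", which may need adjusting for other inputs.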