Python Challenge Level 2 Walkthrough: ocr


Challenge URL
http://www.pythonchallenge.com/pc/def/ocr.html


Challenge text

recognize the characters. maybe they are in the book,
but MAYBE they are in the page source.

General tips:
- Use the hints. They are helpful, most of the times.
- Investigate the data given to you.
- Avoid looking for spoilers.

Forums: Python Challenge Forums, read before you post.
IRC: irc.freenode.net #pythonchallenge

To see the solutions to the previous level, replace pc with pcc, i.e. go to: http://www.pythonchallenge.com/pcc/def/ocr.html


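Solution
The hint says to look at the page source.
In the source there is a hint followed by a long block of seemingly random characters. The hint reads:
find rare characters in the mess below:

The code below fetches the page, uses a regular expression to pull the mess out of its HTML comment, then extracts the letters from it and prints them: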

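from urllib import request
import re

# Fetch the HTML source and decode the bytes to text
url = 'http://www.pythonchallenge.com/pc/def/ocr.html'
with request.urlopen(url) as response:
    text = response.read().decode('utf-8')

# Collect the HTML comments; re.DOTALL lets '.' match across line breaks
pattern = re.compile(r'<!--(.+?)-->', re.DOTALL)
comments = pattern.findall(text)
# The second comment holds the mess (the first one is the hint)
mess = comments[1]

# Find the letters; line breaks are not letters, so they need no special handling
characters = re.findall(r'[a-zA-Z]', mess)
msg = ''.join(characters)
print(msg)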

The result is equality. As in the earlier levels, put it into the URL and open it in the browser:
http://www.pythonchallenge.com/pc/def/equality.html
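
The hint asks for rare characters rather than letters specifically, so another way to read it is to count how often each character occurs and keep only the ones that appear once. The following is a minimal sketch of that idea, not the method used above. The function name rare_characters, its max_count parameter, and the sample string are made up for illustration, and it assumes that each answer letter appears only once in the mess.

```python
from collections import Counter


def rare_characters(mess, max_count=1):
    """Return the non-whitespace characters of `mess` that occur
    at most `max_count` times, in their original order."""
    counts = Counter(mess)
    return ''.join(
        ch for ch in mess
        if counts[ch] <= max_count and not ch.isspace()
    )


# Made-up sample standing in for the real mess from the page source:
# every symbol appears twice, every letter appears once.
sample = "%%$$@@e##!!q^^&&u**((a))__l++\n\n"
print(rare_characters(sample))
```

To try it on the real page, pass in the text of the second HTML comment in place of sample.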

Reposted from blog.csdn.net/jpch89/article/details/81270637