Python Challenge Level 4 Walkthrough: follow the chain
Challenge URL
http://www.pythonchallenge.com/pc/def/linkedlist.php
Challenge content
Solution
- The page title is "follow the chain".
- The page URL contains linkedlist, i.e. a linked list.
- The image also shows a chain.
First, view the page source, which contains this comment:
<!-- urllib may help. DON'T TRY ALL NOTHINGS, since it will never
end. 400 times is more than enough. -->
The comment suggests using the urllib library and warns against trying all the nothings, because the chain never ends; 400 requests are more than enough.
The a tag contains a link. Clicking the image leads to the following URL:
http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345
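The page content is: and the next nothing is 44827
Changing the value of nothing should take us to the next page. Let's try it with the urllib library and see what responses come back over 400 requests.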
from urllib.request import urlopen
import re

suffix = '12345'
contents = []
contents.append(suffix + '\n')
url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
for i in range(400):
    response = urlopen(url)
    # str() of the bytes looks like b'...', so take the text between the quotes
    html = str(response.read())
    content = re.search(r"'(.+)'", html).group(1)
    try:
        # Take the first number on the page as the next nothing
        suffix = re.search(r'\d+', content).group()
    except AttributeError:
        # No number on the page: print the text and stop
        print(content)
        break
    contents.append(suffix + '\n')
    url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
    print(suffix)
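The pattern `r"'(.+)'"` in the script above is needed because `str()` on a `bytes` object returns its repr, including the `b'...'` wrapper. Below is a minimal sketch of the difference. The sample body is hard-coded (an assumption standing in for `response.read()`) so that it runs without network access; decoding the bytes is shown as an alternative that avoids the wrapper.

```python
# Sample body standing in for response.read(); hard-coded so no network is needed.
raw = b'and the next nothing is 44827'

as_repr = str(raw)             # repr of the bytes object, wrapper included
as_text = raw.decode('utf-8')  # plain text, no wrapper

print(as_repr)  # b'and the next nothing is 44827'
print(as_text)  # and the next nothing is 44827
```

With the decoded text, the number can be searched for directly, without first cutting the content out of the quotes.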
Partway through, the script raised an error. After adding exception handling and printing the content that caused it, we get:
Yes. Divide by two and keep going.
In other words, the next nothing is the previous number divided by two, i.e. 16044 / 2 = 8022, and the loop continues from there.
The code is modified accordingly:
from urllib.request import urlopen
import re

suffix = '12345'
contents = []
contents.append(suffix + '\n')
url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
for i in range(400):
    response = urlopen(url)
    # str() of the bytes looks like b'...', so take the text between the quotes
    html = str(response.read())
    content = re.search(r"'(.+)'", html).group(1)
    try:
        # Take the first number on the page as the next nothing
        suffix = re.search(r'\d+', content).group()
    except AttributeError:
        # No number on the page: record the text, then halve the previous number
        print(content)
        contents.append(content + '\n')
        suffix = str(int(suffix) // 2)
    contents.append(suffix + '\n')
    url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
    print(suffix)
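The divide-by-two step can be checked in isolation. This is a small sketch using the value 16044 from the text; `halve` is a hypothetical helper name, and it uses floor division `//`, which stays in integer arithmetic.

```python
def halve(suffix: str) -> str:
    """Return half of a numeric suffix, as a string (hypothetical helper)."""
    return str(int(suffix) // 2)


print(halve('16044'))  # 8022
```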
The script failed again. The last numeric suffix before the error was 82683, so visit the following URL:
http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=82683
The page shows this message: You've been misleaded to here. Go to previous one and check.
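One plausible reason the script stopped on this page (an inference from how Python prints bytes, not something the original states): the message contains an apostrophe, so `str()` wraps it in double quotes, and the pattern `r"'(.+)'"` no longer finds a pair of single quotes. A minimal offline sketch with the message hard-coded:

```python
import re

# The message from the page, hard-coded as bytes so no network is needed.
raw = b"You've been misleaded to here. Go to previous one and check."

html = str(raw)                     # repr switches to double quotes: b"You've ..."
match = re.search(r"'(.+)'", html)  # only one single quote is present

print(html.startswith('b"'))  # True
print(match)                  # None, so calling .group(1) would raise AttributeError
```

Since that `.group(1)` call sits outside the `try` block in the script, the exception would not be caught there.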
Following the hint, go back to the previous one: http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=82682
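The page shows: There maybe misleading numbers in the text. One example is 82683. Look only for the next nothing and the next nothing is 63579
So a page may contain decoy numbers such as 82683, and only the number that follows "next nothing is" should be used.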
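The effect of the decoy can be reproduced offline by running two patterns against the text of that page. This is a small sketch; the variable names are chosen here for illustration.

```python
import re

text = ('There maybe misleading numbers in the text. One example is 82683. '
        'Look only for the next nothing and the next nothing is 63579')

# A bare \d+ grabs the first number it sees, which is the decoy.
first_number = re.search(r'\d+', text).group()
# Anchoring on the phrase "next nothing is" picks the intended number.
next_nothing = re.search(r'next nothing is (\d+)', text).group(1)

print(first_number)  # 82683
print(next_nothing)  # 63579
```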
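To extract the correct number, the regular expression has to be changed. I also added code that writes the collected text to a file for later reference. The modified code is as follows: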
from urllib.request import urlopen
import re

suffix = '12345'
contents = []
contents.append(suffix + '\n')
url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
for i in range(400):
    response = urlopen(url)
    # str() of the bytes looks like b'...', so take the text between the quotes
    html = str(response.read())
    content = re.search(r"'(.+)'", html).group(1)
    try:
        # Only trust the number that follows "next nothing is"
        suffix = re.search(r'next nothing is (\d+)', content).group(1)
    except AttributeError:
        # No next nothing on the page: record the text, then halve the previous number
        print(content)
        contents.append(content + '\n')
        suffix = str(int(suffix) // 2)
    contents.append(suffix + '\n')
    url = f'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing={suffix}'
    print(suffix)

# Save everything that was collected for later inspection
with open('level4.txt', 'w', encoding='utf-8') as fp:
    fp.writelines(contents)
Checking the output reveals peak.html. Change the URL accordingly to enter the next level:
http://www.pythonchallenge.com/pc/def/peak.html
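As a recap, the whole approach can be sketched as one function that stops as soon as a page contains neither a next nothing nor the divide-by-two hint. This is a sketch under assumptions, not the original script: `fetch`, `follow_chain`, and the `get_page` parameter are hypothetical names, the page is decoded as UTF-8, and the demonstration at the bottom uses a made-up chain so that it runs without network access.

```python
import re
from urllib.request import urlopen

BASE = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='


def fetch(suffix):
    """Download the page for one nothing value and return it as text."""
    with urlopen(BASE + suffix) as response:
        return response.read().decode('utf-8')


def follow_chain(start, get_page=fetch, limit=400):
    """Follow the chain from start; return the first body that is not a link."""
    suffix = start
    for _ in range(limit):
        body = get_page(suffix)
        match = re.search(r'next nothing is (\d+)', body)
        if match:
            suffix = match.group(1)
        elif 'Divide by two' in body:
            suffix = str(int(suffix) // 2)
        else:
            return body  # neither a link nor the divide hint
    return None  # limit reached without finding an end


# Offline demonstration with a made-up chain (these numbers are not from the site).
fake_pages = {
    '1': 'and the next nothing is 8',
    '8': 'Yes. Divide by two and keep going.',
    '4': 'One example is 99. Look only for the next nothing and the next nothing is 5',
    '5': 'peak.html',
}
print(follow_chain('1', get_page=fake_pages.__getitem__))  # peak.html
```

Passing the page getter as a parameter keeps the loop testable offline. Calling `follow_chain('12345')` would use the real site instead, which is not exercised here.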