In the previous section we learned how to use Fiddler for packet capture analysis, so let's work through a small example.
Crawling the comments on Tencent Video
First, open the Tencent Video website: https://v.qq.com/
Open the [TV series] column and pick a popular series to crawl. Here we use [Next stop, leave] as the example.
The comments on this series look like this:
Note the marked part in the picture above, [View More Comments].
Before clicking it, use the clear command in Fiddler to clear the previously captured sessions.
Type the command and press Enter.
Then click [View More Comments] and look at Fiddler again. A new entry with the small [JS] icon appears.
Right-click the marked entry in the picture above.
First click
Then choose [Copy] -> [Just Url].
The address is:
What happens if we open this address in the browser?
A single URL is not enough to reveal a pattern, so let's click [View More Comments] again.
Second click
Browser display
We paste the two addresses into Word and compare them. The result is as follows:
The yellow-highlighted part changes with no obvious pattern, while the red-highlighted part at the end increases by 1 with each click.
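The side-by-side comparison can also be done in code instead of in Word. The following is a minimal sketch, assuming the two captured URLs share the same path and differ only in their query string. The helper name diff_query_params and the second cursor value are made up for illustration; the first URL is a shortened form of the one used in the script in this section.

```python
from urllib.parse import urlsplit, parse_qsl


def diff_query_params(url_a, url_b):
    """Return {name: (value_in_a, value_in_b)} for each query parameter that differs."""
    params_a = dict(parse_qsl(urlsplit(url_a).query, keep_blank_values=True))
    params_b = dict(parse_qsl(urlsplit(url_b).query, keep_blank_values=True))
    changed = {}
    for name in sorted(set(params_a) | set(params_b)):
        if params_a.get(name) != params_b.get(name):
            changed[name] = (params_a.get(name), params_b.get(name))
    return changed


BASE = "https://video.coral.qq.com/varticle/2580302776/comment/v2"
first = BASE + "?orinum=10&pageflag=1&cursor=6394260346548095809&source=9&_=1524402700840"
# Hypothetical second capture: this cursor value is invented for the example.
second = BASE + "?orinum=10&pageflag=1&cursor=6394000000000000000&source=9&_=1524402700841"

for name, (old, new) in diff_query_params(first, second).items():
    print(name, old, "->", new)
```

Only the parameters that changed between the two clicks are printed, which makes it easy to see which ones need to be generated for each page.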
Is the yellow-highlighted part actually required? To find out, delete it from the URL, open the shortened URL in the browser, and check whether the response is the same as before. Once that is settled, we can move on to the code.
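Removing a parameter by hand is error-prone with a URL this long, so the edit can be scripted. This is a sketch only: drop_query_param is a hypothetical helper, the URL is shortened, and the parameter removed here (`_`) is chosen purely for illustration. Substitute whichever part was highlighted in your own comparison.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def drop_query_param(url, name):
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = ("https://video.coral.qq.com/varticle/2580302776/comment/v2"
       "?orinum=10&cursor=6394260346548095809&_=1524402700840")
# Illustrative choice of parameter; the rest of the URL is left untouched.
print(drop_query_param(url, "_"))
```

Opening the printed URL in the browser and comparing the response with the original one shows whether the removed parameter matters.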
The complete script:

import re
import urllib.error
import urllib.request

# Request header as a (name, value) tuple; the header name is "User-Agent", with a hyphen
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0")

# Custom opener that sends the header above with every request
opener = urllib.request.build_opener()
opener.addheaders = [headers]
urllib.request.install_opener(opener)

cursor_id = '6394260346548095809'
v_id = 1524402700840
url = "https://video.coral.qq.com/varticle/2580302776/comment/v2?callback=_varticle2580302776commentv2&orinum=10&oriorder=o&pageflag=1&cursor=" + cursor_id + "&scorecursor=0&orirepnum=2&reporder=o&reppageflag=1&source=9&_=" + str(v_id)

for i in range(0, 10):
    content = urllib.request.urlopen(url).read().decode("utf-8")
    # "last" is used as the cursor for the next page of comments
    patnext = '"last":"(.*?)"'
    nextid = re.compile(patnext).findall(content)[0]
    patcomment = '"content":"(.*?)",'
    comment_content = re.compile(patcomment).findall(content)
    for j in range(1, len(comment_content)):
        print(" -----The content of the " + str(i) + str(j) + " comment is: ")
        # print(eval("u" + "\'" + comment_content[j] + "\'"))
        try:
            # Comments arrive as \uXXXX escapes; decode them into readable text
            t1 = comment_content[j].encode('latin-1').decode('unicode_escape')
            print(t1)
        except Exception as e:
            print(" ***********This comment contains special characters ************ ")
    # Next page: new cursor, and the trailing number goes up by 1 per request
    url = "https://video.coral.qq.com/varticle/2580302776/comment/v2?callback=_varticle2580302776commentv2&orinum=10&oriorder=o&pageflag=1&cursor=" + nextid + "&scorecursor=0&orirepnum=2&reporder=o&reppageflag=1&source=9&_=" + str(v_id + i + 1)
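The regular expressions above work, but the response is JSON wrapped in a callback, so it can also be parsed with the json module, which decodes the \uXXXX escapes on its own. The sketch below is an alternative, not the method used in this section. The sample response is hypothetical: only the keys "last" and "content" come from the script above, and the surrounding layout (including the "comments" key) is invented. To stay independent of the real layout, the helper simply walks the whole structure.

```python
import json


def strip_jsonp(text):
    """Return the JSON text inside a callback(...) wrapper."""
    start = text.index("(") + 1
    end = text.rindex(")")
    return text[start:end]


def collect_values(node, key):
    """Recursively collect every string stored under `key` in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, str):
                found.append(v)
            else:
                found.extend(collect_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found


# Hypothetical response body; the real one has more fields and may be nested differently.
sample = ('_varticle2580302776commentv2({"data":{"last":"123",'
          '"comments":[{"content":"\\u4f60\\u597d"},{"content":"nice"}]}})')

data = json.loads(strip_jsonp(sample))
print(collect_values(data, "last"))
print(collect_values(data, "content"))
```

In the crawler, the string returned by urlopen(url).read().decode("utf-8") would take the place of sample, and the first value collected for "last" would become the next cursor.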