[Python3 Crawler] 16: Grabbing Tencent Video Comment Content

In the previous section we learned how to use Fiddler for packet-capture analysis. Let's put that to work on a simple example:

Grabbing the comment content of Tencent Video

First, open the official Tencent Video site: https://v.qq.com/


Open the [TV Series] section and pick an interesting series to crawl. Here we will crawl [Next Stop, Leave].

Scrolling down, we find the comments on this series:



Note the part marked in the screenshot above: [See more comments].

First, type the command clear in Fiddler to clear the previously captured sessions.


Enter the command and press Enter

Now click [See more comments] and switch back to Fiddler: a new session with the small [JS] icon appears.


Right-click the session marked above and choose [Copy] -> [Just Url].

First click

The copied address is:

https://video.coral.qq.com/varticle/2580302776/comment/v2?callback=_varticle2580302776commentv2&orinum=10&oriorder=o&pageflag=1&cursor=6392930402023585386&scorecursor=0&orirepnum=2&reporder=o&reppageflag=1&source=9&_=1524713312689

Let's paste this address into the browser and see what it returns:
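Before comparing URLs by eye, it can help to split the query string into its individual parameters. A minimal sketch with the standard library (using the first captured URL from above):

```python
from urllib.parse import urlparse, parse_qs

# The comment URL captured in Fiddler after the first click
url = ("https://video.coral.qq.com/varticle/2580302776/comment/v2"
       "?callback=_varticle2580302776commentv2&orinum=10&oriorder=o"
       "&pageflag=1&cursor=6392930402023585386&scorecursor=0"
       "&orirepnum=2&reporder=o&reppageflag=1&source=9&_=1524713312689")

# parse_qs returns a dict mapping each parameter name to a list of values
params = parse_qs(urlparse(url).query)
for name, value in params.items():
    print(name, "=", value[0])
```

This makes it obvious at a glance which parameters are fixed (orinum, oriorder, ...) and which ones are candidates for changing between requests (cursor, _).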


Since no pattern is apparent yet, let's click [See more comments] a second time.

Second click

The address is: https://video.coral.qq.com/varticle/2580302776/comment/v2?callback=_varticle2580302776commentv2&orinum=10&oriorder=o&pageflag=1&cursor=6394261147223571180&scorecursor=0&orirepnum=2&reporder=o&reppageflag=1&source=9

The browser displays:


Next we put the two addresses into Word and compared them parameter by parameter.


The comparison shows that the highlighted cursor value changes with no obvious pattern, while the final _ parameter simply increments by 1 (it looks like a millisecond timestamp).

So let's verify whether the irregular highlighted part is actually required: delete it, load the URL in the browser again, and check whether the result is the same. With that confirmed, we can start on the code.
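Based on that analysis, only the cursor really needs to change between requests. A minimal sketch of a URL builder, assuming (as the browser test above suggests) that the callback and timestamp parameters can be dropped:

```python
# URL template for one page of comments; only the cursor changes per page.
# Assumes the callback and _ (timestamp) parameters are not required.
BASE = ("https://video.coral.qq.com/varticle/2580302776/comment/v2"
        "?orinum=10&oriorder=o&pageflag=1&cursor={cursor}"
        "&scorecursor=0&orirepnum=2&reporder=o&reppageflag=1&source=9")

def comment_url(cursor):
    """Return the comment-page URL for the given cursor value."""
    return BASE.format(cursor=cursor)

print(comment_url("6392930402023585386"))
```

Keeping the template in one place means the paging loop below only has to swap in the next cursor value.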

import urllib.request
import urllib.error
import re

# A browser User-Agent header so the request is not rejected
headers = ("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0")
# Build a custom opener and install it globally
opener = urllib.request.build_opener()
opener.addheaders = [headers]
urllib.request.install_opener(opener)

cursor_id = '6394260346548095809'
v_id = 1524402700840
url = "https://video.coral.qq.com/varticle/2580302776/comment/v2?callback=_varticle2580302776commentv2&orinum=10&oriorder=o&pageflag=1&cursor=" + cursor_id + "&scorecursor=0&orirepnum=2&reporder=o&reppageflag=1&source=9&_=" + str(v_id)

for i in range(10):
    content = urllib.request.urlopen(url).read().decode("utf-8")
    # "last" holds the cursor of the next page of comments
    patnext = '"last":"(.*?)"'
    nextid = re.compile(patnext).findall(content)[0]
    patcomment = '"content":"(.*?)",'
    comment_content = re.compile(patcomment).findall(content)
    for j in range(len(comment_content)):
        print("----- Comment " + str(i) + "-" + str(j) + ": ")
        try:
            # The comment text arrives as \uXXXX escapes; decode it to readable text
            t1 = comment_content[j].encode('latin-1').decode('unicode_escape')
            print(t1)
        except Exception:
            print("*** This comment contains special characters ***")
    # Move on to the next page using the cursor we just extracted
    url = "https://video.coral.qq.com/varticle/2580302776/comment/v2?callback=_varticle2580302776commentv2&orinum=10&oriorder=o&pageflag=1&cursor=" + nextid + "&scorecursor=0&orirepnum=2&reporder=o&reppageflag=1&source=9&_=" + str(v_id + i)
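The regex approach above works, but the response is actually JSONP: JSON wrapped in a callback(...) call. A more robust alternative (not used in the original) is to strip the wrapper and parse the body with the json module, which also decodes the \uXXXX escapes for free. This sketch uses a small canned response mimicking the API's shape; the field names "last" and "content" come from the regexes above, while "oriCommList" is an assumed name for the comment array:

```python
import json
import re

# Canned JSONP response for illustration; "oriCommList" is an assumed field name
sample = ('_varticle2580302776commentv2({"data": {"last": "6394261147223571180", '
          '"oriCommList": [{"content": "\\u597d\\u770b"}]}})')

def parse_jsonp(text):
    """Strip the callback(...) wrapper and parse the inner JSON."""
    body = re.search(r'^[\w$]+\((.*)\)$', text, re.S).group(1)
    return json.loads(body)

data = parse_jsonp(sample)["data"]
print("next cursor:", data["last"])
for item in data["oriCommList"]:
    # json.loads has already turned \uXXXX escapes into real characters,
    # so no latin-1/unicode_escape round-trip is needed
    print("comment:", item["content"])
```

Parsing real JSON avoids the fragile regex-on-JSON patterns and the encode/decode trick entirely.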
