用python爬取快手评论（烂活新整）

前提，上次我用selenium写了一个抖音直播评论获取，这次烂活新整，用python发送post请求获取快手的视频评论！

1.首先打开网页版的快手

在网页里面按下F12，打开开发者模式，点击网络，查看Fetch/XHR。看看里面的请求。

找到一个叫graphql的请求。这个就是评论的请求。我们点击进去，然后查看预览。可以看到如下效果。

返回的是一个json数据，这下就好办了。我们现在只要模仿浏览器给快手服务器发送请求就行了。

2.发送请求给快手服务器

首先，我们来看一下这个是个什么请求，是个get请求还是post请求。（一般都是post）点击表头。查看请求。

我们可以看到，这是个post请求，请求url是https://www.kuaishou.com/graphql，发送post请求都是要带一些表单数据的。

看看表单数据要传输哪些数据

可以看到这个请求负载里的数据就是我们要传输的数据了，这个请求负载不同于Form data的。Form data传输的一个字典，而 Request Payload是一个json数据。这个有所不同，如果你此时还是传入一个dict数据的话，是不会返回数据的。必须传入json数据。那么我现在就可以用python发送请求了。做到这里一切都很顺利！！

3.用python发送请求

import requests
import json
headers = {
  'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36 Edg/94.0.992.31',
  'Cookie': 'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; did=web_1e83306028e4691a0acb7f78ecfbf145; didv=1632459485000; client_key=65890b29; Hm_lvt_86a27b7db2c5c0ae37fee4a8a35033ee=1632459683; userId=153866369; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABSkrOjzdufZa4lBYXxddWpU-Da_CebI6GVHm2vvSBBxRUv9kJ_BKaJ3weX3FWf3acdLw2yy6uCM1MpHB8Pfi7EnJBlcRb0GqyXYMpPlaCdMqKlVo0PI-iY9zN0wdxA89wOXtml-fWR7CFTT54hsd3PZTsTwlWBA7vhJvwim07-A1RaTmwi66PbBkj7eCrkmJX0hdCln9MiFVyA_CL44TScBoSuDcrlwmr6APhXfdZrBO5uo0FIiA8xE-BzwPs3Wp_Q9mI4y5GcZxo1E-B0xr4CkR4zohqJigFMAE; kuaishou.server.web_ph=eeb2272fa742f8d8f79bd635c3e8044ac1f8'}
data = {
  "operationName": "commentListQuery",
  "variables": {
    "photoId": "3xhcrvum26yz3vg",
    "pcursor": ''
  },
  "query": "query commentListQuery($photoId: String, $pcursor: String) {\n visionCommentList(photoId: $photoId, pcursor: $pcursor) {\n   commentCount\n   pcursor\n   rootComments {\n     commentId\n     authorId\n     authorName\n     content\n     headurl\n     timestamp\n     likedCount\n     realLikedCount\n     liked\n     status\n     subCommentCount\n     subCommentsPcursor\n     subComments {\n       commentId\n       authorId\n       authorName\n       content\n       headurl\n       timestamp\n       likedCount\n       realLikedCount\n       liked\n       status\n       replyToUserName\n       replyTo\n       __typename\n     }\n     __typename\n   }\n   __typename\n }\n}\n"
}
conment = requests.post('https://www.kuaishou.com/graphql', headers=headers, json=data)
conments = json.loads(conment.text)
print(conments)

此时，就一个成功返回json数据了。

但是，好像有个问题，只有28条，返回数据不全，what，fuck！，看来是遇到问题了。

没关系，我又重新运行了几次，还是不行。我又回到快手页面。又打开开发者模式。开始找问题。突然我发现，有好多个叫graphql的请求。此时，我就猜想，是不是这个是动态加载的，你将评论往下滑，它才会加载。取好几个graphql的请求数据进行对比，发现只有photoId和pcursor其中的数据不同。经过我的仔细查找发现，这个photoId就是这个视频的id，那个网页地址上都有。不同的视频photold才不同。行了，photold解决了，那pucursor呢，这个是个什么鬼？我看了几个graphql的传输的数据，除了第一个graphql的pucursor是空的。其他的都全是数字。然后我打印了那返回不全的json的数据。发现了一个惊人的秘密。pucursor里面数据是上一个graphql返回的数据里面最后一个人的id。解密了！！！，他是动态渲染的，pucursor接收上一个发评论的人的id来发送下一个请求，从而请求数据。

4.最后来请求完整数据

先写好主程序，最后来个递归就解决一切，评论顺利到手！！！

import requests
import json
import sys
lists = []
def text(w):
 ww =0
 list = []
 headers = {
   'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36 Edg/94.0.992.31',
   'Cookie': 'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; did=web_1e83306028e4691a0acb7f78ecfbf145; didv=1632459485000; client_key=65890b29; Hm_lvt_86a27b7db2c5c0ae37fee4a8a35033ee=1632459683; userId=153866369; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABSkrOjzdufZa4lBYXxddWpU-Da_CebI6GVHm2vvSBBxRUv9kJ_BKaJ3weX3FWf3acdLw2yy6uCM1MpHB8Pfi7EnJBlcRb0GqyXYMpPlaCdMqKlVo0PI-iY9zN0wdxA89wOXtml-fWR7CFTT54hsd3PZTsTwlWBA7vhJvwim07-A1RaTmwi66PbBkj7eCrkmJX0hdCln9MiFVyA_CL44TScBoSuDcrlwmr6APhXfdZrBO5uo0FIiA8xE-BzwPs3Wp_Q9mI4y5GcZxo1E-B0xr4CkR4zohqJigFMAE; kuaishou.server.web_ph=eeb2272fa742f8d8f79bd635c3e8044ac1f8'}
 data = {
   "operationName": "commentListQuery",
   "variables": {
     "photoId": "3xhcrvum26yz3vg",
     "pcursor": str(w)
  },
   "query": "query commentListQuery($photoId: String, $pcursor: String) {\n visionCommentList(photoId: $photoId, pcursor: $pcursor) {\n   commentCount\n   pcursor\n   rootComments {\n     commentId\n     authorId\n     authorName\n     content\n     headurl\n     timestamp\n     likedCount\n     realLikedCount\n     liked\n     status\n     subCommentCount\n     subCommentsPcursor\n     subComments {\n       commentId\n       authorId\n       authorName\n       content\n       headurl\n       timestamp\n       likedCount\n       realLikedCount\n       liked\n       status\n       replyToUserName\n       replyTo\n       __typename\n     }\n     __typename\n   }\n   __typename\n }\n}\n"
}
 conment = requests.post('https://www.kuaishou.com/graphql', headers=headers, json=data)
 conments = json.loads(conment.text)
 #print(conments['data'])
 s = len(conments['data']['visionCommentList']['rootComments']) - 1
 #print(conments['data']['visionCommentList']['rootComments'][s]['commentId'])
 for ii in range(0,s):
   #print(len(conments['data']['visionCommentList']['rootComments']))
   #print(conments['data']['visionCommentList']['rootComments'][s])
   print(conments['data']['visionCommentList']['rootComments'][ii]['content'])
 www = conments['data']['visionCommentList']['rootComments'][ii]['content']
 w = conments['data']['visionCommentList']['rootComments'][ii]['commentId']
 if len(lists)!=-1 :
   lists.append(www)
   print('111111')
   if len(lists)!=1:
     a = lists[-2]
     if lists[-1] == a and len(lists) != 1:
       print("评论已经爬完")
       sys.exit()
 for a in range(0,len(list)):
   ww +=list[a]
   print(ww)
 return w
w = ''
while w!=1:
 w = text(w)

兄弟们，今天就到这里了，剩余就靠你们自己发挥了，（偷偷告诉你们哦，这个不止可以对快手哦，好多网站都可以这样）。但是还是遵守法律法规。下次教兄弟们怎么用python群发消息，也是post请求哦！