Common anti-crawling mechanisms and solutions

A summary of the anti-crawling mechanisms I have run into at work, with the ideas and solutions that dealt with them, kept here for future reference

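1, User-Agent: the client version information sent in the request headers
2, Request method: sites differ in which one they expect; the common ones are GET and POST
+ POST requests sometimes need form data like this: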
  formData = {
      '__EVENTVALIDATION': eventAliation,
      '__VIEWSTATE': viewState,
      '__EVENTTARGET': eventTaget,
      # 'pageIndex': int(pageIndex) + 1,
  }
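
Form fields like these usually have to be read from the previous response before they can be posted back. A minimal sketch, assuming they appear as hidden input tags in the page; the sample HTML, the helper names and the `lnkNextPage` target are made up for illustration:

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_form_data(page_html, event_target):
    """Build the POST body from the hidden fields of the previous page."""
    parser = HiddenFieldParser()
    parser.feed(page_html)
    return {
        "__EVENTVALIDATION": parser.fields.get("__EVENTVALIDATION", ""),
        "__VIEWSTATE": parser.fields.get("__VIEWSTATE", ""),
        "__EVENTTARGET": event_target,
    }


# Hypothetical fragment standing in for the previous response body.
sample_html = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />
  <input type="hidden" name="__EVENTVALIDATION" value="AbCdEf" />
</form>
"""
form_data = build_form_data(sample_html, "lnkNextPage")
```

The resulting dictionary can then be sent as the body of the next POST request.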
3, Cookie restrictions
4, Access frequency limits: add a delay between requests
5, IP restrictions: use proxy IPs
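
Items 4 and 5 can be combined in one loop. A rough sketch, assuming a non-empty proxy list and a caller-supplied `fetch(url, proxy)` function; the function name, the delay bounds and the sample addresses are placeholders, not values from a real project:

```python
import itertools
import random
import time


def crawl_politely(urls, fetch, proxies, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Fetch each URL through the next proxy, pausing a random interval between requests."""
    proxy_cycle = itertools.cycle(proxies)  # assumes proxies is not empty
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Random delay so requests do not arrive at a fixed rhythm.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url, next(proxy_cycle)))
    return results


def fake_fetch(url, proxy):
    """Stand-in for a real HTTP call; only reports which proxy was used."""
    return f"{url} via {proxy}"


pages = crawl_politely(
    ["http://example.com/1", "http://example.com/2", "http://example.com/3"],
    fake_fetch,
    ["10.0.0.1:8080", "10.0.0.2:8080"],
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
```

In real use `fetch` would wrap the HTTP client. With requests, for instance, a shared `requests.Session()` keeps cookies between calls (item 3) and accepts the proxy through its `proxies` argument.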
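6, Tampering in the HTML or jQuery
+ The site inserts meaningless characters into the markup
+ Parse with lxml, then filter with regular expressions or another screening method
+ Inspect the page source for the target information and filter it out according to the rules the page actually follows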
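
As an illustration of the filtering step, here is a small sketch that uses only regular expressions (no lxml). The HTML fragment and its rule, filler wrapped in `<i>` tags around the real digits, are invented; a real page needs its own rule worked out from the source:

```python
import re

# Hypothetical fragment: the real price is 1280, padded with filler elements.
sample_html = '<div class="price">1<i>#</i>2<i>@x</i>8<i>%</i>0</div>'


def extract_price(html):
    """Return the digits inside the price div, or None if nothing is found."""
    match = re.search(r'<div class="price">(.*?)</div>', html, re.S)
    if match is None:
        return None
    inner = re.sub(r"<i>.*?</i>", "", match.group(1))  # drop the filler elements
    inner = re.sub(r"<[^>]+>", "", inner)  # drop any remaining tags
    digits = re.sub(r"\D", "", inner)  # keep digits only
    return digits or None


price = extract_price(sample_html)
```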
7, Ajax dynamic loading, where the actual data comes back as JSON
+ Just load and parse it with json.loads(html.text); simple
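
For example, with a made-up response body standing in for `html.text` (the field names `data`, `items`, `name` and `score` are assumptions, not taken from any real site):

```python
import json

# Hypothetical Ajax response body; in a real crawl this would be html.text.
response_text = (
    '{"code": 0, "data": {"items": ['
    '{"name": "shop A", "score": 4.5}, {"name": "shop B", "score": 3.9}]}}'
)


def parse_items(text):
    """Decode the JSON body and pull out (name, score) pairs."""
    payload = json.loads(text)
    items = payload.get("data", {}).get("items", [])
    return [(item["name"], item["score"]) for item in items]


rows = parse_items(response_text)
```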
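8, Meituan: digits and text obfuscated with a custom WOFF font
+ Find the target woff file, load the contents of the font, then take a screenshot (for example with QQ's screenshot tool) and use image recognition to identify the characters
+ Load the font with `from fontTools.ttLib import TTFont` and build a dictionary,
+ after fetching the page content, replace the obfuscated parts using that dictionary,
+ then parse with XPath or some other method
+ I have heard that some sites serve a new woff font for every subpage. I have not run into this nastier kind of anti-crawling yet, and will deal with it when I do.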
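
The font steps above can be sketched roughly as follows. This assumes the page carries the obfuscated characters as `&#x....;` entities and that the glyph-to-character dictionary has already been built by hand from the screenshots; the function names, code points and sample HTML are hypothetical:

```python
import re


def build_codepoint_map(font_path, glyph_to_char):
    """Map each obfuscated code point in the font to its real character.

    glyph_to_char is the hand-made dictionary, e.g. {"uniE137": "4"}.
    """
    from fontTools.ttLib import TTFont  # third-party package: fonttools

    font = TTFont(font_path)
    cmap = font.getBestCmap() or {}  # {code point: glyph name}
    return {cp: glyph_to_char[name] for cp, name in cmap.items() if name in glyph_to_char}


def decode_entities(html, codepoint_map):
    """Replace the entities the map covers and leave all others untouched."""

    def swap(match):
        code_point = int(match.group(1), 16)
        return codepoint_map.get(code_point, match.group(0))

    return re.sub(r"&#x([0-9a-fA-F]+);", swap, html)


# Hypothetical map, shaped like the result of build_codepoint_map.
codepoint_map = {0xE137: "4", 0xF0A2: "9"}
decoded = decode_entities('<span class="num">&#xe137;&#xf0a2;.&#xe137;</span>', codepoint_map)
```

After this replacement the page can be parsed with XPath as usual. If the font changes per page, the map has to be rebuilt for each font file.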




Anti-crawling measures encountered but not yet solved

1, Getting a store's contact phone number on Dianping requires logging in. The session got closed and the request never reached the target page. Can this only be solved with multiple accounts?
2, Querying recruitment listings on 58: requests through a proxy IP return no content. Perhaps the proxy IP pool is not big enough; to be revisited in a later round of optimization.
3, 58's gesture CAPTCHA. I have not had time to work on this yet and will see how it goes later.
4, Slider CAPTCHA cracking: normally you compare the full background image against the image with the gap
+ But 58 only provides the image with the gap, which makes locating the gap difficult. I always used pixel comparison before; what should I do now?
5, Can the Sogou platform's CAPTCHA images not be solved with OCR? Is the image too small? Still to be studied.
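
For reference, the pixel comparison mentioned in item 4 can be sketched like this. It only covers the case where both the full background and the gapped image are available, so it does not solve the 58 case. Images are represented as rows of grayscale values, and the threshold of 60 is an arbitrary assumption:

```python
def find_gap_left_edge(full_image, gapped_image, threshold=60):
    """Return the x index of the first column where the two images differ noticeably.

    Both images are lists of rows of grayscale values (0-255) with the same size.
    """
    height = len(full_image)
    width = len(full_image[0])
    for x in range(width):
        for y in range(height):
            if abs(full_image[y][x] - gapped_image[y][x]) > threshold:
                return x
    return None


# Tiny synthetic example: a uniform 6x4 background with a dark gap at columns 3-4.
full = [[200] * 6 for _ in range(4)]
gapped = [row[:] for row in full]
for y in (1, 2):
    for x in (3, 4):
        gapped[y][x] = 50

gap_x = find_gap_left_edge(full, gapped)
```

The returned column is the distance the slider has to travel, up to whatever offset the slider piece itself starts from.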




Well-known anti-crawling mechanisms not yet encountered

Honeypots (hidden links or form fields that only a crawler would follow)

Origin blog.csdn.net/qq_22038327/article/details/104003158