Python crawler: font anti-crawling

First, what is font anti-crawling?

  Font anti-crawling maps the key data to other Unicode code points. The browser renders the key data normally using a font file loaded with the page, but when we copy and paste the data, crawl it, or parse it with the standard Unicode character mapping, we only get interference data. Take Maoyan Movies as an example:

 

 As the figure shows, the data rendered in the browser displays normally, while the same data in the debugging interface is incorrect. Even copying and pasting gives the same garbled result (presumably the copy carries the raw Unicode code points), which achieves the anti-crawling effect.
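A minimal sketch of what this looks like to a crawler; the code points below are made up for illustration, real sites map digits to their own private-use-area values:

# Hypothetical illustration: the HTML source carries private-use-area code points
# instead of real digits, so standard Unicode parsing yields no readable numbers.
scraped = "\ue893\ue64b.\ue183"           # made-up code points standing in for a rating
print([hex(ord(ch)) for ch in scraped])   # ['0xe893', '0xe64b', '0x2e', '0xe183']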

Second, the solution

  1, Find the corresponding font file

 Click the CSS file that the arrow points to.

 The arrow then points to the link of the font file we are looking for. We need to download this font file and analyze it to find the mapping relationship.

 

 If the font file is fixed, we can analyze it by hand and build a mapping table to decode the data; but if the font file changes on every request, this approach no longer works.
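A minimal sketch of the manual analysis, assuming the font file has been saved locally as maoyan.woff (a hypothetical filename) and using the fontTools library to inspect it:

from fontTools.ttLib import TTFont

# Load the downloaded font file (hypothetical local path)
font = TTFont("maoyan.woff")

# The cmap table maps Unicode code points to glyph names; inspecting it
# (or dumping with font.saveXML("maoyan.xml")) reveals which code point
# corresponds to which digit glyph.
cmap = font["cmap"].getBestCmap()
for codepoint, glyph_name in cmap.items():
    print(hex(codepoint), glyph_name)

# After identifying the glyphs by eye, build a mapping table
# (the values below are placeholders, not real Maoyan data)
mapping = {"\ue893": "9", "\ue64b": "5"}
decoded = "".join(mapping.get(ch, ch) for ch in "\ue893.\ue64b")
print(decoded)   # 9.5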

Refresh the link, download the font file again, and compare the two copies to see whether it changes.

 After the comparison, it is not hard to see that the two font files are completely different.
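A quick way to confirm this programmatically, as a sketch assuming the font URL below has been copied out of the CSS file, is to download the file twice and compare the glyph names:

import requests
from io import BytesIO
from fontTools.ttLib import TTFont

# Hypothetical font URL taken from the CSS file
font_url = "https://example.com/font.woff"

def fetch_glyph_order(url):
    resp = requests.get(url)
    resp.raise_for_status()
    return TTFont(BytesIO(resp.content)).getGlyphOrder()

first = fetch_glyph_order(font_url)
second = fetch_glyph_order(font_url)

# If the site regenerates the font per request, the glyph names
# (and therefore the code point -> digit mapping) differ each time.
print(first == second)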

  2, Bypassing the font anti-crawling

So far, the data I crawl can be divided into PC Web data, mobile Web data, and mobile APP data. Since the PC side uses font anti-crawling, we can try to start from the mobile side.

Start with the simpler mobile Web data. Using Selenium with a mobile browser's User-Agent, we can reproduce the mobile browser's view in a PC browser. The following figure shows that the mobile Web pages do not use font anti-crawling measures either.

from selenium import webdriver

# Use a mobile browser User-Agent so the PC browser is served the mobile page.
# Note: the option must be passed as --user-agent, otherwise Chrome ignores it.
mobile_ua = ("Mozilla/5.0 (Linux; U; Android 2.3.7; en-us; Nexus One Build/FRF91) "
             "AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1")

options = webdriver.ChromeOptions()
options.add_argument(f"--user-agent={mobile_ua}")

# Path to the local chromedriver executable
chrome = webdriver.Chrome(r"D:\chromedriver_win32\chromedriver.exe", options=options)

chrome.get("https://m.maoyan.com")

 Once the analysis is complete, we can write the crawler with requests + lxml.
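A minimal requests + lxml sketch under the same mobile User-Agent; the XPath below is a placeholder, the real expression depends on the page structure found during the analysis of m.maoyan.com:

import requests
from lxml import etree

mobile_ua = ("Mozilla/5.0 (Linux; U; Android 2.3.7; en-us; Nexus One Build/FRF91) "
             "AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1")

resp = requests.get("https://m.maoyan.com", headers={"User-Agent": mobile_ua})
resp.raise_for_status()

html = etree.HTML(resp.text)
# Placeholder XPath: replace it with the expression matching the data
# located during the analysis step.
titles = html.xpath('//div[@class="title"]/text()')
print(titles)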

Mobile APP data is what is usually called mobile APP crawling; refer to: https://www.cnblogs.com/loveprogramme/p/12209172.html

 
