Web Scraping and Font-Based Anti-Scraping (Part 2): Maoyan Box Office

Copyright notice: This is an original article by the author; do not repost it without the author's permission. https://blog.csdn.net/DataCastle/article/details/84764707

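Today's post covers another case of font-based anti-scraping: Maoyan Box Office. The analysis and code follow.

The site examined here: https://piaofang.maoyan.com/?ver=normal

On the page itself, the anti-scraping measure looks like this:

Looking at the page source, it turns out to look like this:

As in the previous case, there is a special <style> tag. Expanded, it looks like the figure below:

Unlike the previous case, the font file here is embedded as base64, so it only needs to be decoded. We then save the font file locally so that it is easy to inspect. The code is as follows: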

import base64
from io import BytesIO

from fontTools.ttLib import TTFont

font_face = 'd09GRgABAAAAAAgkAAsAAAAAC7gAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAABHU1VCAAABCAAAADMAAABCsP6z7U9TLzIAAAE8AAAARAAAAFZW7lS5Y21hcAAAAYAAAAC5AAACTDYdn/RnbHlmAAACPAAAA5UAAAQ0l9+jTWhlYWQAAAXUAAAALwAAADYTWqnsaGhlYQAABgQAAAAcAAAAJAeKAzlobXR4AAAGIAAAABIAAAAwGhwAAGxvY2EAAAY0AAAAGgAAABoGjAWybWF4cAAABlAAAAAfAAAAIAEZADxuYW1lAAAGcAAAAVcAAAKFkAhoC3Bvc3QAAAfIAAAAWgAAAI/KSrO5eJxjYGRgYOBikGPQYWB0cfMJYeBgYGGAAJAMY05meiJQDMoDyrGAaQ4gZoOIAgCKIwNPAHicY2Bk0mWcwMDKwMHUyXSGgYGhH0IzvmYwYuRgYGBiYGVmwAoC0lxTGBwYKr46Mev812GIYdZhuAIUZgTJAQDcEgtJeJzFkrENgzAQRb8DARJSpAw7pGMDpmIBMkHEIKkQDR17IGQJiQ6LmvzjaCKFNjnrWbpv63y6bwBHAB65Ex8wFQwkXlTNqns4r7qPB/MbrlQi5J3pC1vZdmjGeipd4tI5Wxbe2D/5FoYVvy058XDAiR36iBGwiwgh5WCn0g/C/O/pz7is+3PLYpJvsMVuQ3ztC4WThK0UzhS2VcT/oVHE/7FWpOZUKpw9XKLIX3CpQj8wZwrCN1IIQrAAAAB4nEWTS28aVxzF7x0ixsEE4zKPgBNgGMwMg23G88LAeCCMIfGTYgPGOCHGSghxm8S14tRJrDahDymp+gHSTaUuuom6yD6VqmbVpmq96Aeo1G13jZSNhXsH43QWI92rmXvO+f3PBRCAo3+ABAiAAZCQScJP8AA9sPd6i/0BwgCMkoySMGwJAyY0JcKG7DhnQIkiCTvugrjLhsO3XW7QMcInI6kiGZ3XMwuwfnrv9z0mRpgiL9HvDZTLAb83HleD4tz5qeuzcwVH6+ZOZXxRojM8M36WPvNOcxdpOgFg2GF0tqpZojLcrQXb/OzUCD+YxES/7q6EJK9Io89tyP8bDKB/CBAA4wB4IqqiWd5wP0QGQ5y1ThCULFneQ3YbQdFooR2vXny4/XJnK5fv/HkhWxBzisgyZuvCudBoKBqUyWj5kxL8nN96/+adhTZPXc1d2Tf0ZqHxvZIJBhpmtvuEyxMekuAeLZeO/R/9C4+Qlxhi6YdIReEQLjxBS5raB4eEZYn2Q8SuZ5ANRbjO0EXNqHBR3Rd2uJJrGU2ecdTcyVQ5JU2q0mTm4pP21f3Tv8znqvsc71iE6WkxY+SG6vFJ39naxjw1dLlw5bPtuoUPe8eDBfG+C0TQgNNQ4ey4JYd4yJKl7YIswsPBnjOSoClJ+2pQF4U057Lj0BsfS6w9+HRzZldP3ytWFM0B28tT6WpUuF/8QVdHDdWnjQycsgs+36OtW1/Mf915+l1lIl6B6YW1xlIhGlv9v0dH2CvgQbNRGRK1xY6zVpOsycbhAWvOyB7vwDocdgfS/iyD3a7kw837D7P1D4SWvncneTlykgt2Ua4gGEMnRnpgrf6RBopIHadB2Xp4KRoSPf6qNWT4jZMMK0JQoJ1nguvy6n7qWu720wXzo4qmOrvPuHxEKxXvlTFKoUfpQPL8ijY50WmZd6e/fXnQWBYnyt3XY5VYfXF2tdqb8RvsFPYzatvJjI8H62FIBu97sTgjsl86ZrVsrWrGTGIlD691/+aCM2zjcTL/8ea0MfAqn9t8Vo0EHHC7/BNFP76xcWlVm6qfZD3sdxp40N2DvVD9G2edjRotaW4ughgIXl97aSd9zu12ukauF2/ohXrpwYrAPwyPw2Znbqm8LmT1W5kWt7QyV3v94u4u3Ein5FxPB5E8xH4DDoDuH6MyKpSHZZIluWEbNLu/wsKlZrP21/MSPOiKpeeHaO9HAP4DCCXgwgA
AAHicY2BkYGAA4pbay2/j+W2+MnCzMIDADdnNUgj6/xsWBqbzQC4HAxNIFAA8RwrAAHicY2BkYGDW+a/DEMPCAAJAkpEBFfAAADNiAc14nGNhAIIUBgYmHeIwADeMAjUAAAAAAAAADABGAGAApgDoATABVAGYAcoB/gIaAAB4nGNgZGBg4GEwYGBmAAEmIOYCQgaG/2A+AwAOgwFWAHicZZG7bsJAFETHPPIAKUKJlCaKtE3SEMxDqVA6JCgjUdAbswYjv7RekEiXD8h35RPSpcsnpM9grhvHK++eOzN3fSUDuMY3HJyee74ndnDB6sQ1nONBuE79SbhBfhZuoo0X4TPqM+EWungVbuMGb7zBaVyyGuND2EEHn8I1XOFLuE79R7hB/hVu4tZpCp+h49wJt7BwusJtPDrvLaUmRntWr9TyoII0sT3fMybUhk7op8lRmuv1LvJMWZbnQps8TBM1dAelNNOJNuVt+X49sjZQgUljNaWroyhVmUm32rfuxtps3O8Hort+GnM8xTWBgYYHy33FeokD9wApEmo9+PQMV0jfSE9I9eiXqTm9NXaIimzVrdaL4qac+rFWGMLF4F9qxlRSJKuz5djzayOqlunjrIY9MWkqvZqTRGSFrPC2VHzqLjZFV8af3ecKKnm3mCH+A9idcsEAeJxtyTsOgCAQhOEdXyjiXQTEQClG7mJjZ+Lxjbutf/NlMlSRpOk/gwo1GrTooNBjgMYIg4nwqPs6Dx8Da2fLxlzYsK+fxSUxLI713sufo2ybeOe8Eb0TURdyAAA='
b = base64.b64decode(font_face)
font = TTFont(BytesIO(b))  # read the font from the in-memory bytes
cmap = font.getBestCmap()  # mapping of code point -> glyph name
with open(r'C:..\maoyan1.ttf', 'wb') as f:
    f.write(b)
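
The snippet above pastes the base64 string in by hand. As a minimal sketch of automating that step, the payload can be pulled out of the page's <style> block with a regular expression and the container type checked from its first four bytes. The function names and the assumed CSS shape (a `url(data:...;base64,...)` source inside `@font-face`) are illustrative assumptions, not something confirmed by the original page.

```python
import base64
import re

# Assumption: the font is inlined as url(data:...;base64,<payload>) in the CSS.
FONT_DATA_RE = re.compile(r"base64,([A-Za-z0-9+/=]+)")

# Well-known four-byte signatures at the start of font files.
SIGNATURES = {
    b"wOFF": "woff",
    b"wOF2": "woff2",
    b"OTTO": "otf",
    b"\x00\x01\x00\x00": "ttf",
}


def extract_font_bytes(html):
    """Return the decoded bytes of the first base64 payload found in html."""
    match = FONT_DATA_RE.search(html)
    if match is None:
        raise ValueError("no base64 font payload found")
    return base64.b64decode(match.group(1))


def sniff_font_format(data):
    """Guess the font container from its signature, or return 'unknown'."""
    return SIGNATURES.get(data[:4], "unknown")


# Made-up page fragment: a fake 12-byte payload that only carries the signature.
fake_payload = base64.b64encode(b"wOFF" + bytes(8)).decode()
sample_html = (
    '<style>@font-face{font-family:"cs";'
    "src:url(data:application/font-woff;charset=utf-8;base64,"
    + fake_payload
    + ') format("woff");}</style>'
)

font_bytes = extract_font_bytes(sample_html)
print(sniff_font_format(font_bytes))
```

The string in this article begins with d09GRg, which decodes to the bytes wOFF, so the file saved as maoyan1.ttf is a WOFF container. fontTools' TTFont reads both formats, so the rest of the analysis is unaffected.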

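That concludes the page analysis. Next comes the font analysis.

Open the downloaded font file; it looks like this:

As before, define a dictionary that maps each glyph code to its digit, as shown below:

On inspection, though, the keys in this dictionary follow no pattern and have nothing distinctive about them, so the only option is to analyze the XML dump of the font file. (The XML file can be generated by loading the font with fonttools and calling font.saveXML(). Two font files are needed here, with one XML dump for each.)

The code is as follows (b is defined in the code above):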

font = TTFont(BytesIO(b))
font.saveXML(r'C:..\maoyan1.xml')

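Here are two figures to give a feel for it:

These show the object for the same digit in two different XML files. The object for a given digit is identical in both.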
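
The observation above can also be checked directly on the XML dumps, without opening them by eye. The following is a rough sketch using only the standard library: it reads the `TTGlyph`, `contour`, and `pt` elements that saveXML() writes for the glyf table and pairs up glyph names whose outlines are identical. The function names and the miniature XML samples are made up for illustration; a real dump has far more points per glyph.

```python
import xml.etree.ElementTree as ET


def glyph_outlines(ttx_xml):
    """Map glyph name -> outline, read from the text of a saveXML() dump."""
    outlines = {}
    for glyph in ET.fromstring(ttx_xml).iter("TTGlyph"):
        outline = tuple(
            tuple((pt.get("x"), pt.get("y"), pt.get("on")) for pt in contour.iter("pt"))
            for contour in glyph.iter("contour")
        )
        if outline:  # glyphs without contours carry nothing to compare
            outlines[glyph.get("name")] = outline
    return outlines


def shared_outlines(xml_a, xml_b):
    """Return {glyph name in A: glyph name in B} for identical outlines."""
    by_outline = {outline: name for name, outline in glyph_outlines(xml_a).items()}
    return {
        by_outline[outline]: name
        for name, outline in glyph_outlines(xml_b).items()
        if outline in by_outline
    }


# Made-up miniature dumps: the same two shapes under different glyph names.
TRIANGLE = (
    '<contour><pt x="0" y="0" on="1"/><pt x="9" y="0" on="1"/>'
    '<pt x="4" y="8" on="1"/></contour>'
)
SQUARE = (
    '<contour><pt x="0" y="0" on="1"/><pt x="9" y="0" on="1"/>'
    '<pt x="9" y="9" on="1"/><pt x="0" y="9" on="1"/></contour>'
)
FONT_A = (
    "<ttFont><glyf>"
    f'<TTGlyph name="uniA001">{TRIANGLE}</TTGlyph>'
    f'<TTGlyph name="uniA002">{SQUARE}</TTGlyph>'
    "</glyf></ttFont>"
)
FONT_B = (
    "<ttFont><glyf>"
    f'<TTGlyph name="uniB001">{SQUARE}</TTGlyph>'
    f'<TTGlyph name="uniB002">{TRIANGLE}</TTGlyph>'
    "</glyf></ttFont>"
)

print(shared_outlines(FONT_A, FONT_B))
```

For real files, pass in the contents of the two XML dumps. Indexing the first font by outline avoids comparing every glyph against every other glyph, although with only ten digits the difference is negligible.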

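So when a new font file appears, we only need to check which glyph code in it has the same object as a known digit's glyph in order to recover the digit. The comparison code is as follows: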

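from fontTools.ttLib import TTFont

font1 = TTFont(r'C:..\maoyan1.ttf')  # font whose digits are already known
font2 = TTFont(r'C:..\maoyan2.ttf')  # new font to decode
uni_list1 = font1.getGlyphOrder()[2:]  # glyph names of the digits (the first two entries are skipped)
uni_list2 = font2.getGlyphOrder()[2:]
for uni2 in uni_list2:
    obj2 = font2['glyf'][uni2]  # glyph object for uni2 in font 2
    for uni1 in uni_list1:
        obj1 = font1['glyf'][uni1]  # glyph object for uni1 in font 1
        if obj1 == obj2:
            # digit_map is the hand-built dictionary for font 1 described earlier
            print(uni2, digit_map[uni1])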

The result of the comparison is shown in the figure below:

It matches the digit mapping in font 2. Any other font file can be mapped in the same way.
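
Once a mapping has been recovered, it still has to be applied to the scraped text. Here is a minimal sketch of that last step, assuming the page source carries each obfuscated digit as a hexadecimal character reference such as `&#xe137;` and that the font names the matching glyph `uni` plus the uppercase hex code. Both are assumptions about the page, and the mapping values below are invented for the example.

```python
import re

# Hexadecimal character references such as &#xe137;
ENTITY_RE = re.compile(r"&#x([0-9a-fA-F]+);")


def decode_obfuscated(text, glyph_to_digit):
    """Replace each known character reference with its mapped digit."""

    def replace(match):
        glyph_name = "uni" + match.group(1).upper()
        # Leave references the mapping does not know about untouched.
        return glyph_to_digit.get(glyph_name, match.group(0))

    return ENTITY_RE.sub(replace, text)


# Invented mapping, standing in for the one recovered by the comparison.
mapping = {"uniE137": "1", "uniF0A2": "2", "uniE9C3": "5"}

print(decode_obfuscated("&#xe137;&#xf0a2;.&#xe9c3;", mapping))
```

If the HTML has already gone through a parser, the references will have been turned into single private-use characters. In that case the lookup would go from `ord(char)` to the glyph name instead of matching the `&#x...;` text.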

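Note that, unlike the previous case, this mapping has to be discovered by comparison rather than defined by hand. Once that is clear, the rest is straightforward.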
