Analysis of Results:
http://xxx.com?method=getrequest&gesnum=00000001
http://xxx.com?method=getrequest&gesnum=00000002
http://xxx.com?method=getrequest&gesnum=00000003
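The three URLs differ only in the zero-padded, eight-digit gesnum parameter, so the request list can be generated instead of written out by hand. A minimal sketch, assuming the numbering simply continues upward; build_urls is a hypothetical helper name, and xxx.com is the placeholder host used above:

```python
# Placeholder host taken from the URLs above; only gesnum changes.
BASE_URL = "http://xxx.com?method=getrequest&gesnum={num}"


def build_urls(first, last):
    """Return URLs for gesnum values first..last (inclusive), zero-padded to 8 digits."""
    return [BASE_URL.format(num="%08d" % i) for i in range(first, last + 1)]


urls = build_urls(1, 3)
for url in urls:
    print(url)
```

Building the list in memory like this is one option; the script further down writes the numbers to a file first and reads them back.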
Crawling the returned data
The returned JSON data contains a lone escape character "\", which Python 3's JSON parser cannot handle, so calling .json() on the response fails:
req = requests.get(url=url, headers=headers, verify=False, timeout=60).json()
Instead, take the response as binary data (bytes), decode it, and process the result:
req = requests.get(url=url, headers=headers, verify=False, allow_redirects=False, timeout=60)
data = json.dumps(bytes.decode(req.content, 'UTF-8'))
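The difference between the two forms can be reproduced offline. The sketch below is an illustration under assumptions: the payload is an invented sample (not real data from the site) containing a lone backslash, which is not a valid JSON escape. Parsing it fails, much as .json() would, while decoding the bytes and passing the text to json.dumps never parses the payload at all, so a regular expression can still be run over the result.

```python
import json
import re

# Invented sample payload (an assumption for illustration): the "\q" sequence
# is a lone backslash followed by "q", which is not a valid JSON escape.
raw = b'{"name": "foo\\qbar", "email": "user@example.com"}'

# Parsing fails, much like calling .json() on such a response would.
try:
    json.loads(raw.decode("utf-8"))
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False
print("parsed:", parsed_ok)

# Decoding the bytes and dumping the text does not parse the payload; it only
# wraps the text in a JSON string literal, escaping quotes and backslashes.
data = json.dumps(bytes.decode(raw, "UTF-8"))

# The text can still be searched with a regular expression.
match = re.search(r"[-_\w.]+@([-\w]+\.)+[-\w]+", data)
print(match.group(0))
```

Note that data here is an escaped string, not a parsed object, so fields cannot be looked up by key; that is acceptable when the goal is only to search the text.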
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Written on Windows 7 x64 with Notepad++ and Python 3.5.0
import json
import re

import requests
import urllib3

urllib3.disable_warnings()

cookie = '''JSESSIONID=1B7407076DE01727BC48DCD56FF9BA70; entsoft=entsoft; JSESSIONID=4877B5AC1DF6307E90CF1641D3863A6C; radId=45991FBF-0BC4-3BA4-08E2-00072022FB2C'''
headers = {
    'Accept': 'application/json, text/plain, */*',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cookie': cookie,
}


# Write 00000001-00000300 to num.txt, one number per line
def getNum():
    filename = 'C:\\Users\\Administrator\\Desktop\\脚本\\num.txt'
    with open(filename, 'w') as file:
        for i in range(1, 301):
            file.write(("%08d" % i) + '\n')


def main():
    # url = 'http://xxx.com?method=getrequest&gesnum=00000001'
    getNum()
    filename = 'C:\\Users\\Administrator\\Desktop\\脚本\\num.txt'
    with open(filename, 'r') as file:
        for line in file:
            # Strip the trailing newline before inserting the number into the URL
            url = 'http://xxx.com?method=getrequest&gesnum={line}'.format(line=line.strip())
            # print(url)
            # req = requests.get(url=url, headers=headers, verify=False, timeout=60).json()
            # Problem: the JSON data contains a lone escape character "\" that
            # Python 3 could not parse, so the approach below is used instead
            req = requests.get(url=url, headers=headers, verify=False, allow_redirects=False, timeout=60)
            # print(req.content)
            # response.text returns unicode text
            # response.content returns binary data (bytes)
            # The unicode text raised an error, so the bytes from content are used
            # json.dumps converts an object into a JSON string
            data = json.dumps(bytes.decode(req.content, 'UTF-8'))
            # print(data)
            # Match an email address with a regular expression
            emailRegex = r"[-_\w\.]{0,64}@([-\w]{1,63}\.)*[-\w]{1,63}"
            email = re.search(emailRegex, data)
            print(email)


if __name__ == '__main__':
    main()
Output:
<_sre.SRE_Match object; span=(158, 184), match='[email protected]'>
<_sre.SRE_Match object; span=(145, 170), match='[email protected]'>
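#!/usr/bin/python3
# -*- coding: utf-8 -*-
# Written on Windows 7 x64 with Notepad++ and Python 3.5.0


# Drop the first 48 characters of each line (the length of a prefix such as
# "<_sre.SRE_Match object; span=(158, 184), match='") and write the rest
# to a second file.
def main():
    filename = "C:\\Users\\Administrator\\Desktop\\脚本\\email_handle.txt"
    filename1 = "C:\\Users\\Administrator\\Desktop\\脚本\\email_handle_handle.txt"
    # Both files are closed automatically when the with block ends
    with open(filename, 'r') as file, open(filename1, 'w') as file1:
        for line in file:
            data = line[48:]
            print(data)
            file1.write(data)


if __name__ == '__main__':
    main()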
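The fixed offset of 48 only works while both span numbers have three digits, because the prefix grows or shrinks with them, and the slice also leaves the trailing '> on each line. As a possible alternative (not the original author's approach), the address can be read directly from the match object with group(0), or recovered from an already saved line with a second regular expression. A sketch under assumptions: extract_from_repr is a hypothetical helper, and the sample text and address are invented.

```python
import re

# Pattern taken from the crawler script above.
EMAIL_REGEX = r"[-_\w\.]{0,64}@([-\w]{1,63}\.)*[-\w]{1,63}"

# Option 1: read the matched text from the match object instead of printing its repr.
sample = 'contact: "someone@example.com", id: 7'  # invented sample text
match = re.search(EMAIL_REGEX, sample)
email = match.group(0) if match else None
print(email)

# Option 2: recover the address from a saved repr line, whatever the span width.
REPR_REGEX = re.compile(r"match='(.*)'>")


def extract_from_repr(line):
    """Return the text between match=' and '> in a match-object repr line, or None."""
    found = REPR_REGEX.search(line)
    return found.group(1) if found else None


line = "<_sre.SRE_Match object; span=(10, 29), match='someone@example.com'>\n"
print(extract_from_repr(line))
```

Option 1 is usually the simpler choice, since it avoids the intermediate file entirely and the repr may shorten long matches.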
Two ways to use cookies in a Python crawler
https://blog.csdn.net/weixin_38706928/article/details/80376572
Python 3 UnicodeDecodeError / UnicodeEncodeError: 'gbk' codec can't decode/encode bytes, and similar text-encoding problems
https://www.cnblogs.com/worstprogrammer/p/5189758.html
Simulated login with Python (using the requests library)
https://blog.csdn.net/majianfei1023/article/details/49927969
Certificate verification and disabling warnings in Python's urllib3 package
https://blog.csdn.net/taiyangdao/article/details/72825735
Online JSON parsing, formatting, and validation
https://www.json.cn/