Web crawler, day 5
1. JS decryption, obfuscation, and reverse engineering
- URL: https://www.aqistudy.cn/html/city_detail.html
- Analysis:
- The air-quality index data is loaded dynamically.
- After changing the search criteria, clicking the Search button fires an Ajax request, and that request returns the index data we want.
- From the captured packet, extract the URL, the request method, and the request parameters.
- The URL and request method can be used directly.
- The request parameters, however, are dynamically encrypted.
- The response data is encrypted ciphertext.
- Locate the code that fires the Ajax request when the Search button is clicked:
- Use the Firefox developer tools to find the JS function bound to the Search button's click event.
- Implementation of getData():
- type = 'HOUR': query the data in hourly units
- It calls two other functions: getAQIData() and getWeatherData()
- The Ajax request code is not found there.
- Analyzing getAQIData & getWeatherData:
- The two functions are implemented almost identically; the only difference is:
- var method = 'GETDETAIL';
- var method = 'GETCITYWEATHER';
- They still contain no Ajax request code, but both call another function:
- getServerData(method, param, function(obj), 0.5)
- method:
- 'GETDETAIL'
- 'GETCITYWEATHER'
- param is a dictionary with four key-value pairs:
- city;
- type;
- startTime;
- endTime;
- Analyzing the implementation of getServerData:
- Use the global search of the packet-capture tool to locate the packet in which the keyword getServerData appears; the JS code containing this keyword is encrypted.
- JS obfuscation: the core JS code is encrypted to hide it.
- JS deobfuscation:
- Brute-force deobfuscation:
- URL: https://www.bm8.com.cn/jsConfusion/
- Analyzing the deobfuscated implementation of getServerData:
- The Ajax request code is finally found here:
- getParam(method, object) returns the dynamically changing, encrypted request parameter d.
- method == method
- object == param
- decodeData(data): takes the encrypted response data and returns the decrypted plaintext.
- data: encrypted response data
- JS reverse engineering:
- Automatic reversing:
- PyExecJS: a Python library that lets Python code run JavaScript.
- A Node.js runtime must be installed on the local machine.
- Install it with: pip install PyExecJS
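The cells below build the JS call with str.format. That works for these values, but it would produce invalid JavaScript if an argument ever contained a double quote, a backslash, or a newline. PyExecJS's own ctx.call(name, *args) avoids this by JSON-encoding the arguments. Below is a minimal pure-Python sketch of that encoding. build_js_call is a hypothetical helper, not part of PyExecJS.

```python
import json


def build_js_call(func_name, *args):
    """Build a JS call expression such as f("a", "b").

    Every argument is JSON-encoded, so quotes, backslashes and newlines are
    escaped into valid JavaScript string literals. (Hypothetical helper;
    PyExecJS's ctx.call performs a similar json.dumps of its arguments.)
    """
    # ensure_ascii=False keeps non-ASCII text such as city names readable
    encoded = ', '.join(json.dumps(arg, ensure_ascii=False) for arg in args)
    return '{0}({1})'.format(func_name, encoded)


expr = build_js_call('getPostParamCode', 'GETCITYWEATHER', '北京', 'HOUR',
                     '2018-01-25 00:00:00', '2018-01-25 23:00:00')
print(expr)
# A quote inside an argument is escaped instead of ending the string early
print(build_js_call('decodeData', 'say "hi"'))
```

With PyExecJS itself, the equivalent is ctx.call('getPostParamCode', method, city, type, start_time, end_time) and ctx.call('decodeData', response_text), in place of formatting the expression by hand and passing it to ctx.eval.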
In [3]:
# Execute the JS function from Python to get the dynamically changing, encrypted request parameter d
import execjs
node = execjs.get()
# Params
method = 'GETCITYWEATHER'
city = '北京'
type = 'HOUR'
start_time = '2018-01-25 00:00:00'
end_time = '2018-01-25 23:00:00'
# Compile javascript
file = 'test.js'
ctx = node.compile(open(file,encoding='utf-8').read())
# Get params
js = 'getPostParamCode("{0}", "{1}", "{2}", "{3}", "{4}")'.format(method, city, type, start_time, end_time)
params = ctx.eval(js)  # execute the specified JS function from Python
print(params)
tdgHOYxwKdDSgYXe+RLPzYCgLvrddahasI5XXklB4gVLYqab+XRPpMD/oSqnJ/aEmFwzVEUhLnPzRy03+X1BI4qc9EYeRPqiKrT+f1JQExGQ4ii8kKvZhGH+nPffaX/xq5iLB6vblcvBC/L8e6UxdnHlajfkXrLQf1qv5Hcg3c++RoGxPAMOgNc6HbCbQG2sE6yemJ7l8HI9CyNktTP7AwQC04bTbY+s+o7lljhqUvsyMZq88MU1VV46TFExCP7vxfmEl6YFeV892bU27lPedTCtSnYbCEfFCJDP0DfEBHe0XFOcgXs+Yl5h58efciX69k9IEvGCKenhokOJQ2tS178anRoT37sEBV5cZeLY8Uzh8UUWgxg2sH+JJsg8ARclHhK0AN/SA4wFy8XmwdBun1zHxV8LoPfn3cxqzXnNKOp/nowpNnbyuMSZtftbf41HB1dEdkm07a2LzCaJgUEpPmLZUuA7+lDlCKqTsEZVh9w=
- Send a POST request to the Ajax URL with d as the request parameter to obtain the encrypted response data.
In [5]:
import execjs
import requests
node = execjs.get()
# Params
method = 'GETCITYWEATHER'
city = '北京'
type = 'HOUR'
start_time = '2018-01-25 00:00:00'
end_time = '2018-01-25 23:00:00'
# Compile javascript
file = 'test.js'
ctx = node.compile(open(file,encoding='utf-8').read())
# Get params
js = 'getPostParamCode("{0}", "{1}", "{2}", "{3}", "{4}")'.format(method, city, type, start_time, end_time)
params = ctx.eval(js)
# Send the POST request
url = 'https://www.aqistudy.cn/apinew/aqistudyapi.php'
response_text = requests.post(url, data={'d': params}).text
print(response_text)
eAkzHZvdWqslCrP29e8XgEP22qdvyxus1TrEFB8uvsD0ChwbOTBCJErsCqVJyLQJ9wdhdK9lk3nl/SEeVqoXSY48w11ODT7v6rhQkkXuZ3Vv+VOQ7C7zXtLvbJJDIq9Nu3RRA+8rS/R0lnyMUk98IQ==
In [12]:
# Decrypt the ciphertext response: execute decodeData(data) from Python
import execjs
import requests
node = execjs.get()
# Params
method = 'GETCITYWEATHER'
city = '北京'
type = 'HOUR'
start_time = '2018-01-25 00:00:00'
end_time = '2018-01-25 23:00:00'
# Compile javascript
file = 'test.js'
ctx = node.compile(open(file,encoding='utf-8').read())
# Get params
js = 'getPostParamCode("{0}", "{1}", "{2}", "{3}", "{4}")'.format(method, city, type, start_time, end_time)
params = ctx.eval(js)
# Send the POST request
url = 'https://www.aqistudy.cn/apinew/aqistudyapi.php'
response_text = requests.post(url, data={'d': params}).text
# Decrypt the encrypted response data
jss = 'decodeData("{0}")'.format(response_text)
print(jss)
decrypted_data = ctx.eval(jss)
print(decrypted_data)
decodeData("S+PG+bQmwr20q9LEnYxZb6d9kwGuj+GKqm/YqBW0N9VTTsfFzS6mR86ne1uxqNuepTIfI+opvFV/np093XWIf2IXLkXoN7yUEFNnINrJBIN9MFj2Y9rWgCXZXe4k0PtMub9YKalryHwuO7IlNJN3OA==")
---------------------------------------------------------------------------
ProgramError Traceback (most recent call last)
<ipython-input-12-775eb2217305> in <module>()
26 jss = 'decodeData("{0}")'.format(response_text)
27 print(jss)
---> 28 decrypted_data = ctx.eval(jss)
29 print(decrypted_data)
~\Anaconda3\lib\site-packages\execjs\_abstract_runtime_context.py in eval(self, source)
25 if not self.is_available():
26 raise execjs.RuntimeUnavailableError
---> 27 return self._eval(source)
28
29 def call(self, name, *args):
~\Anaconda3\lib\site-packages\execjs\_external_runtime.py in _eval(self, source)
76
77 code = 'return eval({data})'.format(data=data)
---> 78 return self.exec_(code)
79
80 def _exec_(self, source):
~\Anaconda3\lib\site-packages\execjs\_abstract_runtime_context.py in exec_(self, source)
16 if not self.is_available():
17 raise execjs.RuntimeUnavailableError
---> 18 return self._exec_(source)
19
20 def eval(self, source):
~\Anaconda3\lib\site-packages\execjs\_external_runtime.py in _exec_(self, source)
86 else:
87 output = self._exec_with_pipe(source)
---> 88 return self._extract_result(output)
89
90 def _call(self, identifier, *args):
~\Anaconda3\lib\site-packages\execjs\_external_runtime.py in _extract_result(self, output)
165 return value
166 else:
--> 167 raise ProgramError(value)
168
169
ProgramError: Error: Malformed UTF-8 data
Selenium
Review
- Single-threaded, multi-task asynchronous coroutines
- Special function (a function defined with async def)
- When it is called, the statements inside it are not executed immediately
- Calling it returns a coroutine object
- Coroutine object
- Coroutine object == special function == a specified set of operations
- Task object
- A higher-level coroutine object
- Binding a callback
- task.add_done_callback(func)
- func:
- Must take exactly one parameter (the current task object)
- parameter.result() returns the special function's return value
- Task object == a specified set of operations
- Event loop object
- Create an event loop object
- Role:
- One or more task objects must be loaded into it (task objects must be registered with the event loop)
- Start the event loop object
- It asynchronously executes the specified operations of every task object registered in it
- await: ensures the event loop actually executes the blocking operations
- Suspend: lets the task object that is currently blocked hand over its use of the CPU
- asyncio.wait(tasks)
- Key points:
- Code from modules that do not support async must not appear inside a special function
- aiohttp: a module that supports asynchronous network requests
- Use the context-manager mechanism (async with ... as)
- Instantiate a session object: ClientSession()
- Send the request with get()/post() (a blocking operation)
- proxy parameter: 'http://ip:port'
- Fetch the response data (a blocking operation)
- await response.text(): str
- await response.read(): bytes
- Selenium
- Action chains
- Headless browser
- Evading detection
- Managed browser
- Simulated login on 12306
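The reviewed coroutine workflow fits in one runnable sketch: special function, coroutine object, task object with a bound callback, then an event loop that runs everything while asyncio.wait suspends until all tasks finish. To keep the example offline, asyncio.sleep stands in for blocking network I/O (an assumption). In a crawler, that step would be an aiohttp request awaited inside the special function. The names fake_request and callback are hypothetical.

```python
import asyncio
import time


# Special function: calling it does not run the body, it only returns a coroutine object
async def fake_request(name, delay):
    # await suspends this task and hands the CPU back to the event loop;
    # asyncio.sleep stands in for a blocking network request here
    await asyncio.sleep(delay)
    return name + ' done'


results = []


def callback(task):
    # The callback receives the finished task object;
    # task.result() is the special function's return value
    results.append(task.result())


loop = asyncio.new_event_loop()  # create an event loop object
tasks = []
for name, delay in [('a', 0.2), ('b', 0.1), ('c', 0.3)]:
    coro = fake_request(name, delay)  # coroutine object; nothing runs yet
    task = loop.create_task(coro)     # task object registered with the loop
    task.add_done_callback(callback)  # bind the callback
    tasks.append(task)

start = time.perf_counter()
# Start the loop; asyncio.wait suspends until every task has finished
loop.run_until_complete(asyncio.wait(tasks))
elapsed = time.perf_counter() - start
loop.close()

print(results)  # expected: ['b done', 'a done', 'c done']
print('elapsed: {:.2f}s'.format(elapsed))
```

The tasks complete in order of their delay, and the total time is close to the longest single delay (about 0.3 s) rather than the 0.6 s sum. That difference is the point of running them asynchronously.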
- Action chains
- from selenium.webdriver import ActionChains
- NoSuchElementException: the specified tag could not be located
- If the tag to locate is inside a nested sub-page (an iframe), first switch to that sub-page:
- bro.switch_to.frame('id attribute value of the iframe tag'): switches the browser's scope to the specified sub-page
- Instantiate an action chain object for the given browser instance:
- action = ActionChains(bro)
- action.click_and_hold(tagName)
- action.move_by_offset(10, 15)
- action.perform(): executes the action chain immediately
In [6]:
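from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from time import sleep
# The argument is the path to your browser driver; prefix Windows paths with r'' so backslashes are not treated as escapes
bro = webdriver.Chrome(service=Service('./chromedriver.exe'))
bro.get('https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')
# Locate the tag: it sits inside an iframe, so switch into that sub-page first
bro.switch_to.frame('iframeResult')
div_tag = bro.find_element(By.ID, 'draggable')
# The behavior is defined with ActionChains
action = ActionChains(bro)  # action chain object for the current browser page
action.click_and_hold(div_tag)  # click and hold the specified tag
for i in range(1, 7):
    action.move_by_offset(10, 15).perform()  # perform() executes the action chain immediately
    sleep(0.5)
action.release().perform()  # let go of the mouse button
bro.quit()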
- Headless browser
- A browser without a visible interface
- PhantomJS: a headless browser
- Headless Chrome:
- The Chrome browser installed on your machine becomes a headless browser once configured in code
In [8]:
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
bro = webdriver.Chrome(service=Service('./chromedriver.exe'), options=chrome_options)
bro.get('https://www.baidu.com/')
sleep(1)
bro.save_screenshot('./1.png')  # take a screenshot
print(bro.page_source)
bro.quit()
- Evading Selenium detection
- Managed browser (Selenium takes over a manually launched Chrome)
- Environment configuration:
- 1. Add the directory containing the local Chrome executable (chrome.exe) to the PATH environment variable
- 2. Open a browser with the local Chrome executable:
- chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\selenum\AutomationProfile"
- 9222: the debugging port (any free port works)
- "C:\selenum\AutomationProfile": an existing empty directory
- Use the following code to take over the currently open browser:
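You can also launch the debug-enabled Chrome from Python instead of typing the command in a terminal. This is a minimal sketch that assumes chrome.exe is on PATH as set up in step 1. The helper names are hypothetical. Passing the arguments as a list to subprocess.Popen avoids shell quoting of the profile path. The port must match the debuggerAddress that Selenium attaches to.

```python
import subprocess


# Hypothetical helpers; the defaults mirror the port and profile directory used above
def chrome_debug_command(chrome_exe='chrome.exe', port=9222,
                         profile_dir=r'C:\selenum\AutomationProfile'):
    """Return the argument list for starting Chrome with remote debugging enabled."""
    return [
        chrome_exe,
        '--remote-debugging-port={}'.format(port),
        '--user-data-dir={}'.format(profile_dir),
    ]


def launch_debug_chrome(**kwargs):
    """Start Chrome in the background; Popen returns without waiting for it to exit."""
    return subprocess.Popen(chrome_debug_command(**kwargs))


def debugger_address(port=9222):
    """Value for Options.add_experimental_option('debuggerAddress', ...)."""
    return '127.0.0.1:{}'.format(port)


print(chrome_debug_command())
print(debugger_address())
```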
In [30]:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
chrome_options = Options()
chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
# The code takes over the already opened browser instead of instantiating a new one
bro = webdriver.Chrome(service=Service('./chromedriver.exe'), options=chrome_options)
bro.get('https://kyfw.12306.cn/otn/login/init')
12306 simulated login
- URL: https://kyfw.12306.cn/otn/login/init
- Analysis:
- The captcha image to be recognized must be obtained by taking a screenshot and saved locally
- A login attempt corresponds one-to-one to the captcha image currently displayed
In [25]:
# pip install Pillow
from time import sleep
from PIL import Image
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# Captcha recognition function (Chaojiying_Client is defined in the In [21] sample-code cell)
def transformCode(imgPath, imgType):
    chaojiying = Chaojiying_Client('13614167787', '13614167787', '903126')
    with open(imgPath, 'rb') as f:
        im = f.read()
    return chaojiying.PostPic(im, imgType)['pic_str']
In [26]:
bro = webdriver.Chrome(service=Service('./chromedriver.exe'))
bro.get('https://kyfw.12306.cn/otn/login/init')
sleep(2)
bro.find_element(By.ID, 'username').send_keys('xxxxxxx')
bro.find_element(By.ID, 'password').send_keys('12345465')
# Click the captcha
bro.save_screenshot('main.png')  # save the whole page as an image
# Crop the captcha image alone out of main.png
img_tag = bro.find_element(By.XPATH, '//*[@id="loginForm"]/div/ul[2]/li[4]/div/div/div[3]/img')  # locate the captcha image tag
location = img_tag.location
size = img_tag.size
# print(location, size)
# Crop box (left, upper, right, lower): the top-left and bottom-right corners of the captcha image
rangle = (int(location['x']), int(location['y']), int(location['x'] + size['width']), int(location['y'] + size['height']))
# Crop the captcha out of the screenshot with the Image class, using the rangle box
i = Image.open('./main.png')
frame = i.crop(rangle)  # image of the captcha alone
frame.save('./code.png')
result = transformCode('./code.png', 9004)  # e.g. '99,71|120,140'
# '99,71|120,140' == [[99,71],[120,140]]
all_list = []  # [[99,71],[120,140]]
if '|' in result:
    list_1 = result.split('|')
    count_1 = len(list_1)
    for i in range(count_1):
        xy_list = []
        x = int(list_1[i].split(',')[0])
        y = int(list_1[i].split(',')[1])
        xy_list.append(x)
        xy_list.append(y)
        all_list.append(xy_list)
else:
    x = int(result.split(',')[0])
    y = int(result.split(',')[1])
    xy_list = []
    xy_list.append(x)
    xy_list.append(y)
    all_list.append(xy_list)
for data in all_list:
    # The recognizer returns offsets from the image's top-left corner, while Selenium 4
    # measures move_to_element_with_offset offsets from the element's center
    x = int(data[0] - size['width'] / 2)
    y = int(data[1] - size['height'] / 2)
    ActionChains(bro).move_to_element_with_offset(img_tag, x, y).click().perform()
    sleep(1)
sleep(2)
bro.find_element(By.ID, 'loginSub').click()
bro.quit()
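You can move the parsing and cropping steps of the login cell into small pure functions and test them without a browser. This sketch makes two assumptions. First, the recognizer returns 'x1,y1|x2,y2' offsets measured from the captcha image's top-left corner, as the comments in the cell suggest. Second, Selenium 4 measures move_to_element_with_offset offsets from the element's center. The function names are hypothetical.

```python
def parse_click_points(result):
    """Turn the recognizer's 'x1,y1|x2,y2' string into [(x1, y1), (x2, y2)]."""
    points = []
    # split('|') returns the whole string when there is no '|',
    # so the single-point case needs no separate branch
    for pair in result.split('|'):
        x, y = pair.split(',')
        points.append((int(x), int(y)))
    return points


def crop_box(location, size):
    """Build the (left, upper, right, lower) box that PIL's Image.crop expects
    from Selenium's element.location and element.size dicts."""
    left, top = int(location['x']), int(location['y'])
    return (left, top, left + int(size['width']), top + int(size['height']))


def to_center_offsets(points, size):
    """Shift top-left-based offsets so they are relative to the element's center."""
    half_w, half_h = size['width'] / 2, size['height'] / 2
    # int() truncates toward zero; whole-pixel offsets are precise enough here
    return [(int(x - half_w), int(y - half_h)) for x, y in points]


print(parse_click_points('99,71|120,140'))  # [(99, 71), (120, 140)]
print(crop_box({'x': 10, 'y': 20}, {'width': 293, 'height': 190}))  # (10, 20, 303, 210)
```

In the login cell, the click loop then reduces to iterating over to_center_offsets(parse_click_points(result), img_tag.size) and calling ActionChains(bro).move_to_element_with_offset(img_tag, x, y).click().perform() for each point.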
In [21]:
# Sample code downloaded from Chaojiying
#!/usr/bin/env python
# coding:utf-8
import requests
from hashlib import md5
class Chaojiying_Client(object):
    def __init__(self, username, password, soft_id):
        self.username = username
        password = password.encode('utf8')
        self.password = md5(password).hexdigest()
        self.soft_id = soft_id
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }
    def PostPic(self, im, codetype):
        """
        im: image bytes
        codetype: captcha type, see http://www.chaojiying.com/price.html
        """
        params = {
            'codetype': codetype,
        }
        params.update(self.base_params)
        files = {'userfile': ('ccc.jpg', im)}
        r = requests.post('http://upload.chaojiying.net/Upload/Processing.php', data=params, files=files, headers=self.headers)
        return r.json()
    def ReportError(self, im_id):
        """
        im_id: image ID of the captcha to report as wrongly recognized
        """
        params = {
            'id': im_id,
        }
        params.update(self.base_params)
        r = requests.post('http://upload.chaojiying.net/Upload/ReportError.php', data=params, headers=self.headers)
        return r.json()
# if __name__ == '__main__':
#     chaojiying = Chaojiying_Client('Chaojiying username', 'Chaojiying password', '96001')  # User Center >> Software ID: generate one to replace 96001
#     im = open('a.jpg', 'rb').read()  # replace a.jpg with the local image path; on Windows the path sometimes needs //
#     print(chaojiying.PostPic(im, 1902))  # 1902 is the captcha type (official site >> pricing); on Python 3, print needs parentheses