Preface
We all know that Python can do a great many things, and that it can even build web pages with libraries such as Remi and PySimpleGUI. What I had not heard of until recently is running JavaScript, the scripting language of the browser, from Python. So in this article I will fill in that gap for everyone.
One, PyExecJS
PyExecJS is a Python module that executes JavaScript code. This is useful when scraping: if a page encrypts its data with JavaScript, a plain HTTP request made from Python only returns the encrypted content. Once you have reverse-engineered the page's JS, you can run the relevant functions through PyExecJS and decrypt that content from Python. PyExecJS does not ship a JavaScript engine of its own, though. It hands the code to a runtime installed on the machine, so you need a complete JS environment for it to parse statements without problems. Node.js is recommended here, which makes the setup a little inconvenient. Let's take a look at how PyExecJS is used:
1. Routine operation
import execjs
aa=execjs.eval("'one|two|three'.split('|')")  # Run JavaScript code: split the string into an array
print(aa)
# Compile a JavaScript snippet so that its functions can be called later
e=execjs.compile('''
function add(x,y){
    return x+y;
}
''')
print(e.call('add',10,20))  # Call the compiled function and print its result
You can also fetch the engine object first and evaluate a statement on it, as follows:
print(execjs.get().eval('1+1'))
2. View the interpretation engine
print(execjs.get().name)
Here the engine that was picked up is JScript, the JavaScript engine built into Windows. We can also designate an engine ourselves, such as Node.js.
3. Specify the engine
import execjs
import os
os.environ["EXECJS_RUNTIME"] = "Node"
print(execjs.get().name)
You can also request a specific engine by name, as follows (JScript is only available on Windows):
js1=execjs.get(execjs.runtime_names.JScript)
print(js1.eval('1'))
js2=execjs.get(execjs.runtime_names.Node)
print(js2.eval('2'))
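Requesting an engine that is not installed fails, so it can help to check availability first. The following is only a minimal sketch: the helper name `pick_runtime_name` and the preference order are my own assumptions, and it relies on `execjs.runtimes()` returning a mapping of engine names to runtime objects that expose `is_available()`.

```python
def pick_runtime_name(preferred=("Node", "JScript")):
    """Return the first available engine name from `preferred`, or None."""
    try:
        import execjs  # PyExecJS, imported lazily so the helper degrades gracefully
    except ImportError:
        return None
    available = execjs.runtimes()  # mapping: engine name -> runtime object
    for name in preferred:
        runtime = available.get(name)
        if runtime is not None and runtime.is_available():
            return name
    return None


name = pick_runtime_name()
if name is None:
    print("No usable JavaScript engine found")
else:
    import execjs
    # Evaluate a trivial expression on the chosen engine
    print(name, execjs.get(name).eval("1 + 1"))
```

On a machine without Node.js or JScript the helper returns None instead of raising, so the calling code can decide how to proceed.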
Two, Js2Py
I think this one is relatively good. It is a JavaScript interpreter written in pure Python, so it does not depend on any external environment and can run JS files on its own. It may run somewhat slower, but that is rarely a problem. Let's take a look at what it can do.
1. Routine operation, required
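As a rough sketch of the routine usage, `js2py.eval_js` evaluates a piece of JavaScript and converts the result to a Python value; evaluating a function declaration gives back a callable. The wrapper `js_eval` and the import guard are my own additions for machines where Js2Py is missing or fails to import. They are not part of Js2Py.

```python
try:
    import js2py  # third-party: pip install js2py
except Exception:  # not installed, or not importable on this Python version
    js2py = None


def js_eval(source):
    """Evaluate JavaScript with Js2Py; return None when Js2Py is unavailable."""
    if js2py is None:
        return None
    return js2py.eval_js(source)


# Evaluate an expression and get the value back in Python
total = js_eval("var a = 1, b = 2; a + b")
print(total)

# Evaluate a function declaration and call it like a Python function
add = js_eval("function add(x, y) { return x + y }")
if add is not None:
    print(add(10, 20))
```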
2. Loop traversal
import js2py
aa=js2py.eval_js(
'''
var i=0;
for(var c=1;c<6;c++){
    console.log(c);
}
'''
)
print(aa)
3. Read the Js file
We can also save the JavaScript in a separate file and load it from Python, as follows:
1.js
function f(aa){
    if(aa>11){
        console.log('OK')
    }else{
        console.log('Fail')
    }
}
Python file
import js2py
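with open('1.js','r') as f:
    aa=js2py.eval_js(f.read())  # the JS function f becomes a Python callable
# 11 > 11 is false, so f logs 'Fail'; f itself has no return value
print(aa(11))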
4. Crawl website data
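Here we take Taobao as the target: we fetch the page, extract its inline JS, and wrap that script in a function so that we can call it. Note that this example uses PyExecJS rather than Js2Py: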
import execjs
import requests
import re
url = 'https://ai.taobao.com/?pid=mm_26632323_6762370_25910879'
res=requests.get(url).text
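# Collect the inline scripts; '.' does not match newlines here, so only single-line scripts are captured
js=re.findall(r'<script>(.*?)</script>',res)
print(js,'\n')
if not js:
    raise SystemExit('No inline <script> found; the page may have changed')
# Replace eval( with return( so the code is returned as a string instead of being executed
js1=re.sub(r'eval\(','return(',js[0])
# Wrap the script in a function that PyExecJS can call
html="function getLego2WPK(){" + js1 + "};"
ctx = execjs.compile(html)
temp = ctx.call('getLego2WPK')
print(temp)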
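The extract-and-wrap step can be tried offline before pointing it at a live site. The sketch below uses only the standard library and a made-up page fragment: `SAMPLE_HTML`, `wrap_inline_script` and `getResult` are hypothetical names chosen for illustration, and the regular expressions mirror the ones used above.

```python
import re

# Hypothetical page fragment standing in for the downloaded HTML
SAMPLE_HTML = "<html><script>eval('1+' + '2')</script></html>"


def wrap_inline_script(html, func_name="getResult"):
    """Wrap the first inline <script> in a JS function; return None if there is none."""
    scripts = re.findall(r"<script>(.*?)</script>", html)
    if not scripts:
        return None
    # Swap eval( for return( so the code string is handed back rather than executed
    body = re.sub(r"eval\(", "return(", scripts[0])
    return "function %s(){%s}" % (func_name, body)


wrapped = wrap_inline_script(SAMPLE_HTML)
print(wrapped)
```

The resulting string can then be passed to `execjs.compile` and invoked with `call`, in the same way as the Taobao example.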
Three, PyV8
PyV8 is based on Google's V8 engine. Unfortunately, it only supports Python 2 and is no longer maintained. If you are still using Python 2, you can give it a try.
Four, summary
This article covered three Python libraries that can run JavaScript: PyExecJS, Js2Py and PyV8. With these modules, our crawlers can handle pages that rely on JavaScript more easily and more accurately.