Visiting web pages with Spynner


I've been tinkering again, this time using Spynner to access web pages. What is Spynner? It is a Python module that drives a WebKit core without needing a GUI in order to make HTTP requests. It can be used as a crawler, and it is ideal for crawling web pages that need JavaScript to run.

Today I finally got a Spynner program running on Linux on the Raspberry Pi. You need to use xvfb (a virtual X server that works without a physical display). Otherwise, you will be prompted with an error.

Install xvfb: sudo apt-get install xvfb
Run the script under xvfb: xvfb-run sudo python /workhome/lpfrx.py
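If the same script has to run both on a desktop and on a headless Pi, the choice between the plain command and the xvfb-run one can be made in code. Here is a minimal sketch of that idea. The build_command helper is my own illustration (not part of Spynner or xvfb), it assumes that an unset or empty DISPLAY variable means there is no X server, and it leaves out the sudo from the command line above, assuming the script does not need root.

```python
import os


def build_command(script, environ=None):
    """Return the argv list for running `script`.

    Hypothetical helper: the command is wrapped in xvfb-run only when
    no X display is configured (assumption: empty/missing DISPLAY
    means headless).
    """
    environ = os.environ if environ is None else environ
    cmd = ["python", script]
    if not environ.get("DISPLAY"):
        cmd = ["xvfb-run"] + cmd
    return cmd


# Possible usage (not executed here):
#   import subprocess
#   subprocess.call(build_command("/workhome/lpfrx.py"))
```

On the Pi without a desktop session this gives the xvfb-run form; on a machine with a display it gives the plain python command.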


#-*- encoding: utf-8 -*-
import spynner
import sys

# Python 2 only: make gbk the default string encoding
reload(sys)
sys.setdefaultencoding('gbk')

if __name__ == "__main__":
    browser = spynner.Browser()

    # Comment out the next statement to run without opening a window
    browser.show()
    try:
        browser.load(url='http://www.lpfrx.com', load_timeout=120, tries=1)
    except spynner.SpynnerTimeout:
        print 'Timeout.'
    else:
        # Type a search term into the input whose id is "s"
        browser.wk_fill('input[id="s"]', 'delphi')
        browser.wait(3)

        # Submit the form with JavaScript
        browser.runjs("document.forms[0].submit();")

        # Another way to click
        #browser.wk_click('a[href]', wait_load=True, timeout=8)
        browser.wait(3)

        # Collect the hyperlink elements and click the one at index 6
        bb = browser.webframe.findAllElements('a')
        print len(bb)
        print sys.getdefaultencoding()

        anchor = bb[6]
        try:
            browser.wk_click_element_link(anchor, timeout=5)
        except spynner.SpynnerTimeout:
            print "timeout 5"

        browser.wait(5)
        html = browser.html
        if html:
            html = html.encode('utf-8')
            open('lpfrx.txt', 'w').write(html)
    browser.close()
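The script writes the page with a bare open(...).write(...), which leaves closing the file to the interpreter. A small sketch of a tidier way to do the same step follows. The save_html name is hypothetical, and it assumes the html it receives is unicode text, as browser.html appears to be in the script (it is encoded to UTF-8 there before writing).

```python
import io


def save_html(html, path):
    """Write `html` (unicode text) to `path` as UTF-8.

    Hypothetical helper: returns False and writes nothing when
    `html` is empty, mirroring the `if html:` check in the script.
    """
    if not html:
        return False
    # io.open behaves the same on Python 2 and 3 and closes the file
    # when the with-block ends.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return True
```

In the script this would take the place of the last three lines before browser.close(), for example save_html(browser.html, 'lpfrx.txt').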
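The script above shows a few different ways of clicking, and there are others as well, but there is very little Chinese-language material on spynner. It is a bit slow as a crawler, but it can access pages whose content is generated by AJAX, which is very nice. It also works well for simulating logins and filling in data, and is more convenient than driving IE through its COM interface.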
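One thing to watch when clicking by position: the script takes bb[6] unconditionally, so a page with fewer than seven links would raise an IndexError. A minimal guard could look like this. The pick_link name is hypothetical, and it relies only on len() and indexing, which the script already uses on the result of findAllElements.

```python
def pick_link(links, index):
    """Return links[index], or None when there are too few links.

    Hypothetical helper: works on any sequence that supports len()
    and integer indexing.
    """
    if 0 <= index < len(links):
        return links[index]
    return None
```

In the script, anchor = pick_link(bb, 6) followed by an "if anchor is not None:" check around the click would skip the click instead of crashing.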

The program above runs on both 64-bit Windows 7 and Raspberry Pi Linux; PyQt4 has to be installed on both.

I also found out that there is a similar module, ghost.py. I'll play with it when I have time.


Origin blog.csdn.net/jrckkyy/article/details/39006633