A First Try at Python Crawlers: Crawling Baidu Street View Images with Python

I had written some automated crawlers in .NET before. After hearing experts say that crawlers are easier to write in Python, I couldn't help taking the time to try one myself: using Python to crawl Baidu Street View images.

In the past two days Wuhan welcomed a distinguished guest, German Chancellor Angela Merkel, and the Wuhan Yangtze River Bridge enjoyed a wave of attention. So today, taking the Wuhan Yangtze River Bridge as an example, I will use Python to crawl the Street View images of this location.

Baidu Street View URL analysis

With an HTTP packet-capture tool, you can easily capture the HTTP request data generated while browsing Baidu Street View. Shown below is one image tile of a Street View point at the Yangtze River Bridge:

The URL request corresponding to this tile is:

After analyzing the URL in detail and running some simulated tests, the following preliminary conclusions can be drawn.

The key parameters required to request an image tile are:

① sid: identifies a specific Street View point;

② pos: the row and column coordinates of the tile within the full panorama;

③ z: the zoom level of the Street View tile.

A single Street View point's panorama can be produced at multiple zoom levels; each level has a different number of tiles, and the tiles are distinguished by their row and column numbers in the pos coordinate.
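The tile-count rule can be sketched as a small helper. The exact level range and doubling rule here are inferred from my tests, so treat the formula as an assumption:

```python
def tile_grid(zoom):
    # Inferred tiling rule: zoom level 1 is a single tile; above that,
    # rows = 2^(zoom-2) and columns = 2^(zoom-1).
    rows = 2 ** (zoom - 2) if zoom > 1 else 1
    cols = 2 ** (zoom - 1)
    return rows, cols

for z in range(1, 6):
    print(z, tile_grid(z))
```

For example, zoom level 5 yields an 8 x 16 grid of tiles.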

Once the tiling rules of Baidu Street View images are clear, you can start writing the code.

Python source code

Requirement: in one run, grab all tiles at every zoom level for 10 consecutive Street View points.

The source code is as follows:

import urllib2
import os

def download(url, path, name):
    # fetch one tile and save it under path + name
    conn = urllib2.urlopen(url)
    if not os.path.exists(path):
        os.makedirs(path)
    f = open(path + name, 'wb')
    f.write(conn.read())
    f.close()

# panolist.txt holds one base tile URL (carrying the sid parameter) per line
fp = open("E:\\Workspaces\\Python\\panolist.txt", "r")
for line in fp.readlines():
    # strip the leading quote and the trailing quote plus newline
    line = line[1:-2]
    for zoom in range(1, 6):
        # zoom level 1 is a single tile; above that there are
        # 2^(zoom-2) rows and 2^(zoom-1) columns of tiles
        row_max = pow(2, zoom - 2) if zoom > 1 else 1
        col_max = pow(2, zoom - 1)
        for row in range(row_max):
            for col in range(col_max):
                z = str(zoom)
                y = str(row)
                x = str(col)
                url = line + "&pos=" + y + "_" + x + "&z=" + z
                # store as ...\pano\<sid>\<zoom>\<row>_<col>.jpg
                path = "E:\\Workspaces\\Python\\pano\\" + url.split('&')[1].split('=')[1] + "\\" + z + "\\"
                name = y + "_" + x + ".jpg"
                print url
                print name
                download(url, path, name)
fp.close()

The crawl results are shown below, stored locally according to the rules analyzed above. You can see that at each zoom level, all the tiles stitched together form exactly one complete panorama.
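The stitching arithmetic can be sketched as follows, assuming square tiles of a fixed edge length (512 pixels is an assumption for illustration, not confirmed from the captures): tile (row, col) is pasted at pixel offset (col * tile, row * tile), and the panorama size follows from the grid dimensions.

```python
TILE = 512  # assumed tile edge length in pixels (illustrative)

def paste_offset(row, col, tile=TILE):
    # upper-left pixel where tile (row, col) lands in the stitched panorama
    return (col * tile, row * tile)

def panorama_size(zoom, tile=TILE):
    # grid dimensions follow the tiling rule analyzed above
    rows = 2 ** (zoom - 2) if zoom > 1 else 1
    cols = 2 ** (zoom - 1)
    return (cols * tile, rows * tile)  # (width, height)

print(panorama_size(3), paste_offset(1, 2))
```

With a library such as Pillow, you would create a blank image of `panorama_size(z)` and paste each downloaded tile at its `paste_offset`.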

Summary

① Python really is a convenient language: installation and configuration are straightforward, and many IDEs support it. This was my first time using it; whenever I ran into a problem I simply consulted the Python language manual, and I finished this code sample in about half a day.

② For crawler development, Python's related resources are extremely rich, making it a sharp tool for the job.

The code above is a brief implementation of batch-crawling Baidu Street View image tiles. For heavy use, I recommend further hardening, such as making requests look like they come from a browser; otherwise the server can easily detect the resource requests as coming from a crawler and block them.
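As one sketch of that hardening, here is how a browser-like User-Agent header can be attached. This is shown with Python 3's urllib.request (the Python 2 urllib2.Request API is analogous), and the URL and header value are illustrative:

```python
import urllib.request

def make_tile_request(url):
    # attach a browser-like User-Agent so the request
    # looks less like a bare crawler
    return urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    )

req = make_tile_request("http://example.com/tile?sid=xxx&pos=0_0&z=1")
print(req.get_header("User-agent"))
```

Note that urllib normalizes stored header keys (capitalizing only the first word), which is why the lookup key is `"User-agent"`.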

 

Appendix: Python crawler primer (1), urllib and urllib2: https://www.cnblogs.com/derek1184405959/p/8448875.html


Origin www.cnblogs.com/hans_gis/p/11487228.html