Taobao product information: a targeted crawler example

Function description:
1) Goal: fetch Taobao search result pages and extract each product's name and price.
2) Key points to understand: the Taobao search interface and how pagination is handled.
3) Technical approach: requests + re.
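
As a rough sketch of the pagination handling mentioned in item 2, the snippet below builds one URL per results page. It assumes, as the full program further down does, that the `s` query parameter is the item offset and that each results page holds 44 items. The helper name `build_search_urls` and the constant `ITEMS_PER_PAGE` are hypothetical and not part of the original code.

```python
from urllib.parse import urlencode

# Assumption taken from the original program, which uses 44 * i as the offset.
ITEMS_PER_PAGE = 44


def build_search_urls(keyword, depth):
    """Return one search URL per results page (hypothetical helper)."""
    base = 'https://s.taobao.com/search'
    urls = []
    for page in range(depth):
        # urlencode percent-encodes the keyword, so non-ASCII terms are safe.
        query = urlencode({'q': keyword, 's': ITEMS_PER_PAGE * page})
        urls.append(base + '?' + query)
    return urls


for url in build_search_urls('书包', 2):
    print(url)
```

Building the query with `urlencode` rather than string concatenation keeps the keyword correctly escaped; the offset logic is otherwise the same as in `main()`.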


import requests
import re

"""
1. Submit the product search request and loop over the result pages
2. For each page, extract the product names and prices
3. Print the results to the screen
"""


def getHtmlText(url):
    # Fetch a page and return its text, or an empty string if the request fails.
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ''


def parsePage(ilt, html):
    try:
        # Prices appear in the page as "view_price":"129.00"
        plt = re.findall(r'\"view_price\"\:\"[\d\.]*\"', html)
        # *? is a non-greedy (minimal) match, so each title stops at the next quote
        tlt = re.findall(r'\"raw_title\"\:\".*?\"', html)
        for i in range(len(plt)):
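            # Take the value after the first colon and drop the surrounding quotes
            # (avoids calling eval on text that came from the network)
            price = plt[i].split(':', 1)[1].strip('"')
            title = tlt[i].split(':', 1)[1].strip('"')
            ilt.append([price, title])
    except Exception:
        print("Failed to parse page")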


def printGoodList(ilt):
    # Column template: index, price, product name
    tplt = "{:4}\t{:8}\t{:16}"
    print(tplt.format('No.', 'Price', 'Product name'))
    count = 0
    for g in ilt:
        count = count + 1
        print(tplt.format(count, g[0], g[1]))


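def main():
    goods = '书包'  # search keyword ("schoolbag")
    depth = 2  # number of result pages to crawl
    start_url = 'https://s.taobao.com/search?q=' + goods
    info_list = []
    for i in range(depth):
        try:
            # s is the item offset; the code assumes 44 items per results page
            url = start_url + '&s=' + str(44 * i)
            html = getHtmlText(url)
            parsePage(info_list, html)
        except:
            # Skip a page that fails and carry on with the next one
            continue
    printGoodList(info_list)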


main()
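
To see what the two regular expressions in `parsePage` are looking for, here is a small self-contained check against a made-up fragment. The sample string is invented for illustration and only mimics the `"view_price":"..."` and `"raw_title":"..."` pairs that the patterns target; real page content may well differ. The function name `extract_goods` is hypothetical, and the capture-group variant of the patterns is a swapped-in alternative rather than the original approach.

```python
import re

# Invented sample that resembles the key/value pairs the patterns target.
sample_html = (
    '{"raw_title":"Canvas backpack","view_price":"129.00"},'
    '{"raw_title":"Kids schoolbag: blue","view_price":"59.90"}'
)


def extract_goods(html):
    """Return (price, title) pairs found in html (hypothetical helper)."""
    # A capture group returns only the value, so no split(':') or eval is needed.
    prices = re.findall(r'"view_price":"([\d.]*)"', html)
    # .*? is non-greedy: each title stops at the first closing quote.
    titles = re.findall(r'"raw_title":"(.*?)"', html)
    # zip stops at the shorter list, so mismatched counts cannot raise IndexError.
    return list(zip(prices, titles))


for price, title in extract_goods(sample_html):
    print(price, title)
```

The second sample title contains a colon on purpose: with capture groups the whole title is kept, whereas splitting the raw match on every `:` would cut it short.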

 


Origin www.cnblogs.com/wangyue0925/p/11231898.html