Python web crawler in practice: scraping Douban's 250 most popular movies

The general approach: request the Douban list page URL and fetch the page's HTML source.

Then use BeautifulSoup to extract the fields we want.

Finally, save the collected data to an Excel file.
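
Douban spreads the Top 250 list over 10 pages of 25 movies each, and the page is chosen with the start query parameter (this is what page * 25 and range(0, 10) do in the code). A minimal sketch that builds those page URLs with the standard library, assuming the same URL format the code uses; page_urls is a hypothetical helper, not part of the original project.

```python
from urllib.parse import urlencode

BASE_URL = "https://movie.douban.com/top250"
PAGE_SIZE = 25   # movies per page, matching page * 25 in the script
PAGE_COUNT = 10  # 10 pages x 25 movies = 250 movies


def page_urls(base=BASE_URL, page_size=PAGE_SIZE, pages=PAGE_COUNT):
    """Return one list-page URL per start offset (0, 25, ..., 225)."""
    return [
        base + "?" + urlencode({"start": page * page_size, "filter": ""})
        for page in range(pages)
    ]


if __name__ == "__main__":
    for url in page_urls():
        print(url)
```

Using urlencode instead of string concatenation keeps the query string correctly escaped if more parameters are added later.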

Project source code

import requests
from bs4 import BeautifulSoup
import xlwt

# Douban may reject requests that lack a browser-like User-Agent header
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def request_douban(url):
    """Return the page HTML, or None if the request fails."""
    try:
        response = requests.get(url, headers=HEADERS, timeout=10)
        if response.status_code == 200:
            return response.text
    except requests.RequestException:
        pass
    return None


# xlwt writes the legacy .xls format
book = xlwt.Workbook(encoding='utf-8', style_compression=0)

# Sheet name: "Douban Movies Top250"
sheet = book.add_sheet('豆瓣电影Top250', cell_overwrite_ok=True)
# Header row: name, poster image, rank, score, director/cast, one-line quote
sheet.write(0, 0, '名称')
sheet.write(0, 1, '图片')
sheet.write(0, 2, '排名')
sheet.write(0, 3, '评分')
sheet.write(0, 4, '作者')
sheet.write(0, 5, '简介')

n = 1  # next row to write (row 0 holds the header)


def save_to_excel(soup):
    global n
    items = soup.find(class_='grid_view').find_all('li')

    for item in items:
        item_name = item.find(class_='title').string
        item_img = item.find('a').find('img').get('src')
        item_index = item.find('em').string  # the rank sits in an <em> tag
        item_score = item.find(class_='rating_num').string
        item_author = item.find('p').text.strip()  # director/cast line
        # Some movies have no one-line quote; default to an empty string
        inq = item.find(class_='inq')
        item_intr = inq.string if inq is not None else ''

        # Console log: "Scraped movie: rank | name | score | quote"
        print('爬取电影:', item_index, '|', item_name, '|', item_score, '|', item_intr)

        sheet.write(n, 0, item_name)
        sheet.write(n, 1, item_img)
        sheet.write(n, 2, item_index)
        sheet.write(n, 3, item_score)
        sheet.write(n, 4, item_author)
        sheet.write(n, 5, item_intr)

        n += 1


def main(page):
    url = 'https://movie.douban.com/top250?start=' + str(page * 25) + '&filter='
    html = request_douban(url)
    if html is None:
        print('Failed to fetch', url)
        return
    # 'lxml' requires the lxml package; 'html.parser' works without it
    soup = BeautifulSoup(html, 'lxml')
    save_to_excel(soup)


if __name__ == '__main__':
    for i in range(0, 10):
        main(i)
    # .xls, because xlwt cannot write .xlsx files
    book.save('Douban Top 250 films.xls')
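
Because the scraper depends on the live page, it can help to check the extraction logic offline. A minimal sketch that swaps BeautifulSoup for the standard library's html.parser so it runs without third-party packages. SAMPLE_HTML is a hypothetical, simplified snippet modeled on the classes the code looks up (title, rating_num, inq, plus the rank in an <em> tag), not a copy of the real page; Top250Parser and parse_items are illustrative names. As with find(), only the first match of each field inside an item is kept, and a movie without a quote gets an empty string.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup; the real Douban page contains much more.
SAMPLE_HTML = """
<ol class="grid_view">
  <li>
    <div class="pic"><em class="">1</em><a href="#"><img src="poster1.jpg"></a></div>
    <span class="title">Movie A</span>
    <span class="rating_num">9.7</span>
    <span class="inq">Quote A</span>
  </li>
  <li>
    <div class="pic"><em class="">2</em><a href="#"><img src="poster2.jpg"></a></div>
    <span class="title">Movie B</span>
    <span class="rating_num">9.6</span>
  </li>
</ol>
"""

# CSS class -> field name, mirroring the script's find(class_=...) calls
CLASS_FIELDS = {"title": "title", "rating_num": "score", "inq": "intro"}


class Top250Parser(HTMLParser):
    """Collect one dict per <li>, keeping the first match of each field."""

    def __init__(self):
        super().__init__()
        self.items = []
        self.current = None    # fields of the <li> being parsed
        self.field = None      # field whose text is being read
        self.field_tag = None  # tag whose end closes that field

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self.current = dict.fromkeys(("index", "title", "img", "score", "intro"), "")
            return
        if self.current is None:
            return
        if tag == "img":
            if not self.current["img"]:
                self.current["img"] = attrs.get("src") or ""
            return
        name = "index" if tag == "em" else None
        for cls in (attrs.get("class") or "").split():
            name = CLASS_FIELDS.get(cls, name)
        if name and not self.current[name]:
            self.field, self.field_tag = name, tag

    def handle_data(self, data):
        if self.field:
            self.current[self.field] += data

    def handle_endtag(self, tag):
        if tag == self.field_tag:
            self.field = self.field_tag = None
        if tag == "li" and self.current is not None:
            self.items.append(self.current)
            self.current = None


def parse_items(html):
    parser = Top250Parser()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    for item in parse_items(SAMPLE_HTML):
        print(item["index"], item["title"], item["score"], item["intro"] or "(no quote)")
```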

Screenshot: the script running

Screenshot: the generated Excel file
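
xlwt only writes the legacy .xls format, so Excel typically warns about a mismatch if the file is saved under an .xlsx name. If you would rather avoid the xlwt dependency altogether, here is a minimal sketch using the standard library csv module instead. It assumes the same six columns as the code's header row; write_rows, to_csv_text and the sample row are hypothetical. The utf-8-sig encoding adds a byte-order mark, which Excel generally uses to recognize UTF-8 and display the Chinese text correctly.

```python
import csv
import io

# Same column order as the script's header row:
# name, poster image, rank, score, director/cast, one-line quote
HEADER = ["名称", "图片", "排名", "评分", "作者", "简介"]


def write_rows(stream, rows):
    """Write the header plus one line per movie to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow(HEADER)
    writer.writerows(rows)


def to_csv_text(rows):
    """Return the CSV as a string (handy for checking the output)."""
    buf = io.StringIO()
    write_rows(buf, rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical sample row; real values would come from the scraper
    rows = [["Movie A", "poster1.jpg", "1", "9.7", "Director: X", "Quote A"]]
    # newline="" lets the csv module control line endings
    with open("douban_top250.csv", "w", newline="", encoding="utf-8-sig") as f:
        write_rows(f, rows)
```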

Original post: www.cnblogs.com/xiaoyiq/p/11386828.html