Python web crawler in practice: scraping Douban's 250 most popular movies
The main idea: request the Douban link to get the page source code.
Then use BeautifulSoup to extract the fields we want.
Finally, store the extracted data in an Excel file.
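The extraction step can be tried offline before touching the network. The sketch below runs BeautifulSoup over a small hand-written HTML string that only imitates the structure of the list page. The markup, the sample values, and the helper name parse_movies are assumptions for illustration, and it uses the built-in 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modelled on two entries of the list page.
# The values are made-up sample data, not scraped results.
SAMPLE_HTML = """
<ol class="grid_view">
  <li>
    <div class="pic"><em class="">1</em>
      <a href="#"><img src="https://example.com/poster1.jpg"></a>
    </div>
    <span class="title">Sample Movie A</span>
    <span class="rating_num">9.0</span>
    <span class="inq">A one-line summary.</span>
  </li>
  <li>
    <div class="pic"><em class="">2</em>
      <a href="#"><img src="https://example.com/poster2.jpg"></a>
    </div>
    <span class="title">Sample Movie B</span>
    <span class="rating_num">8.5</span>
  </li>
</ol>
"""


def parse_movies(html):
    """Return one dict per <li> entry found inside the 'grid_view' list."""
    soup = BeautifulSoup(html, 'html.parser')
    movies = []
    for item in soup.find(class_='grid_view').find_all('li'):
        # Not every entry has a summary, so look it up before reading it.
        inq = item.find(class_='inq')
        movies.append({
            'rank': item.find('em').string,
            'name': item.find(class_='title').string,
            'img': item.find('a').find('img').get('src'),
            'score': item.find(class_='rating_num').string,
            'intro': inq.string if inq is not None else '',
        })
    return movies


movies = parse_movies(SAMPLE_HTML)
for movie in movies:
    print(movie['rank'], movie['name'], movie['score'], movie['intro'])
```

Returning plain dictionaries keeps parsing separate from storage, so the same rows could later be written to Excel or anywhere else.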
Project source code
import requests
from bs4 import BeautifulSoup
import xlwt


def request_douban(url):
    """Return the page source for url, or None if the request fails."""
    try:
        # If the site rejects the default client, passing a browser-like
        # User-Agent through the headers argument of requests.get may help.
        response = requests.get(url)
        if response.status_code == 200:
            return response.text
    except requests.RequestException:
        return None
    return None


book = xlwt.Workbook(encoding='utf-8', style_compression=0)

sheet = book.add_sheet('Douban Top 250', cell_overwrite_ok=True)
sheet.write(0, 0, 'Name')
sheet.write(0, 1, 'Image')
sheet.write(0, 2, 'Rank')
sheet.write(0, 3, 'Score')
sheet.write(0, 4, 'Author')
sheet.write(0, 5, 'Summary')

# Next row to write; row 0 holds the header.
n = 1


def save_to_excel(soup):
    global n

    items = soup.find(class_='grid_view').find_all('li')

    for item in items:
        item_name = item.find(class_='title').string
        item_img = item.find('a').find('img').get('src')
        # The rank sits in a tag whose class attribute is empty.
        item_index = item.find(class_='').string
        item_score = item.find(class_='rating_num').string
        item_author = item.find('p').text
        # Some movies have no one-line summary, so default to an empty string.
        item_intr = ''
        if item.find(class_='inq') is not None:
            item_intr = item.find(class_='inq').string

        # print('Scraped movie: ' + item_index + ' | ' + item_name + ' | ' + item_img + ' | ' + item_score + ' | ' + item_author + ' | ' + item_intr)
        print('Scraped movie: ' + item_index + ' | ' + item_name + ' | ' + item_score + ' | ' + item_intr)

        sheet.write(n, 0, item_name)
        sheet.write(n, 1, item_img)
        sheet.write(n, 2, item_index)
        sheet.write(n, 3, item_score)
        sheet.write(n, 4, item_author)
        sheet.write(n, 5, item_intr)

        n = n + 1


def main(page):
    url = 'https://movie.douban.com/top250?start=' + str(page * 25) + '&filter='
    html = request_douban(url)
    if html is None:
        # Skip pages that could not be downloaded.
        return
    # The 'lxml' parser requires the lxml package to be installed.
    soup = BeautifulSoup(html, 'lxml')
    save_to_excel(soup)


if __name__ == '__main__':

    for i in range(0, 10):
        main(i)

    # xlwt writes the legacy .xls format, so the file is saved as .xls rather than .xlsx.
    book.save('Douban Top 250 movies.xls')
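The paging logic in main can be checked on its own. The sketch below only rebuilds the ten page URLs from the page index, using the same "start" offset of 25 entries per page as the code above. The names BASE_URL, PAGE_SIZE and page_url are introduced here for illustration, and no request is sent.

```python
BASE_URL = 'https://movie.douban.com/top250'
PAGE_SIZE = 25  # entries per page, matching page * 25 in main()


def page_url(page):
    """Build the list-page URL for a zero-based page index."""
    return BASE_URL + '?start=' + str(page * PAGE_SIZE) + '&filter='


# Ten pages of 25 entries cover all 250 movies.
urls = [page_url(page) for page in range(10)]

print(urls[0])
print(urls[-1])
```

The first URL starts at offset 0 and the last at offset 225, so together the ten pages cover entries 1 to 250.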
Screenshot of the code running
The generated Excel file