4399 mini games, a childhood favorite: crawling 4399 mini games with Python

Preface

The text and pictures in this article come from the Internet and are for learning and exchange only; they have no commercial use. If you have any concerns, please contact us so we can handle them.

Basic environment configuration

  • python 3.6
  • pycharm
  • requests
  • parsel

The required modules (requests and parsel) can be installed with pip.

'''
Action games: http://www.4399.com/flash_fl/2_1.htm
Sports games: http://www.4399.com/flash_fl/3_1.htm
Puzzle games: http://www.4399.com/flash_fl/5_1.htm
Shooting games: http://www.4399.com/flash_fl/4_1.htm
...
'''

 

import csv
from urllib.parse import urljoin

import parsel
import requests

# Open the CSV in append mode; utf-8-sig writes a BOM so Excel detects the encoding
f = open('4399游戏.csv', mode='a', encoding='utf-8-sig', newline='')

# The field names must match the dict keys written below
csv_writer = csv.DictWriter(f, fieldnames=['Game Address', 'Game Name'])
csv_writer.writeheader()
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
}
# Pages 1 to 105 of the puzzle category (ID 5)
for page in range(1, 106):
    url = 'http://www.4399.com/flash_fl/5_{}.htm'.format(page)
    response = requests.get(url=url, headers=headers)
    # Let requests guess the page encoding so Chinese titles decode correctly
    response.encoding = response.apparent_encoding
    selector = parsel.Selector(response.text)
    lis = selector.css('#classic li')
    for li in lis:
        data_url = li.css('a::attr(href)').get()
        if not data_url:
            continue  # skip list items that contain no link
        # urljoin handles both site-relative and already absolute links
        new_url = urljoin('http://www.4399.com', data_url)
        title = li.css('img::attr(alt)').get()
        dit = {'Game Address': new_url, 'Game Name': title}
        print(new_url, title)
        csv_writer.writerow(dit)
f.close()
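
The link values read from the list items may be site-relative or already absolute. As a minimal sketch, assuming both forms can occur, the standard library's urllib.parse.urljoin normalizes them to one absolute form. The helper name to_absolute and the sample href values are made up for illustration and are not taken from the site.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.4399.com'


def to_absolute(href, base=BASE_URL):
    """Return an absolute URL for a link value, or None if the value is empty."""
    if not href:
        return None
    # urljoin keeps absolute URLs unchanged and resolves relative ones against base
    return urljoin(base, href.strip())


# Hypothetical href values, for illustration only
print(to_absolute('/flash/12345.htm'))
print(to_absolute('http://www.4399.com/flash/12345.htm'))
```

Both calls print the same absolute address, so duplicate domain prefixes cannot end up in the CSV.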

There is still much more data on the site; only 32,548 records were saved here.
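
To collect the other categories, the list-page address can be built from a category ID and a page number, following the URL pattern in the category listing above. This is only a sketch: the function names are hypothetical, and the number of pages in each category is not given in this article, so last_page has to be supplied by the caller.

```python
# Category IDs as they appear in the list-page URLs quoted in this article
CATEGORY_IDS = {'action': 2, 'sports': 3, 'shooting': 4, 'puzzle': 5}


def list_page_url(category_id, page):
    """Build the address of one list page, e.g. .../flash_fl/5_1.htm."""
    return 'http://www.4399.com/flash_fl/{}_{}.htm'.format(category_id, page)


def iter_list_urls(category_id, last_page):
    """Yield list-page addresses from page 1 up to and including last_page."""
    for page in range(1, last_page + 1):
        yield list_page_url(category_id, page)


# last_page=3 is an arbitrary value for the demonstration
for url in iter_list_urls(CATEGORY_IDS['action'], last_page=3):
    print(url)
```

Each generated address can then be fetched and parsed in the same way as the puzzle pages in the script.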

Origin blog.csdn.net/weixin_43881394/article/details/109051056