Python crawler: scraping Lagou recruitment data with Selenium, have you learned it?

1. Basic ideas

Target URL: https://www.lagou.com/

The crawler is implemented with Selenium: type in any keyword, such as "Python data analysis", click search, scrape the matching job postings, and save them to Excel.

The results span 30 pages with 15 job postings per page, so a full crawl yields up to 30 × 15 = 450 records.

2. Selenium crawler

```python
from selenium import webdriver
import time
import logging
import random
import openpyxl

wb = openpyxl.Workbook()
sheet = wb.active
sheet.append(['job_name', 'company_name', 'city', 'industry', 'salary',
              'experience_edu', 'welfare', 'job_label'])
logging.basicConfig(level=logging.INFO, format='%(asctime)s-%(levelname)s: %(message)s')


def search_product(key_word):
    browser.find_element_by_id('cboxClose').click()  # close the city-selection window
    time.sleep(2)
    browser.find_element_by_id('search_input').send_keys(key_word)  # locate the search box and enter the keyword
    browser.find_element_by_class_name('search_button').click()  # click search
    browser.maximize_window()  # maximize the window
    time.sleep(2)
    browser.find_element_by_class_name('body-btn').click()  # close the red-envelope pop-up
    time.sleep(random.randint(1, 3))
    browser.execute_script("scroll(0,3000)")  # scroll down the page
    get_data()  # scrape the first page of data
    # After each page is scraped, click "next page"; sleep between pages to
    # throttle the crawl and avoid being served a captcha by anti-crawling measures
    for i in range(29):
        browser.find_element_by_class_name('pager_next').click()
        time.sleep(1)
        browser.execute_script("scroll(0,3000)")
        get_data()
        time.sleep(random.randint(3, 5))


def get_data():
    items = browser.find_elements_by_xpath('//*[@id="s_position_list"]/ul/li')
    for item in items:
        job_name = item.find_element_by_xpath('.//div[@class="p_top"]/a/h3').text
        company_name = item.find_element_by_xpath('.//div[@class="company_name"]').text
        city = item.find_element_by_xpath('.//div[@class="p_top"]/a/span[@class="add"]/em').text
        industry = item.find_element_by_xpath('.//div[@class="industry"]').text
        salary = item.find_element_by_xpath('.//span[@class="money"]').text
        experience_edu = item.find_element_by_xpath('.//div[@class="p_bot"]/div[@class="li_b_l"]').text
        welfare = item.find_element_by_xpath('.//div[@class="li_b_r"]').text
        job_label = item.find_element_by_xpath('.//div[@class="list_item_bot"]/div[@class="li_b_l"]').text
        data = f'{job_name},{company_name},{city},{industry},{salary},{experience_edu},{welfare},{job_label}'
        logging.info(data)
        sheet.append([job_name, company_name, city, industry, salary,
                      experience_edu, welfare, job_label])


def main():
    browser.get('https://www.lagou.com/')
    time.sleep(random.randint(1, 3))
    search_product(keyword)
    wb.save('job_info.xlsx')


if __name__ == '__main__':
    keyword = 'Python 数据分析'
    # path to chromedriver.exe
    chrome_driver = r'D:\python\pycharm2020\chromedriver.exe'
    options = webdriver.ChromeOptions()
    # hide the "Chrome is being controlled by automated test software" banner
    options.add_experimental_option('useAutomationExtension', False)
    options.add_experimental_option('excludeSwitches', ['enable-automation'])
    browser = webdriver.Chrome(options=options, executable_path=chrome_driver)
    main()
    browser.quit()
```
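
A word of caution if you reproduce this: the `find_element_by_*` helpers and the `executable_path` argument belong to the Selenium 3 API and were removed in Selenium 4. A minimal sketch of the same setup and lookups in the Selenium 4 style, reusing the locators and driver path from the script above:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
options.add_experimental_option('useAutomationExtension', False)
options.add_experimental_option('excludeSwitches', ['enable-automation'])

# Selenium 4 takes the driver path via a Service object instead of executable_path
browser = webdriver.Chrome(service=Service(r'D:\python\pycharm2020\chromedriver.exe'),
                           options=options)

browser.get('https://www.lagou.com/')
browser.find_element(By.ID, 'search_input').send_keys('Python 数据分析')  # was find_element_by_id
browser.find_element(By.CLASS_NAME, 'search_button').click()  # was find_element_by_class_name
```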

Run the script: the crawler scrapes the postings page by page, logs each record as it goes, and saves everything to job_info.xlsx.
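
One caveat about the fixed `time.sleep()` delays: they either waste time or break when the page loads slowly. A sturdier alternative, sketched below as my suggestion rather than part of the original script, uses explicit waits on the same locators:

```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(browser, 10)  # give each condition up to 10 seconds

# wait until the job list is actually rendered before scraping it
items = wait.until(EC.presence_of_all_elements_located(
    (By.XPATH, '//*[@id="s_position_list"]/ul/li')))

# wait until the "next page" button is clickable before paging
wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'pager_next'))).click()
```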

3. View the data
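
With the crawl finished, a quick way to inspect the data is to load job_info.xlsx into pandas. This is a sketch, assuming pandas and openpyxl are installed; the column names come from the header row written by the crawler:

```python
import pandas as pd

df = pd.read_excel('job_info.xlsx')  # openpyxl serves as the .xlsx engine

print(df.shape)                   # at most (450, 8): 30 pages x 15 postings, 8 columns
print(df.head())                  # first few postings
print(df['city'].value_counts())  # number of postings per city
```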

Origin: https://blog.csdn.net/weixin_45820912/article/details/108378392