I. Introduction
This article briefly introduces how to scrape page data with puppeteer.
II. Installation
npm install puppeteer --save-dev
npm install typescript --save-dev
III. Examples
(A) A first example (see the code below)
import { launch } from 'puppeteer';

async function maoyan_board_run() {
  let browser = await launch({
    ignoreHTTPSErrors: true,
    headless: true,
    executablePath: 'D:\\wangxiao\\chrome-win\\chrome-win\\chrome.exe',
    args: ['--start-maximized']
  });
  const page = await browser.newPage();
  await page.setViewport({ width: 1980, height: 1080 });
  await page.goto('https://maoyan.com/board', { waitUntil: 'load' });
  console.log(await page.title());
  await browser.close();
}

maoyan_board_run();
Running this prints the title of the current page. Here is what the code does:
- launch() starts a browser. Pay attention to its parameters: headless: true runs in headless mode, without opening a browser window; --start-maximized maximizes the browser window; executablePath specifies the path to the Chromium executable.
- browser.newPage() opens a new page.
- page.setViewport() sets the width and height of the viewport.
- page.goto() opens a URL; waitUntil: 'load' waits until the page's load event has fired.
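The hard-coded executablePath above only works on one machine. As a minimal sketch, the options could be built by a small helper that leaves executablePath out when no path is given, in which case puppeteer falls back to the browser it downloaded during installation. The helper name buildLaunchOptions and the BoardLaunchOptions interface are assumptions made for this illustration; they are not part of puppeteer.

```typescript
// Local shape mirroring only the launch options used in this article.
// It is an illustrative type, not puppeteer's own options type.
interface BoardLaunchOptions {
  ignoreHTTPSErrors: boolean;
  headless: boolean;
  args: string[];
  executablePath?: string;
}

// Hypothetical helper: builds the object that would be passed to launch().
// When no path is given, executablePath is omitted entirely.
function buildLaunchOptions(executablePath?: string): BoardLaunchOptions {
  const options: BoardLaunchOptions = {
    ignoreHTTPSErrors: true,
    headless: true,
    args: ['--start-maximized'],
  };
  if (executablePath !== undefined && executablePath !== '') {
    options.executablePath = executablePath;
  }
  return options;
}

// Options for the bundled browser and for a custom Chromium build.
const bundled = buildLaunchOptions();
const custom = buildLaunchOptions('D:\\wangxiao\\chrome-win\\chrome-win\\chrome.exe');
```

The result would then be passed as launch(buildLaunchOptions(somePath)), so the path can come from configuration instead of the source code.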
(B) Analyzing the page selectors
Let's analyze this page. First locate the popular list: each movie's name, starring cast, and release time sit together in one list entry, so once we can extract one entry, all the others can be extracted the same way.
Let's start with one field, the rank:
const movie_bank = 'i[class*=board-index]';
Based on the page's element structure, we read the value inside the tag (the usage of $$eval was covered earlier, so it is not repeated here):
const banks = await page.$$eval(movie_bank, list =>
list.map(n => n.innerHTML)
);
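The selector i[class*=board-index] uses the CSS substring attribute match: it selects every i element whose class attribute contains the text board-index anywhere. The sketch below mimics that rule on plain strings to show which class values would match. The function name classContains and the sample class values are made up for illustration and are not taken from the page.

```typescript
// Mimics the CSS rule [class*=needle]: true when the raw class attribute
// contains the needle anywhere as a substring. An empty needle matches
// nothing, as in CSS.
function classContains(classAttr: string | null, needle: string): boolean {
  return classAttr !== null && needle !== '' && classAttr.indexOf(needle) !== -1;
}

// Hypothetical class attribute values; null stands for a missing attribute.
const samples: (string | null)[] = [
  'board-index board-index-1',
  'board-index-10',
  'movie-item-info',
  null,
];

// Keeps only the values that i[class*=board-index] would accept.
const matched = samples.filter(c => classContains(c, 'board-index'));
```

Because the match is a plain substring test, one selector covers every numbered variant of the class, which is why a single $$eval call can return all ranks at once.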
The other fields are extracted by following the same pattern. The complete code is as follows:
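import { launch } from 'puppeteer';

// Popular word-of-mouth board: rank
const movie_bank = 'i[class*=board-index]';
// Popular word-of-mouth board: name
const movie_name = '.movie-item-info .name a';
// Popular word-of-mouth board: starring
const movie_star = '.movie-item-info .star';
// Popular word-of-mouth board: release time
const movie_releasetime = '.movie-item-info .releasetime';
// Popular word-of-mouth board: image
const board_lists_images = '.board-wrapper dd .image-link .board-img';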
async function maoyan_board_run() {
  let browser = await launch({
    ignoreHTTPSErrors: true,
    headless: true,
    executablePath: 'D:\\wangxiao\\chrome-win\\chrome-win\\chrome.exe',
    args: ['--start-maximized']
  });
  const page = await browser.newPage();
  await page.setViewport({ width: 1980, height: 1080 });
  await page.goto('https://maoyan.com/board', { waitUntil: 'load' });
  // await autoScroll(page);
  // Count the rank elements to know how many entries the board has.
  const length = await page.evaluate((movie_bank) => {
    return document.querySelectorAll(movie_bank).length;
  }, movie_bank);
  const banks = await page.$$eval(movie_bank, list =>
    list.map(n => n.innerHTML)
  );
  const names = await page.$$eval(movie_name, list =>
    list.map(n => n.getAttribute('title'))
  );
  // Strip line breaks and all other whitespace from the starring text.
  const stars = await page.$$eval(movie_star, list =>
    list.map(n => n.innerHTML.replace(/\n/g, "").replace(/\s/g, ""))
  );
  const releasetimes = await page.$$eval(movie_releasetime, list =>
    list.map(n => n.innerHTML)
  );
  // Combine the parallel arrays into one record per movie.
  let data = [];
  for (let i = 0; i < length; i++) {
    data.push({
      bank: banks[i],
      name: names[i],
      star: stars[i],
      releasetime: releasetimes[i]
    });
  }
  // page.waitFor() is deprecated and absent from newer puppeteer releases;
  // a plain timer keeps the original 10-second pause.
  await new Promise(resolve => setTimeout(resolve, 10000));
  console.log(data);
  await browser.close();
}

maoyan_board_run();
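The loop above assumes that all four arrays have the same length. If one selector matches fewer elements than the others, some fields come out as undefined. As a rough sketch, the combining step could be pulled into a helper that stops at the shortest array and cleans the starring text along the way. The names BoardEntry, stripWhitespace, and toEntries are made up for this example, and the sample values are invented rather than scraped from the site.

```typescript
// One record of the board, using the article's field names.
interface BoardEntry {
  bank: string;        // rank on the board
  name: string | null; // title attribute; null when the attribute is missing
  star: string;
  releasetime: string;
}

// Removes every whitespace character. \s already covers \n, so a single
// replace does the same job as the two chained replaces in the article.
function stripWhitespace(text: string): string {
  return text.replace(/\s/g, '');
}

// Hypothetical helper: zips the parallel arrays into records, stopping at
// the shortest array so that no record contains undefined fields.
function toEntries(
  banks: string[],
  names: (string | null)[],
  stars: string[],
  releasetimes: string[]
): BoardEntry[] {
  const length = Math.min(banks.length, names.length, stars.length, releasetimes.length);
  const entries: BoardEntry[] = [];
  for (let i = 0; i < length; i++) {
    entries.push({
      bank: banks[i],
      name: names[i],
      star: stripWhitespace(stars[i]),
      releasetime: releasetimes[i],
    });
  }
  return entries;
}

// Invented sample values; the last array is deliberately one item short.
const entries = toEntries(
  ['1', '2'],
  ['Movie A', 'Movie B'],
  ['\n  Starring: X, Y\n', ' Starring: Z '],
  ['Release: 2020-01-01']
);
```

Stopping at the shortest array hides the mismatch instead of reporting it, so in a real scraper it may be better to log a warning when the lengths differ.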