Writing a simple crawler in Node with http and cheerio

Category: Other | Posted: 2018-11-08 13:51:19

First, run cnpm init to create a package.json.

Next, install the cheerio module: cnpm install --save cheerio

Then write the code:

const cheerio = require('cheerio');
const http = require('http');
const fs = require('fs');

// Search results URL; the page number is appended after "p="
const url = 'http://so.8264.com/cse/search?q=2&s=9963133823733045431&p=';
const page = 1;

http.get(url + page, function (res) {
    let html = ''; // accumulates the full HTML of the requested page
    res.setEncoding('utf-8'); // decode chunks as UTF-8 so Chinese text is not garbled
    // The 'data' event fires once for each chunk of the response body
    res.on('data', function (chunk) {
        html += chunk;
    });
    // The 'end' event fires once the whole page has been received
    res.on('end', function () {
        // Parse the HTML with cheerio
        const $ = cheerio.load(html, {
            decodeEntities: false
        });
        const results = [];
        $('.result').each(function () {
            const item = $(this);
            results.push({
                title: item.find('.c-title').text().trim(),
                // attr() returns undefined when the attribute is missing, so fall back to ''
                src: (item.find('a').attr('href') || '').trim(),
                img: (item.find('img').attr('src') || '').trim(),
                describe: item.find('.c-abstract').text().trim()
            });
        });

        // Write the results to output.txt as JSON
        const writerStream = fs.createWriteStream('output.txt');
        writerStream.write(JSON.stringify(results), 'UTF8');
        writerStream.end();
    });
}).on('error', function (err) {
    console.log(err);
});
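
The script requests only page 1, but the URL ends in p=, so other result pages can presumably be reached by changing the number appended to it. A minimal sketch of building the URLs for a range of pages follows. buildPageUrls is a hypothetical helper, not part of the original script, and it assumes the site numbers its pages with consecutive integers.

```javascript
// Hypothetical helper: list the URLs for result pages first..last (inclusive).
// Assumes the page number is simply appended to the base URL, as in the script.
function buildPageUrls(baseUrl, first, last) {
    const urls = [];
    for (let page = first; page <= last; page++) {
        urls.push(baseUrl + page);
    }
    return urls;
}

const baseUrl = 'http://so.8264.com/cse/search?q=2&s=9963133823733045431&p=';
const pageUrls = buildPageUrls(baseUrl, 1, 3);
console.log(pageUrls.join('\n'));
```

Each URL could then be passed to http.get in turn, ideally with a pause between requests so the target site is not flooded.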

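Because the results are written in a single call, a write stream is not strictly needed. As an alternative sketch, fs.writeFileSync can write the whole JSON string at once. saveResults and the sample record are made up for illustration, and the file goes to the OS temp directory rather than output.txt so the example leaves the working directory untouched.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: serialize the records as indented JSON and write them in one call.
function saveResults(filePath, results) {
    fs.writeFileSync(filePath, JSON.stringify(results, null, 2), 'utf8');
}

// Made-up sample record with the same fields the script collects.
const sample = [
    { title: 'Example title', src: 'http://example.com/', img: '', describe: 'Example description' }
];
const outFile = path.join(os.tmpdir(), 'crawler-output-example.json');

saveResults(outFile, sample);

// Read the file back to confirm it parses to the same data.
const readBack = JSON.parse(fs.readFileSync(outFile, 'utf8'));
console.log(readBack[0].title);
```

The indentation argument to JSON.stringify only makes the file easier to read by eye; it parses back to the same data either way.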

Reposted from www.cnblogs.com/lmyt/p/9928492.html
