Scraping ZOL's popular phones with a crawler

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/xiao_tommy/article/details/53363207

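In a little over 20 minutes, I crawled ZOL for information on its popular phones: model, features, price, release date, and screen size. I then ran a simple statistic over the latest popular phones. If there is anything else you would like to know, leave me a comment. I have uploaded the code to GitHub, and you are welcome to download it. The repository also contains a demo that gathers statistics on LOL champions and a demo that gathers statistics from the 看看豆 website.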

ZOL official site: http://mobile.zol.com.cn/

My GitHub: https://github.com/XiaoTommy/phpspider

The crawler code

<?php
ini_set("memory_limit", "1024M");
require dirname(__FILE__).'/../core/init.php';

/* Do NOT delete this comment */

$configs = array(
    'name' => 'ZOL',
    'log_show' => false,
    'tasknum' => 1,
    //'save_running_state' => true,
    'domains' => array(
        'detail.zol.com.cn'
    ),
    'scan_urls' => array(
        'http://detail.zol.com.cn/cell_phone_index/subcate57_list_1.html'
    ),
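    'list_url_regexes' => array(
        // Paginated listing pages. \d+ also accepts page numbers of 10 and above,
        // and the dots are escaped so they only match a literal "."
        'http://detail\.zol\.com\.cn/cell_phone_index/subcate57_0_list_1_0_1_2_0_\d+\.html'
    ),
    'content_url_regexes' => array(
        // Detail page of a single phone
        'http://detail\.zol\.com\.cn/cell_phone/index\d+\.shtml',
    ),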
    'max_try' => 5,
    //'export' => array(
    //'type' => 'csv',
    //'file' => PATH_DATA.'/qiushibaike.csv',
    //),
    //'export' => array(
    //'type'  => 'sql',
    //'file'  => PATH_DATA.'/qiushibaike.sql',
    //'table' => 'content',
    //),
    'export' => array(
        'type' => 'db',
        'table' => 'zol',
    ),
    'fields' => array(
        array(
            'name' => "mobile_name",
            'selector' => "//div[contains(@class,'wrapper')]//div[contains(@class,'page-title')]/h1",
            'required' => true,
        ),
        array(
            'name' => "mobile_intro",
            'selector' => "//div[contains(@class,'wrapper')]//div[contains(@class,'page-title')]/div[contains(@class,'subtitle')]",
            'required' => true,
        ),
        array(
            'name' => "consult_price",
            'selector' => "//div[contains(@class,'wrapper')]//div[contains(@class,'price price-normal')]//b[contains(@class,'price-type')]/text()",
            'required' => true,
        ),
        array(
            'name' => "showdate",
            'selector' => "//div[contains(@class,'config-section')]//span[contains(@class,'showdate')]",
            'required' => true,
        ),
        array(
            'name' => "score",
            'selector' => "//*[@id=\"totalPoint\"]//div[contains(@class,'score')]/strong",
            'required' => true,
        ),
        array(
            'name' => "screen_size",
            'selector' => "//span[contains(@class,'param-value')]",
            'required' => true,
        ),
        array(
            'name' => "brand",
            'selector' => "//div[contains(@class,'breadcrumb')]/a[3]",
            'required' => true,
        ),
    ),
);

$spider = new phpspider($configs);



$spider->start();
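The list and content URL patterns in the config are regular expressions, so it is worth checking what they accept before starting a crawl. Below is a minimal standalone sketch of such a check. The helper `url_matches`, the `#` delimiters, and the sample URLs (including the ID in the detail URL) are assumptions made for illustration, not something taken from phpspider or from ZOL. The point it shows: `\d` accepts exactly one digit, so a listing pattern ending in `\d.html` stops matching at page 10, while `\d+` keeps matching.

```php
<?php
// Hypothetical helper: wrap a pattern in "#" delimiters so the slashes in the
// URL need no escaping, then test a URL against it.
function url_matches(string $pattern, string $url): bool
{
    return preg_match('#' . $pattern . '#', $url) === 1;
}

// Listing pattern with a single-digit page number, and a variant with \d+.
$single_digit = 'http://detail.zol.com.cn/cell_phone_index/subcate57_0_list_1_0_1_2_0_\d.html';
$multi_digit  = 'http://detail\.zol\.com\.cn/cell_phone_index/subcate57_0_list_1_0_1_2_0_\d+\.html';
$content      = 'http://detail\.zol\.com\.cn/cell_phone/index\d+\.shtml';

// Sample URLs, made up for this check.
$page3  = 'http://detail.zol.com.cn/cell_phone_index/subcate57_0_list_1_0_1_2_0_3.html';
$page12 = 'http://detail.zol.com.cn/cell_phone_index/subcate57_0_list_1_0_1_2_0_12.html';
$detail = 'http://detail.zol.com.cn/cell_phone/index1234567.shtml';

var_dump(url_matches($single_digit, $page3));   // page 3 against \d
var_dump(url_matches($single_digit, $page12));  // page 12 against \d
var_dump(url_matches($multi_digit, $page12));   // page 12 against \d+
var_dump(url_matches($content, $detail));       // detail page
```

Running a check like this against a handful of real URLs copied from the browser is a cheap way to catch a pattern that silently skips pages.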



I analyzed only one aspect of the data here, purely as a simple example.

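The chart shows the relationship between rating and price among the popular products. Popular products priced under 2000 yuan are especially numerous, and their ratings mostly fall between 3.5 and 4.5.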
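A comparison of rating against price like this can be approximated by grouping the crawled rows into price bands and averaging the score in each band. The sketch below is one way to do that, not the code behind the chart. The function name `summarize_by_price_band`, the band width of 1000, and the sample rows are all made up for illustration. Only the field names `consult_price` and `score` come from the crawler config.

```php
<?php
// Group rows by price band and compute the count and average score per band.
function summarize_by_price_band(array $rows, int $band_width = 1000): array
{
    $bands = [];
    foreach ($rows as $row) {
        // Scraped values arrive as strings; skip rows without a numeric price or score.
        if (!is_numeric($row['consult_price'] ?? null) || !is_numeric($row['score'] ?? null)) {
            continue;
        }
        // Lower bound of the band, e.g. 1299 falls into the band starting at 1000.
        $lower = intdiv((int) $row['consult_price'], $band_width) * $band_width;
        $bands[$lower]['count'] = ($bands[$lower]['count'] ?? 0) + 1;
        $bands[$lower]['total'] = ($bands[$lower]['total'] ?? 0.0) + (float) $row['score'];
    }
    ksort($bands); // cheapest band first

    $summary = [];
    foreach ($bands as $lower => $band) {
        $summary[$lower] = [
            'count'     => $band['count'],
            'avg_score' => round($band['total'] / $band['count'], 2),
        ];
    }
    return $summary;
}

// Made-up sample rows, only to show the expected input shape.
$sample = [
    ['consult_price' => '1299', 'score' => '4.0'],
    ['consult_price' => '1999', 'score' => '3.5'],
    ['consult_price' => '5288', 'score' => '4.5'],
    ['consult_price' => 'n/a',  'score' => '4.2'], // no numeric price: skipped
];
print_r(summarize_by_price_band($sample));
```

In practice the rows would be read from the `zol` table that the crawler exports to, and the band width can be adjusted to get a finer or coarser view.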

