PHP: Scraping 500 Records by Component Part Number and Saving Them to a txt File

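A friend asked me to scrape some data: look up 500-plus component part numbers on a website and write the results to a txt file. I went for the simplest approach and used a parsing library called simple_html_dom.php. Download: https://github.com/zhiquan181/zqtoys/tree/master/YQJDatas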

doc.php

<?php
	ignore_user_abort(true); // Keep the script running even if the client disconnects (e.g. the browser is closed).
	error_reporting(0);
	// header() must run before any output, so nothing is printed ahead of the opening PHP tag.
	header("Content-type: text/html; charset=utf-8");
	include('simple_html_dom.php');
	$ch = curl_init();
	$j = 0;
	$k = 0;
	$l = 0;
	$arr_url = array(
		'MP4032-2GS-Z',
		'MP9180DG-LF-Z',
		'MP28252EL-LF-Z',
		// ... (imagine 500 entries here)
		'MP3204DJ-Z',
		'MP1521EQ-LF-Z',
	);
	$arr_length = sizeof($arr_url);
	//var_dump($arr_length);
	
	for ($i=0; $i <$arr_length ; $i++) {
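		// Spoof the source IP (all four octets are replaced with random values)
		$ip = isset($_GET['ip']) ? $_GET['ip'] : '1.1.1.1';
		$ipArr = explode(".",$ip);
		$ipArr[3]=rand(1,255);
		$ipArr[2]=rand(1,255);
		$ipArr[1]=rand(1,255);
		$ipArr[0]=rand(1,255);
		$ip = implode(".",$ipArr);
		// CURLOPT_HTTPHEADER expects a list of "Name: value" strings, not an associative array
		$headers = array('CLIENT-IP: '.$ip, 'X-FORWARDED-FOR: '.$ip);
		curl_setopt($ch,CURLOPT_HTTPHEADER,$headers);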

		// Spoof the User-Agent
		$user_agent = 'Safari Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.73.11 (KHTML, like Gecko) Version/7.0.1 Safari/5'; 
		curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);

		// Request the URL
		$uri = 'http://www.oemstrade.com/search/';
		curl_setopt($ch, CURLOPT_URL, $uri.rawurlencode($arr_url[$i]));
		curl_setopt($ch, CURLOPT_HEADER, 0); // Return only the body so response headers are not fed to the HTML parser
		curl_setopt($ch, CURLOPT_REFERER, "http://www.oemstrade.com/"); // Set the referer
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
		$output = curl_exec($ch);

		// Parse the HTML
		$html = new simple_html_dom();
		$html->load($output);
	    $target = array();
	    $number = $html->find('#list1562 > table > tbody > tr > td.td-part-number > a');
	    // $name = $html->find('#list1562 > table > tbody > tr > td.td-distributor-name > span');
	    $description = $html->find('#list1562 > table > tbody > tr > td.td-description > p');
	    $stock = $html->find('#list1562 > table > tbody > tr > td.td-stock > p > span > b');
	    // $price = $html->find('#list1562 > table > tbody > tr > td.td-price');
	    // var_dump($number[0]->plaintext);
	    // var_dump($name[0]->plaintext);
	    // var_dump($description[0]->plaintext);
	    // var_dump($stock[0]->plaintext);
	    // var_dump($price[0]->plaintext);
	    // var_dump($dom[0]->plaintext);

		if (!empty($description) && $description[0]->plaintext) {
			// A missing part number or stock cell is written as an empty field
			$number_text = empty($number) ? '' : $number[0]->plaintext;
			$stock_text = empty($stock) ? '' : $stock[0]->plaintext;
			$open = fopen("target.txt", "a"); // "a" opens the file in append mode
			fwrite($open, $arr_url[$i].";".$number_text.";".$description[0]->plaintext.";".$stock_text.";"."\r\n");
			fclose($open);
			$j++;
		}
		else {
			$open = fopen("target.txt", "a"); // "a" opens the file in append mode
			fwrite($open, $arr_url[$i].";"."NO DATA!"."\r\n");
			fclose($open);
			$l++;
		}
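		$html->clear(); // Release the parsed DOM so memory does not build up across 500 iterations
		$k++;
	}

	print_r('All '.$k.' part numbers have been looked up. '.$j.' returned results and '.$l.' returned none.');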
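
One detail of the script above is worth isolating: cURL's CURLOPT_HTTPHEADER option takes a plain list of "Name: value" strings. The following is a minimal sketch of how the spoofed-IP headers could be built in that form. The function names `random_ipv4` and `build_spoof_headers` are hypothetical helpers introduced for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper: a random dotted-quad IPv4 string, each octet in 1..255 as in the script.
function random_ipv4()
{
    $octets = array();
    for ($n = 0; $n < 4; $n++) {
        $octets[] = mt_rand(1, 255);
    }
    return implode('.', $octets);
}

// Hypothetical helper: header list in the "Name: value" form that CURLOPT_HTTPHEADER takes.
function build_spoof_headers($ip)
{
    return array(
        'CLIENT-IP: ' . $ip,
        'X-FORWARDED-FOR: ' . $ip,
    );
}

$headers = build_spoof_headers(random_ipv4());
// Inside the loop this would be passed as: curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
```

Whether the target site pays any attention to these headers is a separate question; the sketch only shows the format cURL expects.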

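Running doc.php gives:

The generated txt file:

Copy the contents into Excel by hand and use Text to Columns with the semicolon as the delimiter to split the fields.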
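
Splitting on the semicolon goes wrong if a description itself contains one. A possible alternative, sketched below, is to write the rows with PHP's built-in `fputcsv`, which quotes such fields so they survive the split. The helper name `append_row` and the sample rows are made up for illustration, and the demo writes to an in-memory stream; a real run would presumably open something like `fopen('target.csv', 'a')` instead.

```php
<?php
// Hypothetical helper: write one result row as a semicolon-delimited CSV line.
function append_row($handle, array $fields)
{
    // ';' matches the script's delimiter; enclosure and escape are passed explicitly.
    return fputcsv($handle, $fields, ';', '"', '\\');
}

// In-memory stream for the demo only.
$fh = fopen('php://memory', 'w+');

// Made-up sample rows: one with a semicolon inside the description, one with no result.
append_row($fh, array('MP4032-2GS-Z', 'MP4032-2GS-Z', 'LED driver; offline', '1200'));
append_row($fh, array('MP3204DJ-Z', 'NO DATA!'));

// Read the rows back to check that the fields come out unchanged.
rewind($fh);
$rows = array();
while (($row = fgetcsv($fh, 0, ';', '"', '\\')) !== false) {
    $rows[] = $row;
}
fclose($fh);
```

Whether Excel opens a semicolon-delimited file directly depends on the list separator in its regional settings; otherwise the import wizard can be told to use the semicolon.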

Reposted from blog.csdn.net/Cai181191/article/details/97395664