PHP example: simple data collection with cURL

There is a huge amount of data on the Internet: news sites, shopping sites, video sites. If we need some of that data regularly and have no collection program, we have to enter it by hand, which is a great deal of work. A collection program can save the data directly to our database without anyone entering articles one by one. This article walks through a simple implementation of such a collection feature; we hope it helps reduce everyone's workload.
 
In PHP, we can do this by writing a collection function around cURL: first analyze the characteristics of the site whose data we want to collect, then set the matching cURL options to build our custom collection function, and finally get the results we need.
 
cURL usage
preg_match_all
array_filter
explode
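
As a quick reference for the last two functions in this list, here is a minimal sketch of explode and array_filter working together. The sample string is made up for illustration and is not taken from any real page.

```php
<?php
// Hypothetical sample data: a comma-separated list with empty entries.
$raw = "php,,curl, ,regex,";

// explode() splits the string on every comma and keeps the empty pieces.
$parts = explode(",", $raw);

// Trim whitespace, then let array_filter() drop the empty strings.
// Without a callback, array_filter() would also drop "0"; the callback avoids that.
$clean = array_filter(array_map('trim', $parts), function ($item) {
    return $item !== '';
});

// array_filter() preserves the original keys, so reindex with array_values().
$clean = array_values($clean);

print_r($clean);
```

The reindexing step matters when the result is later looped over by numeric index, since the filtered array would otherwise have gaps in its keys.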
 
Basic steps for making a request with cURL in PHP
1. Create a new cURL resource: $curl = curl_init()

2. Set the URL and the appropriate options: curl_setopt($curl, option, value)

Common options:

CURLOPT_HEADER: Set this option to a non-zero value to include the response headers in the output.
CURLOPT_URL: The URL you want PHP to fetch. You can also set it by passing the URL to curl_init() when initializing.
CURLOPT_RETURNTRANSFER: When set to true, curl_exec() returns the response as a string on success instead of outputting it directly.
CURLOPT_SSL_VERIFYHOST: Controls whether the host name in the server's SSL certificate is checked against the requested host; 2 (the default) enables the check, 0 disables it.
CURLOPT_SSL_VERIFYPEER: Controls whether the server's SSL certificate is verified; false disables verification.
3. Fetch the URL and pass the result to the browser: $output = curl_exec($curl)

4. Close the cURL resource and free up system resources: curl_close($curl)
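
The four steps above can be wrapped in one reusable function. The following is a minimal sketch, not a definitive implementation: the name fetch_page, its timeout parameter, and the decision to return null on failure are assumptions made for illustration.

```php
<?php
/**
 * Fetch a URL and return the response body, or null on failure.
 * fetch_page() is a hypothetical helper name, not part of the cURL extension.
 */
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    // 1. Create a new cURL resource
    $curl = curl_init();

    // 2. Set the URL and the appropriate options
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HEADER, false);            // body only, no headers
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);     // return instead of echo
    curl_setopt($curl, CURLOPT_TIMEOUT, $timeoutSeconds); // give up after N seconds

    // 3. Execute the request; curl_exec() returns false on failure
    $output = curl_exec($curl);
    if ($output === false) {
        error_log('cURL error: ' . curl_error($curl));
        $output = null;
    }

    // 4. Close the resource (has no effect since PHP 8.0, kept to mirror step 4)
    curl_close($curl);

    return $output;
}
```

A call such as $html = fetch_page("https://www.shiyanlou.com/"); would then give either the page source or null, so the caller can check the result before parsing it.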
 
Development preparation
First, check whether the current PHP environment supports cURL.
Create the code directory and switch to it:
 
# Create the code directory
sudo mkdir /home/code
# Change the code directory permissions
sudo chmod -R 777 /home/code
# Switch to the code directory
cd /home/code
# Start PHP's built-in web server in this directory
sudo php -S localhost:80
Create a new file phpinfo.php with the following content:
 
<?php
phpinfo();
?>
Open the page in the browser and search for "curl". If a curl section appears, cURL is supported.
 
 
 
If it is not supported, edit php.ini and remove the leading ; from the line ;extension=php_curl.dll, then restart the PHP server.
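
The same check can also be made from code instead of reading the phpinfo() page. This is a small sketch using extension_loaded() and curl_version(); the messages it prints are only illustrative.

```php
<?php
// Check from code whether the cURL extension is loaded.
if (extension_loaded('curl')) {
    // curl_version() returns an array describing the underlying libcurl.
    $info = curl_version();
    echo "cURL is available, libcurl version: " . $info['version'] . PHP_EOL;
} else {
    echo "cURL is not available; enable the extension in php.ini" . PHP_EOL;
}
```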
 
1. Simple collection
Fetching a simple web page with cURL
Create a new file curltest.php in /home/code that fetches the Shiyanlou home page and outputs it to the page:
 
<?php
// 1. Initialize: create a new cURL resource
$curl = curl_init();
// 2. Set the URL and the corresponding options; we collect the https://www.shiyanlou.com/ page
curl_setopt($curl, CURLOPT_URL, "https://www.shiyanlou.com/");
// The Shiyanlou address uses https, so we pass false to skip SSL certificate verification
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
// Set this option to a non-zero value to include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 0);
// 3. Execute; the result is output directly to the page
curl_exec($curl);
// 4. Release the cURL resource
curl_close($curl);
?>
When the code runs, the page may look messy, because some of the JS and CSS files it references fail to load. This does not affect our ability to get the data.
 
 
 
2. Text replacement
Above, we simply output the Shiyanlou page. We can also modify the page data before outputting it. Create a file curlreplace.php in which every occurrence of the site's name "实验楼" (Shiyanlou) on the page is changed to "I like studying at 实验楼". The example is as follows.

<?php
// 1. Initialize: create a new cURL resource
$curl = curl_init();
// 2. Set the URL and the corresponding options; we collect the https://www.shiyanlou.com/ page
curl_setopt($curl, CURLOPT_URL, "https://www.shiyanlou.com/");
// The Shiyanlou address uses https, so we pass false to skip SSL certificate verification
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
// Set this option to a non-zero value to include the response headers in the output
curl_setopt($curl, CURLOPT_HEADER, 0);
// Return the result as a string instead of printing it directly
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
// 3. Execute and get the result
$result = curl_exec($curl);
// 4. Release the cURL resource
curl_close($curl);
// Replace every "实验楼" in the page with "I like studying at 实验楼" and output it
echo str_replace("实验楼", "I like studying at 实验楼", $result);
?>
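
To see the replacement step in isolation, without a network request, here is a minimal sketch on a made-up string. The sample HTML and the replacement text are assumptions for illustration; the optional fourth argument of str_replace reports how many replacements were made.

```php
<?php
// Hypothetical stand-in for the HTML string returned by curl_exec().
$result = '<h1>Shiyanlou</h1><p>Welcome to Shiyanlou</p>';

// str_replace() replaces every occurrence; the optional fourth argument
// receives the number of replacements performed.
$replaced = str_replace('Shiyanlou', 'I like studying at Shiyanlou', $result, $count);

echo $replaced . PHP_EOL;
echo "Replacements made: " . $count . PHP_EOL;
```

Checking the count is a cheap way to notice that the search string no longer appears in the page, for example after the site changes its markup.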
 
 
3. Collecting page data
This experiment collects the Shiyanlou course list. The URL is https://www.shiyanlou.com/courses/.

We want to collect the course images and titles on this page and store them in the database.
 
Viewing the page source reveals the pattern that the image addresses we need follow.
 
 
 
In the source, the course titles and course images follow a fixed format, so we can extract them with regular expressions.
 
 
 
We start by matching the titles. We can use preg_match_all for this; the array of matches it fills in is two-dimensional.

The preg_match_all() function performs a global regular expression match.

Syntax:

// The first parameter is the regular expression, the second is the string to search, and the third receives all matching results (an array)
preg_match_all(pattern, subject, matches)
We want to find every span whose class is course-title.
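
As a minimal sketch of how such a match could look, the example below runs preg_match_all against a made-up HTML fragment. Only the course-title class name comes from the text above; the surrounding markup, image paths, course names, and the pattern itself are assumptions, and the real page source may differ.

```php
<?php
// Hypothetical HTML fragment; the real markup of the course page may differ.
$html = <<<HTML
<div class="course"><img src="/img/a.png"><span class="course-title">PHP basics</span></div>
<div class="course"><img src="/img/b.png"><span class="course-title">Linux basics</span></div>
HTML;

// Capture the text inside every <span class="course-title">...</span>.
// The ? after .* makes the match lazy, so it stops at the first closing tag.
$pattern = '/<span class="course-title">(.*?)<\/span>/';
$count = preg_match_all($pattern, $html, $matches);

// $matches is two-dimensional: $matches[0] holds the full matches,
// $matches[1] holds the text captured by the first group.
echo "Found " . $count . " titles" . PHP_EOL;
print_r($matches[1]);
```

Reading the titles from $matches[1] rather than $matches[0] avoids having to strip the span tags afterwards.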

Origin www.cnblogs.com/sj-php/p/11940048.html