C# Web Crawler (1): Hello World

Recently I have been exploring crawler-related topics, so I am writing this post as a note before I forget.

Purpose and Use

In real-world projects we often need data from third-party interfaces, but for various reasons those interfaces are sometimes simply not available to us.

For example:

1. What specials does Taobao have today?

2. What are today's trending searches on Baidu?

3. Whether a given user owes money on water, electricity, or gas bills, and how much.

And so on. How do we get this data?

Ask a programmer to solve it, because programmers are all-purpose and invincible.

 

Principle

Since we cannot afford (financially) to set up official integrations with Alibaba, Baidu, and other third parties, we can instead simulate access to their pages and scrape the corresponding data to collect the information we need.

The simulation is the key here; there are many details to handle, and I will explain them one by one later.

 

Tools

As the saying goes, to do a good job you must first sharpen your tools. To simulate access to a web page and crawl its data, we must first understand what requests the page actually makes. How do we find out? Here I introduce three tools I have used.

1. IE Developer Tools

IE's developer tools can be opened directly by pressing "F12" in IE; then select "Network" > "Start Capture".

The capture feature requires IE 9 or later; IE8 does not support capturing network traffic.

Advantages: built into IE, very convenient.

Disadvantages: the interface differs a little between IE9 and IE11. Copying the captured data in IE11 is very inconvenient; IE9 feels okay, since at least the data can be copied. However, on one of my more complex crawling tasks it actually missed a key POST request.

Recommendation: if the site only supports IE and the data to crawl is relatively simple, you can use this tool directly.

 

2. Chrome Developer Tools

Google's developer tools are quite good. They can also be opened with F12, or with Ctrl + Shift + I.

Likewise, select "Network" to capture the requests made by the current page.

Advantages: built into the browser.

Disadvantages: none found so far.

 

3. HttpWatch

A while ago there was a request I simply could not capture. I suspected the problem was IE Developer Tools, so I tried this tool instead, with very good results. It is more professional.

Advantages: very comprehensive capture.

Disadvantages: needs to be installed separately.

 

If you are a novice, I recommend using the Chrome developer tools or HttpWatch.

 

Related classes

In principle, any class that speaks the HTTP protocol should do the job. For crawling in C# I use two classes, and they basically cover my needs.

WebClient

This class is very simple to use: pass it the page URL (with any parameters) and you get the page's output back. But when the client and server need to maintain a session (for example, data that can only be seen after logging in), it is of no use. For more on WebClient usage, see the references at the end.
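A minimal sketch of that simple case, assuming the target page needs no login (the URL below is only an illustrative one):

    using System;
    using System.Net;
    using System.Text;

    class WebClientDemo
    {
        static void Main()
        {
            using (WebClient client = new WebClient())
            {
                client.Encoding = Encoding.UTF8; // decode the response as UTF-8
                // One call fetches the page content; the URL may carry query-string parameters.
                string html = client.DownloadString("http://www.cnblogs.com/");
                Console.WriteLine(html);
            }
        }
    }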

HttpWebRequest & HttpWebResponse

These two classes are lower-level. If you need to maintain a session with the server, you will have to rely on this pair.

I have heard that crawling in other languages is relatively simple and that C# is comparatively cumbersome, but precisely because of that you get closer to the underlying details. I have also heard of the occasional bug, though I have not run into one myself.
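A minimal sketch of keeping a session across two requests by sharing one CookieContainer; the login URL and form fields below are placeholders, not a real site:

    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    class SessionDemo
    {
        static void Main()
        {
            // One CookieContainer shared by both requests is what keeps the session alive.
            CookieContainer cookies = new CookieContainer();

            // 1. Log in (placeholder URL and form data).
            HttpWebRequest login = (HttpWebRequest)WebRequest.Create("http://example.com/login");
            login.Method = "POST";
            login.ContentType = "application/x-www-form-urlencoded";
            login.CookieContainer = cookies;
            byte[] body = Encoding.UTF8.GetBytes("user=demo&pwd=demo");
            login.ContentLength = body.Length;
            using (Stream s = login.GetRequestStream())
            {
                s.Write(body, 0, body.Length);
            }
            login.GetResponse().Close(); // the server's session cookies are now stored in 'cookies'

            // 2. Request a page that requires the login, reusing the same cookies.
            HttpWebRequest page = (HttpWebRequest)WebRequest.Create("http://example.com/protected");
            page.CookieContainer = cookies;
            HttpWebResponse resp = (HttpWebResponse)page.GetResponse();
            string html = new StreamReader(resp.GetResponseStream()).ReadToEnd();
            Console.WriteLine(html);
        }
    }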

 

HtmlAgilityPack

This one is a third-party library; you can look it up yourself. Of course, you do not necessarily need it. It depends on how much data the crawler has to extract: if you only want one or two fields from the page, you do not need it at all.

We know that what the crawler fetches is one long string, which is really a string of HTML tags. We would like to work with it the way JS manipulates DOM elements, but unfortunately C# has no built-in feature for that. This third-party library converts the string of HTML tags directly into a DOM-like object, so that we can easily locate and extract the values we need.
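A minimal sketch of that idea, assuming the HtmlAgilityPack NuGet package is installed; the HTML snippet and the "titlelnk" class are made-up stand-ins for whatever page you actually crawl:

    using System;
    using HtmlAgilityPack;

    class ParseDemo
    {
        static void Main()
        {
            // Normally this string would be the HTML returned by the crawler.
            string html = "<div class='post'><a class='titlelnk' href='/p/1'>Hello</a></div>";

            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html); // turn the tag string into a DOM-like tree

            // Query nodes with XPath, much like querying the DOM with JS.
            foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@class='titlelnk']"))
            {
                Console.WriteLine(link.InnerText + " -> " + link.GetAttributeValue("href", ""));
            }
        }
    }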

 

 

Hello World

Since later posts will also cover things like simulated logins and file uploads, I will use the HttpWebRequest class for the illustration here.

Take the post list on the cnblogs home page as an example:

        // Requires: using System; using System.IO; using System.Net; using System.Text;
        static void Main(string[] args)
        {
            string html = Hello();
            Console.WriteLine(html);
            Console.Read();
        }

        static string Hello()
        {
            // The list URL is usually found with a capture tool: click through a few pages and look for the common pattern.
            string url = "http://www.cnblogs.com/mvc/AggSite/PostList.aspx";
            // The POST data also comes from the capture tool; PageIndex: 1 means the second page. The other fields are not analyzed here.
            string postData = "{\"CategoryType\":\"SiteHome\",\"ParentCategoryId\":0,\"CategoryId\":808,\"PageIndex\":1,\"TotalPostCount\":4000,\"ItemListActionName\":\"PostList\"}";

            // 1. Build the request.
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
            req.Method = "POST";           // POST or GET; for GET, skip step 2 and read the server's response directly in step 3.
            req.AllowAutoRedirect = false; // Usually set to false so server redirects are not followed automatically.
            req.ContentType = "application/x-www-form-urlencoded"; // Typical value for form data, unless uploading a file.

            // 2. Write the POST parameters into the request body.
            byte[] postBytes = Encoding.UTF8.GetBytes(postData);
            req.ContentLength = postBytes.Length;
            Stream postDataStream = req.GetRequestStream();
            postDataStream.Write(postBytes, 0, postBytes.Length);
            postDataStream.Close();

            // 3. Read the data returned by the server.
            HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
            string html = new StreamReader(resp.GetResponseStream()).ReadToEnd();
            return html;
        }
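
As the comment in step 1 notes, a GET request has no step 2; a minimal sketch under the same namespaces (the URL is just an example):

        static string HelloGet()
        {
            // 1. Build the request; GET has no body to write.
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://www.cnblogs.com/");
            req.Method = "GET";

            // 3. Read the data returned by the server.
            HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
            return new StreamReader(resp.GetResponseStream()).ReadToEnd();
        }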

 

The simplest simulated fetch involves three major steps:

1. Build the HttpWebRequest.

2. Build the parameters to be sent.

3. Get the HttpWebResponse.

 

References:

http://www.cnblogs.com/hambert/p/6118299.html

http://www.crifan.com/emulate_login_website_using_csharp/

https://q.cnblogs.com/q/67303/

Reproduced from: https://www.cnblogs.com/xinjian/p/6340514.html


Origin: https://blog.csdn.net/weixin_33995481/article/details/93822277