Using the IE9 developer tools to obtain the HTML code of dynamic web pages

I have recently been working on a project that needs to extract data from web pages. Static pages are easy: request the page's URL and parse the HTML that comes back. Some pages, however, are generated dynamically. When you turn pages, for example, the URL in the address bar does not change, so getting the content of such a page is more troublesome. I will use the paging on https://honors.libraries.psu.edu/browse/author/all/ as an example to show how to obtain the HTML code of a dynamic web page.


1. Open this website with IE9: https://honors.libraries.psu.edu/browse/author/all/



2. Press F12 to open the developer tools



Click "Network" --> "Start Capture" in Developer Tools, and then click the "next page" link on the webpage


3. View the complete captured request


Click "Go to detailed view"


4. Set the captured parameters on a C# HttpWebRequest object


        // Requires: using System.IO; using System.Net; using System.Text;

        ///<summary>
        ///Sends a POST request over HTTPS and returns the response body
        ///</summary>
        ///<param name="URL">URL to request</param>
        ///<param name="strPostdata">Data to send in the request body</param>
        ///<param name="encoding">Encoding of the page</param>
        ///<returns>HTML code of the response</returns>
        public string OpenReadWithHttps(string URL, string strPostdata, Encoding encoding)
        {
            // Cookies copied from the captured request; the values belong to that session
            CookieContainer cc = new CookieContainer();
            cc.Add(new Cookie("csrftoken", "04696113ff3ee3e8220dd9044921e100", "/browse/author/all/", "honors.libraries.psu.edu"));
            cc.Add(new Cookie("__utma", "148028590.1404245236.1416720957.1416734716.1416748914.3", "/browse/author/all/", "honors.libraries.psu.edu"));
            cc.Add(new Cookie("__utmz", "148028590.1416720957.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)", "/browse/author/all/", "honors.libraries.psu.edu"));
            cc.Add(new Cookie("__utmb", "148028590.2.10.1416748914", "/browse/author/all/", "honors.libraries.psu.edu"));
            cc.Add(new Cookie("__utmc", "148028590", "/browse/author/all/", "honors.libraries.psu.edu"));

            // Request headers copied from the captured request
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(URL);
            request.CookieContainer = cc;
            request.Method = "POST";
            request.Accept = "text/html, application/xhtml+xml, */*";
            request.ContentType = "application/x-www-form-urlencoded";
            request.Referer = "https://honors.libraries.psu.edu/browse/author/all/";
            request.KeepAlive = true;
            request.UserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)";
            request.Host = "honors.libraries.psu.edu";
            request.Headers.Add(HttpRequestHeader.AcceptLanguage, "en-US");
            // Sends "Accept-Encoding: gzip, deflate" and decompresses the response,
            // so ReadToEnd below returns HTML rather than compressed bytes
            request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
            request.Headers.Add(HttpRequestHeader.CacheControl, "no-cache");

            byte[] buffer = encoding.GetBytes(strPostdata);
            request.ContentLength = buffer.Length;

            using (Stream writer = request.GetRequestStream()) // get the request stream
            {
                writer.Write(buffer, 0, buffer.Length); // write the POST data to the stream
            } // the request stream is closed here

            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream(), encoding))
            {
                return reader.ReadToEnd();
            }
        }


Parameter Description:

URL: the address to request; strPostdata: the data sent by POST; encoding: the encoding of the page

5. Call the method

        private void button2_Click(object sender, EventArgs e)
        {
            string url = "https://honors.libraries.psu.edu/browse/author/all/";
            string strPostData = "csrfmiddlewaretoken=04696113ff3ee3e8220dd9044921e100&browse_start=all&browse_type=author&page=9&display=50&num_display_items=50";

            textBox1.Text = OpenReadWithHttps(url, strPostData, Encoding.UTF8);
        }
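
The POST data in the call above is written by hand, with the page number fixed at 9. If you want to fetch other pages, it may be easier to build the string from named fields. The following is only a sketch: the class `FormData` and its methods `Build` and `ForPage` are hypothetical names that are not part of the original code, and the field names are simply the ones that appear in the captured request.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of the original code.
public static class FormData
{
    // Encodes name/value pairs as an application/x-www-form-urlencoded body.
    // Uri.EscapeDataString percent-encodes reserved characters such as '&' and '=';
    // a space becomes "%20" rather than "+".
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    // Builds the body for one page of the author listing.
    // Assumption: the server expects exactly the fields seen in the captured request.
    public static string ForPage(string csrfToken, int page, int pageSize)
    {
        string size = pageSize.ToString(CultureInfo.InvariantCulture);
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("csrfmiddlewaretoken", csrfToken),
            new KeyValuePair<string, string>("browse_start", "all"),
            new KeyValuePair<string, string>("browse_type", "author"),
            new KeyValuePair<string, string>("page", page.ToString(CultureInfo.InvariantCulture)),
            new KeyValuePair<string, string>("display", size),
            new KeyValuePair<string, string>("num_display_items", size)
        };
        return Build(fields);
    }
}
```

With this helper, `FormData.ForPage("04696113ff3ee3e8220dd9044921e100", 9, 50)` produces the same string as the hand-written `strPostData`, and changing the second argument requests a different page.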

To summarize the process: use the IE9 developer tools to capture the page request, read the parameters of the request (headers, cookies, and POST data), and then set those parameters on an HttpWebRequest object and send the request.
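
One caveat: the csrftoken value in the code was copied from a single captured session, so it will probably stop working once that session expires. A possible workaround, sketched below under the assumption that the server sets the csrftoken cookie on an ordinary GET of the page, is to make that GET with the same CookieContainer first and then read the token back out of the container. The class `CsrfCookie` and its method `Find` are hypothetical names used only for illustration.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original code.
public static class CsrfCookie
{
    // Returns the value of the named cookie that the container would send
    // to pageUri, or null if the container holds no such cookie.
    public static string Find(CookieContainer cookies, Uri pageUri, string name)
    {
        Cookie cookie = cookies.GetCookies(pageUri)[name];
        return cookie == null ? null : cookie.Value;
    }
}
```

After a GET request whose `CookieContainer` property is set to `cc`, calling `CsrfCookie.Find(cc, new Uri(url), "csrftoken")` would return the token to use both as the cookie and as the csrfmiddlewaretoken field of the POST data, provided the assumption above holds for this site.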


Origin blog.csdn.net/hn_tzy/article/details/41420993