[robot] Sending an HttpWebRequest and receiving the Response (GET, POST)

Summary: [robot] Sending an HttpWebRequest (GET, POST)


Using IE as the client, here are examples based on what Fiddler captures.

GET:

The raw Fiddler capture:

GET http://w2.land.taipei.gov.tw/land4/loina.asp HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko
Accept-Encoding: gzip, deflate
Host: w2.land.taipei.gov.tw
DNT: 1
Proxy-Connection: Keep-Alive
Cookie: ASPSESSIONIDQQDQDBBB=KPCIMLIBOFBMJBGAMPIBKAPL
 

C#:


HttpWebRequest request;
CookieContainer cookies = new CookieContainer();
string url = "http://w2.land.taipei.gov.tw/land4/loina.asp";

string html = "";

request = WebRequest.Create(url) as HttpWebRequest;
//If you need to go through a proxy...
WebProxy _proxy = new WebProxy("http://myproxy.com.tw:8888", true);
_proxy.Credentials = CredentialCache.DefaultCredentials;                 
request.Proxy = _proxy;
//end of proxy
request.Method = "GET";
request.Accept = "text/html, application/xhtml+xml, */*";
request.Headers.Set("Accept-Language", "zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3");
request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko";
//The client tells the server it accepts gzip compression; the server decides from its own settings whether to use it
request.Headers.Set("Accept-Encoding", "gzip, deflate");
//If the response body comes back gzip-compressed, decompress it automatically; without this line it may not be readable
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
request.Host = "w2.land.taipei.gov.tw";
request.CookieContainer = cookies;
//The default is already true; if you deliberately set it to false you may fail to fetch the HTML
request.KeepAlive = true;
		   
using (var response = (HttpWebResponse)request.GetResponse())
{
	using (var responseStream = response.GetResponseStream())
	{
		//Encoding.Default is the OS ANSI code page; match it to the page's actual charset (older Taiwanese sites are often Big5)
		using (var reader = new StreamReader(responseStream, Encoding.Default))
		{
			html = reader.ReadToEnd();
		}
	}
}          

ps. Added 2016-10-09, a Chrome-style variant for reference:

string result = "";
HttpWebRequest request;
CookieContainer cookies = new CookieContainer();            
string url = "http://www.yoururl.com"; //WebRequest.Create needs an absolute URI including the scheme
request = WebRequest.Create(url) as HttpWebRequest;
request.Method = "GET";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
request.Headers.Set("Accept-Encoding", "gzip, deflate, sdch");
//without this, a gzip-compressed response body would be read as unreadable bytes
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
request.Headers.Set("Accept-Language", "zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4,zh-CN;q=0.2");
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36";
request.CookieContainer = cookies;
request.Headers.Set("Upgrade-Insecure-Requests", "1");
//The default is already true; if you deliberately set it to false you may fail to fetch the HTML
request.KeepAlive = true;

using (var response = (HttpWebResponse)request.GetResponse())
{
	using (var responseStream = response.GetResponseStream())
	{
		using (var reader = new StreamReader(responseStream, Encoding.UTF8))
		{
			result = reader.ReadToEnd();
		}
	}
}     

return result;

POST:

The raw Fiddler capture:

POST http://w2.land.taipei.gov.tw/land4/loina.asp HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Referer: http://w2.land.taipei.gov.tw/land4/loina.asp
Accept-Language: zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
Proxy-Connection: Keep-Alive
Content-Length: 40
DNT: 1
Host: w2.land.taipei.gov.tw
Pragma: no-cache
Cookie: ASPSESSIONIDQQDQDBBB=KPCIMLIBOFBMJBGAMPIBKAPL
 
destrict=03&section=&land_mom=&land_son=
 

C#:


HttpWebRequest requestPost;
CookieContainer cookiesPost = new CookieContainer();
requestPost = WebRequest.Create(url) as HttpWebRequest;
string html = "";
string postData = "destrict=03&section=&land_mom=&land_son=";//district 03 = Zhongzheng District; remember to HttpUtility.UrlEncode any special characters
requestPost.Method = "POST";
requestPost.Accept = "text/html, application/xhtml+xml, */*";
requestPost.Referer = "http://w2.land.taipei.gov.tw/land4/loina.asp";
requestPost.Headers.Set("Accept-Language", "zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3");
requestPost.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko";
requestPost.ContentType = "application/x-www-form-urlencoded";
requestPost.Headers.Set("Accept-Encoding", "gzip, deflate");
requestPost.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
requestPost.ContentLength = Encoding.UTF8.GetByteCount(postData);//Content-Length is a byte count; postData.Length only matches it for pure-ASCII data
requestPost.Host = "w2.land.taipei.gov.tw";
requestPost.Headers.Set("Pragma", "no-cache");
requestPost.CookieContainer = cookiesPost;
//If you hit a "(417) Expectation Failed" error, uncomment the line below
//System.Net.ServicePointManager.Expect100Continue = false;

using (var stream = requestPost.GetRequestStream())
using (var writer = new StreamWriter(stream))
{
	//the using blocks flush and close the writer and stream; no explicit Close() calls needed
	writer.Write(postData);
}

using (var response = (HttpWebResponse)requestPost.GetResponse())
{
	using (var responseStream = response.GetResponseStream())
	{
		//as above, match the encoding to the page's actual charset
		using (var reader = new StreamReader(responseStream, Encoding.Default))
		{
			html = reader.ReadToEnd();
		}
	}
}
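The comment on postData above mentions URL-encoding: whenever a form value contains &, =, spaces, or Chinese text, it must be encoded before being concatenated into the body. A minimal sketch of building the body safely (the field names come from the form above; the pair-based builder is my own):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Web;   // reference System.Web.dll in a non-web project

// Build the x-www-form-urlencoded body; UrlEncode protects special characters.
var fields = new[]
{
    new KeyValuePair<string, string>("destrict", "03"),
    new KeyValuePair<string, string>("section", ""),
    new KeyValuePair<string, string>("land_mom", ""),
    new KeyValuePair<string, string>("land_son", ""),
};
string postData = string.Join("&",
    fields.Select(kv => kv.Key + "=" + HttpUtility.UrlEncode(kv.Value)));
// postData == "destrict=03&section=&land_mom=&land_son="
```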

ps. Remember that an HttpWebRequest instance is single-use: create a new one with WebRequest.Create for every request, and set Method, Accept, Referer, Accept-Language, and so on again each time,

because .NET does not carry those header settings over to the next request.
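Since every round-trip needs the same headers plus the shared cookie jar, it is convenient to wrap the setup in a small factory so nothing is forgotten on later requests (a sketch; the method name is my own, the header values mirror the IE example above):

```csharp
using System.Net;

// One CookieContainer shared across all requests keeps the ASP session alive;
// everything else is set again on every call because HttpWebRequest is single-use.
static readonly CookieContainer Cookies = new CookieContainer();

static HttpWebRequest NewRequest(string url, string method)
{
    var req = (HttpWebRequest)WebRequest.Create(url);
    req.Method = method;
    req.Accept = "text/html, application/xhtml+xml, */*";
    req.Headers.Set("Accept-Language", "zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3");
    req.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko";
    req.Headers.Set("Accept-Encoding", "gzip, deflate");
    req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
    req.CookieContainer = Cookies;   // the same jar every time
    req.KeepAlive = true;
    return req;
}
```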

Addendum 2015-11-16:

Scraping a site usually takes several consecutive request/response round-trips before you reach the target data, and every single request needs all of the related headers set again. You cannot, for example, set request.Accept = "text/html, application/xhtml+xml, */*"; only on the first request,

because .NET effectively starts each new request with the previous request's headers cleared

(the debugger may show the old values as still present, but if you only configure the first request, in practice you get no data back).

Besides that, the cookies must be attached on every request as well, because session state across consecutive requests is sometimes carried in ViewState and sometimes in cookies.

ps. Addendum 2015-11-18: the ViewState, ViewStateGenerator, and EventValidation parameters appear in classic ASP.NET Web Forms pages; when they are present, all three must be refreshed together on each post.
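In a rendered Web Forms page those three values live in hidden input fields named __VIEWSTATE, __VIEWSTATEGENERATOR, and __EVENTVALIDATION, so a common approach is to scrape them out of the previous response and post them back. A sketch (the helper name and regex are my own):

```csharp
using System.Text.RegularExpressions;

// Scrape a Web Forms hidden field's value out of the HTML
// returned by the previous request.
static string HiddenField(string html, string name)
{
    Match m = Regex.Match(html, "id=\"" + name + "\"[^>]*value=\"([^\"]*)\"");
    return m.Success ? m.Groups[1].Value : "";
}

// Before each POST, rebuild the body with the freshly scraped values
// (HttpUtility.UrlEncode them; ViewState is base64 and contains '+' and '/'):
// string body = "__VIEWSTATE=" + HttpUtility.UrlEncode(HiddenField(html, "__VIEWSTATE"))
//             + "&__VIEWSTATEGENERATOR=" + HiddenField(html, "__VIEWSTATEGENERATOR")
//             + "&__EVENTVALIDATION=" + HttpUtility.UrlEncode(HiddenField(html, "__EVENTVALIDATION"))
//             + "&..." /* plus the page's own form fields */;
```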

ps. Addendum 2016-03-25: to impersonate a different browser (e.g. Chrome), capture its requests and responses with Fiddler, compare them with the examples above, and swap in that browser's headers.

PS. Addendum 2017-07-20: for HTTPS sites that require TLS 1.2 or stronger transport encryption, also add the following (the machine needs .NET Framework 4.5 installed to run correctly; the project itself can target either 4.0 or 4.5):

ServicePointManager.Expect100Continue = true;
//3072 == SecurityProtocolType.Tls12; the cast lets the project compile even when targeting .NET 4.0, where the named enum value does not exist
ServicePointManager.SecurityProtocol = (SecurityProtocolType)3072; 
ServicePointManager.DefaultConnectionLimit = 9999;



Reposted from www.cnblogs.com/petewell/p/11526686.html