Notes on scraping data from a desktop application

Yesterday I finally managed to scrape data from a GPS application, so here is a record of how it went.

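At first, while watching traffic in Fiddler, I happened to notice what its requests looked like. (The screenshot is not preserved here.)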


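So I clicked around in the application to trigger a request and confirmed that it really does use plain HTTP. I looked at the headers first. There are no cookies, and the request looks like this: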

POST http://*****/page/webservices/TRWebService.asmx HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; MS Web Services Client Protocol 2.0.50727.8669)
Content-Type: text/xml; charset=utf-8
SOAPAction: "http://www.*****.cn/New_GetVehicleminiInfo"
Host: *****
Content-Length: 404

The SOAPAction header distinguishes which kind of information is being requested.

I tried opening the URL in a browser and got the web service's page. (The screenshot is not preserved here.)


I searched that page for New_GetVehicleInfo and looked at its description.


The data in the response is of type base64Binary:

H4sIAAAAAAAEAOy9B2AcSZYlJi9tynt/SvVK1+B0oQiAYBMk2JBAEOzBiM3mkuwdaUcjKasqgcplVmVdZhZAzO2dvPfee++999577733ujudTif33/8/XGZkAWz2zkrayZ4hgKrIHz9+fB8/Ih7/Gr/Hr/Hu11j8GuWvkf4al79G/mvUv0bzaxS/RvVrLH+Nz36Nj36N3V9j/Gvs0M+Uvln+GlP6fEbfLn+NC/7281/jya+x92vco1Z79Nfv8Wsc/RqP6f2K/mv59zP6mRNsQP799e2c3/z3v3/8a7yiN+/9Gp/S/wEf31XUNud3am714NfYJ8hjarHzazzkVjW1yAiDkto03Eawm9GnrcLeo892f42DX2Obf27T3/JuRlhc/Bq/6NdY02+f8SfSL6C5v+++F+Yp/fcbvPmL/9jf4f820L7eKLpj+BdexEaR/hq/7A9Oe+MD3Nj45FM3Pvn7duN7QKN7sGFmdunbhzSmT5kO8TH93MyMwfw2M3ObUXTH8HM3M3/di+Nf4+mv8W3qDSMcHhOk8T7zm7T6uZ+ZEPPbzczNo+iO4f8NM/NwozbbYT3g5u//TTMDzM28fMgIuvj/3M3K4R+waSx7NIL7BGlMvx2w/fp/w2juevbz/wkAAP//Bpo5rZoHAAA=
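
In hindsight, the first few characters of the payload already hint at its format: base64 text that begins with `H4sI` decodes to bytes that begin with the gzip magic number. A minimal sketch of that check in Python follows; the helper name `looks_like_gzip` is made up for illustration and is not part of the application.

```python
import base64

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(b64_text: str) -> bool:
    """Return True if the base64 text decodes to bytes starting with the gzip magic."""
    if len(b64_text) < 4:
        return False
    # Four base64 characters decode to exactly three bytes, enough for the check.
    head = base64.b64decode(b64_text[:4])
    return head.startswith(GZIP_MAGIC)


# The response above starts with "H4sI".
print(looks_like_gzip("H4sIAAAAAAAEAO"))
```

This only inspects the header bytes; it says nothing about what the compressed content is or how it is encoded.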

After searching online for a while I found a similar case:

http://bbs.csdn.net/topics/390408695


The conclusion there was that the data is encrypted and that you have to find the encryption routine.

There was another one:

http://www.newsmth.net/nForum/#!article/WebDev/24420


I could not work out how the data was decoded in that one.

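My rough takeaway at this point was that an encryption algorithm was involved: keys are exchanged at login, and all later data is encrypted before it is sent.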

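That meant capturing packets with Wireshark so that no other kind of request (such as a key exchange) would be missed. I found nothing, so the only option left was to read the source code.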

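I used JetBrains dotPeek to decompile the application into C# source. I was confident it was C# because the installation directory contains a large number of DLLs, some of them third-party, and a quick search on their names shows they are .NET libraries.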

I searched the source for cryptography-related names such as DES, AES, and MD5, and eventually found one encryption method:

public static string StrEncode(string data)
{
  byte[] bytes1 = Encoding.ASCII.GetBytes("********");
  byte[] bytes2 = Encoding.ASCII.GetBytes("********");
  DESCryptoServiceProvider cryptoServiceProvider = new DESCryptoServiceProvider();
  int keySize = cryptoServiceProvider.KeySize;
  MemoryStream memoryStream = new MemoryStream();
  CryptoStream cryptoStream = new CryptoStream((Stream) memoryStream, cryptoServiceProvider.CreateEncryptor(bytes1, bytes2), CryptoStreamMode.Write);
  StreamWriter streamWriter = new StreamWriter((Stream) cryptoStream);
  streamWriter.Write(data);
  streamWriter.Flush();
  cryptoStream.FlushFinalBlock();
  streamWriter.Flush();
  return Convert.ToBase64String(memoryStream.GetBuffer(), 0, (int) memoryStream.Length);
}

I then searched for this method name and found only one statement that calls it:

GlobalTool.SetCfgValue("mapType", Convert.ToString(Convert.ToUInt32(GlobalTool.gi_MapType)));
string text = this.txtName.Text;
string appValue = GlobalTool.StrEncode(this.txtPassword.Text);
if (!this.cbSavePassword.Checked)
{
    appValue = "";
}

Clearly this is only used to encrypt the saved password; no other data is encrypted with it.

So I changed approach and looked for the code that makes the request. Searching for New_GetVehicleInfo turned up this caller:

public override void loadUserVehicleInfo(string svrid, string userid, string userpassword)
{
    try
    {
        DateTime now = DateTime.Now;
        byte[] compressedData = this.m_WebSvc.New_GetVehicleInfo(svrid, userid, userpassword);
        TimeSpan span = (TimeSpan) (DateTime.Now - now);
        Console.Out.WriteLine("New_GetVehicleInfo:" + span.TotalMilliseconds.ToString());
        string strxml = GlobalTool.GzipDeCompress(compressedData);
        span = (TimeSpan) (DateTime.Now - now);
        Console.Out.WriteLine("GlobalTool.GZipDeCompress:" + span.TotalMilliseconds.ToString());
        if (strxml.Length > 0)
        {
            this.ReaderVehiceInfoXML(strxml);
        }
        span = (TimeSpan) (DateTime.Now - now);
        Console.Out.WriteLine("ReaderVehiceInfoXML:" + span.TotalMilliseconds.ToString());
    }
    catch (Exception exception)
    {
        this.ErrorInfo("loadUserVehicleInfo", exception.ToString());
    }
}

Next I searched for New_GetVehicleInfo itself:

public byte[] New_GetVehicleInfo(string strsvr, string userid, string pwd)
{
    return (byte[]) base.Invoke("New_GetVehicleInfo", new object[] { strsvr, userid, pwd })[0];
}
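
For reference, a call like this can be reproduced outside the application by posting a SOAP 1.1 envelope by hand. The sketch below only builds the headers and body and sends nothing. It assumes the service uses the usual document/literal wrapped style and that the element names match the proxy's parameter names (`strsvr`, `userid`, `pwd`); neither is confirmed by the captured traffic. The namespace `http://example.invalid/` and the function `build_soap_request` are placeholders for illustration.

```python
from typing import Dict, Tuple
from xml.sax.saxutils import escape

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_request(namespace: str, method: str, params: Dict[str, str]) -> Tuple[Dict[str, str], bytes]:
    """Build HTTP headers and a UTF-8 body for a SOAP 1.1 call (nothing is sent)."""
    # Each parameter becomes a child element of the method element; values are XML-escaped.
    args = "".join(
        f"<{name}>{escape(str(value))}</{name}>" for name, value in params.items()
    )
    body = (
        '<?xml version="1.0" encoding="utf-8"?>'
        f'<soap:Envelope xmlns:soap="{SOAP_ENV}">'
        f'<soap:Body><{method} xmlns="{namespace}">{args}</{method}></soap:Body>'
        "</soap:Envelope>"
    ).encode("utf-8")
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        # SOAPAction is the namespace followed by the method name, in quotes.
        "SOAPAction": f'"{namespace}{method}"',
        "Content-Length": str(len(body)),
    }
    return headers, body


# Placeholder namespace and credentials, for illustration only.
headers, body = build_soap_request(
    "http://example.invalid/",
    "New_GetVehicleInfo",
    {"strsvr": "1", "userid": "demo", "pwd": "a&b"},
)
print(headers["SOAPAction"])
```

The headers mirror the ones captured in Fiddler (Content-Type and SOAPAction); the real namespace would be whatever prefix appears in the captured SOAPAction value.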

Then I searched for GzipDeCompress:

public static string GzipDeCompress(byte[] compressedData)
{
  StringBuilder stringBuilder = new StringBuilder();
  int num = 0;
  byte[] buffer = compressedData;
  byte[] numArray = new byte[4096];
  Stream stream = (Stream) new GZipStream((Stream) new MemoryStream(buffer), CompressionMode.Decompress);
  while (true)
  {
    int count = stream.Read(numArray, 0, numArray.Length);
    if (count > 0)
    {
      num += count;
      stringBuilder.Append(Encoding.Unicode.GetString(numArray, 0, count));
    }
    else
      break;
  }
  stream.Close();
  return stringBuilder.ToString();
}
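
A rough Python counterpart of GzipDeCompress, as a sketch rather than an exact port: it decompresses the whole buffer at once instead of reading 4096-byte chunks, and it relies on the fact that `Encoding.Unicode` in .NET means UTF-16 little-endian. The function name and the sample data are made up for illustration.

```python
import gzip


def gzip_decompress_to_text(compressed: bytes) -> str:
    """Gunzip the bytes, then decode them as UTF-16LE (.NET's Encoding.Unicode)."""
    raw = gzip.decompress(compressed)
    return raw.decode("utf-16-le")


# Round trip on made-up data standing in for the server's payload.
expected = '<root><Item v_code="川A00000" /></root>'
sample = gzip.compress(expected.encode("utf-16-le"))
print(gzip_decompress_to_text(sample))
```

Decoding the complete buffer in one call also avoids any risk of a two-byte UTF-16 code unit being split across two reads.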
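
At this point it was clear that the data is not encrypted at all: it is gzip-compressed and then base64-encoded. So I turned to an online tool: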

http://www.bejson.com/enc/base64/

The base64 decoding there produced nothing readable, so I tried gzip decompression directly, which raised an error.

I installed VS2008 and ran the GzipDeCompress method above on the data, which also raised an error.

Finally I decoded the data with the base64 package and then decompressed the result with gzip, which at last produced output.
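
The order of the two steps matters: the base64 layer has to come off before gzip sees the bytes. The sketch below uses only Python's standard library and a made-up payload built the same way (gzip first, then base64); `decode_payload` is a hypothetical helper name.

```python
import base64
import gzip


def decode_payload(b64_text: str) -> bytes:
    """Undo the two layers in order: base64 text -> gzip bytes -> raw bytes."""
    return gzip.decompress(base64.b64decode(b64_text))


# Made-up payload, encoded the way the server appears to do it.
original = b"<root />"
b64_text = base64.b64encode(gzip.compress(original)).decode("ascii")

try:
    gzip.decompress(b64_text.encode("ascii"))  # wrong: skips the base64 step
    skipped_base64_ok = True
except OSError:  # gzip.BadGzipFile is a subclass of OSError
    skipped_base64_ok = False

print(skipped_base64_ok, decode_payload(b64_text) == original)
```

The bytes returned by `decode_payload` still need a text decoding step. Printed without one, the real result looked like this: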

< ? x m l   v e r s i o n = " 1 . 0 "   e n c o d i n g = " G B 2 3 1 2 " ? > < r o o t > < I t e m   v _ c o d e = " �]A R 3 3 6 3 "   o d o m e t e r = " 7 4 2 . 3 0 9 "   r m a i l e s = " 0 "   d a t e = " 2 0 1 8 - 0 1 - 2 9 "   a v g q u a = "   "   o i l = "   "   / > < I t e m   v _ c o d e = " �]A R 3 3 6 3    T���  "   o d o m e t e r = " 7 4 2 . 3 0 9 "   r m a i l e s = " "   d a t e = " �N2 0 1 8 - 0 1 - 2 9   �  2 0 1 8 - 0 1 - 3 0 "   a v g q u a = " 0 "   o i l = " 0 "   / > < I t e m   v _ c o d e = " �]A R 7 6 7 3 "   o d o m e t e r = " 1 7 9 . 6 3 6 "   r m a i l e s = " 0 "   d a t e = " 2 0 1 8 - 0 1 - 2 9 "   a v g q u a = "   "   o i l = "   "   / > < I t e m   v _ c o d e = " �]A R 7 6 7 3    T���  "   o d o m e t e r = " 1 7 9 . 6 3 6 "   r m a i l e s = " "   d a t e = " �N2 0 1 8 - 0 1 - 2 9   �  2 0 1 8 - 0 1 - 3 0 "   a v g q u a = " 0 "   o i l = " 0 "   / > < I t e m   v _ c o d e = " �NA D H 9 6 7 "   o d o m e t e r = " 1 1 2 5 . 3 7 "   r m a i l e s = " 0 "   d a t e = " 2 0 1 8 - 0 1 - 2 9 "   a v g q u a = "   "   o i l = "   "   / > < I t e m   v _ c o d e = " �NA D H 9 6 7    T���  "   o d o m e t e r = " 1 1 2 5 . 3 7 "   r m a i l e s = " "   d a t e = " �N2 0 1 8 - 0 1 - 2 9   �  2 0 1 8 - 0 1 - 3 0 "   a v g q u a = " 0 "   o i l = " 0 "   / > < I t e m   v _ c o d e = " �NA D H 9 9 3 "   o d o m e t e r = " 7 0 2 . 9 6 7 "   r m a i l e s = " 0 "   d a t e = " 2 0 1 8 - 0 1 - 2 9 "   a v g q u a = "   "   o i l = "   "   / > < I t e m   v _ c o d e = " �NA D H 9 9 3  T���"   o d o m e t e r = " 7 0 2 . 9 6 7 "   r m a i l e s = " "   d a t e = " �N2 0 1 8 - 0 1 - 2 9   �  2 0 1 8 - 0 1 - 3 0 "   a v g q u a = " 0 "   o i l = " 0 "   / > < I t e m   v _ c o d e = " ;`���"   o d o m e t e r = " 2 7 5 0 . 2 8 2 "   r m a i l e s = " "   d a t e = " �N2 0 1 8 - 0 1 - 2 9   �  2 0 1 8 - 0 1 - 3 0 "   a v g q u a = " 0 "   o i l = " 0 "   / > < / r o o t > 

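The characters are spaced far too widely and the Chinese text is garbled, so I checked which encodings the response declares. (The screenshot is not preserved here.)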


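One declares utf-8 and the other gb2312, so I tried decoding with those and got an error. I tried gbk next and got another error. Out of ideas, I went back to searching online.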
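
The wide spacing is itself a clue. In UTF-16 little-endian, every ASCII character is followed by a zero byte, and those zero bytes are what show up as gaps. A small heuristic sketch of that idea follows; the function name, the 0.5 threshold, and the sample string are assumptions for illustration.

```python
def looks_like_utf16le(data: bytes, threshold: float = 0.5) -> bool:
    """Heuristic: mostly-ASCII UTF-16LE text has a zero byte at most odd offsets."""
    high_bytes = data[1::2]  # the second byte of each 2-byte code unit
    if not high_bytes:
        return False
    return high_bytes.count(0) / len(high_bytes) >= threshold


text = '<?xml version="1.0"?><Item v_code="川A00000" />'
as_utf16 = text.encode("utf-16-le")
as_utf8 = text.encode("utf-8")

print(as_utf16[:10])
print(looks_like_utf16le(as_utf16), looks_like_utf16le(as_utf8))
```

A check like this is only a hint, and it would not work for text that is mostly non-ASCII.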

https://stackoverflow.com/questions/4735566/python-unicode-problem


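To my complete surprise, the encoding turned out to be UTF-16. In hindsight this matches the Encoding.Unicode call in GzipDeCompress, since Encoding.Unicode in .NET is UTF-16 little-endian. Decoded that way, the output reads:
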
<?xml version="1.0" encoding="GB2312"?><root><Item v_code="川AR33**" odometer="742.309" rmailes="0" date="2018-01-29" avgqua=" " oil=" " /><Item v_code="川AR33**  合计: " odometer="74**09" rmailes="" date="从2018-01-29 至 2018-01-30" avgqua="0" oil="0" /><Item v_code="川AR76**" odometer="17**36" rmailes="0" date="2018-01-29" avgqua=" " oil=" " /><Item v_code="川AR76**  合计: " odometer="17**36" rmailes="" date="从2018-01-29 至 2018-01-30" avgqua="0" oil="0" /><Item v_code="京ADH9**" odometer="112**7" rmailes="0" date="2018-01-29" avgqua=" " oil=" " /><Item v_code="京ADH9**  合计: " odometer="112**7" rmailes="" date="从2018-01-29 至 2018-01-30" avgqua="0" oil="0" /><Item v_code="京ADH9**" odometer="70**67" rmailes="0" date="2018-01-29" avgqua=" " oil=" " /><Item v_code="京ADH9** 合计:" odometer="70**67" rmailes="" date="从2018-01-29 至 2018-01-30" avgqua="0" oil="0" /><Item v_code="总计:" odometer="275**82" rmailes="" date="从2018-01-29 至 2018-01-30" avgqua="0" oil="0" /></root>
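
Once the text is decoded, the records can be pulled out with a standard XML parser. This is a minimal sketch using Python's `xml.etree.ElementTree`; the helper `parse_items` and the sample plate number are invented for illustration, and the attribute names are copied from the output above.

```python
import re
import xml.etree.ElementTree as ET


def parse_items(xml_text: str) -> list:
    """Turn the decoded XML string into a list of attribute dicts, one per <Item>."""
    # The text is already a Python str, so the stale encoding="GB2312"
    # declaration is dropped before parsing.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", xml_text)
    root = ET.fromstring(body)
    return [dict(item.attrib) for item in root.iter("Item")]


# Made-up record with the same attributes as the real output.
sample = (
    '<?xml version="1.0" encoding="GB2312"?><root>'
    '<Item v_code="川A00000" odometer="1.5" rmailes="0" date="2018-01-29" avgqua=" " oil=" " />'
    "</root>"
)
for row in parse_items(sample):
    print(row["v_code"], row["odometer"], row["date"])
```

Rows whose `v_code` is a subtotal or grand total would still need to be filtered out by the caller.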

And with that, the job was done.


Reposted from blog.csdn.net/fsh_walwal/article/details/79206259