Fixing garbled Chinese text when fetching the source of any web page

When fetching a web page's source code in C#, we often run into garbled Chinese characters:

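            // Requires: using System.IO; using System.Net;
            WebRequest request = WebRequest.Create(textBox2.Text);
            string tempCode;
            try
            {
                // Encoding.Default is used here, but garbled text still shows up sometimes:
                // it is the system's default encoding, which need not match the page's encoding.
                using (WebResponse response = request.GetResponse())
                using (Stream resStream = response.GetResponseStream())
                using (StreamReader sr = new StreamReader(resStream, System.Text.Encoding.Default))
                {
                    tempCode = sr.ReadToEnd();
                }
            }
            catch (WebException)
            {
                // The request failed; do not go on to read from a null response.
                tempCode = string.Empty;
            }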
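
To make the cause concrete, here is a minimal sketch of what happens when bytes are decoded with the wrong encoding. It is an illustration only: the class name `MojibakeDemo` and the helper `Decode` are hypothetical, and ISO-8859-1 stands in for "some encoding that is not the page's own".

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Decodes the given bytes with the given encoding.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }

    static void Main()
    {
        // Two Chinese characters, written as escapes so the source file's own encoding does not matter.
        string original = "\u4e2d\u6587";
        byte[] bytes = Encoding.UTF8.GetBytes(original); // 6 bytes, 3 per character

        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string wrong = Decode(bytes, latin1);        // each byte turns into one unrelated character
        string right = Decode(bytes, Encoding.UTF8); // round-trips to the original text

        Console.WriteLine(wrong.Length);      // 6
        Console.WriteLine(right == original); // True
    }
}
```

The bytes are identical in both calls; only the encoding used to interpret them differs, which is why the fix has to start from the raw bytes.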


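An improved version downloads the raw bytes and, when no encoding is supplied, reads it from the charset declared in the page itself: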


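        // Requires: using System; using System.Net; using System.Text; using System.Text.RegularExpressions;
        // Downloads the page at url and decodes it. Pass null as encoding to detect it from the HTML.
        static string GetHtml(string url, Encoding encoding)
        {
            byte[] buf;
            using (WebClient client = new WebClient())
            {
                buf = client.DownloadData(url);
            }
            if (encoding != null) return encoding.GetString(buf);

            // Decode as UTF-8 first. For ASCII-compatible encodings such as GB2312,
            // the charset declaration itself is plain ASCII and stays readable.
            string html = Encoding.UTF8.GetString(buf);
            encoding = GetEncoding(html);
            if (encoding == null || encoding.CodePage == Encoding.UTF8.CodePage) return html;

            // Decode the original bytes again with the declared encoding.
            return encoding.GetString(buf);
        }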

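        // Extracts the page's Encoding from its HTML content.
        // Returns null if no charset is declared or the name is not supported.
        static Encoding GetEncoding(string html)
        {
            // Also accepts a quoted value, as in <meta charset="gb2312">.
            string pattern = @"(?i)\bcharset\s*=\s*[""']?(?<charset>[-a-zA-Z_0-9]+)";
            Match match = Regex.Match(html, pattern);
            if (!match.Success) return null;
            try { return Encoding.GetEncoding(match.Groups["charset"].Value); }
            catch (ArgumentException) { return null; }
        }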
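
The detection step can be tried without any network access. The following is a rough sketch, not the original author's code: `CharsetDemo` and `DecodeHtml` are hypothetical names, and the page content is made up for the example. It assumes .NET Core 3.0 or later (or .NET 5+), where GB2312 only becomes available after `CodePagesEncodingProvider` is registered; on .NET Framework that line should not be needed.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class CharsetDemo
{
    // Same idea as GetHtml, minus the download: sniff the declared charset, then decode.
    public static string DecodeHtml(byte[] buf)
    {
        string html = Encoding.UTF8.GetString(buf);
        Match m = Regex.Match(html, @"(?i)\bcharset\s*=\s*[""']?(?<charset>[-a-zA-Z_0-9]+)");
        if (!m.Success) return html; // no declaration: keep the UTF-8 decoding
        try
        {
            return Encoding.GetEncoding(m.Groups["charset"].Value).GetString(buf);
        }
        catch (ArgumentException)
        {
            return html; // unknown charset name: keep the UTF-8 decoding
        }
    }

    static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+, where GB2312 is not built in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // A made-up page that declares GB2312 and contains two Chinese characters.
        string page = "<meta charset=\"gb2312\"><p>\u4e2d\u6587</p>";
        byte[] buf = Encoding.GetEncoding("gb2312").GetBytes(page);

        Console.WriteLine(DecodeHtml(buf) == page); // True
    }
}
```

If the provider is not registered on .NET Core, `Encoding.GetEncoding("gb2312")` throws an `ArgumentException`, so a function written like `GetEncoding` above would return null and the page would silently stay decoded as UTF-8.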


// Usage:

string url = "http://www.fhcy88.com";

string tempCode = GetHtml(url, null);  // When the encoding is unknown, pass null as the second argument
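
When the encoding is known, it goes in as the second argument instead of null. One possible source for it, not used in the original post, is the charset that a server may send in its `Content-Type` response header. The sketch below is an assumption-laden illustration: `HeaderCharsetDemo` and `FromContentType` are hypothetical names, and the header values are invented for the example.

```csharp
using System;
using System.Net.Mime;
using System.Text;

static class HeaderCharsetDemo
{
    // Returns the Encoding named in a Content-Type header value, or null if none is usable.
    public static Encoding FromContentType(string headerValue)
    {
        if (string.IsNullOrWhiteSpace(headerValue)) return null;
        try
        {
            string charset = new ContentType(headerValue).CharSet;
            return string.IsNullOrEmpty(charset) ? null : Encoding.GetEncoding(charset);
        }
        catch (FormatException) { return null; }   // malformed header value
        catch (ArgumentException) { return null; } // unknown charset name
    }

    static void Main()
    {
        Encoding e = FromContentType("text/html; charset=utf-8");
        Console.WriteLine(e != null && e.CodePage == 65001);     // True
        Console.WriteLine(FromContentType("text/html") == null); // True
    }
}
```

A null result here maps naturally onto the null second argument of `GetHtml`, so the charset declared in the HTML remains the fallback.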
           

Reposted from blog.csdn.net/fguyfff/article/details/84026569