C# console scraper for audiobooks (with source code)

Thanks to Frances from Beijing for sharing. Members of our ttlsa group are always willing to help and share, and Frances is the second woman in the group to contribute an article. When she said she had sent one, I opened the RAR and was surprised to find .NET code. Incredible. The original article follows. Today Qingqing introduces a C# .NET demo that scrapes an audiobook site. Scrapers are all much alike: once you have the method down, any data can be collected. Without further ado, here is the code with comments.
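using System.Text.RegularExpressions;

// Tool (GetHtml, ToInt32) is the author's own helper class; its source is not included in this post.
namespace CJ.BLL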
{
    public class FictionBLL
    {
        private string domain = "www.xiats.com"; // defined once up front because it is reused throughout
        public void main()
        {
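            // First, find a list page: www.xiats.com/yslist/1.html

            string url = domain + "/yslist/1.html"; // read the total page count from this page so we can loop over every page
            string html = Tool.GetHtml(url); // fetch the page source
            int maxPage = Tool.ToInt32(Regex.Match(html, "Page (?:\\d+) of (\\d+)</td>").Groups[1].Value); // (?:\d+) is non-capturing, so Groups[1] is the (\d+) after "of": the total page count
            // Now we can loop over the pages.
            // First work out what the next page's URL looks like,
            // so open page 2 of the list: www.xiats.com/yslist/1_2.html
            // That reveals the real URL pattern. Replace the 2 with a placeholder, e.g. www.xiats.com/yslist/1_page.html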
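            string listurl = domain + "/yslist/1_page.html"; // URL template; "page" is the placeholder
            for (int i = 1; i <= maxPage; i++)
            {
                // Build each page URL from the untouched template. Assigning the result back to
                // listurl would remove the placeholder on the first pass, so every later pass would refetch page 1.
                string pageUrl = listurl.Replace("page", i.ToString());
                List(pageUrl);
            }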

        }

        /// <summary>
        /// Parses a list page and visits every detail page it links to.
        /// </summary>
        /// <param name="url"></param>
        public void List(string url)
        {
            // The goal here is to collect the URLs of the novel detail pages.
            string html = Tool.GetHtml(url);
            // Narrow the source down to the block we need: the list in the middle of the page.
            html = Regex.Match(html, "<DIV class=\"layout_newlist\">([\\s\\S]+?)<DIV id=div_3>", RegexOptions.IgnoreCase).Groups[1].Value; // RegexOptions.IgnoreCase makes the match case-insensitive

            // Next, extract the detail page URLs.
            MatchCollection matchs = Regex.Matches(html, "<DIV><A href=\"(.*?)\">");

            foreach (Match item in matchs)
            {
                string detailurl = domain + item.Groups[1].Value; // that is all it takes to get a detail page URL
                Detail(detailurl); // the detail page is the most tedious part to parse
            }

        }

        /// <summary>
        /// Parses a detail page.
        /// </summary>
        /// <param name="url"></param>
        public void Detail(string url)
        {
            string html = Tool.GetHtml(url);
            // The Chinese literals below are labels on the scraped page and must stay as-is:
            // 播音 = narrator, 作者 = author, 状态 = status, 小说简介 = novel synopsis.
            string title = Regex.Match(html, "<h3>(.*?)</h3>", RegexOptions.IgnoreCase).Groups[1].Value;
            string broadcast = Regex.Match(html, "<LI>播音:<SPAN style=\"COLOR: red\"><font color=\"blus\"><a(?:.*?)\">(.*?)</a>").Groups[1].Value;
            string author = Regex.Match(html, "<LI>作者:<a title=\"(?:.*?)\" href=\"(?:.*?)\">(.*?)</a>").Groups[1].Value;
            string state = Regex.Match(html, " <LI>状态:(.*?)</LI>").Groups[1].Value;
            string content = Regex.Match(html, "小说简介</span></li></div>([\\s\\S]+?)</div>", RegexOptions.IgnoreCase).Groups[1].Value;
            content = content.Replace("<div>", string.Empty); // strip leftover opening tags from the synopsis

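            // The next step would be to write the scraped data to a database. This demo leaves database work out.

            // The remaining fields work much the same way, so the rules are not all written out here. If you have questions, join the group and we can discuss and learn together.
            // You may have noticed that only two regex patterns are used:
            // 1. .*? captures any text on a single line (. does not match \n by default)
            // 2. [\\s\\S]+? captures text that spans line breaks such as \r\n
            // Scraping comes down to narrowing a large block of content to the smallest useful piece, then matching out what you want.
            // Note: the above is for study and reference only. It is still best to create original data yourself. Plagiarism is unfair to others and to yourself.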
        }
    }

}
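The code above calls Tool.GetHtml and Tool.ToInt32, but the Tool class is not part of the post. Below is a minimal sketch of what such a helper might look like, so the demo can be compiled. Everything in it is an assumption: the signatures are inferred only from the call sites above, NormalizeUrl is a hypothetical helper added because the URLs above carry no scheme, UTF-8 is a guess at the site's encoding, and the CJ.BLL namespace is chosen only so that FictionBLL can see the class without a using directive. WebClient is used because it fits the .NET Framework era of the article; newer .NET versions mark it obsolete in favor of HttpClient.

```csharp
using System;
using System.Net;
using System.Text;

namespace CJ.BLL
{
    // Hypothetical stand-in for the author's Tool helper, which the post does not show.
    public static class Tool
    {
        // Prepend http:// when the URL has no scheme (assumption: the site is served over HTTP).
        public static string NormalizeUrl(string url)
        {
            if (url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
                url.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
            {
                return url;
            }
            return "http://" + url;
        }

        // Download a page and return its source as a string.
        // Assumption: the page is UTF-8 encoded; change Encoding if the site uses another charset.
        public static string GetHtml(string url)
        {
            using (var client = new WebClient())
            {
                client.Encoding = Encoding.UTF8;
                return client.DownloadString(NormalizeUrl(url));
            }
        }

        // Parse an integer, returning 0 when the text is empty or not a number.
        // With this fallback, a failed page-count match simply means the page loop does not run.
        public static int ToInt32(string value)
        {
            int result;
            return int.TryParse(value, out result) ? result : 0;
        }
    }
}
```

With a helper like this in the project, calling new FictionBLL().main() from a console Main method would start the crawl. Network errors are not handled in this sketch; DownloadString throws a WebException when a request fails.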
C# source: C# console audiobook scraper source code. Site: Web Site Operation and Maintenance Lifetime (ttlsa): http://www.ttlsa.com/html/3627.html

Reposted from: https://my.oschina.net/766/blog/211555


Origin blog.csdn.net/weixin_34119545/article/details/91493250