C# WPF: Building a simple Baidu Tieba crawler client

Without further ado, screenshots first.

[Screenshots of the finished client]

Crawling 10 pages (about 500 posts) takes about 10 s, and crawling 500 pages (more than 20,000 posts) takes about 2 min. So the performance is not outstanding, but it is not bad either.

Enough talk. Let's build this simple client step by step.

1. Create a project

Create an empty WPF project and reference the DevExpress DLLs it needs.

DevExpress can be downloaded from the official website; version 16 or later is generally fine. The trial version works too: after it expires it mostly does not restrict what you can do. A dialog just pops up while you are developing, and you can close it. Quite decent of them.

Download: https://www.devexpress.com/

 

 

2. Build the interface

The interface is plain XAML. The DevExpress Demo Center has plenty of examples to borrow from, so here is the code directly.

<dx:ThemedWindow x:Class="SearchAnyWay.MainWindow"
        xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
        xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
        xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
        xmlns:dx="http://schemas.devexpress.com/winfx/2008/xaml/core"
        xmlns:dxmvvm="http://schemas.devexpress.com/winfx/2008/xaml/mvvm"
        xmlns:dxe="http://schemas.devexpress.com/winfx/2008/xaml/editors"
        xmlns:dxlc="http://schemas.devexpress.com/winfx/2008/xaml/layoutcontrol"   
        xmlns:dxg="http://schemas.devexpress.com/winfx/2008/xaml/grid"
        xmlns:local="clr-namespace:SearchAnyWay"
        mc:Ignorable="d"
        Title="百度贴吧搜索神器(v1.0)" Height="600" Width="800">
    <Grid>
        <dxlc:LayoutControl VerticalAlignment="Stretch" Orientation="Vertical" TextBlock.FontSize="11">
            <Label VerticalAlignment="Top" FontWeight="Bold" Content="输入您需要查找的关键字"></Label>
            <dxlc:LayoutGroup Orientation="Horizontal">
                <dxlc:LayoutItem Label="关键字(K)" AddColonToLabel="True">
                    <dxe:TextEdit EditValue="{Binding Path=Name, Mode=TwoWay, UpdateSourceTrigger=PropertyChanged, ValidatesOnDataErrors=True}" >
                        <dxmvvm:Interaction.Triggers>
                            <dxmvvm:KeyToCommand KeyGesture="Enter" Command="{Binding SearchCommand}"></dxmvvm:KeyToCommand>
                        </dxmvvm:Interaction.Triggers>
                    </dxe:TextEdit>
                </dxlc:LayoutItem>
                <dxlc:LayoutItem Label="贴吧名(N)" AddColonToLabel="True">
                    <dxe:TextEdit EditValue="{Binding Path=HubName, Mode=TwoWay, UpdateSourceTrigger=PropertyChanged, ValidatesOnDataErrors=True}">
                    </dxe:TextEdit>
                </dxlc:LayoutItem>
                <dxlc:LayoutItem Label="爬取页数(P)" AddColonToLabel="True">
                    <dxe:ComboBoxEdit ItemsSource="{Binding PageRange}"
                                      SelectedItem="{Binding Page}"
                                      ShowSizeGrip="False"
                                      IsTextEditable="False">
                    </dxe:ComboBoxEdit>
                </dxlc:LayoutItem>
                <dxlc:LayoutGroup HorizontalAlignment="Right" VerticalAlignment="Center">
                    <dx:SimpleButton x:Name="btnSearch" Content="查找(S)" Width="80" Command="{Binding SearchCommand}"></dx:SimpleButton>
         
                </dxlc:LayoutGroup>
            </dxlc:LayoutGroup>
            <dxg:TreeListControl x:Name="treeList"  Margin="0,10" ItemsSource="{Binding Source}"
                                         SelectionMode="Row" SelectedItem="{Binding SelectedRow}">
                <dxg:TreeListControl.Columns>
                    <dxg:TreeListColumn  FieldName="Title" Header="标题"  Width="2*"/>
                    <dxg:TreeListColumn  FieldName="Brief" Width="2*" Header="详情"/>
                    <dxg:TreeListColumn Header="回复数" FieldName="CommentCount" Width="*"/>
                    <dxg:TreeListColumn Header="作者" FieldName="AuthorName" Width="*"/>
                </dxg:TreeListControl.Columns>
                <dxg:TreeListControl.View>
                    <dxg:TreeListView x:Name="view" VerticalScrollbarVisibility="Auto" AutoExpandAllNodes="True"  AllowEditing="False" NavigationStyle="Row" ShowIndicator="False"  TreeDerivationMode="ChildNodesSelector" ChildNodesPath="ICDItems">
                        <dxmvvm:Interaction.Triggers>
                            <dxmvvm:EventToCommand EventName="SourceUpdated" Command="{Binding Commands.ExpandAllNodes, ElementName=view}" />
                            <dxmvvm:EventToCommand EventName="RowDoubleClick" Command="{Binding SearchCommand}" CommandParameter="{Binding ElementName=treeList,Path=SelectedItem}" />
                        </dxmvvm:Interaction.Triggers>
                    </dxg:TreeListView>
                </dxg:TreeListControl.View>
            </dxg:TreeListControl>
            <dxlc:LayoutGroup VerticalAlignment="Bottom" Orientation="Horizontal">
                <Label Content="帖子总数:" HorizontalAlignment="Right"/>
                <Label Content="{Binding Source.Count, UpdateSourceTrigger=PropertyChanged}"  HorizontalAlignment="Right">
                    </Label>
            </dxlc:LayoutGroup>
            <dxlc:LayoutGroup VerticalAlignment="Bottom" Orientation="Horizontal">
                <dxe:CheckEdit IsChecked="{Binding IsAll}"  Content="Include All" HorizontalAlignment="Left"/>
                <dx:SimpleButton Content="Copy VLPath To Clipboard" IsEnabled="{Binding CanNext}" Command="{Binding CopyVLPathCommand}" HorizontalAlignment="Left"></dx:SimpleButton>
                <dxlc:LayoutGroup HorizontalAlignment="Right">
                    <dx:SimpleButton Content="下载(D)" Width="80" IsEnabled="{Binding CanNext}" Command="{Binding NextCommand}"></dx:SimpleButton>
                    <dx:SimpleButton Content="清除(C)" Width="80" IsEnabled="{Binding CanNext}" Command="{Binding OKCommand}"></dx:SimpleButton>
                    <dx:SimpleButton Content="合作(P)" Width="80" Command="{Binding CancelCommand}"></dx:SimpleButton>
                </dxlc:LayoutGroup>
            </dxlc:LayoutGroup>
        </dxlc:LayoutControl>
        <dx:WaitIndicator  DeferedVisibility="{Binding IsLoading}" />
    </Grid>
</dx:ThemedWindow>

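3. Implement the MVVM pattern

This uses the MVVM support that ships with DevExpress, which is essentially the same as an MVVM framework you would build yourself on plain WPF. If you are not familiar with MVVM, there are plenty of articles about it on cnblogs.

(1) Set the theme and bind the view model in the code-behind.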

 

public partial class MainWindow
{
    public MainWindow()
    {
        InitializeComponent();
        // Set the theme and the window style.
        ApplicationThemeHelper.UseLegacyDefaultTheme = true;
        ApplicationThemeHelper.ApplicationThemeName = Theme.VisualStudioCategory;
        this.WindowStyle = System.Windows.WindowStyle.SingleBorderWindow;
        this.Icon = new BitmapImage(new Uri("../../debug.png", UriKind.Relative));
        this.BorderThickness = new Thickness(0);
        this.Margin = new Thickness(0);
        this.Padding = new Thickness(0);
        // Bind the view model.
        this.DataContext = new MainViewModel();
    }
}

 

(2) Design the entity class for a post.

Design it around whatever information you want to crawl.

public class ArticleModel
{
    // Post title
    public string Title { get; set; }
    // Short excerpt shown in the post list
    public string Brief { get; set; }
    // Number of replies
    public int CommentCount { get; set; }
    // Author name
    public string AuthorName { get; set; }
}

(3) Declare the page count, the post collection, and the other properties in the ViewModel.

// Whether a crawl is in progress
private bool _loading;
public bool IsLoading
{
    get { return this._loading; }
    set
    {
        SetProperty(ref _loading, value, () => IsLoading);
    }
}
// Name of the Tieba forum
private string _hub;
public string HubName
{
    get { return this._hub; }
    set
    {
        SetProperty(ref _hub, value, () => HubName);
    }
}
// Number of pages to crawl
private int _page;
public int Page
{
    get { return this._page; }
    set
    {
        SetProperty(ref _page, value, () => Page);
    }
}
// Collection of crawled posts (the backing field is private; it was public by mistake)
private ObservableCollection<ArticleModel> _source;
public ObservableCollection<ArticleModel> Source
{
    get { return _source; }
    set { SetProperty(ref _source, value, () => Source); }
}
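
For readers who do not have DevExpress at hand, the SetProperty calls above behave roughly like the plain .NET base class below. This is only a sketch of the idea, not the DevExpress implementation: ObservableObject and PageViewModel are hypothetical names, and it uses [CallerMemberName] instead of the lambda argument that the DevExpress overload takes.

```csharp
using System.Collections.Generic;
using System.ComponentModel;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a view-model base class (not the DevExpress one).
public abstract class ObservableObject : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    // Stores the new value and raises PropertyChanged only if the value actually changed.
    protected bool SetProperty<T>(ref T field, T value, [CallerMemberName] string name = null)
    {
        if (EqualityComparer<T>.Default.Equals(field, value)) return false;
        field = value;
        PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(name));
        return true;
    }
}

// Hypothetical view model with a single bindable property.
public class PageViewModel : ObservableObject
{
    private int _page;
    public int Page
    {
        get { return _page; }
        set { SetProperty(ref _page, value); }
    }
}
```

Assigning the same value to Page twice raises PropertyChanged only once, which is why bindings are not refreshed needlessly.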

(4) Bind the search logic to the button's Command, bind the drop-down list, and so on.

public AsyncCommand SearchCommand { get; set; }

public IEnumerable<int> PageRange { get; private set; }

public MainViewModel()
{
    Page = 10;
    PageRange = new List<int>() { 10, 50, 100, 200, 500, 1000, 10000 };
    Source = new ObservableCollection<ArticleModel>();
    SearchCommand = new AsyncCommand(Search);
}

4. A simple implementation of the crawler

We use HttpClient to request each page and fetch its HTML.

We use AngleSharp to parse the HTML (press Ctrl+Shift+P to install the NuGet package quickly): Install-Package AngleSharp

Basic usage:

// http is an HttpClient and parser is an AngleSharp HtmlParser.
// Download the HTML of the list page.
string pageData = await http.GetStringAsync($"https://tieba.baidu.com/f?kw={HubName}&ie=utf-8&pn={pnIndex}");
// Parse the HTML with AngleSharp.
IHtmlDocument doc = await parser.ParseDocumentAsync(pageData);

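5. Analyze Baidu Tieba

Comparing the URLs of different list pages shows that they are essentially identical. The only URL parameter that changes with the page number is pn (Page Number), and the rule is pn = (Page - 1) * 50, where 50 is presumably the number of posts per page.

That makes things easy: get the node of each post, then look up the data we need inside it one item at a time.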
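
The pn rule can be written down as a small helper. This is a minimal sketch: TiebaUrl and ForPage are hypothetical names, the page number is assumed to be 1-based, and 50 posts per page is the value observed above rather than a documented constant.

```csharp
using System;

// Hypothetical helper that builds list-page URLs from the pn rule.
public static class TiebaUrl
{
    // Posts per list page, as observed in the article (an assumption, not a documented value).
    public const int PostsPerPage = 50;

    // Builds the list URL for a 1-based page number.
    public static string ForPage(string hubName, int page)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        // pn is the offset of the first post on the page: 0, 50, 100, ...
        int pn = (page - 1) * PostsPerPage;
        // Escape the forum name so that non-ASCII names and characters such as '#' survive.
        string kw = Uri.EscapeDataString(hubName);
        return $"https://tieba.baidu.com/f?kw={kw}&ie=utf-8&pn={pn}";
    }
}
```

For example, TiebaUrl.ForPage("csharp", 3) yields a URL ending in pn=100. Looping over a 0-based index and multiplying it by 50 gives the same offsets.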

The core crawling code is as follows:

await Task.Run(() =>
{
    var http = new HttpClient();
    var parser = new HtmlParser();
    // Escape the forum name so that non-ASCII names form a valid query string.
    var hub = Uri.EscapeDataString(HubName ?? string.Empty);
    var result = Enumerable.Range(0, Page)
        .AsParallel()
        .AsOrdered()
        .SelectMany(page =>
        {
            return Task.Run(async () =>
            {
                // pn is the offset of the first post on the page: 0, 50, 100, ...
                var pnIndex = page * 50;
                // Download the HTML of the list page.
                string pageData = await http.GetStringAsync($"https://tieba.baidu.com/f?kw={hub}&ie=utf-8&pn={pnIndex}");
                // Parse the HTML with AngleSharp.
                IHtmlDocument doc = await parser.ParseDocumentAsync(pageData);
                // Each post is an element whose class attribute is "t_con cleafix".
                return doc.QuerySelectorAll(".t_con.cleafix").Select(tag => new ArticleModel()
                {
                    Title = tag.QuerySelector(".j_th_tit")?.TextContent?.Trim(),
                    Brief = tag.QuerySelector(".threadlist_abs.threadlist_abs_onlyline")?.TextContent?.Trim(),
                    CommentCount = Convert.ToInt32(tag.QuerySelector(".threadlist_rep_num.center_text")?.TextContent),
                    AuthorName = tag.QuerySelector(".frs-author-name.j_user_card")?.TextContent?.Trim(),
                }).ToList();
            }).GetAwaiter().GetResult();
        });
    Source = new ObservableCollection<ArticleModel>(result);
});
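
The code above blocks a thread on every page with GetAwaiter().GetResult(). One possible alternative, which is not the author's code, is to start one asynchronous task per page and combine them with Task.WhenAll. The sketch below assumes a caller-supplied fetchPage delegate that downloads and parses a single page; PageCrawler, CrawlAsync, and the default limit of 8 concurrent requests are all hypothetical choices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: fetches pages concurrently without blocking threads.
public static class PageCrawler
{
    // Fetches pages 0..pageCount-1, at most maxConcurrency at a time,
    // and returns all items in page order.
    public static async Task<List<T>> CrawlAsync<T>(
        int pageCount,
        Func<int, Task<IReadOnlyList<T>>> fetchPage,
        int maxConcurrency = 8) // arbitrary default, tune as needed
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = Enumerable.Range(0, pageCount).Select(async page =>
            {
                // Wait for a free slot before starting the request.
                await gate.WaitAsync();
                try { return await fetchPage(page); }
                finally { gate.Release(); }
            }).ToList(); // ToList creates each task exactly once

            // WhenAll returns the results in the same order as the tasks.
            IReadOnlyList<T>[] pages = await Task.WhenAll(tasks);
            return pages.SelectMany(p => p).ToList();
        }
    }
}
```

In the view model, fetchPage would wrap the GetStringAsync and ParseDocumentAsync calls for one page, and the awaited list would be wrapped in a new ObservableCollection and assigned to Source. The limit on concurrent requests also keeps the client from sending hundreds of requests at the same moment.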

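One small detail: if the class attribute of a DOM element contains spaces, you must replace the spaces with '.' in the selector. A space in the class attribute separates several class names, and in a CSS selector a space means "descendant of", so the class names have to be chained with dots instead. For example, if the element's class is 'ftt poot', the lookup should be tag.QuerySelector(".ftt.poot"). I was stuck in this pit for a long time!!! Probably because I have not had much exposure to this area...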
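
The difference between the two selectors can be checked on a tiny document. This is a minimal sketch that assumes AngleSharp 0.10 or later (where HtmlParser lives in AngleSharp.Html.Parser); SelectorDemo and FirstText are hypothetical names.

```csharp
using AngleSharp.Html.Parser;

// Hypothetical helper for trying out selectors on an HTML snippet.
public static class SelectorDemo
{
    // Returns the text of the first element matching the selector, or null if nothing matches.
    public static string FirstText(string html, string selector)
    {
        var doc = new HtmlParser().ParseDocument(html);
        return doc.QuerySelector(selector)?.TextContent;
    }
}
```

With the input `<div class="ftt poot">hit</div>`, the selector ".ftt.poot" finds the element because it carries both classes, while ".ftt .poot" returns null because it looks for a poot element nested inside an ftt element.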

OK, the crawling feature is done. The remaining odds and ends are up to you, haha.

Code download: https://github.com/BruceQiu1996/WPF-/tree/master

 

 

 

Origin www.cnblogs.com/qwqwQAQ/p/12014383.html