Baidu Knows Q&A aggregation collection software (anti-crawling version): documentation / Python collection script

Hello everyone, I am Tao Xiaobai~

This is the documentation for the Baidu Knows aggregation collection software. Previously I only posted a demonstration video without a detailed introduction, so today I will go through it in detail, including the updated features.

1. Software language: Python 

2. Logic: batch collection by keyword -> aggregation of multiple articles -> save to local txt files
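
To make the three stages concrete, here is a minimal sketch of that flow. It is not the software's actual code: the function names are made up for illustration, and `fetch_articles` is a hypothetical callable standing in for the real collector, which is not shown in this post.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def aggregate(articles: List[str], separator: str = "\n\n") -> str:
    """Join several collected articles into one text, skipping empty ones."""
    return separator.join(a.strip() for a in articles if a.strip())


def run_pipeline(
    keywords: Iterable[str],
    fetch_articles: Callable[[str], List[str]],  # hypothetical collector
    out_dir: Path,
) -> List[Path]:
    """For each keyword: collect articles, aggregate them, save a txt file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for keyword in keywords:
        articles = fetch_articles(keyword)
        if not articles:
            continue  # nothing was collected for this keyword
        # Assumption: the keyword is safe to use as a file name.
        target = out_dir / f"{keyword}.txt"
        target.write_text(aggregate(articles), encoding="utf-8")
        written.append(target)
    return written
```

Passing the collector in as a parameter keeps the aggregation and saving steps testable without touching the network.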

3. Configuration file description:

After getting the software, first edit the config.ini configuration file, which contains the following settings:

path: path to the keyword file;

bf_num: concurrency level; the maximum is 20, and any value above 20 is automatically reduced to 20;

out_path: data output path;

title_mode: title mode, one of 0, 1, 2, 3, 4, with the following meanings:

Single title, keyword only: 1

Single title, Baidu Knows title only: 2

Double title, keyword + Baidu Knows title: 3

Double title, Baidu Knows title + Baidu Knows title: 4

Random title mode: 0

title_f, title_b: connectors for double titles. Note: to use a space as the connector, wrap it in ASCII (half-width) double quotes, for example " ", which means the two parts are joined with a space;

title_len: title length limit for filtering; for example, with a value of 30, titles longer than 30 are filtered out;

article_seq: extraction order switch, which controls whether the article extraction order is shuffled. For example, sequential order is 123456789..., while a shuffled order might be 951326487...
0 extracts in the default order; 1 shuffles the order of articles;

article_num: number of articles to aggregate. The minimum is 2 and the maximum is 10. If it is set to 0, a random 3 to 5 articles are combined.
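
As a rough illustration of how these settings could be read, here is a minimal sketch using Python's standard `configparser`. The section name `[config]`, the `Settings` class, and the helper names are assumptions for the example; the post does not show the real file layout or the software's own loader. The sketch applies the documented cap of 20 on bf_num and strips the double quotes around a connector such as " ".

```python
import configparser
from dataclasses import dataclass

MAX_CONCURRENCY = 20  # documented upper limit for bf_num


@dataclass
class Settings:
    path: str
    out_path: str
    bf_num: int
    title_mode: int
    title_f: str
    title_b: str
    title_len: int
    article_seq: int
    article_num: int


def unquote(value: str) -> str:
    """Turn a quoted connector such as '" "' into a plain space."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


def load_settings(text: str, section: str = "config") -> Settings:
    """Parse config.ini text; the section name is an assumption."""
    # interpolation=None so a '%' in a path is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    cfg = parser[section]
    return Settings(
        path=cfg["path"],
        out_path=cfg["out_path"],
        # Values above 20 are reduced to 20, as documented.
        bf_num=min(cfg.getint("bf_num"), MAX_CONCURRENCY),
        title_mode=cfg.getint("title_mode"),
        title_f=unquote(cfg["title_f"]),
        title_b=unquote(cfg["title_b"]),
        title_len=cfg.getint("title_len"),
        article_seq=cfg.getint("article_seq"),
        article_num=cfg.getint("article_num"),
    )
```

The quotes are needed in the file because `configparser` trims whitespace around a value, so a bare space would be read as an empty string.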

The features above were mostly added in response to customer requests. I will keep collecting customer feedback for further optimization and upgrades.
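
For readers who want to see how the title_mode, title_len, article_seq and article_num settings might fit together, here is a small sketch. It is only one interpretation of the descriptions above, not the software's implementation: the function names are hypothetical, a single `connector` argument stands in for title_f/title_b (the post does not say how the two are combined), and rejecting out-of-range article_num values is my own assumption.

```python
import random
from typing import List, Optional, Sequence


def pick_articles(
    articles: Sequence[str],
    article_seq: int,
    article_num: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Choose which articles to aggregate for one output file."""
    if rng is None:
        rng = random.Random()
    if article_num != 0 and not 2 <= article_num <= 10:
        # Assumption: the real handling of invalid values is not documented.
        raise ValueError("article_num must be 0 or between 2 and 10")
    pool = list(articles)
    if article_seq == 1:  # 1 shuffles, 0 keeps the default order
        rng.shuffle(pool)
    # 0 means a random combination of 3 to 5 articles.
    count = rng.randint(3, 5) if article_num == 0 else article_num
    return pool[:count]


def build_title(
    keyword: str,
    knows_titles: Sequence[str],
    title_mode: int,
    connector: str = " ",
    title_len: int = 30,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Build a title for the given mode; return None if it is too long."""
    if rng is None:
        rng = random.Random()
    if title_mode == 0:  # random title mode
        title_mode = rng.choice([1, 2, 3, 4])
    if title_mode == 1:
        title = keyword
    elif title_mode == 2:
        title = knows_titles[0]
    elif title_mode == 3:
        title = keyword + connector + knows_titles[0]
    elif title_mode == 4:  # needs at least two collected titles
        title = knows_titles[0] + connector + knows_titles[1]
    else:
        raise ValueError("title_mode must be 0, 1, 2, 3 or 4")
    # Titles longer than title_len are filtered out.
    return title if len(title) <= title_len else None
```

For example, `build_title("kw", ["t1", "t2"], 3)` joins the keyword and the first collected title with a space, and a title longer than `title_len` comes back as `None` so the caller can skip it.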

4. Using the software requires purchasing a license, which is bound to a computer;

5. Automatically map aggregated data;

6. If you need custom aggregation logic for Baidu Knows data, you can contact me to have it customized;

7. Other notes: use Notepad++ to open and edit the keywords.txt and config.ini files where possible, and avoid Windows Notepad, which can cause unexpected errors that vary from computer to computer!
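
The post does not say what these errors are. One well-known possibility is that older versions of Windows Notepad save UTF-8 files with a byte order mark (BOM) at the start, which Python's `configparser` then treats as part of the first line and can fail to parse. Assuming that is the cause, a script can tolerate such files by reading them with the `utf-8-sig` codec, as in this sketch (the helper names and the `config` section name are hypothetical):

```python
import configparser
from pathlib import Path
from typing import List


def read_text_without_bom(path: Path) -> str:
    """Read UTF-8 text, dropping a leading byte order mark if present."""
    return path.read_text(encoding="utf-8-sig")


def read_keywords(path: Path) -> List[str]:
    """Return one keyword per non-empty line of the keyword file."""
    text = read_text_without_bom(path)
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_config(path: Path) -> configparser.ConfigParser:
    """Parse an ini file whether or not it was saved with a BOM."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(read_text_without_bom(path))
    return parser
```

The `utf-8-sig` codec also reads files without a BOM correctly, so it is safe to use in both cases.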

8. Demonstration collection video:

Baidu knows collection tool software demonstration, article combination aggregation website update, rapid collection, batch collection


Origin blog.csdn.net/u012917925/article/details/133244188