Based on target webpage characteristics

Crawlers based on the characteristics of the target webpage generally crawl, store, and index websites or webpages. According to how the seed samples are obtained, they can be divided into:
(1) A pre-given set of initial crawl seed samples;
(2) A pre-given webpage classification catalog, with seed samples corresponding to each category in the catalog;
(3) Target crawl samples determined by user behavior:

  • Crawl samples explicitly labeled by the user while browsing;
  • Access patterns and related samples obtained by mining user logs.
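
The three ways of obtaining seed samples listed above can be sketched roughly as follows. This is only an illustrative sketch, not a reference implementation: the `Seed` class, the function names, the `"user url"` log line format, and the `min_visits` threshold are all assumptions made for the example and do not come from the original article.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Seed:
    url: str
    category: Optional[str]  # None when no classification catalog is used
    source: str              # "given", "catalog", "labeled", or "log"


def seeds_from_list(urls):
    """(1) Pre-given initial seed samples."""
    return [Seed(u, None, "given") for u in urls]


def seeds_from_catalog(catalog):
    """(2) Pre-given catalog: maps each category name to its seed URLs."""
    return [Seed(u, cat, "catalog") for cat, urls in catalog.items() for u in urls]


def seeds_from_labels(labeled):
    """(3a) Pages the user explicitly labeled while browsing.

    `labeled` is an iterable of (url, is_relevant) pairs (hypothetical format).
    """
    return [Seed(u, None, "labeled") for u, is_relevant in labeled if is_relevant]


def seeds_from_log(log_lines, min_visits=2):
    """(3b) Mine a user log; keep URLs visited at least `min_visits` times.

    Each line is assumed to look like "user url" (hypothetical format).
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            counts[parts[1]] += 1
    return [Seed(u, None, "log") for u, n in counts.items() if n >= min_visits]


if __name__ == "__main__":
    log = [
        "u1 http://a.example/x",
        "u2 http://a.example/x",
        "u1 http://b.example/y",
    ]
    for seed in seeds_from_log(log):
        print(seed)
```

Counting visits is only one simple stand-in for "mining access patterns"; a real system would likely use richer signals such as session paths or dwell time.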

Origin blog.csdn.net/weixin_55323026/article/details/115272460