crawler-beans.cxml

1. CrawlMetadata: identifies the crawler and its operator
org.archive.modules.CrawlMetadata:  Basic crawl metadata, as consulted by functional modules and recorded in ARCs/WARCs.
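
As a rough illustration of what this bean can look like in crawler-beans.cxml, here is a minimal sketch. The property names follow the default Heritrix 3 profile as best I recall it; the values (contact URL, job name, description) are hypothetical placeholders, not taken from the original post.

```xml
<!-- Sketch only: all values below are placeholder assumptions. -->
<bean id="metadata" class="org.archive.modules.CrawlMetadata" autowire="byName">
  <!-- URL where site owners can find out who runs the crawl -->
  <property name="operatorContactUrl" value="http://example.org/crawler-contact"/>
  <!-- Short job name, recorded with the crawl metadata -->
  <property name="jobName" value="example-job"/>
  <!-- Free-text description of the crawl's purpose -->
  <property name="description" value="Example crawl for illustration"/>
</bean>
```

In the stock profile these values are usually supplied through a simple-overrides bean rather than edited in place, so treat the inline values above as a simplification.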

org.archive.modules.seeds.TextSeedModule

org.archive.modules.deciderules.DecideRuleSequence

org.archive.modules.CandidateChain

org.archive.modules.FetchChain

org.archive.modules.DispositionChain

org.archive.crawler.framework.CrawlController

org.archive.crawler.frontier.BdbFrontier

org.archive.crawler.util.BdbUriUniqFilter

forceRetire

smallBudget

veryPolite

highPrecedence

<!--    OPTIONAL BUT RECOMMENDED BEANS  -->
actionDirectory

crawlLimiter

checkpointService

statisticsTracker

loggerModule

sheetOverlaysManager

cookieStorage

serverCache

configPathConfigurer

Reposted from sharehua.iteye.com/blog/1745818