nutch nutch-site.xml

1. Changes to nutch-site.xml do not require re-running ant; ycs's claim to the contrary is mistaken.
2. Consider this property in nutch-site.xml:
<property>
  <name>http.agent.name</name>
  <value>Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0</value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty -
  please set this to a single word uniquely related to your organization.

  NOTE: You should also check other related properties:

        http.robots.agents
        http.agent.description
        http.agent.url
        http.agent.email
        http.agent.version

  and set their values appropriately.

  </description>
</property>
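Here the <value>...</value> element must be kept on a single line (opening tag, text, and closing tag together). Otherwise fetches of www.amazon.cn and www.vancl.com come back with nothing, which is very strange behavior.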
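
A quick way to catch this before crawling is to scan the config for values that span several lines. The following is only a minimal sketch: the function name find_multiline_values and the SAMPLE config (including the "1.0" version value) are made up for illustration, and it merely detects line breaks inside <value>. It does not explain why those sites fail; one plausible but unconfirmed guess is that the line break and indentation end up inside the User-Agent string.

```python
import xml.etree.ElementTree as ET


def find_multiline_values(xml_text):
    """Return the names of properties whose <value> text contains a line break."""
    root = ET.fromstring(xml_text)
    bad = []
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="")
        # XML parsers normalize line endings to "\n", so one check is enough.
        if "\n" in value:
            bad.append(name)
    return bad


# Hypothetical sample: the first property has its value split across lines,
# the second keeps <value>...</value> on one line.
SAMPLE = """<configuration>
<property>
  <name>http.agent.name</name>
  <value>
    Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0
  </value>
</property>
<property>
  <name>http.agent.version</name>
  <value>1.0</value>
</property>
</configuration>"""

if __name__ == "__main__":
    print(find_multiline_values(SAMPLE))  # ['http.agent.name']
```

To check a real file, read it and pass the text in, for example find_multiline_values(open("conf/nutch-site.xml").read()); the conf/ path is an assumption about the local layout.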


Reposted from john-doe.iteye.com/blog/1860070