Nutch2 + Solr 6: solrdedup causes ClassCastException

Running the following command fails:

  /mnt/nutch/nutch/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr/nutch

Failed with exit value 1.

 

hadoop.log shows:

java.lang.Exception: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String

        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)

        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)

Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String

        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrRecordReader.nextKeyValue(SolrDeleteDuplicates.java:233)

        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)

        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)

        at 

 

This error occurs with both Solr 5 and Solr 6. The fix below follows the solution from this thread:

http://lucene.472066.n3.nabble.com/Nutch-2-Solr-5-solrdedup-causes-ClassCastException-td4301149.html#a4302739

 

1) Copy the data_driven_schema_configs directory under solr-6.6.0/server/solr/configsets into that same configsets directory and name the copy nutch.

2) Copy the schema.xml under $NUTCH_HOME/conf to server/solr/configsets/nutch/conf.

3) Delete every occurrence of enablePositionIncrements from the server/solr/configsets/nutch/conf/schema.xml file.

4) Run the following command to create the core:

 

solr create -c nutch -d nutch

   Usage: solr create [-c name] [-d confdir]

   Here nutch is both the name of the new core (-c) and the name of the configuration directory created above (-d).

If it succeeds, you will see output like this:

Zhuos-MacBook-Pro:solr-6.6.0 jo$ solr create -c nutch -d nutch

 

Copying configuration to new core instance directory:

/Users/jo/soft/solr-5.5.4/server/solr/nutch

 

Creating new core 'nutch' using command:

http://localhost:8983/solr/admin/cores?action=CREATE&name=nutch&instanceDir=nutch

 

{

  "responseHeader":{

    "status":0,

    "QTime":107},

  "core":"nutch"}
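Steps 1 to 3 can also be scripted. The following is only a sketch, not a tested installer: the function names strip_position_increments and prepare_nutch_configset are made up for illustration, and it assumes that enablePositionIncrements appears in schema.xml as an XML attribute of the form enablePositionIncrements="...".

```shell
#!/bin/sh
# strip_position_increments FILE
# Remove every enablePositionIncrements="..." attribute from FILE, in place.
# (Assumption: the setting appears as an XML attribute in schema.xml.)
strip_position_increments() {
  tmp=$(mktemp)
  sed 's/[[:space:]]*enablePositionIncrements="[^"]*"//g' "$1" > "$tmp"
  cat "$tmp" > "$1"   # write back without changing the file's permissions
  rm -f "$tmp"
}

# prepare_nutch_configset SOLR_DIR NUTCH_DIR
# SOLR_DIR is the Solr install directory (e.g. solr-6.6.0),
# NUTCH_DIR is what the post calls $NUTCH_HOME.
prepare_nutch_configset() {
  configsets="$1/server/solr/configsets"
  # Step 1: copy the data-driven configset and name the copy "nutch"
  cp -r "$configsets/data_driven_schema_configs" "$configsets/nutch"
  # Step 2: copy Nutch's schema.xml into the new configset
  cp "$2/conf/schema.xml" "$configsets/nutch/conf/"
  # Step 3: drop the enablePositionIncrements attribute
  strip_position_increments "$configsets/nutch/conf/schema.xml"
}
```

After defining the functions, call prepare_nutch_configset solr-6.6.0 "$NUTCH_HOME" and then run the solr create command from step 4 as usual.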

 

 

 

If you get an error such as: ERROR: Error CREATEing SolrCore 'nutch': Unable to create core [nutch] Caused by: fieldType 'tdates' not found in the schema

The missing type may be tdates or some other type. Compare the schema.xml and managed-schema files in the server/solr/configsets/nutch/conf directory: the type appears in managed-schema but is not defined in schema.xml, so copy its definition from managed-schema into schema.xml. For example, I found the following definitions in managed-schema and copied them into schema.xml:

    <fieldType name="tints" class="solr.TrieIntField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
    <fieldType name="tfloats" class="solr.TrieFloatField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
    <fieldType name="tlongs" class="solr.TrieLongField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
    <fieldType name="tdoubles" class="solr.TrieDoubleField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
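To find candidates for copying without reading both files by eye, the comparison can be sketched in shell. This is a rough helper under stated assumptions: the function names list_field_types and missing_field_types are hypothetical, and the pattern assumes each <fieldType ...> opening tag sits on one line with a name="..." attribute, as in the lines above.

```shell
#!/bin/sh
# list_field_types FILE
# Print the name of every <fieldType> defined in FILE, sorted and de-duplicated.
# (Assumption: each opening tag is on a single line.)
list_field_types() {
  grep -o '<fieldType[^>]* name="[^"]*"' "$1" \
    | sed 's/.* name="\([^"]*\)"$/\1/' \
    | sort -u
}

# missing_field_types SCHEMA MANAGED
# Print the field types defined in MANAGED but not in SCHEMA.
missing_field_types() {
  a=$(mktemp); b=$(mktemp)
  list_field_types "$1" > "$a"
  list_field_types "$2" > "$b"
  comm -13 "$a" "$b"   # keep only lines unique to the second list
  rm -f "$a" "$b"
}

# Run the comparison if the configset from the steps above exists.
CONF_DIR=${CONF_DIR:-server/solr/configsets/nutch/conf}
if [ -f "$CONF_DIR/schema.xml" ] && [ -f "$CONF_DIR/managed-schema" ]; then
  missing_field_types "$CONF_DIR/schema.xml" "$CONF_DIR/managed-schema"
fi
```

Each name it prints is defined only in managed-schema. Not all of them necessarily need to be copied, only the ones the create command complains about.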

 

After that, everything should work.

 

 
