Error running:
/mnt/nutch/nutch/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr/nutch
Failed with exit value 1.
From hadoop.log:
java.lang.Exception: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrRecordReader.nextKeyValue(SolrDeleteDuplicates.java:233)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at
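One way to check the trace above, assuming the usual cause: solrdedup casts each field value it reads to String, so the cast fails when Solr (for example a schemaless core) treats one of those fields as multiValued and returns a list. The sketch below asks the Solr Schema API which of those fields are multiValued. The field list (id, digest, tstamp, boost) is an assumption based on Nutch's SolrDeleteDuplicates and may differ in your Nutch version. multivalued and fetch_fields are hypothetical helper names.

```python
import json
import urllib.request
from typing import Dict, Iterable, List

# ASSUMPTION: the fields solrdedup reads; verify against your Nutch version.
DEDUP_FIELDS = ("id", "digest", "tstamp", "boost")


def multivalued(fields_response: Dict, names: Iterable[str]) -> List[str]:
    """Return the names from `names` that the Schema API response marks as multiValued."""
    by_name = {f["name"]: f for f in fields_response.get("fields", [])}
    # Fields missing from the response are treated as not multiValued.
    return [n for n in names if by_name.get(n, {}).get("multiValued", False)]


def fetch_fields(core_url: str) -> Dict:
    """GET <core>/schema/fields; showDefaults=true includes properties inherited from the field type."""
    url = core_url.rstrip("/") + "/schema/fields?showDefaults=true"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

For example, a non-empty result from multivalued(fetch_fields("http://localhost:8983/solr/nutch"), DEDUP_FIELDS) would be consistent with this cause.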
This error occurs with both Solr 5 and Solr 6. The solution is described in this thread:
http://lucene.472066.n3.nabble.com/Nutch-2-Solr-5-solrdedup-causes-ClassCastException-td4301149.html#a4302739
1) Copy the data_driven_schema_configs directory in solr-6.6.0/server/solr/configsets into the same configsets directory and rename the copy to nutch.
2) Copy schema.xml from $NUTCH_HOME/conf to server/solr/configsets/nutch/conf.
3) In server/solr/configsets/nutch/conf/schema.xml, delete every enablePositionIncrements attribute.
4) Create the core with this command:
solr create -c nutch -d nutch
Here -c gives the core name and -d the configuration directory. Both are nutch: the core is named nutch and uses the nutch configset created above.
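Steps 1 to 3 above can also be scripted. The sketch below is a minimal Python version, assuming the Solr directory layout shown above and that enablePositionIncrements appears in schema.xml only as an XML attribute. build_nutch_configset and strip_enable_position_increments are hypothetical helper names, not part of Solr or Nutch; step 4 is still the solr create command.

```python
import re
import shutil
from pathlib import Path

# Matches an enablePositionIncrements="..." (or '...') attribute and the whitespace before it.
ENABLE_POS_INC = re.compile(r"""\s+enablePositionIncrements\s*=\s*("[^"]*"|'[^']*')""")


def strip_enable_position_increments(xml_text: str) -> str:
    """Step 3: remove every enablePositionIncrements attribute."""
    return ENABLE_POS_INC.sub("", xml_text)


def build_nutch_configset(solr_dir: Path, nutch_home: Path, name: str = "nutch") -> Path:
    """Steps 1-3: create configsets/<name> from data_driven_schema_configs plus Nutch's schema.xml."""
    configsets = solr_dir / "server" / "solr" / "configsets"
    target = configsets / name
    # Step 1: copy data_driven_schema_configs under the new name (fails if it already exists).
    shutil.copytree(configsets / "data_driven_schema_configs", target)
    # Step 2: copy Nutch's schema.xml into the new conf directory.
    schema = target / "conf" / "schema.xml"
    shutil.copyfile(nutch_home / "conf" / "schema.xml", schema)
    # Step 3: rewrite schema.xml without the enablePositionIncrements attributes.
    text = schema.read_text(encoding="utf-8")
    schema.write_text(strip_enable_position_increments(text), encoding="utf-8")
    return target
```

For example, call build_nutch_configset(Path("solr-6.6.0"), Path(os.environ["NUTCH_HOME"])) and then run solr create -c nutch -d nutch. Because shutil.copytree refuses to overwrite an existing directory, rerunning the sketch fails instead of clobbering an earlier nutch configset.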
If the solr create command succeeds, the output looks like this:
Zhuos-MacBook-Pro:solr-6.6.0 jo$ solr create -c nutch -d nutch
Copying configuration to new core instance directory:
/Users/jo/soft/solr-5.5.4/server/solr/nutch
Creating new core 'nutch' using command:
http://localhost:8983/solr/admin/cores?action=CREATE&name=nutch&instanceDir=nutch
{
"responseHeader":{
"status":0,
"QTime":107},
"core":"nutch"}
If it fails with: ERROR: Error CREATEing SolrCore 'nutch': Unable to create core [nutch] Caused by: fieldType 'tdates' not found in the schema
The missing type may be tdates or another one. Compare schema.xml and managed-schema in the server/solr/configsets/nutch/conf directory: the type is defined in managed-schema but not in schema.xml, so copy its definition from managed-schema into schema.xml. For example, I found the following definitions in managed-schema and copied them into schema.xml:
<fieldType name="tints" class="solr.TrieIntField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
<fieldType name="tfloats" class="solr.TrieFloatField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
<fieldType name="tlongs" class="solr.TrieLongField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
<fieldType name="tdoubles" class="solr.TrieDoubleField" docValues="true" precisionStep="8" positionIncrementGap="0" multiValued="true"/>
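The comparison between managed-schema and schema.xml can be automated. The sketch below uses Python's xml.etree.ElementTree to find fieldType definitions present in managed-schema but absent from schema.xml and returns them as XML snippets to paste in by hand (rewriting schema.xml through ElementTree would drop its comments). missing_field_type_defs is a hypothetical helper; pass only=["tdates"] to limit it to the type named in the error.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Optional


def missing_field_type_defs(managed_schema_xml: str, schema_xml: str,
                            only: Optional[Iterable[str]] = None) -> Dict[str, str]:
    """Return {name: XML snippet} for fieldTypes in managed-schema that schema.xml lacks."""
    # iter() searches all depths, so a <types> wrapper in schema.xml is handled too.
    defined = {ft.get("name") for ft in ET.fromstring(schema_xml).iter("fieldType")}
    wanted = set(only) if only is not None else None
    result = {}
    for ft in ET.fromstring(managed_schema_xml).iter("fieldType"):
        name = ft.get("name")
        if name in defined or (wanted is not None and name not in wanted):
            continue
        snippet = copy.copy(ft)
        snippet.tail = None  # keep trailing whitespace out of the serialized snippet
        result[name] = ET.tostring(snippet, encoding="unicode")
    return result
```

Read both files from server/solr/configsets/nutch/conf, pass their contents in, and paste the returned snippets into schema.xml alongside its other fieldType definitions.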
After that, the core should be created without problems.