Nutch: an open source search engine implemented in Java

Nutch is an open source Java implementation of a search engine. It provides all the tools we need to run our own search engine, including full-text search and a web crawler.

 

 

Nutch's architecture uses a very flexible plug-in model, and most of its core functionality is assembled from plug-ins. To learn how specific Nutch plug-ins work, refer to the official Nutch wiki. This post shows how to add our own plug-in to Nutch.
1. Download the Nutch source code and compile it.
2. Go to src/plugin/ under the Nutch root directory and create a new folder named index-self. The folder name is arbitrary; index-self is just the example used here.

3. Enter the index-self folder and create the directory src/java/org/apache/nutch/myplugin/ to hold your own source classes.

4. Go back to the index-self root directory and create three files: build.xml, ivy.xml, and plugin.xml. Model them on the corresponding files of an existing plug-in; if anything is unclear, look at how the other plug-ins are structured.
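
As a rough illustration of what the descriptor from step 4 might look like, here is a minimal plugin.xml sketch modelled on the index-basic plug-in that ships with Nutch 1.x. The class name MyIndexingFilter, the extension id, and the display names are hypothetical placeholders. It also assumes the plug-in implements the IndexingFilter extension point, which the index- prefix suggests but the post does not state.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: id matches the folder name from step 2; all other names are assumptions -->
<plugin
   id="index-self"
   name="Self Indexing Filter"
   version="1.0.0"
   provider-name="nutch.org">

   <runtime>
      <!-- jar produced by the plug-in build; by convention named after the plug-in id -->
      <library name="index-self.jar">
         <export name="*"/>
      </library>
   </runtime>

   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>

   <!-- point names the extension point; class is the (hypothetical) implementation
        stored under src/java/org/apache/nutch/myplugin/ -->
   <extension id="org.apache.nutch.myplugin"
              name="Self Indexing Filter"
              point="org.apache.nutch.indexer.IndexingFilter">
      <implementation id="MyIndexingFilter"
                      class="org.apache.nutch.myplugin.MyIndexingFilter"/>
   </extension>
</plugin>
```

In the bundled plug-ins, the accompanying build.xml is usually just a project element named after the plug-in that imports ../build-plugin.xml, so copying one and changing the name is typically enough.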


5. Then go to src/plugin/ and modify the build.xml there, paying attention to the paths.
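
The edit to src/plugin/build.xml usually amounts to registering the new directory in that file's targets. The fragment below is a sketch based on how the bundled plug-ins are listed in Nutch 1.x; the surrounding entries are abbreviated and may differ in your version.

```xml
<!-- Fragment of src/plugin/build.xml (sketch; other plug-in entries omitted) -->
<target name="deploy">
   <ant dir="index-basic" target="deploy"/>
   <!-- new entry: dir is relative to src/plugin/, so it must match the folder name -->
   <ant dir="index-self" target="deploy"/>
</target>

<target name="clean">
   <ant dir="index-basic" target="clean"/>
   <!-- new entry, so that cleaning also removes the new plug-in's build output -->
   <ant dir="index-self" target="clean"/>
</target>
```

Because each dir attribute is resolved relative to src/plugin/, a typo in the folder name is the most likely cause of a failed build here.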


6. Next, modify the build.xml in the root directory of Nutch.


7. When all of the above is done, go to the root directory of Nutch and compile with ant. After the build finishes, the compiled jar and class files can be found in the build/plugin/index-self directory.

8. Finally, configure the plug-in in nutch-default.xml so that it takes effect.
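
Plug-ins are switched on through the plugin.includes property, whose value is a regular expression over plug-in ids. A hedged sketch of the change follows. The value shown is abbreviated and illustrative, so keep whatever list your configuration already has and only add index-self to it.

```xml
<property>
  <name>plugin.includes</name>
  <!-- illustrative value: only the addition of "self" to the index-(...) group matters -->
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|self)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```

The same property can also be overridden in conf/nutch-site.xml, which is the usual place for local changes and leaves nutch-default.xml untouched.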

9. Once all the steps above are done, the setup is complete. Run Nutch in local or deploy mode to check whether the plug-in is activated.

