[NLP] Introduction to OpenNLP

Table of Contents

Introduction to OpenNLP

OpenNLP execution steps

Pre-trained model


Introduction to OpenNLP

The Apache OpenNLP library is a machine-learning-based toolkit for processing natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity recognition, chunking, parsing, language detection, and coreference resolution.

The goal of the OpenNLP project is to create a mature toolkit for the above tasks. An additional goal is to provide a large number of pre-built models for a variety of languages, along with the annotated text resources those models are derived from.

The OpenNLP library contains several components that together make it possible to build a complete natural language processing pipeline. These components include a sentence detector, tokenizer, name finder, document categorizer, part-of-speech tagger, chunker, parser, and coreference resolver. Each component includes the parts needed to perform its natural language processing task, to train a model, and often also to evaluate a model. Each of these facilities is accessible through its application programming interface (API). In addition, a command line interface (CLI) is provided for convenient experimentation and training.

OpenNLP execution steps

OpenNLP components have similar APIs. To perform a task, you usually provide a model and an input. Once the model is loaded, you can instantiate the tool itself, and once the tool is instantiated, you can run the processing task. The input and output formats are tool-specific, but usually the input is a string or a string array and the output is a string array.
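The three steps (load the model, instantiate the tool, run the task) can be sketched with the sentence detector. This is only a minimal illustration, assuming the opennlp-tools library is on the classpath and that a sentence model has already been downloaded. The file name en-sent.bin, the class name SentenceDetectionSketch, and the sample text are assumptions made for this example and are not taken from the article.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

public class SentenceDetectionSketch {

    // Hypothetical helper: runs the three steps for one piece of text.
    static String[] detectSentences(Path modelPath, String text) throws IOException {
        // Step 1: load the model. A missing or unreadable file raises an IOException.
        try (InputStream in = Files.newInputStream(modelPath)) {
            SentenceModel model = new SentenceModel(in);

            // Step 2: instantiate the tool with the loaded model.
            SentenceDetectorME detector = new SentenceDetectorME(model);

            // Step 3: run the task. Input is a string, output is a string array.
            return detector.sentDetect(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed model location; pass a different path as the first argument.
        Path modelPath = Paths.get(args.length > 0 ? args[0] : "en-sent.bin");
        String text = "OpenNLP is a toolkit. It processes natural language text.";

        for (String sentence : detectSentences(modelPath, text)) {
            System.out.println(sentence);
        }
    }
}
```

Other components such as the tokenizer or name finder should follow the same pattern with their own model and tool classes, though their input and output types can differ.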

Pre-trained model

The OpenNLP community provides many trained models that can be downloaded and used directly.

  1. SourceForge models

http://opennlp.sourceforge.net/models-1.5/

  2. Maven repository

http://maven.tamingtext.com/opennlp-models/models-1.5/

