HarmonyOS Development Learning Path: AI Feature Development (Part-of-Speech Tagging)

Overview of part-of-speech tagging

As information technology has developed, the volume of information online has grown geometrically, and this growth has become a defining feature of today's society. Accurately extracting key information from text is a technical foundation of search engines and related fields, and word segmentation matters especially because it is the first step in text information extraction.

As a basic research topic in natural language processing, word segmentation underpins a wide range of text-processing applications.

Part-of-speech tagging segments the text into words and then labels each word in the result with its part of speech (noun, verb, adjective, and so on). Developers can customize the segmentation granularity.

Operation Mechanism

Part-of-speech tagging provides an interface that automatically segments text and tags parts of speech. Given a piece of input text, the interface segments it and labels each resulting word with its part of speech. Several segmentation granularities are available, and developers can choose the one they need.

Constraints and Restrictions

  • Currently only Chinese text is supported.
  • The text to be tagged is limited to 500 characters; longer input returns a parameter error. The text must be UTF-8 encoded. Incorrectly encoded text does not raise an error, but the analysis result will be wrong.
  • The engine supports simultaneous access by multiple users but does not support concurrent invocation of the same feature by the same user. If one process calls the same feature several times at once, a system busy error is returned. If different processes call the same feature, only one of them is served at a time and the others wait in a queue.
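
Because the engine rejects concurrent calls to the same feature from one process, one option is to funnel every call through a single lock on the caller's side. The following is a minimal sketch in plain Java, not part of the NLU API: SerializedCaller is a hypothetical helper, and the Function<String, String> it wraps merely stands in for the real tagging call.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/** Hypothetical helper: lets only one thread at a time enter the wrapped call. */
final class SerializedCaller {
    private final ReentrantLock lock = new ReentrantLock();
    // Stand-in for the real tagging call (request JSON in, response JSON out).
    private final Function<String, String> call;

    SerializedCaller(Function<String, String> call) {
        this.call = call;
    }

    /** Blocks until no other thread is inside the wrapped call, then runs it. */
    String invoke(String requestData) {
        lock.lock();
        try {
            return call.apply(requestData);
        } finally {
            // Always release the lock, even if the wrapped call throws.
            lock.unlock();
        }
    }
}
```

This only serializes calls inside one process. Queuing between different processes is handled by the engine itself, as described above.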

Part-of-speech tagging development

Use cases

  • Search engine development. For a search engine, returning every match among tens of billions of web pages is meaningless. What matters is presenting the most relevant results first, which is known as relevance ranking. The accuracy of word segmentation directly affects how search results are ranked by relevance.
  • Semantic analysis software. In semantic analysis, word segmentation helps recover the correct meaning of the text, and part-of-speech tagging identifies whether each word is a noun, verb, adjective, and so on, which makes semantic analysis easier to extend.

Interface Description

Part-of-speech tagging provides the getWordPos() interface, which tags each word in the segmentation result with its part of speech according to the chosen segmentation granularity.

Main interfaces

  • ResponseResult getWordPos(String requestData, int requestType)
    Performs part-of-speech tagging synchronously.
  • ResponseResult getWordPos(final String requestData, final int requestType, final OnResultListener<ResponseResult> listener)
    Performs part-of-speech tagging asynchronously.
  • void init(Context context, OnResultListener<Integer> listener, boolean isLoadModel)
    Initializes the NLU service. Call this interface before any NLU functional interface, and call the functional interfaces only after the callback result arrives in the onResult(T) method of OnResultListener. The developer passes in listener as the callback that reports the progress and result of the call.
  • void destroy(Context context)
    Cancels all NLU tasks and destroys the NLU engine service. After this call the NLU service can no longer be used. To use it again, call init(Context, OnResultListener<Integer>, boolean) again to reinitialize the service.

Interface input parameters

  • requestType indicates the request type. It is defined by the NluRequestType class:

    static int REQUEST_TYPE_LOCAL = 0: local request.

  • requestData is the input text information in JSON format. Its fields are listed below. Because only Chinese text is supported, the segmentation examples are English glosses of Chinese input.

    • text (String, required): the text to analyze, UTF-8 encoded, at most 500 characters.
    • type (long, optional): the word segmentation granularity. The default is 0.
      • 0: basic words, the finest granularity. For example, "我要看速度与激情" ("I want to watch Fast & Furious", literally "speed and passion") is split into "I/want/watch/speed/and/passion".
      • 1: basic words with entities merged. For example, "I'm going to Jiangning Wanda Plaza to watch Fast & Furious" is split into "I/want/go/Jiangning Wanda Plaza/watch/speed/and/passion". Text with no mergeable entity is segmented exactly as with type 0. For example, "Watch a movie together at 3 o'clock tomorrow afternoon" is split into "tomorrow/afternoon/3 o'clock/together/watch/movie".
      • 9223372036854775807 (2 to the 63rd power minus 1): on top of type 1, merges whole structures such as entity time and place (parts separated by symbols are not merged) and merges some common phrases, such as "adjective + of" and "one-character verb + one-character noun", to simplify the sentence components. Under these rules, "Tomorrow I will watch a movie at Jiangning Ruidu Jinyi Cinema from 3:00 to 5:00 p.m." is split into "Tomorrow 3:00 p.m./to/5:00 p.m./I am/at/Jiangning Ruidu Jinyi Cinema/watching/movie".
    • callPkg (String, optional): the caller name.
    • callType (int, optional): the caller type.
      • 0: normal application (default)
      • 1: Quick App
    • callVersion (String, optional): the caller version number.
    • callState (int, optional): the caller state.
      • -1: unknown (default)
      • 0: foreground
      • 1: background

    Entity categories currently supported by NLU:

    • Movie: relies on dictionaries; real instances required; no modification.
    • TV drama: relies on dictionaries; real instances required; no modification.
    • Variety show: relies on dictionaries; real instances required; no modification.
    • Cartoon: relies on dictionaries; real instances required; no modification.
    • Train number: real instances required; no modification.
    • Flight number: real instances required; no modification.
    • Team: relies on dictionaries; recognizes teams from the NBA, CBA, Premier League, La Liga, Bundesliga, Serie A, Ligue 1, and Chinese Super League; real instances required; no modification.
    • Person's name: real instances required; no modification.
    • Tracking number: real instances required; no modification.
    • Telephone number: real instances required; no modification.
    • URL: real instances required; no modification.
    • Email: real instances required; no modification.
    • League: NBA, CBA, Premier League, La Liga, Bundesliga, Serie A, Ligue 1, and Chinese Super League; real instances required; no modification.
    • Time: real instances required; no modification.
    • Place: includes hotels, restaurants, scenic spots, schools, roads, provinces, cities, counties, districts, towns, and so on; partly relies on dictionaries.
    • Verification code: real instances required; no modification.
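
The requestData fields described above can be assembled by hand, but a small helper keeps the granularity values and the 500-character limit in one place. The following is a minimal sketch in plain Java; PosRequest, its constant names, and the decision to count the limit in Unicode code points are all assumptions made for illustration, not part of the NLU API.

```java
/** Hypothetical helper that assembles the requestData JSON for getWordPos(). */
final class PosRequest {
    static final long GRANULARITY_BASIC = 0L;               // basic words
    static final long GRANULARITY_ENTITY = 1L;              // basic words with entities merged
    static final long GRANULARITY_PHRASE = Long.MAX_VALUE;  // 9223372036854775807
    static final int MAX_CHARS = 500;

    private PosRequest() { }

    /** Builds a request containing only the "text" and "type" fields. */
    static String build(String text, long type) {
        if (text == null || text.isEmpty()) {
            throw new IllegalArgumentException("text is required");
        }
        // Assumption: the documented limit is counted in Unicode code points.
        if (text.codePointCount(0, text.length()) > MAX_CHARS) {
            throw new IllegalArgumentException("text exceeds " + MAX_CHARS + " characters");
        }
        return "{\"text\":\"" + escape(text) + "\",\"type\":" + type + "}";
    }

    /** Escapes quotes, backslashes, and control characters for a JSON string. */
    private static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become numeric escapes.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}
```

For instance, PosRequest.build("我要看速度与激情", PosRequest.GRANULARITY_BASIC) produces the same string as the requestData literal used in the synchronous call in the development steps. Checking the length on the caller's side avoids a round trip that would only return a parameter error.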

Interface return values

The responseResult field of the returned ResponseResult is a JSON string that carries the result of part-of-speech tagging:

  • code (int, required): the result code of part-of-speech tagging.
    • 0: success
    • 1: the system is initializing
    • 2: parameter error
    • 3: the system is busy
    • 4: system exception
    • 5: task timed out
    • 6: other errors
  • message (String, required): the error message.
  • pos (JSONArray, optional): the array of segmented words. Each element is a JSONObject with the following fields:
    • word (String, optional): the segmented word.
    • tag (String, optional): the part of speech. When type is 1 or 9223372036854775807, person-name entities are tagged nr, time entities t, place entities ns, and all other entities ne. See Table 1 for the specific part-of-speech types.

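Table 1 Part-of-speech tags

| Tag | Description | Tag | Description | Tag | Description |
|---|---|---|---|---|---|
| n | Noun | rr | Personal pronoun | u | Particle |
| nr | Person name | rz | Demonstrative pronoun | uzhe | Particle "着" |
| ns | Place name | rzt | Temporal demonstrative pronoun | ule | Particles "了", "喽" |
| ne | Used only when entities are merged; every entity other than a person name, time, or place is returned as ne | rzs | Locative demonstrative pronoun | uguo | Particle "过" |
| t | Time word | rzv | Predicative demonstrative pronoun | ude1 | Particle "的" |
| tg | Time morpheme | ry | Interrogative pronoun | ude2 | Particle "地" |
| s | Locative word | ryt | Temporal interrogative pronoun | ude3 | Particle "得" |
| f | Direction word | rys | Locative interrogative pronoun | usuo | Particle "所" |
| v | Verb | ryv | Predicative interrogative pronoun | udeng | Particles "等", "等等" |
| vd | Adverbial verb | rg | Pronoun morpheme | uyy | Particles "一样", "一般", "似的", "般" |
| vn | Nominal verb | m | Numeral | udh | Particle "的话" |
| vshi | Verb "是" | mq | Numeral-classifier compound | uls | Particles "来讲", "来说", "而言", "说来" |
| vyou | Verb "有" | q | Classifier | uzhi | Particle "之" |
| vf | Directional verb | qv | Verbal classifier | ulian | Particle "连" |
| a | Adjective | qt | Temporal classifier | e | Interjection |
| ad | Adverbial adjective | d | Adverb | y | Modal particle |
| an | Nominal adjective | p | Preposition | o | Onomatopoeia |
| b | Distinguishing word | pba | Preposition "把" | h | Prefix |
| bl | Distinguishing idiomatic expression | pbei | Preposition "被" | k | Suffix |
| z | Status word | c | Conjunction | x | String |
| r | Pronoun | cc | Coordinating conjunction | idiom | Idiom |
| w | Punctuation | - | - | - | - |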
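
To make the return value concrete, the sketch below reads the result code and the word/tag pairs out of a response string. It is only an illustration in plain Java: PosResponse is a hypothetical helper, and the regular expressions assume that each element of pos is written as {"word":"...","tag":"..."} in that key order, which the description above does not guarantee. A real JSON parser would be the more robust choice in an application.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading the JSON string described above. */
final class PosResponse {
    // Assumption: every element of "pos" lists "word" first and "tag" second.
    private static final Pattern PAIR = Pattern.compile(
            "\\{\\s*\"word\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"\\s*,"
            + "\\s*\"tag\"\\s*:\\s*\"([^\"]*)\"\\s*\\}");
    private static final Pattern CODE = Pattern.compile("\"code\"\\s*:\\s*(-?\\d+)");

    private PosResponse() { }

    /** Returns the result code, or -1 (not a documented code) if none is found. */
    static int code(String json) {
        Matcher m = CODE.matcher(json);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Maps a result code to the meaning listed in the return-value description. */
    static String describe(int code) {
        switch (code) {
            case 0: return "success";
            case 1: return "system initializing";
            case 2: return "parameter error";
            case 3: return "system busy";
            case 4: return "system exception";
            case 5: return "task timed out";
            case 6: return "other errors";
            default: return "unknown code";
        }
    }

    /** Extracts {word, tag} pairs in order of appearance; words stay JSON-escaped. */
    static List<String[]> pairs(String json) {
        List<String[]> out = new ArrayList<>();
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            out.add(new String[] {m.group(1), m.group(2)});
        }
        return out;
    }
}
```

Each returned tag can then be looked up in Table 1, for example rr for a personal pronoun or v for a verb.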

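Development steps

To use the part-of-speech tagging interfaces, add the classes that implement part-of-speech tagging to your project.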

import ohos.ai.nlu.NluRequestType;
import ohos.ai.nlu.NluClient;
import ohos.ai.nlu.OnResultListener;
import ohos.ai.nlu.ResponseResult;

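Initialize the service with the NluClient static class. The connection to the service is obtained asynchronously.

  • context: the application context, which should be an instance of ohos.aafwk.ability.Ability or ohos.aafwk.ability.AbilitySlice, or of one of their subclasses.
  • listener: the callback for the initialization result; null may be passed.
  • isLoadModel: whether to load the model. If true, the model is loaded during initialization; if false, it is not.

NluClient.getInstance().init(context, new OnResultListener<Integer>() {
    @Override
    public void onResult(Integer result) {
        // Initialization callback: invoked once the service has initialized successfully
    }
}, true);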

Call the part-of-speech tagging interface.

Synchronous part-of-speech tagging:

String requestData = "{\"text\":\"我要看速度与激情\",\"type\":0}";
ResponseResult responseResult = NluClient.getInstance().getWordPos(requestData, NluRequestType.REQUEST_TYPE_LOCAL);

Asynchronous part-of-speech tagging:

NluClient.getInstance().getWordPos(requestData,
        NluRequestType.REQUEST_TYPE_LOCAL, new OnResultListener<ResponseResult>() {
            @Override
            public void onResult(ResponseResult result) {
                // Handle the asynchronous result here
            }
        });
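
When several asynchronous results need to be chained or awaited, it can help to adapt the callback style to a CompletableFuture. The following is a minimal sketch in plain Java under stated assumptions: CallbackFutures and its nested ResultListener interface are hypothetical stand-ins that mirror the shape of OnResultListener, not SDK types.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Hypothetical adapter: turns a callback-style call into a CompletableFuture. */
final class CallbackFutures {
    /** Stand-in with the same shape as a result listener; not the SDK type. */
    interface ResultListener<T> {
        void onResult(T result);
    }

    private CallbackFutures() { }

    /**
     * Starts an asynchronous call and exposes its result as a future.
     *
     * @param starter receives the listener to notify and kicks off the call
     */
    static <T> CompletableFuture<T> toFuture(Consumer<ResultListener<T>> starter) {
        CompletableFuture<T> future = new CompletableFuture<>();
        try {
            // The listener completes the future when the result arrives.
            starter.accept(future::complete);
        } catch (RuntimeException e) {
            // A failure while starting the call is reported through the future.
            future.completeExceptionally(e);
        }
        return future;
    }
}
```

In an application, the starter would invoke the asynchronous getWordPos() overload and forward the value received in onResult to the listener it was given.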

Destroy the NLU service.

NluClient.getInstance().destroy(context);

Origin blog.csdn.net/weixin_47094733/article/details/131402969