Reflections on Teacher Gavin's Transformer Live Class - Detailed Explanation of the NLU Data of the Education Bot Project in the Education Field of the Rasa Dialogue Robot Project (71)

   This article continues to focus on Rasa, the industrial-grade business dialogue platform and framework. It covers the hierarchical structure and data format of the NLU data in the Education Bot project (the education-domain project of the Rasa dialogue robot series), how to use regular expressions and lookup tables for intent classification and entity extraction, and how to use labels such as role and group for fine-grained parsing when extracting entities.

 1. Detailed explanation of the NLU Data of the Education Bot project in the education field of the Rasa dialogue robot project

  1. Analysis of the four core keys in the high-level structure of the Education Bot project's NLU data architecture design

The goal of NLU is to extract structured information from user input, usually intents and entities. Adding regular expressions and lookup tables to your NLU training data can help the model identify intents and extract entities more accurately.

Each YAML file can contain training data for multiple keys, but each key can appear only once in a file. There are 4 types of keys available (a combined example follows the list below):

version: should be specified in all files, otherwise Rasa will assume the latest data format supported by the currently installed version

nlu: Specify NLU training data

stories: used to train machine learning models to recognize patterns in the dialogue training data, so that they can generalize to dialogue paths that were not seen during training

rules: used to train RulePolicy
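For illustration, a single training data file might combine several of these keys. The sketch below assumes the Rasa 3.x data format version; the intent, story, and rule contents are placeholders:

version: "3.1"

nlu:
- intent: greet
  examples: |
    - hi
    - hello

stories:
- story: greet the user
  steps:
  - intent: greet
  - action: utter_greet

rules:
- rule: say goodbye
  steps:
  - intent: goodbye
  - action: utter_goodbye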

 2. NLU Training Examples analysis and example analysis

NLU training data groups user messages by intent. The name of an intent should reflect the task the user wants to accomplish with it, and should be lowercase, with no spaces or special characters.

Here is an example:
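A minimal sketch of intent-grouped training examples (the intent name and sentences are illustrative):

nlu:
- intent: greet
  examples: |
    - hi
    - hey there
    - good morning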

The intent in the example below uses the / symbol, indicating that out_of_scope contains the sub-intent non_english; the same notation is also used for retrieval intents such as faq or chitchat:
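A sketch of such a sub-intent (the non-English sentences are illustrative):

nlu:
- intent: out_of_scope/non_english
  examples: |
    - bonjour, comment ça va ?
    - hola, ¿cómo estás?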

If you use custom NLU components, you can also use the extended format. In the following example, metadata is attached to individual training examples; the metadata can contain arbitrary key-value data, which custom components can use during processing:
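A sketch using the extended format from the Rasa documentation (the sentiment key is an arbitrary illustration; metadata keys are free-form):

nlu:
- intent: greet
  examples:
  - text: |
      hi
    metadata:
      sentiment: neutral
  - text: |
      hey there!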

You can also specify metadata at the intent level:
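A sketch of intent-level metadata, which then applies to all examples of that intent (again, the sentiment key is illustrative):

nlu:
- intent: greet
  metadata:
    sentiment: neutral
  examples:
  - text: |
      hi
  - text: |
      hey there!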

About the training data format:

Rasa uses YAML to manage all training data, including NLU data, stories, and rules. You can split the training data across multiple YAML files, and each file can contain any combination of NLU data, stories, and rules. The training data reader determines the type of each block from its top-level key. The domain also uses the YAML format; you can define multiple domain files, and the domain includes the definitions of responses and forms.

 3. NLU Entities analysis and example analysis

Entities are labeled in the training data by entity name. In addition to the name, annotations can also specify synonyms, roles, and groups.

For entity extraction, you can train a machine learning model on the training data, or you can define regular expressions and extract entities by character pattern with RegexEntityExtractor. When deciding which entities to extract, consider how the information will be used; user information that the assistant does not need to act on does not have to be extracted.

Regular expressions can be declared directly in the NLU training data to extract such pattern-based information:
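For instance, a minimal sketch (the regex name and pattern are illustrative assumptions):

nlu:
- regex: phone_number
  examples: |
    - \d{3}-\d{3}-\d{4}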

The following is an example of entity annotation:

nlu:
- intent: check_balance
  examples: |
    - how much do I have on my [savings](account) account
    - how much money is in my [checking]{"entity": "account"} account
    - What's the balance on my [credit card account]{"entity":"account","value":"credit"}

All possible syntax formats for marking an entity are as follows:

[<entity-text>]{"entity": "<entity name>", "role": "<role name>", "group": "<group name>", "value": "<entity synonym>"}

 4. NLU Synonyms analysis and example analysis

Entity synonyms map extracted entity values to a single unified value. A specific example is as follows:
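A minimal sketch of a synonym definition (it mirrors the credit/account annotations used in the example further below):

nlu:
- synonym: credit
  examples: |
    - credit card account
    - credit account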

You can also map a synonym inline in the entity annotation, as in the following example:

nlu:
- intent: check_balance
  examples: |
    - how much do I have on my [credit card account]{"entity": "account", "value": "credit"}
    - how much do I owe on my [credit account]{"entity": "account", "value": "credit"}

 5. Analysis and example analysis of NLU Regular Expressions for Intent Classification

When using RegexFeaturizer, a regular expression is not applied as a rule that directly classifies intents; it only provides a feature that the intent classifier can use to learn patterns for intent classification. Currently, all intent classifiers make use of the available regular expression features. The name of a regular expression is meant to be human-readable, to help you remember what it is for; it does not need to match any intent or entity name. For example, a regular expression for a help request can be defined as follows:
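A minimal sketch following the help example from the Rasa documentation:

nlu:
- regex: help
  examples: |
    - \bhelp\b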

The intent being matched could be greet, help_me, assistance, or anything else. To keep the number of matched words to a minimum, it is recommended to use \bhelp\b rather than help.*, since the latter may match the whole message.

 6. Analysis and example analysis of NLU Regular Expressions for Entity Extraction

- Create features for the model using RegexFeaturizer. When using RegexFeaturizer, a regular expression provides a feature that helps the model learn the connection between intents/entities and the inputs that match the regular expression. In this case RegexFeaturizer only provides features to the entity extractor, so the training data needs to contain enough examples matching the regular expression for the extractor to learn to use those features. Regular expression features for entity extraction are currently only used by the CRFEntityExtractor and DIETClassifier components (see the pipeline sketch after this list).

- To use regular expressions for rule-based entity extraction, use RegexEntityExtractor. With this component, the name of the regular expression must match the name of the entity you want to extract; for example, a regular expression can be used to extract 10-12 digit account numbers. RegexEntityExtractor does not need training data to learn how to extract the entity, but the training data still needs at least two examples annotated with this entity so that the NLU model registers it as an entity during training (see the sketch after this list).
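A minimal pipeline sketch for config.yml in which RegexFeaturizer feeds regex features to DIETClassifier (the component selection and the epochs value are illustrative assumptions):

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100

For rule-based extraction with RegexEntityExtractor, the NLU data can look like the following sketch (the sample digits are made up); the regex name account_number matches the entity name, and the two annotated examples let the NLU model register the entity:

nlu:
- regex: account_number
  examples: |
    - \d{10,12}
- intent: inform
  examples: |
    - my account number is [1234567891](account_number)
    - this is my account number [1234567891](account_number)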

 7. NLU Lookup Tables analysis and example analysis

Lookup tables are lists of words used to generate case-insensitive regular expression patterns. They are used in the same way as regular expressions and are usually combined with RegexFeaturizer and RegexEntityExtractor in the pipeline. You can use lookup tables to help extract entities that have a predefined set of values. Keep the content of a lookup table as specific as possible; for example, to extract country names you can put all country names into a lookup table:
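A minimal sketch (only the first few countries are shown here; a real lookup table would contain the full list):

nlu:
- lookup: country
  examples: |
    - Afghanistan
    - Albania
    - Algeria
    - Andorra
    - Angola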

When a lookup table is used together with RegexFeaturizer, provide enough training examples for the intent or entity you want to match so the model can learn to use the generated regular expression as a feature. When a lookup table is used together with RegexEntityExtractor, provide at least two annotated examples of the entity so that the NLU model can register it during training.

 8. NLU Entities Roles and Groups Analysis and Case Analysis

To distinguish different uses of the same entity in the training data, a role can be used when annotating the entity. For example, in the following example, the role is specified as departure or destination to distinguish the two cities:

- I want to fly from [Berlin]{"entity": "city", "role": "departure"} to [San Francisco]{"entity": "city", "role": "destination"}.

You can also specify group to group different entities:

Give me a [small]{"entity": "size", "group": "1"} pizza with [mushrooms]{"entity": "topping", "group": "1"} and a [large]{"entity": "size", "group": "2"} [pepperoni]{"entity": "topping", "group": "2"}

To train the model properly, include enough training data for every combination of entity and role or group. To help the model generalize, make sure the training examples contain some variation; for example, include examples like fly TO y FROM x, not only fly FROM x TO y.

To fill a slot only from an entity with a specific role or group, the role or group also needs to be specified in the from_entity slot mapping.
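A minimal sketch of such a mapping, assuming Rasa 3.x-style slot mappings in the domain (the slot name destination_city is illustrative):

slots:
  destination_city:
    type: text
    mappings:
    - type: from_entity
      entity: city
      role: destination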

 9. Analysis and case analysis of NLU Entity Roles and Groups influencing dialogue predictions

If roles and groups need to influence dialogue predictions, the stories need to be modified to include the desired role or group labels. In addition, the roles and groups of an entity need to be listed in the domain file. The following example demonstrates how to output a different message depending on the user's location:
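A sketch under these assumptions (the intent, city values, and response names are illustrative). In the domain file, the roles of the city entity are declared:

entities:
- city:
    roles:
    - departure
    - destination

The stories then branch on the role-tagged entity value:

stories:
- story: flight from Berlin
  steps:
  - intent: travel
    entities:
    - city: Berlin
      role: departure
  - action: utter_flight_from_berlin
- story: flight from Amsterdam
  steps:
  - intent: travel
    entities:
    - city: Amsterdam
      role: departure
  - action: utter_flight_from_amsterdam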

 10. NLU BILOU Entity Tagging

DIETClassifier and CRFEntityExtractor accept the BILOU_flag option. BILOU tagging marks each token's position relative to an entity in the tokenized user message: Beginning, Inside, Last, Outside, or Unit (a single-token entity). Using this flag can improve the performance of the machine learning model when predicting entities. For example, for the following training data:

[Alex]{"entity": "person"} is going with [Marty A. Rick]{"entity": "person"} to [Los Angeles]{"entity": "location"}.

The resulting tags for the token list, with BILOU_flag set to true or false, are:
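A sketch of the resulting tag sequences, assuming whitespace tokenization of the sentence above:

token      BILOU_flag: true    BILOU_flag: false
Alex       U-person            person
is         O                   O
going      O                   O
with       O                   O
Marty      B-person            person
A.         I-person            person
Rick       L-person            person
to         O                   O
Los        B-location          location
Angeles    L-location          location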
