Rasa Intent Classification Components

  1. MitieIntentClassifier

This classifier uses MITIE to perform intent classification, so MitieNLP must be present in the pipeline. The underlying classifier is a multi-class linear SVM with a sparse linear kernel (see the train_text_categorizer_classifier function in the MITIE training code). Its input is the text of the user's message alone, and its output is the user's intent type together with a confidence score.

{
    "intent": {"name": "greet", "confidence": 0.98343}
}

Note: This classifier does not depend on any featurizer, since it extracts features by itself.

The configuration is as follows:

pipeline:
- name: "MitieIntentClassifier"

  2. LogisticRegressionClassifier

This classifier uses scikit-learn's implementation of logistic regression to perform intent classification. It can consume both sparse and dense features, and it outputs the recognized intent together with an intent confidence ranking. Compared with DIETClassifier, its accuracy is usually lower, but it trains considerably faster.

{
    "intent": {"name": "greet", "confidence": 0.780},
    "intent_ranking": [
        {
            "confidence": 0.780,
            "name": "greet"
        },
        {
            "confidence": 0.140,
            "name": "goodbye"
        },
        {
            "confidence": 0.080,
            "name": "restaurant_search"
        }
    ]
}

Configuration:

pipeline:
- name: LogisticRegressionClassifier
  max_iter: 100
  solver: lbfgs
  tol: 0.0001
  random_state: 42
  ranking_length: 10

Configuration parameters:

max_iter: The maximum number of iterations required for the solver to converge.

solver: The solver to use. For very small datasets, you might consider liblinear.

tol: The tolerance of the optimizer stopping criterion.

random_state: Used to shuffle the data before training.

ranking_length: The number of top intents to report. Set to 0 to report all intents.
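
Note that LogisticRegressionClassifier consumes features rather than raw text, so a tokenizer and at least one featurizer must come before it in the pipeline. A minimal sketch (the choice of featurizer here is illustrative):

pipeline:
- name: WhitespaceTokenizer
# Produces the sparse features the classifier trains on.
- name: CountVectorsFeaturizer
- name: LogisticRegressionClassifier
  max_iter: 100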


  3. SklearnIntentClassifier

The Sklearn intent classifier trains a linear SVM, which is optimized using grid search. It also provides a ranking of the labels that did not "win". SklearnIntentClassifier needs to be preceded by a dense featurizer in the pipeline; that featurizer creates the features used for classification.

The Sklearn intent classifier is implemented on top of scikit-learn, and its output is an intent plus an intent ranking. During training of the SVM, a hyperparameter search is run to find the best set of parameters. In the configuration, developers can specify the parameters to try.

{
  "intent": {"name": "greet", "confidence": 0.780},
  "intent_ranking": [
    {
      "confidence": 0.780,
      "name": "greet"
    },
    {
      "confidence": 0.140,
      "name": "goodbye"
    },
    {
      "confidence": 0.080,
      "name": "restaurant_search"
    }
  ]
}

Configuration:

pipeline:
  - name: "SklearnIntentClassifier"
    # Specifies the list of regularization values to
    # cross-validate over for C-SVM.
    # This is used with the ``kernel`` hyperparameter in GridSearchCV.
    C: [1, 2, 5, 10, 20, 100]
    # Specifies the kernel to use with C-SVM.
    # This is used with the ``C`` hyperparameter in GridSearchCV.
    kernels: ["linear"]
    # Gamma parameter of the C-SVM.
    "gamma": [0.1]
    # We try to find a good number of cross folds to use during
    # intent training, this specifies the max number of folds.
    "max_cross_validation_folds": 5
    # Scoring function used for evaluating the hyper parameters.
    # This can be a name or a function.
    "scoring_function": "f1_weighted"

  4. KeywordIntentClassifier

A simple keyword-matching intent classifier for small, short-term projects. This classifier works by searching for keywords in messages. By default, matching is case-sensitive and only exact occurrences of the keyword strings in the user message are considered. The keywords for an intent are the examples of that intent in the NLU training data. This means the entire example is the keyword, not the individual words in the example.

{
    "intent": {"name": "greet", "confidence": 1.0}
}

This classifier is only suitable for small projects or getting started.

Configuration:

pipeline:
- name: "KeywordIntentClassifier"
  case_sensitive: True
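
For illustration, with NLU training data like the following (the intent name and examples are hypothetical), the classifier searches for the whole strings "hello there" and "good morning" inside a message, not for the single words:

nlu:
- intent: greet
  examples: |
    - hello there
    - good morning

A message such as "good morning everyone" would then be classified as greet with confidence 1.0, because it contains the keyword "good morning" as an exact substring.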

  5. DIETClassifier

DIET (Dual Intent and Entity Transformer) is a multi-task architecture for intent classification and entity recognition. The architecture is based on a transformer shared for both tasks. A sequence of entity labels is predicted by a conditional random field (CRF) labeling layer, on top of the transformer output sequence corresponding to the token input sequence. For intent labels, the transformer outputs of the full utterance and intent labels are embedded into a single semantic vector space. We use a dot product loss to maximize similarity to target labels and minimize similarity to negative samples.

    DIET does not provide pretrained word embeddings or pretrained language models, but it can use these features if you add them to the pipeline.

    The output of this classifier consists of entities, the intent, and an intent ranking.

{
    "intent": {"name": "greet", "confidence": 0.7800},
    "intent_ranking": [
        {
            "confidence": 0.7800,
            "name": "greet"
        },
        {
            "confidence": 0.1400,
            "name": "goodbye"
        },
        {
            "confidence": 0.0800,
            "name": "restaurant_search"
        }
    ],
    "entities": [{
        "end": 53,
        "entity": "time",
        "start": 48,
        "value": "2017-04-10T00:00:00.000+02:00",
        "confidence": 1.0,
        "extractor": "DIETClassifier"
    }]
}

If you want to use DIETClassifier for intent classification only, set entity_recognition to False. If you only want to do entity recognition, set intent_classification to False. By default DIETClassifier does both, i.e. both entity_recognition and intent_classification are set to True. Developers can define many hyperparameters to tune the model; if you want to tune it, start with the following parameters (a configuration sketch follows after the list):

epochs: This parameter sets how many times the algorithm sees the training data (default: 300). One epoch is one forward pass and one backward pass over all training examples. Sometimes the model needs more epochs to learn properly; sometimes more epochs do not affect performance. The smaller the number of epochs, the faster the model trains.

hidden_layers_sizes: This parameter allows you to define the number of feed-forward layers and their output dimensions for user messages and intents (default: text: [], label: []). Each entry in the list corresponds to one feed-forward layer. For example, if you set text: [256, 128], two feed-forward layers are added in front of the transformer. The vectors of the input tokens (from user messages) are passed to these layers. The output dimension of the first layer is 256, and the output dimension of the second layer is 128. If an empty list is used (the default behavior), no feed-forward layer is added. Make sure to use only positive integer values. Typically, powers of 2 are used. It is also common practice to make the values decrease through the list: each value is less than or equal to the previous one.

embedding_dimension: This parameter defines the output dimension of the embedding layer used internally by the model (default: 20). We use multiple embedding layers in our model architecture. For example, vectors of full utterances and intents are passed to the embedding layer before comparing and computing losses.

number_of_transformer_layers: This parameter sets the number of transformer layers to use (default: 2). The number of transformer layers corresponds to the transformer blocks used for the model.

transformer_size: This parameter sets the number of units in the transformer (default: 256). The vectors coming out of the transformer will have the given transformer_size.

connection_density: This parameter defines the fraction of kernel weights set to non-zero values for all feed-forward layers in the model (default: 0.2). The value should be between 0 and 1. If you set connection_density to 1, no kernel weights will be set to 0, and the layer acts as a standard feed-forward layer. You should not set connection_density to 0, as this will cause all kernel weights to be 0, i.e. the model cannot learn.

constrain_similarities: When this parameter is set to True, a sigmoid cross-entropy loss is applied over all similarity terms. This helps keep the similarity between the input and negative labels small, and should help the model generalize better to a real-world test set.

model_confidence: This parameter configures how the confidence is calculated during inference. It currently accepts only one value, softmax, under which confidences are in the range [0, 1] and the calculated similarities are normalized with the softmax activation function.
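
Putting some of these options together, a configuration sketch for using DIETClassifier for intent classification only might look as follows (all values are illustrative, not recommendations):

pipeline:
- name: DIETClassifier
  epochs: 200
  hidden_layers_sizes:
    text: [256, 128]
  number_of_transformer_layers: 2
  transformer_size: 256
  constrain_similarities: True
  # Disable entity recognition to use DIET for intents only.
  entity_recognition: False

The full set of tunable parameters and their default values is listed in the table below.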

+---------------------------------+------------------+--------------------------------------------------------------+
| Parameter                       | Default Value    | Description                                                  |
+=================================+==================+==============================================================+
| hidden_layers_sizes             | text: []         | Hidden layer sizes for layers before the embedding layers    |
|                                 | label: []        | for user messages and labels. The number of hidden layers is |
|                                 |                  | equal to the length of the corresponding list.               |
+---------------------------------+------------------+--------------------------------------------------------------+
| share_hidden_layers             | False            | Whether to share the hidden layer weights between user       |
|                                 |                  | messages and labels.                                         |
+---------------------------------+------------------+--------------------------------------------------------------+
| transformer_size                | 256              | Number of units in transformer.                              |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_transformer_layers    | 2                | Number of transformer layers.                                |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_attention_heads       | 4                | Number of attention heads in transformer.                    |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_key_relative_attention      | False            | If 'True' use key relative embeddings in attention.          |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_value_relative_attention    | False            | If 'True' use value relative embeddings in attention.        |
+---------------------------------+------------------+--------------------------------------------------------------+
| max_relative_position           | None             | Maximum position for relative embeddings.                    |
+---------------------------------+------------------+--------------------------------------------------------------+
| unidirectional_encoder          | False            | Use a unidirectional or bidirectional encoder.               |
+---------------------------------+------------------+--------------------------------------------------------------+
| batch_size                      | [64, 256]        | Initial and final value for batch sizes.                     |
|                                 |                  | Batch size will be linearly increased for each epoch.        |
|                                 |                  | If constant `batch_size` is required, pass an int, e.g. `8`. |
+---------------------------------+------------------+--------------------------------------------------------------+
| batch_strategy                  | "balanced"       | Strategy used when creating batches.                         |
|                                 |                  | Can be either 'sequence' or 'balanced'.                      |
+---------------------------------+------------------+--------------------------------------------------------------+
| epochs                          | 300              | Number of epochs to train.                                   |
+---------------------------------+------------------+--------------------------------------------------------------+
| random_seed                     | None             | Set random seed to any 'int' to get reproducible results.    |
+---------------------------------+------------------+--------------------------------------------------------------+
| learning_rate                   | 0.001            | Initial learning rate for the optimizer.                     |
+---------------------------------+------------------+--------------------------------------------------------------+
| embedding_dimension             | 20               | Dimension size of embedding vectors.                         |
+---------------------------------+------------------+--------------------------------------------------------------+
| dense_dimension                 | text: 128        | Dense dimension for sparse features to use.                  |
|                                 | label: 20        |                                                              |
+---------------------------------+------------------+--------------------------------------------------------------+
| concat_dimension                | text: 128        | Concat dimension for sequence and sentence features.         |
|                                 | label: 20        |                                                              |
+---------------------------------+------------------+--------------------------------------------------------------+
| number_of_negative_examples     | 20               | The number of incorrect labels. The algorithm will minimize  |
|                                 |                  | their similarity to the user input during training.          |
+---------------------------------+------------------+--------------------------------------------------------------+
| similarity_type                 | "auto"           | Type of similarity measure to use, either 'auto' or 'cosine' |
|                                 |                  | or 'inner'.                                                  |
+---------------------------------+------------------+--------------------------------------------------------------+
| loss_type                       | "cross_entropy"  | The type of the loss function, either 'cross_entropy'        |
|                                 |                  | or 'margin'. If type 'margin' is specified,                  |
|                                 |                  | "model_confidence=cosine" will be used, which is deprecated  |
|                                 |                  | as of 2.3.4.                                                 |
+---------------------------------+------------------+--------------------------------------------------------------+
| ranking_length                  | 10               | Number of top intents to report. Set to 0 to report all      |
|                                 |                  | intents.                                                     |
+---------------------------------+------------------+--------------------------------------------------------------+
| renormalize_confidences         | False            | Normalize the reported top intents. Applicable only with loss|
|                                 |                  | type 'cross_entropy' and 'softmax' confidences.              |
+---------------------------------+------------------+--------------------------------------------------------------+
| maximum_positive_similarity     | 0.8              | Indicates how similar the algorithm should try to make       |
|                                 |                  | embedding vectors for correct labels.                        |
|                                 |                  | Should be 0.0 < ... < 1.0 for 'cosine' similarity type.      |
+---------------------------------+------------------+--------------------------------------------------------------+
| maximum_negative_similarity     | -0.4             | Maximum negative similarity for incorrect labels.            |
|                                 |                  | Should be -1.0 < ... < 1.0 for 'cosine' similarity type.     |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_maximum_negative_similarity | True             | If 'True' the algorithm only minimizes maximum similarity    |
|                                 |                  | over incorrect intent labels, used only if 'loss_type' is    |
|                                 |                  | set to 'margin'.                                             |
+---------------------------------+------------------+--------------------------------------------------------------+
| scale_loss                      | False            | Scale loss inverse proportionally to confidence of correct   |
|                                 |                  | prediction.                                                  |
+---------------------------------+------------------+--------------------------------------------------------------+
| regularization_constant         | 0.002            | The scale of regularization.                                 |
+---------------------------------+------------------+--------------------------------------------------------------+
| negative_margin_scale           | 0.8              | The scale of how important it is to minimize the maximum     |
|                                 |                  | similarity between embeddings of different labels.           |
+---------------------------------+------------------+--------------------------------------------------------------+
| connection_density              | 0.2              | Connection density of the weights in dense layers.           |
|                                 |                  | Value should be between 0 and 1.                             |
+---------------------------------+------------------+--------------------------------------------------------------+
| drop_rate                       | 0.2              | Dropout rate for encoder. Value should be between 0 and 1.   |
|                                 |                  | The higher the value the higher the regularization effect.   |
+---------------------------------+------------------+--------------------------------------------------------------+
| drop_rate_attention             | 0.0              | Dropout rate for attention. Value should be between 0 and 1. |
|                                 |                  | The higher the value the higher the regularization effect.   |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_sparse_input_dropout        | True             | If 'True' apply dropout to sparse input tensors.             |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_dense_input_dropout         | True             | If 'True' apply dropout to dense input tensors.              |
+---------------------------------+------------------+--------------------------------------------------------------+
| evaluate_every_number_of_epochs | 20               | How often to calculate validation accuracy.                  |
|                                 |                  | Set to '-1' to evaluate just once at the end of training.    |
+---------------------------------+------------------+--------------------------------------------------------------+
| evaluate_on_number_of_examples  | 0                | How many examples to use for hold out validation set.        |
|                                 |                  | Large values may hurt performance, e.g. model accuracy.      |
+---------------------------------+------------------+--------------------------------------------------------------+
| intent_classification           | True             | If 'True' intent classification is trained and intents are   |
|                                 |                  | predicted.                                                   |
+---------------------------------+------------------+--------------------------------------------------------------+
| entity_recognition              | True             | If 'True' entity recognition is trained and entities are     |
|                                 |                  | extracted.                                                   |
+---------------------------------+------------------+--------------------------------------------------------------+
| use_masked_language_model       | False            | If 'True' random tokens of the input message will be masked  |
|                                 |                  | and the model has to predict those tokens. It acts like a    |
|                                 |                  | regularizer and should help to learn a better contextual     |
|                                 |                  | representation of the input.                                 |
+---------------------------------+------------------+--------------------------------------------------------------+
| tensorboard_log_directory       | None             | If you want to use tensorboard to visualize training         |
|                                 |                  | metrics, set this option to a valid output directory. You    |
|                                 |                  | can view the training metrics after training in tensorboard  |
|                                 |                  | via 'tensorboard --logdir <path-to-given-directory>'.        |
+---------------------------------+------------------+--------------------------------------------------------------+
| tensorboard_log_level           | "epoch"          | Define when training metrics for tensorboard should be       |
|                                 |                  | logged. Either after every epoch ('epoch') or for every      |
|                                 |                  | training step ('batch').                                     |
+---------------------------------+------------------+--------------------------------------------------------------+
| featurizers                     | []               | List of featurizer names (alias names). Only features        |
|                                 |                  | coming from the listed names are used. If list is empty      |
|                                 |                  | all available features are used.                             |
+---------------------------------+------------------+--------------------------------------------------------------+
| checkpoint_model                | False            | Save the best performing model during training. Models are   |
|                                 |                  | stored to the location specified by `--out`. Only the one    |
|                                 |                  | best model will be saved.                                    |
|                                 |                  | Requires `evaluate_on_number_of_examples > 0` and            |
|                                 |                  | `evaluate_every_number_of_epochs > 0`                        |
+---------------------------------+------------------+--------------------------------------------------------------+
| split_entities_by_comma         | True             | Splits a list of extracted entities by comma to treat each   |
|                                 |                  | one of them as a single entity. Can either be `True`/`False` |
|                                 |                  | globally, or set per entity type, such as:                   |
|                                 |                  | ```                                                          |
|                                 |                  | ...                                                          |
|                                 |                  | - name: DIETClassifier                                       |
|                                 |                  |   split_entities_by_comma:                                   |
|                                 |                  |     address: True                                            |
|                                 |                  |     ...                                                      |
|                                 |                  | ...                                                          |
|                                 |                  | ```                                                          |
+---------------------------------+------------------+--------------------------------------------------------------+
| constrain_similarities          | False            | If `True`, applies sigmoid on all similarity terms and adds  |
|                                 |                  | it to the loss function to ensure that similarity values are |
|                                 |                  | approximately bounded. Used only if `loss_type=cross_entropy`|
+---------------------------------+------------------+--------------------------------------------------------------+
| model_confidence                | "softmax"        | Affects how model's confidence for each intent               |
|                                 |                  | is computed. Currently, only one value is supported:         |
|                                 |                  | 1. `softmax` - Similarities between input and intent         |
|                                 |                  | embeddings are post-processed with a softmax function,       |
|                                 |                  | as a result of which confidence for all intents sum up to 1. |
|                                 |                  | This parameter does not affect the confidence for entity     |
|                                 |                  | prediction.                                                  |
+---------------------------------+------------------+--------------------------------------------------------------+

  6. FallbackClassifier

    The FallbackClassifier classifies a message with the intent nlu_fallback if the NLU intent classification scores are ambiguous; the confidence of the fallback intent is set to the same value as the fallback threshold. The output of this classifier is likewise a set of entities, an intent, and an intent ranking.

{
    "intent": {"name": "nlu_fallback", "confidence": 0.7183846840434321},
    "intent_ranking": [
        {
            "confidence": 0.7183846840434321,
            "name": "nlu_fallback"
        },
        {
            "confidence": 0.28161531595656784,
            "name": "restaurant_search"
        }
    ],
    "entities": [{
        "end": 53,
        "entity": "time",
        "start": 48,
        "value": "2017-04-10T00:00:00.000+02:00",
        "confidence": 1.0,
        "extractor": "DIETClassifier"
    }]
}

    The FallbackClassifier can be used to classify user messages when the previous intent classifier fails to classify the intent with sufficient confidence. It also predicts the fallback intent when the confidence scores of the two top-ranked intents are closer than the ambiguity_threshold. Developers can use the FallbackClassifier to handle fallback behavior for uncertain NLU predictions, for example with a rule that asks the user to rephrase:

rules:
- rule: Ask the user to rephrase in case of low NLU confidence
  steps:
  - intent: nlu_fallback
  - action: utter_please_rephrase

The FallbackClassifier predicts the nlu_fallback intent only when no other intent was predicted with a confidence greater than or equal to the threshold. The following parameters can be adjusted:

threshold: This parameter sets the threshold for predicting the nlu_fallback intent. If no intent predicted by the previous intent classifier has a confidence greater than or equal to the threshold, the FallbackClassifier will predict the nlu_fallback intent with a confidence of 1.0.

ambiguity_threshold: If ambiguity_threshold is configured, the FallbackClassifier will also predict the nlu_fallback intent when the difference between the confidence scores of the two top-ranked intents is smaller than the ambiguity_threshold.
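
A typical configuration sketch (the threshold values here are illustrative):

pipeline:
# ... any intent classifier must come before the fallback ...
- name: FallbackClassifier
  threshold: 0.7
  ambiguity_threshold: 0.1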
