Text anomaly detection


Simply averaging word2vec vectors works only moderately well for measuring the semantic similarity of sentences. "I love you" and "you love me" clearly mean different things (in the original Chinese, 我爱你 and 你爱我 consist of exactly the same words), yet directly averaging their word vectors gives the same result, because this kind of strategy ignores the order of the items. (For inherently unordered items, however, this is not a big problem.)
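A minimal sketch of the order problem, using hypothetical 2-dimensional toy vectors (invented for illustration, not trained word2vec embeddings): two sentences built from the same words in a different order end up with exactly the same averaged vector.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical toy "embeddings", made up for this example only.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [0.5, 0.5],
}


def average_embedding(tokens: List[str], emb: Dict[str, Vector]) -> Vector:
    """Plain (unweighted) mean of the word vectors of a tokenized sentence."""
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for tok in tokens:
        for i, x in enumerate(emb[tok]):
            total[i] += x
    return [x / len(tokens) for x in total]


a = average_embedding(["dog", "bites", "man"], TOY_EMBEDDINGS)
b = average_embedding(["man", "bites", "dog"], TOY_EMBEDDINGS)
print(a == b)  # True: the average cannot tell the two sentences apart
```

Summation is commutative, so any permutation of the same tokens produces the same mean; the order information is lost before similarity is even computed.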

SIF does not solve this problem either, since it is still a (weighted) average that ignores word order. (Here I only briefly mention this limitation of word2vec averaging.)

The idea of SIF is as follows:

Compute a weighted average of the word vectors in each sentence, then subtract from each resulting sentence vector its projection onto the first principal component of the matrix whose rows are the sentence vectors. The two steps are:

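1. Weighted average. The vector v_s of sentence s is a weighted average of the vectors v_w of its words w:

$$v_s = \frac{1}{|s|} \sum_{w \in s} \frac{a}{a + p(w)}\, v_w$$

where p(w) is the estimated frequency (unigram probability) of word w and a is a hyperparameter set by the user. Frequent words therefore receive smaller weights.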
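A minimal sketch of the weighting step in pure Python, assuming tokenized sentences, a dict `emb` that maps every token to its vector, and p(w) estimated as relative frequency over a corpus. The function names are hypothetical and a = 1e-3 is only an illustrative default. Each word vector is scaled by a / (a + p(w)), and the weighted sum is divided by the sentence length |s|.

```python
from collections import Counter
from typing import Dict, List

Vector = List[float]


def word_probabilities(corpus: List[List[str]]) -> Dict[str, float]:
    """Estimate unigram probabilities p(w) as relative counts in a tokenized corpus."""
    counts = Counter(tok for sent in corpus for tok in sent)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def sif_weighted_average(
    tokens: List[str],
    emb: Dict[str, Vector],
    p: Dict[str, float],
    a: float = 1e-3,  # illustrative value; a is a user-chosen hyperparameter
) -> Vector:
    """(1/|s|) * sum over w in s of a / (a + p(w)) * v_w.

    Assumes every token has both a vector in `emb` and a probability in `p`.
    """
    dim = len(next(iter(emb.values())))
    total = [0.0] * dim
    for tok in tokens:
        weight = a / (a + p[tok])  # frequent words get smaller weights
        for i, x in enumerate(emb[tok]):
            total[i] += weight * x
    return [x / len(tokens) for x in total]
```

For example, with p(the) = 0.9, p(cat) = 0.1 and a = 0.1, "the" gets weight 0.1 while "cat" gets 0.5, so the common word contributes five times less to the sentence vector.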

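2. Common component removal. Let u be the first principal component (first singular vector) of the matrix whose rows are the sentence vectors. Each sentence vector then has its projection onto u removed:

$$v_s \leftarrow v_s - u u^\top v_s$$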
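A minimal sketch of the removal step in pure Python (function names hypothetical). The rows of X are the sentence vectors from the weighting step. The direction u is approximated by power iteration on XᵀX, and then (uᵀv_s)·u is subtracted from every sentence vector v_s. In practice a library SVD such as NumPy's `numpy.linalg.svd` would replace the hand-written iteration; the pure-Python version just keeps the example dependency-free.

```python
import math
import random
from typing import List

Vector = List[float]


def first_principal_direction(X: List[Vector], iters: int = 100, seed: int = 0) -> Vector:
    """Approximate the top right singular vector u of X by power iteration on X^T X.

    The rows are not mean-centered, so u is the first singular vector of X itself.
    """
    dim = len(X[0])
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n0 = math.sqrt(sum(x * x for x in u))
    u = [x / n0 for x in u]  # start from a random unit vector
    for _ in range(iters):
        xu = [sum(row[i] * u[i] for i in range(dim)) for row in X]  # X u
        w = [sum(xu[k] * X[k][i] for k in range(len(X))) for i in range(dim)]  # X^T X u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # X has no component along u; keep the current unit vector
            break
        u = [x / norm for x in w]
    return u


def remove_common_component(X: List[Vector], iters: int = 100) -> List[Vector]:
    """For every sentence vector v: v <- v - u (u^T v)."""
    u = first_principal_direction(X, iters)
    out = []
    for v in X:
        proj = sum(ui * vi for ui, vi in zip(u, v))  # u^T v
        out.append([vi - proj * ui for vi, ui in zip(v, u)])
    return out
```

Because u has unit length, every output vector is orthogonal to u, i.e. the direction shared by all sentence vectors has been removed.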


Source: blog.csdn.net/u013250861/article/details/133102313