ByteDance: 3 technical rounds + 1 HR round, and I still didn't pass
Word2vec principle
Proposed by Google in 2013.
It mainly contains two models:
- Skip-gram model
- Continuous bag of words (CBOW)
Two effective training methods:
- Negative sampling
- Hierarchical softmax
It captures both similarity and analogy relationships between words well.
Skip-gram: uses the center word to predict the context words.
CBOW: uses the context words to predict the center word; the projection layer averages the context word vectors.
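A minimal sketch of the difference between the two models, assuming a tokenized sentence and a symmetric window (function names are illustrative, not from the original word2vec code): skip-gram emits one (center, context) pair per context word, while CBOW groups all context words of a position into a single example whose vectors are later averaged.

```python
# Hypothetical sketch: building training examples for skip-gram vs CBOW
# from one tokenized sentence with a symmetric context window.
def skipgram_pairs(tokens, window=2):
    """One (center, context) pair per context word in the window."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """One (context_list, center) example per position; the projection
    layer averages the context word vectors."""
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        examples.append((context, center))
    return examples

print(skipgram_pairs(["the", "quick", "brown", "fox"], window=1))
print(cbow_examples(["the", "quick", "brown", "fox"], window=1))
```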
The optimization goal is to maximize the probability of the context words conditioned on the center word.
Without these training tricks, every parameter update involves all the words in the vocabulary, so the complexity is O(|V|).
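The O(|V|) cost comes from the full softmax. For skip-gram, writing $v_c$ for the center word vector and $u_o$ for the context ("output") vector of word $w_o$, the conditional probability is:

```latex
P(w_o \mid w_c) = \frac{\exp(u_o^\top v_c)}{\sum_{i=1}^{|V|} \exp(u_i^\top v_c)}
```

The denominator sums over the entire vocabulary, so every gradient step touches all $|V|$ output vectors; negative sampling and hierarchical softmax both exist to avoid this sum.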
How does w2v negative sampling
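Before the details, a small sketch of one well-known ingredient: negative samples are drawn from the unigram distribution raised to the 3/4 power, which up-weights rare words relative to their raw frequency (function names here are illustrative, not the original C implementation).

```python
import random

# Hypothetical sketch: the noise distribution used to draw k negative
# words per positive (center, context) pair.
def noise_distribution(counts, power=0.75):
    """Unigram counts raised to `power`, normalized to a distribution."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}

def sample_negatives(dist, k, rng):
    """Draw k noise words (with replacement) from the distribution."""
    words = list(dist)
    probs = [dist[w] for w in words]
    return rng.choices(words, weights=probs, k=k)

counts = {"the": 1000, "cat": 50, "sat": 40}
dist = noise_distribution(counts)
print(sample_negatives(dist, 5, random.Random(0)))
```

Because the exponent is below 1, a word that is 20x more frequent is sampled less than 20x more often, so rare words appear as negatives more often than their raw frequency would suggest.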