Clearing up "word2vec Parameter Learning Explained", Section 1.1: deriving equation (8) in the one-word-context case of the CBOW model

  word2vec comes with two models, CBOW and Skip-Gram, and the paper "word2vec Parameter Learning Explained" gives a detailed derivation of how the parameters of both models are learned. While reading Section 1.1 (One-word context), I was puzzled by the derivation of equation (8) and spent some time working it out. The original text reads:
  "Let us now derive the update equation of the weights between hidden and output layers. Take the derivative of $E$ with regard to the $j$-th unit's net input $u_j$, we obtain

$$\frac{\partial E}{\partial u_j} = y_j - t_j := e_j \tag{8}$$

where $t_j = \mathbb{1}(j = j^*)$, i.e., $t_j$ will only be 1 when the $j$-th unit is the actual output word, otherwise $t_j = 0$."
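For reference, the derivation below relies on two definitions given earlier in the paper: the output layer is a softmax over the net inputs $u_j$, and $E$ is the negative log-likelihood of the actual output word $w_O$, whose index is $j^*$:

$$y_j = p(w_j \mid w_I) = \frac{\exp(u_j)}{\sum_{j'=1}^{V}\exp(u_{j'})}, \qquad E = -\log p(w_O \mid w_I) = -u_{j^*} + \log\sum_{j'=1}^{V}\exp(u_{j'}).$$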
  At first I could not see how this step was obtained, but it turns out to be straightforward once written out. Differentiating $E$ with respect to $u_j$, and noting that $\partial(-u_{j^*})/\partial u_j = -\mathbb{1}(j = j^*) = -t_j$:
$$\begin{aligned} E &= \log\sum_{j'=1}^{V}\exp(u_{j'}) - u_{j^*} \\ e_j = \frac{\partial E}{\partial u_j} &= \frac{\exp(u_j)}{\sum_{j'=1}^{V}\exp(u_{j'})} - \mathbb{1}(j = j^*) \\ &= y_j - t_j \end{aligned}$$
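To convince yourself of equation (8), here is a minimal numerical check (not from the original post, just a sketch): it compares a central finite-difference gradient of $E$ against $y - t$ on a toy vocabulary. The vocabulary size `V`, the random scores, and the target index `j_star` are illustrative assumptions.

```python
import numpy as np

V = 8                       # toy vocabulary size (assumption)
rng = np.random.default_rng(0)
u = rng.normal(size=V)      # net inputs of the output layer
j_star = 3                  # index of the actual output word (assumption)

y = np.exp(u) / np.exp(u).sum()   # softmax output, y_j
t = np.zeros(V)
t[j_star] = 1.0                   # one-hot target, t_j = 1(j = j*)

def loss(u):
    # E = -u_{j*} + log(sum_{j'} exp(u_{j'}))
    return -u[j_star] + np.log(np.exp(u).sum())

# Central finite-difference gradient of E w.r.t. each u_j
eps = 1e-6
num_grad = np.array([
    (loss(u + eps * np.eye(V)[j]) - loss(u - eps * np.eye(V)[j])) / (2 * eps)
    for j in range(V)
])

print(np.allclose(num_grad, y - t, atol=1e-6))  # True: e_j = y_j - t_j
```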


Reposted from blog.csdn.net/l1l1l1l/article/details/102914512