2019 CS224N lecture1 Introduction and Word Vectors

Category: Other · 2020-02-08 20:09:16

Course introduction

What this course teaches

An understanding of the effective modern methods for deep learning
A big picture understanding of human languages and the difficulties in understanding and producing them
An understanding of and ability to build systems (in PyTorch) for some of the major problems in NLP:
- Word meaning, dependency parsing, machine translation, question answering

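How Winter 2019 differs from earlier offerings

New material: character models, transformers, safety/fairness, multitask learning
Assignments: five one-week assignments instead of the previous three two-week assignments (weighted 6% + 4 × 12%)
The assignments cover new material: NMT with attention, ConvNets, subword modeling
Switched from TensorFlow to PyTorch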

The five assignments

HW1 is hopefully an easy on-ramp: an IPython Notebook
HW2 is pure Python (numpy) but expects you to do (multivariate) calculus so you really understand the basics
HW3 introduces PyTorch
HW4 and HW5 use PyTorch on a GPU (Microsoft Azure)
- Libraries like PyTorch, TensorFlow (and Chainer, MXNet, CNTK, Keras, etc.) are becoming the standard tools of DL
For the final project (FP), you either
- Do the default project, which is SQuAD question answering
  - Open-ended but an easier start; a good choice for most
- Propose a custom final project, which we approve
  - You will receive feedback from a mentor (TA/prof/postdoc/PhD)
- Can work in teams of 1–3; can use any language

Lecture 1

How to represent the meaning of a word

Build a thesaurus of synonym sets and hypernyms ("is a" relationships)
- WordNet
One-hot vectors
word2vec
- Skip-gram model
- CBOW

WordNet

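A great resource, but it misses nuance
It lacks new meanings of words and cannot be kept up to date
Subjective
Requires human labor to create and maintain
Cannot be used to compute word similarity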

one-hot

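Any two one-hot word vectors are orthogonal, so they cannot express similarity
Attempted fix: bring in WordNet's synonym lists to get similarity. This failed, for reasons such as incompleteness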
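The orthogonality of one-hot vectors can be checked directly. The sketch below uses a hypothetical three-word vocabulary chosen only for illustration; the dot product of any two distinct words is 0, so "hotel" and "motel" look exactly as unrelated as "hotel" and "cat".

```python
import numpy as np

# Hypothetical toy vocabulary, chosen only for illustration.
vocab = ["hotel", "motel", "cat"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return the one-hot vector of `word` over the toy vocabulary."""
    vec = np.zeros(len(vocab))
    vec[index[word]] = 1.0
    return vec

hotel, motel = one_hot("hotel"), one_hot("motel")
dot_different = float(hotel @ motel)  # two distinct words
dot_same = float(hotel @ hotel)       # a word with itself
print(dot_different, dot_same)
```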

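Representing words by their context

Distributional semantics: A word’s meaning is given by the words that frequently appear close-by
When a word appears in a text, fix a window size; its context is the set of words that appear around it inside that window
The many contexts of a word w are used to build up a representation of w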
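A minimal sketch of extracting fixed-size context windows, assuming whitespace tokenization and a helper name (`context_windows`) invented for this example. The context is kept as a list rather than a set so that word order stays visible.

```python
def context_windows(tokens, window_size):
    """Yield (center, context_words) for every position in `tokens`."""
    for t, center in enumerate(tokens):
        left = tokens[max(0, t - window_size):t]
        right = tokens[t + 1:t + 1 + window_size]
        yield center, left + right

# Toy sentence fragment, used only for illustration.
sentence = "problems turning into banking crises as".split()
windows = list(context_windows(sentence, window_size=2))
for center, context in windows:
    print(center, "->", context)
```

Words near the start or end of the text simply get a shorter context, which is one of several reasonable ways to handle the boundaries.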

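Word2vec (Mikolov et al. 2013)

o is a context ("outside") word and c is the center word
Use the word vectors of o and c to compute $p(o|c)$ (or, in the other direction, $p(c|o)$)
Adjust the word vectors to maximize this probability
Objective function: for a corpus of $T$ words and window size $m$, the likelihood is

$$L(\theta) = \prod_{t=1}^{T} \prod_{\substack{-m \le j \le m \\ j \ne 0}} p(w_{t+j}|w_t;\theta)$$

and the cost function is the average negative log-likelihood

$$J(\theta) = -\frac{1}{T} \log L(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log p(w_{t+j}|w_t;\theta)$$

Minimizing the cost function is the same as maximizing the probability of predicting the context words correctly
Computing $p(w_{t+j}|w_t;\theta)$
- Use two vectors per word
  - $v_w$: used when w is the center word
  - $u_w$: used when w is a context word
- For a center word c and a context word o, with vocabulary $V$:

$$p(o|c) = \frac{\exp(u_o^\top v_c)}{\sum_{w \in V} \exp(u_w^\top v_c)}$$

- This expression is a softmax, which maps arbitrary scores $x_i$ to a probability distribution: "max" because it amplifies the probability of the largest $x_i$, "soft" because it still assigns some probability to the smaller $x_i$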
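As a rough illustration of how $p(o|c)$ can be computed from the two sets of vectors, the sketch below applies a softmax to the dot products $u_w^\top v_c$ over a toy vocabulary. The sizes, the random initialization, and the matrix names `U` and `V` are assumptions made for this example, not part of the lecture.

```python
import numpy as np

def softmax(scores):
    """Map arbitrary real scores to a probability distribution."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

rng = np.random.default_rng(0)
vocab_size, dim = 5, 3                  # toy sizes, assumed for illustration
U = rng.normal(size=(vocab_size, dim))  # row w is u_w (w as a context word)
V = rng.normal(size=(vocab_size, dim))  # row w is v_w (w as the center word)

c = 2                                   # index of the center word
probs = softmax(U @ V[c])               # probs[o] = p(o | c) for every word o
print(probs, probs.sum())
```

Every word gets a strictly positive probability, and the word with the largest dot product gets the largest share.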

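Training the model

$\theta$ stands for all the model parameters, i.e. the word vectors of every word (two vectors per word); with $d$-dimensional vectors and a vocabulary of size $|V|$, $\theta \in \mathbb{R}^{2d|V|}$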
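To make the parameter vector and the cost concrete, here is a small sketch that stacks both sets of word vectors into one $\theta$ and evaluates the cost on a toy corpus. It assumes the cost is the average negative log-likelihood of the context words given each center word, with a full softmax; the corpus of word ids and the function names are invented for the example.

```python
import numpy as np

def log_softmax(scores):
    """Numerically stable log of the softmax distribution."""
    shifted = scores - np.max(scores)
    return shifted - np.log(np.exp(shifted).sum())

def cost(U, V, corpus, window_size):
    """Average negative log-likelihood of context words given center words."""
    total = 0.0
    T = len(corpus)
    for t, c in enumerate(corpus):
        log_probs = log_softmax(U @ V[c])  # log p(. | c) over the vocabulary
        for j in range(-window_size, window_size + 1):
            if j == 0 or not 0 <= t + j < T:
                continue                   # skip the center and out-of-range positions
            total -= log_probs[corpus[t + j]]
    return total / T

rng = np.random.default_rng(0)
vocab_size, dim = 4, 3                     # toy sizes, assumed for illustration
U = rng.normal(scale=0.1, size=(vocab_size, dim))  # context vectors u_w
V = rng.normal(scale=0.1, size=(vocab_size, dim))  # center vectors v_w
theta = np.concatenate([U.ravel(), V.ravel()])     # all parameters in one vector
corpus = [0, 1, 2, 3, 1, 0]                # toy corpus given as word ids
J = cost(U, V, corpus, window_size=1)
print(theta.shape, J)
```

With all vectors set to zero the model predicts every word with probability 1/4, which gives a convenient sanity value for the cost.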

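How to compute the gradients of the word vectors

A useful basic identity: $\frac{\partial\, x^\top a}{\partial x} = a$
The chain rule

For one context word o in one window, compute the gradient of $\log p(o|c)$ with respect to $v_c$ (the gradient of the cost is its negative). Using $p(o|c) = \exp(u_o^\top v_c) / \sum_{w \in V} \exp(u_w^\top v_c)$, with $V$ the vocabulary:

$$\frac{\partial}{\partial v_c} \log p(o|c) = \frac{\partial}{\partial v_c} \log \exp(u_o^\top v_c) - \frac{\partial}{\partial v_c} \log \sum_{w \in V} \exp(u_w^\top v_c)$$

The numerator term is easy: log and exp cancel, leaving $\frac{\partial}{\partial v_c} u_o^\top v_c = u_o$, which has the form of the basic identity
The denominator term needs the chain rule twice, with one step that swaps differentiation and summation:

$$\frac{\partial}{\partial v_c} \log \sum_{w \in V} \exp(u_w^\top v_c) = \frac{1}{\sum_{w \in V} \exp(u_w^\top v_c)} \sum_{x \in V} \frac{\partial}{\partial v_c} \exp(u_x^\top v_c) = \sum_{x \in V} \frac{\exp(u_x^\top v_c)}{\sum_{w \in V} \exp(u_w^\top v_c)}\, u_x = \sum_{x \in V} p(x|c)\, u_x$$

The final result is

$$\frac{\partial}{\partial v_c} \log p(o|c) = u_o - \sum_{x \in V} p(x|c)\, u_x$$

The gradient for the current center word is the vector of the observed context word o minus the expectation of all context vectors under the model, i.e. their weighted average (probability × word vector)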
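The "observed minus expected" form of the gradient can be sanity-checked numerically. The sketch below compares the analytic gradient of $\log p(o|c)$ with respect to $v_c$, namely $u_o - \sum_x p(x|c)\,u_x$, against central finite differences. The toy sizes and random vectors are assumptions for illustration only.

```python
import numpy as np

def softmax(scores):
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / exp_scores.sum()

def log_prob(U, v_c, o):
    """log p(o | c) for center vector v_c and context matrix U."""
    return np.log(softmax(U @ v_c)[o])

def grad_v_c(U, v_c, o):
    """Analytic gradient: u_o minus the probability-weighted average of all u_x."""
    probs = softmax(U @ v_c)
    return U[o] - probs @ U

rng = np.random.default_rng(0)
U = rng.normal(size=(5, 3))  # toy context vectors, one row per word
v_c = rng.normal(size=3)     # toy center vector
o = 1                        # index of the observed context word

analytic = grad_v_c(U, v_c, o)

# Central finite differences, one coordinate of v_c at a time.
eps = 1e-6
numeric = np.zeros_like(v_c)
for i in range(v_c.size):
    step = np.zeros_like(v_c)
    step[i] = eps
    numeric[i] = (log_prob(U, v_c + step, o) - log_prob(U, v_c - step, o)) / (2 * eps)

max_diff = float(np.max(np.abs(analytic - numeric)))
print(max_diff)
```

A very small `max_diff` indicates that the derivation and the code agree; the gradient of the cost for this context word is the negative of `analytic`.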

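How to compute all the gradients
- Within a window, compute the gradient for each center vector v, and also the gradients for the context vectors u
- In each window, gradients are needed for every parameter used in that window: the center vector $v_c$ and the context vectors $u_w$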

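Why two vectors per word

Optimization is easier; the two vectors can simply be averaged at the end
Using a single vector per word also works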

Two model variants

Skip-grams (SG): predict the context words given the center word
Continuous Bag of Words (CBOW): predict the center word given the context words
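The difference between the two variants shows up in how training examples are formed from the same window. This is a minimal sketch with hypothetical helper names; it only builds the (input, target) examples and does not train anything.

```python
def skipgram_examples(tokens, window_size):
    """Skip-gram: one (center, context_word) pair per context word."""
    pairs = []
    for t, center in enumerate(tokens):
        lo = max(0, t - window_size)
        hi = min(len(tokens), t + window_size + 1)
        for j in range(lo, hi):
            if j != t:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window_size):
    """CBOW: one (context_words, center) example per position."""
    examples = []
    for t, center in enumerate(tokens):
        context = tokens[max(0, t - window_size):t] + tokens[t + 1:t + 1 + window_size]
        examples.append((context, center))
    return examples

tokens = ["the", "cat", "sat"]  # toy sentence, for illustration only
sg = skipgram_examples(tokens, window_size=1)
cbow = cbow_examples(tokens, window_size=1)
print(sg)
print(cbow)
```

Skip-gram produces one prediction target per context word, while CBOW pools the whole context into a single input with the center word as the target.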

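Optimization: gradient descent

Gradient Descent is an algorithm to minimize $J(\theta)$
Update rule, with step size (learning rate) $\alpha$: $\theta^{new} = \theta^{old} - \alpha \nabla_\theta J(\theta)$

Stochastic gradient descent (SGD): repeatedly pick one window at random and update the word vectors using the gradient from that window alone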
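A toy sketch of SGD for skip-gram with a full softmax (no negative sampling): each step samples one window at random and applies $\theta \leftarrow \theta - \alpha \nabla_\theta J$ using only that window's gradient. The corpus, the sizes, the learning rate, and the step count are all assumptions chosen to keep the example small, not values from the lecture.

```python
import numpy as np

def softmax(scores):
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / exp_scores.sum()

def window_loss_and_grads(U, V, c, context_ids):
    """Loss and gradients for one window: sum of -log p(o | c) over its context words."""
    probs = softmax(U @ V[c])
    loss = -np.sum(np.log(probs[context_ids]))
    # d(-log p(o|c)) / d(scores) = probs - one_hot(o), summed over the context words
    d_scores = len(context_ids) * probs
    np.subtract.at(d_scores, context_ids, 1.0)
    grad_U = np.outer(d_scores, V[c])  # gradient for every context vector u_w
    grad_v_c = U.T @ d_scores          # gradient for the center vector v_c
    return loss, grad_U, grad_v_c

def make_windows(corpus, window_size):
    """List of (center_id, context_ids) for every position in the corpus."""
    out = []
    for t, c in enumerate(corpus):
        ctx = corpus[max(0, t - window_size):t] + corpus[t + 1:t + 1 + window_size]
        out.append((c, ctx))
    return out

rng = np.random.default_rng(0)
vocab_size, dim, alpha = 4, 3, 0.1     # toy sizes and learning rate (assumed)
U = rng.normal(scale=0.1, size=(vocab_size, dim))
V = rng.normal(scale=0.1, size=(vocab_size, dim))
corpus = [0, 1, 2, 3] * 5              # toy corpus given as word ids
all_windows = make_windows(corpus, window_size=1)

def mean_loss():
    return np.mean([window_loss_and_grads(U, V, c, ctx)[0] for c, ctx in all_windows])

loss_before = mean_loss()
for step in range(2000):
    c, ctx = all_windows[rng.integers(len(all_windows))]  # sample one window
    _, grad_U, grad_v_c = window_loss_and_grads(U, V, c, ctx)
    U -= alpha * grad_U                # update all context vectors
    V[c] -= alpha * grad_v_c           # update only the sampled center vector
loss_after = mean_loss()
print(loss_before, loss_after)
```

Only one row of `V` changes per step, since a window involves a single center word, whereas the full softmax touches every row of `U`.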

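Summary

This lecture is mainly about how to represent words. The central idea is to represent a word by its contexts, which is the idea behind word2vec. word2vec has two model variants: skip-gram predicts the context from the center word (the variant covered in this post, without negative sampling), and CBOW predicts the center word from the surrounding words.

The screenshots in this post are from the Stanford CS224N course, with thanks.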

24kb_

Reposted from blog.csdn.net/weixin_42017042/article/details/104000721
