CS224n: Natural Language Processing with Deep Learning

Stanford / Winter 2019

Logistics

  • Lectures: Tuesday/Thursday, 4:30-5:50pm PST, in NVIDIA Auditorium.
  • Lecture videos for enrolled students: posted on mvideox.stanford.edu (requires login) shortly after each lecture ends. Unfortunately, it is not technically possible to make these videos viewable by non-enrolled students.
  • Public lecture videos: Once the course has completed, we plan to also make the videos publicly available on YouTube. Please be patient as it takes some time to prepare the videos for release. In the meantime, the videos from Winter 2017 are publicly available on YouTube.
  • Other public resources: The lecture slides and assignments will be posted online as the course progresses. We are happy for anyone to use these resources, but we cannot grade the work of any students who are not officially enrolled in the class.
  • Office hours: Information here.
  • Contact: Students should ask all course-related questions in the Piazza forum, where you will also find announcements. For external enquiries, personal matters, or in emergencies, you can email us at [email protected].
  • Sitting in on lectures: In general we are happy for guests to sit in on lectures if they are members of the Stanford community (registered students, staff, and/or faculty). If the class is too full and we're running out of space, we ask that you please allow registered students to attend. Due to high enrollment, we cannot grade the work of any students who are not officially enrolled in the class.
  • Academic accommodations: If you need an academic accommodation based on a disability, you should initiate the request with the Office of Accessible Education (OAE). The OAE will evaluate the request, recommend accommodations, and prepare a letter for faculty. Students should contact the OAE as soon as possible since timely notice is needed to coordinate accommodations.

Instructors

Chris Manning

Abigail See
Head TA

Teaching Assistants

Niranjan Balachandar

Sahil Chopra

Christopher Chute

Anand Dhoot

Stephanie Dong

Michael Hahn

Annie Hu

Christina Hung

Amita Kamath

Shreyash Pandey

Vivekkumar Patel

Suvadip Paul

Luladay Price

Pratyaksh Sharma

Hao Wang

Yunhe (John) Wang

Xiaoxue Zang

Benoit Zhou

Content

What is this course about?

Natural language processing (NLP) is one of the most important technologies of the information age, and a crucial part of artificial intelligence. Applications of NLP are everywhere because people communicate almost everything in language: web search, advertising, emails, customer service, language translation, medical reports, etc. In recent years, Deep Learning approaches have obtained very high performance across many different NLP tasks, using single end-to-end neural models that do not require traditional, task-specific feature engineering. In this course, students will gain a thorough introduction to cutting-edge research in Deep Learning for NLP. Through lectures, assignments and a final project, students will learn the necessary skills to design, implement, and understand their own neural network models. This year, CS224n will be taught for the first time using PyTorch rather than TensorFlow (as in previous years).
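
Since the course now uses PyTorch, here is a minimal, purely illustrative sketch (all names and sizes below are our own assumptions, not course code) of the kind of end-to-end model the lectures build toward: learned word embeddings feeding a small classifier, with no hand-engineered features.

```python
import torch
import torch.nn as nn

class BagOfWordsClassifier(nn.Module):
    """Toy end-to-end model: embed tokens, average them, classify."""
    def __init__(self, vocab_size=1000, embed_dim=50, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.linear = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer tensor of word indices.
        # Average the word vectors, then map to class scores.
        return self.linear(self.embed(token_ids).mean(dim=1))

model = BagOfWordsClassifier()
logits = model(torch.randint(0, 1000, (4, 7)))  # a fake batch of 4 "sentences"
print(logits.shape)                             # torch.Size([4, 2])
```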

Previous offerings

This course was formed in 2017 as a merger of the earlier CS224n (Natural Language Processing) and CS224d (Natural Language Processing with Deep Learning) courses. Below you can find archived websites and student project reports.

CS224n Websites: Winter 2018 / Winter 2017 / Autumn 2015 / Autumn 2014 / Autumn 2013 / Autumn 2012 / Autumn 2011 / Winter 2011 / Spring 2010 / Spring 2009 / Spring 2008 / Spring 2007 / Spring 2006 / Spring 2005 / Spring 2004 / Spring 2003 / Spring 2002 / Spring 2000
CS224n Reports: Winter 2018 / Winter 2017 / Autumn 2015 and earlier
CS224d Reports: Spring 2016 / Spring 2015

Prerequisites

  • Proficiency in Python

    All class assignments will be in Python (using NumPy and PyTorch). If you need to remind yourself of Python, or you're not very familiar with NumPy, you can come to the Python review session in week 1 (listed in the schedule). If you have a lot of programming experience but in a different language (e.g. C/C++/Matlab/Java/Javascript), you will probably be fine.

  • College Calculus, Linear Algebra (e.g. MATH 51, CME 100)

    You should be comfortable taking (multivariable) derivatives and understanding matrix/vector notation and operations.

  • Basic Probability and Statistics (e.g. CS 109 or equivalent)

    You should know the basics of probability, Gaussian distributions, means, standard deviations, etc.

  • Foundations of Machine Learning (e.g. CS 221 or CS 229)

    We will be formulating cost functions, taking derivatives and performing optimization with gradient descent (see the short sketch after this list). If you already have basic machine learning and/or deep learning knowledge, the course will be easier; however, it is possible to take CS224n without it. There are many introductions to ML, in webpage, book, and video form. One approachable introduction is Hal Daumé's in-progress A Course in Machine Learning. Reading the first 5 chapters of that book would be good background. Knowing the first 7 chapters would be even better!
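
As a minimal illustration of that last prerequisite (toy numbers, not course code), here is gradient descent on the simple quadratic cost J(w) = (w - 3)^2:

```python
# Gradient descent on J(w) = (w - 3)^2, whose derivative is dJ/dw = 2(w - 3).
# The learning rate and iteration count are arbitrary illustrative choices.
w = 0.0
learning_rate = 0.1
for step in range(100):
    grad = 2 * (w - 3)           # derivative of the cost at the current w
    w -= learning_rate * grad    # the gradient descent update
print(w)                         # converges toward the minimizer w = 3
```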

Reference Texts

The following texts are useful, but not required. All of them can be read free online.

  • Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed. draft)
  • Jacob Eisenstein. Natural Language Processing
  • Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing
  • Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning

If you have no background in neural networks but would like to take the course anyway, you might well find one of these books helpful to give you more background:

  • Michael A. Nielsen. Neural Networks and Deep Learning
  • Eugene Charniak. Introduction to Deep Learning

Coursework

Assignments (54%)

There are five weekly assignments, which will improve both your theoretical understanding and your practical skills. All assignments contain both written questions and programming parts.

  • Credit:
    • Assignment 1 (6%): Introduction to word vectors [zip] [preview]
    • Assignment 2 (12%): Derivatives and implementation of the word2vec algorithm (see the sketch after this list)
    • Assignment 3 (12%): Dependency parsing and neural network foundations
    • Assignment 4 (12%): Neural Machine Translation with sequence-to-sequence and attention
    • Assignment 5 (12%): Neural Machine Translation with ConvNets and subword modeling
  • Deadlines: All assignments are due on either a Tuesday or a Thursday before class (i.e. before 4:30pm). All deadlines are listed in the schedule.
  • Submission: Assignments are submitted via Gradescope (access code is here — requires Stanford login). If you need to sign up for a Gradescope account, please use your @stanford.edu email address. Further instructions are given in each assignment handout. Do not email us your assignments.
  • Collaboration: Study groups are allowed, but students must understand and complete their own assignments, and hand in one assignment per student. If you worked in a group, please put the names of your study group at the top of your assignment. Please ask if you have any questions about the collaboration policy.
  • Honor Code: We expect students to not look at solutions or implementations online. Like all other classes at Stanford, we take the student Honor Code seriously.
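
For a flavor of Assignment 2, here is a hedged sketch (toy sizes; the assignment handout defines the real interfaces and notation) of the naive-softmax skip-gram loss and its gradient with respect to the center word vector:

```python
import numpy as np

# Illustrative only: random toy embeddings for a 10-word vocabulary.
rng = np.random.default_rng(0)
vocab_size, dim = 10, 4
center_vecs = rng.normal(scale=0.1, size=(vocab_size, dim))
outside_vecs = rng.normal(scale=0.1, size=(vocab_size, dim))

def naive_softmax_loss_and_grad(center_idx, outside_idx):
    """J = -log softmax(U v_c)[o]; returns J and dJ/dv_c = U^T (y_hat - y)."""
    v_c = center_vecs[center_idx]                   # (dim,)
    scores = outside_vecs @ v_c                     # (vocab_size,)
    scores -= scores.max()                          # for numerical stability
    y_hat = np.exp(scores) / np.exp(scores).sum()   # softmax probabilities
    loss = -np.log(y_hat[outside_idx])
    y = np.zeros(vocab_size)
    y[outside_idx] = 1.0                            # one-hot true outside word
    return loss, outside_vecs.T @ (y_hat - y)

loss, grad = naive_softmax_loss_and_grad(center_idx=2, outside_idx=5)
print(loss, grad.shape)                             # scalar loss, (4,) gradient
```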

Final Project (43%)

The Final Project offers you the chance to apply your newly acquired skills towards an in-depth application. Students have two options: the Default Final Project (in which students tackle a predefined task, namely textual Question Answering) or a Custom Final Project (in which students choose their own project). For both options, credit for the final project is broken down as follows:

  • Credit:
    • Project proposal: 5%
    • Project milestone: 5%
    • Project poster: 3%
    • Project report: 30%
  • Deadlines: The project proposal, milestone and report are all due at 4:30pm. All deadlines are listed in the schedule.
  • Team size: Students may do final projects solo, or in teams of up to 3 people. We strongly recommend you do the final project in a team. Larger teams are expected to do correspondingly larger projects, and you should only form a 3-person team if you are planning to do an ambitious project where every team member will have a significant contribution.
  • Contribution: In the final report we ask for a statement of what each team member contributed to the project. Team members will typically get the same grade, but we may differentiate in extreme cases of unequal contribution. You can contact us in confidence in the event of unequal contribution.
  • External collaborators: You can work on a project that has external (non CS224n student) collaborators, but you must make it clear in your final report which parts of the project were your work.
  • Sharing projects: You can share a single project between CS224n and another class, but we expect the project to be accordingly bigger, and you must declare that you are sharing the project in your project proposal.
  • More information: See the Final project page.

Participation (3%)

We appreciate everyone being actively involved in the class! There are several ways of earning participation credit, which is capped at 3%:

  • Attending guest speakers' lectures:
    • In the second half of the class, we will have three invited speakers (dates will be confirmed soon). Our guest speakers make a significant effort to come lecture for us, so (both to show our appreciation and to continue attracting interesting speakers) we do not want them lecturing to a largely empty room.
    • For on-campus students, your attendance at lectures with guest speakers is expected! You will get 0.5% per speaker (1.5% total) for attending.
    • Since SCPD students can’t (easily) attend classes, they can instead get 0.83% per speaker (2.5% total) by writing a ‘reaction paragraph’ based on listening to the talk; details will be provided. Non-SCPD students with an unavoidable absence who ask in advance can also do this option.
  • Attending two random lectures: At two randomly-selected (non-guest) lectures in the quarter, we will take attendance. Each is worth 0.5% (total 1%).
  • Completing mid-quarter evaluation: Around the middle of the quarter, we will send out a survey to help us understand how the course is going, and how we can improve. Completing it is worth 0.5%.
  • Piazza participation: The top ~20 contributors to Piazza will get 3%; others will get credit in proportion to the participation of the ~20th person.
  • Karma point: Any other act that improves the class, which a CS224n TA or instructor notices and deems worthy: 1%

Late Days

  • Each student has 6 late days to use. A late day extends the deadline 24 hours. You can use up to 3 late days per assignment (including all five assignments, project proposal, project milestone, project final report, but not poster).
  • Teams must use one late day per person if they wish to extend the deadline by a day. For example, a group of three people must have at least six remaining late days between them (distributed among them in any way) to extend the deadline two days.
  • Once you have used all 6 late days, the penalty is 10% of the assignment for each additional late day.

Regrade Requests

If you feel you deserved a better grade on an assignment, you may submit a regrade request on Gradescope within 3 days after the grades are released. Your request should briefly summarize why you feel the original grade was unfair. Your TA will reevaluate your assignment as soon as possible, and then issue a decision. If you are still not happy, you can ask for your assignment to be regraded by an instructor.

Credit/No credit enrollment

If you take the class credit/no credit then you are graded in the same way as those registered for a letter grade. The only difference is that, providing you reach a C- standard in your work, it will simply be graded as CR.

Schedule

Lecture slides will be posted here shortly before each lecture. If you wish to view slides further in advance, refer to last year's slides, which are mostly similar.

This schedule is subject to change.

Tue Jan 8: Introduction and Word Vectors [slides]
  Gensim word vectors example: [zip] [preview]
  Suggested Readings:
    1. Word2Vec Tutorial - The Skip-Gram Model
    2. Efficient Estimation of Word Representations in Vector Space (original word2vec paper)
    3. Distributed Representations of Words and Phrases and their Compositionality (negative sampling paper)
  Out: Assignment 1 [zip] [preview]

Thu Jan 10: Word Vectors 2 and Word Senses [slides]
  Suggested Readings:
    1. GloVe: Global Vectors for Word Representation (original GloVe paper)
    2. Improving Distributional Similarity with Lessons Learned from Word Embeddings
    3. Evaluation methods for unsupervised word embeddings
    4. A Latent Variable Model Approach to PMI-based Word Embeddings
    5. Linear Algebraic Structure of Word Senses, with Applications to Polysemy
    6. On the Dimensionality of Word Embedding

Fri Jan 11: Python review session [slides]
  1:30 - 2:50pm, Skilling Auditorium [map]

Tue Jan 15: Backpropagation [notes]
  Suggested Readings:
    1. CS231n notes on backprop
    2. Review of differential calculus
  Out: Assignment 2
  Due: Assignment 1

Thu Jan 17: Neural Networks
  Suggested Readings:
    1. CS231n notes on network architectures
    2. Natural Language Processing (almost) from Scratch
    3. Learning Representations by Backpropagating Errors
    4. Derivatives, Backpropagation, and Vectorization
    5. Yes you should understand backprop

Tue Jan 22: Linguistic Structure: Dependency Parsing
  Suggested Readings:
    1. Incrementality in Deterministic Dependency Parsing
    2. A Fast and Accurate Dependency Parser using Neural Networks
    3. Dependency Parsing
    4. Globally Normalized Transition-Based Neural Networks
    5. Universal Stanford Dependencies: A cross-linguistic typology
    6. Universal Dependencies website
  Out: Assignment 3
  Due: Assignment 2

Thu Jan 24: The probability of a sentence? Recurrent Neural Networks and Language Models
  Suggested Readings:
    1. N-gram Language Models and Perplexity
    2. The Unreasonable Effectiveness of Recurrent Neural Networks
    3. Recurrent Neural Networks Tutorial
    4. Sequence Modeling: Recurrent and Recursive Neural Nets
    5. On Chomsky and the Two Cultures of Statistical Learning

Tue Jan 29: Vanishing Gradients and Fancy RNNs
  Suggested Readings:
    1. Understanding LSTM Networks
    2. Vanishing Gradients Example
  Out: Assignment 4
  Due: Assignment 3

Thu Jan 31: Machine Translation, Seq2Seq and Attention
  Suggested Readings:
    1. Statistical Machine Translation slides (see lectures 2/3/4)
    2. Statistical Machine Translation Book
    3. BLEU metric
    4. Original sequence-to-sequence NMT paper (also describes beam search)
    5. Earlier sequence-to-sequence speech recognition paper (includes detailed beam search alg)
    6. Original sequence-to-sequence + attention paper
    7. Guide to attention and other RNN augmentations
    8. Massive Exploration of Neural Machine Translation Architectures

Tue Feb 5: Practical Tips for Final Projects

Thu Feb 7: Question Answering and the Default Final Project
  Out: Project Proposal
  Due: Assignment 4

Tue Feb 12: ConvNets for NLP
  Suggested Readings:
    1. Convolutional Neural Networks for Sentence Classification
    2. A Convolutional Neural Network for Modelling Sentences

Thu Feb 14: Information from parts of words: Subword Models
  Out: Assignment 5
  Due: Project Proposal

Tue Feb 19: Modeling contexts of use: Contextual Representations and Pretraining

Thu Feb 21: Transformers (guest lecture by Ashish Vaswani)
  Out: Project Milestone
  Due: Assignment 5

Tue Feb 26: Natural Language Generation

Thu Feb 28: Reference in Language and Coreference Resolution

Tue Mar 5: Multitask Learning: A general model for NLP? (guest lecture by Richard Socher)

Thu Mar 7: Constituency Parsing and Tree Recursive Neural Networks
  Suggested Readings:
    1. Parsing with Compositional Vector Grammars
    2. Constituency Parsing with a Self-Attentive Encoder
  Due: Project Milestone

Tue Mar 12: Safety, Bias, and Fairness (guest lecture by Margaret Mitchell)

Thu Mar 14: Future of NLP + Deep Learning

Sun Mar 17:
  Due: Final Project Report

Wed Mar 20: Final project poster session, 6 - 9pm, McCaw Hall at the Alumni Center [map]
  Due: Poster
