This weekend I'd like to recommend NLPer-Interview, a GitHub project compiled by songyingxin. The repository mainly collects interview questions for NLP algorithm engineers:
https://github.com/songyingxin/NLPer-Interview
Lao Song is currently an algorithm engineer at Baidu and the author of Lao Song's Tea Book Club. The following is mainly taken from Lao Song's own description of the repository. Click "Read the original text" to go straight to the repository's main page. Giving it a star is recommended; the content is quite rich.
This repository mainly records the NLP-related knowledge I have accumulated. I took a lot of notes in the past, and now that the autumn recruitment season has arrived, I will work through this material as I review and organize the notes into topics, which also helps me review better.
I am also releasing it as open source, in the hope that everyone can help me fill in the related technology stack and spot where I am weaker, and that it helps others going through autumn recruitment to review as well. If you would like to collaborate, feel free to contact me; it is a bit much for one person to do all of this alone. Fortunately, I had already taken a lot of notes beforehand.
The notes are best opened in the Typora editor, which shows them in what-you-see-is-what-you-get form.
Contents
1. Basic Programming Language
This folder mainly records language details of Python and C++. These two languages are mainstream and are basically required. It is still a work in progress, with gaps being filled in.
C++ interview questions
Python interview questions
2. Mathematical foundation
This folder mainly records mathematics-related knowledge, including advanced mathematics, linear algebra, probability theory, and information theory. From Lao Song's own experience, these topics do get asked in interviews. It is still a work in progress, with gaps being filled in.
Probability theory
Advanced mathematics
Linear algebra
Information Theory
3. Computer basic theoretical knowledge
This part is generally not tested much, so I did not focus on it; so far I have been asked almost no questions in this area. Interestingly, I applied for an NLP algorithm position in a certain department at Alibaba, and the interviewer actually did not understand NLP. The whole interview was absurd: it was all about software development.
4. Machine Learning Fundamentals
This is where the main subject begins. Experience shows that some large companies do ask about basic machine learning algorithms, so I think it is necessary to know several core models in this part.
Machine learning project process
Discriminant model vs. generative model
Frequentist vs. Bayesian
Data preprocessing
Feature engineering
Feature Engineering-Association Rules
Model-SVM
Model-clustering algorithm
Model-Decision Tree
Model-Logistic Regression
Model-Naive Bayes
Model-Random Forest
Model-linear regression
5. Basics of Deep Learning
This part covers the basics of deep learning, which is the core material. In practice, many interviewers ask much the same questions, but I personally think it is still worth building an overall, comprehensive knowledge framework.
Deep learning project process
5.1 Basic theory
Basic Theory-Multi-Task Learning
Basic Theory-Ensemble Learning
Basic Theory-Evaluation Metrics for Classification Problems
Basic Theory-Distance Metrics
Basic Theory-Objective Function, Loss Function, Cost Function
Basic Theory-Bias vs. Variance, Underfitting vs. Overfitting
Basic Theory-Deep Learning from a Data Perspective
Basic Theory-Vanishing Gradients, Exploding Gradients
Basic Theory-Curse of Dimensionality
Basic Theory-Exponentially Weighted Average
Basic Theory-Local Minima, Saddle Points
5.2 Basic unit
Basic unit-CNN
Basic unit-MLP
Basic unit-RNN
5.3 Tuning related
Tuning-hyperparameter tuning
Tuning-activation function
Tuning-weight initialization scheme
Tuning-optimization algorithm
5.4 Tricks
Trick - Dropout
Trick - Normalization
Trick-Fusion training set, validation set, test set
Trick-early stopping
Trick-learning rate decay
Trick-regularization
6. Statistical Natural Language Processing
I took few notes on this part early on, so it has barely been started.
7. Deep learning natural language processing
This part can be regarded as the core knowledge. It still needs to be filled in gradually, and time is a bit tight.
Text data preprocessing
Evaluation metrics for major tasks
Some ideas for improving the NLP model
7.1 Trilogy of Word Vectors
Word Vector-Word2Vec
Word Vector-GloVe
Word Vector-FastText
7.2 Pre-trained language model
Pre-trained language model-research on improving BERT
Pre-trained language model-integrating knowledge graphs
Pre-trained language model-natural language generation
7.3 Attention mechanism
7.4 Text classification
7.5 Semantic matching
7.6 Reading Comprehension
8. Source code reading
This part mainly recommends source code that I have read. Some of it is related to NLP and some to deep learning; some of it carries my own annotations, which will be listed accordingly.
9. Lao Song's algorithm interview experiences
This part is mainly about some reflections from the interview process. Sigh, it nearly wore me down.
Reference
[1] DeepLearning-500-questions - a good repository
[2] Algorithm_Interview_Notes-Chinese - the material is somewhat dated, but also very good
Everything else comes mainly from my own day-to-day accumulation and from reading papers.
About AINLP
AINLP is a fun, AI-focused natural language processing community dedicated to sharing techniques in AI, NLP, machine learning, deep learning, recommendation algorithms, and related areas. Topics include text summarization, intelligent question answering, chatbots, machine translation, automatic generation, knowledge graphs, pre-trained models, recommender systems, computational advertising, job postings, and job-hunting experience. Welcome to follow! To join the technical discussion group, add AINLPer (id: ainlper) and note your work or research direction plus your reason for joining.