Text classification process:
Worked examples of text classification:
1. https://www.cnblogs.com/jiangxinyang/p/10207482.html
A blog on text classification with deep learning; its text-CNN code can be run directly.
2.https://blog.csdn.net/u010297828/article/details/50465263
Classifying spam SMS messages.
3.http://xccds1977.blogspot.com/2015/05/word2vec.html
Classification with word2vec; averaged word vectors are used as the neural-network input.
4.https://zmister.com/archives/173.html
SMS spam classification.
5. A CSDN post, but it is paywalled (members only).
6. https://zhuanlan.zhihu.com/p/26729228
A more interesting real-world example.
Basic concepts needed for text classification:
1.https://www.cnblogs.com/wangbogong/p/3211833.html
Text representation: the vector space model and bag-of-words model, the most basic approach.
2.https://www.cnblogs.com/wangbogong/p/3251132.html
Feature selection, an optimization of the vector space model.
3. https://www.jiqizhixin.com/articles/2018-07-25-5
A conceptual introduction to Chinese NLP; worth studying in more depth.
4.http://www.jeyzhang.com/text-classification-in-action.html
Feature engineering. Covers several feature-extraction methods; I did not fully understand them.
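The vector space / bag-of-words model from item 1 can be sketched in a few lines of plain Python. This is a minimal illustration only; in practice sklearn's CountVectorizer (see the feature-selection section below) does this work:

```python
from collections import Counter

def build_vocab(docs):
    """Collect the sorted set of all tokens across the documents."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: i for i, tok in enumerate(vocab)}

def bow_vector(doc, vocab):
    """Map one document to a term-count vector over the vocabulary."""
    counts = Counter(doc.split())
    return [counts.get(tok, 0) for tok in vocab]

# Toy corpus (made up for illustration).
docs = ["spam spam offer", "meeting notes offer"]
vocab = build_vocab(docs)
vectors = [bow_vector(d, vocab) for d in docs]
```

Each document becomes a fixed-length count vector; word order is discarded, which is exactly the "bag" in bag-of-words.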
First, text preprocessing
Mainly covers jieba ("结巴") word segmentation.
Crawler module: scrapy.
1.https://docs.scrapy.org/en/latest/topics/architecture.html#topics-architecture
The overall scrapy architecture diagram; helps in understanding scrapy.
2.https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html#find
Parsing HTML with Beautiful Soup.
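For Chinese text the segmentation step would be done with jieba as noted above; for whitespace-delimited text, a minimal preprocessing pass (lowercase, strip punctuation, drop stopwords) can be sketched like this. The stopword list here is a tiny made-up example:

```python
import re

# A tiny illustrative stopword list (hypothetical; real lists are much larger).
STOPWORDS = {"the", "a", "is", "of"}

def preprocess(text):
    """Lowercase, keep only alphanumeric runs, then drop stopwords.
    For Chinese, the re.findall step would be replaced by a call to
    a segmenter such as jieba.cut."""
    text = text.lower()
    tokens = re.findall(r"[a-z0-9]+", text)
    return [t for t in tokens if t not in STOPWORDS]

tokens = preprocess("The price of SPAM is low!")
```

The cleaned token list is what the feature-extraction step below consumes.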
Second, feature selection
1.http://sklearn.lzjqsdd.com/modules/feature_extraction.html
sklearn implementations of text-to-vector conversion based on the vector space model.
2. https://zhuanlan.zhihu.com/p/33779124
More sklearn-based feature extraction; a couple of the functions are very important.
3.https://blog.csdn.net/tianbwin2995/article/details/51693396
Another bag-of-words implementation with sklearn.
4.https://blog.csdn.net/u014595019/article/details/52433754
Combining the gensim and sklearn libraries for feature extraction.
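The sklearn posts above revolve around TF-IDF weighting; the weighting itself can be sketched in plain Python. This is a simplified variant (tf = raw count, idf = log(N/df)); sklearn's TfidfVectorizer additionally applies smoothing and normalization:

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists. Returns one {token: weight} dict per doc.
    Simplified scheme: tf = raw count, idf = log(N / df)."""
    n = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({tok: cnt * math.log(n / df[tok]) for tok, cnt in tf.items()})
    return out

# Toy corpus: "offer" appears in every document, so its idf (and weight) is 0.
weights = tfidf([["spam", "offer", "spam"], ["meeting", "offer"]])
```

Tokens that occur in every document get weight zero, which is the feature-selection effect the posts above describe.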
Third, model training
Fourth, deep learning
1. https://www.cnblogs.com/subconscious/p/5058741.html
An introduction to neural networks.
2. https://www.cnblogs.com/subconscious/p/4107357.html
A continuation of the introduction.
3.https://www.zhihu.com/question/22553761/answer/126474394
Another introductory piece.
4.https://www.sohu.com/a/235924191_633698
Backpropagation (BP).
5.https://blog.csdn.net/u014303046/article/details/78200010#351__296
More on backpropagation.
6. https://zhuanlan.zhihu.com/p/21930884
Going deeper into neural networks.
7.https://blog.csdn.net/weixin_42137700/article/details/84302045
Some basic concepts a neural-network beginner needs to know, such as what stochastic gradient descent is. Very basic.
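Stochastic gradient descent, mentioned in the last link, can be shown on the smallest possible problem: fitting y = w·x by squared error, one randomly chosen sample per step. A minimal sketch with made-up data:

```python
import random

# Data generated from the true relation y = 3x (no noise),
# so w should converge close to 3.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

w = 0.0      # initial parameter
lr = 0.05    # learning rate
rng = random.Random(42)
for step in range(2000):
    x, y = rng.choice(data)      # pick one sample at random ("stochastic")
    grad = 2 * (w * x - y) * x   # d/dw of the squared error (w*x - y)^2
    w -= lr * grad               # step against the gradient
```

Each step uses only one sample's gradient instead of the whole dataset; that is the only difference from plain gradient descent.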
Fifth, word2vec
1.https://xiaosheng.me/2017/06/05/article67/
Statistical language models; very important.
2.https://blog.csdn.net/qq_39422642/article/details/78658309
The development of word vectors.
3. https://blog.csdn.net/leadai/article/details/81369884
More on the development of word vectors.
4.https://blog.csdn.net/a635661820/article/details/44130285
NNLM principle
5. https://www.cnblogs.com/ooon/p/5558119.html
NNLM and word2vec
6.https://spaces.ac.cn/archives/4299
word2vec principle; recommended.
7.https://blog.csdn.net/itplus/article/details/37969817
word2vec, also recommended; the page links a PDF download of the material.
8.https://blog.csdn.net/Z4a9Gx/article/details/80268126
word2vec principle
9.https://blog.csdn.net/shuihupo/article/details/85156544
Training word2vec
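Item 3 in the first section uses averaged word vectors as the classifier input; the averaging step itself is simple. A sketch with made-up toy vectors (real ones would come from a trained word2vec model, e.g. via gensim):

```python
# Toy 3-dimensional word vectors, made up for illustration; in practice
# they would be looked up from a trained word2vec model.
vectors = {
    "cheap":   [0.9, 0.1, 0.0],
    "offer":   [0.8, 0.2, 0.1],
    "meeting": [0.1, 0.9, 0.5],
}

def doc_vector(tokens, vectors):
    """Average the vectors of the known tokens into one document vector;
    unknown tokens are simply skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * len(next(iter(vectors.values())))
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

vec = doc_vector(["cheap", "offer", "unknownword"], vectors)
```

The resulting fixed-length vector can be fed to any classifier, which is why averaging is such a common baseline.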
Sixth, text CNN
1. An introduction to the principle.
2.http://www.52nlp.cn/tag/textcnn
text CNN
3.https://me.csdn.net/mytestmy
A personal CSDN page; I have forgotten why I saved it.
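The core operation of a text CNN is a 1-D convolution over the sentence's word-vector matrix followed by max-over-time pooling. A plain-Python sketch with one filter and made-up numbers (real implementations use PyTorch/TensorFlow with an embedding layer and many filters):

```python
def conv1d_maxpool(sent_vecs, filt):
    """Slide one filter (window_size x dim) over the word-vector sequence,
    then take the max over all window positions (max-over-time pooling)."""
    win = len(filt)
    dim = len(filt[0])
    scores = []
    for start in range(len(sent_vecs) - win + 1):
        window = sent_vecs[start:start + win]
        s = sum(window[i][j] * filt[i][j]
                for i in range(win) for j in range(dim))
        scores.append(s)
    return max(scores)

# Toy sentence of four 2-d word vectors and one 2x2 filter (made-up values).
sent = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filt = [[1.0, 0.0], [0.0, 1.0]]
feature = conv1d_maxpool(sent, filt)
```

Each filter yields one pooled feature; concatenating the features of many filters gives the vector the final classification layer sees.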
Seventh, introductions to some algorithms
1. https://www.cnblogs.com/zhoukui/p/8584085.html
Covers some common machine-learning algorithms and can serve as a base for modification: k-nearest neighbors, naive Bayes, logistic regression, and k-means clustering, each with the underlying idea and Python code.
2.https://github.com/csuldw/MachineLearning/tree/master/Kmeans
k-means algorithm code.
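The k-means loop from the links above fits in a few lines: assign points to their nearest center, recompute each center as the mean of its cluster, repeat. A minimal 1-D sketch with fixed initial centers so the result is deterministic:

```python
def kmeans_1d(points, centers, iters=20):
    """Minimal 1-D k-means: repeatedly assign each point to its nearest
    center, then move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Keep a center unchanged if no points were assigned to it.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Two obvious groups around 1 and 9 (made-up data).
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers = kmeans_1d(points, centers=[0.0, 5.0])
```

Real libraries (e.g. sklearn's KMeans) add smarter initialization and a convergence test, but the alternating assign/update structure is the same.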
Eighth, model evaluation
https://blog.csdn.net/sinat_26917383/article/details/75199996
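The evaluation metrics the linked post computes with sklearn can be written out by hand, which makes their definitions concrete:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class,
    computed from true/false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy labels (made up): 3 true positives in y_true, one wrong each way.
p, r, f1 = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

sklearn's precision_score / recall_score / f1_score compute the same quantities, plus averaging options for the multi-class case.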