A collection of useful blog posts on text classification

Text classification process:

Text preprocessing items:

1. https://www.cnblogs.com/jiangxinyang/p/10207482.html

Text classification with deep learning. I used the text CNN code directly, and it runs as-is.

2. https://blog.csdn.net/u010297828/article/details/50465263

Spam SMS classification.

3. http://xccds1977.blogspot.com/2015/05/word2vec.html

Classification with word2vec. This one averages the word vectors instead of using them as input to a neural network.

4. https://zmister.com/archives/173.html

SMS spam classification.

5.https://blog.csdn.net/qq_34695147/article/details/81006059#%E7%AC%AC%E4%BA%8C%E5%B1%8A%E6%90%9C%E7%8B%90%E5%86%85%E5%AE%B9%E8%AF%86%E5%88%AB%E5%A4%A7%E8%B5%9B%E5%86%A0%E5%86%9Bluckyrabbit%E5%9B%A2%E9%98%9F%E7%9A%84%E8%A7%A3%E5%86%B3%E6%96%B9%E6%A1%88

Hosted on CSDN, which annoyingly requires a membership to read.

6. https://zhuanlan.zhihu.com/p/26729228

A fairly fun hands-on project.

 

Some basic concepts needed for text classification:

1. https://www.cnblogs.com/wangbogong/p/3211833.html

Text representation: the vector space model and the bag-of-words model. This is the most basic kind of representation.

2. https://www.cnblogs.com/wangbogong/p/3251132.html

Feature selection, a way of optimizing the vector space model.

3. https://www.jiqizhixin.com/articles/2018-07-25-5

An introduction to some concepts in Chinese NLP. I need to read up on this further.

4. http://www.jeyzhang.com/text-classification-in-action.html

Feature engineering. I did not fully understand the part on extracting multiple features.

5.
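As a rough illustration of the bag-of-words representation mentioned in item 1 of the basic concepts above, the sketch below builds count vectors over a shared vocabulary using only the Python standard library. The whitespace tokenizer, the sample sentences, and the function names `build_vocabulary` and `vectorize` are assumptions made for this example, not taken from the linked posts. Chinese text would first need word segmentation.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({token for doc in documents for token in doc.lower().split()})
    return {token: index for index, token in enumerate(tokens)}


def vectorize(document, vocabulary):
    """Return the bag-of-words count vector of one document."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for token, count in counts.items():
        if token in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[token]] = count
    return vector


# Hypothetical sample documents, used only for illustration.
documents = ["free prize call now", "call me now now"]
vocabulary = build_vocabulary(documents)
vectors = [vectorize(doc, vocabulary) for doc in documents]
```

Word order is discarded: each document becomes a vector of counts, one position per vocabulary word, which is what makes the representation simple but lossy.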

First, text preprocessing

Mainly involves word segmentation with jieba.

Crawler module: scrapy.

1. https://docs.scrapy.org/en/latest/topics/architecture.html#topics-architecture

The overall Scrapy architecture diagram, which helps in understanding how Scrapy works.

2. https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html#find

Parsing with Beautiful Soup.

 

Second, feature selection

1. http://sklearn.lzjqsdd.com/modules/feature_extraction.html

Some sklearn implementations for converting text to features, based on the vector space model.

2. https://zhuanlan.zhihu.com/p/33779124

Also covers feature extraction with sklearn. A couple of the functions are very important.

3. https://blog.csdn.net/tianbwin2995/article/details/51693396

Another sklearn implementation of the bag-of-words model.

4. https://blog.csdn.net/u014595019/article/details/52433754

Combines the gensim and sklearn libraries for feature extraction.

5.

 

Third, model training

Fourth, deep learning

1. https://www.cnblogs.com/subconscious/p/5058741.html

An introduction to neural networks.

2. https://www.cnblogs.com/subconscious/p/4107357.html

More introductory material.

3. https://www.zhihu.com/question/22553761/answer/126474394

Another introduction.

4. https://www.sohu.com/a/235924191_633698

Backpropagation (BP).

5. https://blog.csdn.net/u014303046/article/details/78200010#351__296

More on backpropagation.

6. https://zhuanlan.zhihu.com/p/21930884

Neural networks in more depth.

7. https://blog.csdn.net/weixin_42137700/article/details/84302045

Some basic concepts

For neural networks you need to know things like what stochastic gradient descent is. These are very basic.
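To make stochastic gradient descent concrete, here is a minimal sketch that fits a line y = w * x + b by updating the parameters after each individual sample. The function name `sgd_linear_regression`, the learning rate, the epoch count, and the toy data are all assumptions chosen for this example. A real neural network applies the same update rule to many more parameters, with gradients obtained by backpropagation.

```python
import random


def sgd_linear_regression(samples, learning_rate=0.05, epochs=500, seed=0):
    """Fit y = w * x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    samples = list(samples)  # copy so the caller's list is not shuffled
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the samples in a new random order
        for x, y in samples:
            error = (w * x + b) - y
            # Gradient of 0.5 * error**2 with respect to w and b,
            # computed from this single sample only.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


# Hypothetical noise-free data drawn from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]
w, b = sgd_linear_regression(data)
```

The "stochastic" part is that each step uses one randomly ordered sample rather than the gradient over the whole dataset, which makes each update cheap at the cost of a noisier path.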

 

Fifth, word2vec

1. https://xiaosheng.me/2017/06/05/article67/

Statistical language models. Very important.

2. https://blog.csdn.net/qq_39422642/article/details/78658309

The development of word vectors.

3. https://blog.csdn.net/leadai/article/details/81369884

Also on the development of word vectors.

4. https://blog.csdn.net/a635661820/article/details/44130285

Principles of NNLM.

5. https://www.cnblogs.com/ooon/p/5558119.html

NNLM and word2vec

6. https://spaces.ac.cn/archives/4299

Principles of word2vec. Recommended.

7. https://blog.csdn.net/itplus/article/details/37969817

Recommended for word2vec. The page also has a URL for downloading the PDF.

8. https://blog.csdn.net/Z4a9Gx/article/details/80268126

Principles of word2vec.

9. https://blog.csdn.net/shuihupo/article/details/85156544

Training word2vec
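Since item 1 in this list singles out statistical language models as important background, here is a small sketch of the simplest useful case: a bigram model that estimates P(word | previous word) from raw counts. The toy corpus, the `<s>` and `</s>` boundary markers, and the function names are assumptions for illustration. No smoothing is applied, so unseen word pairs get probability 0, which real systems avoid.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count bigrams and return a function giving P(word | previous)."""
    bigram_counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for previous, word in zip(tokens, tokens[1:]):
            bigram_counts[previous][word] += 1

    def probability(word, previous):
        total = sum(bigram_counts[previous].values())
        # Unseen contexts get probability 0; no smoothing is applied here.
        return bigram_counts[previous][word] / total if total else 0.0

    return probability


def sentence_probability(sentence, probability):
    """Multiply the bigram probabilities along one sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    result = 1.0
    for previous, word in zip(tokens, tokens[1:]):
        result *= probability(word, previous)
    return result


# Hypothetical three-sentence corpus, used only for illustration.
corpus = ["i like text classification", "i like word vectors", "i read blogs"]
probability = train_bigram_model(corpus)
p_like_given_i = probability("like", "i")
p_sentence = sentence_probability("i like word vectors", probability)
```

NNLM and word2vec, covered by the later links, replace these count tables with learned vector representations, but they still model which words are likely to appear near which others.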

Sixth, text CNN

1.http://www.hackcv.com/index.php/archives/104/?hmsr=toutiao.io&utm_medium=toutiao.io&utm_source=toutiao.io

An introduction to the principles of text CNN.

2. http://www.52nlp.cn/tag/textcnn

text CNN

3. https://me.csdn.net/mytestmy

A personal profile page. I have forgotten why I saved it.

4.

 

Seventh, an introduction to some algorithms

1. https://www.cnblogs.com/zhoukui/p/8584085.html

Covers some common machine learning algorithms whose code can be adapted as a starting point. k-nearest neighbors, naive Bayes, logistic regression, and k-means clustering each come with an explanation of the idea and Python code.

2. https://github.com/csuldw/MachineLearning/tree/master/Kmeans

Code for the k-means algorithm.

3.
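Of the algorithms named in item 1 above, k-nearest neighbors is the shortest to write down, so it makes a reasonable sketch of the general idea. The version below is not the code from the linked posts. The function name `knn_predict`, the two-dimensional toy points, and the `ham`/`spam` labels are assumptions for illustration, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-cluster training set, used only for illustration.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

prediction_near_first = knn_predict(points, labels, (1.1, 1.0))
prediction_near_second = knn_predict(points, labels, (5.0, 5.1))
```

For text, the points would be document vectors such as bag-of-words counts rather than two-dimensional coordinates.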

Eighth, model evaluation

https://blog.csdn.net/sinat_26917383/article/details/75199996
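As one possible reading of what model evaluation involves for a classifier, the sketch below computes precision, recall, and F1 for a single positive class. Choosing these three metrics is an assumption of this example, as are the function name `precision_recall_f1`, the `spam` positive label, and the toy predictions. The linked post may cover other measures.

```python
def precision_recall_f1(true_labels, predicted_labels, positive="spam"):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels and predictions, used only for illustration.
actual = ["spam", "spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "spam", "ham", "spam", "ham", "ham"]
precision, recall, f1 = precision_recall_f1(actual, predicted)
```

Precision asks how many predicted positives were correct, and recall asks how many actual positives were found. For spam filtering the two often matter differently, since a missed spam message is usually cheaper than a legitimate message wrongly blocked.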

 


Origin www.cnblogs.com/meikon/p/11448617.html