Bo Yu Gongyi AI "Dive into Deep Learning (PyTorch Edition)" Task 02 study notes

Task 02: text preprocessing; language models; recurrent neural network basics

To be honest, these three lessons are really hardcore. (They hit right at my knowledge blind spots.)

Before this I was doing face recognition with the guys in my group, and along the way I got the itch to try NLP. This stuff sounds super high-end, doesn't it (actually it's already quite mature, isn't it).

The question I had while watching these three videos was: how is this stuff actually implemented? How do you feed in a sentence and get a sentence back out? Since I had never looked anything up, if I had to do it myself I wouldn't know where to start.

This is where the big shots show how impressive they are: where we can't even figure out how to begin, they build the whole framework up bit by bit.

First up is text preprocessing. At the shallow level we're learning now, it's mainly tokenization: English is split into words, while Chinese is treated even more brutally, simply split into individual characters (which obviously has drawbacks). Then comes word-frequency counting over the corpus for each word (or character), and building a correspondence between words and IDs, so that every word (or character) is mapped to a number (a bijection, by the way!).
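A minimal sketch of this idea (not the course's exact code; the function names `tokenize` and `build_vocab` and the `<unk>` placeholder are my own assumptions):

```python
import collections

def tokenize(lines, token='word'):
    # Split each line into word tokens (English) or character tokens (Chinese)
    if token == 'word':
        return [line.split() for line in lines]
    return [list(line) for line in lines]

def build_vocab(tokens):
    # Count token frequencies over the whole corpus
    counter = collections.Counter(tok for line in tokens for tok in line)
    # Sort by frequency (most common first) and assign integer IDs;
    # index 0 is reserved for an unknown-token placeholder (my own convention here)
    idx_to_token = ['<unk>'] + [tok for tok, _ in counter.most_common()]
    token_to_idx = {tok: idx for idx, tok in enumerate(idx_to_token)}
    return token_to_idx, idx_to_token
```

With `token_to_idx` and `idx_to_token` you can map back and forth between words and numbers, which is exactly the bijection mentioned above.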

The second part is the language model. You don't know until you study it; once you do, it's a whole different story. The earliest language model is pretty crude: the n-gram model, which just takes the Markov assumption and applies it directly; the whole derivation isn't even half as hard as the Bayesian filter in SLAM. So it has a lot of problems. According to the big shots in the course discussion forum: besides sparse features and a huge parameter space, Chinese text processing is harder because Chinese word semantics are more complex than English, the vocabulary is smaller, features are sparse, and classification accuracy suffers (that's what the big shots say; I wouldn't really know). In short, text preprocessing is important, and the language model is even more important.
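As a concrete illustration of the Markov idea, $P(w_t \mid w_1, \ldots, w_{t-1}) \approx P(w_t \mid w_{t-n+1}, \ldots, w_{t-1})$, here is a toy bigram (n = 2) estimate by counting. This is my own sketch, assuming `tokens` is a flat list of tokens from the corpus, not the course's implementation:

```python
import collections

def bigram_probability(tokens, w1, w2):
    # Maximum-likelihood estimate under a first-order Markov assumption:
    # P(w2 | w1) ≈ count(w1, w2) / count(w1)
    unigram = collections.Counter(tokens)
    bigram = collections.Counter(zip(tokens, tokens[1:]))
    if unigram[w1] == 0:
        return 0.0
    return bigram[(w1, w2)] / unigram[w1]
```

The "huge parameter space" problem is easy to see from this: for an n-gram you would need counts for every possible n-tuple of words, and most of them never appear in the corpus (sparse features).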

There's one more thing under the language model: sampling time-series data (a sentence itself carries temporal information). There are two approaches: random sampling and consecutive (adjacent) sampling. Random sampling means the whole corpus is cut in advance into segments whose length equals the time step, with no overlap between segments; at training time you just grab segments at random from this pool until they are used up. Consecutive sampling is more interesting. First the corpus is truncated so that the kept part is an integer multiple of batch_size and as long as possible (the code handles this nicely). Then this kept part is split evenly into batch_size sections (imagine a long sequence divided into batch_size equal pieces), and these sections are stacked vertically, which corresponds to a matrix with batch_size rows. When fetching data, we read this matrix in column blocks according to the time step, which corresponds to a window sliding from left to right over the matrix: the window is batch_size high and one time step wide. Strictly speaking it doesn't slide continuously; each fetch, the window jumps right by one time step relative to the previous position. Note! This leads to a very interesting phenomenon, which we'll come back to below, hee hee.
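Here is a minimal sketch of consecutive sampling in PyTorch, based on my understanding of the course code rather than a copy of it; `corpus_indices` is assumed to already be a list of token IDs:

```python
import torch

def data_iter_consecutive(corpus_indices, batch_size, num_steps, device=None):
    # Keep as many tokens as possible while making the length a multiple of batch_size
    data_len = len(corpus_indices) // batch_size * batch_size
    indices = torch.tensor(corpus_indices[:data_len], dtype=torch.long, device=device)
    # Stack the batch_size equal sections vertically: shape (batch_size, data_len // batch_size)
    indices = indices.view(batch_size, -1)
    # Each fetch moves the window right by num_steps; the label Y is X shifted by one token
    epoch_size = (indices.shape[1] - 1) // num_steps
    for i in range(epoch_size):
        start = i * num_steps
        X = indices[:, start: start + num_steps]
        Y = indices[:, start + 1: start + num_steps + 1]
        yield X, Y
```

Because each row of the matrix is a contiguous piece of the corpus, the last column of one batch is immediately followed (in the original text) by the first column of the next batch; that's the "interesting phenomenon" that matters for the RNN below.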

Finally, the basics of recurrent neural networks. Damn, it took me forever to get one thing straight: the weights and biases are shared across time steps. (I'm getting old; my brain can't keep up.) The model itself is actually pretty easy to understand; the clever part is the hidden layer in the middle, which outputs state information. Note that computing an RNN's hidden state needs two quantities, $H_{t-1}$ and $X_t$. This $H_{t-1}$ is interesting: it is essentially the information left in the hidden layer by the previous word (actually it carries information from all the preceding words). So with consecutive sampling we can keep the $H_{t-1}$ left by the last word of the previous batch, detach it, and pass it on to the first word of the next batch; random sampling can't enjoy this benefit.
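To make the shared-weights point concrete, here is a toy single step of a vanilla RNN plus the detach trick between consecutive batches. The parameter names are my own and this is only a sketch, not the course implementation:

```python
import torch

def rnn_step(X_t, H_prev, W_xh, W_hh, b_h, W_hq, b_q):
    # The same W_xh, W_hh, b_h (and output parameters) are reused at every time step
    H_t = torch.tanh(X_t @ W_xh + H_prev @ W_hh + b_h)  # H_t = phi(X_t W_xh + H_{t-1} W_hh + b_h)
    O_t = H_t @ W_hq + b_q                               # output for this time step
    return O_t, H_t

# With consecutive sampling, the final hidden state of one batch can seed the next batch,
# but it must be detached so gradients don't flow back through all earlier batches:
# H = H.detach()
```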

That's about it for the difficulties and key points. One more thing: the example code given in the course is written really simply and elegantly.

Origin blog.csdn.net/ICE_KoKi/article/details/104287549