A Beginner Doing Research (3): Data Processing

After reading a lot of papers, we should have some sense of the overall progress and context of the field, so the next step is to try reproducing some results~

To reproduce someone else's code, you first need to get hold of the datasets they used. Once you have the data, you can start processing it. This article mainly collects some of the better blogs I have come across.

Platform

I think PyTorch is cooler than TensorFlow (hhhh), and many senior labmates also say PyTorch is better, so you can start with the tutorials on the official website~

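  • Note that after installing PyTorch you can use the GPU without installing CUDA yourself, because the official PyTorch binaries ship with the CUDA runtime libraries they need (you still need an NVIDIA driver). That bundled setup seems limited, though: for example, compiling custom CUDA extensions requires nvcc from the full CUDA toolkit. If you want the complete setup, you should still honestly install CUDA + cuDNN.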
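To check whether the CUDA runtime bundled with your PyTorch build can actually reach a GPU, a small script like the one below may help. It is a sketch: the helper names `describe_gpu_setup` and `pick_device` are hypothetical, while `torch.cuda.is_available()`, `torch.version.cuda` and `torch.backends.cudnn` are standard PyTorch attributes. It also runs on CPU-only machines.

```python
import torch


def describe_gpu_setup() -> dict:
    """Hypothetical helper: report what this PyTorch build can see."""
    cudnn_ok = torch.backends.cudnn.is_available()
    return {
        "torch": torch.__version__,
        # CUDA runtime version this binary was built with (None for CPU-only builds)
        "cuda_runtime": torch.version.cuda,
        # True only if a usable NVIDIA driver and GPU are present
        "gpu_available": torch.cuda.is_available(),
        # cuDNN version bundled with the build, if any
        "cudnn": torch.backends.cudnn.version() if cudnn_ok else None,
    }


def pick_device() -> torch.device:
    """Hypothetical helper: use the GPU when available, otherwise fall back to CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


if __name__ == "__main__":
    print(describe_gpu_setup())
    x = torch.ones(3, device=pick_device())
    print(x.device)
```

If `cuda_runtime` is set but `gpu_available` is `False`, the usual suspect is the NVIDIA driver rather than PyTorch itself.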

Data processing

To train a model, it is best to feed the data in batches, so we need to read in the data and then organize it into batches. PyTorch provides two very convenient tools for this, Dataset and DataLoader: once you have defined your Dataset, you can wrap it in a DataLoader and use it directly.
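A minimal sketch of that pattern, assuming small in-memory toy features and labels (the class `PairDataset` and the function `make_loader` are hypothetical names for illustration). A map-style Dataset only needs `__len__` and `__getitem__`; DataLoader then handles batching and shuffling.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class PairDataset(Dataset):
    """Hypothetical map-style dataset wrapping in-memory (feature, label) pairs."""

    def __init__(self, features, labels):
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same length")
        self.features = torch.tensor(features, dtype=torch.float32)
        self.labels = torch.tensor(labels, dtype=torch.long)

    def __len__(self):
        # DataLoader uses this to know how many samples exist
        return len(self.labels)

    def __getitem__(self, idx):
        # Return one sample; DataLoader's default collate stacks samples into a batch
        return self.features[idx], self.labels[idx]


def make_loader(batch_size=4, shuffle=True):
    # Toy data: 10 samples with 2 features each and alternating 0/1 labels
    features = [[float(i), float(2 * i)] for i in range(10)]
    labels = [i % 2 for i in range(10)]
    return DataLoader(PairDataset(features, labels), batch_size=batch_size, shuffle=shuffle)


if __name__ == "__main__":
    for x, y in make_loader():
        print(x.shape, y.shape)
```

With 10 samples and `batch_size=4`, the last batch holds only 2 samples; pass `drop_last=True` to DataLoader if your model needs every batch to be full.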

I want to highlight torchtext, a library for text processing with a rich set of convenient features. Some of the text tutorials on the PyTorch website actually use it, but they never explain its usage systematically. So I googled around and found this article, which is very clear; there is another one too, but its page layout is so ugly that I didn't read it carefully.
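torchtext's interface has changed considerably between releases, so rather than guess at one version's API, here is a hedged sketch of the steps such a library automates (tokenize, build a vocabulary, map tokens to ids, pad into batches) written in plain PyTorch. This swaps in a hand-rolled pipeline: whitespace tokenization, the `<pad>`/`<unk>` tokens and every helper name below are my own assumptions, not torchtext's API. Only `pad_sequence` and `DataLoader` are real PyTorch calls.

```python
from collections import Counter

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

PAD, UNK = "<pad>", "<unk>"  # assumed special tokens


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for a real tokenizer
    return text.lower().split()


def build_vocab(texts, min_freq=1):
    # Map each token seen at least min_freq times to an id; ids 0 and 1 are reserved
    counts = Counter(tok for t in texts for tok in tokenize(t))
    itos = [PAD, UNK] + sorted(tok for tok, c in counts.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(itos)}


def numericalize(text, vocab):
    # Unknown tokens fall back to the <unk> id
    return torch.tensor([vocab.get(tok, vocab[UNK]) for tok in tokenize(text)], dtype=torch.long)


def make_collate(vocab):
    def collate(batch):
        texts, labels = zip(*batch)
        seqs = [numericalize(t, vocab) for t in texts]
        lengths = torch.tensor([len(s) for s in seqs])
        # Pad every sequence in the batch to the longest one, using the <pad> id
        padded = pad_sequence(seqs, batch_first=True, padding_value=vocab[PAD])
        return padded, lengths, torch.tensor(labels)
    return collate


# Toy (text, label) pairs; a plain list works as a map-style dataset
data = [("a good movie", 1), ("bad plot", 0), ("really good acting", 1)]
vocab = build_vocab([t for t, _ in data])
loader = DataLoader(data, batch_size=2, collate_fn=make_collate(vocab))

if __name__ == "__main__":
    for padded, lengths, labels in loader:
        print(padded, lengths, labels)
```

Padding per batch in `collate_fn`, instead of padding the whole corpus to one global length up front, keeps short batches small; the returned `lengths` can later be passed to `pack_padded_sequence` if you feed an RNN.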

In addition

I "referenced" (read: copied) the code from Microsoft's open-source recommender system project and ported part of its data processing to PyTorch (honestly, I mostly just copied it... I changed a small part, added some comments, and removed some abstractions that a small project doesn't need). For details, see my repository.

I am still exploring this myself and will keep updating the post. I hope to exchange ideas with many of you.

Origin blog.csdn.net/namespace_Pt/article/details/109436260