1. Data Preparation
1.1 Building a corpus
If no corpus file (such as corpus.txt) is provided, you can build one from the training set and test set data. The code is as follows (code file name):
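# Collect unique lines; a set drops duplicates automatically.
filtered_line = set()
with open('../../data/raw/train.txt', 'r') as f:
    line = f.readline()
    while line:
        # The last line of a file may lack a trailing newline, so add one.
        if line[-1] != '\n':
            line += '\n'
        filtered_line.add(line)
        line = f.readline()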