I have already given a simple getting-started guide to MatchZoo; this time I will cover automatic parameter tuning for a model.
The variables used here are the same as those in the getting-started guide, so I always copy that code directly and delete the parts I don't need.
Load the data:
import matchzoo as mz

# This essentially just maps each token to an id, so Chinese text does not need word segmentation
train_pack_processed = preprocessor.fit_transform(train)
dev_pack_processed = preprocessor.transform(dev)
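To make the "token to id" idea concrete, here is a minimal sketch of a character-level vocabulary. It is not MatchZoo's implementation: the helper names `fit_vocab` and `transform`, the reserved ids `PAD` and `OOV`, and the sample sentences are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical reserved ids for padding and for characters never seen in training
PAD, OOV = 0, 1


def fit_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign an integer id to every distinct character in the training texts."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def transform(text: str, vocab: Dict[str, int], length: int) -> List[int]:
    """Map characters to ids, then truncate or pad to a fixed length."""
    ids = [vocab.get(ch, OOV) for ch in text][:length]
    return ids + [PAD] * (length - len(ids))


train_texts = ["今天天气好", "天气不错"]
vocab = fit_vocab(train_texts)
encoded = transform("天气很好", vocab, length=6)
print(encoded)  # [3, 4, 1, 5, 0, 0]
```

The vocabulary is built from the training texts only, which is why the dev set goes through a plain transform: the unseen character 很 falls back to the unknown id instead of getting a new one.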
Define the model:
# You can start from the official default parameters; any parameter that already has a default does not need to be written out.
def build():
    # As before, see the paper linked from the official site for the DUET network; it is not explained here.
    # The model parameters are not explained either; see the official documentation.
    model = mz.models.DUET()
    # Define the loss function. This uses the ranking cross-entropy loss; there is also a
    # classification cross-entropy loss, depending on how you define your data.
    ranking_task = mz.tasks.Ranking(loss=mz.losses.RankCrossEntropyLoss(num_neg=1))
    model.params['input_shapes'] = preprocessor.context['input_shapes']
    # On older versions, add 1 here to account for the UNK token; newer versions have fixed this.
    model.params['embedding_input_dim'] = preprocessor.context['vocab_size']
    model.params['embedding_output_dim'] = 300
    model.params['task'] = ranking_task
    model.params['optimizer'] = 'adam'
    model.params['padding'] = 'same'
    model.params['lm_filters'] = 32
    model.params['lm_hidden_sizes'] = [32]
    model.params['dm_filters'] = 32
    model.params['dm_kernel_size'] = 3
    model.params['dm_d_mpool'] = 3
    model.params['dm_hidden_sizes'] = [32]
    model.params['activation_func'] = 'relu'
    model.params['dropout_rate'] = 0.32
    model.params['embedding_trainable'] = True
    model.guess_and_fill_missing_params(verbose=0)
    model.params.completed()
    model.build()
    model.backend.summary()
    model.compile()
    return model
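For readers wondering what the ranking cross-entropy loss with num_neg=1 computes, here is a rough sketch in plain Python. It assumes the loss is a softmax over one positive score and its sampled negatives, followed by the negative log-probability of the positive; that is my reading of the loss, not MatchZoo's actual code, and the function name `rank_cross_entropy` is made up for this example.

```python
import math
from typing import List


def rank_cross_entropy(pos_score: float, neg_scores: List[float]) -> float:
    """Negative log softmax probability of the positive document in its group."""
    scores = [pos_score] + list(neg_scores)
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    prob_pos = exps[0] / sum(exps)
    return -math.log(prob_pos)


# With num_neg=1, each group holds one positive and one negative score.
loss_equal = rank_cross_entropy(1.0, [1.0])  # the model cannot tell them apart
loss_good = rank_cross_entropy(3.0, [0.0])   # the positive is ranked clearly higher
print(loss_equal, loss_good)
```

When both documents get the same score the loss is log 2, and it shrinks toward zero as the positive pulls ahead of the negative.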
Core code for automatically tuning the model:
model = build()
tuner = mz.auto.Tuner(
    params=model.params,  # model parameters
    train_data=train_pack_processed,  # training set
    test_data=dev_pack_processed,  # validation set
    num_runs=10  # number of tuning runs
)
results = tuner.tune()
print(results)
If num_runs is 10, the tuner tries 10 parameter combinations and outputs each one together with its score, then finally outputs the combination with the highest score.
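The sample-score-keep-the-best loop can be sketched without MatchZoo at all. The example below uses plain random search as a stand-in, which is not necessarily the strategy the tuner uses internally; the search space, the toy objective, and the shape of the returned dictionary are all assumptions made for illustration.

```python
import random
from typing import Any, Callable, Dict, List


def random_search(
    space: Dict[str, List[Any]],
    objective: Callable[[Dict[str, Any]], float],
    num_runs: int,
    seed: int = 0,
) -> Dict[str, Any]:
    """Sample num_runs parameter combinations, score each, and keep the best."""
    rng = random.Random(seed)
    trials = []
    for run in range(num_runs):
        sample = {name: rng.choice(choices) for name, choices in space.items()}
        trials.append({"#": run, "sample": sample, "score": objective(sample)})
    best = max(trials, key=lambda t: t["score"])
    return {"best": best, "trials": trials}


# Hypothetical search space; a real run would train and evaluate a model instead.
space = {"dropout_rate": [0.1, 0.3, 0.5], "lm_filters": [16, 32, 64]}


def toy_objective(sample: Dict[str, Any]) -> float:
    """Toy score that peaks at dropout_rate=0.3 and lm_filters=32."""
    return 1.0 - abs(sample["dropout_rate"] - 0.3) - abs(sample["lm_filters"] - 32) / 100


results = random_search(space, toy_objective, num_runs=10)
for trial in results["trials"]:
    print(trial)
print("best:", results["best"])
```

Each run is independent, so raising num_runs only costs more time; in a real tuning job every run means training a model, so 10 runs is a reasonable starting point.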