nlp & python | Using BERT to extract molecular representations (ongoing)

Introduction

BERT has achieved great success in natural language processing (NLP). By training on unlabeled datasets, it produces large-scale models that learn complex language representations. The same approach can be applied to chemical representations, in particular SMILES sequences.
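As an illustration of the idea, here is a minimal sketch (not from the original post) that feeds a SMILES string through a BERT-style encoder from the Hugging Face transformers library and pools the token embeddings into a fixed-size molecular representation. The checkpoint name `seyonec/ChemBERTa-zinc-base-v1` is only an assumed example of a SMILES-pretrained model; any comparable checkpoint would work.

```python
# Sketch: extract a molecular representation from a SMILES string
# with a BERT-style encoder. The checkpoint is an assumed example.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "seyonec/ChemBERTa-zinc-base-v1"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin
inputs = tokenizer(smiles, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Two common pooling choices for a fixed-size molecular embedding:
cls_embedding = outputs.last_hidden_state[:, 0, :]      # first ([CLS]-style) token
mean_embedding = outputs.last_hidden_state.mean(dim=1)  # mean over all tokens
print(cls_embedding.shape)  # (1, hidden_size)
```

Either pooled vector can then be used as the molecular representation for downstream tasks such as property prediction.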

Self-supervised learning tasks

1. Masked language modeling (MaskedLM)
This is the canonical pretraining task proposed by BERT: the model is trained to predict the true identity of the masked tokens. It is optimized with the cross-entropy loss between the model's output at the masked positions and the original input tokens.
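Below is a minimal sketch of this objective applied to a SMILES string (not from the original post; the checkpoint name and the 15% masking rate are assumptions following standard BERT practice, and the 80/10/10 mask/random/keep split is simplified to always using the mask token). Only the masked positions contribute to the cross-entropy loss.

```python
# Sketch of the masked-language-modeling objective on SMILES, assuming the
# same BERT-style tokenizer/model as above. Unmasked positions get label -100
# so the cross-entropy loss ignores them.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "seyonec/ChemBERTa-zinc-base-v1"  # assumed example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

smiles = "CC(=O)Oc1ccccc1C(=O)O"
enc = tokenizer(smiles, return_tensors="pt")
input_ids = enc["input_ids"].clone()
labels = input_ids.clone()

# Randomly mask 15% of the non-special tokens, as in BERT.
special = torch.tensor(
    tokenizer.get_special_tokens_mask(
        input_ids[0].tolist(), already_has_special_tokens=True
    ),
    dtype=torch.bool,
).unsqueeze(0)
mask = (torch.rand(input_ids.shape) < 0.15) & ~special
if not mask.any():
    mask[0, 1] = True  # ensure at least one position is masked

input_ids[mask] = tokenizer.mask_token_id  # simplified: always use the mask token
labels[~mask] = -100                       # ignore unmasked positions in the loss

out = model(input_ids=input_ids, attention_mask=enc["attention_mask"], labels=labels)
print(out.loss)  # cross-entropy between predictions and the original tokens
```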

2. SMILES


Origin: blog.csdn.net/weixin_43236007/article/details/112623862