How BCELoss is calculated:
Sequence tagging: as in word segmentation, each word corresponds to exactly one part-of-speech tag, so the task resembles multi-class classification
Multi-label classification: a word can correspond to several part-of-speech tags at once, similar to the complex SPO (subject-predicate-object) annotations in relation extraction
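
To make the calculation concrete, here is a minimal sketch in plain Python of binary cross-entropy for the multi-label case, where each (word, tag) pair is scored as an independent yes/no decision. The function name `bce_loss`, the `eps` clamp, and the example probabilities are assumptions for illustration only, not part of any library. Averaging over all elements is assumed here because it matches the default `reduction='mean'` of `torch.nn.BCELoss`.

```python
import math

def bce_loss(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over every (word, tag) pair.

    probs   -- predicted probabilities in [0, 1], one row per word
    targets -- 0/1 labels with the same shape as probs
    eps     -- hypothetical clamp value that keeps log() away from 0
    """
    total = 0.0
    count = 0
    for p_row, y_row in zip(probs, targets):
        for p, y in zip(p_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            # Per-element loss: -[y*log(p) + (1-y)*log(1-p)]
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count

# Illustrative data (made up): 2 words, 3 candidate tags each.
probs = [[0.9, 0.2, 0.7],
         [0.1, 0.8, 0.3]]
targets = [[1, 0, 1],   # first word carries two tags at once (multi-label)
           [0, 1, 0]]   # second word carries a single tag

loss = bce_loss(probs, targets)
print(loss)
```

Because every tag gets its own independent term, a row of `targets` may contain any number of ones. That is what separates this setup from the one-tag-per-word sequence tagging case, where the tags compete with each other and a multi-class loss is the usual choice.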