Introduction to must-link and cannot-link constraints

When reading papers, I often come across must-link and cannot-link constraints in the related work sections, especially in the field of clustering.

Wikipedia introduction:

A must-link constraint is used to specify that the two instances in the must-link relation should be associated with the same cluster. A cannot-link constraint is used to specify that the two instances in the cannot-link relation should not be associated with the same cluster.

must-link constraints:

A must-link constraint means that two samples must belong to the same cluster. If two samples are joined by a must-link constraint, they should be grouped into the same cluster during clustering.

cannot-link constraints:

A cannot-link constraint means that two samples must not belong to the same cluster. If two samples are joined by a cannot-link constraint, they should not be grouped into the same cluster during clustering.

These two constraints are generally used as pairwise constraints that serve as guidance information, and they are most commonly seen in metric learning and semi-supervised clustering.
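
The two definitions above can be checked mechanically against any cluster assignment. The following is a minimal sketch, not taken from any particular paper: the function name `violated_constraints`, the sample names, and the toy labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]


def violated_constraints(
    labels: Dict[Hashable, int],
    must_link: List[Pair],
    cannot_link: List[Pair],
) -> Tuple[List[Pair], List[Pair]]:
    """Return the must-link and cannot-link pairs that the assignment breaks."""
    # A must-link pair is broken when its two samples are in different clusters.
    broken_ml = [(a, b) for a, b in must_link if labels[a] != labels[b]]
    # A cannot-link pair is broken when its two samples share a cluster.
    broken_cl = [(a, b) for a, b in cannot_link if labels[a] == labels[b]]
    return broken_ml, broken_cl


# Hypothetical toy data: four samples assigned to two clusters.
labels = {"x1": 0, "x2": 0, "x3": 1, "x4": 1}
must_link = [("x1", "x2"), ("x2", "x3")]
cannot_link = [("x1", "x4"), ("x3", "x4")]

broken_ml, broken_cl = violated_constraints(labels, must_link, cannot_link)
print(broken_ml)  # [('x2', 'x3')]
print(broken_cl)  # [('x3', 'x4')]
```

A semi-supervised clustering method could use such a check either as a hard rule (reject any assignment that breaks a constraint) or as a soft penalty (count the broken pairs); which one applies depends on the method.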

matrix decomposition

These two constraints also appear in matrix decomposition work, as shown in the following figure (when manifold regularization terms are used):
[Figure: must-link and cannot-link constraints in matrix decomposition with manifold regularization terms]
The values in the constraint matrix are positive for dissimilar objects and negative for similar objects. The former are called cannot-link constraints because they impose a penalty on the current approximation of the matrix factors, and the latter are must-link constraints, which reduce the value of the objective function during optimization.

Under a must-link constraint, a pair of objects of the same type should be closer together in the latent representation space. Drug-drug interaction is one example of a relation that gives rise to must-link constraints. A cannot-link constraint, in contrast, means that dissimilar objects should be farther apart in the latent representation space. In general, data sources that provide must-link constraints are more abundant.
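
To make the sign convention concrete, here is a minimal sketch of one way a constraint matrix can enter an objective. It assumes a penalty of the form sum over i, j of theta[i][j] * <G[i], G[j]> (that is, tr(G^T Theta G)), where row G[i] is the latent profile of object i. This form is an assumption made for illustration and may differ from the term in the original figure. The names `constraint_penalty`, `G`, and the toy matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two latent profiles."""
    return sum(a * b for a, b in zip(u, v))


def constraint_penalty(G: Matrix, theta: Matrix) -> float:
    """Compute sum_ij theta[i][j] * <G[i], G[j]>, i.e. tr(G^T Theta G)."""
    n = len(G)
    return sum(theta[i][j] * dot(G[i], G[j]) for i in range(n) for j in range(n))


# Hypothetical latent profiles: objects 0 and 1 coincide, object 2 is orthogonal.
G = [
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
]

# Must-link between objects 0 and 1: negative entries (assumed symmetric).
must_link_01 = [[0.0, -1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# Cannot-link between objects 0 and 1: positive entries.
cannot_link_01 = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
# Cannot-link between objects 0 and 2, which are already far apart.
cannot_link_02 = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

print(constraint_penalty(G, must_link_01))    # -2.0 (lowers the objective)
print(constraint_penalty(G, cannot_link_01))  # 2.0 (penalizes the objective)
print(constraint_penalty(G, cannot_link_02))  # 0.0 (constraint already satisfied)
```

Under this assumed form, a must-link pair with similar latent profiles lowers the objective, while a cannot-link pair with similar profiles raises it, which matches the reward and penalty roles described above.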

Origin blog.csdn.net/qq_39463175/article/details/111410062