Image segmentation: loss functions for extremely imbalanced data

Preamble:

In small-target segmentation tasks an image often contains only one or two targets, which makes the network harder to train. There are generally three ways to deal with this:

1, choose an appropriate loss function, so that the optimization pays reasonable attention to the small targets.

2, change the network structure, e.g. add attention mechanisms.

3, a coarse form of attention: first detect the target region, then crop it out and train the segmentation on the crop.

 

Scenario:

A U-Net based network, implemented with Keras, is used to segment small targets.

 

Loss functions:

1、Log loss

For a binary classification task, the log loss is defined as follows:

L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1-y_i)\log(1-p_i)\right]

where y_i is the true label of instance x_i and p_i is the predicted probability that x_i belongs to class 1. The total loss is the average of the per-sample log loss over all samples.

The gradient this loss passes back treats every sample with the same degree of concern regardless of its class, so it is easily affected by class imbalance.
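A minimal sketch of this binary log loss with the Keras backend (not from the original post; the function name log_loss and the eps argument are just illustrative):

from keras import backend as K

def log_loss(y_true, y_pred, eps=1e-7):
    # clip predictions so log() never sees exactly 0 or 1
    y_pred = K.clip(y_pred, eps, 1. - eps)
    # mean of the per-pixel binary log loss
    return -K.mean(y_true * K.log(y_pred) + (1. - y_true) * K.log(1. - y_pred))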

See the Airbus-Ship-Detection case for reference. The task is to detect ships at sea, and the sea occupies most of every image, so a few tricks are used: stitching crops into montage images, and subsampling the pictures that contain only sea so as to reduce the amount of pure-background data (a rough sketch of the subsampling idea follows).
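Illustrative only; image_paths, has_ship and keep_prob are hypothetical names, and the actual competition kernels may implement this differently:

import random

def subsample_background(image_paths, has_ship, keep_prob=0.1):
    # keep every image containing a ship, and only a small fraction of sea-only images
    kept = []
    for path, ship in zip(image_paths, has_ship):
        if ship or random.random() < keep_prob:
            kept.append(path)
    return kept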

 

2、WCE loss(weighted cross-entropy)

Cross entropy with class weights.

Binary WCE:

L = -\frac{1}{N}\sum_{i=1}^{N}\left[\beta\, y_i \log p_i + (1-y_i)\log(1-p_i)\right]

where \beta is the weight given to the positive class.

The disadvantage of this loss is that the sample weights have to be set by hand, which adds to the difficulty of hyperparameter tuning.
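A minimal Keras-backend sketch of the binary WCE above (beta, the positive-class weight, is an assumed parameter name, not from the original post):

from keras import backend as K

def weighted_bce(beta=10.):
    # beta > 1 up-weights the positive (foreground) pixels
    def loss(y_true, y_pred):
        eps = K.epsilon()
        y_pred = K.clip(y_pred, eps, 1. - eps)
        return -K.mean(beta * y_true * K.log(y_pred) + (1. - y_true) * K.log(1. - y_pred))
    return loss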

 

3、Focal loss

Can the network be made to actively learn from the hard samples?

Focal loss was proposed in the object detection field to deal with the severe imbalance between positive and negative samples.

The focal loss formula:

FL = -\frac{1}{N}\sum_{i=1}^{N}\left[\alpha\, y_i (1-p_i)^{\gamma}\log p_i + (1-\alpha)(1-y_i)\, p_i^{\gamma}\log(1-p_i)\right]

Compared with the plain log loss above, it simply adds the modulating factor (1-p_i)^{\gamma} (and p_i^{\gamma} for the negative term).

The loss of a sample decreases as its predicted probability for the true class increases.

 

The basic idea: under extreme class imbalance, a network trained with the plain log loss tends to predict only negative samples, it predicts those negative samples with very high probability, and the gradient passed back is still large.

With the extra factor added, the focal loss of samples predicted with high probability becomes smaller, while the loss of samples predicted with low probability becomes relatively larger, thereby strengthening the attention paid to the positive samples.

import tensorflow as tf
from keras import backend as K
'''
Compatible with tensorflow backend
'''
def focal_loss(gamma=2., alpha=.25):
    def focal_loss_fixed(y_true, y_pred):
        # p for positive pixels; 1 elsewhere so their term vanishes
        pt_1 = tf.where(tf.equal(y_true, 1), y_pred, tf.ones_like(y_pred))
        # p for negative pixels; 0 elsewhere so their term vanishes
        pt_0 = tf.where(tf.equal(y_true, 0), y_pred, tf.zeros_like(y_pred))
        return -K.sum(alpha * K.pow(1. - pt_1, gamma) * K.log(pt_1)) \
               - K.sum((1 - alpha) * K.pow(pt_0, gamma) * K.log(1. - pt_0))
    return focal_loss_fixed

model_prn.compile(optimizer=optimizer, loss=[focal_loss(alpha=.25, gamma=2)])

With U-Net the input and output are both full images, so using this loss directly (it sums over every pixel) produces very large loss values. Tuning alpha and gamma is also troublesome.
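One possible workaround, sketched here as an assumption rather than something the original post does, is to average over pixels instead of summing, so the loss value does not grow with image size:

import tensorflow as tf
from keras import backend as K

def focal_loss_mean(gamma=2., alpha=.25):
    def loss(y_true, y_pred):
        eps = K.epsilon()
        y_pred = K.clip(y_pred, eps, 1. - eps)
        # probability the model assigns to the true class of each pixel
        pt = tf.where(tf.equal(y_true, 1), y_pred, 1. - y_pred)
        # alpha for positive pixels, (1 - alpha) for negative pixels
        w = tf.where(tf.equal(y_true, 1),
                     alpha * K.ones_like(y_pred),
                     (1. - alpha) * K.ones_like(y_pred))
        return -K.mean(w * K.pow(1. - pt, gamma) * K.log(pt))
    return loss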

 

4、Dice loss

Intuitively, it measures how similar two contours (the predicted region X and the ground-truth region Y) are:

DSC = \frac{2|X \cap Y|}{|X| + |Y|}

Or, equivalently, in terms of true/false positives and negatives:

DSC = \frac{2TP}{2TP + FP + FN}

The binary dice loss (with a smoothing term s, as implemented below, where p_i is the prediction and t_i the target):

L_{dice} = 1 - \frac{2\sum_i p_i t_i + s}{\sum_i p_i + \sum_i t_i + s}

def dice_coef(y_true, y_pred, smooth=1):
    # per-image overlap and total area, summed over H, W, C
    intersection = K.sum(y_true * y_pred, axis=[1, 2, 3])
    union = K.sum(y_true, axis=[1, 2, 3]) + K.sum(y_pred, axis=[1, 2, 3])
    return K.mean((2. * intersection + smooth) / (union + smooth), axis=0)

def dice_coef_loss(y_true, y_pred):
    return 1 - dice_coef(y_true, y_pred, smooth=1)
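A usage sketch (model and the choice of optimizer are placeholders, not from the original post):

model.compile(optimizer='adam', loss=dice_coef_loss, metrics=[dice_coef])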

Dice loss can sometimes be unreliable. For softmax or log loss the gradient is, roughly speaking, p - t, where t is the target value and p is the prediction, whereas for dice loss the gradient is roughly 2t² / (p+t)².

If p and t are both very small, this gradient changes drastically, which makes training difficult.
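The gradient claim can be checked for a single pixel: with prediction p and target t, the soft dice term is 2pt/(p+t) (ignoring the smoothing constant), and

\frac{\partial}{\partial p}\left(\frac{2pt}{p+t}\right) = \frac{2t(p+t) - 2pt}{(p+t)^2} = \frac{2t^2}{(p+t)^2}

which matches the expression quoted above.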

 


Origin www.cnblogs.com/hotsnow/p/10954624.html