Kaggle approach summary

Stage 1: 0.459 (top 7%); 0.557, rank 48/3634 (top 2%)
matterport mask-rcnn
All of the methods were learned from the public discussion forum.
Using https://github.com/lopuhin/kaggle-dsbowl-2018-dataset-fixes

1)
Augmentations

Since there are only a few hundred training images, we must find useful augmentations to prevent our models from overfitting and keep them generalizable. Here are some methods I tried that did not work for me:

add Gaussian noise
color to gray
contrast and brightness
random crop 512x512 if image size is bigger than 512 otherwise resize the image to 512x512
mosaics
mosaics + random crop
mosaics + random crop + H&E
rotate 90 degrees
random rotation by 90, 180, 270 degrees
rotate +-5 degrees on top of flip & 90 degree rotation
elastic transform
I only use flip up&down&left&right

A question: if you rotated your images by 5 degrees, how did you fill the part outside of the image, just leave it at 0?
He only used flips (up/down/left/right). For me too, H&E data did not help, and rotations by 0-15 degrees did not help.

Note:
When I use general normalization, I suspect it does not combine well with the other augmentation methods, so I only use flips. The 1st place solution uses many augmentation methods and it actually works very well. So always choose the methods that fit your model best.


-------------------------------------------
As there were only a few hundred training images, we needed to come up with specific augmentations to prevent our models from overfitting and make them more or less generalizable. We used a lot of heavy augmentations (maybe too heavy); a rough code sketch follows the list below.

Clahe, Sharpen, Emboss
Gaussian Noise
Color to Gray
Inverting - we should not have used it, some images were not predicted correctly on stage2 because of this augmentation
Remapping grayscale images to random color images
Blur, Median Blur, Motion Blur
contrast and brightness
random scales, rotations and flips
Heavy geometric transformations: Elastic Transform, Perspective Transform, Piecewise Affine transforms, pincushion distortion
Random HSV
Channel shuffle - I guess this one was very important due to the nature of the data
Nucleus copying on images. That created a lot of overlapping nuclei. It seemed to help networks to learn better borders for overlapping nuclei.
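A hedged sketch of such a heavy augmentation pipeline, written here with the albumentations library (my own choice for illustration; the team's actual code and parameters are not given, and transform names such as Sharpen/Emboss vary slightly across albumentations versions):

import albumentations as A

heavy_aug = A.Compose([
    A.Flip(p=0.5),                        # horizontal/vertical flips
    A.RandomRotate90(p=0.5),
    A.OneOf([A.CLAHE(), A.Sharpen(), A.Emboss()], p=0.3),
    A.GaussNoise(p=0.2),
    A.ToGray(p=0.1),
    A.InvertImg(p=0.05),                  # the inverting they later regretted
    A.OneOf([A.Blur(), A.MedianBlur(), A.MotionBlur()], p=0.2),
    A.RandomBrightnessContrast(p=0.3),
    A.HueSaturationValue(p=0.3),          # "Random HSV"
    A.ChannelShuffle(p=0.2),
    A.ElasticTransform(p=0.2),            # one of the heavy geometric transforms
])

# usage: out = heavy_aug(image=image, mask=mask); out["image"], out["mask"]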


2)
Additional data
No. I tried adding the H&E dataset to the training data, but it did not improve my performance. I am using https://github.com/lopuhin/kaggle-dsbowl-2018-dataset-fixes

3)
Ensembling
No. I spent almost a week trying this method but it did not work well. I divided the training data into two categories: color and gray. I have seen someone get a high score (0.5+) in stage 1 using this method; I am very interested in it, so I really hope someone can share a solution.

4) Parameters
train:
init_with = "coco"
RESNET_ARCHITECTURE = "resnet101"
MEAN_PIXEL = np.array([0., 0., 0.])
RPN_NMS_THRESHOLD = 0.7
DETECTION_MIN_CONFIDENCE = 0.7
DETECTION_NMS_THRESHOLD = 0.3
TRAIN_ROIS_PER_IMAGE = 600
RPN_TRAIN_ANCHORS_PER_IMAGE = 320
LEARNING_RATE = 1e-3

inference:
RPN_NMS_THRESHOLD = 0.6
DETECTION_NMS_THRESHOLD = 0.1
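For concreteness, a minimal sketch of how these settings could be expressed as a matterport Mask R-CNN Config subclass (the class names and NUM_CLASSES are my additions; older forks of the repo use RESNET_ARCHITECTURE where newer ones use BACKBONE):

import numpy as np
from mrcnn.config import Config

class NucleusConfig(Config):
    NAME = "nucleus"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1                    # batch_size = 1 (see section 12)
    NUM_CLASSES = 1 + 1                   # background + nucleus
    BACKBONE = "resnet101"
    IMAGE_MIN_DIM = 512
    IMAGE_MAX_DIM = 512
    MEAN_PIXEL = np.array([0., 0., 0.])   # zero mean: images are standardized separately (section 6)
    RPN_NMS_THRESHOLD = 0.7
    DETECTION_MIN_CONFIDENCE = 0.7
    DETECTION_NMS_THRESHOLD = 0.3
    TRAIN_ROIS_PER_IMAGE = 600
    RPN_TRAIN_ANCHORS_PER_IMAGE = 320
    POST_NMS_ROIS_TRAINING = 2000         # mentioned in the Q&A below
    POST_NMS_ROIS_INFERENCE = 2000
    LEARNING_RATE = 1e-3

class NucleusInferenceConfig(NucleusConfig):
    RPN_NMS_THRESHOLD = 0.6
    DETECTION_NMS_THRESHOLD = 0.1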

Training

Train all layers for 20 epochs at 1e-3. Pick the best checkpoint according to TensorBoard, then train all layers for another 20 epochs at 1e-4 or 1e-5 (stop training if the validation loss stops decreasing). Optimizer = Adam.
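A rough sketch of that schedule with the matterport API (dataset_train/dataset_val are Dataset objects prepared elsewhere, and the COCO weight filename is assumed; note that the epochs argument of model.train is a cumulative target, and the repo compiles SGD by default, so using Adam means editing the compile() call in mrcnn/model.py):

import mrcnn.model as modellib

model = modellib.MaskRCNN(mode="training", config=NucleusConfig(), model_dir="./logs")
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"])

# 20 epochs at 1e-3, then pick the best checkpoint in TensorBoard and continue
# at 1e-4 (or 1e-5) for another 20 epochs; stop early if val loss stops decreasing.
model.train(dataset_train, dataset_val, learning_rate=1e-3, epochs=20, layers="all")
model.train(dataset_train, dataset_val, learning_rate=1e-4, epochs=40, layers="all")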


question:
1. I am also using matterport Mask R-CNN, with image size 1024x1024. I found it gets a better score than smaller sizes. What size are you using? 2. You are using a very low DETECTION_NMS_THRESHOLD; will this cause problems when there are few ground-truth instances? 3. You are using a zero MEAN_PIXEL; what is the reasoning behind this?
1. I am using 512x512. 2. Sorry, there are some parameters I did not mention above: when training I set DETECTION_NMS_THRESHOLD = 0.3, DETECTION_MIN_CONFIDENCE = 0.7, POST_NMS_ROIS_TRAINING = 2000, POST_NMS_ROIS_INFERENCE = 2000. 3. General standardization can speed up the convergence of your algorithm.


5)
Post-processing: use binary_dilation.
6)
General standardization: img = (img - mean(img)) / std(img).
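As a one-line sketch of that formula (the epsilon is my addition to avoid dividing by zero on constant images):

import numpy as np

def standardize(img):
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-8)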
7)
Filter
The five training annotation sets are: (1) the original train data, (2) the watershed algorithm, (3) random walker + watershed, (4) geodesic active contour based on median filtering, (5) geodesic active contour based on inverted Gaussian filtering.
Each epoch I would randomly swap each annotation for one of the five buckets (the real training data, the watershed data, the random walker and watershed data, and the two buckets of geodesic and watershed data) so each epoch would only see one image once.
Each epoch randomly selects one of the five training variants above, so within an epoch each image is only seen once.
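A hypothetical sketch of that per-epoch sampling rule (function and variable names are mine):

import random

def sample_epoch(annotations_by_image):
    # annotations_by_image: dict image_id -> list of the 5 annotation variants
    return {img_id: random.choice(variants)
            for img_id, variants in annotations_by_image.items()}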

8)
fill holes / watershed
https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.ndimage.morphology.binary_fill_holes.html#scipy-ndimage-morphology-binary-fill-holes
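A minimal post-processing sketch combining sections 5 and 8: fill holes in the predicted foreground, split touching nuclei with a distance-transform watershed, then dilate each instance slightly. This is an assumed pipeline, not the author's exact code, and min_distance is a made-up parameter; in older scikit-image versions watershed lives in skimage.morphology instead of skimage.segmentation.

import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def postprocess(binary_mask):
    filled = ndi.binary_fill_holes(binary_mask)
    # distance transform + watershed to separate touching nuclei
    distance = ndi.distance_transform_edt(filled)
    coords = peak_local_max(distance, labels=filled, min_distance=5)
    markers = np.zeros(filled.shape, dtype=int)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    labels = watershed(-distance, markers, mask=filled)
    # grow each instance by one pixel into the background (binary_dilation, section 5)
    out = labels.copy()
    for lab in np.unique(labels)[1:]:
        grown = ndi.binary_dilation(labels == lab)
        out[grown & (out == 0)] = lab
    return out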

9)
I want to crop the image into 4 sub-images and masks. After that, I used your code to combine and merge layers at the edges. However, it gives me an unexpected result that deletes masks at the edges, so I cannot recover the original masks (before cropping). This is my output, using the code below, for id=c901794d1a421d52e5734500c0a2a8ca84651fb93b19cec2f411855e70cae339. In addition, what happens if these sub-images overlap? For example, image_crop1=image[:512,:512,:], image_crop3=image[500:,:500,:]

10)
Several of the images were obtained from the same microscope image -> stitch them back together (mosaic), then split again.
Neural network style transfer.

11)
I was wondering if you compared the model with pretrained COCO weights and without, and whether it made a significant difference?
I only compared the model pretrained on COCO vs. ImageNet, and I found that COCO is better than ImageNet in this competition.

12)
(1) What is the purpose of binary dilation as post-processing, compared to other methods such as opening? Does dilation make the masks larger? (2) Did learning rate decay help convergence? I found that decreasing the LR to 1e-4 or below causes overfitting, but I was using SGD with momentum. Did you see an improvement using Adam vs. SGD? (3) Since you are using 512x512, do you use a batch size (IMAGES_PER_GPU) > 1?
answer:
(1) Dilation is generally used to extend edges and fill some small holes. As Heng mentioned before, "Deep network is the weakest at the boundary image and strongest at the center." In this competition it is very necessary to do some work on boundary detection, and it actually improved my result. (2) I have not used SGD, so it is hard to tell which one is more suitable in this competition. As far as I know, Keven Wang and Panpan Zhou are using SGD; you can discuss with them. (3) batch_size = 1.


13)
I only trained on color images and then tested on the color data in stage_test; it was not better than training on grayscale images. For testing color images I also tried converting them from BGR to HSV before training, and that did not give better results either.

14)
q: Random crop with size 512. Otherwise, if you resize them, you lose too much information. Random crop also helps augment the data. I achieved 0.494 with matterport using only random crop, mosaic stitching and H&E.
a: When you use random crop, how did you set IMAGE_MIN_DIM and IMAGE_MAX_DIM in the config file?
q: Just use 512. I tested 256 but it was very bad. I did not resize if the size is bigger than 512.
a: As I'm sure you noticed, a lot of the test images are really narrow, so resizing distorts the images a lot.
q: Resize all to 512 and test at this size. There may be another solution, like sliding windows, but it is hard to merge the cropped results into the final result.
a: When you use random crop for matterport Mask R-CNN, did you do it only during training and only for images larger than 512? In other words, did you still resize when images are smaller than 512x512, and always resize when detecting?
q: I did not test on a mosaic test set. I only trained on the mosaic training set and tested on the original images. First, I randomly crop 512x512 if the image is bigger than 512 (not using prob crop), otherwise I resize the image to 512x512. I trained 60 epochs on heads and 40 epochs on all layers with learning rate 0.0001 using Adam. I don't know why some people succeed with SGD (I used SGD but got 0.44). Using the above suggestions you may get 0.47 (no post-processing) to ~0.49 LB (with post-processing).
a: One extra question: when you do need to resize, did you use the padding method (as in matterport's resize_image, which preserves the height/width ratio), or did you use skimage.transform.resize?
b: We have about 543 gray images. You trained these gray images to obtain a gray-image model, then applied processing to the color images (12 test images) so they could be used with the gray model for testing.
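A small sketch of the crop-or-resize rule discussed in this thread (parameters are mine; skimage.transform.resize is used for the resize branch, which is one of the two options debated above):

import numpy as np
from skimage.transform import resize

def crop_or_resize(img, size=512):
    h, w = img.shape[:2]
    if h >= size and w >= size:
        y = np.random.randint(0, h - size + 1)
        x = np.random.randint(0, w - size + 1)
        return img[y:y + size, x:x + size]
    return resize(img, (size, size), preserve_range=True).astype(img.dtype)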

15)
Network architectures
We used UNet like encoder-decoder architectures with encoders pretrained on ImageNet.
Surprisingly, simple encoders like VGG16 did not work in this competition at all. They failed on the hard cases where the tissue looked like a nucleus but was not, especially on color images like 59b35151d4a7a5ffdd7ab7f171b142db8cfe40beeee67277fac6adca4d042c4
After these experiments we decided that we have to go deeper!!! As a result, the top-performing encoders in this competition were: DPN-92, ResNet-152, InceptionResNetV2, ResNet-101.

https://www.cnblogs.com/alanma/p/6877166.html
By replacing the original ResNet residual learning block, more blocks can be stacked and the network depth increased, producing ResNet-50, ResNet-101 and ResNet-152. As the depth increases, performance keeps improving because the degradation problem has been solved.

16)
Ensembling
We used a simple approach for ensembling where we just averaged our masks before postprocessing

Ensembling: train multiple neural networks and average their predictions. An ensemble combines the predictions of different models to produce the final prediction; typically, the more models in the ensemble, the better the result. Also, because an ensemble combines different baseline predictions, its performance is at least on par with the best baseline model. Ensembling gives us a performance boost almost for free!
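A minimal sketch of this averaging ensemble (shapes and the 0.5 threshold are assumptions; each model is taken to output a per-pixel probability map):

import numpy as np

def ensemble_masks(prob_maps, threshold=0.5):
    # prob_maps: list of HxW float arrays in [0, 1], one per model
    avg = np.mean(np.stack(prob_maps, axis=0), axis=0)
    return avg > threshold          # binarize before instance post-processing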

17)
learning rate: initial 1e-4 with decay (we had different LR policies, but mostly small LR no more than 1e-4)


one example:
1. Use watershed on the binary mask;
2. Add contours as a second channel, but the network predicted contours badly exactly where we really needed them to separate cells, so we decided to predict only the borders between cells;
3. Use the full mask in one channel and the border in another; border pixels are emptied out of the mask channel; use softmax instead of sigmoid as the target activation.
-> This separates the nuclei better, but mAP actually drops, because the IoU thresholds are too high.
-> Fix: combine, during post-processing, with the result of an additional network trained on full masks.
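A hypothetical sketch of the target from step 3: border pixels are removed from the mask channel, and the three mutually exclusive classes (background, mask, border) are trained with softmax. Function name and border_width are mine; find_boundaries comes from scikit-image.

import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import find_boundaries

def make_target(instance_labels, border_width=2):
    # instance_labels: HxW int array, 0 = background, 1..N = nucleus ids
    full_mask = instance_labels > 0
    borders = find_boundaries(instance_labels, mode="thick")
    for _ in range(border_width - 1):
        borders = ndi.binary_dilation(borders) & full_mask
    mask_channel = full_mask & ~borders       # full mask with border pixels emptied
    background = ~(mask_channel | borders)
    return np.stack([background, mask_channel, borders], axis=-1).astype(np.float32)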

https://zhuanlan.zhihu.com/p/27449596?utm_source=weibo&utm_medium=social
Adam, i.e. Adaptive Moment Estimation, computes an adaptive learning rate for each parameter. Besides storing an exponentially decaying average of past squared gradients as AdaDelta does, it also keeps an exponentially decaying average of past gradients M(t), which is similar to momentum.
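For reference, the standard Adam update rules (g_t is the gradient at step t):

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
\hat{m}_t = m_t / (1 - \beta_1^t), \quad \hat{v}_t = v_t / (1 - \beta_2^t)
\theta_{t+1} = \theta_t - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)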


https://blog.csdn.net/guojingjuan/article/details/51206256
The higher the recall, the lower the precision. This is reasonable: if I take in all 1000 samples, the positives are certainly all covered, so recall = 1, but precision becomes very small because I declared everything positive. The precision at recall = 1 equals the fraction of positive samples.
So AP, average precision, is the area under this curve; the "average" here is an average over recall. The "mean" in mean average precision averages over all classes (each class treated as a binary classification task). Image classification papers now basically all use mAP as the standard metric.

LightGBM: with 300,000 samples and 40-dimensional features, LightGBM finished in 22 seconds, impressively fast and considerably faster than XGBoost, with accuracy on par with XGBoost.
LightGBM models were trained on predicted nucleus candidates. Each base candidate was selected with the lowest separation threshold, and we then tried to separate it with a few higher thresholds and erosion. We used a few basic morphological features of each candidate, such as solidity, circularity, convexity, area, median area of neighbors, neighbor count, etc. The prediction target is the IoU with the ground truth (0 if IoU < 0.5).
Then the best separation threshold is selected for each candidate according to the predicted IoU. Candidates with a small predicted IoU are simply removed (IoU < 0.3 and IoU < 0.2 for the two submissions; it was hard to find this threshold using OOF predictions, because there is a small overfit to image types even with such heavy augmentation).
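A hypothetical sketch of that candidate-scoring idea: a LightGBM regressor predicts each candidate's IoU with ground truth from simple morphological features, and low-scoring candidates are dropped. The feature set and hyperparameters here are illustrative, not the team's.

import lightgbm as lgb
import numpy as np
from skimage.measure import regionprops

def candidate_features(labeled_mask):
    feats = []
    for rp in regionprops(labeled_mask):
        circularity = 4 * np.pi * rp.area / (rp.perimeter ** 2 + 1e-6)
        feats.append([rp.area, rp.solidity, rp.eccentricity, circularity])
    return np.asarray(feats)

model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
# model.fit(X_train_features, y_train_iou)                 # y: IoU with GT, 0 if IoU < 0.5
# keep = model.predict(candidate_features(pred)) >= 0.3    # drop low predicted IoU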

Reposted from www.cnblogs.com/rosyYY/p/8889110.html