Summary of ideas - Kaggle 2018 Data Science Bowl (nucleus segmentation)

Phase 1: 0.459 (top 7%); Phase 2: 0.557, rank 48/3634 (top 2%)
Base model: matterport Mask R-CNN.
All methods below were learned from public discussions.
Training data: https://github.com/lopuhin/kaggle-dsbowl-2018-dataset-fixes

1)
Augmentations

Since there are only a few hundred training images, we must find some useful augmentations to prevent our models from overfitting and make them generalizable. Here are some methods I tried that did not work for me:

add Gaussian noise
color to gray
contrast and brightness
random crop 512x512 if the image size is bigger than 512, otherwise resize the image to 512x512
mosaics
mosaics + random crop
mosaics + random crop + H&E
rotate 90 degrees
random rotate 90, 180, 270 degrees
rotate +-5 degrees on top of flip & 90 degree rotation
elastic transform
I only use flips (up & down & left & right)

 

A question - If you rotated your images by 5 degrees, how did you fill the part outside of the image? Just leave it 0?
A: He only used flips (up & down & left & right). For me too, the H&E data did not help, and rotations by 0-15 degrees did not help either.

Note:
When I use general standardization, I suspect heavy augmentation doesn't combine well with it, so I just use flips. The 1st place solution used many augmentation methods and it actually worked very well. So always choose the methods that work best for your model.


-------------------------------------------
As there were just a few hundred training images, we needed to come up with specific augmentations to prevent our models from overfitting and make them more or less generalizable. We used a lot of heavy augmentations (maybe too heavy):

Clahe, Sharpen, Emboss
Gaussian Noise
Color to Gray
Inverting - we should not have used it; some images were predicted incorrectly on stage 2 because of this augmentation
Remapping grayscale images to random color images
Blur, Median Blur, Motion Blur
contrast and brightness
random scale, rotation, and flips
Heavy geometric transformations: Elastic Transform, Perspective Transform, Piecewise Affine transforms, pincushion distortion
Random HSV
Channel shuffle - I guess this one was very important due to the nature of the data
Nucleus copying on images. That created a lot of overlapping nuclei. It seemed to help networks to learn better borders for overlapping nuclei.
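The write-up does not name an augmentation library; a minimal sketch of such a heavy pipeline using albumentations (the transform choices and probabilities here are assumptions, and exact class names vary between albumentations versions):

```python
import numpy as np
import albumentations as A

heavy = A.Compose([
    A.OneOf([A.CLAHE(), A.Sharpen(), A.Emboss()], p=0.5),
    A.GaussNoise(p=0.3),
    A.ToGray(p=0.2),
    A.InvertImg(p=0.1),                # the one the authors regretted
    A.OneOf([A.Blur(), A.MedianBlur(), A.MotionBlur()], p=0.3),
    A.RandomBrightnessContrast(p=0.3),
    A.ShiftScaleRotate(p=0.5),         # random scale / rotation / shift
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.ElasticTransform(p=0.2),
    A.HueSaturationValue(p=0.3),       # random HSV
    A.ChannelShuffle(p=0.3),
])

image = np.zeros((256, 256, 3), dtype=np.uint8)  # placeholder image
mask = np.zeros((256, 256), dtype=np.uint8)      # placeholder mask
out = heavy(image=image, mask=mask)              # out["image"], out["mask"]
```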


2)
Additional data
No. I tried adding the H&E dataset to the training data, but it didn't improve my performance. I am using https://github.com/lopuhin/kaggle-dsbowl-2018-dataset-fixes

3)
Ensembling
No. I spent almost one week trying this method but it didn't work very well. I divided the training data into two categories, color and gray. I have seen someone get a high score (0.5+) in stage 1 using this method. I am very interested in this, so I really hope someone can share a solution for it.

4) Parameters
train:
init_with = "coco"
RESNET_ARCHITECTURE = "resnet101"
MEAN_PIXEL = np.array([0., 0., 0.])
RPN_NMS_THRESHOLD = 0.7
DETECTION_MIN_CONFIDENCE = 0.7
DETECTION_NMS_THRESHOLD = 0.3
TRAIN_ROIS_PER_IMAGE = 600
RPN_TRAIN_ANCHORS_PER_IMAGE = 320
LEARNING_RATE = 1e-3

inference:
RPN_NMS_THRESHOLD = 0.6
DETECTION_NMS_THRESHOLD = 0.1

Training

Train all layers for 20 epochs at LR 1e-3. Pick the best checkpoint according to TensorBoard, then train all layers for another 20 epochs at 1e-4 or 1e-5 (stop training if the validation loss stops decreasing). Optimizer = Adam.
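A minimal sketch of this schedule with the matterport Mask R-CNN API (config, datasets, and the COCO weights path are assumed to be set up elsewhere; note matterport uses SGD by default, so training with Adam as described requires editing its compile step):

```python
import mrcnn.model as modellib

# config, dataset_train, dataset_val, COCO_WEIGHTS_PATH: assumed defined
model = modellib.MaskRCNN(mode="training", config=config, model_dir="./logs")
model.load_weights(COCO_WEIGHTS_PATH, by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])

# stage 1: all layers, 20 epochs at 1e-3
model.train(dataset_train, dataset_val,
            learning_rate=1e-3, epochs=20, layers="all")

# stage 2: pick the best checkpoint in TensorBoard, then 20 more epochs
# at 1e-4 (or 1e-5); stop early if the validation loss stops decreasing
model.train(dataset_train, dataset_val,
            learning_rate=1e-4, epochs=40, layers="all")
```

(matterport's epochs argument is cumulative, so epochs=40 continues training up to epoch 40.)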


question:
1. I am also using matterport Mask R-CNN, with image size 1024x1024; I found it gets a better score than smaller sizes. What size are you using? 2. You are using a very low DETECTION_NMS_THRESHOLD; will this cause problems when there are few ground-truth objects? 3. You are using a zero MEAN_PIXEL; what is the reasoning behind this?
answer:
1. I am using 512x512. 2. Sorry, there are some parameters I didn't mention above: during training I set DETECTION_NMS_THRESHOLD = 0.3, DETECTION_MIN_CONFIDENCE = 0.7, POST_NMS_ROIS_TRAINING = 2000, POST_NMS_ROIS_INFERENCE = 2000. 3. General standardization can speed up the convergence of your algorithm; since each image is already standardized (see item 6), the mean is already removed and MEAN_PIXEL can stay zero.


5)
Post-processing: use binary_dilation.
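A minimal sketch of this post-processing step with scipy, applied per predicted instance mask (the single iteration is an assumption):

```python
import numpy as np
from scipy.ndimage import binary_dilation

mask = np.zeros((5, 5), dtype=bool)
mask[2, 2] = True                            # a 1-pixel "nucleus"
grown = binary_dilation(mask, iterations=1)  # expands the boundary by 1 px
```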
6)
General standardization: img = (img - mean(img)) / std(img).
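The same per-image standardization as a small numpy function (the epsilon guard against constant images is my addition):

```python
import numpy as np

def standardize(img):
    """Zero-mean, unit-variance standardization of a single image."""
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-8)
```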
7)
Filtered training masks
The five training sets are respectively: (1) the original train data; (2) the watershed algorithm; (3) random walker + watershed; (4) geodesic active contour based on median filtering; (5) geodesic active contour based on inverse Gaussian filtering.
Each epoch I would randomly swap each annotation for one of the five buckets (the real training data, the watershed data, the random walker + watershed data, and the two buckets of geodesic + watershed data), so each epoch would still see each image only once; see the sketch below.
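A minimal sketch of this per-epoch bucket swap (the data layout and names are assumptions):

```python
import random

def sample_epoch_annotations(masks_by_bucket):
    """masks_by_bucket: {image_id: [five mask variants]} -> one variant each.

    Each epoch, every image keeps exactly one randomly chosen annotation
    variant, so the epoch still sees each image only once.
    """
    return {image_id: random.choice(variants)
            for image_id, variants in masks_by_bucket.items()}
```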

8)
Fill holes / watershed
https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.ndimage.morphology.binary_fill_holes.html#scipy-ndimage-morphology-binary-fill-holes
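Usage is a one-liner; a small self-contained example:

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

mask = np.array([[0, 1, 1, 1, 0],
                 [0, 1, 0, 1, 0],      # hole in the middle
                 [0, 1, 1, 1, 0]], dtype=bool)
filled = binary_fill_holes(mask)       # the interior 0 becomes 1
```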

9)
I want to crop the image into 4 sub-images and masks. After that, I used your code to combine and merge the layers at the edges. However, it gave me an unexpected result: masks at the edges were deleted, so I cannot recover the original masks (from before cropping). This is my output, using the code below, for id=c901794d1a421d52e5734500c0a2a8ca84651fb93b19cec2f411855e70cae339. In addition, what happens if these sub-images overlap? For example, image_crop1 = image[:512, :512, :], image_crop3 = image[500:, :500, :]

10)
Several of the provided pictures were obtained from one microscope picture -- "splicing and re-slicing" (reassemble the mosaic, then slice it again).
Neural network style transfer.

11)
Q: I was wondering if you compared the model with pretrained COCO weights and without, and whether or not it made a significant difference?
A: I only compared pretrained COCO and ImageNet weights, and I found that COCO is better than ImageNet in this competition.

12)
(1) What is the purpose of binary dilation as post-processing, compared to other methods such as opening? Does dilation make the masks larger? (2) Did learning-rate decay help convergence? I found decreasing the LR to 1e-4 or below causes overfitting, but I was using SGD with momentum. Did you see an improvement using Adam vs. SGD? (3) Since you're using 512x512, do you use a batch size (IMAGES_PER_GPU) > 1?
answer:
(1) Dilation is generally used to extend edges and fill some small holes. As Heng mentioned before, "Deep network is the weakest at the boundary of the image and strongest at the center." In this competition it is very necessary to do some work on boundary detection; it actually improved my result. (2) I have not used SGD, so it is hard to tell which one is more suitable for this competition. As far as I know, Keven Wang and Panpan Zhou are using SGD; you can discuss it with them. (3) batch_size = 1.


13)
I trained on color images only and tested on the color images in the stage test set; it was no better than training on grayscale images and testing on the color images. I also tried converting the color images from BGR to HSV before training; that did not improve the result either.

14)
A: Random crop with size 512. Otherwise, if you resize the images, you lose too much information. Random crop also helps augment the data. I achieved 0.494 with matterport using only random crop, mosaic stitching, and H&E.
B: When you are using random crop, how did you set IMAGE_MIN_DIM and IMAGE_MAX_DIM in the config file?
A: Just use 512. I tested 256 but it was very bad. I did not resize if the image size is bigger than 512.
B: As I'm sure you noticed, a lot of the test images are really narrow, so resizing distorts them a lot.
A: I resize all to 512 and test at this size. There may be another solution, like a sliding window, but it is hard to merge the cropped results into a final result.
B: When you use random crop for matterport Mask R-CNN, did you do it only during training and only for images larger than 512? In other words, did you still resize images smaller than 512x512, and always resize when detecting?
A: I did not test on a mosaic testing set; I only train on the mosaic training set and test on the original images. First, I random-crop 512x512 if the image is bigger than 512 (not using probabilistic crop); otherwise I resize the image to 512x512. I trained 60 epochs on the heads and 40 epochs on all layers with learning rate 0.0001 using Adam. I don't know why some people succeed in training with SGD (I used SGD but got 0.44). Following the above suggestions, you may get ~0.47 (no post-processing) to ~0.49 LB (with post-processing).
B: One extra question: when you do need to resize, did you use the padding method (as in matterport's resize_image, which preserves the height/width ratio), or did you use skimage.transform.resize?
C: We have about 543 gray images; you trained these gray images to obtain a gray-image model, then applied preprocessing to the color images (12 testing images) so they could be tested with the gray model.

15)
Network architectures
We used UNet like encoder-decoder architectures with encoders pretrained on ImageNet.
Surprisingly, simple encoders like VGG16 did not work in this competition at all. They failed on the hard cases where the tissue looked like nuclei but was not, especially on color images like 59b35151d4a7a5ffdd7ab7f171b142db8cfe40beeee67277fac6adca4d042c4.
After these experiments we decided that we had to go deeper! As a result, the top-performing encoders in this competition were: DPN-92, ResNet-152, InceptionResNetV2, ResNet-101.
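The write-up does not say which framework was used; a minimal sketch of such a UNet-like encoder-decoder using the segmentation_models_pytorch library (the class count is an assumption):

```python
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet101",    # "dpn92", "resnet152", "inceptionresnetv2" also available
    encoder_weights="imagenet",  # encoder pretrained on ImageNet
    in_channels=3,
    classes=2,                   # e.g. nucleus mask + border channels
)
```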

https://www.cnblogs.com/alanma/p/6877166.html
(Summary of the linked post:) The bottleneck block replaces the original residual learning structure of ResNet and makes it feasible to stack more blocks and increase the network depth, yielding ResNet-50, ResNet-101, and ResNet-152. As the depth increases, performance keeps improving, because the degradation problem is solved.

16)
Ensembling
We used a simple approach for ensembling where we just averaged our masks before postprocessing

Ensembling trains multiple neural networks and then averages their predicted values: the ensemble combines the predictions of the different models to produce the final prediction. Generally, the more (diverse) models are ensembled, the better the effect. Additionally, since ensembles combine different baseline predictions, they typically perform at least as well as the best baseline model. Ensembling gives a performance boost almost for free!
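A minimal sketch of this mask averaging (the names and the 0.5 threshold are illustrative):

```python
import numpy as np

def ensemble_masks(prob_maps, threshold=0.5):
    """Average per-model probability maps, then binarize.

    prob_maps: list of HxW float arrays in [0, 1], one per model.
    """
    mean_prob = np.mean(np.stack(prob_maps, axis=0), axis=0)
    return mean_prob > threshold
```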

17)
learning rate: initial 1e-4 with decay (we had different LR policies, but mostly small LRs, no more than 1e-4)


one example:
1. Use watershed on the binary mask;
2. Add contours to a second channel; but exactly where we need contours to separate cells, the network predicts them poorly, so we decided to predict only the borders between cells;
3. Use the full mask in one channel and the border in the other; the pixels under the borders are removed from the mask; use softmax instead of sigmoid as the target activation function.
-- This gives better separation of nuclei, but mAP drops instead, because the IoU threshold is too high for the eroded masks;
-- Solution: use an additional network trained on full masks, and combine the results in post-processing.

https://zhuanlan.zhihu.com/p/27449596?utm_source=weibo&utm_medium=social
The Adam algorithm (Adaptive Moment Estimation) computes an adaptive learning rate for each parameter. It not only stores the exponentially decaying average of past squared gradients (as AdaDelta does), but also maintains an exponentially decaying average of past gradients M(t), similar to momentum.
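For reference, the standard Adam update rules (gradient $g_t$, decay rates $\beta_1, \beta_2$, step size $\eta$):

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t} \\
\theta_{t+1} &= \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t
\end{aligned}
$$

Here $m_t$ is the decaying average of past gradients (the momentum-like term M(t) above) and $v_t$ is the decaying average of past squared gradients.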


https://blog.csdn.net/guojingjuan/article/details/51206256
The higher the recall, the lower the precision. This is very reasonable: if I mark all 1000 samples as positive, then all the positive samples are certainly covered (recall = 1), but the precision is very small, because I am calling everything positive. The precision at recall = 1 equals the proportion of positive samples.
So AP (average precision) is the area under this precision-recall curve, where "average" means averaging precision over recall. The "mean" in mean average precision (mAP) means taking the average over all categories (each category treated as a binary classification task). Current image classification papers basically use mAP as the standard.
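A minimal sketch of computing AP for a ranked list of predictions (the mean-of-precision-at-each-positive formulation; names are illustrative):

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class: scores = confidences, labels = 0/1 ground truth."""
    order = np.argsort(-scores)                      # rank by confidence
    labels = labels[order].astype(bool)
    tp = np.cumsum(labels)
    precision = tp / np.arange(1, len(labels) + 1)   # precision at each rank
    # area under the precision-recall curve, sampled at each recall step
    return precision[labels].mean()
```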

 

LightGBM: with ~300,000 samples and 40-dimensional features, LightGBM finished in 22 seconds, which is impressively fast, quite a bit faster than XGBoost, with accuracy on par with XGBoost.
LightGBM models were trained on predicted nucleus candidates. Each base candidate was selected with the lowest threshold for separation, and we tried to separate it further with a few higher thresholds and erosion. We used a few basic morphological features of each candidate, such as solidity, circularity, convexity, area, neighbors' median area, neighbor count, etc. The prediction target was the IoU with ground truth (0 if IoU < 0.5).
Then the best separation threshold was selected for each candidate according to the predicted IoU. Candidates with a small predicted IoU were simply removed (IoU < 0.3 and IoU < 0.2 for our 2 submissions; it was hard to tune this threshold using OOF predictions, because there is a small overfit to image types even with such heavy augmentation).
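A minimal sketch of this candidate scoring with the lightgbm API (the features, targets, and hyperparameters here are synthetic placeholders):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))        # placeholder morphology features
y = rng.uniform(0.0, 1.0, size=1000)  # placeholder target: IoU with GT

model = lgb.LGBMRegressor(n_estimators=200, learning_rate=0.05)
model.fit(X, y)

pred_iou = model.predict(X)
keep = pred_iou >= 0.3                # drop candidates with low predicted IoU
```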

 
