link: https://www.cnblogs.com/skyfsm/p/8330882.html
Over the past two months I took part in a competition, nicknamed "Eye in the Sky", whose task was semantic segmentation of high-resolution remote sensing images. For the end-of-term project of our data mining course these past two weeks, our group also chose semantic segmentation of remote sensing images as its topic, so I took the chance to tidy up and strengthen the earlier results. This article records the complete process of semantic segmentation of remote sensing images with deep learning, along with some good ideas and tips.
Data set
First, the data. We use the data set provided by the CCF Big Data competition (high-resolution remote sensing images of a city in southern China, taken in 2015). It is a small data set containing five large annotated RGB remote sensing images (ranging in size from 3000 × 3000 to 6000 × 6000). Four classes of objects are annotated: vegetation (label 1), building (label 2), water (label 3), and road (label 4), plus other (label 0). Farmland, woodland, and grassland are all classified as vegetation. To make the annotation easier to inspect, we visualized three of the training images as follows: blue for water, yellow for buildings, green for vegetation, brown for roads. A more detailed description of the data can be found here.
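The color coding described above can be sketched as a palette lookup. This is a minimal illustration, not the code used to produce the original pictures: the class ids come from the article, while the exact RGB values and the names `PALETTE` and `colorize_label` are assumptions made for the example.

```python
import numpy as np

# Class ids follow the article; the exact RGB values are illustrative assumptions.
PALETTE = np.array([
    [0, 0, 0],        # 0: other (black, assumed)
    [0, 255, 0],      # 1: vegetation -> green
    [255, 255, 0],    # 2: building -> yellow
    [0, 0, 255],      # 3: water -> blue
    [139, 69, 19],    # 4: road -> brown
], dtype=np.uint8)


def colorize_label(label):
    """Map an (H, W) array of class ids 0..4 to an (H, W, 3) RGB image."""
    label = np.asarray(label)
    if label.min() < 0 or label.max() >= len(PALETTE):
        raise ValueError("label contains ids outside 0..4")
    # Integer-array indexing looks up one palette row per pixel.
    return PALETTE[label]


demo = np.array([[0, 1, 2],
                 [3, 4, 1]], dtype=np.uint8)
rgb = colorize_label(demo)
```

Note that the result is in RGB order; OpenCV's `cv2.imwrite` expects BGR, so the channels would need to be reversed before saving with it.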
Now for the data processing steps. We have five large remote sensing images. We cannot feed them into the network directly for training: they would not fit in memory, and their sizes also differ. So we first crop them randomly, that is, we generate random x, y coordinates and cut out a 256 × 256 patch at those coordinates, and then apply the following data augmentation operations:
- Rotate both the original image and the label image: 90, 180, and 270 degrees
- Mirror both the original image and the label image about the y-axis
- Blur the original image
- Adjust the brightness of the original image
- Add noise to the original image (Gaussian noise, salt-and-pepper noise)
Here I did not use the data augmentation functions that come with Keras; instead I wrote the corresponding augmentation functions myself using OpenCV.
import random

import cv2
import numpy as np
from tqdm import tqdm

img_w = 256
img_h = 256
image_sets = ['1.png','2.png','3.png','4.png','5.png']

def gamma_transform(img, gamma):
    # Build a 256-entry lookup table for the gamma curve and apply it
    gamma_table = [np.power(x / 255.0, gamma) * 255.0 for x in range(256)]
    gamma_table = np.round(np.array(gamma_table)).astype(np.uint8)
    return cv2.LUT(img, gamma_table)

def random_gamma_transform(img, gamma_vari):
    # Sample gamma log-uniformly from [1/gamma_vari, gamma_vari]
    log_gamma_vari = np.log(gamma_vari)
    alpha = np.random.uniform(-log_gamma_vari, log_gamma_vari)
    gamma = np.exp(alpha)
    return gamma_transform(img, gamma)

def rotate(xb, yb, angle):
    # Rotate image and label by the same angle about the patch center
    M_rotate = cv2.getRotationMatrix2D((img_w/2, img_h/2), angle, 1)
    xb = cv2.warpAffine(xb, M_rotate, (img_w, img_h))
    yb = cv2.warpAffine(yb, M_rotate, (img_w, img_h))
    return xb, yb

def blur(img):
    img = cv2.blur(img, (3, 3))
    return img

def add_noise(img):
    # Copy first: the crop is a view into the source image, and writing
    # into it in place would leave noise in the source for later crops
    img = img.copy()
    for i in range(200):  # add point noise
        temp_x = np.random.randint(0, img.shape[0])
        temp_y = np.random.randint(0, img.shape[1])
        img[temp_x][temp_y] = 255
    return img

def data_augment(xb, yb):
    # Geometric transforms are applied to image and label together
    if np.random.random() < 0.25:
        xb, yb = rotate(xb, yb, 90)
    if np.random.random() < 0.25:
        xb, yb = rotate(xb, yb, 180)
    if np.random.random() < 0.25:
        xb, yb = rotate(xb, yb, 270)
    if np.random.random() < 0.25:
        xb = cv2.flip(xb, 1)  # flipCode > 0: flip around the y-axis
        yb = cv2.flip(yb, 1)

    # Photometric transforms are applied to the image only
    if np.random.random() < 0.25:
        # NOTE: with gamma_vari = 1.0, log(1.0) = 0, so gamma is always 1
        # and the image is unchanged; a value > 1 is needed to vary brightness
        xb = random_gamma_transform(xb, 1.0)
    if np.random.random() < 0.25:
        xb = blur(xb)
    if np.random.random() < 0.2:
        xb = add_noise(xb)

    return xb, yb

def creat_dataset(image_num = 100000, mode = 'original'):
    print('creating dataset...')
    image_each = image_num / len(image_sets)
    g_count = 0
    for i in tqdm(range(len(image_sets))):
        count = 0
        src_img = cv2.imread('./data/src/' + image_sets[i])  # 3 channels
        label_img = cv2.imread('./data/label/' + image_sets[i], cv2.IMREAD_GRAYSCALE)  # single channel
        X_height, X_width, _ = src_img.shape
        while count < image_each:
            # Pick the top-left corner of a random 256 x 256 crop
            random_width = random.randint(0, X_width - img_w - 1)
            random_height = random.randint(0, X_height - img_h - 1)
            src_roi = src_img[random_height: random_height + img_h, random_width: random_width + img_w, :]
            label_roi = label_img[random_height: random_height + img_h, random_width: random_width + img_w]
            if mode == 'augment':
                src_roi, label_roi = data_augment(src_roi, label_roi)

            # Scale labels 0..4 by 50 so the classes are visible as gray levels
            visualize = label_roi * 50

            cv2.imwrite(('./aug/train/visualize/%d.png' % g_count), visualize)
            cv2.imwrite(('./aug/train/src/%d.png' % g_count), src_roi)
            cv2.imwrite(('./aug/train/label/%d.png' % g_count), label_roi)
            count += 1
            g_count += 1
After the data augmentation above, we get a fairly large training set: 100,000 images of 256 × 256.
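One detail worth checking in any augmentation pipeline is that the label patches still contain only valid class ids afterwards. `cv2.warpAffine` interpolates bilinearly by default, which for label maps could in principle produce in-between values. For rotations by multiples of 90 degrees and for mirroring, a pure array operation avoids interpolation altogether. The following is a minimal sketch of that alternative, not the method used in this article; the names `rotate_pair` and `mirror_pair` and the random test data are assumptions made for the example.

```python
import numpy as np


def rotate_pair(image, label, quarter_turns):
    """Rotate image (H, W, C) and label (H, W) together by quarter_turns * 90 degrees."""
    return (np.rot90(image, k=quarter_turns, axes=(0, 1)),
            np.rot90(label, k=quarter_turns, axes=(0, 1)))


def mirror_pair(image, label):
    """Mirror both arrays about the vertical (y) axis, i.e. left to right."""
    return image[:, ::-1], label[:, ::-1]


# Hypothetical stand-ins for one 256 x 256 training patch and its label.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
label = rng.integers(0, 5, size=(256, 256), dtype=np.uint8)

rot_image, rot_label = rotate_pair(image, label, 1)
flip_image, flip_label = mirror_pair(image, label)

# The set of label values is unchanged because pixels are only moved.
assert set(np.unique(rot_label)) <= {0, 1, 2, 3, 4}
```

Because pixels are only rearranged, the image and its label stay aligned exactly and no border pixels are lost.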
Convolutional neural network
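For this kind of image semantic segmentation task there are many classic networks to choose from, such as FCN, U-Net, SegNet, DeepLab, RefineNet, Mask R-CNN, and HED Net. All of them are classic architectures that have been widely used in competitions, so we can pick one or two of them as the solution for our segmentation task. Given our group's situation, we chose U-Net and SegNet as our main networks for the experiments.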
SegNet
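SegNet has been around for several years. It is neither the newest nor the best-performing semantic segmentation network, but its structure is clear and easy to understand, and it trains quickly with few pitfalls, so we use it for this task as well. SegNet has an encoder-decoder structure, which is very elegant. It is worth noting that when SegNet is used for semantic segmentation, a CRF module is usually added at the end as post-processing, with the aim of further refining the segmentation along edges. If you want to dig deeper, see here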
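To make the encoder-decoder idea concrete, the short sketch below traces how the spatial size of a feature map changes through such a network. It assumes a 256 × 256 input and five 2 × 2 pooling stages, matching the patch size and the encoder used in this article, and it assumes a symmetric decoder with five 2 × 2 upsampling stages. The function name `trace_sizes` is made up for the example.

```python
def trace_sizes(input_size=256, stages=5, factor=2):
    """Return the feature-map side length after each encoder and decoder stage."""
    size = input_size
    encoder = []
    for _ in range(stages):
        if size % factor:
            raise ValueError("size %d is not divisible by %d" % (size, factor))
        size //= factor          # each 2x2 max pooling halves the side length
        encoder.append(size)

    decoder = []
    for _ in range(stages):
        size *= factor           # each 2x2 upsampling doubles it again
        decoder.append(size)
    return encoder, decoder


encoder_sizes, decoder_sizes = trace_sizes()
print(encoder_sizes)  # [128, 64, 32, 16, 8]
print(decoder_sizes)  # [16, 32, 64, 128, 256]
```

The decoder ends at the input resolution, which is what allows the network to output one class prediction per input pixel.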
Now for the code. First we define the SegNet network structure.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, UpSampling2D, BatchNormalization

def SegNet():
    model = Sequential()
    #encoder
    # NOTE: input_shape=(3, img_w, img_h) is channels-first
    model.add(Conv2D(64,(3,3),strides=(1,1),input_shape=(3,img_w,img_h),padding='same',activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(64,(3,3),strides=(1,1),padding='same',activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2,2)))
    #(128,128)
    model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    #(64,64)
    model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    #(32,32)
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    #(16,16)
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D(pool_size=(2, 2)))
    #(8,8)
    #decoder
    model.add(UpSampling2D(size=(2,2)))
    #(16,16)
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(UpSampling2D(size=(2, 2)))
    #(32,32)
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(UpSampling2D(size=(2, 2)))
    #(64,64)
    model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu'))
    model.add(BatchNormalization())
    model.add(UpSampling2D(size=(2, 2)))
    #(128,128)
    model.add(Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation=