A First Look at Semantic Segmentation and FCN

This article is based on: Fully_Convolutional_Neural_Networks_rendered.ipynb

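I translated this exercise while studying it; I will take it down if that is inappropriate. If you repost it, please say where it came from.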

Semantic segmentation

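Before the exercise, some background on semantic segmentation. When I first came across the term, I assumed it belonged to natural language processing; after reading more, I learned it is a computer vision topic. So what is semantic segmentation? Despite the fancy name, it is simply classification at the pixel level: every pixel in an image is assigned a class.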

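Why do semantic segmentation? It is needed when precise object boundaries matter. In autonomous driving, for example, the system must identify which object each pixel belongs to. It is also used in medical image analysis.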

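How is data labeled for semantic segmentation? Segmentation results are usually shown as patches of colour, a patch of blue here and a patch of red there, which raises the question of how the training data is annotated. The colours exist only to make the results easy to display. The annotation itself is simple in principle: since the task is to classify every pixel in an image, every pixel must be given a label. Each class is also associated with a colour at annotation time, which makes later visualization convenient.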
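To make the link between per-pixel labels and display colours concrete, here is a minimal sketch. The three classes, their ids, and the colours are made up for illustration; real datasets define their own palettes.

```python
import numpy as np

# Hypothetical 3-class setup: 0 = background, 1 = road, 2 = car.
# The palette maps each class id to an RGB colour used only for display.
palette = np.array([
    [0, 0, 0],      # background -> black
    [0, 0, 255],    # road -> blue
    [255, 0, 0],    # car -> red
], dtype=np.uint8)

# A tiny 2x3 annotation: one integer class id per pixel.
label_mask = np.array([
    [0, 1, 1],
    [2, 2, 1],
])

# Indexing the palette with the mask yields an RGB image for display.
color_image = palette[label_mask]
print(color_image.shape)  # (2, 3, 3)
```

The model is trained on the integer mask; the coloured image is only a rendering of it.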

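Exercise goals

Load a model pre-trained on ImageNet: ResNet50
Convert the ResNet50 model into a fully convolutional network
Apply the network to images to perform weak segmentation, using heatmaps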

%matplotlib inline
import warnings

import numpy as np
from scipy.misc import imread as scipy_imread, imresize as scipy_imresize
import matplotlib.pyplot as plt

np.random.seed(2)

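# Wrapper functions to silence some warnings.
# Note: scipy.misc.imread and scipy.misc.imresize were deprecated in SciPy 1.0
# and removed in later releases, so this code needs an older SciPy (with Pillow).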

def imread(*args, **kwargs):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return scipy_imread(*args, **kwargs)
    
def imresize(*args, **kwargs):
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return scipy_imresize(*args, **kwargs)

# Load ResNet50

from keras.applications.resnet50 import ResNet50

base_model = ResNet50(include_top=False)

print(base_model.output_shape)

(None, None, None, 2048)

res5c = base_model.layers[-1]
type(res5c)

keras.layers.core.Activation

res5c.output_shape

(None, None, None, 2048)

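Fully convolutional ResNet

The last residual block (res5c) outputs a tensor of shape W x H x 2048.
For the default ImageNet input size of 224x224, the output is 7x7x2048.

The classification head of a regular ResNet looks like this: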

x = base_model.output
x = Flatten()(x)
x = Dense(1000)(x)
x = Activation('softmax')(x)

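In the modified version:
- We want to retrieve the label information stored in the Dense layer. We will load these weights later.
- We replace the Dense layer with a Convolution2D layer to preserve spatial information; the output is W x H x 1000.
- For the new Convolution2D layer we can use a (1, 1) kernel, which leaves the spatial organization of the previous layer unchanged (this is called a pointwise convolution).
- We want to apply the softmax over the last dimension only, so that the spatial information is preserved.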

# A custom softmax layer

import keras
from keras.engine import Layer
import keras.backend as K

class SoftmaxMap(Layer):
    # Constructor
    def __init__(self, axis=-1, **kwargs):
        self.axis = axis
        super(SoftmaxMap, self).__init__(**kwargs)
        
    # The layer has no parameters, so build has nothing to do
    def build(self, input_shape):
        pass
    
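    # This is the part we care about.
    # It is very similar to a regular softmax, with one extra point to note:
    # the input x has shape x.shape == (batch_size, w, h, n_classes),
    # which the Keras version used here does not handle by default.
    # Note that we subtract the max of the logits to make the softmax numerically more stable.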
    def call(self, x, mask=None):
        e = K.exp(x - K.max(x, axis=self.axis, keepdims=True))
        s = K.sum(e, axis=self.axis, keepdims=True)
        return e / s
    
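    # The output has the same shape as the input
    # (Keras 2 names this method compute_output_shape; Keras 1 called it get_output_shape_for)
    def compute_output_shape(self, input_shape):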
        return input_shape
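The same computation can be checked outside Keras. Below is a minimal NumPy sketch of a softmax over the last axis; `softmax_map` and the input sizes are my own choices for illustration. The inputs are shifted by 1000 on purpose, so that a naive `exp(x)` would overflow while the max-subtracted version stays finite.

```python
import numpy as np

def softmax_map(x, axis=-1):
    """Numerically stable softmax along one axis of an array."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

# Hypothetical score map: batch of 1, 2x3 spatial grid, 4 classes.
rng = np.random.RandomState(0)
scores = rng.randn(1, 2, 3, 4) + 1000.0  # np.exp(scores) alone would give inf

probas = softmax_map(scores)
print(probas.shape)                            # (1, 2, 3, 4)
print(np.allclose(probas.sum(axis=-1), 1.0))   # True
```

Each spatial location gets its own probability distribution over the classes, which is exactly what the per-location heatmaps later rely on.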

# Turn the ResNet into a fully convolutional network

from keras.layers import Convolution2D
from keras.models import Model

input = base_model.layers[0].input

# Take the output of the last convolutional block
x = base_model.layers[-1].output

# Kernel size (1, 1), 1000 output channels
x = Convolution2D(1000, (1, 1), name='conv1000')(x)

output = SoftmaxMap(axis=-1)(x)

fully_conv_ResNet = Model(inputs=input, outputs=output)

We can check that a forward pass runs on a random RGB image:

prediction_maps = fully_conv_ResNet.predict(np.random.randn(1, 200, 300, 3))
prediction_maps.shape

(1, 7, 10, 1000)
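A quick sanity check on that shape: ResNet50 downsamples its input by an overall factor of 32, so a 200x300 input giving a 7x10 map suggests that each dimension is divided by 32 and rounded up. The helper below is a rule-of-thumb sketch; `expected_map_size` is my own name, and the round-up rule is an assumption inferred from the shapes printed in this article rather than taken from the Keras documentation.

```python
import math

def expected_map_size(height, width, stride=32):
    """Spatial size of the output map, assuming a total stride of 32
    with rounding up (an assumption that matches the shapes in this article)."""
    return math.ceil(height / stride), math.ceil(width / stride)

print(expected_map_size(224, 224))  # (7, 7)
print(expected_map_size(200, 300))  # (7, 10)
```

Each of the 7x10 output locations therefore summarizes a region of the input image rather than a single pixel, which is why the resulting segmentation is coarse.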

How should we interpret the final output shape? The class probabilities should sum to 1 at each location of the output map:

prediction_maps.sum(axis=-1)

array([[[1.        , 1.        , 1.        , 1.        , 0.99999994,
         1.        , 0.99999994, 0.9999999 , 1.        , 1.        ],
        [0.99999994, 1.        , 1.        , 1.        , 1.        ,
         0.99999994, 1.        , 1.        , 1.        , 1.        ],
        [1.        , 1.0000001 , 1.        , 1.        , 1.        ,
         0.99999994, 0.99999994, 0.99999994, 1.        , 1.0000001 ],
        [1.0000001 , 1.        , 1.        , 1.        , 0.99999994,
         1.0000001 , 1.0000001 , 1.        , 1.        , 1.        ],
        [0.99999994, 1.        , 1.        , 1.        , 1.        ,
         1.        , 1.        , 1.0000001 , 0.99999994, 1.        ],
        [1.0000001 , 1.        , 0.99999994, 1.        , 1.        ,
         1.        , 1.        , 0.99999994, 1.        , 1.        ],
        [1.        , 1.        , 1.        , 1.        , 1.        ,
         1.        , 1.        , 1.        , 1.        , 1.        ]]],
      dtype=float32)

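Loading the weights

The weights and biases of the last Dense layer of ResNet50 are provided in the file weights_dense.h5.
The last layer is now a 1x1 convolution rather than a fully connected layer.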

import h5py

with h5py.File('weights_dense.h5', 'r') as h5f:
    w = h5f['w'][:]
    b = h5f['b'][:]

last_layer = fully_conv_ResNet.layers[-2]
last_layer.get_weights()[0].shape

(1, 1, 2048, 1000)

last_layer = fully_conv_ResNet.layers[-2]

print("Shape of the loaded weights:", w.shape)
print("Shape of the last conv layer weights:", last_layer.get_weights()[0].shape)

Shape of the loaded weights: (2048, 1000)
Shape of the last conv layer weights: (1, 1, 2048, 1000)

# Reshape the Dense weights into a 1x1 convolution kernel
w_reshaped = w.reshape((1, 1, 2048, 1000))

# Set the weights of the convolutional layer
last_layer.set_weights([w_reshaped, b])
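Why is a plain reshape enough? A 1x1 convolution applies the same matrix multiplication at every spatial location, so the Dense kernel can be reused as is. The NumPy sketch below checks this on small stand-in sizes (8 input channels and 5 classes instead of 2048 and 1000; the random feature map is hypothetical).

```python
import numpy as np

rng = np.random.RandomState(0)

# Small stand-in sizes (the article uses 2048 input channels and 1000 classes).
n_in, n_out = 8, 5
w = rng.randn(n_in, n_out)   # Dense kernel, shape (n_in, n_out)
b = rng.randn(n_out)         # Dense bias, shape (n_out,)

# Same numbers, viewed as a 1x1 convolution kernel.
w_conv = w.reshape((1, 1, n_in, n_out))

# Hypothetical feature map: batch of 1, 3x4 spatial grid, n_in channels.
features = rng.randn(1, 3, 4, n_in)

# Dense layer applied independently at every spatial location.
dense_out = features @ w + b

# 1x1 convolution written out explicitly with einsum.
conv_out = np.einsum('bhwc,co->bhwo', features, w_conv[0, 0]) + b

print(conv_out.shape)                    # (1, 3, 4, 5)
print(np.allclose(dense_out, conv_out))  # True
```

The reshape only adds two leading axes of size 1, so no weight is moved or reordered.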

Forward pass

We define the following function to test our network.
It resizes the input to the given size, then computes the output with model.predict.

from keras.applications.imagenet_utils import preprocess_input

def forward_pass_resize(img_path, img_size):
    img_raw = imread(img_path)
    print('Image shape before resizing: %s' %(img_raw.shape,))
    img = imresize(img_raw, size=img_size).astype("float32")
    img = preprocess_input(img[np.newaxis])
    print("Image batch size shape before forward pass:", img.shape)
    z = fully_conv_ResNet.predict(img)
    return z

output = forward_pass_resize("dog.jpg", (800, 600))
print("prediction map shape:", output.shape)

Image shape before resizing: (1600, 2560, 3)
Image batch size shape before forward pass: (1, 800, 600, 3)
prediction map shape: (1, 25, 19, 1000)

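Finding the dog-related classes

ImageNet uses an ontology of concepts, from which the classes are derived. A synset corresponds to a node in the ontology.

For example, all dog breeds are children of the synset n02084071 (dog, domestic dog, Canis familiaris):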

# Helper file importing synsets from imagenet
import imagenet_tool

synset = "n02084071"  # synset for dogs
ids = imagenet_tool.synset_to_dfs_ids(synset)
print("All dog classes ids (%d):" % len(ids))
print(ids)

All dog classes ids (118):
[251, 268, 256, 253, 255, 254, 257, 159, 211, 210, 212, 214, 213, 216, 215, 219, 220, 221, 217, 218, 207, 209, 206, 205, 208, 193, 202, 194, 191, 204, 187, 203, 185, 192, 183, 199, 195, 181, 184, 201, 186, 200, 182, 188, 189, 190, 197, 196, 198, 179, 180, 177, 178, 175, 163, 174, 176, 160, 162, 161, 164, 168, 173, 170, 169, 165, 166, 167, 172, 171, 264, 263, 266, 265, 267, 262, 246, 242, 243, 248, 247, 229, 233, 234, 228, 231, 232, 230, 227, 226, 235, 225, 224, 223, 222, 236, 252, 237, 250, 249, 241, 239, 238, 240, 244, 245, 259, 261, 260, 258, 154, 153, 158, 152, 155, 151, 157, 156]
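The source of `imagenet_tool` is not shown here, but its name suggests that `synset_to_dfs_ids` walks the ontology depth first and collects the class ids found under a node. Here is a guess at that idea on a toy hierarchy; the synset names, class ids, and the `dfs_ids` function are all made up, and none of this is real ImageNet data.

```python
# Toy hierarchy with made-up synset names and class ids (not real ImageNet data).
children = {
    "dog": ["hound", "terrier"],
    "hound": ["beagle", "basset"],
    "terrier": ["airedale"],
}
# Only some nodes correspond to one of the classifier's output classes.
class_id = {"beagle": 0, "basset": 1, "airedale": 2}

def dfs_ids(synset):
    """Collect class ids of a synset and all its descendants, depth first."""
    ids = []
    if synset in class_id:
        ids.append(class_id[synset])
    for child in children.get(synset, []):
        ids.extend(dfs_ids(child))
    return ids

print(dfs_ids("dog"))    # [0, 1, 2]
print(dfs_ids("hound"))  # [0, 1]
```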

for dog_id in ids[:10]:
    print(imagenet_tool.id_to_words(dog_id))
print('...')

dalmatian, coach dog, carriage dog
Mexican hairless
Newfoundland, Newfoundland dog
basenji
Leonberg
pug, pug-dog
Great Pyrenees
Rhodesian ridgeback
vizsla, Hungarian pointer
German short-haired pointer
...

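Dog heatmap

The helper below builds a heatmap from the output of a forward pass. It sums the predicted probabilities over all class ids that belong to the synset.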

def build_heatmap(z, synset):
    class_ids = imagenet_tool.synset_to_dfs_ids(synset)
    class_ids = np.array([id_ for id_ in class_ids if id_ is not None])
    x = z[0, :, :, class_ids].sum(axis=0)
    print("size of heatmap:" + str(x.shape))
    return x
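One detail in `build_heatmap` is easy to miss: in `z[0, :, :, class_ids]` the integer `0` and the array `class_ids` are separated by slices, so NumPy places the selected-class axis first in the result. That is why the sum runs over `axis=0`. A small sketch with hypothetical sizes:

```python
import numpy as np

# Hypothetical prediction map: batch of 1, 4x6 grid, 10 classes.
rng = np.random.RandomState(0)
z = rng.rand(1, 4, 6, 10)
class_ids = np.array([2, 5, 7])

selected = z[0, :, :, class_ids]
print(selected.shape)  # (3, 4, 6): the class axis comes first

heatmap = selected.sum(axis=0)
print(heatmap.shape)   # (4, 6)

# Equivalent and perhaps easier to read: slice the batch first, then sum over classes.
heatmap_alt = z[0][:, :, class_ids].sum(axis=-1)
print(np.allclose(heatmap, heatmap_alt))  # True
```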

def display_img_and_heatmap(img_path, heatmap):
    dog = imread(img_path)
    fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(12, 8))
    ax0.imshow(dog)
    ax0.axis('off')
    ax1.imshow(heatmap, interpolation='nearest', cmap="viridis")
    ax1.axis('off')

We now try 3 different input sizes:

(400, 640)
(800, 1280)
(1600, 2560)

# dog synset
s = "n02084071"

# size 1 - (400, 640)
probas_1 = forward_pass_resize("dog.jpg", (400, 640))
heatmap_1 = build_heatmap(probas_1, synset=s)
display_img_and_heatmap("dog.jpg", heatmap_1)

Image shape before resizing: (1600, 2560, 3)
Image batch size shape before forward pass: (1, 400, 640, 3)
size of heatmap:(13, 20)

# size 2 - (800, 1280)
probas_2 = forward_pass_resize("dog.jpg", (800, 1280))
heatmap_2 = build_heatmap(probas_2, synset=s)
display_img_and_heatmap("dog.jpg", heatmap_2)

Image shape before resizing: (1600, 2560, 3)
Image batch size shape before forward pass: (1, 800, 1280, 3)
size of heatmap:(25, 40)

# size 3 - (1600, 2560)
probas_3 = forward_pass_resize("dog.jpg", (1600, 2560))
heatmap_3 = build_heatmap(probas_3, synset=s)
display_img_and_heatmap("dog.jpg", heatmap_3)

Image shape before resizing: (1600, 2560, 3)
Image batch size shape before forward pass: (1, 1600, 2560, 3)
size of heatmap:(50, 80)

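We can see that heatmap_1 and heatmap_2 give coarser segmentations than heatmap_3. However, heatmap_3 shows small artifacts outside the dog region. heatmap_3 encodes more local, texture-level information about the dog, whereas the lower resolutions encode more semantic information about the object as a whole. Combining them is probably a good idea!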

heatmap_1_r = imresize(heatmap_1, (50,80)).astype("float32")
heatmap_2_r = imresize(heatmap_2, (50,80)).astype("float32")
heatmap_3_r = imresize(heatmap_3, (50,80)).astype("float32")

# Geometric mean of the three resized heatmaps
heatmap_geom_avg = np.power(heatmap_1_r * heatmap_2_r * heatmap_3_r, 1.0 / 3)
display_img_and_heatmap("dog.jpg", heatmap_geom_avg)
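Why a geometric mean rather than a plain average? A product-based mean stays low wherever any one of the maps is low, so a location has to be supported at every scale to stand out. The sketch below illustrates this on hypothetical 2x2 heatmaps with made-up values in [0, 1].

```python
import numpy as np

# Hypothetical heatmaps already resized to a common 2x2 grid, values in [0, 1].
h1 = np.array([[0.9, 0.1], [0.8, 0.0]])
h2 = np.array([[0.8, 0.2], [0.9, 0.9]])
h3 = np.array([[0.9, 0.9], [0.7, 0.9]])

stacked = np.stack([h1, h2, h3])

geom = np.prod(stacked, axis=0) ** (1.0 / 3)
arith = stacked.mean(axis=0)

# Where one map is 0, the geometric mean is 0 while the arithmetic mean is not.
print(geom[1, 1], arith[1, 1])
```

This can help suppress the small artifacts that appear only in the high-resolution map, at the cost of also suppressing true detections that a single scale misses.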