我的博客

1. 概述

随着深度学习的发展，越来越多的模型都采用更深的网络来训练自己的模型，那么模型网络层数越多真的就能训练出更好的模型吗？答案是否定的，何凯明等人在 ResNet 论文中提出，随着网络层数的加深，训练误差反而增加了，论文中叫做退化问题，论文中提出了残差块结构，将浅层的计算结果与深层的计算结果相加，解决了退化问题。同时随着网络层数的加深，还会导致梯度的爆炸和消失问题，论文中提出使用 Batch Normalization 批量归一化加快网络的训练速度。从下图可以明显看出 56 层网络的误差反而比 20 、 32 、 44 层网络更高。

退化问题

2. ResNet 模型

2.1 残差块

残差块结构如下图所示：

残差块

ResNet 网络根据不同的层数使用不同的残差块，较浅的网络使用下图左侧的残差块，较深的网络使用下图右侧的残差块。左边残差块在代码实现上叫 BasicBlock ，右边残差块在代码实现上叫 Bottleneck 。为什么较深网络采用 Bottleneck 残差块呢？因为 Bootleneck 残差块参数更少，更适合训练更深的网络。 BasicBlock 残差块参数个数： 3 x 3 x 64 x 64 + 3 x 3 x 64 x 64 = 73728 ； Bootleneck 残差块参数个数： 1 x 1 x 256 x 64 + 3 x 3 x 64 x 64 + 1 x 1 x 64 x 256 = 69632 。

2.2 网络结构

ResNet 网络结构就是根据不同的层数堆叠不同种类和个数的残差块， 18-layer 和 34-layer 采用 BasicBlock 结构， 50-layer 、 101-layer 、 152-layer 采用 Bottleneck 结构，如下图所示：

网络结构

3. 代码实现

BasicBlock 实现：

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)

        return out

BasicBlock 结构由两个 3x3 卷积组成，最后的输出要加上网络的输入，形成一个残差结构。注意 identity 可能需要进行 downsample 操作，从 ResNet 网络的结构图可以看出，除了 conv2_x 之外， conv3_x 、 conv4_x 、 conv5_x 都要 downsample ， downsample 的位置在进入该组残差网络之前，具体实现看 ResNet 类中的 _make_layer 函数。 BasicBlock 结构中的 expansion 属性值为 1 ，因为 BasicBlock 结构的输入和输出的通道数是一样的。

Bottleneck 实现：

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_channel, out_channel, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=1, stride=1, bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(out_channel)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channel)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel*self.expansion,
                               kernel_size=1, stride=1, bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel*self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out

Bottleneck 结构由三个卷积层组成，分别是 1x1 ， 3x3 ， 1x1 ；最后的输出要加上网络的输入，形成一个残差结构。注意 identity 可能需要 downsample 操作，从 ResNet 网络的结构图可以看出，除了 conv2_x 之外， conv3_x 、 conv4_x 、 conv5_x 都要 downsample ， downsample 的位置在进入该组残差网络之前，具体实现看 ResNet 类中的 _make_layer 函数。 Bottleneck 结构中的 expansion 属性值为 4 ，因为 Bottleneck 结构的输出是输入通道数的 4 倍。

ResNet 网络实现：

class ResNet(nn.Module):

    def __init__(self, block, blocks_num, num_classes=1000, include_top=True):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64

        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x

ResNet 网络开头是一个卷积核大小为 7x7 , 步长为 2 ，填充为 3 ，卷积核个数为64的卷积层；然后对输出进行 Batch Normalization 处理；紧接着用 relu 函数激活；然后对输出进行 Maxpooling 处理；后面根据网络深度接 BasicBlock 模块或 Bottleneck 模块。 BasicBlock 模块和 Bottleneck 模块是由 _make_layer 函数创建的。

def _make_layer(self, block, channel, block_num, stride=1):
    downsample = None
    if stride != 1 or self.in_channel != channel * block.expansion:
        downsample = nn.Sequential(
            nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(channel * block.expansion))

    layers = []
    layers.append(block(self.in_channel, channel, downsample=downsample, stride=stride))
    self.in_channel = channel * block.expansion

    for _ in range(1, block_num):
        layers.append(block(self.in_channel, channel))

    return nn.Sequential(*layers)

首先说明一下 downsample 层的作用，该层用于残差层，它主要有两个作用，一个是减半特征图的大小，另一个是改变通道的维度。网络中要想实现输入和输出进行残差计算，那么就要保证输入和输出的张量大小是相同的，因为网络会进行下采样(特征图大小减半)操作，并且 Bottleneck 结构的通道维度输出是输入的 4 倍，所以网络中需要 downsample 层进行下采样和通道维度的变化。

_make_layer 函数生成 conv2_x 、 conv3_x 、 conv4_x 、 conv5_x 卷积网络。每个卷积网络是由多个 BasicBlock 模块或 Bottleneck 模块组成的，这里我们把这些 BasicBlock 模块或 Bottleneck 模块简称为 block ，第一个 block 可能需要 downsample 层。

对于 ResNet-50 、 ResNet-101 、 ResNet-152 的网络，网络中使用的是 BasicBlock 模块， conv2_x 网络即不需要减半特征图大小也不需要改变通道维度; conv3_x 、 conv4_x 、 conv5_x 网络需要减半特征图大小和改变通道维度，其中这些网络结构的第一个 block 要将前一个网络输出的特征图大小减半，具体操作是 block 结构的 3x3 卷积的 stride 设为 2 ，同时这个 block 的残差边的 stride 也要设为 2 ，而这些网络的通道维度也是在第一个 block 结构中进行升维，升维时 block 的主分支和残差边分支都要进行升维。

对于 ResNet-18 和 ResNet-34 的网络，网络中使用的是 Bottleneck 模块， conv2_x 网络不需要减半特征图大小，但需要改变通道维度，该网络结构的第一个 block 结构中输入维度时 64 ，输出维度是 256 ，所以第一个 block 的残差边要进行升维操作； conv3_x 、 conv4_x 、 conv5_x 网络需要减半特征图大小和改变通道维度，这些网络结构的第一个 block 输出维度是输入维度的 4 倍，所以残差边要进行升维，同时第一个 block 要将前一个网络输出的特征图大小减半，所以 3x3 卷积的 stride 设为 2 ，残差边的 stride 也要设为 2 。

残差网络中使用 downsample 的条件如下代码所示，首先，当 stride = 2 时，从 ResNet 的 init 函数可以看出后三组残差网络的 stride = 2 需要下采样将 featrue map 的大小降为原来的 2 倍；其次，当 self.in_channel != channel * block.expansion 时，这个条件对于 BasicBlock 结构中每组残差网络之间满足该条件；对于 Bottleneck 结构中组内和组间都满足该条件。

if stride != 1 or self.in_channel != channel * block.expansion:

ResNet 示例：

def resnet34(num_classes=1000, include_top=True):
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)

def resnet101(num_classes=1000, include_top=True):
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes, include_top=include_top)

34 层网络使用 BasicBlock 网络结构，列表 [3, 4, 6, 3] 表示每组残差块的个数，总共四组残差块。 101 层网络使用 Bottleneck 网络结构，列表 [3, 4, 23, 3] 表示每组残差块的个数，总共四组残差块。

深度学习之(十一)ResNet 网络

1. 概述

2. ResNet 模型

2.1 残差块

2.2 网络结构

3. 代码实现

CATALOG

FEATURED TAGS