我的博客

1. 概述

本篇文章开始介绍 SSD 网络， SSD ，是 Single Shot MultiBox Detector 的简称，是一种 one-stage 的多尺度目标检测网络，相比较 Faster-RCNN 这种 two-stage 的网络来说，模型检测速度快，并且由于在不同特征尺度上预测不同尺度的目标，对小目标的检测效果明显提升。另外， SSD 的预测网络使用的是卷积层，而非全链接层。

SSD 网络我打算从主干网络开始写，因为数据加载根 Faster-RCNN 网络是一样的都是使用的是 VOC 数据集。

本章首先介绍下 SSD 网络的主干网络，我看的这个版本的 SSD 算法中使用的是 resnet 网络作为其主干网络，作者对其进行了调整，并在其后增加了一些预测特征层， resnet 我之前的博客中也写了。废话不多说，直接进入正题。

2. SSD 网络

前文提到， SSD 网络使用的是经过修改后的 resnet 网络，并在其后增加了 5 个预测特征层。下面我们直接先看主干网络对 resnet 的修改。

self.feature_extractor: 主干特征提取网络只保留到第 3 组残差网络，后面网络丢弃。
self.feature_extractor[-1][0]: 将第 3 组残差网络的第一个 Bottleneck 结构中的 conv1 ， conv2 ， downsample ，中的卷积网络步距改为 (1,1) ，即不进行下采样。

self.out_channels: 表示 6 个预测特征层的通道数，其中 1024 就是修改后的主干网络的输出通道数，其余 5 个分别是新增的 5 个预测特征层的通道数

class Backbone(nn.Module):
  def __init__(self, pretrain_path=None):
      super(Backbone, self).__init__()
      net = resnet50()
      self.out_channels = [1024, 512, 512, 256, 256, 256]

      if pretrain_path is not None:
          net.load_state_dict(torch.load(pretrain_path))

      self.feature_extractor = nn.Sequential(*list(net.children())[:7])

      conv4_block1 = self.feature_extractor[-1][0]

      # 修改conv4_block1的步距，从2->1
      conv4_block1.conv1.stride = (1, 1)
      conv4_block1.conv2.stride = (1, 1)
      conv4_block1.downsample[0].stride = (1, 1)

  def forward(self, x):
      x = self.feature_extractor(x)
      return x

下面看下 SSD300 类，该类主要是在主干网络的基础上增加 5 层额外的预测特征层。

self._build_additional_features: 增加 5 层额外的预测特征层。
self.num_defaults: 保存每个预测特征层每个像素点生成 DefaultBox 的数量
self.loc: 保存各个预测特征层的边界框预测网络，输出 DefaultBox num * 4
self.conf: 保存各个预测特征层的类别预测网络，输出 DefaultBox num * self.num_classes
self._init_weights: 初始化参数
default_box: 生成 DefaultBox
self.compute_loss: 损失函数
self.encoder: 编解码函数，用来计算边界框的损失和根据边界框回归调整 Default Box 得到真正的边界框

self.postprocess: 通过预测的边界框参数得到最终的预测坐标

class SSD300(nn.Module):
  def __init__(self, backbone=None, num_classes=21):
      super(SSD300, self).__init__()
      if backbone is None:
          raise Exception("backbone is None")
      if not hasattr(backbone, "out_channels"):
          raise Exception("the backbone not has attribute: out_channel")
      self.feature_extractor = backbone

      self.num_classes = num_classes
      # out_channels = [1024, 512, 512, 256, 256, 256] for resnet50
      self._build_additional_features(self.feature_extractor.out_channels)
      self.num_defaults = [4, 6, 6, 6, 4, 4]
      location_extractors = []
      confidence_extractors = []

      # out_channels = [1024, 512, 512, 256, 256, 256] for resnet50
      for nd, oc in zip(self.num_defaults, self.feature_extractor.out_channels):
          # nd is number_default_boxes, oc is output_channel
          location_extractors.append(nn.Conv2d(oc, nd * 4, kernel_size=3, padding=1))
          confidence_extractors.append(nn.Conv2d(oc, nd * self.num_classes, kernel_size=3, padding=1))

      self.loc = nn.ModuleList(location_extractors)
      self.conf = nn.ModuleList(confidence_extractors)
      self._init_weights()

      default_box = dboxes300_coco()
      self.compute_loss = Loss(default_box)
      self.encoder = Encoder(default_box)
      self.postprocess = PostProcess(default_box)

增加 5 层额外的预测特征层在下面函数中实现。

input_size: 将 self.feature_extractor.out_channels 的值赋给 input_size ，前一层的输出维度是后一层的输入维度，所以 input_size[:-1] 表示额外增加的 5 层预测特征层的输入， input_size[1:] 表示输出。

这 5 层预测特征层的网络结构是， Conv2d ， Batchorm2d ， ReLU , Conv2d ， BatchNorm2d ， ReLU 。第二个 Conv2d 网络的 padding 和 stride 对于前 3 个预测特征层是 (1, 2) ，后两个是 (0, 1)

def _build_additional_features(self, input_size):

  additional_blocks = []
  # input_size = [1024, 512, 512, 256, 256, 256] for resnet50
  middle_channels = [256, 256, 128, 128, 128]
  for i, (input_ch, output_ch, middle_ch) in enumerate(zip(input_size[:-1], input_size[1:], middle_channels)):
      padding, stride = (1, 2) if i < 3 else (0, 1)
      layer = nn.Sequential(
          nn.Conv2d(input_ch, middle_ch, kernel_size=1, bias=False),
          nn.BatchNorm2d(middle_ch),
          nn.ReLU(inplace=True),
          nn.Conv2d(middle_ch, output_ch, kernel_size=3, padding=padding, stride=stride, bias=False),
          nn.BatchNorm2d(output_ch),
          nn.ReLU(inplace=True),
      )
      additional_blocks.append(layer)
  self.additional_blocks = nn.ModuleList(additional_blocks)

至此， SSD 网络的主干特征网络就介绍到这里了，有了前面 Faster-RCNN 基础，学习 SSD 还是相对容易不少。

SSD源代码分析(一) 主干网络

1. 概述

2. SSD 网络

CATALOG

FEATURED TAGS