YOLO 源代码分析(四) YOLOv3 网络训练 -

我的博客

这篇文章介绍 YOLOv3 的网络训练过程。主要介绍网络的正负样本的选取和损失函数计算。

训练模型的代码在 train.py 文件中。这个文件大部分内容前面三篇文章都介绍过了，这里只讲解之前每提到的内容。

下面的 accumulate 是模拟一个更大的 batchsize 来进行梯度下降，一定条件下， batchsize 越大训练效果越好，梯度累加则实现了 batchsize 的变相扩大，如果 accumulate 为 8 ，则 batchsize ‘变相’ 扩大了 8 倍。

accumulate = max(round(64 / batch_size), 1)

下面代码是使用多尺度训练。

gs: 首先确保网络输入的图像是 32 的倍数

multi_scale: 多尺度训练，就是在训练过程中不断的改变训练数据集的大小，增加模型的鲁棒性，每训练 accumulate 个 batch 后，修改图像训练图像的大小，图像大小在 (grid_min, grid_max) 之间，并且都是 32 的倍数

gs = 32  # (pixels) grid size
assert math.fmod(imgsz_test, gs) == 0, "--img-size %g must be a %g-multiple" % (imgsz_test, gs)
grid_min, grid_max = imgsz_test // gs, imgsz_test // gs
if multi_scale:
  imgsz_min = opt.img_size // 1.5
  imgsz_max = opt.img_size // 0.667

  grid_min, grid_max = imgsz_min // gs, imgsz_max // gs
  imgsz_min, imgsz_max = int(grid_min * gs), int(grid_max * gs)
  imgsz_train = imgsz_max  # initialize with max size
  print("Using multi_scale training, image range[{}, {}]".format(imgsz_min, imgsz_max))

然后我们直接进入 train_one_epoch 函数，这个函数用来训练一个 epoch 。

第一个 epoch 使用 Warmup 训练模式。为什么要使用 Warmup 训练模式呢？由于刚开始训练时，模型的权重 (weights) 是随机初始化的，此时若选择一个较大的学习率，可能带来模型的不稳定，选择 Warmup 预热学习率的方式，可以使得开始训练的几个 epoch 或者一些 step 内学习率较小，在预热的小学习率下，模型可以慢慢趋于稳定，等模型相对稳定后在选择预先设置的学习率进行训练，使得模型收敛速度变得更快，模型效果更佳。

if epoch == 0 and warmup is True:  # 当训练第一轮（epoch=0）时，启用warmup训练方式，可理解为热身训练
    warmup_factor = 1.0 / 1000
    warmup_iters = min(1000, len(data_loader) - 1)

    lr_scheduler = utils.warmup_lr_scheduler(optimizer, warmup_iters, warmup_factor)
    accumulate = 1

每训练 accumulate 个 batch 就随机修改一次输入图片大小，由于 label 已转为相对坐标，故缩放图片不影响 label 的值。

img_size: 在给定最大最小输入尺寸范围内随机选取一个 size (size 为 32 的整数倍)

如果图片最大边长不等于 img_size , 则缩放图片，并将长和宽调整到 32 的整数倍

if multi_scale:
  if ni % accumulate == 0:  #  adjust img_size (67% - 150%) every 1 batch
      img_size = random.randrange(grid_min, grid_max + 1) * gs
  sf = img_size / max(imgs.shape[2:])  # scale factor

  if sf != 1:
      # gs: (pixels) grid size
      ns = [math.ceil(x * sf / gs) * gs for x in imgs.shape[2:]]  # new shape (stretched to 32-multiple)
      imgs = F.interpolate(imgs, size=ns, mode='bilinear', align_corners=False)

每训练 accumulate 个 batch 更新一次权重。

if ni % accumulate == 0:
    scaler.step(optimizer)
    scaler.update()
    optimizer.zero_grad()

下面看下损失函数的计算。先看 build_targets 函数，该函数用来划分正负样本。

nt: 记录当前 batch 中有多少个目标，目标的 shape 是： (image,class,x,y,w,h)
tcls, tbox, indices, anch: 最后解释
gain: 特征图的大小
multi_gpu: 是否使用多 GPU
遍历每一个 YOLO 层
anchors: 获取该 yolo predictor 对应的 anchors
na: anchor 的个数，每个 yolo 层有 3 个大小的 anchor
at: 对 anchor 进行维度上的变换，变换过程如下： [3] -> [3, 1] -> [3, nt] ，比如 nt=14 表示当前 batch 有 14 个目标， at 转化成 (3,14) 的 tensor ，表示 3 个 anchor 和 14 个目标
接下来是为每一个目标匹配 anchor
t=targets * gain ， targets 为目标边界框，被缩放到图片的相对尺寸下， gain 为 feature map 的大小， targets * gain 就是把目标边界框映射到 featrue map 上
wh_iou: 计算 anchor 与目标的 iou ，这里计算 IOU 的方法不同于以往我见到过的，以往都是传入左上角和右下角两个坐标，而 wh_iou 只传入了宽和高。这里我查了很多资料，也思考了很久，说一下我的理解：训练时，目标的中心点坐标落在哪个 cell 里，哪个 cell 就负责预测这个目标， yolov3 有三个预测特征层，分别预测不同大小的目标，每个预测特征层的每个 cell 有 3 个 anchor ，那到底用哪个 anchor 进行预测呢？这句代码就是为一个 batch 中的每个目标选取合适的 anchor ，这里的合适就是将 anchor 和目标的左上角对齐计算 iou ，大于阈值(0.2) 则让这个 anchor 负责预测这个目标。为什么这样选？我的理解是， anchor 是在固定位置设置的候选框，需要通过网络预测的边界框参数进行调整才能得到最终的预测框。只有 anchor 和目标的 iou 大于设定的这个阈值时，才有可能通过预测的边界框回归参数将其调整到目标的大小。这里在进行筛选的时候，会出现一个目标由多个 anchor 进行预测的情况，这个问题在测试阶段非极大值抑制会过滤掉重复的目标，在训练阶段，网络只管让负责预测这个目标的 anchor 尽可能的接近目标。
j: 保存 anchor 与目标的 iou 大于 model.hyp[‘iou_t’] 的 bool 索引，shape 为 [3, nt] ，其中为 True 的是符合的 anchor 。
a, t = at[j], t.repeat(na, 1, 1)[j]: 获取 iou 大于阈值的 anchor 与 target 对应信息。t.repeat(na, 1, 1): [nt, 6] -> [3, nt, 6]
b, c: b 表示图片在 batch中的索引， c 表示类别
gxy: 获取目标的中心点坐标
gij: 获取目标中心点坐标落在哪个 cell 上， gjj 表示 cell 的左上角坐标
gi, gj: 获取 cell 的左上角坐标
indices: 保存 image 、 anchor 、 grid 的索引
tbox: 相对于 cell 左上角的偏移和 gwh
anch: 保存 anchor

tcls: 保存 class

def build_targets(p, targets, model):
  nt = targets.shape[0]
  tcls, tbox, indices, anch = [], [], [], []
  gain = torch.ones(6, device=targets.device)  # normalized to gridspace gain

  multi_gpu = type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel)
  for i, j in enumerate(model.yolo_layers):  # [89, 101, 113]
      anchors = model.module.module_list[j].anchor_vec if multi_gpu else model.module_list[j].anchor_vec
      gain[2:] = torch.tensor(p[i].shape)[[3, 2, 3, 2]]  # xyxy gain
      na = anchors.shape[0]  # number of anchors
      at = torch.arange(na).view(na, 1).repeat(1, nt)  # anchor tensor, same as .repeat_interleave(nt)

      # Match targets to anchors
      a, t, offsets = [], targets * gain, 0
      if nt:  # 如果存在target的话
          j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']
          a, t = at[j], t.repeat(na, 1, 1)[j]  # filter

      b, c = t[:, :2].long().T  # image, class
      gxy = t[:, 2:4]  # grid xy
      gwh = t[:, 4:6]  # grid wh
      gij = (gxy - offsets).long()  # 匹配targets所在的grid cell左上角坐标
      gi, gj = gij.T  # grid xy indices

      # Append
      indices.append((b, a, gj, gi))  # image, anchor, grid indices(x, y)
      tbox.append(torch.cat((gxy - gij, gwh), 1))  # gt box相对anchor的x,y偏移量以及w,h
      anch.append(anchors[a])  # anchors
      tcls.append(c)  # class
      if c.shape[0]:  # if any targets
          # 目标的标签数值不能大于给定的目标类别数
          assert c.max() < model.nc, 'Model accepts %g classes labeled from 0-%g, however you labelled a class %g. ' \
                                     'See https://github.com/ultralytics/yolov3/wiki/Train-Custom-Data' % (
                                         model.nc, model.nc - 1, c.max())

  return tcls, tbox, indices, anch

损失函数。

类别和置信度损失使用二值交叉熵损失函数
smooth_BCE: 下面介绍
FocalLoss: 下面介绍
定位损失使用 GIou 损失函数
p 存储 yolo 的输出，遍历每一个 yolo 的输出层， shape 是： [bs, anchor, grid, grid, xywh + obj + classes]
ps = pi[b, a, gj, gi]: 对应匹配到正样本的预测信息
用匹配到的预测信息与真实标签计算损失，先计算 GIou 损失
tobj: 置信度损失，当前框有目标的概率乘以 bounding box 和 ground truth 的 IoU 的结果
lcls: 类别损失

最后将各类损失乘以各自的权重

def compute_loss(p, targets, model):  # predictions, targets, model
  device = p[0].device
  lcls = torch.zeros(1, device=device)  # Tensor(0)
  lbox = torch.zeros(1, device=device)  # Tensor(0)
  lobj = torch.zeros(1, device=device)  # Tensor(0)
  tcls, tbox, indices, anchors = build_targets(p, targets, model)  # targets
  h = model.hyp  # hyperparameters
  red = 'mean'  # Loss reduction (sum or mean)

  # Define criteria
  BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device), reduction=red)
  BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device), reduction=red)

  # class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3
  cp, cn = smooth_BCE(eps=0.0)

  # focal loss
  g = h['fl_gamma']  # focal loss gamma
  if g > 0:
      BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)

  # per output
  nt = 0  # targets
  for i, pi in enumerate(p):  # layer index, layer predictions
      b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx
      tobj = torch.zeros_like(pi[..., 0], device=device)  # target obj

      nb = b.shape[0]  # number of targets
      if nb:
          nt += nb  # cumulative targets
          ps = pi[b, a, gj, gi]  # prediction subset corresponding to targets

          # GIoU
          pxy = ps[:, :2].sigmoid()
          pwh = ps[:, 2:4].exp().clamp(max=1E3) * anchors[i]
          pbox = torch.cat((pxy, pwh), 1)  # predicted box
          giou = bbox_iou(pbox.t(), tbox[i], x1y1x2y2=False, GIoU=True)  # giou(prediction, target)
          lbox += (1.0 - giou).mean()  # giou loss

          # Obj
          tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * giou.detach().clamp(0).type(tobj.dtype)  # giou ratio

          # Class
          if model.nc > 1:  # cls loss (only if multiple classes)
              t = torch.full_like(ps[:, 5:], cn, device=device)  # targets
              t[range(nb), tcls[i]] = cp
              lcls += BCEcls(ps[:, 5:], t)  # BCE

      lobj += BCEobj(pi[..., 4], tobj)  # obj loss

  lbox *= h['giou']
  lobj *= h['obj']
  lcls *= h['cls']

  # loss = lbox + lobj + lcls
  return {"box_loss": lbox, "obj_loss": lobj, "class_loss": lcls}

下面这段代码提供了 label smoothing 的功能，只能用在多类问题， ultralytics 版 YOLOv3 使用的二分类进行损失计算，所以实际使用中代码中传入的参数为 0 。

def smooth_BCE(eps=0.1):
    # return positive, negative label smoothing BCE targets
    return 1.0 - 0.5 * eps, 0.5 * eps

这里介绍下为什么要引入 label smoothing 。交叉熵损失函数在多分类任务中存在如下问题，训练神经网络时，最小化预测概率和标签真实概率之间的交叉熵，从而得到最优的预测概率分布。神经网络会促使自身往正确标签和错误标签差值最大的方向学习，在训练数据较少，不足以表征所有的样本特征的情况下，会导致网络过拟合。 label smoothing 可以解决上述问题，这是一种正则化策略，主要是通过 soft one-hot 来加入噪声，减少了真实样本标签的类别在计算损失函数时的权重，最终起到抑制过拟合的效果。

下面看下 FocalLoss 的实现。通过增加 gamma 和 alpha 两个参数提高模型对难分和易分样本极度不平衡问题。

reduction: 该参数共有三种选项 mean，sum 和 none 。 mean 为默认情况，表明对 N 个样本的 loss 进行求平均之后返回； sum 指对 N 个样本的 loss 求和； none 表示直接返回 N 分样本的 loss 。

class FocalLoss(nn.Module):
  # Wraps focal loss around existing loss_fcn(), i.e. criteria = FocalLoss(nn.BCEWithLogitsLoss(), gamma=1.5)
  def __init__(self, loss_fcn, gamma=1.5, alpha=0.25):
      super(FocalLoss, self).__init__()
      self.loss_fcn = loss_fcn  # must be nn.BCEWithLogitsLoss()
      self.gamma = gamma
      self.alpha = alpha
      self.reduction = loss_fcn.reduction
      self.loss_fcn.reduction = 'none'  # required to apply FL to each element

  def forward(self, pred, true):
      loss = self.loss_fcn(pred, true)

      pred_prob = torch.sigmoid(pred)  # prob from logits
      p_t = true * pred_prob + (1 - true) * (1 - pred_prob)
      alpha_factor = true * self.alpha + (1 - true) * (1 - self.alpha)
      modulating_factor = (1.0 - p_t) ** self.gamma
      loss *= alpha_factor * modulating_factor

      if self.reduction == 'mean':
          return loss.mean()
      elif self.reduction == 'sum':
          return loss.sum()
      else:  # 'none'
          return loss

YOLO 源代码分析(四) YOLOv3 网络训练

CATALOG

FEATURED TAGS