我的博客

1. 概述

这篇文章记录下 ultralytics 版 YOLOv3 的数据加载，包含数据集处理、数据预处理等。

1. 数据集处理

数据集处理是基于 VOC 数据集的，我们知道 YOLO 网络在训练的时候拟合的是目标的中心点坐标和宽高，而 VOC 数据集标注的是目标的左上角坐标和右下角坐标，所以要先将数据集的标注信息进行处理。

1.1. 格式转换

首先我们要先把 VOC 的标注信息转换成 YOLO 标注信息。使用 trans_voc2yolo.py 脚本进行转换，下面我们看一下这个脚本是怎么转换的。

首先我们先看一下这个脚本里的常数代表什么意思。

voc_root 和 voc_version 指定 voc 数据集的目录。我把 VOC 数据集放到了项目的上一级目录了
train_txt 和 val_txt 指定训练集和测试集所使用标签文件名
save_file_root 指定转换后标注信息存储的位置
label_json_path 存储类别信息

voc_images_path, voc_xml_path, train_txt_path, val_txt_path 这几个目录是拼接目录

voc_root = "../VOCdevkit"
voc_version = "VOC2012"
train_txt = "train.txt"
val_txt = "val.txt"
save_file_root = "./my_yolo_dataset"
label_json_path = './data/pascal_voc_classes.json'
voc_images_path = os.path.join(voc_root, voc_version, "JPEGImages")
voc_xml_path = os.path.join(voc_root, voc_version, "Annotations")
train_txt_path = os.path.join(voc_root, voc_version, "ImageSets", "Main", train_txt)
val_txt_path = os.path.join(voc_root, voc_version, "ImageSets", "Main", val_txt)

下面从主函数开始，捋一捋主要的流程。

加载 json ，转换成 class_dict 字典
分别读取 train_txt_path 和 val_txt_path 两个文件然后调用 translate_info 进行转换，将转换结果保存到 save_file_root 中

create_class_names: 将类别信息写入 data/my_data_label.names 文件中

def main():
  # read class_indict
  json_file = open(label_json_path, 'r')
  class_dict = json.load(json_file)

  with open(train_txt_path, "r") as r:
      train_file_names = [i for i in r.read().splitlines() if len(i.strip()) > 0]
  translate_info(train_file_names, save_file_root, class_dict, "train")

  with open(val_txt_path, "r") as r:
      val_file_names = [i for i in r.read().splitlines() if len(i.strip()) > 0]
  translate_info(val_file_names, save_file_root, class_dict, "val")

  create_class_names(class_dict)

create_class_names 函数比较简单，遍历字典中的 keys 将其写入文件中。

def create_class_names(class_dict: dict):
    keys = class_dict.keys()
    with open("./data/my_data_label.names", "w") as w:
        for index, k in enumerate(keys):
            if index + 1 == len(keys):
                w.write(k)
            else:
                w.write(k + "\n")

接下来我们重点分析一下训练数据集和测试数据集是如何将 VOC 格式转化成 YOLO 格式的。这里就以训练数据集为例，测试集也是一样的。首先是将 train_txt_path 文件的每一行存储读到 train_file_names 变量中。

创建目录，将标签文件存储到 my_yolo_dataset/train/labels 目录，将图像文件存储到 my_yolo_dataset/train/images 目录
读取每一张标注 xml 文件，将 xml 里标注的每一个目标按照 [类别、中心点坐标 (x,y) 、宽、高] 的格式存储到标签文件中，每一行存储一个目标信息，如果一个 xml 标注了多个目标则标签文件中就会存在多行，特别注意，这里的中心点坐标、宽高都转换成了相对坐标，训练时也是使用的相对坐标

最后将图像数据拷贝到 my_yolo_dataset/train/images 中

def translate_info(file_names: list, save_root: str, class_dict: dict, train_val='train'):
  save_txt_path = os.path.join(save_root, train_val, "labels")
  if os.path.exists(save_txt_path) is False:
      os.makedirs(save_txt_path)
  save_images_path = os.path.join(save_root, train_val, "images")
  if os.path.exists(save_images_path) is False:
      os.makedirs(save_images_path)

  for file in tqdm(file_names, desc="translate {} file...".format(train_val)):
      img_path = os.path.join(voc_images_path, file + ".jpg")
      assert os.path.exists(img_path), "file:{} not exist...".format(img_path)

      xml_path = os.path.join(voc_xml_path, file + ".xml")
      assert os.path.exists(xml_path), "file:{} not exist...".format(xml_path)

      with open(xml_path) as fid:
          xml_str = fid.read()
      xml = etree.fromstring(xml_str)
      data = parse_xml_to_dict(xml)["annotation"]
      img_height = int(data["size"]["height"])
      img_width = int(data["size"]["width"])

      with open(os.path.join(save_txt_path, file + ".txt"), "w") as f:
          for index, obj in enumerate(data["object"]):
              xmin = float(obj["bndbox"]["xmin"])
              xmax = float(obj["bndbox"]["xmax"])
              ymin = float(obj["bndbox"]["ymin"])
              ymax = float(obj["bndbox"]["ymax"])
              class_name = obj["name"]
              class_index = class_dict[class_name] - 1  # 目标id从0开始

              xcenter = xmin + (xmax - xmin) / 2
              ycenter = ymin + (ymax - ymin) / 2
              w = xmax - xmin
              h = ymax - ymin

              xcenter = round(xcenter / img_width, 6)
              ycenter = round(ycenter / img_height, 6)
              w = round(w / img_width, 6)
              h = round(h / img_height, 6)

              info = [str(i) for i in [class_index, xcenter, ycenter, w, h]]

              if index == 0:
                  f.write(" ".join(info))
              else:
                  f.write("\n" + " ".join(info))

      shutil.copyfile(img_path, os.path.join(save_images_path, img_path.split(os.sep)[-1]))

trans_voc2yolo.py 脚本执行完成后目录结构如下所示：

├── my_yolo_dataset
│   ├── train
│   │   ├── images
│   │   └── labels
│   └── val
│       ├── images
│       └── labels

1.2. 生成准备文件

这里指的是根据生成好的 YOLO 标签和数据集生成一系列准备文件，以供训练时使用，这一步可以通执行 calculate_dataset.py 脚本来完成。

首先我们看一下该脚本中所用的常量。

train_annotation_dir 和 val_annotation_dir 指定训练集和验证机的标签目录
classes_label 指定 label 存放的文件

cfg_path 指定 yolo 网络结构的配置文件

train_annotation_dir = "./my_yolo_dataset/train/labels"
val_annotation_dir = "./my_yolo_dataset/val/labels"
classes_label = "./data/my_data_label.names"
cfg_path = "./cfg/yolov3-spp.cfg"

下面我们看一下 main 函数，整体比较简单就不展开介绍了。

分别将训练集和验证集用到的图片数据的路径保存到 train_txt_path 和 val_txt_path 中
生成 data/my_data.data 文件，保存训练的列别个数、训练集路径、验证集路径

最后根据训练数据集的类别数调整 yolo 主干网络的配置

def main():
  train_txt_path = "data/my_train_data.txt"
  val_txt_path = "data/my_val_data.txt"
  calculate_data_txt(train_txt_path, train_annotation_dir)
  calculate_data_txt(val_txt_path, val_annotation_dir)

  classes_info = [line.strip() for line in open(classes_label, "r").readlines() if len(line.strip()) > 0]
  create_data_data("./data/my_data.data", classes_label, train_txt_path, val_txt_path, classes_info)

  change_and_create_cfg_file(classes_info)

2. 数据预处理

至此，数据集已经准备好了，让我们进一步了解数据预处理的过程。数据预处理在 LoadImagesAndLabels 类中实现，首先我们先看该类的构造函数，该函数比较长，我们分段分析。

path: 指向 data/my_train_data.txt ，存放类别个数以及训练集验证集数据
img_size: 当为训练集时，设置的是训练过程中(因为开启了多尺度训练)的最大尺寸；当为验证集时，设置的是最终使用的网络大小
batch_size: 每一个 batch 输入图像的大小
augment: 是否进行图像增强，训练集设置为 True ，验证集设置为 False
hyp: 网络使用的超参数，通过解析 cfg/hyp.yaml 文件得到
rect: 是否使用 rectangular ，在验证时打开
cache_images: 是否缓存图像
single_cls: 是否是单个类别

rank: 多 GPU 中哪一个 GPU

def __init__(self,
               path,
               img_size=416,
               batch_size=16,
               augment=False,
               hyp=None,
               rect=False,
               cache_images=False,
               single_cls=False, pad=0.0, rank=-1):

获取文件路径名。

解析 data/my_train_data.txt 文件，获取存储训练集和验证集数据的路径
检查每张图片后缀格式是否在支持的列表中，保存支持的图像路径

path = str(Path(path))
if os.path.isfile(path):
    with open(path, "r") as f:
        f = f.read().splitlines()
else:
    raise Exception("%s does not exist" % path)

#img_formats = ['.bmp', '.jpg', '.jpeg', '.png', '.tif', '.dng']

self.img_files = [x for x in f if os.path.splitext(x)[-1].lower() in img_formats]

继续往下看：

如果没有找到图像则报错
bi: 将数据划分到一个个 batch ，将图片索引除以 batch_size 并向下取整，给同一个 batch 的图像分配相同的 index ，只在验证集中使用
nb: 存储数据集被分成了多少个 batch
self.n: 存储图像总个数
self.batch: 每个图像对应的 batch 的索引
self.img_size: 设置预处理后输出的图片尺寸
self.augment: 是否该其数据增强
self.hyp: 超参数字典
self.rect: 是否开启 rectangular
self.mosaic: 是否开启马赛克数据增强，训练时开启
self.label_files: 存储标签的路径，将目录中的 images 替换成 labels ，将后缀替换成 .txt
sp: 查看 data/my_train_data.shapes 目录下是否缓存有对应数据集的 .shapes 文件，里面存储了每张图像的 width, height
如果 sp 文件不存在，则生成并保存到 sp 文件中
self.shapes: 记录每张图像的原始尺寸 (w, h)

n = len(self.img_files)
assert n > 0, "No images found in %s. See %s" % (path, help_url)

bi = np.floor(np.arange(n) / batch_size).astype(np.int)
nb = bi[-1] + 1  # number of batches

self.n = n
self.batch = bi
self.img_size = img_size
self.augment = augment
self.hyp = hyp
self.rect = rect
self.mosaic = self.augment and not self.rect

self.label_files = [x.replace("images", "labels").replace(os.path.splitext(x)[-1], ".txt")
                    for x in self.img_files]

sp = path.replace(".txt", ".shapes")  # shapefile path
try:
    with open(sp, "r") as f:  # read existing shapefile
        s = [x.split() for x in f.read().splitlines()]
        assert len(s) == n, "shapefile out of aync"
except Exception as e:
    if rank in [-1, 0]:
        image_files = tqdm(self.img_files, desc="Reading image shapes")
    else:
        image_files = self.img_files
    s = [exif_size(Image.open(f)) for f in image_files]
    np.savetxt(sp, s, fmt="%g")  # overwrite existing (if any)

self.shapes = np.array(s, dtype=np.float64)

下面这段代码是在验证时用到的 rectangular inference 矩形推理。

计算每张图片的高/宽比
irect: argsort 函数返回的是数组值从小到大的索引值，按照高宽比例进行排序，这样后面划分的每个 batch 中的图像就拥有类似的高宽比
self.img_files,self.label_files,self.shapes: 根据排序后的顺序重新设置图像路径的顺序、标签顺序以及 shape 顺序
ar: 根据排序后的顺序重新设置高宽比
shapes: 计算每个 batch 采用的统一尺度，这里需要说一下 rectangular inference 的原理：将图像的较大的边 resize 到最大网络需要的大小，而较小的边 resize 相同的比例，再将较小的边填充到满足 32 的整数倍即可
ari: 遍历每一个 batch ，每次取出 batchsize 大小的 shape 信息，由于这些 shape 已经排过序，所以每一个 batch 的数据是相近的，加入 batchsize 是 4 ，那么 bi 里的数据是 [0,0,0,0,1,1,1,1…] 这样的格式， ar[bi == i] 这样就刚好每次取出 4 个数据
mini,maxi: 获取第 i 个 batch 中，最小和最大高宽比，这么做是为了保证一个 batch 内的图像缩放到相同的大小
如果高/宽小于 1(w > h) ，将 w 设为 img_size ，这里有一点要注意， shapes 里存储的是 (w,h) ，但是后面 load_image 用的是 (h, w) 所以 [maxi, 1] 对应的是 (h, w) ，当 w > h 时，将 w 缩放为 img_size ， maxi 表示 w 的缩放比例， h 也要缩放相应的比例，并且向上 32 对齐
如果高/宽大于 1(w < h) ，将 h 设置为 img_size

self.batch_shapes: 计算每个 batch 输入网络的 shape 值(向上设置为 32 的整数倍)

if self.rect:
  s = self.shapes  # wh
  ar = s[:, 1] / s[:, 0]  # aspect ratio
  irect = ar.argsort()
  self.img_files = [self.img_files[i] for i in irect]
  self.label_files = [self.label_files[i] for i in irect]
  self.shapes = s[irect]  # wh
  ar = ar[irect]

  shapes = [[1, 1]] * nb
  for i in range(nb):
      ari = ar[bi == i]
      mini, maxi = ari.min(), ari.max()

      if maxi < 1:
          shapes[i] = [maxi, 1]
      elif mini > 1:
          shapes[i] = [1, 1 / mini]
  self.batch_shapes = np.ceil(np.array(shapes) * img_size / 32. + pad).astype(np.int) * 32

下面这段代码加载标签：

n 为图像总数
self.label: 标签文件的存储格式为 [class, x, y, w, h] ，其中的 xywh 都为相对值
nm, nf, ne, nd: 分别表示有多少图像缺失，找到了多少张图像，有多少为空，有多少重复
np_labels_path: 保存 labels ，当 rect 为 True 时会对 self.images 和 self.labels 进行从新排序，所以要分别考虑。如果 np_labels_path 存在则从文件中加载 labels ，否则从 self.label_files 中加载

self.imgs = [None] * n
self.labels = [np.zeros((0, 5), dtype=np.float32)] * n
extract_bounding_boxes, labels_loaded = False, False
nm, nf, ne, nd = 0, 0, 0, 0  # number mission, found, empty, duplicate
if rect is True:
    np_labels_path = str(Path(self.label_files[0]).parent) + ".rect.npy"  # saved labels in *.npy file
else:
    np_labels_path = str(Path(self.label_files[0]).parent) + ".norect.npy"

if os.path.isfile(np_labels_path):
    x = np.load(np_labels_path, allow_pickle=True)
    if len(x) == n:
        self.labels = x
        labels_loaded = True

if rank in [-1, 0]:
    pbar = tqdm(self.label_files)
else:
    pbar = self.label_files

继续往下看：

遍历标签文件，如果存在缓存直接从缓存读取，否则从文件读取标签信息
读取每一行 label，并按空格划分数据，如果出现异常 nm 加 1
标签信息每行必须是五个值 [class, x, y, w, h]
检查每一行，看是否有重复信息，如果重复 nd 加 1

extract_bounding_boxes: 这个变量为 False ，这段代码可以先不看

for i, file in enumerate(pbar):
  if labels_loaded is True:
      l = self.labels[i]
  else:
      try:
          with open(file, "r") as f:
              l = np.array([x.split() for x in f.read().splitlines()], dtype=np.float32)
      except Exception as e:
          print("An error occurred while loading the file {}: {}".format(file, e))
          nm += 1  # file missing
          continue

  if l.shape[0]:
      assert l.shape[1] == 5, "> 5 label columns: %s" % file
      assert (l >= 0).all(), "negative labels: %s" % file
      assert (l[:, 1:] <= 1).all(), "non-normalized or out of bounds coordinate labels: %s" % file

      if np.unique(l, axis=0).shape[0] < l.shape[0]:  # duplicate rows
          nd += 1
      if single_cls:
          l[:, 0] = 0  # force dataset into single-class mode

      self.labels[i] = l
      nf += 1  # file found

      # Extract object detection boxes for a second stage classifier
      if extract_bounding_boxes:
          p = Path(self.img_files[i])
          img = cv2.imread(str(p))
          h, w = img.shape[:2]
          for j, x in enumerate(l):
              f = "%s%sclassifier%s%g_%g_%s" % (p.parent.parent, os.sep, os.sep, x[0], j, p.name)
              if not os.path.exists(Path(f).parent):
                  os.makedirs(Path(f).parent)  # make new output folder

              # 将相对坐标转为绝对坐标
              # b: x, y, w, h
              b = x[1:] * [w, h, w, h]  # box
              # 将宽和高设置为宽和高中的最大值
              b[2:] = b[2:].max()  # rectangle to square
              # 放大裁剪目标的宽高
              b[2:] = b[2:] * 1.3 + 30  # pad
              # 将坐标格式从 x,y,w,h -> xmin,ymin,xmax,ymax
              b = xywh2xyxy(b.reshape(-1, 4)).revel().astype(np.int)

              # 裁剪bbox坐标到图片内
              b[[0, 2]] = np.clip[b[[0, 2]], 0, w]
              b[[1, 3]] = np.clip[b[[1, 3]], 0, h]
              assert cv2.imwrite(f, img[b[1]:b[3], b[0]:b[2]]), "Failure extracting classifier boxes"
  else:
      ne += 1  # file empty

  # 处理进度条只在第一个进程中显示
  if rank in [-1, 0]:
      # 更新进度条描述信息
      pbar.desc = "Caching labels (%g found, %g missing, %g empty, %g duplicate, for %g images)" % (
          nf, nm, ne, nd, n)
assert nf > 0, "No labels found in %s." % os.path.dirname(self.label_files[0]) + os.sep

继续往下看：

如果标签信息没有被保存成 numpy 的格式，且训练样本数大于 1000 则将标签信息保存成 numpy 的格式
接下来的代码时缓存图像数据，受限于硬件， cache_images 一般为 False
self.imgs, self.img_hw0, self.img_hw: 这几个变量用来缓存图像数据， img_hw0 表示原始图像的宽高， img_hw 表示 resize 之后的图像的宽高

if not labels_loaded and n > 1000:
    print("Saving labels to %s for faster future loading" % np_labels_path)
    np.save(np_labels_path, self.labels)  # save for next time

# Cache images into memory for faster training (Warning: large datasets may exceed system RAM)
if cache_images:  # if training
    gb = 0  # Gigabytes of cached images 用于记录缓存图像占用RAM大小
    if rank in [-1, 0]:
        pbar = tqdm(range(len(self.img_files)), desc="Caching images")
    else:
        pbar = range(len(self.img_files))

    self.img_hw0, self.img_hw = [None] * n, [None] * n
    for i in pbar:  # max 10k images
        self.imgs[i], self.img_hw0[i], self.img_hw[i] = load_image(self, i)  # img, hw_original, hw_resized
        gb += self.imgs[i].nbytes  # 用于记录缓存图像占用RAM大小
        if rank in [-1, 0]:
            pbar.desc = "Caching images (%.1fGB)" % (gb / 1E9)

分析完构造函数后，接下来看获取数据的接口，获取数据分为两种，获取训练数据和获取验证数据。

self.mosaic: 训练时该变量为 True ，使用马赛克数据集，后面在介绍
验证时 self.mosaic=False ，self.rect=True
shape: 获取当前获取图像所对应的 batch 的图像尺度
使用 letterbox 对图像进行预处理，并且重新计算 labels，这个也在后面介绍
self.augment: 训练时该变量为 True ，对数据进行增强：旋转缩放、调节透明度、对比度、翻转等

将 labels 坐标转换成中心坐标和框高模式，并归一化，最后将图像由 BGR 转换成 RGB ，并将维度由 HWC 转换成 CHW

def __getitem__(self, index):
  hyp = self.hyp
  if self.mosaic:
      # load mosaic
      img, labels = load_mosaic(self, index)
      shapes = None
  else:
      # load image
      img, (h0, w0), (h, w) = load_image(self, index)

      # letterbox
      shape = self.batch_shapes[self.batch[index]] if self.rect else self.img_size  # final letterboxed shape
      img, ratio, pad = letterbox(img, shape, auto=False, scale_up=self.augment)
      shapes = (h0, w0), ((h / h0, w / w0), pad)  # for COCO mAP rescaling

      # load labels
      labels = []
      x = self.labels[index]
      if x.size > 0:
          # Normalized xywh to pixel xyxy format
          labels = x.copy()  # label: class, x, y, w, h
          labels[:, 1] = ratio[0] * w * (x[:, 1] - x[:, 3] / 2) + pad[0]  # pad width
          labels[:, 2] = ratio[1] * h * (x[:, 2] - x[:, 4] / 2) + pad[1]  # pad height
          labels[:, 3] = ratio[0] * w * (x[:, 1] + x[:, 3] / 2) + pad[0]
          labels[:, 4] = ratio[1] * h * (x[:, 2] + x[:, 4] / 2) + pad[1]

  if self.augment:
      # Augment imagespace
      if not self.mosaic:
          img, labels = random_affine(img, labels,
                                      degrees=hyp["degrees"],
                                      translate=hyp["translate"],
                                      scale=hyp["scale"],
                                      shear=hyp["shear"])

      # Augment colorspace
      augment_hsv(img, h_gain=hyp["hsv_h"], s_gain=hyp["hsv_s"], v_gain=hyp["hsv_v"])

  nL = len(labels)  # number of labels
  if nL:
      # convert xyxy to xywh
      labels[:, 1:5] = xyxy2xywh(labels[:, 1:5])

      # Normalize coordinates 0-1
      labels[:, [2, 4]] /= img.shape[0]  # height
      labels[:, [1, 3]] /= img.shape[1]  # width

  if self.augment:
      # random left-right flip
      lr_flip = True
      if lr_flip and random.random() < 0.5:
          img = np.fliplr(img)
          if nL:
              labels[:, 1] = 1 - labels[:, 1]  # 1 - x_center

      # random up-down flip
      ud_flip = False
      if ud_flip and random.random() < 0.5:
          img = np.flipud(img)
          if nL:
              labels[:, 2] = 1 - labels[:, 2]  # 1 - y_center

  labels_out = torch.zeros((nL, 6))  # nL: number of labels
  if nL:
      # labels_out[:, 0] = index
      labels_out[:, 1:] = torch.from_numpy(labels)

  # Convert BGR to RGB, and HWC to CHW(3x512x512)
  img = img[:, :, ::-1].transpose(2, 0, 1)
  img = np.ascontiguousarray(img)

  return torch.from_numpy(img), labels_out, self.img_files[index], shapes, index

下面看 load_image 函数，该函数首先从 self.imgs 读取数据，如果没有使用 opencv 接口从文件读取数据。

r: 将较大边 resize 到 img_size 大小，这里的 r 是缩放比例

函数返回 img 数据和原始图像的宽高和 resize 之后的宽高

def load_image(self, index):
  img = self.imgs[index]
  if img is None:  # not cached
      path = self.img_files[index]
      img = cv2.imread(path)  # BGR
      assert img is not None, "Image Not Found " + path
      h0, w0 = img.shape[:2]  # orig hw
      # img_size 设置的是预处理后输出的图片尺寸
      r = self.img_size / max(h0, w0)  # resize image to img_size
      if r != 1:  # always resize down, only resize up if training with augmentation
          interp = cv2.INTER_AREA if r < 1 and not self.augment else cv2.INTER_LINEAR
          img = cv2.resize(img, (int(w0 * r), int(h0 * r)), interpolation=interp)
      return img, (h0, w0), img.shape[:2]  # img, hw_original, hw_resized
  else:
      return self.imgs[index], self.img_hw0[index], self.img_hw[index]  # img, hw_original, hw_resized

下面先介绍 letterbox 函数，首先获得每一个 batch 的 shape ， self.batch 在构造函数里已经初始化好了，是 [0,0,0,0,1,1,1,1…] 这样的数据，所以在一个 batch 内取到的 shape 值是一样的。将 shape 传入该函数的第二个参数

图像在 load_image 加载时已经将最大边长缩放到 img_size ，最小边也缩放了相同的比例，所以在 letterbox 中只需要将每一张图像调整到该 batch 指定的 shape 大小
dw, dh: 这两个值就是上小和左右要填充的大小
计算 top, bottom, left, right 四个值，用来填充四个边界
使用 copyMakeBorder 函数将图像进行填充，填充值为 (114, 114, 114)

shape = self.batch_shapes[self.batch[index]] if self.rect else self.img_size  # final letterboxed shape
img, ratio, pad = letterbox(img, shape, auto=False, scale_up=self.augment)

def letterbox(img: np.ndarray,
              new_shape=(416, 416),
              color=(114, 114, 114),
              auto=True,
              scale_fill=False,
              scale_up=True):

    shape = img.shape[:2]  # [h, w]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scale_up:  # only scale down, do not scale up (for better test mAP) 
        r = min(r, 1.0)

    # compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimun rectangle 保证原图比例不变，将图像最大边缩放到指定大小
        # 这里的取余操作可以保证padding后的图片是32的整数倍(416x416)，如果是(512x512)可以保证是64的整数倍
        dw, dh = np.mod(dw, 64), np.mod(dh, 64)  # wh padding
    elif scale_fill:  # stretch 简单粗暴的将图片缩放到指定尺寸
        dw, dh = 0, 0
        new_unpad = new_shape
        ratio = new_shape[0] / shape[1], new_shape[1] / shape[0]  # wh ratios

    dw /= 2  # divide padding into 2 sides 将padding分到上下，左右两侧
    dh /= 2

    # shape:[h, w]  new_unpad:[w, h]
    if shape[::-1] != new_unpad:
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))  # 计算上下两侧的padding
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))  # 计算左右两侧的padding

    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return img, ratio, (dw, dh)

下面在看看马塞克数据增强的代码，主要代码在 load_mosaic 函数中，该函数的作用是将四张图片拼接成一张图片，这极大丰富了检测物体的背景，且在标准化 BN 计算的时候一下子会计算四张图片的数据。

labels4: 保存拼接后图像 labels 信息
s: 原始输入图像的大小
xc, yc: 在 (s0.5, s1.5) 范围内随机生成一个中心点坐标
在数据集中随机寻找三张图片
遍历四张图像，分别加载每一张图像
i == 0 时，在左上角放置第一张图像，首先 np.full 创建一个 (s2, s2) 大小的画布，填充值 114 。计算左上角图像的位置，如果图像的宽高大于中心点坐标，需要截断，如果小于中心点坐标则图像信息向中心点对齐。计算截取的图像区域信息(以 xc,yc 为第一张图像的右下角坐标填充到马赛克图像中，丢弃越界的区域)
i == 1 时，在右上角放置第二张图像，计算截取的图像区域信息(以 xc,yc 为第二张图像的左下角坐标填充到马赛克图像中，丢弃越界的区域)
i == 2 时，在左下角放置第三张图像，计算截取的图像区域信息(以 xc,yc 为第三张图像的右上角坐标填充到马赛克图像中，丢弃越界的区域)
i == 3 时，在右下角放置第四张图像，计算截取的图像区域信息(以 xc,yc 为第四张图像的左上角坐标填充到马赛克图像中，丢弃越界的区域)
将截取的图像区域填充到马赛克图像的相应位置
padw, padh: 计算 pad (图像边界与马赛克边界的距离，越界的情况为负值) ，如果大于 0 ，说明马赛克图像对应区域大于图像大小，如果小于 0 ，说明马赛克图像对应区域小于图像大小
labels: 获取对应拼接图像的 labels 信息
计算标注数据在马赛克图像中的，前面是图像缩放后目标的位置，加上在马赛克中的偏移
将 labels4 堆叠起来，将马赛克图像上的超过边框的目标固定到马赛克图像内使用 np.clip

最后将新生成的马赛克图像和标签使用 random_affine 函数进行图像处理

def load_mosaic(self, index):
  labels4 = []
  s = self.img_size
  xc, yc = [int(random.uniform(s * 0.5, s * 1.5)) for _ in range(2)]  # mosaic center x, y
  indices = [index] + [random.randint(0, len(self.labels) - 1) for _ in range(3)]  # 3 additional image indices
  # 遍历四张图像进行拼接
  for i, index in enumerate(indices):
      # load image
      img, _, (h, w) = load_image(self, index)

      # place img in img4
      if i == 0:  # top left
          img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
          x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
          x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
      elif i == 1:  # top right
          x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
          x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
      elif i == 2:  # bottom left
          x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
          x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, max(xc, w), min(y2a - y1a, h)
      elif i == 3:  # bottom right
          x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
          x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)

      img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]
      padw = x1a - x1b
      padh = y1a - y1b

      x = self.labels[index]
      labels = x.copy()  # 深拷贝，防止修改原数据
      if x.size > 0:  # Normalized xywh to pixel xyxy format
          labels[:, 1] = w * (x[:, 1] - x[:, 3] / 2) + padw
          labels[:, 2] = h * (x[:, 2] - x[:, 4] / 2) + padh
          labels[:, 3] = w * (x[:, 1] + x[:, 3] / 2) + padw
          labels[:, 4] = h * (x[:, 2] + x[:, 4] / 2) + padh
      labels4.append(labels)

  # Concat/clip labels
  if len(labels4):
      labels4 = np.concatenate(labels4, 0)
      # np.clip(labels4[:, 1:] - s / 2, 0, s, out=labels4[:, 1:])  # use with center crop
      np.clip(labels4[:, 1:], 0, 2 * s, out=labels4[:, 1:])  # use with random_affine

  # Augment
  # img4 = img4[s // 2: int(s * 1.5), s // 2:int(s * 1.5)]  # center crop (WARNING, requires box pruning)
  img4, labels4 = random_affine(img4, labels4,
                                degrees=self.hyp['degrees'],
                                translate=self.hyp['translate'],
                                scale=self.hyp['scale'],
                                shear=self.hyp['shear'],
                                border=-s // 2)  # border to remove

  return img4, labels4

数据加载器在获取 batchsize 大小的数据后，将数据打包，打包函数是 collate_fn 。

在这里添加标签对应的 image 的索引

将 img 和 label 在维度 0 上进行拼接

def collate_fn(batch):
  img, label, path, shapes, index = zip(*batch)  # transposed
  for i, l in enumerate(label):
      l[:, 0] = i  # add target image index for build_targets()
  return torch.stack(img, 0), torch.cat(label, 0), path, shapes, index

以上就是所有数据加载相关的内容。

YOLO 源代码分析(二) YOLOv3 数据加载

1. 概述

1. 数据集处理

1.1. 格式转换

1.2. 生成准备文件

2. 数据预处理

CATALOG

FEATURED TAGS