
[Face Detection] A Detailed Walkthrough of Reproducing RetinaFace in Paddle





RetinaFace Forward Inference

The analysis is divided into the following parts:

1. The backbone network structure

2. Post-processing of the network outputs

3. Network forward inference

1. Reproducing the backbone network structure

The reproduction follows the original RetinaFace architecture (the network diagram is omitted here), with some simplifications: the 5-level feature pyramid is trimmed to a 3-level FPN, and the backbone is a slimmed MobileNetV1 (0.25 width).

In [10]

# Building blocks used by the backbone network
import paddle
import paddle.nn as nn
import paddle.nn.functional as F


def conv_bn(inp, oup, stride=1, leaky=0):
    return nn.Sequential(
        nn.Conv2D(inp, oup, 3, stride, 1, bias_attr=False),
        nn.BatchNorm2D(oup),
        nn.LeakyReLU(negative_slope=leaky)
    )


def conv_bn_no_relu(inp, oup, stride):
    return nn.Sequential(
        nn.Conv2D(inp, oup, 3, stride, 1, bias_attr=False),
        nn.BatchNorm2D(oup),
    )


def conv_bn1X1(inp, oup, stride, leaky=0):
    return nn.Sequential(
        nn.Conv2D(inp, oup, 1, stride, padding=0, bias_attr=False),
        nn.BatchNorm2D(oup),
        nn.LeakyReLU(negative_slope=leaky)
    )


def conv_dw(inp, oup, stride, leaky=0.1):
    # depthwise 3x3 conv followed by a pointwise 1x1 conv
    return nn.Sequential(
        nn.Conv2D(inp, inp, 3, stride, 1, groups=inp, bias_attr=False),
        nn.BatchNorm2D(inp),
        nn.LeakyReLU(negative_slope=leaky),
        nn.Conv2D(inp, oup, 1, 1, 0, bias_attr=False),
        nn.BatchNorm2D(oup),
        nn.LeakyReLU(negative_slope=leaky),
    )


class SSH(nn.Layer):
    def __init__(self, in_channel, out_channel):
        super(SSH, self).__init__()
        assert out_channel % 4 == 0
        leaky = 0
        if out_channel <= 64:
            leaky = 0.1
        self.conv3X3 = conv_bn_no_relu(in_channel, out_channel//2, stride=1)
        self.conv5X5_1 = conv_bn(in_channel, out_channel//4, stride=1, leaky=leaky)
        self.conv5X5_2 = conv_bn_no_relu(out_channel//4, out_channel//4, stride=1)
        self.conv7X7_2 = conv_bn(out_channel//4, out_channel//4, stride=1, leaky=leaky)
        self.conv7x7_3 = conv_bn_no_relu(out_channel//4, out_channel//4, stride=1)

    def forward(self, input):
        conv3X3 = self.conv3X3(input)
        conv5X5_1 = self.conv5X5_1(input)
        conv5X5 = self.conv5X5_2(conv5X5_1)
        conv7X7_2 = self.conv7X7_2(conv5X5_1)
        conv7X7 = self.conv7x7_3(conv7X7_2)
        out = paddle.concat([conv3X3, conv5X5, conv7X7], axis=1)
        out = F.relu(out)
        return out


class FPN(nn.Layer):
    def __init__(self, in_channels_list, out_channels):
        super(FPN, self).__init__()
        leaky = 0
        if out_channels <= 64:
            leaky = 0.1
        self.output1 = conv_bn1X1(in_channels_list[0], out_channels, stride=1, leaky=leaky)
        self.output2 = conv_bn1X1(in_channels_list[1], out_channels, stride=1, leaky=leaky)
        self.output3 = conv_bn1X1(in_channels_list[2], out_channels, stride=1, leaky=leaky)
        self.merge1 = conv_bn(out_channels, out_channels, leaky=leaky)
        self.merge2 = conv_bn(out_channels, out_channels, leaky=leaky)

    def forward(self, input):
        input = list(input)
        output1 = self.output1(input[0])
        output2 = self.output2(input[1])
        output3 = self.output3(input[2])
        # top-down pathway: upsample the coarser map and merge it in
        up3 = F.interpolate(output3, size=[output2.shape[2], output2.shape[3]], mode="nearest")
        output2 = output2 + up3
        output2 = self.merge2(output2)
        up2 = F.interpolate(output2, size=[output1.shape[2], output1.shape[3]], mode="nearest")
        output1 = output1 + up2
        output1 = self.merge1(output1)
        out = [output1, output2, output3]
        return out


class MobileNetV1(nn.Layer):
    def __init__(self):
        super(MobileNetV1, self).__init__()
        # trailing comments track the receptive field after each layer
        self.stage1 = nn.Sequential(
            conv_bn(3, 8, 2, leaky=0.1),  # 3
            conv_dw(8, 16, 1),            # 7
            conv_dw(16, 32, 2),           # 11
            conv_dw(32, 32, 1),           # 19
            conv_dw(32, 64, 2),           # 27
            conv_dw(64, 64, 1),           # 43
        )
        self.stage2 = nn.Sequential(
            conv_dw(64, 128, 2),   # 43 + 16 = 59
            conv_dw(128, 128, 1),  # 59 + 32 = 91
            conv_dw(128, 128, 1),  # 91 + 32 = 123
            conv_dw(128, 128, 1),  # 123 + 32 = 155
            conv_dw(128, 128, 1),  # 155 + 32 = 187
            conv_dw(128, 128, 1),  # 187 + 32 = 219
        )
        self.stage3 = nn.Sequential(
            conv_dw(128, 256, 2),  # 219 + 32 = 251
            conv_dw(256, 256, 1),  # 251 + 64 = 315
        )
        # classification tail of the original MobileNet; unused for detection
        self.avg = nn.AdaptiveAvgPool2D((1, 1))
        self.fc = nn.Linear(256, 1000)

    def forward(self, x):
        # return the three stage outputs so they can feed the FPN
        x1 = self.stage1(x)
        x2 = self.stage2(x1)
        x3 = self.stage3(x2)
        out = [x1, x2, x3]
        return out
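As a quick sanity check (not part of the original notebook), a dummy tensor can be pushed through the backbone and FPN to confirm that the three feature maps come out at strides 8/16/32 with the expected channel widths:

import paddle

backbone = MobileNetV1()
feats = backbone(paddle.randn([1, 3, 640, 640]))
print([f.shape for f in feats])  # [1, 64, 80, 80], [1, 128, 40, 40], [1, 256, 20, 20]

fpn = FPN([64, 128, 256], 64)
outs = fpn(feats)
print([o.shape for o in outs])   # all 64 channels, same three spatial sizes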

The code above implements the FPN and SSH modules and the modified MobileNetV1. A forward pass through these parts yields three feature maps; each of them is then fed through three separate convolution heads that produce the classification, box-regression, and landmark outputs, implemented as follows:

In [11]

# Network trunk: detection heads and the full forward pass
import paddle
import paddle.nn as nn
import paddle.nn.functional as F


class ClassHead(nn.Layer):
    def __init__(self, inchannels=512, num_anchors=3):
        super(ClassHead, self).__init__()
        self.num_anchors = num_anchors
        self.conv1x1 = nn.Conv2D(inchannels, self.num_anchors*2, kernel_size=(1, 1), stride=1, padding=0)

    def forward(self, x):
        out = self.conv1x1(x)
        out = out.transpose([0, 2, 3, 1])
        return out.reshape([out.shape[0], -1, 2])


class BboxHead(nn.Layer):
    def __init__(self, inchannels=512, num_anchors=3):
        super(BboxHead, self).__init__()
        self.conv1x1 = nn.Conv2D(inchannels, num_anchors*4, kernel_size=(1, 1), stride=1, padding=0)

    def forward(self, x):
        out = self.conv1x1(x)
        out = out.transpose([0, 2, 3, 1])
        return out.reshape([out.shape[0], -1, 4])


class LandmarkHead(nn.Layer):
    def __init__(self, inchannels=512, num_anchors=3):
        super(LandmarkHead, self).__init__()
        self.conv1x1 = nn.Conv2D(inchannels, num_anchors*10, kernel_size=(1, 1), stride=1, padding=0)

    def forward(self, x):
        out = self.conv1x1(x)
        out = out.transpose([0, 2, 3, 1])
        return out.reshape([out.shape[0], -1, 10])


class RetinaFace(nn.Layer):
    def __init__(self, cfg=None, phase='train'):
        """
        :param cfg:  Network related settings.
        :param phase: train or test.
        """
        super(RetinaFace, self).__init__()
        self.phase = phase
        backbone = None
        if cfg['name'] == 'mobilenet0.25':
            backbone = MobileNetV1()
            if cfg['pretrain']:
                checkpoint = paddle.load("./weights/mobilenetV1X0.25_pretrain.pdparams")
                backbone.set_state_dict(checkpoint)
        elif cfg['name'] == 'Resnet50':
            import paddle.vision.models as models
            backbone = models.resnet50(pretrained=cfg['pretrain'])
        self.body = backbone
        in_channels_stage2 = cfg['in_channel']
        in_channels_list = [
            in_channels_stage2 * 2,
            in_channels_stage2 * 4,
            in_channels_stage2 * 8,
        ]
        out_channels = cfg['out_channel']
        self.fpn = FPN(in_channels_list, out_channels)
        self.ssh1 = SSH(out_channels, out_channels)
        self.ssh2 = SSH(out_channels, out_channels)
        self.ssh3 = SSH(out_channels, out_channels)
        self.ClassHead = self._make_class_head(fpn_num=3, inchannels=cfg['out_channel'])
        self.BboxHead = self._make_bbox_head(fpn_num=3, inchannels=cfg['out_channel'])
        self.LandmarkHead = self._make_landmark_head(fpn_num=3, inchannels=cfg['out_channel'])

    def _make_class_head(self, fpn_num=3, inchannels=64, anchor_num=2):
        classhead = nn.LayerList()
        for i in range(fpn_num):
            classhead.append(ClassHead(inchannels, anchor_num))
        return classhead

    def _make_bbox_head(self, fpn_num=3, inchannels=64, anchor_num=2):
        bboxhead = nn.LayerList()
        for i in range(fpn_num):
            bboxhead.append(BboxHead(inchannels, anchor_num))
        return bboxhead

    def _make_landmark_head(self, fpn_num=3, inchannels=64, anchor_num=2):
        landmarkhead = nn.LayerList()
        for i in range(fpn_num):
            landmarkhead.append(LandmarkHead(inchannels, anchor_num))
        return landmarkhead

    def forward(self, inputs):
        out = self.body(inputs)
        # FPN
        fpn = self.fpn(out)
        # SSH
        feature1 = self.ssh1(fpn[0])
        feature2 = self.ssh2(fpn[1])
        feature3 = self.ssh3(fpn[2])
        features = [feature1, feature2, feature3]
        bbox_regressions = paddle.concat([self.BboxHead[i](feature) for i, feature in enumerate(features)], axis=1)
        classifications = paddle.concat([self.ClassHead[i](feature) for i, feature in enumerate(features)], axis=1)
        ldm_regressions = paddle.concat([self.LandmarkHead[i](feature) for i, feature in enumerate(features)], axis=1)
        if self.phase == 'train':
            output = (bbox_regressions, classifications, ldm_regressions)
        else:
            output = (bbox_regressions, F.softmax(classifications, axis=-1), ldm_regressions)
        return output


cfg_mnet = {
    'name': 'mobilenet0.25',
    'min_sizes': [[16, 32], [64, 128], [256, 512]],
    'steps': [8, 16, 32],
    'variance': [0.1, 0.2],
    'clip': False,
    'loc_weight': 2.0,
    'gpu_train': True,
    'batch_size': 32,
    'ngpu': 1,
    'epoch': 250,
    'decay1': 190,
    'decay2': 220,
    'image_size': 640,
    'pretrain': True,
    'return_layers': {'stage1': 1, 'stage2': 2, 'stage3': 3},
    'in_channel': 32,
    'out_channel': 64
}
# net = RetinaFace(cfg=cfg_mnet, phase='test')
# net.eval()
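A similar spot-check for the assembled model (again not from the notebook; pretrain is switched off so no weight file is needed): with a 640×640 input, the three levels contribute (80·80 + 40·40 + 20·20) × 2 = 16800 anchors per image, so each output should have 16800 rows.

cfg_test = dict(cfg_mnet, pretrain=False)  # avoid loading the pretrained backbone weights
model = RetinaFace(cfg=cfg_test, phase='test')
model.eval()
bbox, cls, ldm = model(paddle.randn([1, 3, 640, 640]))
print(bbox.shape, cls.shape, ldm.shape)  # [1, 16800, 4] [1, 16800, 2] [1, 16800, 10]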

2. Prior box generation and post-processing of network outputs

First generate the anchors, then decode the inference results against them.

In [12]

# Prior (anchor) box generation
import paddle
from itertools import product as product
from math import ceil


class PriorBox(object):
    def __init__(self, cfg, image_size=None, phase='train'):
        super(PriorBox, self).__init__()
        self.min_sizes = cfg['min_sizes']
        self.steps = cfg['steps']
        self.clip = cfg['clip']
        self.image_size = image_size
        self.feature_maps = [[ceil(self.image_size[0]/step), ceil(self.image_size[1]/step)] for step in self.steps]
        self.name = "s"

    def forward(self):
        anchors = []
        for k, f in enumerate(self.feature_maps):
            min_sizes = self.min_sizes[k]
            for i, j in product(range(f[0]), range(f[1])):
                for min_size in min_sizes:
                    s_kx = min_size / self.image_size[1]
                    s_ky = min_size / self.image_size[0]
                    dense_cx = [x * self.steps[k] / self.image_size[1] for x in [j + 0.5]]
                    dense_cy = [y * self.steps[k] / self.image_size[0] for y in [i + 0.5]]
                    for cy, cx in product(dense_cy, dense_cx):
                        anchors += [cx, cy, s_kx, s_ky]
        # assemble into an [N, 4] tensor of (cx, cy, w, h), normalized to [0, 1]
        output = paddle.to_tensor(anchors).reshape([-1, 4])
        if self.clip:
            output = output.clip(max=1, min=0)
        return output
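The anchor count should line up with the 16800 rows the network emits for a 640×640 input; a one-line check (not in the original notebook):

priors = PriorBox(cfg_mnet, image_size=(640, 640)).forward()
print(priors.shape)  # [16800, 4]: (80*80 + 40*40 + 20*20) locations x 2 min_sizes each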

In [13]

# Box matching / encoding / decoding utilities (run this cell once per kernel start)
import paddle
import numpy as np


def index_fill(input, index, update):
    '''
    A minimal substitute for Tensor.index_fill_;
    sufficient for this repo, not a general-purpose implementation.
    '''
    for i in range(len(index)):
        input[index[i]] = update
    return input


def point_form(boxes):
    """ Convert prior_boxes to (xmin, ymin, xmax, ymax)
    representation for comparison to point form ground truth data.
    Args:
        boxes: (tensor) center-size default boxes from priorbox layers.
    Return:
        boxes: (tensor) converted (xmin, ymin, xmax, ymax) form of boxes.
    """
    return paddle.concat((boxes[:, :2] - boxes[:, 2:]/2,     # xmin, ymin
                          boxes[:, :2] + boxes[:, 2:]/2), 1)  # xmax, ymax


def center_size(boxes):
    """ Convert point-form boxes to (cx, cy, w, h)
    representation for comparison to center-size form ground truth data.
    Args:
        boxes: (tensor) point_form boxes
    Return:
        boxes: (tensor) converted (cx, cy, w, h) form of boxes.
    """
    # note: the concat arguments were mis-parenthesized in the original port
    return paddle.concat([(boxes[:, 2:] + boxes[:, :2])/2,   # cx, cy
                          boxes[:, 2:] - boxes[:, :2]], 1)   # w, h


def intersect(box_a, box_b):
    """ We resize both tensors to [A,B,2] without new malloc:
    [A,2] -> [A,1,2] -> [A,B,2]
    [B,2] -> [1,B,2] -> [A,B,2]
    Then we compute the area of intersect between box_a and box_b.
    Args:
        box_a: (tensor) bounding boxes, Shape: [A,4].
        box_b: (tensor) bounding boxes, Shape: [B,4].
    Return:
        (tensor) intersection area, Shape: [A,B].
    """
    A = box_a.shape[0]
    B = box_b.shape[0]
    max_xy = paddle.minimum(box_a[:, 2:].unsqueeze(1).expand([A, B, 2]),
                            box_b[:, 2:].unsqueeze(0).expand([A, B, 2]))
    min_xy = paddle.maximum(box_a[:, :2].unsqueeze(1).expand([A, B, 2]),
                            box_b[:, :2].unsqueeze(0).expand([A, B, 2]))
    inter = paddle.clip(max_xy - min_xy, min=0)
    return inter[:, :, 0] * inter[:, :, 1]


def jaccard(box_a, box_b):
    """Compute the jaccard overlap of two sets of boxes. The jaccard overlap
    is simply the intersection over union of two boxes. Here we operate on
    ground truth boxes and default boxes.
    E.g.:
        A ∩ B / A ∪ B = A ∩ B / (area(A) + area(B) - A ∩ B)
    Args:
        box_a: (tensor) Ground truth bounding boxes, Shape: [num_objects,4]
        box_b: (tensor) Prior boxes from priorbox layers, Shape: [num_priors,4]
    Return:
        jaccard overlap: (tensor) Shape: [box_a.shape[0], box_b.shape[0]]
    """
    inter = intersect(box_a, box_b)
    area_a = ((box_a[:, 2]-box_a[:, 0]) *
              (box_a[:, 3]-box_a[:, 1])).unsqueeze(1).expand_as(inter)  # [A,B]
    area_b = ((box_b[:, 2]-box_b[:, 0]) *
              (box_b[:, 3]-box_b[:, 1])).unsqueeze(0).expand_as(inter)  # [A,B]
    union = area_a + area_b - inter
    return inter / union  # [A,B]


def matrix_iou(a, b):
    """
    Return IoU of a and b; numpy version for data augmentation.
    """
    lt = np.maximum(a[:, np.newaxis, :2], b[:, :2])
    rb = np.minimum(a[:, np.newaxis, 2:], b[:, 2:])
    area_i = np.prod(rb - lt, axis=2) * (lt < rb).all(axis=2)
    area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
    area_b = np.prod(b[:, 2:] - b[:, :2], axis=1)
    return area_i / (area_a[:, np.newaxis] + area_b - area_i)


def matrix_iof(a, b):
    """
    Return IoF (intersection over foreground) of a and b; numpy version for data augmentation.
    """
    lt = np.maximum(a[:, np.newaxis, :2], b[:, :2])
    rb = np.minimum(a[:, np.newaxis, 2:], b[:, 2:])
    area_i = np.prod(rb - lt, axis=2) * (lt < rb).all(axis=2)
    area_a = np.prod(a[:, 2:] - a[:, :2], axis=1)
    return area_i / np.maximum(area_a[:, np.newaxis], 1)


def match(threshold, truths, priors, variances, labels, landms, loc_t, conf_t, landm_t, idx):
    """Match each prior box with the ground truth box of the highest jaccard
    overlap, encode the bounding boxes, then return the matched indices
    corresponding to both confidence and location preds.
    Args:
        threshold: (float) The overlap threshold used when matching boxes.
        truths: (tensor) Ground truth boxes, Shape: [num_obj, 4].
        priors: (tensor) Prior boxes from priorbox layers, Shape: [n_priors,4].
        variances: (tensor) Variances corresponding to each prior coord,
            Shape: [num_priors, 4].
        labels: (tensor) All the class labels for the image, Shape: [num_obj].
        landms: (tensor) Ground truth landms, Shape: [num_obj, 10].
        loc_t: (tensor) Tensor to be filled w/ encoded location targets.
        conf_t: (tensor) Tensor to be filled w/ matched indices for conf preds.
        landm_t: (tensor) Tensor to be filled w/ encoded landm targets.
        idx: (int) current batch index
    Return:
        The matched indices corresponding to 1) location 2) confidence 3) landm preds.
    """
    # jaccard index
    overlaps = jaccard(
        truths,
        point_form(priors)
    )
    # (Bipartite Matching)
    # [1,num_objects] best prior for each ground truth
    best_prior_overlap, best_prior_idx = overlaps.max(1, keepdim=True), overlaps.argmax(1, keepdim=True)
    # ignore hard gt
    valid_gt_idx = best_prior_overlap[:, 0] >= 0.2
    best_prior_idx_filter = best_prior_idx.masked_select(valid_gt_idx.unsqueeze(1)).unsqueeze(1)
    if best_prior_idx_filter.shape[0] <= 0:
        loc_t[idx] = 0
        conf_t[idx] = 0
        return
    # [1,num_priors] best ground truth for each prior
    best_truth_overlap, best_truth_idx = overlaps.max(0, keepdim=True), overlaps.argmax(0, keepdim=True)
    best_truth_idx = best_truth_idx.squeeze(0)
    best_truth_overlap = best_truth_overlap.squeeze(0)
    best_prior_idx = best_prior_idx.squeeze(1)
    best_prior_idx_filter = best_prior_idx_filter.squeeze(1)
    best_prior_overlap = best_prior_overlap.squeeze(1)
    best_truth_overlap = index_fill(best_truth_overlap, best_prior_idx_filter, 2)  # ensure best prior
    # TODO refactor: index best_prior_idx with long tensor
    # ensure every gt matches with its prior of max overlap
    for j in range(best_prior_idx.shape[0]):  # decide which gt box each anchor is responsible for
        best_truth_idx[best_prior_idx[j]] = j
    matches = paddle.to_tensor(truths.numpy()[best_truth_idx.numpy()])  # [num_priors,4]: the gt box matched to each anchor
    conf = paddle.to_tensor(labels.numpy()[best_truth_idx.numpy()])     # [num_priors]: the gt label matched to each anchor
    temp_conf = conf.numpy()
    temp_conf[(best_truth_overlap < threshold).numpy()] = 0  # anchors with overlap below the threshold become background negatives
    conf = paddle.to_tensor(temp_conf).astype('int32')
    loc = encode(matches, priors, variances)
    matches_landm = paddle.to_tensor(landms.numpy()[best_truth_idx.numpy()])
    landm = encode_landm(matches_landm, priors, variances)
    loc_t[idx] = loc      # [num_priors,4] encoded offsets to learn
    conf_t[idx] = conf    # [num_priors] top class label for each prior
    landm_t[idx] = landm


def encode(matched, priors, variances):
    """Encode the variances from the priorbox layers into the ground truth boxes
    we have matched (based on jaccard overlap) with the prior boxes.
    Args:
        matched: (tensor) Coords of ground truth for each prior in point-form
            Shape: [num_priors, 4].
        priors: (tensor) Prior boxes in center-offset form
            Shape: [num_priors,4].
        variances: (list[float]) Variances of priorboxes
    Return:
        encoded boxes (tensor), Shape: [num_priors, 4]
    """
    # dist b/t match center and prior's center
    g_cxcy = (matched[:, :2] + matched[:, 2:])/2 - priors[:, :2]
    # encode variance
    g_cxcy /= (variances[0] * priors[:, 2:])
    # match wh / prior wh
    g_wh = (matched[:, 2:] - matched[:, :2]) / priors[:, 2:]
    g_wh = paddle.log(g_wh) / variances[1]
    # return target for smooth_l1_loss
    return paddle.concat([g_cxcy, g_wh], 1)  # [num_priors,4]


def encode_landm(matched, priors, variances):
    """Encode the variances from the priorbox layers into the ground truth
    landmarks we have matched (based on jaccard overlap) with the prior boxes.
    Args:
        matched: (tensor) Coords of ground truth landmarks for each prior,
            Shape: [num_priors, 10].
        priors: (tensor) Prior boxes in center-offset form
            Shape: [num_priors,4].
        variances: (list[float]) Variances of priorboxes
    Return:
        encoded landm (tensor), Shape: [num_priors, 10]
    """
    # dist b/t match center and prior's center
    matched = paddle.reshape(matched, [matched.shape[0], 5, 2])
    priors_cx = priors[:, 0].unsqueeze(1).expand([matched.shape[0], 5]).unsqueeze(2)
    priors_cy = priors[:, 1].unsqueeze(1).expand([matched.shape[0], 5]).unsqueeze(2)
    priors_w = priors[:, 2].unsqueeze(1).expand([matched.shape[0], 5]).unsqueeze(2)
    priors_h = priors[:, 3].unsqueeze(1).expand([matched.shape[0], 5]).unsqueeze(2)
    priors = paddle.concat([priors_cx, priors_cy, priors_w, priors_h], axis=2)
    g_cxcy = matched[:, :, :2] - priors[:, :, :2]
    # encode variance
    g_cxcy /= (variances[0] * priors[:, :, 2:])
    g_cxcy = g_cxcy.reshape([g_cxcy.shape[0], -1])
    # return target for smooth_l1_loss
    return g_cxcy


# Adapted from https://github.com/Hakuyume/chainer-ssd
def decode(loc, priors, variances):
    """Decode locations from predictions using priors to undo
    the encoding we did for offset regression at train time.
    Args:
        loc (tensor): location predictions for loc layers,
            Shape: [num_priors,4]
        priors (tensor): Prior boxes in center-offset form.
            Shape: [num_priors,4].
        variances: (list[float]) Variances of priorboxes
    Return:
        decoded bounding box predictions
    """
    boxes = paddle.concat((
        priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:],
        priors[:, 2:] * paddle.exp(loc[:, 2:] * variances[1])), 1)
    boxes[:, :2] -= boxes[:, 2:] / 2
    boxes[:, 2:] += boxes[:, :2]
    return boxes


def decode_landm(pre, priors, variances):
    """Decode landmarks from predictions using priors to undo
    the encoding we did for offset regression at train time.
    Args:
        pre (tensor): landmark predictions for loc layers,
            Shape: [num_priors,10]
        priors (tensor): Prior boxes in center-offset form.
            Shape: [num_priors,4].
        variances: (list[float]) Variances of priorboxes
    Return:
        decoded landmark predictions
    """
    landms = paddle.concat((priors[:, :2] + pre[:, :2] * variances[0] * priors[:, 2:],
                            priors[:, :2] + pre[:, 2:4] * variances[0] * priors[:, 2:],
                            priors[:, :2] + pre[:, 4:6] * variances[0] * priors[:, 2:],
                            priors[:, :2] + pre[:, 6:8] * variances[0] * priors[:, 2:],
                            priors[:, :2] + pre[:, 8:10] * variances[0] * priors[:, 2:],
                            ), axis=1)
    return landms


def log_sum_exp(x):
    """Utility function for computing log_sum_exp.
    This will be used to determine unaveraged confidence loss across
    all examples in a batch.
    Args:
        x (Variable(tensor)): conf_preds from conf layers
    """
    x_max = x.max()
    return paddle.log(paddle.sum(paddle.exp(x-x_max), 1, keepdim=True)) + x_max


# Original author: Francisco Massa:
# https://github.com/fmassa/object-detection.torch
# Ported to PyTorch by Max deGroot (02/01/2017); rewritten here with Paddle ops,
# since the original port still used torch-only calls (Tensor.new(), resize_as_(),
# index_select(..., out=...)) that do not exist in Paddle.
def nms(boxes, scores, overlap=0.5, top_k=200):
    """Apply non-maximum suppression at test time to avoid detecting too many
    overlapping bounding boxes for a given object.
    Args:
        boxes: (tensor) The location preds for the img, Shape: [num_priors,4].
        scores: (tensor) The class pred scores for the img, Shape: [num_priors].
        overlap: (float) The overlap thresh for suppressing unnecessary boxes.
        top_k: (int) The maximum number of box preds to consider.
    Return:
        The indices of the kept boxes with respect to num_priors, and their count.
    """
    keep = paddle.zeros([scores.shape[0]], dtype='int64')
    count = 0
    if boxes.numel() == 0:
        return keep, count
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    area = (x2 - x1) * (y2 - y1)
    idx = scores.argsort()   # sort in ascending order
    idx = idx[-top_k:]       # indices of the top-k largest scores
    while idx.numel() > 0:
        i = int(idx[-1])     # index of current largest score
        keep[count] = i
        count += 1
        if idx.shape[0] == 1:
            break
        idx = idx[:-1]       # remove the kept element from view
        # load bboxes of the remaining candidates
        xx1 = paddle.index_select(x1, idx)
        yy1 = paddle.index_select(y1, idx)
        xx2 = paddle.index_select(x2, idx)
        yy2 = paddle.index_select(y2, idx)
        # intersection rectangle with the currently kept box
        xx1 = paddle.clip(xx1, min=float(x1[i]))
        yy1 = paddle.clip(yy1, min=float(y1[i]))
        xx2 = paddle.clip(xx2, max=float(x2[i]))
        yy2 = paddle.clip(yy2, max=float(y2[i]))
        w = paddle.clip(xx2 - xx1, min=0.0)
        h = paddle.clip(yy2 - yy1, min=0.0)
        inter = w * h
        # IoU = i / (area(a) + area(b) - i)
        rem_areas = paddle.index_select(area, idx)  # remaining areas
        union = (rem_areas - inter) + area[i]
        IoU = inter / union
        # keep only elements with an IoU <= overlap
        idx = paddle.masked_select(idx, IoU <= overlap)
    return keep, count
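To make the encode/decode pair concrete, here is a small round-trip sketch (not from the original notebook): encoding a ground-truth box against a single prior and decoding the result should reproduce the original coordinates.

variances = [0.1, 0.2]
gt = paddle.to_tensor([[0.2, 0.2, 0.6, 0.6]], dtype='float32')     # point form (x1, y1, x2, y2)
prior = paddle.to_tensor([[0.4, 0.4, 0.5, 0.5]], dtype='float32')  # center-size form (cx, cy, w, h)
offsets = encode(gt, prior, variances)        # the regression target the box head learns
restored = decode(offsets, prior, variances)  # undo the encoding
print(restored.numpy())                       # approximately [[0.2, 0.2, 0.6, 0.6]]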

3. Network forward inference

In [14]

from __future__ import print_function
import argparse
import paddle
import numpy as np
import cv2
import time


def py_cpu_nms(dets, thresh):
    """Pure Python NMS baseline."""
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    scores = dets[:, 4]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= thresh)[0]
        order = order[inds + 1]
    return keep


parser = argparse.ArgumentParser(description='Retinaface')
parser.add_argument('-m', '--trained_model', default='./weights/mobilenetV1X0.25_pretrain.pdparams',
                    type=str, help='Trained state_dict file path to open')
parser.add_argument('--network', default='mobile0.25', help='Backbone network mobile0.25 or resnet50')
parser.add_argument('--cpu', action="store_true", default=False, help='Use cpu inference')
parser.add_argument('--confidence_threshold', default=0.02, type=float, help='confidence_threshold')
parser.add_argument('--top_k', default=5000, type=int, help='top_k')
parser.add_argument('--nms_threshold', default=0.4, type=float, help='nms_threshold')
parser.add_argument('--keep_top_k', default=750, type=int, help='keep_top_k')
parser.add_argument('-s', '--save_image', action="store_true", default=True, help='show detection results')
parser.add_argument('--vis_thres', default=0.6, type=float, help='visualization_threshold')
# use parse_known_args so the notebook's own argv does not break parsing
args = parser.parse_known_args()[0]


def check_keys(model, pretrained_state_dict):
    ckpt_keys = set(pretrained_state_dict.keys())
    model_keys = set(model.state_dict().keys())
    used_pretrained_keys = model_keys & ckpt_keys
    unused_pretrained_keys = ckpt_keys - model_keys
    missing_keys = model_keys - ckpt_keys
    print('Missing keys:{}'.format(len(missing_keys)))
    print('Unused checkpoint keys:{}'.format(len(unused_pretrained_keys)))
    print('Used keys:{}'.format(len(used_pretrained_keys)))
    assert len(used_pretrained_keys) > 0, 'load NONE from pretrained checkpoint'
    return True


def remove_prefix(state_dict, prefix):
    ''' Old style model is stored with all names of parameters sharing common prefix 'module.' '''
    print('remove prefix \'{}\''.format(prefix))
    f = lambda x: x.split(prefix, 1)[-1] if x.startswith(prefix) else x
    return {f(key): value for key, value in state_dict.items()}


def load_model(model, pretrained_path):
    print('Loading pretrained model from {}'.format(pretrained_path))
    pretrained_dict = paddle.load(pretrained_path)
    if "state_dict" in pretrained_dict.keys():
        pretrained_dict = remove_prefix(pretrained_dict['state_dict'], 'module.')
    else:
        pretrained_dict = remove_prefix(pretrained_dict, 'module.')
    check_keys(model, pretrained_dict)
    model.set_state_dict(pretrained_dict)
    return model


paddle.set_grad_enabled(False)
cfg = cfg_mnet  # args.network == "mobile0.25"
# net and model
net = RetinaFace(cfg=cfg_mnet, phase='test')
net = load_model(net, 'test/mobilenet0.25_epoch_5.pdparams')
net.eval()
print('Finished loading model!')
resize = 1

# testing begins
image_path = "test.jpg"
img_raw = cv2.imread(image_path, cv2.IMREAD_COLOR)
img = np.float32(img_raw)
im_height, im_width, _ = img.shape
scale = paddle.to_tensor([img.shape[1], img.shape[0], img.shape[1], img.shape[0]])
img -= (104, 117, 123)       # subtract the per-channel mean
img /= (57.1, 57.4, 58.4)    # divide by the per-channel std
img = img.transpose(2, 0, 1)
img = paddle.to_tensor(img).unsqueeze(0)

tic = time.time()
loc, conf, landms = net(img)  # forward pass
print('net forward time: {:.4f}'.format(time.time() - tic))

priorbox = PriorBox(cfg, image_size=(im_height, im_width))
priors = priorbox.forward()
prior_data = priors
boxes = decode(loc.squeeze(0), prior_data, cfg['variance'])
boxes = boxes * scale / resize
boxes = boxes.cpu().numpy()
scores = conf.squeeze(0).cpu().numpy()[:, 1]
landms = decode_landm(landms.squeeze(0), prior_data, cfg['variance'])
scale1 = paddle.to_tensor([img.shape[3], img.shape[2], img.shape[3], img.shape[2],
                           img.shape[3], img.shape[2], img.shape[3], img.shape[2],
                           img.shape[3], img.shape[2]])
landms = landms * scale1 / resize
landms = landms.cpu().numpy()

# ignore low scores
inds = np.where(scores > args.confidence_threshold)[0]
boxes = boxes[inds]
landms = landms[inds]
scores = scores[inds]

# keep top-K before NMS
order = scores.argsort()[::-1][:args.top_k]
boxes = boxes[order]
landms = landms[order]
scores = scores[order]

# do NMS
dets = np.hstack((boxes, scores[:, np.newaxis])).astype(np.float32, copy=False)
keep = py_cpu_nms(dets, args.nms_threshold)
dets = dets[keep, :]
landms = landms[keep]

# keep top-K after NMS
dets = dets[:args.keep_top_k, :]
landms = landms[:args.keep_top_k, :]
dets = np.concatenate((dets, landms), axis=1)

# draw and save the result image
if args.save_image:
    for b in dets:
        if b[4] < args.vis_thres:
            continue
        text = "{:.4f}".format(b[4])
        b = list(map(int, b))
        cv2.rectangle(img_raw, (b[0], b[1]), (b[2], b[3]), (0, 0, 255), 2)
        cx = b[0]
        cy = b[1] + 12
        cv2.putText(img_raw, text, (cx, cy),
                    cv2.FONT_HERSHEY_DUPLEX, 0.5, (255, 255, 255))
        # five facial landmarks
        cv2.circle(img_raw, (b[5], b[6]), 1, (0, 0, 255), 4)
        cv2.circle(img_raw, (b[7], b[8]), 1, (0, 255, 255), 4)
        cv2.circle(img_raw, (b[9], b[10]), 1, (255, 0, 255), 4)
        cv2.circle(img_raw, (b[11], b[12]), 1, (0, 255, 0), 4)
        cv2.circle(img_raw, (b[13], b[14]), 1, (255, 0, 0), 4)
    # save image
    name = "test_out.jpg"
    cv2.imwrite(name, img_raw)
Loading pretrained model from test/mobilenet0.25_epoch_5.pdparams
remove prefix 'module.'
Missing keys:0
Unused checkpoint keys:0
Used keys:255
Finished loading model!
net forward time: 0.0325

RetinaFace Backward Pass (Training)

This part covers:

4. The network loss function

5. Organizing the training data

6. Training setup and iteration

4. The network loss function

The forward pass decodes the network outputs against the prior boxes; training goes the other way, encoding the ground truth relative to the prior boxes.

In [15]

# Loss-function helpers. This cell of the original notebook re-declared, verbatim,
# the matching/encoding utilities already defined in In [13] above (index_fill,
# point_form, center_size, intersect, jaccard, matrix_iou, matrix_iof, match,
# encode, encode_landm, log_sum_exp); the duplicated definitions are omitted here.

Once the face boxes and landmarks have been encoded, the loss can be computed:

In [16]

GPU = cfg['gpu_train']


class MultiBoxLoss(nn.Layer):
    """SSD Weighted Loss Function
    Compute Targets:
        1) Produce Confidence Target Indices by matching ground truth boxes
           with (default) 'priorboxes' that have jaccard index > threshold parameter
           (default threshold: 0.5).
        2) Produce localization target by 'encoding' variance into offsets of ground
           truth boxes and their matched 'priorboxes'.
        3) Hard negative mining to filter the excessive number of negative examples
           that comes with using a large number of default bounding boxes.
           (default negative:positive ratio 3:1)
    Objective Loss:
        L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N
        where Lconf is the CrossEntropy Loss and Lloc is the SmoothL1 Loss
        weighted by α, which is set to 1 by cross-validation.
        Args:
            c: class confidences,
            l: predicted boxes,
            g: ground truth boxes,
            N: number of matched default boxes.
        See: https://arxiv.org/pdf/1512.02325.pdf for more details.
    """

    def __init__(self, num_classes, overlap_thresh, prior_for_matching, bkg_label,
                 neg_mining, neg_pos, neg_overlap, encode_target):
        super(MultiBoxLoss, self).__init__()
        self.num_classes = num_classes
        self.threshold = overlap_thresh
        self.background_label = bkg_label
        self.encode_target = encode_target
        self.use_prior_for_matching = prior_for_matching
        self.do_neg_mining = neg_mining
        self.negpos_ratio = neg_pos
        self.neg_overlap = neg_overlap
        self.variance = [0.1, 0.2]

    def forward(self, predictions, priors, targets):
        """Multibox Loss
        Args:
            predictions (tuple): A tuple containing loc preds, conf preds,
                and prior boxes from SSD net.
                conf shape: paddle.shape(batch_size, num_priors, num_classes)
                loc shape:  paddle.shape(batch_size, num_priors, 4)
                priors shape: paddle.shape(num_priors, 4)
            ground_truth (tensor): Ground truth boxes and labels for a batch,
                shape: [batch_size, num_objs, 5] (last idx is the label).
        """
        loc_data, conf_data, landm_data = predictions
        num = loc_data.shape[0]
        num_priors = priors.shape[0]

        # match priors (default boxes) and ground truth boxes
        loc_t = paddle.randn([num, num_priors, 4])
        landm_t = paddle.randn([num, num_priors, 10])
        conf_t = paddle.zeros([num, num_priors], dtype='int32')
        for idx in range(num):
            truths = targets[idx][:, :4]
            labels = targets[idx][:, -1]
            landms = targets[idx][:, 4:14]
            defaults = priors
            match(self.threshold, truths, defaults, self.variance, labels, landms, loc_t, conf_t, landm_t, idx)

        # landmark loss (Smooth L1), Shape: [batch, num_priors, 10]
        pos1 = conf_t > 0
        num_pos_landm = pos1.astype('int64').sum(1, keepdim=True)
        N1 = max(num_pos_landm.sum().astype('float32'), 1)
        pos_idx1 = pos1.unsqueeze(pos1.dim()).expand_as(landm_data)
        landm_p = landm_data.masked_select(pos_idx1).reshape([-1, 10])
        landm_t = landm_t.masked_select(pos_idx1).reshape([-1, 10])
        loss_landm = F.smooth_l1_loss(landm_p, landm_t, reduction='sum')

        # collapse all non-background labels to class 1 (face)
        pos = conf_t != 0
        conf_t_temp = conf_t.numpy()
        conf_t_temp[pos.numpy()] = 1
        conf_t = paddle.to_tensor(conf_t_temp)

        # localization loss (Smooth L1), Shape: [batch, num_priors, 4]
        pos_idx = pos.unsqueeze(pos.dim()).expand_as(loc_data)
        loc_p = loc_data.masked_select(pos_idx).reshape([-1, 4])
        loc_t = loc_t.masked_select(pos_idx).reshape([-1, 4])
        loss_l = F.smooth_l1_loss(loc_p, loc_t, reduction='sum')

        # compute max conf across batch for hard negative mining
        batch_conf = conf_data.reshape([-1, self.num_classes])
        loss_c = log_sum_exp(batch_conf) - batch_conf.multiply(
            paddle.nn.functional.one_hot(conf_t.reshape([-1, 1]), 2).squeeze(1)).sum(1).unsqueeze(1)

        # Hard Negative Mining: zero out positives, then rank anchors by loss
        loss_c = loss_c * (pos.reshape([-1, 1]) == 0).astype('float32')
        loss_c = loss_c.reshape([num, -1])
        loss_idx = loss_c.argsort(1, descending=True)
        idx_rank = loss_idx.argsort(1)
        num_pos = pos.astype('int64').sum(1, keepdim=True)
        num_neg = paddle.clip(self.negpos_ratio * num_pos, max=pos.shape[1] - 1)
        neg = idx_rank < num_neg.expand_as(idx_rank)

        # confidence loss including positive and negative examples
        pos_idx = pos.unsqueeze(2).expand_as(conf_data)
        neg_idx = neg.unsqueeze(2).expand_as(conf_data)
        conf_p = conf_data.masked_select((pos_idx.logical_or(neg_idx)).astype('float32') > 0).reshape([-1, self.num_classes])
        targets_weighted = conf_t.masked_select((pos.logical_or(neg)).astype('float32') > 0)
        loss_c = F.cross_entropy(conf_p, targets_weighted.astype('int64'), reduction='sum')

        # sum of losses: L(x,c,l,g) = (Lconf(x, c) + αLloc(x,l,g)) / N
        N = max(num_pos.sum().astype('float32'), 1)
        loss_l /= N
        loss_c /= N
        loss_landm /= N1
        return loss_l, loss_c, loss_landm
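A minimal usage sketch (not from the notebook; the constructor arguments follow the original RetinaFace training script, which builds MultiBoxLoss(2, 0.35, True, 0, True, 7, 0.35, False)): create a train-phase network, fabricate one normalized ground-truth face, and compute the three loss terms.

criterion = MultiBoxLoss(2, 0.35, True, 0, True, 7, 0.35, False)
train_net = RetinaFace(cfg=dict(cfg_mnet, pretrain=False), phase='train')
priors = PriorBox(cfg_mnet, image_size=(640, 640)).forward()

# one fake face per image: [x1, y1, x2, y2, ten landmark coords, label], all normalized
target = paddle.to_tensor([[0.2, 0.2, 0.6, 0.6] + [0.4] * 10 + [1.0]], dtype='float32')
predictions = train_net(paddle.randn([1, 3, 640, 640]))
loss_l, loss_c, loss_landm = criterion(predictions, priors, [target])
print(float(loss_l), float(loss_c), float(loss_landm))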


5. Organizing the training data

Build the dataset loading function, the data augmentation, and the data generator.

In [17]

  1. #构建加载训练数据函数
  2. from paddle.io import Dataset
  3. from paddle.io import BatchSampler, DistributedBatchSampler, RandomSampler, SequenceSampler, DataLoader
  4. class WiderFaceDetection(Dataset):
  5. def __init__(self, txt_path, preproc=None):
  6. self.preproc = preproc
  7. self.imgs_path = []
  8. self.words = []
  9. f = open(txt_path,'r')
  10. lines = f.readlines()
  11. isFirst = True
  12. labels = []
  13. for line in lines:
  14. line = line.rstrip()
  15. if line.startswith('#'):
  16. if isFirst is True:
  17. isFirst = False
  18. else:
  19. labels_copy = labels.copy()
  20. self.words.append(labels_copy)
  21. labels.clear()
  22. path = line[2:]
  23. path = txt_path.replace('label.txt','images/') + path
  24. self.imgs_path.append(path)
  25. else:
  26. line = line.split(' ')
  27. label = [float(x) for x in line]
  28. labels.append(label)
  29. self.words.append(labels)
  30. def __len__(self):
  31. return len(self.imgs_path)
  32. def __getitem__(self, index):
  33. img = cv2.imread(self.imgs_path[index])
  34. height, width, _ = img.shape
  35. labels = self.words[index]
  36. annotations = np.zeros((0, 15))
  37. if len(labels) == 0:
  38. return annotations
  39. for idx, label in enumerate(labels):
  40. annotation = np.zeros((1, 15))
  41. # bbox
  42. annotation[0, 0] = label[0] # x1
  43. annotation[0, 1] = label[1] # y1
  44. annotation[0, 2] = label[0] + label[2] # x2
  45. annotation[0, 3] = label[1] + label[3] # y2
  46. # landmarks
  47. annotation[0, 4] = label[4] # l0_x
  48. annotation[0, 5] = label[5] # l0_y
  49. annotation[0, 6] = label[7] # l1_x
  50. annotation[0, 7] = label[8] # l1_y
  51. annotation[0, 8] = label[10] # l2_x
  52. annotation[0, 9] = label[11] # l2_y
  53. annotation[0, 10] = label[13] # l3_x
  54. annotation[0, 11] = label[14] # l3_y
  55. annotation[0, 12] = label[16] # l4_x
  56. annotation[0, 13] = label[17] # l4_y
  57. if (annotation[0, 4]<0):
  58. annotation[0, 14] = -1
  59. else:
  60. annotation[0, 14] = 1
  61. annotations = np.append(annotations, annotation, axis=0)
  62. target = np.array(annotations)
  63. if self.preproc is not None:
  64. img, target = self.preproc(img, target)
  65. return img, target
  66. def detection_collate(batch):
  67. """Custom collate fn for dealing with batches of images that have a different
  68. number of associated object annotations (bounding boxes).
  69. Arguments:
  70. batch: (tuple) A tuple of tensor images and lists of annotations
  71. Return:
  72. A tuple containing:
  73. 1) (tensor) batch of images stacked on their 0 dim
  74. 2) (list of tensors) annotations for a given image are stacked on 0 dim
  75. """
  76. targets = []
  77. imgs = []
  78. for sample in batch:
  79. imgs.append(sample[0].astype('float32'))
  80. targets.append(sample[1].astype('float32'))
  81. return (np.stack(imgs, 0), targets)
  82. '''
  83. targets = []
  84. imgs = []
  85. for _, sample in enumerate(batch):
  86. for _, tup in enumerate(sample):
  87. if len(tup.shape) == 3:
  88. imgs.append(tup.astype('float32'))
  89. elif len(tup.shape) == 2:
  90. annos = tup.astype('float32')
  91. targets.append(annos)
  92. '''
  93. return (np.stack(imgs, 0), targets)
  94. def make_dataloader(dataset, shuffle=True, batchsize=12, distributed=False, num_workers=0, num_iters=None, start_iter=0, collate_fn=None):
  95. if distributed:
  96. data_sampler=DistributedBatchSampler(dataset, batch_size=batchsize, shuffle=True, drop_last=True)
  97. dataloader = DataLoader(dataset, batch_sampler=data_sampler, num_workers=num_workers, collate_fn=collate_fn)
  98. if not distributed and shuffle:
  99. sampler = RandomSampler(dataset)
  100. batch_sampler = BatchSampler(sampler=sampler, batch_size=batchsize, drop_last=True)
  101. if num_iters is not None:
  102. batch_sampler = IterationBasedBatchSampler(batch_sampler, num_iters, start_iter)
  103. dataloader = DataLoader(dataset=dataset, batch_sampler=batch_sampler, num_workers=num_workers, collate_fn=collate_fn)
  104. else:
  105. sampler = SequenceSampler(dataset)
  106. batch_sampler = BatchSampler(sampler=sampler, batch_size=batchsize, drop_last=True)
  107. if num_iters is not None:
  108. batch_sampler = IterationBasedBatchSampler(batch_sampler, num_iters, start_iter)
  109. dataloader = DataLoader(dataset=dataset, batch_sampler=batch_sampler, num_workers=num_workers, collate_fn=collate_fn)
  110. return dataloader
  111. class IterationBasedBatchSampler(BatchSampler):
  112. """
  113. Wraps a BatchSampler, resampling from it until
  114. a specified number of iterations have been sampled
  115. """
  116. def __init__(self, batch_sampler, num_iterations, start_iter=0):
  117. self.batch_sampler = batch_sampler
  118. self.num_iterations = num_iterations
  119. self.start_iter = start_iter
  120. def __iter__(self):
  121. iteration = self.start_iter
  122. while iteration <= self.num_iterations:
  123. # if the underlying sampler has a set_epoch method, like
  124. # DistributedSampler, used for making each process see
  125. # a different split of the dataset, then set it
  126. if hasattr(self.batch_sampler.sampler, "set_epoch"):
  127. self.batch_sampler.sampler.set_epoch(iteration)
  128. for batch in self.batch_sampler:
  129. iteration += 1
  130. if iteration > self.num_iterations:
  131. break
  132. yield batch
  133. def __len__(self):
  134. return self.num_iterations
  135. #数据增强
  136. def _crop(image, boxes, labels, landm, img_dim):
  137. height, width, _ = image.shape
  138. pad_image_flag = True
  139. for _ in range(250):
  140. """
  141. if random.uniform(0, 1) <= 0.2:
  142. scale = 1.0
  143. else:
  144. scale = random.uniform(0.3, 1.0)
  145. """
  146. PRE_SCALES = [0.3, 0.45, 0.6, 0.8, 1.0]
  147. scale = random.choice(PRE_SCALES)
  148. short_side = min(width, height)
  149. w = int(scale * short_side)
  150. h = w
  151. if width == w:
  152. l = 0
  153. else:
  154. l = random.randrange(width - w)
  155. if height == h:
  156. t = 0
  157. else:
  158. t = random.randrange(height - h)
  159. roi = np.array((l, t, l + w, t + h))
  160. value = matrix_iof(boxes, roi[np.newaxis])
  161. flag = (value >= 1)
  162. if not flag.any():
  163. continue
  164. centers = (boxes[:, :2] + boxes[:, 2:]) / 2
  165. mask_a = np.logical_and(roi[:2] < centers, centers < roi[2:]).all(axis=1)
  166. boxes_t = boxes[mask_a].copy()
  167. labels_t = labels[mask_a].copy()
  168. landms_t = landm[mask_a].copy()
  169. landms_t = landms_t.reshape([-1, 5, 2])
  170. if boxes_t.shape[0] == 0:
  171. continue
  172. image_t = image[roi[1]:roi[3], roi[0]:roi[2]]
  173. boxes_t[:, :2] = np.maximum(boxes_t[:, :2], roi[:2])
  174. boxes_t[:, :2] -= roi[:2]
  175. boxes_t[:, 2:] = np.minimum(boxes_t[:, 2:], roi[2:])
  176. boxes_t[:, 2:] -= roi[:2]
  177. # landm
  178. landms_t[:, :, :2] = landms_t[:, :, :2] - roi[:2]
  179. landms_t[:, :, :2] = np.maximum(landms_t[:, :, :2], np.array([0, 0]))
  180. landms_t[:, :, :2] = np.minimum(landms_t[:, :, :2], roi[2:] - roi[:2])
  181. landms_t = landms_t.reshape([-1, 10])
  182. # make sure that the cropped image contains at least one face > 16 pixel at training image scale
  183. b_w_t = (boxes_t[:, 2] - boxes_t[:, 0] + 1) / w * img_dim
  184. b_h_t = (boxes_t[:, 3] - boxes_t[:, 1] + 1) / h * img_dim
  185. mask_b = np.minimum(b_w_t, b_h_t) > 0.0
  186. boxes_t = boxes_t[mask_b]
  187. labels_t = labels_t[mask_b]
  188. landms_t = landms_t[mask_b]
  189. if boxes_t.shape[0] == 0:
  190. continue
  191. pad_image_flag = False
  192. return image_t, boxes_t, labels_t, landms_t, pad_image_flag
  193. return image, boxes, labels, landm, pad_image_flag
  194. def _distort(image):
  195. def _convert(image, alpha=1, beta=0):
  196. tmp = image.astype(float) * alpha + beta
  197. tmp[tmp < 0] = 0
  198. tmp[tmp > 255] = 255
  199. image[:] = tmp
  200. image = image.copy()
  201. if random.randrange(2):
  202. #brightness distortion
  203. if random.randrange(2):
  204. _convert(image, beta=random.uniform(-32, 32))
  205. #contrast distortion
  206. if random.randrange(2):
  207. _convert(image, alpha=random.uniform(0.5, 1.5))
  208. image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
  209. #saturation distortion
  210. if random.randrange(2):
  211. _convert(image[:, :, 1], alpha=random.uniform(0.5, 1.5))
  212. #hue distortion
  213. if random.randrange(2):
  214. tmp = image[:, :, 0].astype(int) + random.randint(-18, 18)
  215. tmp %= 180
  216. image[:, :, 0] = tmp
  217. image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR)
  218. else:
  219. #brightness distortion
  220. if random.randrange(2):
  221. _convert(image, beta=random.uniform(-32, 32))
  222. image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
  223. #saturation distortion
  224. if random.randrange(2):
  225. _convert(image[:, :, 1], alpha=random.uniform(0.5, 1.5))
  226. #hue distortion
  227. if random.randrange(2):
  228. tmp = image[:, :, 0].astype(int) + random.randint(-18, 18)
  229. tmp %= 180
  230. image[:, :, 0] = tmp
  231. image = cv2.cvtColor(image, cv2.COLOR_HSV2BGR)
  232. #contrast distortion
  233. if random.randrange(2):
  234. _convert(image, alpha=random.uniform(0.5, 1.5))
  235. return image
def _expand(image, boxes, fill, p):
    if random.randrange(2):
        return image, boxes

    height, width, depth = image.shape

    scale = random.uniform(1, p)
    w = int(scale * width)
    h = int(scale * height)

    left = random.randint(0, w - width)
    top = random.randint(0, h - height)

    boxes_t = boxes.copy()
    boxes_t[:, :2] += (left, top)
    boxes_t[:, 2:] += (left, top)

    expand_image = np.empty((h, w, depth), dtype=image.dtype)
    expand_image[:, :] = fill
    expand_image[top:top + height, left:left + width] = image
    image = expand_image

    return image, boxes_t
def _mirror(image, boxes, landms):
    _, width, _ = image.shape
    if random.randrange(2):
        image = image[:, ::-1]
        boxes = boxes.copy()
        boxes[:, 0::2] = width - boxes[:, 2::-2]

        # landm: flip the x coordinates, then swap the left/right keypoints so
        # that indices 0/1 remain left/right eye and indices 3/4 remain
        # left/right mouth corner (the nose, index 2, is unchanged)
        landms = landms.copy()
        landms = landms.reshape([-1, 5, 2])
        landms[:, :, 0] = width - landms[:, :, 0]
        tmp = landms[:, 1, :].copy()
        landms[:, 1, :] = landms[:, 0, :]
        landms[:, 0, :] = tmp
        tmp1 = landms[:, 4, :].copy()
        landms[:, 4, :] = landms[:, 3, :]
        landms[:, 3, :] = tmp1
        landms = landms.reshape([-1, 10])

    return image, boxes, landms
def _pad_to_square(image, rgb_mean, pad_image_flag):
    if not pad_image_flag:
        return image
    height, width, _ = image.shape
    long_side = max(width, height)
    image_t = np.empty((long_side, long_side, 3), dtype=image.dtype)
    image_t[:, :] = rgb_mean
    image_t[0:0 + height, 0:0 + width] = image
    return image_t
def _resize_subtract_mean(image, insize, rgb_mean, rgb_std):
    interp_methods = [cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_NEAREST, cv2.INTER_LANCZOS4]
    interp_method = interp_methods[random.randrange(5)]
    image = cv2.resize(image, (insize, insize), interpolation=interp_method)
    image = image.astype(np.float32)
    image -= rgb_mean
    image /= rgb_std
    return image.transpose(2, 0, 1)
class preproc(object):

    def __init__(self, img_dim, rgb_means, rgb_stds):
        self.img_dim = img_dim
        self.rgb_means = rgb_means
        self.rgb_stds = rgb_stds

    def __call__(self, image, targets):
        assert targets.shape[0] > 0, "this image does not have gt"

        boxes = targets[:, :4].copy()
        labels = targets[:, -1].copy()
        landm = targets[:, 4:-1].copy()

        image_t, boxes_t, labels_t, landm_t, pad_image_flag = _crop(image, boxes, labels, landm, self.img_dim)
        image_t = _distort(image_t)
        image_t = _pad_to_square(image_t, self.rgb_means, pad_image_flag)
        image_t, boxes_t, landm_t = _mirror(image_t, boxes_t, landm_t)
        height, width, _ = image_t.shape
        image_t = _resize_subtract_mean(image_t, self.img_dim, self.rgb_means, self.rgb_stds)

        # normalize boxes and landmarks to [0, 1] relative to the square crop
        boxes_t[:, 0::2] /= width
        boxes_t[:, 1::2] /= height
        landm_t[:, 0::2] /= width
        landm_t[:, 1::2] /= height

        labels_t = np.expand_dims(labels_t, 1)
        targets_t = np.hstack((boxes_t, landm_t, labels_t))

        return image_t, targets_t
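
As a quick sanity check of the whole augmentation pipeline, `preproc` can be driven directly with a dummy image and one hand-written ground-truth row. Both inputs below are invented purely for illustration (in training, the dataset class feeds in real WIDER FACE annotations), and 640 is assumed as the training size:

import numpy as np

# one ground-truth row: x1, y1, x2, y2 | five landmark (x, y) pairs | class label
dummy_image = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
dummy_targets = np.array([[100., 120., 220., 260.,   # face box
                           130., 160., 190., 160.,   # left eye, right eye
                           160., 190.,               # nose tip
                           135., 225., 185., 225.,   # mouth corners
                           1.]])                     # class label

p = preproc(img_dim=640, rgb_means=(104, 117, 123), rgb_stds=(57.1, 57.4, 58.4))
img_t, targets_t = p(dummy_image, dummy_targets)
print(img_t.shape)      # (3, 640, 640): CHW float32, mean/std normalized
print(targets_t.shape)  # (N, 15): boxes and landmarks rescaled to [0, 1], plus label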

6. Training parameter setup and iteration

Load the dataset, build the network and the loss function, and configure the number of training iterations, the learning rate, and the optimizer. A sketch of the `cfg` dict these settings come from is given below.
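
The training cell reads its hyper-parameters from a `cfg` dict defined in an earlier cell. For orientation, here is a hypothetical sketch of a mobile0.25-style config with the fields the code consumes; only the epoch count and batch size are confirmed by the training log further down, and the remaining values are assumptions borrowed from the common RetinaFace mobile0.25 configuration:

# Hypothetical sketch of cfg; only 'epoch' (250) and 'batch_size' (32, which
# gives epoch_size 403 on the 12880 WIDER FACE training images) are confirmed
# by the log output below. All other values are assumptions.
cfg = {
    'name': 'mobilenet0.25',
    'image_size': 640,      # training crop size (img_dim)
    'batch_size': 32,
    'epoch': 250,
    'ngpu': 1,
    'gpu_train': True,
    'loc_weight': 2.0,      # weight on the box-regression loss (assumption)
    'decay1': 190,          # first LR decay epoch (assumption)
    'decay2': 220,          # second LR decay epoch (assumption)
}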

In [9]

from __future__ import print_function
import os
import argparse  # used by the parser below (imported in an earlier cell in the original notebook)
import math
import time
import datetime
import random

import paddle
import paddle.optimizer as optim

parser = argparse.ArgumentParser(description='Retinaface Training')
parser.add_argument('--training_dataset', default='data/widerface/train/label.txt', help='Training dataset directory')
parser.add_argument('--network', default='mobile0.25', help='Backbone network mobile0.25 or resnet50')
parser.add_argument('--num_workers', default=0, type=int, help='Number of workers used in dataloading')
parser.add_argument('--lr', '--learning-rate', default=1e-3, type=float, help='initial learning rate')
parser.add_argument('--momentum', default=0.9, type=float, help='momentum')
parser.add_argument('--resume_net', default=None, help='resume net for retraining')
parser.add_argument('--resume_epoch', default=0, type=int, help='resume iter for retraining')
parser.add_argument('--weight_decay', default=5e-4, type=float, help='Weight decay for SGD')
parser.add_argument('--gamma', default=0.1, type=float, help='Gamma update for SGD')
parser.add_argument('--save_folder', default='./test/', help='Location to save checkpoint models')
args = parser.parse_known_args()[0]

rgb_mean = (104, 117, 123)  # bgr order
rgb_std = (57.1, 57.4, 58.4)
num_classes = 2
img_dim = cfg['image_size']
num_gpu = cfg['ngpu']
batch_size = cfg['batch_size']
max_epoch = cfg['epoch']
gpu_train = cfg['gpu_train']

num_workers = args.num_workers
momentum = args.momentum
weight_decay = args.weight_decay
initial_lr = args.lr
gamma = args.gamma
training_dataset = args.training_dataset
save_folder = args.save_folder

net = RetinaFace(cfg=cfg)
print("Printing net...")
print(net)

if args.resume_net is not None:
    print('Loading resume network...')
    state_dict = paddle.load(args.resume_net)
    # create a new OrderedDict that does not contain the `module.` prefix
    from collections import OrderedDict
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        head = k[:7]
        if head == 'module.':
            name = k[7:]  # remove `module.`
        else:
            name = k
        new_state_dict[name] = v
    net.set_state_dict(new_state_dict)

if num_gpu > 1 and gpu_train:
    net = paddle.DataParallel(net)

optimizer = optim.Momentum(parameters=net.parameters(), learning_rate=initial_lr, momentum=momentum, weight_decay=weight_decay)
criterion = MultiBoxLoss(num_classes, 0.35, True, 0, True, 7, 0.35, False)

# priors are fixed for a given input size, so compute them once up front
priorbox = PriorBox(cfg, image_size=(img_dim, img_dim))
with paddle.no_grad():
    priors = priorbox.forward()


def train():
    net.train()
    epoch = 0 + args.resume_epoch
    print('Loading Dataset...')

    dataset = WiderFaceDetection(training_dataset, preproc(img_dim, rgb_mean, rgb_std))

    epoch_size = math.ceil(len(dataset) / batch_size)
    stepvalues = (cfg['decay1'] * epoch_size, cfg['decay2'] * epoch_size)
    step_index = 0

    if args.resume_epoch > 0:
        start_iter = args.resume_epoch * epoch_size
    else:
        start_iter = 0
    max_iter = max_epoch * epoch_size - start_iter

    batch_iterator = make_dataloader(dataset, shuffle=True, batchsize=batch_size, distributed=False,
                                     num_workers=0, num_iters=max_iter, start_iter=0, collate_fn=detection_collate)
    iteration = start_iter
    for images, labels in batch_iterator:
        if iteration % epoch_size == 0:
            # save a checkpoint every 5 epochs
            if (epoch % 5 == 0 and epoch > 0) or (epoch % 5 == 0 and epoch > cfg['decay1']):
                paddle.save(net.state_dict(), save_folder + cfg['name'] + '_epoch_' + str(epoch) + '.pdparams')
            epoch += 1

        load_t0 = time.time()
        if iteration in stepvalues:
            step_index += 1
        lr = adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size)

        # forward
        out = net(images)

        # backprop: weighted box regression + classification + landmark regression
        loss_l, loss_c, loss_landm = criterion(out, priors, [anno for anno in labels])
        loss = cfg['loc_weight'] * loss_l + loss_c + loss_landm
        loss.backward()
        optimizer.step()
        optimizer.clear_gradients()

        load_t1 = time.time()
        batch_time = load_t1 - load_t0
        eta = int(batch_time * (max_iter - iteration))
        print('Epoch:{}/{} || Epochiter: {}/{} || Iter: {}/{} || Loc: {:.4f} Cla: {:.4f} Landm: {:.4f} || LR: {:.8f} || Batchtime: {:.4f} s || ETA: {}'
              .format(epoch, max_epoch, (iteration % epoch_size) + 1,
                      epoch_size, iteration + 1, max_iter, loss_l.item(), loss_c.item(), loss_landm.item(),
                      lr, batch_time, str(datetime.timedelta(seconds=eta))))
        iteration += 1

    paddle.save(net.state_dict(), save_folder + cfg['name'] + '_Final.pdparams')


def adjust_learning_rate(optimizer, gamma, epoch, step_index, iteration, epoch_size):
    """Sets the learning rate.
    # Adapted from the PyTorch ImageNet example.
    """
    warmup_epoch = -1
    if epoch <= warmup_epoch:
        lr = 1e-6 + (initial_lr - 1e-6) * iteration / (epoch_size * warmup_epoch)
    else:
        lr = initial_lr * (gamma ** step_index)
    optimizer.set_lr(lr)
    return lr


if __name__ == '__main__':
    train()
Loading Dataset...
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel_launcher.py:9: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  if __name__ == '__main__':
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel_launcher.py:152: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
Epoch:1/250 || Epochiter: 1/403 || Iter: 1/100750 || Loc: 5.5039 Cla: 18.9180 Landm: 21.8879 || LR: 0.00100000 || Batchtime: 0.8275 s || ETA: 23:09:30
Epoch:1/250 || Epochiter: 2/403 || Iter: 2/100750 || Loc: 5.5344 Cla: 18.5185 Landm: 21.6857 || LR: 0.00100000 || Batchtime: 0.5799 s || ETA: 16:13:40
Epoch:1/250 || Epochiter: 3/403 || Iter: 3/100750 || Loc: 5.1545 Cla: 16.1501 Landm: 20.6097 || LR: 0.00100000 || Batchtime: 0.3594 s || ETA: 10:03:30
Epoch:1/250 || Epochiter: 4/403 || Iter: 4/100750 || Loc: 5.1112 Cla: 14.2228 Landm: 20.8187 || LR: 0.00100000 || Batchtime: 0.4798 s || ETA: 13:25:35
Epoch:1/250 || Epochiter: 5/403 || Iter: 5/100750 || Loc: 4.9339 Cla: 12.7024 Landm: 20.1817 || LR: 0.00100000 || Batchtime: 0.3922 s || ETA: 10:58:28
Epoch:1/250 || Epochiter: 6/403 || Iter: 6/100750 || Loc: 4.9495 Cla: 12.1982 Landm: 21.1817 || LR: 0.00100000 || Batchtime: 0.4395 s || ETA: 12:17:59
Epoch:1/250 || Epochiter: 7/403 || Iter: 7/100750 || Loc: 4.8757 Cla: 11.8493 Landm: 19.9921 || LR: 0.00100000 || Batchtime: 0.3520 s || ETA: 9:51:00
Epoch:1/250 || Epochiter: 8/403 || Iter: 8/100750 || Loc: 4.7777 Cla: 10.8863 Landm: 20.3447 || LR: 0.00100000 || Batchtime: 0.4070 s || ETA: 11:23:19
Epoch:1/250 || Epochiter: 9/403 || Iter: 9/100750 || Loc: 4.7849 Cla: 8.2906 Landm: 19.6271 || LR: 0.00100000 || Batchtime: 0.5299 s || ETA: 14:49:42
Epoch:1/250 || Epochiter: 10/403 || Iter: 10/100750 || Loc: 4.4964 Cla: 9.6688 Landm: 20.5572 || LR: 0.00100000 || Batchtime: 0.4078 s || ETA: 11:24:44
Epoch:1/250 || Epochiter: 11/403 || Iter: 11/100750 || Loc: 4.8028 Cla: 8.2961 Landm: 19.0736 || LR: 0.00100000 || Batchtime: 0.4680 s || ETA: 13:05:47
Epoch:1/250 || Epochiter: 12/403 || Iter: 12/100750 || Loc: 4.4659 Cla: 8.8183 Landm: 19.5600 || LR: 0.00100000 || Batchtime: 0.4338 s || ETA: 12:08:17
Epoch:1/250 || Epochiter: 13/403 || Iter: 13/100750 || Loc: 4.7572 Cla: 7.3842 Landm: 19.5515 || LR: 0.00100000 || Batchtime: 0.3566 s || ETA: 9:58:45
Epoch:1/250 || Epochiter: 14/403 || Iter: 14/100750 || Loc: 4.4520 Cla: 7.4838 Landm: 19.3934 || LR: 0.00100000 || Batchtime: 0.4300 s || ETA: 12:01:54
Epoch:1/250 || Epochiter: 15/403 || Iter: 15/100750 || Loc: 4.4183 Cla: 7.6439 Landm: 19.8088 || LR: 0.00100000 || Batchtime: 0.3954 s || ETA: 11:03:46
Epoch:1/250 || Epochiter: 16/403 || Iter: 16/100750 || Loc: 4.5246 Cla: 6.6345 Landm: 19.4181 || LR: 0.00100000 || Batchtime: 0.3990 s || ETA: 11:09:51
Epoch:1/250 || Epochiter: 17/403 || Iter: 17/100750 || Loc: 4.5214 Cla: 5.8815 Landm: 18.9220 || LR: 0.00100000 || Batchtime: 0.3920 s || ETA: 10:58:05
Epoch:1/250 || Epochiter: 18/403 || Iter: 18/100750 || Loc: 4.4529 Cla: 6.2802 Landm: 18.9054 || LR: 0.00100000 || Batchtime: 0.4728 s || ETA: 13:13:41
Epoch:1/250 || Epochiter: 19/403 || Iter: 19/100750 || Loc: 4.3458 Cla: 5.9653 Landm: 19.6959 || LR: 0.00100000 || Batchtime: 0.3720 s || ETA: 10:24:32
Epoch:1/250 || Epochiter: 20/403 || Iter: 20/100750 || Loc: 4.3320 Cla: 5.3611 Landm: 18.2379 || LR: 0.00100000 || Batchtime: 0.4461 s || ETA: 12:28:56
Epoch:1/250 || Epochiter: 21/403 || Iter: 21/100750 || Loc: 4.2572 Cla: 5.0751 Landm: 19.0970 || LR: 0.00100000 || Batchtime: 0.4399 s || ETA: 12:18:28
Epoch:1/250 || Epochiter: 22/403 || Iter: 22/100750 || Loc: 4.3705 Cla: 4.8014 Landm: 17.8947 || LR: 0.00100000 || Batchtime: 0.6961 s || ETA: 19:28:33
Epoch:1/250 || Epochiter: 23/403 || Iter: 23/100750 || Loc: 4.2417 Cla: 4.6365 Landm: 18.2330 || LR: 0.00100000 || Batchtime: 0.5477 s || ETA: 15:19:28
Epoch:1/250 || Epochiter: 24/403 || Iter: 24/100750 || Loc: 3.9981 Cla: 4.8140 Landm: 17.9373 || LR: 0.00100000 || Batchtime: 0.4679 s || ETA: 13:05:25
Epoch:1/250 || Epochiter: 25/403 || Iter: 25/100750 || Loc: 4.2646 Cla: 4.2984 Landm: 19.7434 || LR: 0.00100000 || Batchtime: 0.3951 s || ETA: 11:03:14
Epoch:1/250 || Epochiter: 26/403 || Iter: 26/100750 || Loc: 4.3223 Cla: 4.3522 Landm: 18.5542 || LR: 0.00100000 || Batchtime: 0.4015 s || ETA: 11:14:05
Epoch:1/250 || Epochiter: 27/403 || Iter: 27/100750 || Loc: 4.1652 Cla: 4.4780 Landm: 17.2275 || LR: 0.00100000 || Batchtime: 0.3671 s || ETA: 10:16:20
Epoch:1/250 || Epochiter: 28/403 || Iter: 28/100750 || Loc: 4.1778 Cla: 4.2569 Landm: 17.9795 || LR: 0.00100000 || Batchtime: 0.3724 s || ETA: 10:25:06
Epoch:1/250 || Epochiter: 29/403 || Iter: 29/100750 || Loc: 4.2219 Cla: 4.1784 Landm: 18.4573 || LR: 0.00100000 || Batchtime: 0.3764 s || ETA: 10:31:52
Epoch:1/250 || Epochiter: 30/403 || Iter: 30/100750 || Loc: 3.8601 Cla: 4.1411 Landm: 17.5710 || LR: 0.00100000 || Batchtime: 0.4443 s || ETA: 12:25:55
Epoch:1/250 || Epochiter: 31/403 || Iter: 31/100750 || Loc: 4.1699 Cla: 3.9541 Landm: 17.5772 || LR: 0.00100000 || Batchtime: 0.4302 s || ETA: 12:02:06
Epoch:1/250 || Epochiter: 32/403 || Iter: 32/100750 || Loc: 3.9781 Cla: 3.9588 Landm: 17.0395 || LR: 0.00100000 || Batchtime: 0.4304 s || ETA: 12:02:33
Epoch:1/250 || Epochiter: 33/403 || Iter: 33/100750 || Loc: 4.0470 Cla: 3.8377 Landm: 17.3675 || LR: 0.00100000 || Batchtime: 0.4221 s || ETA: 11:48:36
Epoch:1/250 || Epochiter: 34/403 || Iter: 34/100750 || Loc: 3.8755 Cla: 3.9044 Landm: 17.1435 || LR: 0.00100000 || Batchtime: 0.4630 s || ETA: 12:57:07
Epoch:1/250 || Epochiter: 35/403 || Iter: 35/100750 || Loc: 3.7425 Cla: 3.8776 Landm: 16.4029 || LR: 0.00100000 || Batchtime: 0.3946 s || ETA: 11:02:25
Epoch:1/250 || Epochiter: 36/403 || Iter: 36/100750 || Loc: 3.9764 Cla: 3.9707 Landm: 16.8414 || LR: 0.00100000 || Batchtime: 0.5842 s || ETA: 16:20:40
Epoch:1/250 || Epochiter: 37/403 || Iter: 37/100750 || Loc: 3.7441 Cla: 3.8408 Landm: 17.6619 || LR: 0.00100000 || Batchtime: 0.4351 s || ETA: 12:10:19
Epoch:1/250 || Epochiter: 38/403 || Iter: 38/100750 || Loc: 3.9364 Cla: 3.7047 Landm: 18.6355 || LR: 0.00100000 || Batchtime: 0.3682 s || ETA: 10:18:01
Epoch:1/250 || Epochiter: 39/403 || Iter: 39/100750 || Loc: 3.7350 Cla: 3.8814 Landm: 16.8114 || LR: 0.00100000 || Batchtime: 0.3889 s || ETA: 10:52:45
Epoch:1/250 || Epochiter: 40/403 || Iter: 40/100750 || Loc: 3.5163 Cla: 3.8605 Landm: 16.5092 || LR: 0.00100000 || Batchtime: 0.4335 s || ETA: 12:07:37
Epoch:1/250 || Epochiter: 41/403 || Iter: 41/100750 || Loc: 3.7482 Cla: 3.6833 Landm: 16.7456 || LR: 0.00100000 || Batchtime: 0.4363 s || ETA: 12:12:24
Epoch:1/250 || Epochiter: 42/403 || Iter: 42/100750 || Loc: 3.6396 Cla: 3.6600 Landm: 16.9524 || LR: 0.00100000 || Batchtime: 0.3985 s || ETA: 11:08:51
Epoch:1/250 || Epochiter: 43/403 || Iter: 43/100750 || Loc: 3.6277 Cla: 3.5343 Landm: 15.3870 || LR: 0.00100000 || Batchtime: 0.4476 s || ETA: 12:31:15
Epoch:1/250 || Epochiter: 44/403 || Iter: 44/100750 || Loc: 3.6305 Cla: 3.8723 Landm: 15.8916 || LR: 0.00100000 || Batchtime: 0.3889 s || ETA: 10:52:42
Epoch:1/250 || Epochiter: 45/403 || Iter: 45/100750 || Loc: 3.4999 Cla: 3.7035 Landm: 16.0365 || LR: 0.00100000 || Batchtime: 0.3781 s || ETA: 10:34:32
Epoch:1/250 || Epochiter: 46/403 || Iter: 46/100750 || Loc: 3.5424 Cla: 3.6261 Landm: 16.0393 || LR: 0.00100000 || Batchtime: 0.4740 s || ETA: 13:15:38
Epoch:1/250 || Epochiter: 47/403 || Iter: 47/100750 || Loc: 3.7910 Cla: 3.6140 Landm: 16.3321 || LR: 0.00100000 || Batchtime: 0.3976 s || ETA: 11:07:20
Epoch:1/250 || Epochiter: 48/403 || Iter: 48/100750 || Loc: 3.5407 Cla: 3.6659 Landm: 15.9662 || LR: 0.00100000 || Batchtime: 0.3946 s || ETA: 11:02:21
Epoch:1/250 || Epochiter: 49/403 || Iter: 49/100750 || Loc: 3.2723 Cla: 3.5321 Landm: 15.0109 || LR: 0.00100000 || Batchtime: 0.4122 s || ETA: 11:31:50
Epoch:1/250 || Epochiter: 50/403 || Iter: 50/100750 || Loc: 3.6369 Cla: 3.6536 Landm: 16.7837 || LR: 0.00100000 || Batchtime: 0.3930 s || ETA: 10:59:35
Epoch:1/250 || Epochiter: 51/403 || Iter: 51/100750 || Loc: 3.2572 Cla: 3.5005 Landm: 16.94
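
The log is cut off at this point in the original post, but all three loss terms are already falling steadily. The step schedule implemented by `adjust_learning_rate` can be checked by hand with a standalone sketch; `epoch_size` 403 comes from the log above, while the decay epochs 190 and 220 are assumptions taken from the usual mobile0.25 config:

initial_lr, gamma, epoch_size = 1e-3, 0.1, 403
stepvalues = (190 * epoch_size, 220 * epoch_size)  # iterations at which the LR decays

def lr_at(iteration):
    # count how many decay boundaries this iteration has passed
    step_index = sum(iteration >= s for s in stepvalues)
    return initial_lr * gamma ** step_index

print(lr_at(0))                 # 0.001  (matches LR: 0.00100000 in the log)
print(lr_at(190 * epoch_size))  # 0.0001 after the first decay
print(lr_at(220 * epoch_size))  # 1e-05  after the second decay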

This notebook can be run directly on AI Studio, and forks are welcome: Paddle复现RetinaFace详细解析 - 飞桨AI Studio (an AI learning and hands-on training community).

Original article: https://blog.csdn.net/qq_34106574/article/details/121427978






