Keypoint Detection (1): HRNet Data Preprocessing (MPII)

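Preface

I have recently been working on an exam-room behavior analysis project, where I am responsible for using keypoint detection to spot abnormal behavior by examinees. Until now I had only worked with classification algorithms, and the code I had written and read was limited to classification tasks. Detection tasks involve far more engineering, so I found the official source code very hard going, and I am writing this blog to keep a record.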

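HRNet source code

One remark first: HRNet was proposed by a team from USTC (中科大), and I really enjoyed reading its source code. Perhaps Chinese readers simply understand Chinese authors: in both the paper and the source code, the wording and the code logic felt very natural to a Chinese way of thinking.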

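Data preprocessing

In most open-source projects only a small part of the code concerns the model itself; most of it handles data preprocessing and training, evaluating and saving the model. When reading the HRNet source, the model code went smoothly because the paper lays out the ideas clearly, so this post does not analyze the model again and starts from data preprocessing instead.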

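Directory location

The preprocessing code is in lib/dataset. The main datasets for keypoint detection are MPII and COCO; this post uses MPII to illustrate common preprocessing methods and the points to watch out for. I do not use COCO because its code relies on many COCO API calls and is not as simple as MPII.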

Source code analysis

The main classes in the mpii and coco files both inherit from the JointsDataset class in JointsDataset.py, so we start with JointsDataset.

The JointsDataset class

__init__

Defines many related parameters, which come from the three configuration files under the lib/config folder.

_get_db

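This function formats all of the annotation information and returns it. Because the MPII and COCO annotation formats differ, they cannot be read with the same code. Since this post uses MPII as its example, for ease of analysis I place the implementation from the MPII subclass directly here, in the discussion of JointsDataset.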

# This code lives in `lib/dataset/mpii`
import json
import os

import numpy as np


def _get_db(self):
    # create train/val split
    # Prepare the data as described in the README first
    file_name = os.path.join(
        self.root, 'annot', self.image_set + '.json'
    )
    with open(file_name) as anno_file:
        anno = json.load(anno_file)

    gt_db = []
    for a in anno:
        image_name = a['image']

        # In the MPII annotations, center and scale describe the person bbox
        # in the original H x W image: instead of four corner coordinates,
        # the box is given by its center and a scale.
        # MPII boxes are square by default, with side length = scale * 200
        # (the 200 is fixed by the dataset authors).
        # np.float was removed in NumPy 1.24; the builtin float is equivalent.
        c = np.array(a['center'], dtype=float)
        s = np.array([a['scale'], a['scale']], dtype=float)

        # Adjust center/scale slightly to avoid cropping limbs
        # MPII assumes a square bbox although the true box may be a rectangle,
        # so forcing a square can cut off parts of the body; the square is
        # therefore simply enlarged.
        if c[0] != -1:
            c[1] = c[1] + 15 * s[1]
            s = s * 1.25

        # MPII uses matlab format, index is based 1,
        # we should first convert to 0-based index
        c = c - 1

        # Only the first two dimensions are actually used
        joints_3d = np.zeros((self.num_joints, 3), dtype=float)
        joints_3d_vis = np.zeros((self.num_joints, 3), dtype=float)
        if self.image_set != 'test':
            joints = np.array(a['joints'])
            joints[:, 0:2] = joints[:, 0:2] - 1
            joints_vis = np.array(a['joints_vis'])
            assert len(joints) == self.num_joints, \
                'joint num diff: {} vs {}'.format(len(joints),
                                                  self.num_joints)

            joints_3d[:, 0:2] = joints[:, 0:2]
            joints_3d_vis[:, 0] = joints_vis[:]
            joints_3d_vis[:, 1] = joints_vis[:]

        image_dir = 'images.zip@' if self.data_format == 'zip' else 'images'
        gt_db.append(
            {
                'image': os.path.join(self.root, image_dir, image_name),
                'center': c,
                'scale': s,
                'joints_3d': joints_3d,
                'joints_3d_vis': joints_3d_vis,
                'filename': '',
                'imgnum': 0,
            }
        )

    return gt_db
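To make the center/scale convention from the comments above concrete, here is a minimal sketch that turns an MPII-style annotation back into the corners of its square box. The helper name `center_scale_to_bbox` and the sample values are hypothetical; only the 200-pixel reference size, the `15 * scale` shift and the 1.25 enlargement are taken from `_get_db`. The 1-based to 0-based index shift is left out to keep the arithmetic easy to follow.

```python
PIXEL_STD = 200  # MPII reference size: box side = scale * 200


def center_scale_to_bbox(center, scale, pixel_std=PIXEL_STD):
    """Return (x_min, y_min, x_max, y_max) of the square box (hypothetical helper)."""
    half = float(scale) * pixel_std / 2.0
    cx, cy = float(center[0]), float(center[1])
    return (cx - half, cy - half, cx + half, cy + half)


# Hypothetical annotation values, for illustration only.
raw_center, raw_scale = (300.0, 250.0), 2.0
raw_box = center_scale_to_bbox(raw_center, raw_scale)

# The same adjustment _get_db applies: shift the center down by 15 * scale
# and enlarge the scale by a factor of 1.25.
adj_center = (raw_center[0], raw_center[1] + 15 * raw_scale)
adj_scale = raw_scale * 1.25
adj_box = center_scale_to_bbox(adj_center, adj_scale)

print("raw box:     ", raw_box)
print("adjusted box:", adj_box)
```

With these numbers the raw box has a side of 400 pixels and the adjusted one a side of 500 pixels, centered 30 pixels lower.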

half_body_transform

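I think this function is mainly used for data augmentation. Not every image shows the joints of the whole body, so to make the model more robust some half-body crops should also be included in training.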

def half_body_transform(self, joints, joints_vis):
    # First collect the upper-body and lower-body joints; only visible
    # joints are considered
    upper_joints = []
    lower_joints = []
    for joint_id in range(self.num_joints):
        if joints_vis[joint_id][0] > 0:  # the joint must be visible
            if joint_id in self.upper_body_ids:
                upper_joints.append(joints[joint_id])
            else:
                lower_joints.append(joints[joint_id])

    # Randomly choose the upper or the lower body.
    # Note: randn() draws from a standard normal, so this condition holds
    # with probability of about 0.69 rather than 0.5.
    if np.random.randn() < 0.5 and len(upper_joints) > 2:
        selected_joints = upper_joints
    else:
        selected_joints = lower_joints \
            if len(lower_joints) > 2 else upper_joints

    if len(selected_joints) < 2:
        return None, None

    selected_joints = np.array(selected_joints, dtype=np.float32)
    # Center of the selected joints
    center = selected_joints.mean(axis=0)[:2]

    # Width and height of the half-body region, from its top-left and
    # bottom-right corners; these determine the scale
    left_top = np.amin(selected_joints, axis=0)
    right_bottom = np.amax(selected_joints, axis=0)

    w = right_bottom[0] - left_top[0]
    h = right_bottom[1] - left_top[1]

    # Pad the shorter side so the box matches self.aspect_ratio (w / h)
    if w > self.aspect_ratio * h:
        h = w * 1.0 / self.aspect_ratio
    elif w < self.aspect_ratio * h:
        w = h * self.aspect_ratio

    scale = np.array(
        [
            w * 1.0 / self.pixel_std,
            h * 1.0 / self.pixel_std
        ],
        dtype=np.float32
    )

    # Enlarge a little so the person is not cropped
    scale = scale * 1.5

    return center, scale
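One detail in the function above is easy to miss: the upper/lower choice compares `np.random.randn()` (a standard normal sample) with 0.5, not `np.random.rand()` (a uniform sample). The following sketch estimates how often each condition holds. The sample size and seed are arbitrary choices of mine, and whether the upstream authors intended the normal draw is not stated in the source.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
n = 200_000

# Fraction of draws below 0.5 for a standard normal versus a uniform [0, 1).
p_randn = float(np.mean(rng.standard_normal(n) < 0.5))
p_rand = float(np.mean(rng.random(n) < 0.5))

print(f"standard normal < 0.5: {p_randn:.3f}")
print(f"uniform [0, 1) < 0.5:  {p_rand:.3f}")
```

So when enough upper-body joints are visible, the upper body is picked in roughly 69% of the half-body crops rather than half of them.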

__getitem__

def __getitem__(self, idx):
    db_rec = copy.deepcopy(self.db[idx])

    # Read image idx and its annotation
    image_file = db_rec['image']
    filename = db_rec['filename'] if 'filename' in db_rec else ''
    imgnum = db_rec['imgnum'] if 'imgnum' in db_rec else ''

    if self.data_format == 'zip':
        from utils import zipreader
        data_numpy = zipreader.imread(
            image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION
        )
    else:
        data_numpy = cv2.imread(
            image_file, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION
        )

    if self.color_rgb:
        data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB)

    if data_numpy is None:
        logger.error('=> fail to read {}'.format(image_file))
        raise ValueError('Fail to read {}'.format(image_file))

    joints = db_rec['joints_3d']
    joints_vis = db_rec['joints_3d_vis']

    c = db_rec['center']
    s = db_rec['scale']
    score = db_rec['score'] if 'score' in db_rec else 1
    r = 0

    # Training needs data augmentation: scaling, rotation and flipping
    if self.is_train:
        # Whether to use a half-body crop
        if (np.sum(joints_vis[:, 0]) > self.num_joints_half_body  # = 8
                and np.random.rand() < self.prob_half_body):  # = 0.0
            c_half_body, s_half_body = self.half_body_transform(
                joints, joints_vis
            )

            if c_half_body is not None and s_half_body is not None:
                c, s = c_half_body, s_half_body

        sf = self.scale_factor  # 0.25
        rf = self.rotation_factor  # 30
        s = s * np.clip(np.random.randn()*sf + 1, 1 - sf, 1 + sf)  # 0.75 - 1.25
        r = np.clip(np.random.randn()*rf, -rf*2, rf*2) \
            if random.random() <= 0.6 else 0  # 0, or within [-60, 60]

        if self.flip and random.random() <= 0.5:  # horizontal flip
            data_numpy = data_numpy[:, ::-1, :]  # flip the image along its width
            # fliplr_joints is in `lib/utils/transforms.py`; it flips the
            # annotated coordinates
            joints, joints_vis = fliplr_joints(
                joints, joints_vis, data_numpy.shape[1], self.flip_pairs)
            c[0] = data_numpy.shape[1] - c[0] - 1

    # Scale the original bbox to image_size, then rotate by r degrees
    # about the box center
    trans = get_affine_transform(c, s, r, self.image_size)
    input = cv2.warpAffine(
        data_numpy,
        trans,
        (int(self.image_size[0]), int(self.image_size[1])),
        flags=cv2.INTER_LINEAR)

    if self.transform:
        input = self.transform(input)

    for i in range(self.num_joints):
        if joints_vis[i, 0] > 0.0:
            # Every transform applied to the image must also be applied
            # to the corresponding annotation
            joints[i, 0:2] = affine_transform(joints[i, 0:2], trans)

    target, target_weight = self.generate_target(joints, joints_vis)

    target = torch.from_numpy(target)
    target_weight = torch.from_numpy(target_weight)

    meta = {
        'image': image_file,
        'filename': filename,
        'imgnum': imgnum,
        'joints': joints,
        'joints_vis': joints_vis,
        'center': c,
        'scale': s,
        'rotation': r,
        'score': score
    }

    return input, target, target_weight, meta
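The scale and rotation sampling in `__getitem__` is compact, so here is a minimal standalone sketch of just that step, using the values noted in the comments (scale factor 0.25, rotation factor 30, rotation applied with probability 0.6). The function name `sample_scale_rotation`, the sample count and the seeds are my own choices for illustration.

```python
import random

import numpy as np


def sample_scale_rotation(scale_factor=0.25, rotation_factor=30, rot_prob=0.6):
    """Draw one (scale multiplier, rotation in degrees) pair (hypothetical helper)."""
    sf, rf = scale_factor, rotation_factor
    # Normal noise around 1, clipped to [1 - sf, 1 + sf].
    s_mult = np.clip(np.random.randn() * sf + 1, 1 - sf, 1 + sf)
    # Rotate only with probability rot_prob; clip the angle to [-2 * rf, 2 * rf].
    if random.random() <= rot_prob:
        r = np.clip(np.random.randn() * rf, -rf * 2, rf * 2)
    else:
        r = 0.0
    return float(s_mult), float(r)


np.random.seed(0)  # fixed seeds, arbitrary choice
random.seed(0)
samples = [sample_scale_rotation() for _ in range(10_000)]
scales = np.array([s for s, _ in samples])
rots = np.array([r for _, r in samples])

print("scale range:   ", scales.min(), scales.max())
print("rotation range:", rots.min(), rots.max())
print("share of unrotated samples:", np.mean(rots == 0))
```

The multiplier stays within [0.75, 1.25], the angle within [-60, 60] degrees, and about 40% of the samples are left unrotated.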

def fliplr_joints(joints, joints_vis, width, matched_parts):
    # Flip horizontal
    joints[:, 0] = width - joints[:, 0] - 1  # x becomes w - x - 1

    # Change left-right parts
    for pair in matched_parts:
        joints[pair[0], :], joints[pair[1], :] = \
            joints[pair[1], :], joints[pair[0], :].copy()
        joints_vis[pair[0], :], joints_vis[pair[1], :] = \
            joints_vis[pair[1], :], joints_vis[pair[0], :].copy()

    # I was not sure why the flipped joints are multiplied by joints_vis;
    # the effect is that joints whose visibility flag is 0 get coordinates of 0
    return joints*joints_vis, joints_vis
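A toy run helps with the question of why the result is multiplied by `joints_vis`. The sketch below reimplements the flip on copies of the inputs (unlike the original, which modifies its arguments in place) and uses a made-up three-joint skeleton in which joint 1 is unannotated. The joint layout, image width and pairing are hypothetical.

```python
import numpy as np


def fliplr_joints_copy(joints, joints_vis, width, matched_parts):
    """Same logic as fliplr_joints, but working on copies (illustrative variant)."""
    joints = joints.copy()
    joints_vis = joints_vis.copy()
    joints[:, 0] = width - joints[:, 0] - 1  # mirror the x coordinate
    for a, b in matched_parts:
        # Fancy indexing on the right-hand side creates a copy, so this swaps rows.
        joints[[a, b]] = joints[[b, a]]
        joints_vis[[a, b]] = joints_vis[[b, a]]
    return joints * joints_vis, joints_vis


# Hypothetical skeleton: 0 = left wrist, 1 = right wrist (unannotated), 2 = head.
joints = np.array([[10, 20, 0], [0, 0, 0], [50, 5, 0]], dtype=float)
joints_vis = np.array([[1, 1, 0], [0, 0, 0], [1, 1, 0]], dtype=float)

flipped, flipped_vis = fliplr_joints_copy(joints, joints_vis, width=100,
                                          matched_parts=[[0, 1]])
print(flipped)
print(flipped_vis)
```

Mirroring moves the unannotated joint from x = 0 to x = 99, and the left/right swap then stores it in row 0. Without the final product, row 0 would hold the spurious coordinate (99, 0); multiplying by the visibility mask resets it to zeros. This is one plausible reading of why the product is there, not something the source states.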

get_affine_transform

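I really could not follow this function in the source, so I replaced it with the scaling-and-rotation code from the stacked hourglass network source and found that both produce the same result on the image. What I explain below is therefore the stacked hourglass version. This function also took me a long time, because I previously knew very little about affine transforms. I recommend learning about affine transforms and the common affine transformation matrices first; the function is much easier to read afterwards.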

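def get_affine_transform(center, scale, res, rot=0):
    # Generate transformation matrix
    # First scale to the res size.
    # A pure scaling matrix would just be [[W, 0], [0, H]]; it took me a long
    # time to work out why there is also a third row and a third column
    # (they are the homogeneous-coordinate terms that carry the translation).
    h = 200 * scale[0]
    t = np.zeros((3, 3))
    t[0, 0] = float(res[1]) / h
    t[1, 1] = float(res[0]) / h
    # Translation that places the bbox center at the output center (res / 2)
    t[0, 2] = res[1] * (-float(center[0]) / h + .5)
    t[1, 2] = res[0] * (-float(center[1]) / h + .5)
    t[2, 2] = 1
    if not rot == 0:
        rot = -rot  # To match direction of rotation from cropping
        rot_mat = np.zeros((3, 3))
        rot_rad = rot * np.pi / 180
        sn, cs = np.sin(rot_rad), np.cos(rot_rad)
        rot_mat[0, :2] = [cs, -sn]
        rot_mat[1, :2] = [sn, cs]
        rot_mat[2, 2] = 1
        # Need to rotate around center
        t_mat = np.eye(3)
        t_mat[0, 2] = -res[1] / 2
        t_mat[1, 2] = -res[0] / 2
        t_inv = t_mat.copy()
        t_inv[:2, 2] *= -1
        t = np.dot(t_inv, np.dot(rot_mat, np.dot(t_mat, t)))
        # Variant without shifting back to the output center; kept commented
        # out because running both lines would apply the rotation twice.
        # t = np.dot(rot_mat, np.dot(t_mat, t))
    return t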
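To check what this matrix does, the sketch below rebuilds it in compact form and pushes a few points through it. The function `build_transform` is my condensed restatement of the code above taking a scalar scale, and `apply_transform` plus the sample center, scale and resolution are hypothetical. The third row and column are what allow a translation to be written as part of a single matrix product on homogeneous points (x, y, 1).

```python
import numpy as np


def build_transform(center, scale, res, rot=0):
    """Condensed restatement of the transform above; scale is a scalar here."""
    h = 200 * scale
    t = np.array([
        [res[1] / h, 0.0, res[1] * (-center[0] / h + 0.5)],
        [0.0, res[0] / h, res[0] * (-center[1] / h + 0.5)],
        [0.0, 0.0, 1.0],
    ])
    if rot != 0:
        rad = -rot * np.pi / 180
        sn, cs = np.sin(rad), np.cos(rad)
        rot_mat = np.array([[cs, -sn, 0.0], [sn, cs, 0.0], [0.0, 0.0, 1.0]])
        t_mat = np.eye(3)
        t_mat[:2, 2] = [-res[1] / 2, -res[0] / 2]  # output center -> origin
        t_inv = np.eye(3)
        t_inv[:2, 2] = [res[1] / 2, res[0] / 2]    # origin -> output center
        t = t_inv @ rot_mat @ t_mat @ t
    return t


def apply_transform(t, point):
    """Map a 2D point through a 3x3 homogeneous matrix (hypothetical helper)."""
    x, y, _ = t @ np.array([point[0], point[1], 1.0])
    return np.array([x, y])


center, scale, res = (300.0, 250.0), 2.0, (256, 256)  # illustrative values
t_plain = build_transform(center, scale, res)
t_rot = build_transform(center, scale, res, rot=30)

print(apply_transform(t_plain, center))          # bbox center, no rotation
print(apply_transform(t_rot, center))            # bbox center, rotated by 30 degrees
print(apply_transform(t_plain, (100.0, 50.0)))   # top-left corner of the 400 px box
```

The box center lands on the output center (128, 128) with or without rotation, and the top-left corner of the box lands on the output origin. Note that `cv2.warpAffine` expects a 2x3 matrix, so with this 3x3 form only the first two rows (`t[:2]`) would be passed to it.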

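To show the effect of each setting more clearly, I first commented out the two lines below and set r = 0. The result is shown in the figure below, with the version before commenting out on the left and the version after on the right. The difference is the position of the center point.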

Published: 2024-09-23 11:23:50

Article link: https://www.17tex.com/xueshu/337623.html
