Wang et al (2020) HRNet

Classification networks such as AlexNet, VGGNet, GoogLeNet, and ResNet all progressively reduce the spatial size and produce a low-resolution representation. High-resolution representations are produced in networks such as U-Net, for example, using dilated convolution and upsampling. This paper proposes HRNet, which maintains a high-resolution representation through the whole process. It starts from a high-resolution convolution stream...
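To make the "reducing spatial size" point concrete, the following is a minimal sketch of the arithmetic only, not of any particular network. It assumes a 224x224 input and five stride-2 reductions (an overall stride of 32, as in a typical ResNet-style classifier); the function name `feature_map_sizes` is hypothetical.

```python
def feature_map_sizes(input_size, num_stages):
    """Return the side length of the feature map after each stride-2 stage.

    Assumption: every stage halves the resolution exactly (integer division),
    which ignores padding details of real layers.
    """
    sizes = [input_size]
    for _ in range(num_stages):
        sizes.append(sizes[-1] // 2)
    return sizes


# A 224-pixel input shrinks to a 7x7 map after five halvings.
print(feature_map_sizes(224, 5))  # [224, 112, 56, 28, 14, 7]
```

A network that keeps a high-resolution stream throughout, as the summary describes for HRNet, would by contrast retain a feature map at the first branch's resolution at every stage.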

Wei et al (2016) Convolutional Pose Machine

This paper proposes Convolutional Pose Machines (CPMs), a deep learning computer vision model that identifies human poses in the form of keypoints. The outputs of the model are 2D belief maps, i.e., heatmaps of the predicted probability of each keypoint's location. The architecture of the...
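As an illustration of what a 2D belief map looks like, here is a small sketch that builds a Gaussian-shaped heatmap around one keypoint and decodes the keypoint back by taking the location of the maximum. The Gaussian shape, the `sigma` value, and both function names are assumptions made for this example; they are not taken from the paper.

```python
import math


def gaussian_belief_map(height, width, cx, cy, sigma=1.5):
    """Build a height x width heatmap peaking at column cx, row cy.

    Assumption: an unnormalized 2D Gaussian is used as the target shape.
    """
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def decode_keypoint(belief_map):
    """Return (x, y) of the highest-valued cell in the belief map."""
    best_xy, best_val = (0, 0), float("-inf")
    for y, row in enumerate(belief_map):
        for x, value in enumerate(row):
            if value > best_val:
                best_xy, best_val = (x, y), value
    return best_xy


belief = gaussian_belief_map(height=8, width=10, cx=6, cy=3)
print(decode_keypoint(belief))  # (6, 3)
```

In a pose model, there would be one such map per keypoint type, and the argmax of each map gives the predicted location of that keypoint.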

Lin et al (2017) Focal Loss for Dense Object Detection

This is the paper that proposed RetinaNet, together with the focal loss function for better training of object detection models. Object detection models fall into two camps: two-stage, proposal-driven models such as R-CNN, and one-stage detectors such as YOLO and SSD. The paper claimed that the prior result on one-stage...
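The focal loss is commonly written as FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The sketch below evaluates it for a single binary prediction and compares it with plain cross entropy. The defaults gamma=2 and alpha=0.25 are the values usually quoted for the paper; the function names are hypothetical, and a real detector would apply this per anchor over batches of tensors.

```python
import math


def cross_entropy(p, y):
    """Binary cross entropy for one prediction p in (0, 1) and label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction.

    The (1 - p_t) ** gamma factor shrinks the loss of well-classified
    examples, so hard examples dominate the total.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Compare an easy positive (p = 0.9) with a hard positive (p = 0.1).
for p in (0.9, 0.1):
    print(p, cross_entropy(p, 1), focal_loss(p, 1))
```

With gamma set to 0, the modulating factor disappears and the expression reduces to alpha-weighted cross entropy, which is a quick sanity check for any implementation.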

Lin et al (2017) Feature Pyramid Networks for Object Detection

This paper proposed the feature pyramid network (FPN) for scale-invariant object detection, i.e., a model that can detect objects of different scales. One way to tackle the scale-invariance problem is to form an image pyramid of different scales and process each scale with the same model. This is a brute-force approach....
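The brute-force image-pyramid approach can be sketched as follows. This is only an illustration under stated assumptions: the image is a plain list of rows of floats, each level halves the resolution by 2x2 averaging, and `toy_model` is a hypothetical stand-in for a detector that would be run unchanged on every level.

```python
def downsample_2x(image):
    """Halve the resolution by averaging each non-overlapping 2x2 block."""
    height, width = len(image) // 2, len(image[0]) // 2
    return [
        [
            (
                image[2 * y][2 * x]
                + image[2 * y][2 * x + 1]
                + image[2 * y + 1][2 * x]
                + image[2 * y + 1][2 * x + 1]
            )
            / 4.0
            for x in range(width)
        ]
        for y in range(height)
    ]


def image_pyramid(image, levels):
    """Return [image, image at 1/2 scale, image at 1/4 scale, ...]."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid


def toy_model(image):
    """Hypothetical placeholder for a detector: returns the image size."""
    return (len(image), len(image[0]))


image = [[float(x + y) for x in range(8)] for y in range(8)]
# The same model is applied to every scale, which is why the cost adds up.
print([toy_model(level) for level in image_pyramid(image, 3)])
# [(8, 8), (4, 4), (2, 2)]
```

Every level requires a full forward pass of the model, which is the cost that makes this approach brute-force.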