Carion et al (2020) End-to-End Object Detection with Transformers

Object detection is the task of predicting a bounding box and a category label for each object of interest. This paper proposes DETR (Detection Transformer), which predicts all objects at once and is trained end-to-end with a set loss function that performs bipartite matching between the predicted and ground-truth objects. It is found to perform better...
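The bipartite matching step can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the helper names `l1_box_cost` and `match_predictions` are hypothetical, the cost is only the L1 distance between boxes given as (cx, cy, w, h), and the search is brute force over permutations rather than the Hungarian algorithm, so it only suits a handful of boxes.

```python
from itertools import permutations


def l1_box_cost(pred, target):
    """L1 distance between two boxes given as (cx, cy, w, h)."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def match_predictions(pred_boxes, gt_boxes):
    """Find the minimum-cost one-to-one assignment of predictions to ground truth.

    Returns (pairs, cost), where pairs is a sorted list of (pred_idx, gt_idx).
    Brute force over permutations; assumes len(pred_boxes) >= len(gt_boxes).
    """
    best_cost, best_pairs = float("inf"), []
    for pred_idx in permutations(range(len(pred_boxes)), len(gt_boxes)):
        # pred_idx[g] is the prediction assigned to ground-truth box g.
        cost = sum(
            l1_box_cost(pred_boxes[p], gt_boxes[g]) for g, p in enumerate(pred_idx)
        )
        if cost < best_cost:
            best_cost = cost
            best_pairs = sorted((p, g) for g, p in enumerate(pred_idx))
    return best_pairs, best_cost


# Three predictions, two ground-truth boxes (made-up numbers).
preds = [(0.52, 0.48, 0.20, 0.30), (0.10, 0.10, 0.05, 0.05), (0.80, 0.75, 0.25, 0.20)]
gts = [(0.80, 0.80, 0.20, 0.20), (0.50, 0.50, 0.20, 0.30)]

pairs, cost = match_predictions(preds, gts)
print(pairs, round(cost, 2))
```

Here prediction 0 is matched to ground-truth box 1 and prediction 2 to ground-truth box 0, while prediction 1 stays unmatched. The set loss would then be computed over the matched pairs, with unmatched predictions treated as "no object"; the matching cost used in the paper also includes a class-probability term, which this sketch omits.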

Liu et al (2016) SSD: Single Shot MultiBox Detector

This paper differs from previous work in that the older approach to object detection first hypothesizes bounding boxes, resamples features for each box, and then applies a classifier. This paper proposes a network that does not resample features for bounding box hypotheses but is equally accurate. It can do high speed...

Wang et al (2020) HRNet

Classification networks such as AlexNet, VGGNet, GoogLeNet, and ResNet all reduce the spatial size progressively and produce a low-resolution representation. A high-resolution representation can be recovered from it, for example by upsampling as in U-Net, or preserved by using dilated convolution. This paper proposes HRNet, which maintains a high-resolution representation through the whole process. It starts from a high-resolution convolution stream...
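To make the loss of resolution concrete, the following sketch traces the spatial size through repeated stride-2 stages. The 224-pixel input and the count of five stride-2 reductions are assumptions chosen to mirror a typical ImageNet classifier with a total stride of 32; the function names are hypothetical.

```python
def downsample(size, stride=2):
    """Spatial size after one strided stage, using ceiling division."""
    return -(-size // stride)


def resolution_trace(size, num_stages, stride=2):
    """List the spatial size before and after each of num_stages strided stages."""
    sizes = [size]
    for _ in range(num_stages):
        sizes.append(downsample(sizes[-1], stride))
    return sizes


trace = resolution_trace(224, 5)
print(trace)  # [224, 112, 56, 28, 14, 7]
```

Five halvings shrink a 224x224 input to a 7x7 feature map, which is the low-resolution representation the paragraph refers to. A stream that uses stride 1 at every stage would keep its size unchanged.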