Tian et al. (2019) FCOS

This paper presents a new model for object detection, whose output is a bounding box and a class label. Unlike many other object detection models, FCOS does not use predefined anchor boxes. The model is described as “anchor box free and proposal free”.
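In FCOS, "anchor-free" means that each location on a feature map directly regresses its distances (l, t, r, b) to the four sides of a box, instead of predicting offsets from a preset anchor. Below is a minimal NumPy sketch of that encoding for a single feature level and a single ground-truth box. The helper names are hypothetical. The sketch also omits the paper's multi-level FPN assignment, classification head and losses, so treat it as an illustration of the box parameterization only.

```python
import numpy as np

def location_grid(h, w, stride):
    """Image-space (x, y) of each cell of an h x w feature map: floor(stride/2) + index * stride."""
    xs = np.arange(w) * stride + stride // 2
    ys = np.arange(h) * stride + stride // 2
    gx, gy = np.meshgrid(xs, ys)                      # each has shape (h, w)
    return np.stack([gx, gy], axis=-1).reshape(-1, 2)

def ltrb_targets(points, box):
    """Distances from each point to the left, top, right, bottom sides of box (x0, y0, x1, y1)."""
    x, y = points[:, 0], points[:, 1]
    x0, y0, x1, y1 = box
    return np.stack([x - x0, y - y0, x1 - x, y1 - y], axis=1)

def decode(points, ltrb):
    """Turn per-location (l, t, r, b) predictions back into (x0, y0, x1, y1) boxes."""
    x, y = points[:, 0], points[:, 1]
    l, t, r, b = ltrb.T
    return np.stack([x - l, y - t, x + r, y + b], axis=1)

def centerness(ltrb):
    """FCOS center-ness score; only meaningful for positive locations (all distances > 0)."""
    l, t, r, b = ltrb.T
    return np.sqrt((np.minimum(l, r) / np.maximum(l, r)) * (np.minimum(t, b) / np.maximum(t, b)))

# Usage: a 4x4 feature map with stride 8 and one ground-truth box.
pts = location_grid(4, 4, stride=8)
targets = ltrb_targets(pts, (0, 0, 16, 16))
pos = np.all(targets > 0, axis=1)                     # locations that fall inside the box
print(pos.sum())                                      # 4
print(decode(pts[pos], targets[pos]))                 # every row is [0 0 16 16]
```

A location is a training positive simply because it lies inside a ground-truth box, which is why no anchor shapes or IoU matching are needed in this scheme.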

Digital Camera Noise

How do we make sense of noise in digital camera photos? A common belief is that it is related to ISO: the higher the ISO, the stronger the noise. However, others say that ISO has nothing to do with noise. To be accurate, the only valid statement is that noise is a built-in...

Vasu et al. (2023) MobileOne

This is a paper from Apple that aims to create a backbone network fast enough to run on mobile devices. One characteristic of the network is that it decouples the train-time and inference-time architectures: the trained model is reparameterized for inference to make it more efficient.
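The reparameterization trick relies on convolution and inference-mode BatchNorm both being linear. This allows parallel train-time branches to be folded into a single convolution. Below is a minimal single-channel NumPy sketch, assuming three branches: a 3x3 conv + BN, a 1x1 conv + BN, and an identity + BN. The function names are hypothetical. MobileOne's actual blocks use several over-parameterized branches and depthwise/pointwise convolutions, so this only illustrates the algebra, not the paper's exact block.

```python
import numpy as np

def conv2d_same(x, k):
    """Single-channel 2D cross-correlation, zero-padded so the output has x's shape (odd-sized k)."""
    p = k.shape[0] // 2
    xp = np.pad(x, p)
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm using fixed running statistics."""
    return gamma * (y - mean) / np.sqrt(var + eps) + beta

def fold_bn(k, gamma, beta, mean, var, eps=1e-5):
    """Return (kernel, bias) such that conv(x, kernel) + bias == batchnorm(conv(x, k))."""
    scale = gamma / np.sqrt(var + eps)
    return k * scale, beta - mean * scale

def reparameterize(k3, k1, bn3, bn1, bn_id):
    """Merge the 3x3+BN, 1x1+BN and identity+BN branches into one 3x3 kernel plus a bias."""
    k1_as_3 = np.pad(k1, 1)                     # put the 1x1 weight at the center of a 3x3 kernel
    k_id = np.zeros((3, 3))
    k_id[1, 1] = 1.0                            # the identity branch written as a 3x3 kernel
    parts = [fold_bn(k3, *bn3), fold_bn(k1_as_3, *bn1), fold_bn(k_id, *bn_id)]
    return sum(k for k, _ in parts), sum(b for _, b in parts)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))
k3, k1 = rng.standard_normal((3, 3)), rng.standard_normal((1, 1))
# Random BN parameters per branch, as (gamma, beta, mean, var).
bn3, bn1, bn_id = (tuple(rng.uniform(0.5, 1.5, size=4)) for _ in range(3))

# Train-time: three parallel branches whose outputs are summed.
train_out = (batchnorm(conv2d_same(x, k3), *bn3)
             + batchnorm(conv2d_same(x, k1), *bn1)
             + batchnorm(x, *bn_id))

# Inference-time: one 3x3 convolution plus a bias gives the same result.
K, B = reparameterize(k3, k1, bn3, bn1, bn_id)
infer_out = conv2d_same(x, K) + B
print(np.allclose(train_out, infer_out))  # True
```

The multi-branch form helps during training. The fused form removes the extra branches and memory traffic at inference time, which is where the speed-up on mobile hardware comes from.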

Kirillov et al. (2023) Segment Anything

The goal of the paper is to produce a promptable segmentation model: one that supports flexible prompting and can output segmentation masks in real time. For any given segmentation prompt (e.g., a point in the image, which may be ambiguous), the model is expected to return a...

Dosovitskiy et al. (2021) An Image is Worth 16x16 Words

This is the paper that introduced the Vision Transformer (ViT), which proposed using transformers in place of CNNs for image classification. Inspired by the success of transformer models in NLP, the paper explored using transformers to process 2D image data. The goal is to create a...
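The "16x16 words" in the title are image patches. The image is cut into non-overlapping 16x16 tiles, and each flattened tile is treated like a word token. Below is a minimal NumPy sketch of that first step, assuming an (H, W, C) array. The projection matrix `E` and `embed_dim` are hypothetical stand-ins for the learned linear embedding. The class token and position embeddings that ViT adds before the transformer encoder are left out.

```python
import numpy as np

def patchify(img, patch=16):
    """Split an (H, W, C) image into non-overlapping patch x patch tiles, each flattened to a vector."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by the patch size"
    x = img.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)           # (patch rows, patch cols, patch, patch, C)
    return x.reshape(-1, patch * patch * c)  # (num_patches, patch*patch*C), row-major patch order

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
patches = patchify(img)                      # 224 / 16 = 14, so 14 * 14 = 196 patches of 16*16*3 = 768 values
embed_dim = 64                               # hypothetical token width for this sketch
E = rng.standard_normal((patches.shape[1], embed_dim)) * 0.02  # stand-in for the learned projection
tokens = patches @ E                         # one embedding per patch, i.e. a "sentence" of 196 tokens
print(patches.shape, tokens.shape)           # (196, 768) (196, 64)
```

From this point on, the transformer sees only a sequence of vectors, exactly as it would for text. This is why the architecture can be reused with few changes.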