Zhang et al. (2024) TinyLlama

This paper is a report on training the TinyLlama model, which follows the Llama 2 architecture but has only 1.1B parameters. The objective is to test the idea that a smaller model trained on more data can match or outperform a larger model. [more]

Gerber (2025) Attention Is Not All You Need

This paper investigates the importance of the MLP sublayer in a decoder-only transformer model. In particular, it challenges the common design of a two-layer feed-forward network: alternatives ranging from zero to three feed-forward layers are compared on the loss, reported with its standard error. [more]
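To make the variants concrete, below is a minimal sketch (not the paper's code) of a decoder block whose feed-forward sublayer has a configurable number of linear layers. The pre-norm layout, GELU activation, hidden width, and all names are my assumptions for illustration:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Self-attention block with an MLP of a configurable number of layers."""
    def __init__(self, d_model: int, n_heads: int, n_ff_layers: int, d_ff: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        # Build an MLP with n_ff_layers linear layers; with 0 layers the
        # block is attention-only, the case the paper's title alludes to.
        layers = []
        for i in range(n_ff_layers):
            d_in = d_model if i == 0 else d_ff
            d_out = d_model if i == n_ff_layers - 1 else d_ff
            layers.append(nn.Linear(d_in, d_out))
            if i < n_ff_layers - 1:
                layers.append(nn.GELU())
        self.mlp = nn.Sequential(*layers) if layers else None

    def forward(self, x, attn_mask=None):
        h = self.norm1(x)
        a, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        x = x + a
        if self.mlp is not None:
            x = x + self.mlp(self.norm2(x))
        return x

# n_ff_layers=0 gives the attention-only variant; 2 is the standard design.
block = DecoderBlock(d_model=64, n_heads=4, n_ff_layers=2, d_ff=256)
y = block(torch.randn(2, 10, 64))  # (batch, sequence, d_model)
```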

Tian et al. (2019) FCOS

This paper presents a new model for object detection, whose output is a bounding box and a class label for each object. Unlike many other object detection models, FCOS does not use predefined anchor boxes; the model is described as “anchor box free and proposal free”. [more]
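To illustrate what “anchor box free” means in practice: in FCOS, each feature-map location directly regresses four distances (l, t, r, b) to the edges of the box containing it, rather than offsets relative to a predefined anchor. A minimal sketch of this decoding step (the function name and shapes are my assumptions):

```python
import torch

def decode_boxes(locations: torch.Tensor, ltrb: torch.Tensor) -> torch.Tensor:
    """locations: (N, 2) coordinates (x, y) of feature-map points on the image.
    ltrb: (N, 4) predicted non-negative distances (left, top, right, bottom).
    Returns (N, 4) boxes as (x1, y1, x2, y2)."""
    x, y = locations[:, 0], locations[:, 1]
    l, t, r, b = ltrb.unbind(dim=1)
    return torch.stack([x - l, y - t, x + r, y + b], dim=1)

# A point at (100, 60) predicting distances (30, 20, 50, 40)
# decodes to the box (70, 40, 150, 100).
boxes = decode_boxes(torch.tensor([[100.0, 60.0]]),
                     torch.tensor([[30.0, 20.0, 50.0, 40.0]]))
print(boxes)  # tensor([[ 70.,  40., 150., 100.]])
```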

Digital Camera Noise

How should we understand noise in digital camera photos? A common belief is that it is related to ISO: the higher the ISO, the stronger the noise. However, others say ISO has nothing to do with noise. To be accurate, the only valid statement is that noise is a built-in... [more]
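One way to see why both views circulate is a toy simulation (the model and all constants below are my assumptions, not the post's): treat photon shot noise as Poisson, add Gaussian read noise, and model ISO as a pure amplification applied afterwards. Amplification then scales signal and noise together, leaving the signal-to-noise ratio unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)

def capture(photons_per_pixel: float, iso: float, read_noise_e: float = 2.0):
    """Simulate many pixels of a flat scene: Poisson shot noise in the
    collected electrons, Gaussian read noise, then ISO as a gain factor."""
    electrons = rng.poisson(photons_per_pixel, size=100_000).astype(float)
    electrons += rng.normal(0.0, read_noise_e, size=electrons.shape)
    gain = iso / 100.0  # treat ISO 100 as unity gain (an assumption)
    signal = gain * electrons
    return signal.mean(), signal.std()

# Same light reaching the sensor, only the ISO setting differs:
for iso in (100, 800, 6400):
    mean, std = capture(photons_per_pixel=500, iso=iso)
    print(f"ISO {iso:5d}: SNR = {mean / std:.1f}")  # roughly identical SNR
```

In this toy model the SNR does not change with ISO alone; high-ISO shots look noisier in practice because they usually go with shorter exposures, i.e. fewer collected photons.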

Vasu et al. (2023) MobileOne

This is a paper from Apple that aims to create a backbone network fast enough to run on mobile devices. One characteristic of the network is that it decouples the train-time and inference-time architectures: the trained model is reparameterized for inference to make it more efficient. [more]
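As a sketch of what such reparameterization can look like, below is the generic conv + batch-norm folding used by RepVGG-style networks (MobileOne additionally merges parallel branches; this fragment is illustrative, not the paper's code):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return a single Conv2d equivalent to conv followed by bn in eval mode."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, bias=True)
    # Eval-mode BN computes gamma * (z - mean) / sqrt(var + eps) + beta,
    # a per-channel affine map, so it folds into the conv's weight and bias.
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    conv_bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_(bn.bias + (conv_bias - bn.running_mean) * scale)
    return fused

# Quick check that the fused conv matches conv followed by BN in eval mode.
conv = nn.Conv2d(8, 16, 3, padding=1, bias=False)
bn = nn.BatchNorm2d(16)
bn(conv(torch.randn(4, 8, 32, 32)))  # one training-mode pass to move the running stats
bn.eval()
x = torch.randn(1, 8, 32, 32)
print(torch.allclose(fuse_conv_bn(conv, bn)(x), bn(conv(x)), atol=1e-5))  # True
```

The inference-time network ends up with fewer operators than the train-time one while computing the same function, which is the efficiency gain the paper targets.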