This paper reports on training the TinyLlama model, which follows the Llama 2 architecture but with only 1.1B parameters.
The objective is to test the idea that a smaller model trained on more data can match or outperform a larger model.
[more]
Gerber (2025) Attention Is Not All You Need
This paper investigates the importance of the MLP sublayer in a decoder-only
transformer model. In particular, it challenges the common design of a two-layer
feed-forward network. Alternatives of zero to three feed-forward layers are
compared in terms of the loss and its standard error.
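As a concrete picture of the variants being compared, here is a minimal PyTorch sketch of a feed-forward sublayer whose depth can range from 0 to 3; the class name, hidden width, and GELU activation are my own assumptions rather than details from the paper:

```python
import torch.nn as nn

class VariableDepthFFN(nn.Module):
    """Feed-forward sublayer with a configurable number of linear layers.

    depth=0 reduces the sublayer to an identity (attention-only block);
    depth=2 is the standard transformer design being challenged.
    """
    def __init__(self, d_model: int, d_hidden: int, depth: int):
        super().__init__()
        layers = []
        for i in range(depth):
            d_in = d_model if i == 0 else d_hidden
            d_out = d_model if i == depth - 1 else d_hidden
            layers.append(nn.Linear(d_in, d_out))
            if i < depth - 1:
                layers.append(nn.GELU())  # activation choice is an assumption
        self.net = nn.Sequential(*layers)  # empty Sequential acts as identity

    def forward(self, x):
        return self.net(x)
```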
[more]
Tian et al (2019) FCOS
This paper proposes a new model for object detection, whose output is a
bounding box and a class label. Compared to other object detection models, FCOS
does not use predefined anchor boxes; the model is described as “anchor box
free and proposal free”.
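To see what anchor-free prediction means in practice: each feature-map location inside a ground-truth box directly regresses its distances (l, t, r, b) to the box's four sides, which is the formulation the paper uses. The PyTorch sketch below is an illustrative rendering with my own function and variable names:

```python
import torch

def fcos_regression_targets(points, box):
    """FCOS-style per-location regression targets.

    Each location (x, y) predicts its distances (l, t, r, b) to the
    ground-truth box's sides, instead of offsets from an anchor box.
    points: (N, 2) tensor of (x, y); box: (x0, y0, x1, y1).
    """
    x, y = points[:, 0], points[:, 1]
    x0, y0, x1, y1 = box
    targets = torch.stack([x - x0, y - y0, x1 - x, y1 - y], dim=1)
    inside = targets.min(dim=1).values > 0  # location lies inside the box
    return targets, inside

pts = torch.tensor([[50.0, 60.0]])
t, inside = fcos_regression_targets(pts, (10.0, 20.0, 100.0, 120.0))
# t -> [[40., 40., 50., 60.]], inside -> [True]
```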
[more]
Digital Camera Noise
How should we understand noise in digital camera photos? A common belief is that it is related to ISO: the higher the ISO, the stronger the noise. However, some say ISO has nothing to do with noise. To be accurate, the only valid statement is that noise is a built-in...
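One way to see why both views have some truth (an illustrative assumption on my part, not the post's own derivation): photon shot noise is inherent to capturing light, and ISO, modeled as pure gain, amplifies signal and noise together. A small NumPy simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def capture(photons_per_pixel, iso_gain, read_noise_e=2.0):
    """Toy sensor model (illustrative only, not from the post).

    Shot noise is Poisson and built into light capture itself;
    ISO is modeled as analog gain applied after capture.
    """
    electrons = rng.poisson(photons_per_pixel, size=100_000)  # shot noise
    electrons = electrons + rng.normal(0, read_noise_e, size=electrons.shape)
    signal = iso_gain * electrons
    return signal.mean(), signal.std()

# Same scene brightness at higher gain: SNR barely changes,
# because ISO scales signal and noise alike.
for gain in (1, 4, 16):
    mean, std = capture(1000, gain)
    print(f"gain={gain:2d}  SNR={mean / std:.1f}")
```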
[more]
Vasu et al (2023) MobileOne
This is a paper from Apple, aimed at creating a backbone network fast enough to run on mobile devices. One characteristic of the network is that it decouples the train-time and inference-time architectures: the trained model is reparameterized for inference to make it more efficient.
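The core primitive behind such reparameterization is folding train-time layers into a single inference-time layer. MobileOne's full procedure merges several parallel branches; the PyTorch sketch below shows only the simplest case, fusing a BatchNorm into the preceding convolution (function name and details are my own, not the paper's):

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm into the preceding conv.

    The fused conv computes exactly what conv followed by bn computed
    in eval mode, but as a single layer, which is cheaper to run.
    """
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, stride=conv.stride,
                      padding=conv.padding, groups=conv.groups, bias=True)
    with torch.no_grad():
        # BN(z) = gamma * (z - mean) / sqrt(var + eps) + beta
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        bias = (conv.bias if conv.bias is not None
                else torch.zeros(conv.out_channels))
        fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused
```

Parallel branches that share input and output shapes can similarly be rewritten as kernels of one common size and summed, which is how the multi-branch train-time network collapses into a plain feed-forward one at inference.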
[more]
