This paper reports on training the TinyLlama model, which follows the Llama 2 architecture but with only 1.1B parameters.
The objective is to test the idea that a smaller model trained on more data can match or outperform a larger model.
[more]
Gerber (2025) Attention Is Not All You Need
This paper investigates the importance of the MLP sublayer in a decoder-only
transformer model. In particular, it challenges the common design of a two-layer
feed-forward network. Alternatives of zero to three feed-forward layers are
compared in terms of the loss and its standard error.
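As a concrete picture of the variants being compared, here is a minimal PyTorch sketch of a feed-forward sublayer whose depth can range from 0 to 3; the class name, hidden width, and GELU activation are my own assumptions rather than details from the paper:

```python
import torch.nn as nn

class VariableDepthFFN(nn.Module):
    """Feed-forward sublayer with a configurable number of linear layers.

    depth=0 reduces the sublayer to an identity (attention-only block);
    depth=2 is the standard transformer design being challenged.
    """
    def __init__(self, d_model: int, d_hidden: int, depth: int):
        super().__init__()
        layers = []
        for i in range(depth):
            d_in = d_model if i == 0 else d_hidden
            d_out = d_model if i == depth - 1 else d_hidden
            layers.append(nn.Linear(d_in, d_out))
            if i < depth - 1:
                layers.append(nn.GELU())  # activation choice is an assumption
        self.net = nn.Sequential(*layers)  # empty Sequential acts as identity

    def forward(self, x):
        return self.net(x)
```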
[more]
Tian et al (2019) FCOS
This paper proposes a new model for object detection, whose output is a
bounding box and a class label. Compared to other object detection models, FCOS
does not use predefined anchor boxes; the model is described as “anchor box
free and proposal free”.
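To see what anchor-free prediction means in practice: each feature-map location inside a ground-truth box directly regresses its distances (l, t, r, b) to the box's four sides, which is the formulation the paper uses. The PyTorch sketch below is an illustrative rendering with my own function and variable names:

```python
import torch

def fcos_regression_targets(points, box):
    """FCOS-style per-location regression targets.

    Each location (x, y) predicts its distances (l, t, r, b) to the
    ground-truth box's sides, instead of offsets from an anchor box.
    points: (N, 2) tensor of (x, y); box: (x0, y0, x1, y1).
    """
    x, y = points[:, 0], points[:, 1]
    x0, y0, x1, y1 = box
    targets = torch.stack([x - x0, y - y0, x1 - x, y1 - y], dim=1)
    inside = targets.min(dim=1).values > 0  # location lies inside the box
    return targets, inside

pts = torch.tensor([[50.0, 60.0]])
t, inside = fcos_regression_targets(pts, (10.0, 20.0, 100.0, 120.0))
# t -> [[40., 40., 50., 60.]], inside -> [True]
```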
[more]
Digital Camera Noise
How should we understand noise in digital camera photos? A common belief is that it is related to ISO: the higher the ISO, the stronger the noise. However, some say ISO has nothing to do with noise. To be accurate, the only valid statement is that noise is a built-in...
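One way to see why both views have some truth (an illustrative assumption on my part, not the post's own derivation): photon shot noise is inherent to capturing light, and ISO, modeled as pure gain, amplifies signal and noise together. A small NumPy simulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def capture(photons_per_pixel, iso_gain, read_noise_e=2.0):
    """Toy sensor model (illustrative only, not from the post).

    Shot noise is Poisson and built into light capture itself;
    ISO is modeled as analog gain applied after capture.
    """
    electrons = rng.poisson(photons_per_pixel, size=100_000)  # shot noise
    electrons = electrons + rng.normal(0, read_noise_e, size=electrons.shape)
    signal = iso_gain * electrons
    return signal.mean(), signal.std()

# Same scene brightness at higher gain: SNR barely changes,
# because ISO scales signal and noise alike.
for gain in (1, 4, 16):
    mean, std = capture(1000, gain)
    print(f"gain={gain:2d}  SNR={mean / std:.1f}")
```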
[more]
Vasu et al (2023) MobileOne
This is a paper from Apple, aimed at creating a backbone network fast enough to run on mobile devices. One characteristic of the network is that it decouples the train-time and inference-time architectures: the trained model is reparameterized for inference to make it more efficient.
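The core primitive behind such reparameterization is folding train-time layers into a single inference-time layer. MobileOne's full procedure merges several parallel branches; the PyTorch sketch below shows only the simplest case, fusing a BatchNorm into the preceding convolution (function name and details are my own, not the paper's):

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold a BatchNorm into the preceding conv.

    The fused conv computes exactly what conv followed by bn computed
    in eval mode, but as a single layer, which is cheaper to run.
    """
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, stride=conv.stride,
                      padding=conv.padding, groups=conv.groups, bias=True)
    with torch.no_grad():
        # BN(z) = gamma * (z - mean) / sqrt(var + eps) + beta
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        bias = (conv.bias if conv.bias is not None
                else torch.zeros(conv.out_channels))
        fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused
```

Parallel branches that share input and output shapes can similarly be rewritten as kernels of one common size and summed, which is how the multi-branch train-time network collapses into a plain feed-forward one at inference.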
[more]
