∫ntegrabℓε ∂iﬀerentiαℓs · unorganised notes, code, and writings of random topics

Hellerstein et al (1995) Generalized Search Trees for Database Systems

December 2, 2019 paper

GiST is implemented in PostgreSQL and found to have exceptional performance. This is the paper behind it. [more]

Cantopop id3v2 tagger using scikit-learn and Keras

November 25, 2019 blog code

A previous post on parsing Cantopop titles by machine learning shows how well different machine learning techniques identify song titles and artist names from a short string. As it turns out, the simple neural network model works nicely, and we can use it to build a tool.... [more]

VSM for text normalization

November 22, 2019 blog code

On many occasions we have textual labels in structured data. The case we consider in this paper is the industry designation of companies. While there are standards governing industry designations, their use is found to be arbitrary. [more]

Pimbley (2016) Better measurements for CLO equity performance

November 20, 2019 paper

A short paper arguing how to evaluate CLO equity performance. As the equity tranche of a CLO has a maturity date, we can treat it as a bond with a non-deterministic coupon. So it is natural to use IRR as a measure for equity. IRR is the solution for \(r\) in... [more]

Numbers for machine learning

November 6, 2019 blog code

How much data is enough? This has long been the question for any statistical exercise, such as experiments, simulations, and surveys. But nowadays, it is also the question for machine learning. [more]