GiST is implemented in PostgreSQL and found to have exceptional performance.
This is the paper behind it.
[more]
Cantopop id3v2 tagger using scikit-learn and Keras
A previous post on parsing Cantopop titles by machine learning shows how well different machine learning techniques identify song titles and artist names from a short string. As it turns out, the simple neural network model works nicely, and we can use it to build a tool....
[more]
VSM for text normalization
On many occasions we have textual labels in structured data. The case we
considered in this paper is the industry designations of companies. While there
are standards governing industry designation, its use is found to be
arbitrary.
[more]
Pimbley (2016) Better measurements for CLO equity performance
A short paper on how to evaluate CLO equity performance. As the equity tranche of a CLO has a maturity date, we can treat it as a bond with non-deterministic coupons. So it is natural to use IRR as a measure for equity. IRR is the solution of \(r\) in...
[more]
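As a rough illustration of the IRR idea in the CLO equity entry above, here is a minimal sketch that finds \(r\) by bisection. It assumes the textbook definition (the rate at which the net present value of periodic cash flows is zero), which may not match the paper's exact formulation; the function names `npv` and `irr`, the bracket, and the cash flows in the usage line are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def npv(rate: float, cashflows: Sequence[float]) -> float:
    """Net present value of cash flows at periods 0, 1, 2, ... (assumed equally spaced)."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cashflows))


def irr(cashflows: Sequence[float], lo: float = -0.99, hi: float = 10.0,
        tol: float = 1e-10, max_iter: int = 200) -> float:
    """Per-period IRR by bisection on the bracket [lo, hi] (bracket is an assumption)."""
    f_lo, f_hi = npv(lo, cashflows), npv(hi, cashflows)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("NPV does not change sign on the bracket")
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cashflows)
        if abs(f_mid) < tol or (hi - lo) / 2.0 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid              # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return (lo + hi) / 2.0


# Hypothetical equity cash flows: initial outlay, then uncertain distributions.
example_rate = irr([-100.0, 30.0, 25.0, 35.0, 20.0, 40.0])
```

Bisection is used here only because it is simple and robust when the NPV changes sign once on the bracket; cash flow patterns with several sign changes can have more than one solution, in which case this sketch returns just one of them.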
Numbers for machine learning
How much data is enough? This has long been the question for any statistical exercise, such as experiments, simulations, and surveys. But nowadays, it is also the question for machine learning.
[more]