Cantopop id3v2 tagger using scikit-learn and Keras

A previous post on parsing Cantopop titles by machine learning shows how well different machine learning techniques identify song titles and artist names in a short string. As it turns out, the simple neural network model works nicely, and we can make use of it to build a tool.... [more]

VSM for text normalization

On many occasions we have textual labels in structured data. The case we consider in this paper is the industry designations assigned to companies. While there are standards governing industry designation, their use is found to be arbitrary. [more]

Numbers for machine learning

How much data is enough? This has long been the question for any statistical exercise, such as experiments, simulations, and surveys. Nowadays, it is also the question for machine learning. [more]