A previous post on parsing Cantopop titles by machine learning compared how well different machine learning techniques identify the song title and artist name in a short string. As it turns out, the simple neural network model works nicely, and we can use it to build a tool. I need such a tool occasionally to download Cantopop music in MP3 format to play in my offline jukebox (i.e., the stereo in my car), as the ID3v2 tags are what it displays on the control panel.

What I need is simple. The workflow:

  1. Download an MP3 using youtube_dl; the downloaded file will be named after the title string on YouTube
  2. Parse that string to figure out the artist name and song title
  3. Set them as ID3v2 tags using mutagen

Step 1 can be done in a shell script; a minimal sketch, assuming youtube-dl and ffmpeg are available:
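
youtube-dl -x --audio-format mp3 -o '%(title)s.%(ext)s' "$URL"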

Step 3 is easy with mutagen (encoding=3 selects UTF-8):

import mutagen.id3

mp3 = mutagen.id3.ID3(filename)
mp3.add(mutagen.id3.TPE1(encoding=3, text=artist))  # TPE1: lead performer/artist
mp3.add(mutagen.id3.TIT2(encoding=3, text=title))   # TIT2: song title
mp3.save()
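
One caveat: if the file carries no ID3 tag at all, mutagen.id3.ID3() raises ID3NoHeaderError, and we may have to start from an empty tag, e.g.:

try:
    mp3 = mutagen.id3.ID3(filename)
except mutagen.id3.ID3NoHeaderError:
    mp3 = mutagen.id3.ID3()  # fresh, empty tag

In that case, call mp3.save(filename) at the end so mutagen knows which file to write.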

And step 2, as explained in the previous post, can be done using scikit-learn. We focus on the multilayer perceptron model as a multiclass classifier here, and the simplest one at that: 3 layers. The code to train one and save the model is as easy as this:

import pickle
from sklearn.neural_network import MLPClassifier

X = dataframe[incol]    # numeric features; incol lists the feature columns
y = dataframe['label']  # text labels, one of "a", "t", "x"
clf = MLPClassifier(alpha=0.01, max_iter=1000, activation='logistic')
clf.fit(X, y)
with open("mlp-trained.pickle", "wb") as fp:
    pickle.dump(clf, fp)

and we can reuse the model by loading up a pickle file:

with open("mlp-trained.pickle", "rb") as fp:
    clf = pickle.load(fp)
dataframe['label'] = clf.predict(dataframe[incol])

We can then use the predicted labels to identify the artist and song title substrings in the input string.
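
For example, a minimal sketch, assuming the classifier assigns one label per character position ("a" for artist characters, "t" for title characters, "x" for everything else, matching the label set used below):

def extract(text, labels):
    # collect the characters labelled as artist and as title, respectively
    artist = "".join(ch for ch, lab in zip(text, labels) if lab == "a")
    title = "".join(ch for ch, lab in zip(text, labels) if lab == "t")
    return artist.strip(), title.strip()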

Scikit-learn is not the only tool that provides MLP models. In fact, it is suitable only for simple MLP models like this one; a more complex model would need a specialized engine such as TensorFlow. So I tried out a different implementation based on Keras with the TensorFlow backend. However, as Keras is more flexible, its code is not as simple as scikit-learn's. The training code looks like this:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# size of input
input_dim = len(incol)

# Create the model: 3 layers, with 100 sigmoid neurons in the hidden layer
# and 3 softmax outputs
model = Sequential()
model.add(Dense(100, input_dim=input_dim, activation='sigmoid'))
model.add(Dense(3, activation="softmax"))
adam = Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(optimizer=adam,
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model, report performance, and save
label = ["a", "t", "x"]
X = dataframe[incol]
# one-hot encode the text labels, enforcing the label order above
y = np.array([[1 if v == c else 0 for c in label] for v in dataframe['label']])
model.fit(X, y, epochs=1000, batch_size=128)

score = model.evaluate(X, y, batch_size=128)
print(dict(zip(model.metrics_names, score)))

model.save("keras-trained.h5")

Keras does not suggest using pickle to save the model (I bet it is because pickle doesn't deep-copy the model properly); instead, its save() function saves the structure and weights into an HDF5 file. While the code above uses 1000 epochs to train the MLP, the performance seems good enough at 100 epochs already. As the model is small, the total training time is short either way.

Also note that Keras does not support text labels as output; we have to convert the labels into one-hot encoding. The textbook way is to use the function keras.utils.to_categorical(), but we do the conversion manually here to enforce the label order.
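
For reference, the textbook version would look something like this, equivalent to the manual comprehension above as long as the integer indices follow the same label order:

from keras.utils import to_categorical

# map each text label to its index in the fixed label order, then one-hot encode
y = to_categorical([label.index(v) for v in dataframe['label']], num_classes=3)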

Once we have the model, to use it:

import numpy as np
from keras.models import load_model

clf = load_model("keras-trained.h5")

label = ["a", "t", "x"]
dataframe['label'] = [label[np.argmax(x)] for x in clf.predict(dataframe[incol])]

Again, we cannot get text labels as output, so we have to convert the one-hot encoded output back into text labels. The output of the MLP classifier is in fact a vector of floats, each anywhere between 0 and 1. For some other purposes, we may want to convert the output into binary:

binary_result = np.array([[1 if y>0.5 else 0 for y in x] for x in clf.predict(dataframe[incol])])

but here it is sufficient to simply use np.argmax() to find the position of the maximum, without looking at the exact values produced by the classifier.
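
Putting steps 2 and 3 together, the whole tool boils down to something like the following sketch; featurize() here is a hypothetical stand-in for the feature extraction from the previous post, turning a title string into its per-character feature matrix:

import mutagen.id3
import numpy as np
from keras.models import load_model

def tag_file(filename, clf, label=("a", "t", "x")):
    name = filename.rsplit(".", 1)[0]     # the YouTube title string
    X = featurize(name)                   # hypothetical helper, see above
    pred = [label[np.argmax(p)] for p in clf.predict(X)]
    artist = "".join(c for c, l in zip(name, pred) if l == "a").strip()
    title = "".join(c for c, l in zip(name, pred) if l == "t").strip()
    mp3 = mutagen.id3.ID3(filename)
    mp3.add(mutagen.id3.TPE1(encoding=3, text=artist))
    mp3.add(mutagen.id3.TIT2(encoding=3, text=title))
    mp3.save()

clf = load_model("keras-trained.h5")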

The code is in this repository, where a usable command line tool resides.