This document summarizes research on automatic music tagging with deep convolutional neural networks. It describes the tagging problem, the proposed architectures built from convolutional and max-pooling layers, and experiments on two datasets. In those experiments, mel-spectrogram (melgram) inputs with four convolutional layers achieved the best results, and deeper models did not significantly improve performance. Re-running the experiments on the Million Song Dataset (MSD) with proper hyperparameter tuning yielded better results than those originally reported.
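To make the idea of stacking convolutional and max-pooling layers concrete, here is a minimal sketch of how such a stack shrinks a mel-spectrogram down to a single position, from which tag scores could then be predicted. The input size (96 mel bins by 1366 frames), the four pooling sizes, and the helper name `pooled_shape` are illustrative assumptions, not values taken from this summary.

```python
def pooled_shape(input_shape, pool_sizes):
    """Track the (frequency, time) size of a feature map through a stack of
    size-preserving convolutions, each followed by non-overlapping max pooling."""
    freq, time = input_shape
    shapes = [(freq, time)]
    for pool_f, pool_t in pool_sizes:
        # A 'same'-padded convolution keeps the size unchanged;
        # non-overlapping max pooling floor-divides each axis.
        freq, time = freq // pool_f, time // pool_t
        shapes.append((freq, time))
    return shapes


# Assumed values for illustration only (not stated in the summary above).
INPUT_SHAPE = (96, 1366)                       # mel bins x time frames
POOL_SIZES = [(2, 4), (4, 5), (3, 8), (4, 8)]  # one entry per conv layer

shapes = pooled_shape(INPUT_SHAPE, POOL_SIZES)
for layer, (freq, time) in enumerate(shapes):
    label = "input" if layer == 0 else f"after layer {layer}"
    print(f"{label}: {freq} x {time}")
```

Under these assumed sizes, four layers are already enough to reduce the whole clip to one position per feature channel, which gives some intuition for why adding further layers would not necessarily help.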