aset400 - Heterogeneous Embedding
Submitted by bmcfee on Wed, 09/16/2009 - 16:55
This data set contains the aset400 artists and the subjective similarity measurements from Ellis et al. [1]. Also included are five kernels derived from text and acoustic features associated with the artists, as described in [2]. The variables in the .mat file are as follows:
- Kbio and Ktag are TF-IDF kernels computed from stemmed Last.FM biographies and
- bag_* and dict_* contain the bag-of-words and dictionaries for each type of document.
- T* contain the TF-IDF filtered bags-of-words.
- Kmfcc is a probability product kernel (PPK) over the Gaussian mixture models (contained in models).
- Ksm is a PPK derived from semantic multinomials (SM).
- Kchroma is a KL-divergence pseudo-kernel derived from chroma features.
- names is a dictionary of the names of the artists in question.
- indices.csv contains the list of artist index numbers.
- triples.csv contains the similarity triples, one per line, encoded by index number. A line "X,Y,Z" indicates that the similarity between (X,Y) is greater than between (X,Z).
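The triple encoding described above can be checked against any of the kernels with a few lines of code. The following is a minimal sketch, not the procedure used in [2]: it parses "X,Y,Z" lines and reports the fraction of triples for which X is closer to Y than to Z under the distance induced by a kernel matrix, d(i,j)^2 = K_ii + K_jj - 2 K_ij. The function names, the toy kernel, the toy triples, and the `offset` parameter are all hypothetical. In particular, whether the index numbers in triples.csv are 0-based or 1-based, and how they map to kernel rows, is an assumption to verify against indices.csv and names. Squared distances are clamped at zero because a pseudo-kernel such as Kchroma need not be positive semidefinite.

```python
import csv
import io
import math


def read_triples(handle):
    """Parse lines of the form "X,Y,Z" into integer tuples (x, y, z)."""
    triples = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        x, y, z = (int(field) for field in row)
        triples.append((x, y, z))
    return triples


def kernel_distance(K, i, j):
    """Distance induced by a kernel matrix: sqrt(K_ii + K_jj - 2 * K_ij)."""
    squared = K[i][i] + K[j][j] - 2.0 * K[i][j]
    return math.sqrt(max(squared, 0.0))  # clamp negatives from non-PSD matrices


def triple_agreement(K, triples, offset=0):
    """Fraction of triples (x, y, z) where x is closer to y than to z under K.

    `offset` is subtracted from every index; pass offset=1 if the file turns
    out to use 1-based indices (an assumption to check, not a documented fact).
    """
    if not triples:
        return 0.0
    satisfied = 0
    for x, y, z in triples:
        i, j, k = x - offset, y - offset, z - offset
        if kernel_distance(K, i, j) < kernel_distance(K, i, k):
            satisfied += 1
    return satisfied / len(triples)


# Toy stand-ins for triples.csv and for one kernel matrix (made-up values).
toy_csv = io.StringIO("0,1,2\n2,1,0\n0,2,1\n1,0,2\n")
toy_K = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
toy_triples = read_triples(toy_csv)
print(triple_agreement(toy_K, toy_triples))
```

To use it on the real data, open triples.csv in place of the in-memory string and pass one of the kernel matrices loaded from the .mat file (for example with `scipy.io.loadmat`) converted to a nested list or indexed the same way.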
- 1. (2002). The Quest for Ground Truth in Musical Artist Similarity. ISMIR 2002, 3rd International Conference on Music Information Retrieval.
- 2. (2009). Heterogeneous Embedding for Subjective Artist Similarity. Tenth International Symposium for Music Information Retrieval (ISMIR).