aset400 - Heterogeneous Embedding
Submitted by bmcfee on Wed, 09/16/2009 - 16:55
This data set contains the aset400 artists and subjective similarity measurements from Ellis et al. [1]. Also included are five kernels derived from text and acoustic features associated with the artists, as described in [2].
The variables in the .mat file are as follows:
- Kbio and Ktag are TF-IDF kernels computed from stemmed Last.FM biographies and top-100 tags.
- bag_* and dict_* contain the bag-of-words and dictionaries for each type of document.
- T* contain the TF-IDF filtered bags-of-words.
- Kmfcc is a probability product kernel (PPK) over the Gaussian mixture models (contained in models).
- Ksm is a PPK derived from semantic multinomials (SM).
- Kchroma is a KL-divergence pseudo-kernel derived from chroma features.
- Kchroma is a KL-divergence pseudo-kernel derived from chroma features.
- names is a dictionary of the names of the artists in question.
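As a rough sketch of how the variables listed above might be inspected from Python: the snippet below assumes the .mat file can be read with scipy.io.loadmat and groups the variable names by the prefixes described in the list. The function names load_aset400 and group_variables are hypothetical helpers introduced here for illustration, and the file path is left to the caller because the file name is not given on this page.

```python
def group_variables(mat):
    """Group variable names from a loadmat-style dict by the prefixes listed above."""
    groups = {"kernels": [], "bags": [], "dicts": [], "tfidf": [], "other": []}
    for name in sorted(mat):
        if name.startswith("__"):
            # loadmat adds metadata keys such as __header__; skip them.
            continue
        if name.startswith("K"):
            groups["kernels"].append(name)   # Kbio, Ktag, Kmfcc, Ksm, Kchroma
        elif name.startswith("bag_"):
            groups["bags"].append(name)      # bag-of-words matrices
        elif name.startswith("dict_"):
            groups["dicts"].append(name)     # dictionaries
        elif name.startswith("T"):
            groups["tfidf"].append(name)     # TF-IDF filtered bags-of-words
        else:
            groups["other"].append(name)     # e.g. models, names
    return groups


def load_aset400(path):
    """Load the .mat file at `path` (assumption: readable by scipy.io.loadmat)."""
    # Imported lazily so that group_variables can be used without SciPy installed.
    from scipy.io import loadmat
    return loadmat(path)
```

With the file on disk, group_variables(load_aset400(path)) should give a quick overview of which kernels and bag-of-words variables are present before working with any of them.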
Similarity measurements are derived from aset400, and encoded as follows:
- indices.csv contains the list of artist index numbers
- triples.csv contains the similarity triples, one per line, encoded by index number. A line "X,Y,Z" indicates that artist X is more similar to artist Y than to artist Z.
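The triple encoding above can be illustrated with a small sketch that parses "X,Y,Z" lines and counts how many triples a given kernel agrees with. This is only an illustration under stated assumptions, not the evaluation procedure of [2]: it treats a kernel entry as a similarity score, it assumes the index numbers address rows and columns of the kernel, and the index_base parameter is a hypothetical knob because this page does not say whether indices start at 0 or 1.

```python
import csv


def read_triples(lines):
    """Parse "X,Y,Z" rows into integer tuples, skipping blank lines."""
    triples = []
    for row in csv.reader(lines):
        if not row:
            continue
        x, y, z = (int(v) for v in row)
        triples.append((x, y, z))
    return triples


def triple_agreement(kernel, triples, index_base=0):
    """Fraction of triples (X, Y, Z) for which kernel[X][Y] > kernel[X][Z].

    `kernel` may be a nested list or a 2-D array; `index_base` is subtracted
    from every index (assumption: set it to 1 if the file uses 1-based indices).
    """
    if not triples:
        return 0.0
    satisfied = 0
    for x, y, z in triples:
        i, j, k = x - index_base, y - index_base, z - index_base
        if kernel[i][j] > kernel[i][k]:
            satisfied += 1
    return satisfied / len(triples)
```

For example, read_triples(open("triples.csv")) yields the list of triples, and triple_agreement(K, triples) reports the fraction of them for which K ranks Y above Z relative to X.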
References
- [1] The Quest for Ground Truth in Musical Artist Similarity, Ellis, Daniel P. W., Whitman Brian, Berenzweig Adam, and Lawrence Steve, ISMIR 2002, 3rd International Conference on Music Information Retrieval, 10/2002, Paris, France, (2002)
- [2] Heterogeneous Embedding for Subjective Artist Similarity, McFee, B., and Lanckriet Gert, Tenth International Symposium for Music Information Retrieval (ISMIR), 10/2009, Kobe, Japan, (2009)