aset400 - Heterogeneous Embedding

Number of points: 412
Number of kernels: 5

This data set contains the aset400 artists and subjective similarity measurements from Ellis et al. [1]. It also includes 5 kernels derived from text and acoustic features associated with the artists, as described in [2].
The variables in the .mat file are as follows:

  • Kbio and Ktag are TF-IDF kernels computed from stemmed Last.FM biographies and
    top-100 tags, respectively.
    • bag_* and dict_* contain the bag-of-words representations and dictionaries
      for each type of document.
    • T* contain the TF-IDF filtered bags-of-words.
  • Kmfcc is a probability product kernel (PPK) over Gaussian mixture models of
    MFCC features (stored in models).
  • Ksm is a PPK derived from semantic multinomials (SM).
  • Kchroma is a KL-divergence pseudo-kernel derived from chroma features.
  • names is a dictionary of the artists' names.
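The README does not state the exact TF-IDF weighting or normalization behind Kbio and Ktag. The following is a minimal sketch, assuming term frequency = count / document length, idf = log(N / df), and a cosine-normalized linear kernel over the resulting vectors. The function name `tfidf_kernel` and the input layout (one row per artist, one column per dictionary term, as the bag_* matrices might be arranged) are hypothetical; stemming and top-100 tag selection are out of scope.

```python
import numpy as np

def tfidf_kernel(counts):
    """Hypothetical sketch: cosine-normalized linear kernel over TF-IDF vectors.

    counts: (n_artists, n_terms) array of raw term counts.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    # Term frequency: counts divided by document length (guard empty documents).
    lengths = counts.sum(axis=1, keepdims=True)
    tf = counts / np.maximum(lengths, 1.0)
    # Inverse document frequency: log(N / df); unused terms get weight 0.
    df = (counts > 0).sum(axis=0)
    idf = np.where(df > 0, np.log(n_docs / np.maximum(df, 1)), 0.0)
    X = tf * idf
    # Unit-normalize each row so that X @ X.T is cosine similarity.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms > 0, norms, 1.0)
    return X @ X.T
```

Because the kernel is a Gram matrix of explicit feature vectors, it is positive semidefinite by construction; a term that occurs in every document receives zero idf weight and contributes nothing.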
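A probability product kernel between densities p and q is the integral of p(x)^ρ q(x)^ρ over x. The README does not give ρ or the covariance structure of the models, so the sketch below assumes ρ = 1 (the expected likelihood kernel), where two Gaussians have the closed form ∫ N(x; m, S) N(x; n, T) dx = N(m; n, S + T), and a GMM pair reduces to a weighted double sum over components. The (weights, means, covs) tuple layout and all function names are hypothetical, not the layout of the models variable.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Density of N(mean, cov) evaluated at x (full covariance)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    d = diff.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return float(np.exp(-0.5 * (d * np.log(2 * np.pi) + logdet + maha)))

def ppk_gmm(gmm_p, gmm_q):
    """PPK with rho = 1 between two GMMs given as (weights, means, covs)."""
    w_p, mu_p, cov_p = gmm_p
    w_q, mu_q, cov_q = gmm_q
    total = 0.0
    for a, m, S in zip(w_p, mu_p, cov_p):
        for b, n, T in zip(w_q, mu_q, cov_q):
            # Closed form of the overlap integral of two Gaussian components.
            total += a * b * gaussian_pdf(m, n, np.asarray(S) + np.asarray(T))
    return total

def ppk_matrix(gmms):
    """Symmetric kernel matrix K[i, j] = ppk_gmm(gmms[i], gmms[j])."""
    n = len(gmms)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = ppk_gmm(gmms[i], gmms[j])
    return K
```

With ρ = 1 the kernel is an L2 inner product of densities, so it is positive semidefinite but not scale-normalized; dividing by sqrt(K_ii K_jj) is a common normalization, though the README does not say whether Kmfcc is normalized.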

Similarity measurements are derived from aset400 and encoded as follows:

  • indices.csv contains the list of artist index numbers
  • triples.csv contains the similarity triples, one per line, encoded by index number. A line "X,Y,Z" indicates that X is more similar to Y than to Z, i.e., the similarity between (X,Y) is greater than between (X,Z).
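One natural use of the triples is to measure how many of them a kernel agrees with, using the kernel-induced distance d²(i, j) = K_ii + K_jj − 2 K_ij. The sketch below assumes that kernel rows follow the order of the artist index numbers in indices.csv; that mapping, and the names `load_triples` and `triple_agreement`, are hypothetical.

```python
import numpy as np

def load_triples(lines, indices):
    """Parse "X,Y,Z" lines into an (m, 3) array of kernel row positions.

    indices: artist index numbers, assumed to be listed in kernel row order.
    """
    row_of = {idx: row for row, idx in enumerate(indices)}
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        x, y, z = (int(v) for v in line.split(","))
        rows.append((row_of[x], row_of[y], row_of[z]))
    return np.array(rows, dtype=int).reshape(-1, 3)

def triple_agreement(K, triples):
    """Fraction of triples (x, y, z) with d(x, y) < d(x, z) under kernel K."""
    K = np.asarray(K, dtype=float)
    diag = np.diag(K)
    x, y, z = triples[:, 0], triples[:, 1], triples[:, 2]
    # Squared feature-space distances induced by the kernel.
    d_xy = diag[x] + diag[y] - 2.0 * K[x, y]
    d_xz = diag[x] + diag[z] - 2.0 * K[x, z]
    return float(np.mean(d_xy < d_xz))
```

For example, with `indices` holding the index numbers read from indices.csv, `with open("triples.csv") as f: t = load_triples(f, indices)` followed by `triple_agreement(K, t)` gives the fraction of triples satisfied by a kernel K from the .mat file.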


References

  1. D. P. W. Ellis, B. Whitman, A. Berenzweig, and S. Lawrence. The Quest for Ground Truth in Musical Artist Similarity. ISMIR 2002, 3rd International Conference on Music Information Retrieval, Paris, France, October 2002.
  2. B. McFee and G. Lanckriet. Heterogeneous Embedding for Subjective Artist Similarity. Tenth International Symposium for Music Information Retrieval (ISMIR), Kobe, Japan, October 2009.