aset400 - Heterogeneous Embedding

Format: Matlab
Number of points: 412
Number of kernels: 5
Tasks: Binary
Performance: 0.79
Measure: Accuracy
This data set contains the aset400 artists and the subjective similarity measurements of Ellis et al. [1]. It also includes 5 kernels derived from text and acoustic features associated with the artists, as described in [2]. The variables in the .mat file are as follows:
  • Kbio and Ktag are TF-IDF kernels computed from stemmed Last.FM artist biographies and top-100 tags, respectively.
    • bag_* and dict_* contain the bag-of-words representations and dictionaries for each document type.
    • T* contain the TF-IDF filtered bags-of-words.
  • Kmfcc is a probability product kernel (PPK) over the Gaussian mixture models stored in the variable models.
  • Ksm is a PPK derived from semantic multinomials (SM).
  • Kchroma is a KL-divergence pseudo-kernel derived from chroma features.
  • names is a dictionary of the artists' names.
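The kernel variables listed above can be loaded and sanity-checked in Python. This is a minimal sketch, not part of the data set's distribution. It assumes a pre-v7.3 .mat file, which scipy.io.loadmat can read, at a hypothetical path. It also checks each kernel for positive semidefiniteness, because Kchroma is described as a pseudo-kernel and therefore need not be PSD. The helpers load_kernels, is_psd, and clip_to_psd are hypothetical names. The clipping step shows one common remedy (zeroing negative eigenvalues) and is not necessarily what the authors did.

```python
import numpy as np

# Kernel variable names, as listed in the description above.
KERNEL_NAMES = ("Kbio", "Ktag", "Kmfcc", "Ksm", "Kchroma")


def load_kernels(path):
    """Load the five kernel matrices from the .mat file as dense float arrays.

    ASSUMPTION: the file predates MAT v7.3, so scipy.io.loadmat can read it.
    """
    import scipy.io  # imported here so the checks below do not require SciPy

    data = scipy.io.loadmat(path)
    kernels = {}
    for name in KERNEL_NAMES:
        mat = data[name]
        if hasattr(mat, "toarray"):  # densify in case a kernel is stored sparse
            mat = mat.toarray()
        kernels[name] = np.asarray(mat, dtype=float)
    return kernels


def is_psd(K, tol=1e-8):
    """Return True if K is square, symmetric, and has no eigenvalue below -tol."""
    K = np.asarray(K, dtype=float)
    if K.ndim != 2 or K.shape[0] != K.shape[1] or not np.allclose(K, K.T):
        return False
    # eigvalsh assumes a symmetric matrix and returns its real eigenvalues
    return bool(np.linalg.eigvalsh(K).min() >= -tol)


def clip_to_psd(K):
    """Symmetrize K, then zero its negative eigenvalues (a common PSD projection)."""
    K = np.asarray(K, dtype=float)
    K = 0.5 * (K + K.T)
    w, V = np.linalg.eigh(K)
    # Rebuild V diag(max(w, 0)) V^T
    return (V * np.clip(w, 0.0, None)) @ V.T
```

For example, `{name: is_psd(K) for name, K in load_kernels("aset400.mat").items()}` would report which kernels are PSD. The file name "aset400.mat" is a placeholder.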
Similarity measurements are derived from aset400 and encoded as follows:
  • indices.csv contains the list of artist index numbers.
  • triples.csv contains the similarity triples, one per line, encoded by index number. A line "X,Y,Z" indicates that artist X is more similar to artist Y than to artist Z.
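The CSV encoding above can be read with a short script and compared against any of the kernels. This is a hedged sketch with hypothetical helper names. It assumes that position i in indices.csv corresponds to row i of each kernel matrix, which the description does not state. It scores a kernel K by the fraction of triples (X, Y, Z) for which the kernel-induced distance d(X,Y) is smaller than d(X,Z), where d(i,j)^2 = K_ii + K_jj - 2K_ij. This scoring is not claimed to be how the reported 0.79 accuracy was computed.

```python
import csv

import numpy as np


def read_index_map(path):
    """Map each artist index number in indices.csv to its position.

    ASSUMPTION: position i in indices.csv corresponds to row i of each kernel.
    Accepts either one index per line or a single comma-separated line.
    """
    with open(path, newline="") as f:
        ids = [int(v) for row in csv.reader(f) for v in row if v.strip()]
    return {idx: pos for pos, idx in enumerate(ids)}


def read_triples(path):
    """Read triples.csv; a line "X,Y,Z" means X is more similar to Y than to Z."""
    with open(path, newline="") as f:
        return [tuple(int(v) for v in row) for row in csv.reader(f) if row]


def triple_accuracy(K, triples, index_to_row):
    """Fraction of triples whose order agrees with the kernel-induced distance.

    Uses the squared distance d(i, j)^2 = K[i, i] + K[j, j] - 2 * K[i, j].
    """
    if not triples:
        raise ValueError("no triples given")
    K = np.asarray(K, dtype=float)
    diag = np.diag(K)
    satisfied = 0
    for x, y, z in triples:
        i, j, k = index_to_row[x], index_to_row[y], index_to_row[z]
        d_xy = diag[i] + diag[j] - 2.0 * K[i, j]
        d_xz = diag[i] + diag[k] - 2.0 * K[i, k]
        satisfied += int(d_xy < d_xz)  # triple satisfied if Y is closer to X
    return satisfied / len(triples)
```

A triple-based score like this works on any of the five kernels unchanged, since it only needs pairwise distances. For Kchroma, which may not be PSD, the "squared distances" can be negative, so the ordering should be read with that caveat.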