CAL500 (Computer Audition Lab 500) is a dataset aimed for evaluation of music information retrieval systems. It consists of 502 songs picked from western popular music. The audio is represented as a time series of the first 13 Mel-frequency cepstral coefficients (and their first and second derivatives) extracted by sliding a 12 ms half-overlapping short-time window over the waveform of each song. Each song has been annotated by at least 3 people with 135 musically-relevant concepts spanning six semantic categories:
20 PAPERS • NO BENCHMARKS YET
CHiME-Home is a dataset for sound source recognition in a domestic environment. It uses around 6.8 hours of domestic environment audio recordings. The recordings were obtained from the CHiME projects – computational hearing in multisource environments – where recording equipment was positioned inside an English Victorian semi-detached house. The recordings were selected from 22 sessions totalling 19.5 hours, with each session made between 7:30 in the morning and 20:00 in the evening. In the considered recordings, the equipment was placed in the lounge (sitting room) near the door opening onto a hallway, with the hallway opening onto a kitchen with no door. With the lounge door typically open, prominent sounds thus may originate from sources both in the lounge and kitchen.
4 PAPERS • NO BENCHMARKS YET
For each dataset we provide a short description as well as some characterization metrics. It includes the number of instances (m), number of attributes (d), number of labels (q), cardinality (Card), density (Dens), diversity (Div), average Imbalance Ratio per label (avgIR), ratio of unconditionally dependent label pairs by chi-square test (rDep) and complexity, defined as m × q × d as in [Read 2010]. Cardinality measures the average number of labels associated with each instance, and density is defined as cardinality divided by the number of labels. Diversity represents the percentage of labelsets present in the dataset divided by the number of possible labelsets. The avgIR measures the average degree of imbalance of all labels, the greater avgIR, the greater the imbalance of the dataset. Finally, rDep measures the proportion of pairs of labels that are dependent at 99% confidence. A broader description of all the characterization metrics and the used partition methods are described in
Moviescope is a large-scale dataset of 5,000 movies with corresponding video trailers, posters, plots and metadata. Moviescope is based on the IMDB 5000 dataset consisting of 5.043 movie records. It is augmented by crawling video trailers associated with each movie from YouTube and text plots from Wikipedia.
3 PAPERS • NO BENCHMARKS YET