The MSRVTT-MC (Multiple Choice) dataset is a video question-answering dataset built on the MSR-VTT dataset. It consists of 2,990 questions generated from 10,000 video clips with associated ground-truth captions. Each question has five candidate captions: the ground-truth caption and four randomly sampled negative choices. The task is to select the correct caption from the five candidates.
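The question format and scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset's official tooling: the function names `build_question` and `multiple_choice_accuracy`, the caption pool, and the per-candidate scores are all hypothetical, and the scores are assumed to come from some video-text model that rates how well each candidate caption matches the clip.

```python
import random


def build_question(gt_caption, caption_pool, num_negatives=4, rng=None):
    """Return (candidates, answer_index) for one multiple-choice question.

    Hypothetical helper: samples `num_negatives` captions other than the
    ground truth from `caption_pool` and shuffles them with the ground truth.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    negatives = rng.sample(
        [c for c in caption_pool if c != gt_caption], num_negatives
    )
    candidates = negatives + [gt_caption]
    rng.shuffle(candidates)
    return candidates, candidates.index(gt_caption)


def multiple_choice_accuracy(scores, answers):
    """Fraction of questions where the top-scoring candidate is the answer.

    `scores` holds one list of per-candidate scores per question (assumed
    model outputs); `answers` holds the index of the ground-truth caption.
    """
    correct = 0
    for row, answer in zip(scores, answers):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax
        correct += int(predicted == answer)
    return correct / len(answers)
```

With five candidates per question, a model that guesses uniformly at random would be expected to reach about 20% accuracy, which is a useful floor when reading reported results.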