Long Video Retrieval (Background Removed)
6 papers with code • 1 benchmark • 1 dataset
Retrieve long videos given all of their subtitles.
Most implemented papers
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
In this work, we propose instead to learn such embeddings from video data with readily available natural language annotations in the form of automatically transcribed narrations.
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
Annotating videos is cumbersome, expensive and not scalable.
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream tasks.
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos
Multimodal self-supervised learning is getting more and more attention as it allows not only to train large networks without human supervision but also to search and retrieve data across various modalities.
TempCLR: Temporal Alignment Representation with Contrastive Learning
For long videos, given a paragraph of description where the sentences describe different segments of the video, by matching all sentence-clip pairs, the paragraph and the full video are aligned implicitly.
Multi-granularity Correspondence Learning from Long-term Noisy Videos
Existing video-language studies mainly focus on learning short video clips, leaving long-term temporal dependencies rarely explored due to over-high computational cost of modeling long videos.
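As a rough illustration of the task described at the top of this page, the sketch below ranks long videos against a set of subtitle sentences by matching every sentence-clip pair. It is not the method of any paper listed here: the max-then-mean aggregation, the function names (`cosine`, `paragraph_video_score`, `rank_videos`), and the toy 3-dimensional embeddings are all assumptions chosen for brevity. A real system would obtain the embeddings from a trained text-video model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def paragraph_video_score(sentences, clips):
    """Score a video against a paragraph of subtitle sentences.

    Assumption: each sentence is matched to its best clip (max),
    and the per-sentence maxima are averaged (mean).
    """
    if not sentences or not clips:
        return 0.0
    best_per_sentence = [max(cosine(s, c) for c in clips) for s in sentences]
    return sum(best_per_sentence) / len(best_per_sentence)

def rank_videos(sentences, videos):
    """Return video ids sorted from best to worst match."""
    scores = {vid: paragraph_video_score(sentences, clips)
              for vid, clips in videos.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: hypothetical embeddings, one vector per subtitle sentence or clip.
subtitles = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
videos = {
    "video_a": [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
    "video_b": [[0.0, 0.0, 1.0], [0.0, 0.1, 0.9]],
}
ranking = rank_videos(subtitles, videos)
print(ranking)
```

In this toy setup `video_a` ranks first because each subtitle sentence has a closely aligned clip in it. The aggregation ignores temporal order, so a method that needs sentences and clips to appear in the same order would require a sequence-alignment step in place of the max.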