Long Video Retrieval (Background Removed)

6 papers with code • 1 benchmark • 1 dataset

Given all the subtitles of a long video, retrieve that video.

Benchmarks

These leaderboards are used to track progress in Long Video Retrieval (Background Removed).

Dataset	Best Model
YouCook2	Norton

Datasets

YouCook2

Most implemented papers

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips

antoine77340/MIL-NCE_HowTo100M • ICCV 2019

In this work, we propose instead to learn such embeddings from video data with readily available natural language annotations in the form of automatically transcribed narrations.

End-to-End Learning of Visual Representations from Uncurated Instructional Videos

antoine77340/MIL-NCE_HowTo100M • CVPR 2020

Annotating videos is cumbersome, expensive and not scalable.

VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding

pytorch/fairseq • EMNLP 2021

We present VideoCLIP, a contrastive approach to pre-train a unified model for zero-shot video and text understanding, without using any labels on downstream tasks.

Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos

brian7685/Multimodal-Clustering-Network • ICCV 2021

Multimodal self-supervised learning is getting more and more attention as it allows not only to train large networks without human supervision but also to search and retrieve data across various modalities.

TempCLR: Temporal Alignment Representation with Contrastive Learning

yyuncong/tempclr • 28 Dec 2022

For long videos, given a paragraph of description where the sentences describe different segments of the video, by matching all sentence-clip pairs, the paragraph and the full video are aligned implicitly.
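
To make the idea of matching sentence-clip pairs concrete, here is a minimal sketch of paragraph-to-video retrieval. It is not the TempCLR method: it swaps in a simple aggregation (for each sentence, take the best-matching clip, then average over sentences) purely for illustration. The function names, the toy 2-D embeddings, and the video ids are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def paragraph_video_score(sentences, clips):
    """Mean over sentences of the similarity to the best-matching clip.

    Assumption: this max-then-mean aggregation is an illustrative choice,
    not the scoring rule of any paper listed on this page.
    """
    if not sentences or not clips:
        return 0.0
    best_per_sentence = [max(cosine(s, c) for c in clips) for s in sentences]
    return sum(best_per_sentence) / len(sentences)


def rank_videos(sentences, videos):
    """Return video ids sorted from best to worst paragraph match."""
    return sorted(
        videos,
        key=lambda vid: paragraph_video_score(sentences, videos[vid]),
        reverse=True,
    )


# Toy embeddings (hypothetical values): two sentences, two candidate videos.
paragraph = [[1.0, 0.0], [0.0, 1.0]]
videos = {
    "video_a": [[0.9, 0.1], [0.1, 0.9]],  # one clip close to each sentence
    "video_b": [[1.0, 0.0], [1.0, 0.1]],  # both clips close to sentence 1 only
}
ranking = rank_videos(paragraph, videos)
```

In this toy case `video_a` ranks first because every sentence finds a close clip, whereas `video_b` matches only the first sentence well. A real system would use learned text and video encoders and would typically also account for temporal order.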

Multi-granularity Correspondence Learning from Long-term Noisy Videos

XLearning-SCU/2024-ICLR-Norton • 30 Jan 2024

Existing video-language studies mainly focus on learning short video clips, leaving long-term temporal dependencies rarely explored due to over-high computational cost of modeling long videos.
