Video-Adverb Retrieval (Unseen Compositions)
2 papers with code • 3 benchmarks • 3 datasets
The task is to recognize adverbs in unseen adverb-action compositions, i.e. adverb-action pairs that did not appear during training.
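As a rough illustration of what "unseen composition" means here, the sketch below filters a test set down to the (action, adverb) pairs that never occur in training. The function name `unseen_compositions` and the toy pairs are hypothetical and are not taken from any benchmark's actual split files.

```python
def unseen_compositions(train_pairs, test_pairs):
    """Keep only the test (action, adverb) pairs that never occur in training."""
    seen = set(train_pairs)
    return [pair for pair in test_pairs if pair not in seen]


# Toy, made-up data: every action and adverb appears in training,
# but some of their combinations do not.
train = [("fold", "firmly"), ("fold", "gently"), ("pour", "slowly")]
test = [("pour", "gently"), ("fold", "firmly"), ("pour", "firmly")]

unseen = unseen_compositions(train, test)
print(unseen)
```

In this toy case ("fold", "firmly") is dropped because it was seen in training, while the two "pour" pairs remain even though "pour", "gently", and "firmly" each appear individually in the training set.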
Most implemented papers
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs
We aim to understand how actions are performed and identify subtle differences, such as 'fold firmly' vs. 'fold gently'.
Video-adverb retrieval with compositional adverb-action embeddings
We propose a framework for video-to-adverb retrieval (and vice versa) that aligns video embeddings with their matching compositional adverb-action text embedding in a joint embedding space.
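Retrieval in a joint embedding space can be sketched as ranking candidate adverb-action text embeddings by their similarity to a video embedding. The example below is a minimal sketch under stated assumptions: it uses cosine similarity and hand-written toy vectors, and the names `cosine` and `rank_adverbs` are hypothetical. It does not reproduce the paper's encoders, similarity function, or training objective.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_adverbs(video_emb, text_embs):
    """Return adverbs sorted from most to least similar to the video embedding."""
    scores = {adverb: cosine(video_emb, emb) for adverb, emb in text_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (assumed values): one video of a "fold" action and two
# candidate compositions for that action, "fold firmly" and "fold gently".
video = [0.9, 0.1, 0.2]
compositions = {
    "firmly": [1.0, 0.0, 0.1],
    "gently": [0.0, 1.0, 0.1],
}

ranking = rank_adverbs(video, compositions)
print(ranking)
```

Adverb-to-video retrieval works the same way with the roles swapped: fix one text embedding and rank a set of video embeddings against it.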