LRW (Lip Reading in the Wild)

Introduced by Joon Son Chung et al. in Lip Reading in the Wild

The Lip Reading in the Wild (LRW) dataset is a large-scale audio-visual database containing 500 different words spoken by over 1,000 speakers. Each utterance is a 29-frame clip centered on the target word. The database is divided into training, validation, and test sets. The training set contains at least 800 utterances per class, while the validation and test sets each contain 50 utterances per class.
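As a rough illustration of the scale these figures imply, the sketch below works out lower-bound split sizes and the clip geometry. It reads the validation and test counts as 50 utterances per class, and it assumes a frame rate of 25 frames per second, which the description above does not state. All names are hypothetical and are not part of any official LRW tooling.

```python
# Figures taken from the dataset description.
NUM_CLASSES = 500            # distinct target words
FRAMES_PER_UTTERANCE = 29    # frames in each clip
MIN_TRAIN_PER_CLASS = 800    # lower bound on training utterances per word
VAL_PER_CLASS = 50           # read here as a per-class count
TEST_PER_CLASS = 50          # read here as a per-class count

# Assumption (not stated in the description): 25 frames per second.
ASSUMED_FPS = 25


def split_sizes():
    """Return utterance counts per split implied by the per-class figures."""
    return {
        "train_min": NUM_CLASSES * MIN_TRAIN_PER_CLASS,  # lower bound only
        "val": NUM_CLASSES * VAL_PER_CLASS,
        "test": NUM_CLASSES * TEST_PER_CLASS,
    }


def clip_duration_seconds(fps=ASSUMED_FPS):
    """Length of one clip in seconds under the assumed frame rate."""
    return FRAMES_PER_UTTERANCE / fps


def centre_frame_index():
    """Zero-based index of the middle frame, around which the word is centered."""
    return FRAMES_PER_UTTERANCE // 2


if __name__ == "__main__":
    print(split_sizes())
    print(clip_duration_seconds())
    print(centre_frame_index())
```

Because the clip has an odd number of frames, the middle frame has the same number of frames on either side, which is consistent with the target word being centered in the clip.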

Source: Towards Pose-invariant Lip-Reading

Benchmarks

Task	Dataset Variant	Best Model
Lipreading	Lip Reading in the Wild	3D Conv + ResNet-18 + DC-TCN + KD
Audio-Visual Speech Recognition	LRW	AVCRFormer
Unconstrained Lip-synchronization	LRW	Wav2Lip + GAN
Visual Keyword Spotting	LRW	Transpotter
Lip Reading	LRW	Lip2Wav
Lip to Speech Synthesis	LRW	Lip2Wav
Talking Face Generation	LRW	LipGAN