wav2vec-U is an unsupervised method for training speech recognition models without any labeled data. It leverages self-supervised speech representations to segment unlabeled audio and learn a mapping from these representations to phonemes via adversarial training.
Specifically, we learn self-supervised representations with wav2vec 2.0 on unlabeled speech audio, then identify clusters in the representations with k-means to segment the audio. Next, we build segment representations by mean-pooling the wav2vec 2.0 representations within each segment, applying PCA, and performing a second mean-pooling step between adjacent segments. The result is input to the generator, which outputs a phoneme sequence; this sequence is fed to the discriminator alongside phonemized unlabeled text to perform adversarial training.
Source: Unsupervised Speech Recognition
| Task | Papers | Share |
|---|---|---|
| Speech Recognition | 5 | 38.46% |
| Unsupervised Speech Recognition | 4 | 30.77% |
| Automatic Speech Recognition (ASR) | 3 | 23.08% |
| Language Modelling | 1 | 7.69% |