Search Results for author: LiRong Dai

Found 13 papers, 4 papers with code

The USTC-NELSLIP Offline Speech Translation Systems for IWSLT 2022

no code implementations • IWSLT (ACL) 2022 • Weitai Zhang, Zhongyi Ye, Haitao Tang, Xiaoxi Li, Xinyuan Zhou, Jing Yang, Jianwei Cui, Dan Liu, Junhua Liu, LiRong Dai

This paper describes USTC-NELSLIP’s submissions to the IWSLT 2022 Offline Speech Translation task, including speech translation of talks from English to German, English to Chinese and English to Japanese.

Translation

Paper
Add Code

Adversarial speech for voice privacy protection from Personalized Speech generation

no code implementations • 22 Jan 2024 • Shihao Chen, Liping Chen, Jie Zhang, KongAik Lee, ZhenHua Ling, LiRong Dai

For validation, we employ the open-source pre-trained YourTTS model for speech generation and protect the target speaker's speech in the white-box scenario.

Speaker Verification Voice Conversion

Paper
Add Code

Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

1 code implementation • 7 Jan 2024 • Qiushi Zhu, Jie Zhang, Yu Gu, Yuchen Hu, LiRong Dai

Considering that visual information helps to improve speech recognition performance in noisy scenes, in this work we propose a multichannel multi-modal speech self-supervised learning framework AV-wav2vec2, which utilizes video and multichannel audio data as inputs.

Audio-Visual Speech Recognition Automatic Speech Recognition +7

Paper
Code

Rep2wav: Noise Robust text-to-speech Using self-supervised representations

no code implementations • 28 Aug 2023 • Qiushi Zhu, Yu Gu, Rilin Chen, Chao Weng, Yuchen Hu, LiRong Dai, Jie Zhang

Noise-robust TTS models are often trained using the enhanced speech, which thus suffer from speech distortion and background noise that affect the quality of the synthesized speech.

Speech Enhancement

Paper
Add Code

Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation

no code implementations • ICCV 2023 • Shan He, Haonan He, Shuo Yang, Xiaoyan Wu, Pengcheng Xia, Bing Yin, Cong Liu, LiRong Dai, Chang Xu

Besides, we also verify that the proposed framework is able to explicitly control the emotion of the animated talking face.

Contrastive Learning

Paper
Add Code

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

no code implementations • 21 Nov 2022 • Qiushi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, LiRong Dai, Daxin Jiang, Jinyu Li, Furu Wei

Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e. g., vision, text.

Audio-Visual Speech Recognition Language Modelling +3

Paper
Add Code

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

1 code implementation • 7 Oct 2022 • Ziqiang Zhang, Long Zhou, Junyi Ao, Shujie Liu, LiRong Dai, Jinyu Li, Furu Wei

The rapid development of single-modal pre-training has prompted researchers to pay more attention to cross-modal pre-training methods.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

1,071

Paper
Code

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

1 code implementation • 30 Sep 2022 • Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, LiRong Dai, Jinyu Li, Furu Wei

In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation.

Language Modelling speech-recognition +1

1,071

Paper
Code

Vision-Language Adaptive Mutual Decoder for OOV-STR

no code implementations • 2 Sep 2022 • Jinshui Hu, Chenyu Liu, Qiandong Yan, Xuyang Zhu, Jiajia Wu, Jun Du, LiRong Dai

However, in real-world scenarios, out-of-vocabulary (OOV) words are of great importance and SOTA recognition models usually perform poorly on OOV settings.

Decoder Language Modelling +2

Paper
Add Code

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

1 code implementation • 31 Mar 2022 • Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, LiRong Dai, Jinyu Li, Yao Qian, Furu Wei

In this way, the decoder learns to reconstruct original speech information with codes before learning to generate correct text.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

1,071

Paper
Code

Speech-MLP: a simple MLP architecture for speech processing

no code implementations • 29 Sep 2021 • Chao Xing, Dong Wang, LiRong Dai, Qun Liu, Anderson Avila

Overparameterized transformer-based architectures have shown remarkable performance in recent years, achieving state-of-the-art results in speech processing tasks such as speech recognition, speech synthesis, keyword spotting, and speech enhancement et al.

Keyword Spotting Speech Enhancement +3