Search Results for author: Wei Ping

Found 43 papers, 26 papers with code

X-VILA: Cross-Modality Alignment for Large Language Model

no code implementations • 29 May 2024 • Hanrong Ye, De-An Huang, Yao Lu, Zhiding Yu, Wei Ping, Andrew Tao, Jan Kautz, Song Han, Dan Xu, Pavlo Molchanov, Hongxu Yin

We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities.

Instruction Following Language Modelling +1

Paper
Add Code

NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models

no code implementations • 27 May 2024 • Chankyu Lee, Rajarshi Roy, Mengyao Xu, Jonathan Raiman, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping

In this work, we introduce the NV-Embed model with a variety of architectural designs and training procedures to significantly enhance the performance of LLM as a versatile embedding model, while maintaining its simplicity and reproducibility.

Language Modelling Large Language Model +3

Paper
Add Code

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

1 code implementation • 2 Feb 2024 • Zhifeng Kong, Arushi Goel, Rohan Badlani, Wei Ping, Rafael Valle, Bryan Catanzaro

Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs.

Ranked #1 on Retrieval-augmented Few-shot In-context Audio Captioning on AudioCaps (using extra training data)

Acoustic Scene Classification Few-Shot Learning +5

Paper
Code

ChatQA: Surpassing GPT-4 on Conversational QA and RAG

no code implementations • 18 Jan 2024 • Zihan Liu, Wei Ping, Rajarshi Roy, Peng Xu, Chankyu Lee, Mohammad Shoeybi, Bryan Catanzaro

In this work, we introduce ChatQA, a suite of models that outperform GPT-4 on retrieval-augmented generation (RAG) and conversational question answering (QA).

Conversational Question Answering Retrieval

Paper
Add Code

VILA: On Pre-training for Visual Language Models

3 code implementations • 12 Dec 2023 • Ji Lin, Hongxu Yin, Wei Ping, Yao Lu, Pavlo Molchanov, Andrew Tao, Huizi Mao, Jan Kautz, Mohammad Shoeybi, Song Han

Visual language models (VLMs) rapidly progressed with the recent success of large language models.

Ranked #31 on Visual Question Answering on MM-Vet

In-Context Learning Language Modelling +2

1,995

Paper
Code

InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining

1 code implementation • 11 Oct 2023 • Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, Bryan Catanzaro

After instruction tuning on Retro, InstructRetro demonstrates significant improvement over the instruction tuned GPT on a wide range of zero-shot tasks.

4k Decoder +4

9,015

Paper
Code

Retrieval meets Long Context Large Language Models

no code implementations • 4 Oct 2023 • Peng Xu, Wei Ping, Xianchao Wu, Lawrence McAfee, Chen Zhu, Zihan Liu, Sandeep Subramanian, Evelina Bakhturina, Mohammad Shoeybi, Bryan Catanzaro

Perhaps surprisingly, we find that LLM with 4K context window using simple retrieval-augmentation at generation can achieve comparable performance to finetuned LLM with 16K context window via positional interpolation on long context tasks, while taking much less computation.

16k 4k +4

Paper
Add Code

CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram

no code implementations • 12 Sep 2023 • Zhifeng Kong, Wei Ping, Ambrish Dantrey, Bryan Catanzaro

In this work, we present CleanUNet 2, a speech denoising model that combines the advantages of waveform denoiser and spectrogram denoiser and achieves the best of both worlds.

Ranked #3 on Speech Enhancement on Deep Noise Suppression (DNS) Challenge

Denoising Speech Denoising +2

Paper
Add Code

RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models

1 code implementation • 15 Aug 2023 • Jie Huang, Wei Ping, Peng Xu, Mohammad Shoeybi, Kevin Chen-Chuan Chang, Bryan Catanzaro

In this paper, we investigate the in-context learning ability of retrieval-augmented encoder-decoder language models.

Decoder In-Context Learning +4

Paper
Code

Defending against Insertion-based Textual Backdoor Attacks via Attribution

1 code implementation • 3 May 2023 • Jiazhao Li, Zhuofeng Wu, Wei Ping, Chaowei Xiao, V. G. Vinod Vydiswaran

Textual backdoor attack, as a novel attack model, has been shown to be effective in adding a backdoor to the model during training.

Backdoor Attack Language Modelling

Paper
Code

Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study

1 code implementation • 13 Apr 2023 • Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar, Bryan Catanzaro

Thus, it is still an open question: shall we pretrain large autoregressive LMs with retrieval?

Decoder Open-Ended Question Answering +2

9,015

Paper
Code

Defending against Adversarial Audio via Diffusion Model

1 code implementation • 2 Mar 2023 • Shutong Wu, Jiongxiao Wang, Wei Ping, Weili Nie, Chaowei Xiao

In this paper, we propose an adversarial purification-based defense pipeline, AudioPure, for acoustic systems via off-the-shelf diffusion models.

Adversarial Purification

Paper
Code

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

no code implementations • 9 Feb 2023 • Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar

Augmenting pretrained language models (LMs) with a vision encoder (e. g., Flamingo) has obtained the state-of-the-art results in image-to-text generation.

Few-Shot Learning Image Captioning +3

Paper
Add Code

FlowGrad: Controlling the Output of Generative ODEs With Gradients

no code implementations • CVPR 2023 • Xingchao Liu, Lemeng Wu, Shujian Zhang, Chengyue Gong, Wei Ping, Qiang Liu

To further accelerate the computation of the back-propagation, we propose to use a non-uniform discretization to approximate the ODE trajectory, where we measure how straight the trajectory is and gather the straight parts into one discretization step.

Image Manipulation

Paper
Add Code

Evaluating Parameter Efficient Learning for Generation

no code implementations • 25 Oct 2022 • Peng Xu, Mostofa Patwary, Shrimai Prabhumoye, Virginia Adams, Ryan J. Prenger, Wei Ping, Nayeon Lee, Mohammad Shoeybi, Bryan Catanzaro

For cross-domain and cross-dataset cases, we show that (a) Adapter (Houlsby et al., 2019) performs the best amongst all the PERMs studied here, and (b) it outperforms finetuning if the task dataset is below a certain size.

Paper
Add Code

BigVGAN: A Universal Neural Vocoder with Large-Scale Training

3 code implementations • 9 Jun 2022 • Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon

Despite recent progress in generative adversarial network (GAN)-based vocoders, where the model generates raw waveform conditioned on acoustic features, it is challenging to synthesize high-fidelity audio for numerous speakers across various recording environments.

Ranked #5 on Speech Synthesis on LibriTTS

Audio Generation Audio Synthesis +4

1,110

Paper
Code

Factuality Enhanced Language Models for Open-Ended Text Generation

5 code implementations • 9 Jun 2022 • Nayeon Lee, Wei Ping, Peng Xu, Mostofa Patwary, Pascale Fung, Mohammad Shoeybi, Bryan Catanzaro

In this work, we measure and improve the factual accuracy of large-scale LMs for open-ended text generation.

Misconceptions Sentence +2

5,577

Paper
Code

Multi-Stage Prompting for Knowledgeable Dialogue Generation

1 code implementation • Findings (ACL) 2022 • Zihan Liu, Mostofa Patwary, Ryan Prenger, Shrimai Prabhumoye, Wei Ping, Mohammad Shoeybi, Bryan Catanzaro

We propose a multi-stage prompting approach to generate knowledgeable responses from a single pretrained LM.

Dialogue Generation Language Modelling +1

9,015

Paper
Code

Speech Denoising in the Waveform Domain with Self-Attention

1 code implementation • 15 Feb 2022 • Zhifeng Kong, Wei Ping, Ambrish Dantrey, Bryan Catanzaro

In this work, we present CleanUNet, a causal speech denoising model on the raw waveform.

Ranked #6 on Speech Enhancement on Deep Noise Suppression (DNS) Challenge

Decoder Denoising +2

270

Paper
Code

Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

1 code implementation • 8 Feb 2022 • Boxin Wang, Wei Ping, Chaowei Xiao, Peng Xu, Mostofa Patwary, Mohammad Shoeybi, Bo Li, Anima Anandkumar, Bryan Catanzaro

In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models.

9,015

Paper
Code

One TTS Alignment To Rule Them All

3 code implementations • 23 Aug 2021 • Rohan Badlani, Adrian Łancucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro

However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words.

Speech Synthesis

30,615

Paper
Code

Long-Short Transformer: Efficient Transformers for Language and Vision

3 code implementations • NeurIPS 2021 • Chen Zhu, Wei Ping, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro

For instance, Transformer-LS achieves 0. 97 test BPC on enwik8 using half the number of parameters than previous method, while being faster and is able to handle 3x as long sequences compared to its full-attention version on the same hardware.

Ranked #1 on Language Modelling on enwik8 dev

Language Modelling

314

Paper
Code

RAD-TTS: Parallel Flow-Based TTS with Robust Alignment Learning and Diverse Synthesis

1 code implementation • ICML Workshop INNF 2021 • Kevin J. Shih, Rafael Valle, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro

This work introduces a predominantly parallel, end-to-end TTS model based on normalizing flows.

Speech Synthesis Text-To-Speech Synthesis

274

Paper
Code

On Fast Sampling of Diffusion Probabilistic Models

1 code implementation • ICML Workshop INNF 2021 • Zhifeng Kong, Wei Ping

In this work, we propose FastDPM, a unified framework for fast sampling in diffusion probabilistic models.

Paper
Code

End-to-End Training of Neural Retrievers for Open-Domain Question Answering

2 code implementations • ACL 2021 • Devendra Singh Sachan, Mostofa Patwary, Mohammad Shoeybi, Neel Kant, Wei Ping, William L Hamilton, Bryan Catanzaro

We also explore two approaches for end-to-end supervised training of the reader and retriever components in OpenQA models.

Natural Questions Open-Domain Question Answering +3

9,015

Paper
Code

Local Knowledge Powered Conversational Agents

1 code implementation • 20 Oct 2020 • Sashank Santhanam, Wei Ping, Raul Puri, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro

State-of-the-art conversational agents have advanced significantly in conjunction with the use of large transformer-based language models.

Informativeness

9,015

Paper
Code

DiffWave: A Versatile Diffusion Model for Audio Synthesis

11 code implementations • ICLR 2021 • Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro

In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation.

Ranked #2 on Speech Synthesis on LJSpeech

Audio Synthesis Speech Synthesis

728

Paper
Code

Parallel Neural Text-to-Speech

no code implementations • ICLR 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

In this work, we first propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.

Paper
Add Code

WaveFlow: A Compact Flow-based Model for Raw Audio

4 code implementations • ICML 2020 • Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song

WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases.

Ranked #9 on Speech Synthesis on LibriTTS

Speech Synthesis

599

Paper
Code

Multi-Speaker End-to-End Speech Synthesis

no code implementations • 9 Jul 2019 • Jihyun Park, Kexin Zhao, Kainan Peng, Wei Ping

In this work, we extend ClariNet (Ping et al., 2019), a fully end-to-end speech synthesis model (i. e., text-to-wave), to generate high-fidelity speech from multiple speakers.

Speech Synthesis

Paper
Add Code

Non-Autoregressive Neural Text-to-Speech

2 code implementations • ICML 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.

Text-To-Speech Synthesis

120

Paper
Code

Large Margin Neural Language Model

no code implementations • EMNLP 2018 • Jiaji Huang, Yi Li, Wei Ping, Liang Huang

We propose a large margin criterion for training neural language models.

Language Modelling Machine Translation +3

Paper
Add Code

ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech

5 code implementations • ICLR 2019 • Wei Ping, Kainan Peng, Jitong Chen

In this work, we propose a new solution for parallel wave generation by WaveNet.

Speech Synthesis

288

Paper
Code

Cancer Metastasis Detection With Neural Conditional Random Field

1 code implementation • 19 Jun 2018 • Yi Li, Wei Ping

Compared to the baseline method without considering spatial correlations, we show that the proposed NCRF framework obtains probability maps of patch predictions with better visual quality.

Cancer Metastasis Detection whole slide images

749

Paper
Code

Neural Voice Cloning with a Few Samples

2 code implementations • NeurIPS 2018 • Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou

Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples.

Speech Synthesis Voice Cloning

511

Paper
Code

HybridNet: A Hybrid Neural Architecture to Speed-up Autoregressive Models

no code implementations • ICLR 2018 • Yanqi Zhou, Wei Ping, Sercan Arik, Kainan Peng, Greg Diamos

This paper introduces HybridNet, a hybrid neural network to speed-up autoregressive models for raw audio waveform generation.

Speech Synthesis

Paper
Add Code

Topic Compositional Neural Language Model

no code implementations • 28 Dec 2017 • Wenlin Wang, Zhe Gan, Wenqi Wang, Dinghan Shen, Jiaji Huang, Wei Ping, Sanjeev Satheesh, Lawrence Carin

The TCNLM learns the global semantic coherence of a document via a neural topic model, and the probability of each learned latent topic is further used to build a Mixture-of-Experts (MoE) language model, where each expert (corresponding to one topic) is a recurrent neural network (RNN) that accounts for learning the local structure of a word sequence.

Language Modelling

Paper
Add Code

Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

7 code implementations • ICLR 2018 • Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller

We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system.

Speech Synthesis

1,949

Paper
Code

Learning Infinite RBMs with Frank-Wolfe

no code implementations • NeurIPS 2016 • Wei Ping, Qiang Liu, Alexander Ihler

In this work, we propose an infinite restricted Boltzmann machine~(RBM), whose maximum likelihood estimation~(MLE) corresponds to a constrained convex optimization.

Paper
Add Code

Deep Voice 2: Multi-Speaker Neural Text-to-Speech

1 code implementation • NeurIPS 2017 • Sercan Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou

We introduce Deep Voice 2, which is based on a similar pipeline with Deep Voice 1, but constructed with higher performance building blocks and demonstrates a significant audio quality improvement over Deep Voice 1.

Speech Synthesis

236

Paper
Code

Belief Propagation in Conditional RBMs for Structured Prediction

no code implementations • 2 Mar 2017 • Wei Ping, Alexander Ihler

We demonstrate that, in both maximum likelihood and max-margin learning, training conditional RBMs with BP as the inference routine can provide significantly better results than current state-of-the-art CD methods on structured prediction problems.

Structured Prediction

Paper
Add Code

Decomposition Bounds for Marginal MAP

no code implementations • NeurIPS 2015 • Wei Ping, Qiang Liu, Alexander Ihler

Marginal MAP inference involves making MAP predictions in systems defined with latent variables or missing information.

Paper
Add Code

Marginal Structured SVM with Hidden Variables

no code implementations • 4 Sep 2014 • Wei Ping, Qiang Liu, Alexander Ihler

In this work, we propose the marginal structured SVM (MSSVM) for structured prediction with hidden variables.

Structured Prediction

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.