Search Results for author: Kun Wei

Found 18 papers, 7 papers with code

MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition

no code implementations • 6 May 2024 • Bingshen Mu, Yangze Li, Qijie Shao, Kun Wei, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

Accents represent deviations from standard pronunciation norms, and the multi-task learning framework for simultaneous ASR and accent recognition (AR) has effectively addressed the multi-accent scenarios, making it a prominent solution.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

no code implementations • 3 May 2024 • Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie

Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons

no code implementations • 24 Jan 2024 • Zhe Xu, Kun Wei, Xu Yang, Cheng Deng

Human dance generation (HDG) aims to synthesize realistic videos from images and sequences of driving poses.

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

no code implementations • 22 Oct 2023 • Kun Wei, Bei Li, Hang Lv, Quan Lu, Ning Jiang, Lei Xie

By introducing both cross-modal and conversational representations into the decoder, our model retains context over longer sentences without information loss, achieving relative accuracy improvements of 8.8% and 23% on Mandarin conversation datasets HKUST and MagicData-RAMC, respectively, compared to the standard Conformer model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

ALR-GAN: Adaptive Layout Refinement for Text-to-Image Synthesis

no code implementations • 13 Apr 2023 • Hongchen Tan, BaoCai Yin, Kun Wei, Xiuping Liu, Xin Li

The ALR-GAN includes an Adaptive Layout Refinement (ALR) module and a Layout Visual Refinement (LVR) loss.

Generative Adversarial Network Text-to-Image Generation

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation

1 code implementation • 31 Oct 2022 • Kun Wei, Long Zhou, Ziqiang Zhang, Liping Chen, Shujie Liu, Lei He, Jinyu Li, Furu Wei

However, direct S2ST suffers from data scarcity because corpora pairing source-language speech with target-language speech are very rare.

Speech-to-Speech Translation Translation

1,071

ME-GAN: Learning Panoptic Electrocardio Representations for Multi-view ECG Synthesis Conditioned on Heart Diseases

no code implementations • 21 Jul 2022 • Jintai Chen, Kuanlun Liao, Kun Wei, Haochao Ying, Danny Z. Chen, Jian Wu

Electrocardiogram (ECG) is a widely used non-invasive diagnostic tool for heart diseases.

Generative Adversarial Network

Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR

no code implementations • 3 Jul 2022 • Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma

Then, during the training of the conversational ASR system, the extractor is frozen and used to extract the textual representation of the preceding speech, and this representation is fed as context to the ASR decoder through an attention mechanism.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

no code implementations • 2 Jul 2022 • Kun Wei, Pengcheng Guo, Ning Jiang

Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and even shown superior performance over the conventional hybrid framework.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning

1 code implementation • CVPR 2022 • Xiangyu Li, Xu Yang, Kun Wei, Cheng Deng, Muli Yang

Some methods recognize state and object with two separately trained classifiers, ignoring the interaction between object and state; other methods try to learn a joint representation of the state-object compositions, which leads to a domain gap between seen and unseen composition sets.

Compositional Zero-Shot Learning Object

Conversational Speech Recognition By Learning Conversation-level Characteristics

no code implementations • 16 Feb 2022 • Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma

Conversational automatic speech recognition (ASR) is the task of recognizing conversational speech involving multiple speakers.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Not Just Selection, but Exploration: Online Class-Incremental Continual Learning via Dual View Consistency

1 code implementation • CVPR 2022 • Yanan Gu, Xu Yang, Kun Wei, Cheng Deng

Unfortunately, these methods only focus on selecting samples from the memory bank for replay and ignore the adequate exploration of semantic information in the single-pass data stream, leading to poor classification accuracy.

Continual Learning

Text-Driven Image Manipulation via Semantic-Aware Knowledge Transfer

no code implementations • 29 Sep 2021 • Ziqi Zhang, Cheng Deng, Kun Wei, Xu Yang

On this basis, a novel attribute transfer method, named semantic directional decomposition network (SDD-Net), is proposed to achieve semantic-level facial attribute transfer by latent semantic direction decomposition, improving the interpretability and editability of our method.

Attribute Image Manipulation +1

Nearest Neighbor Matching for Deep Clustering

1 code implementation • CVPR 2021 • Zhiyuan Dang, Cheng Deng, Xu Yang, Kun Wei, Heng Huang

Specifically, at the local level, we match the nearest neighbors based on batch embedded features; at the global level, we match neighbors from the overall embedded features.

Clustering Deep Clustering

SelfSAGCN: Self-Supervised Semantic Alignment for Graph Convolution Network

1 code implementation • CVPR 2021 • Xu Yang, Cheng Deng, Zhiyuan Dang, Kun Wei, Junchi Yan

Specifically, Identity Aggregation is applied to extract semantic features from labeled nodes, and Semantic Alignment is used to align node features obtained from different aspects using the class central similarity.

Representation Learning

Incremental Embedding Learning via Zero-Shot Translation

1 code implementation • 31 Dec 2020 • Kun Wei, Cheng Deng, Xu Yang, Maosen Li

Unlike in traditional incremental classification networks, the main challenge for embedding networks in the incremental learning setting is the semantic gap between the embedding spaces of two adjacent tasks.

Face Recognition Image Retrieval +4

Adversarial Learning for Robust Deep Clustering

1 code implementation • NeurIPS 2020 • Xu Yang, Cheng Deng, Kun Wei, Junchi Yan, Wei Liu

Meanwhile, we devise an adversarial attack strategy to explore samples that easily fool the clustering layers but do not impact the performance of the deep embedding.

Adversarial Attack Clustering +1

Adversarial Fine-Grained Composition Learning for Unseen Attribute-Object Recognition

no code implementations • ICCV 2019 • Kun Wei, Muli Yang, Hao Wang, Cheng Deng, Xianglong Liu

Adversarial learning is employed to model the discrepancy and correlations among attributes and objects.

Attribute Object +1
