no code implementations • 6 May 2024 • Bingshen Mu, Yangze Li, Qijie Shao, Kun Wei, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie
Accents are deviations from standard pronunciation norms, and the multi-task learning framework for simultaneous ASR and accent recognition (AR) has effectively addressed multi-accent scenarios, making it a prominent solution.
Automatic Speech Recognition (ASR)
no code implementations • 3 May 2024 • Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie
Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset.
Automatic Speech Recognition (ASR)
no code implementations • 24 Jan 2024 • Zhe Xu, Kun Wei, Xu Yang, Cheng Deng
Human dance generation (HDG) aims to synthesize realistic videos from images and sequences of driving poses.
no code implementations • 22 Oct 2023 • Kun Wei, Bei Li, Hang Lv, Quan Lu, Ning Jiang, Lei Xie
By introducing both cross-modal and conversational representations into the decoder, our model retains context over longer sentences without information loss, achieving relative accuracy improvements of 8.8% and 23% on Mandarin conversation datasets HKUST and MagicData-RAMC, respectively, compared to the standard Conformer model.
Automatic Speech Recognition (ASR)
no code implementations • 13 Apr 2023 • Hongchen Tan, BaoCai Yin, Kun Wei, Xiuping Liu, Xin Li
The ALR-GAN includes an Adaptive Layout Refinement (ALR) module and a Layout Visual Refinement (LVR) loss.
1 code implementation • 31 Oct 2022 • Kun Wei, Long Zhou, Ziqiang Zhang, Liping Chen, Shujie Liu, Lei He, Jinyu Li, Furu Wei
However, direct S2ST suffers from data scarcity because parallel corpora pairing source-language speech with target-language speech are very rare.
no code implementations • 21 Jul 2022 • Jintai Chen, Kuanlun Liao, Kun Wei, Haochao Ying, Danny Z. Chen, Jian Wu
The electrocardiogram (ECG) is a widely used non-invasive diagnostic tool for heart diseases.
no code implementations • 3 Jul 2022 • Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma
Then, during the training of the conversational ASR system, the extractor is frozen and used to extract the textual representation of the preceding speech, and this representation is fed as context to the ASR decoder through an attention mechanism.
Automatic Speech Recognition (ASR)
no code implementations • 2 Jul 2022 • Kun Wei, Pengcheng Guo, Ning Jiang
Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and have even shown superior performance over the conventional hybrid framework.
Automatic Speech Recognition (ASR)
1 code implementation • CVPR 2022 • Xiangyu Li, Xu Yang, Kun Wei, Cheng Deng, Muli Yang
Some methods recognize state and object with two separately trained classifiers, ignoring the impact of the interaction between object and state; other methods try to learn a joint representation of the state-object compositions, leading to a domain gap between seen and unseen composition sets.
no code implementations • 16 Feb 2022 • Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma
Conversational automatic speech recognition (ASR) is the task of recognizing conversational speech involving multiple speakers.
Automatic Speech Recognition (ASR)
1 code implementation • CVPR 2022 • Yanan Gu, Xu Yang, Kun Wei, Cheng Deng
Unfortunately, these methods only focus on selecting samples from the memory bank for replay and ignore the adequate exploration of semantic information in the single-pass data stream, leading to poor classification accuracy.
no code implementations • 29 Sep 2021 • Ziqi Zhang, Cheng Deng, Kun Wei, Xu Yang
On this basis, a novel attribute transfer method, named semantic directional decomposition network (SDD-Net), is proposed to achieve semantic-level facial attribute transfer by latent semantic direction decomposition, improving the interpretability and editability of our method.
1 code implementation • CVPR 2021 • Zhiyuan Dang, Cheng Deng, Xu Yang, Kun Wei, Heng Huang
Specifically, at the local level, we match the nearest neighbors based on batch embedded features; at the global level, we match neighbors from the overall embedded features.
1 code implementation • CVPR 2021 • Xu Yang, Cheng Deng, Zhiyuan Dang, Kun Wei, Junchi Yan
Specifically, the Identity Aggregation is applied to extract semantic features from labeled nodes, and the Semantic Alignment is utilized to align node features obtained from different aspects using the class central similarity.
1 code implementation • 31 Dec 2020 • Kun Wei, Cheng Deng, Xu Yang, Maosen Li
Unlike in traditional incremental classification networks, the semantic gap between the embedding spaces of two adjacent tasks is the main challenge for embedding networks under the incremental learning setting.
1 code implementation • NeurIPS 2020 • Xu Yang, Cheng Deng, Kun Wei, Junchi Yan, Wei Liu
Meanwhile, we devise an adversarial attack strategy to explore samples that easily fool the clustering layers but do not impact the performance of the deep embedding.
no code implementations • ICCV 2019 • Kun Wei, Muli Yang, Hao Wang, Cheng Deng, Xianglong Liu
Adversarial learning is employed to model the discrepancy and correlations among attributes and objects.