no code implementations • IWSLT (EMNLP) 2018 • Yuguang Wang, Liangliang Shi, Linyu Wei, Weifeng Zhu, Jinkun Chen, Zhichao Wang, Shixue Wen, Wei Chen, Yanfeng Wang, Jia Jia
Our final average result on speech translation is 31.02 BLEU.
Automatic Speech Recognition (ASR) +5
1 code implementation • 20 Mar 2024 • Zixuan Wang, Jia Jia, Shikun Sun, Haozhe Wu, Rong Han, Zhenyu Li, Di Tang, Jiaqing Zhou, Jiebo Luo
However, camera movement synthesis with music and dance remains an unsolved challenging problem due to the scarcity of paired data.
no code implementations • 17 Jan 2024 • Jia Jia, Geunho Lee, Zhibo Wang, Lyu Zhi, Yuchu He
This network combines the Siam-U2Net Feature Differential Encoder (SU-FDE) and the denoising diffusion implicit model to improve the accuracy of image edge change detection and enhance the model's robustness under environmental changes.
no code implementations • 28 Dec 2023 • Houlun Chen, Xin Wang, Hong Chen, Zihan Song, Jia Jia, Wenwu Zhu
To tackle these challenges, in this work we propose a Grounding-Prompter method, which is capable of conducting TSG in long videos through prompting LLM with multimodal information.
no code implementations • 21 Sep 2023 • Xianhao Wei, Jia Jia, Xiang Li, Zhiyong Wu, Ziyi Wang
More interestingly, although our goal is the synthesis quality of the style transfer model, speech synthesized by the proposed text prosodic analysis model even outperforms style transfer from the original speech on some user evaluation metrics.
1 code implementation • 11 Aug 2023 • Haoyu Wang, Haozhe Wu, Junliang Xing, Jia Jia
Creating realistic 3D facial animation is crucial for various applications in the movie production and gaming industry, especially with the burgeoning demand in the metaverse.
1 code implementation • 11 Aug 2023 • Zijie Ye, Jia Jia, Junliang Xing
Human hands, the primary means of non-verbal communication, convey intricate semantics in various scenarios.
1 code implementation • 10 Aug 2023 • Haozhe Wu, Songtao Zhou, Jia Jia, Junliang Xing, Qi Wen, Xiang Wen
This paper emphasizes the importance of considering both the composite and regional natures of facial movements in speech-driven 3D face animation.
no code implementations • 8 Aug 2023 • Jianye Ji, Jiayan Pei, Shaochuan Lin, Taotao Zhou, Hengxu He, Jia Jia, Ning Hu
Group recommendation provides personalized recommendations to a group of users based on their shared interests, preferences, and characteristics.
no code implementations • 8 Aug 2023 • Shaochuan Lin, Jiayan Pei, Taotao Zhou, Hengxu He, Jia Jia, Ning Hu
Online Food Recommendation Service (OFRS) has remarkable spatiotemporal characteristics and the advantage of being able to conveniently satisfy users' needs in a timely manner.
no code implementations • 7 Aug 2023 • Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jie Zhang, Jia Jia, Ning Hu
To address the pagination trigger problem, we propose a completely new module in the recommender-system pipeline, named Mobile Supply.
no code implementations • 4 Aug 2023 • Shikun Sun, Longhui Wei, Junliang Xing, Jia Jia, Qi Tian
Recent score-based diffusion models (SBDMs) show promising results in unpaired image-to-image translation (I2I).
no code implementations • 18 Jul 2023 • Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jicong Fan, Jie Zhang, Jia Jia, Ning Hu, Xingyu Chen, Xuguang Lan
We propose a novel Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint (ESMC) and two alternatives: Entire Space Multi-Task Model with Siamese Network (ESMS) and Entire Space Multi-Task Model in Global Domain (ESMG) to address the PSC issue.
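The entire-space family of models (ESMC and its alternatives build on this idea) factorizes the post-view click-and-conversion probability over the full impression space, so the conversion-rate tower is never trained only on the biased clicked subset. A minimal sketch of that decomposition, with hypothetical linear "towers" and random features standing in for the real networks and data:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))        # 4 impressions, 8 features (illustrative)
w_ctr = rng.normal(size=8)         # hypothetical CTR tower weights
w_cvr = rng.normal(size=8)         # hypothetical CVR tower weights

p_ctr = sigmoid(x @ w_ctr)         # P(click | impression)
p_cvr = sigmoid(x @ w_cvr)         # P(conversion | click)
p_ctcvr = p_ctr * p_cvr            # P(click & conversion | impression)

# Entire-space training supervises p_ctr and p_ctcvr over ALL impressions;
# p_cvr is learned implicitly, avoiding sample-selection bias from
# training only on clicked items.
print(p_ctcvr)
```

The decomposition guarantees `p_ctcvr <= p_ctr` elementwise, which is the probabilistic constraint the parameter-constraint variants exploit.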
no code implementations • 13 Jul 2023 • Shuo Huang, Zongxin Yang, Liangting Li, Yi Yang, Jia Jia
Large-scale pre-trained vision-language models allow for the zero-shot text-based generation of 3D avatars.
no code implementations • 10 Jun 2023 • Shuo Huang, Jia Jia, Zongxin Yang, Wei Wang, Haozhe Wu, Yi Yang, Junliang Xing
However, motion interpolation is a more complex problem that takes isolated poses (e.g., only one start pose and one end pose) as input.
1 code implementation • 17 Mar 2023 • Haozhe Wu, Jia Jia, Junliang Xing, Hongwei Xu, Xiangyuan Wang, Jelo Wang
Building upon MMFace4D, we construct a non-autoregressive framework for audio-driven 3D face animation.
no code implementations • 22 Nov 2022 • Boya Du, Shaochuan Lin, Jiong Gao, Xiyu Ji, Mengya Wang, Taotao Zhou, Hengxu He, Jia Jia, Ning Hu
Therefore, we address this challenge by proposing a Bottom-up Adaptive Spatiotemporal Model (BASM) to adaptively fit the spatiotemporal data distribution, which further improves the fitting capability of the model.
no code implementations • 20 Sep 2022 • Shaochuan Lin, Yicong Yu, Xiyu Ji, Taotao Zhou, Hengxu He, Zisen Sang, Jia Jia, Guodong Cao, Ning Hu
In Location-Based Services (LBS), user behavior naturally has a strong dependence on spatiotemporal information, i.e., in different geographical locations and at different times, user click behavior changes significantly.
no code implementations • 10 Aug 2022 • Xiang Li, Changhe Song, Xianhao Wei, Zhiyong Wu, Jia Jia, Helen Meng
This paper aims to introduce a chunk-wise multi-scale cross-speaker style model to capture both the global genre and the local prosody in audiobook speeches.
1 code implementation • 30 Oct 2021 • Haozhe Wu, Jia Jia, Haoyu Wang, Yishun Dou, Chao Duan, Qingshan Deng
Due to such huge differences between styles, it is necessary to incorporate talking style into the audio-driven talking face synthesis framework.
no code implementations • 8 Apr 2021 • Xiang Li, Changhe Song, Jingbei Li, Zhiyong Wu, Jia Jia, Helen Meng
This paper introduces a multi-scale speech style modeling method for end-to-end expressive speech synthesis.
no code implementations • 16 Sep 2020 • Zijie Ye, Haozhe Wu, Jia Jia, Yaohua Bu, Wei Chen, Fanbo Meng, Yan-Feng Wang
Meanwhile, human choreographers design dance motions from music in a two-stage manner: they first devise multiple choreographic action units (CAUs), each with a series of dance motions, and then arrange the CAU sequence according to the rhythm, melody and emotion of the music.
no code implementations • 12 Sep 2020 • Yaohua Bu, Weijun Li, Tianyi Ma, Shengqi Chen, Jia Jia, Kun Li, Xiaobo Lu
To provide more discriminative feedback for the second language (L2) learners to better identify their mispronunciation, we propose a method for exaggerated visual-speech feedback in computer-assisted pronunciation training (CAPT).
no code implementations • 20 Jun 2020 • Huirong Huang, Zhiyong Wu, Shiyin Kang, Dongyang Dai, Jia Jia, Tianxiao Fu, Deyi Tuo, Guangzhi Lei, Peng Liu, Dan Su, Dong Yu, Helen Meng
Recent approaches mainly have the following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unreliable; 2) there is no convincing method to support multilingual or mixlingual speech as input.
1 code implementation • 17 Nov 2019 • Haozhe Wu, Zhiyuan Hu, Jia Jia, Yaohua Bu, Xiangnan He, Tat-Seng Chua
Next, we divide user attributes into two categories: spatial attributes (e.g., a user's social role) and temporal attributes (e.g., a user's post content).
no code implementations • 1 Nov 2019 • Pan Zhou, Ruchao Fan, Wei Chen, Jia Jia
The Transformer has recently shown promising results in many sequence-to-sequence transformation tasks.
no code implementations • 1 Dec 2018 • Meng Li, Yan Zhang, Haicheng She, Jinqiong Zhou, Jia Jia, Danmei He, Li Zhang
The change of retinal vasculature is an early sign of many vascular and systemic diseases, such as diabetes and hypertension.
no code implementations • 13 Nov 2018 • Pan Zhou, Wenwen Yang, Wei Chen, Yan-Feng Wang, Jia Jia
In this paper, we propose a novel multimodal attention based method for audio-visual speech recognition which could automatically learn the fused representation from both modalities based on their importance.
Audio-Visual Speech Recognition • Robust Speech Recognition +2
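The core idea of importance-based multimodal fusion is to score each modality's per-frame reliability and blend the embeddings with softmax weights, so a noisy audio stream can lean on the visual stream. A minimal sketch under assumed shapes (the scoring vector `w_score` and 16-d embeddings are hypothetical, not the paper's actual architecture):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(1)
audio_feat = rng.normal(size=16)   # per-frame audio embedding (illustrative)
visual_feat = rng.normal(size=16)  # per-frame lip-region embedding

# Hypothetical learned vector that scores each modality's importance.
w_score = rng.normal(size=16)
scores = np.array([audio_feat @ w_score, visual_feat @ w_score])
alpha = softmax(scores)            # modality weights, sum to 1

fused = alpha[0] * audio_feat + alpha[1] * visual_feat
print(alpha, fused.shape)
```

In a trained model, `alpha` would shift toward the visual modality as acoustic noise increases, which is what makes the fused representation robust.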
no code implementations • 13 Nov 2018 • Ruchao Fan, Pan Zhou, Wei Chen, Jia Jia, Gang Liu
In previous work, researchers have shown that such architectures can achieve results comparable to state-of-the-art ASR systems, especially when using a bidirectional encoder and a global soft attention (GSA) mechanism.
Automatic Speech Recognition (ASR) +2
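Global soft attention, as referenced above, lets each decoder step weight every encoder frame of the utterance; streaming ASR variants restrict this window, which is the trade-off such work studies. A minimal dot-product sketch with assumed dimensions (50 frames, 32-d states are illustrative, not the paper's configuration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(2)
T, d = 50, 32
enc = rng.normal(size=(T, d))      # bidirectional-encoder outputs, all frames
dec_state = rng.normal(size=d)     # current decoder state

# "Global" = scores are computed over the WHOLE utterance at once.
alpha = softmax(enc @ dec_state)   # one attention weight per encoder frame
context = alpha @ enc              # weighted sum fed to the decoder
print(context.shape)
```

Because `alpha` spans all `T` frames, the full utterance must be encoded before decoding starts, which is why GSA complicates online recognition.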
no code implementations • 13 Nov 2018 • Senmao Wang, Pan Zhou, Wei Chen, Jia Jia, Lei Xie
End-to-end approaches have drawn much attention recently for significantly simplifying the construction of an automatic speech recognition (ASR) system.
Automatic Speech Recognition (ASR) +2
no code implementations • 17 Nov 2016 • Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai
Hence, traditional methods may fail to distinguish some of the emotions with just one global feature subspace.