1 code implementation • 9 Jan 2024 • Ziyue Huang, Mingming Zhang, Yuan Gong, Qingjie Liu, Yunhong Wang
Deep learning models are essential for scene classification, change detection, land cover segmentation, and other remote sensing image understanding tasks.
1 code implementation • 25 Sep 2023 • Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass
Humans are surrounded by audio signals that include both speech and non-speech sounds.
1 code implementation • 19 Sep 2023 • Tianhua Zhang, Jiaxin Ge, Hongyin Luo, Yung-Sung Chuang, Mingye Gao, Yuan Gong, Xixin Wu, Yoon Kim, Helen Meng, James Glass
How can we perform computations over natural language representations to solve tasks that require symbolic and numeric reasoning?
no code implementations • ICCV 2023 • Yuan Gong, Yong Zhang, Xiaodong Cun, Fei Yin, Yanbo Fan, Xuan Wang, Baoyuan Wu, Yujiu Yang
Moreover, since no paired data is provided, we propose a novel cross-domain training scheme using data from two domains with the designed analogy constraint.
1 code implementation • 13 Jul 2023 • Yingqing He, Menghan Xia, Haoxin Chen, Xiaodong Cun, Yuan Gong, Jinbo Xing, Yong Zhang, Xintao Wang, Chao Weng, Ying Shan, Qifeng Chen
For the first module, we leverage an off-the-shelf video retrieval system and extract video depths as motion structure.
1 code implementation • 29 May 2023 • Yuan Gong, Youxin Pang, Xiaodong Cun, Menghan Xia, Yingqing He, Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wang, Ying Shan, Yujiu Yang
Accurate story visualization requires several necessary elements, such as identity consistency across frames, the alignment between plain text and visual content, and a reasonable layout of objects in images.
no code implementations • 24 May 2023 • Hongyin Luo, Yung-Sung Chuang, Yuan Gong, Tianhua Zhang, Yoon Kim, Xixin Wu, Danny Fox, Helen Meng, James Glass
Large language models (LLMs) have been significantly improved by instruction fine-tuning, but still lack transparency and the ability to utilize up-to-date knowledge and information.
2 code implementations • 18 May 2023 • Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass
On the other hand, modern large language models (LLMs) exhibit emerging reasoning ability but they lack audio perception capabilities.
Ranked #3 on Music Question Answering on MusicQA (using extra training data)
no code implementations • CVPR 2023 • Fei Yin, Yong Zhang, Xuan Wang, Tengfei Wang, Xiaoyu Li, Yuan Gong, Yanbo Fan, Xiaodong Cun, Ying Shan, Cengiz Oztireli, Yujiu Yang
It is natural to associate 3D GANs with GAN inversion methods to project a real image into the generator's latent space, allowing free-view consistent synthesis and editing, referred to as 3D GAN inversion.
1 code implementation • CVPR 2023 • Yatai Ji, Junjie Wang, Yuan Gong, Lin Zhang, Yanru Zhu, Hongfa Wang, Jiaxing Zhang, Tetsuya Sakai, Yujiu Yang
Multimodal semantic understanding often has to deal with uncertainty, which means the obtained messages tend to refer to multiple targets.
1 code implementation • 2 Oct 2022 • Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass
In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.
Ranked #1 on Sound Prompted Semantic Segmentation on ADE20K
1 code implementation • 22 Aug 2022 • Zhendong Yang, Zhe Li, Yuan Gong, Tianke Zhang, Shanshan Lao, Chun Yuan, Yu Li
Furthermore, we smooth students' target output to treat it as the soft target for training without teachers and propose a teacher-free new KD loss (tf-NKD).
1 code implementation • 29 Jul 2022 • Yuan Gong, Alexander H. Liu, Andrew Rouditchenko, James Glass
Conventional audio-visual models have independent audio and video branches.
Ranked #2 on Multi-modal Classification on AudioSet (using extra training data)
1 code implementation • 6 May 2022 • Yuan Gong, Ziyi Chen, Iek-Heng Chu, Peng Chang, James Glass
Automatic pronunciation assessment is an important technology to help self-directed language learners.
Ranked #2 on Phone-level pronunciation scoring on speechocean762 (using extra training data)
1 code implementation • 6 May 2022 • Yuan Gong, Jin Yu, James Glass
Recognizing human non-speech vocalizations is an important task and has broad applications such as automatic sound transcription and health condition monitoring.
Ranked #1 on Audio Classification on VocalSound
2 code implementations • 22 Apr 2022 • Shanshan Lao, Yuan Gong, Shuwei Shi, Sidi Yang, Tianhe Wu, Jiahao Wang, Weihao Xia, Yujiu Yang
Image quality assessment (IQA) algorithms aim to quantify the human perception of image quality.
Ranked #1 on Image Quality Assessment on MSU FR VQA Database
2 code implementations • 19 Apr 2022 • Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, Yujiu Yang
No-Reference Image Quality Assessment (NR-IQA) aims to assess the perceptual quality of images in accordance with human subjective perception.
Ranked #8 on Video Quality Assessment on MSU SR-QA Dataset
2 code implementations • 13 Mar 2022 • Yuan Gong, Sameer Khurana, Andrew Rouditchenko, James Glass
Audio classification is an active research area with a wide range of applications.
1 code implementation • CVPR 2022 • Zhendong Yang, Zhe Li, Xiaohu Jiang, Yuan Gong, Zehuan Yuan, Danpei Zhao, Chun Yuan
Global distillation rebuilds the relation between different pixels and transfers it from teachers to students, compensating for missing global information in focal distillation.
Ranked #1 on Knowledge Distillation on MS COCO
3 code implementations • 19 Oct 2021 • Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass
However, pure Transformer models tend to require more training data compared to CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, thus limiting the practical usage of AST.
Ranked #1 on Spoken Command Recognition on Speech Command v2
4 code implementations • 5 Apr 2021 • Yuan Gong, Yu-An Chung, James Glass
In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.
Ranked #1 on Audio Classification on Speech Commands
1 code implementation • 2 Feb 2021 • Yuan Gong, Yu-An Chung, James Glass
Audio tagging is an active research area and has a wide range of applications.
Ranked #6 on Audio Classification on FSD50K (using extra training data)
2 code implementations • 18 Mar 2020 • Yuan Gong, Jian Yang, Christian Poellabauer
With the rapidly growing number of security-sensitive systems that use voice as the primary input, it becomes increasingly important to address these systems' potential vulnerability to replay attacks.
no code implementations • 31 Aug 2019 • Bryan Xia, Yuan Gong, Yizhe Zhang, Christian Poellabauer
Recent efforts have shown promising results for person re-identification by designing part-based architectures to allow a neural network to learn discriminative representations from semantically coherent parts.
1 code implementation • 31 May 2019 • Yuan Gong, Boyang Li, Christian Poellabauer, Yiyu Shi
In recent years, many efforts have demonstrated that modern machine learning algorithms are vulnerable to adversarial attacks, where small, but carefully crafted, perturbations on the input can make them fail.
2 code implementations • 6 Apr 2019 • Yuan Gong, Jian Yang, Jacob Huber, Mitchell MacKnight, Christian Poellabauer
This paper introduces a new database of voice recordings with the goal of supporting research on vulnerabilities and protection of voice-controlled systems (VCSs).
no code implementations • 8 Aug 2018 • Yuan Gong, Christian Poellabauer
Learning disentangled representations of high-dimensional data is currently an active research area.
no code implementations • 28 Mar 2018 • Yuan Gong, Christian Poellabauer
Major depressive disorder is a common mental disorder that affects almost 7% of the adult U.S. population.
no code implementations • 24 Mar 2018 • Yuan Gong, Christian Poellabauer
These systems have been shown to be vulnerable to various types of voice spoofing attacks.
no code implementations • ICLR 2018 • Yuan Gong, Christian Poellabauer
Prior work on speech and audio processing has demonstrated the ability to obtain excellent performance when learning directly from raw audio waveforms using convolutional neural networks (CNNs).
no code implementations • 9 Nov 2017 • Yuan Gong, Christian Poellabauer
Computational paralinguistic analysis is increasingly being used in a wide range of cyber applications, including security-sensitive applications such as speaker verification, deceptive speech detection, and medical diagnostics.