Search Results for author: Gaoang Wang

Found 51 papers, 18 papers with code

CityCraft: A Real Crafter for 3D City Generation

no code implementations • 7 Jun 2024 • Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang

The rendered scenes lack variety, resembling the training images, resulting in monotonous styles.

S4Fusion: Saliency-aware Selective State Space Model for Infrared Visible Image Fusion

no code implementations • 31 May 2024 • Haolong Ma, Hui Li, Chunyang Cheng, Gaoang Wang, Xiaoning Song, XiaoJun Wu

However, in image fusion, current methods underestimate the potential of SSSM in capturing the global spatial information of both modalities.

Infrared And Visible Image Fusion

FlexiFilm: Long Video Generation with Flexible Conditions

1 code implementation • 29 Apr 2024 • Yichen Ouyang, Jianhao Yuan, Hao Zhao, Gaoang Wang, Bo Zhao

Generating long and consistent videos has emerged as a significant yet challenging problem.

Image Generation, Video Generation

MovieChat+: Question-aware Sparse Memory for Long Video Question Answering

1 code implementation • 26 Apr 2024 • Enxin Song, Wenhao Chai, Tian Ye, Jenq-Neng Hwang, Xi Li, Gaoang Wang

Integrating video foundation models with large language models into a video understanding system has recently emerged as a way to overcome the limitations of specific, pre-defined vision tasks.

Ranked #2 on Question Answering on NExT-QA (Open-ended VideoQA)

2k, Question Answering +2

Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model

no code implementations • 6 Apr 2024 • Zhonghan Zhao, Ke Ma, Wenhao Chai, Xuan Wang, Kewei Chen, Dongxu Guo, Yanting Zhang, Hongwei Wang, Gaoang Wang

After distillation, embodied agents can complete complex, open-ended tasks without additional expert guidance, utilizing the performance and knowledge of a versatile MLM.

Knowledge Distillation

VersaT2I: Improving Text-to-Image Models with Versatile Reward

no code implementations • 27 Mar 2024 • Jianshu Guo, Wenhao Chai, Jie Deng, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang

Recent text-to-image (T2I) models have benefited from large-scale and high-quality data, demonstrating impressive performance.

Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation

no code implementations • 13 Mar 2024 • Zhonghan Zhao, Kewei Chen, Dongxu Guo, Wenhao Chai, Tian Ye, Yanting Zhang, Gaoang Wang

To assess organizational behavior, we design a series of navigation tasks in the Minecraft environment, which include searching and exploring.

Navigate

MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant

no code implementations • 7 Mar 2024 • Chenlu Zhan, Yu Lin, Gaoang Wang, Hongwei Wang, Jian Wu

Medical generative models, acknowledged for their high-quality sample generation ability, have accelerated the growth of medical applications.

Clinical Knowledge

UniDCP: Unifying Multiple Medical Vision-language Tasks via Dynamic Cross-modal Learnable Prompts

no code implementations • 18 Dec 2023 • Chenlu Zhan, Yufei Zhang, Yu Lin, Gaoang Wang, Hongwei Wang

Medical vision-language pre-training (Med-VLP) models have recently accelerated the growth of medical diagnostics applications.

Language Modelling

User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning

no code implementations • 8 Dec 2023 • Xuan Wang, Guanhong Wang, Wenhao Chai, Jiayu Zhou, Gaoang Wang

Moreover, we employ GPT-2 as the frozen large language model.

Image Captioning, Language Modelling +1

CityGen: Infinite and Controllable 3D City Layout Generation

no code implementations • 3 Dec 2023 • Jie Deng, Wenhao Chai, Jianshu Guo, Qixuan Huang, Wenhao Hu, Jenq-Neng Hwang, Gaoang Wang

In this paper, we propose CityGen, a novel end-to-end framework for infinite, diverse and controllable 3D city layout generation. First, we propose an outpainting pipeline to extend the local layout to an infinite city layout.

See and Think: Embodied Agent in Virtual Environment

no code implementations • 26 Nov 2023 • Zhonghan Zhao, Wenhao Chai, Xuan Wang, Li Boyi, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang

Vision perception involves the interpretation of visual information in the environment, which is then integrated, together with the agent state and task instruction, into the LLM component.

Question Answering, Retrieval

Vision meets mmWave Radar: 3D Object Perception Benchmark for Autonomous Driving

no code implementations • 17 Nov 2023 • Yizhou Wang, Jen-Hao Cheng, Jui-Te Huang, Sheng-Yao Kuan, Qiqian Fu, Chiming Ni, Shengyu Hao, Gaoang Wang, Guanbin Xing, Hui Liu, Jenq-Neng Hwang

This kind of radar format can enable machine learning models to generate more reliable object perception results after exchanging and fusing information or features between the camera and radar.

Autonomous Driving, Sensor Fusion

Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning

1 code implementation • 2 Nov 2023 • Zhenyu Zhang, Benlu Wang, Weijie Liang, Yizhi Li, Xuechen Guo, Guanhong Wang, Shiyan Li, Gaoang Wang

With the development of multimodality and large language models, the deep learning-based technique for medical image captioning holds the potential to offer valuable diagnostic recommendations.

Image Captioning

Devil in the Number: Towards Robust Multi-modality Data Filter

no code implementations • 24 Sep 2023 • Yichen Xu, Zihan Xu, Wenhao Chai, Zhonghan Zhao, Enxin Song, Gaoang Wang

To filter multi-modality datasets appropriately at web scale, it is crucial to employ suitable filtering methods that boost performance and reduce training costs.

FrameRS: A Video Frame Compression Model Composed by Self supervised Video Frame Reconstructor and Key Frame Selector

1 code implementation • 16 Sep 2023 • Qiqian Fu, Guanhong Wang, Gaoang Wang

The key frame selector, Frame Selector, is built on a CNN architecture.

Computational Efficiency

Chasing Consistency in Text-to-3D Generation from a Single Image

no code implementations • 7 Sep 2023 • Yichen Ouyang, Wenhao Chai, Jiayi Ye, Dapeng Tao, Yibing Zhan, Gaoang Wang

In light of the above issues, we present Consist3D, a three-stage framework Chasing for semantic-, geometric-, and saturation-Consistent Text-to-3D generation from a single image, in which the first two stages aim to learn parameterized consistency tokens, and the last stage is for optimization.

3D Generation, Text to 3D

Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection

1 code implementation • ICCV 2023 • Longrong Yang, Xianpan Zhou, XueWei Li, Liang Qiao, Zheyang Li, Ziwei Yang, Gaoang Wang, Xi Li

Thus, the optimum of the distillation loss does not necessarily lead to the optimal student classification scores for dense object detectors.

Binary Classification, Classification +4

UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning

no code implementations • 19 Aug 2023 • Meiqi Sun, Zhonghan Zhao, Wenhao Chai, Hanjun Luo, Shidong Cao, Yanting Zhang, Jenq-Neng Hwang, Gaoang Wang

Our proposed model takes support images and labels as prompt guidance for a query image.

Decoder, Few-Shot Learning +1

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

1 code implementation • ICCV 2023 • Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu

In this paper, we tackle this problem by introducing temporal dependency to existing text-driven diffusion models, which allows them to generate consistent appearance for the edited objects.

Video Editing

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation

1 code implementation • 18 Aug 2023 • Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie

Typically, PoSynDA uses a diffusion-inspired structure to simulate 3D pose distribution in the target domain.

3D Human Pose Estimation, Domain Adaptation

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

1 code implementation • 31 Jul 2023 • Enxin Song, Wenhao Chai, Guanhong Wang, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang

Integrating video foundation models with large language models into a video understanding system has recently emerged as a way to overcome the limitations of specific, pre-defined vision tasks.

Ranked #1 on zero-shot long video global-mode question answering on MovieChat-1K

Video-based Generative Performance Benchmarking (Consistency), Video-based Generative Performance Benchmarking (Contextual Understanding) +10

A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision

no code implementations • 7 Jul 2023 • Zhonghan Zhao, Wenhao Chai, Shengyu Hao, Wenhao Hu, Guanhong Wang, Shidong Cao, Mingli Song, Jenq-Neng Hwang, Gaoang Wang

Deep learning has the potential to revolutionize sports performance, with applications ranging from perception and comprehension to decision.

MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling

no code implementations • 29 Jun 2023 • Zhenyu Zhang, Wenhao Chai, Zhongyu Jiang, Tian Ye, Mingli Song, Jenq-Neng Hwang, Gaoang Wang

In this paper, we propose MPM, a unified 2D-3D human pose representation framework via masked pose modeling.

3D Human Pose Estimation, 3D Pose Estimation

SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation

1 code implementation • 6 Jun 2023 • XueWei Li, Tao Wu, Zhongang Qi, Gaoang Wang, Ying Shan, Xi Li

Experimental results on the Stanford2D3D Panoramic dataset show that SGAT4PASS significantly improves performance and robustness: mIoU increases by approximately 2%, and under small 3D disturbances in the data, performance stability improves by an order of magnitude.

Ranked #4 on Semantic Segmentation on Stanford2D3D Panoramic

Semantic Segmentation

Language Adaptive Weight Generation for Multi-task Visual Grounding

1 code implementation • CVPR 2023 • Wei Su, Peihan Miao, Huanzhang Dou, Gaoang Wang, Liang Qiao, Zheyang Li, Xi Li

The active perception can take expressions as priors to extract relevant visual features, which can effectively alleviate the mismatches.

Referring Expression, Referring Expression Comprehension +1

Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation

1 code implementation • ICCV 2023 • Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang, Gaoang Wang

We observe that the degradation is caused by two factors: 1) the large distribution gap over global positions of poses between the source and target datasets due to variant camera parameters and settings, and 2) the deficient diversity of local structures of poses in training.

Ranked #1 on 3D Human Pose Estimation in Limited Data on Human3.6M

3D Human Pose Estimation, 3D Human Pose Estimation in Limited Data +3

DDMM-Synth: A Denoising Diffusion Model for Cross-modal Medical Image Synthesis with Sparse-view Measurement Embedding

no code implementations • 28 Mar 2023 • Xiaoyue Li, Kai Shang, Gaoang Wang, Mark D. Butala

Reducing the radiation dose in computed tomography (CT) is important to mitigate radiation-induced risks.

Computed Tomography (CT), Denoising +1

Blind Inpainting with Object-aware Discrimination for Artificial Marker Removal

no code implementations • 27 Mar 2023 • Xuechen Guo, Wenhao Hu, Chiming Ni, Wenhao Chai, Shiyan Li, Gaoang Wang

The reconstruction network consists of two branches that predict the corrupted regions with artificial markers and simultaneously recover the missing visual contents.

Object

Deep Learning Methods for Small Molecule Drug Discovery: A Survey

no code implementations • 1 Mar 2023 • Wenhao Hu, Yingying Liu, Xuanyu Chen, Wenhao Chai, Hangyue Chen, Hongwei Wang, Gaoang Wang

With the development of computer-assisted techniques, research communities including biochemistry and deep learning have been devoted to the drug discovery field for over a decade.

Drug Discovery, Molecular Property Prediction +2

DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes

2 code implementations • 15 Feb 2023 • Shenghao Hao, Peiyuan Liu, Yibing Zhan, Kaixun Jin, Zuozhu Liu, Mingli Song, Jenq-Neng Hwang, Gaoang Wang

Although cross-view multi-object tracking has received increased attention in recent years, existing datasets still have several issues, including 1) missing real-world scenarios, 2) lacking diverse scenes, 3) containing a limited number of tracks, 4) using only static cameras, and 5) lacking standard benchmarks, which hinder the investigation and comparison of cross-view tracking methods.

Multi-Object Tracking, Object +2

DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models

1 code implementation • 14 Feb 2023 • Shidong Cao, Wenhao Chai, Shengyu Hao, Yanting Zhang, Hangyue Chen, Gaoang Wang

We focus on a new fashion design task, where we aim to transfer a reference appearance image onto a clothing image while preserving the structure of the clothing image.

Denoising, Style Transfer

STSC-SNN: Spatio-Temporal Synaptic Connection with Temporal Convolution and Attention for Spiking Neural Networks

1 code implementation • 11 Oct 2022 • Chengting Yu, Zheming Gu, Da Li, Gaoang Wang, Aili Wang, Erping Li

We show that endowing synaptic models with temporal dependencies can improve the performance of SNNs on classification tasks.

Ranked #4 on Audio Classification on SHD

Audio Classification, Gesture Recognition +1

Missing Modality meets Meta Sampling (M3S): An Efficient Universal Approach for Multimodal Sentiment Analysis with Missing Modality

no code implementations • 7 Oct 2022 • Haozhe Chi, Minghua Yang, Junhao Zhu, Guanhong Wang, Gaoang Wang

Multimodal sentiment analysis (MSA) is an important way of observing mental activities with the help of data captured from multiple modalities.

Meta-Learning, Multimodal Sentiment Analysis

Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection

1 code implementation • 24 Jul 2022 • Gaoang Wang, Yibing Zhan, Xinchao Wang, Mingli Song, Klara Nahrstedt

Anomaly detection aims at identifying deviant samples from the normal data distribution.

Contrastive Learning, One-Class Classification +1

Recent Advances in Embedding Methods for Multi-Object Tracking: A Survey

no code implementations • 22 May 2022 • Gaoang Wang, Mingli Song, Jenq-Neng Hwang

Multi-object tracking (MOT) aims to associate target objects across video frames in order to obtain entire moving trajectories.

Image Classification, Multi-Object Tracking +3

Preserve Pre-trained Knowledge: Transfer Learning With Self-Distillation For Action Recognition

no code implementations • 1 May 2022 • Yang Zhou, Zhanhao He, Keyu Lu, Guanhong Wang, Gaoang Wang

Video-based action recognition is one of the most popular topics in computer vision.

Action Recognition, Representation Learning +1

Human-Centered Prior-Guided and Task-Dependent Multi-Task Representation Learning for Action Recognition Pre-Training

no code implementations • 27 Apr 2022 • Guanhong Wang, Keyu Lu, Yang Zhou, Zhanhao He, Gaoang Wang

Recently, much progress has been made in self-supervised action recognition.

Contrastive Learning, Human Parsing +5

Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension

no code implementations • 21 Apr 2022 • Peihan Miao, Wei Su, Gaoang Wang, XueWei Li, Xi Li

As an important and challenging problem in vision-language tasks, referring expression comprehension (REC) generally requires a large amount of multi-grained information of visual and linguistic modalities to realize accurate reasoning.

Informativeness, Referring Expression +1

MAP-SNN: Mapping Spike Activities with Multiplicity, Adaptability, and Plasticity into Bio-Plausible Spiking Neural Networks

1 code implementation • 21 Apr 2022 • Chengting Yu, Yangkai Du, Mufeng Chen, Aili Wang, Gaoang Wang, Erping Li

For plasticity, we propose a trainable convolutional synapse that models spike response current to enhance the diversity of spiking neurons for temporal feature extraction.

Ranked #7 on Audio Classification on SHD

Audio Classification

Disjoint Contrastive Regression Learning for Multi-Sourced Annotations

no code implementations • 31 Dec 2021 • Xiaoqian Ruan, Gaoang Wang

However, inconsistency and bias among different annotators are harmful to model training, especially for qualitative and subjective tasks. To tackle this challenge, we propose a novel contrastive regression framework for the disjoint annotations problem, where each sample is labeled by only one annotator and multiple annotators work on disjoint subsets of the data.

Image Quality Assessment, regression

ActiveMatch: End-to-end Semi-supervised Active Representation Learning

no code implementations • 6 Oct 2021 • Xinkai Yuan, Zilinghan Li, Gaoang Wang

With human-in-the-loop, active learning can iteratively select informative unlabeled samples for labeling and training to improve the performance in the SSL framework.

Active Learning, Contrastive Learning +1

Track without Appearance: Learn Box and Tracklet Embedding with Local and Global Motion Patterns for Vehicle Tracking

1 code implementation • ICCV 2021 • Gaoang Wang, Renshu Gu, Zuozhu Liu, Weijie Hu, Mingli Song, Jenq-Neng Hwang

In this paper, we try to explore the significance of motion patterns for vehicle tracking without appearance information.

Multi-Object Tracking

Rethinking of Radar's Role: A Camera-Radar Dataset and Systematic Annotator via Coordinate Alignment

no code implementations • 11 May 2021 • Yizhou Wang, Gaoang Wang, Hung-Min Hsu, Hui Liu, Jenq-Neng Hwang

Radar has long been a common sensor on autonomous vehicles for obstacle ranging and speed estimation.

Autonomous Vehicles, object-detection +2

Split and Connect: A Universal Tracklet Booster for Multi-Object Tracking

no code implementations • 6 May 2021 • Gaoang Wang, Yizhou Wang, Renshu Gu, Weijie Hu, Jenq-Neng Hwang

To address such challenges, which are common to most existing trackers, this paper proposes a tracklet booster algorithm that can be built upon any other tracker.

Multi-Object Tracking

When Few-Shot Learning Meets Video Object Detection

no code implementations • 26 Mar 2021 • Zhongjie Yu, Gaoang Wang, Lin Chen, Sebastian Raschka, Jiebo Luo

We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.

Few-Shot Video Object Detection, Object +3

DAIL: Dataset-Aware and Invariant Learning for Face Recognition

no code implementations • 14 Jan 2021 • Gaoang Wang, Lin Chen, Tianqiang Liu, Mingwei He, Jiebo Luo

To solve the first issue of identity overlapping, we propose a dataset-aware loss for multi-dataset training by reducing the penalty when the same person appears in multiple datasets.

Domain Adaptation, Face Recognition

Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution

no code implementations • 31 Oct 2020 • Renshu Gu, Gaoang Wang, Jenq-Neng Hwang

Videos that contain multiple potentially occluded people captured from freely moving monocular cameras are very common in real-world scenarios, while 3D HPE for such scenarios is quite challenging, partially because there is a lack of such data with accurate 3D ground truth labels in existing datasets.

3D Human Pose Estimation, 3D Pose Estimation

Eye in the Sky: Drone-Based Object Tracking and 3D Localization

no code implementations • 18 Oct 2019 • Haotian Zhang, Gaoang Wang, Zhichao Lei, Jenq-Neng Hwang

Drones, or UAVs in general, equipped with a single camera have been widely deployed in a broad range of applications, such as aerial photography, fast goods delivery and, most importantly, surveillance.

drone-based object tracking, Multi-Object Tracking +3

Exploit the Connectivity: Multi-Object Tracking with TrackletNet

1 code implementation • 18 Nov 2018 • Gaoang Wang, Yizhou Wang, Haotian Zhang, Renshu Gu, Jenq-Neng Hwang

Multi-object tracking (MOT) is an important and practical task related to both surveillance systems and moving camera applications, such as autonomous driving and robotic vision.

Ranked #19 on Multi-Object Tracking on MOT16

Autonomous Driving, Multi-Object Tracking +1

Multiple-Kernel Based Vehicle Tracking Using 3D Deformable Model and Camera Self-Calibration

no code implementations • 22 Aug 2017 • Zheng Tang, Gaoang Wang, Tao Liu, Young-Gun Lee, Adwin Jahn, Xu Liu, Xiaodong He, Jenq-Neng Hwang

In this challenge, we propose a model-based vehicle localization method, which builds a kernel at each patch of the 3D deformable vehicle model and associates them with constraints in 3D space.

Ensemble Learning, object-detection +1
