no code implementations • 27 May 2024 • Haoyu Zhao, Wenhang Ge, Ying-Cong Chen
LLM-Optic first employs an LLM as a Text Grounder to interpret complex text queries and accurately identify objects the user intends to locate.
no code implementations • 24 May 2024 • Guibao Shen, Luozhou Wang, Jiantao Lin, Wenhang Ge, Chaozhe Zhang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Guangyong Chen, Yijun Li, Ying-Cong Chen
In this paper, we introduce the Scene Graph Adapter (SG-Adapter), leveraging the structured representation of scene graphs to rectify inaccuracies in the original text embeddings.
1 code implementation • 22 Apr 2024 • Tao Hu, Wenhang Ge, Yuyang Zhao, Gim Hee Lee
We introduce X-Ray, a novel 3D sequential representation inspired by the penetrability of x-ray scans.
1 code implementation • 26 Jun 2023 • Luozhou Wang, Guibao Shen, Wenhang Ge, Guangyong Chen, Yijun Li, Ying-Cong Chen
The "Decompose" phase separates conditions based on pair relationships, computing the result individually for each pair.
1 code implementation • ICCV 2023 • Wenhang Ge, Tao Hu, Haoyu Zhao, Shu Liu, Ying-Cong Chen
We show that, together with a reflection direction-dependent radiance, our model achieves high-quality surface reconstruction on reflective surfaces and outperforms the state of the art by a large margin.
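As a rough illustration of what a "reflection direction" is here, the sketch below computes the standard mirror reflection of a view direction about a surface normal, r = 2(v·n)n − v. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names `reflect` and `normalize` are hypothetical, and the convention that both vectors are unit length with the view direction pointing away from the surface is assumed for the example.

```python
import math


def normalize(vec):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in vec))
    return tuple(c / length for c in vec)


def reflect(view_dir, normal):
    """Mirror a unit view direction about a unit surface normal.

    Assumed convention (not taken from the paper): view_dir points away
    from the surface, toward the camera. Computes r = 2 (v . n) n - v.
    """
    d = sum(v * n for v, n in zip(view_dir, normal))
    return tuple(2.0 * d * n - v for v, n in zip(view_dir, normal))


# Hypothetical usage: a view direction tilted 45 degrees from the normal.
view = normalize((1.0, 0.0, 1.0))
normal = (0.0, 0.0, 1.0)
reflected = reflect(view, normal)
```

A radiance model conditioned on the reflected direction rather than on the raw view direction is one common way to represent specular appearance; how the listed paper uses this direction is not detailed in the snippet above.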
no code implementations • 27 Sep 2022 • Chengzhi Lin, AnCong Wu, Junwei Liang, Jun Zhang, Wenhang Ge, Wei-Shi Zheng, Chunhua Shen
To address this problem, we propose a Text-Adaptive Multiple Visual Prototype Matching model, which automatically captures multiple prototypes to describe a video by adaptive aggregation of video token features.
1 code implementation • CVPR 2022 • Chao Wu, Wenhang Ge, AnCong Wu, Xiaobin Chang
To learn camera-view invariant features for person Re-IDentification (Re-ID), the cross-camera image pairs of each person play an important role.
1 code implementation • 29 Jul 2021 • Wenhang Ge, Chunyan Pan, AnCong Wu, Hongwei Zheng, Wei-Shi Zheng
To learn camera-invariant representations from cross-camera unpaired training data, we propose a cross-camera feature prediction method that mines cross-camera self-supervision information from camera-specific feature distributions by transforming fake cross-camera positive feature pairs and minimizing the distances between the fake pairs.