no code implementations • 28 May 2024 • Jiaze Wang, Yi Wang, Ziyu Guo, Renrui Zhang, Donghao Zhou, Guangyong Chen, Anfeng Liu, Pheng-Ann Heng
Data augmentation has proven to be a vital tool for enhancing the generalization capabilities of deep learning models, especially in the context of 3D vision where traditional datasets are often limited.
4 code implementations • 5 Apr 2024 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao
To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning.
no code implementations • 21 Mar 2024 • Renrui Zhang, Dongzhi Jiang, Yichi Zhang, Haokun Lin, Ziyu Guo, Pengshuo Qiu, Aojun Zhou, Pan Lu, Kai-Wei Chang, Peng Gao, Hongsheng Li
To this end, we introduce MathVerse, an all-around visual math benchmark designed for an equitable and in-depth evaluation of MLLMs.
no code implementations • 26 Feb 2024 • Kexin Chen, Yuyang Du, Tao You, Mobarakol Islam, Ziyu Guo, Yueming Jin, Guangyong Chen, Pheng-Ann Heng
We further design an adaptive weight assignment approach that balances the generalization ability of the LLM and the domain expertise of the old CL model.
no code implementations • 22 Jan 2024 • Hao Chen, Jiaze Wang, Ziyu Guo, Jinpeng Li, Donghao Zhou, Bian Wu, Chenyong Guan, Guangyong Chen, Pheng-Ann Heng
Sign language recognition (SLR) plays a vital role in facilitating communication for the hearing-impaired community.
1 code implementation • 14 Sep 2023 • Ziyu Guo, Weiqin Zhao, Shujun Wang, Lequan Yu
Considering that the information from different resolutions is complementary and can benefit each other during the learning process, we further design a novel Bidirectional Interaction block to establish communication between different levels within the WSI pyramids.
2 code implementations • 7 Sep 2023 • Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao
During training, we adopt a learnable bind network to align the embedding space between LLaMA and ImageBind's image encoder.
5 code implementations • 1 Sep 2023 • Ziyu Guo, Renrui Zhang, Xiangyang Zhu, Yiwen Tang, Xianzheng Ma, Jiaming Han, Kexin Chen, Peng Gao, Xianzhi Li, Hongsheng Li, Pheng-Ann Heng
We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image, language, audio, and video.
Ranked #5 on 3D Question Answering (3D-QA) on 3D MM-Vet
1 code implementation • 24 Aug 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Hao Dong, Peng Gao
However, the prior pre-training stage not only introduces excessive time overhead, but also incurs a significant domain gap on 'unseen' classes.
3D Semantic Segmentation • Few-shot 3D semantic segmentation
1 code implementation • 25 May 2023 • Shilin Yan, Renrui Zhang, Ziyu Guo, Wenchao Chen, Wei zhang, Hongyang Li, Yu Qiao, Hao Dong, Zhongjiang He, Peng Gao
In this paper, we propose MUTR, a Multi-modal Unified Temporal transformer for Referring video object segmentation.
1 code implementation • 4 May 2023 • Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junting Pan, Xianzheng Ma, Hao Dong, Peng Gao, Hongsheng Li
Driven by large-data pre-training, the Segment Anything Model (SAM) has proven to be a powerful and promptable framework, revolutionizing segmentation models.
Ranked #1 on Personalized Segmentation on PerSeg
2 code implementations • 14 Mar 2023 • Renrui Zhang, Liuhui Wang, Ziyu Guo, Yali Wang, Peng Gao, Hongsheng Li, Jianbo Shi
We present a Non-parametric Network for 3D point cloud analysis, Point-NN, which consists of purely non-learnable components: farthest point sampling (FPS), k-nearest neighbors (k-NN), and pooling operations, with trigonometric functions.
Ranked #1 on Training-free 3D Part Segmentation on ShapeNet-Part
Supervised Only 3D Point Cloud Classification • Training-free 3D Part Segmentation
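The non-learnable building blocks named in the Point-NN entry above can be illustrated with a minimal sketch of farthest point sampling (FPS) and k-nearest neighbors (k-NN). This is not the authors' implementation: the function names, the toy point cloud, and the choice of the first point as the FPS seed are assumptions made for illustration, and the pooling and trigonometric encoding steps are omitted.

```python
import math


def farthest_point_sampling(points, num_samples, start_index=0):
    """Greedily pick indices so each new point is farthest from those already chosen.

    `start_index` is an assumed seed; real pipelines often pick it at random.
    """
    if not 0 < num_samples <= len(points):
        raise ValueError("num_samples must be between 1 and len(points)")
    selected = [start_index]
    # min_dist[i] holds the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start_index]) for p in points]
    while len(selected) < num_samples:
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)
        for i, p in enumerate(points):
            d = math.dist(p, points[next_index])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


def k_nearest_neighbors(points, center_index, k):
    """Return indices of the k points closest to points[center_index], excluding itself."""
    center = points[center_index]
    others = (i for i in range(len(points)) if i != center_index)
    return sorted(others, key=lambda i: math.dist(points[i], center))[:k]


# Hypothetical toy point cloud, used only to exercise the two functions.
cloud = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 2.0, 0.0),
    (5.0, 5.0, 5.0),
    (0.1, 0.1, 0.0),
]
centers = farthest_point_sampling(cloud, 3)
neighbors = k_nearest_neighbors(cloud, 0, 2)
```

Neither step involves trainable parameters: FPS selects well-spread center points, and k-NN gathers a local neighborhood around each center.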
no code implementations • 1 Mar 2023 • Renrui Zhang, Liuhui Wang, Ziyu Guo, Jianbo Shi
Performance on standard 3D point cloud benchmarks has plateaued, resulting in oversized models and complex network designs that yield only fractional improvements.
no code implementations • 27 Feb 2023 • Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzhi Li, Pheng-Ann Heng
In this paper, we explore how the 2D modality can benefit 3D masked autoencoding, and propose Joint-MAE, a 2D-3D joint MAE framework for self-supervised 3D point cloud pre-training.
2 code implementations • ICCV 2023 • Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Ziyao Zeng, Zipeng Qin, Shanghang Zhang, Peng Gao
In this paper, we first combine CLIP and GPT into a unified 3D open-world learner, named PointCLIP V2, which fully unleashes their potential for zero-shot 3D classification, segmentation, and detection.
Ranked #2 on 3D Open-Vocabulary Instance Segmentation on STPLS3D
no code implementations • 15 Nov 2022 • Zihan Yang, Peng Chen, Ziyu Guo, Dahai Ni
In this work, we consider the Direction-of-Arrival (DOA) estimation problem in a low-cost architecture where a single receiving antenna is aided by a reconfigurable intelligent surface (RIS).
1 code implementation • 28 Sep 2022 • Ziyu Guo, Renrui Zhang, Longtian Qiu, Xianzheng Ma, Xupeng Miao, Xuming He, Bin Cui
Contrastive Language-Image Pre-training (CLIP) has been shown to learn visual representations with great transferability, which achieves promising accuracy for zero-shot classification.
Ranked #4 on Training-free 3D Point Cloud Classification on ScanObjectNN (using extra training data)
Training-free 3D Point Cloud Classification • Transfer Learning
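Zero-shot classification with CLIP-style embeddings, as mentioned in the entry above, is commonly done by comparing an image embedding against one text embedding per class. The following is a minimal sketch under stated assumptions, not the method of the listed paper: the embeddings are hand-written toy vectors rather than real encoder outputs, and the function names and the `temperature` value are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def zero_shot_classify(image_embedding, class_text_embeddings, temperature=0.01):
    """Score each class prompt against the image and return (best_label, probabilities).

    `temperature` is an assumed scaling factor for the softmax.
    """
    names = list(class_text_embeddings)
    logits = [
        cosine_similarity(image_embedding, class_text_embeddings[name]) / temperature
        for name in names
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(names, exps)}
    return max(probs, key=probs.get), probs


# Toy vectors standing in for encoder outputs (assumption for illustration).
text_embeddings = {
    "a photo of a chair": [1.0, 0.0, 0.0],
    "a photo of a lamp": [0.0, 1.0, 0.0],
}
label, probs = zero_shot_classify([0.9, 0.1, 0.0], text_embeddings)
```

No classifier is trained here: adding a class only requires adding another text prompt and its embedding.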
1 code implementation • 3 Jul 2022 • Renrui Zhang, Ziyao Zeng, Ziyu Guo, Yafeng Li
To the best of our knowledge, we are the first to conduct zero-shot adaptation from semantic language knowledge to quantified downstream tasks and to perform zero-shot monocular depth estimation.
3 code implementations • 28 May 2022 • Renrui Zhang, Ziyu Guo, Rongyao Fang, Bin Zhao, Dong Wang, Yu Qiao, Hongsheng Li, Peng Gao
By fine-tuning on downstream tasks, Point-M2AE achieves 86.43% accuracy on ScanObjectNN, +3.36% over the second-best, and largely benefits few-shot classification, part segmentation, and 3D object detection with the hierarchical pre-training scheme.
Ranked #5 on 3D Point Cloud Linear Classification on ModelNet40 (using extra training data)
1 code implementation • 25 Apr 2022 • Zhimin Chen, Peng Chen, Ziyu Guo, Yudong Zhang, Xianbin Wang
A novel estimation method is proposed for the scenario in which the receiver uses only one fully functional channel, where multiple measurements for the DOA estimation are obtained by controlling the reflection matrix (measurement matrix) in the RIS.
1 code implementation • ICCV 2023 • Renrui Zhang, Han Qiu, Tai Wang, Ziyu Guo, Xuanzhuo Xu, Ziteng Cui, Yu Qiao, Peng Gao, Hongsheng Li
In this paper, we introduce the first DETR framework for Monocular DEtection with a depth-guided TRansformer, named MonoDETR.
3D Object Detection From Monocular Images • Autonomous Driving
no code implementations • 19 Mar 2022 • Peng Chen, Zihan Yang, Zhimin Chen, Ziyu Guo
The direction of arrival (DOA) estimation problem is addressed in this letter.
2 code implementations • CVPR 2022 • Renrui Zhang, Ziyu Guo, Wei zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, Hongsheng Li
On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D.
Ranked #3 on 3D Open-Vocabulary Instance Segmentation on STPLS3D
3D Open-Vocabulary Instance Segmentation • Few-Shot Learning
no code implementations • 4 Dec 2021 • Longtian Qiu, Renrui Zhang, Ziyu Guo, Ziyao Zeng, Zilu Guo, Yafeng Li, Guangnan Zhang
Contrastive Language-Image Pre-training (CLIP) has drawn increasing attention recently for its transferable visual representation learning.
1 code implementation • 19 Nov 2021 • Renrui Zhang, Ziyao Zeng, Ziyu Guo, Xinben Gao, Kexue Fu, Jianbo Shi
We reverse the conventional design of applying convolution on voxels and attention to points.
Ranked #36 on 3D Part Segmentation on ShapeNet-Part
no code implementations • 12 Oct 2021 • Huifeng Yao, Ziyu Guo, Yatao Zhang, Xiaomeng Li
This paper proposes a landmark detection network for detecting sutures in endoscopic pictures, which solves the problem of a variable number of suture points in the images.