no code implementations • 2 Mar 2024 • Shuo Yang, Zirui Shang, Yongqi Wang, Derong Deng, Hongwei Chen, Qiyuan Cheng, Xinxiao Wu
This paper proposes a novel data-free framework for multi-label image recognition that requires no training data: it uses knowledge from a pre-trained Large Language Model (LLM) to learn prompts that adapt a pre-trained Vision-Language Model (VLM) such as CLIP to multi-label classification.
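The adaptation idea can be illustrated with a minimal zero-shot multi-label scoring sketch: given an image embedding and one text embedding per class (as produced by CLIP-style encoders), each label is scored independently by cosine similarity and thresholded, rather than taking a single softmax over classes. The embeddings and threshold below are toy stand-ins, not the paper's LLM-learned prompts.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def multilabel_predict(image_emb, class_embs, threshold=0.5):
    """Score every class independently and keep all labels above threshold.

    class_embs: dict mapping label name -> text embedding (e.g. an encoded
    prompt such as "a photo of a dog").  Multi-label classification differs
    from the usual single-label zero-shot setup in that no softmax/argmax
    is taken over classes: any number of labels may fire.
    """
    scores = {name: cosine(image_emb, emb) for name, emb in class_embs.items()}
    labels = [name for name, s in scores.items() if s > threshold]
    return labels, scores
```

With an image embedding close to both "dog" and "cat" prototypes, both labels are returned, which a softmax-based single-label classifier could not do.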
no code implementations • 25 May 2023 • Sitian Shen, Zilin Zhu, Linqian Fan, Harry Zhang, Xinxiao Wu
Large pre-trained models have had a significant impact on computer vision by enabling multi-modal learning; among them, the CLIP model has achieved impressive results in image classification, object detection, and semantic segmentation.
no code implementations • CVPR 2023 • Jin Chen, Zhi Gao, Xinxiao Wu, Jiebo Luo
Under this paradigm, we propose a meta-causal learning method to learn meta-knowledge, that is, how to infer the causes of domain shift between the auxiliary and source domains during training.
Ranked #1 on Single-Source Domain Generalization on PACS
no code implementations • 11 Dec 2022 • Shitong Shao, Huanran Chen, Zhen Huang, Linrui Gong, Shuai Wang, Xinxiao Wu
To be specific, we design a neural network-based data augmentation module with a priori bias, which helps find samples that play to the teacher's strengths but the student's weaknesses, by learning magnitudes and probabilities to generate suitable data samples.
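The magnitude/probability parameterization can be sketched as follows: each augmentation operation carries its own application probability and strength. In the paper these values are learned by a neural module; here they are fixed toy values, and `add_noise` is an illustrative stand-in for a real augmentation op.

```python
import random

def augment(sample, ops, rng):
    """Apply each augmentation op stochastically.

    ops: list of (fn, probability, magnitude) triples.  Each op fires with
    its own probability and is parameterized by its own magnitude -- the
    two quantities that the paper's module learns per operation.
    """
    out = list(sample)
    for fn, prob, mag in ops:
        if rng.random() < prob:
            out = fn(out, mag)
    return out

def add_noise(x, magnitude):
    # Toy op: shift every feature by the magnitude (a real op would
    # perturb images, e.g. brightness or cutout, scaled by magnitude).
    return [v + magnitude for v in x]

rng = random.Random(0)                       # seeded for reproducibility
aug = augment([1.0, 2.0], [(add_noise, 1.0, 0.5)], rng)
```

Making `prob` and `mag` differentiable parameters (e.g. via relaxation tricks) is what lets such a module be trained jointly with the distillation objective.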
no code implementations • 22 Nov 2022 • Yuheng Shi, Xinxiao Wu, Hanxi Lin
Few-shot action recognition in videos is challenging for its lack of supervision and difficulty in generalizing to unseen actions.
1 code implementation • 18 Sep 2022 • Huanran Chen, Shitong Shao, Ziyi Wang, Zirui Shang, Jin Chen, Xiaofeng Ji, Xinxiao Wu
Domain generalization aims to learn a model that generalizes well to an unseen test dataset, i.e., out-of-distribution data whose distribution differs from that of the training dataset.
1 code implementation • 12 May 2022 • Shuo Yang, Xinxiao Wu
Language-driven action localization in videos is a challenging task that involves not only visual-linguistic matching but also action boundary prediction.
no code implementations • NeurIPS 2021 • Wentian Zhao, Xinxiao Wu, Jiebo Luo
To this end, we propose a novel video captioning method that generates a sentence by first constructing a multi-modal dependency tree and then traversing the constructed tree, where the syntactic structure and semantic relationship in the sentence are represented by the tree topology.
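The tree-traversal step can be illustrated with a minimal sketch: once a dependency tree is built, the sentence is recovered by an in-order traversal that emits left dependents, then the head, then right dependents. The dictionary-of-offsets data structure below is a simplification for illustration, not the paper's multi-modal tree.

```python
def linearize(tree, root):
    """Generate a word sequence by traversing a dependency tree.

    tree: dict mapping each head word to a list of (offset, dependent)
    pairs; dependents with a negative offset precede the head, the rest
    follow it.  Word order is thus encoded entirely in the tree topology.
    """
    deps = sorted(tree.get(root, []))
    left = [d for off, d in deps if off < 0]
    right = [d for off, d in deps if off >= 0]
    words = []
    for d in left:
        words += linearize(tree, d)   # recurse into left subtrees
    words.append(root)                # emit the head itself
    for d in right:
        words += linearize(tree, d)   # recurse into right subtrees
    return words
```

For example, a tree rooted at "chases" with left dependent "dog" (itself modified by "the") and right dependent "cat" (modified by "a") linearizes to "the dog chases a cat".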
no code implementations • 26 Jul 2021 • Wentian Zhao, Yao Hu, HeDa Wang, Xinxiao Wu, Jiebo Luo
Entity-aware image captioning aims to describe named entities and events related to the image by utilizing the background knowledge in the associated article.
no code implementations • 25 Jul 2021 • Hanxi Lin, Xinxiao Wu, Jiebo Luo
It inherits the operators and parameters of the original layer but differs slightly in how those operators and parameters are used.
no code implementations • 2 Sep 2020 • Jingyi Hou, Yunde Jia, Xinxiao wu, Yayun Qi
Through traversing the dependency trees, the sentences are generated to train the captioning model.
1 code implementation • AAAI 2020 • Wentian Zhao, Xinxiao Wu, Xiaoxun Zhang
Generating stylized captions for images is a challenging task since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately.
Ranked #2 on Image Captioning on FlickrStyle10K
no code implementations • 4 Jun 2019 • Jingyi Hou, Xinxiao Wu, Yayun Qi, Wentian Zhao, Jiebo Luo, Yunde Jia
Extensive experiments on the MS-COCO image captioning benchmark and the MSVD video captioning benchmark validate the superiority of our method on leveraging prior commonsense knowledge to enhance relational reasoning for visual captioning.
no code implementations • 10 May 2019 • Jin Chen, Xinxiao Wu, Lixin Duan, Shenghua Gao
In this more general and practical scenario, a major challenge is how to select source instances in the shared classes across different domains for positive transfer.
no code implementations • 11 May 2018 • Feiwu Yu, Xinxiao Wu, Yuchao Sun, Lixin Duan
By taking advantage of these two-level adversarial learning, our method is capable of learning a domain-invariant feature representation of source images and target videos.
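The adversarial objective behind such domain-invariant learning can be sketched with the standard minimax losses: a discriminator is trained to tell source from target features, while the feature extractor is trained to fool it. The functions below compute only the scalar losses from discriminator outputs; the two-level (image/video) structure and the networks themselves are omitted, so this is a generic sketch rather than the paper's exact formulation.

```python
import math

def discriminator_loss(d_source, d_target):
    """Binary cross-entropy for the domain discriminator:
    push D(source feature) -> 1 and D(target feature) -> 0."""
    return -(math.log(d_source) + math.log(1.0 - d_target))

def feature_extractor_loss(d_target):
    """Adversarial term for the feature extractor: make target-video
    features indistinguishable from source-image features, i.e.
    push D(target feature) -> 1."""
    return -math.log(d_target)
```

At the adversarial equilibrium the discriminator outputs 0.5 for both domains, i.e. the features carry no domain information; in practice this min-max game is often implemented with a gradient reversal layer.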