no code implementations • 2 Mar 2024 • Shuo Yang, Zirui Shang, Yongqi Wang, Derong Deng, Hongwei Chen, Qiyuan Cheng, Xinxiao Wu
This paper proposes a novel data-free framework for multi-label image recognition that requires no training data: it uses knowledge from a pre-trained Large Language Model (LLM) to learn prompts that adapt a pre-trained Vision-Language Model (VLM) such as CLIP to multi-label classification.
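The adaptation idea can be illustrated with a minimal zero-shot multi-label scoring sketch: given an image embedding and one text embedding per class (as produced by CLIP-style encoders), each label is scored independently by cosine similarity and thresholded, rather than taking a single softmax over classes. The embeddings and threshold below are toy stand-ins, not the paper's LLM-learned prompts.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def multilabel_predict(image_emb, class_embs, threshold=0.5):
    """Score every class independently and keep all labels above threshold.

    class_embs: dict mapping label name -> text embedding (e.g. an encoded
    prompt such as "a photo of a dog").  Multi-label classification differs
    from the usual single-label zero-shot setup in that no softmax/argmax
    is taken over classes: any number of labels may fire.
    """
    scores = {name: cosine(image_emb, emb) for name, emb in class_embs.items()}
    labels = [name for name, s in scores.items() if s > threshold]
    return labels, scores
```

With an image embedding close to both "dog" and "cat" prototypes, both labels are returned, which a softmax-based single-label classifier could not do.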
no code implementations • 25 May 2023 • Sitian Shen, Zilin Zhu, Linqian Fan, Harry Zhang, Xinxiao Wu
Large pre-trained models have had a significant impact on computer vision by enabling multi-modal learning; among them, the CLIP model has achieved impressive results in image classification, object detection, and semantic segmentation.
no code implementations • CVPR 2023 • Jin Chen, Zhi Gao, Xinxiao Wu, Jiebo Luo
Under this paradigm, we propose a meta-causal learning method to learn meta-knowledge, that is, how to infer the causes of domain shift between the auxiliary and source domains during training.
Ranked #1 on Single-Source Domain Generalization on PACS
no code implementations • 11 Dec 2022 • Shitong Shao, Huanran Chen, Zhen Huang, Linrui Gong, Shuai Wang, Xinxiao Wu
To be specific, we design a neural network-based data augmentation module with a priori bias, which helps find samples that play to the teacher's strengths but the student's weaknesses, by learning magnitudes and probabilities to generate suitable data samples.
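The magnitude/probability parameterization can be sketched as follows: each augmentation operation carries its own application probability and strength. In the paper these values are learned by a neural module; here they are fixed toy values, and `add_noise` is an illustrative stand-in for a real augmentation op.

```python
import random

def augment(sample, ops, rng):
    """Apply each augmentation op stochastically.

    ops: list of (fn, probability, magnitude) triples.  Each op fires with
    its own probability and is parameterized by its own magnitude -- the
    two quantities that the paper's module learns per operation.
    """
    out = list(sample)
    for fn, prob, mag in ops:
        if rng.random() < prob:
            out = fn(out, mag)
    return out

def add_noise(x, magnitude):
    # Toy op: shift every feature by the magnitude (a real op would
    # perturb images, e.g. brightness or cutout, scaled by magnitude).
    return [v + magnitude for v in x]

rng = random.Random(0)                       # seeded for reproducibility
aug = augment([1.0, 2.0], [(add_noise, 1.0, 0.5)], rng)
```

Making `prob` and `mag` differentiable parameters (e.g. via relaxation tricks) is what lets such a module be trained jointly with the distillation objective.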
no code implementations • 22 Nov 2022 • Yuheng Shi, Xinxiao Wu, Hanxi Lin
Few-shot action recognition in videos is challenging for its lack of supervision and difficulty in generalizing to unseen actions.
1 code implementation • 18 Sep 2022 • Huanran Chen, Shitong Shao, Ziyi Wang, Zirui Shang, Jin Chen, Xiaofeng Ji, Xinxiao Wu
Domain generalization aims to learn a model that generalizes well to an unseen test dataset, i.e., out-of-distribution data whose distribution differs from that of the training dataset.
1 code implementation • 12 May 2022 • Shuo Yang, Xinxiao Wu
Language-driven action localization in videos is a challenging task that involves not only visual-linguistic matching but also action boundary prediction.
no code implementations • NeurIPS 2021 • Wentian Zhao, Xinxiao Wu, Jiebo Luo
To this end, we propose a novel video captioning method that generates a sentence by first constructing a multi-modal dependency tree and then traversing the constructed tree, where the syntactic structure and semantic relationship in the sentence are represented by the tree topology.
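The tree-traversal step can be illustrated with a minimal sketch: once a dependency tree is built, the sentence is recovered by an in-order traversal that emits left dependents, then the head, then right dependents. The dictionary-of-offsets data structure below is a simplification for illustration, not the paper's multi-modal tree.

```python
def linearize(tree, root):
    """Generate a word sequence by traversing a dependency tree.

    tree: dict mapping each head word to a list of (offset, dependent)
    pairs; dependents with a negative offset precede the head, the rest
    follow it.  Word order is thus encoded entirely in the tree topology.
    """
    deps = sorted(tree.get(root, []))
    left = [d for off, d in deps if off < 0]
    right = [d for off, d in deps if off >= 0]
    words = []
    for d in left:
        words += linearize(tree, d)   # recurse into left subtrees
    words.append(root)                # emit the head itself
    for d in right:
        words += linearize(tree, d)   # recurse into right subtrees
    return words
```

For example, a tree rooted at "chases" with left dependent "dog" (itself modified by "the") and right dependent "cat" (modified by "a") linearizes to "the dog chases a cat".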
no code implementations • 26 Jul 2021 • Wentian Zhao, Yao Hu, HeDa Wang, Xinxiao Wu, Jiebo Luo
Entity-aware image captioning aims to describe named entities and events related to the image by utilizing the background knowledge in the associated article.
no code implementations • 25 Jul 2021 • Hanxi Lin, Xinxiao Wu, Jiebo Luo
It inherits the operators and parameters of the original layer but differs slightly in how those operators and parameters are used.
no code implementations • 2 Sep 2020 • Jingyi Hou, Yunde Jia, Xinxiao wu, Yayun Qi
Through traversing the dependency trees, the sentences are generated to train the captioning model.
1 code implementation • AAAI 2020 • Wentian Zhao, Xinxiao Wu, Xiaoxun Zhang
Generating stylized captions for images is a challenging task since it requires not only describing the content of the image accurately but also expressing the desired linguistic style appropriately.
Ranked #2 on Image Captioning on FlickrStyle10K
no code implementations • 4 Jun 2019 • Jingyi Hou, Xinxiao Wu, Yayun Qi, Wentian Zhao, Jiebo Luo, Yunde Jia
Extensive experiments on the MS-COCO image captioning benchmark and the MSVD video captioning benchmark validate the superiority of our method on leveraging prior commonsense knowledge to enhance relational reasoning for visual captioning.
no code implementations • 10 May 2019 • Jin Chen, Xinxiao Wu, Lixin Duan, Shenghua Gao
In this more general and practical scenario, a major challenge is how to select source instances in the shared classes across different domains for positive transfer.
no code implementations • 11 May 2018 • Feiwu Yu, Xinxiao Wu, Yuchao Sun, Lixin Duan
By taking advantage of these two-level adversarial learning, our method is capable of learning a domain-invariant feature representation of source images and target videos.
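The adversarial objective behind such domain-invariant learning can be sketched with the standard minimax losses: a discriminator is trained to tell source from target features, while the feature extractor is trained to fool it. The functions below compute only the scalar losses from discriminator outputs; the two-level (image/video) structure and the networks themselves are omitted, so this is a generic sketch rather than the paper's exact formulation.

```python
import math

def discriminator_loss(d_source, d_target):
    """Binary cross-entropy for the domain discriminator:
    push D(source feature) -> 1 and D(target feature) -> 0."""
    return -(math.log(d_source) + math.log(1.0 - d_target))

def feature_extractor_loss(d_target):
    """Adversarial term for the feature extractor: make target-video
    features indistinguishable from source-image features, i.e.
    push D(target feature) -> 1."""
    return -math.log(d_target)
```

At the adversarial equilibrium the discriminator outputs 0.5 for both domains, i.e. the features carry no domain information; in practice this min-max game is often implemented with a gradient reversal layer.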