Search Results for author: Chenfei Wu

Found 30 papers, 18 papers with code

Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification

1 code implementation • 5 Jun 2024 • Gexin Huang, Chenfei Wu, Mingjie Li, Xiaojun Chang, Ling Chen, Ying Sun, Shen Zhao, Xiaodan Liang, Liang Lin

(b) A knowledge association module that fuses linguistic and biomedical knowledge into gene priors by transformer-based graph representation learning, capturing the intrinsic relationships between different genes' mutations.

Binary Classification Graph Representation Learning +2

Paper
Code

LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models

1 code implementation • 3 Apr 2024 • Gabriela Ben Melech Stan, Estelle Aflalo, Raanan Yehezkel Rohekar, Anahita Bhiwandiwalla, Shao-Yen Tseng, Matthew Lyle Olson, Yaniv Gurwicz, Chenfei Wu, Nan Duan, Vasudev Lal

In this work, we present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models.

Language Modelling

Paper
Code

Using Left and Right Brains Together: Towards Vision and Language Planning

no code implementations • 16 Feb 2024 • Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, JianGuo Zhang

Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision masking capabilities on a variety of tasks.

Paper
Add Code

StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

no code implementations • 30 Jan 2024 • Zecheng Tang, Chenfei Wu, Zekai Zhang, Mingheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan

To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, while disrupting the model's ability to capture the true semantic representation of visual scenes.

Vector Graphics

Paper
Add Code

EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation

no code implementations • 12 Oct 2023 • Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan

In this paper, we propose a new framework called Evaluation-guided Iterative Plan Extraction for long-form narrative text generation (EIPE-text), which extracts plans from the corpus of narratives and utilizes the extracted plans to construct a better planner.

In-Context Learning Text Generation

Paper
Add Code

LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models

1 code implementation • 18 Sep 2023 • Zecheng Tang, Chenfei Wu, Juntao Li, Nan Duan

Graphic layout generation, a growing research field, plays a significant role in user engagement and information perception.

Code Completion Code Generation

125

Paper
Code

ORES: Open-vocabulary Responsible Visual Synthesis

1 code implementation • 26 Aug 2023 • Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan

In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avoid forbidden visual concepts while allowing users to input any desired content.

Image Generation Language Modelling

Paper
Code

GameEval: Evaluating LLMs on Conversational Games

1 code implementation • 19 Aug 2023 • Dan Qiao, Chenfei Wu, Yaobo Liang, Juntao Li, Nan Duan

In this paper, we propose GameEval, a novel approach to evaluating LLMs through goal-driven conversational games, overcoming the limitations of previous methods.

Question Answering

Paper
Code

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

1 code implementation • 16 Aug 2023 • Shengming Yin, Chenfei Wu, Jian Liang, Jie Shi, Houqiang Li, Gong Ming, Nan Duan

Our experiments validate the effectiveness of DragNUWA, demonstrating its superior performance in fine-grained control in video generation.

Trajectory Modeling Video Generation

131

Paper
Code

ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

1 code implementation • 31 May 2023 • Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan

With only 4M VLP data, ManagerTower achieves superior performances on various downstream VL tasks, especially 79. 15% accuracy on VQAv2 Test-Std, 86. 56% IR@1 and 95. 64% TR@1 on Flickr30K.

Representation Learning

Paper
Code

Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining

1 code implementation • 26 Apr 2023 • Bingqian Lin, Zicong Chen, Mingjie Li, Haokun Lin, Hang Xu, Yi Zhu, Jianzhuang Liu, Wenjia Cai, Lei Yang, Shen Zhao, Chenfei Wu, Ling Chen, Xiaojun Chang, Yi Yang, Lei Xing, Xiaodan Liang

In MOTOR, we combine two kinds of basic medical knowledge, i. e., general and specific knowledge, in a complementary manner to boost the general pretraining process.

Medical Visual Question Answering Question Answering +1

Paper
Code

Learning to Plan with Natural Language

1 code implementation • 20 Apr 2023 • Yiduo Guo, Yaobo Liang, Chenfei Wu, Wenshan Wu, Dongyan Zhao, Nan Duan

To obtain it, we propose the Learning to Plan method, which involves two phases: (1) In the first learning task plan phase, it iteratively updates the task plan with new step-by-step solutions and behavioral instructions, which are obtained by prompting LLMs to derive from training error feedback.

Transfer Learning

Paper
Code

Low-code LLM: Graphical User Interface over Large Language Models

2 code implementations • 17 Apr 2023 • Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan, Furu Wei

By introducing this framework, we aim to bridge the gap between humans and LLMs, enabling more effective and efficient utilization of LLMs for complex tasks.

Prompt Engineering

34,529

Paper
Code

TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

no code implementations • 29 Mar 2023 • Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan

On the other hand, there are also many existing models and systems (symbolic-based or neural-based) that can do some domain-specific tasks very well.

Code Generation Common Sense Reasoning +1

Paper
Add Code

NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

no code implementations • 22 Mar 2023 • Shengming Yin, Chenfei Wu, Huan Yang, JianFeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan

In this paper, we propose NUWA-XL, a novel Diffusion over Diffusion architecture for eXtremely Long video generation.

Video Generation

Paper
Add Code

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

3 code implementations • 8 Mar 2023 • Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan

To this end, We build a system called \textbf{Visual ChatGPT}, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only languages but also images 2) providing complex visual questions or visual editing instructions that require the collaboration of multiple AI models with multi-steps.

34,529

Paper
Code

Learning 3D Photography Videos via Self-supervised Diffusion on Single Images

no code implementations • 21 Feb 2023 • Xiaodong Wang, Chenfei Wu, Shengming Yin, Minheng Ni, JianFeng Wang, Linjie Li, Zhengyuan Yang, Fan Yang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

3D photography renders a static image into a video with appealing 3D visual effects.

Ranked #1 on Image Outpainting on MSCOCO

Image Outpainting Monocular Depth Estimation

Paper
Add Code

ReCo: Region-Controlled Text-to-Image Generation

no code implementations • CVPR 2023 • Zhengyuan Yang, JianFeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

Human evaluation on PaintSkill shows that ReCo is +19. 28% and +17. 21% more accurate in generating images with correct object count and spatial relationship than the T2I model.

Ranked #2 on Conditional Text-to-Image Synthesis on COCO-MIG

Conditional Text-to-Image Synthesis Position

Paper
Add Code

HORIZON: High-Resolution Semantically Controlled Panorama Synthesis

no code implementations • 10 Oct 2022 • Kun Yan, Lei Ji, Chenfei Wu, Jian Liang, Ming Zhou, Nan Duan, Shuai Ma

Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds.

Vocal Bursts Intensity Prediction

Paper
Add Code

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

1 code implementation • 20 Jul 2022 • Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, JianFeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos.

Ranked #1 on Image Outpainting on LHQC

Image Outpainting Text-to-Image Generation +1

2,796

Paper
Code

BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning

1 code implementation • 17 Jun 2022 • Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan

Vision-Language (VL) models with the Two-Tower architecture have dominated visual-language representation learning in recent years.

Representation Learning

141

Paper
Code

DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder

no code implementations • 1 Jun 2022 • Jie Shi, Chenfei Wu, Jian Liang, Xiang Liu, Nan Duan

Our work proposes a VQ-VAE architecture model with a diffusion decoder (DiVAE) to work as the reconstructing component in image synthesis.

Decoder Denoising +1

Paper
Add Code

VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers

1 code implementation • CVPR 2022 • Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal

Breakthroughs in transformer-based models have revolutionized not only the NLP field, but also vision and multimodal systems.

Question Answering Visual Commonsense Reasoning +1

Paper
Code

NÜWA-LIP: Language Guided Image Inpainting with Defect-free VQGAN

no code implementations • 10 Feb 2022 • Minheng Ni, Chenfei Wu, Haoyang Huang, Daxin Jiang, WangMeng Zuo, Nan Duan

Language guided image inpainting aims to fill in the defective regions of an image under the guidance of text while keeping non-defective regions unchanged.

Image Inpainting

Paper
Add Code

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

1 code implementation • 24 Nov 2021 • Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan

To cover language, image, and video at the same time for different scenarios, a 3D transformer encoder-decoder framework is designed, which can not only deal with videos as 3D data but also adapt to texts and images as 1D and 2D data, respectively.

Ranked #1 on Text-to-Video Generation on Kinetics

Decoder Text-to-Image Generation +3

535

Paper
Code

KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation

1 code implementation • Findings (NAACL) 2022 • Yongfei Liu, Chenfei Wu, Shao-Yen Tseng, Vasudev Lal, Xuming He, Nan Duan

Self-supervised vision-and-language pretraining (VLP) aims to learn transferable multi-modal representations from large-scale image-text data and to achieve strong performances on a broad scope of vision-language tasks after finetuning.

Knowledge Distillation Object +1

Paper
Code

GEM: A General Evaluation Benchmark for Multimodal Tasks

1 code implementation • Findings (ACL) 2021 • Lin Su, Nan Duan, Edward Cui, Lei Ji, Chenfei Wu, Huaishao Luo, Yongfei Liu, Ming Zhong, Taroon Bharti, Arun Sacheti

Comparing with existing multimodal datasets such as MSCOCO and Flicker30K for image-language tasks, YouCook2 and MSR-VTT for video-language tasks, GEM is not only the largest vision-language dataset covering image-language tasks and video-language tasks at the same time, but also labeled in multiple languages.

Paper
Code

GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions

1 code implementation • 30 Apr 2021 • Chenfei Wu, Lun Huang, Qianxi Zhang, Binyang Li, Lei Ji, Fan Yang, Guillermo Sapiro, Nan Duan

Generating videos from text is a challenging task due to its high computational requirements for training and infinite possible answers for evaluation.

Ranked #17 on Text-to-Video Generation on MSR-VTT (CLIPSIM metric)

Text-to-Video Generation Video Generation

Paper
Code

Deep Reason: A Strong Baseline for Real-World Visual Reasoning

no code implementations • 24 May 2019 • Chenfei Wu, Yanzhao Zhou, Gen Li, Nan Duan, Duyu Tang, Xiaojie Wang

This paper presents a strong baseline for real-world visual reasoning (GQA), which achieves 60. 93% in GQA 2019 challenge and won the sixth place.

Visual Reasoning

Paper
Add Code

Chain of Reasoning for Visual Question Answering

no code implementations • NeurIPS 2018 • Chenfei Wu, Jinlai Liu, Xiaojie Wang, Xuan Dong

A chain of reasoning (CoR) is constructed for supporting multi-step and dynamic reasoning on changed relations and objects.

Object Question Answering +3

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.