1 code implementation • 27 May 2024 • Zejun Li, Ruipu Luo, Jiwen Zhang, Minghui Qiu, Zhongyu Wei
While large multi-modal models (LMMs) have exhibited impressive capabilities across diverse tasks, their effectiveness in handling complex tasks has been limited by the prevailing single-step reasoning paradigm.
1 code implementation • 2 Apr 2024 • Mengfei Du, Binhao Wu, Jiwen Zhang, Zhihao Fan, Zejun Li, Ruipu Luo, Xuanjing Huang, Zhongyu Wei
To complete the task, the agent needs to align and integrate various navigation modalities, including instructions, observations, and navigation history.
1 code implementation • 5 Mar 2024 • Jiwen Zhang, Jihao Wu, Yihua Teng, Minghui Liao, Nuo Xu, Xiao Xiao, Zhongyu Wei, Duyu Tang
To address this, this work presents Chain-of-Action-Thought (dubbed CoAT), which accounts for descriptions of previous actions, the current screen, and, more importantly, reasoning about which actions should be performed and the outcomes the chosen action would lead to.
1 code implementation • 4 Oct 2023 • Zejun Li, Ye Wang, Mengfei Du, Qingwen Liu, Binhao Wu, Jiwen Zhang, Chengxing Zhou, Zhihao Fan, Jie Fu, Jingjing Chen, Xuanjing Huang, Zhongyu Wei
Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs).
no code implementations • 16 Jul 2023 • Ruipu Luo, Jiwen Zhang, Zhongyu Wei
Vision-language decision making (VLDM) is a challenging multimodal task.
no code implementations • NeurIPS 2021 • Jiwen Zhang, Zhongyu Wei, Jianqing Fan, Jiajie Peng
Vision-and-Language Navigation (VLN) is a task in which an agent navigates an embodied indoor environment under human instructions.