Search Results for author: Zikai Song

Found 10 papers, 5 papers with code

Coupled Mamba: Enhanced Multi-modal Fusion with Coupled State Space Model

no code implementations • 28 May 2024 • Wenbing Li, Hang Zhou, Junqing Yu, Zikai Song, Wei Yang

However, fusing multiple modalities is challenging for SSMs due to their hardware-aware parallelism designs.

DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception

no code implementations • 24 May 2024 • Run Luo, Yunshui Li, Longze Chen, Wanwei He, Ting-En Lin, Ziqiang Liu, Lei Zhang, Zikai Song, Xiaobo Xia, Tongliang Liu, Min Yang, Binyuan Hui

Delving into the modeling capabilities of diffusion models for images naturally prompts the question: Can diffusion models serve as the eyes of large language models for image perception?

Hallucination

EfficientGS: Streamlining Gaussian Splatting for Large-Scale High-Resolution Scene Representation

no code implementations • 19 Apr 2024 • Wenkai Liu, Tao Guan, Bin Zhu, Lili Ju, Zikai Song, Dan Li, Yuesong Wang, Wei Yang

In the domain of 3D scene representation, 3D Gaussian Splatting (3DGS) has emerged as a pivotal technology.

AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion

no code implementations • 20 Dec 2023 • Beibei Jing, Youjia Zhang, Zikai Song, Junqing Yu, Wei Yang

Generating realistic human motion sequences from text descriptions is a challenging task that requires capturing the rich expressiveness of both natural language and human motion. Recent advances in diffusion models have enabled significant progress in human motion synthesis. However, existing methods struggle to handle text inputs that describe complex or long motions. In this paper, we propose the Adaptable Motion Diffusion (AMD) model, which leverages a Large Language Model (LLM) to parse the input text into a sequence of concise and interpretable anatomical scripts that correspond to the target motion. This process exploits the LLM's ability to provide anatomical guidance for complex motion synthesis. We then devise a two-branch fusion scheme that balances the influence of the input text and the anatomical scripts on the inverse diffusion process, which adaptively ensures the semantic fidelity and diversity of the synthesized motion. Our method can effectively handle texts with complex or long motion descriptions, where existing methods often fail.

Language Modelling, Large Language Model

Optimized View and Geometry Distillation from Multi-view Diffuser

no code implementations • 11 Dec 2023 • Youjia Zhang, Zikai Song, Junqing Yu, Yawei Luo, Wei Yang

We leverage the rendered views from the optimized radiance field as the basis and develop a two-step specialization process of a 2D diffusion model, which is adept at conducting object-specific denoising and generating high-quality multi-view images.

Denoising

Fine-grained Appearance Transfer with Diffusion Models

1 code implementation • 27 Nov 2023 • Yuteng Ye, Guanwen Li, Hang Zhou, Jiale Cai, Junqing Yu, Yawei Luo, Zikai Song, Qilong Xing, Youjia Zhang, Wei Yang

A pivotal aspect of our approach is the strategic use of the predicted $x_0$ space by diffusion models within the latent space of diffusion processes.

Image-to-Image Translation

Progressive Text-to-Image Diffusion with Soft Latent Direction

1 code implementation • 18 Sep 2023 • Yuteng Ye, Jiale Cai, Hang Zhou, Guanwen Li, Youjia Zhang, Zikai Song, Chenxing Gao, Junqing Yu, Wei Yang

In spite of the rapidly evolving landscape of text-to-image generation, the synthesis and manipulation of multiple entities while adhering to specific relational constraints pose enduring challenges.

Language Modelling, Large Language Model (+1 more)

DiffusionTrack: Diffusion Model For Multi-Object Tracking

1 code implementation • 19 Aug 2023 • Run Luo, Zikai Song, Lintao Ma, JinLin Wei, Wei Yang, Min Yang

In inference, the model refines a set of paired randomly generated boxes to the detection and tracking results in a flexible one-step or multi-step denoising diffusion process.

Denoising, Multi-Object Tracking (+3 more)

Compact Transformer Tracker with Correlative Masked Modeling

1 code implementation • 26 Jan 2023 • Zikai Song, Run Luo, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang

The Transformer framework has shown superior performance in visual object tracking, owing to its strength in aggregating information across the template and search image through the well-known attention mechanism.

Decoder, Visual Object Tracking

Transformer Tracking with Cyclic Shifting Window Attention

1 code implementation • CVPR 2022 • Zikai Song, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang

The Transformer architecture has shown great strength in visual object tracking, thanks to its effective attention mechanism.

Object, Visual Object Tracking
