Search Results for author: Xing Sun

Found 63 papers, 46 papers with code

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

no code implementations • 31 May 2024 • Chaoyou Fu, Yuhan Dai, Yondong Luo, Lei LI, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun

With Video-MME, we extensively evaluate various state-of-the-art MLLMs, including GPT-4 series and Gemini 1.5 Pro, as well as open-source image models like InternVL-Chat-V1.5 and video models like LLaVA-NeXT-Video.

Paper

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

no code implementations • 24 Apr 2024 • Timin Gao, Peixian Chen, Mengdan Zhang, Chaoyou Fu, Yunhang Shen, Yan Zhang, Shengchuan Zhang, Xiawu Zheng, Xing Sun, Liujuan Cao, Rongrong Ji

This paper delves into the realm of multimodal CoT to solve intricate visual reasoning tasks with multimodal large language models (MLLMs) and their cognitive capability.

Decision Making Logical Reasoning +1

Paper

HRVDA: High-Resolution Visual Document Assistant

no code implementations • 10 Apr 2024 • Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu

In addition, we construct a document-oriented visual instruction tuning dataset and apply a multi-stage training strategy to enhance the model's document modeling capabilities.

document understanding

Paper

A General and Efficient Training for Transformer via Token Expansion

1 code implementation • 31 Mar 2024 • Wenxuan Huang, Yunhang Shen, Jiao Xie, Baochang Zhang, Gaoqi He, Ke Li, Xing Sun, Shaohui Lin

The remarkable performance of Vision Transformers (ViTs) typically requires an extremely large training cost.

Paper
Code

RESTORE: Towards Feature Shift for Vision-Language Prompt Learning

1 code implementation • 10 Mar 2024 • Yuncheng Yang, Chuyan Zhang, Zuopeng Yang, Yuting Gao, Yulei Qin, Ke Li, Xing Sun, Jie Yang, Yun Gu

Prompt learning is effective for fine-tuning foundation models to improve their generalization across a variety of downstream tasks.

Paper
Code

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

no code implementations • 29 Feb 2024 • Xin Li, Yunfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun

It shows that contrastive learning between the visual holistic representations and the multimodal fine-grained features of document objects can assist the vision encoder in acquiring more effective visual cues, thereby enhancing the comprehension of text-rich documents in LVLMs.

Contrastive Learning document understanding

Paper

Sinkhorn Distance Minimization for Knowledge Distillation

1 code implementation • 27 Feb 2024 • Xiao Cui, Yulei Qin, Yuting Gao, Enwei Zhang, Zihan Xu, Tong Wu, Ke Li, Xing Sun, Wengang Zhou, Houqiang Li

We propose the Sinkhorn Knowledge Distillation (SinKD) that exploits the Sinkhorn distance to ensure a nuanced and precise assessment of the disparity between teacher and student distributions.

Decoder Knowledge Distillation

106

Paper
Code

FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema

1 code implementation • 19 Feb 2024 • Junru Lu, Siyu An, Min Zhang, Yulan He, Di Yin, Xing Sun

In the quest to make the deep intelligence of Large Language Models (LLMs) accessible in end-user interactions with bots, the art of prompt crafting emerges as a critical yet complex task for the average user.

Paper
Code

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

2 code implementations • 19 Dec 2023 • Chaoyou Fu, Renrui Zhang, Zihan Wang, Yubo Huang, Zhengye Zhang, Longtian Qiu, Gaoxiang Ye, Yunhang Shen, Mengdan Zhang, Peixian Chen, Sirui Zhao, Shaohui Lin, Deqiang Jiang, Di Yin, Peng Gao, Ke Li, Hongsheng Li, Xing Sun

They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks.

Visual Reasoning

9,901

Paper
Code

MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL

1 code implementation • 18 Dec 2023 • Bing Wang, Changyu Ren, Jian Yang, Xinnian Liang, Jiaqi Bai, Linzheng Chai, Zhao Yan, Qian-Wen Zhang, Di Yin, Xing Sun, Zhoujun Li

Our framework comprises a core decomposer agent for Text-to-SQL generation with few-shot chain-of-thought reasoning, accompanied by two auxiliary agents that utilize external tools or models to acquire smaller sub-databases and refine erroneous SQL queries.

Ranked #5 on Text-To-SQL on BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)

SQL Parsing Text-To-SQL

122

Paper
Code

SPD-DDPM: Denoising Diffusion Probabilistic Models in the Symmetric Positive Definite Space

1 code implementation • 13 Dec 2023 • Yunchen Li, Zhou Yu, Gaoqi He, Yunhang Shen, Ke Li, Xing Sun, Shaohui Lin

On the other hand, the model unconditionally learns the probability distribution of the data $p(X)$ and generates samples that conform to this distribution.

Denoising Traffic Prediction

Paper
Code

MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples

no code implementations • 11 Dec 2023 • Tao Chen, Enwei Zhang, Yuting Gao, Ke Li, Xing Sun, Yan Zhang, Hui Li

Although In-Context Learning (ICL) brings remarkable performance gains to Large Language Models (LLMs), the improvements remain lower than fine-tuning on downstream tasks.

In-Context Learning

Paper

Aligning and Prompting Everything All at Once for Universal Visual Perception

2 code implementations • 4 Dec 2023 • Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, Shaohui Lin, Rongrong Ji

However, predominant paradigms, driven by casting instance-level tasks as an object-word alignment, bring heavy cross-modality interaction, which is not effective in prompting object detection and visual grounding.

Object object-detection +6

445

Paper
Code

Towards Robust Text Retrieval with Progressive Learning

1 code implementation • 20 Nov 2023 • Tong Wu, Yulei Qin, Enwei Zhang, Zihan Xu, Yuting Gao, Ke Li, Xing Sun

However, existing embedding models for text retrieval usually have three non-negligible limitations.

Machine Reading Comprehension Question Answering +2

Paper
Code

Woodpecker: Hallucination Correction for Multimodal Large Language Models

1 code implementation • 24 Oct 2023 • Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen

Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content.

Hallucination

562

Paper
Code

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration

no code implementations • ICCV 2023 • Haoyu Cao, Changcun Bao, Chaohu Liu, Huang Chen, Kun Yin, Hao Liu, Yinsong Liu, Deqiang Jiang, Xing Sun

We propose a novel end-to-end document understanding model called SeRum (SElective Region Understanding Model) for extracting meaningful information from document images, including document analysis, retrieval, and office automation.

Decoder document understanding +2

Paper

Unified and Dynamic Graph for Temporal Character Grouping in Long Videos

no code implementations • 27 Aug 2023 • Xiujun Shu, Wei Wen, Liangsheng Xu, Mingbao Lin, Ruizhi Qiao, Taian Guo, Hanjun Li, Bei Gan, Xiao Wang, Xing Sun

In this paper, we present a unified and dynamic graph (UniDG) framework for temporal character grouping.

Clustering Graph Clustering

Paper

Turning a CLIP Model into a Scene Text Spotter

1 code implementation • 21 Aug 2023 • Wenwen Yu, Yuliang Liu, Xingkui Zhu, Haoyu Cao, Xing Sun, Xiang Bai

Utilizing only 10% of the supervised data, FastTCM-CR50 improves performance by an average of 26.5% and 5.5% for text detection and spotting tasks, respectively.

object-detection Object Detection +3

160

Paper
Code

MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation

1 code implementation • 16 Aug 2023 • Junru Lu, Siyu An, Mingbao Lin, Gabriele Pergola, Yulan He, Di Yin, Xing Sun, Yunsheng Wu

We propose MemoChat, a pipeline for refining instructions that enables large language models (LLMs) to effectively employ self-composed memos for maintaining consistent long-range open-domain conversations.

Memorization Retrieval

Paper
Code

D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation

1 code implementation • ICCV 2023 • Hanjun Li, Xiujun Shu, Sunan He, Ruizhi Qiao, Wei Wen, Taian Guo, Bei Gan, Xing Sun

Under this setup, we propose a Dynamic Gaussian prior based Grounding framework with Glance annotation (D3G), which consists of a Semantic Alignment Group Contrastive Learning module (SA-GCL) and a Dynamic Gaussian prior Adjustment module (DGA).

Ranked #10 on Temporal Sentence Grounding on Charades-STA

Contrastive Learning Sentence +1

Paper
Code

Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval

1 code implementation • ICCV 2023 • Yunquan Zhu, Xinkai Gao, Bo Ke, Ruizhi Qiao, Xing Sun

Image retrieval aims to find images in a database that are visually similar to the query image.

Image Retrieval Retrieval

Paper
Code

A Survey on Multimodal Large Language Models

1 code implementation • 23 Jun 2023 • Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen

Recently, the Multimodal Large Language Model (MLLM), represented by GPT-4V, has become a rising research hotspot; it uses powerful Large Language Models (LLMs) as a brain to perform multimodal tasks.

Hallucination In-Context Learning +5

9,901

Paper
Code

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

3 code implementations • 23 Jun 2023 • Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, Yunsheng Wu, Rongrong Ji

Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks, showing amazing emergent abilities in recent studies, such as writing poems based on an image.

Benchmarking Language Modelling +3

9,901

Paper
Code

Looking and Listening: Audio Guided Text Recognition

1 code implementation • 6 Jun 2023 • Wenwen Yu, MingYu Liu, Biao Yang, Enming Zhang, Deqiang Jiang, Xing Sun, Yuliang Liu, Xiang Bai

Text recognition in the wild is a long-standing problem in computer vision.

Decoder Scene Text Recognition

Paper
Code

ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images

no code implementations • 5 Jun 2023 • Wenwen Yu, Chengquan Zhang, Haoyu Cao, Wei Hua, Bohan Li, Huang Chen, MingYu Liu, Mingrui Chen, Jianfeng Kuang, Mengjun Cheng, Yuning Du, Shikun Feng, Xiaoguang Hu, Pengyuan Lyu, Kun Yao, Yuechen Yu, Yuliang Liu, Wanxiang Che, Errui Ding, Cheng-Lin Liu, Jiebo Luo, Shuicheng Yan, Min Zhang, Dimosthenis Karatzas, Xing Sun, Jingdong Wang, Xiang Bai

It is hoped that this competition will attract many researchers in the field of CV and NLP, and bring some new thoughts to the field of Document AI.

Document AI Entity Linking +1

Paper

SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger

no code implementations • 30 Mar 2023 • Yuting Gao, Jinfeng Liu, Zihan Xu, Tong Wu, Enwei Zhang, Wei Liu, Jie Yang, Ke Li, Xing Sun

Over the past two years, vision-language pre-training has achieved noteworthy success on several downstream tasks.

Zero-Shot Learning

Paper

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation

no code implementations • 16 Mar 2023 • Hao Liu, Xin Li, Mingming Gong, Bing Liu, Yunfei Wu, Deqiang Jiang, Yinsong Liu, Xing Sun

Recently, the Table Structure Recognition (TSR) task, which aims to convert table structure into machine-readable formats, has received increasing interest in the community.

Paper

Co-Salient Object Detection with Co-Representation Purification

1 code implementation • 14 Mar 2023 • Ziyue Zhu, Zhao Zhang, Zheng Lin, Xing Sun, Ming-Ming Cheng

Such irrelevant information in the co-representation interferes with its locating of co-salient objects.

Co-Salient Object Detection Object +2

Paper
Code

Efficient Decoder-free Object Detection with Transformers

2 code implementations • 14 Jun 2022 • Peixian Chen, Mengdan Zhang, Yunhang Shen, Kekai Sheng, Yuting Gao, Xing Sun, Ke Li, Chunhua Shen

A natural usage of ViTs in detection is to replace the CNN-based backbone with a transformer-based backbone, which is straightforward and effective, with the price of bringing considerable computation burden for inference.

Decoder Object +1

Paper
Code

Training-free Transformer Architecture Search

1 code implementation • CVPR 2022 • Qinqin Zhou, Kekai Sheng, Xiawu Zheng, Ke Li, Xing Sun, Yonghong Tian, Jie Chen, Rongrong Ji

Recently, Vision Transformer (ViT) has achieved remarkable success in several computer vision tasks.

Paper
Code

DIFNet: Boosting Visual Information Flow for Image Captioning

no code implementations • CVPR 2022 • Mingrui Wu, Xuying Zhang, Xiaoshuai Sun, Yiyi Zhou, Chao Chen, Jiaxin Gu, Xing Sun, Rongrong Ji

Current Image captioning (IC) methods predict textual words sequentially based on the input visual information from the visual feature extractor and the partially generated sentence information.

Image Captioning Sentence

Paper

RMNet: Equivalently Removing Residual Connection from Networks

1 code implementation • 1 Nov 2021 • Fanxu Meng, Hao Cheng, Jiaxin Zhuang, Ke Li, Xing Sun

In this paper, we aim to remedy this problem and propose to remove the residual connection in a vanilla ResNet equivalently by a reserving and merging (RM) operation on ResBlock.

Network Pruning

208

Paper
Code

Mitigating Memorization of Noisy Labels via Regularization between Representations

1 code implementation • 18 Oct 2021 • Hao Cheng, Zhaowei Zhu, Xing Sun, Yang Liu

Designing robust loss functions is popular in learning with noisy labels while existing designs did not explicitly consider the overfitting property of deep neural networks (DNNs).

Learning with noisy labels Memorization +1

Paper
Code

Self-supervised Models are Good Teaching Assistants for Vision Transformers

no code implementations • 29 Sep 2021 • Haiyan Wu, Yuting Gao, Ke Li, Yinqi Zhang, Shaohui Lin, Yuan Xie, Xing Sun

These findings motivate us to introduce a self-supervised teaching assistant (SSTA) besides the commonly used supervised teacher to improve the performance of transformers.

Image Classification Knowledge Distillation

Paper

PR-Net: Preference Reasoning for Personalized Video Highlight Detection

no code implementations • ICCV 2021 • Runnan Chen, Penghao Zhou, Wenzhe Wang, Nenglun Chen, Pai Peng, Xing Sun, Wenping Wang

Personalized video highlight detection aims to shorten a long video to interesting moments according to a user's preference, which has recently raised the community's attention.

Highlight Detection Semantic Similarity +1

Paper

Learning Canonical View Representation for 3D Shape Recognition with Arbitrary Views

1 code implementation • ICCV 2021 • Xin Wei, Yifei Gong, Fudong Wang, Xing Sun, Jian Sun

In this way, each 3D shape with arbitrary views is represented by a fixed number of canonical view features, which are further aggregated to generate a rich and robust 3D shape representation for shape recognition.

3D Shape Recognition 3D Shape Representation

Paper
Code

Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer

1 code implementation • 3 Aug 2021 • Yifan Xu, Zhijie Zhang, Mengdan Zhang, Kekai Sheng, Ke Li, WeiMing Dong, Liqing Zhang, Changsheng Xu, Xing Sun

Vision transformers (ViTs) have recently received explosive popularity, but the huge computational cost is still a severe issue.

Ranked #11 on Efficient ViTs on ImageNet-1K (with DeiT-T)

Efficient ViTs

Paper
Code

Discriminator-Free Generative Adversarial Attack

1 code implementation • 20 Jul 2021 • ShaoHao Lu, Yuqiao Xian, Ke Yan, Yi Hu, Xing Sun, Xiaowei Guo, Feiyue Huang, Wei-Shi Zheng

Deep Neural Networks (DNNs) are vulnerable to adversarial examples (Figure 1), making DNN-based systems collapse when inconspicuous perturbations are added to the images.

Adversarial Attack Disentanglement

Paper
Code

AS-MLP: An Axial Shifted MLP Architecture for Vision

2 code implementations • ICLR 2022 • Dongze Lian, Zehao Yu, Xing Sun, Shenghua Gao

Our proposed AS-MLP obtains 51.5 mAP on the COCO validation set and 49.5 MS mIoU on the ADE20K dataset, which is competitive compared to the transformer-based architectures.

Ranked #13 on Semantic Segmentation on DensePASS

object-detection Object Detection +1

162

Paper
Code

Learning 3D Shape Feature for Texture-Insensitive Person Re-Identification

1 code implementation • CVPR 2021 • Jiaxing Chen, Xinyang Jiang, Fudong Wang, Jun Zhang, Feng Zheng, Xing Sun, Wei-Shi Zheng

In this paper, rather than relying on texture based information, we propose to improve the robustness of person ReID against clothing texture by exploiting the information of a person's 3D shape.

Ranked #4 on Person Re-Identification on PRCC

3D Reconstruction Person Re-Identification

Paper
Code

Temporal Modulation Network for Controllable Space-Time Video Super-Resolution

1 code implementation • CVPR 2021 • Gang Xu, Jun Xu, Zhen Li, Liang Wang, Xing Sun, Ming-Ming Cheng

To well exploit the temporal information, we propose a Locally-temporal Feature Comparison (LFC) module, along with the Bi-directional Deformable ConvLSTM, to extract short-term and long-term motion cues in videos.

Ranked #8 on Video Super-Resolution on MSU Video Super Resolution Benchmark: Detail Restoration

Space-time Video Super-resolution Video Super-Resolution

106

Paper
Code

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

2 code implementations • 19 Apr 2021 • Yuting Gao, Jia-Xin Zhuang, Shaohui Lin, Hao Cheng, Xing Sun, Ke Li, Chunhua Shen

Specifically, we find the final embedding obtained by the mainstream SSL methods contains the most fruitful information, and propose to distill the final embedding to maximally transmit a teacher's knowledge to a lightweight model by constraining the last embedding of the student to be consistent with that of the teacher.

Contrastive Learning Representation Learning +1

Paper
Code

On Evolving Attention Towards Domain Adaptation

no code implementations • 25 Mar 2021 • Kekai Sheng, Ke Li, Xiawu Zheng, Jian Liang, WeiMing Dong, Feiyue Huang, Rongrong Ji, Xing Sun

However, considering that the configuration of attention, i.e., the type and the position of the attention module, affects the performance significantly, it is more general to optimize the attention configuration automatically so that it is specialized for arbitrary UDA scenarios.

Ranked #1 on Partial Domain Adaptation on Office-Home

Partial Domain Adaptation Unsupervised Domain Adaptation

Paper

Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query

1 code implementation • ICCV 2021 • Guanyu Cai, Jun Zhang, Xinyang Jiang, Yifei Gong, Lianghua He, Fufu Yu, Pai Peng, Xiaowei Guo, Feiyue Huang, Xing Sun

However, the performance of existing methods suffers in real life since the user is likely to provide an incomplete description of an image, which often leads to results filled with false positives that fit the incomplete description.

Cross-Modal Retrieval Image Retrieval +1

Paper
Code

An Empirical Study and Analysis on Open-Set Semi-Supervised Learning

no code implementations • 19 Jan 2021 • Huixiang Luo, Hao Cheng, Fanxu Meng, Yuting Gao, Ke Li, Mengdan Zhang, Xing Sun

Pseudo-labeling (PL) and Data Augmentation-based Consistency Training (DACT) are two approaches widely used in Semi-Supervised Learning (SSL) methods.

Data Augmentation

Paper

Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search

2 code implementations • 8 Jan 2021 • Chenyang Gao, Guanyu Cai, Xinyang Jiang, Feng Zheng, Jun Zhang, Yifei Gong, Pai Peng, Xiaowei Guo, Xing Sun

Secondly, a BERT with locality-constrained attention is proposed to obtain representations of descriptions at different scales.

Ranked #15 on Text based Person Retrieval on CUHK-PEDES

Descriptive Sentence +2

Paper
Code

Learning To Know Where To See: A Visibility-Aware Approach for Occluded Person Re-Identification

no code implementations • ICCV 2021 • Jinrui Yang, Jiawei Zhang, Fufu Yu, Xinyang Jiang, Mengdan Zhang, Xing Sun, Ying-Cong Chen, Wei-Shi Zheng

Several mainstream methods utilize extra cues (e.g., human pose information) to distinguish human parts from obstacles to alleviate the occlusion problem.

Person Re-Identification

Paper

One for More: Selecting Generalizable Samples for Generalizable ReID Model

1 code implementation • 10 Dec 2020 • Enwei Zhang, Xinyang Jiang, Hao Cheng, AnCong Wu, Fufu Yu, Ke Li, Xiaowei Guo, Feng Zheng, Wei-Shi Zheng, Xing Sun

Current training objectives of existing person Re-IDentification (ReID) models only ensure that the loss of the model decreases on the selected training batch, with no regard to the performance on samples outside the batch.

Person Re-Identification

Paper
Code

Learning with Instance-Dependent Label Noise: A Sample Sieve Approach

1 code implementation • ICLR 2021 • Hao Cheng, Zhaowei Zhu, Xingyu Li, Yifei Gong, Xing Sun, Yang Liu

This high-quality sample sieve allows us to treat clean examples and the corrupted ones separately in training a DNN solution, and such a separation is shown to be advantageous in the instance-dependent noise setting.

Ranked #1 on Image Classification with Label Noise on CIFAR-10, 60% IDN

Image Classification with Label Noise Learning with noisy labels

Paper
Code

Pruning Filter in Filter

1 code implementation • NeurIPS 2020 • Fanxu Meng, Hao Cheng, Ke Li, Huixiang Luo, Xiaowei Guo, Guangming Lu, Xing Sun

Through extensive experiments, we demonstrate that SWP is more effective than the previous FP-based methods and achieves the state-of-the-art pruning ratio on CIFAR-10 and ImageNet datasets without an obvious accuracy drop.

167

Paper
Code

Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion

3 code implementations • 12 Sep 2020 • Jinpeng Wang, Yuting Gao, Ke Li, Jianguo Hu, Xinyang Jiang, Xiaowei Guo, Rongrong Ji, Xing Sun

Specifically, we construct a positive clip and a negative clip for each video.

Action Recognition Representation Learning

113

Paper
Code

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

2 code implementations • CVPR 2021 • Jinpeng Wang, Yuting Gao, Ke Li, Yiqi Lin, Andy J. Ma, Hao Cheng, Pai Peng, Feiyue Huang, Rongrong Ji, Xing Sun

Then we force the model to pull the feature of the distracting video and the feature of the original video closer, so that the model is explicitly restricted to resist the background influence, focusing more on the motion changes.

Representation Learning Self-Supervised Learning

152

Paper
Code

Devil's in the Details: Aligning Visual Clues for Conditional Embedding in Person Re-Identification

1 code implementation • 11 Sep 2020 • Fufu Yu, Xinyang Jiang, Yifei Gong, Shizhen Zhao, Xiaowei Guo, Wei-Shi Zheng, Feng Zheng, Xing Sun

Secondly, the Conditional Feature Embedding requires the overall feature of a query image to be dynamically adjusted based on the gallery image it matches, while most of the existing methods ignore the reference images.

Ranked #1 on Person Re-Identification on CUHK03-C

Person Re-Identification

Paper
Code

Do Not Disturb Me: Person Re-identification Under the Interference of Other Pedestrians

1 code implementation • ECCV 2020 • Shizhen Zhao, Changxin Gao, Jun Zhang, Hao Cheng, Chuchu Han, Xinyang Jiang, Xiaowei Guo, Wei-Shi Zheng, Nong Sang, Xing Sun

In the conventional person Re-ID setting, it is widely assumed that cropped person images are for each individual.

Person Re-Identification Retrieval

Paper
Code

NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination

1 code implementation • 27 Jul 2020 • Penghao Zhou, Chong Zhou, Pai Peng, Junlong Du, Xing Sun, Xiaowei Guo, Feiyue Huang

Greedy-NMS inherently raises a dilemma, where a lower NMS threshold will potentially lead to a lower recall rate and a higher threshold introduces more false positives.

Ranked #12 on Object Detection on CrowdHuman (full body)

Hallucination Object Detection +1

Paper
Code

Filter Grafting for Deep Neural Networks: Reason, Method, and Cultivation

1 code implementation • 26 Apr 2020 • Hao Cheng, Fanxu Meng, Ke Li, Yuting Gao, Guangming Lu, Xing Sun, Rongrong Ji

To gain a universal improvement on both valid and invalid filters, we compensate grafting with distillation (Cultivation) to overcome the drawback of grafting.

valid

140

Paper
Code

Filter Grafting for Deep Neural Networks

2 code implementations • CVPR 2020 • Fanxu Meng, Hao Cheng, Ke Li, Zhixin Xu, Rongrong Ji, Xing Sun, Guangming Lu

To better perform the grafting process, we develop an entropy-based criterion to measure the information of filters and an adaptive weighting strategy for balancing the grafted information among networks.

140

Paper
Code

Viewpoint-Aware Loss with Angular Regularization for Person Re-Identification

1 code implementation • 3 Dec 2019 • Zhihui Zhu, Xinyang Jiang, Feng Zheng, Xiaowei Guo, Feiyue Huang, Wei-Shi Zheng, Xing Sun

Instead of one subspace for each viewpoint, our method projects the feature from different viewpoints into a unified hypersphere and effectively models the feature distribution on both the identity-level and the viewpoint-level.

Ranked #5 on Person Re-Identification on Market-1501 (using extra training data)

Person Re-Identification

Paper
Code

Asymmetric Co-Teaching for Unsupervised Cross Domain Person Re-Identification

1 code implementation • 3 Dec 2019 • Fengxiang Yang, Ke Li, Zhun Zhong, Zhiming Luo, Xing Sun, Hao Cheng, Xiaowei Guo, Feiyue Huang, Rongrong Ji, Shaozi Li

This procedure encourages that the selected training samples can be both clean and miscellaneous, and that the two models can promote each other iteratively.

Ranked #10 on Unsupervised Domain Adaptation on Market to Duke

Clustering Miscellaneous +2

107

Paper
Code

Rethinking Temporal Fusion for Video-based Person Re-identification on Semantic and Time Aspect

2 code implementations • 28 Nov 2019 • Xinyang Jiang, Yifei Gong, Xiaowei Guo, Qize Yang, Feiyue Huang, Wei-Shi Zheng, Feng Zheng, Xing Sun

Recently, the research interest of person re-identification (ReID) has gradually turned to video-based methods, which acquire a person representation by aggregating frame features of an entire video.

Video-Based Person Re-Identification

Paper
Code

High-dimensional Dense Residual Convolutional Neural Network for Light Field Reconstruction

1 code implementation • 3 Oct 2019 • Nan Meng, Hayden K.-H. So, Xing Sun, Edmund Y. Lam

We consider the problem of high-dimensional light field reconstruction and develop a learning-based framework for spatial and angular super-resolution.

Super-Resolution Vocal Bursts Intensity Prediction

Paper
Code

Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training

1 code implementation • CVPR 2019 • Feng Zheng, Cheng Deng, Xing Sun, Xinyang Jiang, Xiaowei Guo, Zongqiao Yu, Feiyue Huang, Rongrong Ji

Most existing Re-IDentification (Re-ID) methods are highly dependent on precise bounding boxes that enable images to be aligned with each other.

Ranked #2 on Person Re-Identification on CUHK03-C

Person Re-Identification

Paper
Code

Consistency Analysis for the Doubly Stochastic Dirichlet Process

no code implementations • 24 May 2016 • Xing Sun, Nelson H. C. Yung, Edmund Y. Lam, Hayden K.-H. So

This technical report proves component consistency for the Doubly Stochastic Dirichlet Process with exponential convergence of posterior probability.

Paper
