Auto-Encoding Morph-Tokens for Multimodal LLM

dcdmllm/morphtokens 3 May 2024

For multimodal LLMs, the synergy of visual comprehension (textual output) and generation (visual output) presents an ongoing challenge.

2

Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models

reka-ai/reka-vibe-eval 3 May 2024

We introduce Vibe-Eval: a new open benchmark and framework for evaluating multimodal chat models.

89

Neural Context Flows for Learning Generalizable Dynamical Systems

ddrous/ncflow 3 May 2024

Neural Ordinary Differential Equations typically struggle to generalize to new dynamical behaviors created by parameter changes in the underlying system, even when the dynamics are close to previously seen behaviors.

1

An Attention Based Pipeline for Identifying Pre-Cancer Lesions in Head and Neck Clinical Images

precision-vision/oed-classification-segmentation 3 May 2024

Early detection of cancer can help improve patient prognosis by early intervention.

0

On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?

maxzanella/mta 3 May 2024

Additionally, our method does not rely on ad hoc rules (e.g., confidence threshold) used in some previous test-time augmentation techniques to filter the augmented views.

4

Multi-level projection with exponential parallel speedup; Application to sparse auto-encoders neural networks

memo-p/projection 3 May 2024

In this paper, we propose a new bi-level projection method for which we show that the time complexity for the $\ell_{1,\infty}$ norm is only $\mathcal{O}\big(n m \big)$ for a matrix in $\mathbb{R}^{n\times m}$, and $\mathcal{O}\big(n + m \big)$ with full parallel power.

0

TOPICAL: TOPIC Pages AutomagicaLly

allenai/topical 3 May 2024

Topic pages aggregate useful information about an entity or concept into a single succinct and accessible article.

4

Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation

xzhouzeng/prpose 3 May 2024

The accuracy and robustness of 3D human pose estimation (HPE) are limited by 2D pose detection errors and 2D to 3D ill-posed challenges, which have drawn great attention to Multi-Hypothesis HPE research.

0

Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection

hui-design/aand 3 May 2024

The success of knowledge distillation mainly relies on maintaining the feature discrepancy between the teacher and student models, under the assumption that: (1) the teacher model can jointly represent two different distributions for the normal and abnormal patterns, while (2) the student model can only reconstruct the normal distribution.

0

Self-Supervised Learning for Real-World Super-Resolution from Dual and Multiple Zoomed Observations

cszhilu1998/selfdzsr_plusplus 3 May 2024

In addition, we further take multiple zoomed observations to explore self-supervised RefSR, and present a progressive fusion scheme for the effective utilization of reference images.

12