MiVOS is an interactive video object segmentation model that decouples interaction-to-mask conversion from mask propagation, so it is versatile and not limited to a particular type of user interaction. It uses three modules: Interaction-to-Mask, Propagation, and Difference-Aware Fusion. Trained separately, the interaction module converts user interactions into an object mask. The propagation module then propagates that mask temporally, using a novel top-k filtering strategy when reading the space-time memory. To take the user's intent into account, the difference-aware fusion module learns how to fuse the masks from before and after each interaction, which are aligned with the target frames using the space-time memory.
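
The filtered memory read mentioned above can be illustrated with a small sketch. This is not the authors' implementation. It assumes one plausible reading of the strategy: for each query position, only the `k` highest affinity scores over the memory positions are kept, a softmax is taken over those scores alone, and the memory values are averaged with the resulting weights. The function name `top_k_memory_read`, the list-based data layout, and the toy numbers are all hypothetical.

```python
import math


def top_k_memory_read(affinity, values, k):
    """Read from memory using only the k best-matching positions per query.

    affinity: one row per query position; each row holds a score for every
              memory position (hypothetical layout, chosen for clarity).
    values:   one feature vector per memory position.
    Returns one read-out vector per query position.
    """
    dim = len(values[0])
    readouts = []
    for scores in affinity:
        # Indices of the k highest-scoring memory positions for this query.
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        # Softmax over the kept scores only; subtracting the max keeps exp() stable.
        peak = max(scores[i] for i in top)
        exps = {i: math.exp(scores[i] - peak) for i in top}
        total = sum(exps.values())
        # Weighted sum of the selected memory values; all other positions get weight 0.
        out = [0.0] * dim
        for i, e in exps.items():
            weight = e / total
            for d in range(dim):
                out[d] += weight * values[i][d]
        readouts.append(out)
    return readouts


# Toy example: 2 query positions, 3 memory positions, 2-dimensional values.
affinity = [[5.0, 5.0, -1.0], [0.0, 0.0, 10.0]]
values = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
result = top_k_memory_read(affinity, values, k=2)
```

In this toy case the first query ignores the third memory position entirely, so its read-out is the even mix of the first two values. Dropping low-scoring positions before the softmax is one way to keep many weak matches from diluting the read-out.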
Source: Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

Task | Papers | Share |
---|---|---|
Interactive Video Object Segmentation | 1 | 20.00% |
Semantic Segmentation | 1 | 20.00% |
Semi-Supervised Video Object Segmentation | 1 | 20.00% |
Video Object Segmentation | 1 | 20.00% |
Video Semantic Segmentation | 1 | 20.00% |