Reddit Machine Learning | 12h ago | Research & Papers, Products & Services

VOID: Video Object and Interaction Deletion (physically-consistent video inpainting)

VOID is a model for video object removal that aims to handle physical interactions, unlike existing video inpainting methods that fail to account for the dynamic effects of removed objects.

Why it matters

VOID targets physically consistent object removal: deleting an object should also remove its effects on the rest of the scene, not just fill in the pixels behind it. This matters for video editing and content creation.

Key Points

  • VOID models counterfactual scene evolution to predict what the video would look like if the object had never been there
  • Uses counterfactual training data, VLM-guided masks, and a two-pass generation process to achieve physically consistent results
  • Outperformed baselines like Runway (Aleph), Generative Omnimatte, and ProPainter in a human preference study

Details

Existing video inpainting methods can fill in the pixels behind a removed object, but they fail when that object affects the dynamics of the scene, such as a domino chain falling or two cars about to crash. VOID instead models the counterfactual scene evolution: it predicts what the video would look like if the object had never been there.

Key ideas include counterfactual training data (paired videos with and without objects), VLM-guided masks that identify affected regions, and a two-pass generation process that first predicts the new motion and then refines the result with flow-warped noise for temporal consistency.

In a human preference study on real-world videos, VOID was selected 64.8% of the time over baseline methods.
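To make the "flow-warped noise" idea more concrete, here is a minimal sketch of warping a noise frame along an optical flow field, so that noise attached to a moving surface follows it from frame to frame. This is not VOID's implementation: the function name `warp_noise`, the flow convention (each current-frame pixel points back to its source in the previous frame), and the nearest-neighbor sampling are all assumptions made for illustration.

```python
import numpy as np


def warp_noise(noise, flow):
    """Backward-warp a noise frame with a dense flow field (hypothetical helper).

    noise: (H, W) array holding the previous frame's noise.
    flow:  (H, W, 2) array; flow[y, x] = (dx, dy) is the assumed offset from
           pixel (x, y) in the current frame to its source in the previous frame.
    """
    h, w = noise.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbor lookup copies samples instead of averaging them,
    # so each output value is still an unmodified draw from the input noise.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return noise[src_y, src_x]


# Toy example: the whole scene moves one pixel to the right, so every
# current-frame pixel reads its noise from one pixel to the left.
rng = np.random.default_rng(0)
noise_prev = rng.standard_normal((4, 6))
flow = np.zeros((4, 6, 2))
flow[..., 0] = -1.0
noise_cur = warp_noise(noise_prev, flow)
```

Nearest-neighbor sampling is used here because bilinear interpolation would average neighboring samples and shrink the noise variance. The trade-off is that samples get duplicated wherever several pixels map to the same source, which a real system would need to handle.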
