WorldCanvas: A Promptable Framework for Rich, User-Directed Simulations
WorldCanvas is a framework that enables rich, user-directed simulations by combining text, trajectories, and reference images. It allows for the generation of coherent, controllable events with multi-agent interactions, object entry/exit, and reference-guided appearance.
Why it matters
WorldCanvas lets users specify what happens in a simulated scene, and where and when it happens, rather than relying on a text prompt alone. This makes generated virtual environments more expressive and more directly controllable.
Key Points
- Combines text, trajectories, and reference images for simulation
- Enables rich, user-directed events with multi-agent interactions
- Preserves object identity and scene consistency despite temporary disappearance
- Advances world models from passive predictors to interactive, user-shaped simulators
Details
WorldCanvas generates simulated events from three complementary inputs: text, trajectories, and reference images. Trajectories encode motion, timing, and visibility; natural language conveys semantic intent; and reference images ground the visual identity of objects. This multimodal prompt sets it apart from text-only approaches and from existing trajectory-controlled image-to-video methods.
With these inputs, the framework can generate coherent, controllable events that include multi-agent interactions, object entry and exit, reference-guided appearance, and counterintuitive events. The resulting videos show temporal coherence and also emergent consistency, preserving object identity and scene consistency despite temporary disappearance. By supporting the generation of expressive world events, WorldCanvas advances world models from passive predictors to interactive, user-shaped simulators.
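To make the division of labor between the three inputs concrete, here is a minimal sketch of how such a prompt might be represented in Python. It is not the WorldCanvas API: the class names `TrajectoryPoint` and `AgentPrompt`, the helper `visibility_events`, the normalized coordinates, and the file name `dog_reference.png` are all assumptions made for illustration. The sketch only shows that a trajectory with per-point visibility flags is enough to describe motion, timing, and object entry or exit, while the caption and reference image carry intent and identity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TrajectoryPoint:
    frame: int            # timing: the video frame this point belongs to
    x: float              # motion: assumed normalized horizontal position in [0, 1]
    y: float              # motion: assumed normalized vertical position in [0, 1]
    visible: bool = True  # visibility: False while the object is hidden or off-screen


@dataclass
class AgentPrompt:
    caption: str                           # natural-language semantic intent
    trajectory: List[TrajectoryPoint]      # motion, timing, and visibility
    reference_image: Optional[str] = None  # hypothetical path to an identity reference


def visibility_events(agent: AgentPrompt) -> List[Tuple[int, str]]:
    """Return (frame, "enter" or "exit") pairs where the visibility flag flips."""
    events = []
    previously_visible = False  # assumption: the object starts outside the scene
    for point in sorted(agent.trajectory, key=lambda p: p.frame):
        if point.visible and not previously_visible:
            events.append((point.frame, "enter"))
        elif not point.visible and previously_visible:
            events.append((point.frame, "exit"))
        previously_visible = point.visible
    return events


# Hypothetical example: one agent that is hidden for part of its path.
dog = AgentPrompt(
    caption="a dog runs behind the sofa and comes out the other side",
    trajectory=[
        TrajectoryPoint(0, 0.10, 0.70),
        TrajectoryPoint(8, 0.35, 0.70),
        TrajectoryPoint(12, 0.50, 0.70, visible=False),
        TrajectoryPoint(20, 0.65, 0.70, visible=False),
        TrajectoryPoint(24, 0.80, 0.70),
    ],
    reference_image="dog_reference.png",
)

events = visibility_events(dog)
print(events)
```

In this illustration the dog enters at frame 0, is hidden from frame 12, and reappears at frame 24. The same caption and reference image apply before and after the gap, which is the situation in which a generator has to preserve object identity despite temporary disappearance.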