Vibecoding a Video Editing Pipeline with AI
The author explores using various AI tools, including ComfyUI, Gemma 4, and Claude Code, to streamline a video editing workflow for footage from a trip to the California coast. After running into several dead ends, they conclude that CLIP is the right tool for selecting the most scenic frames from their footage.
Why it matters
This article shows how much depends on matching the AI tool to the task (here, a similarity model for ranking frames rather than a model that describes them), and how hard that choice is in a rapidly evolving AI landscape.
Key Points
- The author had 17.5 GB of video footage from a trip to the California coast
- They tried using ComfyUI, Gemma 4, and TurboQuant to enhance the footage or select the best shots
- The author found that a vision language model like Gemma 4 was not the right approach for selecting scenic frames
- They pivoted to Claude Code, which pointed them to CLIP as a better fit for this task
Details
The author had a large amount of video footage from a trip to the California coast and wanted to turn it into a 90-second highlight reel and some YouTube Shorts. They first tried ComfyUI, expecting to use image-to-video workflows to enhance the footage or create creative transitions, but realized ComfyUI was not the right tool for selecting and editing existing footage.
Next, they used Gemma 4 and TurboQuant to extract frames and have the vision model pick the most scenic shots. This approach was slow, hallucinated filenames, and at low resolutions struggled to distinguish scenic locations from less interesting ones.
After some back-and-forth with Gemini, the author pivoted to Claude Code. In the first exchange, Claude pointed out that CLIP would suit this task better than a generative vision language model: CLIP is built to measure how well an image matches a text description (similarity matching) rather than to describe what it sees. The author realized they had been reaching for the wrong AI tools for this part of their video editing workflow.
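To make "similarity matching" concrete, here is a minimal sketch of CLIP-style zero-shot ranking of extracted frames. It assumes the image and text embeddings have already been produced by a CLIP model (for example through a library such as open_clip or Hugging Face transformers; model loading is not shown). The prompt wording, file names, helper names, and 3-dimensional toy vectors are all hypothetical, and the logit scale of 100 is an assumption approximating released CLIP checkpoints. This is an illustration of the technique, not the author's actual pipeline.

```python
import math
from typing import Dict, List, Sequence, Tuple

Embedding = Sequence[float]


def unit(v: Embedding) -> List[float]:
    """Scale a vector to unit length; CLIP compares L2-normalized embeddings."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("embedding has zero length")
    return [x / norm for x in v]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two embeddings."""
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def prompt_probs(frame: Embedding, prompts: Sequence[Embedding],
                 logit_scale: float = 100.0) -> List[float]:
    """Softmax over scaled image-text similarities, as in CLIP zero-shot classification.

    logit_scale=100.0 is an assumed value, not read from any real checkpoint.
    """
    logits = [logit_scale * cosine(frame, p) for p in prompts]
    peak = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_scenic_frames(frames: Dict[str, Embedding], prompts: Sequence[Embedding],
                       scenic_index: int = 0, top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k frame paths by probability of matching prompts[scenic_index]."""
    scored = [(path, prompt_probs(emb, prompts)[scenic_index])
              for path, emb in frames.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: real CLIP embeddings have hundreds of dimensions and
# would come from the model's image and text encoders.
prompts = [
    [1.0, 0.0, 0.0],  # stand-in for "a scenic view of the California coastline"
    [0.0, 1.0, 0.0],  # stand-in for "a parking lot next to a highway"
]
frames = {
    "clip01_0005.jpg": [0.9, 0.1, 0.2],
    "clip01_0010.jpg": [0.1, 0.9, 0.3],
    "clip02_0003.jpg": [0.7, 0.3, 0.1],
}
top = rank_scenic_frames(frames, prompts, top_k=2)
print(top)
```

One property of this design is relevant to the problems the author hit with the vision model: the ranker only scores paths it was handed, so every name it returns is a real file and there is no way for it to hallucinate a filename. The trade-off is that CLIP only knows what the prompts describe, so the quality of the ranking depends heavily on how the "scenic" and contrasting prompts are worded.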