Whisper + LLM Task Extraction: My Meeting Intelligence Architecture
The author built a system that listens to meetings, extracts structured tasks, and routes them to the right people. It uses Whisper for speech-to-text, segments the transcript, classifies the segments, and then extracts task details using a more capable LLM.
Why it matters
This system demonstrates a practical application of AI/ML to automate the extraction of actionable tasks from meeting transcripts, improving productivity and reducing the burden of manual note-taking.
Key Points
- Naive transcription and task extraction from meeting notes is challenging: transcripts are long, processing them whole is expensive, and context gets lost
- The system uses a multi-stage pipeline: transcribe, segment, classify, extract, and validate tasks
- It leverages cheaper models for classification and reserves more capable LLMs for task extraction, reducing costs by 70%
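The cost split in the last point can be sketched as a routing step: every segment gets a cheap classification call, but only segments tagged Action reach the expensive extraction call. This is a minimal sketch, not the author's implementation; the function names are hypothetical, and a keyword heuristic stands in for the actual LLM API calls to keep it self-contained.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str
    text: str

def classify_cheap(segment: Segment) -> str:
    """Stand-in for the cheap classifier (e.g. a Claude Haiku call).

    A real pipeline would call a model API here; a keyword heuristic
    keeps the sketch runnable without one.
    """
    lowered = segment.text.lower()
    if any(kw in lowered for kw in ("i'll", "will do", "by friday", "assign")):
        return "Action"
    if any(kw in lowered for kw in ("we decided", "agreed")):
        return "Decision"
    return "Discussion"

def extract_task(segment: Segment) -> dict:
    """Stand-in for the capable extractor (e.g. a Claude 3.5 call)."""
    return {"owner": segment.speaker, "raw": segment.text}

def run_pipeline(segments: list[Segment]) -> list[dict]:
    tasks = []
    for seg in segments:
        # Cheap call on every segment; expensive call only on Actions.
        if classify_cheap(seg) == "Action":
            tasks.append(extract_task(seg))
    return tasks

segments = [
    Segment("Ana", "We decided to ship the beta next sprint."),
    Segment("Ben", "I'll update the docs by Friday."),
]
print(run_pipeline(segments))
```

The design point is the asymmetry: the classifier runs on every segment, so it must be cheap, while the extractor runs only on the small fraction of segments classified as Action.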
Details
The author's team was drowning in meeting notes, with action items scattered across various tools. To address this, they built a system that automatically processes meeting audio files. It uses Whisper for speech-to-text, segmenting the transcript by speaker turns and limiting each segment to 300 tokens. It then classifies the segments as Decision, Action, or Discussion using a cheaper LLM like Claude Haiku. Only the Action segments are passed to a more capable LLM (Claude 3.5) to extract task details like owner, deadline, and dependencies. The structured task data is then deduped and routed to project management tools. The key insight is to use cheap models for classification and expensive ones only for extraction, which cuts costs by 70%.
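The segmentation step described above can be sketched as follows: start a new segment whenever the speaker changes or adding the next turn would push the segment past the 300-token cap. This is an illustrative sketch under stated assumptions, not the author's code: whitespace word count stands in for a real tokenizer, and a single turn longer than the cap would need further splitting, which the sketch does not handle.

```python
MAX_TOKENS = 300  # per-segment cap described in the article

def count_tokens(text: str) -> int:
    # Whitespace word count as a rough token proxy; a real system
    # would use the model's tokenizer.
    return len(text.split())

def segment_transcript(turns: list[tuple[str, str]]) -> list[dict]:
    """Group (speaker, text) turns into segments of at most MAX_TOKENS.

    A new segment starts when the speaker changes or when the current
    segment would exceed the token cap.
    """
    segments: list[dict] = []
    current: dict | None = None
    for speaker, text in turns:
        tokens = count_tokens(text)
        if (current is None
                or current["speaker"] != speaker
                or current["tokens"] + tokens > MAX_TOKENS):
            current = {"speaker": speaker, "text": text, "tokens": tokens}
            segments.append(current)
        else:
            current["text"] += " " + text
            current["tokens"] += tokens
    return segments

turns = [
    ("Ana", "Status update."),
    ("Ana", "Backend is done."),
    ("Ben", "Great."),
]
print(segment_transcript(turns))
```

Capping segments keeps each downstream classification call small and cheap, which is what makes the per-segment routing strategy affordable.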