Dev.to · Machine Learning · 1d ago | Research & Papers · Products & Services

Building a Voice-Controlled Local AI Agent with Streamlit, Local STT, and Safe Tool Execution

The author built a voice-controlled local AI agent using Streamlit, local speech-to-text, and a safe tool execution layer. The system accepts audio input, converts speech to text, understands user intent, and executes local tools in a clean UI.
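The speech-to-text step follows a local-first pattern: try a local model, and fall back to a hosted API if that fails. A minimal sketch of that fallback chain, assuming illustrative backend names (`local_whisper`, `api_fallback`) and the `openai/whisper-small` checkpoint — these are not the author's actual identifiers:

```python
def transcribe(audio_path: str, backends=None) -> str:
    """Try each STT backend in order; return the first transcript."""
    backends = backends or [local_whisper, api_fallback]
    errors = []
    for backend in backends:
        try:
            return backend(audio_path)
        except Exception as exc:  # model missing, hardware too weak, etc.
            errors.append(exc)
    raise RuntimeError(f"all STT backends failed: {errors}")

def local_whisper(audio_path: str) -> str:
    # Local Hugging Face model (checkpoint name is an assumption).
    from transformers import pipeline
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
    return asr(audio_path)["text"]

def api_fallback(audio_path: str) -> str:
    # Placeholder for the hosted-API option mentioned in the post.
    raise NotImplementedError("call your hosted STT provider here")
```

Passing the backend list as a parameter keeps the chain easy to test and to reorder on weaker hardware.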

💡 Why it matters

This project demonstrates an end-to-end AI application that combines speech processing, intent understanding, and safe local tool execution in a transparent and user-friendly manner.

Key Points

  1. Uses a local Hugging Face speech-to-text model with an API fallback
  2. Supports multiple intent-planning backends, including a local rules-based planner and an Ollama LLM
  3. Implements safe file operations restricted to a dedicated output folder
  4. Streamlit UI shows the transcription, planned actions, and final output
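The safe-file-operations point above can be sketched as a path-sandbox check: every tool resolves its target path against a dedicated output folder and rejects anything that escapes it. Folder and function names here are illustrative assumptions, not the author's API:

```python
from pathlib import Path

# Hypothetical sandbox root; all file-writing tools stay inside it.
OUTPUT_DIR = Path("agent_output").resolve()

def safe_path(name: str) -> Path:
    """Resolve `name` inside OUTPUT_DIR, rejecting traversal attempts."""
    candidate = (OUTPUT_DIR / name).resolve()
    if not candidate.is_relative_to(OUTPUT_DIR):  # Python 3.9+
        raise ValueError(f"path escapes output folder: {name}")
    return candidate

def write_file(name: str, content: str) -> Path:
    """Create a file inside the sandbox (the 'create file' tool)."""
    OUTPUT_DIR.mkdir(exist_ok=True)
    target = safe_path(name)
    target.write_text(content, encoding="utf-8")
    return target
```

Resolving before the containment check is what defeats `../`-style traversal, since `Path.resolve()` collapses the `..` segments first.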

Details

The project follows a local-first design, allowing users to either record audio or upload an existing file. The speech-to-text layer uses a local Hugging Face model by default, with an OpenAI API fallback option for weaker hardware. The intent planning module supports multiple backends, including a lightweight local rules-based planner and a stronger Ollama LLM. The safe tool execution layer maps intents to specific actions like file creation, code generation, and text summarization, all restricted to a dedicated output folder. The Streamlit UI displays the full pipeline, showing transcription, planned steps, and final results. Key challenges included balancing capability and reliability, as well as ensuring safe file operations.
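The lightweight rules-based planner described above could look like a keyword-to-intent table; the stronger Ollama backend would return the same structure from an LLM instead. The `Intent` dataclass, the patterns, and the action names are assumptions for illustration:

```python
import re
from dataclasses import dataclass

@dataclass
class Intent:
    action: str    # e.g. "create_file", "summarize", "generate_code"
    argument: str  # the raw transcript, for the tool to interpret

# Keyword rules mapping a transcript to one of the supported actions.
RULES = [
    (re.compile(r"\b(create|make|new) (?:a )?file\b", re.I), "create_file"),
    (re.compile(r"\bsummari[sz]e\b", re.I), "summarize"),
    (re.compile(r"\b(write|generate) (?:some )?code\b", re.I), "generate_code"),
]

def plan(transcript: str) -> Intent:
    """Rules-based planner: first matching pattern wins."""
    for pattern, action in RULES:
        if pattern.search(transcript):
            return Intent(action, transcript)
    return Intent("unknown", transcript)
```

Because both backends emit the same intent structure, the execution layer can dispatch on `Intent.action` without knowing which planner produced it.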


AI Curator - Daily AI News Curation