This document outlines the design and implementation of Jarvis, a DIY voice assistant. It covers capturing audio with a microphone, transcribing that audio to text with DeepSpeech, parsing the text for intent with a command parser, and generating spoken responses via text-to-speech. The goal is a voice assistant that runs entirely locally, without relying on cloud services, so that user data stays private. The main technical challenges are the limited capabilities of the hardware and the need for more advanced natural language understanding.
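Of the pipeline stages listed, intent parsing is the one the summary says least about. The following is a minimal Python sketch of one way such a command parser could work, assuming a simple rule-based design. Each hypothetical intent (`get_time`, `control_device`) is a regular expression matched against the normalized transcript, and named groups become slots. The speech-to-text and text-to-speech stages are passed in as plain callables (`transcribe`, `speak`), so the example runs without any models. In a real build, `transcribe` might wrap DeepSpeech's Python `Model.stt()`, which takes 16-bit mono PCM at the model's sample rate, and `speak` might wrap a local TTS engine. All class and function names here are illustrative and are not taken from Jarvis itself.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Intent:
    """A recognized command plus any values captured from the transcript."""
    name: str
    slots: Dict[str, str] = field(default_factory=dict)


class CommandParser:
    """Hypothetical rule-based parser: the first matching rule wins."""

    def __init__(self) -> None:
        self._rules: List[Tuple[str, re.Pattern[str]]] = []

    def add_rule(self, intent: str, pattern: str) -> None:
        # Patterns are written against lower-cased, whitespace-collapsed text.
        self._rules.append((intent, re.compile(pattern)))

    def parse(self, text: str) -> Optional[Intent]:
        normalized = " ".join(text.lower().split())
        for name, regex in self._rules:
            match = regex.search(normalized)
            if match:
                # Named groups such as (?P<device>...) become slots.
                return Intent(name, match.groupdict())
        return None


def default_parser() -> CommandParser:
    # Illustrative rules only; spelled-out words, no digits or punctuation.
    parser = CommandParser()
    parser.add_rule("get_time", r"\bwhat time is it\b|\bcurrent time\b")
    parser.add_rule("control_device", r"\bturn (?P<state>on|off) the (?P<device>\w+)\b")
    return parser


def handle(intent: Optional[Intent],
           now: Callable[[], datetime] = datetime.now) -> str:
    """Map an intent to a reply string; `now` is injectable for testing."""
    if intent is None:
        return "Sorry, I did not understand that."
    if intent.name == "get_time":
        return f"It is {now():%H:%M}."
    if intent.name == "control_device":
        return f"Turning {intent.slots['state']} the {intent.slots['device']}."
    return "Sorry, I cannot do that yet."


def run_once(audio: bytes,
             transcribe: Callable[[bytes], str],
             speak: Callable[[str], None],
             parser: CommandParser) -> str:
    """One pass of the capture -> text -> intent -> speech loop."""
    text = transcribe(audio)            # e.g. a wrapper around DeepSpeech
    reply = handle(parser.parse(text))
    speak(reply)                        # e.g. a wrapper around a local TTS engine
    return reply
```

The patterns avoid digits and punctuation because the released English DeepSpeech models emit only lower-case letters, spaces and apostrophes, so a user saying "five" is transcribed as "five", not "5". A rule table like this stays fully local and cheap to run on limited hardware, but it only recognizes phrasings someone wrote a rule for, which is the gap that more advanced natural language understanding would address.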
Related topics: