BorisovAI — Tools for the community. By the community.

A Voice Agent Met Claude Code: How We Built a Persistent Assistant

When I opened the voice-agent project, I faced a classic yet non-trivial task: build a full-fledged AI assistant that works with voice as well as text, integrates with a REST API on the backend, and interacts with Next.js frontend components. Python on the backend, JavaScript on the front: a familiar modern architecture. But the main challenge had little to do with the choice of technology.

First, I realized this wasn’t just another chatbot. We needed a system that understands voice commands, works with asynchronous operations, executes filesystem commands, integrates with documentation, and can honestly say: “I need help here.” I started with architecture—structuring the project so each layer owned its responsibility: TMA documentation in docs/tma/, a structured error log in docs/ERROR_JOURNAL.md, and separation of backend services by function.
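To make the idea of a structured error log concrete, here is a minimal sketch of how entries could be appended to docs/ERROR_JOURNAL.md. The entry fields (component, summary, needs_human) and the Markdown layout are assumptions made for illustration, not the project's actual format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


@dataclass
class JournalEntry:
    """One structured record in the error journal (hypothetical schema)."""
    component: str     # which part of the system failed, e.g. "backend/voice"
    summary: str       # one-line description of the failure
    needs_human: bool  # True when the agent should ask the developer

    def to_markdown(self, when: datetime) -> str:
        status = "needs human help" if self.needs_human else "handled by agent"
        return (
            f"## {when.isoformat(timespec='seconds')} | {self.component}\n"
            f"- Summary: {self.summary}\n"
            f"- Status: {status}\n\n"
        )


def append_entry(journal: Path, entry: JournalEntry,
                 when: Optional[datetime] = None) -> None:
    """Append an entry to the journal, creating the file and folders if needed."""
    when = when or datetime.now(timezone.utc)
    journal.parent.mkdir(parents=True, exist_ok=True)
    with journal.open("a", encoding="utf-8") as fh:
        fh.write(entry.to_markdown(when))
```

Calling `append_entry(Path("docs/ERROR_JOURNAL.md"), JournalEntry("backend/voice", "transcription timed out", True))` would add one dated section to the file. Keeping each entry in the same shape is what lets an agent scan the journal later instead of guessing at free-form notes.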

Unexpectedly, the hardest part turned out to be organizing information flows. The agent had to know where to look for reference material, how to handle errors, and when to ask the developer for clarification. That’s when I understood: we needed built-in memory, not just the context of the current session but a real knowledge store. I integrated aiosqlite for async SQLite access, and the agent gained the ability to remember information about the user, their preferences, and even personal data like country of residence.

This opened up a whole range of personalization possibilities. The agent no longer just answered questions; it recognized the user: “You’re from Russia? Got it, I’ll remember that and factor it into my recommendations.”
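The memory layer can be sketched roughly as follows. To keep the example dependency-free, it swaps aiosqlite for the standard-library sqlite3 module wrapped in asyncio.to_thread; aiosqlite gives the same effect with awaitable calls. The facts table, the UserMemory class, and its method names are hypothetical, chosen only for illustration.

```python
import asyncio
import sqlite3
import threading
from typing import Optional


class UserMemory:
    """Tiny per-user fact store (hypothetical schema, not the project's)."""

    def __init__(self, path: str = ":memory:") -> None:
        # check_same_thread=False because asyncio.to_thread runs the calls
        # in worker threads; the lock serializes access to the connection.
        self._db = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS facts ("
            "user_id TEXT NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL, "
            "PRIMARY KEY (user_id, key))"
        )
        self._db.commit()

    def _remember_sync(self, user_id: str, key: str, value: str) -> None:
        with self._lock:
            # INSERT OR REPLACE overwrites an earlier value for the same fact.
            self._db.execute(
                "INSERT OR REPLACE INTO facts (user_id, key, value) "
                "VALUES (?, ?, ?)",
                (user_id, key, value),
            )
            self._db.commit()

    def _recall_sync(self, user_id: str, key: str) -> Optional[str]:
        with self._lock:
            row = self._db.execute(
                "SELECT value FROM facts WHERE user_id = ? AND key = ?",
                (user_id, key),
            ).fetchone()
        return row[0] if row else None

    async def remember(self, user_id: str, key: str, value: str) -> None:
        """Store a fact without blocking the event loop."""
        await asyncio.to_thread(self._remember_sync, user_id, key, value)

    async def recall(self, user_id: str, key: str) -> Optional[str]:
        """Return a stored fact, or None if the agent has not learned it."""
        return await asyncio.to_thread(self._recall_sync, user_id, key)


async def demo() -> Optional[str]:
    memory = UserMemory()
    await memory.remember("user-1", "country", "Russia")
    return await memory.recall("user-1", "country")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A key-value table per user is the simplest shape that supports "remember this and use it later". A real store would likely also need timestamps and a way for the user to delete what the agent knows about them.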

Interesting fact: we live in an era of accelerating AI development. The deep learning boom that started in the 2010s turned into a real explosion of accessibility in the 2020s. Once, only an expert with a PhD in mathematics could create a complex AI system. Now a developer can build a full-fledged assistant with memory, asynchronicity, and integrations over a weekend—and that’s become the norm.

In the end, we got an application that:

- Accepts voice commands and turns them into actions
- Executes backend operations without blocking the interface (thanks, async/await)
- Remembers context and facts about the user
- Independently diagnoses errors through a structured log
- Honestly says when human help is needed
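Two of the points above, non-blocking execution and admitting when help is needed, fit naturally into one small wrapper. This is only a sketch under assumptions: AgentReply, run_or_ask_for_help, and the five-second default timeout are hypothetical names and values, not taken from the project.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class AgentReply:
    ok: bool      # True if the operation finished on its own
    message: str  # result text, or a request for human help


async def run_or_ask_for_help(
    operation: Callable[[], Awaitable[str]],
    timeout_s: float = 5.0,
) -> AgentReply:
    """Await a backend operation; on failure, ask for help instead of crashing."""
    try:
        # Awaiting yields control to the event loop, so other requests
        # keep being served while this operation runs.
        result = await asyncio.wait_for(operation(), timeout=timeout_s)
        return AgentReply(ok=True, message=result)
    except asyncio.TimeoutError:
        return AgentReply(ok=False, message="I need help here: the operation timed out.")
    except Exception as exc:
        return AgentReply(ok=False, message=f"I need help here: {exc}")
```

Returning a reply object rather than raising keeps the decision about what to tell the user in one place, and the failure text can also be written to the error journal.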

Ahead lies optimization, feature expansion, and integration with real APIs. The project proved the main thing: AI agents work best when they know their limitations and don’t try to play the unbreakable superhero.

Migrating from Linux is like changing tires while driving. On an airplane. 😄
