BorisovAI — Tools for the community. By the community.

Building a Voice Agent: Orchestrating Python and JavaScript Across the Monorepo

The task landed on my desk with a familiar weight: build a voice agent that could handle real-time chat, authentication, and voice processing across a split architecture—Python backend, Next.js frontend. The real challenge wasn’t the individual pieces; it was orchestrating them without letting the complexity spiral into a tangled mess.

I started by sketching the backend foundation. FastAPI 0.115 became the core, not just because it’s fast, but because its native async support let me stream real-time chat responses through sse-starlette 2 without wrestling with blocking I/O. Authentication came next. Implementing it early rather than bolting it on later proved essential, because every subsequent endpoint needed to trust the user context.
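To show what "streaming responses" means on the wire, here is a minimal, standard-library-only sketch of the Server-Sent Events framing a streaming chat endpoint emits. The token source is a hypothetical stand-in for the model, and the `token`/`done` event names are illustrative assumptions, not the project's protocol.

```python
import asyncio
from typing import AsyncIterator, List, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events frame (text/event-stream format)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" field.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


async def fake_llm_tokens(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for a model that yields tokens as they are generated."""
    for token in ["Hello", ",", " world"]:
        await asyncio.sleep(0)  # hand control back to the event loop, like real async I/O
        yield token


async def chat_stream(prompt: str) -> AsyncIterator[str]:
    """Turn each token into an SSE frame, then signal completion."""
    async for token in fake_llm_tokens(prompt):
        yield format_sse(token, event="token")
    yield format_sse("[DONE]", event="done")


async def collect_frames(prompt: str) -> List[str]:
    """Drain the stream into a list (handy for tests)."""
    return [frame async for frame in chat_stream(prompt)]
```

With FastAPI's `StreamingResponse(..., media_type="text/event-stream")`, pre-formatted frames like these can be yielded directly. sse-starlette's `EventSourceResponse` does the framing itself, so with it you would yield the payloads (strings or dicts) rather than calling `format_sse`.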

The voice processing endpoints demanded careful thought. Unlike typical stateless REST endpoints, which handle one request and return one response, voice required state management: buffering audio chunks, running inference, and streaming responses back. I structured these as separate concerns, with one endpoint for transcription, another for chat context, and another for voice synthesis. This separation meant I could debug and scale each independently.
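To make "buffering audio chunks" concrete, here is a minimal sketch of a per-connection buffer that accumulates incoming audio and releases fixed-size windows to a transcription step. It assumes 16-bit mono PCM; the class name, the 16 kHz sample rate, and the 500 ms window are illustrative choices, not values from the project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AudioChunkBuffer:
    """Accumulates raw PCM bytes and releases fixed-size windows for inference.

    Assumes 16-bit mono PCM; sample_rate and window_ms are illustrative defaults.
    """
    sample_rate: int = 16_000
    window_ms: int = 500
    _pending: bytearray = field(default_factory=bytearray, init=False, repr=False)

    @property
    def window_bytes(self) -> int:
        # samples per window * 2 bytes per 16-bit sample
        return self.sample_rate * self.window_ms // 1000 * 2

    def push(self, chunk: bytes) -> List[bytes]:
        """Add a network chunk; return every complete window now available."""
        self._pending.extend(chunk)
        windows = []
        while len(self._pending) >= self.window_bytes:
            windows.append(bytes(self._pending[: self.window_bytes]))
            del self._pending[: self.window_bytes]
        return windows

    def flush(self) -> bytes:
        """Return any leftover partial audio (e.g. when the user stops talking)."""
        rest = bytes(self._pending)
        self._pending.clear()
        return rest
```

Chunks arrive at whatever size the client sends, so decoupling network chunk size from inference window size keeps the transcription step's input predictable.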

Then came the frontend integration. The Next.js side had to consume these endpoints, but it also had to integrate with the Telegram Mini App (TMA) SDK, which introduced its own authentication layer. The streaming chat UI in React 19 had to handle partial messages gracefully, displaying text as it arrived rather than waiting for the full response. This is where Tailwind CSS v4, with its new CSS-first configuration, actually simplified things; an @apply-heavy approach would have made dynamic class management messier.
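The TMA authentication layer is worth a sketch, because the backend should not trust the user data a Mini App sends without checking it. Telegram documents an HMAC-SHA256 scheme for validating the `initData` string: the secret key is the HMAC-SHA256 of the bot token keyed with the constant `WebAppData`, and the expected hash is the HMAC-SHA256 of a data-check-string built from the remaining fields sorted by key. The sketch below follows that scheme with only the standard library; the function name and the `max_age_s` freshness policy are my own assumptions, and the project's actual implementation may differ.

```python
import hashlib
import hmac
import json
import time
from typing import Optional
from urllib.parse import parse_qsl


def validate_init_data(init_data: str, bot_token: str, max_age_s: int = 86_400) -> Optional[dict]:
    """Return the parsed initData fields if the hash checks out, else None."""
    fields = dict(parse_qsl(init_data, keep_blank_values=True))
    received_hash = fields.pop("hash", None)
    if received_hash is None:
        return None

    # Data-check-string: remaining fields sorted by key, as "key=value", joined by "\n".
    data_check_string = "\n".join(f"{k}={v}" for k, v in sorted(fields.items()))

    # Secret key: HMAC-SHA256 of the bot token, keyed with the constant "WebAppData".
    secret_key = hmac.new(b"WebAppData", bot_token.encode(), hashlib.sha256).digest()
    expected = hmac.new(secret_key, data_check_string.encode(), hashlib.sha256).hexdigest()

    # Constant-time comparison to avoid leaking timing information.
    if not hmac.compare_digest(expected.encode(), received_hash.encode()):
        return None

    # Reject stale payloads (max_age_s is an illustrative policy, not a Telegram rule).
    if time.time() - int(fields.get("auth_date", "0")) > max_age_s:
        return None

    if "user" in fields:
        fields["user"] = json.loads(fields["user"])  # the user field is JSON-encoded
    return fields
```

In FastAPI, a check like this would naturally sit behind a dependency so every endpoint receives an already-verified user, in line with the decision to put authentication in place early.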

Here’s something I learned during this phase that is easy to overlook: separation of concerns in a monorepo only works if you establish strict validation protocols upfront. I settled on a fixed checklist: every Python import gets validated with a quick python -c 'from src.module import Class' check, an npm build runs after every frontend change, and the TypeScript compiler runs before anything ships. This discipline saved hours later, catching subtle import errors before they could cascade through the codebase.
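The checklist can be scripted rather than kept in your head. Below is a minimal sketch: `check_imports` does what `python -c 'from src.module import Class'` does, but for a list of `module:Name` specs, and `run_frontend_checks` shells out to the TypeScript compiler and the build. The spec format, the `frontend` directory default, and the exact npm script names are assumptions to adapt to the repo.

```python
import importlib
import subprocess
from typing import List, Sequence, Tuple


def check_imports(specs: Sequence[str]) -> List[Tuple[str, str]]:
    """Like `python -c 'from pkg.mod import Name'` for each "pkg.mod:Name" spec.

    Returns (spec, error) pairs for every spec that fails to import.
    """
    failures = []
    for spec in specs:
        module_name, _, attr = spec.partition(":")
        try:
            module = importlib.import_module(module_name)
            if attr:
                getattr(module, attr)
        except Exception as exc:  # ImportError, AttributeError, or errors raised at import time
            failures.append((spec, f"{type(exc).__name__}: {exc}"))
    return failures


# Hypothetical frontend checks; adjust to the scripts in frontend/package.json.
FRONTEND_CHECKS = [["npx", "tsc", "--noEmit"], ["npm", "run", "build"]]


def run_frontend_checks(cwd: str = "frontend") -> List[str]:
    """Run each frontend command and return the ones that failed."""
    failed = []
    for cmd in FRONTEND_CHECKS:
        try:
            ok = subprocess.run(cmd, cwd=cwd).returncode == 0
        except FileNotFoundError:  # tool or directory missing
            ok = False
        if not ok:
            failed.append(" ".join(cmd))
    return failed
```

Running `check_imports` over every backend entry point after each change gives the same signal as the manual `python -c` check, without retyping it per module.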

The real insight came from the project’s ERROR_JOURNAL.md pattern. Instead of letting errors vanish into git history, every error gets documented in the journal as it occurs, and the journal gets checked before anyone attempts a fix. That habit prevented the classic mistake of solving the same problem three times. It’s institutional memory in a single markdown file.
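Checking the journal before a fix can be as simple as a search. This sketch assumes each entry in ERROR_JOURNAL.md starts with a `## ` heading (the post doesn't specify the file's layout) and returns every entry whose text contains the error message you're about to debug. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import List


def parse_journal(text: str) -> List[str]:
    """Split the journal into entries, each starting at a '## ' heading (assumed format)."""
    # Zero-width split just before every line that begins with "## ".
    entries = re.split(r"(?m)^(?=## )", text)
    return [e.strip() for e in entries if e.strip().startswith("## ")]


def find_known_errors(error_message: str, journal_text: str) -> List[str]:
    """Case-insensitive search for entries mentioning the given error."""
    needle = error_message.lower()
    return [e for e in parse_journal(journal_text) if needle in e.lower()]


def check_journal(error_message: str, path: str = "ERROR_JOURNAL.md") -> List[str]:
    """Read the journal from disk (if present) and return matching entries."""
    p = Path(path)
    if not p.exists():
        return []
    return find_known_errors(error_message, p.read_text(encoding="utf-8"))
```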

One unexpected win: batching independent tasks across both codebases into a single command. Rather than switching contexts repeatedly, I’d kick off backend validations and frontend builds together and let them run in parallel. The monorepo structure, with the Python backend in /backend and Next.js in /frontend, kept this clean: no cross-contamination, clear boundaries.
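A minimal sketch of the batching idea, assuming the backend check and the frontend build are independent: run them as concurrent subprocesses and collect their exit codes. The `run_parallel` helper and the example commands are hypothetical; threads are enough here because each worker only waits on a child process.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple


def run_parallel(jobs: Dict[str, List[str]], cwds: Optional[Dict[str, str]] = None) -> Dict[str, int]:
    """Launch independent commands concurrently and return each job's exit code."""
    cwds = cwds or {}

    def run(name: str) -> Tuple[str, int]:
        try:
            proc = subprocess.run(jobs[name], cwd=cwds.get(name), capture_output=True, text=True)
            return name, proc.returncode
        except FileNotFoundError:
            # Executable or working directory missing; report it like a shell's "command not found".
            return name, 127

    with ThreadPoolExecutor(max_workers=max(len(jobs), 1)) as pool:
        return dict(pool.map(run, jobs))


# Hypothetical wiring for the monorepo layout described above.
EXAMPLE_JOBS = {
    "backend": [sys.executable, "-c", "from src.module import Class"],
    "frontend": ["npm", "run", "build"],
}
EXAMPLE_CWDS = {"backend": "backend", "frontend": "frontend"}
```

Calling `run_parallel(EXAMPLE_JOBS, EXAMPLE_CWDS)` would take roughly as long as the slower of the two jobs instead of their sum, and the per-job exit codes show immediately which side of the repo broke.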

By the end, the architecture was solid: defined agent roles, comprehensive validation checks, and a documentation pattern that actually prevented repeated mistakes. The frontend could stream chat responses while the backend processed voice, and authentication threaded through both without becoming a bottleneck.

A SQL statement walks into a bar and sees two tables. It approaches and asks, “May I join you?” 😄
