BorisovAI
New Feature · ai-agents · Claude Code

When Your Agent Forgets to Think: Debugging the Reflection Loop

The ai-agents project had a mysterious problem, and not an obvious one: the voice agent’s self-reflection system, the shiny new feature that was supposed to make the bot smarter over time, simply wasn’t running. No errors, no crashes, just silence. The reflection loop never kicked off.

I started by diving into the architecture. The agent reflection system was designed to work independently of Ollama, using only Claude CLI and SQLite for memory management. Smart design—fewer moving parts. But something was broken in the startup sequence.

The first clue came from examining the initialization code in manager.py. The reflection task was supposed to be created and scheduled at startup: self._reflection_task = asyncio.create_task(self._reflection_loop()). This looked correct on paper. But when I traced through the actual execution flow, I realized the task was never being awaited or properly integrated into the application’s lifecycle.

The real problem was architectural: the reflection loop was defined but never actually wired into the startup sequence. It’s the kind of bug that seems obvious in retrospect—like forgetting to flip the main power switch while carefully installing all the wiring.
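The fix came down to actually calling the startup hook. As a rough illustration (the class and method names here are hypothetical, not the project's actual ones), the pattern looks like this: create the task inside the running event loop, keep a reference to it, and tie it to shutdown so cancellation and exceptions are handled.

```python
import asyncio

# Illustrative sketch, not the project's real manager.py: the task must be
# created from a startup hook that actually runs, and cancelled on shutdown.
# Defining create_task code that is never invoked is exactly the "wiring
# without the power switch" bug described above.
class ReflectionManager:
    def __init__(self):
        self._reflection_task = None

    async def _reflection_loop(self):
        while True:
            await asyncio.sleep(0.01)  # placeholder for the reflection interval
            # ... run one reflection pass ...

    async def start(self):
        # Keep a reference so the task isn't garbage-collected mid-flight.
        self._reflection_task = asyncio.create_task(self._reflection_loop())

    async def stop(self):
        if self._reflection_task is not None:
            self._reflection_task.cancel()
            try:
                await self._reflection_task
            except asyncio.CancelledError:
                pass

async def main():
    mgr = ReflectionManager()
    await mgr.start()          # the missing wiring: start() was never called
    await asyncio.sleep(0.05)  # application runs...
    await mgr.stop()

asyncio.run(main())
print("reflection loop started and stopped cleanly")
```

The design choice worth noting: `asyncio.create_task` only schedules the coroutine; without a retained reference and a shutdown path, the loop can die silently, which matches the no-errors, no-crashes symptom.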

While investigating the reflection system, I discovered a secondary issue that had been lurking in the codebase. In handlers.py, there was a critical data corruption bug in the chat_with_tools function. Whenever tool execution failed, session.messages would remain in a broken state—containing tool_use blocks without corresponding tool_result blocks. On the next request, these malformed messages would be sent back to the API, causing cascading failures.

I added automatic cleanup in the exception handlers at three critical points, ensuring that corrupted message sequences were removed before they could propagate. This was paired with structured logging to capture the different failure patterns: error analysis, success patterns, knowledge gaps, optimization opportunities, and self-improvement signals.
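A sketch of that cleanup, assuming Anthropic-style message dicts (the real handlers.py logic isn't reproduced here, so the function name and message shapes are illustrative): when a tool call fails before its result is appended, the trailing assistant turn holds tool_use blocks with no matching tool_result, and must be stripped before the next request.

```python
# Illustrative cleanup: drop trailing assistant messages whose tool_use
# blocks never received a tool_result, restoring a valid conversation state.
def cleanup_orphaned_tool_use(messages):
    while messages:
        last = messages[-1]
        content = last.get("content", [])
        has_tool_use = isinstance(content, list) and any(
            block.get("type") == "tool_use" for block in content
        )
        if last.get("role") == "assistant" and has_tool_use:
            messages.pop()  # orphaned tool_use: no tool_result followed it
        else:
            break
    return messages

# Example: the tool raised, so no tool_result was ever appended.
session_messages = [
    {"role": "user", "content": "What's the weather?"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "t1", "name": "get_weather", "input": {}},
    ]},
]
cleanup_orphaned_tool_use(session_messages)
print(len(session_messages))  # the broken assistant turn is removed
```

Calling this in each exception handler keeps the corruption from surviving into the next request, where the API would otherwise reject the dangling tool_use.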

But there was more. Later, the /insights command started crashing with a cryptic Telegram error: “can’t parse entities: Can’t find end of the entity starting at byte offset 1158”. The agent reflection content contained markdown special characters that, when combined with Telegram’s markdown parser, created malformed entities. I implemented markdown escaping at the output stage, sanitizing underscores, asterisks, and brackets before sending to Telegram.
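The escaping itself is small. A minimal sketch, assuming Telegram's legacy Markdown parse mode, where an unmatched `_`, `*`, `` ` ``, or `[` triggers exactly this "can't parse entities" error (MarkdownV2 requires escaping a larger character set; the function name here is illustrative):

```python
# Sketch of the sanitization step applied before sending to Telegram.
def escape_telegram_markdown(text: str) -> str:
    for ch in ("_", "*", "`", "["):
        text = text.replace(ch, "\\" + ch)
    return text

raw = "error_analysis: *knowledge gaps* [see log]"
print(escape_telegram_markdown(raw))
# underscores, asterisks, and brackets are now literal characters
```

Escaping at the output stage, rather than where the reflection text is generated, keeps the stored content clean and confines the Telegram-specific quirk to the one place that talks to Telegram.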

Here’s the educational bit: Understanding message protocol design is crucial when working with multi-system architectures. Many developers overlook the fact that tool-calling frameworks require strict ordering: tool_use → tool_result → next response. Breaking this contract silently corrupts the conversation state in ways that are nightmarish to debug because the error surfaces much later, far removed from the actual cause.
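That contract can be checked mechanically. A hypothetical validator sketch, again assuming Anthropic-style message dicts, that flags any tool_use id not answered by a tool_result in the very next message:

```python
# Hypothetical ordering check: every tool_use emitted by the assistant must
# be answered by a tool_result with the same id before the conversation
# moves on. Returns False on the silent-corruption patterns described above.
def validate_tool_contract(messages) -> bool:
    pending = set()  # tool_use ids awaiting a tool_result
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            if pending:
                return False  # plain-text turn while tool_use is unanswered
            continue
        if msg["role"] == "assistant":
            if pending:
                return False  # two assistant turns with no tool_result between
            pending = {b["id"] for b in content if b.get("type") == "tool_use"}
        elif msg["role"] == "user":
            answered = {b.get("tool_use_id") for b in content
                        if b.get("type") == "tool_result"}
            if not pending.issubset(answered):
                return False  # some tool_use never got its tool_result
            pending = set()
    return not pending  # a trailing unanswered tool_use also fails
```

Running a check like this before each API call turns the far-removed, hard-to-trace failure into an immediate one at the point of corruption.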

By the end of the session, the reflection loop was properly integrated into the startup sequence, message handling was bulletproof, and the Telegram integration was rock-solid. The bot could now think about itself without crashing.

😄 Why did the developer add logging to the reflection system? Because debugging requires self-awareness!

Metadata

Session ID:
grouped_ai-agents_20260210_1710
Branch:
HEAD
Dev Joke
What do Go and a teenager have in common? Both are unpredictable and demand constant attention.
