Silent API Success: Why Claude Returned Nothing

When AI Responses Go Silent: Debugging the Great Output Vanishing Act
The ai-agents project had a peculiar problem on their hands. A user in Telegram was trying to issue a simple command—just “Создавай” (Create)—but something was going catastrophically wrong behind the scenes. The system was successfully connecting to Claude’s API, processing the request through multiple retries, and reporting success… yet returning absolutely nothing to the user. It was like sending a letter, getting a delivery confirmation, but finding an empty envelope.
The Setup
The architecture was elegant in theory: a Telegram bot routes CLI queries through Claude’s models (Sonnet in this case) with prompt caching enabled for performance. When a user sends a message, it gets routed to the CLI handler, batched as a request with a 5,344-character prompt, and sent to the API. The system had built-in retry logic—three attempts with exponential backoff (5 seconds, then 10 seconds). Everything looked reasonable on paper.
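That retry logic can be sketched in a few lines. This is a minimal illustration of the described behavior, not the project's actual code; the function names are hypothetical, and the real handler presumably catches a specific API exception rather than a bare `Exception`:

```python
import time

def call_with_retries(send_request, max_attempts=3, base_delay=5):
    """Retry `send_request` up to `max_attempts` times with exponential
    backoff: base_delay seconds, then double it each attempt (5 s, 10 s)."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return send_request()
        except Exception as exc:  # real code would catch the specific API error
            last_error = exc
            if attempt < max_attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # 5 s, then 10 s
    raise last_error
```

The catch, as the logs below show, is that a retry loop like this only fires on *exceptions*; a response that reports success but carries no content sails straight through it.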
The Mystery Unfolds
But here’s where it got interesting. Looking at the logs from February 9th at 12:23:58 UTC, three consecutive API calls happened:
The first attempt took 26.5 seconds. The API returned a successful response ('is_error': False, 'subtype': 'success') but the actual result field was completely empty. The system had burned through impressive token usage—11,652 cache creation tokens and 37,616 cache read tokens—yet produced 1,701 output tokens that somehow vanished into the void.
The second attempt ran 5 seconds later. Similar pattern: 23 seconds of processing, cache hits working beautifully (1,740 creation tokens, 47,520 read tokens), 1,719 output tokens generated, and… nothing returned.
The third attempt mirrored the first two. Different session IDs, different token counts, but identical result: successful API call, zero actual content delivered to the user.
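The failure pattern is easy to reproduce with a minimal sketch. The field names and token counts below are taken from the first attempt in the logs; the extraction function is hypothetical:

```python
# Response shape as it appears in the logs: success flags set, token usage
# reported, but the actual result field empty.
response = {
    "is_error": False,
    "subtype": "success",
    "result": "",  # the 1,701 generated output tokens never land here
    "usage": {
        "cache_creation_input_tokens": 11_652,
        "cache_read_input_tokens": 37_616,
        "output_tokens": 1_701,
    },
}

def extract_result(resp):
    """A naive handler that trusts only the error flag."""
    if resp["is_error"]:
        raise RuntimeError("API error")
    return resp["result"]

print(repr(extract_result(response)))  # → '' — "success", yet nothing to send
```

Every health check this handler could report would be green: no error flag, no exception, tokens billed. The user still gets nothing.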
The Root Cause
This is where prompt caching becomes a double-edged sword. The system was efficiently caching the massive prompt context (over 37,000 tokens being read from cache on subsequent calls), which normally saves costs and improves latency. But the Claude API was generating responses—the token counts prove it—that weren’t being properly serialized into the response body. This suggests a bug in how the response handler was extracting content from the API response when prompt caching was heavily utilized.
The warning logs called it out explicitly: cli_empty_response. Three times. The system recognized the problem, triggered retries, but kept hitting the same invisible wall.
What This Teaches Us
Prompt caching in LLM APIs is powerful for reducing latency and costs, but it introduces complexity in response handling that developers often overlook. When everything reports “success” but users see nothing, the culprit is usually in the response extraction layer—the code that takes the API’s JSON and pulls out the actual generated content. It’s the kind of bug that looks impossible because all your metrics say the system is working perfectly.
The fix would likely involve explicitly checking that cached responses include a non-empty content field before marking the request as successful, rather than relying solely on the API’s is_error flag.
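A minimal sketch of that check, assuming the same response shape seen in the logs (the helper name is hypothetical):

```python
def extract_result_strict(resp):
    """Treat an empty result as a failure even when the API reports success."""
    if resp.get("is_error"):
        raise RuntimeError("API error")
    result = resp.get("result") or ""
    if not result.strip():
        # Surface the condition the logs labelled cli_empty_response so the
        # retry logic (or alerting) can react to it instead of passing it on.
        raise RuntimeError("cli_empty_response: success flag set but result is empty")
    return result
```

With this in place, an empty-but-"successful" response raises like any other failure, so the existing retry and alerting paths handle it instead of silently forwarding an empty message.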
The lesson: monitor what your users actually receive, not just what your API metrics tell you about sending requests. 😄
Metadata
- Session ID: grouped_ai-agents_20260209_1226
- Branch: HEAD
- Dev Joke: Copilot: autocomplete that knows what you want to write. Sometimes it's even right.