Hunting a Silent Crash in the Trend Pipeline

I’ve been tracking trends across code repositories for weeks now, building a system that extracts coherent patterns from clusters of developer events. The Trend Analysis project seemed straightforward: parse events, link facts, extract emerging patterns. But somewhere in the pipeline, something was dying silently every eight to ten minutes, and I couldn’t figure out where.
The setup was solid. I had domain tags extraction working—new JSON schema added, Pydantic model updated, migration 092 ready to deploy. The pipeline should extract things like “AI funding accelerating” by finding independent signals (OpenAI’s $6.6B, Anthropic’s $4B, Mistral’s $600M) inside thematic clusters. Three separate events, one unmistakable direction. Clean concept.
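For a rough sense of what the domain-tag change looked like at the model level, here is a minimal sketch, assuming a Pydantic event model; the class and field names are illustrative stand-ins, not the project's actual schema or migration 092.

```python
# Hypothetical sketch of the domain-tag addition, not the project's real model.
from pydantic import BaseModel, Field


class ClusterEvent(BaseModel):
    """One developer event inside a thematic cluster."""
    event_id: str
    source: str        # e.g. "github", "news"
    summary: str       # e.g. "OpenAI raises $6.6B"
    # New field backing the domain-tag extraction, e.g. ["ai-funding"].
    domain_tags: list[str] = Field(default_factory=list)
```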
Then came the weirdness. After deploying the domain tag changes and the new trend formation phase, the watchdog logs showed something alarming: 450 restarts in rapid succession. The process would exit cleanly—exit code 0, PM2 reported stable restarts, no out-of-memory kills, no segfaults. Just… gone. Eight minutes of work, then silence.
I started adding debug markers everywhere. “PHASE_DEBUG” before the cluster extraction. “Extraction done” right before phase 3a. I waited through cycles, watching the logs. “Crawled 80 items” would appear, extraction would start, and then—nothing. The debug marker never showed up. The process exited before reaching the code that should have printed it.
That’s when I realized the crash wasn’t in the main pipeline code at all. Every obvious loop caught its exceptions. The real culprit had to be asyncio.create_task(). Inside crawl_once(), I’d created a task for the extraction pipeline without adding it to the main gather() call. A detached task like that has no supervisor: nothing awaits it, so on Python 3.13 its failure never surfaces as a traceback, and when the rest of the cycle winds down the process simply exits, taking the unfinished work with it.
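The shape of the bug was roughly this. A minimal sketch, assuming a single-shot crawl cycle; crawl_once() appears in the real code, but run_extraction_pipeline() and the rest are stand-ins. Here the detached task dies by cancellation at loop shutdown rather than by its own exception, but the symptom is identical: exit code 0, no traceback, and a debug marker that never prints.

```python
import asyncio


async def run_extraction_pipeline(items):
    print("PHASE_DEBUG: extraction started")
    await asyncio.sleep(5)      # pretend this is the slow extraction work
    print("Extraction done")    # the marker that never shows up


async def crawl_once():
    items = ["event"] * 80
    print(f"Crawled {len(items)} items")
    # BUG: the extraction task is detached. It is not part of the main
    # gather(), nothing awaits it, and nothing keeps a strong reference to it.
    asyncio.create_task(run_extraction_pipeline(items))
    return items


async def main():
    # Only crawl_once() is supervised; once it finishes, main() returns and
    # asyncio.run() tears down the loop, cancelling the detached task mid-flight.
    await asyncio.gather(crawl_once())


if __name__ == "__main__":
    asyncio.run(main())  # exit code 0, no traceback, "Extraction done" never prints
```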
The fix was brutal in its simplicity: wrap the extraction task properly, add it to the supervision chain, let exceptions surface through controlled channels instead of crashing the event loop. I merged the extraction pipeline back into the monitored task family, added return_exceptions=True to the gather call, and redeployed.
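The fixed shape, sketched under the same assumptions; form_trends() is an illustrative stand-in for the new trend formation phase, not the project's actual function. The point is that every downstream phase lives inside the supervised gather(), and failures come back as inspectable results.

```python
import asyncio
import logging

logger = logging.getLogger("pipeline")


async def crawl_once():
    items = ["event"] * 80
    print(f"Crawled {len(items)} items")
    return items


async def run_extraction_pipeline(items):
    print("PHASE_DEBUG: extraction started")
    await asyncio.sleep(0.1)
    print("Extraction done")


async def form_trends(items):
    await asyncio.sleep(0.1)
    print("Trend formation done")


async def run_cycle():
    items = await crawl_once()
    # Both downstream phases sit in the same supervised gather(). With
    # return_exceptions=True, a failure in one phase comes back as a result
    # instead of vanishing inside a detached task or killing the cycle.
    results = await asyncio.gather(
        run_extraction_pipeline(items),
        form_trends(items),
        return_exceptions=True,
    )
    for phase, result in zip(("extraction", "trend formation"), results):
        if isinstance(result, BaseException):
            logger.error("%s failed: %r", phase, result)


if __name__ == "__main__":
    asyncio.run(run_cycle())
```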
The restarts stopped.
What struck me most was how invisible the problem had been. No traceback, no error log, just a process that kept dying cleanly. The lesson: in async Python, detached tasks are ticking bombs. Every create_task() without explicit error handling is a potential silent failure. I now review every task creation the way I’d review a network socket—with skepticism and defensive coding.
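When I genuinely need a fire-and-forget task now, it goes through a small wrapper instead of a bare create_task(). A sketch of that pattern; the helper name and logger are mine, not from any library or from this project's code.

```python
import asyncio
import logging

logger = logging.getLogger("pipeline")

# Strong references so background tasks can't be garbage-collected mid-run.
_background_tasks: set[asyncio.Task] = set()


def create_supervised_task(coro, *, name: str) -> asyncio.Task:
    """create_task() plus bookkeeping: keep a reference, log any failure."""
    task = asyncio.create_task(coro, name=name)
    _background_tasks.add(task)

    def _on_done(t: asyncio.Task) -> None:
        _background_tasks.discard(t)
        if t.cancelled():
            logger.warning("background task %s was cancelled", t.get_name())
        elif t.exception() is not None:
            logger.error("background task %s failed", t.get_name(),
                         exc_info=t.exception())

    task.add_done_callback(_on_done)
    return task
```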
The pipeline now runs stably. Trends extract properly. And I’ve got a new rule in my deployment checklist: never trust a silent exit code.
Why didn’t the Python programmer respond to the foreign mail he got? Because his interpreter was busy collecting garbage. 😄
Metadata
- Session ID: grouped_trend-analisis_20260418_1955
- Branch: fix/trend-coherence-scoring
- Dev Joke: If Java works, don’t touch it. If it doesn’t work, don’t touch it either; it’ll only get worse.