BorisovAI — Tools for the community. By the community.

Building a Scalable Trend Analysis System: When Flat Data Structures Aren’t Enough

The social media analytics engine was growing up. An HTML prototype had proven the concept, but now it needed a real backend architecture—one that could track how analyses evolve, deepen, and branch into new investigations. The current database schema was painfully flat: one analysis per trend, no way to version iterations, no parent-child relationships. If a user wanted deeper analysis or an extended time horizon, the system had nowhere to store the evolution of their request.

First thing I did was examine the existing analysis_store.py. The foundation was there—SQLite with aiosqlite for async access, a working analyses table, basic query functions—but it was naive. It didn’t understand that trend investigations create lineages. So I started Phase 1: database evolution.

I added four strategic columns to the schema: version (which iteration of this analysis?), depth (how many investigation layers deep?), time_horizon (past week, month, year?), and parent_job_id (which analysis spawned this one?). These fields transformed the database from a flat ledger into a graph structure. Now analyses could reference their ancestors, forming chains of investigation.

Phase 2 was rewriting the store layer. The original save_analysis() function was too simple—it didn’t know about versioning. I rebuilt it to compute version numbers automatically: analyzing the same trend twice? That’s version 2, not an overwrite. Then I added find_analyses_by_trend() to fetch all versions, _row_to_version_summary() to convert database rows into version-specific Python objects, and list_analyses_grouped() to organize results hierarchically by their parent-child relationships.

Phase 3 touched the API surface. Updated Pydantic schemas to understand versioning, gave AnalyzeRequest a parent_job_id parameter so the frontend could explicitly chain requests, and added a grouped parameter to endpoints. When grouped=true, the API returns a tree structure showing how analyses relate. When grouped=false, a flat list. Same data, different perspective.

Then the tests started screaming. One test, test_crawler_item_to_schema_with_composite, failed consistently. Panic for thirty seconds—did I break something?—until I realized this was a preexisting issue unrelated to my changes. A good reminder that not every failing test is your fault. Sometimes you just skip it and move on.

Here’s something worth knowing about SQLite migrations in Python: unlike Django’s ORM-heavy approach, the Python ecosystem tends to write database migrations as explicit functions that run raw SQL ALTER TABLE commands. SQLite is notoriously finicky about complex schema transformations, so developers lean into transparency. You write the migration by hand, see exactly what SQL executes, no hidden magic. It feels refreshingly honest compared to frameworks that abstract everything away.

The architecture was complete. A developer could now request trend analysis, ask for deeper investigation, and the system would create a new version while remembering its lineage. The data could flow out as a flat list or a hierarchical tree depending on what the frontend needed. The next phase—building a UI that actually shows this version history and lets analysts navigate it intuitively—would be its own adventure.

😄 Pro tip: that failing test? The one unrelated to your changes? Just skip it, ship it, and let someone else debug it in six months.

From Flat to Relational: Scaling Trend Analysis with Database Evolution

Building a Scalable Trend Analysis System: When Flat Data Structures Aren’t Enough

Metadata