BorisovAI — Tools for the community. By the community.

Adding 8 Data Sources to a Trend Analysis Engine in One Session

The project was trend-analysis, a Python-based crawler that tracks emerging trends across multiple data sources. The existing system had five sources, and the goal was ambitious: plug in eight new APIs (Reddit, NewsAPI, Stack Overflow, YouTube, Product Hunt, Google Trends, Dev.to, and PubMed) to give the trend analyzer a much richer set of signals.

I started by mapping out what needed to happen. Each source needed three things: an adapter class following the existing pattern, configuration entries, and unit tests. The challenge wasn’t just adding code; it was doing it fast without breaking the existing infrastructure.

First, I organized the eight adapters into five modules. Two of them consolidate related APIs: social.py bundled Reddit and YouTube, and community.py packed Stack Overflow, Dev.to, and Product Hunt. The other three hold a single source each: news.py for NewsAPI, search.py for Google Trends, and academic.py for PubMed. This was a deliberate trade-off. Normally you’d give every adapter its own file, but with the goal of optimizing context usage, grouping logically related APIs made sense.
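To illustrate what a consolidated module might look like, here is a minimal sketch of social.py in which both adapters share a base class and emit one common record type. The names SourceAdapter, TrendItem, and parse, and the simplified payload shapes, are assumptions for illustration, not the project’s actual code.

```python
"""social.py (sketch): two related adapters sharing one module."""
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class TrendItem:
    """Hypothetical common shape every adapter emits, whatever the API returns."""
    source: str
    title: str
    score: float


class SourceAdapter(ABC):
    """Hypothetical base class standing in for the project's adapter pattern."""
    name: str
    category: str

    @abstractmethod
    def parse(self, payload: dict[str, Any]) -> list[TrendItem]:
        """Turn one raw API response into normalized TrendItems."""


class RedditAdapter(SourceAdapter):
    name = "reddit"
    category = "social"

    def parse(self, payload: dict[str, Any]) -> list[TrendItem]:
        # Simplified listing shape: {"data": {"children": [{"data": {...}}]}}
        posts = (child["data"] for child in payload["data"]["children"])
        return [TrendItem(self.name, p["title"], float(p["score"])) for p in posts]


class YouTubeAdapter(SourceAdapter):
    name = "youtube"
    category = "social"

    def parse(self, payload: dict[str, Any]) -> list[TrendItem]:
        # Simplified shape: {"items": [{"snippet": {...}, "statistics": {...}}]}
        # Items without statistics fall back to a score of 0.
        return [
            TrendItem(
                self.name,
                item["snippet"]["title"],
                float(item.get("statistics", {}).get("viewCount", 0)),
            )
            for item in payload["items"]
        ]
```

Because both classes normalize into the same TrendItem, the rest of the pipeline never needs to know which upstream API a record came from, which is what makes sharing a module low-risk.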

The trickiest part came next: making sure the configuration system could handle the new sources cleanly. I added eight DataSourceConfig models to the config module and introduced a CATEGORY_WEIGHTS dictionary that balances signals across categories. Along the way I discovered that the weights had to sum to exactly 1.0 for the scoring algorithm to work properly, a constraint that wasn’t obvious until I started testing.
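Here is a minimal sketch of that configuration layer with the sum-to-1.0 constraint enforced at load time. The DataSourceConfig fields, the category names, and the weight values are all assumptions; the post does not publish them. If the weights are non-negative and sum to 1.0, the blended score stays on the same 0 to 1 scale as the per-category inputs, which is one plausible reason the scoring algorithm depends on it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceConfig:
    """Hypothetical stand-in for the project's per-source config model."""
    name: str
    category: str
    requires_api_key: bool = False
    enabled: bool = True


# The eight new sources; categories follow the module grouping (an assumption).
NEW_SOURCES = [
    DataSourceConfig("reddit", "social"),
    DataSourceConfig("youtube", "social"),
    DataSourceConfig("newsapi", "news", requires_api_key=True),
    DataSourceConfig("stackoverflow", "community", requires_api_key=True),
    DataSourceConfig("devto", "community"),
    DataSourceConfig("producthunt", "community", requires_api_key=True),
    DataSourceConfig("google_trends", "search"),
    DataSourceConfig("pubmed", "academic", requires_api_key=True),
]

# Illustrative values only, not the project's real weights.
CATEGORY_WEIGHTS: dict[str, float] = {
    "social": 0.30,
    "news": 0.20,
    "community": 0.20,
    "search": 0.15,
    "academic": 0.15,
}


def validate_weights(weights: dict[str, float], tol: float = 1e-9) -> None:
    """Raise ValueError unless weights are non-negative and sum to 1.0.

    math.isclose is used instead of == because float sums are rarely
    exact (in Python, 0.1 + 0.2 != 0.3).
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("category weights must be non-negative")
    total = math.fsum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"category weights sum to {total:.4f}, expected 1.0")


def check_categories(sources: list[DataSourceConfig], weights: dict[str, float]) -> None:
    """Every configured source must belong to a category that has a weight."""
    unknown = {s.category for s in sources} - set(weights)
    if unknown:
        raise ValueError(f"no weight for categories: {sorted(unknown)}")


def weighted_score(category_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Blend per-category scores in [0, 1] into one trend score in [0, 1]."""
    validate_weights(weights)
    return math.fsum(w * category_scores.get(c, 0.0) for c, w in weights.items())
```

Running validate_weights and check_categories when the config loads turns a silent scoring bug into an immediate error, which matches the “wasn’t obvious until testing” experience described above.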

Next came wiring up the imports in crawler.py and building the registration mechanism. This is where the source_registry pattern proved invaluable: instead of hardcoding adapter references everywhere, each adapter registers itself when its module is imported. I wrote more than 50 unit tests to verify each adapter’s core logic, then set up end-to-end tests for the adapters that use free APIs.
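A minimal sketch of import-time self-registration, assuming a decorator-based registry. Only the name source_registry comes from the post; register_source, build_adapters, and the adapter bodies are hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T", bound=type)

# Module-level registry: source name -> adapter class.
source_registry: dict[str, type] = {}


def register_source(name: str) -> Callable[[T], T]:
    """Class decorator: record the adapter under `name` when its module is imported."""
    def decorator(cls: T) -> T:
        if name in source_registry:
            raise ValueError(f"duplicate source name: {name!r}")
        source_registry[name] = cls
        return cls
    return decorator


# In e.g. community.py, decorating the class is the whole registration step.
@register_source("devto")
class DevToAdapter:
    def fetch(self) -> list[str]:
        return []  # real HTTP call omitted in this sketch


@register_source("stackoverflow")
class StackOverflowAdapter:
    def fetch(self) -> list[str]:
        return []  # real HTTP call omitted in this sketch


def build_adapters() -> dict[str, object]:
    """Instantiate every registered adapter once the modules are imported."""
    return {name: cls() for name, cls in source_registry.items()}
```

With this shape, crawler.py only has to import each adapter module once (for example, a hypothetical `from adapters import social, news, community, search, academic`) purely for the registration side effect. Rejecting duplicate names catches copy-paste mistakes when a new adapter reuses an existing key.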

Why this particular adapter pattern? The design follows the same idea as Django’s admin site, where each ModelAdmin announces itself with the @admin.register decorator instead of being listed by a central manager that knows about every component. This scales well: adding a new source later means dropping in one file and one import, without editing a central registry configuration.

The verification step was satisfying. I ran the config loader and saw the output: 13 sources registered, category weights summing to 1.0000, and all unit tests passing. The E2E tests for the free sources (Reddit, YouTube, Dev.to, Google Trends) all returned data correctly. The sources that require credentials (NewsAPI, Stack Overflow, Product Hunt, PubMed) got E2E tests that run in the CI pipeline instead.
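One way to get that split, sketched with the standard library’s unittest.skipUnless, is to skip credentialed E2E tests whenever their keys are absent, so they are skipped locally and run in CI where the secrets are set. The project may well use pytest instead; the REQUIRED_ENV mapping, the environment variable names, and the helper functions below are hypothetical placeholders.

```python
import os
import unittest
from collections.abc import Mapping

# Hypothetical mapping: source name -> env vars its E2E test needs.
# The free/credentialed split follows the post; variable names are placeholders.
REQUIRED_ENV: dict[str, tuple[str, ...]] = {
    "reddit": (),
    "youtube": (),
    "devto": (),
    "google_trends": (),
    "newsapi": ("NEWSAPI_KEY",),
    "stackoverflow": ("STACKEXCHANGE_KEY",),
    "producthunt": ("PRODUCTHUNT_TOKEN",),
    "pubmed": ("NCBI_API_KEY",),
}


def missing_credentials(source: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the env vars this source needs that are unset or empty."""
    return [name for name in REQUIRED_ENV[source] if not env.get(name)]


def requires_credentials(source: str):
    """Skip the decorated test unless the source's credentials are present."""
    missing = missing_credentials(source)
    return unittest.skipUnless(
        not missing, f"{source} E2E skipped, missing: {', '.join(missing)}"
    )


class NewsApiEndToEnd(unittest.TestCase):
    @requires_credentials("newsapi")
    def test_fetch_returns_items(self) -> None:
        # Placeholder: a real test would call the NewsAPI adapter here
        # and assert that it returned at least one normalized item.
        self.assertEqual(missing_credentials("newsapi"), [])
```

Because the skip condition is evaluated when the test module is imported, the same test file works unchanged on a laptop and in CI; only the environment differs.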

What I learned: when you’re optimizing for speed and context efficiency, combining related files isn’t automatically wrong; it’s a trade-off. The code stayed readable, the tests caught issues quickly, and the system was stable enough to merge by the end of the session.

What do you get when you lock a monkey in a room with a typewriter for 8 hours? A regular expression.
