
Teaching Trends to Think: Building a Smarter Scoring System

Scoring V2: Teaching a Trend Analyzer to Think Critically

The trend-analysis project had a critical gap: it could identify emerging trends across Hacker News, GitHub, and arXiv, but it couldn’t tell you why they mattered or when to act. A trend spammed across aggregator websites looked the same as a genuinely important shift in technology. We needed to teach our analyzer to think like a skeptical investor.

The Challenge

Our task was twofold: build a scoring system that rated trends on urgency and quality, then validate those scores using real citation data. The architecture needed to be smart enough to dismiss aggregator noise—you know, those sites that just republish news from everywhere—while lifting signal from authoritative sources.

Building the Foundation

I started by designing Scoring V2, a two-axis recommendation engine. Each trend would get an urgency score (how fast is it moving?) and a quality score (how credible is the signal?), then the system would spit out one of four recommendations: ACT_NOW for critical trends, MONITOR for emerging patterns worth watching, EVERGREEN for stable long-term shifts, and IGNORE for noise. This wasn’t just arbitrary scoring—it required understanding what each data source actually valued.
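The two-axis logic can be sketched roughly like this. This is a minimal illustration, not the project's actual code: the function name `recommend` and the specific cutoff values are assumptions for the example.

```python
# Hypothetical sketch of the two-axis recommendation mapping.
# Cutoff values are illustrative, not the real tuned thresholds.

def recommend(urgency: float, quality: float,
              urgency_cutoff: float = 0.7,
              quality_cutoff: float = 0.6) -> str:
    """Map an urgency/quality pair to one of the four actions."""
    if quality < quality_cutoff:
        return "IGNORE"       # low-credibility signal: treat as noise
    if urgency >= urgency_cutoff:
        return "ACT_NOW"      # credible and fast-moving
    if urgency >= 0.3:
        return "MONITOR"      # credible, still emerging
    return "EVERGREEN"        # credible, stable long-term shift
```

The key design point is that quality gates everything: a fast-moving but low-credibility trend is noise, not an alert.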

The real complexity came from implementing Tavily citation-based validation. Instead of trusting trend counts, we’d count unique domains mentioning each trend. The logic was simple but effective: if a hundred different tech publications mention something, it’s probably real. If only five aggregator sites mention it, it’s probably not. I built count_citations() and _is_aggregator() methods into TavilyAdapter to filter out the noise, then implemented a fetch_news() function with configurable citation thresholds.
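The domain-counting idea looks something like the sketch below. The aggregator blocklist and the exact function bodies are assumptions for illustration; the real `TavilyAdapter` methods differ in detail.

```python
from urllib.parse import urlparse

# Illustrative aggregator blocklist -- placeholder domains, not the real list.
AGGREGATOR_DOMAINS = {"news-aggregator.example", "allnews.example"}

def is_aggregator(url: str) -> bool:
    """Check whether a URL's host is a known aggregator."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    return host in AGGREGATOR_DOMAINS

def count_citations(urls: list[str]) -> int:
    """Count unique non-aggregator domains citing a trend."""
    domains = {
        urlparse(u).netloc.lower().removeprefix("www.")
        for u in urls
        if not is_aggregator(u)
    }
    return len(domains)
```

Deduplicating by domain rather than by URL is what makes the count meaningful: ten articles on one site are one vote, not ten.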

Frontend Meets Backend Reality

While the backend team worked on TrendScorer’s calculate_urgency() and calculate_quality() methods, I refactored the frontend to handle this new metadata. The old approach stored source counts as integers; the new one stored actual URLs in arrays. This meant building new components—RecommendationBadge to display those action recommendations and UrgencyQualityIcons to visualize the two-axis scoring. Small change in API, massive improvement in UX.
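The shape change is small but consequential. A hypothetical before/after of the trend payload (field names are assumptions, shown here in Python for consistency with the other sketches):

```python
# Old shape: sources stored as a bare count.
old_trend = {"name": "example-trend", "sources": 5}

# New shape: sources stored as the actual citation URLs.
new_trend = {
    "name": "example-trend",
    "sources": [
        "https://site-one.example/post",
        "https://site-two.example/article",
    ],
}

# The count is still derivable, but the URLs enable richer UI,
# like linking each badge to its evidence.
source_count = len(new_trend["sources"])
```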

The crawler enrichment loop needed adjustment too. Every time we pulled trends from Hacker News, GitHub, or arXiv, we now augmented them with Tavily citation data. No more blind trend counting.
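The enrichment step can be sketched as a loop that decorates each crawled trend with citation data before scoring. `fetch_news` and `count_citations` are passed in as plain callables here to keep the example self-contained; the threshold parameter mirrors the configurable-per-source idea from the post.

```python
# Hypothetical enrichment loop: augment crawled trends with citation
# counts and a validation flag before they reach the scorer.

def enrich_trends(trends, fetch_news, count_citations, threshold=10):
    enriched = []
    for trend in trends:
        urls = fetch_news(trend["name"])
        citations = count_citations(urls)
        enriched.append({
            **trend,
            "citations": citations,
            "validated": citations >= threshold,
        })
    return enriched
```

Keeping validation out of the crawler itself means each source (Hacker News, GitHub, arXiv) can plug in its own threshold without touching the loop.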

The Unexpected Win

Documentation always feels like friction until it saves you hours. I documented the entire approach in TAVILY_CITATION_APPROACH.md and SCORING_V2_PLAN.md, including the pitfalls we discovered: Tavily’s API rate limits, edge cases where aggregators are actually authoritative (hello, Product Hunt), and why citation thresholds needed to be configurable per data source. Future developers—or future me—could now understand why each decision was made.

What We Gained

The trend analyzer transformed overnight. Instead of alerting on everything, it now prioritizes ruthlessly. The recommendation system gives users a clear action hierarchy. Citation validation cuts through noise. When you’re tracking technology trends across the internet, that skeptical eye isn’t a feature—it’s the entire product.

😄 Why do trend analyzers make terrible poker players? They always fold on aggregator pages.

Metadata

Branch:
feat/auth-system
Dev Joke
VS Code: solving a problem you didn't know existed, in a way you don't understand.
