BorisovAI

Saving the Web's Lost Games: How We Built an Automated Preservation Pipeline

Last month, while working on the Trend Analysis project, I realized something sobering: browser-based games and animations are vanishing from the internet faster than we can catalog them. Flash games from the early 2000s, interactive animations that shaped internet culture—all disappearing as platforms deprecate and servers shut down.

That’s when it clicked. Instead of accepting this digital loss, we could build something to fight it.

The core challenge was elegant in its simplicity but brutal in execution: identify archival candidates automatically, fetch them from web archives, and preserve them intelligently. Manually reviewing thousands of potential assets wasn’t feasible. We needed Claude’s API to do the heavy lifting.

Here’s what we built: a classification pipeline in Python that sends structured metadata about candidate artifacts—file signatures, historical patterns, preservation rarity scores—to Claude. The model evaluates each one and returns a confidence score for whether it’s worth archiving. No human bottleneck, no guesswork.
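As a rough sketch of that flow: we hand the model structured metadata and ask for a machine-parseable verdict. The field names below (`file_signature`, `rarity_score`) and the JSON reply shape are illustrative assumptions, not the project's actual schema, and the network call itself is omitted:

```python
import json

def build_classification_prompt(candidate: dict) -> str:
    """Wrap candidate metadata in an instruction asking the model for a
    JSON verdict. Field names are illustrative, not the real schema."""
    return (
        "You are evaluating a web artifact for archival. "
        'Reply with JSON: {"archive": bool, "confidence": 0..1}.\n'
        f"Candidate metadata:\n{json.dumps(candidate, indent=2)}"
    )

def parse_confidence(reply: str) -> float:
    """Extract the confidence score from the model's JSON reply,
    clamping to [0, 1] and defaulting to 0.0 on malformed output
    so one bad response never stalls the batch."""
    try:
        verdict = json.loads(reply)
        return max(0.0, min(1.0, float(verdict.get("confidence", 0.0))))
    except (json.JSONDecodeError, TypeError, ValueError):
        return 0.0

candidate = {"file_signature": "FWS", "first_seen": "2003-07-12", "rarity_score": 0.91}
prompt = build_classification_prompt(candidate)
score = parse_confidence('{"archive": true, "confidence": 0.87}')
```

Defaulting to zero confidence on garbage output matters more than it looks: at thousands of candidates per run, the pipeline has to degrade gracefully rather than crash on a single malformed reply.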

The technical decisions got interesting fast. Python’s asyncio became non-negotiable. We’re potentially processing thousands of requests across archive APIs and our own classification system. Without proper async handling and rate-limit throttling, we’d either bottleneck the infrastructure or get banned from archival sources. Parallel batch processing became our lifeline—respecting API limits while maximizing throughput.
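The throttling pattern is roughly this: a semaphore caps concurrent requests while `asyncio.gather` keeps the batch parallel. The fetch here is a stand-in (a real version would call an archive API with an HTTP client), and the concurrency limit is a made-up number:

```python
import asyncio

async def fetch_candidate(url: str, sem: asyncio.Semaphore) -> str:
    """Placeholder fetch; the semaphore caps in-flight requests so we
    stay under archive-API rate limits instead of getting banned."""
    async with sem:
        await asyncio.sleep(0.01)  # simulate network latency
        return f"payload:{url}"

async def process_batch(urls: list[str], max_concurrent: int = 5) -> list[str]:
    """Launch every fetch at once; the shared semaphore ensures only
    max_concurrent of them are actually talking to the network."""
    sem = asyncio.Semaphore(max_concurrent)
    return await asyncio.gather(*(fetch_candidate(u, sem) for u in urls))

urls = [f"https://example.org/asset/{i}" for i in range(20)]
results = asyncio.run(process_batch(urls))
```

The nice property is that throughput and politeness become one tunable knob: raise `max_concurrent` until either our infrastructure or the archive's rate limit pushes back.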

Storage architecture forced us to think practically. Should we store actual game binaries in SQLite with BLOB fields? That seemed insane at scale. Instead, we implemented a two-tier system: metadata and thumbnail previews stay in the database, full assets get content-addressed storage with smart caching. This lets us maintain reference integrity without drowning in storage costs.
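A minimal sketch of the two tiers, assuming a SHA-256 content address and a simple sharded directory layout (both details are illustrative, not our exact schema): metadata lands in SQLite, bytes land on disk under their own hash, and identical assets dedupe for free:

```python
import hashlib
import pathlib
import sqlite3
import tempfile

def store_asset(db: sqlite3.Connection, blob_root: pathlib.Path,
                name: str, data: bytes) -> str:
    """Write the asset to content-addressed storage (path derived from
    its SHA-256) and record only metadata in SQLite."""
    digest = hashlib.sha256(data).hexdigest()
    path = blob_root / digest[:2] / digest[2:]  # shard by first hex byte
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # dedupe: identical bytes share one file
        path.write_bytes(data)
    db.execute(
        "INSERT OR IGNORE INTO assets(sha256, name, size) VALUES (?, ?, ?)",
        (digest, name, len(data)),
    )
    return digest

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE assets(sha256 TEXT PRIMARY KEY, name TEXT, size INTEGER)")
root = pathlib.Path(tempfile.mkdtemp())
data = b"FWS\x05demo-bytes"  # fake Flash header + payload for illustration
digest = store_asset(db, root, "pixel-quest.swf", data)
```

Because the database row holds only the hash, renaming or relocating the blob store never breaks referential integrity; the hash is the reference.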

One optimization path we explored: Binary Neural Networks (BNNs). Traditional classifiers use full-precision weights, which burns CPU and energy. BNNs constrain weights to binary values, dramatically reducing computational overhead. For a pipeline running daily collection cycles across thousands of candidates, that efficiency gain has tangible value.
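The core BNN trick fits in a few lines. This is a toy illustration, not our classifier: weights collapse to ±1 via the sign function, and the dot product becomes a sum of ±1 terms, which hardware can implement as XNOR plus popcount instead of floating-point multiplies:

```python
def binarize(weights: list[float]) -> list[int]:
    """Constrain full-precision weights to {-1, +1} via sign()."""
    return [1 if w >= 0 else -1 for w in weights]

def bnn_score(weights_b: list[int], inputs_b: list[int]) -> int:
    """With both operands binary, the dot product is a sum of ±1 —
    no floating-point multiplies needed at inference time."""
    return sum(w * x for w, x in zip(weights_b, inputs_b))

w = binarize([0.7, -0.2, 0.05, -1.3])  # -> [1, -1, 1, -1]
x = [1, 1, -1, -1]                     # binarized input features
score = bnn_score(w, x)
```

The trade-off is accuracy for throughput; for a coarse "worth a closer look?" pre-filter ahead of the Claude classification step, that trade is often worth making.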

The work sits in our refactor/signal-trend-model branch, where trend analysis itself helps us understand which media types are disappearing fastest. That feedback loop proved invaluable—the data tells us what to prioritize.

What started as “let’s not lose these games” evolved into something bigger: a recognition that digital preservation is infrastructure, not an afterthought. Every day we don’t act, cultural artifacts become unrecoverable.

And honestly? The irony isn’t lost on me. We’re using cutting-edge AI and distributed systems to save decades-old games. Maven might judge our dependency tree, Stack Overflow might have opinions about our architecture choices, but at least our code won’t be forgotten 😄

Metadata

Session ID:
grouped_C--projects-bot-social-publisher_20260219_1838
Branch:
main
Dev Joke
If Vue works — don't touch it. If it doesn't work — don't touch it either, it'll get worse.
