Building ReleaseKit's License Compliance Graph: A Journey Through Open Source Dependencies

When you’re managing a multi-language monorepo with hundreds of transitive dependencies, one question haunts you: are we even legally allowed to ship this? That’s the problem the ReleaseKit team tackled in PR #4705, and the solution they built is genuinely elegant.
The challenge was massive. Dependencies don’t just come from Python—they come from JavaScript workspaces, Rust crates, Dart packages, Java artifacts, Clojure libraries, even Bazel builds. Each ecosystem has its own lockfile format and its own way of expressing versions and resolving the transitive closure. And on top of that, licenses themselves are a nightmare. People write “Apache 2.0” or “Apache License 2.0” or “Apache-2.0”—sometimes all three in the same workspace. Some licenses are compatible with each other, some are not, and the knowledge of which combinations are safe is often tribal, living in somebody’s spreadsheet.
ReleaseKit solved this by building what amounts to a license compiler. Here’s how it works:
First, an SPDX expression parser (spdx_expr.py) tokenizes and evaluates license declarations—handling the AND, OR, and WITH operators that let packages declare dual licensing or exceptions. Think of it as building an AST for legal documents.
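The post doesn’t show the parser internals, but a minimal sketch of what such a parser does might look like this—assuming a simplified grammar where, per the SPDX spec, WITH binds tightest, then AND, then OR. Every name below is illustrative, not the actual spdx_expr.py API:

```python
import re

# Token stream: parentheses, the three operators, and license/exception IDs.
TOKEN = re.compile(r"\(|\)|AND|OR|WITH|[A-Za-z0-9.+-]+")

def tokenize(expr):
    return TOKEN.findall(expr)

def parse(tokens):
    """OR has the lowest precedence, so it sits at the top of the grammar."""
    node, rest = parse_and(tokens)
    while rest and rest[0] == "OR":
        right, rest = parse_and(rest[1:])
        node = ("OR", node, right)
    return node, rest

def parse_and(tokens):
    node, rest = parse_atom(tokens)
    while rest and rest[0] == "AND":
        right, rest = parse_atom(rest[1:])
        node = ("AND", node, right)
    return node, rest

def parse_atom(tokens):
    if tokens[0] == "(":
        node, rest = parse(tokens[1:])
        return node, rest[1:]  # drop the closing ")"
    lic, rest = tokens[0], tokens[1:]
    if rest and rest[0] == "WITH":  # exceptions bind tightest: "GPL WITH Classpath"
        return ("WITH", lic, rest[1]), rest[2:]
    return lic, rest

def parse_spdx(expr):
    node, rest = parse(tokenize(expr))
    if rest:
        raise ValueError(f"trailing tokens: {rest}")
    return node
```

The output is a nested-tuple AST, which is all the compatibility engine needs to evaluate dual licensing: an OR node is satisfied if either branch is.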
Then comes the real magic: a graph-based compatibility engine. It maintains a knowledge base of 167 licenses and 42 compatibility rules, loaded from curated data files. Before shipping, the system traverses the entire dependency tree (extracted from uv.lock, package-lock.json, Cargo.lock, etc.) and checks every single license combination against this graph.
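The actual rule format lives in ReleaseKit’s curated data files, which the post doesn’t reproduce. As a sketch, you can model the knowledge base as a set of directed “may be combined with” edges and check every pairwise combination in the dependency tree—the rule set below is invented for illustration:

```python
# Hypothetical rule set: (a, b) means code under license a may be
# combined with code under license b. Real data would come from files.
COMPAT_RULES = {
    ("MIT", "Apache-2.0"),
    ("Apache-2.0", "MIT"),
    ("MIT", "GPL-3.0-only"),  # permissive code may flow into GPL-3.0 works
}

def compatible(a, b):
    return a == b or (a, b) in COMPAT_RULES

def check_tree(dep_licenses):
    """dep_licenses: {package_name: license_id}, e.g. extracted from a lockfile.
    Returns the incompatible pairs instead of failing on the first one."""
    violations = []
    pkgs = sorted(dep_licenses)
    for i, p in enumerate(pkgs):
        for q in pkgs[i + 1:]:
            la, lb = dep_licenses[p], dep_licenses[q]
            if not (compatible(la, lb) or compatible(lb, la)):
                violations.append((p, la, q, lb))
    return violations
```

Collecting all violations in one pass, rather than stopping at the first, is what makes the interactive fixer described next possible.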
When something doesn’t match? Instead of failing silently, the team built an interactive fixer. Run releasekit licenses --fix and you get a guided session where you can exempt problematic licenses, add them to an allowlist, override decisions, or skip them entirely—all with your choices preserved in releasekit.toml.
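The post doesn’t show the releasekit.toml schema, so the fragment below is purely hypothetical—the table and key names are invented to illustrate the kinds of decisions the fixer might persist:

```toml
# Hypothetical layout — key names are invented for illustration.
[licenses]
allowlist = ["MIT", "Apache-2.0", "BSD-3-Clause"]

[licenses.exemptions]
# package = reason the violation is accepted
"some-internal-tool" = "build-time only, not distributed"

[licenses.overrides]
# package = corrected SPDX expression for bad upstream metadata
"legacy-lib" = "GPL-2.0-only WITH Classpath-exception-2.0"
```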
The test coverage is serious: over 1,000 lines of test code across 11 test files, covering everything from fuzzy SPDX resolution (which uses a five-stage pipeline: exact match → alias → normalization → prefix matching → Levenshtein distance) to end-to-end compatibility matrices.
What impressed me most? The five-stage fuzzy resolver. When someone writes “Apache 2” and the system expects “Apache-2.0”, it doesn’t just fail—it normalizes, searches aliases, and if that doesn’t work, it calculates string distance. This is how you build systems that work with real-world messy data.
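The five stages described above compose naturally as a fall-through pipeline. Here is a sketch under assumed data—the known IDs, the alias table, and the distance cutoff are all invented for this example:

```python
# Invented knowledge base; the real one holds 167 licenses plus curated aliases.
KNOWN = ["Apache-2.0", "MIT", "GPL-3.0-only", "BSD-3-Clause"]
ALIASES = {"apache 2": "Apache-2.0", "apache license 2.0": "Apache-2.0"}

def normalize(s):
    """Lowercase, treat hyphens as spaces, collapse whitespace."""
    return " ".join(s.lower().replace("-", " ").split())

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[-1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def resolve(raw):
    # Stage 1: exact match against known SPDX IDs.
    if raw in KNOWN:
        return raw
    norm = normalize(raw)
    # Stage 2: curated alias table.
    if norm in ALIASES:
        return ALIASES[norm]
    # Stage 3: match on normalized forms.
    for lic in KNOWN:
        if normalize(lic) == norm:
            return lic
    # Stage 4: unambiguous prefix match ("GPL-3.0" -> "GPL-3.0-only").
    prefixed = [l for l in KNOWN if normalize(l).startswith(norm)]
    if len(prefixed) == 1:
        return prefixed[0]
    # Stage 5: smallest edit distance, with a cutoff to avoid wild guesses.
    best = min(KNOWN, key=lambda l: levenshtein(normalize(l), norm))
    return best if levenshtein(normalize(best), norm) <= 3 else None
```

Each stage is cheap and strictly more permissive than the last, so clean input never pays for fuzzy matching, and truly unrecognizable input still fails (returns None) rather than silently matching something wrong.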
The whole system integrates into the CI pipeline as a simple command: releasekit licenses --check. No more wondering if your dependencies are compatible. You have a machine that knows.
And yes, I’d tell you a joke about NAT—but I’d have to translate it to six different license expressions to make sure I had permission. 😄
Metadata
- Branch: main
- Dev joke: Pandas: solving a problem you didn’t know you had, in a way you don’t understand.