Building the Self-Augmentation Loop: When Your Model Becomes Its Own Data Generator

I was staring at the MetaMath results—82% accuracy on GSM8K with voting, and the loss curve still declining at 3,000 training steps. The problem hit me: we had only scratched the surface of one dataset. The model was learning fast, but we were feeding it the same curated problems over and over. What if, instead of hunting for new external datasets, the model could generate its own training data?
The idea crystallized during a code review session. We had 7,473 problems in GSM8K’s training split. With simple augmentation, such as rephrasing, masking a number and reasoning backward to recover it (the MetaMath team’s FOBAR), and varying the numerical values while keeping the structure, we could multiply that into 36,000 diverse problems. The beauty was that we didn’t need SearXNG or any web scraper running on port 8888. We had everything already.
The plan became a three-stage closed loop. First, push the current MetaMath model further. We’d been training for 3K steps; the loss curve suggested we hadn’t hit diminishing returns yet. I scheduled a full run with 395K problems from MetaMathQA (not just GSM8K, but also MATH for diversity) across 10,000 steps. That’s 3.3 times longer. The target was straightforward: break 80% with greedy decoding, then test voting with N=8 and aim for 88-91%. Record territory.
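To make that plan concrete, here is a rough sketch of what the stage-one run configuration could look like. Only the 10,000-step budget comes from the plan above; the batch size, learning rate, and schedule are assumptions for illustration, not the values we actually used.

```python
# Hypothetical stage-one configuration sketch using Hugging Face TrainingArguments.
# Hyperparameters other than max_steps are assumed placeholders.
from transformers import TrainingArguments

stage_one_args = TrainingArguments(
    output_dir="checkpoints/metamath-stage1",
    max_steps=10_000,               # ~3.3x the original 3K-step run
    per_device_train_batch_size=8,  # assumed; tune to fit GPU memory
    gradient_accumulation_steps=4,  # assumed
    learning_rate=2e-5,             # assumed starting point
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_steps=50,
    save_steps=1_000,
    bf16=True,
)
```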
But the real work was the second stage. I sketched out the self-augmentation pipeline: take each training problem, have the model rephrase it three ways, generate the backward reasoning (what mathematical path led to this problem), and vary the numbers while preserving the structure. No external API calls. No dataset downloads. Just the model and its own problems, recursively improving itself.
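Here is a minimal sketch of that augmentation pass, assuming a `generate` helper that wraps the fine-tuned model’s sampling call. The `augment` function and its prompts are illustrative, not the exact templates from our pipeline.

```python
# Sketch of the stage-two self-augmentation pass (hypothetical helper names).
import random
import re

def augment(problem: str, answer: str, generate) -> list[dict]:
    augmented = []

    # 1. Three rephrasings of the same problem; the answer is unchanged.
    for _ in range(3):
        rephrased = generate(
            "Rewrite this math problem in different words, "
            f"keeping every quantity the same:\n{problem}"
        )
        augmented.append({"question": rephrased, "answer": answer})

    # 2. Backward reasoning: mask one number and ask the model to recover it
    #    from the known answer (FOBAR-style). Naive string replace, fine for a sketch.
    numbers = re.findall(r"\d+", problem)
    if numbers:
        masked = problem.replace(random.choice(numbers), "x", 1)
        backward = (
            f"{masked}\nIf the answer to this problem is {answer}, "
            "what is the value of x?"
        )
        augmented.append({"question": backward, "answer": None})  # solved downstream

    # 3. Vary the numbers while preserving the structure, then re-solve so the
    #    new answer stays consistent with the new values.
    varied = generate(
        "Rewrite this problem with different numbers but the same structure, "
        f"then solve it step by step:\n{problem}"
    )
    augmented.append({"question": varied, "answer": None})  # answer extracted downstream

    return augmented
```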
The third stage—the SearXNG agent—would wait. That was for unlimited data acquisition, feeding the loop continuously. But stages one and two? Those were self-contained. Closed. Independent of infrastructure.
While the training runs spun up, I kept thinking about why this matters. Most ML teams chase bigger, richer datasets. We were doing something different: proving that a focused model could bootstrap its own curriculum. MetaMath had shown the way with their augmentation pipeline. We were taking it inward, making it part of the learning cycle itself.
The voting layer alone was compelling. Eight different sampling passes over the same problem, then majority vote. It’s not elegant, but it works—trading inference cost for accuracy. With a self-augmented training set running in parallel, the model wouldn’t just get better at reasoning; it would learn to reason about reasoning.
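For concreteness, a minimal sketch of that voting layer follows. `sample_solution` is a hypothetical wrapper around one temperature-sampled inference pass, and answer extraction assumes GSM8K’s `#### <number>` final-answer convention.

```python
# Self-consistency voting sketch: N sampled solutions, majority vote on the
# extracted final answer. `sample_solution` is a hypothetical model wrapper.
import re
from collections import Counter

def extract_answer(solution: str) -> str | None:
    match = re.search(r"####\s*(-?[\d,.]+)", solution)
    return match.group(1).replace(",", "") if match else None

def vote(problem: str, sample_solution, n: int = 8) -> str | None:
    answers = []
    for _ in range(n):
        ans = extract_answer(sample_solution(problem, temperature=0.7))
        if ans is not None:
            answers.append(ans)
    # Majority vote: the most frequent final answer wins.
    return Counter(answers).most_common(1)[0][0] if answers else None
```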
And somewhere in that loop, there’s a joke waiting: why are machine learning engineers always drowning in their own data? Because they built the pump themselves. 😄
Metadata
- Session ID: grouped_llm-analisis_20260418_2008
- Branch: master
- Dev Joke: Getting to know Laravel: day 1 is pure delight, day 30 is "why did I even start this?"