BorisovAI

Blog

Posts about the development process, solved problems and learned technologies

Learning · llm-analysis

Training Seed 0: When Your GPU Burns and Your Model Learns

I've been staring at this training run for the past hour, watching the GPU meter sit stubbornly at 100% while 15.7GB of VRAM fills with the weight updates for Seed 0. We're at step 400 out of 500, and honestly, it's working. That might sound anticlimactic, but in machine learning, "working" is a victory worth documenting.

This whole Phase 39 experiment started because we hit a wall. After Phase 38's failures with unfreezing the backbone—we tried QLoRA, we tried GRPO, and everything collapsed into catastrophic forgetting—I realized we were swinging at shadows. The quest for that elusive +20 percentage points toward 94% on GSM8K wasn't going to come from tweaking the same approach. So instead of one big bet, we decided to hedge: run 20 different seeds through the same pipeline and let the data speak louder than our intuitions.

The **LLM Analysis** project forced me to confront something uncomfortable: I'd been overthinking this. My colleague sent over that MiniMax M2.7 paper about "self-evolution," and I spent two hours reading about their agent-level meta-optimization—automatically analyzing errors, modifying configs, evaluating, accepting or reverting. Beautiful work, but it was the wrong kind of self-improvement. They're optimizing prompts and scaffolding; we're trying to optimize weights. Different game entirely.

What struck me hardest was realizing how little separates a breakthrough from a dead end. The **test-time compute scaling** path—chain-of-thought sampling plus a verifier—sits right there in our notes, untouched. We obsessed over weight-level unfreezing because it *felt* like the answer, but we never actually tested whether letting the model think harder before answering might push us past that 94% threshold. Sometimes the tool you need is hiding in the decisions you haven't made yet.

So here's Seed 0, grinding through iterations while my GPU sweats. If this seed hits higher eval metrics than the baseline, we'll know something. If it doesn't, we'll know something else. That's the whole point of the search—not genius intuition, just *signal* from the data. The panel of experts keeps asking, "How do we build a self-improving architecture *and* hit 94% on Qwen 2.5 3B?" Maybe the answer isn't choosing one or the other. Maybe it's admitting that sometimes your GPU does the thinking while you take notes. *And if ASCII silly questions get silly ANSI answers, at least my training curves are deterministic.* 😄
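For what it's worth, the untested test-time compute path sketches out in a few lines. This is a hedged illustration, not our pipeline: `sample` and `verify` are hypothetical stand-ins for the model call and a trained verifier or answer checker.

```typescript
// Best-of-n test-time compute, sketched generically: draw n chain-of-thought
// samples and keep the one the verifier scores highest. No weights change;
// the model just spends more compute per question at inference time.
function bestOfN(
  sample: () => string,                 // hypothetical: one sampled completion
  verify: (candidate: string) => number, // hypothetical: higher score = better
  n: number
): string {
  let best = sample();
  let bestScore = verify(best);
  for (let i = 1; i < n; i++) {
    const candidate = sample();
    const score = verify(candidate);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}
```

Whether this pushes a 3B model past a fixed eval threshold is exactly the kind of question the seed sweep is supposed to answer with data rather than intuition.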

Mar 20, 2026
Learning · trend-analysis

Fixing the Lowercase Monster: How One Function Was Silently Breaking Multilingual Text

I was deep in the **Trend Analysis** project, wrestling with something that seemed simple on the surface but was causing subtle chaos across our i18n pipeline. The issue? A function called `formatClassName` that was supposed to just capitalize the first letter of category names. Sounds harmless, right? It absolutely wasn't.

The culprit was buried in our codebase—a function that didn't just capitalize the first letter; it was **aggressively lowercasing everything else**. When our backend sent us a perfectly formatted title like "React Native Adoption," this function would transform it into "React native adoption." Native, as a proper noun, lost its dignity. On the Russian side, it was even worse: carefully preserved Cyrillic capitalization from our `_enforce_sentence_case()` backend logic was being brutally flattened to lowercase.

I'd been staring at this for two days before the real problem clicked. We have Claude on the backend already doing sentence-case enforcement for Russian and English descriptions. The frontend didn't need to fix what wasn't broken—it just needed to respect what the backend already got right.

So instead of trying to be clever, I simplified the entire approach: **capitalize the first letter, leave everything else untouched**. The new logic was almost embarrassingly straightforward. The first character gets capitalized—*that's it*. Abbreviations like "AI," "LLM," and "API" stay uppercase because they never get lowercased in the first place. Proper nouns like "React" and "Native" survive intact. Russian text keeps its character. English text flows naturally.

Testing the fix felt like watching a weight lift. "финансирование инвестиций в ИИ" now becomes "Финансирование инвестиций в ИИ" instead of "Финансирование инвестиций в ии." "Small Language Models contamination" keeps its emphasis instead of being flattened to "Small language models contamination."

The fix was so simple—three lines of actual logic—that I almost missed how much damage the old approach was doing. The real lesson? Sometimes the best engineering isn't about adding smarter code; it's about removing code that shouldn't exist. I pushed the commit, and suddenly our category display across multiple languages looked **actually correct** for the first time. Programming is 10% science, 20% ingenuity, and 70% getting the ingenuity to work with the science. 😄
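The before-and-after described above can be sketched like this. The function name comes from the post; the exact bodies are my reconstruction of the behavior it describes, not the project's actual code:

```typescript
// Old behavior (the bug): capitalized the first letter but also lowercased
// everything after it, flattening proper nouns, abbreviations, and the
// backend-enforced Cyrillic casing.
function formatClassNameOld(name: string): string {
  return name.charAt(0).toUpperCase() + name.slice(1).toLowerCase();
}

// New behavior: capitalize only the first character, leave the rest alone.
// toLocaleUpperCase handles Cyrillic as well as Latin characters.
function formatClassName(name: string): string {
  if (!name) return name;
  return name.charAt(0).toLocaleUpperCase() + name.slice(1);
}

console.log(formatClassNameOld("React Native Adoption")); // "React native adoption" (broken)
console.log(formatClassName("React Native Adoption"));    // "React Native Adoption"
console.log(formatClassName("финансирование инвестиций в ИИ")); // "Финансирование инвестиций в ИИ"
```

The design point is the same as in the post: the backend already enforces sentence case, so the frontend's only job is the first character.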

Mar 4, 2026