BorisovAI

Blog

Posts about the development process, problems solved, and technologies learned

Learning · llm-analisis

Training Seed 0: When Your GPU Burns and Your Model Learns

I've been staring at this training run for the past hour, watching the GPU meter sit stubbornly at 100% while 15.7 GB of VRAM fills with the weight updates for Seed 0. We're at step 400 of 500, and honestly, it's working. That might sound anticlimactic, but in machine learning, "working" is a victory worth documenting.

This whole Phase 39 experiment started because we hit a wall. After Phase 38's failures with unfreezing the backbone (we tried QLoRA, we tried GRPO, everything collapsed into catastrophic forgetting), I realized we were swinging at shadows. The quest for that elusive +20 percentage points toward 94% on GSM8K wasn't going to come from tweaking the same approach. So instead of one big bet, we decided to hedge: run 20 different seeds through the same pipeline, and let the data speak louder than our intuitions.

The **LLM Analysis** project forced me to confront something uncomfortable: I'd been overthinking this. My colleague sent over that MiniMax M2.7 paper about "self-evolution," and I spent two hours reading about their agent-level meta-optimization: automatically analyzing errors, modifying configs, evaluating, accepting or reverting. Beautiful work, but it was the wrong kind of self-improvement. They're optimizing prompts and scaffolding; we're trying to optimize weights. A different game entirely.

What struck me hardest was realizing how little separates a breakthrough from a dead end. The **test-time compute scaling** path (chain-of-thought sampling plus a verifier) sits right there in our notes, untouched. We obsessed over weight-level unfreezing because it *felt* like the answer, but we never actually tested whether letting the model think harder before answering might push us past that 94% threshold. Sometimes the tool you need is hiding in the decisions you haven't made yet.

So here's Seed 0, grinding through iterations while my GPU sweats. If this seed hits higher eval metrics than the baseline, we'll know something. If it doesn't, we'll know something else. That's the whole point of the search: not genius intuition, just *signal* from the data. The panel of experts keeps asking, "How do we build a self-improving architecture *and* hit 94% on Qwen 2.5 3B?" Maybe the answer isn't choosing one or the other. Maybe it's admitting that sometimes your GPU does the thinking while you take notes. *And if ASCII silly questions get silly ANSI answers, at least my training curves are deterministic.* 😄
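That untouched test-time compute path can be sketched with simple self-consistency voting: sample several chain-of-thought completions and take the majority answer, using the vote share as a rough verifier-style confidence. This is a minimal sketch, not our pipeline; `sample_fn` and the toy sampler below are hypothetical stand-ins for a real model call.

```python
import itertools
from collections import Counter

def self_consistency_answer(sample_fn, question, k=8):
    """Sample k chain-of-thought completions and majority-vote the final answers.
    sample_fn(question) stands in for one full model generation; the winning
    answer's vote share doubles as a crude confidence score."""
    answers = [sample_fn(question) for _ in range(k)]
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / k

# deterministic toy sampler: 3 out of every 4 samples agree on "42"
_samples = itertools.cycle(["42", "42", "42", "41"])
answer, confidence = self_consistency_answer(lambda q: next(_samples), "6*7=?", k=8)
# answer == "42", confidence == 0.75
```

In a real run, `sample_fn` would decode with nonzero temperature so the samples actually disagree on hard problems; a trained verifier could replace the raw vote share.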

Mar 20, 2026
Bug Fix · llm-analisis

Choosing the Right Seed: When Initialization Becomes Strategy

We'd hit a wall. After weeks of pushing the **LLM Analysis** project forward, our attempts to improve model performance had stalled. Every tweak to the architecture seemed to plateau around 76%, and we couldn't figure out why. Then one of our experts suggested something counterintuitive: *maybe the initialization dependency wasn't a bug, maybe it was a feature we hadn't learned to exploit yet*.

The turning point came when we stopped treating seed selection as noise and started treating it as a first-class optimization problem. **Claude** was helping us orchestrate the experiments, and we realized we could systematically test different initialization seeds across our **Orchestra-MoE** model. The theory was compelling: if we ran 20 independent training runs with different seeds, the variance in performance would give us a window into what was actually happening inside the network.

Our panelists, researchers specializing in initialization theory and practical deep learning, all agreed on the same direction. One pointed to the statistical insight that the expected maximum performance across N runs follows E[max(N)] ≈ mean + std × √(2 ln N). For 20 runs, this predicted we could push performance to roughly **77.3%**, nearly 1.4 percentage points above the baseline. It wasn't revolutionary, but it was real.

What sold us on the approach, though, was the *practical math*. We'd spent over 85 hours experimenting with different architectural phases without meaningful gains. Running 20 seeds would take only 10 hours on GPU. The ROI was undeniable.

The strategy had layers. First, we'd select the best seed based on validation performance, then validate it honestly on our full test set of 1,319 problems rather than cherry-picking. Second, we'd combine the top three seeds using ensemble voting; different initializations make different mistakes, and majority voting would smooth out the quirks. Third, we could layer this with data-dependent initialization techniques like SVD-based seed selection, potentially reducing variance even further.

We also discovered synergies with other work in progress: combining seed selection with our routing mechanism gave us an extra 0.2 percentage points, and curriculum learning with the best seed had already reached 79% in earlier experiments.

The lesson wasn't just about statistics or architecture. It was about **perspective shift**. What looked like a limitation (that results depended heavily on how we started the model) turned out to be a lever we hadn't pulled. By embracing the variance instead of fighting it, we'd found a path forward that was both theoretically sound and practically efficient. We wrote the batch script that night, set it running across 20 seeds, and finally felt that familiar sensation: *momentum*.
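The panelists' formula is easy to check numerically. A minimal sketch, assuming a ~76% baseline and a per-seed standard deviation of about 0.55 pp (my assumption for illustration; the measured std isn't stated above), reproduces the quoted ~77.3% for 20 runs:

```python
import math

def expected_max(mean, std, n):
    """Gaussian approximation for the expected best of n i.i.d. runs:
    E[max(N)] ≈ mean + std * sqrt(2 * ln(N))."""
    return mean + std * math.sqrt(2 * math.log(n))

best_of_20 = expected_max(76.0, 0.55, 20)  # ≈ 77.35
```

Note how slowly the gain grows: doubling to 40 seeds buys only about 0.1 pp more under the same assumptions, which is why we stopped at 20.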

Mar 20, 2026
New Feature · scada-coating

Building the Open SCADA Revolution: From Tagat to Independence

When I finished my two-year tenure as the lead developer at Tagat, one thought consumed me: **why does the electroplating industry remain locked into proprietary SCADA systems?** Thousands of coating lines across the globe run on closed-source software, each facility dependent on a single vendor for updates, support, and innovation. That frustration became the fuel for BorisovAI.

I assembled a team with the same hunger for change. Together, we didn't just talk about an alternative; we **built one**. Our SCADA system for electroplating is production-ready, battle-tested, and fundamentally different. It runs on open standards, which means manufacturers gain something they've never had: *independence from vendor lock-in*.

The technical challenge was immense. Electroplating requires real-time control of temperature, current density, pH levels, and chemical composition across multiple tanks. One miscalibration cascades into waste and equipment damage. We engineered redundancy into every layer, from sensor input validation to fail-safe switching protocols. The system communicates via standard APIs, integrates with existing PLCs, and logs everything in a transparent database. No black boxes. No mystery bugs that only the vendor understands.

But building the software solved only half the puzzle. The real bottleneck? **We needed a manufacturing partner willing to take a risk on open-source SCADA.** That's where the partnership proposal came in. We approached leading electroplating equipment manufacturers with a simple offer: *your facility becomes our proof of concept*. You get a turnkey system that's already proven. We get the real-world validation and deployment case study we desperately need.

The economics are compelling. Traditional vendors charge licensing fees and lock customers into service contracts. Our model flips that: the software is free and open. Manufacturers profit through independence, customization freedom, and the knowledge that their investment in process optimization stays *their* investment, not licensed intellectual property they'll lose if the vendor goes under.

What we're proposing isn't just a technical upgrade; it's a structural shift. One coating line becomes two. Two become ten. Suddenly, the electroplating industry has options. That's the revolution we're building.

---

*The glass isn't half-full or half-empty—it's twice as big as it needs to be. Same with proprietary SCADA: oversized prices for undercapacity innovation.* 😄

Mar 18, 2026
Bug Fix · llm-analisis

Hunting the 79% Signal: When Clean Data Beats Dirty Shortcuts

I was staring at Phase 29a's numbers when something caught my eye. The peak accuracy on GSM8K hit **79.3%**, but there was a problem: I couldn't replicate it. The intermediate evaluation data was missing, the training logs were patchy, and I had no idea which 150 tasks out of 500 had actually pushed the model over that threshold. It felt like chasing a ghost.

The culprit? Dirty data. Phase 29a had mixed in curriculum-ordered examples without cleaning them first, and while the peak looked impressive, the signal was buried under noise. By the time we hit 500 tasks, the accuracy collapsed to 73.0%. That's a 6.3 percentage point drop from peak, a classic sign that something fundamental was wrong.

So I decided to rebuild from scratch with Phase 30b. This time, I committed to **clean data first**. I stripped out the curriculum scheduling, removed the intermediate hacks, and ran the exact same GSM8K benchmark with proper tracking at every 50-task checkpoint. The goal was simple: if that 79% signal was real, it should reproduce. If it was noise, I needed to know.

The results came back, and my instinct was right. Phase 30b hit **79.0% at n=200**, just 0.3 points below 29a's peak, despite using fundamentally different data. But here's what mattered more: the final score at 500 tasks was **75.8%**, not 73.0%. That's a **2.8 percentage point improvement** just from cleaning the data. The perplexity dropped to 2.14. The curve stayed smooth all the way down, with no sudden collapses. The signal was reproducible. It was *real*.

What surprised me most wasn't the peak; it was the shape of the degradation. From 79.0% down to 75.8% is only a 3.2pp drop, compared to the 6.3pp cliff in 29a. Clean data meant the model's confidence stayed calibrated even as it learned more examples. It wasn't forgetting earlier lessons; it was integrating them.

But there's a catch: Phase 30b still sits below **24a's 76.8%** when you look at the full run. The curriculum approach helps on the first 200 tasks, then starts hurting. That tells me the strategy itself isn't the problem; it's *how* we're applying it. We need selective curriculum, not blanket curriculum.

Next step? Phase 30a: a diagnostic baseline that tracks **which specific tasks** 30b solves better or worse than the clean baseline. Once I have that problem-level granularity, I can design a smarter curriculum that knows when to order examples and when to let randomness win. For now, though, I've got my GO-signal: peak accuracy above 79%, final accuracy above 75%, and reproducibility that didn't exist before. Clean data wins. It always does. And why did the Python data scientist get arrested at customs? She was caught trying to import pandas! 😄
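The 50-task checkpoint tracking that made 30b reproducible boils down to a small harness like this; `solve_fn` is a hypothetical callable standing in for the full model-plus-grader run, and the per-problem log is exactly what the planned 30a diagnostics would consume:

```python
def evaluate_with_checkpoints(solve_fn, problems, every=50):
    """Run an eval set, logging cumulative accuracy at fixed checkpoints and
    a per-problem pass/fail record for later task-level diagnostics."""
    per_problem, checkpoints = [], {}
    for i, problem in enumerate(problems, start=1):
        per_problem.append(bool(solve_fn(problem)))
        if i % every == 0:
            checkpoints[i] = sum(per_problem) / i
    return checkpoints, per_problem

# toy run: a "model" that only solves even-numbered problems
cps, log = evaluate_with_checkpoints(lambda p: p % 2 == 0, range(100))
# cps == {50: 0.5, 100: 0.5}
```

Comparing two runs is then just a diff over the two `per_problem` lists: any index where 30b fails and the clean baseline passes is a candidate task the curriculum is actively hurting.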

Mar 4, 2026
New Feature · speech-to-text

Choosing the Right Whisper Model When Every Millisecond Counts

I was deep in the weeds of a Speech-to-Text project when a comment came in: *"Have you tested the HuggingFace Whisper large-v3 Russian finetuned model?"* It was a fair question. The model showed impressive metrics—6.39% WER on Common Voice 17, significantly beating the original Whisper's 9.84%. On paper, it looked like a slam dunk upgrade. So I did what any engineer should: I dug into the actual constraints of what we were building. The project had a hard requirement I couldn't negotiate around: **sub-one-second latency for push-to-talk input**. That's not "nice to have"—that's the user experience. The moment speech recognition lags behind what someone just said, the interface feels broken. I pulled the specs. The finetuned model is based on Whisper large-v3, which means it inherited the same 3 GB footprint and 1.5 billion parameters. A finetuning job doesn't shrink the model; it only adjusts weights. On my RTX 4090 test rig, the original large-v3 was clocking 2.30 seconds per utterance. The Russian finetuned version? Same architecture, same inference time ballpark. On CPU? 10–15 seconds. Completely out of bounds. Meanwhile, I'd already benchmarked **GigaAM v3-e2e-rnnt**, a smaller RNN-T model purpose-built for low-latency scenarios. It was hitting 3.3% WER on my actual dataset—only half a percentage point worse than the finetuned Whisper—and doing it in 0.66 seconds on CPU. Even accounting for the fact that the finetuned Whisper might perform better on my data than on Common Voice, I was still looking at roughly **3–4× the latency for marginal accuracy gains**. This is where real-world constraints collide with benchmark numbers. The HuggingFace model is genuinely good work—if your use case is batch transcription with GPU available, or offline processing where speed doesn't matter, it's worth every look. But for interactive, real-time push-to-talk? 
**Smaller, purpose-built models win on both accuracy and speed.** I wrote back thanking them for the suggestion, explained the tradeoffs, and stayed with GigaAM. No regrets. Sometimes the best engineering decision isn't picking the flashiest model—it's picking the one that actually fits your constraints. And hey, speaking of models and networks—I've got a really good UDP joke, but I'm not sure you'll get it. 😄
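For what it's worth, the per-utterance numbers above come down to simple wall-clock timing. A minimal sketch of that kind of harness, where `transcribe` is a stand-in for any STT call (not GigaAM's or Whisper's actual API):

```python
import time

def mean_latency(transcribe, audio, runs=5):
    """Average wall-clock seconds per call, after one warm-up call."""
    transcribe(audio)  # warm-up excludes model load / first-call overhead
    start = time.perf_counter()
    for _ in range(runs):
        transcribe(audio)
    return (time.perf_counter() - start) / runs

def fits_push_to_talk(latency_s, budget_s=1.0):
    """The hard requirement here: sub-one-second turnaround."""
    return latency_s < budget_s

print(fits_push_to_talk(0.66))  # True: GigaAM's CPU number fits
print(fits_push_to_talk(2.30))  # False: large-v3 on GPU does not
```

The warm-up call matters: first-call overhead (model load, kernel compilation) would otherwise poison the average.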

Mar 4, 2026
New Feature · borisovai-site

Tuning Whisper for Russian: The Real-Time Recognition Challenge

I was deep in the ScribeAir project—building real-time speech recognition that had to work in under a second per audio chunk. The bottleneck wasn't where I expected it. Everyone kept pointing me toward bigger, better models. Someone mentioned `whisper-large-v3-russian` from Hugging Face, finetuned on Common Voice 17.0, with impressive WER improvements (9.84 down to 6.39). Sounds like a slam dunk, right? Better accuracy, Russian-optimized, problem solved. But here's where the constraints bit back. The full `whisper-large-v3` model is 1.5B parameters. On CPU inference, that's not a milliseconds problem—it's a seconds problem. I had a hard real-time budget: roughly **1 second per audio chunk**. The finetuned Russian model, while phenomenal for accuracy, didn't magically shrink. It was still the same size under the hood, just with weights adjusted for Cyrillic phonetics and Russian linguistic patterns. No distillation, no architecture compression—just better training data. I had to make a choice: chase the accuracy dragon or respect the physics of the system. That's when I pivoted to **distil-whisper**. It's radically smaller—a genuine distillation of the original Whisper architecture, stripped down to fit the real-time constraint. The tradeoff was obvious: I'd lose some of that Russian-specific fine-tuning, but I'd gain the ability to actually ship something that processes audio in real time on consumer hardware. The decision crystallized something I'd been wrestling with: **in production systems, the perfect model that can't run fast enough is just as useless as a broken model.** The finetuned Russian Whisper is genuinely impressive research—it shows what's possible when you invest in language-specific training. But it lives in a different problem space than ScribeAir. If I were building offline batch transcription, a content moderation service, or something where latency wasn't the primary constraint, that Russian finetuned model would be the obvious choice. 
For real-time streaming, where every millisecond counts and the user is waiting for output *now*, distil-whisper was the practical answer. The lesson stuck with me: **don't optimize for the metrics you *wish* mattered—optimize for the constraints that actually exist.** Accuracy is beautiful. Speed is infrastructure. Both matter. But in production, speed often wins.
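Another way to frame that budget is the real-time factor (RTF): processing time divided by audio duration, which must stay below 1.0 or a streaming pipeline falls behind. A quick sketch with illustrative numbers, not exact benchmarks:

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """RTF < 1.0 means the recognizer keeps up with incoming audio."""
    return processing_s / audio_s

# Illustrative timings for a 1-second chunk:
print(real_time_factor(0.4, 1.0))  # 0.4: streams comfortably
print(real_time_factor(2.3, 1.0))  # 2.3: falls behind on every chunk
```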

Mar 4, 2026
New Feature · llm-analisis

The Hidden Peak: Why We Almost Missed Our Best Accuracy Score

I was staring at `results.json` when something felt wrong. Our **LLM Analysis** project had just completed Phase 29b, and the final accuracy number looked... unremarkable. But I'd noticed something in the intermediate logs that wouldn't leave me alone: a spike at **79.3%** that vanished by the end of the run. The culprit? Our `eval_gsm8k()` function was only recording the final accuracy number. We'd built the entire evaluation pipeline around a single verdict—the last checkpoint, the ultimate truth. But mathematical models don't work that way. They *plateau*, they *spike*, they *crash*. We were missing the entire story. Here's what happened: I was reviewing the stdout logs (the ones we don't normally save) and spotted that our curriculum-trained variant hit 79.3% accuracy on 150 GSM8K tasks—a **+4-percentage-point improvement** over any previous experiment on the same checkpoint. That's massive in the LLM world. But because we only saved the final number, the `results.json` looked like just another run. The peak was invisible. The fix seemed obvious in hindsight. I updated the `eval_gsm8k()` function across both `train_exp29a.py` and `train_exp29b.py` to return not just the final accuracy, but an **`intermediate` array**—accuracy measurements every 50 tasks—and a **`peak` object** capturing the maximum accuracy and when it occurred. Same function, smarter output. But this wasn't really a coding fix. It was a *philosophy* shift. We'd been thinking like engineers—*optimize for the final metric*—when we should've been thinking like researchers—*track the trajectory*. The intermediate numbers tell you *which approach works for which problem subset*. They tell you whether a method is stable or lucky. They tell you *why* one approach outperforms another. I added a critical note to `MEMORY.md`: **"КРИТИЧНО: Промежуточные eval данные"** (Critical: Intermediate eval data). Because this will happen again.
Someone will optimize for the headline number and miss the real insight hiding in the curves. The irony? The joke in the debugging world goes: *"The six stages are: that can't happen, that doesn't happen on my machine, that shouldn't happen, why does that happen, oh I see, how did that ever work?"* We'd been stuck at stage 3—thinking our 79.3% spike "shouldn't happen"—when we should've been asking stage 4: why *does* it happen? The curriculum data is giving us a signal on specific task subsets. Some problems love structure; others suffer from it. That's not noise. That's the answer. Now we move to Phase 29c with this knowledge: **track everything, trust nothing at face value, and always ask what the numbers are really hiding.**
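The reshaped return value looks roughly like this. A sketch only: the real `eval_gsm8k()` lives in `train_exp29a.py`/`train_exp29b.py`, and the model call and task format here are stubbed:

```python
def eval_gsm8k(model, tasks, checkpoint_every=50):
    """Return final accuracy plus the trajectory, not just the last number."""
    correct = 0
    intermediate = []
    for i, task in enumerate(tasks, start=1):
        correct += int(model(task["question"]) == task["answer"])
        if i % checkpoint_every == 0:
            intermediate.append({"tasks": i, "accuracy": correct / i})
    peak = max(intermediate, key=lambda p: p["accuracy"]) if intermediate else None
    return {
        "final_accuracy": correct / len(tasks),
        "intermediate": intermediate,  # accuracy every `checkpoint_every` tasks
        "peak": peak,                  # where the run actually topped out
    }
```

With this shape, a 79.3% spike at task 150 survives into `results.json` even when the final aggregate is lower.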

Mar 4, 2026
New Feature · llm-analisis

The 79.3% Peak We Almost Missed: Why Intermediate Data Matters

We were drowning in numbers. **Phase 29a** of our LLM curriculum learning experiment had completed, and like always, I went straight to check the accuracy numbers. A **79.3%** in the console output jumped out at me—a stunning improvement over the baseline. I felt the familiar rush: breakthrough moment. Then reality set in. The problem wasn't that we *got* 79.3%. The problem was that we *almost didn't see it*. Here's what happened: our `eval_gsm8k()` function was printing intermediate results every 50 GSM8K problems directly to stdout. The model achieved **119 correct answers out of 150** on the curriculum-selected subset—a crisp 79.3%. But the function only returned a final aggregate number to the results JSON. We had metrics, sure, but we had architecture blindness. The curriculum learning pipeline was evaluating on curated problem sets, reporting aggregate accuracy, and we were reading the digest instead of analyzing the signal. When I dug into the stdout logs afterward, the pattern became visible: the curriculum data helped dramatically on certain problem categories while actively *harming* performance on others. The remaining 350 general GSM8K problems showed only 70.3% accuracy. Curriculum isn't magic—it's direction. And we weren't capturing the directional information. **The fix was architectural, not mathematical.** I refactored `eval_gsm8k()` to return an `intermediate` array alongside the final result. Now every 50-problem checkpoint gets logged as a structured object: problem count, accuracy at that point, and the precise subset being evaluated. No more stdout archaeology. No more reading printed logs like ancient texts. This isn't just about not missing peaks. It's about being able to *explain* them. When curriculum learning works, you want to know *which parts* worked. When it fails, you need the granular data to debug.
We were optimizing blind, tweaking parameters based on a single final number while the real story—the inflection points, the divergence between curriculum and general problems—lived only in console output that scrolled past and vanished. The joke among engineers is that four of us walk into a car that won't start. The IT engineer's solution? "Get out and get back in." Sometimes that's exactly what debugging requires: stepping out, restarting, and changing where you're looking. We weren't looking at intermediate checkpoints. Now we are.
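A quick back-of-the-envelope check shows how a subset peak disappears into an aggregate. Assuming the split described here: 150 curriculum tasks with 119 correct, and the remaining 350 general tasks at 70.3%:

```python
subset_acc = 119 / 150                 # the 79.3% peak on the curriculum subset
general_correct = round(0.703 * 350)   # about 246 correct on the general tasks
overall = (119 + general_correct) / 500
print(f"{subset_acc:.1%} on the subset, {overall:.1%} overall")
# 79.3% on the subset, 73.0% overall: the peak vanishes into the aggregate
```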

Mar 4, 2026
Learning · trend-analisis

Fixing the Lowercase Monster: How One Function Was Silently Breaking Multilingual Text

I was deep in the **Trend Analysis** project, wrestling with something that seemed simple on the surface but was causing subtle chaos across our i18n pipeline. The issue? A function called `formatClassName` that was supposed to just capitalize the first letter of category names. Sounds harmless, right? It absolutely wasn't. The culprit was buried in our codebase—a function that didn't just capitalize the first letter; it was **aggressively lowercasing everything else**. When our backend sent us a perfectly formatted title like "React Native Adoption," this function would transform it into "React native adoption." Native, as a proper noun, lost its dignity. On the Russian side, it was even worse: carefully preserved Cyrillic capitalization from our `_enforce_sentence_case()` backend logic was being brutally flattened to lowercase. I'd been staring at this for two days before the real problem clicked. We have Claude on the backend already doing sentence-case enforcement for Russian and English descriptions. The frontend didn't need to fix what wasn't broken—it just needed to respect what the backend already got right. So instead of trying to be clever, I simplified the entire approach: **capitalize the first letter, leave everything else untouched**. The new logic was almost embarrassingly straightforward. First word gets a capital letter—*that's it*. Abbreviations like "AI," "LLM," and "API" stay uppercase because they never got lowercased in the first place. Proper nouns like "React" and "Native" survive unmolested. Russian text keeps its character. English text flows naturally. Testing the fix felt like watching a weight lift. "финансирование инвестиций в ИИ" now becomes "Финансирование инвестиций в ИИ" instead of "Финансирование инвестиций в ии." "Small Language Models contamination" keeps its emphasis instead of being flattened to "Small language models contamination."
The fix was so simple—three lines of actual logic—that I almost missed how much damage the old approach was doing. The real lesson? Sometimes the best engineering isn't about adding smarter code; it's about removing code that shouldn't exist. I pushed the commit, and suddenly our category display across multiple languages looked **actually correct** for the first time. Programming is 10% science, 20% ingenuity, and 70% getting the ingenuity to work with the science. 😄
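The fix itself lives in frontend JavaScript, but the logic is small enough to sketch in Python: uppercase the first character, leave everything else alone (the function name mirrors the post; the snippet is illustrative):

```python
def format_class_name(label: str) -> str:
    """Capitalize only the first letter; never lowercase the rest."""
    return label[:1].upper() + label[1:]

print(format_class_name("финансирование инвестиций в ИИ"))
# prints: Финансирование инвестиций в ИИ  (the abbreviation survives)
print(format_class_name("React Native Adoption"))
# prints: React Native Adoption  (proper nouns untouched)
```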

Mar 4, 2026
General · trend-analisis

When Russian Abbreviations Break Your UI: A Cascade Debug Story

I was debugging the **Cascade** trend analysis frontend when a Slack message came in: *"The translated labels look wrong."* One glance at the API response confirmed it—"Финансирование инвестиций в ИИ" (AI Investment Financing) had arrived pristine from Claude, but somewhere between the backend and the DOM, "ИИ" had collapsed into "ии". Classic case of right data, wrong rendering. The culprit was `formatClassName()`, a utility function that handles label capitalization for display. It was applying strict sentence-case logic—uppercase first character, lowercase everything else—indiscriminately to both English and Russian text. For English, this works fine because we maintain an `ABBREVIATIONS` set that preserves known acronyms like "LLM" and "API". But Russian abbreviations like "ИИ" (AI), "США" (USA), and "ЕС" (EU) had no such protection. The lowercase transformation was eating them alive. The decision point came down to this: should I add a massive Russian abbreviations dictionary to the frontend, or should I detect when we're dealing with non-ASCII text and skip the aggressive sentence-casing altogether? The latter felt smarter. The backend's Claude LLM was already returning perfectly capitalized Russian text via `_enforce_sentence_case()`. I wasn't fixing translation quality—I was preventing the frontend from *breaking* it. The fix was surgical: check if the input contains Cyrillic characters. If it does, preserve case entirely and only guarantee the first letter is uppercase. If it's pure ASCII (English), apply the original sentence-case logic with `ABBREVIATIONS` protection. A simple regex test against the Cyrillic Unicode block (U+0400 to U+04FF) solved it without bloating the codebase. **Here's a fun fact:** the Cyrillic script was developed in the late 9th century by disciples of Saints Cyril and Methodius (Cyril himself devised the earlier Glagolitic alphabet) to write Old Church Slavonic.
Centuries later, and we're still fighting the same battle: respecting case sensitivity in non-Latin alphabets. The labels render correctly now. "ИИ" stays "ИИ". The branch (`fix/crawler-source-type`) is clean, the build passes, and Monday's code should behave exactly like Friday's—which is all we can ask for 😄
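In Python terms, the branching looks something like this. A sketch under stated assumptions: the `ABBREVIATIONS` set is a small illustrative subset, and the real implementation is the frontend's JavaScript:

```python
import re

ABBREVIATIONS = {"AI", "LLM", "API"}       # illustrative subset
CYRILLIC = re.compile(r"[\u0400-\u04FF]")  # the Cyrillic Unicode block

def format_class_name(label: str) -> str:
    if CYRILLIC.search(label):
        # Russian arrives pre-cased from the backend: fix only the first letter
        return label[:1].upper() + label[1:]
    # English: sentence case, but protect known acronyms
    words = [w.upper() if w.upper() in ABBREVIATIONS else w.lower()
             for w in label.split()]
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:]

print(format_class_name("Финансирование инвестиций в ИИ"))
# prints: Финансирование инвестиций в ИИ
print(format_class_name("why LLM apps fail"))
# prints: Why LLM apps fail
```

The `[:1]` slices keep the function safe on empty input, and the Cyrillic branch deliberately never touches anything past the first character.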

Mar 4, 2026
Bug Fix · trend-analisis

From Phantom Signals to Real Insights: How We Fixed the Trend Analysis Pipeline

I was staring at the dashboard when I noticed something deeply wrong. Eighteen out of nineteen signals from our analyses were simply vanishing into thin air. Here I was, working on **Trend Analysis**, trying to build a system that could detect emerging tech trends across thousands of sources, and the core mechanism—the signal detection—was silently failing. The bug was hiding in plain sight: we'd marked trend phases as `'new'`, but our system was looking for `'emerging'`. A simple string mismatch that cascaded through the entire recommendation engine. When I traced it back, I realized this wasn't just a typo—it revealed how fragile the pipeline had become as we scaled from collecting data to actually *understanding* it. That same sprint, another issue surfaced in our database joins. The `recommendations` table was linking to trends via `tr.id = t.id`, but it should have been `tr.object_id = t.id`. Suddenly, all the momentum calculations we'd carefully built returned NULL. Weeks of analysis work was getting thrown away because two tables weren't talking to each other properly. I decided it was time to fortify the entire system. We added **15 new database indices** (migration 020), which immediately cut query times in half for the most common analysis operations. We remapped **SearXNG** results back to native sources—GitHub, Hacker News, arXiv—so the trends we detected actually pointed to real, traceable origins. The shared report feature had been linking to phantom signals that no longer existed; we cleaned that up too. By v0.14.0, we'd rebuilt the reporting layer from the ground up. Server-side pagination, filtering, and sorting meant users could finally navigate thousands of signals without the frontend melting. We even added a **Saved Products** feature with localStorage persistence, so researchers could bookmark trends they cared about. The real lesson wasn't technical—it was about complexity. 
Every new feature (dynamic role translation, trend name localization, React hook ordering fixes) added another place where things could break silently. The glass wasn't half-empty; it was twice as big as we needed it to be. 😄 But now it actually holds water.
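The `'new'` vs `'emerging'` mismatch is exactly the kind of thing a single normalization choke-point prevents. A minimal sketch, with an illustrative phase vocabulary rather than the project's actual one:

```python
PHASE_ALIASES = {"new": "emerging"}  # legacy labels mapped to canonical ones
VALID_PHASES = {"emerging", "growing", "peaking", "declining"}

def normalize_phase(phase: str) -> str:
    """Map legacy phase labels onto the canonical set; fail loudly on unknowns."""
    canonical = PHASE_ALIASES.get(phase, phase)
    if canonical not in VALID_PHASES:
        raise ValueError(f"unknown trend phase: {phase!r}")
    return canonical

print(normalize_phase("new"))  # emerging: no more silently vanishing signals
```

The point is the `ValueError`: a string mismatch that raises at write time is a bug report, while one that filters silently at read time is eighteen missing signals.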

Mar 4, 2026
Code Change · llm-analisis

The Narrow Path: Why Perfect Optimization Crumbles

I've been chasing the golden number for weeks now. **Phase 24a** delivered **76.8% accuracy on GSM8K**—a solid baseline for mathematical reasoning in large language models. The team was excited. I was cautious. In my experience, when a result feels *too clean*, it's usually balanced on a knife's edge. So I decided to push further with **Phase 29a and 29b**, two experiments designed to improve what we already had. The strategy seemed sound: inject curriculum data to guide the model toward harder problems, and extend training from 500 to 1,000 steps to capture finer pattern recognition. Standard moves in the playbook. Phase 29a involved adding **89 borderline solutions**—answers sampled at higher temperatures, intentionally less deterministic. I thought diversity would help. Instead, I watched accuracy *plummet* to **73.0%, a 3.8 percentage point drop**. The perplexity exploded to 2.16, compared to the baseline's 1.60. The model was struggling, not learning. Those temperature-sampled solutions weren't diverse training signal—they were noise wearing a training label. Then came **Phase 29b**: double the training steps. Surely more iterations would converge to something better? The loss hit 0.004—nearly zero. The model was memorizing, not generalizing. Accuracy barely limped to **74.4%**, still 2.4 points underwater. The lesson hit hard: *we'd already found the optimum at 500 steps*. Beyond that, we weren't learning—we were overfitting. What struck me most wasn't the failed experiments themselves. It was how *fragile* the baseline turned out to be. **Phase 24a wasn't a robust solution—it was a brittle peak**. The moment I changed the data composition or training duration, the whole structure collapsed. The algorithm had found a narrow channel where everything aligned perfectly: the right data distribution, the right training length, the right balance. Wiggle anything, and you tumble out. 
This is the hard truth about optimization in machine learning: **sometimes the best result isn't a foundation—it's a lucky intersection**. You can't always scale it. You can't always improve it by adding more of what worked before. We still have **Phase 29c** (multi-expert routing) and **29d** (MATH domain data) queued up. But I'm approaching them differently now. Not as simple extensions of success, but as careful explorations of *why* the baseline works at all. The irony? This mirrors something I read once: *"Programming is like sex. Make one mistake and you end up supporting it for the rest of your life."* 😄 In optimization, it's worse—you might be supporting someone else's lucky mistake, and have no idea where the luck ends and the skill begins.
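Perplexity and cross-entropy loss are two views of the same quantity (perplexity = exp(loss)), which makes the two failure signatures easy to read side by side. A quick worked check against the numbers quoted in this post:

```python
import math

# perplexity = exp(cross-entropy loss), so the metrics are interchangeable
print(math.log(1.60))   # ~0.47: the loss level implied by the baseline's ppl 1.60
print(math.log(2.16))   # ~0.77: Phase 29a's exploded perplexity, in loss terms
print(math.exp(0.004))  # ~1.004: Phase 29b's near-zero train loss is ppl ~1.0
```

A training perplexity pinned at ~1.0 while eval accuracy sinks is the textbook memorization signature: the model has stopped compressing and started copying.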

Mar 4, 2026