Random Labels, Silent Failures: When Noise Defeats Self-Modifying Models

The llm-analisis project hit a wall that turned out to be a mirror: the obstacle was my own design. I was deep into Phase 7b, trying to teach a mixture-of-experts model to manage its own architecture, growing and pruning experts based on what it learned during training. Beautiful vision. Terrible execution.
Here’s what happened: I’d come off a successful Phase 7a. Q1 had found the best config at 70.15% accuracy, and Q2 optimized the MoE architecture to 70.73%. The plan for Phase 7b was elegant: add a control head that would learn when to expand or contract the expert pool. The model would become self-aware about its own computational needs. Except it didn’t.
Phase 7b.1 produced a NO-GO decision: 58.30% accuracy versus the 69.80% baseline. The culprit was brutally simple: I’d labeled the control signals with synthetic random labels, a 30% probability of “grow” and 20% of “prune,” totally disconnected from the model’s actual state. The control head had nothing to learn from noise.
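Why noise defeats a classifier is worth making concrete. If labels are drawn independently of the input, the best any model can do is predict the marginal label distribution, so its cross-entropy loss is floored at the entropy of that distribution. A tiny sketch, using the 30/20 split from above and assuming the remaining 50% is an implicit “keep” action:

```python
import math

# Control labels drawn independently of the input: 30% "grow",
# 20% "prune", and (assumed here) the remaining 50% "keep".
p = {"grow": 0.30, "prune": 0.20, "keep": 0.50}

# With labels independent of the features, the optimal predictor just
# outputs the marginal distribution p itself. Its cross-entropy loss is
# then floored at the entropy of p: there is no signal to go below it.
floor = -sum(q * math.log(q) for q in p.values())
print(f"irreducible loss floor: {floor:.4f} nats")
```

Any accuracy the control head showed beyond the majority class was fitting noise, which is exactly the NO-GO pattern observed.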
So I pivoted to Phase 7b.2, attacking the problem with entropy-based signals instead. The routing entropy in the MoE layer represents real model behavior—which experts the model actually trusts. That’s grounded, differentiable, honest data. I created expert_manager.py with state preservation for safe expert addition and removal, and documented the entire strategy in PHASE_7B2_PLAN.md. This was the right direction.
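The entropy signal itself is simple to compute. This is a minimal sketch of the idea, not the actual code in expert_manager.py: take the router’s softmax distribution over experts and measure its Shannon entropy, with log(n_experts) as the uniform-routing ceiling.

```python
import math

def routing_entropy(router_logits):
    """Shannon entropy of the router's softmax distribution over experts.

    Low entropy  -> routing is confident (few experts trusted);
    high entropy -> routing is spread thin (capacity may be wasted).
    """
    m = max(router_logits)                             # stable softmax
    exps = [math.exp(x - m) for x in router_logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Compare against the uniform maximum log(n_experts) to judge whether
# the expert pool is under- or over-used (hypothetical example logits).
confident = routing_entropy([4.0, 0.1, 0.1, 0.1])  # mass on one expert
uniform = routing_entropy([1.0, 1.0, 1.0, 1.0])    # equals log(4)
print(confident, uniform)
```

Unlike the random labels, this number moves when the model’s behavior moves, which is what makes it a usable control signal.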
Except Phase 7b.2 had its own ghosts. When I tried implementing actual expert add/remove operations, the model initialization broke. The n_routed parameter wasn’t accessible the way I expected. And even when I fixed that, checkpoint loading became a nightmare—the pretrained Phase 7a weights weren’t loading correctly. The model would start at 8.95% accuracy instead of ~70%, making the training completely unreliable.
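The checkpoint failure mode above has a common cause: when the expert count changes, router and expert tensors in the checkpoint no longer match the new model’s shapes, and a naive load either errors out or silently leaves layers at random init (hence starting near 8.95%). A hedged sketch of one defensive pattern, with hypothetical parameter names, comparing shapes before loading so mismatches are skipped loudly rather than corrupting the restore:

```python
def filter_loadable(checkpoint_shapes, model_shapes):
    """Keep only checkpoint entries whose name AND shape match the
    current model; report everything skipped so a shape drift (e.g. a
    changed expert count) is visible instead of silent.
    """
    loadable, skipped = {}, []
    for name, shape in checkpoint_shapes.items():
        if model_shapes.get(name) == shape:
            loadable[name] = shape
        else:
            skipped.append(name)
    return loadable, skipped

# Hypothetical shapes: the Phase 7a checkpoint was saved with 8 routed
# experts, but the new model was built with 10, so the router differs.
ckpt = {"router.weight": (8, 512), "backbone.weight": (512, 512)}
model = {"router.weight": (10, 512), "backbone.weight": (512, 512)}
ok, skipped = filter_loadable(ckpt, model)
print(sorted(ok), skipped)
```

In a PyTorch codebase the same idea would filter the checkpoint’s state dict by tensor shape before calling `load_state_dict(..., strict=False)`, then log the skipped keys, so a run starting at chance accuracy is caught immediately.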
Then came the real moment of truth: I realized the fundamental issue wasn’t about finding the perfect control signal. The real problem was trying to do two hard things simultaneously—train a model AND have it restructure itself. Every architecture modification during training created instability.
Here’s the non-obvious fact about mixture-of-experts models: they’re deceptively fragile when you try to modify them dynamically. The routing patterns, the expert specialization, and the gradient flows are tightly coupled. Add an expert mid-training, and you’re not just adding capacity—you’re breaking the learned routing distribution that took epochs to develop. It’s like replacing car parts while driving at highway speed.
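The coupling described above also suggests what a safe expert addition would have to look like. One state-preserving idea, sketched here and not taken from expert_manager.py: give the new expert a gate logit far below the existing ones, so the softmax routing distribution over the old experts is almost unchanged and the new expert has to earn its traffic through gradients.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def add_expert_preserving_routing(gate_logits, new_logit=-10.0):
    """Append a new expert's gate logit at a large negative value so the
    learned routing distribution is (almost) undisturbed: the new
    expert starts with ~0 probability mass instead of stealing traffic.
    """
    return gate_logits + [new_logit]

old = [2.0, 0.5, 1.0]               # hypothetical learned gate logits
new = add_expert_preserving_routing(old)
p_old, p_new = softmax(old), softmax(new)
# The new expert gets negligible mass; old shares barely shift.
print(p_new[-1], [round(a - b, 6) for a, b in zip(p_old, p_new)])
```

Even with this trick, the expert’s parameters still need sane initialization and the optimizer state still desynchronizes, which is part of why the dynamic approach stayed unstable.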
So I made the decision to pivot again. Phase 7b.3 would be direct and honest: keep the expert count fixed and move toward multi-task learning instead of self-modification. The model would learn task-specific parameters, not reinvent its own structure. Sometimes the biological metaphor breaks down, and pure parameter learning is enough.
The session left three new artifacts: the failed but educational train_exp7b3_direct.py, the reusable expert_manager.py, and most importantly, the understanding that self-modifying models need ground-truth signals, not optimization fairy tales.
Next phase: implement the direct approach with proper initialization and validate that sometimes a fixed architecture with learned parameters beats the complexity of dynamic self-modification.
😄 Trying to build a self-modifying model without proper ground truth signals is like asking a chicken to redesign its own skeleton while running—it just flails around and crashes.
Metadata
- Session ID: grouped_llm-analisis_20260213_0913
- Branch: HEAD
- Dev joke: Migrating off Traefik is like changing the wheels while driving. On an airplane.