Blog

Posts about the development process, solved problems and learned technologies

Smart Feedback Without the Spam: A Three-Layer Defense Strategy

# Building a Spam-Resistant Feedback System: Lessons from the Real World

The borisovai-site project needed something every modern developer blog desperately wants: meaningful feedback without drowning in bot comments. The challenge was clear: implement a feedback system that lets readers report issues, mark helpful content, and share insights, all while keeping spam at bay. No signup required, but no open door to chaos either.

**The first decision was architectural.** Rather than reinventing the wheel with a custom registration system, I chose a multi-layered defense. The system would offer three feedback types: bug reports, feature requests, and "helpful" votes. Sensitive operations like bug reports would require OAuth authentication through NextAuth.js, creating a natural barrier with little friction for legitimate users.

The real puzzle was handling spam and rate limiting. I sketched out three strategies: pure reCAPTCHA, pattern-based detection, and a hybrid approach. The hybrid won. Here's why: reCAPTCHA alone feels heavy-handed for a simple "mark as helpful" action, while pattern-based detection using regex against common spam markers catches obvious abuse cheaply. But the real protection came from rate limiting: one feedback per IP address per 24 hours, tracked either through Redis or an in-memory store depending on deployment scale.

**The implementation stack reflected modern web practices.** React 19 with TypeScript provided type safety, Tailwind v4 handled styling, and Framer Motion added subtle animations that made the interface feel responsive without bloat. The backend connected to Strapi, where I added a new feedback collection with fields tracking the page URL, feedback type, user authentication status, IP address, and a timestamp.
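The rate limit and pattern check described above can be sketched in a few lines. This is a minimal in-memory illustration written in Python for brevity, not the project's actual TypeScript code. The class name `FeedbackGate`, the `SPAM_PATTERNS` list, the regexes themselves, and the spam rejection message are all assumptions made for the example.

```python
import re
import time

# Hypothetical spam markers; the post does not list the real patterns.
SPAM_PATTERNS = [
    re.compile(r"https?://\S+", re.IGNORECASE),                 # embedded links
    re.compile(r"(.)\1{9,}"),                                   # one character repeated 10+ times
    re.compile(r"\b(viagra|casino|free money)\b", re.IGNORECASE),
]

WINDOW_SECONDS = 24 * 60 * 60  # one feedback per IP per 24 hours


class FeedbackGate:
    """In-memory gate: check the rate limit first, then the spam patterns."""

    def __init__(self, window=WINDOW_SECONDS, clock=time.time):
        self.window = window
        self.clock = clock      # injectable so the window can be tested
        self.last_seen = {}     # ip -> timestamp of the last accepted feedback

    def check(self, ip, text):
        """Return (accepted, message) and record the IP when accepted."""
        now = self.clock()
        last = self.last_seen.get(ip)
        if last is not None and now - last < self.window:
            return False, "Too many feedbacks from your IP. Try again later."
        if any(pattern.search(text) for pattern in SPAM_PATTERNS):
            # Hypothetical message; the post does not quote one for spam.
            return False, "Feedback looks like spam."
        self.last_seen[ip] = now
        return True, "ok"
```

A dictionary works for a single process. Swapping `last_seen` for a Redis key with a 24-hour expiry would be the natural step for a multi-instance deployment, which matches the Redis or in-memory choice mentioned above.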
The API endpoint itself became a gatekeeper: checking rate limits before creating records, validating input against spam patterns, and returning helpful error messages like "You already left feedback on this page" or "Too many feedbacks from your IP. Try again later."

**One unexpectedly thorny detail:** designing the UI for the feedback count. Should we show "23 people found this helpful" or just a percentage? The data model needed to support both, but the psychological impact differs significantly. I opted for showing the count only once it exceeded a threshold: small numbers feel insignificant, but once you hit thirty or more, social proof kicks in.

Error handling demanded attention too. Network failures got retry buttons, server errors pointed toward support, and validation errors explained exactly what went wrong. The mobile experience compressed the floating button interface into a minimal footprint while keeping all functionality accessible.

## The Tech Insight

Most developers overlook that **rate limiting isn't just about preventing abuse; it's about conversation design.** When someone can only leave one feedback per day, they tend to make it count. They think before commenting. The constraint paradoxically improves feedback quality by making it scarce.

**What's next?** The foundation is solid, but integrating an ML-based spam detector from Hugging Face would add a layer that adapts to evolving attack patterns. For now, the system ships with pattern detection and OAuth: practical, maintainable, and battle-tested by similar implementations across the web.

Why is Linux safe? Hackers peek through Windows only.

Feb 13, 2026

Bug Fix · speech-to-text

Whisper's Speed Trap: Why Fast Speech Recognition Demands Ruthless Trade-offs

# Racing Against the Clock: When Every Millisecond Matters in Speech Recognition

The task was brutally simple on paper: make the speech-to-text pipeline faster. But reality had other plans. The team needed to squeeze this system under one second of processing time while keeping accuracy respectable, and I was tasked with finding every possible optimization hiding in the codebase.

I started where most engineers do: model shopping. The Whisper ecosystem offers multiple model sizes, each promising a different speed-to-accuracy trade-off. The tiny model? A disappointment at 56.2% word error rate (WER). The small model delivered a beautiful 23.4% WER, a 28% relative improvement over the base model, but it demanded 1.23 seconds. That's 230 milliseconds beyond our budget. The medium model performed slightly worse at 24.3% WER and completely blew past the deadline at 3.43 seconds. The base model remained the only option that fit the constraint, clocking in at just under one second with a 32.6% WER.

Refusing to accept defeat, I pivoted to beam search optimization and temperature tuning. Nothing. All variations stubbornly returned the same 32.6% error rate. Then came the T5 filtering strategies: applying different confidence thresholds between 0.6 and 0.95 to selectively correct weak predictions. The data was humbling: every threshold produced identical results. But here's what fascinated me: removing T5 entirely tanked performance to 41% WER. This meant T5 was doing *something* critical, just not in the way I'd hoped to optimize it.

I explored confidence-based selection next, thinking perhaps we could be smarter about when to invoke the correction layer. Nope. The error analysis revealed the real villain: Whisper's base model itself was the bottleneck, struggling most with deletions (12 common cases) and substitutions (6 instances). These weren't filter failures; they were detection failures at the source.
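The WER figures and the deletion and substitution counts above come from aligning each transcript against a reference. The sketch below shows the standard word-level edit-distance calculation with an error breakdown. It is an illustration, not the project's benchmark code, and the function name `wer_breakdown` is an assumption.

```python
def wer_breakdown(reference, hypothesis):
    """Word error rate plus counts of substitutions, deletions, and insertions."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = (total_errors, substitutions, deletions, insertions)
    # for aligning ref[:i] with hyp[:j].
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)          # every reference word deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)          # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]   # match, no new error
                continue
            c, s, d, n = dp[i - 1][j - 1]
            substitution = (c + 1, s + 1, d, n)
            c, s, d, n = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insertion = (c + 1, s, d, n + 1)
            # Tuples compare by total error count first.
            dp[i][j] = min(substitution, deletion, insertion)

    total, subs, dels, ins = dp[-1][-1]
    return {
        "wer": total / len(ref),
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
    }
```

The 28% improvement quoted for the small model is relative to the base model: (32.6 − 23.4) / 32.6 ≈ 0.28.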
Hybrid approaches came up next: maybe we run the base model for real-time responses and spawn a background task with the medium model for async refinement? Theoretically sound, practically nightmarish. The complexity of managing two parallel pipelines, handling race conditions, and deciding which result to trust felt like building a second system just to work around the first.

Post-processing techniques like segment-based normalization and capitalization rules promised quick wins. They delivered nothing. By this point, the evidence was overwhelming.

**The brutal truth:** An 80% WER reduction target with a sub-one-second CPU constraint isn't optimization; it's physics. No model swap, no clever algorithm, no post-processing trick could overcome the fundamental limitation. This system needed either GPU acceleration, a larger model running asynchronously, or honest acceptance of its current ceiling.

The lesson learned wasn't about Whisper or speech recognition specifically. It's that sometimes investigation reveals not a bug to fix, but a boundary to respect. The best engineering decision isn't always the most elegant code; sometimes it's knowing when to stop optimizing and start redesigning.

😄 Why is Linux safe? Hackers peek through Windows only.

Feb 13, 2026

New Feature · llm-analisis

Random Labels, Silent Failures: When Noise Defeats Self-Modifying Models

# When Random Labels Betrayed Your Self-Modifying Model

The `llm-analisis` project hit a wall that looked like a wall but was actually a mirror. I was deep into Phase 7b, trying to teach a mixture-of-experts (MoE) model to manage its own architecture: to grow and prune experts based on what it learned during training. Beautiful vision. Terrible execution.

Here's what happened: I'd completed Phase 7a and Phase 7b.1. Q1 had found the best config at 70.15% accuracy, and Q2 had optimized the MoE architecture to 70.73%. The plan was elegant: add a control head that would learn when to expand or contract the expert pool. The model would become self-aware about its own computational needs.

Except it didn't. Phase 7b.1 produced a **NO-GO decision**: 58.30% accuracy versus the 69.80% baseline. The culprit was brutally simple: I'd labeled the control signals with synthetic random labels. Thirty percent probability of "grow," twenty percent of "prune," totally disconnected from reality. The control head had nothing to learn from noise.

So I pivoted to Phase 7b.2, attacking the problem with entropy-based signals instead. The routing entropy in the MoE layer represents real model behavior: which experts the model actually trusts. That's grounded, differentiable, honest data. I created `expert_manager.py` with state preservation for safe expert addition and removal, and documented the entire strategy in `PHASE_7B2_PLAN.md`. This was the right direction.

Except Phase 7b.2 had its own ghosts. When I tried implementing actual expert add/remove operations, the model initialization broke. The `n_routed` parameter wasn't accessible the way I expected. And even when I fixed that, checkpoint loading became a nightmare: the pretrained Phase 7a weights weren't loading correctly. The model would start at 8.95% accuracy instead of ~70%, making the training completely unreliable.

Then came the real moment of truth: I realized the fundamental issue wasn't about finding the perfect control signal.
The real problem was trying to do two hard things simultaneously: train a model AND have it restructure itself. Every architecture modification during training created instability.

**Here's the non-obvious fact about mixture-of-experts models:** they're deceptively fragile when you try to modify them dynamically. The routing patterns, the expert specialization, and the gradient flows are tightly coupled. Add an expert mid-training, and you're not just adding capacity; you're breaking the learned routing distribution that took epochs to develop. It's like replacing car parts while driving at highway speed.

So I made the decision to pivot again. Phase 7b.3 would be direct and honest: focus on actual architecture modifications with a fixed expert count, moving toward multi-task learning instead of self-modification. The model would learn task-specific parameters, not reinvent its own structure. Sometimes the biological metaphor breaks down, and pure parameter learning is enough.

The session left three new artifacts: the failed but educational `train_exp7b3_direct.py`, the reusable `expert_manager.py`, and most importantly, the understanding that self-modifying models need ground truth signals, not optimization fairy tales.

Next phase: implement the direct approach with proper initialization and validate that sometimes a fixed architecture with learned parameters beats the complexity of dynamic self-modification.

😄 Trying to build a self-modifying model without proper ground truth signals is like asking a chicken to redesign its own skeleton while running: it just flails around and crashes.
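For readers who want to see what the entropy-based signal from Phase 7b.2 looks like, here is a minimal sketch of routing entropy computed from router logits. It uses plain Python so it runs without a deep learning framework. The function names and the normalization step are assumptions, and the post does not say how the project turned entropy into grow or prune decisions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routing_entropy(router_logits):
    """Mean Shannon entropy (in nats) of the per-token routing distributions.

    router_logits: list of per-token lists, one logit per expert.
    """
    entropies = []
    for logits in router_logits:
        probs = softmax(logits)
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0))
    return sum(entropies) / len(entropies)


def normalized_routing_entropy(router_logits):
    """Entropy scaled to [0, 1] by the maximum log(n_experts); needs 2+ experts."""
    n_experts = len(router_logits[0])
    return routing_entropy(router_logits) / math.log(n_experts)
```

A value near 1 means the router spreads tokens almost uniformly across experts, and a value near 0 means it concentrates on a single expert. Unlike random labels, this number changes only when the routing behavior changes.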

Feb 13, 2026

New Feature · speech-to-text

When Stricter Isn't Better: The Threshold Paradox

# Hitting the Ceiling: When Better Thresholds Don't Mean Better Results

The speech-to-text pipeline was humming along at 34% Word Error Rate (WER), respectable for a Whisper base model, but the team wanted more. The goal was ambitious: cut that error rate down to 6–8%, a dramatic 80% reduction. To get there, I started tweaking the T5 text corrector that sits downstream of the audio transcription, thinking that tighter filtering could squeeze out those extra percentage points.

First thing I did was add configurable threshold methods to the `T5TextCorrector` class. The idea was simple: instead of hardcoded similarity thresholds, make them adjustable so we could experiment without rewriting code every iteration. I implemented `set_thresholds()` and `set_ultra_strict()` methods, then set ultra-strict filtering to use aggressive cutoffs (0.9 and 0.95 similarity scores), theoretically catching every questionable correction before it could degrade the output.

Then came the benchmarking. I fixed references in `benchmark_aggressive_optimization.py` to match the full audio texts we were actually working with, not just snippets, and ran the tests. The results were sobering.

**The baseline** (Whisper base + improved T5 at 0.8/0.85 thresholds): 34.0% WER, 0.52 seconds. **Ultra-strict T5** (0.9/0.95): 34.9% WER, 0.53 seconds, which is marginally *worse*. I also tested beam search with a beam width of 5, thinking diversity in decoding might help. That crushed performance: 42.9% WER, 0.71 seconds. Even stripping T5 entirely gave 35.8% WER.

The pattern was clear: we'd plateaued. Tightening the screws on T5 correction wasn't the lever we needed. Higher beam widths actually hurt because they introduced more candidate hypotheses that could mangle the transcription. The fundamental issue wasn't filtering quality; it was the model's capacity to *understand* what it was hearing in the first place.

Here's the uncomfortable truth: if you want to drop from 34% WER to 6–8%, you need a bigger model.
Whisper medium would get you there, but it would shatter our latency budget. The time to run inference would balloon past what the system could tolerate. So we hit a hard constraint: stay fast or get accurate, but not both.

**The lesson stuck with me**: optimization has diminishing returns, and sometimes the smartest decision is recognizing when you're chasing ghosts. The team documented the current optimal configuration (Whisper base with improved T5 filtering at 0.8/0.85 thresholds) and filed a ticket for future work. Sometimes shipping what works beats perfecting what breaks.

😄 Optimizing a speech-to-text system at 34% WER is like arguing about which airline has the best peanuts: you're still missing the entire flight.
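The configurable thresholds described in this post can be sketched as a small class. This is a hypothetical stand-in, not the real `T5TextCorrector`. It assumes that the first threshold applies to single words and the second to multi-word phrases, and it uses `difflib.SequenceMatcher` as the similarity measure, which the post does not specify. The class name `ThresholdFilter` and the `accept` method are illustrative.

```python
from difflib import SequenceMatcher


class ThresholdFilter:
    """Hypothetical stand-in for the threshold logic inside T5TextCorrector."""

    def __init__(self, word_threshold=0.80, phrase_threshold=0.85):
        self.set_thresholds(word_threshold, phrase_threshold)

    def set_thresholds(self, word_threshold, phrase_threshold):
        """Set the minimum similarity required to accept a correction."""
        for value in (word_threshold, phrase_threshold):
            if not 0.0 <= value <= 1.0:
                raise ValueError("thresholds must be between 0 and 1")
        self.word_threshold = word_threshold
        self.phrase_threshold = phrase_threshold

    def set_ultra_strict(self):
        """Apply the aggressive 0.9 / 0.95 cutoffs from the benchmark."""
        self.set_thresholds(0.90, 0.95)

    def accept(self, original, corrected):
        """Return the correction only if it stays close enough to the original."""
        is_single_word = len(original.split()) <= 1
        threshold = self.word_threshold if is_single_word else self.phrase_threshold
        # Assumed similarity measure: character-level ratio from difflib.
        similarity = SequenceMatcher(None, original.lower(), corrected.lower()).ratio()
        return corrected if similarity >= threshold else original
```

Raising the thresholds makes the filter reject more corrections, including valid ones, which is one plausible reading of why the ultra-strict run scored slightly worse than the baseline.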

Feb 13, 2026

Learning · speech-to-text

When Your AI Fixer Breaks What Isn't Broken

# Tuning the Truth: When Aggressive AI Corrections Go Too Far

The speech-to-text pipeline was working, but something felt off. Our T5 model, trained to correct transcription errors, had developed a peculiar habit: it was *fixing* things that weren't broken. On audiobook samples, the correction layer was deleting roughly 30% of perfectly good text, chasing an impossible perfection. Word Error Rate looked decent on paper, but open any corrected transcript and you'd find entire sentences vanished. That's when I decided to investigate why our "smart" fallback was actually making things worse.

The root cause turned out to be thresholds, those invisible guardrails that decide when a correction is confident enough to apply. The T5 filtering was set too aggressively: a word-level similarity threshold of just 0.6 meant the model would confidently rewrite almost anything. I bumped it up to 0.80 for single words and 0.85 for multi-word phrases. The result was almost comical in its improvement: Word Error Rate dropped from 28.4% to 3.9%, and text preservation jumped from 70% to 96.8%. No more phantom deletions.

But that was only half the battle. The codebase also had an adaptive fallback mechanism, a feature designed to switch between models based on audio degradation. Theoretically brilliant, practically problematic. I ran benchmarks across four test suites: synthetic degraded audio, clean TTS audiobook data, degraded TTS audio, and real-world samples. The results were unambiguous. On clean data, the fallback added noise, pushing error rates up to 34.6% versus the 31.9% baseline. On degraded synthetic audio, it provided no meaningful improvement over the primary model. The only thing it *did* accomplish was consuming 460MB of memory and adding 0.3 seconds of latency to every inference call.

**Here's something worth knowing about adaptive systems**: they sound perfect in theory because they promise to handle everything.
But in practice, they often optimize for edge cases that don't actually exist in production. The fallback was built anticipating real-world microphone degradation, but we were running on high-quality audiobooks processed through professional TTS pipelines. I kept the code (maybe someday we'll use it) but disabled it by default. Sometimes the simplest solution is admitting your clever idea doesn't fit the problem.

The changes rippled through the system quietly. Filtering tightened, fallback disabled, documentation updated with complete benchmark results. Output became cleaner, inference became faster, and the correction layer finally started earning its name by actually *correcting* rather than *rewriting*.

The lesson here isn't about T5 or audio processing specifically. It's about the dangerous seduction of "smart" systems. They feel sophisticated until you measure them against reality. When your adaptive fallback makes everything worse, sometimes the best optimization is knowing when to turn it off.

😄 Judge: "I sentence you to the maximum punishment..." Me (thinking): "Please be death, please be death..." Judge: "Maintain legacy code!" Me: "Damn."

Feb 13, 2026

New Feature · C--projects-ai-agents-voice-agent

Voice Agent: Bridging Python, JavaScript, and Real-Time Complexity

# Building a Voice Agent: Orchestrating Python and JavaScript Across the Monorepo

The task landed on my desk with a familiar weight: build a voice agent that could handle real-time chat, authentication, and voice processing across a split architecture with a Python backend and a Next.js frontend. The real challenge wasn't the individual pieces; it was orchestrating them without letting the complexity spiral into a tangled mess.

I started by sketching the backend foundation. **FastAPI 0.115** became the core, not just because it's fast, but because its native async support meant I could lean into streaming responses with **sse-starlette 2** for real-time chat without wrestling with blocking I/O. Authentication came next. Implementing it early rather than bolting it on later proved essential, as every subsequent endpoint needed to trust the user context.

The voice processing endpoints demanded careful thought. Unlike typical fire-and-forget REST endpoints, voice required state management: buffering audio chunks, running inference, and streaming responses back. I structured these as separate concerns: one endpoint for transcription, another for chat context, another for voice synthesis. This separation meant I could debug and scale each independently.

Then came the frontend integration. The Next.js team needed to consume these endpoints, but they also needed to integrate with the **Telegram Mini App SDK** (TMA), which introduced its own authentication layer. The streaming chat UI in React 19 had to handle partial messages gracefully, displaying text as it arrived rather than waiting for the full response. This is where **Tailwind CSS v4** with its new CSS-first configuration actually simplified things; the previous `@apply`-heavy syntax would have made dynamic class management messier.
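The streaming chat flow can be illustrated without the web stack. The sketch below frames partial chat output as Server-Sent Events using only the standard library. In the FastAPI app, sse-starlette's `EventSourceResponse` would normally take care of this framing. The token generator, the event names `message` and `end`, and the JSON payload shape are all assumptions for the example.

```python
import asyncio
import json


async def fake_token_stream(text, delay=0.0):
    """Stand-in for a model that yields partial chat output word by word."""
    for token in text.split():
        await asyncio.sleep(delay)
        yield token


def sse_event(data, event=None):
    """Frame one SSE message: optional event name, a data line, then a blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


async def chat_events(prompt):
    """Yield one SSE frame per partial token, then a final end marker."""
    async for token in fake_token_stream(f"echo {prompt}"):
        yield sse_event({"delta": token}, event="message")
    yield sse_event({"done": True}, event="end")


async def collect(prompt):
    """Gather all frames, as a test client would."""
    return [frame async for frame in chat_events(prompt)]
```

On the client side, the React UI would append each `delta` to the message being displayed and stop when the end marker arrives, which is what lets partial text show up immediately.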
Here's something I discovered during this phase that most developers overlook: **the separation of concerns in monorepos only works if you establish strict validation protocols upfront.** I settled on a routine: Python imports always get validated with a quick `python -c 'from src.module import Class'` check, npm builds happen after every frontend change, and the TypeScript type check runs before anything ships. This discipline saved hours later, when subtle import errors could have cascaded through the codebase.

The real insight came from studying the project's **ERROR_JOURNAL.md pattern**. Instead of letting errors vanish into git history, documenting them upfront and checking that journal *before* attempting fixes prevented the classic mistake of solving the same problem three times. It's institutional memory in a single markdown file.

One unexpected win: batching independent tasks across codebases in single commands. Rather than switching contexts repeatedly, I'd prepare backend validations and frontend builds together, letting them run in parallel. The monorepo structure, with the Python backend in `/backend` and Next.js in `/frontend`, made this clean. No cross-contamination, clear boundaries.

By the end, the architecture was solid: defined agent roles, comprehensive validation checks, and a documentation pattern that actually prevented repeated mistakes. The frontend could stream chat responses while the backend processed voice, and authentication threaded through both without becoming a bottleneck.

**A SQL statement walks into a bar and sees two tables. It approaches and asks, "May I join you?" 😄**
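The import check from the validation routine can be turned into a small reusable helper. This is a sketch of the idea behind the `python -c 'from src.module import Class'` one-liner, not a script from the project. The function names `check_import` and `check_all` are assumptions.

```python
import importlib


def check_import(module_name, attr_name):
    """Return (ok, message) for the equivalent of `from module_name import attr_name`.

    Only attributes defined on the module are checked; importing a
    submodule by name is not handled in this sketch.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # import errors, syntax errors, side-effect failures
        return False, f"cannot import {module_name}: {exc}"
    if not hasattr(module, attr_name):
        return False, f"{module_name} has no attribute {attr_name}"
    return True, f"{module_name}.{attr_name} ok"


def check_all(targets):
    """Check a list of (module, attribute) pairs and report every result."""
    results = [check_import(module, attr) for module, attr in targets]
    return all(ok for ok, _ in results), [message for _, message in results]
```

Running `check_all` over the modules touched by a change gives one pass or fail answer plus a readable list of what broke, which fits the habit of validating before anything ships.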

Blog

Smart Feedback Without the Spam: A Three-Layer Defense Strategy

Whisper's Speed Trap: Why Fast Speech Recognition Demands Ruthless Trade-offs

Random Labels, Silent Failures: When Noise Defeats Self-Modifying Models

When Stricter Isn't Better: The Threshold Paradox

When Your AI Fixer Breaks What Isn't Broken

Voice Agent: Bridging Python, JavaScript, and Real-Time Complexity

Saving T5 from Being Cut: Optimization Instead of Losses

Already Done: Reading the Room in Refactoring

Already Done: When Your Plan Meets Reality

From Technical Jargon to User Gold: Naming Features That Matter

Decoupling SCADA: From Duplication to Architecture

20 Pages of Chaos → One Structured Roadmap

Mapping AI's Wild Growth: Building Your Trend Dashboard

Stripping the Gloss: Making Antirender Production Ready

An Interface That Speaks the Operator's Language

When Feedback Redesigned Everything

Unrendering Architecture: Stripping Digital Makeup from Design

Stripping the Gloss: When Fake Renders Ruin Real Data

Docs vs. Reality: Why Your Best Practices Fail in Production

From 3+ Seconds to Sub-Second: Inside Whisper's CPU Optimization Sprint