ScribeAir: Offline Speech-to-Text for Windows
v1.3.0
Free offline voice input tool for Windows with three ASR engines: GigaAM (3.3% WER, best quality for Russian), Whisper (multilingual, GPU), and Vosk (fast offline). GigaAM makes about 90% fewer word errors than the Whisper base baseline. Includes 93 automated tests and push-to-talk input into any application. Open source, MIT license.
Voice Input Without Cloud or Subscriptions
ScribeAir 1.1.0 is a free speech recognition tool for Windows that works entirely offline. Hold a hotkey, speak — and the text instantly appears in any application: messenger, editor, IDE, or browser.
No data ever leaves your computer. No accounts, no subscriptions, no time limits.
What's New in 1.1.0
- GigaAM — new ASR engine for Russian language. 3.3% WER (word error rate) vs 32.6% with Whisper base — a 90% error reduction.
- Three recognition engines to choose from: GigaAM (quality), Whisper (multilingual), Vosk (speed).
- 93 automated tests — full coverage of all components: transcription, correction, filtering, pipeline.
- ASR backend switching from tray — change engine and model without restarting.
- Audio device selection — switch microphone from the tray menu.
Testing Results
Recognition quality was benchmarked on an audiobook corpus of Russian literary text of varying complexity. The corpus includes short phrases (6–7 seconds), long passages (25–30 seconds), and rare words.
Recognition Quality (WER — Word Error Rate)
| Engine | WER | Error Reduction (vs. Whisper base) | Speed | Notes |
|---|---|---|---|---|
| GigaAM v3-e2e-rnnt | 3.3% | 90.0% | 0.66s | Best quality + speed, punctuation |
| GigaAM v3-rnnt | 3.3% | 90.0% | 0.82s | Best quality |
| GigaAM v3-e2e-ctc | 4.2% | 87.2% | 1.08s | Good balance |
| Whisper large-v3-turbo (GPU) | 7.9% | 75.7% | 0.44s | Multilingual |
| Vosk small | 13.0% | 60.0% | 0.75s | Minimal size (50 MB) |
| Whisper base (CPU) | 32.6% | — | 0.45s | v1.0 baseline |
GigaAM on CPU produces more accurate Russian text than the best Whisper result on GPU in this benchmark (3.3% vs 7.9% WER).
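For readers who want to see how the WER and Error Reduction columns relate, here is a minimal sketch in Python. It assumes the standard definition of WER (word-level edit distance divided by the number of reference words) and assumes that error reduction is measured relative to the Whisper base baseline. The function names are illustrative and are not taken from the ScribeAir code base.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors that the new engine removes."""
    return 1.0 - new_wer / baseline_wer


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
# GigaAM (3.3%) against the Whisper base baseline (32.6%).
print(relative_error_reduction(32.6, 3.3))
```

Under these assumptions, 3.3% against a 32.6% baseline works out to roughly a 90% reduction, which matches the first two rows of the table.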
Automated Testing
| Category | Tests | Coverage |
|---|---|---|
| Transcription (Whisper) | 15 | RU/EN, streaming, multilingual, hints, silence |
| Full Pipeline | 9 | Audio → ASR → correction → text, RU→EN translation |
| T5 Correction | 11 | Error correction, chain-of-correction, model unloading |
| Hallucination Filtering | 14 | Known phrases, delooping, duration anomalies |
| Audio Processing | 11 | Normalization, silence trimming, high-pass filter, edge cases |
| Streaming Pipeline | 7 | VAD, lifecycle, chunk processing |
| Integration | 8 | Silero VAD, noisereduce, config v2 |
| Total | 93 | 100% passing |
Key Advantages
- 100% Offline & Private — voice is processed locally, never sent to the cloud. Complete privacy out of the box.
- 90% Error Reduction — GigaAM (Sber, 700K hours of training data) delivers 3.3% WER for Russian language.
- Three ASR Engines — GigaAM for quality, Whisper for multilingual, Vosk for speed.
- GPU & CPU — CUDA acceleration for maximum speed, CPU build for any PC.
- 93 Automated Tests — every component is verified: from audio processing to the full pipeline.
- Free & Open Source — MIT license. Use, modify, and distribute without restrictions.
How It Works
- Launch ScribeAir — a microphone icon appears in the system tray
- Hold the hotkey (default: LShift + RShift)
- Speak — text appears on screen in real time
- Release the key — finished text is inserted into the active field
Models are downloaded automatically on first launch. After that, no internet is needed.
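The push-to-talk flow above can be pictured as a small state machine. The sketch below is an assumption about how such logic might look, not ScribeAir's actual implementation: the class name, key names, and the returned action strings are all hypothetical, and a real application would receive key events from an OS-level keyboard hook.

```python
class PushToTalk:
    """Tracks a multi-key hotkey and reports when recording starts and stops."""

    def __init__(self, hotkey=("lshift", "rshift")):
        self.hotkey = frozenset(hotkey)  # keys that must all be held
        self.pressed = set()             # keys currently held down
        self.recording = False

    def on_key_down(self, key):
        self.pressed.add(key)
        # Start only on the transition into "all hotkey keys held";
        # auto-repeated key-down events must not restart recording.
        if not self.recording and self.hotkey <= self.pressed:
            self.recording = True
            return "start_recording"
        return None

    def on_key_up(self, key):
        self.pressed.discard(key)
        # Releasing any key of the combination ends the utterance.
        if self.recording and not (self.hotkey <= self.pressed):
            self.recording = False
            return "insert_text"
        return None


ptt = PushToTalk()
events = [
    ptt.on_key_down("lshift"),
    ptt.on_key_down("rshift"),
    ptt.on_key_up("lshift"),
    ptt.on_key_up("rshift"),
]
print(events)
```

Treating the release of either key as the end of the utterance keeps the behavior predictable when the two keys are not released at exactly the same moment.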
Recognition Modes
| Mode | Engine | Languages | Quality (WER) | Speed |
|---|---|---|---|---|
| Auto (default) | GigaAM (CPU) / Whisper (GPU) | RU / Multi | 3.3% / 7.9% | 0.66s / 0.44s |
| GigaAM | GigaAM ONNX | Russian only | 3.3% | 0.66s |
| Whisper | faster-whisper | RU, EN, auto, translation | 7.9–32.6% | 0.44–2.3s |
| Vosk | Vosk offline | Russian | 13.0% | 0.75s |
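One way to read the Auto row is "Whisper when a CUDA GPU is available, GigaAM otherwise". The sketch below encodes that reading. It is an assumption about the selection rule rather than a description of ScribeAir's code: the `Backend` type, the `pick_backend` function, and the mode strings are hypothetical, and running GigaAM and Vosk on the CPU is assumed from the benchmark notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    engine: str  # "gigaam", "whisper", or "vosk"
    device: str  # "cpu" or "cuda"


def pick_backend(mode: str, cuda_available: bool) -> Backend:
    """Map a recognition mode to an engine and device (assumed rule)."""
    if mode == "auto":
        # Assumed reading of the table: GPU machines get Whisper,
        # CPU-only machines get GigaAM.
        if cuda_available:
            return Backend("whisper", "cuda")
        return Backend("gigaam", "cpu")
    if mode == "gigaam":
        return Backend("gigaam", "cpu")
    if mode == "whisper":
        return Backend("whisper", "cuda" if cuda_available else "cpu")
    if mode == "vosk":
        return Backend("vosk", "cpu")
    raise ValueError(f"unknown recognition mode: {mode!r}")


print(pick_backend("auto", cuda_available=False))
print(pick_backend("auto", cuda_available=True))
```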
Features
Speech Recognition
- GigaAM (Sber) — ONNX model trained on 700K hours of Russian speech. 6 model variants.
- faster-whisper (CTranslate2) — optimized Whisper. Models: base, small, medium, large-v3-turbo.
- Vosk — compact offline engine (50 MB), ideal for short phrases.
- Streaming pipeline with Silero VAD — text appears as you speak
- Whisper hallucination filtering (known artifacts are removed automatically)
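As an illustration of the hallucination filtering mentioned above, the following sketch drops transcripts that consist only of a known artifact phrase and collapses phrases that loop back-to-back. It is a simplified stand-in, not ScribeAir's filter: the phrase list contains illustrative placeholders, and the function names and thresholds are assumptions.

```python
# Illustrative placeholders; a real filter would ship its own curated list.
KNOWN_ARTIFACTS = {
    "thanks for watching",
    "subtitles by the amara.org community",
}


def deloop(text: str, max_repeats: int = 2, max_phrase_len: int = 5) -> str:
    """Collapse a phrase repeated back-to-back more than max_repeats times."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        collapsed = False
        # Try the shortest phrase first so a loop collapses to one copy.
        for n in range(1, max_phrase_len + 1):
            phrase = words[i:i + n]
            if len(phrase) < n:
                break  # not enough words left for this phrase length
            repeats = 1
            while words[i + repeats * n:i + (repeats + 1) * n] == phrase:
                repeats += 1
            if repeats > max_repeats:
                out.extend(phrase)  # keep a single copy of the loop
                i += repeats * n
                collapsed = True
                break
        if not collapsed:
            out.append(words[i])
            i += 1
    return " ".join(out)


def filter_hallucinations(text: str) -> str:
    """Return cleaned text, or an empty string for a known artifact."""
    normalized = text.strip().strip(".!?").lower()
    if normalized in KNOWN_ARTIFACTS:
        return ""
    return deloop(text)


print(filter_hallucinations("Thanks for watching!"))
print(filter_hallucinations("I think so I think so I think so I think so yes"))
```

A phrase repeated only twice is left alone in this sketch, since natural speech contains short repetitions such as "no no".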
Text Processing
- T5 correction (bond005/ruT5-ASR-large) — fixes ASR errors for Russian language
- Chain-of-Correction for complex sentences
- Custom vocabulary for professional terminology
Audio & Interface
- Audio preprocessing: pre-emphasis, normalization, 80 Hz high-pass filter, noise reduction, silence trimming
- Configurable hotkeys (LShift+RShift, Win+Shift, etc.)
- Semi-transparent overlay window with transcription progress
- Microphone and ASR backend selection from tray menu
- Windows autostart
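To make the audio preprocessing steps listed above more concrete, here is a minimal sketch of four of them: pre-emphasis, an 80 Hz high-pass filter, peak normalization, and silence trimming. It uses only the Python standard library so it stays self-contained; a production build would more likely rely on NumPy or SciPy. The function names, the 0.97 pre-emphasis coefficient, the first-order filter design, the 16 kHz sample rate, and the thresholds are all assumptions, and noise reduction is left out.

```python
import math


def pre_emphasis(samples, coeff=0.97):
    """y[n] = x[n] - coeff * x[n-1]; boosts high frequencies."""
    if not samples:
        return []
    rest = [samples[n] - coeff * samples[n - 1] for n in range(1, len(samples))]
    return [samples[0]] + rest


def high_pass(samples, sample_rate=16000, cutoff_hz=80.0):
    """First-order RC high-pass filter that removes rumble and DC offset."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out


def peak_normalize(samples, target_peak=0.95):
    """Scale the signal so its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples quieter than the threshold."""
    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1
    end = len(samples)
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


print(trim_silence([0.0, 0.0, 0.5, -0.4, 0.0]))
print(peak_normalize([0.1, -0.2]))
```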
Two Build Variants
| Variant | Size | Description |
|---|---|---|
| CUDA | ~2.4 GB | Full GPU support. Maximum speed on NVIDIA graphics cards. |
| CPU | ~800 MB | Works on any PC without a graphics card. |
AI models are downloaded automatically on first launch and cached locally.
Comparison with Alternatives
| Feature | ScribeAir 1.1 | CamoVoice | Handy | Speechnotes |
|---|---|---|---|---|
| Price | Free | $24.99 | Free | Freemium |
| Offline | Yes | Yes | Yes | No |
| WER (Russian) | 3.3% | N/A | N/A | N/A |
| ASR Engines | 3 (GigaAM, Whisper, Vosk) | 1 | 1 | 1 (cloud) |
| Open Source | MIT | No | Yes | No |
| GPU Acceleration | CUDA | No | No | Cloud |
| Text Correction | T5 Neural Net | No | No | No |
| Automated Tests | 93 tests | N/A | N/A | N/A |
| Mixed RU+EN | Yes | No | No | No |
| Streaming | Real-time | No | No | Yes |
System Requirements
| | Minimum | Recommended |
|---|---|---|
| OS | Windows 10/11 | Windows 11 |
| RAM | 8 GB | 16 GB |
| GPU | Not required (CPU build) | NVIDIA 6+ GB VRAM |
| CUDA | — | CUDA 12.x |
| Disk | ~3 GB | ~5 GB (with models) |
Technology Stack
- GigaAM (Sber) — ONNX model for Russian ASR, trained on 700K hours of speech
- faster-whisper — optimized Whisper engine based on CTranslate2
- Vosk — compact offline ASR engine (Kaldi)
- Silero VAD — neural voice activity detector
- T5 (bond005/ruT5-ASR-large) — correction model for Russian ASR
- ONNX Runtime — cross-platform inference for GigaAM
- PyTorch + CUDA — GPU-accelerated inference
- PyInstaller — standalone EXE packaging