BorisovAI

ScribeAir: Offline Speech-to-Text for Windows (v1.3.0)

Free offline voice input tool for Windows with three ASR engines: GigaAM (3.3% WER, best quality for Russian), Whisper (multilingual, GPU), and Vosk (compact and fast). About 90% fewer errors than the Whisper base baseline, 93 automated tests, and push-to-talk input into any application. Open source, MIT license.

  • Offline: All processing on your device. No internet needed.
  • Open Source: MIT license. Full source code available.
  • Wake Word: Say "Record" to start, "Stop" to finish.
  • GPU Accelerated: CUDA support for faster transcription.
  • 3.3% WER: 90% error reduction vs Whisper baseline.
  • Push-to-Talk: Hold a hotkey, speak, release. Text appears instantly.
  • GigaAM: Best Russian ASR model. 700K hours training data.
  • ONNX: Optimized runtime. Fast on CPU and GPU.
Tags: Utilities, Python, GigaAM, faster-whisper, Vosk, ONNX Runtime, PyTorch, Silero VAD, Transformers, T5, tkinter, PyInstaller

Documentation

Voice Input Without Cloud or Subscriptions

ScribeAir 1.1.0 is a free speech recognition tool for Windows that works entirely offline. Hold a hotkey, speak — and the text instantly appears in any application: messenger, editor, IDE, or browser.

No data ever leaves your computer. No accounts, no subscriptions, no time limits.


What's New in 1.1.0

  • GigaAM: a new ASR engine for Russian. 3.3% WER (word error rate) vs 32.6% with Whisper base, a relative error reduction of about 90%.
  • Three recognition engines to choose from: GigaAM (quality), Whisper (multilingual), Vosk (speed).
  • 93 automated tests covering every component: transcription, correction, filtering, and the full pipeline.
  • ASR backend switching from the tray: change the engine and model without restarting.
  • Audio device selection: switch microphones from the tray menu.

Testing Results

Recognition quality was benchmarked on an audiobook corpus (Russian literary text of varying complexity): short phrases 6–7 seconds, long passages 25–30 seconds, rare words.
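
For readers unfamiliar with the metric, WER is the word-level edit distance between the reference text and the recognized text, divided by the number of reference words. The sketch below shows the standard computation. The function name and the text normalization (lowercasing, whitespace splitting) are assumptions for illustration; the normalization used in this benchmark is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("a b c d", "a x c d"))  # 0.25
```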

Recognition Quality (WER — Word Error Rate)

| Engine | WER | Error Reduction | Speed | Notes |
| --- | --- | --- | --- | --- |
| GigaAM v3-e2e-rnnt | 3.3% | 90.0% | 0.66s | Best quality + speed, punctuation |
| GigaAM v3-rnnt | 3.3% | 90.0% | 0.82s | Best quality |
| GigaAM v3-e2e-ctc | 4.2% | 87.2% | 1.08s | Good balance |
| Whisper large-v3-turbo (GPU) | 7.9% | 75.7% | 0.44s | Multilingual |
| Vosk small | 13.0% | 60.0% | 0.75s | Minimal size (50 MB) |
| Whisper base (CPU) | 32.6% | - | 0.45s | v1.0 baseline |

On Russian text, GigaAM on CPU is more accurate than the best Whisper result in this benchmark, large-v3-turbo on GPU (3.3% vs 7.9% WER).
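
The Error Reduction column appears to be the relative WER reduction against the Whisper base (CPU) baseline of 32.6%. A minimal sketch of that arithmetic follows; the function name is hypothetical. Because it starts from the rounded WER values in the table, the last digit can differ from the published column by 0.1, which may have been computed from unrounded WER values.

```python
BASELINE_WER = 32.6  # Whisper base (CPU), the v1.0 baseline in the table


def error_reduction(wer: float, baseline: float = BASELINE_WER) -> float:
    """Relative WER reduction versus the baseline, in percent."""
    return (baseline - wer) / baseline * 100.0


for engine, wer in [
    ("GigaAM v3-e2e-rnnt", 3.3),
    ("Whisper large-v3-turbo (GPU)", 7.9),
    ("Vosk small", 13.0),
]:
    print(f"{engine}: {error_reduction(wer):.1f}%")
# GigaAM v3-e2e-rnnt: 89.9%
# Whisper large-v3-turbo (GPU): 75.8%
# Vosk small: 60.1%
```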

Automated Testing

| Category | Tests | Coverage |
| --- | --- | --- |
| Transcription (Whisper) | 15 | RU/EN, streaming, multilingual, hints, silence |
| Full Pipeline | 9 | Audio → ASR → correction → text, RU→EN translation |
| T5 Correction | 11 | Error correction, chain-of-correction, model unloading |
| Hallucination Filtering | 14 | Known phrases, delooping, duration anomalies |
| Audio Processing | 11 | Normalization, silence trimming, high-pass filter, edge cases |
| Streaming Pipeline | 7 | VAD, lifecycle, chunk processing |
| Integration | 8 | Silero VAD, noisereduce, config v2 |
| Total | 93 | 100% passing |

Key Advantages

  • 100% Offline & Private — voice is processed locally, never sent to the cloud. Complete privacy out of the box.
  • 90% Error Reduction — GigaAM (Sber, 700K hours of training data) delivers 3.3% WER for Russian language.
  • Three ASR Engines — GigaAM for quality, Whisper for multilingual, Vosk for speed.
  • GPU & CPU — CUDA acceleration for maximum speed, CPU build for any PC.
  • 93 Automated Tests — every component is verified: from audio processing to the full pipeline.
  • Free & Open Source — MIT license. Use, modify, and distribute without restrictions.

How It Works

  1. Launch ScribeAir — a microphone icon appears in the system tray
  2. Hold the hotkey (default: LShift + RShift)
  3. Speak — text appears on screen in real time
  4. Release the key — finished text is inserted into the active field
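
The hold-speak-release flow in the steps above can be sketched as a small state holder. This is an illustration under stated assumptions, not ScribeAir's actual code: the class name, the method names, and the `transcribe` / `insert_text` callbacks are all hypothetical, and the real-time preview from step 3 is left out.

```python
from typing import Callable, List


class PushToTalkSession:
    """Buffers audio while the hotkey is held and emits text on release."""

    def __init__(self,
                 transcribe: Callable[[bytes], str],
                 insert_text: Callable[[str], None]) -> None:
        self._transcribe = transcribe    # hypothetical ASR callback
        self._insert_text = insert_text  # hypothetical "type into active field" callback
        self._chunks: List[bytes] = []
        self._recording = False

    def on_hotkey_press(self) -> None:
        self._chunks.clear()
        self._recording = True

    def on_audio_chunk(self, chunk: bytes) -> None:
        # Audio that arrives while the hotkey is up is ignored.
        if self._recording:
            self._chunks.append(chunk)

    def on_hotkey_release(self) -> None:
        if not self._recording:
            return
        self._recording = False
        text = self._transcribe(b"".join(self._chunks))
        if text:
            self._insert_text(text)


# Usage with a fake transcriber that only reports the buffered size.
typed: List[str] = []
session = PushToTalkSession(lambda audio: f"<{len(audio)} bytes of audio>", typed.append)
session.on_audio_chunk(b"zz")  # hotkey not held yet, so this chunk is dropped
session.on_hotkey_press()
session.on_audio_chunk(b"ab")
session.on_audio_chunk(b"cd")
session.on_hotkey_release()
print(typed)  # ['<4 bytes of audio>']
```

Injecting the two callbacks keeps the hotkey logic testable without a microphone or a real ASR engine.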

Models are downloaded automatically on first launch. After that, no internet is needed.


Recognition Modes

| Mode | Engine | Languages | Quality (WER) | Speed |
| --- | --- | --- | --- | --- |
| Auto (default) | GigaAM (CPU) / Whisper (GPU) | RU / Multi | 3.3% / 7.9% | 0.66s / 0.44s |
| GigaAM | GigaAM ONNX | Russian only | 3.3% | 0.66s |
| Whisper | faster-whisper | RU, EN, auto, translation | 7.9–32.6% | 0.44–2.3s |
| Vosk | Vosk offline | Russian | 13.0% | 0.75s |
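
One plausible reading of the Auto row is that the engine is chosen by hardware: Whisper when a CUDA GPU is available, GigaAM on CPU otherwise. The sketch below encodes that reading. The function name and the lowercase engine identifiers are hypothetical, and the actual selection logic may take other factors into account.

```python
def choose_backend(mode: str, cuda_available: bool) -> str:
    """Map a recognition mode to an engine name, following the modes table."""
    mode = mode.lower()
    if mode == "auto":
        # Assumed rule: Whisper on GPU, GigaAM on CPU.
        return "whisper" if cuda_available else "gigaam"
    if mode in {"gigaam", "whisper", "vosk"}:
        return mode  # an explicit choice overrides hardware detection
    raise ValueError(f"unknown recognition mode: {mode!r}")


print(choose_backend("auto", cuda_available=False))  # gigaam
print(choose_backend("auto", cuda_available=True))   # whisper
```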

Features

Speech Recognition

  • GigaAM (Sber) — ONNX model trained on 700K hours of Russian speech. 6 model variants.
  • faster-whisper (CTranslate2) — optimized Whisper. Models: base, small, medium, large-v3-turbo.
  • Vosk — compact offline engine (50 MB), ideal for short phrases.
  • Streaming pipeline with Silero VAD — text appears as you speak
  • Whisper hallucination filtering (known artifacts are removed automatically)
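
The test table names three hallucination checks: known phrases, delooping, and duration anomalies. A minimal sketch of how such checks could be combined is shown below. The phrase list, the threshold, and the function names are assumptions for illustration, and this version only collapses single-word loops, not repeated multi-word phrases.

```python
# Illustrative entries only; the project's real phrase list is not shown here.
KNOWN_ARTIFACTS = {
    "thanks for watching",
    "subtitles by the amara.org community",
}


def deloop(text: str, max_repeats: int = 2) -> str:
    """Keep at most max_repeats consecutive copies of the same word."""
    kept = []
    for word in text.split():
        tail = kept[-max_repeats:]
        if len(tail) == max_repeats and all(w.lower() == word.lower() for w in tail):
            continue  # this word would extend a loop, so drop it
        kept.append(word)
    return " ".join(kept)


def filter_hallucinations(text: str, audio_seconds: float,
                          max_words_per_second: float = 6.0) -> str:
    """Return cleaned text, or an empty string if the output looks hallucinated."""
    cleaned = text.strip()
    if not cleaned or audio_seconds <= 0:
        return ""
    # 1. Known phrases: exact match after lowercasing and stripping end punctuation.
    if cleaned.lower().strip(" .!?") in KNOWN_ARTIFACTS:
        return ""
    # 2. Duration anomaly: more words than could plausibly fit in the clip.
    if len(cleaned.split()) / audio_seconds > max_words_per_second:
        return ""
    # 3. Delooping: collapse runs of a repeated word.
    return deloop(cleaned)


print(filter_hallucinations("hello hello hello world", audio_seconds=2.0))  # hello hello world
```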

Text Processing

  • T5 correction (bond005/ruT5-ASR-large) — fixes ASR errors for Russian language
  • Chain-of-Correction for complex sentences
  • Custom vocabulary for professional terminology

Audio & Interface

  • Audio preprocessing: pre-emphasis, normalization, 80 Hz high-pass filter, noise reduction, silence trimming
  • Configurable hotkeys (LShift+RShift, Win+Shift, etc.)
  • Semi-transparent overlay window with transcription progress
  • Microphone and ASR backend selection from tray menu
  • Windows autostart
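
Several of the preprocessing steps listed above are simple per-sample operations. The sketch below shows textbook versions of pre-emphasis, peak normalization, a first-order 80 Hz high-pass filter, and silence trimming in plain Python. The function names, the 0.97 coefficient, the 16 kHz sample rate, and the thresholds are assumptions; noise reduction is omitted, and the project's real filters may be designed differently.

```python
import math
from typing import List


def pre_emphasis(samples: List[float], coeff: float = 0.97) -> List[float]:
    """y[n] = x[n] - coeff * x[n-1]; boosts high frequencies."""
    if not samples:
        return []
    return [samples[0]] + [samples[i] - coeff * samples[i - 1]
                           for i in range(1, len(samples))]


def normalize_peak(samples: List[float], target: float = 0.95) -> List[float]:
    """Scale the signal so its largest absolute sample equals target."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s * (target / peak) for s in samples]


def high_pass(samples: List[float], sample_rate: int = 16000,
              cutoff_hz: float = 80.0) -> List[float]:
    """First-order RC high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / sample_rate)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def trim_silence(samples: List[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples whose magnitude is below threshold."""
    voiced = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not voiced:
        return []
    return samples[voiced[0]:voiced[-1] + 1]


print(trim_silence([0.0, 0.0, 0.5, 0.0, 0.3, 0.0]))  # [0.5, 0.0, 0.3]
```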

Two Build Variants

| Variant | Size | Description |
| --- | --- | --- |
| CUDA | ~2.4 GB | Full GPU support. Maximum speed on NVIDIA graphics cards. |
| CPU | ~800 MB | Works on any PC without a graphics card. |

AI models are downloaded automatically on first launch and cached locally.


Comparison with Alternatives

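| Feature | ScribeAir 1.1 | CamoVoice | Handy | Speechnotes |
| --- | --- | --- | --- | --- |
| Price | Free | $24.99 | Free | Freemium |
| Offline | Yes | Yes | Yes | No |
| WER (Russian) | 3.3% | N/A | N/A | N/A |
| ASR Engines | 3 (GigaAM, Whisper, Vosk) | 1 | 1 | 1 (cloud) |
| Open Source | MIT | No | Yes | No |
| GPU Acceleration | CUDA | No | No | Cloud |
| Text Correction | T5 Neural Net | No | No | No |
| Automated Tests | 93 tests | N/A | N/A | N/A |
| Mixed RU+EN | Yes | No | No | No |
| Streaming | Real-time | No | No | Yes |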

System Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| OS | Windows 10/11 | Windows 11 |
| RAM | 8 GB | 16 GB |
| GPU | Not required (CPU build) | NVIDIA 6+ GB VRAM |
| CUDA | - | CUDA 12.x |
| Disk | ~3 GB | ~5 GB (with models) |

Technology Stack

  • GigaAM (Sber) — ONNX model for Russian ASR, trained on 700K hours of speech
  • faster-whisper — optimized Whisper engine based on CTranslate2
  • Vosk — compact offline ASR engine (Kaldi)
  • Silero VAD — neural voice activity detector
  • T5 (bond005/ruT5-ASR-large) — correction model for Russian ASR
  • ONNX Runtime — cross-platform inference for GigaAM
  • PyTorch + CUDA — GPU-accelerated inference
  • PyInstaller — standalone EXE packaging
