ScribeAir: Offline Speech-to-Text for Windows v1.3.1

Free offline voice input tool for Windows with three ASR engines: GigaAM (3.3% WER, the best quality for Russian), Whisper (multilingual, GPU), and Vosk (compact and fast). A 90% error reduction versus the Whisper base baseline, 93 automated tests, and push-to-talk input into any application. Open source under the MIT license.

Offline

All processing on your device. No internet needed.

Open Source

MIT license. Full source code available.

Wake Word

Say "Record" to start, "Stop" to finish.

GPU Accelerated

CUDA support for faster transcription.

3.3% WER

90% fewer errors than the Whisper base baseline.

Push-to-Talk

Hold a hotkey, speak, release. Text appears instantly.

GigaAM

The most accurate Russian engine in ScribeAir, trained on 700K hours of speech.

ONNX

Optimized runtime. Fast on CPU and GPU.

Utilities, Python, GigaAM, faster-whisper, Vosk, ONNX Runtime, PyTorch, Silero VAD, Transformers, T5, tkinter, PyInstaller


Documentation

Voice Input Without Cloud or Subscriptions

ScribeAir 1.1.0 is a free speech recognition tool for Windows that works entirely offline. Hold a hotkey and speak, and the text appears in whichever application is active: a messenger, editor, IDE, or browser.

No data ever leaves your computer. No accounts, no subscriptions, no time limits.

What's New in 1.1.0

GigaAM: a new ASR engine for Russian. 3.3% WER (word error rate) vs 32.6% with Whisper base, a 90% relative error reduction.
Three recognition engines to choose from: GigaAM (quality), Whisper (multilingual), Vosk (speed).
93 automated tests covering all components: transcription, correction, filtering, and the pipeline.
ASR backend switching from the tray: change the engine and model without restarting.
Audio device selection: switch microphones from the tray menu.
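The 90% figure is a relative reduction: the share of Whisper base's word errors that GigaAM no longer makes. Below is a minimal sketch of that arithmetic using the rounded WER values quoted on this page. Because the inputs are rounded, the result can differ from the published percentage in the last digit.

```python
def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's word errors that the new engine eliminates."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# WER values in percent, as quoted on this page.
WHISPER_BASE_WER = 32.6  # v1.0 baseline
GIGAAM_WER = 3.3

print(f"{relative_error_reduction(WHISPER_BASE_WER, GIGAAM_WER):.1%}")  # prints 89.9%
```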

Testing Results

Recognition quality was benchmarked on an audiobook corpus of Russian literary text of varying complexity, covering short phrases (6–7 seconds), long passages (25–30 seconds), and rare words.

Recognition Quality (WER — Word Error Rate)

Engine	WER	Error Reduction (vs Whisper base)	Speed	Notes
GigaAM v3-e2e-rnnt	3.3%	90.0%	0.66s	Best quality + speed, punctuation
GigaAM v3-rnnt	3.3%	90.0%	0.82s	Best quality
GigaAM v3-e2e-ctc	4.2%	87.2%	1.08s	Good balance
Whisper large-v3-turbo (GPU)	7.9%	75.7%	0.44s	Multilingual
Vosk small	13.0%	60.0%	0.75s	Minimal size (50 MB)
Whisper base (CPU)	32.6%	—	0.45s	v1.0 baseline
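WER is the minimum number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference, divided by the number of reference words. Below is a minimal sketch using a word-level Levenshtein distance. The page does not describe how the benchmark normalizes text (case, punctuation, ё/е), so the lowercase-and-keep-word-characters step is an assumption.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase and keep only word characters (an assumed normalization)."""
    return re.findall(r"\w+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("мама мыла раму", "мама мыла рамы")` is one substitution over three reference words. Corpus-level WER is usually computed by summing errors and reference words over all clips, not by averaging per-clip rates.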

On Russian text, GigaAM running on CPU is more accurate than every Whisper model tested here, including large-v3-turbo on GPU (3.3% vs 7.9% WER).

Automated Testing

Category	Tests	Coverage
Transcription (Whisper)	15	RU/EN, streaming, multilingual, hints, silence
Full Pipeline	9	Audio → ASR → correction → text, RU→EN translation
T5 Correction	11	Error correction, chain-of-correction, model unloading
Hallucination Filtering	14	Known phrases, delooping, duration anomalies
Audio Processing	11	Normalization, silence trimming, high-pass filter, edge cases
Streaming Pipeline	7	VAD, lifecycle, chunk processing
Integration	8	Silero VAD, noisereduce, config v2
Total	93	100% passing
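The Hallucination Filtering tests cover three checks: known phrases, delooping, and duration anomalies. The project's actual phrase list and thresholds are not given on this page. The sketch below is a hypothetical illustration of each check. Its example phrases and words-per-second limit are chosen only for demonstration.

```python
import re

# Hypothetical examples of phrases Whisper tends to emit on silence or noise.
KNOWN_HALLUCINATIONS = {"продолжение следует", "спасибо за просмотр"}

# Hypothetical limit: more words per second than a speaker plausibly produces.
MAX_WORDS_PER_SECOND = 6.0


def dedupe_loops(words: list[str], max_ngram: int = 4) -> list[str]:
    """Collapse immediately repeated n-grams ("a b a b a b" -> "a b")."""
    out: list[str] = []
    for word in words:
        out.append(word)
        for n in range(1, max_ngram + 1):
            # Drop the newest n words while they repeat the n words before them.
            while len(out) >= 2 * n and out[-n:] == out[-2 * n:-n]:
                del out[-n:]
    return out


def filter_hallucinations(text: str, audio_seconds: float) -> str:
    """Return cleaned text, or "" if the segment looks like a hallucination."""
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    if normalized in KNOWN_HALLUCINATIONS:
        return ""  # known artifact phrase
    words = dedupe_loops(text.split())
    if audio_seconds > 0 and len(words) / audio_seconds > MAX_WORDS_PER_SECOND:
        return ""  # duration anomaly: too much text for this much audio
    return " ".join(words)
```

Collapsing every immediate repeat also merges legitimate doubles such as "very very good". A production filter would probably require several repetitions before collapsing.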

Key Advantages

100% Offline & Private: voice is processed locally and never sent to the cloud, so privacy is built in.
90% Error Reduction: GigaAM (Sber, 700K hours of training data) reaches 3.3% WER on Russian, versus 32.6% for Whisper base.
Three ASR Engines: GigaAM for quality, Whisper for multilingual use, Vosk for speed.
GPU & CPU: CUDA acceleration for maximum speed, and a CPU build for any PC.
93 Automated Tests: every component is verified, from audio processing to the full pipeline.
Free & Open Source: MIT license. Use, modify, and distribute freely. The license only requires keeping the copyright and license notice.

How It Works

Launch ScribeAir — a microphone icon appears in the system tray
Hold the hotkey (default: LShift + RShift)
Speak — text appears on screen in real time
Release the key — finished text is inserted into the active field
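The push-to-talk flow above is a small state machine. Recording starts when the whole hotkey combination is held and ends when any key in it is released. The minimal sketch below does not depend on any keyboard library. Key events arrive as plain strings, and the key names and the two callbacks are hypothetical stand-ins, because ScribeAir's actual hotkey handling is not shown on this page.

```python
from typing import Callable


class PushToTalk:
    """Start recording when all hotkey keys are held; finish when one is released."""

    def __init__(
        self,
        hotkey: set[str],
        start_recording: Callable[[], None],
        stop_and_insert: Callable[[], None],
    ) -> None:
        self.hotkey = hotkey
        self.start_recording = start_recording  # hypothetical: open the microphone stream
        self.stop_and_insert = stop_and_insert  # hypothetical: transcribe, then paste into the active field
        self.pressed: set[str] = set()
        self.recording = False

    def key_down(self, key: str) -> None:
        self.pressed.add(key)
        # Auto-repeat events while held do nothing, because recording is already True.
        if not self.recording and self.hotkey <= self.pressed:
            self.recording = True
            self.start_recording()

    def key_up(self, key: str) -> None:
        self.pressed.discard(key)
        if self.recording and key in self.hotkey:
            self.recording = False
            self.stop_and_insert()
```

With the default combination, the handler would be built as `PushToTalk({"lshift", "rshift"}, start, finish)`. It would then receive `key_down`/`key_up` calls from whatever low-level keyboard hook the application uses.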

Models are downloaded automatically on first launch. After that, no internet is needed.

Recognition Modes

Mode	Engine	Languages	Quality (WER)	Speed
Auto (default)	GigaAM (CPU) / Whisper (GPU)	RU / Multi	3.3% / 7.9%	0.66s / 0.44s
GigaAM	GigaAM ONNX	Russian only	3.3%	0.66s
Whisper	faster-whisper	RU, EN, auto-detect, translation	7.9–32.6%	0.44–2.3s
Vosk	Vosk offline	Russian	13.0%	0.75s
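The Auto row pairs GigaAM with CPU and Whisper with GPU. The sketch below is one hypothetical reading of that rule. It picks Whisper large-v3-turbo when a CUDA GPU is available and GigaAM on CPU for Russian. It falls back to Whisper base on CPU for other languages, because GigaAM is Russian-only. The function name, the `Backend` record, and the decision order are assumptions, not ScribeAir's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    engine: str
    model: str
    device: str


def choose_auto_backend(cuda_available: bool, language: str = "ru") -> Backend:
    """Hypothetical Auto-mode rule derived from the Recognition Modes table."""
    if cuda_available:
        # GPU present: Whisper large-v3-turbo (7.9% WER, ~0.44 s, multilingual).
        return Backend("whisper", "large-v3-turbo", "cuda")
    if language == "ru":
        # CPU only, Russian: GigaAM (3.3% WER, ~0.66 s).
        return Backend("gigaam", "v3-e2e-rnnt", "cpu")
    # CPU only, other languages: GigaAM cannot help, so fall back to Whisper base.
    return Backend("whisper", "base", "cpu")
```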

Features

Speech Recognition

GigaAM (Sber) — ONNX model trained on 700K hours of Russian speech. 6 model variants.
faster-whisper (CTranslate2) — optimized Whisper. Models: base, small, medium, large-v3-turbo.
Vosk — compact offline engine (50 MB), ideal for short phrases.
Streaming pipeline with Silero VAD — text appears as you speak
Whisper hallucination filtering (known artifacts are removed automatically)
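In a streaming pipeline, a voice activity detector decides which audio chunks contain speech. Each finished utterance can then go to the ASR engine while the user is still talking. ScribeAir uses Silero VAD for this. The sketch below swaps in a simple RMS-energy threshold only to show the segmentation logic, and its threshold and hang-over values are arbitrary assumptions.

```python
import math
from collections.abc import Iterable, Iterator


def rms(chunk: list[float]) -> float:
    """Root-mean-square level of a chunk of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk)) if chunk else 0.0


def segment_speech(
    chunks: Iterable[list[float]],
    threshold: float = 0.02,  # assumed energy threshold (Silero VAD uses a neural model instead)
    hangover_chunks: int = 3,  # assumed: silent chunks tolerated before an utterance is closed
) -> Iterator[list[float]]:
    """Yield the samples of each detected utterance as soon as it ends."""
    segment: list[float] = []
    trailing_silence = 0  # samples of silence at the end of the current segment
    silent_chunks = 0
    for chunk in chunks:
        if rms(chunk) >= threshold:
            segment.extend(chunk)
            trailing_silence = silent_chunks = 0
        elif segment:
            segment.extend(chunk)  # keep short pauses inside the utterance
            trailing_silence += len(chunk)
            silent_chunks += 1
            if silent_chunks >= hangover_chunks:
                # Drop the trailing silence before handing the utterance to ASR.
                yield segment[: len(segment) - trailing_silence]
                segment, trailing_silence, silent_chunks = [], 0, 0
    if segment:
        yield segment[: len(segment) - trailing_silence]
```

Because `segment_speech` is a generator, a caller can transcribe each yielded utterance immediately instead of waiting for the whole recording.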

Text Processing

T5 correction (bond005/ruT5-ASR-large) — fixes ASR errors in Russian transcripts
Chain-of-Correction for complex sentences
Custom vocabulary for professional terminology
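The page does not describe how ScribeAir applies the custom vocabulary. One simple hypothetical approach is sketched below with the standard library's `difflib`. Each recognized word is replaced by the closest vocabulary term when their spelling similarity reaches a cutoff. The function name and the 0.8 cutoff are assumptions.

```python
import difflib


def apply_vocabulary(text: str, vocabulary: list[str], cutoff: float = 0.8) -> str:
    """Replace words that closely resemble a vocabulary term with that term."""
    # Match case-insensitively, but insert the term with the user's spelling.
    lowered = {term.lower(): term for term in vocabulary}
    fixed = []
    for word in text.split():
        match = difflib.get_close_matches(word.lower(), list(lowered), n=1, cutoff=cutoff)
        fixed.append(lowered[match[0]] if match else word)
    return " ".join(fixed)
```

For example, a misheard "kubernetis" becomes "Kubernetes" when that term is in the vocabulary. Punctuation attached to a word is not handled in this sketch.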

Audio & Interface

Audio preprocessing: pre-emphasis, normalization, 80 Hz high-pass filter, noise reduction, silence trimming
Configurable hotkeys (LShift+RShift, Win+Shift, etc.)
Semi-transparent overlay window with transcription progress
Microphone and ASR backend selection from tray menu
Windows autostart
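The audio preprocessing steps above can be illustrated in pure Python. The 0.97 pre-emphasis coefficient, the first-order RC filter for the 80 Hz high-pass, the 16 kHz sample rate, the silence threshold, and the order of the steps are common textbook choices assumed here, not ScribeAir's settings. Noise reduction is omitted.

```python
import math


def high_pass(samples: list[float], cutoff_hz: float = 80.0, rate: int = 16000) -> list[float]:
    """First-order RC high-pass: removes DC offset and low-frequency rumble."""
    if not samples:
        return []
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / rate)
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out


def pre_emphasis(samples: list[float], coeff: float = 0.97) -> list[float]:
    """y[n] = x[n] - coeff * x[n-1]: boosts high frequencies relative to low ones."""
    return samples[:1] + [samples[n] - coeff * samples[n - 1] for n in range(1, len(samples))]


def trim_silence(samples: list[float], threshold: float = 0.01) -> list[float]:
    """Strip leading and trailing samples whose magnitude stays below threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    return samples[loud[0] : loud[-1] + 1] if loud else []


def normalize_peak(samples: list[float], target: float = 0.9) -> list[float]:
    """Scale so the loudest sample has magnitude `target`."""
    peak = max((abs(s) for s in samples), default=0.0)
    return [s * target / peak for s in samples] if peak > 0 else list(samples)


def preprocess(samples: list[float], rate: int = 16000) -> list[float]:
    """Assumed order: high-pass, pre-emphasis, silence trimming, normalization."""
    return normalize_peak(trim_silence(pre_emphasis(high_pass(samples, 80.0, rate))))
```

Normalizing last means the peak level is set on exactly the audio the recognizer will see, after filtering and trimming have changed the signal.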

Two Build Variants

Variant	Size	Description
CUDA	~2.4 GB	Full GPU support. Maximum speed on NVIDIA graphics cards.
CPU	~800 MB	Works on any PC without a graphics card.

AI models are downloaded automatically on first launch and cached locally.
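Download-once caching can be sketched as follows. The code looks for the model in a local cache directory and fetches it only when it is missing. The cache layout, the `.partial` naming, and the `fetch` callback are hypothetical stand-ins, because ScribeAir's real cache path and download code are not shown on this page.

```python
from pathlib import Path
from typing import Callable


def ensure_model(name: str, cache_dir: Path, fetch: Callable[[str, Path], None]) -> Path:
    """Return the cached model path, calling fetch(name, dest) only on a cache miss."""
    target = cache_dir / name
    if target.exists():
        return target  # already cached: no network access needed
    cache_dir.mkdir(parents=True, exist_ok=True)
    partial = cache_dir / (name + ".partial")
    fetch(name, partial)  # hypothetical downloader writes to a temporary file first
    # Rename only after fetch returns, so a half-finished download is never used as a model.
    partial.replace(target)
    return target
```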

Comparison with Alternatives

Feature	ScribeAir 1.1	CamoVoice	Handy	Speechnotes
Price	Free	$24.99	Free	Freemium
Offline	Yes	Yes	Yes	No
WER (Russian)	3.3%	N/A	N/A	N/A
ASR Engines	3 (GigaAM, Whisper, Vosk)	1	1	1 (cloud)
Open Source	MIT	No	Yes	No
GPU Acceleration	CUDA	No	No	Cloud
Text Correction	T5 Neural Net	No	No	No
Automated Tests	93 tests	N/A	N/A	N/A
Mixed RU+EN	Yes	No	No	No
Streaming	Real-time	No	No	Yes

System Requirements

Component	Minimum	Recommended
OS	Windows 10/11	Windows 11
RAM	8 GB	16 GB
GPU	Not required (CPU build)	NVIDIA GPU with 6+ GB VRAM
CUDA	—	CUDA 12.x
Disk	~3 GB	~5 GB (with models)

Technology Stack

GigaAM (Sber) — ONNX model for Russian ASR, trained on 700K hours of speech
faster-whisper — optimized Whisper engine based on CTranslate2
Vosk — compact offline ASR engine (Kaldi)
Silero VAD — neural voice activity detector
T5 (bond005/ruT5-ASR-large) — correction model for Russian ASR
ONNX Runtime — cross-platform inference for GigaAM
PyTorch + CUDA — GPU-accelerated inference
PyInstaller — standalone EXE packaging