
MRHF-Codec (emcodec)
Research
Novel neural audio codec that achieves a 40+ dB improvement over Meta's DAC on high-frequency reconstruction. MS research.
Research Figures

Spectrogram comparison — original signal vs emcodec reconstruction. High-frequency content above 6kHz is faithfully preserved.

Ablation study — quantitative metrics across model variants.

High-frequency STFT analysis — the +40dB improvement region.

Ablation spectrogram — tambourine sample across model variants.

Training loss curves — 595K steps, multi-objective optimization.
Overview
MRHF-Codec (Multi-Resolution High-Frequency Preserving Neural Audio Codec) is a dual-path encoder-decoder architecture that solves a critical failure in existing neural audio codecs: catastrophic destruction of high-frequency content. Meta's DAC produces **negative SI-SDR on frequencies above 6kHz**, meaning the reconstruction error in that band carries more energy than the recovered signal. MRHF-Codec achieves **positive SI-SDR (+11.6dB)** on the same content, a 40+ dB improvement, while using 11.8% less bitrate.
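To make the SI-SDR figures concrete, here is a minimal sketch of the metric in plain Python. It is not the project's evaluation code: the function name `si_sdr`, the synthetic 8 kHz and 9 kHz tones, and the 44.1 kHz sample rate are all assumptions chosen for illustration, and the band-limiting step (isolating content above 6 kHz before scoring) is not shown.

```python
import math

def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch)."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    # Remove the mean so a DC offset counts as neither signal nor error.
    ref_mean = sum(reference) / len(reference)
    est_mean = sum(estimate) / len(estimate)
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]
    # Project the estimate onto the reference to obtain the scaled target.
    dot = sum(r * e for r, e in zip(ref, est))
    ref_energy = sum(r * r for r in ref)
    alpha = dot / (ref_energy + eps)
    target = [alpha * r for r in ref]
    # Everything in the estimate that is not the scaled target is error.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))

# Assumed test signals: 0.1 s at 44.1 kHz, a reference tone at 8 kHz.
SAMPLE_RATE_HZ = 44_100
N = 4_410
reference = [math.sin(2 * math.pi * 8_000 * n / SAMPLE_RATE_HZ) for n in range(N)]
other_tone = [math.sin(2 * math.pi * 9_000 * n / SAMPLE_RATE_HZ) for n in range(N)]

# A faithful reconstruction: attenuated reference plus a small unrelated component.
good = [0.5 * r + 0.05 * o for r, o in zip(reference, other_tone)]
# A failed reconstruction: energy is present, but none of it matches the reference.
bad = other_tone

good_db = si_sdr(reference, good)  # about 20 dB
bad_db = si_sdr(reference, bad)    # strongly negative
print(f"good: {good_db:.1f} dB, bad: {bad_db:.1f} dB")
```

Because the metric is scale-invariant, multiplying `good` by any nonzero constant leaves its score unchanged. A negative value arises whenever the residual carries more energy than the projected target, which is the failure mode described above.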
Engineering Highlights
- 01 Asymmetric temporal resolution: first neural audio codec to use different downsampling ratios per frequency band. The 128x ratio on the high-frequency path gives 4x finer temporal resolution than DAC's uniform 512x, which improves transient preservation.
- 02 FSQ over VQ: Finite Scalar Quantization eliminates codebook collapse (the root cause of HF degradation discovered in Phase 1). No learnable codebooks, no EMA updates, no dead codes.
- 03 240+ tests with gradient diagnostics: the test suite includes gradient flow checks, per-band energy conservation validation, and end-to-end training pipeline tests, which is not typical for research code.
- 04 MUSHRA evaluation pipeline: built a standardized listening test interface (in SoundPrivate) specifically for codec comparison, so the evaluation rests on listening tests as well as objective metrics.
- 05 Comprehensive ablation studies: each design decision was validated (downsampling ratio choice, FSQ vs VQ, band split frequency, quantization layer allocation).
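The first two highlights can be illustrated with a short sketch. It is a simplified stand-in, not the MRHF-Codec implementation: the 44.1 kHz sample rate, the level counts in `LEVELS`, and the function names are hypothetical. The quantizer handles odd level counts only, and the straight-through gradient used when training with FSQ is omitted.

```python
import math

def frame_duration_ms(downsample_ratio, sample_rate_hz):
    """Audio duration covered by one latent frame, in milliseconds."""
    return 1000.0 * downsample_ratio / sample_rate_hz

def fsq_quantize(latent, levels):
    """Round each bounded latent dimension to a fixed integer grid.

    There is no learned codebook: the grid is fully determined by `levels`.
    This sketch supports odd level counts only.
    """
    if len(latent) != len(levels):
        raise ValueError("latent and levels must have the same length")
    codes = []
    for value, n_levels in zip(latent, levels):
        if n_levels % 2 == 0:
            raise ValueError("this sketch handles odd level counts only")
        half = (n_levels - 1) // 2
        bounded = half * math.tanh(value)  # squashed into (-half, half)
        codes.append(round(bounded))       # integer in [-half, half]
    return codes

def fsq_index(codes, levels):
    """Flatten per-dimension codes into one index of the implicit codebook."""
    index = 0
    for code, n_levels in zip(codes, levels):
        half = (n_levels - 1) // 2
        index = index * n_levels + (code + half)  # mixed-radix encoding
    return index

# Highlight 01: temporal resolution at an assumed 44.1 kHz sample rate.
SAMPLE_RATE_HZ = 44_100
hf_ms = frame_duration_ms(128, SAMPLE_RATE_HZ)   # high-frequency path
dac_ms = frame_duration_ms(512, SAMPLE_RATE_HZ)  # uniform 512x
print(f"128x: {hf_ms:.2f} ms/frame, 512x: {dac_ms:.2f} ms/frame")

# Highlight 02: FSQ with hypothetical per-dimension level counts.
LEVELS = [5, 5, 5, 3]
codes = fsq_quantize([0.0, 10.0, -10.0, 0.3], LEVELS)
index = fsq_index(codes, LEVELS)
codebook_size = math.prod(LEVELS)
print(f"codes={codes}, index={index} of {codebook_size}")
```

The frame durations differ by exactly 512 / 128 = 4 regardless of the sample rate, which is where the 4x figure comes from. In the quantizer, every grid point is reachable by construction, so there are no dead codes to revive and no codebook statistics to update.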