MRHF-Codec (emcodec)

Research

A neural audio codec that improves SI-SDR on content above 6 kHz by more than 40 dB over DAC (the Descript Audio Codec). MS thesis research.

Spectrogram Comparison

Side-by-side spectrogram comparison of the original signal versus emcodec reconstruction

Left: original signal. Right: emcodec reconstruction. High-frequency content above 6 kHz is faithfully preserved, where competing codecs produce noise.

Overview

MRHF-Codec (Multi-Resolution High-Frequency Preserving Neural Audio Codec) is a dual-path encoder-decoder architecture that targets a critical failure in existing neural audio codecs: severe loss of high-frequency content. DAC (the Descript Audio Codec) produces **negative SI-SDR on frequencies above 6 kHz**, meaning the reconstruction error in that band carries more energy than the part of the output that matches the original. MRHF-Codec achieves **positive SI-SDR (+11.6 dB)** on the same content, an improvement of more than 40 dB, while using 11.8% less bitrate.
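Band-limited SI-SDR, the metric quoted above, can be sketched in a few lines. This is a minimal NumPy illustration and not the project's evaluation code: the 44.1 kHz sample rate, the FFT-mask high-pass, the function names, and the synthetic test tones are all assumptions made for the example.

```python
import numpy as np

def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB."""
    reference = reference - np.mean(reference)
    estimate = estimate - np.mean(estimate)
    # Project the estimate onto the reference to get the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    error = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(error, error) + eps))

def keep_above(signal, sample_rate, cutoff_hz):
    """Zero all FFT bins below cutoff_hz (a crude brick-wall high-pass)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def hf_si_sdr(reference, estimate, sample_rate, cutoff_hz=6000.0):
    """SI-SDR computed only on content above cutoff_hz."""
    return si_sdr(keep_above(reference, sample_rate, cutoff_hz),
                  keep_above(estimate, sample_rate, cutoff_hz))

# Synthetic demo signals (hypothetical, one second at an assumed 44.1 kHz).
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
low = np.sin(2 * np.pi * 440 * t)            # low-frequency tone
high = 0.1 * np.sin(2 * np.pi * 8000 * t)    # quieter high-frequency tone
reference = low + high
noise = np.random.default_rng(0).standard_normal(sample_rate)

faithful = reference + 0.001 * noise         # small broadband error
hf_destroyed = low + 0.1 * noise             # 8 kHz tone replaced by noise

print("full-band SI-SDR, HF destroyed:", si_sdr(reference, hf_destroyed))
print("HF-only SI-SDR, HF destroyed:  ", hf_si_sdr(reference, hf_destroyed, sample_rate))
print("HF-only SI-SDR, faithful:      ", hf_si_sdr(reference, faithful, sample_rate))
```

In this toy case the full-band score stays positive even though the band above 6 kHz scores below 0 dB, which is why a band-limited measurement is informative for this kind of failure.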

Tech Stack

  - Language: Python 3.9+
  - Framework: PyTorch
  - Architecture: Dual-path encoder-decoder with FSQ quantization
  - Training: GAN-based (Multi-Period + Multi-Scale Discriminators)
  - Losses: L1 reconstruction + multi-scale STFT + mel-spectrogram + adversarial + feature matching
  - Analysis: librosa, pyloudnorm (ITU-R BS.1770), soundfile
  - Evaluation: Custom MUSHRA interface (built in SoundPrivate), 12+ metrics
  - Training Infra: RTX 4090, ~111 hours per 500K steps, WandB logging, mixed precision
  - Testing: pytest, 240+ test cases across 9 modules
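As an illustration of the multi-scale STFT term in the loss list, the sketch below compares magnitude spectrograms at several FFT sizes and averages the L1 differences. It is a non-differentiable NumPy stand-in for what would be a PyTorch loss in the actual project; the FFT sizes, the hop of one quarter of the FFT size, the Hann window, and the function names are assumptions, and only a plain L1 magnitude term is shown.

```python
import numpy as np

def stft_magnitude(signal, fft_size, hop):
    """Magnitude spectrogram from Hann-windowed frames (no padding)."""
    window = np.hanning(fft_size)
    n_frames = 1 + (len(signal) - fft_size) // hop
    frames = np.stack([
        signal[i * hop : i * hop + fft_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

def multi_scale_stft_loss(reference, estimate, fft_sizes=(512, 1024, 2048)):
    """Mean L1 distance between magnitude spectrograms, averaged over scales."""
    total = 0.0
    for fft_size in fft_sizes:
        hop = fft_size // 4  # assumed 75% overlap
        ref_mag = stft_magnitude(reference, fft_size, hop)
        est_mag = stft_magnitude(estimate, fft_size, hop)
        total += np.mean(np.abs(ref_mag - est_mag))
    return total / len(fft_sizes)

# Hypothetical usage on random data; inputs must be at least max(fft_sizes) long.
rng = np.random.default_rng(0)
clean = rng.standard_normal(8192)
degraded = clean + 0.1 * rng.standard_normal(8192)
print("loss(clean, clean):   ", multi_scale_stft_loss(clean, clean))
print("loss(clean, degraded):", multi_scale_stft_loss(clean, degraded))
```

Small FFT sizes resolve transients in time and large ones resolve harmonics in frequency, which is the usual motivation for combining several scales in one loss.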

Engineering Highlights

  1. Asymmetric temporal resolution: First neural audio codec to use different downsampling ratios per frequency band. 128x downsampling for the high-frequency band gives 4x finer temporal resolution than DAC's uniform 512x, which improves transient preservation.
  2. FSQ over VQ: Finite Scalar Quantization eliminates codebook collapse, identified in Phase 1 as the root cause of high-frequency degradation. No learnable codebooks, no EMA updates, no dead codes.
  3. 240+ tests with gradient diagnostics: The test suite includes gradient flow checks, energy conservation validation per frequency band, and end-to-end training pipeline tests. This level of coverage is not typical for research code.
  4. MUSHRA evaluation pipeline: Built a standardized listening test interface (in SoundPrivate) specifically for codec comparison, so evaluation rests on a controlled listening methodology and not on objective metrics alone.
  5. Comprehensive ablation studies: Validated each design decision: downsampling ratio choice, FSQ vs VQ, band split frequency, and quantization layer allocation.
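The first two highlights can be made concrete with a short sketch: the frame-rate arithmetic behind the 128x versus 512x comparison, and the rounding step at the core of Finite Scalar Quantization. This is an assumption-laden illustration and not the project's code. The 44.1 kHz sample rate, the level counts `[7, 5, 5, 5]`, and all function names are hypothetical; the quantizer is simplified to odd level counts and omits the straight-through estimator that training would need.

```python
import numpy as np

def frame_rate(sample_rate, downsampling):
    """Latent frames per second for a given total downsampling ratio."""
    return sample_rate / downsampling

def fsq_quantize(z, levels):
    """Bound each latent dimension with tanh, then round to a fixed grid.

    Returns the quantized values scaled to [-1, 1] and the integer digit
    (0 .. L-1) chosen in each dimension. There is no codebook to learn.
    """
    levels = np.asarray(levels)
    if np.any(levels % 2 == 0):
        raise ValueError("this sketch only handles odd level counts")
    half = (levels - 1) / 2.0
    codes = np.round(np.tanh(z) * half)   # integers in [-half, half]
    return codes / half, (codes + half).astype(int)

def digits_to_index(digits, levels):
    """Pack per-dimension digits into one token index (mixed radix)."""
    basis = np.cumprod([1] + list(levels[:-1]))
    return (digits * basis).sum(axis=-1)

# Temporal resolution at an assumed 44.1 kHz sample rate.
for ratio in (128, 512):
    fps = frame_rate(44100, ratio)
    print(f"{ratio}x: {fps:.1f} frames/s, {1000.0 / fps:.2f} ms per frame")

# FSQ on two hypothetical 4-dimensional latent vectors.
levels = [7, 5, 5, 5]                     # implicit codebook of 7*5*5*5 entries
latents = np.array([[0.0, 0.0, 0.0, 0.0],
                    [10.0, -10.0, 10.0, -10.0]])
quantized, digits = fsq_quantize(latents, levels)
print("digits: ", digits.tolist())
print("indices:", digits_to_index(digits, levels).tolist())
```

The 4x ratio between the two frame rates holds at any sample rate. Because every grid point in the quantizer is reachable by construction, there are no codebook entries that can go unused, which is the property the FSQ highlight relies on.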

Stats

  - Commits: 135 over 3.5 months
  - Tests: 240+ cases across 9 modules
  - Parameters: 11.7M generator, 41.3M discriminator
  - Training Data: 275,527 files (187.4 hours)
  - Evaluation: 62-file test set, 12+ metrics
  - Paper: In draft