We're super excited to open-source our Neural Audio Codec. If you're training or researching speech foundation models, our codec has a small, single codebook that compresses speech to just 0.8 kbps, 160x smaller than 128 kbps MP3 and 480x smaller than raw WAV. Links in comments!
These words won’t make sense to most of you, but I’m super excited that we’re open-sourcing our Neural Audio Codec - NeuCodec - this week! Audio codecs are a fundamental component of Speech Language Models, and NeuCodec is the codec we use for our production TTS models. It has a small, single codebook and compresses speech to just 0.8 kbps, 160x smaller than 128 kbps MP3 and 480x smaller than raw WAV. Moreover, with the FSQ quantisation scheme, NeuCodec has built-in bit-level error resistance, making it a great fit for telecoms and interference-prone communication tasks. We’re releasing it under the Apache 2.0 license, so it’s free for commercial use, too! For researchers and academics, we’re also open-sourcing the Emilia-YODAS dataset (the subset that does not use NC data) encoded with NeuCodec, meaning that all 1.7 TB of the dataset is compressed down to just 41 GB. We hope this will make it easier for those with fewer compute resources to work on large-scale datasets! Link in the comments. We have even more work done on NeuCodec that we can’t wait to share in the near future, so keep an eye out for that!
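
For anyone who wants to sanity-check the numbers above, here is a minimal sketch of the arithmetic in Python. The raw WAV bitrate is an assumption on my part (24 kHz, 16-bit, mono, i.e. 384 kbps), chosen because it is consistent with the 480x figure; the post itself does not state the WAV format. The helper name `compression_ratio` is purely illustrative and not part of NeuCodec.

```python
def compression_ratio(source_size: float, compressed_size: float) -> float:
    """Return how many times smaller the compressed form is than the source."""
    return source_size / compressed_size


CODEC_KBPS = 0.8   # NeuCodec bitrate quoted in the post
MP3_KBPS = 128.0   # MP3 bitrate quoted in the post

# Assumption (not stated in the post): raw WAV is 24 kHz, 16-bit, mono.
SAMPLE_RATE_HZ = 24_000
BITS_PER_SAMPLE = 16
CHANNELS = 1
RAW_WAV_KBPS = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000

mp3_ratio = compression_ratio(MP3_KBPS, CODEC_KBPS)
wav_ratio = compression_ratio(RAW_WAV_KBPS, CODEC_KBPS)

# Dataset sizes quoted in the post, in bytes (decimal units).
DATASET_BYTES = 1.7e12   # 1.7 TB
ENCODED_BYTES = 41e9     # 41 GB
dataset_ratio = compression_ratio(DATASET_BYTES, ENCODED_BYTES)

print(f"vs 128 kbps MP3: {mp3_ratio:.0f}x")
print(f"vs raw WAV (assumed 384 kbps): {wav_ratio:.0f}x")
print(f"dataset on disk: {dataset_ratio:.1f}x")
```

The dataset ratio works out to roughly 41x, well below the 480x raw-audio figure. The post does not say how the original 1.7 TB is stored, so the two ratios are not directly comparable.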