Search Results for author: Jan Skoglund

Found 13 papers, 8 papers with code

BINAQUAL: A Full-Reference Objective Localization Similarity Metric for Binaural Audio

1 code implementation • 17 May 2025 • Davoud Shariat Panah, Dan Barry, Alessandro Ragano, Jan Skoglund, Andrew Hines

Spatial audio enhances immersion in applications such as virtual reality, augmented reality, gaming, and cinema by creating a three-dimensional auditory experience.

Perceptual Audio Coding: A 40-Year Historical Perspective

no code implementations • 22 Apr 2025 • Jürgen Herre, Schuyler Quackenbush, Minje Kim, Jan Skoglund

In the history of audio and acoustic signal processing, perceptual audio coding has certainly excelled as a bright success story by its ubiquitous deployment in virtually all digital media devices, such as computers, tablets, mobile phones, set-top-boxes, and digital radios.

Neural Speech and Audio Coding: Modern AI Technology Meets Traditional Codecs

no code implementations • 13 Aug 2024 • Minje Kim, Jan Skoglund

This paper explores the integration of model-based and data-driven approaches within the realm of neural speech and audio coding systems.

Ultra-Low-Bitrate Speech Coding with Pretrained Transformers

no code implementations • 5 Jul 2022 • Ali Siahkoohi, Michael Chinen, Tom Denton, W. Bastiaan Kleijn, Jan Skoglund

Our numerical experiments show that supplementing the convolutional encoder of a neural speech codec with Transformer speech embeddings yields a speech codec with a bitrate of $600\,\mathrm{bps}$ that outperforms the original neural speech codec in synthesized speech quality when trained at the same bitrate.

Decoder, Inductive Bias

A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality

no code implementations • 5 Apr 2022 • Alessandro Ragano, Emmanouil Benetos, Michael Chinen, Helard B. Martinez, Chandan K. A. Reddy, Jan Skoglund, Andrew Hines

In this paper, we evaluate several MOS predictors based on wav2vec 2.0 and the NISQA speech quality prediction model to explore the role of the training data, the influence of the system type, and the role of cross-domain features in SSL models.

Benchmarking, Self-Supervised Learning

SoundStream: An End-to-End Neural Audio Codec

6 code implementations • 7 Jul 2021 • Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi

We present SoundStream, a novel neural audio codec that can efficiently compress speech, music and general audio at bitrates normally targeted by speech-tailored codecs.

Decoder, Speech Enhancement

Handling Background Noise in Neural Speech Generation

1 code implementation • 23 Feb 2021 • Tom Denton, Alejandro Luebs, Felicia S. C. Lim, Andrew Storus, Hengchin Yeh, W. Bastiaan Kleijn, Jan Skoglund

Recent advances in neural-network-based generative modeling of speech have shown great potential for speech coding.

Denoising, Speech Synthesis

WARP-Q: Quality Prediction For Generative Neural Speech Codecs

2 code implementations • 20 Feb 2021 • Wissam A. Jassim, Jan Skoglund, Michael Chinen, Andrew Hines

Good speech quality has been achieved using waveform matching and parametric reconstruction coders.

Dynamic Time Warping, Prediction

Generative Speech Coding with Predictive Variance Regularization

1 code implementation • 18 Feb 2021 • W. Bastiaan Kleijn, Andrew Storus, Michael Chinen, Tom Denton, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Hengchin Yeh

We introduce predictive-variance regularization to reduce the sensitivity to outliers, resulting in a significant increase in performance.

A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet

2 code implementations • 28 Mar 2019 • Jean-Marc Valin, Jan Skoglund

We demonstrate that LPCNet operating at 1.6 kb/s achieves significantly higher quality than MELP and that uncompressed LPCNet can exceed the quality of a waveform codec operating at low bitrate.

Speech Synthesis

LPCNet: Improving Neural Speech Synthesis Through Linear Prediction

2 code implementations • 28 Oct 2018 • Jean-Marc Valin, Jan Skoglund

We demonstrate that LPCNet can achieve significantly higher quality than WaveRNN for the same network size and that high quality LPCNet speech synthesis is achievable with a complexity under 3 GFLOPS.

Prediction, Speech Synthesis

Wavenet based low rate speech coding

1 code implementation • 1 Dec 2017 • W. Bastiaan Kleijn, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Florian Stimberg, Quan Wang, Thomas C. Walters

Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used.

Bandwidth Extension
