We propose Points2Sound, a multi-modal deep learning model which generates a binaural version from mono audio using 3D point cloud scenes.
This paper proposes a multi-modal deep learning model to perform music source separation conditioned on 3D point clouds of music performance recordings.
In particular, the presented approach uses a limited number of arbitrary discrete measurements of the magnitude of the sound field pressure in order to extrapolate this field to a higher-resolution grid of discrete points in space with a low computational complexity.
Most of the currently successful source separation techniques use the magnitude spectrogram as input, and are therefore by default omitting part of the signal: the phase.
Ranked #25 on Music Source Separation on MUSDB18