NAVVS is a volumetric dataset of naturalistic actions whose captured sound and visual appearance make it an open-access resource for immersive and interactive research in artificial 3D audio-visual environments, such as VR/AR/XR with six-degrees-of-freedom (6DoF) interaction. It comprises a variety of short volumetric sounding actions and provides a valuable resource for multimodal research and testing under realistic conditions. The dataset includes ten different actions chosen for both semantic and acoustic diversity. For each action, four 2-second takes are available, for a total of forty audio-visual clips.
The scenes were captured at the Centre for Vision, Speech & Signal Processing (CVSSP) of the University of Surrey (UK) using multiple cameras and multiple microphones. In addition to each clip's final volumetric textured instances and stereo audio mix, the following data are provided: the separate audio channel of each microphone, raw images from the 16 UHD cameras, binary masks, camera calibration data, a coarse visual hull reconstruction, and a volumetric stereo refinement.
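To illustrate how the collection could be enumerated programmatically, the sketch below builds the expected per-clip asset paths for all forty clips (ten actions, four takes each, sixteen cameras). The folder layout, file names, and action labels here are hypothetical placeholders, not the actual NAVVS release structure:

```python
from pathlib import Path

# Hypothetical layout: action names, folder structure, and file names
# are illustrative assumptions, not taken from the actual NAVVS release.
ACTIONS = [f"action{i:02d}" for i in range(1, 11)]  # 10 actions
TAKES = range(1, 5)                                  # 4 takes per action
N_CAMERAS = 16                                       # 16 UHD cameras

def clip_assets(root):
    """Enumerate the expected assets of every clip in the dataset."""
    root = Path(root)
    clips = []
    for action in ACTIONS:
        for take in TAKES:
            clip = root / action / f"take{take}"
            clips.append({
                "textured_mesh": clip / "volumetric" / "textured_mesh.obj",
                "stereo_mix": clip / "audio" / "stereo_mix.wav",
                "mic_channels": clip / "audio" / "channels",
                "raw_images": [clip / "images" / f"cam{c:02d}"
                               for c in range(N_CAMERAS)],
                "masks": clip / "masks",
                "calibration": clip / "calibration.json",
            })
    return clips

clips = clip_assets("NAVVS")
print(len(clips))  # 40 clips = 10 actions x 4 takes
```

A loader for the real data would follow the same pattern, with the paths replaced by those documented in the release.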