no code implementations • 10 Feb 2025 • Aviad Eisenberg, Sharon Gannot, Shlomo E. Chazan
This paper introduces a multi-microphone method for extracting a desired speaker from a mixture involving multiple speakers and directional noise in a reverberant environment.
1 code implementation • 16 Nov 2024 • Adi Cohen, Daniel Wong, Jung-Suk Lee, Sharon Gannot
This paper introduces an explainable DNN-based beamformer with a postfilter (ExNet-BF+PF) for multichannel signal processing.
no code implementations • 14 Sep 2024 • Ohad Cohen, Gershon Hazan, Sharon Gannot
This paper presents a Multi-modal Emotion Recognition (MER) system designed to enhance emotion recognition accuracy in challenging acoustic conditions.
no code implementations • 1 Jul 2024 • Daniel Levi, Amit Sofer, Sharon Gannot
Accurate and reliable identification of the relative transfer functions (RTFs) between microphones with respect to a desired source is an essential component in the design of microphone array beamformers, specifically when applying the minimum variance distortionless response (MVDR) criterion.
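The MVDR criterion mentioned above has a well-known closed form: given a steering vector (here, an RTF relative to a reference microphone) and a noise covariance matrix, the weights minimize output noise power subject to a distortionless constraint toward the source. A minimal NumPy sketch of that textbook formula, with illustrative names and toy data (not the paper's estimation method):

```python
import numpy as np

def mvdr_weights(d, Phi_n):
    """MVDR weights w = Phi_n^{-1} d / (d^H Phi_n^{-1} d) for an
    RTF steering vector d and Hermitian noise covariance Phi_n."""
    num = np.linalg.solve(Phi_n, d)   # Phi_n^{-1} d without an explicit inverse
    return num / (d.conj() @ num)     # normalize for the distortionless response

# Toy example: 4 microphones, identity noise covariance.
rng = np.random.default_rng(0)
d = rng.standard_normal(4) + 1j * rng.standard_normal(4)
d /= d[0]                             # an RTF is relative to a reference mic
w = mvdr_weights(d, np.eye(4, dtype=complex))

# The distortionless constraint w^H d = 1 holds by construction.
print(np.allclose(w.conj() @ d, 1.0))  # True
```

The abstract's point is that the quality of the estimated RTF `d` (and of `Phi_n`) determines how well this formula performs in practice.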
no code implementations • 1 Jul 2024 • Amit Eliav, Sharon Gannot
As this is the first work reporting CSD results on the challenging EasyCom dataset, the findings demonstrate the potential of the proposed multimodal approach for CSD in real-world scenarios.

no code implementations • 5 Jun 2024 • Jacob Bitterman, Daniel Levi, Hilel Hagai Diamandi, Sharon Gannot, Tal Rosenwein
This paper focuses on room fingerprinting, a task involving the analysis of an audio recording to determine the specific volume and shape of the room in which it was captured.
no code implementations • 5 Jun 2024 • Ohad Cohen, Gershon Hazan, Sharon Gannot
The performance of most emotion recognition systems degrades in real-life situations ('in the wild' scenarios) where the audio is contaminated by reverberation.
no code implementations • 7 May 2024 • Amit Eliav, Aaron Taub, Renana Opochinsky, Sharon Gannot
In this paper, we propose a model that can generate a singing voice from a normal speech utterance by harnessing zero-shot, many-to-many style transfer learning.
no code implementations • 11 Apr 2024 • Xavier Alameda-Pineda, Angus Addlesee, Daniel Hernández García, Chris Reinke, Soraya Arias, Federica Arrigoni, Alex Auternaud, Lauriane Blavette, Cigdem Beyan, Luis Gomez Camara, Ohad Cohen, Alessandro Conti, Sébastien Dacunha, Christian Dondrup, Yoav Ellinson, Francesco Ferro, Sharon Gannot, Florian Gras, Nancie Gunson, Radu Horaud, Moreno D'Incà, Imad Kimouche, Séverin Lemaignan, Oliver Lemon, Cyril Liotard, Luca Marchionni, Mordehay Moradi, Tomas Pajdla, Maribel Pino, Michal Polic, Matthieu Py, Ariel Rado, Bin Ren, Elisa Ricci, Anne-Sophie Rigaud, Paolo Rota, Marta Romeo, Nicu Sebe, Weronika Sieińska, Pinchas Tandeitnik, Francesco Tonini, Nicolas Turro, Timothée Wintz, Yanchao Yu
Despite the many recent achievements in developing and deploying social robotics, there are still many underexplored environments and applications for which systematic evaluation of such systems by end-users is necessary.
no code implementations • 11 Mar 2024 • Amit Eliav, Sharon Gannot
We present a deep-learning approach for the task of Concurrent Speaker Detection (CSD) using a modified transformer model.
no code implementations • 15 Jan 2024 • Daniel Fejgin, Elior Hadad, Sharon Gannot, Zbyněk Koldovský, Simon Doclo
According to how the SPS are combined, frequency fusion mechanisms are categorized into narrowband, broadband, or speaker-grouped, where the latter mechanism requires a speaker-wise grouping of frequencies.
no code implementations • 7 Jan 2024 • Renana Opochinsky, Mordehay Moradi, Sharon Gannot
Speech separation involves extracting an individual speaker's voice from a multi-speaker audio signal.
1 code implementation • 5 Jun 2023 • Yochai Yemini, Aviv Shamsian, Lior Bracha, Sharon Gannot, Ethan Fetaya
We then condition a diffusion model on the video and use the extracted text through a classifier-guidance mechanism where a pre-trained ASR serves as the classifier.
no code implementations • 1 Jan 2023 • Idan Cohen, Ofir Lindenbaum, Sharon Gannot
Classical methods for acoustic scene mapping require the estimation of time difference of arrival (TDOA) between microphones.
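The classical TDOA estimation referred to here is typically done with the generalized cross-correlation with phase transform (GCC-PHAT). A minimal sketch, assuming white-noise toy signals and an illustrative function name (not code from the paper):

```python
import numpy as np

def gcc_phat(x, y):
    """Estimate the delay (in samples) of y relative to x via GCC-PHAT."""
    n = 2 * len(x)                    # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    R = np.conj(X) * Y                # cross-power spectrum
    R /= np.abs(R) + 1e-12            # PHAT weighting: keep only the phase
    cc = np.fft.irfft(R, n)
    lag = int(np.argmax(np.abs(cc)))
    return lag - n if lag > n // 2 else lag   # map to a signed lag

rng = np.random.default_rng(1)
s = rng.standard_normal(2048)
y = np.roll(s, 5)                     # y lags s by 5 samples
print(gcc_phat(s, y))                 # 5
```

The abstract's contrast is that such pairwise TDOA estimates degrade in reverberation and noise, motivating the learned alternative the paper proposes.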
no code implementations • 6 Mar 2022 • Aviad Eisenberg, Sharon Gannot, Shlomo E. Chazan
In this paper, we present a unified time-frequency method for speaker extraction in clean and noisy conditions.
2 code implementations • 27 Apr 2021 • Diego Di Carlo, Pinchas Tandeitnik, Cédric Foy, Antoine Deleforge, Nancy Bertin, Sharon Gannot
This paper presents dEchorate: a new database of measured multichannel Room Impulse Responses (RIRs) including annotations of early echo timings and 3D positions of microphones, real sources and image sources under different wall configurations in a cuboid room.
no code implementations • 11 Feb 2021 • Shlomo E. Chazan, Jacob Goldberger, Sharon Gannot
The experts estimate a mask from the noisy input and the final mask is then obtained as a weighted average of the experts' estimates, with the weights determined by the gating DNN.
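The combination step described above is a weighted average of per-expert masks, with weights produced by the gating network. A minimal sketch with tiny arrays standing in for the DNN outputs (names and shapes are illustrative, not from the paper):

```python
import numpy as np

def combine_masks(expert_masks, gate_logits):
    """Weighted average of expert masks; weights are a softmax over gate logits."""
    w = np.exp(gate_logits - gate_logits.max())
    w /= w.sum()                                  # softmax gating weights
    # expert_masks: (n_experts, freq, time); contract over the expert axis
    return np.tensordot(w, expert_masks, axes=1)

# Two toy experts emitting constant masks over a 3x4 time-frequency grid.
masks = np.stack([np.full((3, 4), 0.2), np.full((3, 4), 0.8)])
final = combine_masks(masks, np.array([0.0, 0.0]))  # equal gating weights
print(final[0, 0])                                  # 0.5
```

With equal logits the gate weights each expert 0.5, so the final mask is the plain average of the two; in the actual system the gating DNN makes these weights input-dependent.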
no code implementations • 26 Jan 2021 • Michael J. Bianco, Sharon Gannot, Efren Fernandez-Grande, Peter Gerstoft
As far as we are aware, our paper presents the first approach to modeling the physics of acoustic propagation using deep generative modeling.
no code implementations • 1 Jan 2021 • Hodaya Hammer, Shlomo Chazan, Jacob Goldberger, Sharon Gannot
In this study we present a deep neural network-based online multi-speaker localisation algorithm based on a multi-microphone array.
no code implementations • 22 Oct 2020 • Yochai Yemini, Ethan Fetaya, Haggai Maron, Sharon Gannot
We use noisy and noiseless versions of a simulated reverberant dataset to test the proposed architecture.
1 code implementation • 26 Aug 2020 • Hodaya Hammer, Shlomo E. Chazan, Jacob Goldberger, Sharon Gannot
In this paper, we present a deep neural network-based online multi-speaker localisation algorithm.
no code implementations • 27 May 2020 • Michael J. Bianco, Sharon Gannot, Peter Gerstoft
We propose a semi-supervised localization approach based on deep generative modeling with variational autoencoders (VAEs).
no code implementations • 11 May 2019 • Michael J. Bianco, Peter Gerstoft, James Traer, Emma Ozanich, Marie A. Roch, Sharon Gannot, Charles-Alban Deledalle
Acoustic data provide scientific and engineering insights in fields ranging from biology and communications to ocean and Earth science.
no code implementations • 10 May 2019 • Nico Gößling, Elior Hadad, Sharon Gannot, Simon Doclo
While the binaural minimum variance distortionless response (BMVDR) beamformer provides good noise reduction performance and preserves the binaural cues of the desired source, it does not allow controlling the reduction of the interfering sources, and it distorts the binaural cues of both the interfering sources and the background noise.
no code implementations • 16 Dec 2018 • Shlomo E. Chazan, Sharon Gannot, Jacob Goldberger
The optimal clustering is found by minimizing the reconstruction loss of the mixture-of-autoencoders network.