no code implementations • 23 Sep 2024 • Bandhav Veluri, Benjamin N Peloquin, Bokai Yu, Hongyu Gong, Shyamnath Gollakota
Despite broad interest in modeling spoken dialogue agents, most approaches are inherently "half-duplex" -- restricted to turn-based interaction with responses requiring explicit prompting by the user or implicit tracking of interruption or silence events.
no code implementations • 25 Jul 2024 • Maruchi Kim, Antonio Glenn, Bandhav Veluri, Yunseo Lee, Eyoel Gebre, Aditya Bagaria, Shwetak Patel, Shyamnath Gollakota
Integrating cameras into wireless smart rings has been challenging due to size and power constraints.
no code implementations • 30 May 2024 • Hongyu Gong, Bandhav Veluri
Expressive speech-to-speech translation (S2ST) is a key research topic in seamless communication, which focuses on the preservation of semantics and speaker vocal style in translated speech.
1 code implementation • 10 May 2024 • Bandhav Veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota
We present the first enrollment interface where the wearer looks at the target speaker for a few seconds to capture a single, short, highly noisy, binaural example of the target speaker.
1 code implementation • 1 Nov 2023 • Bandhav Veluri, Malek Itani, Justin Chan, Takuya Yoshioka, Shyamnath Gollakota
To achieve this, we make two technical contributions: 1) we present the first neural network that can achieve binaural target sound extraction in the presence of interfering sounds and background noise, and 2) we design a training methodology that allows our system to generalize to real-world use.
1 code implementation • 4 Nov 2022 • Bandhav Veluri, Justin Chan, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota
We present the first neural network model to achieve real-time and streaming target sound extraction.
Ranked #1 on
Streaming Target Sound Extraction
on FSDSoundScapes
1 code implementation • 25 Jul 2022 • Bandhav Veluri, Collin Pernu, Ali Saffari, Joshua Smith, Michael Taylor, Shyamnath Gollakota
Our idea is to design a dual-mode camera system where the first mode is low-power (1. 1 mW) but only outputs grey-scale, low resolution, and noisy video and the second mode consumes much higher power (100 mW) but outputs color and higher resolution images.
Colorization
Key-Frame-based Video Super-Resolution (K = 15)