Search Results for author: Nabarun Goswami

Found 5 papers, 2 papers with code

HyperVQ: MLR-based Vector Quantization in Hyperbolic Space

No code implementations • 18 Mar 2024 • Nabarun Goswami, Yusuke Mukuta, Tatsuya Harada

However, since the VQVAE is trained with a reconstruction objective, there is no constraint for the embeddings to be well disentangled, a crucial aspect for using them in discriminative tasks.

Quantization • Representation Learning

Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation

No code implementations • 18 Jan 2024 • Kohei Uehara, Nabarun Goswami, Hanqin Wang, Toshiaki Baba, Kohtaro Tanaka, Tomohiro Hashimoto, Kai Wang, Rei Ito, Takagi Naoya, Ryo Umagami, Yingyi Wen, Tanachai Anakewat, Tatsuya Harada

The increasing demand for intelligent systems capable of interpreting and reasoning about visual content requires the development of Large Multi-Modal Models (LMMs) that are not only accurate but also have explicit reasoning capabilities.

Language Modelling • Large Language Model • +2

SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate

No code implementations • 13 Jul 2022 • Nabarun Goswami, Tatsuya Harada

The mapping of text to speech (TTS) is non-deterministic: letters may be pronounced differently depending on context, and phonemes can vary with physiological and stylistic factors such as gender, age, accent, and emotion.

Speech Separation

MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation

1 code implementation • 7 May 2018 • Naoya Takahashi, Nabarun Goswami, Yuki Mitsufuji

Deep neural networks have become an indispensable technique for audio source separation (ASS).

Ranked #17 on Music Source Separation on MUSDB18 (using extra training data)

Music Source Separation • Sound • Audio and Speech Processing
