Search Results for author: Nabarun Goswami

Found 5 papers, 2 papers with code

HyperVQ: MLR-based Vector Quantization in Hyperbolic Space

no code implementations • 18 Mar 2024 • Nabarun Goswami, Yusuke Mukuta, Tatsuya Harada

However, since the VQVAE is trained with a reconstruction objective, there is no constraint for the embeddings to be well disentangled, a crucial aspect for using them in discriminative tasks.

Quantization, Representation Learning


Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation

no code implementations • 18 Jan 2024 • Kohei Uehara, Nabarun Goswami, Hanqin Wang, Toshiaki Baba, Kohtaro Tanaka, Tomohiro Hashimoto, Kai Wang, Rei Ito, Takagi Naoya, Ryo Umagami, Yingyi Wen, Tanachai Anakewat, Tatsuya Harada

The increasing demand for intelligent systems capable of interpreting and reasoning about visual content requires the development of Large Multi-Modal Models (LMMs) that are not only accurate but also have explicit reasoning capabilities.

Language Modelling, Large Language Model


The Sound Demixing Challenge 2023 – Music Demixing Track

2 code implementations • 14 Aug 2023 • Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, WeiHsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, Jiafeng Liu, Yuki Mitsufuji

We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding.

Music Source Separation


SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate

no code implementations • 13 Jul 2022 • Nabarun Goswami, Tatsuya Harada

The mapping of text to speech (TTS) is non-deterministic: letters may be pronounced differently based on context, and phonemes can vary depending on physiological and stylistic factors such as gender, age, accent, and emotion.

Speech Separation


MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation

1 code implementation • 7 May 2018 • Naoya Takahashi, Nabarun Goswami, Yuki Mitsufuji

Deep neural networks have become an indispensable technique for audio source separation (ASS).

Ranked #17 on Music Source Separation on MUSDB18 (using extra training data)

Music Source Separation, Sound, Audio and Speech Processing

