28 Nov 2023 • Kazuki Yamauchi, Yusuke Ijima, Yuki Saito
The experimental results demonstrate that our StyleCap improves the accuracy and diversity of generated speaking-style captions when it leverages richer LLMs for the text decoder, speech self-supervised learning (SSL) features, and sentence rephrasing augmentation.