22 Feb 2024 • Nathaniel Weir, Kate Sanders, Orion Weller, Shreya Sharma, Dongwei Jiang, Zhengping Zhang, Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Jansen, Peter Clark, Benjamin Van Durme
Contemporary language models enable new opportunities for structured reasoning with text, such as the construction and evaluation of intuitive, proof-like textual entailment trees without relying on brittle formal logic.
Self-supervised visual pretraining has shown significant progress recently.
Audio Visual Scene-aware Dialog (AVSD) is a task to generate responses in a dialog about a given video.
This paper introduces a new open-sourced Mandarin speech corpus, called DiDiSpeech.
In this paper, we conduct a further study of MPC and focus on three important aspects: the effect of the speaking style of the pre-training data, its extension to streaming models, and how to better transfer knowledge learned in the pre-training stage to downstream tasks.
Speech recognition technologies are gaining enormous popularity in various industrial applications.
Code-switching speech recognition has attracted increasing interest recently, but the need for expert linguistic knowledge has always been a major issue.
We find that all types of modeling units achieve comparable character error rates (CER) in the CTC model, and that the Chinese character attention model outperforms the syllable attention model.
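The snippet compares models by character error rate. Here is a minimal sketch of how CER is commonly computed: the character-level Levenshtein distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the reference length. The function names `edit_distance` and `cer` are hypothetical. This is not the paper's evaluation script, and real toolkits often normalize text (for example, stripping spaces or punctuation) before scoring.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Minimum substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete ref character r
                curr[j - 1] + 1,         # insert hyp character h
                prev[j - 1] + (r != h),  # substitute (or match at no cost)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substitution (气 -> 汽) and one deletion (很) over 6 reference characters.
    print(cer("今天天气很好", "今天天汽好"))
```

Because Python strings iterate by code point, each Chinese character counts as one unit. That is why CER, rather than word error rate, is the usual metric for Mandarin.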