1 code implementation • 27 Jan 2025 • Heting Gao, Hang Shao, Xiong Wang, Chaofan Qiu, Yunhang Shen, Siqi Cai, Yuchen Shi, Zihan Xu, Zuwei Long, Yike Zhang, Shaoqi Dong, Chaoyou Fu, Ke Li, Long Ma, Xing Sun
The film Her features Samantha, a sophisticated AI audio agent who is capable of understanding both linguistic and paralinguistic information in human speech and delivering real-time responses that are natural, informative and sensitive to emotional subtleties.
1 code implementation • 3 Jan 2025 • Chaoyou Fu, Haojia Lin, Xiong Wang, Yi-Fan Zhang, Yunhang Shen, Xiaoyu Liu, Haoyu Cao, Zuwei Long, Heting Gao, Ke Li, Long Ma, Xiawu Zheng, Rongrong Ji, Xing Sun, Caifeng Shan, Ran He
Recent Multimodal Large Language Models (MLLMs) have typically focused on integrating visual and textual modalities, with less emphasis placed on the role of speech in enhancing interaction.
no code implementations • 9 Sep 2024 • Yi Li, Heting Gao, Mingde He, Jinqian Liang, Jason Gu, Wei Liu
In scoliosis surgery, the limited field of view of the C-arm X-ray machine restricts the surgeons' holistic analysis of spinal structures . This paper presents an end-to-end efficient and robust intraoperative X-ray image stitching method for scoliosis surgery, named SX-Stitch.
1 code implementation • 12 Jun 2024 • Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo
Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 16 Aug 2023 • Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo
Text-to-Text Transfer Transformer (T5) has recently been considered for the Grapheme-to-Phoneme (G2P) transduction.
1 code implementation • 20 Apr 2022 • Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David Cox, Mark Hasegawa-Johnson, Shiyu Chang
Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks.
1 code implementation • 29 Mar 2022 • Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson
We show that WavPrompt is a few-shot learner that can perform speech understanding tasks better than a naive text baseline.
1 code implementation • 29 Mar 2022 • Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson
An unsupervised text-to-speech synthesis (TTS) system learns to generate speech waveforms corresponding to any written sentence in a language by observing: 1) a collection of untranscribed speech waveforms in that language; 2) a collection of texts written in that language without access to any transcribed speech.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+5
no code implementations • 4 Dec 2019 • Yunan Zhang, Xiang Cheng, Heting Gao, ChengXiang Zhai
We model the question answering on KG as a cooperative task between two agents, a knowledge graph reasoning agent and an information extraction agent.
no code implementations • 28 Nov 2019 • Yunan Zhang, Heting Gao, Tarek Abdelzaher
As the ongoing rapid urbanization takes place with an ever-increasing speed, fully modeling urban dynamics becomes more and more challenging, but also a necessity for socioeconomic development.
no code implementations • 11 May 2019 • Suhansanu Kumar, Heting Gao, Changyu Wang, Hari Sundaram, Kevin Chen-Chuan Chang
When the property of the target entities is not directly queryable via the API, we refer to the property as `hidden' and the population as a hidden population.