Search Results for author: Heting Gao

Found 11 papers, 6 papers with code

LUCY: Linguistic Understanding and Control Yielding Early Stage of Her

1 code implementation • 27 Jan 2025 • Heting Gao, Hang Shao, Xiong Wang, Chaofan Qiu, Yunhang Shen, Siqi Cai, Yuchen Shi, Zihan Xu, Zuwei Long, Yike Zhang, Shaoqi Dong, Chaoyou Fu, Ke Li, Long Ma, Xing Sun

The film Her features Samantha, a sophisticated AI audio agent who is capable of understanding both linguistic and paralinguistic information in human speech and delivering real-time responses that are natural, informative and sensitive to emotional subtleties.

Question Answering

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

1 code implementation • 3 Jan 2025 • Chaoyou Fu, Haojia Lin, Xiong Wang, Yi-Fan Zhang, Yunhang Shen, Xiaoyu Liu, Haoyu Cao, Zuwei Long, Heting Gao, Ke Li, Long Ma, Xiawu Zheng, Rongrong Ji, Xing Sun, Caifeng Shan, Ran He

Recent Multimodal Large Language Models (MLLMs) have typically focused on integrating visual and textual modalities, with less emphasis placed on the role of speech in enhancing interaction.

SX-Stitch: An Efficient VMS-UNet Based Framework for Intraoperative Scoliosis X-Ray Image Stitching

no code implementations • 9 Sep 2024 • Yi Li, Heting Gao, Mingde He, Jinqian Liang, Jason Gu, Wei Liu

In scoliosis surgery, the limited field of view of the C-arm X-ray machine restricts the surgeons' holistic analysis of spinal structures. This paper presents an efficient and robust end-to-end intraoperative X-ray image stitching method for scoliosis surgery, named SX-Stitch.

Image Segmentation, Image Stitching +4

Towards Unsupervised Speech Recognition Without Pronunciation Models

1 code implementation • 12 Jun 2024 • Junrui Ni, Liming Wang, Yang Zhang, Kaizhi Qian, Heting Gao, Mark Hasegawa-Johnson, Chang D. Yoo

Recent advancements in supervised automatic speech recognition (ASR) have achieved remarkable performance, largely due to the growing availability of large transcribed speech corpora.

Automatic Speech Recognition (ASR) +2

ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

1 code implementation • 20 Apr 2022 • Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David Cox, Mark Hasegawa-Johnson, Shiyu Chang

Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks.

Disentanglement, Self-Supervised Learning

Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition

1 code implementation • 29 Mar 2022 • Junrui Ni, Liming Wang, Heting Gao, Kaizhi Qian, Yang Zhang, Shiyu Chang, Mark Hasegawa-Johnson

An unsupervised text-to-speech synthesis (TTS) system learns to generate speech waveforms corresponding to any written sentence in a language by observing: 1) a collection of untranscribed speech waveforms in that language; 2) a collection of texts written in that language without access to any transcribed speech.

Automatic Speech Recognition (ASR) +5

Cooperative Reasoning on Knowledge Graph and Corpus: A Multi-agent Reinforcement Learning Approach

no code implementations • 4 Dec 2019 • Yunan Zhang, Xiang Cheng, Heting Gao, ChengXiang Zhai

We model question answering on a knowledge graph (KG) as a cooperative task between two agents: a knowledge graph reasoning agent and an information extraction agent.

Question Answering

Macross: Urban Dynamics Modeling based on Metapath Guided Cross-Modal Embedding

no code implementations • 28 Nov 2019 • Yunan Zhang, Heting Gao, Tarek Abdelzaher

As rapid urbanization proceeds at an ever-increasing pace, fully modeling urban dynamics becomes more challenging, yet also more necessary for socioeconomic development.

Mining Hidden Populations through Attributed Search

no code implementations • 11 May 2019 • Suhansanu Kumar, Heting Gao, Changyu Wang, Hari Sundaram, Kevin Chen-Chuan Chang

When the property of the target entities is not directly queryable via the API, we refer to the property as 'hidden' and the population as a hidden population.
