no code implementations • ACL (ECNLP) 2021 • Jingxiang Chen, Kai Wei, Xiang Hao
With the rapid growth of online video streaming, recent years have seen increasing concern about profane language in streamed content.
no code implementations • 13 Dec 2024 • Seon-Ho Lee, Jue Wang, David Fan, Zhikang Zhang, Linda Liu, Xiang Hao, Vimal Bhat, Xinyu Li
Audio Description (AD) plays a pivotal role in making multimedia content accessible: it inserts additional narration at suitable intervals to describe visual elements, catering specifically to visually impaired audiences.
no code implementations • 10 Dec 2024 • Yicheng Wang, Zhikang Zhang, Jue Wang, David Fan, Zhenlin Xu, Linda Liu, Xiang Hao, Vimal Bhat, Xinyu Li
In various video-language learning tasks, the challenge of achieving cross-modality alignment with multi-grained data persists.
1 code implementation • 7 Oct 2024 • Xiang Hao, Chenxiang Ma, Qu Yang, Jibin Wu, Kay Chen Tan
In recent years, deep learning-based methods have significantly improved speech enhancement performance, but they often come with a high computational cost, which is prohibitive for a large number of edge devices, such as headsets and hearing aids.
no code implementations • 4 Feb 2024 • Jincao Yao, Yunpeng Wang, Zhikai Lei, Kai Wang, Xiaoxian Li, Jianhua Zhou, Xiang Hao, Jiafei Shen, Zhenping Wang, Rongrong Ru, Yaqing Chen, Yahan Zhou, Chen Chen, YanMing Zhang, Ping Liang, Dong Xu
After training, ThyGPT could automatically evaluate thyroid nodules and communicate effectively with physicians through human-computer interaction.
1 code implementation • 11 Oct 2023 • Xiang Hao, Jibin Wu, Jianwei Yu, Chenglin Xu, Kay Chen Tan
We demonstrate that textual descriptions alone can effectively serve as cues for extraction, thus addressing privacy concerns and reducing dependency on voiceprints.
no code implementations • 16 May 2023 • Di Xu, Yang Zhao, Xiang Hao, Xin Meng
We introduce a novel dataset consisting of images depicting pink eggs that have been identified as Pomacea canaliculata eggs, accompanied by corresponding bounding box annotations.
no code implementations • 14 Mar 2023 • Mingshuai Liu, Shubo Lv, Zihan Zhang, Runduo Han, Xiang Hao, Xianjun Xia, Li Chen, Yijian Xiao, Lei Xie
Achieving 0.446 in the final score and 0.517 in the P.835 score, our system ranks 4th in the non-real-time track.
1 code implementation • 18 Dec 2022 • Xiang Hao, Xiaofei Li
FullSubNet is our recently proposed real-time single-channel speech enhancement network that achieves outstanding performance on the Deep Noise Suppression (DNS) Challenge dataset.
no code implementations • Submitted to ICLR 2022 • Wentao Zhu, Jingru Yi, Xiaohang Sun, Xiang Hao, Linda Liu, Mohamed Omar
In this work, we develop a multiscale multimodal Transformer (MMT) that employs hierarchical representation learning.
Ranked #1 on Multi-modal Classification on VGG-Sound
no code implementations • Submitted to ICLR 2022 • Wentao Zhu, Jingru Yi, Kevin Hsu, Xiaohang Sun, Xiang Hao, Linda Liu, Mohamed Omar
AVT combines video and audio signals to improve action recognition accuracy, leveraging the effective spatio-temporal representations learned by the video Transformer.
Ranked #4 on Multi-modal Classification on VGG-Sound
no code implementations • 16 Jun 2022 • Xiang Hao, Jingxiang Chen, Shixing Chen, Ahmed Saad, Raffay Hamid
To help customers make better-informed viewing choices, video-streaming services try to moderate their content and provide more visibility into which portions of their movies and TV episodes contain age-appropriate material (e.g., nudity, sex, violence, or drug-use).
no code implementations • 30 Mar 2022 • Zhenhao Jin, Xiang Hao, Xiangdong Su
This paper formulates speech separation with an unknown number of speakers as a multi-pass source extraction problem and proposes a coarse-to-fine recursive speech separation method.
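The multi-pass idea can be illustrated with a small loop: extract one speaker from the residual, subtract it, and repeat until the residual is nearly silent, so the speaker count never has to be known in advance. This is a minimal sketch under stated assumptions: `extract_one` is a hypothetical single-speaker extractor (an oracle stands in for a trained network in the demo), the energy-ratio stopping rule and `max_speakers` cap are illustrative, and the paper's coarse-to-fine refinement stage is not modeled.

```python
from typing import Callable, List

Signal = List[float]


def energy(x: Signal) -> float:
    """Mean power of a signal (0.0 for an empty signal)."""
    return sum(v * v for v in x) / max(len(x), 1)


def recursive_separation(
    mixture: Signal,
    extract_one: Callable[[Signal], Signal],
    stop_ratio: float = 1e-3,
    max_speakers: int = 10,
) -> List[Signal]:
    """Extract one source per pass until the residual is (nearly) silent.

    extract_one is a hypothetical single-speaker extractor; stop_ratio and
    max_speakers are illustrative stopping rules, not values from the paper.
    """
    residual = list(mixture)
    ref = energy(mixture)
    sources: List[Signal] = []
    while len(sources) < max_speakers and energy(residual) > stop_ratio * ref:
        est = extract_one(residual)
        sources.append(est)
        # Subtract the estimate so the next pass sees one speaker fewer.
        residual = [r - e for r, e in zip(residual, est)]
    return sources


# Toy demo: an "oracle" extractor that returns the loudest remaining true
# source stands in for a trained extraction network (it ignores its input).
true_sources = [[3.0, -3.0, 3.0, -3.0], [0.0, 2.0, 0.0, -2.0], [1.0, 1.0, 1.0, 1.0]]
mix = [sum(col) for col in zip(*true_sources)]
remaining = [list(s) for s in true_sources]


def oracle_extract(residual: Signal) -> Signal:
    best = max(remaining, key=energy)
    remaining.remove(best)
    return best


found = recursive_separation(mix, oracle_extract)
print(len(found))  # 3
```

The loop stops on its own after the third pass because the residual becomes exactly zero; with a real extractor the threshold would have to tolerate estimation error.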
no code implementations • CVPR 2023 • Shixing Chen, Chun-Hao Liu, Xiang Hao, Xiaohan Nie, Maxim Arap, Raffay Hamid
However, labeling individual scenes is a time-consuming process.
no code implementations • 17 Dec 2020 • Hongya Song, Yaoguang Ma, Yubing Han, Weidong Shen, Wenyi Zhang, Yanghui Li, Xu Liu, Yifan Peng, Xiang Hao
Computational spectroscopic instruments with Broadband Encoding Stochastic (BEST) filters allow the reconstruction of the spectrum at high precision with only a few filters.
Instrumentation and Detectors
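Reconstructing a spectrum from a few filter readings amounts to solving y = A s, where each row of A is one filter's transmission curve. The sketch below uses generic Tikhonov-regularized least-norm inversion, s = A^T (A A^T + λI)^{-1} y, not the reconstruction algorithm of the BEST paper; the filter curves, the spectrum, and λ are made-up illustration values. With fewer filters than wavelength bins it returns the smallest-norm spectrum consistent with the readings, so recovering the true spectrum would require additional prior knowledge.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(M: Matrix, b: Vector) -> Vector:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def reconstruct_spectrum(A: Matrix, y: Vector, lam: float = 1e-6) -> Vector:
    """Regularized least-norm estimate s = A^T (A A^T + lam*I)^{-1} y.

    A[i][j] is filter i's transmission at wavelength bin j (m filters, n bins).
    """
    m, n = len(A), len(A[0])
    gram = [
        [sum(A[i][k] * A[j][k] for k in range(n)) + (lam if i == j else 0.0) for j in range(m)]
        for i in range(m)
    ]
    z = solve(gram, y)
    return [sum(A[i][j] * z[i] for i in range(m)) for j in range(n)]


# Hypothetical example: 3 broadband filters sampling a 4-bin spectrum.
A = [[1.0, 0.5, 0.2, 0.1], [0.1, 0.4, 0.9, 0.3], [0.3, 0.2, 0.1, 1.0]]
s_true = [1.0, 2.0, 3.0, 4.0]
y = [sum(a * s for a, s in zip(row, s_true)) for row in A]  # simulated readings
s_hat = reconstruct_spectrum(A, y)
```

Re-applying the filters to `s_hat` reproduces the three readings almost exactly, even though `s_hat` itself generally differs from `s_true`.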
6 code implementations • 29 Oct 2020 • Xiang Hao, Xiangdong Su, Radu Horaud, Xiaofei Li
In our proposed FullSubNet, we connect a pure full-band model and a pure sub-band model sequentially and use practical joint training to integrate these two types of models' advantages.
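A minimal sketch of the glue between the two models, assuming (as an illustration, not from this listing) that for each frequency bin the sub-band model sees that bin's noisy magnitude plus N neighbours on each side, concatenated with the full-band model's output at the same bin. The neighbour count, the reflect padding at the spectrum edges, the helper names, and the use of a single STFT frame instead of a time sequence are all assumptions.

```python
from typing import List


def reflect_index(i: int, n: int) -> int:
    """Mirror an out-of-range bin index back into [0, n), like reflect padding."""
    if n == 1:
        return 0
    while i < 0 or i >= n:
        if i < 0:
            i = -i
        if i >= n:
            i = 2 * (n - 1) - i
    return i


def subband_inputs(
    noisy_mag: List[float], fullband_out: List[float], n_neighbors: int = 2
) -> List[List[float]]:
    """Build one sub-band feature vector per frequency bin f:
    noisy magnitudes of bins f-N..f+N followed by the full-band output at f."""
    n_freq = len(noisy_mag)
    assert len(fullband_out) == n_freq
    features = []
    for f in range(n_freq):
        window = [
            noisy_mag[reflect_index(f + k, n_freq)]
            for k in range(-n_neighbors, n_neighbors + 1)
        ]
        features.append(window + [fullband_out[f]])
    return features


# One frame with 5 bins; fb stands in for the full-band model's per-bin output.
noisy = [0.1, 0.4, 0.9, 0.3, 0.2]
fb = [0.5, 0.6, 0.7, 0.8, 0.9]
feats = subband_inputs(noisy, fb, n_neighbors=1)
print(feats[0])  # [0.4, 0.1, 0.4, 0.5]
```

Because every bin yields a vector of the same length, a single sub-band network with shared weights can process all frequencies, while the appended full-band value is meant to carry global spectral context into each local decision.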
no code implementations • 29 Oct 2020 • Xiang Hao, Xiangdong Su, Zhiyu Wang, HUI ZHANG, Batushiren
This approach consists of a generator network and a discriminator network, which operate directly in the time domain.
no code implementations • 11 Jun 2020 • Huali Xu, Xiangdong Su, Meng Wang, Xiang Hao, Guanglai Gao
The mask shrinking strategy is employed in the image completion model to track the areas to be repaired.
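The snippet does not say how the mask shrinks, so the sketch below assumes one common scheme: at each step, fill only the missing pixels that border known pixels, then mark them as known, so the hole shrinks from its boundary inward. The neighbour-mean fill is a hypothetical placeholder for the image completion model, and all names are illustrative.

```python
from typing import List, Tuple

Grid = List[List[float]]
Mask = List[List[bool]]  # True = pixel still missing


def shrink_step(img: Grid, mask: Mask) -> Tuple[Grid, Mask]:
    """Fill every missing pixel that has a known 4-neighbour, shrinking the mask.

    The neighbour mean is a placeholder for the image completion model.
    """
    h, w = len(img), len(img[0])
    new_img = [row[:] for row in img]
    new_mask = [row[:] for row in mask]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            known = [
                img[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]
            ]
            if known:
                new_img[y][x] = sum(known) / len(known)
                new_mask[y][x] = False
    return new_img, new_mask


def progressive_fill(img: Grid, mask: Mask, max_steps: int = 100) -> Tuple[Grid, Mask, int]:
    """Repeat shrink_step until the mask is empty or stops shrinking."""
    steps = 0
    while any(any(row) for row in mask) and steps < max_steps:
        img, new_mask = shrink_step(img, mask)
        steps += 1
        if new_mask == mask:  # no known pixel borders the hole: give up
            break
        mask = new_mask
    return img, mask, steps


# 5x5 image of ones with a 3x3 hole in the middle (missing values set to 0).
img = [[0.0 if 1 <= y <= 3 and 1 <= x <= 3 else 1.0 for x in range(5)] for y in range(5)]
mask = [[1 <= y <= 3 and 1 <= x <= 3 for x in range(5)] for y in range(5)]
filled, final_mask, steps = progressive_fill(img, mask)
print(steps)  # 2: the ring of 8 pixels first, then the centre
```

Tracking the mask explicitly is what lets each pass condition only on pixels that are already known or already repaired.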
no code implementations • 29 May 2020 • Xiang Hao, Xiangdong Su, Zhiyu Wang, Qiang Zhang, Huali Xu, Guanglai Gao
Specifically, this method consists of multiple teacher models and a student model.
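One generic way to train a student from several teachers is to mix a supervised loss against the clean target with a weighted average of losses against each teacher's output. The sketch below shows that generic multi-teacher distillation objective; the paper's actual loss is not given in this listing, so the MSE distance, `alpha`, and the equal default weights are assumptions.

```python
from typing import List, Optional, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_teacher_loss(
    student: Sequence[float],
    teachers: List[Sequence[float]],
    target: Sequence[float],
    alpha: float = 0.5,
    weights: Optional[List[float]] = None,
) -> float:
    """alpha * (student vs. target) + (1 - alpha) * weighted (student vs. each teacher)."""
    if weights is None:
        weights = [1.0 / len(teachers)] * len(teachers)  # equal trust in every teacher
    distill = sum(w * mse(student, t) for w, t in zip(weights, teachers))
    return alpha * mse(student, target) + (1 - alpha) * distill


student_out = [1.0, 1.0]
teacher_outs = [[1.0, 1.0], [3.0, 1.0]]
clean_target = [1.0, 1.0]
loss = multi_teacher_loss(student_out, teacher_outs, clean_target)
print(loss)  # 0.5
```

Here the student already matches the target and the first teacher, so the whole loss comes from its disagreement with the second teacher: 0.5 × (0.5 × 2.0) = 0.5.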
no code implementations • 29 May 2020 • Xiang Hao, Shixue Wen, Xiangdong Su, Yun Liu, Guanglai Gao, Xiaofei Li
In single-channel speech enhancement, methods based on full-band spectral features have been widely studied.