no code implementations • 20 Aug 2024 • Tianyi Xu, Kaixun Huang, Pengcheng Guo, Yu Zhou, Longtao Huang, Hui Xue, Lei Xie
Pre-trained multilingual speech foundation models, like Whisper, have shown impressive performance across different languages.
1 code implementation • 16 Aug 2024 • Tianyi Xu, Yiji Zhou, Xiaotao Hu, Kai Zhang, Anran Zhang, Xingye Qiu, Jun Xu
The TARC predicts the inference paths within the feature extraction backbone, specifically selecting MSTBs based on the input images and SR scales.
no code implementations • 3 May 2024 • Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie
Building upon this momentum, our research examines this paradigm in depth on a large open-source Chinese dataset.
Automatic Speech Recognition (ASR)
1 code implementation • 15 Nov 2023 • Yijie Zhou, Chao Li, Jin Liang, Tianyi Xu, Xin Liu, Jun Xu
The illumination of improperly exposed photographs has been widely corrected using deep convolutional neural networks or Transformers.
no code implementations • 7 Oct 2023 • Kaixun Huang, Ao Zhang, BinBin Zhang, Tianyi Xu, Xingchen Song, Lei Xie
However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control the degree of bias.
Automatic Speech Recognition (ASR)
1 code implementation • 8 Jul 2023 • Liqi Xue, Tianyi Xu, Yongbao Song, Yan Liu, Lei Zhang, XianTong Zhen, Jun Xu
However, most media images on the internet remain in 8-bit standard dynamic range (SDR) format.
no code implementations • 23 Jun 2023 • Yunian Pan, Tao Li, Henger Li, Tianyi Xu, Zizhan Zheng, Quanyan Zhu
Previous research has shown that federated learning (FL) systems are exposed to an array of security risks.
no code implementations • 6 Jun 2023 • Jianrong Wang, Yaxin Zhao, Li Liu, Tianyi Xu, Qi Li, Sen Li
Given an audio clip and a reference face image, the goal of talking head generation is to produce a high-fidelity talking head video.
1 code implementation • 4 Jun 2023 • Jianrong Wang, Yuchen Huo, Li Liu, Tianyi Xu, Qi Li, Sen Li
Audio-visual speech recognition (AVSR) has gained increasing attention from researchers as an important part of human-computer interaction.
no code implementations • 1 Jun 2023 • Tianyi Xu, Zhanheng Yang, Kaixun Huang, Pengcheng Guo, Ao Zhang, Biao Li, Changru Chen, Chao Li, Lei Xie
By incorporating additional contextual information, deep biasing methods have emerged as a promising solution for speech recognition of personalized words.
no code implementations • 21 May 2023 • Kaixun Huang, Ao Zhang, Zhanheng Yang, Pengcheng Guo, Bingshen Mu, Tianyi Xu, Lei Xie
In this study, we introduce a contextual phrase prediction network for an attention-based deep biasing method.
no code implementations • 27 Dec 2022 • Tianyi Xu, Ding Zhang, Zizhan Zheng
The problem is challenging even when the link rate distributions are known in advance (the offline setting), because the information gained from probing must be balanced against the cost of reduced data transmission opportunities.
no code implementations • 6 Aug 2021 • Tianyi Xu, Ding Zhang, Parth H. Pathak, Zizhan Zheng
In contrast to traditional link scheduling problems under uncertainty, we assume that in each time step, the device can probe a subset of links before deciding which one to use.
no code implementations • 9 Jul 2020 • Jianrong Wang, Xiaosheng Hu, Li Liu, Wei Liu, Mei Yu, Tianyi Xu
Given a speaker's speech, it is interesting to ask whether this speaker's face can be generated.