no code implementations • 29 Aug 2023 • Longbin Ji, Pengfei Wei, Yi Ren, Jinglin Liu, Chen Zhang, Xiang Yin
Co-speech gesture generation is crucial for automatic digital avatar animation.
no code implementations • 25 Jul 2023 • Xiang Yin, Nico Potyka, Francesca Toni
Argumentative explainable AI has been advocated by several researchers in recent years, with increasing interest in explaining the reasoning outcomes of Argumentation Frameworks (AFs).
no code implementations • 24 Jul 2023 • Xinyi Yu, Xiang Yin, Lars Lindemann
Given an ATR bound, we compute a sequence of control inputs so that the specification is satisfied by the system as long as each sub-trajectory is shifted by no more than the ATR bound.
no code implementations • 14 Jul 2023 • Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Chen Zhang, Zhenhui Ye, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao
In this paper, we introduce Mega-TTS 2, a generic zero-shot multispeaker TTS model that is capable of synthesizing speech for unseen speakers with arbitrary-length prompts.
no code implementations • 27 Jun 2023 • Yahuan Cong, Haoyu Zhang, Haopeng Lin, Shichao Liu, Chunfeng Wang, Yi Ren, Xiang Yin, Zejun Ma
Cross-lingual timbre and style generalizable text-to-speech (TTS) aims to synthesize speech with a specific reference timbre or style that was never seen in the target language during training.
1 code implementation • 14 Jun 2023 • Xinghua Qu, Hongyang Liu, Zhu Sun, Xiang Yin, Yew Soon Ong, Lu Lu, Zejun Ma
Conversational recommender systems (CRSs) have become a crucial emerging research topic in the field of recommender systems (RSs), thanks to their natural advantages of explicitly acquiring user preferences via interactive conversations and revealing the reasons behind recommendations.
no code implementations • 6 Jun 2023 • Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao
We are interested in a novel task, namely low-resource text-to-talking avatar.
no code implementations • 6 Jun 2023 • Ziyue Jiang, Yi Ren, Zhenhui Ye, Jinglin Liu, Chen Zhang, Qian Yang, Shengpeng Ji, Rongjie Huang, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao
3) We further use a VQGAN-based acoustic model to generate the spectrogram and a latent code language model to fit the distribution of prosody, since prosody changes quickly over time in a sentence, and language models can capture both local and long-range dependencies.
1 code implementation • 4 Jun 2023 • Luping Liu, Zijian Zhang, Yi Ren, Rongjie Huang, Xiang Yin, Zhou Zhao
Previous works identify the problem of information mixing in the CLIP text encoder and introduce the T5 text encoder or incorporate strong prior knowledge to assist with the alignment.
no code implementations • 29 May 2023 • Jiawei Huang, Yi Ren, Rongjie Huang, Dongchao Yang, Zhenhui Ye, Chen Zhang, Jinglin Liu, Xiang Yin, Zejun Ma, Zhou Zhao
Finally, we use LLMs to augment and transform a large amount of audio-label data into audio-text datasets to alleviate the problem of scarcity of temporal data.
Ranked #4 on Audio Generation on AudioCaps
no code implementations • 24 May 2023 • Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin, Zhou Zhao
Direct speech-to-speech translation (S2ST) aims to convert speech from one language into another, and has demonstrated significant progress to date.
no code implementations • 1 May 2023 • Zhenhui Ye, Jinzheng He, Ziyue Jiang, Rongjie Huang, Jiawei Huang, Jinglin Liu, Yi Ren, Xiang Yin, Zejun Ma, Zhou Zhao
Recently, neural radiance field (NeRF) has become a popular rendering technique in this field since it can achieve high-fidelity and 3D-consistent talking face generation with a few-minute-long training video.
no code implementations • 3 Apr 2023 • Yu Chen, Chao Shang, Xiaolin Huang, Xiang Yin
We first formulate the safety synthesis problem as a robust convex program (RCP) based on the notion of a control barrier function.
no code implementations • 2 Mar 2023 • Chunfeng Wang, Peisong Huang, Yuxiang Zou, Haoyu Zhang, Shichao Liu, Xiang Yin, Zejun Ma
As a key component of automated speech recognition (ASR) and the front-end in text-to-speech (TTS), grapheme-to-phoneme (G2P) plays the role of converting letters to their corresponding pronunciations.
no code implementations • 30 Jan 2023 • Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao
Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long continuous audio data.
Ranked #7 on Audio Generation on AudioCaps
no code implementations • 12 Dec 2022 • Junhui Zhang, Junjie Pan, Xiang Yin, Zejun Ma
Speech-to-speech translation directly translates a speech utterance in one language into speech in another language, and has great potential in tasks such as simultaneous interpretation.
no code implementations • 23 Nov 2022 • Yu Chen, ShaoYuan Li, Xiang Yin
We consider the problem of synthesizing optimal policies for Markov decision processes (MDPs) under both a utility objective and a security constraint.
no code implementations • 23 Nov 2022 • Yu Chen, Shuo Yang, Rahul Mangharam, Xiang Yin
This problem is particularly challenging since future information is involved in the synthesis process.
no code implementations • 21 Nov 2022 • Nico Potyka, Xiang Yin, Francesca Toni
Random forests are decision tree ensembles that can be used to solve a variety of machine learning problems.
1 code implementation • 15 Nov 2022 • Xinyi Yu, Chuwei Wang, Dingran Yuan, ShaoYuan Li, Xiang Yin
However, instead of applying MPC directly for the entire task horizon, we decompose the STL formula into several sub-formulae with disjoint time horizons, and shrinking horizon MPC is applied for each short-horizon sub-formula iteratively.
no code implementations • 8 Nov 2022 • Junyao Hou, Siyuan Liu, Xiang Yin, Majid Zamani
In this paper, we first introduce a concept of approximate pre-opacity by capturing the security level of control systems with respect to the measurement precision of the intruder.
1 code implementation • 26 Sep 2022 • Xinyi Yu, Weijie Dong, Xiang Yin, ShaoYuan Li
To this end, effective approaches for the computation of feasible sets of STL formulae are provided.
1 code implementation • 15 Aug 2022 • Pengfei Wei, Lingdong Kong, Xinghua Qu, Yi Ren, Zhiqiang Xu, Jing Jiang, Xiang Yin
Specifically, we consider the generation of cross-domain videos from two sets of latent factors, one encoding the static information and another encoding the dynamic information.
no code implementations • 10 Jun 2022 • Junhui Zhang, Wudi Bao, Junjie Pan, Xiang Yin, Zejun Ma
In this paper, we propose a novel Chinese dialect TTS frontend with a translation module, which converts Mandarin text into dialectic expressions to improve the intelligibility and naturalness of synthesized speech.
no code implementations • 19 May 2022 • Nico Potyka, Xiang Yin, Francesca Toni
There is broad agreement in the literature that explanation methods should be faithful to the model that they explain, but faithfulness remains a rather vague term.
no code implementations • 3 May 2022 • Jianing Zhao, Xiang Yin, ShaoYuan Li
However, in contrast to existing results, where different verification procedures are developed for different properties case by case, in this work we provide a unified framework for verifying all these properties by reducing each of them to an instance of HyperLTL model checking.
no code implementations • 27 Apr 2022 • Weijie Dong, Xiang Yin, ShaoYuan Li
In this work, we propose a novel uniform framework for diagnosability of DES subject not only to sensor failures but also to a very general class of unreliable sensors.
no code implementations • 6 Apr 2022 • Weijie Dong, Shang Gao, Xiang Yin, ShaoYuan Li
Non-deterministic observation is a general observation model that includes the case of intermittent loss of observations.
no code implementations • 1 Apr 2022 • Jianing Zhao, Keyi Zhu, Xiang Yin, ShaoYuan Li
In contrast to the standard game-based approach that optimizes the worst-case cost, in this paper we propose to use regret as a new metric for planning in such a partially-known environment.
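The contrast between worst-case cost and regret can be illustrated on a toy one-shot decision table. The sketch below is only an illustration of the two metrics, not the planning algorithm of the paper; the strategy names, environment names, and cost values are all hypothetical.

```python
# Toy comparison of worst-case cost vs. worst-case regret.
# costs[strategy][environment] = cost paid if that environment turns out to be real.
# All names and numbers are made up for illustration.

def worst_case_choice(costs):
    """Pick the strategy whose maximum cost over environments is smallest."""
    return min(costs, key=lambda s: max(costs[s].values()))

def regret_table(costs):
    """Regret = cost paid minus the best achievable cost in that environment."""
    envs = list(next(iter(costs.values())).keys())
    best = {e: min(costs[s][e] for s in costs) for e in envs}
    return {s: {e: costs[s][e] - best[e] for e in envs} for s in costs}

def min_regret_choice(costs):
    """Pick the strategy whose maximum regret over environments is smallest."""
    regrets = regret_table(costs)
    return min(regrets, key=lambda s: max(regrets[s].values()))

costs = {
    "safe_detour": {"door_open": 10, "door_closed": 10},
    "try_shortcut": {"door_open": 2, "door_closed": 12},
}

print(worst_case_choice(costs))   # strategy minimizing the worst-case cost
print(min_regret_choice(costs))   # strategy minimizing the worst-case regret
```

In this table the two criteria disagree: the detour has the lower worst-case cost (10 versus 12), while the shortcut has the lower worst-case regret (2 versus 8), which is the kind of less conservative behavior a regret metric is meant to capture.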
no code implementations • 31 Mar 2022 • Bohan Cui, Xiang Yin, ShaoYuan Li, Alessandro Giua
In this paper, we consider a new type of secret related to the knowledge of the system user.
no code implementations • 31 Mar 2022 • Jingshi Yao, Xiang Yin, ShaoYuan Li
Specifically, we consider an active attacker that can tamper with the observations received by the supervisor by, e.g., hacking the communication channel between the sensors and the supervisor.
no code implementations • 30 Mar 2022 • Xinyi Yu, Weijie Dong, Xiang Yin, ShaoYuan Li
We show that, by explicitly utilizing the model information of the dynamic system, the proposed online monitoring algorithm can falsify or certify the specification in advance compared with existing algorithms, where no model information is used.
no code implementations • 17 Jan 2022 • Tianyi Xie, Liucheng Liao, Cheng Bi, Benlai Tang, Xiang Yin, Jianfei Yang, Mingjie Wang, Jiali Yao, Yang Zhang, Zejun Ma
The task of few-shot visual dubbing focuses on synchronizing the lip movements with arbitrary speech input for any talking head video.
no code implementations • 9 Nov 2021 • Boyuan Ma, Xiang Yin, Xiaojuan Ban, Haiyou Huang, Neng Zhang, Hao Wang, Weihua Xue
The core contributions are as follows: 1) a federated learning algorithm is introduced into the polycrystalline microstructure image segmentation task to make full use of data from different users for machine learning, break down data islands, and improve the generalization ability of the model while ensuring the privacy and security of user data; 2) a data sharing strategy based on style transfer is proposed.
1 code implementation • 14 Oct 2021 • Jingning Xu, Benlai Tang, Mingjie Wang, Siyuan Bian, Wenyi Guo, Xiang Yin, Zejun Ma
To tackle this problem, most recent works propose AdaIN-based architectures to extract clothes and scenario features for generation.
no code implementations • 10 Oct 2021 • Chao Wang, Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Yibiao Yu, Zejun Ma
Experiments show that, compared with the baseline models, our proposed model can significantly improve the naturalness of converted singing voices and the similarity with the target singer.
1 code implementation • 8 Oct 2021 • Pengfei Wu, Junjie Pan, Chenchang Xu, Junhui Zhang, Lin Wu, Xiang Yin, Zejun Ma
In expressive speech synthesis, there are high requirements for emotion interpretation.
no code implementations • 29 Sep 2021 • Jing Xie, Xiang Yin, Xiyi Zhang, Juan Chen, Quan Wen, Qiang Yang, Xuan Mo
In SPFL, the server uses the Softmax Normalized Gradient Similarity (SNGS) to weight the relationship between clients, and sends the personalized global model to each client.
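One way such a weighting could work is sketched below. The excerpt does not define SNGS precisely, so this sketch assumes it means a softmax over pairwise cosine similarities of client gradients, and that the personalized global model is a weighted average of client parameters; the function names, the `temperature` parameter, and the aggregation rule are assumptions for illustration, not the paper's definitions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two flat gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def softmax(xs, temperature=1.0):
    """Numerically stable softmax; `temperature` is an assumed knob."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sngs_weights(grads, temperature=1.0):
    """Row i holds the weights client i assigns to every client (rows sum to 1)."""
    return [softmax([cosine(g_i, g_j) for g_j in grads], temperature)
            for g_i in grads]

def personalized_models(client_params, grads, temperature=1.0):
    """Assumed aggregation: each client gets a similarity-weighted parameter average."""
    weights = sngs_weights(grads, temperature)
    dim = len(client_params[0])
    n = len(client_params)
    return [[sum(w[j] * client_params[j][k] for j in range(n)) for k in range(dim)]
            for w in weights]

# Hypothetical example: clients 0 and 1 have aligned gradients, client 2 opposes them.
grads = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
params = [[1.0], [1.0], [5.0]]
models = personalized_models(params, grads)
```

Under these assumptions, clients with similar gradients contribute more to each other's model, so the models sent to clients 0 and 1 stay close to their own parameters while client 2 receives a model dominated by its own.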
no code implementations • 20 Jun 2021 • Nan Li, Kaixiang Zhang, Zhaojian Li, Vaibhav Srivastava, Xiang Yin
In this paper, we propose a novel cloud-assisted model predictive control (MPC) framework in which we systematically fuse a cloud MPC that uses a high-fidelity nonlinear model but is subject to communication delays with a local MPC that exploits simplified dynamics (due to limited computation) but has timely feedback.
no code implementations • 2 Feb 2021 • Yifan Xie, Xiang Yin, ShaoYuan Li
We assume that the system has a "secret" that it does not want to reveal to the intruder.
no code implementations • 9 Dec 2020 • Junyao Hou, Xiang Yin, ShaoYuan Li
In this paper, we investigate the verification of current-state opacity for discrete-event systems under Orwellian-type observations, i.e., the system is allowed to re-interpret the observation of an event based on its future suffix.
no code implementations • 28 Oct 2020 • Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Ling Xu, Chen Shen, Zejun Ma
Singing voice conversion (SVC) aims to convert the voice of one singer to that of other singers while keeping the singing content and melody.
no code implementations • 20 Oct 2020 • Yifan Xie, Xiang Yin, ShaoYuan Li
Compared with the standard deterministic control mechanism, such a non-deterministic control mechanism can enhance the plausible deniability of the controlled system as the online control decision is a random realization and cannot be implicitly inferred from the control policy.
2 code implementations • 17 Oct 2020 • Boyuan Ma, Xiang Yin, Di Wu, Xiaojuan Ban
In this work, to handle the requirements of both output image quality and comprehensive simplicity of structure implementation, we propose a cascade network to simultaneously generate decision map and fused result with an end-to-end training procedure.
no code implementations • ACL 2020 • Runxin Xu, Jun Cao, Mingxuan Wang, Jiaze Chen, Hao Zhou, Ying Zeng, Yu-Ping Wang, Li Chen, Xiang Yin, Xijin Zhang, Songcheng Jiang, Yuxuan Wang, Lei LI
This paper proposes the building of Xiaomingbot, an intelligent, multilingual and multimodal software robot equipped with four integral capabilities: news generation, news translation, news reading and avatar animation.
no code implementations • 19 May 2020 • Wenjie Li, Benlai Tang, Xiang Yin, Yushi Zhao, Wei Li, Kang Wang, Hao Huang, Yuxuan Wang, Zejun Ma
Accent conversion (AC) transforms a non-native speaker's accent into a native accent while maintaining the speaker's voice timbre.
no code implementations • 23 Apr 2020 • Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, Zejun Ma
This paper presents ByteSing, a Chinese singing voice synthesis (SVS) system based on duration-allocated Tacotron-like acoustic models and WaveRNN neural vocoders.
no code implementations • 11 Nov 2019 • Junhui Zhang, Junjie Pan, Xiang Yin, Chen Li, Shichao Liu, Yang Zhang, Yuxuan Wang, Zejun Ma
In this paper, we propose a hybrid text normalization system using multi-head self-attention.
no code implementations • 11 Nov 2019 • Junjie Pan, Xiang Yin, Zhiling Zhang, Shichao Liu, Yang Zhang, Zejun Ma, Yuxuan Wang
In Mandarin text-to-speech (TTS) systems, the front-end text processing module significantly influences the intelligibility and naturalness of synthesized speech.