Argument Attribution Explanations in Quantitative Bipolar Argumentation Frameworks (Technical Report)

no code implementations 25 Jul 2023 Xiang Yin, Nico Potyka, Francesca Toni

Argumentative explainable AI has been advocated by several researchers in recent years, with increasing interest in explaining the reasoning outcomes of Argumentation Frameworks (AFs).

Fake News Detection Recommendation Systems

Efficient STL Control Synthesis under Asynchronous Temporal Robustness Constraints

no code implementations 24 Jul 2023 Xinyi Yu, Xiang Yin, Lars Lindemann

Given an ATR bound, we compute a sequence of control inputs so that the system satisfies the specification as long as each sub-trajectory is shifted by no more than the ATR bound.

Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts

no code implementations 14 Jul 2023 Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Chen Zhang, Zhenhui Ye, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao

In this paper, we introduce Mega-TTS 2, a generic zero-shot multispeaker TTS model that is capable of synthesizing speech for unseen speakers with arbitrary-length prompts.

Language Modelling

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech

no code implementations 27 Jun 2023 Yahuan Cong, Haoyu Zhang, Haopeng Lin, Shichao Liu, Chunfeng Wang, Yi Ren, Xiang Yin, Zejun Ma

Cross-lingual timbre and style generalizable text-to-speech (TTS) aims to synthesize speech with a specific reference timbre or style that is never trained in the target language.

Disentanglement Style Generalization

Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions, and Prospects

1 code implementation 14 Jun 2023 Xinghua Qu, Hongyang Liu, Zhu Sun, Xiang Yin, Yew Soon Ong, Lu Lu, Zejun Ma

Conversational recommender systems (CRSs) have become crucial emerging research topics in the field of RSs, thanks to their natural advantages of explicitly acquiring user preferences via interactive conversations and revealing the reasons behind recommendations.

Recommendation Systems

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

no code implementations 6 Jun 2023 Ziyue Jiang, Yi Ren, Zhenhui Ye, Jinglin Liu, Chen Zhang, Qian Yang, Shengpeng Ji, Rongjie Huang, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao

3) We further use a VQGAN-based acoustic model to generate the spectrogram and a latent code language model to fit the distribution of prosody, since prosody changes quickly over time in a sentence, and language models can capture both local and long-range dependencies.

Inductive Bias Language Modelling +1

Detector Guidance for Multi-Object Text-to-Image Generation

1 code implementation 4 Jun 2023 Luping Liu, Zijian Zhang, Yi Ren, Rongjie Huang, Xiang Yin, Zhou Zhao

Previous works identify the problem of information mixing in the CLIP text encoder and introduce the T5 text encoder or incorporate strong prior knowledge to assist with the alignment.

Object Detection

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation

no code implementations 29 May 2023 Jiawei Huang, Yi Ren, Rongjie Huang, Dongchao Yang, Zhenhui Ye, Chen Zhang, Jinglin Liu, Xiang Yin, Zejun Ma, Zhou Zhao

Finally, we use LLMs to augment and transform a large amount of audio-label data into audio-text datasets to alleviate the problem of scarcity of temporal data.

Audio Generation Denoising +2

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation

no code implementations 24 May 2023 Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin, Zhou Zhao

Direct speech-to-speech translation (S2ST) aims to convert speech from one language into another, and has demonstrated significant progress to date.

Speech-to-Speech Translation Translation

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

no code implementations 1 May 2023 Zhenhui Ye, Jinzheng He, Ziyue Jiang, Rongjie Huang, Jiawei Huang, Jinglin Liu, Yi Ren, Xiang Yin, Zejun Ma, Zhou Zhao

Recently, the neural radiance field (NeRF) has become a popular rendering technique in this field since it can achieve high-fidelity and 3D-consistent talking face generation with a few-minute-long training video.

motion prediction Talking Face Generation

Data-Driven Safe Controller Synthesis for Deterministic Systems: A Posteriori Method With Validation Tests

no code implementations 3 Apr 2023 Yu Chen, Chao Shang, Xiaolin Huang, Xiang Yin

We first formulate the safety synthesis problem as a robust convex program (RCP) based on the notion of a control barrier function.

LiteG2P: A fast, light and high accuracy model for grapheme-to-phoneme conversion

no code implementations 2 Mar 2023 Chunfeng Wang, Peisong Huang, Yuxiang Zou, Haoyu Zhang, Shichao Liu, Xiang Yin, Zejun Ma

As a key component of automated speech recognition (ASR) and the front-end in text-to-speech (TTS), grapheme-to-phoneme (G2P) plays the role of converting letters to their corresponding pronunciations.

Speech Recognition

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models

no code implementations 30 Jan 2023 Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao

Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long continuous audio data.

Audio Generation Text-to-Video Generation +1

Direct Speech-to-speech Translation without Textual Annotation using Bottleneck Features

no code implementations 12 Dec 2022 Junhui Zhang, Junjie Pan, Xiang Yin, Zejun Ma

Speech-to-speech translation directly translates a speech utterance in one language into speech in another, and has great potential in tasks such as simultaneous interpretation.

Speech-to-Speech Translation Translation

Markov decision processes with maximum entropy rate for Surveillance Tasks

no code implementations 23 Nov 2022 Yu Chen, ShaoYuan Li, Xiang Yin

We consider the problem of synthesizing optimal policies for Markov decision processes (MDPs) with respect to both a utility objective and a security constraint.

You Don't Know When I Will Arrive: Unpredictable Controller Synthesis for Temporal Logic Tasks

no code implementations 23 Nov 2022 Yu Chen, Shuo Yang, Rahul Mangharam, Xiang Yin

This problem is particularly challenging since future information is involved in the synthesis process.

Robot Task Planning

Explaining Random Forests using Bipolar Argumentation and Markov Networks (Technical Report)

no code implementations 21 Nov 2022 Nico Potyka, Xiang Yin, Francesca Toni

Random forests are decision tree ensembles that can be used to solve a variety of machine learning problems.

Decision Making

Model Predictive Control for Signal Temporal Logic Specifications with Time Interval Decomposition

1 code implementation 15 Nov 2022 Xinyi Yu, Chuwei Wang, Dingran Yuan, ShaoYuan Li, Xiang Yin

However, instead of applying MPC directly for the entire task horizon, we decompose the STL formula into several sub-formulae with disjoint time horizons, and shrinking horizon MPC is applied for each short-horizon sub-formula iteratively.

Abstraction-Based Verification of Approximate Pre-Opacity for Control Systems

no code implementations 8 Nov 2022 Junyao Hou, Siyuan Liu, Xiang Yin, Majid Zamani

In this paper, we first introduce a concept of approximate pre-opacity by capturing the security level of control systems with respect to the measurement precision of the intruder.

Model Predictive Monitoring of Dynamic Systems for Signal Temporal Logic Specifications

1 code implementation 26 Sep 2022 Xinyi Yu, Weijie Dong, Xiang Yin, ShaoYuan Li

To this end, effective approaches for the computation of feasible sets of STL formulae are provided.

Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective

1 code implementation 15 Aug 2022 Pengfei Wei, Lingdong Kong, Xinghua Qu, Yi Ren, Zhiqiang Xu, Jing Jiang, Xiang Yin

Specifically, we consider the generation of cross-domain videos from two sets of latent factors, one encoding the static information and another encoding the dynamic information.

Action Recognition Disentanglement +2

A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation

no code implementations 10 Jun 2022 Junhui Zhang, Wudi Bao, Junjie Pan, Xiang Yin, Zejun Ma

In this paper, we propose a novel Chinese dialect TTS frontend with a translation module, which converts Mandarin text into dialectic expressions to improve the intelligibility and naturalness of synthesized speech.

Machine Translation Translation

Towards a Theory of Faithfulness: Faithful Explanations of Differentiable Classifiers over Continuous Data

no code implementations 19 May 2022 Nico Potyka, Xiang Yin, Francesca Toni

There is broad agreement in the literature that explanation methods should be faithful to the model that they explain, but faithfulness remains a rather vague term.

A Unified Framework for Verification of Observational Properties for Partially-Observed Discrete-Event Systems

no code implementations 3 May 2022 Jianing Zhao, Xiang Yin, ShaoYuan Li

However, in contrast to existing results, where different verification procedures are developed for different properties case by case, in this work we provide a unified framework for verifying all these properties by reducing each of them to an instance of HyperLTL model checking.

A Uniform Framework for Diagnosis of Discrete-Event Systems with Unreliable Sensors using Linear Temporal Logic

no code implementations 27 Apr 2022 Weijie Dong, Xiang Yin, ShaoYuan Li

In this work, we propose a novel uniform framework for diagnosability of DES subject not only to sensor failures but also to a very general class of unreliable sensors.

Fault Diagnosis of Discrete-Event Systems under Non-Deterministic Observations with Output Fairness

no code implementations 6 Apr 2022 Weijie Dong, Shang Gao, Xiang Yin, ShaoYuan Li

Non-deterministic observation is a general observation model that includes the case of intermittent loss of observations.


To Explore or Not to Explore: Regret-Based LTL Planning in Partially-Known Environments

no code implementations 1 Apr 2022 Jianing Zhao, Keyi Zhu, Xiang Yin, ShaoYuan Li

In contrast to the standard game-based approach that optimizes the worst-case cost, in this paper we propose to use regret as a new metric for planning in such a partially-known environment.

You Don't Know What I Know: On Notion of High-Order Opacity in Discrete-Event Systems

no code implementations 31 Mar 2022 Bohan Cui, Xiang Yin, ShaoYuan Li, Alessandro Giua

In this paper, we consider a new type of secret related to the knowledge of the system user.

Sensor Deception Attacks Against Initial-State Privacy in Supervisory Control Systems

no code implementations 31 Mar 2022 Jingshi Yao, Xiang Yin, ShaoYuan Li

Specifically, we consider an active attacker that can tamper with the observations received by the supervisor by, e.g., hacking the communication channel between the sensors and the supervisor.

Online Monitoring of Dynamic Systems for Signal Temporal Logic Specifications with Model Information

no code implementations 30 Mar 2022 Xinyi Yu, Weijie Dong, Xiang Yin, ShaoYuan Li

We show that, by explicitly utilizing the model information of the dynamic system, the proposed online monitoring algorithm can falsify or certify the specification earlier than existing algorithms, which use no model information.

Towards Realistic Visual Dubbing with Heterogeneous Sources

no code implementations 17 Jan 2022 Tianyi Xie, Liucheng Liao, Cheng Bi, Benlai Tang, Xiang Yin, Jianfei Yang, Mingjie Wang, Jiali Yao, Yang Zhang, Zejun Ma

The task of few-shot visual dubbing focuses on synchronizing the lip movements with arbitrary speech input for any talking head video.

Disentanglement Talking Head Generation

Data privacy protection in microscopic image analysis for material data mining

no code implementations 9 Nov 2021 Boyuan Ma, Xiang Yin, Xiaojuan Ban, Haiyou Huang, Neng Zhang, Hao Wang, Weihua Xue

The core contributions are as follows: 1) a federated learning algorithm is introduced into the polycrystalline microstructure image segmentation task to make full use of different users' data for machine learning, break down data islands, and improve model generalization while ensuring the privacy and security of user data; 2) a data sharing strategy based on style transfer is proposed.

Federated Learning Image Segmentation +2

Towards Using Clothes Style Transfer for Scenario-aware Person Video Generation

1 code implementation 14 Oct 2021 Jingning Xu, Benlai Tang, Mingjie Wang, Siyuan Bian, Wenyi Guo, Xiang Yin, Zejun Ma

To tackle this problem, most recent AdaIN-based architectures are proposed to extract clothes and scenario features for generation.

Style Transfer Video Generation

Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding

no code implementations 10 Oct 2021 Chao Wang, Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Yibiao Yu, Zejun Ma

Experiments show that, compared with the baseline models, our proposed model can significantly improve the naturalness of converted singing voices and the similarity with the target singer.

Automatic Speech Recognition (ASR) +2

Personalized Heterogeneous Federated Learning with Gradient Similarity

no code implementations 29 Sep 2021 Jing Xie, Xiang Yin, Xiyi Zhang, Juan Chen, Quan Wen, Qiang Yang, Xuan Mo

In SPFL, the server uses the Softmax Normalized Gradient Similarity (SNGS) to weight the relationship between clients, and sends the personalized global model to each client.

Federated Learning

Cloud-Assisted Nonlinear Model Predictive Control for Finite-Duration Tasks

no code implementations 20 Jun 2021 Nan Li, Kaixiang Zhang, Zhaojian Li, Vaibhav Srivastava, Xiang Yin

In this paper, we propose a novel cloud-assisted model predictive control (MPC) framework in which we systematically fuse a cloud MPC that uses a high-fidelity nonlinear model but is subject to communication delays with a local MPC that exploits simplified dynamics (due to limited computation) but has timely feedback.

Cloud Computing

Optimal Synthesis of Opacity-Enforcing Supervisors for Qualitative and Quantitative Specifications

no code implementations 2 Feb 2021 Yifan Xie, Xiang Yin, ShaoYuan Li

We assume that the system has a "secret" that it does not want to be revealed to the intruder.

A Framework for Current-State Opacity under Dynamic Information Release Mechanism

no code implementations 9 Dec 2020 Junyao Hou, Xiang Yin, ShaoYuan Li

In this paper, we investigate the verification of current-state opacity for discrete-event systems under Orwellian-type observations, i.e., the system is allowed to re-interpret the observation of an event based on its future suffix.

PPG-based singing voice conversion with adversarial representation learning

no code implementations 28 Oct 2020 Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Ling Xu, Chen Shen, Zejun Ma

Singing voice conversion (SVC) aims to convert the voice of one singer to that of other singers while keeping the singing content and melody.

Representation Learning Voice Conversion +1

Opacity Enforcing Supervisory Control using Non-deterministic Supervisors

no code implementations 20 Oct 2020 Yifan Xie, Xiang Yin, ShaoYuan Li

Compared with the standard deterministic control mechanism, such a non-deterministic control mechanism can enhance the plausible deniability of the controlled system as the online control decision is a random realization and cannot be implicitly inferred from the control policy.

End-to-End Learning for Simultaneously Generating Decision Map and Multi-Focus Image Fusion Result

2 code implementations 17 Oct 2020 Boyuan Ma, Xiang Yin, Di wu, Xiaojuan Ban

In this work, to meet the requirements of both output image quality and simplicity of structure implementation, we propose a cascade network that simultaneously generates a decision map and a fused result with an end-to-end training procedure.

2D Cyclist Detection

Xiaomingbot: A Multilingual Robot News Reporter

no code implementations ACL 2020 Runxin Xu, Jun Cao, Mingxuan Wang, Jiaze Chen, Hao Zhou, Ying Zeng, Yu-Ping Wang, Li Chen, Xiang Yin, Xijin Zhang, Songcheng Jiang, Yuxuan Wang, Lei LI

This paper proposes the building of Xiaomingbot, an intelligent, multilingual and multimodal software robot equipped with four integral capabilities: news generation, news translation, news reading and avatar animation.

News Generation Translation +1

Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech

no code implementations 19 May 2020 Wenjie Li, Benlai Tang, Xiang Yin, Yushi Zhao, Wei Li, Kang Wang, Hao Huang, Yuxuan Wang, Zejun Ma

Accent conversion (AC) transforms a non-native speaker's accent into a native accent while maintaining the speaker's voice timbre.

ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders

no code implementations 23 Apr 2020 Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, Zejun Ma

This paper presents ByteSing, a Chinese singing voice synthesis (SVS) system based on duration allocated Tacotron-like acoustic models and WaveRNN neural vocoders.

Singing Voice Synthesis

A hybrid text normalization system using multi-head self-attention for mandarin

no code implementations 11 Nov 2019 Junhui Zhang, Junjie Pan, Xiang Yin, Chen Li, Shichao Liu, Yang Zhang, Yuxuan Wang, Zejun Ma

In this paper, we propose a hybrid text normalization system using multi-head self-attention.

A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis

no code implementations 11 Nov 2019 Junjie Pan, Xiang Yin, Zhiling Zhang, Shichao Liu, Yang Zhang, Zejun Ma, Yuxuan Wang

In a Mandarin text-to-speech (TTS) system, the front-end text processing module significantly influences the intelligibility and naturalness of synthesized speech.

Polyphone disambiguation Speech Synthesis +1

