Search Results for author: Xinsheng Wang

Found 13 papers, 3 papers with code

StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion

no code implementations • 5 Aug 2024 • Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Lei Xie, Yuping Wang

StreamVoice+ integrates a semantic encoder and a connector with the original StreamVoice framework, now trained using a non-streaming ASR.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

SCDNet: Self-supervised Learning Feature-based Speaker Change Detection

no code implementations • 12 Jun 2024 • Yue Li, Xinsheng Wang, Li Zhang, Lei Xie

Furthermore, a contrastive learning method is proposed to mitigate the overfitting tendencies in the training of both the fine-tuning-based method and SCDNet.

Change Detection, Contrastive Learning, +1

StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

no code implementations • 19 Jan 2024 • Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Lei Xie, Yuping Wang

Specifically, to enable streaming capability, StreamVoice employs a fully causal context-aware LM with a temporal-independent acoustic predictor, alternately processing semantic and acoustic features at each time step of autoregression, which eliminates the dependence on complete source speech.

Language Modelling, Voice Conversion

MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

no code implementations • 3 Sep 2023 • Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang

In addition to conveying the linguistic content from source speech to converted speech, maintaining the speaking style of the source speech plays an important role in the voice conversion (VC) task, and is essential in many scenarios with highly expressive source speech, such as dubbing and data augmentation.

Data Augmentation, Disentanglement, +3

Delivering Speaking Style in Low-resource Voice Conversion with Multi-factor Constraints

no code implementations • 16 Nov 2022 • Zhichao Wang, Xinsheng Wang, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

Conveying the linguistic content and maintaining the source speech's speaking style, such as intonation and emotion, is essential in voice conversion (VC).

Voice Conversion

Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios

no code implementations • 23 Dec 2021 • Qicong Xie, Tao Li, Xinsheng Wang, Zhichao Wang, Lei Xie, Guoqiao Yu, Guanglu Wan

Moreover, the explicit prosody features used in the prosody prediction module can increase the diversity of synthetic speech, since their values can be adjusted.

Diversity, Speech Synthesis, +2

AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person

no code implementations • 9 Aug 2021 • Xinsheng Wang, Qicong Xie, Jihua Zhu, Lei Xie, Odette Scharenborg

In this paper, we present an automatic method to generate synchronized speech and talking-head videos on the basis of text and a single face image of an arbitrary person as input.

Talking Head Generation

Show and Speak: Directly Synthesize Spoken Description of Images

1 code implementation • 23 Oct 2020 • Xinsheng Wang, Siyuan Feng, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg

This paper proposes a new model, referred to as the show and speak (SAS) model, that for the first time is able to directly synthesize spoken descriptions of images, bypassing the need for any text or phonemes.

Decoder

S2IGAN: Speech-to-Image Generation via Adversarial Learning

2 code implementations • 14 May 2020 • Xinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg

An estimated half of the world's languages do not have a written form, making it impossible for these languages to benefit from any existing text-based technologies.

Image Generation

Domain segmentation and adjustment for generalized zero-shot learning

no code implementations • 1 Feb 2020 • Xinsheng Wang, Shanmin Pang, Jihua Zhu

In generalized zero-shot learning, synthesizing unseen data with generative models has been the most popular method to address the imbalance of training data between seen and unseen classes.

Generalized Zero-Shot Learning

Competing Ratio Loss for Discriminative Multi-class Image Classification

1 code implementation • 25 Dec 2019 • Ke Zhang, Yurong Guo, Xinsheng Wang, Dongliang Chang, Zhenbing Zhao, Zhanyu Ma, Tony X. Han

However, during the training of the deep convolutional neural network, the value of NLLR is not always positive or negative, which severely affects the convergence of NLLR.

Age Estimation, Classification, +3

Competing Ratio Loss for Discriminative Multi-class Image Classification

no code implementations • 31 Jul 2019 • Ke Zhang, Xinsheng Wang, Yurong Guo, Zhenbing Zhao, Zhanyu Ma, Tony X. Han

Many studies of image classification based on deep convolutional neural networks focus on the network structure to improve classification performance.

Age Estimation, Classification, +3

Visual Space Optimization for Zero-shot Learning

no code implementations • 30 Jun 2019 • Xinsheng Wang, Shanmin Pang, Jihua Zhu, Zhongyu Li, Zhiqiang Tian, Yaochen Li

The other is to optimize the visual feature structure in an intermediate embedding space; for this method we devise an algorithm based on a multilayer perceptron framework that learns the common intermediate embedding space while making the visual data structure more distinctive.

Zero-Shot Learning
