Search Results for author: Yixuan Zhou

Found 16 papers, 9 papers with code

DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models

no code implementations • 27 Feb 2025 • Weihao Wu, Zhiwei Lin, Yixuan Zhou, Jingbei Li, Rui Niu, Qinghua Wu, Songjun Cao, Long Ma, Zhiyong Wu

A diffusion-based context-aware prosody predictor is proposed to sample diverse prosody embeddings conditioned on multimodal conversational context.

Diversity Language Modeling +2

SongCreator: Lyrics-based Universal Song Generation

no code implementations • 9 Sep 2024 • Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu, Helen Meng

While previous works have explored various aspects of song generation, such as singing voice, vocal composition, and instrumental arrangement, generating songs with both vocals and accompaniment given lyrics remains a significant challenge, hindering the application of music generation models in the real world.

Language Modelling Music Generation

VQ-Flow: Taming Normalizing Flows for Multi-Class Anomaly Detection via Hierarchical Vector Quantization

1 code implementation • 2 Sep 2024 • Yixuan Zhou, Xing Xu, Zhe Sun, Jingkuan Song, Andrzej Cichocki, Heng Tao Shen

Through the integration of vector quantization (VQ), we empower the flow models to distinguish different concepts of multi-class normal data in an unsupervised manner, resulting in a novel flow-based unified method, named VQ-Flow.

Multi-class Anomaly Detection Quantization +1

Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models

no code implementations • 18 Jul 2024 • Weiqin Li, Peiji Yang, Yicheng Zhong, Yixuan Zhou, Zhisheng Wang, Zhiyong Wu, Xixin Wu, Helen Meng

Moreover, fine-grained prosody modeling is introduced to enhance the model's ability to capture subtle prosody variations in spontaneous speech. Experimental results show that our proposed method significantly outperforms the baseline methods in terms of prosody naturalness and spontaneous behavior naturalness.

Language Modeling Language Modelling +3

BatchNorm-based Weakly Supervised Video Anomaly Detection

1 code implementation • 26 Nov 2023 • Yixuan Zhou, Yi Qu, Xing Xu, Fumin Shen, Jingkuan Song, HengTao Shen

In the proposed BN-WVAD, we leverage the Divergence of Feature from Mean vector (DFM) of BatchNorm as a reliable abnormality criterion to discern potential abnormal snippets in abnormal videos.

Anomaly Detection In Surveillance Videos Weakly-supervised Video Anomaly Detection

X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention

1 code implementation • 12 Oct 2023 • Yixuan Zhou, Xuanhan Wang, Xing Xu, Lei Zhao, Jingkuan Song

Inspired by this observation, we introduce a lightweight and powerful alternative, Spatially Unidimensional Self-Attention (SUSA), to the pointwise (1x1) convolution that is the main computational bottleneck in the depthwise separable 3x3 convolution.

Pose Estimation

MSFlow: Multi-Scale Flow-based Framework for Unsupervised Anomaly Detection

1 code implementation • 29 Aug 2023 • Yixuan Zhou, Xing Xu, Jingkuan Song, Fumin Shen, Heng Tao Shen

Unsupervised anomaly detection (UAD) attracts a lot of research interest and drives widespread applications, where only anomaly-free samples are available for training.

Anomaly Localization Unsupervised Anomaly Detection

ImbSAM: A Closer Look at Sharpness-Aware Minimization in Class-Imbalanced Recognition

2 code implementations • ICCV 2023 • Yixuan Zhou, Yi Qu, Xing Xu, HengTao Shen

To overcome this bottleneck, we leverage class priors to restrict the generalization scope of the class-agnostic SAM and propose a class-aware smoothness optimization algorithm named Imbalanced-SAM (ImbSAM).

Semi-supervised Anomaly Detection Supervised Anomaly Detection

AnoOnly: Semi-Supervised Anomaly Detection with the Only Loss on Anomalies

1 code implementation • 30 May 2023 • Yixuan Zhou, Peiyu Yang, Yi Qu, Xing Xu, Zhe Sun, Andrzej Cichocki

Unlike existing SSAD methods that resort to strict loss supervision, AnoOnly suspends it and introduces a form of weak supervision for normal data.

Semi-supervised Anomaly Detection Supervised Anomaly Detection +1

KTN: Knowledge Transfer Network for Learning Multi-person 2D-3D Correspondences

1 code implementation • 21 Jun 2022 • Xuanhan Wang, Lianli Gao, Yixuan Zhou, Jingkuan Song, Meng Wang

Human densepose estimation, which aims to establish dense correspondences between 2D pixels of the human body and a 3D human body template, is a key technique in enabling machines to understand people in images.

Human Part Segmentation Transfer Learning

A Character-level Span-based Model for Mandarin Prosodic Structure Prediction

1 code implementation • 31 Mar 2022 • Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu, Changbin Chen, Zhongqin Wu, Helen Meng

In this paper, we propose a span-based Mandarin prosodic structure prediction model to obtain an optimal prosodic structure tree, which can be converted to the corresponding prosodic label sequence.

Sentence Text to Speech

Post-training Quantization for Neural Networks with Provable Guarantees

2 code implementations • 26 Jan 2022 • Jinjie Zhang, Yixuan Zhou, Rayan Saab

Additionally, our error analysis expands the results of previous work on GPFQ to handle general quantization alphabets, showing that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights, i.e., the level of over-parametrization.

Quantization

Syntactic representation learning for neural network based TTS with syntactic parse tree traversal

no code implementations • 13 Dec 2020 • Changhe Song, Jingbei Li, Yixuan Zhou, Zhiyong Wu, Helen Meng

Meanwhile, nuclear-norm maximization loss is introduced to enhance the discriminability and diversity of the embeddings of constituent labels.

Diversity Representation Learning +2
