no code implementations • 27 Feb 2025 • Weihao wu, Zhiwei Lin, Yixuan Zhou, Jingbei Li, Rui Niu, Qinghua Wu, Songjun Cao, Long Ma, Zhiyong Wu
A diffusion-based context-aware prosody predictor is proposed to sample diverse prosody embeddings conditioned on multimodal conversational context.
no code implementations • 9 Sep 2024 • Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu, Helen Meng
While various aspects of song generation have been explored by previous works, such as singing voice, vocal composition and instrumental arrangement, etc., generating songs with both vocals and accompaniment given lyrics remains a significant challenge, hindering the application of music generation models in the real world.
1 code implementation • 2 Sep 2024 • Yixuan Zhou, Xing Xu, Zhe Sun, Jingkuan Song, Andrzej Cichocki, Heng Tao Shen
Through the integration of vector quantization (VQ), we empower the flow models to distinguish different concepts of multi-class normal data in an unsupervised manner, resulting in a novel flow-based unified method, named VQ-Flow.
no code implementations • 18 Jul 2024 • Weiqin Li, Peiji Yang, Yicheng Zhong, Yixuan Zhou, Zhisheng Wang, Zhiyong Wu, Xixin Wu, Helen Meng
Moreover, fine-grained prosody modeling is introduced to enhance the model's ability to capture subtle prosody variations in spontaneous speech. Experimental results show that our proposed method significantly outperforms the baseline methods in terms of prosody naturalness and spontaneous behavior naturalness.
1 code implementation • 26 Nov 2023 • Yixuan Zhou, Yi Qu, Xing Xu, Fumin Shen, Jingkuan Song, HengTao Shen
In the proposed BN-WVAD, we leverage the Divergence of Feature from Mean vector (DFM) of BatchNorm as a reliable abnormality criterion to discern potential abnormal snippets in abnormal videos.
Anomaly Detection In Surveillance Videos
Weakly-supervised Video Anomaly Detection
1 code implementation • 12 Oct 2023 • Yixuan Zhou, Xuanhan Wang, Xing Xu, Lei Zhao, Jingkuan Song
Inspired by this observation, we introduce a lightweight and powerful alternative, Spatially Unidimensional Self-Attention (SUSA), to the pointwise (1x1) convolution that is the main computational bottleneck in the depthwise separable 3c3 convolution.
no code implementations • 31 Aug 2023 • Weiqin Li, Shun Lei, Qiaochu Huang, Yixuan Zhou, Zhiyong Wu, Shiyin Kang, Helen Meng
The spontaneous behavior that often occurs in conversations makes speech more human-like compared to reading-style.
1 code implementation • 29 Aug 2023 • Yixuan Zhou, Xing Xu, Jingkuan Song, Fumin Shen, Heng Tao Shen
Unsupervised anomaly detection (UAD) attracts a lot of research interest and drives widespread applications, where only anomaly-free samples are available for training.
Ranked #9 on
Anomaly Detection
on MVTec AD
2 code implementations • ICCV 2023 • Yixuan Zhou, Yi Qu, Xing Xu, HengTao Shen
To overcome this bottleneck, we leverage class priors to restrict the generalization scope of the class-agnostic SAM and propose a class-aware smoothness optimization algorithm named Imbalanced-SAM (ImbSAM).
Semi-supervised Anomaly Detection
Supervised Anomaly Detection
1 code implementation • 30 May 2023 • Yixuan Zhou, Peiyu Yang, Yi Qu, Xing Xu, Zhe Sun, Andrzej Cichocki
Unlike existing SSAD methods that resort to strict loss supervision, AnoOnly suspends it and introduces a form of weak supervision for normal data.
Semi-supervised Anomaly Detection
Supervised Anomaly Detection
+1
1 code implementation • 21 Jun 2022 • Xuanhan Wang, Lianli Gao, Yixuan Zhou, Jingkuan Song, Meng Wang
Human densepose estimation, aiming at establishing dense correspondences between 2D pixels of human body and 3D human body template, is a key technique in enabling machines to have an understanding of people in images.
1 code implementation • 31 Mar 2022 • Xueyuan Chen, Changhe Song, Yixuan Zhou, Zhiyong Wu, Changbin Chen, Zhongqin Wu, Helen Meng
In this paper, we propose a span-based Mandarin prosodic structure prediction model to obtain an optimal prosodic structure tree, which can be converted to corresponding prosodic label sequence.
no code implementations • 23 Mar 2022 • Shun Lei, Yixuan Zhou, Liyang Chen, Zhiyong Wu, Shiyin Kang, Helen Meng
In this paper, we propose a hierarchical framework to model speaking style from context.
2 code implementations • 26 Jan 2022 • Jinjie Zhang, Yixuan Zhou, Rayan Saab
Additionally, our error analysis expands the results of previous work on GPFQ to handle general quantization alphabets, showing that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights -- i. e., level of over-parametrization.
no code implementations • 14 Apr 2021 • Yixuan Zhou, Changhe Song, Jingbei Li, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng
Exploiting rich linguistic information in raw text is crucial for expressive text-to-speech (TTS).
no code implementations • 13 Dec 2020 • Changhe Song, Jingbei Li, Yixuan Zhou, Zhiyong Wu, Helen Meng
Meanwhile, nuclear-norm maximization loss is introduced to enhance the discriminability and diversity of the embeddings of constituent labels.