no code implementations • IWSLT (EMNLP) 2018 • Yuguang Wang, Liangliang Shi, Linyu Wei, Weifeng Zhu, Jinkun Chen, Zhichao Wang, Shixue Wen, Wei Chen, Yanfeng Wang, Jia Jia
Our final average result on speech translation is 31.02 BLEU.
no code implementations • 3 May 2022 • Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.
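As a rough illustration of the setup described above, the following NumPy sketch takes one gradient descent step on $\boldsymbol{W}$ for the empirical MSE loss. The activation (tanh), the learning rate `eta`, the initialization scales, the toy targets, and all function names are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def forward(W, a, X):
    """f(x_i) = a^T tanh(W^T x_i) / sqrt(N) for every row x_i of X (shape n x d)."""
    N = W.shape[1]
    return np.tanh(X @ W) @ a / np.sqrt(N)

def mse_loss(W, a, X, y):
    """Empirical MSE loss (1/n) * sum_i (f(x_i) - y_i)^2."""
    return np.mean((forward(W, a, X) - y) ** 2)

def grad_W(W, a, X, y):
    """Gradient of the MSE loss with respect to the first-layer weights W (d x N)."""
    n, N = X.shape[0], W.shape[1]
    H = np.tanh(X @ W)                            # hidden activations, n x N
    r = H @ a / np.sqrt(N) - y                    # residuals f(x_i) - y_i, length n
    # Chain rule: d tanh(z)/dz = 1 - tanh(z)^2, scaled by the second-layer weight a_j.
    G = r[:, None] * (1.0 - H ** 2) * a[None, :]  # n x N
    return (2.0 / n) * (X.T @ G) / np.sqrt(N)     # d x N

def first_gradient_step(W, a, X, y, eta):
    """One gradient descent step on W only; a stays fixed."""
    return W - eta * grad_W(W, a, X, y)

# Toy data and random initialization (assumed scales, for illustration only).
rng = np.random.default_rng(0)
n, d, N = 200, 5, 64
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0])
W0 = rng.standard_normal((d, N)) / np.sqrt(d)
a = rng.standard_normal(N)

W1 = first_gradient_step(W0, a, X, y, eta=0.01)
print(mse_loss(W0, a, X, y), mse_loss(W1, a, X, y))
```

Only the first-layer parameters are updated here, matching the description; the second layer $\boldsymbol{a}$ is left at its random initialization.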
no code implementations • 5 Feb 2022 • Xiangmeng Wang, Qian Li, Dianer Yu, Peng Cui, Zhichao Wang, Guandong Xu
Traditional recommendation models trained on observational interaction data have had a large impact in a wide range of applications, but they face bias problems that obscure users' true intent and thus degrade recommendation effectiveness.
no code implementations • 2 Jan 2022 • Wendong Gan, Bolong Wen, Ying Yan, Haitao Chen, Zhichao Wang, Hongqiang Du, Lei Xie, Kaixuan Guo, Hai Li
Specifically, a prosody vector is first extracted from a pre-trained VQ-Wav2Vec model, in which rich prosody information is embedded while most speaker and environment information is effectively removed by quantization.
no code implementations • 23 Dec 2021 • Qicong Xie, Tao Li, Xinsheng Wang, Zhichao Wang, Lei Xie, Guoqiao Yu, Guanglu Wan
Moreover, the explicit prosody features used in the prosody predicting module can increase the diversity of synthetic speech by adjusting the value of prosody features.
no code implementations • 24 Nov 2021 • Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi
One-shot style transfer is a challenging task, since training on one utterance makes the model extremely prone to over-fitting the training data, which causes low speaker similarity and a lack of expressiveness.
no code implementations • 20 Sep 2021 • Zhichao Wang, Yizhe Zhu
In this paper, we study the two-layer fully connected neural network given by $f(X)=\frac{1}{\sqrt{d_1}}\boldsymbol{a}^\top\sigma\left(WX\right)$, where $X\in\mathbb{R}^{d_0\times n}$ is a deterministic data matrix, $W\in\mathbb{R}^{d_1\times d_0}$ and $\boldsymbol{a}\in\mathbb{R}^{d_1}$ are random Gaussian weights, and $\sigma$ is a nonlinear activation function.
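To make the shapes in this definition concrete, here is a minimal NumPy sketch of the map $f(X)$ with the data stored column-wise. The choice of tanh for $\sigma$, the dimensions, the particular fixed data matrix, and the names `two_layer_output`, `X_data`, `W_rand`, and `a_rand` are assumptions made for illustration.

```python
import numpy as np

def two_layer_output(X, W, a, sigma=np.tanh):
    """Return f(X) = a^T sigma(W X) / sqrt(d1), one output per column of X."""
    d1 = W.shape[0]
    return a @ sigma(W @ X) / np.sqrt(d1)

d0, d1, n = 4, 256, 10

# A fixed (deterministic) data matrix of shape d0 x n; any fixed matrix would do.
X_data = np.cos(np.arange(d0 * n, dtype=float).reshape(d0, n))

# Random Gaussian weights: W is d1 x d0 and a has length d1.
rng = np.random.default_rng(0)
W_rand = rng.standard_normal((d1, d0))
a_rand = rng.standard_normal(d1)

out = two_layer_output(X_data, W_rand, a_rand)
print(out.shape)  # one scalar output per data column
```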
no code implementations • CVPR 2021 • Qian Li, Zhichao Wang, Gang Li, Jun Pang, Guandong Xu
Sinkhorn divergence has become a very popular metric to compare probability distributions in optimal transport.
no code implementations • 16 Jun 2021 • Zhichao Wang, Xinyong Zhou, Fengyu Yang, Tao Li, Hongqiang Du, Lei Xie, Wendong Gan, Haitao Chen, Hai Li
Specifically, prosodic features are used to explicitly model prosody, while a VAE and a reference encoder are used to implicitly model prosody, taking the Mel spectrum and bottleneck features as input, respectively.
no code implementations • 8 Apr 2021 • Zhichao Wang, Wenwen Yang, Pan Zhou, Wei Chen
Recently, attention-based encoder-decoder (AED) end-to-end (E2E) models have drawn more and more attention in the field of automatic speech recognition (ASR).
no code implementations • NeurIPS 2020 • Zhou Fan, Zhichao Wang
We study the eigenvalue distributions of the Conjugate Kernel and Neural Tangent Kernel associated to multi-layer feedforward neural networks.
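The excerpt does not define the kernels, so the sketch below rests on an assumption: it uses the common single-hidden-layer form of the Conjugate Kernel, namely the Gram matrix of the normalized hidden-layer features, and computes its eigenvalues numerically. The activation, dimensions, and function name are illustrative choices, and the Neural Tangent Kernel is not covered.

```python
import numpy as np

def conjugate_kernel_spectrum(X, W, sigma=np.tanh):
    """Eigenvalues of K = Phi^T Phi with Phi = sigma(W X) / sqrt(d1).

    Assumed one-hidden-layer Conjugate Kernel; X is d0 x n, W is d1 x d0.
    """
    d1 = W.shape[0]
    Phi = sigma(W @ X) / np.sqrt(d1)   # hidden features, d1 x n
    K = Phi.T @ Phi                    # n x n Gram matrix (symmetric PSD)
    return np.linalg.eigvalsh(K)       # real eigenvalues in ascending order

rng = np.random.default_rng(0)
d0, d1, n = 50, 400, 100
X_ck = rng.standard_normal((d0, n)) / np.sqrt(d0)  # columns with roughly unit norm
W_ck = rng.standard_normal((d1, d0))

eigs = conjugate_kernel_spectrum(X_ck, W_ck)
print(eigs.min(), eigs.max())
```

A histogram of `eigs` gives an empirical picture of the eigenvalue distribution for one random draw of the weights.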
no code implementations • CVPR 2019 • Zhichao Wang, Qian Li, Gang Li, Guandong Xu
In this work, we discover a set of general polynomials that vanish on vectorized persistence diagrams (PDs) and extract the task-adapted feature representation from these polynomials.