Search Results for author: Xiang-Yang Li

Found 25 papers, 10 papers with code

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

no code implementations5 Mar 2024 Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle the speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt.
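As a hedged illustration of the factorized-quantization idea (not the paper's trained neural codec), the sketch below slices a latent vector into hypothetical content/prosody/timbre/detail subspaces and quantizes each slice against its own random codebook; the subspace sizes and codebook shapes are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical subspace layout: content, prosody, timbre, acoustic details.
subspaces = {"content": 8, "prosody": 4, "timbre": 4, "detail": 8}
codebooks = {name: rng.normal(size=(16, dim))  # 16 codewords per subspace
             for name, dim in subspaces.items()}

def factorized_quantize(latent):
    """Quantize each slice of `latent` with its own codebook (nearest codeword)."""
    out, offset = {}, 0
    for name, dim in subspaces.items():
        chunk = latent[offset:offset + dim]
        dists = np.linalg.norm(codebooks[name] - chunk, axis=1)
        out[name] = int(np.argmin(dists))  # index of closest codeword
        offset += dim
    return out

codes = factorized_quantize(rng.normal(size=24))
```

Each subspace is thus represented by an independent discrete code, which is what lets a downstream model condition on (say) prosody without touching timbre.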

Quantization • Speech Synthesis

Secure Transformer Inference

1 code implementation14 Nov 2023 Mu Yuan, Lan Zhang, Xiang-Yang Li

Our protocol, Secure Transformer Inference Protocol (STIP), can be applied to real-world services like ChatGPT.

PromptTTS 2: Describing and Generating Voices with Text Prompt

no code implementations5 Sep 2023 Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian

TTS approaches based on text prompts face two main challenges: 1) the one-to-many problem, where not all details of voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, since writing text prompts for speech requires vendors and incurs a large data-labeling cost.

Language Modelling • Large Language Model

PacketGame: Multi-Stream Packet Gating for Concurrent Video Inference at Scale

1 code implementation journal 2023 Mu Yuan, Lan Zhang, Xuanke You, Xiang-Yang Li

The resource efficiency of video analytics workloads is critical for large-scale deployments on edge nodes and cloud clusters.

Video Compression

Tight Memory-Regret Lower Bounds for Streaming Bandits

no code implementations13 Jun 2023 Shaoang Li, Lan Zhang, Junhao Wang, Xiang-Yang Li

We establish the tight worst-case regret lower bound of $\Omega \left( (TB)^{\alpha} K^{1-\alpha}\right), \alpha = 2^{B} / (2^{B+1}-1)$ for any algorithm with a time horizon $T$, number of arms $K$, and number of passes $B$.
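The bound's exponent $\alpha$ depends only on the number of passes $B$: with a single pass ($B=1$), $\alpha = 2/3$, and as $B$ grows, $\alpha \to 1/2$, recovering the classic $\Omega(\sqrt{TK})$ rate of unrestricted-memory bandits. A few lines of Python make this concrete (the function names are ours, not the paper's):

```python
def alpha(B):
    """Exponent alpha = 2^B / (2^(B+1) - 1) from the lower bound."""
    return 2 ** B / (2 ** (B + 1) - 1)

def regret_lower_bound(T, K, B):
    """Evaluate the Omega((TB)^alpha * K^(1-alpha)) expression (constants dropped)."""
    a = alpha(B)
    return (T * B) ** a * K ** (1 - a)
```

For example, `alpha(1)` is `2/3`, so one-pass streaming bandits pay a $\Omega(T^{2/3}K^{1/3})$ regret, strictly worse than the $\sqrt{TK}$ rate achievable without memory restrictions.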

SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition

1 code implementation2 Dec 2022 Yichong Leng, Xu Tan, Wenjie Liu, Kaitao Song, Rui Wang, Xiang-Yang Li, Tao Qin, Edward Lin, Tie-Yan Liu

In this paper, we propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit error detection.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

Data Origin Inference in Machine Learning

1 code implementation24 Nov 2022 Mingxue Xu, Xiang-Yang Li

We formally define the data origin and the data origin inference task in the development of the ML model (mainly neural networks).

Inference Attack • Memorization

MLink: Linking Black-Box Models from Multiple Domains for Collaborative Inference

3 code implementations28 Sep 2022 Mu Yuan, Lan Zhang, Zimu Zheng, Yi-Nan Zhang, Xiang-Yang Li

The cost efficiency of model inference is critical to real-world machine learning (ML) applications, especially for delay-sensitive tasks and resource-limited devices.

Collaborative Inference • Multi-Task Learning +1

InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference

3 code implementations28 Sep 2022 Mu Yuan, Lan Zhang, Fengxiang He, Xueting Tong, Miao-Hui Song, Zhengyuan Xu, Xiang-Yang Li

Previous efforts have tailored effective solutions for many applications, but left two essential questions unanswered: (1) theoretical filterability of an inference workload to guide the application of input filtering techniques, thereby avoiding the trial-and-error cost for resource-constrained mobile applications; (2) robust discriminability of feature embedding to allow input filtering to be widely effective for diverse inference tasks and input content.

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

1 code implementation30 May 2022 Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li, Tao Qin, Sheng Zhao, Tie-Yan Liu

Combining this novel perspective of two-stage synthesis with advanced generative models (i.e., the diffusion models), the proposed BinauralGrad is able to generate accurate and high-fidelity binaural audio samples.

Audio Synthesis

FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition

1 code implementation Findings (EMNLP) 2021 Yichong Leng, Xu Tan, Rui Wang, Linchen Zhu, Jin Xu, Wenjie Liu, Linquan Liu, Tao Qin, Xiang-Yang Li, Edward Lin, Tie-Yan Liu

Although multiple candidates are generated by an ASR system through beam search, current error correction approaches can only correct one sentence at a time, failing to leverage the voting effect from multiple candidates to better detect and correct error tokens.
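The "voting effect" can be sketched as position-wise majority voting over beam-search candidates. Real systems must first align candidates of different lengths; this sketch assumes they are already aligned token-by-token, and the example hypotheses are invented:

```python
from collections import Counter

def vote_correct(candidates):
    """Majority-vote each token position across equal-length ASR candidates."""
    assert len({len(c) for c in candidates}) == 1, "candidates must be aligned"
    voted = []
    for tokens in zip(*candidates):
        token, _ = Counter(tokens).most_common(1)[0]  # most frequent token wins
        voted.append(token)
    return voted

hyps = [
    "i want to weed a book".split(),
    "i want to read a book".split(),
    "i want to read the book".split(),
]
# vote_correct(hyps) recovers "i want to read a book": positions where a
# majority of candidates agree override the minority's errors.
```

Positions where candidates disagree are exactly where errors are likely, which is the signal a multi-candidate corrector can exploit.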

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

1 code implementation NeurIPS 2021 Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu

A straightforward solution to reduce latency, inspired by non-autoregressive (NAR) neural machine translation, is to use an NAR sequence generation model for ASR error correction, which, however, comes at the cost of significantly increased ASR error rate.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +4

Dataset Bias in Few-shot Image Recognition

no code implementations18 Aug 2020 Shuqiang Jiang, Yaohui Zhu, Chenlong Liu, Xinhang Song, Xiang-Yang Li, Weiqing Min

Second, we investigate performance differences across datasets from the perspective of dataset structure and of different few-shot learning methods.

Few-Shot Learning

Learning to Reweight with Deep Interactions

no code implementations9 Jul 2020 Yang Fan, Yingce Xia, Lijun Wu, Shufang Xie, Weiqing Liu, Jiang Bian, Tao Qin, Xiang-Yang Li

Recently, the concept of teaching has been introduced into machine learning, in which a teacher model is used to guide the training of a student model (which will be used in real tasks) through data selection, loss function design, etc.
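Teaching via data selection can be sketched as a teacher keeping the examples it judges most useful for the student's next update. The fixed keep-the-hardest rule below is only an illustration; in the paper the teacher's selection policy is itself learned:

```python
def teacher_select(losses, keep_ratio=0.5):
    """Keep the indices of the hardest fraction of examples (largest student loss)."""
    k = max(1, int(len(losses) * keep_ratio))
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:k])  # indices of examples to train on next

# Four examples with per-example student losses; the teacher keeps half.
selected = teacher_select([0.1, 2.0, 0.5, 1.5], keep_ratio=0.5)
```

Here `selected` picks out the two highest-loss examples, so the student spends its next step on the data it currently handles worst.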

Image Classification • Machine Translation +1

Multi-branch Attentive Transformer

1 code implementation18 Jun 2020 Yang Fan, Shufang Xie, Yingce Xia, Lijun Wu, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

While the multi-branch architecture is one of the key ingredients to the success of computer vision tasks, it has not been well investigated in natural language processing, especially sequence learning tasks.

Code Generation • Machine Translation +2

FenceMask: A Data Augmentation Approach for Pre-extracted Image Features

no code implementations14 Jun 2020 Pu Li, Xiang-Yang Li, Xiang Long

It is based on the 'simulation of object occlusion' strategy, which aims to balance object occlusion against information retention in the input data.

Data Augmentation • Fine-Grained Visual Categorization +1

Review of Text Style Transfer Based on Deep Learning

no code implementations6 May 2020 Xiang-Yang Li, Guo Pu, Keyu Ming, Pu Li, Jie Wang, Yuxuan Wang

In traditional text style transfer models, text style generally relies on expert knowledge and hand-designed rules, but with the application of deep learning to natural language processing, deep-learning-based text style transfer methods have begun to be heavily researched.

Style Transfer • Text Style Transfer

Comprehensive and Efficient Data Labeling via Adaptive Model Scheduling

no code implementations8 Feb 2020 Mu Yuan, Lan Zhang, Xiang-Yang Li, Hui Xiong

With limited computing resources and stringent delay constraints, given a data stream and a collection of applicable resource-hungry deep-learning models, we design a novel approach that adaptively schedules a subset of these models to execute on each data item, aiming to maximize the value of the model output (e.g., the number of high-confidence labels).
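A minimal sketch of the scheduling idea, assuming a simple value-per-cost greedy heuristic rather than the paper's actual algorithm (the model names, values, and costs are invented):

```python
def schedule_models(models, budget):
    """Greedily pick models by value-per-cost until the compute budget is spent.

    `models` maps a model name to (expected_value, cost). This is a classic
    knapsack-style heuristic, not the adaptive scheduler from the paper.
    """
    ranked = sorted(models.items(),
                    key=lambda kv: kv[1][0] / kv[1][1], reverse=True)
    chosen, spent = [], 0.0
    for name, (value, cost) in ranked:
        if spent + cost <= budget:  # only run models that still fit the budget
            chosen.append(name)
            spent += cost
    return chosen

models = {"detector": (10, 5), "classifier": (6, 2), "captioner": (4, 4)}
```

With `budget=7`, the heuristic runs the classifier (ratio 3) and detector (ratio 2) but skips the captioner; an adaptive scheduler would additionally condition these choices on each data item.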

Image Retrieval • Management +3

Weighted Laplacian and Its Theoretical Applications

no code implementations23 Nov 2019 Shijie Xu, Jiayan Fang, Xiang-Yang Li

In this paper, we develop a novel weighted Laplacian method, which is partially inspired by the theory of graph Laplacian, to study recent popular graph problems, such as multilevel graph partitioning and balanced minimum cut problem, in a more convenient manner.
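For context, standard spectral partitioning on a weighted graph Laplacian $L = D - W$ looks like the sketch below; the paper's weighted-Laplacian method differs in how its weights are derived, so this only illustrates the baseline theory it builds on (the example graph is invented):

```python
import numpy as np

def spectral_bisect(W):
    """Split a weighted graph in two via the Fiedler vector of L = D - W."""
    D = np.diag(W.sum(axis=1))          # weighted degree matrix
    L = D - W                           # (unnormalized) graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)
    fiedler = eigvecs[:, 1]             # eigenvector of 2nd-smallest eigenvalue
    return fiedler >= 0                 # sign gives the two-way partition

# Two triangles (nodes 0-2 and 3-5) joined by one weak edge.
W = np.zeros((6, 6))
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1),
         (2, 3, 0.1)]
for i, j, w in edges:
    W[i, j] = W[j, i] = w
labels = spectral_bisect(W)
```

The sign pattern of the Fiedler vector separates the two triangles across the weak bridge, which is the behavior balanced-minimum-cut methods generalize.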

Clustering • graph partitioning

Multifaceted Analysis of Fine-Tuning in Deep Model for Visual Recognition

no code implementations11 Jul 2019 Xiang-Yang Li, Luis Herranz, Shuqiang Jiang

In this paper, we introduce and systematically investigate several factors that influence the performance of fine-tuning for visual recognition.

Unsupervised Pivot Translation for Distant Languages

no code implementations ACL 2019 Yichong Leng, Xu Tan, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

In this work, we introduce unsupervised pivot translation for distant languages, which translates a language to a distant language through multiple hops, and the unsupervised translation on each hop is relatively easier than the original direct translation.

Machine Translation • NMT +1

Learning to Teach

no code implementations ICLR 2018 Yang Fan, Fei Tian, Tao Qin, Xiang-Yang Li, Tie-Yan Liu

Teaching plays a very important role in our society, by spreading human knowledge and educating our next generations.

BIG-bench Machine Learning • Image Classification

Scene recognition with CNNs: objects, scales and dataset bias

no code implementations CVPR 2016 Luis Herranz, Shuqiang Jiang, Xiang-Yang Li

Thus, adapting the feature extractor to each particular scale (i.e., scale-specific CNNs) is crucial to improve recognition, since the objects in the scenes have their specific range of scales.

Scene Recognition

Towards Distribution-Free Multi-Armed Bandits with Combinatorial Strategies

no code implementations20 Jul 2013 Xiang-Yang Li, Shaojie Tang, Yaqin Zhou

At each decision epoch, we select a strategy, i.e., a subset of RVs, subject to arbitrary constraints on constituent RVs.

Multi-Armed Bandits
