Search Results for author: Yi Ren

Found 115 papers, 45 papers with code

FastSpeech: Fast, Robust and Controllable Text to Speech

21 code implementations NeurIPS 2019 Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrogram in parallel for TTS.

Ranked #10 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Speech Synthesis Text-To-Speech Synthesis

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

32 code implementations ICLR 2021 Yi Ren, Chenxu Hu, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

In this paper, we propose FastSpeech 2, which addresses the issues in FastSpeech and better solves the one-to-many mapping problem in TTS by 1) directly training the model with ground-truth target instead of the simplified output from teacher, and 2) introducing more variation information of speech (e.g., pitch, energy and more accurate duration) as conditional inputs.

Ranked #6 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Knowledge Distillation Speech Synthesis +1

FastSpeech: Fast, Robust and Controllable Text-to-Speech

11 code implementations 22 May 2019 Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control).

Text-To-Speech Synthesis

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism

7 code implementations 6 May 2021 Jinglin Liu, Chengxi Li, Yi Ren, Feiyang Chen, Zhou Zhao

Singing voice synthesis (SVS) systems are built to synthesize high-quality and expressive singing voice, in which the acoustic model generates the acoustic features (e.g., mel-spectrogram) given a music score.

Generative Adversarial Network Singing Voice Synthesis +1

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

1 code implementation 25 Apr 2023 Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Zhou Zhao, Shinji Watanabe

In this work, we propose a multi-modal AI system named AudioGPT, which complements LLMs (i.e., ChatGPT) with 1) foundation models to process complex audio information and solve numerous understanding and generation tasks; and 2) the input/output interface (ASR, TTS) to support spoken dialogue.

SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint

1 code implementation 9 Dec 2020 Zhonghao Sheng, Kaitao Song, Xu Tan, Yi Ren, Wei Ye, Shikun Zhang, Tao Qin

Automatic song writing aims to compose a song (lyric and/or melody) by machine, which is an interesting topic in both academia and industry.

Sentence

PortaSpeech: Portable and High-Quality Generative Text-to-Speech

3 code implementations NeurIPS 2021 Yi Ren, Jinglin Liu, Zhou Zhao

Non-autoregressive text-to-speech (NAR-TTS) models such as FastSpeech 2 and Glow-TTS can synthesize high-quality speech from the given text in parallel.

Text-To-Speech Synthesis Vocal Bursts Intensity Prediction +1

Learning the Beauty in Songs: Neural Singing Voice Beautifier

3 code implementations ACL 2022 Jinglin Liu, Chengxi Li, Yi Ren, Zhiying Zhu, Zhou Zhao

Furthermore, we propose a latent-mapping algorithm in the latent space to convert the amateur vocal tone to the professional one.

Dynamic Time Warping

GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis

1 code implementation 31 Jan 2023 Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Jinzheng He, Zhou Zhao

Generating photo-realistic video portrait with arbitrary speech audio is a crucial problem in film-making and virtual reality.

Lip Reading Talking Face Generation +1

ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech

3 code implementations 13 Jul 2022 Rongjie Huang, Zhou Zhao, Huadai Liu, Jinglin Liu, Chenye Cui, Yi Ren

Through the preliminary study on diffusion model parameterization, we find that previous gradient-based TTS models require hundreds or thousands of iterations to guarantee high sample quality, which poses a challenge for accelerating sampling.

Denoising Knowledge Distillation +3

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models

1 code implementation 30 Jan 2023 Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao

Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the complexity of modeling long continuous audio data.

Audio Generation Text-to-Video Generation +1

PopMAG: Pop Music Accompaniment Generation

1 code implementation 18 Aug 2020 Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu

To improve harmony, in this paper, we propose a novel MUlti-track MIDI representation (MuMIDI), which enables simultaneous multi-track generation in a single sequence and explicitly models the dependency of the notes from different tracks.

Music Modeling

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

2 code implementations 21 Apr 2022 Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao

Also, FastDiff enables a sampling speed of 58x faster than real-time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time.

Ranked #7 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Denoising Speech Synthesis +2

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech

2 code implementations 15 May 2022 Rongjie Huang, Yi Ren, Jinglin Liu, Chenye Cui, Zhou Zhao

Style transfer for out-of-domain (OOD) speech synthesis aims to generate speech samples with unseen style (e.g., speaker identity, emotion, and prosody) derived from an acoustic reference, while facing the following challenges: 1) The highly dynamic style features in expressive voice are difficult to model and transfer; and 2) the TTS models should be robust enough to handle diverse OOD conditions that differ from the source data.

Speech Synthesis Style Transfer +1

TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation

1 code implementation 25 May 2022 Rongjie Huang, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, Jinzheng He, Zhou Zhao

Specifically, a sequence of discrete representations derived in a self-supervised manner are predicted from the model and passed to a vocoder for speech reconstruction, while still facing the following challenges: 1) Acoustic multimodality: the discrete units derived from speech with same content could be indeterministic due to the acoustic property (e.g., rhythm, pitch, and energy), which causes deterioration of translation accuracy; 2) high latency: current S2ST systems utilize autoregressive models which predict each unit conditioned on the sequence previously generated, failing to take full advantage of parallelism.

Representation Learning Speech Synthesis +2

Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus

1 code implementation MM '21: Proceedings of the 29th ACM International Conference on Multimedia 2021 Rongjie Huang, Feiyang Chen, Yi Ren, Jinglin Liu, Chenye Cui, Zhou Zhao

High-fidelity multi-singer singing voice synthesis is challenging for neural vocoder due to the singing voice data shortage, limited singer generalization, and large computational cost.

Audio Generation Singing Voice Synthesis +1

Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech

1 code implementation 5 Jun 2022 Ziyue Jiang, Zhe Su, Zhou Zhao, Qian Yang, Yi Ren, Jinglin Liu, Zhenhui Ye

This paper tackles the polyphone disambiguation problem from a concise and novel perspective: we propose Dict-TTS, a semantic-aware generative text-to-speech model with an online website dictionary (the existing prior information in the natural language).

Polyphone disambiguation

Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective

1 code implementation NeurIPS 2023 Pengfei Wei, Lingdong Kong, Xinghua Qu, Yi Ren, Zhiqiang Xu, Jing Jiang, Xiang Yin

Specifically, we consider the generation of cross-domain videos from two sets of latent factors, one encoding the static information and another encoding the dynamic information.

Action Recognition Disentanglement +1

Parallel and High-Fidelity Text-to-Lip Generation

1 code implementation 14 Jul 2021 Jinglin Liu, Zhiying Zhu, Yi Ren, Wencan Huang, Baoxing Huai, Nicholas Yuan, Zhou Zhao

However, the AR decoding manner generates current lip frame conditioned on frames generated previously, which inherently hinders the inference speed, and also has a detrimental effect on the quality of generated lip frames due to error propagation.

Talking Face Generation Text-to-Face Generation +1

MUG: A General Meeting Understanding and Generation Benchmark

1 code implementation 24 Mar 2023 Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao

To prompt SLP advancement, we establish a large-scale general Meeting Understanding and Generation Benchmark (MUG) to benchmark the performance of a wide range of SLP tasks, including topic segmentation, topic-level and session-level extractive summarization and topic title generation, keyphrase extraction, and action item detection.

Extractive Summarization Keyphrase Extraction +1

Multilingual Neural Machine Translation with Knowledge Distillation

1 code implementation ICLR 2019 Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu

Multilingual machine translation, which translates multiple languages with a single model, has attracted much attention due to its efficiency of offline training and online serving.

Knowledge Distillation Machine Translation +1

Diffusion Denoising Process for Perceptron Bias in Out-of-distribution Detection

1 code implementation 21 Nov 2022 Luping Liu, Yi Ren, Xize Cheng, Rongjie Huang, Chongxuan Li, Zhou Zhao

In this paper, we introduce a new perceptron bias assumption that suggests discriminator models are more sensitive to certain features of the input, leading to the overconfidence problem.

Denoising Out-of-Distribution Detection +1

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling

1 code implementation 19 Dec 2023 Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li

Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.

Contrastive Learning Speech Synthesis

Benchmarking Large Multimodal Models against Common Corruptions

1 code implementation 22 Jan 2024 Jiawei Zhang, Tianyu Pang, Chao Du, Yi Ren, Bo Li, Min Lin

This technical report aims to fill a deficiency in the assessment of large multimodal models (LMMs) by specifically examining the self-consistency of their outputs when subjected to common corruptions.

Benchmarking

MultiSpeech: Multi-Speaker Text to Speech with Transformer

1 code implementation 8 Jun 2020 Mingjian Chen, Xu Tan, Yi Ren, Jin Xu, Hao Sun, Sheng Zhao, Tao Qin, Tie-Yan Liu

Transformer-based text to speech (TTS) model (e.g., Transformer TTS, FastSpeech) has shown the advantages of training and inference efficiency over RNN-based model (e.g., Tacotron) due to its parallel computation in training and/or inference.

Practical Quasi-Newton Methods for Training Deep Neural Networks

1 code implementation NeurIPS 2020 Donald Goldfarb, Yi Ren, Achraf Bahamou

We consider the development of practical stochastic quasi-Newton, and in particular Kronecker-factored block-diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs).

Improving Item Cold-start Recommendation via Model-agnostic Conditional Variational Autoencoder

1 code implementation 27 May 2022 Xu Zhao, Yi Ren, Ying Du, Shenzheng Zhang, Nian Wang

This paper attempts to tackle the item cold-start problem by generating enhanced warmed-up ID embeddings for cold items with historical data and limited interaction records.

News Recommendation Recommendation Systems

Slate-Aware Ranking for Recommendation

1 code implementation 24 Feb 2023 Yi Ren, Xiao Han, Xu Zhao, Shenzheng Zhang, Yan Zhang

Therefore, the ranking stage is still essential for most applications to provide high-quality candidate set for the re-ranking stage.

Recommendation Systems Re-Ranking

One-Shot Generation of Near-Optimal Topology through Theory-Driven Machine Learning

1 code implementation 27 Jul 2018 Ruijin Cang, Hope Yao, Yi Ren

We introduce a theory-driven mechanism for learning a neural network model that performs generative topology design in one shot given a problem setting, circumventing the conventional iterative process that computational design tasks usually entail.

BIG-bench Machine Learning

Compositional Languages Emerge in a Neural Iterated Learning Model

1 code implementation ICLR 2020 Yi Ren, Shangmin Guo, Matthieu Labeau, Shay B. Cohen, Simon Kirby

The principle of compositionality, which enables natural language to represent complex concepts via a structured combination of simpler ones, allows us to convey an open-ended set of messages using a limited vocabulary.

Detector Guidance for Multi-Object Text-to-Image Generation

1 code implementation 4 Jun 2023 Luping Liu, Zijian Zhang, Yi Ren, Rongjie Huang, Xiang Yin, Zhou Zhao

Previous works identify the problem of information mixing in the CLIP text encoder and introduce the T5 text encoder or incorporate strong prior knowledge to assist with the alignment.

Object object-detection +2

Video-Guided Curriculum Learning for Spoken Video Grounding

1 code implementation 1 Sep 2022 Yan Xia, Zhou Zhao, Shangwei Ye, Yang Zhao, Haoyuan Li, Yi Ren

To rectify the discriminative phonemes and extract video-related information from noisy audio, we develop a novel video-guided curriculum learning (VGCL) during the audio pre-training process, which can make use of the vital visual perceptions to help understand the spoken language and suppress the external noise.

Video Grounding

Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release

1 code implementation 28 May 2020 Yadu Babuji, Ben Blaiszik, Tom Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Ian Foster, Zhi Hong, Shantenu Jha, Zhuozhao Li, Xuefeng Liu, Arvind Ramanathan, Yi Ren, Nicholaus Saint, Marcus Schwarting, Rick Stevens, Hubertus van Dam, Rick Wagner

Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

BIG-bench Machine Learning

Inductive Bias and Language Expressivity in Emergent Communication

1 code implementation 4 Dec 2020 Shangmin Guo, Yi Ren, Agnieszka Słowik, Kory Mathewson

Referential games and reconstruction games are the most common game types for studying emergent languages.

Inductive Bias

Expressivity of Emergent Language is a Trade-off between Contextual Complexity and Unpredictability

1 code implementation 7 Jun 2021 Shangmin Guo, Yi Ren, Kory Mathewson, Simon Kirby, Stefano V. Albrecht, Kenny Smith

Researchers are using deep learning models to explore the emergence of language in various language games, where agents interact and develop an emergent language to solve tasks.

Improving Direct Physical Properties Prediction of Heterogeneous Materials from Imaging Data via Convolutional Neural Network and a Morphology-Aware Generative Model

1 code implementation 7 Dec 2017 Ruijin Cang, Hechao Li, Hope Yao, Yang Jiao, Yi Ren

Direct prediction of material properties from microstructures through statistical models has shown to be a potential approach to accelerating computational material design with large design spaces.

Computational Physics Materials Science

Targeted Attack on Deep RL-based Autonomous Driving with Learned Visual Patterns

1 code implementation 16 Sep 2021 Prasanth Buddareddygari, Travis Zhang, Yezhou Yang, Yi Ren

This paper investigates the feasibility of targeted attacks through visually learned patterns placed on physical objects in the environment, a threat model that combines the practicality and effectiveness of the existing ones.

Autonomous Driving

Tensor Normal Training for Deep Learning Models

1 code implementation NeurIPS 2021 Yi Ren, Donald Goldfarb

Based on the so-called tensor normal (TN) distribution, we propose and analyze a brand new approximate natural gradient method, Tensor Normal Training (TNT), which like Shampoo, only requires knowledge of the shape of the training parameters.

Second-order methods

Toward Global Sensing Quality Maximization: A Configuration Optimization Scheme for Camera Networks

1 code implementation 28 Nov 2022 Xuechao Zhang, Xuda Ding, Yi Ren, Yu Zheng, Chongrong Fang, Jianping He

Then, we form a single quantity that measures the sensing quality of the targets by the camera network.

Attributing Image Generative Models using Latent Fingerprints

1 code implementation 17 Apr 2023 GuangYu Nie, Changhoon Kim, Yezhou Yang, Yi Ren

This paper investigates the use of latent semantic dimensions as fingerprints, from where we can analyze the effects of design variables, including the choice of fingerprinting dimensions, strength, and capacity, on the accuracy-quality tradeoff.

Attribute

Unbiased Pairwise Learning to Rank in Recommender Systems

1 code implementation 25 Nov 2021 Yi Ren, Hongyan Tang, Siwen Zhu

To provide personalized high quality recommendation results, conventional systems usually train pointwise rankers to predict the absolute value of objectives and leverage a distinct shallow tower to estimate and alleviate the impact of position bias.

Attribute Learning-To-Rank +2

Better Supervisory Signals by Observing Learning Paths

1 code implementation ICLR 2022 Yi Ren, Shangmin Guo, Danica J. Sutherland

Observing the learning path not only provides a new perspective for understanding knowledge distillation, overfitting, and learning dynamics, but also reveals that the supervisory signal of a teacher network can be very unstable near the best points in training on real tasks.

Knowledge Distillation

Gear Training: A new way to implement high-performance model-parallel training

no code implementations 11 Jun 2018 Hao Dong, Shuai Li, Dongchang Xu, Yi Ren, Di Zhang

The training of Deep Neural Networks usually needs tremendous computing resources.

Example-Based Image Synthesis via Randomized Patch-Matching

no code implementations 23 Sep 2016 Yi Ren, Yaniv Romano, Michael Elad

Image and texture synthesis is a challenging task that has long been drawing attention in the fields of image processing, graphics, and machine learning.

Image Generation Patch Matching +1

Augmenting Model Robustness with Transformation-Invariant Attacks

no code implementations 31 Jan 2019 Houpu Yao, Zhe Wang, GuangYu Nie, Yassine Mazboudi, Yezhou Yang, Yi Ren

The vulnerability of neural networks under adversarial attacks has raised serious concerns and motivated extensive research.

Image Cropping Translation

Low-cost Measurement of Industrial Shock Signals via Deep Learning Calibration

no code implementations 7 Feb 2019 Houpu Yao, Jingjing Wen, Yi Ren, Bin Wu, Ze Ji

The results show that the proposed network is capable of mapping low-end shock signals to their high-end counterparts with satisfactory accuracy.

Image Decomposition and Classification through a Generative Model

no code implementations 9 Feb 2019 Houpu Yao, Malcolm Regan, Yezhou Yang, Yi Ren

We demonstrate in this paper that a generative model can be designed to perform classification tasks under challenging settings, including adversarial attacks and input distribution shifts.

Classification General Classification

Almost Unsupervised Text to Speech and Automatic Speech Recognition

no code implementations 13 May 2019 Yi Ren, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

Text to speech (TTS) and automatic speech recognition (ASR) are two dual tasks in speech processing and both achieve impressive performance thanks to the recent advance in deep learning and large amount of aligned speech and text data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks

no code implementations 5 Jun 2019 Yi Ren, Donald Goldfarb

We present practical Levenberg-Marquardt variants of Gauss-Newton and natural gradient methods for solving non-convex optimization problems that arise in training deep neural networks involving enormous numbers of variables and huge data sets.

A Study of Multilingual Neural Machine Translation

no code implementations 25 Dec 2019 Xu Tan, Yichong Leng, Jiale Chen, Yi Ren, Tao Qin, Tie-Yan Liu

Multilingual neural machine translation (NMT) has recently been investigated from different aspects (e.g., pivot translation, zero-shot translation, fine-tuning, or training from scratch) and in different settings (e.g., rich resource and low resource, one-to-many, and many-to-one translation).

Machine Translation NMT +1

How Shall I Drive? Interaction Modeling and Motion Planning towards Empathetic and Socially-Graceful Driving

no code implementations 28 Jan 2019 Yi Ren, Steven Elliott, Yiwei Wang, Yezhou Yang, Wenlong Zhang

While intelligence of autonomous vehicles (AVs) has significantly advanced in recent years, accidents involving AVs suggest that these autonomous systems lack gracefulness in driving when interacting with human drivers.

Robotics Computer Science and Game Theory

A Study of Non-autoregressive Model for Sequence Generation

no code implementations ACL 2020 Yi Ren, Jinglin Liu, Xu Tan, Zhou Zhao, Sheng Zhao, Tie-Yan Liu

In this work, we conduct a study to understand the difficulty of NAR sequence generation and try to answer: (1) Why NAR models can catch up with AR models in some tasks but not all?

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

UWSpeech: Speech to Speech Translation for Unwritten Languages

no code implementations 14 Jun 2020 Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Ke-jun Zhang, Tie-Yan Liu

Existing speech to speech translation systems heavily rely on the text of target language: they usually translate source language either to target text and then synthesize target speech from text, or directly to target speech with target text for auxiliary training.

speech-recognition Speech Recognition +2

SimulSpeech: End-to-End Simultaneous Speech to Text Translation

no code implementations ACL 2020 Yi Ren, Jinglin Liu, Xu Tan, Chen Zhang, Tao Qin, Zhou Zhao, Tie-Yan Liu

In this work, we develop SimulSpeech, an end-to-end simultaneous speech to text translation system which translates speech in source language to text in target language concurrently.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7

DeepSinger: Singing Voice Synthesis with Data Mined From the Web

no code implementations 9 Jul 2020 Yi Ren, Xu Tan, Tao Qin, Jian Luan, Zhou Zhao, Tie-Yan Liu

DeepSinger has several advantages over previous SVS systems: 1) to the best of our knowledge, it is the first SVS system that directly mines training data from music websites, 2) the lyrics-to-singing alignment model further avoids any human efforts for alignment labeling and greatly reduces labeling cost, 3) the singing model based on a feed-forward Transformer is simple and efficient, by removing the complicated acoustic feature modeling in parametric synthesis and leveraging a reference encoder to capture the timbre of a singer from noisy singing data, and 4) it can synthesize singing voices in multiple languages and multiple singers.

Sentence Singing Voice Synthesis

FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire

no code implementations 6 Aug 2020 Jinglin Liu, Yi Ren, Zhou Zhao, Chen Zhang, Baoxing Huai, Nicholas Jing Yuan

NAR lipreading is a challenging task that has many difficulties: 1) the discrepancy of sequence lengths between source and target makes it difficult to estimate the length of the output sequence; 2) the conditionally independent behavior of NAR generation lacks the correlation across time which leads to a poor approximation of target distribution; 3) the feature representation ability of encoder can be weak due to lack of effective alignment mechanism; and 4) the removal of AR language model exacerbates the inherent ambiguity problem of lipreading.

Language Modelling Lipreading

LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition

no code implementations 9 Aug 2020 Jin Xu, Xu Tan, Yi Ren, Tao Qin, Jian Li, Sheng Zhao, Tie-Yan Liu

However, there are more than 6,000 languages in the world and most languages lack speech training data, which poses significant challenges when building TTS and ASR systems for extremely low-resource languages.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Decentralized Attribution of Generative Models

no code implementations ICLR 2021 Changhoon Kim, Yi Ren, Yezhou Yang

Growing applications of generative models have led to new threats such as malicious personation and digital copyright infringement.

Invariant Deep Compressible Covariance Pooling for Aerial Scene Categorization

no code implementations 11 Nov 2020 Shidong Wang, Yi Ren, Gerard Parr, Yu Guan, Ling Shao

In this article, we propose a novel invariant deep compressible covariance pooling (IDCCP) to solve nuisance variations in aerial scene categorization.

Image Categorization

The Period-Luminosity Relations of Red Supergiants in M33 and M31

no code implementations 20 Feb 2019 Yi Ren, B. W. Jiang, Ming Yang, Jian Gao

The period-luminosity (P-L) relation is analyzed for the RSGs in the fundamental mode.

Solar and Stellar Astrophysics Astrophysics of Galaxies

TSSRGCN: Temporal Spectral Spatial Retrieval Graph Convolutional Network for Traffic Flow Forecasting

no code implementations 30 Nov 2020 Xu Chen, Yuanxing Zhang, Lun Du, Zheng Fang, Yi Ren, Kaigui Bian, Kunqing Xie

Further analysis indicates that the locality and globality of the traffic networks are critical to traffic flow prediction and the proposed TSSRGCN model can adapt to the various temporal traffic patterns.

Retrieval

Denoising Text to Speech with Frame-Level Noise Modeling

no code implementations 17 Dec 2020 Chen Zhang, Yi Ren, Xu Tan, Jinglin Liu, Kejun Zhang, Tao Qin, Sheng Zhao, Tie-Yan Liu

In DenoiSpeech, we handle real-world noisy speech by modeling the fine-grained frame-level noise with a noise condition module, which is jointly trained with the TTS model.

Denoising

Evolved Massive Stars at Low-metallicity IV. Using 1.6 $μ$m "H-bump" to identify red supergiant stars: a case study of NGC 6822

no code implementations 21 Jan 2021 Ming Yang, Alceste Z. Bonanos, Biwei Jiang, Man I Lam, Jian Gao, Panagiotis Gavras, Grigoris Maravelias, Shu Wang, Xiao-Dian Chen, Frank Tramper, Yi Ren, Zoi T. Spetsieri

Further separating RSG candidates from the rest of the LSG candidates is done by using semi-empirical criteria on NIR CMDs and resulted in 323 RSG candidates.

Solar and Stellar Astrophysics Astrophysics of Galaxies

Kronecker-factored Quasi-Newton Methods for Deep Learning

no code implementations 12 Feb 2021 Yi Ren, Achraf Bahamou, Donald Goldfarb

Several improvements to the methods in Goldfarb et al. (2020) are also proposed that can be applied to both MLPs and CNNs.

Second-order methods

Graph Intention Network for Click-through Rate Prediction in Sponsored Search

no code implementations 30 Mar 2021 Feng Li, Zhenrui Chen, Pengjie Wang, Yi Ren, Di Zhang, Xiaoyu Zhu

Moreover, it is difficult for users to jump out of their specific historical behaviors for possible interest exploration, namely the weak generalization problem.

Click-Through Rate Prediction Graph Learning

EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model

no code implementations 17 Jun 2021 Chenye Cui, Yi Ren, Jinglin Liu, Feiyang Chen, Rongjie Huang, Ming Lei, Zhou Zhao

Finally, by showing a comparable performance in the emotional speech synthesis task, we successfully demonstrate the ability of the proposed model.

Emotional Speech Synthesis Emotion Classification

Data-Driven Learning of 3-Point Correlation Functions as Microstructure Representations

1 code implementation 6 Sep 2021 Sheng Cheng, Yang Jiao, Yi Ren

This paper considers the open challenge of identifying complete, concise, and explainable quantitative microstructure representations for disordered heterogeneous material systems.

Bayesian Optimization

Data Generation Method for Learning a Low-dimensional Safe Region in Safe Reinforcement Learning

no code implementations 10 Sep 2021 Zhehua Zhou, Ozgur S. Oguz, Yi Ren, Marion Leibold, Martin Buss

Safe reinforcement learning aims to learn a control policy while ensuring that neither the system nor the environment gets damaged during the learning process.

reinforcement-learning Reinforcement Learning (RL) +1

SynCLR: A Synthesis Framework for Contrastive Learning of out-of-domain Speech Representations

no code implementations 29 Sep 2021 Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Zhou Zhao, Yi Ren

Learning generalizable speech representations for unseen samples in different domains has been a challenge with ever increasing importance to date.

Contrastive Learning Data Augmentation +4

PhaseFool: Phase-oriented Audio Adversarial Examples via Energy Dissipation

no code implementations 29 Sep 2021 Ziyue Jiang, Yi Ren, Zhou Zhao

In this work, we propose a novel phase-oriented algorithm named PhaseFool that can efficiently construct imperceptible audio adversarial examples with energy dissipation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Expressivity of Emergent Languages is a Trade-off between Contextual Complexity and Unpredictability

no code implementations ICLR 2022 Shangmin Guo, Yi Ren, Kory Wallace Mathewson, Simon Kirby, Stefano V Albrecht, Kenny Smith

Researchers are using deep learning models to explore the emergence of language in various language games, where simulated agents interact and develop an emergent language to solve a task.

FedSpeech: Federated Text-to-Speech with Continual Learning

no code implementations 14 Oct 2021 Ziyue Jiang, Yi Ren, Ming Lei, Zhou Zhao

Federated learning enables collaborative training of machine learning models under strict privacy restrictions and federated text-to-speech aims to synthesize natural speech of multiple users with a few audio training samples stored in their devices locally.

Continual Learning Federated Learning

MIC: Model-agnostic Integrated Cross-channel Recommenders

no code implementations 22 Oct 2021 Yujie Lu, Ping Nie, Shengyu Zhang, Ming Zhao, Ruobing Xie, William Yang Wang, Yi Ren

However, existing work is primarily built upon pre-defined retrieval channels, including User-CF (U2U), Item-CF (I2I), and Embedding-based Retrieval (U2I), and thus has access only to the limited correlation between users and items that derives solely from partial information of latent interactions.

Recommendation Systems Retrieval +2

MR-SVS: Singing Voice Synthesis with Multi-Reference Encoder

no code implementations 11 Jan 2022 Shoutong Wang, Jinglin Liu, Yi Ren, Zhen Wang, Changliang Xu, Zhou Zhao

However, they face several challenges: 1) the fixed-size speaker embedding is not powerful enough to capture full details of the target timbre; 2) single reference audio does not contain sufficient timbre information of the target speaker; 3) the pitch inconsistency between different speakers also leads to a degradation in the generated voice.

Singing Voice Synthesis

Video-based Facial Micro-Expression Analysis: A Survey of Datasets, Features and Algorithms

no code implementations 30 Jan 2022 Xianye Ben, Yi Ren, Junping Zhang, Su-Jing Wang, Kidiyo Kpalma, Weixiao Meng, Yong-Jin Liu

Unlike the conventional facial expressions, micro-expressions are involuntary and transient facial expressions capable of revealing the genuine emotions that people attempt to hide.

A Mini-Block Fisher Method for Deep Neural Networks

no code implementations 8 Feb 2022 Achraf Bahamou, Donald Goldfarb, Yi Ren

Specifically, our method uses a block-diagonal approximation to the empirical Fisher matrix, where for each layer in the DNN, whether it is convolutional or feed-forward and fully connected, the associated diagonal block is itself block-diagonal and is composed of a large number of mini-blocks of modest size.

Second-order methods

ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech

no code implementations 16 Feb 2022 Yi Ren, Ming Lei, Zhiying Huang, Shiliang Zhang, Qian Chen, Zhijie Yan, Zhou Zhao

Specifically, we first introduce a word-level prosody encoder, which quantizes the low-frequency band of the speech and compresses prosody attributes in the latent prosody vector (LPV).

Revisiting Over-Smoothness in Text to Speech

no code implementations ACL 2022 Yi Ren, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu

Then we conduct a comprehensive study on NAR-TTS models that use some advanced modeling methods.

SSR-GNNs: Stroke-based Sketch Representation with Graph Neural Networks

no code implementations 27 Apr 2022 Sheng Cheng, Yi Ren, Yezhou Yang

This paper follows cognitive studies to investigate a graph representation for sketches, where the information of strokes, i. e., parts of a sketch, are encoded on vertices and information of inter-stroke on edges.

Approximating Discontinuous Nash Equilibrial Values of Two-Player General-Sum Differential Games

no code implementations 5 Jul 2022 Lei Zhang, Mukesh Ghimire, Wenlong Zhang, Zhe Xu, Yi Ren

This paper investigates two potential solutions to this problem: a hybrid method that leverages both supervised Nash equilibria and the HJI PDE, and a value-hardening method where a sequence of HJIs are solved with a gradually hardening reward.

Autonomous Driving Self-Supervised Learning

A Study of Syntactic Multi-Modality in Non-Autoregressive Machine Translation

no code implementations NAACL 2022 Kexun Zhang, Rui Wang, Xu Tan, Junliang Guo, Yi Ren, Tao Qin, Tie-Yan Liu

Furthermore, we take the best of both and design a new loss function to better handle the complicated syntactic multi-modality in real-world datasets.

Machine Translation Translation

DA$^2$ Dataset: Toward Dexterity-Aware Dual-Arm Grasping

no code implementations 31 Jul 2022 Guangyao Zhai, Yu Zheng, Ziwei Xu, Xin Kong, Yong liu, Benjamin Busam, Yi Ren, Nassir Navab, Zhengyou Zhang

In this paper, we introduce DA$^2$, the first large-scale dual-arm dexterity-aware dataset for the generation of optimal bimanual grasping pairs for arbitrary large objects.

Principles for generation of reverberation

no code implementations 11 Nov 2022 Yi Ren, Yanyang Xiao, Guo-Qiang Bi, Pek-Ming Lau

With these understandings, we developed a STDP based learning rule which could drive the network to remember any presupposed sequence.

VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement

no code implementations 19 Nov 2022 Chenye Cui, Yi Ren, Jinglin Liu, Rongjie Huang, Zhou Zhao

In this paper, we pose the task of generating sound with a specific timbre given a video input and a reference audio sample.

Disentanglement

XNLI: Explaining and Diagnosing NLI-based Visual Data Analysis

no code implementations 25 Jan 2023 Yingchaojie Feng, Xingbo Wang, Bo Pan, Kam Kwai Wong, Yi Ren, Shi Liu, Zihan Yan, Yuxin Ma, Huamin Qu, Wei Chen

Our research explores how to provide explanations for NLIs to help users locate the problems and further revise the queries.

Data Visualization

Learning 6-DoF Fine-grained Grasp Detection Based on Part Affordance Grounding

no code implementations 27 Jan 2023 Yaoxian Song, Penglei Sun, Yi Ren, Yu Zheng, Yue Zhang

To evaluate the effectiveness, we perform multi-level difficulty part language grounding grasping experiments and deploy our proposed model on a real robot.

Representation Learning Robotic Grasping

How to prepare your task head for finetuning

no code implementations 11 Feb 2023 Yi Ren, Shangmin Guo, Wonho Bae, Danica J. Sutherland

We identify a significant trend in the effect of changes in this initial energy on the resulting features after fine-tuning.

Item Cold Start Recommendation via Adversarial Variational Auto-encoder Warm-up

no code implementations 28 Feb 2023 Shenzheng Zhang, Qi Tan, Xinzhi Zheng, Yi Ren, Xu Zhao

The gap between the randomly initialized item ID embedding and the well-trained warm item ID embedding makes the cold items hard to suit the recommendation system, which is trained on the data of historical warm items.

News Recommendation

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG)

no code implementations 24 Mar 2023 Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao

ICASSP2023 General Meeting Understanding and Generation Challenge (MUG) focuses on prompting a wide range of spoken language processing (SLP) research on meeting transcripts, as SLP applications are critical to improve users' efficiency in grasping important information in meetings.

Extractive Summarization Keyphrase Extraction

Unbiased Pairwise Learning from Implicit Feedback for Recommender Systems without Biased Variance Control

no code implementations 11 Apr 2023 Yi Ren, Hongyan Tang, Jiangpeng Rong, Siwen Zhu

As pairwise learning suits well for the ranking tasks, the previously proposed unbiased pairwise learning algorithm already achieves state-of-the-art performance.

Recommendation Systems

GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation

no code implementations 1 May 2023 Zhenhui Ye, Jinzheng He, Ziyue Jiang, Rongjie Huang, Jiawei Huang, Jinglin Liu, Yi Ren, Xiang Yin, Zejun Ma, Zhou Zhao

Recently, neural radiance field (NeRF) has become a popular rendering technique in this field since it could achieve high-fidelity and 3D-consistent talking face generation with a few-minute-long training video.

motion prediction Talking Face Generation

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation

no code implementations 24 May 2023 Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin, Zhou Zhao

Direct speech-to-speech translation (S2ST) aims to convert speech from one language into another, and has demonstrated significant progress to date.

Speech-to-Speech Translation Translation

Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation

no code implementations 29 May 2023 Jiawei Huang, Yi Ren, Rongjie Huang, Dongchao Yang, Zhenhui Ye, Chen Zhang, Jinglin Liu, Xiang Yin, Zejun Ma, Zhou Zhao

Finally, we use LLMs to augment and transform a large amount of audio-label data into audio-text datasets to alleviate the problem of scarcity of temporal data.

Audio Generation Denoising +2

Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias

no code implementations 6 Jun 2023 Ziyue Jiang, Yi Ren, Zhenhui Ye, Jinglin Liu, Chen Zhang, Qian Yang, Shengpeng Ji, Rongjie Huang, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao

3) We further use a VQGAN-based acoustic model to generate the spectrogram and a latent code language model to fit the distribution of prosody, since prosody changes quickly over time in a sentence, and language models can capture both local and long-range dependencies.

Attribute Inductive Bias +3

GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech

no code implementations 27 Jun 2023 Yahuan Cong, Haoyu Zhang, Haopeng Lin, Shichao Liu, Chunfeng Wang, Yi Ren, Xiang Yin, Zejun Ma

Cross-lingual timbre and style generalizable text-to-speech (TTS) aims to synthesize speech with a specific reference timbre or style that is never trained in the target language.

Disentanglement Style Generalization

Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis

no code implementations 14 Jul 2023 Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Zhenhui Ye, Shengpeng Ji, Qian Yang, Chen Zhang, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao

However, the prompting mechanisms of zero-shot TTS still face challenges in the following aspects: 1) previous works of zero-shot TTS are typically trained with single-sentence prompts, which significantly restricts their performance when the data is relatively sufficient during the inference stage.

In-Context Learning Language Modelling +3

Unbiased Image Synthesis via Manifold-Driven Sampling in Diffusion Models

no code implementations 17 Jul 2023 Xingzhe Su, Yi Ren, Wenwen Qiang, Zeen Song, Hang Gao, Fengge Wu, Changwen Zheng

Diffusion models are a potent class of generative models capable of producing high-quality images.

Image Generation

Disentangled Counterfactual Reasoning for Unbiased Sequential Recommendation

no code implementations 5 Aug 2023 Yi Ren, Xu Zhao, Hongyan Tang, Shuai Li

In this paper, we propose a structural causal model-based method to address the popularity bias issue for sequential recommendation model learning.

counterfactual Counterfactual Reasoning +1

Deep Mutual Learning across Task Towers for Effective Multi-Task Recommender Learning

no code implementations 19 Sep 2023 Yi Ren, Ying Du, Bin Wang, Shenzheng Zhang

Recommender systems usually leverage multi-task learning methods to simultaneously optimize several objectives because of the multi-faceted user behavior data.

Multi-Task Learning Recommendation Systems

AdaFlood: Adaptive Flood Regularization

no code implementations 6 Nov 2023 Wonho Bae, Yi Ren, Mohamad Osama Ahmed, Frederick Tung, Danica J. Sutherland, Gabriel L. Oliveira

Although neural networks are conventionally optimized towards zero training loss, it has been recently learned that targeting a non-zero training loss threshold, referred to as a flood level, often enables better test time generalization.

Value Approximation for Two-Player General-Sum Differential Games with State Constraints

no code implementations 28 Nov 2023 Lei Zhang, Mukesh Ghimire, Wenlong Zhang, Zhe Xu, Yi Ren

Solving Hamilton-Jacobi-Isaacs (HJI) PDEs enables equilibrial feedback control in two-player differential games, yet faces the curse of dimensionality (CoD).

Physics-informed machine learning

Pontryagin Neural Operator for Solving Parametric General-Sum Differential Games

no code implementations 3 Jan 2024 Lei Zhang, Mukesh Ghimire, Zhe Xu, Wenlong Zhang, Yi Ren

To address these challenges, we propose in this paper a Pontryagin-mode neural operator that outperforms existing state-of-the-art (SOTA) on safety performance across games with parametric state constraints.

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis

no code implementations 16 Jan 2024 Zhenhui Ye, Tianyun Zhong, Yi Ren, Jiaqi Yang, Weichuang Li, Jiawei Huang, Ziyue Jiang, Jinzheng He, Rongjie Huang, Jinglin Liu, Chen Zhang, Xiang Yin, Zejun Ma, Zhou Zhao

One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio to generate a talking portrait video.

3D Reconstruction Super-Resolution +1

Sample Relationship from Learning Dynamics Matters for Generalisation

no code implementations 16 Jan 2024 Shangmin Guo, Yi Ren, Stefano V. Albrecht, Kenny Smith

Although much research has been done on proposing new models or loss functions to improve the generalisation of artificial neural networks (ANNs), less attention has been directed to the impact of the training data on generalisation.

ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition

1 code implementation 27 Mar 2024 Weidong Xie, Lun Luo, Nanfei Ye, Yi Ren, Shaoyi Du, Minhang Wang, Jintao Xu, Rui Ai, Weihao Gu, Xieyuanli Chen

Experimental results on the KITTI dataset show that our proposed methods achieve state-of-the-art performance while running in real time.

Depth Estimation
