no code implementations • 12 May 2022 • Le Zhang, Zichao Yang, Diyi Yang
Data augmentation is an effective approach to tackling overfitting.
1 code implementation • 29 Dec 2021 • Xiaonan Nie, Shijie Cao, Xupeng Miao, Lingxiao Ma, Jilong Xue, Youshan Miao, Zichao Yang, Zhi Yang, Bin Cui
However, we find that the current practice of jointly training the experts and the sparse gate degrades model accuracy, diminishing the return on expensive large-scale model training.
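For context, a minimal sketch of the sparse top-k gating that such mixture-of-experts models typically learn jointly with the experts (all names and shapes here are illustrative, not the paper's code):

```python
import torch
import torch.nn.functional as F

def topk_gate(x, w_gate, k=2):
    # x: (batch, d_model) token representations
    # w_gate: (d_model, n_experts) gating weights, learned jointly with the experts
    logits = x @ w_gate                      # (batch, n_experts)
    topk_vals, topk_idx = logits.topk(k, dim=-1)
    # Softmax over the selected experts only; all others get exactly zero weight.
    weights = torch.zeros_like(logits)
    weights.scatter_(-1, topk_idx, F.softmax(topk_vals, dim=-1))
    return weights                           # sparse routing weights

x = torch.randn(4, 16)
w_gate = torch.randn(16, 8)
print(topk_gate(x, w_gate, k=2))
```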
no code implementations • 29 Sep 2021 • Wentao Zhang, Zeang Sheng, Mingyu Yang, Yang Li, Yu Shen, Zhi Yang, Zichao Yang, Bin Cui
First, GNNs can learn higher-order structural information by stacking more layers, but cannot go very deep because of the over-smoothing issue.
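To see over-smoothing concretely, here is a toy demonstration (not from the paper): repeated neighborhood averaging, i.e., stacking parameter-free propagation layers, drives all node features toward the same vector.

```python
import torch

# A 4-node toy graph; adj[i, j] = 1 if nodes i and j are connected.
adj = torch.tensor([[0., 1., 1., 0.],
                    [1., 0., 1., 0.],
                    [1., 1., 0., 1.],
                    [0., 0., 1., 0.]])
prop = adj / adj.sum(dim=1, keepdim=True)   # row-normalized propagation
x = torch.randn(4, 3)                       # random node features
for _ in range(50):                         # "stacking more layers"
    x = prop @ x
print(x)                                    # rows are nearly identical
```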
1 code implementation • 29 Jun 2021 • Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Junwei Bao, Zhen Li, Xiaodong He, Shuguang Cui, Zhiting Hu
Such a training objective is sub-optimal when the target sequence is not perfect, e.g., when it is corrupted with noise or when only weak sequence supervision is available.
1 code implementation • EMNLP 2020 • Jiaao Chen, Zhenghui Wang, Ran Tian, Zichao Yang, Diyi Yang
Named Entity Recognition (NER) is one of the first stages in deep language understanding yet current NER models heavily rely on human-annotated data.
1 code implementation • NAACL 2021 • Bowen Tan, Zichao Yang, Maruan Al-Shedivat, Eric P. Xing, Zhiting Hu
However, as our systematic examination reveals, it is still challenging for such models to generate coherent long passages of text (e.g., 1,000 tokens), especially when the models are fine-tuned to the target domain on a small corpus.
2 code implementations • ACL 2020 • Jiaao Chen, Zichao Yang, Diyi Yang
This paper presents MixText, a semi-supervised learning method for text classification, which uses our newly designed data augmentation method called TMix.
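A minimal sketch of the hidden-state interpolation idea behind TMix (illustrative code, not the authors' implementation; the interpolation coefficient is drawn from a Beta distribution, as is conventional for mixup-style methods):

```python
import torch

def tmix(hidden_a, hidden_b, y_a, y_b, alpha=0.75):
    # Interpolate intermediate encoder states of two batches and mix
    # their (one-hot) labels with the same coefficient.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    lam = torch.max(lam, 1 - lam)            # keep the mix closer to batch a
    h = lam * hidden_a + (1 - lam) * hidden_b
    y = lam * y_a + (1 - lam) * y_b
    return h, y                              # resume the forward pass from h

h_a, h_b = torch.randn(8, 32, 768), torch.randn(8, 32, 768)
y_a = torch.eye(4)[torch.randint(4, (8,))]   # one-hot labels, 4 classes
y_b = torch.eye(4)[torch.randint(4, (8,))]
h_mix, y_mix = tmix(h_a, h_b, y_a, y_b)
```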
no code implementations • 10 Nov 2019 • Chao Zhang, Zichao Yang, Xiaodong He, Li Deng
This review provides a comprehensive analysis of recent works on multimodal deep learning from three perspectives: learning multimodal representations, fusing multimodal signals at various levels, and multimodal applications.
no code implementations • NAACL 2019 • Diyi Yang, Jiaao Chen, Zichao Yang, Dan Jurafsky, Eduard Hovy
Modeling what makes a request persuasive - eliciting the desired response from a reader - is critical to the study of propaganda, behavioral economics, and advertising.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shuai Lin, Wentao Wang, Zichao Yang, Xiaodan Liang, Frank F. Xu, Eric Xing, Zhiting Hu
That is, the model learns to imitate the writing style of any given exemplar sentence, with automatic adaptations to faithfully describe the content record.
no code implementations • 24 Nov 2018 • Bowen Tan, Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Eric Xing
Reinforcement learning such as policy gradient addresses the issue but can have prohibitively poor exploration efficiency.
no code implementations • ICLR Workshop drlStructPred 2019 • Bowen Tan*, Zhiting Hu*, Zichao Yang, Ruslan Salakhutdinov, Eric P. Xing
We present a generalized entropy-regularized policy optimization formulation, and show that the apparently divergent algorithms can all be reformulated as special instances of the framework, the only differences being the configuration of the reward function and a couple of hyperparameters.
no code implementations • 27 Sep 2018 • Wentao Wang, Zhiting Hu, Zichao Yang, Haoran Shi, Eric P. Xing
Neural text generation models such as recurrent networks are typically trained by maximizing the data log-likelihood, i.e., with a cross-entropy loss.
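For reference, a minimal teacher-forcing sketch of the cross-entropy (maximum-likelihood) training the excerpt refers to (vocabulary size and dimensions are made up):

```python
import torch
import torch.nn as nn

vocab, dim = 1000, 64
embed = nn.Embedding(vocab, dim)
rnn = nn.LSTM(dim, dim, batch_first=True)
proj = nn.Linear(dim, vocab)

tokens = torch.randint(vocab, (8, 20))       # a batch of target sequences
inputs, targets = tokens[:, :-1], tokens[:, 1:]
hidden, _ = rnn(embed(inputs))
logits = proj(hidden)                        # (batch, seq_len - 1, vocab)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab),
                                   targets.reshape(-1))
loss.backward()                              # maximize data log-likelihood
```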
3 code implementations • ACL 2019 • Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He, Lianhui Qin, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Wanrong Zhu, Devendra Singh Sachan, Eric P. Xing
The versatile toolkit also fosters technique sharing across different text generation tasks.
no code implementations • WS 2018 • Zhiting Hu, Zichao Yang, Tiancheng Zhao, Haoran Shi, Junxian He, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Lianhui Qin, Devendra Singh Chaplot, Bowen Tan, Xingjiang Yu, Eric Xing
The features make Texar particularly suitable for technique sharing and generalization across different text generation applications.
no code implementations • NeurIPS 2018 • Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Xiaodan Liang, Lianhui Qin, Haoye Dong, Eric Xing
The broad set of deep generative models (DGMs) has achieved remarkable advances.
1 code implementation • NeurIPS 2018 • Zichao Yang, Zhiting Hu, Chris Dyer, Eric P. Xing, Taylor Berg-Kirkpatrick
Binary classifiers are often employed as discriminators in GAN-based unsupervised style transfer systems to ensure that transferred sentences are similar to sentences in the target domain.
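A minimal sketch of the binary-discriminator setup the excerpt describes (this is the baseline being discussed, not the paper's proposal; the sentence encodings are placeholders):

```python
import torch
import torch.nn as nn

disc = nn.Linear(256, 1)                     # sentence encoding -> real/fake logit
bce = nn.BCEWithLogitsLoss()

real = torch.randn(8, 256)                   # encodings of target-domain sentences
fake = torch.randn(8, 256)                   # encodings of transferred sentences
d_loss = bce(disc(real), torch.ones(8, 1)) + bce(disc(fake), torch.zeros(8, 1))
g_loss = bce(disc(fake), torch.ones(8, 1))   # the generator is rewarded for fooling it
```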
no code implementations • ICLR 2018 • Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Eric P. Xing
Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), as emerging families of generative models, have largely been treated as two distinct paradigms and have received extensive independent study.
3 code implementations • ICML 2017 • Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, Eric P. Xing
Generic generation and manipulation of text is challenging and has seen limited success compared to recent deep generative modeling in the visual domain.
3 code implementations • ICML 2017 • Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, Taylor Berg-Kirkpatrick
Recent work on generative modeling of text has found that variational auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM language models (Bowman et al., 2015).
Ranked #3 on Text Generation on Yahoo Questions
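For context, such text VAEs optimize the evidence lower bound (ELBO); a minimal sketch of the (negative) objective, with a KL-annealing weight that is a common remedy when the decoder learns to ignore the latent code (all dimensions are illustrative):

```python
import torch
import torch.nn.functional as F

def neg_elbo(recon_logits, targets, mu, logvar, beta=1.0):
    # Reconstruction term: cross entropy of the decoder's token predictions.
    rec = F.cross_entropy(recon_logits.reshape(-1, recon_logits.size(-1)),
                          targets.reshape(-1), reduction="sum") / targets.size(0)
    # KL(q(z|x) || N(0, I)) for a diagonal-Gaussian posterior, in closed form.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / mu.size(0)
    return rec + beta * kl

recon_logits = torch.randn(8, 20, 1000, requires_grad=True)
targets = torch.randint(1000, (8, 20))
mu, logvar = torch.randn(8, 32), torch.randn(8, 32)
print(neg_elbo(recon_logits, targets, mu, logvar, beta=0.5))
```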
no code implementations • EMNLP 2017 • Zichao Yang, Phil Blunsom, Chris Dyer, Wang Ling
We propose a general class of language models that treat reference as an explicit stochastic latent variable.
Ranked #1 on Recipe Generation on allrecipes.com
no code implementations • EACL 2017 • Zichao Yang, Zhiting Hu, Yuntian Deng, Chris Dyer, Alex Smola
Knowing which words have been attended to in previous time steps while generating a translation is a rich source of information for predicting what words will be attended to in the future.
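A sketch of the general idea, where accumulated past attention biases the next attention distribution (an illustration only, not the paper's exact parameterization):

```python
import torch
import torch.nn.functional as F

def attend_with_history(keys, query, history, w_hist=1.0):
    scores = keys @ query + w_hist * history   # bias scores by past attention
    att = F.softmax(scores, dim=-1)            # (src_len,)
    context = att @ keys                       # attended source summary
    return context, history + att              # carry history to the next step

keys = torch.randn(10, 64)                     # encoder states for 10 source words
history = torch.zeros(10)                      # cumulative attention mass
for step in range(3):
    query = torch.randn(64)                    # decoder state at this step
    context, history = attend_with_history(keys, query, history)
```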
15 code implementations • CVPR 2016 • Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Smola
Thus, we develop a multiple-layer stacked attention network (SAN) in which we query an image multiple times to infer the answer progressively.
Ranked #5 on Visual Question Answering on VQA v1 test-std
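A minimal sketch of one attention hop in this spirit, applied twice to query the image progressively (dimensions and module names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHop(nn.Module):
    def __init__(self, d=512, k=256):
        super().__init__()
        self.w_img, self.w_q = nn.Linear(d, k), nn.Linear(d, k)
        self.w_att = nn.Linear(k, 1)

    def forward(self, img, q):
        # img: (batch, regions, d) image features; q: (batch, d) question query
        h = torch.tanh(self.w_img(img) + self.w_q(q).unsqueeze(1))
        att = F.softmax(self.w_att(h).squeeze(-1), dim=-1)  # (batch, regions)
        ctx = (att.unsqueeze(-1) * img).sum(dim=1)          # attended image vector
        return q + ctx                      # refined query for the next hop

img, q = torch.randn(2, 196, 512), torch.randn(2, 512)
hop1, hop2 = AttentionHop(), AttentionHop()
answer_query = hop2(img, hop1(img, q))      # query the image twice, progressively
```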
1 code implementation • ICCV 2015 • Zichao Yang, Marcin Moczulski, Misha Denil, Nando de Freitas, Alex Smola, Le Song, Ziyu Wang
The fully connected layers of a deep convolutional neural network typically contain over 90% of the network parameters, and consume the majority of the memory required to store the network parameters.
Ranked #56 on Image Classification on MNIST
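A back-of-the-envelope check of that 90% figure, using rough AlexNet-like layer sizes (the numbers are illustrative approximations, not from the paper):

```python
conv_params = 2_300_000           # rough total for the convolutional layers
fc_params = (
    256 * 6 * 6 * 4096            # flattened conv features -> fc6
    + 4096 * 4096                 # fc6 -> fc7
    + 4096 * 1000                 # fc7 -> class logits
)
total = conv_params + fc_params
print(f"fully connected share: {fc_params / total:.1%}")   # ~96%
```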
no code implementations • 19 Dec 2014 • Zichao Yang, Alexander J. Smola, Le Song, Andrew Gordon Wilson
Kernel methods have great promise for learning rich statistical representations of large modern datasets.