1 code implementation • NAACL 2022 • Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Junwei Bao, Zhen Li, Xiaodong He, Shuguang Cui, Zhiting Hu
Such a training objective is sub-optimal when the target sequence is not perfect, e.g., when it is corrupted with noise, or when only weak sequence supervision is available.
no code implementations • 29 Feb 2024 • Guangyi Liu, Yu Wang, Zeyu Feng, Qiyu Wu, Liping Tang, Yuan Gao, Zhen Li, Shuguang Cui, Julian McAuley, Eric P. Xing, Zichao Yang, Zhiting Hu
The vast applications of deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations -- across various data types, such as discrete text/protein sequences and continuous images.
no code implementations • 20 Aug 2023 • Kwok Ping Tsang, Zichao Yang
Based on a record of dissents on FOMC votes and transcripts of the meetings from 1976 to 2017, we develop a deep learning model based on the self-attention mechanism to create a measure of disagreement among members in each meeting.
1 code implementation • NeurIPS 2023 • Jiannan Xiang, Tianhua Tao, Yi Gu, Tianmin Shu, ZiRui Wang, Zichao Yang, Zhiting Hu
While large language models (LMs) have shown remarkable capabilities across numerous tasks, they often struggle with simple reasoning and planning in physical environments, such as understanding object permanence or planning household activities.
no code implementations • 8 Apr 2023 • Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui
We first present an empirical analysis of the problems and opportunities of training MoE models, which motivates us to overcome the routing imbalance and fluctuation problems with a dynamic expert management and device placement mechanism.
1 code implementation • 1 Aug 2022 • Guangyi Liu, Zeyu Feng, Yuan Gao, Zichao Yang, Xiaodan Liang, Junwei Bao, Xiaodong He, Shuguang Cui, Zhen Li, Zhiting Hu
This paper proposes a new efficient approach for composable text operations in the compact latent space of text.
Ranked #2 on Unsupervised Text Style Transfer on Yelp
1 code implementation • NAACL 2022 • Le Zhang, Zichao Yang, Diyi Yang
Data augmentation is an effective approach to tackling over-fitting.
1 code implementation • EMNLP 2020 • Jiaao Chen, Zhenghui Wang, Ran Tian, Zichao Yang, Diyi Yang
Named Entity Recognition (NER) is one of the first stages in deep language understanding yet current NER models heavily rely on human-annotated data.
1 code implementation • NAACL 2021 • Bowen Tan, Zichao Yang, Maruan Al-Shedivat, Eric P. Xing, Zhiting Hu
However, as our systematic examination reveals, it is still challenging for such models to generate coherent long passages of text (e.g., 1,000 tokens), especially when the models are fine-tuned to the target domain on a small corpus.
2 code implementations • ACL 2020 • Jiaao Chen, Zichao Yang, Diyi Yang
This paper presents MixText, a semi-supervised learning method for text classification, which uses our newly designed data augmentation method called TMix.
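At its core, TMix linearly interpolates the hidden representations and labels of two training examples, with the mixing ratio drawn from a Beta distribution. A minimal NumPy sketch of that interpolation (which encoder layer to mix at, and the Beta parameter, are details from the paper treated as given here):

```python
import numpy as np

def tmix(hidden_a, hidden_b, label_a, label_b, alpha=0.75, rng=None):
    """Interpolate two examples' hidden states and (one-hot) labels.

    Mixup-style interpolation on hidden representations: the mixing
    ratio lam is drawn from Beta(alpha, alpha) and biased toward the
    first example by taking max(lam, 1 - lam).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    lam = max(lam, 1 - lam)  # keep the mix closer to example a
    mixed_hidden = lam * hidden_a + (1 - lam) * hidden_b
    mixed_label = lam * label_a + (1 - lam) * label_b
    return mixed_hidden, mixed_label, lam
```

Because the labels are interpolated alongside the hidden states, unlabeled examples with guessed labels can be mixed into supervised batches, which is how MixText extends this to semi-supervised learning.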
no code implementations • 10 Nov 2019 • Chao Zhang, Zichao Yang, Xiaodong He, Li Deng
This review provides a comprehensive analysis of recent works on multimodal deep learning from three perspectives: learning multimodal representations, fusing multimodal signals at various levels, and multimodal applications.
no code implementations • NAACL 2019 • Diyi Yang, Jiaao Chen, Zichao Yang, Dan Jurafsky, Eduard Hovy
Modeling what makes a request persuasive - eliciting the desired response from a reader - is critical to the study of propaganda, behavioral economics, and advertising.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shuai Lin, Wentao Wang, Zichao Yang, Xiaodan Liang, Frank F. Xu, Eric Xing, Zhiting Hu
That is, the model learns to imitate the writing style of any given exemplar sentence, with automatic adaptations to faithfully describe the content record.
no code implementations • 24 Nov 2018 • Bowen Tan, Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Eric Xing
Reinforcement learning such as policy gradient addresses the issue but can have prohibitively poor exploration efficiency.
no code implementations • 27 Sep 2018 • Wentao Wang, Zhiting Hu, Zichao Yang, Haoran Shi, Eric P. Xing
Neural text generation models such as recurrent networks are typically trained by maximizing data log-likelihood based on cross entropy.
no code implementations • ICLR Workshop drlStructPred 2019 • Bowen Tan*, Zhiting Hu*, Zichao Yang, Ruslan Salakhutdinov, Eric P. Xing
We present a generalized entropy-regularized policy optimization formulation, and show that the apparently divergent algorithms can all be reformulated as special instances of the framework, with the only difference being the configurations of the reward function and a couple of hyperparameters.
4 code implementations • ACL 2019 • Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He, Lianhui Qin, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Wangrong Zhu, Devendra Singh Sachan, Eric P. Xing
The versatile toolkit also fosters technique sharing across different text generation tasks.
no code implementations • WS 2018 • Zhiting Hu, Zichao Yang, Tiancheng Zhao, Haoran Shi, Junxian He, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Lianhui Qin, Devendra Singh Chaplot, Bowen Tan, Xingjiang Yu, Eric Xing
The features make Texar particularly suitable for technique sharing and generalization across different text generation applications.
no code implementations • NeurIPS 2018 • Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Xiaodan Liang, Lianhui Qin, Haoye Dong, Eric Xing
The broad set of deep generative models (DGMs) has achieved remarkable advances.
1 code implementation • NeurIPS 2018 • Zichao Yang, Zhiting Hu, Chris Dyer, Eric P. Xing, Taylor Berg-Kirkpatrick
Binary classifiers are often employed as discriminators in GAN-based unsupervised style transfer systems to ensure that transferred sentences are similar to sentences in the target domain.
no code implementations • ICLR 2018 • Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Eric P. Xing
Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), as emerging families for generative model learning, have largely been considered two distinct paradigms and have received extensive independent study.
3 code implementations • ICML 2017 • Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, Eric P. Xing
Generic generation and manipulation of text is challenging and has seen limited success compared to recent deep generative modeling in the visual domain.
3 code implementations • ICML 2017 • Zichao Yang, Zhiting Hu, Ruslan Salakhutdinov, Taylor Berg-Kirkpatrick
Recent work on generative modeling of text has found that variational auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM language models (Bowman et al., 2015).
Ranked #3 on Text Generation on Yahoo Questions
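The gap is commonly attributed to "posterior collapse": with a powerful autoregressive LSTM decoder, the KL term of the VAE objective is driven to zero and the latent code is ignored (the paper's remedy is a different decoder architecture). A quick sketch of the diagnostic quantity, the closed-form Gaussian KL that vanishes when the approximate posterior equals the prior:

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims.

    In VAE training this term measures how much information the encoder
    puts into the latent code; values near zero across the dataset
    signal that the decoder is ignoring the latent variable.
    """
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)
```

A fully collapsed posterior (mu = 0, logvar = 0) gives exactly zero, which is the failure mode the entry above describes.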
no code implementations • EMNLP 2017 • Zichao Yang, Phil Blunsom, Chris Dyer, Wang Ling
We propose a general class of language models that treat reference as an explicit stochastic latent variable.
Ranked #1 on Recipe Generation on allrecipes.com
no code implementations • EACL 2017 • Zichao Yang, Zhiting Hu, Yuntian Deng, Chris Dyer, Alex Smola
Knowing which words have been attended to in previous time steps while generating a translation is a rich source of information for predicting what words will be attended to in the future.
16 code implementations • CVPR 2016 • Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Smola
Thus, we develop a multiple-layer SAN in which we query an image multiple times to infer the answer progressively.
Ranked #5 on Visual Question Answering (VQA) on VQA v1 test-std
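The progressive querying can be sketched as repeated attention over image region features, with each attended visual summary folded back into the query vector. This toy version omits the learned projection matrices and nonlinearities of the actual SAN layers and only shows the multi-hop structure:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def stacked_attention(image_regions, query, num_layers=2):
    """Attend over image regions repeatedly, refining the query each time.

    image_regions: (num_regions, d) region feature matrix
    query: (d,) question embedding
    Each layer scores every region against the current query, forms an
    attention-weighted visual summary, and adds it back into the query.
    """
    u = query
    for _ in range(num_layers):
        scores = image_regions @ u        # (num_regions,) relevance scores
        weights = softmax(scores)         # attention distribution over regions
        context = weights @ image_regions # (d,) attended visual summary
        u = u + context                   # refined query for the next hop
    return u
```

Each pass sharpens the attention toward the regions most relevant to the (refined) question, which is the "query the image multiple times" idea in the entry above.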
1 code implementation • ICCV 2015 • Zichao Yang, Marcin Moczulski, Misha Denil, Nando de Freitas, Alex Smola, Le Song, Ziyu Wang
The fully connected layers of a deep convolutional neural network typically contain over 90% of the network parameters, and consume the majority of the memory required to store the network parameters.
Ranked #54 on Image Classification on MNIST
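The ">90% of parameters" figure is easy to verify with back-of-the-envelope arithmetic; the sketch below uses AlexNet-style layer shapes (an assumption for illustration, biases omitted):

```python
# Weight counts for an AlexNet-like architecture (biases omitted),
# illustrating that the fully connected layers dominate the total.
conv_params = (
    11 * 11 * 3 * 96      # conv1: 11x11 kernels, 3 -> 96 channels
    + 5 * 5 * 96 * 256    # conv2
    + 3 * 3 * 256 * 384   # conv3
    + 3 * 3 * 384 * 384   # conv4
    + 3 * 3 * 384 * 256   # conv5
)
fc_params = (
    6 * 6 * 256 * 4096    # fc6: flattened 6x6x256 feature map -> 4096
    + 4096 * 4096         # fc7
    + 4096 * 1000         # fc8: 1000-way classifier
)
fc_fraction = fc_params / (conv_params + fc_params)
print(f"conv: {conv_params:,}  fc: {fc_params:,}  fc share: {fc_fraction:.1%}")
```

With these shapes the fully connected layers hold roughly 94% of the weights, which is why the paper targets them for compression.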
no code implementations • 19 Dec 2014 • Zichao Yang, Alexander J. Smola, Le Song, Andrew Gordon Wilson
Kernel methods have great promise for learning rich statistical representations of large modern datasets.