Search Results for author: Caiming Xiong

Found 185 papers, 85 papers with code

The Thieves on Sesame Street are Polyglots - Extracting Multilingual Models from Monolingual APIs

no code implementations EMNLP 2020 Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, Richard Socher

Pre-training in natural language processing makes it easier for an adversary with only query access to a victim model to reconstruct a local copy of the victim by training with gibberish input data paired with the victim's labels for that data.

Few-Shot Intent Classification by Gauging Entailment Relationship Between Utterance and Semantic Label

no code implementations EMNLP (NLP4ConvAI) 2021 Jin Qu, Kazuma Hashimoto, Wenhao Liu, Caiming Xiong, Yingbo Zhou

Compared with DNNC, our proposed method is more efficient in both training and serving, since it is based on entailment between the query utterance and the labels instead of all the training examples.

Intent Classification Natural Language Inference

Simple Data Augmentation with the Mask Token Improves Domain Adaptation for Dialog Act Tagging

no code implementations EMNLP 2020 Semih Yavuz, Kazuma Hashimoto, Wenhao Liu, Nitish Shirish Keskar, Richard Socher, Caiming Xiong

The concept of Dialogue Act (DA) is universal across different task-oriented dialogue domains - the act of "request" carries the same speaker intention whether it is for restaurant reservation or flight booking.
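
The augmentation itself is lightweight; a minimal sketch of mask-token replacement (function name and masking rate are hypothetical, not the paper's code):

```python
import random

def mask_augment(tokens, mask_token="[MASK]", p=0.15, seed=0):
    """Return a copy of the token list where each token is replaced by
    the mask token independently with probability p."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [mask_token if rng.random() < p else t for t in tokens]
```

Per the title, mixing such masked copies of source-domain dialogs into training is what improves domain adaptation for DA tagging.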

Data Augmentation Domain Generalization

RGRecSys: A Toolkit for Robustness Evaluation of Recommender Systems

1 code implementation 12 Jan 2022 Zohreh Ovaisi, Shelby Heinecke, Jia Li, Yongfeng Zhang, Elena Zheleva, Caiming Xiong

Robust machine learning is an increasingly important topic that focuses on developing models resilient to various forms of imperfect data.

Recommendation Systems

QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization

no code implementations 16 Dec 2021 Alexander R. Fabbri, Chien-Sheng Wu, Wenhao Liu, Caiming Xiong

In this work, we conduct an extensive comparison of entailment and QA-based metrics, demonstrating that carefully choosing the components of a QA-based metric is critical to performance.

Question Answering Question Generation +1

Value Retrieval with Arbitrary Queries for Form-like Documents

no code implementations 15 Dec 2021 Mingfei Gao, Le Xue, Chetan Ramaiah, Chen Xing, Ran Xu, Caiming Xiong

Unlike previous methods that only address a fixed set of field items, our method predicts the target value for an arbitrary query based on an understanding of the layout and semantics of a form.

Language Modelling

Towards Open Vocabulary Object Detection without Human-provided Bounding Boxes

no code implementations 18 Nov 2021 Mingfei Gao, Chen Xing, Juan Carlos Niebles, Junnan Li, Ran Xu, Wenhao Liu, Caiming Xiong

We propose an open vocabulary detection framework that can be trained without manually provided bounding-box annotations.

Object Detection

Dense Hierarchical Retrieval for Open-Domain Question Answering

1 code implementation Findings (EMNLP) 2021 Ye Liu, Kazuma Hashimoto, Yingbo Zhou, Semih Yavuz, Caiming Xiong, Philip S. Yu

In this work, we propose Dense Hierarchical Retrieval (DHR), a hierarchical framework that can generate accurate dense representations of passages by utilizing both macroscopic semantics in the document and microscopic semantics specific to each passage.

Open-Domain Question Answering

Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization

no code implementations 21 Oct 2021 Devansh Arpit, Huan Wang, Yingbo Zhou, Caiming Xiong

In Domain Generalization (DG) settings, models trained on a given set of training domains have notoriously chaotic performance on distribution shifted test domains, and stochasticity in optimization (e.g., seed) plays a big role.

Domain Generalization Model Selection

Improving Tail-Class Representation with Centroid Contrastive Learning

no code implementations 19 Oct 2021 Anthony Meng Huat Tiong, Junnan Li, Guosheng Lin, Boyang Li, Caiming Xiong, Steven C. H. Hoi

ICCL interpolates two images from a class-agnostic sampler and a class-aware sampler, and trains the model such that the representation of the interpolative image can be used to retrieve the centroids for both source classes.

Contrastive Learning Image Classification +1

Learning Rich Nearest Neighbor Representations from Self-supervised Ensembles

no code implementations 19 Oct 2021 Bram Wallace, Devansh Arpit, Huan Wang, Caiming Xiong

Pretraining convolutional neural networks via self-supervision, and applying them in transfer learning, is an incredibly fast-growing field that is rapidly and iteratively improving performance across practically all image domains.

Transfer Learning

Momentum Contrastive Autoencoder: Using Contrastive Learning for Latent Space Distribution Matching in WAE

no code implementations 19 Oct 2021 Devansh Arpit, Aadyot Bhatnagar, Huan Wang, Caiming Xiong

Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution.
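
As a toy illustration of the WAE objective (not the paper's method: the moment-matching penalty below is a simplified stand-in for its contrastive latent-matching term):

```python
import numpy as np

def wae_style_loss(x, x_hat, z, lam=1.0):
    """Reconstruction error plus a penalty pushing the batch of latent
    codes z toward a standard normal prior (first/second moments)."""
    recon = np.mean((x - x_hat) ** 2)
    mean_pen = np.sum(np.mean(z, axis=0) ** 2)  # E[z] should be 0
    cov_pen = np.sum((np.cov(z, rowvar=False) - np.eye(z.shape[1])) ** 2)  # Cov[z] should be I
    return recon + lam * (mean_pen + cov_pen)
```

Minimizing the reconstruction term alone gives a plain AE; the second term is what enforces the pre-specified prior on the latent space.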

Contrastive Learning Representation Learning

DialFact: A Benchmark for Fact-Checking in Dialogue

1 code implementation 15 Oct 2021 Prakhar Gupta, Chien-Sheng Wu, Wenhao Liu, Caiming Xiong

Fact-checking is an essential tool to mitigate the spread of misinformation and disinformation; however, it has often been used to verify formal single-sentence claims rather than casual conversational claims.

Fact Checking Misinformation

Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting

no code implementations 11 Oct 2021 Zahra Fatemi, Chen Xing, Wenhao Liu, Caiming Xiong

However, given the limited size of the gender-neutral data and its potential distributional mismatch with the original pre-training data, catastrophic forgetting would occur during the second-phase pre-training.

Coreference Resolution Fairness

Field Extraction from Forms with Unlabeled Data

no code implementations 8 Oct 2021 Mingfei Gao, Zeyuan Chen, Nikhil Naik, Kazuma Hashimoto, Caiming Xiong, Ran Xu

We propose a novel framework to conduct field extraction from forms with unlabeled data.

Robustness Evaluation of Transformer-based Form Field Extractors via Form Attacks

1 code implementation 8 Oct 2021 Le Xue, Mingfei Gao, Zeyuan Chen, Caiming Xiong, Ran Xu

We propose a novel framework to evaluate the robustness of transformer-based form field extraction methods via form attacks.

Optical Character Recognition

Modeling Dynamic Attributes for Next Basket Recommendation

no code implementations 23 Sep 2021 Yongjun Chen, Jia Li, Chenghao Liu, Chenxi Li, Markus Anderle, Julian McAuley, Caiming Xiong

However, properly integrating them into user interest models is challenging, since attribute dynamics can be diverse, such as time-interval-aware and periodic patterns.

Next-basket recommendation

RnG-KBQA: Generation Augmented Iterative Ranking for Knowledge Base Question Answering

1 code implementation 17 Sep 2021 Xi Ye, Semih Yavuz, Kazuma Hashimoto, Yingbo Zhou, Caiming Xiong

We present RnG-KBQA, a Rank-and-Generate approach for KBQA, which remedies the coverage issue with a generation model while preserving a strong generalization capability.

Entity Linking Knowledge Base Question Answering

Contrastive Self-supervised Sequential Recommendation with Robust Augmentation

1 code implementation 14 Aug 2021 Zhiwei Liu, Yongjun Chen, Jia Li, Philip S. Yu, Julian McAuley, Caiming Xiong

In this paper, we investigate the application of contrastive Self-Supervised Learning (SSL) to the sequential recommendation, as a way to alleviate some of these issues.

Contrastive Learning Self-Supervised Learning +1

A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning

no code implementations NeurIPS 2021 Pan Zhou, Caiming Xiong, Xiao-Tong Yuan, Steven Hoi

Although intuitive, such a naive label assignment strategy cannot reveal the underlying semantic similarity between a query and its positives and negatives, and impairs performance, since some negatives are semantically similar to the query or even share the same semantic class as the query.

Contrastive Learning Representation Learning +2

ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in Multi-Agent Simulations

no code implementations 10 Jun 2021 Eric Zhao, Alexander R. Trott, Caiming Xiong, Stephan Zheng

We introduce Epsilon-Robust Multi-Agent Simulation (ERMAS), a robust optimization framework for learning AI policies that are robust to such multiagent sim-to-real gaps.

Understanding the Under-Coverage Bias in Uncertainty Estimation

no code implementations NeurIPS 2021 Yu Bai, Song Mei, Huan Wang, Caiming Xiong

Estimating the data uncertainty in regression tasks is often done by learning a quantile function or a prediction interval of the true label conditioned on the input.
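
The tau-th conditional quantile is typically learned with the pinball (quantile) loss; a minimal sketch (not the paper's code):

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Pinball loss: under-predictions are weighted by tau and
    over-predictions by (1 - tau), so its minimizer is the tau-quantile."""
    diff = y - q_pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))
```

Per the title, the paper analyzes why quantiles learned this way can systematically under-cover the true label.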

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

no code implementations NeurIPS 2021 Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai

This offline result is the first that matches the sample complexity lower bound in this setting, and resolves a recent open question in offline RL.

Offline RL

Evaluating State-of-the-Art Classification Models Against Bayes Optimality

1 code implementation NeurIPS 2021 Ryan Theisen, Huan Wang, Lav R. Varshney, Caiming Xiong, Richard Socher

Moreover, we show that by varying the temperature of the learned flow models, we can generate synthetic datasets that closely resemble standard benchmark datasets, but with almost any desired Bayes error.

Unsupervised Out-of-Domain Detection via Pre-trained Transformers

1 code implementation ACL 2021 Keyang Xu, Tongzheng Ren, Shikun Zhang, Yihao Feng, Caiming Xiong

Deployed real-world machine learning applications are often subject to uncontrolled and even potentially malicious inputs.

BookSum: A Collection of Datasets for Long-form Narrative Summarization

1 code implementation 18 May 2021 Wojciech Kryściński, Nazneen Rajani, Divyansh Agarwal, Caiming Xiong, Dragomir Radev

The majority of available text summarization datasets include short-form source documents that lack long-range causal and temporal dependencies, and often contain strong layout and stylistic biases.

Abstractive Text Summarization

QAConv: Question Answering on Informative Conversations

1 code implementation 14 May 2021 Chien-Sheng Wu, Andrea Madotto, Wenhao Liu, Pascale Fung, Caiming Xiong

In total, we collect 34,204 QA pairs, including span-based, free-form, and unanswerable questions, from 10,259 selected conversations with both human-written and machine-generated questions.

Question Answering

Pseudo Siamese Network for Few-shot Intent Generation

no code implementations 3 May 2021 Congying Xia, Caiming Xiong, Philip Yu

PSN consists of two identical subnetworks with the same structure but different weights: an action network and an object network.

Intent Detection

Learning to Synthesize Data for Semantic Parsing

1 code implementation NAACL 2021 Bailin Wang, Wenpeng Yin, Xi Victoria Lin, Caiming Xiong

Moreover, explicitly modeling compositions using a PCFG leads to better exploration of unseen programs, thus generating more diverse data.

Domain Generalization Semantic Parsing +3

FeTaQA: Free-form Table Question Answering

1 code implementation 1 Apr 2021 Linyong Nan, Chiachun Hsieh, Ziming Mao, Xi Victoria Lin, Neha Verma, Rui Zhang, Wojciech Kryściński, Nick Schoelkopf, Riley Kong, Xiangru Tang, Murori Mutuma, Ben Rosand, Isabel Trindade, Renusree Bandaru, Jacob Cunningham, Caiming Xiong, Dragomir Radev

Existing table question answering datasets contain abundant factual questions that primarily evaluate the query and schema comprehension capability of a system, but they fail to include questions that require complex reasoning and integration of information due to the constraint of the associated short-form answers.

Question Answering Semantic Parsing +1

Causal-aware Safe Policy Improvement for Task-oriented dialogue

1 code implementation 10 Mar 2021 Govardana Sachithanandam Ramachandran, Kazuma Hashimoto, Caiming Xiong

This method gives guarantees on the dialogue policy's performance and also learns to shape rewards according to the intentions behind human responses, rather than just mimicking demonstration data; this, coupled with batch-RL, helps the overall sample efficiency of the framework.

Dialogue Management Text Generation

Structured Scene Memory for Vision-Language Navigation

1 code implementation CVPR 2021 Hanqing Wang, Wenguan Wang, Wei Liang, Caiming Xiong, Jianbing Shen

Recently, numerous algorithms have been developed to tackle the problem of vision-language navigation (VLN), i.e., requiring an agent to navigate 3D environments by following linguistic instructions.

Decision Making Vision-Language Navigation

Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games

no code implementations NeurIPS 2021 Yu Bai, Chi Jin, Huan Wang, Caiming Xiong

Real world applications such as economics and policy making often involve solving multi-agent games with two unique features: (1) The agents are inherently asymmetric and partitioned into leaders and followers; (2) The agents have different reward functions, thus the game is general-sum.

Localized Calibration: Metrics and Recalibration

no code implementations 22 Feb 2021 Rachel Luo, Aadyot Bhatnagar, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai, Shengjia Zhao, Stefano Ermon

Probabilistic classifiers output confidence scores along with their predictions, and these confidence scores must be well-calibrated (i.e., reflect the true probability of an event) to be meaningful and useful for downstream tasks.
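
Miscalibration of such confidence scores is commonly quantified with the expected calibration error (ECE); a small illustrative implementation (helper name hypothetical, and a global average rather than the paper's localized metrics):

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average the gap between
    each bin's accuracy and its mean confidence, weighted by bin size."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        # first bin is closed on the left so conf == 0 is not dropped
        in_bin = (conf <= hi) & ((conf > lo) if i else (conf >= lo))
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return ece
```

Roughly speaking, the paper's localized metrics refine this global average by conditioning calibration on neighborhoods of the input.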

Decision Making

Don't Just Blame Over-parametrization for Over-confidence: Theoretical Analysis of Calibration in Binary Classification

no code implementations 15 Feb 2021 Yu Bai, Song Mei, Huan Wang, Caiming Xiong

Modern machine learning models with high accuracy are often miscalibrated -- the predicted top probability does not reflect the actual accuracy, and tends to be over-confident.

Joint Energy-based Model Training for Better Calibrated Natural Language Understanding Models

1 code implementation EACL 2021 Tianxing He, Bryan McCann, Caiming Xiong, Ehsan Hosseini-Asl

In this work, we explore joint energy-based model (EBM) training during the finetuning of pretrained text encoders (e.g., RoBERTa) for natural language understanding (NLU) tasks.

Language Modelling Natural Language Understanding

Robustness Gym: Unifying the NLP Evaluation Landscape

2 code implementations NAACL 2021 Karan Goel, Nazneen Rajani, Jesse Vig, Samson Tan, Jason Wu, Stephan Zheng, Caiming Xiong, Mohit Bansal, Christopher Ré

Despite impressive performance on standard benchmarks, deep neural networks are often brittle when deployed in real-world systems.

Entity Linking

Noise-Robust Contrastive Learning

no code implementations 1 Jan 2021 Junnan Li, Caiming Xiong, Steven Hoi

In contrast to most existing methods, we combat noise by learning robust representation.

Contrastive Learning

Improved Uncertainty Post-Calibration via Rank Preserving Transforms

no code implementations 1 Jan 2021 Yu Bai, Tengyu Ma, Huan Wang, Caiming Xiong

In this paper, we propose Neural Rank Preserving Transforms (NRPT), a new post-calibration method that adjusts the output probabilities of a trained classifier using a calibrator of higher capacity, while maintaining its prediction accuracy.
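
Temperature scaling is the simplest rank-preserving post-hoc calibrator (per the abstract, NRPT replaces the single scalar with a higher-capacity monotone calibrator); a minimal sketch:

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def temperature_scale(logits, T):
    """Divide logits by a scalar T > 0: a strictly monotone transform,
    so the argmax (and hence accuracy) is unchanged while confidence is
    softened (T > 1) or sharpened (T < 1)."""
    return softmax(np.asarray(logits, dtype=float) / T)
```

Because the transform preserves the ranking of logits, recalibration cannot hurt prediction accuracy, which is the property NRPT is built to keep.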

Text Classification

Neural Bayes: A Generic Parameterization Method for Unsupervised Learning

no code implementations 1 Jan 2021 Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio

Disjoint Manifold Separation: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.

Representation Learning

Momentum Contrastive Autoencoder

no code implementations 1 Jan 2021 Devansh Arpit, Aadyot Bhatnagar, Huan Wang, Caiming Xiong

Quantitatively, we show that our algorithm achieves a new state-of-the-art FID of 54.36 on CIFAR-10, and performs competitively with existing models on CelebA in terms of FID score.

Contrastive Learning Representation Learning

ERMAS: Learning Policies Robust to Reality Gaps in Multi-Agent Simulations

no code implementations 1 Jan 2021 Eric Zhao, Alexander R. Trott, Caiming Xiong, Stephan Zheng

Policies for real-world multi-agent problems, such as optimal taxation, can be learned in multi-agent simulations with AI agents that emulate humans.

Meta-Learning

Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing

2 code implementations Findings of the Association for Computational Linguistics 2020 Xi Victoria Lin, Richard Socher, Caiming Xiong

We present BRIDGE, a powerful sequential architecture for modeling dependencies between natural language questions and relational databases in cross-DB semantic parsing.

Deep Attention Semantic Parsing +1

Learning from Mistakes: Using Mis-predictions as Harm Alerts in Language Pre-Training

no code implementations 16 Dec 2020 Chen Xing, Wenhao Liu, Caiming Xiong

According to recent studies and our empirical observations, one possible reason is that some easy-to-fit patterns in the training data, such as frequently co-occurring word combinations, dominate and harm pre-training, making it hard for the model to fit more complex information.

CTRLsum: Towards Generic Controllable Text Summarization

1 code implementation 8 Dec 2020 Junxian He, Wojciech Kryściński, Bryan McCann, Nazneen Rajani, Caiming Xiong

Our approach enables users to control multiple aspects of generated summaries by interacting with the summarization system through textual input in the form of a set of keywords or descriptive prompts.

Reading Comprehension Text Summarization

GAEA: Graph Augmentation for Equitable Access via Reinforcement Learning

1 code implementation 7 Dec 2020 Govardana Sachithanandam Ramachandran, Ivan Brugere, Lav R. Varshney, Caiming Xiong

Similarly, social networks within universities and organizations may enable certain groups to more easily access people with valuable information or influence.

Adapt-and-Adjust: Overcoming the Long-Tail Problem of Multilingual Speech Recognition

no code implementations 3 Dec 2020 Genta Indra Winata, Guangsen Wang, Caiming Xiong, Steven Hoi

One crucial challenge of real-world multilingual speech recognition is the long-tailed distribution problem, where some resource-rich languages like English have abundant training data, but a long tail of low-resource languages have varying amounts of limited training data.

Multi-Task Learning Speech Recognition

What's New? Summarizing Contributions in Scientific Literature

no code implementations 6 Nov 2020 Hiroaki Hayashi, Wojciech Kryściński, Bryan McCann, Nazneen Rajani, Caiming Xiong

To overcome this problem, we introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work, making it easier to identify the key findings shared in articles.

Probing Task-Oriented Dialogue Representation from Language Models

no code implementations EMNLP 2020 Chien-Sheng Wu, Caiming Xiong

This paper investigates pre-trained language models to find out which model intrinsically carries the most informative representation for task-oriented dialogue tasks.

Language Modelling Model Selection

CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers

1 code implementation ICLR 2021 Shiyang Li, Semih Yavuz, Kazuma Hashimoto, Jia Li, Tong Niu, Nazneen Rajani, Xifeng Yan, Yingbo Zhou, Caiming Xiong

Dialogue state trackers have made significant progress on benchmark datasets, but their generalization capability to novel and realistic scenarios beyond the held-out conversations is less understood.

Ranked #2 on Multi-domain Dialogue State Tracking on MULTIWOZ 2.1 (using extra training data)

Dialogue State Tracking Multi-domain Dialogue State Tracking

Unsupervised Paraphrasing with Pretrained Language Models

no code implementations EMNLP 2021 Tong Niu, Semih Yavuz, Yingbo Zhou, Nitish Shirish Keskar, Huan Wang, Caiming Xiong

To enforce a surface form dissimilar from the input, whenever the language model emits a token contained in the source sequence, DB prevents the model from outputting the subsequent source token for the next generation step.
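
A deterministic toy version of this blocking rule (the paper's Dynamic Blocking is applied inside decoding; names here are hypothetical):

```python
def blocked_next_tokens(source_tokens, last_emitted):
    """If the token just emitted occurs in the source sequence, return
    the set of source tokens that immediately follow it there; the
    decoder would then suppress these tokens at the next step, forcing
    a surface form different from the input."""
    blocked = set()
    for i, tok in enumerate(source_tokens[:-1]):
        if tok == last_emitted:
            blocked.add(source_tokens[i + 1])
    return blocked
```

In the actual decoder, these blocked tokens would have their probabilities zeroed out before sampling the next token.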

Language Modelling Paraphrase Generation +1

Online Structured Meta-learning

no code implementations NeurIPS 2020 Huaxiu Yao, Yingbo Zhou, Mehrdad Mahdavi, Zhenhui Li, Richard Socher, Caiming Xiong

When a new task is encountered, it constructs a meta-knowledge pathway by either utilizing the most relevant knowledge blocks or exploring new blocks.

Meta-Learning

Explaining and Improving Model Behavior with k Nearest Neighbor Representations

no code implementations 18 Oct 2020 Nazneen Fatema Rajani, Ben Krause, Wenpeng Yin, Tong Niu, Richard Socher, Caiming Xiong

Interpretability techniques in NLP have mainly focused on understanding individual predictions using attention visualization or gradient-based saliency maps over tokens.

Natural Language Inference

How Important is the Train-Validation Split in Meta-Learning?

no code implementations 12 Oct 2020 Yu Bai, Minshuo Chen, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang, Caiming Xiong

A common practice in meta-learning is to perform a train-validation split (the "train-val" method), where the prior adapts to the task on one split of the data, and the resulting predictor is evaluated on another split.

Meta-Learning

Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning

no code implementations NeurIPS 2020 Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Steven Hoi, Weinan E

The result shows that (1) the escaping time of both SGD and ADAM depends on the Radon measure of the basin positively and the heaviness of gradient noise negatively; (2) for the same basin, SGD enjoys smaller escaping time than ADAM, mainly because (a) the geometry adaptation in ADAM via adaptively scaling each gradient coordinate well diminishes the anisotropic structure in gradient noise and results in larger Radon measure of a basin; (b) the exponential gradient average in ADAM smooths its gradient and leads to lighter gradient noise tails than SGD.

Representation Learning for Sequence Data with Deep Autoencoding Predictive Components

2 code implementations ICLR 2021 Junwen Bai, Weiran Wang, Yingbo Zhou, Caiming Xiong

We propose Deep Autoencoding Predictive Components (DAPC) -- a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.

Contrastive Learning Representation Learning +1

Universal Natural Language Processing with Limited Annotations: Try Few-shot Textual Entailment as a Start

1 code implementation EMNLP 2020 Wenpeng Yin, Nazneen Fatema Rajani, Dragomir Radev, Richard Socher, Caiming Xiong

We demonstrate that this framework enables a pretrained entailment model to work well on new entailment domains in a few-shot setting, and show its effectiveness as a unified solver for several downstream NLP tasks such as question answering and coreference resolution when the end-task annotations are limited.

Coreference Resolution Natural Language Inference +1

Discern: Discourse-Aware Entailment Reasoning Network for Conversational Machine Reading

1 code implementation EMNLP 2020 Yifan Gao, Chien-Sheng Wu, Jingjing Li, Shafiq Joty, Steven C. H. Hoi, Caiming Xiong, Irwin King, Michael R. Lyu

Based on the learned EDU and entailment representations, we either reply to the user with our final decision "yes/no/irrelevant" to the initial question, or generate a follow-up question to elicit more information.

Decision Making Discourse Segmentation +2

GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing

1 code implementation ICLR 2021 Tao Yu, Chien-Sheng Wu, Xi Victoria Lin, Bailin Wang, Yi Chern Tan, Xinyi Yang, Dragomir Radev, Richard Socher, Caiming Xiong

We present GraPPa, an effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data.

Language Modelling Semantic Parsing +1

MoPro: Webly Supervised Learning with Momentum Prototypes

1 code implementation ICLR 2021 Junnan Li, Caiming Xiong, Steven C. H. Hoi

We propose momentum prototypes (MoPro), a simple contrastive learning method that achieves online label noise correction, out-of-distribution sample removal, and representation learning.

Contrastive Learning Image Classification +2

SummEval: Re-evaluating Summarization Evaluation

5 code implementations 24 Jul 2020 Alexander R. Fabbri, Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher, Dragomir Radev

The scarcity of comprehensive up-to-date studies on evaluation metrics for text summarization and the lack of consensus regarding evaluation protocols continue to inhibit progress.

Text Summarization

Theory-Inspired Path-Regularized Differential Network Architecture Search

1 code implementation NeurIPS 2020 Pan Zhou, Caiming Xiong, Richard Socher, Steven C. H. Hoi

Then we propose a theory-inspired path-regularized DARTS that consists of two key modules: (i) a differential group-structured sparse binary gate introduced for each operation to avoid unfair competition among operations, and (ii) a path-depth-wise regularization used to incite search exploration for deep architectures that often converge slower than shallow ones as shown in our theory and are not well explored during the search.

Image Classification

BERTology Meets Biology: Interpreting Attention in Protein Language Models

2 code implementations ICLR 2021 Jesse Vig, Ali Madani, Lav R. Varshney, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani

Transformer architectures have proven to learn useful representations for protein classification and generation tasks.

Towards Understanding Hierarchical Learning: Benefits of Neural Representations

no code implementations NeurIPS 2020 Minshuo Chen, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, Richard Socher

When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that neural representation can achieve improved sample complexities compared with the raw input: For learning a low-rank degree-$p$ polynomial ($p \geq 4$) in $d$ dimension, neural representation requires only $\tilde{O}(d^{\lceil p/2 \rceil})$ samples, while the best-known sample complexity upper bound for the raw input is $\tilde{O}(d^{p-1})$.

A High-Quality Multilingual Dataset for Structured Documentation Translation

1 code implementation WS 2019 Kazuma Hashimoto, Raffaella Buschiazzo, James Bradbury, Teresa Marshall, Richard Socher, Caiming Xiong

We build and evaluate translation models for seven target languages from English, with several different copy mechanisms and an XML-constrained beam search.

Translation

WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos

no code implementations CVPR 2021 Mingfei Gao, Yingbo Zhou, Ran Xu, Richard Socher, Caiming Xiong

Online action detection in untrimmed videos aims to identify an action as it happens, which makes it very important for real-time applications.

Action Detection Action Recognition

EMT: Explicit Memory Tracker with Coarse-to-Fine Reasoning for Conversational Machine Reading

1 code implementation 26 May 2020 Yifan Gao, Chien-Sheng Wu, Shafiq Joty, Caiming Xiong, Richard Socher, Irwin King, Michael R. Lyu, Steven C. H. Hoi

The goal of conversational machine reading is to answer user questions given a knowledge base text which may require asking clarification questions.

Decision Making Reading Comprehension

Prototypical Contrastive Learning of Unsupervised Representations

3 code implementations ICLR 2021 Junnan Li, Pan Zhou, Caiming Xiong, Steven C. H. Hoi

This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that addresses the fundamental limitations of instance-wise contrastive learning.

Contrastive Learning Representation Learning +3

Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation

1 code implementation ACL 2020 Tianlu Wang, Xi Victoria Lin, Nazneen Fatema Rajani, Bryan McCann, Vicente Ordonez, Caiming Xiong

Word embeddings derived from human-generated corpora inherit strong gender bias which can be further amplified by downstream models.

Word Embeddings

ESPRIT: Explaining Solutions to Physical Reasoning Tasks

2 code implementations ACL 2020 Nazneen Fatema Rajani, Rui Zhang, Yi Chern Tan, Stephan Zheng, Jeremy Weiss, Aadit Vyas, Abhijit Gupta, Caiming Xiong, Richard Socher, Dragomir Radev

Our framework learns to generate explanations of how the physical simulation will causally evolve so that an agent or a human can easily reason about a solution using those interpretable descriptions.

VD-BERT: A Unified Vision and Dialog Transformer with BERT

1 code implementation EMNLP 2020 Yue Wang, Shafiq Joty, Michael R. Lyu, Irwin King, Caiming Xiong, Steven C. H. Hoi

By contrast, in this work, we propose VD-BERT, a simple yet effective framework of unified vision-dialog Transformer that leverages the pretrained BERT language models for Visual Dialog tasks.

Visual Dialog

TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue

1 code implementation EMNLP 2020 Chien-Sheng Wu, Steven Hoi, Richard Socher, Caiming Xiong

The underlying difference of linguistic patterns between general text and task-oriented dialogue makes existing pre-trained language models less useful in practice.

Dialogue State Tracking Intent Detection +2

An investigation of phone-based subword units for end-to-end speech recognition

no code implementations 8 Apr 2020 Weiran Wang, Guangsen Wang, Aadyot Bhatnagar, Yingbo Zhou, Caiming Xiong, Richard Socher

For Switchboard, our phone-based BPE system achieves 6.8%/14.4% word error rate (WER) on the Switchboard/CallHome portions of the test set, while joint decoding achieves 6.3%/13.3% WER.

Speech Recognition

Towards Noise-resistant Object Detection with Noisy Annotations

no code implementations 3 Mar 2020 Junnan Li, Caiming Xiong, Richard Socher, Steven Hoi

We address the challenging problem of training object detectors with noisy annotations, where the noise contains a mixture of label noise and bounding box noise.

Object Detection

Differentially Private Deep Learning with Smooth Sensitivity

no code implementations 1 Mar 2020 Lichao Sun, Yingbo Zhou, Philip S. Yu, Caiming Xiong

Ensuring the privacy of sensitive data used to train modern machine learning models is of paramount importance in many areas of practice.

Adv-BERT: BERT is not robust on misspellings! Generating nature adversarial samples on BERT

no code implementations 27 Feb 2020 Lichao Sun, Kazuma Hashimoto, Wenpeng Yin, Akari Asai, Jia Li, Philip Yu, Caiming Xiong

An increasing amount of literature claims that deep neural networks are brittle when dealing with adversarial examples that are created maliciously.

Question Answering Sentiment Analysis

Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning

1 code implementation 20 Feb 2020 Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio

Disjoint Manifold Labeling: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.

Representation Learning

Taylorized Training: Towards Better Approximation of Neural Network Training at Finite Width

no code implementations10 Feb 2020 Yu Bai, Ben Krause, Huan Wang, Caiming Xiong, Richard Socher

We propose \emph{Taylorized training} as an initiative towards better understanding neural network training at finite width.

Explore, Discover and Learn: Unsupervised Discovery of State-Covering Skills

1 code implementation ICML 2020 Víctor Campos, Alexander Trott, Caiming Xiong, Richard Socher, Xavier Giro-i-Nieto, Jordi Torres

We perform an extensive evaluation of skill discovery methods on controlled environments and show that EDL offers significant advantages, such as overcoming the coverage problem, reducing the dependence of learned skills on the initial state, and allowing the user to define a prior over which behaviors should be learned.

Learning from Noisy Anchors for One-stage Object Detection

1 code implementation CVPR 2020 Hengduo Li, Zuxuan Wu, Chen Zhu, Caiming Xiong, Richard Socher, Larry S. Davis

State-of-the-art object detectors rely on regressing and classifying an extensive list of possible anchors, which are divided into positive and negative samples based on their intersection-over-union (IoU) with corresponding groundtruth objects.

General Classification Object Detection
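The anchor assignment described above hinges on intersection-over-union (IoU) between each anchor and a ground-truth box. A minimal sketch of IoU for axis-aligned boxes (the 0.5 cutoff mentioned in the comment is the conventional default, not necessarily this paper's threshold):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Anchors with IoU above a threshold (commonly 0.5) are treated as
# positives, the rest as negatives -- the hard split the paper softens.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ≈ 0.143
```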

LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition

no code implementations NeurIPS 2019 Zuxuan Wu, Caiming Xiong, Yu-Gang Jiang, Larry S. Davis

This paper presents LiteEval, a simple yet effective coarse-to-fine framework for resource efficient video recognition, suitable for both online and offline scenarios.

Video Recognition

Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering

2 code implementations ICLR 2020 Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, Caiming Xiong

Answering questions that require multi-hop reasoning at web-scale necessitates retrieving multiple evidence documents, one of which often has little lexical or semantic relationship to the question.

Question Answering

Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models

no code implementations18 Nov 2019 Tong Che, Xiaofeng Liu, Site Li, Yubin Ge, Ruixiang Zhang, Caiming Xiong, Yoshua Bengio

We test the verifier network on out-of-distribution detection and adversarial example detection problems, as well as anomaly detection problems in structured prediction tasks such as image caption generation.

Anomaly Detection Autonomous Driving +2

ERASER: A Benchmark to Evaluate Rationalized NLP Models

1 code implementation ACL 2020 Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, Byron C. Wallace

We propose several metrics that aim to capture how well the rationales provided by models align with human rationales, and also how faithful these rationales are (i.e., the degree to which provided rationales influenced the corresponding predictions).
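One common way to score agreement between model and human rationales is token-level F1. A minimal sketch, assuming rationales are given as sets of token indices (an illustration, not ERASER's official implementation):

```python
def rationale_f1(predicted, human):
    """Token-level F1 between predicted and human rationales,
    each given as a collection of token indices."""
    pred, gold = set(predicted), set(human)
    tp = len(pred & gold)  # tokens both the model and the human selected
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Model highlights tokens 3-6; the human annotated 4-7.
print(rationale_f1([3, 4, 5, 6], [4, 5, 6, 7]))  # 0.75
```

Faithfulness, by contrast, is measured by perturbing the rationale tokens and observing how the prediction changes, which requires model access rather than a set comparison.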

Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards

1 code implementation NeurIPS 2019 Alexander Trott, Stephan Zheng, Caiming Xiong, Richard Socher

For instance, in tasks where the agent must achieve some goal state, simple distance-to-goal reward shaping often fails, as it renders learning vulnerable to local optima.
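The failure mode the abstract mentions can be made concrete: with naive distance-to-goal shaping, any detour that increases straight-line distance is penalized, so an obstacle can create a local optimum. A toy sketch (the scenario and function are illustrative, not taken from the paper):

```python
import math

def naive_shaped_reward(state, goal):
    """Dense shaped reward: negative Euclidean distance to the goal.
    Easy to compute, but blind to obstacles."""
    return -math.dist(state, goal)

goal = (4.0, 1.0)
# Suppose a wall at x = 2 blocks the straight path for y <= 3, so the
# agent at (1, 1) must first climb to (1, 4) to get around it. Every
# step of that detour *lowers* the shaped reward, which is exactly the
# local optimum the abstract describes.
print(naive_shaped_reward((1.0, 1.0), goal))  # -3.0 (stuck at the wall)
print(naive_shaped_reward((1.0, 2.0), goal))  # lower: the detour is penalized
```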

WSLLN: Weakly Supervised Natural Language Localization Networks

no code implementations IJCNLP 2019 Mingfei Gao, Larry Davis, Richard Socher, Caiming Xiong

We propose weakly supervised language localization networks (WSLLN) to detect events in long, untrimmed videos given language queries.

BERT is Not an Interlingua and the Bias of Tokenization

1 code implementation WS 2019 Jasdeep Singh, Bryan McCann, Richard Socher, Caiming Xiong

Multilingual transfer learning can benefit both high- and low-resource languages, but the source of these improvements is not well understood.

Transfer Learning

Evaluating the Factual Consistency of Abstractive Text Summarization

3 code implementations EMNLP 2020 Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher

Currently used metrics for assessing summarization algorithms do not account for whether summaries are factually consistent with source documents.

Abstractive Text Summarization Fact Checking +1

Global Capacity Measures for Deep ReLU Networks via Path Sampling

no code implementations22 Oct 2019 Ryan Theisen, Jason M. Klusowski, Huan Wang, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

Classical results on the statistical complexity of linear models have commonly identified the norm of the weights $\|w\|$ as a fundamental capacity measure.

Generalization Bounds Multi-class Classification

Predicting with High Correlation Features

2 code implementations1 Oct 2019 Devansh Arpit, Caiming Xiong, Richard Socher

In this paper, we consider distribution shift as a shift in the distribution of input features during test time that exhibit low correlation with targets in the training set.

Guided Adaptive Credit Assignment for Sample Efficient Policy Optimization

no code implementations25 Sep 2019 Hao Liu, Richard Socher, Caiming Xiong

In this work, we propose a guided adaptive credit assignment method to perform credit assignment effectively for policy gradient methods.

Policy Gradient Methods

Learning World Graph Decompositions To Accelerate Reinforcement Learning

no code implementations25 Sep 2019 Wenling Shang, Alex Trott, Stephan Zheng, Caiming Xiong, Richard Socher

Efficiently learning to solve tasks in complex environments is a key challenge for reinforcement learning (RL) agents.

Entropy Penalty: Towards Generalization Beyond the IID Assumption

no code implementations25 Sep 2019 Devansh Arpit, Caiming Xiong, Richard Socher

This allows deep networks trained with Entropy Penalty to generalize well even under distribution shift of spurious features.

Near-Zero-Cost Differentially Private Deep Learning with Teacher Ensembles

no code implementations25 Sep 2019 Lichao Sun, Yingbo Zhou, Jia Li, Richard Socher, Philip S. Yu, Caiming Xiong

Ensuring the privacy of sensitive data used to train modern machine learning models is of paramount importance in many areas of practice.

CTRL: A Conditional Transformer Language Model for Controllable Generation

5 code implementations Preprint 2019 Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher

Large-scale language models show promising text generation capabilities, but users cannot easily control particular aspects of the generated text.

Language Modelling Text Generation

Deleter: Leveraging BERT to Perform Unsupervised Successive Text Compression

no code implementations7 Sep 2019 Tong Niu, Caiming Xiong, Richard Socher

In this work, we propose a fully unsupervised model, Deleter, that is able to discover an "optimal deletion path" for an arbitrary sentence, where each intermediate sequence along the path is a coherent subsequence of the previous one.

Language Modelling Reading Comprehension +2
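The "deletion path" idea can be sketched as a greedy search: at each step, drop the token whose removal leaves the most coherent remainder. The `coherence` scorer below is a hypothetical stand-in for the paper's BERT-based scoring; everything here is illustrative:

```python
def deletion_path(tokens, coherence):
    """Greedy deletion path: repeatedly remove the single token whose
    removal yields the highest-scoring remaining subsequence.
    `coherence` is a hypothetical stand-in for a BERT-based scorer."""
    path = [list(tokens)]
    while len(tokens) > 1:
        candidates = [tokens[:i] + tokens[i + 1:] for i in range(len(tokens))]
        tokens = max(candidates, key=coherence)
        path.append(list(tokens))
    return path

# Toy scorer (illustration only): prefer sequences whose words are longer.
toy_score = lambda seq: sum(len(w) for w in seq) / len(seq)
for step in deletion_path("the quick brown fox jumps".split(), toy_score):
    print(" ".join(step))
```

The actual model scores each candidate subsequence with a language model so that every intermediate sequence on the path stays fluent.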

WSLLN: Weakly Supervised Natural Language Localization Networks

no code implementations31 Aug 2019 Mingfei Gao, Larry S. Davis, Richard Socher, Caiming Xiong

We propose weakly supervised language localization networks (WSLLN) to detect events in long, untrimmed videos given language queries.

Neural Text Summarization: A Critical Evaluation

no code implementations IJCNLP 2019 Wojciech Kryściński, Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, Richard Socher

Text summarization aims at compressing long documents into a shorter form that conveys the most important parts of the original document.

Text Summarization

Learning World Graphs to Accelerate Hierarchical Reinforcement Learning

1 code implementation1 Jul 2019 Wenling Shang, Alex Trott, Stephan Zheng, Caiming Xiong, Richard Socher

We perform a thorough ablation study to evaluate our approach on a suite of challenging maze tasks, demonstrating significant advantages from the proposed framework over baselines that lack world graph knowledge in terms of performance and efficiency.

Hierarchical Reinforcement Learning

Explain Yourself! Leveraging Language Models for Commonsense Reasoning

1 code implementation ACL 2019 Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, Richard Socher

Deep learning models perform poorly on tasks that require commonsense reasoning, which often necessitates some form of world-knowledge or reasoning over information not immediately present in the input.

Common Sense Reasoning

Private Deep Learning with Teacher Ensembles

no code implementations5 Jun 2019 Lichao Sun, Yingbo Zhou, Ji Wang, Jia Li, Richard Socher, Philip S. Yu, Caiming Xiong

Privacy-preserving deep learning is crucial for deploying deep neural network based solutions, especially when the model works on data that contains sensitive information.

Ensemble Learning Knowledge Distillation +1

SParC: Cross-Domain Semantic Parsing in Context

5 code implementations ACL 2019 Tao Yu, Rui Zhang, Michihiro Yasunaga, Yi Chern Tan, Xi Victoria Lin, Suyi Li, Heyang Er, Irene Li, Bo Pang, Tao Chen, Emily Ji, Shreya Dixit, David Proctor, Sungrok Shim, Jonathan Kraft, Vincent Zhang, Caiming Xiong, Richard Socher, Dragomir Radev

The best model obtains an exact match accuracy of 20.2% over all questions and less than 10% over all interaction sequences, indicating that the cross-domain setting and the contextual phenomena of the dataset present significant challenges for future research.

Semantic Parsing Text-To-Sql
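Exact-match accuracy, as quoted in the abstract, counts a prediction correct only when it matches the gold query. The official SParC metric compares parsed SQL components; the string-normalized version below is a simplified illustration only:

```python
def exact_match(pred_sql, gold_sql):
    """Whitespace- and case-normalized exact match on SQL strings.
    (The official SParC metric compares parsed query components;
    this string version is only an illustration.)"""
    norm = lambda s: " ".join(s.lower().split())
    return norm(pred_sql) == norm(gold_sql)

preds = ["SELECT name FROM students", "SELECT avg(age) FROM students"]
golds = ["select name from students", "select age from students"]
acc = sum(exact_match(p, g) for p, g in zip(preds, golds)) / len(preds)
print(acc)  # 0.5
```

Interaction-sequence accuracy is stricter still: every question in a multi-turn interaction must be answered correctly, which is why it falls below 10% while per-question accuracy is 20.2%.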

On the Generalization Gap in Reparameterizable Reinforcement Learning

no code implementations29 May 2019 Huan Wang, Stephan Zheng, Caiming Xiong, Richard Socher

For this problem class, estimating the expected return is efficient and the trajectory can be computed deterministically given peripheral random variables, which enables us to study reparametrizable RL using supervised learning and transfer learning theory.

Learning Theory Transfer Learning

XLDA: Cross-Lingual Data Augmentation for Natural Language Inference and Question Answering

no code implementations ICLR 2020 Jasdeep Singh, Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

XLDA is in contrast to, and performs markedly better than, a more naive approach that aggregates examples in various languages in a way that each example is solely in one language.

Cross-Lingual Natural Language Inference Data Augmentation +3

Unifying Question Answering, Text Classification, and Regression via Span Extraction

no code implementations19 Apr 2019 Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, Richard Socher

Even as pre-trained language encoders such as BERT are shared across many tasks, the output layers of question answering, text classification, and regression models are significantly different.

General Classification Multi-Task Learning +2

Learn to Grow: A Continual Structure Learning Framework for Overcoming Catastrophic Forgetting

no code implementations31 Mar 2019 Xilai Li, Yingbo Zhou, Tianfu Wu, Richard Socher, Caiming Xiong

Addressing catastrophic forgetting is one of the key challenges in continual learning where machine learning systems are trained with sequential or streaming tasks.

Continual Learning Neural Architecture Search

StartNet: Online Detection of Action Start in Untrimmed Videos

no code implementations ICCV 2019 Mingfei Gao, Mingze Xu, Larry S. Davis, Richard Socher, Caiming Xiong

We propose StartNet to address Online Detection of Action Start (ODAS) where action starts and their associated categories are detected in untrimmed, streaming videos.

Action Classification Policy Gradient Methods

The Regretful Navigation Agent for Vision-and-Language Navigation

1 code implementation CVPR 2019 (Oral) Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira

As deep learning continues to make progress for challenging perception tasks, there is increased interest in combining vision, language, and decision-making.

Decision Making Vision and Language Navigation +2

Competitive Experience Replay

no code implementations ICLR 2019 Hao Liu, Alexander Trott, Richard Socher, Caiming Xiong

We propose a novel method called competitive experience replay, which efficiently supplements a sparse reward by placing learning in the context of an exploration competition between a pair of agents.

Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering

no code implementations ICLR 2019 Victor Zhong, Caiming Xiong, Nitish Shirish Keskar, Richard Socher

End-to-end neural models have made significant progress in question answering; however, recent studies show that these models implicitly assume that the answer and evidence appear close together in a single document.

Question Answering

Neural Abstract Style Transfer for Chinese Traditional Painting

1 code implementation8 Dec 2018 Bo Li, Caiming Xiong, Tianfu Wu, Yu Zhou, Lun Zhang, Rufeng Chu

In experiments, the proposed method shows more appealing stylized results in transferring the style of Chinese traditional painting than state-of-the-art neural style transfer methods.

Style Transfer

Interactive Agent Modeling by Learning to Probe

no code implementations1 Oct 2018 Tianmin Shu, Caiming Xiong, Ying Nian Wu, Song-Chun Zhu

In particular, the probing agent (i.e., a learner) learns to interact with the environment and with a target agent (i.e., a demonstrator) to maximize the change in the observed behaviors of that agent.

Imitation Learning

Correction Networks: Meta-Learning for Zero-Shot Learning

no code implementations27 Sep 2018 R. Lily Hu, Caiming Xiong, Richard Socher

We propose a model that learns to perform zero-shot classification using a meta-learner that is trained to produce a correction to the output of a previously trained learner.

Meta-Learning Zero-Shot Learning

Identifying Generalization Properties in Neural Networks

no code implementations ICLR 2019 Huan Wang, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

In particular, we prove that model generalization ability is related to the Hessian, the higher-order "smoothness" terms characterized by the Lipschitz constant of the Hessian, and the scales of the parameters.

Improving Abstraction in Text Summarization

no code implementations EMNLP 2018 Wojciech Kryściński, Romain Paulus, Caiming Xiong, Richard Socher

Abstractive text summarization aims to shorten long text documents into a human-readable form that contains the most important facts from the original document.

Abstractive Text Summarization Language Modelling +1

Global-Locally Self-Attentive Encoder for Dialogue State Tracking

no code implementations ACL 2018 Victor Zhong, Caiming Xiong, Richard Socher

Dialogue state tracking, which estimates user goals and requests given the dialogue context, is an essential part of task-oriented dialogue systems.

Dialogue State Tracking Representation Learning +3

Augmented Cyclic Adversarial Learning for Low Resource Domain Adaptation

2 code implementations ICLR 2019 Ehsan Hosseini-Asl, Yingbo Zhou, Caiming Xiong, Richard Socher

In low-resource supervised setting, the results show that our approach improves absolute performance by 14% and 4% when adapting SVHN to MNIST and vice versa, respectively, which outperforms unsupervised domain adaptation methods that require high-resource unlabeled target domain.

Speech Recognition Unsupervised Domain Adaptation

The Natural Language Decathlon: Multitask Learning as Question Answering

5 code implementations ICLR 2019 Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

Though designed for decaNLP, MQAN also achieves state of the art results on the WikiSQL semantic parsing task in the single-task setting.

Domain Adaptation Machine Translation +10

Using Mode Connectivity for Loss Landscape Analysis

no code implementations18 Jun 2018 Akhilesh Gotmare, Nitish Shirish Keskar, Caiming Xiong, Richard Socher

Mode connectivity is a recently introduced framework that empirically establishes the connectedness of minima by finding a high-accuracy curve between two independently trained models.

Global-Locally Self-Attentive Dialogue State Tracker

2 code implementations19 May 2018 Victor Zhong, Caiming Xiong, Richard Socher

Dialogue state tracking, which estimates user goals and requests given the dialogue context, is an essential part of task-oriented dialogue systems.

Dialogue State Tracking Multi-domain Dialogue State Tracking +1

A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation

no code implementations27 Mar 2018 Ehsan Hosseini-Asl, Yingbo Zhou, Caiming Xiong, Richard Socher

Domain adaptation plays an important role for speech recognition models, in particular, for domains that have low resources.

Domain Adaptation Speech Recognition

Interpretable Counting for Visual Question Answering

no code implementations ICLR 2018 Alexander Trott, Caiming Xiong, Richard Socher

Questions that require counting a variety of objects in images remain a major challenge in visual question answering (VQA).

Question Answering Visual Question Answering

Hierarchical and Interpretable Skill Acquisition in Multi-task Reinforcement Learning

no code implementations ICLR 2018 Tianmin Shu, Caiming Xiong, Richard Socher

In order to help the agent learn the complex temporal dependencies necessary for the hierarchical policy, we provide it with a stochastic temporal grammar that modulates when to rely on previously learned skills and when to execute new skills.

Block-diagonal Hessian-free Optimization for Training Neural Networks

no code implementations ICLR 2018 Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher

Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence.