no code implementations • ICLR 2019 • Xilai Li, Yingbo Zhou, Tianfu Wu, Richard Socher, Caiming Xiong
During structure learning, the model optimizes for the best structure for the current task.
no code implementations • EMNLP 2020 • Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, Richard Socher
Pre-training in natural language processing makes it easier for an adversary with only query access to a victim model to reconstruct a local copy of the victim by training with gibberish input data paired with the victim's labels for that data.
no code implementations • EMNLP 2020 • Semih Yavuz, Kazuma Hashimoto, Wenhao Liu, Nitish Shirish Keskar, Richard Socher, Caiming Xiong
The concept of Dialogue Act (DA) is universal across different task-oriented dialogue domains - the act of "request" carries the same speaker intention whether it is for restaurant reservation or flight booking.
no code implementations • 9 Feb 2024 • Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao
Since the release of ChatGPT in November 2022, Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
1 code implementation • 23 Mar 2022 • Tian Xie, Xinyi Yang, Angela S. Lin, Feihong Wu, Kazuma Hashimoto, Jin Qu, Young Mo Kang, Wenpeng Yin, Huan Wang, Semih Yavuz, Gang Wu, Michael Jones, Richard Socher, Yingbo Zhou, Wenhao Liu, Caiming Xiong
At the core of the struggle is the need to script every single turn of interactions between the bot and the human user.
1 code implementation • 5 Aug 2021 • Stephan Zheng, Alexander Trott, Sunil Srinivasa, David C. Parkes, Richard Socher
Here we show that machine-learning-based economic simulation is a powerful policy and mechanism design framework to overcome these limitations.
1 code implementation • NeurIPS 2021 • Ryan Theisen, Huan Wang, Lav R. Varshney, Caiming Xiong, Richard Socher
Moreover, we show that by varying the temperature of the learned flow models, we can generate synthetic datasets that closely resemble standard benchmark datasets, but with almost any desired Bayes error.
no code implementations • 1 Jan 2021 • Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio
Disjoint Manifold Separation: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.
no code implementations • 28 Dec 2020 • Stanislaw Jastrzebski, Devansh Arpit, Oliver Astrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, Richard Socher, Kyunghyun Cho, Krzysztof Geras
The early phase of training a deep neural network has a dramatic effect on the local curvature of the loss function.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Xi Victoria Lin, Richard Socher, Caiming Xiong
We present BRIDGE, a powerful sequential architecture for modeling dependencies between natural language questions and relational databases in cross-DB semantic parsing.
1 code implementation • EMNLP 2020 • Jian-Guo Zhang, Kazuma Hashimoto, Wenhao Liu, Chien-Sheng Wu, Yao Wan, Philip S. Yu, Richard Socher, Caiming Xiong
Intent detection is one of the core components of goal-oriented dialog systems, and detecting out-of-scope (OOS) intents is also a practically important skill.
no code implementations • NeurIPS 2020 • Huaxiu Yao, Yingbo Zhou, Mehrdad Mahdavi, Zhenhui Li, Richard Socher, Caiming Xiong
When a new task is encountered, it constructs a meta-knowledge pathway by either utilizing the most relevant knowledge blocks or exploring new blocks.
no code implementations • 18 Oct 2020 • Nazneen Fatema Rajani, Ben Krause, Wenpeng Yin, Tong Niu, Richard Socher, Caiming Xiong
Interpretability techniques in NLP have mainly focused on understanding individual predictions using attention visualization or gradient-based saliency maps over tokens.
no code implementations • 14 Oct 2020 • Lav R. Varshney, Nazneen Fatema Rajani, Richard Socher
Human creativity is often described as the mental process of combining associative elements into a new form, but emerging computational creativity algorithms may not operate in this manner.
1 code implementation • EMNLP 2020 • Wenpeng Yin, Nazneen Fatema Rajani, Dragomir Radev, Richard Socher, Caiming Xiong
We demonstrate that this framework enables a pretrained entailment model to work well on new entailment domains in a few-shot setting, and show its effectiveness as a unified solver for several downstream NLP tasks such as question answering and coreference resolution when the end-task annotations are limited.
1 code implementation • ICLR 2021 • Tao Yu, Chien-Sheng Wu, Xi Victoria Lin, Bailin Wang, Yi Chern Tan, Xinyi Yang, Dragomir Radev, Richard Socher, Caiming Xiong
We present GraPPa, an effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data.
Ranked #8 on Semantic Parsing on Spider
no code implementations • Findings of the Association for Computational Linguistics 2020 • Congying Xia, Caiming Xiong, Philip Yu, Richard Socher
In this paper, we focus on generating training examples for few-shot intents in the realistic imbalanced scenario.
3 code implementations • Findings (EMNLP) 2021 • Ben Krause, Akhilesh Deepak Gotmare, Bryan McCann, Nitish Shirish Keskar, Shafiq Joty, Richard Socher, Nazneen Fatema Rajani
While large-scale language models (LMs) are able to imitate the distribution of natural language well enough to generate realistic text, it is difficult to control which regions of the distribution they generate.
no code implementations • 9 Sep 2020 • Christopher Liu, Laura Dominé, Kevin Chavez, Richard Socher
Machine translation tools do not yet exist for the Yup'ik language, a polysynthetic language spoken by around 8,000 people who live primarily in Southwest Alaska.
no code implementations • ACL 2020 • Jichuan Zeng, Xi Victoria Lin, Caiming Xiong, Richard Socher, Michael R. Lyu, Irwin King, Steven C. H. Hoi
Natural language interfaces to databases (NLIDB) democratize end user access to relational data.
6 code implementations • 24 Jul 2020 • Alexander R. Fabbri, Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher, Dragomir Radev
The scarcity of comprehensive up-to-date studies on evaluation metrics for text summarization and the lack of consensus regarding evaluation protocols continue to inhibit progress.
2 code implementations • NAACL 2021 • Linyong Nan, Dragomir Radev, Rui Zhang, Amrit Rau, Abhinand Sivaprasad, Chiachun Hsieh, Xiangru Tang, Aadit Vyas, Neha Verma, Pranav Krishna, Yangxiaokang Liu, Nadia Irwanto, Jessica Pan, Faiaz Rahman, Ahmad Zaidi, Mutethia Mutuma, Yasin Tarabar, Ankit Gupta, Tao Yu, Yi Chern Tan, Xi Victoria Lin, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani
Data-to-Text annotation can be a costly process, especially when dealing with tables, which are the major source of structured data and contain nontrivial structures.
1 code implementation • ACL 2020 • Yifan Gao, Chien-Sheng Wu, Shafiq Joty, Caiming Xiong, Richard Socher, Irwin King, Michael Lyu, Steven C. H. Hoi
The goal of conversational machine reading is to answer user questions given a knowledge base text which may require asking clarification questions.
1 code implementation • NeurIPS 2020 • Pan Zhou, Caiming Xiong, Richard Socher, Steven C. H. Hoi
Then we propose a theory-inspired path-regularized DARTS that consists of two key modules: (i) a differential group-structured sparse binary gate introduced for each operation to avoid unfair competition among operations, and (ii) a path-depth-wise regularization used to incite search exploration for deep architectures that often converge slower than shallow ones as shown in our theory and are not well explored during the search.
2 code implementations • ICLR 2021 • Jesse Vig, Ali Madani, Lav R. Varshney, Caiming Xiong, Richard Socher, Nazneen Fatema Rajani
Transformer architectures have proven to learn useful representations for protein classification and generation tasks.
no code implementations • NeurIPS 2020 • Minshuo Chen, Yu Bai, Jason D. Lee, Tuo Zhao, Huan Wang, Caiming Xiong, Richard Socher
When the trainable network is the quadratic Taylor model of a wide two-layer network, we show that neural representation can achieve improved sample complexities compared with the raw input: For learning a low-rank degree-$p$ polynomial ($p \geq 4$) in $d$ dimension, neural representation requires only $\tilde{O}(d^{\lceil p/2 \rceil})$ samples, while the best-known sample complexity upper bound for the raw input is $\tilde{O}(d^{p-1})$.
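Plugging in the smallest covered degree makes the gap concrete: for $p = 4$,

```latex
\tilde{O}(d^{\lceil p/2 \rceil}) = \tilde{O}(d^{2})
\quad \text{vs.} \quad
\tilde{O}(d^{\,p-1}) = \tilde{O}(d^{3}),
```

i.e., the neural representation saves a full factor of $d$ in sample complexity over the raw input.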
1 code implementation • WS 2019 • Kazuma Hashimoto, Raffaella Buschiazzo, James Bradbury, Teresa Marshall, Richard Socher, Caiming Xiong
We build and evaluate translation models for seven target languages from English, with several different copy mechanisms and an XML-constrained beam search.
no code implementations • 17 Jun 2020 • Andre Esteva, Anuprit Kale, Romain Paulus, Kazuma Hashimoto, Wenpeng Yin, Dragomir Radev, Richard Socher
The COVID-19 global pandemic has resulted in international efforts to understand, track, and mitigate the disease, yielding a significant corpus of COVID-19 and SARS-CoV-2-related publications across scientific disciplines.
no code implementations • CVPR 2021 • Mingfei Gao, Yingbo Zhou, Ran Xu, Richard Socher, Caiming Xiong
Online action detection in untrimmed videos aims to identify an action as it happens, which makes it very important for real-time applications.
Ranked #5 on Online Action Detection on THUMOS'14
1 code implementation • 26 May 2020 • Yifan Gao, Chien-Sheng Wu, Shafiq Joty, Caiming Xiong, Richard Socher, Irwin King, Michael R. Lyu, Steven C. H. Hoi
The goal of conversational machine reading is to answer user questions given a knowledge base text which may require asking clarification questions.
1 code implementation • ACL 2020 • Samson Tan, Shafiq Joty, Min-Yen Kan, Richard Socher
Training on only perfect Standard English corpora predisposes pre-trained neural networks to discriminate against minorities from non-standard linguistic backgrounds (e.g., African American Vernacular English, Colloquial Singapore English, etc.).
2 code implementations • ACL 2020 • Nazneen Fatema Rajani, Rui Zhang, Yi Chern Tan, Stephan Zheng, Jeremy Weiss, Aadit Vyas, Abhijit Gupta, Caiming Xiong, Richard Socher, Dragomir Radev
Our framework learns to generate explanations of how the physical simulation will causally evolve so that an agent or a human can easily reason about a solution using those interpretable descriptions.
1 code implementation • NeurIPS 2020 • Ehsan Hosseini-Asl, Bryan McCann, Chien-Sheng Wu, Semih Yavuz, Richard Socher
Task-oriented dialogue is often decomposed into three tasks: understanding user input, deciding actions, and generating a response.
Ranked #2 on Response Generation on MMConv
2 code implementations • 28 Apr 2020 • Stephan Zheng, Alexander Trott, Sunil Srinivasa, Nikhil Naik, Melvin Gruesbeck, David C. Parkes, Richard Socher
In experiments conducted on MTurk, an AI tax policy provides an equality-productivity trade-off that is similar to that provided by the Saez framework along with higher inverse-income weighted social welfare.
1 code implementation • EMNLP 2020 • Chien-Sheng Wu, Steven Hoi, Richard Socher, Caiming Xiong
The underlying difference of linguistic patterns between general text and task-oriented dialogue makes existing pre-trained language models less useful in practice.
no code implementations • 8 Apr 2020 • Weiran Wang, Guangsen Wang, Aadyot Bhatnagar, Yingbo Zhou, Caiming Xiong, Richard Socher
For Switchboard, our phone-based BPE system achieves 6.8%/14.4% word error rate (WER) on the Switchboard/CallHome portion of the test set while joint decoding achieves 6.3%/13.3% WER.
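For reference, WER is word-level Levenshtein distance divided by the reference length; a minimal sketch (not the authors' scoring pipeline):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length,
    computed via dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

So a 6.8% WER means roughly 7 word-level edits per 100 reference words.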
no code implementations • 30 Mar 2020 • Isabela Albuquerque, Nikhil Naik, Junnan Li, Nitish Keskar, Richard Socher
Self-supervised feature representations have been shown to be useful for supervised classification, few-shot learning, and adversarial robustness.
Ranked #126 on Domain Generalization on PACS
2 code implementations • 8 Mar 2020 • Ali Madani, Bryan McCann, Nikhil Naik, Nitish Shirish Keskar, Namrata Anand, Raphael R. Eguchi, Po-Ssu Huang, Richard Socher
Generative modeling for protein engineering is key to solving fundamental problems in synthetic biology, medicine, and material science.
no code implementations • 3 Mar 2020 • Junnan Li, Caiming Xiong, Richard Socher, Steven Hoi
We address the challenging problem of training object detectors with noisy annotations, where the noise contains a mixture of label noise and bounding box noise.
1 code implementation • 20 Feb 2020 • Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio
Disjoint Manifold Labeling: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.
no code implementations • ICLR 2020 • Xuan-Phi Nguyen, Shafiq Joty, Steven C. H. Hoi, Richard Socher
Incorporating hierarchical structures like constituency trees has been shown to be effective for various natural language processing (NLP) tasks.
1 code implementation • ICLR 2020 • Hung Le, Richard Socher, Steven C. H. Hoi
Recent efforts in Dialogue State Tracking (DST) for task-oriented dialogues have progressed toward open-vocabulary or generation-based approaches where the models can generate slot value candidates from the dialogue history itself.
Ranked #13 on Multi-domain Dialogue State Tracking on MULTIWOZ 2.0
2 code implementations • ICLR 2020 • Junnan Li, Richard Socher, Steven C. H. Hoi
Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data.
Ranked #4 on Learning with noisy labels on CIFAR-100N
1 code implementation • ICML 2020 • Víctor Campos, Alexander Trott, Caiming Xiong, Richard Socher, Xavier Giro-i-Nieto, Jordi Torres
We perform an extensive evaluation of skill discovery methods on controlled environments and show that EDL offers significant advantages, such as overcoming the coverage problem, reducing the dependence of learned skills on the initial state, and allowing the user to define a prior over which behaviors should be learned.
no code implementations • 10 Feb 2020 • Yu Bai, Ben Krause, Huan Wang, Caiming Xiong, Richard Socher
We propose \emph{Taylorized training} as an initiative towards better understanding neural network training at finite width.
no code implementations • 9 Feb 2020 • Lav R. Varshney, Nitish Shirish Keskar, Richard Socher
Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns.
1 code implementation • CVPR 2020 • Hengduo Li, Zuxuan Wu, Chen Zhu, Caiming Xiong, Richard Socher, Larry S. Davis
State-of-the-art object detectors rely on regressing and classifying an extensive list of possible anchors, which are divided into positive and negative samples based on their intersection-over-union (IoU) with corresponding groundtruth objects.
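For reference, the IoU-based positive/negative split works as below (a minimal sketch; the 0.5/0.4 thresholds are illustrative defaults, not taken from the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_anchors(anchors, gt_boxes, pos_thr=0.5, neg_thr=0.4):
    """Mark each anchor positive (1), negative (0), or ignored (-1)
    by its best IoU against any ground-truth box."""
    labels = []
    for a in anchors:
        best = max((iou(a, g) for g in gt_boxes), default=0.0)
        labels.append(1 if best >= pos_thr else (0 if best < neg_thr else -1))
    return labels
```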
2 code implementations • ICLR 2020 • Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, Caiming Xiong
Answering questions that require multi-hop reasoning at web-scale necessitates retrieving multiple evidence documents, one of which often has little lexical or semantic relationship to the question.
Ranked #26 on Question Answering on HotpotQA
no code implementations • 9 Nov 2019 • Linqing Liu, Huan Wang, Jimmy Lin, Richard Socher, Caiming Xiong
Our approach is model agnostic and can be easily applied on different future teacher model architectures.
2 code implementations • ACL 2020 • Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, Byron C. Wallace
We propose several metrics that aim to capture how well the rationales provided by models align with human rationales, and also how faithful these rationales are (i.e., the degree to which provided rationales influenced the corresponding predictions).
1 code implementation • NeurIPS 2019 • Alexander Trott, Stephan Zheng, Caiming Xiong, Richard Socher
For instance, in tasks where the agent must achieve some goal state, simple distance-to-goal reward shaping often fails, as it renders learning vulnerable to local optima.
no code implementations • IJCNLP 2019 • Mingfei Gao, Larry Davis, Richard Socher, Caiming Xiong
We propose weakly supervised language localization networks (WSLLN) to detect events in long, untrimmed videos given language queries.
1 code implementation • WS 2019 • Jasdeep Singh, Bryan McCann, Richard Socher, Caiming Xiong
Multilingual transfer learning can benefit both high- and low-resource languages, but the source of these improvements is not well understood.
no code implementations • WS 2020 • Michael Shum, Stephan Zheng, Wojciech Kryściński, Caiming Xiong, Richard Socher
Human-like chit-chat conversation requires agents to generate responses that are fluent, engaging and consistent.
4 code implementations • EMNLP 2020 • Wojciech Kryściński, Bryan McCann, Caiming Xiong, Richard Socher
Currently used metrics for assessing summarization algorithms do not account for whether summaries are factually consistent with source documents.
no code implementations • 22 Oct 2019 • Ryan Theisen, Jason M. Klusowski, Huan Wang, Nitish Shirish Keskar, Caiming Xiong, Richard Socher
Classical results on the statistical complexity of linear models have commonly identified the norm of the weights $\|w\|$ as a fundamental capacity measure.
1 code implementation • Joint Conference on Lexical and Computational Semantics 2020 • Jian-Guo Zhang, Kazuma Hashimoto, Chien-Sheng Wu, Yao Wan, Philip S. Yu, Richard Socher, Caiming Xiong
Dialog state tracking (DST) is a core component in task-oriented dialog systems.
Ranked #4 on Multi-domain Dialogue State Tracking on MULTIWOZ 2.0
2 code implementations • 1 Oct 2019 • Devansh Arpit, Caiming Xiong, Richard Socher
In this paper, we consider distribution shift as a shift in the distribution of input features during test time that exhibit low correlation with targets in the training set.
no code implementations • 25 Sep 2019 • Devansh Arpit, Caiming Xiong, Richard Socher
This allows deep networks trained with Entropy Penalty to generalize well even under distribution shift of spurious features.
no code implementations • NeurIPS Workshop DL-IG 2020 • Peiliang Zhang, Huan Wang, Nikhil Naik, Caiming Xiong, Richard Socher
Empirically, we estimate this lower bound using a neural network to compute DIME.
no code implementations • 25 Sep 2019 • Hao Liu, Richard Socher, Caiming Xiong
In this work, we propose a guided adaptive credit assignment method to perform credit assignment effectively for policy gradient methods.
no code implementations • 25 Sep 2019 • Wenling Shang, Alex Trott, Stephan Zheng, Caiming Xiong, Richard Socher
Efficiently learning to solve tasks in complex environments is a key challenge for reinforcement learning (RL) agents.
no code implementations • 25 Sep 2019 • Lichao Sun, Yingbo Zhou, Jia Li, Richard Socher, Philip S. Yu, Caiming Xiong
Ensuring the privacy of sensitive data used to train modern machine learning models is of paramount importance in many areas of practice.
3 code implementations • IJCNLP 2019 • Tao Yu, Rui Zhang, He Yang Er, Suyi Li, Eric Xue, Bo Pang, Xi Victoria Lin, Yi Chern Tan, Tianze Shi, Zihan Li, Youxuan Jiang, Michihiro Yasunaga, Sungrok Shim, Tao Chen, Alexander Fabbri, Zifan Li, Luyao Chen, Yuwen Zhang, Shreya Dixit, Vincent Zhang, Caiming Xiong, Richard Socher, Walter S. Lasecki, Dragomir Radev
We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems.
Ranked #8 on Dialogue State Tracking on CoSQL
8 code implementations • Preprint 2019 • Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher
Large-scale language models show promising text generation capabilities, but users cannot easily control particular aspects of the generated text.
no code implementations • 7 Sep 2019 • Tong Niu, Caiming Xiong, Richard Socher
In this work, we propose a fully unsupervised model, Deleter, that is able to discover an "optimal deletion path" for an arbitrary sentence, where each intermediate sequence along the path is a coherent subsequence of the previous one.
no code implementations • 7 Sep 2019 • Lav R. Varshney, Nitish Shirish Keskar, Richard Socher
The paradigm of pretrained deep learning models has recently emerged in artificial intelligence practice, allowing deployment in numerous societal settings with limited computational resources, but also embedding biases and enabling unintended negative uses.
3 code implementations • IJCNLP 2019 • Rui Zhang, Tao Yu, He Yang Er, Sungrok Shim, Eric Xue, Xi Victoria Lin, Tianze Shi, Caiming Xiong, Richard Socher, Dragomir Radev
We focus on the cross-domain context-dependent text-to-SQL generation task.
Ranked #5 on Text-To-SQL on SParC
no code implementations • 31 Aug 2019 • Mingfei Gao, Larry S. Davis, Richard Socher, Caiming Xiong
We propose weakly supervised language localization networks (WSLLN) to detect events in long, untrimmed videos given language queries.
no code implementations • IJCNLP 2019 • Wojciech Kryściński, Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, Richard Socher
Text summarization aims at compressing long documents into a shorter form that conveys the most important parts of the original document.
no code implementations • 1 Jul 2019 • Wenling Shang, Alex Trott, Stephan Zheng, Caiming Xiong, Richard Socher
We perform a thorough ablation study to evaluate our approach on a suite of challenging maze tasks, demonstrating significant advantages from the proposed framework over baselines that lack world graph knowledge in terms of performance and efficiency.
1 code implementation • ACL 2019 • Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, Richard Socher
Deep learning models perform poorly on tasks that require commonsense reasoning, which often necessitates some form of world-knowledge or reasoning over information not immediately present in the input.
Ranked #23 on Common Sense Reasoning on CommonsenseQA
4 code implementations • ACL 2019 • Tao Yu, Rui Zhang, Michihiro Yasunaga, Yi Chern Tan, Xi Victoria Lin, Suyi Li, Heyang Er, Irene Li, Bo Pang, Tao Chen, Emily Ji, Shreya Dixit, David Proctor, Sungrok Shim, Jonathan Kraft, Vincent Zhang, Caiming Xiong, Richard Socher, Dragomir Radev
The best model obtains an exact match accuracy of 20.2% over all questions and less than 10% over all interaction sequences, indicating that the cross-domain setting and the contextual phenomena of the dataset present significant challenges for future research.
no code implementations • 29 May 2019 • Huan Wang, Stephan Zheng, Caiming Xiong, Richard Socher
For this problem class, estimating the expected return is efficient and the trajectory can be computed deterministically given peripheral random variables, which enables us to study reparametrizable RL using supervised learning and transfer learning theory.
no code implementations • ICLR 2020 • Jasdeep Singh, Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher
XLDA is in contrast to, and performs markedly better than, a more naive approach that aggregates examples in various languages in a way that each example is solely in one language.
2 code implementations • ACL 2019 • Chien-Sheng Wu, Andrea Madotto, Ehsan Hosseini-Asl, Caiming Xiong, Richard Socher, Pascale Fung
Over-dependence on domain ontology and lack of knowledge sharing across domains are two practical and yet less studied problems of dialogue state tracking.
Ranked #15 on Multi-domain Dialogue State Tracking on MULTIWOZ 2.0
no code implementations • 19 Apr 2019 • Nitish Shirish Keskar, Bryan McCann, Caiming Xiong, Richard Socher
Even as pre-trained language encoders such as BERT are shared across many tasks, the output layers of question answering, text classification, and regression models are significantly different.
1 code implementation • 18 Apr 2019 • Giovanni Campagna, Silei Xu, Mehrad Moradshahi, Richard Socher, Monica S. Lam
We advocate formalizing the capability of virtual assistants with a Virtual Assistant Programming Language (VAPL) and using a neural semantic parser to translate natural language into VAPL code.
no code implementations • 31 Mar 2019 • Xilai Li, Yingbo Zhou, Tianfu Wu, Richard Socher, Caiming Xiong
Addressing catastrophic forgetting is one of the key challenges in continual learning where machine learning systems are trained with sequential or streaming tasks.
no code implementations • ICCV 2019 • Mingfei Gao, Mingze Xu, Larry S. Davis, Richard Socher, Caiming Xiong
We propose StartNet to address Online Detection of Action Start (ODAS) where action starts and their associated categories are detected in untrimmed, streaming videos.
no code implementations • ICLR 2019 • Hao Liu, Alexander Trott, Richard Socher, Caiming Xiong
We propose a novel method called competitive experience replay, which efficiently supplements a sparse reward by placing learning in the context of an exploration competition between a pair of agents.
4 code implementations • ICLR 2019 • Chien-Sheng Wu, Richard Socher, Caiming Xiong
In our model, a global memory encoder and a local memory decoder are proposed to share external knowledge.
Ranked #4 on Task-Oriented Dialogue Systems on KVRET
2 code implementations • ICLR 2019 • Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong
The Vision-and-Language Navigation (VLN) task entails an agent following navigational instructions in photo-realistic unknown environments.
Ranked #115 on Vision and Language Navigation on VLN Challenge
no code implementations • ICLR 2019 • Victor Zhong, Caiming Xiong, Nitish Shirish Keskar, Richard Socher
End-to-end neural models have made significant progress in question answering, however recent studies show that these models implicitly assume that the answer and evidence appear close together in a single document.
Ranked #5 on Question Answering on WikiHop
no code implementations • CVPR 2019 • Zuxuan Wu, Caiming Xiong, Chih-Yao Ma, Richard Socher, Larry S. Davis
We present AdaFrame, a framework that adaptively selects relevant frames on a per-input basis for fast video recognition.
no code implementations • ICLR 2019 • Akhilesh Gotmare, Nitish Shirish Keskar, Caiming Xiong, Richard Socher
In particular, we explore knowledge distillation and learning rate heuristics of (cosine) restarts and warmup using mode connectivity and CCA.
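The learning rate heuristics in question combine linear warmup with SGDR-style cosine restarts; a minimal sketch (the step counts are illustrative, not the paper's settings):

```python
import math

def lr_schedule(step, base_lr=0.1, warmup_steps=500, cycle_len=2000, min_lr=0.0):
    """Linear warmup, then cosine-annealing restarts: within each cycle the
    rate decays from base_lr to min_lr along a half-cosine, then jumps
    back to base_lr at the start of the next cycle."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps  # linear warmup
    t = (step - warmup_steps) % cycle_len           # position within the cycle
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t / cycle_len))
```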
no code implementations • 27 Sep 2018 • R. Lily Hu, Caiming Xiong, Richard Socher
We propose a model that learns to perform zero-shot classification using a meta-learner that is trained to produce a correction to the output of a previously trained learner.
no code implementations • ICLR 2019 • Huan Wang, Nitish Shirish Keskar, Caiming Xiong, Richard Socher
In particular, we prove that model generalization ability is related to the Hessian, the higher-order "smoothness" terms characterized by the Lipschitz constant of the Hessian, and the scales of the parameters.
3 code implementations • EMNLP 2018 • Xi Victoria Lin, Richard Socher, Caiming Xiong
Multi-hop reasoning is an effective approach for query answering (QA) over incomplete knowledge graphs (KGs).
no code implementations • EMNLP 2018 • Wojciech Kryściński, Romain Paulus, Caiming Xiong, Richard Socher
Abstractive text summarization aims to shorten long text documents into a human readable form that contains the most important facts from the original document.
Ranked #4 on Text Summarization on CNN / Daily Mail (Anonymized)
no code implementations • ACL 2018 • Victor Zhong, Caiming Xiong, Richard Socher
Dialogue state tracking, which estimates user goals and requests given the dialogue context, is an essential part of task-oriented dialogue systems.
2 code implementations • ICLR 2019 • Ehsan Hosseini-Asl, Yingbo Zhou, Caiming Xiong, Richard Socher
In the low-resource supervised setting, the results show that our approach improves absolute performance by 14% and 4% when adapting SVHN to MNIST and vice versa, respectively, which outperforms unsupervised domain adaptation methods that require high-resource unlabeled target domain data.
6 code implementations • ICLR 2019 • Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, Richard Socher
Though designed for decaNLP, MQAN also achieves state of the art results on the WikiSQL semantic parsing task in the single-task setting.
no code implementations • 18 Jun 2018 • Akhilesh Gotmare, Nitish Shirish Keskar, Caiming Xiong, Richard Socher
Mode connectivity is a recently introduced framework that empirically establishes the connectedness of minima by finding a high accuracy curve between two independently trained models.
1 code implementation • ACL 2018 • Sewon Min, Victor Zhong, Richard Socher, Caiming Xiong
Neural models for question answering (QA) over documents have achieved significant performance improvements.
Ranked #3 on Question Answering on NewsQA
2 code implementations • 19 May 2018 • Victor Zhong, Caiming Xiong, Richard Socher
Dialogue state tracking, which estimates user goals and requests given the dialogue context, is an essential part of task-oriented dialogue systems.
1 code implementation • CVPR 2018 • Luowei Zhou, Yingbo Zhou, Jason J. Corso, Richard Socher, Caiming Xiong
To address this problem, we propose an end-to-end transformer model for dense video captioning.
Ranked #12 on Video Captioning on YouCook2
no code implementations • 27 Mar 2018 • Ehsan Hosseini-Asl, Yingbo Zhou, Caiming Xiong, Richard Socher
Domain adaptation plays an important role for speech recognition models, in particular, for domains that have low resources.
12 code implementations • 22 Mar 2018 • Stephen Merity, Nitish Shirish Keskar, Richard Socher
Many of the leading approaches in language modeling introduce novel, complex and specialized architectures.
1 code implementation • 22 Mar 2018 • Eric Zelikman, Richard Socher
We introduce contextual salience (CoSal), a measure of word importance that uses the distribution of context vectors to normalize distances and weights.
no code implementations • ICLR 2018 • Alexander Trott, Caiming Xiong, Richard Socher
Questions that require counting a variety of objects in images remain a major challenge in visual question answering (VQA).
6 code implementations • 20 Dec 2017 • Nitish Shirish Keskar, Richard Socher
Concretely, we propose SWATS, a simple strategy which switches from Adam to SGD when a triggering condition is satisfied.
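The control flow can be sketched on a toy problem; note the fixed-step trigger below is only a placeholder, since SWATS itself derives the switch point and the SGD learning rate from statistics of Adam's updates:

```python
def train_quadratic(x0=5.0, steps=60, switch_step=30, lr_adam=0.5, lr_sgd=0.1):
    """Minimize f(x) = x^2 with a bare-bones Adam, then switch to plain SGD
    once the (placeholder) trigger fires at a fixed step count."""
    x, m, v = x0, 0.0, 0.0
    b1, b2, eps = 0.9, 0.999, 1e-8
    use_adam = True
    for t in range(1, steps + 1):
        g = 2 * x                      # gradient of x^2
        if t == switch_step:
            use_adam = False           # the "triggering condition" fires
        if use_adam:
            m = b1 * m + (1 - b1) * g
            v = b2 * v + (1 - b2) * g * g
            m_hat = m / (1 - b1 ** t)  # bias-corrected moments
            v_hat = v / (1 - b2 ** t)
            x -= lr_adam * m_hat / (v_hat ** 0.5 + eps)
        else:
            x -= lr_sgd * g            # plain SGD after the switch
    return x
```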
no code implementations • ICLR 2018 • Tianmin Shu, Caiming Xiong, Richard Socher
In order to help the agent learn the complex temporal dependencies necessary for the hierarchical policy, we provide it with a stochastic temporal grammar that modulates when to rely on previously learned skills and when to execute new skills.
no code implementations • ICLR 2018 • Martin Schrimpf, Stephen Merity, James Bradbury, Richard Socher
The process of designing neural architectures requires expert knowledge and extensive trial and error.
no code implementations • ICLR 2018 • Huishuai Zhang, Caiming Xiong, James Bradbury, Richard Socher
Second-order methods for neural network optimization have several advantages over methods based on first-order gradient descent, including better scaling to large mini-batch sizes and fewer updates needed for convergence.
no code implementations • 19 Dec 2017 • Yingbo Zhou, Caiming Xiong, Richard Socher
We augment audio data through random perturbations of tempo, pitch, volume, temporal alignment, and adding random noise. We further investigate the effect of dropout when applied to the inputs of all layers of the network.
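A minimal waveform version of the augmentations listed above (tempo, volume, alignment, noise) can be written with NumPy alone; the perturbation ranges below are illustrative assumptions, and pitch shifting is omitted since it needs a proper resampling-plus-tempo-correction pipeline:

```python
import numpy as np

def augment_audio(wave, rng):
    """Sketch of simple waveform augmentations in the spirit of
    those described above; ranges are assumed, not the paper's."""
    # Tempo: resample by a random factor via linear interpolation.
    factor = rng.uniform(0.9, 1.1)
    n_new = max(1, int(round(len(wave) / factor)))
    wave = np.interp(np.linspace(0, len(wave) - 1, n_new),
                     np.arange(len(wave)), wave)
    # Volume: random gain.
    wave = wave * rng.uniform(0.8, 1.25)
    # Temporal alignment: small random circular shift.
    wave = np.roll(wave, rng.integers(-100, 100))
    # Additive Gaussian noise.
    wave = wave + rng.normal(0.0, 0.005, size=wave.shape)
    return wave
```

Each call draws fresh perturbations, so the same clip yields a different augmented waveform every epoch.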
no code implementations • 19 Dec 2017 • Yingbo Zhou, Caiming Xiong, Richard Socher
However, there is usually a disparity between the maximum-likelihood training objective and the performance metric used in speech recognition, e.g., word error rate (WER).
Ranked #57 on Speech Recognition on LibriSpeech test-clean
no code implementations • WS 2017 • Alexander Rosenberg Johansen, Richard Socher
Many recent advances in deep learning for natural language processing have come at increasing computational cost, but the power of these state-of-the-art models is not needed for every example in a dataset.
2 code implementations • ICLR 2018 • Jiatao Gu, James Bradbury, Caiming Xiong, Victor O. K. Li, Richard Socher
Existing approaches to neural machine translation condition each output word on previously generated outputs.
Ranked #3 on Machine Translation on IWSLT2015 English-German
5 code implementations • ICLR 2018 • Karim Ahmed, Nitish Shirish Keskar, Richard Socher
State-of-the-art results on neural machine translation often use attentional sequence-to-sequence models with some form of convolution or recursion.
Ranked #24 on Machine Translation on WMT2014 English-French
1 code implementation • ICLR 2018 • Caiming Xiong, Victor Zhong, Richard Socher
Traditional models for question answering optimize using cross entropy loss, which encourages exact answers at the cost of penalizing nearby or overlapping answers that are sometimes equally accurate.
Ranked #28 on Question Answering on SQuAD1.1 dev
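The entry above argues that cross entropy penalizes nearby or overlapping answers as harshly as wrong ones. A token-overlap F1 score is the standard way to give such answers partial credit, and a reward of this shape can drive the reinforcement-learning term of a mixed objective (the paper's exact reward may differ in detail):

```python
from collections import Counter

def span_f1(pred_tokens, gold_tokens):
    """Token-overlap F1 between a predicted and a gold answer span.
    Unlike exact match, overlapping answers earn partial credit."""
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting "new york city" against the gold answer "new york" earns 0.8 rather than the zero an exact-match criterion would assign.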
no code implementations • WS 2017 • James Bradbury, Richard Socher
Building models that take advantage of the hierarchical structure of language without a priori annotation is a longstanding goal in natural language processing.
15 code implementations • ICLR 2018 • Victor Zhong, Caiming Xiong, Richard Socher
A significant amount of the world's knowledge is stored in relational databases.
Ranked #9 on Code Generation on WikiSQL
47 code implementations • ICLR 2018 • Stephen Merity, Nitish Shirish Keskar, Richard Socher
Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering.
Ranked #17 on Language Modelling on Penn Treebank (Word Level)
no code implementations • 3 Aug 2017 • Stephen Merity, Bryan McCann, Richard Socher
Both of these techniques require minimal modification to existing RNN architectures and result in performance improvements comparable or superior to more complicated regularization techniques or custom cell architectures.
5 code implementations • NeurIPS 2017 • Bryan McCann, James Bradbury, Caiming Xiong, Richard Socher
For fine-grained sentiment analysis and entailment, CoVe improves performance of our baseline models to the state of the art.
Ranked #9 on Text Classification on TREC-6
10 code implementations • ICLR 2018 • Romain Paulus, Caiming Xiong, Richard Socher
We introduce a neural network model with a novel intra-attention that attends over the input and continuously generated output separately, and a new training method that combines standard supervised word prediction and reinforcement learning (RL).
Ranked #6 on Text Summarization on CNN / Daily Mail (Anonymized)
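The summarization entry above combines supervised word prediction with reinforcement learning. A per-example sketch of such a mixed objective, using a self-critical baseline (sampled reward minus greedy-decode reward), looks like the following; the signature and the blending weight are illustrative assumptions:

```python
def mixed_loss(logp_sample, reward_sample, reward_greedy,
               logp_ml, gamma=0.9984):
    """Sketch of a mixed training objective: a self-critical
    policy-gradient term blended with the usual maximum-likelihood
    term. gamma near 1 weights the RL term heavily."""
    # RL term: raise the log-prob of samples that beat the greedy baseline.
    loss_rl = -(reward_sample - reward_greedy) * logp_sample
    # ML term: standard negative log-likelihood of the reference.
    loss_ml = -logp_ml
    return gamma * loss_rl + (1 - gamma) * loss_ml
```

When the sampled summary out-scores the greedy one, increasing its log-probability lowers the loss; with gamma set to 0 the objective reduces to plain maximum likelihood.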
9 code implementations • CVPR 2017 • Jiasen Lu, Caiming Xiong, Devi Parikh, Richard Socher
The model decides whether to attend to the image and where, in order to extract meaningful information for sequential word generation.
no code implementations • 16 Nov 2016 • Shayne Longpre, Sabeek Pradhan, Caiming Xiong, Richard Socher
LSTMs have become a basic building block for many deep NLP models.
2 code implementations • EMNLP 2017 • Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, Richard Socher
Transfer and multi-task learning have traditionally focused on either a single source-target pair or very few, similar tasks.
Ranked #3 on Chunking on Penn Treebank
6 code implementations • 5 Nov 2016 • Caiming Xiong, Victor Zhong, Richard Socher
Several deep learning models have been proposed for question answering.
Ranked #2 on Open-Domain Question Answering on SQuAD1.1
8 code implementations • 5 Nov 2016 • James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher
Recurrent neural networks are a powerful tool for modeling sequential data, but the dependence of each timestep's computation on the previous timestep's output limits parallelism and makes RNNs unwieldy for very long sequences.
Ranked #15 on Machine Translation on IWSLT2015 German-English
5 code implementations • 4 Nov 2016 • Hakan Inan, Khashayar Khosravi, Richard Socher
Recurrent neural networks have been very successful at predicting sequences of words in tasks such as language modeling.
Ranked #34 on Language Modelling on Penn Treebank (Word Level)
9 code implementations • 26 Sep 2016 • Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher
Recent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies.
11 code implementations • 4 Mar 2016 • Caiming Xiong, Stephen Merity, Richard Socher
Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering.
Ranked #4 on Visual Question Answering (VQA) on VQA v1 test-std
11 code implementations • 24 Jun 2015 • Ankit Kumar, Ozan İrsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, Richard Socher
Most tasks in natural language processing can be cast into question answering (QA) problems over language input.
Ranked #67 on Sentiment Analysis on SST-2 Binary classification
16 code implementations • IJCNLP 2015 • Kai Sheng Tai, Richard Socher, Christopher D. Manning
Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks.
Ranked #1 on Semantic Similarity on SICK
no code implementations • NeurIPS 2014 • Romain Paulus, Richard Socher, Christopher D. Manning
Recursive Neural Networks have recently obtained state of the art performance on several natural language processing tasks.
4 code implementations • EMNLP 2014 • Jeffrey Pennington, Richard Socher, Christopher Manning
Ranked #14 on Only Connect Walls Dataset Task 1 (Grouping) on OCW (using extra training data)
no code implementations • TACL 2014 • Richard Socher, Andrej Karpathy, Quoc V. Le, Christopher D. Manning, Andrew Y. Ng
Previous work on Recursive Neural Networks (RNNs) shows that these models can produce compositional feature vectors for accurately representing and classifying sentences or images.
no code implementations • NeurIPS 2013 • Richard Socher, Danqi Chen, Christopher D. Manning, Andrew Ng
We assess the model by considering the problem of predicting additional true relations between entities given a partial knowledge base.
2 code implementations • NeurIPS 2013 • Richard Socher, Milind Ganjoo, Hamsa Sridhar, Osbert Bastani, Christopher D. Manning, Andrew Y. Ng
This work introduces a model that can recognize objects in images even if no training data is available for the objects.
no code implementations • NeurIPS 2009 • Richard Socher, Samuel Gershman, Per Sederberg, Kenneth Norman, Adler J. Perotte, David M. Blei
We develop a probabilistic model of human memory performance in free recall experiments.