no code implementations • 12 Mar 2024 • Shikhar Murty, Christopher Manning, Peter Shaw, Mandar Joshi, Kenton Lee
Unfortunately, LM agents often fail to generalize to new environments without human demonstrations.
1 code implementation • 29 Oct 2023 • Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning
Recursion is a prominent feature of human language and is fundamentally challenging for self-attention due to the lack of an explicit recursive-state tracking mechanism.
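A toy illustration (not from the paper) of what "recursive-state tracking" means: recognizing nested, Dyck-style bracketing requires remembering what is currently open. An explicit stack makes this trivial, whereas vanilla self-attention has no such mechanism built in.

```python
# Checking balanced nesting, e.g. "(()[()])", with an explicit stack.
def is_balanced(s: str, pairs={"(": ")", "[": "]"}) -> bool:
    stack = []
    for ch in s:
        if ch in pairs:              # opening bracket: push expected closer
            stack.append(pairs[ch])
        elif ch in pairs.values():   # closing bracket: must match top of stack
            if not stack or stack.pop() != ch:
                return False
    return not stack                 # balanced iff nothing left unmatched

assert is_balanced("(()[()])")
assert not is_balanced("(()")
```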
no code implementations • 18 Oct 2023 • Shikhar Murty, Orr Paradise, Pratyusha Sharma
With large language models surpassing human performance on an increasing number of benchmarks, we must take a principled approach to the targeted evaluation of model capabilities.
1 code implementation • 30 May 2023 • Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning
When analyzing the relationship between model-internal properties and grokking, we find that optimal depth for grokking can be identified using the tree-structuredness metric of Murty et al. (2023).
1 code implementation • 16 Nov 2022 • Xinran Zhao, Shikhar Murty, Christopher D. Manning
While advances in pre-training have led to dramatic improvements in few-shot learning of NLP tasks, there is limited understanding of what drives successful few-shot adaptation in datasets.
1 code implementation • 7 Nov 2022 • Shikhar Murty, Christopher D. Manning, Scott Lundberg, Marco Tulio Ribeiro
Current approaches for fixing systematic problems in NLP models (e.g., regex patches, finetuning on more data) are either brittle, or labor-intensive and liable to shortcuts.
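A hypothetical example of why a regex patch is brittle: the pattern, labels, and `patched_label` helper below are illustrative, not from the paper.

```python
import re

# Hypothetical patch: force the "negative" label whenever the literal
# phrase "not good" appears. It fixes the reported failure...
patch = re.compile(r"\bnot good\b", re.IGNORECASE)

def patched_label(text: str, model_label: str) -> str:
    return "negative" if patch.search(text) else model_label

print(patched_label("The movie was not good.", "positive"))      # -> negative (patched)
# ...but silently misses paraphrases of the same underlying problem:
print(patched_label("The movie was not any good.", "positive"))  # -> positive (unpatched)
```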
no code implementations • 2 Nov 2022 • Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning
To evaluate this possibility, we describe an unsupervised and parameter-free method to functionally project the behavior of any transformer into the space of tree-structured networks.
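A hedged sketch of the underlying measurement idea, not the paper's exact procedure: if a transformer computes tree-structured meanings, a span's representation should change little when material outside the span is hidden. Here "span-only context" is approximated by zeroing `attention_mask` outside the span; the model name, span indices (token positions, with subword alignment glossed over), and pooling are assumptions.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def span_repr(text: str, lo: int, hi: int, span_only: bool) -> torch.Tensor:
    enc = tok(text, return_tensors="pt")
    if span_only:
        # hide everything outside the span from attention
        mask = torch.zeros_like(enc["attention_mask"])
        mask[0, lo:hi] = 1
        enc["attention_mask"] = mask
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, dim)
    return hidden[lo:hi].mean(dim=0)                 # mean-pooled span vector

full = span_repr("the quick brown fox jumps", 2, 4, span_only=False)
inside = span_repr("the quick brown fox jumps", 2, 4, span_only=True)
print(torch.cosine_similarity(full, inside, dim=0))  # high => span is context-invariant
```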
no code implementations • NAACL 2021 • Shikhar Murty, Tatsunori B. Hashimoto, Christopher Manning
Meta-learning promises few-shot learners that can adapt to new distributions by repurposing knowledge acquired from previous training.
2 code implementations • ACL 2020 • Shikhar Murty, Pang Wei Koh, Percy Liang
Suppose that, for the task of extracting pairs of spouses from text, we want to specify the inductive bias that married couples typically go on honeymoons.
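A hedged sketch in the spirit of this idea, not the paper's method: score how strongly each input supports the natural-language explanation with a pretrained NLI model, and use those scores as extra features for the downstream extractor. The model name, explanation string, and `explanation_features` wiring are illustrative assumptions.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"
tok = AutoTokenizer.from_pretrained(name)
nli = AutoModelForSequenceClassification.from_pretrained(name).eval()

explanation = "The two people are married and went on a honeymoon."

def explanation_features(sentence: str) -> torch.Tensor:
    enc = tok(sentence, explanation, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = nli(**enc).logits        # contradiction / neutral / entailment
    return logits.softmax(dim=-1)[0]      # 3 extra features per explanation

print(explanation_features("Alice and Bob honeymooned in Rome after their wedding."))
```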
3 code implementations • 12 Dec 2019 • Dzmitry Bahdanau, Harm de Vries, Timothy J. O'Donnell, Shikhar Murty, Philippe Beaudoin, Yoshua Bengio, Aaron Courville
In this work, we study how systematic the generalization of such models is, that is, to what extent they can handle novel combinations of known linguistic constructs.
no code implementations • NAACL 2019 • Pradeep Dasigi, Matt Gardner, Shikhar Murty, Luke Zettlemoyer, Eduard Hovy
Training semantic parsers from question-answer pairs typically involves searching over an exponentially large space of logical forms, and an unguided search can easily be misled by spurious logical forms that coincidentally evaluate to the correct answer.
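A toy illustration of a spurious logical form: both candidates below return the right answer on the training table, but only the first computes it for the right reason. The table and lambda "logical forms" are invented for illustration.

```python
# Question: "Which country is largest?" over a toy (country, area) table.
table = [("canada", 9.98), ("india", 3.29), ("chile", 0.76)]

lf_correct = lambda rows: max(rows, key=lambda r: r[1])[0]  # argmax over area
lf_spurious = lambda rows: rows[0][0]                       # "take the first row"

assert lf_correct(table) == lf_spurious(table) == "canada"  # both look right here

table2 = [("chile", 0.76), ("canada", 9.98)]                # reordered evidence
assert lf_correct(table2) == "canada"
assert lf_spurious(table2) == "chile"                       # spurious form fails
```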
2 code implementations • ICLR 2019 • Dzmitry Bahdanau, Shikhar Murty, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, Aaron Courville
Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated.
no code implementations • CoNLL 2018 • Dung Thai, Sree Harsha Ramesh, Shikhar Murty, Luke Vilnis, Andrew McCallum
Complex textual information extraction tasks are often posed as sequence labeling or shallow parsing, where fields are extracted using local labels made consistent through probabilistic inference in a graphical model with constrained transitions.
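A hedged sketch of what "constrained transitions" buys you: Viterbi decoding over per-token label scores, with a transition mask that forbids invalid BIO sequences (e.g., I-FIELD directly after O). The label set and emission scores are illustrative.

```python
import numpy as np

labels = ["O", "B-FIELD", "I-FIELD"]
allowed = np.array([  # allowed[i, j] = 1 if label j may follow label i
    [1, 1, 0],  # O       -> O or B-FIELD (never straight into I-FIELD)
    [1, 1, 1],  # B-FIELD -> anything
    [1, 1, 1],  # I-FIELD -> anything
])

def constrained_viterbi(emissions: np.ndarray) -> list:
    T, L = emissions.shape
    trans = np.where(allowed == 1, 0.0, -1e9)  # forbidden moves get -inf
    score = emissions[0].copy()
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        total = score[:, None] + trans + emissions[t][None, :]
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [labels[i] for i in reversed(path)]

# Per-token greedy argmax would pick the invalid O -> I-FIELD here;
# the transition mask repairs it into a valid labeling.
emissions = np.log(np.array([[0.6, 0.3, 0.1],
                             [0.2, 0.3, 0.5],
                             [0.5, 0.2, 0.3]]))
print(constrained_viterbi(emissions))  # ['O', 'B-FIELD', 'O']
```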
no code implementations • ACL 2018 • Luke Vilnis, Xiang Li, Shikhar Murty, Andrew McCallum
Embedding methods which enforce a partial order or lattice structure over the concept space, such as Order Embeddings (OE) (Vendrov et al., 2016), are a natural way to model transitive relational data (e.g., entailment graphs).
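A minimal sketch of the order-violation penalty from Order Embeddings (Vendrov et al., 2016): embeddings live in the positive orthant, x entails y iff y <= x coordinate-wise (more general concepts sit closer to the origin), and violations of that order are penalized. The toy vectors are invented.

```python
import numpy as np

def order_violation(x: np.ndarray, y: np.ndarray) -> float:
    """E(x, y) = ||max(0, y - x)||^2; zero iff y <= x coordinate-wise."""
    return float(np.sum(np.maximum(0.0, y - x) ** 2))

dog = np.array([2.0, 3.0])      # specific concept: large coordinates
animal = np.array([1.0, 1.5])   # general concept: closer to the origin

print(order_violation(dog, animal))   # 0.0 -> "dog entails animal" holds
print(order_violation(animal, dog))   # > 0 -> reverse direction is violated
```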
no code implementations • 15 Nov 2017 • Shikhar Murty, Patrick Verga, Luke Vilnis, Andrew McCallum
We consider the challenging problem of entity typing over an extremely fine-grained set of types, wherein a single mention or entity can have many simultaneous and often hierarchically-structured types.
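One common parameterization for hierarchically structured type sets, sketched here as an assumption rather than the paper's model: embed each type as the sum of its ancestors' vectors, so related types share parameters and a mention scored highly for a leaf type also gets credit for its ancestors. The toy hierarchy and dimensions are invented.

```python
import numpy as np

hierarchy = {                      # child -> parent (toy hierarchy)
    "/person": None,
    "/person/artist": "/person",
    "/person/artist/singer": "/person/artist",
}
rng = np.random.default_rng(0)
base = {t: rng.normal(size=8) for t in hierarchy}

def type_vector(t: str) -> np.ndarray:
    vec = np.zeros(8)
    while t is not None:           # accumulate vectors up the hierarchy
        vec += base[t]
        t = hierarchy[t]
    return vec

def score(mention_vec: np.ndarray, t: str) -> float:
    return float(mention_vec @ type_vector(t))

mention = type_vector("/person/artist/singer") + rng.normal(scale=0.1, size=8)
print(score(mention, "/person/artist/singer"))  # typically the highest
print(score(mention, "/person/artist"))         # still high: shared ancestors
```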
no code implementations • 2 Aug 2017 • Dung Thai, Shikhar Murty, Trapit Bansal, Luke Vilnis, David Belanger, Andrew McCallum
In textual information extraction and other sequence labeling tasks it is now common to use recurrent neural networks (such as LSTMs) to form rich embedded representations of long-term input co-occurrence patterns.
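A minimal sketch of the standard recipe this sentence describes: a bidirectional LSTM builds contextual token representations, and a linear layer emits per-token label scores. The sizes and toy batch are illustrative.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab=1000, dim=64, hidden=128, n_labels=5):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, token_ids):              # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))  # (batch, seq_len, 2*hidden)
        return self.out(h)                     # per-token label scores

tagger = BiLSTMTagger()
scores = tagger(torch.randint(0, 1000, (2, 7)))
print(scores.shape)  # torch.Size([2, 7, 5])
```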
2 code implementations • 2 Jun 2017 • Prachi Jain, Shikhar Murty, Mausam, Soumen Chakrabarti
If not, what characteristics of a dataset determine the performance of matrix factorization (MF) and tensor factorization (TF) models?
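A hedged sketch of the two model families being compared, with invented toy embeddings: TF scores (subject, relation, object) triples compositionally from separate entity and relation vectors (a DistMult-style form is shown), while MF embeds each (subject, object) pair as an atomic unit, so it cannot score pairs unseen in training.

```python
import numpy as np

d = 16
rng = np.random.default_rng(0)
ent = {e: rng.normal(size=d) for e in ["paris", "france"]}
rel = {"capital_of": rng.normal(size=d)}
pair = {("paris", "france"): rng.normal(size=d)}   # MF's atomic unit
rel_mf = {"capital_of": rng.normal(size=d)}

def score_tf(s, r, o):   # composes any entity pair from entity vectors
    return float(np.sum(ent[s] * rel[r] * ent[o]))

def score_mf(s, r, o):   # needs the (s, o) pair observed in training
    return float(pair[(s, o)] @ rel_mf[r])

print(score_tf("paris", "capital_of", "france"))
print(score_mf("paris", "capital_of", "france"))
```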