Search Results for author: Shikhar Murty

Found 17 papers, 8 papers with code

Pushdown Layers: Encoding Recursive Structure in Transformer Language Models

1 code implementation • 29 Oct 2023 • Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning

Recursion is a prominent feature of human language, and fundamentally challenging for self-attention due to the lack of an explicit recursive-state tracking mechanism.

Text Classification
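
The paper's remedy is a stack tape of per-token recursion depths, updated synchronously with generation, that modulates self-attention. The toy sketch below illustrates only the depth-biased attention part, using gold bracket depths and an ad-hoc bias form of my own rather than the paper's learned, predicted stack updates:

```python
import numpy as np

def token_depths(tokens):
    """Stack depth of each token in a bracketed sequence: '(' pushes, ')' pops."""
    depth, depths = 0, []
    for t in tokens:
        if t == "(":
            depth += 1
        depths.append(depth)
        if t == ")":
            depth -= 1
    return np.array(depths)

def depth_biased_attention(Q, K, V, depths, bias_scale=1.0):
    """Causal scaled dot-product attention with an additive bias that
    discourages attending across large recursion-depth differences."""
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)
    logits = logits - bias_scale * np.abs(depths[:, None] - depths[None, :])
    logits[np.triu(np.ones_like(logits, dtype=bool), 1)] = -np.inf  # causal mask
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return (weights / weights.sum(axis=-1, keepdims=True)) @ V

tokens = list("((a)(b))")
depths = token_depths(tokens)
x = np.random.default_rng(0).normal(size=(len(tokens), 8))
print(depth_biased_attention(x, x, x, depths).shape)  # (8, 8)
```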

Pseudointelligence: A Unifying Framework for Language Model Evaluation

no code implementations • 18 Oct 2023 • Shikhar Murty, Orr Paradise, Pratyusha Sharma

With large language models surpassing human performance on an increasing number of benchmarks, we must take a principled approach for targeted evaluation of model capabilities.

Language Modelling

Grokking of Hierarchical Structure in Vanilla Transformers

1 code implementation • 30 May 2023 • Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning

When analyzing the relationship between model-internal properties and grokking, we find that optimal depth for grokking can be identified using the tree-structuredness metric of Murty et al. (2023).

On Measuring the Intrinsic Few-Shot Hardness of Datasets

1 code implementation • 16 Nov 2022 • Xinran Zhao, Shikhar Murty, Christopher D. Manning

While advances in pre-training have led to dramatic improvements in few-shot learning of NLP tasks, there is limited understanding of what drives successful few-shot adaptation in datasets.

Few-Shot Learning

Fixing Model Bugs with Natural Language Patches

1 code implementation • 7 Nov 2022 • Shikhar Murty, Christopher D. Manning, Scott Lundberg, Marco Tulio Ribeiro

Current approaches for fixing systematic problems in NLP models (e.g. regex patches, finetuning on more data) are either brittle, or labor-intensive and liable to shortcuts.

Relation Extraction Sentiment Analysis

Characterizing Intrinsic Compositionality in Transformers with Tree Projections

no code implementations • 2 Nov 2022 • Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning

To evaluate this possibility, we describe an unsupervised and parameter-free method to functionally project the behavior of any transformer into the space of tree-structured networks.

Sentence
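
In rough outline, the recipe scores every span by how invariant its representation is to outside context, then charts the best-scoring binary tree. The sketch below follows that outline with a stand-in encoder and a scoring function of my own devising; it is not the paper's released code:

```python
import numpy as np

def embed(tok):
    # Deterministic toy embedding; a real analysis uses the transformer under study.
    return np.sin(np.arange(16) * (sum(map(ord, tok)) % 97 + 1))

def encode(tokens):
    """Stand-in 'contextual' encoder: each token's vector mixes in its neighbors."""
    vecs = np.array([embed(t) for t in tokens])
    ctx = vecs.copy()
    ctx[1:] += 0.3 * vecs[:-1]
    ctx[:-1] += 0.3 * vecs[1:]
    return ctx

def span_score(tokens, i, j, full):
    """Higher when span (i, j) looks the same encoded with vs. without its
    outside context, i.e. when it behaves like a constituent."""
    return -float(np.linalg.norm(full[i:j] - encode(tokens[i:j])))

def tree_projection(tokens):
    """CKY-style search for the binary tree maximizing total span scores."""
    n, full = len(tokens), encode(tokens)
    score = {(i, i + 1): 0.0 for i in range(n)}
    back = {}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            k = max(range(i + 1, j), key=lambda k: score[(i, k)] + score[(k, j)])
            score[(i, j)] = score[(i, k)] + score[(k, j)] + span_score(tokens, i, j, full)
            back[(i, j)] = k

    def build(i, j):
        if j - i == 1:
            return tokens[i]
        k = back[(i, j)]
        return (build(i, k), build(k, j))

    return build(0, n)

print(tree_projection("the cat sat on the mat".split()))
```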

DReCa: A General Task Augmentation Strategy for Few-Shot Natural Language Inference

no code implementations • NAACL 2021 • Shikhar Murty, Tatsunori B. Hashimoto, Christopher Manning

Meta-learning promises few-shot learners that can adapt to new distributions by repurposing knowledge acquired from previous training.

Clustering Few-Shot NLI +2

ExpBERT: Representation Engineering with Natural Language Explanations

2 code implementations • ACL 2020 • Shikhar Murty, Pang Wei Koh, Percy Liang

Suppose we want to specify the inductive bias that married couples typically go on honeymoons for the task of extracting pairs of spouses from text.

Inductive Bias Relation Extraction +1
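
The ExpBERT recipe, roughly: run a pretrained encoder over (input, explanation) pairs so each explanation is "interpreted" against the input, then concatenate those interpretation vectors into the features a downstream classifier consumes. A minimal sketch with a stand-in encoder; the explanations and all names are illustrative, not from the released implementation:

```python
import numpy as np

# Hypothetical explanations for spouse extraction, in the spirit of the
# paper's honeymoon example (illustrative only).
EXPLANATIONS = [
    "{person1} and {person2} went on a honeymoon",
    "{person1} and {person2} have children together",
]

def encode_pair(text, explanation, dim=32):
    """Stand-in for BERT run over (input, explanation); in ExpBERT this vector
    'interprets' whether the explanation holds for the input."""
    seed = abs(hash((text, explanation))) % (2**32)
    return np.random.default_rng(seed).normal(size=dim)

def featurize(text, dim=32):
    """One interpretation vector per explanation, plus plain input features,
    concatenated -- the representation fed to a small downstream classifier."""
    parts = [encode_pair(text, e, dim) for e in EXPLANATIONS]
    parts.append(encode_pair(text, "", dim))  # base input representation
    return np.concatenate(parts)

x = featurize("Alice and Bob celebrated their honeymoon in Paris.")
print(x.shape)  # (96,): len(EXPLANATIONS) + 1 blocks of dimension 32
```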

CLOSURE: Assessing Systematic Generalization of CLEVR Models

3 code implementations • 12 Dec 2019 • Dzmitry Bahdanau, Harm de Vries, Timothy J. O'Donnell, Shikhar Murty, Philippe Beaudoin, Yoshua Bengio, Aaron Courville

In this work, we study how systematic the generalization of such models is, that is, to what extent they are capable of handling novel combinations of known linguistic constructs.

Few-Shot Learning Systematic Generalization +1

Iterative Search for Weakly Supervised Semantic Parsing

no code implementations • NAACL 2019 • Pradeep Dasigi, Matt Gardner, Shikhar Murty, Luke Zettlemoyer, Eduard Hovy

Training semantic parsers from question-answer pairs typically involves searching over an exponentially large space of logical forms, and an unguided search can easily be misled by spurious logical forms that coincidentally evaluate to the correct answer.

Semantic Parsing Visual Reasoning

Systematic Generalization: What Is Required and Can It Be Learned?

2 code implementations • ICLR 2019 • Dzmitry Bahdanau, Shikhar Murty, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, Aaron Courville

Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated.

Systematic Generalization Visual Question Answering (VQA)

Embedded-State Latent Conditional Random Fields for Sequence Labeling

no code implementations • CoNLL 2018 • Dung Thai, Sree Harsha Ramesh, Shikhar Murty, Luke Vilnis, Andrew McCallum

Complex textual information extraction tasks are often posed as sequence labeling or shallow parsing, where fields are extracted using local labels made consistent through probabilistic inference in a graphical model with constrained transitions.
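
The baseline machinery referred to here is Viterbi decoding in which forbidden label transitions score effectively minus infinity, so the argmax path can never violate the constraints. A self-contained sketch with toy BIO constraints (not this paper's latent-variable model):

```python
import numpy as np

NEG = -1e9  # stands in for -inf: scores a transition as effectively impossible

def viterbi(emissions, transitions, start=None):
    """Exact MAP decoding for sequence labeling. Disallowed transitions carry
    NEG scores, so the decoded sequence can never violate the constraints."""
    T, L = emissions.shape
    score = emissions[0] + (np.zeros(L) if start is None else start)
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)  # best previous label for each current label
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy BIO tagging: labels O=0, B=1, I=2; "I" may only follow "B" or "I".
transitions = np.array([
    [0.0, 0.0, NEG],  # from O: O -> I is forbidden
    [0.0, 0.0, 0.0],  # from B
    [0.0, 0.0, 0.0],  # from I
])
start = np.array([0.0, 0.0, NEG])  # a sequence may not start with I
emissions = np.random.default_rng(0).normal(size=(6, 3))
print(viterbi(emissions, transitions, start))  # a label path with no O -> I
```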

Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures

no code implementations • ACL 2018 • Luke Vilnis, Xiang Li, Shikhar Murty, Andrew McCallum

Embedding methods which enforce a partial order or lattice structure over the concept space, such as Order Embeddings (OE) (Vendrov et al., 2016), are a natural way to model transitive relational data (e.g. entailment graphs).

Inductive Bias Knowledge Graphs +1
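
The core construction is compact: each concept is an axis-aligned box, intersection of boxes is the lattice meet, and volumes yield graded entailment probabilities. A minimal sketch of that geometry with toy coordinates, not the paper's learned parameters:

```python
import numpy as np

class Box:
    """An axis-aligned box in [0, 1]^d; volume plays the role of probability."""
    def __init__(self, lo, hi):
        self.lo, self.hi = np.asarray(lo, float), np.asarray(hi, float)

    def volume(self):
        return float(np.prod(np.clip(self.hi - self.lo, 0.0, None)))

    def meet(self, other):
        """Lattice meet: boxes are closed under intersection."""
        return Box(np.maximum(self.lo, other.lo), np.minimum(self.hi, other.hi))

def p_entails(a, b):
    """P(b | a) = vol(a meet b) / vol(a): graded degree to which a entails b."""
    va = a.volume()
    return a.meet(b).volume() / va if va > 0 else 0.0

animal = Box([0.0, 0.0], [0.8, 0.9])
dog = Box([0.1, 0.2], [0.4, 0.5])  # contained in `animal`
print(p_entails(dog, animal))  # 1.0: dog lies inside animal, full entailment
print(p_entails(animal, dog))  # 0.125: animal only partially entails dog
```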

Finer Grained Entity Typing with TypeNet

no code implementations • 15 Nov 2017 • Shikhar Murty, Patrick Verga, Luke Vilnis, Andrew McCallum

We consider the challenging problem of entity typing over an extremely fine grained set of types, wherein a single mention or entity can have many simultaneous and often hierarchically-structured types.

Entity Typing

Low-Rank Hidden State Embeddings for Viterbi Sequence Labeling

no code implementations • 2 Aug 2017 • Dung Thai, Shikhar Murty, Trapit Bansal, Luke Vilnis, David Belanger, Andrew McCallum

In textual information extraction and other sequence labeling tasks it is now common to use recurrent neural networks (such as LSTM) to form rich embedded representations of long-term input co-occurrence patterns.

Named Entity Recognition +1
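
The low-rank idea in the title can be illustrated as factoring the label-transition score matrix through small per-label embeddings, which keeps Viterbi decoding cheap for large label sets. Shapes and names below are my own simplification:

```python
import numpy as np

L, r = 50, 8  # label-set size and rank, with r << L
rng = np.random.default_rng(0)
U = rng.normal(size=(L, r))  # "from-label" embeddings
V = rng.normal(size=(L, r))  # "to-label" embeddings

# Transition scores are induced, never stored as L*L free parameters:
transitions = U @ V.T        # 2*L*r parameters instead of L**2
print(transitions.shape)     # (50, 50), usable directly in Viterbi decoding
```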
