Search Results for author: Mark Johnson

Found 81 papers, 25 papers with code

Open-Domain Contextual Link Prediction and its Complementarity with Entailment Graphs

1 code implementation Findings (EMNLP) 2021 Mohammad Javad Hosseini, Shay B. Cohen, Mark Johnson, Mark Steedman

In this paper, we introduce the new task of open-domain contextual link prediction which has access to both the textual context and the KG structure to perform link prediction.

Link Prediction

Mastering the Craft of Data Synthesis for CodeLLMs

no code implementations 16 Oct 2024 Meng Chen, Philip Arthur, Qianyu Feng, Cong Duy Vu Hoang, Yu-Heng Hong, Mahdi Kazemi Moghaddam, Omid Nezami, Thien Nguyen, Gioacchino Tangari, Duy Vu, Thanh Vu, Mark Johnson, Krishnaram Kenthapadi, Don Dharmasiri, Long Duong, Yuan-Fang Li

Large language models (LLMs) have shown impressive performance in \emph{code} understanding and generation, making coding tasks a key focus for researchers due to their practical applications and value as a testbed for LLM evaluation.

A Language-agnostic Model of Child Language Acquisition

no code implementations 22 Aug 2024 Louis Mahon, Omri Abend, Uri Berger, Katherine Demuth, Mark Johnson, Mark Steedman

This work reimplements a recent semantic bootstrapping child-language acquisition model, which was originally designed for English, and trains it to learn a new language: Hebrew.

Language Acquisition

Sources of Hallucination by Large Language Models on Inference Tasks

1 code implementation 23 May 2023 Nick McKenna, Tianyi Li, Liang Cheng, Mohammad Javad Hosseini, Mark Johnson, Mark Steedman

Large Language Models (LLMs) are claimed to be capable of Natural Language Inference (NLI), necessary for applied tasks like question answering and summarization.

Hallucination, Memorization +2

Smoothing Entailment Graphs with Language Models

1 code implementation 30 Jul 2022 Nick McKenna, Tianyi Li, Mark Johnson, Mark Steedman

The diversity and Zipfian frequency distribution of natural language predicates in corpora lead to sparsity in Entailment Graphs (EGs) built by Open Relation Extraction (ORE).

Diversity, Explainable Models +3

Three-dimensional Cooperative Localization of Commercial-Off-The-Shelf Sensors

no code implementations 3 Nov 2021 Yulong Wang, Shenghong Li, Wei Ni, David Abbott, Mark Johnson, Guangyu Pei, Mark Hedley

We propose an efficient approach to solve the corresponding permutation combinatorial optimization problem, which integrates continuous space cooperative localization and permutation space likelihood ascent search.

Combinatorial Optimization

Blindness to Modality Helps Entailment Graph Mining

1 code implementation EMNLP (insights) 2021 Liane Guillou, Sander Bijl de Vroe, Mark Johnson, Mark Steedman

Understanding linguistic modality is widely seen as important for downstream tasks such as Question Answering and Knowledge Graph Population.

Graph Learning, Graph Mining +1

Incorporating Temporal Information in Entailment Graph Mining

1 code implementation COLING (TextGraphs) 2020 Liane Guillou, Sander Bijl de Vroe, Mohammad Javad Hosseini, Mark Johnson, Mark Steedman

We present a novel method for injecting temporality into entailment graphs to address the problem of spurious entailments, which may arise from similar but temporally distinct events involving the same pair of entities.

Graph Mining

Mention Flags (MF): Constraining Transformer-based Text Generators

1 code implementation ACL 2021 YuFei Wang, Ian Wood, Stephen Wan, Mark Dras, Mark Johnson

In this paper, we propose Mention Flags (MF), which traces whether lexical constraints are satisfied in the generated outputs in an S2S decoder.

Common Sense Reasoning, Decoder +1
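The flag mechanism is concrete enough to sketch: each lexical constraint carries a state that flips from unsatisfied (1) to satisfied (2) as decoding proceeds, with 0 reserved for non-constraint input tokens. The toy function below tracks only the state transitions (its name is hypothetical); the actual model injects these flags as embeddings into the S2S decoder's cross-attention.

```python
def mention_flag_trace(constraints, output_tokens):
    # Track Mention-Flag states per decoding step: 1 = constraint not yet
    # generated, 2 = constraint already generated (0, for non-constraint
    # input tokens, is omitted from this toy version).
    emitted = set()
    trace = []
    for tok in output_tokens:
        emitted.add(tok)
        trace.append({c: 2 if c in emitted else 1 for c in constraints})
    return trace

trace = mention_flag_trace(["dog", "ball"], ["the", "dog", "chases", "a", "ball"])
print(trace[1])    # → {'dog': 2, 'ball': 1}
print(trace[-1])   # → {'dog': 2, 'ball': 2}
```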

Neural Rule-Execution Tracking Machine For Transformer-Based Text Generation

no code implementations NeurIPS 2021 YuFei Wang, Can Xu, Huang Hu, Chongyang Tao, Stephen Wan, Mark Dras, Mark Johnson, Daxin Jiang

Sequence-to-Sequence (S2S) neural text generation models, especially the pre-trained ones (e.g., BART and T5), have exhibited compelling performance on various natural language generation tasks.

Text Generation

Integrating Lexical Information into Entity Neighbourhood Representations for Relation Prediction

1 code implementation NAACL 2021 Ian Wood, Mark Johnson, Stephen Wan

OpenKi[1] addresses this task by extracting named entities and predicates via OpenIE tools, then learning relation embeddings from the resulting entity-relation graph for relation prediction, outperforming previous approaches.

Knowledge Graph Completion, Relation +1

There and Back Again: Self-supervised Multispectral Correspondence Estimation

no code implementations 19 Mar 2021 Celyn Walters, Oscar Mendez, Mark Johnson, Richard Bowden

In this work, we aim to address the dense correspondence estimation problem in a way that generalizes to more than one spectrum.

Autonomous Vehicles

ECOL-R: Encouraging Copying in Novel Object Captioning with Reinforcement Learning

no code implementations EACL 2021 YuFei Wang, Ian D. Wood, Stephen Wan, Mark Johnson

In this paper, we focus on this challenge and propose the ECOL-R model (Encouraging Copying of Object Labels with Reinforced Learning), a copy-augmented transformer model that is encouraged to accurately describe the novel object labels.

Image Captioning, Object +2

Detecting and Exorcising Statistical Demons from Language Models with Anti-Models of Negative Data

no code implementations 22 Oct 2020 Michael L. Wick, Kate Silverstein, Jean-Baptiste Tristan, Adam Pocock, Mark Johnson

Indeed, self-supervised language models trained on "positive" examples of English text generalize in desirable ways to many natural language tasks.

Inductive Bias

Improving Disfluency Detection by Self-Training a Self-Attentive Model

no code implementations ACL 2020 Paria Jamshid Lou, Mark Johnson

However, we show that self-training, a semi-supervised technique for incorporating unlabeled data, sets a new state of the art for the self-attentive parser on disfluency detection, demonstrating that self-training provides benefits orthogonal to the pre-trained contextualized word representations.

Word Embeddings

Duality of Link Prediction and Entailment Graph Induction

1 code implementation ACL 2019 Mohammad Javad Hosseini, Shay B. Cohen, Mark Johnson, Mark Steedman

The new entailment score outperforms prior state-of-the-art results on a standard entailment dataset, and the new link prediction scores show improvements over the raw link prediction scores.

Link Prediction

How to best use Syntax in Semantic Role Labelling

1 code implementation ACL 2019 Yufei Wang, Mark Johnson, Stephen Wan, Yifang Sun, Wei Wang

There are many different ways in which external information might be used in an NLP task.

Semantic Role Labeling

nocaps: novel object captioning at scale

2 code implementations ICCV 2019 Harsh Agrawal, Karan Desai, YuFei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson

To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task.

Image Captioning, Object +2

Improving Topic Models with Latent Feature Word Representations

no code implementations TACL 2015 Dat Quoc Nguyen, Richard Billingsley, Lan Du, Mark Johnson

Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks.

Clustering, Document Classification +2

Disfluency Detection using a Noisy Channel Model and a Deep Neural Language Model

no code implementations ACL 2017 Paria Jamshid Lou, Mark Johnson

This paper presents a model for disfluency detection in spontaneous speech transcripts called the LSTM Noisy Channel Model.

Language Modelling
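The noisy channel factorization behind this line of work is easy to sketch: the most probable fluent source y for a disfluent observation x maximizes P(x | y) · P(y), a channel model times a language-model prior. A toy illustration in Python, with made-up log-probabilities standing in for the paper's channel model and LSTM language model:

```python
import math

def best_fluent(candidates, channel_logp, lm_logp):
    # Noisy-channel decoding: pick y maximizing log P(x|y) + log P(y).
    return max(candidates, key=lambda y: channel_logp[y] + lm_logp[y])

# Observed disfluent string: "I want a flight to Boston uh to Denver".
candidates = ["i want a flight to denver",
              "i want a flight to boston to denver"]
channel_logp = {candidates[0]: math.log(0.3),   # channel prefers copying more
                candidates[1]: math.log(0.6)}
lm_logp = {candidates[0]: math.log(0.010),      # but the LM strongly prefers
           candidates[1]: math.log(0.001)}      # the repaired string
print(best_fluent(candidates, channel_logp, lm_logp))   # → i want a flight to denver
```

The LM prior outweighs the channel's preference for copying, so the disfluent span is removed.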

Disfluency Detection using Auto-Correlational Neural Networks

4 code implementations EMNLP 2018 Paria Jamshid Lou, Peter Anderson, Mark Johnson

In recent years, the natural language processing community has moved away from task-specific feature engineering, i.e., researchers discovering ad-hoc feature representations for various tasks, in favor of general-purpose methods that learn the input representation by themselves.

Feature Engineering

Predicting accuracy on large datasets from smaller pilot data

no code implementations ACL 2018 Mark Johnson, Peter Anderson, Mark Dras, Mark Steedman

Because obtaining training data is often the most difficult part of an NLP or ML project, we develop methods for predicting how much data is required to achieve a desired test accuracy by extrapolating results from models trained on a small pilot training dataset.

Document Classification
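One standard way to do such extrapolation (the paper compares several functional forms) is to fit a power law to the pilot error rates and evaluate it at the target data size. A pure-Python sketch with hypothetical pilot numbers:

```python
import math

def fit_power_law(sizes, errors):
    # Least-squares fit of error(n) ≈ b * n**(-c) in log-log space.
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope   # (b, c)

# Hypothetical pilot runs: test error on 1k / 2k / 4k training examples.
b, c = fit_power_law([1000, 2000, 4000], [0.200, 0.140, 0.098])
predicted_error = b * 100_000 ** (-c)   # extrapolate to 100k examples
print(round(predicted_error, 3))        # → 0.019
```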

Partially-Supervised Image Captioning

no code implementations NeurIPS 2018 Peter Anderson, Stephen Gould, Mark Johnson

To address this problem, we teach image captioning models new visual concepts from labeled images and object detection datasets.

Image Captioning, Object +3

AMR Dependency Parsing with a Typed Semantic Algebra

no code implementations ACL 2018 Jonas Groschwitz, Matthias Lindemann, Meaghan Fowlie, Mark Johnson, Alexander Koller

We present a semantic parser for Abstract Meaning Representations which learns to parse strings into tree representations of the compositional structure of an AMR graph.

Dependency Parsing

Learning Typed Entailment Graphs with Global Soft Constraints

1 code implementation TACL 2018 Mohammad Javad Hosseini, Nathanael Chambers, Siva Reddy, Xavier R. Holt, Shay B. Cohen, Mark Johnson, Mark Steedman

We instead propose a scalable method that learns globally consistent similarity scores based on new soft constraints that consider both the structures across typed entailment graphs and inside each graph.

Graph Learning

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments

8 code implementations CVPR 2018 Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton Van Den Hengel

This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering.

Reinforcement Learning, Translation +3

From Word Segmentation to POS Tagging for Vietnamese

1 code implementation ALTA 2017 Dat Quoc Nguyen, Thanh Vu, Dai Quoc Nguyen, Mark Dras, Mark Johnson

This paper presents an empirical comparison of two strategies for Vietnamese Part-of-Speech (POS) tagging from unsegmented text: (i) a pipeline strategy where we consider the output of a word segmenter as the input of a POS tagger, and (ii) a joint strategy where we predict a combined segmentation and POS tag for each syllable.

Part-Of-Speech Tagging, POS +2

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

65 code implementations CVPR 2018 Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning.

Image Captioning, Visual Question Answering

Unsupervised Text Segmentation Based on Native Language Characteristics

no code implementations ACL 2017 Shervin Malmasi, Mark Dras, Mark Johnson, Lan Du, Magdalena Wolska

Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language.

Segmentation, Text Segmentation

Idea density for predicting Alzheimer's disease from transcribed speech

no code implementations CONLL 2017 Kairit Sirts, Olivier Piguet, Mark Johnson

ID has been used in two different versions: propositional idea density (PID) counts the expressed ideas and can be applied to any text while semantic idea density (SID) counts pre-defined information content units and is naturally more applicable to normative domains, such as picture description tasks.

Clustering
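Propositional idea density is often approximated, CPIDR-style, as the fraction of words expressing propositions (verbs, adjectives, adverbs, prepositions, conjunctions). A toy sketch over POS-tagged tokens; the tag set and the per-word ratio here are illustrative, not the paper's exact procedure:

```python
def propositional_idea_density(tagged_tokens):
    # CPIDR-style approximation: count verbs, adjectives, adverbs,
    # prepositions and conjunctions as "ideas", divide by word count.
    PROP_TAGS = {"VERB", "ADJ", "ADV", "ADP", "CCONJ", "SCONJ"}
    ideas = sum(1 for _, tag in tagged_tokens if tag in PROP_TAGS)
    return ideas / len(tagged_tokens)

tagged = [("the", "DET"), ("old", "ADJ"), ("man", "NOUN"),
          ("walked", "VERB"), ("slowly", "ADV"), ("home", "NOUN")]
print(propositional_idea_density(tagged))   # → 0.5
```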

Search Personalization with Embeddings

1 code implementation 12 Dec 2016 Thanh Vu, Dat Quoc Nguyen, Mark Johnson, Dawei Song, Alistair Willis

Recent research has shown that the performance of search personalization depends on the richness of user profiles which normally represent the user's topical interests.

Grammar induction from (lots of) words alone

1 code implementation COLING 2016 John K Pate, Mark Johnson

We show that grammar induction from words alone is in fact feasible when the model is provided with sufficient training data, and present two new streaming or mini-batch algorithms for PCFG inference that can learn from millions of words of training data.

Language Acquisition, POS +2

An empirical study for Vietnamese dependency parsing

no code implementations ALTA 2016 Dat Quoc Nguyen, Mark Dras, Mark Johnson

This paper presents an empirical comparison of different dependency parsers for Vietnamese, which has some unusual characteristics such as copula drop and verb serialization.

Dependency Parsing

SPICE: Semantic Propositional Image Caption Evaluation

11 code implementations 29 Jul 2016 Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould

There is considerable interest in the task of automatically generating image captions.

Image Captioning
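At its core, SPICE scores a caption by F1 overlap between semantic proposition tuples extracted from candidate and reference scene graphs. A minimal sketch using exact set matching (the real metric parses captions into scene graphs and matches tuples with WordNet synonyms):

```python
def tuple_f1(candidate, reference):
    # F1 over semantic proposition tuples (objects, attributes, relations).
    cand, ref = set(candidate), set(reference)
    matched = len(cand & ref)
    if matched == 0:
        return 0.0
    p, r = matched / len(cand), matched / len(ref)
    return 2 * p * r / (p + r)

# Tuples a scene-graph parser might extract (illustrative, not SPICE's parser).
cand = {("girl",), ("girl", "young"), ("girl", "standing")}
ref = {("girl",), ("girl", "young"), ("girl", "jumping"), ("racket",)}
print(round(tuple_f1(cand, ref), 3))   # → 0.571
```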

STransE: a novel embedding model of entities and relationships in knowledge bases

1 code implementation NAACL 2016 Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, Mark Johnson

Knowledge bases of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks.

Knowledge Base Completion, Link Prediction +1
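STransE scores a triple (h, r, t) as ||W_{r,1}h + r − W_{r,2}t||, combining relation-specific projection matrices with TransE's translation vector; lower scores mean more plausible triples. A dependency-free sketch of the scoring function (the vectors and matrices below are toy values, not trained embeddings):

```python
def stranse_score(h, r, t, W1, W2):
    # STransE score ||W1·h + r − W2·t||_1: lower means more plausible.
    # TransE is the special case where W1 and W2 are identity matrices.
    def matvec(W, v):
        return [sum(w * x for w, x in zip(row, v)) for row in W]
    lhs, rhs = matvec(W1, h), matvec(W2, t)
    return sum(abs(a + b - c) for a, b, c in zip(lhs, r, rhs))

# Toy 2-d example with identity projections and r = t − h: score is 0.
I2 = [[1.0, 0.0], [0.0, 1.0]]
h, t = [0.25, -0.5], [0.5, 0.25]
r = [t[0] - h[0], t[1] - h[1]]
print(stranse_score(h, r, t, I2, I2))   # → 0.0
```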

Bridging the gap between speech technology and natural language processing: an evaluation toolbox for term discovery systems

no code implementations LREC 2014 Bogdan Ludusan, Maarten Versteegh, Aren Jansen, Guillaume Gravier, Xuan-Nga Cao, Mark Johnson, Emmanuel Dupoux

The unsupervised discovery of linguistic terms from either continuous phoneme transcriptions or from raw speech has seen increasing interest in recent years, from both a theoretical and a practical standpoint.

Language Acquisition

Parsing entire discourses as very long strings: Capturing topic continuity in grounded language learning

no code implementations TACL 2013 Minh-Thang Luong, Michael C. Frank, Mark Johnson

Grounded language learning, the task of mapping from natural language to a representation of meaning, has attracted more and more interest in recent years.

Grounded language learning, Sentence

Synergies in learning words and their referents

no code implementations NeurIPS 2010 Mark Johnson, Katherine Demuth, Bevan Jones, Michael J. Black

This paper presents Bayesian non-parametric models that simultaneously learn to segment words from phoneme strings and learn the referents of some of those words, and shows that there is a synergistic interaction in the acquisition of these two kinds of linguistic information.

Language Acquisition, Segmentation +1
