1 code implementation • Findings (EMNLP) 2021 • Mohammad Javad Hosseini, Shay B. Cohen, Mark Johnson, Mark Steedman
In this paper, we introduce the new task of open-domain contextual link prediction, which has access to both the textual context and the KG structure when performing link prediction.
no code implementations • 16 Oct 2024 • Meng Chen, Philip Arthur, Qianyu Feng, Cong Duy Vu Hoang, Yu-Heng Hong, Mahdi Kazemi Moghaddam, Omid Nezami, Thien Nguyen, Gioacchino Tangari, Duy Vu, Thanh Vu, Mark Johnson, Krishnaram Kenthapadi, Don Dharmasiri, Long Duong, Yuan-Fang Li
Large language models (LLMs) have shown impressive performance in code understanding and generation, making coding tasks a key focus for researchers due to their practical applications and value as a testbed for LLM evaluation.
no code implementations • 22 Aug 2024 • Louis Mahon, Omri Abend, Uri Berger, Katherine Demuth, Mark Johnson, Mark Steedman
This work reimplements a recent semantic bootstrapping child-language acquisition model, which was originally designed for English, and trains it to learn a new language: Hebrew.
1 code implementation • 23 May 2023 • Nick McKenna, Tianyi Li, Liang Cheng, Mohammad Javad Hosseini, Mark Johnson, Mark Steedman
Large Language Models (LLMs) are claimed to be capable of Natural Language Inference (NLI), necessary for applied tasks like question answering and summarization.
1 code implementation • 30 Jul 2022 • Nick McKenna, Tianyi Li, Mark Johnson, Mark Steedman
The diversity and Zipfian frequency distribution of natural language predicates in corpora lead to sparsity in Entailment Graphs (EGs) built by Open Relation Extraction (ORE).
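A small simulation can make the sparsity argument concrete: under a Zipfian law, most predicate types are observed only a handful of times, so the evidence linking them to other predicates is thin. This is an illustrative sketch only; the Zipf exponent, sample size, and rarity threshold are arbitrary assumptions, not figures from the paper.

```python
# Illustrative simulation: predicates drawn from a Zipf law are mostly rare,
# which is why co-occurrence evidence for entailment graphs is sparse.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.zipf(a=1.5, size=100_000)      # predicate "ranks" drawn from a Zipf law

types, counts = np.unique(samples, return_counts=True)
rare = (counts < 5).mean()                   # fraction of predicate types seen < 5 times
print(f"{len(types)} predicate types; {rare:.0%} occur fewer than 5 times")
```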
no code implementations • 3 Nov 2021 • Yulong Wang, Shenghong Li, Wei Ni, David Abbott, Mark Johnson, Guangyu Pei, Mark Hedley
We propose an efficient approach to solve the corresponding permutation combinatorial optimization problem, which integrates continuous-space cooperative localization and permutation-space likelihood ascent search.
1 code implementation • EMNLP (insights) 2021 • Liane Guillou, Sander Bijl de Vroe, Mark Johnson, Mark Steedman
Understanding linguistic modality is widely seen as important for downstream tasks such as Question Answering and Knowledge Graph Population.
1 code implementation • COLING (TextGraphs) 2020 • Liane Guillou, Sander Bijl de Vroe, Mohammad Javad Hosseini, Mark Johnson, Mark Steedman
We present a novel method for injecting temporality into entailment graphs to address the problem of spurious entailments, which may arise from similar but temporally distinct events involving the same pair of entities.
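The intuition can be sketched as follows: treat two predicates as co-occurrence evidence for an entailment only when they hold of the same entity pair at roughly the same time. The time window, data layout, and example extractions below are hypothetical and meant only to illustrate the idea, not to reproduce the paper's actual procedure.

```python
# Sketch: only count predicate pairs as entailment evidence when they share an
# entity pair AND fall within a time window; temporally distant events are
# discarded as potentially spurious. All values here are illustrative.
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical extractions: (predicate, (entity1, entity2), event date).
extractions = [
    ("visit",     ("obama", "france"), date(2014, 6, 5)),
    ("arrive-in", ("obama", "france"), date(2014, 6, 4)),
    ("visit",     ("obama", "france"), date(2009, 4, 3)),  # temporally distinct event
]

WINDOW = timedelta(days=7)
pair_counts = defaultdict(int)

for i, (p1, args1, t1) in enumerate(extractions):
    for p2, args2, t2 in extractions[i + 1:]:
        if p1 != p2 and args1 == args2 and abs(t1 - t2) <= WINDOW:
            pair_counts[(p1, p2)] += 1      # temporally compatible co-occurrence

print(dict(pair_counts))   # {('visit', 'arrive-in'): 1}
```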
1 code implementation • ACL 2021 • YuFei Wang, Ian Wood, Stephen Wan, Mark Dras, Mark Johnson
In this paper, we propose Mention Flags (MF), which trace whether lexical constraints are satisfied in the outputs generated by an S2S decoder.
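As a rough illustration of what such flags track, the sketch below marks each lexical constraint as satisfied once its tokens appear in the generated prefix. It is a toy function under simple assumptions (constraints as contiguous token sequences); the paper feeds these signals into the decoder itself, which this sketch does not attempt to model.

```python
# Minimal illustration of tracking lexical-constraint satisfaction during
# decoding; a sketch of the general idea only, not the Mention Flags model.

def constraint_flags(constraints, generated):
    """Return one flag per constraint: 1 if its tokens appear (in order,
    contiguously) in the generated prefix, else 0."""
    flags = []
    for c in constraints:
        n = len(c)
        satisfied = any(generated[i:i + n] == c
                        for i in range(len(generated) - n + 1))
        flags.append(1 if satisfied else 0)
    return flags

# Example: two constraints, one already mentioned in the partial output.
constraints = [["guide", "dog"], ["park"]]
generated = ["a", "guide", "dog", "walks"]
print(constraint_flags(constraints, generated))  # [1, 0]
```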
no code implementations • NeurIPS 2021 • YuFei Wang, Can Xu, Huang Hu, Chongyang Tao, Stephen Wan, Mark Dras, Mark Johnson, Daxin Jiang
Sequence-to-Sequence (S2S) neural text generation models, especially the pre-trained ones (e.g., BART and T5), have exhibited compelling performance on various natural language generation tasks.
1 code implementation • NAACL 2021 • Ian Wood, Mark Johnson, Stephen Wan
OpenKi [1] addresses this task by extracting named entities and predicates with OpenIE tools and then learning relation embeddings from the resulting entity-relation graph for relation prediction, outperforming previous approaches.
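The graph-building step described above can be sketched as follows: from OpenIE-style (subject, relation phrase, object) triples, collect which relation phrases link each entity pair. The embedding-learning step is not shown, and the triples and names here are illustrative assumptions rather than OpenKi's actual data structures.

```python
# Sketch of building an entity-relation graph from OpenIE-style triples.
from collections import defaultdict

triples = [
    ("marie curie", "was awarded", "nobel prize"),
    ("marie curie", "received",    "nobel prize"),
    ("marie curie", "born in",     "warsaw"),
]

pair_to_relations = defaultdict(set)
for subj, rel, obj in triples:
    pair_to_relations[(subj, obj)].add(rel)

# Entity pairs observed with multiple relation phrases provide co-occurrence
# signal for learning relation embeddings and predicting unseen relations.
for pair, rels in pair_to_relations.items():
    print(pair, sorted(rels))
```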
no code implementations • EMNLP 2021 • Nick McKenna, Liane Guillou, Mohammad Javad Hosseini, Sander Bijl de Vroe, Mark Johnson, Mark Steedman
Drawing inferences between open-domain natural language predicates is a necessity for true language understanding.
no code implementations • 19 Mar 2021 • Celyn Walters, Oscar Mendez, Mark Johnson, Richard Bowden
In this work, we aim to address the dense correspondence estimation problem in a way that generalizes to more than one spectrum.
no code implementations • EACL 2021 • YuFei Wang, Ian D. Wood, Stephen Wan, Mark Johnson
In this paper, we focus on this challenge and propose the ECOL-R model (Encouraging Copying of Object Labels with Reinforced Learning), a copy-augmented transformer model that is encouraged to accurately describe the novel object labels.
no code implementations • 22 Oct 2020 • Michael L. Wick, Kate Silverstein, Jean-Baptiste Tristan, Adam Pocock, Mark Johnson
Indeed, self-supervised language models trained on "positive" examples of English text generalize in desirable ways to many natural language tasks.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Paria Jamshid Lou, Mark Johnson
Disfluency detection is usually an intermediate step between an automatic speech recognition (ASR) system and a downstream task.
no code implementations • 5 Jun 2020 • Ryan M. Corey, Evan M. Widloski, David Null, Brian Ricconi, Mark Johnson, Karen White, Jennifer R. Amos, Alex Pagano, Michael Oelze, Rachel Switzky, Matthew B. Wheeler, Eliot Bethke, Clifford Shipley, Andrew C. Singer
In response to the shortage of ventilators caused by the COVID-19 pandemic, many organizations have designed low-cost emergency ventilators.
no code implementations • ACL 2020 • Paria Jamshid Lou, Mark Johnson
However, we show that self-training, a semi-supervised technique for incorporating unlabeled data, sets a new state-of-the-art for the self-attentive parser on disfluency detection, demonstrating that self-training provides benefits orthogonal to the pre-trained contextualized word representations.
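For readers unfamiliar with the technique, the sketch below shows the generic self-training recipe on a toy classifier: train on labeled data, pseudo-label the most confident unlabeled examples, add them to the training set, and retrain. This is a hedged illustration of the idea, not the paper's self-attentive parser; the classifier, `threshold`, and `rounds` values are arbitrary assumptions.

```python
# Generic self-training loop (sketch), illustrated with a simple classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, rounds=3, threshold=0.9):
    X, y = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    for _ in range(rounds):
        model = LogisticRegression(max_iter=1000).fit(X, y)
        if len(pool) == 0:
            break
        probs = model.predict_proba(pool)
        conf = probs.max(axis=1)
        pseudo = model.classes_[probs.argmax(axis=1)]
        keep = conf >= threshold            # only trust confident predictions
        if not keep.any():
            break
        X = np.vstack([X, pool[keep]])      # add pseudo-labeled examples
        y = np.concatenate([y, pseudo[keep]])
        pool = pool[~keep]                  # remove them from the unlabeled pool
    return model
```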
1 code implementation • ACL 2019 • Mohammad Javad Hosseini, Shay B. Cohen, Mark Johnson, Mark Steedman
The new entailment score outperforms prior state-of-the-art results on a standard entailment dataset, and the new link prediction scores show improvements over the raw link prediction scores.
no code implementations • ACL 2019 • Long Duong, Vu Cong Duy Hoang, Tuyen Quang Pham, Yu-Heng Hong, Vladislavs Dovgalecs, Guy Bashkansky, Jason Black, Andrew Bleeker, Serge Le Huitouze, Mark Johnson
This paper describes a spoken-language end-to-end task-oriented dialogue system for small embedded devices such as home appliances.
1 code implementation • ACL 2019 • Yufei Wang, Mark Johnson, Stephen Wan, Yifang Sun, Wei Wang
There are many different ways in which external information might be used in an NLP task.
no code implementations • NAACL 2019 • Paria Jamshid Lou, YuFei Wang, Mark Johnson
This paper studies the performance of a neural self-attentive parser on transcribed speech.
2 code implementations • ICCV 2019 • Harsh Agrawal, Karan Desai, YuFei Wang, Xinlei Chen, Rishabh Jain, Mark Johnson, Dhruv Batra, Devi Parikh, Stefan Lee, Peter Anderson
To encourage the development of image captioning models that can learn visual concepts from alternative data sources, such as object detection datasets, we present the first large-scale benchmark for this task.
no code implementations • TACL 2015 • Dat Quoc Nguyen, Richard Billingsley, Lan Du, Mark Johnson
Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks.
no code implementations • ACL 2017 • Paria Jamshid Lou, Mark Johnson
This paper presents a model for disfluency detection in spontaneous speech transcripts called the LSTM Noisy Channel Model.
4 code implementations • EMNLP 2018 • Paria Jamshid Lou, Peter Anderson, Mark Johnson
In recent years, the natural language processing community has moved away from task-specific feature engineering, i.e., researchers discovering ad-hoc feature representations for various tasks, in favor of general-purpose methods that learn the input representation by themselves.
no code implementations • ACL 2018 • Mark Johnson, Peter Anderson, Mark Dras, Mark Steedman
Because obtaining training data is often the most difficult part of an NLP or ML project, we develop methods for predicting how much data is required to achieve a desired test accuracy by extrapolating results from models trained on a small pilot training dataset.
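One common way to do such extrapolation is to fit a simple parametric error curve to the pilot results and then invert it to estimate how much data a target accuracy requires. The power-law form and all numbers below are illustrative assumptions, not the specific functional forms or datasets evaluated in the paper.

```python
# Sketch of learning-curve extrapolation: fit an error model to pilot results
# and predict the data needed for a target accuracy.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, b, c):
    return b * n ** (-c)          # test error as a function of training size n

# Hypothetical pilot results: (training-set size, test error).
sizes = np.array([500, 1000, 2000, 4000], dtype=float)
errors = np.array([0.30, 0.24, 0.19, 0.155])

(b, c), _ = curve_fit(power_law, sizes, errors, p0=[1.0, 0.3])

target_error = 0.10               # i.e. 90% desired test accuracy
needed = (b / target_error) ** (1.0 / c)
print(f"predicted error at n=32000: {power_law(32000, b, c):.3f}")
print(f"estimated training size for {target_error:.0%} error: {needed:,.0f}")
```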
no code implementations • ACL 2018 • Long Duong, Hadi Afshar, Dominique Estival, Glen Pink, Philip Cohen, Mark Johnson
Semantic parsing requires training data that is expensive and slow to collect.
no code implementations • NeurIPS 2018 • Peter Anderson, Stephen Gould, Mark Johnson
To address this problem, we teach image captioning models new visual concepts from labeled images and object detection datasets.
no code implementations • ACL 2018 • Jonas Groschwitz, Matthias Lindemann, Meaghan Fowlie, Mark Johnson, Alexander Koller
We present a semantic parser for Abstract Meaning Representations which learns to parse strings into tree representations of the compositional structure of an AMR graph.
2 code implementations • NAACL 2018 • Thanh Vu, Dat Quoc Nguyen, Dai Quoc Nguyen, Mark Dras, Mark Johnson
We present an easy-to-use and fast toolkit, namely VnCoreNLP, a Java NLP annotation pipeline for Vietnamese.
1 code implementation • TACL 2018 • Mohammad Javad Hosseini, Nathanael Chambers, Siva Reddy, Xavier R. Holt, Shay B. Cohen, Mark Johnson, Mark Steedman
We instead propose a scalable method that learns globally consistent similarity scores based on new soft constraints that consider both the structures across typed entailment graphs and inside each graph.
8 code implementations • CVPR 2018 • Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton Van Den Hengel
This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering.
Ranked #10 on Visual Navigation on R2R
1 code implementation • ALTA 2017 • Dat Quoc Nguyen, Thanh Vu, Dai Quoc Nguyen, Mark Dras, Mark Johnson
This paper presents an empirical comparison of two strategies for Vietnamese Part-of-Speech (POS) tagging from unsegmented text: (i) a pipeline strategy where we consider the output of a word segmenter as the input of a POS tagger, and (ii) a joint strategy where we predict a combined segmentation and POS tag for each syllable.
1 code implementation • LREC 2018 • Dat Quoc Nguyen, Dai Quoc Nguyen, Thanh Vu, Mark Dras, Mark Johnson
We propose a novel approach to Vietnamese word segmentation.
1 code implementation • CONLL 2017 • Long Duong, Hadi Afshar, Dominique Estival, Glen Pink, Philip Cohen, Mark Johnson
As far as we know, this is the first study of code-switching in semantic parsing.
65 code implementations • CVPR 2018 • Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang
Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning.
Ranked #29 on Visual Question Answering (VQA) on VQA v2 test-std
no code implementations • ACL 2017 • Shervin Malmasi, Mark Dras, Mark Johnson, Lan Du, Magdalena Wolska
Most work on segmenting text does so on the basis of topic changes, but it can be of interest to segment by other, stylistically expressed characteristics such as change of authorship or native language.
no code implementations • CONLL 2017 • Kairit Sirts, Olivier Piguet, Mark Johnson
ID has been used in two different versions: propositional idea density (PID) counts the expressed ideas and can be applied to any text, while semantic idea density (SID) counts pre-defined information content units and is naturally more applicable to normative domains, such as picture description tasks.
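A crude approximation of PID can be computed from part-of-speech tags by counting verbs, adjectives, adverbs, adpositions, and conjunctions as propositions and dividing by the number of words. The sketch below applies that convention to a hypothetical pre-tagged sentence; it is only an approximation of the general measure, not the implementation used in the paper.

```python
# Sketch of propositional idea density (PID): propositions per word, with
# propositions approximated by a set of universal POS tags.

PROPOSITION_TAGS = {"VERB", "ADJ", "ADV", "ADP", "CCONJ", "SCONJ"}

def propositional_idea_density(tagged_tokens):
    """tagged_tokens: list of (word, universal-POS-tag) pairs."""
    words = [w for w, _ in tagged_tokens]
    propositions = [w for w, tag in tagged_tokens if tag in PROPOSITION_TAGS]
    return len(propositions) / len(words) if words else 0.0

# Hypothetical pre-tagged sentence: "The old man walked slowly to the shop."
tagged = [("The", "DET"), ("old", "ADJ"), ("man", "NOUN"), ("walked", "VERB"),
          ("slowly", "ADV"), ("to", "ADP"), ("the", "DET"), ("shop", "NOUN")]
print(propositional_idea_density(tagged))  # 0.5  (4 propositions / 8 words)
```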
1 code implementation • CONLL 2017 • Dat Quoc Nguyen, Mark Dras, Mark Johnson
We present a novel neural network model that learns POS tagging and graph-based dependency parsing jointly.
Ranked #5 on Part-Of-Speech Tagging on UD
1 code implementation • 12 Dec 2016 • Thanh Vu, Dat Quoc Nguyen, Mark Johnson, Dawei Song, Alistair Willis
Recent research has shown that the performance of search personalization depends on the richness of user profiles which normally represent the user's topical interests.
1 code implementation • EMNLP 2017 • Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould
Existing image captioning models do not generalize well to out-of-domain images containing novel scenes or objects.
1 code implementation • COLING 2016 • John K Pate, Mark Johnson
We show that grammar induction from words alone is in fact feasible when the model is provided with sufficient training data, and present two new streaming or mini-batch algorithms for PCFG inference that can learn from millions of words of training data.
no code implementations • ALTA 2016 • Dat Quoc Nguyen, Mark Dras, Mark Johnson
This paper presents an empirical comparison of different dependency parsers for Vietnamese, which has some unusual characteristics such as copula drop and verb serialization.
11 code implementations • 29 Jul 2016 • Peter Anderson, Basura Fernando, Mark Johnson, Stephen Gould
There is considerable interest in the task of automatically generating image captions.
1 code implementation • NAACL 2016 • Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, Mark Johnson
Knowledge bases of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks.
no code implementations • CONLL 2016 • Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, Mark Johnson
Knowledge bases are useful resources for many natural language processing tasks; however, they are far from complete.
no code implementations • LREC 2014 • Bogdan Ludusan, Maarten Versteegh, Aren Jansen, Guillaume Gravier, Xuan-Nga Cao, Mark Johnson, Emmanuel Dupoux
The unsupervised discovery of linguistic terms from either continuous phoneme transcriptions or raw speech has seen increasing interest in recent years, from both a theoretical and a practical standpoint.
no code implementations • TACL 2014 • Benjamin Börschinger, Mark Johnson
Stress has long been established as a major cue in word segmentation for English infants.
no code implementations • TACL 2014 • Matthew Honnibal, Mark Johnson
We present an incremental dependency parsing model that performs disfluency detection jointly with parsing.
no code implementations • TACL 2013 • Minh-Thang Luong, Michael C. Frank, Mark Johnson
Grounded language learning, the task of mapping from natural language to a representation of meaning, has attracted increasing interest in recent years.
no code implementations • NeurIPS 2010 • Mark Johnson, Katherine Demuth, Bevan Jones, Michael J. Black
This paper presents Bayesian non-parametric models that simultaneously learn to segment words from phoneme strings and learn the referents of some of those words, and shows that there is a synergistic interaction in the acquisition of these two kinds of linguistic information.