Search Results for author: Lori Levin

Found 44 papers, 10 papers with code

PanPhon: A Resource for Mapping IPA Segments to Articulatory Feature Vectors

1 code implementation COLING 2016 David R. Mortensen, Patrick Littell, Akash Bharadwaj, Kartik Goyal, Chris Dyer, Lori Levin

This paper contributes to a growing body of evidence that, when coupled with appropriate machine-learning techniques, linguistically motivated, information-rich representations can outperform one-hot encodings of linguistic data.

NER
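
As a quick illustration of the kind of mapping PanPhon provides, the sketch below looks up articulatory feature vectors for the IPA segments of a word using the released `panphon` Python package; the class and method names follow the package's public API but may differ across versions.

```python
# A minimal sketch using the released panphon package (pip install panphon).
# Class and method names reflect the library's public API and may change
# between versions.
import panphon

ft = panphon.FeatureTable()

# Map each IPA segment of a word to a numeric articulatory feature vector
# (+1 / 0 / -1 for features such as [syl], [voi], [nas], ...).
word = "pʰonetik"
vectors = ft.word_to_vector_list(word, numeric=True)

for seg, vec in zip(ft.ipa_segs(word), vectors):
    print(seg, vec)
```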

The CMU METAL Farsi NLP Approach

1 code implementation LREC 2014 Weston Feely, Mehdi Manshadi, Robert Frederking, Lori Levin

While many high-quality tools are available for analyzing major languages such as English, equivalent freely-available tools for important but lower-resourced languages such as Farsi are more difficult to acquire and integrate into a useful NLP front end.

Dependency Parsing

A Resource for Computational Experiments on Mapudungun

1 code implementation LREC 2020 Mingjun Duan, Carlos Fasola, Sai Krishna Rallabandi, Rodolfo M. Vega, Antonios Anastasopoulos, Lori Levin, Alan W. Black

We present a resource for computational experiments on Mapudungun, a polysynthetic indigenous language spoken in Chile with upwards of 200 thousand speakers.

Machine Translation, Speech Recognition +3

Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning

no code implementations NAACL 2016 Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, Guillaume Lample, Patrick Littell, David Mortensen, Alan W. Black, Lori Levin, Chris Dyer

We introduce polyglot language models, recurrent neural network models trained to predict symbol sequences in many different languages using shared representations of symbols and conditioning on typological information about the language to be predicted.

Representation Learning
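
The sketch below illustrates the general idea of conditioning a character-level recurrent language model on a per-language typology vector by concatenating it to each input embedding. It is a minimal PyTorch illustration, not the paper's architecture, and all names and dimensions are invented.

```python
# Illustrative PyTorch sketch of conditioning a character-level RNN LM on a
# per-language typology vector, in the spirit of the polyglot models described
# above. Not the paper's exact architecture; all names are made up.
import torch
import torch.nn as nn

class TypologyConditionedLM(nn.Module):
    def __init__(self, vocab_size, char_dim=64, typo_dim=32, hidden_dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, char_dim)
        # The typology vector is concatenated to every character embedding
        # before the LSTM, so predictions are conditioned on the language.
        self.lstm = nn.LSTM(char_dim + typo_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, chars, typo_vec):
        # chars: (batch, seq_len) int64; typo_vec: (batch, typo_dim) float
        x = self.char_emb(chars)                             # (B, T, char_dim)
        t = typo_vec.unsqueeze(1).expand(-1, x.size(1), -1)  # (B, T, typo_dim)
        h, _ = self.lstm(torch.cat([x, t], dim=-1))
        return self.out(h)                                   # next-char logits

# Toy usage: 2 sequences over a 50-symbol vocabulary, 32-dim typology vectors.
model = TypologyConditionedLM(vocab_size=50)
logits = model(torch.randint(0, 50, (2, 10)), torch.rand(2, 32))
print(logits.shape)  # torch.Size([2, 10, 50])
```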

Unsupervised POS Induction with Word Embeddings

no code implementations HLT 2015 Chu-Cheng Lin, Waleed Ammar, Chris Dyer, Lori Levin

Unsupervised word embeddings have been shown to be valuable as features in supervised learning problems; however, their role in unsupervised problems has been less thoroughly explored.

POS, Word Embeddings
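
As a point of reference for using embeddings in an unsupervised setting, the sketch below clusters pretrained word vectors with k-means and treats each cluster as an induced tag. This is a simple baseline for illustration only, not the model evaluated in the paper, and the embeddings shown are random stand-ins.

```python
# A simple clustering baseline for unsupervised POS induction over pretrained
# word embeddings: k-means over word vectors, one cluster per induced tag.
# Illustration only; not the model studied in the paper.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical inputs: a vocabulary and its embedding matrix (random stand-in).
vocab = ["the", "a", "dog", "cat", "runs", "sleeps"]
emb = np.random.rand(len(vocab), 100)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(emb)
induced_tags = dict(zip(vocab, kmeans.labels_))
print(induced_tags)  # e.g. {'the': 0, 'a': 0, 'dog': 2, ...}
```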

Use of Modality and Negation in Semantically-Informed Syntactic MT

no code implementations 5 Feb 2015 Kathryn Baker, Michael Bloodgood, Bonnie J. Dorr, Chris Callison-Burch, Nathaniel W. Filardo, Christine Piatko, Lori Levin, Scott Miller

We apply our MN annotation scheme to statistical machine translation using a syntactic framework that supports the inclusion of semantic annotations.

Machine Translation, Negation +1

A Modality Lexicon and its use in Automatic Tagging

no code implementations 17 Oct 2014 Kathryn Baker, Michael Bloodgood, Bonnie J. Dorr, Nathaniel W. Filardo, Lori Levin, Christine Piatko

Specifically, we describe the construction of a modality annotation scheme, a modality lexicon, and two automated modality taggers that were built using the lexicon and annotation scheme.

Machine Translation, Translation

Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach

no code implementations 24 Sep 2014 Kathryn Baker, Michael Bloodgood, Chris Callison-Burch, Bonnie J. Dorr, Nathaniel W. Filardo, Lori Levin, Scott Miller, Christine Piatko

We describe a unified and coherent syntactic framework for supporting a semantically-informed syntactic approach to statistical machine translation.

Machine Translation, Translation

DeepCx: A transition-based approach for shallow semantic parsing with complex constructional triggers

no code implementations EMNLP 2018 Jesse Dunietz, Jaime Carbonell, Lori Levin

This paper introduces the surface construction labeling (SCL) task, which expands the coverage of Shallow Semantic Parsing (SSP) to include frames triggered by complex constructions.

Semantic Parsing

Automatically Tagging Constructions of Causation and Their Slot-Fillers

no code implementations TACL 2017 Jesse Dunietz, Lori Levin, Jaime Carbonell

Semantic parsing becomes difficult in the face of the wide variety of linguistic realizations that causation can take on.

Semantic Parsing

URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors

no code implementations EACL 2017 Patrick Littell, David R. Mortensen, Ke Lin, Katherine Kairis, Carlisle Turner, Lori Levin

We introduce the URIEL knowledge base for massively multilingual NLP and the lang2vec utility, which provides information-rich vector identifications of languages drawn from typological, geographical, and phylogenetic databases and normalized to have straightforward and consistent formats, naming, and semantics.

Language Identification, Language Modelling +1
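
The released `lang2vec` package exposes these vectors programmatically; the sketch below follows its documented usage, though feature-set names and return types may vary by release.

```python
# Querying URIEL through the companion lang2vec package
# (pip install lang2vec). The get_features call follows the package's
# documented usage; exact feature-set names may differ by release.
import lang2vec.lang2vec as l2v

# Typological (syntax) vectors for English and French, with missing values
# filled by k-nearest-neighbour prediction ("syntax_knn").
feats = l2v.get_features(["eng", "fra"], "syntax_knn")
print(len(feats["eng"]), feats["eng"][:10])
```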

Annotation Schemes for Surface Construction Labeling

no code implementations COLING 2018 Lori Levin

In this talk I will describe the interaction of linguistics and language technologies in Surface Construction Labeling (SCL) from the perspective of corpus annotation tasks such as definiteness, modality, and causality.

Semantic Parsing

Named Entity Recognition for Linguistic Rapid Response in Low-Resource Languages: Sorani Kurdish and Tajik

no code implementations COLING 2016 Patrick Littell, Kartik Goyal, David R. Mortensen, Alexa Little, Chris Dyer, Lori Levin

This paper describes our construction of named-entity recognition (NER) systems in two Western Iranian languages, Sorani Kurdish and Tajik, as a part of a pilot study of "Linguistic Rapid Response" to potential emergency humanitarian relief situations.

Humanitarian, named-entity-recognition +2

A Unified Annotation Scheme for the Semantic/Pragmatic Components of Definiteness

no code implementations LREC 2014 Archna Bhatia, Mandy Simons, Lori Levin, Yulia Tsvetkov, Chris Dyer, Jordan Bender

We present a definiteness annotation scheme that captures the semantic, pragmatic, and discourse information, which we call communicative functions, associated with linguistic descriptions such as "a story about my speech", "the story", "every time I give it", "this slideshow".

Machine Translation, Specificity

Resources for the Detection of Conventionalized Metaphors in Four Languages

no code implementations LREC 2014 Lori Levin, Teruko Mitamura, Brian MacWhinney, Davida Fromm, Jaime Carbonell, Weston Feely, Robert Frederking, Anatole Gershman, Carlos Ramirez

The extraction rules operate on the output of a dependency parser and identify the grammatical configurations (such as a verb with a prepositional phrase complement) that are likely to contain conventional metaphors.
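
A hedged illustration of this kind of grammatical-configuration matching, written against spaCy dependency parses rather than the parser used in the paper: it finds verbs with a prepositional-phrase complement, one of the configurations the abstract mentions.

```python
# Illustration only: find verbs that take a prepositional-phrase complement
# in a spaCy dependency parse, a configuration the abstract cites as likely
# to host conventional metaphors. Assumes en_core_web_sm is installed.
import spacy

nlp = spacy.load("en_core_web_sm")

def verbs_with_pp_complement(text):
    doc = nlp(text)
    hits = []
    for tok in doc:
        if tok.pos_ == "VERB":
            preps = [c for c in tok.children if c.dep_ == "prep"]
            if preps:
                hits.append((tok.text, [p.text for p in preps]))
    return hits

print(verbs_with_pp_complement("She drowned in paperwork."))
# e.g. [('drowned', ['in'])]
```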

Morphological parsing of Swahili using crowdsourced lexical resources

no code implementations LREC 2014 Patrick Littell, Kaitlyn Price, Lori Levin

We describe a morphological analyzer for the Swahili language, written in an extension of XFST/LEXC intended for the easy declaration of morphophonological patterns and importation of lexical resources.

Machine Translation

The ARIEL-CMU Systems for LoReHLT18

no code implementations 24 Feb 2019 Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. Black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown

This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).

Machine Translation Translation

Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik

no code implementations LREC 2016 Patrick Littell, David R. Mortensen, Kartik Goyal, Chris Dyer, Lori Levin

In Sorani Kurdish, one of the most useful orthographic features in named-entity recognition, capitalization, is absent, as the language's Perso-Arabic script does not make a distinction between uppercase and lowercase letters.

named-entity-recognition, Named Entity Recognition +1

Using Interlinear Glosses as Pivot in Low-Resource Multilingual Machine Translation

no code implementations 7 Nov 2019 Zhong Zhou, Lori Levin, David R. Mortensen, Alex Waibel

Firstly, we pool IGT for 1,497 languages in ODIN (54,545 glosses) and 70,918 glosses in Arapaho and train a gloss-to-target NMT system from IGT to English, with a BLEU score of 25.94.

Machine Translation, NMT +2

Pre-tokenization of Multi-word Expressions in Cross-lingual Word Embeddings

no code implementations EMNLP 2020 Naoki Otani, Satoru Ozaki, Xingyuan Zhao, Yucen Li, Micaelah St Johns, Lori Levin

We propose a simple method for word translation of MWEs to and from English in ten languages: we first compile lists of MWEs in each language and then tokenize the MWEs as single tokens before training word embeddings.

Cross-Lingual Word Embeddings, Translation +2
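
A minimal sketch of the pre-tokenization step: given a list of multi-word expressions, occurrences in a token stream are merged into single tokens before embedding training. Matching details (casing, inflection) are simplified, and the function name is invented.

```python
# Minimal sketch of pre-tokenizing multi-word expressions: merge each listed
# MWE occurrence into a single token (joined with "_") before training word
# embeddings. Matching is simplified (exact, longest-match-first).
def pretokenize_mwes(tokens, mwe_list):
    mwes = {tuple(m.split()) for m in mwe_list}
    max_len = max((len(m) for m in mwes), default=1)
    out, i = [], 0
    while i < len(tokens):
        for n in range(max_len, 1, -1):               # longest match first
            if tuple(tokens[i:i + n]) in mwes:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

print(pretokenize_mwes("he kicked the bucket yesterday".split(),
                       ["kicked the bucket"]))
# ['he', 'kicked_the_bucket', 'yesterday']
```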

Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations

no code implementations COLING 2020 Xingyuan Zhao, Satoru Ozaki, Antonios Anastasopoulos, Graham Neubig, Lori Levin

Interlinear Glossed Text (IGT) is a widely used format for encoding linguistic information in language documentation projects and scholarly papers.

Cross-Lingual Transfer, LEMMA +1

The CODI-CRAC 2022 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

no code implementations COLING (CODI, CRAC) 2022 Juntao Yu, Sopan Khosla, Ramesh Manuvinakurike, Lori Levin, Vincent Ng, Massimo Poesio, Michael Strube, Carolyn Rosé

The CODI-CRAC 2022 Shared Task on Anaphora Resolution in Dialogues is the second edition of an initiative focused on detecting different types of anaphoric relations in conversations of different kinds.

Construction Grammar Provides Unique Insight into Neural Language Models

no code implementations 4 Feb 2023 Leonie Weissweiler, Taiqi He, Naoki Otani, David R. Mortensen, Lori Levin, Hinrich Schütze

Construction Grammar (CxG) has recently been used as the basis for probing studies that have investigated the performance of large pretrained language models (PLMs) with respect to the structure and meaning of constructions.

Position

Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity

1 code implementation 29 May 2023 Lindia Tjuatja, Emmy Liu, Lori Levin, Graham Neubig

Recent advances in large language models have prompted researchers to examine their abilities across a variety of linguistic tasks, but little has been done to investigate how models handle the interactions in meaning across words and larger syntactic forms, i.e., phenomena at the intersection of syntax and semantics.

GlossLM: Multilingual Pretraining for Low-Resource Interlinear Glossing

no code implementations 11 Mar 2024 Michael Ginn, Lindia Tjuatja, Taiqi He, Enora Rice, Graham Neubig, Alexis Palmer, Lori Levin

A key aspect of language documentation is the creation of annotated text in a format such as interlinear glossed text (IGT), which captures fine-grained morphosyntactic analyses in a morpheme-by-morpheme format.
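
For readers unfamiliar with IGT, the sketch below shows one way such a record might be represented in code, with a gloss line aligned morpheme-by-morpheme to a segmented transcription; the field names and example are illustrative, not the schema of any particular corpus.

```python
# Illustrative data structure for one IGT record: a transcription, a
# morpheme-segmented line, a gloss line aligned morpheme-by-morpheme, and a
# free translation. Field names and the example are invented.
from dataclasses import dataclass

@dataclass
class IGTExample:
    transcription: str          # surface text in the documented language
    segmentation: list[str]     # morpheme-segmented tokens
    glosses: list[str]          # one gloss string per token, "-"-aligned
    translation: str            # free translation (e.g., into English)

ex = IGTExample(
    transcription="los gatos duermen",
    segmentation=["los", "gato-s", "duerme-n"],
    glosses=["DET.PL", "cat-PL", "sleep-3PL"],
    translation="the cats sleep",
)

# Each token's morphemes should line up with its glosses.
assert all(len(m.split("-")) == len(g.split("-"))
           for m, g in zip(ex.segmentation, ex.glosses))
```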

Wav2Gloss: Generating Interlinear Glossed Text from Speech

no code implementations 19 Mar 2024 Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel R. Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori Levin

Thousands of the world's languages are in danger of extinction, a tremendous threat to cultural identities and human language diversity.

Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons

1 code implementation 26 Mar 2024 Shijia Zhou, Leonie Weissweiler, Taiqi He, Hinrich Schütze, David R. Mortensen, Lori Levin

In this paper, we make a contribution that can be understood from two perspectives: from an NLP perspective, we introduce a small challenge dataset for NLI with large lexical overlap, which minimises the possibility of models discerning entailment solely based on token distinctions, and show that GPT-4 and Llama 2 fail it with strong bias.
