Search Results for author: Lori Levin

Found 44 papers, 10 papers with code

PanPhon: A Resource for Mapping IPA Segments to Articulatory Feature Vectors

1 code implementation COLING 2016 David R. Mortensen, Patrick Littell, Akash Bharadwaj, Kartik Goyal, Chris Dyer, Lori Levin

This paper contributes to a growing body of evidence that, when coupled with appropriate machine-learning techniques, linguistically motivated, information-rich representations can outperform one-hot encodings of linguistic data.

NER
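
As a quick illustration of the kind of mapping PanPhon provides, the sketch below looks up articulatory feature vectors for the IPA segments of a word using the released `panphon` Python package; the class and method names follow the package's public API but may differ across versions.

```python
# A minimal sketch using the released panphon package (pip install panphon).
# Class and method names reflect the library's public API and may change
# between versions.
import panphon

ft = panphon.FeatureTable()

# Map each IPA segment of a word to a numeric articulatory feature vector
# (+1 / 0 / -1 for features such as [syl], [voi], [nas], ...).
word = "pʰonetik"
vectors = ft.word_to_vector_list(word, numeric=True)

for seg, vec in zip(ft.ipa_segs(word), vectors):
    print(seg, vec)
```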

The CMU METAL Farsi NLP Approach

1 code implementation LREC 2014 Weston Feely, Mehdi Manshadi, Robert Frederking, Lori Levin

While many high-quality tools are available for analyzing major languages such as English, equivalent freely-available tools for important but lower-resourced languages such as Farsi are more difficult to acquire and integrate into a useful NLP front end.

Dependency Parsing

A Resource for Computational Experiments on Mapudungun

1 code implementation LREC 2020 Mingjun Duan, Carlos Fasola, Sai Krishna Rallabandi, Rodolfo M. Vega, Antonios Anastasopoulos, Lori Levin, Alan W. Black

We present a resource for computational experiments on Mapudungun, a polysynthetic indigenous language spoken in Chile with upwards of 200 thousand speakers.

Machine Translation, Speech Recognition +3

Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning

no code implementations NAACL 2016 Yulia Tsvetkov, Sunayana Sitaram, Manaal Faruqui, Guillaume Lample, Patrick Littell, David Mortensen, Alan W. Black, Lori Levin, Chris Dyer

We introduce polyglot language models, recurrent neural network models trained to predict symbol sequences in many different languages using shared representations of symbols and conditioning on typological information about the language to be predicted.

Representation Learning
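
The sketch below illustrates the general idea of conditioning a character-level recurrent language model on a per-language typology vector by concatenating it to each input embedding. It is a minimal PyTorch illustration, not the paper's architecture, and all names and dimensions are invented.

```python
# Illustrative PyTorch sketch of conditioning a character-level RNN LM on a
# per-language typology vector, in the spirit of the polyglot models described
# above. Not the paper's exact architecture; all names are made up.
import torch
import torch.nn as nn

class TypologyConditionedLM(nn.Module):
    def __init__(self, vocab_size, char_dim=64, typo_dim=32, hidden_dim=128):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, char_dim)
        # The typology vector is concatenated to every character embedding
        # before the LSTM, so predictions are conditioned on the language.
        self.lstm = nn.LSTM(char_dim + typo_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, chars, typo_vec):
        # chars: (batch, seq_len) int64; typo_vec: (batch, typo_dim) float
        x = self.char_emb(chars)                             # (B, T, char_dim)
        t = typo_vec.unsqueeze(1).expand(-1, x.size(1), -1)  # (B, T, typo_dim)
        h, _ = self.lstm(torch.cat([x, t], dim=-1))
        return self.out(h)                                   # next-char logits

# Toy usage: 2 sequences over a 50-symbol vocabulary, 32-dim typology vectors.
model = TypologyConditionedLM(vocab_size=50)
logits = model(torch.randint(0, 50, (2, 10)), torch.rand(2, 32))
print(logits.shape)  # torch.Size([2, 10, 50])
```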

Unsupervised POS Induction with Word Embeddings

no code implementations HLT 2015 Chu-Cheng Lin, Waleed Ammar, Chris Dyer, Lori Levin

Unsupervised word embeddings have been shown to be valuable as features in supervised learning problems; however, their role in unsupervised problems has been less thoroughly explored.

POS, Word Embeddings
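
As a point of reference for using embeddings in an unsupervised setting, the sketch below clusters pretrained word vectors with k-means and treats each cluster as an induced tag. This is a simple baseline for illustration only, not the model evaluated in the paper, and the embeddings shown are random stand-ins.

```python
# A simple clustering baseline for unsupervised POS induction over pretrained
# word embeddings: k-means over word vectors, one cluster per induced tag.
# Illustration only; not the model studied in the paper.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical inputs: a vocabulary and its embedding matrix (random stand-in).
vocab = ["the", "a", "dog", "cat", "runs", "sleeps"]
emb = np.random.rand(len(vocab), 100)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(emb)
induced_tags = dict(zip(vocab, kmeans.labels_))
print(induced_tags)  # e.g. {'the': 0, 'a': 0, 'dog': 2, ...}
```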

Use of Modality and Negation in Semantically-Informed Syntactic MT

no code implementations 5 Feb 2015 Kathryn Baker, Michael Bloodgood, Bonnie J. Dorr, Chris Callison-Burch, Nathaniel W. Filardo, Christine Piatko, Lori Levin, Scott Miller

We apply our MN annotation scheme to statistical machine translation using a syntactic framework that supports the inclusion of semantic annotations.

Machine Translation, Negation +1

A Modality Lexicon and its use in Automatic Tagging

no code implementations 17 Oct 2014 Kathryn Baker, Michael Bloodgood, Bonnie J. Dorr, Nathaniel W. Filardo, Lori Levin, Christine Piatko

Specifically, we describe the construction of a modality annotation scheme, a modality lexicon, and two automated modality taggers that were built using the lexicon and annotation scheme.

Machine Translation, Translation

Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach

no code implementations 24 Sep 2014 Kathryn Baker, Michael Bloodgood, Chris Callison-Burch, Bonnie J. Dorr, Nathaniel W. Filardo, Lori Levin, Scott Miller, Christine Piatko

We describe a unified and coherent syntactic framework for supporting a semantically-informed syntactic approach to statistical machine translation.

Machine Translation, Translation

DeepCx: A transition-based approach for shallow semantic parsing with complex constructional triggers

no code implementations EMNLP 2018 Jesse Dunietz, Jaime Carbonell, Lori Levin

This paper introduces the surface construction labeling (SCL) task, which expands the coverage of Shallow Semantic Parsing (SSP) to include frames triggered by complex constructions.

Semantic Parsing

Automatically Tagging Constructions of Causation and Their Slot-Fillers

no code implementations TACL 2017 Jesse Dunietz, Lori Levin, Jaime Carbonell

Semantic parsing becomes difficult in the face of the wide variety of linguistic realizations that causation can take on.

Semantic Parsing

URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors

no code implementations EACL 2017 Patrick Littell, David R. Mortensen, Ke Lin, Katherine Kairis, Carlisle Turner, Lori Levin

We introduce the URIEL knowledge base for massively multilingual NLP and the lang2vec utility, which provides information-rich vector identifications of languages drawn from typological, geographical, and phylogenetic databases and normalized to have straightforward and consistent formats, naming, and semantics.

Language Identification, Language Modelling +1
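
The released `lang2vec` package exposes these vectors programmatically; the sketch below follows its documented usage, though feature-set names and return types may vary by release.

```python
# Querying URIEL through the companion lang2vec package
# (pip install lang2vec). The get_features call follows the package's
# documented usage; exact feature-set names may differ by release.
import lang2vec.lang2vec as l2v

# Typological (syntax) vectors for English and French, with missing values
# filled by k-nearest-neighbour prediction ("syntax_knn").
feats = l2v.get_features(["eng", "fra"], "syntax_knn")
print(len(feats["eng"]), feats["eng"][:10])
```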

Annotation Schemes for Surface Construction Labeling

no code implementations COLING 2018 Lori Levin

In this talk I will describe the interaction of linguistics and language technologies in Surface Construction Labeling (SCL) from the perspective of corpus annotation tasks such as definiteness, modality, and causality.

Semantic Parsing

Named Entity Recognition for Linguistic Rapid Response in Low-Resource Languages: Sorani Kurdish and Tajik

no code implementations COLING 2016 Patrick Littell, Kartik Goyal, David R. Mortensen, Alexa Little, Chris Dyer, Lori Levin

This paper describes our construction of named-entity recognition (NER) systems in two Western Iranian languages, Sorani Kurdish and Tajik, as a part of a pilot study of "Linguistic Rapid Response" to potential emergency humanitarian relief situations.

Humanitarian, named-entity-recognition +2

A Unified Annotation Scheme for the Semantic/Pragmatic Components of Definiteness

no code implementations LREC 2014 Archna Bhatia, Mandy Simons, Lori Levin, Yulia Tsvetkov, Chris Dyer, Jordan Bender

We present a definiteness annotation scheme that captures the semantic, pragmatic, and discourse information, which we call communicative functions, associated with linguistic descriptions such as "a story about my speech", "the story", "every time I give it", "this slideshow".

Machine Translation, Specificity

Resources for the Detection of Conventionalized Metaphors in Four Languages

no code implementations LREC 2014 Lori Levin, Teruko Mitamura, Brian MacWhinney, Davida Fromm, Jaime Carbonell, Weston Feely, Robert Frederking, Anatole Gershman, Carlos Ramirez

The extraction rules operate on the output of a dependency parser and identify the grammatical configurations (such as a verb with a prepositional phrase complement) that are likely to contain conventional metaphors.
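
A hedged illustration of this kind of grammatical-configuration matching, written against spaCy dependency parses rather than the parser used in the paper: it finds verbs with a prepositional-phrase complement, one of the configurations the abstract mentions.

```python
# Illustration only: find verbs that take a prepositional-phrase complement
# in a spaCy dependency parse, a configuration the abstract cites as likely
# to host conventional metaphors. Assumes en_core_web_sm is installed.
import spacy

nlp = spacy.load("en_core_web_sm")

def verbs_with_pp_complement(text):
    doc = nlp(text)
    hits = []
    for tok in doc:
        if tok.pos_ == "VERB":
            preps = [c for c in tok.children if c.dep_ == "prep"]
            if preps:
                hits.append((tok.text, [p.text for p in preps]))
    return hits

print(verbs_with_pp_complement("She drowned in paperwork."))
# e.g. [('drowned', ['in'])]
```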

Morphological parsing of Swahili using crowdsourced lexical resources

no code implementations LREC 2014 Patrick Littell, Kaitlyn Price, Lori Levin

We describe a morphological analyzer for the Swahili language, written in an extension of XFST/LEXC intended for the easy declaration of morphophonological patterns and importation of lexical resources.

Machine Translation

The ARIEL-CMU Systems for LoReHLT18

no code implementations 24 Feb 2019 Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. Black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown

This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).

Machine Translation Translation

Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik

no code implementations LREC 2016 Patrick Littell, David R. Mortensen, Kartik Goyal, Chris Dyer, Lori Levin

In Sorani Kurdish, one of the most useful orthographic features in named-entity recognition, capitalization, is absent, as the language's Perso-Arabic script does not make a distinction between uppercase and lowercase letters.

named-entity-recognition, Named Entity Recognition +1

Using Interlinear Glosses as Pivot in Low-Resource Multilingual Machine Translation

no code implementations 7 Nov 2019 Zhong Zhou, Lori Levin, David R. Mortensen, Alex Waibel

Firstly, we pool IGT for 1,497 languages in ODIN (54,545 glosses) and 70,918 glosses in Arapaho and train a gloss-to-target NMT system from IGT to English, with a BLEU score of 25.94.

Machine Translation, NMT +2

Pre-tokenization of Multi-word Expressions in Cross-lingual Word Embeddings

no code implementations EMNLP 2020 Naoki Otani, Satoru Ozaki, Xingyuan Zhao, Yucen Li, Micaelah St Johns, Lori Levin

We propose a simple method for word translation of MWEs to and from English in ten languages: we first compile lists of MWEs in each language and then tokenize the MWEs as single tokens before training word embeddings.

Cross-Lingual Word Embeddings, Translation +2
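
A minimal sketch of the pre-tokenization step: given a list of multi-word expressions, occurrences in a token stream are merged into single tokens before embedding training. Matching details (casing, inflection) are simplified, and the function name is invented.

```python
# Minimal sketch of pre-tokenizing multi-word expressions: merge each listed
# MWE occurrence into a single token (joined with "_") before training word
# embeddings. Matching is simplified (exact, longest-match-first).
def pretokenize_mwes(tokens, mwe_list):
    mwes = {tuple(m.split()) for m in mwe_list}
    max_len = max((len(m) for m in mwes), default=1)
    out, i = [], 0
    while i < len(tokens):
        for n in range(max_len, 1, -1):               # longest match first
            if tuple(tokens[i:i + n]) in mwes:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

print(pretokenize_mwes("he kicked the bucket yesterday".split(),
                       ["kicked the bucket"]))
# ['he', 'kicked_the_bucket', 'yesterday']
```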

Automatic Interlinear Glossing for Under-Resourced Languages Leveraging Translations

no code implementations COLING 2020 Xingyuan Zhao, Satoru Ozaki, Antonios Anastasopoulos, Graham Neubig, Lori Levin

Interlinear Glossed Text (IGT) is a widely used format for encoding linguistic information in language documentation projects and scholarly papers.

Cross-Lingual Transfer, LEMMA +1

The CODI-CRAC 2022 Shared Task on Anaphora, Bridging, and Discourse Deixis in Dialogue

no code implementations COLING (CODI, CRAC) 2022 Juntao Yu, Sopan Khosla, Ramesh Manuvinakurike, Lori Levin, Vincent Ng, Massimo Poesio, Michael Strube, Carolyn Rosé

The CODI-CRAC 2022 Shared Task on Anaphora Resolution in Dialogues is the second edition of an initiative focused on detecting different types of anaphoric relations in conversations of different kinds.

Construction Grammar Provides Unique Insight into Neural Language Models

no code implementations 4 Feb 2023 Leonie Weissweiler, Taiqi He, Naoki Otani, David R. Mortensen, Lori Levin, Hinrich Schütze

Construction Grammar (CxG) has recently been used as the basis for probing studies that have investigated the performance of large pretrained language models (PLMs) with respect to the structure and meaning of constructions.

Position

Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity

1 code implementation 29 May 2023 Lindia Tjuatja, Emmy Liu, Lori Levin, Graham Neubig

Recent advances in large language models have prompted researchers to examine their abilities across a variety of linguistic tasks, but little has been done to investigate how models handle the interactions in meaning across words and larger syntactic forms, i.e., phenomena at the intersection of syntax and semantics.

GlossLM: Multilingual Pretraining for Low-Resource Interlinear Glossing

no code implementations 11 Mar 2024 Michael Ginn, Lindia Tjuatja, Taiqi He, Enora Rice, Graham Neubig, Alexis Palmer, Lori Levin

A key aspect of language documentation is the creation of annotated text in a format such as interlinear glossed text (IGT), which captures fine-grained morphosyntactic analyses in a morpheme-by-morpheme format.
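
For readers unfamiliar with IGT, the sketch below shows one way such a record might be represented in code, with a gloss line aligned morpheme-by-morpheme to a segmented transcription; the field names and example are illustrative, not the schema of any particular corpus.

```python
# Illustrative data structure for one IGT record: a transcription, a
# morpheme-segmented line, a gloss line aligned morpheme-by-morpheme, and a
# free translation. Field names and the example are invented.
from dataclasses import dataclass

@dataclass
class IGTExample:
    transcription: str          # surface text in the documented language
    segmentation: list[str]     # morpheme-segmented tokens
    glosses: list[str]          # one gloss string per token, "-"-aligned
    translation: str            # free translation (e.g., into English)

ex = IGTExample(
    transcription="los gatos duermen",
    segmentation=["los", "gato-s", "duerme-n"],
    glosses=["DET.PL", "cat-PL", "sleep-3PL"],
    translation="the cats sleep",
)

# Each token's morphemes should line up with its glosses.
assert all(len(m.split("-")) == len(g.split("-"))
           for m, g in zip(ex.segmentation, ex.glosses))
```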

Wav2Gloss: Generating Interlinear Glossed Text from Speech

no code implementations 19 Mar 2024 Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel R. Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori Levin

Thousands of the world's languages are in danger of extinction, a tremendous threat to cultural identities and human language diversity.

Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons

1 code implementation 26 Mar 2024 Shijia Zhou, Leonie Weissweiler, Taiqi He, Hinrich Schütze, David R. Mortensen, Lori Levin

In this paper, we make a contribution that can be understood from two perspectives: from an NLP perspective, we introduce a small challenge dataset for NLI with large lexical overlap, which minimises the possibility of models discerning entailment solely based on token distinctions, and show that GPT-4 and Llama 2 fail it with strong bias.
