Search Results for author: Stephanie Strassel

Found 37 papers, 0 papers with code

CAMIO: A Corpus for OCR in Multiple Languages

no code implementations LREC 2022 Michael Arrigo, Stephanie Strassel, Nolan King, Thao Tran, Lisa Mason

CAMIO (Corpus of Annotated Multilingual Images for OCR) is a new corpus created by Linguistic Data Consortium to serve as a resource to support the development and evaluation of optical character recognition (OCR) and related technologies for 35 languages across 24 unique scripts.

Optical Character Recognition Optical Character Recognition (OCR)

A Study in Contradiction: Data and Annotation for AIDA Focusing on Informational Conflict in Russia-Ukraine Relations

no code implementations LREC 2022 Jennifer Tracey, Ann Bies, Jeremy Getman, Kira Griffitt, Stephanie Strassel

This paper describes data resources created for Phase 1 of the DARPA Active Interpretation of Disparate Alternatives (AIDA) program, which aims to develop language technology that can help humans manage large volumes of sometimes conflicting information to develop a comprehensive understanding of events around the world, even when such events are described in multiple media and languages.

Reflections on 30 Years of Language Resource Development and Sharing

no code implementations LREC 2022 Christopher Cieri, Mark Liberman, Sunghye Cho, Stephanie Strassel, James Fiumara, Jonathan Wright

The Linguistic Data Consortium was founded in 1992 to solve the problem that limitations in access to shareable data were impeding progress in Human Language Technology research and development.

Management Open-Ended Question Answering

Basic Language Resources for 31 Languages (Plus English): The LORELEI Representative and Incident Language Packs

no code implementations LREC 2020 Jennifer Tracey, Stephanie Strassel

This paper documents and describes the thirty-one basic language resource packs created for the DARPA LORELEI program for use in development and testing of systems capable of providing language-independent situational awareness in emerging scenarios in a low resource language context.

Call My Net 2: A New Resource for Speaker Recognition

no code implementations LREC 2020 Karen Jones, Stephanie Strassel, Kevin Walker, Jonathan Wright

Speakers used a variety of handsets, including landline and mobile devices, and made VoIP calls from tablets or computers.

Speaker Recognition

Morphological Segmentation for Low Resource Languages

no code implementations LREC 2020 Justin Mott, Ann Bies, Stephanie Strassel, Jordan Kodner, Caitlin Richter, Hongzhi Xu, Mitchell Marcus

This paper describes a new morphology resource created by Linguistic Data Consortium and the University of Pennsylvania for the DARPA LORELEI Program.

Segmentation

A Progress Report on Activities at the Linguistic Data Consortium Benefitting the LREC Community

no code implementations LREC 2020 Christopher Cieri, James Fiumara, Stephanie Strassel, Jonathan Wright, Denise DiPersio, Mark Liberman

This latest in a series of Linguistic Data Consortium (LDC) progress reports to the LREC community does not describe any single language resource, evaluation campaign or technology but sketches the activities, since the last report, of a data center devoted to supporting the work of LREC attendees among other research communities.

The SAFE-T Corpus: A New Resource for Simulated Public Safety Communications

no code implementations LREC 2020 Dana Delgado, Kevin Walker, Stephanie Strassel, Karen Jones, Christopher Caruso, David Graff

We introduce a new resource, the SAFE-T (Speech Analysis for Emergency Response Technology) Corpus, designed to simulate first-responder communications by inducing high vocal effort and urgent speech with situational background noise in a game-based collection protocol.

Action Detection Activity Detection +3

The Query of Everything: Developing Open-Domain, Natural-Language Queries for BOLT Information Retrieval

no code implementations LREC 2016 Kira Griffitt, Stephanie Strassel

The DARPA BOLT Information Retrieval evaluations target open-domain natural-language queries over a large corpus of informal text in English, Chinese and Egyptian Arabic.

Information Retrieval Natural Language Queries +1

Selection Criteria for Low Resource Language Programs

no code implementations LREC 2016 Christopher Cieri, Mike Maxwell, Stephanie Strassel, Jennifer Tracey

This paper documents and describes the criteria used to select languages for study within programs that include low resource languages whether given that label or another similar one.

Management

Parallel Chinese-English Entities, Relations and Events Corpora

no code implementations LREC 2016 Justin Mott, Ann Bies, Zhiyi Song, Stephanie Strassel

This paper introduces the parallel Chinese-English Entities, Relations and Events (ERE) corpora developed by Linguistic Data Consortium under the DARPA Deep Exploration and Filtering of Text (DEFT) Program.

Knowledge Base Population Translation

LORELEI Language Packs: Data, Tools, and Resources for Technology Development in Low Resource Languages

no code implementations LREC 2016 Stephanie Strassel, Jennifer Tracey

In this paper, we describe the textual linguistic resources in nearly 3 dozen languages being produced by Linguistic Data Consortium for DARPA's LORELEI (Low Resource Languages for Emergent Incidents) Program.

Uzbek-English and Turkish-English Morpheme Alignment Corpora

no code implementations LREC 2016 Xuansong Li, Jennifer Tracey, Stephen Grimes, Stephanie Strassel

Morphologically-rich languages pose problems for machine translation (MT) systems, including word-alignment errors, data sparsity and multiple affixes.

Machine Translation Translation +1

Multi-language Speech Collection for NIST LRE

no code implementations LREC 2016 Karen Jones, Stephanie Strassel, Kevin Walker, David Graff, Jonathan Wright

The Multi-language Speech (MLS) Corpus supports NIST's Language Recognition Evaluation series by providing new conversational telephone speech and broadcast narrowband data in 20 languages/dialects.

The RATS Collection: Supporting HLT Research with Degraded Audio Data

no code implementations LREC 2014 David Graff, Kevin Walker, Stephanie Strassel, Xiaoyi Ma, Karen Jones, Ann Sawyer

The DARPA RATS program was established to foster development of language technology systems that can perform well on speaker-to-speaker communications over radio channels that evince a wide range in the type and extent of signal variability and acoustic degradation.

Action Detection Activity Detection +3

New Directions for Language Resource Development and Distribution

no code implementations LREC 2014 Christopher Cieri, Denise DiPersio, Mark Liberman, Andrea Mazzucchi, Stephanie Strassel, Jonathan Wright

Despite the growth in the number of linguistic data centers around the world, their accomplishments and expansions and the advances they have helped enable, the language resources that exist are a small fraction of those required to meet the goals of Human Language Technologies (HLT) for the world's languages and the promises they offer: broad access to knowledge, direct communication across language boundaries and engagement in a global community.

Transfer Learning

Parallel Aligned Treebanks at LDC: New Challenges Interfacing Existing Infrastructures

no code implementations LREC 2012 Xuansong Li, Stephanie Strassel, Stephen Grimes, Safa Ismael, Mohamed Maamouri, Ann Bies, Nianwen Xue

Parallel aligned treebanks (PAT) are linguistic corpora annotated with morphological and syntactic structures that are aligned at sentence as well as sub-sentence levels.

Machine Translation Sentence +2

Linguistic Resources for Entity Linking Evaluation: from Monolingual to Cross-lingual

no code implementations LREC 2012 Xuansong Li, Stephanie Strassel, Heng Ji, Kira Griffitt, Joe Ellis

To advance information extraction and question answering technologies toward a more realistic path, the U.S. NIST (National Institute of Standards and Technology) initiated the KBP (Knowledge Base Population) task as one of the TAC (Text Analysis Conference) evaluation tracks.

Cross-Lingual Entity Linking Entity Linking +5

Creating HAVIC: Heterogeneous Audio Visual Internet Collection

no code implementations LREC 2012 Stephanie Strassel, Amanda Morris, Jonathan Fiscus, Christopher Caruso, Haejoong Lee, Paul Over, James Fiumara, Barbara Shaw, Brian Antonishek, Martial Michel

Linguistic Data Consortium and the National Institute of Standards and Technology are collaborating to create a large, heterogeneous annotated multimodal corpus to support research in multimodal event detection and related technologies.

Event Detection

Annotation Trees: LDC's customizable, extensible, scalable, annotation infrastructure

no code implementations LREC 2012 Jonathan Wright, Kira Griffitt, Joe Ellis, Stephanie Strassel, Brendan Callahan

In recent months, LDC has developed a web-based annotation infrastructure centered around a tree model of annotations and a Ruby on Rails application called the LDC User Interface (LUI).

Reading Comprehension

Linguistic Resources for Handwriting Recognition and Translation Evaluation

no code implementations LREC 2012 Zhiyi Song, Safa Ismael, Stephen Grimes, David Doermann, Stephanie Strassel

LDC has developed a stable pipeline and infrastructures for collecting and annotating handwriting linguistic resources to support the evaluation of MADCAT and OpenHaRT.

Document Classification Handwriting Recognition +1
