Search Results for author: James Mayfield

Found 31 papers, 11 papers with code

PLAID SHIRTTT for Large-Scale Streaming Dense Retrieval

1 code implementation • 2 May 2024 • Dawn Lawrie, Efsun Kayi, Eugene Yang, James Mayfield, Douglas W. Oard

PLAID, an efficient implementation of the ColBERT late interaction bi-encoder using pretrained language models for ranking, consistently achieves state-of-the-art performance in monolingual, cross-language, and multilingual retrieval.

Retrieval
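As context for the entry above: ColBERT's late-interaction scoring (which PLAID implements efficiently) sums, for each query token, its maximum similarity to any document token. A minimal sketch of that MaxSim computation, using random arrays as stand-ins for the token embeddings a pretrained language model would produce:

```python
# Sketch of ColBERT-style late interaction (MaxSim) scoring.
# Embeddings here are random placeholders, not real model outputs.
import numpy as np

def late_interaction_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Sum over query tokens of the max cosine similarity to any doc token."""
    # Normalize rows so dot products are cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = q @ d.T                        # (num_q_tokens, num_d_tokens)
    return float(sim.max(axis=1).sum())  # MaxSim per query token, then sum

rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))   # 4 query tokens, 128-dim embeddings
doc = rng.normal(size=(50, 128))    # 50 document tokens
print(late_interaction_score(query, doc))
```

Because each cosine similarity is at most 1, the score is bounded by the number of query tokens; PLAID's contribution is making this interaction fast at scale, not changing the formula.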

Distillation for Multilingual Information Retrieval

1 code implementation • 2 May 2024 • Eugene Yang, Dawn Lawrie, James Mayfield

Recent work in cross-language information retrieval (CLIR), where queries and documents are in different languages, has shown the benefit of the Translate-Distill framework that trains a cross-language neural dual-encoder model using translation and distillation.

Information Retrieval, Retrieval

Language Fairness in Multilingual Information Retrieval

1 code implementation • 2 May 2024 • Eugene Yang, Thomas Jänich, James Mayfield, Dawn Lawrie

We also evaluate real MLIR systems on two publicly available benchmarks and show that the PEER scores align with prior analytical findings on MLIR fairness.

Fairness, Information Retrieval +1

Efficiency-Effectiveness Tradeoff of Probabilistic Structured Queries for Cross-Language Information Retrieval

1 code implementation • 29 Apr 2024 • Eugene Yang, Suraj Nair, Dawn Lawrie, James Mayfield, Douglas W. Oard, Kevin Duh

Probabilistic Structured Queries (PSQ) is a cross-language information retrieval (CLIR) method that uses translation probabilities statistically derived from aligned corpora.

Information Retrieval, Retrieval +1
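The core PSQ idea described above can be sketched in a few lines: an English query term is expanded into document-language terms weighted by translation probabilities, and a document is scored by the expected frequency of those translations. The tiny translation table and document below are illustrative assumptions, not data from the paper:

```python
# Sketch of Probabilistic Structured Queries (PSQ) scoring.
# Translation table and document are toy examples for illustration.
from collections import Counter

# P(doc_term | query_term), e.g. statistically derived from aligned corpora
translation_table = {
    "house": {"casa": 0.8, "hogar": 0.2},
    "white": {"blanca": 0.7, "blanco": 0.3},
}

def psq_score(query_terms, doc_tokens, table):
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for q in query_terms:
        # Expected term frequency of q's translations in the document
        expected_tf = sum(p * tf[t] for t, p in table.get(q, {}).items())
        score += expected_tf / doc_len  # simple length-normalized weight
    return score

doc = ["la", "casa", "blanca", "es", "grande"]
print(psq_score(["white", "house"], doc, translation_table))  # 0.3
```

Real PSQ systems plug these expected term frequencies into a full retrieval model (e.g. BM25-style weighting); the efficiency-effectiveness tradeoff the paper studies comes largely from how aggressively the translation table is pruned.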

Overview of the TREC 2023 NeuCLIR Track

no code implementations • 11 Apr 2024 • Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang

The principal tasks are ranked retrieval of news in one of the three languages, using English topics.

Information Retrieval, Retrieval

HLTCOE at TREC 2023 NeuCLIR Track

no code implementations • 11 Apr 2024 • Eugene Yang, Dawn Lawrie, James Mayfield

TT trains a ColBERT model with English queries and passages automatically translated into the document language from the MS-MARCO v1 collection.

Document Translation

Extending Translate-Train for ColBERT-X to African Language CLIR

no code implementations • 11 Apr 2024 • Eugene Yang, Dawn J. Lawrie, Paul McNamee, James Mayfield

This paper describes the submission runs from the HLTCOE team at the CIRAL CLIR tasks for African languages at FIRE 2023.

Machine Translation, Retrieval +1

Translate-Distill: Learning Cross-Language Dense Retrieval by Translation and Distillation

1 code implementation • 9 Jan 2024 • Eugene Yang, Dawn Lawrie, James Mayfield, Douglas W. Oard, Scott Miller

Applying a similar knowledge distillation approach to training an efficient dual-encoder model for Cross-Language Information Retrieval (CLIR), where queries and documents are in different languages, is challenging due to the lack of a sufficiently large training collection when the query and document languages differ.

Information Retrieval, Knowledge Distillation +2
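The knowledge-distillation objective referenced above is typically a divergence between the teacher's and student's score distributions over candidate passages for a query. A minimal sketch of that loss, with placeholder score vectors standing in for real model outputs (the exact loss used in Translate-Distill may differ in detail):

```python
# Sketch of a ranking-distillation loss: KL divergence between the
# teacher's and student's softmax-normalized query-passage scores.
# Score vectors are illustrative placeholders.
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    z = x - x.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_kl(student_scores, teacher_scores) -> float:
    """KL(teacher || student) over one query's candidate passages."""
    p = softmax(np.asarray(teacher_scores, dtype=float))
    q = softmax(np.asarray(student_scores, dtype=float))
    return float(np.sum(p * np.log(p / q)))

teacher = np.array([8.0, 2.0, 1.0])  # teacher strongly prefers passage 0
student = np.array([5.0, 4.0, 3.0])  # student is less decisive
print(distill_kl(student, teacher))
```

Training minimizes this quantity so the cross-language student reproduces the monolingual teacher's ranking behavior; the loss is zero exactly when the two score distributions match.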

Synthetic Cross-language Information Retrieval Training Data

no code implementations • 29 Apr 2023 • James Mayfield, Eugene Yang, Dawn Lawrie, Samuel Barham, Orion Weller, Marc Mason, Suraj Nair, Scott Miller

By repeating this process, collections of arbitrary size can be created in the style of MS MARCO but using naturally-occurring documents in any desired genre and domain of discourse.

Information Retrieval, Language Modelling +4

Overview of the TREC 2022 NeuCLIR Track

no code implementations • 24 Apr 2023 • Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang

This is the first year of the TREC Neural CLIR (NeuCLIR) track, which aims to study the impact of neural approaches to cross-language information retrieval.

Information Retrieval, Retrieval

Parameter-efficient Zero-shot Transfer for Cross-Language Dense Retrieval with Adapters

no code implementations • 20 Dec 2022 • Eugene Yang, Suraj Nair, Dawn Lawrie, James Mayfield, Douglas W. Oard

By combining adapters pretrained on language tasks for a specific language with task-specific adapters, prior work has shown that adapter-enhanced models outperform fine-tuning the entire model when transferring across languages in various NLP tasks.

Information Retrieval, Language Modelling +1
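For readers unfamiliar with the adapter modules mentioned above: a standard bottleneck adapter is a small down-project/up-project network with a residual connection, inserted into a frozen pretrained model so only the adapter's few parameters are trained. A minimal sketch with illustrative dimensions (the paper's exact adapter configuration is not specified here):

```python
# Sketch of a bottleneck adapter module with a residual connection.
# Dimensions and randomly initialized weights are illustrative only.
import numpy as np

class Adapter:
    def __init__(self, hidden: int, bottleneck: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.down = rng.normal(scale=0.02, size=(hidden, bottleneck))
        self.up = rng.normal(scale=0.02, size=(bottleneck, hidden))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        h = np.maximum(x @ self.down, 0.0)  # down-project + ReLU
        return x + h @ self.up              # up-project + residual

adapter = Adapter(hidden=768, bottleneck=64)
x = np.zeros((2, 768))                   # batch of 2 hidden states
print(adapter(x).shape)                  # residual keeps the hidden size
```

Parameter efficiency comes from the bottleneck: here 768×64 + 64×768 ≈ 98K trainable weights per adapter, versus the millions in a full transformer layer.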

Neural Approaches to Multilingual Information Retrieval

1 code implementation • 3 Sep 2022 • Dawn Lawrie, Eugene Yang, Douglas W. Oard, James Mayfield

Providing access to information across languages has been a goal of Information Retrieval (IR) for decades.

Document Translation, Information Retrieval +3

HC4: A New Suite of Test Collections for Ad Hoc CLIR

1 code implementation • 24 Jan 2022 • Dawn Lawrie, James Mayfield, Douglas Oard, Eugene Yang

HC4 is a new suite of test collections for ad hoc Cross-Language Information Retrieval (CLIR), with Common Crawl News documents in Chinese, Persian, and Russian, topics in English and in the document languages, and graded relevance judgments.

Active Learning, Information Retrieval +1

Patapasco: A Python Framework for Cross-Language Information Retrieval Experiments

1 code implementation • 24 Jan 2022 • Cash Costello, Eugene Yang, Dawn Lawrie, James Mayfield

While there are high-quality software frameworks for information retrieval experimentation, they do not explicitly support cross-language information retrieval (CLIR).

Information Retrieval, Retrieval

Improving Zero-Shot Multi-Lingual Entity Linking

no code implementations • 16 Apr 2021 • Elliot Schumacher, James Mayfield, Mark Dredze

Entity linking -- the task of identifying references in free text to relevant knowledge base representations -- often focuses on single languages.

Entity Linking

Tagging Location Phrases in Text

no code implementations • LREC 2020 • Paul McNamee, James Mayfield, Cash Costello, Caitlyn Bishop, Shelby Anderson

Throughout this time the majority of such work has focused on detection and classification of entities into coarse-grained types like: PERSON, ORGANIZATION, and LOCATION.

Humanitarian

Dragonfly: Advances in Non-Speaker Annotation for Low Resource Languages

no code implementations • LREC 2020 • Cash Costello, Shelby Anderson, Caitlyn Bishop, James Mayfield, Paul McNamee

Dragonfly is an open source software tool that supports annotation of text in a low resource language by non-speakers of the language.

Improving Neural Named Entity Recognition with Gazetteers

1 code implementation • 6 Mar 2020 • Chan Hee Song, Dawn Lawrie, Tim Finin, James Mayfield

The goal of this work is to improve the performance of a neural named entity recognition system by adding input features that indicate a word is part of a name included in a gazetteer.

named-entity-recognition, Named Entity Recognition +1
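The gazetteer-feature idea described above can be sketched concretely: each token gets binary indicators for whether it begins or continues a name found in a gazetteer, and those indicators are concatenated to the token's input representation downstream. The gazetteer entries and feature scheme below are toy assumptions, not the paper's exact configuration:

```python
# Sketch of gazetteer input features for neural NER: per-token binary
# indicators for "begins a gazetteer name" and "inside a gazetteer name".
# Gazetteer entries are toy examples.
gazetteer = {("new", "york"), ("james", "mayfield")}
max_name_len = max(len(entry) for entry in gazetteer)

def gazetteer_features(tokens):
    lowered = [t.lower() for t in tokens]
    feats = [[0, 0] for _ in tokens]  # [begins-name, inside-name]
    for i in range(len(tokens)):
        for n in range(1, max_name_len + 1):
            if tuple(lowered[i:i + n]) in gazetteer:
                feats[i][0] = 1                 # B- indicator
                for j in range(i + 1, i + n):
                    feats[j][1] = 1             # I- indicator
    return feats

print(gazetteer_features(["He", "visited", "New", "York", "today"]))
# -> [[0, 0], [0, 0], [1, 0], [0, 1], [0, 0]]
```

A neural tagger would concatenate these two bits to each token's word embedding, letting the model exploit name lists without memorizing them.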

Platforms for Non-speakers Annotating Names in Any Language

no code implementations • ACL 2018 • Ying Lin, Cash Costello, Boliang Zhang, Di Lu, Heng Ji, James Mayfield, Paul McNamee

We demonstrate two annotation platforms that allow an English speaker to annotate names for any language without knowing the language.

Language-Independent Named Entity Analysis Using Parallel Projection and Rule-Based Disambiguation

no code implementations • WS 2017 • James Mayfield, Paul McNamee, Cash Costello

The 2017 shared task at the Balto-Slavic NLP workshop requires identifying coarse-grained named entities in seven languages, identifying each entity's base form, and clustering name mentions across the multilingual set of documents.

Clustering, named-entity-recognition +2

Interactive Knowledge Base Population

no code implementations • 31 May 2015 • Travis Wolfe, Mark Dredze, James Mayfield, Paul McNamee, Craig Harman, Tim Finin, Benjamin Van Durme

Most work on building knowledge bases has focused on collecting entities and facts from as large a collection of documents as possible.

Knowledge Base Population

Creating and Curating a Cross-Language Person-Entity Linking Collection

no code implementations • LREC 2012 • Dawn Lawrie, James Mayfield, Paul McNamee, Douglas Oard

To stimulate research in cross-language entity linking, we present a new test collection for evaluating the accuracy of cross-language entity linking in twenty-one languages.

Entity Linking, Knowledge Base Population +1
