Search Results for author: Jheng-Hong Yang

Found 15 papers, 6 papers with code

In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval

no code implementations • ACL (RepL4NLP) 2021 • Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin

We present an efficient training approach to text retrieval with dense representations that applies knowledge distillation using the ColBERT late-interaction ranking model.

Document Ranking Knowledge Distillation +2
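
A minimal sketch of the training signal described above, assuming the teacher's ColBERT scores for every query-passage pair in the batch are already computed (function and variable names are illustrative, not the authors' code): the student's in-batch similarity distribution is pushed toward the teacher's, so every other passage in the batch acts as a negative.

```python
import torch
import torch.nn.functional as F

def in_batch_kd_loss(student_q, student_p, teacher_scores, temperature=1.0):
    """student_q, student_p: [B, d] query/passage embeddings from the student
    bi-encoder; teacher_scores: [B, B] ColBERT scores for all in-batch pairs."""
    student_scores = student_q @ student_p.T                       # [B, B] dot products
    student_logp = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_prob = F.softmax(teacher_scores / temperature, dim=-1)
    # KL divergence between the two in-batch score distributions.
    return F.kl_div(student_logp, teacher_prob, reduction="batchmean")
```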

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval

no code implementations • 3 Apr 2023 • Jimmy Lin, David Alfonso-Hermelo, Vitor Jeronymo, Ehsan Kamalloo, Carlos Lassance, Rodrigo Nogueira, Odunayo Ogundepo, Mehdi Rezagholizadeh, Nandan Thakur, Jheng-Hong Yang, Xinyu Zhang

The advent of multilingual language models has generated a resurgence of interest in cross-lingual information retrieval (CLIR), which is the task of searching documents in one language with queries from another.

Cross-Lingual Information Retrieval Retrieval

Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval

1 code implementation • 21 Mar 2022 • Wei Zhong, Jheng-Hong Yang, Yuqing Xie, Jimmy Lin

With the recent success of dense retrieval methods based on bi-encoders, studies have applied this approach to various interesting downstream retrieval tasks with good efficiency and in-domain effectiveness.

Ranked #1 on Math Information Retrieval on ARQMath (using extra training data)

Information Retrieval Math +2
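
As a rough illustration of the bi-encoder setup mentioned above (the checkpoint name and pooling are placeholders, not the paper's models): queries and passages are encoded independently and ranked by a single dot product, which is what makes the approach efficient at search time.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")    # placeholder encoder
enc = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        cls = enc(**batch).last_hidden_state[:, 0]           # [CLS] pooling
    return F.normalize(cls, dim=-1)

query = embed(["closed form of a geometric series"])
passages = embed(["The geometric series sums to a/(1-r) when |r| < 1.",
                  "The Pythagorean theorem relates the sides of a right triangle."])
scores = (query @ passages.T).squeeze(0)                      # one score per passage
print(scores.argsort(descending=True))                        # ranked passage indices
```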

Sparsifying Sparse Representations for Passage Retrieval by Top-$k$ Masking

no code implementations • 17 Dec 2021 • Jheng-Hong Yang, Xueguang Ma, Jimmy Lin

Sparse lexical representation learning has demonstrated much progress in improving passage retrieval effectiveness in recent models such as DeepImpact, uniCOIL, and SPLADE.

Passage Retrieval Representation Learning +2
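
A minimal sketch of the top-$k$ masking operation named in the title, assuming each passage is represented as a vocabulary-sized vector of non-negative term weights (the exact value of $k$ and where the mask is applied are design choices studied in the paper):

```python
import torch

def topk_mask(term_weights: torch.Tensor, k: int) -> torch.Tensor:
    """term_weights: [batch, vocab_size] sparse lexical weights.
    Keep only the k largest weights per passage and zero out the rest."""
    topk = term_weights.topk(k, dim=-1)
    mask = torch.zeros_like(term_weights)
    mask.scatter_(-1, topk.indices, 1.0)
    return term_weights * mask
```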

Text-to-Text Multi-view Learning for Passage Re-ranking

no code implementations • 29 Apr 2021 • Jia-Huei Ju, Jheng-Hong Yang, Chuan-Ju Wang

Recently, much progress in natural language processing has been driven by deep contextualized representations pretrained on large corpora.

Multi-view Learning Passage Ranking +4

Contextualized Query Embeddings for Conversational Search

no code implementations • EMNLP 2021 • Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin

This paper describes a compact and effective model for low-latency passage retrieval in conversational search based on learned dense representations.

Conversational Search Open-Domain Question Answering +2

Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling

4 code implementations • 14 Apr 2021 • Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, Allan Hanbury

A vital step towards the widespread adoption of neural retrieval models is their resource efficiency throughout the training, indexing and query workflows.

Re-Ranking Retrieval +2
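
A rough sketch of the batch-composition idea, under the assumption that training queries have been clustered by topic offline (cluster count, batch size, and helper names are illustrative, not the paper's exact pipeline): each training batch is drawn from a single topic cluster, so in-batch negatives are topically related rather than random.

```python
import random
import numpy as np
from sklearn.cluster import KMeans

def cluster_queries(query_embeddings: np.ndarray, n_clusters: int = 2000) -> dict:
    """Group query ids by topic cluster using k-means over query embeddings."""
    labels = KMeans(n_clusters=n_clusters).fit_predict(query_embeddings)
    clusters = {}
    for qid, label in enumerate(labels):
        clusters.setdefault(int(label), []).append(qid)
    return clusters

def sample_topic_batch(clusters: dict, batch_size: int = 32) -> list:
    """Draw one batch of query ids from a single randomly chosen topic cluster."""
    pool = clusters[random.choice(list(clusters))]
    return random.sample(pool, min(batch_size, len(pool)))
```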

Designing Templates for Eliciting Commonsense Knowledge from Pretrained Sequence-to-Sequence Models

no code implementations • COLING 2020 • Jheng-Hong Yang, Sheng-Chieh Lin, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

While internalized "implicit knowledge" in pretrained transformers has led to fruitful progress in many natural language understanding tasks, how to most effectively elicit such knowledge remains an open question.

Multiple-choice Natural Language Understanding +1

Distilling Dense Representations for Ranking using Tightly-Coupled Teachers

2 code implementations • 22 Oct 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin

We present an approach to ranking with dense representations that applies knowledge distillation to improve the recently proposed late-interaction ColBERT model.

Knowledge Distillation
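
For context, a sketch of the late-interaction ("MaxSim") score that the ColBERT teacher produces and that the distilled dense model is trained to approximate (shapes and names are illustrative):

```python
import torch

def colbert_score(q_tokens: torch.Tensor, p_tokens: torch.Tensor) -> torch.Tensor:
    """q_tokens: [num_query_tokens, d], p_tokens: [num_passage_tokens, d]."""
    sim = q_tokens @ p_tokens.T                 # token-level similarity matrix
    # Each query token keeps its best-matching passage token; the maxima are summed.
    return sim.max(dim=-1).values.sum()
```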

Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models

no code implementations • 4 Apr 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).

Task-Oriented Dialogue Systems
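
A hedged sketch of the sequence-to-sequence reformulation setup (the separator, checkpoint, and decoding settings are placeholders, not necessarily the paper's configuration): the dialogue history plus the current question go in, and a self-contained rewrite comes out.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-base")                  # placeholder checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-base")

history = ["What is throat cancer?", "Is it treatable?"]      # last turn is ambiguous
source = " ||| ".join(history)                                # illustrative separator
input_ids = tok(source, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_length=64, num_beams=4)
print(tok.decode(output_ids[0], skip_special_tokens=True))
# With a model fine-tuned for CQR, the expected rewrite would resemble:
# "Is throat cancer treatable?"
```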

TTTTTackling WinoGrande Schemas

no code implementations • 18 Mar 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

We applied the T5 sequence-to-sequence model to tackle the AI2 WinoGrande Challenge by decomposing each example into two input text strings, each containing a hypothesis, and using the probabilities assigned to the "entailment" token as a score of the hypothesis.

Coreference Resolution
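
An illustrative sketch of the scoring scheme described above (the prompt format is simplified and the checkpoint is a placeholder; a model fine-tuned as in the paper is assumed): each candidate fills the blank to form a hypothesis-bearing input, and the candidate whose input puts more probability mass on the first subword of "entailment" wins.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-base")                  # placeholder checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-base")
ent_id = tok("entailment", add_special_tokens=False).input_ids[0]   # first subword id

def entailment_score(text: str) -> float:
    input_ids = tok(text, return_tensors="pt").input_ids
    start = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(input_ids=input_ids, decoder_input_ids=start).logits
    return logits[0, -1].softmax(-1)[ent_id].item()

sentence = "The trophy doesn't fit in the suitcase because _ is too big."
candidates = ["the trophy", "the suitcase"]
scores = [entailment_score(sentence.replace("_", c)) for c in candidates]
print(candidates[scores.index(max(scores))])
```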
