no code implementations • ACL (RepL4NLP) 2021 • Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin
We present an efficient training approach to text retrieval with dense representations that applies knowledge distillation using the ColBERT late-interaction ranking model.
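Below is a minimal sketch of the core idea, assuming a PyTorch setup: a ColBERT-style late-interaction (MaxSim) teacher scores all in-batch query-passage pairs, and a single-vector student is trained toward those scores with a KL-divergence loss. The tensor shapes, temperature, and exact loss formulation are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def colbert_maxsim(q_tok, d_tok):
    """Teacher scores via late interaction: each query token takes its max
    similarity over passage tokens, and the per-token maxima are summed.
    q_tok: [B, Lq, dim], d_tok: [B, Ld, dim], both L2-normalized."""
    sim = torch.einsum("aqe,bde->abqd", q_tok, d_tok)  # [B, B, Lq, Ld]
    return sim.max(dim=-1).values.sum(dim=-1)          # [B, B] in-batch scores

def kd_loss(student_q, student_d, teacher_scores, tau=1.0):
    """KL divergence between the single-vector student's in-batch dot-product
    scores and the softened teacher MaxSim scores.
    student_q, student_d: [B, dim]; teacher_scores: [B, B]."""
    student_scores = student_q @ student_d.T
    target = F.softmax(teacher_scores / tau, dim=-1)
    log_pred = F.log_softmax(student_scores / tau, dim=-1)
    return F.kl_div(log_pred, target, reduction="batchmean")
```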
1 code implementation • 26 Apr 2023 • Jia-Huei Ju, Sheng-Chieh Lin, Ming-Feng Tsai, Chuan-Ju Wang
This paper presents ConvRerank, a conversational passage re-ranker that employs a newly developed pseudo-labeling approach.
1 code implementation • 15 Feb 2023 • Sheng-Chieh Lin, Akari Asai, Minghan Li, Barlas Oguz, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen
We therefore propose a new data augmentation (DA) approach with diverse queries and sources of supervision to progressively train a generalizable dense retriever (DR). As a result, DRAGON, our dense retriever trained with diverse augmentation, is the first BERT-base-sized DR to achieve state-of-the-art effectiveness in both supervised and zero-shot evaluations, and it even competes with models that use more complex late interaction (ColBERTv2 and SPLADE++).
1 code implementation • 13 Feb 2023 • Minghan Li, Sheng-Chieh Lin, Xueguang Ma, Jimmy Lin
Multi-vector retrieval methods have demonstrated their effectiveness on various retrieval datasets; among them, ColBERT is the most established method, based on late interaction between the contextualized token embeddings of pre-trained language models.
1 code implementation • 18 Nov 2022 • Minghan Li, Sheng-Chieh Lin, Barlas Oguz, Asish Ghoshal, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen
In this paper, we unify different multi-vector retrieval models from a token routing viewpoint and propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval.
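A toy sketch of the conditional-token-interaction idea, under simplifying assumptions (top-1 routing, a bare linear routing head, brute-force matching instead of inverted lists); the released CITADEL implementation differs in these details.

```python
import torch

def route(tok_emb, router_weight):
    """Route each contextualized token to its single best lexical key.
    tok_emb: [L, dim]; router_weight: [vocab, dim] (a toy routing head)."""
    logits = tok_emb @ router_weight.T   # [L, vocab]
    weights, keys = logits.max(dim=-1)   # top-1 routing per token
    return keys, weights.relu()

def citadel_score(q_emb, d_emb, router_weight):
    """Conditional token interaction: a query token is compared only against
    document tokens routed to the same lexical key."""
    qk, qw = route(q_emb, router_weight)
    dk, dw = route(d_emb, router_weight)
    score = 0.0
    for i in range(q_emb.size(0)):
        matches = [float(q_emb[i] @ d_emb[j]) * float(qw[i] * dw[j])
                   for j in range(d_emb.size(0)) if int(dk[j]) == int(qk[i])]
        score += max(matches) if matches else 0.0
    return score
```

In practice the routed token vectors would be stored in inverted lists keyed by their lexical entries, so only tokens that collide on a key are ever compared; the double loop here is purely for exposition.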
1 code implementation • 9 Oct 2022 • Kuan-Wei Huang, Geoff Chih-Fan Chen, Po-Wen Chang, Sheng-Chieh Lin, Chia-Jung Hsu, Vishal Thengane, Joshua Yao-Yu Lin
Quantifying the parameters and corresponding uncertainties of hundreds of strongly lensed quasar systems holds the key to resolving one of the most important scientific questions: the Hubble constant ($H_{0}$) tension.
1 code implementation • 31 Jul 2022 • Sheng-Chieh Lin, Minghan Li, Jimmy Lin
Pre-trained language models have been successful in many knowledge-intensive NLP tasks.
1 code implementation • 20 Jun 2022 • Sheng-Chieh Lin, Jimmy Lin
In contrast, our work integrates lexical representations with dense semantic representations by densifying high-dimensional lexical representations into what we call low-dimensional dense lexical representations (DLRs).
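A minimal sketch of the densification step, under a slice-and-max-pool reading of the paper: the |V|-dimensional lexical vector is cut into contiguous slices, each contributing one dense dimension, and matching stays lexical by gating on where each maximum occurred. Function names and the gating rule are illustrative assumptions.

```python
import numpy as np

def densify(lexical_vec, n_slices):
    """Split a |V|-dim sparse lexical vector into n_slices contiguous slices
    (|V| must be divisible by n_slices); keep each slice's max value and the
    offset where it occurred."""
    slices = lexical_vec.reshape(n_slices, -1)
    values = slices.max(axis=1)      # low-dimensional dense lexical rep
    offsets = slices.argmax(axis=1)  # remembers which term produced the max
    return values, offsets

def dlr_score(q_vals, q_offs, d_vals, d_offs):
    """Gated inner product: a slice contributes only when the query and
    document maxima came from the same vocabulary offset (same term)."""
    gate = (q_offs == d_offs)
    return float((q_vals * d_vals * gate).sum())
```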
1 code implementation • 9 Dec 2021 • Sheng-Chieh Lin, Jimmy Lin
Learned sparse and dense representations capture different successful approaches to text retrieval and the fusion of their results has proven to be more effective and robust.
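A minimal sketch of one common fusion recipe (min-max normalization followed by linear interpolation); the paper's exact fusion method may differ, and `alpha` is an illustrative parameter.

```python
def minmax(run):
    """Normalize one query's run (dict: docid -> score) to [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    return {doc: (s - lo) / (hi - lo + 1e-9) for doc, s in run.items()}

def fuse(sparse_run, dense_run, alpha=0.5):
    """Interpolate normalized sparse and dense scores; missing docs score 0."""
    s, d = minmax(sparse_run), minmax(dense_run)
    docs = set(s) | set(d)
    return sorted(((doc, alpha * s.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0))
                   for doc in docs), key=lambda x: -x[1])
```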
no code implementations • EMNLP 2021 • Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin
This paper describes a compact and effective model for low-latency passage retrieval in conversational search based on learned dense representations.
4 code implementations • 14 Apr 2021 • Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, Allan Hanbury
A vital step towards the widespread adoption of neural retrieval models is their resource efficiency throughout the training, indexing and query workflows.
Ranked #15 on Zero-shot Text Search on BEIR
1 code implementation • 19 Feb 2021 • Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, Rodrigo Nogueira
Pyserini is an easy-to-use Python toolkit that supports replicable IR research by providing effective first-stage retrieval in a multi-stage ranking architecture.
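A minimal usage sketch of Pyserini's first-stage retrieval; the import path, prebuilt index name, and hit fields follow the toolkit's documented interface, which may change across versions (check the current README).

```python
from pyserini.search.lucene import LuceneSearcher

# Download and open a prebuilt Lucene index for the MS MARCO passage corpus.
searcher = LuceneSearcher.from_prebuilt_index('msmarco-v1-passage')

# BM25 first-stage retrieval; results feed a downstream re-ranking stage.
hits = searcher.search('what is a lobster roll?', k=10)
for i, hit in enumerate(hits):
    print(f'{i + 1:2} {hit.docid:15} {hit.score:.5f}')
```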
no code implementations • 22 Dec 2020 • Teppei Okumura, Masao Hayashi, I-Non Chiu, Yen-Ting Lin, Ken Osato, Bau-Ching Hsieh, Sheng-Chieh Lin
From the constrained HOD model, the average mass of halos hosting the [OII] emitters is derived to be $\log{M_{eff}/(h^{-1}M_\odot)}=12.70^{+0.09}_{-0.07}$ and $12.61^{+0.09}_{-0.05}$ at $z=1.19$ and $1.47$, respectively, which will become halos with the present-day mass $M\sim 1.5 \times 10^{13}\,h^{-1}M_\odot$.
Astrophysics of Galaxies • Cosmology and Nongalactic Astrophysics
no code implementations • 4 Dec 2020 • Gongbo Liang, Yuanyuan Su, Sheng-Chieh Lin, Yu Zhang, Yuanyuan Zhang, Nathan Jacobs
We believe the proposed method will benefit astronomy and cosmology, where a large number of unlabeled multi-band images are available, but acquiring image labels is costly.
no code implementations • COLING 2020 • Jheng-Hong Yang, Sheng-Chieh Lin, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
While internalized "implicit knowledge" in pretrained transformers has led to fruitful progress in many natural language understanding tasks, how to most effectively elicit such knowledge remains an open question.
2 code implementations • 22 Oct 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin
We present an approach to ranking with dense representations that applies knowledge distillation to improve the recently proposed late-interaction ColBERT model.
no code implementations • 30 Aug 2020 • Sheng-Chieh Lin, Ting-Wei Lin, Jing-Kai Lou, Ming-Feng Tsai, Chuan-Ju Wang
In this paper, we propose a two-stage ranking approach for recommending linear TV programs.
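A generic sketch of the two-stage pattern (cheap candidate generation followed by more expensive re-ranking); `recall_model` and `ranking_model` are hypothetical stand-ins, not the paper's actual components for linear TV recommendation.

```python
def recommend(user, programs, recall_model, ranking_model,
              k_candidates=100, k_final=10):
    # Stage 1: a lightweight model scores the full catalog and keeps
    # only the top candidates.
    candidates = sorted(programs, key=lambda p: recall_model(user, p),
                        reverse=True)[:k_candidates]
    # Stage 2: a more expensive model re-ranks just those candidates.
    return sorted(candidates, key=lambda p: ranking_model(user, p),
                  reverse=True)[:k_final]
```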
no code implementations • 5 May 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
Conversational search plays a vital role in conversational information seeking.
no code implementations • 4 Apr 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).
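A minimal sketch of how a seq2seq PLM can be applied to CQR: the conversation history and current question are concatenated into one source string, and the model generates a self-contained rewrite. The `|||` separator and the off-the-shelf `t5-base` checkpoint are illustrative assumptions; a useful rewriter would first be fine-tuned on CQR data.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# An off-the-shelf checkpoint for illustration; in practice a
# CQR fine-tuned model would be loaded here.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

history = ["What is throat cancer?", "Is it treatable?"]
question = "What are its symptoms?"
source = " ||| ".join(history + [question])  # assumed input format

inputs = tokenizer(source, return_tensors="pt")
output_ids = model.generate(inputs.input_ids, max_length=64, num_beams=4)
rewritten = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(rewritten)  # ideally a standalone query, e.g. about throat cancer symptoms
```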
no code implementations • 18 Mar 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
We applied the T5 sequence-to-sequence model to tackle the AI2 WinoGrande Challenge by decomposing each example into two input text strings, each containing a hypothesis, and using the probability assigned to the "entailment" token as the score for each hypothesis.
Ranked #17 on Coreference Resolution on Winograd Schema Challenge
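A toy sketch of the scoring idea described above: each candidate fills the blank to form a hypothesis string, and the probability T5 assigns to the "entailment" token at the first decoding step ranks the two. The prompt format and the off-the-shelf checkpoint are assumptions; the paper's fine-tuned model and exact input decomposition are not reproduced here.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
# Id of the first subword of "entailment" in T5's vocabulary.
entail_id = tokenizer("entailment", add_special_tokens=False).input_ids[0]

def entailment_score(hypothesis):
    """Probability of the 'entailment' token at the first decoding step."""
    inputs = tokenizer(hypothesis, return_tensors="pt")
    decoder_start = torch.tensor([[model.config.decoder_start_token_id]])
    logits = model(input_ids=inputs.input_ids,
                   decoder_input_ids=decoder_start).logits[0, 0]
    return logits.softmax(-1)[entail_id].item()

sentence = "The trophy doesn't fit in the suitcase because the _ is too big."
options = ["trophy", "suitcase"]
scores = [entailment_score(sentence.replace("_", opt)) for opt in options]
print(options[scores.index(max(scores))])  # keep the higher-scoring hypothesis
```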