Search Results for author: Sheng-Chieh Lin

Found 21 papers, 11 papers with code

In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval

no code implementations ACL (RepL4NLP) 2021 Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin

We present an efficient training approach to text retrieval with dense representations that applies knowledge distillation using the ColBERT late-interaction ranking model.

Document Ranking Knowledge Distillation +2

FLAME: Factuality-Aware Alignment for Large Language Models

no code implementations 2 May 2024 Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, Xilun Chen

Furthermore, reward functions used in standard RL can also encourage hallucination, because they guide the LLM to provide more helpful responses on a diverse set of instructions, often preferring longer and more detailed responses.

Hallucination Instruction Following +1

Improving Conversational Passage Re-ranking with View Ensemble

1 code implementation 26 Apr 2023 Jia-Huei Ju, Sheng-Chieh Lin, Ming-Feng Tsai, Chuan-Ju Wang

This paper presents ConvRerank, a conversational passage re-ranker that employs a newly developed pseudo-labeling approach.

Conversational Search Passage Re-Ranking +1

How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval

1 code implementation 15 Feb 2023 Sheng-Chieh Lin, Akari Asai, Minghan Li, Barlas Oguz, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen

We hence propose a new data augmentation (DA) approach with diverse queries and sources of supervision to progressively train a generalizable dense retriever (DR). As a result, DRAGON, our dense retriever trained with diverse augmentation, is the first BERT-base-sized DR to achieve state-of-the-art effectiveness in both supervised and zero-shot evaluations and even competes with models using more complex late interaction (ColBERTv2 and SPLADE++).

Contrastive Learning Data Augmentation +1

SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes

1 code implementation 13 Feb 2023 Minghan Li, Sheng-Chieh Lin, Xueguang Ma, Jimmy Lin

Multi-vector retrieval methods have demonstrated their effectiveness on various retrieval datasets, and among them, ColBERT is the most established method based on the late interaction of contextualized token embeddings of pre-trained language models.

Information Retrieval Retrieval

CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval

1 code implementation 18 Nov 2022 Minghan Li, Sheng-Chieh Lin, Barlas Oguz, Asish Ghoshal, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen

In this paper, we unify different multi-vector retrieval models from a token routing viewpoint and propose conditional token interaction via dynamic lexical routing, namely CITADEL, for efficient and effective multi-vector retrieval.


Strong Gravitational Lensing Parameter Estimation with Vision Transformer

1 code implementation 9 Oct 2022 Kuan-Wei Huang, Geoff Chih-Fan Chen, Po-Wen Chang, Sheng-Chieh Lin, Chia-Jung Hsu, Vishal Thengane, Joshua Yao-Yu Lin

Quantifying the parameters and corresponding uncertainties of hundreds of strongly lensed quasar systems holds the key to resolving one of the most important scientific questions: the Hubble constant ($H_{0}$) tension.

A Dense Representation Framework for Lexical and Semantic Matching

1 code implementation 20 Jun 2022 Sheng-Chieh Lin, Jimmy Lin

In contrast, our work integrates lexical representations with dense semantic representations by densifying high-dimensional lexical representations into what we call low-dimensional dense lexical representations (DLRs).

Retrieval Semantic Text Matching +2

Densifying Sparse Representations for Passage Retrieval by Representational Slicing

1 code implementation 9 Dec 2021 Sheng-Chieh Lin, Jimmy Lin

Learned sparse and dense representations capture different successful approaches to text retrieval and the fusion of their results has proven to be more effective and robust.

Passage Retrieval Retrieval +1

Contextualized Query Embeddings for Conversational Search

no code implementations EMNLP 2021 Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin

This paper describes a compact and effective model for low-latency passage retrieval in conversational search based on learned dense representations.

Conversational Search Open-Domain Question Answering +2

Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling

4 code implementations 14 Apr 2021 Sebastian Hofstätter, Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin, Allan Hanbury

A vital step towards the widespread adoption of neural retrieval models is their resource efficiency throughout the training, indexing and query workflows.

Re-Ranking Retrieval +2

Angular clustering and host halo properties of [OII] emitters at $z >1$ in the Subaru HSC survey

no code implementations 22 Dec 2020 Teppei Okumura, Masao Hayashi, I-Non Chiu, Yen-Ting Lin, Ken Osato, Bau-Ching Hsieh, Sheng-Chieh Lin

From the constrained HOD model, the average mass of halos hosting the [OII] emitters is derived to be $\log{M_{eff}/(h^{-1}M_\odot)}=12.70^{+0.09}_{-0.07}$ and $12.61^{+0.09}_{-0.05}$ at $z=1.19$ and $1.47$, respectively, which will become halos with the present-day mass, $M\sim 1.5 \times 10^{13}h^{-1}M_\odot$.

Astrophysics of Galaxies Cosmology and Nongalactic Astrophysics

Optical Wavelength Guided Self-Supervised Feature Learning For Galaxy Cluster Richness Estimate

no code implementations 4 Dec 2020 Gongbo Liang, Yuanyuan Su, Sheng-Chieh Lin, Yu Zhang, Yuanyuan Zhang, Nathan Jacobs

We believe the proposed method will benefit astronomy and cosmology, where a large number of unlabeled multi-band images are available, but acquiring image labels is costly.


Designing Templates for Eliciting Commonsense Knowledge from Pretrained Sequence-to-Sequence Models

no code implementations COLING 2020 Jheng-Hong Yang, Sheng-Chieh Lin, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

While internalized "implicit knowledge" in pretrained transformers has led to fruitful progress in many natural language understanding tasks, how to most effectively elicit such knowledge remains an open question.

Multiple-choice Natural Language Understanding +1

Distilling Dense Representations for Ranking using Tightly-Coupled Teachers

2 code implementations 22 Oct 2020 Sheng-Chieh Lin, Jheng-Hong Yang, Jimmy Lin

We present an approach to ranking with dense representations that applies knowledge distillation to improve the recently proposed late-interaction ColBERT model.

Knowledge Distillation

Personalized TV Recommendation: Fusing User Behavior and Preferences

no code implementations 30 Aug 2020 Sheng-Chieh Lin, Ting-Wei Lin, Jing-Kai Lou, Ming-Feng Tsai, Chuan-Ju Wang

In this paper, we propose a two-stage ranking approach for recommending linear TV programs.

Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models

no code implementations 4 Apr 2020 Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).

Task-Oriented Dialogue Systems

TTTTTackling WinoGrande Schemas

no code implementations 18 Mar 2020 Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

We applied the T5 sequence-to-sequence model to tackle the AI2 WinoGrande Challenge by decomposing each example into two input text strings, each containing a hypothesis, and using the probabilities assigned to the "entailment" token as a score of the hypothesis.

Coreference Resolution
