no code implementations • 7 Feb 2025 • Jungwoo Kim, Minsang Kim, Sungjin Lee
By enhancing the diversity of instructions in a batch, diversity-based filtering maintains model accuracy without excessively discarding low-quality generated instructions.
1 code implementation • 6 Feb 2025 • Minsang Kim, Seungjun Baek
We propose Syntriever, a training framework for retrievers using synthetic data from black-box LLMs.
no code implementations • 12 Dec 2024 • Minsang Kim, Seungjun Baek
Large language models (LLMs) closely interact with humans, and thus need an intimate understanding of the cultural values of human society.
1 code implementation • 20 Jun 2024 • Minsang Kim, Cheoneum Park, Seungjun Baek
In addition, to compensate for the case where the retrieved passages contain distracting information or divided opinions, we augment the retrieved passages with self-generated passages by LLMs to guide the answer extraction.
no code implementations • 20 Jun 2024 • Minsang Kim, Seungjun Baek
In this work, we consider a data pruning method based on information entropy.
1 code implementation • 13 Feb 2024 • Minsang Kim, Seungjun Baek
HPLC leverages the positional information of nodes based on landmarks at various levels of hierarchy, such as nodes' distances to landmarks, inter-landmark distances, and hierarchical grouping of clusters.
1 code implementation • 29 Jun 2022 • Minsang Kim, Seungjun Baek
In the common feature extraction step, we apply the common encoding function to all input embeddings.
1 code implementation • 29 Jun 2022 • Minsang Kim, Sang-hyun Je, Eunjoo Park
We provide both a human-annotated test dataset and an auto-generated dataset.