1 code implementation • 27 Jan 2024 • Minbyul Jeong, Jiwoong Sohn, Mujeen Sung, Jaewoo Kang
To address challenges that still cannot be handled with the encoded knowledge of LLMs, various retrieval-augmented generation (RAG) methods have been developed that retrieve documents from a knowledge corpus and append them, unconditionally or selectively, to the input of LLMs for generation.
no code implementations • 10 Jul 2023 • Gangwoo Kim, Hajung Kim, Lei Ji, Seongsu Bae, Chanhwi Kim, Mujeen Sung, Hyunjae Kim, Kun Yan, Eric Chang, Jaewoo Kang
In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain.
1 code implementation • 24 May 2023 • Mujeen Sung, James Gung, Elman Mansimov, Nikolaos Pappas, Raphael Shu, Salvatore Romeo, Yi Zhang, Vittorio Castelli
Intent classification (IC) plays an important role in task-oriented dialogue systems.
1 code implementation • 25 May 2022 • Mujeen Sung, Jungsoo Park, Jaewoo Kang, Danqi Chen, Jinhyuk Lee
In this paper, we introduce TOUR (Test-Time Optimization of Query Representations), which further optimizes instance-level query representations guided by signals from test-time retrieval results.
1 code implementation • 6 Jan 2022 • Mujeen Sung, Minbyul Jeong, Yonghwa Choi, Donghyeon Kim, Jinhyuk Lee, Jaewoo Kang
In biomedical natural language processing, named entity recognition (NER) and named entity normalization (NEN) are key tasks that enable the automatic extraction of biomedical entities (e.g., diseases and drugs) from the ever-growing biomedical literature.
no code implementations • 20 Nov 2021 • Hyunjae Kim, Mujeen Sung, Wonjin Yoon, Sungjoon Park, Jaewoo Kang
This paper is a technical report on our system submitted to the chemical identification task of the BioCreative VII Track 2 challenge.
1 code implementation • EMNLP 2021 • Mujeen Sung, Jinhyuk Lee, Sean Yi, Minji Jeon, Sungdong Kim, Jaewoo Kang
To this end, we create the BioLAMA benchmark, which comprises 49K biomedical factual knowledge triples for probing biomedical LMs.
4 code implementations • ACL 2021 • Jinhyuk Lee, Mujeen Sung, Jaewoo Kang, Danqi Chen
Open-domain question answering can be reformulated as a phrase retrieval problem, without the need for processing documents on-demand during inference (Seo et al., 2019).
Ranked #1 on Question Answering on Natural Questions (long)
2 code implementations • 1 Jul 2020 • Minbyul Jeong, Mujeen Sung, Gangwoo Kim, Donghyeon Kim, Wonjin Yoon, Jaehyo Yoo, Jaewoo Kang
We observe that BioBERT trained on the NLI dataset obtains better performance on Yes/No (+5.59%), Factoid (+0.53%), and List type (+13.58%) questions compared to the performance obtained in a previous challenge (BioASQ 7B Phase B).
1 code implementation • EMNLP (NLP-COVID19) 2020 • Jinhyuk Lee, Sean S. Yi, Minbyul Jeong, Mujeen Sung, Wonjin Yoon, Yonghwa Choi, Miyoung Ko, Jaewoo Kang
The recent outbreak of the novel coronavirus is wreaking havoc on the world and researchers are struggling to effectively combat it.
3 code implementations • ACL 2020 • Mujeen Sung, Hwisang Jeon, Jinhyuk Lee, Jaewoo Kang
In this way, we avoid the explicit pre-selection of negative samples from more than 400K candidates.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Jungsoo Park, Mujeen Sung, Jinhyuk Lee, Jaewoo Kang
Exposing neural machine translation (NMT) models to diverse subword segmentations often improves the robustness of machine translation, as the models encounter various subword candidates during training.