no code implementations • 4 Apr 2023 • Jinhyuk Lee, Zhuyun Dai, Sai Meher Karthik Duddu, Tao Lei, Iftekhar Naim, Ming-Wei Chang, Vincent Y. Zhao
Multi-vector retrieval models such as ColBERT [Khattab and Zaharia, 2020] allow token-level interactions between queries and documents, and hence achieve state of the art on many information retrieval benchmarks.
no code implementations • 2 Nov 2022 • Yujie Qian, Jinhyuk Lee, Sai Meher Karthik Duddu, Zhuyun Dai, Siddhartha Brahma, Iftekhar Naim, Tao Lei, Vincent Y. Zhao
With sparsified unary saliences, we are able to prune a large number of query and document token vectors and improve the efficiency of multi-vector retrieval.
no code implementations • 25 Oct 2022 • Gyuwan Kim, Jinhyuk Lee, Barlas Oguz, Wenhan Xiong, Yizhe Zhang, Yashar Mehdad, William Yang Wang
Building dense retrievers requires a series of standard procedures, including training and validating neural models and creating indexes for efficient search.
1 code implementation • 25 May 2022 • Mujeen Sung, Jungsoo Park, Jaewoo Kang, Danqi Chen, Jinhyuk Lee
In this paper, we introduce TOUR (Test-Time Optimization of Query Representations), which further optimizes instance-level query representations guided by signals from test-time retrieval results.
1 code implementation • 6 Jan 2022 • Mujeen Sung, Minbyul Jeong, Yonghwa Choi, Donghyeon Kim, Jinhyuk Lee, Jaewoo Kang
In biomedical natural language processing, named entity recognition (NER) and named entity normalization (NEN) are key tasks that enable the automatic extraction of biomedical entities (e. g. diseases and drugs) from the ever-growing biomedical literature.
1 code implementation • 16 Dec 2021 • Hyunjae Kim, Jaehyo Yoo, Seunghyun Yoon, Jinhyuk Lee, Jaewoo Kang
Recent named entity recognition (NER) models often rely on human-annotated datasets, requiring the significant engagement of professional knowledge on the target domain and entities.
1 code implementation • EMNLP 2021 • Christopher Sciavolino, Zexuan Zhong, Jinhyuk Lee, Danqi Chen
Open-domain question answering has exploded in popularity recently due to the success of dense retrieval models, which have surpassed sparse models using only a few supervised training examples.
Ranked #3 on
Passage Retrieval
on EntityQuestions
1 code implementation • EMNLP 2021 • Jinhyuk Lee, Alexander Wettig, Danqi Chen
Dense retrieval methods have shown great promise over sparse retrieval methods in a range of NLP problems.
1 code implementation • EMNLP 2021 • Mujeen Sung, Jinhyuk Lee, Sean Yi, Minji Jeon, Sungdong Kim, Jaewoo Kang
To this end, we create the BioLAMA benchmark, which is comprised of 49K biomedical factual knowledge triples for probing biomedical LMs.
4 code implementations • ACL 2021 • Jinhyuk Lee, Mujeen Sung, Jaewoo Kang, Danqi Chen
Open-domain question answering can be reformulated as a phrase retrieval problem, without the need for processing documents on-demand during inference (Seo et al., 2019).
Ranked #1 on
Question Answering
on Natural Questions (long)
1 code implementation • EMNLP (NLP-COVID19) 2020 • Jinhyuk Lee, Sean S. Yi, Minbyul Jeong, Mujeen Sung, Wonjin Yoon, Yonghwa Choi, Miyoung Ko, Jaewoo Kang
The recent outbreak of the novel coronavirus is wreaking havoc on the world and researchers are struggling to effectively combat it.
3 code implementations • ACL 2020 • Mujeen Sung, Hwisang Jeon, Jinhyuk Lee, Jaewoo Kang
In this way, we avoid the explicit pre-selection of negative samples from more than 400K candidates.
1 code implementation • EMNLP 2020 • Miyoung Ko, Jinhyuk Lee, Hyunjae Kim, Gangwoo Kim, Jaewoo Kang
In this study, we hypothesize that when the distribution of the answer positions is highly skewed in the training set (e. g., answers lie only in the k-th sentence of each passage), QA models predicting answers as positions can learn spurious positional cues and fail to give answers in different positions.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Jungsoo Park, Mujeen Sung, Jinhyuk Lee, Jaewoo Kang
Exposing diverse subword segmentations to neural machine translation (NMT) models often improves the robustness of machine translation as NMT models can experience various subword candidates.
3 code implementations • ACL 2020 • Jinhyuk Lee, Minjoon Seo, Hannaneh Hajishirzi, Jaewoo Kang
Open-domain question answering can be formulated as a phrase retrieval problem, in which we can expect huge scalability and speed benefit but often suffer from low accuracy due to the limitation of existing phrase representation models.
3 code implementations • 18 Sep 2019 • Wonjin Yoon, Jinhyuk Lee, Donghyeon Kim, Minbyul Jeong, Jaewoo Kang
The recent success of question answering systems is largely attributed to pre-trained language models.
1 code implementation • ACL 2019 • Minjoon Seo, Jinhyuk Lee, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh Hajishirzi
Existing open-domain question answering (QA) models are not suitable for real-time usage because they need to process several long documents on-demand for every input query.
19 code implementations • 25 Jan 2019 • Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, Jaewoo Kang
Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows.
2 code implementations • 9 Nov 2018 • Yonggyu Park, Junhyun Lee, Yookyung Koh, Inyeop Lee, Jinhyuk Lee, Jaewoo Kang
However, in designing a typeface, it is difficult to keep the style of various characters consistent, especially for languages with lots of morphological variations such as Chinese.
1 code implementation • EMNLP 2018 • Jinhyuk Lee, Seongjun Yun, Hyunjae Kim, Miyoung Ko, Jaewoo Kang
Recently, open-domain question answering (QA) has been combined with machine comprehension models to find answers in a large knowledge source.
2 code implementations • 21 Sep 2018 • Wonjin Yoon, Chan Ho So, Jinhyuk Lee, Jaewoo Kang
Our model has successfully reduced the number of misclassified entities and improved the performance by leveraging multiple datasets annotated for different entity types.
Ranked #13 on
Named Entity Recognition (NER)
on BC5CDR
1 code implementation • 5 Sep 2018 • Donghyeon Kim, Jinhyuk Lee, Donghee Choi, Jaehoon Choi, Jaewoo Kang
With online calendar services gaining popularity worldwide, calendar data has become one of the richest context sources for understanding human behavior.