1 code implementation • 1 Jan 2025 • Hieu Man, Nghia Trung Ngo, Viet Dac Lai, Ryan A. Rossi, Franck Dernoncourt, Thien Huu Nguyen
Extensive experimental results demonstrate that LUSIFER significantly enhances multilingual performance across various embedding tasks, particularly for medium- and low-resource languages, without requiring explicit multilingual training data.
no code implementations • 14 Nov 2024 • Nghia Trung Ngo, Chien Van Nguyen, Franck Dernoncourt, Thien Huu Nguyen
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs) in knowledge-intensive tasks such as those in the medical domain.
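For readers unfamiliar with the pattern, below is a minimal sketch of the generic retrieve-then-generate loop that RAG refers to. The toy lexical retriever and the `generate` callable are placeholders for illustration only; this is not the medical RAG pipeline studied in the paper.

```python
# Minimal RAG sketch (illustrative only): retrieve top-k passages for a query,
# then condition a generator on them. The corpus, retriever, and generate()
# callable are placeholders, not the paper's actual medical pipeline.
from typing import Callable, List

def retrieve(query: str, corpus: List[str], top_k: int = 3) -> List[str]:
    # Toy lexical retriever: rank passages by word overlap with the query.
    q_terms = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda p: len(q_terms & set(p.lower().split())),
                    reverse=True)
    return scored[:top_k]

def rag_answer(query: str, corpus: List[str],
               generate: Callable[[str], str]) -> str:
    # Ground the LLM in the retrieved evidence via the prompt.
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus))
    prompt = (f"Answer using only the context below.\n"
              f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
    return generate(prompt)  # `generate` wraps whatever LLM is available
```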
no code implementations • 13 Nov 2024 • Nghia Trung Ngo, Thien Huu Nguyen
Most previous research on multilingual IE is limited to the zero-shot cross-lingual single-transfer (one-to-one) setting, with high-resource languages predominantly serving as the source training data.
1 code implementation • 6 Aug 2024 • Hieu Man, Nghia Trung Ngo, Franck Dernoncourt, Thien Huu Nguyen
This is due to their causal attention mechanism and the misalignment between their pre-training objectives and text ranking tasks.
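One common baseline that this limitation refers to is pooling a single last-token representation from a decoder-only LM and scoring query-passage pairs by similarity: because attention is causal, only the final position attends to the whole input. The sketch below illustrates that pattern with a placeholder checkpoint; it is not the method proposed in the paper.

```python
# Last-token pooling from a decoder-only LM for relevance scoring (sketch).
# The checkpoint name is a placeholder; any causal LM from transformers works.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    last = inputs["attention_mask"].sum(dim=1) - 1  # index of final real token
    return hidden[0, last.item()]

def relevance(query: str, passage: str) -> float:
    # Cosine similarity between last-token representations.
    q, p = embed(query), embed(passage)
    return torch.nn.functional.cosine_similarity(q, p, dim=0).item()
```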
no code implementations • 17 Sep 2023 • Thuat Nguyen, Chien Van Nguyen, Viet Dac Lai, Hieu Man, Nghia Trung Ngo, Franck Dernoncourt, Ryan A. Rossi, Thien Huu Nguyen
However, the training datasets for these LLMs, especially the recent state-of-the-art models, are often not fully disclosed.
2 code implementations • 29 Jul 2023 • Viet Dac Lai, Chien Van Nguyen, Nghia Trung Ngo, Thuat Nguyen, Franck Dernoncourt, Ryan A. Rossi, Thien Huu Nguyen
Okapi introduces instruction and response-ranked data in 26 diverse languages to facilitate the experiments and development of future multilingual LLM research.
no code implementations • 12 Apr 2023 • Viet Dac Lai, Nghia Trung Ngo, Amir Pouran Ben Veyseh, Hieu Man, Franck Dernoncourt, Trung Bui, Thien Huu Nguyen
The answer to this question requires a thorough evaluation of ChatGPT over multiple tasks with diverse languages and large datasets (i.e., beyond reported anecdotes), which is still missing or limited in current research.
1 code implementation • NAACL (ACL) 2022 • Minh Van Nguyen, Nghia Trung Ngo, Bonan Min, Thien Huu Nguyen
FAMIE is designed to address a fundamental problem in existing AL frameworks, where annotators must wait a long time between annotation batches due to the time-consuming model training and data selection at each AL iteration; a sketch of this sequential loop follows.
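Below is a bare-bones sketch of the sequential active-learning loop whose waiting problem is described above: the annotator sits idle while the model retrains and reselects data in every round. The `train`, `select`, and `annotate` callables are placeholders, not FAMIE's actual proxy-model scheme.

```python
# Vanilla (sequential) active-learning loop: the annotator waits during
# select() and train() in every round, which is the bottleneck noted above.
from typing import Callable, List, Tuple

def active_learning_loop(
    unlabeled: List[str],
    rounds: int,
    batch_size: int,
    train: Callable[[List[Tuple[str, str]]], object],
    select: Callable[[object, List[str], int], List[str]],
    annotate: Callable[[str], str],
):
    labeled: List[Tuple[str, str]] = []
    pool: List[str] = list(unlabeled)
    model = train(labeled)                       # initial (possibly empty) model
    for _ in range(rounds):
        batch = select(model, pool, batch_size)  # annotator waits here ...
        for example in batch:
            labeled.append((example, annotate(example)))
            pool.remove(example)
        model = train(labeled)                   # ... and here, every round
    return model, labeled
```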