1 code implementation • Findings (ACL) 2022 • Thuy-Trang Vu, Shahram Khadivi, Dinh Phung, Gholamreza Haffari
Generalising to unseen domains is under-explored and remains a challenge in neural machine translation.
no code implementations • 31 Mar 2025 • Jiangnan Li, Thuy-Trang Vu, Christian Herold, Amirhossein Tebbifakhr, Shahram Khadivi, Gholamreza Haffari
To address this issue, we propose CONGRAD, a scalable and effective filtering method that selects high-quality preference samples with minimal gradient conflicts across languages.
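As a rough illustration of the CONGRAD idea (not the paper's implementation), the sketch below scores each preference sample by the disagreement between its per-language gradients and keeps the least-conflicting samples; the conflict measure (pairwise negative cosine similarity) and all function names are assumptions.

```python
import torch

def gradient_conflict_score(per_language_grads):
    """Hypothetical conflict score: average negative cosine similarity
    between the flattened gradients computed for each language pair."""
    flat = [g.flatten() for g in per_language_grads]
    conflicts = []
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):
            cos = torch.nn.functional.cosine_similarity(flat[i], flat[j], dim=0)
            conflicts.append((-cos).clamp(min=0.0).item())  # count only disagreement
    return sum(conflicts) / max(len(conflicts), 1)

def select_low_conflict_samples(samples, grads_per_sample, keep_ratio=0.5):
    """Keep the preference samples whose per-language gradients conflict the least."""
    scored = sorted(zip(samples, grads_per_sample),
                    key=lambda pair: gradient_conflict_score(pair[1]))
    cutoff = int(len(scored) * keep_ratio)
    return [sample for sample, _ in scored[:cutoff]]
```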
1 code implementation • 31 Mar 2025 • Minghan Wang, Ye Bai, Yuxia Wang, Thuy-Trang Vu, Ehsan Shareghi, Gholamreza Haffari
High-quality speech dialogue datasets are crucial for Speech-LLM development, yet existing acquisition methods face significant limitations.
no code implementations • 21 Jan 2025 • Minghan Wang, Viet-Thanh Pham, Farhad Moghimifar, Thuy-Trang Vu
Despite the remarkable performance of modern systems, machine translation (MT) research remains under-explored when it comes to translating culture-specific elements of language, such as idioms, proverbs, and colloquial expressions.
1 code implementation • 17 Dec 2024 • Samin Mahdizadeh Sani, Pouya Sadeghi, Thuy-Trang Vu, Yadollah Yaghoobzadeh, Gholamreza Haffari
Large language models (LLMs) have made great progress in classification and text generation tasks.
no code implementations • 16 Oct 2024 • Minghao Wu, Thuy-Trang Vu, Lizhen Qu, Gholamreza Haffari
In this paper, we introduce GraphFilter, a novel method that represents the dataset as a bipartite graph, linking sentences to their constituent n-grams.
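The bipartite structure is straightforward to sketch; the helper below (hypothetical names, simple whitespace tokenisation) merely links each sentence to its constituent n-grams, which a selection strategy could then traverse, e.g. preferring sentences that cover rare n-grams.

```python
from collections import defaultdict

def build_bipartite_graph(sentences, n=2):
    """Minimal sketch: link each sentence to its constituent n-grams.

    Returns two adjacency maps (sentence index -> n-grams, n-gram -> sentence
    indices) that a downstream selection method could walk over.
    """
    sent_to_ngrams = defaultdict(set)
    ngram_to_sents = defaultdict(set)
    for idx, sent in enumerate(sentences):
        tokens = sent.split()
        for i in range(len(tokens) - n + 1):
            ngram = tuple(tokens[i:i + n])
            sent_to_ngrams[idx].add(ngram)
            ngram_to_sents[ngram].add(idx)
    return sent_to_ngrams, ngram_to_sents
```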
no code implementations • 18 Jun 2024 • Tuan-Luc Huynh, Thuy-Trang Vu, Weiqing Wang, Yinwei Wei, Trung Le, Dragan Gasevic, Yuan-Fang Li, Thanh-Toan Do
Differentiable Search Index (DSI) utilizes Pre-trained Language Models (PLMs) for efficient document retrieval without relying on external indexes.
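A hedged illustration of the DSI idea rather than this paper's setup: the language model itself serves as the index, so retrieval amounts to generating a document identifier for the query. The checkpoint and query below are placeholders; a real DSI model would be fine-tuned to emit docid strings.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Stand-in checkpoint; a DSI system would use a PLM trained to map queries to docids.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

query = "effects of domain adaptation on translation quality"
inputs = tokenizer(query, return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=8)
predicted_docid = tokenizer.decode(generated[0], skip_special_tokens=True)
print(predicted_docid)  # would be a document identifier only after DSI-style training
```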
1 code implementation • 16 Jun 2024 • Zhuang Li, Yuncheng Hua, Thuy-Trang Vu, Haolan Zhan, Lizhen Qu, Gholamreza Haffari
Recent studies emphasize that manually ensuring a consistent response style and maintaining high data quality in training sets can significantly improve the performance of fine-tuned Large Language Models (LLMs) while reducing the number of training examples needed.
1 code implementation • 16 Jun 2024 • Minghan Wang, Yuxia Wang, Thuy-Trang Vu, Ehsan Shareghi, Gholamreza Haffari
Recent advances in multimodal large language models (MLLMs) have significantly improved the integration of information across modalities, yet real-world applications in educational and scientific domains remain challenging.
no code implementations • 13 Jun 2024 • Minghao Wu, Thuy-Trang Vu, Lizhen Qu, Gholamreza Haffari
In this work, we propose a general, model-agnostic, reinforcement learning framework, Mixture-of-Skills (MoS), that learns to optimize data usage automatically during the fine-tuning process.
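A minimal sketch of the reinforcement-learning-over-data-usage idea, not the MoS algorithm itself: a bandit keeps a sampling weight per dataset and nudges it with an observed reward such as the drop in validation loss; the update rule and class name are assumptions.

```python
import numpy as np

class DatasetMixtureBandit:
    """Hypothetical sketch: one logit per dataset, updated from a scalar reward
    (e.g., validation-loss improvement after training on a batch from it)."""

    def __init__(self, num_datasets, lr=0.1):
        self.logits = np.zeros(num_datasets)
        self.lr = lr

    def sampling_probs(self):
        e = np.exp(self.logits - self.logits.max())
        return e / e.sum()

    def choose(self):
        return int(np.random.choice(len(self.logits), p=self.sampling_probs()))

    def update(self, dataset_idx, reward):
        # REINFORCE-style nudge on the chosen dataset's logit.
        probs = self.sampling_probs()
        self.logits[dataset_idx] += self.lr * reward * (1.0 - probs[dataset_idx])
```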
1 code implementation • 17 Feb 2024 • Minh-Vuong Nguyen, Linhao Luo, Fatemeh Shiri, Dinh Phung, Yuan-Fang Li, Thuy-Trang Vu, Gholamreza Haffari
Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought (CoT) explanations alongside answers.
no code implementations • 16 Feb 2024 • Minghan Wang, Thuy-Trang Vu, Yuxia Wang, Ehsan Shareghi, Gholamreza Haffari
Simultaneous machine translation (SimulMT) presents a challenging trade-off between translation quality and latency.
no code implementations • 2 Feb 2024 • Tongtong Wu, Linhao Luo, Yuan-Fang Li, Shirui Pan, Thuy-Trang Vu, Gholamreza Haffari
Large language models (LLMs) are not amenable to frequent re-training, due to high training costs arising from their massive scale.
no code implementations • 12 Jan 2024 • Minghao Wu, Thuy-Trang Vu, Lizhen Qu, George Foster, Gholamreza Haffari
We provide an in-depth analysis of these LLMs tailored for DocMT, examining translation errors, discourse phenomena, strategies for training and inference, the data efficiency of parallel documents, recent test set evaluations, and zero-shot crosslingual transfer.
no code implementations • 18 Oct 2023 • Linhao Luo, Thuy-Trang Vu, Dinh Phung, Gholamreza Haffari
We systematically evaluate the state-of-the-art LLMs with KGs in generic and specific domains.
no code implementations • 13 Sep 2023 • Minghan Wang, Jinming Zhao, Thuy-Trang Vu, Fatemeh Shiri, Ehsan Shareghi, Gholamreza Haffari
The results show that LLMs outperform dedicated MT models in terms of the BLEU and LAAL metrics.
no code implementations • 6 May 2023 • Thuy-Trang Vu, Shahram Khadivi, Mahsa Ghorbanali, Dinh Phung, Gholamreza Haffari
Acquiring new knowledge without forgetting what has been learned in a sequence of tasks is the central focus of continual learning (CL).
no code implementations • 26 Mar 2023 • Thuy-Trang Vu, Xuanli He, Gholamreza Haffari, Ehsan Shareghi
In recent years, increasing attention has been paid to probing the role of pre-training data in the downstream behaviour of Large Language Models (LLMs).
no code implementations • 20 Oct 2022 • Thuy-Trang Vu, Shahram Khadivi, Xuanli He, Dinh Phung, Gholamreza Haffari
Previous works mostly focus on either multilingual or multi-domain aspects of neural machine translation (NMT).
1 code implementation • EMNLP 2021 • Thuy-Trang Vu, Xuanli He, Dinh Phung, Gholamreza Haffari
Once the in-domain data is detected by the classifier, the NMT model is then adapted to the new domain by jointly learning translation and domain discrimination tasks.
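A minimal sketch of such joint learning, assuming a simple weighted sum of the two objectives (the weighting scheme and function names are illustrative, not the paper's):

```python
import torch

def joint_adaptation_loss(translation_loss, domain_logits, domain_labels, lam=0.5):
    """Combine the usual translation cross-entropy with a domain-discrimination
    term; `lam` trades off the two objectives (value here is an assumption)."""
    domain_loss = torch.nn.functional.cross_entropy(domain_logits, domain_labels)
    return translation_loss + lam * domain_loss
```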
1 code implementation • EMNLP 2020 • Thuy-Trang Vu, Dinh Phung, Gholamreza Haffari
Recent work has shown the importance of adapting broad-coverage contextualised embedding models to the domain of the target task of interest.
1 code implementation • ACL 2019 • Thuy-Trang Vu, Ming Liu, Dinh Phung, Gholamreza Haffari
Heuristic-based active learning (AL) methods are limited when the data distributions of the underlying learning problems vary.
1 code implementation • EMNLP 2018 • Thuy-Trang Vu, Gholamreza Haffari
Automated Post-Editing (PE) is the task of automatically correcting common and repetitive errors found in machine translation (MT) output.