no code implementations • 2 Oct 2024 • Thao Nguyen, Kuan-Hao Huang, Ge Liu, Martin D. Burke, Ying Diao, Heng Ji
This strategic reduction in tokenization granularity is intentionally aligned with key drivers of functional properties (i.e., functional groups), enhancing the model's understanding of chemical language.
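As a rough illustration of coarser, functional-group-aware tokenization (a toy sketch only; the dictionary and function below are illustrative, not the paper's actual tokenizer or vocabulary), one can greedily match a handful of functional-group substrings before falling back to character-level tokens:

```python
import re

# Toy dictionary of functional-group substrings (illustrative, not the paper's vocabulary).
FUNCTIONAL_GROUPS = ["C(=O)O", "C(=O)N", "S(=O)(=O)", "C#N"]

def coarse_tokenize(smiles: str) -> list[str]:
    """Greedily match functional-group substrings first, then fall back to single characters."""
    tokens, i = [], 0
    while i < len(smiles):
        for group in FUNCTIONAL_GROUPS:
            if smiles.startswith(group, i):
                tokens.append(group)
                i += len(group)
                break
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens

print(coarse_tokenize("CCC(=O)O"))  # ['C', 'C', 'C(=O)O'] -- the carboxylic-acid group stays a single token
```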
no code implementations • 19 Aug 2024 • Xiaomeng Jin, Jeonghwan Kim, Yu Zhou, Kuan-Hao Huang, Te-Lin Wu, Nanyun Peng, Heng Ji
To address these issues, we propose Attribute-based Multimodal Data Augmentation (ARMADA), a novel multimodal data augmentation method via knowledge-guided manipulation of visual attributes of the mentioned entities.
1 code implementation • 1 Jul 2024 • Ziqi Wang, Hanlin Zhang, Xiner Li, Kuan-Hao Huang, Chi Han, Shuiwang Ji, Sham M. Kakade, Hao Peng, Heng Ji
Position bias has proven to be a prevalent issue of modern language models (LMs), where the models prioritize content based on its position within the given context.
1 code implementation • 9 Apr 2024 • Zhenhailong Wang, Joy Hsu, Xingyao Wang, Kuan-Hao Huang, Manling Li, Jiajun Wu, Heng Ji
Empirical results show that, without any human-annotated data, VDLM significantly improves state-of-the-art LMMs like GPT-4o on various multimodal perception and reasoning tasks.
1 code implementation • 2 Apr 2024 • Tanmay Parekh, Anh Mac, Jiarui Yu, Yuxuan Dong, Syed Shahriar, Bonnie Liu, Eric Yang, Kuan-Hao Huang, Wei Wang, Nanyun Peng, Kai-Wei Chang
In our work, we pioneer the use of Event Detection (ED) for better preparedness and early warning of upcoming epidemics by developing a framework to extract and analyze epidemic-related events from social media posts.
1 code implementation • 16 Nov 2023 • Kuan-Hao Huang, I-Hung Hsu, Tanmay Parekh, Zhiyu Xie, Zixuan Zhang, Premkumar Natarajan, Kai-Wei Chang, Nanyun Peng, Heng Ji
In this work, we identify and address evaluation challenges, including inconsistency due to varying data assumptions or preprocessing steps, the insufficiency of current evaluation frameworks that may introduce dataset or data split bias, and the low reproducibility of some previous approaches.
no code implementations • 19 Sep 2023 • Fei Wang, Kuan-Hao Huang, Kai-Wei Chang, Muhao Chen
In this paper, we propose a simple yet effective method, SALT, to improve the zero-shot cross-lingual transfer of the multilingual pretrained language models without the help of such external data.
1 code implementation • 16 Sep 2023 • Tanmay Parekh, I-Hung Hsu, Kuan-Hao Huang, Kai-Wei Chang, Nanyun Peng
Label projection, which involves obtaining translated labels and texts jointly, is essential for leveraging machine translation to facilitate cross-lingual transfer in structured prediction tasks.
1 code implementation • 26 May 2023 • Yixin Wan, Kuan-Hao Huang, Kai-Wei Chang
Existing fine-tuning methods for this task are costly as all the parameters of the model need to be updated during the training process.
1 code implementation • 26 May 2023 • Kuan-Hao Huang, Varun Iyer, I-Hung Hsu, Anoop Kumar, Kai-Wei Chang, Aram Galstyan
Paraphrase generation is a long-standing task in natural language processing (NLP).
1 code implementation • 26 May 2023 • I-Hung Hsu, Zhiyu Xie, Kuan-Hao Huang, Prem Natarajan, Nanyun Peng
However, existing generation-based EAE models mostly focus on problem re-formulation and prompt design, without incorporating additional information that has been shown to be effective for classification-based models, such as the abstract meaning representation (AMR) of the input passages.
1 code implementation • 23 May 2023 • Oscar Chew, Hsuan-Tien Lin, Kai-Wei Chang, Kuan-Hao Huang
Recent research has revealed that machine learning models have a tendency to leverage spurious correlations that exist in the training set but may not hold true in general circumstances.
no code implementations • 22 May 2023 • Kuan-Hao Huang, Liang Tan, Rui Hou, Sinong Wang, Amjad Almahairi, Ruty Rinott
Fine-tuning a large pre-trained language model separately for each downstream task incurs a heavy computational burden at inference time, since each task requires its own forward pass.
no code implementations • 2 Nov 2022 • Kuan-Hao Huang, Varun Iyer, Anoop Kumar, Sriram Venkatapathy, Kai-Wei Chang, Aram Galstyan
In this paper, we demonstrate that leveraging Abstract Meaning Representations (AMR) can greatly improve the performance of unsupervised syntactically controlled paraphrase generation.
1 code implementation • 25 May 2022 • I-Hung Hsu, Kuan-Hao Huang, Shuning Zhang, Wenxin Cheng, Premkumar Natarajan, Kai-Wei Chang, Nanyun Peng
In this work, we propose to take a unified view of all these tasks and introduce TAGPRIME to address relational structure extraction problems.
1 code implementation • 25 May 2022 • Tanmay Parekh, I-Hung Hsu, Kuan-Hao Huang, Kai-Wei Chang, Nanyun Peng
We utilize this ontology to further introduce GENEVA, a diverse generalizability benchmarking dataset comprising four test suites, aimed at evaluating models' ability to handle limited data and unseen event type generalization.
1 code implementation • 25 Mar 2022 • Xueying Zhan, Qingzhong Wang, Kuan-Hao Huang, Haoyi Xiong, Dejing Dou, Antoni B. Chan
In this work, we construct a DAL toolkit, DeepAL+, by re-implementing 19 highly cited DAL methods.
1 code implementation • ACL 2022 • Kuan-Hao Huang, I-Hung Hsu, Premkumar Natarajan, Kai-Wei Chang, Nanyun Peng
We present a study on leveraging multilingual pre-trained generative language models for zero-shot cross-lingual event argument extraction (EAE).
2 code implementations • 30 Nov 2021 • Kuan-Hao Huang
We present DeepAL, a Python library that implements several common strategies for active learning, with a particular emphasis on deep active learning.
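To illustrate the kind of strategy such a library packages, here is a generic pool-based uncertainty-sampling loop built on scikit-learn; this is a conceptual sketch, not DeepAL's actual API, and all names and data below are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_sampling_round(model, X_pool, labeled_idx, n_query=10):
    """Pick the unlabeled pool examples whose top-class probability is least confident."""
    unlabeled_idx = np.setdiff1d(np.arange(len(X_pool)), labeled_idx)
    probs = model.predict_proba(X_pool[unlabeled_idx])
    confidence = probs.max(axis=1)                            # confidence of the predicted class
    return unlabeled_idx[np.argsort(confidence)[:n_query]]    # least confident examples first

# Illustrative loop on synthetic data: label a seed set, then iteratively query and retrain.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)                 # synthetic labels
labeled = list(rng.choice(len(X), size=20, replace=False))
for _ in range(5):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    labeled += uncertainty_sampling_round(clf, X, np.array(labeled)).tolist()
```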
2 code implementations • NAACL 2022 • I-Hung Hsu, Kuan-Hao Huang, Elizabeth Boschee, Scott Miller, Prem Natarajan, Kai-Wei Chang, Nanyun Peng
Given a passage and a manually designed prompt, DEGREE learns to summarize the events mentioned in the passage into a natural sentence that follows a predefined pattern.
1 code implementation • EMNLP 2021 • Kuan-Hao Huang, Wasi Uddin Ahmad, Nanyun Peng, Kai-Wei Chang
Pre-trained multilingual language encoders, such as multilingual BERT and XLM-R, show great potential for zero-shot cross-lingual transfer.
1 code implementation • NAACL 2021 • James Y. Huang, Kuan-Hao Huang, Kai-Wei Chang
In this work, we present ParaBART, a semantic sentence embedding model that learns to disentangle semantics and syntax in sentence embeddings obtained by pre-trained language models.
1 code implementation • EACL 2021 • Kuan-Hao Huang, Kai-Wei Chang
We also demonstrate that the performance of SynPG is competitive with, or even better than, supervised models when the amount of unannotated data is large.
1 code implementation • AACL 2020 • Kuan-Hao Huang, Chen Li, Kai-Wei Chang
To study this task in depth, we present SportsSum, a Chinese sports game summarization dataset containing 5,428 soccer games with live commentaries and the corresponding news articles.
1 code implementation • IJCNLP 2019 • Pei Zhou, Weijia Shi, Jieyu Zhao, Kuan-Hao Huang, Muhao Chen, Ryan Cotterell, Kai-Wei Chang
Recent studies have shown that word embeddings exhibit gender bias inherited from the training corpora.
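A toy illustration of how such bias is commonly quantified is to project word vectors onto a he-she gender direction; the vectors and names below are random stand-ins for illustration, not real embeddings or the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in vectors; in actual studies these come from pretrained embeddings such as GloVe or word2vec.
emb = {w: rng.normal(size=50) for w in ["he", "she", "nurse", "engineer"]}

def gender_projection(word: str) -> float:
    """Scalar projection of a word vector onto the (he - she) direction; the sign indicates the leaning."""
    direction = emb["he"] - emb["she"]
    direction /= np.linalg.norm(direction)
    return float(np.dot(emb[word], direction))

for w in ["nurse", "engineer"]:
    print(w, round(gender_projection(w), 3))
```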
no code implementations • 4 Jan 2019 • Sean T. Yang, Kuan-Hao Huang, Bill Howe
We propose JECL, a method for clustering image-caption pairs by training parallel encoders with regularized clustering and alignment objectives, simultaneously learning both representations and cluster assignments.
no code implementations • 14 Nov 2017 • Hong-Min Chu, Kuan-Hao Huang, Hsuan-Tien Lin
The foundation of CS-DPP is an online LSDR framework derived from a leading LSDR algorithm.
1 code implementation • 29 Nov 2016 • Yao-Yuan Yang, Kuan-Hao Huang, Chih-Wei Chang, Hsuan-Tien Lin
Label space expansion for multi-label classification (MLC) is a methodology that encodes the original label vectors to higher dimensional codes before training and decodes the predicted codes back to the label vectors during testing.
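A minimal sketch of this encode-train-decode pattern, using a random projection as a purely illustrative encoder and nearest-neighbor decoding; learned, cost-sensitive label embeddings replace this random step in practice, and all data below is synthetic:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, d, k, m = 200, 10, 6, 12                          # samples, features, labels, code dimension (m > k: expansion)
X = rng.normal(size=(n, d))
Y = (rng.random(size=(n, k)) < 0.3).astype(float)    # synthetic multi-label targets

P = rng.normal(size=(k, m))                          # illustrative random-projection encoder
codes = Y @ P                                        # encode label vectors into the higher-dimensional code space
reg = Ridge().fit(X, codes)                          # train a regressor to predict codes from features

# Decode a prediction back to a label vector: nearest encoded label vector seen in training.
pred_code = reg.predict(X[:1])
dists = np.linalg.norm(Y @ P - pred_code, axis=1)
decoded = Y[np.argmin(dists)]                        # nearest-neighbor decoding in code space
```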
2 code implementations • 30 Mar 2016 • Kuan-Hao Huang, Hsuan-Tien Lin
Furthermore, extensive experimental results demonstrate that CLEMS is significantly better than a wide spectrum of existing LE algorithms and state-of-the-art cost-sensitive algorithms across different cost functions.