Search Results for author: Pin-Jie Lin

Found 11 papers, 3 papers with code

Efficient Model Development through Fine-tuning Transfer

no code implementations • 25 Mar 2025 • Pin-Jie Lin, Rishab Balasubramanian, Fengyuan Liu, Nikhil Kandpal, Tu Vu

In a multilingual model development setting, we show that this approach can significantly increase performance on target-language tasks without retraining, achieving absolute improvements of 4.7% and 15.5% on Global MMLU for Malagasy and Turkish, respectively, compared to Llama 3.1 8B Instruct.

MMLU, Model
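
A minimal sketch of the fine-tuning-transfer idea described above, assuming (as the abstract suggests) that it amounts to grafting a weight-space diff vector, fine-tuned weights minus base weights, onto a newer base model. Function and variable names are illustrative, not the paper's code.

```python
# Sketch only: "fine-tuning transfer" read as applying the weight delta
# from an older (base, fine-tuned) model pair onto a newer base model,
# so the new base inherits the fine-tuning without retraining.
def transfer_finetuning(base_old, tuned_old, base_new):
    """All arguments are state dicts (e.g., of torch tensors) with matching keys/shapes."""
    merged = {}
    for name, w_new in base_new.items():
        delta = tuned_old[name] - base_old[name]  # what fine-tuning changed
        merged[name] = w_new + delta              # graft the change onto the new base
    return merged
```

On this reading, the delta from a language-specific fine-tune would simply be reapplied in one pass over the state dicts whenever the base model is updated.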

Self-Vocabularizing Training for Neural Machine Translation

no code implementations • 18 Mar 2025 • Pin-Jie Lin, Ernie Chang, Yangyang Shi, Vikas Chandra

Past vocabulary learning techniques identify relevant vocabulary before training, relying on statistical and entropy-based assumptions that largely neglect the role of model training.

Machine Translation, Translation
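
A hedged sketch of what "self-vocabularizing training" plausibly looks like, given the abstract's contrast with vocabularies fixed before training: re-learn the subword vocabulary from the model's own outputs between training rounds. The loop structure and the helper `train_nmt_and_decode` are hypothetical stand-ins; only the BPE learning uses a real library (Hugging Face `tokenizers`).

```python
# Hedged sketch: re-learn BPE from model outputs each round instead of
# fixing the vocabulary once before training. `train_nmt_and_decode` is a
# hypothetical stand-in for training an NMT model under the current
# tokenization and decoding the training set with it.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

def learn_bpe(lines, vocab_size=8000):
    tok = Tokenizer(models.BPE(unk_token="<unk>"))
    tok.pre_tokenizer = pre_tokenizers.Whitespace()
    tok.train_from_iterator(lines, trainers.BpeTrainer(vocab_size=vocab_size,
                                                       special_tokens=["<unk>"]))
    return tok

def self_vocabularizing_training(parallel_data, rounds=3):
    corpus = [f"{src} {tgt}" for src, tgt in parallel_data]
    tok = learn_bpe(corpus)                    # round 0: vocabulary from raw data
    for _ in range(rounds):
        hypotheses = train_nmt_and_decode(parallel_data, tok)  # hypothetical helper
        tok = learn_bpe(hypotheses)            # re-learn vocabulary from model outputs
    return tok
```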

Scaling Parameter-Constrained Language Models with Quality Data

no code implementations • 4 Oct 2024 • Ernie Chang, Matteo Paltenghi, Yang Li, Pin-Jie Lin, Changsheng Zhao, Patrick Huber, Zechun Liu, Rastislav Rabatin, Yangyang Shi, Vikas Chandra

Scaling laws in language modeling traditionally quantify training loss as a function of dataset size and model parameters, providing compute-optimal estimates but often neglecting the impact of data quality on model generalization.

Diversity, Language Modeling, +1
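
For orientation, the "traditional" scaling laws referenced above are commonly written in the Chinchilla form below; a simple way to let data quality enter is an effective-token term. The q parameterization is my illustration of the idea, not the paper's exact formulation.

```latex
% Chinchilla-form scaling law: loss vs. parameters N and training tokens D
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% Illustrative quality extension: a quality factor q \in (0, 1] rescales D
% into "effective" tokens, so higher-quality data behaves like more data
L(N, D, q) = E + \frac{A}{N^{\alpha}} + \frac{B}{(q\,D)^{\beta}}
```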

Target-Aware Language Modeling via Granular Data Sampling

no code implementations • 23 Sep 2024 • Ernie Chang, Pin-Jie Lin, Yang Li, Changsheng Zhao, Daeil Kim, Rastislav Rabatin, Zechun Liu, Yangyang Shi, Vikas Chandra

A cost-effective and straightforward approach is sampling with low-dimensional data features, which makes it possible to select large-scale pretraining data for domain-specific use cases.

Language Modeling, Language Modelling, +2
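
A sketch of sampling with low-dimensional data features as described above: featurize candidate pretraining documents cheaply and keep the ones closest to a small target-domain sample. The specific features (hashed character n-grams) and the centroid-similarity scoring are my assumptions for illustration, not necessarily the paper's recipe.

```python
# Hedged sketch: rank candidate pretraining documents by similarity of
# cheap, low-dimensional n-gram features to a target-domain sample,
# then keep the top fraction.
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_pretraining_data(candidates, target_sample, keep_ratio=0.1):
    vec = HashingVectorizer(analyzer="char", ngram_range=(3, 5), n_features=2**12)
    cand_feats = vec.transform(candidates)                          # (n_docs, 4096), sparse
    target_centroid = np.asarray(vec.transform(target_sample).mean(axis=0))
    scores = cosine_similarity(cand_feats, target_centroid).ravel() # similarity to target
    k = max(1, int(len(candidates) * keep_ratio))
    top = np.argsort(scores)[::-1][:k]
    return [candidates[i] for i in top]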

Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning

1 code implementation • 23 Jul 2024 • Pin-Jie Lin, Miaoran Zhang, Marius Mosbach, Dietrich Klakow

Identifying beneficial tasks to transfer from is a critical step toward successful intermediate-task transfer learning.

Transfer Learning
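
One family of task-selection methods in this area, which a study like the above would compare, ranks candidate intermediate tasks by embedding similarity to the target task. The sketch below assumes a sentence-embedding function `embed`; it is a hypothetical stand-in, and mean-pooling examples into a single task vector is one simple choice among several.

```python
# Hedged sketch: rank candidate intermediate tasks by cosine similarity
# between averaged example embeddings of each task and the target task.
# `embed` (texts -> (n, d) array) is a hypothetical stand-in.
import numpy as np

def task_embedding(examples, embed):
    return embed(examples).mean(axis=0)  # mean-pool examples into one task vector

def rank_intermediate_tasks(target_examples, candidate_tasks, embed):
    t = task_embedding(target_examples, embed)
    scores = {}
    for name, examples in candidate_tasks.items():
        c = task_embedding(examples, embed)
        scores[name] = float(np.dot(t, c) / (np.linalg.norm(t) * np.linalg.norm(c)))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```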

Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin

no code implementations • 28 Apr 2024 • Pin-Jie Lin, Merel Scholman, Muhammed Saeed, Vera Demberg

We test the effect of this data augmentation on two critical NLP tasks: machine translation and sentiment analysis.

Data Augmentation, Machine Translation, +2
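
A toy sketch of the augmentation idea: model orthographic variation by rewriting tokens with known spelling alternatives and adding the variants to the training data. The three substitution rules below are invented for illustration and are not the paper's variant inventory for Nigerian Pidgin.

```python
# Hedged sketch of orthographic data augmentation: generate plausible
# spelling variants of each sentence and append them to the corpus.
# VARIANT_RULES holds toy examples only.
import random

VARIANT_RULES = [("dey", "de"), ("wetin", "watin"), ("make", "mek")]  # illustrative

def orthographic_variant(sentence, p=0.5, seed=0):
    rng = random.Random(seed)
    out = []
    for tok in sentence.split():
        for a, b in VARIANT_RULES:
            if tok == a and rng.random() < p:
                tok = b                      # swap in the alternative spelling
                break
        out.append(tok)
    return " ".join(out)

def augment(corpus, n_variants=2):
    augmented = list(corpus)
    for i, sent in enumerate(corpus):
        for j in range(n_variants):
            augmented.append(orthographic_variant(sent, seed=i * 31 + j))
    return augmented
```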

On The Open Prompt Challenge In Conditional Audio Generation

no code implementations • 1 Nov 2023 • Ernie Chang, Sidd Srinivasan, Mahi Luthra, Pin-Jie Lin, Varun Nagaraja, Forrest Iandola, Zechun Liu, Zhaoheng Ni, Changsheng Zhao, Yangyang Shi, Vikas Chandra

Text-to-audio generation (TTA) produces audio from a text description, learning from pairs of audio samples and hand-annotated text.

Audio Generation

In-Context Prompt Editing For Conditional Audio Generation

no code implementations • 1 Nov 2023 • Ernie Chang, Pin-Jie Lin, Yang Li, Sidd Srinivasan, Gael Le Lan, David Kant, Yangyang Shi, Forrest Iandola, Vikas Chandra

We show that the framework enhances audio quality across the set of collected user prompts, which are edited with reference to the training captions as exemplars.

Audio Generation, Retrieval
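
A sketch of the retrieval step implied above: find the training captions closest to a user prompt and hand them to a rewriter as in-context exemplars. TF-IDF retrieval is my simple stand-in, and `rewrite_with_exemplars` (for example, an instruction-following LLM call) is hypothetical.

```python
# Hedged sketch: retrieve training captions nearest to the user prompt,
# then pass them as exemplars to a prompt rewriter. Retrieval method and
# the rewriter interface are assumptions for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_exemplars(user_prompt, training_captions, k=3):
    vec = TfidfVectorizer().fit(training_captions + [user_prompt])
    caps = vec.transform(training_captions)
    q = vec.transform([user_prompt])
    scores = cosine_similarity(q, caps).ravel()
    top = scores.argsort()[::-1][:k]
    return [training_captions[i] for i in top]

def edit_prompt(user_prompt, training_captions, rewrite_with_exemplars, k=3):
    exemplars = retrieve_exemplars(user_prompt, training_captions, k)
    return rewrite_with_exemplars(user_prompt, exemplars)  # hypothetical rewriter
```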

Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin

1 code implementation • 1 Jul 2023 • Pin-Jie Lin, Muhammed Saeed, Ernie Chang, Merel Scholman

In this work, we aim to improve both text classification and translation for Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus. We further propose a cross-lingual adaptive training framework, combining continual and task-adaptive training, to adapt a base pre-trained model to low-resource languages.

Text Classification, +1
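
The abstract spells out the recipe's stages, so a schematic captures it better than full training code. Reading "continual and task adaptive training" as two rounds of continued pretraining before downstream fine-tuning; `continue_pretraining` and `finetune` are hypothetical stand-ins for ordinary training loops.

```python
# Schematic of the staged adaptation described in the abstract.
# `continue_pretraining` and `finetune` are hypothetical stand-ins for
# standard language-model pretraining and supervised fine-tuning loops.
def cross_lingual_adaptive_training(base_model, pidgin_text, task_text, task_train_set):
    model = continue_pretraining(base_model, pidgin_text)  # continual adaptive training
    model = continue_pretraining(model, task_text)         # task-adaptive training
    return finetune(model, task_train_set)                 # downstream task fine-tuning
```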

Revisiting Sample Size Determination in Natural Language Understanding

1 code implementation • 1 Jul 2023 • Ernie Chang, Muhammad Hassan Rashid, Pin-Jie Lin, Changsheng Zhao, Vera Demberg, Yangyang Shi, Vikas Chandra

Knowing exactly how many data points need to be labeled to achieve a given model performance is a major step toward reducing overall annotation budgets.

Active Learning, Natural Language Understanding
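
A sketch of the underlying estimation problem: fit a saturating learning curve to pilot (labeled-set size, score) points and invert it for a target score. The power-law form is a common choice for this kind of extrapolation; the paper's actual estimator may differ.

```python
# Hedged sketch: fit a power-law learning curve to a few pilot points,
# then invert it to estimate how many labels reach a target score.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a - b * np.power(n, -c)     # saturating learning curve with ceiling a

def estimate_required_labels(sizes, scores, target_score):
    (a, b, c), _ = curve_fit(power_law, sizes, scores,
                             p0=(max(scores), 1.0, 0.5), maxfev=10_000)
    if target_score >= a:
        return None                    # target lies beyond the fitted ceiling
    # invert: target = a - b * n^(-c)  =>  n = (b / (a - target))^(1 / c)
    return int(np.ceil((b / (a - target_score)) ** (1.0 / c)))
```

For example, estimate_required_labels([100, 200, 400, 800], [0.61, 0.68, 0.74, 0.78], target_score=0.85) extrapolates the fitted curve beyond the pilot sizes.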
