no code implementations • COLING (CreativeSumm) 2022 • Dongqi Pu, Xudong Hong, Pin-Jie Lin, Ernie Chang, Vera Demberg
The Creative Summarization Shared Task at COLING 2022 aims to generate summaries of long-form creative-writing texts.
no code implementations • 25 Mar 2025 • Pin-Jie Lin, Rishab Balasubramanian, Fengyuan Liu, Nikhil Kandpal, Tu Vu
In a multilingual model development setting, we show that this approach can significantly increase performance on target-language tasks without retraining, achieving an absolute improvement of 4.7% and 15.5% on Global MMLU for Malagasy and Turkish, respectively, compared to Llama 3.1 8B Instruct.
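The snippet does not spell out the mechanism; one plausible reading, sketched below under that assumption, transfers the fine-tuning "diff vector" from an old base model onto an updated base without retraining (all checkpoints and names are toy placeholders):

```python
# Hedged sketch: replay a fine-tuning "diff vector" on a new base model.
# Assumes all three checkpoints share one architecture and parameter names.
import torch

def transfer_finetuning(base_sd, finetuned_sd, new_base_sd):
    """Return new_base + (finetuned - base), parameter by parameter."""
    return {name: new_base_sd[name] + (finetuned_sd[name] - base_sd[name])
            for name in new_base_sd}

# Toy usage (real use: an older base, its fine-tune, and an updated base).
base = {"w": torch.zeros(2, 2)}
finetuned = {"w": torch.ones(2, 2)}        # fine-tuning added +1 everywhere
new_base = {"w": torch.full((2, 2), 5.0)}
print(transfer_finetuning(base, finetuned, new_base)["w"])  # 6s: delta replayed
```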
no code implementations • 18 Mar 2025 • Pin-Jie Lin, Ernie Chang, Yangyang Shi, Vikas Chandra
Past vocabulary learning techniques identify relevant vocabulary before training, relying on statistical and entropy-based assumptions that largely neglect the role of model training.
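As a point of reference, here is a minimal sketch of the kind of static, entropy-based criterion this work moves away from: scoring candidate vocabularies by the corpus entropy of their token distributions before any training. The tokenizers and corpus below are toy placeholders, not the paper's setup.

```python
# Score candidate vocabularies statically, with no model training involved.
import math
from collections import Counter

def corpus_entropy(tokenize, corpus):
    """Shannon entropy (bits/token) of the unigram token distribution."""
    counts = Counter(tok for line in corpus for tok in tokenize(line))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

corpus = ["low resource languages need data", "data need models"]
char_entropy = corpus_entropy(lambda s: list(s), corpus)    # character vocab
word_entropy = corpus_entropy(lambda s: s.split(), corpus)  # word vocab
print(char_entropy, word_entropy)  # compare candidates before training
```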
no code implementations • 4 Oct 2024 • Ernie Chang, Matteo Paltenghi, Yang Li, Pin-Jie Lin, Changsheng Zhao, Patrick Huber, Zechun Liu, Rastislav Rabatin, Yangyang Shi, Vikas Chandra
Scaling laws in language modeling traditionally quantify training loss as a function of dataset size and model parameters, providing compute-optimal estimates but often neglecting the impact of data quality on model generalization.
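The snippet does not reproduce the law itself; for orientation, a standard compute-optimal parametric form (Hoffmann et al. style) that such work typically starts from is shown below. The paper's contribution is to extend this kind of formulation with a data-quality term, whose exact form is not given here.

```latex
% Expected training loss as a function of model parameters N and dataset
% size D, with fitted constants E, A, B, alpha, beta. This is the
% conventional baseline form, not the paper's quality-aware law.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```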
no code implementations • 23 Sep 2024 • Ernie Chang, Pin-Jie Lin, Yang Li, Changsheng Zhao, Daeil Kim, Rastislav Rabatin, Zechun Liu, Yangyang Shi, Vikas Chandra
A cost-effective and straightforward approach is sampling with low-dimensional data features, which makes it possible to select large-scale pretraining data for domain-specific use cases.
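A minimal sketch of this idea, assuming TF-IDF features reduced with truncated SVD and selection by distance to a domain centroid (all illustrative stand-ins, not the paper's exact features):

```python
# Project texts into a low-dimensional feature space, then keep the
# pretraining examples closest to the centroid of a small domain seed set.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

pretrain_pool = ["generic web text sample", "news article sample", "recipe sample"]
domain_seed = ["clinical note example", "radiology report example"]

vec = TfidfVectorizer().fit(pretrain_pool + domain_seed)
svd = TruncatedSVD(n_components=2).fit(vec.transform(pretrain_pool + domain_seed))

pool_z = svd.transform(vec.transform(pretrain_pool))
centroid = svd.transform(vec.transform(domain_seed)).mean(axis=0)

# Rank pool examples by distance to the domain centroid and keep the top-k.
order = np.argsort(np.linalg.norm(pool_z - centroid, axis=1))
selected = [pretrain_pool[i] for i in order[:2]]
print(selected)
```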
1 code implementation • 23 Jul 2024 • Pin-Jie Lin, Miaoran Zhang, Marius Mosbach, Dietrich Klakow
Identifying beneficial tasks to transfer from is a critical step toward successful intermediate-task transfer learning.
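One simple transferability heuristic in this spirit, shown purely as an illustration, ranks candidate source tasks by the cosine similarity of mean task embeddings; the encoder below is a hash-seeded stand-in, and the paper's actual predictors may differ.

```python
# Rank source tasks by similarity of their mean embedding to the target task.
import numpy as np

def mean_task_embedding(examples, embed):
    return np.mean([embed(x) for x in examples], axis=0)

def rank_source_tasks(target_examples, source_tasks, embed):
    t = mean_task_embedding(target_examples, embed)
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {name: cos(mean_task_embedding(exs, embed), t)
              for name, exs in source_tasks.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy usage with a deterministic hash-seeded stand-in encoder; swap in a
# real sentence encoder for actual task data.
embed = lambda x: np.random.default_rng(abs(hash(x)) % 2**32).standard_normal(8)
print(rank_source_tasks(["target ex"], {"nli": ["a"], "qa": ["b"]}, embed))
```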
no code implementations • 28 Apr 2024 • Pin-Jie Lin, Merel Scholman, Muhammed Saeed, Vera Demberg
We test the effect of this data augmentation on two critical NLP tasks: machine translation and sentiment analysis.
no code implementations • 1 Nov 2023 • Ernie Chang, Sidd Srinivasan, Mahi Luthra, Pin-Jie Lin, Varun Nagaraja, Forrest Iandola, Zechun Liu, Zhaoheng Ni, Changsheng Zhao, Yangyang Shi, Vikas Chandra
Text-to-audio generation (TTA) produces audio from a text description, learning from pairs of audio samples and hand-annotated text.
no code implementations • 1 Nov 2023 • Ernie Chang, Pin-Jie Lin, Yang Li, Sidd Srinivasan, Gael Le Lan, David Kant, Yangyang Shi, Forrest Iandola, Vikas Chandra
We show that the framework enhanced audio quality across the set of collected user prompts, which were edited using the training captions as exemplars.
1 code implementation • 1 Jul 2023 • Pin-Jie Lin, Muhammed Saeed, Ernie Chang, Merel Scholman
In this work, we aim to improve both text classification and translation for Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus and by proposing a cross-lingual adaptive training framework that combines continual and task-adaptive training to adapt a base pre-trained model to low-resource languages.
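A minimal sketch of the two-stage recipe, with placeholder data and objectives rather than the paper's actual setup:

```python
# Both adaptation stages reuse one plain training loop; only the data and
# the objective change between stages.
import torch

def train_stage(model, batches, loss_fn, lr=1e-4):
    """One adaptation stage: gradient descent over a stream of batches."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for batch in batches:
        loss = loss_fn(model, batch)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Stage 1: continual adaptive training on monolingual Pidgin text with the
# base model's language-modeling objective. Stage 2: task-adaptive training
# on the English-Pidgin parallel corpus with the downstream objective
# (translation or classification). With placeholder names:
#   train_stage(model, pidgin_mono_batches, lm_loss)
#   train_stage(model, en_pidgin_parallel_batches, task_loss)
```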
1 code implementation • 1 Jul 2023 • Ernie Chang, Muhammad Hassan Rashid, Pin-Jie Lin, Changsheng Zhao, Vera Demberg, Yangyang Shi, Vikas Chandra
Knowing exactly how many data points need to be labeled to achieve a certain model performance is a hugely beneficial step toward reducing the overall annotation budget.
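One common way to make such an estimate, shown here as an illustrative sketch rather than the paper's method, fits a saturating power-law learning curve to a few pilot runs and inverts it for the target score; the functional form and pilot numbers are assumptions.

```python
# Fit acc(n) = a - b * n^(-c) to pilot results, then solve for the n that
# reaches a target accuracy.
import numpy as np
from scipy.optimize import curve_fit

def curve(n, a, b, c):
    return a - b * n ** (-c)  # saturating power-law learning curve

# Pilot runs: (number of labeled examples, dev accuracy) -- toy values.
n = np.array([100, 200, 400, 800])
acc = np.array([0.62, 0.68, 0.73, 0.77])
(a, b, c), _ = curve_fit(curve, n, acc, p0=[0.9, 1.0, 0.5], maxfev=10000)

target = 0.80
needed = (b / (a - target)) ** (1 / c)  # invert acc(n) = target
print(f"estimated labels for {target:.0%} accuracy: {needed:.0f}")
```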