Search Results for author: Samuel Cahyawijaya

Found 61 papers, 30 papers with code

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

6 code implementations • 9 Nov 2022 • BigScience Workshop, :, Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, Dragomir Radev, Eduardo González Ponferrada, Efrat Levkovizh, Ethan Kim, Eyal Bar Natan, Francesco De Toni, Gérard Dupont, Germán Kruszewski, Giada Pistilli, Hady Elsahar, Hamza Benyamina, Hieu Tran, Ian Yu, Idris Abdulmumin, Isaac Johnson, Itziar Gonzalez-Dios, Javier de la Rosa, Jenny Chim, Jesse Dodge, Jian Zhu, Jonathan Chang, Jörg Frohberg, Joseph Tobing, Joydeep Bhattacharjee, Khalid Almubarak, Kimbo Chen, Kyle Lo, Leandro von Werra, Leon Weber, Long Phan, Loubna Ben allal, Ludovic Tanguy, Manan Dey, Manuel Romero Muñoz, Maraim Masoud, María Grandury, Mario Šaško, Max Huang, Maximin Coavoux, Mayank Singh, Mike Tian-Jian Jiang, Minh Chien Vu, Mohammad A. Jauhar, Mustafa Ghaleb, Nishant Subramani, Nora Kassner, Nurulaqilla Khamis, Olivier Nguyen, Omar Espejel, Ona de Gibert, Paulo Villegas, Peter Henderson, Pierre Colombo, Priscilla Amuok, Quentin Lhoest, Rheza Harliman, Rishi Bommasani, Roberto Luis López, Rui Ribeiro, Salomey Osei, Sampo Pyysalo, Sebastian Nagel, Shamik Bose, Shamsuddeen Hassan Muhammad, Shanya Sharma, Shayne Longpre, Somaieh Nikpoor, Stanislav Silberberg, Suhas Pai, Sydney Zink, Tiago Timponi Torrent, Timo Schick, Tristan Thrush, Valentin Danchev, Vassilina Nikoulina, Veronika Laippala, Violette Lepercq, Vrinda Prabhu, Zaid Alyafeai, Zeerak Talat, Arun Raja, Benjamin Heinzerling, Chenglei Si, Davut Emre Taşar, Elizabeth Salesky, Sabrina J. Mielke, Wilson Y. Lee, Abheesht Sharma, Andrea Santilli, Antoine Chaffin, Arnaud Stiegler, Debajyoti Datta, Eliza Szczechla, Gunjan Chhablani, Han Wang, Harshit Pandey, Hendrik Strobelt, Jason Alan Fries, Jos Rozen, Leo Gao, Lintang Sutawika, M Saiful Bari, Maged S. Al-shaibani, Matteo Manica, Nihal Nayak, Ryan Teehan, Samuel Albanie, Sheng Shen, Srulik Ben-David, Stephen H. Bach, Taewoon Kim, Tali Bers, Thibault Fevry, Trishala Neeraj, Urmish Thakker, Vikas Raunak, Xiangru Tang, Zheng-Xin Yong, Zhiqing Sun, Shaked Brody, Yallow Uri, Hadar Tojarieh, Adam Roberts, Hyung Won Chung, Jaesung Tae, Jason Phang, Ofir Press, Conglong Li, Deepak Narayanan, Hatim Bourfoune, Jared Casper, Jeff Rasley, Max Ryabinin, Mayank Mishra, Minjia Zhang, Mohammad Shoeybi, Myriam Peyrounette, Nicolas Patry, Nouamane Tazi, Omar Sanseviero, Patrick von Platen, Pierre Cornette, Pierre François Lavallée, Rémi Lacroix, Samyam Rajbhandari, Sanchit Gandhi, Shaden Smith, Stéphane Requena, Suraj Patil, Tim Dettmers, Ahmed Baruwa, Amanpreet Singh, Anastasia Cheveleva, Anne-Laure Ligozat, Arjun Subramonian, Aurélie Névéol, Charles Lovering, Dan Garrette, Deepak Tunuguntla, Ehud Reiter, Ekaterina Taktasheva, Ekaterina Voloshina, Eli Bogdanov, Genta Indra Winata, Hailey Schoelkopf, Jan-Christoph Kalo, Jekaterina Novikova, Jessica Zosa Forde, Jordan Clive, Jungo Kasai, Ken Kawamura, Liam Hazan, Marine Carpuat, Miruna Clinciu, Najoung Kim, Newton Cheng, Oleg Serikov, Omer Antverg, Oskar van der Wal, Rui Zhang, Ruochen Zhang, Sebastian Gehrmann, Shachar Mirkin, Shani Pais, Tatiana Shavrina, Thomas Scialom, Tian Yun, Tomasz Limisiewicz, Verena Rieser, Vitaly Protasov, Vladislav Mikhailov, Yada Pruksachatkun, Yonatan Belinkov, Zachary Bamberger, Zdeněk Kasner, Alice Rueda, Amanda Pestana, Amir Feizpour, Ammar Khan, Amy Faranak, Ana Santos, Anthony Hevia, Antigona Unldreaj, Arash Aghagol, Arezoo Abdollahi, Aycha Tammour, Azadeh HajiHosseini, Bahareh Behroozi, Benjamin Ajibade, Bharat Saxena, Carlos Muñoz Ferrandis, Daniel McDuff, Danish Contractor, David Lansky, Davis David, Douwe Kiela, Duong A. Nguyen, Edward Tan, Emi Baylor, Ezinwanne Ozoani, Fatima Mirza, Frankline Ononiwu, Habib Rezanejad, Hessie Jones, Indrani Bhattacharya, Irene Solaiman, Irina Sedenko, Isar Nejadgholi, Jesse Passmore, Josh Seltzer, Julio Bonis Sanz, Livia Dutra, Mairon Samagaio, Maraim Elbadri, Margot Mieskes, Marissa Gerchick, Martha Akinlolu, Michael McKenna, Mike Qiu, Muhammed Ghauri, Mykola Burynok, Nafis Abrar, Nazneen Rajani, Nour Elkott, Nour Fahmy, Olanrewaju Samuel, Ran An, Rasmus Kromann, Ryan Hao, Samira Alizadeh, Sarmad Shubber, Silas Wang, Sourav Roy, Sylvain Viguier, Thanh Le, Tobi Oyebade, Trieu Le, Yoyo Yang, Zach Nguyen, Abhinav Ramesh Kashyap, Alfredo Palasciano, Alison Callahan, Anima Shukla, Antonio Miranda-Escalada, Ayush Singh, Benjamin Beilharz, Bo wang, Caio Brito, Chenxi Zhou, Chirag Jain, Chuxin Xu, Clémentine Fourrier, Daniel León Periñán, Daniel Molano, Dian Yu, Enrique Manjavacas, Fabio Barth, Florian Fuhrimann, Gabriel Altay, Giyaseddin Bayrak, Gully Burns, Helena U. Vrabec, Imane Bello, Ishani Dash, Jihyun Kang, John Giorgi, Jonas Golde, Jose David Posada, Karthik Rangasai Sivaraman, Lokesh Bulchandani, Lu Liu, Luisa Shinzato, Madeleine Hahn de Bykhovetz, Maiko Takeuchi, Marc Pàmies, Maria A Castillo, Marianna Nezhurina, Mario Sänger, Matthias Samwald, Michael Cullan, Michael Weinberg, Michiel De Wolf, Mina Mihaljcic, Minna Liu, Moritz Freidank, Myungsun Kang, Natasha Seelam, Nathan Dahlberg, Nicholas Michio Broad, Nikolaus Muellner, Pascale Fung, Patrick Haller, Ramya Chandrasekhar, Renata Eisenberg, Robert Martin, Rodrigo Canalli, Rosaline Su, Ruisi Su, Samuel Cahyawijaya, Samuele Garda, Shlok S Deshmukh, Shubhanshu Mishra, Sid Kiblawi, Simon Ott, Sinee Sang-aroonsiri, Srishti Kumar, Stefan Schweter, Sushil Bharati, Tanmay Laud, Théo Gigant, Tomoya Kainuma, Wojciech Kusa, Yanis Labrak, Yash Shailesh Bajaj, Yash Venkatraman, Yifan Xu, Yingxin Xu, Yu Xu, Zhe Tan, Zhongli Xie, Zifan Ye, Mathilde Bras, Younes Belkada, Thomas Wolf

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions.

Language Modelling Multilingual NLP

2,181

Paper
Code

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

2 code implementations • 6 Dec 2021 • Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Tanya Goyal, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, Vukosi Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, Nafise Sadat Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton Murray, Marcin Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, Jan Pfister, Richard Plant, Vinay Prabhu, Vasile Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas Roberts, Juan Diego Rodriguez, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, Tshephisho Sefara, Saqib N. Shamsi, Xudong Shen, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, Mukund Varma T, A Tabassum, Fiona Anting Tan, Ryan Teehan, Mo Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie J. Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, Michael A. Yee, Jing Zhang, Yue Zhang

Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on.

Data Augmentation

758

Paper
Code

IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding

3 code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Bryan Wilie, Karissa Vincentio, Genta Indra Winata, Samuel Cahyawijaya, Xiaohong Li, Zhi Yuan Lim, Sidik Soleman, Rahmad Mahendra, Pascale Fung, Syafri Bahar, Ayu Purwarianti

Although Indonesian is known to be the fourth most frequently used language over the internet, the research progress on this language in the natural language processing (NLP) is slow-moving due to a lack of available resources.

Benchmarking Natural Language Understanding +2

495

Paper
Code

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing

1 code implementation • 30 Jun 2022 • Jason Alan Fries, Leon Weber, Natasha Seelam, Gabriel Altay, Debajyoti Datta, Samuele Garda, Myungsun Kang, Ruisi Su, Wojciech Kusa, Samuel Cahyawijaya, Fabio Barth, Simon Ott, Matthias Samwald, Stephen Bach, Stella Biderman, Mario Sänger, Bo wang, Alison Callahan, Daniel León Periñán, Théo Gigant, Patrick Haller, Jenny Chim, Jose David Posada, John Michael Giorgi, Karthik Rangasai Sivaraman, Marc Pàmies, Marianna Nezhurina, Robert Martin, Michael Cullan, Moritz Freidank, Nathan Dahlberg, Shubhanshu Mishra, Shamik Bose, Nicholas Michio Broad, Yanis Labrak, Shlok S Deshmukh, Sid Kiblawi, Ayush Singh, Minh Chien Vu, Trishala Neeraj, Jonas Golde, Albert Villanova del Moral, Benjamin Beilharz

Training and evaluating language models increasingly requires the construction of meta-datasets --diverse collections of curated data with clear provenance.

Language Modelling Multi-Task Learning +3

420

Paper
Code

NusaCrowd: Open Source Initiative for Indonesian NLP Resources

1 code implementation • 19 Dec 2022 • Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Fajri Koto, JENNIFER SANTOSO, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Ivan Halim Parmonangan, Ika Alfina, Muhammad Satrio Wicaksono, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Akbar Septiandri, James Jaya, Kaustubh D. Dhole, Arie Ardiyanti Suryani, Rifki Afina Putri, Dan Su, Keith Stevens, Made Nindyatama Nityasya, Muhammad Farid Adilazuarda, Ryan Ignatius, Ryandito Diandaru, Tiezheng Yu, Vito Ghifari, Wenliang Dai, Yan Xu, Dyah Damapuspita, Cuk Tho, Ichwanul Muslim Karo Karo, Tirana Noor Fatyanosa, Ziwei Ji, Pascale Fung, Graham Neubig, Timothy Baldwin, Sebastian Ruder, Herry Sujaini, Sakriani Sakti, Ayu Purwarianti

We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

253

Paper
Code

ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

2 code implementations • LREC 2022 • Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Peng Xu, Xu Yan, Zihan Liu, Rita Frieske, Tiezheng Yu, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus built on spontaneous multi-turn conversational dialogue sources collected in Hong Kong.

119

Paper
Code

CrossNER: Evaluating Cross-Domain Named Entity Recognition

5 code implementations • 8 Dec 2020 • Zihan Liu, Yan Xu, Tiezheng Yu, Wenliang Dai, Ziwei Ji, Samuel Cahyawijaya, Andrea Madotto, Pascale Fung

Cross-domain named entity recognition (NER) models are able to cope with the scarcity issue of NER samples in target domains.

Cross-Domain Named Entity Recognition Domain Adaptation +3

114

Paper
Code

Multimodal End-to-End Sparse Model for Emotion Recognition

1 code implementation • NAACL 2021 • Wenliang Dai, Samuel Cahyawijaya, Zihan Liu, Pascale Fung

Existing works on multimodal affective computing tasks, such as emotion recognition, generally adopt a two-phase pipeline, first extracting feature representations for each single modality with hand-crafted algorithms and then performing end-to-end learning with the extracted features.

Emotion Recognition

Paper
Code

NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages

2 code implementations • 31 May 2022 • Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Rahmad Mahendra, Fajri Koto, Ade Romadhony, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Pascale Fung, Timothy Baldwin, Jey Han Lau, Rico Sennrich, Sebastian Ruder

In this work, we focus on developing resources for languages in Indonesia.

Machine Translation Translation

Paper
Code

A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity

1 code implementation • 8 Feb 2023 • Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung

It is, for example, better at deductive than inductive reasoning.

Code Generation Hallucination +4

Paper
Code

XPersona: Evaluating Multilingual Personalized Chatbot

1 code implementation • EMNLP (NLP4ConvAI) 2021 • Zhaojiang Lin, Zihan Liu, Genta Indra Winata, Samuel Cahyawijaya, Andrea Madotto, Yejin Bang, Etsuko Ishii, Pascale Fung

Experimental results show that the multilingual trained models outperform the translation-pipeline and that they are on par with the monolingual models, with the advantage of having a single model across multiple languages.

Chatbot Translation

Paper
Code

Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset

1 code implementation • LREC 2022 • Tiezheng Yu, Rita Frieske, Peng Xu, Samuel Cahyawijaya, Cheuk Tung Shadow Yiu, Holy Lovenia, Wenliang Dai, Elham J. Barezi, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

We further conduct experiments with Fairseq S2T Transformer, a state-of-the-art ASR model, on the biggest existing dataset, Common Voice zh-HK, and our proposed MDCC, and the results show the effectiveness of our dataset.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Code

Meta-Transfer Learning for Code-Switched Speech Recognition

1 code implementation • ACL 2020 • Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, Peng Xu, Pascale Fung

An increasing number of people in the world today speak a mixed-language as a result of being multilingual.

Language Modelling speech-recognition +2

Paper
Code

Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Andrea Madotto, Samuel Cahyawijaya, Genta Indra Winata, Yan Xu, Zihan Liu, Zhaojiang Lin, Pascale Fung

In this paper, we propose a method to embed the KB, of any size, directly into the model parameters.

Dialogue State Tracking Management +1

Paper
Code

Learning Fast Adaptation on Cross-Accented Speech Recognition

1 code implementation • 4 Mar 2020 • Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Peng Xu, Pascale Fung

The great variability and complex characteristics of accents creates a major challenge for training a robust and accent-agnostic automatic speech recognition (ASR) system.

Audio and Speech Processing Sound

Paper
Code

Subobject-level Image Tokenization

1 code implementation • 22 Feb 2024 • Delong Chen, Samuel Cahyawijaya, Jianfeng Liu, Baoyuan Wang, Pascale Fung

Transformer-based vision models typically tokenize images into fixed-size square patches as input units, which lacks the adaptability to image content and overlooks the inherent pixel grouping structure.

Attribute Language Modelling +1

Paper
Code

CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition

1 code implementation • 11 Jan 2022 • Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Shadow Yiu, Rita Frieske, Holy Lovenia, Genta Indra Winata, Qifeng Chen, Xiaojuan Ma, Bertram E. Shi, Pascale Fung

With the rise of deep learning and intelligent vehicle, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities.

Audio-Visual Speech Recognition speech-recognition +1

Paper
Code

CI-AVSR: A Cantonese Audio-Visual Speech Datasetfor In-car Command Recognition

1 code implementation • LREC 2022 • Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J. Barezi, Peng Xu, Cheuk Tung Yiu, Rita Frieske, Holy Lovenia, Genta Winata, Qifeng Chen, Xiaojuan Ma, Bertram Shi, Pascale Fung

With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities.

Audio-Visual Speech Recognition speech-recognition +1

Paper
Code

NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages

1 code implementation • 19 Sep 2023 • Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Maulana Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Wahyuning Linuwih, Bryan Wilie, Galih Pradipta Muridan, Genta Indra Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung

We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.

Document Translation Translation

Paper
Code

Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters

1 code implementation • dialdoc (ACL) 2022 • Yan Xu, Etsuko Ishii, Samuel Cahyawijaya, Zihan Liu, Genta Indra Winata, Andrea Madotto, Dan Su, Pascale Fung

This paper proposes KnowExpert, a framework to bypass the explicit retrieval process and inject knowledge into the pre-trained language models with lightweight adapters and adapt to the knowledge-grounded dialogue task.

Response Generation Retrieval

Paper
Code

IndoNLG: Benchmark and Resources for Evaluating Indonesian Natural Language Generation

2 code implementations • EMNLP 2021 • Samuel Cahyawijaya, Genta Indra Winata, Bryan Wilie, Karissa Vincentio, Xiaohong Li, Adhiguna Kuncoro, Sebastian Ruder, Zhi Yuan Lim, Syafri Bahar, Masayu Leylia Khodra, Ayu Purwarianti, Pascale Fung

Natural language generation (NLG) benchmarks provide an important avenue to measure progress and develop better NLG systems.

Machine Translation Question Answering +2

Paper
Code

SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study

1 code implementation • BioNLP (ACL) 2022 • Samuel Cahyawijaya, Tiezheng Yu, Zihan Liu, Tiffany T. W. Mak, Xiaopu Zhou, Nancy Y. Ip, Pascale Fung

We apply SNP2Vec to perform long-sequence genomics modeling, and we evaluate the effectiveness of our approach on predicting Alzheimer's disease risk in a Chinese cohort.

Genome Understanding

Paper
Code

Can Question Rewriting Help Conversational Question Answering?

1 code implementation • insights (ACL) 2022 • Etsuko Ishii, Yan Xu, Samuel Cahyawijaya, Bryan Wilie

Question rewriting (QR) is a subtask of conversational question answering (CQA) aiming to ease the challenges of understanding dependencies among dialogue history by reformulating questions in a self-contained form.

Question Rewriting reinforcement-learning +1

Paper
Code

Cross-Lingual Cross-Age Group Adaptation for Low-Resource Elderly Speech Emotion Recognition

1 code implementation • 26 Jun 2023 • Samuel Cahyawijaya, Holy Lovenia, Willy Chung, Rita Frieske, Zihan Liu, Pascale Fung

In this work, we analyze the transferability of emotion recognition across three different languages--English, Mandarin Chinese, and Cantonese; and 2 different age groups--adults and the elderly.

Data Augmentation Speech Emotion Recognition

Paper
Code

InstructAlign: High-and-Low Resource Language Alignment via Continual Crosslingual Instruction Tuning

1 code implementation • 23 May 2023 • Samuel Cahyawijaya, Holy Lovenia, Tiezheng Yu, Willy Chung, Pascale Fung

Our results demonstrate the effectiveness of InstructAlign in enabling the model to understand low-resource languages with limited parallel data while preventing catastrophic forgetting.

Paper
Code

InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems

1 code implementation • 13 Oct 2023 • Willy Chung, Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, Pascale Fung

We present InstructTODS, a novel off-the-shelf framework for zero-shot end-to-end task-oriented dialogue systems that can adapt to diverse domains without fine-tuning.

Dialogue State Tracking Informativeness +4

Paper
Code

How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling

1 code implementation • 25 Oct 2022 • Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, Huan Zhong, MingQian Zhong, Yuk-Yu Nancy Ip, Pascale Fung

Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains, introducing many powerful LMs such as bio-lm and BioELECTRA.

Language Modelling

Paper
Code

Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue

1 code implementation • 28 Feb 2023 • Holy Lovenia, Samuel Cahyawijaya, Pascale Fung

The demand for multimodal dialogue systems has been rising in various domains, emphasizing the importance of interpreting multimodal inputs from conversational and situational contexts.

Paper
Code

Lightweight and Efficient End-to-End Speech Recognition Using Low-Rank Transformer

no code implementations • 30 Oct 2019 • Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, Pascale Fung

Highly performing deep neural networks come at the cost of computational complexity that limits their practicality for deployment on portable devices.

Language Modelling speech-recognition +1

Paper
Add Code

On the Importance of Word Order Information in Cross-lingual Sequence Labeling

no code implementations • 30 Jan 2020 • Zihan Liu, Genta Indra Winata, Samuel Cahyawijaya, Andrea Madotto, Zhaojiang Lin, Pascale Fung

To verify this hypothesis, we investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.

named-entity-recognition Named Entity Recognition +3

Paper
Add Code

Model Generalization on COVID-19 Fake News Detection

no code implementations • 11 Jan 2021 • Yejin Bang, Etsuko Ishii, Samuel Cahyawijaya, Ziwei Ji, Pascale Fung

Amid the pandemic COVID-19, the world is facing unprecedented infodemic with the proliferation of both fake and real information.

Fake News Detection Misinformation

Paper
Add Code

Are Multilingual Models Effective in Code-Switching?

no code implementations • NAACL (CALCS) 2021 • Genta Indra Winata, Samuel Cahyawijaya, Zihan Liu, Zhaojiang Lin, Andrea Madotto, Pascale Fung

Multilingual language models have shown decent performance in multilingual and cross-lingual natural language understanding tasks.

named-entity-recognition Named Entity Recognition +3

Paper
Add Code

Weakly-supervised Multi-task Learning for Multimodal Affect Recognition

no code implementations • 23 Apr 2021 • Wenliang Dai, Samuel Cahyawijaya, Yejin Bang, Pascale Fung

In this paper, we propose to leverage these datasets using weakly-supervised multi-task learning to improve the generalization performance on each of them.

Emotion Recognition Multi-Task Learning +1

Paper
Add Code

ERICA: An Empathetic Android Companion for Covid-19 Quarantine

no code implementations • SIGDIAL (ACL) 2021 • Etsuko Ishii, Genta Indra Winata, Samuel Cahyawijaya, Divesh Lala, Tatsuya Kawahara, Pascale Fung

Over the past year, research in various domains, including Natural Language Processing (NLP), has been accelerated to fight against the COVID-19 pandemic, yet such research has just started on dialogue systems.

Paper
Add Code

Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation

no code implementations • 24 Aug 2021 • Samuel Cahyawijaya

We also show that Low-Rank Transformer is more suitable for on-device deployment, as it significantly reduces the model size.

Paper
Add Code

Greenformer: Factorization Toolkit for Efficient Deep Neural Networks

no code implementations • 14 Sep 2021 • Samuel Cahyawijaya, Genta Indra Winata, Holy Lovenia, Bryan Wilie, Wenliang Dai, Etsuko Ishii, Pascale Fung

While the recent advances in deep neural networks (DNN) bring remarkable success, the computational cost also increases considerably.

Paper
Add Code

VScript: Controllable Script Generation with Visual Presentation

no code implementations • 1 Mar 2022 • Ziwei Ji, Yan Xu, I-Tsun Cheng, Samuel Cahyawijaya, Rita Frieske, Etsuko Ishii, Min Zeng, Andrea Madotto, Pascale Fung

In order to offer a customized script tool and inspire professional scriptwriters, we present VScript.

Dialogue Generation Retrieval +1

Paper
Add Code

One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia

no code implementations • ACL 2022 • Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya, Ade Romadhony, Rahmad Mahendra, Kemal Kurniawan, David Moeljadi, Radityo Eko Prasojo, Timothy Baldwin, Jey Han Lau, Sebastian Ruder

NLP research is impeded by a lack of resources and awareness of the challenges presented by underrepresented languages and dialects.

Paper
Add Code

Clozer: Adaptable Data Augmentation for Cloze-style Reading Comprehension

no code implementations • 30 Mar 2022 • Holy Lovenia, Bryan Wilie, Willy Chung, Min Zeng, Samuel Cahyawijaya, Su Dan, Pascale Fung

Task-adaptive pre-training (TAPT) alleviates the lack of labelled data and provides performance lift by adapting unlabelled data to downstream task.

Data Augmentation Machine Reading Comprehension +1

Paper
Add Code

Towards Answering Open-ended Ethical Quandary Questions

no code implementations • 12 May 2022 • Yejin Bang, Nayeon Lee, Tiezheng Yu, Leila Khalatbari, Yan Xu, Samuel Cahyawijaya, Dan Su, Bryan Wilie, Romain Barraud, Elham J. Barezi, Andrea Madotto, Hayden Kee, Pascale Fung

We explore the current capability of LLMs in providing an answer with a deliberative exchange of different perspectives to an ethical quandary, in the approach of Socratic philosophy, instead of providing a closed answer like an oracle.

Few-Shot Learning Generative Question Answering +2

Paper
Add Code

Clozer”:" Adaptable Data Augmentation for Cloze-style Reading Comprehension

no code implementations • RepL4NLP (ACL) 2022 • Holy Lovenia, Bryan Wilie, Willy Chung, Zeng Min, Samuel Cahyawijaya, Dan Su, Pascale Fung

Task-adaptive pre-training (TAPT) alleviates the lack of labelled data and provides performance lift by adapting unlabelled data to downstream task.

Data Augmentation Machine Reading Comprehension +1

Paper
Add Code

Integrating Question Rewrites in Conversational Question Answering: A Reinforcement Learning Approach

no code implementations • ACL 2022 • Etsuko Ishii, Bryan Wilie, Yan Xu, Samuel Cahyawijaya, Pascale Fung

Resolving dependencies among dialogue history is one of the main obstacles in the research on conversational question answering (QA).

Conversational Question Answering reinforcement-learning +1

Paper
Add Code

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

no code implementations • 22 Jun 2022 • Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna Kanerva, Jenny Chim, Jiawei Zhou, Jordan Clive, Joshua Maynez, João Sedoc, Juraj Juraska, Kaustubh Dhole, Khyathi Raghavi Chandu, Laura Perez-Beltrachini, Leonardo F. R. Ribeiro, Lewis Tunstall, Li Zhang, Mahima Pushkarna, Mathias Creutz, Michael White, Mihir Sanjay Kale, Moussa Kamal Eddine, Nico Daheim, Nishant Subramani, Ondrej Dusek, Paul Pu Liang, Pawan Sasanka Ammanamanchi, Qi Zhu, Ratish Puduppully, Reno Kriz, Rifat Shahriyar, Ronald Cardenas, Saad Mahamood, Salomey Osei, Samuel Cahyawijaya, Sanja Štajner, Sebastien Montella, Shailza, Shailza Jolly, Simon Mille, Tahmid Hasan, Tianhao Shen, Tosin Adewumi, Vikas Raunak, Vipul Raheja, Vitaly Nikolaev, Vivian Tsai, Yacine Jernite, Ying Xu, Yisi Sang, Yixin Liu, Yufang Hou

This problem is especially pertinent in natural language generation which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims.

Benchmarking Text Generation

Paper
Add Code

Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands

no code implementations • 6 Jul 2022 • Wenliang Dai, Samuel Cahyawijaya, Tiezheng Yu, Elham J Barezi, Pascale Fung

With the rise of deep learning and intelligent vehicles, the smart assistant has become an essential in-car component to facilitate driving and provide extra functionalities.

Audio-Visual Speech Recognition speech-recognition +1

Paper
Add Code

NusaCrowd: A Call for Open and Reproducible NLP Research in Indonesian Languages

no code implementations • 21 Jul 2022 • Samuel Cahyawijaya, Alham Fikri Aji, Holy Lovenia, Genta Indra Winata, Bryan Wilie, Rahmad Mahendra, Fajri Koto, David Moeljadi, Karissa Vincentio, Ade Romadhony, Ayu Purwarianti

At the center of the underlying issues that halt Indonesian natural language processing (NLP) research advancement, we find data scarcity.

Paper
Add Code

Every picture tells a story: Image-grounded controllable stylistic story generation

no code implementations • LaTeCHCLfL (COLING) 2022 • Holy Lovenia, Bryan Wilie, Romain Barraud, Samuel Cahyawijaya, Willy Chung, Pascale Fung

Generating a short story out of an image is arduous.

Image Captioning Story Generation

Paper
Add Code

Biomedical Image Reconstruction: A Survey

no code implementations • 4 Jan 2023 • Samuel Cahyawijaya

Biomedical image reconstruction research has been developed for more than five decades, giving rise to various techniques such as central and filtered back projection.

Image Reconstruction

Paper
Add Code

Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages

no code implementations • 23 Mar 2023 • Zheng-Xin Yong, Ruochen Zhang, Jessica Zosa Forde, Skyler Wang, Arjun Subramonian, Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Lintang Sutawika, Jan Christian Blaise Cruz, Yin Lin Tan, Long Phan, Rowena Garcia, Thamar Solorio, Alham Fikri Aji

While code-mixing is a common linguistic practice in many parts of the world, collecting high-quality and low-cost code-mixed data remains a challenge for natural language processing (NLP) research.

Paper
Add Code

Multilingual Large Language Models Are Not (Yet) Code-Switchers

no code implementations • 23 May 2023 • Ruochen Zhang, Samuel Cahyawijaya, Jan Christian Blaise Cruz, Genta Indra Winata, Alham Fikri Aji

Multilingual Large Language Models (LLMs) have recently shown great capabilities in a wide range of tasks, exhibiting state-of-the-art performance through zero-shot or few-shot prompting methods.

Benchmarking Language Identification +2

Paper
Add Code

GlobalBench: A Benchmark for Global Progress in Natural Language Processing

no code implementations • 24 May 2023 • Yueqi Song, Catherine Cui, Simran Khanuja, PengFei Liu, Fahim Faisal, Alissa Ostapenko, Genta Indra Winata, Alham Fikri Aji, Samuel Cahyawijaya, Yulia Tsvetkov, Antonios Anastasopoulos, Graham Neubig

Despite the major advances in NLP, significant disparities in NLP system performance across languages still exist.

Paper
Add Code

Multi-lingual and Multi-cultural Figurative Language Understanding

no code implementations • 25 May 2023 • Anubha Kabra, Emmy Liu, Simran Khanuja, Alham Fikri Aji, Genta Indra Winata, Samuel Cahyawijaya, Anuoluwapo Aremu, Perez Ogayo, Graham Neubig

Figurative language permeates human communication, but at the same time is relatively understudied in NLP.

Paper
Add Code

PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems

1 code implementation • 19 Sep 2023 • Bryan Wilie, Yan Xu, Willy Chung, Samuel Cahyawijaya, Holy Lovenia, Pascale Fung

Grounding dialogue response generation on external knowledge is proposed to produce informative and engaging responses.

Hallucination Language Modelling +1

Paper
Code

Survey of Social Bias in Vision-Language Models

no code implementations • 24 Sep 2023 • Nayeon Lee, Yejin Bang, Holy Lovenia, Samuel Cahyawijaya, Wenliang Dai, Pascale Fung

This survey aims to provide researchers with a high-level insight into the similarities and differences of social bias studies in pre-trained models across NLP, CV, and VL.

Fairness

Paper
Add Code

Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models

no code implementations • 9 Oct 2023 • Holy Lovenia, Wenliang Dai, Samuel Cahyawijaya, Ziwei Ji, Pascale Fung

Object hallucination poses a significant challenge in vision-language (VL) models, often leading to the generation of nonsensical or unfaithful responses with non-existent objects.

Hallucination Object +2

Paper
Add Code

IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End Task-Oriented Dialogue Systems

1 code implementation • 2 Nov 2023 • Muhammad Dehan Al Kautsar, Rahmah Khoirussyifa' Nurdini, Samuel Cahyawijaya, Genta Indra Winata, Ayu Purwarianti

Task-oriented dialogue (ToD) systems have been mostly created for high-resource languages, such as English and Chinese.

Task-Oriented Dialogue Systems Transfer Learning

Paper
Code

IndoRobusta: Towards Robustness Against Diverse Code-Mixed Indonesian Local Languages

no code implementations • 21 Nov 2023 • Muhammad Farid Adilazuarda, Samuel Cahyawijaya, Genta Indra Winata, Pascale Fung, Ayu Purwarianti

Significant progress has been made on Indonesian NLP.

Paper
Add Code

The Obscure Limitation of Modular Multilingual Language Models

no code implementations • 21 Nov 2023 • Muhammad Farid Adilazuarda, Samuel Cahyawijaya, Ayu Purwarianti

We expose the limitation of modular multilingual language models (MLMs) in multilingual inference scenarios with unknown languages.

Language Identification

Paper
Add Code

LinguAlchemy: Fusing Typological and Geographical Elements for Unseen Language Generalization

no code implementations • 11 Jan 2024 • Muhammad Farid Adilazuarda, Samuel Cahyawijaya, Alham Fikri Aji, Genta Indra Winata, Ayu Purwarianti

Pretrained language models (PLMs) have shown remarkable generalization toward multiple tasks and languages.

XLM-R

Paper
Add Code

LLMs Are Few-Shot In-Context Low-Resource Language Learners

no code implementations • 25 Mar 2024 • Samuel Cahyawijaya, Holy Lovenia, Pascale Fung

In-context learning (ICL) empowers large language models (LLMs) to perform diverse tasks in underrepresented languages using only short in-context information, offering a crucial avenue for narrowing the gap between high-resource and low-resource languages.

In-Context Learning

Paper
Add Code

Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages

no code implementations • 9 Apr 2024 • Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Rifki Afina Putri, Emmanuel Dave, Jhonson Lee, Nuur Shadieq, Wawan Cenggoro, Salsabil Maulana Akbar, Muhammad Ihza Mahendra, Dea Annisayanti Putri, Bryan Wilie, Genta Indra Winata, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung

To bridge this quality gap, we introduce Cendol, a collection of Indonesian LLMs encompassing both decoder-only and encoder-decoder architectures across a range of model sizes.

Paper
Add Code

High-Dimension Human Value Representation in Large Language Models

no code implementations • 11 Apr 2024 • Samuel Cahyawijaya, Delong Chen, Yejin Bang, Leila Khalatbari, Bryan Wilie, Ziwei Ji, Etsuko Ishii, Pascale Fung

there is an urgent need to understand the scope and nature of human values injected into these models before their release.

Language Modelling

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.