no code implementations • 20 Dec 2022 • Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang
While pre-trained language models (LMs) for code have achieved great success in code completion, they generate code conditioned only on the contents within the file, i.e., the in-file context, and ignore the rich semantics in other files within the same project, i.e., the cross-file context, a critical source of information that is especially useful in modern modular software development.
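As a minimal illustration of the idea (hypothetical helper names, not the paper's implementation), a completion prompt can be augmented by prepending relevant snippets drawn from sibling files in the same project:

```python
def build_prompt(in_file_context, cross_file_snippets, max_snippets=2):
    """Prepend cross-file context (e.g., definitions from sibling files)
    to the in-file context before querying a code LM."""
    header = "\n".join(
        f"# From {path}:\n{snippet}"
        for path, snippet in cross_file_snippets[:max_snippets]
    )
    return f"{header}\n\n{in_file_context}" if header else in_file_context

# A project-level constant from another file becomes visible to the model.
prompt = build_prompt(
    "def area(r):\n    return ",
    [("geometry/constants.py", "PI = 3.14159")],
)
```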
no code implementations • 20 Dec 2022 • Jianfeng Chi, Wasi Uddin Ahmad, Yuan Tian, Kai-Wei Chang
To this end, we introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating the privacy policy language understanding across various tasks.
1 code implementation • 20 Dec 2022 • Di Wu, Wasi Uddin Ahmad, Kai-Wei Chang
However, a systematic study of how the two types of approaches compare, and of how different design choices affect the performance of PLM-based models, is lacking.
1 code implementation • 26 Oct 2022 • Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, Hantian Ding, Varun Kumar, Nathan Fulton, Arash Farahani, Siddhartha Jain, Robert Giaquinto, Haifeng Qian, Murali Krishna Ramanathan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Sudipta Sengupta, Dan Roth, Bing Xiang
Using these benchmarks, we can assess the performance of code generation models in a multi-lingual fashion; we find that language models generalize to out-of-domain languages, that multi-lingual models outperform mono-lingual ones, that few-shot prompting can teach the model new languages, and that zero-shot translation works even in mono-lingual settings.
no code implementations • 3 Oct 2022 • Nihal Jain, Dejiao Zhang, Wasi Uddin Ahmad, Zijian Wang, Feng Nan, Xiaopeng Li, Ming Tan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Xiaofei Ma, Bing Xiang
Despite exciting progress in large-scale language generation, the expressiveness of its representations is severely limited by the \textit{anisotropy} issue where the hidden representations are distributed into a narrow cone in the vector space.
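The anisotropy issue can be quantified, for example, as the average pairwise cosine similarity of the hidden representations: values near 1 indicate that the vectors collapse into a narrow cone (a sketch of one common diagnostic, not necessarily the paper's exact metric):

```python
import numpy as np

def avg_cosine_similarity(H):
    """Average pairwise cosine similarity of the row vectors in H.
    Values near 1.0 indicate anisotropic (narrow-cone) representations."""
    H = H / np.linalg.norm(H, axis=1, keepdims=True)
    sims = H @ H.T
    n = len(H)
    # Exclude self-similarity on the diagonal.
    return (sims.sum() - n) / (n * (n - 1))

rng = np.random.default_rng(0)
base = rng.normal(size=64)
# Anisotropic: small perturbations around a single direction.
cone = base + 0.1 * rng.normal(size=(100, 64))
# Isotropic: independent Gaussian vectors.
iso = rng.normal(size=(100, 64))
```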
1 code implementation • 15 Jun 2022 • Md Mahim Anjum Haque, Wasi Uddin Ahmad, Ismini Lourentzou, Chris Brown
To address this issue, we introduce FixEval, a benchmark comprising buggy code submissions to competitive programming problems and their corresponding fixes.
1 code implementation • 23 May 2022 • Abhik Bhattacharjee, Tahmid Hasan, Wasi Uddin Ahmad, Rifat Shahriyar
This work presents BanglaNLG, a comprehensive benchmark for evaluating natural language generation (NLG) models in Bangla, a widely spoken yet low-resource language.
1 code implementation • 23 May 2022 • Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang
In code generation, the model learns to do the opposite.
no code implementations • 19 Apr 2022 • Md Rizwan Parvez, Jianfeng Chi, Wasi Uddin Ahmad, Yuan Tian, Kai-Wei Chang
Prior studies in privacy policies frame the question answering (QA) tasks as identifying the most relevant text segment or a list of sentences from the policy document for a user query.
1 code implementation • 15 Mar 2022 • Di Wu, Wasi Uddin Ahmad, Sunipa Dev, Kai-Wei Chang
State-of-the-art keyphrase generation methods generally depend on large annotated datasets, limiting their performance in domains with limited annotated data.
1 code implementation • 16 Dec 2021 • Abhik Bhattacharjee, Tahmid Hasan, Wasi Uddin Ahmad, Yuan-Fang Li, Yong-Bin Kang, Rifat Shahriyar
We present CrossSum, a large-scale cross-lingual abstractive summarization dataset comprising 1.7 million article-summary samples in 1500+ language pairs.
Tasks: Abstractive Text Summarization, Cross-Lingual Abstractive Summarization
1 code implementation • Findings (EMNLP) 2021 • Md Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang
To mimic developers' code or summary generation behavior, we propose a retrieval augmented framework, REDCODER, that retrieves relevant code or summaries from a retrieval database and provides them as a supplement to code generation or summarization models.
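The retrieve-then-generate pattern can be sketched as follows, with a toy bag-of-words retriever standing in for REDCODER's dense retriever (function names are illustrative, not from the paper's codebase):

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, database, k=1):
    """Return the top-k database entries most similar to the query."""
    q = Counter(query.split())
    scored = sorted(database, key=lambda d: cosine(q, Counter(d.split())),
                    reverse=True)
    return scored[:k]

def augment(query, database, k=1):
    """Supplement the generator's input with retrieved candidates."""
    return "\n".join(retrieve(query, database, k) + [query])

db = ["add two numbers", "reverse a string"]
out = augment("how to add numbers", db)
```

A real system would replace the bag-of-words scorer with a learned dense encoder and feed the augmented input to a sequence-to-sequence generator.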
Ranked #1 on Code Generation on CodeXGLUE - CodeSearchNet (using extra training data)
1 code implementation • 26 Aug 2021 • Wasi Uddin Ahmad, Md Golam Rahman Tushar, Saikat Chakraborty, Kai-Wei Chang
Program translation refers to migrating source code from one programming language to another.
1 code implementation • ACL 2021 • Wasi Uddin Ahmad, Haoran Li, Kai-Wei Chang, Yashar Mehdad
In recent years, we have seen a colossal effort in pre-training multilingual text encoders using large-scale corpora in many languages to facilitate cross-lingual transfer learning.
1 code implementation • 29 May 2021 • Masum Hasan, Tanveer Muttaqueen, Abdullah Al Ishtiaq, Kazi Sajeed Mehrab, Md. Mahim Anjum Haque, Tahmid Hasan, Wasi Uddin Ahmad, Anindya Iqbal, Rifat Shahriyar
In this study, we present CoDesc -- a large parallel dataset composed of 4.2 million Java methods and natural language descriptions.
Ranked #1 on Code Search on CoDesc
1 code implementation • EMNLP 2021 • Kuan-Hao Huang, Wasi Uddin Ahmad, Nanyun Peng, Kai-Wei Chang
Pre-trained multilingual language encoders, such as multilingual BERT and XLM-R, show great potential for zero-shot cross-lingual transfer.
2 code implementations • 16 Apr 2021 • Masum Hasan, Kazi Sajeed Mehrab, Wasi Uddin Ahmad, Rifat Shahriyar
We overcome this limitation by transforming natural language into an abstract intermediate formal language representing an application with a substantially smaller number of tokens.
1 code implementation • NAACL 2021 • Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang
Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models.
1 code implementation • Findings (NAACL) 2022 • Abhik Bhattacharjee, Tahmid Hasan, Wasi Uddin Ahmad, Kazi Samin, Md Saiful Islam, Anindya Iqbal, M. Sohel Rahman, Rifat Shahriyar
In this work, we introduce BanglaBERT, a BERT-based Natural Language Understanding (NLU) model pretrained in Bangla, a widely spoken yet low-resource language in the NLP literature.
1 code implementation • ACL 2021 • Wasi Uddin Ahmad, Jianfeng Chi, Tu Le, Thomas Norton, Yuan Tian, Kai-Wei Chang
We refer to predicting the privacy practice explained in a sentence as intent classification and identifying the text spans sharing specific information as slot filling.
1 code implementation • 9 Dec 2020 • Susmoy Chakraborty, Mir Tafseer Nayeem, Wasi Uddin Ahmad
Determining the readability of a text is the first step to its simplification.
1 code implementation • 6 Oct 2020 • Wasi Uddin Ahmad, Nanyun Peng, Kai-Wei Chang
Recent approaches to cross-lingual relation and event extraction use graph convolutional networks (GCNs) with universal dependency parses to learn language-agnostic sentence representations, so that models trained on one language can be applied to other languages.
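One layer of such a GCN over a dependency parse can be sketched as H' = σ(Â H W), where Â is the self-loop-augmented, degree-normalized adjacency matrix of the parse tree (a generic GCN layer for illustration, not the paper's exact architecture):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: aggregate each word's neighbors
    in the dependency tree, then apply a linear map and ReLU."""
    A_hat = A + np.eye(len(A))             # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)
    A_norm = A_hat / deg                   # row-normalize by degree
    return np.maximum(A_norm @ H @ W, 0)   # ReLU

# Toy 3-word sentence: word 1 is the syntactic head of words 0 and 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.eye(3)            # one-hot initial word features
W = np.ones((3, 2))      # toy weight matrix
out = gcn_layer(A, H, W)
```

Because the adjacency matrix comes from a universal dependency parse rather than word order, the same layer applies unchanged across languages.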
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Wasi Uddin Ahmad, Jianfeng Chi, Yuan Tian, Kai-Wei Chang
Prior studies in this domain frame the QA task as retrieving the most relevant text segment or a list of sentences from the policy document given a question.
no code implementations • ACL 2021 • Wasi Uddin Ahmad, Xiao Bai, Soomin Lee, Kai-Wei Chang
Natural language processing techniques have demonstrated promising results in keyphrase generation.
8 code implementations • ACL 2020 • Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang
Generating a readable summary that describes the functionality of a program is known as source code summarization.
1 code implementation • CoNLL 2019 • Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Kai-Wei Chang, Nanyun Peng
We conduct experiments on cross-lingual dependency parsing where we train a dependency parser on a source language and transfer it to a wide range of target languages.
5 code implementations • 5 Jun 2019 • Wasi Uddin Ahmad, Kai-Wei Chang, Hongning Wang
We present a context-aware neural ranking model to exploit users' on-task search activities and enhance retrieval performance.
2 code implementations • NAACL 2019 • Wasi Uddin Ahmad, Zhisong Zhang, Xuezhe Ma, Eduard Hovy, Kai-Wei Chang, Nanyun Peng
Different languages might have different word orders.
no code implementations • 21 Apr 2018 • Wasi Uddin Ahmad, Xueying Bai, Zhechao Huang, Chao Jiang, Nanyun Peng, Kai-Wei Chang
Learning distributed sentence representations is one of the key challenges in natural language processing.
1 code implementation • ICLR 2018 • Wasi Uddin Ahmad, Kai-Wei Chang, Hongning Wang
We propose a multi-task learning framework to jointly learn document ranking and query suggestion for web search.