no code implementations • 10 Jun 2019 • Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang
However, all these approaches still require a large amount of manual effort to prepare parallel program corpora, ranging from pairs of APIs to manually identified code in different languages that is considered functionally equivalent.
no code implementations • 6 Sep 2020 • Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang
Corder is designed to alleviate the need for labeled data in code retrieval and code summarization tasks.
no code implementations • 5 Sep 2020 • Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang
Although syntax trees are precisely defined according to the language grammar and easier to construct and process than graphs, previous tree-based learning techniques have not been able to learn semantic information from trees to achieve better accuracy than graph-based techniques.
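The point that syntax trees are precisely defined by the grammar and easy to construct can be illustrated with Python's standard `ast` module (an illustrative sketch, not the paper's tree-based learning pipeline; the snippet and feature extraction here are my own examples):

```python
import ast

# Parse a small snippet into its syntax tree and walk it,
# collecting node types as simple structural features.
source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)

# ast.walk yields nodes breadth-first, starting from the Module root.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types[:4])
```

Tree-based models such as TBCNN consume exactly this kind of node-type/structure information, rather than the raw token sequence.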
no code implementations • 13 Dec 2020 • Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang
We trained an InferCode model instance using a Tree-based CNN as the encoder on a large set of Java code, applied it to downstream unsupervised tasks such as code clustering, code clone detection, and cross-language code search, and also reused it under a transfer-learning scheme to continue training the model weights for supervised tasks such as code classification and method name prediction.
no code implementations • 20 Dec 2021 • Nghi D. Q. Bui, Yijun Yu
In programming, learning code representations has a variety of applications, including code classification, code search, comment generation, bug prediction, and so on.
no code implementations • 25 May 2022 • Anh T. V. Dau, Thang Nguyen-Duc, Hoang Thanh-Tung, Nghi D. Q. Bui
Despite the recent trend of developing and applying neural source code models to software engineering tasks, the quality of such models is insufficient for real-world use.
no code implementations • 31 May 2022 • Minh Huynh Nguyen, Nghi D. Q. Bui, Truong Son Hy, Long Tran-Thanh, Tien N. Nguyen
We propose a novel method for code summarization utilizing Heterogeneous Code Representations (HCRs) and our specially designed HierarchyNet.
no code implementations • 27 Nov 2022 • Nghi D. Q. Bui, Yue Wang, Steven Hoi
Specifically, we propose three objectives to adapt the generic CodeT5 for debugging: a bug detection objective to determine whether a given code snippet is buggy or not, a bug localization objective to identify the buggy lines, and a program repair objective to translate the buggy code to its fixed version.
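The three objectives lend themselves to a text-to-text framing, in the spirit of CodeT5. A minimal sketch of how such (input, target) pairs could be constructed; the task prefixes and the example bug are my own assumptions, not the paper's exact prompts:

```python
# Sketch: frame bug detection, localization, and repair as
# text-to-text tasks (prefixes are illustrative assumptions).

def make_examples(buggy_code, fixed_code, buggy_lines):
    """Build (input, target) pairs for the three objectives."""
    return {
        # Bug detection: classify the snippet as buggy or clean.
        "detection": (f"detect bug: {buggy_code}", "buggy"),
        # Bug localization: emit the indices of the buggy lines.
        "localization": (
            f"localize bug: {buggy_code}",
            " ".join(str(i) for i in buggy_lines),
        ),
        # Program repair: translate buggy code to its fixed version.
        "repair": (f"repair: {buggy_code}", fixed_code),
    }

examples = make_examples(
    "def mean(xs):\n    return sum(xs) / (len(xs) - 1)",
    "def mean(xs):\n    return sum(xs) / len(xs)",
    buggy_lines=[2],
)
for task, (inp, target) in examples.items():
    print(task, "->", target[:30])
```

In an actual fine-tuning run these pairs would be tokenized and fed to the sequence-to-sequence model under a shared vocabulary, one batch per objective or mixed.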
no code implementations • 2 Apr 2023 • Hung Quoc To, Nghi D. Q. Bui, Jin Guo, Tien N. Nguyen
We aim to address this issue by proposing a simple data augmentation framework.
no code implementations • 21 Mar 2024 • Khanh Nghiem, Anh Minh Nguyen, Nghi D. Q. Bui
As a research-product hybrid group in AI for Software Engineering (AI4SE), we present four key takeaways from our experience developing in-IDE AI coding assistants.
1 code implementation • 2 May 2023 • Thang Nguyen-Duc, Hoang Thanh-Tung, Quan Hung Tran, Dang Huu-Tien, Hieu Ngoc Nguyen, Anh T. V. Dau, Nghi D. Q. Bui
Influence functions (IFs) are a powerful tool for detecting anomalous examples in large scale datasets.
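The core IF quantity for flagging anomalies is self-influence, g(z)ᵀ H⁻¹ g(z), where g is the loss gradient at a training point z and H the Hessian of the training objective. A toy sketch for L2-regularized logistic regression (the data, learning rate, and mislabeled point are my own toy setup, not the paper's experiments):

```python
import numpy as np

# Toy sketch: flag anomalous training points via self-influence
# in L2-regularized logistic regression.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 0.5, 20), rng.normal(2, 0.5, 20)])
y = np.concatenate([np.zeros(20), np.ones(20)])
y[0] = 1.0  # mislabel one point to create an anomaly
X = np.column_stack([X, np.ones_like(X)])  # add a bias column
lam = 1e-2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit by plain gradient descent on the regularized log-loss.
w = np.zeros(2)
for _ in range(2000):
    p = sigmoid(X @ w)
    w -= 0.5 * (X.T @ (p - y) / len(y) + lam * w)

# Hessian of the training objective at the optimum.
p = sigmoid(X @ w)
H = (X.T * (p * (1 - p))) @ X / len(y) + lam * np.eye(2)
H_inv = np.linalg.inv(H)

# Per-example gradients and self-influence scores.
grads = X * (p - y)[:, None]
self_influence = np.einsum("ij,jk,ik->i", grads, H_inv, grads)
print(int(np.argmax(self_influence)))
```

The mislabeled point has a large residual, hence a large gradient, and its self-influence score dominates the clean points'. Large-scale use replaces the explicit Hessian inverse with approximations such as LiSSA or low-rank estimates.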
2 code implementations • 16 Oct 2023 • Hung Quoc To, Minh Huynh Nguyen, Nghi D. Q. Bui
In this work, we introduce SRank, a novel reranking strategy for selecting the best solution from code generation, which focuses on modeling the relationship between clusters of solutions.
Ranked #17 on Code Generation on HumanEval
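A common starting point for such reranking is to cluster sampled solutions by their execution behavior on a few inputs and prefer the largest cluster of functionally equivalent candidates. A simplified sketch of that clustering step (SRank additionally models the interactions *between* clusters, which is omitted here; the candidate functions are my own toy examples):

```python
from collections import defaultdict

# Three sampled "solutions" for sum of 1..n: two equivalent, one buggy.
candidates = {
    "c1": lambda n: n * (n + 1) // 2,   # correct closed form
    "c2": lambda n: sum(range(n + 1)),  # correct loop version
    "c3": lambda n: n * n // 2,         # buggy
}
test_inputs = [0, 1, 5, 10]

# Group candidates by their output tuple: a behavioral fingerprint.
clusters = defaultdict(list)
for name, fn in candidates.items():
    signature = tuple(fn(x) for x in test_inputs)
    clusters[signature].append(name)

# Prefer the largest cluster of functionally equivalent solutions.
best = max(clusters.values(), key=len)
print(sorted(best))
```

Here the two correct solutions agree on every input and form the majority cluster, while the buggy one is isolated.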
1 code implementation • 13 Mar 2018 • Nghi D. Q. Bui, Lingxiao Jiang
Our preliminary evaluations on about 40,000 Java and C# source files from 9 software projects show that our approach can automatically learn shared embeddings for various code elements in different languages and identify their cross-language mappings with reasonable Mean Average Precision scores.
1 code implementation • 31 Jul 2020 • Md Rafiqul Islam Rabin, Nghi D. Q. Bui, Ke Wang, Yijun Yu, Lingxiao Jiang, Mohammad Amin Alipour
With the prevalence of publicly available source code repositories for training deep neural network models, neural program models can do well in source code analysis tasks such as predicting method names in given programs, tasks that cannot easily be done by traditional program analysis techniques.
1 code implementation • 10 Mar 2024 • Huy N. Phan, Hoang N. Phan, Tien N. Nguyen, Nghi D. Q. Bui
Code Large Language Models (CodeLLMs) have demonstrated impressive proficiency in code completion tasks.
1 code implementation • 17 Oct 2017 • Nghi D. Q. Bui, Lingxiao Jiang, Yijun Yu
It is layered on top of two tree-based convolutional neural networks (TBCNNs), each of which recognizes the algorithm of code written in an individual programming language.
1 code implementation • 9 May 2023 • Dung Nguyen Manh, Nam Le Hai, Anh T. V. Dau, Anh Minh Nguyen, Khanh Nghiem, Jin Guo, Nghi D. Q. Bui
We present The Vault, a dataset of high-quality code-text pairs in multiple programming languages for training large language models to understand and generate code.
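The raw material for such a dataset is (docstring, code) pairs mined from source files. An illustrative sketch of that pairing step using Python's `ast` module (the real pipeline applies much heavier filtering and deduplication; the sample source is my own):

```python
import ast

# Mine (docstring, code) pairs from a source file: the kind of
# pairing a code-text dataset collects.
source = '''
def square(x):
    """Return x squared."""
    return x * x

def undocumented(x):
    return x + 1
'''

pairs = []
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.FunctionDef):
        doc = ast.get_docstring(node)
        if doc:  # keep only documented functions
            pairs.append((doc, ast.unparse(node)))

print(len(pairs), "->", pairs[0][0])
```

Only the documented function survives; quality filters (docstring length, language detection, license checks) would then prune the pairs further before training.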
1 code implementation • 31 May 2023 • Nghi D. Q. Bui, Hung Le, Yue Wang, Junnan Li, Akhilesh Deepak Gotmare, Steven C. H. Hoi
In this paper, we present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.
1 code implementation • 13 May 2023 • Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D. Q. Bui, Junnan Li, Steven C. H. Hoi
To address these limitations, we propose "CodeT5+", a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.
Ranked #1 on Code Search on CodeXGLUE - AdvTest