no code implementations • COLING (TextGraphs) 2020 • Zhenqi Zhao, Yuchen Guo, Dingxian Wang, Yufan Huang, Xiangnan He, Bin Gu
Entity Resolution (ER) identifies records that refer to the same real-world entity.
no code implementations • 13 Apr 2024 • MengNan Qi, Yufan Huang, Yongqiang Yao, Maoquan Wang, Bin Gu, Neel Sundaresan
Our experimental results reveal that after this pretraining, both Code Llama and StarCoder, two prevalent code-domain pretrained models, show significant improvements on our logically equivalent code selection task and on the code completion task.
no code implementations • 12 Dec 2023 • Yang Xu, Yongqiang Yao, Yufan Huang, MengNan Qi, Maoquan Wang, Bin Gu, Neel Sundaresan
Instruction tuning, a specialized technique to enhance large language model (LLM) performance via instruction datasets, relies heavily on the quality of employed data.
no code implementations • 22 Oct 2023 • MengNan Qi, Yufan Huang, Maoquan Wang, Yongqiang Yao, Zihan Liu, Bin Gu, Colin Clement, Neel Sundaresan
In this paper we introduce new metrics for programming language translation that address these basic syntax errors.
no code implementations • 17 Oct 2023 • Yufan Huang, MengNan Qi, Yongqiang Yao, Maoquan Wang, Bin Gu, Colin Clement, Neel Sundaresan
Distilled code serves as a translation pivot for any programming language, leading by construction to parallel corpora which scale to all available source code by simply applying the distillation compiler.
1 code implementation • 22 Jul 2022 • Disha Shur, Yufan Huang, David F. Gleich
We study a simple embedding technique based on a matrix of personalized PageRank vectors seeded on a random set of nodes.
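The idea above can be sketched in a few lines: compute a personalized PageRank (PPR) vector for each of a few randomly chosen seed nodes and stack them as columns of an embedding matrix. This is a minimal illustrative sketch, not the paper's implementation; the function names, the dense power iteration, and the toy graph are all assumptions for demonstration.

```python
import numpy as np

def personalized_pagerank(A, seed, alpha=0.85, iters=100):
    """PPR vector seeded at one node, via dense power iteration.

    A: symmetric adjacency matrix (n x n); seed: node index.
    Illustrative only -- real graphs would use sparse matrices.
    """
    n = A.shape[0]
    deg = A.sum(axis=1)
    deg[deg == 0] = 1.0            # guard isolated nodes
    P = A / deg[:, None]           # row-stochastic transition matrix
    e = np.zeros(n)
    e[seed] = 1.0                  # teleport distribution concentrated on seed
    x = e.copy()
    for _ in range(iters):
        x = alpha * (P.T @ x) + (1.0 - alpha) * e
    return x

def ppr_embedding(A, num_seeds=4, rng=None):
    """Stack PPR vectors from random seeds into an (n x num_seeds) embedding."""
    rng = np.random.default_rng(rng)
    seeds = rng.choice(A.shape[0], size=num_seeds, replace=False)
    return np.column_stack([personalized_pagerank(A, s) for s in seeds])

# Toy graph: two triangles joined by a single edge.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
```

Each column is a probability distribution over nodes (non-negative, sums to 1), so rows of the resulting matrix give each node a low-dimensional coordinate reflecting its proximity to the random seeds.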
1 code implementation • NAACL 2021 • Yufan Huang, Yanzhe Zhang, Jiaao Chen, Xuezhi Wang, Diyi Yang
Continual learning has become increasingly important as it enables NLP models to constantly learn and gain knowledge over time.
no code implementations • 25 Feb 2020 • Richeng Jin, Yufan Huang, Xiaofan He, Huaiyu Dai, Tianfu Wu
We present Stochastic-Sign SGD which utilizes novel stochastic-sign based gradient compressors enabling the aforementioned properties in a unified framework.
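A minimal sketch of the stochastic-sign idea, under simplifying assumptions (this is not the paper's exact compressor): each gradient coordinate is quantized to one bit, taking the value +1 with probability (1 + g_i/B)/2 and -1 otherwise, where B bounds the coordinate magnitudes. The expectation of the output is then g/B, so the one-bit message is an unbiased scaled estimate of the gradient.

```python
import numpy as np

def sto_sign(grad, B, rng=None):
    """One-bit stochastic-sign compressor (illustrative sketch).

    Maps coordinate g_i to +1 with probability (1 + g_i/B)/2, else -1,
    assuming |g_i| <= B, so that E[output] = grad / B (unbiased up to scale).
    """
    rng = np.random.default_rng(rng)
    p = 0.5 * (1.0 + np.clip(grad / B, -1.0, 1.0))
    return np.where(rng.random(grad.shape) < p, 1.0, -1.0)
```

In a federated setting, each worker would send only these signs; averaging many such one-bit messages (or taking a majority vote at the server) recovers the gradient direction up to the 1/B scale.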