Search Results for author: Yao Wan

Found 51 papers, 25 papers with code

Modeling Hierarchical Syntax Structure with Triplet Position for Source Code Summarization

no code implementations ACL 2022 Juncai Guo, Jin Liu, Yao Wan, Li Li, Pingyi Zhou

In this paper, we propose CODESCRIBE to model the hierarchical syntax structure of code by introducing a novel triplet position for code summarization.

Code Summarization Graph Neural Network +2

Self-Cognition in Large Language Models: An Exploratory Study

no code implementations 1 Jul 2024 Dongping Chen, Jiawen Shi, Yao Wan, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun

Additionally, we explore the utility and trustworthiness of LLMs in the self-cognition state, revealing that the self-cognition state enhances some specific tasks such as creative writing and exaggeration.

Chatbot

UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

1 code implementation 27 Jun 2024 Siyuan Wu, Yue Huang, Chujie Gao, Dongping Chen, Qihui Zhang, Yao Wan, Tianyi Zhou, Xiangliang Zhang, Jianfeng Gao, Chaowei Xiao, Lichao Sun

Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted various fields by enabling high-quality synthetic data generation and reducing dependence on expensive human-generated datasets.

Attribute Benchmarking +3

ObscurePrompt: Jailbreaking Large Language Models via Obscure Input

1 code implementation 19 Jun 2024 Yue Huang, Jingyu Tang, Dongping Chen, Bingda Tang, Yao Wan, Lichao Sun, Xiangliang Zhang

Recently, Large Language Models (LLMs) have garnered significant attention for their exceptional natural language processing capabilities.

GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

1 code implementation 16 Jun 2024 Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun

We evaluate the capabilities of current state-of-the-art MLLMs, including ImageLLMs and VideoLLMs, in understanding various types of GUI content, especially dynamic and sequential content.

The Best of Both Worlds: Toward an Honest and Helpful Large Language Model

1 code implementation 1 Jun 2024 Chujie Gao, Qihui Zhang, Dongping Chen, Yue Huang, Siyuan Wu, Zhengyan Fu, Yao Wan, Xiangliang Zhang, Lichao Sun

Subsequently, we present two approaches to augmenting honesty and helpfulness in LLMs: a training-free enhancement and a fine-tuning-based improvement.

Language Modelling Large Language Model

Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study

1 code implementation 26 Apr 2024 Yang Wu, Yao Wan, Hongyu Zhang, Yulei Sui, Wucai Wei, Wei Zhao, Guandong Xu, Hai Jin

In particular, we first explore ways of transforming structured tabular data into sequential text prompts so as to feed them into LLMs, and analyze which table content contributes most to NL2Vis.

Data Visualization In-Context Learning

Graph Neural Networks for Vulnerability Detection: A Counterfactual Explanation

1 code implementation 24 Apr 2024 Zhaoyang Chu, Yao Wan, Qian Li, Yang Wu, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin

We argue that these factual reasoning-based explanations cannot answer critical what-if questions: What would happen to the GNN's decision if we were to alter the code graph into alternative structures?

counterfactual Counterfactual Explanation +2

CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code

1 code implementation 24 Apr 2024 Batu Guan, Yao Wan, Zhangqian Bi, Zheng Wang, Hongyu Zhang, Pan Zhou, Lichao Sun

Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP in watermarking LLMs for code generation while maintaining the syntactical correctness of code.

Code Generation Diversity

Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach

1 code implementation 22 Apr 2024 Yao Wan, Guanghua Wan, Shijie Zhang, Hongyu Zhang, Pan Zhou, Hai Jin, Lichao Sun

Subsequently, the membership classifier can be effectively employed to deduce the membership status of a given code sample based on the output of a target code completion model.

Code Completion Memorization

VISION2UI: A Real-World Dataset with Layout for Code Generation from UI Designs

no code implementations 9 Apr 2024 Yi Gui, Zhen Li, Yao Wan, Yemin Shi, Hongyu Zhang, Yi Su, Shaoling Dong, Xing Zhou, Wenbin Jiang

Automatically generating UI code from webpage design visions can significantly alleviate the burden of developers, enabling beginner developers or designers to directly generate Web pages from design diagrams.

Code Generation

NL2Formula: Generating Spreadsheet Formulas from Natural Language Queries

no code implementations 20 Feb 2024 Wei Zhao, Zhitao Hou, Siyuan Wu, Yan Gao, Haoyu Dong, Yao Wan, Hongyu Zhang, Yulei Sui, Haidong Zhang

Writing formulas on spreadsheets, such as Microsoft Excel and Google Sheets, is a widespread practice among users performing data analysis.

Natural Language Queries

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

1 code implementation 7 Feb 2024 Dongping Chen, Ruoxi Chen, Shilin Zhang, Yinuo Liu, Yaochen Wang, Huichi Zhou, Qihui Zhang, Yao Wan, Pan Zhou, Lichao Sun

Drawing inspiration from the concept of LLM-as-a-Judge within LLMs, this paper introduces a novel benchmark, termed MLLM-as-a-Judge, to assess the ability of MLLMs in assisting judges across diverse modalities, encompassing three distinct tasks: Scoring Evaluation, Pair Comparison, and Batch Ranking.

LLM-as-a-Coauthor: Can Mixed Human-Written and Machine-Generated Text Be Detected?

2 code implementations 11 Jan 2024 Qihui Zhang, Chujie Gao, Dongping Chen, Yue Huang, Yixin Huang, Zhenyang Sun, Shilin Zhang, Weiye Li, Zhengyan Fu, Yao Wan, Lichao Sun

With the rapid development and widespread application of Large Language Models (LLMs), the use of Machine-Generated Text (MGT) has become increasingly common, bringing with it potential risks, especially in terms of quality and integrity in fields like news, education, and science.

Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit

no code implementations 30 Dec 2023 Yao Wan, Yang He, Zhangqian Bi, JianGuo Zhang, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin, Philip S. Yu

We also benchmark several state-of-the-art neural models for code intelligence, and provide an open-source toolkit tailored for the rapid prototyping of deep-learning-based code intelligence models.

Representation Learning

kNN-ICL: Compositional Task-Oriented Parsing Generalization with Nearest Neighbor In-Context Learning

no code implementations 17 Dec 2023 Wenting Zhao, Ye Liu, Yao Wan, Yibo Wang, Qingyang Wu, Zhongfen Deng, Jiangshu Du, Shuaiqi Liu, Yunlong Xu, Philip S. Yu

Task-Oriented Parsing (TOP) enables conversational assistants to interpret user commands expressed in natural language, transforming them into structured outputs that combine elements of both natural language and intent/slot tags.

In-Context Learning Prompt Engineering +1

DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text

no code implementations 31 Oct 2023 Wenting Zhao, Ye Liu, Tong Niu, Yao Wan, Philip S. Yu, Shafiq Joty, Yingbo Zhou, Semih Yavuz

Moreover, a significant gap in the current landscape is the absence of a realistic benchmark for evaluating the effectiveness of grounding LLMs on heterogeneous knowledge sources (e.g., knowledge base and text).

Knowledge Graphs Open-Domain Question Answering +2

MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use

1 code implementation 4 Oct 2023 Yue Huang, Jiawen Shi, Yuan Li, Chenrui Fan, Siyuan Wu, Qihui Zhang, Yixin Liu, Pan Zhou, Yao Wan, Neil Zhenqiang Gong, Lichao Sun

However, in scenarios where LLMs serve as intelligent agents, as seen in applications like AutoGPT and MetaGPT, LLMs are expected to engage in intricate decision-making processes that involve deciding whether to employ a tool and selecting the most suitable tool(s) from a collection of available tools to fulfill user requests.

Decision Making

Named Entity Recognition via Machine Reading Comprehension: A Multi-Task Learning Approach

1 code implementation 20 Sep 2023 Yibo Wang, Wenting Zhao, Yao Wan, Zhongfen Deng, Philip S. Yu

In this paper, we propose to incorporate the label dependencies among entity types into a multi-task learning framework for better MRC-based NER.

Machine Reading Comprehension Multi-Task Learning +3

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

no code implementations 2 Jan 2023 Jiahao Zhu, Daizong Liu, Pan Zhou, Xing Di, Yu Cheng, Song Yang, Wenzheng Xu, Zichuan Xu, Yao Wan, Lichao Sun, Zeyu Xiong

All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning.

Sentence Temporal Sentence Grounding

Diverse Title Generation for Stack Overflow Posts with Multiple Sampling Enhanced Transformer

1 code implementation 24 Aug 2022 Fengji Zhang, Jin Liu, Yao Wan, Xiao Yu, Xiao Liu, Jacky Keung

Stack Overflow is one of the most popular programming communities where developers can seek help for their encountered problems.

Collaborative Knowledge Graph Fusion by Exploiting the Open Corpus

no code implementations 15 Jun 2022 Yue Wang, Yao Wan, Lu Bai, Lixin Cui, Zhuo Xu, Ming Li, Philip S. Yu, Edwin R Hancock

To alleviate the challenges of building Knowledge Graphs (KG) from scratch, a more general task is to enrich a KG using triples from an open corpus, where the obtained triples contain noisy entities and relations.

Event Extraction Knowledge Graphs

CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training

no code implementations Findings (NAACL) 2022 Xin Wang, Yasheng Wang, Yao Wan, Jiawei Wang, Pingyi Zhou, Li Li, Hao Wu, Jin Liu

Specifically, we first extract multiple code views using compiler tools, and learn the complementary information among them under a contrastive learning framework.

Contrastive Learning Defect Detection +2

Compilable Neural Code Generation with Compiler Feedback

no code implementations Findings (ACL) 2022 Xin Wang, Yasheng Wang, Yao Wan, Fei Mi, Yitong Li, Pingyi Zhou, Jin Liu, Hao Wu, Xin Jiang, Qun Liu

Automatically generating compilable programs with (or without) natural language descriptions has always been a touchstone problem for computational linguistics and automated software engineering.

Code Completion Code Generation +4

Reinforced MOOCs Concept Recommendation in Heterogeneous Information Networks

no code implementations 8 Mar 2022 Jibing Gong, Yao Wan, Ye Liu, Xuewen Li, Yi Zhao, Cheng Wang, YuTing Lin, Xiaohan Fang, Wenzheng Feng, Jingyi Zhang, Jie Tang

Despite the usefulness of this service, we consider that recommending courses to users directly may neglect their varying degrees of expertise.

Graph Attention Graph Neural Network +2

Attend, Memorize and Generate: Towards Faithful Table-to-Text Generation in Few Shots

1 code implementation Findings (EMNLP) 2021 Wenting Zhao, Ye Liu, Yao Wan, Philip S. Yu

Few-shot table-to-text generation is a task of composing fluent and faithful sentences to convey table content using limited data.

Table-to-Text Generation

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code

1 code implementation 14 Feb 2022 Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin

In this paper, we conduct a thorough structural analysis aiming to provide an interpretation of pre-trained language models for source code (e.g., CodeBERT and GraphCodeBERT) from three distinctive perspectives: (1) attention analysis, (2) probing on the word embedding, and (3) syntax tree induction.

Code Completion Code Search +1

Cross-Language Binary-Source Code Matching with Intermediate Representations

1 code implementation 19 Jan 2022 Yi Gui, Yao Wan, Hongyu Zhang, Huifang Huang, Yulei Sui, Guandong Xu, Zhiyuan Shao, Hai Jin

Binary-source code matching plays an important role in many security and software engineering related tasks such as malware detection, reverse engineering and vulnerability assessment.

Malware Detection

DANets: Deep Abstract Networks for Tabular Data Classification and Regression

1 code implementation 6 Dec 2021 Jintai Chen, Kuanlun Liao, Yao Wan, Danny Z. Chen, Jian Wu

A special basic block is built using AbstLays, and we construct a family of Deep Abstract Networks (DANets) for tabular data classification and regression by stacking such blocks.

regression

FedHM: Efficient Federated Learning for Heterogeneous Models via Low-rank Factorization

no code implementations 29 Nov 2021 Dezhong Yao, Wanning Pan, Michael J O'Neill, Yutong Dai, Yao Wan, Hai Jin, Lichao Sun

To this end, this paper proposes FedHM, a novel heterogeneous federated model compression framework, distributing the heterogeneous low-rank models to clients and then aggregating them into a full-rank model.

Distributed Computing Federated Learning +3

SynCoBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation

no code implementations 10 Aug 2021 Xin Wang, Yasheng Wang, Fei Mi, Pingyi Zhou, Yao Wan, Xiao Liu, Li Li, Hao Wu, Jin Liu, Xin Jiang

Code representation learning, which aims to encode the semantics of source code into distributed vectors, plays an important role in recent deep-learning-based models for code intelligence.

Clone Detection Code Search +5

Local-Global Knowledge Distillation in Heterogeneous Federated Learning with Non-IID Data

no code implementations 30 Jun 2021 Dezhong Yao, Wanning Pan, Yutong Dai, Yao Wan, Xiaofeng Ding, Hai Jin, Zheng Xu, Lichao Sun

Federated learning enables multiple clients to collaboratively learn a global model by periodically aggregating the clients' models without transferring the local data.

Federated Learning Knowledge Distillation

Enriching Non-Autoregressive Transformer with Syntactic and Semantic Structures for Neural Machine Translation

no code implementations EACL 2021 Ye Liu, Yao Wan, JianGuo Zhang, Wenting Zhao, Philip Yu

In this paper, we claim that the syntactic and semantic structures of natural language are critical for non-autoregressive machine translation and can further improve its performance.

Machine Translation Translation

Enriching Non-Autoregressive Transformer with Syntactic and Semantic Structures for Neural Machine Translation

no code implementations 22 Jan 2021 Ye Liu, Yao Wan, Jian-Guo Zhang, Wenting Zhao, Philip S. Yu

In this paper, we claim that the syntactic and semantic structures of natural language are critical for non-autoregressive machine translation and can further improve its performance.

Machine Translation Translation

Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks

no code implementations 13 Oct 2020 Yue Wang, Zhuo Xu, Lu Bai, Yao Wan, Lixin Cui, Qian Zhao, Edwin R. Hancock, Philip S. Yu

To verify the effectiveness of our proposed method, we conduct extensive experiments on four real-world datasets and compare our method with state-of-the-art methods.

Event Extraction TAG

KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning

1 code implementation 26 Sep 2020 Ye Liu, Yao Wan, Lifang He, Hao Peng, Philip S. Yu

To promote the ability of commonsense reasoning for text generation, we propose a novel knowledge graph augmented pre-trained language generation model KG-BART, which encompasses the complex relations of concepts through the knowledge graph and produces more logical and natural sentences as output.

Graph Attention Text Generation

Competitive Multi-Agent Deep Reinforcement Learning with Counterfactual Thinking

no code implementations 13 Aug 2019 Yue Wang, Yao Wan, Chenwei Zhang, Lixin Cui, Lu Bai, Philip S. Yu

During the iterations, our model updates the parallel policies and the corresponding scenario-based regrets for agents simultaneously.

counterfactual Decision Making +3

Multi-Modal Generative Adversarial Network for Short Product Title Generation in Mobile E-Commerce

no code implementations NAACL 2019 Jian-Guo Zhang, Pengcheng Zou, Zhao Li, Yao Wan, Xiuming Pan, Yu Gong, Philip S. Yu

To address this discrepancy, previous studies mainly consider the textual information of long product titles and lack a human-like view during the training and evaluation process.

Attribute Generative Adversarial Network

Improving Automatic Source Code Summarization via Deep Reinforcement Learning

2 code implementations 17 Nov 2018 Yao Wan, Zhou Zhao, Min Yang, Guandong Xu, Haochao Ying, Jian Wu, Philip S. Yu

To the best of our knowledge, most state-of-the-art approaches follow an encoder-decoder framework which encodes the code into a hidden space and then decodes it into natural language space, suffering from two major drawbacks: a) their encoders only consider the sequential content of code, ignoring the tree structure which is also critical for the task of code summarization; b) their decoders are typically trained to predict the next word by maximizing the likelihood of the next ground-truth word given the previous ground-truth word.

Code Summarization Decoder +4

Improved Dynamic Memory Network for Dialogue Act Classification with Adversarial Training

no code implementations 12 Nov 2018 Yao Wan, Wenqiang Yan, Jianwei Gao, Zhou Zhao, Jian Wu, Philip S. Yu

Dialogue Act (DA) classification is a challenging problem in dialogue interpretation, which aims to attach semantic labels to utterances and characterize the speaker's intention.

Classification Dialogue Act Classification +3
