Search Results for author: Dongmei Zhang

Found 101 papers, 50 papers with code

Weakly Supervised Semantic Parsing by Learning from Mistakes

1 code implementation • Findings (EMNLP) 2021 • Jiaqi Guo, Jian-Guang Lou, Ting Liu, Dongmei Zhang

Using only 10% of utterance-denotation pairs, the parser achieves 84.2 denotation accuracy on WikiSQL, which is competitive with the previous state-of-the-art approaches using 100% labeled data.

Semantic Parsing

"What Do You Mean by That?" A Parser-Independent Interactive Approach for Enhancing Text-to-SQL

no code implementations • EMNLP 2020 • Yuntao Li, Bei Chen, Qian Liu, Yan Gao, Jian-Guang Lou, Yan Zhang, Dongmei Zhang

In Natural Language Interfaces to Databases systems, the text-to-SQL technique allows users to query databases by using natural language questions.

Text-To-SQL

CONLINE: Complex Code Generation and Refinement with Online Searching and Correctness Testing

no code implementations • 20 Mar 2024 • Xinyi He, Jiaru Zou, Yun Lin, Mengyu Zhou, Shi Han, Zejian Yuan, Dongmei Zhang

Large Language Models (LLMs) have revolutionized code generation ability by converting natural language descriptions into executable code.

Code Generation Information Retrieval +1

LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression

1 code implementation • 19 Mar 2024 • Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Rühle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang

The challenge is that information entropy may be a suboptimal compression metric: (i) it only leverages unidirectional context and may fail to capture all essential information needed for prompt compression; (ii) it is not aligned with the prompt compression objective.

GSM8K Language Modelling +3

3,627

Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments

no code implementations • 13 Mar 2024 • Sitao Cheng, Ziyuan Zhuang, Yong Xu, Fangkai Yang, Chaoyun Zhang, Xiaoting Qin, Xiang Huang, Ling Chen, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

We instantiate the path on structured environments and provide feedback to edit the path if anything goes wrong.

Ploutos: Towards interpretable stock movement prediction with financial large language model

no code implementations • 18 Feb 2024 • Hanshuang Tong, Jun Li, Ning Wu, Ming Gong, Dongmei Zhang, Qi Zhang

Recent advancements in large language models (LLMs) have opened new pathways for many domains.

Language Modelling Large Language Model

UFO: A UI-Focused Agent for Windows OS Interaction

1 code implementation • 8 Feb 2024 • Chaoyun Zhang, Liqun Li, Shilin He, Xu Zhang, Bo Qiao, Si Qin, Minghua Ma, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision.

Navigate

3,869

Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective

1 code implementation • 5 Feb 2024 • Zexin Wang, Changhua Pei, Minghua Ma, Xin Wang, Zhihan Li, Dan Pei, Saravan Rajmohan, Dongmei Zhang, Qingwei Lin, Haiming Zhang, Jianhui Li, Gaogang Xie

To ensure an accurate AD, FCVAE exploits an innovative approach to concurrently integrate both the global and local frequency features into the condition of Conditional Variational Autoencoder (CVAE) to significantly increase the accuracy of reconstructing the normal data.

Anomaly Detection Time Series +1

COIN: Chance-Constrained Imitation Learning for Uncertainty-aware Adaptive Resource Oversubscription Policy

no code implementations • 13 Jan 2024 • Lu Wang, Mayukh Das, Fangkai Yang, Chao Duo, Bo Qiao, Hang Dong, Si Qin, Chetan Bansal, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

We address the challenge of learning safe and robust decision policies in the presence of uncertainty, in the context of the real scientific problem of adaptive resource oversubscription, to enhance resource efficiency while ensuring safety against resource congestion risk.

Imitation Learning Management

Contrastive Learning with Negative Sampling Correction

no code implementations • 13 Jan 2024 • Lu Wang, Chao Du, Pu Zhao, Chuan Luo, Zhangchi Zhu, Bo Qiao, Wei Zhang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

To correct the negative sampling bias, we propose a novel contrastive learning method named Positive-Unlabeled Contrastive Learning (PUCL).

Contrastive Learning Data Augmentation +2

Why does Prediction Accuracy Decrease over Time? Uncertain Positive Learning for Cloud Failure Prediction

no code implementations • 8 Jan 2024 • Haozhe Li, Minghua Ma, Yudong Liu, Pu Zhao, Lingling Zheng, Ze Li, Yingnong Dang, Murali Chintalapati, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

Using two real-world datasets of disk failure prediction and conducting node prediction experiments in Microsoft Azure, which is a top-tier cloud provider that serves millions of users, we demonstrate Uptake can significantly improve the failure prediction accuracy by 5% on average.

Cloud Computing

FM-OV3D: Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection

no code implementations • 22 Dec 2023 • Dongmei Zhang, Chang Li, Ray Zhang, Shenghao Xie, Wei Xue, Xiaodong Xie, Shanghang Zhang

In this work, we propose FM-OV3D, a method of Foundation Model-based Cross-modal Knowledge Blending for Open-Vocabulary 3D Detection, which improves the open-vocabulary localization and recognition abilities of 3D models by blending knowledge from multiple pre-trained foundation models, achieving true open-vocabulary detection without facing constraints from original 3D datasets.

Ranked #3 on 3D Open-Vocabulary Object Detection on ScanNet on unseen classes

3D Object Detection 3D Open-Vocabulary Object Detection +2

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

no code implementations • 21 Dec 2023 • Xinyi He, Mengyu Zhou, Xinrun Xu, Xiaojun Ma, Rui Ding, Lun Du, Yan Gao, Ran Jia, Xu Chen, Shi Han, Zejian Yuan, Dongmei Zhang

We evaluate five state-of-the-art models using three different metrics, and the results show that our benchmark presents considerable challenges in the field of tabular data analysis, paving the way for more advanced research opportunities.

Question Answering

Xpert: Empowering Incident Management with Query Recommendations via Large Language Models

no code implementations • 19 Dec 2023 • YuXuan Jiang, Chaoyun Zhang, Shilin He, Zhihao Yang, Minghua Ma, Si Qin, Yu Kang, Yingnong Dang, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

This paper presents a thorough empirical study on the utilization of queries of KQL, a DSL employed for incident management in a large-scale cloud management system at Microsoft.

Management

TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning

no code implementations • 14 Dec 2023 • Yuan Sui, Jiaru Zou, Mengyu Zhou, Xinyi He, Lun Du, Shi Han, Dongmei Zhang

Table-based reasoning has shown remarkable progress in combining deep models with discrete reasoning, which requires reasoning over both free-form natural language (NL) questions and semi-structured tabular data.

Language Modelling Large Language Model +2

Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers

1 code implementation • 7 Dec 2023 • Nuo Chen, Ning Wu, Shining Liang, Ming Gong, Linjun Shou, Dongmei Zhang, Jia Li

This paper presents an in-depth analysis of Large Language Models (LLMs), focusing on LLaMA, a prominent open-source foundational model in natural language processing.

Math Multiple-choice +1

TaskWeaver: A Code-First Agent Framework

1 code implementation • 29 Nov 2023 • Bo Qiao, Liqun Li, Xu Zhang, Shilin He, Yu Kang, Chaoyun Zhang, Fangkai Yang, Hang Dong, Jue Zhang, Lu Wang, Minghua Ma, Pu Zhao, Si Qin, Xiaoting Qin, Chao Du, Yong Xu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

TaskWeaver provides support for rich data structures, flexible plugin usage, and dynamic plugin selection, and leverages LLM coding capabilities for complex logic.

Natural Language Understanding

4,674

LayoutPrompter: Awaken the Design Ability of Large Language Models

1 code implementation • NeurIPS 2023 • Jiawei Lin, Jiaqi Guo, Shizhao Sun, Zijiang James Yang, Jian-Guang Lou, Dongmei Zhang

In this work, we propose LayoutPrompter, which leverages large language models (LLMs) to address the above problems through in-context learning.

In-Context Learning

Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation

1 code implementation • 7 Nov 2023 • Ruomeng Ding, Chaoyun Zhang, Lu Wang, Yong Xu, Minghua Ma, Wei Zhang, Si Qin, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

To address these limitations, we introduce a novel thought prompting approach called "Everything of Thoughts" (XoT) to defy the law of the "Penrose triangle" of existing thought paradigms.

Decision Making

Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations

1 code implementation • 31 Oct 2023 • Nuo Chen, Zinan Zheng, Ning Wu, Ming Gong, Yangqiu Song, Dongmei Zhang, Jia Li

This indicates that crafting multilingual corpora can be regarded as a vital strategy for enhancing model performance in a specific language, especially in mathematical reasoning tasks.

GSM8K Math +1

Table-GPT: Table-tuned GPT for Diverse Table Tasks

no code implementations • 13 Oct 2023 • Peng Li, Yeye He, Dror Yashar, Weiwei Cui, Song Ge, Haidong Zhang, Danielle Rifinski Fainman, Dongmei Zhang, Surajit Chaudhuri

Language models, such as GPT-3.5 and ChatGPT, demonstrate remarkable abilities to follow diverse human instructions and perform a wide range of tasks.

Probing Language Models

Text-to-Image Generation for Abstract Concepts

no code implementations • 26 Sep 2023 • Jiayi Liao, Xu Chen, Qiang Fu, Lun Du, Xiangnan He, Xiang Wang, Shi Han, Dongmei Zhang

Recent years have witnessed the substantial progress of large-scale models across various domains, such as natural language processing and computer vision, facilitating the expression of concrete concepts.

Text-to-Image Generation

SoTaNa: The Open-Source Software Development Assistant

1 code implementation • 25 Aug 2023 • Ensheng Shi, Fengji Zhang, Yanlin Wang, Bei Chen, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun

To meet the demands of this dynamic field, there is a growing need for an effective software development assistant.

Code Summarization

125

A Parse-Then-Place Approach for Generating Graphic Layouts from Textual Descriptions

no code implementations • ICCV 2023 • Jiawei Lin, Jiaqi Guo, Shizhao Sun, Weijiang Xu, Ting Liu, Jian-Guang Lou, Dongmei Zhang

To model combined and incomplete constraints, we use a Transformer-based layout generation model and carefully design a way to represent constraints and layouts as sequences.

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

1 code implementation • 18 Aug 2023 • Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, JianGuang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Dongmei Zhang

Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model.

Ranked #49 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning GSM8K +2

8,813

End-to-End Beam Retrieval for Multi-Hop Question Answering

2 code implementations • 17 Aug 2023 • Jiahao Zhang, Haiyang Zhang, Dongmei Zhang, Yong Liu, Shen Huang

This approach models the multi-hop retrieval process in an end-to-end manner by jointly optimizing an encoder and two classification heads across all hops.

Ranked #1 on Question Answering on HotpotQA

Language Modelling Large Language Model +3

Diffusion-based Time Series Data Imputation for Microsoft 365

no code implementations • 3 Aug 2023 • Fangkai Yang, Wenjie Yin, Lu Wang, Tianci Li, Pu Zhao, Bo Liu, Paul Wang, Bo Qiao, Yudong Liu, Mårten Björkman, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

However, they suffer from poor data quality, such as missing data in model training and prediction, which limits their performance.

Imputation Time Series

Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction

1 code implementation • 1 Aug 2023 • Zhangchi Zhu, Lu Wang, Pu Zhao, Chao Du, Wei Zhang, Hang Dong, Bo Qiao, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

To mitigate the impact of label uncertainty and improve the robustness of learning with positive and unlabeled data, we propose a new robust PU learning method with a training strategy motivated by the nature of human learning: easy cases should be learned first.

ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection

1 code implementation • 3 Jul 2023 • Yuhang Chen, Chaoyun Zhang, Minghua Ma, Yudong Liu, Ruomeng Ding, Bowen Li, Shilin He, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

To the best of our knowledge, ImDiffusion represents a pioneering approach that combines imputation-based techniques with time series anomaly detection, while introducing the novel use of diffusion models to the field.

Anomaly Detection Imputation +2

On Manipulating Signals of User-Item Graph: A Jacobi Polynomial-based Graph Collaborative Filtering

1 code implementation • 6 Jun 2023 • Jiayan Guo, Lun Du, Xu Chen, Xiaojun Ma, Qiang Fu, Shi Han, Dongmei Zhang, Yan Zhang

Graph CF has attracted more and more attention in recent years due to its effectiveness in leveraging high-order information in the user-item bipartite graph for better recommendations.

Collaborative Filtering Recommendation Systems

Auto-Validate by-History: Auto-Program Data Quality Constraints to Validate Recurring Data Pipelines

no code implementations • 4 Jun 2023 • Dezhan Tu, Yeye He, Weiwei Cui, Song Ge, Haidong Zhang, Han Shi, Dongmei Zhang, Surajit Chaudhuri

Data pipelines are widely employed in modern enterprises to power a variety of Machine-Learning (ML) and Business-Intelligence (BI) applications.

Enabling and Analyzing How to Efficiently Extract Information from Hybrid Long Documents with LLMs

no code implementations • 24 May 2023 • Chongjian Yue, Xinrun Xu, Xiaojun Ma, Lun Du, Hengyu Liu, Zhiming Ding, Yanbing Jiang, Shi Han, Dongmei Zhang

We propose an Automated Financial Information Extraction (AFIE) framework that enhances LLMs' ability to comprehend and extract information from financial reports.

Retrieval

Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

1 code implementation • 22 May 2023 • Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, Dongmei Zhang

Although tables can be used as input to LLMs with serialization, there is a lack of comprehensive studies that examine whether LLMs can truly comprehend such data.

Retrieval

Introspective Tips: Large Language Model for In-Context Decision Making

no code implementations • 19 May 2023 • Liting Chen, Lu Wang, Hang Dong, Yali Du, Jie Yan, Fangkai Yang, Shuang Li, Pu Zhao, Si Qin, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

The emergence of large language models (LLMs) has substantially influenced natural language processing, demonstrating exceptional results across various tasks.

Decision Making Language Modelling +2

Empower Large Language Model to Perform Better on Industrial Domain-Specific Question Answering

1 code implementation • 19 May 2023 • Fangkai Yang, Pu Zhao, Zezhong Wang, Lu Wang, Jue Zhang, Mohit Garg, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

Large Language Models (LLMs) have gained popularity and achieved remarkable results in open-domain tasks, but their performance in real industrial domain-specific scenarios is average due to their lack of specific domain knowledge.

Language Modelling Large Language Model +2

How Do In-Context Examples Affect Compositional Generalization?

no code implementations • 8 May 2023 • Shengnan An, Zeqi Lin, Qiang Fu, Bei Chen, Nanning Zheng, Jian-Guang Lou, Dongmei Zhang

Compositional generalization--understanding unseen combinations of seen primitives--is an essential reasoning capability in human intelligence.

In-Context Learning

Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond

1 code implementation • 11 Apr 2023 • Ensheng Shi, Yanlin Wang, Hongyu Zhang, Lun Du, Shi Han, Dongmei Zhang, Hongbin Sun

Our experimental study shows that (1) lexical, syntactic and structural properties of source code are encoded in the lower, intermediate, and higher layers, respectively, while the semantic property spans across the entire model.

Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System

no code implementations • 2 Apr 2023 • Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, Dongmei Zhang

In brief, an IQuery is an abstraction and automation of data analysis operations, which mimics the approach of data analysts and simplifies the exploration process for users.

Language Modelling Large Language Model

LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models

1 code implementation • ICCV 2023 • Junyi Zhang, Jiaqi Guo, Shizhao Sun, Jian-Guang Lou, Dongmei Zhang

To tackle the challenge, we summarize three critical factors for achieving a mild forward process for the layout, i.e., legality, coordinate proximity and type disruption.

Layout Design

Robust Mid-Pass Filtering Graph Convolutional Networks

1 code implementation • 16 Feb 2023 • Jincheng Huang, Lun Du, Xu Chen, Qiang Fu, Shi Han, Dongmei Zhang

Theoretical analyses guarantee the robustness of signals through the mid-pass filter, and we also shed light on the properties of different frequency signals under adversarial attacks.

Adversarial Attack Node Classification

Conservative State Value Estimation for Offline Reinforcement Learning

1 code implementation • NeurIPS 2023 • Liting Chen, Jie Yan, Zhengdao Shao, Lu Wang, Qingwei Lin, Saravan Rajmohan, Thomas Moscibroda, Dongmei Zhang

In this paper, we propose Conservative State Value Estimation (CSVE), a new approach that learns a conservative V-function by directly imposing a penalty on OOD states.

D4RL reinforcement-learning

Homophily-oriented Heterogeneous Graph Rewiring

no code implementations • 13 Feb 2023 • Jiayan Guo, Lun Du, Wendong Bi, Qiang Fu, Xiaojun Ma, Xu Chen, Shi Han, Dongmei Zhang, Yan Zhang

To this end, we propose HDHGR, a homophily-oriented deep heterogeneous graph rewiring approach that modifies the HG structure to increase the performance of HGNN.

Out-of-Distribution Detection based on In-Distribution Data Patterns Memorization with Modern Hopfield Energy

1 code implementation • ICLR 2023 • Jinsong Zhang, Qiang Fu, Xu Chen, Lun Du, Zelin Li, Gang Wang, Xiaoguang Liu, Shi Han, Dongmei Zhang

In more detail, penultimate layer outputs on the training set are considered as the representations of in-distribution (ID) data.

Ranked #11 on Out-of-Distribution Detection on ImageNet-1k vs Places

Computational Efficiency Memorization +2

LUNA: Language Understanding with Number Augmentations on Transformers via Number Plugins and Pre-training

1 code implementation • 6 Dec 2022 • Hongwei Han, Jialiang Xu, Mengyu Zhou, Yijia Shao, Shi Han, Dongmei Zhang

But current approaches to rich-number tasks with transformer-based language models abandon or lose some of the numeracy information - e.g., breaking numbers into sub-word tokens - which leads to many number-related errors.

Learning Cooperative Oversubscription for Cloud by Chance-Constrained Multi-Agent Reinforcement Learning

no code implementations • 21 Nov 2022 • Junjie Sheng, Lu Wang, Fangkai Yang, Bo Qiao, Hang Dong, Xiangfeng Wang, Bo Jin, Jun Wang, Si Qin, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

To address these two limitations, this paper formulates oversubscription for the cloud as a chance-constrained optimization problem and proposes an effective Chance Constrained Multi-Agent Reinforcement Learning (C2MARL) method to solve this problem.

Multi-agent Reinforcement Learning reinforcement-learning +1

Towards Robust Numerical Question Answering: Diagnosing Numerical Capabilities of NLP Systems

no code implementations • 14 Nov 2022 • Jialiang Xu, Mengyu Zhou, Xinyi He, Shi Han, Dongmei Zhang

Numerical Question Answering is the task of answering questions that require numerical capabilities.

Data Augmentation Open-Ended Question Answering

FormLM: Recommending Creation Ideas for Online Forms by Modelling Semantic and Structural Information

no code implementations • 10 Nov 2022 • Yijia Shao, Mengyu Zhou, Yifan Zhong, Tao Wu, Hongwei Han, Shi Han, Gideon Huang, Dongmei Zhang

To assist form designers, in this work we present FormLM to model online forms (by enhancing pre-trained language model with form structural information) and recommend form creation ideas (including question / options recommendations and block type suggestion).

Language Modelling

Reflection of Thought: Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems

no code implementations • 11 Oct 2022 • Fan Zhou, Haoyu Dong, Qian Liu, Zhoujun Cheng, Shi Han, Dongmei Zhang

Numerical reasoning over natural language has been a long-standing goal for the research community.

Unveiling the Black Box of PLMs with Semantic Anchors: Towards Interpretable Neural Semantic Parsing

no code implementations • 4 Oct 2022 • Lunyiu Nie, Jiuding Sun, Yanlin Wang, Lun Du, Lei Hou, Juanzi Li, Shi Han, Dongmei Zhang, Jidong Zhai

The recent prevalence of pretrained language models (PLMs) has dramatically shifted the paradigm of semantic parsing, where the mapping from natural language utterances to structured logical forms is now formulated as a Seq2Seq task.

Hallucination Semantic Parsing +1

Make Heterophily Graphs Better Fit GNN: A Graph Rewiring Approach

no code implementations • 17 Sep 2022 • Wendong Bi, Lun Du, Qiang Fu, Yanlin Wang, Shi Han, Dongmei Zhang

Graph Neural Networks (GNNs) are popular machine learning methods for modeling graph data.

Ranked #5 on Node Classification on Squirrel

Node Classification

Enhanced Fairness Testing via Generating Effective Initial Individual Discriminatory Instances

no code implementations • 17 Sep 2022 • Minghua Ma, Zhao Tian, Max Hort, Federica Sarro, Hongyu Zhang, Qingwei Lin, Dongmei Zhang

In this paper, we propose an approach for the selection of the initial seeds to generate IDIs for fairness testing.

Decision Making Fairness

AnaMeta: A Table Understanding Dataset of Field Metadata Knowledge Shared by Multi-dimensional Data Analysis Tasks

no code implementations • 2 Sep 2022 • Xinyi He, Mengyu Zhou, Mingjie Zhou, Jialiang Xu, Xiao Lv, Tianle Li, Yijia Shao, Shi Han, Zejian Yuan, Dongmei Zhang

Tabular data analysis is performed every day across various domains.

Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima

no code implementations • 25 Aug 2022 • Hengyu Liu, Qiang Fu, Lun Du, Tiancheng Zhang, Ge Yu, Shi Han, Dongmei Zhang

Learning rate is one of the most important hyper-parameters that has a significant influence on neural network training.

LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction

no code implementations • CVPR 2023 • Zhaoyun Jiang, Jiaqi Guo, Shizhao Sun, Huayu Deng, Zhongkai Wu, Vuksan Mijovic, Zijiang James Yang, Jian-Guang Lou, Dongmei Zhang

First, to flexibly handle diverse constraints, we propose a constraint serialization scheme, which represents different user constraints as sequences of tokens with a predefined format.

MM-GNN: Mix-Moment Graph Neural Network towards Modeling Neighborhood Feature Distribution

1 code implementation • 15 Aug 2022 • Wendong Bi, Lun Du, Qiang Fu, Yanlin Wang, Shi Han, Dongmei Zhang

Graph Neural Networks (GNNs) have shown expressive performance on graph representation learning by aggregating information from neighbors.

Graph Representation Learning

ASTA: Learning Analytical Semantics over Tables for Intelligent Data Analysis and Visualization

no code implementations • 1 Aug 2022 • Lingbo Li, Tianle Li, Xinyi He, Mengyu Zhou, Shi Han, Dongmei Zhang

The ASTA framework extracts data features by designing signatures based on expert knowledge, and enables data referencing at field- (chart) or cell-level (conditional formatting) with pre-trained models.

XInsight: eXplainable Data Analysis Through The Lens of Causality

no code implementations • 26 Jul 2022 • Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, Dongmei Zhang

XInsight is a three-module, end-to-end pipeline designed to extract causal graphs, translate causal primitives into XDA semantics, and quantify the quantitative contribution of each explanation to a data fact.

Decision Making

Solving the Batch Stochastic Bin Packing Problem in Cloud: A Chance-constrained Optimization Approach

no code implementations • 20 Jul 2022 • Jie Yan, Yunlei Lu, Liting Chen, Si Qin, Yixin Fang, Qingwei Lin, Thomas Moscibroda, Saravan Rajmohan, Dongmei Zhang

This paper investigates a critical resource allocation problem in the first party cloud: scheduling containers to machines.

Scheduling

TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data

1 code implementation • 25 May 2022 • Fan Zhou, Mengkang Hu, Haoyu Dong, Zhoujun Cheng, Shi Han, Dongmei Zhang

Existing auto-regressive pre-trained language models (PLMs) like T5 and BART have been well applied to table question answering by UNIFIEDSKG and TAPEX, respectively, and have demonstrated state-of-the-art results on multiple benchmarks.

Question Answering

PLOG: Table-to-Logic Pretraining for Logical Table-to-Text Generation

1 code implementation • 25 May 2022 • Ao Liu, Haoyu Dong, Naoaki Okazaki, Shi Han, Dongmei Zhang

However, directly learning the logical inference knowledge from table-text pairs is very difficult for neural models because of the ambiguity of natural language and the scarcity of parallel data.

Table-to-Text Generation

CoCoSoDa: Effective Contrastive Learning for Code Search

no code implementations • 7 Apr 2022 • Ensheng Shi, Yanlin Wang, Wenchao Gu, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun

However, there is still a lot of room for improvement in using contrastive learning for code search.

Code Search Contrastive Learning +2

Accelerating Code Search with Deep Hashing and Code Classification

no code implementations • ACL 2022 • Wenchao Gu, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Michael R. Lyu

Code search is the task of retrieving reusable code snippets from a source code corpus based on natural language queries.

Classification Code Classification +2

OneLabeler: A Flexible System for Building Data Labeling Tools

1 code implementation • 27 Mar 2022 • Yu Zhang, Yun Wang, Haidong Zhang, Bin Zhu, Siming Chen, Dongmei Zhang

In this paper, we propose a conceptual framework for data labeling and OneLabeler based on the conceptual framework to support easy building of labeling tools for diverse usage scenarios.

RACE: Retrieval-Augmented Commit Message Generation

2 code implementations • 5 Mar 2022 • Ensheng Shi, Yanlin Wang, Wei Tao, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun

Furthermore, RACE can boost the performance of existing Seq2Seq models in commit message generation.

Information Retrieval Retrieval +2

Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks

no code implementations • 24 Jan 2022 • Haoyu Dong, Zhoujun Cheng, Xinyi He, Mengyu Zhou, Anda Zhou, Fan Zhou, Ao Liu, Shi Han, Dongmei Zhang

Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs, and various other document types, a flurry of table pre-training frameworks have been proposed following the success of text and images, and they have achieved new state-of-the-art results on various tasks such as table question answering, table type recognition, column relation classification, table search, formula prediction, etc.

Denoising Question Answering +2

Source Free Unsupervised Graph Domain Adaptation

1 code implementation • 2 Dec 2021 • Haitao Mao, Lun Du, Yujia Zheng, Qiang Fu, Zelin Li, Xu Chen, Shi Han, Dongmei Zhang

To address the non-trivial adaptation challenges in this practical scenario, we propose a model-agnostic algorithm called SOGA for domain adaptation to fully exploit the discriminative ability of the source model while preserving the consistency of structural proximity on the target graph.

Domain Adaptation Node Classification

A Surrogate Objective Framework for Prediction+Programming with Soft Constraints

no code implementations • NeurIPS 2021 • Kai Yan, Jie Yan, Chuan Luo, Liting Chen, Qingwei Lin, Dongmei Zhang

Prediction+optimization is a common real-world paradigm where we have to predict problem parameters before solving the optimization problem.

Portfolio Optimization

Neuron with Steady Response Leads to Better Generalization

no code implementations • 30 Nov 2021 • Qiang Fu, Lun Du, Haitao Mao, Xu Chen, Wei Fang, Shi Han, Dongmei Zhang

Based on the analysis results, we articulate the Neuron Steadiness Hypothesis: the neuron with similar responses to instances of the same class leads to better generalization.

Inductive Bias

A Surrogate Objective Framework for Prediction+Optimization with Soft Constraints

1 code implementation • 22 Nov 2021 • Kai Yan, Jie Yan, Chuan Luo, Liting Chen, Qingwei Lin, Dongmei Zhang

Prediction+optimization is a common real-world paradigm where we have to predict problem parameters before solving the optimization problem.

Portfolio Optimization

A Unified and Fast Interpretable Model for Predictive Analytics

no code implementations • 16 Nov 2021 • Yuanyuan Jiang, Rui Ding, Tianchi Qiao, Yunan Zhu, Shi Han, Dongmei Zhang

Predictive analytics involves humans, so an interpretable machine learning model is preferred.

Decision Making

GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily

1 code implementation • 29 Oct 2021 • Lun Du, Xiaozhou Shi, Qiang Fu, Xiaojun Ma, Hengyu Liu, Shi Han, Dongmei Zhang

For node-level tasks, GNNs have strong power to model the homophily property of graphs (i.e., connected nodes are more similar) while their ability to capture the heterophily property is often doubtful.

Graph Attention

ML4C: Seeing Causality Through Latent Vicinity

1 code implementation • NeurIPS 2021 • Haoyue Dai, Rui Ding, Yuanyuan Jiang, Shi Han, Dongmei Zhang

Starting from seeing that SCL is not better than random guessing if the learning target is non-identifiable a priori, we propose a two-phase paradigm for SCL by explicitly considering structure identifiability.

Paper
Code

FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining

1 code implementation • ACL 2022 • Zhoujun Cheng, Haoyu Dong, Ran Jia, Pengfei Wu, Shi Han, Fan Cheng, Dongmei Zhang

In this paper, we find that the spreadsheet formula, which performs calculations on numerical values in tables, is naturally a strong supervision of numerical reasoning.

Paper
Code

HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation

1 code implementation • ACL 2022 • Zhoujun Cheng, Haoyu Dong, Zhiruo Wang, Ran Jia, Jiaqi Guo, Yan Gao, Shi Han, Jian-Guang Lou, Dongmei Zhang

HiTab provides 10,686 QA pairs and descriptive sentences with well-annotated quantity and entity alignment on 3,597 tables with broad coverage of table hierarchies and numerical reasoning types.

Descriptive Entity Alignment +2

Paper
Code

Neuron Campaign for Initialization Guided by Information Bottleneck Theory

1 code implementation • 14 Aug 2021 • Haitao Mao, Xu Chen, Qiang Fu, Lun Du, Shi Han, Dongmei Zhang

Initialization plays a critical role in the training of deep neural networks (DNN).

Paper
Code

On the Evaluation of Neural Code Summarization

1 code implementation • 15 Jul 2021 • Ensheng Shi, Yanlin Wang, Lun Du, Junjie Chen, Shi Han, Hongyu Zhang, Dongmei Zhang, Hongbin Sun

To achieve a profound understanding of how far we are from solving this problem and provide suggestions to future research, in this paper, we conduct a systematic and in-depth analysis of 5 state-of-the-art neural code summarization models on 6 widely used BLEU variants, 4 pre-processing operations and their combinations, and 3 widely used datasets.

Code Summarization Source Code Summarization

Paper
Code

Learning Algebraic Recombination for Compositional Generalization

2 code implementations • Findings (ACL) 2021 • Chenyao Liu, Shengnan An, Zeqi Lin, Qian Liu, Bei Chen, Jian-Guang Lou, Lijie Wen, Nanning Zheng, Dongmei Zhang

In this paper, we propose LeAR, an end-to-end neural model to learn algebraic recombination for compositional generalization.

Ranked #2 on Semantic Parsing on CFQ

Semantic Parsing

359

Paper
Code

On the Evaluation of Commit Message Generation Models: An Experimental Study

1 code implementation • 12 Jul 2021 • Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Shi Han, Hongyu Zhang, Dongmei Zhang, Wenqiang Zhang

We find that: (1) Different variants of the BLEU metric are used in previous works, which affects the evaluation and understanding of existing methods.

Retrieval

Paper
Code

Is a Single Model Enough? MuCoS: A Multi-Model Ensemble Learning for Semantic Code Search

1 code implementation • 10 Jul 2021 • Lun Du, Xiaozhou Shi, Yanlin Wang, Ensheng Shi, Shi Han, Dongmei Zhang

On the other hand, as a specific query may focus on one or several perspectives, it is difficult for a single query representation module to represent different user intents.

Code Search Data Augmentation +1

Paper
Code

TableSense: Spreadsheet Table Detection with Convolutional Neural Networks

1 code implementation • 25 Jun 2021 • Haoyu Dong, Shijie Liu, Shi Han, Zhouyu Fu, Dongmei Zhang

Spreadsheet table detection is the task of detecting all tables on a given sheet and locating their respective ranges.

Active Learning Boundary Detection +1

Paper
Code

TabularNet: A Neural Network Architecture for Understanding Semantic Structures of Tabular Data

no code implementations • 6 Jun 2021 • Lun Du, Fei Gao, Xu Chen, Ran Jia, Junshan Wang, Jiang Zhang, Shi Han, Dongmei Zhang

To simultaneously extract spatial and relational information from tables, we propose a novel neural network architecture, TabularNet.

Graph Construction

Paper
Add Code

Understanding and Improvement of Adversarial Training for Network Embedding from an Optimization Perspective

no code implementations • 17 May 2021 • Lun Du, Xu Chen, Fei Gao, Kunqing Xie, Shi Han, Dongmei Zhang

Network embedding aims to learn a function that maps nodes to Euclidean space, which contributes to multiple learning and analysis tasks on networks.

Link Prediction Network Embedding +1

Paper
Add Code

Iterative Utterance Segmentation for Neural Semantic Parsing

no code implementations • 13 Dec 2020 • Yinuo Guo, Zeqi Lin, Jian-Guang Lou, Dongmei Zhang

Experiments on Geo, ComplexWebQuestions, and Formulas show that our framework consistently improves the performance of neural semantic parsers in different domains.

Segmentation Semantic Parsing

Paper
Add Code

Revisiting Iterative Back-Translation from the Perspective of Compositional Generalization

no code implementations • 8 Dec 2020 • Yinuo Guo, Hualei Zhu, Zeqi Lin, Bei Chen, Jian-Guang Lou, Dongmei Zhang

Human intelligence exhibits compositional generalization (i.e., the capacity to understand and produce unseen combinations of seen components), but current neural seq2seq models lack such ability.

Translation

Paper
Add Code

"What Do You Mean by That?" A Parser-Independent Interactive Approach for Enhancing Text-to-SQL

1 code implementation • 9 Nov 2020 • Yuntao Li, Bei Chen, Qian Liu, Yan Gao, Jian-Guang Lou, Yan Zhang, Dongmei Zhang

In Natural Language Interfaces to Databases systems, the text-to-SQL technique allows users to query databases by using natural language questions.

Text-To-SQL

359

Paper
Code

TUTA: Tree-based Transformers for Generally Structured Table Pre-training

1 code implementation • 21 Oct 2020 • Zhiruo Wang, Haoyu Dong, Ran Jia, Jia Li, Zhiyi Fu, Shi Han, Dongmei Zhang

First, we devise a unified tree-based structure, called a bi-dimensional coordinate tree, to describe both the spatial and hierarchical information of generally structured tables.

Paper
Code

Hierarchical Poset Decoding for Compositional Generalization in Language

no code implementations • NeurIPS 2020 • Yinuo Guo, Zeqi Lin, Jian-Guang Lou, Dongmei Zhang

We formalize human language understanding as a structured prediction task where the output is a partially ordered set (poset).

Ranked #4 on Semantic Parsing on CFQ

Question Answering Semantic Parsing +1

Paper
Add Code

Incomplete Utterance Rewriting as Semantic Segmentation

1 code implementation • EMNLP 2020 • Qian Liu, Bei Chen, Jian-Guang Lou, Bin Zhou, Dongmei Zhang

In recent years, the task of incomplete utterance rewriting has attracted considerable attention.

Ranked #1 on Dialogue Rewriting on Rewrite

Context Query Reformulation Dialogue Rewriting +2

359

Paper
Code

Table2Charts: Recommending Charts by Learning Shared Table Representations

1 code implementation • 24 Aug 2020 • Mengyu Zhou, Qingtao Li, Xinyi He, Yuejiang Li, Yibo Liu, Wei Ji, Shi Han, Yining Chen, Daxin Jiang, Dongmei Zhang

It is common for people to create different types of charts to explore a multi-dimensional dataset (table).

Q-Learning Recommendation Systems

Paper
Code

Compositional Generalization by Learning Analytical Expressions

1 code implementation • NeurIPS 2020 • Qian Liu, Shengnan An, Jian-Guang Lou, Bei Chen, Zeqi Lin, Yan Gao, Bin Zhou, Nanning Zheng, Dongmei Zhang

Compositional generalization is a basic and essential intellective capability of human beings, which allows us to recombine known parts readily.

Hierarchical Reinforcement Learning

359

Paper
Code

You Impress Me: Dialogue Generation via Mutual Persona Perception

1 code implementation • ACL 2020 • Qian Liu, Yihong Chen, Bei Chen, Jian-Guang Lou, Zixuan Chen, Bin Zhou, Dongmei Zhang

Despite continuing efforts to improve the engagingness and consistency of chit-chat dialogue systems, the majority of current work simply focuses on mimicking human-like responses, leaving the modeling of understanding between interlocutors understudied.

Ranked #2 on Dialogue Generation on Persona-Chat (using extra training data)

Dialogue Generation

307

Paper
Code

How Far are We from Effective Context Modeling? An Exploratory Study on Semantic Parsing in Context

1 code implementation • 3 Feb 2020 • Qian Liu, Bei Chen, Jiaqi Guo, Jian-Guang Lou, Bin Zhou, Dongmei Zhang

Semantic parsing in context has recently received considerable attention; it is challenging because of the complex contextual phenomena involved.

Semantic Parsing

359

Paper
Code

Data-Anonymous Encoding for Text-to-SQL Generation

no code implementations • IJCNLP 2019 • Zhen Dong, Shizhao Sun, Hongzhi Liu, Jian-Guang Lou, Dongmei Zhang

On text-to-SQL generation, the input utterance usually contains lots of tokens that are related to column names or cells in the table, called table-related tokens.

Text-To-SQL

Paper
Add Code

A Hybrid Semantic Parsing Approach for Tabular Data Analysis

no code implementations • 23 Oct 2019 • Yan Gao, Jian-Guang Lou, Dongmei Zhang

This paper presents a novel approach to translating natural language questions to SQL queries for given tables, which meets three requirements as a real-world data analysis application: cross-domain, multilingualism and enabling quick-start.

Semantic Parsing

Paper
Add Code

A Split-and-Recombine Approach for Follow-up Query Analysis

1 code implementation • IJCNLP 2019 • Qian Liu, Bei Chen, Haoyan Liu, Lei Fang, Jian-Guang Lou, Bin Zhou, Dongmei Zhang

To leverage the advances in context-independent semantic parsing, we propose to perform follow-up query analysis, aiming to restate context-dependent natural language queries with contextual information.

Natural Language Queries Semantic Parsing

Paper
Code

Semantic Structure Extraction for Spreadsheet Tables with a Multi-task Learning Architecture

no code implementations • NeurIPS Workshop Document_Intelligen 2019 • Haoyu Dong, Shijie Liu, Zhouyu Fu, Shi Han, Dongmei Zhang

To learn spatial correlations and capture semantics on spreadsheets, we have developed a novel learning-based framework for spreadsheet semantic structure extraction.

Language Modelling Multi-Task Learning

Paper
Add Code

Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation

5 code implementations • ACL 2019 • Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, Dongmei Zhang

We present a neural approach called IRNet for complex and cross-domain Text-to-SQL.

Text-To-SQL

256

Paper
Code

FANDA: A Novel Approach to Perform Follow-up Query Analysis

1 code implementation • 24 Jan 2019 • Qian Liu, Bei Chen, Jian-Guang Lou, Ge Jin, Dongmei Zhang

Natural Language Interfaces to Databases (NLIDB) allow users to search databases using natural language instead of SQL-like query languages.

Paper
Code

SemRegex: A Semantics-Based Approach for Generating Regular Expressions from Natural Language Specifications

no code implementations • EMNLP 2018 • Zexuan Zhong, Jiaqi Guo, Wei Yang, Jian Peng, Tao Xie, Jian-Guang Lou, Ting Liu, Dongmei Zhang

Recent research proposes syntax-based approaches to address the problem of generating programs from natural language specifications.

Program Synthesis

Paper
Add Code

DeepAM: Migrate APIs with Multi-modal Sequence to Sequence Learning

no code implementations • 25 Apr 2017 • Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim

They rely on the sparse availability of bilingual projects, thus producing a limited number of API mappings.

Paper
Add Code

Deep API Learning

no code implementations • 27 May 2016 • Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, Sunghun Kim

We propose DeepAPI, a deep learning based approach to generate API usage sequences for a given natural language query.

Information Retrieval Language Modelling +2

Paper
Add Code
