Search Results for author: Shi Han

Found 52 papers, 23 papers with code

CONLINE: Complex Code Generation and Refinement with Online Searching and Correctness Testing

no code implementations • 20 Mar 2024 • Xinyi He, Jiaru Zou, Yun Lin, Mengyu Zhou, Shi Han, Zejian Yuan, Dongmei Zhang

Large Language Models (LLMs) have revolutionized code generation ability by converting natural language descriptions into executable code.

Code Generation Information Retrieval +1

TAROT: A Hierarchical Framework with Multitask Co-Pretraining on Semi-Structured Data towards Effective Person-Job Fit

no code implementations • 15 Jan 2024 • Yihan Cao, Xu Chen, Lun Du, Hao Chen, Qiang Fu, Shi Han, Yushu Du, Yanbin Kang, Guangming Lu, Zi Li

Person-job fit is an essential part of online recruitment platforms in serving various downstream applications like Job Search and Candidate Recommendation.

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

no code implementations • 21 Dec 2023 • Xinyi He, Mengyu Zhou, Xinrun Xu, Xiaojun Ma, Rui Ding, Lun Du, Yan Gao, Ran Jia, Xu Chen, Shi Han, Zejian Yuan, Dongmei Zhang

We evaluate five state-of-the-art models using three different metrics, and the results show that our benchmark presents considerable challenges in the field of tabular data analysis, paving the way for more advanced research opportunities.

Question Answering

Professional Network Matters: Connections Empower Person-Job Fit

no code implementations • 19 Dec 2023 • Hao Chen, Lun Du, Yuxuan Lu, Qiang Fu, Xu Chen, Shi Han, Yanbin Kang, Guangming Lu, Zi Li

Online recruitment platforms typically employ Person-Job Fit models in their core services to automatically match suitable job seekers with appropriate job positions.

TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning

no code implementations • 14 Dec 2023 • Yuan Sui, Jiaru Zou, Mengyu Zhou, Xinyi He, Lun Du, Shi Han, Dongmei Zhang

Table-based reasoning has shown remarkable progress in combining deep models with discrete reasoning, which requires reasoning over both free-form natural language (NL) questions and semi-structured tabular data.

Language Modelling Large Language Model +2
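The sampling-and-packing pipeline named in the title can be illustrated with a toy sketch: keep only the rows most relevant to a question, then serialize the sub-table into a prompt-friendly string. The helper names and the naive token-overlap ranking below are this page's illustration, not TAP4LLM's actual API.

```python
def sample_rows(rows, question, k=2):
    """Rank rows by naive token overlap with the question (illustrative)."""
    q_tokens = set(question.lower().replace("?", " ").split())
    overlap = lambda r: -len(q_tokens & {str(v).lower() for v in r.values()})
    return sorted(rows, key=overlap)[:k]

def pack_table(header, rows):
    """Serialize a sub-table as a markdown block for an LLM prompt."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(str(r[h]) for h in header) + " |" for r in rows]
    return "\n".join(lines)

rows = [{"city": "Paris", "pop": 2.1}, {"city": "Lyon", "pop": 0.5}]
prompt = pack_table(["city", "pop"],
                    sample_rows(rows, "population of Paris?", k=1))
```

A real table provider would use learned relevance and richer augmentation, but the shape of the pipeline (sample, then pack into a serialized prompt) is the same.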

Text-to-Image Generation for Abstract Concepts

no code implementations • 26 Sep 2023 • Jiayi Liao, Xu Chen, Qiang Fu, Lun Du, Xiangnan He, Xiang Wang, Shi Han, Dongmei Zhang

Recent years have witnessed the substantial progress of large-scale models across various domains, such as natural language processing and computer vision, facilitating the expression of concrete concepts.

Text-to-Image Generation

SoTaNa: The Open-Source Software Development Assistant

1 code implementation • 25 Aug 2023 • Ensheng Shi, Fengji Zhang, Yanlin Wang, Bei Chen, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun

To meet the demands of this dynamic field, there is a growing need for an effective software development assistant.

Code Summarization

On Manipulating Signals of User-Item Graph: A Jacobi Polynomial-based Graph Collaborative Filtering

1 code implementation • 6 Jun 2023 • Jiayan Guo, Lun Du, Xu Chen, Xiaojun Ma, Qiang Fu, Shi Han, Dongmei Zhang, Yan Zhang

Graph CF has attracted more and more attention in recent years due to its effectiveness in leveraging high-order information in the user-item bipartite graph for better recommendations.

Collaborative Filtering Recommendation Systems

Enabling and Analyzing How to Efficiently Extract Information from Hybrid Long Documents with LLMs

no code implementations • 24 May 2023 • Chongjian Yue, Xinrun Xu, Xiaojun Ma, Lun Du, Hengyu Liu, Zhiming Ding, Yanbing Jiang, Shi Han, Dongmei Zhang

We propose an Automated Financial Information Extraction (AFIE) framework that enhances LLMs' ability to comprehend and extract information from financial reports.

Retrieval

GPT4Graph: Can Large Language Models Understand Graph Structured Data? An Empirical Evaluation and Benchmarking

no code implementations • 24 May 2023 • Jiayan Guo, Lun Du, Hengyu Liu, Mengyu Zhou, Xinyi He, Shi Han

In this study, we conduct an extensive investigation to assess the proficiency of LLMs in comprehending graph data, employing a diverse range of structural and semantic-related tasks.

Benchmarking Graph Mining +1

Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

1 code implementation • 22 May 2023 • Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, Dongmei Zhang

Although tables can be used as input to LLMs with serialization, there is a lack of comprehensive studies that examine whether LLMs can truly comprehend such data.

Retrieval

Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond

1 code implementation • 11 Apr 2023 • Ensheng Shi, Yanlin Wang, Hongyu Zhang, Lun Du, Shi Han, Dongmei Zhang, Hongbin Sun

Our experimental study shows that (1) lexical, syntactic and structural properties of source code are encoded in the lower, intermediate, and higher layers, respectively, while the semantic property spans across the entire model.

Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System

no code implementations • 2 Apr 2023 • Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, Dongmei Zhang

In brief, an IQuery is an abstraction and automation of data analysis operations, which mimics the approach of data analysts and simplifies the exploration process for users.

Language Modelling Large Language Model
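The notion of an IQuery as an abstraction of a data-analysis operation can be sketched as a small data structure. The field names and the `run` method below are hypothetical, chosen only to make the abstraction concrete; they are not InsightPilot's actual interface.

```python
from dataclasses import dataclass
from typing import Callable
import statistics

@dataclass(frozen=True)
class IQuery:
    """A data-analysis operation abstracted as (intent, target, operator)."""
    intent: str                      # e.g. "summarize" or "compare"
    column: str                      # the column the operation targets
    op: Callable                     # the aggregate to apply

    def run(self, table: dict) -> float:
        """Execute the abstracted operation against a column-oriented table."""
        return self.op(table[self.column])

table = {"sales": [10.0, 30.0, 20.0]}
q = IQuery(intent="summarize", column="sales", op=statistics.mean)
print(q.run(table))  # 20.0
```

Composing a sequence of such objects is what lets a system mimic an analyst's step-by-step exploration rather than issuing one monolithic query.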

Robust Mid-Pass Filtering Graph Convolutional Networks

1 code implementation • 16 Feb 2023 • Jincheng Huang, Lun Du, Xu Chen, Qiang Fu, Shi Han, Dongmei Zhang

Theoretical analyses guarantee the robustness of signals through the mid-pass filter, and we also shed light on the properties of different frequency signals under adversarial attacks.

Adversarial Attack Node Classification
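A generic spectral mid-pass filter can be sketched as follows. The response g(lam) = lam * (2 - lam), which peaks at lam = 1 and vanishes at both ends of the normalized Laplacian spectrum, is a stand-in assumption for illustration, not the paper's exact filter.

```python
import numpy as np

def mid_pass_filter(adj: np.ndarray, signal: np.ndarray) -> np.ndarray:
    """Apply a spectral mid-pass response on the normalized Laplacian:
    suppress both the lowest and highest graph frequencies."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt
    lams, vecs = np.linalg.eigh(lap)
    response = lams * (2.0 - lams)   # zero at lam = 0 and lam = 2, max at 1
    return vecs @ np.diag(response) @ vecs.T @ signal
```

On a regular graph, a constant signal is the lowest-frequency component (lam = 0), so this filter maps it to (approximately) zero, which is exactly the behavior that makes mid-pass filtering robust to low-frequency perturbations.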

Homophily-oriented Heterogeneous Graph Rewiring

no code implementations • 13 Feb 2023 • Jiayan Guo, Lun Du, Wendong Bi, Qiang Fu, Xiaojun Ma, Xu Chen, Shi Han, Dongmei Zhang, Yan Zhang

To this end, we propose HDHGR, a homophily-oriented deep heterogeneous graph rewiring approach that modifies the HG structure to improve the performance of HGNNs.

LUNA: Language Understanding with Number Augmentations on Transformers via Number Plugins and Pre-training

1 code implementation • 6 Dec 2022 • Hongwei Han, Jialiang Xu, Mengyu Zhou, Yijia Shao, Shi Han, Dongmei Zhang

But current approaches to rich-number tasks with transformer-based language models abandon or lose some of the numeracy information (e.g., by breaking numbers into sub-word tokens), which leads to many number-related errors.
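The failure mode described above is easy to reproduce with a toy greedy tokenizer. The vocabulary here is hypothetical and chosen only to show how a number's pieces stop encoding its magnitude; real BPE vocabularies fragment numbers in the same spirit.

```python
# A toy greedy subword tokenizer with a hypothetical vocabulary.
VOCAB = {"12", "3", "4.", "56"}

def greedy_subword(text, max_piece=2):
    """Split text into the longest vocabulary pieces, left to right."""
    pieces, i = [], 0
    while i < len(text):
        for size in range(max_piece, 0, -1):
            piece = text[i:i + size]
            if piece in VOCAB:
                pieces.append(piece)
                i += size
                break
        else:
            pieces.append(text[i])  # unknown character falls back to itself
            i += 1
    return pieces

# The magnitude 1234.56 is invisible in the token sequence:
print(greedy_subword("1234.56"))  # ['12', '3', '4.', '56']
```

None of the four tokens carries the value 1234.56, which is why a model seeing only these pieces makes number-related errors unless numeracy is reinjected, e.g. via dedicated number plugins.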

FormLM: Recommending Creation Ideas for Online Forms by Modelling Semantic and Structural Information

no code implementations • 10 Nov 2022 • Yijia Shao, Mengyu Zhou, Yifan Zhong, Tao Wu, Hongwei Han, Shi Han, Gideon Huang, Dongmei Zhang

To assist form designers, in this work we present FormLM to model online forms (by enhancing a pre-trained language model with form structural information) and recommend form creation ideas (including question/option recommendations and block type suggestions).

Language Modelling

DIGMN: Dynamic Intent Guided Meta Network for Differentiated User Engagement Forecasting in Online Professional Social Platforms

no code implementations • 22 Oct 2022 • Feifan Li, Lun Du, Qiang Fu, Shi Han, Yushu Du, Guangming Lu, Zi Li

Furthermore, based on the dynamic user intent representations, we propose a meta predictor to perform differentiated user engagement forecasting.

Unveiling the Black Box of PLMs with Semantic Anchors: Towards Interpretable Neural Semantic Parsing

no code implementations • 4 Oct 2022 • Lunyiu Nie, Jiuding Sun, Yanlin Wang, Lun Du, Lei Hou, Juanzi Li, Shi Han, Dongmei Zhang, Jidong Zhai

The recent prevalence of pretrained language models (PLMs) has dramatically shifted the paradigm of semantic parsing, where the mapping from natural language utterances to structured logical forms is now formulated as a Seq2Seq task.

Hallucination Semantic Parsing +1

Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima

no code implementations • 25 Aug 2022 • Hengyu Liu, Qiang Fu, Lun Du, Tiancheng Zhang, Ge Yu, Shi Han, Dongmei Zhang

Learning rate is one of the most important hyper-parameters that has a significant influence on neural network training.

MM-GNN: Mix-Moment Graph Neural Network towards Modeling Neighborhood Feature Distribution

1 code implementation • 15 Aug 2022 • Wendong Bi, Lun Du, Qiang Fu, Yanlin Wang, Shi Han, Dongmei Zhang

Graph Neural Networks (GNNs) have shown expressive performance on graph representation learning by aggregating information from neighbors.

Graph Representation Learning

ASTA: Learning Analytical Semantics over Tables for Intelligent Data Analysis and Visualization

no code implementations • 1 Aug 2022 • Lingbo Li, Tianle Li, Xinyi He, Mengyu Zhou, Shi Han, Dongmei Zhang

The ASTA framework extracts data features by designing signatures based on expert knowledge, and enables data referencing at the field level (charts) or cell level (conditional formatting) with pre-trained models.

XInsight: eXplainable Data Analysis Through The Lens of Causality

no code implementations • 26 Jul 2022 • Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, Dongmei Zhang

XInsight is a three-module, end-to-end pipeline designed to extract causal graphs, translate causal primitives into XDA semantics, and quantify the contribution of each explanation to a data fact.

Decision Making

PLOG: Table-to-Logic Pretraining for Logical Table-to-Text Generation

1 code implementation • 25 May 2022 • Ao Liu, Haoyu Dong, Naoaki Okazaki, Shi Han, Dongmei Zhang

However, directly learning the logical inference knowledge from table-text pairs is very difficult for neural models because of the ambiguity of natural language and the scarcity of parallel data.

Table-to-Text Generation

TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data

1 code implementation • 25 May 2022 • Fan Zhou, Mengkang Hu, Haoyu Dong, Zhoujun Cheng, Shi Han, Dongmei Zhang

Existing auto-regressive pre-trained language models (PLMs) like T5 and BART, have been well applied to table question answering by UNIFIEDSKG and TAPEX, respectively, and demonstrated state-of-the-art results on multiple benchmarks.

Question Answering
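The pre-computation idea in the title can be sketched as a tiny data cube: aggregate a value column over every subset of grouping keys up front, so a question-answering model can look an aggregate up instead of computing it at answer time. The function below is this page's illustration, not TaCube's code.

```python
from itertools import combinations

def build_cube(rows, group_keys, value_key):
    """Pre-compute sum/count/mean for every subset of grouping keys."""
    acc = {}
    for r in range(len(group_keys) + 1):
        for keys in combinations(group_keys, r):
            for row in rows:
                cell = (keys, tuple(row[k] for k in keys))
                s, c = acc.get(cell, (0, 0))
                acc[cell] = (s + row[value_key], c + 1)
    return {cell: {"sum": s, "count": c, "mean": s / c}
            for cell, (s, c) in acc.items()}

rows = [
    {"year": 2020, "region": "EU", "sales": 10},
    {"year": 2020, "region": "US", "sales": 30},
    {"year": 2021, "region": "EU", "sales": 20},
]
cube = build_cube(rows, ["year", "region"], "sales")
# Total 2020 sales is now a lookup, not an on-the-fly computation:
print(cube[(("year",), (2020,))])  # {'sum': 40, 'count': 2, 'mean': 20.0}
```

Offloading arithmetic to such pre-computed cells is attractive precisely because auto-regressive PLMs are unreliable at multi-step numerical reasoning.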

Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks

no code implementations • 24 Jan 2022 • Haoyu Dong, Zhoujun Cheng, Xinyi He, Mengyu Zhou, Anda Zhou, Fan Zhou, Ao Liu, Shi Han, Dongmei Zhang

Since a vast number of tables can easily be collected from web pages, spreadsheets, PDFs, and various other document types, a flurry of table pre-training frameworks have been proposed following the success of text and image pre-training, and they have achieved new state-of-the-art results on various tasks such as table question answering, table type recognition, column relation classification, table search, and formula prediction.

Denoising Question Answering +2

Source Free Unsupervised Graph Domain Adaptation

1 code implementation • 2 Dec 2021 • Haitao Mao, Lun Du, Yujia Zheng, Qiang Fu, Zelin Li, Xu Chen, Shi Han, Dongmei Zhang

To address the non-trivial adaptation challenges in this practical scenario, we propose a model-agnostic algorithm called SOGA for domain adaptation to fully exploit the discriminative ability of the source model while preserving the consistency of structural proximity on the target graph.

Domain Adaptation Node Classification

Neuron with Steady Response Leads to Better Generalization

no code implementations • 30 Nov 2021 • Qiang Fu, Lun Du, Haitao Mao, Xu Chen, Wei Fang, Shi Han, Dongmei Zhang

Based on the analysis results, we articulate the Neuron Steadiness Hypothesis: the neuron with similar responses to instances of the same class leads to better generalization.

Inductive Bias

A Unified and Fast Interpretable Model for Predictive Analytics

no code implementations • 16 Nov 2021 • Yuanyuan Jiang, Rui Ding, Tianchi Qiao, Yunan Zhu, Shi Han, Dongmei Zhang

Predictive analytics involves humans in the loop, so interpretable machine learning models are preferred.

Decision Making

GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily

1 code implementation • 29 Oct 2021 • Lun Du, Xiaozhou Shi, Qiang Fu, Xiaojun Ma, Hengyu Liu, Shi Han, Dongmei Zhang

For node-level tasks, GNNs have strong power to model the homophily property of graphs (i.e., connected nodes are more similar), while their ability to capture the heterophily property is often doubtful.

Graph Attention

ML4C: Seeing Causality Through Latent Vicinity

1 code implementation • NeurIPS 2021 • Haoyue Dai, Rui Ding, Yuanyuan Jiang, Shi Han, Dongmei Zhang

Starting from seeing that SCL is not better than random guessing if the learning target is non-identifiable a priori, we propose a two-phase paradigm for SCL by explicitly considering structure identifiability.

FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining

1 code implementation • ACL 2022 • Zhoujun Cheng, Haoyu Dong, Ran Jia, Pengfei Wu, Shi Han, Fan Cheng, Dongmei Zhang

In this paper, we find that the spreadsheet formula, which performs calculations on numerical values in tables, is naturally a strong supervision of numerical reasoning.

HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation

1 code implementation • ACL 2022 • Zhoujun Cheng, Haoyu Dong, Zhiruo Wang, Ran Jia, Jiaqi Guo, Yan Gao, Shi Han, Jian-Guang Lou, Dongmei Zhang

HiTab provides 10,686 QA pairs and descriptive sentences with well-annotated quantity and entity alignment on 3,597 tables with broad coverage of table hierarchies and numerical reasoning types.

Descriptive Entity Alignment +2

Neuron Campaign for Initialization Guided by Information Bottleneck Theory

1 code implementation • 14 Aug 2021 • Haitao Mao, Xu Chen, Qiang Fu, Lun Du, Shi Han, Dongmei Zhang

Initialization plays a critical role in the training of deep neural networks (DNN).

On the Evaluation of Neural Code Summarization

1 code implementation • 15 Jul 2021 • Ensheng Shi, Yanlin Wang, Lun Du, Junjie Chen, Shi Han, Hongyu Zhang, Dongmei Zhang, Hongbin Sun

To achieve a profound understanding of how far we are from solving this problem and provide suggestions to future research, in this paper, we conduct a systematic and in-depth analysis of 5 state-of-the-art neural code summarization models on 6 widely used BLEU variants, 4 pre-processing operations and their combinations, and 3 widely used datasets.

Code Summarization Source Code Summarization

On the Evaluation of Commit Message Generation Models: An Experimental Study

1 code implementation • 12 Jul 2021 • Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Shi Han, Hongyu Zhang, Dongmei Zhang, Wenqiang Zhang

We find that: (1) Different variants of the BLEU metric are used in previous works, which affects the evaluation and understanding of existing methods.

Retrieval

Is a Single Model Enough? MuCoS: A Multi-Model Ensemble Learning for Semantic Code Search

1 code implementation • 10 Jul 2021 • Lun Du, Xiaozhou Shi, Yanlin Wang, Ensheng Shi, Shi Han, Dongmei Zhang

On the other hand, as a specific query may focus on one or several perspectives, it is difficult for a single query representation module to represent different user intents.

Code Search Data Augmentation +1

TableSense: Spreadsheet Table Detection with Convolutional Neural Networks

1 code implementation • 25 Jun 2021 • Haoyu Dong, Shijie Liu, Shi Han, Zhouyu Fu, Dongmei Zhang

Spreadsheet table detection is the task of detecting all tables on a given sheet and locating their respective ranges.

Active Learning Boundary Detection +1
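As a baseline intuition for the task defined above, table detection can be approximated by flood-filling connected blocks of non-empty cells and returning their bounding boxes. This naive heuristic is only a stand-in for TableSense's learned CNN detector, which handles the messy layouts where such heuristics fail.

```python
def detect_tables(sheet):
    """Return (top, left, bottom, right) bounding boxes of connected
    non-empty regions, using 4-connectivity flood fill."""
    seen, boxes = set(), []
    n_rows, n_cols = len(sheet), len(sheet[0])
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if sheet[r0][c0] is None or (r0, c0) in seen:
                continue
            stack, cells = [(r0, c0)], []
            seen.add((r0, c0))
            while stack:                      # flood-fill one region
                r, c = stack.pop()
                cells.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and sheet[nr][nc] is not None
                            and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            rs = [r for r, _ in cells]
            cs = [c for _, c in cells]
            boxes.append((min(rs), min(cs), max(rs), max(cs)))
    return boxes

sheet = [["a", "b", None, None],
         ["1", "2", None, "x"],
         [None, None, None, "y"]]
print(detect_tables(sheet))  # [(0, 0, 1, 1), (1, 3, 2, 3)]
```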

TabularNet: A Neural Network Architecture for Understanding Semantic Structures of Tabular Data

no code implementations • 6 Jun 2021 • Lun Du, Fei Gao, Xu Chen, Ran Jia, Junshan Wang, Jiang Zhang, Shi Han, Dongmei Zhang

To simultaneously extract spatial and relational information from tables, we propose a novel neural network architecture, TabularNet.

graph construction

Understanding and Improvement of Adversarial Training for Network Embedding from an Optimization Perspective

no code implementations • 17 May 2021 • Lun Du, Xu Chen, Fei Gao, Kunqing Xie, Shi Han, Dongmei Zhang

Network Embedding aims to learn functions that map nodes into a Euclidean space, which contributes to multiple learning analysis tasks on networks.

Link Prediction Network Embedding +1

TUTA: Tree-based Transformers for Generally Structured Table Pre-training

1 code implementation • 21 Oct 2020 • Zhiruo Wang, Haoyu Dong, Ran Jia, Jia Li, Zhiyi Fu, Shi Han, Dongmei Zhang

First, we devise a unified tree-based structure, called a bi-dimensional coordinate tree, to describe both the spatial and hierarchical information of generally structured tables.
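The bi-dimensional coordinate tree can be illustrated with a minimal sketch that assigns a tree path to each header label. The nested-dict encoding and the function below are assumptions made for illustration, not TUTA's implementation.

```python
def assign_coords(header, path=()):
    """Return (coordinate, label) pairs, where each coordinate is the
    index path from the tree root down to the header cell."""
    out = []
    if isinstance(header, dict):        # an internal header with children
        for i, (label, children) in enumerate(header.items()):
            out.append((path + (i,), label))
            out.extend(assign_coords(children, path + (i,)))
    else:                               # a flat list of leaf headers
        for i, label in enumerate(header):
            out.append((path + (i,), label))
    return out

# "2020" under "Sales" gets (0, 0); "2020" under "Costs" gets (1, 0),
# so identical labels in different branches stay distinguishable.
coords = assign_coords({"Sales": ["2020", "2021"],
                        "Costs": ["2020", "2021"]})
```

Encoding each cell by such a path (one tree for rows, one for columns) is what lets a transformer see hierarchical position rather than a flat cell index.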

Semantic Structure Extraction for Spreadsheet Tables with a Multi-task Learning Architecture

no code implementations • NeurIPS Workshop on Document Intelligence 2019 • Haoyu Dong, Shijie Liu, Zhouyu Fu, Shi Han, Dongmei Zhang

To learn spatial correlations and capture semantics on spreadsheets, we have developed a novel learning-based framework for spreadsheet semantic structure extraction.

Language Modelling Multi-Task Learning
