Search Results for author: Shi Han

Found 52 papers, 23 papers with code

CONLINE: Complex Code Generation and Refinement with Online Searching and Correctness Testing

no code implementations • 20 Mar 2024 • Xinyi He, Jiaru Zou, Yun Lin, Mengyu Zhou, Shi Han, Zejian Yuan, Dongmei Zhang

Large Language Models (LLMs) have revolutionized code generation ability by converting natural language descriptions into executable code.

Code Generation Information Retrieval +1

TAROT: A Hierarchical Framework with Multitask Co-Pretraining on Semi-Structured Data towards Effective Person-Job Fit

no code implementations • 15 Jan 2024 • Yihan Cao, Xu Chen, Lun Du, Hao Chen, Qiang Fu, Shi Han, Yushu Du, Yanbin Kang, Guangming Lu, Zi Li

Person-job fit is an essential part of online recruitment platforms in serving various downstream applications like Job Search and Candidate Recommendation.

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries

no code implementations • 21 Dec 2023 • Xinyi He, Mengyu Zhou, Xinrun Xu, Xiaojun Ma, Rui Ding, Lun Du, Yan Gao, Ran Jia, Xu Chen, Shi Han, Zejian Yuan, Dongmei Zhang

We evaluate five state-of-the-art models using three different metrics, and the results show that our benchmark presents considerable challenges in the field of tabular data analysis, paving the way for more advanced research opportunities.

Question Answering

Professional Network Matters: Connections Empower Person-Job Fit

no code implementations • 19 Dec 2023 • Hao Chen, Lun Du, Yuxuan Lu, Qiang Fu, Xu Chen, Shi Han, Yanbin Kang, Guangming Lu, Zi Li

Online recruitment platforms typically employ Person-Job Fit models in their core services to automatically match suitable job seekers with appropriate job positions.

TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning

no code implementations • 14 Dec 2023 • Yuan Sui, Jiaru Zou, Mengyu Zhou, Xinyi He, Lun Du, Shi Han, Dongmei Zhang

Table-based reasoning has shown remarkable progress in combining deep models with discrete reasoning, which requires reasoning over both free-form natural language (NL) questions and semi-structured tabular data.

Language Modelling Large Language Model +2
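The sampling-and-packing pipeline named in the title can be illustrated with a toy sketch: keep only the rows most relevant to a question, then serialize the sub-table into a prompt-friendly string. The helper names and the naive token-overlap ranking below are this page's illustration, not TAP4LLM's actual API.

```python
def sample_rows(rows, question, k=2):
    """Rank rows by naive token overlap with the question (illustrative)."""
    q_tokens = set(question.lower().replace("?", " ").split())
    overlap = lambda r: -len(q_tokens & {str(v).lower() for v in r.values()})
    return sorted(rows, key=overlap)[:k]

def pack_table(header, rows):
    """Serialize a sub-table as a markdown block for an LLM prompt."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(str(r[h]) for h in header) + " |" for r in rows]
    return "\n".join(lines)

rows = [{"city": "Paris", "pop": 2.1}, {"city": "Lyon", "pop": 0.5}]
prompt = pack_table(["city", "pop"],
                    sample_rows(rows, "population of Paris?", k=1))
```

A real table provider would use learned relevance and richer augmentation, but the shape of the pipeline (sample, then pack into a serialized prompt) is the same.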

Text-to-Image Generation for Abstract Concepts

no code implementations • 26 Sep 2023 • Jiayi Liao, Xu Chen, Qiang Fu, Lun Du, Xiangnan He, Xiang Wang, Shi Han, Dongmei Zhang

Recent years have witnessed the substantial progress of large-scale models across various domains, such as natural language processing and computer vision, facilitating the expression of concrete concepts.

Text-to-Image Generation

SoTaNa: The Open-Source Software Development Assistant

1 code implementation • 25 Aug 2023 • Ensheng Shi, Fengji Zhang, Yanlin Wang, Bei Chen, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun

To meet the demands of this dynamic field, there is a growing need for an effective software development assistant.

Code Summarization

On Manipulating Signals of User-Item Graph: A Jacobi Polynomial-based Graph Collaborative Filtering

1 code implementation • 6 Jun 2023 • Jiayan Guo, Lun Du, Xu Chen, Xiaojun Ma, Qiang Fu, Shi Han, Dongmei Zhang, Yan Zhang

Graph CF has attracted more and more attention in recent years due to its effectiveness in leveraging high-order information in the user-item bipartite graph for better recommendations.

Collaborative Filtering Recommendation Systems

Enabling and Analyzing How to Efficiently Extract Information from Hybrid Long Documents with LLMs

no code implementations • 24 May 2023 • Chongjian Yue, Xinrun Xu, Xiaojun Ma, Lun Du, Hengyu Liu, Zhiming Ding, Yanbing Jiang, Shi Han, Dongmei Zhang

We propose an Automated Financial Information Extraction (AFIE) framework that enhances LLMs' ability to comprehend and extract information from financial reports.

Retrieval

GPT4Graph: Can Large Language Models Understand Graph Structured Data? An Empirical Evaluation and Benchmarking

no code implementations • 24 May 2023 • Jiayan Guo, Lun Du, Hengyu Liu, Mengyu Zhou, Xinyi He, Shi Han

In this study, we conduct an extensive investigation to assess the proficiency of LLMs in comprehending graph data, employing a diverse range of structural and semantic-related tasks.

Benchmarking Graph Mining +1

Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study

1 code implementation • 22 May 2023 • Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, Dongmei Zhang

Although tables can be used as input to LLMs with serialization, there is a lack of comprehensive studies that examine whether LLMs can truly comprehend such data.

Retrieval

Towards Efficient Fine-tuning of Pre-trained Code Models: An Experimental Study and Beyond

1 code implementation • 11 Apr 2023 • Ensheng Shi, Yanlin Wang, Hongyu Zhang, Lun Du, Shi Han, Dongmei Zhang, Hongbin Sun

Our experimental study shows that (1) lexical, syntactic and structural properties of source code are encoded in the lower, intermediate, and higher layers, respectively, while the semantic property spans across the entire model.

Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System

no code implementations • 2 Apr 2023 • Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, Dongmei Zhang

In brief, an IQuery is an abstraction and automation of data analysis operations, which mimics the approach of data analysts and simplifies the exploration process for users.

Language Modelling Large Language Model
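The notion of an IQuery as an abstraction of a data-analysis operation can be sketched as a small data structure. The field names and the `run` method below are hypothetical, chosen only to make the abstraction concrete; they are not InsightPilot's actual interface.

```python
from dataclasses import dataclass
from typing import Callable
import statistics

@dataclass(frozen=True)
class IQuery:
    """A data-analysis operation abstracted as (intent, target, operator)."""
    intent: str                      # e.g. "summarize" or "compare"
    column: str                      # the column the operation targets
    op: Callable                     # the aggregate to apply

    def run(self, table: dict) -> float:
        """Execute the abstracted operation against a column-oriented table."""
        return self.op(table[self.column])

table = {"sales": [10.0, 30.0, 20.0]}
q = IQuery(intent="summarize", column="sales", op=statistics.mean)
print(q.run(table))  # 20.0
```

Composing a sequence of such objects is what lets a system mimic an analyst's step-by-step exploration rather than issuing one monolithic query.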

Robust Mid-Pass Filtering Graph Convolutional Networks

1 code implementation • 16 Feb 2023 • Jincheng Huang, Lun Du, Xu Chen, Qiang Fu, Shi Han, Dongmei Zhang

Theoretical analyses guarantee the robustness of signals through the mid-pass filter, and we also shed light on the properties of different frequency signals under adversarial attacks.

Adversarial Attack Node Classification
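A generic spectral mid-pass filter can be sketched as follows. The response g(lam) = lam * (2 - lam), which peaks at lam = 1 and vanishes at both ends of the normalized Laplacian spectrum, is a stand-in assumption for illustration, not the paper's exact filter.

```python
import numpy as np

def mid_pass_filter(adj: np.ndarray, signal: np.ndarray) -> np.ndarray:
    """Apply a spectral mid-pass response on the normalized Laplacian:
    suppress both the lowest and highest graph frequencies."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    lap = np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt
    lams, vecs = np.linalg.eigh(lap)
    response = lams * (2.0 - lams)   # zero at lam = 0 and lam = 2, max at 1
    return vecs @ np.diag(response) @ vecs.T @ signal
```

On a regular graph, a constant signal is the lowest-frequency component (lam = 0), so this filter maps it to (approximately) zero, which is exactly the behavior that makes mid-pass filtering robust to low-frequency perturbations.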

Homophily-oriented Heterogeneous Graph Rewiring

no code implementations • 13 Feb 2023 • Jiayan Guo, Lun Du, Wendong Bi, Qiang Fu, Xiaojun Ma, Xu Chen, Shi Han, Dongmei Zhang, Yan Zhang

To this end, we propose HDHGR, a homophily-oriented deep heterogeneous graph rewiring approach that modifies the HG structure to improve the performance of HGNNs.

LUNA: Language Understanding with Number Augmentations on Transformers via Number Plugins and Pre-training

1 code implementation • 6 Dec 2022 • Hongwei Han, Jialiang Xu, Mengyu Zhou, Yijia Shao, Shi Han, Dongmei Zhang

But current approaches to rich-number tasks with transformer-based language models abandon or lose some of the numeracy information (e.g., by breaking numbers into sub-word tokens), which leads to many number-related errors.
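The failure mode described above is easy to reproduce with a toy greedy tokenizer. The vocabulary here is hypothetical and chosen only to show how a number's pieces stop encoding its magnitude; real BPE vocabularies fragment numbers in the same spirit.

```python
# A toy greedy subword tokenizer with a hypothetical vocabulary.
VOCAB = {"12", "3", "4.", "56"}

def greedy_subword(text, max_piece=2):
    """Split text into the longest vocabulary pieces, left to right."""
    pieces, i = [], 0
    while i < len(text):
        for size in range(max_piece, 0, -1):
            piece = text[i:i + size]
            if piece in VOCAB:
                pieces.append(piece)
                i += size
                break
        else:
            pieces.append(text[i])  # unknown character falls back to itself
            i += 1
    return pieces

# The magnitude 1234.56 is invisible in the token sequence:
print(greedy_subword("1234.56"))  # ['12', '3', '4.', '56']
```

None of the four tokens carries the value 1234.56, which is why a model seeing only these pieces makes number-related errors unless numeracy is reinjected, e.g. via dedicated number plugins.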

FormLM: Recommending Creation Ideas for Online Forms by Modelling Semantic and Structural Information

no code implementations • 10 Nov 2022 • Yijia Shao, Mengyu Zhou, Yifan Zhong, Tao Wu, Hongwei Han, Shi Han, Gideon Huang, Dongmei Zhang

To assist form designers, in this work we present FormLM to model online forms (by enhancing a pre-trained language model with form structural information) and recommend form creation ideas (including question/option recommendations and block type suggestions).

Language Modelling

DIGMN: Dynamic Intent Guided Meta Network for Differentiated User Engagement Forecasting in Online Professional Social Platforms

no code implementations • 22 Oct 2022 • Feifan Li, Lun Du, Qiang Fu, Shi Han, Yushu Du, Guangming Lu, Zi Li

Furthermore, based on the dynamic user intent representations, we propose a meta predictor to perform differentiated user engagement forecasting.

Unveiling the Black Box of PLMs with Semantic Anchors: Towards Interpretable Neural Semantic Parsing

no code implementations • 4 Oct 2022 • Lunyiu Nie, Jiuding Sun, Yanlin Wang, Lun Du, Lei Hou, Juanzi Li, Shi Han, Dongmei Zhang, Jidong Zhai

The recent prevalence of pretrained language models (PLMs) has dramatically shifted the paradigm of semantic parsing, where the mapping from natural language utterances to structured logical forms is now formulated as a Seq2Seq task.

Hallucination Semantic Parsing +1

Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima

no code implementations • 25 Aug 2022 • Hengyu Liu, Qiang Fu, Lun Du, Tiancheng Zhang, Ge Yu, Shi Han, Dongmei Zhang

Learning rate is one of the most important hyper-parameters that has a significant influence on neural network training.

MM-GNN: Mix-Moment Graph Neural Network towards Modeling Neighborhood Feature Distribution

1 code implementation • 15 Aug 2022 • Wendong Bi, Lun Du, Qiang Fu, Yanlin Wang, Shi Han, Dongmei Zhang

Graph Neural Networks (GNNs) have shown expressive performance on graph representation learning by aggregating information from neighbors.

Graph Representation Learning

ASTA: Learning Analytical Semantics over Tables for Intelligent Data Analysis and Visualization

no code implementations • 1 Aug 2022 • Lingbo Li, Tianle Li, Xinyi He, Mengyu Zhou, Shi Han, Dongmei Zhang

The ASTA framework extracts data features by designing signatures based on expert knowledge, and enables data referencing at the field level (charts) or cell level (conditional formatting) with pre-trained models.

XInsight: eXplainable Data Analysis Through The Lens of Causality

no code implementations • 26 Jul 2022 • Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, Dongmei Zhang

XInsight is a three-module, end-to-end pipeline designed to extract causal graphs, translate causal primitives into XDA semantics, and quantify the contribution of each explanation to a data fact.

Decision Making

PLOG: Table-to-Logic Pretraining for Logical Table-to-Text Generation

1 code implementation • 25 May 2022 • Ao Liu, Haoyu Dong, Naoaki Okazaki, Shi Han, Dongmei Zhang

However, directly learning the logical inference knowledge from table-text pairs is very difficult for neural models because of the ambiguity of natural language and the scarcity of parallel data.

Table-to-Text Generation

TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data

1 code implementation • 25 May 2022 • Fan Zhou, Mengkang Hu, Haoyu Dong, Zhoujun Cheng, Shi Han, Dongmei Zhang

Existing auto-regressive pre-trained language models (PLMs) like T5 and BART, have been well applied to table question answering by UNIFIEDSKG and TAPEX, respectively, and demonstrated state-of-the-art results on multiple benchmarks.

Question Answering
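The pre-computation idea in the title can be sketched as a tiny data cube: aggregate a value column over every subset of grouping keys up front, so a question-answering model can look an aggregate up instead of computing it at answer time. The function below is this page's illustration, not TaCube's code.

```python
from itertools import combinations

def build_cube(rows, group_keys, value_key):
    """Pre-compute sum/count/mean for every subset of grouping keys."""
    acc = {}
    for r in range(len(group_keys) + 1):
        for keys in combinations(group_keys, r):
            for row in rows:
                cell = (keys, tuple(row[k] for k in keys))
                s, c = acc.get(cell, (0, 0))
                acc[cell] = (s + row[value_key], c + 1)
    return {cell: {"sum": s, "count": c, "mean": s / c}
            for cell, (s, c) in acc.items()}

rows = [
    {"year": 2020, "region": "EU", "sales": 10},
    {"year": 2020, "region": "US", "sales": 30},
    {"year": 2021, "region": "EU", "sales": 20},
]
cube = build_cube(rows, ["year", "region"], "sales")
# Total 2020 sales is now a lookup, not an on-the-fly computation:
print(cube[(("year",), (2020,))])  # {'sum': 40, 'count': 2, 'mean': 20.0}
```

Offloading arithmetic to such pre-computed cells is attractive precisely because auto-regressive PLMs are unreliable at multi-step numerical reasoning.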

Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks

no code implementations • 24 Jan 2022 • Haoyu Dong, Zhoujun Cheng, Xinyi He, Mengyu Zhou, Anda Zhou, Fan Zhou, Ao Liu, Shi Han, Dongmei Zhang

Since a vast number of tables can easily be collected from web pages, spreadsheets, PDFs, and various other document types, a flurry of table pre-training frameworks have been proposed following the success of text and image pre-training, and they have achieved new state-of-the-art results on various tasks such as table question answering, table type recognition, column relation classification, table search, and formula prediction.

Denoising Question Answering +2

Source Free Unsupervised Graph Domain Adaptation

1 code implementation • 2 Dec 2021 • Haitao Mao, Lun Du, Yujia Zheng, Qiang Fu, Zelin Li, Xu Chen, Shi Han, Dongmei Zhang

To address the non-trivial adaptation challenges in this practical scenario, we propose a model-agnostic algorithm called SOGA for domain adaptation to fully exploit the discriminative ability of the source model while preserving the consistency of structural proximity on the target graph.

Domain Adaptation Node Classification

Neuron with Steady Response Leads to Better Generalization

no code implementations • 30 Nov 2021 • Qiang Fu, Lun Du, Haitao Mao, Xu Chen, Wei Fang, Shi Han, Dongmei Zhang

Based on the analysis results, we articulate the Neuron Steadiness Hypothesis: the neuron with similar responses to instances of the same class leads to better generalization.

Inductive Bias

A Unified and Fast Interpretable Model for Predictive Analytics

no code implementations • 16 Nov 2021 • Yuanyuan Jiang, Rui Ding, Tianchi Qiao, Yunan Zhu, Shi Han, Dongmei Zhang

Predictive analytics involves humans in the loop, so interpretable machine learning models are preferred.

Decision Making

GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily

1 code implementation • 29 Oct 2021 • Lun Du, Xiaozhou Shi, Qiang Fu, Xiaojun Ma, Hengyu Liu, Shi Han, Dongmei Zhang

For node-level tasks, GNNs have strong power to model the homophily property of graphs (i.e., connected nodes are more similar), while their ability to capture the heterophily property is often doubtful.

Graph Attention

ML4C: Seeing Causality Through Latent Vicinity

1 code implementation • NeurIPS 2021 • Haoyue Dai, Rui Ding, Yuanyuan Jiang, Shi Han, Dongmei Zhang

Starting from seeing that SCL is not better than random guessing if the learning target is non-identifiable a priori, we propose a two-phase paradigm for SCL by explicitly considering structure identifiability.

FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining

1 code implementation • ACL 2022 • Zhoujun Cheng, Haoyu Dong, Ran Jia, Pengfei Wu, Shi Han, Fan Cheng, Dongmei Zhang

In this paper, we find that the spreadsheet formula, which performs calculations on numerical values in tables, is naturally a strong supervision of numerical reasoning.

HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation

1 code implementation • ACL 2022 • Zhoujun Cheng, Haoyu Dong, Zhiruo Wang, Ran Jia, Jiaqi Guo, Yan Gao, Shi Han, Jian-Guang Lou, Dongmei Zhang

HiTab provides 10,686 QA pairs and descriptive sentences with well-annotated quantity and entity alignment on 3,597 tables with broad coverage of table hierarchies and numerical reasoning types.

Descriptive Entity Alignment +2

Neuron Campaign for Initialization Guided by Information Bottleneck Theory

1 code implementation • 14 Aug 2021 • Haitao Mao, Xu Chen, Qiang Fu, Lun Du, Shi Han, Dongmei Zhang

Initialization plays a critical role in the training of deep neural networks (DNN).

On the Evaluation of Neural Code Summarization

1 code implementation • 15 Jul 2021 • Ensheng Shi, Yanlin Wang, Lun Du, Junjie Chen, Shi Han, Hongyu Zhang, Dongmei Zhang, Hongbin Sun

To achieve a profound understanding of how far we are from solving this problem and provide suggestions to future research, in this paper, we conduct a systematic and in-depth analysis of 5 state-of-the-art neural code summarization models on 6 widely used BLEU variants, 4 pre-processing operations and their combinations, and 3 widely used datasets.

Code Summarization Source Code Summarization

On the Evaluation of Commit Message Generation Models: An Experimental Study

1 code implementation • 12 Jul 2021 • Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Shi Han, Hongyu Zhang, Dongmei Zhang, Wenqiang Zhang

We find that: (1) Different variants of the BLEU metric are used in previous works, which affects the evaluation and understanding of existing methods.

Retrieval

Is a Single Model Enough? MuCoS: A Multi-Model Ensemble Learning for Semantic Code Search

1 code implementation • 10 Jul 2021 • Lun Du, Xiaozhou Shi, Yanlin Wang, Ensheng Shi, Shi Han, Dongmei Zhang

On the other hand, as a specific query may focus on one or several perspectives, it is difficult for a single query representation module to represent different user intents.

Code Search Data Augmentation +1

TableSense: Spreadsheet Table Detection with Convolutional Neural Networks

1 code implementation • 25 Jun 2021 • Haoyu Dong, Shijie Liu, Shi Han, Zhouyu Fu, Dongmei Zhang

Spreadsheet table detection is the task of detecting all tables on a given sheet and locating their respective ranges.

Active Learning Boundary Detection +1
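As a baseline intuition for the task defined above, table detection can be approximated by flood-filling connected blocks of non-empty cells and returning their bounding boxes. This naive heuristic is only a stand-in for TableSense's learned CNN detector, which handles the messy layouts where such heuristics fail.

```python
def detect_tables(sheet):
    """Return (top, left, bottom, right) bounding boxes of connected
    non-empty regions, using 4-connectivity flood fill."""
    seen, boxes = set(), []
    n_rows, n_cols = len(sheet), len(sheet[0])
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if sheet[r0][c0] is None or (r0, c0) in seen:
                continue
            stack, cells = [(r0, c0)], []
            seen.add((r0, c0))
            while stack:                      # flood-fill one region
                r, c = stack.pop()
                cells.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and sheet[nr][nc] is not None
                            and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            rs = [r for r, _ in cells]
            cs = [c for _, c in cells]
            boxes.append((min(rs), min(cs), max(rs), max(cs)))
    return boxes

sheet = [["a", "b", None, None],
         ["1", "2", None, "x"],
         [None, None, None, "y"]]
print(detect_tables(sheet))  # [(0, 0, 1, 1), (1, 3, 2, 3)]
```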

TabularNet: A Neural Network Architecture for Understanding Semantic Structures of Tabular Data

no code implementations • 6 Jun 2021 • Lun Du, Fei Gao, Xu Chen, Ran Jia, Junshan Wang, Jiang Zhang, Shi Han, Dongmei Zhang

To simultaneously extract spatial and relational information from tables, we propose a novel neural network architecture, TabularNet.

graph construction

Understanding and Improvement of Adversarial Training for Network Embedding from an Optimization Perspective

no code implementations • 17 May 2021 • Lun Du, Xu Chen, Fei Gao, Kunqing Xie, Shi Han, Dongmei Zhang

Network Embedding aims to learn functions that map nodes into a Euclidean space, which contributes to multiple learning analysis tasks on networks.

Link Prediction Network Embedding +1

TUTA: Tree-based Transformers for Generally Structured Table Pre-training

1 code implementation • 21 Oct 2020 • Zhiruo Wang, Haoyu Dong, Ran Jia, Jia Li, Zhiyi Fu, Shi Han, Dongmei Zhang

First, we devise a unified tree-based structure, called a bi-dimensional coordinate tree, to describe both the spatial and hierarchical information of generally structured tables.
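The bi-dimensional coordinate tree can be illustrated with a minimal sketch that assigns a tree path to each header label. The nested-dict encoding and the function below are assumptions made for illustration, not TUTA's implementation.

```python
def assign_coords(header, path=()):
    """Return (coordinate, label) pairs, where each coordinate is the
    index path from the tree root down to the header cell."""
    out = []
    if isinstance(header, dict):        # an internal header with children
        for i, (label, children) in enumerate(header.items()):
            out.append((path + (i,), label))
            out.extend(assign_coords(children, path + (i,)))
    else:                               # a flat list of leaf headers
        for i, label in enumerate(header):
            out.append((path + (i,), label))
    return out

# "2020" under "Sales" gets (0, 0); "2020" under "Costs" gets (1, 0),
# so identical labels in different branches stay distinguishable.
coords = assign_coords({"Sales": ["2020", "2021"],
                        "Costs": ["2020", "2021"]})
```

Encoding each cell by such a path (one tree for rows, one for columns) is what lets a transformer see hierarchical position rather than a flat cell index.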

Semantic Structure Extraction for Spreadsheet Tables with a Multi-task Learning Architecture

no code implementations • NeurIPS Workshop on Document Intelligence 2019 • Haoyu Dong, Shijie Liu, Zhouyu Fu, Shi Han, Dongmei Zhang

To learn spatial correlations and capture semantics on spreadsheets, we have developed a novel learning-based framework for spreadsheet semantic structure extraction.

Language Modelling Multi-Task Learning
