no code implementations • 28 Dec 2024 • Chongjian Yue, Xinrun Xu, Xiaojun Ma, Lun Du, Zhiming Ding, Shi Han, Dongmei Zhang, Qi Zhang
However, their ability to comprehend and analyze hybrid text, containing textual and tabular data, remains unexplored.
no code implementations • 16 Oct 2024 • Junjie Xing, Yeye He, Mengyu Zhou, Haoyu Dong, Shi Han, Dongmei Zhang, Surajit Chaudhuri
In this work, we propose Table-LLM-Specialist, or Table-Specialist for short, as a new self-trained fine-tuning paradigm specifically designed for table tasks.
2 code implementations • 12 Jul 2024 • Yuzhang Tian, Jianbo Zhao, Haoyu Dong, Junyu Xiong, Shiyu Xia, Mengyu Zhou, Yun Lin, José Cambronero, Yeye He, Shi Han, Dongmei Zhang
Finally, we propose Chain of Spreadsheet for downstream tasks of spreadsheet understanding and validate in a new and demanding spreadsheet QA task.
no code implementations • 2 Jul 2024 • Jiaru Zou, Mengyu Zhou, Tao Li, Shi Han, Dongmei Zhang
Recent advances in fine-tuning large language models (LLMs) have greatly enhanced their usage in domain-specific tasks.
no code implementations • 15 Jun 2024 • Pingchuan Ma, Rui Ding, Qiang Fu, Jiaru Zhang, Shuai Wang, Shi Han, Dongmei Zhang
Differentiable causal discovery has made significant advancements in the learning of directed acyclic graphs.
no code implementations • 25 May 2024 • Shiyu Xia, Junyu Xiong, Haoyu Dong, Jianbo Zhao, Yuzhang Tian, Mengyu Zhou, Yeye He, Shi Han, Dongmei Zhang
Notably, to leverage the strengths of VLMs in understanding text rather than two-dimensional positioning, we propose to decode cell values on the four boundaries of the table in spreadsheet boundary detection.
no code implementations • 13 May 2024 • Mengkang Hu, Haoyu Dong, Ping Luo, Shi Han, Dongmei Zhang
In this paper, we propose to use a knowledge base (KB) as the external knowledge source for TableQA and construct a dataset KET-QA with fine-grained gold evidence annotation.
no code implementations • 20 Mar 2024 • Xinyi He, Jiaru Zou, Yun Lin, Mengyu Zhou, Shi Han, Zejian yuan, Dongmei Zhang
Large Language Models have revolutionized code generation ability by converting natural language descriptions into executable code.
no code implementations • 15 Jan 2024 • Yihan Cao, Xu Chen, Lun Du, Hao Chen, Qiang Fu, Shi Han, Yushu Du, Yanbin Kang, Guangming Lu, Zi Li
Person-job fit is an essential part of online recruitment platforms in serving various downstream applications like Job Search and Candidate Recommendation.
no code implementations • 21 Dec 2023 • Xinyi He, Mengyu Zhou, Xinrun Xu, Xiaojun Ma, Rui Ding, Lun Du, Yan Gao, Ran Jia, Xu Chen, Shi Han, Zejian yuan, Dongmei Zhang
We evaluate five state-of-the-art models using three different metrics, and the results show that our benchmark introduces considerable challenges in the field of tabular data analysis, paving the way for more advanced research opportunities.
no code implementations • 19 Dec 2023 • Hao Chen, Lun Du, Yuxuan Lu, Qiang Fu, Xu Chen, Shi Han, Yanbin Kang, Guangming Lu, Zi Li
Online recruitment platforms typically employ Person-Job Fit models in the core service that automatically match suitable job seekers with appropriate job positions.
1 code implementation • 14 Dec 2023 • Yuan Sui, Jiaru Zou, Mengyu Zhou, Xinyi He, Lun Du, Shi Han, Dongmei Zhang
Table reasoning tasks have shown remarkable progress with the development of large language models (LLMs), which involve interpreting and drawing conclusions from tabular data based on natural language (NL) questions.
no code implementations • 26 Sep 2023 • Jiayi Liao, Xu Chen, Qiang Fu, Lun Du, Xiangnan He, Xiang Wang, Shi Han, Dongmei Zhang
Recent years have witnessed the substantial progress of large-scale models across various domains, such as natural language processing and computer vision, facilitating the expression of concrete concepts.
1 code implementation • 25 Aug 2023 • Ensheng Shi, Fengji Zhang, Yanlin Wang, Bei Chen, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
To meet the demands of this dynamic field, there is a growing need for an effective software development assistant.
1 code implementation • 6 Jun 2023 • Jiayan Guo, Lun Du, Xu Chen, Xiaojun Ma, Qiang Fu, Shi Han, Dongmei Zhang, Yan Zhang
Graph CF has attracted increasing attention in recent years due to its effectiveness in leveraging high-order information in the user-item bipartite graph for better recommendations.
no code implementations • 24 May 2023 • Chongjian Yue, Xinrun Xu, Xiaojun Ma, Lun Du, Hengyu Liu, Zhiming Ding, Yanbing Jiang, Shi Han, Dongmei Zhang
We propose an Automated Financial Information Extraction (AFIE) framework that enhances LLMs' ability to comprehend and extract information from financial reports.
no code implementations • 24 May 2023 • Jiayan Guo, Lun Du, Hengyu Liu, Mengyu Zhou, Xinyi He, Shi Han
In this study, we conduct an extensive investigation to assess the proficiency of LLMs in comprehending graph data, employing a diverse range of structural and semantic-related tasks.
1 code implementation • 22 May 2023 • Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, Dongmei Zhang
Large language models (LLMs) are becoming attractive as few-shot reasoners to solve Natural Language (NL)-related tasks.
no code implementations • 22 May 2023 • Hongjun Wang, Jiyuan Chen, Lun Du, Qiang Fu, Shi Han, Xuan Song
Recent years have witnessed the great potential of attention mechanism in graph representation learning.
1 code implementation • 11 Apr 2023 • Ensheng Shi, Yanlin Wang, Hongyu Zhang, Lun Du, Shi Han, Dongmei Zhang, Hongbin Sun
Our experimental study shows that (1) lexical, syntactic and structural properties of source code are encoded in the lower, intermediate, and higher layers, respectively, while the semantic property spans across the entire model.
no code implementations • 2 Apr 2023 • Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, Dongmei Zhang
In brief, an IQuery is an abstraction and automation of data analysis operations, which mimics the approach of data analysts and simplifies the exploration process for users.
1 code implementation • 16 Feb 2023 • Jincheng Huang, Lun Du, Xu Chen, Qiang Fu, Shi Han, Dongmei Zhang
Theoretical analyses guarantee the robustness of signals through the mid-pass filter, and we also shed light on the properties of different frequency signals under adversarial attacks.
no code implementations • 13 Feb 2023 • Jiayan Guo, Lun Du, Wendong Bi, Qiang Fu, Xiaojun Ma, Xu Chen, Shi Han, Dongmei Zhang, Yan Zhang
To this end, we propose HDHGR, a homophily-oriented deep heterogeneous graph rewiring approach that modifies the HG structure to improve the performance of HGNNs.
1 code implementation • ICLR 2023 • Jinsong Zhang, Qiang Fu, Xu Chen, Lun Du, Zelin Li, Gang Wang, Xiaoguang Liu, Shi Han, Dongmei Zhang
In more detail, penultimate layer outputs on the training set are considered as the representations of in-distribution (ID) data.
Ranked #11 on Out-of-Distribution Detection on ImageNet-1k vs Places
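The entry above treats penultimate-layer outputs on the training set as representations of in-distribution (ID) data. A minimal sketch of one common way such features can be scored for OOD detection (distance to the nearest class mean) follows; this is not necessarily the paper's method, and all names and toy data are illustrative:

```python
import numpy as np

def class_means(features, labels):
    # Mean penultimate-layer feature vector per class (ID "prototypes").
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def ood_score(x, means):
    # Distance to the nearest class mean: larger => more likely OOD.
    return min(np.linalg.norm(x - m) for m in means.values())

# Toy example with 2-D stand-ins for penultimate-layer features.
feats = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
labels = np.array([0, 0, 1, 1])
means = class_means(feats, labels)

# An in-distribution point scores lower than a far-away outlier.
print(ood_score(np.array([0.1, 0.0]), means) < ood_score(np.array([10.0, 10.0]), means))  # True
```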
1 code implementation • 6 Dec 2022 • Hongwei Han, Jialiang Xu, Mengyu Zhou, Yijia Shao, Shi Han, Dongmei Zhang
But current approaches to rich-number tasks with transformer-based language models abandon or lose some of the numeracy information (e.g., by breaking numbers into sub-word tokens), which leads to many number-related errors.
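The numeracy loss described above, where numbers are fragmented into sub-word tokens, can be illustrated with a toy greedy longest-match tokenizer. This is a simplified stand-in for BPE/WordPiece, with a made-up vocabulary, not any real model's tokenizer:

```python
def greedy_subword_tokenize(text, vocab):
    # Greedy longest-match sub-word splitting (toy stand-in for BPE/WordPiece).
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

# The number 1234567 is split into arbitrary fragments, so its magnitude
# is no longer visible from any single token.
vocab = {"123", "45", "6", "7", "89"}
print(greedy_subword_tokenize("1234567", vocab))  # ['123', '45', '6', '7']
```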
no code implementations • 14 Nov 2022 • Jialiang Xu, Mengyu Zhou, Xinyi He, Shi Han, Dongmei Zhang
Numerical Question Answering is the task of answering questions that require numerical capabilities.
no code implementations • 10 Nov 2022 • Yijia Shao, Mengyu Zhou, Yifan Zhong, Tao Wu, Hongwei Han, Shi Han, Gideon Huang, Dongmei Zhang
To assist form designers, in this work we present FormLM to model online forms (by enhancing a pre-trained language model with form structural information) and recommend form creation ideas (including question/options recommendations and block type suggestion).
no code implementations • 22 Oct 2022 • Feifan Li, Lun Du, Qiang Fu, Shi Han, Yushu Du, Guangming Lu, Zi Li
Furthermore, based on the dynamic user intent representations, we propose a meta predictor to perform differentiated user engagement forecasting.
no code implementations • 11 Oct 2022 • Fan Zhou, Haoyu Dong, Qian Liu, Zhoujun Cheng, Shi Han, Dongmei Zhang
Numerical reasoning over natural language has been a long-standing goal for the research community.
no code implementations • 4 Oct 2022 • Lunyiu Nie, Jiuding Sun, Yanlin Wang, Lun Du, Lei Hou, Juanzi Li, Shi Han, Dongmei Zhang, Jidong Zhai
The recent prevalence of pretrained language models (PLMs) has dramatically shifted the paradigm of semantic parsing, where the mapping from natural language utterances to structured logical forms is now formulated as a Seq2Seq task.
no code implementations • 17 Sep 2022 • Wendong Bi, Lun Du, Qiang Fu, Yanlin Wang, Shi Han, Dongmei Zhang
Graph Neural Networks (GNNs) are popular machine learning methods for modeling graph data.
Ranked #7 on Node Classification on Squirrel
no code implementations • 2 Sep 2022 • Xinyi He, Mengyu Zhou, Mingjie Zhou, Jialiang Xu, Xiao Lv, Tianle Li, Yijia Shao, Shi Han, Zejian yuan, Dongmei Zhang
Tabular data analysis is performed every day across various domains.
no code implementations • 25 Aug 2022 • Hengyu Liu, Qiang Fu, Lun Du, Tiancheng Zhang, Ge Yu, Shi Han, Dongmei Zhang
Learning rate is one of the most important hyper-parameters that has a significant influence on neural network training.
1 code implementation • 15 Aug 2022 • Wendong Bi, Lun Du, Qiang Fu, Yanlin Wang, Shi Han, Dongmei Zhang
Graph Neural Networks (GNNs) have shown expressive performance on graph representation learning by aggregating information from neighbors.
no code implementations • 1 Aug 2022 • Lingbo Li, Tianle Li, Xinyi He, Mengyu Zhou, Shi Han, Dongmei Zhang
ASTA framework extracts data features by designing signatures based on expert knowledge, and enables data referencing at field- (chart) or cell-level (conditional formatting) with pre-trained models.
no code implementations • 26 Jul 2022 • Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, Dongmei Zhang
XInsight is a three-module, end-to-end pipeline designed to extract causal graphs, translate causal primitives into XDA semantics, and quantify the contribution of each explanation to a data fact.
1 code implementation • 25 May 2022 • Fan Zhou, Mengkang Hu, Haoyu Dong, Zhoujun Cheng, Shi Han, Dongmei Zhang
Existing auto-regressive pre-trained language models (PLMs) like T5 and BART have been well applied to table question answering by UNIFIEDSKG and TAPEX, respectively, and have demonstrated state-of-the-art results on multiple benchmarks.
1 code implementation • 25 May 2022 • Ao Liu, Haoyu Dong, Naoaki Okazaki, Shi Han, Dongmei Zhang
However, directly learning the logical inference knowledge from table-text pairs is very difficult for neural models because of the ambiguity of natural language and the scarcity of parallel data.
no code implementations • 7 Apr 2022 • Ensheng Shi, Yanlin Wang, Wenchao Gu, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
However, there is still a lot of room for improvement in using contrastive learning for code search.
no code implementations • ACL 2022 • Wenchao Gu, Yanlin Wang, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Michael R. Lyu
Code search retrieves reusable code snippets from a source code corpus based on natural language queries.
2 code implementations • 5 Mar 2022 • Ensheng Shi, Yanlin Wang, Wei Tao, Lun Du, Hongyu Zhang, Shi Han, Dongmei Zhang, Hongbin Sun
Furthermore, RACE can boost the performance of existing Seq2Seq models in commit message generation.
3 code implementations • 24 Jan 2022 • Haoyu Dong, Zhoujun Cheng, Xinyi He, Mengyu Zhou, Anda Zhou, Fan Zhou, Ao Liu, Shi Han, Dongmei Zhang
Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs, and various other document types, a flurry of table pre-training frameworks have been proposed following the success of text and image pre-training, and they have achieved new state-of-the-art results on various tasks such as table question answering, table type recognition, column relation classification, table search, formula prediction, etc.
1 code implementation • 2 Dec 2021 • Haitao Mao, Lun Du, Yujia Zheng, Qiang Fu, Zelin Li, Xu Chen, Shi Han, Dongmei Zhang
To address the non-trivial adaptation challenges in this practical scenario, we propose a model-agnostic algorithm called SOGA for domain adaptation to fully exploit the discriminative ability of the source model while preserving the consistency of structural proximity on the target graph.
no code implementations • 30 Nov 2021 • Qiang Fu, Lun Du, Haitao Mao, Xu Chen, Wei Fang, Shi Han, Dongmei Zhang
Based on the analysis results, we articulate the Neuron Steadiness Hypothesis: the neuron with similar responses to instances of the same class leads to better generalization.
no code implementations • 16 Nov 2021 • Yuanyuan Jiang, Rui Ding, Tianchi Qiao, Yunan Zhu, Shi Han, Dongmei Zhang
Predictive analytics involves humans in the loop, and thus interpretable machine learning models are preferred.
1 code implementation • 29 Oct 2021 • Lun Du, Xiaozhou Shi, Qiang Fu, Xiaojun Ma, Hengyu Liu, Shi Han, Dongmei Zhang
For node-level tasks, GNNs have strong power to model the homophily property of graphs (i.e., connected nodes are more similar), while their ability to capture the heterophily property is often doubtful.
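The homophily property mentioned above is commonly quantified by the edge homophily ratio: the fraction of edges whose endpoints share a label. A minimal sketch on a toy graph (illustrative only, not the paper's code):

```python
def edge_homophily(edges, labels):
    # Fraction of edges whose two endpoints carry the same label.
    # Values near 1 indicate homophily; values near 0 indicate heterophily.
    same = sum(labels[u] == labels[v] for u, v in edges)
    return same / len(edges)

# Toy graph: a path 0-1-2-3 with labels a, a, b, b.
edges = [(0, 1), (1, 2), (2, 3)]
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
print(edge_homophily(edges, labels))  # prints 0.6666666666666666
```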
1 code implementation • NeurIPS 2021 • Haoyue Dai, Rui Ding, Yuanyuan Jiang, Shi Han, Dongmei Zhang
Starting from seeing that SCL is not better than random guessing if the learning target is non-identifiable a priori, we propose a two-phase paradigm for SCL by explicitly considering structure identifiability.
1 code implementation • ACL 2022 • Zhoujun Cheng, Haoyu Dong, Ran Jia, Pengfei Wu, Shi Han, Fan Cheng, Dongmei Zhang
In this paper, we find that the spreadsheet formula, which performs calculations on numerical values in tables, is naturally a strong supervision of numerical reasoning.
1 code implementation • ACL 2022 • Zhoujun Cheng, Haoyu Dong, Zhiruo Wang, Ran Jia, Jiaqi Guo, Yan Gao, Shi Han, Jian-Guang Lou, Dongmei Zhang
HiTab provides 10,686 QA pairs and descriptive sentences with well-annotated quantity and entity alignment on 3,597 tables with broad coverage of table hierarchies and numerical reasoning types.
1 code implementation • 14 Aug 2021 • Haitao Mao, Xu Chen, Qiang Fu, Lun Du, Shi Han, Dongmei Zhang
Initialization plays a critical role in the training of deep neural networks (DNN).
1 code implementation • 15 Jul 2021 • Ensheng Shi, Yanlin Wang, Lun Du, Junjie Chen, Shi Han, Hongyu Zhang, Dongmei Zhang, Hongbin Sun
To achieve a profound understanding of how far we are from solving this problem and provide suggestions to future research, in this paper, we conduct a systematic and in-depth analysis of 5 state-of-the-art neural code summarization models on 6 widely used BLEU variants, 4 pre-processing operations and their combinations, and 3 widely used datasets.
1 code implementation • 12 Jul 2021 • Wei Tao, Yanlin Wang, Ensheng Shi, Lun Du, Shi Han, Hongyu Zhang, Dongmei Zhang, Wenqiang Zhang
We find that: (1) Different variants of the BLEU metric are used in previous works, which affects the evaluation and understanding of existing methods.
1 code implementation • 10 Jul 2021 • Lun Du, Xiaozhou Shi, Yanlin Wang, Ensheng Shi, Shi Han, Dongmei Zhang
On the other hand, as a specific query may focus on one or several perspectives, it is difficult for a single query representation module to represent different user intents.
1 code implementation • 25 Jun 2021 • Haoyu Dong, Shijie Liu, Shi Han, Zhouyu Fu, Dongmei Zhang
Spreadsheet table detection is the task of detecting all tables on a given sheet and locating their respective ranges.
no code implementations • 6 Jun 2021 • Lun Du, Fei Gao, Xu Chen, Ran Jia, Junshan Wang, Jiang Zhang, Shi Han, Dongmei Zhang
To simultaneously extract spatial and relational information from tables, we propose a novel neural network architecture, TabularNet.
no code implementations • 17 May 2021 • Lun Du, Xu Chen, Fei Gao, Kunqing Xie, Shi Han, Dongmei Zhang
Network Embedding aims to learn a function mapping nodes to Euclidean space that contributes to multiple learning and analysis tasks on networks.
1 code implementation • 21 Oct 2020 • Zhiruo Wang, Haoyu Dong, Ran Jia, Jia Li, Zhiyi Fu, Shi Han, Dongmei Zhang
First, we devise a unified tree-based structure, called a bi-dimensional coordinate tree, to describe both the spatial and hierarchical information of generally structured tables.
1 code implementation • 24 Aug 2020 • Mengyu Zhou, Qingtao Li, Xinyi He, Yuejiang Li, Yibo Liu, Wei Ji, Shi Han, Yining Chen, Daxin Jiang, Dongmei Zhang
It is common for people to create different types of charts to explore a multi-dimensional dataset (table).
no code implementations • NeurIPS Workshop on Document Intelligence 2019 • Haoyu Dong, Shijie Liu, Zhouyu Fu, Shi Han, Dongmei Zhang
To learn spatial correlations and capture semantics on spreadsheets, we have developed a novel learning-based framework for spreadsheet semantic structure extraction.