no code implementations • 14 Apr 2025 • Zhengxuan Zhang, Zhuowen Liang, Yin Wu, Teng Lin, Yuyu Luo, Nan Tang
Large Language Models (LLMs) are transforming data analytics, but their widespread adoption is hindered by two critical limitations: they are neither explainable (their reasoning processes are opaque) nor verifiable (they are prone to hallucinations and unchecked errors).
no code implementations • 28 Mar 2025 • Yizhang Zhu, Runzhi Jiang, Boyan Li, Nan Tang, Yuyu Luo
Text-to-SQL automatically translates natural language queries to SQL, allowing non-technical users to retrieve data from databases without specialized SQL knowledge.
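For illustration, a minimal sketch of the Text-to-SQL setting; the schema, the prompt template, and the imagined ask_llm call are placeholders, not artifacts of this paper:

```python
# Minimal Text-to-SQL sketch: build a schema-aware prompt for an LLM.
# `ask_llm` is a hypothetical stand-in for any LLM completion call.

SCHEMA = "CREATE TABLE employees(id INT, name TEXT, salary INT, dept TEXT);"

def build_prompt(question: str) -> str:
    return (
        "Translate the question into a SQL query over this schema.\n"
        f"Schema: {SCHEMA}\n"
        f"Question: {question}\n"
        "SQL:"
    )

prompt = build_prompt("Who earns more than 50000 in the sales department?")
# ask_llm(prompt) would ideally return something like:
#   SELECT name FROM employees WHERE salary > 50000 AND dept = 'sales';
print(prompt)
```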
no code implementations • 24 Mar 2025 • Changlun Li, Yao Shi, Yuyu Luo, Nan Tang
Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, but their effectiveness in financial decision making, particularly in fund investment, remains inadequately evaluated.
no code implementations • 17 Mar 2025 • Tianqi Luo, Chuhan Huang, Leixian Shen, Boyan Li, Shuyu Shen, Wei Zeng, Nan Tang, Yuyu Luo
Natural Language to Visualization (NL2VIS) enables users to create visualizations from natural language queries, making data insights more accessible.
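As a concrete picture of what an NL2VIS system targets, here is a hand-written (not generated) Vega-Lite spec for a toy query; the data file and field names are assumptions:

```python
import json

# A Vega-Lite spec is a common NL2VIS target; this one is hand-written
# to show what "total sales by region as a bar chart" could compile to.
nl_query = "show total sales by region as a bar chart"
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": "sales.csv"},  # assumed data file
    "mark": "bar",
    "encoding": {
        "x": {"field": "region", "type": "nominal"},
        "y": {"field": "sales", "type": "quantitative", "aggregate": "sum"},
    },
}
print(json.dumps(spec, indent=2))
```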
no code implementations • 3 Mar 2025 • Teng Lin, Yizhang Zhu, Yuyu Luo, Nan Tang
The effectiveness of current retrieval-augmented generation (RAG) methods is limited by the LLMs' capacity to aggregate insights from numerous pages.
no code implementations • 28 Feb 2025 • Zhengxuan Zhang, Yin Wu, Yuyu Luo, Nan Tang
Visual Question Answering (VQA) focuses on providing answers to natural language questions by utilizing information from images.
1 code implementation • 9 Feb 2025 • Xudong Yang, Yizhang Zhu, Nan Tang, Yuyu Luo
Conventional multi-modal multi-label emotion recognition (MMER) from videos typically assumes full availability of visual, textual, and acoustic modalities.
no code implementations • 26 Dec 2024 • Yunfan Zhang, Changlun Li, Yuyu Luo, Nan Tang
This gap underscores the need for further advancements in LLM methodologies to enhance their reasoning capabilities for more reliable imputation outcomes.
1 code implementation • 26 Dec 2024 • Xudong Yang, Yifan Wu, Yizhang Zhu, Nan Tang, Yuyu Luo
To train AskChart effectively, we design a three-stage training strategy that aligns visual and textual modalities, learns robust visual-textual representations, and optimizes the training of the MoE layer.
no code implementations • 19 Dec 2024 • Xiangsen Chen, Xuming Hu, Nan Tang
Despite this progress, existing RAG frameworks, which usually follow the retrieve-then-read paradigm, often struggle with multi-hop QA involving temporal information, since they have difficulty retrieving and synthesizing accurate time-related evidence.
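A rough sketch of the retrieve-then-read paradigm being critiqued here; the toy corpus and the word-overlap scorer are deliberate simplifications, not the paper's method:

```python
# Toy retrieve-then-read pipeline: score documents by word overlap,
# then stuff the top-k into a reading prompt. Real systems use dense
# retrievers; this overlap scorer is a deliberate simplification.

CORPUS = [
    "The Berlin Wall fell in 1989.",
    "Germany was reunified in 1990.",
    "The Euro was introduced in 1999.",
]

def retrieve(question: str, k: int = 2) -> list[str]:
    q = set(question.lower().split())
    scored = sorted(CORPUS, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def read_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\nQuestion: {question}\nAnswer:"

# Multi-hop temporal questions ("What happened one year after the Wall
# fell?") break this pattern: the right evidence need not overlap with
# the question's wording, which is the failure mode described above.
print(read_prompt("When was Germany reunified?"))
```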
no code implementations • 10 Dec 2024 • Meihao Fan, Ju Fan, Nan Tang, Lei Cao, Guoliang Li, Xiaoyong Du
Many of these tables are derived from web sources or real-world scenarios, which require meticulous data preparation (or data prep) to ensure accurate responses.
1 code implementation • 9 Aug 2024 • Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang, Yuyu Luo
Translating users' natural language queries (NL) into SQL queries (i.e., NL2SQL, a.k.a. Text-to-SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications.
1 code implementation • 17 Jun 2024 • Yushi Sun, Hao Xin, Kai Sun, Yifan Ethan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, Lei Chen
Unfortunately, a comprehensive benchmark that evaluates LLMs over a wide range of taxonomies, from common to specialized domains and from root to leaf levels, is still lacking, which prevents drawing confident conclusions.
2 code implementations • 16 Jun 2024 • Yupeng Xie, Yuyu Luo, Guoliang Li, Nan Tang
The growing importance of data visualization in business intelligence and data science emphasizes the need for tools that can efficiently generate meaningful visualizations from large datasets.
1 code implementation • 12 Jun 2024 • Yizhang Zhu, Shiyin Du, Boyan Li, Yuyu Luo, Nan Tang
Large Language Models (LLMs) have demonstrated impressive capabilities across a range of scientific tasks including mathematics, physics, and chemistry.
1 code implementation • 7 Jun 2024 • Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, Wen-tau Yih, Xin Luna Dong
To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search.
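To give a sense of the setup, a toy sketch of mock search APIs paired with a QA item; the function names and data are invented and do not reflect CRAG's actual interface:

```python
# Toy stand-ins for CRAG-style mock search APIs; the real benchmark
# defines its own interfaces, so treat these as illustrative only.
KG = {"Eiffel Tower": {"height_m": 330, "city": "Paris"}}
WEB = {"eiffel tower height": "The Eiffel Tower is 330 m tall."}

def mock_kg_search(entity: str) -> dict:
    return KG.get(entity, {})

def mock_web_search(query: str) -> str:
    return WEB.get(query.lower(), "")

qa = {"question": "How tall is the Eiffel Tower?", "answer": "330 m"}
evidence = mock_kg_search("Eiffel Tower")
print(evidence.get("height_m"), "m ==", qa["answer"])
```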
no code implementations • 27 May 2024 • Chengxing Jia, Pengyuan Wang, Ziniu Li, Yi-Chen Li, Zhilong Zhang, Nan Tang, Yang Yu
In a similar vein, our proposed system, the BWArea model, conceptualizes language generation as a decision-making task.
no code implementations • 11 May 2024 • Yifan Wu, Lutao Yan, Leixian Shen, Yunhai Wang, Nan Tang, Yuyu Luo
To further explore the limitations of MLLMs in low-level ChartQA, we conduct experiments that alter visual elements of charts (e.g., changing color schemes, adding image noise) to assess their impact on task effectiveness.
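A minimal sketch of one such perturbation, assuming charts arrive as RGB arrays; the noise level is an arbitrary choice:

```python
import numpy as np

def add_image_noise(chart: np.ndarray, sigma: float = 10.0,
                    seed: int = 0) -> np.ndarray:
    """Add Gaussian pixel noise to a chart image (H x W x 3, uint8)."""
    rng = np.random.default_rng(seed)
    noisy = chart.astype(np.float32) + rng.normal(0, sigma, chart.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

# A 64x64 all-white "chart" stands in for a rendered plot here.
chart = np.full((64, 64, 3), 255, dtype=np.uint8)
perturbed = add_image_noise(chart)
```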
no code implementations • 14 Apr 2024 • Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang, Yang Yu
Furthermore, KALM effectively enables the LLM to comprehend environmental dynamics, resulting in the generation of meaningful imaginary rollouts that reflect novel skills and demonstrate the seamless integration of large language models and reinforcement learning.
no code implementations • 6 Feb 2024 • Jing-Cheng Pang, Heng-Bo Fan, Pengyuan Wang, Jia-Hao Xiao, Nan Tang, Si-Hang Yang, Chengxing Jia, Sheng-Jun Huang, Yang Yu
The rise of large language models (LLMs) has revolutionized the way that we interact with artificial intelligence systems through natural language.
1 code implementation • 7 Dec 2023 • Meihao Fan, Xiaoyue Han, Ju Fan, Chengliang Chai, Nan Tang, Guoliang Li, Xiaoyong Du
However, existing ICL approaches to ER typically require providing a task description and a set of demonstrations for each entity pair, and thus incur high monetary costs when interfacing with LLMs.
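To see where the cost comes from, a minimal sketch of per-pair in-context prompting for ER; the task text, records, and demonstrations are made up:

```python
# Each entity pair gets its own prompt carrying the full task
# description plus k demonstrations, so token cost scales with
# (pairs x demonstrations) -- the overhead described above.

TASK = "Decide whether the two product records refer to the same entity."
DEMOS = [
    ("iPhone 13 128GB", "Apple iPhone 13 (128 GB)", "yes"),
    ("iPhone 13 128GB", "Galaxy S21 128GB", "no"),
]

def er_prompt(left: str, right: str) -> str:
    shots = "\n".join(f"A: {a}\nB: {b}\nMatch: {y}" for a, b, y in DEMOS)
    return f"{TASK}\n{shots}\nA: {left}\nB: {right}\nMatch:"

prompt = er_prompt("ThinkPad X1 Carbon Gen 9", "Lenovo X1 Carbon 9th Gen")
print(len(prompt.split()), "whitespace tokens per pair, before the answer")
```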
no code implementations • 1 Oct 2023 • Zui Chen, Lei Cao, Sam Madden, Tim Kraska, Zeyuan Shang, Ju Fan, Nan Tang, Zihui Gu, Chunwei Liu, Michael Cafarella
As a result, data scientists often have to develop domain-specific solutions tailored to both the dataset and the task, e.g., writing domain-specific code or training machine learning models on a sufficient number of annotated examples.
no code implementations • 6 Jul 2023 • Nan Tang, Chenyu Yang, Ju Fan, Lei Cao, Yuyu Luo, Alon Halevy
We argue that verifying the outputs of generative AI from a data management perspective is an emerging and important problem.
1 code implementation • 15 Jun 2023 • Zihui Gu, Ju Fan, Nan Tang, Songyue Zhang, Yuxin Zhang, Zui Chen, Lei Cao, Guoliang Li, Sam Madden, Xiaoyong Du
PLMs can perform well in schema alignment but struggle with complex reasoning, while LLMs are superior in complex reasoning tasks but cannot achieve precise schema alignment.
1 code implementation • SIGMOD/PODS 2023 • Jianhong Tu, Ju Fan, Nan Tang, Peng Wang, Guoliang Li, Xiaoyong Du, Xiaofeng Jia, Song Gao
The widely used practice is to build task-specific or even dataset-specific solutions, which are hard to generalize and forgo the knowledge sharing that could be learned across different datasets and multiple tasks.
no code implementations • 7 Apr 2023 • Sibei Chen, Hanbing Liu, Weiting Jin, Xiangyu Sun, Xiaoyao Feng, Ju Fan, Xiaoyong Du, Nan Tang
Orchestrating a high-quality data preparation program is essential for successful machine learning (ML), but it is known to be time- and effort-consuming.
no code implementations • 29 Mar 2023 • Zan Ahmad Naeem, Mohammad Shahmeer Ahmad, Mohamed Eltabakh, Mourad Ouzzani, Nan Tang
To assist with this scenario, we developed a custom RoBERTa-based foundation model that can be locally deployed.
1 code implementation • 5 Nov 2022 • Zihui Gu, Ju Fan, Nan Tang, Preslav Nakov, Xiaoman Zhao, Xiaoyong Du
In particular, on the complex set of TabFact, which contains multiple operations, PASTA largely outperforms the previous state of the art by 4.7 points (85.6% vs. 80.9%), and the gap between PASTA and human performance on the small TabFact test set is narrowed to just 1.5 points (90.6% vs. 92.1%).
Ranked #3 on Table-based Fact Verification on TabFact
1 code implementation • SIGMOD/PODS 2022 • Jianhong Tu, Ju Fan, Nan Tang, Peng Wang, Chengliang Chai, Guoliang Li, Ruixue Fan, Xiaoyong Du
Entity resolution (ER) is a core problem of data integration.
Ranked #2 on Entity Resolution on WDC Watches-small
1 code implementation • Proceedings of the VLDB Endowment 2021 • Saravanan Thirumuruganathan, Han Li, Nan Tang, Mourad Ouzzani, Yash Govind, Derek Paulsen, Glenn Fung, AnHai Doan
In this paper, we develop the DeepBlocker framework that significantly advances the state of the art in applying DL to blocking for EM.
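A rough sketch of embedding-based blocking in the DeepBlocker spirit; the hashing embedding below is a stand-in for the learned tuple embeddings the paper actually uses:

```python
import numpy as np

def embed(tuple_text: str, dim: int = 64) -> np.ndarray:
    """Toy tuple embedding: hash each token into a bucket and average.
    DeepBlocker learns its embeddings; this is only a stand-in."""
    vec = np.zeros(dim)
    for tok in tuple_text.lower().split():
        vec[hash(tok) % dim] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

left = ["apple iphone 13 128gb", "lenovo thinkpad x1"]
right = ["iphone 13 by apple 128 gb", "galaxy s21 ultra"]

L = np.stack([embed(t) for t in left])
R = np.stack([embed(t) for t in right])
sims = L @ R.T                                 # cosine similarities
candidates = np.argsort(-sims, axis=1)[:, :1]  # top-1 per left tuple
print(candidates)  # candidate pairs to pass on to the matcher
```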
Ranked #5 on Blocking on Abt-Buy
no code implementations • 4 Dec 2020 • Nan Tang, Ju Fan, Fangyi Li, Jianhong Tu, Xiaoyong Du, Guoliang Li, Sam Madden, Mourad Ouzzani
RPT is pre-trained as a tuple-to-tuple model by corrupting the input tuple and then learning to reconstruct the original tuple.
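For intuition, a minimal sketch of this denoising-style objective; the attribute values, mask token, and corruption rate are illustrative:

```python
import random

MASK = "[MASK]"

def corrupt(tuple_values: list[str], p: float = 0.3,
            seed: int = 0) -> list[str]:
    """Randomly mask attribute values; the model is trained to
    reconstruct the original tuple from this corrupted input."""
    rng = random.Random(seed)
    return [MASK if rng.random() < p else v for v in tuple_values]

original = ["Michael Jordan", "Basketball", "Chicago Bulls", "1984"]
noisy = corrupt(original)
# Training pair: model(noisy) should output `original`.
print(noisy)
```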
1 code implementation • 8 Mar 2019 • John K. Feser, Samuel Madden, Nan Tang, Armando Solar-Lezama
Optimizing the physical storage and retrieval of data are two key database management problems.
Programming Languages • Databases
no code implementations • 28 Sep 2018 • Saravanan Thirumuruganathan, Shameem A Puthiya Parambath, Mourad Ouzzani, Nan Tang, Shafiq Joty
Entity resolution (ER) is one of the fundamental problems in data integration, where machine learning (ML) based classifiers often provide the state-of-the-art results.
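As a minimal example of an ML-based matcher of the kind such ER pipelines rely on; the similarity features and the tiny synthetic training set are assumptions:

```python
# Featurize record pairs with simple similarity scores and train a
# logistic-regression matcher; real ER pipelines use richer features.
from difflib import SequenceMatcher
from sklearn.linear_model import LogisticRegression

def features(a: str, b: str) -> list[float]:
    sim = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return [sim, abs(len(a) - len(b))]

pairs = [
    ("iPhone 13 128GB", "Apple iPhone 13 (128GB)", 1),
    ("iPhone 13 128GB", "Galaxy S21", 0),
    ("ThinkPad X1 Gen9", "Lenovo ThinkPad X1 Carbon Gen 9", 1),
    ("ThinkPad X1 Gen9", "MacBook Air M2", 0),
]
X = [features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]

clf = LogisticRegression().fit(X, y)
print(clf.predict([features("iPhone 13, 128 GB", "Apple iPhone 13")]))
```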
3 code implementations • 2 Oct 2017 • Muhammad Ebraheem, Saravanan Thirumuruganathan, Shafiq Joty, Mourad Ouzzani, Nan Tang
Leveraging deep learning techniques (in particular, word embeddings), we present a novel ER system, called DeepER, that achieves good accuracy, high efficiency, as well as ease of use (i.e., much less human effort).
Databases