Search Results for author: Nan Tang

Found 34 papers, 15 papers with code

DataMosaic: Explainable and Verifiable Multi-Modal Data Analytics through Extract-Reason-Verify

no code implementations · 14 Apr 2025 · Zhengxuan Zhang, Zhuowen Liang, Yin Wu, Teng Lin, Yuyu Luo, Nan Tang

Large Language Models (LLMs) are transforming data analytics, but their widespread adoption is hindered by two critical limitations: they are not explainable (opaque reasoning processes) and not verifiable (prone to hallucinations and unchecked errors).

RAG

EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing

no code implementations · 28 Mar 2025 · Yizhang Zhu, Runzhi Jiang, Boyan Li, Nan Tang, Yuyu Luo

Text-to-SQL automatically translates natural language queries to SQL, allowing non-technical users to retrieve data from databases without specialized SQL knowledge.

Natural Language Queries, Text-To-SQL

DeepFund: Will LLM be Professional at Fund Investment? A Live Arena Perspective

no code implementations · 24 Mar 2025 · Changlun Li, Yao Shi, Yuyu Luo, Nan Tang

Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, but their effectiveness in financial decision making, particularly in fund investment, remains inadequately evaluated.

Decision Making

nvBench 2.0: A Benchmark for Natural Language to Visualization under Ambiguity

no code implementations · 17 Mar 2025 · Tianqi Luo, Chuhan Huang, Leixian Shen, Boyan Li, Shuyu Shen, Wei Zeng, Nan Tang, Yuyu Luo

Natural Language to Visualization (NL2VIS) enables users to create visualizations from natural language queries, making data insights more accessible.

Natural Language Queries, valid

SRAG: Structured Retrieval-Augmented Generation for Multi-Entity Question Answering over Wikipedia Graph

no code implementations · 3 Mar 2025 · Teng Lin, Yizhang Zhu, Yuyu Luo, Nan Tang

The effectiveness of current retrieval-augmented generation (RAG) methods is limited by the LLMs' capacity to aggregate insights from numerous pages.

Question Answering, RAG, +1

Fine-Grained Retrieval-Augmented Generation for Visual Question Answering

no code implementations · 28 Feb 2025 · Zhengxuan Zhang, Yin Wu, Yuyu Luo, Nan Tang

Visual Question Answering (VQA) focuses on providing answers to natural language questions by utilizing information from images.

Question Answering, RAG, +2

RAMer: Reconstruction-based Adversarial Model for Multi-party Multi-modal Multi-label Emotion Recognition

1 code implementation · 9 Feb 2025 · Xudong Yang, Yizhang Zhu, Nan Tang, Yuyu Luo

Conventional multi-modal multi-label emotion recognition (MMER) from videos typically assumes full availability of visual, textual, and acoustic modalities.

Contrastive Learning, Emotion Recognition, +1

SketchFill: Sketch-Guided Code Generation for Imputing Derived Missing Values

no code implementations · 26 Dec 2024 · Yunfan Zhang, Changlun Li, Yuyu Luo, Nan Tang

This gap underscores the need for further advancements in LLM methodologies to enhance their reasoning capabilities for more reliable imputation outcomes.

Code Generation, Imputation, +2

AskChart: Universal Chart Understanding through Textual Enhancement

1 code implementation · 26 Dec 2024 · Xudong Yang, Yifan Wu, Yizhang Zhu, Nan Tang, Yuyu Luo

To effectively train AskChart, we design a three-stage training strategy to align visual and textual modalities for learning robust visual-textual representations and optimizing the learning of the MoE layer.

Chart Understanding

Review-Then-Refine: A Dynamic Framework for Multi-Hop Question Answering with Temporal Adaptability

no code implementations · 19 Dec 2024 · Xiangsen Chen, Xuming Hu, Nan Tang

Despite this progress, existing RAG frameworks, which usually follow the retrieve-then-read paradigm, often struggle with multi-hop QA involving temporal information because they have difficulty retrieving and synthesizing accurate time-related information.

Multi-hop Question Answering, Question Answering, +2

AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework

no code implementations · 10 Dec 2024 · Meihao Fan, Ju Fan, Nan Tang, Lei Cao, Guoliang Li, Xiaoyong Du

Many of these tables are derived from web sources or real-world scenarios, which require meticulous data preparation (or data prep) to ensure accurate responses.

Code Generation, Large Language Model, +1

A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?

1 code implementation · 9 Aug 2024 · Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang, Yuyu Luo

Translating users' natural language queries (NL) into SQL queries (i.e., NL2SQL, a.k.a. Text-to-SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications.

Natural Language Queries, Text-To-SQL

Are Large Language Models a Good Replacement of Taxonomies?

1 code implementation · 17 Jun 2024 · Yushi Sun, Hao Xin, Kai Sun, Yifan Ethan Xu, Xiao Yang, Xin Luna Dong, Nan Tang, Lei Chen

Unfortunately, there is no comprehensive benchmark that evaluates LLMs over a wide range of taxonomies, from common to specialized domains and at levels from root to leaf, from which we could draw a confident conclusion.

General Knowledge, Knowledge Graphs

HAIChart: Human and AI Paired Visualization System

2 code implementations · 16 Jun 2024 · Yupeng Xie, Yuyu Luo, Guoliang Li, Nan Tang

The growing importance of data visualization in business intelligence and data science emphasizes the need for tools that can efficiently generate meaningful visualizations from large datasets.

Data Visualization

Are Large Language Models Good Statisticians?

1 code implementation · 12 Jun 2024 · Yizhang Zhu, Shiyin Du, Boyan Li, Yuyu Luo, Nan Tang

Large Language Models (LLMs) have demonstrated impressive capabilities across a range of scientific tasks including mathematics, physics, and chemistry.

In-Context Learning

CRAG -- Comprehensive RAG Benchmark

1 code implementation · 7 Jun 2024 · Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, Wen-tau Yih, Xin Luna Dong

To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering benchmark of 4,409 question-answer pairs and mock APIs to simulate web and Knowledge Graph (KG) search.

Hallucination, Language Modelling, +2

ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering

no code implementations · 11 May 2024 · Yifan Wu, Lutao Yan, Leixian Shen, Yunhai Wang, Nan Tang, Yuyu Luo

To further explore the limitations of MLLMs in low-level ChartQA, we conduct experiments that alter visual elements of charts (e.g., changing color schemes, adding image noise) to assess their impact on the task effectiveness.

Chart Question Answering, Question Answering

Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts

no code implementations · 14 Apr 2024 · Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang, Yang Yu

Furthermore, KALM effectively enables the LLM to comprehend environmental dynamics, resulting in the generation of meaningful imaginary rollouts that reflect novel skills and demonstrate the seamless integration of large language models and reinforcement learning.

Language Modeling, Language Modelling, +4

Empowering Language Models with Active Inquiry for Deeper Understanding

no code implementations · 6 Feb 2024 · Jing-Cheng Pang, Heng-Bo Fan, Pengyuan Wang, Jia-Hao Xiao, Nan Tang, Si-Hang Yang, Chengxing Jia, Sheng-Jun Huang, Yang Yu

The rise of large language models (LLMs) has revolutionized the way that we interact with artificial intelligence systems through natural language.

Active Learning, Language Modeling, +2

Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration

1 code implementation · 7 Dec 2023 · Meihao Fan, Xiaoyue Han, Ju Fan, Chengliang Chai, Nan Tang, Guoliang Li, Xiaoyong Du

However, existing ICL approaches to ER typically necessitate providing a task description and a set of demonstrations for each entity pair and thus have limitations on the monetary cost of interfacing LLMs.

Entity Resolution, In-Context Learning

SEED: Domain-Specific Data Curation With Large Language Models

no code implementations · 1 Oct 2023 · Zui Chen, Lei Cao, Sam Madden, Tim Kraska, Zeyuan Shang, Ju Fan, Nan Tang, Zihui Gu, Chunwei Liu, Michael Cafarella

As a result, data scientists often have to develop domain-specific solutions tailored to both the dataset and the task, e.g., writing domain-specific code or training machine learning models on a sufficient number of annotated examples.

Code Generation, Imputation, +1

VerifAI: Verified Generative AI

no code implementations · 6 Jul 2023 · Nan Tang, Chenyu Yang, Ju Fan, Lei Cao, Yuyu Luo, Alon Halevy

We propose that verifying the outputs of generative AI from a data management perspective is an emerging issue for generative AI.

Decision Making, Knowledge Graphs, +2

Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation

1 code implementation · 15 Jun 2023 · Zihui Gu, Ju Fan, Nan Tang, Songyue Zhang, Yuxin Zhang, Zui Chen, Lei Cao, Guoliang Li, Sam Madden, Xiaoyong Du

PLMs can perform well in schema alignment but struggle to achieve complex reasoning, while LLMs are superior in complex reasoning tasks but cannot achieve precise schema alignment.

Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration

1 code implementation · SIGMOD/PODS 2023 · Jianhong Tu, Ju Fan, Nan Tang, Peng Wang, Guoliang Li, Xiaoyong Du, Xiaofeng Jia, Song Gao

The widely used practice is to build task-specific or even dataset-specific solutions, which are hard to generalize and disable the opportunities of knowledge sharing that can be learned from different datasets and multiple tasks.

Entity Resolution, Zero-Shot Learning

ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions

no code implementations · 7 Apr 2023 · Sibei Chen, Hanbing Liu, Weiting Jin, Xiangyu Sun, Xiaoyao Feng, Ju Fan, Xiaoyong Du, Nan Tang

Orchestrating a high-quality data preparation program is essential for successful machine learning (ML), but it is known to be time and effort consuming.

RetClean: Retrieval-Based Data Cleaning Using Foundation Models and Data Lakes

no code implementations · 29 Mar 2023 · Zan Ahmad Naeem, Mohammad Shahmeer Ahmad, Mohamed Eltabakh, Mourad Ouzzani, Nan Tang

To assist with this scenario, we developed a custom RoBERTa-based foundation model that can be locally deployed.

Retrieval

PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training

1 code implementation · 5 Nov 2022 · Zihui Gu, Ju Fan, Nan Tang, Preslav Nakov, Xiaoman Zhao, Xiaoyong Du

In particular, on the complex set of TabFact, which contains multiple operations, PASTA largely outperforms the previous state of the art by 4.7 points (85.6% vs. 80.9%), and the gap between PASTA and human performance on the small TabFact test set is narrowed to just 1.5 points (90.6% vs. 92.1%).

Fact Checking, Fact Verification, +5

RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation

no code implementations · 4 Dec 2020 · Nan Tang, Ju Fan, Fangyi Li, Jianhong Tu, Xiaoyong Du, Guoliang Li, Sam Madden, Mourad Ouzzani

RPT is pre-trained for a tuple-to-tuple model by corrupting the input tuple and then learning a model to reconstruct the original tuple.

All, Decoder, +6

Deductive Optimization of Relational Data Storage

1 code implementation · 8 Mar 2019 · John K. Feser, Samuel Madden, Nan Tang, Armando Solar-Lezama

Optimizing the physical data storage and retrieval of data are two key database management problems.

Programming Languages, Databases

Reuse and Adaptation for Entity Resolution through Transfer Learning

no code implementations · 28 Sep 2018 · Saravanan Thirumuruganathan, Shameem A Puthiya Parambath, Mourad Ouzzani, Nan Tang, Shafiq Joty

Entity resolution (ER) is one of the fundamental problems in data integration, where machine learning (ML) based classifiers often provide the state-of-the-art results.

Entity Resolution, Feature Engineering, +1

DeepER -- Deep Entity Resolution

3 code implementations · 2 Oct 2017 · Muhammad Ebraheem, Saravanan Thirumuruganathan, Shafiq Joty, Mourad Ouzzani, Nan Tang

word embeddings), we present a novel ER system, called DeepER, that achieves good accuracy, high efficiency, as well as ease-of-use (i.e., much less human effort).

Databases
