Search Results for author: Ju Fan

Found 13 papers, 7 papers with code

CodeS: Towards Building Open-source Language Models for Text-to-SQL

1 code implementation • 26 Feb 2024 • Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, Hong Chen

To address the limitations, we introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B, specifically designed for the text-to-SQL task.

Data Augmentation Domain Adaptation +2

Paper
Code

Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration

1 code implementation • 7 Dec 2023 • Meihao Fan, Xiaoyue Han, Ju Fan, Chengliang Chai, Nan Tang, Guoliang Li, Xiaoyong Du

However, existing ICL approaches to ER typically necessitate providing a task description and a set of demonstrations for each entity pair and thus have limitations on the monetary cost of interfacing LLMs.

Entity Resolution In-Context Learning

Paper
Code

SEED: Domain-Specific Data Curation With Large Language Models

no code implementations • 1 Oct 2023 • Zui Chen, Lei Cao, Sam Madden, Tim Kraska, Zeyuan Shang, Ju Fan, Nan Tang, Zihui Gu, Chunwei Liu, Michael Cafarella

SEED uses these generated modules to process most of the data records and dynamically decides when the LLM should step in to directly process some individual records, possibly using the data-access modules to retrieve relevant information from the data sources to assist the LLM in solving the task.

Code Generation Imputation +1

Paper
Add Code

VerifAI: Verified Generative AI

no code implementations • 6 Jul 2023 • Nan Tang, Chenyu Yang, Ju Fan, Lei Cao, Yuyu Luo, Alon Halevy

We propose that verifying the outputs of generative AI from a data management perspective is an emerging issue for generative AI.

Decision Making Knowledge Graphs +2

Paper
Add Code

Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation

1 code implementation • 15 Jun 2023 • Zihui Gu, Ju Fan, Nan Tang, Songyue Zhang, Yuxin Zhang, Zui Chen, Lei Cao, Guoliang Li, Sam Madden, Xiaoyong Du

PLMs can perform well in schema alignment but struggle to achieve complex reasoning, while LLMs is superior in complex reasoning tasks but cannot achieve precise schema alignment.

Paper
Code

Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration

1 code implementation • SIGMOD/PODS 2023 • Jianhong Tu, Ju Fan, Nan Tang, Peng Wang, Guoliang Li, Xiaoyong Du, Xiaofeng Jia, Song Gao

The widely used practice is to build task-specific or even dataset-specific solutions, which are hard to generalize and disable the opportunities of knowledge sharing that can be learned from different datasets and multiple tasks.

Entity Resolution Zero-Shot Learning

Paper
Code

ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions

no code implementations • 7 Apr 2023 • Sibei Chen, Hanbing Liu, Weiting Jin, Xiangyu Sun, Xiaoyao Feng, Ju Fan, Xiaoyong Du, Nan Tang

Orchestrating a high-quality data preparation program is essential for successful machine learning (ML), but it is known to be time and effort consuming.

Paper
Add Code

Contextual Expressive Text-to-Speech

no code implementations • 26 Nov 2022 • Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kai Hu, Ju Fan, Chang Zhou

To achieve this task, we construct a synthetic dataset and develop an effective framework.

Speech Synthesis

Paper
Add Code

PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training

1 code implementation • 5 Nov 2022 • Zihui Gu, Ju Fan, Nan Tang, Preslav Nakov, Xiaoman Zhao, Xiaoyong Du

In particular, on the complex set of TabFact, which contains multiple operations, PASTA largely outperforms the previous state of the art by 4. 7 points (85. 6% vs. 80. 9%), and the gap between PASTA and human performance on the small TabFact test set is narrowed to just 1. 5 points (90. 6% vs. 92. 1%).

Ranked #2 on Table-based Fact Verification on TabFact