Search Results for author: Ju Fan

Found 13 papers, 7 papers with code

CodeS: Towards Building Open-source Language Models for Text-to-SQL

1 code implementation • 26 Feb 2024 • Haoyang Li, Jing Zhang, Hanbing Liu, Ju Fan, Xiaokang Zhang, Jun Zhu, Renjie Wei, Hongyan Pan, Cuiping Li, Hong Chen

To address the limitations, we introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B, specifically designed for the text-to-SQL task.

Data Augmentation Domain Adaptation +2

Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration

1 code implementation • 7 Dec 2023 • Meihao Fan, Xiaoyue Han, Ju Fan, Chengliang Chai, Nan Tang, Guoliang Li, Xiaoyong Du

However, existing ICL approaches to ER typically require providing a task description and a set of demonstrations for each entity pair, and thus incur substantial monetary costs when interfacing with LLMs.

Entity Resolution In-Context Learning
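The cost problem described above stems from repeating the task description and demonstrations for every entity pair. A natural remedy, sketched below, is batching many pairs under one shared prompt so that fixed prompt text is amortized across the batch. All prompt wording, demonstrations, and record values here are illustrative, not the paper's actual design.

```python
# Hypothetical sketch: amortize in-context learning cost for entity
# resolution by batching many entity pairs into one prompt, so the
# task description and demonstrations are paid for once per batch.

TASK = "Decide whether each pair of records refers to the same entity."

DEMOS = [
    ("iPhone 13 128GB", "Apple iPhone 13 (128 GB)", "yes"),
    ("Dell XPS 13", "MacBook Air M2", "no"),
]

def build_batched_prompt(pairs):
    """Serialize a batch of entity pairs under one shared task
    description and demonstration set (format is illustrative)."""
    lines = [TASK, "", "Examples:"]
    for a, b, label in DEMOS:
        lines.append(f"A: {a} | B: {b} -> {label}")
    lines.append("")
    lines.append("Now answer for each pair:")
    for i, (a, b) in enumerate(pairs, 1):
        lines.append(f"{i}. A: {a} | B: {b} ->")
    return "\n".join(lines)

prompt = build_batched_prompt([
    ("Sony WH-1000XM4", "Sony WH1000XM4 Headphones"),
    ("Canon EOS R5", "Nikon Z6"),
])
```

With this layout, the per-pair marginal cost is one line of tokens rather than a full prompt, which is the kind of design-space knob the paper explores.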

SEED: Domain-Specific Data Curation With Large Language Models

no code implementations • 1 Oct 2023 • Zui Chen, Lei Cao, Sam Madden, Tim Kraska, Zeyuan Shang, Ju Fan, Nan Tang, Zihui Gu, Chunwei Liu, Michael Cafarella

SEED uses these generated modules to process most of the data records and dynamically decides when the LLM should step in to directly process some individual records, possibly using the data-access modules to retrieve relevant information from the data sources to assist the LLM in solving the task.

Code Generation Imputation +1
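The routing idea in the snippet above can be sketched as a simple dispatcher: a cheap, generated module handles most records and escalates only the ones it cannot resolve to the LLM. The module, lookup table, and fallback below are trivial stand-ins, not SEED's actual interfaces.

```python
# Illustrative sketch of the routing described above: a code-generated
# module processes most records; uncertain records are escalated to
# an LLM. All functions here are placeholder stand-ins.

def generated_module(record):
    """Stand-in for an LLM-generated curation function; returns
    None when it is not confident about the record."""
    lookup = {"USA": "United States", "UK": "United Kingdom"}
    return lookup.get(record.get("country"))  # None -> escalate

def llm_fallback(record):
    """Placeholder for a direct LLM call on hard records."""
    return f"LLM-resolved({record['country']})"

def curate(records):
    results = []
    for r in records:
        out = generated_module(r)
        if out is None:  # dynamic decision: let the LLM step in
            out = llm_fallback(r)
        results.append(out)
    return results

out = curate([{"country": "USA"}, {"country": "Deutschland"}])
```

The design choice is cost-driven: the generated module is nearly free per record, so the LLM is invoked only on the residual hard cases.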

VerifAI: Verified Generative AI

no code implementations • 6 Jul 2023 • Nan Tang, Chenyu Yang, Ju Fan, Lei Cao, Yuyu Luo, Alon Halevy

We argue that verifying the outputs of generative AI from a data management perspective is an emerging and important problem.

Decision Making Knowledge Graphs +2

Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation

1 code implementation • 15 Jun 2023 • Zihui Gu, Ju Fan, Nan Tang, Songyue Zhang, Yuxin Zhang, Zui Chen, Lei Cao, Guoliang Li, Sam Madden, Xiaoyong Du

PLMs can perform well in schema alignment but struggle with complex reasoning, while LLMs are superior at complex reasoning but cannot achieve precise schema alignment.
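One way to picture the interleaving this paper motivates: a smaller PLM-style component first aligns the question to the relevant schema elements, and the LLM then reasons over that reduced schema to produce SQL. Both components below are trivial stand-ins for illustration, not the paper's models.

```python
# Hypothetical sketch of interleaving: a PLM-style schema linker
# narrows the schema, then an LLM prompt is built over the reduced
# schema for the complex-reasoning step. Stand-in logic only.

def plm_schema_align(question, schema):
    """Stand-in schema linker: keep columns whose (de-pluralized)
    names appear in the question text."""
    q = question.lower()
    return [c for c in schema if c.lower().rstrip("s") in q]

def build_llm_prompt(question, linked_schema):
    cols = ", ".join(linked_schema)
    return (f"Relevant columns: {cols}\n"
            f"Question: {question}\nSQL:")

schema = ["name", "salary", "department", "hire_date"]
q = "What is the average salary by department?"
linked = plm_schema_align(q, schema)
prompt = build_llm_prompt(q, linked)
```

The division of labor mirrors the snippet above: precise alignment where the PLM is strong, open-ended reasoning where the LLM is strong.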

Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration

1 code implementation • SIGMOD/PODS 2023 • Jianhong Tu, Ju Fan, Nan Tang, Peng Wang, Guoliang Li, Xiaoyong Du, Xiaofeng Jia, Song Gao

The widely used practice is to build task-specific or even dataset-specific solutions, which are hard to generalize and forgo opportunities for knowledge sharing across different datasets and tasks.

Entity Resolution Zero-Shot Learning
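The unification idea can be sketched concretely: cast every matching task (entity resolution, schema matching, and so on) as a binary decision over a serialized pair of objects, so one model can train across tasks. The serialization format and task tags below are made up for illustration.

```python
# Minimal sketch of "one model for many matching tasks": every task
# becomes a match/no-match decision over a serialized object pair.
# Serialization format and task tags are illustrative choices.

def serialize(obj):
    """Flatten a record (dict) or a column (list) into one string."""
    if isinstance(obj, dict):
        return " ".join(f"{k}: {v}" for k, v in obj.items())
    return " ".join(str(v) for v in obj)

def to_pair_input(left, right, task):
    """Unified model input: task tag plus the two serialized objects."""
    return f"[{task}] {serialize(left)} [SEP] {serialize(right)}"

# Entity resolution: two records
er = to_pair_input({"name": "iPhone 13", "brand": "Apple"},
                   {"name": "Apple iPhone 13"}, task="ER")

# Schema matching: two columns of values
sm = to_pair_input(["NYC", "LA"], ["New York", "Los Angeles"],
                   task="schema-matching")
```

Because both tasks share one input shape, supervision from one dataset can transfer to others, which is precisely the knowledge sharing the snippet says per-dataset solutions forgo.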

ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions

no code implementations • 7 Apr 2023 • Sibei Chen, Hanbing Liu, Weiting Jin, Xiangyu Sun, Xiaoyao Feng, Ju Fan, Xiaoyong Du, Nan Tang

Orchestrating a high-quality data preparation program is essential for successful machine learning (ML), but it is known to be time- and effort-consuming.

Contextual Expressive Text-to-Speech

no code implementations • 26 Nov 2022 • Jianhong Tu, Zeyu Cui, Xiaohuan Zhou, Siqi Zheng, Kai Hu, Ju Fan, Chang Zhou

To achieve this task, we construct a synthetic dataset and develop an effective framework.

Speech Synthesis

PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training

1 code implementation • 5 Nov 2022 • Zihui Gu, Ju Fan, Nan Tang, Preslav Nakov, Xiaoman Zhao, Xiaoyong Du

In particular, on the complex set of TabFact, which contains multiple operations, PASTA largely outperforms the previous state of the art by 4.7 points (85.6% vs. 80.9%), and the gap between PASTA and human performance on the small TabFact test set is narrowed to just 1.5 points (90.6% vs. 92.1%).

Fact Checking Fact Verification +5
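The title's sentence-table cloze objective can be illustrated simply: mask sentence tokens that also occur as cell values in the paired table, so the model must consult the table to fill the blanks. The tokenization and masking rule below are deliberate simplifications, not PASTA's actual pre-training recipe.

```python
# Rough sketch of sentence-table cloze pre-training: mask sentence
# tokens that appear as data-cell values in the paired table, forcing
# table-aware reconstruction. Simplified illustration only.

def table_cloze(sentence, table):
    """Mask sentence tokens found among the table's data cells
    (header row excluded)."""
    cells = {str(v).lower() for row in table[1:] for v in row}
    tokens = sentence.split()
    masked = [("[MASK]" if t.lower().strip(".,") in cells else t)
              for t in tokens]
    return " ".join(masked)

table = [["Team", "Wins"], ["Tigers", "12"], ["Lions", "9"]]
out = table_cloze("The Tigers recorded 12 wins.", table)
```

Filling such blanks requires locating the right row and column, which is how cloze pre-training teaches table-operation awareness.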

RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation

no code implementations • 4 Dec 2020 • Nan Tang, Ju Fan, Fangyi Li, Jianhong Tu, Xiaoyong Du, Guoliang Li, Sam Madden, Mourad Ouzzani

RPT is pre-trained as a tuple-to-tuple model by corrupting the input tuple and learning to reconstruct the original tuple.

Denoising Entity Resolution +4
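The corrupt-then-reconstruct objective above is a denoising setup, which can be sketched as follows. The mask token, corruption rate, and attribute values are illustrative choices, not RPT's exact configuration.

```python
import random

# Sketch of the tuple-to-tuple denoising objective: mask some
# attribute values of a tuple, then train a model to map the
# corrupted tuple back to the original. Parameters are illustrative.

MASK = "[MASK]"

def corrupt_tuple(tup, mask_prob=0.3, rng=None):
    """Return a corrupted copy of the tuple with values masked
    independently with probability mask_prob."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    return [MASK if rng.random() < mask_prob else v for v in tup]

original = ["Alice", "Smith", "Boston", "engineer"]
corrupted = corrupt_tuple(original)
# Training pair: (corrupted, original) -- the model learns to
# reconstruct the original tuple, seq2seq style.
```

This is the same recipe as denoising autoencoders for text, applied at the granularity of relational tuples, which is why tasks like imputation and entity resolution fall out of it.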

Relational Data Synthesis using Generative Adversarial Networks: A Design Space Exploration

1 code implementation • 28 Aug 2020 • Ju Fan, Tongyu Liu, Guoliang Li, Junyou Chen, Yuwei Shen, Xiaoyong Du

We conduct extensive experiments to explore the design space and compare with traditional data synthesis approaches.

Privacy Preserving
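One design choice such a design-space exploration must settle is how to encode a relational row as a fixed-length numeric vector a GAN can operate on. A common scheme, sketched below with made-up column specs, is min-max scaling for numeric columns and one-hot vectors for categoricals.

```python
# Illustrative row encoding for GAN-based relational data synthesis:
# numeric columns are min-max scaled, categorical columns one-hot
# encoded. Column specs and values are made up for this sketch.

def encode_row(row, spec):
    """spec maps column -> ('num', lo, hi) or ('cat', categories)."""
    vec = []
    for col, s in spec.items():
        if s[0] == "num":
            _, lo, hi = s
            vec.append((row[col] - lo) / (hi - lo))  # scale to [0, 1]
        else:
            _, cats = s
            vec.extend(1.0 if row[col] == c else 0.0 for c in cats)
    return vec

spec = {"age": ("num", 0, 100),
        "city": ("cat", ["Beijing", "Boston", "Paris"])}
v = encode_row({"age": 25, "city": "Boston"}, spec)
```

The generator then emits vectors in this space and the encoding is inverted to recover synthetic rows; the choice of encoding is one axis of the design space the paper compares.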

Crowd-Powered Data Mining

no code implementations • 13 Jun 2018 • Chengliang Chai, Ju Fan, Guoliang Li, Jiannan Wang, Yudian Zheng

Many data mining tasks cannot be completely addressed by automated processes, such as sentiment analysis and image classification.

Clustering General Classification +3
