Search Results for author: Longxu Dou

Found 25 papers, 17 papers with code

Can Large Language Models Understand You Better? An MBTI Personality Detection Dataset Aligned with Population Traits

1 code implementation 17 Dec 2024 Bohan Li, Jiannan Guan, Longxu Dou, Yunlong Feng, Dingzirui Wang, Yang Xu, Enbo Wang, Qiguang Chen, Bichen Wang, Xiao Xu, Yimeng Zhang, Libo Qin, Yanyan Zhao, Qingfu Zhu, Wanxiang Che

In this paper, we optimize the task by constructing MBTIBench, the first manually annotated high-quality MBTI personality detection dataset with soft labels, under the guidance of psychologists.

SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages

1 code implementation 2 Dec 2024 Jia Guo, Longxu Dou, Guangtao Zeng, Stanley Kok, Wei Lu, Qian Liu

In this paper, we introduce SailCompass, a reproducible and robust evaluation benchmark for assessing Large Language Models (LLMs) on Southeast Asian Languages (SEA).

Multiple-choice

In-Context Transfer Learning: Demonstration Synthesis by Transferring Similar Tasks

1 code implementation 2 Oct 2024 Dingzirui Wang, Xuanliang Zhang, Qiguang Chen, Longxu Dou, Xiao Xu, Rongyu Cao, Yingwei Ma, Qingfu Zhu, Wanxiang Che, Binhua Li, Fei Huang, Yongbin Li

To address this, inspired by transfer learning, we propose In-Context Transfer Learning (ICTL), which synthesizes target task demonstrations by transferring labeled demonstrations from similar source tasks.

In-Context Learning Transfer Learning

DAC: Decomposed Automation Correction for Text-to-SQL

1 code implementation 16 Aug 2024 Dingzirui Wang, Longxu Dou, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che

Therefore, in this paper, we propose to employ the decomposed correction to enhance text-to-SQL performance.

Entity Linking Text-To-SQL

FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats

1 code implementation 16 Aug 2024 Xuanliang Zhang, Dingzirui Wang, Longxu Dou, Baoxin Wang, Dayong Wu, Qingfu Zhu, Wanxiang Che

Most existing methods employ a fixed tabular format to represent the table, which could limit the performance.

Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies

1 code implementation 18 Jul 2024 Chaofan Tao, Qian Liu, Longxu Dou, Niklas Muennighoff, Zhongwei Wan, Ping Luo, Min Lin, Ngai Wong

We investigate how vocabulary size impacts LLM scaling laws by training models ranging from 33M to 3B parameters on up to 500B characters with various vocabulary configurations.

ARC

RegMix: Data Mixture as Regression for Language Model Pre-training

1 code implementation 1 Jul 2024 Qian Liu, Xiaosen Zheng, Niklas Muennighoff, Guangtao Zeng, Longxu Dou, Tianyu Pang, Jing Jiang, Min Lin

With the fitted regression model, we simulate the top-ranked mixture and use it to train a large-scale model with orders of magnitude more compute.

Common Sense Reasoning Language Modeling +3

Sailor: Open Language Models for South-East Asia

3 code implementations 4 Apr 2024 Longxu Dou, Qian Liu, Guangtao Zeng, Jia Guo, Jiahui Zhou, Wei Lu, Min Lin

We present Sailor, a family of open language models ranging from 0.5B to 7B parameters, tailored for South-East Asian (SEA) languages.

Language Modeling Language Modelling +2

Improving Demonstration Diversity by Human-Free Fusing for Text-to-SQL

1 code implementation 16 Feb 2024 Dingzirui Wang, Longxu Dou, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che

Currently, the in-context learning method based on large language models (LLMs) has become the mainstream of text-to-SQL research.

Diversity In-Context Learning +1

Enhancing Numerical Reasoning with the Guidance of Reliable Reasoning Processes

no code implementations 16 Feb 2024 Dingzirui Wang, Longxu Dou, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che

Numerical reasoning is an essential ability for NLP systems to handle numeric information.

A Survey of Table Reasoning with Large Language Models

1 code implementation 13 Feb 2024 Xuanliang Zhang, Dingzirui Wang, Longxu Dou, Qingfu Zhu, Wanxiang Che

In this paper, we analyze the mainstream techniques used to improve table reasoning performance in the LLM era, and the advantages of LLMs compared to pre-LLMs for solving table reasoning.

Survey

Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning

1 code implementation 21 Aug 2023 Dingzirui Wang, Longxu Dou, Wenbin Zhang, Junyu Zeng, Wanxiang Che

So in this paper, we try to use equations as IMRs to solve the numerical reasoning task by addressing two problems: (1) Theoretically, how to prove that the equation is an IMR with higher generation accuracy than programs; (2) Empirically, how to improve the generation accuracy of equations with LLMs.

GSM8K

Controllable Data Augmentation for Context-Dependent Text-to-SQL

no code implementations 27 Apr 2023 Dingzirui Wang, Longxu Dou, Wanxiang Che

In this paper, we introduce ConDA, which generates interactive questions and corresponding SQL results.

Data Augmentation Diversity +1

MixPro: Simple yet Effective Data Augmentation for Prompt-based Learning

no code implementations 19 Apr 2023 Bohan Li, Longxu Dou, Yutai Hou, Yunlong Feng, Honglin Mu, Qingfu Zhu, Qinghua Sun, Wanxiang Che

Prompt-based learning has shown considerable promise in reformulating various downstream tasks as cloze problems by combining original input with a predetermined template.

Data Augmentation Few-Shot Learning +1

From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning

1 code implementation 17 Apr 2023 Qian Liu, Fan Zhou, Zhengbao Jiang, Longxu Dou, Min Lin

Empirical results on various benchmarks validate that the integration of SQL execution leads to significant improvements in zero-shot scenarios, particularly in table reasoning.

MMLU Zero-shot Generalization

Towards Knowledge-Intensive Text-to-SQL Semantic Parsing with Formulaic Knowledge

1 code implementation 3 Jan 2023 Longxu Dou, Yan Gao, Xuqi Liu, Mingyang Pan, Dingzirui Wang, Wanxiang Che, Dechen Zhan, Min-Yen Kan, Jian-Guang Lou

In this paper, we study the problem of knowledge-intensive text-to-SQL, in which domain knowledge is necessary to parse expert questions into SQL queries over domain-specific tables.

Text-To-SQL

A Survey on Table-and-Text HybridQA: Concepts, Methods, Challenges and Future Directions

no code implementations 27 Dec 2022 Dingzirui Wang, Longxu Dou, Wanxiang Che

Table-and-text hybrid question answering (HybridQA) is a widely used and challenging NLP task commonly applied in the financial and scientific domains.

Question Answering Survey

MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

1 code implementation 27 Dec 2022 Longxu Dou, Yan Gao, Mingyang Pan, Dingzirui Wang, Wanxiang Che, Dechen Zhan, Jian-Guang Lou

Text-to-SQL semantic parsing is an important NLP task, which greatly facilitates the interaction between users and the database and becomes the key component in many human-computer interaction systems.

Benchmarking Text-To-SQL

UniSAr: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL

1 code implementation 15 Mar 2022 Longxu Dou, Yan Gao, Mingyang Pan, Dingzirui Wang, Wanxiang Che, Dechen Zhan, Jian-Guang Lou

Existing text-to-SQL semantic parsers are typically designed for particular settings, such as handling queries that span multiple tables, domains, or turns, which makes them ineffective when applied to different settings.

Language Modeling Language Modelling +1

HIT-SCIR at MRP 2020: Transition-based Parser and Iterative Inference Parser

no code implementations CoNLL 2020 Longxu Dou, Yunlong Feng, Yuqiu Ji, Wanxiang Che, Ting Liu

This paper describes our submission system (HIT-SCIR) for the CoNLL 2020 shared task: Cross-Framework and Cross-Lingual Meaning Representation Parsing.
