Improving Text-to-SQL Evaluation Methodology

To be informative, an evaluation must measure how well systems generalize to realistic unseen data. We identify limitations of and propose improvements to current evaluations of text-to-SQL systems...

Published at ACL 2018.

Results from the Paper


TASK         DATASET      MODEL                 METRIC NAME     METRIC VALUE  GLOBAL RANK
SQL Parsing  Academic     Template Baseline     Question Split  0             #3
SQL Parsing  Academic     Template Baseline     Query Split     0             #3
SQL Parsing  Academic     Seq2Seq with copying  Question Split  81            #1
SQL Parsing  Academic     Seq2Seq with copying  Query Split     74            #1
SQL Parsing  Advising     Template Baseline     Question Split  80            #1
SQL Parsing  Advising     Template Baseline     Query Split     0             #2
SQL Parsing  Advising     Seq2Seq with copying  Question Split  70            #2
SQL Parsing  Advising     Seq2Seq with copying  Query Split     0             #2
SQL Parsing  ATIS         Template Baseline     Question Split  45            #2
SQL Parsing  ATIS         Template Baseline     Query Split     0             #3
SQL Parsing  ATIS         Seq2Seq with copying  Question Split  51            #1
SQL Parsing  ATIS         Seq2Seq with copying  Query Split     32            #1
SQL Parsing  GeoQuery     Seq2Seq with copying  Question Split  71            #1
SQL Parsing  GeoQuery     Seq2Seq with copying  Query Split     20            #2
SQL Parsing  GeoQuery     Template Baseline     Question Split  66            #2
SQL Parsing  GeoQuery     Template Baseline     Query Split     0             #3
SQL Parsing  IMDb         Template Baseline     Question Split  0             #3
SQL Parsing  IMDb         Template Baseline     Query Split     0             #3
SQL Parsing  IMDb         Seq2Seq with copying  Question Split  26            #1
SQL Parsing  IMDb         Seq2Seq with copying  Query Split     9             #1
SQL Parsing  Restaurants  Template Baseline     Question Split  95            #3
SQL Parsing  Restaurants  Template Baseline     Query Split     0             #3
SQL Parsing  Restaurants  Seq2Seq with copying  Question Split  100           #1
SQL Parsing  Restaurants  Seq2Seq with copying  Query Split     4             #2
SQL Parsing  Scholar      Template Baseline     Question Split  52            #2
SQL Parsing  Scholar      Template Baseline     Query Split     0             #3
SQL Parsing  Scholar      Seq2Seq with copying  Question Split  59            #1
SQL Parsing  Scholar      Seq2Seq with copying  Query Split     5             #1
SQL Parsing  Yelp         Seq2Seq with copying  Question Split  12            #1
SQL Parsing  Yelp         Seq2Seq with copying  Query Split     4             #2
SQL Parsing  Yelp         Template Baseline     Question Split  1             #3
SQL Parsing  Yelp         Template Baseline     Query Split     0             #3
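
The two metric names above refer to two ways of dividing a dataset into training and test sets. The sketch below is one possible illustration, not the paper's code. It assumes that a "question split" assigns individual question/query pairs at random, so the same SQL query can appear on both sides, while a "query split" assigns whole groups of examples that share a SQL template, so no template appears on both sides. The toy data, the function names, and the use of the raw SQL string as the template key are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical toy data: (question, anonymized SQL template) pairs.
EXAMPLES = [
    ("papers by author0", "SELECT title FROM paper WHERE author = 'author0'"),
    ("what did author0 write", "SELECT title FROM paper WHERE author = 'author0'"),
    ("papers from year0", "SELECT title FROM paper WHERE year = year0"),
    ("list year0 papers", "SELECT title FROM paper WHERE year = year0"),
    ("how many papers", "SELECT COUNT(*) FROM paper"),
    ("count all papers", "SELECT COUNT(*) FROM paper"),
]


def question_split(examples, test_fraction, seed=0):
    """Shuffle individual examples; a template may land on both sides."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def query_split(examples, test_fraction, seed=0):
    """Shuffle whole template groups; no template is shared across sides."""
    groups = defaultdict(list)
    for question, template in examples:
        groups[template].append((question, template))
    templates = sorted(groups)
    rng = random.Random(seed)
    rng.shuffle(templates)
    n_test = max(1, round(len(templates) * test_fraction))
    train = [ex for t in templates[n_test:] for ex in groups[t]]
    test = [ex for t in templates[:n_test] for ex in groups[t]]
    return train, test


def shared_templates(train, test):
    """Return the SQL templates that occur in both train and test."""
    return {t for _, t in train} & {t for _, t in test}


if __name__ == "__main__":
    q_train, q_test = question_split(EXAMPLES, 0.34)
    s_train, s_test = query_split(EXAMPLES, 0.34)
    print("question split, shared templates:", len(shared_templates(q_train, q_test)))
    print("query split, shared templates:", len(shared_templates(s_train, s_test)))
```

Under these assumptions, a model that only memorizes templates seen in training cannot produce a correct query on the query split, which is consistent with the Template Baseline scoring 0 on every Query Split row in the table.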
