Stanford Schema2QA Dataset

Introduced by Xu et al. in Schema2QA: High-Quality and Low-Cost Q&A Agents for the Structured Web

Schema2QA is the first large question answering dataset over real-world Schema.org data. It covers 6 common domains: restaurants, hotels, people, movies, books, and music, based on crawled Schema.org metadata from 6 different websites (Yelp, Hyatt, LinkedIn, IMDb, Goodreads, and last.fm). In total, there are over 2,000,000 training examples, consisting of both augmented human paraphrase data and high-quality synthetic data generated by Genie. All questions are annotated with ThingTalk, an executable virtual assistant programming language.

Schema2QA includes challenging evaluation questions collected from crowd workers. Workers are told only the domain and which properties are supported, so the sentences are natural and diverse. They also contain entities unseen during training. The collected sentences are manually annotated with ThingTalk by the authors. In total, there are over 5,000 examples across the dev and test sets.

An example of an evaluation question and its ThingTalk annotation is shown below:

"What are the highest ranked burger joints in the 40 mile area around Asheville NC?"

sort(aggregateRating.ratingValue desc of @org.schema.Restaurant.Restaurant()
  filter distance(geo, new Location("asheville nc")) <= 40 mi &&
         servesCuisine =~ "burger")[1];
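
To make the meaning of the annotation above more concrete, the following Python sketch mimics the same query over a small in-memory list. It is only an illustration and not part of Schema2QA, Genie, or the ThingTalk runtime. It assumes that `=~` behaves like a case-insensitive substring match and that `[1]` selects the first result after sorting. The `Restaurant` class, the `top_rated_nearby` helper, the sample records, and the approximate coordinates used for Asheville are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Restaurant:
    # Hypothetical record loosely mirroring the Schema.org properties in the query.
    name: str
    lat: float
    lon: float
    serves_cuisine: str
    rating_value: float


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles, using the haversine formula."""
    earth_radius_miles = 3958.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))


def top_rated_nearby(restaurants, center, cuisine, max_miles, limit=1):
    """Filter by distance and cuisine, sort by rating (descending), keep `limit` rows."""
    lat0, lon0 = center
    matches = [
        r for r in restaurants
        if miles_between(r.lat, r.lon, lat0, lon0) <= max_miles
        # Assumed reading of `=~`: case-insensitive substring match.
        and cuisine.lower() in r.serves_cuisine.lower()
    ]
    matches.sort(key=lambda r: r.rating_value, reverse=True)
    # Assumed reading of `[1]`: the first element of the sorted result.
    return matches[:limit]


# Approximate coordinates for Asheville, NC (an assumption for this sketch).
ASHEVILLE = (35.5951, -82.5515)

# Made-up sample data, not taken from the dataset.
SAMPLE = [
    Restaurant("Burger A", 35.60, -82.55, "Burgers, American", 4.2),
    Restaurant("Burger B", 35.45, -82.40, "burger", 4.7),
    Restaurant("Taco C", 35.59, -82.56, "Mexican", 4.9),    # wrong cuisine
    Restaurant("Burger D", 36.85, -82.55, "burger", 5.0),   # more than 40 miles away
]

if __name__ == "__main__":
    for r in top_rated_nearby(SAMPLE, ASHEVILLE, "burger", 40):
        print(r.name, r.rating_value)
```

In this sketch the three clauses of the ThingTalk program map onto three separate steps (filter, sort, index), which is one way to read why the annotation is described as executable: each part of the question corresponds to an operation over structured records.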

