CoVA (CoVA dataset for Webpage Object Detection / Information Extraction)

Introduced by Kumar et al. in CoVA: Context-aware Visual Attention for Webpage Information Extraction

We labeled 7,740 webpage screenshots spanning 408 domains (Amazon, Walmart, Target, etc.). Each of these webpages contains exactly one labeled price, title, and image. All other web elements are labeled as background. On average, there are 90 web elements in a webpage.
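The labeling rule above (exactly one price, title, and image per page, everything else background) can be checked mechanically. The following is a minimal sketch, assuming each page's annotations have already been loaded into a list of string class names; the names `FOREGROUND`, `BACKGROUND`, and `check_page_labels` are hypothetical, and the released files may encode classes differently.

```python
from collections import Counter

# Hypothetical class names; the released annotations may use other encodings.
FOREGROUND = ("price", "title", "image")
BACKGROUND = "background"


def check_page_labels(labels):
    """Return True if a page has exactly one price, title, and image
    element and every remaining element is labeled as background."""
    counts = Counter(labels)
    one_of_each = all(counts[c] == 1 for c in FOREGROUND)
    only_known = set(counts) <= set(FOREGROUND) | {BACKGROUND}
    return one_of_each and only_known


# A page with 90 elements: 3 labeled targets plus 87 background elements.
page = ["price", "title", "image"] + ["background"] * 87
print(check_page_labels(page))
```

Running a check like this over every page is a cheap way to catch parsing mistakes before training.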

Webpage screenshots and bounding boxes can be obtained here

Train-Val-Test split

We create a cross-domain split which ensures that the train, val, and test sets contain webpages from disjoint sets of domains. Specifically, we construct a 3:1:1 split based on the number of distinct domains. We observed that the top-5 domains (by number of samples) were Amazon, eBay, Walmart, Etsy, and Target. So, we created 5 different splits for 5-fold cross-validation such that each of the major domains is present in the test data of one of the 5 splits.
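One way to build such splits is sketched below. This is an illustration under stated assumptions, not the authors' procedure: the function name `cross_domain_folds` is hypothetical, the non-major domains are dealt randomly into five groups, and the validation group for each fold is simply the group after the test group. The description above does not say how these choices were actually made.

```python
import random
from collections import Counter


def cross_domain_folds(page_domains, n_folds=5, seed=0):
    """Build domain-disjoint train/val/test folds.

    page_domains: list of domain names, one entry per webpage.
    Returns a list of dicts holding 'train', 'val', and 'test' domain sets.
    """
    counts = Counter(page_domains)
    if len(counts) < n_folds:
        raise ValueError("need at least one domain per fold")

    # The n_folds domains with the most samples are the "major" domains.
    major = [d for d, _ in counts.most_common(n_folds)]
    minor = sorted(set(counts) - set(major))
    random.Random(seed).shuffle(minor)

    # One group per fold: each starts with one major domain, then the
    # remaining domains are dealt round-robin so sizes differ by at most one.
    groups = [[m] for m in major]
    for i, d in enumerate(minor):
        groups[i % n_folds].append(d)

    folds = []
    for k in range(n_folds):
        test = set(groups[k])
        # Assumption: the next group serves as validation for this fold.
        val = set(groups[(k + 1) % n_folds])
        train = set(counts) - test - val
        folds.append({"train": train, "val": val, "test": test})
    return folds
```

With five groups of roughly equal size, using three for training and one each for validation and testing gives the 3:1:1 ratio over distinct domains, and each major domain lands in the test set of one fold. Pages are then assigned to a set by looking up their domain.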

Benchmarks

Task	Dataset Variant	Best Model
Webpage Object Detection	CoVA	CoVA++

Dataset Loaders

kevalmorabia97/cova-web-object-detection

Tasks

Webpage Object Detection
