Context-aware Visual Attention-based (CoVA) webpage object detection pipeline

Introduced by Kumar et al. in CoVA: Context-aware Visual Attention for Webpage Information Extraction

The Context-aware Visual Attention-based (CoVA) end-to-end pipeline for webpage object detection aims to learn a function $f$ that predicts labels $y = [y_1, y_2, ..., y_N]$ for a webpage containing $N$ elements. The input to CoVA consists of: 1. a screenshot of the webpage, 2. a list of bounding boxes [x, y, w, h] of the web elements, and 3. neighborhood information for each element, obtained from the DOM tree.
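As a rough illustration of the three inputs, the sketch below bundles them into a single container. The class name `WebpageSample`, its field names, and the file name `page.png` are hypothetical and do not come from the paper or its code; the screenshot is represented only by a path so that the example stays self-contained.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Bounding box as described in the text: [x, y, w, h].
BBox = Tuple[float, float, float, float]


@dataclass
class WebpageSample:
    """Hypothetical container for one CoVA input; field names are illustrative."""
    screenshot_path: str        # 1. screenshot of the webpage (path only, pixels not loaded)
    boxes: List[BBox]           # 2. one [x, y, w, h] box per web element
    neighbors: List[List[int]]  # 3. per element, indices of its neighboring elements

    def validate(self) -> int:
        """Check internal consistency and return N, the number of web elements."""
        n = len(self.boxes)
        if len(self.neighbors) != n:
            raise ValueError("need exactly one neighbor list per bounding box")
        for x, y, w, h in self.boxes:
            if w <= 0 or h <= 0:
                raise ValueError("bounding boxes must have positive width and height")
        for i, ids in enumerate(self.neighbors):
            for k in ids:
                if not 0 <= k < n:
                    raise ValueError(f"element {i} has out-of-range neighbor {k}")
        return n


sample = WebpageSample(
    screenshot_path="page.png",
    boxes=[(0, 0, 100, 20), (0, 30, 100, 200), (0, 240, 50, 20)],
    neighbors=[[1, 2], [0, 2], [0, 1]],
)
num_elements = sample.validate()  # the model would predict one label per element
```

Storing neighbors as index lists keeps the DOM-derived neighborhood separate from the geometry, so the same boxes can be reused with a different neighbor definition.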

This information is processed in four stages: 1. extraction of the graph representation of the webpage, 2. the Representation Network (RN), 3. the Graph Attention Network (GAT), and 4. a fully connected (FC) layer.

The graph representation extraction computes, for every web element $i$, its set of $K$ neighboring web elements $N_i$. The RN consists of a convolutional neural network (CNN) and a positional encoder, which together learn a visual representation $v_i$ for each web element $i \in \{1, ..., N\}$. The GAT combines the visual representation $v_i$ of the web element to be classified with those of its neighbors, i.e., $v_k$ for all $k \in N_i$, to compute the contextual representation $c_i$ for web element $i$. Finally, the visual and contextual representations of the web element are concatenated and passed through the FC layer to obtain the classification output.
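The last two steps can be sketched in plain Python, assuming the visual representations $v_i$ and the neighbor sets $N_i$ are already available. This is a simplification, not the authors' implementation: attention scores here are plain dot products, whereas a real GAT learns its scoring function, and the function names below are illustrative.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextual_representation(
    i: int, visual: List[List[float]], neighbors: List[List[int]]
) -> List[float]:
    """c_i: attention-weighted average of the neighbors' visual representations.

    Assumption: scores are dot products between v_i and v_k; a trained GAT
    would use learned parameters instead.
    """
    dim = len(visual[i])
    if not neighbors[i]:
        return [0.0] * dim  # no context available for an isolated element
    scores = [sum(a * b for a, b in zip(visual[i], visual[k])) for k in neighbors[i]]
    weights = softmax(scores)
    return [
        sum(w * visual[k][d] for w, k in zip(weights, neighbors[i]))
        for d in range(dim)
    ]


def classifier_input(
    i: int, visual: List[List[float]], neighbors: List[List[int]]
) -> List[float]:
    """Concatenate v_i and c_i, i.e. the vector that would be fed to the FC layer."""
    return visual[i] + contextual_representation(i, visual, neighbors)


# Toy example: three elements with 2-dimensional visual representations.
visual = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
neighbors = [[1, 2], [0], [0, 1]]
features = classifier_input(0, visual, neighbors)
```

For element 0, both neighbors receive equal attention weight, so the contextual part equals their shared representation and the concatenated vector has twice the dimension of $v_i$.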

Source: CoVA: Context-aware Visual Attention for Webpage Information Extraction

Tasks

Task	Papers	Share
Object Detection	1	50.00%
Webpage Object Detection	1	50.00%

Components

Component	Type
Fast R-CNN	Object Detection Models
GAT	Graph Models

Categories

Object Detection Models

Webpage Object Detection Pipeline