EasySpider: A No-Code Visual System for Crawling the Web

The web is a treasure trove for data that is increasingly used by computer scientists for building large machine learning models as well as non-computer scientists for social studies or marketing analyses. As such, web-crawling is an essential tool for both computational and non-computational scientists to conduct research. However, most of the existing web crawler frameworks and software products either require professional coding skills without an easy-to-use graphic user interface or are expensive and limited in features. They are thus not friendly to newbies and inconvenient for complicated web-crawling tasks. In this paper, we present an easy-to-use visual web crawler system, EasySpider, for designing and executing web crawling tasks without coding. The workflow of a new web crawling task can be visually programmed by following EasySpider’s visual wizard on the target webpages using an intuitive point-and-click interface. The generated crawler task can then be easily invoked locally or as a web service. Our EasySpider is cross-platform and flexible to adapt to different web-resources. It also supports advanced configuration for complicated tasks and extension. The whole system is open-sourced and transparent for free-access at GitHub, which avoids possible privacy leakage.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods