The Food Recall Incidents dataset consists of 7,546 short texts (from 5 to 360 characters each), which are the titles of food recall announcements (hereafter referred to as title), crawled from 24 public food safety authority websites by Agroknow. The texts are written in 6 languages, with English (6,644) and German (888) being the most common, followed by French (8), Greek (4), Italian (1) and Danish (1). Most of the texts were written after 2010, and they describe recalls of specific food products due to specific hazards. Experts manually labeled each text with four groups of classes describing hazards and products on two levels of granularity:

  • hazard: fine-grained description of the hazards mentioned in the texts, comprising 261 classes;
  • hazard-category: coarse-grained version of the hazard classification task, comprising 10 classes;
  • product: fine-grained description of the products mentioned in the texts, comprising 1,256 classes;
  • product-category: coarse-grained version of the product classification task, comprising 22 classes.

The columns hazard-title and product-title contain character spans generated from the feature importance of a Logistic Regression (LR) classifier. These spans mark the parts of the title that are important for hazard and product classification. Because many classes have very low support, the fine-grained tasks of hazard and product classification may require further pre-processing (e.g. label clustering or filtering), depending on the application. The dataset also includes metadata, such as the release date of the text (year, month, day), the language of the text (language), and the country of issue (country).
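As an illustration of the label filtering mentioned above, the following is a minimal sketch, not part of the dataset's tooling. It assumes the records have already been loaded as a list of dictionaries keyed by the column names described here; the function name filter_rare_labels, the min_support threshold, and the toy records (with invented titles and labels) are hypothetical.

```python
from collections import Counter


def filter_rare_labels(records, label_key="hazard", min_support=5, other_label=None):
    """Drop records whose label has fewer than `min_support` examples.

    If `other_label` is given, rare records are kept and relabeled
    with it instead of being dropped.
    """
    # Count how many records carry each label.
    counts = Counter(r[label_key] for r in records)
    kept = []
    for r in records:
        if counts[r[label_key]] >= min_support:
            kept.append(r)
        elif other_label is not None:
            # Copy the record so the input list is left unmodified.
            kept.append({**r, label_key: other_label})
    return kept


# Toy records invented for this sketch; they are not taken from the dataset.
toy = [
    {"title": "Brand A cheese recalled due to Listeria", "hazard": "listeria"},
    {"title": "Brand B salami recalled due to Listeria", "hazard": "listeria"},
    {"title": "Brand C salad recalled due to Listeria", "hazard": "listeria"},
    {"title": "Brand D cereal recalled due to metal fragments", "hazard": "metal fragment"},
]

dropped = filter_rare_labels(toy, min_support=2)
merged = filter_rare_labels(toy, min_support=2, other_label="other")
```

Dropping rare classes shrinks the label set at the cost of losing examples, whereas relabeling them into a catch-all class keeps every text; which is preferable depends on the application.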

Tasks