Cost-Sensitive Feature Selection of Data with Errors

13 Dec 2012  ·  Hong Zhao, Fan Min, William Zhu ·

In data mining applications, feature selection is an essential process since it reduces a model's complexity. The cost of obtaining the feature values must be taken into consideration in many domains. In this paper, we study the cost-sensitive feature selection problem on numerical data with measurement errors, test costs and misclassification costs. The major contributions of this paper are four-fold. First, a new data model is built to address test costs and misclassification costs as well as error boundaries. Second, a covering-based rough set with measurement errors is constructed. Given a confidence interval, the neighborhood is an ellipse in a two-dimension space, or an ellipsoidal in a three-dimension space, etc. Third, a new cost-sensitive feature selection problem is defined on this covering-based rough set. Fourth, both backtracking and heuristic algorithms are proposed to deal with this new problem. The algorithms are tested on six UCI (University of California - Irvine) data sets. Experimental results show that (1) the pruning techniques of the backtracking algorithm help reducing the number of operations significantly, and (2) the heuristic algorithm usually obtains optimal results. This study is a step toward realistic applications of cost-sensitive learning.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here