Dataset transformations trade-offs to adapt machine learning methods across domains

29 Sep 2021  ·  Napoleon Costilla-Enriquez, Yang Weng ·

Machine learning-based methods have proved quite successful in different domains. However, applying the same techniques across disciplines is not trivial and comes with both benefits and drawbacks. The most common approach in the literature is to convert a dataset into the same format as the original domain so that the architecture that succeeded there can be reused. Although this approach is fast and convenient, we argue it is suboptimal because it is not tailored to the specific problem at hand. To support this claim, we examine dataset transformations used in the literature to adapt machine learning-based methods across domains and show that these transformations are not always beneficial in terms of performance. In addition, we show that these data transformations open the door to unforeseen vulnerabilities in the newly targeted domain. To quantify how different the original dataset is from the transformed one, we compute the dataset distances via Optimal Transport. We also present simulations with the original and transformed data to show that the data conversion is not always needed and can expose the new domain to unsought threats.
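As a rough illustration of the kind of dataset distance the abstract refers to, the sketch below computes a discrete Optimal Transport (2-Wasserstein-style) distance between two equal-size point clouds by solving the underlying assignment problem exactly. This is a minimal stand-in, not the paper's actual method: the datasets, feature dimension, and distributions are hypothetical, and the paper may use a richer formulation (e.g. label-aware dataset distances).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def ot_distance(X, Y):
    """Exact discrete OT distance between two equal-size samples
    with uniform weights, via the assignment problem."""
    # Pairwise squared-Euclidean cost matrix between the two datasets.
    C = cdist(X, Y, metric="sqeuclidean")
    # Optimal one-to-one matching minimizing total transport cost.
    row, col = linear_sum_assignment(C)
    # Average matched cost, square-rooted (2-Wasserstein style).
    return float(np.sqrt(C[row, col].mean()))

# Hypothetical example: a source dataset and a shifted "transformed" one.
rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0, size=(64, 8))   # original-domain sample
B = rng.normal(0.5, 1.0, size=(64, 8))   # transformed-domain sample

print(ot_distance(A, A))  # distance of a dataset to itself is 0
print(ot_distance(A, B))  # positive, grows with the domain shift
```

For uniform weights and equal sample sizes, the optimal transport plan is a permutation, so the assignment solver gives the exact distance; for larger or unevenly weighted datasets, a dedicated OT library with entropic regularization would be the more practical choice.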
