no code implementations • 30 Oct 2023 • Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres, Rafael de Sousa
To the best of our knowledge, our work is the first that: (i) proposes a training and evaluation framework that does not assume that real data is available for testing the utility and fairness of machine learning models trained on synthetic data; (ii) presents the most extensive analysis of synthetic dataset generation algorithms in terms of utility and fairness when used for training machine learning models; and (iii) encompasses several different definitions of fairness.
no code implementations • 13 Oct 2022 • Mayana Pereira, Sikha Pentyala, Anderson Nascimento, Rafael T. de Sousa Jr., Martine De Cock
Legal and ethical restrictions on accessing relevant data inhibit data science research in critical domains such as health, finance, and education.
no code implementations • 15 Jun 2021 • Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres
Differentially private (DP) synthetic datasets are a powerful approach for training machine learning models while respecting the privacy of individual data providers.
no code implementations • 5 Oct 2020 • Mayana Pereira, Rahul Dodhia, Hyrum Anderson, Richard Brown
With such restrictions in place, developing machine learning systems that detect child sexual abuse material (CSAM) from file metadata opens up several opportunities.