A Large-scale Study on Training Sample Memorization in Generative Modeling

1 Jan 2021 · Ching-Yuan Bai, Hsuan-Tien Lin, Colin Raffel, Wendy Kan

Many recent developments in generative models for natural images have relied on heuristically motivated metrics that can be easily gamed by memorizing a small sample from the true distribution or by training a model directly to improve the metric. In this work, we critically evaluate the gameability of the benchmarking procedure by running a competition, which ultimately resulted in participants attempting to cheat. Our competition received over 11,000 submitted models, which allowed us to investigate memorization-aware metrics for measuring generative model performance. Specifically, we propose the Memorization-Informed Fréchet Inception Distance (MiFID) and discuss ways to ensure that winning submissions were based on genuine improvements in perceptual quality. We evaluate the effectiveness of our benchmark by manually inspecting the code for the 1,000 top-performing models and labeling the different forms of memorization that were intentionally or unintentionally used. To facilitate future work on benchmarking generative models, we release generated images and our labels for these models, as well as code to compute the MiFID metric.
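To make the idea of a memorization-aware metric concrete, the following is a minimal sketch, not the released MiFID code. The Fréchet distance between Gaussians fitted to two feature sets is the standard FID formula. The memorization term (mean minimum cosine distance from each generated feature vector to the training features) and the thresholded penalty are assumptions about how such a metric could be built, since the abstract does not give the formula. The function names and the defaults `tau` and `eps` are hypothetical, and the inputs are assumed to be precomputed feature arrays (for example from an Inception network) rather than images.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Standard Frechet distance between Gaussians fitted to two feature sets.

    Each input is an array of shape (n_samples, n_features).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (clipped here against round-off).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


def memorization_distance(gen_feats, train_feats):
    """Assumed memorization term: mean over generated samples of the
    minimum cosine distance to any training sample (0 means exact copies)."""
    g = gen_feats / np.linalg.norm(gen_feats, axis=1, keepdims=True)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    cos_dist = 1.0 - np.abs(g @ t.T)  # shape (n_gen, n_train)
    return float(cos_dist.min(axis=1).mean())


def memorization_penalized_fid(gen_feats, train_feats, tau=0.1, eps=1e-6):
    """Hypothetical combination: inflate the Frechet distance when generated
    features sit closer to the training set than the threshold tau."""
    fid = frechet_distance(gen_feats, train_feats)
    d = memorization_distance(gen_feats, train_feats)
    penalty = 1.0 / (d + eps) if d < tau else 1.0
    return fid * penalty
```

Under these assumptions, a model that copies training images gets a memorization distance near zero and therefore a large penalty, while a model whose samples are not unusually close to the training set keeps its plain Fréchet distance.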
