WILDS builds on recent data collection efforts by domain experts in these applications and provides a unified collection of datasets with evaluation metrics and train/test splits that are representative of real-world distribution shifts.
The v2.0 update adds unlabeled data to 8 datasets. The labeled data and evaluation metrics are exactly the same, so all previous results are directly comparable.
Source: WILDS: A Benchmark of in-the-Wild Distribution Shifts