State-of-the-art machine learning models are often tested on their ability to generalize to materials deemed ‘dissimilar’ to the training data, but such definitions frequently rely on heuristics. Here, an analysis of over 700 out-of-distribution tasks reveals that heuristic-based criteria mostly test interpolation rather than true extrapolation.
- Kangming Li
- Andre Niyongabo Rubungo
- Jason Hattrick-Simpers