Hard ImageNet Dataset

Classes

The following are the 15 classes in Hard ImageNet.

Each image comes with an object segmentation. In addition, all training images are ranked by the strength of the spurious cues present, which allows for the selection of balanced subsets (i.e., subsets where spurious correlations are broken).

With our richly annotated dataset and benchmark, we hope the community can begin to consider new training and evaluation paradigms for faithful image classification under suboptimal data conditions; that is, predicting for the right reasons, even when our data is riddled with spurious cues.

Benchmark

We present three families of metrics for assessing the spurious feature reliance of models performing Hard ImageNet classification.

Ablation: removing the object from an image should result in lower accuracy, but when spurious features are relied upon, accuracy remains high after ablation. We ablate in multiple ways, and use accuracy drop as a proxy for how well a model attends to the object.
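As a rough illustration of the ablation metric, the sketch below fills the segmented object with a constant gray value and reports the accuracy drop. It assumes images are float arrays of shape (H, W, C) in [0, 1] and masks are (H, W) arrays; `predict`, `ablate_object`, and the gray fill are hypothetical choices made for this example, not the benchmark's own code, which ablates in several ways.

```python
import numpy as np

def ablate_object(image, mask, fill=0.5):
    """Return a copy of `image` with object pixels (mask == True) set to `fill`."""
    image = np.asarray(image, dtype=np.float32)
    mask = np.asarray(mask, dtype=bool)
    ablated = image.copy()
    ablated[mask] = fill  # a 2-D boolean mask overwrites every channel of the masked pixels
    return ablated

def accuracy(predict, images, labels):
    """Fraction of images for which `predict(image)` equals the label."""
    preds = np.array([predict(img) for img in images])
    return float(np.mean(preds == np.asarray(labels)))

def ablation_accuracy_drop(predict, images, masks, labels, fill=0.5):
    """Accuracy on clean images minus accuracy after the object is ablated."""
    clean_acc = accuracy(predict, images, labels)
    ablated_images = [ablate_object(img, m, fill) for img, m in zip(images, masks)]
    ablated_acc = accuracy(predict, ablated_images, labels)
    return clean_acc - ablated_acc
```

A large drop suggests the model depends on the object; a drop near zero suggests it can still classify from the remaining (possibly spurious) context.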

Relative Foreground Sensitivity: the degree to which model performance drops when a region is corrupted serves as a proxy for the model's sensitivity to that region. We add noise to foregrounds and backgrounds, and compare the resulting accuracy drops in a normalized way to determine foreground sensitivity.
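The exact normalization is not spelled out here. A minimal sketch, assuming the accuracy gap is divided by the largest gap attainable at the same mean accuracy (so the score lies in [-1, 1]); the function and argument names are hypothetical:

```python
def relative_foreground_sensitivity(acc_fg_noise, acc_bg_noise):
    """Normalized gap between accuracy under background noise and under foreground noise.

    Both inputs are accuracies in [0, 1]. Positive values mean that foreground
    noise hurts more than background noise, i.e. the model is more sensitive
    to the foreground.
    """
    mean_acc = 0.5 * (acc_fg_noise + acc_bg_noise)
    # Assumed normalizer: the largest possible gap given this mean accuracy.
    max_gap = 2.0 * min(mean_acc, 1.0 - mean_acc)
    if max_gap == 0.0:
        return 0.0  # both accuracies are 0 or both are 1, so no gap is possible
    return (acc_bg_noise - acc_fg_noise) / max_gap
```

Under this assumption, the normalization lets models with very different overall noise robustness be compared on the same scale.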

Saliency Alignment: the intersection-over-union of GradCAM saliencies with object segmentations gives a notion of how well models recognize object regions as salient to classification.
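To make the intersection-over-union computation concrete, here is a minimal sketch that takes a precomputed saliency map (for example, one produced by GradCAM) and a binary object mask of the same shape. The min-max rescaling and the 0.5 binarization threshold are assumptions made for this example, and `saliency_iou` is a hypothetical helper name.

```python
import numpy as np

def saliency_iou(saliency, mask, threshold=0.5):
    """IoU between a thresholded saliency map and a binary object mask."""
    saliency = np.asarray(saliency, dtype=np.float32)
    mask = np.asarray(mask, dtype=bool)
    lo, hi = saliency.min(), saliency.max()
    if hi > lo:
        saliency = (saliency - lo) / (hi - lo)  # rescale to [0, 1]
    else:
        saliency = np.zeros_like(saliency)      # constant map: nothing is salient
    salient = saliency >= threshold             # assumed binarization rule
    union = np.logical_or(salient, mask).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(salient, mask).sum() / union)
```

Values near 1 indicate that the salient region coincides with the object; values near 0 indicate that the model's saliency falls mostly outside it.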

Our benchmark includes ablation, noise-based, and saliency analyses to assess whether models base their predictions on the object region or instead rely on spurious features. Compared to more typical data (as represented by RIVAL10), Hard ImageNet classification induces far greater spurious feature reliance.

Download

From GitHub (recommended)

Follow the setup instructions in this repository to download the data and code for the dataset object and evaluation benchmark.

Directly From Box

Download the dataset directly from Box, or use the following commands to download the data on a remote server:

curl -L 'https://app.box.com/index.php?rm=box_download_shared_file&shared_name=ca7qlcfsqlfqul9rzgtuqhb2c6pm62qd&file_id=f_972129165893' -o hardImageNet.zip
    
unzip hardImageNet.zip

Citation

Please cite our paper if Hard ImageNet is of use to you.

@misc{moayeri2022hard,
  title = {Hard ImageNet: Segmentations for Objects with Strong Spurious Cues},
  author = {Moayeri, Mazda and Singla, Sahil and Feizi, Soheil},
  booktitle = {openreview},
  month = {June},
  year = {2022},
}