IMAGENET1M is a benchmark dataset for (semi-supervised) approximate nearest neighbor search (ANNS) algorithms. It contains 2048-dimensional real-valued features obtained from a deep neural network model on the 10%-labeled ILSVRC-15 dataset.
IMAGENET1M has 4 parts: the base, query, training, and validation splits listed in the table below.
A content-based image retrieval (CBIR) system is essentially a semi-supervised approximate nearest neighbor search system. Some of the samples in the database have labels, and we want to develop effective semi-supervised ANNS methods that provide efficient search.
However,
K | 10 | 50 | 100 |
P@K | 0.424504 | 0.39418 | 0.377997 |
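The P@K values above can be related to a small computation. The sketch below assumes that P@K is the fraction of the top-K retrieved base items whose label matches the query's label, averaged over all queries; the page does not spell out the definition, so treat this as one plausible reading. The function name `precision_at_k` and the toy arrays are illustrative and not part of the dataset.

```python
import numpy as np


def precision_at_k(neighbor_ids, base_labels, query_labels, k):
    """Mean fraction of the top-k neighbors that share the query's label.

    neighbor_ids: (n_queries, >=k) integer array of base indices, best first
                  (for example, brute-force search results).
    base_labels:  (n_base,) labels of the base items.
    query_labels: (n_queries,) labels of the queries.
    """
    top_k = np.asarray(neighbor_ids)[:, :k]
    base_labels = np.asarray(base_labels)
    query_labels = np.asarray(query_labels)
    retrieved = base_labels[top_k]                 # shape (n_queries, k)
    hits = retrieved == query_labels[:, None]      # label match per neighbor
    return float(hits.mean())


# Toy data (assumed, not taken from IMAGENET1M).
base_labels = np.array([0, 0, 1, 1, 2])
query_labels = np.array([0, 1])
neighbor_ids = np.array([[0, 1, 2],
                         [2, 4, 3]])

p_at_2 = precision_at_k(neighbor_ids, base_labels, query_labels, k=2)
p_at_3 = precision_at_k(neighbor_ids, base_labels, query_labels, k=3)
print(p_at_2, p_at_3)
```

With real data, `neighbor_ids` would come from the brute-force results and the labels from the label files, under the assumption that both use the same 0-based row order as the feature files.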
In the following table, we provide details about the dataset and links to download it.
Dataset split | Data dimension | # Data | Download features | MD5 | Image list | Label | Training indices |
Base | 2048 | 1,281,167 | Download base features (8.79GB) | 21a976548a419a27ac6393e8f399f346 | base_image_list | base_label | training indices (0: not in the training set, 1: in the training set) |
Query | 2048 | 25,000 | Download query features (175MB) | 566487cfc54650154a405efa301bb876 | query_image_list | query_label | |
Training | 2048 | 128,161 | can be generated using the base features and training indices | | can be generated using base_image_list and training indices | | |
Validation | 2048 | 25,000 | Download validation features (175MB) | ecb5eea2f20e1355590bbe670006cfbc | val_image_list | val_label | |
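The training split is not distributed separately; as the table notes, it can be generated from the base split and the training indices. A minimal sketch of that selection follows. It assumes the training indices have already been loaded as one 0/1 value per base item, in the same order as the base features and `base_image_list`. The function name `select_training_subset` and the toy data are hypothetical.

```python
import numpy as np


def select_training_subset(base_features, base_image_list, training_indices):
    """Keep the base rows whose training index is 1.

    Assumes one 0/1 flag per base item, aligned with the feature rows
    and with the image list.
    """
    mask = np.asarray(training_indices).astype(bool)
    base_features = np.asarray(base_features)
    if mask.shape[0] != base_features.shape[0]:
        raise ValueError("training indices and base features differ in length")
    if mask.shape[0] != len(base_image_list):
        raise ValueError("training indices and image list differ in length")
    train_features = base_features[mask]
    train_images = [name for name, keep in zip(base_image_list, mask) if keep]
    return train_features, train_images


# Toy data (assumed): 4 base items with 2-dimensional features.
toy_features = np.arange(8, dtype=np.float32).reshape(4, 2)
toy_images = ["a.JPEG", "b.JPEG", "c.JPEG", "d.JPEG"]
toy_indices = [1, 0, 0, 1]

train_features, train_images = select_training_subset(
    toy_features, toy_images, toy_indices
)
print(train_features.shape, train_images)
```

Applied to the real base split, the number of selected rows should equal the training count given in the table.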
The features are in the fvecs format; the brute-force search results are in the ivecs format (more information on the fvecs and ivecs formats). The image lists and labels are plain-text (TXT) files.
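For readers unfamiliar with these formats, the sketch below reads them with NumPy. It relies on the usual fvecs/ivecs layout: each vector is stored as a little-endian 32-bit integer giving its dimension d, followed by d 32-bit values (floats for fvecs, integers for ivecs). It also assumes the file on disk is raw and uncompressed, and that every vector in a file has the same dimension. The helper names `read_vecs`, `read_fvecs`, and `read_ivecs` are illustrative.

```python
import os
import tempfile

import numpy as np


def read_vecs(path, dtype):
    """Read a .fvecs or .ivecs file into an (n, d) array (loads the whole file)."""
    raw = np.fromfile(path, dtype="<i4")
    if raw.size == 0:
        return np.empty((0, 0), dtype=dtype)
    dim = int(raw[0])
    if dim <= 0 or raw.size % (dim + 1) != 0:
        raise ValueError("file size is inconsistent with the leading dimension")
    records = raw.reshape(-1, dim + 1)
    if not np.all(records[:, 0] == dim):
        raise ValueError("vectors with differing dimensions are not supported")
    # Drop the per-vector dimension field and reinterpret the payload bytes.
    return records[:, 1:].copy().view(dtype)


def read_fvecs(path):
    return read_vecs(path, "<f4")


def read_ivecs(path):
    return read_vecs(path, "<i4")


# Round-trip check on a tiny toy file (not part of the dataset).
vectors = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype="<f4")
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.fvecs")
    with open(toy_path, "wb") as f:
        for row in vectors:
            np.array([row.size], dtype="<i4").tofile(f)  # dimension header
            row.tofile(f)                                # vector payload
    loaded = read_fvecs(toy_path)

print(loaded.shape)
```

Because `read_vecs` loads the entire file into memory, the multi-gigabyte base features may call for a memory-mapped or chunked variant instead.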
If you use this dataset, we would appreciate it if you cite the following paper:
If you have any questions or suggestions, please feel free to contact us!
Deng Cai (dengcai@gmail.com)
Xiuye Gu (gxy0922@zju.edu.cn)
Chaoqi Wang (cqwong@zju.edu.cn)