We collected one million images from Flickr (http://www.flickr.com) and used a feature extraction code (http://www.vision.ee.ethz.ch/~zhuji/felib.html) to extract a GIST feature for each image. Each image is represented by a 512-dimensional GIST feature vector.
Data file: contains the variables 'fTrain', 'fTest' and 'trueTrainTest'. 'fTrain' is the database, which contains 999483 data points; 'fTest' consists of 1000 query points. The ground truth is recorded in 'trueTrainTest', a 999483x1000 matrix that contains only 0s and 1s. Each column is the ground-truth vector for the corresponding query point: the i-th point in the database is a true neighbor of the j-th query point if the entry in the i-th row and j-th column of 'trueTrainTest' is 1. A point is considered a true neighbor if it is among the 2 percent of database points closest to the query (measured by Euclidean distance).
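The ground-truth rule above can be illustrated with a short sketch. This is not the code used to build 'trueTrainTest'; the function name, the rounding of the 2 percent cutoff (rounded down here) and the handling of ties are all assumptions made for illustration.

```python
import numpy as np

def ground_truth_neighbors(database, queries, fraction=0.02):
    """Return an (n_database x n_queries) 0/1 matrix marking true neighbors.

    database: (n_database, dim) array, queries: (n_queries, dim) array.
    """
    n_database = database.shape[0]
    # Number of true neighbors per query. Rounding down is an assumption;
    # the description only says "2 percent".
    k = max(1, int(fraction * n_database))

    # Squared Euclidean distances, shape (n_database, n_queries).
    # Ranking by squared distance gives the same order as ranking by distance.
    d2 = (
        (database ** 2).sum(axis=1)[:, None]
        - 2.0 * database @ queries.T
        + (queries ** 2).sum(axis=1)[None, :]
    )

    # Row indices of the k smallest distances in each column (unordered).
    nearest = np.argpartition(d2, k - 1, axis=0)[:k, :]

    truth = np.zeros(d2.shape, dtype=np.uint8)
    truth[nearest, np.arange(queries.shape[0])[None, :]] = 1
    return truth
```

Under this rounding assumption, each of the 1000 columns would contain roughly 20,000 ones for the full database. A full 999483x1000 distance matrix in double precision needs about 8 GB, so in practice the queries would likely be processed in smaller batches.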
The data set has been normalized and centered to have zero mean.
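The exact preprocessing is not spelled out here, so the following is only a sketch of one plausible reading. It assumes that "normalized" means scaling each feature vector to unit L2 norm, and that the mean is estimated on the database and then subtracted from both the database and the queries. The function name and both choices are assumptions, not a description of how the released data was produced.

```python
import numpy as np

def normalize_and_center(train, test, eps=1e-12):
    """Scale rows to unit L2 norm, then subtract the database mean."""
    # Assumption: "normalized" means unit L2 norm per feature vector.
    train = train / np.maximum(np.linalg.norm(train, axis=1, keepdims=True), eps)
    test = test / np.maximum(np.linalg.norm(test, axis=1, keepdims=True), eps)

    # Assumption: the mean comes from the database only and is reused
    # for the queries, so both sets share the same coordinate origin.
    mean = train.mean(axis=0, keepdims=True)
    return train - mean, test - mean
```

After this step the database has zero mean in every dimension, while the queries are only approximately zero-mean because they are shifted by the database mean.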