Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman
- Update – 15th September 2014: We won the ‘Best Scientific Paper Award’ at BMVC 2014!
- Update – 15th July 2014: Precompiled MEX files and models for computing the ConvNet features in the paper are now available (see below). Full source code is also available.
Overview
The latest generation of Convolutional Neural Networks (CNNs) has achieved impressive results on challenging benchmarks for image recognition and object detection, significantly raising the interest of the community in these methods. Nevertheless, it is still unclear how different CNN methods compare with each other and with previous state-of-the-art shallow representations such as the Bag-of-Visual-Words and the Improved Fisher Vector. This work conducts a rigorous evaluation of these new techniques: we explore different deep architectures and compare them on common ground, identify and disclose important implementation details (in a similar vein to our previous work on shallow encoding methods), and determine which aspects of deep and shallow methods can successfully be shared.
We evaluate on four datasets (PASCAL VOC 2007, PASCAL VOC 2012, Caltech-101, and Caltech-256), and our best method achieves state-of-the-art performance on all of them. We release the full source code and CNN models for the experiments on this page, in the hope that they will provide good baselines for future image representation research.
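To make the comparison concrete, the sketch below illustrates the shared evaluation pipeline under which all encodings are compared: each image is mapped to a fixed-length, L2-normalised descriptor and classified with a one-vs-rest linear SVM. This is a minimal illustration, not the released code; `encode_image` is a hypothetical stand-in for any encoder (a ConvNet feature extractor or an Improved Fisher Vector pipeline), and the SVM is trained with VLFeat's `vl_svmtrain`.

```matlab
% Minimal sketch of the shared evaluation pipeline (not the released code).
run('vlfeat/toolbox/vl_setup');            % VLFeat provides vl_svmtrain

train_files = {'pos1.jpg', 'neg1.jpg'};    % placeholder training images
labels      = [+1, -1];                    % binary labels for one class

% Encode every image into a fixed-length, L2-normalised descriptor so that
% deep and shallow representations are classified on equal footing.
descrs = cell(1, numel(train_files));
for i = 1:numel(train_files)
    x = encode_image(imread(train_files{i}));  % hypothetical encoder
    descrs{i} = x(:) / norm(x(:));             % L2 normalisation
end
descrs = cat(2, descrs{:});                % one descriptor per column

% One-vs-rest linear SVM, as used for the VOC and Caltech experiments.
lambda = 1 / (10 * numel(train_files));
[w, b] = vl_svmtrain(descrs, labels, lambda);

% Rank a test image by its SVM score.
x = encode_image(imread('test.jpg'));
score = w' * (x(:) / norm(x(:))) + b;
```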
Results

| Method | ILSVRC-2012 (top-5 error, %) | VOC-2007 (mAP, %) | VOC-2012 (mAP, %) | Caltech-101 (accuracy, %) | Caltech-256 (accuracy, %) |
| --- | --- | --- | --- | --- | --- |
| FK IN 512 | – | 68.0 | – | – | – |
| CNN M 2048 | 13.5 | 80.1 | 82.3 | – | – |
| CNN S | 13.1 | 79.6 | 82.7 | 88.54 ± 0.33 | 78.82 ± 0.31 |
| CNN S TUNE-CLS | 13.1 | – | 83.0 | 88.35 ± 0.56 | – |
| CNN S TUNE-RNK | 13.1 | 82.4 | 83.2 | – | – |
| Zeiler & Fergus [2] | 16.1 | – | 79.0 | 86.5 ± 0.5 | 74.2 ± 0.3 |
| Razavian et al. [3], [4] | 14.7 | 77.2 | – | – | – |
| Oquab et al. [5] | 18 | 77.7 | 78.7 / 82.8 | – | – |
Software & Paper Updates
Software to compute the ConvNet features used in the paper is now available from the software page.
In addition to the feature computation binaries and CNN models available above, the full source code has now been released. We will also continue to update the paper, with new versions available on the arXiv page.
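As an illustration, computing a feature with one of these models might look like the sketch below, assuming a MatConvNet-style interface (the models were later distributed in that format); the precompiled MEX binaries have their own interface, so the model file name, normalisation fields, and layer index here are assumptions.

```matlab
% Sketch of computing a ConvNet feature, assuming a MatConvNet-style
% interface; the precompiled MEX binaries have their own loader, so the
% file name, normalisation fields, and layer index below are assumptions.
run('matconvnet/matlab/vl_setupnn');       % MatConvNet setup
net = load('imagenet-vgg-m-2048.mat');     % assumed CNN M 2048 model file

im = single(imread('test.jpg'));           % the network expects single precision
im = imresize(im, net.normalization.imageSize(1:2));
im = im - net.normalization.averageImage;  % subtract the mean training image
                                           % (field names vary across versions)

res  = vl_simplenn(net, im);               % forward pass through the network
feat = squeeze(res(end-2).x);              % 2048-D fully-connected activations
                                           % (assumed layer index)
feat = feat / norm(feat);                  % L2-normalise, as in the paper
```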
Related publications
Acknowledgements
Funding is provided by the EPSRC, ERC grant VisRec no. 228180, and EU Project FP7 AXES ICT-269980. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPUs used for this research.

