Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Kaiming He; Xiangyu Zhang; Shaoqing Ren; Jian Sun · IEEE Transactions on Pattern Analysis and Machine Intelligence · 2015

Resource page


Authors

Kaiming He; Xiangyu Zhang; Shaoqing Ren; Jian Sun

Journal

IEEE Transactions on Pattern Analysis and Machine Intelligence

Year

2015

Language

English

Access

Reference page for the resource; full-text availability has not been confirmed automatically.

Abstract

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224 × 224) input image. This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, "spatial pyramid pooling", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102× faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition.
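
The fixed-length claim in the abstract can be checked with a little arithmetic. Assume an a × a feature map with k channels and a pyramid whose level l uses n_l × n_l bins. One way to size each level's pooling windows (to our understanding, the rule the full paper describes, though this record does not confirm it) and the resulting representation length D are shown below; the k = 256, {6, 3, 2, 1} configuration is illustrative only.

```latex
\mathrm{win} = \left\lceil \frac{a}{n} \right\rceil, \qquad
\mathrm{str} = \left\lfloor \frac{a}{n} \right\rfloor, \qquad
D = k \sum_{l} n_l^{2}

% Illustrative configuration: k = 256, levels n_l \in \{6, 3, 2, 1\}
D = 256 \times (36 + 9 + 4 + 1) = 256 \times 50 = 12\,800

% Window sizing for a = 13 and the 6 \times 6 level
\mathrm{win} = \lceil 13/6 \rceil = 3, \qquad \mathrm{str} = \lfloor 13/6 \rfloor = 2
```

D depends only on k and the chosen levels, never on a, so feature maps from differently sized images all produce a D-dimensional vector.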

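The two mechanisms in the abstract, a fixed-length output for any input size and pooling of arbitrary sub-image regions from a single feature map, can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. Several choices are assumptions made for the example: feature maps are nested lists indexed [channel][row][col], the pyramid levels are (4, 2, 1), bins use adaptive-style boundaries rather than a fixed window/stride, and `project_box` uses a single cumulative stride. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Feature map laid out as [channel][row][col]; a stand-in for a conv5-style output.
FeatureMap = List[List[List[float]]]
Region = Tuple[int, int, int, int]  # (top, left, bottom, right), half-open, in cells


def bin_edges(size: int, n: int, i: int) -> Tuple[int, int]:
    """Half-open range [start, end) of bin i when an axis of `size` cells is split into n bins.

    Adaptive-style binning (an assumption, not the paper's window/stride rule):
    every bin is non-empty and the bins cover the whole axis, even when size < n.
    """
    start = (i * size) // n
    end = ((i + 1) * size + n - 1) // n  # integer ceiling of (i + 1) * size / n
    return start, end


def spatial_pyramid_pool(fmap: FeatureMap, levels: Sequence[int] = (4, 2, 1),
                         region: Optional[Region] = None) -> List[float]:
    """Max-pool every channel over an n x n grid for each pyramid level n.

    The output length is channels * sum(n * n for n in levels), regardless of
    the size of the map or of the region being pooled.
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    top, left, bottom, right = region if region is not None else (0, 0, height, width)
    h, w = bottom - top, right - left
    if h <= 0 or w <= 0:
        raise ValueError("region must cover at least one cell")
    pooled: List[float] = []
    for n in levels:
        for channel in fmap:
            for by in range(n):
                y0, y1 = bin_edges(h, n, by)
                for bx in range(n):
                    x0, x1 = bin_edges(w, n, bx)
                    pooled.append(max(channel[top + y][left + x]
                                      for y in range(y0, y1)
                                      for x in range(x0, x1)))
    return pooled


def project_box(box: Tuple[int, int, int, int], stride: int,
                height: int, width: int) -> Region:
    """Map an image-space box (x0, y0, x1, y1) in pixels onto feature-map cells.

    Hypothetical simplification: one cumulative stride, floor for the top-left
    corner, ceiling for the bottom-right corner, clamped to the map.
    """
    x0, y0, x1, y1 = box
    left = min(max(x0 // stride, 0), width - 1)
    top = min(max(y0 // stride, 0), height - 1)
    right = max(min(-(-x1 // stride), width), left + 1)
    bottom = max(min(-(-y1 // stride), height), top + 1)
    return top, left, bottom, right


def toy_feature_map(channels: int, height: int, width: int) -> FeatureMap:
    """Deterministic fake features: value = 1000 * channel + row * width + col."""
    return [[[float(1000 * c + r * width + col) for col in range(width)]
             for r in range(height)] for c in range(channels)]


levels = (4, 2, 1)
small = toy_feature_map(3, 7, 9)    # as if from a smaller input image
large = toy_feature_map(3, 13, 20)  # as if from a larger input image
print(len(spatial_pyramid_pool(small, levels)))  # 63 = 3 * (16 + 4 + 1)
print(len(spatial_pyramid_pool(large, levels)))  # 63, same length

# Compute the map once, then pool an arbitrary sub-image from it.
region = project_box((32, 16, 160, 96), stride=16, height=13, width=20)
print(region)                                            # (1, 2, 6, 10)
print(spatial_pyramid_pool(large, levels, region)[-3:])  # [109.0, 1109.0, 2109.0]
```

Both toy maps yield vectors of the same length. This is the property that lets a fixed-size fully connected layer follow the pooling step. The region call reuses the already-computed map instead of re-running the convolutions on a crop, which is the source of the detection speedup the abstract reports.
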
How to cite

APA 7

He, K., Zhang, X., Ren, S., & Sun, J. (2015). Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/tpami.2015.2389824

MLA

He, Kaiming, et al. "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, https://doi.org/10.1109/tpami.2015.2389824.

Chicago

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/tpami.2015.2389824.

Harvard

He, K. et al. 2015, Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, available at: https://doi.org/10.1109/tpami.2015.2389824 [Accessed 28 Jun. 2026].


Resource details

Title: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

Authors: Kaiming He; Xiangyu Zhang; Shaoqing Ren; Jian Sun

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

Year of publication: 2015

Language: English

Subjects

Pooling; Pascal (unit); Artificial intelligence; Computer science; Convolutional neural network; Pattern recognition (psychology); Pyramid (geometry); Contextual image classification; Object detection; Deep learning; Feature extraction; Computer vision