Region-based Convolutional Neural Networks

Region-based Convolutional Neural Networks (R-CNN) are a family of machine learning models for computer vision, specifically object detection and localization.^[1] The original goal of R-CNN was to take an input image and produce a set of bounding boxes as output, where each bounding box contains an object and is labeled with the object's category (e.g. car or pedestrian). In the original R-CNN, selective search^[2] proposes candidate regions in the input image, and a CNN extracts features from each region for classification; later variants instead select regions on the feature maps output by a CNN.
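
The input and output described above can be illustrated with a minimal Python sketch. It is not the R-CNN implementation: `Detection`, `detect`, `toy_proposals`, and `toy_classifier` are hypothetical names introduced here, the proposal function is a fixed stand-in for selective search, and the classifier is a brightness threshold standing in for CNN features plus a trained classifier. The sketch only shows the overall shape of the computation: candidate regions go in, labeled bounding boxes come out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (x_min, y_min, x_max, y_max) in pixel coordinates; max edges are exclusive.
Box = Tuple[int, int, int, int]
Image = List[List[int]]  # toy grayscale image as rows of 0-255 values


@dataclass
class Detection:
    box: Box
    label: str
    score: float


def detect(
    image: Image,
    propose_regions: Callable[[Image], List[Box]],
    classify_region: Callable[[Image], Tuple[str, float]],
    min_score: float = 0.5,
) -> List[Detection]:
    """Score every proposed region and keep confident non-background ones."""
    detections = []
    for box in propose_regions(image):
        x0, y0, x1, y1 = box
        crop = [row[x0:x1] for row in image[y0:y1]]
        label, score = classify_region(crop)
        if label != "background" and score >= min_score:
            detections.append(Detection(box, label, score))
    return detections


def toy_proposals(image: Image) -> List[Box]:
    # Hypothetical stand-in for selective search: two fixed candidate boxes.
    return [(0, 0, 2, 2), (2, 2, 4, 4)]


def toy_classifier(crop: Image) -> Tuple[str, float]:
    # Hypothetical stand-in for a CNN plus classifier: bright crops are "car".
    pixels = [p for row in crop for p in row]
    mean = sum(pixels) / len(pixels)
    if mean > 128:
        return "car", mean / 255
    return "background", 1 - mean / 255


image = [
    [255, 255, 0, 0],
    [255, 255, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
detections = detect(image, toy_proposals, toy_classifier)
for d in detections:
    print(d.label, d.box, d.score)
```

In this toy run only the bright top-left region is reported; the dark region is classified as background and dropped. A real detector would also refine box coordinates and merge overlapping boxes, which this sketch omits.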

R-CNN has been extended to perform other computer vision tasks, such as tracking objects from a drone-mounted camera,^[3] locating text in an image,^[4] and enabling object detection in Google Lens.^[5]

Mask R-CNN, a member of the R-CNN family, is one of seven tasks in the MLPerf Training Benchmark, a competition to speed up the training of neural networks.^[6]

[1] Zhang, Aston; Lipton, Zachary; Li, Mu; Smola, Alexander J. (2024). "14.8. Region-based CNNs (R-CNNs)". Dive into Deep Learning. Cambridge University Press. ISBN 978-1-009-38943-3.
[2] Uijlings, J. R. R.; van de Sande, K. E. A.; Gevers, T.; Smeulders, A. W. M. (2013-09-01). "Selective Search for Object Recognition". International Journal of Computer Vision. 104 (2): 154–171. doi:10.1007/s11263-013-0620-5. ISSN 1573-1405.
[3] Nene, Vidi (Aug 2, 2019). "Deep Learning-Based Real-Time Multiple-Object Detection and Tracking via Drone". Drone Below. Retrieved Mar 28, 2020.
[4] Ray, Tiernan (Sep 11, 2018). "Facebook pumps up character recognition to mine memes". ZDNET. Retrieved Mar 28, 2020.
[5] Sagar, Ram (Sep 9, 2019). "These machine learning methods make Google Lens a success". Analytics India. Retrieved Mar 28, 2020.
[6] Mattson, Peter; et al. (2019). "MLPerf Training Benchmark". arXiv:1910.01500v3 [cs.LG].
