
YOLO Alternatives for Object Detection

Finding a good YOLO alternative can be hard when everyone is obsessed with YOLO these days because of its good performance and ease of use. But if you are like me and like to explore other options, or are just curious about which other object detection alternatives are available, you are in the right place.

In today's article we will look at a few alternatives that you can use for object detection and that are also easy to implement and use. If you are looking for resources, the Papers with Code page for object detection is a good place to learn about the latest trends and research on object detection algorithms. There is also a wonderful DigitalOcean article that explains the object detection pipeline.

The R-CNN Family: R-CNN, Fast R-CNN, and Faster R-CNN

The R-CNN family is one of the pioneering families of object detection models in computer vision. It started with the R-CNN algorithm in 2014, and each later variant improved on the previous one, becoming faster and more efficient. Here is a short breakdown of each model for better understanding. There is also a very good, detailed summary article about it.

Evolution of R-CNN Models

R-CNN

R-CNN was introduced in 2014. The name stands for Region-based Convolutional Neural Network. It is a pioneering object detection model introduced by Ross Girshick and his colleagues. It marked a paradigm shift in object detection by proposing regions and analyzing each region separately. It combines region proposal methods with a CNN (Convolutional Neural Network) to detect objects in an image, hence the name R-CNN.

How R-CNN Works

There are four basic steps in R-CNN. First comes region proposal: selective search generates around 2000 region proposals (regions of interest, or RoIs) from the input image that are likely to contain objects. Next comes feature extraction: each region is resized to a fixed size and passed through a pre-trained CNN (such as AlexNet or VGG) to extract features that are later used for classification. The classification step uses simple SVMs (Support Vector Machines) that classify each region into one of the object categories or as background. After classification and thresholding of the scores, a large number of regions can remain, and they still need to be refined. This last step is bounding box regression: a simple regressor adjusts the coordinates of the proposed regions so that they fit the objects better.
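To make the bounding box regression step a bit more concrete, here is a minimal Python sketch of how predicted offsets can be applied to a proposal. It follows the center/size parameterization described in the R-CNN paper (shift the center in proportion to the box size, scale width and height through an exponential). The function name `apply_box_deltas` and the `(x1, y1, x2, y2)` corner format are my own assumptions for illustration, not code from any particular library.

```python
import math


def apply_box_deltas(proposal, deltas):
    """Refine one proposal box with predicted regression offsets.

    proposal: (x1, y1, x2, y2) corners of the region proposal (assumed format).
    deltas:   (dx, dy, dw, dh) offsets predicted by the regressor.
    Returns the refined box as (x1, y1, x2, y2).
    """
    x1, y1, x2, y2 = proposal
    dx, dy, dw, dh = deltas

    # Convert the proposal from corners to center + size.
    width = x2 - x1
    height = y2 - y1
    center_x = x1 + 0.5 * width
    center_y = y1 + 0.5 * height

    # Shift the center relative to the box size, scale the size with exp().
    new_center_x = center_x + dx * width
    new_center_y = center_y + dy * height
    new_width = width * math.exp(dw)
    new_height = height * math.exp(dh)

    # Convert back to corner format.
    return (
        new_center_x - 0.5 * new_width,
        new_center_y - 0.5 * new_height,
        new_center_x + 0.5 * new_width,
        new_center_y + 0.5 * new_height,
    )


if __name__ == "__main__":
    # Hypothetical proposal and offsets, only for demonstration.
    print(apply_box_deltas((0, 0, 10, 10), (0.1, 0.0, 0.0, 0.0)))
```

With all offsets equal to zero the proposal comes back unchanged, which is a handy sanity check when wiring a regressor into a pipeline.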


Fast R-CNN

Fast R-CNN was introduced as an improvement over R-CNN and speeds up the original R-CNN object detection algorithm. It passes the entire image through the CNN only once and then uses RoI pooling to extract a fixed-size feature for each region proposal from the shared feature map, which keeps the accuracy high.
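The idea of RoI pooling can be sketched in a few lines of plain Python. The example below takes a single-channel feature map as a list of lists, splits one region of interest into a fixed grid of bins, and keeps the maximum of each bin, so every region ends up with the same output size no matter how large it was. The function name `roi_max_pool`, the inclusive `(x1, y1, x2, y2)` cell coordinates, and the floor/ceil bin boundaries are assumptions chosen for this sketch; real implementations work on multi-channel tensors and also handle the image-to-feature-map scaling.

```python
import math


def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool one RoI of a 2D feature map into an out_size x out_size grid.

    feature_map: list of rows (single channel), e.g. [[0, 1], [2, 3]].
    roi:         (x1, y1, x2, y2) in feature-map cells, inclusive (assumed format),
                 and assumed to lie inside the feature map.
    """
    x1, y1, x2, y2 = roi
    roi_width = x2 - x1 + 1
    roi_height = y2 - y1 + 1
    bin_width = roi_width / out_size
    bin_height = roi_height / out_size

    pooled = []
    for bin_row in range(out_size):
        # Row range covered by this bin (end is exclusive).
        row_start = y1 + math.floor(bin_row * bin_height)
        row_end = y1 + math.ceil((bin_row + 1) * bin_height)
        pooled_row = []
        for bin_col in range(out_size):
            col_start = x1 + math.floor(bin_col * bin_width)
            col_end = x1 + math.ceil((bin_col + 1) * bin_width)
            # Keep the strongest activation inside the bin.
            pooled_row.append(
                max(
                    feature_map[r][c]
                    for r in range(row_start, row_end)
                    for c in range(col_start, col_end)
                )
            )
        pooled.append(pooled_row)
    return pooled


if __name__ == "__main__":
    # Toy 4x4 feature map with values 0..15, only for demonstration.
    toy_map = [[row * 4 + col for col in range(4)] for row in range(4)]
    print(roi_max_pool(toy_map, (0, 0, 3, 3)))  # whole map
    print(roi_max_pool(toy_map, (1, 1, 3, 3)))  # a 3x3 region
```

Both calls return a 2x2 grid even though the two regions have different sizes, which is the property that lets the later fully connected layers work with any proposal.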

Faster R-CNN

Even Fast R-CNN was not fast enough, so Faster R-CNN was proposed to overcome its limitations. It introduced the Region Proposal Network (RPN), which replaces the separate region proposal step and makes the detection pipeline end-to-end trainable, with almost real-time performance.
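The RPN scores a fixed set of reference boxes, called anchors, at every position of the feature map instead of running selective search. Below is a rough sketch of how such anchors can be generated. The defaults (stride 16, three scales, three aspect ratios, so nine anchors per position) mirror the settings reported in the Faster R-CNN paper, but the function name `generate_anchors` and the exact centering convention are my own assumptions for illustration.

```python
import math


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate anchor boxes (x1, y1, x2, y2) in image coordinates.

    feat_h, feat_w: size of the feature map in cells.
    stride:         how many image pixels one feature-map cell covers.
    scales:         anchor side length for a square anchor (area = scale**2).
    ratios:         height / width aspect ratios.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this cell in image coordinates (assumed convention).
            center_x = (col + 0.5) * stride
            center_y = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    # Keep the area at scale**2 while changing the shape.
                    width = scale / math.sqrt(ratio)
                    height = scale * math.sqrt(ratio)
                    anchors.append((
                        center_x - width / 2,
                        center_y - height / 2,
                        center_x + width / 2,
                        center_y + height / 2,
                    ))
    return anchors


if __name__ == "__main__":
    boxes = generate_anchors(feat_h=2, feat_w=3)
    print(len(boxes))  # 2 * 3 positions, 9 anchors each
    print(boxes[1])    # square anchor of scale 128 at the first position
```

In the full model, the RPN then predicts an objectness score and box offsets for each anchor, and the best-scoring boxes are passed on as region proposals.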

Other Famous Object Detection Alternatives

Here is a list of other famous YOLO alternatives for object detection. I will come back to them later and extend this article as I get time, so this will be an ongoing article until I complete it.

  • Spatial Pyramid Pooling (SPP-net)
  • MobileNet SSD v2 (it has the potential to run on a Raspberry Pi faster than yolov3-tiny)
  • RetinaNet, which is good especially for faces
  • EfficientDet, a good choice if you want a model that works well across various hardware configurations
  • Detectron2 by Facebook AI Research

By Abdul Rehman

My name is Abdul Rehman and I love to do research in Embedded Systems, Artificial Intelligence, Computer Vision, and engineering-related fields. With 10+ years of experience in research and development in embedded systems, I have touched a lot of technologies, including web development and mobile application development. Now, with the help of my social presence, I like to share my knowledge and to document everything I have learned and am still learning.
