- Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier.
- Apply a color transform and append binned color features, as well as histograms of color, to your HOG feature vector.
- Normalize your features and randomize a selection for training and testing.
- Implement a sliding-window technique and use the trained classifier to search for vehicles in images.
- Run the pipeline on a video stream and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
- Estimate a bounding box for vehicles detected.
Histogram of Oriented Gradients (HOG)
1. Extract HOG features from the training images
The extract_features function calls the get_hog_features function to extract HOG features from the training images.
I started by reading in all the vehicle and non-vehicle images. A few example images are shown below.
Next, I experimented with different color spaces as well as different skimage.feature.hog parameters such as cells_per_block. Eventually I narrowed down the parameter values to achieve the best accuracy on the test data.
The YUV color space looks like this:
Here is an example using the YUV color space and HOG parameters including pixels_per_cell=(16, 16):
The figure below shows the histogram features.
2. Choice of HOG parameters
I experimented with different values of each parameter, each time evaluating the accuracy on the test set and correctness on the given test video. Later tuning for some parameters was required when working with the actual project video.
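To make the role of these parameters more concrete, here is a simplified sketch of the per-cell orientation histograms that HOG is built on. It is not the library implementation used in the project: it skips block normalization, and the function name cell_orientation_histograms and the value of 9 orientation bins are assumptions made for illustration. Only the 16-pixel cell size comes from the text above.

```python
import numpy as np

def cell_orientation_histograms(gray, orientations=9, pixels_per_cell=16):
    """Magnitude-weighted histogram of unsigned gradient orientations per cell.

    Simplified illustration only: no block normalization, no interpolation.
    """
    gray = gray.astype(np.float64)
    gy, gx = np.gradient(gray)                       # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation in [0, 180)

    bin_width = 180.0 / orientations
    # Clamp guards against floating-point values that land exactly on 180.
    bins = np.minimum((angle // bin_width).astype(int), orientations - 1)

    n_rows = gray.shape[0] // pixels_per_cell
    n_cols = gray.shape[1] // pixels_per_cell
    hist = np.zeros((n_rows, n_cols, orientations))
    for r in range(n_rows):
        for c in range(n_cols):
            rows = slice(r * pixels_per_cell, (r + 1) * pixels_per_cell)
            cols = slice(c * pixels_per_cell, (c + 1) * pixels_per_cell)
            hist[r, c] = np.bincount(bins[rows, cols].ravel(),
                                     weights=magnitude[rows, cols].ravel(),
                                     minlength=orientations)
    return hist

# Synthetic 64x64 patch whose intensity increases from left to right.
patch = np.tile(np.arange(64, dtype=np.float64), (64, 1))
features = cell_orientation_histograms(patch)
```

With 16-pixel cells, a 64x64 patch yields a 4x4 grid of histograms. Larger cells give a shorter feature vector and faster extraction, at the cost of coarser spatial detail, which is the trade-off the parameter tuning above explores.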
Train a classifier
From the raw image data, I extracted the 3-channel spatial color, histogram and HOG features. After splitting the entire feature data into training and test sets, I created a Linear SVM object. This SVM was trained on 80% of the total data and reached 98.56% accuracy on the test set. Predictions on individual samples right after training were also correct. This part of the code is marked as Training a SVM.
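The feature preparation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the helper names bin_spatial, color_hist and standardize, the 8x8 spatial size, the 32 histogram bins, and the random stand-in images are all hypothetical, and the HOG part of the feature vector is left out for brevity. The sketch also assumes the scaling statistics are computed from the training portion only.

```python
import numpy as np

def bin_spatial(img, size=8):
    """Downsample by block averaging (a stand-in for an image resize) and flatten."""
    h, w, ch = img.shape
    bh, bw = h // size, w // size
    blocks = img[:bh * size, :bw * size].reshape(size, bh, size, bw, ch)
    return blocks.mean(axis=(1, 3)).ravel()

def color_hist(img, nbins=32, value_range=(0, 256)):
    """Concatenate the intensity histograms of each channel."""
    return np.concatenate([
        np.histogram(img[:, :, c], bins=nbins, range=value_range)[0]
        for c in range(img.shape[2])
    ]).astype(np.float64)

def standardize(train, test):
    """Scale each feature to zero mean and unit variance using training statistics."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0                      # avoid dividing by zero for constant features
    return (train - mean) / std, (test - mean) / std

rng = np.random.default_rng(0)
# Random stand-ins for 64x64 three-channel training images.
images = rng.integers(0, 256, size=(50, 64, 64, 3))
X = np.array([np.concatenate([bin_spatial(im), color_hist(im)]) for im in images])

# Shuffle, then keep 80% for training and 20% for testing.
order = rng.permutation(len(X))
split = int(0.8 * len(X))
X_train, X_test = standardize(X[order[:split]], X[order[split:]])
```

The labels would be shuffled with the same order array before a linear SVM (for example scikit-learn's LinearSVC) is fitted on X_train. Scaling matters here because the spatial, histogram and HOG features have very different value ranges, and an unscaled feature group could dominate the classifier.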
Sliding Window Search
1. Sliding window search
Instead of using sliding windows of different sizes, I rescaled the image from which the features are extracted. With a fixed window size of 64 pixels, I moved the window in steps of 2 cells. All 3 types of features were extracted for each window, realigned, and fed to the predict function of my SVC. If the prediction was 1 (i.e. car), I recorded the window. I experimented with different scales and decided to use a gradually increasing scale towards the bottom of the image. The overlap, however, was fixed at 2 cell steps.
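As a rough illustration of how a fixed 64-pixel window on a rescaled image maps back to boxes in the original frame, consider the sketch below. The function name windows_for_scale, the 1280-pixel frame width, the search rows 400 to 656, and the scale of 2.0 are assumptions for the example; only the 64-pixel window, the 16-pixel cells and the 2-cell step come from the text.

```python
def windows_for_scale(x_range, y_range, scale, window=64, pix_per_cell=16, cells_per_step=2):
    """Return window boxes ((x1, y1), (x2, y2)) in original-image pixel coordinates."""
    x_start, x_stop = x_range
    y_start, y_stop = y_range

    # Size of the search region after shrinking it by `scale`.
    scaled_w = int((x_stop - x_start) / scale)
    scaled_h = int((y_stop - y_start) / scale)

    cells_x = scaled_w // pix_per_cell
    cells_y = scaled_h // pix_per_cell
    cells_per_window = window // pix_per_cell
    steps_x = (cells_x - cells_per_window) // cells_per_step + 1
    steps_y = (cells_y - cells_per_window) // cells_per_step + 1

    size = int(window * scale)               # window size seen in the original image
    boxes = []
    for iy in range(steps_y):
        for ix in range(steps_x):
            left = ix * cells_per_step * pix_per_cell    # position in the scaled image
            top = iy * cells_per_step * pix_per_cell
            x1 = x_start + int(left * scale)             # back to original coordinates
            y1 = y_start + int(top * scale)
            boxes.append(((x1, y1), (x1 + size, y1 + size)))
    return boxes

boxes = windows_for_scale(x_range=(0, 1280), y_range=(400, 656), scale=2.0)
```

At a scale of 2.0 the 64-pixel window covers 128 pixels of the original frame, so one window size in feature space can find cars of different apparent sizes simply by changing the scale.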
The image area covered by different scales is shown below.
2. Testing and performance optimizations
Sliding the window over the entire image is time-consuming and redundant. Since cars can only appear in the bottom half of an image, I constrained my window search to that part. Moreover, cars appear smaller in the upper part of this region than in the bottom-most part, so I used different scales across the bottom half of the test image. This dramatically reduced the number of window iterations and improved the overall performance of the search.
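The saving can be estimated with a little arithmetic. The sketch below counts window positions for a full-frame search versus a search limited to bands in the lower part of the frame. The frame size, the three scales and the band heights are hypothetical values chosen for illustration, not the ones used in the project; the 32-pixel step corresponds to 2 cells of 16 pixels.

```python
def count_windows(width, height, scale, window=64, step=32):
    """Number of window positions when a region is shrunk by `scale` and scanned with a fixed step."""
    w, h = int(width / scale), int(height / scale)
    if w < window or h < window:
        return 0
    return ((w - window) // step + 1) * ((h - window) // step + 1)

frame_w, frame_h = 1280, 720                 # assumed frame size
scales = [1.0, 1.5, 2.0]                     # hypothetical scales

# Every scale applied to the whole frame.
full = sum(count_windows(frame_w, frame_h, s) for s in scales)

# Hypothetical bands: small scale near the horizon, larger scales lower down.
bands = [(1.0, 96), (1.5, 160), (2.0, 256)]  # (scale, band height in pixels)
restricted = sum(count_windows(frame_w, band_h, s) for s, band_h in bands)
```

Under these assumed numbers the restricted search evaluates 185 windows instead of 1359, roughly a sevenfold reduction per frame.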
Here are some example images:
Filter and Combine
The find_cars function returns a list of all bounding boxes where a car was found. However, this list contains many false positives. To tackle this, I first combined all overlapping boxes by adding them on top of each other to generate a heatmap of detections. This is implemented in the add_heat function. The higher the heat value at a point, the more windows overlapped to give a positive detection there. I therefore thresholded this heatmap in the apply_threshold function to keep only strong detections. The label function identifies the connected regions of the thresholded heatmap, and a bounding box is taken around each region. These boxes are then plotted on the image.
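The filtering steps above can be sketched as follows. The names add_heat and apply_threshold follow the text, but their bodies here are a guess at a typical implementation rather than the project's code. label_regions is a small flood-fill stand-in for SciPy's label function, written out so the example needs only NumPy, and boxes_from_labels and the sample detections are hypothetical.

```python
import numpy as np

def add_heat(heatmap, boxes):
    """Add 1 to every pixel covered by each box ((x1, y1), (x2, y2))."""
    for (x1, y1), (x2, y2) in boxes:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap

def apply_threshold(heatmap, threshold):
    """Return a copy with pixels at or below the threshold set to zero."""
    heatmap = heatmap.copy()
    heatmap[heatmap <= threshold] = 0
    return heatmap

def label_regions(heatmap):
    """4-connected component labeling via flood fill; returns (labels, count)."""
    labels = np.zeros(heatmap.shape, dtype=int)
    count = 0
    for y, x in zip(*np.nonzero(heatmap)):
        if labels[y, x]:
            continue
        count += 1
        labels[y, x] = count
        stack = [(y, x)]
        while stack:
            cy, cx = stack.pop()
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if (0 <= ny < heatmap.shape[0] and 0 <= nx < heatmap.shape[1]
                        and heatmap[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = count
                    stack.append((ny, nx))
    return labels, count

def boxes_from_labels(labels, count):
    """Tightest bounding box around each labeled region."""
    result = []
    for k in range(1, count + 1):
        ys, xs = np.nonzero(labels == k)
        result.append(((int(xs.min()), int(ys.min())), (int(xs.max()), int(ys.max()))))
    return result

# Three overlapping detections on one car plus one isolated false positive.
detections = [((10, 10), (40, 40)), ((20, 20), (50, 50)), ((25, 15), (45, 45)),
              ((80, 60), (100, 80))]
heat = add_heat(np.zeros((100, 120)), detections)
labels, count = label_regions(apply_threshold(heat, 1))
car_boxes = boxes_from_labels(labels, count)
```

With a threshold of 1, the isolated box never exceeds a heat of 1 and is discarded, while the area covered by at least two of the overlapping boxes survives as a single region with one bounding box.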
The following image was taken from a processed video. The noisy heatmap is shown in the top-left corner and the filtered heatmap in the top-right corner:
Here’s an example result showing the heatmap from a series of frames of video, the result of scipy.ndimage.measurements.label(), and the bounding boxes then overlaid on the last frame of video:
Here are six frames and their corresponding noisy detections, heatmaps, and labeled (filtered) output:
What took most of my time was tuning the HOG parameters for optimum detection. One case where my pipeline is likely to fail is when the plane of the road differs from that in the project video: because I truncated the top half of the frame, any vehicle in that part won’t be detected. Apart from this, lighting conditions might affect the detection.