Semantic Segmentation for Autonomous Vehicles

Implementing multiple segmentation architectures and comparing the results

Project Description:

Autonomous vehicles are an enormous industry, with a global market size estimated at USD 121.78 billion in 2022. Scene understanding is critical for autonomous driving safety and consistency, and semantic segmentation is one technique used to achieve it. Cameras record roads, pedestrians, buildings, and other important objects, and each pixel within an image is assigned to an object class.

Each color represents a different object class. Note how pedestrians are all the same color, and how that is a different color from the road.
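In code terms, a segmentation network outputs a class score for every pixel, and the predicted class map is the per-pixel argmax over those scores. Here is a minimal sketch of that idea; the shapes and the class count are illustrative, not taken from the project:

```python
import numpy as np

# Hypothetical network output: class scores (logits) for every pixel.
# Shape: (height, width, num_classes) -- e.g. road, pedestrian, building...
logits = np.random.randn(256, 256, 13)

# Semantic segmentation assigns each pixel the class with the highest score,
# so every pedestrian pixel gets the "pedestrian" label and every road pixel
# gets the "road" label.
class_map = np.argmax(logits, axis=-1)  # shape: (256, 256), integer labels

print(class_map.shape, class_map.min(), class_map.max())
```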

This is fundamentally a machine learning problem: classical computer-vision methods struggle to label scenes at the pixel level with useful accuracy, so deep learning is the standard approach. Even with deep learning, however, data becomes an issue; a lack of good data often limits how well models perform. The purpose of this project is therefore to determine which segmentation models perform best given the same training and test data.

Method:

After a significant literature review, a few models stood out: FCN, U-Net, U-Net++, DeepLab, GCN, and SegNet. All of these architectures are built around convolutional neural networks, which extract features from images through banks of learned filters.

The SegNet architecture. Source: https://arxiv.org/abs/1511.00561
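The SegNet idea is an encoder that downsamples the image and a decoder that upsamples it back to full resolution. Below is a heavily simplified sketch in Keras; note that the real SegNet upsamples using the max-pooling indices saved during encoding, which plain `UpSampling2D` does not capture, and the layer sizes here are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def simple_segnet(input_shape=(256, 256, 3), num_classes=13):
    """A much-simplified SegNet-style encoder-decoder (illustrative only)."""
    inputs = layers.Input(shape=input_shape)

    # Encoder: convolution blocks followed by downsampling.
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)

    # Decoder: upsampling back to full resolution.
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)

    # Per-pixel class scores.
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = simple_segnet()
model.summary()
```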

The chosen dataset was a subset of the Cityscapes dataset: 2,600 images were used for training, 300 for validation, and 500 were held out for testing after training.

An example of the data taken from the Cityscapes dataset. Source: https://www.kaggle.com/datasets/dansbecker/cityscapes-image-pairs
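In the Kaggle version of the dataset linked above, each file stores the street photo and its color-coded mask side by side, so loading a pair means splitting each image down the middle. A sketch, where the file path and the 256x512 pair size are assumptions about the download rather than guarantees:

```python
import numpy as np
from PIL import Image

def load_pair(path):
    """Split one Cityscapes image-pair file into (photo, mask).

    In the Kaggle 'cityscapes-image-pairs' dataset each file is a single
    image with the street photo on the left half and the color-coded
    segmentation mask on the right half (assumed 256x512 pixels here).
    """
    pair = np.array(Image.open(path))  # assumed shape: (256, 512, 3)
    width = pair.shape[1] // 2
    photo = pair[:, :width]            # left half: the camera image
    mask = pair[:, width:]             # right half: the label colors
    return photo, mask

# Hypothetical usage with the train/val split described above:
# photo, mask = load_pair("cityscapes_data/train/1.jpg")
```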

U-Net, U-Net++, and SegNet were selected for testing to see which would perform best. Standard data augmentation techniques were used, including mirroring, darkening/brightening, and blurring images, to expand the effective size of the training set (see the sketch below).
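One subtlety with segmentation augmentation is that geometric changes like mirroring must be applied to the photo and mask together, while photometric changes like brightness and blur should only touch the photo. Here is a sketch of those three augmentations; the probability and intensity ranges are illustrative choices, not the project's exact settings:

```python
import numpy as np
import cv2

def augment(photo, mask, rng=np.random.default_rng()):
    """Apply mirroring, brightness shifts, and blur to one training pair."""
    # Mirroring: flip both photo and mask horizontally so labels stay aligned.
    if rng.random() < 0.5:
        photo, mask = photo[:, ::-1], mask[:, ::-1]

    # Darkening/brightening: scale pixel intensities of the photo only.
    factor = rng.uniform(0.7, 1.3)
    photo = np.clip(photo.astype(np.float32) * factor, 0, 255).astype(np.uint8)

    # Blurring: occasionally smooth the photo with a small Gaussian kernel.
    if rng.random() < 0.3:
        photo = cv2.GaussianBlur(photo, (5, 5), 0)

    return photo, mask
```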

On the left is the U-Net architecture. On the right is the U-Net++ architecture. Source: https://arxiv.org/abs/1807.10165v1
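The defining feature of U-Net is the skip connection: each decoder stage concatenates its upsampled features with the matching encoder features, and U-Net++ densifies these skips with nested intermediate convolution nodes, which is where its extra parameters come from. A minimal sketch of one U-Net skip, with illustrative layer sizes:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative encoder feature map (high resolution) and decoder feature
# map (lower resolution, deeper in the network).
enc = layers.Input(shape=(128, 128, 64))   # encoder features
dec = layers.Input(shape=(64, 64, 128))    # decoder features

# U-Net skip connection: upsample the decoder features, then concatenate
# them with the same-resolution encoder features so fine spatial detail
# is recovered during decoding.
up = layers.UpSampling2D(2)(dec)                  # -> (128, 128, 128)
merged = layers.Concatenate()([enc, up])          # -> (128, 128, 192)
out = layers.Conv2D(64, 3, padding="same", activation="relu")(merged)
```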

After a few hours of training, TensorFlow reported the following training and validation losses.

The training and validation loss of all different segmentation models tested.
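For context, here is roughly what that training loop looks like in Keras, continuing from the `simple_segnet` sketch above. The optimizer, loss, epoch count, and the dummy stand-in data are assumptions, not the project's exact configuration:

```python
import numpy as np

# Dummy stand-ins for the real Cityscapes arrays, just so the sketch runs;
# shapes follow the earlier examples (photos plus integer label masks).
train_images = np.random.rand(8, 256, 256, 3).astype(np.float32)
train_masks = np.random.randint(0, 13, (8, 256, 256))
val_images = np.random.rand(2, 256, 256, 3).astype(np.float32)
val_masks = np.random.randint(0, 13, (2, 256, 256))

model = simple_segnet()  # from the SegNet sketch above

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # one integer label per pixel
    metrics=["accuracy"],
)

history = model.fit(
    train_images, train_masks,
    validation_data=(val_images, val_masks),
    epochs=2, batch_size=4,
)

# history.history["loss"] and history.history["val_loss"] hold the
# curves plotted above.
```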

Pixel accuracy was the chosen metric for comparison, and the following pixel accuracies were calculated:

Per-pixel accuracy of the different segmentation algorithms.
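Pixel accuracy is simply the fraction of pixels whose predicted class matches the ground truth. A short sketch, with the commented usage line assuming the hypothetical arrays from the training sketch:

```python
import numpy as np

def pixel_accuracy(pred_masks, true_masks):
    """Fraction of pixels whose predicted class equals the ground truth.

    Both arguments are integer label maps of shape (n_images, H, W).
    """
    return np.mean(pred_masks == true_masks)

# Hypothetical usage:
# preds = np.argmax(model.predict(test_images), axis=-1)
# print(pixel_accuracy(preds, test_masks))
```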

U-Net++ ended up with the best results, followed by U-Net and SegNet. At the same time, U-Net++ had the most trainable parameters and took the longest to train, so there is a real tradeoff. However, because training only happens once and memory is rarely the bottleneck nowadays, U-Net++ is the segmentation algorithm to use!
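If you want to quantify that tradeoff yourself, the parameter counts can be read directly off each Keras model. A sketch, where the commented model constructors are hypothetical stand-ins for the actual trained networks:

```python
import numpy as np

def trainable_params(model):
    """Total number of trainable parameters in a Keras model."""
    return sum(int(np.prod(w.shape)) for w in model.trainable_weights)

# Hypothetical usage, comparing the candidate architectures:
# for name, m in [("U-Net", unet()), ("U-Net++", unetpp()), ("SegNet", segnet())]:
#     print(f"{name}: {trainable_params(m):,} trainable parameters")
```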