Automatically generates a JSON file containing the annotation data, in the Common Objects in Context (COCO) format, needed to train an AI instance segmentation model.
Able to separate an image's foreground from its background, while also cleaning up noise and filling any gaps within the foreground.
For my Bachelor’s dissertation, I implemented an automatic instance segmentation annotator in Python using Google Colab. This project aimed to reduce the time spent annotating a dataset that would then be used to train an instance segmentation model. Annotating a dataset typically means going through every image and labelling its contents by hand, a tedious and time-consuming process. A larger dataset generally leads to better results, but it also means spending more time on annotation, delaying the creation of the AI model. This forces a compromise between time and accuracy that is best avoided.
This involves the user acquiring and organising the images which will be used as the input for generating the dataset.
Produces a binary mask version of the images which separates the foreground from the background.
Obtains the necessary data to generate the JSON file containing the annotation data in the Common Objects in Context (COCO) format.
Combines the original images with the annotation data to train an instance segmentation model.
This process starts by picking a random image and using it to calculate the minimum and maximum number of pixels that should be present in the foreground. The calculation takes the image’s width and height into account; a single image is enough because all the images should share the same background and are therefore of equal dimensions.
minimum pixels = (image width × image height) × (L/100)
maximum pixels = (image width × image height) × (H/100)
Here L represents the lower threshold on the percentage of white pixels that should be present in the binary mask, while H represents the upper threshold.
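As a minimal sketch, the calculation might look like the following in Python, where the values of L and H are illustrative placeholders rather than the parameters actually used in the project:

```python
import cv2

# Hypothetical threshold percentages; the project's actual values are not stated.
L = 5   # lower bound: % of mask pixels expected to be white
H = 60  # upper bound: % of mask pixels expected to be white

# Any one image suffices, since every image shares the same dimensions.
sample = cv2.imread("images/sample.jpg")  # hypothetical input path
height, width = sample.shape[:2]

min_pixels = (width * height) * (L / 100)
max_pixels = (width * height) * (H / 100)
```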
Following this, all the images are iterated through to train the OpenCV background subtractor, which works by comparing pixels across images: when a pixel is found to stay the same across different images, it is considered part of the background.
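A minimal sketch of this training pass, assuming OpenCV's MOG2 subtractor (the exact OpenCV subtractor used in the project is not specified here) and an illustrative input folder:

```python
import glob
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
image_paths = sorted(glob.glob("images/*.jpg"))  # hypothetical input folder

# Training pass: every image updates the model, so pixels that stay
# constant across images come to be treated as background.
for path in image_paths:
    subtractor.apply(cv2.imread(path))

# With the model trained, re-applying an image with learningRate=0
# (i.e. without further updates) yields its binary foreground mask.
mask = subtractor.apply(cv2.imread(image_paths[0]), learningRate=0)
```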
The next phase checks that the binary mask does not exceed the maximum white pixel count. This check is in place because a high count may indicate that the subject was too close to the camera, resulting in a blurred, unsuitable image. To ensure that an image with a different background was not accidentally included, the program also checks the left and right borders: the number of white pixels along these borders is counted and compared against this formula:
maximum white border pixels = (image height × 2) × (X/100)
Here X represents the percentage of white border pixels which should not be exceeded. Once a binary mask has passed all these checks, it is saved to a separate folder, along with a copy of the original image in another folder. Once all the images have been processed, the folder containing the binary masks and the folder containing their corresponding original images are zipped and downloaded.
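A sketch of these two checks, assuming the mask is a single-channel NumPy array of 0s and 255s; max_pixels comes from the earlier formula, and the value of X is an illustrative placeholder:

```python
import numpy as np

X = 10  # hypothetical border-pixel percentage; the project's value is not stated

def mask_passes_checks(mask: np.ndarray, max_pixels: float) -> bool:
    # Reject masks with too many white pixels (subject too close / blurred).
    if np.count_nonzero(mask) > max_pixels:
        return False
    # Count white pixels down the leftmost and rightmost columns.
    height = mask.shape[0]
    border_white = np.count_nonzero(mask[:, 0]) + np.count_nonzero(mask[:, -1])
    max_border_white = (height * 2) * (X / 100)
    # A heavily white border suggests an image with a different background.
    return border_white <= max_border_white
```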
The process for producing the annotation data uses NumPy and Shapely to create a polygon around each cluster of white pixels and saves the points along these polygons. The annotation format is Common Objects in Context (COCO), which is widely used by various Convolutional Neural Networks (CNNs), including Mask R-CNN, the CNN used for this project. Five sections need to be filled out for the annotations file to contain all the data required to train an instance segmentation model: info, licenses, images, annotations, and categories. The polygon and bounding box help fill out the annotations section, while the other sections use data that is simply inputted beforehand or taken from the folder naming convention.
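As a rough illustration of how one mask might be turned into entries for the annotations section, assuming the clusters of white pixels are first extracted as contours with OpenCV (the exact extraction routine used in the project is not detailed here, and the file path and IDs are placeholders):

```python
import cv2
from shapely.geometry import Polygon

mask = cv2.imread("masks/example.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

annotations = []
for i, contour in enumerate(contours):
    points = contour.squeeze()
    if points.ndim != 2 or len(points) < 3:
        continue  # a polygon needs at least three points
    polygon = Polygon(points)
    min_x, min_y, max_x, max_y = polygon.bounds
    annotations.append({
        "id": i,
        "image_id": 0,
        "category_id": 1,
        # COCO stores each polygon as a flat [x1, y1, x2, y2, ...] list.
        "segmentation": [points.ravel().tolist()],
        # COCO bounding boxes use the [x, y, width, height] convention.
        "bbox": [min_x, min_y, max_x - min_x, max_y - min_y],
        "area": polygon.area,
        "iscrowd": 0,
    })
```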
This process can take some time to execute, and Google Colab disconnects a program from its virtual environment if there is no human input for a while. To avoid losing progress, I implemented a checkpoint system which, after a certain number of images, generates a temporary text file containing the data gathered so far and uploads it to Google Drive. The program looks for this file during start-up and continues from it; the checkpoint file is deleted once the program has generated the complete JSON file. It should be noted that different image detection approaches demand different data; for this project, the annotation data caters for instance segmentation.
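A minimal sketch of such a checkpoint, assuming Google Drive is mounted in the Colab environment; the path, save interval, output file name, and state structure are all illustrative:

```python
import json
import os

CHECKPOINT_PATH = "/content/drive/MyDrive/annotator_checkpoint.json"  # hypothetical
SAVE_INTERVAL = 100  # hypothetical: checkpoint after every 100 images

def load_checkpoint():
    # On start-up, resume from the checkpoint if one exists.
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            return json.load(f)
    return {"next_index": 0, "annotations": []}

def save_checkpoint(state):
    with open(CHECKPOINT_PATH, "w") as f:
        json.dump(state, f)

def finish(state):
    # Once the complete JSON file is written, the checkpoint is removed.
    with open("annotations.json", "w") as f:
        json.dump(state["annotations"], f)
    if os.path.exists(CHECKPOINT_PATH):
        os.remove(CHECKPOINT_PATH)
```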
This process was tested using a dataset of images provided by BirdLife Malta. The dataset contained three categories of birds found within the Maltese Archipelago, with all the images taken inside caves: Scopoli’s Shearwater (998 images), Storm Petrel (3,661 images) and Yelkouan Shearwater (11,889 images). The parameters used to create the binary masks were varied to see how this affected the output; the findings showed that stricter parameters, which resulted in fewer usable images, produced a more accurate AI model.
The AI models trained on the generated training and validation datasets were found to be between 75% and 85% accurate. The false positive rate ranged between 10% and 20%, while the false negative rate was higher, at 20% to 30%. This higher percentage is attributed to shortcomings during the creation of the binary masks: some images contained birds in flight, whose blurred wings caused the program to split one bird into two clusters, which were then counted as two different birds.
Stage 2, in which the binary masks are created, produced 1.15 masks per second. Stage 3 generated the annotation data at a rate of 0.10 images per second. On average, the program was able to fully annotate one image every 11 seconds, rivalling the time it would take to annotate an image manually.