Dataset preparation is an important step in any deep learning project. Embedded vision and multi-camera setups make it possible to gather high-quality data for applications in almost every area of life.
Applications based on AI (Artificial Intelligence) are becoming increasingly popular in many fields.
They already solve tasks of varying complexity, often faster and more reliably than humans.
This article focuses on applications based on image and video processing, such as UAV control, self-driving cars, driverless trains, boats and mobile robots.
All of the automated systems mentioned above need large amounts of training data to learn how to behave.
Obtaining high-quality data for a particular set of situations is a crucial starting point, which raises an important question: where and how can such a dataset be obtained?
Using a standard training set is one option, but it usually does not correspond to the real situations the system will have to handle.
Training a neural network on such material does not provide enough confidence that the system will behave correctly.
This is why most datasets, especially complex ones, are built from collections of real-life examples.
For autonomous cars, this means physically installing the required number of specific embedded camera models on the vehicle, which creates a multi-camera setup, and then making a large number of recordings.
The resulting deep learning datasets depend on the particular camera and its image processing algorithms, and every such camera system introduces its own artifacts.
This is why the following aspects need to be considered when assembling a camera setup for data gathering:
For example, in the NVIDIA DALI project, the starting point of the workflow is a standard image database.
The JPEG images are decoded, and several image processing transforms are then applied so that the network is trained on modified images derived from the original set via the following operations:
This is an artificial way to significantly increase the number of images in the database.
The increase is virtual, but the derived images are not identical to the originals, so such an approach can be useful.
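The decode-then-transform idea can be illustrated without any GPU library. The following is a minimal pure-Python sketch, assuming the image has already been decoded into a grayscale 2-D list of 8-bit values. It does not use the DALI API. The particular transforms (flips, a 90° rotation and brightness scaling) and the function names are illustrative assumptions, not the operation list from the original article.

```python
from typing import List

Image = List[List[int]]  # decoded grayscale image: rows of 8-bit pixel values


def flip_horizontal(img: Image) -> Image:
    """Mirror every row left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Reverse the row order (upside down)."""
    return [list(row) for row in img[::-1]]


def rotate_90_cw(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale pixel values and clip them to the 8-bit range 0..255."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in img]


def augment(img: Image) -> List[Image]:
    """Derive several new training images from one decoded original."""
    return [
        flip_horizontal(img),
        flip_vertical(img),
        rotate_90_cw(img),
        adjust_brightness(img, 0.7),
        adjust_brightness(img, 1.3),
    ]


original = [[10, 20, 30], [40, 50, 60]]
variants = augment(original)
print(len(variants))  # 5 derived images from 1 original
```

Each original therefore yields several distinct training samples. In a real pipeline, the same transforms would run on the GPU after decoding.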
In fact, the same can be done for video: record the video in RAW format and then apply different sets of parameters in GPU-based RAW processing to generate new image series.
Provided the original RAW video is of high enough quality, many different videos can be prepared for neural network training. Such GPU-based RAW processing takes very little time.
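The idea of rendering one RAW recording with several parameter sets can be sketched on a CPU. This is a simplified illustration, assuming a 12-bit Bayer RGGB mosaic and a pipeline reduced to four steps: black-level subtraction, white balance, clipping and gamma encoding. It leaves out demosaicing. `RawParams`, `process_raw` and the parameter values are hypothetical and do not represent the Fastvideo SDK API.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

Raw = List[List[int]]  # Bayer RGGB mosaic, e.g. 12-bit sensor values


@dataclass(frozen=True)
class RawParams:
    black_level: int
    white_level: int
    wb_gains: Tuple[float, float, float]  # (R, G, B) white-balance multipliers
    gamma: float


def cfa_color(y: int, x: int) -> int:
    """Colour index of a photosite in an RGGB pattern: 0 = R, 1 = G, 2 = B."""
    if y % 2 == 0:
        return 0 if x % 2 == 0 else 1
    return 1 if x % 2 == 0 else 2


def process_raw(raw: Raw, p: RawParams) -> List[List[int]]:
    """Black level, white balance, clipping and gamma; output stays a mosaic (no demosaicing)."""
    span = p.white_level - p.black_level
    out = []
    for y, row in enumerate(raw):
        out_row = []
        for x, v in enumerate(row):
            lin = max(0, v - p.black_level) / span              # normalise to 0..1
            lin = min(1.0, lin * p.wb_gains[cfa_color(y, x)])   # white balance, then clip
            out_row.append(round(255 * lin ** (1.0 / p.gamma)))  # gamma-encode to 8 bit
        out.append(out_row)
    return out


# Hypothetical parameter grid: each combination gives a differently rendered frame.
WB_OPTIONS = [(2.0, 1.0, 1.5), (1.6, 1.0, 2.0)]
GAMMA_OPTIONS = [1.8, 2.2, 2.4]
PARAM_SETS = [RawParams(64, 4095, wb, g) for wb, g in product(WB_OPTIONS, GAMMA_OPTIONS)]

raw_frame = [[300, 1800], [1700, 900]]
renditions = [process_raw(raw_frame, p) for p in PARAM_SETS]
print(len(renditions))  # 6 renditions of a single RAW frame
```

The number of renditions is the product of the option counts (here 2 × 3 = 6), so each additional parameter multiplies the size of the derived dataset.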
By combining XIMEA embedded cameras for video recording with the Fastvideo SDK for RAW image and video processing, the following can be achieved:
This approach also makes it possible to simulate different lighting conditions in software, both in terms of exposure and of the spectral characteristics of the illumination.
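Exposure and illumination changes can be approximated on linear (not yet gamma-encoded) pixel data. The sketch below makes two assumptions. First, an exposure change of `ev` stops is modelled as multiplication by 2**ev, with clipping at sensor saturation. Second, a different light source is modelled as per-channel gains, which is a simple diagonal model rather than a full spectral simulation. The tint values and function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # linear RGB; 1.0 means the sensor is saturated


def simulate_exposure(pixels: List[Pixel], ev: float) -> List[Pixel]:
    """Change exposure by `ev` stops (factor 2**ev); overexposed values clip at 1.0."""
    k = 2.0 ** ev
    return [tuple(min(1.0, c * k) for c in px) for px in pixels]


def simulate_illuminant(pixels: List[Pixel], tint: Pixel) -> List[Pixel]:
    """Approximate another light source with per-channel gains (a simple diagonal model)."""
    return [tuple(min(1.0, c * t) for c, t in zip(px, tint)) for px in pixels]


# Hypothetical tints for a warmer and a cooler light source.
WARM = (1.25, 1.0, 0.7)
COOL = (0.8, 1.0, 1.3)

scene = [(0.2, 0.3, 0.4), (0.6, 0.6, 0.6)]
variants = [
    simulate_illuminant(simulate_exposure(scene, ev), tint)
    for ev in (-1.0, 0.0, 1.0)
    for tint in (WARM, COOL)
]
print(len(variants))  # 3 exposures x 2 illuminants = 6
```

Working in the linear domain matters here. Clipping at 1.0 then reproduces the highlight loss of a genuinely overexposed sensor, which a simple brightness change on gamma-encoded images would not.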
Various lenses and camera orientations can be simulated as well, so the total number of new videos and images for training can be increased many times over.
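Orientation changes can reuse simple rotations and flips. A different lens can be roughly imitated by remapping pixels through a distortion model. The sketch below assumes a one-parameter radial model (only the k1 term of the Brown-Conrady family) with nearest-neighbour sampling. A realistic lens simulation would also account for focal length, higher-order terms, vignetting and proper interpolation. `radial_distort` is a hypothetical helper.

```python
from typing import List

Image = List[List[int]]


def radial_distort(img: Image, k1: float) -> Image:
    """Resample img with a one-parameter radial model: src = dst * (1 + k1 * r^2).

    k1 > 0 pulls in content from further out (barrel-like look), k1 < 0 the opposite.
    Nearest-neighbour sampling; pixels mapped outside the frame become 0.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    norm = max(cx, cy) or 1.0  # normalise so r is about 1 at the frame edge
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = (y - cy) / norm, (x - cx) / norm
            scale = 1.0 + k1 * (dx * dx + dy * dy)
            sy = round(cy + dy * scale * norm)
            sx = round(cx + dx * scale * norm)
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


frame = [[r * 8 + c for c in range(8)] for r in range(8)]
lens_variants = [radial_distort(frame, k) for k in (-0.2, 0.1, 0.3)]
```

Sweeping `k1` over a few values turns one recording into several "lens" variants. Combined with rotations, this multiplies the dataset further.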
There is no need to store these processed videos: they can be generated on the fly by real-time RAW processing on the GPU.
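On-the-fly generation can be expressed as a lazy stream that renders each RAW frame with every parameter set only when the training loop asks for it. The following is a minimal sketch, assuming a generic `process` callable. `digital_gain` is a hypothetical CPU stand-in for real GPU-based RAW processing, which this code does not model.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple, TypeVar

F = TypeVar("F")  # type of a RAW frame
P = TypeVar("P")  # type of one processing-parameter set
R = TypeVar("R")  # type of a processed frame


def training_stream(
    raw_frames: Iterable[F],
    param_sets: Sequence[P],
    process: Callable[[F, P], R],
) -> Iterator[Tuple[P, R]]:
    """Lazily yield (parameters, processed frame) pairs; nothing is written to disk."""
    for frame in raw_frames:          # RAW frames are consumed one at a time
        for params in param_sets:     # each frame is rendered once per parameter set
            yield params, process(frame, params)


def digital_gain(frame: List[int], gain: int) -> List[int]:
    """Hypothetical stand-in for a real RAW pipeline: multiply and clip to 12 bit."""
    return [min(4095, v * gain) for v in frame]


stream = training_stream([[100, 2000], [300, 4000]], [1, 2, 3], digital_gain)
for params, frame in stream:
    print(params, frame)  # in practice each sample would go straight to the training loop
```

Because frames are pulled on demand, the number of training samples (frames × parameter sets) is limited by processing speed rather than by storage.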
These are the basics of preparing a dataset for deep learning and of choosing the equipment for a multi-camera setup.
Credits
Fastvideo Blog:
https://www.fastcompression.com/blog/ai-video-training.htm