Fig.1. Jetson Nano Developer Kit
Embedded imaging applications stand to benefit most from the release of the NVIDIA Jetson Nano hardware.
The Jetson Nano is a very small yet capable computer with an integrated GPU that can run multiple neural networks in parallel for applications such as image classification, object detection, segmentation, and speech processing.
The XIMEA camera families tested so far include the xiQ, xiMU and xiC models; for more details check HERE
Below are some of the benchmarks and results from testing the Fastvideo Image & Video Processing SDK on the Jetson Nano Developer Kit.
They are specific to camera applications.
Fig.2. NVIDIA Jetson Nano module
Useful links:
Jetson Nano Presentation
Jetson Nano Product Brief
Getting Started with AI on Jetson Nano
Jetson Family Presentation
According to the CUDA Device Query application, the tested Jetson Nano module is identified as NVIDIA Tegra X1 with CUDA Capability 5.3.
It therefore resembles the Jetson TX1, but with half the CUDA cores.
Following are NVIDIA NVENC and NVDEC benchmarks:
Jetson Nano uses Dynamic Voltage and Frequency Scaling (DVFS).
This power management technique is used in most modern computer hardware to save power: the voltage and clock frequency of a component are raised or lowered depending on operating conditions.
By default, the Jetson Nano Developer Kit is configured to accept power via the Micro USB connector.
Some Micro USB power supplies are designed to output slightly more than 5V to compensate for voltage loss across the cable.
The critical point is that the Jetson Nano module requires a minimum of 4.75V to operate.
It's recommended to use a power supply capable of delivering 5V to the J28 Micro-USB connector.
There are some other power supply options for Jetson Nano.
If the total load is expected to exceed 2A, e.g. because of peripherals attached to the carrier board or computationally heavy tasks, fit a jumper on the J48 Power Select pins. This disables the power supply via Micro USB and enables 5V-4A via the J25 power jack.
Another option is to supply 5V-6A via the J41 expansion header: its two 5V pins can power the developer kit at 3A each.
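The selection rules above can be summarised in a short Python sketch. The function name `choose_power_input` and the returned strings are hypothetical and used only for illustration; the current limits (2A, 4A, 6A) are the ones quoted in the text.

```python
def choose_power_input(load_amps: float) -> str:
    """Suggest a Developer Kit power input for an expected total load at 5V.

    Thresholds follow the figures quoted in the article; the wording of the
    returned strings is illustrative only.
    """
    if load_amps <= 0:
        raise ValueError("expected load must be positive")
    if load_amps <= 2.0:
        return "J28 Micro USB (5V-2A)"
    if load_amps <= 4.0:
        return "J25 power jack (5V-4A, J48 Power Select set)"
    if load_amps <= 6.0:
        return "J41 expansion header (5V-6A, two 5V pins at 3A each)"
    raise ValueError("load exceeds the 6A available via the J41 header")
```

For example, `choose_power_input(3.5)` points to the J25 power jack, which is the input used for the benchmarks in this article.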
The NVIDIA Jetson Nano Developer Kit is equipped with a passive heatsink, to which a fan can be mounted.
Fig.3. Top View of Jetson Nano Developer Kit
The NVIDIA Jetson Nano module is designed for power efficiency and supports two software-defined power modes.
The default mode provides a 10W power budget for the module and the other mode a 5W budget.
The power modes keep the module within the 10W or 5W budget by capping the GPU and CPU frequencies and the number of online CPU cores.
Individual parts of the CORE power domain, such as video encode (NVENC) and video decode (NVDEC), are not covered by these budgets.
The carrier board consumes between 0.5W (at 2A) and 1.25W (at 4A) with no peripherals attached.
According to the tests, normal operation of the Jetson Nano Developer Kit in 10W mode requires more power than a USB supply can deliver (5V at 2A).
A USB-powered Jetson Nano cannot run continuously under heavy workload at default clocks (without jetson_clocks applied).
A USB-powered Jetson Nano works reliably in 5W mode, but with lower performance.
For the benchmark measurements below, an external 5V, 4A power supply was used.
Even better performance could be achieved by supplying more power.
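A rough back-of-the-envelope check shows why a 2A USB supply is marginal in 10W mode. The sketch below uses only the figures quoted above (5V at 2A from USB, 10W or 5W module budget, at least 0.5W for the carrier board); the names are hypothetical, and it ignores peripherals and the NVENC/NVDEC blocks that sit outside the module budget, so the real headroom is smaller still.

```python
USB_VOLTS = 5.0
USB_AMPS = 2.0
MODULE_BUDGET_W = {"10W": 10.0, "5W": 5.0}  # software-defined power modes
CARRIER_MIN_W = 0.5  # carrier board with no peripherals, lower bound


def usb_headroom_w(mode: str) -> float:
    """Return the spare USB power in watts; a negative value means a shortfall."""
    supply_w = USB_VOLTS * USB_AMPS
    demand_w = MODULE_BUDGET_W[mode] + CARRIER_MIN_W
    return supply_w - demand_w


if __name__ == "__main__":
    for mode in ("10W", "5W"):
        print(mode, "mode headroom:", usb_headroom_w(mode), "W")
```

Under these assumptions the 10W mode is already 0.5W short of what USB can supply, while the 5W mode leaves 4.5W to spare.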
To control clock speeds and power consumption, use the nvpmodel and jetson_clocks tools:
nvpmodel -m0 followed by jetson_clocks gives maximum performance.
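As an illustration, the two commands can be wrapped in a small Python helper. This is a minimal sketch, not part of any SDK: the function names are hypothetical, it assumes both tools need root (hence `sudo`), and it assumes the stock Jetson Nano configuration in which mode 0 is the 10W mode and mode 1 the 5W mode (`nvpmodel -q` reports the active mode).

```python
import subprocess
from typing import List


def performance_commands(mode: int = 0) -> List[List[str]]:
    """Build the command lines that select a power mode and pin the clocks."""
    return [
        ["sudo", "nvpmodel", "-m", str(mode)],  # select the power mode
        ["sudo", "jetson_clocks"],              # set clocks to the maximum for that mode
    ]


def apply_performance_mode(mode: int = 0) -> None:
    """Run the commands in order; raises CalledProcessError if one fails."""
    for cmd in performance_commands(mode):
        subprocess.run(cmd, check=True)
```

Calling `apply_performance_mode(0)` on the device corresponds to the maximum-performance setting used for the benchmarks; building the command list separately makes it easy to log or inspect before running anything.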
The following image processing kernels, which are conventional for camera applications, were used as examples for benchmarks:
white balance, demosaic, color correction, LUT, resize, gamma, JPEG / JPEG2000 / H.264 encoding, etc.
To evaluate the total time for the chosen set of modules, GPU kernel time for each image processing module was measured.
The performance of some modules depends on image content.
CUDA initialization and GPU memory buffer allocation are not included in the benchmarks.
They are usually done just once, before processing starts, so they do not affect steady-state GPU performance.
All computations were performed with 16-bit precision.
Before JPEG compression, the 16-bit data was converted to 8 bits per channel to comply with the JPEG standard.
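The article does not say how the 16-bit to 8-bit conversion is done. One simple possibility, shown here purely as an assumption, is to keep the most significant byte of each sample; a production pipeline may round, rescale or dither instead. The function name `to_8bit` is hypothetical.

```python
from typing import List


def to_8bit(samples16: List[int]) -> List[int]:
    """Reduce 16-bit samples (0..65535) to 8 bits by dropping the low byte."""
    for s in samples16:
        if not 0 <= s <= 0xFFFF:
            raise ValueError("sample out of 16-bit range: %d" % s)
    # A right shift by 8 maps 0..65535 onto 0..255 (truncation, no rounding).
    return [s >> 8 for s in samples16]
```

For instance, `to_8bit([0, 32768, 65535])` yields `[0, 128, 255]`.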
JPEG2000 compression benchmarks were measured for 24-bit images with 4:4:4 subsampling.
The last row of each Table shows the total values for the GPU kernel pipeline.
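The three numeric columns in the tables are related by simple arithmetic, sketched below. The helper names are hypothetical, and two assumptions go beyond the text: the input frame is taken as 16-bit raw data (2 bytes per pixel, in line with the 16-bit processing mentioned above), and 1 MB is taken as 2^20 bytes. Because the published kernel times are rounded, the results only approximate the table values.

```python
def frames_per_second(kernel_ms: float) -> float:
    """Frames per second if a stage takes kernel_ms milliseconds per frame."""
    return 1000.0 / kernel_ms


def throughput_mb_s(bytes_per_frame: int, kernel_ms: float) -> float:
    """Throughput in MB/s (1 MB = 2**20 bytes, an assumption)."""
    megabytes = bytes_per_frame / float(2 ** 20)
    return megabytes / (kernel_ms / 1000.0)


if __name__ == "__main__":
    raw_2k_bytes = 1920 * 1080 * 2  # assumed 16-bit raw frame
    print(round(frames_per_second(1.8)))              # HQLI Debayer, 2K
    print(round(throughput_mb_s(raw_2k_bytes, 1.8)))  # HQLI Debayer, 2K
```

With the 1.8 ms HQLI Debayer time for a 2K frame, this gives roughly 556 fps and about 2,197 MB/s, close to the 550 fps and 2,200 MB/s listed in Table 1.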
Table 1. NVIDIA Jetson Nano performance benchmarks for 2K raw image processing (1920×1080, 8-bit)
Algorithm and parameters | Kernel time (ms) | Performance (MB/s) | Frames per second |
Host to Device | 0.2 | 10,000 | -- |
White Balance | 0.6 | 6,500 | 1,660 |
HQLI Debayer | 1.8 | 2,200 | 550 |
DFPD Debayer | 4.7 | 850 | 212 |
MG Debayer | 12.7 | 315 | 78 |
Color Correction with 3×4 matrix | 1.7 | 7,000 | 588 |
Resize from 2K to 960×540 | 10.0 | 600 | 100 |
Resize from 2K to 1919×1079 | 19.8 | 303 | 50 |
Gamma (1920×1080) | 1.4 | 8,500 | 710 |
JPEG Encoding (1920×1080, 90%, 4:2:0) | 4.3 | 1,400 | 230 |
JPEG Encoding (1920×1080, 90%, 4:4:4) | 6.8 | 880 | 147 |
JPEG2000 (lossy, 32×32, single mode) | 81 | 74 | 12 |
JPEG2000 (lossless, 32×32, single mode) | 190 | 31 | 5 |
Device to Host | 0.1 | 10,000 | -- |
In real-life camera applications, the Host to Device copy can be eliminated by using Jetson Zero-Copy: the image from the camera is written via DMA directly into a pinned buffer in system memory.
The pinned buffer is accessible to both the CPU and the GPU.
Alternatively, the Device to Host copy can be hidden by overlapping data transfer with computation in a multithreaded application.
NVIDIA Jetson Nano supports concurrent copy and kernel execution with one copy engine.
The simplest image processing pipeline for 2K images on NVIDIA Jetson Nano can reach 100 fps.
If the same pipeline uses H.264 encoding via the hardware NVENC (instead of Fastvideo CUDA-based Motion JPEG encoding), it can reach 120 fps, which is the limit of the H.264 encoder (NVENC) at 2K resolution.
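The 100 fps figure can be approximated by summing kernel times from Table 1. The article does not list which stages make up the "simplest" pipeline, so the selection below (both copies, white balance, HQLI debayer, color correction, gamma, JPEG 4:2:0) is an assumption, and the names are hypothetical.

```python
# Kernel times in milliseconds, taken from Table 1 (2K).
KERNEL_MS_2K = {
    "Host to Device": 0.2,
    "White Balance": 0.6,
    "HQLI Debayer": 1.8,
    "Color Correction with 3x4 matrix": 1.7,
    "Gamma": 1.4,
    "JPEG Encoding (90%, 4:2:0)": 4.3,
    "Device to Host": 0.1,
}


def pipeline_total_ms(stage_ms: dict) -> float:
    """Total time per frame when the stages run one after another."""
    return sum(stage_ms.values())


def pipeline_fps(stage_ms: dict) -> float:
    """Frame rate of the serial pipeline."""
    return 1000.0 / pipeline_total_ms(stage_ms)


if __name__ == "__main__":
    print(round(pipeline_total_ms(KERNEL_MS_2K), 1), "ms per frame")
    print(round(pipeline_fps(KERNEL_MS_2K)), "fps")
```

With this stage selection the total is 10.1 ms per frame, or about 99 fps, in line with the quoted result.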
Table 2. NVIDIA Jetson Nano performance benchmarks for 4K raw image processing (3840×2160, 8-bit)
Algorithm and parameters | Kernel time (ms) | Performance (MB/s) | Frames per second |
Host to Device | 0.8 | 10,000 | -- |
White Balance | 2.2 | 7,200 | 455 |
HQLI Debayer | 7.1 | 2,250 | 141 |
DFPD Debayer | 18.2 | 880 | 55 |
MG Debayer | 50.3 | 318 | 20 |
Color Correction with 3×4 matrix | 6.9 | 7,000 | 145 |
Resize from 4K to 1920×1080 | 39.4 | 610 | 25 |
Resize from 4K to 3839×2159 | 77.9 | 308 | 12 |
Gamma (3840×2160) | 5.7 | 8,400 | 175 |
JPEG Encoding (3840×2160, 90%, 4:2:0) | 17.1 | 1,400 | 58 |
JPEG Encoding (3840×2160, 90%, 4:4:4) | 27.3 | 880 | 36 |
JPEG2000 (lossy, 32×32, single mode) | 309 | 77 | 3 |
JPEG2000 (lossless, 32×32, single mode) | 620 | 38 | 1.6 |
Device to Host | 0.2 | 10,000 | -- |
The same image processing pipeline for 4K RAW images on NVIDIA Jetson Nano can achieve 30 fps.
If H.264 encoding via the hardware NVENC is used (instead of Fastvideo JPEG or MJPEG on the GPU), the result stays at 30 fps because that is the maximum of the H.264 encoder (NVENC) at 4K resolution, but GPU occupancy is lower in that case.
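When NVENC does the encoding, the achievable frame rate is the smaller of what the GPU stages can deliver and what the encoder allows. The sketch below illustrates this with kernel times from Tables 1 and 2 and the NVENC limits quoted in the text (120 fps at 2K, 30 fps at 4K). The stage selection (copies, white balance, HQLI debayer, color correction, gamma) is an assumption, any color conversion needed to feed NVENC is not in the tables and is ignored, and all names are hypothetical.

```python
# GPU stage times in ms per frame, JPEG stage omitted because NVENC encodes.
GPU_STAGES_MS = {
    "2K": [0.2, 0.6, 1.8, 1.7, 1.4, 0.1],
    "4K": [0.8, 2.2, 7.1, 6.9, 5.7, 0.2],
}
NVENC_LIMIT_FPS = {"2K": 120.0, "4K": 30.0}  # limits quoted in the article


def effective_fps(resolution: str) -> float:
    """Frame rate limited by the slower of GPU processing and NVENC."""
    gpu_fps = 1000.0 / sum(GPU_STAGES_MS[resolution])
    return min(gpu_fps, NVENC_LIMIT_FPS[resolution])


def gpu_busy_fraction(resolution: str) -> float:
    """Share of each second the GPU spends on these stages at the effective rate."""
    return sum(GPU_STAGES_MS[resolution]) * effective_fps(resolution) / 1000.0
```

Under these assumptions the GPU stages alone could run at about 172 fps (2K) and 44 fps (4K), so the encoder is the bottleneck in both cases, and at 4K the GPU is busy for roughly 69% of the time.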
These results show that NVIDIA Jetson Nano has sufficient performance for image processing in camera applications.
For resolutions up to 4K, RAW to RGB conversion with JPEG or H.264 compression runs in real time.
The results published here are only a small part of the Jetson Nano benchmarks performed with the Fastvideo SDK.
You can test the Fastvideo SDK with XIMEA cameras and your image processing pipeline.
Credits
Fastvideo Blog:
https://www.fastcompression.com/blog/jetson-nano-benchmarks-image-processing.htm