Jetson Nano with Embedded vision cameras - Benchmarks¶

Fig.1. Jetson Nano Developer Kit

NVIDIA Jetson Nano - new module¶

Embedded imaging applications stand to benefit most from the release of the NVIDIA Jetson Nano hardware.
The Jetson Nano is a small yet capable computer with an integrated GPU that lets you run multiple neural networks in parallel for applications such as image classification, object detection, segmentation, and speech processing.

The XIMEA camera families tested so far include the xiQ, xiMU and xiC models.

Below are some of the benchmarks and results from testing the Image & Video Processing SDK from Fastvideo on the Jetson Nano Developer Kit.
They are specific to camera applications.

Fig.2. NVIDIA Jetson Nano module

Useful links:
Jetson Nano Presentation
Jetson Nano Product Brief
Getting Started with AI on Jetson Nano
Jetson Family Presentation

NVIDIA Jetson Nano specifications¶

According to the CUDA Device Query application, the tested Jetson Nano module is reported as an NVIDIA Tegra X1 with CUDA Capability 5.3.
It therefore resembles the Jetson TX1, but with half the CUDA cores.

128-core Maxwell GPU (for display and computing)
Quad-core ARM A57 @ 1.43 GHz (main CPU)
4 GB LPDDR4 (rated at 25.6 GB/s)
Gigabit Ethernet
4x USB 3.0, USB 2.0 Micro-B (the Micro USB port can be used both for 5V power input and for data)
HDMI 2.0 & eDP 1.4 (4K monitor support, HDMI or DisplayPort)
Support of MIPI CSI-2 and PCIe Gen2 high-speed I/O
DC Barrel jack for 5V power input
Storage: microSD
Dimensions: 100 mm × 80 mm × 29 mm (including the carrier board)

Video Encoding and Decoding Options¶

The following are the NVIDIA NVENC and NVDEC figures:

Video Encode:
4K at 30 fps, 4x for 1080p at 30 fps, 9x for 720p at 30 fps (H.264 / H.265)

Video Decode:
4K at 60 fps, 2x for 4K at 30 fps, 8x for 1080p at 30 fps, 18x for 720p at 30 fps (H.264 / H.265)
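
The stream counts above all correspond to the same pixel throughput per second, which can be checked with a few lines of arithmetic. This is only a sketch: the pixel dimensions assumed for "4K", "1080p" and "720p" are the usual ones and are not stated explicitly in this section, and the helper name `pixel_rate` is made up for the illustration.

```python
# Pixel throughput implied by each NVENC/NVDEC operating point.
# Assumption: "4K" = 3840x2160, "1080p" = 1920x1080, "720p" = 1280x720.
RESOLUTIONS = {"4K": (3840, 2160), "1080p": (1920, 1080), "720p": (1280, 720)}


def pixel_rate(name, fps, streams=1):
    """Pixels per second for `streams` simultaneous streams at `fps`."""
    width, height = RESOLUTIONS[name]
    return width * height * fps * streams


# (resolution, frames per second, number of streams) as listed above.
encode_points = [("4K", 30, 1), ("1080p", 30, 4), ("720p", 30, 9)]
decode_points = [("4K", 60, 1), ("4K", 30, 2), ("1080p", 30, 8), ("720p", 30, 18)]

# A set collapses identical rates, so one element means "all points agree".
encode_rates = {pixel_rate(*point) for point in encode_points}
decode_rates = {pixel_rate(*point) for point in decode_points}

for label, rates in (("encode", encode_rates), ("decode", decode_rates)):
    for rate in sorted(rates):
        print(f"{label}: {rate / 1e6:.1f} Mpixel/s")
```

Under these assumptions every encode point works out to the same rate, and every decode point to twice that rate.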

Hardware and software used for benchmarking¶

CPU/GPU: NVIDIA Jetson Nano Developer Kit
OS: L4T (Ubuntu 18.04)
JetPack: 4.2 with CUDA Toolkit 10.0
SDK: Fastvideo SDK 0.14.1

NVIDIA Jetson Nano Power Consumption and Power Management¶

For the Jetson Nano, NVIDIA uses Dynamic Voltage and Frequency Scaling (DVFS).
This power management technique is used in most modern computer hardware to maximize power savings: the voltage and clock frequency of a component are raised or lowered depending on workload and operating conditions.

By default, the Jetson Nano Developer Kit is configured to accept power via the Micro USB connector.
Some Micro USB power supplies are designed to output slightly more than 5V to compensate for voltage loss across the cable.
The critical point is that the Jetson Nano module requires a minimum of 4.75V to operate.
It is recommended to use a power supply capable of delivering 5V at the J28 Micro-USB connector.

There are other power supply options for the Jetson Nano.
If the total load is expected to exceed 2A, e.g., due to peripherals attached to the carrier board or computationally heavy tasks, fit a jumper on the J48 Power Select pins; this disables power input via Micro USB and enables 5V-4A input via the J25 power jack.
Another option is to supply 5V-6A via the J41 expansion header: two 5V pins can be used to power the developer kit at 3A each.
The NVIDIA Jetson Nano Developer Kit is equipped with a passive heatsink, to which a fan can be mounted.

Fig.3. Top View of Jetson Nano Developer Kit

The NVIDIA Jetson Nano module is designed for power efficiency and supports two software-defined power modes.
The default mode provides a 10W power budget for the module and the other a 5W budget.
These power modes keep the module within the 10W or 5W budget by capping the GPU and CPU frequencies and the number of online CPU cores.
Individual parts of the CORE power domain, such as video encode (NVENC) and video decode (NVDEC), are not covered by these budgets.

The carrier board consumes between 0.5W (at 2A) and 1.25W (at 4A) with no peripherals attached.
According to the tests, normal operation of the Jetson Nano Developer Kit in 10W mode requires more power than Micro USB can deliver (5V at 2A).
A USB-powered Jetson Nano cannot run continuously under heavy workload even at default clocks (without jetson_clocks applied).
A USB-powered Jetson Nano works reliably in 5W mode, but with lower performance.
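
The arithmetic behind these observations can be sketched as follows. The function name `module_headroom_w` is hypothetical, and the calculation simply subtracts the carrier board draw and the module power budget quoted above from what a 5V supply can deliver; it is a rough estimate, not a measurement.

```python
SUPPLY_VOLTAGE_V = 5.0


def module_headroom_w(supply_current_a, carrier_board_w, module_budget_w):
    """Watts left once the carrier board and the module budget are served.

    A negative result means the supply cannot cover the chosen power mode.
    Whatever is left must also feed NVENC/NVDEC (outside the module budget)
    and any attached peripherals.
    """
    supply_w = SUPPLY_VOLTAGE_V * supply_current_a
    return supply_w - carrier_board_w - module_budget_w


# Carrier board draw: 0.5W at 2A and 1.25W at 4A, as quoted in the text.
usb_10w = module_headroom_w(2.0, 0.5, 10.0)    # Micro USB, 10W mode
usb_5w = module_headroom_w(2.0, 0.5, 5.0)      # Micro USB, 5W mode
jack_10w = module_headroom_w(4.0, 1.25, 10.0)  # 5V-4A supply, 10W mode

print(f"Micro USB, 10W mode: {usb_10w:+.2f} W")
print(f"Micro USB, 5W mode:  {usb_5w:+.2f} W")
print(f"5V-4A, 10W mode:     {jack_10w:+.2f} W")
```

Only the Micro USB supply in 10W mode comes out negative, which is consistent with the behaviour described above.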

For the benchmark measurements below, an external 5V, 4A power supply was used.
Even better performance could be achieved by supplying more power.

To manage clock speed and power consumption, use the nvpmodel and jetson_clocks tools.
For maximum performance, run nvpmodel -m0 followed by jetson_clocks.
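
A minimal sketch of wrapping these two tools in a script is shown below. It assumes that both commands need root (hence `sudo`), that mode 0 is the default 10W mode and mode 1 the 5W mode as on the JetPack 4.2 release used here, and that `-m 0` is equivalent to the `-m0` spelling above. The function names `power_commands` and `apply` are made up for the illustration. By default the script only prints the commands instead of running them.

```python
import subprocess

# Assumption: mode IDs for Jetson Nano on JetPack 4.2
# (0 = default 10W mode, 1 = 5W mode).
POWER_MODES = {"10W": 0, "5W": 1}


def power_commands(mode="10W", max_clocks=True):
    """Build the argv lists that select a power mode and, optionally, max clocks."""
    commands = [["sudo", "nvpmodel", "-m", str(POWER_MODES[mode])]]
    if max_clocks:
        commands.append(["sudo", "jetson_clocks"])
    return commands


def apply(commands, dry_run=True):
    """Print the commands, or execute them on the Jetson when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


# Maximum-performance setup; pass dry_run=False on the device itself.
apply(power_commands("10W"))
```

Calling `apply(power_commands("5W", max_clocks=False), dry_run=False)` on the device would switch to the 5W mode without pinning the clocks.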

NVIDIA Jetson Nano Benchmark Performance Clarification¶

The following image processing kernels, which are typical for camera applications, were used as benchmark examples:
white balance, demosaic, color correction, LUT, resize, gamma, JPEG / JPEG2000 / H.264 encoding, etc.

To evaluate the total time for the chosen set of modules, the GPU kernel time of each image processing module was measured.
The performance of some modules depends on image content.
CUDA initialization and GPU memory buffer allocation are not included in the benchmarks.
These steps are usually done just once, before processing starts, so they do not affect steady-state GPU performance.

All computations were performed with 16-bit precision.
Before JPEG compression, the 16-bit data was converted to 8 bits per channel to comply with the JPEG standard.
JPEG2000 compression benchmarks were measured for 24-bit images with 4:4:4 subsampling.
The last row of each Table shows the total values for the GPU kernel pipeline.
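
The 16-bit to 8-bit conversion mentioned above can be illustrated in a few lines. The source does not describe how the SDK performs it, so the rounding rescale below is only one common choice, and `to_8bit` is a hypothetical helper rather than an SDK function.

```python
def to_8bit(samples_16bit):
    """Rescale 16-bit samples (0..65535) to 8-bit (0..255), rounding to nearest.

    Assumption: full-range linear rescale; the SDK's actual method is not
    stated in the source.
    """
    return [(value * 255 + 32767) // 65535 for value in samples_16bit]


print(to_8bit([0, 257, 32768, 65535]))  # [0, 1, 128, 255]
```

Integer arithmetic keeps the result exact and avoids floating-point rounding at the ends of the range.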

Table for 2K RAW¶

Table 1. NVIDIA Jetson Nano performance benchmarks for 2K raw image processing (1920×1080, 8-bit)

Algorithm and parameters	Kernel time (ms)	Performance (MB/s)	Frames per second
Host to Device	0.2	10,000	--
White Balance	0.6	6,500	1,660
HQLI Debayer	1.8	2,200	550
DFPD Debayer	4.7	850	212
MG Debayer	12.7	315	78
Color Correction with 3×4 matrix	1.7	7,000	588
Resize from 2K to 960×540	10.0	600	100
Resize from 2K to 1919×1079	19.8	303	50
Gamma (1920×1080)	1.4	8,500	710
JPEG Encoding (1920×1080, 90%, 4:2:0)	4.3	1,400	230
JPEG Encoding (1920×1080, 90%, 4:4:4)	6.8	880	147
JPEG2000 (lossy, 32×32, single mode)	81	74	12
JPEG2000 (lossless, 32×32, single mode)	190	31	5
Device to Host	0.1	10,000	--
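
The "Frames per second" column follows from the kernel time: a stage that takes t milliseconds per frame can process about 1000 / t frames per second on its own. The short sketch below recomputes it for a few rows of Table 1; the results land close to the table values, which appear to be rounded.

```python
# Kernel times in milliseconds, copied from Table 1.
TABLE_1_KERNEL_MS = {
    "White Balance": 0.6,
    "HQLI Debayer": 1.8,
    "DFPD Debayer": 4.7,
    "MG Debayer": 12.7,
    "Resize from 2K to 960x540": 10.0,
    "JPEG Encoding 4:2:0": 4.3,
}


def frames_per_second(kernel_ms):
    """Frames per second a single stage sustains at `kernel_ms` per frame."""
    return 1000.0 / kernel_ms


for name, kernel_ms in TABLE_1_KERNEL_MS.items():
    print(f"{name}: {frames_per_second(kernel_ms):.0f} fps")
```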

In real-life camera applications, the Host to Device copy can be eliminated by using Jetson Zero-Copy. In that case, an image from the camera is written via DMA directly to a pinned buffer in system memory.
The pinned buffer is accessible to both the CPU and the GPU.
Alternatively, the Device to Host copy can be hidden by overlapping data transfer with computation in multi-threaded applications.
The NVIDIA Jetson Nano supports concurrent copy and kernel execution with one copy engine.

The simplest image processing pipeline for 2K images on the NVIDIA Jetson Nano can reach 100 fps.
If the same pipeline uses H.264 encoding via the hardware-based NVENC (instead of Fastvideo CUDA-based Motion JPEG encoding), you can reach a total of 120 fps, which is the limit of the H.264 encoder (NVENC) at 2K resolution.
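
A rough estimate of these figures can be obtained by summing kernel times from Table 1. The source does not list which stages make up the "simplest pipeline", so the stage selection below is an assumption. The sketch also assumes that stages run back to back on the GPU, and that NVENC, being separate hardware, works in parallel with the GPU kernels.

```python
# Kernel times (ms) from Table 1. The choice of stages is an assumption;
# the source does not enumerate the "simplest pipeline".
STAGES_MS = {
    "white_balance": 0.6,
    "hqli_debayer": 1.8,
    "color_correction": 1.7,
    "gamma": 1.4,
    "jpeg_420": 4.3,
}
NVENC_2K_FPS_CAP = 120  # H.264 limit quoted in the text for 2K resolution


def pipeline_fps(stage_names):
    """Frames per second when the listed GPU stages run one after another."""
    total_ms = sum(STAGES_MS[name] for name in stage_names)
    return 1000.0 / total_ms


gpu_stages = ["white_balance", "hqli_debayer", "color_correction", "gamma"]

# Variant 1: JPEG encoding runs on the GPU as one more stage.
jpeg_fps = pipeline_fps(gpu_stages + ["jpeg_420"])

# Variant 2: encoding moves to NVENC, so the GPU only runs the other
# stages and the encoder limit caps the overall rate.
nvenc_fps = min(pipeline_fps(gpu_stages), NVENC_2K_FPS_CAP)

print(f"GPU JPEG pipeline: {jpeg_fps:.0f} fps")
print(f"NVENC pipeline:    {nvenc_fps:.0f} fps")
```

With this stage selection the GPU JPEG variant comes out slightly above 100 fps, and the NVENC variant is bounded by the encoder rather than by the GPU.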

Table for 4K RAW¶

Table 2. NVIDIA Jetson Nano performance benchmarks for 4K raw image processing (3840×2160, 8-bit)

Algorithm and parameters	Kernel time (ms)	Performance (MB/s)	Frames per second
Host to Device	0.8	10,000	--
White Balance	2.2	7,200	455
HQLI Debayer	7.1	2,250	141
DFPD Debayer	18.2	880	55
MG Debayer	50.3	318	20
Color Correction with 3×4 matrix	6.9	7,000	145
Resize from 4K to 1920×1080	39.4	610	25
Resize from 4K to 3839×2159	77.9	308	12
Gamma (3840×2160)	5.7	8,400	175
JPEG Encoding (3840×2160, 90%, 4:2:0)	17.1	1,400	58
JPEG Encoding (3840×2160, 90%, 4:4:4)	27.3	880	36
JPEG2000 (lossy, 32×32, single mode)	309	77	3
JPEG2000 (lossless, 32×32, single mode)	620	38	1.6
Device to Host	0.2	10,000	--

The same image processing pipeline for 4K RAW images on the NVIDIA Jetson Nano can achieve 30 fps.
If H.264 encoding via the hardware-based NVENC is used (instead of Fastvideo JPEG or MJPEG on the GPU), the result stays at 30 fps because that is the maximum of the H.264 encoder (NVENC) at 4K resolution, but GPU occupancy is lower in that case.
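
The drop in GPU occupancy can be estimated from Table 2: a stage that takes t milliseconds per frame keeps the GPU busy for t × fps / 1000 of every second. The sketch below applies this to the 4:2:0 JPEG encoding stage at 30 fps. It assumes the stage does not overlap with other kernels, and `gpu_time_share` is a hypothetical helper name.

```python
def gpu_time_share(kernel_ms, fps):
    """Fraction of each second the GPU spends on one stage at `fps`."""
    return kernel_ms * fps / 1000.0


JPEG_420_4K_MS = 17.1  # JPEG Encoding (3840x2160, 90%, 4:2:0), Table 2
TARGET_FPS = 30        # NVENC limit for 4K quoted in the text

share = gpu_time_share(JPEG_420_4K_MS, TARGET_FPS)
print(f"GPU time freed by moving encoding to NVENC: {share:.1%}")
```

Under this assumption, offloading the encoder frees roughly half of the GPU time at 30 fps, which is then available for other kernels.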

Summary¶

The benchmarks show that the NVIDIA Jetson Nano has sufficient performance for image processing in camera applications.
For resolutions up to 4K, you can get real-time performance when converting RAW to RGB with JPEG or H.264 compression.

Only a small part of the Jetson Nano benchmarks performed with the Fastvideo SDK is published here.
You can test the Fastvideo SDK with XIMEA cameras and your own image processing pipeline.

Credits
Fastvideo Blog:
https://www.fastcompression.com/blog/jetson-nano-benchmarks-image-processing.htm