Benchmark comparison for Jetson Nano, TX2 and AGX Xavier

NVIDIA® Jetson is the world's leading embedded platform for image processing and DL/AI tasks.
Its high performance, compact size, range of module variants and low power consumption make it well suited to mobile, compute-intensive projects.
NVIDIA has released a series of Jetson single-board computer (SBC) hardware modules aimed at embedded vision systems and applications.
XIMEA has developed a carrier board for Jetson TX2 and offers a wide portfolio of cameras that work with Jetson Nano and AGX Xavier.

Hardware features for Jetson Nano, TX2, AGX Xavier

The following is a brief comparison of Jetson hardware features, showing the range of configurations available for different markets.

Feature	Nano	TX2 / TX2i	Xavier
CPU (ARM)	4-core ARM Cortex-A57 @ 1.43 GHz	4-core ARM Cortex-A57, 2-core Denver2 @ 2 GHz	8-core ARM Carmel (ARMv8.2) @ 2.26 GHz
GPU	128-core Maxwell @ 921 MHz	256-core Pascal @ 1.3 GHz	512-core Volta @ 1.37 GHz
Memory	4 GB LPDDR4, 25.6 GB/s	8 GB 128-bit LPDDR4, 58.3 GB/s	16 GB 256-bit LPDDR4, 137 GB/s
Storage	MicroSD	32 GB eMMC 5.1	32 GB eMMC 5.1
Tensor cores	N/A	N/A	64
Video encoding	(1x) 4Kp30, (2x) 1080p60, (4x) 1080p30	(1x) 4Kp60, (3x) 4Kp30, (4x) 1080p60, (8x) 1080p30	(4x) 4Kp60, (8x) 4Kp30, (32x) 1080p30
Video decoding	(1x) 4Kp60, (2x) 4Kp30, (4x) 1080p60, (8x) 1080p30	(2x) 4Kp60, (4x) 4Kp30, (7x) 1080p60	(2x) 8Kp30, (6x) 4Kp60, (12x) 4Kp30
USB	(4x) USB 3.0 + Micro-USB 2.0	USB 3.0 + USB 2.0	(3x) USB 3.1 + (4x) USB 2.0
PCIe	4 lanes PCIe Gen 2	5 lanes PCIe Gen 2	16 lanes PCIe Gen 4
Power	5W / 10W	7.5W / 15W	10W / 15W / 30W
Size	70 x 45 mm	90 x 50 mm	100 x 87 mm

In camera applications, host-to-device transfers can usually be hidden by using GPU zero-copy memory or by overlapping GPU copies with compute.
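
The effect of hiding transfers can be illustrated with a simple timing model. The sketch below is not a measurement and does not call CUDA. It assumes an idealized three-stage pipeline (upload, compute, download) in which, once copies fully overlap with kernels on consecutive frames, the slowest stage sets the steady-state time per frame. The function name `frame_time_ms` and the 10 ms compute figure are hypothetical; the 0.2 ms and 0.1 ms transfer times are the Jetson Nano values from the kernel time table further down.

```python
def frame_time_ms(h2d, compute, d2h, overlap=False):
    """Idealized steady-state time per frame, in milliseconds.

    Without overlap the three stages run back to back, so their times add up.
    With overlap (assumption: copies and kernels run fully concurrently on
    consecutive frames) the slowest stage sets the pace.
    """
    if overlap:
        return max(h2d, compute, d2h)
    return h2d + compute + d2h


# Hypothetical 10 ms of GPU compute per frame; transfer times are the
# Jetson Nano "Host to Device" / "Device to Host" figures from the table.
serial = frame_time_ms(0.2, 10.0, 0.1)
overlapped = frame_time_ms(0.2, 10.0, 0.1, overlap=True)
# Zero copy is modeled here as removing the explicit transfers entirely.
zero_copy = frame_time_ms(0.0, 10.0, 0.0)

print(f"serial: {serial:.1f} ms, overlapped: {overlapped:.1f} ms, "
      f"zero copy: {zero_copy:.1f} ms")
```

In this model the transfers cost nothing once they are overlapped or eliminated, which is why they are described as hidden. Real pipelines may fall short of this if the copies and kernels compete for memory bandwidth.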

Performance Comparison: Jetson Nano vs TX1 vs TX2 vs AGX Xavier

To compare the performance of the modules fairly, the following basic image processing tasks were chosen.
They are typical of camera applications: white balance, demosaicing (debayering), color correction, optional resizing, JPEG encoding, etc.

Hardware and software for benchmarking

CPU/GPU: NVIDIA Jetson Nano, TX1, TX2/TX2i, AGX Xavier
OS: L4T (Ubuntu 18.04)
CUDA Toolkit 10.0 for Jetson Nano, TX2/TX2i, AGX Xavier
Fastvideo SDK 0.14.2

GPU kernel times for 2K image processing (1920×1080, 8/16 bits per channel, milliseconds)

Algorithm and parameters	Nano	TX2 / TX2i	Xavier
Host to Device	0.2	0.2	0.05
White Balance	0.6	0.24	0.08
HQLI Debayer	1.8	0.47	0.36
DFPD Debayer	4.7	2.06	0.95
MG Debayer	12.7	5.9	2.2
Color Correction with 3×4 matrix	1.7	0.81	0.25
Resize from 2K to 960×540	10	4.3	1.5
Resize from 2K to 1919×1079	19.8	8.2	2.4
Gamma (1920×1080)	1.4	0.84	0.2
JPEG Encoding (1920×1080, 90%, 4:2:0)	4.3	1.7	0.62
JPEG Encoding (1920×1080, 90%, 4:4:4)	6.8	2.6	0.75
JPEG2000 Encoding (lossy, 32×32, single mode)	81	63	11.1
JPEG2000 Encoding (lossless, 32×32, single mode)	190	163	23.3
Device to Host	0.1	0.1	0.02

It is possible to choose a particular debayer algorithm and output compression (JPEG or JPEG2000) to define the image processing pipeline.
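
As a rough illustration of how such a choice affects the time budget, the sketch below sums the kernel times from the table above for one possible pipeline. It assumes the stages run strictly one after another, with no launch overhead and no copy/compute overlap, so the result is an estimate and not a measured throughput. The names `KERNEL_MS` and `pipeline_ms`, and the fixed stage order, are illustrative choices and not part of the Fastvideo SDK.

```python
# Kernel times in ms for 1920x1080 frames, copied from the table above.
# Each tuple holds the values for (Nano, TX2 / TX2i, Xavier).
KERNEL_MS = {
    "Host to Device": (0.2, 0.2, 0.05),
    "White Balance": (0.6, 0.24, 0.08),
    "HQLI Debayer": (1.8, 0.47, 0.36),
    "DFPD Debayer": (4.7, 2.06, 0.95),
    "MG Debayer": (12.7, 5.9, 2.2),
    "Color Correction": (1.7, 0.81, 0.25),
    "Gamma": (1.4, 0.84, 0.2),
    "JPEG 4:2:0": (4.3, 1.7, 0.62),
    "JPEG 4:4:4": (6.8, 2.6, 0.75),
    "Device to Host": (0.1, 0.1, 0.02),
}
MODULES = ("Nano", "TX2 / TX2i", "Xavier")


def pipeline_ms(debayer, encoder):
    """Sum the kernel times of a sequential pipeline for each module."""
    stages = ["Host to Device", "White Balance", debayer,
              "Color Correction", "Gamma", encoder, "Device to Host"]
    return {module: sum(KERNEL_MS[stage][i] for stage in stages)
            for i, module in enumerate(MODULES)}


totals = pipeline_ms("HQLI Debayer", "JPEG 4:2:0")
for module, ms in totals.items():
    print(f"{module}: {ms:.2f} ms per frame, about {1000 / ms:.0f} fps")
```

Swapping `"HQLI Debayer"` for `"MG Debayer"`, or `"JPEG 4:2:0"` for `"JPEG 4:4:4"`, shows how much of the per-frame budget each choice consumes on a given module.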

Fastvideo has also made the same kernel time measurements for NVIDIA GeForce and Quadro GPUs.
That document is available from Fastvideo.