JetPack-4.3 for Jetson Nano

NVIDIA JetPack-4.3 - L4T 32.3.1 was officially released on 2019-12-18. There were two significant updates in this JetPack release: OpenCV 4.1.1 and TensorRT 6 (6.0.1, up from TensorRT 5). I tested most of my development scripts and demo programs with this new JetPack release on my Jetson Nano DevKit as soon as I could.

1. Basic set-up

Reference: Setting up Jetson Nano: The Basics

### Set proper environment variables
$ mkdir ${HOME}/project
$ cd ${HOME}/project
$ git clone https://github.com/jkjung-avt/jetson_nano.git
$ cd jetson_nano
$ ./install_basics.sh
$ source ${HOME}/.bashrc

By default, JetPack-4.3 already creates a 2GB swap space, so I chose not to create an additional swap file on the SD card partition. If you feel you need more virtual memory, you could create a swap file manually; refer to my “Setting up Jetson Nano: The Basics” blog post (referenced above) for the details.
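For reference, creating a swap file manually looks roughly like this. (The 8GB size and the /mnt/8GB.swap path are just examples; the referenced post also covers making the swap persistent via /etc/fstab.)

### (Example) manually create an 8GB swap file
$ sudo fallocate -l 8G /mnt/8GB.swap
$ sudo mkswap /mnt/8GB.swap
$ sudo swapon /mnt/8GB.swap
$ free -m    # verify the additional swap space shows up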

2. Making sure python3 ‘cv2’ is working

Reference: Installing OpenCV 3.4.6 on Jetson Nano

One very nice thing about JetPack-4.3 is that it already comes with a relatively new version of OpenCV (properly compiled with GStreamer support), so we no longer need to compile OpenCV ourselves!
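A quick way to verify the pre-installed ‘cv2’ and its GStreamer support (the grep just filters OpenCV’s standard build information output):

### Check pre-installed OpenCV version and GStreamer support
$ python3 -c "import cv2; print(cv2.__version__)"
$ python3 -c "import cv2; print(cv2.getBuildInformation())" | grep -i gstreamer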

I did the following to make sure system library dependencies and python module dependencies were OK.

### Install dependencies for python3 'cv2'
$ sudo apt-get update
$ sudo apt-get install -y build-essential make cmake cmake-curses-gui \
                          git g++ pkg-config curl libfreetype6-dev \
                          libcanberra-gtk-module libcanberra-gtk3-module
$ sudo apt-get install -y python3-dev python3-testresources python3-pip
$ sudo pip3 install -U pip Cython
$ cd ${HOME}/project/jetson_nano
$ ./install_protobuf-3.8.0.sh
$ sudo pip3 install numpy matplotlib
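To confirm the protobuf installation went through, you could check that both the compiler and the python runtime report 3.8.0 (assuming install_protobuf-3.8.0.sh puts protoc on your PATH):

### Sanity-check the protobuf installation
$ protoc --version
$ python3 -c "import google.protobuf; print(google.protobuf.__version__)"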

Then I tested my tegra-cam.py script with a USB webcam, and it worked OK.

### Test tegra-cam.py (using a USB webcam)
$ cd ${HOME}/project
$ wget https://gist.githubusercontent.com/jkjung-avt/86b60a7723b97da19f7bfa3cb7d2690e/raw/3dd82662f6b4584c58ba81ecba93dd6f52c3366c/tegra-cam.py
$ python3 tegra-cam.py --usb --vid 0
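For reference, below is a minimal sketch of what tegra-cam.py boils down to for a USB webcam. The device index 0 is an assumption; adjust it if your webcam is not /dev/video0.

$ python3 << 'EOF'
# Minimal USB webcam test -- roughly what tegra-cam.py does, stripped down.
import cv2

cap = cv2.VideoCapture(0)              # /dev/video0 (assumed)
if not cap.isOpened():
    raise SystemExit('failed to open the webcam')
ret, frame = cap.read()
if not ret:
    raise SystemExit('failed to grab a frame')
cv2.imshow('USB webcam test', frame)   # needs the gtk modules installed above
cv2.waitKey(2000)                      # show the frame for ~2 seconds
cap.release()
cv2.destroyAllWindows()
EOF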

NOTE: Since SSD Caffe could not be compiled against OpenCV-4.x, I skipped testing Caffe.

3. Installing tensorflow-1.15.0

[EDIT] NVIDIA has released a tensorflow-1.15.0 wheel for JetPack-4.3, so you probably no longer need to build it yourself. Just follow the steps in the Official TensorFlow for Jetson Nano !!! post and do sudo pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v43 tensorflow-gpu==1.15.0+nv19.12. If you do that, you could skip ahead to step #4.
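Either way (NVIDIA’s wheel or a self-built one), here is a quick sanity check that tensorflow imports and sees the GPU:

### Verify tensorflow and GPU visibility
$ python3 -c "import tensorflow as tf; print(tf.__version__)"
$ python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"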

Reference: Building TensorFlow 1.12.2 on Jetson Nano

According to the Release Notes, TensorRT 6 is compatible with tensorflow-1.14.0. So I first checked the official tensorflow wheels (1.14.0 and 1.15.0) provided by NVIDIA. Unfortunately, I quickly found those wheels were no good for my purposes: they were built against TensorRT 5, so TF-TRT would not work with them.
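For reference, one way to check which TensorRT version a tensorflow wheel was linked against (get_linked_tensorrt_version() should be available in recent TF 1.x releases; treat this as a sketch):

$ python3 -c "from tensorflow.python.compiler.tensorrt import trt_convert as trt; print(trt.get_linked_tensorrt_version())"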

Disappointed, I decided to modify my install_tensorflow script for tensorflow-1.15.0. Here is how I ran the script. (NOTE: I had already installed protobuf-3.8.0 in the previous step.) The build took roughly 40 hours to finish.

$ cd ${HOME}/project/jetson_nano
$ ./install_bazel-0.26.1.sh
$ ./install_tensorflow-1.15.0.sh

4. Testing TF-TRT SSD models

Reference: Testing TF-TRT Object Detectors on Jetson Nano

Check out the code and install dependencies.

$ cd ${HOME}/project
$ git clone --recursive https://github.com/jkjung-avt/tf_trt_models
$ cd tf_trt_models
$ ./install.sh

When I first ran the camera_tf_trt.py script, I encountered this error and core dump: F tensorflow/core/util/device_name_utils.cc:92] Check failed: IsJobName(job). After a few tries, I found I simply could not place the ‘NonMaxSuppression’ (NMS) operations on the CPU with tensorflow-1.15.0 and TensorRT 6 on my Jetson Nano. (NOTE: NMS runs faster on the CPU than on the GPU on Jetson Nano/TX2.) To work around the issue, I made the following changes to the code.

diff --git a/utils/od_utils.py b/utils/od_utils.py
index 2755bb5..b8ebe1b 100644
--- a/utils/od_utils.py
+++ b/utils/od_utils.py
@@ -52,7 +52,9 @@ def build_trt_pb(model_name, pb_path, download_dir='data'):
             get_egohands_model(model_name)
     frozen_graph_def, input_names, output_names = build_detection_graph(
         config=config_path,
-        checkpoint=checkpoint_path
+        checkpoint=checkpoint_path,
+        force_nms_cpu=False,
+        force_frcn2_cpu=False,
     )
     trt_graph_def = trt.create_inference_graph(
         input_graph_def=frozen_graph_def,
@@ -77,8 +79,8 @@ def load_trt_pb(pb_path):
             node.device = '/device:GPU:0'
         if 'faster_rcnn_' in pb_path and 'SecondStage' in node.name:
             node.device = '/device:GPU:0'
-        if 'NonMaxSuppression' in node.name:
-            node.device = '/device:CPU:0'
+        #if 'NonMaxSuppression' in node.name:
+        #    node.device = '/device:CPU:0'
     with tf.Graph().as_default() as trt_graph:
         tf.import_graph_def(trt_graph_def, name='')
     return trt_graph

After that, I was able to optimize the SSD models with TF-TRT. Inference speed (FPS) was slightly worse than my previous results with TensorRT 5 (JetPack-4.2), though. As stated above, I think that was mainly due to the NMS operations now being placed on the GPU instead of the CPU. (A quick way to verify the placement is sketched after the commands below.)

$ cd ${HOME}/project/tf_trt_models
$ python3 camera_tf_trt.py --image examples/detection/data/huskies.jpg \
                           --model ssd_mobilenet_v1_coco --build
$ python3 camera_tf_trt.py --image examples/detection/data/huskies.jpg \
                           --model ssd_mobilenet_v2_coco --build
$ python3 camera_tf_trt.py --image examples/detection/data/huskies.jpg \
                           --model ssd_inception_v2_coco --build
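To double-check where the NMS ops actually ended up, you could inspect the optimized graph. This is just a sketch: the .pb path is an assumption, so adjust it to wherever camera_tf_trt.py saved the optimized graph on your system.

$ python3 << 'EOF'
# Sketch: print the device placement of NonMaxSuppression nodes
# in the TF-TRT optimized graph. The .pb path below is an assumption.
import tensorflow as tf

graph_def = tf.GraphDef()
with open('data/ssd_mobilenet_v1_coco_trt.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())
for node in graph_def.node:
    if 'NonMaxSuppression' in node.name:
        print(node.name, '->', node.device or '(unspecified)')
EOF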

5. Testing TensorRT UFF SSD models

Reference #1: TensorRT UFF SSD

Reference #2: Speeding Up TensorRT UFF SSD

I needed to make some minor changes to the code to make it work with both TensorRT 6 and TensorRT 5. I’ve committed the changes to my jkjung-avt/tensorrt_demos repository. So I could just do the following to optimize the SSD models with TensorRT and then run the demo.

$ cd ${HOME}/project
$ git clone https://github.com/jkjung-avt/tensorrt_demos.git
$ cd tensorrt_demos
$ cd ssd
$ ./install.sh
$ ./build_engines.sh
$ cd ..
$ python3 trt_ssd.py --image ${HOME}/project/tf_trt_models/examples/detection/data/huskies.jpg \
                     --model ssd_mobilenet_v1_coco
$ python3 trt_ssd.py --image ${HOME}/project/tf_trt_models/examples/detection/data/huskies.jpg \
                     --model ssd_mobilenet_v2_coco
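Since the code has to behave correctly on both TensorRT 5 and 6, it’s worth confirming which version the python bindings report:

### Check the TensorRT version
$ python3 -c "import tensorrt as trt; print(trt.__version__)"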

Inference speed (FPS) was similar to my previous test results with TensorRT 5. However, the code would spit out this error message: [TensorRT] ERROR: Could not register plugin creator: FlattenConcat_TRT in namespace. According to NVIDIA, this is a known issue that should be fixed in a future version of TensorRT.

Conclusion (preliminary)

Overall, JetPack-4.3 seems to work as expected. The update to OpenCV 4.1.1 is really nice. However, although NVIDIA boasts that TensorRT 6 improves NN model inference performance by 25%, it did not seem to make much difference in the tests I ran on the Jetson Nano…

What are your test results with JetPack-4.3? Feel free to leave a comment below.
