JetPack-4.4 for Jetson Nano
2020-07-12 update: The JetPack 4.4 - L4T R32.4.3 production release has been formally released. I have tested the latest SD card image and updated this post accordingly.
I wrote about JetPack-4.3 soon after it was released late last year. Now that JetPack-4.4 has been released, I have created this post about it. This post is somewhat similar to the previous one.
You could refer to this announcement for a clear summary of NVIDIA software updates in JetPack-4.4. From my own point of view, the most significant updates (compared to JetPack-4.3) are CUDA 10.2, cuDNN 8 and TensorRT 7 (previously CUDA 10.0, cuDNN 7 and TensorRT 6). Upgrades of these libraries might result in better CNN/DNN inferencing performance on the Jetson platforms.
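If you'd like to double-check those library versions on a flashed system later on, something like the following should work (I'm assuming the default install locations on JetPack-4.4 here).
### Check versions of CUDA, cuDNN and TensorRT on the system
$ cat /usr/local/cuda/version.txt
$ dpkg -l | grep -E "libcudnn|tensorrt"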
Below, I break down my steps for setting up and testing JetPack-4.4 on Jetson Nano.
1. Basic set-up
Reference: Setting up Jetson Nano: The Basics
I downloaded the JetPack-4.4 image, “Jetson Nano Developer Kit SD Card Image”, from the official Getting Started With Jetson Nano Developer Kit page and “etched” the image onto my microSD card. After inserting the microSD card and booting up the Jetson Nano with the image, I finished the Ubuntu set-up process and, for simplicity, created an account named “nvidia”. The initial set-up took a few minutes, and the Jetson Nano rebooted itself. Afterwards, I made sure the internet connection was working on the device.
Note: For some reason, the Jetson Nano got stuck in the Linux kernel boot process the 1st time. I just power cycled it, and the Linux kernel booted up OK the 2nd time around.
Tip #1: When the Jetson Nano boots up to the desktop, I would hit Ctrl-Alt-T to bring up a terminal and then lock the terminal onto the Launcher (on the left-hand side).
Tip #2: In the command terminal, I would type and execute “chromium-browser” to bring up the web browser. Again, I would lock it onto the Launcher for easier access later on.
Then I ran the following commands to do basic set-up of the system.
$ sudo apt update
###
### Set proper environment variables
$ mkdir ${HOME}/project
$ cd ${HOME}/project
$ git clone https://github.com/jkjung-avt/jetson_nano.git
$ cd jetson_nano
$ ./install_basics.sh
$ source ${HOME}/.bashrc
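The “install_basics.sh” script mainly sets up environment variables, such as PATH and LD_LIBRARY_PATH for CUDA, in ~/.bashrc. To verify that the set-up has taken effect (assuming CUDA is installed at /usr/local/cuda):
### "nvcc" should be found on the executable path now
$ which nvcc
$ nvcc --version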
Note: By default, a swap space of 2GB was already created by JetPack-4.4. So I chose not to create an additional swap file on the SD card partition. If you feel that you need more virtual memory space, you could create a swap file manually, as sketched below. You could also refer to my “Setting up Jetson Nano: The Basics” blog post (as referenced above) for how to do that.
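For reference, creating a swap file manually would look something like the following. (The 4GB size and the /mnt/4GB.swap path are just examples; adjust them to your own needs.)
### Create and enable a 4GB swap file (example only)
$ sudo fallocate -l 4G /mnt/4GB.swap
$ sudo chmod 600 /mnt/4GB.swap
$ sudo mkswap /mnt/4GB.swap
$ sudo swapon /mnt/4GB.swap
###
### To make the swap file permanent, also add this line to /etc/fstab:
### /mnt/4GB.swap swap swap defaults 0 0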
2. Making sure python3 “cv2” is working
I used the built-in OpenCV-4.1.1 in JetPack-4.4. I ran the following to install the system library and python module dependencies. NOTE: For the “protobuf” libraries, instead of doing apt install, I recommend installing the newer version (3.8.0) with my “install_protobuf-3.8.0.sh” script as shown below. The “protobuf” libraries could have a noticeable effect on the performance (inference speed) of tensorflow and other frameworks.
### Install dependencies for python3 "cv2"
$ sudo apt update
$ sudo apt install -y build-essential make cmake cmake-curses-gui \
git g++ pkg-config curl libfreetype6-dev \
libcanberra-gtk-module libcanberra-gtk3-module \
python3-dev python3-pip
$ sudo pip3 install -U pip==20.2.1 Cython testresources setuptools
$ cd ${HOME}/project/jetson_nano
$ ./install_protobuf-3.8.0.sh
$ sudo pip3 install numpy==1.16.1 matplotlib==3.2.2
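Before testing the camera, I would quickly verify that both “protobuf” and the built-in “cv2” module are in good shape. On JetPack-4.4, I'd expect the following commands to print 3.8.0 and 4.1.1, respectively.
### Quick sanity checks of "protobuf" and "cv2"
$ protoc --version
$ python3 -c "import cv2; print(cv2.__version__)"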
Then I tested my tegra-cam.py script with a USB webcam, and made sure the python3 “cv2” module could capture and display images properly.
### Test tegra-cam.py (using a USB webcam)
$ cd ${HOME}/project
$ wget https://gist.githubusercontent.com/jkjung-avt/86b60a7723b97da19f7bfa3cb7d2690e/raw/3dd82662f6b4584c58ba81ecba93dd6f52c3366c/tegra-cam.py
$ python3 tegra-cam.py --usb --vid 0
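In case the script fails to grab images, it might be worth first checking whether the webcam is recognized by the system at all. Since tegra-cam.py captures images through a GStreamer pipeline, testing the camera with gst-launch-1.0 directly could also help. (I'm assuming the camera shows up as /dev/video0 here.)
### Make sure the USB webcam has been recognized
$ ls -l /dev/video*
### (Optional) test the camera with GStreamer directly
$ gst-launch-1.0 v4l2src device=/dev/video0 ! videoconvert ! xvimagesink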
3. Installing tensorflow-1.15.2
Reference (official documentation from NVIDIA): Installing TensorFlow For Jetson Platform
At the time of this writing, NVIDIA provides pip wheel files for both tensorflow-1.15.2 and tensorflow-2.2.0 (link). I used 1.15.2 since my TensorRT Demo #3: SSD only works with tensorflow-1.x.
To install tensorflow, I just followed the instructions in the official documentation, but skipped the installation of “protobuf”. (I had already built and installed “protobuf-3.8.0” in the opencv section above.)
$ sudo apt install -y libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev \
zip libjpeg8-dev liblapack-dev libblas-dev gfortran
$ sudo pip3 install -U numpy==1.16.1 future==0.18.2 mock==3.0.5 h5py==2.10.0 \
keras_preprocessing==1.1.1 keras_applications==1.0.8 \
gast==0.2.2 futures pybind11
$ sudo pip3 install --pre --extra-index-url \
https://developer.download.nvidia.com/compute/redist/jp/v44 \
tensorflow==1.15.2
At this point, I tested and made sure “import tensorflow as tf” worked OK in python3.
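A one-liner like the following would do for this check; it should print 1.15.2.
### Verify the tensorflow installation
$ python3 -c "import tensorflow as tf; print(tf.__version__)"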
4. Testing TensorRT GoogLeNet and MTCNN
Reference: Demo #1: GoogLeNet and Demo #2: MTCNN
### Clone the tensorrt_demos code
$ cd ${HOME}/project
$ git clone https://github.com/jkjung-avt/tensorrt_demos.git
###
### Build TensorRT engine for GoogLeNet
$ cd ${HOME}/project/tensorrt_demos/googlenet
$ make
$ ./create_engine
###
### Build TensorRT engines for MTCNN
$ cd ${HOME}/project/tensorrt_demos/mtcnn
$ make
$ ./create_engines
###
### Build the "pytrt" Cython module
$ cd ${HOME}/project/tensorrt_demos
$ sudo pip3 install Cython
$ make
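As a quick sanity check, the built “pytrt” module should then be importable from within the tensorrt_demos directory. (This is my assumption about where the built module lands.)
### Verify the "pytrt" Cython module has been built OK
$ cd ${HOME}/project/tensorrt_demos
$ python3 -c "import pytrt"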
Test TensorRT GoogLeNet.
$ python3 trt_googlenet.py --usb 0
Test TensorRT MTCNN.
$ python3 trt_mtcnn.py --usb 0
5. Testing TensorRT UFF SSD models
Reference: Demo #3: SSD
### Install dependencies and build TensorRT engines
$ cd ${HOME}/project/tensorrt_demos/ssd
$ ./install.sh
$ ./build_engines.sh
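If the build succeeds, the serialized TensorRT engines should show up in the ssd directory. (I'm assuming the “.bin” file naming here.)
### The optimized engine files should be in the ssd directory now
$ ls -l ${HOME}/project/tensorrt_demos/ssd/*.bin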
Test TensorRT SSD models.
$ cd ${HOME}/project/tensorrt_demos
$ python3 trt_ssd_async.py --usb 0 --model ssd_mobilenet_v1_coco
$ python3 trt_ssd_async.py --usb 0 --model ssd_mobilenet_v2_coco
6. Testing TensorRT ONNX YOLOv3 model
Reference: Demo #4: YOLOv3
### Install dependencies and build TensorRT engine
$ sudo pip3 install onnx==1.4.1
$ cd ${HOME}/project/tensorrt_demos/plugins
$ make
$ cd ${HOME}/project/tensorrt_demos/yolo
$ ./download_yolo.sh
$ python3 yolo_to_onnx.py -m yolov3-416
$ python3 onnx_to_tensorrt.py -v -m yolov3-416
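Building the YOLOv3 engine could take quite a while on the Jetson Nano. When it is done, a serialized engine file should have been generated in the yolo directory. (I'm assuming the “.trt” file naming used by the script.)
### The serialized engine file should be in the yolo directory now
$ ls -l ${HOME}/project/tensorrt_demos/yolo/yolov3-416.trt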
Test TensorRT YOLOv3 (416x416) model.
$ cd ${HOME}/project/tensorrt_demos
$ python3 trt_yolo.py --usb 0 -m yolov3-416
Test results
Overall, I was pretty happy to see that my tensorrt_demos worked on JetPack-4.4 almost without any modification. Although NVIDIA has announced that the Caffe Parser and UFF Parser are deprecated as of the TensorRT 7.0.0 release, my TensorRT GoogLeNet (Caffe), MTCNN (Caffe) and SSD (UFF) code still worked.
One other important goal of my testing was to see whether CNN inference speed improved with this new JetPack release (newer versions of CUDA, cuDNN and TensorRT). Based on my observations so far, the FPS (frames per second) numbers of my TensorRT GoogLeNet, MTCNN and SSD models were roughly the same as before (JetPack-4.3). But I did see a better FPS number for the TensorRT YOLOv3-416 model (3.17 -> 3.75).
My preliminary guess is that TensorRT 7 might improve inferencing performance for “larger models” (but not so much for smaller ones). I'll share more if I have additional findings later on.