How to Do Real-time Object Detection with SSD on Jetson TX2

Jul 30, 2018

Prerequisites:

Refer to my previous post “Multi-threaded Camera Caffe Inferencing”, but install all dependencies for python2 instead of python3. Make sure tegra-cam-threaded.py runs OK with python2 on the Jetson TX2.
Refer to my previous post “Single Shot MultiBox Detector (SSD) on Jetson TX2”. Build SSD Caffe and make sure ./examples/ssd/ssd_pascal_webcam.py runs OK on the Jetson TX2. I assume SSD Caffe has been downloaded and built at /home/nvidia/project/ssd-caffe.
Download the pre-trained SSD model for testing. By default, my camera-ssd-threaded.py script uses the ‘COCO’ SSD300 model. The download link for this model can be found at the bottom of the SSD GitHub page: just follow the SSD300 link under “2. COCO models:”. I made a copy of the link here. Download models_VGGNet_coco_SSD_300x300.tar.gz from there and untar the file into the SSD Caffe folder.

```
$ cd /home/nvidia/project/ssd-caffe
$ tar xzvf ~/Downloads/models_VGGNet_coco_SSD_300x300.tar.gz
### Verify the prototxt and caffemodel files are present
$ ls -l models/VGGNet/coco/SSD_300x300/
```

How to run the camera SSD sample code:

Download the camera-ssd-threaded.py source code from my GitHub Gist: https://gist.github.com/jkjung-avt/605904dc05691e44a26bc57bb50d3f04

To dump help messages:

```
$ python camera-ssd-threaded.py --help
usage: camera-ssd-threaded.py [-h] [--rtsp] [--uri RTSP_URI]
                              [--latency RTSP_LATENCY] [--usb]
                              [--vid VIDEO_DEV] [--width IMAGE_WIDTH]
                              [--height IMAGE_HEIGHT] [--cpu]
                              [--prototxt CAFFE_PROTOTXT]
                              [--model CAFFE_MODEL] [--labelmap LABELMAP_FILE]
                              [--confidence CONF_TH]

This script captures and displays live camera video, and does real-time object
detection with Single-Shot Multibox Detector (SSD) in Caffe on Jetson TX2/TX1.

optional arguments:
  -h, --help            show this help message and exit
  --rtsp                use IP CAM (remember to also set --uri)
  --uri RTSP_URI        RTSP URI, e.g. rtsp://192.168.1.64:554
  --latency RTSP_LATENCY
                        latency in ms for RTSP [200]
  --usb                 use USB webcam (remember to also set --vid)
  --vid VIDEO_DEV       device # of USB webcam (/dev/video?) [1]
  --width IMAGE_WIDTH   image width [1280]
  --height IMAGE_HEIGHT
                        image height [720]
  --cpu                 run Caffe in CPU mode (default: GPU mode)
  --prototxt CAFFE_PROTOTXT
                        [/home/nvidia/project/ssd-
                        caffe/models/VGGNet/coco/SSD_300x300/deploy.prototxt]
  --model CAFFE_MODEL   [/home/nvidia/project/ssd-caffe/models/VGGNet/coco/SSD
                        _300x300/VGG_coco_SSD_300x300_iter_400000.caffemodel]
  --labelmap LABELMAP_FILE
                        [/home/nvidia/project/ssd-
                        caffe/data/coco/labelmap_coco.prototxt]
  --confidence CONF_TH  confidence threshold [0.3]
```
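The options above map naturally onto Python's argparse. The following is a minimal sketch reconstructed only from the help text, not the script's actual source: the `dest` names for the boolean flags (`use_rtsp`, `use_usb`, `cpu_mode`) are my assumptions, while the other `dest` names follow from the metavars shown above (argparse upper-cases `dest` to build the default metavar).

```python
import argparse

# Default paths copied from the help output (assumes SSD Caffe lives here).
SSD_ROOT = '/home/nvidia/project/ssd-caffe'
DEFAULT_PROTOTXT = SSD_ROOT + '/models/VGGNet/coco/SSD_300x300/deploy.prototxt'
DEFAULT_MODEL = (SSD_ROOT + '/models/VGGNet/coco/SSD_300x300/'
                 'VGG_coco_SSD_300x300_iter_400000.caffemodel')
DEFAULT_LABELMAP = SSD_ROOT + '/data/coco/labelmap_coco.prototxt'


def parse_args(argv=None):
    """Parse command-line options mirroring the help text (a sketch)."""
    desc = ('This script captures and displays live camera video, and does '
            'real-time object detection with Single-Shot Multibox Detector '
            '(SSD) in Caffe on Jetson TX2/TX1.')
    parser = argparse.ArgumentParser(description=desc)
    # Camera source selection: onboard camera unless --rtsp or --usb is given.
    parser.add_argument('--rtsp', dest='use_rtsp', action='store_true',
                        help='use IP CAM (remember to also set --uri)')
    parser.add_argument('--uri', dest='rtsp_uri', type=str, default=None,
                        help='RTSP URI, e.g. rtsp://192.168.1.64:554')
    parser.add_argument('--latency', dest='rtsp_latency', type=int,
                        default=200, help='latency in ms for RTSP [200]')
    parser.add_argument('--usb', dest='use_usb', action='store_true',
                        help='use USB webcam (remember to also set --vid)')
    parser.add_argument('--vid', dest='video_dev', type=int, default=1,
                        help='device # of USB webcam (/dev/video?) [1]')
    # Capture resolution.
    parser.add_argument('--width', dest='image_width', type=int,
                        default=1280, help='image width [1280]')
    parser.add_argument('--height', dest='image_height', type=int,
                        default=720, help='image height [720]')
    # Caffe settings.
    parser.add_argument('--cpu', dest='cpu_mode', action='store_true',
                        help='run Caffe in CPU mode (default: GPU mode)')
    parser.add_argument('--prototxt', dest='caffe_prototxt',
                        default=DEFAULT_PROTOTXT,
                        help='[{}]'.format(DEFAULT_PROTOTXT))
    parser.add_argument('--model', dest='caffe_model',
                        default=DEFAULT_MODEL,
                        help='[{}]'.format(DEFAULT_MODEL))
    parser.add_argument('--labelmap', dest='labelmap_file',
                        default=DEFAULT_LABELMAP,
                        help='[{}]'.format(DEFAULT_LABELMAP))
    parser.add_argument('--confidence', dest='conf_th', type=float,
                        default=0.3, help='confidence threshold [0.3]')
    return parser.parse_args(argv)
```

For example, `parse_args(['--usb', '--vid', '1', '--width', '1920', '--height', '1080'])` corresponds to the USB webcam command shown further down.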

To do real-time object detection with the default COCO SSD model using the Jetson onboard camera (the default behavior of the python script), do the following. In my own testing, SSD takes ~180ms to process each image frame on the Jetson TX2 this way, which works out to roughly 5~6 fps (1000/180 ≈ 5.6).
```
$ python camera-ssd-threaded.py
```
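The ~180 ms per frame figure converts to a frame rate as 1000 / 180 ≈ 5.6 fps. Below is a small sketch of that arithmetic, plus a hypothetical `FpsMeter` helper (not part of camera-ssd-threaded.py) that one might use to measure the rate in a capture loop with an exponentially smoothed estimate.

```python
import time


def fps_from_latency(ms_per_frame):
    """Frames per second if each frame takes ms_per_frame milliseconds."""
    return 1000.0 / ms_per_frame


class FpsMeter(object):
    """Hypothetical helper: exponentially smoothed FPS from frame timestamps."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha   # weight of the newest measurement
        self.fps = 0.0       # current smoothed estimate
        self.last = None     # timestamp of the previous frame

    def tick(self, now=None):
        """Call once per processed frame; returns the updated estimate."""
        now = time.time() if now is None else now
        if self.last is not None:
            dt = now - self.last
            if dt > 0:
                curr = 1.0 / dt
                if self.fps == 0.0:
                    self.fps = curr  # first measurement: no history yet
                else:
                    self.fps = (1.0 - self.alpha) * self.fps + self.alpha * curr
        self.last = now
        return self.fps


print('%.2f fps' % fps_from_latency(180))  # prints "5.56 fps"
```

In a capture loop, calling `meter.tick()` once per processed frame keeps `meter.fps` up to date. The sketch uses `time.time()` rather than `time.perf_counter()` so that it also runs under python2.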

To use a USB webcam (/dev/video1) instead, with the video resolution set to 1920x1080:

```
$ python camera-ssd-threaded.py --usb --vid 1 --width 1920 --height 1080
```
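A USB webcam is typically opened in OpenCV through a GStreamer pipeline string. The sketch below only builds such a string; the function name `usb_pipeline` is hypothetical and the exact pipeline used by camera-ssd-threaded.py may differ. It assumes the camera can deliver raw frames at the requested size; many webcams only offer 1920x1080 as MJPEG, in which case this pipeline may fail to negotiate.

```python
def usb_pipeline(video_dev, width, height):
    """Build a GStreamer pipeline string for /dev/video<video_dev> (a sketch)."""
    return ('v4l2src device=/dev/video{} ! '           # V4L2 capture source
            'video/x-raw, width=(int){}, height=(int){} ! '  # requested raw caps
            'videoconvert ! appsink'                    # convert for OpenCV
            ).format(video_dev, width, height)


print(usb_pipeline(1, 1920, 1080))
```

The resulting string would then be passed to `cv2.VideoCapture(gst_str, cv2.CAP_GSTREAMER)`, which requires an OpenCV build with GStreamer support.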

Or, to use an IP CAM:

```
$ python camera-ssd-threaded.py --rtsp --uri rtsp://admin:XXXXXX@192.168.1.64:554
```
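For an IP CAM, `--uri` and `--latency` end up in an RTSP pipeline. The following is a sketch under two assumptions: the camera streams H.264, and the Jetson's hardware decoder element `omxh264dec` (JetPack 3.x era) plus `nvvidconv` are available. The function name `rtsp_pipeline` is hypothetical, and the script's actual pipeline may differ.

```python
def rtsp_pipeline(uri, latency, width, height):
    """Build a GStreamer pipeline string for an H.264 RTSP stream (a sketch)."""
    return ('rtspsrc location={} latency={} ! '      # RTSP source, jitter buffer in ms
            'rtph264depay ! h264parse ! '            # extract the H.264 stream
            'omxh264dec ! '                          # hardware decode on Jetson
            'nvvidconv ! '                           # scale / copy out of NVMM memory
            'video/x-raw, width=(int){}, height=(int){}, format=(string)BGRx ! '
            'videoconvert ! appsink'                 # BGRx -> BGR for OpenCV
            ).format(uri, latency, width, height)


print(rtsp_pipeline('rtsp://192.168.1.64:554', 200, 1280, 720))
```

As with the USB case, the string would be opened with `cv2.VideoCapture(gst_str, cv2.CAP_GSTREAMER)`.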

We can also do real-time object detection with a different Caffe model. For example, download and untar the pre-trained VOC0712Plus SSD300 model from here, then execute the following:

```
$ python ./camera-ssd-threaded.py --usb \
                                  --prototxt /home/nvidia/project/ssd-caffe/models/VGGNet/VOC0712Plus/SSD_300x300_ft/deploy.prototxt \
                                  --model /home/nvidia/project/ssd-caffe/models/VGGNet/VOC0712Plus/SSD_300x300_ft/VGG_VOC0712Plus_SSD_300x300_ft_iter_160000.caffemodel \
                                  --labelmap /home/nvidia/project/ssd-caffe/data/VOC0712/labelmap_voc.prototxt
```
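Switching models also means switching the labelmap, which maps the numeric class labels output by SSD to readable names. The SSD Caffe examples parse this file with `caffe_pb2.LabelMap` and `google.protobuf.text_format`; the sketch below is a dependency-free stand-in using regular expressions, assuming the simple `item { name: ... label: ... display_name: ... }` layout of labelmap_voc.prototxt and labelmap_coco.prototxt. `parse_labelmap` is a hypothetical name, not a function from the script.

```python
import re

# One "item { ... }" block per class; labelmap items contain no nested braces.
ITEM_RE = re.compile(r'item\s*\{(.*?)\}', re.S)
# "key: value" where value is either a quoted string or an integer.
FIELD_RE = re.compile(r'(\w+)\s*:\s*("([^"]*)"|-?\d+)')


def parse_labelmap(text):
    """Return {label (int): display name (str)} from labelmap prototxt text."""
    labels = {}
    for body in ITEM_RE.findall(text):
        fields = {}
        for key, raw, quoted in FIELD_RE.findall(body):
            fields[key] = quoted if raw.startswith('"') else int(raw)
        if 'label' in fields:
            # Prefer display_name; fall back to name if it is missing.
            labels[fields['label']] = fields.get('display_name',
                                                 fields.get('name'))
    return labels
```

Typical use would be `parse_labelmap(open(args.labelmap_file).read())`, after which a detection's label index can be looked up to draw its class name.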

Additional Notes:

PIXEL_MEAN ([B, G, R] = [104.0, 117.0, 123.0]) has been hardcoded in the python script.
Input image size (300, 300) has also been hardcoded. Just modify the preprocess() function if an SSD512 model (or another input size) is used.
You can adjust --confidence (confidence threshold value: 0.0~1.0) depending on whether you prefer better ‘precision’ (fewer false positives, favored by a higher threshold) or better ‘recall’ (fewer missed detections, favored by a lower threshold).
I only tested the code with python2 (with my own modified version of ssd-caffe). Feel free to report issues to me if you have trouble running the code with python3 or otherwise.
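The notes above (hardcoded PIXEL_MEAN, 300x300 input, confidence threshold) can be illustrated with a pre/post-processing sketch. `PIXEL_MEAN` and the input size come from the notes; the `(1, 1, N, 7)` layout `[image_id, label, confidence, xmin, ymin, xmax, ymax]` with normalized coordinates is the output of SSD Caffe's DetectionOutput layer. The nearest-neighbor resize is a numpy-only stand-in for `cv2.resize`, and `postprocess` is a hypothetical helper, so this is not the script's actual code.

```python
import numpy as np

PIXEL_MEAN = np.array([104.0, 117.0, 123.0], dtype=np.float32)  # [B, G, R]
INPUT_HW = (300, 300)  # SSD300; use (512, 512) for an SSD512 model


def preprocess(img_bgr, input_hw=INPUT_HW):
    """HxWx3 BGR uint8 image -> 3xHxW float32 blob, mean-subtracted."""
    h, w = img_bgr.shape[:2]
    out_h, out_w = input_hw
    # Nearest-neighbor resize via index arrays (stand-in for cv2.resize).
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    resized = img_bgr[rows][:, cols].astype(np.float32)
    resized -= PIXEL_MEAN            # broadcast over the channel axis
    return resized.transpose((2, 0, 1))  # HWC -> CHW, as Caffe expects


def postprocess(detections, img_w, img_h, conf_th=0.3):
    """Keep detections with confidence >= conf_th, boxes in pixel coords.

    Returns a list of (label, confidence, (x1, y1, x2, y2)).
    """
    dets = np.asarray(detections).reshape(-1, 7)
    dets = dets[dets[:, 2] >= conf_th]  # confidence threshold (--confidence)
    results = []
    for det in dets:
        # Coordinates are normalized to [0, 1]; clip before scaling.
        x1, y1, x2, y2 = np.clip(det[3:7], 0.0, 1.0)
        box = (int(round(x1 * img_w)), int(round(y1 * img_h)),
               int(round(x2 * img_w)), int(round(y2 * img_h)))
        results.append((int(det[1]), float(det[2]), box))
    return results
```

Raising `conf_th` drops low-confidence rows (better precision); lowering it keeps more of them (better recall), which is exactly the trade-off the --confidence option controls.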