This sample, sample SSD, is based on the SSD: Single Shot MultiBox Detector paper. The SSD network performs the task of object detection and localization in a single forward pass of the network. This network is built using the VGG network as a backbone and trained using PASCAL VOC 2007+ 2012 datasets.
Unlike Faster R-CNN, SSD completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD straightforward to integrate into systems that require a detection component.
This sample pre-processes the input to the SSD network and performs inference on the SSD network in TensorRT, using plugins to run layers that are not natively supported in TensorRT. Additionally, the sample can also be run in INT8 mode, for which it first performs INT8 calibration and then runs inference in INT8.
Specifically, this sample:
The input to the SSD network in this sample is an RGB 300x300 image. The image format is Portable PixMap (PPM), which is a netpbm color image format. In this format, the R, G, and B values for each pixel are each represented by a byte of integer (0-255), and they are stored together, pixel by pixel.
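To make the format concrete, here is a minimal, hypothetical P6 (binary PPM) parser; it is an illustration of the file layout, not the sample's actual `readPPMFile` (note that this sketch skips `#` comment lines, which the sample's reader does not handle):

```python
def read_ppm(data: bytes):
    """Parse a binary PPM (P6) image; returns (width, height, pixel bytes).

    Illustrative sketch of the format, not the sample's readPPMFile.
    """
    # Header: magic, width, height, maxval -- whitespace separated,
    # possibly with '#' comment lines in between.
    tokens = []
    i = 0
    while len(tokens) < 4:
        while data[i:i + 1].isspace():       # skip whitespace
            i += 1
        if data[i:i + 1] == b'#':            # comment: skip to end of line
            while data[i:i + 1] not in (b'\n', b''):
                i += 1
            continue
        start = i
        while not data[i:i + 1].isspace():   # collect one header token
            i += 1
        tokens.append(data[start:i])
    assert tokens[0] == b'P6'
    width, height, maxval = (int(t) for t in tokens[1:])
    i += 1                                   # single whitespace byte after maxval
    # Pixel data: interleaved R,G,B bytes, one byte per channel per pixel
    pixels = data[i:i + 3 * width * height]
    return width, height, pixels
```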
The authors of SSD have trained the network such that the first Convolution layer sees the image data in B, G, and R order. Therefore, the channel order needs to be changed when the PPM image is being put into the network’s input buffer.
The `readPPMFile` and `writePPMFileWithBBox` functions read a PPM image and write an output image with red colored bounding boxes, respectively.
Note: The `readPPMFile` function will not work correctly if the header of the PPM image contains any annotations starting with `#`.
The network is defined in a prototxt file which is shipped with the sample and located in the `data/ssd` directory. The original prototxt file provided by the authors has been modified to include the TensorRT built-in plugin layers.
The built-in plugin layers used in sampleSSD are Normalize, PriorBox, and DetectionOutput. The corresponding registered plugins for these layers are `Normalize_TRT`, `PriorBox_TRT`, and `NMS_TRT`.
To initialize and register these TensorRT plugins with the plugin registry, the `initLibNvInferPlugins` method is used. After the plugins are registered, the NvCaffeParser automatically creates plugins for these layers while parsing the prototxt file, based on the parameters that were provided there. The details about each parameter are provided in the README.md and can be modified similarly to the Caffe layer parameters.
The sampleSSD sample builds a network based on a Caffe model and network description. For details on importing a Caffe model, see Importing A Caffe Model Using The C++ Parser API. The SSD network has few non-natively supported layers which are implemented as plugins in TensorRT. The Caffe parser can create plugins for these layers internally using the plugin registry.
This sample can run in FP16 and INT8 modes based on the user input. For more details, see INT8 Calibration Using C++ and Enabling FP16 Inference Using C++. The sample selects the entropy calibrator as the default choice. To switch to the legacy calibrator, set the `CalibrationMode` parameter in the sample code to `0`.
For details on how to build the TensorRT engine, see Building An Engine In C++. After the engine is built, the next steps are to serialize the engine and run the inference with the deserialized engine. For more information about these steps, see Serializing A Model In C++.
After deserializing the engine, you can perform inference. To perform inference, see Performing Inference In C++.
In sampleSSD, there is a single input:
- `data`, namely the image input

And 2 outputs:
- `detectionOut` is the detection array, containing the image ID, label, confidence, and 4 coordinates
- `keepCount` is the number of valid detections

The outputs of the SSD network are directly human interpretable. The results are organized as tuples of 7. In each tuple, the 7 elements are:
- image ID
- label
- confidence
- 4 coordinates of the bounding box (xmin, ymin, xmax, ymax)
This information can be drawn in the output PPM image using the `writePPMFileWithBBox` function. The `kVISUAL_THRESHOLD` parameter can be used to control the visualization of objects in the image. It is currently set to 0.6; therefore, the output will display all objects with a confidence score of 60% and above.
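Putting the output format and the threshold together, the post-processing can be sketched as follows (the function name is illustrative, not the sample's; the constant mirrors the sample's `kVISUAL_THRESHOLD`):

```python
VISUAL_THRESHOLD = 0.6  # mirrors the sample's kVISUAL_THRESHOLD setting

def decode_detections(detection_out, keep_count):
    """Split the flat detectionOut array into 7-element tuples
    (image ID, label, confidence, xmin, ymin, xmax, ymax) and keep
    only detections at or above the visual threshold."""
    results = []
    for i in range(keep_count):
        image_id, label, conf, xmin, ymin, xmax, ymax = \
            detection_out[7 * i: 7 * i + 7]
        if conf >= VISUAL_THRESHOLD:
            results.append((int(image_id), int(label), conf,
                            xmin, ymin, xmax, ymax))
    return results
```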
In this sample, the following layers are used. For more information about these layers, see the TensorRT Developer Guide: Layers documentation.
Activation layer The Activation layer implements element-wise activation functions. Specifically, this sample uses the Activation layer with the type `kRELU`.
Concatenation layer The Concatenation layer links together multiple tensors of the same non-channel sizes along the channel dimension.
Convolution layer The Convolution layer computes a 2D (channel, height, and width) convolution, with or without bias.
Plugin layer Plugin layers are user-defined and provide the ability to extend the functionalities of TensorRT. See Extending TensorRT With Custom Layers for more details.
Pooling layer The Pooling layer implements pooling within a channel. Supported pooling types are `maximum`, `average`, and `maximum-average blend`.
Shuffle layer The Shuffle layer implements a reshape and transpose operator for tensors.
SoftMax layer The SoftMax layer applies the SoftMax function on the input tensor along an input dimension specified by the user.
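For instance, the SoftMax operation applied along one dimension can be sketched as a numerically stable normalized exponential (an illustration of the math, not TensorRT's implementation):

```python
import math

def softmax(values):
    """Numerically stable SoftMax: exponentiate (shifted by the max
    for stability) and normalize so the outputs sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```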
sampleSSD has three plugin layers: Normalize, PriorBox, and DetectionOutput. The details about each layer and its parameters are shown below in `caffe.proto` format.
Due to the size of the SSD Caffe model, it is not included in the product bundle. Before you can run the sample, you’ll need to download the model, perform some configuration, and generate INT8 calibration batches.
Verify that the MD5 checksum of the downloaded `models_VGGNet_VOC0712_SSD_300x300.tar.gz` archive matches `9a795fc161fff2e8f3aed07f4d488faf`:
```bash
md5sum models_VGGNet_VOC0712_SSD_300x300.tar.gz
```

Extract the archive, and copy the model file to the TensorRT `data` directory.
```bash
tar xvf models_VGGNet_VOC0712_SSD_300x300.tar.gz
cp models/VGGNet/VOC0712/SSD_300x300/VGG_VOC0712_SSD_300x300_iter_120000.caffemodel <TensorRT root directory>/data/ssd
cp models/VGGNet/VOC0712/SSD_300x300/deploy.prototxt <TensorRT root directory>/data/ssd/ssd.prototxt
```
In `ssd.prototxt`, change all `Flatten` layers to `Reshape` operations (for example, `type: "Reshape"`), as TensorRT implements `Flatten` by `Reshape`, and add a `reshape_param` (like below) to each of them:
```
reshape_param {
  shape {
    dim: 0
    dim: -1
    dim: 1
    dim: 1
  }
}
```
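This `reshape_param` reproduces Caffe's `Flatten` semantics: `dim: 0` copies the batch dimension from the input, `dim: -1` absorbs the remaining C×H×W elements into one axis, and the trailing `dim: 1` entries pad the shape back to four axes. A small sketch of the shape equivalence (illustrative only):

```python
def flatten_shape(shape):
    """Shape produced by Caffe's Flatten on (N, C, H, W): (N, C*H*W)."""
    n, c, h, w = shape
    return (n, c * h * w)

def reshape_like_flatten(shape):
    """Shape produced by Reshape with {0, -1, 1, 1}: dim 0 copies the
    batch axis, -1 infers the remaining size, trailing 1s keep 4 axes."""
    n, c, h, w = shape
    return (n, c * h * w, 1, 1)
```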
In `ssd.prototxt`, add `top: "keep_count"` in the `detection_out` layer, as the TensorRT DetectionOutput plugin requires this output.

Generate the INT8 calibration batches. The script selects 500 random JPEG images from the PASCAL VOC dataset and converts them to PPM images. These 500 PPM images are used to generate INT8 calibration batches.
```bash
$TRT_SOURCE/samples/opensource/sampleSSD/PrepareINT8CalibrationBatches.sh
```
**Note:** Do not move the batch files from the `<TensorRT root directory>/data/ssd/batches` directory.
If you want to use a different dataset to generate INT8 batches, use the `batchPrepare.py` script and place the batch files in the `<TensorRT root directory>/data/ssd/batches` directory.
Compile the sample by running `make` in the `<TensorRT root directory>/samples/sampleSSD` directory. The binary named `sample_ssd` will be created in the `<TensorRT root directory>/bin` directory.
```bash
cd <TensorRT root directory>/samples/sampleSSD
make
```
Where `<TensorRT root directory>` is where you installed TensorRT.

Verify that the sample ran successfully. If the sample runs successfully, you should see output similar to the following:
```
&&&& RUNNING TensorRT.sample_ssd # ./sample_ssd
[I] Begin parsing model...
[I] FP32 mode running...
[I] End parsing model...
[I] Begin building engine...
[I] [TRT] Detected 1 input and 2 output network tensors.
[I] End building engine...
[I] *** deserializing
[I] Image name:../data/samples/ssd/bus.ppm, Label: car, confidence: 96.0587 xmin: 4.14486 ymin: 117.443 xmax: 244.102 ymax: 241.829
&&&& PASSED TensorRT.sample_ssd # ./build/x86_64-linux/sample_ssd
```
This output shows that the sample ran successfully; `PASSED`.
Sample `--help` options
To see the full list of available options and their descriptions, use the `-h` or `--help` command line option.
The following resources provide a deeper understanding about how the SSD model works:
Models
Dataset
Documentation
For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.
February 2019 This `README.md` file was recreated, updated and reviewed.