
TLDR Summary

How many FPS can you expect to process when working with Darknet, YOLO, and OpenCV?

Device           | FPS       | Method
-----------------|-----------|------------------
Beaglebone       | 0.005 FPS | Darknet (CPU)
Raspberry Pi 4   | 2.4 FPS   | OpenCV DNN (CPU)
Jetson Nano      | 16.9 FPS  | OpenCV DNN + CUDA
Jetson Xavier NX | 46.4 FPS  | OpenCV DNN + CUDA
Jetson AGX       | 71.5 FPS  | Darknet + CUDA
NVIDIA RTX 2070  | 209.7 FPS | OpenCV DNN + CUDA

See the full table for the details and the information on resizing frames.

What Was Tested

The original soccer video is 1920x1080 @ 29.97 FPS. It has 1080 frames for a total length of 36 seconds.

For these tests, the video was pre-processed to stretch each frame to exactly 416x416 @ 29.97 FPS. This way the timing test measures the length of time it takes to apply the neural network (and save the results), not the amount of time spent stretching each video frame.
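
The text does not say which tool was used for the pre-processing. As one possible way to reproduce it, the sketch below assumes ffmpeg is installed and uses its `scale` filter to stretch every frame to 416x416 without preserving the aspect ratio. The helper names and the input filename in the usage note are hypothetical.

```python
import subprocess

def build_stretch_command(src, dst, width=416, height=416):
    """Build an ffmpeg command that stretches every frame to width x height."""
    # scale=W:H forces the exact output size, ignoring the source aspect ratio.
    # setsar=1 marks the output pixels as square so players show the stretched frame.
    video_filter = f"scale={width}:{height},setsar=1"
    return ["ffmpeg", "-i", src, "-vf", video_filter, dst]

def stretch_video(src, dst):
    """Run ffmpeg (assumed to be on the PATH) to write the stretched video."""
    subprocess.run(build_stretch_command(src, dst), check=True)
```

Calling `stretch_video("soccer_input_1920x1080.mp4", "soccer_input_416x416.mp4")` would produce the 416x416 file used in the tests, assuming the original video has that (hypothetical) name. ffmpeg keeps the source frame rate by default, so the output stays at 29.97 FPS.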

A total of 140 video frames were annotated in DarkMark to create the YOLOv3-tiny 416x416 neural network. I saved a screenshot of the training options.

If you want to run some additional timing tests and compare the results, the files you'll need are:

All tests were run using DarkHelp version 1.3.10-1, Darknet hash #aef928c from 2021-09-28, and either OpenCV v4.1.1 or OpenCV v4.5.3.

The command used to run the timing tests is one of the following, depending on the test:

    DarkHelp --driver darknet --json --autohide off --tiles off soccer_kids* soccer_input_416x416.mp4
    DarkHelp --driver opencvcpu --json --autohide off --tiles off soccer_kids* soccer_input_416x416.mp4
    DarkHelp --driver opencv --json --autohide off --tiles off soccer_kids* soccer_input_416x416.mp4

Here is an example of what that looks like:

    > DarkHelp --driver opencv --json --autohide off --tiles off soccer_kids* soccer_input_416x416.mp4
    -> config file: soccer_kids.cfg
    -> weights file: soccer_kids_best.weights
    -> names file: soccer_kids.names
    -> driver: OpenCV DNN ***EXPERIMENTAL***
    -> loading network took 1723.114 milliseconds
    -> neural network dimensions: 416x416
    -> output directory: .
    -> looking for image and video files
    -> found 1 file
    #1/1: loading "soccer_input_416x416.mp4"
    29.97 FPS, 1080 frames, 416x416 -> 416x416, 0m 36s
    processing frame 1080/1080 (100% @ 207.1 FPS)
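
To repeat the test for all three drivers, the commands can be scripted. The sketch below is a minimal example, not part of DarkHelp: the helper names are hypothetical, and it assumes DarkHelp is on the PATH and writes the progress line shown above to standard output. Only the command-line flags that appear in the commands above are used.

```python
import glob
import re
import subprocess

# Matches the final progress line from the sample output, e.g. "(100% @ 207.1 FPS)".
FPS_PATTERN = re.compile(r"\(100% @ ([0-9.]+) FPS\)")

def build_command(driver, video="soccer_input_416x416.mp4", pattern="soccer_kids*"):
    """Build the DarkHelp argument list for one driver: darknet, opencvcpu, or opencv."""
    # A shell would expand soccer_kids*; without a shell we expand it ourselves
    # and fall back to the raw pattern if nothing matches.
    network_files = sorted(glob.glob(pattern)) or [pattern]
    return ["DarkHelp", "--driver", driver, "--json", "--autohide", "off",
            "--tiles", "off", *network_files, video]

def parse_final_fps(output):
    """Return the FPS from the last 100% progress line, or None if there is none."""
    matches = FPS_PATTERN.findall(output)
    return float(matches[-1]) if matches else None

def run_timing_test(driver):
    """Run DarkHelp for one driver and return the FPS it reports."""
    result = subprocess.run(build_command(driver), capture_output=True,
                            text=True, check=True)
    return parse_final_fps(result.stdout)
```

Looping over `("darknet", "opencvcpu", "opencv")` and calling `run_timing_test` for each would give one FPS figure per driver on the current machine.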

Result Details

For most devices, there are 4 test scenarios:

  1. Darknet compiled to use the CPU
  2. Darknet compiled to use CUDA
  3. OpenCV DNN compiled to use the CPU
  4. OpenCV DNN compiled to use CUDA

The older version of OpenCV v4.1.1 that comes on the NVIDIA Jetson images is not compiled with CUDA support. For this reason, the Jetson devices were manually upgraded to OpenCV v4.5.3. (See these scripts to get started.)

In addition, the Jetson devices were tested with Jetson Clocks both disabled and enabled to determine the difference this makes.

The highest FPS value for each device is highlighted to make it easy to find in the table.

Device           | Jetson Clocks | OS               | OpenCV | CPU                     | Darknet (CPU)       | Darknet (CUDA)     | OpenCV DNN (CPU)   | OpenCV DNN (CUDA)
-----------------|---------------|------------------|--------|-------------------------|---------------------|--------------------|--------------------|-------------------
Beaglebone       | n/a           | Debian 9 Stretch | 3.2.0  | ARMv7 x 1 @ 1 GHz       | 62 hours, 0.005 FPS | CUDA not available | DNN not available  | DNN not available
Raspberry Pi 4   | n/a           | Ubuntu 20.04.3   | 4.2.0  | ARM x 4 @ ? GHz         | 38 minutes, 0.5 FPS | CUDA not available | 449929 ms, 2.4 FPS | CUDA not available
Jetson Nano      | disabled      | Ubuntu 18.04.6   | 4.1.1  | ARMv8 x 4 @ ? GHz       | ?                   | 81655 ms, 13.2 FPS | ?                  | CUDA not available
Jetson Nano      | enabled       | Ubuntu 18.04.6   | 4.1.1  | ARMv8 x 4 @ 1.5 GHz     | ?                   | 72873 ms, 14.8 FPS | ?                  | CUDA not available
Jetson Nano      | disabled      | Ubuntu 18.04.6   | 4.5.3  | ARMv8 x 4 @ ? GHz       | 1427781 ms, 0.8 FPS | 80867 ms, 13.4 FPS | 369439 ms, 2.9 FPS | 69293 ms, 15.6 FPS
Jetson Nano      | enabled       | Ubuntu 18.04.6   | 4.5.3  | ARMv8 x 4 @ 1.5 GHz     | 1448414 ms, 0.7 FPS | 73158 ms, 14.8 FPS | 367345 ms, 2.9 FPS | 64016 ms, 16.9 FPS
Jetson Xavier NX | disabled      | Ubuntu 18.04.6   | 4.1.1  | ARMv8 x 4 @ 1.9 GHz     | 379265 ms, 2.8 FPS  | 30903 ms, 34.9 FPS | 156581 ms, 6.9 FPS | CUDA not available
Jetson Xavier NX | enabled       | Ubuntu 18.04.6   | 4.1.1  | ARMv8 x 6 @ 1.4 GHz     | 360043 ms, 3.0 FPS  | 24440 ms, 44.2 FPS | 155879 ms, 6.9 FPS | CUDA not available
Jetson Xavier NX | disabled      | Ubuntu 18.04.6   | 4.5.3  | ARMv8 x 4 @ 1.9 GHz     | 353190 ms, 3.1 FPS  | 30215 ms, 35.7 FPS | 150356 ms, 7.2 FPS | 29881 ms, 36.1 FPS
Jetson Xavier NX | enabled       | Ubuntu 18.04.6   | 4.5.3  | ARMv8 x 6 @ 1.4 GHz     | 346793 ms, 3.1 FPS  | 24528 ms, 44.0 FPS | 147004 ms, 7.3 FPS | 23272 ms, 46.4 FPS
Jetson AGX       | disabled      | Ubuntu 18.04.6   | 4.1.1  | ARMv8 x 8 @ ? GHz       | 182048 ms, 5.9 FPS  | 22204 ms, 48.6 FPS | 72043 ms, 15.0 FPS | CUDA not available
Jetson AGX       | enabled       | Ubuntu 18.04.6   | 4.1.1  | ARMv8 x 8 @ 2.3 GHz     | 181224 ms, 6.0 FPS  | 15151 ms, 71.3 FPS | 71508 ms, 15.1 FPS | CUDA not available
Jetson AGX       | disabled      | Ubuntu 18.04.6   | 4.5.3  | ARMv8 x 8 @ ? GHz       | 181211 ms, 6.0 FPS  | 22223 ms, 48.6 FPS | 70424 ms, 15.3 FPS | 20972 ms, 51.5 FPS
Jetson AGX       | enabled       | Ubuntu 18.04.6   | 4.5.3  | ARMv8 x 8 @ 2.3 GHz     | 177985 ms, 6.1 FPS  | 15108 ms, 71.5 FPS | 68709 ms, 15.7 FPS | 16883 ms, 64.0 FPS
Virtualbox VM    | n/a           | Ubuntu 20.04.3   | 4.2.0  | Intel i7 x 16 @ 3.2 GHz | 156626 ms, 6.9 FPS  | CUDA not available | 27383 ms, 39.4 FPS | CUDA not available
RTX 2070         | n/a           | Ubuntu 18.04.6   | 4.5.3  | Intel i7 x 8 @ 3.4 GHz  | 205703 ms, 5.3 FPS  | 6084 ms, 177.5 FPS | 54791 ms, 19.7 FPS | 5149 ms, 209.7 FPS
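
The Time (ms) figures in the table appear to be the total time taken to process all 1080 frames of the test video, since dividing the frame count by that time reproduces the FPS column. The short sketch below works through that arithmetic for a few rows; the function names are only illustrative.

```python
FRAME_COUNT = 1080  # frames in the test video, as stated in "What Was Tested"

def fps_from_total_ms(total_ms, frames=FRAME_COUNT):
    """Frames per second, given the total time in milliseconds to process all frames."""
    return frames / (total_ms / 1000.0)

def ms_per_frame(total_ms, frames=FRAME_COUNT):
    """Average time spent on a single frame, in milliseconds."""
    return total_ms / frames

# Total times copied from the table above (all with OpenCV v4.5.3).
samples = {
    "Jetson Nano, OpenCV DNN (CUDA), Jetson Clocks enabled": 64016,
    "Jetson AGX, Darknet (CUDA), Jetson Clocks enabled": 15108,
    "RTX 2070, OpenCV DNN (CUDA)": 5149,
}

for label, total_ms in samples.items():
    print(f"{label}: {fps_from_total_ms(total_ms):.1f} FPS, "
          f"{ms_per_frame(total_ms):.1f} ms per frame")
```

The per-frame time is often the more useful number when budgeting for a live 29.97 FPS stream, where each frame has to be handled in roughly 33 ms to keep up.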

Resizing Frames

The cost of resizing the video frames or images to match the neural network dimensions was ignored in the table above. The test video was manually pre-processed so the frame dimensions would match the 416x416 neural network, allowing the test to focus on the cost of applying the neural network.

But the cost incurred to resize each video frame is non-trivial and negatively impacts the FPS. The next table highlights this cost by comparing the FPS achieved when working with 416x416 video versus the same video in the original 1920x1080 dimensions.

Device           | Video Measures 416x416 | Video Measures 1920x1080
-----------------|------------------------|-------------------------
Raspberry Pi 4   | 2.4 FPS                | 1.8 FPS
Jetson Nano      | 16.9 FPS               | 5.0 FPS
Jetson Xavier NX | 46.4 FPS               | 9.1 FPS
Jetson AGX       | 71.5 FPS               | 14.6 FPS
Virtualbox VM    | 39.4 FPS               | 21.3 FPS
RTX 2070         | 209.7 FPS              | 43.6 FPS

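
One way to read this table is to convert each FPS pair into extra milliseconds per frame. The sketch below does that with the figures above. It assumes the whole difference comes from handling the larger frames (which would include decoding as well as resizing), so treat the output as a rough estimate rather than a measured resize cost.

```python
def extra_ms_per_frame(fps_416, fps_1080):
    """Extra time per frame, in milliseconds, when feeding 1920x1080 instead of 416x416."""
    return 1000.0 / fps_1080 - 1000.0 / fps_416

# (FPS with 416x416 video, FPS with 1920x1080 video), copied from the table above.
results = {
    "Raspberry Pi 4": (2.4, 1.8),
    "Jetson Nano": (16.9, 5.0),
    "Jetson Xavier NX": (46.4, 9.1),
    "Jetson AGX": (71.5, 14.6),
    "Virtualbox VM": (39.4, 21.3),
    "RTX 2070": (209.7, 43.6),
}

for device, (fps_416, fps_1080) in results.items():
    slowdown = fps_416 / fps_1080
    print(f"{device}: +{extra_ms_per_frame(fps_416, fps_1080):.1f} ms per frame, "
          f"{slowdown:.1f}x slower")
```

On the faster devices the extra per-frame time is several times larger than the time spent applying the network, which is why the FPS drop is proportionally much bigger there than on the Raspberry Pi 4.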
Last modified: 2021-10-17
Stéphane Charette, stephanecharette@gmail.com