Certain questions come up regularly on the Darknet/YOLO Discord and in Darknet Issues. This page attempts to answer them.

Does the network have to be perfectly square?


No.

The default network sizes in the common template configuration files are defined as 416x416 or 608x608, but those are only examples!

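Choose a size that works for you and your images. The only restrictions are:

  • the width must be divisible by 32
  • the height must be divisible by 32
  • you must have enough video memory to train a network of that size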

Whatever size you choose, Darknet will stretch (without preserving the aspect ratio!) your images to be exactly that size prior to processing the image. This includes both training and inference. So use a size that makes sense for you and the images you need to process, but remember that there are important speed and memory limitations. The larger the size, the slower it will be to train and run, and the more GPU memory will be required.
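To get a feel for the stretching described above, here is a minimal Python sketch that computes how much an image is scaled on each axis when it is forced to the network size. The function name and the sample dimensions are hypothetical; none of this is part of Darknet itself.

```python
def stretch_factors(image_w, image_h, net_w, net_h):
    """Return (x_scale, y_scale, distortion) for an image stretched to the network size.

    distortion is x_scale / y_scale: 1.0 means the aspect ratio is preserved,
    anything else means objects end up squeezed or widened.
    """
    x_scale = net_w / image_w
    y_scale = net_h / image_h
    return x_scale, y_scale, x_scale / y_scale


# Hypothetical example: a 1920x1080 frame fed to a 416x416 network.
x, y, d = stretch_factors(1920, 1080, 416, 416)
print(f"x scale {x:.3f}, y scale {y:.3f}, distortion {d:.3f}")
```

A distortion far from 1.0 may be a hint that a non-square network size closer to the shape of your images is worth trying.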

See also: How much memory does it take?

Can I train a neural network with a CPU?

Not really.

While technically you could, the length of time you'll have to wait does not make sense. Inference on single images with a CPU does work (measured in seconds, versus milliseconds on a GPU), but training is a different matter. In the timing table further down this page, training 10K iterations on 416x416 images took 1h 53m on the GPU and more than 5 days on the CPU.

See also: How long does it take to train?

Can I train a neural network using synthetic images?


No.

Or, to be more precise, you'll end up with a neural network that is great at detecting your synthetic images, but unable to detect much in real-world images.

(Yes, one of my first Darknet tutorials was to detect barcodes in synthetic images, and that neural network worked great...but only at detecting barcodes in synthetic images!)

What does the error message "CUDA Error: out of memory" mean during training?

If training abruptly stops with the following message:

    Try to set subdivisions=64 in your cfg-file.
    CUDA Error: out of memory: File exists
    darknet: ./src/utils.c:325: error: Assertion `0' failed.
    Command terminated by signal 6

This means you don't have enough video memory on your GPU card. There are several possible solutions:

  1. decrease the network size, meaning width=... and height=... in the [net] section of your .cfg file
  2. increase the subdivisions, meaning subdivisions=... in the [net] section of your .cfg file
  3. choose a different configuration file that consumes less memory
  4. purchase a different GPU with more memory

If the network size cannot be modified, the most common solution is to increase the subdivisions. For example, in the [net] section of your .cfg file you might see this:

    batch=64
    subdivisions=2

Try doubling the subdivisions and train again:

    batch=64
    subdivisions=4

If that still runs out of memory, double it again:

    batch=64
    subdivisions=8

Keep doubling the subdivisions until you don't get the out-of-memory error.

If subdivisions=... matches the value in batch=... and you still get an out-of-memory error, then you'll need to decrease the network dimensions or select a less demanding configuration.
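The doubling procedure can be sketched in a few lines of Python. This is only an illustration of the sequence of values to try; the function name is made up, and actually detecting the out-of-memory error still means running Darknet yourself. The comment about images per step reflects my understanding that Darknet loads batch / subdivisions images onto the GPU at a time.

```python
def subdivision_candidates(batch, current):
    """Yield the subdivisions values to try, doubling each time, up to batch."""
    value = current * 2
    while value <= batch:
        yield value
        value *= 2


# Values from the example above: batch=64, starting from subdivisions=2.
for subdivisions in subdivision_candidates(64, 2):
    # batch / subdivisions images are processed on the GPU at a time
    print(f"subdivisions={subdivisions} -> {64 // subdivisions} image(s) at a time")
```

Once the generator is exhausted, subdivisions equals batch, which is the point where the only remaining options are a smaller network size or a lighter configuration.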

See also: How much memory does it take?

How much memory does it take?

There are several factors that determine how much video memory is needed on your GPU to train a network. Except for the first (the configuration file itself), these are all defined in the [net] section at the top of the configuration file:

  • the network configuration file itself
  • width=...
  • height=...
  • batch=...
  • subdivisions=...

Typically, once a network configuration and dimensions are chosen, the value that gets modified to make the network fit in the available memory is subdivisions.

You'll want subdivisions to be as small as possible without causing an out-of-memory error. Here are some values showing the amount of GPU memory required using various configurations and subdivisions:

subdivisions=          64         32         16          8          4          2          1
YOLOv3           3085 MiB   4406 MiB   6746 MiB          ?          ?          ?          ?
YOLOv3-tiny      1190 MiB   1204 MiB   1652 MiB   2522 MiB   4288 MiB          ?          ?
YOLOv3-tiny-3l   1046 MiB   1284 MiB   1814 MiB   2810 MiB   4846 MiB          ?          ?
YOLOv4           4236 MiB   6246 MiB          ?          ?          ?          ?          ?
YOLOv4-tiny       814 MiB    956 MiB   1321 MiB   1752 MiB   2770 MiB   5532 MiB          ?
YOLOv4-tiny-3l    830 MiB   1085 MiB   1282 MiB   1862 MiB   2982 MiB   5748 MiB          ?

Here is the same table but for a slightly larger network size:

subdivisions=          64         32         16          8          4          2          1
YOLOv3           4648 MiB   4745 MiB          ?          ?          ?          ?          ?
YOLOv3-tiny      1278 MiB   1774 MiB   2728 MiB   4634 MiB          ?          ?          ?
YOLOv3-tiny-3l   1473 MiB   2059 MiB   3044 MiB   5420 MiB          ?          ?          ?
YOLOv4           6906 MiB          ?          ?          ?          ?          ?          ?
YOLOv4-tiny       984 MiB   1262 MiB   1909 MiB   2902 MiB   5076 MiB          ?          ?
YOLOv4-tiny-3l   1020 MiB   1332 MiB   1938 MiB   3134 MiB   5518 MiB          ?          ?

Memory values as reported by nvidia-smi. My setup is a GeForce RTX 2070 with only 8 GiB of memory, which limits the configurations I can run.
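As a rough sketch of how these measurements might be used, the Python below picks the smallest subdivisions value that fits in a given amount of GPU memory. The numbers are copied from the YOLOv4-tiny row of the first table; the function name is hypothetical, and the results on other hardware or Darknet builds will likely differ, so treat the output as a starting point only.

```python
# GPU memory in MiB by subdivisions value, copied from the YOLOv4-tiny row
# of the first table above. Entries shown as "?" are left out.
YOLOV4_TINY_MIB = {64: 814, 32: 956, 16: 1321, 8: 1752, 4: 2770, 2: 5532}


def smallest_fitting_subdivisions(memory_by_subdivisions, available_mib):
    """Return the smallest subdivisions value whose measured memory fits, or None."""
    fitting = [s for s, mib in memory_by_subdivisions.items() if mib <= available_mib]
    return min(fitting) if fitting else None


# Example: a GPU with 4096 MiB available for training.
print(smallest_fitting_subdivisions(YOLOV4_TINY_MIB, 4096))
```

Leaving some headroom below the card's total memory is probably wise, since other processes (a desktop session, for example) also use video memory.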

How long does it take to train?

The length of time it takes to train a network depends on the input image data, the network configuration, the available hardware, how Darknet was compiled, and, in extreme cases, even the format of the images.

Some tl;dr notes:

  1. Resize your training and validation images to match exactly the network size.
    • For example: mogrify -verbose -strip -resize 416x416! -quality 75 *.JPG
  2. Build Darknet with support for OpenCV: loading images is slower without OpenCV.
  3. Build Darknet with support for OpenCV: resizing images is slower without OpenCV.
  4. Build Darknet with support for CUDA/cuDNN. (This requires supported hardware.)
  5. Use the tiny variants of the network.
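For anyone scripting the resize step from the first note, here is a small Python sketch that builds the same mogrify command for an arbitrary network size. The function name and file names are hypothetical; the options are the ones shown in the example above.

```python
import shlex


def mogrify_command(width, height, files, quality=75):
    """Build the ImageMagick mogrify command from the note above as an argument list."""
    # The trailing "!" tells ImageMagick to ignore the aspect ratio and
    # produce exactly width x height, which mirrors how Darknet stretches images.
    geometry = f"{width}x{height}!"
    return ["mogrify", "-verbose", "-strip", "-resize", geometry,
            "-quality", str(quality), *files]


# Hypothetical file names. mogrify overwrites files in place, so work on copies.
cmd = mogrify_command(608, 608, ["img_0001.JPG", "img_0002.JPG"])
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) if ImageMagick is installed. Keep in mind that resizing the images changes nothing about the annotations only if they are stored as normalized coordinates, which is my understanding of the Darknet format.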

The format of the images -- JPG or PNG -- has no meaningful impact on the length of time it takes to train unless the images are excessively large. When very large photo-realistic image files are saved as PNG, the excessive file sizes mean loading the images from disk is slow, which significantly impacts the training time. This should never be an issue when the image sizes match the network sizes.

The following table shows the length of time it takes to train a neural network, by Darknet build (rows) and by training image set (columns):

Darknet build                                      Run             original 4608x3456 JPG   4608x3456 JPG, quality=75   800x600 JPG, quality=75   416x416 JPG, quality=75
GPU + OpenCV                                       10 iterations   42.26 seconds            35.27 seconds               6.90 seconds              6.76 seconds
                                                   10K iterations  11h 44m                  9h 47m                      1h 55m                    1h 53m
GPU + OpenCV, but using PNG images instead of JPG  10 iterations   n/a                      80.70 seconds               6.93 seconds              6.71 seconds
                                                   10K iterations  n/a                      22h 25m                     1h 56m                    1h 52m
GPU, but without OpenCV                            10 iterations   113.31 seconds           106.56 seconds              9.19 seconds              7.70 seconds
                                                   10K iterations  31h 29m                  29h 36m                     2h 33m                    2h 8m
CPU + OpenCV (no GPU)                              10 iterations   532.86 seconds           527.41 seconds              496.47 seconds            496.03 seconds
                                                   10K iterations  > 6 days                 > 6 days                    > 5 days                  > 5 days

For these tests, the GPU was a GeForce RTX 2070 with 8 GiB of memory, and the CPU was an 8-core 3.40 GHz processor.

Note that all the neural networks trained in the previous table are exactly the same. The training images are identical, the validation images are the same, and the resulting neural networks are virtually identical. But the length of time it takes to train varies between ~2 hours and 6+ days.
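The 10K-iteration figures in the table look consistent with a straight linear extrapolation from the 10-iteration timings, give or take a minute of rounding. That is my inference rather than something stated here, but it suggests a quick way to estimate your own training time from a short run. A minimal Python sketch, with hypothetical function names:

```python
def extrapolate_seconds(measured_seconds, measured_iterations, target_iterations):
    """Linearly scale a short timing run to a longer one."""
    return measured_seconds / measured_iterations * target_iterations


def format_hours_minutes(seconds):
    """Format a duration as 'Hh Mm', dropping any leftover seconds."""
    total = round(seconds)
    return f"{total // 3600}h {total % 3600 // 60}m"


# 42.26 seconds for 10 iterations (GPU + OpenCV, original JPG images).
estimate = extrapolate_seconds(42.26, 10, 10_000)
print(format_hours_minutes(estimate))  # prints "11h 44m"
```

Timing the first few iterations of your own training run and scaling up the same way should give a ballpark figure, assuming the per-iteration cost stays roughly constant.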

Should I crop my training images?


No.

Say you want a network trained to find barcodes. If you crop your training images so that the barcode fills the entire frame, and then label them, your network will only learn to recognize barcodes when they take up 100% of the image. It is unlikely you want that; if the objects to find always took up 100% of the image, there would be little point in training a neural network.

Instead, make sure your training images are representative of the final images. Using this barcode as an example, a more likely marked-up training image would be similar to this:

[Image: book and barcode]

See also: Darknet & DarkMark image markup.

How do I turn off data augmentation?

If you are using DarkMark, set all data augmentation options to zero or turn them off.

If you are editing your configuration file by hand, verify these settings in the [net] section:

    saturation=0
    exposure=0
    hue=0
    cutmix=0
    flip=0
    mixup=0
    mosaic=0
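Checking these settings by eye is easy to get wrong in a long configuration file, so here is a small Python sketch that reads the [net] section and reports any of the keys listed above that are not set to zero. The function names are hypothetical, and the parser is deliberately simplistic (it only understands key=value lines and # comments). Keys that are missing from [net] are also reported, on the assumption that an unset key may fall back to a non-zero default.

```python
AUGMENTATION_KEYS = ("saturation", "exposure", "hue", "cutmix", "flip", "mixup", "mosaic")


def net_section(cfg_text):
    """Return the key=value pairs found in the first [net] section of a cfg file."""
    settings = {}
    in_net = False
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("["):
            if in_net:
                break  # reached the section that follows [net]
            in_net = line.lower() == "[net]"
            continue
        if in_net and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def augmentation_not_disabled(cfg_text):
    """List the augmentation keys that are missing from [net] or not set to zero."""
    net = net_section(cfg_text)
    problems = []
    for key in AUGMENTATION_KEYS:
        try:
            is_zero = float(net[key]) == 0.0
        except (KeyError, ValueError):
            is_zero = False
        if not is_zero:
            problems.append(key)
    return problems
```

For example, augmentation_not_disabled(open("my.cfg").read()) returns an empty list when every setting above is present and zero (my.cfg being a placeholder for your own file).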

How important is image rotation as a data augmentation technique?

Depends on the type of image. Some things don't make much sense rotated (e.g., dashcam or highway cam images). But the impact of rotated images needs to be considered. For example, here is a network that is really good at detecting animals:

[Images: puppy; upside-down puppy]

With 100% certainty, that is a very cute dog. But when the exact same image is rotated 180 degrees, all of a sudden the neural network thinks this is more likely to be a cat than a dog.

See also: Data Augmentation - Rotation

Where can I get more help?

Come see us on the Darknet/YOLO Discord!

Last modified: 2020-10-04
Stéphane Charette, stephanecharette@gmail.com