Programming Comments - Neural Network: Dogs and Cats

Also see my recent posts on how to train an artificial neural network, and how to use a neural network from within a C++ application.

Summary

I've been working with Darknet[1], DarkHelp[2], and DarkMark to train artificial neural networks. Today, I was looking for pictures to use to train a new neural network and run some tests. I came across the Kaggle "Dogs and Cats" data set: 25,000 low-resolution images of cats and dogs. Perfect to train a new neural network.

Quick Results

I trained a YOLOv3-Tiny artificial neural network using 900 marked images -- which represents only 3.6% of the total number of images -- across 8000 iterations. Even at these low numbers, it works relatively well at correctly identifying dogs and cats in the large image dataset:

Specific Results

Definitely the most interesting result from the dataset is this image where a painting of a dog's face was identified, while the dog (whose face is not visible) was skipped:

Regardless of how the results from the previous image are interpreted, it doesn't take much to confuse the artificial neural network:

The network is effective at correctly identifying multiple objects in an image:

But a single image stood out where one specific cat in a set was incorrectly identified as a dog:

Excessively blurry images were also correctly identified. Several were so blurry that I found them difficult to identify manually, so it was a pleasant surprise to see the ANN do a decent job:

And lastly, in a few cases people's faces were incorrectly identified as pets, such as:

I suspect that poor results such as those in the last 3 images are due to the very low number of marked images (3.6% of the dataset), and would likely be resolved by marking additional images and re-training the network.

Setup And Marking Images

To test with this dataset, download and extract the PetImages directory from the Kaggle project .zip file.[3, 4]
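
Once the archive is extracted, it can be worth confirming that all the images are actually there before marking anything. Here's a minimal Python sketch that walks a directory tree and counts images per top-level subdirectory. The `PetImages` path, the list of file extensions, and the idea that cats and dogs sit in separate subdirectories are assumptions about how the archive unpacks, so adjust them to match what you see on disk.

```python
from collections import Counter
from pathlib import Path

# Extensions treated as images; an assumption, adjust to match your copy of the dataset.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Return a Counter mapping each top-level subdirectory of root to its image count.

    Images sitting directly in root are counted under ".".
    A missing directory yields an empty Counter.
    """
    root = Path(root)
    counts = Counter()
    if not root.is_dir():
        return counts
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            relative = path.relative_to(root)
            key = relative.parts[0] if len(relative.parts) > 1 else "."
            counts[key] += 1
    return counts


if __name__ == "__main__":
    counts = count_images("PetImages")  # hypothetical extraction path
    for name, total in sorted(counts.items()):
        print(f"{name}: {total}")
    print(f"total: {sum(counts.values())}")
```

If the total comes out well short of 25,000, the extraction probably didn't finish or the extension list needs widening.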

The data set is simple: there are only 2 classes. I assigned #0 to dogs, and #1 to cats. Then I used DarkMark to start marking up images. DarkMark traverses subdirectories when looking for images to mark, so you can either decide to keep the original Kaggle directory structure, or put all 25,000 pet images into the subdirectory of your choosing. This is how I set it up:

I tried building a neural network at half a dozen different points, first after having marked up only a few dozen images, then steadily increasing the number of marked images and the number of training iterations.

By the time I'd reached 900 marked images and 8000 iterations, I had a neural network that was semi-decent at identifying random images of cats and dogs, as shown at the top of this post.

Increasing the training iterations at this point doesn't help make the network any better. As you can see from the training output charts, there is very little gain made between the networks trained over 8000 iterations (left) versus 15,000 iterations (right):

At this point, the best way to get this network functioning better is to increase the number of marked images, which can easily be done considering I've only marked up 3.6% of the 25,000 images in this dataset.
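
For anyone curious what a single mark looks like on disk, here's a minimal Python sketch. It assumes the usual Darknet/YOLO text annotation format: one line per object, holding the class id followed by the box centre, width, and height, all normalized to the image dimensions. The class ids match the ones I assigned (0 for dogs, 1 for cats). The function name and the example coordinates are made up for illustration, and this isn't a description of DarkMark's internals.

```python
# Class ids as described in this post: 0 for dogs, 1 for cats.
CLASS_IDS = {"dog": 0, "cat": 1}


def yolo_annotation_line(label, left, top, right, bottom, image_width, image_height):
    """Convert a pixel-space bounding box into one Darknet/YOLO annotation line."""
    if not (0 <= left < right <= image_width and 0 <= top < bottom <= image_height):
        raise ValueError("bounding box must lie inside the image")
    # Darknet expects the box centre and size as fractions of the image dimensions.
    x_center = (left + right) / 2 / image_width
    y_center = (top + bottom) / 2 / image_height
    width = (right - left) / image_width
    height = (bottom - top) / image_height
    return f"{CLASS_IDS[label]} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    # Hypothetical example: a cat marked in a 500x375 image.
    print(yolo_annotation_line("cat", 100, 50, 400, 350, 500, 375))
```

Because the coordinates are normalized, the same annotation stays valid if the image is later resized.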

More Examples

Total time to set up and mark 900 images for this project was 1 day, and training with Darknet on a high-end NVIDIA RTX 2070 GPU took almost exactly 2 hours:

Saving weights to dogs-vs-cats_yolov3-tiny_15000.weights
Saving weights to dogs-vs-cats_yolov3-tiny_last.weights
Saving weights to dogs-vs-cats_yolov3-tiny_final.weights
	Command being timed: "darknet detector -map -dont_show train dogs-vs-cats.data dogs-vs-cats_yolov3-tiny.cfg"
	User time (seconds): 16491.23
	System time (seconds): 2657.68
	Percent of CPU this job got: 265%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:00:13

For those of you just getting started with neural networks, training a similarly sized ANN on a CPU instead of a GPU would probably take several weeks instead of 2 hours.
Invest in a decent GPU if you plan to train anything non-trivial.
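
As a sanity check on the timing output above, the numbers can be cross-checked with a bit of arithmetic. This Python sketch recomputes the CPU percentage from the user, system, and wall clock times, and estimates training throughput. The throughput figure assumes the timed run covered all 15,000 iterations, which is what the weights filename suggests but the output doesn't state outright.

```python
def parse_elapsed(text):
    """Convert an 'h:mm:ss' or 'm:ss' wall clock string into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Values copied from the timing output above.
USER_SECONDS = 16491.23
SYSTEM_SECONDS = 2657.68
ELAPSED = "2:00:13"
# Assumption: the timed run covers all 15,000 iterations.
ITERATIONS = 15000

elapsed_seconds = parse_elapsed(ELAPSED)
# CPU percentage is total CPU time (user + system) divided by wall clock time.
cpu_percent = (USER_SECONDS + SYSTEM_SECONDS) / elapsed_seconds * 100
iterations_per_second = ITERATIONS / elapsed_seconds

if __name__ == "__main__":
    print(f"elapsed: {elapsed_seconds:.0f} s")
    print(f"CPU usage: {cpu_percent:.0f}%")
    print(f"throughput: {iterations_per_second:.2f} iterations/s")
```

The recomputed CPU figure agrees with the 265% reported in the output, meaning that between two and three CPU cores were kept busy on average while the GPU did the training.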

Here are several more results when running the artificial neural network (900 marks, 15K iterations) over the Kaggle dataset: