Animal Detection (2019)

#Python #ComputerVision #DeepLearning



One popular method that ecologists use to study wildlife in an environment is the strategic placement of motion-triggered camera traps in the area of interest. The images captured by these cameras provide vital insight into local wildlife in a non-invasive manner. Despite these attractive benefits, significant challenges hinder the widespread use of camera-trap images for ecological study. Namely, although the cameras are easy to deploy, their images are only useful after manual processing, which is extremely labor-intensive and time-consuming. The motion-triggered images are entirely unlabeled, so human volunteers must painstakingly categorize them one by one. Further, the sheer volume of images captured by a network of camera traps can make inspecting and labeling them extremely tedious.

Even with a dedicated labeling effort, some images are quite difficult to categorize. Examples of non-ideal images can be seen in the figure below. For instance, animals can be too close to or too far from the camera, or they may be only partially visible in the image. Furthermore, the image may be affected by adverse environmental conditions such as harsh weather or poor lighting during the day or at night, which makes accurate classification even more challenging.


Examples of non-ideal images.

Figures retrieved from: Norouzzadeh, M., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M., Packer, C. and Clune, J. (2018). Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proceedings of the National Academy of Sciences, 115(25), pp. E5716-E5725.

This study aimed to alleviate these burdens by using deep learning to automatically detect and label wild animals in motion-triggered camera-trap images. In layman's terms, deep learning is a process that allows computers to extract multiple levels of abstraction from raw data. Notably, this project built upon, and tested the limits of, a VGG architecture from previous work that investigated the use of deep learning for automated data extraction from camera-trap images. More specifically, this project proposed a VGG-inspired deep convolutional neural network to classify images from the most extensive available dataset of wild animal images, the Snapshot Serengeti (SS) dataset. The goal was to match or exceed the classification accuracy reported by previous studies while training the model on only a fraction of the full SS dataset. The project was completed in the Spring 2019 term as the final project for CSCE 496/896: Introduction to Deep Learning at the University of Nebraska-Lincoln. The other team members were Cole Dempsey, Brian Chong, and Rahul Prajapati.
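The post does not give the exact layer configuration of the project's VGG-inspired model. The following minimal Python sketch therefore only illustrates the arithmetic behind VGG-style designs. It assumes the conventions of the original VGG networks: stacks of 3x3 convolutions with stride 1 and padding 1, which preserve spatial size, with each block followed by a 2x2 max-pool with stride 2, which halves it. `ConvBlock`, `vgg_feature_summary`, and `VGG16_BLOCKS` are hypothetical helpers written for illustration. They report the feature-map shape and the number of convolutional weights and biases after a sequence of blocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvBlock:
    """One VGG-style block: `num_convs` 3x3 convolutions, then a 2x2 max-pool (hypothetical helper)."""
    out_channels: int
    num_convs: int


def vgg_feature_summary(height, width, in_channels, blocks, kernel=3):
    """Return (out_height, out_width, out_channels, conv_params) for a VGG-style stack.

    Assumes stride-1, padding-1 3x3 convolutions (spatial size unchanged) and a
    stride-2 2x2 max-pool after every block (spatial size halved, rounded down).
    """
    params = 0
    channels = in_channels
    for block in blocks:
        for _ in range(block.num_convs):
            # weights (k * k * C_in * C_out) plus one bias per output channel
            params += kernel * kernel * channels * block.out_channels + block.out_channels
            channels = block.out_channels
        # max-pooling has no parameters; it only halves the spatial dimensions
        height //= 2
        width //= 2
    return height, width, channels, params


# Convolutional part of the published VGG16 configuration, used here as a reference point.
VGG16_BLOCKS = [
    ConvBlock(64, 2),
    ConvBlock(128, 2),
    ConvBlock(256, 3),
    ConvBlock(512, 3),
    ConvBlock(512, 3),
]

if __name__ == "__main__":
    print(vgg_feature_summary(224, 224, 3, VGG16_BLOCKS))
```

As a sanity check, the VGG16 configuration on a 224x224 RGB input yields a 7x7x512 feature map and 14,714,688 convolutional parameters. A smaller VGG-inspired variant can be compared by passing fewer or narrower blocks.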

Because training the convolutional model on the full dataset was not feasible within the time allotted for this project, several experiments were conducted on fractions of the dataset to test the model's effectiveness under a variety of training conditions. For instance, this project focused on classifying images already known to contain an animal. Removing the images that human volunteers had labeled empty before training reduced the dataset by nearly 75 percent, to a more manageable size. A further challenge was the imbalanced number of images across species. To address this, the number of images for each species was reduced to the number available for the least-represented species. Several classification experiments were then conducted to evaluate the model's performance. The model achieved high accuracy during training but performed poorly on the validation and test sets, a pattern typical of overfitting. This was likely a result of reducing the number of images per species, which limited the number of training instances.
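The two preprocessing steps described above can be sketched in plain Python: first dropping images that volunteers labeled empty, then downsampling every species to the size of the least-represented one. This is a minimal illustration, not the project's code. It assumes the labels are available as hypothetical `(image_path, label)` pairs with the string `"empty"` marking animal-free images. It uses random sampling without replacement, with a fixed seed for reproducibility, to choose which images to keep. The function name `filter_and_balance` is likewise hypothetical.

```python
import random
from collections import defaultdict


def filter_and_balance(records, empty_label="empty", seed=0):
    """Drop empty images, then undersample every species to the smallest class size.

    `records` is an iterable of (image_path, label) pairs (assumed format).
    Returns a list of (image_path, label) pairs with equal counts per species.
    """
    by_species = defaultdict(list)
    for path, label in records:
        if label == empty_label:
            continue  # skip images that volunteers marked as containing no animal
        by_species[label].append(path)

    if not by_species:
        return []

    # every species is cut down to the count of the least-represented species
    target = min(len(paths) for paths in by_species.values())
    rng = random.Random(seed)

    balanced = []
    for label in sorted(by_species):  # sorted for a deterministic output order
        for path in rng.sample(by_species[label], target):
            balanced.append((path, label))
    return balanced


if __name__ == "__main__":
    # Toy, made-up records purely for illustration.
    toy = (
        [(f"empty_{i}.jpg", "empty") for i in range(6)]
        + [(f"zebra_{i}.jpg", "zebra") for i in range(5)]
        + [(f"gazelle_{i}.jpg", "gazelle") for i in range(2)]
    )
    print(filter_and_balance(toy))
```

Undersampling like this equalizes the classes but discards images from the well-represented species. That trade-off is consistent with the observation that limiting the number of training instances hurt validation and test performance.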

The full report, which details the approach taken for each of the above tasks and the justification for the decisions made in each phase, can be found below.