Figure 1. Example of annotated image with one bumblebee

Insect benchmark datasets with time-lapse images, as described in the paper:

Bjerge K, Alison J, Dyrmann M, Frigaard C.E., Mann H. M. R., Høye T.T., Accurate detection and identification of insects from camera trap images with deep learning, bioRxiv, 2022

Train: train1201.zip
MD5 hash: 6831b05cab0988743a113819eb23be75
Zipped size: 8.8 GB
Unzipped size: 41.5 GB
Files: 43,683

Val: val1201.zip
MD5 hash: 88317db11fd10fab4976edb4d8d4a71f
Zipped size: 1.1 GB
Unzipped size: 5.24 GB
Files: 5,708

Test: test1201.zip
MD5 hash: d940cac65cf067a3baf356ecaa9944e3
Zipped size: 2.83 GB
Unzipped size: 12.9 GB
Files: 14,590
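
The MD5 hashes listed above can be used to check a download before unzipping. The following is a minimal sketch, assuming the archives sit in the current working directory under the names given above; the function names are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# MD5 hashes as listed above for the three archives.
EXPECTED_MD5 = {
    "train1201.zip": "6831b05cab0988743a113819eb23be75",
    "val1201.zip": "88317db11fd10fab4976edb4d8d4a71f",
    "test1201.zip": "d940cac65cf067a3baf356ecaa9944e3",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the hash listed for its name."""
    path = Path(path)
    return md5_of_file(path) == expected[path.name]


if __name__ == "__main__":
    # Assumption: archives were downloaded into the current directory.
    for name in EXPECTED_MD5:
        if Path(name).exists():
            print(name, "OK" if verify_archive(name) else "MISMATCH")
        else:
            print(name, "not found")
```

Reading in chunks keeps memory use low even for the 8.8 GB training archive.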

Labels in YOLO format: ultralytics/yolov5: label format
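
In the YOLO format, each line of a label file describes one box as "class x_center y_center width height", with the four coordinates normalized to the range 0 to 1 relative to the image size. The sketch below converts such lines to pixel coordinates. The image width and height are parameters supplied by the caller, and the names Box and parse_yolo_labels are illustrative only.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_labels(text: str, image_width: int, image_height: int) -> List[Box]:
    """Convert the text of a YOLO label file into pixel-coordinate boxes."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # blank line; an empty file yields an empty list
        class_id = int(parts[0])
        x_center, y_center, width, height = (float(v) for v in parts[1:5])
        boxes.append(
            Box(
                class_id,
                (x_center - width / 2) * image_width,
                (y_center - height / 2) * image_height,
                (x_center + width / 2) * image_width,
                (y_center + height / 2) * image_height,
            )
        )
    return boxes
```

An empty label file, as used for background images, yields an empty list.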

The annotated training and validation datasets contain insects of nine different species, as listed below:

0 Coccinellidae septempunctata
1 Apis mellifera
2 Bombus lapidarius
3 Bombus terrestris
4 Eupeodes corolla
5 Episyrphus balteatus
6 Aglais urticae
7 Vespula vulgaris
8 Eristalis tenax

The test dataset contains additional classes of insects.

9 Non-Bombus Anthophila
10 Bombus spp.
11 Syrphidae
12 Fly spp.
13 Unclear insect
14 Mixed animals:
——————————
Rhopalocera
Non-Anthophila Hymenoptera
Non-Syrphidae Diptera
Non-Coccinellidae Coleoptera
Coccinellidae
Other animals
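
The class indices above can be mapped to names when summarizing label files. This is a minimal sketch: the names are copied from the two lists above, while the helper names (is_test_only, count_classes) are illustrative and not part of the dataset.

```python
from collections import Counter

# Index in this list equals the class Id used in the label files.
CLASS_NAMES = [
    "Coccinellidae septempunctata",
    "Apis mellifera",
    "Bombus lapidarius",
    "Bombus terrestris",
    "Eupeodes corolla",
    "Episyrphus balteatus",
    "Aglais urticae",
    "Vespula vulgaris",
    "Eristalis tenax",
    # Classes 9 to 14 appear only in the test dataset.
    "Non-Bombus Anthophila",
    "Bombus spp.",
    "Syrphidae",
    "Fly spp.",
    "Unclear insect",
    "Mixed animals",
]
NUM_TRAIN_VAL_CLASSES = 9


def is_test_only(class_id):
    """Return True for classes that occur only in the test dataset."""
    return class_id >= NUM_TRAIN_VAL_CLASSES


def count_classes(label_lines):
    """Count annotations per class name from an iterable of YOLO label lines."""
    counts = Counter()
    for line in label_lines:
        parts = line.split()
        if parts:
            counts[CLASS_NAMES[int(parts[0])]] += 1
    return counts
```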

There are two naming conventions for image (.jpg) and label (.txt) files.

Background images without insects are named:
X_Seq-YYYYMMDDHHMMSS-snapshot
E.g.:
Background image: 12_13-20190704172200-snapshot.jpg
Empty label file: 12_13-20190704172200-snapshot.txt

Images annotated with insects are named:
SZ_IP-MonthDate_C_Seq-YYYYMMDDHHMMSS
E.g.:
Image file: S1_146-Aug23_1_156-20190822133230.jpg
Label file: S1_146-Aug23_1_156-20190822133230.txt

Abbreviations:

YYYYMMDDHHMMSS – Capture timestamp with year, month, day, hour, minute, and second
Seq – Sequence number created by the motion program to separate images
C – Camera Id (0 or 1) within the system identified by SZ_IP
MonthDate – Name of the folder where the original images were stored in the system
SZ_IP – Identification of five camera systems: S1_123, S2_146, S3_194, S4_199, S5_187 (Two cameras in each system)
X – An index number related to a specific camera and folder ensuring unique file names of background images from different camera systems.

The important information in a filename is system (SZ_IP), camera Id (C) and timestamp (YYYYMMDDHHMMSS).
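
The two naming conventions can be parsed with regular expressions. The sketch below is based only on the two example filenames above. The patterns are assumptions, in particular that SZ_IP has the form S<digit>_<number> and that MonthDate contains no underscore or hyphen. The function name parse_image_name is illustrative.

```python
import re
from datetime import datetime
from pathlib import Path

# Background images: X_Seq-YYYYMMDDHHMMSS-snapshot
BACKGROUND_RE = re.compile(
    r"^(?P<index>\d+)_(?P<seq>\d+)-(?P<timestamp>\d{14})-snapshot$"
)
# Annotated images: SZ_IP-MonthDate_C_Seq-YYYYMMDDHHMMSS
ANNOTATED_RE = re.compile(
    r"^(?P<system>S\d+_\d+)-(?P<folder>[^_-]+)_(?P<camera>[01])_(?P<seq>\d+)"
    r"-(?P<timestamp>\d{14})$"
)


def parse_image_name(file_name):
    """Split an image or label file name into its documented fields."""
    stem = Path(file_name).stem  # drop the .jpg or .txt extension
    match = ANNOTATED_RE.match(stem)
    background = False
    if match is None:
        match = BACKGROUND_RE.match(stem)
        background = True
    if match is None:
        raise ValueError(f"Unrecognized file name: {file_name}")
    fields = match.groupdict()
    return {
        "background": background,
        "system": fields.get("system"),  # None for background images
        "folder": fields.get("folder"),  # None for background images
        "camera": int(fields["camera"]) if "camera" in fields else None,
        "index": int(fields["index"]) if "index" in fields else None,
        "seq": int(fields["seq"]),
        "timestamp": datetime.strptime(fields["timestamp"], "%Y%m%d%H%M%S"),
    }
```

For background images the system and camera fields are returned as None, since that naming convention encodes the camera only indirectly through the index X.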

The three best YOLOv5 models from the paper are available in PyTorch format.

All models are tested with YOLOv5 release v7.0 (22-11-2022) from ultralytics/yolov5.

Download the YOLOv5models.zip containing the files listed below.
MD5 hash: bc2194e94bfbe0ba93e4a66df6eb6f1b
Zipped size: 489 MB
Unzipped size: 528 MB

insect1201-bestF1-640v5m.pt: Model no. 6 in Table 2 (F1=0.912)
insect1201-bestF1-1280v5m6.pt: Model no. 8 in Table 2 (F1=0.925)
insect1201-bestF1-1280v5m6.pt: Model no. 10 in Table 2 (F1=0.932)

insects-1201val.yaml: YAML file with label names to train YOLOv5

trainInsects-1201m.sh: Linux bash shell script with parameters to train YOLOv5m6
valInsectsF1-1201.sh: Linux bash shell script with parameters to validate models
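
As a rough illustration of how a validation run might be invoked, the sketch below builds a command line for the val.py script in the YOLOv5 repository, using its --weights, --data and --imgsz options. It is not a copy of valInsectsF1-1201.sh, whose parameters may differ. Reading the image size (640 or 1280) from the weights file name is an assumption, and the function names are illustrative.

```python
import re
import shlex


def image_size_from_weights(weights_name):
    """Guess the input size from the '-<size>v5' part of a weights file name.

    Assumption: the number before 'v5' (e.g. 640 or 1280) is the image size.
    """
    match = re.search(r"-(\d+)v5", weights_name)
    if match is None:
        raise ValueError(f"No image size found in: {weights_name}")
    return int(match.group(1))


def build_val_command(weights_name, data_yaml="insects-1201val.yaml"):
    """Return an argument list for running YOLOv5 val.py on the given weights."""
    return [
        "python",
        "val.py",
        "--weights",
        weights_name,
        "--data",
        data_yaml,
        "--imgsz",
        str(image_size_from_weights(weights_name)),
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before running it
    # from the root of a YOLOv5 checkout.
    print(shlex.join(build_val_command("insect1201-bestF1-640v5m.pt")))
```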

Copyright © 2018 AU Signal Processing Group