One of the challenges in evaluating multi-object video detection, tracking and classification systems is having publicly
available data sets with which to compare different systems. However, the measures of performance for tracking and
classification are different. Data sets that are suitable for evaluating tracking systems may not be appropriate for
classification. Tracking video data sets typically only have ground truth track IDs, while classification video data sets
only have ground truth class-label IDs. The former identifies the same object over multiple frames, while the latter
identifies the type of object in individual frames. This paper describes an extension of the ground truth meta-data for
the DARPA Neovision2 Tower data set that allows the evaluation of both tracking and classification. The ground truth data
sets presented in this paper contain unique object IDs across 5 different classes of object (Car, Bus, Truck, Person,
Cyclist) for 24 videos of 871 image frames each. In addition to the object IDs and class labels, the ground truth data also
contain the original bounding box coordinates together with new bounding boxes in instances where unannotated
objects were present. The unique IDs are maintained during occlusions between multiple objects or when objects re-enter
the field of view. The data set provides a solid foundation for evaluating multi-object tracking of different types of
objects, a straightforward comparison of tracking system performance using the standard Multi-Object Tracking (MOT)
framework, and an evaluation of classification performance using the Neovision2 metrics. These data are hosted
publicly.