metrics

The table below presents the most comprehensive benchmarks we have released to date on PRCV's people-tracking capabilities, compared against other published results. Overall, the system outperforms the other trackers listed on most of these metrics, and it is a general solution. These benchmarks do not cover our other capabilities, such as gaze tracking and behavior recognition. Bolded scores are the best in their categories. The meaning of each metric is explained below the table.

Tracker	MOTA (%)	MOTP	MT (%)	PT (%)	ML (%)	#IDSW	#FRAG	Notes
multiview	47.44	0.166 m	51.22	31.71	17.07	10	78	Lower resolution images. Fast runtime. Real-world data.
SORT	33.4	72.1	11.7	57.4	30.9	1001	1664	High resolution images. 2D projective. MOT database
Xu 2016	30.6	72.02	23.19	65.79	11.01	299	200	High resolution images. 2D projective. CAMPUS
Xu 2017	35.64	72.38	25.89	61.59	12.52	265	219	High resolution images. 2D projective. CAMPUS
Tang 2017	46.8	79	18.2	41.7	40.1	481	595	MOT16 test set

Table 3: Comparison of Perceive to other top tracking algorithms. The metrics are:

MOTA: Accuracy. Higher scores are better. MOTA later increased as a localization issue was addressed.
MOTP: Precision. When tracking someone, the tracker is off by about 17 cm on average (0.166 m). This is not directly comparable to the image-space (pixel-based) values that other systems report. For this distance-based value, lower scores are better.
MT: Mostly tracked tracks. Percentage of ground-truth trajectories that are at least 80% covered by the tracker. Higher is better.
PT: Partly tracked tracks. PT = 100% - MT - ML.
ML: Mostly lost tracks. Percentage of ground-truth trajectories that are at most 20% covered by the tracker. Lower is better.
IDSW: Identity switches. The total number of times that a trajectory switches its ground-truth identity, for example when two people cross in the video and the track jumps from one person to the other. Lower is better.
FRAG: Track fragments. A person is tracked for a while, then the track is lost and later re-acquired, resulting in a "fragmentation" of the track. Lower is better.
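
The trajectory-level metrics above can be sketched in a few lines of Python. This is only an illustration, not PRCV's evaluation code: the function names and input formats are hypothetical, and the 20% cut-off for "mostly lost" is assumed from the common benchmark convention.

```python
def classify_trajectories(coverages, mt_threshold=0.8, ml_threshold=0.2):
    """Return MT, PT and ML percentages for a list of coverage fractions.

    Each entry is the fraction (0.0 to 1.0) of one ground-truth trajectory
    that the tracker covered. The 0.2 ML cut-off is an assumption.
    """
    if not coverages:
        raise ValueError("need at least one ground-truth trajectory")
    total = len(coverages)
    mt_pct = 100.0 * sum(1 for c in coverages if c >= mt_threshold) / total
    ml_pct = 100.0 * sum(1 for c in coverages if c <= ml_threshold) / total
    # PT is whatever is neither mostly tracked nor mostly lost.
    return {"MT": mt_pct, "PT": 100.0 - mt_pct - ml_pct, "ML": ml_pct}


def count_fragments(tracked_flags):
    """Count how often a track is lost and later re-acquired.

    tracked_flags holds one boolean per frame for a single ground-truth
    person: True if the person was tracked in that frame.
    """
    runs = 0
    previous = False
    for tracked in tracked_flags:
        if tracked and not previous:
            runs += 1  # a new tracked segment starts here
        previous = tracked
    # The first segment is the initial acquisition, not a fragmentation.
    return max(runs - 1, 0)


def count_id_switches(track_ids):
    """Count changes of the tracker ID assigned to one ground-truth person.

    track_ids holds one entry per frame; None means the person was not
    matched to any track in that frame and is skipped.
    """
    switches = 0
    last_id = None
    for track_id in track_ids:
        if track_id is None:
            continue
        if last_id is not None and track_id != last_id:
            switches += 1
        last_id = track_id
    return switches
```

Summing `count_fragments` and `count_id_switches` over all ground-truth people would give totals comparable in spirit to the #FRAG and #IDSW columns.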

Measuring precision and recall in the classic sense is more complicated with a tracker. The following four metrics explain how this is measured in more detail.

MOTP: Multiple object tracking precision. The ability to measure precise object positions: the total error in track-point positions divided by the total number of matches. In meters.
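
As a rough sketch of the MOTP calculation described above, assuming each match is a pair of ground-truth and tracked positions expressed in meters on a common ground plane (the function name and input format are hypothetical):

```python
import math


def motp(matches):
    """Average position error, in meters, over all matched track points.

    matches is an iterable of (ground_truth_xy, tracked_xy) pairs, where
    each position is an (x, y) tuple in meters (an assumed input format).
    """
    matches = list(matches)
    if not matches:
        raise ValueError("need at least one match")
    total_error = sum(math.dist(gt, tracked) for gt, tracked in matches)
    return total_error / len(matches)
```

For example, one match that is 0.5 m off and one that is exact would give a MOTP of 0.25 m.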

MOTA: Multiple object tracking accuracy. The first term is the ratio of misses (false negatives) computed over every track. The second term is the ratio of false positives. The third term is the ratio of mismatches (i.e., ID switches, see IDSW above). These three terms together give the total error rate, and MOTA is one minus that rate.
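
A minimal sketch of how the three error terms combine, assuming each is normalized by the total number of ground-truth objects summed over all frames (the usual convention, but an assumption here, as are the function and parameter names):

```python
def mota(misses, false_positives, mismatches, num_ground_truth):
    """Return 1 minus the total error rate.

    All counts are totals over the whole sequence. num_ground_truth is the
    number of ground-truth objects summed over every frame (assumed
    normalization).
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    error_rate = (misses + false_positives + mismatches) / num_ground_truth
    return 1.0 - error_rate
```

Because the error counts can exceed the number of ground-truth objects, the score can be negative for a very poor tracker.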

MODP: Multiple object detection precision. The average IoU (i.e., intersection area divided by union area) over every tracked point. Note that a bias of about 20 cm consistently crops up, presumably because of an integer math bug in the system. Fixing this bug should show up as better MODP results. Note also that MODP is much more challenging to compute in a 3D environment, so comparing our MODP results to those in most papers is comparing apples to oranges.
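
The IoU averaging can be illustrated with axis-aligned 2D boxes. This is a simplified sketch: the box format `(x1, y1, x2, y2)` and the function names are assumptions, and a 3D evaluation such as ours would need a different overlap computation.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero if disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def modp(matched_boxes):
    """Average IoU over (ground_truth_box, tracked_box) pairs."""
    matched_boxes = list(matched_boxes)
    if not matched_boxes:
        raise ValueError("need at least one matched pair")
    return sum(iou(gt, tracked) for gt, tracked in matched_boxes) / len(matched_boxes)
```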

MODA: Multiple object detection accuracy. Same as MOTA, but mismatches (ID switches) are not counted.
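
Written side by side, the tracking and detection accuracy scores differ only in the mismatch term. The notation here is an assumption following the usual convention: FN_t, FP_t and IDSW_t are the misses, false positives and ID switches in frame t, and GT_t is the number of ground-truth objects in that frame.

```latex
\mathrm{MOTA} = 1 - \frac{\sum_t \left( \mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t \right)}{\sum_t \mathrm{GT}_t},
\qquad
\mathrm{MODA} = 1 - \frac{\sum_t \left( \mathrm{FN}_t + \mathrm{FP}_t \right)}{\sum_t \mathrm{GT}_t}
```

Since MODA leaves out a non-negative error term, it is never lower than MOTA on the same sequence.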

These results show that we have achieved our long-sought goal of building a tracker that works on almost any kind of input video with many different tracker configurations. These metrics form the basis of the argument that the software solution from PRCV is superior even to some hardware-assisted solutions on the market today.