Multi-object tracking (MOT) remains a challenging problem in computer vision, particularly in scenarios involving occlusions, crowded scenes, and identity switches. We introduce UniTrack, a novel approach that enhances MOT algorithms through a specialized graph-based loss.
Our method addresses critical limitations in existing tracking systems by incorporating three key components: tracking score optimization for improved object-track association, spatial consistency constraints to maintain object properties across frames, and temporal consistency enforcement to ensure smooth trajectory generation.
Extensive experiments on MOT17, MOT20, SportsMOT and DanceTrack datasets demonstrate that UniTrack achieves superior tracking consistency and robustness compared to baseline methods, particularly excelling in challenging scenarios with frequent identity switches and occlusions.
Visual comparison showing UniTrack's superior identity preservation and tracking consistency compared to baseline Trackformer in challenging scenarios.
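UniTrack adds a graph-based loss to any MOT algorithm, whether end-to-end or tracking-by-detection. The method consists of three key components: tracking score optimization for object-track association, spatial consistency constraints that maintain object properties across frames, and temporal consistency enforcement for smooth trajectories.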
The unified loss function combines these components with learnable weights:
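$$\mathcal{L}_{\text{UniTrack}} = \lambda_{\text{track}}\,\mathcal{L}_{\text{track}} + \lambda_{\text{spatial}}\,\mathcal{L}_{\text{spatial}} + \lambda_{\text{temporal}}\,\mathcal{L}_{\text{temporal}}$$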
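As a rough illustration of how such a weighted sum might be wired up, the sketch below combines three already-computed scalar loss terms. The source states only that the weights are learnable; the softplus reparameterization (used here to keep each weight positive) and the names `LossWeights`, `raw_track`, `raw_spatial`, `raw_temporal`, and `unitrack_loss` are assumptions made for illustration, not the authors' implementation.

```python
import math
from dataclasses import dataclass


def softplus(x: float) -> float:
    """Map any real number to a strictly positive one."""
    return math.log1p(math.exp(x))


@dataclass
class LossWeights:
    """Unconstrained parameters an optimizer would update (hypothetical names)."""
    raw_track: float = 0.0
    raw_spatial: float = 0.0
    raw_temporal: float = 0.0

    def lambdas(self):
        # Assumption: softplus keeps each lambda positive; the source does not
        # say how the learnable weights are parameterized.
        return (softplus(self.raw_track),
                softplus(self.raw_spatial),
                softplus(self.raw_temporal))


def unitrack_loss(l_track, l_spatial, l_temporal, weights):
    """Weighted sum of the three loss terms, mirroring the formula above."""
    lam_track, lam_spatial, lam_temporal = weights.lambdas()
    return (lam_track * l_track
            + lam_spatial * l_spatial
            + lam_temporal * l_temporal)


if __name__ == "__main__":
    # Toy scalar values standing in for the three per-batch loss terms.
    print(unitrack_loss(0.8, 0.3, 0.1, LossWeights()))
```

In a real training setup the three raw parameters would be registered with the optimizer alongside the tracker's own parameters. How the weights are kept from collapsing toward zero is not described in the source, so that part is left out of the sketch.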
This approach addresses critical challenges in multi-object tracking by enhancing tracking consistency and identity preservation through tracking-specific constraints.
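To give a feel for what a temporal consistency term can look like, the sketch below penalizes abrupt changes in a single track's motion by averaging the squared second differences of its box centers. This is one common way to encourage smooth trajectories and is purely an assumption here: the source does not define the temporal term, and the function name `temporal_smoothness` and its input layout are hypothetical.

```python
def temporal_smoothness(centers):
    """Mean squared second difference of one track's box centers.

    centers: list of (x, y) tuples, one per consecutive frame, for ONE track.
    Returns 0.0 for constant-velocity motion and grows with sudden jumps.
    Illustrative only; not the temporal term defined by the source.
    """
    if len(centers) < 3:
        return 0.0  # acceleration needs at least three frames
    total = 0.0
    triples = zip(centers, centers[1:], centers[2:])
    for (x0, y0), (x1, y1), (x2, y2) in triples:
        ax = x2 - 2 * x1 + x0  # discrete acceleration in x
        ay = y2 - 2 * y1 + y0  # discrete acceleration in y
        total += ax * ax + ay * ay
    return total / (len(centers) - 2)


if __name__ == "__main__":
    smooth = [(0, 0), (1, 1), (2, 2), (3, 3)]  # constant velocity
    jumpy = [(0, 0), (1, 0), (5, 0)]           # sudden jump in x
    print(temporal_smoothness(smooth), temporal_smoothness(jumpy))
```

A penalty of this kind is zero for a track moving at constant velocity, so it discourages jitter and sudden position swaps without penalizing ordinary motion.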
Results on MOT17:

| Method | MOTA ↑ | IDF1 ↑ | HOTA ↑ | IDSW ↓ |
|---|---|---|---|---|
| FairMOT | 61.7 | 61.5 | 52.9 | 388 |
| UT-FairMOT (Ours) | 64.5 | 64.2 | 55.3 | 482 |
| MOTR | 62.1 | 61.3 | 53.2 | 289 |
| UT-MOTR (Ours) | 64.8 | 63.9 | 55.7 | 356 |
| Trackformer | 62.3 | 57.6 | 52.8 | 643 |
| UT-Trackformer (Ours) | 65.9 | 66.4 | 56.2 | 705 |
| ByteTrack | 80.3 | 77.3 | 63.1 | 2196 |
| UT-ByteTrack (Ours) | 82.1 | 79.8 | 65.4 | 1865 |
| GTR | 75.3 | 71.5 | 59.1 | 1445 |
| UT-GTR (Ours) | 79.1 | 74.8 | 67.9 | 951 |
| MOTE | 82.0 | 80.3 | 66.3 | 620 |
| UT-MOTE (Ours) | 84.5 | 83.5 | 68.2 | 542 |

Results on MOT20:

| Method | MOTA ↑ | IDF1 ↑ | HOTA ↑ | IDSW ↓ |
|---|---|---|---|---|
| FairMOT | 53.5 | 58.3 | 52.4 | 488 |
| UT-FairMOT (Ours) | 55.2 | 61.5 | 55.8 | 402 |
| MOTR | 53.2 | 57.9 | 51.8 | 389 |
| UT-MOTR (Ours) | 55.8 | 60.4 | 54.2 | 356 |
| Trackformer | 54.1 | 56.2 | 50.9 | 643 |
| UT-Trackformer (Ours) | 56.2 | 64.1 | 57.7 | 314 |
| ByteTrack | 77.8 | 75.2 | 61.3 | 1223 |
| UT-ByteTrack (Ours) | 79.5 | 77.8 | 63.7 | 1045 |
| GTR | 63.6 | 52.3 | 42.6 | 8604 |
| UT-GTR (Ours) | 63.8 | 52.5 | 43.0 | 8570 |
| MOTE | 81.7 | 79.8 | 65.8 | 685 |
| UT-MOTE (Ours) | 83.2 | 81.4 | 67.1 | 578 |

Results on SportsMOT:

| Method | MOTA ↑ | IDF1 ↑ | HOTA ↑ | IDSW ↓ |
|---|---|---|---|---|
| FairMOT | 90.8 | 53.5 | 49.3 | 2845 |
| UT-FairMOT (Ours) | 92.5 | 56.2 | 52.1 | 2234 |
| Trackformer | 88.1 | 50.0 | 60.0 | 4250 |
| UT-Trackformer (Ours) | 90.3 | 51.5 | 60.8 | 3264 |
| MOTR | 76.2 | 58.4 | 55.8 | 2890 |
| UT-MOTR (Ours) | 79.5 | 62.1 | 58.4 | 2156 |
| ByteTrack | 94.1 | 69.8 | 62.8 | 3267 |
| UT-ByteTrack (Ours) | 96.2 | 71.1 | 64.3 | 2234 |
| GTR | 74.8 | 61.3 | 54.4 | 2364 |
| UT-GTR (Ours) | 84.5 | 73.6 | 66.1 | 1092 |
| MOTE | 93.8 | 68.2 | 61.5 | 2987 |
| UT-MOTE (Ours) | 95.1 | 70.5 | 63.2 | 2456 |

Results on DanceTrack:

| Method | MOTA ↑ | IDF1 ↑ | HOTA ↑ | IDSW ↓ |
|---|---|---|---|---|
| FairMOT | 82.2 | 40.8 | 39.7 | 2987 |
| UT-FairMOT (Ours) | 84.8 | 43.5 | 42.3 | 2456 |
| Trackformer | 48.2 | 12.8 | 19.4 | 37800 |
| UT-Trackformer (Ours) | 50.4 | 13.6 | 20.5 | 35876 |
| MOTR | 79.7 | 51.5 | 54.2 | 4567 |
| UT-MOTR (Ours) | 82.1 | 54.8 | 57.3 | 3892 |
| ByteTrack | 88.2 | 51.9 | 47.1 | 3456 |
| UT-ByteTrack (Ours) | 91.3 | 56.5 | 49.1 | 2134 |
| GTR | 80.6 | 45.9 | 43.7 | 4338 |
| UT-GTR (Ours) | 82.6 | 48.5 | 50.2 | 3456 |
| MOTE | 87.4 | 53.2 | 46.8 | 3124 |
| UT-MOTE (Ours) | 89.8 | 56.1 | 48.9 | 2567 |
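
For readers less familiar with the table columns, the sketch below computes two of the reported metrics from raw counts using their standard definitions: MOTA from the CLEAR MOT metrics and IDF1 from the identity metrics. The counts in the example are made up for illustration and are not taken from any experiment in the source. HOTA is omitted because it requires per-threshold association matching.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, summed over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


if __name__ == "__main__":
    # Made-up counts, chosen only to show the arithmetic.
    print(100 * mota(false_negatives=100, false_positives=50,
                     id_switches=10, num_gt=1000))
    print(100 * idf1(idtp=80, idfp=20, idfn=20))
```

The tables report MOTA and IDF1 as percentages, hence the factor of 100. IDSW is the raw count of identity switches, which also enters MOTA as an error term, so the two columns are not independent.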
Side-by-side comparison of UniTrack against the baseline on the GTR (Global Tracking Transformers) algorithm, with baseline GTR on the left and UT-GTR on the right.