UniTrack: Differentiable Graph Representation Learning for Multi-Object Tracking

Bishoy Galoaa∗, Xiangyu Bai∗, Utsav Nandi,
Sai Siddhartha Vivek Dhir Rangoju, Somaieh Amraee, Sarah Ostadabbas

Northeastern University

ICLR 2026 Accepted

Abstract

Multi-object tracking (MOT) remains a challenging problem in computer vision, particularly in scenarios involving occlusions, crowded scenes, and identity switches. We introduce UniTrack, a novel approach that enhances MOT algorithms through a specialized graph-based loss.

Our method addresses critical limitations in existing tracking systems by incorporating three key components: tracking score optimization for improved object-track association, spatial consistency constraints to maintain object properties across frames, and temporal consistency enforcement to ensure smooth trajectory generation.

Extensive experiments on MOT17, MOT20, SportsMOT and DanceTrack datasets demonstrate that UniTrack achieves superior tracking consistency and robustness compared to baseline methods, particularly excelling in challenging scenarios with frequent identity switches and occlusions.

UniTrack Visual Examples

Visual comparison showing UniTrack's superior identity preservation and tracking consistency compared to baseline Trackformer in challenging scenarios.

Method

UniTrack Method Overview

UniTrack framework overview showing the integration of the UniTrack loss components

UniTrack Detailed Architecture

Detailed architecture showing temporal association graphs and loss computation

UniTrack adds a graph-based loss that can be applied to any MOT algorithm, whether end-to-end or tracking-by-detection. The method consists of three key components:

  • Tracking Score Loss: Optimizes detection confidence for better object identification and association accuracy
  • Spatial Consistency Loss: Enforces spatial constraints between related objects to maintain consistent object properties
  • Temporal Consistency Loss: Ensures smooth trajectories across time frames by penalizing abrupt motion changes
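To make the idea of "penalizing abrupt motion changes" concrete, here is a minimal sketch of one possible temporal smoothness penalty. This page does not give the exact form of UniTrack's temporal term, so everything here is an assumption for illustration: the function name `temporal_consistency_penalty`, the use of box centers, and the choice of a mean squared second difference (a discrete acceleration) are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) box center of one track in one frame


def temporal_consistency_penalty(centers: List[Point]) -> float:
    """Mean squared second difference of a single track's box centers.

    Illustrative assumption only: this is NOT taken from the UniTrack paper.
    A track moving at constant velocity scores 0; sudden jumps score high.
    """
    if len(centers) < 3:
        return 0.0  # three frames are needed to measure a change in velocity
    total = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(centers, centers[1:], centers[2:]):
        ax = x2 - 2.0 * x1 + x0  # discrete acceleration along x
        ay = y2 - 2.0 * y1 + y0  # discrete acceleration along y
        total += ax * ax + ay * ay
    return total / (len(centers) - 2)


smooth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
jumpy = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
print(temporal_consistency_penalty(smooth))  # constant velocity
print(temporal_consistency_penalty(jumpy))   # one abrupt jump
```

In a training setup the same expression would be written with a differentiable tensor library so that gradients can flow back to the predicted boxes; plain Python is used here only to keep the sketch self-contained.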

The unified loss function combines these components with learnable weights:

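$$L_{\mathrm{UniTrack}} = \lambda_{\mathrm{track}} L_{\mathrm{track}} + \lambda_{\mathrm{spatial}} L_{\mathrm{spatial}} + \lambda_{\mathrm{temporal}} L_{\mathrm{temporal}}$$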
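The weighted sum above can be sketched in a few lines. The page only says that the weights are learnable, so how they are parameterized is an assumption here: this sketch stores an unconstrained raw value per term and passes it through a softplus to keep each weight positive. The names `unitrack_loss`, `softplus`, and `raw_weights` are hypothetical and are not taken from the authors' code.

```python
import math
from typing import Dict


def softplus(x: float) -> float:
    """Numerically stable softplus: maps any real number to a positive weight."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def unitrack_loss(losses: Dict[str, float], raw_weights: Dict[str, float]) -> float:
    """Weighted sum of the three loss terms.

    Assumption: each weight is softplus(raw parameter). The source does not
    state how the learnable weights are constrained or initialized.
    """
    return sum(softplus(raw_weights[name]) * value for name, value in losses.items())


# Hypothetical per-batch values for the three terms.
losses = {"track": 1.0, "spatial": 2.0, "temporal": 3.0}
raw_weights = {"track": 0.0, "spatial": 0.0, "temporal": 0.0}
total = unitrack_loss(losses, raw_weights)
print(total)
```

In practice the raw weights would be registered as trainable parameters and updated by the same optimizer as the tracker, which is what makes the weighting learnable rather than hand-tuned.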

This approach addresses critical challenges in multi-object tracking by enhancing tracking consistency and identity preservation through tracking-specific constraints.

Results

Method                  MOTA ↑   IDF1 ↑   HOTA ↑   IDSW ↓
FairMOT                 61.7     61.5     52.9     388
UT-FairMOT (Ours)       64.5     64.2     55.3     482
MOTR                    62.1     61.3     53.2     289
UT-MOTR (Ours)          64.8     63.9     55.7     356
Trackformer             62.3     57.6     52.8     643
UT-Trackformer (Ours)   65.9     66.4     56.2     705
ByteTrack               80.3     77.3     63.1     2196
UT-ByteTrack (Ours)     82.1     79.8     65.4     1865
GTR                     75.3     71.5     59.1     1445
UT-GTR (Ours)           79.1     74.8     67.9     951
MOTE                    82.0     80.3     66.3     620
UT-MOTE (Ours)          84.5     83.5     68.2     542

Method                  MOTA ↑   IDF1 ↑   HOTA ↑   IDSW ↓
FairMOT                 53.5     58.3     52.4     488
UT-FairMOT (Ours)       55.2     61.5     55.8     402
MOTR                    53.2     57.9     51.8     389
UT-MOTR (Ours)          55.8     60.4     54.2     356
Trackformer             54.1     56.2     50.9     643
UT-Trackformer (Ours)   56.2     64.1     57.7     314
ByteTrack               77.8     75.2     61.3     1223
UT-ByteTrack (Ours)     79.5     77.8     63.7     1045
GTR                     63.6     52.3     42.6     8604
UT-GTR (Ours)           63.8     52.5     43.0     8570
MOTE                    81.7     79.8     65.8     685
UT-MOTE (Ours)          83.2     81.4     67.1     578

Method                  MOTA ↑   IDF1 ↑   HOTA ↑   IDSW ↓
FairMOT                 90.8     53.5     49.3     2845
UT-FairMOT (Ours)       92.5     56.2     52.1     2234
Trackformer             88.1     50.0     60.0     4250
UT-Trackformer (Ours)   90.3     51.5     60.8     3264
MOTR                    76.2     58.4     55.8     2890
UT-MOTR (Ours)          79.5     62.1     58.4     2156
ByteTrack               94.1     69.8     62.8     3267
UT-ByteTrack (Ours)     96.2     71.1     64.3     2234
GTR                     74.8     61.3     54.4     2364
UT-GTR (Ours)           84.5     73.6     66.1     1092
MOTE                    93.8     68.2     61.5     2987
UT-MOTE (Ours)          95.1     70.5     63.2     2456

Method                  MOTA ↑   IDF1 ↑   HOTA ↑   IDSW ↓
FairMOT                 82.2     40.8     39.7     2987
UT-FairMOT (Ours)       84.8     43.5     42.3     2456
Trackformer             48.2     12.8     19.4     37800
UT-Trackformer (Ours)   50.4     13.6     20.5     35876
MOTR                    79.7     51.5     54.2     4567
UT-MOTR (Ours)          82.1     54.8     57.3     3892
ByteTrack               88.2     51.9     47.1     3456
UT-ByteTrack (Ours)     91.3     56.5     49.1     2134
GTR                     80.6     45.9     43.7     4338
UT-GTR (Ours)           82.6     48.5     50.2     3456
MOTE                    87.4     53.2     46.8     3124
UT-MOTE (Ours)          89.8     56.1     48.9     2567
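
Each baseline is followed by its UniTrack-trained variant, so the effect of the loss is easiest to read as a per-metric difference. The short script below computes those differences for the table directly above; the numbers are copied from that table, while the helper name `deltas` and the data layout are illustrative choices rather than part of the authors' evaluation code.

```python
from typing import Dict, Tuple

Row = Tuple[float, float, float, int]  # (MOTA, IDF1, HOTA, IDSW)

# Values copied from the table directly above.
baseline: Dict[str, Row] = {
    "FairMOT": (82.2, 40.8, 39.7, 2987),
    "Trackformer": (48.2, 12.8, 19.4, 37800),
    "MOTR": (79.7, 51.5, 54.2, 4567),
    "ByteTrack": (88.2, 51.9, 47.1, 3456),
    "GTR": (80.6, 45.9, 43.7, 4338),
    "MOTE": (87.4, 53.2, 46.8, 3124),
}
with_unitrack: Dict[str, Row] = {
    "FairMOT": (84.8, 43.5, 42.3, 2456),
    "Trackformer": (50.4, 13.6, 20.5, 35876),
    "MOTR": (82.1, 54.8, 57.3, 3892),
    "ByteTrack": (91.3, 56.5, 49.1, 2134),
    "GTR": (82.6, 48.5, 50.2, 3456),
    "MOTE": (89.8, 56.1, 48.9, 2567),
}


def deltas(base: Row, ut: Row) -> Row:
    """UniTrack variant minus baseline; higher is better except for IDSW."""
    return (
        round(ut[0] - base[0], 1),
        round(ut[1] - base[1], 1),
        round(ut[2] - base[2], 1),
        ut[3] - base[3],
    )


for name in baseline:
    d = deltas(baseline[name], with_unitrack[name])
    print(f"{name:12s} MOTA {d[0]:+.1f}  IDF1 {d[1]:+.1f}  HOTA {d[2]:+.1f}  IDSW {d[3]:+d}")
```

A negative IDSW difference means fewer identity switches, which is the direction the ↓ in the column header marks as better.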

Interactive Video Comparisons

Error Visualization Controls

False Positives: Spurious detections by baseline
False Negatives: Missed detections recovered by UniTrack
Identity Switches: Tracking inconsistencies

Side-by-side comparison of UniTrack against the baseline on the GTR (Global Tracking Transformers) algorithm, with baseline GTR on the left and UT-GTR on the right. Select different sequences and toggle the error visualizations to see specific improvements.
