
UMBC data science MPS capstone, Fall 2023
| | Google Street View | Objects365 | Mapillary Vistas |
|---|---|---|---|
| Size (train, val, test) | 473 / 119 / 79 | 393 / 107 / 54 | 3,202 / 929 / 484 |
| Setting | Street | Outdoors & indoors | Street |
| Used for | Detection & classification | Detection | Detection |
| Release | Possibly a TOS violation | Released for research | Released for research |
| Source | Sheng, Yao, and Goel (2021) | Shao et al. (2019) | Neuhold et al. (2017) |

| Tool | Purpose |
|---|---|
| Ultralytics YOLOv8 | Models with built-in modules for training, tuning, & validation |
| PyTorch | Underlies Ultralytics models |
| Roboflow | Dataset creation & management |
| Weights & Biases | Experiment tracking |
| Paperspace | Virtual machine (8 CPUs, 16 GB GPU) |

| YOLO | RT-DETR |
|---|---|
| Latest-generation YOLO model | Transformer-based model from Baidu |
| Detection & classification (& others) | Detection only |
| Smaller architecture (medium has 26M params) | Larger architecture (large has 33M params) |
| Trains very quickly & can train small models on laptop | Trains slowly & needs more GPU RAM |
| Doesn’t perform as well | Performs better |
| Well-documented & integrated | New, not fully integrated into the ecosystem (e.g. no tune method) |

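
To make the comparison concrete, here is a minimal sketch of training both model families through the Ultralytics Python API. It is not the project's actual training script: the checkpoint names `yolov8m.pt` and `rtdetr-l.pt` are assumptions chosen to match the medium and large sizes in the table, and the function name, epoch count, and image size are hypothetical.

```python
# Checkpoint names are assumptions matching the model sizes in the table.
CHECKPOINTS = {"yolo": "yolov8m.pt", "rtdetr": "rtdetr-l.pt"}


def train_detectors(data_yaml: str, epochs: int = 50, imgsz: int = 640) -> dict:
    """Fine-tune one YOLO and one RT-DETR model on the same dataset.

    `data_yaml` is the path to a YOLO-format dataset config (hypothetical here).
    """
    # Imported lazily so this module loads even without ultralytics installed.
    from ultralytics import RTDETR, YOLO

    models = {
        "yolo": YOLO(CHECKPOINTS["yolo"]),
        "rtdetr": RTDETR(CHECKPOINTS["rtdetr"]),
    }
    results = {}
    for name, model in models.items():
        # Both model classes expose the same train() entry point.
        results[name] = model.train(data=data_yaml, epochs=epochs, imgsz=imgsz)
    return results
```

Calling `train_detectors("data.yaml")` would train both models in turn, where `data.yaml` stands in for whatever dataset config the dataset tool exports. The epoch count and image size are placeholders, not the settings used for the reported results.
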
After lots of trial & error, best bets for detection:

YOLO works well on tiled images, but it will need to transfer to full-sized images to be useful
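
One common way to bridge tiled and full-sized images is to cut each full image into overlapping tiles at inference time, run the detector on each tile, and shift the resulting boxes back into full-image coordinates. The sketch below shows only the coordinate bookkeeping. The tile size, overlap, and function names are hypothetical and not taken from the project.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def tile_boxes(width: int, height: int, tile: int = 640, overlap: int = 64) -> List[Box]:
    """Return crop windows that together cover a width x height image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    stride = tile - overlap

    def starts(length: int) -> List[int]:
        # A single window suffices when the image is no larger than one tile.
        if length <= tile:
            return [0]
        offsets = list(range(0, length - tile, stride))
        offsets.append(length - tile)  # last window sits flush with the edge
        return offsets

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


def to_full_image(box: Box, tile_box: Box) -> Box:
    """Shift a detection from tile coordinates to full-image coordinates."""
    dx, dy = tile_box[0], tile_box[1]
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


tiles = tile_boxes(1000, 700)
shifted = to_full_image((10, 20, 30, 40), tiles[-1])
```

Because neighbouring tiles overlap, the same object can be detected twice, so a full pipeline would also need a de-duplication step (for example non-maximum suppression) on the merged boxes. That step is left out here.
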
Training results - YOLO & DETR models


Tuning results - YOLO variations only

Screenshot of an earlier demo
Working interactive demo: https://camilleseab-surveillance.hf.space