Re-surveilling surveillance

Camille Seaberry

UMBC data science MPS capstone, Fall 2023

Background

Police surveillance cameras in Baltimore form one of many layers of state surveillance imposed upon residents.
Little documentation, control, or oversight of the surveillance landscape
What role do tech vendors play in surveillance? How can open source tech be used for accountability?

Tasks

Identify cameras in images (object detection)
Categorize camera types once detected (classification)

Goals

Improve upon / expand on models I built before—DONE!
Map locations of cameras for spatial analysis—NOT DONE

About the data

	Google Street View	Objects365	Mapillary Vistas
Size (train, val, test)	473 / 119 / 79	393 / 107 / 54	3,202 / 929 / 484
Setting	Street	Outdoors & indoors	Street
Used for	Detection & classification	Detection	Detection
Release	Maybe a TOS violation?	Released for research	Released for research
Source	Sheng, Yao, and Goel (2021)	Shao et al. (2019)	Neuhold et al. (2017)
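
As a quick check on the table above, the split proportions can be computed directly from the listed counts. This is a minimal sketch: the dictionary and function names are illustrative, and only the counts themselves come from the table.

```python
# Split sizes (train, val, test) copied from the table above.
splits = {
    "Google Street View": (473, 119, 79),
    "Objects365": (393, 107, 54),
    "Mapillary Vistas": (3202, 929, 484),
}


def split_fractions(counts):
    """Return each split's share of the dataset total."""
    total = sum(counts)
    return tuple(n / total for n in counts)


for name, counts in splits.items():
    train, val, test = split_fractions(counts)
    print(f"{name}: n={sum(counts)}, train={train:.1%}, val={val:.1%}, test={test:.1%}")
```

All three datasets come out at roughly a 70 / 20 / 10 split.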

Tools

Ultralytics YOLOv8	Models with built-in modules for training, tuning, & validation
PyTorch	Underlies Ultralytics models
Roboflow	Dataset creation & management
Weights & Biases	Experiment tracking
Paperspace	Virtual machine (8 CPUs, 16GB GPU)

Models

YOLO	RT-DETR
Latest generation YOLO model	Transformer-based model from Baidu
Detection & classification (& others)	Detection only
Smaller architecture (medium has 26M params)	Larger architecture (large has 33M params)
Trains very quickly & can train small models on laptop	Trains slowly & needs more GPU RAM
Doesn’t perform as well	Performs better
Well-documented & integrated	New, not fully integrated into the ecosystem (e.g. no `tune` method)

YOLO family

Ultralytics released YOLOv8 this year (Jocher, Chaurasia, and Qiu 2023)
Anchor-free: avoids the anchor box calculations and comparisons that other detection models use

Model variations

Detection

Freezing all but last few layers—increased speed, maybe increased accuracy
Tiling images—better detection of small objects
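
Tiling can be sketched as cutting each image into overlapping windows, so that small cameras take up a larger share of each training input. This is a minimal sketch in plain Python; the tile size, overlap, and function name are assumptions for illustration, not the settings used in this project.

```python
def tile_boxes(width, height, tile=640, overlap=0.2):
    """Return (x0, y0, x1, y1) windows covering a width x height image.

    Neighboring tiles overlap by at least `overlap` of the tile size. The last
    tile in each row and column is shifted back so it ends at the image edge.
    """
    stride = max(1, int(tile * (1 - overlap)))

    def starts(length):
        # An image smaller than one tile gets a single window.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)  # final tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


# Example: windows for a hypothetical 1280 x 640 street-level image.
for box in tile_boxes(1280, 640):
    print(box)
```

Annotations would also need to be clipped and re-expressed relative to each tile, which this sketch leaves out.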

Classification

No RT-DETR classifier, so just trying different sizes of YOLO

Model variations

After lots of trial & error, best bets for detection:

YOLO trained on full-sized images
YOLO trained on tiled images
RT-DETR trained on full-sized images with freezing
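
The three candidates above could be set up with the Ultralytics Python API roughly as follows. This is a sketch, not the project's training script: the dataset YAML paths, epoch count, image size, and number of frozen layers are placeholders, and the pretrained weight names (`yolov8m.pt`, `rtdetr-l.pt`) are assumed from the model sizes mentioned earlier.

```python
# Hypothetical settings for the three detection candidates.
# Every path and number here is a placeholder, not a value from the project.
CANDIDATES = {
    "yolo_full": {
        "arch": "yolo",
        "weights": "yolov8m.pt",
        "train": {"data": "cameras_full.yaml", "imgsz": 640},
    },
    "yolo_tiled": {
        "arch": "yolo",
        "weights": "yolov8m.pt",
        "train": {"data": "cameras_tiled.yaml", "imgsz": 640},
    },
    "rtdetr_full_frozen": {
        "arch": "rtdetr",
        "weights": "rtdetr-l.pt",
        # `freeze=N` keeps the first N layers fixed so only later layers are updated.
        "train": {"data": "cameras_full.yaml", "imgsz": 640, "freeze": 10},
    },
}


def train_candidate(name, epochs=100):
    """Train one candidate. Needs the `ultralytics` package and the dataset files."""
    # Imported lazily so the configuration can be inspected without the library.
    from ultralytics import RTDETR, YOLO

    cfg = CANDIDATES[name]
    model_cls = RTDETR if cfg["arch"] == "rtdetr" else YOLO
    model = model_cls(cfg["weights"])
    return model.train(epochs=epochs, name=name, **cfg["train"])
```

Keeping the settings in one dictionary makes it easier to compare runs side by side in an experiment tracker.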

Results

Training & first round of validation

YOLO works well on tiled images, but it will need to transfer to full-sized images to be useful

Training results - YOLO & DETR models

Validation examples, DETR model

Tuning

Tuning results - YOLO variations only

Tuning—what went wrong?

Clearly needs more tuning—these metrics are worse than untuned models!
Pick a model & tune extensively & methodically—probably YOLO tiled
- However, that model runs the risk of not transferring well
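
One way to make tuning more methodical is a hyperparameter sweep tracked in Weights & Biases. The sketch below only builds a sweep configuration and a launcher. The parameter ranges, project name, and run count are assumptions for illustration, and the metric name would need to match whatever the training run actually logs.

```python
# Hypothetical sweep definition; ranges and names are illustrative only.
sweep_config = {
    "method": "bayes",  # Bayesian search rather than random or grid
    "metric": {"name": "metrics/mAP50-95(B)", "goal": "maximize"},
    "parameters": {
        "lr0": {"distribution": "log_uniform_values", "min": 1e-4, "max": 1e-2},
        "momentum": {"min": 0.8, "max": 0.98},
        "weight_decay": {"min": 0.0, "max": 1e-3},
        "batch": {"values": [8, 16, 32]},
    },
}


def launch_sweep(train_fn, project="camera-detection", count=20):
    """Register the sweep and run `count` trials of `train_fn`.

    `train_fn` is a user-supplied function that reads `wandb.config`,
    trains one model, and logs the metric named in `sweep_config`.
    """
    import wandb  # imported lazily; requires a W&B account and login

    sweep_id = wandb.sweep(sweep_config, project=project)
    wandb.agent(sweep_id, function=train_fn, project=project, count=count)
    return sweep_id
```

Fixing the model first and sweeping only a few parameters keeps the search small enough to run to completion on limited compute.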

Classification

Works very well
However, this was only a very small dataset

Confusion matrix, YOLO medium, validation set
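
A confusion matrix like the one referenced here can be tallied from true and predicted labels. This is a minimal sketch with made-up labels: the class names `directed` and `dome` and the sample predictions are hypothetical, not the project's data or results.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def accuracy(matrix):
    """Share of all examples that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Hypothetical labels for six validation images.
classes = ["directed", "dome"]
y_true = ["directed", "directed", "dome", "dome", "dome", "directed"]
y_pred = ["directed", "dome", "dome", "dome", "dome", "directed"]

matrix = confusion_matrix(y_true, y_pred, classes)
for label, row in zip(classes, matrix):
    print(label, row)
print("accuracy:", round(accuracy(matrix), 3))
```

With a validation set this small, a handful of misclassified images shifts accuracy noticeably, so the per-class counts are worth reading alongside any single summary number.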

Inference

Screenshot of an earlier demo

Demo

Working interactive demo: https://camilleseab-surveillance.hf.space

Challenges

Many moving parts that have to work together
Some components are very new & incomplete
Hard to find lots of high-quality data
Google Street View images aren’t permanent
Formatting images & annotations to be compatible
Reliable, sustained compute power
A lot to learn!

Potential improvements

Need a better tuning methodology—switch to W&B
Longer training—common benchmarks use 300 epochs
Add slicing to inference step (SAHI; Akyon, Altinuc, and Temizel 2022)
Label more images for a larger dataset
- Can use AI labelling assistants
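
The idea behind sliced inference can be sketched without the SAHI library: run the detector on each tile, shift the boxes back into full-image coordinates, and merge duplicates found in overlapping tiles. This is a simplified illustration, not SAHI's implementation; `detect_fn`, the tile list, and the IoU threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sliced_detect(tiles, detect_fn, iou_thresh=0.5):
    """Detect per tile, then merge results in full-image coordinates.

    tiles: list of (x0, y0, x1, y1) windows into the full image.
    detect_fn: hypothetical callable taking a tile window and returning
        [(box, score), ...] with boxes in that tile's own coordinates.
    """
    shifted = []
    for tile in tiles:
        tx, ty = tile[0], tile[1]
        for (x0, y0, x1, y1), score in detect_fn(tile):
            # Move the box from tile coordinates to full-image coordinates.
            shifted.append(((x0 + tx, y0 + ty, x1 + tx, y1 + ty), score))

    # Greedy non-maximum suppression: keep the highest-scoring box,
    # drop any later box that overlaps a kept one too much.
    kept = []
    for box, score in sorted(shifted, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) < iou_thresh for k in kept):
            kept.append((box, score))
    return kept
```

In practice `detect_fn` would crop the image to the tile and call the trained detector on the crop.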

Next steps?

Use the classification model to add classes back to detection images
Infer on Mapillary images with location data for spatial analysis
- Mapillary already has so many objects annotated, might only need to do this to fill in gaps
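
For the spatial analysis step, detections tied to image coordinates could be written out as GeoJSON points for mapping. This is a sketch under assumptions: the record fields (`image_id`, `lon`, `lat`, `camera_type`, `confidence`) and the sample values are hypothetical, and each point marks where the image was taken rather than the camera's own position.

```python
import json


def detections_to_geojson(records, min_confidence=0.5):
    """Convert detection records into a GeoJSON FeatureCollection of points."""
    features = []
    for rec in records:
        if rec["confidence"] < min_confidence:
            continue  # skip low-confidence detections
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
            "properties": {
                "image_id": rec["image_id"],
                "camera_type": rec["camera_type"],
                "confidence": rec["confidence"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up records for illustration only.
sample = [
    {"image_id": "img_001", "lon": -76.61, "lat": 39.29,
     "camera_type": "dome", "confidence": 0.82},
    {"image_id": "img_002", "lon": -76.62, "lat": 39.30,
     "camera_type": "directed", "confidence": 0.31},
]
print(json.dumps(detections_to_geojson(sample), indent=2))
```

The resulting file can be opened in most GIS tools or joined to neighborhood boundaries for aggregation.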

Conclusions & implications

A potentially useful start, but it still needs more work
Surveillance studies and movements for police accountability seem to be tech-averse (with good reason), but communities can also use the technologies deployed against them
Inherently reactive to be chasing the surveillance state after its infrastructure is built

References

Akyon, Fatih Cagatay, Sinan Onur Altinuc, and Alptekin Temizel. 2022. “Slicing Aided Hyper Inference and Fine-Tuning for Small Object Detection.” In 2022 IEEE International Conference on Image Processing (ICIP), 966–70. https://doi.org/10.1109/ICIP46576.2022.9897990.

Browne, Simone. 2015. Dark Matters: On the Surveillance of Blackness. Durham, NC: Duke University Press.

Jocher, Glenn, Ayush Chaurasia, and Jing Qiu. 2023. “YOLO by Ultralytics.” https://github.com/ultralytics/ultralytics.

Neuhold, Gerhard, Tobias Ollmann, Samuel Rota Bulo, and Peter Kontschieder. 2017. “The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes.” In Proceedings of the IEEE International Conference on Computer Vision, 4990–99. https://openaccess.thecvf.com/content_iccv_2017/html/Neuhold_The_Mapillary_Vistas_ICCV_2017_paper.html.

Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. “You Only Look Once: Unified, Real-Time Object Detection.” arXiv. https://doi.org/10.48550/arXiv.1506.02640.

Shao, Shuai, Zeming Li, Tianyuan Zhang, Chao Peng, Gang Yu, Xiangyu Zhang, Jing Li, and Jian Sun. 2019. “Objects365: A Large-Scale, High-Quality Dataset for Object Detection.” In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 8429–38. https://doi.org/10.1109/ICCV.2019.00852.

Sheng, Hao, Keniel Yao, and Sharad Goel. 2021. “Surveilling Surveillance: Estimating the Prevalence of Surveillance Cameras with Street View Data.” In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 221–30. AIES ’21. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3461702.3462525.

Turtiainen, Hannu, Andrei Costin, Tuomo Lahtinen, Lauri Sintonen, and Timo Hamalainen. 2021. “Towards Large-Scale, Automated, Accurate Detection of CCTV Camera Objects Using Computer Vision. Applications and Implications for Privacy, Safety, and Cybersecurity. (Preprint).” arXiv. http://arxiv.org/abs/2006.03870.