A 4D Convolutional Neural Networks for Video Violence Detection

Mai Magdy; Fahima A. Maghraby; Mohamed Waleed Fakhr

doi:10.37934/araset.36.1.1625

Authors

Mai Magdy College of Computing and Information Technology, Arab Academy for Science, Technology, and Maritime Transport, Cairo P.O. Box 2033, Egypt
Fahima A. Maghraby College of Computing and Information Technology, Arab Academy for Science, Technology, and Maritime Transport, Cairo P.O. Box 2033, Egypt
Mohamed Waleed Fakhr College of Computing and Information Technology, Arab Academy for Science, Technology, and Maritime Transport, Cairo P.O. Box 2033, Egypt


Keywords:

Surveillance Cameras, Computer Vision, Deep Learning, Violence Detection

Abstract

As global crime has escalated, surveillance cameras have become widespread, and their number will continue to grow. Because of the large volume of footage, systems are needed that automatically detect suspicious activity and issue an online alert when they find it. This paper presents a deep learning architecture based on video-level four-dimensional convolutional neural networks (4D CNNs). The proposed architecture consists of residual blocks combined with three-dimensional convolutional neural networks (3D CNNs). It aims to learn both short-term and long-term spatiotemporal representations from video, as well as the interactions between clips. ResNet50 serves as the backbone of the 3D convolutional networks, and dense optical flow is applied to the region of interest. The proposed architecture is evaluated on the RWF2000 dataset, where it achieves a test accuracy of 94.75%, outperforming other state-of-the-art methods.
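To make the idea of a "video-level" 4D convolution concrete, the sketch below compares it with the per-clip 3D convolution that a 3D CNN applies. It is a minimal illustration, not the authors' implementation: it assumes a single channel, stride 1, no padding, and plain nested lists, and the helper names (`conv3d`, `conv4d`, `conv4d_via_3d`, `shape`, `zeros`) are hypothetical. A 3D kernel slides over time, height, and width within one clip. A 4D kernel adds a fourth axis over neighbouring clips, and this is what lets a layer mix information between clips. The same result can be computed as a sum of ordinary 3D convolutions over clip offsets.

```python
from itertools import product


def shape(a):
    """Return the shape of a regular nested list, e.g. (U, T, H, W)."""
    dims = []
    while isinstance(a, list):
        dims.append(len(a))
        a = a[0]
    return tuple(dims)


def zeros(*dims):
    """Nested list of integer zeros with the given shape."""
    if len(dims) == 1:
        return [0] * dims[0]
    return [zeros(*dims[1:]) for _ in range(dims[0])]


def conv3d(clip, k):
    """Valid 3D cross-correlation (stride 1, no padding) of one clip [T][H][W]
    with a kernel [kt][kh][kw]: what a 3D CNN layer applies within a clip."""
    T, H, W = shape(clip)
    kt, kh, kw = shape(k)
    out = zeros(T - kt + 1, H - kh + 1, W - kw + 1)
    for t, h, w in product(range(T - kt + 1), range(H - kh + 1), range(W - kw + 1)):
        out[t][h][w] = sum(
            clip[t + a][h + b][w + c] * k[a][b][c]
            for a, b, c in product(range(kt), range(kh), range(kw))
        )
    return out


def conv4d(video, k):
    """Valid 4D cross-correlation of a video [U][T][H][W] (U clips) with a
    kernel [ku][kt][kh][kw]; the extra axis mixes features across clips."""
    U, T, H, W = shape(video)
    ku, kt, kh, kw = shape(k)
    out = zeros(U - ku + 1, T - kt + 1, H - kh + 1, W - kw + 1)
    for u, t, h, w in product(range(U - ku + 1), range(T - kt + 1),
                              range(H - kh + 1), range(W - kw + 1)):
        out[u][t][h][w] = sum(
            video[u + d][t + a][h + b][w + c] * k[d][a][b][c]
            for d, a, b, c in product(range(ku), range(kt), range(kh), range(kw))
        )
    return out


def conv4d_via_3d(video, k):
    """Same result as conv4d, built as sum over d of conv3d(video[u + d], k[d])."""
    ku = len(k)
    out = []
    for u in range(len(video) - ku + 1):
        partials = [conv3d(video[u + d], k[d]) for d in range(ku)]
        t_out, h_out, w_out = shape(partials[0])
        acc = zeros(t_out, h_out, w_out)
        for t, h, w in product(range(t_out), range(h_out), range(w_out)):
            acc[t][h][w] = sum(p[t][h][w] for p in partials)
        out.append(acc)
    return out


# Toy single-channel input: 3 clips of 4 frames, each 3x3 pixels (integer values).
video = [[[[(u * 7 + t * 5 + h * 3 + w) % 4 - 1 for w in range(3)]
           for h in range(3)] for t in range(4)] for u in range(3)]
kernel = [[[[(d + a + b * c) % 3 - 1 for c in range(2)]
            for b in range(2)] for a in range(2)] for d in range(2)]

direct = conv4d(video, kernel)
print(shape(direct))                           # (2, 3, 2, 2)
print(direct == conv4d_via_3d(video, kernel))  # True
```

Integer inputs keep the two computations exactly equal. In a real network these loops would be replaced by tensor operations over many channels, with the 3D part provided by a backbone such as the ResNet50-based 3D CNN that the abstract mentions.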