A 4D Convolutional Neural Networks for Video Violence Detection
DOI:
https://doi.org/10.37934/araset.36.1.1625Keywords:
Surveillance Cameras, Computer Vision, Deep Leaning, Violence DetectionAbstract
As global crime has escalated, surveillance cameras have become widespread and will continue to proliferate. Due to the large amount of video, there must be systems that automatically look for suspicious activity and send out an online alert if they find it. This paper presents a deep learning architecture based on video-level four-dimensional convolution neural networks. The suggested architecture consists of residual blocks, which are combined with three-dimensional Convolutional Neural Networks (3D CNNs). The architecture aims to learn short-term and long-term representations of spatiotemporal from video, in addition to interactivity between clips. ResNet50 serves as the foundation for three-dimensional convolution networks and Dense optical flow in the region of concern. The proposed architecture is tested on the RWF2000 dataset with a test accuracy of 94.75. This research achieved higher results compared to other methods in the state of the art.