Forest Sound Event Detection with Convolutional Recurrent Neural Network-Long Short-Term Memory
Keywords: Convolutional Recurrent Neural Network, feature extraction, forest, Long Short-Term Memory, sound event detection
Sound event detection addresses the problem of analyzing and recognizing complex sounds in an audio environment. It involves localizing and classifying sounds, mainly by estimating the start and end points of each individual sound and labeling it. Detection performance depends on the type of sound. Detecting a sequence of temporally distinct sounds is relatively straightforward, but the task becomes complex when multiple individual sounds overlap, a situation that commonly occurs in forest environments. Therefore, this paper proposes a Convolutional Recurrent Neural Network-Long Short-Term Memory (CRNN-LSTM) algorithm to detect the audio signatures of intruders in a forest environment. Mel-frequency cepstral coefficients (MFCCs) are extracted from the audio and fed into the algorithm as input. Six sound categories are considered: chainsaw, machete, car, hatchet, ambiance, and bike. The model was tested with different numbers of epochs, batch sizes, and numbers of filters per layer. With suitable parameter selection, the proposed model achieves an accuracy of 98.52 percent in detecting the audio signatures. In future work, additional types of intruder audio signatures and further evaluation measures can be added to improve the algorithm's ability to detect intruders in the forest environment.
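The MFCC front end described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sample rate (16 kHz), frame length (512 samples), hop size (256 samples), pre-emphasis factor (0.97), number of mel bands (40), number of coefficients (13), and the HTK-style mel formula are all hypothetical choices, because the abstract does not report them. The helper names `mel_filterbank`, `dct_ii_matrix`, and `mfcc` are likewise illustrative.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        # Rising edge of the triangle (max(..., 1) guards against coincident bins)
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        # Falling edge of the triangle
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii_matrix(n_out, n_in):
    """Orthonormal DCT-II basis, keeping the first n_out coefficients."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    mat = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))
    mat[0] *= 1.0 / np.sqrt(2.0)
    return mat * np.sqrt(2.0 / n_in)


def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=40, n_mfcc=13):
    """Return an (n_frames, n_mfcc) MFCC matrix.

    Hypothetical defaults; the paper does not report these values.
    """
    # Pre-emphasis boosts high frequencies before framing
    emphasized = np.concatenate(([signal[0]], signal[1:] - 0.97 * signal[:-1]))
    # Zero-pad very short clips so at least one full frame exists
    if len(emphasized) < n_fft:
        emphasized = np.pad(emphasized, (0, n_fft - len(emphasized)))
    n_frames = 1 + (len(emphasized) - n_fft) // hop
    # Overlapping frames via fancy indexing, each tapered by a Hamming window
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(n_fft)
    # Power spectrum per frame
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Mel band energies, log-compressed with a floor to avoid log(0)
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sr).T
    log_mel = np.log(np.maximum(mel_energy, 1e-10))
    # Decorrelate the log-mel energies with a DCT and keep the first n_mfcc terms
    return log_mel @ dct_ii_matrix(n_mfcc, n_mels).T
```

Each row of the returned matrix describes one frame, so a clip becomes a (frames × coefficients) array that a CRNN can treat as a single-channel image: convolutional layers pick up local spectro-temporal patterns, and the LSTM models how those patterns evolve across frames. With these assumed settings, one second of 16 kHz audio yields 61 frames of 13 coefficients.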