Abstract:
To enhance event detection in traffic scenes from sound signals in complex driving environments, we propose a sound event detection method that uses a graph neural network for cross-modal information extraction. First, we apply a sound event window method to capture simultaneous and successive relationships in the sound signal, filtering out potential noise and constructing a graph structure. We then enhance the graph convolutional network to balance the relationship weights between each node and its neighbors, preventing over-smoothing, and use it to learn the relationship information in the graph. In addition, acoustic features and timing information of sound events are learned with a convolutional recurrent neural network (CRNN), and the event relationship information is incorporated through cross-modal fusion to improve detection performance. Compared with the original CRNN model, the proposed method achieves better detection performance on the TUT Sound Events 2016 and TUT Sound Events 2017 datasets, with increases of 10.3% and 2.04% in F1 score, reductions of 5.89% and 10.06% in error rate, and decreases of 8.1% and 6.07% in global error rate, respectively. Experimental results show that the proposed method effectively improves intelligent vehicles' ability to perceive the surrounding environment during driving.
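As a rough illustration of the self-versus-neighbor weighting idea mentioned in the abstract, the following PyTorch sketch mixes each node's own features with the mean of its neighbors' features through a balance coefficient. The class name BalancedGraphConv, the coefficient alpha, and the mean aggregation are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class BalancedGraphConv(nn.Module):
    """One graph-convolution layer that mixes a node's own features with
    the mean of its neighbors' features via a balance coefficient alpha,
    limiting over-smoothing. Hypothetical sketch; alpha and the mean
    aggregation are assumptions, not the paper's exact formulation."""

    def __init__(self, in_dim, out_dim, alpha=0.5):
        super().__init__()
        self.alpha = alpha
        self.lin_self = nn.Linear(in_dim, out_dim)    # transform of own features
        self.lin_neigh = nn.Linear(in_dim, out_dim)   # transform of neighborhood

    def forward(self, x, adj):
        # x: (num_nodes, in_dim); adj: (num_nodes, num_nodes) binary adjacency
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # avoid division by zero
        neigh_mean = adj @ x / deg                        # mean over neighbors
        # weighted mix of self information and neighborhood information
        return torch.relu(self.alpha * self.lin_self(x)
                          + (1 - self.alpha) * self.lin_neigh(neigh_mean))

# Usage: 5 sound-event nodes with 16-dim features and a random adjacency
x = torch.randn(5, 16)
adj = (torch.rand(5, 5) > 0.5).float()
layer = BalancedGraphConv(16, 32, alpha=0.6)
out = layer(x, adj)   # shape (5, 32)
```

Keeping alpha strictly between 0 and 1 preserves a fixed share of each node's own representation at every layer, which is one common way to keep repeated neighborhood averaging from collapsing all node features together.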