Abstract:
To enhance event detection in traffic scenes from sound signals in complex driving environments, we propose a sound event detection method that uses a graph neural network for cross-modal information extraction. First, we apply a sound event window method to capture simultaneous and successive relationships in the sound signal, filtering out potential noise and constructing a graph structure. We then enhance the graph convolutional network to balance the relationship weights between each node and its neighbors, preventing over-smoothing, and use it to learn the relationship information in the graph. In addition, acoustic features and timing information of sound events are learned with a convolutional recurrent neural network (CRNN), and the event relationship information is incorporated through cross-modal fusion to improve detection performance. Compared with the original CRNN model, the proposed method achieves better detection performance on the TUT Sound Events 2016 and TUT Sound Events 2017 datasets, with increases of 10.3% and 2.04% in F1 score, reductions of 5.89% and 10.06% in error rate, and decreases of 8.1% and 6.07% in global error rate, respectively. Experimental results show that the proposed method effectively improves intelligent vehicles' ability to perceive the surrounding environment during driving.
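As a rough illustration of the self-versus-neighbor weighting idea mentioned in the abstract, the following PyTorch sketch mixes each node's own features with the mean of its neighbors' features through a balance coefficient. The class name BalancedGraphConv, the coefficient alpha, and the mean aggregation are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class BalancedGraphConv(nn.Module):
    """One graph-convolution layer that mixes a node's own features with
    the mean of its neighbors' features via a balance coefficient alpha,
    limiting over-smoothing. Hypothetical sketch; alpha and the mean
    aggregation are assumptions, not the paper's exact formulation."""

    def __init__(self, in_dim, out_dim, alpha=0.5):
        super().__init__()
        self.alpha = alpha
        self.lin_self = nn.Linear(in_dim, out_dim)    # transform of own features
        self.lin_neigh = nn.Linear(in_dim, out_dim)   # transform of neighborhood

    def forward(self, x, adj):
        # x: (num_nodes, in_dim); adj: (num_nodes, num_nodes) binary adjacency
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)  # avoid division by zero
        neigh_mean = adj @ x / deg                        # mean over neighbors
        # weighted mix of self information and neighborhood information
        return torch.relu(self.alpha * self.lin_self(x)
                          + (1 - self.alpha) * self.lin_neigh(neigh_mean))

# Usage: 5 sound-event nodes with 16-dim features and a random adjacency
x = torch.randn(5, 16)
adj = (torch.rand(5, 5) > 0.5).float()
layer = BalancedGraphConv(16, 32, alpha=0.6)
out = layer(x, adj)   # shape (5, 32)
```

Keeping alpha strictly between 0 and 1 preserves a fixed share of each node's own representation at every layer, which is one common way to keep repeated neighborhood averaging from collapsing all node features together.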