
Sound Event Detection in Traffic Scene Based on Graph Neural Network



    Abstract: To better detect events from sound signals in complex driving environments, we propose a sound event detection method for traffic scenes that uses a graph neural network to extract cross-modal information. First, a sound-event-window method captures the simultaneous and successive relationships between events in the sound signal as cross-modal information; potentially noisy relationships are filtered out, and the remaining relationships are built into a graph structure. Second, the graph convolutional network is improved to balance the relationship weights between a node and its neighbors and to avoid over-smoothing, and the improved network is used to learn the relational information in the graph. Finally, a convolutional recurrent neural network (CRNN) learns the acoustic features and temporal information of sound events, and the relational information is incorporated through cross-modal fusion to strengthen detection performance. Compared with the baseline CRNN model, the proposed method achieves better detection performance on the TUT Sound Events 2016 and TUT Sound Events 2017 datasets: the F1 score increases by 10.3% and 2.04%, the error rate (ER) decreases by 5.89% and 10.06%, and the overall error rate decreases by 8.1% and 6.07%, respectively. Experimental results show that the method effectively improves an intelligent vehicle's perception of its surrounding environment while driving.
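The improved graph convolution mentioned above, which balances a node's own features against aggregated neighbor features to curb over-smoothing, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the `self_weight` mixing parameter, the symmetric normalization, and the ReLU activation are all assumptions introduced here for clarity.

```python
import numpy as np


def gcn_layer(A, H, W, self_weight=0.5):
    """One graph-convolution step with a tunable self/neighbor balance.

    A: (N, N) adjacency matrix of the event-relation graph
       (one node per sound-event class)
    H: (N, F) input node features
    W: (F, F_out) learnable weight matrix
    self_weight: mixes a node's own features with its neighbors'
       aggregated features; a larger value damps neighbor influence,
       which is one simple way to mitigate over-smoothing.
    """
    # Symmetric normalization of the adjacency: D^{-1/2} A D^{-1/2}
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    A_norm = (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

    # Weighted mix of self features and aggregated neighbor features
    mixed = self_weight * H + (1.0 - self_weight) * (A_norm @ H)
    return np.maximum(mixed @ W, 0.0)  # ReLU activation


# Toy usage: a two-node event graph with a single edge
rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0], [1.0, 0.0]])
H = rng.standard_normal((2, 3))
W = rng.standard_normal((3, 4))
out = gcn_layer(A, H, W)
print(out.shape)  # (2, 4)
```

With `self_weight=1.0` the layer ignores neighbors entirely and reduces to a per-node linear transform; with `self_weight=0.0` it reduces to a standard normalized neighbor aggregation, so the parameter interpolates between the two extremes.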
