A Self-Supervised Pre-training Method for Heart Rate Signal Measurement Based on MAE and Mamba

Abstract: To address the high noise and the difficulty of label acquisition in the long feature sequences of remote PhotoPlethysmoGraphy (rPPG) physiological signals, this paper proposes a self-supervised pre-training method for heart rate signal measurement based on Masked AutoEncoders (MAE) and Mamba. First, to effectively suppress noise, average pooling is applied to the video to generate a Spatio-Temporal Map (STMap). Second, to address the difficulty of label acquisition, the self-supervised MAE method masks and reconstructs the STMap, extracting the inherent self-similarity prior from the signal. Additionally, in the feature extraction stage, Mamba's strength in handling long sequences is leveraged: it selectively remembers or ignores input content, filtering out short-term disturbance fluctuations while retaining long-term periodic data, which improves the model's robustness to interference; compared with Transformer-based methods, it also has fewer parameters and faster inference. In the model design, exploiting the property that each row of the STMap is the average-pooled time series of one facial region, a Dual-Path Attention Module (DPAM) is proposed to strengthen feature extraction across channels and facial regions. Test results on two public datasets show that, compared with a Transformer-based self-supervised method, the Mean Absolute Error (MeanAE) of the proposed method decreases by 17.65% on the UBFC dataset and by 17.50% on the PURE dataset, while the number of parameters is reduced by 43%.
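As a rough illustration of the STMap construction described in the abstract, the following sketch average-pools a cropped face video over a spatial grid so that each STMap row becomes one facial region's time series. The grid size, normalization, and region ordering here are assumptions for illustration; the paper's exact preprocessing may differ.

```python
import numpy as np

def video_to_stmap(frames, grid=5):
    """Build a Spatio-Temporal Map (STMap) by average-pooling each
    frame over a grid x grid partition of the face crop.

    frames: array of shape (T, H, W, C) -- a cropped face video.
    Returns an STMap of shape (grid*grid, T, C): each row is the
    time series of one facial region's mean pixel value.
    """
    T, H, W, C = frames.shape
    row_blocks = np.array_split(np.arange(H), grid)
    col_blocks = np.array_split(np.arange(W), grid)
    stmap = np.empty((grid * grid, T, C), dtype=np.float64)
    r = 0
    for hs in row_blocks:
        for ws in col_blocks:
            # mean over the spatial block, kept per frame and channel
            block = frames[:, hs[0]:hs[-1] + 1, ws[0]:ws[-1] + 1, :]
            stmap[r] = block.mean(axis=(1, 2))
            r += 1
    # min-max normalize each region/channel series over time
    # (a common STMap convention, assumed here)
    mn = stmap.min(axis=1, keepdims=True)
    mx = stmap.max(axis=1, keepdims=True)
    return (stmap - mn) / np.maximum(mx - mn, 1e-8)
```

Each row of the resulting map is a pooled time series, which is why a dual-path attention over channels and rows (facial regions), as in the proposed DPAM, is a natural fit for this representation.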
