Abstract:
In skeleton-based action recognition with graph convolution, existing methods typically share a single spatial topology across all skeleton frames and model temporal features with single-scale temporal convolution. To address these two limitations, we propose an action recognition method based on temporal topology unshared graph convolution and multiscale temporal convolution. First, for spatial modeling, we compute the joint relationships of each frame from the input sample to establish an independent spatial topology for each skeleton frame. Second, for temporal modeling, we use a multiscale temporal convolution module with five branches to extract action features at different time scales. Finally, we combine the temporal topology unshared graph convolution and multiscale temporal convolution modules into a spatiotemporal graph convolutional network for skeleton action recognition. We conduct comparative experiments on the NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets. The results show that the proposed method achieves higher recognition accuracy with lower model complexity than current mainstream action recognition methods.
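
The two ideas summarized above can be illustrated with a minimal pure-Python sketch. It is not the network described in this paper. The function names, the distance-based softmax affinity, the kernel size of 3, and the dilation rates (1, 2, 3) are all hypothetical choices made for illustration. The abstract states only that each frame gets its own topology computed from the input and that the temporal module has five branches; it does not specify how either is built.

```python
import math

def frame_topologies(sequence):
    """sequence: list of T frames, each a list of V joint feature tuples.
    Returns T row-normalized V x V matrices, one per frame (not shared)."""
    topologies = []
    for frame in sequence:
        adjacency = []
        for joint_i in frame:
            # Assumed affinity: negative squared distance between joint features.
            scores = [-sum((a - b) ** 2 for a, b in zip(joint_i, joint_j))
                      for joint_j in frame]
            peak = max(scores)  # subtract the max for a numerically stable softmax
            weights = [math.exp(s - peak) for s in scores]
            total = sum(weights)
            adjacency.append([w / total for w in weights])
        topologies.append(adjacency)
    return topologies

def aggregate(sequence, topologies):
    """Mix joint features inside each frame with that frame's own matrix."""
    mixed = []
    for frame, adjacency in zip(sequence, topologies):
        channels = len(frame[0])
        mixed.append([
            tuple(sum(row[j] * frame[j][c] for j in range(len(frame)))
                  for c in range(channels))
            for row in adjacency
        ])
    return mixed

def dilated_temporal_conv(signal, kernel, dilation):
    """Zero-padded 1D convolution along time; output length equals input length."""
    half = len(kernel) // 2
    output = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t + (k - half) * dilation
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        output.append(acc)
    return output

def multiscale_temporal(signal, kernel=(1.0, 1.0, 1.0), dilations=(1, 2, 3)):
    """One branch per dilation rate; the branch set here is an assumption."""
    return [dilated_temporal_conv(signal, kernel, d) for d in dilations]
```

In this sketch, two frames with different joint configurations yield different adjacency matrices, which is the sense in which the topology is not shared across time. Each dilation rate gives a branch with a different temporal receptive field over the same per-joint signal.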