TY - GEN
T1 - Spatial-Temporal Graph U-Net for Skeleton-Based Human Motion Infilling
AU - Xu, Leiyang
AU - Wang, Qiang
AU - Yang, Chenguang
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Motion infilling is a fundamental and challenging research field in human motion modeling and analysis, which aims to generate natural and visually coherent transitions to fill in missing motion frames based on the start and end motion sequences. However, most current methods ignore the spatial structure formed by joints, which may lose some spatial information. This work proposes a novel spatiotemporal graph U-Net that supports flexible inputs for skeleton-based motion infilling. We apply spatiotemporal graph convolutional layers, skeleton pooling layers, and skeleton unpooling layers to extract spatial and temporal features in the motion sequence. At the same time, we use the U-Net structure to integrate the information in the start and end motion sequences. In addition, the generative adversarial mechanism is introduced to ensure the generated skeleton poses are smooth and natural. We conduct experiments on two motion datasets, including one large-scale public dataset and one self-built dataset. The model inputs are joint quaternions or joint coordinates. Experimental results show that our method can improve the performance of skeleton-based motion infilling and achieve state-of-the-art results when using joint coordinates as model input.
KW - Generative Adversarial Network
KW - Graph U-Net
KW - Motion Infilling
KW - ST-GCN
KW - Skeleton pooling
UR - https://www.scopus.com/pages/publications/85195804120
U2 - 10.1109/ICIT58233.2024.10540720
DO - 10.1109/ICIT58233.2024.10540720
M3 - Conference contribution
AN - SCOPUS:85195804120
T3 - Proceedings of the IEEE International Conference on Industrial Technology
BT - ICIT 2024 - 2024 25th International Conference on Industrial Technology
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th IEEE International Conference on Industrial Technology, ICIT 2024
Y2 - 25 March 2024 through 27 March 2024
ER -