TY - GEN
T1 - Big model for bridge health diagnosis by cross-modal learning
AU - Xu, Yang
N1 - Publisher Copyright: © 2024 The Author(s).
PY - 2024
Y1 - 2024
N2 - Conventional deep-learning-based methods for bridge structural health diagnosis require complicated network structures and lengthy from-scratch training with hyperparameter tuning. Because a pre-trained vision-language big model captures fundamental knowledge from large-scale image and linguistic datasets, it has great potential for structural health diagnosis that makes full use of image and text datasets. This work is a feasibility study towards establishing a big model for structural health diagnosis based on vision-language cross-modal learning. Specifically, an overall pipeline is proposed using the pre-trained vision-language big model OFA (One For All, established by DAMO Academy in 2022). A series of Transformer modules based on the self-attention mechanism are stacked to unify pre-training tasks and downstream tasks in pure vision modality, pure language modality, and vision-language cross-modality learning. The results preliminarily demonstrate the feasibility and effectiveness of the vision-language cross-modal learning paradigm using the OFA big model for structural health diagnosis.
UR - https://www.scopus.com/pages/publications/85200372731
U2 - 10.1201/9781003483755-304
DO - 10.1201/9781003483755-304
M3 - Conference contribution
AN - SCOPUS:85200372731
SN - 9781032770406
T3 - Bridge Maintenance, Safety, Management, Digitalization and Sustainability - Proceedings of the 12th International Conference on Bridge Maintenance, Safety and Management, IABMAS 2024
SP - 2555
EP - 2561
BT - Bridge Maintenance, Safety, Management, Digitalization and Sustainability - Proceedings of the 12th International Conference on Bridge Maintenance, Safety and Management, IABMAS 2024
A2 - Jensen, Jens Sandager
A2 - Frangopol, Dan M.
A2 - Schmidt, Jacob Wittrup
PB - CRC Press/Balkema
T2 - 12th International Conference on Bridge Maintenance, Safety and Management, IABMAS 2024
Y2 - 24 June 2024 through 28 June 2024
ER -