CMSM: Cross-modal semantic matching for lightweight IDS in the IoV

  • Zhendong Wang
  • Xiping Zhou*
  • Huamao Xie
  • Dahai Li
  • Daojing He
  • Sammy Chan
  • *Corresponding author for this work
  • Jiangxi University of Science and Technology
  • Jiangxi Provincial Key Laboratory of Multidimensional Intelligent Perception and Control
  • Harbin Institute of Technology
  • City University of Hong Kong

Research output: Contribution to journal › Article › peer-review

Abstract

The deep integration of the Internet of Vehicles (IoV) improves transportation efficiency but also increases exposure to sophisticated cyberattacks. Existing intrusion detection methods often exhibit limited feature utilization in high-dimensional heterogeneous traffic data, high computational complexity that constrains edge deployment, and degraded generalization under data-scarce conditions. To address these issues, a Cross-Modal Semantic Matching framework (CMSM) is proposed. CMSM formulates intrusion detection as a visual-semantic feature alignment problem. A center-prioritized multi-view heatmap construction strategy is introduced to convert high-dimensional tabular traffic data into compact visual representations by quantifying feature importance and incorporating global statistical information. To support deployment in resource-constrained environments, a lightweight visual encoder based on Ghost modules is developed, reducing parameter size and computational cost while maintaining expressive capability. In addition, an IoV-oriented semantic description repository is established, and semantic embeddings extracted from a pre-trained language model are integrated to introduce high-level prior knowledge of attack behaviors. Experiments conducted on CICIDS2017, Car-Hacking, and CICIoV2024 demonstrate that CMSM achieves competitive detection performance with high computational efficiency. On the in-vehicle benchmark, CMSM attains state-of-the-art results with substantially lower computational overhead. In more complex inter-vehicle scenarios, the cross-modal mechanism further improves accuracy and generalization. Few-shot evaluations indicate that semantic alignment alleviates data scarcity and enhances model robustness.
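Two steps named in the abstract can be illustrated with a small sketch: placing features on a grid so that the most important ones sit nearest the center, and classifying a sample by matching its visual embedding against per-class text embeddings. The sketch below is only an illustration under stated assumptions, not the authors' implementation. The function names, the distance-based center-out ordering, the use of cosine similarity, and the toy vectors are all hypothetical; the multi-view construction, the global statistics, the Ghost-module encoder, and the pre-trained language model described in the abstract are not reproduced here.

```python
import math


def center_out_positions(side):
    """Return every (row, col) cell of a side x side grid, nearest the center first."""
    center = (side - 1) / 2.0
    cells = [(r, c) for r in range(side) for c in range(side)]
    # Sort by squared distance to the grid center; ties are broken by row, then column.
    return sorted(
        cells,
        key=lambda rc: ((rc[0] - center) ** 2 + (rc[1] - center) ** 2, rc[0], rc[1]),
    )


def build_heatmap(values, importance, side):
    """Place feature values on a grid, most important feature closest to the center.

    Assumption: importance scores are given (for example from a tree-based ranker);
    unused cells are left at 0.0.
    """
    if len(values) != len(importance):
        raise ValueError("values and importance must have the same length")
    if len(values) > side * side:
        raise ValueError("grid is too small for the number of features")
    # Feature indices ordered from most to least important (stable on ties).
    order = sorted(range(len(values)), key=lambda i: (-importance[i], i))
    grid = [[0.0] * side for _ in range(side)]
    for feature_index, (r, c) in zip(order, center_out_positions(side)):
        grid[r][c] = values[feature_index]
    return grid


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_label(visual_embedding, text_embeddings):
    """Return the class label whose text embedding best matches the visual embedding."""
    return max(
        text_embeddings,
        key=lambda label: cosine(visual_embedding, text_embeddings[label]),
    )


# Toy usage: three normalized traffic features on a 3 x 3 grid.
heatmap = build_heatmap(values=[0.1, 0.9, 0.5], importance=[0.2, 0.7, 0.1], side=3)

# Toy embeddings standing in for encoder and language-model outputs.
label = match_label([1.0, 0.0], {"benign": [0.9, 0.1], "dos": [0.0, 1.0]})
```

In this toy run the most important feature lands in the center cell, and the sample is assigned to whichever class description is closest in embedding space. Because the class set is defined by text descriptions rather than a fixed output layer, a matching step of this kind can in principle accept new descriptions without retraining a classifier head, which is one plausible reading of why the abstract links semantic alignment to few-shot robustness.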

Original language: English
Article number: 112222
Journal: Computer Networks
Volume: 281
State: Published - May 2026
Externally published: Yes

Keywords

  • Cross-modal learning
  • Edge deployment
  • Few-shot learning
  • Internet of vehicles (IoV)
  • Intrusion detection
  • Lightweight neural networks
