Abstract
With the widespread deployment of reinforcement learning and deep learning on edge devices, training neural networks on ARM processors has become a pressing need. However, mainstream deep learning frameworks are not well optimized for training workloads on ARM CPUs, which results in low training efficiency. To address the slow training speed and high resource consumption of Convolutional Neural Networks (CNNs) on ARM platforms, this paper proposes and implements a multi-level AI training acceleration scheme based on the open-source C++ library mlpack. The scheme accelerates training in three ways: it optimizes the im2col algorithm so that convolutions are converted into efficient matrix multiplications, it uses ARM NEON SIMD instructions to optimize linear operators, and it integrates an FP64/FP16 mixed-precision training strategy with dynamic loss scaling. Experiments on LeNet-5 and VGG11-style CNNs show substantial performance gains over the original mlpack and over mainstream frameworks. On an NVIDIA Jetson AGX Orin, our implementation achieves up to a 7.3× speedup over the original mlpack baseline and up to 11.3× and 5.69× end-to-end speedups over PyTorch and TensorFlow, respectively, while still delivering multi-fold reductions in training time on a low-resource Raspberry Pi platform. In DQN-based reinforcement learning on Atari Breakout, our solution attains a 4.98× end-to-end speedup over a single-threaded PyTorch baseline and maintains 2.37× and 4.23× advantages over 4-threaded PyTorch and TensorFlow implementations, respectively. Ablation studies confirm that the proposed optimizations are complementary: the convolutional, linear, and mixed-precision components jointly contribute to the overall speedup and enable an attractive performance–accuracy trade-off for ARM-based edge computing.
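The abstract describes im2col only at a high level. As an illustrative sketch (not the paper's mlpack/NEON implementation), the pure-Python code below shows the general idea: every kh × kw input patch is unfolded into one column of a matrix, so a whole convolution layer becomes a single matrix product between the flattened filters and that matrix. It assumes a C × H × W nested-list input, F × C × kh × kw filters, no padding, and the cross-correlation convention used by most CNN frameworks; the function names `im2col`, `matmul`, `conv2d_im2col`, and `conv2d_direct` are hypothetical, and the naive `matmul` stands in for the optimized GEMM kernel an ARM implementation would call.

```python
# Illustrative im2col + GEMM convolution in pure Python (hypothetical names, no padding).
# Layout assumption: image is C x H x W nested lists; kernels are F x C x kh x kw.
# Like most CNN frameworks, this computes cross-correlation (no kernel flip).


def im2col(image, kh, kw, stride=1):
    """Unfold every kh x kw patch into a column of a (C*kh*kw) x (out_h*out_w) matrix."""
    channels = len(image)
    h, w = len(image[0]), len(image[0][0])
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    cols = []
    # One row per kernel element (channel, ky, kx), one column per output position.
    for ch in range(channels):
        for ky in range(kh):
            for kx in range(kw):
                cols.append([image[ch][oy * stride + ky][ox * stride + kx]
                             for oy in range(out_h) for ox in range(out_w)])
    return cols, out_h, out_w


def matmul(a, b):
    """Naive GEMM; an optimized kernel (BLAS, NEON) would replace this in practice."""
    inner, n_cols = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(inner)) for j in range(n_cols)]
            for row in a]


def conv2d_im2col(image, kernels, stride=1):
    """Convolution as one product: (F x C*kh*kw) @ (C*kh*kw x out_h*out_w)."""
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    cols, out_h, out_w = im2col(image, kh, kw, stride)
    # Flatten each filter in the same (channel, ky, kx) order used for the im2col rows.
    weights = [[k[ch][ky][kx] for ch in range(len(k))
                for ky in range(kh) for kx in range(kw)] for k in kernels]
    flat = matmul(weights, cols)
    # Reshape each filter's output row back to out_h x out_w.
    return [[row[oy * out_w:(oy + 1) * out_w] for oy in range(out_h)] for row in flat]


def conv2d_direct(image, kernels, stride=1):
    """Reference sliding-window convolution used to check the im2col result."""
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    h, w = len(image[0]), len(image[0][0])
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    return [[[sum(k[ch][ky][kx] * image[ch][oy * stride + ky][ox * stride + kx]
                  for ch in range(len(image)) for ky in range(kh) for kx in range(kw))
              for ox in range(out_w)] for oy in range(out_h)] for k in kernels]


if __name__ == "__main__":
    image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # 1 channel, 3 x 3
    kernels = [[[[1, 0], [0, -1]]]]               # 1 filter, 1 channel, 2 x 2
    print(conv2d_im2col(image, kernels))          # [[[-4, -4], [-4, -4]]]
```

The trade-off is memory for regularity: im2col copies each input element up to kh × kw times, but in exchange the inner loop becomes one dense matrix multiplication, which is far easier to vectorize with SIMD instructions such as NEON than a sliding-window loop.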
| Original language | English |
|---|---|
| Article number | e70626 |
| Journal | Concurrency and Computation: Practice and Experience |
| Volume | 38 |
| Issue number | 5 |
| State | Published - Mar 2026 |
| Externally published | Yes |
Keywords
- AI training acceleration
- ARM architecture
- SIMD
- convolutional neural network
- im2col
- mixed-precision training
- mlpack
Fingerprint
Dive into the research topics of 'A Multi-Level Acceleration Scheme for AI Model Training on ARM Architecture Processors'. Together they form a unique fingerprint.