PreTrans: Enabling Efficient CGRA Multi-Task Context Switch Through Config Pre-Mapping and Data Transceiving

  • Yufei Yang
  • Chenhao Xie*
  • Liansheng Liu
  • Xiyuan Peng
  • Yu Peng*
  • Hailong Yang
  • Depei Qian

*Corresponding author for this work

  • School of Electronics and Information Engineering, Harbin Institute of Technology
  • Beihang University

Research output: Contribution to journal › Article › peer-review

Abstract

Dynamic resource allocation guarantees the performance of CGRA multi-task execution, but it exposes the CGRA architecture to a wide range of incompatible contexts (config and data). Traditional context switch approaches, including online config transformation and data reloading, may significantly block a task from processing inputs under a new resource allocation decision, which limits task throughput. Online config transformation can be avoided if compatible configs are prepared through offline pre-mapping, but traditional CGRA mappers require days to achieve comprehensive pre-mapping with considerable quality. Likewise, online data reloading can be eliminated through memory sharing, but the traditional arbiter-based approach struggles to trade off physical complexity against memory access parallelism. PreTrans is the first system design to achieve efficient CGRA multi-task context switch. First, PreTrans avoids online config transformation through a software incremental pre-mapper, which reuses previously finished pre-mapping results to dramatically accelerate the pre-mapping of subsequent resource allocation decisions with negligible loss of mapping quality. Second, PreTrans replaces the traditional arbiter with a hardware data transceiver to better support the memory sharing that eliminates data reloading; this allows each tile to possess an individual memory, which maximizes access parallelism without introducing significant physical overhead. The overall evaluation demonstrates that PreTrans achieves 1.13∼2.46× throughput improvement in pipeline and parallel multi-task scenarios, and can reach the target throughput immediately after a new resource allocation decision takes effect. An ablation study further shows that the pre-mapper is more than three orders of magnitude faster than the traditional CGRA mapper while maintaining more than 99% of the optimal mapping quality, and that the data transceiver introduces only 9.02% hardware area overhead on a 16 × 16 CGRA.

Original language: English
Pages (from-to): 2214-2228
Number of pages: 15
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 36
Issue number: 11
State: Published - 2025
Externally published: Yes

Keywords

  • CGRA
  • CGRA architecture
  • CGRA mapper
  • context switch
  • dynamic resource allocation
  • multi-task
