
MCTA: A Multi-Stage Co-Optimized Transformer Accelerator with Energy-Efficient Dynamic Sparse Optimization

  • Heng Liu
  • Ming Han
  • Jin Wu
  • Ye Wang
  • Jian Dong*
  • *Corresponding author for this work
  • Harbin Institute of Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

As Transformer-based models continue to improve service quality across many domains, their intensive computational requirements are exacerbating the AI energy crisis. Traditional energy-efficient Transformer architectures focus primarily on optimizing the Attention stage because of its quadratic algorithmic complexity, O(n²). However, linear layers can also be major energy consumers, sometimes accounting for over 70% of total energy usage. Although existing approaches such as sparsity have improved the efficiency of the Attention stage, the optimization space within linear layers remains underexploited. In this paper, we introduce MCTA, a multi-stage co-optimized Transformer accelerator designed to improve energy efficiency. Our approach independently optimizes the Query-Key-Value generation, Attention, and Feed-forward Neural Network stages. It employs two novel techniques: Low-overhead Mask Generation (LMG), which dynamically identifies unimportant computations at minimal energy cost, and Cascaded Mask Derivation (CMD), which streamlines the mask generation process through parallel processing. Experimental results show that, compared with state-of-the-art accelerators, MCTA reduces energy consumption by 1.48× on average with only a 1% accuracy loss. This work demonstrates the potential for significant energy savings in Transformer models without retraining, paving the way for more sustainable AI applications.

Original language: English
Title of host publication: 2025 Design, Automation and Test in Europe Conference, DATE 2025 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9783982674100
State: Published - 2025
Event: 2025 Design, Automation and Test in Europe Conference, DATE 2025 - Lyon, France
Duration: 31 Mar 2025 to 2 Apr 2025

Publication series

Name: Proceedings - Design, Automation and Test in Europe, DATE
ISSN (Print): 1530-1591

Conference

Conference: 2025 Design, Automation and Test in Europe Conference, DATE 2025
Country/Territory: France
City: Lyon
Period: 31/03/25 to 2/04/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • approximate computing
  • dynamic sparse attention
  • energy-efficient design
  • transformer accelerator
