Abstract
As Transformer-based models continue to enhance service quality across various domains, their intensive computational requirements are exacerbating the AI energy crisis. Traditional energy-efficient Transformer architectures primarily focus on optimizing the Attention stage due to its high algorithmic complexity (O(n²)). However, linear layers can also be significant energy consumers, sometimes accounting for over 70% of total energy usage. Although existing approaches such as sparsity have improved the Attention stage, the optimization space within linear layers is not fully exploited. In this paper, we introduce the multi-stage co-optimized Transformer accelerator (MCTA) for optimizing energy efficiency. Our approach independently enhances the Query-Key-Value generation, Attention, and Feed-forward Neural Network stages. It employs two novel techniques: Low-overhead Mask Generation (LMG) for dynamically identifying unimportant calculations with minimal energy costs, and Cascaded Mask Derivation (CMD) for streamlining the mask generation process through parallel processing. Experimental results show that MCTA achieves an average energy reduction of 1.48× with only a 1% accuracy loss compared to state-of-the-art accelerators. This work demonstrates the potential for significant energy savings in Transformer models without the need for retraining, paving the way for more sustainable AI applications.
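The abstract only names the two techniques, so as a rough illustration of the general idea behind dynamic sparse attention driven by a cheaply generated mask, the NumPy sketch below estimates attention scores with low-bit operands and evaluates exact attention only for the surviving positions. All function names, the quantization scheme, and the `keep_ratio` parameter are assumptions made for illustration; the paper's actual LMG/CMD hardware is not described here.

```python
import numpy as np

def low_overhead_mask(q, k, keep_ratio=0.25, est_bits=4):
    """Estimate attention scores with low-bit operands, then keep only the
    top-scoring key positions per query. The quantized-estimation scheme and
    all parameters here are hypothetical, not the paper's circuit."""
    # Cheap importance estimate: symmetric linear quantization of Q and K.
    scale_q = np.abs(q).max() / (2 ** (est_bits - 1) - 1)
    scale_k = np.abs(k).max() / (2 ** (est_bits - 1) - 1)
    est_scores = np.round(q / scale_q) @ np.round(k / scale_k).T  # (n, n)
    # Per-row threshold at the n_keep-th largest estimated score.
    n_keep = max(1, int(keep_ratio * k.shape[0]))
    kth = est_scores.shape[-1] - n_keep
    thresh = np.partition(est_scores, kth, axis=-1)[:, kth:kth + 1]
    return est_scores >= thresh  # boolean mask; True = compute exactly

def sparse_attention(q, k, v, mask):
    """Full-precision attention evaluated only where the mask is True."""
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)  # masked positions contribute 0
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: 8 tokens, 16-dimensional head.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
out = sparse_attention(q, k, v, low_overhead_mask(q, k))
```

The point of estimating scores at a few bits is that the mask itself costs far less energy than the full-precision attention it prunes, which is the property the abstract attributes to LMG.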
| Original language | English |
|---|---|
| Title of host publication | 2025 Design, Automation and Test in Europe Conference, DATE 2025 - Proceedings |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9783982674100 |
| DOIs | |
| State | Published - 2025 |
| Event | 2025 Design, Automation and Test in Europe Conference, DATE 2025 - Lyon, France (31 Mar 2025 → 2 Apr 2025) |
Publication series
| Name | Proceedings - Design, Automation and Test in Europe, DATE |
|---|---|
| ISSN (Print) | 1530-1591 |
Conference
| Conference | 2025 Design, Automation and Test in Europe Conference, DATE 2025 |
|---|---|
| Country/Territory | France |
| City | Lyon |
| Period | 31/03/25 → 02/04/25 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 7 Affordable and Clean Energy
Keywords
- approximate computing
- dynamic sparse attention
- energy-efficient design
- transformer accelerator