TY - GEN
T1 - Length Extrapolation of Transformers
T2 - 2024 Findings of the Association for Computational Linguistics, EMNLP 2024
AU - Zhao, Liang
AU - Feng, Xiachong
AU - Feng, Xiaocheng
AU - Zhong, Weihong
AU - Xu, Dongliang
AU - Yang, Qing
AU - Liu, Hongtao
AU - Qin, Bing
AU - Liu, Ting
N1 - Publisher Copyright: © 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
N2 - Built upon the Transformer, large language models (LLMs) have captured worldwide attention due to their remarkable abilities. Nevertheless, all Transformer-based models including LLMs suffer from a preset length limit and can hardly generalize from short training sequences to longer inference ones, namely, they cannot perform length extrapolation to handle long sequences, which severely hinders their application in scenarios demanding long input sequences such as legal or scientific documents. Thus, numerous methods have emerged to enhance the length extrapolation of Transformers. Despite the great research efforts, a systematic survey is still lacking. To fill this gap, we delve into these advances in a unified notation from the perspective of positional encoding (PE), as it has been considered the primary factor on length extrapolation. Specifically, we begin with extrapolatable PEs that have dominated this research field. Then, we dive into extrapolation methods based on them, covering position interpolation and randomized position methods. Finally, several challenges and future directions in this area are highlighted. Through this survey, we aim to enable the reader to gain a deep understanding of existing methods and provide stimuli for future research.
AB - Built upon the Transformer, large language models (LLMs) have captured worldwide attention due to their remarkable abilities. Nevertheless, all Transformer-based models including LLMs suffer from a preset length limit and can hardly generalize from short training sequences to longer inference ones, namely, they cannot perform length extrapolation to handle long sequences, which severely hinders their application in scenarios demanding long input sequences such as legal or scientific documents. Thus, numerous methods have emerged to enhance the length extrapolation of Transformers. Despite the great research efforts, a systematic survey is still lacking. To fill this gap, we delve into these advances in a unified notation from the perspective of positional encoding (PE), as it has been considered the primary factor on length extrapolation. Specifically, we begin with extrapolatable PEs that have dominated this research field. Then, we dive into extrapolation methods based on them, covering position interpolation and randomized position methods. Finally, several challenges and future directions in this area are highlighted. Through this survey, we aim to enable the reader to gain a deep understanding of existing methods and provide stimuli for future research.
UR - https://www.scopus.com/pages/publications/85217617757
U2 - 10.18653/v1/2024.findings-emnlp.582
DO - 10.18653/v1/2024.findings-emnlp.582
M3 - Conference contribution
AN - SCOPUS:85217617757
T3 - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
SP - 9959
EP - 9977
BT - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
A2 - Al-Onaizan, Yaser
A2 - Bansal, Mohit
A2 - Chen, Yun-Nung
PB - Association for Computational Linguistics (ACL)
Y2 - 12 November 2024 through 16 November 2024
ER -