TY - GEN
T1 - Open-ended Long Text Generation via Masked Language Modeling
AU - Liang, Xiaobo
AU - Tang, Zecheng
AU - Li, Juntao
AU - Zhang, Min
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - Pre-trained autoregressive (AR) language models such as BART and GPTs have dominated Open-ended Long Text Generation (Open-LTG). However, the AR nature will decrease the inference efficiency along with the increase of generation length, which hinder their application in Open-LTG. To improve inference efficiency, we alternatively explore the potential of the pre-trained masked language models (MLMs) along with a representative iterative non-autoregressive (NAR) decoding strategy for Open-LTG. Our preliminary study shows that pre-trained MLMs can merely generate short text and will collapse for long text modeling. To enhance the long text generation capability of MLMs, we introduce two simple yet effective strategies for the iterative NAR model: dynamic sliding window attention (DSWA) and linear temperature decay (LTD). It can alleviate long-distance collapse problems and achieve longer text generation with a flexible trade-off between performance and inference speedup. Experiments on the storytelling and multi-paragraph opinionated article writing tasks show that pre-trained MLMs can achieve more than 3 × → 13 × speedup with better performance than strong AR models. Our code is available at GitHub.
AB - Pre-trained autoregressive (AR) language models such as BART and GPTs have dominated Open-ended Long Text Generation (Open-LTG). However, the AR nature will decrease the inference efficiency along with the increase of generation length, which hinder their application in Open-LTG. To improve inference efficiency, we alternatively explore the potential of the pre-trained masked language models (MLMs) along with a representative iterative non-autoregressive (NAR) decoding strategy for Open-LTG. Our preliminary study shows that pre-trained MLMs can merely generate short text and will collapse for long text modeling. To enhance the long text generation capability of MLMs, we introduce two simple yet effective strategies for the iterative NAR model: dynamic sliding window attention (DSWA) and linear temperature decay (LTD). It can alleviate long-distance collapse problems and achieve longer text generation with a flexible trade-off between performance and inference speedup. Experiments on the storytelling and multi-paragraph opinionated article writing tasks show that pre-trained MLMs can achieve more than 3 × → 13 × speedup with better performance than strong AR models. Our code is available at GitHub.
UR - https://www.scopus.com/pages/publications/85174000323
U2 - 10.18653/v1/2023.acl-long.13
DO - 10.18653/v1/2023.acl-long.13
M3 - 会议稿件
AN - SCOPUS:85174000323
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 223
EP - 241
BT - Long Papers
PB - Association for Computational Linguistics (ACL)
T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Y2 - 9 July 2023 through 14 July 2023
ER -