Incremental pre-training from smaller language models

Han Zhang, Hui Wang*, Ruifeng Xu
*Corresponding author for this work

  • Harbin Institute of Technology Shenzhen
  • Peng Cheng Laboratory
  • Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Large language models have recently become a new learning paradigm and have led to state-of-the-art performance across a range of tasks. As the number of open-source pre-trained models grows rapidly, it is worth investigating how to better utilize existing models. We propose a simple yet effective method, Incr-Pretrain, for incrementally pre-training language models from smaller, well-trained source models. We introduce different layer-wise transfer strategies for model augmentation, including parameter copying, initial value padding, and model distillation. Experiments on multiple zero-shot learning tasks demonstrate satisfactory inference performance immediately after transfer and promising training efficiency during continued pre-training. Compared to training from scratch, Incr-Pretrain can save up to half the training time needed to reach a similar test loss.
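
As a rough illustration of what parameter copying with initial value padding could look like, the sketch below grows a small weight matrix into a larger one and maps the layers of a shallow model onto a deeper one. It is not the Incr-Pretrain implementation: the function names `grow_matrix` and `layer_mapping`, the Gaussian padding with `pad_std=0.02`, and the proportional layer mapping are all assumptions made for this example, and the distillation strategy is not covered.

```python
import random


def grow_matrix(small, new_rows, new_cols, pad_std=0.02, seed=0):
    """Return a new_rows x new_cols matrix whose top-left block is `small`.

    Entries outside the copied block are padded with small Gaussian noise.
    The padding initializer is an assumption; the abstract does not name one.
    """
    rows, cols = len(small), len(small[0])
    if new_rows < rows or new_cols < cols:
        raise ValueError("target shape must be at least the source shape")
    rng = random.Random(seed)
    # Initial value padding: start from freshly initialized values everywhere.
    large = [[rng.gauss(0.0, pad_std) for _ in range(new_cols)]
             for _ in range(new_rows)]
    # Parameter copying: overwrite the top-left block with the source weights.
    for i in range(rows):
        for j in range(cols):
            large[i][j] = small[i][j]
    return large


def layer_mapping(src_depth, tgt_depth):
    """For each target layer, pick the source layer to copy from.

    Uses a proportional mapping (hypothetical choice for this sketch).
    """
    if src_depth < 1 or tgt_depth < src_depth:
        raise ValueError("target depth must be at least the source depth")
    return [k * src_depth // tgt_depth for k in range(tgt_depth)]


small = [[1.0, 2.0], [3.0, 4.0]]
large = grow_matrix(small, new_rows=3, new_cols=4)
mapping = layer_mapping(src_depth=2, tgt_depth=4)
```

Here `large` keeps the source weights in its top-left 2 x 2 block, and `mapping` indicates which source layer would seed each of the four target layers. A real model would apply the same idea per tensor (embeddings, attention, feed-forward) before continuing pre-training.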

Original language: English
Title of host publication: SIGHAN 2024 - 10th SIGHAN Workshop on Chinese Language Processing, Proceedings of the Workshop
Editors: Kam-Fai Wong, Min Zhang, Ruifeng Xu, Jing Li, Zhongyu Wei, Lin Gui, Bin Liang, Runcong Zhao
Publisher: Association for Computational Linguistics (ACL)
Pages: 36-44
Number of pages: 9
ISBN (Electronic): 9798891761551
State: Published - 2024
Externally published: Yes
Event: 10th SIGHAN Workshop on Chinese Language Processing, SIGHAN 2024 - Bangkok, Thailand
Duration: 16 Aug 2024 → …

Publication series

Name: SIGHAN 2024 - 10th SIGHAN Workshop on Chinese Language Processing, Proceedings of the Workshop

Conference

Conference: 10th SIGHAN Workshop on Chinese Language Processing, SIGHAN 2024
Country/Territory: Thailand
City: Bangkok
Period: 16/08/24 → …
