Explanation Guided Knowledge Distillation for Pre-trained Language Model Compression

  • Zhao Yang
  • Yuanzhe Zhang*
  • Dianbo Sui
  • Yiming Ju
  • Jun Zhao
  • Kang Liu*
  • *Corresponding author for this work
  • University of Chinese Academy of Sciences
  • CAS - Institute of Automation
  • School of Computer Science and Technology (School of Software), Harbin Institute of Technology Weihai

Research output: Contribution to journal › Article › peer-review

Abstract

Knowledge distillation is widely used in pre-trained language model compression, which can transfer knowledge from a cumbersome model to a lightweight one. Though knowledge distillation based model compression has achieved promising performance, we observe that explanations between the teacher model and the student model are not consistent. We argue that the student model should study not only the predictions of the teacher model but also the internal reasoning process. To this end, we propose Explanation Guided Knowledge Distillation (EGKD) in this article, which utilizes explanations to represent the thinking process and improve knowledge distillation. To obtain explanations in our distillation framework, we select three typical explanation methods rooted in different mechanisms, namely gradient-based, perturbation-based, and feature selection methods. Then, to improve computational efficiency, we propose different optimization strategies to utilize the explanations obtained by these three different explanation methods, which could provide the student model with better learning guidance. Experimental results on GLUE demonstrate that leveraging explanations can improve the performance of the student model. Moreover, our EGKD could also be applied to model compression with different architectures.
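The abstract's central observation, that a student can match the teacher's predictions while explaining them differently, can be illustrated with a toy example. What follows is a minimal sketch under stated assumptions, not the formulation from the article: the linear "models", the occlusion-style (perturbation-based) explanation, the mean-squared alignment term, the weight `lam`, and all function names are hypothetical choices made for illustration. Only the temperature-scaled KL term follows the standard knowledge distillation recipe.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Standard distillation term: KL(teacher || student) on softened outputs."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def linear_logits(weights, features):
    """Toy stand-in for a model: one weight row per class (assumption)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def occlusion_explanation(weights, features, target):
    """Perturbation-based importance: drop in the target-class probability
    when a single input feature is zeroed out (illustrative choice)."""
    base = softmax(linear_logits(weights, features))[target]
    scores = []
    for i in range(len(features)):
        occluded = list(features)
        occluded[i] = 0.0
        scores.append(base - softmax(linear_logits(weights, occluded))[target])
    return scores


def l2_normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def explanation_loss(teacher_expl, student_expl):
    """Hypothetical alignment term: MSE between normalized explanations."""
    t = l2_normalize(teacher_expl)
    s = l2_normalize(student_expl)
    return sum((a - b) ** 2 for a, b in zip(t, s)) / len(t)


def combined_loss(teacher_w, student_w, features, temperature=2.0, lam=1.0):
    """Prediction matching plus explanation matching (sketch only)."""
    t_logits = linear_logits(teacher_w, features)
    s_logits = linear_logits(student_w, features)
    # Explain the class the teacher predicts.
    target = max(range(len(t_logits)), key=t_logits.__getitem__)
    kd = kd_loss(t_logits, s_logits, temperature)
    ex = explanation_loss(
        occlusion_explanation(teacher_w, features, target),
        occlusion_explanation(student_w, features, target),
    )
    return kd + lam * ex


# Both toy models produce the same logits on this input,
# but they rely on different input features to do so.
teacher_w = [[2.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
student_w = [[0.0, 2.0, 1.0], [1.0, 0.0, 0.0]]
features = [1.0, 1.0, 1.0]

prediction_gap = kd_loss(linear_logits(teacher_w, features),
                         linear_logits(student_w, features))
explanation_gap = explanation_loss(
    occlusion_explanation(teacher_w, features, 0),
    occlusion_explanation(student_w, features, 0),
)
```

In this toy setup the prediction-matching term vanishes because the two models output identical logits, while the explanation term stays positive because they attribute the prediction to different features. A loss that only compares predictions would therefore see nothing left to correct, which is the kind of inconsistency the abstract describes.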

Original language: English
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Volume: 23
Issue number: 2
State: Published - 8 Feb 2024
Externally published: Yes

Keywords

  • Explanation
  • knowledge distillation
  • model compression
