
Accelerating Neural Architecture Search for Natural Language Processing with Knowledge Distillation and Earth Mover's Distance

  • Jianquan Li
  • Xiaokang Liu
  • Sheng Zhang
  • Min Yang*
  • Ruifeng Xu
  • Fengqing Qin

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recent AI research has witnessed increasing interest in automatically designing the architecture of deep neural networks, a direction coined neural architecture search (NAS). Network architectures found automatically by NAS methods have outperformed manually designed architectures on some NLP tasks. However, training a large number of model configurations for efficient NAS is computationally expensive, creating a substantial barrier to applying NAS methods in real-life applications. In this paper, we propose to accelerate neural architecture search for natural language processing with knowledge distillation (called KD-NAS). Specifically, instead of searching for the optimal network architecture on the validation set conditioned on the optimal network weights learned on the training set, we learn the optimal network by minimizing the knowledge loss transferred from a pre-trained teacher network to the searched network based on Earth Mover's Distance (EMD). Experiments on five datasets show that our method achieves promising performance compared to strong competitors in terms of both accuracy and search speed. For reproducibility, we release the code at: https://github.com/lxk00/KD-NAS-EMD.
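To make the EMD-based distillation objective concrete, below is a minimal sketch (not the authors' released code; see the repository above for the actual implementation) of how a knowledge loss between a teacher's and a candidate student's layer representations could be computed with an entropy-regularized approximation of the Earth Mover's Distance. The function name `emd_distillation_loss`, the uniform layer weights, and the Sinkhorn-style solver are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch: EMD-style knowledge-distillation loss between the layer
# outputs of a teacher network and a (typically smaller) searched network.
# Assumptions: uniform mass per layer, MSE as the ground cost, and Sinkhorn
# iterations as an approximation of the exact Earth Mover's Distance.
import torch


def emd_distillation_loss(teacher_layers, student_layers,
                          n_iters: int = 50, eps: float = 0.1):
    """teacher_layers / student_layers: lists of [batch, seq, hidden] tensors.

    Returns a scalar loss: the entropy-regularized EMD between the two sets
    of layer representations, with pairwise MSE as the transport cost.
    """
    m, n = len(teacher_layers), len(student_layers)

    # Cost matrix: MSE between every teacher layer and every student layer.
    cost = torch.stack([
        torch.stack([torch.mean((t - s) ** 2) for s in student_layers])
        for t in teacher_layers
    ])

    # Uniform mass on each side (layer-importance weights could be learned).
    mu = torch.full((m,), 1.0 / m)
    nu = torch.full((n,), 1.0 / n)

    # Sinkhorn iterations yield an (approximate) optimal transport plan.
    K = torch.exp(-cost / eps)
    u = torch.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (K.t() @ u)
        u = mu / (K @ v)
    plan = torch.diag(u) @ K @ torch.diag(v)

    # EMD loss = total cost moved under the transport plan.
    return torch.sum(plan * cost)


if __name__ == "__main__":
    # Toy usage: a 6-layer teacher distilled into a 3-layer candidate network.
    teacher = [torch.randn(2, 8, 16) for _ in range(6)]
    student = [torch.randn(2, 8, 16) for _ in range(3)]
    print(emd_distillation_loss(teacher, student))
```

Because every candidate architecture is scored directly against the teacher in this way, the search does not need to fully train each configuration before evaluating it on a validation set, which is where the speed-up described in the abstract comes from.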

Original language: English
Title of host publication: SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery, Inc
Pages: 2091-2095
Number of pages: 5
ISBN (Electronic): 9781450380379
State: Published - 11 Jul 2021
Externally published: Yes
Event: 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2021 - Virtual, Online, Canada
Duration: 11 Jul 2021 - 15 Jul 2021

Publication series

Name: SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval

Conference

Conference: 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2021
Country/Territory: Canada
City: Virtual, Online
Period: 11/07/21 - 15/07/21

Keywords

  • earth mover's distance
  • knowledge distillation
  • neural architecture search

