TY - GEN
T1 - Accelerating Neural Architecture Search for Natural Language Processing with Knowledge Distillation and Earth Mover's Distance
AU - Li, Jianquan
AU - Liu, Xiaokang
AU - Zhang, Sheng
AU - Yang, Min
AU - Xu, Ruifeng
AU - Qin, Fengqing
N1 - Publisher Copyright:
© 2021 ACM.
PY - 2021/7/11
Y1 - 2021/7/11
AB - Recent AI research has witnessed increasing interest in automatically designing the architectures of deep neural networks, a task coined neural architecture search (NAS). Architectures discovered by NAS methods have outperformed manually designed ones on some NLP tasks. However, training the large number of candidate model configurations required for effective NAS is computationally expensive, creating a substantial barrier to applying NAS methods in real-world applications. In this paper, we propose to accelerate neural architecture search for natural language processing with knowledge distillation (called KD-NAS). Specifically, instead of searching for the optimal network architecture on the validation set conditioned on the optimal network weights learned on the training set, we learn the optimal network by minimizing the knowledge loss transferred from a pre-trained teacher network to the candidate network, measured by Earth Mover's Distance (EMD). Experiments on five datasets show that our method achieves promising performance compared to strong competitors in both accuracy and search speed. For reproducibility, we release the code at: https://github.com/lxk00/KD-NAS-EMD.
KW - earth mover's distance
KW - knowledge distillation
KW - neural architecture search
UR - https://www.scopus.com/pages/publications/85111622480
U2 - 10.1145/3404835.3463017
DO - 10.1145/3404835.3463017
M3 - Conference contribution
AN - SCOPUS:85111622480
T3 - SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 2091
EP - 2095
BT - SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery, Inc
T2 - 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2021
Y2 - 11 July 2021 through 15 July 2021
ER -