TY - GEN
T1 - Retrieval-free Knowledge Injection through Multi-Document Traversal for Dialogue Models
AU - Wang, Rui
AU - Bao, Jianzhu
AU - Mi, Fei
AU - Chen, Yi
AU - Wang, Hongru
AU - Wang, Yasheng
AU - Li, Yitong
AU - Shang, Lifeng
AU - Wong, Kam-Fai
AU - Xu, Ruifeng
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - Dialogue models are often enriched with extensive external knowledge to provide informative responses through a retrieval-augmented pipeline. Nevertheless, retrieval-augmented approaches rely on finely annotated retrieval training data and knowledge-grounded response generation data, making them costly to transfer. To tackle this challenge, this paper proposes a retrieval-free approach, KiDG, which automatically turns knowledge documents into simulated multi-turn dialogues through a Multi-Document Traversal algorithm. The simulated knowledge-intensive dialogues constructed by KiDG in one domain can easily be used to train and enhance pre-trained dialogue models' knowledge of that domain without costly annotation. We conduct extensive experiments comparing retrieval-augmented models and a variety of retrieval-free models. We find that dialogue models enhanced with data simulated by KiDG largely outperform state-of-the-art retrieval-free methods and achieve performance comparable to retrieval-augmented methods, while being better and cheaper at domain transfer. We have released the code and data at https://github.com/DevoAllen/KiDG.
UR - https://www.scopus.com/pages/publications/85174369318
U2 - 10.18653/v1/2023.acl-long.364
DO - 10.18653/v1/2023.acl-long.364
M3 - Conference contribution
AN - SCOPUS:85174369318
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 6608
EP - 6619
BT - Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
PB - Association for Computational Linguistics (ACL)
T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Y2 - 9 July 2023 through 14 July 2023
ER -