Skip to main navigation Skip to search Skip to main content

RLKGF: Reinforcement Learning from Knowledge Graph Feedback Without Human Annotations

  • Lian Yan
  • , Chen Tang
  • , Yi Guan
  • , Haotian Wang
  • , Songyuan Wang
  • , Haifeng Liu
  • , Yang Yang
  • , Jingchi Jiang*
  • *Corresponding author for this work
  • Harbin Institute of Technology
  • Institute for Advanced Algorithms Research
  • Changchun University of Science and Technology

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Reinforcement Learning from Human Feedback (RLHF) has been shown to effectively align large language models (LLMs) with human knowledge. However, the lack of human preference labels remains a significant bottleneck when applying RLHF to a downstream domain. Humans in RLHF play a critical role in injecting reasoning preferences into LLM, and we assume the reasoning process underlying human assessments may potentially be replaced by reasoning pathways derived from Knowledge Graphs (KGs). Inspired by this assumption, we propose Reinforcement Learning from Knowledge Graph Feedback (RLKGF), a novel method that leverages KG semantics and structure to derive RL rewards in the absence of manual annotations. Unlike Reinforcement Learning from AI Feedback (RLAIF), RLKGF directly integrates human priors encoded in KGs as the reward model, aligning LLM responses with expert knowledge without additional preference labeling or reward model training. RLKGF structures context-relevant facts into knowledge subgraphs and defines rewards by simulating information flow across semantic and logical connections between question and candidate response entities. Experiments on three public and one private medical dialogue dataset demonstrate that RLKGF significantly outperforms the competitive RLAIF in improving LLM diagnostic accuracy. The code is available at https://github.com/YanPioneer/RLKGF.

Original languageEnglish
Title of host publicationFindings of the Association for Computational Linguistics
Subtitle of host publicationACL 2025
EditorsWanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
PublisherAssociation for Computational Linguistics (ACL)
Pages6619-6633
Number of pages15
ISBN (Electronic)9798891762565
DOIs
StatePublished - 2025
Event63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria
Duration: 27 Jul 20251 Aug 2025

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print)0736-587X

Conference

Conference63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Country/TerritoryAustria
CityVienna
Period27/07/251/08/25

Fingerprint

Dive into the research topics of 'RLKGF: Reinforcement Learning from Knowledge Graph Feedback Without Human Annotations'. Together they form a unique fingerprint.

Cite this