The Visual Prism: Refracting Images into Parallel Multilingual Descriptions with Structured Visual Guidance

  • Authors: Chengpeng Fu, Xiaocheng Feng*, Yichong Huang, Wenshuai Huo, Baohang Li, Yang Xiang, Ting Liu
  • *Corresponding author for this work
  • Affiliations: Harbin Institute of Technology; Peng Cheng Laboratory

Research output: Contribution to journal, Conference article, peer-reviewed

Abstract

Parallel corpora, the foundation of machine translation, remain crucial for pre-training and fine-tuning even in the era of large language models (LLMs). However, annotating parallel corpora is extremely costly, as it requires annotators proficient in multiple languages. To reduce this cost, prior work has explored image-pivoted corpus synthesis, which generates multilingual captions for the same image and treats them as pseudo-parallel data. Unfortunately, these pseudo-parallel corpora suffer from multilingual focus divergence: the model attends to different aspects of the image when generating captions in different languages. To address this problem, we propose PRISMS (Parallel Refracting ImageS into Multilingual descriptions with Structured visual guidance), a method that uses semantic graphs as structured visual guidance to unify the focus of multilingual captions. To ensure adherence to this guidance, we introduce two key techniques: supervised fine-tuning on self-generated instructional data, and reinforcement learning with a reward signal based on semantic graph consistency. Experimental results on five languages show that PRISMS significantly improves image-pivoted parallel corpus synthesis, enabling LLMs to achieve translation performance comparable to that of models trained on manually annotated corpora.
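
The abstract does not say how semantic graph consistency is scored. As a rough illustration only, one way such a reward could be sketched is as the F1 overlap between the triples of the guidance graph and the triples recovered from a generated caption. The triple representation, the function name `graph_consistency_reward`, and the F1 scoring are all assumptions made for this sketch, not the paper's definition; how triples would be extracted from a caption is left out entirely.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a semantic graph as (subject, relation, object) triples.
Triple = Tuple[str, str, str]


def _normalize(triples: Iterable[Triple]) -> Set[Triple]:
    """Lowercase and strip each element so trivial surface differences do not count."""
    return {tuple(part.strip().lower() for part in triple) for triple in triples}


def graph_consistency_reward(
    guidance: Iterable[Triple], caption_graph: Iterable[Triple]
) -> float:
    """Assumed reward: F1 overlap between guidance triples and caption triples.

    Returns a value in [0, 1]; 1.0 means the caption graph matches the guidance exactly.
    """
    ref = _normalize(guidance)
    hyp = _normalize(caption_graph)
    overlap = len(ref & hyp)
    if overlap == 0:
        # Covers empty graphs and graphs with no shared triple.
        return 0.0
    precision = overlap / len(hyp)  # share of caption triples supported by the guidance
    recall = overlap / len(ref)     # share of guidance triples covered by the caption
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    guidance = [("dog", "chases", "ball"), ("ball", "on", "grass")]
    caption_graph = [("Dog", "chases", "ball")]
    print(graph_consistency_reward(guidance, caption_graph))
```

Under this assumed scoring, captions in different languages that all cover the same guidance triples would receive similar rewards, which is the kind of shared focus the abstract describes.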

Original language: English
Pages (from-to): 30744-30752
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 40
Issue number: 36
State: Published - 2026
Event: 40th AAAI Conference on Artificial Intelligence, AAAI 2026 - Singapore, Singapore
Duration: 20 Jan 2026 to 27 Jan 2026
