
TFIGF: Fire data augmentation model based on text-to-image synthesis

  • Hongyang Zhao
  • Yanan Guo*
  • Xingdong Li
  • Yi Liu
  • Jing Jin

*Corresponding author for this work

Affiliations: Northeast Forestry University; Harbin Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Data imbalance is one of the most challenging issues in deep learning, particularly in the domain of fire detection. In this field, the number of non-fire images significantly exceeds that of fire images, and the diversity of background information in images poses substantial challenges to fire detection. Recently, there have been significant advancements in generating images from textual descriptions using large language models. Inspired by this progress, this paper proposes an innovative Text-to-Image Fire Image Generation Framework (TFIGF). The framework addresses the problem of insufficient model training caused by a lack of adequate positive samples: it generates fire images with varied backgrounds, thereby enhancing the efficiency and accuracy of fire detection. TFIGF consists of a front-end image generator and a back-end image filter. The image generator, comprising a feature fusion component, a CLIP image encoder based on the Vision Transformer (ViT), and a feature generation segment, merges textual information with the prior knowledge in the pre-trained CLIP-ViT model to produce images, enhancing the relevance and diversity of the generated images. Images produced by the generator are then evaluated by the image filter to retain the fire images most congruent with the textual descriptions. The filter converts the generated visual information into textual descriptions using ViT and GPT-3, and measures the alignment between each generated image and the input text using cosine similarity. The proposed method generates higher-quality images than state-of-the-art image generation methods. Furthermore, to verify the improvements in accuracy and reliability of fire detection with images generated by TFIGF, we constructed datasets augmented to various sizes, trained several popular detection models on these datasets, and tested them with real-world data. Experimental results demonstrate that images generated by TFIGF significantly enhance network performance in fire detection, confirming the framework's potential and practicality in addressing data imbalance issues.
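The filtering step described in the abstract reduces to a cosine-similarity comparison between an embedding of the input prompt and an embedding of the caption produced for each generated image. A minimal sketch of that comparison is shown below; the `threshold` value, the function names, and the idea of pre-computed embedding vectors are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_generated_images(text_embedding, caption_embeddings, threshold=0.8):
    """Return indices of generated images whose caption embedding is
    sufficiently aligned with the input-text embedding.

    text_embedding     : embedding of the original text prompt
    caption_embeddings : one embedding per generated image's caption
                         (e.g. produced by a ViT + GPT captioner upstream)
    threshold          : illustrative cutoff, not the paper's value
    """
    return [i for i, cap in enumerate(caption_embeddings)
            if cosine_similarity(text_embedding, cap) >= threshold]

# Toy example with 3-D embeddings: only the first caption aligns.
prompt = [1.0, 0.0, 0.0]
captions = [[0.9, 0.1, 0.0],   # close to the prompt
            [0.0, 1.0, 0.0]]   # orthogonal to the prompt
kept = filter_generated_images(prompt, captions)
```

In practice the embeddings would come from a shared text encoder (the paper uses CLIP's), so that prompt and caption vectors live in the same space before the similarity is measured.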

Original language: English
Article number: 132912
Journal: Neurocomputing
Volume: 675
DOIs
State: Published - 28 Apr 2026

Keywords

  • Data augmentation
  • Fire detection
  • Generative model
  • Text to image
