Abstract
Data imbalance is one of the most challenging issues in deep learning, particularly in fire detection, where non-fire images far outnumber fire images and the diversity of image backgrounds further complicates detection. Recent work has made significant advances in generating images from textual descriptions using large language models. Inspired by this progress, this paper proposes a Text-to-Image Fire Image Generation Framework (TFIGF). The framework addresses insufficient model training caused by a shortage of positive samples by generating fire images with varied backgrounds, thereby improving the efficiency and accuracy of fire detection. TFIGF consists of a front-end image generator and a back-end image filter. The image generator comprises a feature fusion component, a CLIP image encoder based on the Vision Transformer (ViT), and a feature generation segment; it merges textual information with the prior knowledge in the pre-trained CLIP-ViT model to produce images, which enhances the relevance and diversity of the generated images. The image filter then evaluates the generated images and retains the fire images most congruent with the textual descriptions: it converts the generated visual information into textual descriptions using ViT and GPT-3 and measures the alignment between the generated images and the input text using cosine similarity. The proposed method generates higher-quality images than state-of-the-art image generation methods. Furthermore, to verify that images generated by TFIGF improve the accuracy and reliability of fire detection, we constructed datasets augmented to various sizes, trained several popular detection models on them, and tested the models on real-world data. Experimental results demonstrate that images generated by TFIGF significantly enhance network performance in fire detection, confirming the framework's potential and practicality in addressing data imbalance.
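The filtering step described in the abstract (scoring how well a generated image's caption matches the input text with cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the text is embedded or how the score is turned into a keep/discard decision, so the embedding vectors, the `threshold` value, and the function names `cosine_similarity` and `filter_by_alignment` below are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_alignment(
    prompt_vec: Sequence[float],
    caption_vecs: Sequence[Sequence[float]],
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> List[Tuple[int, float]]:
    """Return (image index, score) pairs whose score meets the threshold, best first."""
    scored = [(i, cosine_similarity(prompt_vec, v)) for i, v in enumerate(caption_vecs)]
    kept = [(i, s) for i, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Placeholder embeddings: one for the input prompt, one per generated-image caption.
prompt_vec = [1.0, 0.0, 1.0]
caption_vecs = [
    [1.0, 0.0, 1.0],  # caption that matches the prompt closely
    [0.0, 1.0, 0.0],  # unrelated caption
    [1.0, 0.5, 1.0],  # partially matching caption
]
kept = filter_by_alignment(prompt_vec, caption_vecs, threshold=0.8)
```

In this toy run the unrelated caption is discarded and the remaining images are ranked by score. A real pipeline would obtain the vectors from a text encoder applied to the prompt and to the captions produced for each generated image.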
| Original language | English |
|---|---|
| Article number | 132912 |
| Journal | Neurocomputing |
| Volume | 675 |
| State | Published - 28 Apr 2026 |
Keywords
- Data augmentation
- Fire detection
- Generative model
- Text to image
Fingerprint
Dive into the research topics of 'TFIGF: Fire data augmentation model based on text-to-image synthesis'. Together they form a unique fingerprint.