Abstract
In recent years, pretraining models have driven significant progress across many areas of artificial intelligence. In drug discovery, these models learn generalizable molecular representations from large-scale data, which accelerates the identification of promising drug candidates. However, existing models often rely on atom-based reconstruction to handle molecular structures and frequently overlook substructural details such as functional groups and rings, which are critical for drug design and discovery. These models are also limited in task adaptability, which reduces their precision in interpreting and predicting complex chemical environments. To address these challenges, we introduce MolFinePrompt, a fine-grained multimodal molecular pretraining model that enhances the representational capacity of molecular structures by integrating functional group information into their topological framework. MolFinePrompt is pretrained with a contrastive learning approach on a dataset of 316K molecular structure-text pairs and uses task-specific prompt texts during fine-tuning to improve its understanding of each task. Experimental results on cross-modal retrieval, molecular property prediction, and drug interaction prediction tasks validate the effectiveness of MolFinePrompt.
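The abstract does not spell out the contrastive objective, so the following is only an illustrative sketch. It assumes a CLIP-style symmetric InfoNCE loss over matched molecule-text embedding pairs, which may differ from what MolFinePrompt actually uses. The function names, the temperature value of 0.07, and the toy embeddings are all hypothetical, and embeddings are assumed to be non-zero vectors.

```python
import math


def l2_normalize(v):
    # Scale a non-zero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax_row(row):
    # Numerically stable log-softmax over one row of similarity scores.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(mol_embs, text_embs, temperature=0.07):
    """Assumed CLIP-style loss: mol_embs[i] and text_embs[i] form a matched pair."""
    mols = [l2_normalize(v) for v in mol_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(mols)
    # sims[i][j]: temperature-scaled cosine similarity of molecule i and text j.
    sims = [[sum(a * b for a, b in zip(m, t)) / temperature for t in texts]
            for m in mols]
    # Molecule-to-text direction: the matching text sits on the diagonal.
    mol_to_text = -sum(log_softmax_row(sims[i])[i] for i in range(n)) / n
    # Text-to-molecule direction: same computation on the transposed matrix.
    sims_t = [list(col) for col in zip(*sims)]
    text_to_mol = -sum(log_softmax_row(sims_t[i])[i] for i in range(n)) / n
    return 0.5 * (mol_to_text + text_to_mol)


# Toy 2-D embeddings (hypothetical): matched pairs give a lower loss than swapped pairs.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In this sketch, the loss is small when each molecule embedding is closest to its own text embedding and grows when pairs are mismatched, which is the behavior a contrastive pretraining objective relies on.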
| Original language | English |
|---|---|
| Article number | 114381 |
| Journal | Knowledge-Based Systems |
| Volume | 329 |
| DOIs | |
| State | Published - 4 Nov 2025 |
| Externally published | Yes |
Keywords
- Molecular functional group
- Molecular pre-training
- Multi-modal fusion
- Prompt tuning
Fingerprint
Dive into the research topics of 'Fine-grained multimodal molecular pretraining via prompt learning'. Together they form a unique fingerprint.