Abstract
Multi-modal Large Language Models (MLLMs) integrating images, text, and speech can provide farmers with accurate diagnoses and treatment of pests and diseases, enhancing agricultural efficiency and sustainability. However, existing benchmarks lack comprehensive evaluations, particularly in multi-level reasoning, making it challenging to identify model limitations. To address this issue, we introduce Agri-CM3, an expert-validated benchmark assessing MLLMs' understanding and reasoning in agricultural management. It includes 3,939 images and 15,901 multi-level multiple-choice questions with detailed explanations. Evaluations of 45 MLLMs reveal significant gaps. Even GPT-4o achieves only 63.64% accuracy, falling short in fine-grained reasoning tasks. Analysis across three reasoning levels and seven compositional abilities highlights key challenges in accuracy and cognitive understanding. Our study provides insights for advancing MLLMs in agricultural management, driving their development and application. Code and data are available at https://github.com/HIT-Kwoo/Agri-CM3.
| Original language | English |
|---|---|
| Title of host publication | Long Papers |
| Editors | Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 11729-11754 |
| Number of pages | 26 |
| ISBN (Electronic) | 9798891762510 |
| DOIs | |
| State | Published - 2025 |
| Event | 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria Duration: 27 Jul 2025 → 1 Aug 2025 |
Publication series
| Name | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
|---|---|
| Volume | 1 |
| ISSN (Print) | 0736-587X |
Conference
| Conference | 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 |
|---|---|
| Country/Territory | Austria |
| City | Vienna |
| Period | 27/07/25 → 1/08/25 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
-
SDG 2 Zero Hunger
Fingerprint
Dive into the research topics of 'Agri-CM3: A Chinese Massive Multi-modal, Multi-level Benchmark for Agricultural Understanding and Reasoning'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver