Skip to main navigation Skip to search Skip to main content

Agri-CM3: A Chinese Massive Multi-modal, Multi-level Benchmark for Agricultural Understanding and Reasoning

  • Haotian Wang
  • , Yi Guan
  • , Fanshu Meng
  • , Chao Zhao
  • , Lian Yan
  • , Yang Yang
  • , Jingchi Jiang*
  • *Corresponding author for this work
  • Harbin Institute of Technology
  • Changchun University of Science and Technology

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Multi-modal Large Language Models (MLLMs) integrating images, text, and speech can provide farmers with accurate diagnoses and treatment of pests and diseases, enhancing agricultural efficiency and sustainability. However, existing benchmarks lack comprehensive evaluations, particularly in multi-level reasoning, making it challenging to identify model limitations. To address this issue, we introduce Agri-CM3, an expert-validated benchmark assessing MLLMs' understanding and reasoning in agricultural management. It includes 3,939 images and 15,901 multi-level multiple-choice questions with detailed explanations. Evaluations of 45 MLLMs reveal significant gaps. Even GPT-4o achieves only 63.64% accuracy, falling short in fine-grained reasoning tasks. Analysis across three reasoning levels and seven compositional abilities highlights key challenges in accuracy and cognitive understanding. Our study provides insights for advancing MLLMs in agricultural management, driving their development and application. Code and data are available at https://github.com/HIT-Kwoo/Agri-CM3.

Original languageEnglish
Title of host publicationLong Papers
EditorsWanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
PublisherAssociation for Computational Linguistics (ACL)
Pages11729-11754
Number of pages26
ISBN (Electronic)9798891762510
DOIs
StatePublished - 2025
Event63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria
Duration: 27 Jul 20251 Aug 2025

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
Volume1
ISSN (Print)0736-587X

Conference

Conference63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Country/TerritoryAustria
CityVienna
Period27/07/251/08/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 2 - Zero Hunger
    SDG 2 Zero Hunger

Fingerprint

Dive into the research topics of 'Agri-CM3: A Chinese Massive Multi-modal, Multi-level Benchmark for Agricultural Understanding and Reasoning'. Together they form a unique fingerprint.

Cite this