Cross-Encoder-Based Semantic Evaluation of Extractive and Generative Question Answering in Low-Resourced African Languages

  • Funebi Francis Ijebu*
  • Yuanchao Liu
  • Chengjie Sun
  • Nobert Jere
  • Ibomoiye Domor Mienye
  • Udoinyang Godwin Inyang
  • *Corresponding author for this work
  • School of Computer Science and Technology, Harbin Institute of Technology
  • University of Uyo
  • University of Fort Hare
  • University of Johannesburg

Research output: Contribution to journal › Article › peer-review

Abstract

Efficient language analysis techniques and models are crucial in the artificial intelligence age for enhancing cross-lingual question answering. Transfer learning with state-of-the-art models has been beneficial in this regard, but the performance of low-resource African languages with morphologically rich grammatical structures and unique typologies has shown deficiencies linkable to evaluation techniques and scarce training data. To enhance the former, this paper proposes an evaluation pipeline leveraging the semantic answer similarity method enhanced with automatic answer annotation. The pipeline uses the Language-agnostic BERT Sentence Embedding model integrated with an adapted vector measure to perform cross-lingual text analysis after answer prediction. Experimental results from the multilingual-T5 and AfroXLMR models on nine languages of the AfriQA dataset surpassed existing benchmarks deploying string-based methods for question answer evaluation. The results are also superior to the F1-score-based GPT4 and Llama-2 performances on the same downstream task. The automatic answer annotation technique effectively reduced the labelling time while maintaining a high performance. Thus, the proposed pipeline is more efficient than the prevailing string-based F1 and Exact Match metrics in mixed answer type question–answer evaluations, and it is a more natural performance estimator for models targeting real-world deployment.
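The contrast the abstract draws between string-based metrics and semantic answer similarity can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the plain cosine similarity (standing in for the paper's "adapted vector measure", which the abstract does not specify), and the 0.8 threshold are all assumptions made for illustration. The embedding vectors are passed in as plain lists, so the sketch does not depend on any particular encoder.

```python
import math
import string
from collections import Counter


def normalize(text):
    """Lowercase, split on whitespace, and trim ASCII punctuation from each token."""
    tokens = (t.strip(string.punctuation) for t in text.lower().split())
    return [t for t in tokens if t]


def exact_match(prediction, gold):
    """String-based Exact Match: 1.0 only if the normalized token lists are identical."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction, gold):
    """String-based F1 over the multiset of tokens shared by prediction and gold."""
    pred, ref = normalize(prediction), normalize(gold)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_match(pred_vec, gold_vec, threshold=0.8):
    """Hypothetical decision rule: accept when similarity reaches the threshold.

    The threshold value is an assumption for this sketch, not a value from the paper.
    """
    return cosine_similarity(pred_vec, gold_vec) >= threshold
```

In practice the vectors would come from a multilingual sentence encoder such as the Language-agnostic BERT Sentence Embedding model named in the abstract. A prediction like "in 1960" against the gold answer "1960" fails Exact Match and receives only partial F1 credit, whereas an embedding-based comparison can still treat the two as equivalent if their vectors are close.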

Original language: English
Article number: 119
Journal: Technologies
Volume: 13
Issue number: 3
State: Published - Mar 2025
Externally published: Yes

Keywords

  • cross-lingual question answering
  • extractive question answering
  • large language models
  • low-resourced African languages
  • semantic answer similarity
