Skip to main navigation Skip to search Skip to main content

Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs

  • Hongming Yang
  • , Shi Lin
  • , Jun Shao*
  • , Changting Lin
  • , Donghai Zhu*
  • , Meng Han
  • , Qinglei Kong
  • *Corresponding author for this work
  • Zhejiang Gongshang University
  • Zhejiang University
  • Zhejiang Key Laboratory of Big Data and Future E-Commerce Technology
  • GenTel.io
  • Harbin Institute of Technology Shenzhen

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Lightweight Large Language Models (LwLLMs) are reduced-parameter, optimized models designed to run efficiently on consumer-grade hardware, offering significant advantages in resource efficiency, cost-effectiveness, and data privacy. However, these models often struggle with limited inference and reasoning capabilities, which restrict their performance on complex tasks and limit their practical applicability. Moreover, existing prompt optimization methods typically rely on extensive manual effort or the meta-cognitive abilities of state-of-the-art LLMs, making them less effective for LwLLMs. To address these challenges, we introduce DeBoP, a new Direct Behavior Optimization Paradigm, original from the Chain-of-Thought (CoT) prompting technique. Unlike CoT Prompting, DeBoP is an automatic optimization method, which focuses on the optimization directly on the behavior of LwLLMs. In particular, DeBoP transforms the optimization of complex prompts into the optimization of discrete, quantifiable execution sequences using a gradient-free Monte Carlo Tree Search. We evaluate DeBoP on seven challenging tasks where state-of-the-art LLMs excel but LwLLMs generally underperform. Experimental results demonstrate that DeBoP significantly outperforms recent prompt optimization methods on most tasks. In particular, DeBoP-optimized LwLLMs surpass GPT-3.5 on most tasks while reducing computational time by approximately 60% compared to other automatic prompt optimization methods.

Original languageEnglish
Title of host publicationFindings of the Association for Computational Linguistics
Subtitle of host publicationACL 2025
EditorsWanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
PublisherAssociation for Computational Linguistics (ACL)
Pages19489-19515
Number of pages27
ISBN (Electronic)9798891762565
DOIs
StatePublished - 2025
Externally publishedYes
Event63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025 - Vienna, Austria
Duration: 27 Jul 20251 Aug 2025

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print)0736-587X

Conference

Conference63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Country/TerritoryAustria
CityVienna
Period27/07/251/08/25

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 8 - Decent Work and Economic Growth
    SDG 8 Decent Work and Economic Growth
  2. SDG 12 - Responsible Consumption and Production
    SDG 12 Responsible Consumption and Production

Fingerprint

Dive into the research topics of 'Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs'. Together they form a unique fingerprint.

Cite this