Skip to main navigation Skip to search Skip to main content

Multi-task self-supervised learning for disfluency detection

  • Shaolei Wang
  • , Wanxiang Che*
  • , Qi Liu
  • , Pengda Qin
  • , Ting Liu
  • , William Yang Wang
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Most existing approaches to disfluency detection heavily rely on human-annotated data, which is expensive to obtain in practice. To tackle the training data bottleneck, we investigate methods for combining multiple self-supervised tasksi. e., supervised tasks where data can be collected without manual labeling. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled news data, and propose two self-supervised pre-training tasks: (i) tagging task to detect the added noisy words. (ii) sentence classification to distinguish original sentences from grammatically-incorrect sentences. We then combine these two tasks to jointly train a network. The pre-trained network is then fine-tuned using human-annotated disfluency detection training data. Experimental results on the commonly used English Switchboard test set show that our approach can achieve competitive performance compared to the previous systems (trained using the full dataset) by using less than 1% (1000 sentences) of the training data. Our method trained on the full dataset significantly outperforms previous methods, reducing the error by 21% on English Switchboard.

Original languageEnglish
Title of host publicationAAAI 2020 - 34th AAAI Conference on Artificial Intelligence
PublisherAAAI press
Pages9193-9200
Number of pages8
ISBN (Electronic)9781577358350
StatePublished - 2020
Event34th AAAI Conference on Artificial Intelligence, AAAI 2020 - New York, United States
Duration: 7 Feb 202012 Feb 2020

Publication series

NameAAAI 2020 - 34th AAAI Conference on Artificial Intelligence

Conference

Conference34th AAAI Conference on Artificial Intelligence, AAAI 2020
Country/TerritoryUnited States
CityNew York
Period7/02/2012/02/20

Fingerprint

Dive into the research topics of 'Multi-task self-supervised learning for disfluency detection'. Together they form a unique fingerprint.

Cite this