
Learning to share by masking the non-shared for multi-domain sentiment classification

  • Faculty of Computing, Harbin Institute of Technology
  • Pengcheng Laboratory

Research output: Contribution to journal, Article, peer-review

Abstract

Multi-domain sentiment classification deals with the scenario where labeled data exists for multiple domains but is insufficient for training effective sentiment classifiers that work across domains. Thus, fully exploiting sentiment knowledge shared across domains is crucial for real-world applications. While many existing works try to extract domain-invariant features in high-dimensional space, such models fail to explicitly distinguish between shared and private features at the text level, which to some extent limits their interpretability. Based on the assumption that removing domain-related tokens from texts would help improve their domain invariance, we instead first transform original sentences to be domain-agnostic. To this end, we propose the BERTMasker model, which explicitly masks domain-related words in texts, learns domain-invariant sentiment features from these domain-agnostic texts, and uses the masked words to form domain-aware sentence representations. Experiments on benchmark multi-domain sentiment classification datasets demonstrate the effectiveness of our proposed model, which improves accuracy in the multi-domain and cross-domain settings by 1.91% and 3.31%, respectively. Further analysis of the masking shows that removing domain-related, sentiment-irrelevant tokens decreases the texts’ domain separability, degrading the performance of a BERT-based domain classifier by over 12%.
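
The masking idea in the abstract can be illustrated with a minimal sketch. This is not the BERTMasker implementation: the paper's model decides which tokens to mask, whereas the sketch below assumes a fixed, hand-written set of domain-related words. The function name `mask_domain_words`, the example sentence, and the word set are all hypothetical; only the `[MASK]` token follows BERT's convention.

```python
from typing import List, Set, Tuple

MASK_TOKEN = "[MASK]"  # the mask token used by BERT vocabularies


def mask_domain_words(
    tokens: List[str], domain_words: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split a sentence into a domain-agnostic view and its domain-related words.

    Assumption: `domain_words` is a given lowercase vocabulary. In the paper
    the choice of tokens to mask is made by the model, not by a fixed list.
    """
    agnostic: List[str] = []   # sentence with domain-related words masked
    removed: List[str] = []    # the masked words, kept for a domain-aware view
    for tok in tokens:
        if tok.lower() in domain_words:
            agnostic.append(MASK_TOKEN)
            removed.append(tok)
        else:
            agnostic.append(tok)
    return agnostic, removed


# Hypothetical example from an electronics review.
tokens = "the battery life of this laptop is great".split()
domain_words = {"battery", "laptop"}
agnostic, domain_part = mask_domain_words(tokens, domain_words)
```

In this sketch, `agnostic` keeps the sentiment-bearing word "great" while hiding the product-specific words, and `domain_part` holds the words that a separate domain-aware representation could be built from.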

Original language: English
Pages (from-to): 2711-2724
Number of pages: 14
Journal: International Journal of Machine Learning and Cybernetics
Volume: 13
Issue number: 9
DOIs
State: Published - Sep 2022
Externally published: Yes

Keywords

  • Cross domain
  • Masking
  • Natural language processing
  • Sentiment analysis
