Skip to main navigation Skip to search Skip to main content

Smoothing for bracketing induction

  • Xiangyu Duan
  • , Min Zhang*
  • , Wenliang Chen
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Bracketing induction is the unsupervised learning of hierarchical constituents without labeling their syntactic categories such as verb phrase (VP) from natural raw sentences. Constituent Context Model (CCM) is an effective generative model for the bracketing induction, but the CCM computes probability of a constituent in a very straightforward way no matter how long this constituent is. Such method causes severe data sparse problem because long constituents are more unlikely to appear in test set. To overcome the data sparse problem, this paper proposes to define a non-parametric Bayesian prior distribution, namely the Pitman-Yor Process (PYP) prior, over constituents for constituent smoothing. The PYP prior functions as a back-off smoothing method through using a hierarchical smoothing scheme (HSS). Various kinds of HSS are proposed in this paper. We find that two kinds of HSS are effective, attaining or significantly improving the state-ofthe- art performance of the bracketing induction evaluated on standard treebanks of various languages, while another kind of HSS, which is commonly used for smoothing sequences by ngram Markovization, is not effective for improving the performance of the CCM.

Original languageEnglish
Title of host publicationIJCAI 2013 - Proceedings of the 23rd International Joint Conference on Artificial Intelligence
Pages2085-2091
Number of pages7
StatePublished - 2013
Externally publishedYes
Event23rd International Joint Conference on Artificial Intelligence, IJCAI 2013 - Beijing, China
Duration: 3 Aug 20139 Aug 2013

Publication series

NameIJCAI International Joint Conference on Artificial Intelligence
ISSN (Print)1045-0823

Conference

Conference23rd International Joint Conference on Artificial Intelligence, IJCAI 2013
Country/TerritoryChina
CityBeijing
Period3/08/139/08/13

Fingerprint

Dive into the research topics of 'Smoothing for bracketing induction'. Together they form a unique fingerprint.

Cite this