
A study on corpus content display and IP protection

  • School of Computer Science and Technology, Harbin Institute of Technology
  • Hebei Agricultural University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Corpora play an important role in many research fields, especially natural language processing. Some research demos display detailed corpus content to highlight their contributions while overlooking the security of the corpus. In this paper, we use a crawler to explore the content leakage that results from such content display. A website that displays a corpus is crawled with a simple crawler algorithm combined with several strategies we present. We estimate that over 85% of the corpus can be downloaded, which poses a substantial threat to its IP rights. Finally, we discuss protection measures for content display and offer suggestions for protecting information content through both technology and law.

Original language: English
Title of host publication: Data Science - 4th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2018, Proceedings
Editors: Qinglei Zhou, Hongzhi Wang, Wei Xie, Zeguang Lu, Qiguang Miao, Yan Wang
Publisher: Springer Verlag
Pages: 108-119
Number of pages: 12
ISBN (Print): 9789811322051
DOIs
State: Published - 2018
Externally published: Yes
Event: 4th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2018 - Zhengzhou, China
Duration: 21 Sep 2018 to 23 Sep 2018

Publication series

Name: Communications in Computer and Information Science
Volume: 902
ISSN (Print): 1865-0929

Conference

Conference: 4th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2018
Country/Territory: China
City: Zhengzhou
Period: 21/09/18 to 23/09/18

Keywords

  • Corpus content display
  • Corpus security
  • Information content protection
