TY - GEN
T1 - A large-scale Chinese long-text extractive summarization corpus
AU - Chen, Kai
AU - Fu, Guanyu
AU - Chen, Qingcai
AU - Hu, Baotian
N1 - Publisher Copyright: © 2021 IEEE
PY - 2021
Y1 - 2021
N2 - Recently, large-scale datasets have vastly facilitated development in nearly all domains of Natural Language Processing. However, the lack of a large-scale Chinese corpus is still a critical bottleneck for further research on deep text summarization methods. In this paper, we publish a large-scale Chinese Long-text Extractive Summarization corpus named CLES. CLES contains about 104K pairs, which were originally collected from Sina Weibo. To verify the quality of the corpus, we also manually tagged the relevance score of 5,000 pairs. Our benchmark models on the proposed corpus include conventional deep-learning-based extractive models and several pre-trained BERT-based algorithms. Their performances are reported and briefly analyzed to facilitate further research on the corpus.
AB - Recently, large-scale datasets have vastly facilitated development in nearly all domains of Natural Language Processing. However, the lack of a large-scale Chinese corpus is still a critical bottleneck for further research on deep text summarization methods. In this paper, we publish a large-scale Chinese Long-text Extractive Summarization corpus named CLES. CLES contains about 104K pairs, which were originally collected from Sina Weibo. To verify the quality of the corpus, we also manually tagged the relevance score of 5,000 pairs. Our benchmark models on the proposed corpus include conventional deep-learning-based extractive models and several pre-trained BERT-based algorithms. Their performances are reported and briefly analyzed to facilitate further research on the corpus.
KW - Large Scale
KW - Long-Text
KW - Pre-trained algorithm
KW - Text Summarization
UR - https://www.scopus.com/pages/publications/85115085967
U2 - 10.1109/ICASSP39728.2021.9414946
DO - 10.1109/ICASSP39728.2021.9414946
M3 - Conference contribution
AN - SCOPUS:85115085967
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 7828
EP - 7832
BT - 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021
Y2 - 6 June 2021 through 11 June 2021
ER -