TY - GEN
T1 - GPU-BTM
T2 - 5th IEEE International Conference on Data Science in Cyberspace, DSC 2020
AU - Guo, Yibing
AU - Huang, Yutao
AU - Ding, Ye
AU - Qi, Shuhan
AU - Wang, Xuan
AU - Liao, Qing
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/7
Y1 - 2020/7
AB - Short texts have recently become pervasive in social life. To understand them, researchers have developed topic models that extract topic information. However, conventional topic models mainly target long documents and cannot cope with the sparsity of short text. In this paper, we propose a novel topic model for short text called GPU-BTM, which incorporates the Generalized Pólya Urn (GPU) technique into the Biterm Topic Model (BTM). GPU-BTM uses word similarity information and word co-occurrence patterns simultaneously to handle the sparsity problem. Specifically, the GPU module exploits the similarity among words so that GPU-BTM generates more coherent topics, while the BTM module captures word co-occurrence patterns so that the enriched contexts relieve data sparsity. Experimental results demonstrate that GPU-BTM outperforms four recent baseline models on two real-world short text datasets.
KW - Auxiliary information
KW - Short text
KW - Topic model
UR - https://www.scopus.com/pages/publications/85092060762
U2 - 10.1109/DSC50466.2020.00037
DO - 10.1109/DSC50466.2020.00037
M3 - Conference contribution
AN - SCOPUS:85092060762
T3 - Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020
SP - 198
EP - 205
BT - Proceedings - 2020 IEEE 5th International Conference on Data Science in Cyberspace, DSC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 27 July 2020 through 29 July 2020
ER -