TY - GEN
T1 - Sensitive information detection based on convolution neural network and bi-directional LSTM
AU - Lin, Yan
AU - Xu, Guosheng
AU - Xu, Guoai
AU - Chen, Yudong
AU - Sun, Dawei
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12
Y1 - 2020/12
N2 - Electronic documents carry large amounts of information and are widely used in daily life. When documents containing sensitive information are leaked, individual users, enterprises, and governments can suffer substantial economic losses. Detecting sensitive information to prevent data leakage remains a challenge in the field of information security. This paper focuses on detecting unstructured documents that contain sensitive information. Governments, militaries, and other institutions can use the detection results to mark whether electronic documents contain sensitive information. We propose a reliable method to detect sensitive electronic documents automatically and compare it with other baseline methods. The algorithm structure extracts the characteristics of the data more comprehensively, yielding better detection results. Our model outperformed the other models, reaching 93.44% accuracy. It also reduces time cost, which is beneficial for production use.
AB - Electronic documents carry large amounts of information and are widely used in daily life. When documents containing sensitive information are leaked, individual users, enterprises, and governments can suffer substantial economic losses. Detecting sensitive information to prevent data leakage remains a challenge in the field of information security. This paper focuses on detecting unstructured documents that contain sensitive information. Governments, militaries, and other institutions can use the detection results to mark whether electronic documents contain sensitive information. We propose a reliable method to detect sensitive electronic documents automatically and compare it with other baseline methods. The algorithm structure extracts the characteristics of the data more comprehensively, yielding better detection results. Our model outperformed the other models, reaching 93.44% accuracy. It also reduces time cost, which is beneficial for production use.
KW - Convolutional neural network
KW - Data leak
KW - Information security
KW - Sensitive information prevention
KW - Unstructured documents
UR - https://www.scopus.com/pages/publications/85101296838
U2 - 10.1109/TrustCom50675.2020.00223
DO - 10.1109/TrustCom50675.2020.00223
M3 - Conference contribution
AN - SCOPUS:85101296838
T3 - Proceedings - 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2020
SP - 1614
EP - 1621
BT - Proceedings - 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2020
A2 - Wang, Guojun
A2 - Ko, Ryan
A2 - Bhuiyan, Md Zakirul Alam
A2 - Pan, Yi
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 19th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2020
Y2 - 29 December 2020 through 1 January 2021
ER -