TY - GEN
T1 - Protecting data privacy from being inferred from high dimensional correlated data
AU - Ba, Huafeng
AU - Gao, Xiaoming
AU - Zhang, Xiaofeng
AU - He, Zhenyu
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/10/16
Y1 - 2014/10/16
N2 - In the era of big data, privacy has become a challenging issue that has already attracted considerable research effort. In the literature, most existing privacy-preserving algorithms focus on protecting users' privacy from disclosure by making the set of designated semi-id features indistinguishable. However, how to automatically determine the appropriate semi-id features from high-dimensional correlated data is seldom studied. Therefore, in this paper we first study the problem theoretically and propose the IPFS algorithm to find all possible features forming the candidate semi-id feature set from which users' private information can be inferred. Then, the KIPFS algorithm is proposed to find the key features in the candidate semi-id feature set. By anonymizing this key feature set, called the key inferring privacy features (KIPFS), users' privacy is protected. To evaluate the effectiveness and efficacy of the proposed approach, two state-of-the-art algorithms, K-anonymity and t-closeness, applied to the designated semi-id feature set are chosen as the baseline algorithms, and their revised versions are applied to the KIPFS for performance comparison. The promising results show that, by anonymizing the identified KIPFS, both algorithms achieve better performance than the original ones in terms of efficiency and data quality.
AB - In the era of big data, privacy has become a challenging issue that has already attracted considerable research effort. In the literature, most existing privacy-preserving algorithms focus on protecting users' privacy from disclosure by making the set of designated semi-id features indistinguishable. However, how to automatically determine the appropriate semi-id features from high-dimensional correlated data is seldom studied. Therefore, in this paper we first study the problem theoretically and propose the IPFS algorithm to find all possible features forming the candidate semi-id feature set from which users' private information can be inferred. Then, the KIPFS algorithm is proposed to find the key features in the candidate semi-id feature set. By anonymizing this key feature set, called the key inferring privacy features (KIPFS), users' privacy is protected. To evaluate the effectiveness and efficacy of the proposed approach, two state-of-the-art algorithms, K-anonymity and t-closeness, applied to the designated semi-id feature set are chosen as the baseline algorithms, and their revised versions are applied to the KIPFS for performance comparison. The promising results show that, by anonymizing the identified KIPFS, both algorithms achieve better performance than the original ones in terms of efficiency and data quality.
UR - https://www.scopus.com/pages/publications/84912564901
U2 - 10.1109/WI-IAT.2014.139
DO - 10.1109/WI-IAT.2014.139
M3 - Conference contribution
AN - SCOPUS:84912564901
T3 - Proceedings - 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2014
SP - 495
EP - 502
BT - Proceedings - 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2014
A2 - Skowron, Andrzej
A2 - Dey, Lipika
A2 - Krasuski, Adam
A2 - Li, Yuefeng
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2014
Y2 - 11 August 2014 through 14 August 2014
ER -