TY - GEN
T1 - Euclidean-based entity resolution for evolving data
AU - Lu, Chang
AU - Wang, Hongzhi
AU - Zhang, Yan
AU - Gao, Hong
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2016/2/11
Y1 - 2016/2/11
AB - As large companies and corporations take on an increasing share of data collection, a growing number of researchers have in recent years proposed a variety of algorithms and theories to solve database problems. Although existing solutions are effective in many cases, many problems remain unsolved in database integration, and entity resolution (ER) is one of the most crucial. ER is used in many applications during the updating and loading of big data sets, and evolving data need it most. Evolving data sets are currently widely used in fields such as biology and computer information, where they include microscope observations and biological information. Although researchers have proposed various ER methods, their cost is usually unacceptably high. We use high-dimensional Euclidean vectors to model the states of different entities in a big data set, and we combine this representation with an improved parallel Top-K algorithm to detect the identity of entities more effectively. Theoretical analysis and experimental results show that the proposed method performs entity resolution on evolving data effectively and efficiently.
KW - Entity resolution
KW - Euclidean vector
KW - Top-K
UR - https://www.scopus.com/pages/publications/84963986256
U2 - 10.1109/IMCCC.2015.328
DO - 10.1109/IMCCC.2015.328
M3 - Conference contribution
AN - SCOPUS:84963986256
T3 - Proceedings - 5th International Conference on Instrumentation and Measurement, Computer, Communication, and Control, IMCCC 2015
SP - 1547
EP - 1552
BT - Proceedings - 5th International Conference on Instrumentation and Measurement, Computer, Communication, and Control, IMCCC 2015
A2 - Li, Jun-Bao
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th International Conference on Instrumentation and Measurement, Computer, Communication, and Control, IMCCC 2015
Y2 - 18 September 2015 through 20 September 2015
ER -