
LPV: A log parser based on vectorization for offline and online log parsing

  • Tong Xiao
  • Zhe Quan*
  • Zhi Jie Wang
  • Kaiqi Zhao
  • Xiangke Liao
  • *Corresponding author for this work
  • Hunan University
  • Chongqing University
  • The University of Auckland
  • National University of Defense Technology

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

As the first and foremost step of typical automatic log analysis, log parsing has attracted considerable interest. Most existing studies treat log messages as pure strings and rely on string matching or string distance. In NLP, word2vec has proven very efficient and effective at representing words as low-dimensional vectors. Inspired by this, we propose a novel method called LPV (Log Parser based on Vectorization) for both offline and online log parsing. For offline log parsing, the central idea is to first convert log messages into vectors and measure the similarity between two log messages by the distance between their vectors; log messages can then be grouped by clustering the vectors, and log templates can be extracted from the resulting clusters. For online log parsing, we also assign each log template a form of average vector, so that the similarity between an incoming log message and each log template can likewise be measured by the distance between two vectors. We have conducted extensive experiments on three widely used log datasets, and the results demonstrate that LPV achieves competitive performance compared with state-of-the-art log parsing methods.
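
The pipeline described in the abstract (vectorize messages, compare by vector distance, cluster, extract templates, and keep an average vector per template for online matching) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions that do not come from the paper: a bag-of-words count vector stands in for word2vec embeddings, cosine distance stands in for whatever distance LPV uses, clustering is a simple greedy threshold scheme, and only messages with the same token count may share a template. The names `ToyVectorParser`, `message_vector`, `cosine_distance`, `extract_template`, and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def message_vector(message):
    """Turn a non-empty log message into a sparse vector.

    Assumption: normalized token counts replace the word2vec
    embeddings that the abstract refers to.
    """
    tokens = message.split()
    counts = Counter(tokens)
    return {tok: c / len(tokens) for tok, c in counts.items()}


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return 1.0 - dot / (norm_a * norm_b)


def extract_template(messages):
    """Keep tokens that agree at each position; mask the rest with <*>."""
    rows = [m.split() for m in messages]
    return " ".join(
        col[0] if len(set(col)) == 1 else "<*>" for col in zip(*rows)
    )


class ToyVectorParser:
    """Greedy threshold clustering over message vectors (hypothetical)."""

    def __init__(self, threshold=0.4):
        self.threshold = threshold  # illustrative value, not from the paper
        self.clusters = []

    def add(self, message):
        """Assign one message to a cluster and return that cluster's template.

        Calling this one message at a time mimics online parsing: each
        cluster's centroid plays the role of the template's average vector.
        """
        vec = message_vector(message)
        length = len(message.split())
        best, best_dist = None, self.threshold
        for cluster in self.clusters:
            if cluster["length"] != length:
                continue  # assumption: a template has a fixed token count
            dist = cosine_distance(vec, cluster["centroid"])
            if dist <= best_dist:
                best, best_dist = cluster, dist
        if best is None:
            best = {"messages": [], "centroid": dict(vec), "length": length}
            self.clusters.append(best)
        best["messages"].append(message)
        # Update the running mean of the member vectors.
        k = len(best["messages"])
        keys = set(best["centroid"]) | set(vec)
        best["centroid"] = {
            t: (best["centroid"].get(t, 0.0) * (k - 1) + vec.get(t, 0.0)) / k
            for t in keys
        }
        return extract_template(best["messages"])

    def templates(self):
        """Return the current template of every cluster, in creation order."""
        return [extract_template(c["messages"]) for c in self.clusters]


logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Disk usage at 91 percent",
    "Connection from 10.0.0.3 closed",
    "Disk usage at 75 percent",
]

parser = ToyVectorParser(threshold=0.4)
for line in logs:
    parser.add(line)

templates = parser.templates()
print(templates)
# ['Connection from <*> closed', 'Disk usage at <*> percent']
```

In this sketch the offline and online cases share one code path: feeding a whole file through `add` gives the offline clustering, while calling `add` on each arriving message gives the online behaviour. A real system built on learned embeddings would let semantically related tokens land close together, which count vectors cannot do.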

Original language: English
Title of host publication: Proceedings - 20th IEEE International Conference on Data Mining, ICDM 2020
Editors: Claudia Plant, Haixun Wang, Alfredo Cuzzocrea, Carlo Zaniolo, Xindong Wu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1346-1351
Number of pages: 6
ISBN (Electronic): 9781728183169
State: Published - Nov 2020
Externally published: Yes
Event: 20th IEEE International Conference on Data Mining, ICDM 2020 - Virtual, Sorrento, Italy
Duration: 17 Nov 2020 to 20 Nov 2020

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2020-November
ISSN (Print): 1550-4786

Conference

Conference: 20th IEEE International Conference on Data Mining, ICDM 2020
Country/Territory: Italy
City: Virtual, Sorrento
Period: 17/11/20 to 20/11/20

Keywords

  • Clustering
  • Log parsing
  • Log template extraction
  • Vectorization
