CN117931762A - Log format standardization processing method, device, equipment and storage medium

Info

Publication number: CN117931762A
Application number: CN202410016632.0A
Authority: CN
Inventors: 钱忠杰; 姚广; 赵严
Original assignee: Dongpu Software Co Ltd
Current assignee: Dongpu Software Co Ltd
Priority date: 2024-01-04
Filing date: 2024-01-04
Publication date: 2024-04-26

Abstract

The invention relates to the field of log processing, and discloses a log format standardization processing method, device, equipment and storage medium. The method comprises the following steps: collecting log data of a multi-source system, and extracting key fields from the log data; defining a standard log format, and converting the log data according to the standard log format to obtain standardized log data; and performing distributed storage of the standardized log data, performing association analysis according to the key fields to obtain analysis results, and visually displaying the analysis results. By defining a unified standard log format, the method automatically converts multi-format raw logs into the standard format, supports storing the converted logs in the unified format, simplifies log analysis, enables fast associated querying of distributed logs, and effectively solves the problem that logs are difficult to correlate because the log systems are independent of one another.

Description

Log format standardization processing method, device, equipment and storage medium

Technical Field

The present invention relates to the field of log processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for log format standardization processing.

Background

A log is a text file or system that records events, actions, or processes. In computing, logs are commonly used to record the running state of software programs, error information, user interactions, and so on. Standardization is the process of unifying and regularizing something according to a specific specification or standard. Standardization improves the efficiency of various activities and ensures compatibility and interoperability between different products, services, or systems. Log standardization can therefore be understood as the process of organizing and managing log information according to a given standard or specification; standardized logs facilitate subsequent data analysis, troubleshooting, security auditing, and similar work.

However, existing log storage systems suffer from mixed log formats and complex analysis. Because log source systems are of many types and use different formats, log data from different systems cannot be directly interconnected, and the log information related to a problem can only be identified by manually analyzing the log data piece by piece. As a result, log analysis and system diagnosis are inefficient, and automated, simple, and fast association analysis cannot be achieved.

Accordingly, the prior art is still in need of improvement and development.

Disclosure of Invention

The main purpose of the invention is to solve the following problems of existing systems: log formats are complex, semantic definitions differ, logs cannot be directly interconnected, and the log information related to a problem can only be identified by manual, piece-by-piece analysis, so that log analysis and system diagnosis are inefficient and cannot be automated.

The first aspect of the present invention provides a log format standardization processing method, including: collecting log data of a multi-source system, and extracting key fields from the log data; defining a standard log format, and converting the log data according to the standard log format to obtain standardized log data; and carrying out distributed storage on the standardized log data, carrying out association analysis according to the key fields to obtain analysis results, and carrying out visual display on the analysis results.

Optionally, in a first implementation manner of the first aspect of the present invention, the step of collecting log data of the multi-source system and extracting a key field from the log data includes: configuring a log collector, and configuring a log source and a parsing rule in the log collector, wherein the log source is used for designating a range for collecting log data; and collecting log data through the log collector, and extracting key fields from the log data according to the parsing rule.

Optionally, in a second implementation manner of the first aspect of the present invention, the step of extracting a key field from the log data according to the parsing rule includes: screening effective log data through a regular expression algorithm, and matching according to the effective log data to obtain key log data; and dividing the key log data through a separator algorithm to obtain a plurality of key fields.

Optionally, in a third implementation manner of the first aspect of the present invention, the step of defining a standard log format, and converting the log data according to the standard log format to obtain standardized log data includes: defining a standard log format, and constructing a conversion configurator according to the standard log format; and mapping the key field into the standard log format through the conversion configurator to execute conversion processing to obtain standardized log data.

Optionally, in a fourth implementation manner of the first aspect of the present invention, the step of performing distributed storage on the standardized log data includes: outputting the standardized log in JSON format and uploading the standardized log to a Kafka queue; and distributing the standardized log in the Kafka queue to a plurality of Elastic nodes through the KAFKADIVIDER algorithm, and storing the standardized log in Elasticsearch through the Bulk API.

Optionally, in a fifth implementation manner of the first aspect of the present invention, the step of performing association analysis according to the key field to obtain an analysis result includes: receiving a designated key field name and an operator to construct a Lucene query, and querying the standardized log to obtain a query result; and designing logic conditions according to the query result, and carrying out association analysis according to the logic conditions to obtain an analysis result.

Optionally, in a sixth implementation manner of the first aspect of the present invention, the step of visually displaying the analysis result includes: counting operation indexes of the log data according to the analysis result; and carrying out visual processing and displaying on the operation index.

The second aspect of the present invention provides a log format standardization processing device, including: the acquisition module is used for acquiring log data of the multi-source system and extracting key fields from the log data; the standardized module is used for defining a standard log format and converting the log data according to the standard log format to obtain standardized log data; and the analysis module is used for carrying out distributed storage on the standardized log data, carrying out association analysis according to the key fields, obtaining an analysis result and carrying out visual display on the analysis result.

Optionally, in a first implementation manner of the second aspect of the present invention, the acquisition module includes: the configuration unit is used for configuring a log collector, and configuring a log source and a parsing rule in the log collector, wherein the log source is used for designating the range for collecting log data; and the extraction unit is used for collecting log data through the log collector and extracting key fields from the log data according to the parsing rule.

Optionally, in a second implementation manner of the second aspect of the present invention, the extracting unit includes: the screening subunit is used for screening effective log data through a regular expression algorithm and matching the effective log data to obtain key log data; and the segmentation subunit is used for segmenting the key log data through a separator algorithm to obtain a plurality of key fields.

Optionally, in a third implementation manner of the second aspect of the present invention, the normalization module includes: the definition unit is used for defining a standard log format and constructing a conversion configurator according to the standard log format; and the conversion unit is used for mapping the key field into the standard log format through the conversion configurator to execute conversion processing so as to obtain standardized log data.

Optionally, in a fourth implementation manner of the second aspect of the present invention, the analysis module includes: the output unit is used for outputting the standardized log in JSON format and uploading the standardized log to a Kafka queue; and the storage unit is used for distributing the standardized logs in the Kafka queue to a plurality of Elastic nodes through the KAFKADIVIDER algorithm and storing the standardized logs in Elasticsearch through the Bulk API.

Optionally, in a fifth implementation manner of the second aspect of the present invention, the analysis module further includes: the query unit is used for receiving the designated key field names and operators to construct Lucene query, and querying the standardized log to obtain a query result; and the logic processing unit is used for designing logic conditions according to the query result and carrying out association analysis according to the logic conditions to obtain an analysis result.

Optionally, in a sixth implementation manner of the second aspect of the present invention, the analysis module further includes: the statistics unit is used for counting the operation indexes of the log data according to the analysis result; and the display unit is used for carrying out visual processing on the operation index and displaying the operation index.

A third aspect of the present invention provides a log format normalization processing device, including: a memory and at least one processor, the memory having computer readable instructions stored therein, the memory and the at least one processor being interconnected by a line; the at least one processor invokes the computer readable instructions in the memory to cause the log format normalization processing device to perform the steps of the log format normalization processing method as described above.

A fourth aspect of the present invention provides a computer readable storage medium having stored therein computer readable instructions which, when run on a computer, cause the computer to perform the steps of the log format normalization processing method as described above.

The beneficial effects are as follows: in the technical scheme of the invention, log data of a multi-source system is collected, and key fields are extracted from the log data; a standard log format is defined, and the log data is converted according to the standard log format to obtain standardized log data; and the standardized log data is stored in a distributed manner, association analysis is performed according to the key fields to obtain analysis results, and the analysis results are visually displayed. By defining a unified standard log format, the method automatically converts multi-format raw logs into the standard format, supports storing the converted logs in the unified format, simplifies log analysis, enables fast associated querying of distributed logs, and effectively solves the problem that logs are difficult to correlate because the log systems are independent of one another.

Drawings

FIG. 1 is a first flowchart of a log format normalization processing method according to an embodiment of the present invention;

FIG. 2 is a second flowchart of a log format normalization processing method according to an embodiment of the present invention;

FIG. 3 is a third flowchart of a log format normalization processing method according to an embodiment of the present invention;

FIG. 4 is a fourth flowchart of a log format normalization processing method according to an embodiment of the present invention;

FIG. 5 is a fifth flowchart of a log format normalization processing method according to an embodiment of the present invention;

FIG. 6 is a sixth flowchart of a log format normalization processing method according to an embodiment of the present invention;

FIG. 7 is a seventh flowchart of a log format normalization processing method according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a log format normalization processing device according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of another structure of a log format normalization processing device according to an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a log format normalization processing device according to an embodiment of the present invention.

Detailed Description

The embodiment of the invention provides a log format standardized processing method, a device, equipment and a storage medium, which are used for collecting log data of a multi-source system and extracting key fields from the log data; defining a standard log format, and converting the log data according to the standard log format to obtain standardized log data; and carrying out distributed storage on the standardized log data, carrying out association analysis according to the key fields to obtain analysis results, and carrying out visual display on the analysis results. The invention solves the problems that the prior system has complex log format, different semantic definitions, incapability of direct interconnection and intercommunication, and the log information related to the problems can be determined only by manually analyzing one by one, so that the log analysis and the system diagnosis have low efficiency.

The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.

For ease of understanding, the following describes a specific flow of an embodiment of the present invention, referring to fig. 1, in which a first embodiment of a log format normalization processing method according to an embodiment of the present invention includes:

S101, collecting log data of a multi-source system, and extracting key fields from the log data;

In this embodiment, log data is collected from multiple source systems, including various applications and devices. Because of differences in system design and business requirements, log data from different systems is often stored in different formats and structures, so log data of different structures needs to be standardized. After the log data is collected, key fields are extracted from it for subsequent log analysis and processing; the key fields may include a timestamp, an event type, an event description, an order ID, or other fields unique to a business log.

S102, defining a standard log format, and converting the log data according to the standard log format to obtain standardized log data;

In this embodiment, for the purpose of analyzing and understanding log data in heterogeneous format, a standard log format is defined, so that the standard log format can contain all key fields, and is easy to understand and analyze, and the collected log data is converted according to the standard log format to obtain standardized log data.

By way of example, the original multi-source system log data may be converted into the standard log format using conversion tools or algorithms to generate standardized log data; this may be done automatically or with manual intervention. First, the standard format of the log data is determined according to the requirements of the particular application or system, including the format of the log message, the format of the timestamp, and the order and names of the fields. For example, a standard log format may include fields such as a timestamp, a system name, an event type, and detailed information. Next, a script or program is written to convert the log data according to the chosen format: the script reads the original log data, converts it to the specified format, and writes the result to a new standardized log file. During conversion, the script can also be tested and verified to confirm that it converts log data correctly, for example by comparing the original log data with the standardized log data. If the volume of log data is large, batch processing can be used to improve conversion efficiency: for example, a script or program periodically scans the log directory, adds new log files to a conversion queue, and processes several files simultaneously using multithreading or asynchronous processing.
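The conversion, verification, and parallel batch steps described above could be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the raw line layout ("date time event-type details"), the output field names, and the function names `convert_line`, `convert_batch`, and `convert_many` are all assumptions made for the example.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def convert_line(line, system):
    # Assumed raw layout: "YYYY-MM-DD HH:MM:SS EVENT_TYPE details..."
    date, clock, event_type, detail = line.strip().split(" ", 3)
    return {"time": f"{date} {clock}", "system": system,
            "event_type": event_type, "detail": detail}


def convert_batch(system, lines):
    kept = [line for line in lines if line.strip()]  # skip blank lines
    records = [convert_line(line, system) for line in kept]
    # Verification: each converted record must still carry its original text.
    for line, record in zip(kept, records):
        if record["detail"] not in line:
            raise ValueError(f"conversion mismatch for line: {line!r}")
    return [json.dumps(record) for record in records]


def convert_many(batches, workers=4):
    # batches: list of (system_name, lines) pairs; results keep the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda batch: convert_batch(*batch), batches))
```

Each batch here stands in for one log file picked up from the conversion queue; a line that does not match the assumed layout raises an error rather than being silently dropped.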

S103, carrying out distributed storage on the standardized log data, carrying out association analysis according to the key fields to obtain analysis results, and carrying out visual display on the analysis results.

In this embodiment, distributed storage is a storage manner in which data is spread across a plurality of computing nodes (usually a plurality of servers) to achieve high reliability, scalability, and performance; compared with traditional centralized storage, it offers higher performance and availability. During storage, the log data needs to be split and partitioned according to a certain rule, for example by time, geographic location, or other custom fields, so that the stored data is more structured and indexable and subsequent association analysis and querying are easier. For association analysis, data mining algorithms such as association rule mining, cluster analysis, and classification can be used, implemented with an open-source tool or a self-developed program; by analyzing the associations among the key fields in the logs, potential patterns, abnormal behaviors, or user behavior trends can be found. Finally, to display the analysis results visually, data visualization tools such as Tableau or Power BI can be used. Charts, maps, or dashboards present the analysis results clearly, for example timelines and anomaly distributions across logs, and help users better understand and use them.
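As one illustration of partitioning by time, the sketch below groups standardized records under a per-day partition name. The naming scheme (`logs-YYYY.MM.DD`), the timestamp layout, and the function names are assumptions for the example; the source does not prescribe a particular partitioning rule.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(record, prefix="logs"):
    # Assumes the "time" field uses the layout "YYYY-MM-DD HH:MM:SS".
    ts = datetime.strptime(record["time"], "%Y-%m-%d %H:%M:%S")
    return f"{prefix}-{ts:%Y.%m.%d}"


def partition(records, prefix="logs"):
    # Group records by their daily partition name.
    groups = defaultdict(list)
    for record in records:
        groups[partition_key(record, prefix)].append(record)
    return dict(groups)
```

Partitioning on another field, such as a region or service name, would follow the same shape with a different key function.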

Referring to fig. 2, a second embodiment of a log format normalization processing method according to an embodiment of the present invention includes:

S201, configuring a log collector, configuring a log source and a parsing rule in the log collector, wherein the log source is used for designating a range for collecting log data;

S202, collecting log data through the log collector, and extracting key fields from the log data according to the parsing rule.

In this embodiment, the log sources to be collected are specified in the configuration file of the log collector. A log source may be a single file, all files under a directory, files on a remote server, and so on. For each log source, the corresponding parameters, such as a path, file name, or IP address, are configured so that the log collector can accurately locate and access the log data to be collected; multiple log sources may be configured as required to extend the collection range. The log collector reads the application's log files periodically or in real time and extracts key fields from each piece of log data according to a predefined parsing rule. The parsed log data can be stored in a cache or a temporary file for subsequent processing.

As an example, a popular log collection tool such as Logstash, Fluentd, or Filebeat may be selected. The selected tool is first installed and configured, and the log sources are set up according to the actual situation. A log source may be a file, a folder, a network interface, a system log (e.g., syslog), and so on. Configuring a log source requires specifying a path, address, or other relevant parameters that identify it, to ensure that log data is obtained from the correct location. In addition to log sources, parsing rules need to be defined to extract key fields. The parsing rules differ according to the log format, and regular expressions or existing plugins are generally used to parse different types of log data. A parsing rule should include the matching pattern and the key fields to be extracted, such as the date, request path, or user identifier. For example, for the application log entry "2023-01-01 12:00:00 INFO This is a log message", the key fields extracted according to the parsing rule are { "time": "2023-01-01 12:00:00", "level": "INFO", "message": "This is a log message" }. Through the above steps, the log collector is configured, the log sources to be collected are designated, and key fields are extracted from the log data by the log collector, which ensures that the required log data is collected and is ready for subsequent processing and analysis.

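A parsing rule of the kind described in this embodiment could look roughly like the sketch below, which uses a regular expression with named groups. The configuration dictionary, its path, and the names `COLLECTOR_CONFIG`, `PARSE_RULE`, and `extract_key_fields` are hypothetical; real collectors such as Logstash or Filebeat use their own configuration formats, which are not shown here.

```python
import re

# Hypothetical collector configuration: one log source and one parsing rule.
COLLECTOR_CONFIG = {
    "sources": [{"type": "file", "path": "/var/log/app/*.log"}],  # assumed path
    "parse_rule": (
        r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*"
        r"(?P<level>[A-Z]+)\s+(?P<message>.*)$"
    ),
}

PARSE_RULE = re.compile(COLLECTOR_CONFIG["parse_rule"])


def extract_key_fields(line):
    # Return the named groups as key fields, or None if the line does not match.
    match = PARSE_RULE.match(line)
    return match.groupdict() if match else None
```

For the sample entry in this embodiment, `extract_key_fields` returns the `time`, `level`, and `message` fields; lines that do not fit the pattern yield `None` and can be skipped or routed elsewhere.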
Referring to fig. 3, a third embodiment of a log format normalization processing method according to an embodiment of the present invention includes:

S301, screening effective log data through a regular expression algorithm, and matching the effective log data to obtain key log data;

S302, dividing the key log data through a separator algorithm to obtain a plurality of key fields.

In this embodiment, the collected log data is screened with a regular expression algorithm, and only valid log data conforming to a specific pattern is retained. A regular expression is a formal language for describing text patterns. By matching the log data against predefined rules, log data that meets specific conditions is kept and irrelevant log data is filtered out. An appropriate regular expression pattern, which may cover the time format, the level format, the message text, and so on, is defined to match the required key log data, and the log data conforming to the pattern is extracted as key log data for subsequent processing. Each piece of key log data is then divided into a plurality of key fields by a separator algorithm: a suitable separator, such as a space, comma, or tab, is selected according to the format and structure of the log data, and the key log data is split at that separator.

By way of example, assume that a piece of key log data is "2023-01-01 12:00:00 ERROR This is an error message". Analysis shows that the key fields in this entry are separated by spaces, so the entry is split into a plurality of key fields using the space as the separator. Because the timestamp itself contains a space and the message text contains several, the date and time parts are kept together and splitting stops after the level, which gives: { "time": "2023-01-01 12:00:00", "level": "ERROR", "message": "This is an error message" }. Through the above steps, effective log data is screened out by the regular expression algorithm and matched to obtain key log data. The key log data is then partitioned by the separator algorithm to obtain a plurality of key fields for further processing and analysis.

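The two steps of this embodiment, screening with a regular expression and then splitting on a separator, could be sketched as follows. The pattern, the set of accepted levels, and the function names are assumptions for illustration only.

```python
import re

# Assumed validity pattern: timestamp, one of a few levels, then a message.
VALID = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (INFO|WARN|ERROR) .+")


def filter_valid(lines):
    # Keep only the lines that match the validity pattern.
    return [line for line in lines if VALID.match(line)]


def split_fields(line, sep=" "):
    # Split at most three times so the message keeps its internal separators,
    # then rejoin the date and clock parts into a single "time" field.
    date, clock, level, message = line.split(sep, 3)
    return {"time": f"{date}{sep}{clock}", "level": level, "message": message}
```

Limiting the number of splits is one simple way to handle a free-text field at the end of the line; formats with quoted or escaped separators would need a more careful splitter.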
Referring to fig. 4, a fourth embodiment of a log format normalization processing method according to an embodiment of the present invention includes:

S401, defining a standard log format, and constructing a conversion configurator according to the standard log format;

S402, mapping the key fields into the standard log format through the conversion configurator, and executing conversion processing to obtain standardized log data.

In this embodiment, the standard log format is a log record format that may include information such as a timestamp, a log level, and log content. To process logs automatically, a standard log format is defined according to requirements, including the field names and the field order (for example: time, level, service name, message text), so that subsequent conversion and processing can rely on it. A conversion configurator, which is a tool for converting raw data into the standard format, is then constructed according to the standard log format and used to map the key fields into that format. For example, a mapping rule is defined in the conversion configurator for each field, specifying the correspondence between a key field and a field of the standard log format; the conversion is performed by applying these rules.

By way of example, the basic structure of the log format is determined first, including information such as a timestamp, a log level, and log content. The name and data type of each field are then determined; for example, the timestamp may be a date-time type and the log level a string type. The format and unit of each field are also determined; for example, the timestamp may use the format YYYY-MM-DD HH:MM:SS. Additional fields, such as a user ID or device ID, may be added according to the actual situation. For example, the standard log format may be defined as: { "time": "[time]", "level": "[level]", "service": "[service]", "message": "[message]" }. A conversion configurator is then constructed for mapping the key fields to the standard log format and is integrated into the system so that it can interact with the original data stream. For each piece of key log data, the key fields are mapped to the corresponding fields of the standard log format according to the mapping rules and logic in the conversion configurator, the conversion is executed, and the values of the key fields are filled into the corresponding positions of the standard log format to obtain standardized log data, that is, log data conforming to the standard log format.

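One possible shape for such a conversion configurator is sketched below. The class name `ConversionConfigurator`, its constructor parameters, and the source field names used in the usage note are hypothetical; the source describes the configurator only in terms of per-field mapping rules.

```python
class ConversionConfigurator:
    """Maps source key fields onto the fields of a standard log format."""

    def __init__(self, standard_fields, mapping, defaults=None):
        self.standard_fields = list(standard_fields)  # field names, in order
        self.mapping = dict(mapping)                  # standard name -> source name
        self.defaults = dict(defaults or {})          # values for missing fields

    def convert(self, key_fields):
        record = {}
        for name in self.standard_fields:
            source_name = self.mapping.get(name, name)
            # Fall back to the configured default, then to an empty string.
            record[name] = key_fields.get(source_name, self.defaults.get(name, ""))
        return record
```

For instance, a configurator built with the standard fields `time`, `level`, `service`, `message`, the mapping `{"time": "ts", "level": "severity", "message": "msg"}`, and the default `{"service": "SystemA"}` would turn a source record keyed by `ts`, `severity`, and `msg` into a record in the standard format.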
Referring to fig. 5, a fifth embodiment of a log format normalization processing method according to an embodiment of the present invention includes:

S501, outputting the standardized log in JSON format and uploading the standardized log to a Kafka queue;

S502, distributing the standardized logs in the Kafka queue to a plurality of Elastic nodes through the KAFKADIVIDER algorithm, and storing the standardized logs in Elasticsearch through the Bulk API.

In this embodiment, JSON (JavaScript Object Notation) is a lightweight data exchange format commonly used for serializing and transmitting data. It represents structured data as compact text, is easy to read and write, and can be parsed and generated by many programming languages. Kafka is a distributed stream processing platform characterized by high throughput, persistence, and scalability, and a Kafka queue is a data storage structure in Kafka used to store and transmit large-scale real-time data streams. The standardized log data is converted into JSON format and sent to the Kafka queue using Kafka's Producer API, which achieves efficient transmission and storage of the log data. Using the Kafka queue as middleware also provides reliable transmission and decoupling of the log data, which improves the scalability and stability of the system.
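The serialization and upload step could be sketched as follows. To keep the example self-contained, `InMemoryProducer` is a stand-in written for this illustration and is not a real Kafka client; the topic name and the `publish` helper are likewise assumptions. In a real deployment the `send` call would go to an actual Kafka producer.

```python
import json


class InMemoryProducer:
    """Stand-in for a Kafka producer (illustration only, not a Kafka client)."""

    def __init__(self):
        self.topics = {}

    def send(self, topic, value):
        # Append the payload to the named topic, creating it on first use.
        self.topics.setdefault(topic, []).append(value)


def publish(producer, topic, record):
    # Serialize the standardized record as UTF-8 encoded JSON and send it.
    payload = json.dumps(record, ensure_ascii=False, sort_keys=True).encode("utf-8")
    producer.send(topic, payload)
    return payload
```

Encoding to bytes before sending mirrors the fact that message queues transport opaque payloads; the consumer side decodes and parses the JSON again.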

In this embodiment, KAFKADIVIDER is a distribution algorithm that distributes the log data in the Kafka queue to multiple nodes according to a certain rule, for example according to characteristics of the data such as its key fields or shards. An Elastic node is an instance in an Elasticsearch cluster. Elasticsearch is an open-source distributed search and analysis engine for searching, analyzing, and storing large-scale data in real time; in this embodiment it is used to store and retrieve the standardized logs. The Bulk API is a batch operation interface provided by Elasticsearch that can efficiently index, update, or delete multiple documents in a single request.

By way of example, suppose there are three logs in the Kafka queue: log 1, log 2, and log 3. The KAFKADIVIDER algorithm distributes the standardized log data in the Kafka queue to a plurality of Elastic nodes according to a certain rule, and on each Elastic node the Bulk API is used to store the standardized log data assigned to that node in Elasticsearch in batches. For example, log 1, log 2, and log 3 are stored in Elasticsearch node 1, Elasticsearch node 2, and Elasticsearch node 3, respectively. The standardized log data is indexed into the appropriate index according to the configured index and mapping rules, so that subsequent search and analysis operations can be performed.
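The source does not disclose the internals of KAFKADIVIDER, so the sketch below substitutes a plain hash-based rule (CRC32 of a key field modulo the node count) purely as an assumed example of distributing by a key field. It also builds the newline-delimited request body that the Elasticsearch Bulk API expects for index operations: one action line followed by one document line per record. The function names and the index name used in the usage note are hypothetical.

```python
import json
import zlib


def assign_node(record, node_count, key_field="service"):
    # Assumed rule: stable hash of the key field, modulo the number of nodes.
    key = str(record.get(key_field, "")).encode("utf-8")
    return zlib.crc32(key) % node_count


def distribute(records, node_count, key_field="service"):
    # Place each record in the bucket of the node it is assigned to.
    buckets = [[] for _ in range(node_count)]
    for record in records:
        buckets[assign_node(record, node_count, key_field)].append(record)
    return buckets


def bulk_body(records, index):
    # Bulk format: action line, then document line, each newline-terminated.
    if not records:
        return ""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"
```

With this rule, records that share the same key field value always land in the same bucket, and each bucket can be sent as one bulk request, for example `bulk_body(bucket, "standard-logs")`.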

Referring to fig. 6, a sixth embodiment of a log format normalization processing method according to an embodiment of the present invention includes:

S601, receiving a designated key field name and an operator to construct Lucene inquiry, and inquiring the standardized log to obtain an inquiry result;

S602, designing logic conditions according to the query result, and performing association analysis according to the logic conditions to obtain an analysis result.

In this embodiment, Lucene is an open-source full-text search engine library that provides rich search functions and a query syntax. A Lucene query is a query operation performed with the Lucene library; query conditions are constructed from the specified key field names and operators to retrieve the documents that satisfy them. A Lucene query statement is constructed according to the key field name and operator specified by the user. For example, if the user designates the key field as "service", the operator as "equal", and the value as "SystemA", the constructed query term is "service:SystemA". The query is then executed with the Lucene library against the standardized log data to obtain the result set of log data that meets the conditions, which allows the standardized log data to be retrieved flexibly.
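Constructing a query string from a field name, an operator, and a value could be sketched as below, using the term and range forms of the classic Lucene query syntax. The operator names (`equal`, `gt`, `gte`, `lt`, `lte`) and the helper names are assumptions for this example, and values are always quoted here for safety, so the output is `service:"SystemA"` rather than the unquoted form; whether a given string is accepted also depends on the parser and field mapping in use.

```python
def quote(value):
    # Wrap the value in double quotes, escaping backslashes and quotes.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Square brackets are inclusive range bounds, curly braces are exclusive.
_BUILDERS = {
    "equal": lambda f, v: f"{f}:{v}",
    "gt": lambda f, v: f"{f}:{{{v} TO *}}",
    "gte": lambda f, v: f"{f}:[{v} TO *]",
    "lt": lambda f, v: f"{f}:{{* TO {v}}}",
    "lte": lambda f, v: f"{f}:[* TO {v}]",
}


def build_query(field, operator, value):
    if operator not in _BUILDERS:
        raise ValueError(f"unsupported operator: {operator}")
    return _BUILDERS[operator](field, quote(value))


def combine(clauses, mode="AND"):
    # Join several clauses with a boolean operator.
    return f" {mode} ".join(f"({clause})" for clause in clauses)
```

Several clauses can then be joined, for example `combine([build_query("service", "equal", "SystemA"), build_query("level", "equal", "ERROR")])`.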

In this embodiment, the logic condition is designed according to the query result. For example, based on the time field in the query result, a logic condition such as "time is greater than 2023-01-00:00:00" may be designed. The logic condition may include logical operators (e.g., AND, OR) and comparison operators (e.g., greater than, less than, equal to) for filtering and combining the query results. Association analysis is then performed on the query result based on the designed logic condition. For example, the number and frequency of the logs in the query result that meet the logic condition are counted, and correlations and patterns in the log data are identified, which helps the user carry out fault detection, performance optimization and similar work, and improves the reliability and performance of the system. In addition, Lucene queries combined with association analysis can process large-scale log data quickly, improving query efficiency and analysis quality.
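The filtering and counting described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the helper names `condition` and `all_of`, the record fields, and the threshold date are all assumptions chosen for the example.

```python
import operator
from collections import Counter
from datetime import datetime

# Comparison operators named in the text, mapped to Python functions.
COMPARATORS = {
    "greater_than": operator.gt,
    "less_than": operator.lt,
    "equal_to": operator.eq,
}

def condition(field, comparator, value):
    """Build a predicate that compares one field of a record with a value."""
    compare = COMPARATORS[comparator]
    return lambda record: compare(record[field], value)

def all_of(*conditions):
    """Logical AND over several predicates."""
    return lambda record: all(c(record) for c in conditions)

# Hypothetical query results.
results = [
    {"service": "SystemA", "level": "ERROR", "time": datetime(2023, 1, 2, 8, 0)},
    {"service": "SystemA", "level": "INFO", "time": datetime(2023, 1, 3, 9, 30)},
    {"service": "SystemB", "level": "ERROR", "time": datetime(2023, 1, 4, 10, 15)},
    {"service": "SystemB", "level": "ERROR", "time": datetime(2022, 12, 31, 23, 59)},
]

# "time is greater than <threshold> AND level is equal to ERROR"
rule = all_of(
    condition("time", "greater_than", datetime(2023, 1, 1)),
    condition("level", "equal_to", "ERROR"),
)
matched = [record for record in results if rule(record)]
counts = Counter(record["service"] for record in matched)
```

Counting the matched records per service is one simple form of the statistics mentioned above; an OR combinator could be written analogously with `any`.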

Referring to fig. 7, a seventh embodiment of a log format normalization processing method according to an embodiment of the present invention includes:

S701, counting operation indexes of the log data according to the analysis result;

S702, performing visualization processing on the operation indexes and displaying them.

In this embodiment, an operation index is a metric of the running state and performance of the system, obtained by counting and analyzing the log data. Operation indexes may include various performance indicators, error rate, response time, and the like. Appropriate operation indexes are selected for statistics according to the analysis result; for example, according to the result of the association analysis, the error rates and request frequencies of different services can be counted. An appropriate visualization tool or library, such as Matplotlib or D3.js, is then selected, and a chart or graph is generated from the operation index data obtained through statistics. A suitable chart type, such as a line chart, a histogram or a pie chart, is chosen as required to show the trend, distribution and other characteristics of the operation index, and the generated chart is embedded into a user interface or report so that the user can view and analyze the operation index intuitively. Presenting the operation indexes in this way helps the user understand the running state and performance of the system, notice trends and abnormal conditions quickly, and take corresponding decisions and optimization measures. Automating the statistics and visualization of the operation indexes also reduces the user's workload and improves working efficiency.
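As a minimal sketch of the statistics step, the following computes a per-service error rate and renders it as a plain-text bar chart. The text chart is only a stand-in so that the example has no third-party dependencies; in practice a plotting library such as Matplotlib would be used. The function names and record fields are assumptions.

```python
from collections import defaultdict

def error_rates(records):
    """Return the fraction of ERROR-level records for each service."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        totals[record["service"]] += 1
        if record["level"] == "ERROR":
            errors[record["service"]] += 1
    return {service: errors[service] / totals[service] for service in totals}

def text_bar_chart(rates, width=20):
    """Render one bar per service, scaled so that 100% fills the full width."""
    lines = []
    for service, rate in sorted(rates.items()):
        bar = "#" * round(rate * width)
        lines.append(f"{service:<10}|{bar:<{width}}| {rate:.0%}")
    return "\n".join(lines)

# Hypothetical analysis results.
records = [
    {"service": "SystemA", "level": "ERROR"},
    {"service": "SystemA", "level": "INFO"},
    {"service": "SystemA", "level": "INFO"},
    {"service": "SystemA", "level": "INFO"},
    {"service": "SystemB", "level": "ERROR"},
    {"service": "SystemB", "level": "INFO"},
]

rates = error_rates(records)
chart = text_bar_chart(rates)
print(chart)
```

The same `rates` dictionary could be passed to a charting library to produce the line charts, histograms or pie charts mentioned above.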

The log format normalization processing method in the embodiment of the present invention is described above; the log format normalization processing device in the embodiment of the present invention is described below. Referring to fig. 8, an embodiment of the log format normalization processing device in the embodiment of the present invention includes:

The acquisition module 50 is used for acquiring log data of the multi-source system and extracting key fields from the log data;

A normalization module 60, configured to define a standard log format, and convert the log data according to the standard log format to obtain normalized log data;

and the analysis module 70 is used for carrying out distributed storage on the standardized log data, carrying out association analysis according to the key fields, obtaining analysis results, and carrying out visual display on the analysis results.

In this embodiment, through unified log format and distributed storage, a user can complete complex log association analysis only by simple query, thereby greatly improving the system log processing efficiency.

Referring to fig. 9, another embodiment of the log format normalization processing device according to the present invention includes:

In this embodiment, the acquisition module 50 includes:

A configuration unit 501, configured to configure a log collector, and configure a log source and an analysis rule in the log collector, where the log source is used to specify a range for collecting log data;

And the extracting unit 502 is configured to collect log data by using the log collector, and extract a key field from the log data according to the parsing rule.

In this embodiment, the extracting unit 502 further includes:

The screening subunit 5021 is used for screening effective log data through a regular expression algorithm and matching the effective log data to obtain key log data;

The partitioning subunit 5022 is configured to partition the key log data through a separator algorithm to obtain a plurality of key fields.
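The work of the screening and partitioning subunits can be illustrated with a short sketch. The raw line layout (a timestamp followed by pipe-separated fields), the regular expression, the field names and the function name `extract_key_fields` are all assumptions made for the example; the source does not define them.

```python
import re

# Assumed raw format: "YYYY-MM-DD HH:MM:SS|service|level|message"
VALID_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\|.+")
FIELD_NAMES = ("time", "service", "level", "message")

def extract_key_fields(raw_lines, separator="|"):
    """Screen lines with a regular expression, then split them on a separator."""
    extracted = []
    for line in raw_lines:
        line = line.strip()
        if not VALID_LINE.match(line):
            continue  # screening: drop lines that do not look like log entries
        # Split at most three times so separators inside the message are kept.
        parts = line.split(separator, len(FIELD_NAMES) - 1)
        if len(parts) != len(FIELD_NAMES):
            continue
        extracted.append(dict(zip(FIELD_NAMES, parts)))
    return extracted

raw_lines = [
    "",
    "### service restarted ###",
    "2024-01-04 10:00:00|SystemA|ERROR|Connection timed out|retry=3",
]
fields = extract_key_fields(raw_lines)
```

Only the third line passes the screening, and its message keeps the trailing `|retry=3` because the split is limited to the number of expected fields.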

In this embodiment, the normalization module 60 includes:

A defining unit 601, configured to define a standard log format, and construct a conversion configurator according to the standard log format;

and the conversion unit 602 is configured to map the key field to the standard log format through the conversion configurator to perform conversion processing, so as to obtain standardized log data.
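One possible shape for the conversion configurator is sketched below: a per-source mapping from extracted key field names to the fields of the standard log format. The class name `ConversionConfigurator`, the standard field list and the source field names are hypothetical; the source does not specify the configurator's structure.

```python
# Assumed standard log format (not defined in the source).
STANDARD_FIELDS = ("timestamp", "service", "level", "message")

class ConversionConfigurator:
    """Maps source-specific key field names onto the standard log format."""

    def __init__(self, field_mapping, defaults=None):
        self.field_mapping = dict(field_mapping)  # source name -> standard name
        self.defaults = dict(defaults or {})

    def convert(self, key_fields):
        # Start from the defaults so every standard field is always present.
        record = {name: self.defaults.get(name) for name in STANDARD_FIELDS}
        for source_name, value in key_fields.items():
            target = self.field_mapping.get(source_name)
            if target in record:
                record[target] = value  # unmapped source fields are ignored
        return record

# Hypothetical configuration for one log source.
configurator = ConversionConfigurator(
    {"ts": "timestamp", "severity": "level", "msg": "message"},
    defaults={"service": "SystemA"},
)
standardized = configurator.convert(
    {"ts": "2024-01-04 10:00:00", "severity": "ERROR", "msg": "Connection timed out", "pid": "4312"}
)
```

Keeping the mapping as data rather than code means that supporting a new log source only requires a new configuration, which matches the stated goal of automatic conversion from multiple raw formats.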

In this embodiment, the analysis module 70 includes:

an output unit 701, configured to output the standardized log in JSON format, and upload the standardized log to a Kafka queue;

And the storage unit 702 is used for distributing the standardized logs in the Kafka queue to a plurality of Elasticsearch nodes through the KAFKADIVIDER algorithm and storing the standardized logs in Elasticsearch through the Bulk API.

In this embodiment, the analysis module 70 further includes:

a query unit 703, configured to receive a specified key field name and an operator to construct a Lucene query, and query the standardized log to obtain a query result;

a logic processing unit 704, configured to design logic conditions according to the query result, and perform association analysis according to the logic conditions to obtain an analysis result.

In this embodiment, the analysis module 70 further includes:

a statistics unit 705, configured to count operation indexes of the log data according to the analysis result;

and the display unit 706 is configured to perform visualization processing on the operation index and display the operation index.

The invention provides a standardized processing method for multiple log formats. By defining a unified standard log format, it realizes automatic conversion of raw logs in multiple formats into the standard format, supports storing the converted logs in the unified format, simplifies log analysis, enables fast association queries over distributed logs, and effectively solves the problem that logs are difficult to associate because the log systems are independent of one another.

The log format normalization processing device in the embodiment of the present invention is described in detail above with reference to figs. 8 and 9 from the perspective of modularized functional entities; the log format normalization processing equipment in the embodiment of the present invention is described in detail below from the perspective of hardware processing.

Fig. 10 is a schematic structural diagram of a log format standardized processing device according to an embodiment of the present invention. The log format standardized processing device 10 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 11, a memory 12, and one or more storage media 13 (e.g., one or more mass storage devices) storing application programs 133 or data 132. The memory 12 and the storage medium 13 may be transitory or persistent storage. The program stored in the storage medium 13 may include one or more modules (not shown), each of which may include a series of instruction operations for the log format normalization processing device 10. Further, the processor 11 may be arranged to communicate with the storage medium 13 and to execute, on the log format normalization processing device 10, the series of instruction operations in the storage medium 13.

The log format standardized processing device 10 may also include one or more power supplies 14, one or more wired or wireless network interfaces 15, one or more input/output interfaces 16, and/or one or more operating systems 131, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the device architecture shown in fig. 10 does not limit the log format normalization processing device 10, which may include more or fewer components than shown, may combine certain components, or may use a different arrangement of components.

The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, or a volatile computer readable storage medium, having stored therein instructions that, when executed on a computer, cause the computer to perform the steps of a log format normalization processing method.

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the system or apparatus and unit described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. The log format standardization processing method is characterized by comprising the following steps of:

collecting log data of a multi-source system, and extracting key fields from the log data;

defining a standard log format, and converting the log data according to the standard log format to obtain standardized log data;

and carrying out distributed storage on the standardized log data, carrying out association analysis according to the key fields to obtain analysis results, and carrying out visual display on the analysis results.

2. The method for log format standardization processing according to claim 1, wherein the step of collecting log data of the multi-source system and extracting key fields from the log data includes:

Configuring a log collector, configuring a log source and an analysis rule in the log collector, wherein the log source is used for designating a range for collecting log data;

And collecting log data through the log collector, and extracting key fields from the log data according to the analysis rule.

3. The log format normalization processing method according to claim 2, wherein the step of extracting key fields from the log data according to the parsing rule includes:

screening effective log data through a regular expression algorithm, and matching according to the effective log data to obtain key log data;

And dividing the key log data through a separator algorithm to obtain a plurality of key fields.

4. The method for standardized processing of log format according to claim 1, wherein the step of defining a standard log format and converting the log data according to the standard log format to obtain standardized log data comprises:

defining a standard log format, and constructing a conversion configurator according to the standard log format;

and mapping the key field into the standard log format through the conversion configurator to execute conversion processing to obtain standardized log data.

5. The log format normalization processing method according to claim 1, characterized in that the step of distributively storing the normalized log data includes:

outputting the standardized log in a JSON format and uploading the standardized log to a Kafka queue;

distributing the standardized log in the Kafka queue to a plurality of Elasticsearch nodes through the KAFKADIVIDER algorithm, and storing the standardized log in Elasticsearch through the Bulk API.

6. The method for log format standardization processing according to claim 1, wherein the step of performing association analysis according to the key field to obtain an analysis result includes:

receiving a designated key field name and an operator to construct a Lucene query, and querying the standardized log to obtain a query result;

And designing logic conditions according to the query result, and carrying out association analysis according to the logic conditions to obtain an analysis result.

7. The log format normalization processing method according to claim 1, wherein the step of visually displaying the analysis result includes:

Counting operation indexes of the log data according to the analysis result;

And carrying out visual processing and displaying on the operation index.

8. A log format normalization processing device, comprising:

the acquisition module is used for acquiring log data of the multi-source system and extracting key fields from the log data;

the standardized module is used for defining a standard log format and converting the log data according to the standard log format to obtain standardized log data;

And the analysis module is used for carrying out distributed storage on the standardized log data, carrying out association analysis according to the key fields, obtaining an analysis result and carrying out visual display on the analysis result.

9. A log format standardized processing device comprising a memory and at least one processor, the memory having computer readable instructions stored therein;

The at least one processor invokes the computer readable instructions in the memory to perform the steps of the log format normalization processing method of any one of claims 1 to 7.

10. A computer readable storage medium having computer readable instructions stored thereon, which when executed by a processor, implement the steps of the log format normalization processing method of any one of claims 1 to 7.