
CN111858278A - Log analysis method, system and readable storage device based on big data processing - Google Patents

Info

Publication number
CN111858278A
Authority
CN
China
Prior art keywords
data
log
analysis method
processing
log analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010651841.4A
Other languages
Chinese (zh)
Inventor
杨勋
胡建国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Guolian Video Information Technology Co ltd
Original Assignee
Beijing Guolian Video Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Guolian Video Information Technology Co ltd
Priority to CN202010651841.4A
Publication of CN111858278A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G06F11/3476 Data logging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment; monitoring of user actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a log analysis method, system and readable storage device based on big data processing. The log analysis method comprises the following steps: S1: the log collection service BLS collects logs from a log source; S2: cold data processing and hot data processing are performed on the logs; S3: big data distributed computing, processing, analysis and reporting are performed through BMR; S4: the data processed by BMR is written into a data warehouse, or the data in BOS is combined with machine learning BML to perform user behavior prediction analysis. Through the log analysis system, the main sources of users, the website content they prefer, user loyalty and the like can be seen. By analyzing user behavior logs, the invention can further optimize the layout and functions of the website to improve the user experience. The analysis results can also be used to divide the promotion budget and to focus optimization on the tendencies of user groups.

Description

Log analysis method, system and readable storage device based on big data processing

【Technical Field】

The present invention relates to the technical field of big data processing, and in particular to a log analysis method, system and readable storage device based on big data processing.

【Background Art】

In Internet applications, logs are very important data. Internet projects are often required to run 7×24 without interruption, so it is necessary to obtain and analyze the log data that monitors system operation. A log analysis system is an analysis-oriented integrated data environment, a strategic collection that provides system data support for enterprise decision-making. Analyzing the data in the data warehouse can help an enterprise improve business processes, control costs and improve product quality. In the era of big data, log data is an explicit record of how each company, organization or institution operates, and it is one of the most direct, most accessible and most wide-ranging sources of data on the traces users leave when using a company's products. Logs are usually collected and analyzed with big data technology.

Therefore, it is necessary to study a log analysis method, system and readable storage device based on big data processing to address the deficiencies of the prior art and to solve or alleviate one or more of the above problems.

【Summary of the Invention】

In view of this, the present invention provides a log analysis method, system and readable storage device based on big data processing. Log analysis shows the main sources of users, which website content they prefer, and how loyal they are, so that the layout and functions of the website can be further optimized to improve the user experience.

In one aspect, the present invention provides a log analysis method comprising the following steps:

S1: the log collection service BLS collects logs from a log source;

S2: cold data processing and hot data processing are performed on the logs respectively;

S3: big data distributed computing is performed on the processed data through BMR, and the data is analyzed;

S4: the data computed by BMR is written into a data warehouse, or the data in BOS is combined with machine learning BML to perform user behavior prediction analysis;

S5: the analysis results are applied and displayed: the results of hot data processing are provided to operation and maintenance personnel as alerts, and the results of cold data processing are displayed through BI tools.

In the aspect described above and any possible implementation, an implementation is further provided in which the cold data processing in S2 is specifically: the logs are written into the object storage BOS for storage or into an HBase cluster, and a Hive or Spark SQL cluster is then connected for analysis and processing.

In the aspect described above and any possible implementation, an implementation is further provided in which the hot data processing in S2 is specifically: the logs are fed into the message service Kafka, which serves as a message queue, and delivered to the streaming computing service BSC, which processes the log data in real time; the processed data is then written back to Kafka.

In the aspect described above and any possible implementation, an implementation is further provided in which the BMR is a fully managed Hadoop/Spark cluster.

In the aspect described above and any possible implementation, an implementation is further provided in which S1 is specifically: logs are collected through BLS as a managed log collection service, for which the user configures a source address, a destination address and collection rules.

In the aspect described above and any possible implementation, an implementation is further provided in which S3 specifically includes:

S31: data cleaning: the data is cleaned using a distributed computing framework, and the cleaned data is stored in the data warehouse or kept within the computing framework;

S32: business statistics are computed on the cleaned data using the Spark, Hive, MapReduce or Flink framework, and the data is analyzed according to the big data content.

In the aspect described above and any possible implementation, an implementation is further provided in which the distributed computing framework includes Spark, Hive and MapReduce.

In the aspect described above and any possible implementation, an implementation is further provided in which the results displayed by the BI tools in S5 include pie charts, bar charts, maps and line charts.

In the aspect described above and any possible implementation, a log analysis device based on big data processing is further provided, the device comprising:

a log collection module for collecting logs from a log source;

a processing and analysis module for performing data cleaning, cold processing and hot processing on the logs, and for storing the processing results and performing predictive analysis on them;

an application and display module, which raises alerts based on the stored processing results and displays the predictive analysis results through BI tools.

In the aspect described above and any possible implementation, a computer-readable storage medium is further provided, on which a processing program of the log analysis method is stored; when the processing program is executed by a processor, it implements the steps of the log analysis method described in any one of the above.

In the aspect described above and any possible implementation, an application of the log analysis method based on big data processing is further provided, in which the log analysis method is used for electronic banking, communication operators or e-commerce operation platforms.

Compared with the prior art, the present invention can achieve the following technical effects:

The system of the present invention supports offline and real-time processing of cold and hot data; it supports multiple data sources, the processing of complex data structures and the construction of complex user portraits. Through the log analysis system, one can see the main sources of users, which website content they prefer, how loyal they are, and so on. By analyzing user behavior logs, the layout and functions of the website can be further optimized to improve the user experience. The analysis results can be used to divide the promotion budget and to focus optimization on the tendencies of user groups.

Of course, a product implementing the present invention does not necessarily need to achieve all of the above technical effects at the same time.

【Brief Description of the Drawings】

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a log analysis method provided by an embodiment of the present invention.

【Detailed Description】

For a better understanding of the technical solutions of the present invention, the embodiments of the present invention are described in detail below with reference to the accompanying drawings.

It should be understood that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

The terms used in the embodiments of the present invention are only for the purpose of describing specific embodiments and are not intended to limit the present invention. As used in the embodiments of the present invention and the appended claims, the singular forms "a", "said" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The present invention provides a log analysis method based on big data processing. As shown in FIG. 1, the log analysis method includes the following steps:

S1: the log collection service BLS collects logs from a log source;

S2: cold data processing and hot data processing are performed on the logs;

S3: big data distributed computing, processing, analysis and reporting are performed through BMR;

S4: the data processed by BMR is written into a data warehouse, or the data in BOS is combined with machine learning BML to perform user behavior prediction analysis;

S5: the analysis results are applied and displayed: the results of hot data processing are provided to operation and maintenance personnel as alerts, and cold data is displayed through BI tools.

The cold data processing in S2 is specifically: the logs are written into the object storage BOS for storage or into an HBase cluster, and a Hive or Spark SQL cluster is then connected for analysis and processing.
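The cold path described above can be illustrated with a small, self-contained sketch. It is not the patented implementation: a temporary directory stands in for the BOS object store, SQLite from the Python standard library stands in for the Hive or Spark SQL cluster, and the record fields, the `access_log` table and all function names are assumptions made for illustration only.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def archive_logs(records, root):
    """Append raw records as JSON lines, one file per day (stand-in for BOS)."""
    root = Path(root)
    for rec in records:
        day = rec["ts"][:10]  # "YYYY-MM-DD" prefix of an ISO timestamp
        with open(root / f"{day}.jsonl", "a", encoding="utf-8") as fh:
            fh.write(json.dumps(rec) + "\n")


def load_into_sql(root):
    """Load archived files into an in-memory table (stand-in for Hive/Spark SQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE access_log (ts TEXT, user_id TEXT, url TEXT)")
    for path in sorted(Path(root).glob("*.jsonl")):
        with open(path, encoding="utf-8") as fh:
            rows = [json.loads(line) for line in fh if line.strip()]
        conn.executemany(
            "INSERT INTO access_log VALUES (:ts, :user_id, :url)", rows
        )
    return conn


def top_pages(conn, limit=3):
    """Offline query: the most visited URLs, ties broken alphabetically."""
    cur = conn.execute(
        "SELECT url, COUNT(*) AS hits FROM access_log "
        "GROUP BY url ORDER BY hits DESC, url ASC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


# Hypothetical sample records; the field names are not taken from the patent.
SAMPLE = [
    {"ts": "2020-07-01T08:00:00", "user_id": "u1", "url": "/home"},
    {"ts": "2020-07-01T08:05:00", "user_id": "u2", "url": "/home"},
    {"ts": "2020-07-02T09:00:00", "user_id": "u1", "url": "/cart"},
]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive_logs(SAMPLE, tmp)
        print(top_pages(load_into_sql(tmp)))
```

Separating the archive step from the query step mirrors the text: storage comes first, and the analysis engine is attached afterwards, so the same archived logs can be re-analyzed with different queries.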

The hot data processing in S2 is specifically: the logs are fed into the message service Kafka, which serves as a message queue, and delivered to the streaming computing service BSC, which processes the log data in real time; the processed data is then written back to Kafka.
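The hot path above (queue in, streaming computation, queue out) can be sketched as follows. This is only an illustration under stated assumptions: `queue.Queue` objects stand in for the Kafka input and output topics, `stream_job` stands in for a BSC job, events are assumed to arrive in timestamp order, and the event fields, the 60-second tumbling window and the alert threshold are all hypothetical.

```python
import queue


def stream_job(source, sink, window_seconds=60):
    """Count ERROR events per tumbling window and publish one summary per window.

    Assumes events arrive in timestamp order; `None` marks the end of the demo
    stream (a real stream would not end).
    """
    current, errors = None, 0
    while True:
        event = source.get()
        if event is None:
            break
        window = event["ts"] // window_seconds * window_seconds
        if current is not None and window != current:
            # The previous window is closed: emit its summary.
            sink.put({"window_start": current, "errors": errors})
            errors = 0
        current = window
        if event["level"] == "ERROR":
            errors += 1
    if current is not None:
        sink.put({"window_start": current, "errors": errors})


def needs_alert(summary, threshold=2):
    """Hypothetical alert rule for operations staff."""
    return summary["errors"] >= threshold


def drain(q):
    """Collect everything currently in a queue into a list."""
    items = []
    while not q.empty():
        items.append(q.get())
    return items


def run_demo():
    source, sink = queue.Queue(), queue.Queue()
    events = [(0, "INFO"), (10, "ERROR"), (20, "ERROR"), (65, "ERROR"), (70, "INFO")]
    for ts, level in events:
        source.put({"ts": ts, "level": level})
    source.put(None)
    stream_job(source, sink)
    return drain(sink)


if __name__ == "__main__":
    for summary in run_demo():
        print(summary, "ALERT" if needs_alert(summary) else "ok")
```

Writing the summaries back to a queue rather than alerting inside the job keeps the computation and the alerting consumer decoupled, which is the role the text gives to the second Kafka write.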

The BMR is a fully managed Hadoop/Spark cluster. S1 is specifically: logs are collected through BLS as a managed log collection service, for which the user configures a source address, a destination address and collection rules.

S3 specifically includes:

S31: data cleaning: the data is cleaned using a distributed computing framework, and the cleaned data is stored in the data warehouse or kept within the computing framework;

S32: business statistics are computed on the cleaned data using the Spark, Hive, MapReduce or Flink framework, and the data is analyzed according to the big data content.
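Steps S31 and S32 can be sketched on a single machine as below. Plain Python stands in for Spark, Hive, MapReduce or Flink, and the patent does not specify a log format or cleaning rules: the access-log layout, the rule of keeping only status-200 page requests, the list of static file suffixes, and the page view (PV) and unique visitor (UV) statistics are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Assumed access-log layout: ip, two ignored fields, [time], "METHOD path proto", status, size.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+$'
)
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".ico")


def clean(lines):
    """S31 (sketch): keep well-formed, successful page requests only."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # malformed line
        if m["status"] != "200":
            continue  # failed or redirected request
        path = m["path"].split("?", 1)[0]  # drop the query string
        if path.endswith(STATIC_SUFFIXES):
            continue  # static resource, not a page view
        yield {"ip": m["ip"], "path": path}


def page_stats(records):
    """S32 (sketch): page views and unique visitors (by IP) per path."""
    views = defaultdict(int)
    visitors = defaultdict(set)
    for rec in records:
        views[rec["path"]] += 1
        visitors[rec["path"]].add(rec["ip"])
    return {p: {"pv": views[p], "uv": len(visitors[p])} for p in views}


SAMPLE_LINES = [
    '10.0.0.1 - - [01/Jul/2020:08:00:00 +0800] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jul/2020:08:00:05 +0800] "GET /home?ref=ad HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jul/2020:08:00:09 +0800] "GET /static/app.js HTTP/1.1" 200 90',
    '10.0.0.1 - - [01/Jul/2020:08:01:00 +0800] "GET /home HTTP/1.1" 200 512',
    '10.0.0.3 - - [01/Jul/2020:08:02:00 +0800] "GET /cart HTTP/1.1" 500 0',
    "garbage line",
]

if __name__ == "__main__":
    print(page_stats(clean(SAMPLE_LINES)))
```

Because `clean` is a generator and `page_stats` is a single pass over its output, the two stages map naturally onto a filter followed by a grouped aggregation in a distributed framework.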

The distributed computing framework includes Spark, Hive and MapReduce. The results displayed by the BI tools in S5 include pie charts, bar charts, maps and line charts.

A log analysis device based on big data processing comprises:

a log collection module for collecting logs from a log source;

a processing and analysis module for performing data cleaning, cold processing and hot processing on the logs, and for storing the processing results and performing predictive analysis on them;

an application and display module, which raises alerts based on the stored processing results and displays the predictive analysis results through BI tools.

A computer-readable storage medium stores a processing program of the log analysis method; when the processing program is executed by a processor, it implements the steps of the log analysis method described in any one of the above.

The present invention addresses the situation in which users rely more and more on the communication services provided by operators in daily life, which leaves operators holding large amounts of user data. Big data technology can unlock the potential value of this data. Operators want to mine the retained user log data for hidden information of value in specific directions, such as user behavior prediction, marketing result monitoring, credit risk prevention and control, real estate site selection, population migration distribution, transportation planning and commercial public opinion. To address these needs and pain points of operators, the present invention provides a log-based data mining and analysis service.

With user privacy protected, operator log data is used in combination with the operator's DPI system, billing system and other data. Hundreds of first-level dimension categories and nearly a thousand second-level dimension categories, covering user habits, activity paths, consumption scenarios and the like, are built to construct user portraits. Specific methods are used to calibrate the different dimensions of a user portrait at different time granularities, so that the portrait is gradually enriched. This ensures the precision of the specific services provided to specific users.
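One possible way to picture a two-level portrait is a mapping from (first-level, second-level) category pairs to scores. The patent does not disclose its calibration method, so the sketch below substitutes simple exponential smoothing with a different weight per time granularity; the class, the category names and the weights in `WEIGHTS` are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical smoothing weights per time granularity (not from the patent).
WEIGHTS = {"daily": 0.5, "monthly": 0.25}


@dataclass
class UserPortrait:
    user_id: str
    # Maps (first_level, second_level) -> score between 0 and 1.
    scores: dict = field(default_factory=dict)

    def calibrate(self, first_level, second_level, observed, granularity="daily"):
        """Blend a new observation into the stored score (exponential smoothing)."""
        weight = WEIGHTS[granularity]
        key = (first_level, second_level)
        previous = self.scores.get(key, 0.0)
        self.scores[key] = (1 - weight) * previous + weight * observed
        return self.scores[key]

    def strongest(self, first_level):
        """Return the highest-scoring second-level category under a first-level one."""
        candidates = {k[1]: v for k, v in self.scores.items() if k[0] == first_level}
        return max(candidates, key=candidates.get) if candidates else None


if __name__ == "__main__":
    portrait = UserPortrait("u1")
    portrait.calibrate("consumption", "groceries", 1.0)
    portrait.calibrate("consumption", "groceries", 1.0)
    portrait.calibrate("consumption", "travel", 1.0, granularity="monthly")
    print(portrait.strongest("consumption"))
```

Repeated calibration moves a score toward recent observations step by step, which is one reading of a portrait being "gradually enriched"; other update rules would fit the text equally well.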

The working principle of the present invention is as follows:

The log analysis system for big data processing is divided into five modules.

Data collection: Flume is used to collect data and write web logs into the HDFS of BLS.

Data cleaning: Spark, Hive, MapReduce or another distributed computing framework is used; the cleaned data is stored in the data warehouse or in Hive or SparkSQL.

Data processing: statistics and analysis for the relevant business are performed as needed (using frameworks such as Spark, Hive, MapReduce and Flink).

Storage of data processing results: the results are stored in databases such as RDBMS and NoSQL.

Data visualization: the results are presented graphically as pie charts, bar charts, maps and line charts.
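The five modules above can be read as five composed stages. The following minimal sketch chains them on toy data; each function is a stand-in rather than the real component (no Flume, HDFS or database is involved), and the comma-separated `user,source,page` row format, the in-memory result table and the text bar chart are assumptions for illustration.

```python
from collections import Counter


def collect(raw_text):
    """Module 1, collection (stand-in for Flume writing web logs to HDFS)."""
    return raw_text.splitlines()


def clean(lines):
    """Module 2, cleaning: drop blank or malformed rows."""
    rows = []
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 3 and all(parts):
            rows.append(dict(zip(("user", "source", "page"), parts)))
    return rows


def process(rows):
    """Module 3, processing: count visits per traffic source."""
    return Counter(row["source"] for row in rows)


def store(counts, table):
    """Module 4, storage: a dict stands in for an RDBMS or NoSQL table."""
    table.update(counts)
    return table


def visualize(table, width=20):
    """Module 5, visualization: text bar chart; expects a non-empty table."""
    peak = max(table.values())
    lines = []
    for source, hits in sorted(table.items(), key=lambda kv: (-kv[1], kv[0])):
        bar = "#" * round(width * hits / peak)
        lines.append(f"{source:<8}{bar} {hits}")
    return "\n".join(lines)


RAW = "u1,search,/home\nu2,search,/home\nu3,ad,/cart\nbroken row\nu4,search,/cart\nu5,direct,/home\n"

if __name__ == "__main__":
    results = store(process(clean(collect(RAW))), {})
    print(visualize(results))
```

Counting visits per traffic source is one concrete instance of seeing "the main sources of users"; swapping the `process` stage changes the statistic without touching the other four stages.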

The system of the present invention supports offline and real-time processing of cold and hot data; it supports multiple data sources, the processing of complex data structures and the construction of complex user portraits. Through the log analysis system, one can see the main sources of users, which website content they prefer, how loyal they are, and so on. By analyzing user behavior logs, the present invention can further optimize the layout and functions of the website to improve the user experience. The analysis results can be used to divide the promotion budget and to focus optimization on the tendencies of user groups.

The log analysis system and method based on big data processing of the present invention implement hot data processing and cold data processing at the same time, and include three modules: log collection, processing and analysis, and application and display.

In the log collection module, the log collection service BLS collects logs from log sources (such as servers). BLS is a managed log collection service: the user only needs to configure simple information such as the source address, destination address and collection rules to obtain highly reliable, highly available log collection.
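The three pieces of configuration named above (source address, destination address, collection rules) can be modelled as a small data structure. This is a hypothetical sketch and not the BLS API: the class names, field names, the example addresses and the glob-plus-keyword rule format are all invented for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class CollectionRule:
    path_glob: str             # which log files to pick up
    include_keyword: str = ""  # keep only lines containing this; empty keeps all


@dataclass(frozen=True)
class CollectionTask:
    source: str       # hypothetical source address, e.g. a server host
    destination: str  # hypothetical destination address, e.g. a queue or bucket
    rules: tuple = ()

    def validate(self):
        """Reject a task that is missing any of the three required parts."""
        if not self.source or not self.destination:
            raise ValueError("source and destination are required")
        if not self.rules:
            raise ValueError("at least one collection rule is required")
        return self

    def selects(self, path, line):
        """True if any rule matches both the file path and the line."""
        return any(
            fnmatchcase(path, rule.path_glob) and rule.include_keyword in line
            for rule in self.rules
        )


if __name__ == "__main__":
    task = CollectionTask(
        source="web-01.example.internal",
        destination="kafka://access-logs",
        rules=(CollectionRule("/var/log/nginx/*.log"),),
    ).validate()
    print(task.selects("/var/log/nginx/access.log", "GET /home"))
```

Keeping the task declarative, with no collection logic in the configuration itself, reflects the managed-service idea in the text: the user states what to collect and where to send it, and the service does the rest.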

The collected logs can then be fed into the log processing module. For hot data processing scenarios, the logs can be fed into the message service Kafka, which serves as a message queue, and delivered to the streaming computing service BSC, which processes the log data in real time; the processed data is then written back to Kafka. For cold data processing scenarios, the logs can first be written into the object storage BOS for storage, or written directly into an HBase cluster, and a Hive or SparkSQL cluster is then connected for analysis and processing. BMR is a fully managed Hadoop/Spark cluster that uses big data distributed computing technology and focuses on big data processing, analysis and reporting. The data processed by BMR can be written into a data warehouse. The data in BOS can also be combined directly with machine learning BML for analysis operations such as user behavior prediction.

In the application and display module, processed hot data can provide alerts to operation and maintenance personnel, and cold data can be displayed through BI tools.

The log analysis method of the present invention can be used for banks, communication operators and e-commerce operation platforms.

The log analysis method, system and readable storage device based on big data processing provided by the embodiments of the present application have been described in detail above. The description of the above embodiments is only intended to help in understanding the method of the present application and its core idea. Those of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Certain terms are used in the specification and claims to refer to particular components. Those skilled in the art will understand that hardware manufacturers may refer to the same component by different names. The specification and claims do not distinguish components by differences in name, but by differences in function. The terms "comprising" and "including" used throughout the specification and claims are open-ended and should therefore be interpreted as "including but not limited to". "Approximately" means within an acceptable error range, within which those skilled in the art can solve the technical problem and basically achieve the technical effect. The description that follows sets out preferred embodiments for implementing the present application; it is intended to illustrate the general principles of the present application and not to limit its scope. The scope of protection of the present application is defined by the appended claims.

It should also be noted that the terms "comprising", "including" and any other variants are intended to cover non-exclusive inclusion, so that a product or system comprising a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the product or system. Without further limitation, an element defined by the phrase "comprising a..." does not exclude the presence of additional identical elements in the product or system that includes the element.

It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.

The above description shows and describes several preferred embodiments of the present application. As stated above, however, the present application is not limited to the forms disclosed herein, which should not be regarded as excluding other embodiments; it can be used in various other combinations, modifications and environments, and can be modified within the scope of the concept described herein through the above teachings or the skill or knowledge of the relevant field. Modifications and changes made by those skilled in the art that do not depart from the spirit and scope of the present application shall all fall within the protection scope of the appended claims.

Claims (10)

1. A log analysis method based on big data processing, characterized by comprising the following steps:
S1: collecting logs from a log source through the log collection service BLS;
S2: performing cold data processing and hot data processing on the logs, respectively;
S3: performing big data distributed computation on the processed data through BMR, and analyzing the data;
S4: writing the data computed by BMR into a data warehouse, or performing user behavior prediction analysis by combining it with data in BOS and the machine learning service BML;
S5: applying and displaying the analysis results: providing alarms to operation and maintenance personnel according to the results of the hot data processing, and displaying the results of the cold data processing through a BI tool.
2. The log analysis method according to claim 1, wherein the cold data processing in S2 specifically comprises: writing the logs into the object storage BOS for storage, or into an HBase cluster, and then connecting to a Hive or Spark SQL cluster for analysis and processing.
3. The log analysis method according to claim 1, wherein the hot data processing in S2 specifically comprises: feeding the logs into the message service Kafka, which serves as a message queue, delivering them to the streaming computing service BSC, performing real-time computation on the log data, and writing the processed data back into Kafka.
4. The log analysis method according to claim 1, wherein S1 is implemented through BLS as a hosted log collection service, for which the user needs to configure a source address, a destination address and a collection rule.
5. The log analysis method according to claim 1, wherein S3 specifically comprises:
S31: data cleaning, namely cleaning the data with a distributed computing framework, and storing the cleaned data in a data warehouse or retaining it in the computing framework;
S32: performing business statistics on the cleaned data with the Spark, Hive, MapReduce or Flink framework, and analyzing according to the content of the big data.
6. The log analysis method according to claim 1, wherein the distributed computing framework comprises Spark, Hive and MapReduce.
7. The log analysis method according to claim 1, wherein the results presented by the BI tool in S5 include pie charts, bar charts, maps and line charts.
8. A log analysis device based on big data processing, applying the log analysis method of any one of claims 1 to 7, wherein the device comprises:
a log collection module, configured to collect logs from a log source;
a processing and analysis module, configured to perform data cleaning, cold data processing and hot data processing on the logs, and to store the processing results and perform prediction analysis on them;
an application display module, configured to raise alarms according to the stored processing results and to display the prediction analysis results in a BI tool.
9. A computer-readable storage medium, characterized in that a processing program of a log analysis method is stored thereon, and the processing program, when executed by a processor, implements the steps of the log analysis method according to any one of claims 1 to 7.
10. Use of a log analysis method based on big data processing, based on the steps of the log analysis method of any one of claims 1 to 7, wherein the log analysis method is applied to electronic banking, communication operator or e-commerce operation platforms.
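
Illustrative sketch (not part of the claims). The data flow of claim 1, in which each collected log feeds both a cold path (stored, then analyzed in batch) and a hot path (queued, processed in real time, used for alarms), can be mimicked in memory as follows. This is only a minimal sketch under stated assumptions: the "timestamp level user message" log layout, the names `clean` and `Pipeline`, and the error-count alert rule are all hypothetical, and plain Python containers stand in for BOS/HBase and Kafka. No BLS, BOS, BSC, BMR or BML interface is called or implied.

```python
from collections import Counter, deque


def clean(line):
    """S31-style cleaning: return (level, user), or None for a malformed line.

    Assumes a hypothetical 'timestamp level user message' log layout.
    """
    parts = line.split(maxsplit=3)
    if len(parts) < 4:
        return None
    _timestamp, level, user, _message = parts
    return level.upper(), user


class Pipeline:
    """In-memory stand-in for S2 to S5; it calls no BLS, BOS, Kafka, BSC or BMR API."""

    def __init__(self, error_threshold=3):
        self.error_threshold = error_threshold  # hypothetical alert rule
        self.cold_store = []      # stands in for BOS or HBase storage
        self.hot_queue = deque()  # stands in for a Kafka topic
        self.alerts = []          # what operations staff would receive

    def ingest(self, line):
        """S2: each collected line is sent down both the cold and the hot path."""
        self.cold_store.append(line)
        self.hot_queue.append(line)

    def process_hot(self):
        """Hot path: drain the queue, count errors, alert at the threshold."""
        errors = 0
        while self.hot_queue:
            record = clean(self.hot_queue.popleft())
            if record is not None and record[0] == "ERROR":
                errors += 1
        if errors >= self.error_threshold:
            self.alerts.append(f"{errors} ERROR lines in the last batch")
        return errors

    def process_cold(self):
        """Cold path: batch statistics per log level over all stored lines."""
        records = (clean(line) for line in self.cold_store)
        return Counter(level for level, _user in filter(None, records))
```

In this sketch the per-level counts returned by `process_cold` play the role of the data a BI tool would chart, while `alerts` plays the role of the notifications sent to operation and maintenance personnel. In a real deployment each container would be replaced by the corresponding managed service.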
CN202010651841.4A 2020-07-08 2020-07-08 Log analysis method, system and readable storage device based on big data processing Pending CN111858278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010651841.4A CN111858278A (en) 2020-07-08 2020-07-08 Log analysis method, system and readable storage device based on big data processing

Publications (1)

Publication Number Publication Date
CN111858278A true CN111858278A (en) 2020-10-30

Family

ID=73153153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010651841.4A Pending CN111858278A (en) 2020-07-08 2020-07-08 Log analysis method, system and readable storage device based on big data processing

Country Status (1)

Country Link
CN (1) CN111858278A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017071134A1 (en) * 2015-10-28 2017-05-04 北京汇商融通信息技术有限公司 Distributed tracking system
CN107577805A (en) * 2017-09-26 2018-01-12 华南理工大学 A business service system for log big data analysis
CN110489453A (en) * 2019-07-02 2019-11-22 广东工业大学 User's game real-time recommendation method and system based on big data log analysis
CN110690984A (en) * 2018-07-05 2020-01-14 上海宝信软件股份有限公司 Spark-based big data weblog acquisition, analysis and early warning method and system
US20200160230A1 (en) * 2018-11-19 2020-05-21 International Business Machines Corporation Tool-specific alerting rules based on abnormal and normal patterns obtained from history logs

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
杨潇黎;蒋廷耀;金鑫;罗神;: "分布式web日志处理平台的研究与实现", 信息通信, no. 03 *
田晓旭: "【技术分享】百度开放云张琪:大数据时代的数据仓储", pages 4 - 13, Retrieved from the Internet <URL:https://mp.weixin.qq.com/s/iHEAYUqEG5RXdwaOOnrjdA> *
马延超;王超;李尚同;: "基于大数据技术的日志统计与分析系统研究", 电脑知识与技术, no. 34 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113518365A (en) * 2021-04-29 2021-10-19 北京红山信息科技研究院有限公司 Data association method, device, server and storage medium
CN113518365B (en) * 2021-04-29 2023-11-17 北京红山信息科技研究院有限公司 Data association method, device, server and storage medium
CN113094250A (en) * 2021-05-12 2021-07-09 成都新希望金融信息有限公司 Log early warning method and device, electronic equipment and storage medium
CN113094250B (en) * 2021-05-12 2023-08-18 成都新希望金融信息有限公司 Log early warning method and device, electronic equipment and storage medium
CN113836431A (en) * 2021-10-19 2021-12-24 中国平安人寿保险股份有限公司 User recommendation method, device, equipment and medium based on user duration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201030
