CN113656369A - Log distributed streaming acquisition and calculation method in big data scene - Google Patents
- Publication number
- CN113656369A (application number CN202110927267.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- task
- flow
- stream
- operator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Debugging And Monitoring (AREA)
Abstract
The application relates to the field of data processing, and discloses a log distributed streaming acquisition and calculation method and apparatus, an electronic device and a storage medium in a big data scenario. The method comprises the following steps: collecting stream data, and configuring stream partitions of the stream data according to a directed graph constructed for the stream data; performing task conversion on the stream data according to the stream partitions to generate a plurality of task operators, and calculating the operation data of each task operator; if the operation data does not need logic conversion during operation, transmitting the operation data to a downstream task operator in a straight mode so that the operation data is in the same stream partition, and summarizing the operation data in the same stream partition to obtain the final output data; if the operation data needs logic conversion during operation, transmitting the operation data to the task operator corresponding to the logic conversion in a redistribution mode so that the operation data is in different stream partitions, and summarizing the operation data in the different stream partitions to obtain the final output data. The method and the apparatus can improve the processing efficiency of stream data.
Description
Technical Field
The present application relates to the field of data processing, and in particular to a log distributed streaming acquisition and calculation method and apparatus in a big data scenario, an electronic device, and a computer-readable storage medium.
Background
In the current era of rapid growth of internet users, devices, and services, large amounts of continuous data are generated in different business scenarios; that is, massive stream data is produced. Faced with this continuously generated stream data, computing and processing it in real time and with high efficiency is increasingly important.
Currently, data computation frameworks (such as MapReduce, Storm, Spark, etc.) are usually adopted to compute and process this stream data generated in real time. However, in actual service scenarios, due to the complexity of the data and the variability of user requirements, these frameworks cannot respond quickly to the requirements of the service scenario when processing stream data, so the processing efficiency of stream data is not high.
Summary of the application
In order to solve, or at least partially solve, the above technical problem, the application provides a log distributed streaming acquisition and calculation method and apparatus in a big data scenario, an electronic device, and a computer-readable storage medium, which can improve the processing efficiency of stream data.
In a first aspect, the present application provides a log distributed streaming acquisition and calculation method in a big data scenario, including:
collecting stream data, constructing a directed graph of the stream data, and configuring stream partitions of the stream data according to the directed graph;
performing task conversion on the stream data by using an operation operator according to the stream partitions to generate a plurality of task operators, and calculating the operation data of each task operator;
identifying whether the operation data needs logic conversion during operation;
if the operation data does not need logic conversion during operation, transmitting the operation data to a downstream task operator in a straight mode so that the operation data is in the same stream partition, and summarizing the operation data in the same stream partition to obtain the final output data;
and if the operation data needs logic conversion during operation, transmitting the operation data to the task operator corresponding to the logic conversion in a redistribution mode so that the operation data is in different stream partitions, and summarizing the operation data in the different stream partitions to obtain the final output data.
It can be seen that, in the embodiment of the present application, a directed graph of the collected stream data is first constructed, and the stream partitions of the stream data are configured according to the directed graph. This determines the flow direction of the stream data and the relationships between data items during processing, and merges data with the same flow direction in the directed graph into the same area for processing, which ensures the consistency of the stream data in subsequent processing and increases its processing speed. Secondly, task conversion is performed on the stream data by using an operation operator according to the stream partitions to generate a plurality of task operators, determining the processing operation of each data item in a stream partition, so that the calculation processing of a subsequent application program can be responded to quickly; the operation data of each task operator is then calculated to obtain the output result of each task operator. Further, by identifying whether each piece of operation data needs logic conversion during operation, it can be determined whether the processing interval of each subsequent piece of operation data lies in the same stream partition, so that the calculation of the stream data is processed in a straight mode or a redistribution mode, improving the processing efficiency of the stream data.
In one possible implementation manner of the first aspect, the constructing the directed graph of the stream data includes:
taking each data item in the stream data as a primitive node, and traversing the node paths of all primitive nodes by adopting a depth-first algorithm;
determining an adjacency list of the primitive nodes according to the node paths;
and generating a directed graph of the streaming data according to the adjacency list.
In a possible implementation manner of the first aspect, the configuring, according to the directed graph, a stream partition of the stream data includes:
querying data with the same direction in the directed graph;
establishing a timestamp for the inquired data to obtain target data, and establishing a data partition table of the target data;
and creating a buffer area of the data partition table to obtain the stream partition of the stream data.
In a possible implementation manner of the first aspect, the task converting the stream data by using an operation operator according to the stream partition to generate a plurality of task operators includes:
acquiring data attributes of each data in the stream partition, and identifying the data characteristics of each data in the stream data according to the data attributes;
converting each data feature into a feature field to obtain a plurality of feature fields;
and responding to a data operation task input by a user, and creating an operation task of each characteristic field in the stream partition by using an operation operator to obtain a plurality of task operators.
In a possible implementation manner of the first aspect, the calculating the operation data of each task operator includes:
calculating the operation data of each task operator by using the following formula:

S_k = r( Σ_{j=1}^{n} d_{k,j}, t )

where S_k represents the operation data of the k-th task operator, k denotes the k-th of the task operators, d_{k,j} denotes the j-th data item in the k-th task operator, n represents the number of data items in the task operator, t represents the output time of the task operator, and r represents the output function of the task operator.
In a possible implementation manner of the first aspect, the identifying whether each of the operation data needs to be logically converted in an operation process includes:
triggering each task thread of the running data, and detecting whether the data of each task thread in the running process needs to be updated;
if the updating is not needed, the operation data does not need logic conversion;
if the update is needed, the operation data needs logic conversion.
In a possible implementation manner of the first aspect, the transmitting the running data to a downstream task operator in a straight mode includes:
acquiring an upstream task operator generating the operating data, and configuring data transmission channels of the upstream task operator and the downstream task operator;
and transmitting the operating data to a downstream task operator according to the data transmission channel.
In a second aspect, the present application provides a log distributed streaming acquisition and computation apparatus in a big data scenario, where the apparatus includes:
the flow partition configuration module is used for collecting flow data, constructing a directed graph of the flow data and configuring the flow partitions of the flow data according to the directed graph;
the operation data calculation module is used for performing task conversion on the stream data by using an operation operator according to the stream partition to generate a plurality of task operators and calculating operation data of each task operator;
the data conversion identification module is used for identifying whether the operation data needs logic conversion in the operation process;
the operation data summarization module is used for transmitting the operation data to a downstream task operator in a straight mode when the operation data does not need logic conversion in the operation process so as to enable the operation data to be in the same flow partition and summarize the operation data in the same flow partition to obtain final output data;
and the operation data summarizing module is further used for transmitting the operation data to the task operator corresponding to the logic conversion by adopting a redistribution mode when the operation data needs the logic conversion in the operation process, so that the operation data is in different flow partitions, and summarizing the operation data in different flow partitions to obtain final output data.
In a third aspect, the present application provides an electronic device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the log distributed streaming acquisition and calculation method in a big data scenario as described in any one of the above first aspects.
In a fourth aspect, the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the log distributed streaming acquisition and calculation method in a big data scenario as described in any one of the above first aspects is implemented.
It is understood that the beneficial effects of the second to fourth aspects can be seen from the description of the first aspect, and are not described herein again.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; it is obvious that those skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a detailed flowchart of a log distributed streaming acquisition and calculation method in a big data scene according to an embodiment of the present application;
fig. 2 is a schematic flow chart illustrating a step of the log distributed streaming acquisition and calculation method in the big data scenario provided in fig. 1 according to an embodiment of the present application;
fig. 3 is a schematic flow chart illustrating another step of the log distributed streaming acquisition and calculation method in the big data scenario provided in fig. 1 according to an embodiment of the present application;
fig. 4 is a schematic block diagram of a log distributed streaming acquisition and computation apparatus in a big data scenario according to an embodiment of the present application;
fig. 5 is a schematic view of an internal structure of an electronic device for implementing a log distributed stream type collection and calculation method in a big data scene according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
A log distributed streaming collection and calculation method in a big data scenario according to an embodiment of the present application is described with reference to a flowchart shown in fig. 1. The log distributed stream type acquisition and calculation method in the big data scene described in fig. 1 includes:
s1, collecting flow data, constructing a directed graph of the flow data, and configuring flow partitions of the flow data according to the directed graph.
In the embodiment of the present application, stream data refers to data continuously generated by thousands of data sources, usually sent in the form of data records. These records arrive sequentially, massively, quickly, and continuously, and can be applied in fields such as network monitoring, sensor networks, aerospace, weather measurement and control, and financial services. Further, in the embodiment of the present application, the stream data may be acquired in one or a combination of the following manners: in a first manner, consuming a data source in a message queue; in a second manner, consuming a data source of a distributed log; and in a third manner, consuming a bounded data source from historical data.
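The three acquisition manners above can be sketched as follows. Each source is modeled as a plain iterator (a message-queue source, a distributed-log source, and a bounded historical source); the names and the simple in-order concatenation are illustrative assumptions, not taken from the patent, and real connectors would replace these stand-ins.

```python
# Illustrative sketch of combining the three acquisition modes: each source
# is an iterator, and the collected stream is their in-order concatenation.
from itertools import chain

def collect_stream(*sources):
    # Bounded sources are simply exhausted; unbounded ones keep yielding.
    return chain(*sources)

mq = iter(["mq-1", "mq-2"])        # message-queue source (assumed stand-in)
log = iter(["log-1"])              # distributed-log source (assumed stand-in)
hist = iter(["hist-1", "hist-2"])  # bounded historical source (assumed stand-in)

print(list(collect_stream(mq, log, hist)))
# ['mq-1', 'mq-2', 'log-1', 'hist-1', 'hist-2']
```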
Further, the embodiment of the application constructs the directed graph of the stream data to determine the flow direction of the stream data and the relationships between data items during processing. As an embodiment of the present application, the constructing of the directed graph of the stream data includes: taking each data item in the stream data as a primitive node, traversing the node paths of all primitive nodes by adopting a depth-first algorithm, determining an adjacency list of the primitive nodes according to the node paths, and generating the directed graph of the stream data according to the adjacency list.
Illustratively, suppose the stream data includes six data items (e.g., "car insurance", "life insurance", "property insurance", "stock", "fund", and so on). The six items are converted into primitive nodes "1, 2, 3, 4, 5, 6". Starting from node "1", a depth-first algorithm traverses the node paths of "2, 3, 4, 5, 6". If the node path to node "2" is true, it is determined that node "1" and node "2" are in a pointing relationship, that is, "1" successfully points to node "2", and the correspondence edgeTo[2] = 1 is generated. The traversal step is repeated for each primitive node until every node has been traversed and the correspondence of each node is obtained, thereby generating the adjacency list of the primitive nodes.
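The traversal just described can be sketched as a standard depth-first search that records an `edgeTo` adjacency mapping. The function name, the edge set, and the use of a Python dict are illustrative assumptions; the patent only specifies the depth-first traversal and the resulting adjacency list.

```python
# Sketch of the directed-graph construction above: each stream record is a
# primitive node, and a depth-first search over the candidate edges yields
# an adjacency mapping edge_to[w] = v, meaning node v points to node w.

def build_adjacency(nodes, edges):
    """nodes: list of primitive nodes; edges: dict node -> reachable nodes."""
    edge_to = {}
    visited = set()

    def dfs(v):
        visited.add(v)
        for w in edges.get(v, []):
            if w not in visited:
                edge_to[w] = v      # record the pointing relationship
                dfs(w)

    for v in nodes:                 # cover every connected component
        if v not in visited:
            dfs(v)
    return edge_to

# Six illustrative nodes chained 1 -> 2 -> ... -> 6, as in the example above
nodes = [1, 2, 3, 4, 5, 6]
edges = {1: [2], 2: [3], 3: [4], 4: [5], 5: [6]}
print(build_adjacency(nodes, edges))  # {2: 1, 3: 2, 4: 3, 5: 4, 6: 5}
```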
Further, according to the directed graph, the streaming partition of the streaming data is configured, so that data with the same flow direction in the directed graph are merged to the same area for processing, and the processing speed of subsequent streaming data is increased.
As an embodiment of the present application, referring to fig. 2, the configuring the stream partition of the stream data according to the directed graph includes:
s201, inquiring data with the same direction in the directed graph;
s202, constructing a timestamp for the inquired data to obtain target data, and establishing a data partition table of the target data;
s202, creating a buffer area of the data partition table to obtain the stream partition of the stream data.
The data with the same direction can be found by a query statement, such as an SQL statement. The timestamp is constructed to characterize the processing time and event time of subsequent data. The data partition table can be constructed in key-value form, that is, the ID of the target data is used as the key and the target data as the value; the buffer area may be opened up in the memory of the application running the stream data.
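The partitioning steps above can be sketched as follows: records sharing a flow direction are grouped, stamped, and stored in a keyed partition table backed by an in-memory structure standing in for the buffer area. All names (`build_stream_partitions`, the record tuple layout) are illustrative assumptions.

```python
# Sketch of S201-S203: group records with the same flow direction, attach a
# timestamp to each, and keep them in a key-value partition table per partition.
import time
from collections import defaultdict

def build_stream_partitions(records):
    """records: iterable of (record_id, flow_direction, payload) tuples."""
    partitions = defaultdict(dict)          # flow_direction -> partition table
    for record_id, direction, payload in records:
        partitions[direction][record_id] = {    # ID is the key (as above)
            "payload": payload,
            "timestamp": time.time(),           # processing-time stamp
        }
    return dict(partitions)                     # in-memory "buffer area"

parts = build_stream_partitions([
    ("a", "sink1", 10), ("b", "sink1", 11), ("c", "sink2", 12),
])
print(sorted(parts))           # ['sink1', 'sink2']
print(sorted(parts["sink1"]))  # ['a', 'b']
```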
Based on the configuration of the stream partition, the data with the same stream direction in the stream data can be identified, and the consistency of the subsequent stream data in the processing process is ensured, so that the processing speed of the stream data is increased.
And S2, according to the stream partition, performing task conversion on the stream data by using an operation operator to generate a plurality of task operators, and calculating the operation data of each task operator.
It should be understood that the stream data is handled by different module units in an application program during operation. Therefore, according to the stream partition, the application performs task conversion on the stream data through an operation operator to determine the processing operation of each data item in the stream partition, so that the calculation processing of a subsequent application program can be responded to quickly; the operation operator can be understood as an operation instruction or an operation command.
As an embodiment of the present application, referring to fig. 3, the task conversion of the stream data by using an operation operator according to the stream partition to generate a plurality of task operators includes:
s301, acquiring data attributes of each data in the stream partition, and identifying the data characteristics of each data in the stream data according to the data attributes;
s302, converting each data feature into a feature field to obtain a plurality of feature fields;
s303, responding to a data operation task input by a user, and creating an operation task of each characteristic field in the stream partition by using an operation operator to obtain a plurality of task operators.
The data attributes are used to represent the basic information of each data item in the stream data, such as the data name, data type, data dimension, and data identifier. The data features are used to represent the feature information of each data item in the stream data; based on the data features, part of the useless information of each data item can be screened out to improve the data processing speed. The feature field conversion turns the data features into characters that a computer can recognize; optionally, the feature field conversion can be realized in encoded form. The data operation tasks are generated based on different user requirements; for example, the requirement of user A may be to calculate the profit and loss of the vehicle insurance and the life insurance in the last year.
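Steps S301 to S303 can be sketched as below: data attributes are reduced to machine-readable feature fields, and one task operator is created per field for a user-supplied operation task. The class and function names (`TaskOperator`, `to_feature_field`, the normalization scheme) are illustrative assumptions, not the patent's encoding.

```python
# Sketch of S301-S303: derive feature fields from data attributes, then
# create one task operator per feature field for the user's operation task.
from dataclasses import dataclass

@dataclass
class TaskOperator:
    field: str   # feature field the operator works on
    task: str    # operation task requested by the user

def to_feature_field(attribute):
    # Encode a data feature as a machine-recognizable field name
    # (a simple normalization stands in for the patent's encoding).
    return attribute.lower().replace(" ", "_")

def create_task_operators(attributes, user_task):
    fields = [to_feature_field(a) for a in attributes]
    return [TaskOperator(field=f, task=user_task) for f in fields]

ops = create_task_operators(["Car Insurance", "Life Insurance"],
                            "profit_and_loss_last_year")
print([op.field for op in ops])  # ['car_insurance', 'life_insurance']
```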
Based on the generation of the task operators, the running task of each data item in the stream data during subsequent operation can be determined, which guarantees the premise for calculating the running data of the stream data.
Further, the application obtains the output result of each task operator by calculating the operation data of each task operator. In an alternative embodiment of the present application, the operation data of each task operator is calculated using the following formula:

S_k = r( Σ_{j=1}^{n} d_{k,j}, t )

where S_k represents the operation data of the k-th task operator, k denotes the k-th of the task operators, d_{k,j} denotes the j-th data item in the k-th task operator, n represents the number of data items in the task operator, t represents the output time of the task operator, and r represents the output function of the task operator.
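A numeric sketch of this operator-output computation follows. The patent only names the quantities (data items indexed by j, count n, output time t, output function r); the aggregation-by-sum form and the concrete choice of r here are assumptions for illustration.

```python
# Sketch: the output S_k of one task operator aggregates its n data items
# and passes the aggregate, together with the output time t, through the
# operator's output function r. The form of r is an assumption.
def operator_output(data_items, t, r):
    """Compute S_k = r(sum_{j=1}^{n} d_{k,j}, t) for one task operator."""
    return r(sum(data_items), t)

# Illustrative output function: scale the aggregate by the output time.
r = lambda total, t: total * t
print(operator_output([1, 2, 3], t=2, r=r))  # 12
```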
And S3, identifying whether the operation data needs logic conversion in the operation process.
It should be understood that, during the operation of the operation data, the user's requirements may change due to the complexity and changeability of actual service scenarios; that is, the operation data may undergo task transitions. Therefore, by identifying whether each piece of operation data needs logic conversion during operation, the application can determine whether the processing interval of each subsequent piece of operation data is in the same stream partition, so as to process the calculation of the stream data in a straight mode or a redistribution mode, thereby improving the processing efficiency of the stream data. The logic conversion may be understood as a data update operation on each piece of operation data.
As an embodiment of the present application, the identifying whether each piece of operation data needs logic conversion during operation includes: triggering each task thread of the operation data, and detecting whether the data of each task thread needs to be updated during operation; if no update is needed, the operation data does not need logic conversion; if an update is needed, the operation data needs logic conversion.
The task thread refers to the minimum operation unit for running the task operator, and the need for a data update is detected from the task requirement of each piece of operation data.
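The detection step above can be sketched as a simple probe over the task threads: the run needs logic conversion exactly when some thread's data requires an update. The `needs_update()` probe and the thread stand-in class are illustrative assumptions.

```python
# Sketch of step S3: probe each task thread and mark the operation data as
# needing logic conversion iff any thread's data requires an update.
def needs_logic_conversion(task_threads):
    """task_threads: iterable of objects exposing a needs_update() probe."""
    return any(thread.needs_update() for thread in task_threads)

class TaskThread:
    """Minimal stand-in for the minimum operation unit running an operator."""
    def __init__(self, dirty):
        self.dirty = dirty
    def needs_update(self):
        return self.dirty

print(needs_logic_conversion([TaskThread(False), TaskThread(True)]))   # True
print(needs_logic_conversion([TaskThread(False), TaskThread(False)]))  # False
```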
If the operation data does not need logic conversion during operation, S4 is executed: the operation data is transmitted to a downstream task operator in a straight mode so that the operation data is in the same stream partition, and the operation data in the same stream partition is summarized to obtain the final output data.
In the embodiment of the application, when the operation data does not need logic conversion during operation, the operation data is transmitted to a downstream task operator in a straight mode, so that the operation data is in the same stream partition; that is, the consistency of the input sequence and the output sequence of the operation data is ensured. The straight mode can be understood as a one-to-one mode, that is, the data of one stream partition is transmitted to the same stream partition of the downstream task operator, and the downstream task operator can be understood as a task operator that keeps event-processing consistency with the upstream task operator generating the operation data.
As an embodiment of the present application, the transmitting of the running data to a downstream task operator in a straight mode includes: acquiring the upstream task operator generating the running data, configuring a data transmission channel between the upstream task operator and the downstream task operator, and transmitting the running data to the downstream task operator through the data transmission channel.
The data transmission channel may be understood as a data communication link between the upstream task operator and the downstream task operator, and may be configured by a network transmission protocol, where the network transmission protocol may be HTTP or HTTPS.
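The one-to-one straight transfer can be sketched as below, with an in-process queue standing in for the network channel between the upstream and downstream operators; the order-preserving property described above falls out directly. The function names and the use of `queue.Queue` are illustrative assumptions.

```python
# Sketch of the straight (one-to-one) mode: records of one stream partition
# are sent over a dedicated channel to the same partition of the downstream
# operator, preserving the input order.
from queue import Queue

def forward(upstream_records, channel: Queue):
    for rec in upstream_records:   # same order in, same order out
        channel.put(rec)

def drain(channel: Queue):
    out = []
    while not channel.empty():
        out.append(channel.get())
    return out

channel = Queue()                  # stand-in for the data transmission channel
forward([1, 2, 3], channel)
print(drain(channel))  # [1, 2, 3]
```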
Further, the application summarizes the operation data in the same stream partition so as to output all the data generated by the stream data in that stream partition during operation, thereby obtaining the final output data of the same stream partition.
It should be noted that another embodiment of the present application further includes: when the running data is transmitted to the downstream task operator in the straight mode, a Hadoop Distributed File System (HDFS) is adopted to back up the running data, so as to prevent errors from occurring in the transmission process of the running data, due to the instability of network transmission or the characteristics of stream data, that could not otherwise be backfilled and reloaded in time.
If the running data needs logic conversion during operation, S5 is executed: the running data is transmitted to the task operator corresponding to the logic conversion in a redistribution mode so that the running data is in different stream partitions, and the running data in the different stream partitions is summarized to obtain the final output data.
In the embodiment of the application, when the operation data needs logic conversion during operation, the operation data is transmitted to the task operator corresponding to the logic conversion in a redistribution mode, so that the operation data is in different stream partitions; that is, the operation data undergoes task conversion to meet different task requirements. The operation data in the different stream partitions is then summarized to obtain the final output data of the different stream partitions.
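The redistribution mode can be sketched as re-keying the records and routing each to the partition of the operator that handles its new key. Hash partitioning is an assumption here; the patent does not fix a routing scheme.

```python
# Sketch of the redistribution mode: after a logic conversion the records
# are re-keyed, and each record is routed to the stream partition of the
# task operator responsible for its new key (hash routing assumed).
def redistribute(records, key_fn, n_partitions):
    partitions = [[] for _ in range(n_partitions)]
    for rec in records:
        partitions[hash(key_fn(rec)) % n_partitions].append(rec)
    return partitions

# Route six records into two partitions by the parity of each record.
parts = redistribute(range(6), key_fn=lambda r: r % 2, n_partitions=2)
print(sorted(len(p) for p in parts))  # [3, 3]
```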
It should be noted that another embodiment of the present application further includes: when the operation data is transmitted to the task operator corresponding to the logic conversion in the redistribution mode, a Hadoop Distributed File System (HDFS) is adopted to back up the operation data, so as to prevent errors from occurring in the transmission process of the operation data, due to the instability of network transmission or the characteristics of stream data, that could not otherwise be backfilled and reloaded in time.
It can be seen that, in the embodiment of the present application, a directed graph of the collected stream data is first constructed, and the stream partitions of the stream data are configured according to the directed graph. This determines the flow direction of the stream data and the relationships between data items during processing, and merges data with the same flow direction in the directed graph into the same area for processing, which ensures the consistency of the stream data in subsequent processing and increases its processing speed. Secondly, task conversion is performed on the stream data by using an operation operator according to the stream partitions to generate a plurality of task operators, determining the processing operation of each data item in a stream partition, so that the calculation processing of a subsequent application program can be responded to quickly; the operation data of each task operator is then calculated to obtain the output result of each task operator. Further, by identifying whether each piece of operation data needs logic conversion during operation, it can be determined whether the processing interval of each subsequent piece of operation data lies in the same stream partition, so that the calculation of the stream data is processed in a straight mode or a redistribution mode, improving the processing efficiency of the stream data.
Fig. 4 is a functional block diagram of a log distributed streaming acquisition and computation apparatus in the big data scenario of the present application.
The log distributed streaming acquisition and computing device 400 in the big data scenario described in the present application may be installed in an electronic device. According to the functions realized, the log distributed streaming acquisition and calculation apparatus in the big data scenario may include a stream partition configuration module 401, an operation data calculation module 402, a data conversion identification module 403, and an operation data summarization module 404. A module according to the present application, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device, can perform a fixed function, and are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the flow partition configuration module 401 is configured to collect flow data, construct a directed graph of the flow data, and configure a flow partition of the flow data according to the directed graph;
the running data calculation module 402 is configured to perform task conversion on the stream data by using an operation operator according to the stream partition, generate a plurality of task operators, and calculate running data of each task operator;
the data conversion identification module 403 is configured to identify whether the running data needs logic conversion at runtime;
the running data summarization module 404 is configured to, when the running data does not need logic conversion at runtime, transmit the running data to a downstream task operator in straight mode so that the running data is in the same stream partition, and to summarize the running data in the same stream partition to obtain the final output data;
the running data summarization module 404 is further configured to, when the running data needs logic conversion at runtime, transmit the running data in redistribution mode to the task operator corresponding to the logic conversion so that the running data is in different stream partitions, and to summarize the running data in the different stream partitions to obtain the final output data.
In detail, when used, the modules in the log distributed streaming acquisition and computation apparatus 400 in the big data scenario adopt the same technical means as the log distributed streaming acquisition and calculation method described with reference to fig. 1 and fig. 3, and produce the same technical effects, which are not repeated here.
Fig. 5 is a schematic structural diagram of an electronic device implementing a log distributed stream collection and calculation method in a big data scenario according to the present application.
The electronic device may include a processor 50, a memory 51, a communication bus 52 and a communication interface 53, and may further include a computer program stored in the memory 51 and executable on the processor 50, such as a log distributed streaming acquisition and calculation program in a big data scenario.
In some embodiments, the processor 50 may be composed of a single packaged integrated circuit, or of a plurality of integrated circuits packaged for the same or different functions, and may include one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 50 is the control unit of the electronic device: it connects the components of the whole electronic device through various interfaces and lines, and executes the functions of the electronic device and processes its data by running or executing the programs or modules stored in the memory 51 (for example, the log distributed streaming acquisition and calculation program in the big data scenario) and calling the data stored in the memory 51.
The memory 51 includes at least one type of readable storage medium, such as flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disks, and optical disks. In some embodiments, the memory 51 may be an internal storage unit of the electronic device, for example a hard disk of the electronic device. In other embodiments, the memory 51 may be an external storage device of the electronic device, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device. Further, the memory 51 may include both an internal storage unit and an external storage device of the electronic device. The memory 51 may be used not only to store application software installed in the electronic device and various types of data, such as the code of the log distributed streaming acquisition and calculation program in the big data scenario, but also to temporarily store data that has been or will be output.
The communication bus 52 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 51 and at least one processor 50 or the like.
The communication interface 53 is used for communication between the electronic device and other devices, and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (e.g., a WI-FI interface or a Bluetooth interface), which are typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may include a display and an input unit such as a keyboard, and optionally a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used to display the information processed in the electronic device and to present a visualized user interface.
Fig. 5 shows only an electronic device having components, and those skilled in the art will appreciate that the structure shown in fig. 5 does not constitute a limitation of the electronic device, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 50 through a power management device, so that functions of charge management, discharge management, power consumption management and the like are realized through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The log distributed streaming acquisition and calculation program in the big data scenario stored in the memory 51 of the electronic device is a combination of multiple computer programs which, when run on the processor 50, can implement:
collecting stream data, constructing a directed graph of the stream data, and configuring the stream partitions of the stream data according to the directed graph;
according to the stream partitions, performing task conversion on the stream data by using an operation operator to generate a plurality of task operators, and calculating the running data of each task operator;
identifying whether the running data needs logic conversion at runtime;
if the running data does not need logic conversion at runtime, transmitting the running data to a downstream task operator in straight mode so that the running data is in the same stream partition, and summarizing the running data in the same stream partition to obtain the final output data;
and if the running data needs logic conversion at runtime, transmitting the running data in redistribution mode to the task operator corresponding to the logic conversion so that the running data is in different stream partitions, and summarizing the running data in the different stream partitions to obtain the final output data.
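The straight/redistribution decision described in the steps above can be sketched as follows. This is a minimal, non-authoritative illustration: the `Record` type, the `route` function, the CRC-based key hash, and the use of partition 0 as the "current" partition are all assumptions for the sketch, not details taken from the patent.

```python
import zlib
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Record:
    key: str
    value: int

def route(records, needs_logic_conversion, num_partitions=4):
    """Send each record either straight to its current stream partition
    (no logic conversion needed) or redistribute it by key hash."""
    partitions = defaultdict(list)
    for rec in records:
        if needs_logic_conversion(rec):
            # redistribution mode: the target partition is derived from the
            # record key, so converted records may land in other partitions
            target = zlib.crc32(rec.key.encode()) % num_partitions
        else:
            # straight mode: the record stays with the upstream operator's
            # partition (modelled here as partition 0)
            target = 0
        partitions[target].append(rec)
    return dict(partitions)
```

Records that need no conversion thus never cross a partition boundary, which is what lets the summarization step collect them from a single partition.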
Specifically, the processor 50 may refer to the description of the relevant steps in the embodiment corresponding to fig. 1 for a specific implementation method of the computer program, which is not described herein again.
Further, the integrated module/unit of the electronic device, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
The present application also provides a computer-readable storage medium, storing a computer program that, when executed by a processor of an electronic device, may implement:
collecting stream data, constructing a directed graph of the stream data, and configuring the stream partitions of the stream data according to the directed graph;
according to the stream partitions, performing task conversion on the stream data by using an operation operator to generate a plurality of task operators, and calculating the running data of each task operator;
identifying whether the running data needs logic conversion at runtime;
if the running data does not need logic conversion at runtime, transmitting the running data to a downstream task operator in straight mode so that the running data is in the same stream partition, and summarizing the running data in the same stream partition to obtain the final output data;
and if the running data needs logic conversion at runtime, transmitting the running data in redistribution mode to the task operator corresponding to the logic conversion so that the running data is in different stream partitions, and summarizing the running data in the different stream partitions to obtain the final output data.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A log distributed streaming acquisition and calculation method in a big data scenario, characterized by comprising the following steps:
collecting stream data, constructing a directed graph of the stream data, and configuring the stream partitions of the stream data according to the directed graph;
according to the stream partitions, performing task conversion on the stream data by using an operation operator to generate a plurality of task operators, and calculating the running data of each task operator;
identifying whether the running data needs logic conversion at runtime;
if the running data does not need logic conversion at runtime, transmitting the running data to a downstream task operator in straight mode so that the running data is in the same stream partition, and summarizing the running data in the same stream partition to obtain the final output data;
and if the running data needs logic conversion at runtime, transmitting the running data in redistribution mode to the task operator corresponding to the logic conversion so that the running data is in different stream partitions, and summarizing the running data in the different stream partitions to obtain the final output data.
2. The log distributed streaming collection and calculation method in the big data scenario according to claim 1, wherein the constructing the directed graph of the stream data includes:
taking each data item in the stream data as a graph element node, and traversing the node paths of all the graph element nodes by using a depth-first algorithm;
determining an adjacency list of the graph element nodes according to the node paths;
and generating a directed graph of the streaming data according to the adjacency list.
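As a non-limiting illustration of the construction in claim 2, the adjacency list and the depth-first traversal can be sketched as below. The edge list, node names, and the iterative stack-based traversal are assumptions made for the sketch.

```python
def build_directed_graph(edges):
    """Build an adjacency list (node -> list of successors) for the
    collected stream data from a list of directed edges."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, [])  # sinks get an empty successor list
    return adjacency

def dfs_paths(adjacency, start):
    """Depth-first traversal returning the visit order from `start`."""
    order, stack, seen = [], [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # push successors in reverse so the leftmost edge is explored first
        stack.extend(reversed(adjacency[node]))
    return order
```

The adjacency list produced here is exactly the structure from which the directed graph of the stream data is generated in the final step of the claim.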
3. The log distributed streaming collection and calculation method in the big data scenario according to claim 1, wherein the configuring the stream partitions of the stream data according to the directed graph includes:
querying the data with the same flow direction in the directed graph;
establishing a timestamp for each queried data item to obtain target data, and creating a data partition table of the target data;
and creating a buffer area for the data partition table to obtain the stream partitions of the stream data.
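A minimal sketch of the partitioning in claim 3, assuming a dict keyed by flow direction, a list-of-dicts partition table, and a bounded `deque` standing in for the buffer area — all assumed structures, not specified by the patent:

```python
import time
from collections import deque

def build_stream_partition(groups, buffer_size=1024, clock=time.time):
    """For each group of data with the same flow direction, stamp every
    record with a timestamp (yielding the target data), record it in a
    data partition table, and attach a bounded buffer area."""
    partitions = {}
    for direction, records in groups.items():
        # establish a timestamp for each queried record -> target data
        table = [{"record": rec, "timestamp": clock()} for rec in records]
        partitions[direction] = {
            "table": table,
            "buffer": deque(maxlen=buffer_size),  # buffer area of the table
        }
    return partitions
```

Injecting the clock as a parameter keeps the sketch deterministic in tests; a real implementation would use event or ingestion time.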
4. The log distributed streaming collection and calculation method in the big data scenario according to claim 1, wherein the performing task conversion on the stream data by using an operation operator according to the stream partition to generate a plurality of task operators comprises:
acquiring the data attributes of each data item in the stream partition, and identifying the data characteristics of each data item in the stream data according to the data attributes;
converting each data characteristic into a feature field to obtain a plurality of feature fields;
and in response to a data operation task input by a user, creating an operation task for each feature field in the stream partition by using an operation operator, to obtain a plurality of task operators.
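The conversion in claim 4 — data attributes to feature fields to per-field task operators — might be sketched as below. The attribute-to-type-name mapping and the closure-based operators are stand-ins invented for this sketch; the patent does not specify the concrete conversion.

```python
def create_task_operators(partition, operation):
    """Derive a feature field from each record's data attributes, then wrap
    the user-supplied operation into one task operator per feature field."""
    # identify the data characteristics of each record from its attributes
    feature_fields = [
        {attr: type(val).__name__ for attr, val in record.items()}
        for record in partition
    ]
    # each task operator closes over one feature field and the operation
    return [lambda rec, f=field: operation(rec, f) for field in feature_fields]
```

Binding `f=field` as a default argument is the standard way to freeze the loop variable into each closure; without it every operator would see only the last feature field.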
5. The log distributed streaming collection and calculation method in the big data scenario according to claim 1, wherein the calculating the running data of each task operator comprises:
calculating the operation data of each task operator by using the following formula:
S_k = Σ_{j=1}^{n} r(j, t)
where S_k represents the running data of the task operator, k denotes the kth task operator among the plurality of task operators, j indexes the data within the task operator, n denotes the number of data items in the task operator, t denotes the output time of the task operator, and r denotes the output function of the task operator.
6. The log distributed streaming collection and calculation method in the big data scenario according to any one of claims 1 to 5, wherein the identifying whether the running data needs logic conversion at runtime includes:
triggering each task thread of the running data, and detecting whether the data of each task thread needs to be updated at runtime;
if no update is needed, the running data does not need logic conversion;
if an update is needed, the running data needs logic conversion.
7. The log distributed streaming collection and calculation method in the big data scenario according to claim 1, wherein the transmitting the running data to a downstream task operator in straight mode includes:
acquiring the upstream task operator that generates the running data, and configuring a data transmission channel between the upstream task operator and the downstream task operator;
and transmitting the running data to the downstream task operator through the data transmission channel.
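One way to realize the transmission channel of claim 7 in a single process is a simple queue between the upstream and downstream task operators. The `Queue`-based channel and the `forward`/`drain` helpers here are illustrative assumptions, not the patent's mandated mechanism.

```python
from queue import Queue

def configure_channel(upstream, downstream, maxsize=64):
    """Configure a point-to-point data transmission channel between an
    upstream and a downstream task operator for straight-mode transfer."""
    channel = Queue(maxsize=maxsize)

    def forward(record):
        # the upstream operator produces running data into the channel
        channel.put(upstream(record))

    def drain():
        # the downstream operator consumes everything currently buffered
        results = []
        while not channel.empty():
            results.append(downstream(channel.get()))
        return results

    return forward, drain
```

Because both operators share one channel and no re-keying happens, records keep their original order — the property that makes straight mode cheaper than redistribution.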
8. A log distributed streaming acquisition and computation apparatus in a big data scenario, the apparatus comprising:
the stream partition configuration module, configured to collect stream data, construct a directed graph of the stream data, and configure the stream partitions of the stream data according to the directed graph;
the running data calculation module, configured to perform task conversion on the stream data by using an operation operator according to the stream partitions, generate a plurality of task operators, and calculate the running data of each task operator;
the data conversion identification module, configured to identify whether the running data needs logic conversion at runtime;
the running data summarization module, configured to transmit the running data to a downstream task operator in straight mode when the running data does not need logic conversion at runtime, so that the running data is in the same stream partition, and to summarize the running data in the same stream partition to obtain the final output data;
and the running data summarization module, further configured to transmit the running data in redistribution mode to the task operator corresponding to the logic conversion when the running data needs logic conversion at runtime, so that the running data is in different stream partitions, and to summarize the running data in the different stream partitions to obtain the final output data.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the log distributed streaming acquisition and computation method in big data scenarios as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the log distributed streaming acquisition and computation method in big data scenario according to any one of claims 1 to 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110927267.5A CN113656369A (en) | 2021-08-13 | 2021-08-13 | Log distributed streaming acquisition and calculation method in big data scene |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN113656369A true CN113656369A (en) | 2021-11-16 |
Family
ID=78479629
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110927267.5A Pending CN113656369A (en) | 2021-08-13 | 2021-08-13 | Log distributed streaming acquisition and calculation method in big data scene |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113656369A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114564547A (en) * | 2022-02-10 | 2022-05-31 | 阿里云计算有限公司 | Data processing method, device, equipment and storage medium |
| CN115168410A (en) * | 2022-09-07 | 2022-10-11 | 北京镜舟科技有限公司 | Operator execution method and device, electronic equipment and storage medium |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP2731020A1 (en) * | 2012-11-07 | 2014-05-14 | Fujitsu Limited | Database system and method for storing distributed data and logic |
| US20150134626A1 (en) * | 2013-11-11 | 2015-05-14 | Amazon Technologies, Inc. | Partition-based data stream processing framework |
| US20150169786A1 (en) * | 2013-12-16 | 2015-06-18 | Zbigniew Jerzak | Event stream processing partitioning |
| WO2017045400A1 (en) * | 2015-09-17 | 2017-03-23 | 华为技术有限公司 | Method and apparatus for optimizing stream application |
| CN108628605A (en) * | 2018-04-28 | 2018-10-09 | 百度在线网络技术(北京)有限公司 | Stream data processing method, device, server and medium |
| CN108733832A (en) * | 2018-05-28 | 2018-11-02 | 北京阿可科技有限公司 | The distributed storage method of directed acyclic graph |
| CN111352961A (en) * | 2020-03-16 | 2020-06-30 | 华南师范大学 | Distributed RDF stream data processing method, system, device and medium |
| CN111770025A (en) * | 2020-06-22 | 2020-10-13 | 深圳大学 | A parallel data partition method, device, electronic device and storage medium |
| CN111782371A (en) * | 2020-06-30 | 2020-10-16 | 北京百度网讯科技有限公司 | Stream computing method and device based on DAG interaction |
| US20200379774A1 (en) * | 2019-05-30 | 2020-12-03 | Microsoft Technology Licensing, Llc | Efficient out of process reshuffle of streaming data |
| CN112905854A (en) * | 2021-03-05 | 2021-06-04 | 北京中经惠众科技有限公司 | Data processing method and device, computing equipment and storage medium |
2021-08-13: application CN202110927267.5A filed in China; published as CN113656369A; legal status: active, Pending.
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20211116 |