Disclosure of Invention
The present disclosure provides a data processing method, apparatus, device, and storage medium to at least solve the problems of resource waste and time-consuming statistics caused by frequent IO operations in the related art. The technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a data processing method, including:
when data in a memory needs to be written into a disk, determining an index of specified data in the data needing to be written into the disk, wherein the specified data is determined by configuration information corresponding to the data, and the index is a numerical value obtained by performing mathematical processing on the specified data;
when all indexes of specified data in the data needing to be written into the disk are determined, inputting all the determined indexes into a message queue;
and adopting a stream processing engine to carry out aggregation processing on all indexes in the message queue to obtain a statistical result.
In an embodiment, the determining an index of specified data in the data that needs to be written to the disk includes:
distributing data needing to be written into the disk to at least one thread;
determining, by the at least one thread, an index of specified data in the data allocated to each thread.
In one embodiment, the determining, by the at least one thread, an index of specified data in the data allocated to each thread includes:
calling a statistical plug-in through the at least one thread to determine the index of the specified data in the data allocated to the at least one thread, wherein the statistical plug-in is used for acquiring corresponding configuration information and determining the index of the specified data in the data according to the configuration information.
In an embodiment, the aggregating, by using the stream processing engine, all the indexes in the message queue to obtain a statistical result includes:
acquiring identification information carried by each index, wherein the identification information comprises a time identification;
determining an aggregation window by using the stream processing engine according to the identification information;
and counting all the indexes located in each aggregation window to obtain the statistical result.
In an embodiment, the identification information further comprises a user identification.
According to a second aspect of the embodiments of the present disclosure, there is provided a data processing apparatus including:
the determining module is configured to determine an index of specified data in the data needing to be written into the disk when the data in the memory needs to be written into the disk, wherein the specified data is determined by configuration information corresponding to the data, and the index is a numerical value obtained by performing mathematical processing on the specified data;
the input module is configured to input all the determined indexes into a message queue when the determining module has determined all indexes of specified data in the data needing to be written into the disk;
and the processing module is configured to adopt a stream processing engine to perform aggregation processing on all indexes input into the message queue by the input module to obtain a statistical result.
In one embodiment, the determining module comprises:
the distribution submodule is configured to distribute data needing to be written into the disk to at least one thread;
a first statistics submodule configured to determine, by the at least one thread, an index of specified data in the data allocated by the distribution submodule.
In an embodiment, the first statistics submodule is configured to:
calling a statistical plug-in through the at least one thread to determine the index of the specified data in the data allocated to the at least one thread, wherein the statistical plug-in is used for acquiring corresponding configuration information and determining the index of the specified data in the data according to the configuration information.
In one embodiment, the processing module comprises:
the acquiring submodule is configured to acquire identification information carried by each index, wherein the identification information comprises a time identification;
a determining submodule configured to determine, by using the stream processing engine, an aggregation window according to the identification information acquired by the acquiring submodule;
and the second statistics submodule is configured to perform statistics on all the indexes located in each aggregation window determined by the determining submodule to obtain the statistical result.
In an embodiment, the identification information further comprises a user identification.
According to a third aspect of the embodiments of the present disclosure, there is provided a data processing apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the data processing method described above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having instructions that, when executed by a processor of a data processing apparatus, enable the data processing apparatus to perform the above-described data processing method.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, which, when run on an electronic device, causes the electronic device to perform the above-mentioned data processing method.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
when data in the memory needs to be written into the disk, the index of the specified data in the data needing to be written into the disk is determined before the write, thereby avoiding the frequent IO (input/output) operations caused by writing the data into the disk first and then reading the data back from the disk when the index needs to be determined; this saves resources, shortens the statistics time, and provides better timeliness.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In the related art, when indexes are acquired, data needs to be read from a disk into a memory, and then the indexes are acquired according to configured index acquisition items; this acquisition mode consumes a large number of Input and Output (IO) operations and a large amount of resources.
Fig. 1 is a flowchart illustrating a data processing method according to an exemplary embodiment of the present disclosure, and as shown in fig. 1, the data processing method may include the following steps:
In step S101, when data in the memory needs to be written to the disk, an index of specified data in the data that needs to be written to the disk is determined, where the specified data is determined by configuration information corresponding to the data that needs to be written to the disk, and the index is a numerical value obtained by performing mathematical processing on the specified data.
When data in the memory needs to be written into the disk, the index of the specified data in the data can be determined before the data is written into the disk, thereby avoiding the frequent IO (input/output) operations caused by writing the data into the disk first and then reading the data back from the disk when the index is to be counted.
Since the amount of data to be processed may be large, in order to improve statistical efficiency, the data that needs to be written to the disk may be allocated to at least one thread, and the indexes of the specified data in the data allocated to each thread may be counted by the at least one thread.
For example, as shown in fig. 2, data to be written to the disk may be allocated to four threads 21 to 24, and the indexes of the specified data in the data allocated to each thread may be counted by the four threads.
The process of counting the indexes of the specified data in the data respectively allocated to the at least one thread may be as follows: a statistical plug-in may be generated in advance, and the statistical plug-in is used for acquiring corresponding configuration information from a configuration center and counting the indexes of the specified data in the data according to the configuration information.
The index is a value obtained by performing mathematical processing on the specified data, and the index may include, but is not limited to, a maximum value, a minimum value, a mean value, a quantile, a null rate, a value distribution, an abnormal value rate, a default value rate, and the like.
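As an illustrative sketch only (not the claimed implementation), the following Python snippet shows how such indexes might be counted for the specified data while it is still in memory; the column names, the configuration structure, and the compute_indexes helper are assumptions introduced here for illustration.

    def compute_indexes(rows, config):
        """Compute indexes (metrics) for the columns named in the configuration.

        rows   -- list of dicts still held in memory, e.g. [{"age": 18}, ...]
        config -- hypothetical configuration mapping each specified column
                  to the list of indexes to be counted for it.
        """
        results = {}
        for column, metrics in config.items():
            values = [row.get(column) for row in rows]
            non_null = sorted(v for v in values if v is not None)
            col_result = {}
            if "max" in metrics and non_null:
                col_result["max"] = non_null[-1]
            if "min" in metrics and non_null:
                col_result["min"] = non_null[0]
            if "mean" in metrics and non_null:
                col_result["mean"] = sum(non_null) / len(non_null)
            if "median" in metrics and non_null:
                col_result["median"] = non_null[len(non_null) // 2]
            if "null_rate" in metrics and values:
                col_result["null_rate"] = 1 - len(non_null) / len(values)
            results[column] = col_result
        return results

    # Example: the configuration specifies that the "age" column should be profiled.
    rows = [{"age": 18}, {"age": 28}, {"age": None}, {"age": 59}]
    config = {"age": ["max", "min", "mean", "null_rate"]}
    print(compute_indexes(rows, config))
    # {'age': {'max': 59, 'min': 18, 'mean': 35.0, 'null_rate': 0.25}}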
As shown in fig. 2, each thread may call a statistical LIB 25 to count the indexes of the specified data in the data respectively allocated to the thread, where the statistical LIB 25 is a statistical plug-in generated in advance, and the statistical LIB 25 may obtain the configuration information corresponding to each piece of data from a configuration center.
In this embodiment, the statistical plug-in may be implemented using an open source library, which helps improve statistical efficiency and accuracy and facilitates updating and upgrading.
In step S102, when the indexes of the specified data in all the data that needs to be written to the disk are determined, all the determined indexes are input to the message queue.
As shown in fig. 2, when the indexes of the specified data in all the data that needs to be written to the disk are determined, all the determined indexes may be input to the message queue 26.
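A minimal sketch of this fan-out, assuming Python's standard threading and queue modules stand in for the worker threads and the message queue, and reusing the hypothetical compute_indexes helper from the earlier sketch:

    import queue
    import threading

    message_queue = queue.Queue()  # stands in for the message queue 26 of step S102

    def worker(partition, config):
        # Each thread counts the indexes of the specified data in its own partition
        # and pushes the (still local) result into the message queue.
        local_indexes = compute_indexes(partition, config)
        message_queue.put(local_indexes)

    def determine_indexes_before_flush(rows, config, num_threads=4):
        # Split the in-memory rows into one partition per thread (round-robin here)
        # so that the indexes are determined before the rows are written to disk.
        partitions = [rows[i::num_threads] for i in range(num_threads)]
        threads = [threading.Thread(target=worker, args=(p, config)) for p in partitions]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # ... the rows would then be written to the disk here ...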
In step S103, a stream processing engine is used to aggregate all the indexes in the message queue, so as to obtain a statistical result.
Since the indexes output by each thread are local, all the indexes in the message queue need to be aggregated, and all the indexes in the message queue may be aggregated by using the stream processing engine.
For example, in fig. 2, the maximum ages counted by the four threads are 18, 28, 47, and 59. Since the index output by each thread is only local, all the indexes in the message queue need to be compared; that is, all the local maximum ages are compared to obtain a global maximum age, so comparing 18, 28, 47, and 59 yields a maximum age of 59.
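Expressed as a tiny illustrative sketch (not part of the claimed method), the merge of the local maxima from the example above could be written as:

    # Local maximum ages reported by the four threads for the "age" column.
    local_max_ages = [18, 28, 47, 59]

    # Aggregation compares all local indexes to obtain the global index.
    global_max_age = max(local_max_ages)
    print(global_max_age)  # 59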
The process of performing aggregation processing on all the indexes in the message queue by using the stream processing engine may include: acquiring identification information carried by each index, determining aggregation windows by adopting the stream processing engine according to the identification information, and counting all the indexes located in each aggregation window to obtain the statistical result.
Wherein the identification information may comprise a time identification, which may comprise a time stamp.
For example, as shown in fig. 2, a database name (databaseName), a table name (tableName), and a timestamp carried by each index may be obtained, the stream processing engine is used to determine the aggregation windows according to the databaseName, the tableName, and the timestamp, all the indexes located in each aggregation window are counted to obtain a statistical result, and the statistical result is stored in the database.
The databaseName may be used for distinguishing different services, and the tableName may be used for distinguishing different data tables.
Optionally, the identification information may further include a user identification, which may include a query identification (query_id).
For example, the query_id, databaseName, tableName, and timestamp carried by each index may be obtained, the stream processing engine is adopted to determine the aggregation windows according to the query_id, databaseName, tableName, and timestamp, and all the indexes located in each aggregation window are counted to obtain the statistical result.
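An engine-agnostic sketch of this grouping, assuming each message in the queue carries the query_id, databaseName, tableName, a timestamp in seconds, and the local indexes; the window length and the merge rules shown are illustrative assumptions, and a real deployment would delegate the windowing to the stream processing engine itself:

    from collections import defaultdict

    WINDOW_SECONDS = 60  # assumed aggregation window length

    def aggregate(index_messages):
        """Group index messages by (query_id, databaseName, tableName, window)
        and merge the local indexes inside each aggregation window."""
        windows = defaultdict(list)
        for msg in index_messages:
            window_start = msg["timestamp"] // WINDOW_SECONDS * WINDOW_SECONDS
            key = (msg["query_id"], msg["databaseName"], msg["tableName"], window_start)
            windows[key].append(msg["indexes"])

        results = {}
        for key, index_list in windows.items():
            merged = {}
            for indexes in index_list:
                for name, value in indexes.items():
                    if name == "max":
                        merged["max"] = max(merged.get("max", value), value)
                    elif name == "min":
                        merged["min"] = min(merged.get("min", value), value)
                    # other indexes (mean, null rate, ...) would need their own merge rule
            results[key] = merged
        return results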
In this embodiment, the stream processing engine determines the aggregation windows according to the query_id, the databaseName, the tableName, and the timestamp, so that the determined aggregation windows have high accuracy; since the indexes are counted based on these accurate aggregation windows, the obtained statistical result also has high accuracy.
Optionally, after the step S103, the method may further include: determining problem data based on the statistical result.
In this embodiment, after the statistical result is obtained, problem data such as error data may be found from the statistical result, and since the statistical result obtained in the above manner has high accuracy, the problem data determined based on the statistical result also has high accuracy.
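For instance, problem data could be flagged by checking the statistical result against configured thresholds; the threshold values and field names below are illustrative assumptions only:

    NULL_RATE_THRESHOLD = 0.2  # assumed threshold for flagging a column

    def find_problem_columns(statistical_result):
        # statistical_result: e.g. {"age": {"max": 59, "null_rate": 0.25}, ...}
        problems = []
        for column, indexes in statistical_result.items():
            if indexes.get("null_rate", 0.0) > NULL_RATE_THRESHOLD:
                problems.append((column, "null rate too high"))
            if indexes.get("abnormal_value_rate", 0.0) > 0:
                problems.append((column, "abnormal values present"))
        return problems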
According to the embodiment, when the data in the memory needs to be written into the disk, the index of the specified data in the data needing to be written into the disk is determined before the write, so that the frequent IO (input/output) operations caused by writing the data into the disk first and then reading the data back from the disk when the index is determined are avoided; resources are saved, the statistics time is shortened, and the timeliness is better.
Fig. 3 is a flowchart illustrating another data processing method according to an exemplary embodiment of the disclosure, and as shown in fig. 3, the data processing method may include:
In step S301, when data in the memory needs to be written into the disk, the data that needs to be written into the disk is allocated to at least one thread, and the at least one thread calls the statistical plug-in to determine the indexes of the specified data in the data allocated to it.
The statistical plug-in is used for acquiring corresponding configuration information from the configuration center and determining indexes of specified data in the data needing to be written into the disk according to the configuration information.
In step S302, when the indexes of the specified data in all the data that needs to be written to the disk are determined, all the determined indexes are input to the message queue.
In step S303, the time identifier and the user identifier carried by each index in the message queue are obtained.
In step S304, the stream processing engine is used to determine aggregation windows according to the time identifier and the user identifier, and count all the indexes in each aggregation window to obtain a statistical result.
Optionally, after the step S304, the method may further include: determining problem data based on the statistical result.
In this embodiment, after the statistical result is obtained, problem data such as error data can be found from the statistical result, and since the statistical result obtained in the above manner has high accuracy and timeliness, the problem data can be determined quickly and accurately based on the statistical result.
In this embodiment, the statistical efficiency can be improved by having the at least one thread call the statistical plug-in to determine the indexes of the specified data in the data allocated to it, and the statistical accuracy can be improved by using the stream processing engine to determine the aggregation windows according to the time identifier and the user identifier and counting all the indexes in each aggregation window.
Fig. 4 is a block diagram of a data processing apparatus according to an exemplary embodiment of the present disclosure. Referring to fig. 4, the apparatus includes:
The determining module 41 is configured to determine, when the data in the memory needs to be written to the disk, an index of specified data in the data that needs to be written to the disk, where the specified data is determined by the configuration information corresponding to the data, and the index is a value obtained by performing mathematical processing on the specified data.
The input module 42 is configured to input all the determined indexes into the message queue when the determining module 41 has determined the indexes of the specified data in all the data that needs to be written to the disk.
The processing module 43 is configured to perform aggregation processing on all the indexes input into the message queue by the input module 42 by using the stream processing engine, so as to obtain a statistical result.
According to the embodiment, when the data in the memory needs to be written into the disk, the index of the specified data in the data needing to be written into the disk is determined before the write, so that the frequent IO (input/output) operations caused by writing the data into the disk first and then reading the data back from the disk when the index is determined are avoided; resources are saved, the statistics time is shortened, and the timeliness is better.
Fig. 5 is a block diagram of another data processing apparatus according to an exemplary embodiment of the disclosure, and as shown in fig. 5, on the basis of the embodiment shown in fig. 4, the determining module 41 may include:
the allocating submodule 411 is configured to allocate data to be written to the disk to at least one thread;
the first determining submodule 412 is configured to determine, by at least one thread, an index of specified data among the data allocated by the allocating submodule 411.
In this embodiment, the first determining submodule 412 may be configured to:
calling the statistical plug-in through the at least one thread to determine the indexes of the specified data in the data allocated to the at least one thread, wherein the statistical plug-in is used for acquiring corresponding configuration information and determining the indexes of the specified data in the data according to the configuration information.
Fig. 6 is a block diagram of another data processing apparatus according to an exemplary embodiment of the disclosure, and as shown in fig. 6, on the basis of the embodiment shown in fig. 4, the processing module 43 may include:
the obtaining submodule 431 is configured to obtain the identification information carried by each index, where the identification information includes a time identifier.
The second determining submodule 432 is configured to determine the aggregation window using the stream processing engine according to the identification information acquired by the acquiring submodule 431.
The statistics submodule 433 is configured to perform statistics on all the indexes located in each aggregation window determined by the second determining submodule 432, so as to obtain the statistical result.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 7 is a block diagram of a data processing device shown in an exemplary embodiment of the present disclosure. As shown in fig. 7, the data processing device includes a processor 710 and a memory 720 for storing instructions executable by the processor 710, where the processor 710 is configured to execute the instructions to implement the above data processing method. In addition to the processor 710 and the memory 720 shown in fig. 7, the data processing device may also include other hardware according to the actual functions of data processing, which is not described in detail here.
In an exemplary embodiment, a storage medium comprising instructions, such as memory 720 comprising instructions, executable by processor 710 to perform the data processing method described above is also provided. Alternatively, the storage medium may be a non-transitory computer readable storage medium, for example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, which, when run on an electronic device, causes the electronic device to perform the above-described data processing method.
Fig. 8 is a block diagram illustrating an apparatus suitable for a data processing method according to an exemplary embodiment of the present disclosure. As shown in fig. 8, an embodiment of the present disclosure provides an apparatus 800 suitable for a data processing method, including: a Radio Frequency (RF) circuit 810, a power supply 820, a processor 830, a memory 840, an input unit 850, a display unit 860, a camera 870, a communication interface 880, and a Wireless Fidelity (Wi-Fi) module 890. Those skilled in the art will appreciate that the configuration of the apparatus shown in fig. 8 is not intended to be limiting, and the apparatus provided in the embodiments of the present disclosure may include more or fewer components than those shown, or combine certain components, or use a different arrangement of components.
The various components of the apparatus 800 are described in detail below with reference to FIG. 8:
the RF circuitry 810 may be used for receiving and transmitting data during a communication or conversation. Specifically, the RF circuit 810 sends the downlink data of the base station to the processor 830 for processing; and in addition, sending the uplink data to be sent to the base station. In general, RF circuit 810 includes, but is not limited to, an antenna, at least one Amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like.
In addition, the RF circuit 810 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), etc.
Wi-Fi is a short-distance wireless transmission technology, and the device 800 can be connected to an Access Point (AP) through the Wi-Fi module 890, so as to access a data network. The Wi-Fi module 890 may be used for receiving and transmitting data during communication.
Device 800 may be physically connected to other devices via communication interface 880. Optionally, the communication interface 880 is connected to a communication interface of another device through a cable, so as to implement data transmission between the device 800 and the other device.
Since the device 800 can implement a communication service and send information to other contacts in the embodiments of the present disclosure, the device 800 needs to have a data transmission function; that is, the device 800 needs to include a communication module inside. Although fig. 8 illustrates communication modules such as the RF circuit 810, the Wi-Fi module 890, and the communication interface 880, it will be appreciated that at least one of the above components, or another communication module (e.g., a Bluetooth module) for enabling communication, is present in the device 800 for data transmission.
For example, when the device 800 is a cellular telephone, the device 800 may include RF circuitry 810 and may also include a Wi-Fi module 890; when device 800 is a computer, device 800 may include a communications interface 880 and may also include a Wi-Fi module 890; when the device 800 is a tablet computer, the device 800 may include a Wi-Fi module.
Memory 840 may be used to store software programs and modules. The processor 830 executes various functional applications and data processing of the device 800 by executing software programs and modules stored in the memory 840, and when the processor 830 executes the program codes in the memory 840, part or all of the processes in fig. 1, fig. 2, fig. 3 and fig. 4 of the embodiments of the present disclosure can be implemented.
Alternatively, the memory 840 may mainly include a program storage area and a data storage area. The storage program area can store an operating system, various application programs (such as communication application), a face recognition module and the like; the storage data area may store data created according to the use of the device (such as various multimedia files like pictures, video files, etc., and face information templates), etc.
Further, the memory 840 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input unit 850 may be used to receive numeric or character information input by a user and generate key signal inputs related to user settings and function control of the device 800.
Alternatively, the input unit 850 may include a touch panel 851 and other input devices 852.
The touch panel 851, also referred to as a touch screen, can collect touch operations of a user (such as operations of the user on the touch panel 851 or near the touch panel 851 by using any suitable object or accessory such as a finger or a stylus), and drive the corresponding connection device according to a preset program. Alternatively, the touch panel 851 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 830, and can receive and execute commands sent by the processor 830. In addition, the touch panel 851 may be implemented by various types such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave.
Alternatively, other input devices 852 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 860 may be used to display information input by a user or information provided to the user and various menus of the device 800. The display unit 860 is a display system of the device 800, and is used for presenting an interface and implementing human-computer interaction.
The display unit 860 may include a display panel 861. Alternatively, the Display panel 861 may be configured by a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
Further, the touch panel 851 may cover the display panel 861, and when the touch panel 851 detects a touch operation on or near the touch panel 851, the touch operation is transmitted to the processor 830 to determine the type of the touch event, and then the processor 830 provides a corresponding visual output on the display panel 861 according to the type of the touch event.
Although in FIG. 8, the touch panel 851 and the display panel 861 are shown as two separate components to implement the input and output functions of the device 800, in some embodiments, the touch panel 851 and the display panel 861 may be integrated to implement the input and output functions of the device 800.
The processor 830, which is the control center of the device 800, connects the respective components using various interfaces and lines, performs various functions of the device 800 and processes data by operating or executing software programs and/or modules stored in the memory 840 and calling data stored in the memory 840, thereby implementing various device-based services.
Optionally, processor 830 may include one or more processing units. Optionally, the processor 830 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, user interfaces, application programs, etc., and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 830.
The camera 870 is configured to implement the photographing or video recording function of the device 800. The camera 870 may also be used to implement the scanning function of the device 800 to scan an object to be scanned (a two-dimensional code or barcode).
Device 800 also includes a power supply 820 (such as a battery) for powering the various components. Optionally, the power supply 820 may be logically connected to the processor 830 through a power management system, so as to implement functions of managing charging, discharging, power consumption, and the like through the power management system.
In an exemplary embodiment, the device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described data processing methods.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.