Disclosure of Invention
To address the shortcomings of the prior art, the present disclosure provides a method and an apparatus for filtering heterogeneous big data information. The method specifically includes the following steps:
step 1, reading heterogeneous big data and splitting according to a data structure to obtain standard data;
step 2, calculating the error rate of the standard data;
step 3, deleting abnormal data with the error rate larger than the error threshold value in the standard data to obtain filtered data;
step 4, sorting the filtered data according to the error rate;
step 5, deleting 10% of the data at the head and the tail of the sorted queue to obtain a filtering result;
step 6, outputting the filtering result.
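Steps 2 to 6 can be sketched end to end as follows. This is only an illustration under stated assumptions, not the disclosed implementation: the function names are hypothetical, the error rate is assumed to be each value's relative deviation from the arithmetic mean, the error threshold is assumed to be the mean of the error rates, and "10% at the head and the tail" is read as 10% from each end, rounded down. The input is assumed to be the list of data magnitudes already produced by step 1.

```python
from statistics import mean
from typing import Callable, List, Sequence


def relative_deviation(values: Sequence[float]) -> List[float]:
    """Assumed error rate (hypothetical): |x_i - mean| / |mean| per value."""
    centre = mean(values)
    scale = abs(centre) or 1.0  # avoid division by zero when the mean is 0
    return [abs(v - centre) / scale for v in values]


def filter_magnitudes(
    values: Sequence[float],
    error_rate_fn: Callable[[Sequence[float]], List[float]] = relative_deviation,
    trim_fraction: float = 0.10,
) -> List[float]:
    """Apply steps 2-6 to a sequence of data magnitudes."""
    if not values:
        return []
    # Step 2: one error rate per standard datum.
    rates = error_rate_fn(values)
    # Step 3: delete data whose error rate is larger than the threshold
    # (assumed here to be the mean of the error rates).
    threshold = mean(rates)
    kept = [(r, v) for r, v in zip(rates, values) if r <= threshold]
    # Step 4: sort the filtered data by error rate.
    kept.sort(key=lambda pair: pair[0])
    # Step 5: delete trim_fraction of the queue at each end (rounded down).
    k = int(len(kept) * trim_fraction)
    trimmed = kept[k:len(kept) - k]
    # Step 6: output the filtering result.
    return [v for _, v in trimmed]
```

Passing a different `error_rate_fn` swaps in another error rate definition without touching the remaining steps.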
Further, in step 1, the data structures of the heterogeneous big data include at least an array, a queue, a hash table, and a tree.
Further, in step 1, the step of obtaining the standard data by splitting according to the data structure includes the following sub-steps:
step 1.1, inputting the heterogeneous big data according to its data structure type;
step 1.2, reading and splitting the data into metadata with keywords according to the data structure type;
step 1.3, combining metadata according to the same keywords to obtain standard data;
the standard data includes at least a data magnitude.
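Sub-steps 1.2 and 1.3 might look like the sketch below. It assumes, purely for illustration, that hash tables and trees arrive as nested Python dicts, that arrays and queues arrive as lists, tuples or deques, and that a dict key serves as the keyword; `split_to_metadata` and `merge_by_keyword` are hypothetical names.

```python
from collections import defaultdict, deque
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def split_to_metadata(data: Any, keyword: str = "root") -> Iterator[Tuple[str, Any]]:
    """Step 1.2: walk a container and yield (keyword, value) metadata pairs."""
    if isinstance(data, dict):
        # Hash table or tree node: each key becomes the keyword of its subtree.
        for key, child in data.items():
            yield from split_to_metadata(child, str(key))
    elif isinstance(data, (list, tuple, deque)):
        # Array or queue: elements inherit the enclosing keyword.
        for child in data:
            yield from split_to_metadata(child, keyword)
    else:
        # Leaf value, treated here as a data magnitude.
        yield keyword, data


def merge_by_keyword(pairs: Iterable[Tuple[str, Any]]) -> Dict[str, List[Any]]:
    """Step 1.3: combine metadata that share the same keyword."""
    merged: Dict[str, List[Any]] = defaultdict(list)
    for keyword, value in pairs:
        merged[keyword].append(value)
    return dict(merged)


source = {"temp": [20.5, 21.0], "site": {"temp": 19.5, "humidity": deque([40, 42])}}
standard_data = merge_by_keyword(split_to_metadata(source))
```

Here `standard_data` groups every value found under the same keyword, regardless of which container type it came from.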
Further, in step 2, the sub-steps of calculating the error rate of the standard data are:
step 2.1, let $x_1, x_2, x_3, \ldots, x_n$ be the data magnitudes of $n$ standard data; the arithmetic mean $X'$ is $X' = \frac{1}{n}\sum_{i=1}^{n} x_i$;
step 2.2, the error rate $s$ of the standard data with respect to the arithmetic mean $X'$ is formulated as: [formula missing from the source text]
where $n$ is a positive integer with no upper limit on its value, $i = 1, \ldots, n$, and $x_i$ is the data magnitude of the $i$-th standard datum.
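As a small worked illustration of step 2.1, take four hypothetical data magnitudes 2, 4, 6 and 8 (values chosen only for this example, not taken from the disclosure). Their arithmetic mean follows directly from the definition:

```latex
X' = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{2 + 4 + 6 + 8}{4} = \frac{20}{4} = 5
```

The error rate of step 2.2 is not worked through here because its formula does not appear in this text.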
Further, in step 3, the error threshold is defined as follows: let $S_1, S_2, S_3, \ldots, S_n$ be the error rates of $n$ standard data; the error threshold $S'$ is [formula missing from the source text]
Further, in step 4, the methods for sorting by error rate include at least bubble sort, insertion sort and simple selection sort.
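Of the sorting methods named for step 4, insertion sort could be sketched as below. The record layout, a `(data magnitude, error rate)` tuple, and the function name are assumptions made for this example only.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, float]  # (data magnitude, error rate), hypothetical layout


def insertion_sort_by_error_rate(records: Sequence[Record]) -> List[Record]:
    """Return a new list of records in ascending order of error rate."""
    ordered = list(records)  # work on a copy so the input is left untouched
    for i in range(1, len(ordered)):
        current = ordered[i]
        j = i - 1
        # Shift every record with a larger error rate one slot to the right.
        while j >= 0 and ordered[j][1] > current[1]:
            ordered[j + 1] = ordered[j]
            j -= 1
        ordered[j + 1] = current
    return ordered
```

Because the comparison is strict, records with equal error rates keep their original relative order.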
The invention also provides a device for filtering heterogeneous big data information, which comprises:
a splitting unit for reading the heterogeneous big data and splitting it according to the data structure to obtain standard data;
an error rate calculation unit for calculating the error rate of the standard data;
an exception handling unit for deleting, from the standard data, abnormal data whose error rate is greater than the error threshold, to obtain filtered data;
a sorting unit for sorting the filtered data according to the error rate;
a head and tail removing unit for deleting 10% of the data at the head and the tail of the sorted queue to obtain a filtering result;
and an output unit for outputting the filtering result.
The beneficial effects of the present disclosure are as follows: by applying a uniform processing method to heterogeneous data of different data structures, the disclosed method and device for filtering heterogeneous big data information can normalize the data, reduce the error rate of the heterogeneous data and improve its logical compatibility.
Detailed Description
The conception, specific structure and technical effects of the present disclosure are described clearly and completely below in conjunction with the embodiments and the accompanying drawings, so that the objects, aspects and effects of the present disclosure can be fully understood. It should be noted that the embodiments of the present application, and the features of those embodiments, may be combined with each other provided they do not conflict.
Fig. 1 is a flowchart of a method for filtering heterogeneous big data information according to the present disclosure. The method according to an embodiment of the present disclosure is described below with reference to Fig. 1.
The disclosure provides a method for filtering heterogeneous big data information, which specifically comprises the following steps:
step 1, reading heterogeneous big data and splitting according to a data structure to obtain standard data;
step 2, calculating the error rate of the standard data;
step 3, deleting abnormal data with the error rate larger than the error threshold value in the standard data to obtain filtered data;
step 4, sorting the filtered data according to the error rate;
step 5, deleting 10% of the data at the head and the tail of the sorted queue to obtain a filtering result;
step 6, outputting the filtering result.
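The trimming in step 5 can be illustrated with a short sketch. It assumes that 10% is removed from each end of the sorted queue and that the count is rounded down; the text could also be read as 10% in total, so this is one interpretation rather than the definitive one. The function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def trim_head_and_tail(sorted_items: Sequence[T], fraction: float = 0.10) -> List[T]:
    """Drop `fraction` of the items from each end of an already sorted sequence."""
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    k = int(len(sorted_items) * fraction)  # items to drop per end, rounded down
    return list(sorted_items[k:len(sorted_items) - k])
```

With 20 sorted items this removes two from the head and two from the tail; with fewer than 10 items nothing is removed under the round-down assumption.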
Further, in step 1, the data structures of the heterogeneous big data include at least an array, a queue, a hash table, and a tree.
Further, in step 1, the step of obtaining the standard data by splitting according to the data structure includes the following sub-steps:
step 1.1, inputting the heterogeneous big data according to its data structure type;
step 1.2, reading and splitting the data into metadata with keywords according to the data structure type;
step 1.3, combining metadata according to the same keywords to obtain standard data;
the standard data includes at least a data magnitude.
Further, in step 2, the sub-steps of calculating the error rate of the standard data are:
step 2.1, let $x_1, x_2, x_3, \ldots, x_n$ be the data magnitudes of $n$ standard data; the arithmetic mean $X'$ is $X' = \frac{1}{n}\sum_{i=1}^{n} x_i$;
step 2.2, the error rate $s$ of the standard data with respect to the arithmetic mean $X'$ is formulated as: [formula missing from the source text]
where $n$ is a positive integer with no upper limit on its value, $i = 1, \ldots, n$, and $x_i$ is the data magnitude of the $i$-th standard datum.
Further, in step 3, the error threshold is defined as follows: let $S_1, S_2, S_3, \ldots, S_n$ be the error rates of $n$ standard data; the error threshold $S'$ is [formula missing from the source text]
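The deletion in step 3 can be sketched independently of how the threshold is obtained. In the example below the threshold is simply passed in as an argument, because the formula for $S'$ is not available in this text; the record layout and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, float]  # (data magnitude, error rate), hypothetical layout


def drop_abnormal(records: Sequence[Record], threshold: float) -> List[Record]:
    """Keep records whose error rate does not exceed the threshold.

    Only records with an error rate strictly larger than the threshold are
    deleted, so a record exactly at the threshold is kept.
    """
    return [record for record in records if record[1] <= threshold]
```

For instance, `drop_abnormal([(10, 0.1), (50, 0.9), (11, 0.5)], 0.5)` keeps the first and third records and deletes the second.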
Further, in step 4, the methods for sorting by error rate include at least bubble sort, insertion sort and simple selection sort.
The present disclosure also provides a device for filtering heterogeneous big data information. As shown in Fig. 2, the device includes:
a splitting unit for reading the heterogeneous big data and splitting it according to the data structure to obtain standard data;
an error rate calculation unit for calculating the error rate of the standard data;
an exception handling unit for deleting, from the standard data, abnormal data whose error rate is greater than the error threshold, to obtain filtered data;
a sorting unit for sorting the filtered data according to the error rate;
a head and tail removing unit for deleting 10% of the data at the head and the tail of the sorted queue to obtain a filtering result;
and an output unit for outputting the filtering result.
The device for filtering heterogeneous big data information can run on computing equipment such as desktop computers, notebooks, palmtop computers and cloud servers. The equipment on which the device runs may comprise a processor and a memory. Those skilled in the art will appreciate that this is only an example of a device for filtering heterogeneous big data information and does not limit it; the device may include more or fewer components than described, combine certain components, or use different components. For example, it may further include input and output devices, a network access device, a bus, and the like. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the equipment that runs the device for filtering heterogeneous big data information, and it connects the various parts of that equipment through various interfaces and lines.
The memory can be used to store the computer program and/or modules. The processor implements the various functions of the device for filtering heterogeneous big data information by running or executing the computer program and/or modules stored in the memory and by calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like; the data storage area may store data created during use of the equipment (such as audio data or a phonebook), and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another solid-state storage device.
While the present disclosure has been described in considerable detail and with particular reference to a few illustrative embodiments, it is not intended to be limited to any such details or embodiments. Rather, it is to be construed with reference to the appended claims so as to provide the broadest possible interpretation of those claims in view of the prior art, and thereby to effectively cover the intended scope of the disclosure. Furthermore, the foregoing describes the disclosure in terms of embodiments foreseen by the inventor for which an enabling description was available, notwithstanding that insubstantial modifications of the disclosure, not presently foreseen, may nonetheless represent equivalents thereto.