[go: up one dir, main page]

CN110908996B - Data processing method and device - Google Patents

Data processing method and device Download PDF

Info

Publication number
CN110908996B
CN110908996B CN201811087227.9A CN201811087227A CN110908996B CN 110908996 B CN110908996 B CN 110908996B CN 201811087227 A CN201811087227 A CN 201811087227A CN 110908996 B CN110908996 B CN 110908996B
Authority
CN
China
Prior art keywords
data
deleted
dynamic array
marking unit
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811087227.9A
Other languages
Chinese (zh)
Other versions
CN110908996A (en
Inventor
杨猛
王席敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201811087227.9A priority Critical patent/CN110908996B/en
Publication of CN110908996A publication Critical patent/CN110908996A/en
Application granted granted Critical
Publication of CN110908996B publication Critical patent/CN110908996B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method and device, and relates to the technical field of computers. One embodiment of the method comprises the following steps: packaging the dynamic array, so that the packaged data structure comprises at least one state marking unit, wherein the state marking unit is used for storing state information of data in the dynamic array, and the state information comprises deleted and undeleted data; searching the data to be deleted from the packaged data structure, and modifying the state information of the data to be deleted from undeleted to deleted so as to realize data deletion. The embodiment can solve the problems of cache failure, cache miss, context switching and the like, and avoid CPU performance resource waste and implicit consumption caused by CPU context switching due to frequent updating of CPU cache.

Description

Data processing method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for data processing.
Background
In a distributed scenario, the business is split into a set of mutually cooperating services according to the application function. Each service focuses on a specific, related set of functions. Each service may have its own independent database, cache, etc. to ensure decoupling from other services.
The service splitting ensures the overall stability of the system, reduces the complexity of single service maintenance, and provides infinite possibility for the expansibility of the system. The data sink and the service float, the upper layer service system can obtain the needed data only by obtaining the data from different services instead of persisting part or even all of the data to the local and then arranging the service. For example, for an order related business system of an e-commerce platform, it is required to acquire commodity information from a commodity service, acquire inventory information from an inventory service, filter out the underinventory commodity according to the inventory information, perform the steps of generating a subsequent order if the inventory is sufficient, fail to order if the inventory is insufficient, and prompt a user to replace the commodity.
In order to improve service performance and reduce network consumption of communication between systems, service providers generally provide a batch data acquisition interface, and a dynamic array ArrayList in Java is the most commonly used data list structure. The pseudo code of the business process is as follows:
As can be seen from the above pseudo code, arrayList (corresponding to skuList in the above code) uses the remove (Object object) method to delete data from the list. ArrayList is a container of array structure, so the index value corresponding to the data element to be deleted needs to be found in the deleting process. The query of the data elements to be deleted is realized through traversal, and whether the data elements in the array structure are consistent with the data elements to be deleted or not is sequentially compared to determine the subscript.
When deleting the data elements, firstly increasing the operation times statistical value, and then calculating the number of the data elements in the array needing to be moved, wherein the number of the data elements is equal to the number of all the data elements after the current subscript, namely size-index-1. If the lower sample of the current data element is the last of the arrays, namely, the index of the index value is equal to the size-1, numMoved =0, and the movement of the data element is not needed; if not, all the data elements after the subscript of the current data element are advanced by one step, and the number of the moved data elements is numMoved obtained by calculation. The position corresponding to the last subscript in the original array is set to be empty after the number of the data elements is decremented, so that the JVM (the JVM is Java Virtual Machine abbreviation and Java virtual machine) can conveniently recycle garbage.
FIG. 1 is a schematic diagram of a process for deleting data elements in a dynamic array ArrayList. As shown in fig. 1, 8 data elements, e1, e2, … …, e8, are stored in the arraylest, and when the data element e5 needs to be deleted from the arraylest, the related art is to traverse each data element in the arraylest to find the data element e5 and obtain the corresponding subscript thereof, and since the computer starts from 0 when performing encoding, the subscript of the data element e5 is 4. Then, the number of data elements to be moved is calculated, which is 3 in this example, and then the data element e5 is deleted, and the following data elements e6, e7 and e8 are moved forward one bit at a time; and finally, setting the position corresponding to the last subscript (namely: 7) in the original ArrayList as null.
In the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:
1. The ArrayList does not adopt any optimization to search the data elements, the matched data elements are directly searched through traversal, the time complexity is O (N), and the time complexity is proportional to the position subscripts of the data elements to be searched. The fastRemove method calls a local method library, directly operates a memory address, has the time complexity of O (1) and is irrelevant to input data, so that when the overall complexity is O (N) and the data size is large, serious performance problems are formed, and the problems are dominant consumption;
2. When data in the ArrayList is deleted, the memory will change, the operation on the memory will cause the data in the CPU cache of the central processing unit to be invalid, according to the cache consistency protocol, the JVM operates the data again and will have cache miss, the ArrayList after deleting the data needs to be reloaded into the cache from the memory or the disk, thus the CPU performance resource waste caused by frequent update of the CPU cache can be caused; meanwhile, since the I/O (Input/Output) speed of the memory and the disk is far lower than that of the register and the CPU cache, the CPU performs context switching during waiting for the I/O ready, and the context switching is the operation which consumes the most time in the running process of the computer, and the implicit consumption is not neglected.
Disclosure of Invention
In view of this, the embodiments of the present invention provide a method and an apparatus for data processing, which can solve the problems of cache failure, cache miss, and context switch, and avoid the CPU performance resource waste and implicit consumption caused by CPU context switch due to frequent update of the CPU cache.
To achieve the above object, according to one aspect of an embodiment of the present invention, there is provided a method of data processing.
A method of data processing, comprising: packaging the dynamic array, so that the packaged data structure comprises at least one state marking unit, wherein the state marking unit is used for storing state information of data in the dynamic array, and the state information comprises deleted and undeleted data; searching the data to be deleted from the packaged data structure, and modifying the state information of the data to be deleted from undeleted to deleted so as to realize data deletion.
Optionally, the encapsulated data structure further includes an index unit, configured to build an index according to the data in the dynamic array, and search the data to be deleted according to the built index.
Optionally, establishing the index according to the data in the dynamic array includes: sequentially acquiring data in the dynamic array, and operating the data to obtain an index entry; the data and its index in the dynamic array, and a pointer to the next data are saved to an index unit through an index entry to build an index.
Optionally, the status marking unit is implemented based on a bitmap composed of long integer variables.
Optionally, modifying the state information of the data to be deleted to implement data deletion includes: determining a state marking unit corresponding to the data to be deleted according to the subscript of the data to be deleted in the dynamic array; and modifying the state information of the data to be deleted, which is stored in the determined state marking unit, so as to realize data deletion.
Optionally, the method further comprises the following steps of: updating the number of undeleted data in the dynamic array; and sequentially carrying out operation on the data in the dynamic array and the state information in the state marking unit to determine the undeleted data.
Optionally, after determining the data that is not deleted, further includes: and storing the undeleted data into an array, and then performing data structure conversion on the array to obtain a dynamic array.
According to another aspect of an embodiment of the present invention, there is provided an apparatus for data processing.
An apparatus for data processing, comprising: the array packaging module is used for packaging the dynamic array, so that the packaged data structure comprises at least one state marking unit, wherein the state marking unit is used for storing state information of data in the dynamic array, and the state information comprises deleted and undeleted data; and the state modification module is used for searching the data to be deleted from the packaged data structure and modifying the state information of the data to be deleted from undeleted to deleted so as to realize data deletion.
Optionally, the encapsulated data structure further includes an index unit, configured to build an index according to the data in the dynamic array, and search the data to be deleted according to the built index.
Optionally, establishing the index according to the data in the dynamic array includes: sequentially acquiring data in the dynamic array, and operating the data to obtain an index entry; the data and its index in the dynamic array, and a pointer to the next data are saved to an index unit through an index entry to build an index.
Optionally, the status marking unit is implemented based on a bitmap composed of long integer variables.
Optionally, the state modification module is further configured to: determining a state marking unit corresponding to the data to be deleted according to the subscript of the data to be deleted in the dynamic array; and modifying the state information of the data to be deleted, which is stored in the determined state marking unit, so as to realize data deletion.
Optionally, the method further comprises a data determining module for: updating the number of undeleted data in the dynamic array after deleting the data; and sequentially carrying out operation on the data in the dynamic array and the state information in the state marking unit to determine the undeleted data.
Optionally, the system further comprises a data storage module for: after determining the undeleted data, storing the undeleted data into an array, and then performing data structure conversion on the array to obtain a dynamic array.
According to yet another aspect of an embodiment of the present invention, an electronic device for data processing is provided.
An electronic device for data processing, comprising: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are enabled to realize the data processing method provided by the embodiment of the invention.
According to yet another aspect of an embodiment of the present invention, a computer-readable medium is provided.
A computer readable medium having stored thereon a computer program which when executed by a processor implements a method of data processing provided by an embodiment of the invention.
One embodiment of the above invention has the following advantages or benefits: by packaging the dynamic array to increase the state marking unit for storing the data state, when deleting the data, the state of the data to be deleted stored in the state marking unit is modified to realize the data deletion, so that the data deletion can be realized by marking the data state without directly deleting the data elements in the original dynamic array and performing memory movement, thereby solving the problems of cache invalidation, cache miss, context switching and the like, and avoiding the CPU performance resource waste caused by frequent updating of the CPU cache and implicit consumption caused by the CPU context switching. In addition, the index unit is added when the dynamic array is packaged, so that the problems that the data query efficiency is low and the operation with large data volume cannot be supported in the dynamic array ARRAYLIST are solved, the data query efficiency is improved, and the overall performance of the system is improved.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of a process for deleting data elements in a dynamic array ArrayList;
FIG. 2 is a schematic diagram of the main steps of a method of data processing according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a packaged data structure according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the main blocks of an apparatus for data processing according to an embodiment of the present invention;
FIG. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
Fig. 6 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Aiming at the problems that the query efficiency of the data elements of the dynamic array ArrayList in the prior art is low, the operation of large data volume cannot be supported, and memory operation is performed for many times in the deleting process, so that cache invalidation, cache miss, context switching and the like are caused, the method and the device optimize the package of the original dynamic array ArrayList, and improve the query performance by establishing an index on the premise of not changing the original structure of the dynamic array; in the process of deleting data, a mode of directly deleting data elements and moving a memory in an original dynamic array is not used, and the data elements to be deleted are marked, so that the waste of CPU performance resources and implicit consumption caused by CPU context switching due to frequent updating of a CPU cache after the memory moves are avoided.
Since the data exists in the form of elements in the dynamic array ArrayList, the data elements are data in the following description of the embodiment of the present invention.
Fig. 2 is a schematic diagram of main steps of a method of data processing according to an embodiment of the present invention. As shown in fig. 2, the method for data processing according to the embodiment of the present invention mainly includes the following steps S201 and S202.
Step S201: packaging the dynamic array, so that the packaged data structure comprises at least one state marking unit, wherein the state marking unit is used for storing state information of data in the dynamic array, and the state information comprises deleted and undeleted data;
Step S202: searching the data to be deleted from the packaged data structure, and modifying the state information of the data to be deleted from undeleted to deleted so as to realize data deletion.
According to step S201 and step S202, by packaging the dynamic array to add a status marking unit for storing a data status, when deleting data, the status of the data to be deleted stored in the status marking unit is modified to realize data deletion, so that the data deletion can be realized by marking the data status without directly deleting data elements in the original dynamic array and performing memory movement, thereby solving the problems of cache failure, cache miss, context switching and the like, and avoiding the CPU performance resource waste caused by frequent updating of the CPU cache and implicit consumption caused by CPU context switching.
In particular, the original dynamic array ArrayList needs to be encapsulated, and the encapsulated data structure includes the array size of the dynamic array ArrayList, the memory pointer of the data element array capable of pointing to the dynamic array ArrayList, and the status marking unit. Thus, the operation of the data like an ArrayList can be realized by operating the packaged data structure.
Since the dynamic array ArrayList does not expose the internal data acquisition mode, when the dynamic array ArrayList is packaged, the position of the dynamic array ArrayList in the memory (i.e. the memory pointer of the ArrayList) needs to be directly acquired by means of the sun. Misc. Unsafe class provided by the Java development kit JDK (JavaDevelopmentkit). One specific implementation of the susc. Onsafe class is for example:
UNSAFE getObject ([ original ARRAYLIST ], [ address offset of data element ELEMENTDATA in ARRAYLIST ]).
According to one embodiment of the invention, the status marking unit may be implemented based on bitmaps word/words of long integer variables. A long-form variable occupies 64 bits, each bit indicating whether a particular index corresponding to a data element in the dynamic array is marked for deletion. The bit has values of only 0 and 1, and is initially 0, and is set to 1 after the deletion state is marked, so that a long variable can represent the states (deleted or not deleted) of 64 data elements in the dynamic array ArrayList. Because the long variable is easy to perform bit operation, better effects can be achieved in the aspects of memory space use and operation performance.
According to another embodiment of the present invention, the status marking unit may also be implemented based on bitmaps word/words composed of integer variables, but since one integer (int) variable has only 4 bytes, 32 bits in total, the status of 32 data elements can be saved at most, which is only half of the long type. Therefore, the bitmap composed of the int variables is used for realizing the state marking unit, which causes the bitmap array to be 2 times larger than the original bitmap array, and the number of loops is doubled when the number is fetched at last. Thus, of the commonly used data types, the long type is optimal.
A bitmap implementation state marking unit based on long type variables can store state information of at most 64 data elements in an array list of dynamic arrays, and when the size of the array list (i.e. the number of data elements stored in the array list) exceeds 64, a plurality of state marking units are needed to record the state information of the data, and the state marking units are distinguished by numbering the state marking units. For example: when the size of the ArrayList is 128, 2 status flag units are required to identify the status information of the data in the ArrayList, where status flag unit word [0] represents the status information of the data with subscripts 0-63 and status flag unit word [1] represents the status information of the data with subscripts 64-127.
When data deletion is needed, searching the data to be deleted from the packaged data structure, and acquiring the subscript of the data to be deleted in the dynamic array; then determining a state marking unit corresponding to the data to be deleted according to the subscript of the data to be deleted in the dynamic array; and then, modifying the state information of the data to be deleted, which is stored in the determined state marking unit, so as to realize data deletion. In order to determine the status flag unit corresponding to the data to be deleted, the subscript of the data to be deleted in the dynamic array may perform a modulo operation on the capacity 64 of the status flag unit (i.e., the number of status information that may be stored in each status flag unit), thereby obtaining a status flag unit number, and then the corresponding status flag unit may be found according to the status flag unit number. In performing the modulo operation, it may be performed by a bit operation, and for example, the state flag cell number wordToUse may be determined by wordToUse = (subscript > > 6). In which "> >6" means that 6 bits are shifted right, and is a binary modulo arithmetic, and since the capacity 64 of the state flag cell is equal to 6 times of 2, the modulo of the subscript pair 64 can be achieved by "subscript > > 6".
According to an embodiment of the present invention, the status marking unit is implemented by using bitmaps word/words composed based on long-form (long-form) variables. And in the long-integer (long-type) variable, each bit indicates whether a specific index corresponding to a data element in the dynamic array is marked for deletion. The bit has values of only 0 and 1, and is initially 0, and is set to 1 after the erasure state is marked. Therefore, when a data needs to be marked and deleted, bit operation can be performed with the state information of the data in the dynamic array ArrayList stored in the state marking unit according to the binary code corresponding to the subscript of the data to be deleted in the dynamic array ArrayList, so as to modify the state information of the data to be deleted, for example, the state information of the data to be deleted can be modified to 1 through wordToUse |= (1L < < subscript). Wherein "1L" represents a long type 1 because the bitmap itself is implemented in a long type, where direct writing of 1 would cause data overflow, in addition to requiring a specified value, also requiring a specified type; "<" is a bit operator, representing a left shift, for a 2-ary; "wordToUse |= (1L < < subscript)" is equivalent to "wordToUse = (wordToUse | (1L < < subscript))" meaning that the left shift operation is performed first, then the bit or operation is performed, and then the variable wordToUse is assigned. The binary bit operation can greatly improve the operation speed.
According to the embodiment of the invention, in particular implementation, in order to consider the scene of small data volume (the number of data included in the dynamic array ArrayList is not more than 64), a fast path can be defined, and only one state marking unit (bitmap word) is used, so that the positioning calculation of the state marking unit is saved, and the efficiency of data state marking operation under the scene of small data volume is improved. When the number of data in the dynamic array ArrayList is not more than 64, word is used for realizing a state marking unit; when the number of data in the dynamic array ArrayList is greater than 64, a plurality of status flag cells are implemented using words. That is, only one of words and words may have a value.
At this time, when data deletion is required, the data to be deleted is required to be searched from the packaged data structure, and the subscript of the data to be deleted in the dynamic array is acquired; then, determining whether a fast path can be walked according to the size of the dynamic array ArrayList, wherein when the fast path can be walked, namely, the positioning calculation of a state marking unit is not needed, the unique state marking unit is directly determined as the state marking unit to be operated; when the fast path cannot be walked, determining a state marking unit corresponding to the data to be deleted according to the subscript of the data to be deleted in the dynamic array; and then, modifying the state information of the data to be deleted, which is stored in the determined state marking unit, so as to realize data deletion.
When the fast path can be walked, since only one state marking unit is provided and word is used for realizing, when the state information of the data to be deleted is modified through bit operation, the state information of the data to be deleted can be modified to 1 directly through word|= (1L < < subscript).
In addition, in order to solve the problems that the data query efficiency is low and the operation with large data volume cannot be supported in the dynamic array ArrayList, the invention can also improve the data query efficiency by establishing an index and searching data through the index when the original dynamic array ArrayList is packaged. Namely: the encapsulated data structure may further include an index unit for creating an index according to the data in the dynamic array, and searching the data to be deleted according to the created index.
In an embodiment of the present invention, indexing is established using a lightweight map structure customized based on hashmap. hashmap is an implementation of a hash table based map interface. Assuming that the hash function can properly distribute data elements among the buckets (buckets), the efficiency of data lookup can be greatly improved by indexing the data lookup compared to the prior traversal-by-traversal data lookup. However, since the size of the original dynamic array ArrayList in the use scene is already determined, the problems of capacity expansion and the like do not need to be considered, and the hashmap read-write operation has a plurality of additional operations, and the functions are too heavy compared with the functions required by the invention, so that the invention customizes a lightweight map structure based on hashmap, removes all additional actions, and only keeps the query structure of the core array+linked list so as to improve the performance.
According to the technical scheme of the invention, the jump table can be used for establishing the index. A jump table is a linked list with an added forward pointer. The skip list is a randomized data structure, and is essentially an ordered linked list that can be searched in two halves. The jump table is added with multi-level indexes on the original ordered linked list, and quick searching is realized through the indexes, so that the searching performance can be improved, and the performance of inserting and deleting operations can be improved.
Fig. 3 is a schematic diagram of a packaged data structure according to an embodiment of the present invention. As shown in fig. 3, in the case of encapsulating an original dynamic array arrayleist, the encapsulated data structure includes an array size of the dynamic array arrayleist, a memory pointer capable of pointing to a data element array of the dynamic array arrayleist, and a status marking unit and an index unit according to an embodiment of the present invention. Thus, the operation of the data like an ArrayList can be realized by operating the packaged data structure. For example: the ArrayList can be traversed to acquire data and establish indexes, so that intermediate links can be reduced as much as possible, and the performance is improved.
Taking the example of indexing using a lightweight map structure customized based on hashmap, the process of indexing from data in a dynamic array includes:
sequentially acquiring data in the dynamic array, and operating the data to obtain an index entry;
the data and its index in the dynamic array, and the pointer to the next data are saved to the index unit through the index entry to build the index.
When the data is operated to obtain the index entry, the hash value of the data is obtained by carrying out hash operation on the data, and then the hash value of the data is subjected to modulo operation on the number of the index entries to determine the index entry.
The number of index entries included in the index unit depends on the size of the dynamic array ArrayList (i.e., the number of data included in the dynamic array ArrayList), and in general, the number of index entries may be set equal to the size of the dynamic array ArrayList, so that the number of data nodes corresponding to each index entry is as small as possible, thereby ensuring query efficiency. As can be seen from the packed data structure diagram shown in fig. 3, each index entry may correspond to a plurality of data nodes, each data node including the data itself (pointed to by key) and its index (value) in the dynamic array, and a pointer (next) to the next data. Meanwhile, the data node may further include a hash value (hash) of the data. When the data needs to be searched through the index, the index entry can be determined according to the hash value (hash) of the data, then the data node is searched from a plurality of data nodes corresponding to the determined index entry according to the hash value (hash) of the data, and the data (pointed by key) stored in the searched data node is compared with the data to be searched to determine the finally searched data.
According to one embodiment of the present invention, the index unit includes an index entry number that is automatically converted into 2 whole-order squares, e.g., 8, 16, 32, etc., according to the size of the dynamic array ArrayList. Therefore, when the index entry is determined, bit operation can be performed on the hash value of the data in a binary modulo manner to determine the index entry, and the hash value of the data does not need to be subjected to modulo operation on the number of the index entries to determine the index entry, so that the index entry corresponding to the data can be determined more quickly.
According to another embodiment of the present invention, after an index is established according to data in a dynamic array, when data deletion is required, a subscript of the data to be deleted in the dynamic array ArrayList can be quickly obtained through the index, and then the state information of the data to be deleted is modified to realize data deletion. Specifically, firstly, calculating a hash value of data to be deleted; then carrying out modular operation on the number of index entries by using the hash value of the data to be deleted to determine the index entries; then, searching the data to be deleted from the data nodes corresponding to the index entry according to the hash value of the data to be deleted, wherein the data nodes corresponding to the data to be deleted already comprise subscripts of the data nodes in the dynamic array; finally, determining a state marking unit corresponding to the data to be deleted according to the subscript of the data to be deleted in the dynamic array; and modifying the state information of the data to be deleted stored in the determined state marking unit to realize data deletion.
In general, the deletion of data is not the final objective. The data in the dynamic array ArrayList after deletion may or may not be empty and the undeleted data needs to be taken out for use, for example, the method can be used for prompting the user which data is illegal, and the like. Therefore, after deleting the data, the number of the undeleted data in the dynamic array (i.e. the number of the remaining data in the dynamic array) needs to be updated, for example, a variable may be used to save the number of the remaining data, and the original array size is initially reduced by 1 every time the data is deleted. When the value of this variable is 0, it is indicated that no data remains. If there is data remaining and there is a need to return the remaining data, a new data structure is needed to carry the value of this variable as the new data structure size.
In addition, after the data is deleted, the data in the dynamic array can be sequentially operated with the state information in the state marking unit to determine the data which is not deleted. Specifically, when the status flag unit is implemented based on a long-type variable, the data in the dynamic array may be bit-and-operated sequentially with the status information in the status flag unit to determine the data that is not deleted. For example, assuming that the size of the dynamic array is 8, the state information in the state flag cell is word=01111111, the corresponding subscript of data e2 is 1, because 01111111& (1L < < 1) -! =0, so it can be determined that the data e2 has been deleted; the subscript corresponding to the data e8 is 7, and since 01111111& (1L < < 7) = 0, it can be determined that the data e8 is not deleted, thereby determining the data which is not deleted.
According to one embodiment of the present invention, after determining the undeleted data, the undeleted data may also be saved into an array, and then the array may be subjected to data structure conversion to obtain a dynamic array. As the add method of the dynamic array ArrayList is directly used for adding the data elements, the additional actions are more and the speed is slow. The present invention uses an array instead of a dynamic array ArrayList to store the remaining undeleted data elements.
Since the data structure finally presented to the user after the remaining undeleted data elements are acquired needs to be a dynamic array ArrayList, the data structure of the array needs to be converted into a dynamic array ArrayList. One specific implementation of this is for example:
ArrayList remainList=new ArrayList(0);
UNSAFE. PutInt (REMAINLIST, arrayListSizeOffset, size); size of the memory setting size of the operation
Unsafe.putobject (REMAINLIST, arrayListEleDataOffset, REMAINELES); the operation memory sets an array data element.
The data structure conversion process shown by the codes comprises the following steps: firstly, a dynamic array ArrayList with the size of 0 is created, then the size and the internal data structure of the dynamic array are set through memory operation, and the additional action of the dynamic array ArrayList when data addition is carried out is eliminated.
Finally, a series of performance tests are performed by using the technical scheme of the embodiment, and 1000 times of data deletion are performed in a circulating way, so that the data deletion is performed on the lists with different capacities to compare time-consuming conditions. As shown in table 1, the comparison between the time consumption of the prior art for directly deleting the data in the ArrayList and the time consumption of the prior art for optimally deleting the data in the ArrayList is shown.
TABLE 1
As can be seen from Table 1, the time consumption required by the optimized deletion of the data in the ArrayList by using the technical scheme of the invention is less than that of the prior art no matter how much data is in the ArrayList, and the performance is improved; and, the larger the number of data in the ArrayList, the more remarkable the performance improvement. According to further performance tests, it is found that 22869697ms is required for deleting 100000 data elements directly in the ArrayList in the prior art, only 10258ms is required for optimizing deletion in the invention, and the performance is improved by 2229.45 times.
According to another aspect of the present invention, there is also provided an apparatus for data processing. Fig. 4 is a schematic diagram of main modules of an apparatus for data processing according to an embodiment of the present invention. As shown in fig. 4, the apparatus 400 for data processing according to the embodiment of the present invention mainly includes an array packaging module 401 and a state modifying module 402.
The array packaging module 401 is configured to package a dynamic array, so that the packaged data structure includes at least one status marking unit, where the status marking unit is configured to store status information of data in the dynamic array, and the status information includes deleted and undeleted data;
The state modification module 402 is configured to search for data to be deleted from the encapsulated data structure, and modify the state information of the data to be deleted from undeleted to deleted to implement data deletion.
According to an embodiment of the present invention, the encapsulated data structure may further include an index unit, configured to build an index according to the data in the dynamic array, and search the data to be deleted according to the built index.
According to another embodiment of the present invention, establishing an index from the data in the dynamic array includes:
Sequentially acquiring data in the dynamic array, and operating the data to obtain an index entry;
The data and its index in the dynamic array, and a pointer to the next data are saved to an index unit through an index entry to build an index.
According to a further embodiment of the invention, the status marking unit is implemented based on a bitmap of long integer variables.
According to another embodiment of the invention, the state modification module 402 may also be configured to:
determining a state marking unit corresponding to the data to be deleted according to the subscript of the data to be deleted in the dynamic array;
and modifying the state information of the data to be deleted, which is stored in the determined state marking unit, so as to realize data deletion.
According to yet another embodiment of the present invention, the data processing apparatus 400 may further include a data determining module (not shown in the figure) for:
Updating the number of undeleted data in the dynamic array after deleting the data; and
And sequentially carrying out operation on the data in the dynamic array and the state information in the state marking unit to determine the data which are not deleted.
According to yet another embodiment of the present invention, the data processing apparatus 400 may further include a data saving module (not shown in the figure) for:
after determining the undeleted data, storing the undeleted data into an array, and then performing data structure conversion on the array to obtain a dynamic array.
According to the technical scheme of the embodiment of the invention, the state marking unit for storing the data state is added by packaging the dynamic array, when the data is deleted, the data to be deleted stored in the state marking unit is modified to realize the data deletion, and the data deletion can be realized by marking the data state without directly deleting the data elements in the original dynamic array and performing memory movement, so that the problems of cache failure, cache miss, context switching and the like are solved, and the CPU performance resource waste caused by frequent updating of the CPU cache and the implicit consumption caused by the CPU context switching are avoided. In addition, the index unit is added when the dynamic array is packaged, so that the problems that the data query efficiency is low and the operation with large data volume cannot be supported in the dynamic array ArrayList are solved, the data query efficiency is improved, and the overall performance of the system is improved.
Fig. 5 illustrates an exemplary system architecture 500 of a data processing method or apparatus to which embodiments of the present invention may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 is used as a medium to provide communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 505 via the network 504 using the terminal devices 501, 502, 503 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 501, 502, 503, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 501, 502, 503 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping-type websites browsed by users using the terminal devices 501, 502, 503. The background management server may analyze and process the received data such as the product information query request, and feedback the processing result (e.g., the target push information, the product information—only an example) to the terminal device.
It should be noted that, the method for processing data provided by the embodiment of the present invention is generally performed by the server 505, and accordingly, the device for processing data is generally disposed in the server 505.
It should be understood that the number of terminal devices, networks and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, there is illustrated a schematic diagram of a computer system 600 suitable for use in implementing a terminal device or server in accordance with an embodiment of the present invention. The terminal device or server shown in fig. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, mouse, etc.; an output portion 607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. Removable media 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on drive 610 so that a computer program read therefrom is installed as needed into storage section 608.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 601.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described units or modules may also be provided in a processor, for example, as: a processor includes an array package module and a state modification module. The names of the units or modules do not limit the units or modules themselves in some cases, for example, the array packaging module may also be described as "a module for packaging a dynamic array such that the packaged data structure includes not less than one status flag unit".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be present alone without being fitted into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to include: packaging the dynamic array, so that the packaged data structure comprises at least one state marking unit, wherein the state marking unit is used for storing state information of data in the dynamic array, and the state information comprises deleted and undeleted data; searching the data to be deleted from the packaged data structure, and modifying the state information of the data to be deleted from undeleted to deleted so as to realize data deletion.
According to the technical scheme of the embodiment of the invention, the state marking unit for storing the data state is added by packaging the dynamic array, when the data is deleted, the data to be deleted stored in the state marking unit is modified to realize the data deletion, and the data deletion can be realized by marking the data state without directly deleting the data elements in the original dynamic array and performing memory movement, so that the problems of cache failure, cache miss, context switching and the like are solved, and the CPU performance resource waste caused by frequent updating of the CPU cache and the implicit consumption caused by the CPU context switching are avoided. In addition, the index unit is added when the dynamic array is packaged, so that the problems that the data query efficiency is low and the operation with large data volume cannot be supported in the dynamic array ArrayList are solved, the data query efficiency is improved, and the overall performance of the system is improved.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (14)

1. A method of data processing, comprising:
packaging the dynamic array, so that the packaged data structure comprises at least one state marking unit, wherein the state marking unit is used for storing state information of data in the dynamic array, and the state information comprises deleted and undeleted data;
searching the data to be deleted from the packaged data structure, and acquiring the subscript of the data to be deleted in the dynamic array; determining a state marking unit corresponding to the data to be deleted according to the subscript of the data to be deleted in the dynamic array; modifying the state information of the data to be deleted stored in the determined state marking unit from undeleted to deleted so as to realize data deletion;
When determining a state marking unit corresponding to the data to be deleted, performing modular operation on the capacity of the state marking unit by using the subscript of the data to be deleted in the dynamic array to obtain a state marking unit number, and finding out a corresponding state marking unit according to the state marking unit number;
When the state information of the data to be deleted is changed from undeleted to deleted, bit operation is carried out on the binary code corresponding to the subscript of the data to be deleted in the dynamic array and the state information of the data in the dynamic array stored in the state marking unit so as to modify the state information of the data to be deleted.
2. The method of claim 1, wherein the encapsulated data structure further comprises an indexing unit for indexing data in the dynamic array and searching the data to be deleted according to the index.
3. The method of claim 2, wherein indexing from the data in the dynamic array comprises:
Sequentially acquiring data in the dynamic array, and operating the data to obtain an index entry;
The data and its index in the dynamic array, and a pointer to the next data are saved to an index unit through an index entry to build an index.
4. The method according to claim 1, wherein the status marking unit is implemented based on a bitmap of long integer variables.
5. The method of claim 1, wherein after the deleting of the data, further comprising:
updating the number of undeleted data in the dynamic array; and
And sequentially carrying out operation on the data in the dynamic array and the state information in the state marking unit to determine the data which are not deleted.
6. The method of claim 5, wherein after determining the undeleted data, further comprising:
and storing the undeleted data into an array, and then performing data structure conversion on the array to obtain a dynamic array.
7. An apparatus for data processing, comprising:
The array packaging module is used for packaging the dynamic array, so that the packaged data structure comprises at least one state marking unit, wherein the state marking unit is used for storing state information of data in the dynamic array, and the state information comprises deleted and undeleted data;
The state modification module is used for searching the data to be deleted from the packaged data structure and acquiring the subscript of the data to be deleted in the dynamic array; determining a state marking unit corresponding to the data to be deleted according to the subscript of the data to be deleted in the dynamic array; modifying the state information of the data to be deleted stored in the determined state marking unit from undeleted to deleted so as to realize data deletion; when determining a state marking unit corresponding to the data to be deleted, performing modular operation on the capacity of the state marking unit by using the subscript of the data to be deleted in the dynamic array to obtain a state marking unit number, and finding out a corresponding state marking unit according to the state marking unit number; when the state information of the data to be deleted is changed from undeleted to deleted, bit operation is carried out on the binary code corresponding to the subscript of the data to be deleted in the dynamic array and the state information of the data in the dynamic array stored in the state marking unit so as to modify the state information of the data to be deleted.
8. The apparatus of claim 7, wherein the encapsulated data structure further comprises an indexing unit for indexing data in the dynamic array and searching the data to be deleted according to the index.
9. The apparatus of claim 8, wherein indexing from the data in the dynamic array comprises:
Sequentially acquiring data in the dynamic array, and operating the data to obtain an index entry;
The data and its index in the dynamic array, and a pointer to the next data are saved to an index unit through an index entry to build an index.
10. The apparatus of claim 7, wherein the status marking unit is implemented based on a bitmap of long integer variables.
11. The apparatus of claim 7, further comprising a data determination module configured to:
Updating the number of undeleted data in the dynamic array after deleting the data; and
And sequentially carrying out operation on the data in the dynamic array and the state information in the state marking unit to determine the data which are not deleted.
12. The apparatus of claim 11, further comprising a data preservation module configured to:
after determining the undeleted data, storing the undeleted data into an array, and then performing data structure conversion on the array to obtain a dynamic array.
13. An electronic device for data processing, comprising:
One or more processors;
storage means for storing one or more programs,
When executed by the one or more processors, causes the one or more processors to implement the method of any of claims 1-6.
14. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-6.
CN201811087227.9A 2018-09-18 2018-09-18 Data processing method and device Active CN110908996B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811087227.9A CN110908996B (en) 2018-09-18 2018-09-18 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811087227.9A CN110908996B (en) 2018-09-18 2018-09-18 Data processing method and device

Publications (2)

Publication Number Publication Date
CN110908996A CN110908996A (en) 2020-03-24
CN110908996B true CN110908996B (en) 2024-07-16

Family

ID=69812767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811087227.9A Active CN110908996B (en) 2018-09-18 2018-09-18 Data processing method and device

Country Status (1)

Country Link
CN (1) CN110908996B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416932A (en) * 2020-11-18 2021-02-26 青岛海尔科技有限公司 Method and device for establishing field association
CN113741976B (en) * 2021-08-25 2024-06-11 武汉大学 Cache bump elimination method, device, equipment and storage medium
CN114817146A (en) * 2022-04-15 2022-07-29 北京沃东天骏信息技术有限公司 Method and device for processing data
CN118170589B (en) * 2024-05-16 2024-07-23 济南浪潮数据技术有限公司 Data processing method, computer program product, equipment and computer medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103377130A (en) * 2012-04-13 2013-10-30 日立(中国)研究开发有限公司 Data storage equipment and corresponding data storage method
CN103617216A (en) * 2013-11-21 2014-03-05 珠海金山网络游戏科技有限公司 Quick data retrieval method and quick data retrieval system by novel Hash value table

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7062761B2 (en) * 2001-07-10 2006-06-13 Micron Technology, Inc. Dynamic arrays and overlays with bounds policies
CN103140840B (en) * 2011-09-30 2016-08-03 华为技术有限公司 The method and device of data management
CN104246716B (en) * 2012-12-19 2018-02-09 华为技术有限公司 The processing method and equipment of memory space object
CN105095489A (en) * 2015-08-18 2015-11-25 浪潮(北京)电子信息产业有限公司 Distributed file deletion method, device and system
CN107958079A (en) * 2017-12-14 2018-04-24 郑州云海信息技术有限公司 Aggregate file delet method, system, device and readable storage medium storing program for executing

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103377130A (en) * 2012-04-13 2013-10-30 日立(中国)研究开发有限公司 Data storage equipment and corresponding data storage method
CN103617216A (en) * 2013-11-21 2014-03-05 珠海金山网络游戏科技有限公司 Quick data retrieval method and quick data retrieval system by novel Hash value table

Also Published As

Publication number Publication date
CN110908996A (en) 2020-03-24

Similar Documents

Publication Publication Date Title
CN110908996B (en) Data processing method and device
CN111597148B (en) Distributed metadata management method for distributed file system
US10235384B2 (en) Servicing database operations using a messaging server
CN109614402B (en) Multidimensional data query method and device
CN110389812B (en) Method, apparatus, and computer-readable storage medium for managing virtual machines
CN111753019B (en) Data partitioning method and device applied to data warehouse
CN103853714A (en) Data processing method and device
CN105117433A (en) Method and system for statistically querying HBase based on analysis performed by Hive on HFile
Yang et al. An enhanced dynamic hash TRIE algorithm for lexicon search
CN112860412B (en) Service data processing method and device, electronic equipment and storage medium
CN111857539A (en) Method, apparatus and computer program product for managing a storage system
US7676457B2 (en) Automatic index based query optimization
CN111949648B (en) Memory data caching system and data indexing method
CN112783887A (en) Data processing method and device based on data warehouse
CN113704242B (en) Data processing method and device
US20240095033A1 (en) Systems and methods for handling macro compatibility for documents at a storage system
CN109213815B (en) Method, device, server terminal and readable medium for controlling execution times
US12253974B2 (en) Metadata processing method and apparatus, and a computer-readable storage medium
CN113127416B (en) Data query method and device
CN112988857B (en) Service data processing method and device
CN116483841B (en) Form data management method and device based on compact framework
CN111339140A (en) Data query method and device, storage medium and electronic equipment
CN112783904B (en) Method and device for updating index data
CN113779484B (en) A data calculation method and device
CN115994145B (en) Method and device for processing data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant