
CN118381513A - A data compression transmission method based on data object - Google Patents

Publication number
CN118381513A
Authority
CN
China
Prior art keywords
data
type
compilable
data object
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410804899.6A
Other languages
Chinese (zh)
Inventor
马进
李镜
余智君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yuchuang Intelligent Technology Co ltd
Original Assignee
Shanghai Yuchuang Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yuchuang Intelligent Technology Co ltd
Priority to CN202410804899.6A
Publication of CN118381513A

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention discloses a data compression transmission method based on data objects. It addresses a problem in the prior art: when JSON is used to store or transmit the query result of a data table or the representation of a number, it wastes storage space and reduces development efficiency. The technical scheme includes: defining a compilable data type, which is used to represent all basic data types; obtaining a data object to be encoded and parsing it to obtain its basic data type; encoding the data object according to its basic data type to obtain its compilable data type, which is represented by one byte; and then encoding the content of the data object according to its compilable data type. The purpose is to greatly improve development efficiency and operation efficiency by packing and unpacking data objects quickly.

Description

Data compression transmission method based on data object
Technical Field
The invention belongs to the technical field of data transmission, and particularly relates to a data compression transmission method based on a data object.
Background
With the spread of the Internet, file storage and data transmission have become increasingly important. A well-designed data format makes better use of both hard disk storage space and network bandwidth, so unifying the storage format and the network transmission format matters: a unified data format saves storage space, reduces the amount of data sent over the network, and improves the efficiency of all parties during development.
In the prior art, a data file is generally converted into JSON (JavaScript Object Notation) format, and the JSON format is used for data transmission. JSON is a lightweight data exchange format. Based on a subset of ECMAScript (the JavaScript specification published by the European Computer Manufacturers Association, ECMA), it stores and represents data in a text format that is completely independent of the programming language. It is easy for people to read and write, can be exchanged between multiple languages, and is easy for machines to parse and generate.
The prior art has the following technical problems:
When JSON stores or transmits the query result of a data table, its content contains many repeated field names. In a specific application scenario this can multiply the overall data volume, which occupies storage space and reduces development efficiency.
When JSON stores or transmits a number, for example { "num":200}, the value 200 is stored or transmitted as 3 bytes of text, whereas in binary 200 occupies only one byte. In addition, the JSON content can be viewed as a data table: for a table with one field num and two rows, the field name num is repeated in every row, so its number of occurrences grows with the number of rows of the table. JSON therefore wastes storage space on the representation of numbers.
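The two costs described above can be checked with a minimal Python sketch. The two-row table below is a hypothetical stand-in for a query result, and the one-byte binary form assumes an unsigned byte, since 200 lies outside the signed 8-bit range.

```python
import json
import struct

value = 200

# As JSON text, the digits "2", "0", "0" each take one byte.
digits_as_text = str(value).encode("ascii")

# As binary, 200 fits in a single unsigned byte (assumption: unsigned storage).
as_binary = struct.pack("B", value)

# Hypothetical two-row query result with a single field "num".
rows = [{"num": 1}, {"num": 2}]
rows_text = json.dumps(rows, separators=(",", ":"))

# The field name is repeated once per row.
field_name_count = rows_text.count('"num"')

print(len(digits_as_text), len(as_binary), field_name_count)
```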
Disclosure of Invention
To address the problems that, in the prior art, JSON occupies storage space and reduces development efficiency when storing or transmitting the query result of a data table, and wastes storage space when storing or transmitting the representation of a number, the invention provides a data compression transmission method based on a data object. Its aim is to greatly improve development efficiency and operation efficiency by packing and unpacking data objects quickly.
The technical scheme adopted by the invention for achieving the purpose is as follows:
A data compression transmission method based on a data object, comprising:
S1: defining a compilable data type, wherein the compilable data type is used for representing all basic data types;
S2: acquiring a data object to be encoded, and parsing the data object to obtain the basic data type of the data object;
S3: encoding the data object according to the basic data type of the data object to obtain the compilable data type of the data object, wherein the compilable data type is represented by one byte;
S4: then encoding the content of the data object according to the compilable data type of the data object.
Further: S4.1: the half-precision floating point type is the smallest floating point storage unit; when the basic data type of the data object is the half-precision floating point type, the compilable data type is recorded, and the storage occupies 2 bytes.
S4.2: when the basic data type of the data object is the single-precision floating point type, the compilable data type is recorded, and the storage occupies 4 bytes. The data is processed as follows: it is judged whether the value satisfies the storage condition of the half-precision floating point compilable data type; if so, it is stored as the half-precision floating point type instead; if not, it is stored as the single-precision floating point type.
S4.3: when the basic data type of the data object is the double-precision floating point type, the compilable data type is recorded, and the storage occupies 8 bytes. The data is processed as follows: it is first judged whether the value satisfies the storage condition of the half-precision floating point compilable data type; if not, it is judged whether the value satisfies the storage condition of the single-precision floating point type, and if so the single-precision floating point compilable data type is used; if neither condition is satisfied and precision loss is allowed, the preceding checks are repeated on the truncated value; if the conditions are still not satisfied, the double-precision floating point compilable data type is used.
S4.4: when the basic data type of the data object is an integer type, the compilable data type of the data object is recorded according to the following value ranges:
The 8-byte range is: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
The 4-byte range is: -2,147,483,648 to 2,147,483,647
The 2-byte range is: -32,768 to 32,767
The 1-byte range is: -128 to 127
The data is processed as follows:
the storage widths are checked in order to find the smallest unit that can hold the value, and that smallest unit is used for storage, so that storage space is fully used and the data is compressed.
S4.5: when the basic data type of the data object is a character string type, the compilable data type is recorded, and the data is processed as follows: the character string index table is checked to see whether the character string has already been encoded; if the same character string has been encoded, its reference value is used directly and its count is increased; otherwise a reference is added and the length and data content are recorded, the length being compressed in the integer type processing manner.
S4.6: when the basic data type of the data object is the Boolean type, the compilable data type is recorded, and the data is processed as follows: it is stored directly.
S4.7: when the basic data type of the data object is an array type, the compilable data type is recorded, and the data is processed as follows: the array length of the data object is compressed in the integer type processing manner, and the array data of the data object is traversed; array members whose data types fall under S4.2-S4.6 are serialized in the manner of S4.2-S4.6, and array members that are key-value pairs are stored in the key-value pair type processing manner.
S4.8: when the basic data type of the data object is a key-value pair type, the compilable data type is recorded, and the data is processed as follows: the number of members is recorded as a length, the length being compressed in the integer type manner of S4.4, and then each value of the key-value pairs is serialized in the manner of S4.2-S4.7.
Further, the base data type of the data object is an extended data type, the industry unit variable is extended, and the industry unit vector is extended and serialized.
Compared with the prior art, the technical scheme of the invention has the following advantages/beneficial effects:
1. The data compression transmission method can pack and unpack data objects quickly, which greatly improves development efficiency and operation efficiency.
2. By defining compilable data types that represent all basic data types, the invention allows a unified standard to be used across multiple terminals. It can be applied to Web development and instant messaging on servers, clients, mobile terminals and the like, and can be used in multiple different programming languages.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some examples of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a data compression transmission method based on a data object according to embodiment 1 of the present invention.
Fig. 2 is a schematic flow chart of encoding a character string data object in embodiment 1 of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, based on the embodiments of the invention, which are apparent to those of ordinary skill in the art without inventive faculty, are intended to be within the scope of the invention. Accordingly, the detailed description of the embodiments of the invention provided below is not intended to limit the scope of the invention as claimed, but is merely representative of selected embodiments of the invention.
As shown in fig. 1, a data compression transmission method based on a data object includes:
S1: a compilable data type is defined, which is used to represent all of the underlying data types.
Defining compilable data types is the creation of a standardized data representation method that allows different underlying data types to be uniformly identified and processed, laying a foundation for subsequent data compression and transmission. The method is helpful for reducing data redundancy, improving storage and transmission efficiency, and realizing data consistency and interoperability in various programming environments.
First, a set of data types needs to be defined, which are called "compilable data types". These types are used to represent the various underlying data types of the original data object during the encoding process. The role of the compilable data types is to be able to represent all underlying data types that may occur in a data object. The underlying data types typically include, but are not limited to, integers, floating point numbers, boolean values, strings, and the like. In the encoding phase of the data compression transmission method, each basic data type field in the original data object is converted into a corresponding compilable data type. This conversion is part of the encoding process and turns the data into a format more suitable for compression and transmission.
In subsequent processing, e.g. as mentioned in step S3, each compilable data type will be represented by a byte. This means that each elementary data type will be mapped onto a one byte identifier, enabling fast identification and processing at encoding. The compilable data types provide basis for subsequent data encoding. In step S4, the content of the data object is encoded according to the compilable data type of each field, including selecting an appropriate compression algorithm, determining the storage mode of the data, etc. By defining compilable data types, the present invention provides a unified data representation standard that facilitates data consistency and interoperability between different terminals and programming languages. By using compilable data types, the present invention aims to improve development efficiency and operational efficiency, as such a unified representation method simplifies the encoding and decoding process, making data processing faster and more efficient.
S2: a data object to be encoded is acquired and parsed to obtain the basic data type of the data object.
A data object that needs to be encoded and compressed is acquired. The data objects may be any form of data structure, such as JSON objects, database records, user-defined data structures, and the like. After the data object is obtained, the data object is parsed, and the parsing process involves checking the structure of the data object, and identifying each field contained therein and its data type.
In parsing the data object, it is necessary to identify and record the basic data type of each field. These base data types correspond to compilable data types, which are basic constituent elements of the original data, such as integers, floating point numbers, strings, boolean values, and the like. By identifying the underlying data type in the data object, preparation can be made for the subsequent encoding steps (steps S3 and S4). Knowing the data type of each field helps to determine the appropriate encoding strategy, for example, selecting the appropriate data compression algorithm or determining the storage format of the data.
Identifying the underlying data type helps to optimize the storage and transmission of the data. For example, if one field is an integer type, it may be decided to use a more compact integer encoding scheme rather than using a more general format, thereby reducing the required memory space and transmission bandwidth.
In step S1, compilable data types are defined, and basic data types identified in the data objects are mapped to corresponding compilable data types, providing explicit guidance for the encoding process.
For data objects that contain complex data structures, such as arrays, lists, or nested objects, the parsing process also requires recursively identifying and processing the data types of each element in these structures.
By parsing the data object and retrieving its underlying data type, a representation of the serialized information may be generated, which will instruct how to convert the data object into a format suitable for network transmission or storage.
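The recursive identification described for step S2 can be illustrated with a minimal Python sketch. The function names `basic_type` and `describe` and the type labels are hypothetical, and Python's built-in types stand in for the basic data types; the source does not prescribe a parser.

```python
def basic_type(value):
    """Name the basic data type of a value (labels are illustrative only)."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (list, tuple)):
        return "array"
    if isinstance(value, dict):
        return "key-value"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def describe(value):
    """Recursively record the basic data type of every field and element."""
    kind = basic_type(value)
    if kind == "array":
        return (kind, [describe(item) for item in value])
    if kind == "key-value":
        return (kind, {key: describe(item) for key, item in value.items()})
    return kind


print(describe({"num": 200, "tags": ["a", True], "ratio": 1.5}))
```

Checking `bool` before `int` matters in this sketch, because otherwise every Boolean field would be classified as an integer.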
S3: the data object is encoded according to its basic data type, resulting in a compilable data type for the data object, represented in one byte.
In step S2, the underlying data type of the data object has been parsed and identified. Step S3 involves converting these basic data types into their compilable form, i.e. into a predefined, standardized representation. The compilable data types are a set of predefined data types for representing all possible underlying data types during the encoding process. These types are designed to be directly understood and processed by the system or machine. Each compilable data type is represented as one byte, which means that 8 bits (i.e., 1 byte) are used to uniquely identify and store the encoding of each underlying data type. This representation is very compact and helps to reduce the space required for storage and transport. The mapping relationship between the compilable data type and the base data type defined in step S1 is applied in this step. Each base data type is mapped to a particular compilable data type identifier. Different encoding strategies may need to be employed for different underlying data types. For example, integers may use variable length coding, while strings may use length prefix coding. These policies are considered when converting to compilable data types. By encoding the data type as one byte, the representation size of the data can be effectively reduced, which is part of the data compression. The compressed data is easier to store and transfer.
The use of one byte to represent all compilable data types helps to build a unified coding format, which is important for data exchange across different platforms and programming languages.
The result of the encoding of step S3 will be the basis of step S4, wherein the content of the data object will be encoded according to the compilable data type of each field. By this compact coding, the efficiency of data compression and transmission can be improved, as it reduces the amount of data that needs to be processed. This approach allows for efficient data exchange between different systems and applications while maintaining the integrity and accuracy of the data types.
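A one-byte type identifier of the kind step S3 describes might be sketched in Python as follows. The member names follow the identifiers used in this description, but the numeric values and the helper `tag_byte` are assumptions; the source does not specify the byte assigned to each type.

```python
from enum import IntEnum


class VT(IntEnum):
    """Compilable data type tags. The numeric values are hypothetical."""
    FP16 = 1
    FP32 = 2
    FP64 = 3
    INT8 = 4
    INT16 = 5
    INT32 = 6
    INT64 = 7
    TRUE = 8
    FALSE = 9


def tag_byte(vt: VT) -> bytes:
    """Serialize a type tag as exactly one byte."""
    return bytes([vt])


encoded = tag_byte(VT.FP32)
print(encoded, VT(encoded[0]).name)
```

Because every tag fits in the range 0 to 255, a decoder can read one byte and recover the type before reading the content that follows.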
S4: the content of the data object is then encoded according to the compilable data type of the data object.
The encoding process of step S4 is based on the compilable data type determined in step S3. This means that the manner in which each data field is encoded will depend on the identity of its corresponding compilable data type. This step involves encoding the specific content in the data object according to its compilable data type. The purpose of encoding is to convert the data into a format suitable for storage or transmission while minimizing the size of the data. The encoding process typically involves data compression to reduce the volume of data. Depending on the data type, different compression algorithms or techniques are used, such as entropy coding, run-length coding, variable length coding, etc. Different encoding rules need to be employed for different compilable data types. For example, integers may be encoded with a fixed length, while strings may be encoded with a length prefix. By encoding according to the data type, the storage efficiency can be optimized. For example, if an integer is within a certain range, fewer bytes may be used to store it, rather than always using the largest possible range of byte numbers. The encoded data should facilitate network transmission, reducing the bandwidth required for transmission. This requires the selection of the appropriate protocol and transport format. During the encoding process, it is necessary to ensure that the integrity and accuracy of the data is not compromised. Even if data is compressed, decompression should be able to restore to the original state.
S4.1: the half-precision floating-point type is the smallest floating-point storage unit. When the basic data type of the data object is the half-precision floating-point type, its compilable data type is recorded as VT_FP16, and the storage occupies 2 bytes. A half-precision floating-point type uses fewer bits to represent a floating-point number and therefore less memory than the standard single-precision (32-bit) and double-precision (64-bit) types. In the IEEE 754 standard, a half-precision floating-point number is represented using 16 bits (2 bytes), which is more compact than single precision (4 bytes) and double precision (8 bytes) and helps reduce storage requirements. In one embodiment, the half-precision floating-point type is assigned the compilable data type identifier VT_FP16, which uniquely identifies half-precision floating-point data during encoding. Using VT_FP16 compresses floating-point data effectively; when a large number of floating-point numbers must be stored or transferred, it can significantly reduce the required storage space or transfer bandwidth. In step S3, the underlying data type of the data object is converted into a compilable data type; for the half-precision floating-point type, this means it is mapped to the VT_FP16 identifier.
S4.2: when the basic data type of the data object is the single-precision floating-point type, its compilable data type is VT_FP32, and the storage occupies 4 bytes. The data is processed as follows: it is determined whether VT_FP16 can store the value, and if so, the value is stored as VT_FP16 instead. For example, the value 1.0 occupies 4 bytes when stored as VT_FP32 but only 2 bytes when stored as VT_FP16. If precision loss is allowed, the range of values that can be converted from VT_FP32 to VT_FP16 can be widened. For example, with the value 0.0031 and an allowable precision threshold of 3 digits after the decimal point, the truncated value is 0.003; it is then judged whether VT_FP16 meets the storage requirement of this value, and if so VT_FP16 is used for storage, otherwise VT_FP32 is used. The single-precision floating-point type uses 32 bits (4 bytes) to represent a floating-point number, conforming to the IEEE 754 standard, and provides higher precision and dynamic range than half precision. In one embodiment, the single-precision floating-point type is assigned the compilable data type identifier VT_FP32, which uniquely identifies single-precision floating-point data during encoding. Processing single-precision data involves a tradeoff between precision and storage space: if the accuracy requirements of the data allow, a more compact representation may be used. It is first determined whether the single-precision number can be stored as a half-precision number (VT_FP16), and if the precision requirement can be met, the more compact half-precision representation is used.
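The narrowing check in S4.2 can be sketched with Python's `struct` module, whose `e` and `f` formats are IEEE 754 half and single precision. The function names are hypothetical. The source does not define exactly when VT_FP16 "meets the storage requirement" of a truncated value, so this sketch assumes it means the half-precision value stays within half a unit of the last allowed decimal place.

```python
import math
import struct


def fp16_roundtrip(x):
    """Return x after a trip through half precision, or None on overflow."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return None


def encode_fp32(x, decimals=None):
    """Pick VT_FP16 when possible, otherwise VT_FP32. Returns (tag, payload)."""
    back = fp16_roundtrip(x)
    if back is not None and back == x:
        return "VT_FP16", struct.pack("<e", x)  # exact in half precision
    if decimals is not None and math.isfinite(x):
        scale = 10 ** decimals
        cut = math.trunc(x * scale) / scale  # e.g. 0.0031 -> 0.003
        back = fp16_roundtrip(cut)
        # Assumed acceptance rule: error below half of the last kept digit.
        if back is not None and abs(back - cut) <= 0.5 / scale:
            return "VT_FP16", struct.pack("<e", cut)
    return "VT_FP32", struct.pack("<f", x)


print(encode_fp32(1.0)[0], encode_fp32(0.0031)[0], encode_fp32(0.0031, decimals=3)[0])
```

Here 1.0 is exact in half precision and takes 2 bytes, 0.0031 needs 4 bytes when no loss is allowed, and the same value drops to 2 bytes once a 3-digit threshold is permitted.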
S4.3: when the basic data type of the data object is the double-precision floating-point type, its compilable data type is recorded as VT_FP64, and the storage occupies 8 bytes. The data is processed as follows: it is first judged whether the value meets the storage condition of the half-precision compilable data type VT_FP16, and if so VT_FP16 is used for storage. Failing that, the single-precision compilable data type VT_FP32 is considered. If neither condition is satisfied and precision loss is allowed, the preceding checks are repeated on the truncated value (the specific procedure is described in S4.2). If the conditions are still not satisfied, VT_FP64 is used for storage. The double-precision floating-point type uses 64 bits (8 bytes) to represent a floating-point number, conforming to the IEEE 754 standard, and provides higher precision and dynamic range than single and half precision. In one embodiment, the double-precision floating-point type is assigned the compilable data type identifier VT_FP64, which uniquely identifies double-precision floating-point data during encoding. When processing double-precision data, it is first determined whether a more compact representation can be used without losing critical precision. Half precision (VT_FP16) is considered first: if the value can be represented exactly in half precision, or the loss of precision is within an acceptable range, half precision is chosen to reduce storage space. If half precision cannot meet the storage condition, single precision (VT_FP32) is considered next. Single precision requires less storage than double precision while still maintaining relatively high precision. If precision loss is allowed, the value of the double-precision number can be truncated to meet the storage condition of a lower-precision type; the truncation may involve rounding or other numerical approximation methods, as described in S4.2. If neither VT_FP16 nor VT_FP32 can satisfy the precision requirements of the data, the last option is to store the data as double precision (VT_FP64) to ensure its integrity and precision. During encoding, the system automatically selects the most suitable data type to represent the floating-point number according to these priorities and conditions, balancing storage efficiency against precision requirements. This policy provides flexibility, allowing the most suitable storage mode to be selected according to actual needs in different application scenarios.
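The priority order for double-precision values (half precision, then single precision, then the same checks on the truncated value, then double precision) can be sketched as below. The helper names are hypothetical, and the tolerance rule for truncated values is an assumption, since the source leaves the acceptance condition open.

```python
import math
import struct

# Narrower candidates in priority order: (tag, struct format).
_NARROW = (("VT_FP16", "<e"), ("VT_FP32", "<f"))


def _narrow(x, tol=0.0):
    """Return (tag, payload) for the first narrow type within tol, else None."""
    for tag, fmt in _NARROW:
        try:
            back = struct.unpack(fmt, struct.pack(fmt, x))[0]
        except OverflowError:
            continue  # value is out of range for this width
        if abs(back - x) <= tol:
            return tag, struct.pack(fmt, x)
    return None


def encode_fp64(x, decimals=None):
    """Choose the smallest floating-point type for a double-precision value."""
    hit = _narrow(x)  # exact representation only
    if hit is None and decimals is not None and math.isfinite(x):
        scale = 10 ** decimals
        cut = math.trunc(x * scale) / scale
        # Assumed acceptance rule: error below half of the last kept digit.
        hit = _narrow(cut, tol=0.5 / scale)
    return hit or ("VT_FP64", struct.pack("<d", x))


for sample in (1.0, 1.0 + 2 ** -20, 0.1, 1e300):
    tag, payload = encode_fp64(sample)
    print(sample, tag, len(payload))
```

In this sketch a value such as 0.1, which has no exact half- or single-precision form, stays at 8 bytes unless a precision threshold is supplied.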
S4.4: when the basic data type of the data object is an integer type, the compilable data type of the data object is recorded according to the following value ranges:
The range of values for VT_INT64 (8 bytes) is: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
The range of values for VT_INT32 (4 bytes) is: -2,147,483,648 to 2,147,483,647
The range of values for VT_INT16 (2 bytes) is: -32,768 to 32,767
The range of values for VT_INT8 (1 byte) is: -128 to 127
The data is processed as follows:
The storage widths are checked in order to determine the smallest unit that can hold the value. For example, for a VT_INT64 value of 100, the value 100 falls within the range of VT_INT8, so VT_INT8 is used for storage. If no smaller type can hold the value, the original data type is still used for storage. The same applies to the other widths, so that storage space is fully used and the data is compressed.
In one embodiment, several different sized integer types and their corresponding compilable data type identifiers are defined:
VT_INT64: an 8-byte (64-bit) integer, used to represent a very large integer.
VT_INT32: a 4-byte (32-bit) integer, used to represent a larger range of integers.
VT_INT16: a 2-byte (16-bit) integer, used to represent a medium-size integer.
VT_INT8: a 1-byte (8-bit) integer, used to represent a smaller range of integers.
In the encoding process, the goal is to represent integers using a minimum memory space. To this end, the system will evaluate the value of each integer and determine the minimum unit of storage that can be used. For each integer data, the system will process according to the following logic:
It is first checked whether the value is within the range of VT_INT8. If so, 1 byte of VT_INT8 is used to store this value. If the value is outside the range of VT_INT8 but within the range of VT_INT16, then 2 bytes of VT_INT16 are used for storage. If the value is outside the range of VT_INT16 but within the range of VT_INT32, then 4 bytes of VT_INT32 are used for storage. If the value is outside the range of VT_INT32, then 8 bytes of VT_INT64 are used for storage.
The purpose of this approach is to make full use of memory space and to achieve data compression by selecting the most appropriate data type to store the values.
Taking the value 100 as an example, although it can be represented by VT_INT64, it lies within the range of VT_INT8 (-128 to 127), so the more compact VT_INT8 is used for storage, taking only 1 byte instead of 8 bytes. This strategy provides flexibility: the system chooses the most appropriate storage based on the actual size of the data rather than always using the largest or the smallest data type. An implementation needs to consider how to detect the size of an integer automatically and map it to the most appropriate compilable data type.
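The width selection described above can be sketched in Python as follows. The function name `encode_int` and the little-endian byte order are assumptions; the source specifies only the value ranges and the order in which they are checked.

```python
import struct

# Candidate types from smallest to largest: (tag, struct format).
_INT_TYPES = (
    ("VT_INT8", "<b"),
    ("VT_INT16", "<h"),
    ("VT_INT32", "<i"),
    ("VT_INT64", "<q"),
)


def encode_int(value: int):
    """Return (tag, payload) using the smallest signed width that holds value."""
    for tag, fmt in _INT_TYPES:
        bits = 8 * struct.calcsize(fmt)
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return tag, struct.pack(fmt, value)
    raise OverflowError("value does not fit in 64 bits")


for sample in (100, 200, -32768, 2 ** 31):
    tag, payload = encode_int(sample)
    print(sample, tag, len(payload))
```

With signed ranges, 100 takes 1 byte while 200 already needs VT_INT16, because 200 exceeds the VT_INT8 upper bound of 127.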
S4.5: as shown in fig. 2, when the basic data type of the data object is a string type, the compilable data type is recorded as VT_STREAM, and the data is processed in the following manner:
It is first determined whether the length is greater than 0, a basic check of whether the character string contains any characters. If the length is 0, i.e. the string is empty, the length field is written as 0 at encoding, and no character data needs to be stored after it. Otherwise, the character string index table is checked to see whether the same character string has already been encoded. If it has, the reference value is used directly and the count is increased. If it has not, the reference address is written and the length and data content are recorded; the length is compressed in the integer type processing manner, and the data obtained after the compression processing is then written.
For string type data, the compilable data type identifier VT_STREAM is defined. This identifier uniquely identifies data of the string type during the encoding process.
The system maintains a string index table that tracks the strings already encoded. This is an optimization for detecting duplicate strings and avoiding storing them more than once. If a string identical to the one being encoded is found in the index table, its content is not stored again. Instead, only a reference value is stored, typically a pointer or index to the original string, and the reference count for that string is incremented. This reduces storage use and improves compression efficiency.
If the string is not found in the index table, it is treated as a new string that must be encoded. The string is written to the storage medium and a new entry, comprising the reference address and the content of the string, is added to the index table. The length of the string is compressed by the integer compression process described above, so it is stored in the smallest unit that can hold it, as for integers in step S4.4. The compressed length and the actual content of the string are then written to the storage medium, so the string data is serialized and stored efficiently. In this way, the system reduces the volume of data by never storing the same string content twice.
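The string index table can be sketched as follows. The text does not specify how a reference is distinguished from a first occurrence on the wire, so the marker bytes REF_NEW and REF_SEEN, the tag values, the UTF-8 encoding and the class name StringEncoder are all assumptions made for illustration.

```python
import struct

VT_STREAM = 0x10                   # hypothetical tag value for strings
REF_NEW, REF_SEEN = 0x80, 0x81     # hypothetical markers, chosen so that they
                                   # cannot be confused with the width tags 1-4


def encode_int(value: int) -> bytes:
    """Narrowest signed encoding (S4.4): width tag 1-4, then the payload."""
    for tag, fmt, bits in ((1, "<b", 8), (2, "<h", 16), (3, "<i", 32), (4, "<q", 64)):
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return bytes([tag]) + struct.pack(fmt, value)
    raise OverflowError(f"{value} does not fit in 64 bits")


class StringEncoder:
    """Encodes strings, replacing repeats with a reference into an index table."""

    def __init__(self) -> None:
        self._index: dict[str, int] = {}     # string -> reference id
        self.ref_counts: dict[int, int] = {}  # reference id -> times used

    def encode(self, text: str) -> bytes:
        if len(text) == 0:
            # Empty string: only a zero length is written, no character data.
            return bytes([VT_STREAM]) + encode_int(0)
        ref = self._index.get(text)
        if ref is not None:
            # Already encoded: reuse the reference and bump its count.
            self.ref_counts[ref] += 1
            return bytes([VT_STREAM, REF_SEEN]) + encode_int(ref)
        # First occurrence: register it, then write reference, length, content.
        ref = len(self._index)
        self._index[text] = ref
        self.ref_counts[ref] = 1
        data = text.encode("utf-8")
        return (bytes([VT_STREAM, REF_NEW]) + encode_int(ref)
                + encode_int(len(data)) + data)
```

Under these assumptions the second occurrence of a string costs only a tag, a marker and a compressed reference id, regardless of how long the string is.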
S4.6: when the basic data type of the data object is the boolean type, the compilable data type is recorded as VT_TRUE or VT_FALSE, and the data is processed as follows: it is stored directly, in a manner compatible with most programming environments and systems. Boolean data is assigned two specific compilable data type identifiers, VT_TRUE and VT_FALSE, which uniquely identify the true and false states during encoding. Boolean data requires no complex conversion or compression: because a boolean has only two possible states, the value can be stored directly. At encoding, VT_TRUE is written if the value is true and VT_FALSE if it is false. Because the identifier itself represents the value, the storage required is minimal. (In general, a boolean may also be stored in a single bit, or as one bit within a byte.) Storing the boolean directly is simple and efficient: it avoids unnecessary processing steps and can be read and written quickly, while the explicit identifiers VT_TRUE and VT_FALSE preserve the integrity and readability of the data.
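As a brief illustration of S4.6, the sketch below writes a boolean as a single identifier byte with no payload. The tag values and function names are hypothetical; the text names VT_TRUE and VT_FALSE but does not assign them numbers.

```python
VT_TRUE, VT_FALSE = 0x20, 0x21  # hypothetical tag values


def encode_bool(value: bool) -> bytes:
    """The type identifier alone carries the value; no payload follows."""
    return bytes([VT_TRUE if value else VT_FALSE])


def decode_bool(tag: int) -> bool:
    """Map an identifier byte back to the boolean it represents."""
    if tag == VT_TRUE:
        return True
    if tag == VT_FALSE:
        return False
    raise ValueError(f"not a boolean tag: {tag:#x}")
```

Each boolean therefore costs exactly one byte on the wire in this sketch.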
S4.7: when the basic data type of the data object is the array type, the compilable data type is recorded as VT_ARRAY, and the data is processed as follows: the array length of the data object is compressed in the same way as an integer; the array data of the data object is then traversed, and each member whose data type is covered by S4.1 to S4.5 is serialized as described in S4.1 to S4.5; any member that is key-value pair data is stored using the key-value pair processing.
For array type data, the compilable data type identifier VT_ARRAY is defined. This identifier uniquely identifies array data during encoding. Before the array itself is encoded, its length must be processed. Using the integer compression process described above (step S4.4), the most suitable integer type (VT_INT8, VT_INT16, VT_INT32 or VT_INT64) is selected according to the value of the array length, so that the length is stored in the most compact way. Next, the system traverses each element of the array and serializes it. If an element conforms to one of the data types described in steps S4.1 to S4.5 (half-precision floating-point, single-precision floating-point, double-precision floating-point, integer or string), it is serialized as described in the corresponding step. For example, an integer element is compressed and stored according to the strategy of S4.4. If an element is a key-value pair, it is stored using the key-value pair processing described in step S4.8. This typically involves recording the number of key-value pairs, compressed as an integer, followed by the serialized value of each key-value pair.
In this way, the system optimizes the storage of array data: it compresses the array length and serializes each array element with the appropriate data type, thereby reducing the volume of data.
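The array procedure can be sketched as a tag, a compressed length and the serialized elements in order. The tag values, the little-endian layout and the helper names are assumptions. For brevity this sketch always writes floats as doubles and strings without the index table of S4.5, and it handles nested arrays but not key-value pair members.

```python
import struct

# Hypothetical tag values; the source names the types but not their bytes.
VT_INT8, VT_INT16, VT_INT32, VT_INT64 = 0x01, 0x02, 0x03, 0x04
VT_DOUBLE = 0x07
VT_STREAM = 0x10
VT_TRUE, VT_FALSE = 0x20, 0x21
VT_ARRAY = 0x30


def encode_int(value: int) -> bytes:
    """Narrowest signed encoding (S4.4): type tag, then the payload."""
    for tag, fmt, bits in ((VT_INT8, "<b", 8), (VT_INT16, "<h", 16),
                           (VT_INT32, "<i", 32), (VT_INT64, "<q", 64)):
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return bytes([tag]) + struct.pack(fmt, value)
    raise OverflowError(f"{value} does not fit in 64 bits")


def encode_value(value) -> bytes:
    """Dispatch one array member to the encoder for its basic data type."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return bytes([VT_TRUE if value else VT_FALSE])
    if isinstance(value, int):
        return encode_int(value)
    if isinstance(value, float):
        return bytes([VT_DOUBLE]) + struct.pack("<d", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return bytes([VT_STREAM]) + encode_int(len(data)) + data
    if isinstance(value, list):
        return encode_array(value)
    raise TypeError(f"unsupported member type: {type(value).__name__}")


def encode_array(items: list) -> bytes:
    """Write VT_ARRAY, the compressed length, then each member in order."""
    out = bytearray([VT_ARRAY])
    out += encode_int(len(items))
    for item in items:
        out += encode_value(item)
    return bytes(out)
```

Under these assumptions the array [1, 300, True] takes nine bytes: one for the array tag, two for the length, and two, three and one for the three members.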
S4.8: when the basic data type of the data object is the key-value pair type, for example { name: "yourname" }, map or unordered_map in C++, or an object in JavaScript, the compilable data type is recorded as VT_KVP (Key Value Pair), and the data is processed as follows: the number of members is recorded, with this count compressed in the same way as an integer in S4.4, and each value of the key-value pairs is then serialized as described in S4.2 to S4.6.
Key-value pair data is assigned the specific compilable data type identifier VT_KVP. This identifier uniquely identifies a key-value pair data structure during encoding. Before a key-value pair data structure is encoded, its number of members (i.e., the number of key-value pairs) must first be recorded. This member count then guides the processing of the whole structure during encoding. The member count is compressed by the integer compression process described above (step S4.4), so it is stored in the smallest unit that can hold it. Next, the system traverses each key-value pair in the structure. For the value in each key-value pair, the serialization policy that matches its data type is applied:
If the data type of the value is one of the types described in steps S4.1 to S4.6 (for example, a floating-point number, integer, string or boolean value), it is serialized according to the rules defined in those steps.
For example, if the value is an integer, it is compressed and stored according to the strategy of S4.4; if the value is a string, the processing is performed according to the strategy of S4.5.
Although step S4.8 is mainly concerned with serialization of values, the keys typically also need to be processed and stored. The keys may be handled in a manner similar to a string or by other suitable serialization strategies depending on their data type.
In this way, the system optimizes the storage of key-value pair data: it compresses the member count and serializes the value of each key-value pair with the appropriate data type, thereby reducing the volume of data.
While key-value pair data is compressed and serialized, this approach maintains the integrity of the key-value pair data structure, ensuring that each key is correctly associated with its corresponding value.
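A key-value pair structure can be sketched in the same style: a tag, the compressed member count, then each key followed by its value so that the association is preserved. The tag values and helper names are assumptions, keys are treated as strings (one of the options the text mentions), and only integer, boolean and string values are handled to keep the example short.

```python
import struct

# Hypothetical tag values; the source names the types but not their bytes.
VT_INT8, VT_INT16, VT_INT32, VT_INT64 = 0x01, 0x02, 0x03, 0x04
VT_STREAM = 0x10
VT_TRUE, VT_FALSE = 0x20, 0x21
VT_KVP = 0x31


def encode_int(value: int) -> bytes:
    """Narrowest signed encoding (S4.4): type tag, then the payload."""
    for tag, fmt, bits in ((VT_INT8, "<b", 8), (VT_INT16, "<h", 16),
                           (VT_INT32, "<i", 32), (VT_INT64, "<q", 64)):
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            return bytes([tag]) + struct.pack(fmt, value)
    raise OverflowError(f"{value} does not fit in 64 bits")


def encode_str(text: str) -> bytes:
    """Length-prefixed UTF-8 string (the index table of S4.5 is omitted here)."""
    data = text.encode("utf-8")
    return bytes([VT_STREAM]) + encode_int(len(data)) + data


def encode_value(value) -> bytes:
    """Serialize one value according to its basic data type."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return bytes([VT_TRUE if value else VT_FALSE])
    if isinstance(value, int):
        return encode_int(value)
    if isinstance(value, str):
        return encode_str(value)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def encode_kvp(mapping: dict) -> bytes:
    """Write VT_KVP, the compressed member count, then key and value per pair."""
    out = bytearray([VT_KVP])
    out += encode_int(len(mapping))
    for key, value in mapping.items():
        out += encode_str(key)      # assumption: keys are handled like strings
        out += encode_value(value)  # each value follows its own key
    return bytes(out)
```

For example, encode_kvp({"name": "yourname"}) yields the VT_KVP tag, a member count of 1, and then the key and the value as two length-prefixed strings.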
When the basic data type of the data object is an extended data type, industry-specific unit variables can be added by extension. In the 3D field, for example, the vectors Vector2, Vector3 and Vector4 and the matrix type Matrix can be added and serialized. The unit variables of a specific industry must be decomposable into the data types of steps S4.2 to S4.6.
Extended data types are data types used in a particular industry or application that are not part of the basic type system of a general-purpose programming language. For example, in 3D graphics processing, common extended data types include vectors (Vector2, Vector3, Vector4) and matrices (Matrix).
Industry unit variables are industry specific data units, such as vectors and matrices in 3D modeling, that are used to represent points, directions, transformations, etc. in space.
Extended serialization refers to the conversion of these extended data types into a format that can be stored or transmitted. This requires defining how to encode these complex data structures into a binary format or another format suitable for network transmission. In the 3D field, vectors (Vector2, Vector3, Vector4) generally represent points or directions in two, three or four dimensions. When a vector is serialized, each of its components (i.e., each individual value of the vector) must be converted into a format suitable for storage or transmission. A matrix (Matrix) is used in 3D graphics to represent transformations such as rotation, scaling and translation. When a matrix is serialized, all of its elements must be encoded in a fixed order.
The unit variables of a particular industry must be decomposable or convertible into the basic data types described in steps S4.2 to S4.6. This means that even complex data structures are ultimately encoded as integers, floating-point numbers, boolean values, strings and the like.
Decomposing extended data types into basic data types gives the serialization method good compatibility and extensibility, so that it can adapt to different platforms and programming environments.
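The decomposition of an extended type into basic types can be sketched as follows. The tags VT_FLOAT, VT_VECTOR3 and VT_MATRIX, their values, the Vector3 dataclass, the fixed 4x4 matrix size and the row-major element order are all assumptions made for illustration. The sketch writes every component as a single-precision float and skips the half-precision narrowing described for floating-point values.

```python
import struct
from dataclasses import astuple, dataclass

# Hypothetical tag values for this sketch.
VT_FLOAT = 0x06
VT_VECTOR3 = 0x41
VT_MATRIX = 0x44


@dataclass(frozen=True)
class Vector3:
    x: float
    y: float
    z: float


def encode_float(value: float) -> bytes:
    """Single-precision float: type tag plus 4 little-endian bytes."""
    return bytes([VT_FLOAT]) + struct.pack("<f", value)


def encode_vector3(vec: Vector3) -> bytes:
    """Decompose the vector into its three components, each a basic float."""
    return bytes([VT_VECTOR3]) + b"".join(encode_float(c) for c in astuple(vec))


def encode_matrix4(rows) -> bytes:
    """Encode a 4x4 matrix element by element in row-major order."""
    if len(rows) != 4 or any(len(row) != 4 for row in rows):
        raise ValueError("expected a 4x4 matrix")
    return bytes([VT_MATRIX]) + b"".join(
        encode_float(c) for row in rows for c in row
    )
```

Because each component is written with the ordinary float encoder, a decoder that understands the basic types needs only the extra tag and the component count to read the extended type back.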
Compressing and transmitting data objects in this way has a wide range of applications. In a video transmission application (ffmpeg), the graphics data that has changed relative to the previous frame, together with other important related data, can be packed into an AVPacket object and compressed; additional data useful for development can be added to the packet, which is then transmitted to the decoding end in one pass for unpacking, decoding and playback. In video compression and transmission, the amount of data to transmit can be reduced by encoding only the difference between the current frame and the previous frame, an approach commonly referred to as motion compensation or frame difference coding. The changed graphics data and other important related data are packed into an AVPacket object, which is compressed according to the data compression transmission method defined above to reduce its size. Additional data such as encoding parameters, timestamps and synchronization information may be added to the compressed packet to aid development or optimize performance. The packed and compressed packet is transmitted to the decoding end in one pass, where it is unpacked, decoded and played.
In the 3D field, the mesh data, materials, normals, UV coordinates and other additional data that a model requires can be packed and compressed. The packed model is designed as a cross-engine model object: it can be loaded and rendered with the same set of algorithms in ThreeJS (a JavaScript library for WebGL), UE4 (Unreal Engine 4, a widely used game engine), Unity3D (a popular cross-platform game development engine) and other engines.
In the field of instant messaging, text messages, emoticon messages, user information, pictures, files and the like can be packed and transmitted together. The packed data is smaller, which reduces the amount of data sent over the network and so relieves network pressure to some extent. The unified packing and compression method also simplifies development and improves the readability and maintainability of the code.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that the above-mentioned preferred embodiment should not be construed as limiting the invention, and the scope of the invention should be defined by the appended claims. It will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the spirit and scope of the invention, and such modifications and adaptations are intended to be comprehended within the scope of the invention.

Claims (10)

1. A data compression transmission method based on a data object, comprising:
S1: defining a compilable data type, wherein the compilable data type is used for representing all basic data types;
S2: acquiring a data object to be encoded, and analyzing the data object to acquire the basic data type of the data object;
S3: encoding the data object according to the basic data type of the data object to obtain the compilable data type of the data object, wherein the compilable data type is represented by one byte;
S4: encoding the content of the data object according to the compilable data type of the data object.
2. A data compression transmission method based on data objects according to claim 1, characterized in that:
S4.1: when the basic data type of the data object is the half-precision floating-point type, which is the smallest floating-point storage unit, the compilable data type is recorded and storage occupies 2 bytes.
3. A data compression transmission method based on data objects according to claim 2, characterized in that:
S4.2: when the basic data type of the data object is the single-precision floating-point type, the compilable data type is recorded and storage occupies 4 bytes; the data is processed as follows: it is judged whether the data meets the storage condition of the half-precision floating-point compilable data type; if so, it is stored directly as the half-precision floating-point data type; if not, it is stored as the single-precision floating-point data type.
4. A data compression transmission method based on data objects according to claim 3, characterized in that:
S4.3: when the basic data type of the data object is the double-precision floating-point type, the compilable data type is recorded and storage occupies 8 bytes; the data is processed as follows: it is first judged whether the data meets the storage condition of the half-precision floating-point compilable data type; if not, it is judged whether the data meets the storage condition of the single-precision floating-point type, and if so, the single-precision floating-point compilable data type is used; if not, a loss of precision is further accepted, the value is truncated and the preceding checks are repeated on the truncated value; if the conditions are still not met, the double-precision floating-point compilable data type is used.
5. The data compression transmission method based on the data object according to claim 4, wherein:
S4.4: when the basic data type of the data object is an integer type, the compilable data types of the data object are recorded as follows:
The 8-byte range of values is: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
The 4-byte range of values is: -2,147,483,648 to 2,147,483,647
The 2-byte range of values is: -32,768 to 32,767
The 1-byte range of values is: -128 to 127
The data is processed as follows:
the value is tested against these ranges in order to find the smallest unit that can store it, and that smallest unit is used for storage, so that storage space is fully used and data compression is achieved.
6. A data compression transmission method based on data objects according to claim 5, characterized in that:
S4.5: when the basic data type of the data object is the string type, the compilable data type is recorded, and the data is processed as follows: the string index table is checked to see whether the string has already been encoded; if the same string has been encoded, the reference value is used directly and the count is increased; otherwise, a reference is added and the length and data content are recorded, wherein the length is compressed in the same way as an integer.
7. The data compression transmission method based on the data object according to claim 6, wherein:
S4.6: when the basic data type of the data object is the boolean type, the compilable data type is recorded, and the data is processed as follows: it is stored directly.
8. The method for data compression transmission based on a data object according to claim 7, wherein,
S4.7: when the basic data type of the data object is the array type, the compilable data type is recorded, and the data is processed as follows: the array length of the data object is compressed in the same way as an integer, the array data of the data object is traversed, each member whose data type is covered by S4.2 to S4.6 is serialized as described in S4.2 to S4.6, and any member that is key-value pair data is stored using the key-value pair processing.
9. The method for data compression transmission based on a data object according to claim 8, wherein,
S4.8: when the basic data type of the data object is the key-value pair type, the compilable data type is recorded, and the data is processed as follows: the number of members is recorded, with this count compressed in the same way as an integer in S4.4, and each value of the key-value pairs is then serialized as described in S4.2 to S4.7.
10. A data compression transmission method based on data objects according to claim 9, wherein, when the basic data type of the data object is an extended data type, industry unit variables are added by extension, and the industry unit vectors are extended and serialized.
CN202410804899.6A 2024-06-21 2024-06-21 A data compression transmission method based on data object Pending CN118381513A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410804899.6A CN118381513A (en) 2024-06-21 2024-06-21 A data compression transmission method based on data object

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410804899.6A CN118381513A (en) 2024-06-21 2024-06-21 A data compression transmission method based on data object

Publications (1)

Publication Number Publication Date
CN118381513A true CN118381513A (en) 2024-07-23

Family

ID=91906872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410804899.6A Pending CN118381513A (en) 2024-06-21 2024-06-21 A data compression transmission method based on data object

Country Status (1)

Country Link
CN (1) CN118381513A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118524444A (en) * 2024-07-24 2024-08-20 中交上海航道装备工业有限公司 Wireless communication multilink load balancing and intelligent optimizing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20240723