CN114510605A - Data storage method, device, electronic device and storage medium - Google Patents
- Publication number
- CN114510605A (application CN202011279996.6A)
- Authority
- CN
- China
- Prior art keywords
- storage
- original data
- field
- data information
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present invention disclose a data storage method, an apparatus, an electronic device and a storage medium. The method includes: acquiring scene information and/or field attributes in original data information, wherein the original data information includes at least two fields; determining a storage strategy according to the scene information and/or the field attributes of the original data information; and determining, according to the storage strategy, the storage location of each field in the original data information, wherein the storage location includes at least one storage component. These technical means improve development efficiency and simplify the data ingestion process, thereby improving data ingestion efficiency.
Description
Technical Field
Embodiments of the present invention relate to the technical field of data processing, and in particular to a data storage method, apparatus, electronic device, and storage medium.
Background
Society has moved from the IT (Information Technology) era to the DT (Data Technology) era, and data is a vital asset for individuals and enterprises alike. More and more enterprises recognize the importance of data and want to manage and use the data they hold in order to increase its value.
At present, big data platforms provide non-relational databases and other big-data components for storing and querying semi-structured and unstructured data.
However, big data is usually stored across multiple components. Each component needs its own ingestion rules, and each set of rules needs its own code, which leads to long development cycles and heavy manpower investment.
Therefore, a data storage method is urgently needed that improves development efficiency and simplifies the data ingestion process, thereby improving data ingestion efficiency.
Summary of the Invention
Embodiments of the present invention provide a data storage method, apparatus, electronic device, and storage medium, so as to improve development efficiency and simplify the data ingestion process, thereby improving data ingestion efficiency.
In a first aspect, an embodiment of the present invention provides a data storage method, including:
acquiring scene information and/or field attributes in original data information, wherein the original data information includes at least two fields;
determining a storage strategy according to the scene information of the original data information and/or the field attributes of the original data information;
determining, according to the storage strategy, the storage location of each field in the original data information, wherein the storage location includes at least one storage component.
In a second aspect, an embodiment of the present invention further provides a data storage apparatus, including:
an original data information acquisition module, configured to acquire scene information and/or field attributes in original data information, wherein the original data information includes at least two fields;
a storage strategy determination module, configured to determine a storage strategy according to the scene information of the original data information and/or the field attributes of the original data information;
a storage location determination module, configured to determine, according to the storage strategy, the storage location of each field in the original data information, wherein the storage location includes at least one storage component.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the data storage method according to any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data storage method according to any embodiment of the present invention.
In the embodiments of the present invention, scene information and/or field attributes in original data information are acquired, wherein the original data information includes at least two fields; a storage strategy is determined according to the scene information and/or the field attributes of the original data information; and the storage location of each field in the original data information is determined according to the storage strategy, wherein the storage location includes at least one storage component. These technical means improve development efficiency and simplify the data ingestion process, thereby improving data ingestion efficiency.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a data storage method provided in Embodiment 1 of the present invention;
FIG. 2 is a schematic flowchart of a data storage method provided in Embodiment 2 of the present invention;
FIG. 3 is a schematic structural diagram of a data storage apparatus provided in Embodiment 3 of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device provided in Embodiment 4 of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Before the exemplary embodiments are discussed in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the steps as a sequential process, many of the steps may be performed in parallel, concurrently, or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but it may also have additional steps not included in the drawings. The process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.
Embodiment 1
FIG. 1 is a schematic flowchart of a data storage method provided in Embodiment 1 of the present invention. This embodiment is applicable to storing big data in multiple kinds of storage components. The method may be executed by a data storage apparatus, which may be implemented in software and/or hardware and may be integrated into an electronic device. The method specifically includes the following steps:
S110. Acquire scene information and/or field attributes in the original data information, wherein the original data information includes at least two fields.
In this embodiment, the original data information is the unstructured and semi-structured data to be stored. The scene information may include the application scenario of the data; for example, data may be divided by scene information into a retrieval-query category and a comparison-analysis category. Field attributes may include the query frequency of a field. Storing the original data information field by field is more efficient, so the field attributes in the original data information need to be acquired.
S120. Determine a storage strategy according to the scene information of the original data information and/or the field attributes of the original data information.
In this embodiment, the storage strategy may be determined according to the scene information in the original data information, according to the field attributes in the original data information, or according to both the scene information and the field attributes.
In this embodiment, optionally, determining the storage strategy according to the scene information of the original data information and/or the field attributes of the original data information includes:
determining a scene value according to the scene information of the original data information; and/or determining an attribute value of each field according to the field attributes of the original data information;
determining the storage strategy of each field according to the preset storage component values and the scene value, or according to the preset storage component values and the attribute value, or according to the preset storage component values, the scene value and the attribute value.
In this embodiment, a scene value is the value corresponding to a given kind of scene information. Scene values may be set by the user so that different kinds of scene information are assigned values in different ranges. For example, scene values may range from 1 to 10, with the retrieval-query category set to 10 and the comparison-analysis category set to 1; alternatively, the retrieval-query category may be set to 1 and the comparison-analysis category to 10.
In this embodiment, the attribute value of a field is the value corresponding to its field attribute, that is, to its query frequency. The attribute value lies within a range, which may be 1-10 or 1-100; this embodiment does not limit it. Specifically, the higher the query frequency of a field, the higher its attribute value within the range. For example, if a field is queried very frequently, its attribute value may be set to 10; if a field cannot be used for querying, such as the URL (Uniform Resource Locator) of a picture or feature information, its attribute value is set to 0.
In this embodiment, storage component values are set per component, and the specific values may be set by the user. For example, if storage component values range from 1 to 10, a component that favors query performance but occupies more storage space may be given the value 10; the ElasticSearch component is an example of such a component. The ElasticSearch component can be used for distributed full-text retrieval and has high query performance; the data it stores is mostly text that needs to be retrieved quickly, and it is typically used for fast retrieval queries. A component that favors storage footprint but has poorer query performance may be given the value 1; the HBase component is an example. HBase is a distributed, column-oriented open-source database with a relatively high compression ratio; the data it stores is mostly columnar, and it is typically used for fast statistical analysis. A component whose storage footprint and query performance are both intermediate may be given the value 5. The advantage of this arrangement is that the original data information can be stored according to user-defined data storage rules, which improves data storage efficiency in multi-component deployments.
In this embodiment, optionally, determining the storage strategy of each field according to the preset storage component values and the scene value, or according to the preset storage component values and the attribute value, or according to the preset storage component values, the scene value and the attribute value, includes:
comparing the scene value with the storage component values to determine the storage strategy of each field;
or,
comparing the attribute value with the storage component values to determine the storage strategy of each field;
or,
taking the product of the scene value and the attribute value as a result value;
taking the square root of the result value to obtain a storage value;
comparing the storage value with the storage component values to determine the storage strategy of each field.
If a field has a scene value of 10 and an attribute value of 10, the result value is 100, and taking its square root gives a storage value of 10, so the storage strategy for the field is to store it in the component whose storage component value is 10. If a field has a scene value of 1 and an attribute value of 2, the result value is 2 and the storage value is its square root, about 1.41, which is closest to 1, so the storage strategy for the field is to store it in the component whose storage component value is 1.
In this embodiment, optionally, in a particular case, if the attribute value of a field is 0 and the preset storage component values are 10, 1 and 5, the attribute value is closest to the preset storage component value 1, so the storage strategy for the field is to store it in the component whose storage component value is 1.
In this embodiment, optionally, in a particular case, if the scene value of a field is 10 and the preset storage component values are 10, 1 and 5, the scene value is closest to 10. Where the preset data storage rules allow the storage component to be decided from the scene value alone, the storage strategy for the field is to store it in the component whose storage component value is 10.
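The selection rule worked through in the examples above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the function names are hypothetical, "MidComponent" is a placeholder for the unnamed component with value 5, and "compare" is read as "pick the component whose preset value is closest", which is what the examples suggest.

```python
import math

# Preset storage component values taken from the example in the text.
# "MidComponent" is a hypothetical name; the text does not name that component.
COMPONENT_VALUES = {"ElasticSearch": 10, "HBase": 1, "MidComponent": 5}


def storage_value(scene_value=None, attribute_value=None):
    """Combine whichever values are available into a single storage value."""
    if scene_value is not None and attribute_value is not None:
        # Both present: square root of the product (the geometric mean).
        return math.sqrt(scene_value * attribute_value)
    if scene_value is not None:
        return float(scene_value)
    if attribute_value is not None:
        return float(attribute_value)
    raise ValueError("a scene value or an attribute value is required")


def choose_component(value, components=COMPONENT_VALUES):
    """Pick the component whose preset value is closest to `value` (assumed rule)."""
    return min(components, key=lambda name: abs(components[name] - value))


print(choose_component(storage_value(10, 10)))             # ElasticSearch
print(choose_component(storage_value(1, 2)))               # HBase
print(choose_component(storage_value(attribute_value=0)))  # HBase
```

When two components are equally close, `min` returns whichever appears first in the dictionary; the text does not say how such ties should be resolved.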
S130. Determine, according to the storage strategy, the storage location of each field in the original data information, wherein the storage location includes at least one storage component.
In this embodiment, if the storage strategy is to store a field in the component whose storage component value is 10, the field may be stored in the ElasticSearch component; if the storage strategy is to store a field in the component whose storage component value is 1, the field may be stored in the HBase component.
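As an illustration of step S130, the sketch below groups the fields of one record by the component selected for each field. It assumes the storage strategy has already produced a storage component value per field; the function name, the record and the field names are hypothetical.

```python
from collections import defaultdict

# Example assignment from the text: value 10 -> ElasticSearch, value 1 -> HBase.
COMPONENT_BY_VALUE = {10: "ElasticSearch", 1: "HBase"}


def plan_storage_locations(record, strategy_by_field):
    """Group a record's fields by the component chosen for each field.

    `strategy_by_field` maps a field name to the storage component value
    that the storage strategy selected for it.
    """
    plan = defaultdict(dict)
    for field, content in record.items():
        component = COMPONENT_BY_VALUE[strategy_by_field[field]]
        plan[component][field] = content
    return dict(plan)


record = {"name": "Alice", "photo_url": "http://example.com/a.jpg"}
strategy = {"name": 10, "photo_url": 1}
print(plan_storage_locations(record, strategy))
```

The result maps each component to the fields it should receive, so a single record can be spread over several storage components.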
In this embodiment, optionally, before the storage strategy is determined according to the scene information of the original data information and/or the field attributes of the original data information, the method further includes:
judging whether the original data information satisfies preset field rules, wherein the field rules include field name, field type, field length and the range of field values;
if not, discarding the original data information;
if so, continuing to store the original data information.
In this embodiment, checking the field names, field types, field lengths and field value ranges in the original data information determines whether the original data information meets the conditions for storage; if it does not, the original data information is discarded. The advantage is that the original data information stored in the storage components is relatively complete, so dirty data is filtered out effectively, which improves both the efficiency of data storage and the value of the stored data.
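The field-rule check described above might look like the following Python sketch. The `FieldRule` class, its attribute names and the example rules are assumptions made for illustration; the text only states that the rules cover field name, field type, field length and field value range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldRule:
    name: str                          # required field name
    type_: type                        # required Python type of the value
    max_length: Optional[int] = None   # maximum length of the value as text
    min_value: Optional[float] = None  # lower bound of the value range
    max_value: Optional[float] = None  # upper bound of the value range


def satisfies_rules(record, rules):
    """Return True only if every rule holds for the record."""
    for rule in rules:
        if rule.name not in record:
            return False
        value = record[rule.name]
        if not isinstance(value, rule.type_):
            return False
        if rule.max_length is not None and len(str(value)) > rule.max_length:
            return False
        if rule.min_value is not None and value < rule.min_value:
            return False
        if rule.max_value is not None and value > rule.max_value:
            return False
    return True


def filter_records(records, rules):
    """Keep records that satisfy the rules; the rest are discarded."""
    return [r for r in records if satisfies_rules(r, rules)]


# Hypothetical rules for illustration only.
RULES = [
    FieldRule("id_number", str, max_length=18),
    FieldRule("age", int, min_value=0, max_value=150),
]
```

A record that lacks a required field, has a value of the wrong type, or falls outside the length or value bounds is dropped before any storage strategy is computed.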
In the embodiment of the present invention, original data information is acquired, wherein the original data information includes scene information and/or field attributes and includes at least two fields; a storage strategy is determined according to the scene information and/or the field attributes of the original data information; and the storage location of each field in the original data information is determined according to the storage strategy, wherein the storage location includes at least one storage component. These technical means improve development efficiency and simplify the data ingestion process, thereby improving data ingestion efficiency.
Embodiment 2
FIG. 2 is a schematic flowchart of a data storage method provided in Embodiment 2 of the present invention. This embodiment is applicable to storing big data in multiple kinds of storage components and is a further refinement of Embodiment 1. The method may be executed by a data storage apparatus, which may be implemented in software and/or hardware and may be integrated into an electronic device. The method specifically includes the following steps:
S210. Acquire original data information, wherein the original data information includes scene information and/or field attributes and includes at least two fields.
S220. Determine a storage strategy according to the scene information of the original data information and/or the field attributes of the original data information.
S230. Determine, according to the storage strategy, the storage location of each field in the original data information, wherein the storage location includes at least one storage component.
In this embodiment, optionally, determining, according to the storage strategy, the storage location of each field in the original data information, wherein the storage location includes at least one storage component, includes:
if, among the fields of the original data information, there is a field whose number of failures while being stored in the at least one storage component exceeds a preset number, stopping the storage.
In this embodiment, each field in the original data information is to be stored in a specific storage component. If storing a field fails, the method keeps trying to store that field in its corresponding storage component and does not try to store it in any other component. Further, when the number of attempts to store the field in its corresponding storage component exceeds the preset number, the storage is abandoned.
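The retry behaviour described above can be sketched as follows. This is an illustrative sketch only: `StorageError`, `store_field` and the `write` callable are hypothetical names, and the default threshold of 3 is an arbitrary stand-in for the preset number.

```python
class StorageError(Exception):
    """Raised by a writer when a storage attempt fails (hypothetical)."""


def store_field(write, field, content, max_failures=3):
    """Store one field in its assigned component only.

    `write` is a callable for that single component. Storage stops once the
    number of failures exceeds `max_failures`; no other component is tried.
    Returns True on success and False when storage is abandoned.
    """
    failures = 0
    while failures <= max_failures:
        try:
            write(field, content)
            return True
        except StorageError:
            failures += 1
    return False
```

Because the loop never switches to a different writer, a field either lands in the component chosen by the storage strategy or is not stored at all, which matches the behaviour stated in the text.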
S240. Determine the primary key information of the original data information according to the content of each field in the original data information; store the primary key information in the storage location of each field in the original data information.
In this embodiment, the primary key information is the unique identifier of the original data information. For example, if the original data information contains a user's ID card number, mobile phone number, name and gender, the primary key information may be the user's ID card number. The primary key information is stored, together with each field of the original data information, in the at least one storage component. In this embodiment, the primary key information may be set in a Schema. Schema originally refers to XML Schema, a recommendation published by the W3C in May 2001 that specifies how to formally describe the elements of an XML document. A Schema can generally be regarded as an abstract collection of metadata; in this patent, a Schema serves as an abstract collection describing data tables and records key information about each table. In this embodiment, once the Schema information of a table has been defined, the Schema information recorded in JSON format is read when the table is created and spliced into an SQL statement, so that the original data information can be stored and queried.
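The step of reading JSON-format Schema information and splicing it into an SQL statement could be sketched as below. The JSON layout, the key names (`table`, `primary_key`, `fields`) and the column types are assumptions for illustration; the text does not specify the Schema format.

```python
import json

# Hypothetical schema record; the key names and column types are illustrative.
SCHEMA_JSON = """
{
  "table": "person",
  "primary_key": "id_number",
  "fields": [
    {"name": "id_number", "type": "VARCHAR(18)"},
    {"name": "name", "type": "VARCHAR(64)"}
  ]
}
"""


def create_table_sql(schema_json):
    """Splice a JSON schema record into a CREATE TABLE statement."""
    schema = json.loads(schema_json)
    columns = [f"{field['name']} {field['type']}" for field in schema["fields"]]
    columns.append(f"PRIMARY KEY ({schema['primary_key']})")
    return "CREATE TABLE {} ({})".format(schema["table"], ", ".join(columns))


print(create_table_sql(SCHEMA_JSON))
# CREATE TABLE person (id_number VARCHAR(18), name VARCHAR(64), PRIMARY KEY (id_number))
```

Because the statement is assembled by string concatenation, a real system would need to validate table and column names against an allow-list before executing it.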
S250. Determine the primary key information in one of the storage components according to the content of the fields in the query condition; query each field of the original data information from the other storage components according to the primary key information.
In this embodiment, the field content given in the query condition is looked up in a storage component, the primary key information bound to the matching field is determined, and the fields bound to that primary key information are then queried from the other storage components.
In this embodiment, for example, the primary key information of the original data information is set to ID. If the query condition is valid, the ES (ElasticSearch) API is called to build the corresponding query statement; after the fields that meet the condition are found in ES, the complete data is fetched from HBase by the primary key information ID. For complex queries, because ES support for complex queries is currently incomplete, this embodiment queries the GP database instead: the corresponding fields are queried from GP and returned to the caller.
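The two-stage lookup described above (find the primary key in one component, then fetch the complete record from another) can be sketched with in-memory stand-ins. No real ElasticSearch or HBase client calls are used; `SEARCH_INDEX`, `ROW_STORE`, the function names and the sample records are all hypothetical.

```python
# In-memory stand-ins, not real client calls: SEARCH_INDEX plays the role of
# ES (searchable fields plus the primary key) and ROW_STORE the role of HBase
# (complete records keyed by the primary key).
SEARCH_INDEX = [
    {"ID": "1", "name": "Alice"},
    {"ID": "2", "name": "Bob"},
]
ROW_STORE = {
    "1": {"ID": "1", "name": "Alice", "photo_url": "http://example.com/1.jpg"},
    "2": {"ID": "2", "name": "Bob", "photo_url": "http://example.com/2.jpg"},
}


def find_ids(conditions):
    """Stage 1: return primary keys of index entries matching all conditions."""
    return [
        doc["ID"]
        for doc in SEARCH_INDEX
        if all(doc.get(field) == content for field, content in conditions.items())
    ]


def query(conditions):
    """Stage 2: fetch the complete record for each primary key found."""
    return [ROW_STORE[key] for key in find_ids(conditions) if key in ROW_STORE]


print(query({"name": "Bob"}))
```

Storing the primary key alongside every field is what makes this join possible: the search component only has to return keys, and the row store supplies the fields that were never indexed.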
In the embodiment of the present invention, the primary key information is determined and used to query the fields held in the storage components, which improves query efficiency.
Embodiment 3
FIG. 3 is a schematic structural diagram of a data storage apparatus provided in Embodiment 3 of the present invention. The data storage apparatus provided by this embodiment can execute the data storage method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the method. As shown in FIG. 3, the apparatus includes:
The original data information acquisition module 310, configured to acquire scene information and/or field attributes in the original data information, wherein the original data information includes at least two fields.
The storage strategy determination module 320, configured to determine a storage strategy according to the scene information of the original data information and/or the field attributes of the original data information.
The storage location determination module 330, configured to determine, according to the storage strategy, the storage location of each field in the original data information, wherein the storage location includes at least one storage component.
Optionally, the storage strategy determination module 320 is configured to determine a scene value according to the scene information of the original data information; and/or determine an attribute value of each field according to the field attributes of the original data information;
and to determine the storage strategy of each field according to the preset storage component values and the scene value, or according to the preset storage component values and the attribute value, or according to the preset storage component values, the scene value and the attribute value.
Optionally, the storage strategy determination module 320 is configured to compare the scene value with the storage component values to determine the storage strategy of each field;
or,
compare the attribute value with the storage component values to determine the storage strategy of each field;
or,
take the product of the scene value and the attribute value as a result value;
take the square root of the result value to obtain a storage value;
compare the storage value with the storage component values to determine the storage strategy of each field.
The apparatus further includes:
an original data information judgment module, configured to judge whether the original data information satisfies preset field rules, wherein the field rules include field name, field type, field length and the range of field values;
if not, discard the original data information;
if so, continue to store the original data information.
The storage location determination module 330 is configured to stop the storage if, among the fields of the original data information, there is a field whose number of failures while being stored in the at least one storage component exceeds a preset number.
The apparatus further includes:
a primary key information determination module, configured to determine the primary key information of the original data information according to the content of each field in the original data information;
and store the primary key information in the storage location of each field in the original data information.
The apparatus further includes:
a field query module, configured to determine the primary key information in one of the storage components according to the content of the fields in the query condition;
and query each field of the original data information from the other storage components according to the primary key information.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the apparatus described above may be found in the corresponding process in the foregoing method embodiments and is not repeated here.
Embodiment 4
FIG. 4 is a schematic structural diagram of an electronic device according to Embodiment 4 of the present invention, showing an exemplary device suitable for implementing the embodiments of the present invention. The device 12 shown in FIG. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in FIG. 4, the device 12 takes the form of a general-purpose computing device. The components of the device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
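The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.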
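The device 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the device 12, including volatile and non-volatile media, and removable and non-removable media.
The system memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in FIG. 4, commonly referred to as a "hard disk drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (for example, a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The system memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the various embodiments of the present invention.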
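A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.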
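The device 12 may also communicate with one or more external devices 14 (for example, a keyboard, a pointing device, a display 24, and the like), with one or more devices that enable a user to interact with the device 12, and/or with any device (for example, a network card, a modem, and the like) that enables the device 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. In addition, the device 12 may communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in FIG. 4, the network adapter 20 communicates with the other modules of the device 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.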
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example implementing the data storage method provided by the embodiments of the present invention, which includes:
obtaining scene information and/or field attributes of the original data information, where the original data information includes at least two fields;
determining a storage strategy according to the scene information of the original data information and/or the field attributes of the original data information;
determining, according to the storage strategy, the storage location of each field in the original data information, where the storage location includes at least one storage component.
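The overall flow of the method (obtain scene information and field attributes, determine a storage strategy, map each field to one or more storage components) might be sketched like this. The strategy table, the scene names, the attribute labels, and the component names are all hypothetical; the text does not prescribe a table-driven strategy.

```python
from typing import Dict, List

# Hypothetical strategy table: (scene, field attribute) -> storage components.
STRATEGIES = {
    ("realtime", "searchable"): ["search_index", "cache"],
    ("realtime", "plain"): ["cache"],
    ("archive", "searchable"): ["search_index", "relational_db"],
    ("archive", "plain"): ["relational_db"],
}


def determine_locations(scene: str, field_attributes: Dict[str, str]) -> Dict[str, List[str]]:
    """Map every field to at least one storage component."""
    if len(field_attributes) < 2:
        raise ValueError("original data information must contain at least two fields")
    locations = {}
    for field, attribute in field_attributes.items():
        # Determine the strategy from the scene and the field attribute,
        # then use it as the field's storage location.
        locations[field] = list(STRATEGIES[(scene, attribute)])
    return locations
```

With such a mapping in place, callers only supply the record and its scene; which component each field lands in is decided in one place.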
Embodiment 5
Embodiment 5 of the present invention further provides a computer-readable storage medium storing a computer program (also referred to as computer-executable instructions) which, when executed by a processor, implements the data storage method described in any of the foregoing embodiments, the method including:
obtaining scene information and/or field attributes of the original data information, where the original data information includes at least two fields;
determining a storage strategy according to the scene information of the original data information and/or the field attributes of the original data information;
determining, according to the storage strategy, the storage location of each field in the original data information, where the storage location includes at least one storage component.
The computer storage medium of the embodiments of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in conjunction with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
Computer program code for carrying out the operations of the embodiments of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include further equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011279996.6A CN114510605A (en) | 2020-11-16 | 2020-11-16 | Data storage method, device, electronic device and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN114510605A true CN114510605A (en) | 2022-05-17 |
Family
ID=81547143
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011279996.6A Pending CN114510605A (en) | 2020-11-16 | 2020-11-16 | Data storage method, device, electronic device and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114510605A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2024082518A1 (en) * | 2022-10-19 | 2024-04-25 | 中冶南方工程技术有限公司 | Data storage method and apparatus for digital steel coil system |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109814856A (en) * | 2019-01-17 | 2019-05-28 | 平安科技(深圳)有限公司 | Data entry method, device, terminal and computer readable storage medium |
| CN111159205A (en) * | 2019-11-27 | 2020-05-15 | 京东数字科技控股有限公司 | Data processing method and system |
| CN111752941A (en) * | 2019-07-31 | 2020-10-09 | 北京京东尚科信息技术有限公司 | A data storage, access method, device, server and storage medium |
Non-Patent Citations (1)
| Title |
|---|
| 刘兴堂: "现代系统建模与仿真技术", 31 August 2011, 西北工业大学出版社, pages: 440 - 443 * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||

