CN120560593B

CN120560593B - Multi-dimensional data rapid storage method based on big data

Info

Publication number: CN120560593B
Application number: CN202511061823.XA
Authority: CN
Inventors: 许伟; 宫立圆
Original assignee: Laiwu Vocational and Technical College
Current assignee: Laiwu Vocational and Technical College
Priority date: 2025-07-31
Filing date: 2025-07-31
Publication date: 2025-10-03
Anticipated expiration: 2045-07-31
Also published as: CN120560593A

Abstract

The invention relates to the technical field of data processing, and in particular to a big-data-based method for fast storage of multidimensional data. The method combines the amount of dimension data contained in different data systems, the time taken to retrieve any dimension data, and the number of times users retrieve the dimension data to obtain a common coefficient for each dimension data. According to the magnitude of the common coefficient, the dimension data are divided into public dimension data, private dimension data and pending dimension data. A semi-public possibility is then obtained for each pending dimension data from how it has been called in the past, and the pending dimension data are divided into semi-public dimension data and multi-private dimension data according to the magnitude of that possibility. Finally, the different types of data are stored separately. By optimizing the storage strategy, the method effectively improves the storage of data of different dimensions.

Description

A fast storage method for multidimensional data based on big data

Technical Field

The present invention relates to the technical field of data processing, and in particular to a big-data-based method for fast storage of multidimensional data.

Background Art

A smart campus platform uses big data analysis technology to provide teachers and students with a comprehensive, collaborative smart campus environment, delivering intelligent, personalized and convenient information services for teaching, scientific research, management, and study and daily life.

The core of a smart campus platform is the organization and storage of data, that is, the construction of its database. Because the data in a smart campus platform come from multiple data systems and are linked by complex correlations, the consistency and integrity of the data content are hard to maintain when data are stored and retrieved in each data system. How data of different dimensions are stored strongly affects how well the data systems retrieve and store them; an inappropriate storage method leads to unsatisfactory storage and retrieval performance in the corresponding data system.

Summary of the Invention

The present invention provides a big-data-based method for fast storage of multidimensional data to solve the above problems.

The big-data-based method for fast storage of multidimensional data of the present invention adopts the following technical solution:

An embodiment of the present invention provides a big-data-based method for fast storage of multidimensional data, the method comprising the following steps:

Acquiring the dimension data contained in each of a plurality of data systems;

Obtaining a common coefficient for each dimension data by combining the amount of dimension data contained in the different data systems, the time taken to retrieve any dimension data, and the number of times users retrieve the dimension data, and dividing the dimension data, according to the magnitude of the common coefficient, into public dimension data, private dimension data and pending dimension data;

Obtaining a semi-public possibility for each pending dimension data according to how it has been called in the past, and dividing the pending dimension data into semi-public dimension data and multi-private dimension data (dimension data private to several data systems) according to the magnitude of the semi-public possibility;

Storing the public dimension data, private dimension data, semi-public dimension data and multi-private dimension data separately.

Further, obtaining the common coefficient of the dimension data by combining the amount of dimension data contained in the different data systems, the time taken to retrieve any dimension data, and the number of times users retrieve the dimension data specifically comprises:

Obtaining the public frequency and the retrieval frequency of any dimension data, respectively, from the number of data systems that contain it and the number of times users retrieve it within a preset time period;

Obtaining the scaling factor of each dimension data from the amount of identical dimension data shared between different data systems and the time taken to retrieve the dimension data;

Obtaining the common coefficient of the dimension data from its public frequency, retrieval frequency and scaling factor, the common coefficient being directly proportional to each of the three.

Further, the public frequency and the retrieval frequency of the dimension data are obtained as follows:

For any dimension data, obtain the number of data systems that contain it, the total number of data systems, the number of times users retrieved it within a preset historical time period, and the total number of times users have retrieved it;

The ratio of the number of data systems containing the dimension data to the total number of data systems is recorded as the public frequency of the dimension data;

The ratio of the number of times users retrieved the dimension data within the preset historical time period to the total number of times users have retrieved it is recorded as the retrieval frequency of the dimension data.
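The two ratios above can be sketched as follows. This is an illustrative sketch only: it assumes each data system is represented by the set of dimension names it contains, and the system and dimension names used are hypothetical. Returning 0 for a dimension with no recorded retrievals is also an assumption, since the method does not define that case.

```python
from typing import Mapping, Set


def public_frequency(dim: str, systems: Mapping[str, Set[str]]) -> float:
    """Fraction of data systems whose dimension set contains `dim`."""
    if not systems:
        raise ValueError("at least one data system is required")
    containing = sum(1 for dims in systems.values() if dim in dims)
    return containing / len(systems)


def retrieval_frequency(window_calls: int, total_calls: int) -> float:
    """Fraction of all recorded retrievals of a dimension that fall in the preset window."""
    if total_calls <= 0:
        return 0.0  # assumption: a never-retrieved dimension gets frequency 0
    return window_calls / total_calls


# Hypothetical data systems and dimension names, for illustration only.
systems = {
    "admin": {"student_id", "department", "staff_id"},
    "teaching": {"student_id", "course", "department"},
    "library": {"student_id", "loan_record"},
}
print(public_frequency("student_id", systems))  # 1.0
print(retrieval_frequency(30, 120))             # 0.25
```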

Further, obtaining the scaling factor of each dimension data from the amount of identical dimension data shared between different data systems and the time taken to retrieve the dimension data specifically comprises:

The scaling factor of the dimension data is calculated as:

$$S_j = \exp\left(-\operatorname{Mean}_1\left(\left|t_{a,j}-t_{b,j}\right|\right)\right)\times\operatorname{Mean}_2\left(\frac{\left|D_a\cap D_b\right|}{\left|D_a\cup D_b\right|}\right)$$

where $S_j$ denotes the scaling factor of the $j$-th dimension data; $t_{a,j}$ and $t_{b,j}$ denote the times taken by the $a$-th and $b$-th data systems, respectively, to retrieve the $j$-th dimension data; $\left|D_a\cap D_b\right|$ denotes the number of elements in the intersection of the dimension data contained in the $a$-th and $b$-th data systems, and $\left|D_a\cup D_b\right|$ the number of elements in their union; $\operatorname{Mean}_1(\cdot)$ and $\operatorname{Mean}_2(\cdot)$ are the first and second means, each obtained by traversing all combinations of the $a$-th and $b$-th data systems; and $\exp(\cdot)$ is the exponential function with the natural constant $e$ as its base.
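A sketch of the scaling factor, under several assumptions: the exact arrangement of the two means is reconstructed from the symbol definitions (closer retrieval times and larger overlap between systems both increase the factor), the time-gap mean is taken only over system pairs that both have a timing for the dimension, and a dimension with no such pair gets a factor of 0. Retrieval times are assumed to be in seconds; the exponential makes the result depend on that unit. All names and numbers are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean
from typing import Dict, Set


def scaling_factor(dim: str,
                   systems: Dict[str, Set[str]],
                   retrieval_time: Dict[str, Dict[str, float]]) -> float:
    """Sketch of S_j = exp(-Mean_1|t_a - t_b|) * Mean_2(|D_a & D_b| / |D_a | D_b|)."""
    pairs = list(combinations(sorted(systems), 2))
    # Second mean: intersection-over-union of the dimension sets of every system pair.
    overlaps = [len(systems[a] & systems[b]) / len(systems[a] | systems[b])
                for a, b in pairs if systems[a] | systems[b]]
    # First mean: absolute gap between the two systems' retrieval times of `dim`,
    # restricted (assumption) to pairs in which both systems have a timing for it.
    gaps = [abs(retrieval_time[a][dim] - retrieval_time[b][dim])
            for a, b in pairs
            if dim in retrieval_time.get(a, {}) and dim in retrieval_time.get(b, {})]
    if not overlaps or not gaps:
        return 0.0  # assumption: no pairwise evidence of sharing, so no boost
    return math.exp(-mean(gaps)) * mean(overlaps)


systems = {
    "admin": {"student_id", "department", "staff_id"},
    "teaching": {"student_id", "course", "department"},
    "library": {"student_id", "loan_record"},
}
# Hypothetical response times in seconds.
retrieval_time = {
    "admin": {"student_id": 0.12, "department": 0.20},
    "teaching": {"student_id": 0.15, "department": 0.20},
    "library": {"student_id": 0.40},
}
print(scaling_factor("department", systems, retrieval_time))
print(scaling_factor("student_id", systems, retrieval_time))
```

Identical retrieval times leave only the overlap term (here 1/3 for `department`), while the spread of `student_id` timings shrinks its factor below that.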

Further, dividing the dimension data, according to the magnitude of the common coefficient, into public dimension data, private dimension data and pending dimension data specifically comprises:

A first threshold $T_1$ and a second threshold $T_2$ are preset. Dimension data whose common coefficient is greater than or equal to $T_1$ are recorded as public dimension data; dimension data whose common coefficient is less than $T_1$ and greater than or equal to $T_2$ are recorded as pending dimension data; and dimension data whose common coefficient is less than $T_2$ are recorded as private dimension data.

Further, obtaining the semi-public possibility of the pending dimension data according to how it has been called in the past specifically comprises:

According to how the pending dimension data have been called in the past, obtain the historical call reference value and, for each pending dimension data, its historical average call frequency, its coherence parameter, and the total historical call count of all data systems calling it;

The historical call reference value is the mean of the historical average call frequencies of all pending dimension data;

For any pending dimension data, the ratio of its historical average call frequency to the historical call reference value is recorded as its first ratio, and the ratio of its coherence parameter to the total historical call count of all data systems calling it is recorded as its second ratio. The semi-public possibility of the pending dimension data is obtained from the first and second ratios, both of which are positively correlated with it.

Further, the historical average call frequency is obtained as follows:

A historical time period and a historical time interval are preset, the historical time interval containing several historical time periods. The number of times the pending dimension data is called by all data systems within one historical time period is recorded as its historical call parameter for that period, and the average of its historical call parameters over all historical time periods is recorded as its historical average call frequency.

Further, the coherence parameter is obtained as follows:

The number of times a pending dimension data and any given public dimension data are retrieved simultaneously by all data systems within the historical time interval is recorded as the coherence factor between the pending dimension data and that public dimension data; the mean of the coherence factors between the pending dimension data and all public dimension data is recorded as the coherence parameter of the pending dimension data.

Further, the semi-public possibility of the pending dimension data is calculated as follows:

The semi-public possibility is calculated as:

$$Q_i = \operatorname{softsign}\left(\frac{F_i}{\bar{F}}\times\frac{H_i}{Z_i}\right)$$

where $Q_i$ denotes the semi-public possibility of the $i$-th pending dimension data; $F_i$ denotes its historical average call frequency; $\bar{F}$ denotes the historical call reference value; $H_i$ denotes its coherence parameter; $Z_i$ denotes the total historical call count of all data systems calling the $i$-th pending dimension data; and $\operatorname{softsign}(\cdot)$ denotes the softsign normalization function, $\operatorname{softsign}(x) = x/(1+|x|)$.
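A sketch of the semi-public possibility, assuming the first and second ratios are combined by multiplication before softsign normalization (the definitions only state that both ratios are positively correlated with the result). The `split_pending` helper and its cutoff are hypothetical: the method divides pending data "according to the magnitude" of the possibility but no cutoff value is given here. Dimension names and numbers are illustrative.

```python
from statistics import mean
from typing import Dict


def softsign(x: float) -> float:
    """softsign(x) = x / (1 + |x|), mapping the reals into (-1, 1)."""
    return x / (1.0 + abs(x))


def semi_public_possibility(avg_call_freq: float, reference: float,
                            coherence: float, total_calls: float) -> float:
    """Q_i = softsign((F_i / F_ref) * (H_i / Z_i)); the product form is assumed."""
    if reference <= 0 or total_calls <= 0:
        return 0.0  # assumption: no usable history means no evidence of semi-publicness
    first_ratio = avg_call_freq / reference
    second_ratio = coherence / total_calls
    return softsign(first_ratio * second_ratio)


def split_pending(q: float, cutoff: float) -> str:
    """Hypothetical split; the cutoff is caller-chosen, not a value given by the method."""
    return "semi-public" if q >= cutoff else "multi-private"


# Hypothetical historical average call frequencies of all pending dimensions.
avg_freq: Dict[str, float] = {"cet_score": 3.0, "dorm_no": 1.0}
reference = mean(avg_freq.values())  # historical call reference value: 2.0
q = semi_public_possibility(avg_freq["cet_score"], reference, coherence=4.0, total_calls=8.0)
print(q)  # softsign(1.5 * 0.5) = 0.75 / 1.75, about 0.4286
```

Softsign keeps the possibility in [0, 1) for non-negative inputs without the saturation of a logistic curve, which is presumably why a bounded but slowly growing normalization was chosen.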

Further, storing the public dimension data, private dimension data, semi-public dimension data and multi-private dimension data separately specifically comprises:

Several servers are set up; the server with the largest storage space serves as the master server, and each data system is assigned a server of its own;

Private dimension data are stored on the server of the data system to which they belong; multi-private dimension data are stored on the servers of every data system that has called them; public dimension data are stored on the master server; and semi-public dimension data are stored both on the master server and on the server of the data system to which they belong.
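The placement rules above amount to a small routing table. The sketch below assumes each dimension data item already carries its type, its owning data system, and (for multi-private data) the set of systems that have called it; server names, capacities and system names are hypothetical.

```python
from enum import Enum
from typing import Dict, Iterable, Set


class DimType(Enum):
    PUBLIC = "public"
    SEMI_PUBLIC = "semi-public"
    MULTI_PRIVATE = "multi-private"
    PRIVATE = "private"


def pick_master(capacity: Dict[str, int]) -> str:
    """The server with the largest storage space becomes the master server."""
    return max(capacity, key=lambda name: capacity[name])


def target_servers(kind: DimType, owner: str, callers: Iterable[str],
                   master: str, server_of: Dict[str, str]) -> Set[str]:
    """Servers that should hold one dimension data item, per the placement rules."""
    if kind is DimType.PUBLIC:
        return {master}
    if kind is DimType.SEMI_PUBLIC:
        return {master, server_of[owner]}
    if kind is DimType.MULTI_PRIVATE:
        return {server_of[system] for system in callers}
    return {server_of[owner]}  # PRIVATE: only the owning system's server


capacity = {"srv-main": 64, "srv-admin": 8, "srv-teach": 8, "srv-lib": 4}
server_of = {"admin": "srv-admin", "teaching": "srv-teach", "library": "srv-lib"}
master = pick_master(capacity)
print(master)  # srv-main
print(sorted(target_servers(DimType.SEMI_PUBLIC, "admin", [], master, server_of)))
# ['srv-admin', 'srv-main']
```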

The beneficial effects of the technical solution of the present invention are as follows. By analyzing how repeatedly each data item appears across the many subsystems and how frequently it is called, together with the correlations between data items, the data are divided into public, private, semi-public and multi-private dimension data. Storing each type according to its characteristics improves data access efficiency. Public and private dimension data are stored on different servers, which reduces data redundancy and speeds up data access; keeping the public and semi-public dimension data on the master server allows them to be processed uniformly and avoids the partial data loss that can occur when decentrally stored copies are replicated with a delay to reach eventual consistency. Private and multi-private dimension data are stored in their respective data systems, avoiding the resource waste of redundant storage. In other words, storing different types of dimension data on different servers allocates them reasonably according to their call frequency and degree of publicness and reduces wasted storage space.

Private dimension data are stored on the servers of their corresponding data systems, which protects data security and privacy: only users with the corresponding permissions can access them, and data loss from accidental deletion or modification during an operation in some other system is prevented. At the same time, dynamically adjusting the storage strategy based on the retrieval frequency and historical usage of the dimension data makes access to frequently used data more timely and responsive, and using the semi-public possibility to analyze and classify the pending dimension data allows the storage strategy to be adjusted more intelligently, making storage more flexible and efficient. Optimizing the storage strategy in this way effectively improves the storage of data of different dimensions.

Brief Description of the Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of the steps of the big-data-based method for fast storage of multidimensional data according to the present invention;

FIG. 2 is a schematic diagram of the smart campus platform.

Detailed Description

To further illustrate the technical means adopted by the present invention to achieve its intended purpose, and their effects, the specific implementation, structure, features and effects of the big-data-based method for fast storage of multidimensional data proposed by the present invention are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, specific features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The specific solution of the big-data-based method for fast storage of multidimensional data provided by the present invention is described in detail below with reference to the accompanying drawings.

Please refer to FIG. 1, which shows a flowchart of the steps of a big-data-based method for fast storage of multidimensional data according to an embodiment of the present invention. The method includes the following steps:

Step S001: Acquire the dimension data contained in each of a plurality of data systems.

It should be noted that the smart campus platform is a campus management platform formed by integrating multiple systems. It typically contains several data systems, such as an administrative system, a teaching system, a scientific research system and a library system, each serving different users. FIG. 2 is a schematic diagram of the smart campus platform; its users include students, teachers and staff, and each user has multi-dimensional data in each system. For example, the administrative system holds the identity and department information of students and teachers, the identity information of staff, and information on campus offices such as the logistics management department, which comprises several functional departments and several business service departments. Therefore, to improve the storage efficiency of user data in the smart campus platform, the multi-dimensional data in the platform must be analyzed.

Specifically, to implement the big-data-based method for fast storage of multidimensional data proposed in this embodiment, the dimension data must first be collected, as follows:

The dimension data of all users in all data systems of the smart campus platform are obtained from the platform's database.

At this point, the dimension data of all users in the data systems have been obtained by the above method.

Step S002: Obtain the common coefficient of each dimension data by combining the amount of dimension data contained in the different data systems, the time taken to retrieve any dimension data, and the number of times users retrieve the dimension data; then, according to the magnitude of the common coefficient, divide the dimension data into public dimension data, private dimension data and pending dimension data.

Step 2.1: Obtain the common coefficient of the dimension data by combining the amount of dimension data contained in the different data systems, the time taken to retrieve any dimension data, and the number of times users retrieve the dimension data.

As an embodiment, the common coefficient is calculated as follows:

First, obtain the public frequency and the retrieval frequency of any dimension data, respectively, from the number of data systems that contain it and the number of times users retrieve it within a preset time period.

Then, obtain the scaling factor of each dimension data from the amount of identical dimension data shared between different data systems and the time taken to retrieve the dimension data.

Finally, obtain the common coefficient of the dimension data from its public frequency, retrieval frequency and scaling factor, the common coefficient being directly proportional to each of the three.

As an embodiment, the common coefficient of the dimension data is expressed as:

$$C_j = P_j\times R_j\times S_j = \frac{n_j}{N}\times\frac{m_j}{M_j}\times S_j$$

where $C_j$ denotes the common coefficient of the $j$-th dimension data; $n_j$ denotes the number of data systems that contain it; $N$ denotes the total number of data systems; $m_j$ denotes the number of times users retrieved it within the preset historical time period; $M_j$ denotes the total number of times users have retrieved it; $S_j$ denotes its scaling factor; $P_j = n_j/N$ denotes its public frequency; and $R_j = m_j/M_j$ denotes its retrieval frequency.
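Because the common coefficient is stated to be directly proportional to each of its three inputs, a plain product is the simplest form consistent with that description; the sketch below assumes that form. The `DimensionStats` container is a hypothetical convenience, and treating $M_j = 0$ as a retrieval frequency of 0 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DimensionStats:
    systems_containing: int  # n_j: data systems that contain the dimension
    total_systems: int       # N: all data systems
    window_retrievals: int   # m_j: retrievals inside the preset historical period
    total_retrievals: int    # M_j: all recorded retrievals
    scaling: float           # S_j: scaling factor


def common_coefficient(s: DimensionStats) -> float:
    """C_j = (n_j / N) * (m_j / M_j) * S_j, i.e. public freq * retrieval freq * scaling."""
    public_freq = s.systems_containing / s.total_systems
    retrieval_freq = (s.window_retrievals / s.total_retrievals
                      if s.total_retrievals else 0.0)  # assumption for M_j = 0
    return public_freq * retrieval_freq * s.scaling


print(common_coefficient(DimensionStats(3, 4, 50, 100, 0.5)))  # 0.1875
```

Doubling any one input doubles the coefficient, which is exactly the proportionality the method requires.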

The scaling factor of the dimension data is obtained as follows:

Obtain the response time of each data system of the smart campus platform when it retrieves any dimension data, and record it as that data system's retrieval time for the dimension data; then obtain the scaling factor of the dimension data from the set relationships between the dimension data of the different data systems of the smart campus platform and from the differences between the times different data systems take to retrieve the same dimension data.

As an embodiment, the scaling factor of the dimension data is calculated as follows:

It should be noted that, because data in different data systems are organized under different design architectures, the system response time differs when users retrieve the same dimension data from different data systems. In this embodiment, data of the same dimension that are frequently retrieved across different data systems are treated as a kind of public data. Therefore, the closer the retrieval times of the same dimension data across data systems, the more likely that dimension data is public data and the larger its scaling factor, which in turn increases its common coefficient. In addition, the more identical dimension data the different data systems of the smart campus platform share, the more widely the dimension data are shared, and the larger the corresponding common coefficient.

Step 2.2: Divide the dimension data, according to the magnitude of their common coefficients, into public dimension data, private dimension data and pending dimension data.

The specific division method is as follows: a first threshold $T_1$ and a second threshold $T_2$ are preset. Dimension data whose common coefficient is greater than or equal to $T_1$ are recorded as public dimension data; dimension data whose common coefficient is less than $T_1$ and greater than or equal to $T_2$ are recorded as pending dimension data; and dimension data whose common coefficient is less than $T_2$ are recorded as private dimension data.
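The three-way split can be sketched as below. The threshold values are hypothetical placeholders, since the method presets them empirically. A coefficient exactly equal to the second threshold is treated as pending here, so that every coefficient falls into exactly one class.

```python
from enum import Enum


class DimType(Enum):
    PUBLIC = "public"
    PENDING = "pending"
    PRIVATE = "private"


def classify(coefficient: float, t1: float, t2: float) -> DimType:
    """Three-way split by the first threshold t1 and the second threshold t2 (t2 < t1)."""
    if not t2 < t1:
        raise ValueError("expected the second threshold to be below the first")
    if coefficient >= t1:
        return DimType.PUBLIC
    if coefficient >= t2:  # a tie with t2 lands here, so the classes do not overlap
        return DimType.PENDING
    return DimType.PRIVATE


# Hypothetical thresholds; the method presets T1 and T2 empirically.
T1, T2 = 0.6, 0.2
print([classify(c, T1, T2).value for c in (0.75, 0.6, 0.2, 0.05)])
# ['public', 'public', 'pending', 'private']
```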

It should be noted that the first threshold $T_1$ and the second threshold $T_2$ are preset empirically and can be adjusted to actual conditions; this embodiment does not specifically limit them.

It should be noted that different dimension data are called by different data systems, and the way they are called differs. This embodiment therefore classifies the dimension data by analyzing how they are called across all data systems. In the result, public dimension data are called by many data systems, whereas private dimension data are called by only one or a few subsystems. This classification allows the dimension data to be stored by type according to how they are retrieved in the data systems, improving the storage of data of different dimensions. Specifically, basic information and dimension data shared by all data systems are stored in the master database, which allows them to be processed uniformly and avoids data loss caused by decentralized storage; private dimension data, which exist only in individual data systems, can be stored on the servers of their respective data systems, avoiding the resource waste of redundant storage.

At this point, the public dimension data, private dimension data and pending dimension data have been obtained by the above method.

Step S003: Obtain the semi-public possibility of each pending dimension data according to how it has been called in the past, and divide the pending dimension data into semi-public dimension data and multi-private dimension data according to the magnitude of the semi-public possibility.

It should be noted that a smart campus platform usually contains some data systems that are rarely used but relatively important. Under distributed storage, the dimension data in these systems may be stale when users retrieve them, precisely because the systems are rarely used. To decide whether such dimension data should be stored together with the public dimension data and also backed up on the server of the data system they belong to, this embodiment screens them by how they have been called in the past: the more frequently a dimension data is retrieved, the more important it is, regardless of whether its data system is frequently used or important. For example, English scores may need to be verified when registering for certain competitions, even though the CET-4 and CET-6 registration system is used only once or twice a year. In addition, the more strongly a pending dimension data is associated with public dimension data, that is, the more consistently it is retrieved whenever the public data are, the more basic and important it is; such dimension data must therefore be stored together with the public dimension data and backed up in their respective data systems.

Specifically, the semi-public possibility of each pending dimension data is first obtained according to how it has been called in the past.

As a preferred embodiment, the semi-public possibility of the pending dimension data is obtained as follows:

As an optional embodiment, the semi-public possibility of the pending dimension data is calculated as follows:

The historical average call frequency of the pending dimension data is obtained as follows. A historical time period and a historical time interval are preset, where the interval consists of several historical time periods. The number of times the pending dimension data is called by all data systems within one historical time period is recorded as its historical call parameter for that period. The mean of its historical call parameters over all historical time periods is recorded as its historical average call frequency.

It should be noted that, based on experience, the historical time period is preset to one month and the historical time interval to one year. Both can be adjusted as circumstances require and are not specifically limited in this embodiment.

The historical call reference value is the mean of the historical average call frequencies of all pending dimension data.
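The following is a minimal Python sketch of the historical average call frequency and the historical call reference value. It uses the one-month periods and one-year interval given above. It assumes a hypothetical call log in which each record is a `(dimension_id, period_index)` pair for one call by any data system. The log format, function names and dimension names are illustrative only.

```python
def historical_average_call_frequency(call_log, pending_ids, num_periods=12):
    """Mean number of calls per historical time period for each pending dimension.

    call_log: iterable of (dimension_id, period_index) pairs, one per call made by
    any data system; period_index lies in range(num_periods) (e.g. 12 months).
    """
    # Historical call parameter per period, zero-initialised so that periods
    # without any call still count toward the average.
    per_period = {dim: [0] * num_periods for dim in pending_ids}
    for dim, period in call_log:
        if dim in per_period:
            per_period[dim][period] += 1
    return {dim: sum(counts) / num_periods for dim, counts in per_period.items()}


def historical_call_reference_value(avg_freq):
    """Mean of the historical average call frequencies of all pending dimensions."""
    return sum(avg_freq.values()) / len(avg_freq)


# Usage with made-up dimension names (hypothetical data).
log = [("cet_score", 0), ("cet_score", 0), ("cet_score", 6), ("dorm_no", 3)]
avg = historical_average_call_frequency(log, {"cet_score", "dorm_no"})
ref = historical_call_reference_value(avg)
# avg["cet_score"] is 3/12 = 0.25, avg["dorm_no"] is 1/12, ref is their mean, 1/6
```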

The synchronization parameter of the pending dimension data is obtained as follows. The number of times the pending dimension data and a given public dimension data item are retrieved simultaneously by all data systems within the historical time interval is recorded as the synchronization factor between the two. The mean of the synchronization factors between the pending dimension data and all public dimension data is recorded as the synchronization parameter of the pending dimension data.

The total historical call frequency of the pending dimension data is the total number of times it is called by all data systems within the historical time interval.
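The sketch below covers the synchronization parameter and the total historical call frequency. It assumes, as a hypothetical representation, that "retrieved simultaneously" means appearing in the same retrieval request. Each request within the historical time interval is therefore modelled as a set of dimension ids fetched together by some data system.

```python
def synchronization_parameter(retrievals, pending_id, public_ids):
    """Mean synchronization factor of a pending dimension over all public dimensions.

    retrievals: list of sets, each holding the dimension ids fetched together in
    one retrieval by some data system within the historical time interval.
    """
    # Synchronization factor: co-retrieval count with each public dimension.
    factors = [
        sum(1 for fetched in retrievals if pending_id in fetched and pub in fetched)
        for pub in public_ids
    ]
    return sum(factors) / len(factors)


def total_historical_call_frequency(retrievals, pending_id):
    """Number of retrievals in the interval that include the pending dimension."""
    return sum(1 for fetched in retrievals if pending_id in fetched)


# Usage with hypothetical retrieval records.
retrievals = [
    {"cet_score", "student_id", "name"},
    {"cet_score", "student_id"},
    {"dorm_no"},
    {"student_id", "name"},
]
sync = synchronization_parameter(retrievals, "cet_score", ["student_id", "name"])
total = total_historical_call_frequency(retrievals, "cet_score")
# sync is 1.5 (co-retrieved twice with student_id, once with name); total is 2
```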

It should be noted that the frequency term compares the historical average call frequency of the pending dimension data with the historical call reference value. It describes how often the pending dimension data has been retrieved in the past. The more often it is retrieved relative to the retrieval of all pending dimension data, the more commonly it is used and the more it leans toward public data. The term built from the synchronization parameter describes how the retrieval of the pending dimension data is associated with that of the public dimension data. If more public dimension data is retrieved at the same time whenever the pending dimension data is called, the pending dimension data is more strongly associated with the public dimension data. Hence, the more commonly a pending dimension data item is used and the more strongly it is associated with the public dimension data, the more likely it is to serve as semi-public dimension data.
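How the two terms are combined inside the softsign normalization is not spelled out here. The sketch below therefore assumes, purely for illustration, one possible combination. In this combination, the frequency ratio (average call frequency over the reference value) is multiplied by a co-retrieval ratio (synchronization parameter over total historical call frequency). This combination and the function names are hypothetical. Only softsign itself, x / (1 + |x|), is a standard function.

```python
def softsign(x):
    """Softsign normalisation: x / (1 + |x|), mapping any real number into (-1, 1)."""
    return x / (1 + abs(x))


def semi_public_possibility(avg_freq, ref_value, sync_param, total_calls):
    """HYPOTHETICAL semi-public possibility of one pending dimension.

    frequency term:    avg_freq / ref_value     (use relative to all pending data)
    co-retrieval term: sync_param / total_calls (association with public data)
    Multiplying the two terms is an assumption made for illustration only.
    """
    if ref_value == 0 or total_calls == 0:
        return 0.0  # no call history: treat as not semi-public (assumption)
    return softsign((avg_freq / ref_value) * (sync_param / total_calls))


p = semi_public_possibility(avg_freq=0.25, ref_value=0.25, sync_param=1.5, total_calls=2)
# product = 1.0 * 0.75 = 0.75, so p = 0.75 / 1.75, roughly 0.4286
```

With non-negative inputs the result lies in [0, 1), and under this assumed product it reaches a threshold of 0.5 exactly when the product is at least 1.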

Then, the pending dimension data is divided into semi-public dimension data and multi-private dimension data according to the magnitude of the semi-public possibility.

As an embodiment, the semi-public and multi-private dimension data are obtained as follows. A semi-public threshold is preset. Pending dimension data whose semi-public possibility is greater than or equal to the threshold is recorded as semi-public dimension data. Pending dimension data whose semi-public possibility is below the threshold is recorded as multi-private dimension data.

It should be noted that in this embodiment the semi-public threshold is preset to 0.5 based on experience. It can be adjusted to actual conditions and is not specifically limited in this embodiment.
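The threshold rule can be sketched as follows, using the empirical default of 0.5. The input is a hypothetical mapping from dimension id to its already-computed semi-public possibility.

```python
def split_pending(possibilities, threshold=0.5):
    """Split pending dimensions into semi-public and multi-private sets.

    possibilities: dict mapping dimension id -> semi-public possibility.
    """
    # ">=" places values exactly at the threshold in the semi-public set.
    semi_public = {dim for dim, p in possibilities.items() if p >= threshold}
    multi_private = set(possibilities) - semi_public
    return semi_public, multi_private


semi, multi = split_pending({"cet_score": 0.5, "dorm_no": 0.2, "major": 0.8})
# semi == {"cet_score", "major"}, multi == {"dorm_no"}
```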

It should be noted that this embodiment further divides the pending dimension data. If the semi-public possibility of a pending dimension data item is greater than or equal to the semi-public threshold, the item is more likely to be frequently used and is therefore treated as semi-public dimension data. Otherwise, it is considered infrequently used and is treated as multi-private dimension data. The purpose of this finer division concerns subsequent storage. The relatively frequently used semi-public dimension data can be stored in the main database and backed up in the corresponding data system, which improves storage security. The multi-private dimension data, by contrast, is used relatively infrequently but is called by many data systems. It should therefore be stored in multiple copies to ensure that it is not lost.

At this point, the semi-public dimension data and the multi-private dimension data have been obtained by the above method.

Step S004: store the public dimension data, private dimension data, semi-public dimension data and multi-private dimension data separately.

Specifically, several servers are first set up. The server with the largest storage space serves as the main server, and each data system is assigned its own server.

Secondly, each private dimension data item is stored on the server of the data system it belongs to. Each multi-private dimension data item is stored on the servers of all data systems that have called it.

Finally, the public dimension data is stored on the main server. The semi-public dimension data is stored both on the main server and on the server of the data system it belongs to.
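The placement rules of Step S004 can be condensed into a lookup, sketched below. The sketch assumes, hypothetically, that each data system's server is named after the system, and that the main server is identified by name. The category labels and function names are illustrative.

```python
def pick_main_server(capacities):
    """Return the server with the largest storage space (capacities: name -> size)."""
    return max(capacities, key=capacities.get)


def target_servers(category, main_server, owner_system=None, calling_systems=()):
    """Servers that should hold one dimension data item under Step S004."""
    if category == "public":
        return {main_server}                  # central copy only
    if category == "semi_public":
        return {main_server, owner_system}    # central copy plus backup
    if category == "private":
        return {owner_system}                 # only the owning system's server
    if category == "multi_private":
        return set(calling_systems)           # every system that has called it
    raise ValueError(f"unknown category: {category!r}")


main = pick_main_server({"srv_main": 4000, "academic": 1000, "library": 500})
semi_servers = target_servers("semi_public", main, owner_system="academic")
multi_servers = target_servers("multi_private", main, calling_systems=["library", "cet"])
# main == "srv_main"; semi_servers == {"srv_main", "academic"};
# multi_servers == {"library", "cet"}
```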

It should be noted that private dimension data is called only within the data system it belongs to. For ease of storage and retrieval, this embodiment therefore stores each private dimension data item on the server of its own data system. Multi-private dimension data is not frequently used, but it is called by several data systems and must be quickly retrievable and storable whenever any of them calls it. This embodiment therefore stores it on the server of every data system that has called it. Public dimension data is called frequently; as the basic data of students and faculty, it is called by many data systems. Distributing it would require frequent synchronization, so this embodiment stores it on the main server. Semi-public dimension data is also called relatively frequently. To protect it against storage errors and data loss, this embodiment stores it both on the main server and on the server of the data system it belongs to.

This completes the embodiment.

It should be noted that the exp(-x) model used in this embodiment serves only to represent a negative correlation and to constrain the model output to the interval (0, 1]. In a specific implementation it may be replaced by any other model serving the same purpose. This embodiment uses the exp(-x) model merely as an example and does not specifically limit it, where x denotes the input of the model.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc. made within the principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for fast storage of multidimensional data based on big data, characterized in that the method comprises the following steps:

Obtaining multiple dimensional data contained in multiple data systems;

Based on the number of dimension data items contained in different data systems, the retrieval time of any dimension data, and the number of times the dimension data is retrieved by users, the public coefficient of each dimension data is obtained. According to the magnitude of its public coefficient, the dimension data is divided into different types, namely public dimension data, private dimension data and pending dimension data;

According to how the pending dimension data has been called in the past, its semi-public possibility is obtained. The pending dimension data is then divided into semi-public dimension data and multi-private dimension data according to the magnitude of the semi-public possibility;

The public dimension data, private dimension data, semi-public dimension data and multi-private dimension data are stored separately;

The semi-public possibility of the pending dimension data is obtained as follows. Based on how the pending dimension data has been called in the past, four quantities are obtained: the historical call reference value, the historical average call frequency of the pending dimension data, its synchronization parameter, and the total historical call frequency with which all data systems call it;

The historical call reference value is the mean of the historical average call frequencies of all pending dimension data;

The semi-public possibility of the pending dimension data is calculated as follows:

where the symbols denote, in order: the semi-public possibility of the i-th pending dimension data; the historical average call frequency of the i-th pending dimension data; the historical call reference value; the synchronization parameter of the i-th pending dimension data; the total historical call frequency with which all data systems call the i-th pending dimension data; and the softsign normalization function.

2. The method for fast storage of multidimensional data based on big data according to claim 1, wherein the public coefficient of the dimension data is obtained from the number of dimension data items contained in different data systems, the retrieval time of any dimension data, and the number of times the dimension data is retrieved by users. This step specifically comprises:

obtaining the public frequency and the retrieval frequency of the dimension data, respectively, from the number of data systems containing that dimension data and the number of times it is retrieved by users within a preset time;

obtaining the scaling factor of each dimension data from the number of identical dimension data items shared by different data systems and the retrieval time of any dimension data;

obtaining the public coefficient of the dimension data from its public frequency, retrieval frequency and scaling factor, each of which is proportional to the public coefficient.

3. The method for fast storage of multidimensional data based on big data according to claim 2, wherein the public frequency and the retrieval frequency of the dimension data are specifically obtained as follows:

For any dimension data, obtain the number of data systems containing the dimension data, the number of all data systems, the number of times the dimension data was retrieved by users within a preset historical time period, and the total number of times the dimension data was retrieved by users;

The ratio of the number of data systems containing the dimension data to the total number of data systems is recorded as the public frequency of the dimension data;

The ratio of the number of times the dimension data is retrieved by the user within a preset historical time period to the total number of times the dimension data is retrieved by the user is recorded as the retrieval frequency of the dimension data.
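The two ratios of claim 3 can be sketched directly. The input shapes are hypothetical: a mapping from data system name to the set of dimension ids it contains, and plain retrieval counts. Returning 0.0 when a dimension has never been retrieved is an added assumption, since the claim does not cover a zero total.

```python
def public_frequency(dim, systems):
    """Share of data systems that contain the dimension (systems: name -> set of dims)."""
    return sum(1 for dims in systems.values() if dim in dims) / len(systems)


def retrieval_frequency(recent_retrievals, total_retrievals):
    """Retrievals in the preset historical period divided by all retrievals."""
    # Zero-total guard is an assumption, not part of the claim.
    return recent_retrievals / total_retrievals if total_retrievals else 0.0


systems = {
    "academic": {"student_id", "name", "gpa"},
    "library": {"student_id", "name", "loans"},
    "cet": {"student_id", "cet_score"},
    "dorm": {"dorm_no"},
}
# public_frequency("student_id", systems) == 0.75 (3 of 4 systems)
# retrieval_frequency(30, 120) == 0.25
```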

4. The method for fast storage of multidimensional data based on big data according to claim 2, wherein the scaling factor of each dimension data is obtained from the number of identical dimension data items shared by different data systems and the retrieval time of any dimension data. This step specifically comprises:

The scaling factor of the dimension data is calculated as follows:

where the symbols denote, in order: the scaling factor of the i-th dimension data; the retrieval time of the i-th dimension data in the first of the two data systems being compared; the retrieval time of the i-th dimension data in the second of the two data systems; the number of elements in the intersection of the dimension data of the two data systems; the number of elements in the union of the dimension data of the two data systems; the mean function; and the exponential function with the natural constant as its base.
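Of the scaling-factor ingredients, only the set-overlap ratio (intersection size over union size of two systems' dimension sets) is fully determined by the wording above. How it combines with the retrieval times, the mean and the exponential is not reproduced here, so this sketch covers that ratio alone. The helper name is hypothetical.

```python
def dimension_overlap(dims_a, dims_b):
    """|intersection| / |union| of the dimension sets of two data systems."""
    union = dims_a | dims_b
    # Two empty systems share nothing; returning 0.0 avoids division by zero.
    return len(dims_a & dims_b) / len(union) if union else 0.0


overlap = dimension_overlap({"student_id", "name", "gpa"}, {"student_id", "name", "loans"})
# 2 shared dimensions out of 4 distinct ones: overlap == 0.5
```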

5. The method for fast storage of multidimensional data based on big data according to claim 1, wherein the dimension data is divided into different types according to the magnitude of its public coefficient, namely public dimension data, private dimension data and pending dimension data. This step specifically comprises:

presetting a first threshold and a second threshold. Dimension data whose public coefficient is greater than or equal to the first threshold is recorded as public dimension data. Dimension data whose public coefficient is less than the first threshold and greater than or equal to the second threshold is recorded as pending dimension data. Dimension data whose public coefficient is less than the second threshold is recorded as private dimension data.
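A minimal sketch of the two-threshold split follows. The threshold values are not stated in the text, so the 0.7 and 0.3 used in the usage lines are hypothetical. A coefficient exactly equal to the second threshold is treated as pending here, so the three bands do not overlap.

```python
def classify_by_public_coefficient(coef, first_threshold, second_threshold):
    """Map a public coefficient to "public", "pending" or "private"."""
    if not second_threshold < first_threshold:
        raise ValueError("the second threshold must be below the first")
    if coef >= first_threshold:
        return "public"
    if coef >= second_threshold:
        return "pending"
    return "private"


# Hypothetical thresholds: first = 0.7, second = 0.3.
labels = [classify_by_public_coefficient(c, 0.7, 0.3) for c in (0.8, 0.7, 0.3, 0.29)]
# labels == ["public", "public", "pending", "private"]
```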

6. The method for fast multi-dimensional data storage based on big data according to claim 1, wherein the specific method for obtaining the historical average call frequency is:

A historical time period and a historical time interval are preset. The historical time interval includes several historical time periods. The number of times the pending dimension data is called by all data systems within a historical time period is recorded as the historical call parameter of the pending dimension data in the historical time period. The average historical call parameter of the pending dimension data in all historical time periods is recorded as the historical average call frequency of the pending dimension data.

7. The method for fast storage of multidimensional data based on big data according to claim 6, wherein the synchronization parameter is specifically obtained as follows:

The number of times that the pending dimension data and any public dimension data are simultaneously retrieved by all data systems within the historical time interval is recorded as the synchronization factor between the pending dimension data and the public dimension data, and the average of the synchronization factors between the pending dimension data and all public dimension data is recorded as the synchronization parameter of the pending dimension data.

8. The method for fast storage of multidimensional data based on big data according to claim 6, wherein the total historical call frequency with which all data systems call the pending dimension data is specifically obtained as follows:

The total number of times the pending dimension data is called by all data systems within the historical time interval is recorded as the total historical call frequency of the pending dimension data.

9. The method for fast storage of multidimensional data based on big data according to claim 1, wherein storing the public dimension data, private dimension data, semi-public dimension data and multi-private dimension data separately specifically comprises:

setting up several servers, using the server with the largest storage space as the main server, and assigning one server to each data system;

storing private dimension data on the server of the data system it belongs to, storing multi-private dimension data on the servers of all data systems that have called it, and storing public dimension data on the main server; and storing semi-public dimension data both on the main server and on the server of the data system it belongs to.