KR101757253B1

KR101757253B1 - Method and apparatus for managing multidimensional data

Info

Publication number: KR101757253B1
Application number: KR1020160174842A
Authority: KR
Inventors: 엄정호; 박경석; 이용; 문봉기; 김상철; 이준희; 김태훈
Original assignee: 한국과학기술정보연구원
Priority date: 2016-12-20
Filing date: 2016-12-20
Publication date: 2017-07-13
Anticipated expiration: 2036-12-20
Also published as: WO2018117504A1

Abstract

The present invention proposes an apparatus and a method for managing multidimensional data that perform optimized distributed management of multidimensional data based on a hash technique. Unlike conventional approaches, this minimizes the loading time and cost that are unnecessarily consumed when loading data, thereby enabling high-speed data loading and realizing a new scheme for the distributed management of multidimensional data.

Description

METHOD AND APPARATUS FOR MANAGING MULTIDIMENSIONAL DATA

The present invention relates to a data management technique based on a multi-dimensional database and, more particularly, to a new scheme for the distributed management of multidimensional data that can perform optimized distributed management of multidimensional data based on a hash technique.

A multidimensional database is a database that manages data having a large number of attribute items in order to model large volumes of multidimensional data, such as scientific and medical information, and to support query processing and computation.

Such a multidimensional database can be used to analyze large volumes of multidimensional data and to store the results. However, because the volume of experimental data and of data used in scientific research has recently surged with advances in observation equipment and related technologies, loading such multidimensional data takes a long time.

That is, because each record of multidimensional data has several features, loading it into a database requires converting it into the format of the target multidimensional database, distributing it, sorting it, and so on.

To describe the conventional storage process in more detail: a multidimensional database manages data using a cluster and generally handles multidimensional data in a data file format such as CSV. If the multidimensional data is supplied as HDF5 data, it can be loaded only after a format conversion. The CSV-form multidimensional data is then distributed and placed on the corresponding instance DB servers; the distribution scheme used here is mainly round-robin.

Once the data has been distributed, each instance DB server of the multidimensional database sorts the distributed data to form its chunks. In this process, a server may receive multidimensional data that it cannot manage. In that case, a redistribution step transfers the data in question to another instance DB server, after which the multidimensional data is sorted, converted into the database format, and loaded onto the corresponding instance DB server.

As the above shows, the conventional storage process can distribute to an instance DB server multidimensional data that the server cannot readily manage, so an otherwise unnecessary relocation step must be performed to relocate that data. The resulting network I/O causes a network bottleneck.

Moreover, relocating the distributed multidimensional data requires communication between instance DB servers, so overhead inevitably increases, and loading cost and time increase accordingly.
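
The cost of round-robin placement can be illustrated with a small sketch. It is not taken from the patent: the 6x6 grid, the chunk size of 3, the four servers, and the `chunk_owner` rule are all assumptions chosen for illustration. The sketch counts how many cells a round-robin pass places on a server other than the one that owns the cell's chunk, that is, how many cells would later have to be moved over the network.

```python
from itertools import product

N_SERVERS = 4        # assumed number of instance DB servers
CHUNK_INTERVAL = 3   # assumed chunk size per dimension
GRID = 6             # assumed 6x6 input grid


def chunk_owner(x: int, y: int) -> int:
    """Hypothetical ownership rule: one server per 3x3 chunk, numbered row-major."""
    return (x // CHUNK_INTERVAL) * (GRID // CHUNK_INTERVAL) + (y // CHUNK_INTERVAL)


def round_robin_misplaced(cells) -> int:
    """Count cells that round-robin sends to a server that does not own their chunk."""
    misplaced = 0
    for index, (x, y) in enumerate(cells):
        if index % N_SERVERS != chunk_owner(x, y):
            misplaced += 1
    return misplaced


cells = list(product(range(GRID), repeat=2))  # cells arrive in file order
moved = round_robin_misplaced(cells)
print(f"{moved} of {len(cells)} cells must be redistributed")
```

Because round-robin looks only at arrival order and never at coordinates, most cells in this toy setup end up on the wrong server, which is the redistribution traffic the invention aims to avoid.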

The present invention has been made in view of the above circumstances. Its object is to realize a new scheme for the distributed management of multidimensional data in which optimized distributed management is performed based on a hash technique, so that, unlike conventional approaches, the loading cost and time unnecessarily consumed in loading data are minimized and high-speed data loading becomes possible.

To achieve the above object, a multidimensional data management apparatus according to the present invention comprises: a distribution management unit that distributes at least one item of specific dimension data included in multidimensional data to at least one chunk based on specific hash information for each item of specific dimension data; a sort management unit that sorts the specific dimension data distributed to each chunk based on the dimensions of that data; and a storage management unit that converts the sorted specific dimension data into a preset specific format and stores it on the instance DB server associated with the corresponding chunk.

The distribution management unit calculates a final hash value corresponding to the specific hash information, based on the result of calculating a hash value for each dimension of the specific dimension data using a specific hash function.

The specific dimension data includes, for each dimension, a position value corresponding to its position in that dimension. The specific hash function uses as parameters at least one of: the hash value of a second dimension, whose hash value was calculated before that of a first dimension for which the hash value is currently being calculated; a prime number; the position value in the first dimension; and the chunk range of the first dimension.

The specific hash function corresponds to the exclusive OR (XOR) of a first result value, obtained by multiplying the hash value of the second dimension by the prime number, and a second result value, obtained by dividing the difference between the position value in the first dimension and the minimum position value of the first dimension by the chunk range of the first dimension.
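
A single step of the hash function described above can be sketched as follows. The function and parameter names are illustrative and do not come from the patent. The use of integer (floor) division is also an assumption: the text says only that the difference is divided by the chunk range, but a bitwise XOR needs integer operands.

```python
def hash_step(prev_hash: int, prime: int, position: int,
              min_position: int, chunk_range: int) -> int:
    """One per-dimension step: (previous hash * prime) XOR chunk index."""
    first = prev_hash * prime                          # first result value
    second = (position - min_position) // chunk_range  # second result value (floor division assumed)
    return first ^ second


# Assumed example values: previous hash 1, prime 991, position 4 in a
# dimension that starts at 0 and is cut into chunks of width 3.
print(hash_step(1, 991, 4, 0, 3))
```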

When the hash value of the first dimension has been calculated from the hash value of the second dimension using the specific hash function, the distribution management unit calculates the final hash value by performing a modular operation on the hash value of the first dimension and the total number of instance DB servers that manage the multidimensional data.

The final hash value matches a chunk number that identifies a chunk, and the distribution management unit distributes the specific dimension data associated with the final hash value to the chunk whose chunk number matches that final hash value.
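
The matching of final hash values to chunk numbers amounts to grouping records by hash. The sketch below shows only that grouping step; `distribute` and `toy_hash` are hypothetical names, and `toy_hash` is a stand-in used purely to make the example runnable, not the hash function of the invention.

```python
from collections import defaultdict
from typing import Callable, Iterable


def distribute(cells: Iterable[tuple], final_hash: Callable[[tuple], int]) -> dict:
    """Group cells into chunks keyed by the chunk number equal to their final hash."""
    chunks = defaultdict(list)
    for cell in cells:
        chunks[final_hash(cell)].append(cell)
    return dict(chunks)


# Stand-in hash for illustration only: NOT the patent's hash function.
toy_hash = lambda cell: sum(cell) % 2
print(distribute([(0, 0), (0, 1), (1, 1), (2, 1)], toy_hash))
```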

The sort management unit determines a per-dimension sort order for the specific dimension data distributed to each chunk based on a preset dimension management scheme, and sorts the position values of the dimension that comes first in the sort order while also sorting the position values of the dimension that comes next.
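
One way to read this sorting rule is as a multi-key sort in which the dimension management scheme fixes the priority of the dimensions. The sketch below assumes that reading; `sort_chunk` and `dim_order` are hypothetical names, and cells are represented simply as coordinate tuples.

```python
from typing import Sequence


def sort_chunk(cells: Sequence[tuple], dim_order: Sequence[int]) -> list:
    """Sort cells by position values, comparing dimensions in the order given by dim_order."""
    return sorted(cells, key=lambda cell: tuple(cell[d] for d in dim_order))


chunk = [(2, 1), (0, 2), (1, 0), (0, 0)]
print(sort_chunk(chunk, dim_order=(0, 1)))  # first dimension has priority
print(sort_chunk(chunk, dim_order=(1, 0)))  # second dimension has priority
```

Swapping `dim_order` switches between what would be row-major and column-major layouts in two dimensions.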

The storage management unit converts the sorted specific dimension data based on a preset specific format to generate specific dimension converted data, and stores the specific dimension converted data on the instance DB server associated with the corresponding chunk.

To achieve the above object, a method of operating a multidimensional data management apparatus according to the present invention comprises: a distribution management step of distributing at least one item of specific dimension data included in multidimensional data to at least one chunk based on specific hash information for each item of specific dimension data; a sort management step of sorting the specific dimension data distributed to each chunk based on the dimensions of that data; and a storage management step of converting the sorted specific dimension data into a preset specific format and storing it on the instance DB server associated with the corresponding chunk.

The distribution management step calculates a final hash value corresponding to the specific hash information, based on the result of calculating a hash value for each dimension of the specific dimension data using a specific hash function.

When the hash value of the first dimension has been calculated from the hash value of the second dimension using the specific hash function, the distribution management step calculates the final hash value by performing a modular operation on the hash value of the first dimension and the total number of instance DB servers that manage the multidimensional data.

The final hash value matches a chunk number that identifies a chunk, and the distribution management step distributes the specific dimension data associated with the final hash value to the chunk whose chunk number matches that final hash value.

The sort management step determines a per-dimension sort order for the specific dimension data distributed to each chunk based on a preset dimension management scheme, and sorts the position values of the dimension that comes first in the sort order while also sorting the position values of the dimension that comes next.

The storage management step converts the sorted specific dimension data based on a preset specific format to generate specific dimension converted data, and stores the specific dimension converted data on the instance DB server associated with the corresponding chunk.

Thus, the apparatus and method for managing multidimensional data of the present invention perform optimized distributed management of multidimensional data based on a hash technique and can therefore provide a new scheme for the distributed management of multidimensional data that, unlike conventional approaches, minimizes the loading time and cost unnecessarily consumed in loading data and enables high-speed data loading.

FIG. 1 is a diagram illustrating a communication environment to which a multidimensional data management apparatus according to an embodiment of the present invention is applied.
FIG. 2 is a schematic configuration diagram of a multidimensional database according to an embodiment of the present invention.
FIG. 3 is a schematic configuration diagram of a multidimensional data management apparatus according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of multidimensional data to be distributed based on a hash technique according to an embodiment of the present invention.
FIGS. 5 and 6 are diagrams illustrating an example of calculating specific hash information based on a hash technique according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating an example in which specific dimension data distributed to a chunk is stored according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating another example in which specific dimension data distributed to a chunk is stored according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating the flow of providing a hash-based data distribution management service according to an embodiment of the present invention.

It should be noted that the technical terms used in this specification are used only to describe particular embodiments and are not intended to limit the present invention. Unless specifically defined otherwise in this specification, technical terms used herein should be interpreted as having the meaning generally understood by a person of ordinary skill in the art to which the present invention belongs, and should not be interpreted in an excessively broad or excessively narrow sense. Where a technical term used herein is an incorrect term that fails to express the idea of the present invention accurately, it should be replaced by, and understood as, a technical term that a person skilled in the art can understand correctly. General terms used in the present invention should be interpreted as defined in dictionaries or according to context, and should not be interpreted in an excessively narrow sense.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. Identical or similar components are given the same reference numerals regardless of the figure in which they appear, and redundant descriptions of them are omitted. In describing the present invention, detailed descriptions of related known art are omitted where they are judged likely to obscure the gist of the invention. It should also be noted that the accompanying drawings are intended only to make the idea of the present invention easy to understand, and the idea of the present invention should not be construed as limited by the drawings. The idea of the present invention should be construed as extending to all modifications, equivalents, and substitutes beyond the accompanying drawings.

An embodiment of the present invention is described below with reference to the accompanying drawings.

First, with reference to FIG. 1, the communication environment to which an apparatus for performing data management based on a multi-dimensional database (hereinafter, the multidimensional data management apparatus) according to the present invention is applied will be described.

As shown in FIG. 1, the communication environment to which the present invention is applied may include a multidimensional data management apparatus 100, a user terminal 200, and a multi-dimensional database 300.

The multidimensional data management apparatus 100 is an apparatus that works with the multidimensional database 300 so that optimized distributed management of multidimensional data is performed. In particular, it provides the service that the present invention aims to provide: a service that performs optimized distributed management of multidimensional data based on a hash technique (hereinafter, the hash-based data distribution management service), which, unlike conventional approaches, minimizes the loading time and cost unnecessarily consumed in loading multidimensional data and thereby enables high-speed data loading.

The multidimensional data management apparatus 100 may be implemented within the multidimensional database 300 or as a separate apparatus. In the embodiments of the present invention, the multidimensional data management apparatus 100 is described as existing independently, separate from the multidimensional database 300.

That is, when multidimensional data is received from the user terminal 200, the multidimensional data management apparatus 100 works with the multidimensional database 300 to execute an algorithm for the distributed management of multidimensional data (hereinafter, the distribution management algorithm), so that the multidimensional data is distributed to and stored on the instance DB servers of the multidimensional database 300. It then generates a distribution management result and provides it to the user terminal 200.

The user terminal 200 may be a terminal that a user uses to receive the hash-based data distribution management service. It can transmit multidimensional data to the multidimensional data management apparatus 100 and request generation of a distribution management result.

Here, multidimensional data is the data that the user wants to store in the multidimensional database 300 and may encompass any data expressed in various formats (HDF5, HDF4, NetCDF, CSV, etc.). The present invention covers all data expressed as multidimensional data, regardless of data format.

The user terminal 200 may be a terminal of an institution or organization that wishes to receive the hash-based data distribution management service, or a terminal used by an individual.

The multidimensional database 300 is a database that manages data having a large number of attribute items in order to model large volumes of multidimensional data (e.g., scientific information, medical information) and to support query processing and computation.

When the multidimensional data management apparatus 100 requests execution of a distribution management procedure in accordance with the distribution management algorithm, so that the hash-based data distribution management service can be provided, the multidimensional database 300 actually carries out the operations corresponding to that procedure and thereby distributes and stores the multidimensional data.

In this regard, FIG. 2 shows an example of the multidimensional database 300 that operates in conjunction with the multidimensional data management apparatus 100 so that the hash-based data distribution management service can be provided.

As shown in FIG. 2, the multidimensional database 300 of the present invention is operated by at least one instance DB server (0, 1, ..., n). Each instance DB server (0, 1, ..., n) is assigned an array storage (AS0, AS1, ..., ASn) and can therefore store and manage the multidimensional data that is distributed as the distribution management procedure is executed.

Because the multidimensional data must be divided and managed in order to perform this distribution management procedure, chunk arrays (hereinafter, chunks) corresponding to the array storages (AS0, AS1, ..., ASn) are allocated and the work is parallelized.

That is, multidimensional data is distributed to the chunk of each instance DB server (0, 1, ..., n) while the distribution management procedure is being executed. The array storage (AS0, AS1, ..., ASn) corresponding to each chunk stores the multidimensional converted data that is finally produced once the distribution management procedure (distribution, sorting, conversion) has been completed for the multidimensional data distributed to that chunk.
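
The relationship between an instance DB server, its chunk, and its array storage can be modelled roughly as below. This is a hypothetical data model for illustration only: the class, its fields, and `finalize` are invented names, and the format conversion step is left out because the text does not specify the target format.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceDBServer:
    """Hypothetical model of one instance DB server (names are illustrative)."""
    server_id: int
    chunk: list = field(default_factory=list)          # cells distributed to this server's chunk
    array_storage: list = field(default_factory=list)  # data kept once the procedure has finished

    def finalize(self) -> None:
        """Sort the chunk and move the result into array storage (conversion omitted)."""
        self.array_storage = sorted(self.chunk)
        self.chunk = []


servers = [InstanceDBServer(i) for i in range(4)]  # assumed: four servers
servers[0].chunk.extend([(2, 1), (0, 0)])
servers[0].finalize()
print(servers[0].array_storage)
```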

Accordingly, by working with the multidimensional data management apparatus 100, the multidimensional database 300 actually carries out the operations corresponding to the distribution management procedure and can support parallel distributed computation on multidimensional data.

In the embodiments of the present invention, the distribution management procedure according to the distribution management algorithm is described as being performed by the multidimensional data management apparatus 100 and the multidimensional database 300 working together. The invention is not limited to this, however: where the functions of the multidimensional data management apparatus 100 are implemented within the multidimensional database 300, the multidimensional database 300 may perform the distribution management procedure and provide the service on its own.

The configuration of the multidimensional data management apparatus according to an embodiment of the present invention is described in detail below with reference to FIG. 3.

As shown in FIG. 3, the multidimensional data management apparatus 100 according to the present invention may include: a distribution management unit 110 that distributes at least one item of specific dimension data included in multidimensional data to at least one chunk based on specific hash information for each item of specific dimension data; a sort management unit 120 that sorts the specific dimension data distributed to each chunk based on the dimensions of that data; and a storage management unit 130 that converts the sorted specific dimension data into a preset specific format and stores it on the instance DB server associated with the corresponding chunk.

The multidimensional data management apparatus 100 according to the present invention may further include a storage unit 140 that stores all information generated, transmitted, and received in order to provide the hash-based data distribution management service (e.g., multidimensional data, hash functions, distribution management results) and provides it on request.

All or at least part of the configuration of the multidimensional data management apparatus 100, including the distribution management unit 110, sort management unit 120, storage management unit 130, and storage unit 140 described above, may be implemented as software modules executed by a processor, as hardware modules, or as a combination of software and hardware modules.

As a result, the multidimensional data management apparatus 100 according to an embodiment of the present invention performs optimized distributed management of multidimensional data based on a hash technique, and thus provides a new scheme for the distributed management of multidimensional data that, unlike conventional approaches, minimizes the loading time and cost unnecessarily consumed in loading data and enables high-speed data loading. Each component of the multidimensional data management apparatus 100 that serves this purpose is described in detail below.

The distribution management unit 110 distributes multidimensional data based on a hash technique.

More specifically, when the user selects multidimensional data to be stored in the multidimensional database 300, the distribution management unit 110 distributes at least one item of specific dimension data included in the multidimensional data to at least one chunk based on specific hash information for each item of specific dimension data.

That is, the distribution management unit 110 calculates the final hash value corresponding to the specific hash information based on the result of calculating a hash value for each dimension of the specific dimension data using the specific hash function, identifies from the final hash value the chunk to which the data is to be distributed, and distributes the specific dimension data to it.

Here, the specific hash function is used to find the hash value for each dimension of the specific dimension data and may be defined as in Equation 1.

[Equation 1]

$$H_i = (H_{i-1} \times \mathit{Prime}) \oplus \frac{V_i - \mathit{Min}_i}{\mathit{ChunkInterval}_i}$$

Here, H_i is the hash value for each dimension, and the initial value H₀ for all dimensions is initialized to "0". H_d is the final hash value that, when the number of dimensions is d, is finally obtained at dimension d from the hash values calculated for each dimension. Prime is a prime number used to find non-overlapping hash values, and V_i is the position data value corresponding to the position in the i-th dimension of the multidimensional data. Min_i is the minimum position data value in that dimension, ChunkInterval is the chunk range of the i-th dimension, and N is the number of instance DB servers of the multidimensional database.

⊕ denotes the bitwise XOR operation.
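
Unrolling the recurrence for a two-dimensional record may make the roles of these symbols clearer. The recurrence used here is the one described in the text: the previous hash value multiplied by Prime, XORed with the offset from the minimum position divided by the chunk range. The shorthand $c_i$ for the chunk index and the floor brackets are assumptions introduced for this illustration; the text says only that the difference is divided by the chunk range.

```latex
\begin{aligned}
c_i &= \left\lfloor \frac{V_i - \mathit{Min}_i}{\mathit{ChunkInterval}_i} \right\rfloor \\
H_0 &= 0 \\
H_1 &= (H_0 \times \mathit{Prime}) \oplus c_1 = c_1 \\
H_2 &= (H_1 \times \mathit{Prime}) \oplus c_2 = (c_1 \times \mathit{Prime}) \oplus c_2
\end{aligned}
```

Under this reading, two records whose positions fall within the same chunk range in every dimension share all $c_i$ and therefore obtain the same hash value.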

Once the hash value for each dimension of the specific dimension data has been calculated using Equation 1, Equation 2, a modular operation, is used to calculate the final hash value.

[Equation 2]

$$H_d \bmod N$$

Here, H_d is the final hash value that, when the number of dimensions is d, is finally obtained at dimension d from the hash values calculated for each dimension, and N is the number of instance DB servers of the multidimensional database.
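
Putting the per-dimension hash and the modular operation together, the whole calculation might be sketched as below. The function name `final_hash` and its parameter names are hypothetical, the example values are invented, and floor division is an assumption made so that the XOR operates on integers.

```python
from typing import Sequence


def final_hash(position: Sequence[int], minimum: Sequence[int],
               chunk_interval: Sequence[int], prime: int, n_servers: int) -> int:
    """Fold the per-dimension hash over all dimensions, then take the result modulo N."""
    h = 0  # H_0 is initialised to 0
    for v, v_min, interval in zip(position, minimum, chunk_interval):
        h = (h * prime) ^ ((v - v_min) // interval)  # floor division is an assumption
    return h % n_servers


# Assumed example: a 3-dimensional record, chunk width 10 in every dimension,
# prime 31, and 5 instance DB servers.
print(final_hash((25, 7, 13), (0, 0, 0), (10, 10, 10), prime=31, n_servers=5))
```

Because the result is always in the range 0 to N-1, it can be used directly as a chunk number when there is one chunk per instance DB server.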

이와 관련하여, 도 4에는 해시 기법을 기반으로 분배하려는 다차원 데이터의 일례가 도시되어 있다. 도 4에서는, 차원(Dimension)에 대한 정보만이 활용되므로 다차원 데이터에 대한 속성값을 별도로 표기하지 않았다.In this regard, FIG. 4 shows an example of multi-dimensional data to be distributed based on a hash technique. In FIG. 4, since only information on dimensions is utilized, attribute values for multidimensional data are not separately described.

As shown in FIG. 4, the following describes the process of distributing specific dimension data to at least one chunk based on the specific hash information of each piece of specific dimension data (D00-D55) included in the multidimensional data, for the case where the multidimensional data selected by the user is two-dimensional, the X axis is the first dimension, the Y axis is the second dimension, the chunk size (ChunkInterval) of each dimension is "3", the number of instance DB servers (N) is 4, the prime number is "991", and the minimum position data value of each dimension is (0,0).

The distribution management unit 110 first performs a process of finding the per-dimension hash values of the specific dimension data (D00-D55) included in the multidimensional data.

That is, as shown in FIG. 5, the distribution management unit 110 identifies the first parameters for calculating the hash value (H₁) of the specific dimension data (D00) in the first dimension (X axis): the initial value for all dimensions, H₀ = "0"; the position value corresponding to the position in the first dimension (X axis), V₁ = "0"; the minimum position data value in the first dimension (X axis), Min₁ = "0"; and the chunk size in the first dimension (X axis), ChunkInterval = "3".

Thereafter, the distribution management unit 110 applies the first parameters to Equation 1 to calculate the hash value (H₁) of the specific dimension data (D00) in the first dimension (X axis).

Once the hash value (H₁) has been calculated, the distribution management unit 110 identifies the second parameters for calculating the hash value (H₂) of the specific dimension data (D00) in the second dimension (Y axis): the hash value in the first dimension (X axis), H₁ = "0"; the prime number, "991"; the position value corresponding to the position in the second dimension (Y axis), V₂ = "0"; the minimum position data value in the second dimension (Y axis), Min₂ = "0"; and the chunk size in the second dimension (Y axis), ChunkInterval = "3".

Thereafter, the distribution management unit 110 applies the second parameters to Equation 1 to calculate the hash value (H₂) of the specific dimension data (D00) in the second dimension (Y axis).

In short, the second parameters include at least one of: the hash value (H₁) of the first dimension (X axis), which was calculated before the second dimension (Y axis) whose hash value (H₂) is now being calculated; the prime number; the position value (V₂) corresponding to the position in the second dimension (Y axis); the chunk range (ChunkInterval) of the second dimension (Y axis); and the minimum position data value (Min₂) in the second dimension (Y axis).

Accordingly, when the second parameters are applied to the specific hash function (H_i), the specific hash function (H₂) corresponds to the exclusive OR (XOR) of a first result and a second result. The first result is obtained by multiplying the hash value (H₁) of the specific dimension data (D00) in the first dimension (X axis) by the prime number. The second result is obtained by dividing the position value (V₂) corresponding to the position in the second dimension (Y axis) and the minimum position data value (Min₂) in the second dimension (Y axis) by the chunk range (ChunkInterval).

That is, of the two dimensions (X axis, Y axis) of the specific dimension data (D00), the hash value (H₂) of the second dimension (Y axis) now being calculated reflects the previously calculated hash value (H₁) of the first dimension (X axis). Therefore, even though only the second-dimension hash value (H₂) is supplied as the parameter of Equation 2, the modular operation described below, when calculating the final hash value of the specific dimension data (D00), every dimension of the specific dimension data, namely the X axis and the Y axis, is reflected in determining the chunk.
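The chaining described above can be sketched in a few lines of Python. This is an illustrative sketch, not the patented implementation: the function names `dimension_hashes` and `final_hash` are hypothetical, and the form of Equation 1 used here (offset by the minimum, then floor division) is an assumption read from the parameter descriptions.

```python
def dimension_hashes(position, minimums, chunk_intervals, prime=991):
    """Return [H_1, ..., H_d] for one cell, feeding each hash into the next."""
    hashes = []
    h = 0  # H_0 is initialized to 0 for all dimensions
    for v, lo, interval in zip(position, minimums, chunk_intervals):
        # Assumed form of Equation 1:
        # (H_{i-1} * Prime) XOR floor((V_i - Min_i) / ChunkInterval_i)
        h = (h * prime) ^ ((v - lo) // interval)
        hashes.append(h)
    return hashes


def final_hash(position, minimums, chunk_intervals, n_servers, prime=991):
    """Equation 2: only the last dimension's hash is reduced modulo N."""
    return dimension_hashes(position, minimums, chunk_intervals, prime)[-1] % n_servers


# D00 sits at (0, 0) and D30 at (3, 0) in the FIG. 4 example.
print(dimension_hashes((0, 0), (0, 0), (3, 3)))  # [0, 0]
print(dimension_hashes((3, 0), (0, 0), (3, 3)))  # [1, 991]
print(final_hash((3, 0), (0, 0), (3, 3), 4))     # 3
```

For D30 the X-axis hash of 1 survives into H₂ through the multiplication by the prime, which is why the final modulo only needs the last hash.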

Once the per-dimension hash values (H₁, H₂) of the specific dimension data (D00) have been calculated, the distribution management unit 110 uses Equation 2, which is a modular operation, to calculate the final hash value of the specific dimension data (D00).

That is, the distribution management unit 110 identifies the third parameters for calculating the final hash value of the specific dimension data (D00): the hash value (H₂) of the specific dimension data (D00) in the second dimension (Y axis), and the total number of instance DB servers managing the multidimensional data, N = "4".

Thereafter, the distribution management unit 110 applies the third parameters to Equation 2 to calculate the final hash value of the specific dimension data (D00).

In short, once the hash value (H₂) in the second dimension (Y axis) has been calculated using the hash value (H₁) in the first dimension (X axis), the final hash value is calculated by performing a modular operation on the second-dimension hash value (H₂) and the total number of instance DB servers (N).
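With the parameters listed for D00, the three calculations can be worked through as follows. The arithmetic assumes that Equation 1 takes the form (H_{i-1} × Prime) XOR floor((V_i − Min_i) / ChunkInterval), which is read from the parameter descriptions rather than quoted from the equation itself. D30, located at (3, 0), is added as a second case to show a non-zero H₁ carrying through.

```latex
\text{D00:}\quad
H_1 = (0 \times 991) \oplus \lfloor (0 - 0)/3 \rfloor = 0, \qquad
H_2 = (0 \times 991) \oplus \lfloor (0 - 0)/3 \rfloor = 0, \qquad
H_2 \bmod 4 = 0

\text{D30:}\quad
H_1 = (0 \times 991) \oplus \lfloor (3 - 0)/3 \rfloor = 1, \qquad
H_2 = (1 \times 991) \oplus \lfloor (0 - 0)/3 \rfloor = 991, \qquad
991 \bmod 4 = 3
```

Both results agree with the text: H₁ = "0" for D00, and D00 and D30 fall into chunk numbers "0" and "3" respectively.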

Once the final hash value has been calculated, the distribution management unit 110 distributes the specific dimension data (D00) associated with that final hash value to the chunk whose chunk number matches the final hash value.

Here, it is preferable that the final hash values (0, 1, 2, ...) and the chunk numbers that identify the chunks (0, 1, 2, ...) are matched in advance. For example, a final hash value of "0" is matched to chunk number "0", and a final hash value of "1" is matched to chunk number "1".

Accordingly, if the final hash value of the specific dimension data (D00) is calculated as "0", the chunk with chunk number "0", which matches that final hash value, is detected and the specific dimension data (D00) is distributed to it.

When the process described above for the specific dimension data (D00), namely calculating the final hash value from the per-dimension hash values, detecting the corresponding chunk, and distributing the data to that chunk, is performed in the same way for each of the remaining specific dimension data (D01-D55) shown in FIGS. 5 and 6, the result is as shown in FIG. 7. The final hash value of the specific dimension data (D00, D01, D02, D10, D11, D12, D20, D21, D22) is calculated as "0", so these are distributed to the chunk with chunk number "0" (Chunk0). The final hash value of the specific dimension data (D03, D04, D05, D13, D14, D15, D23, D24, D25) is calculated as "1", so these are distributed to the chunk with chunk number "1" (Chunk1). The final hash value of the specific dimension data (D33, D34, D35, D43, D44, D45, D53, D54, D55) is calculated as "2", so these are distributed to the chunk with chunk number "2" (Chunk2). The final hash value of the specific dimension data (D30, D31, D32, D40, D41, D42, D50, D51, D52) is calculated as "3", so these are distributed to the chunk with chunk number "3" (Chunk3).
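The distribution in FIG. 7 can be reproduced with a short Python sketch. The names `chunk_number` and `distribute` are hypothetical, the label `Dxy` is taken to mean the cell at X = x, Y = y (as the position listing for Chunk0 suggests), and the floor-division form of Equation 1 is an assumption read from the parameter descriptions.

```python
from collections import defaultdict

PRIME, N_SERVERS, CHUNK_INTERVAL = 991, 4, 3  # values from the FIG. 4 example


def chunk_number(x, y):
    """Final hash value of the cell at (x, y); Min is 0 in both dimensions."""
    h = 0
    for v in (x, y):  # X axis first, then Y axis
        h = (h * PRIME) ^ (v // CHUNK_INTERVAL)
    return h % N_SERVERS


def distribute(size=6):
    """Group the labels D00..D55 by the chunk number they hash to."""
    chunks = defaultdict(list)
    for x in range(size):
        for y in range(size):
            chunks[chunk_number(x, y)].append(f"D{x}{y}")
    return dict(chunks)


for number, members in sorted(distribute().items()):
    print(f"Chunk{number}: {', '.join(members)}")
```

Each 3 x 3 block of the 6 x 6 grid lands in its own chunk, matching the four groups listed for FIG. 7.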

Thereafter, the distribution management unit 110 notifies the sort management unit 120 that the distribution of all specific dimension data (D00-D55) included in the multidimensional data to the corresponding chunks is complete.

The sort management unit 120 sorts the specific dimension data distributed to each chunk.

More specifically, the sort management unit 120 sorts the specific dimension data distributed to each chunk based on the dimensions of that data.

That is, the sort management unit 120 determines the per-dimension sort order of the specific dimension data distributed to each chunk based on a preset dimension management scheme, and sorts the position information values of the dimension that comes first in the sort order while also sorting the position information values of the dimension that comes next.

Here, the preset dimension management scheme may be a scheme that each instance DB server (0, 1, ..., n) of the multidimensional database 300 has set in advance for managing dimensions.

For example, when the multidimensional data is two-dimensional and the preset dimension management scheme places the second dimension (Y axis) ahead of the first dimension (X axis) in the sort order, the position information values of the second dimension (Y axis), which comes first in the sort order, are sorted, while the position information values of the first dimension (X axis), which comes next, are sorted as well.

In the following, for convenience of description, the explanation continues using the chunk (Chunk0) among the chunks (Chunk0,1,2,3) shown in FIG. 7 and the example of the preset dimension management scheme mentioned above.

As shown in FIG. 7, the specific dimension data (D00, D01, D02, D10, D11, D12, D20, D21, D22) is allocated to the chunk (Chunk0), so the sort management unit 120 checks the preset dimension management scheme of instance DB server 0, which corresponds to the chunk (Chunk0).

Because the preset dimension management scheme places the second dimension (Y axis) ahead of the first dimension (X axis) among the two dimensions (X axis, Y axis) of the multidimensional data, the sort management unit 120 can confirm that it must first sort the position information values of the second dimension (Y axis), which comes first in the sort order, and then sort the position information values of the first dimension (X axis), which comes next.

Once the preset dimension management scheme has been checked, the sort management unit 120 checks the position information of the specific dimension data (D00, D01, D02, D10, D11, D12, D20, D21, D22) in the chunk (Chunk0).

That is, the sort management unit 120 can identify the position information (0,0) of the specific dimension data (D00) from V₁ = "0", the position value corresponding to its position in the first dimension (X axis), and V₂ = "0", the position value corresponding to its position in the second dimension (Y axis). The sort management unit 120 then identifies, in the same manner, the position information of the remaining specific dimension data (D01, D02, D10, D11, D12, D20, D21, D22) as (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), and (2,2).

Thereafter, taking the second dimension (Y axis), which comes first in the sort order, as the basis, the sort management unit 120 starts with the data whose second-dimension (Y axis) position information value is the smallest, "0", sorts it by the position information values of the first dimension (X axis), which comes next in the sort order, and places the result, (0,0), (1,0), (2,0), in the chunk (Chunk0). The next second-dimension (Y axis) position information values, "1" and "2", are sorted in the same way, and the results, (0,1), (1,1), (2,1) and (0,2), (1,2), (2,2), are placed in the chunk (Chunk0). In other words, the arrangement of the specific dimension data (D00, D01, D02, D10, D11, D12, D20, D21, D22) in the chunk (Chunk0) is changed to generate the sorted chunk (Chunk0').
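The Y-before-X ordering of Chunk0 amounts to a sort on a composite key. A minimal Python sketch, in which the function name `sort_chunk` and the `dimension_order` parameter are hypothetical:

```python
# Positions (X, Y) of D00..D22 in the order they were distributed to Chunk0.
chunk0 = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]


def sort_chunk(cells, dimension_order=(1, 0)):
    """Sort cells by dimension precedence; (1, 0) means Y axis before X axis."""
    return sorted(cells, key=lambda cell: tuple(cell[d] for d in dimension_order))


chunk0_sorted = sort_chunk(chunk0)  # plays the role of Chunk0'
print(chunk0_sorted)
# [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1), (0, 2), (1, 2), (2, 2)]
```

Passing `dimension_order=(0, 1)` instead would model a server whose dimension management scheme puts the X axis first.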

When the specific dimension data distributed to each of the remaining chunks (Chunk1,2,3) has been sorted based on its dimensions in the manner described above, the sorted chunks (Chunk1',2',3') corresponding to the remaining chunks (Chunk1,2,3) are generated.

This sorting scheme of the present invention is a synchronous scheme. Because it sorts large volumes of multidimensional data, it may use an external sort (External-Merge Sort) or the like, and a sorting algorithm suited to the quantity of multidimensional data may also be used.

Although the embodiment of the present invention is described as sorting large volumes of multidimensional data based on a synchronous scheme, it is not limited thereto. The data may of course also be sorted based on an asynchronous scheme, such as insertion sort or bucket sort, in which data is continuously merged and sorted while the hash-based distribution of the multidimensional data is still in progress.
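As one way to picture the asynchronous variant, the sketch below keeps a chunk sorted while cells arrive one at a time, using a binary insertion from Python's standard `bisect` module. The function name `insert_sorted` and the arrival order are hypothetical; the text names insertion sort and bucket sort only as examples and does not prescribe this mechanism.

```python
import bisect


def insert_sorted(chunk, cell):
    """Insert one (X, Y) cell while keeping the chunk ordered by Y, then X."""
    x, y = cell
    # Cells are stored as (Y, X) so that plain tuple ordering is the sort order.
    bisect.insort(chunk, (y, x))


chunk0 = []
for cell in [(2, 1), (0, 0), (1, 2), (1, 0)]:  # arrival order during distribution
    insert_sorted(chunk0, cell)

print([(x, y) for y, x in chunk0])  # [(0, 0), (1, 0), (2, 1), (1, 2)]
```

The chunk is already in its final order when distribution ends, so no separate sorting pass is needed afterwards.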

The storage management unit 130 converts the sorted specific dimension data into a preset specific format and stores it.

More specifically, when the sort management unit 120 has finished sorting the specific dimension data distributed to each chunk, the storage management unit 130 converts the sorted specific dimension data into the preset specific format and stores it in the instance DB server associated with the corresponding chunk.

That is, the storage management unit 130 converts the sorted specific dimension data based on the preset specific format to generate specific dimension conversion data, and stores the specific dimension conversion data in the instance DB server associated with the corresponding chunk.

Here, the preset specific format may be the storage format in which each instance DB server (0, 1, ..., n) of the multidimensional database 300 manages multidimensional data.

For example, the preset specific format may include a Dense Array Format, in which every cell of the array storage (AS0,1...n) allocated to each instance DB server (0, 1, ..., n) holds a value, and a Sparse Array Format, in which values are scattered across only a few cells of the array storage (AS0,1...n).
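The difference between the two formats can be illustrated with plain Python containers. This is only a conceptual sketch: the text does not describe the actual storage layout of either format, and the function names `to_dense` and `to_sparse`, the grid orientation, and the sample attribute values are all hypothetical.

```python
def to_dense(cells, origin, shape):
    """Dense layout: one slot per cell of the chunk, indexed as grid[y][x]."""
    ox, oy = origin
    width, height = shape
    grid = [[None] * width for _ in range(height)]
    for (x, y), value in cells.items():
        grid[y - oy][x - ox] = value
    return grid


def to_sparse(cells):
    """Sparse layout: only populated cells, kept as (x, y, value) in Y-then-X order."""
    return sorted(((x, y, v) for (x, y), v in cells.items()), key=lambda c: (c[1], c[0]))


# Two populated cells inside a 3 x 3 chunk whose origin is (0, 0).
cells = {(0, 0): 1.5, (2, 1): 4.0}
dense = to_dense(cells, origin=(0, 0), shape=(3, 3))
sparse = to_sparse(cells)
print(dense)
print(sparse)
```

When most cells are populated the dense grid avoids storing coordinates; when few are, the sparse list avoids storing empty slots.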

In the following, for convenience of description, the explanation continues on the assumption that the chunks are those of FIG. 7 (Chunk0,1,2,3) and that the preset specific format is the Dense Array Format.

When the sort management unit 120 has finished generating the sorted chunks (Chunk0',1',2',3') corresponding to the chunks (Chunk0,1,2,3), the storage management unit 130 converts the data stored in the sorted chunks (Chunk0',1',2',3') based on the Dense Array Format, the preset specific format, and assigns the corresponding cell values to the cells of the array storage (AS0,1,2,3), thereby generating the specific dimension conversion data.

Thereafter, the storage management unit 130 checks the array storage (AS0,1,2,3) corresponding to the chunks (Chunk0,1,2,3) and detects the corresponding instance DB servers (0,1,2,3). The storage management unit 130 then stores the specific dimension conversion data generated for each chunk (Chunk0,1,2,3) in the instance DB server (0,1,2,3) associated with that chunk.

When the specific dimension conversion data has been stored as described above, all distributed management procedures are complete, so the storage management unit 130 generates a distributed management result to indicate that distributed management of the multidimensional data has been completed, and transmits it to the user terminal 200.

Meanwhile, when the chunks (Chunk0,1,2,3) do not correspond one-to-one to the instance DB servers (0,1), as in FIG. 8, loading may be performed by allocating at least two array storages to each instance DB server (0,1).

For example, if the array storages (AS0,2) are allocated to instance DB server (0) and the array storages (AS1,3) are allocated to instance DB server (1), chunks (0,2) correspond to the array storages (AS0,2) and chunks (1,3) correspond to the array storages (AS1,3). The specific dimension conversion data generated for each chunk (Chunk0,1,2,3) can therefore be stored in the instance DB server (0,1) associated with that chunk.
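The FIG. 8 arrangement is consistent with assigning array storages to servers round-robin. The sketch below uses a modulo rule to reproduce that example; the rule itself and the function name `storage_layout` are assumptions, since the text only gives the resulting allocation.

```python
def storage_layout(n_chunks, n_servers):
    """Map each server to the array storages it hosts, one storage per chunk."""
    layout = {server: [] for server in range(n_servers)}
    for chunk in range(n_chunks):
        # Assumed rule: chunk c is hosted by server (c mod number of servers).
        layout[chunk % n_servers].append(f"AS{chunk}")
    return layout


print(storage_layout(4, 2))  # {0: ['AS0', 'AS2'], 1: ['AS1', 'AS3']}
```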

As described above, the multidimensional data management apparatus 100 of the present invention loads multidimensional data by mapping it directly to each instance DB server based on hash values. Unlike existing multidimensional data storage methods, this eliminates the process of relocating multidimensional data, which reduces the overhead caused by loading and thereby minimizes the overall loading cost and time.

Accordingly, by performing optimized distributed management of multidimensional data based on the hash technique, the multidimensional data management apparatus 100 of the present invention provides a new multidimensional data distributed management method that, unlike existing methods, minimizes the loading time and cost unnecessarily consumed by data loading and thus enables high-speed data loading.

In the following, the flow of providing a hash-based data distributed management service according to an embodiment of the present invention is described in detail with reference to FIG. 9. For convenience of description, the reference numerals used in FIGS. 1 to 8 above are used.

When the user selects the multidimensional data to be stored in the multidimensional database 300 (S100), the multidimensional data management apparatus 100 distributes specific dimension data to at least one chunk based on the specific hash information of each of at least one piece of specific dimension data included in the multidimensional data (S110-S140).

That is, the multidimensional data management apparatus 100 calculates the final hash value, which corresponds to the specific hash information, from the per-dimension hash values of the specific dimension data calculated with the specific hash function, identifies the chunk to which the data should be distributed based on the final hash value, and distributes the specific dimension data to it.

[Equation 1]

The operator in Equation 1 is the bitwise XOR.

[Equation 2]

The multidimensional data management apparatus 100 first performs a process of finding the per-dimension hash values of the specific dimension data (D00-D55) included in the multidimensional data.

That is, as shown in FIG. 5, the multidimensional data management apparatus 100 identifies the first parameters for calculating the hash value (H₁) of the specific dimension data (D00) in the first dimension (X axis): the initial value for all dimensions, H₀ = "0"; the position value corresponding to the position in the first dimension (X axis), V₁ = "0"; the minimum position data value in the first dimension (X axis), Min₁ = "0"; and the chunk size in the first dimension (X axis), ChunkInterval = "3".

Thereafter, the multidimensional data management apparatus 100 applies the first parameters to Equation 1 to calculate the hash value (H₁) of the specific dimension data (D00) in the first dimension (X axis).

Once the hash value (H₁) has been calculated, the multidimensional data management apparatus 100 identifies the second parameters for calculating the hash value (H₂) of the specific dimension data (D00) in the second dimension (Y axis): the hash value in the first dimension (X axis), H₁ = "0"; the prime number, "991"; the position value corresponding to the position in the second dimension (Y axis), V₂ = "0"; the minimum position data value in the second dimension (Y axis), Min₂ = "0"; and the chunk size in the second dimension (Y axis), ChunkInterval = "3".

Thereafter, the multidimensional data management apparatus 100 applies the second parameters to Equation 1 to calculate the hash value (H₂) of the specific dimension data (D00) in the second dimension (Y axis).

Once the per-dimension hash values (H₁, H₂) of the specific dimension data (D00) have been calculated, the multidimensional data management apparatus 100 uses Equation 2, which is a modular operation, to calculate the final hash value of the specific dimension data (D00).

That is, the multidimensional data management apparatus 100 identifies the third parameters for calculating the final hash value of the specific dimension data (D00): the hash value (H₂) of the specific dimension data (D00) in the second dimension (Y axis), and the total number of instance DB servers managing the multidimensional data, N = "4".

Thereafter, the multidimensional data management apparatus 100 applies the third parameters to Equation 2 to calculate the final hash value of the specific dimension data (D00).

Once the final hash value has been calculated, the multidimensional data management apparatus 100 distributes the specific dimension data (D00) associated with that final hash value to the chunk whose chunk number matches the final hash value.

Thereafter, the multidimensional data management apparatus 100 determines the per-dimension sort order of the specific dimension data distributed to each chunk based on the preset dimension management scheme, and sorts the position information values of the dimension that comes first in the sort order while also sorting the position information values of the dimension that comes next (S150).

More specifically, if the preset dimension management scheme places the second dimension (Y axis) ahead of the first dimension (X axis) among the two dimensions (X axis, Y axis) of the multidimensional data, the multidimensional data management apparatus 100 can confirm that it must first sort the position information values of the second dimension (Y axis), which comes first in the sort order, and then sort the position information values of the first dimension (X axis), which comes next.

Once the preset dimension management scheme has been checked, the multidimensional data management apparatus 100 identifies the position information (0,0) of the specific dimension data (D00) from V₁ = "0", the position value corresponding to its position in the first dimension (X axis), and V₂ = "0", the position value corresponding to its position in the second dimension (Y axis). The multidimensional data management apparatus 100 then identifies, in the same manner, the position information of the remaining specific dimension data (D01, D02, D10, D11, D12, D20, D21, D22) as (0,1), (0,2), (1,0), (1,1), (1,2), (2,0), (2,1), and (2,2).

Thereafter, taking the second dimension (Y axis), which comes first in the sort order, as the basis, the multidimensional data management apparatus 100 starts with the data whose second-dimension (Y axis) position information value is the smallest, "0", sorts it by the position information values of the first dimension (X axis), which comes next in the sort order, and places the result, (0,0), (1,0), (2,0), in the chunk (Chunk0). The next second-dimension (Y axis) position information values, "1" and "2", are sorted in the same way, and the results, (0,1), (1,1), (2,1) and (0,2), (1,2), (2,2), are placed in the chunk (Chunk0). In other words, the arrangement of the specific dimension data (D00, D01, D02, D10, D11, D12, D20, D21, D22) in the chunk (Chunk0) is changed to generate the sorted chunk (Chunk0').

In the same manner as described above, the multidimensional data management apparatus 100 sorts the specific dimension data distributed to each of the remaining chunks (Chunk1,2,3) based on its dimensions and generates the sorted chunks (Chunk1',2',3') corresponding to the remaining chunks (Chunk1,2,3).

Thereafter, the multidimensional data management apparatus 100 converts the sorted specific dimension data based on the preset specific format to generate specific dimension conversion data, and stores the specific dimension conversion data in the instance DB server associated with the corresponding chunk (S160, S170).

When generation of the sorted chunks (Chunk0',1',2',3') corresponding to the chunks (Chunk0,1,2,3) is complete, the multidimensional data management apparatus (100) converts the data stored in the sorted chunks (Chunk0',1',2',3') on the basis of the Dense Array Format, which is the predetermined specific format, and assigns the cell values corresponding to the cells of the array storages (AS0,1,2,3), thereby generating the specific dimension conversion data.
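One plausible reading of the Dense Array Format conversion is that each cell value is written at an offset implied by its position, so coordinates do not have to be stored next to the values. The Python sketch below assumes that reading. The function `to_dense_array`, its parameters, the X-fastest layout, and the `fill` value for empty cells are hypothetical and are not taken from the patent.

```python
def to_dense_array(cells, origin, shape, fill=None):
    """Flatten {(x, y): value} into a list laid out Y-major, X-fastest.

    origin is the (x, y) of the chunk's first cell and shape is
    (width, height); both are assumptions made for this sketch.
    """
    width, height = shape
    dense = [fill] * (width * height)
    for (x, y), value in cells.items():
        # Offset follows the sorted order: all X for Y=0, then all X for Y=1, ...
        offset = (y - origin[1]) * width + (x - origin[0])
        dense[offset] = value
    return dense


# A 2x2 chunk with one empty cell; value labels are illustrative only.
cells = {(0, 0): "D00", (1, 0): "D10", (0, 1): "D01"}
dense = to_dense_array(cells, origin=(0, 0), shape=(2, 2))
print(dense)
```

Because the offset is computed from the position, the input order of the cells does not matter here; sorting beforehand would matter mainly when the values are streamed sequentially into storage.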

Thereafter, the multidimensional data management apparatus (100) checks the array storages (AS0,1,2,3) corresponding to the chunks (Chunk0,1,2,3) and detects the corresponding instance DB servers (0,1,2,3). It then stores the specific dimension conversion data generated for each chunk (Chunk0,1,2,3) in the instance DB server (0,1,2,3) associated with that chunk.
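The lookup from chunk to array storage to instance DB server can be pictured with plain dictionaries. In the sketch below the servers are simulated as in-memory dicts, and every name (`store_chunks`, `storage_of_chunk`, `server_of_storage`) is an assumption made for illustration; the patent does not specify these interfaces.

```python
def store_chunks(converted, storage_of_chunk, server_of_storage, servers):
    """Store each chunk's converted data on the server that owns its storage.

    converted:         {chunk_number: converted data}
    storage_of_chunk:  {chunk_number: array storage name}
    server_of_storage: {array storage name: server id}
    servers:           {server id: {array storage name: data}} (simulated)
    """
    for chunk_number, data in converted.items():
        storage = storage_of_chunk[chunk_number]   # e.g. Chunk0 -> AS0
        server_id = server_of_storage[storage]     # e.g. AS0 -> server 0
        servers[server_id][storage] = data
    return servers


converted = {0: ["a"], 1: ["b"]}
storage_of_chunk = {0: "AS0", 1: "AS1"}
server_of_storage = {"AS0": 0, "AS1": 1}
servers = store_chunks(converted, storage_of_chunk, server_of_storage, {0: {}, 1: {}})
print(servers)
```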

When storage of the specific dimension conversion data is complete as described above, all distributed management procedures have been performed. The multidimensional data management apparatus (100) therefore generates a distributed management result to indicate that distributed management of the multidimensional data is complete, and transmits it to the user terminal (200).

As described above, the present invention maps multidimensional data directly to each instance DB server on the basis of hash values and loads it there. Unlike existing multidimensional data storage schemes, it therefore eliminates the relocation of multidimensional data at the root, which reduces the overhead caused by loading and in turn minimizes the overall loading cost and time.
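The hash-based mapping summarized above is spelled out in the claims: the hash of each dimension is the XOR of (the previously computed dimension's hash multiplied by a prime number) and (the position value minus the dimension's minimum position value, divided by the dimension's chunk range), and the final hash value is that result modulo the total number of instance DB servers. The Python sketch below follows that description under several assumptions that the text does not fix: the division is taken as integer (floor) division, the prime is 31, the starting hash is 0, and dimensions are processed in the order given. The name `final_hash` and its parameters are hypothetical.

```python
def final_hash(position, minimums, chunk_ranges, num_servers, prime=31, seed=0):
    """Return the chunk number for a cell position.

    For each dimension: h = (h_prev * prime) XOR ((pos - min) // chunk_range)
    Final value:        h mod num_servers
    prime=31, seed=0 and floor division are assumptions of this sketch.
    """
    if not (len(position) == len(minimums) == len(chunk_ranges)):
        raise ValueError("position, minimums and chunk_ranges must align")
    h = seed
    for pos, lo, width in zip(position, minimums, chunk_ranges):
        h = (h * prime) ^ ((pos - lo) // width)
    return h % num_servers


# A 6x6 array split into 3x3 chunks and spread over 4 instance DB servers.
corners = [(0, 0), (0, 3), (3, 0), (3, 3)]
chunk_numbers = [final_hash(p, (0, 0), (3, 3), 4) for p in corners]
print(chunk_numbers)
```

Every cell inside the same 3x3 block produces the same per-dimension quotients and therefore the same chunk number, so a cell can be routed to its server from its coordinates alone, without a later relocation pass.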

Accordingly, the apparatus and method for managing multidimensional data of the present invention perform optimized distributed management of multidimensional data on the basis of a hash technique. They thereby provide a new distributed management scheme for multidimensional data that enables high-speed data loading by minimizing the loading time and cost that were previously consumed unnecessarily.

Embodiments of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

Although the present invention has been described in detail with reference to preferred embodiments, it is not limited to those embodiments. Its technical spirit extends to the range within which anyone of ordinary skill in the art to which the invention pertains can make various changes or modifications without departing from the gist of the invention as claimed in the following claims.

The apparatus and method for managing multidimensional data of the present invention perform optimized distributed management of multidimensional data on the basis of a hash technique, and can thereby provide a new distributed management scheme for multidimensional data that enables high-speed data loading by minimizing the loading time and cost previously consumed unnecessarily. Because the invention thus overcomes the limits of existing technology, it goes beyond mere use of the related technology: the devices to which it is applied have sufficient prospects of being marketed or sold, and the invention can clearly be carried out in practice. The invention therefore has industrial applicability.

100: multidimensional data management apparatus
110: distribution management unit 120: sort management unit
130: storage management unit 140: storage unit
200: user terminal
300: multidimensional database

Claims

A multidimensional data management apparatus comprising:
a distribution management unit configured to distribute specific dimension data to at least one chunk on the basis of at least one piece of specific hash information for each item of specific dimension data included in multidimensional data; and
a sort management unit configured to sort the specific dimension data distributed to each chunk on the basis of the dimensions of the specific dimension data distributed to each of the chunks,
wherein the distribution management unit calculates a final hash value corresponding to the specific hash information on the basis of a result of calculating a hash value for each dimension of the specific dimension data using a specific hash function.

The apparatus of claim 1,
further comprising a storage management unit configured to convert the sorted specific dimension data into a predetermined format and to store the converted specific dimension data in an instance DB server associated with the chunk.

The apparatus of claim 1,
wherein the specific dimension data includes, for each dimension, a position value corresponding to a position in that dimension, and
the specific hash function is based on a hash value of a second dimension, calculated before that of a first dimension whose hash value is to be calculated, a prime number, the position value of the first dimension, and a chunk range of the first dimension.

The apparatus of claim 3,
wherein the specific hash function performs an exclusive OR (XOR) operation on a first result value, obtained by multiplying the hash value of the second dimension by the prime number, and a second result value, obtained by dividing the difference between the position value of the first dimension and the minimum position value of the first dimension by the chunk range of the first dimension.

The apparatus of claim 3,
wherein the distribution management unit, when the hash value of the first dimension has been calculated from the hash value of the second dimension using the specific hash function, calculates the final hash value by performing a modulo operation on the hash value of the first dimension and the total number of instance DB servers managing the multidimensional data.

The apparatus of claim 5,
wherein the magnitude of the final hash value is matched with a chunk number that identifies the chunk, and
the distribution management unit distributes the specific dimension data associated with the final hash value to the chunk whose chunk number matches the final hash value.

The apparatus of claim 3,
wherein the sort management unit determines a sort order for the dimensions of the specific dimension data distributed to each chunk on the basis of a predetermined dimension management scheme, and sorts the specific dimension data by the position information value of each dimension, starting with the dimension that comes first in the sort order.

The apparatus of claim 2,
wherein the storage management unit converts the sorted specific dimension data on the basis of a predetermined specific format to generate specific dimension conversion data, and stores the specific dimension conversion data in the instance DB server associated with the chunk.

A method of operating a multidimensional data management apparatus, the method comprising:
a distribution management step of distributing specific dimension data to at least one chunk on the basis of at least one piece of specific hash information for each item of specific dimension data included in multidimensional data; and
a sort management step of sorting the specific dimension data distributed to each chunk on the basis of the dimensions of the specific dimension data distributed to each of the chunks,
wherein the distribution management step comprises calculating a final hash value corresponding to the specific hash information on the basis of a result of calculating a hash value for each dimension of the specific dimension data using a specific hash function.

The method of claim 9,
further comprising a storage management step of converting the sorted specific dimension data into a predetermined format and storing the converted specific dimension data in an instance DB server associated with the chunk.

The method of claim 9,
wherein the specific dimension data includes, for each dimension, a position value corresponding to a position in that dimension, and
the specific hash function is based on a hash value of a second dimension, calculated before that of a first dimension whose hash value is to be calculated, a prime number, the position value of the first dimension, and a chunk range of the first dimension.

The method of claim 11,
wherein the specific hash function performs an exclusive OR (XOR) operation on a first result value, obtained by multiplying the hash value of the second dimension by the prime number, and a second result value, obtained by dividing the difference between the position value of the first dimension and the minimum position value of the first dimension by the chunk range of the first dimension.

The method of claim 11,
wherein the distribution management step comprises, when the hash value of the first dimension has been calculated from the hash value of the second dimension using the specific hash function, calculating the final hash value by performing a modulo operation on the hash value of the first dimension and the total number of instance DB servers managing the multidimensional data.

The method of claim 13,
wherein the magnitude of the final hash value is matched with a chunk number that identifies the chunk, and
the distribution management step comprises distributing the specific dimension data associated with the final hash value to the chunk whose chunk number matches the final hash value.

The method of claim 11,
wherein the sort management step comprises determining a sort order for the dimensions of the specific dimension data distributed to each chunk on the basis of a predetermined dimension management scheme, and sorting the specific dimension data by the position information value of each dimension, starting with the dimension that comes first in the sort order.

The method of claim 10,
wherein the storage management step comprises converting the sorted specific dimension data on the basis of a predetermined specific format to generate specific dimension conversion data, and storing the specific dimension conversion data in the instance DB server associated with the chunk.