JPS62182849A

JPS62182849A - Data control system

Info

Publication number: JPS62182849A
Application number: JP61025125A
Authority: JP
Inventors: Kyoji Kawagoe; 恭二川越
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1986-02-06
Filing date: 1986-02-06
Publication date: 1987-08-11

Abstract

PURPOSE:To attain the uniform storage of data by increasing the number of pages in response to the increase of overflow and dividing only the data on the overflow page into plural pages. CONSTITUTION:In case data are stored, the data are first transferred to a data value memory part 11 from outside and only the key part in the data value is used by a hash calculation part 12 of production of the real number value that is printed to '1' from proper '0'. An optional hash function or the one that has high possibility to be printed to the uniform distribution in terms of the data distribution state is selected. That is, a function F(X)=X /(conceivable maximum value of X) is available. Then a store address calculating part 15 calculates the store addresses from the obtained hash function and the divided rows in a divided row memory part 13. Then the data in an overflow page if obtained is divided into plural pages by the key hash function value of each data.

Description

【発明の詳細な説明】〔産業上の利用分野〕本発明は、ファイルを有する計算機システムにおけるデ
ータ管理方式に関し、とくにデータの一部をキーとして
計算によってデータの格納番地を求めるハツシュによる
データアクセスを行なうときのデータ管理方式に関する
。[Detailed Description of the Invention] [Industrial Application Field] The present invention relates to a data management method in a computer system having files, and in particular, data access by a hash that calculates a data storage address using a part of the data as a key. Regarding the data management method used when carrying out.

１′従来の技術〕従来、ファイルを用いたハツシュアクセス方式において
は、キー及びキー以外のデータは、他のデータと共にペ
ージと呼ばれるデータ格納域単位で蓄積される。このと
き、キー値から上記データ格納域と示す格納番地を求め
るにはハツシュ関数と呼ばれる特殊な計算式によってお
こなわれてきた。1' Prior Art Conventionally, in a hash access method using files, keys and data other than keys are stored together with other data in units of data storage areas called pages. At this time, a special calculation formula called a hash function has been used to determine the storage address indicating the data storage area from the key value.

しかして従来のデータ管理方式ではページの中の記憶域
が少なくなり、もはやこのページにはデータを格納する
ことができない、いわゆるオーバフローが発生したとき
にはオーバフローしたページにデータを格納できずにオ
ーバフローしたページを介して別の格納域に格納してい
た。However, with conventional data management methods, when the storage space in a page decreases and data can no longer be stored in this page, when a so-called overflow occurs, the overflow page cannot store data in the overflow page. It was stored in a separate storage area via .

〔発明が解決しようとする問題点、１上記の従来のデータ管理方式ではデータの分布の偏りに
よりデータの格納効率が劣化し、またオーバフローした
データへのアクセスに際してはページアクセス回数が増
加するという問題点があった。[Problems to be solved by the invention: 1. In the conventional data management method described above, data storage efficiency deteriorates due to biased data distribution, and the number of page accesses increases when accessing overflow data. There was a point.

１、問題点を解決するための手段〕本発明の方式は、データの一部をキーとし、該キーのハ
ツシュ関数値により前記データのファイルへの格納域番
地を決定するデータアクセスを行なう情報処理装置のデ
ータ管理方式において、あらかじめデータ蓄積の際に記
憶された、オーバフローして格納されているデータを複
数のデータ格納域に分割したデータ格納域の番地列から
なる分割列データと前記ハツシュ関数値とがちデータの
格納域番地を決定する格納域番地決定手段と、データ蓄
積に際し前記格納域番地決定手段により決定された格納
域がオーバフローするときには該格納域の番地を前記分
割列データに追加して前記格納域番地決定手段により前
記オーバフローした格納域に格納されているデータおよ
び蓄積しようとしたデータの格納域番地を決定し蓄積す
るように制御する制御手段とを含んで構成される。1. Means for Solving the Problems] The method of the present invention is an information processing method that uses a part of data as a key and performs data access in which the storage address of the data in a file is determined based on the hash function value of the key. In the data management method of the device, divided column data consisting of an address sequence of a data storage area obtained by dividing overflowed data stored in advance during data storage into a plurality of data storage areas and the hash function value. Storage area address determination means for determining the storage area address of Togachi data; and when the storage area determined by the storage area address determination means overflows during data accumulation, the address of the storage area is added to the divided column data. The storage area address determination means determines the storage area address of the data stored in the overflowed storage area and the data to be stored, and controls the storage area address to be stored.

し作用〕前記の手段を有することにより、ページ数はオーバフロ
ーとともに増加することと、オーバフローしたページの
データのみ複数ページに分割することによりデータの均
質的な格納が可能となる。By having the above-mentioned means, the number of pages increases with overflow, and data can be stored uniformly by dividing only the data of overflow pages into a plurality of pages.

また、オーバフローしたデータへのアクセスにおいても
、従来のようなオーバフロ一対応の格納域のアクセスが
不要であり、かつ分割列記憶域の参照にはファイルへの
アクセスが不要なことから、必ず１回のアクセスで所望
のデータを取り出すことが可能となる。In addition, when accessing overflow data, there is no need to access the overflow-compatible storage area as in the past, and access to the file is not required to refer to the split column storage area, so it must be accessed only once. It becomes possible to retrieve the desired data by accessing the .

〔Example〕

以下、本発明の実施例を図面を参照して説明する。 Embodiments of the present invention will be described below with reference to the drawings.

第１図は、本発明の一実施例を示すブロック図である。FIG. 1 is a block diagram showing one embodiment of the present invention.

データを蓄積する際には、まず外部からデータ値記憶部
１・１ヘデータが転送される。データ値内のキ一部のみ
ハツシュ計算部１２で使用され、適切な０から１に写像
する実数値が生成される。ハツシュ関数としては、任意
のものあるいはデータ分布状況からみて一様分布へ写像
する可能性の高い関数が選ばれる。例えば、関数として
Ｆ　（Ｘ）＝Ｘ／　（Ｘの取り得る最大値）が考えられ
る。得られたハツシュ関数値および以下で説明する分割
列記憶部１３内の分割列を用いて、第２図に示す動作に
より格納番地計算部１５で格納番地が計算される。デー
タアクセス制御部１４では以下に示ず処理が行なわれる
。格納番地計算部１５で得られた格納番地を有する格納
域をデータ取出部１６よりとりだし、もしオーバフロー
していなければデータ蓄積部１７にて上記格納域に上記
データを格納したのち、ディスク装置１８でディスクに
書き込む。もし、オーバフローしたページであればこの
ページのページ内データを各データのキーのハツシュ関
数値によって複数のページに分割する。When storing data, data is first transferred from the outside to the data value storage section 1.1. Only a portion of the data value is used by the hash calculator 12 to generate real values that map appropriately from 0 to 1. As the hash function, an arbitrary one or a function that is highly likely to be mapped to a uniform distribution in view of the data distribution situation is selected. For example, F (X) = X/ (maximum possible value of X) can be considered as a function. Using the obtained hash function value and the divided columns in the divided column storage section 13, which will be explained below, a storage address is calculated in the storage address calculation section 15 by the operation shown in FIG. The data access control unit 14 performs processing not shown below. The storage area having the storage address obtained by the storage address calculation unit 15 is retrieved from the data retrieval unit 16, and if there is no overflow, the data is stored in the storage area by the data storage unit 17, and then the data is stored in the storage area by the disk device 18. Write to disc. If the page has overflowed, the in-page data of this page is divided into a plurality of pages according to the hash function value of the key of each data.

分割はまず分割列記憶部１３にオーバフローページの格
納域の番地を追加し、次いで各データを第２図のフロー
で処理し格納番地を求める。以降、単純な場合として２
分割のときを考えることとする。２分割されたデータ集
合は、上記オーバフロー巳なページと新にディスクの空
き領域から獲得されたページに格納される。与えられた
格納しようとするデータもそのハツシュ関数値によって
上記２つのページのいずれかに格納される。In the division, first, the address of the storage area of the overflow page is added to the division column storage unit 13, and then each data is processed according to the flow shown in FIG. 2 to obtain the storage address. Hereafter, as a simple case, 2
Let us consider the time of division. The data set divided into two is stored in the above-mentioned overflow page and a new page acquired from the free space on the disk. The given data to be stored is also stored in one of the two pages according to its hash function value.

データを収り出す際には、まず外部よりデータ値記憶部
１１ヘデータのキ一部のみ設定される。When data is retrieved, only a portion of the data is first set in the data value storage section 11 from the outside.

上記キー値を用いてノ＼ツシュ計算部１２および格納番
地計算部１５にてＬ記蓄積の場合と同様にして格納域の
番地が決定されろ。その後、上記得られた番地よりデー
タ収出部１６より与えられたキー値を有するデータ格納
域が収り出され、データがデータ値記憶部１１へ設定さ
れる。Using the above key value, the address of the storage area is determined by the storage calculation section 12 and the storage address calculation section 15 in the same manner as in the case of L storage. Thereafter, the data storage area having the key value given by the data acquisition section 16 is retrieved from the address obtained above, and the data is set in the data value storage section 11.

に記処理の詳細を例を用いてコ（２明する。The details of the process described below are explained below using an example.

今、第３図に示すデータが第１１図のよつに格納されて
いるものとする。新に、データ（１５，ｆ＞を蓄積する
要求か与えられ、そのデータがデータ値記憶部１１に設
定されたものとする。分割列の内容は（１，，２）であ
る。まず、ハ・ソシュ計算部１２によりハツシュ関数値
が得られる。今、仮に０．２が計算でもとまったものと
する。このとき、つぎの格納番地計算部１５では、以下
に示す処理が行なわれる。Assume now that the data shown in FIG. 3 is stored in the manner shown in FIG. 11. Assume that a new request is given to store data (15, f>) and that data is set in the data value storage unit 11.The content of the division column is (1,,2). - The hash function value is obtained by the Sosh calculation unit 12. Now, let us assume that 0.2 is found in the calculation. At this time, the next storage address calculation unit 15 performs the following processing.

（ステップ１）初期値の設定　Ｎｒ）＝１．ｍ＝１、Ｐ
＝１．Ｎｅ＝１．１＝０（ステップ２＞１＝１（ステップ３）ｉ＞２でないからステ・ツブ４へ（ステ
ップ４）Ｌ、ｌ−１（ステ・ツブ５　）　Ｎ　ｌ！　＝　１．−　＋である
からステ・ンプ６．１へ（ステップ６．１）Ａ、＝Ｏであるからステップ７、Ｉ
ａ／＼（ステーツブ７．１ａ）Ｎｆｆ＝　１　、Ｎｐ＝２．ｍ
＝２（ステ・ｌア２　）　　ｉ　＝　２（スう゛・・Ｉプ３）ｉ＞　２でないからステ・・ｌプ
・１へ（、ス　テ　・・）　ア’−１）Ｌ２＝２（ステ
・ツブ５）Ｎｊ？＝Ｌｚでないからステ・ツブ６．２へ（ステップ６．２）　Ｎ　ｅ　＞　Ｌ　２でないがらス
テ・ソ７７．２ｂ’＼（７！、テ・ｙ７”７．２ｔ＋）Ｎｅ＝１．Ｎｐ＝３（
ステップ２＞　　ｉ　＝３（ステップ３）ｉ＞２であるから終了、Ｐ＝１これより
データの格納番地が１であることが得られた４ところが
、上記ページ１は既にオーバフロー状！ぶにありこれ以
上データを格納することができない。従って、ページ１
内のデータ集合を分割し、ページ１及び新しいページ４
に格納する。(Step 1) Setting the initial value Nr)=1. m=1, P
=1. Ne=1.1=0 (Step 2>1=1 (Step 3) Since i>2 is not true, go to Step 4 (Step 4) L, l-1 (Step 5) N l! = 1.- +, so go to step 6.1 (step 6.1) A, =O, so go to step 7, I
a/\ (Status 7.1a) Nff=1, Np=2. m
= 2 (Step 2) i = 2 (Step 3) Since i > 2, go to Step 1 (, Step...) A'-1) L2 = 2 ( Ste Tubu 5) Nj? = Not Lz, so go to Ste-Tsub 6.2 (Step 6.2) Ne > L 2, but Ste-So77.2b'\ (7!, Te-y7”7.2t+) Ne=1.Np= 3(
Step 2 > i = 3 (Step 3) Since i > 2, it ends, P = 1 From this, it was determined that the data storage address is 14, but page 1 is already in an overflow state! No more data can be stored. Therefore, page 1
Divide the data set in page 1 and new page 4
Store in.

このとき、蓄積すべき上記データもそのキーのハツシュ
関数値によって新しい分割列（１，２，１＞をもちい第
２図のフローチャーＩ・に従って適当なページ（１ある
いは４）に格納される。分割後のファイルの状態を第５
図に示す。At this time, the data to be stored is also stored in an appropriate page (1 or 4) according to flowchart I in FIG. 2 using a new division sequence (1, 2, 1>) according to the hash function value of the key. The state of the file after splitting is
As shown in the figure.

次に、第４図の状態でデータを取り出す際には、仮にデ
ータキー値が１２０でありそのハツシュ関数値が０．９
であるとしたとき、上の蓄積と同様にして以下のように
格納番地が設定される。Next, when extracting data in the state shown in Figure 4, suppose the data key value is 120 and its hash function value is 0.9.
Assuming that, the storage address is set as follows in the same manner as the above accumulation.

（ステップ１〉初期値の設定　Ｎｐ＝１．ｍ＝１、Ｐ＝
１．Ｎｅ＝１．１＝０（ステップ２）ｉ＝１（ステップ３）ｉ＞２でないからステップ４へ（ステッ
プ４）Ｌ＋＝１（ステップ５）Ｎｊ？＝Ｌ＋であるからステップ６．１
へ（ステップ６．１）　Ａ＋　＝　Ｏでないからステップ
７　、Ｉｈへ（ステップ７　、Ｉｂ）Ｐ　＝　２　、　Ｎ　ｅ　＝　
２　、　Ｎ　ｐ　＝　２、ｍ＝２（ステップ２）ｉ＝２（ステップ３）ｉ）・２でないからステップ４へ（ステ
ップ４）Ｌ２＝２（ステップ５）Ｎｅ＝Ｌ２であるからステップ６．１へ（ステップ６．１）　Ａ２　＝　０でないからステップ
７．１ｈへ（ステ・：ｔＴ）”７．１ｂ＞Ｐ＝３．Ｎｅ＝３．Ｎｐ
＝３、ｍ＝３（ステップ２）ｉ−＝３（ステップ３）ｉ＞２であるから終了。Ｐ＝３得られた
格納番地３からデータ取出部６によってキー１２０と一
致するデータ（１２０、ｄ）が取り出され、データ値記
憶部１１に設定され外部に渡される。(Step 1> Setting initial values Np=1.m=1, P=
1. Ne=1.1=0 (Step 2) i=1 (Step 3) Since i>2 is not true, go to Step 4 (Step 4) L+=1 (Step 5) Nj? =L+, so step 6.1
(Step 6.1) Since A+ = O, go to Step 7, Ih (Step 7, Ib) P = 2, N e =
2, N p = 2, m = 2 (Step 2) i = 2 (Step 3) i) - Since it is not 2, go to Step 4 (Step 4) L2 = 2 (Step 5) Ne = L2, so Step 6. 1 (Step 6.1) Since A2 = 0, go to Step 7.1h (Step:tT)”7.1b>P=3.Ne=3.Np
=3, m=3 (Step 2) i-=3 (Step 3) Since i>2, end. P=3 From the obtained storage address 3, data (120, d) matching the key 120 is retrieved by the data retrieval section 6, set in the data value storage section 11, and passed to the outside.

ｉ発明の効果〕このように、本発明にはオーバフローしたページに格納
されているデータの動的な分割および分割ページ番地を
有する分割列を用いた格納番地計算によって、ファイル
内のデータの分布を均質にでき、また必ず１回のアクセ
スでデータを収り出すことができ、アクセス回数を減少
できるという効果がある。実験および解析の結果、デー
タのページ格納効率は平均０．６７であり効率的なデー
タ格納が可能となった。[Effects of the Invention] As described above, the present invention has a method of dynamically dividing data stored in an overflow page and calculating storage addresses using a division column having division page addresses, thereby controlling the distribution of data within a file. This has the advantage that it can be made homogeneous, and the data can be retrieved with just one access, reducing the number of accesses. As a result of experiments and analysis, the average data page storage efficiency was 0.67, making it possible to store data efficiently.

図面の節ｃｉｔな説明第１図は、本発明の一実施例を示すブロック図、第２図
は、第１図の格納番地計算部の動作フローチャー１・、
第３図は、第２図の説明にＣ重用するデータ集合の例を
示す図、第４図及び第５図は、第３図を用いた例におけ
る分割前後のファイル状態を示す説明図である。Brief Description of the Drawings FIG. 1 is a block diagram showing an embodiment of the present invention, and FIG. 2 is an operational flowchart 1 of the storage address calculation section of FIG. 1.
Fig. 3 is a diagram showing an example of a data set that is heavily used in the explanation of Fig. 2, and Figs. 4 and 5 are explanatory diagrams showing the file status before and after division in the example using Fig. 3. .

１〜５．６．１．６．２．７．ｌａ　、　７．Ｉｂ　、
　７．２ａ　。1-5.6.1.6.2.7. la, 7. Ib,
7.2a.

７．２ｂ・・・フローチャートのスデップ、１１・・・
データ飽記憶部、１２・・・ハツシュ計算部、１３・・
・分割列記憶部、１４・・・データアクセス制御部、１
５・・・格納番地計算部、１６・・・データ取出部、１
７・・・データ蓄積部、１８・・・ディスク装置。7.2b...Step of flowchart, 11...
Data saturation storage unit, 12... Hash calculation unit, 13...
- Division column storage unit, 14...data access control unit, 1
5...Storage address calculation unit, 16...Data retrieval unit, 1
7... Data storage section, 18... Disk device.

躬　２ｆｆｉ熟３　凹番地　　　　　　　　　４地学４　図牛５図2ffi Mature 3 concave Street address 4th place Science 4 diagram Cow 5 diagram

Claims

[Scope of Claims] In a data management method for an information processing device that performs data access in which a part of data is used as a key and a storage area address of the data in a file is determined based on the hash function value of the key, data storage is determined in advance. The storage area address of the data is determined from the hash function value and divided column data consisting of the address sequence of the data storage area obtained by dividing the overflowed data stored at the time into a plurality of data storage areas. area address determination means, and when the storage area determined by the storage area address determination means overflows during data storage, the address of the storage area is added to the divided column data, and the overflowed storage area is determined by the storage area address determination means. 1. A data management system comprising: control means for determining storage area addresses of data stored in a storage area and data to be stored and controlling storage area addresses;