Disclosure of Invention
The technical problem to be solved by the present invention is to provide an indexing method for keyword keys on a block chain database, to solve the problem of query of block chains, and to implement a non-falsifiable index.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
an indexing method for key words on a blockchain database, comprising the steps of:
step 1: the common node generates a transaction record according to original data with a keyword key input by a user, and the method specifically comprises the following steps:
step 1-1: generating transaction default information which comprises block chain version number information and timestamp information of data creation time;
step 1-2: appointing a precursor PreHash of the transaction, if the keyword key appears for the first time, setting the precursor PreHash as Null, otherwise, the precursor is the hash value of the latest transaction corresponding to the keyword key;
step 1-3: generating data authority information and a data signature; assigning the scriptPubk as a next public key capable of modifying the key data, and signing the transaction data according to a private key of the scriptPubk by using an ECDSA algorithm;
step 2: the storage node packs the transaction into a block, and specifically includes:
step 2-1: verifying whether the transaction format meets the specification, and filtering the transaction which cannot be identified;
step 2-2: verifying whether the transaction signature is valid; retrieving a transaction index TxIndex of the PreHash value from a k-v database, retrieving a precursor transaction PreTx according to the TxIndex, and taking the appointed next public key information PreTx. ScriptPubk; performing signature verification according to a verify () function of the ECDSA algorithm, wherein if verify (pretx.script pubk, tx.script sig, Tx) ═ True, the signature is valid, otherwise, the signature is invalid, and discarding the transaction;
step 2-3: detecting the honeysuckle; reading a PreHash field of the transaction, if the PreHash field is not empty, searching a precursor transaction index TxIndex from a k-v database according to the PreHash value, if the index indicates that a subsequent transaction of the precursor transaction is not empty, indicating that the transaction to be verified is a double-flower transaction, and judging that the transaction is invalid; if the subsequent transaction is empty, assigning the subsequent transaction as the hash value of the transaction to be verified, wherein the transaction to be verified is valid; if the PreHash field of the transaction is empty, namely the creator of the transaction considers that the transaction which is the same as the transaction key does not exist before, whether the transaction with the key can be searched in the system or not is searched, if yes, the transaction is invalid, and if not, the transaction is valid;
step 2-4: after the transaction is verified to be valid, the transaction is inserted into a key-ordered transaction array vTx [ ] in the block body, and the MerklerBTree index is updated;
the rule for updating the MerkleRBTree is as follows: initializing a current node as Merkleroot, comparing a key of the node to be inserted with a key of the current node, namely NodeKey, if the key is less than NodeKey, inserting the key into a left subtree of the current node, updating the current node as a left child of the current node, otherwise, inserting the key into a right subtree, updating the current node as a right child of the current node, and sequentially recursing until the last layer of branch nodes; after the transaction is inserted, the tree is rotated to ensure black balance, and the Hash value of the branch node is recursively updated upwards to be Hash (rigthhash, key); the branch node is inserted into the node at the last layer according to the following 4 conditions:
case 1: if the branch node is empty, the situation only occurs when the Merkleroot is empty, then the transaction is subjected to Sha256 double-hash operation to obtain the hash value Hash (Tx) of the branch node, then a branch node is newly built, the key value of the branch node is defined to be key, the left hash value is the leaf node Hash value, and the right hash value is 0, which is similar to (Hash (Tx), 0 and key);
case 2: if the key value of the data to be inserted is smaller than the NodeKey value of the node, the data is inserted into the left side of the node; firstly, carrying out Sha256 double-hash operation on a transaction to obtain a hash value Hash (Tx) of the transaction, then newly building a red branch node, wherein a key word of the red branch node is a key to be inserted, the left Hash is Hash (Tx), and the right Hash is the left Hash of the original branch node, namely (Hash (Tx), Oldlefthash, key); note Hash (tx), Oldlefthash, key), then update parent node as (Hash, Oldrighthash, oldkey);
case 3: if the key value of the data to be inserted is larger than the NodeKey value of the node and the right hash of the node is 0, performing Sha256 double hash operation on the transaction to obtain the hash value Hash (Tx), and then updating the branch node to be (Oldlefthash, Hash (Tx), Oldkey);
case 4: if the key value of the data to be inserted is larger than the NodeKey value of the node and the right Hash of the node is not 0, firstly carrying out Sha256 double Hash operation on the transaction to obtain the Hash value Hash (Tx) of the data, then newly building a red branch node, wherein the key value of the branch node is the smaller of the key value of the data to be inserted and the key value of the right child of the original branch node, the left child is the smaller of the key value of the data to be inserted and the right child of the original branch node, and the right child is the larger of the key value; finally, updating the right hash of the father node of the node to be the hash of the newly-built branch node;
and step 3: additionally writing the block data into the disk file, specifically comprising:
step 3-1: serializing a transaction array vTx [ ] storing transaction original data in a block head and a block body, sequentially writing the serialized transaction array into a disk file blk000x, and returning the position nFile and BlockPos of the block in the disk;
step 3-2: generating metadata TxIndex (nFile, BlockPos, n) of each transaction, wherein nFile specifies the number of files, BlockPos specifies the offset of the block in the file, n specifies the subscript of the transaction corresponding to the ordered array, and storing the metadata information of each transaction into a k-v database in the format of k ═ hash (tx), and v ═ TxIndex;
step 3-3: storing the MerkleRBTree index entry in a k-v database, wherein k is Hash (righthash, key) and v is Hash (righthash, key);
and 4, step 4: inquiring the transaction according to the key and returning an inquiry result; starting to search the transaction from the latest block, and inquiring the previous block if the block is not searched;
the searching method in a single block is as follows: entering Merklerroot from the block head into MerklerBTree, comparing a target keyword key with a current node keyword Ckey from the Merklerroot, if the key is less than or equal to Ckey, searching a left child of the target keyword key, and pressing a right child and the node keyword into a verification path stack, otherwise, searching a right child of the target keyword key, and pressing the left child and the node keyword into the verification path stack, and sequentially executing the steps until leaf nodes are reached; if the leaf node key value key is equal to the query key, the query hits; according to the transaction hash value Hash (Tx) inquired at the leaf node, with Hash (Tx) as k, inquiring index information TxIndex corresponding to the transaction on a disk in a k-v database, and finally reading transaction information Tx in a disk file according to TxIndex; finally, returning the verification path stack and the original transaction information to the common user;
and 5: the ordinary user carries out credibility verification on the query result;
after receiving the verification path stack and the transaction information, the common user firstly carries out Sha256 double-hash operation on the transaction to obtain a transaction hash value Hash (Tx), then carries out Sha256 double-hash operation on the transaction hash value and a value popped out from the verification path stack, and continuously hashes the obtained hash value and the next value popped out from the stack until the stack is empty; and comparing whether the obtained hash value is equal to the Merkleroot value in the local block header, if so, the data is credible, otherwise, the data is not credible.
Adopt the produced beneficial effect of above-mentioned technical scheme to lie in: the index method for the key on the block chain database can directly index according to the data key instead of the existing block chain, so that the data queryability is realized; the transaction structure in the traditional block chain is expanded to a mode structure which can be stored and is similar to a traditional database, so that the applicability is improved; data authority is managed according to the digital signature technology, and data security is improved; whether the index is falsified can be self-perceived according to the MerklerBTree, and whether the transaction is falsified can be perceived according to the transaction hash, so that the non-tampering property of the data is ensured; meanwhile, the data verification function of the lightweight node is realized, so that the query end can effectively detect the credibility of the data.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
The block chain technical terms designed by the invention are explained as follows:
hash pointer: the hash pointer of one data item refers to that the content of the data item is subjected to hash operation to obtain a hash value with a fixed length, meanwhile, the hash value is taken as a key, the content of the data item is a value, the k-v pair is stored in a k-v database, and the key is the hash pointer of the data item;
trading: as shown in fig. 1, a Transaction structure diagram is shown, a Transaction is divided into two parts, namely a Transaction header (Transaction) and a Schema (Schema), and the Transaction header includes: version number (Version), parent transaction hash (PreHash), transaction time (nTime), transaction Next owner public Key (scriptPubk), signature (scriptSig) to prove this transaction valid; the schema part is similar to the table structure in the traditional database, and comprises key words and fields (fields);
block chains: the block chain is that the blocks are connected end to end according to the hash pointer, and each block cannot be changed once being formed; as shown in fig. 2, the block structure diagram is a block structure diagram, and the block is divided into a block head and a block body, where the block head includes: the version number, the previous block hash pointer (obtained by hashing the previous block header data), the timestamp when the block is formed, the Merkle root obtained by hashing the transaction from bottom to top in the block body, the random number used for workload certification and the target hash; all transaction records in the block are stored in the block body;
block chain database: the database is provided with a plurality of block chains, each block chain is equivalent to a table in the traditional database, all central mechanisms serve as storage nodes of data, all the storage nodes generate the block chains according to a consensus algorithm, and all the nodes (including users) store block header information and can retrieve records from the block header information and verify the correctness of the records;
MerkleRBTree: combining the Merkle tree with the red-black tree for indexing the transaction data in the block body; as shown in fig. 3, the MerkleRBTree structure diagram is a MerkleRBTree structure diagram, each node stores key information, data with a key value less than or equal to a key of the node is stored in a left sub-tree from a root node, and data with a key value greater than the key of the node is stored in a right sub-tree; the leaf nodes store hash pointers of the transaction data, and the non-leaf node values are obtained by hashing the left child node and the right child node in pairs.
Aiming at the defects that only transaction hash values can be inquired and a general data structure cannot be stored in the existing block chain database, the invention provides an indexing method aiming at key words on the block chain database, which comprises the following specific steps.
Step 1, generating a transaction record. The user input data is Schema data similar to that in a traditional SQL database, and the purpose of this step is to generate additional information needed for the operation of the blockchain mechanism.
And step 1-1, generating default information.
And generating block chain version number information and timestamp information of data creation time.
And step 1-2, generating precursor information.
In the invention, for the data with the same key, only the newest data is considered to be effective all the time, and the old data is considered to be the precursor of the new data, so that a chain structure is generated for the data with the same key logically end to end. When writing data with key as key, designating the precursor of the data object, if there is no precursor, the precursor is Null.
And 1-3, generating data authority information and a data signature.
In the invention, the data written into the database can not be deleted, and the data modification is realized by rewriting a data item of the same key word. The authority information of the data refers to conditions which can be specified by a creator of the transaction data and need to be met when the data is modified. The ECDSA algorithm is introduced here, specifying the next public key that can be modified for that data item and assigning it to ScriptPubk. And finally, the data creator signs the transaction by using a private key of the data creator for later proving that the data creator has the right to modify the data.
And 2, packaging the transaction into a block.
And 2-1, verifying the correctness of the transaction.
The correctness verification of the transaction mainly verifies the format of the transaction and the data authority information. And the transaction which cannot be identified is removed by format verification, so that the system safety is ensured. The ECDSA algorithm is introduced into the authorization verification of the transaction, SicriptPubk in the transaction is a public key of a next authorization owner, and ScriptSig is digital signature information generated by a creator of the transaction. The authority verification is to verify whether the script sig of the transaction is matched with the script pubk of the predecessor transaction through a verify () function, if so, the creator of the transaction is proved to have a private key required by the predecessor transaction, the predecessor of the transaction can be modified, and if not, no related authority exists, and the transaction is invalid.
And 2-2, detecting the honeysuckle.
The data authority verification ensures that the transaction precursor can be modified only by owning the corresponding private key, but when the owner of the transaction transfers the authority to the next successor, the owner of the transaction can still transfer the authority of the transaction to other successors and modify the transaction, so that data inconsistency is caused, which is called double-flower. And after receiving the transaction record, the storage node reads a pre-hash field of the transaction, retrieves a precursor transaction index TxIndex from a k-v database according to the pre-hash value, and if the index indicates that a subsequent transaction of the precursor transaction is not empty, the transaction to be verified is a double-flower transaction, and the transaction is judged to be invalid. And if the subsequent transaction is empty, assigning the subsequent transaction as the hash value of the transaction to be verified, wherein the transaction to be verified is valid. If the pre-hash field of the transaction is empty, that is, the creator of the transaction considers that no transaction with the same key as the transaction key exists before, the double-flower verification at this time is to find whether the transaction with the key as the key can be retrieved in the system, if so, the transaction to be verified is invalid, and if not, the transaction is valid.
And 2-3, updating the block index.
After the transaction is verified to be valid, the original data Tx of the transaction is stored in an array vTx [ ] with ordered KEY, the Tx is subjected to double Sha256 hash operation to obtain a storage KEY (hash (Tx)) of the transaction, and the storage KEY is inserted into the MerkleRBTree. The merklrbtree is a combination of a red black tree and a Merkle tree, and each node is a k-v pair, where k is hash (val) and v is key. Inserting data from a root node, comparing a key of a node to be inserted with a key NodeKey of a Merkleroot node, inserting the key into a left subtree if the key is less than NodeKey, or inserting the key into a right subtree until a last layer of branch nodes. After the transaction is inserted, the tree is rotated to ensure black balance and the branch node Hash value is recursively updated up to Hash (1 efthash). The branch node is inserted into the node at the last layer according to the following 4 conditions:
case 1: if the branch node is empty (only appears when Merkleroot is empty), the transaction is subjected to Sha256 double hash operation to obtain a hash value hash (Tx), then a branch node is newly established, the key value of the branch node is defined to be key, the left hash is a leaf node hash value, the right hash is 0, the shape is (hash (Tx), 0 and key), and the insertion result is shown in FIG. 4 (a). This also means that a tree must not have its branch nodes empty, and the key value of the last level branch node must be the same as its left child key value, as long as it is not empty.
Case 2: if the key value of the data to be inserted is smaller than the NodeKey value of the node, the data should be inserted into the left side of the node, but as can be seen from the case 1, the left child of the node must not be empty and is a leaf node. Firstly, Sha256 double hash operation is carried out on a transaction to obtain a hash value Hash (Tx), then a red branch node is newly built, the key word of the branch node is a key to be inserted, the left Hash is Hash (Tx), and the right Hash is the left Hash of the original branch node, namely (Hash (Tx), Oldlefthash, key). Note that the Hash is Hash (tx), Oldlefthash, key), the update parent node is (Hash, Oldrighthash, Oldkey), and the insertion result is shown in fig. 4 (b).
Case 3: if the key value of the data to be inserted is greater than the node NodeKey value of the node, and the right hash of the node is 0, the transaction is subjected to Sha256 double hash operation to obtain the hash value Hash (Tx), then the branch nodes are updated to be (Oldlefthash, Hash (Tx), Oldkey), and the insertion result is shown in FIG. 4 (c).
Case 4: if the key value of the data to be inserted is larger than the NodeKey value of the node and the right hash of the node is not 0. Performing Sha256 double-hash operation on the transaction to obtain a hash value Hash (Tx) of the transaction, and then newly building a red branch node, wherein the key value of the branch node is the smaller of the key value of the data to be inserted and the key value of the right child of the original branch node, the left child is the smaller of the key value of the data to be inserted and the right child of the original branch node, and the right child is the larger; and finally, updating the right hash of the parent node of the node to be the hash of the newly-built branch node, wherein the insertion result is shown in fig. 4 (d).
And 3, additionally writing the block data into the disk file.
The data writing is carried out by taking a block as a unit, and when the transaction quantity in the block reaches a fixed threshold value, the data in the block is written into a disk. First, the transaction array vTx [ ] holding the transaction original data in the block header and the block body are sequentially written into the disk file blk000 x. The MerklerBTree built in the block body is stored in a k-v database. Simultaneously generating metadata of the Block index storage block for searching the block; the TxIndex is generated to hold the memory location of the transaction.
And 4, searching data.
And inquiring the transaction according to the key of the data, wherein all the blocks are connected end to end, and the latest block is searched according to the key. In a single block, Merklerroot at the head of the block enters MerklerTree, target key and current node key Ckey are compared from Merklerroot, if key is less than or equal to Ckey, the left child is searched, otherwise, the right child is searched, the steps are sequentially executed until a leaf node is searched, and the complexity of query time in the block of N nodes is O (lgN). If the transaction is in the block, the transaction hash value Hash (Tx) is finally inquired at the leaf node, the index information TxIndex of the transaction on the disk can be inquired in the k-v database by taking the Hash (Tx) as k, and the transaction information TxIndex is finally read in the disk file according to the TxIndex. If the transaction is not in the current block, it is looked up in its predecessor block in the same way.
The data in the invention can not be tampered in the following aspects: (1) in the process of obtaining the hash (Tx) from the key, the root node is searched downwards according to the MerklerBTree, because the tree is hashed by the leaf nodes when being generated, the leaf nodes can be finally searched certainly, if a certain branch node is not inquired in the process, the data of the last branch node is tampered or the data of the next node is lost; (2) in the process of obtaining Tx from hash (Tx), since the hash (Tx) is obtained by the hash operation of Tx, whether the transaction Tx is tampered can be detected according to the hash (Tx).
And 5, verifying the data.
The lightweight node only stores the block head information, and the storage node stores the block head and the block body. When the lightweight node queries data, the storage node returns a query result and a verification path. When the lightweight node receives data, the data are hashed two by two from the query result, and finally verification MerkLeroot is generated, and whether the query result is valid is verified by comparing whether the MerkLeroot is consistent with MerkLeroot in a locally stored block header.
The method for indexing block chain data according to the present invention is further described below according to an embodiment. Table 1 is the data that the user wants to store into the blockchain database.
Table 1 data to be stored in blockchain database
| Key
|
Name
|
Sex
|
Account
|
| 1600005
|
Bob
|
Male
|
7000
|
| 1600004
|
Alice
|
Female
|
6000
|
| 1600007
|
Tom
|
Male
|
5000
|
| 1600006
|
Jack
|
Male
|
8000 |
In this embodiment, the indexing process of data in the MerkleRBTree is mainly demonstrated, so that control information such as transaction Prehash, PubKey, ScriptSig and the like is omitted. Assuming that the current block is empty, the four transactions in table 1 come in sequence, and the storage node packs the transactions into the block in sequence.
When the node receives the 1 st transaction, which is empty, according to the case 1 in step 2-3 of this embodiment, first, the transaction is subjected to a double Sha256 hash operation to obtain a hash value of 747d51d1007b4fb1be01aeb633e78f66 cbdeee 393aa 200e73627498b2551df (only the first 8 bits are shown for convenience of writing), then a branch node is generated, the transaction hash value is used as its left hash, the right hash is 0, the node key is 1600005, and a MerkleRBTree index entry is generated as shown in table 2, where k is hash (v).
TABLE 2 MerklerbTree index entry generated at transaction 1
When the node is inserted into the 2 nd transaction again, the MerkleRoot node key is 1600005, the transaction key to be inserted is 1600004, which is smaller than the root node key, and should be inserted into the left subtree, and because the MerkleRoot is the last level branch node on the left side (there is no table entry where k is 747d51d1 in the index entry table), the case is consistent with the case 2 in step 2-3, the operation step is: firstly, carrying out Sha256 double-hash operation on transaction 2 to obtain a transaction hash value dca 0e32, newly generating a branch node, wherein the left hash is dca 0e32, the right hash is father node left hash 747d51d1, and the keyword is 1600004; and then updating the father node upwards, wherein the left hash of the father node is the hash of the new component branch node, the father node k is correspondingly changed, and the MerklerBTree index entry after updating is shown in the table 3.
Table 3 inserts MerklerBTree index entry after 2 nd transaction
When a node is inserted into the 3 rd transaction, the MerkleRoot keyword is 1600005, the keyword of the node to be inserted is 1600007, and if the keyword is larger than the MerkleRoot keyword, the node to be inserted should be inserted into the right subtree of the node, and because the MerkleRoot is the last layer of branch node on the right side at the moment, the situation is consistent with the situation 3 in the step 2-3, the following steps are performed: and performing Sha256 double-hash operation on the transaction to obtain a transaction hash value b178577f, updating the Merklerroot right hash value b178577f, correspondingly changing k of the Merklerroot, and finally, enabling the MerklerRBTree index entry to be shown in Table 4.
Table 4 inserts the MerklerBTree index entry after the 3 rd transaction
Although there is no record of key 1600007 in the index table after inserting the transaction with key 1600007, at query time, 1600007 > 1600005, so its right hash b178577f is found, and b178577f is the transaction hash corresponding to key 1600007, i.e. the right hash of 1600005 records the record of key > 1600005 (1600007).
When the node inserts the 4 th transaction, the MerkleRoot key is 1600005, the key of the node to be inserted is 1600006, which is larger than the MerkleRoot key, the right subtree should be inserted, the MerkleRoot is the last layer of branch node on the right side, which is consistent with the case 4 in the step 2-3, and then the operation is: and carrying out hash operation on the transaction to obtain a transaction hash value of 228de82e, newly generating branch nodes, wherein the left hash of each node is 228de82e, the right hash of each node is b178577f, the key word of each node is 1600006, then updating the right hash of the parent node upwards to be the hash of the branch node, and finally updating the k value of the Merkleroot. The MerkleRBTree index entry after updating is shown in table 5.
Table 5 inserts the MerklerBTree index entry after the 4 th transaction
Assuming that the block size reaches a threshold (in practice thousands of transactions per block), it is ready for storage to disk. The Merklerroot value in the zone block header corresponds to d90f7e2e, the transaction array in the zone block is the transaction original data in the table 1, and the MerklerTree index entry is directly written into the k-v database. Firstly serializing transaction array data in a block header and a block body, sequentially adding the serialized transaction array data to a disk file blk0001.dat, returning a block with a disk offset of nFile 1 and a blockapos 8, then generating metadata TxIndex (nFile, blockapos, n, nextchah) of each transaction, wherein nFile designates the next file, blockapos designates the offset of the block in the file, n designates a subscript of the transaction corresponding to the ordered array, nextchah designates a transaction successor (for detecting that double flowers are omitted here), and finally storing the metadata information of each transaction in a k-v database, wherein the metadata is shown in table 6.
TABLE 6 metadata information for each transaction
| Txhash
|
nFile
|
BlockPos
| n
|
| dcaa0e32 |
|
|
1
|
8
|
0
|
| eaf1e7ec
|
1
|
8
|
1
|
| 228de82e
|
1
|
8
|
2
|
| b178577f
|
1
|
8
|
3 |
To facilitate the explanation of the problem, the hash values for each transaction are listed as follows:
TABLE 7 Hash values for various transactions
| TxHash
|
Key
|
Name
|
Sex
|
Account
|
| dcaa0e32
|
1600004
|
Alice
|
Female
|
6000
|
| eaf1e7ec
|
1600005
|
Bob
|
Male
|
7000
|
| 228de82e
|
1600006
|
Jack
|
Male
|
8000
|
| b178577f
|
1600007
|
Tom
|
Male
|
5000 |
When a transaction with a key of 160006 is queried, first, based on the MerkleRoot value d90f7e2e of the block header, MerkleRoot values (baa d30a, be149016, 1600005), 1600006 > 1600005 are retrieved from the k-v database, so k be149016 is retrieved from the k-v database, MerkleRBTree index values (228de82e, b178577f, 1600006), 1600006 be 1600006 are retrieved from the k-v database, so k be 228de82e is retrieved from the k-v database, transaction metadata (1, 8, 3) is obtained, and finally deserialization is performed in the disk file to obtain the transaction raw data (1600006, Jack, Male, 8000), while the path is verified to be the Hash value on the side that is not hit in the query, in the embodiment ((baa d a, 1600005 b), (368747, 178577f, baa) is returned to the Hash node baa, baa b 72, baa (baa, jack, Male, 8000), b178577f, 1600006), 1600005) is equal to the local MerkleRoot value to verify the authenticity of the query result.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.