|
34 | 34 | import java.util.Iterator;
|
35 | 35 |
|
36 | 36 | /**
|
| 37 | + * Implementation of hash index which is based on <a href="http://en.wikipedia.org/wiki/Extendible_hashing">extendible hashing |
| 38 | + * algorithm</a>. The directory for extindible hashing is implemented in |
| 39 | + * {@link com.orientechnologies.orient.core.index.hashindex.local.OHashTableDirectory} class. Directory is not implemented according |
| 40 | + * to classic algorithm because of its big memory consumption in case of non-uniform data distribution instead it is implemented |
| 41 | + * according too "Multilevel Extendible Hashing Sven Helmer, Thomas Neumann, Guido Moerkotte April 17, 2002". Which has much less |
| 42 | + * memory consumption in case of nonuniform data distribution. |
| 43 | + * |
| 44 | + * Index itself uses so called "muiltilevel schema" when first level contains 256 buckets, when bucket is split it is put at the |
| 45 | + * end of other file which represents second level. So if data which are put has distribution close to uniform (this index was |
| 46 | + * designed to be use as rid index for DHT storage) buckets split will be preformed in append only manner to speed up index write |
| 47 | + * speed. |
| 48 | + * |
| 49 | + * So hash index bucket itself has following structure: |
| 50 | + * <ol> |
| 51 | + * <li>Bucket depth - 1 byte.</li> |
| 52 | + * <li>Bucket's size - amount of entities (key, value) in one bucket, 4 bytes</li> |
| 53 | + * <li>Page indexes of parents of this bucket, page indexes of buckets split of which created current bucket - 64*8 bytes.</li> |
| 54 | + * <li>Offsets of entities stored in this bucket relatively to it's beginning. It is array of int values of undefined size.</li> |
| 55 | + * <li>Entities itself</li> |
| 56 | + * </ol> |
| 57 | + * |
| 58 | + * So if 1-st and 2-nd fields are clear. We should discuss the last ones. |
| 59 | + * |
| 60 | + * |
| 61 | + * Entities in bucket are sorted by key's hash code so each entity has following storage format in bucket: key's hash code (8 |
| 62 | + * bytes), key, value. Because entities are stored in sorted order it means that every time when we insert new entity old ones |
| 63 | + * should be moved. |
| 64 | + * |
| 65 | + * There are 2 reasons why it is bad: |
| 66 | + * <ol> |
| 67 | + * <li>It will generate write ahead log of enormous size.</li> |
| 68 | + * <li>The more amount of memory is affected in operation the less speed we will have. In worst case 60 kb of memory should be |
| 69 | + * moved.</li> |
| 70 | + * </ol> |
| 71 | + * |
| 72 | + * To avoid disadvantages listed above entries ara appended to the end of bucket, but their offsets are stored at the beginning of |
| 73 | + * bucket. Offsets are stored in sorted order (ordered by hash code of entity's key) so we need to move only small amount of memory |
| 74 | + * to store entities in sorted order. |
| 75 | + * |
| 76 | + * About indexes of parents of current bucket. When item is removed from bucket we check space which is needed to store all entities |
| 77 | + * of this bucket, it's buddy bucket (bucket which was also created from parent bucket during split) and if space of single bucket |
| 78 | + * is enough to save all entities from both buckets we remove these buckets and put all content in parent bucket. That is why we |
| 79 | + * need indexes of parents
8000
of current bucket. |
| 80 | + * |
| 81 | + * Also hash index has special file of one page long which contains information about state of each level of buckets in index. This |
| 82 | + * information is stored as array index of which equals to file level. All array item has following structure: |
| 83 | + * <ol> |
| 84 | + * <li>Is level removed (in case all buckets are empty or level was not created yet) - 1 byte</li> |
| 85 | + * <li>File's level id - 8 bytes</li> |
| 86 | + * <li>Amount of buckets in given level - 8 bytes.</li> |
| 87 | + * <li>Index of page of first removed bucket (not splitted but removed) - 8 bytes</li> |
| 88 | + * </ol> |
| 89 | + * |
| 90 | + * |
37 | 91 | * @author Andrey Lomakin
|
38 | 92 | * @since 12.03.13
|
39 | 93 | */
|
|
0 commit comments