ResearchGate
Review of Spatial Indexing Techniques for Large Urban Data Management
e 6205
ite So stents
oe sere
‘Aorertoloning is age masuleaed by Francs 09 6n28 Jar 2016Review of Spatial Indexing Techniques for Large
Urban Data Management
Suhaibah Azzi, Uznir Ujang, Frangois Anton, Darka Mioc and Alias Abdul
Rakman
Abstract Pressure on land development in urban areas causes progressive efforts
in spatial planning and management. The physical growth expansion of urban ar
ces to accommodate rural migration gives a massive impact to social, economic
and polities of major cities. Most of the models used in managing urban areas are
moving towards sustainable urban development which to fulfill current necessities
while preserving the resources for future generation. However in order to manage
large amounts of urban spatial data, an efficient spatial data constellation method is,
needed. With the ease of three dimensional (3D) spatial data usage in urban areas
as a new source of data input, practical spatial data indexing is necessary to im-
prove data retrieval and management, Current two dimensional (2D) spatial index-
ing approaches seem not applicable to the current and future spatial developments
‘Therefore, the objective of this paper is to review existing spatial data indexing ap-
proaches to managing large urban area datasets. Each approach will be reviewed and
discussed according to the current spatial data scenarios. In addition, a 3D spatial
data indexing method will be discussed as an alternative for organizing 3D spatial
data,
‘Subba Ave
SD GIS Research Lab, Universiti Teknologi Malaysia, Malaysia, e-mail: norsuhasbahe
Urnir Vang
SD GIS Research Lab, Universiti Teknologi Malaysia, Malaysia, e-mails mduzn ic@utm my
Frangois Anton
Natonal Space Intute, Technical University of Denmark, Denmark and Universit Teknologi
Malaysia, Malaysia, e-mail: Faspac
Darks Mioe
Natonal Space Intute, Technical University of Denmark, Denmatk, e-mail: ioc? space
Alias Abdul Rabman
SD GIS Research Le, Universiti Teknologi Malaysia, Malaysia e-mail: aliaa@utm.ny2 Suhaibab Az, Uanie Ujang, Frangois Anton, Darka Mice and Alias Abdul Rahman
1 Introduction
‘The urbanization phenomenon and the growth of mega-cities are widespread around
the world. The rapid process of urbanization and the growing number of mega-
cities cause a lot of economical, social and ecological issues and risks. Affected
by this phenomenon, planners or spatial professionals are challenged for making
policies and strategies in managing urban cities towards sustainable development.
‘The increasing prominence of urban growth is parallel with the increase of spatial
information, Spatial information has become indispensable for numerous aspects of
turban development and management. The increasing prominence of spatial data has
been due to recent developments in spatial data capturing such as remote sensing,
slobal positioning system, laser scanning and drones for aerial surveying.
Geographic Information Systems (GIS) offer a platform to store, manage, ana-
lyze and help decision taking using urban data such as buildings. road networks
surface and subsurface utilities, etc. However, the bottleneck of GIS is the manage-
‘ment of large amounts of spatial data, Some of the issues that arise are related to
software and hardware limitations in handling large spatial datasets. Furthermore,
spatial data come from various sources of varying sizes such as digital maps, satel=
lite images, topographic maps and aerial photographs. Moreover, for urban data
‘management, total data sizes could raise up to terabytes due to the large seale, area
and object density. In urban data, spatial objects such as buildings could be builtin
very dense areas with densities of up to thousands of buildings per square kilome-
ter, In addition, preserving the historical development of an urban arca will require
a temporal data management and thus, increase the disk storage size as described in
Ujang et al (2013).
This issue becomes more crucial when the trends of spatial data are evolving to
three-dimensional (3D) spatial data and more specific application frameworks are
moving towards using 3D data as a new data input Sharkawi et al (2008); Mohamad
Yusoff et al (2010). Nowadays, acquisition of 3D spatial data is much more con-
venient with various scales and resolution due to acquisition technology such as
‘TLS (Terrestrial Laser Scanning), LIDAR (Light/Laser Detection and Ranging) and
drone based surveying. For a practical application, these datasets need to be man-
aged and maintain within a 3D spatial database, Spatial query isa basic function in
GIS. In fat, spatial analysis itself originates from spatial query Kun etal (2005). For
large areas of urban data management, optimizing spatial queries is needed since the
large volume of data will affect the duration of data retrieval. In spatial databases,
spatial indexing is one of the key techniques in improving and optimizing spatial
query and analysis.
From this paper. user especially spatial professional will get a basic knowledge
‘on which methods is applicable and practical in managing large dataset for differ-
ent situations or cases especially for 2D and 3D dataset, There are a lot of spatial
indexing structures proposed for spatial query operation. However, not all are suited
to 3D application in urban data management, Sometimes the processing of 3D data
is based on the framework of 2D index structure Which lead to data retsieval inef-
ficiency as mentioned by Gu et al (2011). Thus, in this paper several methods willReview of Spatial Indexing Techniques for Lay
ban Data Management 3
be investigated to see the potential or applicability of each structure in handling »
huge amount of urban data. The effort towards 3D spatial index structure also will
be discussed in this paper including several extension of index structure based on
the existing R-Tree structure, From the review, some issues of the uncompleted part
towards true 3D index structure will be raised.
2 Spatial Indexing
Spatial indexing is a technique to optimize query processing in spatial databases
by organizing records in memory space. Query performance is increased especially
when dealing with many rows in a table, Number of rows will be reduced once the
appropriate indexes are created and after that indexes will be executed for query
processing. In traditional database system, data is ordered (or sorted) for efficient
searching by using the ordering approach such as B-Tree. But this approach is lim-
ited to one dimensional data such as text, numbers, strings and etc. For higher di-
‘mensional spatial data, memory space is utilized for organizing and sorting records
in the database, According to Rigaux et al (2002), there are two main groups of
index structures for spatial databases, which ate space-driven structures and data-
driven structures. In the next section, we will discuss and illustate spatial index
structures based on these two main groups.
2.1 Space Driven Structures
In space driven structure, objects in 2D space ate partitioned into rectangular cells.
Objects are mapped according to geomewic criterion. The simplest space driven
structure, fixed grid was proposed by Bentley and Friedman (1979). The fixed grid
structure divides the space to fixed number of cell with an equal size. Each cell rep-
resents a block of secondary storage. This means every point in the cell is stored
in the same block of secondary storage, The structure of fixed grid also could be
{implemented by using an array or hashing function. From time to time the structure
of fixed grid has been extended to overcome some limitation of original fixed grid
‘The original fixed grid is limited to non-uniformly distributed data which lead to
overflowed cell that being used by others. In this section several space driven struc-
tures will be successively presented such as grid file which is an extension of fixed
arid file, Quadiree, k-d Tree and Space Filling Curve of Hilberts Curve,
Grid Fite
Grid file is initially designed from fixed grid for indexing points Nievergelt et al
(1984). In the fixed grid, the space is decomposed into rectangular cells and the4 Suhaibah Azri, Uanie Ujang, Frangois Anton, Darks Mioe and Alias Abdul Rahman
resulting grid is an n, x, array of equal size cells. Each one of the cells « is asso-
ciated with a disk page, Objects or points mapped in cel care sequentially stored in
the page associated with
‘The same method applied in a grid file, Every cell eis associated with one disk
page. The difference is when a cell overflows, it will split into two cells. From that
split, points are consigned based on the cell they fall into. This causes the size of the
partition and cells to adapt to the point distribution. The construction of grid file is
best described using Figure 1
—______
y
Fig. 1 Grid fle for point distribution
Grid files are not limited only to point indexing. They could be utilized for rect-
angle indexing. In this case, rectangles are mapped by assigning the rectangles to
the cell that overlaps the rectangle with the condition of contains, intersect or in
contained in it. Grid files have also been extended in different ways as reported by
Gaede Gaede et al (1998) such as Extendible Cell or EXCELL Taraminen (1982),
the Two-Level Grid File Hinrichs (1985), and the Twin Grid File Hutflesz et al
(1988)
“The grid file structure seems to meet the requirement in terms of time complex-
fay and space complexity. However, there are several problems that remain, Object
duplication of neighbor cells will increase the index size and removing duplicates
is expensive when the result size is large. For large datasets, especially large scaleReview of Spatial Indexing Techniques for Large Urban Data Management s
urban datasets, the management of this data becomes complex and this will directly
degrade the query response time
Quadtree
‘The quadiuee is a data structure used to subdivide the 2D plane into four quad-
rants, The quadrants are named North West (NW), North East (NE), South West
(SW) and South East (SE). In the Quadtree structure (Figure 2), the search space is
recursively decomposed into quadrants until the number of rectangles overlapping,
each quadrant is less than the page capacity. The Quadiree could be utilized for data
representation such as point, area, line and curve.
Fig. 2 Quadicce strata aken from Gomes etal (2009)
A lot of applications used Quadtrees for detecting collisions, view frustum
culling of terrain data, fractal image analysis and more, In previous research,6 Suhaiba Az, Uanie Ujang, Frangois Anton, Darka Mioe and Alias Abdul Rahman
(Quadtrees were applied to binary images Klinger and Dyer (1976), computation of
‘geometric properties Ranade and Shneier (1981), edge enhancement Ranade (1981)
and distance transformation Samet (1982). Based on this, several scholars Samet
et al (1984), Peuquet (1984), Mark and Lauzon (1985) and Mark (1986) have pro-
posed the use of Quadiee in GIS.
Recent work by Zhang et al (2011), has adopted the Quadiree data structure
for large scale geospatial data with a new design of a Quadtree structure termed
as Bitplane Quadtrce (BQ-Tree). 16-bits of the NASA MODIS image were tested
in Zhang et al (2011) with 22,658 x 15,586 cells in 190 milliseconds, which is
1.86 billion cells per second, While handling raster data seems promising for Bit-
planeQuadiree, a limitation on geometry and topology is reported in Mark (1986);
Tobler and Chen (1986). These papers show that the Quadtree has been experi-
‘mented for large areas by partitioning the space into a set of mutually exclusive and
collectively exhaustive patches or frames. The concept of the quadrant is applied
within the patches of space which is very easy to accommodate with GIS systems.
However, from this experiment, the Quadtree does encounter a problem when it
‘was applied on large portions or Earths surface, This problem is due to the problem
of tessellated squares on a sphere surface, Thus to overcome this problem, Mark
(1986)suggested for continuous or path-connected area of surface the modification
concept of Quadiree is necessary. The same problem is also reported by Tobler and
Chen (1986). In their experiment of they were using Quaduee for global storage
information. From the report, itis mentioned that if one attempts to adapt this tech-
nique on ellipsoidal surface, there will be some complications on the data storage.
This is due to the facet edges which are curves on the sphere is not coincident with
the latitude-longitude lines and sub-facet edges are not even great circles
kd Tree
According to Bentley (1975), the k-d Tree stores k-dimensional points in subdi-
vided rectangular parallelepipeds. Bach rectangular parallelepiped corresponds to
one node of the k-d Tree. The whole region of interest is the root ofthe tree. The
internal nodes are subdivided by rectangles orthogonal to one of the d axes of co-
ordinates (for each level i of the k-d-tree, the axis of coordinates orthogonal to the
dividing rectangle is the axis of coordinate x) into two rectangular parallelepipeds.
‘The points of the original rectangular parallelepiped that have the x; coordinate
lower than or equal to the x coordinate of the dividing rectangle are represented
by the left subtree, while the others are represented by the right subtree (see Figure
3). For each node in the k-d Tree, it will consist of records and two pointers. Pointers
are either null or point to other nodes, The insertion of a new point will result into
the splitting of an existing node into two new nodes, Range searching starts from
the root, and then checks whether the stored node overlaps with the right or the left
region of the subtree. For the overlapping search region, the procedute is repeated
until the leaf levelban Data Management 7
Fig. 3 ked Tee by Gomes etal (2009),
kd Trees are efficient in performing nearest neighbor searches such as list of
points at distance lower than D from point X ot find the nearest points of P from
X. When executing such a query, the corresponding leaf node of point X will be
detected by traversing the k-d-tree from the root until the leaf, The point stored in
the leaf node will be processed and nearby leaf will be scanned. When the distance
from X to the leaf is higher than the point found so far, it is the time to stop seazch-
ing. Algorithm of k-d Tree is the most widely used algorithm for nearest neighbor
search Friedman et al (1977). However, the search implementation is ineffective and
degrading when space dimensionality is increased Oosterom (1999). Sample et al
(2001) and Nene and Nayar (1997), Another problem of k-d Tree mentioned by
Beerentzen et al (2002) is tha its structure depends on the order of inserted points
In the case of k points require k levels; a searching will took a linear time to com=
plete the process. A solution to this problem is by balancing the ree structure using
adaptive k-d Tree. Adaptive k-d Tyee splits each set of points in the whole region of8 Suhaiba Azi, Uanie Ujang, Frangois Anton, Darka Mice and Alias Abdul Rahman
interest into two equal regions, By having this division, it is ensure that the adaptive
k-d Tree isa balance tree (see Figure 4) However, the balancing enforcement by the
adaptive k-d Tree took the flexibility of the structure to insert or delete points. This
is due tothe prohibitions of standard balancing strategies based on rotation, The r0-
tations could not be used because it would destoy the rule ata given depth of tee
However, most of the recent extension of k-d Tree such as generalized k-d Tree has
loosens the rule. Therefore, insertion or point deletions in adaptive generalized k-d
‘Tree are allowed and in the same time satisfy the balancing property. The algorithm
fof k-d Tree has been improved from time to time to suit applications such as retriev-
ing images from a medical image collection Robinson et al (1996) and mobile robot
navigation Li etal (2008),
>
Se Foe DCAH
Fig. An Adaptive ke Tre by Barentzen etal (2002)
Space filling curve
A space filling curve is a continuous function with domain the unit interval (0, 1] and
a range that could be lying in a nD topological, uniform or Euclidean space. Space
filling curves allow a proximity problem to be reduced from higher-dimensional
spaces to the one-dimensional real line. Such a problem usually involves a query
search and sorting in one-dimensional space. Many applications can benefit from
space filling curves, but a common application of space filling curves is the stor-
‘age and retrieval of multidimensional data in the database. Proximity problems for
which space filling curves have been used regularly, are approximate nearest neigh-
bor search or finding closest points from one location. Until now, there are alot ofReview of Spatial Indexing Techniques for Large Urban Data Management 9
variants of space filling curves such as the Hilbert Curve, Dragon Curve, Gosper
Curve, Moore Curve, Z-Order curve and many more.
‘The space filling curve was discovered by Giuseppe Peano in 1890 and called
the Peanos Curve Peano (1890). From his finding, a curve must pass through ev-
ery point on a plane at least once. But, before his finding, in 1878, George Can-
tor has demonstrated a one-to-one correspondence between unit interval and unit
square Thiele and Wos (2002). In 1879, Netto added that any such mapping could
not be continuous Bocarra (2007). In 1891, David Hilbert discovered a Hilbert
Curve structure which isa simpler variant of the same idea of Peanos Kevin (2008)
‘The Hilbert Curve subdivides squares into smaller squares instead of Peanos, nine
smaller squares. Figure 5 shows four iterations of Peanos Curve,
Fig. § First four iterations of Peano Curve
When David Hilbert discovered a variation of Peanos Curve, he divided each side
of the unit square into two parts equally. This yields the square into four smaller
squares, From the divided squares, each of them is divided into another four smaller
squares, and so on, For each part ofthis division, a curve will raverse all the squares.
Hilberts Curve normally starts at the lower left division and ends at the lower right
division. According to its movement and process, the Hilbert Curve could be ex-
pressed in base 2.
‘The Hilbert Space Filling Curve could be described as an underlying grid, an
NN x N artay of cells where NV = 2". Hilberts enumeration of the squares is shown in
Figure 6 for W = 1,2,3. Note that the first square is always at the lower left comer
and the last square is always in the lower right comer.
Fig. 6 The fst hee sages in generating Hilber Curve10 Suhaibab Azi, Uanie Ujang, Frangois Anton, Darks Mioe and Alias Abdul Rahman
From the divided square, we assume that the Nx N cells coordinates start at
(0,0) in the lower left comer. From the assumptions, the Hilbert Curve has comers
at integers ranging from 0 to 2" — 1 in both x and y, And if we took a positive
direction along the curve the location from (x,y) = (0,0) 10 (2 ~ 1, 0)
Ina lage data management Hilberts Curve responded positively on the perfor-
‘mance.Castro and Burns (2007) have experimented with the Hilberts Curve with the
visualization of a large datasets. From their finding, it indicates that Hilberts curve
produces acceptable quality of mapping and in the same time, achieving faster vi-
sualization especially for online visualization of very large data sets. Apart from
Hilberts Curve performance, its limitation should be considered in developing ap-
plications. For an instance Anders (2009) experimented a data vector with Iength of
18M to demonstrate the use of Hilberts Curve. In this experiment, the data vector
plotted with its whole length condensed to width of plot. The obvious drawback of
Hilberts Curve in this experiment is that itis rather hard to relate a position on the
plot back to a position of the vector. This limits applicability when absolute positions
are of interest but is not an issue if one is interested in judging relative positions,
ive, spacing, homogeneity, ec. Another drawbacks of Hilberts Curve mentioned by
Chung et al (2007); Kamata et al (1996) is the representation of Hilberts Curve
Based on its representation some homogeneous segments may be rather long. In
their experiments, seven bits are needed for encoding the length one single segment.
2.1 Data-Driven Structures
Data Driven structures is a structure that based on partitioning the set of objects and
thus adapt to the distribution of these objects. In this section R-Tree data structure
has been chosen as an example of data driven structures. The R-Tree structure itself
is adapted to the rectangle distribution inthe plane which is a concept of data driven
structures
RTree
The R-Tree is a balanced tee found by Guttman (1984), The structure of R-Tree
is applicable to arbitrary spatial objects by aggregating minimum bounding regions
{rnbr) and organizes the aggregates ina wee structure. Each node in the R-Tre sruc-
ture corresponds to a disk page. Every leaf node contains an array of leaf entuies.
‘And every leaf entry must bea pic (by, object) mbr sof the forma (a1T7- 4%),
where d isthe dimension of the search space. Non-leaf nodes contain an array of
node enties. According to Guttman (1984) R-Tree structures must satisfy the fol-
lowing properties:
‘© Every leaf node in the tee contains between m and M index records, where m €
(0,M/2) except for the rootReview of Spatial Indexing Techniques for Large Urban Data Management u
‘© Each index record in a leaf node contain entries of the form (I, tuple-identiier),
where Tis the smallest bounding reclangular parallelepiped of the n-dimensional
data object
‘© Non-leaf nodes contain entries of the form (f, child-pointer), where I is a n-
dimensional rectangular parallelepiped. Every non-leaf node has between m and
‘M children unless itis the root,
© For each leaf entry (1 child-pointer), [is the minimal bounding box or smallest
rectangle of object stored in the child node.
‘© The root node has at Ieast two children unless itis a leaf,
‘© Allleaves appear on at most two different levels, That means the tree is balanced.
Fig. 7 R-Tice rpretentaton from Manolopouls a 2006)
Figure 7 illustrates the R-Tree structure. The three minimum bounding rectangles
of the root node Ra, Rb and Re are represented by thick lines. Traditionally, R-Tree
performance has been evaluated for range query such as NN (Nearest Neighbor)
queries (Roussopoulos et al, 1995) and join query Brinkhoff etal (1993). From the
eport of Papadopoulos and Manolopoulos (1997)), the R-Tree structure shows the
efficiency in answering Nearest Neighbor queries. In real world applications, NN
queries are applied in various fields such as pattern recognition, clustering analy
sis, content based image retrieval and more. Unfortunately, just like other indexing
structures, R-Trees have their own limitations due to their poor scaling characteris
tic, According to Manolopoulos et al (2006), the original R-Tree has two important
disadvantages:
1. For a point query, the R-Tree structure may execute an investigation from root to
leaf level which leads to performance degradation especially when MBRs over-
lap.
2, Large rectangles (MBR) may increase the overlap area which leads to perfor-
‘mance deterioration during query execution,
‘These limitations or disadvantages were addressed i research works towards R-
‘Tree variants, such as R' Tree by Sells etal (1987), R*Tree by Beckmann et al2 ‘Subaibah Azz, Unie Ujang, Frangois Anton, Darka Mioe and Alias Abdul Rahman
(1990), Hilbert-R Tree by Kamel and Faloutsos (1994) and R k-d Tree by Anand-
hhakumar et al (2010). Every variant is altempting (o tackle some limitation of the
original R-Tree. For instance, the R~ Tree variant by Sells et al (1987) is dedicated
to increase search performance for point query by reducing the number of disk ac-
cess and attempts to have zero overlapping. This variation of R-Tree has been dis-
cussed by Hwang et al (2003), Manolopoulos et al (2006) and Balasubramanian
and Sugumaran (2012). Several other R-Tree variants Were introduced such as the
Ri Tree, R*Tree, Packed R-Tree, Hilbert R-Tree, R*Q Tree and Historical R-Tree,
Lakshmi et al, 2012 classified the structure of R-Tree variance in three categories:
process change, hybridization and extension. The classification is shown in Figure
8
BS ea
Fig. 8 ReTree Variants by Balasubramanian and Sugumaran 2012)
‘The R-Tree structure has been widely used in spatial database management sys
ems due (o its efficiency in organizing spatial objects of arbitrary dimension, Be-
sides that, using R-Tree index will save both disk space and time to build and main-
tain the index. Bottom-up build performance also will be improved. This is due to
reducing memory and temporary database space usage. Commercial RDBMSs such
as Oracle and Informix, also adopting an R-Tree structure in their spatial database
according to Ravada and Sharma (1999); Kothuti et al (2002). Due to this reputa-
tion, the R-Tree structure and its variants attracted a lot of attention from researchersReview of Spatial Indexing Techniques for Large Urban Data Management B
and commercial database corapanies. The simplicity of the structure and its ability
to handle spatial objects efficiently are (wo tempting reasons for incorporating the
structure of R-Tree in a spatial database. In modern computing intensive applica
tions, an efficient method for handling spatial data is of great importance Ravada
and Sharma (1999)
3 3D Spatial Indexing
‘The state-of-the-art of 3D spatial indexing evolved when 3D data are widely used in
various fields and applications. For example, spatial professionals can simulate the
noise, heat and pollution spreading in virtual cities; architects can use a 3D dataset
to design buildings and visualize the sceneries with real textures and wility man-
agement companies use 3D spatial databases for the integration of above ground,
underground, indoor and outdoor objects. In order to retrieve those data efficiently,
a 3D spatial index is required in managing, retrieving and optimizing records in
spatial databases. The need of 3D spatial index for 3D data management has been
‘mentioned in Wang et al (2010); Zhu et al (2007); Deren et al (2004); Arens et al
(2005); Zlatanova (2000); Korfter et al (2000).
As we discussed in the previous section, there are many different indexes in han-
dling 2D spatial objects such as k-d Tree, quadtree and grid file. However, apart from
the k-d tree, they are not suitable to 3D applications. Thus, most of the research on.
83D spatial index is done just by extending the 2D spatial index such as R-Tree to
3D R-Tree and Quadiree to Octree, The transition of 2D case to the 3D case is not
straightforward, We could not assume objects in 2D are the same as in 3D space.
For example, the geometry of objects in 2D is different when it transforms into 3D.
Because of that, the relationships between objects in 2D sometimes aze different in
3D. When it comes to spatial indexing, 3D data that was processed based on the
expansion of 2D spatial index may affect the data retrieval during queries Gu et al
(2011), As we know, a query is very significant to the efficient management of the
database especially for large datasets of large scale arcas.
For example, in a commercial database management system such as Oracle, 3D
R-Tree is implemented when dealing with 3D data Murray (2009); Ravada and
Sharma (1999). The concept of 3D R-Tree is not much different from the origi-
nal R-Tree, What makes it capable of handling 3D objects is its extension to the
third dimension of the MBR to the MBV (Minimum Bounding Volume). In the 3D
ReTree, the geometry of candidates that satisfy a query window will be identified
through the volume box (MBV), or in other words, a 3D rectangular parallelepiped.
Figure 9 shows a MBV in a 3D R-Tree. Just like the original R-Tree, the tee shape
is autonomously optimized during data insertion and the leaf nodes are balanced to
make sure the performance is stable. However, when the R-Tree is extended into
3D space, the MBVs of sibling nodes tend to frequently overlap, and MBVs among.
nodes can even contain each other. Overlapping of MBVs is the main reason for the
low efficiency of query performance due to multi-path queties. Therefore, using 3D4 ‘Suhaibah Az, Umie Ujang, Frangois Anton, Daska Mioe and Alias Abdul Rahman
ReTrees for 3D applications is not successfully promising. It needs to be optimized
for modified according to features of 3D space. Several research works such as Zhu
et al (2007); Jun and Shengnan (2011); Gong (2011) try to enhance the 3D R-Tree
in order to minimize the overlapping among siblings.
3 cope
¢ tae
Fig. 9. MBV in R-Tree Suuctue in Gong eta (2009)
Due to critical overlap of sibling nodes and uneven size of nodes, Zhu etal (2007)
attempts to minimize the overlap and optimize the clustering algorithm by introduc
ing k-means clustering algorithm to put forward an improved 3D R-Tree. From his
experiments, the performance of 3D R-Tree is increased by using an improved al-
gorithm where the overlap is minimized drastically while balancing the volume of
nodes. Figure 10 shows & comparison of the original algorithm of 3D R-Tree and
this improved algorithm, Whilst, Jun Jun and Shengnan (2011) suggested splitting
inegular 3D objects into several small parts in order to improve the overlapping.
node issue. In 2009, Gong et al (2009) reported two sub-algorithms of R-Tree that
make the 3D R-Tree to be well-shaped, The two sub-algorithms are namely node-
choosing and node-split procedures. Node-choosing is dedicated to finding a leat
node for new inserted objects and node-split is responsible for spitting overflow
nodes into smaller ones. Using his algorithm, fewer R-Tree nodes are accessed in
‘memory, which increases the query performance. Experimental results of Gongs
work are shown in Figure 11. From the experiment, the improved algorithm is far
better over the original R-Tree and R*-Tree
Although improving static 3D R-Trees increases the efficiency in query perfor-
‘mance, there are several factors that could not be ignored and must be considered
in the implementation of spatial indexes, 3D R-Trees for different applications will
result in different performances. For example, applications with large amounts of
data still face low retrieval efficiency even wien improved 3D R-Tree is appliedReview of Spatial Indexing Techniques for Large Urban Data Management Is
Fig. 10 Comparison of 3D Ree methods (a) Orginal algorithm of 3 R-Tree and (b) Improved
algorithm in Zu el (2007)
Comparison of 3D R-tree
2
400
200
cecal Ae
fi-teae a eer
lrr-tzee m4 43 dist
lostinsl tees) 6 86
Fig. I Improved 3D R-Tee algorithm over Orginal or Cassie R-Teee and R*-Tree by Gong eal
m9)
According to Gu etal (2011), using a single index structure for 3D spatial data is
not an ideal method, Thus, the combination of several index structure is proposed
by many scholars (see Wang et al 2010); Gu etal (2011); Song etal (2006); He
et al (2009)). A combination of two or more index stractue is also known as hybrid
indexing. By combining each other limitations and strengths, index structures in by-
brid indexing would push the limits of data retrioval ficiency. Some examples of
hybrid indexing of 3D objects are ORSI Gu etal (2011) and LOD.SKDR Tree He
etal (2009), ORSI Gu etal (2011) is built based on an Octree and a R-Tree index
structures. The principle of ORS:
1. The area of interest is divided by Octree. Then the first level index is established
by setting up a threshold for subspace k, where k is the number of objects in
sub-space S. When the number of objects in subspace S is greater than k, Octtee
division is repeated,6 ‘Subaibah Az, Unie Ujang, Prangois Anton, Darka Mioe and Alias Abdul Rahman
2, Secondary index is build based on R-tree, Each subspace is pointed by one
pointer, If there is no R-Tree in the subspace, the pointer is empty. In an R-Tree
rode, adjacent or close spatial objects are gathered to improve query efficiency.
wm fanc[ 1 [|r |
count | Lever | cr,B, (CPyeMBBy
count | Lever [or,sime, | 07.5, OP MB,
Fig. 12 ORSI scucture by Guet al (2011)
Figure 12 shows the ORSI index structure Gu et al (2011). Based on the oper-
ational performance of ORSI on R-Tree, average access on inserting, deleting and
retrieving indicates that the performance of ORSI is overall better than classic R-
‘Trees. Figure 13 shows the comparison of ORSI and classic R-Trees.
et
Fig. 18 Comparison of ORS and R-Tree in Gu etal (2011)
As mentioned in previous paragraph, different structures may be used for dif-
ferent applications. For the applications such as underground utility management,Review of Spatial Indexing Techniques for Large Urban Data Management "7
geological analysis and other underground space entity object may need a different
index structure, Any current index structure could not effectively deal with large
scale complicated geological data, The index structure of multi-level spatial index
or BCSR-Tree in He etal (2009) and 3D RR-Tree in Wang et al (2010) is dedicated
to handling the retrieval of 3D underground objects. For large scale underground
data, Liu etal (2010) proposed LOD-SKDR Tce structure for management of large
scale database, The LOD-SKDR Tyee is a combination of SKD-Tree, R-Tree and
information of object’s LOD (Level of Detail). According to Liu et al (2010) the
LOD-SKDR Tree utilizes the object’s number as a bifurcation bound target to im-
prove the balance of SKD-Tree. While in the meantime, R-Tree and texture schedul-
ing of SKD-Tree content are restricted to reduce time on insertion or deletion. A.
LOD object model is also used to improve real time rendering in the 3D scene.
Figure 14 shows a complicated 3D underground scenaio - utility pipeline in urban
Fig. 14 Complicated Underground Scenes: Underground Pipeline in Urban Area
Tn managing large raw datasets acquired using LADAR, spatial indexing plays an
important ole by managing points in spatial databases. With the rapid development
of LiDAR acquisition, a variety of spatial data will continue to be accumulated
The aim of database management in manipulating those data is by having efficient
high-quality rendering of point clouds. Many scholars and researchers attempt to
investigate suitable indexing data structures for handling these huge massive point
clouuls. The hybrid indexing approach on LIDAR data proposed in Liu et al (2008)
is specialized for point cloud visualization. In their work, Liu etal. 2008 integrate
a ked Tiee and an Octtee for a 3,200,079 point cloud on display time. Wang and
Guo (2012) has presented the QMBB spatial index which has both a 2D and 3D
index strctures. QMBB Tee represents Quadiuee segmentation as 2D index based
on.a 2D range image. Each one ofthe corresponding point image of MBR is then
calculated as a MBV in 3D space. Based on the reslt, the rendering speed is up
‘© 10 milion points per second. There are more research developments of hybrid
indexes for point clouls such as Gong (2011); Koo and Shin (2005); Sehiin etal8 Suhaiba Azi, Uanie Ujang, Frangois Anton, Darka Mice and Alias Abdul Rahman
(2009, 2013). These researches proved the capability of hybrid indexing in efficient
data management for point clouds.
4 Conclusion
In this paper, we presented several index structure based on space driven structure
and data driven structure, We briefly described their structure and their implements-
tion along with large volume data application such as, large urban data management
and point clouds data management. Limitation and advantages of each structure are
also revealed. Based on the several methods of 2D and 3D spatial indexing, itis
clearly shown that spatial indexing is capable in handling a large amount of dataset
especially for urban data management, However, the structuse of each spatial index-
ing is unique and only suits for certain situation or case, For an instance, continuous
area of surface is not suitable for tessellated square based index structure such as
‘Quadtrce, Among all the data structure R-Tree is the most widely used spatial index:
structure in commercial database which serving many applications and multidimen-
sional data. In this paper, several extension of R-Tree index structure based on 3D.
hhas been discussed in the last section of this paper. It could be seen that most of
the structute are developed and tested for point dataset and lack of the example and
effort for volume based data, Based on the discussions and some example given, it
indicates that there is more space for 3D spatial index to be improved especially in
the aspect of complex and large data management, A survey on spatial index struc
ture in this paper may give some view and idea on future development of spatial
indexing for large data management.
Acknowledgements Major funding for this esearch was provided by the Ministry of Higher Zé
eatin Malaysia and patally funded by the Land Surveyors Boaed of Malaysia. This research
vas done while te third author was a Visting Pull Professor atthe Faculty of Geoinformation and
eal Estate at Universiti Teknologi Malaysia for which the thd authors very thank
References
Anandhakumar P, Priyadarshini J, Monisha C, Sugirtha K, Raghavan $ (2010)
Location based hybrid indexing structure - r k-d tree, In; 2010 First Interna-
tional Conference on Integrated Intelligent Computing (ICHC), pp 140-145, DOL
10.1109iciie 2010.56
Anders $ (2009) Visualization of genomic data with the hilbert curve. Bioin-
formatics 25(10):1231-1235, DOI 10.1093/bioinformatics/onp152, URL
nformat ics .oxfordjournals.org/content/25/
cract, http! //bioinformatics .oxfordjournals
content /25/10/1231.full.paf htmReview of Spatial Indexing Techniques for Large Urban Data Management 9
Arens C, Stoter J, van Oosterom P (2005) Modelling 34 spatial ob-
jects in a geo-dbms using a 3D primitive. Computers & Geosciences
31Q}:165-177, DOI _hitp:/dx.doi.org/10.1016f.cageo.2004.05.013, URL
hee: / /awe cience/article/p:
5008830040400192x
Balasubramanian L, Sugumaran M (2012) Article: A state-of-art in R-Tree variants
{or spatial indexing. Intemational Journal of Computer Applications 42(20):35—
41, published by Foundation of Computer Science, New York, USA.
Beerentzen J, Gravesen J, Anton F, Aanszs H (2012) Spatial Data Indexing and Point
Location, In: Guide to Computational Geometry Processing, Springer, London,
pp 213-225, DOT 10,1007/978-1-4471-4075-7.12, URL»:
org/10.1007/978-1-4471-4075-7_12
Beckmann N, Kriegel HP, Schneider R, Seeger B (1990) The R*-Tree: an efficient
‘and robust access method for points and rectangles. SIGMOD Ree 19(2):322-331
Bentley JL (1975) Multidimensional binary search trees used for associative search-
ing. Commun ACM 18(9):509-517
Bentley JL, Friedman JH (1979) Data structures for range searching. ACM Comput
397-409, DOT 10,1145/356789.356797, URL http: //doi acm.
45/356789. 35678
s//dx.doi
sms. In: Essentials of Mathematica, vol 1,
ger New York, pp 407-416, DOI 10.1007/978-0-387-49514-9.23
Brinkoif T, Kriegel HP. Sceger B (1993) Efficient processing of spatial joins using
trees, SIGMOD Rec 22(2):237-246
Castro J, Burns $ (2007) Online Data Visualization of Multidimensional Databases
‘Using the Hilbert SpaceFilling Curve, Lecture Notes in Computer Science, vol
4370, Springer Berlin Heidelberg, chap 9, pp 92-109
Chung KL, Huang YL, Liu YW (2007) Efficient algorithms for coding hilbert
‘curve of arbitrary-sized image and application to window query. Informa-
tion Sciences 177(10):2130 ~ 2151, DOT http://dx doi.org/10.1016/.ins.2006.12.
003, URL http: //nww.scien e/article/
pii/s0020025506003732
Deren L, Qing Z. Qiang L, Peng X (2004) From 2d and 34 gis for eybercty. Geo-
spatial Information Science 7(1):1-5
Friedman JH, Bentley JL, Finkel RA (1977) An algorithm for finding best matches
in logarithmic expected time. ACM Trans Math Softw 3(3):209-226, DOT 10.
1145/385744.355745, URL http: //doi.acm.org/10.1145/355744
355745
Gaede V, G Oliver (1998) Multidimensional access methods. ACM Comput Surv
30(2):170-231
Gomes ALP, Voiculescu, Irina, Jorge, Joaquim, Wyvill, Brian, Galbraith, Cal-
Jum (2009) 2 - Spatial Data Structures. In: Implicit Curves and Surfaces
Mathematics, Data Structures and Algorithms, Springer, London, pp 41 —
62, DOT 10.1007/978-1-84882-406-S\.2, URL hetp://ax.doi .org/
1007/978-1-84882-406-5_2
Jirect.com/sei20 Suhaibab Azi, Uanie Ujang, Frangois Anton, Darka Mioe and Alias Abdul Rahman
Gong J, Ke 8, Li X, Qi $ (2009) A hybrid 3D spatial access method based on
quadirees and R-rees for globe data. In: International Symposium on Spatial
Analysis, Spatial-Temporal Data Modeling, and Data Mining, DOT 10,1117/12.
837594, URL http: //dx-doi.org/10.1117/12. 837594
Gong J, Zhu Q, Zhang H, Li X, Zhou D (2011) An adaptive control method of
lods for 3D scene based on ree index. Acta Geodaetica et Cartographica Sinica
(Assue 4):Page 531-534
Gu W, Wang J, Shi H, Liu ¥ (2011) Research on a hybrid spatial index structure
Journal of Computational Information Systems 7(11}:3972-3978
Guttman A (1984) R-trees: a dynamic index structure for spatial searching. SIG-
MOD Ree 14(2):47-57
He Z, Liu G, Weng Z, Wa C (2009) Three-dimensional spatial indexing method of
‘complicated geological scene. In: 17th International Conference on Geoinformat-
ies, 2009, IBEE, 1, pp 1-4
Hinrichs K (1985) Implementation of the grid fle: Design concepts and experience.
BIT Numerical Mathematics 25(4):569-$92
Hutflesz. A, Six HW, Widmayer P (1988) Twin grid files: space optimizing ac-
‘cess schemes. SIGMOD Ree 17(3):183-190, DOT 10.1145/971701,50222, URL
http://doi .2em.org/10.1145/971 701 50222
Hwang $, Kwon K, Cha S, Lee B (2003) Performance Evaluation of Main-Memory
Retree Variants, Lecture Notes in Computer Science, vol 2750, Springer Berlin
Heidelberg, chap 2, pp 10-27
Jun G, Shengnan K (2011) 3D spatial query implementation method based on tree.
In: International Conference on Remote Sensing. Environment and Transporta-
tion Engineering (RSET), 2011, IEBE, 1, pp 2828-2831
Kamata S, Niimi M, Kawaguchi E (1996) A gray image compression using a hilbert
sean, In: Proceedings of the 13th International Conference on Pattern Recogni-
tion, 1996., 3, pp 905-909, DOT 10.1109/iepr.1996.547299
Kamel I, Faloutsos C (1994) Hilbert R-Tree: An improved R-Tree using fractals
Kevin B (2008) Organizing point sets. PAD thesis, Freie Universitat Berlin
Klinger A, Dyer CR (1976) Experiments on picture representation us-
ing regular decomposition. Computer Graphics and Image Processing
5(1):68-105, DOT http://dx doi org/10.1016/S0146-664X(76)80006-8, URL
jencedirect .com/science/article/pii/
148664X76800058
Koo YM, Shin BS (2005) An Efficient Point Rendering Using Octree and Texture
Lookup, Lecture Notes in Computer Science, vol 3482, Springer Berlin Heidel-
berg, chap 127, pp 1187-1196
Kothurt RKV, Ravada S, Abugoy D (2002) Quadttee and rtree indexes in oracle
spatial: a comparison using gis data
Kun Z, Xiu-guo L, Hui ¥ (2005) Research on Spatial Index Structure of LOD-OR
‘Tree in 3D GIS. Bulletin of Surveying and Mapping S(1):27-29
Li MH, Hong BR, Cai ZS, Piao SH, Huang QC (2008) Novel indoor mobile robot
navigation using monocular vision, Engineering Applications of Attficial In-
telligence 21(3):485 — 497, DOT hup:/idx.doi.org/10.1016/).engappai.2007.05.Review of Spatial Indexing Techniques for Large Urban Data Management FE
003, URL hetp://www.sciencedirect
pii/80852197607000607
Liv H, Huang 7, Zhan Q, Lin P (2008) A database approach to very large LIDAR
data management. In: International Society of Photogranometry and Remote Sens-
ing QSPRS), ISPRS Beijing XXXVIL463-468
Liu ¥, Liu G, He Z (2010) Spatial index technology for multi-scale and large scale
spatial data In: 18th International Conference on Geoinformatics, 2010, IEEE, 1.
ppt
Kofler M., Gervauty M. (2000) R-rees for organizing and visualizing
3d gis databases. The Journal of Visualization and Computer Anima-
tion 11G)128-143, DOT doi:10.1002/1099-1778(200007)11:3\&'\#060; 129:
‘AID-VIS227 \&e\#062,3.0.C0;2-T, URL http: / /wmw. ingentaconnect
con/ content / jus /vis/2000/00000011/00000003/art00227
Manolopoulos ¥, Nanopoulos A, Papadopoulos AN, Theodordis ¥ (2006) R-Tiees:
theory and applications. Springer
Mark DM (1986) The use of quactrees in geographic information systems and spa
tial data handling, Hardware, Data Capture and Management Techniques, Auto-
Carto London 1:517-526
Mark DM, Lauzon JP (1985) Approaches for quadtree-based geographic informa-
tion systems at continental or global scales. In: Proceedings of Auto-Carto, 7, pp
355-365
Mohamad Yusoff I, Uznie Ujang M, Abdul Rabman A (2010) 3D Volumetsic Soft
‘Geo-objects for Dynamic Urban Runoff Modeling, Developments in 3D Geo-
Information Sciences, vol 1, Springer Berlin Heidelberg, chap 18, pp 200-219
Murray € (2009) Oracle spatial developer's guide, L1g release 1 (11.1)
Nene SA, Nayar SK (1997) A simple algorithm for nearest neighbor scar in high
dimensions. Patter Analysis and Machine Intelligence, IEEE Transactions on
19(9):989-1008,
Nievergelt J, Hinterberger HY, Seveik K.C. (1984) The Grid File: An Adaptable,
Symmetiic Multikey File Structure. ACM Trans. Database Syst 9(1):38-71
DOI 10.1145/348.318586, URL htcp://do!.aem.org/10.1145/348.
318586
osterom PV (1999) Spatial access methods edited by longley, goodchild, maguire
en thind, john wileypages385-400 (vol. 1), 1999. Geographical information sys-
tems 1:385~400
Papadopoulos A, Manolopoulos Y (1997) Performance of nearest ncigh-
bor queties in ruees. In; Database Theory ICDT "97, Lecture Notes
in Computer Science, vol 1186, Springer Berlin Heidelberg, pp 34108,
DOL 10,1007/3-540-62222-5.59, URL http: //ax.dos .org/10.1007/
3-$40-62222-5_59
Peano G (1890) Sur une courbe, qui remplit toute une aire plane. Mathematische
Annalen 36(1):157-160
Peuguet DY (1984) Data structures fora knowledge-based geographic information
system. In: Proc. Spatial Data Handling2 Suhaiba Azri, Uanie Ujang, Frangois Anton, Darka Mioe and Alias Abdul Rahman
Ranade $ (1981) Use of quadtrees for edge enhancement. IEEE Transactions on
‘Systems Man and Cybernetics 11:370-373
Ranade S, Shneier M (1981) Using quadtrees to smooth images. TE
‘on Systems Man and Cybernetics 11:373-376
Ravada S, Sharma J (1999) Oracle8i Spatial: Experiences with Extensible
Databases, Lecture Notes in Computer Science, vol 1651, Springer Berlin Hei-
delberg, chap 21, pp 355-359
Rigaux P, Scholl M, Voisard A (2002) 6 - spatial access methods
In: Spatial Databases, Morgan Kaufmann, San Francisco, pp 201-
266, DOT _ hittp://dx doi.org/10.1016/B978- 155860588-6/50008-7, URL
http://www. sciencedi rect .com/science/article/p
p9781558605886500087
Robinson G, Tagare H, Duncan J, Jaffe C (1996) Medical image collection indexing:
‘Shape-based retrieval using kd-tres. Computerized Medical lmaging and Graph-
ies 20(4):209-217, DOL hittp://dx.doi.org/10.1016/S0895-6111(96)00014-6,
URL http://www. sciencedix Jarticle/pii/
80895511196000146
Roussopoulos N, Kelley S, Vincent E (1995) Nearest neighbor queries. SIGMOD
Ree 24(2):71-79
‘Samet HT (1982) Distance transform for images represented by quadtrees. Pattern
‘Analysis and Machine Intelligence, IEE Transactions on (3):298-303
Samet H, Rosenfeld A, Shaffer CA, Webber RE (1984) A geographic information
system using quadtrees. Pattern Recognition 17(6):647~656, DOI hitp://dx.doi
‘org/10,1016/0031-3203(84)90018-9, URL nctp: //uww. scleacedizect
com/science/article/pii/0031320384900185
‘Sample N, Haines M, Arnold M, Purcell T (2001) Optimizing search strategies in kd
trees. In: Fifth WSES/IEEE World Multiconference on Circuits, Systems, Com-
‘munications & Computers (CSCC 2001)
Schén B, Bertolotto M, Laefer DF, Morrish $ (2009) Storage, manipulation, and
visualization of lidar data, In: Proceedings of the 3rd ISPRS International Work-
shop 3D-ARCH 2009:" 3D Virtual Reconstruction and Visualization of Complex
Architectures”, International Society of Photogrammetry and Remote Sensing,
XXXVII-SAWL
‘Schon B, Mosa ASM, Laefer DF, Bertolotto M (2013) Octree-based indexing for 34
pointclouds within an oracle spatial dbms, Computers & Geosciences 51(0):430—
438
Sellis TK, Roussopoulos N,
‘ulti-dimensional objects
Sharkawi KH, Ujang MU, Abdul-Rahman A (2008) 3D navigation system for vis
tual reality based on 3D game engine. In: The International Archives of the Pho-
togrammetry, Remote Sensing and Spatial Information Sciences, 2008, ISPRS.
XXXVI, pp 513-518,
Song Xy, Zhow Xw, Wang Yh (2006) Research on spatial index structure of hybrid
twee in 3D GIS. Journal of Shenyang Jianzhu University (Natural Seience) 3:030
‘Transactions
con/scie!
loutsos C (1987) The R+-Tree: A dynamic index forReview of Spatial Indexing Techniques for Large Urban Data Management 2
‘Tamminen M (1982) The extendible cell method for closest point problems. BIT
‘Numerical Mathematics 22(1):27-41
‘Thiele R, Wos L (2002) Hilber’s twenty-fourth problem, Journal of Automated
Reasoning 29(1):67-89
‘Tobler W, Chen Zt (1986) A quadtree for global information storage. Geographical
“Analysis 18(4):360-371
‘Ujang U, Abdul-Rahman A (2013) Teraporal Three-Dimensional Ontology for Geo-
graphical Information Science (GIS) - A Review: Journal of Geographic Informa-
tion System 5(3):314-323, DOT 10.4236/jgis.2013.53030, URL http://www.
scirp.org/ journal /PaperDown load. aspx?paperID=33280
Wang Y, Guo M (2012) An integrated spatial indexing of huge point im-
age model. ISPRS - International Archives of the Photogrammetry, Re-
‘mote Sensing and Spatial Information Sciences XXXIX-B4:397-401.
DOI 10.5194/sprsarchives-XXXIX-B4-397-2012, URL attp://www.
int~arch-photogranm-remote~sens-spatial-inf-scei.net/
XXXIX-B4/397/2012/
Wang Y, Sheng ¥, Zhou L, Guo F, Zhao L. (2010) An underground space object-
oriented three-dimensional hybrid spatial indexing method, In: 18th International
Conference on Geoinformatics, 2010, IEEE, 1, pp I-S
‘Zhang J, You S, Gruenwald L (2011) Parallel quadtree coding of large-scale raster
geospatial data on gpepus, In: Proceedings of the 19th ACM SIGSPATIAL In-
ternational Conference on Advances in Geographic Information Systems, ACM,
New York, NY, USA, GIS "11, pp 457-460, DOT 10.1145/2093973,2094047,
URL ht