Splitting Methods

They differ primarily in the techniques used to split an overflowing node during insertion. The goal is to minimize coverage and overlap. These goals are at times contradictory and thus heuristics are often used. The node splitting algorithms described below range from being quite simple (e.g., exhaustive search) to being fairly complicated (e.g., R*-tree [Beck90c]). Some just split the overflowing node, while others try to reinsert some of the objects and nodes from the overflowing nodes thereby striving for better overall behavior (e.g., reduction in coverage and overlap).
  1. Exhaustive Search: If the R-tree node overflows, then try all possible ways of splitting the node into two new nodes that satisfy the requirements on the minimal and maximal number of children that can be stored in a node. Choose the split which causes bounding boxes of the two new nodes to have the smallest area [Gutt84]. Thus the goal of this method is to reduce the coverage. This method is quite complex as it tries all possibilities.
  2. Quadratic method: Examine all the children of the overflowing node and find the pair of bounding boxes that would waste the most area were they to be inserted in the same node. This is determined by subtracting the sum of the areas of the two bounding boxes from the area of the covering bounding box. These two bounding boxes are placed in separate nodes, say j and k . The set of remaining bounding boxes are examined and the bounding box i whose addition maximizes the difference in coverage between the bounding boxes associated with j and k is added to the node whose coverage is minimized by the addition. This process is reapplied to the remaining bounding boxes [Gutt84]. This method takes quadratic time.
  3. Linear Method: Find the two bounding boxes with the greatest normalized separation along both axes, and split along this axis. The remaining bounding boxes in the node are assigned to the nodes whose covering bounding box is increased the least by the addition [Gutt84]. This method takes linear time.
  4. R*-tree: The R*-tree [Beck90c] is a name given to a variant of the R-tree which makes use of the most complex of the node splitting algorithms. The algorithm differs from the other algorithms as it attempts to reduce both overlap and coverage. In particular, the primary focus is on reducing overlap with ties broken by favoring the splits that reduce the coverage by using the splits that minimize the perimeter of the bounding boxes of the resulting nodes. In addition, when a node a overflows, instead of immediately splitting a , an attempt is made first to see if some of the objects in a could possibly be more suited to being in another node. This is achieved by reinserting a fraction (30% has been found to yield good performance [Beck90c]) of these objects in the tree (termed forced reinsertion). The node is only split if it has been found to overflow after reinsertion has taken place. This method is quite complex.
  5. Ang/Tan Method: Form four sets, one for each face (i.e., side) of the overflowing node a . Associate each bounding box o in a with the closest face of a in each of the two dimensions. Once the four sets representing four partitions have been constructed (i.e., each bounding box o has been associated with two sets), select the partition that ensures the most even distribution of bounding boxes. In case of a tie, choose the partition with the least overlap. In case of another tie, choose the partition with the least coverage. This method takes linear time [Ang97].
  6. Hilbert Nonpacked Method: Order the objects on the basis of the Peano-Hilbert number (e.g., [Same90a]) corresponding to their centroid [Kame94]. The objects are stored in a B+-tree.
  7. Morton Nonpacked Method: Order the objects on the basis of the Morton number (e.g., [Same90a]) corresponding to their centroid [Whit82]. This number is obtained by interleaving the x and y coordinate values of the centroid. The objects are stored in a B+-tree.
  8. Packed Method: Order the objects on the basis of some criterion [Rous85]. It has two stages. In our implementation, the first stage orders the centroids of the objects in Peano-Hilbert order. The second stage fills the leaf nodes by examining the objects in increasing Peano-Hilbert order of their centroids (obtained by the first stage). Each leaf node is filled with the first unprocessed object and its M-1 nearest neighbors (in the space in which the objects lie) which have not yet been inserted in other leaf nodes. Once an entire level of the packed R-tree has been obtained, the algorithm is reapplied to add nodes at the next level using the same nearest neighbor criterion, terminating when a level contains just one node. The only difference between the ordering that is applied at the levels containing the nonleaf nodes from that used at the level of the leaf nodes is that in the former case we are ordering the bounding boxes while in the latter case we are ordering the actual objects.
  9. Hilbert Packed Method: Order the objects on the basis of the Peano-Hilbert number corresponding to their centroid [Kame93]. Once this ordering has been obtained, the leaf nodes are filled to capacity by examining the objects in increasing order. The nodes at the remaining levels are ordered according to the time at which they were created.
  10. Morton Packed R-tree: Order the objects on the basis of the Morton number corresponding to their centroid (e.g., [Kame93]). This number is obtained by interleaving the x and y coordinate values of the centroid. Once this ordering has been obtained, the leaf nodes are filled to capacity by examining the objects in increasing order. The nodes at the remaining levels are ordered according to the time at which they were created.