Tuesday, June 17, 2014

Heap Sort

The binary heap data structures is an array that can be viewed as a complete binary tree. Each node of the binary tree corresponds to an element of the array. The array is completely filled on all levels except possibly lowest.

We represent heaps in level order, going from left to right. The array corresponding to the heap above is [25, 13, 17, 5, 8, 3].

The root of the tree A[1] and given index i of a node, the indices of its parent, left child and right child can be computed
 
PARENT (i)
        return floor(i/2)
LEFT (i)
        return 2i
RIGHT (i)
        return 2i + 1


Let's try these out on a heap to make sure we believe they are correct. Take this heap,
which is represented by the array [20, 14, 17, 8, 6, 9, 4, 1].

We'll go from the 20 to the 6 first. The index of the 20 is 1. To find the index of the left child, we calculate 1 * 2 = 2. This takes us (correctly) to the 14. Now, we go right, so we calculate 2 * 1 + 1 = 3. This takes us (again, correctly) to the 17.

Now let's try going from the 4 to the 20. 4's index is 7. We want to go to the parent, so we calculate 7 / 2 = 3, which takes us to the 17. Now, to get 17's parent, we calculate 3 / 2 = 1, which takes us to the 20.

 

Heap Property

In a heap, for every node i other than the root, the value of a node is greater than or equal (at most) to the value of its parent.

       
A[PARENT (i)] ≥A[i]

Thus, the largest element in a heap is stored at the root.


Following is an example of Heap:

By the definition of a heap, all the tree levels are completely filled except possibly for the lowest level, which is filled from the left up to a point. Clearly a heap of height h has the minimum number of elements when it has just one node at the lowest level. The levels above the lowest level form a complete binary tree of height h -1 and 2h -1 nodes. Hence the minimum number of nodes possible in a heap of height h is 2h. Clearly a heap of height h, has the maximum number of elements when its lowest level is completely filled. In this case the heap is a complete binary tree of height h and hence has 2h+1 -1 nodes.
Following is not a heap, because it only has the heap property - it is not a complete binary tree. Recall that to be complete, a binary tree has to fill up all of its levels with the possible exception of the last one, which must be filled in from the left side.

 

Height of a node

We define the height of a node in a tree to be a number of edges on the longest simple downward path from a node to a leaf.

 

Height of a tree

The number of edges on a simple downward path from a root to a leaf. Note that the height of a tree with n node is lg n which is (lgn). This implies that an n-element heap has height  lg n
In order to show this let the height of the n-element heap be h. From the bounds obtained on maximum and minimum number of elements in a heap, we get
2hn ≤ 2h+1-1
Where n is the number of elements in a heap.
2hn ≤ 2h+1
Taking logarithms to the base 2
h  ≤  lgn  ≤  h +1
It follows that hlgn.

We known from above that largest element resides in root, A[1]. The natural question to ask is where in a heap might the smallest element resides? Consider any path from root of the tree to a leaf. Because of the heap property, as we follow that path, the elements are either decreasing or staying the same. If it happens to be the case that all elements in the heap are distinct, then the above implies that the smallest is in a leaf of the tree. It could also be that an entire subtree of the heap is the smallest element or indeed that there is only one element in the heap, which in the smallest element, so the smallest element is everywhere. Note that anything below the smallest element must equal the smallest element, so in general, only entire subtrees of the heap can contain the smallest element.

 

Inserting Element in the Heap


Suppose we have a heap as follows

Let's suppose we want to add a node with key 15 to the heap. First, we add the node to the tree at the next spot available at the lowest level of the tree. This is to ensure that the tree remains complete.

Let's suppose we want to add a node with key 15 to the heap. First, we add the node to the tree at the next spot available at the lowest level of the tree. This is to ensure that the tree remains complete.


Now we do the same thing again, comparing the new node to its parent. Since 14 < 15, we have to do another swap:


Now we are done, because 15   20.

Four basic procedures on heap are
  1. Heapify, which runs in O(lg n) time.
  2. Build-Heap, which runs in linear time.
  3. Heap Sort, which runs in O(n lg n) time.
  4. Extract-Max, which runs in O(lg n) time.

 

 

Maintaining the Heap Property

Heapify is a procedure for manipulating heap data structures. It is given an array A and index i into the array. The subtree rooted at the children of A[i] are heap but node A[i] itself may possibly violate the heap property i.e., A[i] < A[2i] or A[i] < A[2i +1]. The procedure 'Heapify' manipulates the tree rooted at A[i] so it becomes a heap. In other words, 'Heapify' is let the value at A[i] "float down" in a heap so that subtree rooted at index i becomes a heap.

Outline of Procedure Heapify

Heapify picks the largest child key and compare it to the parent key. If parent key is larger than heapify quits, otherwise it swaps the parent key with the largest child key. So that the parent is now becomes larger than its children.
It is important to note that swap may destroy the heap property of the subtree rooted at the largest child node. If this is the case, Heapify calls itself again using largest child node as the new root.

Heapify (A, i)
  1. l left [i]
  2. r right [i]
  3. if l ≤ heap-size [A] and A[l] > A[i]
  4.     then largest l
  5.     else largest i
  6. if r ≤ heap-size [A] and A[i] > A[largest]
  7.     then largest r
  8. if largest  ≠ i
  9.     then exchange A[i] A[largest]
  10.         Heapify (A, largest)

 


Analysis

If we put a value at root that is less than every value in the left and right subtree, then 'Heapify' will be called recursively until leaf is reached. To make recursive calls traverse the longest path to a leaf, choose value that make 'Heapify' always recurse on the left child. It follows the left branch when left child is greater than or equal to the right child, so putting 0 at the root and 1 at all other nodes, for example, will accomplished this task. With such values 'Heapify' will called h times, where h is the heap height so its running time will be θ(h) (since each call does (1) work), which is (lgn). Since we have a case in which Heapify's running time (lg n), its worst-case running time is Ω(lgn).

 

Example of Heapify
 

Suppose we have a complete binary tree somewhere whose subtrees are heaps. In the following complete binary tree, the subtrees of 6 are heaps:


The Heapify procedure alters the heap so that the tree rooted at 6's position is a heap. Here's how it works. First, we look at the root of our tree and its two children.


We then determine which of the three nodes is the greatest. If it is the root, we are done, because we have a heap. If not, we exchange the appropriate child with the root, and continue recursively down the tree. In this case, we exchange 6 and 8, and continue.


Now, 7 is greater than 6, so we exchange them.


We are at the bottom of the tree, and can't continue, so we terminate.
 

 

Building a Heap

We can use the procedure 'Heapify' in a bottom-up fashion to convert an array A[1 . . n] into a heap. Since the elements in the subarray A[n/2+1 . . n] are all leaves, the procedure BUILD_HEAP goes through the remaining nodes of the tree and runs 'Heapify' on each one. The bottom-up order of processing node guarantees that the subtree rooted at children are heap before 'Heapify' is run at their parent.

BUILD_HEAP (A)
  1. heap-size (A) length [A]
  2. For i ←  floor(length[A]/2) down to 1 do
  3.         Heapify (A, i)

We can build a heap from an unordered array in linear time.

 

Heap Sort Algorithm

The heap sort combines the best of both merge sort and insertion sort. Like merge sort, the worst case time of heap sort is O(n log n) and like insertion sort, heap sort sorts in-place. The heap sort algorithm starts by using procedure BUILD-HEAP to build a heap on the input array A[1 . . n]. Since the maximum element of the array stored at the root A[1], it can be put into its correct final position by exchanging it with A[n] (the last element in A). If we now discard node n from the heap than the remaining elements can be made into heap. Note that the new element at the root may violate the heap property. All that is needed to restore the heap property.

HEAPSORT (A)
  1. BUILD_HEAP (A)
  2. for i length (A) down to 2 do
    exchange A[1] A[i]
    heap-size [A] heap-size [A] - 1
    Heapify (A, 1)

The HEAPSORT procedure takes time O(n lg n), since the call to BUILD_HEAP takes time O(n) and each of the n -1 calls to Heapify takes time O(lg n).


Now we show that there are at most én/2h+1ù nodes of height h in any n-element heap. We need two observations to show this. The first is  that if we consider the set of nodes of height h, they have the property that the subtree rooted at these nodes are disjoint. In other words, we cannot have two nodes of height h with one being an ancestor of the other. The second property is that all subtrees are complete binary trees except for one subtree. Let Xh be the number of nodes of height h. Since Xh-1 o ft hese subtrees are full, they each contain exactly 2h+1 -1 nodes. One of the height h subtrees may not  full, but contain at least 1 node at its lower level and has at least 2h nodes. The exact count is 1+2+4+ . . . + 2h+1 + 1 = 2h. The remaining nodes have height strictly more than h. To connect all subtrees rooted at node of height h., there must be exactly Xh -1 such nodes. The total of nodes is at least
(Xh-1)(2h+1 + 1) + 2h + Xh-1 which is at most n.
Simplifying gives
Xhn/2h+1 + 1/2.

In the conclusion, it is a property of binary trees that the number of nodes at any level is half of the total number of nodes up to that level. The number of leaves in a binary heap is equal to n/2, where n is the total number of nodes in the tree, is even and n/2 when n is odd. If these leaves are removed, the number of new leaves will be lg(n/2/2 or n/4. If  this process is continued for h levels the number of leaves at that level will be n/2h+1.

 

Implementation

void heapSort(int numbers[], int array_size)
{
int i, temp;

for (i = (array_size / 2)-1; i >= 0; i--)
siftDown(numbers, i, array_size);

for (i = array_size-1; i >= 1; i--)
{
temp = numbers[0];
numbers[0] = numbers[i];
numbers[i] = temp;
siftDown(numbers, 0, i-1);
}
}


void siftDown(int numbers[], int root, int bottom)
{
int done, maxChild, temp;

done = 0;
while ((root*2 <= bottom) && (!done))
{
if (root*2 == bottom)
maxChild = root * 2;
else if (numbers[root * 2] > numbers[root * 2 + 1])
maxChild = root * 2;
else
maxChild = root * 2 + 1;

if (numbers[root] < numbers[maxChild])
{
temp = numbers[root];
numbers[root] = numbers[maxChild];
numbers[maxChild] = temp;
root = maxChild;
}
else
done = 1;
}
}

No comments:

Post a Comment