Heap (Priority Queue)
Table of Contents
Concept
Priority Queue
A priority queue is an abstract data type similar to a regular queue, except that each element in a priority queue has an associated priority.
It only supports comparable data because elements with high priority should be popped out before elements with low priority.
How to assign relative priorities to elements:
- Enqueued elements must be able to be ordered in some way, either from least to greatest or greatest to least.
- If two elements have the same priority, they are served in the same order in which they were enqueued. It means that the element enqueued first has higher priority than the same element enqueued after.
Given an array of numbers, there are usually two types of PQs:
- The Minimum PQ serves the smallest element for the highest priority.
- The Maximum PQ serves the greatest element for the highest priority.
- In C++,
<queue>
header provides a container adapter priority_queue.- Also, you can customize a comparator used in that class.
Popping all the elements out of the PQ in order produces a sequential array.
The PQ mainly provides three operations.
peek()
is to read the element with the highest priority.poll()
is to pop the element with the highest priority out of the PQ.add()
is to enqueue an element into the PQ.
The PQ can be implemented by Heap data structure, but this is not the only way. A linked list or an array can be used to build this abstract data type. However, it would not give the best time complexity.
Notice that PQ != Heap.
Heap
A heap is a tree-based data structure that satisfies the heap property (or heap invariant).
- In a min heap, a parent node should have a smaller key than the keys of its children. So, the root node has the smallest key in the min heap.
- It is called “the min heap propety”.
- In a max heap, a parent node should have a greater key than the keys of its children. So, the root node has the greatest key in the max heap.
- It is called “the max heap propety”.
If any parent node does not order with any of its children or it is not a tree, we can say “it has a violation of the heap property.”
A usual implementation of a heap is a complete binary tree in which
- All the leaf nodes should lean towards the left.
- The last leaf node might not have a right sibling (It might not be a full binary tree.)
Let’s see the difference between a complete binary tree and a full binary tree.
And look at the min and max heaps.
In terms of the Heap implementation, it is typically constructed in-place in the same array where the keys of nodes are stored.
As you can see, the element at position \(0\) represents the root node. And two children positions at \(2*i+1\) and \(2*i+2\), respectively.
Using a d-ary heap instead of a binary heap helps improve time complexity in heap operations. Then, the indices of children should be \(2*i+1\) to \(2*i+d\) where $d$ is the maximum number of children per node.
To keep the heap property, adding a new node requires one particular operation.
- A new node is added at the last in a heap at first.
- Bubble Up (or Swim Up) arises, which switches up the new node with its parent node if the parent has a greater key than its key in a min heap or a smaller key in a max heap.
- It repeats until the heap satisfies the heap property.
To keep the heap property, removing a node requires another particular operation.
- The node to be removed is first switched up with the last node in a heap.
- The node at the last position is removed.
- Bubble Down (or Sink Down) arises, which moves the switched node at the position to remove downward to the smallest child node in a min heap or the greatest child node in a max heap.
- It also occurs until the heap satisfies the heap property.
But, it might need Bubble Up operation if a blue node, as shown above, is smaller than its parent node in Step 3.
Let’s see the Step 4 in the following.
In a binary heap, adding or removing takes $O(log_2(n))$ time where $n$ is the number of nodes.
Quick Tip: Negation can be used to turn a min heap into a max heap (or vice versa).
- Considering the implementation of a comparator:
- Let $x$, $y$ be elements (that actually are numbers) in a min heap where $x$ comes before $y$ if $x \leq y$.
- The negation of this is that, if $x \geq y$, then $y$ comes before $x$.
- An inequation (> or <) can be flipped by the multiplication of -1.
- Flipping the result of comparing between two elements is to turn the min heap into the max heap.
- Another alternative method for numbers is to negate the numbers as we insert them into a heap. And negate them again when they are taken out. This has the same effect as negating the comparator.
- Remember that $x \leq y$ becomes $-x \geq -y$.
- Suppose
lex(x,y)
is a comparator for strings which sorts strings in lexicographic order (the default in most programming languages). Then letnlex(x,y)
be the negation oflex(x,y)
, and also let $s1$, $s2$ be strings.- It produces like the followings in a string comparison.
Result | Comparison |
---|---|
lex(s1, s2) = -1 | if s1 < s2 |
lex(s1, s2) = 0 | if s1 = s2 |
lex(s1, s2) = +1 | if s1 > s2 |
nlex(s1, s2) = -1 | if s1 < s2 |
nlex(s1, s2) = 0 | if s1 = s2 |
nlex(s1, s2) = +1 | if s1 > s2 |
Getting back to the priority queue with the binary heap implementation, three main operations of the PQ will actually work like below.
peek()
returns the root node which represents either the smallest element in a min heap or the greatest element in a max heap.add()
enqueues a new node at the last position and swims up that node until there is no violation of the heap property.poll()
removes and returns the root node.- The root node will be swapped with the last one and removed.
- Bubble down occurs until it satisfies the heap property.
When to use
- Used in certain implementations of Dijkstra’s Shortest Path algorithm.
- Used to dynamically fetch the “next best” or “next worst” element.
- Used in Huffman coding for lossless data compression.
- Used in Best First Search algorithms such as A* to continuously grab the “next most promising node”.
- Used by Minimum Spanning Tree (MST) algorithms on direct graphs.
Complexity
Operation | Time |
---|---|
Construct | $O(n)$ |
Peek | $O(1)$ |
Poll | $O(log(n))$ |
Remove | $O(log(n))$ |
Add | $O(log(n))$ |
(Naive) Remove | $O(n)$ |
(Advanced) Remove with help from a hash table | $O(log(n))$ |
(Naive) contain | $O(n)$ |
Contain with help of a hash table | $O(1)$ |
- Building a PQ takes linear time to set a heap as an array of length $n$.
- Polling and adding are required to satisfy the heap property by swimming up or sinking down.
- Removing a specific node takes logarithmic time (because of swimming up and sinking down),
but a hash table can help optimize it by taking up linear space. But it adds some overhead to the binary heap implementation.
- The inefficiency of the removal algorithm comes from the fact that we have to perform a linear search to find out where an element is indexed. A hash table provides a constant time lookup and update for a mapping from a key (the node value) to a value (the index).
- A Set or TreeSet of indices for the mapping can also be used for several same nodes.
Implementation
Interface
This is a PriorityQueue
class with the binary heap implementation.
Here, this interface stores objects in _heap[]
array and uses a dictionary as a hash table to immediately find an index at which each object positions. It is vital only to put hashable objects into the array because it treats objects as keys in a dictionary.
One notable feature of this hash table is its use of a set of indices as values. This design choice allows us to effectively track the same objects that may be present at different positions in the heap.
- It provides us with two auxiliary methods for the hash table:
_add_map()
and_remove_map()
.
The operation to maintain the heap property, known as Heapify, is achieved through the use of three private methods:
_swap()
exchanges the positions of two nodes in our heap (plus our hash table)._swim_up()
exchanges the node at a given index and its parent node until the given node no longer has higher priority._sink_down()
exchanges the node at a given index and one of its children until the given node no longer has lower priority.
Remind that this binary heap is an array where, when the index of a given node is $i$,
- The parent node has an index of $(i-1)\div2$.
- The left child node has an index of $(i*2)+1$.
- The right child node has an index of $(i*2)+2$.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
class PriorityQueue:
# ...
def _swap(self, i: int, j: int) -> None:
o1, o2 = self._heap[i], self._heap[j]
self._map[o1].remove(i)
self._map[o1].add(j)
self._map[o2].remove(j)
self._map[o2].add(i)
self._heap[i], self._heap[j] = o2, o1
def _swim_up(self, at: int) -> None:
parent = int((at - 1) / 2)
# _less() returns whether the left has higher priority than the right.
while at > 0 and self._less(self._heap[at], self._heap[parent]):
self._swap(parent, at)
at = parent
parent = int((at - 1) / 2)
def _sink_down(self, at: int) -> None:
while True:
obj = self._heap[at]
left_child_at = at*2 + 1
right_child_at = at*2 + 2
# _less() returns whether the left has higher priority than the right.
if left_child_at < self._size and self._less(self._heap[left_child_at], obj):
self._swap(left_child_at, at)
at = left_child_at
elif right_child_at < self._size and self._less(self._heap[right_child_at], obj):
self._swap(right_child_at, at)
at = right_child_at
else:
break
return None
# ...
In addition, there are three main operations:
peek()
returns the first object at the root position in our heap.
poll()
removes and returns the first object. It is sinking down the node that was at the last position
and was switched with the root node until there is no violation of the heap property.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
class Priority Queue:
# ...
def poll(self) -> object:
if self.is_empty():
raise Exception("PriorityQueue is empty.")
# Decrement the number of nodes.
self._size -= 1
# Get the root node.
root = self._heap[0]
# Swap the root and the last nodes.
self._swap(0, self._size)
# Remove the node at the last that is the root node.
self._remove_map(root, self._size)
self._heap.pop()
# Sink down the node at the first position.
if self._size > 0:
self._sink_down(0)
# Return the old root node.
return root
# ...
add()
appends a new object at the last position. Then, it is swimming up the last node
that is newly added until there is no violation of the heap property.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
class PriorityQueue:
# ...
def add(self, obj: object) -> None:
# Append it to the end of our heap.
self._heap.append(obj)
# Add it into our hash table (Its key happens to be the last index + 1).
self._add_map(obj, self._size)
# Swim up the node at the last position.
self._swim_up(self._size)
# Increment the number of nodes.
self._size += 1
# Do doubling the capacity if the size exceeds it.
if self._size > self._capacity:
self._capacity *= 2
# ...
The time to need both _swim_up()
and _sink_down()
is when to remove a particular node in the middle of the heap.
remove_at()
tries to sink down the swapped node with the particular node. But, if the swapped node has higher priority than its children,
it tries to swim up until there is no violation of the heap property.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
class PriorityQueue:
# ...
def remove_at(self, at: int) -> object:
if self.is_empty():
raise Exception('PriorityQueue is empty.')
# Decrement the number of nodes.
self._size -= 1
# Swap the given node with the last one.
self._swap(at, self._size)
# Remove the last node.
removed = self._heap[self._size]
self._remove_map(removed, self._size)
self._heap[self._size] = 0
# Early return if it removed the last one.
if at == self._size:
return removed
# Try sinking down or swimming up.
prev_last_obj = self._heap[at]
self._sink_down(at)
if self._heap[at] == prev_last_obj:
self._swim_up(at)
return removed
def remove(self, obj: object) -> object:
if self.is_empty():
raise Exception('PriorityQueue is empty.')
return self.remove_at(self._map[obj])
# ...
remove()
makes use of a hash table to find the index of a given object in constant time.
Otherwise, it must scan through all the nodes from the root node to the node we want in the heap,
which takes linear time.
PriorityQueue class
Python supports queue
module that contains a PriorityQueue class which works in the same way.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
from queue import PriorityQueue
# Min PQ
pq = PriorityQueue()
pq.put((1, 'value1'))
pq.put((2, 'value2'))
pq.put((2, 'value2'))
pq.put((3, 'value3'))
assert pq.get() == (1, 'value1')
assert pq.get() == (2, 'value2')
assert pq.get() == (2, 'value2')
assert pq.get() == (3, 'value3')
# Max PQ
pq = PriorityQueue()
for x in [1, 2, 2, 3]:
pq.put(-x)
assert [-pq.get() for _ in range(pq.qsize())] == [3, 2, 2, 1]
heapq module
Python provides heapq module for Heap operation.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
import heapq
h1 = [3, 5, 4, 7, 8, 2, 9]
heapq.heapify(h1)
assert h1 == [2, 5, 3, 7, 8, 4, 9]
assert heapq.heappop(h1) == 2
assert [heapq.heappop(h1) for _ in range(len(h1))] == [3, 4, 5, 7, 8, 9]
assert h1 == []
h2 = [11, 34, 22, 45, 8]
heapq.heapify(h2)
assert h2 == [8, 11, 22, 45, 34]
'''
8
11 22
45 34
'''
heapq.heappush(h2, 4)
assert h2 == [4, 11, 8, 45, 34, 22]
'''
4
11 8
45 34 22
'''
heapq.heappush(h2, 23)
assert h2 == [4, 11, 8, 45, 34, 22, 23]
'''
4
11 8
45 34 22 23
'''
assert heapq.heappushpop(h2, 6) == 4
assert h2 == [6, 11, 8, 45, 34, 22, 23]
'''
6
11 8
45 34 22 23
'''
assert heapq.nlargest(3, h2) == [45, 34, 23]
assert len(h2) == 7 # h2 does not change
assert heapq.nsmallest(3, h2) == [6, 8, 11]
assert len(h2) == 7 # h2 does not change as well
Application
K-th Max/Min Node
For an array of length $n$, if it is not sorted, we can find the K-th maximum/minimum element in $O(n*log(n))$ using a binary heap.
While iterating over the array, a max/min heap keeps K elements in descending/ascending order. In each iteration, if adding a new element into the heap exceeds the size K, pop the root node out of the heap.
- The root node represents the current maximum/minimum element.
This allows the maximum heap to contain K smallest elements we have seen. So, the root node becomes the K-th smallest element because any other nodes in this heap are smaller than the root node.
1
2
3
4
5
6
7
8
9
10
import heapq
# Same as `heapq.nsmallest(k, arr)[0]`
def find_kth_min_element(arr: list, k: int) -> int:
max_heap = []
for x in arr:
heapq.heappush(max_heap, -x)
if len(max_heap) > k:
heapq.heappop(max_heap)
return -heapq.heappop(max_heap)
Also, it allows the minimum heap to contain K largest elements we have seen. So, the root node becomes the K-th largest element because any other nodes in this heap are greater than the root node.
1
2
3
4
5
6
7
8
9
10
import heapq
# Same as `heapq.nlargest(k, arr)[0]`
def find_kth_max_element(arr: list, k: int) -> int:
min_heap = []
for x in arr:
heapq.heappush(min_heap, x)
if len(min_heap) > k:
heapq.heappop(min_heap)
return heapq.heappop(min_heap)
Next Most Promising Node
Depending on how to define when we pop out of a heap, we can find the next most promising element.
It looks like finding
- the next edge with the smallest weight we have not visited in a given graph.
- the next edge with the greatest weight we have not done yet.
- the next smallest character in a string
- the next earliest end time in an array of intervals
and so on.
Notice that we can make use of a max/min priority queue in dynamic programming problems as well.
- when to choose the smallest/greatest candidate among K solutions to subproblems.
Leetcode
- See my list