## An Example

As an example, suppose we created a very popular on-line game, such as Tetris. Over the few months since the game was published we've gathered a list of achieved scores with the following structure:ID | PLAYER | DATE | SCORE |
---|---|---|---|

1 | Jose Rodriguez | 2016/10/01 14:55:02 | 154 |

2 | Nefart | 2016/12/15 03:12:21 | 12 |

3 | SUF4N | 2016/06/03 22:00:01 | 10280 |

4 | EtnaVolcano | 2016/12/24 18:56:15 | 1342 |

5 | Guido | 2016/09/09 07:15:00 | 280 |

... |

As our game has become quite popular, this list is very long. Let's say we've gathered a few millions of records.

## Binary Heap

This is a task most suited for a heap data-structure. There are many variants of heap, of which the most basic and commonly known is**binary**(or 2-regular)

**heap**.

According to Wikipedia:

Binary heapis acomplete binary treewhich satisfies theheap property- the key stored in each node is greater than or equal to the keys in the node's children, according to some total order function.

Acomplete binary treeis, put informally, a binary tree for which every level, except possibly the last, is completely filled, and all nodes in the last level are as far left as possible.

An Example Binary Heap |

From the heap property follows an important observation - the greatest element is always located in the root of the tree. Also, given heap is a complete binary tree, it has logarithmic height. Therefore the following convenient time complexities:

**Insert**O(log(N))**FindMax**O(1)**DeleteMax**O(log(N))

You can find a more detailed explanation and implementation details on the Internet.

## Min vs. Max Heap

The above description holds for a so-called**max-heap**. Additionally, there also is

a

**min-heap**, where the heap property is inverted and root node contains the minimum element.

It is useful to observe that a min-heap is just a max-heap with inverted ordering

function. Specifically:

x <fmax y <=> x >fmin yTherefore the distinction between max-heap and min-heap lies purely in what ordering function it takes. It is, however, a useful abstraction that simplifies communication and reasoning about heap behaviour.

## The Algorithm

So how can we use a heap to solve the problem at hand?A key observation here is that after i-th iteration in Step 2 the heap will containInput:A sequence of M elements, an appropriate ordering function and a positive number N. Supposing M >> N.Steps:1. Build a min-heap of size N by inserting the first N elements. 2. For each remaining element: a) Insert the element. b) Delete the current minimum. 3. While heap is non-empty: a) Find and collect minimum. b) Delete minimum. 4. Return collected elements, in reversed order.Output:A sequence of top N elements of the input sequence.

the top N elements of the N+i-th prefix of the input sequence. Therefore, at the end of step 2, after all items have been processed, it will contain the top N elements of the whole sequence.

Note that we've used a min-heap, even though we're interested in the N largest elements. This is an important trick that allows us to easily drop the smallest of the current N+1 largest elements in step 2b. Conversely, at the end in Step 4, we need to reverse the order of output, because the heap will give out the N largest elements sorted from the smallest.

## Time And Space Complexity

The total**time complexity**of the algorithm is

**O(M log(N))**, where M is the size of the input sequence and N is the size of the output sequence. Note that M >> N and so this bound is much better than for e.g. sorting the whole input sequence. Often N is very small, such as 10. In that case the time complexity is essentially

**O(M)**.

The heap can be implemented using an underlying array of size N. Therefore the space complexity of this algorithm is

**O(N)**, which is also very good.

## PriorityQueue

Now that the algorithm is clear, let's look at how we can implement it in Java. The standard library provides a binary heap implementation under a class named**PriorityQueue**.

It has the following interface:

public PriorityQueue(int initialCapacity, Comparator<E> comparator)Constructor. The initialCapacity determines the initial size of the underlying array. This is just a technical detail. PriorityQueue is an unbounded queue and the array will grow as needed. The comparator implements the ordering function. PriorityQueue is a min-heap - root note will contain the record which the comparator deems smallest. If you need a max-heap, just give it a reversed comparator (see java.util.Collections#reverseOrder(Comparator<T> cmp)).

public boolean add(E element)Insert.

public int size()Size

public E poll()Retrieve and delete min.

## The Implementation

Using PriorityQueue the algorithm can be expressed in a straight-forward way.public class TopN { public static <T> List<T> getTopN(Stream<T> entries, Comparator<T> comparator, int count) { PriorityQueue<T> queue = new PriorityQueue<>(count + 1, comparator); entries.forEach(entry -> { queue.add(entry); if (queue.size() > count) { queue.poll(); } }); List<T> result = new ArrayList<>(queue.size()); while (!queue.isEmpty()) { result.add(queue.poll()); } Collections.reverse(result); return result; } }

## Back to our Example

Now that we have the algorithm implemented we just need to provide the representation of a record and the ordering function for our specific use-case.@Value public class Score { int id; String player; Date date; int score; }The @Value annotation comes from Project Lombok and denotes a value-object. It is just some syntactic sugar that allows to make our class definitions clearer. The compiler will augment the AST - make all those fields private and final, generate a corresponding getter for each of them plus an all-arguments constructor.

The comparator will look as follows:

public class ScoreComparator implements ComparatorWith all the pieces in place we can now use them this way:{ @Override public int compare(Score first, Score second) { return Integer.compare(first.getScore(), second.getScore()); } }

Stream<Score> scores = ...; // obtain a stream of all scores List<Score> top10 = TopN.getTopN(scores, new ScoreComparator(), 10);This will give us the 10 greatest scores achieved in our game.

## Conclusion

In this article we've seen a brief overview of the heap data-structure and demonstrated, based on a use-case, one type of applications it is very useful for.You can find the complete code on GitHub under dusan-rychnovsky/top-n.

## 0 komentářů:

## Okomentovat