2016-04-05 Probabilistic sorting

How can we quantify subjective information?
For example:

  • I look through my calendar, and think it could be interesting to have a graph of the quality of each day
  • For the posts in this blog, it could be interesting to sort them on different dimensions, such as technical level, which I like etc.

It could be interesting to make/have a computer program, that continually shows two specimens, where you just have to tap on the one you like the most, and then you would get an ordering of the entire specimen set.

This is easily possibly through running a standard sorting algorithm, with me as the comparison function 🙂 But sorting algorithms are not optimised for this kind of tasks.

The problem is different from traditional sorting in several ways:

  • The comparison function is very slow, and thus it should be heavily optimised for reducing the number of comparisons.
  • The comparison function is imprecise, – it is a subjective matter, and thus there is no guarantee the answers are consistent / “correct” all the time.
  • The sort should be interesting for the comparison function, ie. the choice of elements
  • The result should be incremental, so if the comparison function stops, there should be a somewhat ordering of the elements.

How could we measure such a sorting algorithm?

  1. have a comparison function, that has a built-in error rate.
  2. run the algorithm on a data set (list of random integers) and stop after a certain number of comparisons.
  3. sample the result, with random pair of elements, and probability that they are sorted, given their distance in the array.

Of course traditional sorting algorithms can be used here, but it also leads to a class of probabilistic algorithms, for example: have each element has a probability distribution for their position in the dataset, and then the challenge is to find the comparison that reduces the overall uncertainty the most.

This challenge is also related to a more general ranking problem, where you have repeated match/competition between pairs of elements, and the result determines the overall ranking.

This is a fun problem to think about / dive into 🙂