Michelle’s first program was the logistic map in C++ because she always loved fractals. She sort of gave up on using them in her daily work, until she was working with navigation systems. Here, she had to compress multiple dimensions into one. For example: what item or items in the data set is closest to this point?
Of course, this can be solved with brute force, by calculating all Euclidean distances, but that does not seem fun! An additional constraint often is that the data is ordered, so that usually involves a comparison function.
That comparing is often the compressing into one dimension. For example, if you want to traverse cells in a grid, in what order are you going to do that? In other words, we want
a function (x,y) -> z where z is the order in which we will traverse the cells.
A simple method, where we just traverse the cells by column:
does not work because 4 and 5 are strangely far apart, and also it does not scale consistently. We would prefer something like this which is called the Z order (or Morten)
On a larger grid, it looks like this, and there is still some strange behaviour, see the long jump in the middle. Furthermore, there are also things that are close but look far apart, for example the dot at the left arrow and the one above it.
So let’s try a different shape, the Gray codes.
(0,1) – – – (1,1)
We can generate that with super simple code: f(n) = return n $ (n >> 1)
The shape is great, but how to stitch them together? If we do it naively, we have the same problems as in the picture above. It would be better if we rotated. The code is not very easy because of the rotating, but the pattern looks very pretty and we don’t have nasty jumps anymore,
all stitch jumps are one ‘hop’ long.
Are these things fractals? Yes! Even though they’re not that pretty. Pretty is not a property of fractals. What is is:
* nowhere differentiable
* Hausdorff dimension > topological dimension.
As you do more and more iterations of a fractal, they fill up the space, so they can not be differentiated. This also explains why the third characteristic holds, because although there are
one-dimensional lines, they do fill up a two dimensional space.
They were invented a very long time ago by a few (didn’t get who) answering the question whether you can fill the unit space with a function. Hilbert came later and made the idea
famous, because his paper has a nice picture 🙂
So, okay they are fractals, but are they practical? Now we have done the mapping our two dimensional data into one number, we can use it to order data:
compare (a,b) = hilbert(a) – hilbert(b)
And there are applications beyond navigation, like image search, collision detection (are two things likely to be in the same location?), warehouse or data centre organization.
There are tools using this, like slurm (linux resource management) uses spatial locality when scheduling jobs.
What makes these types of data types so applicable is that they are all grid-like: dense data with multiple dimensions with similar cardinality. We have only looked at 2D examples,
but this also works in 3D.
So, when does this NOT work?
It does not work that well if your data is not square-ish” or if you have the curse of dimensionality: you have a lot of dimensions but sparse data.
This was such a cool talk, great storyline and nice examples!