Box-counting dimension:

Ok i’ve been reading the yale.edu tutorials recently and read through the box-counting dimension.

Here’s the summary.

the basic idea is to cover a shape with boxes and find how the number of boxes(required to cover the shape) changes with the size of the boxes.

A slightly more technical(rigourous?) definition is:

d_{b} = lim(r->0) Log(N(r))/Log(1/r)

where

r -- size of the box (1,1/2,1/4 etc..)

k -- is some constant

Also note that while the original limit may be the actual definition and can be used for verifying a box-counting dimension, if you have no idea of a given curve/function's box-counting fractal dimension, you are better off using this relation.

Log(N(r)) = Log(k) + Log((1/r)^{d})

or

Log(N(r)) = Log(k) + d* Log(1/r)

Notice that this is a straight line equation.

So in practice, if you want to find out for a fractal box-counting dimension or suspect there's a fractal pattern hidden, you end up plotting log-log plots of different box sizes against the number of boxes required to fill up the curve.

I am yet figure out a lot more about a quantitative implementation details(planning to use numpy for the same.) of the same. Will update once am done.

But for teasers ,

If you think of implementing this in an algorithm to find fractal dimension of a unknown data set, it immediately becomes more complex than it sounds. To begin with:

1. The curve whose dimension needs to be calculated, becomes a floating point numpy array(of size equal to guessed(rounded off to the whole number))

2. The initial first box becomes a (line/square/cube/<equivalent>) with a size of the range of the curve arrays.

3. At each step the first box size is multiplied by 1/r^n, where r-- is the guessed dim() and n is the step count.

4. The challenge becomes how to figure out where and what position to place the boxes on for covering all(maximum?) points with minimum no. of boxes. I am thinking some clustering algorithm will be required for the same, but some quick look at existing code(from allen downey's complexity book, python code) suggests am over-complicating this. will try out the code and get back on this one.

[UPDATE:11-Feb-2013]: On little more thought the clustering algorithm has to be applied multiple times. Besides, there must be a set(percentage) of points, we should be willing to ignore, else the halting condition might become too long to be practical to implement the algorithm.

how do you figure out where to place the boxes, so that you use the minimum number of boxes to cover the maximum no of points of the curve?

This part sounds complex to me(am thinking we'll have to borrow some clustering algorithm from Machine Learning), but i may be mistaken.