## Effective search under a subtree

Almost every blogging system supports assigning articles into categories and subcategories. Listing articles which belong to a subtree of categories may be quite expensive operation especially if the number of articles is very high. In this article I would like to describe one of possible solutions.

### Category Tree

Tree of article categories could look something like:

- Music

- Rock
- Pop
- Jazz
- Sport

- Snowboarding
- Skiing

- Cross-country skiing
- Downhill skiing
- Kiteboarding
- Traveling

- Adventure
- Leisure

The most simple way of assigning an article to a category would be to give each category an id (e.g. Music=1, Rock=2, Pop=3, Jazz=4, Sport=5,…) and add categoryId column into Article database table. Then if we wanted to get all articles from category Rock we could do it simply by running something like:

`select * from Article where categoryId=2`

But getting all articles from all subtree under Music category will be more complicated, we have to make list of category ids from all Music subtree and query something like:

`select * from Article where categoryId in (1,2,3,4)`

Imagine much larger category tree, the number of categories in a subtree may be high. We wanted something more efficient and still we would like to be able to add new categories into a tree.

### Limited tree

In our project the efficiency was really top priority, therefore we were able to accept slight limitations to our category tree which could give us better performance:

- Maximum tree depth is defined once and then constant. Let's name this
parameter
`maxLevels`

. - Maximum number of children a parent may have is defined once and then
constant. This parameter can be called
`maxChildren`

.

### Limited tree evaluation

Position of every tree node can be described by path to it from the tree root. Look at following tree:

- Music {1}

- Rock {1,1}
- Pop {1,2}
- Jazz {1,3}
- Sport {2}

- Snowboarding {2,1}
- Skiing {2,2}

- Cross-country skiing {2,2,1}
- Downhill skiing {2,2,2}
- Kiteboarding {2,3}
- Traveling {3}

- Adventure {3,1}
- Leisure {3,2}

Path can have maximally `maxLevels`

items and each item in path is
in range from 1 to `maxChildren`

. For simplification we filled paths
shorter then `maxLevels`

by 0s. So for example if ```
maxLevels =
6
```

then path to Pop node is `{1,2,0,0,0,0}`

. Then we can
express a path as `int`

array of size `maxLevels`

.

Now for each node we can calculate a single integral value using this
equation (the operator `^`

means power):

value(path) = path[0]*(maxChildren+1)^(maxLevels-1) + path[1]*(maxChildren+1)^(maxLevels-2) + path[2]*(maxChildren+1)^(maxLevels-3) + ... + path[maxLevels-1]*(maxChildren+1)

Does this equation resemble something to you? Yes you have seen it before, it is equation for conversion a number from numeric system with base (maxLevels+1) into decimal system.

Into Article table we add new column `categoryValue`

into which we
will put the calculated value based on a category an article belongs to.

### Example

Now why we did all this? Let me show you on example. The tree we will use
looks the same as above. At first we have to define `maxLevels`

and
`maxChildren`

parameters. Let's define `maxLevels=4`

and
`maxChildren=7`

(When evaluating tree nodes we will therefore convert
number from octal system into decimal.)

We have written a new article and we want to assign it to category Skiing
(path is {2,2} which is equal to {2,2,0,0}). So we can calculate the
`categoryValue`

column value using equation defined above:

value(Skiing {2,2,0,0}) = 2*8^3 + 2*8^2 + 0*8^1 + 0*8^0 = 1024 + 128 = 1152

Then let's say we want to list all articles from category Sport {2} and below. It means everything between Sport {2} inclusive and Traveling {3} (which is next sibling to Sport) exclusive. So we calculate lower and upper limits:

lower = value(Sport {2,0,0,0}) = 2*8^3 + 0*8^2 + 0*8^1 + 0*8^0 = 1024 upper = value(Traveling {3,0,0,0}) = 3*8^3 + 0*8^2 + 0*8^1 + 0*8^0 = 1536

The sql to get all articles from under the Sport node is then:

select * from Article where categoryValue >= :lower and categoryValue < :upper

It looks more complicated then it really is. For testing we use
`maxChildren=9`

because then paths form direcly decimal numbers. For
example when `maxLevels=5`

then:

value(2,3,4) = value(2,3,4,0,0) = 23400

### Real life

Tree parameters are limited by type we store calculated value in. We used
`long`

java type which is 64 bits type. We set
`maxLevels=6`

and `maxChildren=10`

. It is sufficient for
our needs and very fast.