I recently had fun with an all-to-common issue with SQL driven websites: hierarchical data. For those who don’t like big words, think trees. Other people have already discussed storage methods, and I would actually highly suggest you read the writeup if you haven’t already.
While it is fairly straightforward to deal with, in our case we use HTML_QuickForm to handle our forms and are using QuickForm’s hierselect to select a category.
The issue starts showing its face in 2 distinct areas: (1) the client is not yet sure how deep they need their categories to go, and (2) the hierselect requires a very specific format of data to be passed in.
To start off we have our categories table:
CREATE TABLE `categories` (
`id` int(10) unsigned NOT NULL auto_increment,
`parent_id` int(10) unsigned default NULL,
`title` varchar(255) NOT NULL,
`active` tinyint(1) NOT NULL default '1',
PRIMARY KEY (`id`),
KEY `fk_categories_categories` (`parent_id`),
CONSTRAINT `fk_categories_categories` FOREIGN KEY (`parent_id`) REFERENCES `categories` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
To make life easier for myself I have my model auto generating the SQL statement to go n levels deep in printing all categories:
if( intval($depth) < 1 )
$depth = 1;
$select = array();
$from = array();
$where = array();
$order = array();
for( $i = 1; $i <= $depth; $i++ )
{
$select[] = "level" . $i . ".id AS level" . $i . "_id";
$from[] = "`categories` AS level" . $i . "";
$where[] = "( level" . $i . ".active = 1 OR level" . $i . ".active IS NULL )";
$order[] = "level" . $i . "_id";
}
//SELECT
$sql = "SELECT " . implode(', ', $select) . ", level" . $depth . ".title AS title " ;
//FROM
$sql .= "FROM " . $from[0] . " ";
unset($from[0]);
if( count($select) > 0 )
{
foreach( $from as $key => $value )
{
$from[$key] = $value . " ON level" . ($key) . ".id = level" . ($key+1) . ".parent_id";
}
$sql .= " RIGHT JOIN " . implode(" RIGHT JOIN ", $from) . " ";
}
//WHERE
$sql .= "WHERE level1.parent_id IS NULL AND " . implode(" AND ", $where) . " ";
//ORDER
$sql .= "ORDER BY " . implode(", ", $order);
$rs = $this->query($sql);
At a depth of 5 our SQL query generated becomes:
SELECT
level1.id AS level1_id,
level2.id AS level2_id,
level3.id AS level3_id,
level4.id AS level4_id,
level5.id AS level5_id,
level5.title AS title
FROM
`categories` AS level1
RIGHT JOIN `categories` AS level2 ON level1.id = level2.parent_id
RIGHT JOIN `categories` AS level3 ON level2.id = level3.parent_id
RIGHT JOIN `categories` AS level4 ON level3.id = level4.parent_id
RIGHT JOIN `categories` AS level5 ON level4.id = level5.parent_id
WHERE
level1.parent_id IS NULL
AND ( level1.active = 1 OR level1.active IS NULL )
AND ( level2.active = 1 OR level2.active IS NULL )
AND ( level3.active = 1 OR level3.active IS NULL )
AND ( level4.active = 1 OR level4.active IS NULL )
AND ( level5.active = 1 OR level5.active IS NULL )
ORDER BY
level1_id, level2_id, level3_id, level4_id, level5_id
The series of ( levelN.active = 1 OR levelN.active IS NULL ) guarantees that only entries with the active field turned on and that children of deactivated categories are not displayed. Fancy piece of SQL-fu if I do say so myself. The data is returned looking something like this:
| level1_id | level2_id | level3_id | level4_id | level5_id | title |
|---|---|---|---|---|---|
| 1 | Parent | ||||
| 2 | Parent 2 | 1 | 7 | Child 2 | |
| 2 | 4 | Child | |||
| 2 | 5 | Child 2 | |||
| 1 | 6 | 8 | Grandchild | ||
| 1 | 6 | 9 | Grandchild 2 | ||
| 1 | 6 | 8 | 10 | Grandchild | |
| 1 | 6 | 8 | 11 | Grandchild 2 | |
| 1 | 6 | 8 | 10 | 12 | Great Grandchild |
| 1 | 6 | 8 | 10 | 13 | Great Grandchild 2 |
Now by cleaning up the null values from the $category array we have our path:
$data = $rs->getRows();
$categories = array();
foreach( $data as $category )
{
$title = $category['title'];
unset($category['title']);
foreach( array_reverse($category,true) as $key => $value )
{
if( $value === null )
{
unset($category[$key]);
}
}
array_set($categories, array_merge(array(count($category)-1),array_values($category)), $title, "-- SELECT CATEGORY --");
}
return $categories;
You’ll notice that we also insert into our $categories array according to the hierselect format linked above. That in and of itself is an interesting function as, for example, our entry for Great Great Grandchild would look like $categories[4][1][6][8][10][12] = 'Great Great Grandchild'; and we need empty entries for each level.
array_set() has this code:
function array_set(array &$array, array $keys, $value, $emptyText)
{
$tmp =& $array;
$lastKey = array_pop($keys);
foreach( $keys as $key )
{
if( !is_array($tmp[$key]) )
{
$tmp[$key] = array();
}
$tmp =& $tmp[$key];
}
if( !isset($tmp['']) )
{
$tmp[''] = $emptyText;
}
$tmp[$lastKey] = $value;
}
And our final array output will look something like this:
Array
(
[0] => Array
(
[] => -- SELECT CATEGORY --
[1] => Parent
[2] => Parent 2
)
[1] => Array
(
[1] => Array
(
[] => -- SELECT CATEGORY --
[6] => Child
[7] => Child 2
)
[2] => Array
(
[] => -- SELECT CATEGORY --
[4] => Child
[5] => Child 2
)
)
[2] => Array
(
[1] => Array
(
[6] => Array
(
[] => -- SELECT CATEGORY --
[8] => Grandchild
[9] => Grandchild 2
)
)
)
[3] => Array
(
[1] => Array
(
[6] => Array
(
[8] => Array
(
[] => -- SELECT CATEGORY --
[10] => Grandchild
[11] => Grandchild 2
)
)
)
)
[4] => Array
(
[1] => Array
(
[6] => Array
(
[8] => Array
(
[10] => Array
(
[] => -- SELECT CATEGORY --
[12] => Great Grandchild
[13] => Great Grandchild 2
)
)
)
)
)
)
Good thing is all that needs to be changed to dig deeper into the hierarchy is the $depth variable from above.
This solution is very good for write intensive applications, so anyone building a forum-like system will find use out of this system.
However I’m thinking about moving to the Nested Set model as the application for this code will not be write-intensive…
Comments 11
Thanks for this tip!
Posted 02 Jun 2008 at 6:04 pm ¶What about the pre-ordered traversal tree? Google it, it might be usefull. It reduces the number of iterations and minimizes the complexity of the SQL query that you created (in my humble opinion). If i am not mistaken JOINS are operations that ‘cost’ when it comes to the RDBMs internals.
Posted 03 Jun 2008 at 3:34 am ¶andreas is right, MPTT provides significant performance advantages. Branches can be quickly retrieved with BETWEEN. The major challenge to MPTT is the time required to rebuild the as new items are added to it.
Posted 05 Jun 2008 at 2:17 pm ¶Agreed, unfortunately I’ve progressed a little to far into the system to turn back and so far aren’t seeing any performance hits, then again I’m caching built trees anyways.
Posted 05 Jun 2008 at 2:25 pm ¶http://dev.mysql.com/tech-resources/articles/hierarchical-data.html
Posted 05 Jun 2008 at 2:59 pm ¶http://www.sitepoint.com/article/hierarchical-data-database
Posted 05 Jun 2008 at 3:00 pm ¶Andreas: I have that linked above :D
I also just bought Joe Celko’s book on Hierarchical Data in SQL, been reading thought that :)
Posted 05 Jun 2008 at 11:26 pm ¶A nice opportunity to show the power of CakePHP, which has a standard behavior for tree data structures:
var $actsAs = array(‘Tree’);
This article may interest you as well (non-cake, don’t worry):
Posted 06 Jun 2008 at 2:39 am ¶http://kevin.vanzonneveld.net/techblog/article/convert_anything_to_tree_structures_in_php/
Showing an explode-like function that convert flat arrays into multidimensional tree’s based on a delimiter found in it’s keys.
If anyone is intrested i will be posting an article implementing MPTT on my website. I don’t want to publish the URL here since i don’t want to try and get traffic to my blog by using comments on other people blogs.
The implementation is done with PHP.
Another idea that came to me is:
Cache the tree structure results for a short period of time and avoid the DB costfull operations (when it comes to thousands of rows and you need complex queries). You can update your cache by implementing the Observer pattern on your Cache Manager Class.
And something that you might want to look up to is ‘memory caching’…where you can place chunks of data in memory (the fastest way to access them).
http://pecl.php.net/package/memcache
“Memcached is a caching daemon designed especially for
dynamic web applications to decrease database load by
storing objects in memory.
This extension allows you to work with memcached through
handy OO and procedural interfaces”
Also try googling the facebook caching mechanisms. They did a lot of work on that.
Posted 09 Jun 2008 at 6:22 am ¶(http://sizzo.org/wp/wp-content/uploads/2007/09/facebook_performance_caching.pdf)
I’m sorry but this is just a horrible way to do this, look up Nested Sets (aka. MPTT) instead.
After reading it a bit closer I realized that you need to change the DDL for the table when creating new levels, all I say is eeeewwww…
Posted 12 Jun 2008 at 3:51 am ¶Thanks for this. I was looking for it.
Posted 13 Nov 2008 at 2:04 pm ¶Trackbacks & Pingbacks 8
[...] Cousineau submitted a new blog post he’s come up with that looks at using hierarchical data in a MySQL database. I recently had [...]
[...] Hierarchical Data With PHP and MySQL (tags: php) [...]
Tower Of Power » Blog Archive » Hierarchical Data With PHP and MySQL…
Tower Of Power » Blog Archive » Hierarchical Data With PHP and MySQL…
[...] Перевод статьи Hierarchical Data With PHP and MySQL. [...]
[...] http://www.toosweettobesour.com/2008/06/02/hierarchical-data-with-php-and-mysql/ [...]
[...] I get back I’ve got some updates for the hierarchical data post, and should have a post ready on using rsync and phing for remote deployment. Category: Updates [...]
[...] http://www.toosweettobesour.com/2008/06/02/hierarchical-data-with-php-and-mysql/ [...]
[...] And not only basic categorization, n-deep hierarchical categorization. I’ve already discussed storage and retrieval of such data, but there comes a time when one needs to display this [...]
Post a Comment