Hierarchical Data With PHP and MySQL

I recently had fun with an all-to-common issue with SQL driven websites: hierarchical data. For those who don’t like big words, think trees. Other people have already discussed storage methods, and I would actually highly suggest you read the writeup if you haven’t already.

While it is fairly straightforward to deal with, in our case we use HTML_QuickForm to handle our forms and are using QuickForm’s hierselect to select a category.

The issue starts showing its face in 2 distinct areas: (1) the client is not yet sure how deep they need their categories to go, and (2) the hierselect requires a very specific format of data to be passed in.

To start off we have our categories table:

CREATE TABLE  `categories` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `parent_id` int(10) unsigned default NULL,
  `title` varchar(255) NOT NULL,
  `active` tinyint(1) NOT NULL default '1',
  PRIMARY KEY  (`id`),
  KEY `fk_categories_categories` (`parent_id`),
  CONSTRAINT `fk_categories_categories` FOREIGN KEY (`parent_id`) REFERENCES `categories` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

To make life easier for myself I have my model auto generating the SQL statement to go n levels deep in printing all categories:

if( intval($depth) < 1 )
	$depth = 1;

$select = array();
$from = array();
$where = array();
$order = array();

for( $i = 1; $i <= $depth; $i++ )
{
	$select[] = "level" . $i . ".id AS level" . $i . "_id";
	$from[] = "`categories` AS level" . $i . "";
	$where[] = "( level" . $i . ".active = 1 OR level" . $i . ".active IS NULL )";
	$order[] = "level" . $i . "_id";
}

//SELECT
$sql = "SELECT " . implode(', ', $select) . ", level" . $depth . ".title AS title " ;

//FROM
$sql .= "FROM " . $from[0] . " ";

unset($from[0]);
if( count($select) > 0 )
{
	foreach( $from as $key => $value )
	{
		$from[$key] = $value . " ON level" . ($key) . ".id = level" . ($key+1) . ".parent_id";
	}

	$sql .= " RIGHT JOIN " . implode(" RIGHT JOIN ", $from) . " ";
}

//WHERE
$sql .= "WHERE level1.parent_id IS NULL AND " . implode(" AND ", $where) . " ";

//ORDER
$sql .= "ORDER BY " . implode(", ", $order);

$rs = $this->query($sql);

At a depth of 5 our SQL query generated becomes:

SELECT
  level1.id AS level1_id,
  level2.id AS level2_id,
  level3.id AS level3_id,
  level4.id AS level4_id,
  level5.id AS level5_id,
  level5.title AS title
FROM
  `categories` AS level1
  RIGHT JOIN `categories` AS level2 ON level1.id = level2.parent_id
  RIGHT JOIN `categories` AS level3 ON level2.id = level3.parent_id
  RIGHT JOIN `categories` AS level4 ON level3.id = level4.parent_id
  RIGHT JOIN `categories` AS level5 ON level4.id = level5.parent_id
WHERE
  level1.parent_id IS NULL
  AND ( level1.active = 1 OR level1.active IS NULL )
  AND ( level2.active = 1 OR level2.active IS NULL )
  AND ( level3.active = 1 OR level3.active IS NULL )
  AND ( level4.active = 1 OR level4.active IS NULL )
  AND ( level5.active = 1 OR level5.active IS NULL )
ORDER BY
  level1_id, level2_id, level3_id, level4_id, level5_id

The series of ( levelN.active = 1 OR levelN.active IS NULL ) guarantees that only entries with the active field turned on and that children of deactivated categories are not displayed. Fancy piece of SQL-fu if I do say so myself. The data is returned looking something like this:

level1_id level2_id level3_id level4_id level5_id title
1 Parent
2 Parent 2
1 7 Child 2
2 4 Child
2 5 Child 2
1 6 8 Grandchild
1 6 9 Grandchild 2
1 6 8 10 Grandchild
1 6 8 11 Grandchild 2
1 6 8 10 12 Great Grandchild
1 6 8 10 13 Great Grandchild 2



Now by cleaning up the null values from the $category array we have our path:

$data = $rs->getRows();

$categories = array();
foreach( $data as $category )
{
	$title = $category['title'];
	unset($category['title']);

	foreach( array_reverse($category,true) as $key => $value )
	{
		if( $value === null )
		{
			unset($category[$key]);
		}
	}

	array_set($categories, array_merge(array(count($category)-1),array_values($category)), $title, "-- SELECT CATEGORY --");
}

return $categories;

You’ll notice that we also insert into our $categories array according to the hierselect format linked above. That in and of itself is an interesting function as, for example, our entry for Great Great Grandchild would look like $categories[4][1][6][8][10][12] = 'Great Great Grandchild'; and we need empty entries for each level.

array_set() has this code:

function array_set(array &$array, array $keys, $value, $emptyText)
{
	$tmp =& $array;

	$lastKey = array_pop($keys);
	foreach( $keys as $key )
	{
		if( !is_array($tmp[$key]) )
		{
			$tmp[$key] = array();
		}

		$tmp =& $tmp[$key];
	}

	if( !isset($tmp['']) )
	{
		$tmp[''] = $emptyText;
	}

	$tmp[$lastKey] = $value;
}

And our final array output will look something like this:

Array
(
    [0] => Array
        (
            [] => -- SELECT CATEGORY --
            [1] => Parent
            [2] => Parent 2
        )

    [1] => Array
        (
            [1] => Array
                (
                    [] => -- SELECT CATEGORY --
                    [6] => Child
                    [7] => Child 2
                )

            [2] => Array
                (
                    [] => -- SELECT CATEGORY --
                    [4] => Child
                    [5] => Child 2
                )
        )

    [2] => Array
        (
            [1] => Array
                (
                    [6] => Array
                        (
                            [] => -- SELECT CATEGORY --
                            [8] => Grandchild
                            [9] => Grandchild 2
                        )
                )
        )

    [3] => Array
        (
            [1] => Array
                (
                    [6] => Array
                        (
                            [8] => Array
                                (
                                    [] => -- SELECT CATEGORY --
                                    [10] => Grandchild
                                    [11] => Grandchild 2
                                )
                        )
                )
        )

    [4] => Array
        (
            [1] => Array
                (
                    [6] => Array
                        (
                            [8] => Array
                                (
                                    [10] => Array
                                        (
                                            [] => -- SELECT CATEGORY --
                                            [12] => Great Grandchild
                                            [13] => Great Grandchild 2
                                        )
                                )
                        )
                )
        )
)

Good thing is all that needs to be changed to dig deeper into the hierarchy is the $depth variable from above.

This solution is very good for write intensive applications, so anyone building a forum-like system will find use out of this system.

However I’m thinking about moving to the Nested Set model as the application for this code will not be write-intensive…

Comments 11

  1. wenbert wrote:

    Thanks for this tip!

    Posted 02 Jun 2008 at 6:04 pm
  2. andreas wrote:

    What about the pre-ordered traversal tree? Google it, it might be usefull. It reduces the number of iterations and minimizes the complexity of the SQL query that you created (in my humble opinion). If i am not mistaken JOINS are operations that ‘cost’ when it comes to the RDBMs internals.

    Posted 03 Jun 2008 at 3:34 am
  3. Chad wrote:

    andreas is right, MPTT provides significant performance advantages. Branches can be quickly retrieved with BETWEEN. The major challenge to MPTT is the time required to rebuild the as new items are added to it.

    Posted 05 Jun 2008 at 2:17 pm
  4. Daniel Cousineau wrote:

    Agreed, unfortunately I’ve progressed a little to far into the system to turn back and so far aren’t seeing any performance hits, then again I’m caching built trees anyways.

    Posted 05 Jun 2008 at 2:25 pm
  5. Andreas Kompanez wrote:

    http://dev.mysql.com/tech-resources/articles/hierarchical-data.html

    Posted 05 Jun 2008 at 2:59 pm
  6. Aaron wrote:

    http://www.sitepoint.com/article/hierarchical-data-database

    Posted 05 Jun 2008 at 3:00 pm
  7. Daniel Cousineau wrote:

    Andreas: I have that linked above :D

    I also just bought Joe Celko’s book on Hierarchical Data in SQL, been reading thought that :)

    Posted 05 Jun 2008 at 11:26 pm
  8. kvz wrote:

    A nice opportunity to show the power of CakePHP, which has a standard behavior for tree data structures:
    var $actsAs = array(‘Tree’);

    This article may interest you as well (non-cake, don’t worry):
    http://kevin.vanzonneveld.net/techblog/article/convert_anything_to_tree_structures_in_php/
    Showing an explode-like function that convert flat arrays into multidimensional tree’s based on a delimiter found in it’s keys.

    Posted 06 Jun 2008 at 2:39 am
  9. andreas wrote:

    If anyone is intrested i will be posting an article implementing MPTT on my website. I don’t want to publish the URL here since i don’t want to try and get traffic to my blog by using comments on other people blogs.

    The implementation is done with PHP.

    Another idea that came to me is:
    Cache the tree structure results for a short period of time and avoid the DB costfull operations (when it comes to thousands of rows and you need complex queries). You can update your cache by implementing the Observer pattern on your Cache Manager Class.

    And something that you might want to look up to is ‘memory caching’…where you can place chunks of data in memory (the fastest way to access them).

    http://pecl.php.net/package/memcache

    “Memcached is a caching daemon designed especially for
    dynamic web applications to decrease database load by
    storing objects in memory.
    This extension allows you to work with memcached through
    handy OO and procedural interfaces”

    Also try googling the facebook caching mechanisms. They did a lot of work on that.
    (http://sizzo.org/wp/wp-content/uploads/2007/09/facebook_performance_caching.pdf)

    Posted 09 Jun 2008 at 6:22 am
  10. Fredrik Holmström wrote:

    I’m sorry but this is just a horrible way to do this, look up Nested Sets (aka. MPTT) instead.

    After reading it a bit closer I realized that you need to change the DDL for the table when creating new levels, all I say is eeeewwww…

    Posted 12 Jun 2008 at 3:51 am
  11. Vandenbussche wrote:

    Thanks for this. I was looking for it.

    Posted 13 Nov 2008 at 2:04 pm

Trackbacks & Pingbacks 8

  1. From Daniel Cousineau’s Blog: Hierarchical Data With PHP and MySQL | Development Blog With Code Updates : Developercast.com on 02 Jun 2008 at 11:25 am

    [...] Cousineau submitted a new blog post he’s come up with that looks at using hierarchical data in a MySQL database. I recently had [...]

  2. From Skylog » Blog Archive » links for 2008-06-03 on 03 Jun 2008 at 12:30 am

    [...] Hierarchical Data With PHP and MySQL (tags: php) [...]

  3. From roScripts - Webmaster resources and websites on 03 Jun 2008 at 3:15 pm

    Tower Of Power » Blog Archive » Hierarchical Data With PHP and MySQL…

    Tower Of Power » Blog Archive » Hierarchical Data With PHP and MySQL…

  4. From Иерархия данных в PHP и MySQL : Заметки программиста on 05 Jun 2008 at 9:24 pm

    [...] Перевод статьи Hierarchical Data With PHP and MySQL. [...]

  5. From Hierarchical Data With PHP and MySQL on 06 Jun 2008 at 5:31 am

    [...] http://www.toosweettobesour.com/2008/06/02/hierarchical-data-with-php-and-mysql/ [...]

  6. From Tower Of Power » Blog Archive » Updates To Come on 07 Jun 2008 at 7:05 pm

    [...] I get back I’ve got some updates for the hierarchical data post, and should have a post ready on using rsync and phing for remote deployment. Category: Updates [...]

  7. From Managing Hierarchical Data in MySQL with Modified Preorder Tree Traversal (MPTT) | my-whiteboard on 27 Jul 2008 at 4:22 pm

    [...] http://www.toosweettobesour.com/2008/06/02/hierarchical-data-with-php-and-mysql/ [...]

  8. From Tower Of Power » Blog Archive » Displaying N-Deep Trees (Remember Your Algorithms Course?) on 05 Aug 2008 at 10:45 am

    [...] And not only basic categorization, n-deep hierarchical categorization. I’ve already discussed storage and retrieval of such data, but there comes a time when one needs to display this [...]

Post a Comment

Your email is never published nor shared. Required fields are marked *