Data structure Notes
Home Up C Notes C++ Notes Data structure Notes

 

Excerpts from Data Structure notes:

...

Balancing binary trees

 

Sometimes trees become "jagged" in shape and some internal management or balancing is necessary to keep the tree efficient. 

Let's sketch some ideas on designing a tree balancer.  We already touched into an algorithm when we designed the remove() function.  Recall that to promote the best tree shape, we based our decision of replacement node upon the deepest leaf in the left or right subtree.

 

Let's pick an unbalanced node in an unbalanced tree.  Clearly 30 is unbalanced as it's left-most subtree depth is zero and right-most subtree depth is 4.  We can shift it to a better position in the tree by deleting it and adding it:

 

 

        8                    8                        8

       / \                   / \                        / \

      4   40            4  40                   4   40

          / \                    / \                     /  \

         30  50   =>     33   50  =>      33   50

           \                       \                  /  \

            35                     35         30    35

           /                       /                    /

          32               32                   32

         /  \                /  \                  /  \

       31    34       31    34          31   34

            /

         33

    (1) Decide to      (2) Delete     (3) Re-add 30

       reorganize 30        30

 

Notice that the tree's depth has now changed from 7 to 6.

  We can continue this process for other unbalanced nodes, say elements 8, 33 and 40 to get an overall better weighted tree.

 


 

Balancing algorithm pseudocode

 

 

 

nBalances   : shared integer, current # of node reorganizations done

maxBalances : shared integer, maximum # of node reorganizations to perform

 

DEPTH_TOLERANCE = 2

 

************************************************************************

* Perform a maximum of <maxNumberOfBalances> shifts of nodes

* working in preorder fashion from root to leaves in <bt>

 

 

balanceATree(bt,maxNumberOfBalances)

               nBalances   <- 0

               maxBalances <- maximumNumberOfBalances

               BTPreorderBalance(bt,balanceASubtree)

 

 

************************************************************************

* Decide if <node> needs to be reorganized

*  YES, if <node>'s left subtree and right subtree depths are skewed

*        by more than <DEPTH_TOLERANCE>

*

*  returns TRUE or FALSE depending on whether <node> was rebalanced

 

balanceASubtree(node,bt)

               depthDifference = | leftMostSubtreeDepth(node) - rightMostSubTreeDepth(node) |

               if ((depthDifference > DEPTH_TOLERANCE))

                              BTDelete(bt,node)

                              BTAdd(bt,node)

                              return TRUE

               else

                              return FALSE

 

 

************************************************************************

* Keep performing preorder traversals of <bt> calling function <f>

*  until <bt> is sufficiently balanced, that is,

*  until we have performed a cumulative total of <maxBalances> OR a

*  preorder traversal results in failure to find an unbalanced node

* Notes: The preorder traversal is terminated upon the FIRST rebalance

*       or if <f> has returned TRUE more than <maxBalances> times

*  We MUST retraverse the tree starting from the root EACH time after

*   the next rebalance because the tree changes and we can't depend

*   upon any old links in the recursive traversal being accurate

*   Fortunately, Preorder grabs nodes from the TOP of the

*    tree first so the algorithm is relatively quick to find

*    unbalanced nodes

 

              

BTPreorderBalance(bt,f)

               while (nBalances < maxBalances)

                              if (!BTProcessPreorderBalance(bt,bt.root,f))

                                             nBalances = maxBalances  * balancing successful, force termination of loop

 

 

BTProcessPreorder(bt,node,f)

               if (f(node,bt))

                              nBalances = nBalances + 1     * (1) exit upon a rebalance

                              return TRUE

               if (nBalances >= maxBalances)        * (2) exit upon sufficient # of balances

                              return TRUE

               if (BTProcessPreorder(bt,node.left)) * (3) exit recursively upon either

                              return TRUE                   *     of above conditions

               if (BTProcessPreorder(bt,node.right))

                              return TRUE

               return FALSE

 

...

 

Unit IV Exercises

 

 

(1) Industrial Tool Corporation wishes to cross-reference patterns on certain "important" customers, those who have in the past put in large orders of equipment.  Cross-reference as in "which bigwig is buying what and how much and when from us so we can be better prepared to control our inventory in the next year".  Currently, the company stores general inventory stats of their clients in yearly files: it1992.dat, it1993.data, ...,up until the current year.

 

Upper management wishes a system that produces a chronological report of the major orderers, who will be specified in a control file bigcust.ctl:

 

            Heneca Industries        <= Sample major orderers

            Jackson Quarry

            Sutherland Mills

 

Because the nature of the application is primarily cross-referencing (or searching), system analysts have agreed that hash tables are a suitable means of looking up certain customers.

 

You are requested to design the system that produces the following report:

 

                           Purchase:

Year    Company            Month Qty   Description

-----   -------            -----------------------

1992 :

         Sutherland Mills    May    45   Sander

                Heneca Industries   July   21   Oil gun

                Jackson Quarry      Aug    11   Drill press

                Jackson Quarry      Oct     9  

 

1993 :

                Jackson Quarry      Jan    17   Crane lift

                Heneca Industries   Nov    11   Chain saw

 

1994 :

                Sutherland Mills    May    55   Sander

                Sutherland Mills    June   10   Chain saw

...

 

 

Read the files and store the records in separate hash tables for each year.  There may be many matches for a particular company depending on what commodities they have purchased in any year.  Provide the ability to update bigcust.dat from within your application.  Here is a sample menu:

 

               1 - Produce Report

               2 - Update major orderer list (bigcust.ctl)

               3 - Exit


 

(2a) Implement an online library search system using conventional hashing.  Your program should allow "Title" or "Author" search.  Note you need only one copy of the data in memory, but two hash tables, one that records hashed references to titles, another to authors.

 

Assume library entries are of the format:

 

 

               LibraryBook

                              Title         : String

                              Author        : String

                              YearPublished : String

                              OnLoan        : Boolean [Y/N]

 

 

Populate your database on file with at least 50 titles.  Test your program by searching on various titles and authors, some existent, some non-existent.

 

Sample run

 

            T x - Title, A y - Author search

            =>  T Bayshore blues

      No such title

            =>  A Finigree, Wilson

             Jacobs ladder   1987  On loan

             Johnny Be Good  1988  Available

            => T Betty's cookery

             Calhill, Sally  1976  Available

 

 

(b) Fancy up your application to provide "narrow down" search on the first two characters of the title or author.  Hint: Create a third and a fourth hash table that contain hash references to the first two characters of title and author respectively.  You can use conventional hashing or the multiple hashing method as described in the notes (or a superior hybrid thereof).

 

Sample run

 

TN x - Title narrowed down search,

AN y - Author narrowed down search

 

=> TN Da

            Dandy Lion and the wicked witch   1976  On loan

            Davie Crockett  1954  On loan

            Dad and Mom     1997  On loan

            ...