------------------------------------------------------------------------------- Three basic ways to traverse a Tree Structure Depth Top-Down Breath 7 1 1 / \ / \ / \ 3 6 2 5 2 3 / \ / \ / \ / \ / \ / \ 1 2 4 5 3 4 6 7 4 5 6 7 Recurse each Do current node For each node on a list child Recurse on each Do Node Do current node child node Append all children to end of queued list (Recursive) (Recursive) (Iteritive) Also see "info/perl/file_traversal_notes" for Directory structures. =============================================================================== Polymorphic Data Structures ------------------------------------------------------------------------------- Glossary: Data structure: The complex data storage for multiple types of elements. structure: short for data structure -- not a C structure. C structure: C `stuct' or `union' data type. Element: The user data that we want to store. Declaration: Declaration of the data structure Instance: Definition of one data stucture for actual use. Polymorphic DS A Data Structure which could be used with any data type. ------------------------------------------------------------------------------- Problems and techniques. Problems associated with this 1/ Storing variable types of elements (few languages provides a generic type) Insertion: How should elements be inserted into the data stucture? As a pointer to elements only? By assignment? By memory copying? Similarly what data types can be inserted into the structure? Integer types only (char,short,int, long)? Pointers only (void *)? Basic C data types (int, float, char *)? Arrays (char [80])? Actual C Structures? Variable sized C Structures? For example, with a basic type such as an integer you would expect the data structure to just make a copy of it and not use pointers. Similarly with a pointer. But what if an actual C structure is given and not a pointer to a structure? Should it be copied, or just disallowed. The last is very dependant on the compiler available. What if the data type is highly variable in length and the user only wants to allocate just enough memory required for each of his data elements to be stored. In the X window system, for example, has a number of such element. For instance `XEvent' and `Widget', are pointers to such variable C structures. Lookup: How does the data structure return a copy of the element stored in the data structure. This Does the user need to free the element returned or would this cause it to free the copy of the element still in the structure as well. Destruction/Removal: The problem with element type destruction is that only the user know how to destroy the element data he provides. Many data structure inplimentations for pointers assume that to destroy the item stored you only need to free the structure pointed to. WRONG! Some of the problems with this are: Pointers associated information. EG: to a linked list of lines for a polygon data object. Pointers to related but separate structures. EG: a pointer to the data structure the element is stored in. The element may also be referenced elsewhere. EG: It may be in another different data structure. In short only the user knows how to (or if) to destory the data stored in the data structure. How does the data structure do this? The best and only good answer for this, is to follow a good programming rule of thumb. I have seen many programs ``Whoever allocated/created it must itself free it'' If the above rule is to be broken, it must be verbosely written as being so in manuals, proceedural comments and general comments. IE: if the user allocated the stored data, it must be given back to him/she to be freed. I have seen many routines (not just data structures) which allocate memory, but either never free it or tell the user how to free it. For example: many manuals mention the library `strdup' function to duplicate a string but do not mention that you must call `free' when you are finish using it. A good memory debugging library usually finds these problems. This brings us to the main problem, that is usably ignored by most implimentations I have seen. Destroying a instance of the data structure completely when finished. In this case you can usually use a technique to very quickly retrieve elements without having to ensure the data structure remains in correct usable order. The only good techiques I have found are. 1/ Destroy instance call with a destory element proceedure as an argument for the user to supply. 2/ Provide a trio of functions (macros) to destory the structure destory_start(ds); while( (data = destroy_element(ds)) != NullValue ) { /* destroy user data here */ } destroy_end(ds); Key Search & Comparisions: Though not as critical as the other operations above, it is still important. In designing a data structure with macros, all the other operations above can actually be performed inside the macros. Thus solving any typing problem that may exist. Comparisions however are part of the search loop of the structure and thus can't be placed in a macro. How should the users comparision function be called. What should it supply, pointers to the stored data like the qsort() library function, or more logically a copy of the data stored so it couldn't possibably be changed. Can the comparasion function be different for different instances? Does the comparision function need to be provided on every operation or only when an instance of a data structure is declared? 2/ Does the user need to provide any special provisions to the element he wants to store in the stucture? For example, does he need to add a `next' field to his C structure for a linked list? Can he specify the a field name for this use, so the data can be stored in multiple structures of the same type. 3/ Polymophism of the data structure (how contrived is it?) Can you provide multiple and separate instances of the data structure? Can each instance use a different element types from each other? Does a different data type require the procedures to be duplicated? * One element type only, one instance only, per program. The acutal data type is usually defined in the structures header file at a specific location. This is the the most common type of data stuturure and the simplest to both use and write. In fact it is just a typlical direct implementation of a specific data type with the type being modified by a defined macro. * One element type per coded module of the data structure. This is generally achieved by compiling the data structures module twice with a different set of defines, to produce a different set of procedures for each element type required. The only additional complexity required for the encoding over that previously is to use the macros to re-label the procedures and structure definitions using either the type being stored or more commonly a string given my the user. This method can be carried even further by encoding all the proceedure and data structure definitions into a single marco definition. That macro definition is then inserted by the programmer into is code as required. The macro call is supplied it with the names (or part of the names) of the data structures, procedures, and element type to be used. The best example of this I have seen is a ``general linked list sorter'' proceedure. The programmer just includes a header at the top of his code and then adds a single macro call, suppling: the name of the produre to create; the data type of the elements in the users linked list; and the name of the field in the element structure which pointes to the next element. The macro then expands into two recursive procedures which does a merge sort on any linked link of the type given. The comparision function is given on the call to the actual sorting proceedure created. Encoding all the procedures needed in a single macro is probably extreme, and unwarrented, but it is not too difficult to do and results in only a single include header file. * One code that handles multiple instances of Multiple element types. This is a true polymorphic data structure, and very difficult to achieve without some sort of restriction. The most common restriction is to limit the element type to either a pointer, or at least interger sized. This is easy to do and both of the previous types can be converted to this form by using an element type of ``void *''. This is a cop out. I have yet to see a truly polymorphic data structure in C, though below I have outlined a method which may produce such a structure. 5/ How is insertion/retrival for keyed elements defined. Does the key need to be part of the stucture being inserted or is is a separate element? What restriction is their of the type of a key element? Can the key type be itself a structure? How is the key tested, and is the test for equality or relative? Can you insert the same key multiple times? If so in what order are they retrived FIFO, FILO, UNDEFINED. 6/ Can Data Stucture instance variable change value for the same structure? For example, in a typical stack implimentation, a stack instance is just a pointer to the top element on the stack. This would vary as elements are pushed on and poped off the stack. This could be a disaster if the user wants to reference the stack in other places such as another data structure, or pass the stack to other procedures. A better idea is to return a pointer to a stack header instead of the top of stack. 7/ How to handle error conditions? Return a pre-declared or user supplied `Null' type value or other error value? Or just NULL for pointers? Call a error handler? Can the user define the error handler? Can you continue after an error? Can the above items be different for different instances? ------------------------------------------------------------------------------- Methods of Implementation 1/ Restricated to pointers only (void *), with multiple instances This is simple and satisifys most of the problems above. It also allows NULL to be used for error returns and works well in just about all possible cases. But it is not truly polymorphic and thus is a cop out. Problems: No type checking of data elements of the structure. How to access any keys stored within the elements. Comparisions are pointers to the pointer or the pointer itself? 2/ Use defines to set the element type, and other required information to compile into generic procedures. Problems: Only one type of any data element can be defined in a program, unless the procedures can be re-labeled by another define. 3/ Use Macros to write procedures and declarations for the specific element type wanted. The ``linked list merge sorting'' program mentioned above was of this type. The procedures names are modified to include a name provided by the macro caller, thus insuring the ability to create a set of procedures to handle any element type required. Although multiple element types and instances are catered for this method is highly wasteful of code generation, with multple copied of procedures with very little differances. This method however is perfect for cases where only one data type needs to be stored. Problem: Duplicate code for different element types. 4/ Macros to make data structure declarations for a specific data type, but does not use any type defines to modify procedures to a specific type. The procedures themselves handle all instances of the structure. Example for a stack type: #include "stack.h" /* include this file */ STKdecl(FooStack, Foo ); /* Declare a stack of type FooStack for elements of type Foo */ STKinit(stack, FooStack, NullValue); /* Create and Initialize `stack' as an instance of stack type FooStack, declare any empty/error returns as NullValue */ The major problem is how to copy the elements (char, int, pointer) into the data structure without knowing the actual type at that time. The solution for insertsion and retrival is to call the procedures to only find (and create?) the structure, but have macro calls do the handling of the users data into and out of the data structure. The macros being expanded in the users code area, and will know and understand the type of the users data and so typechecking is performed there. IE: procedures only handle data structure nodes and macros handle the elements within the nodes. Comparision function calls make this problem even more difficult. These functions must be called within the data structures procedures (due to the need for looping), and can't thus know the data type of the functions arguments. To solve this problem you could either insist that you only allow pointers and integers to be stored (so why bother with all this?), or restrict the comparision function to compare pointers to the elements stored in the data structures nodes. The later, is the method that the C standard library functions qsort() and bsearch() so the method is well known. This method works well if you store actual C structures or intergers in the structure, but introduces a extra level of indirection if you store pointers to C structures in the nodes. IE, the comparision will be comparing a pointer to the pointer of the users data being compared. This last could also allow the user to change the actual value of the element stored in the data structure instead of only being given a copy of the element stored. But it will work fine. Problems: This method is easy for data structures that need no loops such as stacks and simple list structures (no sorting or search), but is harder to program for other data stuctures. The use of macros require care for side effect problems. This usually means that the structure header would usually need some temp variables included in them for the various macros to use. Comparision functions will be given pointers to the element stored in structure, instead of a copy of the element stored. IE: By-Referance instead of By-Value This could allow naive programmers to modify the element and thus destroying the data structures consistancy. -------------------------------------------------------------------------------