
1: @node Searching and Sorting, Pattern Matching, Message Translation, Top 2: @c %MENU% General searching and sorting functions 3: @chapter Searching and Sorting 4: 5: This chapter describes functions for searching and sorting arrays of 6: arbitrary objects. You pass the appropriate comparison function to be 7: applied as an argument, along with the size of the objects in the array 8: and the total number of elements. 9: 10: @menu 11: * Comparison Functions:: Defining how to compare two objects. 12: Since the sort and search facilities 13: are general, you have to specify the 14: ordering. 15: * Array Search Function:: The @code{bsearch} function. 16: * Array Sort Function:: The @code{qsort} function. 17: * Search/Sort Example:: An example program. 18: * Hash Search Function:: The @code{hsearch} function. 19: * Tree Search Function:: The @code{tsearch} function. 20: @end menu 21: 22: @node Comparison Functions 23: @section Defining the Comparison Function 24: @cindex Comparison Function 25: 26: In order to use the sorted array library functions, you have to describe 27: how to compare the elements of the array. 28: 29: To do this, you supply a comparison function to compare two elements of 30: the array. The library will call this function, passing as arguments 31: pointers to two array elements to be compared. Your comparison function 32: should return a value the way @code{strcmp} (@pxref{String/Array 33: Comparison}) does: negative if the first argument is ``less'' than the 34: second, zero if they are ``equal'', and positive if the first argument 35: is ``greater''. 36: 37: Here is an example of a comparison function which works with an array of 38: numbers of type @code{double}: 39: 40: @smallexample 41: int 42: compare_doubles (const void *a, const void *b) 43: @{ 44: const double *da = (const double *) a; 45: const double *db = (const double *) b; 46: 47: return (*da > *db) - (*da < *db); 48: @} 49: @end smallexample 50: 51: The header file @file{stdlib.h} defines a name for the data type of 52: comparison functions. This type is a GNU extension. 53: 54: @comment stdlib.h 55: @comment GNU 56: @tindex comparison_fn_t 57: @smallexample 58: int comparison_fn_t (const void *, const void *); 59: @end smallexample 60: 61: @node Array Search Function 62: @section Array Search Function 63: @cindex search function (for arrays) 64: @cindex binary search function (for arrays) 65: @cindex array search function 66: 67: Generally searching for a specific element in an array means that 68: potentially all elements must be checked. The GNU C library contains 69: functions to perform linear search. The prototypes for the following 70: two functions can be found in @file{search.h}. 71: 72: @comment search.h 73: @comment SVID 74: @deftypefun {void *} lfind (const void *@var{key}, void *@var{base}, size_t *@var{nmemb}, size_t @var{size}, comparison_fn_t @var{compar}) 75: The @code{lfind} function searches in the array with @code{*@var{nmemb}} 76: elements of @var{size} bytes pointed to by @var{base} for an element 77: which matches the one pointed to by @var{key}. The function pointed to 78: by @var{compar} is used decide whether two elements match. 79: 80: The return value is a pointer to the matching element in the array 81: starting at @var{base} if it is found. If no matching element is 82: available @code{NULL} is returned. 83: 84: The mean runtime of this function is @code{*@var{nmemb}}/2. This 85: function should only be used if elements often get added to or deleted from 86: the array in which case it might not be useful to sort the array before 87: searching. 88: @end deftypefun 89: 90: @comment search.h 91: @comment SVID 92: @deftypefun {void *} lsearch (const void *@var{key}, void *@var{base}, size_t *@var{nmemb}, size_t @var{size}, comparison_fn_t @var{compar}) 93: The @code{lsearch} function is similar to the @code{lfind} function. It 94: searches the given array for an element and returns it if found. The 95: difference is that if no matching element is found the @code{lsearch} 96: function adds the object pointed to by @var{key} (with a size of 97: @var{size} bytes) at the end of the array and it increments the value of 98: @code{*@var{nmemb}} to reflect this addition. 99: 100: This means for the caller that if it is not sure that the array contains 101: the element one is searching for the memory allocated for the array 102: starting at @var{base} must have room for at least @var{size} more 103: bytes. If one is sure the element is in the array it is better to use 104: @code{lfind} so having more room in the array is always necessary when 105: calling @code{lsearch}. 106: @end deftypefun 107: 108: To search a sorted array for an element matching the key, use the 109: @code{bsearch} function. The prototype for this function is in 110: the header file @file{stdlib.h}. 111: @pindex stdlib.h 112: 113: @comment stdlib.h 114: @comment ISO 115: @deftypefun {void *} bsearch (const void *@var{key}, const void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare}) 116: The @code{bsearch} function searches the sorted array @var{array} for an object 117: that is equivalent to @var{key}. The array contains @var{count} elements, 118: each of which is of size @var{size} bytes. 119: 120: The @var{compare} function is used to perform the comparison. This 121: function is called with two pointer arguments and should return an 122: integer less than, equal to, or greater than zero corresponding to 123: whether its first argument is considered less than, equal to, or greater 124: than its second argument. The elements of the @var{array} must already 125: be sorted in ascending order according to this comparison function. 126: 127: The return value is a pointer to the matching array element, or a null 128: pointer if no match is found. If the array contains more than one element 129: that matches, the one that is returned is unspecified. 130: 131: This function derives its name from the fact that it is implemented 132: using the binary search algorithm. 133: @end deftypefun 134: 135: @node Array Sort Function 136: @section Array Sort Function 137: @cindex sort function (for arrays) 138: @cindex quick sort function (for arrays) 139: @cindex array sort function 140: 141: To sort an array using an arbitrary comparison function, use the 142: @code{qsort} function. The prototype for this function is in 143: @file{stdlib.h}. 144: @pindex stdlib.h 145: 146: @comment stdlib.h 147: @comment ISO 148: @deftypefun void qsort (void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare}) 149: The @var{qsort} function sorts the array @var{array}. The array contains 150: @var{count} elements, each of which is of size @var{size}. 151: 152: The @var{compare} function is used to perform the comparison on the 153: array elements. This function is called with two pointer arguments and 154: should return an integer less than, equal to, or greater than zero 155: corresponding to whether its first argument is considered less than, 156: equal to, or greater than its second argument. 157: 158: @cindex stable sorting 159: @strong{Warning:} If two objects compare as equal, their order after 160: sorting is unpredictable. That is to say, the sorting is not stable. 161: This can make a difference when the comparison considers only part of 162: the elements. Two elements with the same sort key may differ in other 163: respects. 164: 165: If you want the effect of a stable sort, you can get this result by 166: writing the comparison function so that, lacking other reason 167: distinguish between two elements, it compares them by their addresses. 168: Note that doing this may make the sorting algorithm less efficient, so 169: do it only if necessary. 170: 171: Here is a simple example of sorting an array of doubles in numerical 172: order, using the comparison function defined above (@pxref{Comparison 173: Functions}): 174: 175: @smallexample 176: @{ 177: double *array; 178: int size; 179: @dots{} 180: qsort (array, size, sizeof (double), compare_doubles); 181: @} 182: @end smallexample 183: 184: The @code{qsort} function derives its name from the fact that it was 185: originally implemented using the ``quick sort'' algorithm. 186: 187: The implementation of @code{qsort} in this library might not be an 188: in-place sort and might thereby use an extra amount of memory to store 189: the array. 190: @end deftypefun 191: 192: @node Search/Sort Example 193: @section Searching and Sorting Example 194: 195: Here is an example showing the use of @code{qsort} and @code{bsearch} 196: with an array of structures. The objects in the array are sorted 197: by comparing their @code{name} fields with the @code{strcmp} function. 198: Then, we can look up individual objects based on their names. 199: 200: @comment This example is dedicated to the memory of Jim Henson. RIP. 201: @smallexample 202: @include search.c.texi 203: @end smallexample 204: 205: @cindex Kermit the frog 206: The output from this program looks like: 207: 208: @smallexample 209: Kermit, the frog 210: Piggy, the pig 211: Gonzo, the whatever 212: Fozzie, the bear 213: Sam, the eagle 214: Robin, the frog 215: Animal, the animal 216: Camilla, the chicken 217: Sweetums, the monster 218: Dr. Strangepork, the pig 219: Link Hogthrob, the pig 220: Zoot, the human 221: Dr. Bunsen Honeydew, the human 222: Beaker, the human 223: Swedish Chef, the human 224: 225: Animal, the animal 226: Beaker, the human 227: Camilla, the chicken 228: Dr. Bunsen Honeydew, the human 229: Dr. Strangepork, the pig 230: Fozzie, the bear 231: Gonzo, the whatever 232: Kermit, the frog 233: Link Hogthrob, the pig 234: Piggy, the pig 235: Robin, the frog 236: Sam, the eagle 237: Swedish Chef, the human 238: Sweetums, the monster 239: Zoot, the human 240: 241: Kermit, the frog 242: Gonzo, the whatever 243: Couldn't find Janice. 244: @end smallexample 245: 246: 247: @node Hash Search Function 248: @section The @code{hsearch} function. 249: 250: The functions mentioned so far in this chapter are for searching in a sorted 251: or unsorted array. There are other methods to organize information 252: which later should be searched. The costs of insert, delete and search 253: differ. One possible implementation is using hashing tables. 254: The following functions are declared in the header file @file{search.h}. 255: 256: @comment search.h 257: @comment SVID 258: @deftypefun int hcreate (size_t @var{nel}) 259: The @code{hcreate} function creates a hashing table which can contain at 260: least @var{nel} elements. There is no possibility to grow this table so 261: it is necessary to choose the value for @var{nel} wisely. The method 262: used to implement this function might make it necessary to make the 263: number of elements in the hashing table larger than the expected maximal 264: number of elements. Hashing tables usually work inefficiently if they are 265: filled 80% or more. The constant access time guaranteed by hashing can 266: only be achieved if few collisions exist. See Knuth's ``The Art of 267: Computer Programming, Part 3: Searching and Sorting'' for more 268: information. 269: 270: The weakest aspect of this function is that there can be at most one 271: hashing table used through the whole program. The table is allocated 272: in local memory out of control of the programmer. As an extension the 273: GNU C library provides an additional set of functions with an reentrant 274: interface which provide a similar interface but which allow to keep 275: arbitrarily many hashing tables. 276: 277: It is possible to use more than one hashing table in the program run if 278: the former table is first destroyed by a call to @code{hdestroy}. 279: 280: The function returns a non-zero value if successful. If it return zero 281: something went wrong. This could either mean there is already a hashing 282: table in use or the program runs out of memory. 283: @end deftypefun 284: 285: @comment search.h 286: @comment SVID 287: @deftypefun void hdestroy (void) 288: The @code{hdestroy} function can be used to free all the resources 289: allocated in a previous call of @code{hcreate}. After a call to this 290: function it is again possible to call @code{hcreate} and allocate a new 291: table with possibly different size. 292: 293: It is important to remember that the elements contained in the hashing 294: table at the time @code{hdestroy} is called are @emph{not} freed by this 295: function. It is the responsibility of the program code to free those 296: strings (if necessary at all). Freeing all the element memory is not 297: possible without extra, separately kept information since there is no 298: function to iterate through all available elements in the hashing table. 299: If it is really necessary to free a table and all elements the 300: programmer has to keep a list of all table elements and before calling 301: @code{hdestroy} s/he has to free all element's data using this list. 302: This is a very unpleasant mechanism and it also shows that this kind of 303: hashing tables is mainly meant for tables which are created once and 304: used until the end of the program run. 305: @end deftypefun 306: 307: Entries of the hashing table and keys for the search are defined using 308: this type: 309: 310: @deftp {Data type} {struct ENTRY} 311: Both elements of this structure are pointers to zero-terminated strings. 312: This is a limiting restriction of the functionality of the 313: @code{hsearch} functions. They can only be used for data sets which use 314: the NUL character always and solely to terminate the records. It is not 315: possible to handle general binary data. 316: 317: @table @code 318: @item char *key 319: Pointer to a zero-terminated string of characters describing the key for 320: the search or the element in the hashing table. 321: @item char *data 322: Pointer to a zero-terminated string of characters describing the data. 323: If the functions will be called only for searching an existing entry 324: this element might stay undefined since it is not used. 325: @end table 326: @end deftp 327: 328: @comment search.h 329: @comment SVID 330: @deftypefun {ENTRY *} hsearch (ENTRY @var{item}, ACTION @var{action}) 331: To search in a hashing table created using @code{hcreate} the 332: @code{hsearch} function must be used. This function can perform simple 333: search for an element (if @var{action} has the @code{FIND}) or it can 334: alternatively insert the key element into the hashing table. Entries 335: are never replaced. 336: 337: The key is denoted by a pointer to an object of type @code{ENTRY}. For 338: locating the corresponding position in the hashing table only the 339: @code{key} element of the structure is used. 340: 341: If an entry with matching key is found the @var{action} parameter is 342: irrelevant. The found entry is returned. If no matching entry is found 343: and the @var{action} parameter has the value @code{FIND} the function 344: returns a @code{NULL} pointer. If no entry is found and the 345: @var{action} parameter has the value @code{ENTER} a new entry is added 346: to the hashing table which is initialized with the parameter @var{item}. 347: A pointer to the newly added entry is returned. 348: @end deftypefun 349: 350: As mentioned before the hashing table used by the functions described so 351: far is global and there can be at any time at most one hashing table in 352: the program. A solution is to use the following functions which are a 353: GNU extension. All have in common that they operate on a hashing table 354: which is described by the content of an object of the type @code{struct 355: hsearch_data}. This type should be treated as opaque, none of its 356: members should be changed directly. 357: 358: @comment search.h 359: @comment GNU 360: @deftypefun int hcreate_r (size_t @var{nel}, struct hsearch_data *@var{htab}) 361: The @code{hcreate_r} function initializes the object pointed to by 362: @var{htab} to contain a hashing table with at least @var{nel} elements. 363: So this function is equivalent to the @code{hcreate} function except 364: that the initialized data structure is controlled by the user. 365: 366: This allows having more than one hashing table at one time. The memory 367: necessary for the @code{struct hsearch_data} object can be allocated 368: dynamically. It must be initialized with zero before calling this 369: function. 370: 371: The return value is non-zero if the operation was successful. If the 372: return value is zero, something went wrong, which probably means the 373: programs ran out of memory. 374: @end deftypefun 375: 376: @comment search.h 377: @comment GNU 378: @deftypefun void hdestroy_r (struct hsearch_data *@var{htab}) 379: The @code{hdestroy_r} function frees all resources allocated by the 380: @code{hcreate_r} function for this very same object @var{htab}. As for 381: @code{hdestroy} it is the programs responsibility to free the strings 382: for the elements of the table. 383: @end deftypefun 384: 385: @comment search.h 386: @comment GNU 387: @deftypefun int hsearch_r (ENTRY @var{item}, ACTION @var{action}, ENTRY **@var{retval}, struct hsearch_data *@var{htab}) 388: The @code{hsearch_r} function is equivalent to @code{hsearch}. The 389: meaning of the first two arguments is identical. But instead of 390: operating on a single global hashing table the function works on the 391: table described by the object pointed to by @var{htab} (which is 392: initialized by a call to @code{hcreate_r}). 393: 394: Another difference to @code{hcreate} is that the pointer to the found 395: entry in the table is not the return value of the functions. It is 396: returned by storing it in a pointer variables pointed to by the 397: @var{retval} parameter. The return value of the function is an integer 398: value indicating success if it is non-zero and failure if it is zero. 399: In the latter case the global variable @var{errno} signals the reason for 400: the failure. 401: 402: @table @code 403: @item ENOMEM 404: The table is filled and @code{hsearch_r} was called with an so far 405: unknown key and @var{action} set to @code{ENTER}. 406: @item ESRCH 407: The @var{action} parameter is @code{FIND} and no corresponding element 408: is found in the table. 409: @end table 410: @end deftypefun 411: 412: 413: @node Tree Search Function 414: @section The @code{tsearch} function. 415: 416: Another common form to organize data for efficient search is to use 417: trees. The @code{tsearch} function family provides a nice interface to 418: functions to organize possibly large amounts of data by providing a mean 419: access time proportional to the logarithm of the number of elements. 420: The GNU C library implementation even guarantees that this bound is 421: never exceeded even for input data which cause problems for simple 422: binary tree implementations. 423: 424: The functions described in the chapter are all described in the @w{System 425: V} and X/Open specifications and are therefore quite portable. 426: 427: In contrast to the @code{hsearch} functions the @code{tsearch} functions 428: can be used with arbitrary data and not only zero-terminated strings. 429: 430: The @code{tsearch} functions have the advantage that no function to 431: initialize data structures is necessary. A simple pointer of type 432: @code{void *} initialized to @code{NULL} is a valid tree and can be 433: extended or searched. The prototypes for these functions can be found 434: in the header file @file{search.h}. 435: 436: @comment search.h 437: @comment SVID 438: @deftypefun {void *} tsearch (const void *@var{key}, void **@var{rootp}, comparison_fn_t @var{compar}) 439: The @code{tsearch} function searches in the tree pointed to by 440: @code{*@var{rootp}} for an element matching @var{key}. The function 441: pointed to by @var{compar} is used to determine whether two elements 442: match. @xref{Comparison Functions}, for a specification of the functions 443: which can be used for the @var{compar} parameter. 444: 445: If the tree does not contain a matching entry the @var{key} value will 446: be added to the tree. @code{tsearch} does not make a copy of the object 447: pointed to by @var{key} (how could it since the size is unknown). 448: Instead it adds a reference to this object which means the object must 449: be available as long as the tree data structure is used. 450: 451: The tree is represented by a pointer to a pointer since it is sometimes 452: necessary to change the root node of the tree. So it must not be 453: assumed that the variable pointed to by @var{rootp} has the same value 454: after the call. This also shows that it is not safe to call the 455: @code{tsearch} function more than once at the same time using the same 456: tree. It is no problem to run it more than once at a time on different 457: trees. 458: 459: The return value is a pointer to the matching element in the tree. If a 460: new element was created the pointer points to the new data (which is in 461: fact @var{key}). If an entry had to be created and the program ran out 462: of space @code{NULL} is returned. 463: @end deftypefun 464: 465: @comment search.h 466: @comment SVID 467: @deftypefun {void *} tfind (const void *@var{key}, void *const *@var{rootp}, comparison_fn_t @var{compar}) 468: The @code{tfind} function is similar to the @code{tsearch} function. It 469: locates an element matching the one pointed to by @var{key} and returns 470: a pointer to this element. But if no matching element is available no 471: new element is entered (note that the @var{rootp} parameter points to a 472: constant pointer). Instead the function returns @code{NULL}. 473: @end deftypefun 474: 475: Another advantage of the @code{tsearch} function in contrast to the 476: @code{hsearch} functions is that there is an easy way to remove 477: elements. 478: 479: @comment search.h 480: @comment SVID 481: @deftypefun {void *} tdelete (const void *@var{key}, void **@var{rootp}, comparison_fn_t @var{compar}) 482: To remove a specific element matching @var{key} from the tree 483: @code{tdelete} can be used. It locates the matching element using the 484: same method as @code{tfind}. The corresponding element is then removed 485: and a pointer to the parent of the deleted node is returned by the 486: function. If there is no matching entry in the tree nothing can be 487: deleted and the function returns @code{NULL}. If the root of the tree 488: is deleted @code{tdelete} returns some unspecified value not equal to 489: @code{NULL}. 490: @end deftypefun 491: 492: @comment search.h 493: @comment GNU 494: @deftypefun void tdestroy (void *@var{vroot}, __free_fn_t @var{freefct}) 495: If the complete search tree has to be removed one can use 496: @code{tdestroy}. It frees all resources allocated by the @code{tsearch} 497: function to generate the tree pointed to by @var{vroot}. 498: 499: For the data in each tree node the function @var{freefct} is called. 500: The pointer to the data is passed as the argument to the function. If 501: no such work is necessary @var{freefct} must point to a function doing 502: nothing. It is called in any case. 503: 504: This function is a GNU extension and not covered by the @w{System V} or 505: X/Open specifications. 506: @end deftypefun 507: 508: In addition to the function to create and destroy the tree data 509: structure, there is another function which allows you to apply a 510: function to all elements of the tree. The function must have this type: 511: 512: @smallexample 513: void __action_fn_t (const void *nodep, VISIT value, int level); 514: @end smallexample 515: 516: The @var{nodep} is the data value of the current node (once given as the 517: @var{key} argument to @code{tsearch}). @var{level} is a numeric value 518: which corresponds to the depth of the current node in the tree. The 519: root node has the depth @math{0} and its children have a depth of 520: @math{1} and so on. The @code{VISIT} type is an enumeration type. 521: 522: @deftp {Data Type} VISIT 523: The @code{VISIT} value indicates the status of the current node in the 524: tree and how the function is called. The status of a node is either 525: `leaf' or `internal node'. For each leaf node the function is called 526: exactly once, for each internal node it is called three times: before 527: the first child is processed, after the first child is processed and 528: after both children are processed. This makes it possible to handle all 529: three methods of tree traversal (or even a combination of them). 530: 531: @table @code 532: @item preorder 533: The current node is an internal node and the function is called before 534: the first child was processed. 535: @item postorder 536: The current node is an internal node and the function is called after 537: the first child was processed. 538: @item endorder 539: The current node is an internal node and the function is called after 540: the second child was processed. 541: @item leaf 542: The current node is a leaf. 543: @end table 544: @end deftp 545: 546: @comment search.h 547: @comment SVID 548: @deftypefun void twalk (const void *@var{root}, __action_fn_t @var{action}) 549: For each node in the tree with a node pointed to by @var{root}, the 550: @code{twalk} function calls the function provided by the parameter 551: @var{action}. For leaf nodes the function is called exactly once with 552: @var{value} set to @code{leaf}. For internal nodes the function is 553: called three times, setting the @var{value} parameter or @var{action} to 554: the appropriate value. The @var{level} argument for the @var{action} 555: function is computed while descending the tree with increasing the value 556: by one for the descend to a child, starting with the value @math{0} for 557: the root node. 558: 559: Since the functions used for the @var{action} parameter to @code{twalk} 560: must not modify the tree data, it is safe to run @code{twalk} in more 561: than one thread at the same time, working on the same tree. It is also 562: safe to call @code{tfind} in parallel. Functions which modify the tree 563: must not be used, otherwise the behavior is undefined. 564: @end deftypefun