(linenum→info "unix/slp.c:2238")

glibc/2.7/manual/search.texi

    1: @node Searching and Sorting, Pattern Matching, Message Translation, Top
    2: @c %MENU% General searching and sorting functions
    3: @chapter Searching and Sorting
    4: 
    5: This chapter describes functions for searching and sorting arrays of
    6: arbitrary objects.  You pass the appropriate comparison function to be
    7: applied as an argument, along with the size of the objects in the array
    8: and the total number of elements.
    9: 
   10: @menu
   11: * Comparison Functions::        Defining how to compare two objects.
   12:                                  Since the sort and search facilities
   13:                                  are general, you have to specify the
   14:                                  ordering.
   15: * Array Search Function::       The @code{bsearch} function.
   16: * Array Sort Function::         The @code{qsort} function.
   17: * Search/Sort Example::         An example program.
   18: * Hash Search Function::        The @code{hsearch} function.
   19: * Tree Search Function::        The @code{tsearch} function.
   20: @end menu
   21: 
   22: @node Comparison Functions
   23: @section Defining the Comparison Function
   24: @cindex Comparison Function
   25: 
   26: In order to use the sorted array library functions, you have to describe
   27: how to compare the elements of the array.
   28: 
   29: To do this, you supply a comparison function to compare two elements of
   30: the array.  The library will call this function, passing as arguments
   31: pointers to two array elements to be compared.  Your comparison function
   32: should return a value the way @code{strcmp} (@pxref{String/Array
   33: Comparison}) does: negative if the first argument is ``less'' than the
   34: second, zero if they are ``equal'', and positive if the first argument
   35: is ``greater''.
   36: 
   37: Here is an example of a comparison function which works with an array of
   38: numbers of type @code{double}:
   39: 
   40: @smallexample
   41: int
   42: compare_doubles (const void *a, const void *b)
   43: @{
   44:   const double *da = (const double *) a;
   45:   const double *db = (const double *) b;
   46: 
   47:   return (*da > *db) - (*da < *db);
   48: @}
   49: @end smallexample
   50: 
   51: The header file @file{stdlib.h} defines a name for the data type of
   52: comparison functions.  This type is a GNU extension.
   53: 
   54: @comment stdlib.h
   55: @comment GNU
   56: @tindex comparison_fn_t
   57: @smallexample
   58: int comparison_fn_t (const void *, const void *);
   59: @end smallexample
   60: 
   61: @node Array Search Function
   62: @section Array Search Function
   63: @cindex search function (for arrays)
   64: @cindex binary search function (for arrays)
   65: @cindex array search function
   66: 
   67: Generally searching for a specific element in an array means that
   68: potentially all elements must be checked.  The GNU C library contains
   69: functions to perform linear search.  The prototypes for the following
   70: two functions can be found in @file{search.h}.
   71: 
   72: @comment search.h
   73: @comment SVID
   74: @deftypefun {void *} lfind (const void *@var{key}, void *@var{base}, size_t *@var{nmemb}, size_t @var{size}, comparison_fn_t @var{compar})
   75: The @code{lfind} function searches in the array with @code{*@var{nmemb}}
   76: elements of @var{size} bytes pointed to by @var{base} for an element
   77: which matches the one pointed to by @var{key}.  The function pointed to
   78: by @var{compar} is used decide whether two elements match.
   79: 
   80: The return value is a pointer to the matching element in the array
   81: starting at @var{base} if it is found.  If no matching element is
   82: available @code{NULL} is returned.
   83: 
   84: The mean runtime of this function is @code{*@var{nmemb}}/2.  This
   85: function should only be used if elements often get added to or deleted from
   86: the array in which case it might not be useful to sort the array before
   87: searching.
   88: @end deftypefun
   89: 
   90: @comment search.h
   91: @comment SVID
   92: @deftypefun {void *} lsearch (const void *@var{key}, void *@var{base}, size_t *@var{nmemb}, size_t @var{size}, comparison_fn_t @var{compar})
   93: The @code{lsearch} function is similar to the @code{lfind} function.  It
   94: searches the given array for an element and returns it if found.  The
   95: difference is that if no matching element is found the @code{lsearch}
   96: function adds the object pointed to by @var{key} (with a size of
   97: @var{size} bytes) at the end of the array and it increments the value of
   98: @code{*@var{nmemb}} to reflect this addition.
   99: 
  100: This means for the caller that if it is not sure that the array contains
  101: the element one is searching for the memory allocated for the array
  102: starting at @var{base} must have room for at least @var{size} more
  103: bytes.  If one is sure the element is in the array it is better to use
  104: @code{lfind} so having more room in the array is always necessary when
  105: calling @code{lsearch}.
  106: @end deftypefun
  107: 
  108: To search a sorted array for an element matching the key, use the
  109: @code{bsearch} function.  The prototype for this function is in
  110: the header file @file{stdlib.h}.
  111: @pindex stdlib.h
  112: 
  113: @comment stdlib.h
  114: @comment ISO
  115: @deftypefun {void *} bsearch (const void *@var{key}, const void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare})
  116: The @code{bsearch} function searches the sorted array @var{array} for an object
  117: that is equivalent to @var{key}.  The array contains @var{count} elements,
  118: each of which is of size @var{size} bytes.
  119: 
  120: The @var{compare} function is used to perform the comparison.  This
  121: function is called with two pointer arguments and should return an
  122: integer less than, equal to, or greater than zero corresponding to
  123: whether its first argument is considered less than, equal to, or greater
  124: than its second argument.  The elements of the @var{array} must already
  125: be sorted in ascending order according to this comparison function.
  126: 
  127: The return value is a pointer to the matching array element, or a null
  128: pointer if no match is found.  If the array contains more than one element
  129: that matches, the one that is returned is unspecified.
  130: 
  131: This function derives its name from the fact that it is implemented
  132: using the binary search algorithm.
  133: @end deftypefun
  134: 
  135: @node Array Sort Function
  136: @section Array Sort Function
  137: @cindex sort function (for arrays)
  138: @cindex quick sort function (for arrays)
  139: @cindex array sort function
  140: 
  141: To sort an array using an arbitrary comparison function, use the
  142: @code{qsort} function.  The prototype for this function is in
  143: @file{stdlib.h}.
  144: @pindex stdlib.h
  145: 
  146: @comment stdlib.h
  147: @comment ISO
  148: @deftypefun void qsort (void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare})
  149: The @var{qsort} function sorts the array @var{array}.  The array contains
  150: @var{count} elements, each of which is of size @var{size}.
  151: 
  152: The @var{compare} function is used to perform the comparison on the
  153: array elements.  This function is called with two pointer arguments and
  154: should return an integer less than, equal to, or greater than zero
  155: corresponding to whether its first argument is considered less than,
  156: equal to, or greater than its second argument.
  157: 
  158: @cindex stable sorting
  159: @strong{Warning:} If two objects compare as equal, their order after
  160: sorting is unpredictable.  That is to say, the sorting is not stable.
  161: This can make a difference when the comparison considers only part of
  162: the elements.  Two elements with the same sort key may differ in other
  163: respects.
  164: 
  165: If you want the effect of a stable sort, you can get this result by
  166: writing the comparison function so that, lacking other reason
  167: distinguish between two elements, it compares them by their addresses.
  168: Note that doing this may make the sorting algorithm less efficient, so
  169: do it only if necessary.
  170: 
  171: Here is a simple example of sorting an array of doubles in numerical
  172: order, using the comparison function defined above (@pxref{Comparison
  173: Functions}):
  174: 
  175: @smallexample
  176: @{
  177:   double *array;
  178:   int size;
  179:   @dots{}
  180:   qsort (array, size, sizeof (double), compare_doubles);
  181: @}
  182: @end smallexample
  183: 
  184: The @code{qsort} function derives its name from the fact that it was
  185: originally implemented using the ``quick sort'' algorithm.
  186: 
  187: The implementation of @code{qsort} in this library might not be an
  188: in-place sort and might thereby use an extra amount of memory to store
  189: the array.
  190: @end deftypefun
  191: 
  192: @node Search/Sort Example
  193: @section Searching and Sorting Example
  194: 
  195: Here is an example showing the use of @code{qsort} and @code{bsearch}
  196: with an array of structures.  The objects in the array are sorted
  197: by comparing their @code{name} fields with the @code{strcmp} function.
  198: Then, we can look up individual objects based on their names.
  199: 
  200: @comment This example is dedicated to the memory of Jim Henson.  RIP.
  201: @smallexample
  202: @include search.c.texi
  203: @end smallexample
  204: 
  205: @cindex Kermit the frog
  206: The output from this program looks like:
  207: 
  208: @smallexample
  209: Kermit, the frog
  210: Piggy, the pig
  211: Gonzo, the whatever
  212: Fozzie, the bear
  213: Sam, the eagle
  214: Robin, the frog
  215: Animal, the animal
  216: Camilla, the chicken
  217: Sweetums, the monster
  218: Dr. Strangepork, the pig
  219: Link Hogthrob, the pig
  220: Zoot, the human
  221: Dr. Bunsen Honeydew, the human
  222: Beaker, the human
  223: Swedish Chef, the human
  224: 
  225: Animal, the animal
  226: Beaker, the human
  227: Camilla, the chicken
  228: Dr. Bunsen Honeydew, the human
  229: Dr. Strangepork, the pig
  230: Fozzie, the bear
  231: Gonzo, the whatever
  232: Kermit, the frog
  233: Link Hogthrob, the pig
  234: Piggy, the pig
  235: Robin, the frog
  236: Sam, the eagle
  237: Swedish Chef, the human
  238: Sweetums, the monster
  239: Zoot, the human
  240: 
  241: Kermit, the frog
  242: Gonzo, the whatever
  243: Couldn't find Janice.
  244: @end smallexample
  245: 
  246: 
  247: @node Hash Search Function
  248: @section The @code{hsearch} function.
  249: 
  250: The functions mentioned so far in this chapter are for searching in a sorted
  251: or unsorted array.  There are other methods to organize information
  252: which later should be searched.  The costs of insert, delete and search
  253: differ.  One possible implementation is using hashing tables.
  254: The following functions are declared in the header file @file{search.h}.
  255: 
  256: @comment search.h
  257: @comment SVID
  258: @deftypefun int hcreate (size_t @var{nel})
  259: The @code{hcreate} function creates a hashing table which can contain at
  260: least @var{nel} elements.  There is no possibility to grow this table so
  261: it is necessary to choose the value for @var{nel} wisely.  The method
  262: used to implement this function might make it necessary to make the
  263: number of elements in the hashing table larger than the expected maximal
  264: number of elements.  Hashing tables usually work inefficiently if they are
  265: filled 80% or more.  The constant access time guaranteed by hashing can
  266: only be achieved if few collisions exist.  See Knuth's ``The Art of
  267: Computer Programming, Part 3: Searching and Sorting'' for more
  268: information.
  269: 
  270: The weakest aspect of this function is that there can be at most one
  271: hashing table used through the whole program.  The table is allocated
  272: in local memory out of control of the programmer.  As an extension the
  273: GNU C library provides an additional set of functions with an reentrant
  274: interface which provide a similar interface but which allow to keep
  275: arbitrarily many hashing tables.
  276: 
  277: It is possible to use more than one hashing table in the program run if
  278: the former table is first destroyed by a call to @code{hdestroy}.
  279: 
  280: The function returns a non-zero value if successful.  If it return zero
  281: something went wrong.  This could either mean there is already a hashing
  282: table in use or the program runs out of memory.
  283: @end deftypefun
  284: 
  285: @comment search.h
  286: @comment SVID
  287: @deftypefun void hdestroy (void)
  288: The @code{hdestroy} function can be used to free all the resources
  289: allocated in a previous call of @code{hcreate}.  After a call to this
  290: function it is again possible to call @code{hcreate} and allocate a new
  291: table with possibly different size.
  292: 
  293: It is important to remember that the elements contained in the hashing
  294: table at the time @code{hdestroy} is called are @emph{not} freed by this
  295: function.  It is the responsibility of the program code to free those
  296: strings (if necessary at all).  Freeing all the element memory is not
  297: possible without extra, separately kept information since there is no
  298: function to iterate through all available elements in the hashing table.
  299: If it is really necessary to free a table and all elements the
  300: programmer has to keep a list of all table elements and before calling
  301: @code{hdestroy} s/he has to free all element's data using this list.
  302: This is a very unpleasant mechanism and it also shows that this kind of
  303: hashing tables is mainly meant for tables which are created once and
  304: used until the end of the program run.
  305: @end deftypefun
  306: 
  307: Entries of the hashing table and keys for the search are defined using
  308: this type:
  309: 
  310: @deftp {Data type} {struct ENTRY}
  311: Both elements of this structure are pointers to zero-terminated strings.
  312: This is a limiting restriction of the functionality of the
  313: @code{hsearch} functions.  They can only be used for data sets which use
  314: the NUL character always and solely to terminate the records.  It is not
  315: possible to handle general binary data.
  316: 
  317: @table @code
  318: @item char *key
  319: Pointer to a zero-terminated string of characters describing the key for
  320: the search or the element in the hashing table.
  321: @item char *data
  322: Pointer to a zero-terminated string of characters describing the data.
  323: If the functions will be called only for searching an existing entry
  324: this element might stay undefined since it is not used.
  325: @end table
  326: @end deftp
  327: 
  328: @comment search.h
  329: @comment SVID
  330: @deftypefun {ENTRY *} hsearch (ENTRY @var{item}, ACTION @var{action})
  331: To search in a hashing table created using @code{hcreate} the
  332: @code{hsearch} function must be used.  This function can perform simple
  333: search for an element (if @var{action} has the @code{FIND}) or it can
  334: alternatively insert the key element into the hashing table.  Entries
  335: are never replaced.
  336: 
  337: The key is denoted by a pointer to an object of type @code{ENTRY}.  For
  338: locating the corresponding position in the hashing table only the
  339: @code{key} element of the structure is used.
  340: 
  341: If an entry with matching key is found the @var{action} parameter is
  342: irrelevant.  The found entry is returned.  If no matching entry is found
  343: and the @var{action} parameter has the value @code{FIND} the function
  344: returns a @code{NULL} pointer.  If no entry is found and the
  345: @var{action} parameter has the value @code{ENTER} a new entry is added
  346: to the hashing table which is initialized with the parameter @var{item}.
  347: A pointer to the newly added entry is returned.
  348: @end deftypefun
  349: 
  350: As mentioned before the hashing table used by the functions described so
  351: far is global and there can be at any time at most one hashing table in
  352: the program.  A solution is to use the following functions which are a
  353: GNU extension.  All have in common that they operate on a hashing table
  354: which is described by the content of an object of the type @code{struct
  355: hsearch_data}.  This type should be treated as opaque, none of its
  356: members should be changed directly.
  357: 
  358: @comment search.h
  359: @comment GNU
  360: @deftypefun int hcreate_r (size_t @var{nel}, struct hsearch_data *@var{htab})
  361: The @code{hcreate_r} function initializes the object pointed to by
  362: @var{htab} to contain a hashing table with at least @var{nel} elements.
  363: So this function is equivalent to the @code{hcreate} function except
  364: that the initialized data structure is controlled by the user.
  365: 
  366: This allows having more than one hashing table at one time.  The memory
  367: necessary for the @code{struct hsearch_data} object can be allocated
  368: dynamically.  It must be initialized with zero before calling this
  369: function.
  370: 
  371: The return value is non-zero if the operation was successful.  If the
  372: return value is zero, something went wrong, which probably means the
  373: programs ran out of memory.
  374: @end deftypefun
  375: 
  376: @comment search.h
  377: @comment GNU
  378: @deftypefun void hdestroy_r (struct hsearch_data *@var{htab})
  379: The @code{hdestroy_r} function frees all resources allocated by the
  380: @code{hcreate_r} function for this very same object @var{htab}.  As for
  381: @code{hdestroy} it is the programs responsibility to free the strings
  382: for the elements of the table.
  383: @end deftypefun
  384: 
  385: @comment search.h
  386: @comment GNU
  387: @deftypefun int hsearch_r (ENTRY @var{item}, ACTION @var{action}, ENTRY **@var{retval}, struct hsearch_data *@var{htab})
  388: The @code{hsearch_r} function is equivalent to @code{hsearch}.  The
  389: meaning of the first two arguments is identical.  But instead of
  390: operating on a single global hashing table the function works on the
  391: table described by the object pointed to by @var{htab} (which is
  392: initialized by a call to @code{hcreate_r}).
  393: 
  394: Another difference to @code{hcreate} is that the pointer to the found
  395: entry in the table is not the return value of the functions.  It is
  396: returned by storing it in a pointer variables pointed to by the
  397: @var{retval} parameter.  The return value of the function is an integer
  398: value indicating success if it is non-zero and failure if it is zero.
  399: In the latter case the global variable @var{errno} signals the reason for
  400: the failure.
  401: 
  402: @table @code
  403: @item ENOMEM
  404: The table is filled and @code{hsearch_r} was called with an so far
  405: unknown key and @var{action} set to @code{ENTER}.
  406: @item ESRCH
  407: The @var{action} parameter is @code{FIND} and no corresponding element
  408: is found in the table.
  409: @end table
  410: @end deftypefun
  411: 
  412: 
  413: @node Tree Search Function
  414: @section The @code{tsearch} function.
  415: 
  416: Another common form to organize data for efficient search is to use
  417: trees.  The @code{tsearch} function family provides a nice interface to
  418: functions to organize possibly large amounts of data by providing a mean
  419: access time proportional to the logarithm of the number of elements.
  420: The GNU C library implementation even guarantees that this bound is
  421: never exceeded even for input data which cause problems for simple
  422: binary tree implementations.
  423: 
  424: The functions described in the chapter are all described in the @w{System
  425: V} and X/Open specifications and are therefore quite portable.
  426: 
  427: In contrast to the @code{hsearch} functions the @code{tsearch} functions
  428: can be used with arbitrary data and not only zero-terminated strings.
  429: 
  430: The @code{tsearch} functions have the advantage that no function to
  431: initialize data structures is necessary.  A simple pointer of type
  432: @code{void *} initialized to @code{NULL} is a valid tree and can be
  433: extended or searched.  The prototypes for these functions can be found
  434: in the header file @file{search.h}.
  435: 
  436: @comment search.h
  437: @comment SVID
  438: @deftypefun {void *} tsearch (const void *@var{key}, void **@var{rootp}, comparison_fn_t @var{compar})
  439: The @code{tsearch} function searches in the tree pointed to by
  440: @code{*@var{rootp}} for an element matching @var{key}.  The function
  441: pointed to by @var{compar} is used to determine whether two elements
  442: match.  @xref{Comparison Functions}, for a specification of the functions
  443: which can be used for the @var{compar} parameter.
  444: 
  445: If the tree does not contain a matching entry the @var{key} value will
  446: be added to the tree.  @code{tsearch} does not make a copy of the object
  447: pointed to by @var{key} (how could it since the size is unknown).
  448: Instead it adds a reference to this object which means the object must
  449: be available as long as the tree data structure is used.
  450: 
  451: The tree is represented by a pointer to a pointer since it is sometimes
  452: necessary to change the root node of the tree.  So it must not be
  453: assumed that the variable pointed to by @var{rootp} has the same value
  454: after the call.  This also shows that it is not safe to call the
  455: @code{tsearch} function more than once at the same time using the same
  456: tree.  It is no problem to run it more than once at a time on different
  457: trees.
  458: 
  459: The return value is a pointer to the matching element in the tree.  If a
  460: new element was created the pointer points to the new data (which is in
  461: fact @var{key}).  If an entry had to be created and the program ran out
  462: of space @code{NULL} is returned.
  463: @end deftypefun
  464: 
  465: @comment search.h
  466: @comment SVID
  467: @deftypefun {void *} tfind (const void *@var{key}, void *const *@var{rootp}, comparison_fn_t @var{compar})
  468: The @code{tfind} function is similar to the @code{tsearch} function.  It
  469: locates an element matching the one pointed to by @var{key} and returns
  470: a pointer to this element.  But if no matching element is available no
  471: new element is entered (note that the @var{rootp} parameter points to a
  472: constant pointer).  Instead the function returns @code{NULL}.
  473: @end deftypefun
  474: 
  475: Another advantage of the @code{tsearch} function in contrast to the
  476: @code{hsearch} functions is that there is an easy way to remove
  477: elements.
  478: 
  479: @comment search.h
  480: @comment SVID
  481: @deftypefun {void *} tdelete (const void *@var{key}, void **@var{rootp}, comparison_fn_t @var{compar})
  482: To remove a specific element matching @var{key} from the tree
  483: @code{tdelete} can be used.  It locates the matching element using the
  484: same method as @code{tfind}.  The corresponding element is then removed
  485: and a pointer to the parent of the deleted node is returned by the
  486: function.  If there is no matching entry in the tree nothing can be
  487: deleted and the function returns @code{NULL}.  If the root of the tree
  488: is deleted @code{tdelete} returns some unspecified value not equal to
  489: @code{NULL}.
  490: @end deftypefun
  491: 
  492: @comment search.h
  493: @comment GNU
  494: @deftypefun void tdestroy (void *@var{vroot}, __free_fn_t @var{freefct})
  495: If the complete search tree has to be removed one can use
  496: @code{tdestroy}.  It frees all resources allocated by the @code{tsearch}
  497: function to generate the tree pointed to by @var{vroot}.
  498: 
  499: For the data in each tree node the function @var{freefct} is called.
  500: The pointer to the data is passed as the argument to the function.  If
  501: no such work is necessary @var{freefct} must point to a function doing
  502: nothing.  It is called in any case.
  503: 
  504: This function is a GNU extension and not covered by the @w{System V} or
  505: X/Open specifications.
  506: @end deftypefun
  507: 
  508: In addition to the function to create and destroy the tree data
  509: structure, there is another function which allows you to apply a
  510: function to all elements of the tree.  The function must have this type:
  511: 
  512: @smallexample
  513: void __action_fn_t (const void *nodep, VISIT value, int level);
  514: @end smallexample
  515: 
  516: The @var{nodep} is the data value of the current node (once given as the
  517: @var{key} argument to @code{tsearch}).  @var{level} is a numeric value
  518: which corresponds to the depth of the current node in the tree.  The
  519: root node has the depth @math{0} and its children have a depth of
  520: @math{1} and so on.  The @code{VISIT} type is an enumeration type.
  521: 
  522: @deftp {Data Type} VISIT
  523: The @code{VISIT} value indicates the status of the current node in the
  524: tree and how the function is called.  The status of a node is either
  525: `leaf' or `internal node'.  For each leaf node the function is called
  526: exactly once, for each internal node it is called three times: before
  527: the first child is processed, after the first child is processed and
  528: after both children are processed.  This makes it possible to handle all
  529: three methods of tree traversal (or even a combination of them).
  530: 
  531: @table @code
  532: @item preorder
  533: The current node is an internal node and the function is called before
  534: the first child was processed.
  535: @item postorder
  536: The current node is an internal node and the function is called after
  537: the first child was processed.
  538: @item endorder
  539: The current node is an internal node and the function is called after
  540: the second child was processed.
  541: @item leaf
  542: The current node is a leaf.
  543: @end table
  544: @end deftp
  545: 
  546: @comment search.h
  547: @comment SVID
  548: @deftypefun void twalk (const void *@var{root}, __action_fn_t @var{action})
  549: For each node in the tree with a node pointed to by @var{root}, the
  550: @code{twalk} function calls the function provided by the parameter
  551: @var{action}.  For leaf nodes the function is called exactly once with
  552: @var{value} set to @code{leaf}.  For internal nodes the function is
  553: called three times, setting the @var{value} parameter or @var{action} to
  554: the appropriate value.  The @var{level} argument for the @var{action}
  555: function is computed while descending the tree with increasing the value
  556: by one for the descend to a child, starting with the value @math{0} for
  557: the root node.
  558: 
  559: Since the functions used for the @var{action} parameter to @code{twalk}
  560: must not modify the tree data, it is safe to run @code{twalk} in more
  561: than one thread at the same time, working on the same tree.  It is also
  562: safe to call @code{tfind} in parallel.  Functions which modify the tree
  563: must not be used, otherwise the behavior is undefined.
  564: @end deftypefun
1
Syntax (Markdown)