
@node Pattern Matching, I/O Overview, Searching and Sorting, Top
@c %MENU% Matching shell ``globs'' and regular expressions
@chapter Pattern Matching

The GNU C Library provides pattern matching facilities for two kinds of
patterns: regular expressions and file-name wildcards. The library also
provides a facility for expanding variable and command references and
parsing text into words in the way the shell does.

@menu
* Wildcard Matching::    Matching a wildcard pattern against a single string.
* Globbing::             Finding the files that match a wildcard pattern.
* Regular Expressions::  Matching regular expressions against strings.
* Word Expansion::       Expanding shell variables, nested commands,
                         arithmetic, and wildcards.
                         This is what the shell does with shell commands.
@end menu

@node Wildcard Matching
@section Wildcard Matching

@pindex fnmatch.h
This section describes how to match a wildcard pattern against a
particular string. The result is a yes or no answer: does the
string fit the pattern or not. The symbols described here are all
declared in @file{fnmatch.h}.

@comment fnmatch.h
@comment POSIX.2
@deftypefun int fnmatch (const char *@var{pattern}, const char *@var{string}, int @var{flags})
This function tests whether the string @var{string} matches the pattern
@var{pattern}. It returns @code{0} if they do match; otherwise, it
returns the nonzero value @code{FNM_NOMATCH}. The arguments
@var{pattern} and @var{string} are both strings.

The argument @var{flags} is a combination of flag bits that alter the
details of matching. See below for a list of the defined flags.

In the GNU C Library, @code{fnmatch} cannot experience an ``error''---it
always returns an answer for whether the match succeeds. However, other
implementations of @code{fnmatch} might sometimes report ``errors''.
They would do so by returning nonzero values that are not equal to
@code{FNM_NOMATCH}.
@end deftypefun

These are the available flags for the @var{flags} argument:

@table @code
@comment fnmatch.h
@comment GNU
@item FNM_FILE_NAME
Treat the @samp{/} character specially, for matching file names. If
this flag is set, wildcard constructs in @var{pattern} cannot match
@samp{/} in @var{string}. Thus, the only way to match @samp{/} is with
an explicit @samp{/} in @var{pattern}.

@comment fnmatch.h
@comment POSIX.2
@item FNM_PATHNAME
This is an alias for @code{FNM_FILE_NAME}; it comes from POSIX.2. We
don't recommend this name because we don't use the term ``pathname'' for
file names.

@comment fnmatch.h
@comment POSIX.2
@item FNM_PERIOD
Treat the @samp{.} character specially if it appears at the beginning of
@var{string}. If this flag is set, wildcard constructs in @var{pattern}
cannot match @samp{.} as the first character of @var{string}.

If you set both @code{FNM_PERIOD} and @code{FNM_FILE_NAME}, then the
special treatment applies to @samp{.} following @samp{/} as well as to
@samp{.} at the beginning of @var{string}. (The shell uses the
@code{FNM_PERIOD} and @code{FNM_FILE_NAME} flags together for matching
file names.)

@comment fnmatch.h
@comment POSIX.2
@item FNM_NOESCAPE
Don't treat the @samp{\} character specially in patterns. Normally,
@samp{\} quotes the following character, turning off its special meaning
(if any) so that it matches only itself. When quoting is enabled, the
pattern @samp{\?} matches only the string @samp{?}, because the question
mark in the pattern acts like an ordinary character.

If you use @code{FNM_NOESCAPE}, then @samp{\} is an ordinary character.

@comment fnmatch.h
@comment GNU
@item FNM_LEADING_DIR
Ignore a trailing sequence of characters starting with a @samp{/} in
@var{string}; that is to say, test whether @var{string} starts with a
directory name that @var{pattern} matches.

If this flag is set, either @samp{foo*} or @samp{foobar} as a pattern
would match the string @samp{foobar/frobozz}.

@comment fnmatch.h
@comment GNU
@item FNM_CASEFOLD
Ignore case in comparing @var{string} to @var{pattern}.

@comment fnmatch.h
@comment GNU
@item FNM_EXTMATCH
@cindex Korn Shell
@pindex ksh
Besides the normal patterns, also recognize the extended patterns
introduced in @file{ksh}. The patterns are written in the form
explained in the following table, where @var{pattern-list} is a
@code{|}-separated list of patterns.

@table @code
@item ?(@var{pattern-list})
The pattern matches if zero or one occurrences of any of the patterns
in the @var{pattern-list} allow matching the input string.

@item *(@var{pattern-list})
The pattern matches if zero or more occurrences of any of the patterns
in the @var{pattern-list} allow matching the input string.

@item +(@var{pattern-list})
The pattern matches if one or more occurrences of any of the patterns
in the @var{pattern-list} allow matching the input string.

@item @@(@var{pattern-list})
The pattern matches if exactly one occurrence of any of the patterns in
the @var{pattern-list} allows matching the input string.

@item !(@var{pattern-list})
The pattern matches if the input string cannot be matched with any of
the patterns in the @var{pattern-list}.
@end table
@end table

@node Globbing
@section Globbing

@cindex globbing
The archetypal use of wildcards is for matching against the files in a
directory, and making a list of all the matches. This is called
@dfn{globbing}.

You could do this using @code{fnmatch}, by reading the directory entries
one by one and testing each one with @code{fnmatch}. But that would be
slow (and complex, since you would have to handle subdirectories by
hand).

The library provides a function @code{glob} to make this particular use
of wildcards convenient. @code{glob} and the other symbols in this
section are declared in @file{glob.h}.

@menu
* Calling Glob::             Basic use of @code{glob}.
* Flags for Globbing::       Flags that enable various options in @code{glob}.
* More Flags for Globbing::  GNU-specific extensions to @code{glob}.
@end menu

@node Calling Glob
@subsection Calling @code{glob}

The result of globbing is a vector of file names (strings). To return
this vector, @code{glob} uses a special data type, @code{glob_t}, which
is a structure. You pass @code{glob} the address of the structure, and
it fills in the structure's fields to tell you about the results.

@comment glob.h
@comment POSIX.2
@deftp {Data Type} glob_t
This data type holds a pointer to a word vector. More precisely, it
records both the address of the word vector and its size. The GNU
implementation contains some more fields which are non-standard
extensions.

@table @code
@item gl_pathc
The number of elements in the vector, excluding the initial null entries
if the @code{GLOB_DOOFFS} flag is used (see @code{gl_offs} below).

@item gl_pathv
The address of the vector. This field has type @w{@code{char **}}.

@item gl_offs
The offset of the first real element of the vector, from its nominal
address in the @code{gl_pathv} field. Unlike the other fields, this
is always an input to @code{glob}, rather than an output from it.

If you use a nonzero offset, then that many elements at the beginning of
the vector are left empty. (The @code{glob} function fills them with
null pointers.)

The @code{gl_offs} field is meaningful only if you use the
@code{GLOB_DOOFFS} flag. Otherwise, the offset is always zero
regardless of what is in this field, and the first real element comes at
the beginning of the vector.

@item gl_closedir
The address of an alternative implementation of the @code{closedir}
function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
the flag parameter. The type of this field is
@w{@code{void (*) (void *)}}.

This is a GNU extension.

@item gl_readdir
The address of an alternative implementation of the @code{readdir}
function used to read the contents of a directory. It is used if the
@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of
this field is @w{@code{struct dirent *(*) (void *)}}.

This is a GNU extension.

@item gl_opendir
The address of an alternative implementation of the @code{opendir}
function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
the flag parameter. The type of this field is
@w{@code{void *(*) (const char *)}}.

This is a GNU extension.

@item gl_stat
The address of an alternative implementation of the @code{stat} function
to get information about an object in the filesystem. It is used if the
@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of
this field is @w{@code{int (*) (const char *, struct stat *)}}.

This is a GNU extension.

@item gl_lstat
The address of an alternative implementation of the @code{lstat}
function to get information about an object in the filesystem, not
following symbolic links. It is used if the @code{GLOB_ALTDIRFUNC} bit
is set in the flag parameter. The type of this field is @code{@w{int
(*) (const char *,} @w{struct stat *)}}.

This is a GNU extension.
@end table
@end deftp

For use in the @code{glob64} function, @file{glob.h} contains another
definition for a very similar type. @code{glob64_t} differs from
@code{glob_t} only in the types of the members @code{gl_readdir},
@code{gl_stat}, and @code{gl_lstat}.

@comment glob.h
@comment GNU
@deftp {Data Type} glob64_t
This data type holds a pointer to a word vector. More precisely, it
records both the address of the word vector and its size. The GNU
implementation contains some more fields which are non-standard
extensions.

@table @code
@item gl_pathc
The number of elements in the vector, excluding the initial null entries
if the @code{GLOB_DOOFFS} flag is used (see @code{gl_offs} below).

@item gl_pathv
The address of the vector. This field has type @w{@code{char **}}.

@item gl_offs
The offset of the first real element of the vector, from its nominal
address in the @code{gl_pathv} field. Unlike the other fields, this
is always an input to @code{glob}, rather than an output from it.

If you use a nonzero offset, then that many elements at the beginning of
the vector are left empty. (The @code{glob} function fills them with
null pointers.)

The @code{gl_offs} field is meaningful only if you use the
@code{GLOB_DOOFFS} flag. Otherwise, the offset is always zero
regardless of what is in this field, and the first real element comes at
the beginning of the vector.

@item gl_closedir
The address of an alternative implementation of the @code{closedir}
function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
the flag parameter. The type of this field is
@w{@code{void (*) (void *)}}.

This is a GNU extension.

@item gl_readdir
The address of an alternative implementation of the @code{readdir64}
function used to read the contents of a directory. It is used if the
@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of
this field is @w{@code{struct dirent64 *(*) (void *)}}.

This is a GNU extension.

@item gl_opendir
The address of an alternative implementation of the @code{opendir}
function. It is used if the @code{GLOB_ALTDIRFUNC} bit is set in
the flag parameter. The type of this field is
@w{@code{void *(*) (const char *)}}.

This is a GNU extension.

@item gl_stat
The address of an alternative implementation of the @code{stat64} function
to get information about an object in the filesystem. It is used if the
@code{GLOB_ALTDIRFUNC} bit is set in the flag parameter. The type of
this field is @w{@code{int (*) (const char *, struct stat64 *)}}.

This is a GNU extension.

@item gl_lstat
The address of an alternative implementation of the @code{lstat64}
function to get information about an object in the filesystem, not
following symbolic links. It is used if the @code{GLOB_ALTDIRFUNC} bit
is set in the flag parameter. The type of this field is @code{@w{int
(*) (const char *,} @w{struct stat64 *)}}.

This is a GNU extension.
@end table
@end deftp

@comment glob.h
@comment POSIX.2
@deftypefun int glob (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob_t *@var{vector-ptr})
The function @code{glob} does globbing using the pattern @var{pattern}
in the current directory. It puts the result in a newly allocated
vector, and stores the size and address of this vector into
@code{*@var{vector-ptr}}. The argument @var{flags} is a combination of
bit flags; see @ref{Flags for Globbing}, for details of the flags.

The result of globbing is a sequence of file names. The function
@code{glob} allocates a string for each resulting word, then
allocates a vector of type @code{char **} to store the addresses of
these strings. The last element of the vector is a null pointer.
This vector is called the @dfn{word vector}.

To return this vector, @code{glob} stores both its address and its
length (number of elements, not counting the terminating null pointer)
into @code{*@var{vector-ptr}}.

Normally, @code{glob} sorts the file names alphabetically before
returning them. You can turn this off with the flag @code{GLOB_NOSORT}
if you want to get the information as fast as possible. Usually it's
a good idea to let @code{glob} sort them---if you process the files in
alphabetical order, the users will have a feel for the rate of progress
that your application is making.

If @code{glob} succeeds, it returns 0. Otherwise, it returns one
of these error codes:

@vtable @code
@comment glob.h
@comment POSIX.2
@item GLOB_ABORTED
There was an error opening a directory, and you used the flag
@code{GLOB_ERR} or your specified @var{errfunc} returned a nonzero
value.
@iftex
See below
@end iftex
@ifinfo
@xref{Flags for Globbing},
@end ifinfo
for an explanation of the @code{GLOB_ERR} flag and @var{errfunc}.

@comment glob.h
@comment POSIX.2
@item GLOB_NOMATCH
The pattern didn't match any existing files. If you use the
@code{GLOB_NOCHECK} flag, then you never get this error code, because
that flag tells @code{glob} to @emph{pretend} that the pattern matched
at least one file.

@comment glob.h
@comment POSIX.2
@item GLOB_NOSPACE
It was impossible to allocate memory to hold the result.
@end vtable

In the event of an error, @code{glob} stores information in
@code{*@var{vector-ptr}} about all the matches it has found so far.

It is important to notice that the @code{glob} function will not fail if
it encounters directories or files which cannot be handled without the
LFS interfaces. The implementation of @code{glob} is supposed to use
these functions internally. This, at least, is the assumption made by
the Unix standard. The GNU extension of allowing the user to provide
their own directory-handling and @code{stat} functions complicates
things a bit. If these callback functions are used and a large file or
directory is encountered, @code{glob} @emph{can} fail.
@end deftypefun

@comment glob.h
@comment GNU
@deftypefun int glob64 (const char *@var{pattern}, int @var{flags}, int (*@var{errfunc}) (const char *@var{filename}, int @var{error-code}), glob64_t *@var{vector-ptr})
The @code{glob64} function was added as part of the Large File Summit
extensions but is not part of the original LFS proposal. The reason for
this is simple: it is not necessary.
The necessity for a @code{glob64}
function arises from the extensions of the GNU @code{glob}
implementation, which allow the user to provide their own
directory-handling and @code{stat} functions. The @code{readdir} and
@code{stat} functions do depend on the choice of
@code{_FILE_OFFSET_BITS} since the definition of the types
@code{struct dirent} and @code{struct stat} will change depending on
the choice.

Besides this difference, @code{glob64} works just like @code{glob} in
all respects.

This function is a GNU extension.
@end deftypefun

@node Flags for Globbing
@subsection Flags for Globbing

This section describes the flags that you can specify in the
@var{flags} argument to @code{glob}. Choose the flags you want,
and combine them with the C bitwise OR operator @code{|}.

@vtable @code
@comment glob.h
@comment POSIX.2
@item GLOB_APPEND
Append the words from this expansion to the vector of words produced by
previous calls to @code{glob}. This way you can effectively expand
several words as if they were concatenated with spaces between them.

In order for appending to work, you must not modify the contents of the
word vector structure between calls to @code{glob}. And, if you set
@code{GLOB_DOOFFS} in the first call to @code{glob}, you must also
set it when you append to the results.

Note that the pointer stored in @code{gl_pathv} may no longer be valid
after you call @code{glob} the second time, because @code{glob} might
have relocated the vector. So always fetch @code{gl_pathv} from the
@code{glob_t} structure after each @code{glob} call; @strong{never} save
the pointer across calls.

@comment glob.h
@comment POSIX.2
@item GLOB_DOOFFS
Leave blank slots at the beginning of the vector of words.
The @code{gl_offs} field says how many slots to leave.
The blank slots contain null pointers.

@comment glob.h
@comment POSIX.2
@item GLOB_ERR
Give up right away and report an error if there is any difficulty
reading the directories that must be read in order to expand @var{pattern}
fully. Such difficulties might include a directory in which you don't
have the requisite access. Normally, @code{glob} tries its best to keep
on going despite any errors, reading whatever directories it can.

You can exercise even more control than this by specifying an
error-handler function @var{errfunc} when you call @code{glob}. If
@var{errfunc} is not a null pointer, then @code{glob} doesn't give up
right away when it can't read a directory; instead, it calls
@var{errfunc} with two arguments, like this:

@smallexample
(*@var{errfunc}) (@var{filename}, @var{error-code})
@end smallexample

@noindent
The argument @var{filename} is the name of the directory that
@code{glob} couldn't open or couldn't read, and @var{error-code} is the
@code{errno} value that was reported to @code{glob}.

If the error handler function returns nonzero, then @code{glob} gives up
right away. Otherwise, it continues.

@comment glob.h
@comment POSIX.2
@item GLOB_MARK
If the pattern matches the name of a directory, append @samp{/} to the
directory's name when returning it.

@comment glob.h
@comment POSIX.2
@item GLOB_NOCHECK
If the pattern doesn't match any file names, return the pattern itself
as if it were a file name that had been matched. (Normally, when the
pattern doesn't match anything, @code{glob} returns that there were no
matches.)

@comment glob.h
@comment POSIX.2
@item GLOB_NOSORT
Don't sort the file names; return them in no particular order.
(In practice, the order will depend on the order of the entries in
the directory.) The only reason @emph{not} to sort is to save time.

@comment glob.h
@comment POSIX.2
@item GLOB_NOESCAPE
Don't treat the @samp{\} character specially in patterns. Normally,
@samp{\} quotes the following character, turning off its special meaning
(if any) so that it matches only itself. When quoting is enabled, the
pattern @samp{\?} matches only the string @samp{?}, because the question
mark in the pattern acts like an ordinary character.

If you use @code{GLOB_NOESCAPE}, then @samp{\} is an ordinary character.

@code{glob} does its work by calling the function @code{fnmatch}
repeatedly. It handles the flag @code{GLOB_NOESCAPE} by turning on the
@code{FNM_NOESCAPE} flag in calls to @code{fnmatch}.
@end vtable

@node More Flags for Globbing
@subsection More Flags for Globbing

Besides the flags described in the last section, the GNU implementation of
@code{glob} allows a few more flags which are also defined in the
@file{glob.h} file. Some of the extensions implement functionality
which is available in modern shell implementations.

@vtable @code
@comment glob.h
@comment GNU
@item GLOB_PERIOD
The @code{.} character (period) is treated specially. It cannot be
matched by wildcards. @xref{Wildcard Matching}, @code{FNM_PERIOD}.

@comment glob.h
@comment GNU
@item GLOB_MAGCHAR
The @code{GLOB_MAGCHAR} value is not to be given to @code{glob} in the
@var{flags} parameter. Instead, @code{glob} sets this bit in the
@code{gl_flags} element of the @code{glob_t} structure provided as the
result if the pattern used for matching contains any wildcard character.

@comment glob.h
@comment GNU
@item GLOB_ALTDIRFUNC
Instead of using the normal functions for accessing the filesystem,
the @code{glob} implementation uses the user-supplied functions
specified in the structure pointed to by the @var{pglob} parameter.
For more information about these functions, refer to the sections
about directory handling, @ref{Accessing Directories}, and
@ref{Reading Attributes}.

@comment glob.h
@comment GNU
@item GLOB_BRACE
If this flag is given, the handling of braces in the pattern is changed.
Braces must then appear correctly grouped; i.e., for each opening brace
there must be a closing one. Braces can be nested, so it is possible to
define one brace expression inside another. Note that each brace
expression must be completely contained in the outer brace expression
(if there is one).

The string between the matching braces is separated into single
expressions by splitting at @code{,} (comma) characters. The commas
themselves are discarded. Because brace expressions can be nested, only
commas at the same nesting level separate the subexpressions. Commas
inside a nested brace expression are not matched at the outer level;
they are used when the deeper brace expression is expanded. For
example, the call

@smallexample
glob ("@{foo/@{,bar,biz@},baz@}", GLOB_BRACE, NULL, &result)
@end smallexample

@noindent
is equivalent to the sequence

@smallexample
glob ("foo/", GLOB_BRACE, NULL, &result)
glob ("foo/bar", GLOB_BRACE|GLOB_APPEND, NULL, &result)
glob ("foo/biz", GLOB_BRACE|GLOB_APPEND, NULL, &result)
glob ("baz", GLOB_BRACE|GLOB_APPEND, NULL, &result)
@end smallexample

@noindent
if we leave aside error handling.

@comment glob.h
@comment GNU
@item GLOB_NOMAGIC
If the pattern contains no wildcard constructs (it is a literal file name),
return it as the sole ``matching'' word, even if no file exists by that name.

@comment glob.h
@comment GNU
@item GLOB_TILDE
If this flag is used, the character @code{~} (tilde) is handled specially
if it appears at the beginning of the pattern. Instead of being taken
verbatim, it is used to represent the home directory of a known user.

If @code{~} is the only character in the pattern or it is followed by a
@code{/} (slash), the home directory of the process owner is
substituted. Using @code{getlogin} and @code{getpwnam}, the information
is read from the system databases. As an example, take user @code{bart}
with his home directory at @file{/home/bart}. For him a call like

@smallexample
glob ("~/bin/*", GLOB_TILDE, NULL, &result)
@end smallexample

@noindent
would return the contents of the directory @file{/home/bart/bin}.
Instead of referring to one's own home directory, it is also possible to
name the home directory of another user. To do so, append the
user name after the tilde character. So the contents of user
@code{homer}'s @file{bin} directory can be retrieved by

@smallexample
glob ("~homer/bin/*", GLOB_TILDE, NULL, &result)
@end smallexample

If the user name is not valid or the home directory cannot be determined
for some reason, the pattern is left untouched and itself used as the
result. I.e., if in the last example @code{homer} is not available, the
tilde expansion yields @code{"~homer/bin/*"} and @code{glob} is not
looking for a directory named @code{~homer}.

This functionality is equivalent to what is available in C-shells if the
@code{nonomatch} flag is set.

@comment glob.h
@comment GNU
@item GLOB_TILDE_CHECK
If this flag is used, @code{glob} behaves as if @code{GLOB_TILDE} were
given. The only difference is that if the user name is not available or
the home directory cannot be determined for other reasons, this leads to
an error. @code{glob} will return @code{GLOB_NOMATCH} instead of using
the pattern itself as the name.

This functionality is equivalent to what is available in C-shells if
the @code{nonomatch} flag is not set.

@comment glob.h
@comment GNU
@item GLOB_ONLYDIR
If this flag is used, the globbing function takes this as a
@strong{hint} that the caller is only interested in directories
matching the pattern. If the information about the type of the file
is easily available, non-directories will be rejected, but no extra
work will be done to determine the information for each file. I.e.,
the caller must still be able to filter directories out.

This functionality is only available with the GNU @code{glob}
implementation. It is mainly used internally to increase the
performance but might be useful for a user as well and therefore is
documented here.
@end vtable

Calling @code{glob} will in most cases allocate resources which are used
to represent the result of the function call. If the same object of
type @code{glob_t} is used in multiple calls to @code{glob}, the
resources are freed or reused so that no leaks appear. But this does
not cover the time after all @code{glob} calls are done.

@comment glob.h
@comment POSIX.2
@deftypefun void globfree (glob_t *@var{pglob})
The @code{globfree} function frees all resources allocated by previous
calls to @code{glob} associated with the object pointed to by
@var{pglob}. This function should be called whenever the currently used
@code{glob_t} typed object isn't used anymore.
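
The pairing of @code{glob} and @code{globfree} can be sketched as
follows. This is only an illustration, not part of the library: the
helper name @code{print_matches} and its return convention are
assumptions made for the example, and it passes no flags and no error
handler.

```c
#include <assert.h>
#include <glob.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not a library function): print every file name
   that matches PATTERN, one per line.  Returns the number of matches,
   0 if nothing matched, or -1 if glob reported another error.  */
int
print_matches (const char *pattern)
{
  glob_t results;
  int status;
  int count = 0;
  size_t i;

  assert (pattern != NULL);

  /* No flags and no error handler: glob sorts the names and keeps
     going past directories it cannot read.  */
  status = glob (pattern, 0, NULL, &results);

  if (status == 0)
    {
      /* gl_pathv holds gl_pathc strings followed by a null pointer.  */
      for (i = 0; i < results.gl_pathc; i++)
        puts (results.gl_pathv[i]);
      count = (int) results.gl_pathc;
    }
  else if (status != GLOB_NOMATCH)
    count = -1;

  /* Release whatever glob allocated, on success and on failure.  */
  globfree (&results);
  return count;
}
```

A caller might write @code{print_matches ("*.c")} to list the C source
files in the current directory. The sketch calls @code{globfree} on
every path because, as described above, @code{glob} may store partial
results even when it reports an error.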
@end deftypefun

@comment glob.h
@comment GNU
@deftypefun void globfree64 (glob64_t *@var{pglob})
This function is equivalent to @code{globfree} but it frees records of
type @code{glob64_t} which were allocated by @code{glob64}.
@end deftypefun


@node Regular Expressions
@section Regular Expression Matching

The GNU C library supports two interfaces for matching regular
expressions. One is the standard POSIX.2 interface, and the other is
what the GNU system has had for many years.

Both interfaces are declared in the header file @file{regex.h}.
If you define @w{@code{_POSIX_C_SOURCE}}, then only the POSIX.2
functions, structures, and constants are declared.
@c !!! we only document the POSIX.2 interface here!!

@menu
* POSIX Regexp Compilation::    Using @code{regcomp} to prepare to match.
* Flags for POSIX Regexps::     Syntax variations for @code{regcomp}.
* Matching POSIX Regexps::      Using @code{regexec} to match the compiled
                                pattern that you get from @code{regcomp}.
* Regexp Subexpressions::       Finding which parts of the string were matched.
* Subexpression Complications:: Find points of which parts were matched.
* Regexp Cleanup::              Freeing storage; reporting errors.
@end menu

@node POSIX Regexp Compilation
@subsection POSIX Regular Expression Compilation

Before you can actually match a regular expression, you must
@dfn{compile} it. This is not true compilation---it produces a special
data structure, not machine instructions. But it is like ordinary
compilation in that its purpose is to enable you to ``execute'' the
pattern fast. (@xref{Matching POSIX Regexps}, for how to use the
compiled regular expression for matching.)
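
The compile-then-execute sequence described above might look like the
following sketch. The helper name @code{matches_regex} and its return
convention are assumptions made for this illustration. The example
uses the standard POSIX flags @code{REG_EXTENDED} and @code{REG_NOSUB}
and releases the compiled pattern with @code{regfree}.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): return 1 if TEXT
   matches the POSIX extended regular expression PATTERN, 0 if it does
   not, and -1 if PATTERN cannot be compiled or matching fails.  */
int
matches_regex (const char *pattern, const char *text)
{
  regex_t compiled;
  int status;

  assert (pattern != NULL && text != NULL);

  /* Step 1: compile.  REG_NOSUB asks only for a yes-or-no answer,
     so no subexpression match data is recorded.  */
  if (regcomp (&compiled, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  /* Step 2: execute the compiled pattern against the text.  With
     REG_NOSUB the match array arguments can be 0 and NULL.  */
  status = regexec (&compiled, text, 0, NULL, 0);

  /* Step 3: free the storage held by the compiled pattern.  */
  regfree (&compiled);

  if (status == 0)
    return 1;
  if (status == REG_NOMATCH)
    return 0;
  return -1;
}
```

A program that matches one pattern against many strings would normally
compile once and call @code{regexec} repeatedly, rather than
recompiling for each string as this small helper does.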

There is a special data type for compiled regular expressions:

@comment regex.h
@comment POSIX.2
@deftp {Data Type} regex_t
This type of object holds a compiled regular expression.
It is actually a structure. It has just one field that your programs
should look at:

@table @code
@item re_nsub
This field holds the number of parenthetical subexpressions in the
regular expression that was compiled.
@end table

There are several other fields, but we don't describe them here, because
only the functions in the library should use them.
@end deftp

After you create a @code{regex_t} object, you can compile a regular
expression into it by calling @code{regcomp}.

@comment regex.h
@comment POSIX.2
@deftypefun int regcomp (regex_t *restrict @var{compiled}, const char *restrict @var{pattern}, int @var{cflags})
The function @code{regcomp} ``compiles'' a regular expression into a
data structure that you can use with @code{regexec} to match against a
string. The compiled regular expression format is designed for
efficient matching. @code{regcomp} stores it into @code{*@var{compiled}}.

It's up to you to allocate an object of type @code{regex_t} and pass its
address to @code{regcomp}.

The argument @var{cflags} lets you specify various options that control
the syntax and semantics of regular expressions. @xref{Flags for POSIX
Regexps}.

If you use the flag @code{REG_NOSUB}, then @code{regcomp} omits from
the compiled regular expression the information necessary to record
how subexpressions actually match. In this case, you might as well
pass @code{0} for the @var{matchptr} and @var{nmatch} arguments when
you call @code{regexec}.

If you don't use @code{REG_NOSUB}, then the compiled regular expression
does have the capacity to record how subexpressions match.
Also,
@code{regcomp} tells you how many subexpressions @var{pattern} has, by
storing the number in @code{@var{compiled}->re_nsub}. You can use that
value to decide how long an array to allocate to hold information about
subexpression matches.

@code{regcomp} returns @co