
@node I/O Overview, I/O on Streams, Pattern Matching, Top
@c %MENU% Introduction to the I/O facilities
@chapter Input/Output Overview

Most programs need to do either input (reading data) or output (writing
data), or most frequently both, in order to do anything useful. The GNU
C library provides such a large selection of input and output functions
that the hardest part is often deciding which function is most
appropriate!

This chapter introduces concepts and terminology relating to input
and output. Other chapters relating to the GNU I/O facilities are:

@itemize @bullet
@item
@ref{I/O on Streams}, which covers the high-level functions
that operate on streams, including formatted input and output.

@item
@ref{Low-Level I/O}, which covers the basic I/O and control
functions on file descriptors.

@item
@ref{File System Interface}, which covers functions for operating on
directories and for manipulating file attributes such as access modes
and ownership.

@item
@ref{Pipes and FIFOs}, which includes information on the basic interprocess
communication facilities.

@item
@ref{Sockets}, which covers a more complicated interprocess communication
facility with support for networking.

@item
@ref{Low-Level Terminal Interface}, which covers functions for changing
how input and output to terminals or other serial devices are processed.
@end itemize


@menu
* I/O Concepts::       Some basic information and terminology.
* File Names::         How to refer to a file.
@end menu

@node I/O Concepts, File Names, , I/O Overview
@section Input/Output Concepts

Before you can read or write the contents of a file, you must establish
a connection or communications channel to the file. This process is
called @dfn{opening} the file. You can open a file for reading, writing,
or both.
@cindex opening a file

The connection to an open file is represented either as a stream or as a
file descriptor. You pass this as an argument to the functions that do
the actual read or write operations, to tell them which file to operate
on. Certain functions expect streams, and others are designed to
operate on file descriptors.

When you have finished reading from or writing to the file, you can
terminate the connection by @dfn{closing} the file. Once you have
closed a stream or file descriptor, you cannot do any more input or
output operations on it.

@menu
* Streams and File Descriptors::    The GNU Library provides two ways
                                     to access the contents of files.
* File Position::                   The number of bytes from the
                                     beginning of the file.
@end menu

@node Streams and File Descriptors, File Position, , I/O Concepts
@subsection Streams and File Descriptors

When you want to do input or output to a file, you have a choice of two
basic mechanisms for representing the connection between your program
and the file: file descriptors and streams. File descriptors are
represented as objects of type @code{int}, while streams are represented
as @code{FILE *} objects.

File descriptors provide a primitive, low-level interface to input and
output operations. Both file descriptors and streams can represent a
connection to a device (such as a terminal), or a pipe or socket for
communicating with another process, as well as a normal file. But, if
you want to do control operations that are specific to a particular kind
of device, you must use a file descriptor; there are no facilities to
use streams in this way. You must also use file descriptors if your
program needs to do input or output in special modes, such as
nonblocking (or polled) input (@pxref{File Status Flags}).
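
As a minimal sketch of the two mechanisms, the helpers below read the
first byte of a file, once through a stream and once through a file
descriptor. The names @code{first_byte_stream} and @code{first_byte_fd}
are hypothetical and are not part of the library. The descriptor version
also sets @code{O_NONBLOCK} with @code{fcntl}, a control operation that
has no stream counterpart. For an ordinary file the flag normally makes
no visible difference, so it appears here only to show the call.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: return the first byte of FILENAME, read
   through a stream, or -1 if the file cannot be opened or is empty.  */
int
first_byte_stream (const char *filename)
{
  FILE *stream = fopen (filename, "r");
  if (stream == NULL)
    return -1;

  int c = getc (stream);
  fclose (stream);              /* No more I/O on STREAM after this.  */
  return c == EOF ? -1 : c;
}

/* Hypothetical helper: the same operation through a file descriptor.  */
int
first_byte_fd (const char *filename)
{
  int fd = open (filename, O_RDONLY);
  if (fd < 0)
    return -1;

  /* A mode change that is only possible on a descriptor:
     add O_NONBLOCK to the file status flags.  */
  int flags = fcntl (fd, F_GETFL);
  if (flags == -1 || fcntl (fd, F_SETFL, flags | O_NONBLOCK) == -1)
    {
      close (fd);
      return -1;
    }

  unsigned char byte;
  ssize_t n = read (fd, &byte, 1);
  close (fd);                   /* No more I/O on FD after this.  */
  return n == 1 ? byte : -1;
}
```

Both helpers return the byte value on success and @code{-1} otherwise,
so a caller can use either one without knowing which mechanism is
underneath.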

Streams provide a higher-level interface, layered on top of the
primitive file descriptor facilities. The stream interface treats all
kinds of files pretty much alike---the sole exception being the three
styles of buffering that you can choose (@pxref{Stream Buffering}).

The main advantage of using the stream interface is that the set of
functions for performing actual input and output operations (as opposed
to control operations) on streams is much richer and more powerful than
the corresponding facilities for file descriptors. The file descriptor
interface provides only simple functions for transferring blocks of
characters, but the stream interface also provides powerful formatted
input and output functions (@code{printf} and @code{scanf}) as well as
functions for character- and line-oriented input and output.
@c !!! glibc has dprintf, which lets you do printf on an fd.

Since streams are implemented in terms of file descriptors, you can
extract the file descriptor from a stream and perform low-level
operations directly on the file descriptor. You can also initially open
a connection as a file descriptor and then make a stream associated with
that file descriptor.

In general, you should stick with using streams rather than file
descriptors, unless there is some specific operation you want to do that
can only be done on a file descriptor. If you are a beginning
programmer and aren't sure what functions to use, we suggest that you
concentrate on the formatted input functions (@pxref{Formatted Input})
and formatted output functions (@pxref{Formatted Output}).

If you are concerned about portability of your programs to systems other
than GNU, you should also be aware that file descriptors are not as
portable as streams.
You can expect any system that supports @w{ISO C} to
support streams, but non-GNU systems may not support file descriptors at
all, or may only implement a subset of the GNU functions that operate on
file descriptors. Most of the file descriptor functions in the GNU
library are included in the POSIX.1 standard, however.

@node File Position, , Streams and File Descriptors, I/O Concepts
@subsection File Position

One of the attributes of an open file is its @dfn{file position}, which
keeps track of where in the file the next character is to be read or
written. In the GNU system, and all POSIX.1 systems, the file position
is simply an integer representing the number of bytes from the beginning
of the file.

The file position is normally set to the beginning of the file when it
is opened, and each time a character is read or written, the file
position is incremented. In other words, access to the file is normally
@dfn{sequential}.
@cindex file position
@cindex sequential-access files

Ordinary files permit read or write operations at any position within
the file. Some other kinds of files may also permit this. Files that
permit this are sometimes referred to as @dfn{random-access} files.
You can change the file position using the @code{fseek} function on a
stream (@pxref{File Positioning}) or the @code{lseek} function on a file
descriptor (@pxref{I/O Primitives}). If you try to change the file
position on a file that doesn't support random access, you get the
@code{ESPIPE} error.
@cindex random-access files

Streams and descriptors that are opened for @dfn{append access} are
treated specially for output: output to such files is @emph{always}
appended sequentially to the @emph{end} of the file, regardless of the
file position.
However, the file position is still used to control where in
the file reading is done.
@cindex append-access files

If you think about it, you'll realize that several programs can read a
given file at the same time. In order for each program to be able to
read the file at its own pace, each program must have its own file
position, which is not affected by anything the other programs do.

In fact, each opening of a file creates a separate file position.
Thus, if you open a file twice even in the same program, you get two
streams or descriptors with independent file positions.

By contrast, if you open a descriptor and then duplicate it to get
another descriptor, these two descriptors share the same file position:
changing the file position of one descriptor will affect the other.

@node File Names, , I/O Concepts, I/O Overview
@section File Names

In order to open a connection to a file, or to perform other operations
such as deleting a file, you need some way to refer to the file. Nearly
all files have names that are strings---even files which are actually
devices such as tape drives or terminals. These strings are called
@dfn{file names}. You specify the file name to say which file you want
to open or operate on.

This section describes the conventions for file names and how the
operating system works with them.
@cindex file name

@menu
* Directories::                 Directories contain entries for files.
* File Name Resolution::        A file name specifies how to look up a file.
* File Name Errors::            Error conditions relating to file names.
* File Name Portability::       File name portability and syntax issues.
@end menu


@node Directories, File Name Resolution, , File Names
@subsection Directories

In order to understand the syntax of file names, you need to understand
how the file system is organized into a hierarchy of directories.

@cindex directory
@cindex link
@cindex directory entry
A @dfn{directory} is a file that contains information to associate other
files with names; these associations are called @dfn{links} or
@dfn{directory entries}. Sometimes, people speak of ``files in a
directory'', but in reality, a directory only contains pointers to
files, not the files themselves.

@cindex file name component
The name of a file contained in a directory entry is called a @dfn{file
name component}. In general, a file name consists of a sequence of one
or more such components, separated by the slash character (@samp{/}). A
file name which is just one component names a file with respect to its
directory. A file name with multiple components names a directory, and
then a file in that directory, and so on.

Some other documents, such as the POSIX standard, use the term
@dfn{pathname} for what we call a file name, and either @dfn{filename}
or @dfn{pathname component} for what this manual calls a file name
component. We don't use this terminology because a ``path'' is
something completely different (a list of directories to search), and we
think that ``pathname'' used for something else will confuse users. We
always use ``file name'' and ``file name component'' (or sometimes just
``component'', where the context is obvious) in GNU documentation. Some
macros use the POSIX terminology in their names, such as
@code{PATH_MAX}. These macros are defined by the POSIX standard, so we
cannot change their names.
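
The point made earlier in this section, that a directory holds links
rather than the files themselves, can be observed directly. The
following sketch assumes a file system that supports hard links. It
creates a file, gives it a second name with @code{link}, and checks that
both names refer to one file with a link count of two. The helper name
@code{link_count_demo} is hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create the file FIRST, add a second directory
   entry SECOND for the same file, and return the file's link count.
   Return -1 if any step fails or if the two names turn out to refer
   to different files.  Both names are removed before returning.  */
int
link_count_demo (const char *first, const char *second)
{
  struct stat a, b;
  int result = -1;

  int fd = open (first, O_WRONLY | O_CREAT | O_EXCL, 0600);
  if (fd < 0)
    return -1;
  close (fd);

  if (link (first, second) == 0)
    {
      /* Two directory entries, one file: same device and file
         serial number under either name.  */
      if (stat (first, &a) == 0 && stat (second, &b) == 0
          && a.st_dev == b.st_dev && a.st_ino == b.st_ino)
        result = (int) a.st_nlink;
      unlink (second);
    }

  unlink (first);
  return result;
}
```

Removing one of the two names with @code{unlink} deletes only that
directory entry; the file itself goes away once its last link is
removed and no process has it open.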

You can find more detailed information about operations on directories
in @ref{File System Interface}.

@node File Name Resolution, File Name Errors, Directories, File Names
@subsection File Name Resolution

A file name consists of file name components separated by slash
(@samp{/}) characters. On the systems that the GNU C library supports,
multiple successive @samp{/} characters are equivalent to a single
@samp{/} character.

@cindex file name resolution
The process of determining what file a file name refers to is called
@dfn{file name resolution}. This is performed by examining the
components that make up a file name in left-to-right order, and locating
each successive component in the directory named by the previous
component. Of course, each of the files that are referenced as
directories must actually exist, be directories instead of regular
files, and have the appropriate permissions to be accessible by the
process; otherwise the file name resolution fails.

@cindex root directory
@cindex absolute file name
If a file name begins with a @samp{/}, the first component in the file
name is located in the @dfn{root directory} of the process (usually all
processes on the system have the same root directory). Such a file name
is called an @dfn{absolute file name}.
@c !!! xref here to chroot, if we ever document chroot. -rm

@cindex relative file name
Otherwise, the first component in the file name is located in the
current working directory (@pxref{Working Directory}). This kind of
file name is called a @dfn{relative file name}.

@cindex parent directory
The file name components @file{.} (``dot'') and @file{..} (``dot-dot'')
have special meanings. Every directory has entries for these file name
components.
The file name component @file{.} refers to the directory
itself, while the file name component @file{..} refers to its
@dfn{parent directory} (the directory that contains the link for the
directory in question). As a special case, @file{..} in the root
directory refers to the root directory itself, since it has no parent;
thus @file{/..} is the same as @file{/}.

Here are some examples of file names:

@table @file
@item /a
The file named @file{a}, in the root directory.

@item /a/b
The file named @file{b}, in the directory named @file{a} in the root directory.

@item a
The file named @file{a}, in the current working directory.

@item /a/./b
This is the same as @file{/a/b}.

@item ./a
The file named @file{a}, in the current working directory.

@item ../a
The file named @file{a}, in the parent directory of the current working
directory.
@end table

@c An empty string may ``work'', but I think it's confusing to
@c try to describe it. It's not a useful thing for users to use--rms.
A file name that names a directory may optionally end in a @samp{/}.
You can specify a file name of @file{/} to refer to the root directory,
but the empty string is not a meaningful file name. If you want to
refer to the current working directory, use a file name of @file{.} or
@file{./}.

Unlike some other operating systems, the GNU system doesn't have any
built-in support for file types (or extensions) or file versions as part
of its file name syntax. Many programs and utilities use conventions
for file names---for example, files containing C source code usually
have names suffixed with @samp{.c}---but there is nothing in the file
system itself that enforces this kind of convention.
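
The equivalences described in this section, such as @file{/..} naming
the same directory as @file{/}, can be checked from a program. The
sketch below assumes a POSIX system where @code{stat} reports a device
number and a file serial number for each file. The helper name
@code{same_file} is hypothetical.

```c
#include <sys/stat.h>

/* Hypothetical helper: return 1 if NAME1 and NAME2 resolve to the
   same file, 0 if they resolve to different files, and -1 if either
   name cannot be resolved.  */
int
same_file (const char *name1, const char *name2)
{
  struct stat st1, st2;

  /* stat performs file name resolution on each name; it fails if a
     component is missing or a non-final component is not a directory.  */
  if (stat (name1, &st1) != 0 || stat (name2, &st2) != 0)
    return -1;

  /* Two names refer to one file exactly when the device and the
     file serial number both match.  */
  return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}
```

For instance, @code{same_file ("/", "/..")} and
@code{same_file (".", "./")} should both return 1, while a name with a
nonexistent component yields @code{-1}.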

@node File Name Errors, File Name Portability, File Name Resolution, File Names
@subsection File Name Errors

@cindex file name errors
@cindex usual file name errors

Functions that accept file name arguments usually detect these
@code{errno} error conditions relating to the file name syntax or
trouble finding the named file. These errors are referred to throughout
this manual as the @dfn{usual file name errors}.

@table @code
@item EACCES
The process does not have search permission for a directory component
of the file name.

@item ENAMETOOLONG
This error is used when either the total length of a file name is
greater than @code{PATH_MAX}, or when an individual file name component
has a length greater than @code{NAME_MAX}. @xref{Limits for Files}.

In the GNU system, there is no imposed limit on overall file name
length, but some file systems may place limits on the length of a
component.

@item ENOENT
This error is reported when a file referenced as a directory component
in the file name doesn't exist, or when a component is a symbolic link
whose target file does not exist. @xref{Symbolic Links}.

@item ENOTDIR
A file that is referenced as a directory component in the file name
exists, but it isn't a directory.

@item ELOOP
Too many symbolic links were resolved while trying to look up the file
name. The system has an arbitrary limit on the number of symbolic links
that may be resolved in looking up a single file name, as a primitive
way to detect loops. @xref{Symbolic Links}.
@end table


@node File Name Portability, , File Name Errors, File Names
@subsection Portability of File Names

The rules for the syntax of file names discussed in @ref{File Names},
are the rules normally used by the GNU system and by other POSIX
systems.
However, other operating systems may use other conventions.

There are two reasons why it can be important for you to be aware of
file name portability issues:

@itemize @bullet
@item
If your program makes assumptions about file name syntax, or contains
embedded literal file name strings, it is more difficult to get it to
run under other operating systems that use different syntax conventions.

@item
Even if you are not concerned about running your program on machines
that run other operating systems, it may still be possible to access
files that use different naming conventions. For example, you may be
able to access file systems on another computer running a different
operating system over a network, or read and write disks in formats used
by other operating systems.
@end itemize

The @w{ISO C} standard says very little about file name syntax, only that
file names are strings. In addition to varying restrictions on the
length of file names and what characters can validly appear in a file
name, different operating systems use different conventions and syntax
for concepts such as structured directories and file types or
extensions. Some concepts such as file versions might be supported in
some operating systems and not by others.

The POSIX.1 standard allows implementations to put additional
restrictions on file name syntax, concerning what characters are
permitted in file names and on the length of file name and file name
component strings. However, in the GNU system, you do not need to worry
about these restrictions; any character except the null character is
permitted in a file name string, and there are no limits on the length
of file name strings.
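
A program that may run on other POSIX systems, or on file systems that
do impose a component length limit, can ask about the limit at run time
instead of assuming one. The following is a minimal sketch using
@code{pathconf} with @code{_PC_NAME_MAX}; the helper name
@code{component_fits} is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if a file name component of LEN
   bytes is acceptable in the directory DIR, 0 if it is too long,
   and -1 if the limit could not be determined.  */
int
component_fits (const char *dir, size_t len)
{
  errno = 0;
  long limit = pathconf (dir, _PC_NAME_MAX);

  if (limit == -1)
    /* pathconf returns -1 both on error and when there is no limit;
       errno is left unchanged in the second case.  */
    return errno == 0 ? 1 : -1;

  return len <= (size_t) limit;
}
```

Calling @code{component_fits (".", strlen (name))} before creating a
file named @code{name} in the current directory is one way to avoid an
@code{ENAMETOOLONG} failure, although checking the result of the
creating call itself is still necessary.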