What happens when a process opens a file? A process may have several open files which it may be reading from and writing to. Each open file also has a current position, which is the next byte to be read or written. Each process has its own array to keep track of its open files, and a file descriptor is an index into that array.
The file descriptors are unique to a process (except under certain circumstances), in that the integers may be reused by another process without referring to the same file or location within a file. By convention Unix shells (although not the kernel) employ the following values:
File | File Descriptor | POSIX Symbolic Constant |
Standard Input | 0 | STDIN_FILENO |
Standard Output | 1 | STDOUT_FILENO |
Standard Error | 2 | STDERR_FILENO |
How many file descriptors may be opened by a process? One complication here is that this may depend upon the available memory for the process, the maximum integer size, or limits set by the system administrator. The value is potentially unlimited, but guaranteed by POSIX to be at least 16. Determining the actual value can be difficult, since it may not be known at compile time. If a hard limit has been set (and so is determined at compile time), the symbolic constant OPEN_MAX found in limits.h will be defined; if the limit is determined at run time, a call to sysconf(_SC_OPEN_MAX) will produce the value. It may, however, be indeterminate even at run time (see [Stevens, pp. 42-4] for an alternative way to determine this value). POSIX does insist that the value should not change during the process's lifetime. [Stevens, pp. 33-4, 40, 42-4, 47-8]
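The lookup order described above can be sketched as follows. This is only an illustration: the function name max_open_files and the fallback value OPEN_MAX_GUESS are my own assumptions, not part of POSIX and not taken from Stevens.

```c
#include <errno.h>
#include <limits.h>
#include <unistd.h>

/* Hypothetical fallback, used only when the limit cannot be determined. */
#define OPEN_MAX_GUESS 256

/* Returns the best available estimate of the per-process descriptor limit. */
long max_open_files(void)
{
#ifdef OPEN_MAX
    return OPEN_MAX;                 /* hard limit known at compile time */
#else
    long limit;

    errno = 0;
    limit = sysconf(_SC_OPEN_MAX);   /* limit determined at run time */
    if (limit < 0) {
        /* errno == 0 means the limit is indeterminate; a nonzero errno
         * means the call itself failed. Either way, fall back to a guess. */
        return OPEN_MAX_GUESS;
    }
    return limit;
#endif
}
```

A caller would typically use the result only to size a table or bound a loop over descriptors, since the guess may be wrong when the limit is indeterminate.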
Remember that everything is a file in Unix. Because there are so many types of files, several of the system calls below will behave slightly differently for files that are not regular. I have not attempted to include the exact behavior, or all the errors that can arise for files which are not regular.
open | |
Purpose | open or create a file for reading or writing |
Include | #include <fcntl.h>; if the optional third argument is used, also include #include <sys/types.h> and #include <sys/stat.h> |
Usage | int open(const char *path, int flags[, mode_t mode]); (The third argument is optional.) |
Arguments | path: the path (absolute or relative) to the file; flags: the file status flags; mode: file permissions, used when creating a new file |
Returns | -1 on error; file descriptor on success |
Errors | Too numerous to list all: see man 2 open. ENOTDIR: A component of the path prefix is not a directory. EACCES: Permissions do not permit reading or writing. EISDIR: The named file is a directory and it is to be opened for writing. EMFILE: The process has already reached its limit for open file descriptors. |
The file descriptor returned by open is guaranteed to be the lowest-numbered unused descriptor. This is valuable to know when a command expects to read standard input (or write to standard output) and you want to redirect it to a regular file. To redirect standard input, simply close STDIN_FILENO (descriptor 0) and then open the new file, which will be given descriptor 0.
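A minimal sketch of the close-then-open technique just described. The helper name redirect_stdin is hypothetical. Note that if open fails, standard input is left closed, so a real program would want to report the error immediately.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Makes descriptor 0 refer to path. Returns 0 on success, -1 on error. */
int redirect_stdin(const char *path)
{
    int fd;

    /* Free descriptor 0; EBADF only means it was already closed. */
    if (close(STDIN_FILENO) == -1 && errno != EBADF)
        return -1;

    /* open returns the lowest unused descriptor, which is now 0. */
    fd = open(path, O_RDONLY);
    if (fd != STDIN_FILENO) {
        if (fd != -1)
            close(fd);               /* unexpected slot: give it back */
        return -1;
    }
    return fd;
}
```

After a successful call, anything that reads descriptor 0 reads from the named file instead of the terminal.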
The value of the second argument, the file status flags, is formed by bitwise OR'ing ('|') the following:
One of these three must be included:
O_RDONLY | open for reading only |
O_WRONLY | open for writing only |
O_RDWR | open for reading and writing |
The following flags are optional:
O_APPEND | append on each write |
O_CREAT | create file if it does not exist: REQUIRES mode |
O_TRUNC | truncate size to 0 |
The value for mode must be included when O_CREAT is set. It is simply the permissions, and can be written using C's octal representation (this is base eight and starts with a leading zero). For example, to request that you (the creator) have read and write privileges and everyone else have read privileges only, you would specify
open("pathtofile", O_WRONLY | O_CREAT, 0644);
This is only a request on your part. The request is combined with the process's umask to determine the final permissions of the file: any permission bit that is set in the umask is cleared. See especially [Molay, pp. 84-8] or [Stevens, pp. 78-81] for further details.
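To see the effect of the umask on the requested mode, the sketch below creates a file and reports the permission bits it actually received. The function name created_mode is hypothetical, and O_EXCL (a real open flag not listed above) is added as my own choice so that the call fails rather than silently reusing an existing file, whose permissions would not be changed.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates path (which must not exist yet) requesting mode, and returns the
 * permission bits the new file actually has, or (mode_t)-1 on error. */
mode_t created_mode(const char *path, mode_t mode)
{
    struct stat sb;
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);

    if (fd == -1)
        return (mode_t)-1;
    if (fstat(fd, &sb) == -1) {
        close(fd);
        return (mode_t)-1;
    }
    close(fd);
    return sb.st_mode & 0777;        /* keep only the permission bits */
}
```

With a umask of 022, for example, a request for 0666 should come back as 0644 on an ordinary file system.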
Notice how open allows a process to have different file descriptors to the same file. These may have different file status flags, and may even have different offsets within the file.
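The following sketch illustrates the point about separate offsets. It opens the same file twice with different flags, writes through one descriptor, reads through the other, and then compares the two offsets using lseek (described further below). The function name and the test data are my own assumptions.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if two descriptors obtained from separate open calls keep
 * separate offsets, 0 if not, -1 on error. Creates or truncates path. */
int offsets_are_independent(const char *path)
{
    char c;
    off_t pos1, pos2;
    int fd1, fd2;

    fd1 = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd1 == -1)
        return -1;
    fd2 = open(path, O_RDONLY);      /* second descriptor, different flags */
    if (fd2 == -1) {
        close(fd1);
        return -1;
    }
    if (write(fd1, "abcdef", 6) != 6 || read(fd2, &c, 1) != 1) {
        close(fd1);
        close(fd2);
        return -1;
    }
    pos1 = lseek(fd1, 0, SEEK_CUR);  /* advanced by the 6 bytes written */
    pos2 = lseek(fd2, 0, SEEK_CUR);  /* advanced by the 1 byte read */
    close(fd1);
    close(fd2);
    return pos1 == 6 && pos2 == 1 && c == 'a';
}
```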
close releases a file descriptor, so that it no longer refers to any file within the process. When a process terminates, any open file descriptors are automatically closed by the kernel.
close | |
Purpose | delete a file descriptor |
Include | #include<unistd.h> |
Usage | int close(int d); |
Arguments | d: a file descriptor |
Returns | -1 on error; 0 on success (the file descriptor is deleted) |
Errors | EBADF: d is not an active descriptor. EINTR: A signal interrupted the call. |
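As a small illustration of the EBADF case, the sketch below closes the same descriptor twice in a single-threaded program. The second call should fail because the descriptor is no longer active. The function name second_close_errno is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Opens path, closes it twice, and returns the errno value set by the
 * second close (0 if it unexpectedly succeeded, -1 if setup failed). */
int second_close_errno(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    if (close(fd) == -1)             /* first close: should succeed */
        return -1;
    errno = 0;
    if (close(fd) == 0)              /* second close: fd is now inactive */
        return 0;
    return errno;
}
```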
read starts at the file's current offset, which is then advanced by the number of bytes read (for regular files).
read | |
Purpose | read input from file |
Include | #include<unistd.h> |
Usage | ssize_t read(int d, void *buf, size_t nbytes); |
Arguments | d: a file descriptor; buf: buffer for storing bytes read; nbytes: maximum number of bytes to read |
Returns | -1 on error; otherwise the number of bytes read and placed in buf, or 0 at end of file |
Errors | EBADF: d is not an active descriptor. EFAULT: buf points outside the allocated address space. EAGAIN: The file was marked for non-blocking I/O, and no data were ready to be read. EINVAL: The file offset associated with d was negative. EIO: An I/O error occurred while reading from the file system. |
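Because read may return fewer bytes than requested and signals end of file with 0, it is normally called in a loop. Here is a minimal sketch that counts the bytes available on a descriptor. The function name count_bytes and the 4096-byte buffer size are arbitrary choices of mine.

```c
#include <unistd.h>

/* Reads from d until end of file. Returns the total number of bytes read,
 * or -1 if any read call fails. */
long count_bytes(int d)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    for (;;) {
        n = read(d, buf, sizeof buf);
        if (n == 0)                  /* end of file */
            break;
        if (n == -1)                 /* error: errno says why */
            return -1;
        total += n;                  /* n may be less than sizeof buf */
    }
    return total;
}
```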
write starts at the file's current offset, which is then advanced by the number of bytes written to the file (for regular files).
write | |
Purpose | write output to file |
Include | #include<unistd.h> |
Usage | ssize_t write(int d, const void *buf, size_t nbytes); |
Arguments | d: a file descriptor; buf: buffer holding the bytes to be written; nbytes: number of bytes to write |
Returns | -1 on error; number of bytes written on success |
Errors | Too numerous to list all: see man 2 write. EBADF: d is not an active descriptor. EFAULT: Data to be written to the file points outside the allocated address space. EINVAL: The file offset associated with d was negative. EFBIG: An attempt was made to write a file that exceeds the process's file size limit or the maximum file size. ENOSPC: There is no free space remaining on the file system containing the file. EAGAIN: The file was marked for non-blocking I/O, and no data could be written immediately. EINTR: A signal interrupted the write before it could be completed. EIO: An I/O error occurred while writing to the file system. |
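Since write returns the number of bytes actually written, which can be less than nbytes (for example on pipes or sockets), a common idiom is to loop until everything has been sent. The sketch below is one way to do that. The name write_all is hypothetical, and retrying on EINTR is a design choice rather than a requirement.

```c
#include <errno.h>
#include <unistd.h>

/* Writes all nbytes from buf to d. Returns 0 on success, -1 on error. */
int write_all(int d, const void *buf, size_t nbytes)
{
    const char *p = buf;

    while (nbytes > 0) {
        ssize_t n = write(d, p, nbytes);

        if (n == -1) {
            if (errno == EINTR)      /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        p += n;                      /* skip past what was written */
        nbytes -= (size_t)n;
    }
    return 0;
}
```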
Every file descriptor has an associated current file offset, a number of bytes from the beginning of the file. Read and write operations normally start at the current offset and cause the offset to be incremented by the number of bytes read or written.
lseek explicitly repositions this offset value.
lseek | |
Purpose | reposition read/write file offset |
Include | #include<unistd.h> |
Usage | off_t lseek(int d, off_t offset, int base); |
Arguments | d: a file descriptor; offset: the number of bytes to be offset; base: the position from which the bytes will be offset: SEEK_SET: offset bytes from the beginning of the file. SEEK_CUR: offset bytes from the current value of the offset. SEEK_END: offset bytes from the end of the file. |
Returns | -1 on error; otherwise the resulting offset location as measured in bytes from the beginning of the file |
Errors | EBADF: d is not an active descriptor. EINVAL: base is not a proper value. ESPIPE: d is associated with a pipe, socket, or FIFO. |
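As one illustration of the SEEK_END base, seeking zero bytes from the end of a regular file returns its size. The sketch below assumes a hypothetical helper named file_size that opens the file itself, so no caller's offset is disturbed.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns the size of the file at path in bytes, or -1 on error. */
off_t file_size(const char *path)
{
    off_t end;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    end = lseek(fd, 0, SEEK_END);    /* offset 0 bytes from end of file */
    close(fd);
    return end;                      /* (off_t)-1 if lseek failed */
}
```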
lseek can be used to determine the current offset:
off_t currpos;
currpos = lseek(fd, 0, SEEK_CUR);
lseek can also be used to test whether a file descriptor is a pipe, FIFO, or socket, since these are not capable of seeking: they force a return of -1 and set errno to ESPIPE.
dup, dup2 | |
Purpose | duplicate an existing file descriptor |
Include | #include<unistd.h> |
Usage | int dup(int oldd); int dup2(int oldd, int newd); |
Arguments | oldd: an existing file descriptor; newd: the value wanted for the new descriptor |
Returns | -1 on error; the value of the new descriptor on success |
Errors | EBADF: oldd is not a valid active descriptor, or newd is outside the allowed range. EMFILE: Too many descriptors are active. |
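dup2 offers another way to redirect a descriptor: open the file wherever the kernel places it, then duplicate it onto the descriptor you want. If that target descriptor is already open, dup2 closes it first. A minimal sketch follows. The name redirect_to_file is hypothetical, and the flags and the 0644 mode are just one reasonable choice for an output file.

```c
#include <fcntl.h>
#include <unistd.h>

/* Makes descriptor target refer to path, creating or truncating the file.
 * Returns 0 on success, -1 on error. */
int redirect_to_file(int target, const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd == -1)
        return -1;
    if (fd != target) {
        if (dup2(fd, target) == -1) {    /* target now refers to the file */
            close(fd);
            return -1;
        }
        close(fd);                       /* the original is no longer needed */
    }
    return 0;
}
```

Calling redirect_to_file(STDOUT_FILENO, "out.txt"), for instance, would send subsequent writes on descriptor 1 to out.txt.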