FILE I/O

Sources

Introduction

Most Unix file I/O can be performed with six functions. These functions are part of the POSIX standard for UNIX programming, not part of ANSI C (so you will not find them described in Harbison and Steele's C Reference Manual). Unlike the standard I/O routines provided by ANSI C (such as fscanf and fprintf, which store the data they read in buffers), these functions perform unbuffered I/O. Each one invokes a system call in the kernel, and in the UNIX environment the standard ANSI C I/O functions are implemented on top of them.

File Descriptors

What happens when a process opens a file? A process may have several open files which it may be reading from and writing to. For each open file it also has a current position within the file, which is the next byte to be read or written. Each process has its own array to keep track of its open files.

When a file is opened or created by a process, the kernel assigns it a position in the array, called the file descriptor. Each entry of this array actually contains a pointer to a file table, which stores three pieces of information: the file, the file status flags, and the offset. The file table does not itself contain the file, but instead has a pointer to another table (called the vnode table), which has vital information about the file, including where its data is stored. There is a single vnode table entry for each file. Why this complicated sequence?
file descriptor  --->  file table  --->   vnode table

It turns out to be very flexible. This flexibility is at the heart of how Unix can implement file redirection (< and >) and pipes (|).

The file descriptors are unique to a process (except under certain circumstances), in that the integers may be reused by another process without referring to the same file or location within a file. By convention, Unix shells (although not the kernel) employ the following values:
File              File Descriptor   POSIX Symbolic Constant
Standard Input    0                 STDIN_FILENO
Standard Output   1                 STDOUT_FILENO
Standard Error    2                 STDERR_FILENO

How many file descriptors may a process have open? One complication is that the limit may depend upon the memory available to the process, the maximum integer size, or limits set by the system administrator. Its value is potentially unlimited, but guaranteed by POSIX to be at least 16. Determining the actual value can be difficult, since it may not be known at compile time. If a hard limit has been set at compile time, the symbolic constant OPEN_MAX will be defined in limits.h; if the limit is determined at run time, a call to sysconf(_SC_OPEN_MAX) will produce the value. It may, however, be indeterminate even at run time (see [Stevens, pp. 42-4] for an alternative way to determine this value). POSIX does insist that the value not change during the process's lifetime. [Stevens, pp. 33-4, 40, 42-4, 47-8]
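The two lookups described above can be combined in a small helper. This is only a sketch: the function name open_max is made up for illustration, and what to do when the limit is indeterminate is left to the caller.

```c
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: returns the per-process limit on open file
   descriptors, or -1 if the limit is indeterminate (or on error). */
long open_max(void)
{
#ifdef OPEN_MAX
    return OPEN_MAX;                /* limit fixed at compile time */
#else
    return sysconf(_SC_OPEN_MAX);   /* limit determined at run time */
#endif
}
```

A caller that gets -1 back has to fall back on a guess of its own; see the Stevens pages cited above for one approach.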

System Calls

Remember that everything is a file in Unix. Because there are so many types of files, several of the system calls below will behave slightly differently for files that are not regular. I have not attempted to include the exact behavior, or all the errors that can arise for files which are not regular.

open

open
Purpose     open or create a file for reading or writing
Include     #include <fcntl.h>
            If the optional third argument is used, also include:
            #include <sys/types.h>
            #include <sys/stat.h>
Usage       int open(const char *path, int flags[, mode_t mode]);
            (The third argument is optional.)
Arguments   path:    the path to the file (relative or absolute)
            flags:   the file status flags
            mode:    file permissions, used when creating a new file
Returns     -1 on error
            file descriptor on success
Errors      Too numerous to list all: see man 2 open
            ENOTDIR:   A component of the path prefix is not a directory.
            EACCES:    Permissions do not permit the requested reading or writing.
            EISDIR:    The named file is a directory and it is to be opened for writing.
            EMFILE:    The process has already reached its limit for open file descriptors.

The file descriptor returned by open is guaranteed to be the lowest numbered unused descriptor. This is valuable to know when a command expects to read standard input (or write to standard output) and you want that input (or output) redirected to a regular file. To redirect standard input, simply close STDIN_FILENO (descriptor 0) and then open the file, which will be given descriptor 0.
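The close-then-open trick can be sketched as follows. The helper name redirect_stdin is made up for illustration, and the sketch assumes no other thread opens a file between the two calls.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make descriptor 0 refer to the file at path.
   Returns 0 on success, -1 on failure. */
int redirect_stdin(const char *path)
{
    int fd;

    close(STDIN_FILENO);          /* free descriptor 0; an error here only
                                     means it was already closed */
    fd = open(path, O_RDONLY);    /* lowest unused descriptor, so 0 */
    if (fd == -1)
        return -1;
    return fd == STDIN_FILENO ? 0 : -1;
}
```

After a successful call, any code that reads from descriptor 0 reads from the named file instead.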

The value of the second argument, the file status flags, is formed by bitwise OR'ing ('|') the following:

     One of these three must be included:
O_RDONLY    open for reading only
O_WRONLY    open for writing only
O_RDWR      open for reading and writing

     The following are optional:
O_APPEND    append on each write
O_CREAT     create file if it does not exist: REQUIRES mode
O_TRUNC     truncate size to 0

There are four other options, but these three are the most useful, for now. See [Stevens, p.49] for a description of all options.

The value for mode must be included when O_CREAT is set. It is simply the permissions, and can be written using C's octal representation (this is base eight and starts with a leading zero.) For example, to request that you (the creator) have read and write privileges and everyone else have read privileges only, you would specify

       open("pathtofile",O_WRONLY | O_CREAT, 0644);

This is only a request on your part: the requested permissions are combined with the umask to determine the final permissions of the file. See especially [Molay, pp. 84-8], or [Stevens, pp. 78-81] for further details.
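To see the umask at work, the sketch below creates a file under a chosen mask and reports the permission bits it actually received (the requested mode with the umask bits cleared). The helper name created_mode is made up for illustration, and the file it creates is removed again before returning.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create path requesting `requested` permissions
   while the umask is `mask`. Returns the permission bits the new file
   actually received, or -1 on error. */
int created_mode(const char *path, mode_t requested, mode_t mask)
{
    struct stat sb;
    mode_t old = umask(mask);     /* umask returns the previous mask */
    int fd, rc;

    unlink(path);                 /* mode only applies to a new file */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, requested);
    umask(old);                   /* restore the caller's mask */
    if (fd == -1)
        return -1;
    rc = fstat(fd, &sb);
    close(fd);
    unlink(path);
    return rc == -1 ? -1 : (int)(sb.st_mode & 0777);
}
```

For instance, requesting 0666 under a umask of 022 should yield 0644 on a typical file system.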

Notice how open allows a process to have several file descriptors for the same file. These may have different file status flags, and may even have different offsets within the file.
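A quick way to convince yourself of this is to open the same file twice and read through only one of the descriptors. The sketch below uses a made-up helper name and assumes the file at path is readable and not empty.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if two descriptors obtained from
   separate open calls keep separate offsets, 0 if not, -1 on error. */
int offsets_are_independent(const char *path)
{
    char c;
    ssize_t n;
    off_t pa, pb;
    int a = open(path, O_RDONLY);
    int b = open(path, O_RDONLY);

    if (a == -1 || b == -1) {
        if (a != -1) close(a);
        if (b != -1) close(b);
        return -1;
    }
    n = read(a, &c, 1);               /* advances only a's offset */
    pa = lseek(a, 0, SEEK_CUR);
    pb = lseek(b, 0, SEEK_CUR);
    close(a);
    close(b);
    if (n != 1)
        return -1;
    return pa == 1 && pb == 0;
}
```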

close

close detaches a file descriptor from the process, so that the descriptor no longer refers to a file. When a process terminates, any open file descriptors are automatically closed by the kernel.
close
Purpose     delete a file descriptor
Include     #include <unistd.h>
Usage       int close(int d);
Arguments   d:    a file descriptor
Returns     -1 on error
            0 on success (the file descriptor deleted)
Errors      EBADF:    d is not an active descriptor.
            EINTR:    An interrupt was received.

read

read starts at the file's current offset, which is then advanced by the number of bytes read (for regular files).
read
Purpose     read input from file
Include     #include <unistd.h>
Usage       ssize_t read(int d, void *buf, size_t nbytes);
Arguments   d:         a file descriptor
            buf:       buffer for storing bytes read
            nbytes:    maximum number of bytes to read
Returns     -1 on error
            number of bytes read and placed in buf, or 0 if end of file
Errors      EBADF:     d is not an active descriptor.
            EFAULT:    buf points outside the allocated address space.
            EAGAIN:    The file was marked for non-blocking I/O, and no data were ready to be read.
            EINVAL:    The file offset associated with d was negative.
            EIO:       An I/O error occurred while reading from the file system.

The main reason the number of bytes read may be less than the number of bytes requested in nbytes is that the end of the file was reached before the requested number of bytes has been read. See [Stevens, p. 54] for several other reasons involving other types of files.
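Because read may return fewer bytes than requested, the usual idiom is to call it in a loop until it returns 0. A minimal sketch follows; the helper name count_bytes and the 4096-byte buffer are arbitrary choices, and interrupted calls are not retried.

```c
#include <unistd.h>

/* Hypothetical helper: read d until end of file and return the total
   number of bytes seen, or -1 if read reports an error. */
long count_bytes(int d)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = read(d, buf, sizeof buf)) > 0)
        total += n;               /* n may be less than sizeof buf */
    return n == -1 ? -1 : total;  /* n == 0 means end of file */
}
```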

write

write starts at the file's current offset, which is then advanced by the number of bytes written to the file (for regular files).
write
Purpose     write output to file
Include     #include <unistd.h>
Usage       ssize_t write(int d, const void *buf, size_t nbytes);
Arguments   d:         a file descriptor
            buf:       buffer holding the bytes to be written
            nbytes:    number of bytes to write
Returns     -1 on error
            number of bytes written
Errors      Too numerous to list all: see man 2 write
            EBADF:     d is not an active descriptor.
            EFAULT:    Data to be written to the file points outside the allocated address space.
            EINVAL:    The file offset associated with d was negative.
            EFBIG:     An attempt was made to write a file that exceeds the process's file size limit or the maximum file size.
            ENOSPC:    There is no free space remaining on the file system containing the file.
            EAGAIN:    The file was marked for non-blocking I/O, and no data could be written immediately.
            EINTR:     A signal interrupted the write before it could be completed.
            EIO:       An I/O error occurred while writing to the file system.
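Since write returns the number of bytes actually written, which can be less than nbytes, careful callers loop until everything has gone out. The sketch below is one common way to do that; the helper name write_all is made up, and retrying on EINTR is a design choice rather than a requirement.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all nbytes of buf to d.
   Returns 0 on success, -1 on error. */
int write_all(int d, const void *buf, size_t nbytes)
{
    const char *p = buf;

    while (nbytes > 0) {
        ssize_t n = write(d, p, nbytes);
        if (n == -1) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: try again */
            return -1;
        }
        p += n;                   /* skip past what was written */
        nbytes -= (size_t)n;
    }
    return 0;
}
```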

lseek

Every file descriptor has an associated current file offset, a number of bytes from the beginning of the file. Read and write operations normally start at the current offset and cause the offset to be incremented by the number of bytes read or written. lseek explicitly repositions this offset value.
lseek
Purpose     reposition read/write file offset
Include     #include <unistd.h>
Usage       off_t lseek(int d, off_t offset, int base);
Arguments   d:         a file descriptor
            offset:    the number of bytes to be offset
            base:      the position from which the bytes will be offset:
               SEEK_SET:    offset bytes from beginning of the file.
               SEEK_CUR:    offset bytes from current value of offset.
               SEEK_END:    offset bytes from end of the file.
Returns     -1 on error
            The resulting offset location as measured in bytes from the beginning of the file.
Errors      EBADF:     d is not an active descriptor.
            EINVAL:    base is not a proper value.
            ESPIPE:    d is associated with a non-regular file (pipe, socket or FIFO).

A file's offset can be greater than its current size. In this case, if the file is then written to, a hole is created in the file, and bytes read from the hole have the value '\0'. The offset of a regular file may not be set before the beginning of the file.
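The following sketch creates a small file with a hole and reads one byte back from inside it. The helper name byte_in_hole and the particular offsets are arbitrary, and the file is removed before the function returns.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: build a file with a hole at path, then return
   the value of a byte read from inside the hole, or -1 on error. */
int byte_in_hole(const char *path)
{
    char c = 'x';
    int ok;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd == -1)
        return -1;
    ok = write(fd, "A", 1) == 1
      && lseek(fd, 100, SEEK_CUR) == 101  /* offset is now past the end */
      && write(fd, "B", 1) == 1           /* file size becomes 102 */
      && lseek(fd, 50, SEEK_SET) == 50    /* a position inside the hole */
      && read(fd, &c, 1) == 1;
    close(fd);
    unlink(path);
    return ok ? (unsigned char)c : -1;
}
```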

lseek can be used to determine the current offset:
       off_t    currpos;
       currpos = lseek(fd, 0, SEEK_CUR);

lseek can also be used to test whether a file descriptor refers to a pipe, FIFO, or socket, since these are not capable of seeking: they force a return of -1 and set errno to ESPIPE.
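That test might be wrapped up as follows; the helper name is_seekable is made up for illustration.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if d supports seeking, 0 if it is a
   pipe, FIFO, or socket, and -1 on any other error (e.g. EBADF). */
int is_seekable(int d)
{
    if (lseek(d, 0, SEEK_CUR) != -1)
        return 1;                     /* offset unchanged, seek worked */
    return errno == ESPIPE ? 0 : -1;
}
```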

creat

creat opens a file for writing, creating a new file if one does not exist, or truncating the existing file and discarding its contents if one does. Actually, creat is now implemented by open. Its prototype, together with the necessary header file, is:
       #include <fcntl.h>
       int creat(const char *path, mode_t mode);
and it is equivalent to
       open(path, O_CREAT | O_TRUNC | O_WRONLY, mode);

dup and dup2

dup and dup2 duplicate an existing file descriptor. Remember, a file descriptor is the index of an array which contains a pointer to the file table. These functions allow a second file descriptor to index a pointer to the same file table. The difference is that dup takes a single argument, the file descriptor you want to duplicate, and returns a new file descriptor which is guaranteed to be the lowest available. dup2 gives you more control over the new file descriptor: it takes two arguments, an already opened file descriptor and a new file descriptor, and directs the new file descriptor to point to the same file table. This is especially valuable when we want to create pipes between programs. If the new file descriptor is already in use, dup2 closes it first, then reassigns it; if the two file descriptors are the same, nothing occurs.
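Because both descriptors point to the same file table, they share one offset: reading through one moves the other. A minimal sketch, using a made-up helper name and assuming the file at path is readable and not empty:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a descriptor and its dup share a
   single offset, 0 if not, -1 on error. */
int dup_shares_offset(const char *path)
{
    char c;
    ssize_t n;
    off_t pb;
    int b;
    int a = open(path, O_RDONLY);

    if (a == -1)
        return -1;
    b = dup(a);                   /* b indexes the same file table */
    if (b == -1) {
        close(a);
        return -1;
    }
    n = read(a, &c, 1);           /* moves the shared offset to 1 */
    pb = lseek(b, 0, SEEK_CUR);   /* observed through the duplicate */
    close(a);
    close(b);
    if (n != 1)
        return -1;
    return pb == 1;
}
```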

dup, dup2
Purpose     duplicate an existing file descriptor
Include     #include <unistd.h>
Usage       int dup(int oldd);
            int dup2(int oldd, int newd);
Arguments   oldd:    an existing file descriptor
            newd:    the value wanted for the new descriptor
Returns     -1 on error
            the new file descriptor on success (for dup2, the value of newd)
Errors      EBADF:    oldd or newd is not a valid active descriptor.
            EMFILE:   Too many descriptors are active.
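As an illustration of dup2, the sketch below points standard output at a file and later puts the original back. The helper names redirect_stdout and restore_stdout are made up, and the 0644 mode is an arbitrary choice.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make STDOUT_FILENO refer to the file at path.
   Returns a saved copy of the old standard output, or -1 on error. */
int redirect_stdout(const char *path)
{
    int fd;
    int saved = dup(STDOUT_FILENO);   /* keep the old stdout reachable */

    if (saved == -1)
        return -1;
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1 || dup2(fd, STDOUT_FILENO) == -1) {
        if (fd != -1)
            close(fd);
        close(saved);
        return -1;
    }
    close(fd);                        /* descriptor 1 now refers to the file */
    return saved;
}

/* Hypothetical helper: undo redirect_stdout using its return value.
   Returns 0 on success, -1 on error. */
int restore_stdout(int saved)
{
    int rc = dup2(saved, STDOUT_FILENO);

    close(saved);
    return rc == -1 ? -1 : 0;
}
```

Programs that also use the buffered standard I/O routines (printf and friends) should call fflush(stdout) before each switch, since buffered output is only written to descriptor 1 when the buffer is flushed.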