Project 4: The Unix Shell

Objectives

There are four objectives to this assignment:

To write a complete program from scratch. We only provide a small code skeleton.
To learn about shell functionalities. You will familiarize yourself with the Linux programming environment and gain exposure to the necessary functionality in shells.
To learn about process interaction. You will learn about how a process is started, how a parent process waits for the child to terminate, how processes communicate via pipes, and so on.
To learn about defensive programming. In real life, users input invalid values. Your program should perform "sanity checks."

Overview

In this assignment, you will implement a command line interpreter or shell. The shell should operate in this basic way: when you type in a command (in response to its prompt), the shell creates a child process that executes the command you entered and then prompts for more user input when it has finished.

The shell you implement will be similar to, but much simpler than, the one you run every day in Unix. You can find out which shell you are running by typing "echo $SHELL" at a prompt. You may then wish to look at the man pages for 'sh' or the shell you are running (more likely tcsh or bash) to learn more about all of the functionality that a shell can support. For this project, you do not need to implement much functionality.

Program Specifications

Basic Shell

Your basic shell is an interactive loop: it repeatedly prints a prompt "myshell> ", parses the input, executes the command specified on that line of input, and waits for the command to finish.

linux3% ./myshell
myshell>

This is repeated until the user types "exit". (Note that there is a space after the "myshell>" sign in the prompt.) The name of your final executable should be myshell.

You should structure your shell such that it creates a new process for each new command. The only exception is that if the command is a built-in command (see below) you should not call fork.

Parsing a Command Line

Your basic shell should be able to parse a command, and run the program corresponding to the command. For example, if the user types "ls -la /tmp" , your shell should run the ls program with all the given arguments and print the output on the screen.

The maximum length of a command line your shell can take is 512 bytes (excluding the newline).

Multiple Commands

After you get your basic shell running, your shell is not too fun if you cannot run multiple jobs on a single command line. To do that, we use the ";" character to separate multiple jobs on a single command line.

For example, if the user types "ls; ps; who" , the jobs should be run one at a time, in left-to-right order. Hence, in our previous example ( "ls; ps; who" ), first ls should run to completion, then ps , then who . The prompt should not be shown again until all jobs are complete.

Built-in Commands

Whenever your shell accepts a command, it should check whether the command is a built-in command or not. If it is, it should not be executed like other programs. Instead, your shell will invoke your implementation of the built-in command.

The UNIX shell has many built-in commands, such as cd , echo , pwd , etc. In this project, you will only implement three built-in commands: exit, cd and pwd, specifically by using the exit(0), chdir, getcwd, and getenv system/library calls. For example, to implement the exit built-in command, you simply call exit(0); in your C program.

Your shell users will be happy with the cd+pwd feature because they can change their working directory. Without this feature, your user is stuck in a single directory.

Redirection (">")

Often, a shell user prefers to send the output of his/her program to a file rather than to the screen. The UNIX shell provides this nice feature with the ">" character. Formally this is called "redirection of standard output." To make your shell users happy, your shell should also include this feature.

For example, if a user types "ls -la /tmp > output" , nothing should be printed on the screen. Instead, the output of the ls program should be rerouted to the output file.

If the "output" file already exists before you run your program, you should print the one and only error message (see "Error Message" section below), and move on to the next command. Your shell should keep running. If the output file is not specified (e.g. "ls > " ), you should also print the error message.

If the command before redirection is a built-in command (for example, "cd > output"), you should also throw the error message. Built-in commands should be called without any redirection.

Advanced Redirection (">+")

In addition to the ">" basic redirection, you also need to support a custom advanced redirection ">+".

For example, if you type "program >+ outputFile", the shell inserts the program's output at the beginning of outputFile without overwriting the old content. That is, the old content is shifted so that it follows the new output.

If the outputFile does not exist, ">+" will behave like ">".

This is a feature that does not exist in typical shell programs. That is, ">" overwrites the output file and ">>" appends to the end of the output file (in this project you don't need to support ">>"). But, there is no support to "insert" the output to the beginning of the file.

To give you a specific example, let's imagine two input files, aaa and bbb, that contain "aaa" and "bbb" respectively. If you run the commands below, you should see "bbbaaa" printed on the screen.

(outputFile does not exist before)
myshell> cat aaa >+ outputFile
myshell> cat bbb >+ outputFile 
myshell> cat outputFile
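
One possible way to get this "insert at the beginning" behavior, sketched under assumptions rather than prescribed by the assignment: save the old content of the output file, let the child write its output to the now-empty file, then append the saved content. The helper names slurp and append_bytes are hypothetical.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read all of path into a malloc'd buffer and store
 * its size in *len. Returns NULL if the file cannot be opened or read
 * (for example, because it does not exist yet). */
char *slurp(const char *path, size_t *len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }
    size_t size = (size_t)st.st_size;
    char *buf = malloc(size + 1);
    size_t total = 0;
    while (buf != NULL && total < size) {
        ssize_t n = read(fd, buf + total, size - total);
        if (n <= 0)
            break;
        total += (size_t)n;
    }
    close(fd);
    if (buf == NULL)
        return NULL;
    *len = total;
    return buf;
}

/* Hypothetical helper: append len bytes to an existing file.
 * Returns 0 on success, -1 on error. */
int append_bytes(const char *path, const char *data, size_t len) {
    int fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        data += n;
        len -= (size_t)n;
    }
    return close(fd);
}
```

With these helpers, a ">+" job could call slurp() on the output file, run the program with its stdout redirected to the truncated file, and finally call append_bytes() with the saved content. If slurp() returns NULL because the file does not exist, the job reduces to a plain ">".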

White Spaces

Zero or more spaces can exist between a command and the shell special characters (i.e. ";" and ">" ). All of these examples are correct.

myshell> ls;ls;ls
myshell> ls ; ls ; ls
myshell> ls>a; ls > b; ls> c; ls >d

If you are unsure whether a particular command is valid or not, the rule of thumb is to try it in the UNIX shell. If the UNIX shell accepts that command, your shell should accept the same command.

Batch Mode

So far, you have run the shell in interactive mode. Most of the time, testing your shell in interactive mode is time-consuming. To make testing much faster, your shell should support batch mode.

In interactive mode, you display a prompt and the user of the shell will type in one or more commands at the prompt. In batch mode, your shell is started by specifying a batch file on its command line; the batch file contains the same list of commands as you would have typed in the interactive mode.

In batch mode, you should not display a prompt. In batch mode you should print each line you read from the batch file back to the user before executing it; this will help you when you debug your shells (and us when we test your programs). To print the command line, do not use printf because printf will buffer the string in the C library and will not work as expected when you perform automated testing. To print the command line, use write(STDOUT_FILENO, ...) this way:

write(STDOUT_FILENO, cmdline, strlen(cmdline));

In both interactive and batch mode, your shell terminates when it sees the exit command on a line or reaches the end of the input stream (i.e., the end of the batch file).

To run in batch mode, your C program must be invoked exactly as follows:

myshell [batchFile]

The command line arguments to your shell are to be interpreted as follows.

batchFile: an optional argument (often indicated by square brackets as above). If present, your shell will read each line of the batchFile for commands to be executed. If the given file does not exist or is not readable, you should print the one and only error message (see "Error Message" section below).
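
A minimal sketch of how the argument rules above could map onto an input stream. The function name open_input is hypothetical, and returning NULL as the signal to print the error message is an assumption of this sketch.

```c
#include <stdio.h>

/* Hypothetical helper: choose where the shell reads commands from.
 * Returns stdin for interactive mode, the opened batch file for batch
 * mode, or NULL when the caller should print the error message. */
FILE *open_input(int argc, char *argv[]) {
    if (argc == 1)
        return stdin;                 /* no batchFile: interactive mode */
    if (argc == 2)
        return fopen(argv[1], "r");   /* NULL if missing or unreadable */
    return NULL;                      /* two or more input files */
}
```

The caller can compare the result with stdin to decide whether to display the prompt.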

Implementing batch mode should be very straightforward if your shell code is nicely structured. The batch file basically contains the same exact lines that you would have typed interactively in your shell. For example, if in the interactive mode, you test your program with these inputs:

linux3% ./myshell
myshell> ls ; who ; ps
some output printed here 
myshell> ls > /tmp/ls-out;;;; ps > /non-existing-dir/file;
some output and error printed here 
myshell> ls-who-ps
some error printed here

then you could cut your testing time by putting the same input lines to a batch file (for example myBatchFile):

ls ; who ; ps
ls > /tmp/ls-out;;;; ps > /non-existing-dir/file;
ls-who-ps

and run your shell in batch mode:

linux3% ./myshell myBatchFile

In this example, the output of the batch mode should look like this:

ls ; who ; ps
some output printed here 
ls > /tmp/ls-out;;;; ps > /non-existing-dir/file;
some output and error printed here 
ls-who-ps
some error printed here

Handling blank lines: In batch mode, if the input file contains blank lines, do not print the blank lines.

Defensive Programming and Error Messages

In this project, defensive programming is also required. Your program should check all parameters, error-codes, etc. In general, there should be no circumstances in which your C program will core dump, hang indefinitely, or prematurely terminate. Therefore, your program must respond to all input in a reasonable manner; by "reasonable", we mean print the error message (as specified in the next paragraph) and either continue processing or exit, depending upon the situation.

Since your code will be graded with automated testing, you should print this one and only error message whenever you encounter an error of any type:

char error_message[30] = "An error has occurred\n";
write(STDOUT_FILENO, error_message, strlen(error_message));

The error message should be printed to stdout (not stderr!). Also, do not attempt to add whitespaces or tabs or extra error messages.

You should consider the following situations as errors; in each case, your shell should print the error message to stdout and exit gracefully:

Two or more input files to your shell program.

For the following situations, you should print the error message to stdout and continue processing:

A command does not exist or cannot be executed.
A very long command line (over 512 characters, excluding the newline).

Your shell should also be able to handle the following scenarios, which are not errors. The best way to check if something should return an error is by checking our test files and the expected outputs.

An empty command line.
An empty command between two or more ';' characters.
Multiple white spaces on a command line.
White space before or after the ';' character or extra white space in general.

All of these requirements will be tested extensively! These lists will likely grow as we receive questions from you.

Hints and Details

Writing your shell in a simple manner is a matter of finding the relevant library routines and calling them properly. To simplify things for you in this assignment, we will suggest a few library routines you may want to use to make your coding easier. To find information on these library routines, look at the manual pages (using the Unix command man).

Parsing a Command Line

Parsing: For reading lines of input, you may want to look at fgets(). To open a file and get a handle with type FILE * , look into fopen() library call. Be sure to check the return code of these routines for errors! (If you see an error, the routine perror() is useful for displaying the problem. But do not print the error message from perror() to the screen. You should only print the one and only error message that we have specified above). You also may find the strtok() routine useful for parsing the command line (i.e., for extracting the arguments within a command separated by whitespace or a tab).
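
A minimal sketch of the strtok() idea, assuming the command has already been separated from any ";" or ">" characters. The function name split_args and the max_args parameter are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: split cmd in place on spaces, tabs, and newlines.
 * Stores up to max_args - 1 tokens in argv, terminates the list with
 * NULL (as execvp expects), and returns the number of tokens stored. */
int split_args(char *cmd, char *argv[], int max_args) {
    int argc = 0;
    char *tok = strtok(cmd, " \t\n");
    while (tok != NULL && argc < max_args - 1) {
        argv[argc++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    argv[argc] = NULL;
    return argc;
}
```

Note that strtok() modifies its input and keeps internal state between calls, so the line must be a writable buffer and two strtok() scans cannot be interleaved.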

Too-long command line: A command line is too long if it contains more than 512 characters of any kind, excluding the newline character (hint: create an array of 514 characters so that it can hold the newline and the null terminator). A line of more than 512 white spaces is also an invalid command line.

When you encounter a line that is too long, print the whole line, print the error message, then throw the line away (i.e. do not execute any command in this line), and continue to the next command line. If the too-long command line only consists of whitespaces, you still need to print the whole line.

If a command line is too long, you need a special routine to read the rest of characters in the invalid command line, and throw those characters away. For example if a command line is 520 characters long, your special handling routine should read the 8 remaining characters and throw them away. Not doing this will make your shell erratic.
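
A minimal sketch of such a routine, built on fgets() with the 514-byte buffer suggested above. The names read_command_line, MAX_CMD, and BUF_SIZE are hypothetical. This sketch only discards the tail of a too-long line; since the whole line must also be printed, a real shell would write each discarded character before dropping it.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CMD 512             /* longest legal line, newline excluded */
#define BUF_SIZE (MAX_CMD + 2)  /* room for the newline and the '\0' */

/* Hypothetical helper: read one line into buf (BUF_SIZE bytes).
 * Returns 1 for a line of legal length, 0 for a too-long line (its
 * remaining characters are consumed and thrown away), and -1 at the
 * end of the input. */
int read_command_line(FILE *in, char *buf) {
    if (fgets(buf, BUF_SIZE, in) == NULL)
        return -1;
    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n')
        return 1;               /* the whole line fit in buf */
    if (len < BUF_SIZE - 1)
        return 1;               /* last line of input, no trailing newline */
    /* buf is full and holds no newline: the line exceeds MAX_CMD. */
    int c;
    while ((c = fgetc(in)) != EOF && c != '\n')
        ;                       /* discard the rest of the line */
    return 0;
}
```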

Basic Shell

Executing Commands: Look into fork , execvp , and wait/waitpid . See the UNIX man pages for these functions. Before starting this project, you should definitely play around with these functions.

You will note that there are a variety of commands in the exec family; for this project, you must use execvp. You should not use the system() call to run a command. Remember that if execvp() is successful, it will not return; if it does return, there was an error (e.g., the command does not exist). The most challenging part is specifying the correct arguments. The first argument specifies the program that should be executed; because execvp searches the directories listed in $PATH, a bare name such as "ls" works as well as a full path. The second argument, char *argv[], matches what the program sees in its function prototype:

int main(int argc, char *argv[]);

Note that this argument is an array of strings, or an array of pointers to characters. For example, if you invoke a program with:

foo 205 535

then argv[0] = "foo", argv[1] = "205" and argv[2] = "535".

Important: the list of arguments must be terminated with a NULL pointer; that is, argv[3] = NULL. We strongly recommend that you carefully check that you are constructing this array correctly!
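
A minimal sketch of the fork/execvp/waitpid pattern described above, assuming argv is already a NULL-terminated array. The helper names run_command and print_error are hypothetical; returning the child's exit status is a choice of this sketch, not a requirement of the assignment.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: print the one and only error message. */
static void print_error(void) {
    char error_message[30] = "An error has occurred\n";
    write(STDOUT_FILENO, error_message, strlen(error_message));
}

/* Hypothetical helper: run argv[0] in a child process and wait for it.
 * Returns the child's exit status, or -1 if it could not be obtained. */
int run_command(char *argv[]) {
    pid_t pid = fork();
    if (pid < 0) {
        print_error();
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        /* Reached only if execvp failed, e.g. the command does not exist. */
        print_error();
        _exit(1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        print_error();
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the "foo 205 535" example, the caller would pass an array holding "foo", "205", "535", and a final NULL.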

Multiple Commands

If you get your basic shell running, supporting multiple commands should be straight-forward. The only difference here is that you need to wait for the previous process to finish before creating a new one. To do that, you simply use waitpid() again.
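
A minimal sketch of separating the jobs on a line, assuming each job is then parsed and run to completion before the next one starts. The name split_jobs is hypothetical. It uses strchr() rather than strtok() so that it does not disturb a strtok() scan used for the arguments, and so that empty jobs (as in ";;") stay visible to the caller, which can skip them.

```c
#include <string.h>

/* Hypothetical helper: split line in place at each ';'. Stores pointers
 * to the jobs in jobs[] and returns how many were stored. Jobs beyond
 * max_jobs are silently dropped in this sketch. */
int split_jobs(char *line, char *jobs[], int max_jobs) {
    int n = 0;
    char *start = line;
    while (n < max_jobs) {
        jobs[n++] = start;
        char *semi = strchr(start, ';');
        if (semi == NULL)
            break;              /* this was the last job on the line */
        *semi = '\0';           /* end the current job here */
        start = semi + 1;
    }
    return n;
}
```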

Built-in Commands

Note: for this project, you should print the error message if redirection is applied to a built-in command (e.g., "pwd > output" is considered illegal).

For the 'exit' built-in command, you should simply call 'exit(0);'. Your shell process will exit, and the parent (i.e. the real shell terminal) will be notified.

For managing the current working directory, you should use getenv, chdir, and getcwd. The getenv() library call is useful when you want to go to your $HOME directory. The getcwd() call is useful when you need to know the current working directory; i.e., if a user types pwd, you simply call getcwd(). And finally, chdir() is useful for moving between directories. Note that you do not need to manage the $PWD environment variable. Your job is simple. When a user types "cd", you should change to the home directory, which you can get from getenv("HOME"). When a user types "cd aPath", just pass aPath to the chdir() call.

Extra notes on $PWD: In the UNIX shell, "echo $PWD" and "pwd" sometimes give different outputs, because there are two places to get the current working directory. The pwd command gets the string from the getcwd() call, which returns the absolute path, while "echo $PWD" prints the PWD environment variable (the string that getenv("PWD") returns). So which string should you use? Use the absolute path from getcwd(). Thus, again, you do not have to manage the $PWD variable, which reduces the code you need to write.

You do not have to support tilde (~) in this project.
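
A minimal sketch of the cd and pwd built-ins using the calls named above. The function names builtin_cd and builtin_pwd and the buffer size CWD_BUF are assumptions of this sketch; the caller is expected to print the error message when either function returns -1.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CWD_BUF 4096  /* assumed to be large enough for any path here */

/* cd: a NULL dir means "go to the home directory".
 * Returns 0 on success, -1 on error. */
int builtin_cd(const char *dir) {
    if (dir == NULL)
        dir = getenv("HOME");
    if (dir == NULL || chdir(dir) != 0)
        return -1;
    return 0;
}

/* pwd: write the absolute working directory and a newline to stdout.
 * Returns 0 on success, -1 on error. */
int builtin_pwd(void) {
    char cwd[CWD_BUF];
    if (getcwd(cwd, sizeof cwd) == NULL)
        return -1;
    write(STDOUT_FILENO, cwd, strlen(cwd));
    write(STDOUT_FILENO, "\n", 1);
    return 0;
}
```

Both functions run inside the shell process itself, which is why built-ins must not go through fork: a chdir() in a child would not change the shell's own directory.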

White Spaces

The exact formats for exit, cd and pwd are:

[optionalSpace]exit[optionalSpace]
[optionalSpace]pwd[optionalSpace]
[optionalSpace]cd[optionalSpace]
[optionalSpace]cd[oneOrMoreSpace]dir[optionalSpace]

Any other formats should not be accepted, i.e. do not run the command, but do print the error message and continue processing the next command.

A tab is considered as a white space.

Redirection Hints

Redirection is probably the trickiest part of this project. For this you need dup()/dup2() (As usual, read the man pages). You actually do not need to use pipe() (unless you want to be fancy).

The idea of using dup2 is to intercept the byte stream going to the standard output (i.e. your screen), and redirect the stream to your designated file. dup2 uses file descriptors, which implies that you need to understand what a file descriptor is.

With a file descriptor, you can perform reads and writes to a file. Maybe in your life so far, you have only used fopen(), fread(), and fwrite() for reading and writing to a file. Unfortunately, these functions work on FILE* structures, which are a C library abstraction rather than a UNIX abstraction; when using FILE* structures, the file descriptors are hidden. Hence, you cannot pass a FILE* to dup2 directly.

To work on file descriptors, you should use the creat(), open(), read(), and write() system calls. These functions do their work through file descriptors. (Again, check out the "man" pages.) Before reading forward, you should familiarize yourself with the file descriptor interface.

The idea of redirection is to make the stdout descriptor point to your output file descriptor. First of all, let's understand the STDOUT_FILENO file descriptor. When a command "ls -la /tmp" runs, the ls program prints its output to the screen. But obviously, the ls program does not know what a screen is. All it knows is that the screen is whatever the STDOUT_FILENO file descriptor refers to. In other words, you could rewrite printf("hi") in this way: write(STDOUT_FILENO, "hi", 2).

To give yourself some practice, create a simple program where you create an output file, intercept stdout, and call printf("hello"). When you create your output file, you should get the corresponding file descriptor. To intercept stdout, you should call "dup2(output_fd, STDOUT_FILENO);" . If you run your program, you should not see "hello" printed on the screen. Instead, the word has been redirected to your output file.
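
A minimal sketch of this practice exercise, written as a function so it can be called from any small test program. The name print_hello_to_file is hypothetical. Unlike the exercise as described, this sketch also saves and restores the original stdout so the caller can keep printing afterwards; in the shell itself, the dup2() call belongs in the child process only.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: send the output of printf("hello") to path
 * instead of the screen. Returns 0 on success, -1 on error. */
int print_hello_to_file(const char *path) {
    int output_fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (output_fd < 0)
        return -1;
    fflush(stdout);                     /* empty the C library buffer first */
    int saved_fd = dup(STDOUT_FILENO);  /* remember the real stdout */
    if (saved_fd < 0) {
        close(output_fd);
        return -1;
    }
    if (dup2(output_fd, STDOUT_FILENO) < 0) {
        close(output_fd);
        close(saved_fd);
        return -1;
    }
    close(output_fd);                   /* STDOUT_FILENO now refers to the file */
    printf("hello");
    fflush(stdout);                     /* push "hello" through to the file */
    dup2(saved_fd, STDOUT_FILENO);      /* put the real stdout back */
    close(saved_fd);
    return 0;
}
```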

In short, to intercept your 'ls' output, you should redirect stdout before you execute ls, i.e. make the dup2() call before the exec('ls') call.

Redirection given a non-existing or non-runnable program: When you run a non-existing/non-runnable program with redirection, e.g. noprogram > output , you should print the error message, but the question is where? to output or to stdout? Our "intention" is to print the error message to stdout, but that is impossible because you have redirected your stdout to the output file (actually it is possible, but it requires a little bit more work). Therefore, write(STDOUT_FILENO, ... error message ...) will actually be rerouted to the output file, and that is okay. This simplification should (again) make your life easier.

More specifically, when you create a child process with redirection, you need to intercept stdout to your output fd before you call execvp. Then, you run execvp(noprogram), and it will return an error because noprogram cannot be found. Then, your child process will call write(STDOUT_FILENO, ... error message ...) , but the stdout file descriptor of your child process has been "rerouted" to the output fd. Hence, the error message will appear in the output file.
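
A minimal sketch of that sequence for the plain ">" case, assuming the command and the output file name have already been parsed. The names run_redirected and print_error are hypothetical. Opening the file with O_CREAT | O_EXCL is one way to make an already existing output file fail, as the specification requires.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: print the one and only error message. */
static void print_error(void) {
    char error_message[30] = "An error has occurred\n";
    write(STDOUT_FILENO, error_message, strlen(error_message));
}

/* Hypothetical helper: run argv with stdout sent to outfile.
 * Returns the child's exit status, or -1 on fork/waitpid failure. */
int run_redirected(char *argv[], const char *outfile) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* O_EXCL makes open() fail if outfile already exists. */
        int fd = open(outfile, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd < 0 || dup2(fd, STDOUT_FILENO) < 0) {
            print_error();      /* stdout is still the original here */
            _exit(1);
        }
        close(fd);
        execvp(argv[0], argv);
        print_error();          /* lands in outfile: stdout was rerouted */
        _exit(1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the descriptor is replaced only inside the child, the parent shell's own stdout is left untouched.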

Redirection format: Whenever you find a redirection character '>' in a command, you should check whether the format of the command is correct or not before running the program. The format of a valid redirection command looks like:

[optSpace]progAndArgs[optSpace]>[optSpace]outFile[optSpace]

progAndArgs is the program and all the arguments that a user types in. optSpace implies there could be 0 or more whitespaces (including tabs). The output file, outFile, should consist of characters without any white spaces. Also note that in a valid redirection command, the '>' character only appears once. Whenever you encounter an invalid redirection command, you should print the one and only error message (you should know it by now), and continue processing the next command. Here are some examples of invalid redirection commands:

ls > out1 out2
ls > out1 out2 out3
ls > out1 > out2
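
A minimal sketch of a format check along these lines. The name parse_redirect and its return convention are hypothetical. Two assumptions go beyond the text above: a "+" directly after ">" is treated as the ">+" operator, and an empty progAndArgs part (as in "> out") is treated as invalid.

```c
#include <string.h>

/* Hypothetical helper: look for a redirection in cmd and validate it.
 * Returns 0 if cmd has no '>', 1 if the redirection is valid, and -1 if
 * it is malformed. On 1, cmd is cut at the '>', *outfile points at the
 * file name, and *prepend is 1 for ">+" and 0 for ">". */
int parse_redirect(char *cmd, char **outfile, int *prepend) {
    *outfile = NULL;
    *prepend = 0;
    char *gt = strchr(cmd, '>');
    if (gt == NULL)
        return 0;
    if (strchr(gt + 1, '>') != NULL)
        return -1;                          /* '>' appears more than once */
    *gt = '\0';                             /* cmd now holds progAndArgs */
    if (cmd[strspn(cmd, " \t\n")] == '\0')
        return -1;                          /* assumed invalid: no program */
    char *after = gt + 1;
    if (*after == '+') {
        *prepend = 1;
        after++;
    }
    char *name = after + strspn(after, " \t\n");
    size_t name_len = strcspn(name, " \t\n");
    if (name_len == 0)
        return -1;                          /* no output file given */
    char *rest = name + name_len;
    if (rest[strspn(rest, " \t\n")] != '\0')
        return -1;                          /* more than one word after '>' */
    *rest = '\0';                           /* trim trailing white space */
    *outfile = name;
    return 1;
}
```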

Defensive Programming and Error Messages

No printf! When you want to print a command line or error message, your shell code should never use printf . Instead, use write(STDOUT_FILENO, ...). For example:

write(STDOUT_FILENO, cmdline, strlen(cmdline));
char error_message[30] = "An error has occurred\n";
write(STDOUT_FILENO, error_message, strlen(error_message));

The reason is quite complicated. In short, you will test your shell in an automated way, and using printf in your shell code will make the output nondeterministic.

Shell-related vs. program-related errors. Note that there is a difference between errors that your shell catches and those that the program catches. Your shell should catch all the syntax errors specified in this project page. If the syntax of the command looks perfect, you simply run the specified program. If there are any program-related errors (e.g. invalid arguments), let the program print its specific error messages wherever it wants (e.g. stdout or stderr). For example, if you run program > output, and the program writes some error messages to stdout, they will be automatically rerouted to the output file. But if the program writes some error messages to stderr, let it be like that (i.e., you do not have to do extra work to reroute stderr to the output file or stdout).

Miscellaneous Hints/Notes

Remember to get the basic functionality of your shell working before worrying about all of the error conditions and end cases. For example, first get a single command running (probably first a command with no arguments, such as "ls"). Then try adding more arguments.

Next, try working on multiple commands. Make sure that you are correctly handling all of the cases where there is miscellaneous white space around commands or missing commands. Finally, add built-in commands and redirection support.

We strongly recommend that you check the return codes of all system calls from the very beginning of your work. This will often catch errors in how you are invoking these new system calls.

Beat up your own code! You are the best (and in this case, the only) tester of this code. Throw lots of junk at it and make sure the shell behaves well. Good code comes through testing -- you must run all sorts of different tests to make sure things work as desired. Don't be gentle -- other users certainly won't be. Break it now so we don't have to break it later.

There was a case in the past where "ls" returns a different ordering of filenames on a student laptop. If you have this problem, check your $LANG environment variable (run "echo $LANG") and make sure it is set to "en_US" or "en_US.UTF-8".

Hand-in

You should create a p4shell directory in your SVN, at the same directory level as your other projects.

Inside the p4shell directory, you should submit only the following files: myshell.c and README.

Your entire shell code should be put into only one file: myshell.c

To ensure that we compile your C correctly for the demo, you will need to create a simple Makefile; this way our scripts can just run make to compile your code with the right libraries and flags.

The name of your final executable should be myshell. This is what we will run when grading your work:

% rm -f myshell
% make
% ./myshell pathToTestFile

Your README file ("README" without file extension) should describe what functionalities are not working. If you think your code is perfect, simply write "All good".

Do not submit any .o or test files. Please just submit the two files above. Make sure that your code runs correctly on CSIL machines.

Automated Testing and Grading

We will run your program on a suite of batch files, some of which will exercise your program's ability to correctly execute commands and some of which will test your program's ability to catch error conditions. Be sure that you thoroughly exercise your program's capabilities on a wide range of batch files, so that you will not be unpleasantly surprised when we run our tests.

To automate grading, we will heavily use the batch mode. If you do everything correctly except the batch mode, you will not get partial credit. Hence, make sure you can read and run the commands in the batch file.

We release all the batch files you need to test your program. They can be found here:

test-scripts/

Also, please read the README-p4 file first. This README file contains everything you need to know regarding running the batch files.

IMPORTANT NOTE: Do NOT manually check your outputs with the expected outputs. You must compare them with the diff command line. The scripts that we provide to you essentially run this:

% ./myshell (pathToBatchFiles)/bf 1> bf.out 2> bf.err
% diff bf.out (pathToExpectedOutput)/bf.out
% diff bf.err (pathToExpectedOutput)/bf.err

The commands above execute your shell, which will run all the commands in the "bf" test file. All stdout and stderr outputs of this execution will automatically go to "bf.out" and "bf.err". The "diff" command compares your result with our expected result. If there are any discrepancies, the diff command will tell you the differences. If not, nothing will be printed out to the screen.

To make sure ">" works here, do not overwrite the stdout and stderr file descriptors that belong to the parent process (the myshell process). You should only overwrite the stdout and stderr descriptors in child processes.

Finally, we encourage you to make progress by passing one test at a time. Do not try to write lots of code without testing any single test file. As you implement more features, make sure you re-run all the prior tests again. It's possible that prior tests that you passed are now failing because of your new code.

How to not get a ZERO?

Well, do some work obviously. In general, almost all students in the class pay close attention to our project specifications (great job! we appreciate it!). However, there are always a few students who take project specifications lightly, get a zero, but then expect some credit. In this project, there is no exception.

You will get a zero on a testfile if your output is different from the expected output (yes, even if it's because of one blank space!). That is, given a testfile, there is no partial credit (i.e. a test result is either pass or fail, nothing in between). Again, please use diff to check every test output. The only "partial credit" is based on how many testfiles you pass. In the past some students only ran ./myshell testfile and manually "eyeballed" the output on the screen against the expected output. This manual approach is not acceptable.

You will get zero if we cannot redirect the output of your shell to a file (i.e. if we cannot run ./myshell testfile > output). As mentioned above, do not overwrite parent's stdout and stderr descriptors.

You will get a zero if your code contains the word "system" or any "#include<path-to-your-library>". That is, if you try to cheat by using the system() library call or calling your own hidden library, we will severely penalize you. Please don't be tricksy, it's just a simple shell.

You will get a zero if we find that you copy-pasted code from other sources. Remember, we have a database of many solutions and will run a code comparator.

You will get a zero if you try to copy the expected output manually or in some tricksy ways. Note that when we grade your code, we will use different input contents. Those who implement the right logic will not be affected by changes in the input content. But those who manually manufacture the outputs will be affected.

Provided Materials

We have seeded your repositories with a code skeleton (p4shell/myshell.c) and a Makefile (please do not modify the Makefile).