CMSC 22300
Homework 2, Due 4/12/2012, 11pm

setfn.sml is a source file containing the functor SetFn from tutorial2.sml.
It has been modified by adding three functions

   val subset : set * set -> bool
   val fromList : element list -> set
   val toList : set -> element list
 

1. [20] Replace the simple but inefficient versions of union and diff SetFn
with more efficient versions (modeled after the subset function) that are
linear in the sum of the sizes of the two argument sets.  You can do this
by modifying the supplied setfn.sml source file in your repository.


2. [20] Note that with the addition of fromList and toList, we can use
a set module like IntSet2 (produced by applying SetFn to IntOrd) to perform
sorting. For instance,

   fun sort m = IntSet2.toList(IntSet2.fromList m)

or, more concisely,

   val sort : int list -> int list = IntSet2.toList o IntSet2.fromList

where "o" is the infix operator for function composition. The sorting
algorithm used here is called "insertion sort".

Use this idea to more directly implement the sorting algorithm by implementing
a functor SortFn.

functor SortFn (X: ORDER)
   : sig
       type element = X.t
       val sort : element list -> element list
     end = 
struct
  ...
end

You can submit this by modifying the supplied sortfn.sml source
file template in your repository.


3. [40] Unification

We can define a simple language of "terms" by specifying a set of
function symbols, each with an associated arities that specifies
the number or arguments that it takes, and a set of variables.
For instance, here we will deal with a language specified by:

   functions: {A: 0, B: 0, F: 1, G: 1, H: 2}
   variables: {x, y, z, ...}  (an unbounded supply of variables)

Function symbols of arity 0 like A and B are considered constants, and we
write "A" and "B" instead of "A()" and "B()". Here are some example terms
in this language:

   A, x, F(x), G(A), H(G(B),F(y)), H(A,HF(x),y))

And here are some nonexamples, because of arity violations:

   A(x), F(A,B,x), H(F(A))

The unification problem for this language is to find a substitution of
terms for variables that makes an equation (or list of equations) between
terms true.  For instance, the equation

  H(x, H(F(A), G(y))) = H(G(z)), H(F(u), z))

is solved, or made true, by the substitution

  s =  X -> G(G(Y)), U -> A, Z -> G(Y)

since if we apply s to both sides of the equation, we get

  H(G(G(Y)), H(F(A), G(Y))) = H(G(G(Y)), H(F(A), G(Y)))

Such a substitution is called a "unifier" for the set of
term equations.

Terminology: A term is either a variable (e.g. x) or an 
application of a function symbol to arguments (e.g. F(t),
or A()). In the later case, we call the function symbol
being applied the "head" function symbol.

We can "unify" or solve a list of equations as follows:

Let the first equation be t1 = t2, so the list is of the
form t1 = t2; eqns, where eqns are the rest of the equations.

Case: t1 and t2 are identical terms
Then procede to solve the remainder of the equations, eqns.

Case: t1 and t2 have the same head function symbol.

  (e.g. A = A, F(t1') = F(t2'), H(t11,t12) = H(t21,t22))

Then eliminate t1 = t2 and replace it with equations between
the corresponding arguments of the function at the front of
the equation list:

   A = A; eqns  ==>  eqns   (no arguments)
   F(t1') = F(t2'); eqns  ==>  t1' = t2'; eqns
   H(t11,t12) = H(t21, t22); eqns  ==> t11 = t21; t12 = t22; eqns

Case: t1 and t2 have different head function symbols
  (e.g. F(x) = G(A))
Then fail, because the equations obviously cannot be solved.

Case: t1 = x, a variable, t2 a term not containing x 
Then substitute t2 for x throughout the remaining equations
eqns, obtaining a modified list of equations eqns', and procede
to solve eqns'.  (Do this even if t2 is also a variable.)


Case: t1 = x, a variable, t2 a term containing x
Then fail, because equations like x = F(x) have no
(finite) solutions. But x = x is an exception, because
it is covered by the first rule above.  This is called 
the "occurrence check".

Case: t2 = x, a variable, while t1 is not a variable.
Reverse the first equation, yielding x = t1; eqns, and proceed
as above.

This procedure will eventually either fail, either because
of incompatible head function symbols or because a variable
is equated with a term containing that variable, or it will
terminate with an empty list of remaining equations, in which
the equations have all been solved. This procedure does not
actually produce the unifying substitution, it only determines
whether there is a solution or not.

   Example:  The equation list is:

        x = F(A), H(x,y) = H(F(A),x) 

   Then t1 = x, t2 = F(A), and eqns is H(x,y) = H(F(A),x).
   x does not occur in t2, so we substitute F(A) for x
   in eqns, yielding the modified equation list eqns': 

        eqns':  H(F(A),y) = H(F(A),F(A))

   This will get broken down into:

        eqns'':  F(A) = F(A), y = F(A)

   by the case for matching head function symbols.  Further
   steps give us:

        A = A, y = F(A)

        y = F(A)

        <empty>  (substituting F(A) for y in the empty list)

   which indicates a successful solution.


We can represent terms in this language by the ML datatype

  datatype term
    = A | B
    | F of term | G of term | H of term * term
    | Var of string

(where a variable x is represented as Var "x"). 

Write an implementation of the above "unification" algorithm
is SML.  You will need to implement at least the following two
functions:

  val subst : string * term * term -> term

such that, subst("x",t1,t2) substitutes t1 for occurrences of
Var "x" in t2, e.g.

  subst("x", F(A), Var "x") = F(A)
  subst("x", G(Var "y"), H(Var "x",G(B))) = H(G(Var "y"), G(B))

and 

  val unify : (term * term) list -> bool

which takes a list of equations (term pairs), and returns 
true or false indicating whether the list of equations has
a solution.

You can submit this by modifying the supplied unify.sml source
file template in your repository.