= Difference between parallel and concurrent programming

Concurrency: separate, interacting, interleaved computations.
Parallelism: computations carried out simultaneously.

Concurrency exists to enable certain kinds of programs or programming styles --- for example, event-based programming and actor-oriented programming. It has become popular lately in the form of async libraries; if you saw the Jane Street talk last quarter, that is what they are supporting. Parallelism exists to enable speedups.

Draw the matrix:
- nC/nP - simple sequential programs. This class.
- C/nP - windowing systems. Example using CML later.
- nC/P - typical Manticore programs. Example using PML later.
- C/P - databases, web servers, lots of complex software.

= Concurrent ML

Invented by Prof. Reppy around 1990 to solve the problem of concurrent programming, particularly for the space of windowing systems. This work laid out many of the foundations of what became event-based programming. Before that, people wrote loops that read from an event buffer, did some work, and manually called "sleep" to return control to the system.

Sieve of Eratosthenes

Problem: compute the nth prime number. Imagine someone passing you the numbers in order, starting at 2. When you receive a number, you know that it is prime, but you also know that no future number is prime if it is divisible by that number. So, you put in place a filter on the stream of numbers that removes any that are divisible by it.

In this example, we will talk about CML threads and channels. A CML thread is a piece of concurrent work and is a unit -> unit function passed to CML.spawn. A CML channel is the strongly-typed communication and synchronization mechanism between CML threads. You can CML.send a value to a CML channel and CML.recv a value from it. CML.send blocks until the value is recv'd, and CML.recv blocks until a value is sent.
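The sieve described above can be sketched in CML along the following lines. This is a minimal illustration, not the tuned benchmark implementation from the demo; the function names (counter, filter, sieve, nthPrime) are my own, but the CML operations (CML.spawn, CML.channel, CML.send, CML.recv) are the real library primitives. Under SML/NJ, the whole computation would be run inside RunCML.doit.

    (* A thread that sends start, start+1, start+2, ... on a fresh channel. *)
    fun counter (start : int) : int CML.chan =
      let
        val ch = CML.channel ()
        fun loop i = (CML.send (ch, i); loop (i + 1))
      in
        CML.spawn (fn () => loop start);
        ch
      end

    (* A thread that copies inCh to a fresh channel, dropping multiples of p. *)
    fun filter (p : int, inCh : int CML.chan) : int CML.chan =
      let
        val outCh = CML.channel ()
        fun loop () =
          let val i = CML.recv inCh
          in
            if i mod p <> 0 then CML.send (outCh, i) else ();
            loop ()
          end
      in
        CML.spawn loop;
        outCh
      end

    (* Each prime received installs a new filter thread on the stream. *)
    fun sieve () : int CML.chan =
      let
        val primes = CML.channel ()
        fun head ch =
          let val p = CML.recv ch
          in
            CML.send (primes, p);
            head (filter (p, ch))
          end
      in
        CML.spawn (fn () => head (counter 2));
        primes
      end

    (* Pull n values off the primes channel; the nth one is the answer. *)
    fun nthPrime n =
      let
        val ch = sieve ()
        fun loop 1 = CML.recv ch
          | loop i = (CML.recv ch; loop (i - 1))
      in
        loop n
      end

Note how the communication doubles as synchronization: each filter thread blocks in CML.recv until its upstream neighbor sends, so no explicit locking is needed.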
- Demo

    cd manticore/swp/src/benchmarks/programs/cml-primes
    sml
    CM.make "smlnj.cm";
    Main.timeit 100;
    Main.timeit 5000;

Discuss that the 100-prime run involves more than 100 CML threads and channels, filtering, etc. The implementation has been carefully tuned over the last two decades.

= Parallelism

The Manticore project was started in 2007 to address the problem of providing good parallel performance on the then-new multicore processors. Until that time, the state of the art was dual-processor machines and, frankly, most people just ignored the second processor unless they had trivially parallelizable programs.

We extend the language Standard ML with syntax for implicitly-threaded parallelism. That is, the programmer provides hints to the compiler that it might be useful to run some parts of the program in parallel, but parallel execution is not mandated. This approach addresses one of the largest challenges of even non-concurrent parallel programming --- without testing on every specific piece of hardware, it is hard to know how to divide up the work.

Let's look at an example! Tree-add creates a perfectly balanced ternary (three-way) tree with random integers at the leaves. The problem is to add up all of the leaves. So, we use the obvious recursive definitions to build the tree and then time the addition.

Now, let's look at the Manticore version of the same program. As you can see, it's just normal Standard ML code. But there's a difference where the tuples are created for the three pieces of parallel work (in both building and adding). In these cases, we add a pipe, creating a parallel tuple. This syntax tells the compiler and runtime that they should consider running the work in parallel, if there are resources available to do so.

So, how do we do? What I didn't mention is that we're logged in to a $15k AMD box with 48 cores, split into 4 12-core processors. Let's load up xload and do some timing!
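The change from the sequential to the Manticore version can be sketched as follows. The datatype and function names here are illustrative (the actual benchmark source may differ); the (| ... |) parallel-tuple syntax is Manticore's real hint form.

    datatype tree = Leaf of int | Node of tree * tree * tree

    (* Sequential version: plain Standard ML. *)
    fun add (Leaf n) = n
      | add (Node (t1, t2, t3)) =
          let val (a, b, c) = (add t1, add t2, add t3)
          in a + b + c end

    (* Manticore version: identical except for the pipes, which make
       the tuple a parallel tuple. The three recursive calls may be
       evaluated in parallel if resources are available. *)
    fun addP (Leaf n) = n
      | addP (Node (t1, t2, t3)) =
          let val (a, b, c) = (| addP t1, addP t2, addP t3 |)
          in a + b + c end

Because the parallel tuple is only a hint, the runtime is free to execute it sequentially near the leaves, where the work is too small to pay for spawning; this is exactly the hardware-dependent work-division decision the prose above says is hard to make statically.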
Note that the default is 3^14.
- Run MLton version
- Run Manticore sequential version
- Run Manticore parallel version

Now, try with 3^16. Manticore rules!