= Difference between parallel and concurrent programming

Concurrency: separate, interacting, interleaved computations.
Parallelism: computations carried out simultaneously.

Concurrency exists to enable certain kinds of programs or programming styles --- for example, event-based programming and actor-oriented programming. It has become popular lately in the form of async libraries; if you saw the Jane Street talk last quarter, that is what they are supporting. Parallelism exists to enable speedups.

Draw the matrix:
- nC/nP - simple sequential programs. This class.
- C/nP - windowing systems. Example using CML later.
- nC/P - typical Manticore programs. Example using PML later.
- C/P - databases, web servers, lots of complex software.

= Concurrent ML

Invented by Prof. Reppy around 1990 to solve the problem of concurrent programming, particularly for the space of windowing systems. This work laid out many of the foundations of what became event-based programming. Before that, people wrote loops that read from an event buffer, did some work, and manually called "sleep" to return control to the system.

Sieve of Eratosthenes

Problem: compute the nth prime number. Imagine someone passing you the numbers in order, starting at 2. When you receive a number, you know that it is prime, but you also know that no future number is prime if it is divisible by that number. So, you put in place a filter on the stream of numbers that removes any that are divisible by it.

In this example, we will talk about CML threads and channels. A CML thread is a piece of concurrent work and is a unit -> unit function passed to CML.spawn. A CML channel is the strongly-typed communication and synchronization mechanism between CML threads. You can CML.send a value to a CML channel and CML.recv a value from it. CML.send blocks until the value is recv'd, and CML.recv blocks until a value is sent.
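The sieve described above can be sketched in CML along the following lines. This is a minimal illustration, not the tuned benchmark implementation from the demo; the function names (counter, filter, sieve, nthPrime) are my own, but the CML operations (CML.spawn, CML.channel, CML.send, CML.recv) are the real library primitives. Under SML/NJ, the whole computation would be run inside RunCML.doit.

    (* A thread that sends start, start+1, start+2, ... on a fresh channel. *)
    fun counter (start : int) : int CML.chan =
      let
        val ch = CML.channel ()
        fun loop i = (CML.send (ch, i); loop (i + 1))
      in
        CML.spawn (fn () => loop start);
        ch
      end

    (* A thread that copies inCh to a fresh channel, dropping multiples of p. *)
    fun filter (p : int, inCh : int CML.chan) : int CML.chan =
      let
        val outCh = CML.channel ()
        fun loop () =
          let val i = CML.recv inCh
          in
            if i mod p <> 0 then CML.send (outCh, i) else ();
            loop ()
          end
      in
        CML.spawn loop;
        outCh
      end

    (* Each prime received installs a new filter thread on the stream. *)
    fun sieve () : int CML.chan =
      let
        val primes = CML.channel ()
        fun head ch =
          let val p = CML.recv ch
          in
            CML.send (primes, p);
            head (filter (p, ch))
          end
      in
        CML.spawn (fn () => head (counter 2));
        primes
      end

    (* Pull n values off the primes channel; the nth one is the answer. *)
    fun nthPrime n =
      let
        val ch = sieve ()
        fun loop 1 = CML.recv ch
          | loop i = (CML.recv ch; loop (i - 1))
      in
        loop n
      end

Note how the communication doubles as synchronization: each filter thread blocks in CML.recv until its upstream neighbor sends, so no explicit locking is needed.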
- Demo

    cd manticore/swp/src/benchmarks/programs/cml-primes
    sml
    CM.make "smlnj.cm";
    Main.timeit 100;
    Main.timeit 5000;

Discuss that the 100-prime run involves more than 100 CML threads and channels, filtering, etc. The implementation has been carefully tuned over the last two decades.

= Parallelism

The Manticore project was started in 2007 to address the problem of providing good parallel performance on the then-new multicore processors. Until that time, the state of the art was dual-processor machines and, frankly, most people just ignored the second processor unless they had trivially parallelizable programs.

We extend the language Standard ML with syntax for implicitly-threaded parallelism. That is, the programmer provides hints to the compiler that it might be useful to run some parts of the program in parallel, but parallel execution is not mandated. This approach addresses one of the largest challenges of even non-concurrent parallel programming --- without testing on every specific piece of hardware, it is hard to know how to divide up the work.

Let's look at an example! Tree-add creates a perfectly balanced ternary (three-way) tree with random integers at the leaves. The problem is to add up all of the leaves. So, we use the obvious recursive definitions to build the tree and then time the addition.

Now, let's look at the Manticore version of the same program. As you can see, it's just normal Standard ML code. But there's a difference where the tuples are created for the three pieces of parallel work (in both building and adding). In these cases, we add a pipe, creating a parallel tuple. This syntax tells the compiler and runtime that they should consider running the work in parallel, if there are resources available to do so.

So, how do we do? What I didn't mention is that we're logged in to a $15k AMD box with 48 cores, split into 4 12-core processors. Let's load up xload and do some timing!
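The change from the sequential to the Manticore version can be sketched as follows. The datatype and function names here are illustrative (the actual benchmark source may differ); the (| ... |) parallel-tuple syntax is Manticore's real hint form.

    datatype tree = Leaf of int | Node of tree * tree * tree

    (* Sequential version: plain Standard ML. *)
    fun add (Leaf n) = n
      | add (Node (t1, t2, t3)) =
          let val (a, b, c) = (add t1, add t2, add t3)
          in a + b + c end

    (* Manticore version: identical except for the pipes, which make
       the tuple a parallel tuple. The three recursive calls may be
       evaluated in parallel if resources are available. *)
    fun addP (Leaf n) = n
      | addP (Node (t1, t2, t3)) =
          let val (a, b, c) = (| addP t1, addP t2, addP t3 |)
          in a + b + c end

Because the parallel tuple is only a hint, the runtime is free to execute it sequentially near the leaves, where the work is too small to pay for spawning; this is exactly the hardware-dependent work-division decision the prose above says is hard to make statically.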
Note that the default is 3^14.
- Run MLton version
- Run Manticore sequential version
- Run Manticore parallel version

Now, try with 3^16. Manticore rules!