Closed Source
Stateless Parallel Processing Machines (SPPM)
Theory
SPPM was designed to deliver high performance and high application availability while leveraging commodity processors and communication components. Load balancing and fault tolerance were built in from the beginning. In principle, SPPM is a dataflow parallel machine in which there are only two kinds of application programs: stateless workers and stateful masters. Work is distributed through a scatter/gather paradigm.
A stateless worker repeats a simple calculation cycle: get work information, perform work, publish results. The name “stateless” refers to the fact that SPPM has mechanisms that automatically discard partial results and regenerate work assignments when a worker failure is detected. A stateful master is responsible for distributing work and collecting the results. Compute-intensive (iterative or recursive) applications can be accelerated using multiple workers managed by appropriate masters.
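The worker cycle and the master's scatter/gather role can be sketched in a few lines. This is a minimal single-machine illustration in Python, not the SPPM API (which was C#): the names `worker`, `master`, and `square`, and the use of in-process queues in place of a network, are all assumptions made for the example.

```python
import queue
import threading

def worker(work_q, result_q, compute):
    """Stateless worker: get work information, perform work, publish results."""
    while True:
        item = work_q.get()              # get work information
        if item is None:                 # sentinel: no more work for this worker
            break
        task_id, payload = item
        result_q.put((task_id, compute(payload)))   # perform work, publish result

def master(payloads, compute, n_workers=4):
    """Stateful master: scatter tasks, then gather results in task order."""
    work_q, result_q = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(work_q, result_q, compute))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for task_id, payload in enumerate(payloads):    # scatter
        work_q.put((task_id, payload))
    for _ in threads:                               # one sentinel per worker
        work_q.put(None)
    results = [None] * len(payloads)
    for _ in payloads:                              # gather
        task_id, value = result_q.get()
        results[task_id] = value
    for t in threads:
        t.join()
    return results

def square(x):
    """Hypothetical unit of work used only for this example."""
    return x * x
```

Calling `master([1, 2, 3, 4, 5], square, n_workers=3)` returns the squares in input order. Only the master holds state (the `results` list); a worker keeps nothing between tasks, which is what makes it safe to replace. The sketch does not handle a failing `compute`.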
The SPPM architecture also supports a low-overhead fault tolerance protocol that protects against master failures. The architecture reveals a simple design principle based on the generalized use of SPP concepts: it is possible to build a highly reliable, high-performance multiprocessor by minimizing unnecessarily exposed state in both computing and communication components. Philosophically, SPPM uses implicit parallelism, whereas MPI and OpenMP use explicit parallelism. The key is a smart networking layer combined with a real, distributed implementation of a tuple space that stores the data and thereby provides the fault tolerance.
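One way a tuple space can make worker failures harmless is to lease tasks instead of deleting them when they are handed out. The sketch below is a toy, in-memory, single-process illustration in Python. The lease timeout as the failure detector, the class name `LeasedTupleSpace`, and all of its methods are assumptions for the example; the text above does not say how SPPM detects failures or how its distributed tuple space is implemented.

```python
import time

class LeasedTupleSpace:
    """Toy store: a task taken by a worker is leased, not deleted."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock       # injectable so the behaviour can be tested
        self.pending = {}        # task_id -> payload, available to any worker
        self.leased = {}         # task_id -> (payload, lease deadline)
        self.results = {}        # task_id -> result

    def put_task(self, task_id, payload):
        self.pending[task_id] = payload

    def take_task(self):
        """Hand out one task, first reclaiming tasks whose lease has expired."""
        now = self.clock()
        for task_id, (payload, deadline) in list(self.leased.items()):
            if deadline <= now:                   # worker presumed failed
                del self.leased[task_id]
                self.pending[task_id] = payload   # regenerate the assignment
        if not self.pending:
            return None
        task_id, payload = self.pending.popitem()
        self.leased[task_id] = (payload, now + self.lease_seconds)
        return task_id, payload

    def put_result(self, task_id, result):
        """Publish a result; a duplicate for a finished task is ignored."""
        if task_id in self.results:
            return
        self.leased.pop(task_id, None)
        self.pending.pop(task_id, None)
        self.results[task_id] = result
```

Because a worker holds no state the system depends on, losing one costs only the time until its lease runs out; whatever partial work it did is simply never published.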
Application
The system was written in C# and leveraged several key features of the .NET runtime. Using aspect-oriented techniques, we were able to simulate and unit test the master and worker applications without needing to execute them together. In those unit tests, common errors in cluster applications could be simulated to verify that the applications behaved as expected.
Because the API is simple, parallel applications are quick to write. When asked to parallelize the approximation of pi, we were able to write the master and worker applications and run them on a cluster within ten minutes.
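To give a feel for how small such a master/worker pair can be, here is a sketch of a pi approximation in Python. It is not the original C# code. The integration method (midpoint rule on 4/(1+x^2) over [0, 1]) is an assumption, since the text does not say which approximation was used, and the names `pi_worker` and `pi_master` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def pi_worker(start, stop, n):
    """Worker: midpoint-rule partial sum of 4/(1+x^2) for slices start..stop-1."""
    h = 1.0 / n
    return h * sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2) for i in range(start, stop))

def pi_master(n=100_000, n_tasks=8, n_workers=4):
    """Master: scatter index ranges to workers, gather and add the partial sums."""
    bounds = [n * k // n_tasks for k in range(n_tasks + 1)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(pi_worker, lo, hi, n)
                   for lo, hi in zip(bounds, bounds[1:])]
        return sum(f.result() for f in futures)
```

The thread pool here only shows the structure. In CPython, threads will not speed up this CPU-bound loop; on a cluster, each range would go to a separate machine.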
Other research
Behind the theory of SPPM is a strong mathematical model that lets you form an equation describing your SPPM application. Once this equation is built, you can apply very simple calculus to determine what the major bottleneck in your application will be and how many machines it can use efficiently before you hit diminishing returns. If your algorithm will actually run slower in parallel than serially, the model will tell you that too.
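The actual SPPM equations are not reproduced here, so the following is only an illustration of the kind of reasoning such a model allows. It assumes a deliberately simple timing model, T(p) = t_seq / p + overhead * p, where t_seq is the serial run time, p the number of workers, and overhead a per-worker coordination cost. Both the model and the function names are assumptions, not the SPPM model itself.

```python
import math

def parallel_time(t_seq, overhead, p):
    """Assumed model: perfectly divisible work plus a cost that grows with p."""
    return t_seq / p + overhead * p

def optimal_workers(t_seq, overhead):
    """Solve dT/dp = -t_seq / p**2 + overhead = 0 for p."""
    return math.sqrt(t_seq / overhead)

def slower_than_serial(t_seq, overhead):
    """True when even the best worker count cannot beat the serial time."""
    best = max(1.0, optimal_workers(t_seq, overhead))
    return parallel_time(t_seq, overhead, best) >= t_seq
```

Under this assumed model, a job with t_seq = 100 and overhead = 1 is fastest at 10 workers, where it takes 20 time units; adding workers beyond that makes it slower. With overhead = 30, the best parallel time exceeds 100, so the job should stay serial.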
Given the power and extensibility provided by the .NET platform, work began on building SPPM for multi-core machines. This was based on a VS2005 plugin that would process attributes and markup in a class to generate parallelized code, which would spawn its own local SPP environment and automatically parallelize an algorithm. With this, you could do row striping, column striping, wavefront, and other data-dependency parallel algorithms without writing very complex parallel code manually.
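As an illustration of what a wavefront pattern involves when written by hand, the Python sketch below fills a grid in which each cell depends on its upper and left neighbours. Cells on the same anti-diagonal are independent, so each diagonal can be processed in parallel. This is not output of the plugin; the recurrence and the names `wavefronts` and `count_paths` are hypothetical and chosen only for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def wavefronts(rows, cols):
    """Yield lists of (i, j) cells; cells within one list do not depend on each other."""
    for d in range(rows + cols - 1):
        yield [(i, d - i) for i in range(max(0, d - cols + 1), min(rows, d + 1))]

def count_paths(rows, cols, n_workers=4):
    """Fill grid[i][j] = grid[i-1][j] + grid[i][j-1], one wavefront at a time."""
    grid = [[0] * cols for _ in range(rows)]

    def fill(cell):
        i, j = cell
        if i == 0 or j == 0:
            grid[i][j] = 1                       # border cells have no dependencies
        else:
            grid[i][j] = grid[i - 1][j] + grid[i][j - 1]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for front in wavefronts(rows, cols):
            list(pool.map(fill, front))          # barrier: finish this front first
    return grid
```

Row or column striping is the simpler case in which every stripe is independent, so no barrier between fronts is needed.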




