CMSC 411 Project
A General Overview of Parallel Processing
Fall 1998
Summary & Questions
Mark Pacifico mjpac@wam.umd.edu Mike Merrill mwm@glue.umd.edu
A BRIEF OVERVIEW
This webpage has touched on a lot of material up to this point.  Let's briefly summarize what we have said.  First of all, parallel processing is the use of a collection of processing elements that communicate and cooperate to solve large problems quickly.  We do not see much parallelism in personal computers because the performance of uniprocessors is currently high enough to satisfy most time demands.  Parallel processing solves a problem by a technique called divide and conquer: instead of executing a task linearly, the task is broken up (divided) into pieces that are executed (conquered) at the same time.  Problems of this scale would take an enormous amount of time using just one processor.  By now we know that parallelism is not only important for supercomputers that solve highly complex problems, but also for our personal and business computers, even though for the most part these machines still contain a uniprocessor.  Because the speed of light imposes a physical restriction on the speed of a uniprocessor, parallelism will eventually have to be utilized even in personal computers in order to sustain the rate of improvement in execution time.
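To make the divide and conquer idea concrete, here is a minimal sketch in C using POSIX threads.  It is not code from this project; the array, the two-way split, and the thread functions are purely illustrative of splitting one task into pieces that run at the same time.

    /* Divide-and-conquer sketch: sum an array with two threads.
       Illustrative only; not part of the original project. */
    #include <pthread.h>
    #include <stdio.h>

    #define N 8

    static int data[N] = {1, 2, 3, 4, 5, 6, 7, 8};

    struct chunk { int lo, hi; long sum; };

    static void *partial_sum(void *arg)
    {
        struct chunk *c = arg;
        c->sum = 0;
        for (int i = c->lo; i < c->hi; i++)
            c->sum += data[i];            /* conquer: each thread sums its half */
        return NULL;
    }

    int main(void)
    {
        struct chunk left  = {0, N / 2, 0};   /* divide: first half  */
        struct chunk right = {N / 2, N, 0};   /* divide: second half */
        pthread_t t1, t2;

        pthread_create(&t1, NULL, partial_sum, &left);
        pthread_create(&t2, NULL, partial_sum, &right);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        printf("total = %ld\n", left.sum + right.sum);  /* combine the pieces */
        return 0;
    }

With more processors, the array would simply be divided into more pieces, and the partial results combined at the end in the same way.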
A LITTLE BIT OF DETAIL
Many terms were introduced in our discussion of parallel processing.  This section offers a brief summary of these terms.  All computers are placed into one of four categories with respect to the parallelism present in their instruction and data streams.  SISD computers are the typical uniprocessor machines such as personal computers.  SIMD machines have multiple processors that execute the same instruction on multiple data streams.  MISD machines are nonexistent for commercial use, but may show up in the future.  Finally, in MIMD machines, each processor fetches its own instructions and operates on its own data.  Within MIMD machines, we have two types of memory arrangements: centralized shared memory, which involves one main memory that several processors can access, and distributed memory, where each processor has its own slice of main memory.  How the processors communicate with memory under each of these memory models is covered in the section on communication.  Finally, synchronization involves getting processors to agree on the value of a variable or on a time at which to start or stop accessing shared data.  With mutual exclusion, only one processor is allowed to access a particular memory location at a time.  Barriers make processors wait at a certain point in a program for the others to catch up, as in the sketch below.  Memory consistency is an issue that can be handled with sequential consistency, which carries expensive time delays, or with a weaker consistency model, which requires more programmer intervention.
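As an illustration of a barrier, here is a minimal sketch in C using POSIX threads.  The thread count, the two "phases," and the worker function are assumptions made for the example, not details from the course material; the point is simply that no thread starts phase 2 until every thread has finished phase 1.

    /* Barrier sketch: all threads finish phase 1 before any starts phase 2.
       Illustrative only; assumes POSIX barrier support. */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static pthread_barrier_t barrier;

    static void *worker(void *arg)
    {
        long id = (long)arg;

        printf("thread %ld: phase 1\n", id);   /* work done before the barrier */
        pthread_barrier_wait(&barrier);        /* wait here for everyone else  */
        printf("thread %ld: phase 2\n", id);   /* runs only after all have arrived */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];

        pthread_barrier_init(&barrier, NULL, NTHREADS);
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        pthread_barrier_destroy(&barrier);
        return 0;
    }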
A HISTORICAL PERSPECTIVE AND A GLIMPSE OF THE FUTURE
Parallel processing can be traced back all the way to the 1950s.  As we have seen, the development of parallel architectures has been extremely rapid since then.  Less than ten years after Cocke and Slotnick's first discussions of the use of parallelism in 1958, Gene Amdahl discovered limits to parallelism, now known as Amdahl's Law.  In the early 70s, Control Data Corporation developed a parallel radar imaging system that achieved 250 times the performance of a system released just seven years prior.  Many other improved systems were developed at a rapid rate until the late 80s, when demand for personal computers drew attention away from the development of parallel systems.  Today, companies such as Cray and Sun continue development and improvement in parallel computing, mainly for use in government, business, and the military.  As we stated earlier, the speed of light is a physical limitation on the speed of the uniprocessor machines that are so popular right now.  Therefore, processors keep getting smaller and smaller so that the distance signals must travel is much shorter, thereby lessening execution time.  There is a limit on this reduction in size, however.  Obviously, we cannot have microprocessors that fit on the head of a pin.  This is the direction uniprocessors are headed, and it is simply impractical for this trend to continue.  Therefore, parallelism will eventually have to factor in when the improvement in uniprocessor execution time plateaus.  No one knows for sure when this will occur, but it is nearly inevitable that parallel processing will eventually play a larger role in every household in America that owns a personal computer.
QUESTIONS AND ANSWERS
Q. Explain why it may be necessary to turn to parallel architectures in the future if we want to further increase the speed of our computers.  Why are parallel architectures currently used mostly for supercomputing purposes?
A. In the future, we may need to rely on parallelism to a large extent if we begin to reach the limit on speed using just one processor.  We would have to build impractically small processors in order to continue the improvements in uniprocessor speed being made today.  However, for the time being, uniprocessor machines are keeping up with the demands of computer applications, leaving parallel architectures to be used primarily for the larger problems more likely to be encountered by a supercomputer.

Q. Explain the differences between SIMD and MIMD computers.  Which architecture is popular today, and why?
A. Single instruction, multiple data stream (SIMD) computers are multiprocessor machines that execute the same instruction on multiple data streams using multiple processors.  Each processor has its own data memory, but there is a single instruction memory and control processor that fetches and dispatches instructions.  SIMD processors are typically special purpose, since full generality is not required.  Multiple instruction, multiple data stream (MIMD) computers often use off-the-shelf microprocessors; each processor fetches its own instructions and operates on its own data.  MIMD architectures are popular today because they can be built from good general-purpose microprocessors, which tend to be less expensive and more cost-efficient than custom-made processors.  In fact, almost all MIMDs built today use the same microprocessors found in workstations and small servers.  MIMDs also offer flexibility: they can function as single-user machines running a single high-performance application, as multiprogrammed machines running several tasks simultaneously, or as some combination of both.

Q. Explain why it is often more efficient to send one large message through the processor network than many small ones.
A. Each message sent through the network incurs a fixed startup delay, roughly the latency of delivering its first word, before the rest of the data streams through.  If we send many small messages, we pay that startup latency over and over again, and it dominates the short time needed to transmit each message's data.  Sending one large message pays the startup cost only once and amortizes it over all of the data, so the total transfer time is much smaller.
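As a rough illustration (the numbers here are assumptions for the example, not figures from the course), suppose each message carries a fixed startup latency of 100 microseconds and the network then delivers one word per microsecond.  A simple model of the transfer time is

    total time  ≈  (number of messages × startup latency) + (total words ÷ words per microsecond)

Sending 1000 words as 100 messages of 10 words then costs about 100 × 100 + 1000 = 11,000 microseconds, while sending the same 1000 words as one large message costs about 100 + 1000 = 1,100 microseconds, nearly ten times faster.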

Q. Give an example (other than the one given) of a parallel algorithm that creates a problem that can be solved with mutual exclusion.  Explain.
A. Suppose we are trying to compute the boolean OR of a set of bits, where the bit values reside in the processors.  One algorithm is to first store a 0 into some shared location, Proc0, and then have each processor grab the value in Proc0, compute the OR of its own bit and the value it grabbed, and place the result back into Proc0.  Now, suppose we have two processors, Proc1 containing a 1 and Proc2 containing a 0.  One possible execution path is for Proc1 to read the value in Proc0 (0), for Proc2 to read the value in Proc0 (0), for Proc1 to compute the OR of 1 and 0 and store 1 into Proc0, and finally for Proc2 to compute the OR of 0 and 0 and store 0 into Proc0.  Proc0 therefore ends up with the value 0, which is incorrect.  Mutual exclusion around the read-OR-write sequence solves this problem, as in the sketch below.
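Here is a minimal sketch of that fix in C using POSIX threads.  The names (shared_or, my_bit) and the use of a pthread mutex are illustrative assumptions, not part of the original algorithm description; the lock simply guarantees that only one processor performs the read-OR-write sequence at a time.

    /* Parallel OR protected by mutual exclusion.  Illustrative only. */
    #include <pthread.h>
    #include <stdio.h>

    #define NPROCS 2

    static int shared_or = 0;                      /* plays the role of Proc0        */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static int my_bit[NPROCS] = {1, 0};            /* Proc1 holds 1, Proc2 holds 0   */

    static void *or_worker(void *arg)
    {
        long id = (long)arg;

        pthread_mutex_lock(&lock);                 /* only one processor may perform */
        shared_or = shared_or | my_bit[id];        /* the read-modify-write at once  */
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NPROCS];

        for (long i = 0; i < NPROCS; i++)
            pthread_create(&t[i], NULL, or_worker, (void *)i);
        for (int i = 0; i < NPROCS; i++)
            pthread_join(t[i], NULL);

        printf("OR = %d\n", shared_or);            /* always 1 with the lock in place */
        return 0;
    }

Because the lock forces the two read-OR-write sequences to happen one after the other, the bad interleaving described above can no longer occur.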

Copyright 1998, Mark Pacifico & Mike Merrill