Preface (taken from A. Gottlieb intro to parallel programing - good overwiew of the field)

Over 40 years ago, before transistors replaced vacuum tubes, David Slotnick, who later became the father of the Illiac IV, proposed a parallel computer to the great John von Neumann himself. The master reflected a moment and replied in his charming Hungarian accent: "too many toobs" .. and he was right. It took technology forty more years to produce microprocessor chips with enough "toobs" to make a parallel computer practical. But now it has happened - the "attack of the killer micros" has indeed occurred, at least ten highly parallel computers are commercially available, and "it's an exciting time" is even truer than it was when our first edition appeared five years ago.

Some of the hot topics of the 80's are now settled. There is no longer any question that there are significant parallel applications. Instead, one asks how best should they be described and what hardware/software design is most cost-effective for their solutions. And there is also no longer any question that large-scale parallel-processing hardware can be built. Indeed, if your checkbook is in good shape, you can have proof delivered to your doorstep. System software for these new architectures is, however, still immature--perhaps the key problem to be solved. Alan Karp's 1987 paper was subtitled "The State of the Art of Parallel Programming and What a Sorry State that Art Is In". A similar remark could still be made today. For one thing, job scheduling is now complicated by a desire to have processes within a job in some sense scheduled together. In addition, it is still not clear what the appropriate programming language(s) should be, programming environments need much improvement, and difficult compiling questions remain to be solved. Also still not settled are the debates about SIMD vs. MIMD and shared memory vs. message passing, although in the latter case significant convergence has occurred at the level of hardware architecture.

What sets this book apart?

What's new in the second edition?

A full summary of each chapter is given further below, but here we highlight various improvements in this edition for those familiar with our first edition. By popular request, there are now EXERCISES for each chapter! And of course much material was added and updated to reflect the developments of the last five years.

Specific modifications include a quantitative treatment of speedup introduced in Chapter 1 and expanded upon in Chapter 4. A new application, seismic migration, has been added to Chapter 2, as well as a discussion of the pros and cons of using a network of workstations as a parallel computer. Chapter 3 has tracked the (large) changes in technology, including the rise in importance of disk arrays and optical fibers, and we have expanded the discussion of the impact of technology. We have consolidated the material on computational models in Chapter 4 and clarified its relationship to the programming languages we discuss in Chapter 5, which has been completely rewritten to include Fortran 90, Fortran D, HPF, Occam, remote procedure calls, Express, PVM, Linda, and other near-term parallel programming approaches, and to compare these with the work in new languages based on functional, dataflow, and logic programming approaches. Although the fundamentals of parallel compilers and operating systems have not changed, there has been considerable research and development, which we have tracked in Chapters 6 and 7. Fat trees, as used in CM-5, and cut-through (wormhole) switching have been added to Chapter 8. Chapter 9 includes treatments of three new commercial SIMD offerings: the MasPar MP-1 and MP-2, the Cray Y-MP C90, and the NEC SX-3. In addition, the section on CM-2 has been expanded. Six new commercial MIMD offerings have been added to Chapter 10: the Thinking Machines CM-5, the Kendall Square Research KSR1, the Intel Paragon, the IBM SP1, the NCUBE Corp. NCUBE-2, and the Cray Research T3D. Information is also included on an expected machine from Tera Computer as well as the recently popular idea of assembling a cluster of workstations as an MIMD system. Although commercially the battle still rages between shared memory and message passing, some convergence can be seen, especially in research machines like the Stanford Dash and the Alewife and J machines from MIT. These three prototypes are described and the convergence issue is discussed. Many of the newer MIMD machines feature directory based caches and this important memory organization is explained.

Ways to Use this Book

Although the book is an organic whole, it is also divided into three meaningful parts--Part I: Foundations, Part II: Software, and Part III: Architectures. Readers with a particular interest in architectures can proceed directly to Part III. Readers who have somewhat broader interests can read the Overview chapter and then proceed to the Part of their choice. And readers interested in the full breadth of parallel computing can proceed through the chapters in order.

It is tempting to begin with parallel architectures ("the fun stuff"), and those who succumb to this temptation are referred to Part III, the third and largest part of our book. Four chapters there cover interconnection schemes, parallel computers whose processing elements do not have their own programs, those that do, and hybrids of these two.

However, parallel computing is a much broader subject. Its major software components form the subject of the second part of this book. The three chapters involved discuss parallel languages, compilers, and operating systems.

But we are dealing here with a fundamental expansion in the possible ways of thinking about and doing computations, one that re-opens many questions that arose during the forty years that electronic computers have been with us and were answered under technological constraints that no longer exist. To put into perspective the opportunity and adventure that parallel processing represents, we begin this book with a four-chapter first part entitled "Foundations". It provides a preview of the whole book and a perspective on potential parallelism in applications, technological limitations and opportunities, and formulation of parallel solutions.

Chapter Summaries

Chapter 1 (Overview) establishes a hardware/software theme that runs through the book: the driving forces for parallel processing come from both areas. Hardware technology (chiefly VLSI) has led to a repeal of "Grosch's law", which held that the best price/performance was obtained with the most powerful uniprocessor. This repeal makes it reasonable to propose supercomputers made up of a collection of processors, and a number of such machines exists. From the software side, there is a strong argument for new languages to increase programmer productivity, which creates a demand for new architectures to execute these new languages efficiently. This chapter also serves as an overview of things to come; a number of the key concepts are introduced here and expanded upon in subsequent chapters.

Chapter 2 (Applications) focuses on the applications that offer sufficient parallelism to keep many processing elements busy simultaneously. These applications fall into two broad categories, numeric processing and symbolic processing, distinguished by the ratio of calculation to data motion among the processing elements. Scientific and engineering applications offer some very large numeric computation problems such as particle calculations (plasmas, QCD), fluid dynamics (weather modeling, aircraft design), seismic migration, computer-aided design, and others. Symbolic processing applications include database systems and artificial intelligence. We discuss Production Systems, Expert Systems, and projects in the more distant future, such as perception/planning systems.

Chapter 3 (Technology) discusses hardware technology as it limits the speedup of uniprocessors and also as it constrains and encourages the development of parallel processors. VLSI is a prime factor, providing powerful and cheap microprocessors for assembly into an ensemble, and enabling more unconventional and customized approaches to building a parallel machine. We also discuss (RAM) memory and (I/O) storage technologies including disk arrays and communication media such as optical fibers. But technology alone is not a panacea; wiring, packaging, and power dissipation still exert strong influences on a feasible design.

Chapter 4 (Computational Models and Algorithms) describes the most prominent parallel computational models, and terms like speedup are defined both informally and quite rigorously. We consider what language features and constructs are needed to express what one wants done in parallel, and discuss the communication and synchronization functions that ANY computational model has to perform. We then turn to the first step in getting an application solved on a parallel computer: choosing the algorithms to be used. We give examples, and discuss how a good algorithm is a necessary but not always sufficient condition for speeding up execution of the entire problem.

Chapter 5 (Languages) separates and compares imperative languages, including implicit and explicit parallelism, and declarative languages, including functional and logic programming. Some languages, like Occam, Id and Sisal, are comparatively new and were designed with parallel execution in mind. Others, like HPF and Fortran 90, are parallel versions of existing serial languages. A third class, including PVM, Express, and Linda represent "aftermarket" parallelism in the sense that the user adds them to an existing serial language. We discuss how this all impacts the programmer's world.

Chapter 6 (Compilers) treats the problem of compiling high-level language programs to run on parallel processors. We discuss vectorization vs. parallelization and show the importance of program flow and dependency graph techniques. Examples are drawn from Parafrase, PFC, PTRAN, MIMDizer, and Bulldog, which all accept FORTRAN. We also discuss an APL and a Sisal compiler.

Chapter 7 (Operating Systems) deals with operating systems for a highly parallel computer. They will have features in common with operating systems for serial processors, but also have critical additional functions to perform. We distinguish between systems that support parallel applications and systems that are themselves parallel, and then give examples of systems having different degrees of internal parallelism.

Chapter 8 (Interconnection Networks) examines the variety of networks that can be used to interconnect the processing elements. We present the topologies and routing methods used including those found on the newest commercial offerings. We also compare alternative switching strategies including, circuit switching, store-and-forward, cut-through, and wormhole routing. We also discuss the engineering costs and limitations of the various networks.

Chapter 9 (SIMD Architectures) treats the class of parallel computers known as SIMD, or Single Instruction, Multiple Data stream designs. These machines are used mostly for problems having high degrees of small-grain parallelism. We discuss the essentials of pipelined SIMD vector and array processors such as the Cray series from the original Cray-I to the C90, the NEC SX series, the Cyber 205, and FPS machines, as well as parallel SIMD designs such as systolic arrays, the Connection Machine, the MasPar MP-1 and MP-2, the MPP, the DAP, and GF11, and we discuss tradeoffs between SIMD and MIMD.

Chapter 10 (MIMD Architectures) shifts to the class of parallel architectures known as the MIMD, or Multiple Instruction, Multiple Data stream category. Most of these designs are intended to take advantage of medium- and large-grain parallelism. We distinguish between private-memory and shared-memory designs, and between machines built from thousands of microprocessors and machines made up of tens of larger processors. We also discuss why MIMD is harder but probably more generally applicable than SIMD. We describe many MIMD designs and commercial offerings including both shared- and private-memory configurations, and discuss the tradeoffs involved. In particular, the new CM-5, KSR1, SP1, T3D, NCUBE-2, and Paragon are described as well as several research prototypes. The partial convergence of shared and private memory is noted, workstation clusters are discussed, and the key issue of consistent caches is explained.

Chapter 11 (Hybrid Architectures) concludes the book with a description of some interesting parallel machines that do not fit neatly into either the SIMD or MIMD category, namely, VLIW (Very Long Instruction Word) and MSIMD (Multiple SIMD) designs.

Acknowledgements

We are most grateful for the generosity and encouragement of our families and of our many friends at IBM, NYU, and the rest of the parallel processing community, including the reviewers of this book. Allan Gottlieb was supported in part by grants from the National Science Foundation and the Applied Mathematical Sciences program of the Department of Energy.

George S. Almasi
almasi@watson.ibm.com

Allan Gottlieb
gottlieb@nyu.edu