Jan30

MPI by Sun

Categories: Tools
0 Responses

Since we’ve been discussing MPI, I thought I’d mention that Sun is again supporting MPI.

For reasons too incredible to believe, Sun neglected to move its MPI package to Solaris 10, or to x64. After a nudge and a bit of funding from the HPCS program, they are fixing that. This is one of the best MPI implementations anywhere, and certainly the best on Sun systems. It can take full advantage of shared memory when it’s available.

It’s part of ClusterTools, which is now available for early access. It’s also available as open source using the Sun Community Source License.

Get it and try it out. If you have any problems, report them. The developers are a good crew, and will do their best to make you happy with it.

Jan30

Starting a group blog for HPCS

Categories: root
0 Responses

Some of my colleagues on the HPCS program liked the idea of blogging about some of the results we’ve been seeing on the program, so I’ve set up a group blog on the topic. I’ll be writing about more general tools issues in this blog.

Jan9

HPC Programming Models

Categories: Performance
0 Responses

Essentially all modern HPC code makes use of parallelism to speed up execution. Parallelism isn’t the only way to speed things up, but it’s the most general way, and the other approaches are usually used in massively parallel systems anyway. We’ll talk about these other approaches some other time.

There are two common approaches to tying computers together for HPC. One is a symmetric multiprocessor, or SMP, which consists of a group of processors sharing a common memory. These are very common, and are becoming even more common with multi-core chips. The other is a cluster of computers interconnected with some high-speed network. These techniques can be used together to build a cluster of SMPs. In fact, the most common HPC system in the near future will probably be a cluster of multi-core Opterons.

The most common tool for writing parallel programs is MPI, or the Message Passing Interface library. This is a library, used from either Fortran or C/C++, that handles data transfer and synchronization among processes.
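To make "data transfer and synchronization among processes" a little more concrete, here is a minimal sketch of the message-passing pattern. It is not MPI: it uses Python threads and a queue as stand-ins for processes and the interconnect, and the names `parallel_sum`, `worker`, and `inbox` are made up for illustration. The shape is the familiar one, though: split the data, compute locally, send partial results to one coordinator.

```python
import queue
import threading


def parallel_sum(data, num_workers=4):
    """Sum `data` with workers that report results only by sending messages."""
    # The coordinator's inbox; workers send (rank, partial_sum) messages to it.
    inbox = queue.Queue()

    def worker(rank, chunk):
        # Each worker computes on its own block, then sends one message.
        inbox.put((rank, sum(chunk)))

    # Block-distribute the data across the workers.
    size = (len(data) + num_workers - 1) // num_workers
    threads = []
    for rank in range(num_workers):
        chunk = data[rank * size:(rank + 1) * size]
        t = threading.Thread(target=worker, args=(rank, chunk))
        t.start()
        threads.append(t)

    # Receiving exactly one message per worker is the synchronization point:
    # the coordinator cannot finish until every worker has reported.
    total = 0.0
    for _ in range(num_workers):
        _rank, partial = inbox.get()
        total += partial

    for t in threads:
        t.join()
    return total
```

Calling `parallel_sum(list(range(100)))` adds up the four partial sums in whatever order the messages arrive. Even in this toy version the programmer has to decide how the data is split and who talks to whom, which hints at why real message-passing code is considered low-level.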

If we look at real-world usage we discover that it’s dominated by MPI.

F90/F95 17
C/C++ 3
MPI 20
OpenMP 4 (as an alternative to MPI)
(Sca)LAPACK 5
NetCDF 2
PETSc 1

A larger survey of about 300 users at NCSA asked how their programs are parallelized.

MPI 44%
OpenMP 14%
Mixed MPI/OpenMP 11%
Automatic 8%

Unfortunately, though MPI works well, it’s not easy to use, since it’s a very low-level library. A good part of our productivity study has been spent looking for alternatives.

Dec30

Guess I’ll Just Do It

Categories: Performance
1 Response

I’ve been putting off this first post: trying to get all of my materials set up, sort out what I want to do, and in general make everything perfect. Finally I had to face the inevitable “I don’t know what I’m doing” and the inarguable “I’ll never get anything done unless I start,” and just do it. I’ll add pages and features as I get to them, and in the meantime I’ll try to keep it interesting.

HPCS

I’ve been working on the High Productivity Computer Systems (HPCS) project within Sun for the past two years. This is a project sponsored by DARPA to make supercomputer systems more productive as well as faster and bigger. At the very least, it’s a noble effort and will increase our understanding of productivity. I’m in the Developer Products group and am working on developer tools for highly parallel programs.

I’m going to concentrate on performance, productivity, and tools, but with a bit of a different emphasis from most of the Sun blogs.

Performance

First, let’s define the term High Performance Computing (HPC). This used to be High Performance Technical Computing (HPTC), and I don’t know why it changed. The major features of HPC are floating point computation, arrays as a data structure, Fortran, and simply enormous problem size. Let’s discuss each point individually, keeping in mind that I can’t include everything in this initial post. There are unmentioned exceptions, overgeneralizations, and significant omissions from each discussion. I’ll try to cover those in later posts.

Floating Point Computation

Most HPC programs make heavy use of floating point computation. In fact, a count of Floating Point Operations Per Second (FLOPS) is commonly used as a figure of merit for HPC systems. There are even one or two common applications for which this makes sense. Most HPC applications, though, are not so simple, and spend more time moving data around than actually doing floating point arithmetic. Nonetheless, FLOPS is an easy number to measure, and is to this day the basis for inclusion in the TOP500 list of supercomputers.
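As a rough illustration of how a FLOPS figure is obtained, the sketch below counts the floating point operations in a dot product and divides by the elapsed time. The function names and the choice of kernel are assumptions made for this example, and interpreted Python will report a number far below what the hardware can do. The point is only the arithmetic: operations divided by seconds.

```python
import time


def dot(x, y):
    """Dot product: one multiply and one add per element."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def measure_flops(n=100_000):
    """Return (result, flops) for an n-element dot product."""
    x = [1.0] * n
    y = [2.0] * n

    start = time.perf_counter()
    result = dot(x, y)
    elapsed = time.perf_counter() - start

    # Two floating point operations (multiply, add) per element.
    flop_count = 2 * n
    # Guard against a zero reading from the timer on very small inputs.
    return result, flop_count / max(elapsed, 1e-9)
```

Note that the timing includes reading both arrays from memory, so even this tiny kernel is measuring data movement as much as arithmetic.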

Arrays

Again, HPC programs tend to depend quite heavily on arrays rather than more complicated data structures. Even when there is some indirection involved, this is usually done with an array of indices. Scaling a program is usually a matter of changing the size of arrays. A great deal of discussion goes into just how these arrays are distributed in memory. Different programming models imply different data distribution.
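Indirection through an array of indices usually looks like a "gather" (read elements in the order the index array dictates) or a "scatter" (accumulate into the positions it names). Here is a minimal sketch; the function names are hypothetical, and plain Python lists stand in for numeric arrays.

```python
def gather(values, indices):
    """Read values[i] for each i in indices, in that order."""
    return [values[i] for i in indices]


def scatter_add(target, indices, contributions):
    """Add each contribution into target at the position its index names."""
    for i, c in zip(indices, contributions):
        target[i] += c
    return target


# Example: three contributions, two of which land on the same element.
node_values = [10.0, 20.0, 30.0]
index = [2, 0, 2]
picked = gather(node_values, index)
summed = scatter_add([0.0, 0.0, 0.0], index, [1.0, 2.0, 3.0])
```

The data structure is still nothing more than flat arrays. The index array carries all of the irregularity.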

Fortran

A skillful HPC programmer can write Fortran programs in any language, and usually does. Such programs have most of the computation in loops that iterate over arrays, doing some floating point operations on the elements. The calculations on an element are frequently independent of, or only simply related to, the calculations on other elements. This makes it possible to vectorize or parallelize the loop to improve the performance. If the program is written in Fortran 90 or some later version, array operations provide a way to apply calculations to every element without writing any loops. In fact, it was a pleasant surprise to me just how close F90 programs are to the underlying mathematical notation. I haven’t written any Fortran programs since the 1970s, but I’d certainly use it now if I had any computationally intensive problems to solve.
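The difference between the loop style and the whole-array style can be sketched with the classic "a times x plus y" update. This is Python standing in for Fortran, with made-up function names. In Fortran 90 the second form would be the single statement `result = a*x + y`.

```python
def axpy_loop(a, x, y):
    """Explicit element-by-element loop, Fortran 77 style."""
    result = [0.0] * len(x)
    for i in range(len(x)):
        # Each iteration touches only element i, so iterations are independent.
        result[i] = a * x[i] + y[i]
    return result


def axpy_array(a, x, y):
    """Whole-array expression, closer to the mathematical notation."""
    return [a * xi + yi for xi, yi in zip(x, y)]
```

Because no iteration depends on another, a compiler or runtime is free to execute them in any order, several at a time, or spread across processors.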

Size

HPC programs tend to be big, and they tend to have very large amounts of data. There is no such thing as “big enough” for a real HPC program. If you tell a web applications programmer that his machine just got ten times faster, he will say something like “Now I only need a tenth the number of machines to do my job.” If you tell an HPC programmer the same thing, he’ll say something more like “Now I can decrease the mesh granularity and get better answers.” In other words, system size and speed have moved from being a purely economic problem to being a technical problem. It’s a very different attitude.
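For a sense of scale, here is a back-of-the-envelope sketch of what a speedup buys in mesh resolution. It assumes, purely for illustration, that the cost of a run is proportional to the number of mesh cells and ignores everything else (time-step limits, memory, communication). The function name is hypothetical.

```python
def refinement_factor(speedup, dimensions=3):
    """How much finer the mesh can be in each dimension for the same run time.

    Assumes cost grows in proportion to the cell count, so refining by a
    factor r in every dimension multiplies the cost by r ** dimensions.
    """
    return speedup ** (1.0 / dimensions)


# A machine ten times faster, applied to a 3-D mesh.
factor_3d = refinement_factor(10.0, dimensions=3)
```

Under those assumptions a tenfold speedup refines a three-dimensional mesh by only a little more than a factor of two in each direction, which is one reason the appetite for faster machines never goes away.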

Productivity

Productivity is a big and complex subject, and I’ll have lots to say about it over the coming weeks. For now, I’d like to leave you with a couple of teasers. These are two keynote talks from a recent conference on programming models for HPC. They are by people who have been in this field a long time, and understand the problems very well indeed. I just have slide sets here, so they are short. Take time to look at them.