James F. Leathrum, Jr.
Initial work on the algorithm was done on the serial code to improve both the speed and memory constraints. Speed was improved by reducing the constant associated with the time complexity. Memory was reduced by modifications to the algorithm to reduce the amount of data stored at any point in time.
A parallel form of the FMA (The Parallel Fast Multipole Algorithm (PFMA)) was designed and implemented on a series of existing parallel architectures to study the characteristice of the algorithm, in particular the runtime ans speedup. An alternative to Greengard's Adaptive Fast Multipole Algorithm (AFMA) which addresses nonuniform particle systems is also proposed to facilitate its parallelization.
To prove the usefulness of the FMA to the scientific community, the serial version of the code has been applied to two sample application. The first uses FMA as the N-body solver in an existing molecular dynamics code. Results for several molecules are presented. The second uses the FMA code to solve a Gaussian distribution of bodies such as might occur in graviational force studies.
Given the experiences gathered from the parallel implementations, a new hardware architecture is proposed which takes advantage of the locally shared memories which are associated with the boundaries between processors so that each memory is shared among a subset of the processors in the system. Then a tree structure is imposed on this interconnect to provide easier treewalk operations which are required by FMA.
This page last modified 7/12/95 by Daniel Paul