Parallel Computing

In general, Taylor-type polycrystal models are ideally suited to parallelization of the computational procedures. In terms of CPU time in particular, the simulations fall into the category of “embarrassingly parallel” applications (e.g., Sorensen et al. (1995)), and parallelization yields significant reductions in computation time. However, such “embarrassingly parallel” implementations are strictly feasible only if the entire program, including its data, fits within the memory of a single processor of the parallel computer. This is not the case for the simulations presented in this paper, so it was necessary to implement the polycrystal FE model in a data-parallel form, as described by Beaudoin et al. (1993) and Inal et al. (2002a, 2003).

The parallel computing algorithms employed in the simulations are designed to distribute data at the microscopic level (crystal data) over the processors of a distributed-memory supercomputer. In this way, the global size of the simulation is divided among the processors of the parallel computer. To illustrate this, consider a simulation with a total of N crystals (Fig. 1). The basic idea of the finite element formulation is that each material point represents a polycrystal comprising N crystals, and its constitutive response is given by the Taylor polycrystal model (Fig. 1a). The global crystal data are distributed among the processors (Fig. 1b) such that each processor runs a part of the global program for B = N/A crystals, where A is the total number of processors used in the simulation. (Note that each processor reads only the crystal data assigned to it, and all arrays containing microscopic quantities have a maximum size of B instead of N.) Thus, all processors compute the microscopic arrays for their assigned sets of crystals independently. However, to compute the global stiffness matrix, the macroscopic values of the stresses, stress rates and moduli are required. These values are obtained by collective communication between the processors using the Message Passing Interface (MPI).
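As a hedged illustration of the data-parallel scheme described above, the following Python sketch emulates, in a single process, how N crystals could be split into blocks of roughly B = N/A crystals and how the per-processor partial results could be combined into the macroscopic (volume-averaged, Taylor) stress. It is not the authors' implementation. The function names (partition, local_partial_sum, allreduce_sum, macroscopic_stress) are hypothetical, the 6-component Voigt stress vectors and equal crystal volume fractions are assumptions, and the MPI collective is replaced by a plain loop. In a real MPI code each rank would compute only its own partial sum, and an Allreduce with a sum operation would return the total to every rank; the same pattern would apply to the stress rates and moduli.

```python
"""Serial emulation of distributing N crystals over A processors and
recovering the volume-averaged (Taylor) macroscopic stress.

Hypothetical assumptions, not taken from the paper: each crystal stress is a
6-component Voigt vector, all crystals have equal volume fractions, and the
MPI collective (Allreduce with SUM) is emulated by summing partials in a loop.
"""
from typing import List

Vec = List[float]


def partition(n_crystals: int, n_procs: int) -> List[range]:
    """Split crystal indices 0..N-1 into A contiguous blocks.

    If N is divisible by A, every block holds B = N/A crystals; otherwise
    the first N % A blocks receive one extra crystal.
    """
    base, extra = divmod(n_crystals, n_procs)
    blocks, start = [], 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def local_partial_sum(crystal_stresses: List[Vec]) -> Vec:
    """Work one processor does independently: sum the stresses of its crystals."""
    total = [0.0] * 6
    for sigma in crystal_stresses:
        for i in range(6):
            total[i] += sigma[i]
    return total


def allreduce_sum(partials: List[Vec]) -> Vec:
    """Stand-in for an MPI Allreduce(SUM); every rank would receive this result."""
    total = [0.0] * 6
    for p in partials:
        for i in range(6):
            total[i] += p[i]
    return total


def macroscopic_stress(all_stresses: List[Vec], n_procs: int) -> Vec:
    """Volume-averaged stress assembled from per-processor partial sums."""
    blocks = partition(len(all_stresses), n_procs)
    # Each emulated processor touches only the crystals in its own block,
    # so its local arrays have length at most B instead of N.
    partials = [local_partial_sum([all_stresses[k] for k in block]) for block in blocks]
    summed = allreduce_sum(partials)
    return [s / len(all_stresses) for s in summed]


if __name__ == "__main__":
    # Ten hypothetical crystals distributed over four processors (blocks 3, 3, 2, 2).
    stresses = [[float(k), 0.0, 0.0, 0.0, 0.0, 2.0 * k] for k in range(10)]
    print(macroscopic_stress(stresses, 4))  # [4.5, 0.0, 0.0, 0.0, 0.0, 9.0]
```

Because the macroscopic stress is a sum of per-crystal contributions divided by N, the result does not depend on how the crystals are split (up to floating-point rounding), and a single reduction per macroscopic quantity suffices. This is what allows each processor to work on its own B crystals without exchanging crystal data.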

Figure 1. (a) Polycrystal aggregate comprising N crystals; (b) distribution of this polycrystal aggregate among the processors.

The parallel computing algorithms that we have developed are essential for the simulations presented in this paper. These parallel subroutines enable simulations with meshes fine enough to capture the key features of localized deformation in the aluminum alloy analysed.