# Parallel computing for FEA simulations

Published on **07 Oct 2013**

The quest for both greater accuracy in simulations and more parameters to be considered in design sweeps has led to longer solution times – something which is at odds with industry’s drive for shortened design cycles. But design and optimization times can now be drastically reduced using Opera’s parallel capabilities.

Multiple jobs in a multiphysics or optimization process can be solved concurrently using the batch queue option and distributed memory access. The speed-up thus obtained is linear, depending only on the available hardware resources. This is the preferred way of dealing with smaller models which require multiple runs to cover a wide range of configurations.
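As a minimal sketch of this job-level parallelism (the `run_job` function below is a hypothetical stand-in, not part of Opera's actual interface), a parameter sweep of independent solver runs can be dispatched concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(config):
    # Hypothetical stand-in for one independent solver run; a real sweep
    # would launch the solver as an external process per configuration,
    # so worker threads (which merely wait on those processes) suffice.
    coil_current, scale = config
    return coil_current * scale  # placeholder "result"

def run_sweep(configs, workers=4):
    # The jobs share no state, so the speed-up is linear in the number
    # of workers, limited only by the available hardware resources.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, configs))

print(run_sweep([(10.0, 0.5), (20.0, 0.5), (30.0, 0.5)]))  # → [5.0, 10.0, 15.0]
```

Because each run is independent, adding workers (up to the number of jobs and cores) shortens the sweep without any change to the individual solutions.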

For large models that require substantial computational time, the parallel technology allows a single solution process to be distributed over multiple threads on a shared-memory PC. Opera included parallel algorithms in an earlier release, but the newest version, Opera 16R1, has further improved the implementation. For models that are dominated by the computation of coil fields, in particular, the speed-up obtained is virtually linear, allowing for a rapid design evaluation process. Other parts of the Opera solvers have also been parallelized, giving a substantial speed-up over a traditional serial process for all types of application and solvers.

The following examples show the typical speed-up that can be achieved for different types of problems. The models used cover a wide range of model configurations and simulation types. Each model is first run as a single-threaded job, and the timings obtained are used as the reference for comparison against the multi-thread case. All of the models are run on four threads, and a separate run evaluates the speed-up when going up to 16 threads.
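The comparison metric used in these examples can be stated precisely: speed-up is the single-thread time divided by the multi-thread time, and parallel efficiency is that ratio divided by the thread count. A small helper illustrates this (the timings plugged in are the rounded figures quoted for Model 2, so the result differs slightly from the exact 2.55 reported there):

```python
def speedup(t_serial, t_parallel):
    # Ratio of single-threaded reference time to multi-threaded time.
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_threads):
    # Fraction of ideal (linear) scaling achieved: 1.0 is perfectly linear.
    return speedup(t_serial, t_parallel) / n_threads

# Rounded Model 2 timings: ~75 min on 1 thread, 28 min on 4 threads.
print(round(speedup(75, 28), 2))        # → 2.68
print(round(efficiency(75, 28, 4), 2))  # → 0.67
```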

The FEA process includes multiple stages that can benefit from parallelization. In this case study, the focus will be on three of these stages: coil field calculation, matrix fill and matrix solve.

### Speed-up evaluation on 4 threads

#### Model 1

The first model includes 19 Biot-Savart conductors, and the solving time of over 2 hours is dominated by the coil field calculation (over 85%). The breakdown of the solve process into the three main stages is presented in Figure 1. The overall speed-up achieved using 4 threads, when compared with the single-thread case, is 3.84 times.

#### Model 2

The second example is a model that is also dominated by coil field calculation, with both Racetrack and Bedstead Biot-Savart conductors. The breakdown of the solve process is given in Figure 3; the total solving time on one thread is around 1 hour 15 minutes. The solving time using 4 threads is reduced to 28 minutes, giving an overall speed-up of 2.55 times.

#### Model 3

The next model includes 8 Biot-Savart coils (4 Racetrack and 4 Bedstead). The solving process is dominated by the coil field calculation, which accounts for more than 70% of the total 24 minutes on one thread. The breakdown of the solve process is given in Figure 5. The overall speed-up for this model is 2.44 times, bringing the total solve time on four threads to less than 10 minutes.

#### Model 4

The fourth model shows a very good speed-up, taking the total solve time from almost 2 hours down to just over 45 minutes using 4 threads. The solving process is dominated by the coil field calculation, with the fields from the 4 Biot-Savart conductors taking up more than 70% of the total time (see breakdown in Figure 7). The speed-up achieved for the overall solve process is 2.32 times.

### Speed-up using multiple Parallel Packs

Using multiple parallel license packs, the gains obtained for large models are significant. Below is a comparison of the speed-up achieved over the range from 1 to 16 solving threads. The speed-up in the coil field calculation is almost linear, while the overall speed-up achieved is more than 6 times compared to the single-thread case.
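This behaviour is what Amdahl's law predicts: if a fraction p of the work parallelizes perfectly, the speed-up on n threads is 1/((1 - p) + p/n), so the serial remainder caps the overall gain even when one stage, such as the coil field calculation, scales almost linearly. A sketch (the 0.9 parallel fraction is an illustrative assumption, not a figure taken from these benchmarks):

```python
def amdahl_speedup(p, n):
    # p: parallelizable fraction of the work; n: number of threads.
    return 1.0 / ((1.0 - p) + p / n)

# With an assumed 90% parallel fraction, 16 threads give roughly 6.4x,
# consistent in shape with the "more than 6 times" figure above.
for n in (1, 4, 16):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

The model also shows why adding threads yields diminishing returns: with the same assumed fraction, going from 4 to 16 threads roughly doubles the speed-up rather than quadrupling it.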

### Conclusion

Parallel processing can offer significant computer time savings in finite element simulations. Opera 16R1 has upgraded algorithms which are showing considerable benefits on shared-memory, multi-core PCs. Simulations containing many Biot-Savart conductor sources show the most benefit, with one of the examples giving a 3.17 times speed-up using 4 threads. Simulations with non-linear materials also benefit considerably, as the assembly of matrix terms (matrix fill) is performed several times for each non-linear iteration to obtain optimum non-linear convergence. Speed-up for the matrix solve stage becomes more significant as the problem size increases.

Licensing of the parallel upgrade is offered in such a way as to maximize both flexibility and scalability. Opera Multicore Parallel Packs can be applied to any of the physics modules in the Opera-3d suite. From the Opera Manager, the user has control over how many threads are used for each solution. The Opera Multicore Parallel Packs, when applied in multiples, scale nonlinearly, allowing the user to solve problems utilising very large numbers of cores.

For further details on Opera and its parallel capabilities, please contact us.