parallel multigrid finite: Topics by WorldWideScience.org

Sample records for parallel multigrid finite

A survey of parallel multigrid algorithms

Science.gov (United States)

Chan, Tony F.; Tuminaro, Ray S.

1987-01-01

A typical multigrid algorithm applied to well-behaved linear-elliptic partial-differential equations (PDEs) is described. Criteria for designing and evaluating parallel algorithms are presented. Before evaluating the performance of some parallel multigrid algorithms, consideration is given to some theoretical complexity results for solving PDEs in parallel and for executing the multigrid algorithm. The effect of mapping and load imbalance on the parallel efficiency of the algorithm is studied.
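For orientation, the kind of "typical multigrid algorithm" this record refers to can be sketched as a V-cycle for the 1D Poisson problem -u'' = f with zero Dirichlet boundaries. This is a minimal illustration, not the algorithm from the survey: the weighted-Jacobi smoother, the damping factor 2/3, the two pre- and post-smoothing sweeps, and all function names are assumptions chosen for brevity.

```python
def apply_A(u, h):
    """Apply the 1D Poisson operator (second difference, zero Dirichlet boundaries)."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted-Jacobi relaxation; every point is updated independently."""
    for _ in range(sweeps):
        Au = apply_A(u, h)
        u = [ui + omega * (h * h / 2.0) * (fi - ai)
             for ui, fi, ai in zip(u, f, Au)]
    return u


def restrict(r):
    """Full weighting: n fine points -> (n - 1) // 2 coarse points."""
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range((len(r) - 1) // 2)]


def prolong(e, n_fine):
    """Linear interpolation from the coarse grid back to the fine grid."""
    out = [0.0] * n_fine
    for j, ej in enumerate(e):
        out[2 * j] += 0.5 * ej
        out[2 * j + 1] += ej
        out[2 * j + 2] += 0.5 * ej
    return out


def v_cycle(u, f, h):
    """One V(2,2) cycle; grid sizes are assumed to be of the form 2**k - 1."""
    n = len(u)
    if n == 1:
        return [f[0] * h * h / 2.0]  # exact solve on the coarsest grid
    u = smooth(u, f, h, 2)                                   # pre-smoothing
    r = [fi - ai for fi, ai in zip(f, apply_A(u, h))]        # fine-grid residual
    e = v_cycle([0.0] * ((n - 1) // 2), restrict(r), 2.0 * h)  # coarse correction
    u = [ui + ei for ui, ei in zip(u, prolong(e, n))]
    return smooth(u, f, h, 2)                                # post-smoothing
```

Calling `v_cycle` repeatedly on a grid with 63 interior points and f = 1 drives the iterate toward the discrete solution x(1 - x)/2. The smoother and the grid transfers are the pieces whose data dependencies determine how well such a cycle parallelizes.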
Analysis of a parallel multigrid algorithm

Science.gov (United States)

Chan, Tony F.; Tuminaro, Ray S.

1989-01-01

The parallel multigrid algorithm of Frederickson and McBryan (1987) is considered. This algorithm uses multiple coarse-grid problems (instead of one problem) in the hope of accelerating convergence and is found to have a close relationship to traditional multigrid methods. Specifically, the parallel coarse-grid correction operator is identical to a traditional multigrid coarse-grid correction operator, except that the mixing of high and low frequencies caused by aliasing error is removed. Appropriate relaxation operators can be chosen to take advantage of this property. Comparisons between the standard multigrid and the new method are made.
Semi-coarsening multigrid methods for parallel computing

Energy Technology Data Exchange (ETDEWEB)

Jones, J.E.

1996-12-31

Standard multigrid methods are not well suited for problems with anisotropic coefficients which can occur, for example, on grids that are stretched to resolve a boundary layer. There are several different modifications of the standard multigrid algorithm that yield efficient methods for anisotropic problems. In the paper, we investigate the parallel performance of these multigrid algorithms. Multigrid algorithms which work well for anisotropic problems are based on line relaxation and/or semi-coarsening. In semi-coarsening multigrid algorithms a grid is coarsened in only one of the coordinate directions unlike standard or full-coarsening multigrid algorithms where a grid is coarsened in each of the coordinate directions. When both semi-coarsening and line relaxation are used, the resulting multigrid algorithm is robust and automatic in that it requires no knowledge of the nature of the anisotropy. This is the basic multigrid algorithm whose parallel performance we investigate in the paper. The algorithm is currently being implemented on an IBM SP2 and its performance is being analyzed. In addition to looking at the parallel performance of the basic semi-coarsening algorithm, we present algorithmic modifications with potentially better parallel efficiency. One modification reduces the amount of computational work done in relaxation at the expense of using multiple coarse grids. This modification is also being implemented with the aim of comparing its performance to that of the basic semi-coarsening algorithm.
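The difference between semi-coarsening and full coarsening described in this record can be illustrated with a small sketch. It assumes a 2D grid stored as a list of rows with an odd number of interior points per direction and uses a full-weighting stencil; the function names are hypothetical and are not taken from the paper.

```python
def semi_coarsen_x(grid):
    """Restrict along x only (within each row); the number of rows is unchanged.

    Each row is assumed to hold 2*m + 1 interior values and is reduced to m
    values with the full-weighting stencil (1/4, 1/2, 1/4).
    """
    coarse = []
    for row in grid:
        m = (len(row) - 1) // 2
        coarse.append([0.25 * row[2 * j] + 0.5 * row[2 * j + 1] + 0.25 * row[2 * j + 2]
                       for j in range(m)])
    return coarse


def full_coarsen(grid):
    """Standard coarsening: apply the same 1D restriction in x and then in y."""
    half = semi_coarsen_x(grid)
    columns = [list(c) for c in zip(*half)]          # transpose so y runs along rows
    return [list(r) for r in zip(*semi_coarsen_x(columns))]  # restrict, transpose back
```

On a 7 x 7 grid, `semi_coarsen_x` returns 7 rows of 3 values, while `full_coarsen` returns a 3 x 3 grid. Semi-coarsening therefore keeps the full resolution in the direction that is not coarsened.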
Parallel Element Agglomeration Algebraic Multigrid and Upscaling Library

Energy Technology Data Exchange (ETDEWEB)

2017-10-24

ParELAG is a parallel C++ library for numerical upscaling of finite element discretizations and element-based algebraic multigrid solvers. It provides optimal complexity algorithms to build multilevel hierarchies and solvers that can be used for solving a wide class of partial differential equations (elliptic, hyperbolic, saddle point problems) on general unstructured meshes. Additionally, a novel multilevel solver for saddle point problems with divergence constraint is implemented.
High Performance Parallel Multigrid Algorithms for Unstructured Grids

Science.gov (United States)

Frederickson, Paul O.

1996-01-01

We describe a high performance parallel multigrid algorithm for a rather general class of unstructured grid problems in two and three dimensions. The algorithm PUMG, for parallel unstructured multigrid, is related in structure to the parallel multigrid algorithm PSMG introduced by McBryan and Frederickson, for they both obtain a higher convergence rate through the use of multiple coarse grids. Another reason for the high convergence rate of PUMG is its smoother, an approximate inverse developed by Baumgardner and Frederickson.
Mapping robust parallel multigrid algorithms to scalable memory architectures

Science.gov (United States)

Overman, Andrea; Vanrosendale, John

1993-01-01

The convergence rate of standard multigrid algorithms degenerates on problems with stretched grids or anisotropic operators. The usual cure for this is the use of line or plane relaxation. However, multigrid algorithms based on line and plane relaxation have limited and awkward parallelism and are quite difficult to map effectively to highly parallel architectures. Newer multigrid algorithms that overcome anisotropy through the use of multiple coarse grids rather than relaxation are better suited to massively parallel architectures because they require only simple point-relaxation smoothers. In this paper, we look at the parallel implementation of a V-cycle multiple semicoarsened grid (MSG) algorithm on distributed-memory architectures such as the Intel iPSC/860 and Paragon computers. The MSG algorithms provide two levels of parallelism: parallelism within the relaxation or interpolation on each grid and across the grids on each multigrid level. Both levels of parallelism must be exploited to map these algorithms effectively to parallel architectures. This paper describes a mapping of an MSG algorithm to distributed-memory architectures that demonstrates how both levels of parallelism can be exploited. The result is a robust and effective multigrid algorithm for distributed-memory machines.
A multigrid solution method for mixed hybrid finite elements

Energy Technology Data Exchange (ETDEWEB)

Schmid, W. [Universitaet Augsburg (Germany)]

1996-12-31

We consider the multigrid solution of linear equations arising within the discretization of elliptic second order boundary value problems of the form by mixed hybrid finite elements. Using the equivalence of mixed hybrid finite elements and non-conforming nodal finite elements, we construct a multigrid scheme for the corresponding non-conforming finite elements, and, by this equivalence, for the mixed hybrid finite elements, following guidelines from Arbogast/Chen. For a rectangular triangulation of the computational domain, these non-conforming schemes are the so-called nodal finite elements. We explicitly construct prolongation and restriction operators for this type of non-conforming finite elements. We discuss the use of plain multigrid and the multilevel-preconditioned cg-method and compare their efficiency in numerical tests.
Parallel multigrid smoothing: polynomial versus Gauss-Seidel

International Nuclear Information System (INIS)

Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

2003-01-01

Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
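A Chebyshev polynomial smoother of the kind discussed in this record needs only matrix-vector products, which is why it parallelizes more easily than Gauss-Seidel. The sketch below uses the standard three-term Chebyshev iteration on the unscaled 1D Laplacian tridiag(-1, 2, -1). The test matrix, the eigenvalue interval passed by the caller, and the function names are assumptions for illustration and are not the authors' implementation.

```python
import math


def matvec(x):
    """Apply tridiag(-1, 2, -1), the unscaled 1D Laplacian, to x."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def chebyshev_smooth(x, b, lam_min, lam_max, steps):
    """Run `steps` Chebyshev iterations targeting eigenvalues in [lam_min, lam_max].

    For smoothing, lam_min is usually a fraction of lam_max, so only the upper
    (oscillatory) part of the spectrum is damped strongly.
    """
    theta = 0.5 * (lam_max + lam_min)   # centre of the target interval
    delta = 0.5 * (lam_max - lam_min)   # half-width of the target interval
    sigma = theta / delta
    rho = 1.0 / sigma
    r = [bi - ai for bi, ai in zip(b, matvec(x))]
    d = [ri / theta for ri in r]
    for _ in range(steps):
        x = [xi + di for xi, di in zip(x, d)]
        r = [ri - ai for ri, ai in zip(r, matvec(d))]
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = [rho_new * rho * di + (2.0 * rho_new / delta) * ri
             for di, ri in zip(d, r)]
        rho = rho_new
    return x
```

With the interval [1, 4] on a 31-point grid, three steps reduce the most oscillatory error mode by more than a factor of ten while leaving the smoothest mode almost untouched. That is the behaviour expected of a multigrid smoother, since smooth error is left to the coarse-grid correction.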
A parallel version of a multigrid algorithm for isotropic transport equations

International Nuclear Information System (INIS)

Manteuffel, T.; McCormick, S.; Yang, G.; Morel, J.; Oliveira, S.

1994-01-01

The focus of this paper is on a parallel algorithm for solving the transport equations in a slab geometry using multigrid. The spatial discretization scheme used is a finite element method called the modified linear discontinuous (MLD) scheme. The MLD scheme represents a lumped version of the standard linear discontinuous (LD) scheme. The parallel algorithm was implemented on the Connection Machine 2 (CM2). Convergence rates and timings for this algorithm on the CM2 and Cray-YMP are shown.
The Mixed Finite Element Multigrid Method for Stokes Equations

Science.gov (United States)

Muzhinji, K.; Shateyi, S.; Motsa, S. S.

2015-01-01

The stable finite element discretization of the Stokes problem produces a symmetric indefinite system of linear algebraic equations. A variety of iterative solvers have been proposed for such systems in an attempt to construct efficient, fast, and robust solution techniques. This paper investigates one such iterative solver, the geometric multigrid solver, to find the approximate solution of the indefinite systems. The main ingredient of the multigrid method is the choice of an appropriate smoothing strategy. This study considers the application of different smoothers and compares their effects on the overall performance of the multigrid solver. We study the multigrid method with the following smoothers: distributed Gauss-Seidel, inexact Uzawa, preconditioned MINRES, and Braess-Sarazin type smoothers. A comparative study of the smoothers shows that the Braess-Sarazin smoothers enhance the performance of the multigrid method. We study the problem in a two-dimensional domain using the stable Hood-Taylor Q2-Q1 pair of rectangular finite elements. We also give the main theoretical convergence results. We present the numerical results to demonstrate the efficiency and robustness of the multigrid method and confirm the theoretical results. PMID:25945361
Large-Scale Parallel Viscous Flow Computations using an Unstructured Multigrid Algorithm

Science.gov (United States)

Mavriplis, Dimitri J.

1999-01-01

The development and testing of a parallel unstructured agglomeration multigrid algorithm for steady-state aerodynamic flows is discussed. The agglomeration multigrid strategy uses a graph algorithm to construct the coarse multigrid levels from the given fine grid, similar to an algebraic multigrid approach, but operates directly on the non-linear system using the FAS (Full Approximation Scheme) approach. The scalability and convergence rate of the multigrid algorithm are examined on the SGI Origin 2000 and the Cray T3E. An argument is given which indicates that the asymptotic scalability of the multigrid algorithm should be similar to that of its underlying single grid smoothing scheme. For medium size problems involving several million grid points, near perfect scalability is obtained for the single grid algorithm, while only a slight drop-off in parallel efficiency is observed for the multigrid V- and W-cycles, using up to 128 processors on the SGI Origin 2000, and up to 512 processors on the Cray T3E. For a large problem using 25 million grid points, good scalability is observed for the multigrid algorithm using up to 1450 processors on a Cray T3E, even when the coarsest grid level contains fewer points than the total number of processors.
Segmental Refinement: A Multigrid Technique for Data Locality

Energy Technology Data Exchange (ETDEWEB)

Adams, Mark [Columbia Univ., New York, NY (United States). Applied Physics and Applied Mathematics Dept.; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

2014-10-27

We investigate a technique, segmental refinement (SR), proposed by Brandt in the 1970s as a low memory multigrid method. The technique is attractive for modern computer architectures because it provides high data locality, minimizes network communication, is amenable to loop fusion, and is naturally highly parallel and asynchronous. The network communication minimization property was recognized by Brandt and Diskin in 1994; we continue this work by developing a segmental refinement method for a finite volume discretization of the 3D Laplacian on massively parallel computers. The asymptotic complexities required to maintain textbook multigrid efficiency are explored experimentally with a simple SR method. A two-level memory model is developed to compare the asymptotic communication complexity of a proposed SR method with traditional parallel multigrid. Performance and scalability are evaluated on a Cray XC30 with up to 64K cores. We achieve modest improvement in scalability over traditional parallel multigrid with a simple SR implementation.
Implementation of the Vanka-type multigrid solver for the finite element approximation of the Navier-Stokes equations on GPU

Czech Academy of Sciences Publication Activity Database

Bauer, Petr; Klement, V.; Oberhuber, T.; Žabka, V.

2016-01-01

Vol. 200, March (2016), pp. 50-56. ISSN 0010-4655. R&D Projects: GA ČR GB14-36566G. Institutional support: RVO:61388998. Keywords: Navier–Stokes equations; mixed finite elements; multigrid; Vanka-type smoothers; Gauss–Seidel; red–black coloring; parallelization; GPU. Subject RIV: BK - Fluid Dynamics. Impact factor: 3.936, year: 2016
Algebraic multigrid preconditioning within parallel finite-element solvers for 3-D electromagnetic modelling problems in geophysics

Science.gov (United States)

Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María

2014-06-01

We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed a series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with the biconjugate gradient stabilized method. The results have shown that the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, in cases in which other preconditioners succeed in converging to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code, by up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the
A multigrid algorithm for the cell-centered finite difference scheme

Science.gov (United States)

Ewing, Richard E.; Shen, Jian

1993-01-01

In this article, we discuss a non-variational V-cycle multigrid algorithm based on the cell-centered finite difference scheme for solving a second-order elliptic problem with discontinuous coefficients. Due to the poor approximation property of piecewise constant spaces and the non-variational nature of our scheme, one step of symmetric linear smoothing in our V-cycle multigrid scheme may fail to be a contraction. Again, because of the simple structure of the piecewise constant spaces, prolongation and restriction are trivial; we save significant computation time with very promising computational results.
A parallel finite-difference method for computational aerodynamics

International Nuclear Information System (INIS)

Swisshelm, J.M.

1989-01-01

A finite-difference scheme for solving complex three-dimensional aerodynamic flow on parallel-processing supercomputers is presented. The method consists of a basic flow solver with multigrid convergence acceleration, embedded grid refinements, and a zonal equation scheme. Multitasking and vectorization have been incorporated into the algorithm. Results obtained include multiprocessed flow simulations from the Cray X-MP and Cray-2. Speedups as high as 3.3 for the two-dimensional case and 3.5 for segments of the three-dimensional case have been achieved on the Cray-2. The entire solver attained a factor of 2.7 improvement over its unitasked version on the Cray-2. The performance of the parallel algorithm on each machine is analyzed. 14 refs
Multigrid solution of the convection-diffusion equation with high-Reynolds number

Energy Technology Data Exchange (ETDEWEB)

Zhang, Jun [George Washington Univ., Washington, DC (United States)

1996-12-31

A fourth-order compact finite difference scheme is employed with the multigrid technique to solve the variable coefficient convection-diffusion equation with high Reynolds number. Scaled inter-grid transfer operators and the potential for vectorization and parallelization are discussed. The high-order multigrid method is unconditionally stable and produces solutions of fourth-order accuracy. Numerical experiments are included.
Multigrid Finite Element Method in Calculation of 3D Homogeneous and Composite Solids

Directory of Open Access Journals (Sweden)

A.D. Matveev

2016-12-01

In the present paper, a method of multigrid finite elements to calculate elastic three-dimensional homogeneous and composite solids under static loading has been suggested. The method has been developed based on the finite element method algorithms using homogeneous and composite three-dimensional multigrid finite elements (MFE). The procedures for construction of MFE of both rectangular parallelepiped and complex shapes have been shown. The advantages of MFE are that they take into account, following the rules of the microapproach, heterogeneous and microhomogeneous structures of the bodies, describe the three-dimensional stress-strain state (without any simplifying hypotheses) in homogeneous and composite solids, as well as generate small dimensional discrete models and numerical solutions with a high accuracy.
A matrix-free implicit unstructured multigrid finite volume method for simulating structural dynamics and fluid structure interaction

Science.gov (United States)

Lv, X.; Zhao, Y.; Huang, X. Y.; Xia, G. H.; Su, X. H.

2007-07-01

A new three-dimensional (3D) matrix-free implicit unstructured multigrid finite volume (FV) solver for structural dynamics is presented in this paper. The solver is first validated using classical 2D and 3D cantilever problems. It is shown that very accurate predictions of the fundamental natural frequencies of the problems can be obtained by the solver with fast convergence rates. This method has been integrated into our existing FV compressible solver [X. Lv, Y. Zhao, et al., An efficient parallel/unstructured-multigrid preconditioned implicit method for simulating 3d unsteady compressible flows with moving objects, Journal of Computational Physics 215(2) (2006) 661-690] based on the immersed membrane method (IMM) [X. Lv, Y. Zhao, et al., as mentioned above]. Results for the interaction between the fluid and an immersed fixed-free cantilever are also presented to demonstrate the potential of this integrated fluid-structure interaction approach.

Analysis of multigrid methods on massively parallel computers: Architectural implications

Science.gov (United States)

Matheson, Lesley R.; Tarjan, Robert E.

1993-01-01

We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10^6 and 10^9, respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages (up to 1000 words), or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.
Toward robust scalable algebraic multigrid solvers

International Nuclear Information System (INIS)

Waisman, Haim; Schroder, Jacob; Olson, Luke; Hiriyur, Badri; Gaidamour, Jeremie; Siefert, Christopher; Hu, Jonathan Joseph; Tuminaro, Raymond Stephen

2010-01-01

This talk highlights some multigrid challenges that arise from several application areas including structural dynamics, fluid flow, and electromagnetics. A general framework is presented to help introduce and understand algebraic multigrid methods based on energy minimization concepts. Connections between algebraic multigrid prolongators and finite element basis functions are explored. It is shown how the general algebraic multigrid framework allows one to adapt multigrid ideas to a number of different situations. Examples are given corresponding to linear elasticity and specifically in the solution of linear systems associated with extended finite elements for fracture problems.
A Parallel Algebraic Multigrid Solver on Graphics Processing Units

KAUST Repository

Haase, Gundolf

2010-01-01

The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen node Infiniband cluster and a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core. © 2010 Springer-Verlag.
New Multigrid Method Including Elimination Algorithm Based on High-Order Vector Finite Elements in Three Dimensional Magnetostatic Field Analysis

Science.gov (United States)

Hano, Mitsuo; Hotta, Masashi

A new multigrid method based on high-order vector finite elements is proposed in this paper. Low level discretizations in this method are obtained by using low-order vector finite elements for the same mesh. The Gauss-Seidel method is used as a smoother, and the linear equation of the lowest level is solved by the ICCG method. However, it is often found that multigrid solutions do not converge to the ICCG solutions. An elimination algorithm for the constant term using a null space of the coefficient matrix is also described. In three dimensional magnetostatic field analysis, the convergence time and number of iterations of this multigrid method are compared with those of the conventional ICCG method.
Efficient relaxed-Jacobi smoothers for multigrid on parallel computers

Science.gov (United States)

Yang, Xiang; Mittal, Rajat

2017-03-01

In this Technical Note, we present a family of Jacobi-based multigrid smoothers suitable for the solution of discretized elliptic equations. These smoothers are based on the idea of scheduled-relaxation Jacobi proposed recently by Yang & Mittal (2014) [18] and employ two or three successive relaxed Jacobi iterations with relaxation factors derived so as to maximize the smoothing property of these iterations. The performance of these new smoothers, measured in terms of convergence acceleration and computational workload, is assessed for multi-domain implementations typical of parallelized solvers, and compared to the lexicographic point Gauss-Seidel smoother. The tests include the geometric multigrid method on structured grids as well as the algebraic grid method on unstructured grids. The tests demonstrate that unlike Gauss-Seidel, the convergence of these Jacobi-based smoothers is unaffected by domain decomposition, and furthermore, they outperform the lexicographic Gauss-Seidel by factors that increase with domain partition count.
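The benefit of using two successive relaxed Jacobi iterations with different relaxation factors can be seen in a small smoothing-factor calculation for the 1D model operator tridiag(-1, 2, -1). One Jacobi sweep with factor w multiplies the error component with eigenvalue lam by 1 - w*lam/2, and the high-frequency components correspond to lam in [2, 4]. The relaxation factors below are placed at Chebyshev roots on that interval purely for illustration. They are an assumption and are not the factors derived by Yang & Mittal.

```python
import math


def amplification(omegas, lam):
    """Error amplification of successive relaxed Jacobi sweeps for eigenvalue lam."""
    g = 1.0
    for w in omegas:
        g *= 1.0 - w * lam / 2.0
    return abs(g)


def smoothing_factor(omegas, samples=2001):
    """Worst-case amplification over the high-frequency range lam in [2, 4]."""
    return max(amplification(omegas, 2.0 + 2.0 * k / (samples - 1))
               for k in range(samples))


def chebyshev_root_factors(m, lo=2.0, hi=4.0):
    """Illustrative schedule: factors that zero the error at the Chebyshev roots on [lo, hi]."""
    mid, half = 0.5 * (hi + lo), 0.5 * (hi - lo)
    return [2.0 / (mid + half * math.cos(math.pi * (2 * k + 1) / (2 * m)))
            for k in range(m)]
```

Two sweeps with the classical factor 2/3 give a smoothing factor of 1/9, whereas the two-factor schedule from `chebyshev_root_factors(2)` gives 1/17 for the same amount of work. Both variants remain pure Jacobi sweeps, so neither introduces the ordering dependence of Gauss-Seidel.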
Parallel multigrid methods: implementation on message-passing computers and applications to fluid dynamics. A draft

International Nuclear Information System (INIS)

Solchenbach, K.; Thole, C.A.; Trottenberg, U.

1987-01-01

For a wide class of problems in scientific computing, in particular for partial differential equations, the multigrid principle has proved to yield highly efficient numerical methods. However, the principle has to be applied carefully: if the multigrid components are not chosen adequately with respect to the given problem, the efficiency may be much smaller than possible. This has been demonstrated for many practical problems. Unfortunately, the general theories on multigrid convergence do not give much help in constructing really efficient multigrid algorithms. Although some progress has been made in bridging the gap between theory and practice during the last few years, there are still several theoretical approaches which are misleading rather than helpful with respect to the objective of real efficiency. The research in finding highly efficient algorithms for non-model applications therefore is still a sophisticated mixture of theoretical considerations, a transfer of experiences from model to real life problems and systematic experimental work. The emphasis of the practical research activity today lies, among others, in the following fields: (1) finding efficient multigrid components for really complex problems, (2) combining the multigrid approach with advanced discretization techniques, and (3) constructing highly parallel multigrid algorithms. In this paper, we want to deal mainly with the last topic.
The multigrid preconditioned conjugate gradient method

Science.gov (United States)

Tatebe, Osamu

1993-01-01

A multigrid preconditioned conjugate gradient method (MGCG method), which uses the multigrid method as a preconditioner of the PCG method, is proposed. The multigrid method has inherent high parallelism and improves convergence of long wavelength components, which is important in iterative methods. By using this method as a preconditioner of the PCG method, an efficient method with high parallelism and fast convergence is obtained. First, a necessary condition that the multigrid preconditioner must satisfy to be a valid preconditioner of the PCG method is considered. Next, numerical experiments show the behavior of the MGCG method and that it is superior to both the ICCG method and the multigrid method in terms of fast convergence and high parallelism. This fast convergence is understood in terms of the eigenvalue analysis of the preconditioned matrix. From this observation of the multigrid preconditioner, it is concluded that the MGCG method converges in very few iterations and that the multigrid preconditioner is a desirable preconditioner of the conjugate gradient method.
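The MGCG idea, conjugate gradients with one multigrid cycle as the preconditioner, can be sketched for the 1D Poisson problem as follows. This is an illustrative sketch under stated assumptions and not the method of the paper. The V(1,1) cycle uses weighted Jacobi with the same number of pre- and post-smoothing sweeps and starts from a zero guess, which is one common way to keep the preconditioner symmetric. All function names and parameters are hypothetical.

```python
import math


def apply_A(u, h):
    """1D Poisson operator with zero Dirichlet boundaries."""
    n = len(u)
    return [(2.0 * u[i]
             - (u[i - 1] if i > 0 else 0.0)
             - (u[i + 1] if i < n - 1 else 0.0)) / (h * h)
            for i in range(n)]


def jacobi(u, f, h, omega=2.0 / 3.0):
    """One weighted-Jacobi sweep."""
    Au = apply_A(u, h)
    return [ui + omega * (h * h / 2.0) * (fi - ai) for ui, fi, ai in zip(u, f, Au)]


def v_cycle(f, h):
    """One symmetric V(1,1) cycle from a zero guess; returns z approximating A^{-1} f."""
    n = len(f)
    if n == 1:
        return [f[0] * h * h / 2.0]                      # exact coarsest solve
    u = jacobi([0.0] * n, f, h)                          # pre-smoothing
    r = [fi - ai for fi, ai in zip(f, apply_A(u, h))]
    rc = [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
          for j in range((n - 1) // 2)]                  # full-weighting restriction
    for j, e in enumerate(v_cycle(rc, 2.0 * h)):         # interpolate the correction
        u[2 * j] += 0.5 * e
        u[2 * j + 1] += e
        u[2 * j + 2] += 0.5 * e
    return jacobi(u, f, h)                               # post-smoothing


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mgcg(f, h, tol=1e-10, max_iter=50):
    """Preconditioned CG with the V-cycle as preconditioner; returns (u, iterations)."""
    u = [0.0] * len(f)
    r = list(f)
    z = v_cycle(r, h)
    p = list(z)
    rz = dot(r, z)
    r0 = math.sqrt(dot(r, r))
    for it in range(1, max_iter + 1):
        Ap = apply_A(p, h)
        alpha = rz / dot(p, Ap)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * r0:
            return u, it
        z = v_cycle(r, h)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return u, max_iter
```

The grid size is assumed to be of the form 2**k - 1 so that every level can be coarsened. Because the preconditioner is applied only through `v_cycle(r, h)`, a different cycle or smoother can be swapped in without touching the CG loop, provided the resulting operator stays symmetric positive definite.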
Adaptive parallel multigrid for Euler and incompressible Navier-Stokes equations

Energy Technology Data Exchange (ETDEWEB)

Trottenberg, U.; Oosterlee, K.; Ritzdorf, H. [and others]

1996-12-31

The combination of (1) very efficient solution methods (Multigrid), (2) adaptivity, and (3) parallelism (distributed memory) clearly is absolutely necessary for future oriented numerics but still regarded as extremely difficult or even unsolved. We show that very nice results can be obtained for real life problems. Our approach is straightforward (based on "MLAT"). But, of course, reasonable refinement and load-balancing strategies have to be used. Our examples are 2D, but 3D is on the way.
A Parallel Multigrid Solver for Viscous Flows on Anisotropic Structured Grids

Science.gov (United States)

Prieto, Manuel; Montero, Ruben S.; Llorente, Ignacio M.; Bushnell, Dennis M. (Technical Monitor)

2001-01-01

This paper presents an efficient parallel multigrid solver for speeding up the computation of a 3-D model that treats the flow of a viscous fluid over a flat plate. The main interest of this simulation lies in exhibiting some basic difficulties that prevent optimal multigrid efficiencies from being achieved. As the computing platform, we have used Coral, a Beowulf-class system based on Intel Pentium processors and equipped with GigaNet cLAN and switched Fast Ethernet networks. Our study not only examines the scalability of the solver but also includes a performance evaluation of Coral where the investigated solver has been used to compare several of its design choices, namely, the interconnection network (GigaNet versus switched Fast-Ethernet) and the node configuration (dual nodes versus single nodes). As a reference, the performance results have been compared with those obtained with the NAS-MG benchmark.
Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization

Energy Technology Data Exchange (ETDEWEB)

Clark, M. A. [NVIDIA Corp., Santa Clara]; Joó, Bálint [Jefferson Lab]; Strelchenko, Alexei [Fermilab]; Cheng, Michael [Boston U., Ctr. Comp. Sci.]; Gambhir, Arjun [William-Mary Coll.]; Brower, Richard [Boston U.]

2016-12-22

The past decade has witnessed a dramatic acceleration of lattice quantum chromodynamics calculations in nuclear and particle physics. This has been due to both significant progress in accelerating the iterative linear solvers using multi-grid algorithms, and due to the throughput improvements brought by GPUs. Deploying hierarchical algorithms optimally on GPUs is non-trivial owing to the lack of parallelism on the coarse grids, and as such, these advances have not proved multiplicative. Using the QUDA library, we demonstrate that by exposing all sources of parallelism that the underlying stencil problem possesses, and through appropriate mapping of this parallelism to the GPU architecture, we can achieve high efficiency even for the coarsest of grids. Results are presented for the Wilson-Clover discretization, where we demonstrate up to 10x speedup over present state-of-the-art GPU-accelerated methods on Titan. Finally, we look to the future, and consider the software implications of our findings.
Iterative and multigrid methods in the finite element solution of incompressible and turbulent fluid flow

Science.gov (United States)

Lavery, N.; Taylor, C.

1999-07-01

Multigrid and iterative methods are used to reduce the solution time of the matrix equations which arise from the finite element (FE) discretisation of the time-independent equations of motion of an incompressible fluid in turbulent motion. Incompressible flow is solved by using the method of reduced interpolation for the pressure to satisfy the Brezzi-Babuska condition. The k-l model is used to complete the turbulence closure problem. The non-symmetric iterative matrix methods examined are the methods of least squares conjugate gradient (LSCG), biconjugate gradient (BCG), conjugate gradient squared (CGS), and the biconjugate gradient squared stabilised (BCGSTAB). The multigrid algorithm applied is based on the FAS algorithm of Brandt, and uses two and three levels of grids with a V-cycling schedule. These methods are all compared to the non-symmetric frontal solver.
Massively Parallel Finite Element Programming

KAUST Repository

Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang

2010-01-01

Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
The finite volume element (FVE) and multigrid method for the incompressible Navier-Stokes equations

International Nuclear Information System (INIS)

Gu Lizhen; Bao Weizhu

1992-01-01

The authors apply the FVE method to discretize the incompressible Navier-Stokes (INS) equations in the original variables, choosing bilinear square finite elements and square finite volumes. The discrete schemes for the INS equations are presented. The FMV multigrid algorithm is applied to solve the discrete system, with DGS iteration used as the smoother; the DGS distributive mode for the discrete INS system is also presented. Sample problems for square cavity flow with Reynolds number Re≤100 are successfully calculated. The numerical solutions show that the results with 1 FMV are satisfactory and that, when Re is not large, the FVE discrete scheme of the conservative INS equations and that of the non-conservative INS equations with linearization provide almost the same accuracy.
Recent Development of Multigrid Algorithms for Mixed and Nonconforming Methods for Second Order Elliptical Problems

Science.gov (United States)

Chen, Zhangxin; Ewing, Richard E.

1996-01-01

Multigrid algorithms for nonconforming and mixed finite element methods for second order elliptic problems on triangular and rectangular finite elements are considered. The construction of several coarse-to-fine intergrid transfer operators for nonconforming multigrid algorithms is discussed. The equivalence between the nonconforming and mixed finite element methods with and without projection of the coefficient of the differential problems into finite element spaces is described.
Asynchronous Task-Based Parallelization of Algebraic Multigrid

KAUST Repository

AlOnazi, Amani A.

2017-06-23

As processor clock rates become more dynamic and workloads become more adaptive, the vulnerability to global synchronization that already complicates programming for performance in today's petascale environment will be exacerbated. Algebraic multigrid (AMG), the solver of choice in many large-scale PDE-based simulations, scales well in the weak sense, with fixed problem size per node, on tightly coupled systems when loads are well balanced and core performance is reliable. However, its strong scaling to many cores within a node is challenging. Reducing synchronization and increasing concurrency are vital adaptations of AMG to hybrid architectures. Recent communication-reducing improvements to classical additive AMG by Vassilevski and Yang improve concurrency and increase communication-computation overlap, while retaining convergence properties close to those of standard multiplicative AMG, but remain bulk synchronous. We extend the Vassilevski and Yang additive AMG to asynchronous task-based parallelism using a hybrid MPI+OmpSs (from the Barcelona Supercomputing Center) within a node, along with MPI for internode communications. We implement a tiling approach to decompose the grid hierarchy into parallel units within task containers. We compare against the MPI-only BoomerAMG and the Auxiliary-space Maxwell Solver (AMS) in the hypre library for the 3D Laplacian operator and the electromagnetic diffusion, respectively. In time to solution for a full solve, an MPI-OmpSs hybrid improves over an all-MPI approach in strong scaling at full core count (32 threads per single Haswell node of the Cray XC40) and maintains this per node advantage as both weak scale to thousands of cores, with MPI between nodes.
Toward textbook multigrid efficiency for fully implicit resistive magnetohydrodynamics

International Nuclear Information System (INIS)

Adams, Mark F.; Samtaney, Ravi; Brandt, Achi

2010-01-01

Multigrid methods can solve some classes of elliptic and parabolic equations to accuracy below the truncation error with a work-cost equivalent to a few residual calculations - so-called 'textbook' multigrid efficiency. We investigate methods to solve the system of equations that arise in time dependent magnetohydrodynamics (MHD) simulations with textbook multigrid efficiency. We apply multigrid techniques such as geometric interpolation, full approximate storage, Gauss-Seidel smoothers, and defect correction for fully implicit, nonlinear, second-order finite volume discretizations of MHD. We apply these methods to a standard resistive MHD benchmark problem, the GEM reconnection problem, and add a strong magnetic guide field, which is a critical characteristic of magnetically confined fusion plasmas. We show that our multigrid methods can achieve near textbook efficiency on fully implicit resistive MHD simulations.
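A rough way to see why a multigrid cycle can cost only "a few" fine-grid operations is the standard geometric-series work estimate. The sketch below rests on assumptions not stated in the abstract: full coarsening that halves the mesh in each of d space dimensions, constant work per grid point, and a per-cycle cost W_0 on the finest grid.

```latex
W_{\text{V-cycle}} \;\le\; W_0 \sum_{k=0}^{\infty} 2^{-dk}
  \;=\; \frac{W_0}{1 - 2^{-d}},
\qquad
d = 2:\ \tfrac{4}{3}\, W_0,
\qquad
d = 3:\ \tfrac{8}{7}\, W_0 .
```

Under these assumptions all coarse levels together add at most one third (2D) or one seventh (3D) to the finest-grid work, so the total cost is governed by how many fine-grid smoothing sweeps and residual evaluations each cycle needs.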
Toward textbook multigrid efficiency for fully implicit resistive magnetohydrodynamics

International Nuclear Information System (INIS)

Adams, Mark F.; Samtaney, Ravi; Brandt, Achi

2013-01-01

Multigrid methods can solve some classes of elliptic and parabolic equations to accuracy below the truncation error with a work-cost equivalent to a few residual calculations - so-called textbook multigrid efficiency. We investigate methods to solve the system of equations that arise in time dependent magnetohydrodynamics (MHD) simulations with textbook multigrid efficiency. We apply multigrid techniques such as geometric interpolation, full approximate storage, Gauss-Seidel smoothers, and defect correction for fully implicit, nonlinear, second-order finite volume discretizations of MHD. We apply these methods to a standard resistive MHD benchmark problem, the GEM reconnection problem, and add a strong magnetic guide field, which is a critical characteristic of magnetically confined fusion plasmas. We show that our multigrid methods can achieve near textbook efficiency on fully implicit resistive MHD simulations.
Solving the Fluid Pressure Poisson Equation Using Multigrid-Evaluation and Improvements.

Science.gov (United States)

Dick, Christian; Rogowsky, Marcus; Westermann, Rudiger

2016-11-01

In many numerical simulations of fluids governed by the incompressible Navier-Stokes equations, the pressure Poisson equation needs to be solved to enforce mass conservation. Multigrid solvers show excellent convergence in simple scenarios, yet they can converge slowly in domains where physically separated regions are combined at coarser scales. Moreover, existing multigrid solvers are tailored to specific discretizations of the pressure Poisson equation, and they cannot easily be adapted to other discretizations. In this paper we analyze the convergence properties of existing multigrid solvers for the pressure Poisson equation in different simulation domains, and we show how to further improve the multigrid convergence rate by using a graph-based extension to determine the coarse grid hierarchy. The proposed multigrid solver is generic in that it can be applied to different kinds of discretizations of the pressure Poisson equation, by using solely the specification of the simulation domain and pre-assembled computational stencils. We analyze the proposed solver in combination with finite difference and finite volume discretizations of the pressure Poisson equation. Our evaluations show that, despite the common assumption, multigrid schemes can exploit their potential even in the most complicated simulation scenarios, yet this behavior is obtained at the price of higher memory consumption.
Toward textbook multigrid efficiency for fully implicit resistive magnetohydrodynamics

KAUST Repository

Adams, Mark F.; Samtaney, Ravi; Brandt, Achi

2010-01-01

Multigrid methods can solve some classes of elliptic and parabolic equations to accuracy below the truncation error with a work-cost equivalent to a few residual calculations - so-called "textbook" multigrid efficiency. We investigate methods to solve the system of equations that arise in time dependent magnetohydrodynamics (MHD) simulations with textbook multigrid efficiency. We apply multigrid techniques such as geometric interpolation, full approximate storage, Gauss-Seidel smoothers, and defect correction for fully implicit, nonlinear, second-order finite volume discretizations of MHD. We apply these methods to a standard resistive MHD benchmark problem, the GEM reconnection problem, and add a strong magnetic guide field, which is a critical characteristic of magnetically confined fusion plasmas. We show that our multigrid methods can achieve near textbook efficiency on fully implicit resistive MHD simulations. (C) 2010 Elsevier Inc. All rights reserved.

Unweighted least squares phase unwrapping by means of multigrid techniques

Science.gov (United States)

Pritt, Mark D.

1995-11-01

We present a multigrid algorithm for unweighted least squares phase unwrapping. This algorithm applies Gauss-Seidel relaxation schemes to solve the Poisson equation on smaller, coarser grids and transfers the intermediate results to the finer grids. This approach forms the basis of our multigrid algorithm for weighted least squares phase unwrapping, which is described in a separate paper. The key idea of our multigrid approach is to maintain the partial derivatives of the phase data in separate arrays and to correct these derivatives at the boundaries of the coarser grids. This maintains the boundary conditions necessary for rapid convergence to the correct solution. Although the multigrid algorithm is an iterative algorithm, we demonstrate that it is nearly as fast as the direct Fourier-based method. We also describe how to parallelize the algorithm for execution on a distributed-memory parallel processor computer or a network-cluster of workstations.
Multigrid methods in structural mechanics

Science.gov (United States)

Raju, I. S.; Bigelow, C. A.; Taasan, S.; Hussaini, M. Y.

1986-01-01

Although the application of multigrid methods to the equations of elasticity has been suggested, few such applications have been reported in the literature. In the present work, multigrid techniques are applied to the finite element analysis of a simply supported Bernoulli-Euler beam, and various aspects of the multigrid algorithm are studied and explained in detail. In this study, six grid levels were used to model half the beam. With linear prolongation and sequential ordering, the multigrid algorithm yielded results which were of machine accuracy with work equivalent to 200 standard Gauss-Seidel iterations on the fine grid. Also with linear prolongation and sequential ordering, the V(1,n) cycle with n greater than 2 yielded better convergence rates than the V(n,1) cycle. The restriction and prolongation operators were derived based on energy principles. Conserving energy during the inter-grid transfers required that the prolongation operator be the transpose of the restriction operator, and led to improved convergence rates. With energy-conserving prolongation and sequential ordering, the multigrid algorithm yielded results of machine accuracy with a work equivalent to 45 Gauss-Seidel iterations on the fine grid. The red-black ordering of relaxations yielded solutions of machine accuracy in a single V(1,1) cycle, which required work equivalent to about 4 iterations on the finest grid level.
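The energy-conservation argument (prolongation as the transpose of restriction) can be checked on a much simpler model than the beam. The sketch below is an assumption-laden illustration: it uses the 1D linear finite element stiffness matrix tridiag(-1, 2, -1)/h rather than a Bernoulli-Euler beam element, with linear interpolation P and restriction R = P^T. With that choice the Galerkin product R K_h P reproduces the coarse-grid stiffness matrix, so a coarse correction has the same strain energy on both grids. The helper names are hypothetical.

```python
def stiffness(n, h):
    """1D linear-FE stiffness matrix tridiag(-1, 2, -1) / h (n interior nodes)."""
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        K[i][i] = 2.0 / h
        if i > 0:
            K[i][i - 1] = -1.0 / h
        if i < n - 1:
            K[i][i + 1] = -1.0 / h
    return K


def interpolation(n_coarse):
    """Linear interpolation matrix P of shape (2*n_coarse + 1) x n_coarse."""
    n_fine = 2 * n_coarse + 1
    P = [[0.0] * n_coarse for _ in range(n_fine)]
    for j in range(n_coarse):
        P[2 * j][j] = 0.5
        P[2 * j + 1][j] = 1.0      # coarse node j coincides with fine node 2j + 1
        P[2 * j + 2][j] = 0.5
    return P


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


n_coarse, h = 3, 1.0 / 8
P = interpolation(n_coarse)
R = transpose(P)                                   # restriction = prolongation^T
galerkin = matmul(matmul(R, stiffness(2 * n_coarse + 1, h)), P)
direct = stiffness(n_coarse, 2.0 * h)              # stiffness assembled on the coarse mesh
```

Comparing `galerkin` with `direct` entry by entry shows that the two agree for this model, which is the discrete statement that e^T K_2h e = (P e)^T K_h (P e) for any coarse vector e.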
Multigrid

CERN Document Server

Trottenberg, Ulrich; Schuller, Anton

2000-01-01

Multigrid presents both an elementary introduction to multigrid methods for solving partial differential equations and a contemporary survey of advanced multigrid techniques and real-life applications. Multigrid methods are invaluable to researchers in scientific disciplines including physics, chemistry, meteorology, fluid and continuum mechanics, geology, biology, and all engineering disciplines. They are also becoming increasingly important in economics and financial mathematics. Readers are presented with an invaluable summary covering 25 years of practical experience acquired by the multigrid research group at the German National Research Center for Information Technology. The book presents both practical and theoretical points of view.
* Covers the whole field of multigrid methods from its elements up to the most advanced applications
* Style is essentially elementary but mathematically rigorous
* No other book is so comprehensive and written for both practitioners and students
A multigrid method for variational inequalities

Energy Technology Data Exchange (ETDEWEB)

Oliveira, S.; Stewart, D.E.; Wu, W.

1996-12-31

Multigrid methods have been used with great success for solving elliptic partial differential equations. Penalty methods have been successful in solving finite-dimensional quadratic programs. In this paper these two techniques are combined to give a fast method for solving obstacle problems. A nonlinear penalized problem is solved using Newton's method for large values of a penalty parameter. Multigrid methods are used to solve the linear systems in Newton's method. The overall numerical method developed is based on an exterior penalty function, and numerical results showing the performance of the method have been obtained.
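The combination of an exterior penalty with Newton's method can be sketched on a 1D obstacle problem. This is a minimal illustration under stated assumptions, not the paper's method: the operator is tridiag(-1, 2, -1)/h^2, the penalized equation is A u - rho * max(psi - u, 0) = f, the Newton step uses the generalized Jacobian A + rho * D (D marks the nodes where u < psi), and a direct tridiagonal solve stands in for the multigrid linear solver. The function names and the values of `rho`, `f` and `psi` are hypothetical.

```python
def solve_tridiagonal(off, diag, rhs):
    """Thomas algorithm for a tridiagonal system with constant off-diagonal `off`."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def penalized_obstacle(f, psi, h, rho=1e6, max_newton=100):
    """Newton iteration for A u - rho * max(psi - u, 0) = f, constraint u >= psi."""
    off = -1.0 / (h * h)
    u = [0.0] * len(f)
    active = None
    for it in range(1, max_newton + 1):
        new_active = [ui < pi for ui, pi in zip(u, psi)]
        if new_active == active:
            # The penalized problem is piecewise linear, so an unchanged
            # active set means the last Newton step already solved it.
            return u, it - 1, True
        active = new_active
        # Newton system: (A + rho * D) u_new = f + rho * D * psi
        diag = [2.0 / (h * h) + (rho if a else 0.0) for a in active]
        rhs = [fi + (rho * pi if a else 0.0)
               for fi, pi, a in zip(f, psi, active)]
        u = solve_tridiagonal(off, diag, rhs)
    return u, max_newton, False


# Usage: a uniformly loaded string pressed down onto a flat obstacle at -0.05.
n, h = 31, 1.0 / 32
u, newton_steps, converged = penalized_obstacle([-1.0] * n, [-0.05] * n, h)
```

Because the penalty is exterior, the computed solution dips slightly below the obstacle in the contact zone; the violation shrinks as `rho` grows, at the price of a worse-conditioned linear system, which is where a robust linear solver such as multigrid becomes relevant.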
Segmental Refinement: A Multigrid Technique for Data Locality

KAUST Repository

Adams, Mark F.; Brown, Jed; Knepley, Matt; Samtaney, Ravi

2016-01-01

We investigate a domain decomposed multigrid technique, termed segmental refinement, for solving general nonlinear elliptic boundary value problems. We extend the method first proposed in 1994 by analytically and experimentally investigating its complexity. We confirm that communication of traditional parallel multigrid is eliminated on fine grids, with modest amounts of extra work and storage, while maintaining the asymptotic exactness of full multigrid. We observe an accuracy dependence on the segmental refinement subdomain size, which was not considered in the original analysis. We present a communication complexity analysis that quantifies the communication costs ameliorated by segmental refinement and report performance results with up to 64K cores on a Cray XC30.
Advanced Algebraic Multigrid Solvers for Subsurface Flow Simulation

KAUST Repository

Chen, Meng-Huo

2015-09-13

In this research we are particularly interested in extending the robustness of multigrid solvers to the complex systems encountered in subsurface reservoir applications for flow problems in porous media. In many cases, the step for solving the pressure field in subsurface flow simulation becomes a bottleneck for the performance of the simulator. For solving the large sparse linear systems arising from MPFA discretization, we choose multigrid methods as the linear solver. The possible difficulties and issues will be addressed and the corresponding remedies will be studied. When multigrid methods are used as the linear solver, the simulator can be parallelized (although not trivially) and high-resolution simulation becomes feasible, which is the ultimate goal we seek to achieve.
Some multigrid algorithms for SIMD machines

Energy Technology Data Exchange (ETDEWEB)

Dendy, J.E. Jr. [Los Alamos National Lab., NM (United States)]

1996-12-31

Previously a semicoarsening multigrid algorithm suitable for use on SIMD architectures was investigated. Through the use of new software tools, the performance of this algorithm has been considerably improved. The method has also been extended to three space dimensions. The method performs well for strongly anisotropic problems and for problems with coefficients jumping by orders of magnitude across internal interfaces. The parallel efficiency of this method is analyzed, and its actual performance on the CM-5 is compared with its performance on the CRAY-YMP. A standard coarsening multigrid algorithm is also considered, and we compare its performance on these two platforms as well.
An Optimal Order Nonnested Mixed Multigrid Method for Generalized Stokes Problems

Science.gov (United States)

Deng, Qingping

1996-01-01

A multigrid algorithm is developed and analyzed for generalized Stokes problems discretized by various nonnested mixed finite elements within a unified framework. It is abstractly proved by an element-independent analysis that the multigrid algorithm converges with an optimal order if there exists a 'good' prolongation operator. A technique to construct a 'good' prolongation operator for nonnested multilevel finite element spaces is proposed. Its basic idea is to introduce a sequence of auxiliary nested multilevel finite element spaces and define a prolongation operator as a composite operator of two single grid level operators. This makes not only the construction of a prolongation operator much easier (the final explicit forms of such prolongation operators are fairly simple), but the verification of the approximate properties for prolongation operators is also simplified. Finally, as an application, the framework and technique is applied to seven typical nonnested mixed finite elements.
An evaluation of parallel multigrid as a solver and a preconditioner for singular perturbed problems

Energy Technology Data Exchange (ETDEWEB)

Oosterlee, C.W. [Inst. for Algorithms and Scientific Computing, Sankt Augustin (Germany)]; Washio, T. [C&C Research Lab., Sankt Augustin (Germany)]

1996-12-31

In this paper we try to achieve h-independent convergence with preconditioned GMRES and BiCGSTAB for 2D singularly perturbed equations. Three recently developed multigrid methods are adopted as preconditioners. They are also used as solution methods in order to compare the performance of the methods as solvers and as preconditioners. Two of the multigrid methods differ only in the transfer operators. One uses standard matrix-dependent prolongation operators. The second uses "upwind" prolongation operators. Both employ the Galerkin coarse grid approximation and an alternating zebra line Gauss-Seidel smoother. The third method is based on the block LU decomposition of a matrix and on an approximate Schur complement. All three multigrid algorithms are algebraic methods.
Higher-order ice-sheet modelling accelerated by multigrid on graphics cards

Science.gov (United States)

Brædstrup, Christian; Egholm, David

2013-04-01

Higher-order ice flow modelling is a very computer-intensive process owing primarily to the nonlinear influence of the horizontal stress coupling. When applied for simulating long-term glacial landscape evolution, the ice-sheet models must consider very long time series, while both high temporal and spatial resolution are needed to resolve small effects. Higher-order and full Stokes models have therefore seen very limited use in this field. However, recent advances in graphics card (GPU) technology for high performance computing have proven extremely efficient in accelerating many large-scale scientific computations. The general purpose GPU (GPGPU) technology is cheap, has a low power consumption and fits into a normal desktop computer. It could therefore provide a powerful tool for many glaciologists working on ice flow models. Our current research focuses on utilising the GPU as a tool in ice-sheet and glacier modelling. To this end we have implemented the Integrated Second-Order Shallow Ice Approximation (iSOSIA) equations on the device using the finite difference method. To accelerate the computations, the GPU solver uses a non-linear Red-Black Gauss-Seidel iterator coupled with a Full Approximation Scheme (FAS) multigrid setup to further aid convergence. The GPU finite difference implementation provides the inherent parallelization that scales from hundreds to several thousands of cores on newer cards. We demonstrate the efficiency of the GPU multigrid solver using benchmark experiments.
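The reason red-black ordering suits GPUs can be seen in a small serial sketch. The code below is an illustration only, under assumptions that differ from the abstract: it treats the linear 2D Poisson problem with a 5-point stencil instead of the nonlinear iSOSIA equations, and it runs sequentially in plain Python. The point is structural: every update of one colour reads only values of the other colour, so all points of a colour could be updated concurrently. Function names are hypothetical.

```python
def red_black_sweep(u, f, h):
    """One red-black Gauss-Seidel sweep for -Laplace(u) = f (5-point stencil).

    u and f are (n + 2) x (n + 2) nested lists; the outer layer of u holds
    the Dirichlet boundary values and is never modified.
    """
    n = len(u) - 2
    for color in (0, 1):
        # Points with (i + j) % 2 == color depend only on the other colour,
        # so this double loop could run fully in parallel.
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if (i + j) % 2 == color:
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                      + u[i][j - 1] + u[i][j + 1]
                                      + h * h * f[i][j])


def max_residual(u, f, h):
    """Largest absolute residual of the discrete equations."""
    n = len(u) - 2
    return max(
        abs(f[i][j] - (4.0 * u[i][j] - u[i - 1][j] - u[i + 1][j]
                       - u[i][j - 1] - u[i][j + 1]) / (h * h))
        for i in range(1, n + 1) for j in range(1, n + 1))


# Usage: unit source on a 7 x 7 interior grid with zero boundary values.
n, h = 7, 1.0 / 8
u = [[0.0] * (n + 2) for _ in range(n + 2)]
f = [[1.0] * (n + 2) for _ in range(n + 2)]
for _ in range(200):
    red_black_sweep(u, f, h)
residual = max_residual(u, f, h)
```

Used on its own, as here, the iteration needs many sweeps because smooth error components decay slowly; in a multigrid setting only a few sweeps per level are applied and the coarse grids remove the smooth error.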
Algorithms for computational fluid dynamics on parallel processors

International Nuclear Information System (INIS)

Van de Velde, E.F.

1986-01-01

A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries
Advanced Algebraic Multigrid Solvers for Subsurface Flow Simulation

KAUST Repository

Chen, Meng-Huo; Sun, Shuyu; Salama, Amgad

2015-01-01

and issues will be addressed and the corresponding remedies will be studied. As the multigrid methods are used as the linear solver, the simulator can be parallelized (although not trivial) and the high-resolution simulation become feasible, the ultimately
Design Considerations for a Flexible Multigrid Preconditioning Library

Directory of Open Access Journals (Sweden)

Jérémie Gaidamour

2012-01-01

MueLu is a library within the Trilinos software project [An overview of Trilinos, Technical Report SAND2003-2927, Sandia National Laboratories, 2003] and provides a framework for parallel multigrid preconditioning methods for large sparse linear systems. While providing efficient implementations of modern multigrid methods based on smoothed aggregation and energy minimization concepts, MueLu is designed to be customized and extended. This article gives an overview of design considerations for the MueLu package: user interfaces, internal design, data management, usage of modern software constructs, leveraging Trilinos capabilities, linear algebra operations and advanced application.
Electrical Resistivity Tomography using a finite element based BFGS algorithm with algebraic multigrid preconditioning

Science.gov (United States)

Codd, A. L.; Gross, L.

2018-03-01

We present a new inversion method for Electrical Resistivity Tomography which, in contrast to established approaches, minimizes the cost function prior to finite element discretization for the unknown electric conductivity and electric potential. Minimization is performed with the Broyden-Fletcher-Goldfarb-Shanno method (BFGS) in an appropriate function space. BFGS is self-preconditioning and avoids construction of the dense Hessian, which is the major obstacle to solving large 3-D problems using parallel computers. In addition to the forward problem predicting the measurement from the injected current, the so-called adjoint problem also needs to be solved. For this problem a virtual current is injected through the measurement electrodes and an adjoint electric potential is obtained. The magnitude of the injected virtual current is equal to the misfit at the measurement electrodes. This new approach has the advantage that the solution process of the optimization problem remains independent of the meshes used for discretization and allows for mesh adaptation during inversion. Computation time is reduced by using superposition of pole loads for the forward and adjoint problems. A smoothed aggregation algebraic multigrid (AMG) preconditioned conjugate gradient method is applied to construct the potentials for a given electric conductivity estimate and to construct a first level BFGS preconditioner. Through the additional reuse of AMG operators and coarse grid solvers, inversion time for large 3-D problems can be reduced further. We apply our new inversion method to synthetic survey data created by the resistivity profile representing the characteristics of subsurface fluid injection. We further test it on data obtained from a 2-D surface electrode survey on Heron Island, a small tropical island off the east coast of central Queensland, Australia.
Multigrid methods III

CERN Document Server

Trottenberg, U; Third European Conference on Multigrid Methods

1991-01-01

These proceedings contain a selection of papers presented at the Third European Conference on Multigrid Methods which was held in Bonn on October 1-4, 1990. Following conferences in 1981 and 1985, a platform for the presentation of new Multigrid results was provided for a third time. Multigrid methods no longer have problems being accepted by numerical analysts and users of numerical methods; on the contrary, they have been further developed in such a successful way that they have penetrated a variety of new fields of application. The high number of 154 participants from 18 countries and 76 presented papers show the need to continue the series of the European Multigrid Conferences. The papers of this volume give a survey on the current Multigrid situation; in particular, they correspond to those fields where new developments can be observed. For example, several papers study the appropriate treatment of time dependent problems. Improvements can also be noticed in the Multigrid approach for semiconductor eq...
Multigrid and defect correction for the steady Navier-Stokes equations

NARCIS (Netherlands)

Koren, B.

1990-01-01

Theoretical and experimental convergence results are presented for nonlinear multigrid and iterative defect correction applied to finite volume discretizations of the full, steady, 2D, compressible Navier-Stokes equations. Iterative defect correction is introduced for circumventing the difficulty in
Multigrid Reduction in Time for Nonlinear Parabolic Problems

Energy Technology Data Exchange (ETDEWEB)

Falgout, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Manteuffel, T. A. [Univ. of Colorado, Boulder, CO (United States); O'Neill, B. [Univ. of Colorado, Boulder, CO (United States); Schroder, J. B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2016-01-04

The need for parallel-in-time is being driven by changes in computer architectures, where future speed-ups will be available through greater concurrency, but not faster clock speeds, which are stagnant. This leads to a bottleneck for sequential time marching schemes, because they lack parallelism in the time dimension. Multigrid Reduction in Time (MGRIT) is an iterative procedure that allows for temporal parallelism by utilizing multigrid reduction techniques and a multilevel hierarchy of coarse time grids. MGRIT has been shown to be effective for linear problems, with speedups of up to 50 times. The goal of this work is the efficient solution of nonlinear problems with MGRIT, where efficient is defined as achieving similar performance when compared to a corresponding linear problem. As our benchmark, we use the p-Laplacian, where p = 4 corresponds to a well-known nonlinear diffusion equation and p = 2 corresponds to our benchmark linear diffusion problem. When considering linear problems and implicit methods, the use of optimal spatial solvers such as spatial multigrid implies that the cost of one time step evaluation is fixed across temporal levels, which have a large variation in time step sizes. This is not the case for nonlinear problems, where the work required increases dramatically on coarser time grids, where relatively large time steps lead to worse conditioned nonlinear solves and increased nonlinear iteration counts per time step evaluation. This is the key difficulty explored by this paper. We show that by using a variety of strategies, most importantly, spatial coarsening and an alternate initial guess to the nonlinear time-step solver, we can reduce the work per time step evaluation over all temporal levels to a range similar to that of the corresponding linear problem. This allows for parallel scaling behavior comparable to the corresponding linear problem.
A Pseudo-Temporal Multi-Grid Relaxation Scheme for Solving the Parabolized Navier-Stokes Equations

Science.gov (United States)

White, J. A.; Morrison, J. H.

1999-01-01

A multi-grid, flux-difference-split, finite-volume code, VULCAN, is presented for solving the elliptic and parabolized form of the equations governing three-dimensional, turbulent, calorically perfect and non-equilibrium chemically reacting flows. The space marching algorithms developed to improve convergence rate and/or reduce computational cost are emphasized. The algorithms presented are extensions to the class of implicit pseudo-time iterative, upwind space-marching schemes. A full approximation storage, full multi-grid scheme is also described which is used to accelerate the convergence of a Gauss-Seidel relaxation method. The multi-grid algorithm is shown to significantly improve convergence on high aspect ratio grids.

Non-Galerkin Coarse Grids for Algebraic Multigrid

Energy Technology Data Exchange (ETDEWEB)

Falgout, Robert D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Schroder, Jacob B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2014-06-26

Algebraic multigrid (AMG) is a popular and effective solver for systems of linear equations that arise from discretized partial differential equations. And while AMG has been effectively implemented on large scale parallel machines, challenges remain, especially when moving to exascale. Particularly, stencil sizes (the number of nonzeros in a row) tend to increase further down in the coarse grid hierarchy, and this growth leads to more communication. Therefore, as problem size increases and the number of levels in the hierarchy grows, the overall efficiency of the parallel AMG method decreases, sometimes dramatically. This growth in stencil size is due to the standard Galerkin coarse grid operator, $P^T A P$, where $P$ is the prolongation (i.e., interpolation) operator. For example, the coarse grid stencil size for a simple three-dimensional (3D) seven-point finite differencing approximation to diffusion can increase into the thousands on present day machines, causing an associated increase in communication costs. We therefore consider algebraically truncating coarse grid stencils to obtain a non-Galerkin coarse grid. First, the sparsity pattern of the non-Galerkin coarse grid is determined by employing a heuristic minimal “safe” pattern together with strength-of-connection ideas. Second, the nonzero entries are determined by collapsing the stencils in the Galerkin operator using traditional AMG techniques. The result is a reduction in coarse grid stencil size, overall operator complexity, and parallel AMG solve phase times.
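As a rough illustration of the stencil growth described above, the following sketch forms the Galerkin product $P^T A P$ for a small 2D five-point Laplacian and counts the nonzeros in an interior row before and after coarsening. The 7 x 7 grid, the bilinear interpolation, the dense list-of-lists storage, and all function names are assumptions made for this example; the abstract concerns 3D problems with AMG interpolation, where the growth is far larger.

```python
def laplacian_2d(n):
    """Dense, unscaled 5-point Laplacian on an n x n grid of interior points."""
    size = n * n
    A = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            r = i * n + j
            A[r][r] = 4.0
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    A[r][ii * n + jj] = -1.0
    return A

def interp_1d(nc):
    """Linear interpolation from nc coarse points to 2*nc + 1 fine points."""
    P = [[0.0] * nc for _ in range(2 * nc + 1)]
    for j in range(nc):
        P[2 * j][j] = 0.5
        P[2 * j + 1][j] = 1.0
        P[2 * j + 2][j] = 0.5
    return P

def kron(B, C):
    """Kronecker product of two dense matrices."""
    return [[b * c for b in brow for c in crow] for brow in B for crow in C]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

def nnz(row, tol=1e-12):
    """Number of entries in a row that are nonzero up to a tolerance."""
    return sum(1 for v in row if abs(v) > tol)

def galerkin_demo(nc=3):
    """Return the fine operator A and the Galerkin coarse operator P^T A P."""
    A = laplacian_2d(2 * nc + 1)
    P1 = interp_1d(nc)
    P = kron(P1, P1)  # bilinear interpolation in 2D
    return A, matmul(transpose(P), matmul(A, P))

A, Ac = galerkin_demo()
fine_center = (len(A) - 1) // 2
coarse_center = (len(Ac) - 1) // 2
# Interior stencil size: 5 entries on the fine grid, 9 on the coarse grid.
print(nnz(A[fine_center]), nnz(Ac[coarse_center]))
```

Even in this mild geometric setting the coarse stencil is denser than the fine one, which is the effect that the non-Galerkin truncation in the abstract aims to limit.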
On a multigrid method for the coupled Stokes and porous media flow problem

Science.gov (United States)

Luo, P.; Rodrigo, C.; Gaspar, F. J.; Oosterlee, C. W.

2017-07-01

The multigrid solution of coupled porous media and Stokes flow problems is considered. The Darcy equation as the saturated porous medium model is coupled to the Stokes equations by means of appropriate interface conditions. We focus on an efficient multigrid solution technique for the coupled problem, which is discretized by finite volumes on staggered grids, giving rise to a saddle point linear system. Special treatment is required regarding the discretization at the interface. An Uzawa smoother is employed in multigrid, which is a decoupled procedure based on symmetric Gauss-Seidel smoothing for velocity components and a simple Richardson iteration for the pressure field. Since a relaxation parameter is part of a Richardson iteration, Local Fourier Analysis (LFA) is applied to determine the optimal parameters. Highly satisfactory multigrid convergence is reported, and, moreover, the algorithm performs well for small values of the hydraulic conductivity and fluid viscosity, which are relevant for applications.
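The decoupled structure of an Uzawa-type iteration can be sketched on a tiny saddle point system $A u + B^T p = f$, $B u = g$. This is only a minimal sketch under stated assumptions: the velocity step below is an exact solve with a diagonal $A$ (the abstract's smoother uses a symmetric Gauss-Seidel sweep instead), there is a single constraint row, and the matrix entries, the value `omega = 1.0`, and the function name `uzawa` are hypothetical choices for illustration rather than anything taken from the paper.

```python
def uzawa(a_diag, b, f, g, omega, iters):
    """Classical Uzawa iteration for  A u + B^T p = f,  B u = g,
    with A = diag(a_diag) and one constraint row B = b (illustrative only)."""
    p = 0.0
    u = [0.0] * len(f)
    for _ in range(iters):
        # Velocity step: solve A u = f - B^T p (exact because A is diagonal).
        u = [(fi - bi * p) / ai for fi, bi, ai in zip(f, b, a_diag)]
        # Pressure step: Richardson update with relaxation parameter omega,
        # driven by the residual of the constraint B u = g.
        p += omega * (sum(bi * ui for bi, ui in zip(b, u)) - g)
    return u, p

# A = diag(2, 4), B = [1, 1], f = (2, 4), g = 1.
u, p = uzawa([2.0, 4.0], [1.0, 1.0], [2.0, 4.0], 1.0, omega=1.0, iters=40)
print(u, p)
```

For this system the Schur complement $B A^{-1} B^T$ equals 0.75, so the pressure error contracts by $|1 - 0.75\,\omega|$ per step; choosing the relaxation parameter well is what the Local Fourier Analysis in the abstract is used for.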
Finite element electromagnetic field computation on the Sequent Symmetry 81 parallel computer

International Nuclear Information System (INIS)

Ratnajeevan, S.; Hoole, H.

1990-01-01

Finite element field analysis algorithms lend themselves to parallelization and this fact is exploited in this paper to implement a finite element analysis program for electromagnetic field computation on the Sequent Symmetry 81 parallel computer with three processors. In terms of waiting time, the maximum gains are to be made in matrix solution and therefore this paper concentrates on the gains in parallelizing the solution part of finite element analysis. An outline of how parallelization could be exploited in most finite element operations is given in this paper although the actual implementation of parallelism on the Sequent Symmetry 81 parallel computer was in sparsity computation, matrix assembly and the matrix solution areas. In all cases, the algorithms were modified to suit the parallel programming application rather than allowing the compiler to parallelize on existing algorithms.
Finite volume multigrid method of the planar contraction flow of a viscoelastic fluid

Science.gov (United States)

Moatssime, H. Al; Esselaoui, D.; Hakim, A.; Raghay, S.

2001-08-01

This paper reports on a numerical algorithm for the steady flow of viscoelastic fluid. The conservative and constitutive equations are solved using the finite volume method (FVM) with a hybrid scheme for the velocities and first-order upwind approximation for the viscoelastic stress. A non-uniform staggered grid system is used. The iterative SIMPLE algorithm is employed to relax the coupled momentum and continuity equations. The non-linear algebraic equations over the flow domain are solved iteratively by the symmetrical coupled Gauss-Seidel (SCGS) method. In both, the full approximation storage (FAS) multigrid algorithm is used. An Oldroyd-B fluid model was selected for the calculation. Results are reported for planar 4:1 abrupt contraction at various Weissenberg numbers. The solutions are found to be stable and smooth. The solutions show that at high Weissenberg number the domain must be long enough. The convergence of the method has been verified with grid refinement. All the calculations have been performed on a PC equipped with a Pentium III processor at 550 MHz.
Totally parallel multilevel algorithms

Science.gov (United States)

Frederickson, Paul O.

1988-01-01

Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.
Multigrid treatment of implicit continuum diffusion

Science.gov (United States)

Francisquez, Manaure; Zhu, Ben; Rogers, Barrett

2017-10-01

Implicit treatment of diffusive terms of various differential orders common in continuum mechanics modeling, such as computational fluid dynamics, is investigated with spectral and multigrid algorithms in non-periodic 2D domains. In doubly periodic time dependent problems these terms can be efficiently and implicitly handled by spectral methods, but in non-periodic systems solved with distributed memory parallel computing and 2D domain decomposition, this efficiency is lost for large numbers of processors. We built and present here a multigrid algorithm for these types of problems which outperforms a spectral solution that employs the highly optimized FFTW library. This multigrid algorithm is not only suitable for high performance computing but may also be able to efficiently treat implicit diffusion of arbitrary order by introducing auxiliary equations of lower order. We test these solvers for fourth and sixth order diffusion with idealized harmonic test functions as well as a turbulent 2D magnetohydrodynamic simulation. It is also shown that an anisotropic operator without cross-terms can improve model accuracy and speed, and we examine the impact that the various diffusion operators have on the energy, the enstrophy, and the qualitative aspect of a simulation. This work was supported by DOE-SC-0010508. This research used resources of the National Energy Research Scientific Computing Center (NERSC).
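The idea of treating higher-order diffusion through auxiliary lower-order equations can be written out for the fourth-order case. The following is a sketch under assumptions not stated in the abstract: a hyperdiffusion term of the form $-D\nabla^4 u$ with constant $D$, and backward Euler time stepping with step $\Delta t$; the symbol $w$ for the auxiliary variable is likewise introduced here only for illustration.

```latex
% Fourth-order diffusion, backward Euler in time (assumed form):
\frac{u^{n+1} - u^{n}}{\Delta t} = -D\,\nabla^{4} u^{n+1}
\quad\Longleftrightarrow\quad
\left(I + \Delta t\, D\, \nabla^{4}\right) u^{n+1} = u^{n}.

% Introduce the auxiliary variable w = \nabla^{2} u^{n+1}.
% The single fourth-order equation becomes a coupled second-order system:
\begin{aligned}
\nabla^{2} u^{n+1} - w &= 0, \\
u^{n+1} + \Delta t\, D\, \nabla^{2} w &= u^{n}.
\end{aligned}
```

Each equation in the coupled system contains only a second-order operator, so a multigrid solver built for Laplacian-type stencils could in principle be applied to the pair; sixth-order diffusion would add one more auxiliary variable in the same way.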
On the Parallel Elliptic Single/Multigrid Solutions about Aligned and Nonaligned Bodies Using the Virtual Machine for Multiprocessors

Directory of Open Access Journals (Sweden)

A. Averbuch

1994-01-01

Parallel elliptic single/multigrid solutions around an aligned and nonaligned body are presented and implemented on two multi-user and single-user shared memory multiprocessors (Sequent Symmetry and MOS) and on a distributed memory multiprocessor (a Transputer network). Our parallel implementation uses the Virtual Machine for Multi-Processors (VMMP), a software package that provides a coherent set of services for explicitly parallel application programs running on diverse multiple instruction multiple data (MIMD) multiprocessors, both shared memory and message passing. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. Furthermore, it ensures high portability of application programs by implementing the same services on all target multiprocessors. The performance of our algorithm is investigated in detail. It is seen to fit the above architectures well when the number of processors is less than the maximal number of grid points along the axes. In general, the efficiency in the nonaligned case is higher than in the aligned case. Alignment overhead is observed to be up to 200% in the shared-memory case and up to 65% in the message-passing case. We have demonstrated that when using VMMP, the portability of the algorithms is straightforward and efficient.
Multigrid and defect correction for the steady Navier-Stokes equations : application to aerodynamics

NARCIS (Netherlands)

Koren, B.

1991-01-01

Theoretical and experimental convergence results are presented for nonlinear multigrid and iterative defect correction applied to finite volume discretizations of the full, steady, 2D, compressible Navier-Stokes equations. Iterative defect correction is introduced for circumventing the difficulty in
Wing-Body Aeroelasticity Using Finite-Difference Fluid/Finite-Element Structural Equations on Parallel Computers

Science.gov (United States)

Byun, Chansup; Guruswamy, Guru P.; Kutler, Paul (Technical Monitor)

1994-01-01

In recent years significant advances have been made for parallel computers in both hardware and software. Now parallel computers have become viable tools in computational mechanics. Many application codes developed on conventional computers have been modified to benefit from parallel computers. Significant speedups in some areas have been achieved by parallel computations. For single-discipline use of both fluid dynamics and structural dynamics, computations have been made on wing-body configurations using parallel computers. However, only a limited amount of work has been completed in combining these two disciplines for multidisciplinary applications. The prime reason is the increased level of complication associated with a multidisciplinary approach. In this work, procedures to compute aeroelasticity on parallel computers using direct coupling of fluid and structural equations will be investigated for wing-body configurations. The parallel computer selected for computations is an Intel iPSC/860 computer which is a distributed-memory, multiple-instruction, multiple data (MIMD) computer with 128 processors. In this study, the computational efficiency issues of parallel integration of both fluid and structural equations will be investigated in detail. The fluid and structural domains will be modeled using finite-difference and finite-element approaches, respectively. Results from the parallel computer will be compared with those from the conventional computers using a single processor. This study will provide an efficient computational tool for the aeroelastic analysis of wing-body structures on MIMD type parallel computers.
Parallel computing solution of Boltzmann neutron transport equation

International Nuclear Information System (INIS)

Ansah-Narh, T.

2010-01-01

The focus of the research was on developing a parallel computing algorithm for solving for the eigenvalues of the Boltzmann Neutron Transport Equation (BNTE) in a slab geometry using a multi-grid approach. In response to the problem of slow execution of serial computing when solving large problems, such as the BNTE, the study was focused on the design of parallel computing systems, an evolution of serial computing that uses multiple processing elements simultaneously to solve complex physical and mathematical problems. The finite element method (FEM) was used for the spatial discretization scheme, while angular discretization was accomplished by expanding the angular dependence in terms of Legendre polynomials. The eigenvalues representing the multiplication factors in the BNTE were determined by the power method. MATLAB Compiler Version 4.1 (R2009a) was used to compile the MATLAB codes of the BNTE. The implemented parallel algorithms were enabled with matlabpool, a Parallel Computing Toolbox function. The option UseParallel was set to 'always' (its default value is 'never'). When those conditions held, the solvers computed estimated gradients in parallel. The parallel computing system was used to handle all the bottlenecks in the matrix generated from the finite element scheme and each domain of the power method generated. The parallel algorithm was implemented on a Symmetric Multi Processor (SMP) cluster machine, which had Intel 32 bit quad-core x86 processors. Convergence rates and timings for the algorithm on the SMP cluster machine were obtained. Numerical experiments indicated the designed parallel algorithm could reach perfect speedup and had good stability and scalability. (au)
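The power method mentioned above can be sketched in a few lines. This is a generic serial illustration in Python, not the MATLAB implementation described in the abstract: the 2 x 2 test matrix, the starting vector, the tolerance, and the function name `power_method` are all assumptions made for the example.

```python
def power_method(A, tol=1e-12, max_iter=1000):
    """Estimate the dominant eigenvalue of a square matrix A (list of lists)
    by repeated multiplication and normalization."""
    n = len(A)
    x = [1.0] + [0.0] * (n - 1)  # assumed starting vector
    lam = 0.0
    for _ in range(max_iter):
        # Multiply and normalize by the largest entry in magnitude.
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        scale = max(abs(v) for v in y)
        y = [v / scale for v in y]
        # Rayleigh-quotient estimate of the eigenvalue for the new iterate.
        Ay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]
        lam_new = sum(a * b for a, b in zip(y, Ay)) / sum(b * b for b in y)
        if abs(lam_new - lam) < tol:
            return lam_new, y
        lam, x = lam_new, y
    return lam, x

# Test matrix with eigenvalues 5 and 2; the dominant one is 5.
lam, vec = power_method([[4.0, 1.0], [2.0, 3.0]])
print(lam, vec)
```

The convergence rate is governed by the ratio of the two largest eigenvalue magnitudes (0.4 in this toy case). The matrix-vector product inside the loop is the step that a parallel implementation would typically distribute.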
Multigrid Methods for EHL Problems

Science.gov (United States)

Nurgat, Elyas; Berzins, Martin

1996-01-01

In many bearings and contacts, forces are transmitted through thin continuous fluid films which separate two contacting elements. Objects in contact are normally subjected to friction and wear which can be reduced effectively by using lubricants. If the lubricant film is sufficiently thin to prevent the opposing solids from coming into contact and carries the entire load, then we have hydrodynamic lubrication, where the lubricant film is determined by the motion and geometry of the solids. However, for loaded contacts of low geometrical conformity, such as gears, rolling contact bearings and cams, this is not the case due to high pressures and this is referred to as Elasto-Hydrodynamic Lubrication (EHL). In EHL, elastic deformation of the contacting elements and the increase in fluid viscosity with pressure are very significant and cannot be ignored. Since the deformation results in changing the geometry of the lubricating film, which in turn determines the pressure distribution, an EHL mathematical model must simultaneously satisfy the complex elasticity (integral) and the Reynolds lubrication (differential) equations. The nonlinear and coupled nature of the two equations makes numerical calculations computationally intensive. This is especially true for highly loaded problems found in practice. One novel feature of these problems is that the solution may exhibit sharp pressure spikes in the outlet region. To date, both finite element and finite difference methods have been used to solve EHL problems with perhaps greater emphasis on the use of the finite difference approach. In both cases, a major computational difficulty is ensuring convergence of the nonlinear equations solver to a steady state solution. Two successful methods for achieving this are direct iteration and multigrid methods. Direct iteration methods (e.g., Gauss-Seidel) have long been used in conjunction with finite difference discretizations on regular meshes. Perhaps one of the best examples of
Fast multigrid-based computation of the induced electric field for transcranial magnetic stimulation

Science.gov (United States)

Laakso, Ilkka; Hirata, Akimasa

2012-12-01

In transcranial magnetic stimulation (TMS), the distribution of the induced electric field, and the affected brain areas, depends on the position of the stimulation coil and the individual geometry of the head and brain. The distribution of the induced electric field in realistic anatomies can be modelled using computational methods. However, existing computational methods for accurately determining the induced electric field in realistic anatomical models have suffered from long computation times, typically in the range of tens of minutes or longer. This paper presents a matrix-free implementation of the finite-element method with a geometric multigrid method that can potentially reduce the computation time to several seconds or less even when using an ordinary computer. The performance of the method is studied by computing the induced electric field in two anatomically realistic models. An idealized two-loop coil is used as the stimulating coil. Multiple computational grid resolutions ranging from 2 to 0.25 mm are used. The results show that, for macroscopic modelling of the electric field in an anatomically realistic model, computational grid resolutions of 1 mm or 2 mm appear to provide good numerical accuracy compared to higher resolutions. The multigrid iteration typically converges in less than ten iterations independent of the grid resolution. Even without parallelization, each iteration takes about 1.0 s or 0.1 s for the 1 and 2 mm resolutions, respectively. This suggests that calculating the electric field with sufficient accuracy in real time is feasible.
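The observation that the multigrid iteration count is essentially independent of grid resolution can be reproduced on a much simpler model problem. The sketch below is not the paper's matrix-free finite-element solver: it assumes a 1D Poisson problem with homogeneous Dirichlet boundaries, a finite-difference discretization, weighted Jacobi smoothing, full-weighting restriction and linear interpolation, and all function names are hypothetical.

```python
def residual(u, f, h):
    """r = f - A u for the 1D operator A = tridiag(-1, 2, -1) / h^2."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r

def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; the diagonal of A is 2 / h^2."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * 0.5 * h * h * ri for ui, ri in zip(u, r)]
    return u

def restrict(r):
    """Full weighting from 2m + 1 fine points to m coarse points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(m)]

def prolong(e, n):
    """Linear interpolation from m coarse points to n = 2m + 1 fine points."""
    u = [0.0] * n
    for j, ej in enumerate(e):
        u[2 * j] += 0.5 * ej
        u[2 * j + 1] += ej
        u[2 * j + 2] += 0.5 * ej
    return u

def v_cycle(u, f, h):
    """One V(2,2) cycle; requires len(u) = 2^k - 1."""
    n = len(u)
    if n == 1:
        return [0.5 * h * h * f[0]]  # exact solve on the coarsest grid
    u = smooth(u, f, h, 2)
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)
    u = [ui + ei for ui, ei in zip(u, prolong(ec, n))]
    return smooth(u, f, h, 2)

def cycles_to_converge(n, tol=1e-8, max_cycles=50):
    """Number of V-cycles needed to reduce the residual norm by tol."""
    h = 1.0 / (n + 1)
    f = [1.0] * n
    u = [0.0] * n
    norm = lambda v: sum(x * x for x in v) ** 0.5
    r0 = norm(residual(u, f, h))
    for k in range(1, max_cycles + 1):
        u = v_cycle(u, f, h)
        if norm(residual(u, f, h)) <= tol * r0:
            return k
    return max_cycles

for n in (31, 63, 127):
    print(n, cycles_to_converge(n))
```

Running the loop for successively finer grids should show a cycle count that stays roughly constant as the number of unknowns grows, in line with the resolution-independent convergence reported in the abstract.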
Nonlinear Multigrid solver exploiting AMGe Coarse Spaces with Approximation Properties

DEFF Research Database (Denmark)

Christensen, Max la Cour; Villa, Umberto; Engsig-Karup, Allan Peter

The paper introduces a nonlinear multigrid solver for mixed finite element discretizations based on the Full Approximation Scheme (FAS) and element-based Algebraic Multigrid (AMGe). The main motivation to use FAS for unstructured problems is the guaranteed approximation property of the AMGe coarse...... properties of the coarse spaces. With coarse spaces with approximation properties, our FAS approach on unstructured meshes has the ability to be as powerful/successful as FAS on geometrically refined meshes. For comparison, Newton’s method and Picard iterations with an inner state-of-the-art linear solver...... are compared to FAS on a nonlinear saddle point problem with applications to porous media flow. It is demonstrated that FAS is faster than Newton’s method and Picard iterations for the experiments considered here. Due to the guaranteed approximation properties of our AMGe, the coarse spaces are very accurate...
Parallel iterative procedures for approximate solutions of wave propagation by finite element and finite difference methods

Energy Technology Data Exchange (ETDEWEB)

Kim, S. [Purdue Univ., West Lafayette, IN (United States)

1994-12-31

Parallel iterative procedures based on domain decomposition techniques are defined and analyzed for the numerical solution of wave propagation by finite element and finite difference methods. For finite element methods, in a Lagrangian framework, an efficient way for choosing the algorithm parameter as well as the algorithm convergence are indicated. Some heuristic arguments for finding the algorithm parameter for finite difference schemes are addressed. Numerical results are presented to indicate the effectiveness of the methods.
Second order finite-difference ghost-point multigrid methods for elliptic problems with discontinuous coefficients on an arbitrary interface

Science.gov (United States)

Coco, Armando; Russo, Giovanni

2018-05-01

In this paper we propose a second-order accurate numerical method to solve elliptic problems with discontinuous coefficients (with general non-homogeneous jumps in the solution and its gradient) in 2D and 3D. The method consists of a finite-difference method on a Cartesian grid in which complex geometries (boundaries and interfaces) are embedded, and is second order accurate in the solution and the gradient itself. In order to avoid the drop in accuracy caused by the discontinuity of the coefficients across the interface, two numerical values are assigned on grid points that are close to the interface: a real value, that represents the numerical solution on that grid point, and a ghost value, that represents the numerical solution extrapolated from the other side of the interface, obtained by enforcing the assigned non-homogeneous jump conditions on the solution and its flux. The method is also extended to the case of a matrix coefficient. The linear system arising from the discretization is solved by an efficient multigrid approach. Unlike the 1D case, grid points are not necessarily aligned with the normal derivative and therefore suitable stencils must be chosen to discretize interface conditions in order to achieve second order accuracy in the solution and its gradient. A proper treatment of the interface conditions will allow the multigrid to attain the optimal convergence factor, comparable with the one obtained by Local Fourier Analysis for rectangular domains. The method is robust enough to handle large jumps in the coefficients: order of accuracy, monotonicity of the errors and good convergence factor are maintained by the scheme.
Multigrid solution of diffusion equations on distributed memory multiprocessor systems

International Nuclear Information System (INIS)

Finnemann, H.

1988-01-01

The subject is the solution of partial differential equations for simulation of the reactor core on high-performance computers. The parallelization and implementation of nodal multigrid diffusion algorithms on array and ring configurations of the DIRMU multiprocessor system is outlined. The particular iteration scheme employed in the nodal expansion method appears similarly efficient in serial and parallel environments. The combination of modern multi-level techniques with innovative hardware (vector-multiprocessor systems) provides powerful tools needed for real time simulation of physical systems. The parallel efficiencies range from 70 to 90%. The same performance is estimated for large problems on large multiprocessor systems being designed at present. (orig.)
Summary Report: Multigrid for Systems of Elliptic PDEs

Energy Technology Data Exchange (ETDEWEB)

Lee, Barry [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2016-11-17

We are interested in determining if multigrid can be effectively applied to the system. The conclusion that I seem to be drawn to is that it is impossible to develop a blackbox multigrid solver for these general systems. Analysis of the system of PDEs must be conducted first to determine pre-processing procedures on the continuous problem before applying a multigrid method. Determining this pre-processing is currently not incorporated in black-box multigrid strategies. Nevertheless, we characterize some system features that will make the system more amenable to multigrid approaches, techniques that may lead to more amenable systems, and multigrid procedures that are generally more appropriate for these systems.
Node-based finite element method for large-scale adaptive fluid analysis in parallel environments

International Nuclear Information System (INIS)

Toshimitsu, Fujisawa; Genki, Yagawa

2003-01-01

In this paper, a FEM-based (finite element method) mesh free method with a probabilistic node generation technique is presented. In the proposed method, all computational procedures, from the mesh generation to the solution of a system of equations, can be performed fluently in parallel in terms of nodes. Local finite element mesh is generated robustly around each node, even for harsh boundary shapes such as cracks. The algorithm and the data structure of finite element calculation are based on nodes, and parallel computing is realized by dividing a system of equations by the row of the global coefficient matrix. In addition, the node-based finite element method is accompanied by a probabilistic node generation technique, which generates good-natured points for nodes of finite element mesh. Furthermore, the probabilistic node generation technique can be performed in parallel environments. As a numerical example of the proposed method, we perform a compressible flow simulation containing strong shocks. Numerical simulations with frequent mesh refinement, which are required for such kind of analysis, can effectively be performed on parallel processors by using the proposed method. (authors)
Node-based finite element method for large-scale adaptive fluid analysis in parallel environments

Energy Technology Data Exchange (ETDEWEB)

Toshimitsu, Fujisawa [Tokyo Univ., Collaborative Research Center of Frontier Simulation Software for Industrial Science, Institute of Industrial Science (Japan); Genki, Yagawa [Tokyo Univ., Department of Quantum Engineering and Systems Science (Japan)

2003-07-01

In this paper, a FEM-based (finite element method) mesh free method with a probabilistic node generation technique is presented. In the proposed method, all computational procedures, from the mesh generation to the solution of a system of equations, can be performed fluently in parallel in terms of nodes. Local finite element mesh is generated robustly around each node, even for harsh boundary shapes such as cracks. The algorithm and the data structure of finite element calculation are based on nodes, and parallel computing is realized by dividing a system of equations by the row of the global coefficient matrix. In addition, the node-based finite element method is accompanied by a probabilistic node generation technique, which generates good-natured points for nodes of finite element mesh. Furthermore, the probabilistic node generation technique can be performed in parallel environments. As a numerical example of the proposed method, we perform a compressible flow simulation containing strong shocks. Numerical simulations with frequent mesh refinement, which are required for such kind of analysis, can effectively be performed on parallel processors by using the proposed method. (authors)
Parallel eigenanalysis of finite element models in a completely connected architecture

Science.gov (United States)

Akl, F. A.; Morel, M. R.

1989-01-01

A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, $K\phi = M\phi\omega$, where $K$ and $M$ are of order $N$, and $\omega$ is of order $q$. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effects of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in a parallel environment is analyzed.

Element-topology-independent preconditioners for parallel finite element computations

Science.gov (United States)

Park, K. C.; Alexander, Scott

1992-01-01

A family of preconditioners for the solution of finite element equations is presented; the preconditioners are element-topology independent and thus applicable to element order-free parallel computations. A key feature of the present preconditioners is the repeated use of element connectivity matrices and their left and right inverses. The properties and performance of the present preconditioners are demonstrated via beam and two-dimensional finite element matrices for implicit time integration computations.
The Closest Point Method and Multigrid Solvers for Elliptic Equations on Surfaces

KAUST Repository

Chen, Yujia

2015-01-01

© 2015 Society for Industrial and Applied Mathematics. Elliptic partial differential equations are important from both application and analysis points of view. In this paper we apply the closest point method to solve elliptic equations on general curved surfaces. Based on the closest point representation of the underlying surface, we formulate an embedding equation for the surface elliptic problem, then discretize it using standard finite differences and interpolation schemes on banded but uniform Cartesian grids. We prove the convergence of the difference scheme for Poisson's equation on a smooth closed curve. In order to solve the resulting large sparse linear systems, we propose a specific geometric multigrid method in the setting of the closest point method. Convergence studies in both the accuracy of the difference scheme and the speed of the multigrid algorithm show that our approaches are effective.
Vectorization and parallelization of the finite strip method for dynamic Mindlin plate problems

Science.gov (United States)

Chen, Hsin-Chu; He, Ai-Fang

1993-01-01

The finite strip method is a semi-analytical finite element process which allows for a discrete analysis of certain types of physical problems by discretizing the domain of the problem into finite strips. This method decomposes a single large problem into m smaller independent subproblems when m harmonic functions are employed, thus yielding natural parallelism at a very high level. In this paper we address vectorization and parallelization strategies for the dynamic analysis of simply-supported Mindlin plate bending problems and show how to prevent potential conflicts in memory access during the assemblage process. The vector and parallel implementations of this method and the performance results of a test problem under scalar, vector, and vector-concurrent execution modes on the Alliant FX/80 are also presented.
A finite element solution method for quadrics parallel computer

International Nuclear Information System (INIS)

Zucchini, A.

1996-08-01

A distributed preconditioned conjugate gradient method for finite element analysis has been developed and implemented on a parallel SIMD Quadrics computer. The main characteristic of the method is that it does not require any actual assembling of all element equations in a global system. The physical domain of the problem is partitioned into cells of n_p finite elements and each cell element is assigned to a different node of an n_p-processor machine. Element stiffness matrices are stored in the data memory of the assigned processing node and the solution process is completely executed in parallel at element level. Inter-element and therefore inter-processor communications are required once per iteration to perform local sums of vector quantities between neighbouring elements. A prototype implementation has been tested on an 8-node Quadrics machine in a simple 2D benchmark problem.
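The assembly-free idea in this record can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the record: a 1D chain of 2-node elements with unit stiffness, a Jacobi (diagonal) preconditioner, and the function names `make_elements`, `ebe_matvec`, `ebe_diagonal` and `pcg`. The matrix-vector product is accumulated element by element, so no global matrix is ever formed; on a parallel machine the scatter-add in `ebe_matvec` would correspond to the once-per-iteration neighbour communication.

```python
# Element-by-element (matrix-free) preconditioned CG on a 1D bar.
# Illustrative assumptions: 2-node elements, unit stiffness, Jacobi preconditioner.

def make_elements(n_elem, k=1.0):
    """Connectivity and 2x2 element stiffness matrices for a chain of elements."""
    conn = [(e, e + 1) for e in range(n_elem)]
    ke = [[[k, -k], [-k, k]] for _ in range(n_elem)]
    return conn, ke

def ebe_matvec(conn, ke, x, fixed):
    """y = K x accumulated element by element; K is never assembled."""
    y = [0.0] * len(x)
    for (a, b), m in zip(conn, ke):
        xa = 0.0 if a in fixed else x[a]
        xb = 0.0 if b in fixed else x[b]
        y[a] += m[0][0] * xa + m[0][1] * xb   # scatter-add into shared nodes
        y[b] += m[1][0] * xa + m[1][1] * xb
    for i in fixed:
        y[i] = x[i]                           # identity row for constrained nodes
    return y

def ebe_diagonal(conn, ke, n, fixed):
    """Diagonal of K, also accumulated element by element."""
    d = [0.0] * n
    for (a, b), m in zip(conn, ke):
        d[a] += m[0][0]
        d[b] += m[1][1]
    for i in fixed:
        d[i] = 1.0
    return d

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def pcg(conn, ke, f, fixed, tol=1e-12, max_iter=200):
    """Jacobi-preconditioned conjugate gradients using only element data."""
    n = len(f)
    d = ebe_diagonal(conn, ke, n, fixed)
    x = [0.0] * n
    r = [0.0 if i in fixed else f[i] for i in range(n)]
    z = [ri / di for ri, di in zip(r, d)]
    p = z[:]
    rz = dot(r, z)
    for it in range(max_iter):
        if rz ** 0.5 < tol:
            return x, it
        q = ebe_matvec(conn, ke, p, fixed)
        alpha = rz / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        z = [ri / di for ri, di in zip(r, d)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

conn, ke = make_elements(8)
load = [0.0] * 9
load[8] = 1.0                                 # unit load at the free end
u, iterations = pcg(conn, ke, load, fixed={0})
```

For this toy bar the displacement should grow linearly along the chain, with `u[i]` close to `i`.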
Subroutine MLTGRD: a multigrid algorithm based on multiplicative correction and implicit non-stationary iteration

International Nuclear Information System (INIS)

Barry, J.M.; Pollard, J.P.

1986-11-01

A FORTRAN subroutine MLTGRD is provided to solve efficiently the large systems of linear equations arising from a five-point finite difference discretisation of some elliptic partial differential equations. MLTGRD is a multigrid algorithm which provides multiplicative correction to iterative solution estimates from successively reduced systems of linear equations. It uses the method of implicit non-stationary iteration for all grid levels.
Abstract Level Parallelization of Finite Difference Methods

Directory of Open Access Journals (Sweden)

Edwin Vollebregt

1997-01-01

A formalism is proposed for describing finite difference calculations in an abstract way. The formalism consists of index sets and stencils, for characterizing the structure of sets of data items and interactions between data items (“neighbouring relations”). The formalism provides a means for lifting programming to a more abstract level. This simplifies the tasks of performance analysis and verification of correctness, and opens the way for automatic code generation. The notation is particularly useful in parallelization, for the systematic construction of parallel programs in a process/channel programming paradigm (e.g., message passing). This is important because message passing, unfortunately, is still the only approach that leads to acceptable performance for many of the more unstructured or irregular problems on parallel computers that have non-uniform memory access times. It will be shown that the use of index sets and stencils greatly simplifies the determination of which data must be exchanged between different computing processes.
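As a rough illustration of how index sets and stencils determine the data to be exchanged, the sketch below partitions a 1D index set among processes and derives, for each process, the indices it must receive from each neighbour. It is not the formalism of the paper: the contiguous block partition, the three-point stencil and the names `partition` and `receive_sets` are assumptions chosen for the example.

```python
def partition(n, parts):
    """Split the index set {0, ..., n-1} into contiguous blocks, one per process."""
    base, extra = divmod(n, parts)
    blocks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def receive_sets(blocks, stencil):
    """For each process, map neighbour rank -> sorted indices it must receive."""
    n = sum(len(b) for b in blocks)
    owner = {i: p for p, b in enumerate(blocks) for i in b}
    needs = []
    for p, block in enumerate(blocks):
        recv = {}
        for i in block:
            for s in stencil:              # stencil = offsets read when updating i
                j = i + s
                if 0 <= j < n and owner[j] != p:
                    recv.setdefault(owner[j], set()).add(j)
        needs.append({q: sorted(idx) for q, idx in recv.items()})
    return needs

blocks = partition(10, 3)                  # blocks 0..3, 4..6, 7..9
halo = receive_sets(blocks, stencil=(-1, 0, 1))
```

With ten indices, three processes and a three-point stencil, each process only needs the single index adjacent to each of its block boundaries; widening the stencil widens the halo accordingly.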
Parallel discontinuous Galerkin FEM for computing hyperbolic conservation law on unstructured grids

Science.gov (United States)

Ma, Xinrong; Duan, Zhijian

2018-04-01

The high-order discontinuous Galerkin finite element method (DGFEM) is known as a good method for solving the Euler and Navier-Stokes equations on unstructured grids, but it consumes substantial computational resources. An efficient parallel algorithm is presented for solving the compressible Euler equations. A multigrid strategy based on a three-stage, third-order TVD Runge-Kutta scheme is used to improve the computational efficiency of DGFEM and accelerate the convergence of the solution of the unsteady compressible Euler equations. The domain decomposition method is employed to balance the load across processors. Numerical experiments were performed for inviscid transonic flow around the NACA0012 airfoil and the M6 wing. The results indicate that the parallel algorithm significantly improves speedup and efficiency, making it suitable for computing complex flows.
Discrete Fourier analysis of multigrid algorithms

NARCIS (Netherlands)

van der Vegt, Jacobus J.W.; Rhebergen, Sander

2011-01-01

The main topic of this report is a detailed discussion of the discrete Fourier multilevel analysis of multigrid algorithms. First, a brief overview of multigrid methods is given for discretizations of both linear and nonlinear partial differential equations. Special attention is given to the
Extending the applicability of multigrid methods

International Nuclear Information System (INIS)

Brannick, J; Brezina, M; Falgout, R; Manteuffel, T; McCormick, S; Ruge, J; Sheehan, B; Xu, J; Zikatanov, L

2006-01-01

Multigrid methods are ideal for solving the increasingly large-scale problems that arise in numerical simulations of physical phenomena because of their potential for computational costs and memory requirements that scale linearly with the degrees of freedom. Unfortunately, they have been historically limited by their applicability to elliptic-type problems and the need for special handling in their implementation. In this paper, we present an overview of several recent theoretical and algorithmic advances made by the TOPS multigrid partners and their collaborators in extending the applicability of multigrid methods. Specific examples that are presented include quantum chromodynamics, radiation transport, and electromagnetics.
Distance-two interpolation for parallel algebraic multigrid

International Nuclear Information System (INIS)

Sterck, H de; Falgout, R D; Nolting, J W; Yang, U M

2007-01-01

In this paper we study the use of long distance interpolation methods with the low complexity coarsening algorithm PMIS. AMG performance and scalability are compared for classical as well as long distance interpolation methods on parallel computers. It is shown that the increased interpolation accuracy largely restores the scalability of AMG convergence factors for PMIS-coarsened grids, and in combination with complexity reducing methods, such as interpolation truncation, one obtains a class of parallel AMG methods that enjoy excellent scalability properties on large parallel computers.
Implementation of a high performance parallel finite element micromagnetics package

International Nuclear Information System (INIS)

Scholz, W.; Suess, D.; Dittrich, R.; Schrefl, T.; Tsiantos, V.; Forster, H.; Fidler, J.

2004-01-01

A new high performance scalable parallel finite element micromagnetics package has been implemented. It includes solvers for static energy minimization, time integration of the Landau-Lifshitz-Gilbert equation, and the nudged elastic band method.
Self-correcting Multigrid Solver

International Nuclear Information System (INIS)

Lewandowski, Jerome L.V.

2004-01-01

A new multigrid algorithm based on the method of self-correction for the solution of elliptic problems is described. The method exploits information contained in the residual to dynamically modify the source term (right-hand side) of the elliptic problem. It is shown that the self-correcting solver is more efficient at damping the short wavelength modes of the algebraic error than its standard equivalent. When used in conjunction with a multigrid method, the resulting solver displays an improved convergence rate with no additional computational work.
Parallel finite elements with domain decomposition and its pre-processing

International Nuclear Information System (INIS)

Yoshida, A.; Yagawa, G.; Hamada, S.

1993-01-01

This paper describes a parallel finite element analysis using a domain decomposition method, and the pre-processing for the parallel calculation. Computer simulations are about to replace experiments in various fields, and the scale of the models to be simulated tends to be extremely large. At the same time, the computational environment has changed drastically in recent years. In particular, parallel processing on massively parallel computers or computer networks is considered to be a promising technique. To achieve high efficiency in such a parallel computation environment, a large granularity of tasks and a well-balanced workload distribution are key issues. It is also important to reduce the cost of pre-processing in such parallel FEM. From this point of view, the authors developed the domain decomposition FEM with an automatic and dynamic task-allocation mechanism and an automatic mesh generation/domain subdivision system for it. (author)
Spectral analysis and multigrid preconditioners for two-dimensional space-fractional diffusion equations

Science.gov (United States)

Moghaderi, Hamid; Dehghan, Mehdi; Donatelli, Marco; Mazza, Mariarosa

2017-12-01

Fractional diffusion equations (FDEs) are a mathematical tool used for describing some special diffusion phenomena arising in many different applications like porous media and computational finance. In this paper, we focus on a two-dimensional space-FDE problem discretized by means of a second order finite difference scheme obtained as combination of the Crank-Nicolson scheme and the so-called weighted and shifted Grünwald formula. By fully exploiting the Toeplitz-like structure of the resulting linear system, we provide a detailed spectral analysis of the coefficient matrix at each time step, both in the case of constant and variable diffusion coefficients. Such a spectral analysis has a very crucial role, since it can be used for designing fast and robust iterative solvers. In particular, we employ the obtained spectral information to define a Galerkin multigrid method based on the classical linear interpolation as grid transfer operator and damped-Jacobi as smoother, and to prove the linear convergence rate of the corresponding two-grid method. The theoretical analysis suggests that the proposed grid transfer operator is strong enough for working also with the V-cycle method and the geometric multigrid. On this basis, we introduce two computationally favourable variants of the proposed multigrid method and we use them as preconditioners for Krylov methods. Several numerical results confirm that the resulting preconditioning strategies still keep a linear convergence rate.
Improving matrix-vector product performance and multi-level preconditioning for the parallel PCG package

Energy Technology Data Exchange (ETDEWEB)

McLay, R.T.; Carey, G.F.

1996-12-31

In this study we consider parallel solution of sparse linear systems arising from discretized PDEs. As part of our continuing work on our parallel PCG Solver package, we have made improvements in two areas. The first is improving the performance of the matrix-vector product. Here on regular finite-difference grids, we are able to use the cache memory more efficiently for smaller domains or where there are multiple degrees of freedom. The second problem of interest in the present work is the construction of preconditioners in the context of the parallel PCG solver we are developing. Here the problem is partitioned over a set of processor subdomains and the matrix-vector product for PCG is carried out in parallel for overlapping grid subblocks. For problems of scaled speedup, the actual rate of convergence of the unpreconditioned system deteriorates as the mesh is refined. Multigrid and subdomain strategies provide a logical approach to resolving the problem. We consider the parallel trade-offs between communication and computation and provide a complexity analysis of a representative algorithm. Some preliminary calculations using the parallel package and comparisons with other preconditioners are provided together with parallel performance results.
Parallelized implicit propagators for the finite-difference Schrödinger equation

Science.gov (United States)

Parker, Jonathan; Taylor, K. T.

1995-08-01

We describe the application of block Gauss-Seidel and block Jacobi iterative methods to the design of implicit propagators for finite-difference models of the time-dependent Schrödinger equation. The block-wise iterative methods discussed here are mixed direct-iterative methods for solving simultaneous equations, in the sense that direct methods (e.g. LU decomposition) are used to invert certain block sub-matrices, and iterative methods are used to complete the solution. We describe parallel variants of the basic algorithm that are well suited to the medium- to coarse-grained parallelism of work-station clusters, and MIMD supercomputers, and we show that under a wide range of conditions, fine-grained parallelism of the computation can be achieved. Numerical tests are conducted on a typical one-electron atom Hamiltonian. The methods converge robustly to machine precision (15 significant figures), in some cases in as few as 6 or 7 iterations. The rate of convergence is nearly independent of the finite-difference grid-point separations.
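The mixed direct-iterative structure described in this record can be sketched as follows. The example is an assumption-laden illustration only: a small real, diagonally dominant tridiagonal matrix stands in for the implicit propagator matrix, plain Gaussian elimination stands in for the LU decomposition of the diagonal blocks, and the names `solve_dense` and `block_jacobi` are hypothetical.

```python
def solve_dense(a, b):
    """Direct solve of a small dense system (Gaussian elimination with pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def block_jacobi(a, b, block_size, tol=1e-12, max_iter=100):
    """Block Jacobi: diagonal blocks solved directly, block coupling iterated."""
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        x_new = x[:]
        for s in range(0, n, block_size):
            idx = range(s, min(s + block_size, n))
            # Coupling to other blocks uses the previous iterate only,
            # so every block solve in this loop is independent of the others.
            rhs = [b[i] - sum(a[i][j] * x[j] for j in range(n) if j not in idx)
                   for i in idx]
            sub = [[a[i][j] for j in idx] for i in idx]
            x_new[s:s + len(idx)] = solve_dense(sub, rhs)
        change = max(abs(u - v) for u, v in zip(x_new, x))
        x = x_new
        if change < tol:
            return x, it
    return x, max_iter

n = 8
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
x_true = [float(i + 1) for i in range(n)]
rhs = [sum(A[i][j] * x_true[j] for j in range(n)) for i in range(n)]
x, sweeps = block_jacobi(A, rhs, block_size=4)
```

Because each block solve reads only the previous iterate, the blocks can be distributed one per processor; a block Gauss-Seidel variant would instead reuse the freshly updated blocks within the same sweep.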
Multigrid methods for partial differential equations - a short introduction

International Nuclear Information System (INIS)

Linden, J.; Stueben, K.

1993-01-01

These notes summarize multigrid methods, with emphasis on the algorithmic concepts of multigrid for solving linear and non-linear partial differential equations. The basic structure of multigrid methods is briefly described, and a detailed introduction is given with applications to VLSI process simulation. (A.B.)
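The basic structure of a multigrid method (smooth, restrict the residual, correct from the coarse grid, smooth again) can be sketched for a linear model problem. The choices below are assumptions for illustration and do not come from the notes: the 1D Poisson problem -u'' = f with zero Dirichlet boundaries, weighted Jacobi smoothing with omega = 2/3, full-weighting restriction, linear interpolation, and an exact solve on the one-point coarsest grid.

```python
def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps for -u'' = f with zero Dirichlet boundaries."""
    n = len(u)
    for _ in range(sweeps):
        old = u[:]
        for i in range(n):
            left = old[i - 1] if i > 0 else 0.0
            right = old[i + 1] if i < n - 1 else 0.0
            u[i] = (1 - omega) * old[i] + omega * 0.5 * (left + right + h * h * f[i])
    return u

def residual(u, f, h):
    """r = f - A u for the three-point discretization of -u''."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2 * u[i] - left - right) / (h * h))
    return r

def restrict(r):
    """Full weighting from 2m+1 fine points to m coarse points."""
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range((len(r) - 1) // 2)]

def prolong(e, n_fine):
    """Linear interpolation from the coarse grid to the fine grid."""
    fine = [0.0] * n_fine
    for j, v in enumerate(e):
        fine[2 * j + 1] += v
        fine[2 * j] += 0.5 * v
        fine[2 * j + 2] += 0.5 * v
    return fine

def v_cycle(u, f, h, pre=2, post=2):
    """One V-cycle of the linear correction scheme."""
    n = len(u)
    if n == 1:
        return [0.5 * h * h * f[0]]            # exact solve on the coarsest grid
    u = smooth(u, f, h, pre)
    coarse_rhs = restrict(residual(u, f, h))
    e = v_cycle([0.0] * len(coarse_rhs), coarse_rhs, 2 * h, pre, post)
    u = [ui + ci for ui, ci in zip(u, prolong(e, n))]
    return smooth(u, f, h, post)

n = 31                                          # 2**5 - 1 interior points
h = 1.0 / (n + 1)
f = [1.0] * n
u = [0.0] * n
for _ in range(20):
    u = v_cycle(u, f, h)
exact = [0.5 * x * (1 - x) for x in ((i + 1) * h for i in range(n))]
max_error = max(abs(p - q) for p, q in zip(u, exact))
```

For f = 1 the discrete solution coincides with u(x) = x(1 - x)/2, so `max_error` measures the remaining algebraic error directly. Non-linear problems would need a full approximation scheme (FAS) rather than this correction scheme.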
Algébrico: Parte II - Algoritmo Paralelo

Directory of Open Access Journals (Sweden)

Fabio Henrique Pereira

2007-01-01

In this work, a new parallel wavelet-based algorithm for the Algebraic Multigrid Method (PWAMG) is presented. A variation of the standard parallel implementation of discrete wavelet transforms is used in the construction of a hierarchy of matrices and of intergrid transfer operators for Algebraic Multigrid. The PWAMG method has been tested as a parallel solver for the two-dimensional Poisson equation, for different numbers of finite difference mesh nodes, and comparisons are made with the sequential version of this method.
A Critical Study of Agglomerated Multigrid Methods for Diffusion

Science.gov (United States)

Nishikawa, Hiroaki; Diskin, Boris; Thomas, James L.

2011-01-01

Agglomerated multigrid techniques used in unstructured-grid methods are studied critically for a model problem representative of laminar diffusion in the incompressible limit. The studied target-grid discretizations and discretizations used on agglomerated grids are typical of current node-centered formulations. Agglomerated multigrid convergence rates are presented using a range of two- and three-dimensional randomly perturbed unstructured grids for simple geometries with isotropic and stretched grids. Two agglomeration techniques are used within an overall topology-preserving agglomeration framework. The results show that multigrid with an inconsistent coarse-grid scheme using only the edge terms (also referred to in the literature as a thin-layer formulation) provides considerable speedup over single-grid methods but its convergence deteriorates on finer grids. Multigrid with a Galerkin coarse-grid discretization using piecewise-constant prolongation and a heuristic correction factor is slower and also grid-dependent. In contrast, grid-independent convergence rates are demonstrated for multigrid with consistent coarse-grid discretizations. Convergence rates of multigrid cycles are verified with quantitative analysis methods in which parts of the two-grid cycle are replaced by their idealized counterparts.
Parallel Object-Oriented Computation Applied to a Finite Element Problem

Directory of Open Access Journals (Sweden)

Jon B. Weissman

1993-01-01

The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application, Mentat, and the implementation, and present performance results for both a Mentat and a hand-coded parallel Fortran version.

Fast multigrid solution of the advection problem with closed characteristics

Energy Technology Data Exchange (ETDEWEB)

Yavneh, I. [Israel Inst. of Technology, Haifa (Israel); Venner, C.H. [Univ. of Twente, Enschede (Netherlands); Brandt, A. [Weizmann Inst. of Science, Rehovot (Israel)

1996-12-31

The numerical solution of the advection-diffusion problem in the inviscid limit with closed characteristics is studied as a prelude to an efficient high Reynolds-number flow solver. It is demonstrated by a heuristic analysis and numerical calculations that using upstream discretization with downstream relaxation-ordering and appropriate residual weighting in a simple multigrid V cycle produces an efficient solution process. We also derive upstream finite-difference approximations to the advection operator, whose truncation terms approximate "physical" (Laplacian) viscosity, thus avoiding spurious solutions to the homogeneous problem when the artificial diffusivity dominates the physical viscosity.
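Why the relaxation ordering matters for an upstream discretization can be seen in a deliberately simplified setting. The sketch below is not the closed-characteristic problem of the paper: it assumes the 1D equation a u' = f with a > 0, inflow value u(0) = 0, a first-order upwind difference, and Gauss-Seidel relaxation; `gs_sweep` and `sweeps_to_converge` are hypothetical names.

```python
def gs_sweep(u, f, a, h, order):
    """One Gauss-Seidel sweep for the upwind scheme a * (u[i] - u[i-1]) / h = f[i]."""
    for i in order:
        u[i] = u[i - 1] + h * f[i] / a         # u[0] holds the inflow boundary value
    return u

def sweeps_to_converge(n, order_fn, tol=1e-12, max_sweeps=1000):
    """Count the sweeps needed to reach the exact solution u(x) = x."""
    h = 1.0 / n
    a = 1.0
    f = [1.0] * (n + 1)
    exact = [i * h for i in range(n + 1)]
    u = [0.0] * (n + 1)
    for k in range(1, max_sweeps + 1):
        gs_sweep(u, f, a, h, order_fn(n))
        if max(abs(p - q) for p, q in zip(u, exact)) < tol:
            return k
    return max_sweeps

# Downstream ordering visits points in the flow direction, upstream against it.
downstream = sweeps_to_converge(16, lambda n: range(1, n + 1))
upstream = sweeps_to_converge(16, lambda n: range(n, 0, -1))
```

In this open-characteristic toy problem, downstream ordering reproduces the solution in a single sweep, while upstream ordering advances the correct values by only one grid point per sweep. With closed characteristics no single sweep direction is downstream everywhere, which is why the record combines the ordering with residual weighting inside a multigrid cycle.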
Scalable multi-grid preconditioning techniques for the even-parity S_N solver in UNIC

International Nuclear Information System (INIS)

Mahadevan, Vijay S.; Smith, Michael A.

2011-01-01

The even-parity neutron transport equation with FE-S_N discretization is traditionally solved using an SOR-preconditioned CG method at the lowest level of iterations in order to compute the criticality in reactor analysis problems. The use of high order isoparametric finite elements prohibits the formation of the discrete operator explicitly due to memory constraints in petascale architectures. Hence, an h-p multi-grid preconditioner based on linear tessellation of the higher order mesh is introduced here for the space-angle system and compared against SOR and Algebraic MG black-box solvers. The performance and scalability of the multi-grid scheme were determined for two test problems and found to be competitive in terms of both computational time and memory requirements. The implementation of this preconditioner in an even-parity solver like UNIC from ANL can further enable high fidelity calculations in a scalable manner on petaflop machines. (author)
Progress with multigrid schemes for hypersonic flow problems

International Nuclear Information System (INIS)

Radespiel, R.; Swanson, R.C.

1995-01-01

Several multigrid schemes are considered for the numerical computation of viscous hypersonic flows. For each scheme, the basic solution algorithm employs upwind spatial discretization with explicit multistage time stepping. Two-level versions of the various multigrid algorithms are applied to the two-dimensional advection equation, and Fourier analysis is used to determine their damping properties. The capabilities of the multigrid methods are assessed by solving three different hypersonic flow problems. Some new multigrid schemes based on semicoarsening strategies are shown to be quite effective in relieving the stiffness caused by the high-aspect-ratio cells required to resolve high Reynolds number flows. These schemes exhibit good convergence rates for Reynolds numbers up to 200 × 10^6 and Mach numbers up to 25. 32 refs., 31 figs., 1 tab.
Eigensolution of finite element problems in a completely connected parallel architecture

Science.gov (United States)

Akl, Fred A.; Morel, Michael R.

1989-01-01

A parallel algorithm for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi)=(M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q is presented. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
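The record uses speed-up and efficiency as its measures of effectiveness. Assuming the usual definition of parallel efficiency as speed-up divided by the number of processors (the record does not spell the definition out), the reported speed-ups translate as follows.

```python
# Reported speed-ups for the 64-element plate, keyed by processor count.
speedups = {2: 1.86, 4: 3.13, 6: 3.18, 8: 3.61}

# Parallel efficiency under the assumed definition E(p) = S(p) / p.
efficiency = {p: s / p for p, s in speedups.items()}

for p in sorted(efficiency):
    print(f"{p} processors: speed-up {speedups[p]:.2f}, efficiency {efficiency[p]:.2f}")
```

Under that definition the efficiency falls from about 0.93 on two processors to about 0.78, 0.53 and 0.45 on four, six and eight, so most of the gain for this small problem is already obtained with four processors.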
Multi-Grid detector for neutron spectroscopy: results obtained on time-of-flight spectrometer CNCS

Science.gov (United States)

Anastasopoulos, M.; Bebb, R.; Berry, K.; Birch, J.; Bryś, T.; Buffet, J.-C.; Clergeau, J.-F.; Deen, P. P.; Ehlers, G.; van Esch, P.; Everett, S. M.; Guerard, B.; Hall-Wilton, R.; Herwig, K.; Hultman, L.; Höglund, C.; Iruretagoiena, I.; Issa, F.; Jensen, J.; Khaplanov, A.; Kirstein, O.; Lopez Higuera, I.; Piscitelli, F.; Robinson, L.; Schmidt, S.; Stefanescu, I.

2017-04-01

The Multi-Grid detector technology has evolved from the proof-of-principle and characterisation stages. Here we report on the performance of the Multi-Grid detector, the MG.CNCS prototype, which has been installed and tested at the Cold Neutron Chopper Spectrometer, CNCS at SNS. This has allowed a side-by-side comparison to the performance of 3He detectors on an operational instrument. The demonstrator has an active area of 0.2 m^2. It is specifically tailored to the specifications of CNCS. The detector was installed in June 2016 and has operated since then, collecting neutron scattering data in parallel to the 3He detectors of CNCS. In this paper, we present a comprehensive analysis of this data, in particular on instrument energy resolution, rate capability, background and relative efficiency. Stability, gamma-ray and fast neutron sensitivity have also been investigated. The effect of scattering in the detector components has been measured and provides input to comparison for Monte Carlo simulations. All data is presented in comparison to that measured by the 3He detectors simultaneously, showing that all features recorded by one detector are also recorded by the other. The energy resolution matches closely. We find that the Multi-Grid is able to match the data collected by 3He, and see an indication of a considerable advantage in the count rate capability. Based on these results, we are confident that the Multi-Grid detector will be capable of producing high quality scientific data on chopper spectrometers utilising the unprecedented neutron flux of the ESS.
Multigrid Methods for Fully Implicit Oil Reservoir Simulation

Science.gov (United States)

Molenaar, J.

1996-01-01

In this paper we consider the simultaneous flow of oil and water in reservoir rock. This displacement process is modeled by two basic equations: the material balance or continuity equations and the equation of motion (Darcy's law). For the numerical solution of this system of nonlinear partial differential equations there are two approaches: the fully implicit or simultaneous solution method and the sequential solution method. In the sequential solution method the system of partial differential equations is manipulated to give an elliptic pressure equation and a hyperbolic (or parabolic) saturation equation. In the IMPES approach the pressure equation is first solved, using values for the saturation from the previous time level. Next the saturations are updated by some explicit time stepping method; this implies that the method is only conditionally stable. For the numerical solution of the linear, elliptic pressure equation multigrid methods have become an accepted technique. On the other hand, the fully implicit method is unconditionally stable, but it has the disadvantage that in every time step a large system of nonlinear algebraic equations has to be solved. The most time-consuming part of any fully implicit reservoir simulator is the solution of this large system of equations. Usually this is done by Newton's method. The resulting systems of linear equations are then solved either by a direct method or by some conjugate gradient type method. In this paper we consider the possibility of applying multigrid methods for the iterative solution of the systems of nonlinear equations. There are two ways of using multigrid for this job: either we use a nonlinear multigrid method or we use a linear multigrid method to deal with the linear systems that arise in Newton's method. So far only a few authors have reported on the use of multigrid methods for fully implicit simulations. A two-level FAS algorithm is presented for the black-oil equations, and linear multigrid for
Scalable smoothing strategies for a geometric multigrid method for the immersed boundary equations

Energy Technology Data Exchange (ETDEWEB)

Bhalla, Amneet Pal Singh [Univ. of North Carolina, Chapel Hill, NC (United States); Knepley, Matthew G. [Rice Univ., Houston, TX (United States); Adams, Mark F. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Guy, Robert D. [Univ. of California, Davis, CA (United States); Griffith, Boyce E. [Univ. of North Carolina, Chapel Hill, NC (United States)

2016-12-20

The immersed boundary (IB) method is a widely used approach to simulating fluid-structure interaction (FSI). Although explicit versions of the IB method can suffer from severe time step size restrictions, these methods remain popular because of their simplicity and generality. In prior work (Guy et al., Adv Comput Math, 2015), some of us developed a geometric multigrid preconditioner for a stable semi-implicit IB method under Stokes flow conditions; however, this solver methodology used a Vanka-type smoother that presented limited opportunities for parallelization. This work extends this Stokes-IB solver methodology by developing smoothing techniques that are suitable for parallel implementation. Specifically, we demonstrate that an additive version of the Vanka smoother can yield an effective multigrid preconditioner for the Stokes-IB equations, and we introduce an efficient Schur complement-based smoother that is also shown to be effective for the Stokes-IB equations. We investigate the performance of these solvers for a broad range of material stiffnesses, both for Stokes flows and flows at nonzero Reynolds numbers, and for thick and thin structural models. We show here that linear solver performance degrades with increasing Reynolds number and material stiffness, especially for thin interface cases. Nonetheless, the proposed approaches promise to yield effective solution algorithms, especially at lower Reynolds numbers and at modest-to-high elastic stiffnesses.
Parallel Solver for H(div) Problems Using Hybridization and AMG

Energy Technology Data Exchange (ETDEWEB)

Lee, Chak S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Vassilevski, Panayot S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2016-01-15

In this paper, a scalable parallel solver is proposed for H(div) problems discretized by arbitrary order finite elements on general unstructured meshes. The solver is based on hybridization and algebraic multigrid (AMG). Unlike some previously studied H(div) solvers, the hybridization solver does not require discrete curl and gradient operators as additional input from the user. Instead, only some element information is needed in the construction of the solver. The hybridization results in an H1-equivalent symmetric positive definite system, which is then rescaled and solved by AMG solvers designed for H1 problems. Weak and strong scaling of the method are examined through several numerical tests. Our numerical results show that the proposed solver provides a promising alternative to ADS, a state-of-the-art solver [12], for H(div) problems. In fact, it outperforms ADS for higher order elements.
New multigrid solver advances in TOPS

International Nuclear Information System (INIS)

Falgout, R D; Brannick, J; Brezina, M; Manteuffel, T; McCormick, S

2005-01-01

In this paper, we highlight new multigrid solver advances in the Terascale Optimal PDE Simulations (TOPS) project in the Scientific Discovery Through Advanced Computing (SciDAC) program. We discuss two new algebraic multigrid (AMG) developments in TOPS: the adaptive smoothed aggregation method (αSA) and a coarse-grid selection algorithm based on compatible relaxation (CR). The αSA method is showing promising results in initial studies for Quantum Chromodynamics (QCD) applications. The CR method has the potential to greatly improve the applicability of AMG.
Multigrid for Staggered Lattice Fermions

Energy Technology Data Exchange (ETDEWEB)

Brower, Richard C. [Boston U.; Clark, M. A. [Unlisted, US; Strelchenko, Alexei [Fermilab; Weinberg, Evan [Boston U.

2018-01-23

Critical slowing down in Krylov methods for the Dirac operator presents a major obstacle to further advances in lattice field theory as it approaches the continuum solution. Here we formulate a multi-grid algorithm for the Kogut-Susskind (or staggered) fermion discretization which has proven difficult relative to Wilson multigrid due to its first-order anti-Hermitian structure. The solution is to introduce a novel spectral transformation by the Kähler-Dirac spin structure prior to the Galerkin projection. We present numerical results for the two-dimensional, two-flavor Schwinger model; however, the general formalism is agnostic to dimension and is directly applicable to four-dimensional lattice QCD.
Matrix-dependent multigrid-homogenization for diffusion problems

Energy Technology Data Exchange (ETDEWEB)

Knapek, S. [Institut fuer Informatik tu Muenchen (Germany)]

1996-12-31

We present a method to approximately determine the effective diffusion coefficient on the coarse scale level of problems with strongly varying or discontinuous diffusion coefficients. It is based on techniques used also in multigrid, like Dendy's matrix-dependent prolongations and the construction of coarse grid operators by means of the Galerkin approximation. In numerical experiments, we compare our multigrid-homogenization method with homogenization, renormalization and averaging approaches.
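The Galerkin construction of a coarse grid operator mentioned above can be illustrated with a minimal sketch. It is not the method of the paper: it assumes a constant-coefficient 1D Poisson problem and plain linear interpolation in place of Dendy's matrix-dependent prolongation, and all function names are hypothetical. Under those assumptions the Galerkin product R A P reproduces the standard stencil on the coarse grid.

```python
def matmul(X, Y):
    """Dense matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def poisson_1d(n, h):
    """Three-point stencil (-1, 2, -1) / h^2 on n interior grid points."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0 / h**2
        if i > 0:
            A[i][i - 1] = -1.0 / h**2
        if i < n - 1:
            A[i][i + 1] = -1.0 / h**2
    return A


def linear_prolongation(n_coarse):
    """Linear interpolation from n_coarse to 2 * n_coarse + 1 interior points."""
    n_fine = 2 * n_coarse + 1
    P = [[0.0] * n_coarse for _ in range(n_fine)]
    for j in range(n_coarse):
        P[2 * j][j] = 0.5       # left fine neighbour of coarse point j
        P[2 * j + 1][j] = 1.0   # fine point that coincides with coarse point j
        P[2 * j + 2][j] = 0.5   # right fine neighbour
    return P


h = 1.0 / 8.0                                    # assumed fine mesh width
A_fine = poisson_1d(7, h)
P = linear_prolongation(3)
R = [[0.5 * p for p in col] for col in zip(*P)]  # full weighting: R = P^T / 2
A_coarse = matmul(R, matmul(A_fine, P))          # Galerkin product R A P
```

For strongly varying coefficients the interpolation weights would depend on the matrix entries rather than being fixed at 1/2, which is the point of matrix-dependent prolongation.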
Layout optimization with algebraic multigrid methods

Science.gov (United States)

Regler, Hans; Ruede, Ulrich

1993-01-01

Finding the optimal position for the individual cells (also called functional modules) on the chip surface is an important and difficult step in the design of integrated circuits. This paper deals with the problem of relative placement, that is the minimization of a quadratic functional with a large, sparse, positive definite system matrix. The basic optimization problem must be augmented by constraints to inhibit solutions where cells overlap. Besides classical iterative methods, based on conjugate gradients (CG), we show that algebraic multigrid methods (AMG) provide an interesting alternative. For moderately sized examples with about 10000 cells, AMG is already competitive with CG and is expected to be superior for larger problems. Besides the classical 'multiplicative' AMG algorithm where the levels are visited sequentially, we propose an 'additive' variant of AMG where levels may be treated in parallel and that is suitable as a preconditioner in the CG algorithm.
Three-dimensional parallel edge-based finite element modeling of electromagnetic data with field redatuming

DEFF Research Database (Denmark)

Cai, Hongzhu; Čuma, Martin; Zhdanov, Michael

2015-01-01

This paper presents a parallelized version of the edge-based finite element method with a novel post-processing approach for numerical modeling of an electromagnetic field in complex media. The method uses an unstructured tetrahedral mesh which can reduce the number of degrees of freedom significantly. The linear system of finite element equations is solved using parallel direct solvers which are robust for ill-conditioned systems and efficient for multiple source electromagnetic (EM) modeling. We also introduce a novel approach to compute the scalar components of the electric field from the tangential components along each edge based on field redatuming. The method can produce a more accurate result compared to the conventional approach. We have applied the developed algorithm to compute the EM response for a typical 3D anisotropic geoelectrical model of the off-shore HC reservoir with complex...
Highly indefinite multigrid for eigenvalue problems

Energy Technology Data Exchange (ETDEWEB)

Borges, L.; Oliveira, S.

1996-12-31

Eigenvalue problems are extremely important in understanding dynamic processes such as vibrations and control systems. Large scale eigenvalue problems can be very difficult to solve, especially if a large number of eigenvalues and the corresponding eigenvectors need to be computed. For solving this problem a multigrid preconditioned algorithm is presented in "The Davidson Algorithm, preconditioning and misconvergence". Another approach for solving eigenvalue problems is by developing efficient solutions for highly indefinite problems. In this paper we concentrate on the use of new highly indefinite multigrid algorithms for the eigenvalue problem.
A parallel finite element method for the analysis of crystalline solids

DEFF Research Database (Denmark)

Sørensen, N.J.; Andersen, B.S.

1996-01-01

A parallel finite element method suitable for the analysis of 3D quasi-static crystal plasticity problems has been developed. The method is based on substructuring of the original mesh into a number of substructures which are treated as isolated finite element models related via the interface conditions. The resulting interface equations are solved using a direct solution method. The method shows a good speedup when increasing the number of processors from 1 to 8 and the effective solution of 3D crystal plasticity problems whose size is much too large for a single work station becomes possible.
A parallel adaptive finite difference algorithm for petroleum reservoir simulation

Energy Technology Data Exchange (ETDEWEB)

Hoang, Hai Minh

2005-07-01

Adaptive finite difference methods for problems arising in the simulation of flow in porous media are considered. Such methods have proven useful for overcoming limitations of computational resources and improving the resolution of the numerical solutions to a wide range of problems. Local refinement of the computational mesh where it is needed to improve the accuracy of solutions yields better solution resolution and a more efficient use of computational resources than is possible with traditional fixed-grid approaches. In this thesis, we propose a parallel adaptive cell-centered finite difference (PAFD) method for black-oil reservoir simulation models. This is an extension of the adaptive mesh refinement (AMR) methodology first developed by Berger and Oliger (1984) for the hyperbolic problem. Our algorithm is fully adaptive in time and space through the use of subcycling, in which finer grids are advanced at smaller time steps than the coarser ones. When coarse and fine grids reach the same advanced time level, they are synchronized to ensure that the global solution is conservative and satisfies the divergence constraint across all levels of refinement. The material in this thesis is subdivided into three overall parts. First, we explain the methodology and intricacies of the AFD scheme. Then we extend a cell-centered finite difference discretization to a multilevel hierarchy of refined grids. Finally, we employ the algorithm on a parallel computer. The results in this work show that the approach presented is robust and stable, and they demonstrate the increased solution accuracy due to local refinement and the reduced computing resource consumption. (Author)
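The subcycling idea described above (finer grids take smaller time steps and are synchronized when they catch up with the coarser level) can be sketched as a recursion over refinement levels. This is only an illustration of the time-stepping order, assuming a fixed refinement ratio of 2 in time; the function name, the log structure, and the ratio are hypothetical and not taken from the thesis, and no flow equations are solved.

```python
def advance(level, t, dt, finest_level, refinement_ratio, log):
    """Advance `level` from t to t + dt, then subcycle the next finer level.

    Each step taken is recorded in `log` as (level, start_time, end_time).
    """
    log.append((level, t, t + dt))          # a real code would update the grid here
    if level < finest_level:
        fine_dt = dt / refinement_ratio
        for k in range(refinement_ratio):
            advance(level + 1, t + k * fine_dt, fine_dt,
                    finest_level, refinement_ratio, log)
        # Both levels have now reached t + dt. A real code would synchronize
        # them at this point to keep the composite solution conservative.


log = []
advance(level=0, t=0.0, dt=1.0, finest_level=2, refinement_ratio=2, log=log)
steps_per_level = [sum(1 for entry in log if entry[0] == lev) for lev in range(3)]
```

With three levels and ratio 2, one coarse step triggers two steps on the middle level and four on the finest, so the work per coarse step grows geometrically with the number of levels.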
Multigrid and multilevel domain decomposition for unstructured grids

Energy Technology Data Exchange (ETDEWEB)

Chan, T.; Smith, B.

1994-12-31

Multigrid has proven itself to be a very versatile method for the iterative solution of linear and nonlinear systems of equations arising from the discretization of PDEs. In some applications, however, no natural multilevel structure of grids is available, and these must be generated as part of the solution procedure. In this presentation the authors will consider the problem of generating a multigrid algorithm when only a fine, unstructured grid is given. Their techniques generate a sequence of coarser grids by first forming an approximate maximal independent set of the vertices and then applying a Cavendish type algorithm to form the coarser triangulation. Numerical tests indicate that convergence using this approach can be as fast as standard multigrid on a structured mesh, at least in two dimensions.
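The first coarsening step described above, selecting a maximal independent set of vertices, can be sketched with a simple greedy pass over the mesh graph. This is a generic greedy algorithm, not necessarily the authors' variant, and it does not cover the subsequent retriangulation of the selected vertices. The adjacency dictionary and the example graph are hypothetical.

```python
def maximal_independent_set(adjacency):
    """Greedy maximal independent set of an undirected graph.

    `adjacency` maps each vertex to the set of its neighbours.
    """
    selected = set()
    excluded = set()
    # Visit vertices in sorted order so the result is deterministic.
    for v in sorted(adjacency):
        if v in excluded:
            continue
        selected.add(v)                 # v becomes a coarse-grid vertex
        excluded.add(v)
        excluded.update(adjacency[v])   # its neighbours can no longer be chosen
    return selected


# Hypothetical fine grid: the path graph 0-1-2-3-4-5-6.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 6} for i in range(7)}
coarse_vertices = maximal_independent_set(path)
```

On the path graph this selects every second vertex, which matches standard coarsening of a 1D structured grid.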
Solution of finite element problems using hybrid parallelization with MPI and OpenMP

Directory of Open Access Journals (Sweden)

José Miguel Vargas-Félix

2012-11-01

The Finite Element Method (FEM) is used to solve problems like solid deformation and heat diffusion in domains with complex geometries. Such geometries require discretization with millions of elements; this is equivalent to solving systems of equations with sparse matrices and tens or hundreds of millions of variables. The aim is to use computer clusters to solve these systems. The solution method used is Schur substructuring. With it, a large system of equations can be divided into many small ones that are solved more efficiently. This method allows parallelization. MPI (Message Passing Interface) is used to distribute the systems of equations so that each one is solved on a computer of a cluster. Each system of equations is solved using a solver implemented to use OpenMP as a local parallelization method.
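The substructuring idea can be sketched in serial form: the interior unknowns of each subdomain are eliminated independently, a small Schur complement system is solved for the interface unknowns, and the interiors are then recovered by back-substitution. The sketch below is an assumption-laden illustration with dense matrices and hypothetical helper names; it uses no MPI or OpenMP, and in a distributed code each pass of the subdomain loop would run on its own process.

```python
def solve(A, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def sub(A, rows, cols):
    """Extract the block of A with the given row and column indices."""
    return [[A[i][j] for j in cols] for i in rows]


def schur_solve(A, b, interiors, interface):
    """Solve A u = b by eliminating each subdomain's interior unknowns."""
    nb = len(interface)
    S = sub(A, interface, interface)      # becomes the Schur complement
    g = [b[i] for i in interface]         # becomes the condensed right-hand side
    for idx in interiors:                 # independent work per subdomain
        A_ii, A_ib, A_bi = sub(A, idx, idx), sub(A, idx, interface), sub(A, interface, idx)
        y = solve(A_ii, [b[i] for i in idx])
        for q in range(nb):
            col = solve(A_ii, [row[q] for row in A_ib])
            for p in range(nb):
                S[p][q] -= sum(A_bi[p][k] * col[k] for k in range(len(idx)))
        for p in range(nb):
            g[p] -= sum(A_bi[p][k] * y[k] for k in range(len(idx)))
    u = [0.0] * len(b)
    for i, value in zip(interface, solve(S, g)):
        u[i] = value
    for idx in interiors:                 # recover interiors from interface values
        rhs = [b[i] - sum(A[i][j] * u[j] for j in interface) for i in idx]
        for i, value in zip(idx, solve(sub(A, idx, idx), rhs)):
            u[i] = value
    return u


# Hypothetical 1D chain of 5 unknowns split into two subdomains at unknown 2.
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
b = [1.0, 0.0, 0.0, 0.0, 1.0]
u = schur_solve(A, b, interiors=[[0, 1], [3, 4]], interface=[2])
```

The interface system here has a single unknown, while each subdomain solve involves only its own 2-by-2 block, which is what makes the subdomain work distributable.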
Monolithic multigrid method for the coupled Stokes flow and deformable porous medium system

Science.gov (United States)

Luo, P.; Rodrigo, C.; Gaspar, F. J.; Oosterlee, C. W.

2018-01-01

The interaction between fluid flow and a deformable porous medium is a complicated multi-physics problem, which can be described by a coupled model based on the Stokes and poroelastic equations. A monolithic multigrid method together with either a coupled Vanka smoother or a decoupled Uzawa smoother is employed as an efficient numerical technique for the linear discrete system obtained by finite volumes on staggered grids. A specialty in our modeling approach is that at the interface of the fluid and poroelastic medium, two unknowns from the different subsystems are defined at the same grid point. We propose a special discretization at and near the points on the interface, which combines the approximation of the governing equations and the considered interface conditions. In the decoupled Uzawa smoother, Local Fourier Analysis (LFA) helps us to select optimal values of the relaxation parameter appearing. To implement the monolithic multigrid method, grid partitioning is used to deal with the interface updates when communication is required between two subdomains. Numerical experiments show that the proposed numerical method has an excellent convergence rate. The efficiency and robustness of the method are confirmed in numerical experiments with typically small realistic values of the physical coefficients.
Multigrid methods for the computation of propagators in gauge fields

International Nuclear Information System (INIS)

Kalkreuter, T.

1992-11-01

In the present work generalizations of multigrid methods for propagators in gauge fields are investigated. We discuss proper averaging operations for bosons and for staggered fermions. An efficient algorithm for computing C numerically is presented. The averaging kernels C can be used not only in deterministic multigrid computations, but also in multigrid Monte Carlo simulations, and for the definition of block spins and blocked gauge fields in Monte Carlo renormalization group studies of gauge theories. Actual numerical computations of kernels and propagators are performed in compact four-dimensional SU(2) gauge fields. (orig./HSI)

Multigrid preconditioning of the generator two-phase mixture balance equations in the Genepi software

International Nuclear Information System (INIS)

Belliard, M.; Grandotto, M.

2003-01-01

In the framework of the two-phase fluid simulations of the steam generators of pressurized water nuclear reactors, we present in this paper a geometric version of a pseudo-Full MultiGrid (pseudo-FMG) Full Approximation Storage (FAS) preconditioning of balance equations in the GENEPI code. In our application, the 3D steady state flow is reached by a transient computation using a semi-implicit fractional step algorithm for the averaged two-phase mixture balance equations (mass, momentum and energy for the secondary flow). Our application, running on workstation clusters, is based on a CEA code-linker and the PVM package. The difficulties of applying the geometric FAS multigrid method to the momentum and mass balance equations are addressed. The use of a sequential pseudo-FMG FAS two-grid method for both energy and mass/momentum balance equations, using dynamic multigrid cycles, leads to perceptible improvements in the convergence of the computations. An original parallel red-black pseudo-FMG FAS three-grid algorithm is presented too. The numerical tests (steam generator mockup simulations) underline the sizable increase in speed of convergence of the computations, mainly for those involving a large number of degrees of freedom (about 100 thousand cells). The two-phase mixture balance equation residuals are quickly reduced: the speed-up reached lies between 2 and 3, depending on the number of grids. The effects of the numerical parameters on the convergence behavior are investigated.
Multicloud: Multigrid convergence with a meshless operator

International Nuclear Information System (INIS)

Katz, Aaron; Jameson, Antony

2009-01-01

The primary objective of this work is to develop and test a new convergence acceleration technique we call multicloud. Multicloud is well-founded in the mathematical basis of multigrid, but relies on a meshless operator on coarse levels. The meshless operator enables extremely simple and automatic coarsening procedures for arbitrary meshes using arbitrary fine level discretization schemes. The performance of multicloud is compared with established multigrid techniques for structured and unstructured meshes for the Euler equations on two-dimensional test cases. Results indicate comparable convergence rates per unit work for multicloud and multigrid. However, because of its mesh and scheme transparency, multicloud may be applied to a wide array of problems with no modification of fine level schemes as is often required with agglomeration techniques. The implication is that multicloud can be implemented in a completely modular fashion, allowing researchers to develop fine level algorithms independent of the convergence accelerator for complex three-dimensional problems.
Finite Volume Methods for Incompressible Navier-Stokes Equations on Collocated Grids with Nonconformal Interfaces

DEFF Research Database (Denmark)

Kolmogorov, Dmitry

turbine computations, collocated grid-based SIMPLE-like algorithms are developed for computations on block-structured grids with nonconformal interfaces. A technique to enhance both the convergence speed and the solution accuracy of the SIMPLE-like algorithms is presented. The erroneous behavior, which...... versions of the SIMPLE algorithm. The new technique is implemented in an existing conservative 2nd order finite-volume scheme flow solver (EllipSys), which is extended to cope with grids with nonconformal interfaces. The behavior of the discrete Navier-Stokes equations is discussed in detail...... Block LU relaxation scheme is shown to possess several optimal conditions, which enables to preserve high efficiency of the multigrid solver on both conformal and nonconformal grids. The developments are done using a parallel MPI algorithm, which can handle multiple numbers of interfaces with multiple...
Parallel algorithms for testing finite state machines: Generating UIO sequences

OpenAIRE

Hierons, RM; Turker, UC

2016-01-01

This paper describes an efficient parallel algorithm that uses many-core GPUs for automatically deriving Unique Input Output sequences (UIOs) from Finite State Machines. The proposed algorithm uses the global scope of the GPU's global memory through coalesced memory access and minimises the transfer between CPU and GPU memory. The results of experiments indicate that the proposed method yields considerably better results compared to a single core UIO construction algorithm. Our algorithm is s...
Large-Scale Parallel Finite Element Analysis of the Stress Singular Problems

International Nuclear Information System (INIS)

Noriyuki Kushida; Hiroshi Okuda; Genki Yagawa

2002-01-01

In this paper, the convergence behavior of the large-scale parallel finite element method for stress singular problems was investigated. The convergence behavior of iterative solvers depends on the efficiency of the preconditioners. However, the efficiency of preconditioners may be influenced by the domain decomposition that is necessary for parallel FEM. In this study the following results were obtained: the conjugate gradient method without preconditioning and the diagonal scaling preconditioned conjugate gradient method were not influenced by the domain decomposition, as expected. The symmetric successive over-relaxation preconditioned conjugate gradient method converged at most 6% faster if the stress singular area was contained in one sub-domain. (authors)
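A minimal sketch of the diagonal scaling (Jacobi) preconditioned conjugate gradient method referred to above may help show why it is insensitive to the domain decomposition: the preconditioner only divides each residual entry by the matching diagonal entry, so it needs no information from neighbouring subdomains. The dense serial implementation, the function name, and the test matrix are assumptions made for illustration and are not taken from the paper.

```python
def jacobi_pcg(A, b, tol=1e-10, max_iter=200):
    """Conjugate gradients with diagonal scaling (Jacobi) preconditioning.

    A is a dense symmetric positive definite matrix given as a list of rows.
    Returns the approximate solution and the number of iterations used.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for k in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, k
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Hypothetical small symmetric positive definite test system.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = jacobi_pcg(A, b)
```

An SSOR preconditioner, by contrast, sweeps through the unknowns in order, so in a parallel code its action changes with how the unknowns are split across subdomains.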
Parallel DSMC Solution of Three-Dimensional Flow Over a Finite Flat Plate

Science.gov (United States)

Nance, Robert P.; Wilmoth, Richard G.; Moon, Bongki; Hassan, H. A.; Saltz, Joel

1994-01-01

This paper describes a parallel implementation of the direct simulation Monte Carlo (DSMC) method. Runtime library support is used for scheduling and execution of communication between nodes, and domain decomposition is performed dynamically to maintain a good load balance. Performance tests are conducted using the code to evaluate various remapping and remapping-interval policies, and it is shown that a one-dimensional chain-partitioning method works best for the problems considered. The parallel code is then used to simulate the Mach 20 nitrogen flow over a finite-thickness flat plate. It is shown that the parallel algorithm produces results which compare well with experimental data. Moreover, it yields significantly faster execution times than the scalar code, as well as very good load-balance characteristics.
A Dual Super-Element Domain Decomposition Approach for Parallel Nonlinear Finite Element Analysis

Science.gov (United States)

Jokhio, G. A.; Izzuddin, B. A.

2015-05-01

This article presents a new domain decomposition method for nonlinear finite element analysis introducing the concept of dual partition super-elements. The method extends ideas from the displacement frame method and is ideally suited for parallel nonlinear static/dynamic analysis of structural systems. In the new method, domain decomposition is realized by replacing one or more subdomains in a "parent system," each with a placeholder super-element, where the subdomains are processed separately as "child partitions," each wrapped by a dual super-element along the partition boundary. The analysis of the overall system, including the satisfaction of equilibrium and compatibility at all partition boundaries, is realized through direct communication between all pairs of placeholder and dual super-elements. The proposed method has particular advantages for matrix solution methods based on the frontal scheme, and can be readily implemented for existing finite element analysis programs to achieve parallelization on distributed memory systems with minimal intervention, thus overcoming memory bottlenecks typically faced in the analysis of large-scale problems. Several examples are presented in this article which demonstrate the computational benefits of the proposed parallel domain decomposition approach and its applicability to the nonlinear structural analysis of realistic structural systems.
Simulation of incompressible flows with heat and mass transfer using parallel finite element method

Directory of Open Access Journals (Sweden)

Jalal Abedi

2003-02-01

The stabilized finite element formulations based on the SUPG (Streamline-Upwind/Petrov-Galerkin) and PSPG (Pressure-Stabilization/Petrov-Galerkin) methods are developed and applied to solve buoyancy-driven incompressible flows with heat and mass transfer. The SUPG stabilization term allows us to solve flow problems at high speeds (advection dominant flows) and the PSPG term eliminates instabilities associated with the use of equal order interpolation functions for both pressure and velocity. The finite element formulations are implemented in parallel using MPI. In parallel computations, the finite element mesh is partitioned into contiguous subdomains using METIS, which are then assigned to individual processors. To ensure a balanced load, the number of elements assigned to each processor is approximately equal. To solve nonlinear systems in large-scale applications, we developed a matrix-free GMRES iterative solver. Here we totally eliminate the need to form any matrices, even at the element levels. To measure the accuracy of the method, we solve 2D and 3D examples of natural convection flows at moderate to high Rayleigh numbers.
Development of a multi-grid FDTD code for three-dimensional simulation of large microwave sintering experiments

Energy Technology Data Exchange (ETDEWEB)

White, M.J.; Iskander, M.F. [Univ. of Utah, Salt Lake City, UT (United States). Electrical Engineering Dept.; Kimrey, H.D. [Oak Ridge National Lab., TN (United States)

1996-12-31

The Finite-Difference Time-Domain (FDTD) code available at the University of Utah has been used to simulate sintering of ceramics in single and multimode cavities, and many useful results have been reported in literature. More detailed and accurate results, specifically around and including the ceramic sample, are often desired to help evaluate the adequacy of the heating procedure. In electrically large multimode cavities, however, computer memory requirements limit the number of the mathematical cells, and the desired resolution is impractical to achieve due to limited computer resources. Therefore, an FDTD algorithm which incorporates multiple-grid regions with variable-grid sizes is required to adequately perform the desired simulations. In this paper the authors describe the development of a three-dimensional multi-grid FDTD code to help focus a large number of cells around the desired region. Test geometries were solved using a uniform-grid and the developed multi-grid code to help validate the results from the developed code. Results from these comparisons, as well as the results of comparisons between the developed FDTD code and other available variable-grid codes are presented. In addition, results from the simulation of realistic microwave sintering experiments showed improved resolution in critical sites inside the three-dimensional sintering cavity. With the validation of the FDTD code, simulations were performed for electrically large, multimode, microwave sintering cavities to fully demonstrate the advantages of the developed multi-grid FDTD code.
On several aspects and applications of the multigrid method for solving partial differential equations

Science.gov (United States)

Dinar, N.

1978-01-01

Several aspects of multigrid methods are briefly described. The main subjects include the development of very efficient multigrid algorithms for systems of elliptic equations (Cauchy-Riemann, Stokes, Navier-Stokes), as well as the development of control and prediction tools (based on local mode Fourier analysis), used to analyze, check and improve these algorithms. Preliminary research on multigrid algorithms for time dependent parabolic equations is also described. Improvements in existing multigrid processes and algorithms for elliptic equations were studied.
Final Report for 'Implimentation and Evaluation of Multigrid Linear Solvers into Extended Magnetohydrodynamic Codes for Petascale Computing'

International Nuclear Information System (INIS)

Vadlamani, Srinath; Kruger, Scott; Austin, Travis

2008-01-01

Extended magnetohydrodynamic (MHD) codes are used to model the large, slow-growing instabilities that are projected to limit the performance of International Thermonuclear Experimental Reactor (ITER). The multiscale nature of the extended MHD equations requires an implicit approach. The current linear solvers needed for the implicit algorithm scale poorly because the resultant matrices are so ill-conditioned. A new solver is needed, especially one that scales to the petascale. The most successful scalable parallel processor solvers to date are multigrid solvers. Applying multigrid techniques to a set of equations whose fundamental modes are dispersive waves is a promising solution to CEMM problems. For Phase 1, we implemented multigrid preconditioners from the HYPRE project of the Center for Applied Scientific Computing at LLNL via PETSc of the DOE SciDAC TOPS for the real matrix systems of the extended MHD code NIMROD, which is one of the primary modeling codes of the OFES-funded Center for Extended Magnetohydrodynamic Modeling (CEMM) SciDAC. We implemented the multigrid solvers on the fusion test problem that allows for real matrix systems with success, and in the process learned about the details of NIMROD data structures and the difficulties of inverting NIMROD operators. The further success of this project will allow for efficient usage of future petascale computers at the National Leadership Facilities: Oak Ridge National Laboratory, Argonne National Laboratory, and National Energy Research Scientific Computing Center. The project will be a collaborative effort between computational plasma physicists and applied mathematicians at Tech-X Corporation, applied mathematicians at Front Range Scientific Computations, Inc. (who are collaborators on the HYPRE project), and other computational plasma physicists involved with the CEMM project.
Three-dimensional forward modeling of DC resistivity using the aggregation-based algebraic multigrid method

Science.gov (United States)

Chen, Hui; Deng, Ju-Zhi; Yin, Min; Yin, Chang-Chun; Tang, Wen-Wu

2017-03-01

To speed up three-dimensional (3D) DC resistivity modeling, we present a new multigrid method, the aggregation-based algebraic multigrid method (AGMG). We first discretize the differential equation of the secondary potential field with mixed boundary conditions by using a seven-point finite-difference method to obtain a large sparse system of linear equations. Then, we introduce the theory behind the pairwise aggregation algorithms for AGMG and use the conjugate-gradient method with the V-cycle AGMG preconditioner (AGMG-CG) to solve the linear equations. We use typical geoelectrical models to test the proposed AGMG-CG method and compare the results with analytical solutions and the 3DDCXH algorithm for 3D DC modeling (3DDCXH). In addition, we apply the AGMG-CG method to different grid sizes and geoelectrical models and compare it to different iterative methods, such as ILU-BICGSTAB, ILU-GCR, and SSOR-CG. The AGMG-CG method yields nearly linearly decreasing errors, whereas the number of iterations increases slowly with increasing grid size. The AGMG-CG method is precise and converges fast, and thus can improve the computational efficiency in forward modeling of three-dimensional DC resistivity.
Modeling of frequency-domain scalar wave equation with the average-derivative optimal scheme based on a multigrid-preconditioned iterative solver

Science.gov (United States)

Cao, Jian; Chen, Jing-Bo; Dai, Meng-Xue

2018-01-01

An efficient finite-difference frequency-domain modeling of seismic wave propagation relies on the discrete schemes and appropriate solving methods. The average-derivative optimal scheme for the scalar wave modeling is advantageous in terms of the storage saving for the system of linear equations and the flexibility for arbitrary directional sampling intervals. However, using an LU-decomposition-based direct solver to solve its resulting system of linear equations is very costly for both memory and computational requirements. To address this issue, we consider establishing a multigrid-preconditioned BI-CGSTAB iterative solver fit for the average-derivative optimal scheme. The choice of preconditioning matrix and its corresponding multigrid components is made with the help of Fourier spectral analysis and local mode analysis, respectively, which is important for the convergence. Furthermore, we find that for the computation with unequal directional sampling interval, the anisotropic smoothing in the multigrid preconditioner may affect the convergence rate of this iterative solver. Successful numerical applications of this iterative solver for the homogeneous and heterogeneous models in 2D and 3D are presented where the significant reduction of computer memory and the improvement of computational efficiency are demonstrated by comparison with the direct solver. In the numerical experiments, we also show that the unequal directional sampling interval will weaken the advantage of this multigrid-preconditioned iterative solver in the computing speed or, even worse, could reduce its accuracy in some cases, which implies the need for a reasonable control of directional sampling interval in the discretization.
A scalable geometric multigrid solver for nonsymmetric elliptic systems with application to variable-density flows

Science.gov (United States)

Esmaily, M.; Jofre, L.; Mani, A.; Iaccarino, G.

2018-03-01

A geometric multigrid algorithm is introduced for solving nonsymmetric linear systems resulting from the discretization of the variable density Navier-Stokes equations on nonuniform structured rectilinear grids and high-Reynolds number flows. The restriction operation is defined such that the resulting system on the coarser grids is symmetric, thereby allowing for the use of efficient smoother algorithms. To achieve an optimal rate of convergence, the sequence of interpolation and restriction operations are determined through a dynamic procedure. A parallel partitioning strategy is introduced to minimize communication while maintaining the load balance between all processors. To test the proposed algorithm, we consider two cases: 1) homogeneous isotropic turbulence discretized on uniform grids and 2) turbulent duct flow discretized on stretched grids. Testing the algorithm on systems with up to a billion unknowns shows that the cost varies linearly with the number of unknowns. This O(N) behavior confirms the robustness of the proposed multigrid method regarding ill-conditioning of large systems characteristic of multiscale high-Reynolds number turbulent flows. The robustness of our method to density variations is established by considering cases where density varies sharply in space by a factor of up to 10^4, showing its applicability to two-phase flow problems. Strong and weak scalability studies are carried out, employing up to 30,000 processors, to examine the parallel performance of our implementation. Excellent scalability of our solver is shown for a granularity as low as 10^4 to 10^5 unknowns per processor. At its tested peak throughput, it solves approximately 4 billion unknowns per second employing over 16,000 processors with a parallel efficiency higher than 50%.
Mesh Partitioning Algorithm Based on Parallel Finite Element Analysis and Its Actualization

Directory of Open Access Journals (Sweden)

Lei Zhang

2013-01-01

In parallel computing based on finite element analysis, domain decomposition is a key preprocessing technique. Generally, a domain decomposition of a mesh can be realized through partitioning of a graph which is converted from a finite element mesh. This paper discusses the method for graph partitioning and the way to implement mesh partitioning. Relevant software is introduced, along with the data structure and key functions of Metis and ParMetis. A mesh partitioning interface program based on these key functions is written, compiled, and tested. The results indicate some objective laws and characteristics that can guide users who use graph partitioning algorithms and software to write PFEM programs, and ideal partitioning effects can be achieved by implementing mesh partitioning through the program. The interface program can also be used directly by engineering researchers as a module of PFEM software. It can thus lower the threshold for applying graph partitioning algorithms, improve calculation efficiency, and promote the application of graph theory and parallel computing.
Experiences using multigrid for geothermal simulation

Energy Technology Data Exchange (ETDEWEB)

Bullivant, D.P.; O'Sullivan, M.J. [Univ. of Auckland (New Zealand)]; Yang, Z. [Univ. of New South Wales (Australia)]

1995-03-01

Experiences of applying multigrid to the calculation of natural states for geothermal simulations are discussed. The modelling of natural states was chosen for this study because they can take a long time to compute and the computation is often dominated by the development of phase change boundaries that take up a small region in the simulation. For the first part of this work a modified version of TOUGH was used for 2-D vertical problems. A "test-bed" program is now being used to investigate some of the problems encountered with implementing multigrid. This is ongoing work. To date, there have been some encouraging but not startling results.
Ground-state projection multigrid for propagators in 4-dimensional SU(2) gauge fields

International Nuclear Information System (INIS)

Kalkreuter, T.

1991-09-01

The ground-state projection multigrid method is studied for computations of slowly decaying bosonic propagators in 4-dimensional SU(2) lattice gauge theory. The defining eigenvalue equation for the restriction operator is solved exactly. Although the critical exponent z is not reduced in nontrivial gauge fields, multigrid still yields considerable speedup compared with conventional relaxation. Multigrid is also able to outperform the conjugate gradient algorithm. (orig.)
Evaluating the performance of the particle finite element method in parallel architectures

Science.gov (United States)

Gimenez, Juan M.; Nigro, Norberto M.; Idelsohn, Sergio R.

2014-05-01

This paper presents a high performance implementation for the particle-mesh based method called particle finite element method two (PFEM-2). It consists of a material derivative based formulation of the equations with a hybrid spatial discretization which uses an Eulerian mesh and Lagrangian particles. The main aim of PFEM-2 is to solve transport equations as fast as possible while keeping some level of accuracy. The method was found to be competitive with classical Eulerian alternatives for these targets, even in their range of optimal application. To evaluate the method on large simulations, it is imperative to use parallel environments. Parallel strategies for the Finite Element Method have been widely studied and many libraries can be used to solve the Eulerian stages of PFEM-2. However, Lagrangian stages, such as streamline integration, must be developed considering the parallel strategy selected. The main drawback of PFEM-2 is the large amount of memory needed, which limits its application to large problems on a single computer. Therefore, a distributed-memory implementation is urgently needed. Unlike a shared-memory approach, domain decomposition automatically isolates the memory, thus avoiding race conditions; however, new issues appear due to data distribution over the processes. Thus, a domain decomposition strategy for both particles and mesh is adopted, which minimizes the communication between processes. Finally, performance analyses on multicore and multinode architectures are presented. The Courant-Friedrichs-Lewy number used influences the efficiency of the parallelization and, in some cases, a weighted partitioning can be used to improve the speed-up. However, the total CPU time for the cases presented is lower than that obtained when using classical Eulerian strategies.
Investigations on application of multigrid method to MHD equilibrium analysis

International Nuclear Information System (INIS)

Ikuno, Soichiro

2000-01-01

The potential of applying the multigrid method to MHD equilibrium analysis is investigated. A nonlinear eigenvalue problem often appears when MHD equilibria are determined by solving the Grad-Shafranov equation numerically. After linearization of the equation, the problem is solved by an iterative method. Although the Red-Black SOR method or the Gauss-Seidel method is often used to solve the linearized equation, it takes much CPU time to solve the problem. The multigrid method is compared with the SOR method for the Poisson problem. The results of computations show that the CPU time required for the multigrid method is about 1000 times smaller than that for the SOR method. (author)
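
For orientation, the red-black SOR baseline named in the abstract can be sketched as below for the 2D Poisson problem with a five-point stencil. This is an illustrative sketch under stated assumptions (square uniform grid, homogeneous Dirichlet boundary, hypothetical function name and defaults), not the author's code.

```python
import numpy as np


def red_black_sor(f, h, omega=1.0, sweeps=100):
    """Approximate the solution of -Laplace(u) = f with u = 0 on the boundary.

    f: 2D array of right-hand-side values on a uniform grid of spacing h
       (boundary entries are ignored).
    omega: relaxation factor; omega = 1 gives red-black Gauss-Seidel.
    """
    u = np.zeros_like(f, dtype=float)
    n, m = f.shape
    ii, jj = np.meshgrid(np.arange(n), np.arange(m), indexing="ij")
    interior = (ii > 0) & (ii < n - 1) & (jj > 0) & (jj < m - 1)
    for _ in range(sweeps):
        for colour in (0, 1):
            # Points of one colour only have neighbours of the other
            # colour, so the whole colour can be updated at once.
            mask = interior & ((ii + jj) % 2 == colour)
            gs = np.zeros_like(u)
            gs[1:-1, 1:-1] = 0.25 * (
                u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
                + h * h * f[1:-1, 1:-1]
            )
            u[mask] += omega * (gs[mask] - u[mask])
    return u
```

The number of sweeps SOR needs grows as the grid is refined, whereas a multigrid cycle reduces the error by a roughly grid-independent factor, which is the source of the large CPU-time gap the abstract reports.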
On multigrid solution of the implicit equations of hydrodynamics. Experiments for the compressible Euler equations in general coordinates

Science.gov (United States)

Kifonidis, K.; Müller, E.

2012-08-01

Aims: We describe and study a family of new multigrid iterative solvers for the multidimensional, implicitly discretized equations of hydrodynamics. Schemes of this class are free of the Courant-Friedrichs-Lewy condition. They are intended for simulations in which widely differing wave propagation timescales are present. A preferred solver in this class is identified. Applications to some simple stiff test problems that are governed by the compressible Euler equations, are presented to evaluate the convergence behavior, and the stability properties of this solver. Algorithmic areas are determined where further work is required to make the method sufficiently efficient and robust for future application to difficult astrophysical flow problems. Methods: The basic equations are formulated and discretized on non-orthogonal, structured curvilinear meshes. Roe's approximate Riemann solver and a second-order accurate reconstruction scheme are used for spatial discretization. Implicit Runge-Kutta (ESDIRK) schemes are employed for temporal discretization. The resulting discrete equations are solved with a full-coarsening, non-linear multigrid method. Smoothing is performed with multistage-implicit smoothers. These are applied here to the time-dependent equations by means of dual time stepping. Results: For steady-state problems, our results show that the efficiency of the present approach is comparable to the best implicit solvers for conservative discretizations of the compressible Euler equations that can be found in the literature. The use of red-black as opposed to symmetric Gauss-Seidel iteration in the multistage-smoother is found to have only a minor impact on multigrid convergence. This should enable scalable parallelization without having to seriously compromise the method's algorithmic efficiency. For time-dependent test problems, our results reveal that the multigrid convergence rate degrades with increasing Courant numbers (i.e. time step sizes). Beyond a

A parallel finite element procedure for contact-impact problems using edge-based smooth triangular element and GPU

Science.gov (United States)

Cai, Yong; Cui, Xiangyang; Li, Guangyao; Liu, Wenyang

2018-04-01

The edge-smooth finite element method (ES-FEM) can improve the computational accuracy of triangular shell elements and the mesh partition efficiency of complex models. In this paper, an approach is developed to perform explicit finite element simulations of contact-impact problems with a graphical processing unit (GPU) using a special edge-smooth triangular shell element based on ES-FEM. Of critical importance for this problem is achieving finer-grained parallelism to enable efficient data loading and to minimize communication between the device and host. Four kinds of parallel strategies are then developed to efficiently solve these ES-FEM based shell element formulas, and various optimization methods are adopted to ensure aligned memory access. Special focus is dedicated to developing an approach for the parallel construction of edge systems. A parallel hierarchy-territory contact-searching algorithm (HITA) and a parallel penalty function calculation method are embedded in this parallel explicit algorithm. Finally, the program flow is well designed, and a GPU-based simulation system is developed, using Nvidia's CUDA. Several numerical examples are presented to illustrate the high quality of the results obtained with the proposed methods. In addition, the GPU-based parallel computation is shown to significantly reduce the computing time.
h-multigrid agglomeration based solution strategies for discontinuous Galerkin discretizations of incompressible flow problems

Science.gov (United States)

Botti, L.; Colombo, A.; Bassi, F.

2017-10-01

In this work we exploit agglomeration based h-multigrid preconditioners to speed up the iterative solution of discontinuous Galerkin discretizations of the Stokes and Navier-Stokes equations. As a distinctive feature, h-coarsened mesh sequences are generated by recursive agglomeration of a fine grid, admitting arbitrarily unstructured grids of complex domains, and agglomeration based discontinuous Galerkin discretizations are employed to deal with agglomerated elements of coarse levels. Both the expense of building coarse grid operators and the performance of the resulting multigrid iteration are investigated. For the sake of efficiency, coarse grid operators are inherited through element-by-element L2 projections, avoiding the cost of numerical integration over agglomerated elements. Specific care is devoted to the projection of viscous terms discretized by means of the BR2 dG method. We demonstrate that enforcing the correct amount of stabilization on coarse grid levels is mandatory for achieving uniform convergence with respect to the number of levels. The numerical solution of steady and unsteady, linear and non-linear problems is considered, tackling challenging 2D test cases and 3D real life computations on parallel architectures. Significant execution time gains are documented.
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows

Science.gov (United States)

Bui, Trong T.

1999-01-01

A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
Annual Copper Mountain Conferences on Multigrid and Iterative Methods, Copper Mountain, Colorado

International Nuclear Information System (INIS)

McCormick, Stephen F.

2016-01-01

This project supported the Copper Mountain Conference on Multigrid and Iterative Methods, held from 2007 to 2015, at Copper Mountain, Colorado. The subject of the Copper Mountain Conference Series alternated between Multigrid Methods in odd-numbered years and Iterative Methods in even-numbered years. Begun in 1983, the Series represents an important forum for the exchange of ideas in these two closely related fields. This report describes the Copper Mountain Conference on Multigrid and Iterative Methods, 2007-2015. Information on the conference series is available at http://grandmaster.colorado.edu/~copper/
Annual Copper Mountain Conferences on Multigrid and Iterative Methods, Copper Mountain, Colorado

Energy Technology Data Exchange (ETDEWEB)

McCormick, Stephen F. [Front Range Scientific, Inc., Lake City, CO (United States)]

2016-03-25

This project supported the Copper Mountain Conference on Multigrid and Iterative Methods, held from 2007 to 2015, at Copper Mountain, Colorado. The subject of the Copper Mountain Conference Series alternated between Multigrid Methods in odd-numbered years and Iterative Methods in even-numbered years. Begun in 1983, the Series represents an important forum for the exchange of ideas in these two closely related fields. This report describes the Copper Mountain Conference on Multigrid and Iterative Methods, 2007-2015. Information on the conference series is available at http://grandmaster.colorado.edu/~copper/.
Simulating streamer discharges in 3D with the parallel adaptive Afivo framework

NARCIS (Netherlands)

H.J. Teunissen (Jannis); U. M. Ebert (Ute)

2017-01-01

We present an open-source plasma fluid code for 2D, cylindrical and 3D simulations of streamer discharges, based on the Afivo framework that features adaptive mesh refinement, geometric multigrid methods for Poisson's equation, and OpenMP parallelism. We describe the numerical
Neural multigrid for gauge theories and other disordered systems

International Nuclear Information System (INIS)

Baeker, M.; Kalkreuter, T.; Mack, G.; Speh, M.

1992-09-01

We present evidence that multigrid works for wave equations in disordered systems, e.g. in the presence of gauge fields, no matter how strong the disorder, but one needs to introduce a 'neural computations' point of view into large scale simulations: First, the system must learn how to do the simulations efficiently, then do the simulation (fast). The method can also be used to provide smooth interpolation kernels which are needed in multigrid Monte Carlo updates. (orig.)
Mathematics and computational methods development in U.S. department of energy-sponsored research (nuclear energy research initiative and nuclear engineering education research). 5. Analysis of Angular V-Cycle Multigrid Formulation for Three-Dimensional Discrete Ordinates Shielding Problems

International Nuclear Information System (INIS)

Kucukboyaci, Vefa; Haghighat, Alireza

2001-01-01

We have developed new angular multigrid formulations, including the Simplified Angular Multigrid (SAM), Nested Iteration (NI), and V-Cycle schemes, that are compatible with the parallel environment and the adaptive differencing strategy of the PENTRAN three-dimensional parallel S_N code. Using the Fourier analysis method for an infinite, homogeneous medium, we have investigated the effectiveness of the V-Cycle scheme for different problem parameters including scattering ratio, spatial differencing weights, quadrature order, and mesh size. We have further investigated the effectiveness of the new schemes for practical shielding applications such as the Kobayashi benchmark problem and the boiling water reactor core shroud problem. In this paper, we summarize the angular V-Cycle scheme implemented in the PENTRAN code, the Fourier analysis of the V-Cycle scheme, and results of convergence analysis of the V-Cycle scheme using different problem parameters. The theoretical analysis reveals that the V-Cycle scheme is effective for a large range of scattering ratios and is insensitive to mesh size. Besides the theoretical analysis, we have applied the new angular multigrid schemes to shielding problems. In comparison to the standard PCR formulation, combinations of the new angular multigrid schemes and PCR (e.g., SAM+V-Cycle+PCR) have proved to be very effective for scattering ratios in a range of 0.6 to 0.9. (authors)
Uniform convergence of multigrid V-cycle iterations for indefinite and nonsymmetric problems

Science.gov (United States)

Bramble, James H.; Kwak, Do Y.; Pasciak, Joseph E.

1993-01-01

In this paper, we present an analysis of a multigrid method for nonsymmetric and/or indefinite elliptic problems. In this multigrid method various types of smoothers may be used. One type of smoother which we consider is defined in terms of an associated symmetric problem and includes point and line, Jacobi, and Gauss-Seidel iterations. We also study smoothers based entirely on the original operator. One is based on the normal form, that is, the product of the operator and its transpose. Other smoothers studied include point and line, Jacobi, and Gauss-Seidel. We show that the uniform estimates for symmetric positive definite problems carry over to these algorithms. More precisely, the multigrid iteration for the nonsymmetric and/or indefinite problem is shown to converge at a uniform rate provided that the coarsest grid in the multilevel iteration is sufficiently fine (but not depending on the number of multigrid levels).
Large parallel volumes of finite and compact sets in d-dimensional Euclidean space

DEFF Research Database (Denmark)

Kampf, Jürgen; Kiderlen, Markus

The r-parallel volume V(C_r) of a compact subset C in d-dimensional Euclidean space is the volume of the set C_r of all points of Euclidean distance at most r > 0 from C. According to Steiner’s formula, V(C_r) is a polynomial in r when C is convex. For finite sets C satisfying a certain geometric...
Multigrid for high dimensional elliptic partial differential equations on non-equidistant grids

NARCIS (Netherlands)

bin Zubair, H.; Oosterlee, C.E.; Wienands, R.

2006-01-01

This work presents techniques, theory and numbers for multigrid in a general d-dimensional setting. The main focus is the multigrid convergence for high-dimensional partial differential equations (PDEs). As a model problem we have chosen the anisotropic diffusion equation, on a unit hypercube. We
Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

Science.gov (United States)

Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

2017-12-01

This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
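
The V, W, and F cycle types that the abstract lists as tuning options differ only in how often the coarser levels are revisited. The sketch below generates the sequence of grid levels visited by one cycle; the level numbering (0 is the finest grid) and the function name are assumptions made for illustration, and the code is not taken from the paper.

```python
def cycle_levels(level, coarsest, kind="V"):
    """Return the grid levels visited by one multigrid cycle.

    level: current level (0 is the finest grid).
    coarsest: index of the coarsest level.
    kind: "V" (one coarse-level visit), "W" (two recursive W visits),
          or "F" (an F visit followed by a V visit).
    """
    if level == coarsest:
        return [level]
    sub_cycles = {"V": ["V"], "W": ["W", "W"], "F": ["F", "V"]}[kind]
    path = [level]
    for sub in sub_cycles:
        path += cycle_levels(level + 1, coarsest, sub)
    path.append(level)
    # Merge consecutive visits to the same level into one.
    merged = [path[0]]
    for lev in path[1:]:
        if lev != merged[-1]:
            merged.append(lev)
    return merged
```

On four levels, the V-cycle reaches the coarsest grid once, the F-cycle three times, and the W-cycle four times, which is why W- and F-cycles cost more per cycle but can be more robust.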
Algorithms and data structures for massively parallel generic adaptive finite element codes

KAUST Repository

Bangerth, Wolfgang

2011-12-01

Today's largest supercomputers have 100,000s of processor cores and offer the potential to solve partial differential equations discretized by billions of unknowns. However, the complexity of scaling to such large machines and problem sizes has so far prevented the emergence of generic software libraries that support such computations, although these would lower the threshold of entry and enable many more applications to benefit from large-scale computing. We are concerned with providing this functionality for mesh-adaptive finite element computations. We assume the existence of an "oracle" that implements the generation and modification of an adaptive mesh distributed across many processors, and that responds to queries about its structure. Based on querying the oracle, we develop scalable algorithms and data structures for generic finite element methods. Specifically, we consider the parallel distribution of mesh data, global enumeration of degrees of freedom, constraints, and postprocessing. Our algorithms remove the bottlenecks that typically limit large-scale adaptive finite element analyses. We demonstrate scalability of complete finite element workflows on up to 16,384 processors. An implementation of the proposed algorithms, based on the open source software p4est as mesh oracle, is provided under an open source license through the widely used deal.II finite element software library.
Multigrid Methods for the Computation of Propagators in Gauge Fields

Science.gov (United States)

Kalkreuter, Thomas

Multigrid methods were invented for the solution of discretized partial differential equations in order to overcome the slowness of traditional algorithms by updates on various length scales. In the present work generalizations of multigrid methods for propagators in gauge fields are investigated. Gauge fields are incorporated in algorithms in a covariant way. The kernel C of the restriction operator which averages from one grid to the next coarser grid is defined by projection on the ground-state of a local Hamiltonian. The idea behind this definition is that the appropriate notion of smoothness depends on the dynamics. The ground-state projection choice of C can be used in arbitrary dimension and for arbitrary gauge group. We discuss proper averaging operations for bosons and for staggered fermions. The kernels C can also be used in multigrid Monte Carlo simulations, and for the definition of block spins and blocked gauge fields in Monte Carlo renormalization group studies. Actual numerical computations are performed in four-dimensional SU(2) gauge fields. We prove that our proposals for block spins are “good”, using renormalization group arguments. A central result is that the multigrid method works in arbitrarily disordered gauge fields, in principle. It is proved that computations of propagators in gauge fields without critical slowing down are possible when one uses an ideal interpolation kernel. Unfortunately, the idealized algorithm is not practical, but it was important to answer questions of principle. Practical methods are able to outperform the conjugate gradient algorithm in case of bosons. The case of staggered fermions is harder. Multigrid methods give considerable speed-ups compared to conventional relaxation algorithms, but on lattices up to 18^4 conjugate gradient is superior.
HP-multigrid as smoother algorithm for higher order discontinuous Galerkin discretizations of advection dominated flows. Part I. Multilevel Analysis

NARCIS (Netherlands)

van der Vegt, Jacobus J.W.; Rhebergen, Sander

2011-01-01

The hp-Multigrid as Smoother algorithm (hp-MGS) for the solution of higher order accurate space-(time) discontinuous Galerkin discretizations of advection dominated flows is presented. This algorithm combines p-multigrid with h-multigrid at all p-levels, where the h-multigrid acts as smoother in the
DL_MG: A Parallel Multigrid Poisson and Poisson-Boltzmann Solver for Electronic Structure Calculations in Vacuum and Solution.

Science.gov (United States)

Womack, James C; Anton, Lucian; Dziedzic, Jacek; Hasnip, Phil J; Probert, Matt I J; Skylaris, Chris-Kriton

2018-03-13

The solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential, a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the Poisson equation, featuring nonhomogeneous dielectric permittivities, ionic concentrations with nonlinear dependencies, and diverse boundary conditions. The analytic solutions generally used to solve the Poisson equation in vacuum (or with homogeneous permittivity) are not applicable in these circumstances, and numerical methods must be used. In this work, we present DL_MG, a flexible, scalable, and accurate solver library, developed specifically to tackle the challenges of solving the Poisson equation in modern large-scale electronic structure calculations on parallel computers. Our solver is based on the multigrid approach and uses an iterative high-order defect correction method to improve the accuracy of solutions. Using two chemically relevant model systems, we tested the accuracy and computational performance of DL_MG when solving the generalized Poisson and Poisson-Boltzmann equations, demonstrating excellent agreement with analytic solutions and efficient scaling to ∼10^9 unknowns and 100s of CPU cores. We also applied DL_MG in actual large-scale electronic structure calculations, using the ONETEP linear-scaling electronic structure package to study a 2615 atom protein-ligand complex with routinely available computational resources. In these calculations, the overall execution time with DL_MG was not significantly greater than the time required for calculations using a conventional FFT-based solver.
A framework for grand scale parallelization of the combined finite discrete element method in 2d

Science.gov (United States)

Lei, Z.; Rougier, E.; Knight, E. E.; Munjiza, A.

2014-09-01

Within the context of rock mechanics, the Combined Finite-Discrete Element Method (FDEM) has been applied to many complex industrial problems such as block caving, deep mining techniques (tunneling, pillar strength, etc.), rock blasting, seismic wave propagation, packing problems, dam stability, rock slope stability, rock mass strength characterization problems, etc. The reality is that most of these were accomplished in a 2D and/or single processor realm. In this work a hardware independent FDEM parallelization framework has been developed using the Virtual Parallel Machine for FDEM, (V-FDEM). With V-FDEM, a parallel FDEM software can be adapted to different parallel architecture systems ranging from just a few to thousands of cores.
s-Step Krylov Subspace Methods as Bottom Solvers for Geometric Multigrid

Energy Technology Data Exchange (ETDEWEB)

Williams, Samuel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)]; Lijewski, Mike [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)]; Almgren, Ann [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)]; Straalen, Brian Van [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)]; Carson, Erin [Univ. of California, Berkeley, CA (United States)]; Knight, Nicholas [Univ. of California, Berkeley, CA (United States)]; Demmel, James [Univ. of California, Berkeley, CA (United States)]

2014-08-14

Geometric multigrid solvers within adaptive mesh refinement (AMR) applications often reach a point where further coarsening of the grid becomes impractical as individual subdomain sizes approach unity. At this point the most common solution is to use a bottom solver, such as BiCGStab, to reduce the residual by a fixed factor at the coarsest level. Each iteration of BiCGStab requires multiple global reductions (MPI collectives). As the number of BiCGStab iterations required for convergence grows with problem size, and the time for each collective operation increases with machine scale, bottom solves in large-scale applications can constitute a significant fraction of the overall multigrid solve time. In this paper, we implement, evaluate, and optimize a communication-avoiding s-step formulation of BiCGStab (CABiCGStab for short) as a high-performance, distributed-memory bottom solver for geometric multigrid solvers. This is the first time s-step Krylov subspace methods have been leveraged to improve multigrid bottom solver performance. We use a synthetic benchmark for detailed analysis and integrate the best implementation into BoxLib in order to evaluate the benefit of an s-step Krylov subspace method on the multigrid solves found in the applications LMC and Nyx on up to 32,768 cores on the Cray XE6 at NERSC. Overall, we see bottom solver improvements of up to 4.2x on synthetic problems and up to 2.7x in real applications. This results in as much as a 1.5x improvement in solver performance in real applications.
Copper Mountain conference on multigrid methods. Preliminary proceedings -- List of abstracts

Energy Technology Data Exchange (ETDEWEB)

NONE

1995-12-31

This report contains abstracts of the papers presented at the conference. Papers cover multigrid algorithms and applications of multigrid methods. Applications include the following: solution of elliptical problems; electric power grids; fluid mechanics; atmospheric data assimilation; thermocapillary effects on weld pool shape; boundary-value problems; prediction of hurricane tracks; modeling multi-dimensional combustion and detailed chemistry; black-oil reservoir simulation; image processing; and others.
Multi Scale Finite Element Analyses By Using SEM-EBSD Crystallographic Modeling and Parallel Computing

International Nuclear Information System (INIS)

Nakamachi, Eiji

2005-01-01

A crystallographic homogenization procedure is introduced into the conventional static-explicit and dynamic-explicit finite element formulations to develop a multi-scale (double-scale) analysis code that predicts the plastic-strain-induced texture evolution, yield loci and formability of sheet metal. The double-scale structure consists of a crystal aggregation (the micro-structure) and a macroscopic elastic-plastic continuum. First, we measure crystal morphologies using an SEM-EBSD apparatus and define a unit cell of the micro-structure, which satisfies the periodicity condition at the real scale of the polycrystal. Next, this crystallographic homogenization FE code is applied to 3N pure-iron and 'Benchmark' aluminum A6022 polycrystal sheets. It reveals that the initial crystal orientation distribution (the texture) strongly affects the plastic-strain-induced texture evolution, the anisotropic hardening evolution, and the sheet deformation. Since the multi-scale finite element analysis requires a large computation time, a parallel computing technique using a PC cluster is developed for quick calculation. In this parallelization scheme, a dynamic workload balancing technique is introduced for quick and efficient calculations.

The multigrid method for reactor calculations

International Nuclear Information System (INIS)

Douglas, S.R.

1991-07-01

Iterative solutions to linear systems of equations are discussed. The emphasis is on the concepts that affect convergence rates of these solution methods. The multigrid method is described, including the smoothing property, restriction, and prolongation. A simple example is used to illustrate the ideas.
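
The three ingredients named in the abstract (smoothing, restriction, prolongation) fit together as in the following sketch of a V-cycle for the 1D model problem -u'' = f with u(0) = u(1) = 0. The abstract does not reproduce the report's own example, so this is a generic illustration: weighted Jacobi smoothing, full-weighting restriction, and linear interpolation are common textbook choices assumed here, and the grid is assumed to have a power-of-two number of intervals.

```python
import numpy as np


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; damps the oscillatory error components."""
    for _ in range(sweeps):
        u[1:-1] += omega * (0.5 * (u[:-2] + u[2:] + h * h * f[1:-1]) - u[1:-1])
    return u


def residual(u, f, h):
    """Residual r = f - A u for the standard three-point stencil."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r


def restrict(r):
    """Full weighting: transfer a fine-grid residual to the coarse grid."""
    rc = np.zeros((r.size - 1) // 2 + 1)
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    return rc


def prolong(ec):
    """Linear interpolation: transfer a coarse-grid correction to the fine grid."""
    e = np.zeros(2 * (ec.size - 1) + 1)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return e


def v_cycle(u, f, h, pre=2, post=2):
    """One V-cycle; u and f include the two boundary points."""
    if u.size == 3:
        # Coarsest grid: a single interior unknown, solved exactly.
        u[1] = 0.5 * h * h * f[1]
        return u
    smooth(u, f, h, pre)
    rc = restrict(residual(u, f, h))
    ec = v_cycle(np.zeros_like(rc), rc, 2.0 * h, pre, post)
    u += prolong(ec)
    smooth(u, f, h, post)
    return u
```

Calling `v_cycle` repeatedly on the same `u` reduces the residual by a roughly constant factor per cycle, independent of the grid spacing; that grid-independent rate is what distinguishes multigrid from plain relaxation.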
Multigrid Computation of Stratified Flow over Two-Dimensional Obstacles

Science.gov (United States)

Paisley, M. F.

1997-09-01

A robust multigrid method for the incompressible Navier-Stokes equations is presented and applied to the computation of viscous flow over obstacles in a bounded domain under conditions of neutral stability and stable density stratification. Two obstacle shapes have been used, namely a vertical barrier, for which the grid is Cartesian, and a smooth cosine-shaped obstacle, for which a boundary-conforming transformation is incorporated. Results are given for laminar flows at low Reynolds numbers and turbulent flows at a high Reynolds number, when a simple mixing length turbulence model is included. The multigrid algorithm is used to compute steady flows for each obstacle at low and high Reynolds numbers in conditions of weak static stability, defined by K = ND/(πU) ≤ 1, where U, N, and D are the upstream velocity, buoyancy frequency, and domain height respectively. Results are also presented for the vertical barrier at low and high Reynolds number in conditions of strong static stability, K > 1, when lee wave motions ensure that the flow is unsteady, and the multigrid algorithm is used to compute the flow at each timestep.
Final report on the Copper Mountain conference on multigrid methods

Energy Technology Data Exchange (ETDEWEB)

NONE

1997-10-01

The Copper Mountain Conference on Multigrid Methods was held on April 6-11, 1997. It followed the same format as the previous Copper Mountain Conferences on Multigrid Methods. Over 87 mathematicians from all over the world attended the meeting. 56 half-hour talks on current research topics were presented. Talks with similar content were organized into sessions. Session topics included: fluids; domain decomposition; iterative methods; basics; adaptive methods; non-linear filtering; CFD; applications; transport; algebraic solvers; supercomputing; and student paper winners.
Multi-grid Particle-in-cell Simulations of Plasma Microturbulence

International Nuclear Information System (INIS)

Lewandowski, J.L.V.

2003-01-01

A new scheme to accurately retain kinetic electron effects in particle-in-cell (PIC) simulations for the case of electrostatic drift waves is presented. The splitting scheme, which is based on an exact separation between adiabatic and nonadiabatic electron responses, is shown to yield more accurate linear growth rates than the standard δf scheme. The linear and nonlinear elliptic problems that arise in the splitting scheme are solved using a multi-grid solver. The multi-grid particle-in-cell approach offers an attractive path, both from the physics and numerical points of view, to simulate kinetic electron dynamics in global toroidal plasmas.
Analysis and development of stochastic multigrid methods in lattice field theory

International Nuclear Information System (INIS)

Grabenstein, M.

1994-01-01

We study the relation between the dynamical critical behavior and the kinematics of stochastic multigrid algorithms. The scale dependence of acceptance rates for nonlocal Metropolis updates is analyzed with the help of an approximation formula. A quantitative study of the kinematics of multigrid algorithms in several interacting models is performed. We find that for a critical model with Hamiltonian H(Φ), absence of critical slowing down can only be expected if the expansion of (H(Φ+ψ)) in terms of the shift ψ contains no relevant term (mass term). The predictions of this rule were verified in a multigrid Monte Carlo simulation of the sine-Gordon model in two dimensions. Our analysis can serve as a guideline for the development of new algorithms: we propose a new multigrid method for nonabelian lattice gauge theory, the time slice blocking. For SU(2) gauge fields in two dimensions, critical slowing down is almost completely eliminated by this method, in accordance with the theoretical prediction. The generalization of the time slice blocking to SU(2) in four dimensions is investigated analytically and by numerical simulations. Compared to two dimensions, the local disorder in the four-dimensional gauge field leads to kinematical problems. (orig.)
Scalable Adaptive Multilevel Solvers for Multiphysics Problems

Energy Technology Data Exchange (ETDEWEB)

Xu, Jinchao [Pennsylvania State Univ., University Park, PA (United States). Dept. of Mathematics

2014-11-26

In this project, we carried out many studies on adaptive and parallel multilevel methods for the numerical modeling of various applications, including magnetohydrodynamics (MHD) and complex fluids. We have made significant efforts and advances in adaptive multilevel methods for multiphysics problems: multigrid methods, adaptive finite element methods, and applications.
A Parallel Algebraic Multigrid Solver on Graphics Processing Units

KAUST Repository

Haase, Gundolf; Liebmann, Manfred; Douglas, Craig C.; Plank, Gernot

2010-01-01

-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen-node InfiniBand cluster.
Multigrid on unstructured grids using an auxiliary set of structured grids

Energy Technology Data Exchange (ETDEWEB)

Douglas, C.C.; Malhotra, S.; Schultz, M.H. [Yale Univ., New Haven, CT (United States)

1996-12-31

Unstructured grids do not have a convenient and natural multigrid framework for actually computing and maintaining a high floating point rate on standard computers. In fact, just the coarsening process is expensive for many applications. Since unstructured grids play a vital role in many scientific computing applications, many modifications have been proposed to solve this problem. One suggested solution is to map the original unstructured grid onto a structured grid. This can be used as a fine grid in a standard multigrid algorithm to precondition the original problem on the unstructured grid. We show that unless extreme care is taken, this mapping can lead to a system with a high condition number which eliminates the usefulness of the multigrid method. Theorems with lower and upper bounds are provided. Simple examples show that the upper bounds are sharp.
Generalization of mixed multiscale finite element methods with applications

Energy Technology Data Exchange (ETDEWEB)

Lee, C S [Texas A & M Univ., College Station, TX (United States)

2016-08-01

Many science and engineering problems exhibit scale disparity and high contrast. The small-scale features cannot be omitted in the physical models because they can affect the macroscopic behavior of the problems. However, resolving all the scales in these problems can be prohibitively expensive. As a consequence, some types of model reduction techniques are required to design efficient solution algorithms. For practical purposes, we are interested in mixed finite element problems as they produce solutions with certain conservative properties. Existing multiscale methods for such problems include the mixed multiscale finite element methods. We show that for complicated problems, the mixed multiscale finite element methods may not be able to produce reliable approximations. This motivates the need for enrichment of coarse spaces. Two enrichment approaches are proposed: one is based on generalized multiscale finite element methods (GMsFEM), while the other is based on spectral element-based algebraic multigrid (rAMGe). The former, which is called mixed GMsFEM, is developed for both Darcy's flow and linear elasticity. Application of the algorithm in two-phase flow simulations is demonstrated. For linear elasticity, the algorithm is subtly modified due to the symmetry requirement of the stress tensor. The latter enrichment approach is based on rAMGe. The algorithm differs from GMsFEM in that both the velocity and pressure spaces are coarsened. Due to the multigrid nature of the algorithm, recursive application is available, which results in an efficient multilevel construction of the coarse spaces. Stability, convergence analysis, and exhaustive numerical experiments are carried out to validate the proposed enrichment approaches.
Uzawa smoother in multigrid for the coupled porous medium and Stokes flow system

NARCIS (Netherlands)

P. Luo (Peiyao); C. Rodrigo (Carmen); F.J. Gaspar Lorenz (Franscisco); C.W. Oosterlee (Kees)

2017-01-01

The multigrid solution of coupled porous media and Stokes flow problems is considered. The Darcy equation as the saturated porous medium model is coupled to the Stokes equations by means of appropriate interface conditions. We focus on an efficient multigrid solution technique for the
Efficient multigrid computation of steady hypersonic flows

NARCIS (Netherlands)

Koren, B.; Hemker, P.W.; Murthy, T.K.S.

1991-01-01

In steady hypersonic flow computations, Newton iteration as a local relaxation procedure and nonlinear multigrid iteration as an acceleration procedure may both easily fail. In the present chapter, some remedies are presented for overcoming these problems. The equations considered are the steady,
On multigrid-CG for efficient topology optimization

DEFF Research Database (Denmark)

Amir, Oded; Aage, Niels; Lazarov, Boyan Stefanov

2014-01-01

reduction is obtained by exploiting specific characteristics of a multigrid preconditioned conjugate gradients (MGCG) solver. In particular, the number of MGCG iterations is reduced by relating it to the geometric parameters of the problem. At the same time, accurate outcome of the optimization process...
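A multigrid preconditioned conjugate gradient solver is, structurally, ordinary preconditioned CG in which applying the preconditioner means running a multigrid cycle on the residual. The sketch below shows only that structure. It is not the authors' solver, the names are hypothetical, and a diagonal (Jacobi) preconditioner stands in where a symmetric multigrid cycle would be plugged in.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pcg(apply_A, b, precond, tol=1e-10, max_iter=200):
    """Preconditioned CG for a symmetric positive definite operator.

    apply_A(v) returns A @ v; precond(r) returns an approximation of
    A^{-1} r and must itself be symmetric positive definite (for MGCG
    this would be one symmetric multigrid cycle).
    Returns (solution, iterations used).
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual of the zero initial guess
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for k in range(1, max_iter + 1):
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pj for xi, pj in zip(x, p)]
        r = [ri - alpha * aj for ri, aj in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        z = precond(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pj for zi, pj in zip(z, p)]
        rz = rz_new
    return x, max_iter

def jacobi_preconditioner(diagonal):
    """Stand-in preconditioner: divide the residual by diag(A)."""
    return lambda r: [ri / di for ri, di in zip(r, diagonal)]
```

The iteration count returned by `pcg` is the quantity a better preconditioner is meant to reduce.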
Finite approximations in fluid mechanics

International Nuclear Information System (INIS)

Hirschel, E.H.

1986-01-01

This book contains twenty papers on work which was conducted between 1983 and 1985 in the Priority Research Program "Finite Approximations in Fluid Mechanics" of the German Research Society (Deutsche Forschungsgemeinschaft). Scientists from numerical mathematics, fluid mechanics, and aerodynamics present their research on boundary-element methods, factorization methods, higher-order panel methods, multigrid methods for elliptic and parabolic problems, two-step schemes for the Euler equations, etc. Applications are made to channel flows, gas dynamical problems, large eddy simulation of turbulence, non-Newtonian flow, turbomachine flow, zonal solutions for viscous flow problems, etc. The contents include: multigrid methods for problems from fluid dynamics; development of a 2D-Transonic Potential Flow Solver; a boundary element spectral method for nonstationary viscous flows in 3 dimensions; Navier-Stokes computations of two-dimensional laminar flows in a channel with a backward facing step; calculations and experimental investigations of the laminar unsteady flow in a pipe expansion; calculation of the flow-field caused by shock wave and deflagration interaction; a multi-level discretization and solution method for potential flow problems in three dimensions; solutions of the conservation equations with the approximate factorization method; inviscid and viscous flow through rotating meridional contours; zonal solutions for viscous flow problems.
An h-adaptive finite element solver for the calculations of the electronic structures

International Nuclear Information System (INIS)

Bao Gang; Hu Guanghui; Liu Di

2012-01-01

In this paper, a framework of using h-adaptive finite element method for the Kohn–Sham equation on the tetrahedron mesh is presented. The Kohn–Sham equation is discretized by the finite element method, and the h-adaptive technique is adopted to optimize the accuracy and the efficiency of the algorithm. The locally optimal block preconditioned conjugate gradient method is employed for solving the generalized eigenvalue problem, and an algebraic multigrid preconditioner is used to accelerate the solver. A variety of numerical experiments demonstrate the effectiveness of our algorithm for both the all-electron and the pseudo-potential calculations.
Morphing Wing Structural Optimization Using Opposite-Based Population-Based Incremental Learning and Multigrid Ground Elements

Directory of Open Access Journals (Sweden)

S. Sleesongsom

2015-01-01

This paper has twin aims. Firstly, a multigrid design approach for optimization of an unconventional morphing wing is proposed. The structural design problem is assigned to optimize wing mass, lift effectiveness, and buckling factor subject to structural safety requirements. Design variables consist of partial topology, nodal positions, and component sizes of a wing internal structure. Such a design process can be accomplished by using multiple resolutions of ground elements, which is called a multigrid approach. Secondly, an opposite-based multiobjective population-based incremental learning (OMPBIL) is proposed for comparison with the original multiobjective population-based incremental learning (MPBIL). Multiobjective design problems with single-grid and multigrid design variables are then posed and tackled by OMPBIL and MPBIL. The results show that using OMPBIL in combination with a multigrid design approach is the best design strategy. OMPBIL is superior to MPBIL since the former provides better population diversity. Aeroelastic trim for an elastic morphing wing is also presented.
A multilevel correction adaptive finite element method for Kohn-Sham equation

Science.gov (United States)

Hu, Guanghui; Xie, Hehu; Xu, Fei

2018-02-01

In this paper, an adaptive finite element method is proposed for solving the Kohn-Sham equation with the multilevel correction technique. In the method, the Kohn-Sham equation is solved on a fixed and appropriately coarse mesh with the finite element method, in which the finite element space is continually improved by solving the derived boundary value problems on a series of adaptively and successively refined meshes. A main feature of the method is that solving a large-scale Kohn-Sham system is effectively avoided, and the derived boundary value problems can be handled efficiently by classical methods such as the multigrid method. Hence, a significant acceleration can be obtained in solving the Kohn-Sham equation with the proposed multilevel correction technique. The performance of the method is examined by a variety of numerical experiments.
Numerical Multilevel Upscaling for Incompressible Flow in Reservoir Simulation: An Element-based Algebraic Multigrid (AMGe) Approach

DEFF Research Database (Denmark)

Christensen, Max la Cour; Villa, Umberto; Engsig-Karup, Allan Peter

2017-01-01

We study the application of a finite element numerical upscaling technique to the incompressible two-phase porous media total velocity formulation. Specifically, an element agglomeration based Algebraic Multigrid (AMGe) technique with improved approximation properties [37] is used, for the first... associated with non-planar interfaces between agglomerates, the coarse velocity space has guaranteed approximation properties. The employed AMGe technique provides coarse spaces with desirable local mass conservation and stability properties analogous to the original pair of Raviart-Thomas and piecewise discontinuous polynomial spaces, resulting in strong mass conservation for the upscaled systems. Due to the guaranteed approximation properties and the generic nature of the AMGe method, recursive multilevel upscaling is automatically obtained. Furthermore, this technique works for both structured...
A parallel finite-volume finite-element method for transient compressible turbulent flows with heat transfer

International Nuclear Information System (INIS)

Masoud Ziaei-Rad

2010-01-01

In this paper, a two-dimensional numerical scheme is presented for the simulation of turbulent, viscous, transient compressible flows in the simultaneously developing hydraulic and thermal boundary layer region. The numerical procedure is a finite-volume-based finite-element method applied to unstructured grids. This combination together with a new method applied for the boundary conditions allows for accurate computation of the variables in the entrance region and for a wide range of flow fields from subsonic to transonic. The Roe-Riemann solver is used for the convective terms, whereas the standard Galerkin technique is applied for the viscous terms. A modified κ-ε model with a two-layer equation for the near-wall region combined with a compressibility correction is used to predict the turbulent viscosity. Parallel processing is also employed to divide the computational domain among the different processors to reduce the computational time. The method is applied to some test cases in order to verify the numerical accuracy. The results show significant differences between incompressible and compressible flows in the friction coefficient, Nusselt number, shear stress and the ratio of the compressible turbulent viscosity to the molecular viscosity along the developing region. A transient flow generated after an accidental rupture in a pipeline was also studied as a test case. The results show that the present numerical scheme is stable, accurate and efficient enough to solve the problem of transient wall-bounded flow.
Numerical Evaluation of P-Multigrid Method for the Solution of Discontinuous Galerkin Discretizations of Diffusive Equations

Science.gov (United States)

Atkins, H. L.; Helenbrook, B. T.

2005-01-01

This paper describes numerical experiments with P-multigrid to corroborate analysis, validate the present implementation, and examine issues that arise in the implementations of the various combinations of relaxation schemes, discretizations, and P-multigrid methods. The two approaches to implementing P-multigrid presented here are equivalent for most high-order discretization methods such as spectral element, SUPG, and discontinuous Galerkin applied to advection; however, it is discovered that the approach that mimics the common geometric multigrid implementation is less robust, and frequently unstable, when applied to discontinuous Galerkin discretizations of diffusion. Gauss-Seidel relaxation converges 40% faster than block Jacobi, as predicted by analysis; however, the implementation of Gauss-Seidel is considerably more expensive than one would expect because gradients in most neighboring elements must be updated. A compromise quasi Gauss-Seidel relaxation method that evaluates the gradient in each element twice per iteration converges at rates similar to those predicted for true Gauss-Seidel.
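The difference between Jacobi-type and Gauss-Seidel relaxation can be seen on a much simpler problem than the one studied in this record. The sketch below does not use discontinuous Galerkin or block relaxation, and the 40% figure quoted above does not apply to it. It only counts point Jacobi and point Gauss-Seidel sweeps on the 1D system tridiag(-1, 2, -1) u = f. All names are hypothetical.

```python
def residual_norm(u, f):
    """Max-norm residual of tridiag(-1, 2, -1) u = f with zero boundaries."""
    n = len(u)
    worst = 0.0
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        worst = max(worst, abs(f[i] - (2.0 * u[i] - left - right)))
    return worst

def jacobi_sweep(u, f):
    """Every unknown is updated from the previous iterate only."""
    n = len(u)
    return [0.5 * (f[i]
                   + (u[i - 1] if i > 0 else 0.0)
                   + (u[i + 1] if i < n - 1 else 0.0)) for i in range(n)]

def gauss_seidel_sweep(u, f):
    """Each update immediately reuses values already updated in this sweep."""
    u = list(u)
    n = len(u)
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        u[i] = 0.5 * (f[i] + left + right)
    return u

def sweeps_to_converge(sweep, f, tol=1e-6, max_sweeps=10000):
    """Count sweeps until the residual drops by the factor tol."""
    u = [0.0] * len(f)
    r0 = residual_norm(u, f)
    for k in range(1, max_sweeps + 1):
        u = sweep(u, f)
        if residual_norm(u, f) <= tol * r0:
            return k
    return max_sweeps
```

Calling `sweeps_to_converge` with each sweep function on the same right-hand side gives a direct comparison of the two relaxations.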
Use of a multigrid technique to study effects of limited sampling of heterogeneity on transport prediction

International Nuclear Information System (INIS)

Cole, C.R.; Foote, H.P.

1987-02-01

Reliable ground water transport prediction requires accurate spatial and temporal characterization of a hydrogeologic system. However, cost constraints and the desire to maintain site integrity by minimizing drilling can restrict the amount of spatial sampling that can be obtained to resolve the flow parameter variability associated with heterogeneities. This study quantifies the errors in subsurface transport predictions resulting from incomplete characterization of hydraulic conductivity heterogeneity. A multigrid technique was used to simulate two-dimensional flow velocity fields with high resolution. To obtain these velocity fields, the finite difference code MGRID, which implements a multigrid solution technique, was applied to compute stream functions on a 256-by-256 grid for a variety of hypothetical systems having detailed distributions of hydraulic conductivity. Spatial variability in hydraulic conductivity distributions was characterized by the components in the spectrum of spatial frequencies. A low-pass spatial filtering technique was applied to the base case hydraulic conductivity distribution to produce a data set with lower spatial frequency content. Arrival time curves were then calculated for filtered hydraulic conductivity distribution and compared to base case results to judge the relative importance of the higher spatial frequency components. Results indicate a progression from multimode to single-mode arrival time curves as the number and extent of distinct flow pathways are reduced by low-pass filtering. This relationship between transport predictions and spatial frequencies was used to judge the consequences of sampling the hydraulic conductivity with reduced spatial resolution. 22 refs., 17 figs
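The low-pass filtering step described in this abstract can be illustrated on a one-dimensional transect. The sketch below is a simplification under stated assumptions, not the procedure used with MGRID: it filters a 1D periodic signal rather than a 256-by-256 field, and it uses a naive discrete Fourier transform to stay self-contained where a real study would use an FFT library. Function names and the sample signal are hypothetical.

```python
import cmath
import math

def dft(values):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(values)
    return [sum(values[j] * cmath.exp(-2j * cmath.pi * k * j / n)
                for j in range(n)) for k in range(n)]

def inverse_dft(coeffs):
    """Inverse transform, returning the real part of each sample."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * j / n)
                for k in range(n)).real / n for j in range(n)]

def low_pass(values, cutoff):
    """Zero every Fourier mode above `cutoff` cycles per domain length."""
    n = len(values)
    coeffs = dft(values)
    for k in range(n):
        wavenumber = min(k, n - k)   # modes k and n - k share one frequency
        if wavenumber > cutoff:
            coeffs[k] = 0.0
    return inverse_dft(coeffs)

# Hypothetical transect: a mean value, one long-wavelength component,
# and one short-wavelength component that the filter should remove.
n = 64
transect = [1.0
            + math.sin(2.0 * math.pi * 2 * j / n)
            + 0.5 * math.sin(2.0 * math.pi * 10 * j / n) for j in range(n)]
smoothed = low_pass(transect, cutoff=4)
```

Lowering `cutoff` mimics sampling the field at coarser spatial resolution, which is the comparison the study draws.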

A two-dimensional discontinuous heterogeneous finite element method for neutron transport calculations

International Nuclear Information System (INIS)

Masiello, E.; Sanchez, R.

2007-01-01

A discontinuous heterogeneous finite element method is presented and discussed. The method is intended for realistic numerical pin-by-pin lattice calculations when an exact representation of the geometric shape of the pins is made without need for homogenization. The method keeps the advantages of conventional discrete ordinate methods, such as fast execution together with the possibility to deal with a large number of spatial meshes, while minimizing the need for geometric modeling. It also provides a complete factorization in space, angle, and energy for the discretized matrices and minimizes, thus, storage requirements. An angular multigrid acceleration technique has also been developed to speed up the rate of convergence of the inner iterations. A particular aspect of this acceleration is the introduction of boundary restriction and prolongation operators that minimize oscillatory behavior and enhance positivity. Numerical tests are presented that show the high precision of the method and the efficiency of the angular multigrid acceleration. (authors)
Towards a multigrid scheme in SU(2) lattice gauge theory

International Nuclear Information System (INIS)

Gutbrod, F.

1992-12-01

The task of constructing a viable updating multigrid scheme for SU(2) lattice gauge theory is discussed in connection with the classical eigenvalue problem. For a nonlocal overrelaxation Monte Carlo update step, the central numerical problem is the search for the minimum of a quadratic approximation to the action under nonlocal constraints. Here approximate eigenfunctions are essential to reduce the numerical work, and these eigenfunctions are to be constructed with multigrid techniques. A simple implementation on asymmetric lattices is described, where the grids are restricted to 3-dimensional hyperplanes. The scheme is shown to be moderately successful in the early stages of the updating history (starting from a cold configuration). The main results of another, less asymmetric scheme are presented briefly. (orig.)
A parallel direct solver for the self-adaptive hp Finite Element Method

KAUST Repository

Paszyński, Maciej R.

2010-03-01

In this paper we present a new parallel multi-frontal direct solver, dedicated to the hp Finite Element Method (hp-FEM). The self-adaptive hp-FEM generates, in a fully automatic mode, a sequence of hp-meshes delivering exponential convergence of the error with respect to the number of degrees of freedom (d.o.f.) as well as the CPU time, by performing a sequence of hp refinements starting from an arbitrary initial mesh. The solver constructs an initial elimination tree for an arbitrary initial mesh, and expands the elimination tree each time the mesh is refined. This allows us to keep track of the order of elimination for the solver. The solver also minimizes the memory usage by de-allocating partial LU factorizations computed during the elimination stage of the solver, and recomputes them for the backward substitution stage, utilizing only about 10% of the computational time necessary for the original computations. The solver has been tested on 3D Direct Current (DC) borehole resistivity measurement simulation problems. We measure the execution time and memory usage of the solver over a large regular mesh with 1.5 million degrees of freedom as well as on the highly non-regular mesh, generated by the self-adaptive hp-FEM, with finite elements of various sizes and polynomial orders of approximation varying from p = 1 to p = 9. From the presented experiments it follows that the parallel solver scales well up to the maximum number of utilized processors. The limit for the solver scalability is the maximum sequential part of the algorithm: the computations of the partial LU factorizations over the longest path, coming from the root of the elimination tree down to the deepest leaf. © 2009 Elsevier Inc. All rights reserved.
Compiler-Directed Transformation for Higher-Order Stencils

Energy Technology Data Exchange (ETDEWEB)

Basu, Protonu [Univ. of Utah, Salt Lake City, UT (United States); Hall, Mary [Univ. of Utah, Salt Lake City, UT (United States); Williams, Samuel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Straalen, Brian Van [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Colella, Phillip [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

2015-07-20

As the cost of data movement increasingly dominates performance, developers of finite-volume and finite-difference solutions for partial differential equations (PDEs) are exploring novel higher-order stencils that increase numerical accuracy and computational intensity. This paper describes a new compiler reordering transformation applied to stencil operators that performs partial sums in buffers, and reuses the partial sums in computing multiple results. This optimization has multiple effects on improving stencil performance that are particularly important to higher-order stencils: it exploits data reuse, reduces floating-point operations, and exposes efficient SIMD parallelism to backend compilers. We study the benefit of this optimization in the context of Geometric Multigrid (GMG), a widely used method to solve PDEs, using four different Jacobi smoothers built from 7-, 13-, 27-, and 125-point stencils. We quantify performance, speedup, and numerical accuracy, and use the Roofline model to qualify our results. Ultimately, we obtain over 4× speedup on the smoothers themselves and up to a 3× speedup on the multigrid solver. Finally, we demonstrate that high-order multigrid solvers have the potential of reducing total data movement and energy by several orders of magnitude.
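The idea of buffering partial sums and reusing them across several outputs can be shown by hand on a small case. The sketch below is not the compiler transformation from the paper and does not use its 3D stencils. It assumes a 2D, unit-weight, 9-point box stencil and compares a naive evaluation with one that computes each vertical 3-point column sum once and reuses it for three neighboring outputs. Function names are hypothetical.

```python
def box_stencil_naive(grid):
    """Sum the 3x3 neighborhood of every interior point (8 additions each)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = sum(grid[i + di][j + dj]
                            for di in (-1, 0, 1) for dj in (-1, 0, 1))
    return out

def box_stencil_partial_sums(grid):
    """Same result, with vertical partial sums buffered and reused.

    Each column sum costs 2 additions and feeds three outputs, so an
    interior point needs about 4 additions instead of 8.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        # Buffer: one 3-point vertical sum per column of this row band.
        column_sum = [grid[i - 1][j] + grid[i][j] + grid[i + 1][j]
                      for j in range(cols)]
        for j in range(1, cols - 1):
            out[i][j] = column_sum[j - 1] + column_sum[j] + column_sum[j + 1]
    return out
```

Reordering floating-point additions can change rounding in general, which is one reason numerical accuracy is quantified alongside speedup in the study above.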
A Multigrid NLS-4DVar Data Assimilation Scheme with Advanced Research WRF (ARW)

Science.gov (United States)

Zhang, H.; Tian, X.

2017-12-01

The motions of the atmosphere have multiscale properties in space and/or time, and the background error covariance matrix (Β) should thus contain error information at different correlation scales. To obtain an optimal analysis, the multigrid three-dimensional variational data assimilation scheme is used widely when sequentially correcting errors from large to small scales. However, introduction of the multigrid technique into four-dimensional variational data assimilation is not easy, due to its strong dependence on the adjoint model, which has extremely high computational costs in data coding, maintenance, and updating. In this study, the multigrid technique was introduced into the nonlinear least-squares four-dimensional variational assimilation (NLS-4DVar) method, which is an advanced four-dimensional ensemble-variational method that can be applied without invoking the adjoint models. The multigrid NLS-4DVar (MG-NLS-4DVar) scheme uses the number of grid points to control the scale, with doubling of this number when moving from a coarse to a finer grid. Furthermore, the MG-NLS-4DVar scheme not only retains the advantages of NLS-4DVar, but also sufficiently corrects multiscale errors to achieve a highly accurate analysis. The effectiveness and efficiency of the proposed MG-NLS-4DVar scheme were evaluated by several groups of observing system simulation experiments using the Advanced Research Weather Research and Forecasting Model. MG-NLS-4DVar outperformed NLS-4DVar, with a lower computational cost.
Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems

KAUST Repository

Frohne, Jörg; Heister, Timo; Bangerth, Wolfgang

2015-01-01

© 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.
Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems

KAUST Repository

Frohne, Jörg

2015-08-06

© 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.
Quasi-disjoint pentadiagonal matrix systems for the parallelization of compact finite-difference schemes and filters

Science.gov (United States)

Kim, Jae Wook

2013-05-01

This paper proposes a novel systematic approach for the parallelization of pentadiagonal compact finite-difference schemes and filters based on domain decomposition. The proposed approach allows a pentadiagonal banded matrix system to be split into quasi-disjoint subsystems by using a linear-algebraic transformation technique. As a result the inversion of pentadiagonal matrices can be implemented within each subdomain in an independent manner subject to a conventional halo-exchange process. The proposed matrix transformation leads to new subdomain boundary (SB) compact schemes and filters that require three halo terms to exchange with neighboring subdomains. The internode communication overhead in the present approach is equivalent to that of standard explicit schemes and filters based on seven-point discretization stencils. The new SB compact schemes and filters demand additional arithmetic operations compared to the original serial ones. However, it is shown that the additional cost becomes sufficiently low by choosing optimal sizes of their discretization stencils. Compared to earlier published results, the proposed SB compact schemes and filters successfully reduce parallelization artifacts arising from subdomain boundaries to a level sufficiently negligible for sophisticated aeroacoustic simulations without degrading parallel efficiency. The overall performance and parallel efficiency of the proposed approach are demonstrated by stringent benchmark tests.
Performance and scalability of finite-difference and finite-element wave-propagation modeling on Intel's Xeon Phi

NARCIS (Netherlands)

Zhebel, E.; Minisini, S.; Kononov, A.; Mulder, W.A.

2013-01-01

With the rapid developments in parallel compute architectures, algorithms for seismic modeling and imaging need to be reconsidered in terms of parallelization. The aim of this paper is to compare scalability of seismic modeling algorithms: finite differences, continuous mass-lumped finite elements
Adaptive Algebraic Multigrid for Finite Element Elliptic Equations with Random Coefficients

Energy Technology Data Exchange (ETDEWEB)

Kalchev, D

2012-04-02

This thesis presents a two-grid algorithm based on Smoothed Aggregation Spectral Element Agglomeration Algebraic Multigrid (SA-ρAMGe) combined with adaptation. The aim is to build an efficient solver for the linear systems arising from discretization of second-order elliptic partial differential equations (PDEs) with stochastic coefficients. Examples include PDEs that model subsurface flow with a random permeability field. During a Markov Chain Monte Carlo (MCMC) simulation process that draws PDE coefficient samples from a certain distribution, the PDE coefficients change, and hence the resulting linear systems to be solved change. At every such step the system (discretized PDE) needs to be solved and the computed solution used to evaluate some functional(s) of interest that then determine whether the coefficient sample is acceptable. The MCMC process is hence computationally intensive and requires the solvers used to be efficient and fast. Because the linear system changes at every step of MCMC, an existing solver built for the old problem may not be as efficient for the problem corresponding to the newly sampled coefficient. This motivates the main goal of our study, namely, to adapt an existing solver to handle the problem with the changed coefficient, with the objective of doing so faster and more efficiently than building a completely new solver from scratch. Our approach utilizes the local element matrices (for the problem with changed coefficients) to build local problems associated with the agglomerated elements constructed by the method (a set of subdomains that cover the given computational domain). We solve a generalized eigenproblem for each set in a subspace spanned by the previous local coarse space (used for the old solver) and a vector, a component of the error, that the old solver cannot handle. A portion of the spectrum of these local eigen-problems (corresponding to eigenvalues close to zero) form the
Parallel Finite Element Particle-In-Cell Code for Simulations of Space-charge Dominated Beam-Cavity Interactions

International Nuclear Information System (INIS)

Candel, A.; Kabel, A.; Ko, K.; Lee, L.; Li, Z.; Limborg, C.; Ng, C.; Prudencio, E.; Schussman, G.; Uplenchwar, R.

2007-01-01

Over the past years, SLAC's Advanced Computations Department (ACD) has developed the parallel finite element (FE) particle-in-cell code Pic3P (Pic2P) for simulations of beam-cavity interactions dominated by space-charge effects. As opposed to standard space-charge dominated beam transport codes, which are based on the electrostatic approximation, Pic3P (Pic2P) includes space-charge, retardation and boundary effects as it self-consistently solves the complete set of Maxwell-Lorentz equations using higher-order FE methods on conformal meshes. Use of efficient, large-scale parallel processing allows for the modeling of photoinjectors with unprecedented accuracy, aiding the design and operation of the next-generation of accelerator facilities. Applications to the Linac Coherent Light Source (LCLS) RF gun are presented
Block-accelerated aggregation multigrid for Markov chains with application to PageRank problems

Science.gov (United States)

Shen, Zhao-Li; Huang, Ting-Zhu; Carpentieri, Bruno; Wen, Chun; Gu, Xian-Ming

2018-06-01

Recently, the adaptive algebraic aggregation multigrid method has been proposed for computing stationary distributions of Markov chains. This method updates aggregates on every iterative cycle to keep coarse-level corrections highly accurate. Accordingly, its fast convergence rate is well guaranteed, but often a large proportion of time is consumed by aggregation processes. In this paper, we show that the aggregates on each level in this method can be utilized to transfer the probability equation of that level into a block linear system. Then we propose a Block-Jacobi relaxation that deals with the block system on each level to smooth error. Some theoretical analysis of this technique is presented, and the technique is also adapted to solve PageRank problems. The purpose of this technique is to accelerate the adaptive aggregation multigrid method and its variants for solving Markov chains and PageRank problems. It also attempts to shed some light on new solutions for making aggregation processes more cost-effective for aggregation multigrid methods. Numerical experiments are presented to illustrate the effectiveness of this technique.
Achieving Textbook Multigrid Efficiency for Hydrostatic Ice Sheet Flow

KAUST Repository

Brown, Jed; Smith, Barry; Ahmadia, Aron

2013-01-01

The hydrostatic equations for ice sheet flow offer improved fidelity compared with the shallow ice approximation and shallow stream approximation popular in today's ice sheet models. Nevertheless, they present a serious bottleneck because they require the solution of a three-dimensional (3D) nonlinear system, as opposed to the two-dimensional system present in the shallow stream approximation. This 3D system is posed on high-aspect domains with strong anisotropy and variation in coefficients, making it expensive to solve with current methods. This paper presents a Newton--Krylov multigrid solver for the hydrostatic equations that demonstrates textbook multigrid efficiency (an order of magnitude reduction in residual per iteration and solution of the fine-level system at a small multiple of the cost of a residual evaluation). Scalability on Blue Gene/P is demonstrated, and the method is compared to various algebraic methods that are in use or have been proposed as viable approaches.
Achieving Textbook Multigrid Efficiency for Hydrostatic Ice Sheet Flow

KAUST Repository

Brown, Jed

2013-03-12

The hydrostatic equations for ice sheet flow offer improved fidelity compared with the shallow ice approximation and shallow stream approximation popular in today's ice sheet models. Nevertheless, they present a serious bottleneck because they require the solution of a three-dimensional (3D) nonlinear system, as opposed to the two-dimensional system present in the shallow stream approximation. This 3D system is posed on high-aspect domains with strong anisotropy and variation in coefficients, making it expensive to solve with current methods. This paper presents a Newton--Krylov multigrid solver for the hydrostatic equations that demonstrates textbook multigrid efficiency (an order of magnitude reduction in residual per iteration and solution of the fine-level system at a small multiple of the cost of a residual evaluation). Scalability on Blue Gene/P is demonstrated, and the method is compared to various algebraic methods that are in use or have been proposed as viable approaches.
Blockspin and multigrid for staggered fermions in non-abelian gauge fields

International Nuclear Information System (INIS)

Kalkreuter, T.; Mack, G.; Speh, M.

1991-07-01

We discuss blockspins for staggered fermions, i.e. averaging and interpolation procedures which are needed in a real space renormalization group approach to gauge theories with staggered fermions and in a multigrid approach to the computation of gauge covariant propagators. The discussion starts from the requirement that the symmetries of the free action should be preserved by the blocking procedure in the limit of a pure gauge. A definition of an averaging kernel as a solution of a gauge covariant eigenvalue equation is proposed, and the properties of a corresponding interpolation kernel are examined in the light of general criteria for good choices of blockspins. Some results of multigrid computation of bosonic propagation in an SU(2) gauge field in 4 dimensions are also presented. (orig.)
Multigrid for the Galerkin least squares method in linear elasticity: The pure displacement problem

Energy Technology Data Exchange (ETDEWEB)

Yoo, Jaechil [Univ. of Wisconsin, Madison, WI (United States)

1996-12-31

Franca and Stenberg developed several Galerkin least squares methods for the solution of the problem of linear elasticity. That work concerned itself only with the error estimates of the method. It did not address the related problem of finding effective methods for the solution of the associated linear systems. In this work, we prove the convergence of a multigrid (W-cycle) method. This multigrid is robust in that the convergence is uniform as the parameter, ν, goes to 1/2. Computational experiments are included.
A Cost-Effective Smoothed Multigrid with Modified Neighborhood-Based Aggregation for Markov Chains

Directory of Open Access Journals (Sweden)

Zhao-Li Shen

2015-01-01

Smoothed aggregation multigrid method is considered for computing stationary distributions of Markov chains. A criterion that determines whether to carry out the whole aggregation procedure is proposed. Through this strategy, a large amount of time in the aggregation procedure is saved without affecting the convergence behavior. In addition, we explain the shortcomings of the Neighborhood-Based aggregation which is commonly used in multigrid methods. A modified version is then presented to remedy them. Numerical experiments on some typical Markov chain problems are reported to illustrate the performance of these methods.
Primal Domain Decomposition Method with Direct and Iterative Solver for Circuit-Field-Torque Coupled Parallel Finite Element Method to Electric Machine Modelling

Directory of Open Access Journals (Sweden)

Daniel Marcsa

2015-01-01

The analysis and design of electromechanical devices involve the solution of large sparse linear systems and therefore require high-performance algorithms. In this paper, the primal Domain Decomposition Method (DDM) with parallel forward-backward and with parallel Preconditioned Conjugate Gradient (PCG) solvers is introduced in a two-dimensional parallel time-stepping finite element formulation to analyze a rotating machine, considering the electromagnetic field, external circuit and rotor movement. The proposed parallel direct solver and the iterative solver with two preconditioners are analyzed with respect to computational efficiency and the number of solver iterations. Simulation results for a rotating machine are also presented.
Multidimensional radiative transfer with multilevel atoms. II. The non-linear multigrid method.

Science.gov (United States)

Fabiani Bendicho, P.; Trujillo Bueno, J.; Auer, L.

1997-08-01

A new iterative method for solving non-LTE multilevel radiative transfer (RT) problems in 1D, 2D or 3D geometries is presented. The scheme obtains the self-consistent solution of the kinetic and RT equations at the cost of only a few (iteration (Brandt, 1977, Math. Comp. 31, 333; Hackbusch, 1985, Multi-Grid Methods and Applications, Springer-Verlag, Berlin), an efficient multilevel RT scheme based on Gauss-Seidel iterations (cf. Trujillo Bueno & Fabiani Bendicho, 1995ApJ...455..646T), and accurate short-characteristics formal solution techniques. By combining a valid stopping criterion with a nested-grid strategy, a converged solution with the desired true error is automatically guaranteed. Contrary to the current operator splitting methods, the very high convergence speed of the new RT method does not deteriorate when the grid spatial resolution is increased. With this non-linear multigrid method, non-LTE problems discretized on N grid points are solved in O(N) operations. The nested multigrid RT method presented here is, thus, particularly attractive in complicated multilevel transfer problems where small grid-sizes are required. The properties of the method are analyzed both analytically and with illustrative multilevel calculations for Ca II in 1D and 2D schematic model atmospheres.
Development of Multigrid Methods for diffusion, Advection, and the incompressible Navier-Stokes Equations

Energy Technology Data Exchange (ETDEWEB)

Gjesdal, Thor

1997-12-31

This thesis discusses the development and application of efficient numerical methods for the simulation of fluid flows, in particular the flow of incompressible fluids. The emphasis is on practical aspects of algorithm development and on application of the methods either to linear scalar model equations or to the non-linear incompressible Navier-Stokes equations. The first part deals with cell centred multigrid methods and linear correction scheme and presents papers on (1) generalization of the method to arbitrary sized grids for diffusion problems, (2) low order method for advection-diffusion problems, (3) attempt to extend the basic method to advection-diffusion problems, (4) Fourier smoothing analysis of multicolour relaxation schemes, and (5) analysis of high-order discretizations for advection terms. The second part discusses a multigrid based on pressure correction methods, non-linear full approximation scheme, and papers on (1) systematic comparison of the performance of different pressure correction smoothers and some other algorithmic variants, low to moderate Reynolds numbers, and (2) systematic study of implementation strategies for high order advection schemes, high-Re flow. An appendix contains Fortran 90 data structures for multigrid development. 160 refs., 26 figs., 22 tabs.

Parallel simulation of two-phase incompressible and immiscible flows in porous media using a finite volume formulation and a modified IMPES approach

International Nuclear Information System (INIS)

Da Silva, R S; De Carvalho, D K E; Antunes, A R E; Lyra, P R M; Willmersdorf, R B

2010-01-01

In this paper a finite volume method with a 'Modified Implicit Pressure, Explicit Saturation' (MIMPES) approach is used to model the 3-D incompressible and immiscible two-phase flow of water and oil in heterogeneous and anisotropic porous media. A vertex centered finite volume method with an edge-based data structure is adopted to discretize both the elliptic pressure and the hyperbolic saturation equations using parallel computers with distributed memory. Due to the explicit solution of the saturation equation in the IMPES method, severe time step restrictions are imposed on the simulation. In order to circumvent this problem, an edge-based implementation of the MIMPES method was used. In this method, the pressure equation is solved and the velocity field is computed much less frequently than the saturation field. Following the work of Hurtado, a mean relative variation of the velocity field throughout the simulation is used to automatically control the updating process, allowing for much larger time-steps in a very simple way. In order to run large scale problems, we have developed a parallel implementation using clusters of PCs. The simulator uses open source parallel libraries like FMDB, ParMetis and PETSc. Results of speed-up and efficiency are presented to validate the performance of the parallel simulator.
A multigrid Newton-Krylov method for flux-limited radiation diffusion

International Nuclear Information System (INIS)

Rider, W.J.; Knoll, D.A.; Olson, G.L.

1998-01-01

The authors focus on the integration of radiation diffusion including flux-limited diffusion coefficients. The nonlinear integration is accomplished with a Newton-Krylov method preconditioned with a multigrid Picard linearization of the governing equations. They investigate the efficiency of the linear and nonlinear iterative techniques
Multigrid time-accurate integration of Navier-Stokes equations

Science.gov (United States)

Arnone, Andrea; Liou, Meng-Sing; Povinelli, Louis A.

1993-01-01

Efficient acceleration techniques typical of explicit steady-state solvers are extended to time-accurate calculations. Stability restrictions are greatly reduced by means of a fully implicit time discretization. A four-stage Runge-Kutta scheme with local time stepping, residual smoothing, and multigridding is used instead of traditional time-expensive factorizations. Some applications to natural and forced unsteady viscous flows show the capability of the procedure.
Primal-Dual Interior Point Multigrid Method for Topology Optimization

Czech Academy of Sciences Publication Activity Database

Kočvara, Michal; Mohammed, S.

2016-01-01

Vol. 38, No. 5 (2016), B685-B709 ISSN 1064-8275 Grant - others: European Commission - EC(XE) 313781 Institutional support: RVO:67985556 Keywords: topology optimization * multigrid methods * interior point method Subject RIV: BA - General Mathematics Impact factor: 2.195, year: 2016 http://library.utia.cas.cz/separaty/2016/MTR/kocvara-0462418.pdf
High-Fidelity RF Gun Simulations with the Parallel 3D Finite Element Particle-In-Cell Code Pic3P

Energy Technology Data Exchange (ETDEWEB)

Candel, A; Kabel, A.; Lee, L.; Li, Z.; Limborg, C.; Ng, C.; Schussman, G.; Ko, K.; /SLAC

2009-06-19

SLAC's Advanced Computations Department (ACD) has developed the first parallel Finite Element 3D Particle-In-Cell (PIC) code, Pic3P, for simulations of RF guns and other space-charge dominated beam-cavity interactions. Pic3P solves the complete set of Maxwell-Lorentz equations and thus includes space charge, retardation and wakefield effects from first principles. Pic3P uses higher-order Finite Element methods on unstructured conformal meshes. A novel scheme for causal adaptive refinement and dynamic load balancing enables unprecedented simulation accuracy, aiding the design and operation of the next generation of accelerator facilities. Application to the Linac Coherent Light Source (LCLS) RF gun is presented.
Comparative Performance Analysis of Coarse Solvers for Algebraic Multigrid on Multicore and Manycore Architectures

Energy Technology Data Exchange (ETDEWEB)

Druinsky, A; Ghysels, P; Li, XS; Marques, O; Williams, S; Barker, A; Kalchev, D; Vassilevski, P

2016-04-02

In this paper, we study the performance of a two-level algebraic-multigrid algorithm, with a focus on the impact of the coarse-grid solver on performance. We consider two algorithms for solving the coarse-space systems: the preconditioned conjugate gradient method and a new robust HSS-embedded low-rank sparse-factorization algorithm. Our test data comes from the SPE Comparative Solution Project for oil-reservoir simulations. We contrast the performance of our code on one 12-core socket of a Cray XC30 machine with performance on a 60-core Intel Xeon Phi coprocessor. To obtain top performance, we optimized the code to take full advantage of fine-grained parallelism and made it thread-friendly for high thread count. We also developed a bounds-and-bottlenecks performance model of the solver which we used to guide us through the optimization effort, and also carried out performance tuning in the solver’s large parameter space. Finally, as a result, significant speedups were obtained on both machines.
Time-dependent radiation transfer with Rayleigh scattering in finite plane-parallel media using Pomraning-Eddington approximation

International Nuclear Information System (INIS)

El-Wakil, S.A.; Sallah, M.; Degheidy, A.R.

2005-01-01

The time-dependent radiation transfer equation in plane geometry with Rayleigh scattering is studied. The traveling wave transformation is used to obtain the corresponding stationary-like equation. The Pomraning-Eddington approximation is then used to calculate the radiation intensity in finite plane-parallel media. Numerical results and shielding calculations are shown for reflectivity and transmissivity at different times. The medium is assumed to have specular-reflecting boundaries. For the sake of comparison, two different weight functions are introduced to force the boundary conditions to be fulfilled.
Parallel Reservoir Simulations with Sparse Grid Techniques and Applications to Wormhole Propagation

KAUST Repository

Wu, Yuanqing

2015-09-08

In this work, two topics of reservoir simulations are discussed. The first topic is the two-phase compositional flow simulation in hydrocarbon reservoir. The major obstacle that impedes the applicability of the simulation code is the long run time of the simulation procedure, and thus speeding up the simulation code is necessary. Two means are demonstrated to address the problem: parallelism in physical space and the application of sparse grids in parameter space. The parallel code can gain satisfactory scalability, and the sparse grids can remove the bottleneck of flash calculations. Instead of carrying out the flash calculation in each time step of the simulation, a sparse grid approximation of all possible results of the flash calculation is generated before the simulation. Then the constructed surrogate model is evaluated to approximate the flash calculation results during the simulation. The second topic is the wormhole propagation simulation in carbonate reservoir. In this work, different from the traditional simulation technique relying on the Darcy framework, we propose a new framework called Darcy-Brinkman-Forchheimer framework to simulate wormhole propagation. Furthermore, to process the large quantity of cells in the simulation grid and shorten the long simulation time of the traditional serial code, standard domain-based parallelism is employed, using the Hypre multigrid library. In addition to that, a new technique called “experimenting field approach” to set coefficients in the model equations is introduced. In the 2D dissolution experiments, different configurations of wormholes and a series of properties simulated by both frameworks are compared. We conclude that the numerical results of the DBF framework are more like wormholes and more stable than the Darcy framework, which is a demonstration of the advantages of the DBF framework. The scalability of the parallel code is also evaluated, and good scalability can be achieved. Finally, a mixed
Parallelized Three-Dimensional Resistivity Inversion Using Finite Elements And Adjoint State Methods

Science.gov (United States)

Schaa, Ralf; Gross, Lutz; Du Plessis, Jaco

2015-04-01

resistivity. The Hessian of the regularization term is used as preconditioner which requires an additional PDE solution in each iteration step. As it turns out, the relevant PDEs are naturally formulated in the finite element framework. Using the domain decomposition method provided in Escript, the inversion scheme has been parallelized for distributed memory computers with multi-core shared memory nodes. We show numerical examples from simple layered models to complex 3D models and compare with the results from other methods. The inversion scheme is furthermore tested on a field data example to characterise localised freshwater discharge in a coastal environment. References: L. Gross and C. Kemp (2013) Large Scale Joint Inversion of Geophysical Data using the Finite Element Method in escript. ASEG Extended Abstracts 2013, http://dx.doi.org/10.1071/ASEG2013ab306
Software abstractions and computational issues in parallel structure adaptive mesh methods for electronic structure calculations

Energy Technology Data Exchange (ETDEWEB)

Kohn, S.; Weare, J.; Ong, E.; Baden, S.

1997-05-01

We have applied structured adaptive mesh refinement techniques to the solution of the LDA equations for electronic structure calculations. Local spatial refinement concentrates memory resources and numerical effort where it is most needed, near the atomic centers and in regions of rapidly varying charge density. The structured grid representation enables us to employ efficient iterative solver techniques such as conjugate gradient with FAC multigrid preconditioning. We have parallelized our solver using an object-oriented adaptive mesh refinement framework.
An extended algebraic variational multiscale-multigrid-multifractal method (XAVM4) for large-eddy simulation of turbulent two-phase flow

Science.gov (United States)

Rasthofer, U.; Wall, W. A.; Gravemeier, V.

2018-04-01

A novel and comprehensive computational method, referred to as the eXtended Algebraic Variational Multiscale-Multigrid-Multifractal Method (XAVM4), is proposed for large-eddy simulation of the particularly challenging problem of turbulent two-phase flow. The XAVM4 involves multifractal subgrid-scale modeling as well as a Nitsche-type extended finite element method as an approach for two-phase flow. The application of an advanced structural subgrid-scale modeling approach in conjunction with a sharp representation of the discontinuities at the interface between two bulk fluids promises high-fidelity large-eddy simulation of turbulent two-phase flow. The high potential of the XAVM4 is demonstrated for large-eddy simulation of turbulent two-phase bubbly channel flow, that is, turbulent channel flow carrying a single large bubble of the size of the channel half-width in this particular application.
Feasibility Study of Parallel Finite Element Analysis on Cluster-of-Clusters

Science.gov (United States)

Muraoka, Masae; Okuda, Hiroshi

With the rapid growth of WAN infrastructure and development of Grid middleware, it has become a realistic and attractive methodology to connect cluster machines on a wide-area network for the execution of computation-demanding applications. Many existing parallel finite element (FE) applications have, however, been designed and developed with a single computing resource in mind, since such applications require frequent synchronization and communication among processes. There have been few FE applications that can exploit the distributed environment so far. In this study, we explore the feasibility of FE applications on the cluster-of-clusters. First, we classify FE applications into two types, tightly coupled applications (TCA) and loosely coupled applications (LCA), based on their communication pattern. A prototype of each application is implemented on the cluster-of-clusters. We perform numerical experiments executing TCA and LCA on both the cluster-of-clusters and a single cluster. Through these experiments, by comparing the performance and communication cost in each case, we evaluate the feasibility of FEA on the cluster-of-clusters.
Optimal multigrid algorithms for the massive Gaussian model and path integrals

International Nuclear Information System (INIS)

Brandt, A.; Galun, M.

1996-01-01

Multigrid algorithms are presented which, in addition to eliminating the critical slowing down, can also eliminate the "volume factor". The elimination of the volume factor removes the need to produce many independent fine-grid configurations for averaging out their statistical deviations, by averaging over the many samples produced on coarse grids during the multigrid cycle. Thermodynamic limits of observables can be calculated to relative accuracy ε_r in just O(ε_r^-2) computer operations, where ε_r is the error relative to the standard deviation of the observable. In this paper, we describe in detail the calculation of the susceptibility in the one-dimensional massive Gaussian model, which is also a simple example of path integrals. Numerical experiments show that the susceptibility can be calculated to relative accuracy ε_r in about 8 ε_r^-2 random number generations, independent of the mass size.
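The operation count quoted in this abstract can be compared with the usual sampling estimate. The short derivation below is only a sketch: it assumes N statistically independent samples X_i of an observable with standard deviation σ, and the symbols N, X_i and σ are introduced here for illustration and do not appear in the abstract.

```latex
\[
  \operatorname{std}\!\left(\frac{1}{N}\sum_{i=1}^{N} X_i\right) = \frac{\sigma}{\sqrt{N}},
  \qquad
  \varepsilon_r = \frac{\sigma/\sqrt{N}}{\sigma} = N^{-1/2}
  \quad\Longrightarrow\quad
  N = \varepsilon_r^{-2}.
\]
```

Under that assumption, a relative accuracy ε_r requires on the order of ε_r^-2 independent samples, so a method whose total work is proportional to ε_r^-2 spends only a bounded amount of work per independent sample.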
Multilevel local refinement and multigrid methods for 3-D turbulent flow

Energy Technology Data Exchange (ETDEWEB)

Liao, C.; Liu, C. [UCD, Denver, CO (United States); Sung, C.H.; Huang, T.T. [David Taylor Model Basin, Bethesda, MD (United States)

1996-12-31

A numerical approach based on multigrid, multilevel local refinement, and preconditioning methods for solving incompressible Reynolds-averaged Navier-Stokes equations is presented. 3-D turbulent flow around an underwater vehicle is computed. Three multigrid levels and two local refinement grid levels are used. The global grid is 24 x 8 x 12. The first patch is 40 x 16 x 20 and the second patch is 72 x 32 x 36. Fourth-order artificial dissipation is used for numerical stability. The conservative artificial compressibility method is used to further improve convergence. To improve the accuracy at the coarse/fine grid interface of the local refinement, a flux interpolation method for the refined grid boundary is used. The numerical results are in good agreement with experimental data. The local refinement can improve the prediction accuracy significantly. The flux interpolation method for local refinement maintains conservation on the composite grid and therefore further improves the prediction accuracy.
Multigrid solution of the Navier-Stokes equations at low speeds with large temperature variations

International Nuclear Information System (INIS)

Sockol, Peter M.

2003-01-01

Multigrid methods for the Navier-Stokes equations at low speeds and large temperature variations are investigated. The compressible equations with time-derivative preconditioning and preconditioned flux-difference splitting of the inviscid terms are used. Three implicit smoothers have been incorporated into a common multigrid procedure. Both full coarsening and semi-coarsening with directional fine-grid defect correction have been studied. The resulting methods have been tested on four 2D laminar problems over a range of Reynolds numbers on both uniform and highly stretched grids. Two of the three methods show efficient and robust performance over the entire range of conditions. In addition, none of the methods has any difficulty with the large temperature variations
Comparison of multihardware parallel implementations for a phase unwrapping algorithm

Science.gov (United States)

Hernandez-Lopez, Francisco Javier; Rivera, Mariano; Salazar-Garibay, Adan; Legarda-Sáenz, Ricardo

2018-04-01

Phase unwrapping is an important problem in the areas of optical metrology, synthetic aperture radar (SAR) image analysis, and magnetic resonance imaging (MRI) analysis. These images are becoming larger in size and, particularly, the availability and need for processing of SAR and MRI data have increased significantly with the acquisition of remote sensing data and the popularization of magnetic resonators in clinical diagnosis. Therefore, it is important to develop faster and more accurate phase unwrapping algorithms. We propose a parallel multigrid algorithm of a phase unwrapping method named accumulation of residual maps, which builds on a serial algorithm that consists of the minimization of a cost function, with the minimization achieved by means of a serial Gauss-Seidel kind of algorithm. Our algorithm also optimizes the original cost function, but unlike the original work, our algorithm is of a parallel Jacobi class with alternated minimizations. This strategy is known as the chessboard type, where red pixels can be updated in parallel in the same iteration since they are independent. Similarly, black pixels can be updated in parallel in an alternating iteration. We present parallel implementations of our algorithm for different parallel multicore architectures such as CPU-multicore, Xeon Phi coprocessor, and Nvidia graphics processing unit. In all the cases, we obtain a superior performance of our parallel algorithm when compared with the original serial version. In addition, we present a detailed comparative performance of the developed parallel versions.
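The chessboard (red-black) update order described in this abstract can be illustrated on a simpler model problem. The sketch below is not the authors' phase unwrapping code: it swaps in a standard 5-point discretization of the Poisson equation as an assumed stand-in for their cost function, and the function names `red_black_sweep` and `residual_norm` are hypothetical. It only shows why all points of one color can be updated independently.

```python
def red_black_sweep(u, f, h):
    """One red-black relaxation sweep for -Laplace(u) = f on a square grid.

    Model problem chosen for illustration only (an assumption, not the
    cost function of the abstract). u and f are square lists of lists;
    the boundary values of u are held fixed.
    """
    n = len(u)
    for color in (0, 1):  # 0 = "red" points, 1 = "black" points
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 != color:
                    continue
                # All four neighbours have the opposite color, so every
                # point of the current color could be updated concurrently.
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1]
                                  + h * h * f[i][j])


def residual_norm(u, f, h):
    """Maximum absolute residual of the 5-point scheme at interior points."""
    n = len(u)
    worst = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            lap = (4.0 * u[i][j] - u[i - 1][j] - u[i + 1][j]
                   - u[i][j - 1] - u[i][j + 1]) / (h * h)
            worst = max(worst, abs(f[i][j] - lap))
    return worst


if __name__ == "__main__":
    n = 9
    h = 1.0 / (n - 1)
    u = [[0.0] * n for _ in range(n)]
    f = [[1.0] * n for _ in range(n)]
    print("initial residual:", residual_norm(u, f, h))
    for _ in range(50):
        red_black_sweep(u, f, h)
    print("residual after 50 sweeps:", residual_norm(u, f, h))
```

In a parallel implementation the two inner loops for one color would be distributed across threads or GPU blocks, with a synchronization point between the red and the black half-sweep.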
Parallel, Multigrid Finite Element Simulator for Fractured/Faulted and Other Complex Reservoirs based on Common Component Architecture (CCA)

Energy Technology Data Exchange (ETDEWEB)

Milind Deo; Chung-Kan Huang; Huabing Wang

2008-08-31

volume of injection at lower rates. However, if oil production can be continued at high water cuts, the discounted cumulative production usually favors higher production rates. The workflow developed during the project was also used to perform multiphase simulations in heterogeneous, fracture-matrix systems. Compositional and thermal-compositional simulators were developed for fractured reservoirs using the generalized framework. The thermal-compositional simulator was based on a novel 'equation-alignment' approach that helped choose the correct variables to solve depending on the number of phases present and the prescribed component partitioning. The simulators were used in steamflooding and in in situ combustion applications. The framework was constructed to be inherently parallel. The partitioning routines employed in the framework allowed generalized partitioning on highly complex fractured reservoirs and in instances when wells (incorporated in these models as line sources) were divided between two or more processors.
HP-Multigrid as Smoother algorithm for higher order discontinuous Galerkin discretizations of advection dominated flows. Part II: Optimization of the Runge-Kutta smoother

NARCIS (Netherlands)

van der Vegt, Jacobus J.W.; Rhebergen, Sander

2012-01-01

Using a detailed multilevel analysis of the complete hp-Multigrid as Smoother algorithm accurate predictions are obtained of the spectral radius and operator norms of the multigrid error transformation operator. This multilevel analysis is used to optimize the coefficients in the semi-implicit
Two-level Fourier analysis of a multigrid approach for discontinuous Galerkin discretisation

NARCIS (Netherlands)

P.W. Hemker (Piet); W. Hoffmann; M.H. van Raalte (Marc)

2002-01-01

In this paper we study a multigrid method for the solution of a linear second order elliptic equation, discretized by discontinuous Galerkin (DG) methods, and we give a detailed analysis of the convergence for different block-relaxation strategies. We find that point-wise
3D inversion based on multi-grid approach of magnetotelluric data from Northern Scandinavia

Science.gov (United States)

Cherevatova, M.; Smirnov, M.; Korja, T. J.; Egbert, G. D.

2012-12-01

In this work we investigate the geoelectrical structure of the cratonic margin of the Fennoscandian Shield by means of magnetotelluric (MT) measurements carried out in Northern Norway and Sweden during summer 2011-2012. The project Magnetotellurics in the Scandes (MaSca) focuses on the investigation of the crust, upper mantle and lithospheric structure in a transition zone from a stable Precambrian cratonic interior to a passive continental margin beneath the Caledonian Orogen and the Scandes Mountains in western Fennoscandia. Recent MT profiles in the central and southern Scandes indicated a large contrast in resistivity between the Caledonides and the Precambrian basement. These profiles reveal the alum shales as a highly conductive layer between the resistive Precambrian basement and the overlying Caledonian nappes. Additional measurements in the Northern Scandes were required. Altogether, data from 60 synchronous long period (LMT) and about 200 broad band (BMT) sites were acquired. The array stretches from Lofoten and Bodo (Norway) in the west to Kiruna and Skeleftea (Sweden) in the east, covering an area of 500x500 square kilometers. LMT sites were occupied for about two months, while most of the BMT sites were measured during one day. We have used a new multi-grid approach for 3D electromagnetic (EM) inversion and modelling. Our approach is based on the OcTree discretization where the spatial domain is represented by rectangular cells, each of which might be subdivided (recursively) into eight sub-cells. In this simplified implementation the grid is refined only in the horizontal direction, uniformly in each vertical layer. Using multi-grid we manage to have a high grid resolution near the surface (for instance, to tackle galvanic distortions) and lower resolution at greater depth, as the EM fields decay in the Earth according to the diffusion equation. We also benefit in computational cost as the number of unknowns decreases. The multi-grid forward

Iterative algorithms for large sparse linear systems on parallel computers

Science.gov (United States)

Adams, L. M.

1982-01-01

Algorithms are developed for assembling in parallel the sparse systems of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
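For readers unfamiliar with the preconditioned conjugate gradient method mentioned in this record, a minimal serial sketch is given below. It is a textbook formulation, not the parallel algorithm of the report; the function name `pcg`, the callable `M_inv` (which applies the inverse of the preconditioner) and the tolerances are assumptions made for the example. The matrix-vector product and the inner products are the steps that a parallel version would distribute.

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradients for a symmetric positive
    definite matrix A. M_inv(r) applies the preconditioner inverse."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x              # initial residual
    z = M_inv(r)               # preconditioned residual
    p = z.copy()               # first search direction
    rz = r @ z
    for k in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)  # step length along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k + 1
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p  # new A-conjugate direction
        rz = rz_new
    return x, max_iter
```

A diagonal (Jacobi) preconditioner can be passed as `lambda r: r / np.diag(A)`; it is trivially parallel because each entry is scaled independently.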
Conservative multigrid methods for Cahn-Hilliard fluids

International Nuclear Information System (INIS)

Kim, Junseok; Kang, Kyungkeun; Lowengrub, John

2004-01-01

We develop a conservative, second-order accurate fully implicit discretization of the Navier-Stokes (NS) and Cahn-Hilliard (CH) system that has an associated discrete energy functional. This system provides a diffuse-interface description of binary fluid flows with compressible or incompressible flow components [R. Soc. Lond. Proc. Ser. A Math. Phys. Eng. Sci. 454 (1998) 2617]. In this work, we focus on the case of flows containing two immiscible, incompressible and density-matched components. The scheme, however, has a straightforward extension to multi-component systems. To efficiently solve the discrete system at the implicit time-level, we develop a nonlinear multigrid method to solve the CH equation which is then coupled to a projection method that is used to solve the NS equation. We demonstrate convergence of our scheme numerically in both the presence and absence of flow and perform simulations of phase separation via spinodal decomposition. We examine the separate effects of surface tension and external flow on the decomposition. We find surface tension driven flow alone increases coalescence rates through the retraction of interfaces. When there is an applied external shear, the evolution of the flow is nontrivial and the flow morphology repeats itself in time as multiple pinchoff and reconnection events occur. Eventually, the periodic motion ceases and the system relaxes to a global equilibrium. The equilibria we observe appear to have a similar structure in all cases, although the dynamics of the evolution are quite different. We view the work presented in this paper as preparatory for a detailed investigation of liquid-liquid interfaces with surface tension where the interfaces separate two immiscible fluids [On the pinchoff of liquid-liquid jets with surface tension, in preparation]. To this end, we also include a simulation of the pinchoff of a liquid thread under the Rayleigh instability at finite Reynolds number.
A parallel algorithm for transient solid dynamics simulations with contact detection

International Nuclear Information System (INIS)

Attaway, S.; Hendrickson, B.; Plimpton, S.; Gardner, D.; Vaughan, C.; Heinstein, M.; Peery, J.

1996-01-01

Solid dynamics simulations with Lagrangian finite elements are used to model a wide variety of problems, such as the calculation of impact damage to shipping containers for nuclear waste and the analysis of vehicular crashes. Using parallel computers for these simulations has been hindered by the difficulty of searching efficiently for material surface contacts in parallel. A new parallel algorithm for calculation of arbitrary material contacts in finite element simulations has been developed and implemented in the PRONTO3D transient solid dynamics code. This paper will explore some of the issues involved in developing efficient, portable, parallel finite element models for nonlinear transient solid dynamics simulations. The contact-detection problem poses interesting challenges for efficient implementation of a solid dynamics simulation on a parallel computer. The finite element mesh is typically partitioned so that each processor owns a localized region of the finite element mesh. This mesh partitioning is optimal for the finite element portion of the calculation since each processor must communicate only with the few connected neighboring processors that share boundaries with the decomposed mesh. However, contacts can occur between surfaces that may be owned by any two arbitrary processors. Hence, a global search across all processors is required at every time step to search for these contacts. Load-imbalance can become a problem since the finite element decomposition divides the volumetric mesh evenly across processors but typically leaves the surface elements unevenly distributed. In practice, these complications have been limiting factors in the performance and scalability of transient solid dynamics on massively parallel computers. In this paper the authors present a new parallel algorithm for contact detection that overcomes many of these limitations
Algebraic multigrid preconditioners for two-phase flow in porous media with phase transitions

Science.gov (United States)

Bui, Quan M.; Wang, Lu; Osei-Kuffuor, Daniel

2018-04-01

Multiphase flow is a critical process in a wide range of applications, including oil and gas recovery, carbon sequestration, and contaminant remediation. Numerical simulation of multiphase flow requires solving of a large, sparse linear system resulting from the discretization of the partial differential equations modeling the flow. In the case of multiphase multicomponent flow with miscible effect, this is a very challenging task. The problem becomes even more difficult if phase transitions are taken into account. A new approach to handle phase transitions is to formulate the system as a nonlinear complementarity problem (NCP). Unlike in the primary variable switching technique, the set of primary variables in this approach is fixed even when there is phase transition. Not only does this improve the robustness of the nonlinear solver, it opens up the possibility to use multigrid methods to solve the resulting linear system. The disadvantage of the complementarity approach, however, is that when a phase disappears, the linear system has the structure of a saddle point problem and becomes indefinite, and current algebraic multigrid (AMG) algorithms cannot be applied directly. In this study, we explore the effectiveness of a new multilevel strategy, based on the multigrid reduction technique, to deal with problems of this type. We demonstrate the effectiveness of the method through numerical results for the case of two-phase, two-component flow with phase appearance/disappearance. We also show that the strategy is efficient and scales optimally with problem size.
Analysis of preconditioning and multigrid for Euler flows with low-subsonic regions

NARCIS (Netherlands)

Koren, B.; Leer, van B.

1995-01-01

For subsonic flows and upwind-discretized, linearized 1-D Euler equations, the smoothing behavior of multigrid-accelerated point Gauss-Seidel relaxation is analyzed. Error decay by convection across domain boundaries is also discussed. A fix to poor convergence rates at low Mach numbers is sought in
Compiler generation and autotuning of communication-avoiding operators for geometric multigrid

Energy Technology Data Exchange (ETDEWEB)

Basu, Protonu [Univ. of Utah, Salt Lake City, UT (United States); Venkat, Anand [Univ. of Utah, Salt Lake City, UT (United States); Hall, Mary [Univ. of Utah, Salt Lake City, UT (United States); Williams, Samuel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Van Straalen, Brian [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

2014-04-17

This paper describes a compiler approach to introducing communication-avoiding optimizations in geometric multigrid (GMG), one of the most popular methods for solving partial differential equations. Communication-avoiding optimizations reduce vertical communication through the memory hierarchy and horizontal communication across processes or threads, usually at the expense of introducing redundant computation. We focus on applying these optimizations to the smooth operator, which successively reduces the error and accounts for the largest fraction of the GMG execution time. Our compiler technology applies both novel and known transformations to derive an implementation comparable to manually-tuned code. To make the approach portable, an underlying autotuning system explores the tradeoff between reduced communication and increased computation, as well as tradeoffs in threading schemes, to automatically identify the best implementation for a particular architecture and at each computation phase. Results show that we are able to quadruple the performance of the smooth operation on the finest grids while attaining performance within 94% of manually-tuned code. Overall, we improve the multigrid solve time by 2.5× without sacrificing programmer productivity.
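The tradeoff between communication and redundant computation can be illustrated with a deep ghost zone. The sketch below is an assumption-laden toy, not the compiler-generated code of the paper: a 1D Jacobi smoother, two subdomains, and hypothetical names `jacobi_steps` and `smooth_with_deep_ghosts`. Each subdomain copies `s` extra cells from its neighbour once, then performs `s` smoothing steps without further exchange; the ghost cells go stale one layer per step, so only the owned cells are kept at the end.

```python
import numpy as np

def jacobi_steps(u, steps):
    """Apply `steps` Jacobi sweeps for the 1D Laplace equation.
    The first and last entries are held fixed."""
    u = u.copy()
    for _ in range(steps):
        new = u.copy()
        new[1:-1] = 0.5 * (u[:-2] + u[2:])
        u = new
    return u

def smooth_with_deep_ghosts(u, split, s):
    """Smooth two subdomains independently for `s` steps.

    The left part owns u[:split], the right part owns u[split:].
    Each is extended by `s` ghost cells (one exchange), smoothed
    redundantly on the overlap, and trimmed back to its owned cells.
    Requires s <= split and split + s <= len(u).
    """
    left = jacobi_steps(u[:split + s], s)    # owned cells + s ghosts
    right = jacobi_steps(u[split - s:], s)   # s ghosts + owned cells
    return np.concatenate([left[:split], right[s:]])
```

The result matches `s` sweeps on the undivided array, at the cost of updating the overlap cells twice; an autotuner of the kind described in the abstract would search over choices such as the ghost depth `s`.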
Cellular Automaton Modeling of Dendritic Growth Using a Multi-grid Method

International Nuclear Information System (INIS)

Natsume, Y; Ohsasa, K

2015-01-01

A two-dimensional cellular automaton model with a multi-grid method was developed to simulate dendritic growth. In the present model, we used a triple-grid system for temperature, solute concentration and solid fraction fields as a new approach of the multi-grid method. In order to evaluate the validity of the present model, we carried out simulations of single dendritic growth, secondary dendrite arm growth, multi-columnar dendritic growth and multi-equiaxed dendritic growth. From the results of the grid dependency from the simulation of single dendritic growth, we confirmed that the larger grid can be used in the simulation and that the computational time can be reduced dramatically. In the simulation of secondary dendrite arm growth, the results from the present model were in good agreement with the experimental data and the simulated results from a phase-field model. Thus, the present model can quantitatively simulate dendritic growth. From the simulated results of multi-columnar and multi-equiaxed dendrites, we confirmed that the present model can perform simulations under practical solidification conditions. (paper)
A first-order multigrid method for bound-constrained convex optimization

Czech Academy of Sciences Publication Activity Database

Kočvara, Michal; Mohammed, S.

2016-01-01

Vol. 31, No. 3 (2016), pp. 622-644 ISSN 1055-6788 R&D Projects: GA ČR(CZ) GAP201/12/0671 Grant - others: European Commission - EC(XE) 313781 Institutional support: RVO:67985556 Keywords: bound-constrained optimization * multigrid methods * linear complementarity problems Subject RIV: BA - General Mathematics Impact factor: 1.023, year: 2016 http://library.utia.cas.cz/separaty/2016/MTR/kocvara-0460326.pdf
Multigrid direct numerical simulation of the whole process of flow transition in 3-D boundary layers

Science.gov (United States)

Liu, Chaoqun; Liu, Zhining

1993-01-01

A new technology was developed in this study which provides a successful numerical simulation of the whole process of flow transition in 3-D boundary layers, including linear growth, secondary instability, breakdown, and transition at relatively low CPU cost. Most other spatial numerical simulations require high CPU cost and blow up at the stage of flow breakdown. A fourth-order finite difference scheme on stretched and staggered grids, a fully implicit time marching technique, a semi-coarsening multigrid based on the so-called approximate line-box relaxation, and a buffer domain for the outflow boundary conditions were all used for high-order accuracy, good stability, and fast convergence. A new fine-coarse-fine grid mapping technique was developed to keep the code running after the laminar flow breaks down. The computational results are in good agreement with linear stability theory, secondary instability theory, and some experiments. The cost for a typical case with 162 x 34 x 34 grid is around 2 CRAY-YMP CPU hours for 10 T-S periods.
Multi-grid Beam and Warming scheme for the simulation of unsteady ...

African Journals Online (AJOL)

In this paper, a multi-grid algorithm is applied to a large-scale block matrix that is produced from a Beam and Warming scheme. The Beam and Warming scheme is used in the simulation of unsteady flow in an open channel. The Gauss-Seidel block-wise iteration method is used for a smoothing process with a few iterations.
Development and application of a parallel finite volume method for flow simulation on unstructured grids with local refinement; Entwicklung und Anwendung eines parallelen Finite-Volumen-Verfahrens zur Stroemungssimulation auf unstrukturierten Gittern mit lokaler Verfeinerung

Energy Technology Data Exchange (ETDEWEB)

Seidl, V.

1997-11-01

A finite volume method for the calculation of steady and unsteady flows on unstructured grids is parallelized by domain decomposition in space and time. For the spatial decomposition, a parallel variant of the conjugate gradient method with multiple local preconditioning is formulated and analyzed. The complete method is tested on simple applications (flow around a cylinder, lid-driven cavity flow) and its efficiency is determined. The second part of the publication describes a direct numerical simulation of turbulent flow around a sphere at a Reynolds number of 5000 (based on free-stream velocity and sphere diameter). Instantaneous and Reynolds-averaged flow fields are discussed. Particular emphasis is placed on a coordinate-independent representation of the anisotropy ratios of the Reynolds tensor and dissipation tensor. (orig.)
Multigrid methods for fully implicit oil reservoir simulation

Energy Technology Data Exchange (ETDEWEB)

Molenaar, J.

1995-12-31

In this paper, the authors consider the simultaneous flow of oil and water in reservoir rock. This displacement process is modeled by two basic equations: the material balance or continuity equations, and the equation of motion (Darcy's law). For the numerical solution of this system of nonlinear partial differential equations, there are two approaches: the fully implicit or simultaneous solution method, and the sequential solution method. In this paper, the authors consider the possibility of applying multigrid methods for the iterative solution of the systems of nonlinear equations.
Impact of new computing systems on finite element computations

International Nuclear Information System (INIS)

Noor, A.K.; Fulton, R.E.; Storaasi, O.O.

1983-01-01

Recent advances in computer technology that are likely to impact finite element computations are reviewed. The characteristics of supersystems, highly parallel systems, and small systems (mini and microcomputers) are summarized. The interrelations of numerical algorithms and software with parallel architectures are discussed. A scenario is presented for future hardware/software environment and finite element systems. A number of research areas which have high potential for improving the effectiveness of finite element analysis in the new environment are identified
Parallel direct solver for finite element modeling of manufacturing processes

DEFF Research Database (Denmark)

Nielsen, Chris Valentin; Martins, P.A.F.

2017-01-01

The central processing unit (CPU) time is of paramount importance in finite element modeling of manufacturing processes. Because the most significant part of the CPU time is consumed in solving the main system of equations resulting from finite element assemblies, different approaches have been...
NONLINEAR MULTIGRID SOLVER EXPLOITING AMGe COARSE SPACES WITH APPROXIMATION PROPERTIES

Energy Technology Data Exchange (ETDEWEB)

Christensen, Max La Cour [Technical Univ. of Denmark, Lyngby (Denmark); Villa, Umberto E. [Univ. of Texas, Austin, TX (United States); Engsig-Karup, Allan P. [Technical Univ. of Denmark, Lyngby (Denmark); Vassilevski, Panayot S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2016-01-22

The paper introduces a nonlinear multigrid solver for mixed finite element discretizations based on the Full Approximation Scheme (FAS) and element-based Algebraic Multigrid (AMGe). The main motivation to use FAS for unstructured problems is the guaranteed approximation property of the AMGe coarse spaces that were developed recently at Lawrence Livermore National Laboratory. These give the ability to derive stable and accurate coarse nonlinear discretization problems. Previous attempts (including ones with the original AMGe method [5, 11]) were less successful due to the lack of such good approximation properties of the coarse spaces. With coarse spaces with approximation properties, our FAS approach on unstructured meshes should be as powerful/successful as FAS on geometrically refined meshes. For comparison, Newton's method and Picard iterations with an inner state-of-the-art linear solver are compared to FAS on a nonlinear saddle point problem with applications to porous media flow. It is demonstrated that FAS is faster than Newton's method and Picard iterations for the experiments considered here. Due to the guaranteed approximation properties of our AMGe, the coarse spaces are very accurate, providing a solver with the potential for mesh-independent convergence on general unstructured meshes.
Monolithic multigrid method for the coupled Stokes flow and deformable porous medium system

NARCIS (Netherlands)

P. Luo (Peiyao); C. Rodrigo (Carmen); F.J. Gaspar Lorenz (Franscisco); C.W. Oosterlee (Cornelis)

2018-01-01

The interaction between fluid flow and a deformable porous medium is a complicated multi-physics problem, which can be described by a coupled model based on the Stokes and poroelastic equations. A monolithic multigrid method together with either a coupled Vanka smoother or a decoupled
Multigrid technique and Optimized Schwarz method on block-structured grids with discontinuous interfaces

DEFF Research Database (Denmark)

Kolmogorov, Dmitry; Sørensen, Niels N.; Shen, Wen Zhong

2013-01-01

An Optimized Schwarz method using Robin boundary conditions for relaxation scheme is presented in the frame of Multigrid method on discontinuous grids. At each iteration the relaxation scheme is performed in two steps: one step with Dirichlet and another step with Robin boundary conditions at inn...
Proceedings of the fifth international symposium on numerical methods in engineering. Vol. 1 and 2

Energy Technology Data Exchange (ETDEWEB)

Gruber, R [Ecole Polytechnique Federale, Lausanne (Switzerland); Periaux, J [Avions Marcel Dassault-Breguet Aviation, 92 - Saint-Cloud (France). Aerodynamique, Methodes Numeriques; Shaw, R P [Buffalo Univ., NY (USA). Dept. of Civil Engineering; eds.

1989-01-01

The present two volumes survey the state of the art in advanced scientific computing as applied to engineering science. Many fields ranging from modelization (partial differential equations, integral equations, boundary conditions, macroscopic models, cellular automata) to numerical methods (finite elements, optimization, software engineering tools, parallel and vector computing methods, domain decomposition, multigrid) and applications (nonlinear solid mechanics, fracture mechanics, composite materials, friction and contact, fluid mechanics, chemical flows, convection, free boundaries, combustion, electromagnetics) are covered by invited papers, minisymposia and contributed papers. (orig./HP).
Run-Time and Compiler Support for Programming in Adaptive Parallel Environments

Directory of Open Access Journals (Sweden)

Guy Edjlali

1997-01-01

For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at run-time. In this article, we discuss run-time support for data-parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a run-time library to provide this support. We discuss how the run-time library can be used by compilers of High Performance Fortran (HPF)-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver and a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of nondedicated workstations, which are likely to be an important resource for parallel programming in the future.
Multigrid techniques with non-standard coarsening and group relaxation methods

International Nuclear Information System (INIS)

Danaee, A.

1989-06-01

In the usual (standard) multigrid methods, doubling of grid sizes with different smoothing iterations (pointwise, or blockwise) has been considered by different authors. Some have indicated that a large coarsening can also be used, but is not beneficial (cf. H3, p.59). In this paper, it is shown that with a suitable blockwise smoothing scheme, some advantages could be achieved even with a factor of $H_{l-1}/h_l = 3$. (author). 10 refs, 2 figs, 6 tabs
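As a rough illustration of why a larger coarsening factor can be attractive, the work of one V-cycle can be estimated with a geometric series. The symbols below are assumptions introduced for this estimate and are not taken from the paper: $d$ is the space dimension, $r$ the ratio of mesh sizes between consecutive levels, $N$ the number of fine-grid points, and the cost of each level is taken to be proportional to its number of points.

```latex
W(r) \;\propto\; N \sum_{l=0}^{\infty} r^{-dl} \;=\; \frac{N}{1 - r^{-d}},
\qquad
d = 2:\quad W(2) \propto \tfrac{4}{3}\,N, \quad W(3) \propto \tfrac{9}{8}\,N .
```

Under this estimate, a factor of 3 lowers the coarse-level overhead in 2D from one third to one eighth of the fine-grid work and needs fewer levels; whether the convergence rate remains competitive depends on the smoother, which is the question the paper studies.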

A multigrid based 3D space-charge routine in the tracking code GPT

NARCIS (Netherlands)

Pöplau, G.; Rienen, van U.; Loos, de M.J.; Geer, van der S.B.; Berz, M.; Makino, K.

2005-01-01

Fast calculation of 3D non-linear space-charge fields is essential for the simulation of high-brightness charged particle beams. We report on our development of a new 3D space-charge routine in the General Particle Tracer (GPT) code. The model is based on a nonequidistant multigrid Poisson solver that
Adaptive Multigrid Algorithm for the Lattice Wilson-Dirac Operator

International Nuclear Information System (INIS)

Babich, R.; Brower, R. C.; Rebbi, C.; Brannick, J.; Clark, M. A.; Manteuffel, T. A.; McCormick, S. F.; Osborn, J. C.

2010-01-01

We present an adaptive multigrid solver for application to the non-Hermitian Wilson-Dirac system of QCD. The key components leading to the success of our proposed algorithm are the use of an adaptive projection onto coarse grids that preserves the near null space of the system matrix together with a simplified form of the correction based on the so-called $\gamma_5$-Hermitian symmetry of the Dirac operator. We demonstrate that the algorithm nearly eliminates critical slowing down in the chiral limit and that it has weak dependence on the lattice volume.
Conjugate gradient coupled with multigrid for an indefinite problem

Science.gov (United States)

Gozani, J.; Nachshon, A.; Turkel, E.

1984-01-01

An iterative algorithm for the Helmholtz equation is presented. This scheme is based on the preconditioned conjugate gradient method for the normal equations. The preconditioning is one cycle of a multigrid method for the discrete Laplacian. The smoothing algorithm is red-black Gauss-Seidel and is constructed so it is a symmetric operator. The total number of iterations needed by the algorithm is independent of h. By varying the number of grids, the number of iterations depends only weakly on k when $k^3 h^2$ is constant. Comparisons with an SSOR preconditioner are presented.
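A single multigrid cycle for the discrete Laplacian, the building block used as a preconditioner in this record, can be sketched in one dimension as follows. This is a minimal sketch under stated assumptions and not the authors' scheme: it uses weighted Jacobi smoothing in place of the symmetric red-black Gauss-Seidel smoother of the abstract, homogeneous Dirichlet boundaries, and a grid with a power-of-two number of intervals. The names `residual`, `smooth` and `v_cycle` are hypothetical.

```python
import numpy as np

def residual(u, f, h):
    """Residual f - A u for the 3-point stencil of -u'' = f."""
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / h**2
    return r

def smooth(u, f, h, sweeps=2, omega=2 / 3):
    """Weighted Jacobi: u <- u + omega * D^{-1} r, with D = 2 / h^2."""
    for _ in range(sweeps):
        u[1:-1] += omega * 0.5 * h**2 * residual(u, f, h)[1:-1]
    return u

def v_cycle(u, f, h):
    """One V-cycle; u and f include the two boundary points (u = 0 there)."""
    n = u.size - 1                       # number of intervals, a power of 2
    if n == 2:                           # one unknown left: solve exactly
        u[1] = 0.5 * h**2 * f[1]
        return u
    u = smooth(u, f, h)                  # pre-smoothing
    r = residual(u, f, h)
    rc = np.zeros(n // 2 + 1)            # full-weighting restriction
    rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
    ec = v_cycle(np.zeros_like(rc), rc, 2 * h)   # coarse-grid error equation
    e = np.zeros_like(u)                 # linear interpolation to fine grid
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    u += e                               # coarse-grid correction
    return smooth(u, f, h)               # post-smoothing
```

Used as a preconditioner, one such cycle started from a zero initial guess would play the role of applying an approximate inverse of the Laplacian to the current residual.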
3D, parallel fluid-structure interaction code

CSIR Research Space (South Africa)

Oxtoby, Oliver F

2011-01-01

The authors describe the development of a 3D parallel Fluid–Structure–Interaction (FSI) solver and its application to benchmark problems. Fluid and solid domains are discretised using an edge-based finite-volume scheme for efficient parallel...
Exploiting Symmetry on Parallel Architectures.

Science.gov (United States)

Stiller, Lewis Benjamin

1995-01-01

This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
Parallel knock-out schemes in networks

NARCIS (Netherlands)

Broersma, H.J.; Fomin, F.V.; Woeginger, G.J.

2004-01-01

We consider parallel knock-out schemes, a procedure on graphs introduced by Lampert and Slater in 1997 in which each vertex eliminates exactly one of its neighbors in each round. We are considering cases in which after a finite number of rounds, where the minimum number is called the parallel
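The round structure described in this record can be sketched as a small simulation. The adjacency-dictionary representation, the function name `knockout_round` and the selection rule `choose` (here the smallest-labelled surviving neighbour) are assumptions made for illustration; the paper itself concerns the existence and minimum length of such schemes, not any particular selection rule.

```python
def knockout_round(adj, alive, choose=min):
    """Perform one parallel knock-out round.

    adj maps each vertex to its neighbours; alive is the set of
    surviving vertices. Every surviving vertex picks exactly one
    surviving neighbour, and all picked vertices are removed at once.
    Returns the new set of survivors, or None if some vertex has no
    surviving neighbour left to eliminate (no valid round exists).
    """
    targets = set()
    for v in alive:
        candidates = [u for u in adj[v] if u in alive]
        if not candidates:
            return None
        targets.add(choose(candidates))
    return alive - targets
```

On the 4-cycle `{0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}` with the default rule, the first round removes vertices 0 and 1, and the second round removes the remaining two, so every vertex is eliminated after two rounds.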
Two-Level Adaptive Algebraic Multigrid for a Sequence of Problems with Slowly Varying Random Coefficients [Adaptive Algebraic Multigrid for Sequence of Problems with Slowly Varying Random Coefficients

Energy Technology Data Exchange (ETDEWEB)

Kalchev, D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Ketelsen, C. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Vassilevski, P. S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2013-11-07

Our paper proposes an adaptive strategy for reusing a previously constructed coarse space by algebraic multigrid to construct a two-level solver for a problem with nearby characteristics. Furthermore, a main target application is the solution of the linear problems that appear throughout a sequence of Markov chain Monte Carlo simulations of subsurface flow with uncertain permeability field. We demonstrate the efficacy of the method with an extensive set of numerical experiments.
Gyrokinetic simulation of finite-β plasmas on parallel architectures

International Nuclear Information System (INIS)

Reynders, J.V.W.

1993-01-01

Much research exists on the linear and non-linear properties of plasma microinstabilities induced by density and temperature gradients. There has been an interest in the electromagnetic or finite-β effects on these microinstabilities. This thesis focuses on the finite-β modification of an ion temperature gradient (ITG) driven microinstability in two-dimensional shearless and sheared-slab geometries. A gyrokinetic model is employed in the numerical and analytic studies of this instability. Chapter 1 introduces the electromagnetic gyrokinetic model employed in the numerical and analytic studies of the ITG instability. Some discussion of the Klimontovich particle representation of the gyrokinetic Vlasov equation and a multiple scale model of the background plasma gradient is presented. Chapter 2 details the computational issues facing an electromagnetic gyrokinetic particle simulation of the ITG mode. An electromagnetic extension of the partially linearized algorithm is presented with a comparison of quiet particle initialization routines. Chapter 3 presents and compares algorithms for the gyrokinetic particle simulation technique on SIMD and MIMD computing platforms. Chapter 4 discusses electromagnetic gyrokinetic fluctuation theory and provides a comparison of analytic and numerical results. Chapter 5 contains a linear and a non-linear three-wave coupling analysis of the finite-β modified ITG mode in a shearless slab geometry. Comparisons are made with linear and partially linearized gyrokinetic simulation results. Chapter 6 presents results from a finite-β modified ITG mode in a sheared slab geometry. The linear dispersion relation is derived and results from an integral eigenvalue code are presented. Comparisons are made with the gyrokinetic particle code in a variety of limits with both adiabatic and non-adiabatic electrons. Evidence of ITG driven microtearing is presented.
A parallel adaptive finite element simplified spherical harmonics approximation solver for frequency domain fluorescence molecular imaging

International Nuclear Information System (INIS)

Lu Yujie; Zhu Banghe; Rasmussen, John C; Sevick-Muraca, Eva M; Shen Haiou; Wang Ge

2010-01-01

Fluorescence molecular imaging/tomography may play an important future role in preclinical research and clinical diagnostics. Time- and frequency-domain fluorescence imaging can acquire more measurement information than the continuous wave (CW) counterpart, improving the image quality of fluorescence molecular tomography. Although diffusion approximation (DA) theory has been extensively applied in optical molecular imaging, high-order photon migration models need to be further investigated to match quantitation provided by nuclear imaging. In this paper, a frequency-domain parallel adaptive finite element solver is developed with simplified spherical harmonics (SP_N) approximations. To fully evaluate the performance of the SP_N approximations, a fast time-resolved tetrahedron-based Monte Carlo fluorescence simulator suitable for complex heterogeneous geometries is developed using a convolution strategy to realize the simulation of the fluorescence excitation and emission. The validation results show that high-order SP_N can effectively correct the modeling errors of the diffusion equation, especially when the tissues have high absorption characteristics or when high modulation frequency measurements are used. Furthermore, the parallel adaptive mesh evolution strategy improves the modeling precision and the simulation speed significantly on a realistic digital mouse phantom. This solver is a promising platform for fluorescence molecular tomography using high-order approximations to the radiative transfer equation.
Interaction of a finite-length ion beam with a background plasma: Reflected ions at the quasi-parallel bow shock

International Nuclear Information System (INIS)

Onsager, T.G.; Winske, D.; Thomsen, M.F.

1991-01-01

The coupling of a finite-length, field-aligned ion beam with a uniform background plasma is investigated using one-dimensional hybrid computer simulations. The finite-length beam is used to study the interaction between the incident solar wind and ions reflected from the Earth's quasi-parallel bow shock, where the reflection process may vary with time. The coupling between the reflected ions and the solar wind is relevant to ion heating at the bow shock and possibly to the formation of hot flow anomalies and re-formation of the shock itself. The authors find that although there are many similarities between the instabilities driven by the finite-length beam and those predicted by linear theory for an infinite, homogeneous beam, there are also some important differences. Consistent with linear theory, the waves which dominate the interaction are the electromagnetic right-hand polarized resonant and nonresonant modes. However, in addition to the instability growth rates, the length of time that the waves are in contact with the beam is also an important factor in determining which wave mode will dominate the interaction. Whereas linear theory predicts the nonresonant mode to have the larger growth rate for the parameters they investigate, with a finite-length beam they find that both the nonresonant and resonant modes contribute to the interaction. They find that the interaction will result in strong coupling, where a significant fraction of the available free energy is converted into thermal energy in a short time, provided the beam is sufficiently dense or sufficiently long.
Multigrid Algorithms for the Solution of Linear Complementarity Problems Arising from Free Boundary Problems.

Science.gov (United States)

1980-10-01

solving (1.3); PFAS combines the concepts of multigrid algorithms with those of projected SOR. In Section 3, we discuss the implementation of PFAS, and...numerique de la torsion elasto- plastique d’une barre cylindrique. In Approximation et Methodes Iteratives de Resolution d’Inequations Variationelles et
Angular Multigrid Preconditioner for Krylov-Based Solution Techniques Applied to the Sn Equations with Highly Forward-Peaked Scattering

Science.gov (United States)

Turcksin, Bruno; Ragusa, Jean C.; Morel, Jim E.

2012-01-01

It is well known that the diffusion synthetic acceleration (DSA) methods for the Sn equations become ineffective in the Fokker-Planck forward-peaked scattering limit. In response to this deficiency, Morel and Manteuffel (1991) developed an angular multigrid method for the 1-D Sn equations. This method is very effective, costing roughly twice as much as DSA per source iteration, and yielding a maximum spectral radius of approximately 0.6 in the Fokker-Planck limit. Pautz, Adams, and Morel (PAM) (1999) later generalized the angular multigrid to 2-D, but it was found that the method was unstable with sufficiently forward-peaked mappings between the angular grids. The method was stabilized via a filtering technique based on diffusion operators, but this filtering also degraded the effectiveness of the overall scheme. The spectral radius was not bounded away from unity in the Fokker-Planck limit, although the method remained more effective than DSA. The purpose of this article is to recast the multidimensional PAM angular multigrid method without the filtering as an Sn preconditioner and use it in conjunction with the Generalized Minimal RESidual (GMRES) Krylov method. The approach ensures stability and our computational results demonstrate that it is also significantly more efficient than an analogous DSA-preconditioned Krylov method.
Parallel Computation on Multicore Processors Using Explicit Form of the Finite Element Method and C++ Standard Libraries

Directory of Open Access Journals (Sweden)

Rek Václav

2016-11-01

In this paper, the form of modifications of the existing sequential code written in the C or C++ programming language for the calculation of various kinds of structures using the explicit form of the Finite Element Method (Dynamic Relaxation Method, Explicit Dynamics) in the NEXX system is introduced. The NEXX system is the core of the engineering software NEXIS, Scia Engineer, RFEM and RENEX. It supports multithreaded execution, which can now be provided at the level of the native C++ programming language using standard libraries. Thanks to the high degree of abstraction that the contemporary C++ programming language provides, a library created in this way can be generalized to other uses of parallelism in computational mechanics.
VALIDATION OF CRACK INTERACTION LIMIT MODEL FOR PARALLEL EDGE CRACKS USING TWO-DIMENSIONAL FINITE ELEMENT ANALYSIS

Directory of Open Access Journals (Sweden)

R. Daud

2013-06-01

Shielding interaction effects of two parallel edge cracks in finite thickness plates subjected to remote tension load are analyzed using a developed finite element analysis program. In the present study, the crack interaction limit is evaluated based on the fitness for service (FFS) code, and focus is given to the weak crack interaction region as the crack interval exceeds the length of cracks (b > a). Crack interaction factors are evaluated based on stress intensity factors (SIFs) for Mode I using a displacement extrapolation technique. Parametric studies involved a wide range of crack-to-width (0.05 ≤ a/W ≤ 0.5) and crack interval ratios (b/a > 1). For validation, crack interaction factors are compared with single edge crack SIFs as a state of zero interaction. Within the considered range of parameters, the proposed numerical evaluation used to predict the crack interaction factor reduces the error of the existing analytical solution from 1.92% to 0.97% at higher a/W. In reference to FFS codes, the small discrepancy in the prediction of the crack interaction factor validates the reliability of the numerical model to predict crack interaction limits under shielding interaction effects. In conclusion, the numerical model gave a successful prediction in estimating the crack interaction limit, which can be used as a reference for the shielding orientation of other cracks.
Is the Multigrid Method Fault Tolerant? The Two-Grid Case

Energy Technology Data Exchange (ETDEWEB)

Ainsworth, Mark [Brown Univ., Providence, RI (United States). Division of Applied Mathematics; Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Computer Science and Mathematics Division; Glusa, Christian [Brown Univ., Providence, RI (United States). Division of Applied Mathematics

2016-06-30

The predicted reduced resiliency of next-generation high performance computers means that it will become necessary to take into account the effects of randomly occurring faults on numerical methods. Further, in the event of a hard fault occurring, a decision has to be made as to what remedial action should be taken in order to resume the execution of the algorithm. The action that is chosen can have a dramatic effect on the performance and characteristics of the scheme. Ideally, the resulting algorithm should be subjected to the same kind of mathematical analysis that was applied to the original, deterministic variant. The purpose of this work is to provide an analysis of the behaviour of the multigrid algorithm in the presence of faults. Multigrid is arguably the method of choice for the solution of large-scale linear algebra problems arising from discretization of partial differential equations and it is of considerable importance to anticipate its behaviour on an exascale machine. The analysis of resilience of algorithms is in its infancy and the current work is perhaps the first to provide a mathematical model for faults and analyse the behaviour of a state-of-the-art algorithm under the model. It is shown that the Two Grid Method fails to be resilient to faults. Attention is then turned to identifying the minimal necessary remedial action required to restore the rate of convergence to that enjoyed by the ideal fault-free method.
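For orientation, the fault-free two-grid cycle that the abstract above takes as its baseline can be sketched as follows. This is a minimal illustration, not the authors' code or fault model: the 1D Poisson test problem, the weighted Jacobi smoother, full-weighting restriction, linear interpolation, and all function names are assumptions chosen for brevity.

```python
def apply_laplacian(u, h):
    """Apply the stencil (-1, 2, -1)/h^2 with zero Dirichlet boundary values."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out

def jacobi(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi smoothing; the diagonal of the operator is 2/h^2."""
    for _ in range(sweeps):
        au = apply_laplacian(u, h)
        u = [ui + omega * (h * h / 2.0) * (fi - ai) for ui, fi, ai in zip(u, f, au)]
    return u

def restrict(r):
    """Full weighting; coarse point j coincides with fine index 2j+1."""
    nc = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2] for j in range(nc)]

def prolong(ec, n):
    """Linear interpolation of a coarse correction to n fine points."""
    e = [0.0] * n
    for j, v in enumerate(ec):
        e[2 * j] += 0.5 * v
        e[2 * j + 1] += v
        e[2 * j + 2] += 0.5 * v
    return e

def solve_tridiagonal(f, h):
    """Exact coarse solve of (-1, 2, -1)/h^2 u = f by the Thomas algorithm."""
    n = len(f)
    a, b = -1.0 / (h * h), 2.0 / (h * h)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = a / b, f[0] / b
    for i in range(1, n):
        m = b - a * cp[i - 1]
        cp[i] = a / m
        dp[i] = (f[i] - a * dp[i - 1]) / m
    u = [0.0] * n
    u[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]
    return u

def two_grid_cycle(u, f, h, nu=2):
    """Pre-smooth, correct from the coarse grid, post-smooth."""
    u = jacobi(u, f, h, nu)
    r = [fi - ai for fi, ai in zip(f, apply_laplacian(u, h))]
    ec = solve_tridiagonal(restrict(r), 2.0 * h)
    u = [ui + ei for ui, ei in zip(u, prolong(ec, len(u)))]
    return jacobi(u, f, h, nu)
```

The number of fine-grid unknowns must be odd (for example 31) so that every second point lies on the coarse grid. A fault study in the spirit of the abstract would perturb or discard some of these intermediate vectors; that part is deliberately not modelled here.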
Three-dimensional magnetic field computation on a distributed memory parallel processor

International Nuclear Information System (INIS)

Barion, M.L.

1990-01-01

The analysis of three-dimensional magnetic fields by finite element methods frequently proves too onerous a task for the computing resource on which it is attempted. When non-linear and transient effects are included, it may become impossible to calculate the field distribution to sufficient resolution. One approach to this problem is to exploit the natural parallelism in the finite element method via parallel processing. This paper reports on an implementation of a finite element code for non-linear three-dimensional low-frequency magnetic field calculation on Intel's iPSC/2
An inherently parallel method for solving discretized diffusion equations

International Nuclear Information System (INIS)

Eccleston, B.R.; Palmer, T.S.

1999-01-01

A Monte Carlo approach to solving linear systems of equations is being investigated in the context of the solution of discretized diffusion equations. While the technique was originally devised decades ago, changes in computer architectures (namely, massively parallel machines) have driven the authors to revisit this technique. There are a number of potential advantages to this approach: (1) Analog Monte Carlo techniques are inherently parallel; this is not necessarily true of today's more advanced linear equation solvers (multigrid, conjugate gradient, etc.); (2) Some forms of this technique are adaptive in that they allow the user to specify locations in the problem where resolution is of particular importance and to concentrate the work at those locations; and (3) These techniques permit the solution of very large systems of equations in that matrix elements need not be stored. The user could trade calculational speed for storage if elements of the matrix are calculated on the fly. The goal of this study is to compare the parallel performance of Monte Carlo linear solvers to that of a more traditional parallelized linear solver. The authors observe the linear speedup that they expect from the Monte Carlo algorithm, given that there is no domain decomposition to cause significant communication overhead. Overall, PETSc outperforms the Monte Carlo solver for the test problem. The PETSc parallel performance improves with larger numbers of unknowns for a given number of processors. Parallel performance of the Monte Carlo technique is independent of the size of the matrix and the number of processes. They are investigating modifications to the scheme to accommodate matrix problems with positive off-diagonal elements. They are also currently coding an on-the-fly version of the algorithm to investigate the solution of very large linear systems.
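The kind of random-walk solver described above can be illustrated with a short sketch. It is an assumption-laden example rather than the authors' scheme: it uses the Jacobi splitting x = Hx + D⁻¹b with H = I − D⁻¹A, requires a strictly diagonally dominant matrix with non-positive off-diagonal entries (so that the entries of H can serve directly as transition probabilities), and the names `mc_solve_component` and `mc_solve` are hypothetical.

```python
import random

def mc_solve_component(A, b, i, n_walks, rng):
    """Estimate component i of the solution of A x = b by absorbing random walks.

    Each walk starts in state i, scores b[s]/A[s][s] in every state s it visits,
    moves from s to j with probability -A[s][j]/A[s][s], and is absorbed with
    the remaining probability.
    """
    n = len(A)
    total = 0.0
    for _ in range(n_walks):
        state = i
        while True:
            total += b[state] / A[state][state]
            u = rng.random()
            acc = 0.0
            nxt = None
            for j in range(n):
                if j != state and A[state][j] != 0.0:
                    acc += -A[state][j] / A[state][state]
                    if u < acc:
                        nxt = j
                        break
            if nxt is None:
                break  # walk absorbed
            state = nxt
    return total / n_walks

def mc_solve(A, b, n_walks=20000, seed=0):
    """Estimate every component independently; no matrix factorisation is stored."""
    rng = random.Random(seed)
    return [mc_solve_component(A, b, i, n_walks, rng) for i in range(len(A))]
```

Because each component, and each walk within it, is independent of the others, the walks can be distributed over processes without domain decomposition, which is consistent with the linear speedup reported in the abstract. The statistical error decreases only as the inverse square root of the number of walks.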
Adaptive tree multigrids and simplified spherical harmonics approximation in deterministic neutral and charged particle transport

International Nuclear Information System (INIS)

Kotiluoto, P.

2007-05-01

A new deterministic three-dimensional neutral and charged particle transport code, MultiTrans, has been developed. In the novel approach, the adaptive tree multigrid technique is used in conjunction with simplified spherical harmonics approximation of the Boltzmann transport equation. The development of the new radiation transport code started in the framework of the Finnish boron neutron capture therapy (BNCT) project. Since the application of the MultiTrans code to BNCT dose planning problems, the testing and development of the MultiTrans code has continued in conventional radiotherapy and reactor physics applications. In this thesis, an overview of different numerical radiation transport methods is first given. Special features of the simplified spherical harmonics method and the adaptive tree multigrid technique are then reviewed. The usefulness of the new MultiTrans code has been indicated by verifying and validating the code performance for different types of neutral and charged particle transport problems, reported in separate publications. (orig.)
¹⁰B multi-grid proportional gas counters for large area thermal neutron detectors

Energy Technology Data Exchange (ETDEWEB)

Andersen, K. [ESS, P.O. Box 176, SE-221 00 Lund (Sweden); Bigault, T. [ILL, BP 156, 6, rue Jules Horowitz, 38042 Grenoble Cedex 9 (France); Birch, J. [Linköping University, SE-581, 83 Linköping (Sweden); Buffet, J. C.; Correa, J. [ILL, BP 156, 6, rue Jules Horowitz, 38042 Grenoble Cedex 9 (France); Hall-Wilton, R. [ESS, P.O. Box 176, SE-221 00 Lund (Sweden); Hultman, L. [Linköping University, SE-581, 83 Linköping (Sweden); Höglund, C. [ESS, P.O. Box 176, SE-221 00 Lund (Sweden); Linköping University, SE-581, 83 Linköping (Sweden); Guérard, B., E-mail: guerard@ill.fr [ILL, BP 156, 6, rue Jules Horowitz, 38042 Grenoble Cedex 9 (France); Jensen, J. [Linköping University, SE-581, 83 Linköping (Sweden); Khaplanov, A. [ILL, BP 156, 6, rue Jules Horowitz, 38042 Grenoble Cedex 9 (France); ESS, P.O. Box 176, SE-221 00 Lund (Sweden); Kirstein, O. [Linköping University, SE-581, 83 Linköping (Sweden); Piscitelli, F.; Van Esch, P. [ILL, BP 156, 6, rue Jules Horowitz, 38042 Grenoble Cedex 9 (France); Vettier, C. [ESS, P.O. Box 176, SE-221 00 Lund (Sweden)

2013-08-21

³He was a popular material in neutron detectors until its availability dropped drastically in 2008. The development of techniques based on alternative convertors is now of high priority for neutron research institutes. Thin films of ¹⁰B or ¹⁰B₄C have been used in gas proportional counters to detect neutrons, but until now, only for small or medium sensitive areas. We present here the multi-grid design, introduced at the ILL and developed in collaboration with ESS for LAN (large area neutron) detectors. Typically thirty ¹⁰B₄C films of 1 μm thickness are used to convert neutrons into ionizing particles which are subsequently detected in a proportional gas counter. The principle and the fabrication of the multi-grid are described and some preliminary results obtained with a prototype of 200 cm × 8 cm are reported; a detection efficiency of 48% has been measured at 2.5 Å with a monochromatic neutron beam line, showing the good potential of this new technique.
FILMPAR: A parallel algorithm designed for the efficient and accurate computation of thin film flow on functional surfaces containing micro-structure

Science.gov (United States)

Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.

2009-12-01

FILMPAR is a highly efficient and portable parallel multigrid algorithm for solving a discretised form of the lubrication approximation to three-dimensional, gravity-driven, continuous thin film free-surface flow over substrates containing micro-scale topography. While generally applicable to problems involving heterogeneous and distributed features, for illustrative purposes the algorithm is benchmarked on a distributed memory IBM BlueGene/P computing platform for the case of flow over a single trench topography, enabling direct comparison with complementary experimental data and existing serial multigrid solutions. Parallel performance is assessed as a function of the number of processors employed and shown to lead to super-linear behaviour for the production of mesh-independent solutions. In addition, the approach is used to solve for the case of flow over a complex inter-connected topographical feature and a description provided of how FILMPAR could be adapted relatively simply to solve for a wider class of related thin film flow problems. Program summary. Program title: FILMPAR; Catalogue identifier: AEEL_v1_0; Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEL_v1_0.html; Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland; Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html; No. of lines in distributed program, including test data, etc.: 530 421; No. of bytes in distributed program, including test data, etc.: 1 960 313; Distribution format: tar.gz; Programming language: C++ and MPI; Computer: Desktop, server; Operating system: Unix/Linux, Mac OS X; Has the code been vectorised or parallelised?: Yes. Tested with up to 128 processors; RAM: 512 MBytes; Classification: 12; External routines: GNU C/C++, MPI; Nature of problem: Thin film flows over functional substrates containing well-defined single and complex topographical features are of enormous significance, having a wide variety of engineering

Measuring Communication in Parallel Communicating Finite Automata

Directory of Open Access Journals (Sweden)

Henning Bordihn

2014-05-01

Systems of deterministic finite automata communicating by sending their states upon request are investigated, when the amount of communication is restricted. The computational power and decidability properties are studied for the case of returning centralized systems, when the number of necessary communications during the computations of the system is bounded by a function depending on the length of the input. It is proved that an infinite hierarchy of language families exists, depending on the number of messages sent during their most economical recognitions. Moreover, several properties are shown to be not semi-decidable for the systems under consideration.
A scalable approach to modeling groundwater flow on massively parallel computers

International Nuclear Information System (INIS)

Ashby, S.F.; Falgout, R.D.; Tompson, A.F.B.

1995-12-01

We describe a fully scalable approach to the simulation of groundwater flow on a hierarchy of computing platforms, ranging from workstations to massively parallel computers. Specifically, we advocate the use of scalable conceptual models in which the subsurface model is defined independently of the computational grid on which the simulation takes place. We also describe a scalable multigrid algorithm for computing the groundwater flow velocities. We are thus able to leverage both the engineer's time spent developing the conceptual model and the computing resources used in the numerical simulation. We have successfully employed this approach at the LLNL site, where we have run simulations ranging in size from just a few thousand spatial zones (on workstations) to more than eight million spatial zones (on the CRAY T3D), all using the same conceptual model.
Inversion of potential field data using the finite element method on parallel computers

Science.gov (United States)

Gross, L.; Altinay, C.; Shaw, S.

2015-11-01

In this paper we present a formulation of the joint inversion of potential field anomaly data as an optimization problem with partial differential equation (PDE) constraints. The problem is solved using the iterative Broyden-Fletcher-Goldfarb-Shanno (BFGS) method with the Hessian operator of the regularization and cross-gradient component of the cost function as preconditioner. We will show that each iterative step requires the solution of several PDEs namely for the potential fields, for the adjoint defects and for the application of the preconditioner. In extension to the traditional discrete formulation the BFGS method is applied to continuous descriptions of the unknown physical properties in combination with an appropriate integral form of the dot product. The PDEs can easily be solved using standard conforming finite element methods (FEMs) with potentially different resolutions. For two examples we demonstrate that the number of PDE solutions required to reach a given tolerance in the BFGS iteration is controlled by weighting regularization and cross-gradient but is independent of the resolution of PDE discretization and that as a consequence the method is weakly scalable with the number of cells on parallel computers. We also show a comparison with the UBC-GIF GRAV3D code.
Optimization of Finite-Differencing Kernels for Numerical Relativity Applications

Directory of Open Access Journals (Sweden)

Roberto Alfieri

2018-05-01

A simple optimization strategy for the computation of 3D finite-differencing kernels on many-core architectures is proposed. The 3D finite-differencing computation is split direction-by-direction and exploits two levels of parallelism: in-core vectorization and multi-threaded shared-memory parallelization. The main application of this method is to accelerate the high-order stencil computations in numerical relativity codes. Our proposed method provides substantial speedup in computations involving tensor contractions and 3D stencil calculations on different processor microarchitectures, including Intel Knights Landing.
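The direction-by-direction splitting mentioned above can be pictured with a small scalar sketch. It only illustrates the decomposition into one pass per coordinate direction and is not the authors' kernel: the fourth-order centred stencil, the flat x-fastest memory layout, and the names `flat_index` and `diff_axis` are assumptions, and a production kernel would vectorize the innermost loop and thread the outer ones.

```python
def flat_index(i, j, k, shape):
    """Index into a flat array in which x is the contiguous (fastest) direction."""
    nx, ny, _ = shape
    return i + nx * (j + ny * k)

def diff_axis(f, shape, axis, h):
    """Fourth-order centred first derivative along one axis (0 = x, 1 = y, 2 = z).

    Points closer than two cells to the boundary along `axis` are left at 0.0.
    """
    nx, ny, nz = shape
    stride = (1, nx, nx * ny)[axis]   # distance in memory between neighbours
    n_axis = shape[axis]
    c = 1.0 / (12.0 * h)
    out = [0.0] * len(f)
    for k in range(nz):
        for j in range(ny):
            for i in range(nx):       # contiguous loop: the vectorization target
                pos = (i, j, k)[axis]
                if pos < 2 or pos > n_axis - 3:
                    continue
                p = flat_index(i, j, k, shape)
                out[p] = c * (f[p - 2 * stride] - 8.0 * f[p - stride]
                              + 8.0 * f[p + stride] - f[p + 2 * stride])
    return out
```

Calling `diff_axis` three times, once per axis, yields the three partial derivatives; the same stencil coefficients are reused in every pass and only the memory stride changes.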
Finite-Larmor-radius stability theory of EBT plasmas

International Nuclear Information System (INIS)

Berk, H.L.; Cheng, C.Z.; Rosenbluth, M.N.; Van Dam, J.W.

1982-11-01

An eikonal ballooning-mode formalism is developed to describe curvature-driven modes of hot electron plasmas in bumpy tori. The formalism treats frequencies comparable to the ion-cyclotron frequency, as well as arbitrary finite Larmor radius and field polarization, although the detailed analysis is restricted to E_∥ = 0. Moderate hot-electron finite-Larmor-radius effects are found to lower the background beta core limit, whereas strong finite-Larmor-radius effects produce stabilization.
On the fixed-stress split scheme as smoother in multigrid methods for coupling flow and geomechanics

NARCIS (Netherlands)

F.J. Gaspar Lorenz (Franscisco); C. Rodrigo (Carmen)

2017-01-01

The fixed-stress split method has been widely used as a solution method in the coupling of flow and geomechanics. In this work, we analyze the behavior of an inexact version of this algorithm as a smoother within a geometric multigrid method, in order to obtain an efficient monolithic solver.
Multigrid preconditioned conjugate-gradient method for large-scale wave-front reconstruction.

Science.gov (United States)

Gilles, Luc; Vogel, Curtis R; Ellerbroek, Brent L

2002-09-01

We introduce a multigrid preconditioned conjugate-gradient (MGCG) iterative scheme for computing open-loop wave-front reconstructors for extreme adaptive optics systems. We present numerical simulations for a 17-m class telescope with n = 48756 sensor measurement grid points within the aperture, which indicate that our MGCG method has a rapid convergence rate for a wide range of subaperture average slope measurement signal-to-noise ratios. The total computational cost is of order n log n. Hence our scheme provides for fast wave-front simulation and control in large-scale adaptive optics systems.
Programming the finite element method

CERN Document Server

Smith, I M; Margetts, L

2013-01-01

Many students, engineers, scientists and researchers have benefited from the practical, programming-oriented style of the previous editions of Programming the Finite Element Method, learning how to develop computer programs to solve specific engineering problems using the finite element method. This new fifth edition offers timely revisions that include programs and subroutine libraries fully updated to Fortran 2003, which are freely available online, and provides updated material on advances in parallel computing, thermal stress analysis, plasticity return algorithms, convection boundary c
Expressiveness modulo Bisimilarity of Regular Expressions with Parallel Composition (Extended Abstract)

Directory of Open Access Journals (Sweden)

Jos C. M. Baeten

2010-11-01

The languages accepted by finite automata are precisely the languages denoted by regular expressions. In contrast, finite automata may exhibit behaviours that cannot be described by regular expressions up to bisimilarity. In this paper, we consider extensions of the theory of regular expressions with various forms of parallel composition and study the effect on expressiveness. First we prove that adding pure interleaving to the theory of regular expressions strictly increases its expressiveness up to bisimilarity. Then, we prove that replacing the operation for pure interleaving by ACP-style parallel composition gives a further increase in expressiveness. Finally, we prove that the theory of regular expressions with ACP-style parallel composition and encapsulation is expressive enough to express all finite automata up to bisimilarity. Our results extend the expressiveness results obtained by Bergstra, Bethke and Ponse for process algebras with (the binary variant of) Kleene's star operation.
Multi-grid and ICCG for problems with interfaces

International Nuclear Information System (INIS)

Dendy, J.E.; Hyman, J.M.

1980-01-01

Computation times for the multi-grid (MG) algorithm, the incomplete Cholesky conjugate gradient (ICCG) algorithm [J. Comp. Phys. 26, 43-65 (1978); Math. Comp. 31, 148-162 (1977)], and the modified ICCG (MICCG) algorithm [BIT 18, 142-156 (1978)] to solve elliptic partial differential equations are compared. The MICCG and ICCG algorithms are more robust than the MG for general positive definite systems. A major advantage of the MG algorithm is that the structure of the problem can be exploited to reduce the solution time significantly. Five example problems are discussed. For problems with little structure and for one-shot calculations ICCG is recommended over MG, and MICCG, over ICCG. For problems that are done many times, it is worth investing the effort to study methods like MG. 1 table
Design and fabrication of multigrid X-ray collimators. [For airborne x-ray spectroscopy]

Energy Technology Data Exchange (ETDEWEB)

Acton, L W; Joki, E G; Salmon, R J [Lockheed Missiles and Space Co., Palo Alto, Calif. (USA). Lockheed Palo Alto Research Lab.

1976-08-01

Multigrid X-ray collimators continue to find wide application in space research. This paper treats the principles of their design and fabrication and summarizes the experience obtained in making and flying thirteen such collimators ranging in angular resolution from 10 to 0.7 arc min FWHM. Included is a summary of a survey of scientist-users and industrial producers of collimator grids regarding grid materials, precision, plating, hole quality and results of acceptance testing.
Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

Energy Technology Data Exchange (ETDEWEB)

Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

1997-03-01

Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving's meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia's "tiling" dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.
A Multigrid Algorithm for an Elliptic Problem with a Perturbed Boundary Condition

KAUST Repository

Bonito, Andrea; Pasciak, Joseph E.

2013-01-01

We discuss the preconditioning of systems coupling elliptic operators in Ω⊂Rd, d=2,3, with elliptic operators defined on hypersurfaces. These systems arise naturally when physical phenomena are affected by geometric boundary forces, such as the evolution of liquid drops subject to surface tension. The resulting operators are sums of interior and boundary terms weighted by parameters. We investigate the behavior of multigrid algorithms suited to this context and demonstrate numerical results which suggest uniform preconditioning bounds that are level and parameter independent.
Massively parallel evolutionary computation on GPGPUs

CERN Document Server

Tsutsui, Shigeyoshi

2013-01-01

Evolutionary algorithms (EAs) are metaheuristics that learn from natural collective behavior and are applied to solve optimization problems in domains such as scheduling, engineering, bioinformatics, and finance. Such applications demand acceptable solutions with high-speed execution using finite computational resources. Therefore, there have been many attempts to develop platforms for running parallel EAs using multicore machines, massively parallel cluster machines, or grid computing environments. Recent advances in general-purpose computing on graphics processing units (GPGPU) have opened u
Local multigrid mesh refinement in view of nuclear fuel 3D modelling in pressurised water reactors

International Nuclear Information System (INIS)

Barbie, L.

2013-01-01

The aim of this study is to improve the performance, in terms of memory space and computational time, of the current modelling of the Pellet-Cladding mechanical Interaction (PCI), a complex phenomenon which may occur during high power rises in pressurised water reactors. Among the mesh refinement methods (methods dedicated to efficiently treating local singularities), a local multi-grid approach was selected because it enables the use of a black-box solver while dealing with few degrees of freedom at each level. The Local Defect Correction (LDC) method, well suited to a finite element discretization, was first analysed and checked in linear elasticity, on configurations resulting from the PCI, since its use in solid mechanics is not widespread. Various strategies concerning the implementation of the multilevel algorithm were also compared. Coupling the LDC method with the Zienkiewicz-Zhu a posteriori error estimator, in order to automatically detect the zones to be refined, was then tested. Performances obtained on two-dimensional and three-dimensional cases are very satisfactory, since the algorithm proposed is more efficient than h-adaptive refinement methods. Lastly, the LDC algorithm was extended to nonlinear mechanics. Space/time refinement as well as transmission of the initial conditions during the re-meshing step were looked at. The first results obtained are encouraging and show the interest of using the LDC method for PCI modelling. (author)
Massively Parallel Geostatistical Inversion of Coupled Processes in Heterogeneous Porous Media

Science.gov (United States)

Ngo, A.; Schwede, R. L.; Li, W.; Bastian, P.; Ippisch, O.; Cirpka, O. A.

2012-04-01

The quasi-linear geostatistical approach is an inversion scheme that can be used to estimate the spatial distribution of a heterogeneous hydraulic conductivity field. The estimated parameter field is considered to be a random variable that varies continuously in space, meets the measurements of dependent quantities (such as the hydraulic head, the concentration of a transported solute or its arrival time) and shows the required spatial correlation (described by certain variogram models). This is a method of conditioning a parameter field to observations. Upon discretization, this results in as many parameters as elements of the computational grid. For a full three-dimensional representation of the heterogeneous subsurface it is hardly sufficient to work with resolutions (up to one million parameters) of the model domain that can be achieved on a serial computer. The forward problems to be solved within the inversion procedure consist of the elliptic steady-state groundwater flow equation and the formally elliptic but nearly hyperbolic steady-state advection-dominated solute transport equation in a heterogeneous porous medium. Both equations are discretized by Finite Element Methods (FEM) using fully scalable domain decomposition techniques. Whereas standard conforming FEM is sufficient for the flow equation, for the advection-dominated transport equation, which raises well-known numerical difficulties at sharp fronts or boundary layers, we use the streamline diffusion approach. The arising linear systems are solved using efficient iterative solvers with an AMG (algebraic multigrid) preconditioner. During each iteration step of the inversion scheme one needs to solve a multitude of forward and adjoint problems in order to calculate the sensitivities of each measurement and the related cross-covariance matrix of the unknown parameters and the observations. In order to reduce interprocess communications and to improve the scalability of the code on larger clusters
On Chudnovsky-Based Arithmetic Algorithms in Finite Fields

OpenAIRE

Atighehchi, Kevin; Ballet, Stéphane; Bonnecaze, Alexis; Rolland, Robert

2015-01-01

Thanks to a new construction of the so-called Chudnovsky-Chudnovsky multiplication algorithm, we design efficient algorithms for both the exponentiation and the multiplication in finite fields. They are tailored to hardware implementation and they allow computations to be parallelized while maintaining a low number of bilinear multiplications. We give an example with the finite field ${\mathbb F}_{16^{13}}$.
A preconditioner for the finite element computation of incompressible, nonlinear elastic deformations

Science.gov (United States)

Whiteley, J. P.

2017-10-01

Large, incompressible elastic deformations are governed by a system of nonlinear partial differential equations. The finite element discretisation of these partial differential equations yields a system of nonlinear algebraic equations that are usually solved using Newton's method. On each iteration of Newton's method, a linear system must be solved. We exploit the structure of the Jacobian matrix to propose a preconditioner, comprising two steps. The first step is the solution of a relatively small, symmetric, positive definite linear system using the preconditioned conjugate gradient method. This is followed by a small number of multigrid V-cycles for a larger linear system. Through the use of exemplar elastic deformations, the preconditioner is demonstrated to facilitate the iterative solution of the linear systems arising. The number of GMRES iterations required has only a very weak dependence on the number of degrees of freedom of the linear systems.
On the use of diffusion synthetic acceleration in parallel 3D neutral particle transport calculations

International Nuclear Information System (INIS)

Brown, P.; Chang, B.

1998-01-01

The linear Boltzmann transport equation (BTE) is an integro-differential equation arising in deterministic models of neutral and charged particle transport. In slab (one-dimensional Cartesian) geometry and certain higher-dimensional cases, Diffusion Synthetic Acceleration (DSA) is known to be an effective algorithm for the iterative solution of the discretized BTE. Fourier and asymptotic analyses have been applied to various idealizations (e.g., problems on infinite domains with constant coefficients) to obtain sharp bounds on the convergence rate of DSA in such cases. While DSA has been shown to be a highly effective acceleration (or preconditioning) technique in one-dimensional problems, it has been observed to be less effective in higher dimensions. This is due in part to the expense of solving the related diffusion linear system. We investigate here the effectiveness of a parallel semicoarsening multigrid (SMG) solution approach to DSA preconditioning in several three dimensional problems. In particular, we consider the algorithmic and implementation scalability of a parallel SMG-DSA preconditioner on several types of test problems
FACC: A Novel Finite Automaton Based on Cloud Computing for the Multiple Longest Common Subsequences Search

Directory of Open Access Journals (Sweden)

Yanni Li

2012-01-01

Searching for the multiple longest common subsequences (MLCS) has significant applications in areas such as bioinformatics, information processing, and data mining. Although a few parallel MLCS algorithms have been proposed, their efficiency and effectiveness are not satisfactory given the increasing complexity and size of biological data. To overcome the shortcomings of the existing MLCS algorithms, and considering that the MapReduce parallel framework of cloud computing is a promising technology for cost-effective high performance parallel computing, a novel finite automaton (FA) based on cloud computing, called FACC, is proposed under the MapReduce parallel framework, so as to obtain a more efficient and effective general parallel MLCS algorithm. FACC adopts the ideas of matched pairs and finite automata by preprocessing sequences, constructing successor tables, and building a common-subsequence finite automaton to search for MLCS. Simulation experiments on a set of benchmarks from both real DNA and amino acid sequences have been conducted, and the results show that the proposed FACC algorithm outperforms the current leading parallel MLCS algorithm, FAST-MLCS.

Multigrid methods for S_N problems

International Nuclear Information System (INIS)

Nowak, P.F.; Larsen, E.W.; Martin, W.R.

1987-01-01

It has long been known that the standard source iteration (SI) method for obtaining iterative solutions of S_N problems is very slowly converging in optically thick regions with low absorption. The rebalance and diffusion synthetic acceleration (DSA) methods are generalizations of SI that have been developed to accelerate convergence, but neither of these methods has been completely successful. In particular, the rebalance method tends to become unstable in problems where it is needed most (problems with high scattering ratios c = 1), while the DSA method, to be implemented in a stable fashion, requires the solution of a particular system of acceleration equations, and this has been done efficiently in two-dimensional geometries only for the diamond difference S_N equations. This paper discusses another extension of the SI method, namely, SI combined with the spatial multigrid algorithm (SIMG). This appears to be a viable way to accelerate many S_N problems in multidimensional geometries, provided the finest mesh consists of cells that are not optically thick
On Start to End Simulation and Modeling Issues of the Megawatt Proton Beam Facility at PSI

CERN Document Server

Adelmann, Andreas; Fitze, Hansruedi; Geus, Roman; Humbel, Martin; Stingelin, Lukas

2005-01-01

At the Paul Scherrer Institut (PSI) we routinely extract a one megawatt (CW) proton beam out of our 590 MeV Ring Cyclotron. In the frame of the ongoing upgrade program, large scale simulations have been undertaken in order to provide a sound basis to assess the behaviour of very intense beams in cyclotrons. The challenges and attempts towards massively parallel three-dimensional start-to-end simulations will be discussed. The state-of-the-art numerical tools used (mapping techniques, time integration, parallel FFT and finite element based multigrid Poisson solver) and their parallel implementation will be discussed. Results will be presented in the area of: space charge dominated beam transport including neighbouring turns, eigenmode analysis to obtain accurate electromagnetic fields in the large rf cavities and higher order mode interaction between the electromagnetic fields and the particle beam. For the problems investigated so far a good agreement between theory i.e. calculations and measurements is obtain...
Parallel computation of transverse wakes in linear colliders

International Nuclear Information System (INIS)

Zhan, Xiaowei; Ko, Kwok.

1996-11-01

SLAC has proposed the detuned structure (DS) as one possible design to control the emittance growth of long bunch trains due to transverse wakefields in the Next Linear Collider (NLC). The DS consists of 206 cells with tapering from cell to cell of the order of a few microns to provide Gaussian detuning of the dipole modes. The decoherence of these modes leads to a reduction of two orders of magnitude in the wakefield experienced by the trailing bunch. Modeling such a large heterogeneous structure realistically is impractical with finite-difference codes using structured grids. The authors have calculated the wakefield in the DS on a parallel computer with a finite-element code using an unstructured grid. The parallel implementation issues are presented along with simulation results that include contributions from higher dipole bands and wall dissipation
The development of an algebraic multigrid algorithm for symmetric positive definite linear systems

Energy Technology Data Exchange (ETDEWEB)

Vanek, P.; Mandel, J.; Brezina, M. [Univ. of Colorado, Denver, CO (United States)

1996-12-31

An algebraic multigrid algorithm for symmetric, positive definite linear systems is developed based on the concept of prolongation by smoothed aggregation. Coarse levels are generated automatically. We present a set of requirements motivated heuristically by a convergence theory. The algorithm then attempts to satisfy the requirements. Inputs to the method are the coefficient matrix and zero energy modes, which are determined from nodal coordinates and knowledge of the differential equation. Efficiency of the resulting algorithm is demonstrated by computational results on real world problems from solid elasticity, plate bending, and shells.
A two-level parallel direct search implementation for arbitrarily sized objective functions

Energy Technology Data Exchange (ETDEWEB)

Hutchinson, S.A.; Shadid, N.; Moffat, H.K. [Sandia National Labs., Albuquerque, NM (United States)] [and others

1994-12-31

In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p_2) of processors. If p, the number of processors available, is greater than or equal to 2p_2, then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allows for peak parallel efficiency, and then further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for an nCUBE 2 implementation.
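The processor-count condition stated in the abstract above can be illustrated with a small sketch. The function name `pds_layout` and the rule that leftover processors simply stay idle are assumptions made for illustration; the abstract only states that the optimization itself can be parallelized once p is at least 2*p_2.

```python
def pds_layout(p: int, p2: int) -> dict:
    """Split p processors into groups of p2, one group per objective evaluation.

    Hypothetical helper: the grouping rule is an assumption, not taken from
    the paper. Only the test p >= 2 * p2 comes from the abstract.
    """
    if p2 < 1 or p < p2:
        raise ValueError("need 1 <= p2 <= p")
    groups = p // p2  # objective functions that can be evaluated concurrently
    return {
        "groups": groups,
        "idle": p - groups * p2,              # processors left without a group
        "optimization_parallel": p >= 2 * p2,  # condition quoted in the abstract
    }
```

For example, with p = 64 and p_2 = 16 this layout gives four concurrent objective evaluations, whereas p = 24 leaves room for only one group, so the search itself would remain serial.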
A NetCDF version of the two-dimensional energy balance model based on the full multigrid algorithm

Directory of Open Access Journals (Sweden)

Kelin Zhuang

2017-01-01

A NetCDF version of the two-dimensional energy balance model based on the full multigrid method in Fortran is introduced for both pedagogical and research purposes. Based on the land–sea–ice distribution, orbital elements, greenhouse gases concentration, and albedo, the code calculates the global seasonal surface temperature. A step-by-step guide with examples is provided for practice.
A NetCDF version of the two-dimensional energy balance model based on the full multigrid algorithm

Science.gov (United States)

Zhuang, Kelin; North, Gerald R.; Stevens, Mark J.

A NetCDF version of the two-dimensional energy balance model based on the full multigrid method in Fortran is introduced for both pedagogical and research purposes. Based on the land-sea-ice distribution, orbital elements, greenhouse gases concentration, and albedo, the code calculates the global seasonal surface temperature. A step-by-step guide with examples is provided for practice.
Combinatorics of spreads and parallelisms

CERN Document Server

Johnson, Norman

2010-01-01

Partitions of Vector Spaces; Quasi-Subgeometry Partitions; Finite Focal-Spreads; Generalizing André Spreads; The Going Up Construction for Focal-Spreads; Subgeometry Partitions; Subgeometry and Quasi-Subgeometry Partitions; Subgeometries from Focal-Spreads; Extended André Subgeometries; Kantor's Flag-Transitive Designs; Maximal Additive Partial Spreads; Subplane Covered Nets and Baer Groups; Partial Desarguesian t-Parallelisms; Direct Products of Affine Planes; Jha-Johnson SL(2,
Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation

Science.gov (United States)

Cai, Xiao-Chuan; Gropp, William D.; Keyes, David E.; Melvin, Robin G.; Young, David P.

1996-01-01

We study parallel two-level overlapping Schwarz algorithms for solving nonlinear finite element problems, in particular, for the full potential equation of aerodynamics discretized in two dimensions with bilinear elements. The overall algorithm, Newton-Krylov-Schwarz (NKS), employs an inexact finite-difference Newton method and a Krylov space iterative method, with a two-level overlapping Schwarz method as a preconditioner. We demonstrate that NKS, combined with a density upwinding continuation strategy for problems with weak shocks, is robust and economical for this class of mixed elliptic-hyperbolic nonlinear partial differential equations, with proper specification of several parameters. We study upwinding parameters, inner convergence tolerance, coarse grid density, subdomain overlap, and the level of fill-in in the incomplete factorization, and report their effect on numerical convergence rate, overall execution time, and parallel efficiency on a distributed-memory parallel computer.
Parallel implementation of the PHOENIX generalized stellar atmosphere program. II. Wavelength parallelization

International Nuclear Information System (INIS)

Baron, E.; Hauschildt, Peter H.

1998-01-01

We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, by distributing the radial zones, individual spectral lines, or characteristic rays among different processors, and employed, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000 - 300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard message passing interface (MPI) library calls and is fully portable between serial and parallel computers. Copyright 1998 The American Astronomical Society
Domain decomposition methods and parallel computing

International Nuclear Information System (INIS)

Meurant, G.

1991-01-01

In this paper, we show how to efficiently solve large linear systems on parallel computers. These linear systems arise from discretization of scientific computing problems described by systems of partial differential equations. We show how to get a discrete finite dimensional system from the continuous problem, and the chosen conjugate gradient iterative algorithm is briefly described. Then, the different kinds of parallel architectures are reviewed and their advantages and deficiencies are emphasized. We sketch the problems found in programming the conjugate gradient method on parallel computers. For this algorithm to be efficient on parallel machines, domain decomposition techniques are introduced. We give results of numerical experiments showing that these techniques allow a good rate of convergence for the conjugate gradient algorithm as well as computational speeds in excess of a billion floating-point operations per second. (author). 5 refs., 11 figs., 2 tabs., 1 inset
Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

Science.gov (United States)

Hsieh, Shang-Hsien

1993-01-01

The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
Wakefield calculations on parallel computers

International Nuclear Information System (INIS)

Schoessow, P.

1990-01-01

The use of parallelism in the solution of wakefield problems is illustrated for two different computer architectures (SIMD and MIMD). Results are given for finite difference codes which have been implemented on a Connection Machine and an Alliant FX/8 and which are used to compute wakefields in dielectric loaded structures. Benchmarks on code performance are presented for both cases. 4 refs., 3 figs., 2 tabs
Parallel and non-parallel laminar mixed convection flow in an inclined tube: The effect of the boundary conditions

International Nuclear Information System (INIS)

Barletta, A.

2008-01-01

The necessary condition for the onset of parallel flow in the fully developed region of an inclined duct is applied to the case of a circular tube. Parallel flow in inclined ducts is an uncommon regime, since in most cases buoyancy tends to produce the onset of secondary flow. The present study shows how proper thermal boundary conditions may preserve the parallel flow regime. Mixed convection flow is studied for a special non-axisymmetric thermal boundary condition that, with a proper choice of a switch parameter, may be compatible with parallel flow. More precisely, a circumferentially variable heat flux distribution is prescribed on the tube wall, expressed as a sinusoidal function of the azimuthal coordinate θ with period 2π. A π/2 rotation in the position of the maximum heat flux, achieved by setting the switch parameter, may or may not allow the existence of parallel flow. Two cases are considered, corresponding to parallel and non-parallel flow. In the first case, the governing balance equations allow a simple analytical solution. In the second case, by contrast, the local balance equations are solved numerically by employing a finite element method
Layer-oriented multigrid wavefront reconstruction algorithms for multi-conjugate adaptive optics

Science.gov (United States)

Gilles, Luc; Ellerbroek, Brent L.; Vogel, Curtis R.

2003-02-01

Multi-conjugate adaptive optics (MCAO) systems with 10^4-10^5 degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wavefront control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of AO degrees of freedom. In this paper, we develop an iterative sparse matrix implementation of minimum variance wavefront reconstruction for telescope diameters up to 32m with more than 10^4 actuators. The basic approach is the preconditioned conjugate gradient method, using a multigrid preconditioner incorporating a layer-oriented (block) symmetric Gauss-Seidel iterative smoothing operator. We present open-loop numerical simulation results to illustrate algorithm convergence.
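The preconditioned conjugate gradient idea named in the abstract above can be sketched on a toy problem. This is a simplified stand-in, not the authors' reconstructor: it applies a single symmetric Gauss-Seidel sweep directly as the preconditioner on a small dense symmetric positive definite matrix, and it builds neither the multigrid hierarchy nor the layer-oriented block structure. All function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def sgs_apply(A, r):
    """Return z = M^{-1} r for M = (D + L) D^{-1} (D + U), one SGS sweep."""
    n = len(r)
    y = [0.0] * n
    for i in range(n):  # forward solve (D + L) y = r
        s = sum(A[i][j] * y[j] for j in range(i))
        y[i] = (r[i] - s) / A[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):  # backward solve (D + U) z = D y
        s = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (A[i][i] * y[i] - s) / A[i][i]
    return z


def pcg(A, b, tol=1e-10, max_iter=200):
    """Preconditioned CG for SPD A; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the zero initial guess
    z = sgs_apply(A, r)
    p = list(z)
    rz = dot(r, z)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        z = sgs_apply(A, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Symmetric Gauss-Seidel is used here because, for a symmetric matrix, the forward and backward sweeps together define a symmetric positive definite preconditioner, which conjugate gradients requires.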
Neurite, a finite difference large scale parallel program for the simulation of electrical signal propagation in neurites under mechanical loading.

Directory of Open Access Journals (Sweden)

Julián A García-Grajales

With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite--explicit and implicit--were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon
Modern industrial simulation tools: Kernel-level integration of high performance parallel processing, object-oriented numerics, and adaptive finite element analysis. Final report, July 16, 1993--September 30, 1997

Energy Technology Data Exchange (ETDEWEB)

Deb, M.K.; Kennon, S.R.

1998-04-01

A cooperative R&D effort between industry and the US government, this project, under the HPPP (High Performance Parallel Processing) initiative of the Dept. of Energy, started the investigations into parallel object-oriented (OO) numerics. The basic goal was to research and utilize the emerging technologies to create a physics-independent computational kernel for applications using the adaptive finite element method. The industrial team included Computational Mechanics Co., Inc. (COMCO) of Austin, TX (as the primary contractor), Scientific Computing Associates, Inc. (SCA) of New Haven, CT, Texaco and CONVEX. Sandia National Laboratory (Albq., NM) was the technology partner from the government side. COMCO had the responsibility of the main kernel design and development, SCA had the lead in parallel solver technology, and guidance on OO technologies was Sandia's main expertise in this venture. CONVEX and Texaco supported the partnership with hardware resources and application knowledge, respectively. As such, a minimum of fifty-percent cost-sharing was provided by the industry partnership during this project. This report describes the R&D activities and provides some details about the prototype kernel and example applications.
Parallel algorithms for 2-D cylindrical transport equations of Eigenvalue problem

International Nuclear Information System (INIS)

Wei, J.; Yang, S.

2013-01-01

In this paper, aimed at the eigenvalue problem of the neutron transport equations in 2-D cylindrical geometry on unstructured grids, a discrete scheme combining Sn discrete ordinates and discontinuous finite elements is built, and parallel computation for the scheme is realized on MPI systems. Numerical experiments indicate that the designed parallel algorithm can reach perfect speedup and has good practicality and scalability. (authors)
Finite Volume Element (FVE) discretization and multilevel solution of the axisymmetric heat equation

Science.gov (United States)

Litaker, Eric T.

1994-12-01

The axisymmetric heat equation, resulting from a point-source of heat applied to a metal block, is solved numerically; both iterative and multilevel solutions are computed in order to compare the two processes. The continuum problem is discretized in two stages: finite differences are used to discretize the time derivatives, resulting in a fully implicit backward time-stepping scheme, and the Finite Volume Element (FVE) method is used to discretize the spatial derivatives. The application of the FVE method to a problem in cylindrical coordinates is new, and results in stencils which are analyzed extensively. Several iteration schemes are considered, including both Jacobi and Gauss-Seidel; a thorough analysis of these schemes is done, using both the spectral radii of the iteration matrices and local mode analysis. Using this discretization, a Gauss-Seidel relaxation scheme is used to solve the heat equation iteratively. A multilevel solution process is then constructed, including the development of intergrid transfer and coarse grid operators. Local mode analysis is performed on the components of the amplification matrix, resulting in the two-level convergence factors for various combinations of the operators. A multilevel solution process is implemented by using multigrid V-cycles; the iterative and multilevel results are compared and discussed in detail. The computational savings resulting from the multilevel process are then discussed.
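The ingredients listed in the abstract above (Gauss-Seidel relaxation, intergrid transfer operators, a coarse grid operator, and V-cycles) can be sketched on a much simpler model problem. The example below is an assumption-laden stand-in: it solves the 1D Poisson equation -u'' = f with zero boundary values using standard finite differences, full-weighting restriction, and linear interpolation, not the axisymmetric FVE discretization of the report. All function names are hypothetical.

```python
def gauss_seidel(u, f, h, sweeps):
    """In-place lexicographic Gauss-Seidel sweeps for -u'' = f."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])


def residual(u, f, h):
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def restrict(r):
    """Full-weighting restriction to the grid with half as many intervals."""
    nc = (len(r) - 1) // 2 + 1
    rc = [0.0] * nc
    for i in range(1, nc - 1):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc


def prolong(ec, n_fine):
    """Linear interpolation of a coarse-grid correction to the fine grid."""
    e = [0.0] * n_fine
    for i in range(len(ec)):
        e[2 * i] = ec[i]
    for i in range(1, n_fine, 2):
        e[i] = 0.5 * (e[i - 1] + e[i + 1])
    return e


def v_cycle(u, f, h, pre=2, post=2):
    """One V-cycle; len(u) must be 2**k + 1 (boundary values included)."""
    if len(u) <= 3:  # single interior unknown: solve it exactly
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    gauss_seidel(u, f, h, pre)                        # pre-smoothing
    rc = restrict(residual(u, f, h))                  # coarse-grid right-hand side
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, pre, post)
    e = prolong(ec, len(u))
    for i in range(1, len(u) - 1):                    # coarse-grid correction
        u[i] += e[i]
    gauss_seidel(u, f, h, post)                       # post-smoothing
    return u
```

Calling `v_cycle` repeatedly on the same `u` plays the role of the multilevel solver, while calling `gauss_seidel` alone gives the purely iterative solver that the abstract compares it against.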
Finite mixture model applied in the analysis of a turbulent bistable flow on two parallel circular cylinders

Energy Technology Data Exchange (ETDEWEB)

Paula, A.V. de, E-mail: vagtinski@mecanica.ufrgs.br [PROMEC – Programa de Pós Graduação em Engenharia Mecânica, UFRGS – Universidade Federal do Rio Grande do Sul, Porto Alegre, RS (Brazil); Möller, S.V., E-mail: svmoller@ufrgs.br [PROMEC – Programa de Pós Graduação em Engenharia Mecânica, UFRGS – Universidade Federal do Rio Grande do Sul, Porto Alegre, RS (Brazil)

2013-11-15

This paper presents a study of the bistable phenomenon which occurs in the turbulent flow impinging on circular cylinders placed side-by-side. Time series of axial and transversal velocity obtained with the constant temperature hot wire anemometry technique in an aerodynamic channel are used as input data in a finite mixture model, to classify the observed data according to a family of probability density functions. Wavelet transforms are applied to analyze the unsteady turbulent signals. Results of flow visualization show that the flow is predominantly two-dimensional. A double-well energy model is suggested to describe the behavior of the bistable phenomenon in this case. -- Highlights: ► Bistable flow on two parallel cylinders is studied with hot wire anemometry as a first step for the application on the analysis to tube bank flow. ► The method of maximum likelihood estimation is applied to hot wire experimental series to classify the data according to PDF functions in a mixture model approach. ► Results show no evident correlation between the changes of flow modes with time. ► An energy model suggests the presence of more than two flow modes.

Least-squares wave-front reconstruction of Shack-Hartmann sensors and shearing interferometers using multigrid techniques

International Nuclear Information System (INIS)

Baker, K.L.

2005-01-01

This article details a multigrid algorithm that is suitable for least-squares wave-front reconstruction of Shack-Hartmann and shearing interferometer wave-front sensors. The algorithm detailed in this article is shown to scale with the number of subapertures in the same fashion as fast Fourier transform techniques, making it suitable for use in applications requiring a large number of subapertures and high Strehl ratio systems such as for high spatial frequency characterization of high-density plasmas, optics metrology, and multiconjugate and extreme adaptive optics systems
3D magnetospheric parallel hybrid multi-grid method applied to planet–plasma interactions

Energy Technology Data Exchange (ETDEWEB)

Leclercq, L., E-mail: ludivine.leclercq@latmos.ipsl.fr [LATMOS/IPSL, UVSQ Université Paris-Saclay, UPMC Univ. Paris 06, CNRS, Guyancourt (France); Modolo, R., E-mail: ronan.modolo@latmos.ipsl.fr [LATMOS/IPSL, UVSQ Université Paris-Saclay, UPMC Univ. Paris 06, CNRS, Guyancourt (France); Leblanc, F. [LATMOS/IPSL, UPMC Univ. Paris 06 Sorbonne Universités, UVSQ, CNRS, Paris (France); Hess, S. [ONERA, Toulouse (France); Mancini, M. [LUTH, Observatoire Paris-Meudon (France)

2016-03-15

We present a new method to exploit multiple refinement levels within a 3D parallel hybrid model, developed to study planet–plasma interactions. This model is based on the hybrid formalism: ions are kinetically treated whereas electrons are considered as an inertia-less fluid. Generally, ions are represented by numerical particles whose size equals the volume of the cells. Particles that leave a coarse grid and subsequently enter a refined region are split into particles whose volume corresponds to the volume of the refined cells. The number of refined particles created from a coarse particle depends on the grid refinement rate. In order to conserve velocity distribution functions and to avoid calculations of average velocities, particles are not coalesced. Moreover, to ensure the constancy of particles' shape function sizes, the hybrid method is adapted to allow refined particles to move within a coarse region. Another innovation of this approach is the method developed to compute grid moments at interfaces between two refinement levels. Indeed, the hybrid method is adapted to accurately account for the special grid structure at the interfaces, avoiding any overlapping grid considerations. Some fundamental test runs were performed to validate our approach (e.g. quiet plasma flow, Alfven wave propagation). Lastly, we also show a planetary application of the model, simulating the interaction between Jupiter's moon Ganymede and the Jovian plasma.
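The particle-splitting step described in the abstract above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `Particle` type, the choice of ratio**3 children per parent in 3D, and the placement of the children at the centres of the refined sub-cells. The abstract itself only states that the children match the refined cell volume, that their number depends on the refinement rate, and that velocities are preserved.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Particle:
    pos: tuple      # (x, y, z) position of the macro-particle centre
    vel: tuple      # (vx, vy, vz) velocity
    weight: float   # number of physical ions represented


def split_particle(p, ratio, coarse_dx):
    """Split one coarse macro-particle into ratio**3 refined particles.

    Children inherit the parent's velocity unchanged, so the velocity
    distribution is preserved; the weight is shared equally.
    """
    fine_dx = coarse_dx / ratio
    n_children = ratio ** 3
    # offsets of the refined sub-cell centres relative to the parent centre
    offsets = [(k + 0.5) * fine_dx - 0.5 * coarse_dx for k in range(ratio)]
    return [
        Particle(
            pos=(p.pos[0] + ox, p.pos[1] + oy, p.pos[2] + oz),
            vel=p.vel,
            weight=p.weight / n_children,
        )
        for ox, oy, oz in product(offsets, repeat=3)
    ]
```

With symmetric offsets the children's weighted centre coincides with the parent's position, so total weight, momentum, and centre of mass are all unchanged by the split.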
Vector and parallel processors in computational science

International Nuclear Information System (INIS)

Duff, I.S.; Reid, J.K.

1985-01-01

This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing).
Solving the Flood Propagation Problem with Newton Algorithm on Parallel Systems

Directory of Open Access Journals (Sweden)

Chefi Triki

2012-04-01

In this paper we propose a parallel implementation for the flood propagation method Flo2DH. The model is built on a finite element spatial approximation combined with a Newton algorithm that uses a direct LU linear solver. The parallel implementation has been developed by using the standard MPI protocol and has been tested on a set of real world problems.
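To make the "Newton algorithm with a direct LU linear solver" combination concrete, here is a minimal serial Python sketch. It is not the Flo2DH implementation: the two-equation toy system, the function names `lu_solve` and `newton`, and the tolerances are all assumptions chosen for illustration, and the linear solve is plain Gaussian elimination with partial pivoting (the LU factorization applied on the fly to a small dense matrix).

```python
def lu_solve(A, b):
    """Solve A x = b by Gaussian elimination (LU) with partial pivoting."""
    n = len(b)
    # work on copies so the caller's data is untouched
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        # pick the largest pivot in column k
        piv = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[piv] = A[piv], A[k]
        b[k], b[piv] = b[piv], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    # back substitution on the upper-triangular system
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x


def newton(F, J, x0, tol=1e-12, max_iter=50):
    """Newton iteration: solve J(x) dx = -F(x), then update x."""
    x = list(x0)
    for _ in range(max_iter):
        r = F(x)
        if max(abs(v) for v in r) < tol:
            break
        dx = lu_solve(J(x), [-v for v in r])
        x = [xi + di for xi, di in zip(x, dx)]
    return x


# Toy system (not from the paper): x^2 + y^2 = 4 and x = y
F = lambda v: [v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]]
J = lambda v: [[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]]
root = newton(F, J, [1.0, 2.0])
```

In a finite element code the Jacobian would be large and sparse, and the factorization and solve are the parts that a parallel implementation would distribute.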
Parallel linear solvers for simulations of reactor thermal hydraulics

International Nuclear Information System (INIS)

Yan, Y.; Antal, S.P.; Edge, B.; Keyes, D.E.; Shaver, D.; Bolotnov, I.A.; Podowski, M.Z.

2011-01-01

The state-of-the-art multiphase fluid dynamics code, NPHASE-CMFD, performs multiphase flow simulations in complex domains using implicit nonlinear treatment of the governing equations and in parallel, which is a very challenging environment for the linear solver. The present work illustrates how the Portable, Extensible Toolkit for Scientific Computation (PETSc) and the scalable Algebraic Multigrid (AMG) preconditioner from Hypre can be utilized to construct robust and scalable linear solvers for the Newton correction equation obtained from the discretized system of governing conservation equations in NPHASE-CMFD. The overall long-term objective of this work is to extend the NPHASE-CMFD code into a fully-scalable solver of multiphase flow and heat transfer problems, applicable to both steady-state and stiff time-dependent phenomena in complete fuel assemblies of nuclear reactors and, eventually, the entire reactor core (such as the Virtual Reactor concept envisioned by CASL). This campaign appropriately begins with the linear algebraic equation solver, which is traditionally a bottleneck to scalability in PDE-based codes. The computational complexity of the solver is usually superlinear in problem size, whereas the rest of the code, the “physics” portion, usually has its complexity linear in the problem size. (author)
Design, development and use of the finite element machine

Science.gov (United States)

Adams, L. M.; Voigt, R. C.

1983-01-01

Some of the considerations that went into the design of the Finite Element Machine, a research asynchronous parallel computer, are described. The present status of the system is also discussed, along with some indication of the type of results that were obtained.
StagBL : A Scalable, Portable, High-Performance Discretization and Solver Layer for Geodynamic Simulation

Science.gov (United States)

Sanan, P.; Tackley, P. J.; Gerya, T.; Kaus, B. J. P.; May, D.

2017-12-01

StagBL is an open-source parallel solver and discretization library for geodynamic simulation, encapsulating and optimizing operations essential to staggered-grid finite volume Stokes flow solvers. It provides a parallel staggered-grid abstraction with a high-level interface in C and Fortran. On top of this abstraction, tools are available to define boundary conditions and interact with particle systems. Tools and examples to efficiently solve Stokes systems defined on the grid are provided in small (direct solver), medium (simple preconditioners), and large (block factorization and multigrid) model regimes. By working directly with leading application codes (StagYY, I3ELVIS, and LaMEM) and providing an API and examples to integrate with others, StagBL aims to become a community tool supplying scalable, portable, reproducible performance toward novel science in regional- and planet-scale geodynamics and planetary science. By implementing kernels used by many research groups beneath a uniform abstraction layer, the library will enable optimization for modern hardware, thus reducing community barriers to large- or extreme-scale parallel simulation on modern architectures. In particular, the library will include CPU-, Manycore-, and GPU-optimized variants of matrix-free operators and multigrid components. The common layer provides a framework upon which to introduce innovative new tools. StagBL will leverage p4est to provide distributed adaptive meshes, and incorporate a multigrid convergence analysis tool. These options, in addition to a wealth of solver options provided by an interface to PETSc, will make the most modern solution techniques available from a common interface. StagBL in turn provides a PETSc interface, DMStag, to its central staggered grid abstraction. We present public version 0.5 of StagBL, including preliminary integration with application codes and demonstrations with its own demonstration application, StagBLDemo. Central to StagBL is the notion of an
The GBS code for tokamak scrape-off layer simulations

International Nuclear Information System (INIS)

Halpern, F.D.; Ricci, P.; Jolliet, S.; Loizu, J.; Morales, J.; Mosetto, A.; Musil, F.; Riva, F.; Tran, T.M.; Wersal, C.

2016-01-01

We describe a new version of GBS, a 3D global, flux-driven plasma turbulence code to simulate the turbulent dynamics in the tokamak scrape-off layer (SOL), superseding the code presented by Ricci et al. (2012) [14]. The present work is driven by the objective of studying SOL turbulent dynamics in medium size tokamaks and beyond with a high-fidelity physics model. We emphasize an intertwining framework of improved physics models and the computational improvements that allow them. The model extensions include neutral atom physics, finite ion temperature, the addition of a closed field line region, and a non-Boussinesq treatment of the polarization drift. GBS has been completely refactored with the introduction of a 3-D Cartesian communicator and a scalable parallel multigrid solver. We report dramatically enhanced parallel scalability, with the possibility of treating electromagnetic fluctuations very efficiently. The method of manufactured solutions as a verification process has been carried out for this new code version, demonstrating the correct implementation of the physical model.
A parallel direct solver for the self-adaptive hp Finite Element Method

KAUST Repository

Paszyński, Maciej R.; Pardo, David; Torres-Verdín, Carlos; Demkowicz, Leszek F.; Calo, Victor M.

2010-01-01

measurement simulations problems. We measure the execution time and memory usage of the solver over a large regular mesh with 1.5 million degrees of freedom as well as on the highly non-regular mesh, generated by the self-adaptive hp-FEM, with finite elements
Use of massively parallel computing to improve modelling accuracy within the nuclear sector

Directory of Open Access Journals (Sweden)

L M Evans

2016-06-01

This work presents recent advancements in three techniques: uncertainty quantification (UQ); cellular automata finite element (CAFE); and image-based finite element methods (IBFEM). Case studies are presented demonstrating their suitability for use in nuclear engineering, made possible by advancements in parallel computing hardware that is projected to be available to industry within the next decade at a cost of the order of $100k.
Asynchronous Task-Based Parallelization of Algebraic Multigrid

KAUST Repository

AlOnazi, Amani A.; Markomanolis, George S.; Keyes, David E.

2017-01-01

As processor clock rates become more dynamic and workloads become more adaptive, the vulnerability to global synchronization that already complicates programming for performance in today's petascale environment will be exacerbated. Algebraic
Parallel Auxiliary Space AMG Solver for $H(div)$ Problems

Energy Technology Data Exchange (ETDEWEB)

Kolev, Tzanio V. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Vassilevski, Panayot S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2012-12-18

We present a family of scalable preconditioners for matrices arising in the discretization of $H(div)$ problems using the lowest order Raviart-Thomas finite elements. Our approach belongs to the class of “auxiliary space”-based methods and requires only the finite element stiffness matrix plus some minimal additional discretization information about the topology and orientation of mesh entities. Also, we provide a detailed algebraic description of the theory, parallel implementation, and different variants of this parallel auxiliary space divergence solver (ADS) and discuss its relations to the Hiptmair-Xu (HX) auxiliary space decomposition of $H(div)$ [SIAM J. Numer. Anal., 45 (2007), pp. 2483-2509] and to the auxiliary space Maxwell solver AMS [J. Comput. Math., 27 (2009), pp. 604-623]. Finally, an extensive set of numerical experiments demonstrates the robustness and scalability of our implementation on large-scale $H(div)$ problems with large jumps in the material coefficients.
Massive parallelization of a 3D finite difference electromagnetic forward solution using domain decomposition methods on multiple CUDA enabled GPUs

Science.gov (United States)

Schultz, A.

2010-12-01

describe our ongoing efforts to achieve massive parallelization on a novel hybrid GPU testbed machine currently configured with 12 Intel Westmere Xeon CPU cores (or 24 parallel computational threads) with 96 GB DDR3 system memory, 4 GPU subsystems which in aggregate contain 960 NVidia Tesla GPU cores with 16 GB dedicated DDR3 GPU memory, and a second interleaved bank of 4 GPU subsystems containing in aggregate 1792 NVidia Fermi GPU cores with 12 GB dedicated DDR5 GPU memory. We are applying domain decomposition methods to a modified version of Weiss' (2001) 3D frequency domain full physics EM finite difference code, an open source GPL licensed f90 code available for download from www.OpenEM.org. This will be the core of a new hybrid 3D inversion that parallelizes frequencies across CPUs and individual forward solutions across GPUs. We describe progress made in modifying the code to use direct solvers in GPU cores dedicated to each small subdomain, iteratively improving the solution by matching adjacent subdomain boundary solutions, rather than iterative Krylov space sparse solvers as currently applied to the whole domain.
A robust and efficient finite volume scheme for the discretization of diffusive flux on extremely skewed meshes in complex geometries

Science.gov (United States)

Traoré, Philippe; Ahipo, Yves Marcel; Louste, Christophe

2009-08-01

In this paper an improved finite volume scheme to discretize diffusive flux on a non-orthogonal mesh is proposed. This approach, based on an iterative technique initially suggested by Khosla [P.K. Khosla, S.G. Rubin, A diagonally dominant second-order accurate implicit scheme, Computers and Fluids 2 (1974) 207-209] and known as deferred correction, has been intensively utilized by Muzaferija [S. Muzaferija, Adaptative finite volume method for flow prediction using unstructured meshes and multigrid approach, Ph.D. Thesis, Imperial College, 1994] and later Ferziger and Peric [J.H. Ferziger, M. Peric, Computational Methods for Fluid Dynamics, Springer, 2002] to deal with the non-orthogonality of the control volumes. Using a more suitable decomposition of the normal gradient, our scheme gives accurate solutions in geometries where the basic idea of Muzaferija fails. First, the performances of both schemes are compared for a Poisson problem solved in quadrangular domains where control volumes are increasingly skewed, in order to test their robustness and efficiency. It is shown that the convergence properties and the order of accuracy of the solution are not degraded even on extremely skewed meshes. Next, the very stable behavior of the method is successfully demonstrated on a randomly distorted grid as well as on an anisotropically distorted one. Finally, we compare the solution obtained for quadrilateral control volumes to the ones obtained with a finite element code and with an unstructured version of our finite volume code for triangular control volumes. No differences can be observed between the different solutions, which demonstrates the effectiveness of our approach.
Massive parallel electromagnetic field simulation program JEMS-FDTD design and implementation on jasmin

International Nuclear Information System (INIS)

Li Hanyu; Zhou Haijing; Dong Zhiwei; Liao Cheng; Chang Lei; Cao Xiaolin; Xiao Li

2010-01-01

A large-scale parallel electromagnetic field simulation program, JEMS-FDTD (J Electromagnetic Solver-Finite Difference Time Domain), is designed and implemented on JASMIN (J parallel Adaptive Structured Mesh applications INfrastructure). This program can simulate the propagation, radiation and coupling of electromagnetic fields by solving the Maxwell equations explicitly on a structured mesh with the FDTD method. JEMS-FDTD is able to simulate billion-mesh-scale problems on thousands of processors. In this article, the program is verified by simulating the radiation of an electric dipole. A beam waveguide is simulated to demonstrate the capability of large scale parallel computation. A parallel performance test indicates that a high parallel efficiency is obtained. (authors)
Self-balanced modulation and magnetic rebalancing method for parallel multilevel inverters

Science.gov (United States)

Li, Hui; Shi, Yanjun

2017-11-28

A self-balanced modulation method and a closed-loop magnetic flux rebalancing control method for parallel multilevel inverters are described. The combination of the two methods provides for balancing of the magnetic flux of the inter-cell transformers (ICTs) of the parallel multilevel inverters without deteriorating the quality of the output voltage. In various embodiments a parallel multi-level inverter modulator is provided, including a multi-channel comparator to generate a multiplexed digitized ideal waveform for a parallel multi-level inverter and a finite state machine (FSM) module coupled to the parallel multi-channel comparator, the FSM module to receive the multiplexed digitized ideal waveform and to generate a pulse width modulated gate-drive signal for each switching device of the parallel multi-level inverter. The system and method provide for optimization of the output voltage spectrum without influencing the magnetic balancing.
Sharp asymptotics for stochastic dynamics with parallel updating rule

NARCIS (Netherlands)

Nardi, F.R.; Spitoni, C.

2012-01-01

In this paper we study the metastability problem for a stochastic dynamics with a parallel updating rule; in particular we consider a finite volume Probabilistic Cellular Automaton (PCA) in a small external field at low temperature regime. We are interested in the nucleation of the system, i.e., the
'Research and development of research information infrastructure'. Achievement report on development of parallel processing software technology for discrete value solving methods; Kenkyu joho kiban kenkyu kaihatsu seika hokokusho. Risanka suchi kaiho no tame no heiretsu shori software gijutsu kaihatsu

Energy Technology Data Exchange (ETDEWEB)

NONE

2000-09-01

Research and development has been performed on general-purpose parallel processing software that can be used with discretized numerical solution methods such as the finite element method, finite volume method and finite difference method. The achievements of the research and development may be summarized as follows: the parallel platform is parallelized using the domain decomposition concept applied to the elements (calculation cells), and is applicable to any of the finite element method, finite volume method and finite difference method; a researcher who has developed a program can easily perform the parallelization work and obtain parallel performance; the platform can be used at the several levels of parallelism required by the user; with regard to the parallelization efficiency in large-size problems, it has become possible to execute at an efficiency higher than 70% for the solver parts by using 32 processors of the SR8000 at the computation center of the Agency of Industrial Science and Technology; the stiffness matrix assembly part shows an efficiency close to 100%; and the developed parallel platform is under continued evaluation at the Machine Technology Research Institute and the Material Engineering Research Institute. (NEDO)
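For readers unfamiliar with the efficiency figures quoted in this record, the short Python sketch below spells out the usual definitions: speedup is serial time divided by parallel time, and parallel efficiency is speedup divided by the number of processors. The function names and the timing values are hypothetical; only the 70% and 32-processor figures come from the abstract.

```python
def speedup(t_serial, t_parallel):
    """Ratio of serial run time to parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency: speedup per processor (1.0 is ideal)."""
    return speedup(t_serial, t_parallel) / n_procs


def min_speedup(eff, n_procs):
    """Speedup implied by a given efficiency on n_procs processors."""
    return eff * n_procs


# An efficiency above 70% on 32 processors implies a speedup above about 22.4
s = min_speedup(0.70, 32)

# Hypothetical timings: 100 s serially, 5 s on 32 processors
e = efficiency(100.0, 5.0, 32)
```

Under these definitions, an assembly step with efficiency close to 100% runs almost 32 times faster on 32 processors, while the solver at 70% runs a little over 22 times faster.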
Parallel Adaptive Mesh Refinement for High-Order Finite-Volume Schemes in Computational Fluid Dynamics

Science.gov (United States)

Schwing, Alan Michael

For computational fluid dynamics, the governing equations are solved on a discretized domain of nodes, faces, and cells. The quality of the grid or mesh can be a driving source for error in the results. While refinement studies can help guide the creation of a mesh, grid quality is largely determined by user expertise and understanding of the flow physics. Adaptive mesh refinement is a technique for enriching the mesh during a simulation based on metrics for error, impact on important parameters, or location of important flow features. This can offload from the user some of the difficult and ambiguous decisions necessary when discretizing the domain. This work explores the implementation of adaptive mesh refinement in an implicit, unstructured, finite-volume solver. Consideration is made for applying modern computational techniques in the presence of hanging nodes and refined cells. The approach is developed to be independent of the flow solver in order to provide a path for augmenting existing codes. It is designed to be applicable to unsteady simulations, and refinement and coarsening of the grid do not impact the conservatism of the underlying numerics. The effects on high-order numerical fluxes of fourth and sixth order are explored. Provided the criteria for refinement are appropriately selected, solutions obtained using adapted meshes have no additional error when compared to results obtained on traditional, unadapted meshes. In order to leverage large-scale computational resources common today, the methods are parallelized using MPI. Parallel performance is considered for several test problems in order to assess scalability of both adapted and unadapted grids. Dynamic repartitioning of the mesh during refinement is crucial for load balancing an evolving grid. Development of the methods outlined here depends on a dual-memory approach that is described in detail. Validation of the solver developed here against a number of motivating problems shows favorable

Goal-Oriented Self-Adaptive hp Finite Element Simulation of 3D DC Borehole Resistivity Simulations

KAUST Repository

Calo, Victor M.

2011-05-14

In this paper we present a goal-oriented self-adaptive hp Finite Element Method (hp-FEM) with shared data structures and a parallel multi-frontal direct solver. The algorithm automatically generates (without any user interaction) a sequence of meshes delivering exponential convergence of a prescribed quantity of interest with respect to the number of degrees of freedom. The sequence of meshes is generated from a given initial mesh, by performing h (breaking elements into smaller elements), p (adjusting polynomial orders of approximation) or hp (both) refinements on the finite elements. The new parallel implementation utilizes a computational mesh shared between multiple processors. All computational algorithms, including automatic hp goal-oriented adaptivity and the solver, work fully in parallel. We describe the parallel self-adaptive hp-FEM algorithm with shared computational domain, as well as its efficiency measurements. We apply the methodology described to the three-dimensional simulation of the borehole resistivity measurement of direct current through casing in the presence of invasion.
Parallel state transfer and efficient quantum routing on quantum networks.

Science.gov (United States)

Chudzicki, Christopher; Strauch, Frederick W

2010-12-31

We study the routing of quantum information in parallel on multidimensional networks of tunable qubits and oscillators. These theoretical models are inspired by recent experiments in superconducting circuits. We show that perfect parallel state transfer is possible for certain networks of harmonic oscillator modes. We extend this to the distribution of entanglement between every pair of nodes in the network, finding that the routing efficiency of hypercube networks is optimal and robust in the presence of dissipation and finite bandwidth.
Nonlinear magnetohydrodynamics simulation using high-order finite elements

International Nuclear Information System (INIS)

Plimpton, Steven James; Schnack, D.D.; Tarditi, A.; Chu, M.S.; Gianakon, T.A.; Kruger, S.E.; Nebel, R.A.; Barnes, D.C.; Sovinec, C.R.; Glasser, A.H.

2005-01-01

A conforming representation composed of 2D finite elements and finite Fourier series is applied to 3D nonlinear non-ideal magnetohydrodynamics using a semi-implicit time-advance. The self-adjoint semi-implicit operator and variational approach to spatial discretization are synergistic and enable simulation in the extremely stiff conditions found in high temperature plasmas without sacrificing the geometric flexibility needed for modeling laboratory experiments. Growth rates for resistive tearing modes with experimentally relevant Lundquist number are computed accurately with time-steps that are large with respect to the global Alfven time and moderate spatial resolution when the finite elements have basis functions of polynomial degree ($p$) two or larger. An error diffusion method controls the generation of magnetic divergence error. Convergence studies show that this approach is effective for continuous basis functions with $p \ge 2$, where the number of test functions for the divergence control terms is less than the number of degrees of freedom in the expansion for vector fields. Anisotropic thermal conduction at realistic ratios of parallel to perpendicular conductivity ($\chi_\parallel/\chi_\perp$) is computed accurately with $p \ge 3$ without mesh alignment. A simulation of tearing-mode evolution for a shaped toroidal tokamak equilibrium demonstrates the effectiveness of the algorithm in nonlinear conditions, and its results are used to verify the accuracy of the numerical anisotropic thermal conduction in 3D magnetic topologies.
A Simple Method for Static Load Balancing of Parallel FDTD Codes

DEFF Research Database (Denmark)

Franek, Ondrej

2016-01-01

A static method for balancing computational loads in parallel implementations of the finite-difference time-domain method is presented. The procedure is fairly straightforward and computationally inexpensive, thus providing an attractive alternative to optimization techniques. The method is descri...
Bioinformatics algorithm based on a parallel implementation of a machine learning approach using transducers

International Nuclear Information System (INIS)

Roche-Lima, Abiel; Thulasiram, Ruppa K

2012-01-01

Finite automata in which each transition is augmented with an output label in addition to the familiar input label are called finite-state transducers. Transducers have been used to analyze some fundamental issues in bioinformatics. Weighted finite-state transducers have been proposed for pairwise alignment of DNA and protein sequences, as well as for developing kernels for computational biology. Machine learning algorithms for conditional transducers have been implemented and used for DNA sequence analysis. Transducer learning algorithms are based on conditional probability computation, which is calculated using techniques such as pair-database creation, normalization (with Maximum-Likelihood normalization) and parameter optimization (with Expectation-Maximization, EM). These techniques are intrinsically costly to compute, and even more so when applied to bioinformatics, because the databases are large. In this work, we describe a parallel implementation of an algorithm to learn conditional transducers using these techniques. The algorithm is oriented to bioinformatics applications, such as alignments, phylogenetic trees, and other genome evolution studies. Several experiments were carried out using the parallel and sequential algorithms on Westgrid (specifically, on the Breezy cluster). The results show that our parallel algorithm is scalable, because execution times are reduced considerably when the data size parameter is increased. Another experiment was carried out by changing the precision parameter; in this case, we obtain smaller execution times using the parallel algorithm. Finally, the number of threads used to execute the parallel algorithm on the Breezy cluster was changed. In this last experiment, the speedup increases considerably when more threads are used; however, it levels off for 16 or more threads.
Electrostatic turbulence with finite parallel correlation length and radial electric field generation

International Nuclear Information System (INIS)

Vlad, M.; Spineanu, F.; Misguich, J.H.; Balescu, R.

2001-01-01

Particle diffusion in a given electrostatic turbulence with a finite correlation length along the confining magnetic field is studied in the test particle approach. An anomalous diffusion regime of amplified diffusion coefficients is found in the conditions when particle trapping in the structure of the stochastic potential is effective. The auto-generated radial electric field is calculated. (author)
Parametric instabilities of parallel propagating incoherent Alfven waves in a finite ion beta plasma

International Nuclear Information System (INIS)

Nariyuki, Y.; Hada, T.; Tsubouchi, K.

2007-01-01

Large amplitude, low-frequency Alfven waves constitute one of the most essential elements of magnetohydrodynamic (MHD) turbulence in the fast solar wind. Due to small collisionless dissipation rates, the waves can propagate long distances and efficiently convey such macroscopic quantities as momentum, energy, and helicity. Since loading of such quantities is completed when the waves damp away, it is important to examine how the waves can dissipate in the solar wind. Among various possible dissipation processes of the Alfven waves, parametric instabilities have been believed to be important. In this paper, we numerically discuss the parametric instabilities of coherent/incoherent Alfven waves in a finite ion beta plasma using a one-dimensional hybrid (superparticle ions plus an electron massless fluid) simulation, in order to explain local production of sunward propagating Alfven waves, as suggested by Helios/Ulysses observation results. Parameter studies clarify the dependence of parametric instabilities of coherent/incoherent Alfven waves on the ion and electron beta ratio. Parametric instabilities of coherent Alfven waves in a finite ion beta plasma are vastly different from those in the cold ions (i.e., MHD and/or Hall-MHD systems), even if the collisionless damping of the Alfven waves is neglected. Further, "nonlinearly driven" modulational instability is important for the dissipation of incoherent Alfven waves in a finite ion beta plasma regardless of their polarization, since the ion kinetic effects let both the right-hand and left-hand polarized waves become unstable to the modulational instability. The present results suggest that, although the antisunward propagating dispersive Alfven waves are efficiently dissipated through the parametric instabilities in a finite ion beta plasma, these instabilities hardly produce the sunward propagating waves.
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems

International Nuclear Information System (INIS)

BAER, THOMAS A.; SACKINGER, PHILIP A.; SUBIA, SAMUEL R.

1999-01-01

Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a "pseudo-solid" mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to equally distribute computational work across an SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel computing environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speed ups for fixed problem size, a class of problems of immediate practical importance.
Locating an axis-parallel rectangle on a Manhattan plane

DEFF Research Database (Denmark)

Brimberg, Jack; Juel, Henrik; Körner, Mark-Christoph

2014-01-01

In this paper we consider the problem of locating an axis-parallel rectangle in the plane such that the sum of distances between the rectangle and a finite point set is minimized, where the distance is measured by the Manhattan norm $\ell_1$. In this way we solve an extension of the Weber problem...
Convergence analysis of variational and non-variational multigrid algorithms for the Laplace-Beltrami operator

KAUST Repository

Bonito, Andrea

2012-09-01

We design and analyze variational and non-variational multigrid algorithms for the Laplace-Beltrami operator on a smooth and closed surface. In both cases, a uniform convergence for the V-cycle algorithm is obtained provided the surface geometry is captured well enough by the coarsest grid. The main argument hinges on a perturbation analysis from an auxiliary variational algorithm defined directly on the smooth surface. In addition, the vanishing mean value constraint is imposed on each level, thereby avoiding singular quadratic forms without adding additional computational cost. Numerical results supporting our analysis are reported. In particular, the algorithms perform well even when applied to surfaces with a large aspect ratio. © 2011 American Mathematical Society.
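Since the V-cycle is the central algorithm in this record, a minimal serial sketch may help fix ideas. It is deliberately much simpler than the setting of the paper: it solves the 1D Poisson problem $-u'' = f$ on the unit interval with zero Dirichlet boundary values rather than a Laplace-Beltrami problem on a surface, and it has no mean-value constraint. The choice of weighted Jacobi smoothing, full-weighting restriction, linear interpolation, and all function names are assumptions made for illustration.

```python
import math


def residual(u, f, h):
    """r = f - A u for the standard 3-point discretization of -u''."""
    n = len(u) - 1
    r = [0.0] * (n + 1)
    for i in range(1, n):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi relaxation, updating u in place."""
    n = len(u) - 1
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, n):
            jac = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jac
    return u


def restrict(r):
    """Full weighting onto the grid with half as many intervals."""
    nc = (len(r) - 1) // 2
    rc = [0.0] * (nc + 1)
    for i in range(1, nc):
        rc[i] = 0.25 * r[2 * i - 1] + 0.5 * r[2 * i] + 0.25 * r[2 * i + 1]
    return rc


def prolong(ec):
    """Linear interpolation onto the grid with twice as many intervals."""
    nc = len(ec) - 1
    e = [0.0] * (2 * nc + 1)
    for i in range(nc + 1):
        e[2 * i] = ec[i]
    for i in range(nc):
        e[2 * i + 1] = 0.5 * (ec[i] + ec[i + 1])
    return e


def v_cycle(u, f, h):
    """One V-cycle; the number of intervals must be a power of two."""
    n = len(u) - 1
    if n == 2:
        # coarsest grid: a single interior unknown, solved exactly
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    smooth(u, f, h)                                      # pre-smoothing
    rc = restrict(residual(u, f, h))                     # coarse residual
    ec = v_cycle([0.0] * (n // 2 + 1), rc, 2.0 * h)      # coarse correction
    e = prolong(ec)
    for i in range(1, n):
        u[i] += e[i]
    return smooth(u, f, h)                               # post-smoothing


# Test problem with known solution u(x) = sin(pi x)
n = 64
h = 1.0 / n
x = [i * h for i in range(n + 1)]
f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
u = [0.0] * (n + 1)
r0 = max(abs(v) for v in residual(u, f, h))
for _ in range(10):
    u = v_cycle(u, f, h)
r_final = max(abs(v) for v in residual(u, f, h))
err = max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, x))
```

After a few cycles the algebraic residual is far below its initial value, and the remaining error against the exact solution is dominated by the second-order discretization error of the grid, not by the iteration.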
Computational split-field finite-difference time-domain evaluation of simplified tilt-angle models for parallel-aligned liquid-crystal devices

Science.gov (United States)

Márquez, Andrés; Francés, Jorge; Martínez, Francisco J.; Gallego, Sergi; Álvarez, Mariela L.; Calzado, Eva M.; Pascual, Inmaculada; Beléndez, Augusto

2018-03-01

Simplified analytical models with predictive capability enable simpler and faster optimization of the performance in applications of complex photonic devices. We recently demonstrated the most simplified analytical model still showing predictive capability for parallel-aligned liquid crystal on silicon (PA-LCoS) devices, which provides the voltage-dependent retardance for a very wide range of incidence angles and any wavelength in the visible. We further show that the proposed model is not only phenomenological but also physically meaningful, since two of its parameters provide the correct values for important internal properties of these devices related to the birefringence, cell gap, and director profile. Therefore, the proposed model can be used as a means to inspect internal physical properties of the cell. As an innovation, we also show the applicability of the split-field finite-difference time-domain (SF-FDTD) technique for phase-shift and retardance evaluation of PA-LCoS devices under oblique incidence. As a simplified model for PA-LCoS devices, we also consider the exact description of homogeneous birefringent slabs. However, we show that, despite its higher degree of simplification, the proposed model is more robust, providing unambiguous and physically meaningful solutions when fitting its parameters.
OFF, Open source Finite volume Fluid dynamics code: A free, high-order solver based on parallel, modular, object-oriented Fortran API

Science.gov (United States)

Zaghi, S.

2014-07-01

OFF, an open source (free software) code for performing fluid dynamics simulations, is presented. The aim of OFF is to solve, numerically, the unsteady (and steady) compressible Navier-Stokes equations of fluid dynamics by means of finite volume techniques: the research background is mainly focused on high-order (WENO) schemes for multi-fluids, multi-phase flows over complex geometries. To this purpose a highly modular, object-oriented application program interface (API) has been developed. In particular, the concepts of data encapsulation and inheritance available within Fortran language (from standard 2003) have been stressed in order to represent each fluid dynamics "entity" (e.g. the conservative variables of a finite volume, its geometry, etc.) by a single object so that a large variety of computational libraries can be easily (and efficiently) developed upon these objects. The main features of OFF can be summarized as follows. Programming Language: OFF is written in standard (compliant) Fortran 2003; its design is highly modular in order to enhance simplicity of use and maintenance without compromising the efficiency. Parallel Frameworks Supported: the development of OFF has been also targeted to maximize the computational efficiency; the code is designed to run on shared-memory multi-cores workstations and distributed-memory clusters of shared-memory nodes (supercomputers); the code's parallelization is based on Open Multiprocessing (OpenMP) and Message Passing Interface (MPI) paradigms. Usability, Maintenance and Enhancement: in order to improve the usability, maintenance and enhancement of the code also the documentation has been carefully taken into account; the documentation is built upon comprehensive comments placed directly into the source files (no external documentation files needed): these comments are parsed by means of doxygen free software producing high quality html and latex documentation pages; the distributed versioning system referred as git
Simulations of finite beta turbulence in tokamaks and stellarators

International Nuclear Information System (INIS)

Jenko, F.

2002-01-01

One of the central open questions in our attempt to understand microturbulence in fusion plasmas concerns the role of finite beta effects. Nonlinear codes trying to investigate this issue must go beyond the commonly used adiabatic electron approximation - a task which turns out to be a serious computational challenge. This step is necessary because the electrons are the prime contributor to the parallel currents which in turn produce the magnetic field fluctuations. Results at both ion and electron space-time scales from gyrokinetic and gyrofluid models are presented which shed light on the character of finite beta turbulence in tokamaks and stellarators. (author)
A Direct Elliptic Solver Based on Hierarchically Low-Rank Schur Complements

KAUST Repository

Chávez, Gustavo

2017-03-17

A parallel fast direct solver for rank-compressible block tridiagonal linear systems is presented. Algorithmic synergies between Cyclic Reduction and Hierarchical matrix arithmetic operations result in a solver with $O(N \log^2 N)$ arithmetic complexity and $O(N \log N)$ memory footprint. We provide a baseline for performance and applicability by comparing with well-known implementations of the $\mathcal{H}$-LU factorization and algebraic multigrid within a shared-memory parallel environment that leverages the concurrency features of the method. Numerical experiments reveal that this method is comparable with other fast direct solvers based on Hierarchical Matrices such as $\mathcal{H}$-LU and that it can tackle problems where algebraic multigrid fails to converge.
A High-Performance Parallel FDTD Method Enhanced by Using SSE Instruction Set

Directory of Open Access Journals (Sweden)

Dau-Chyrh Chang

2012-01-01

We introduce a hardware acceleration technique for the parallel finite-difference time-domain (FDTD) method using the SSE (streaming SIMD (single instruction, multiple data) extensions) instruction set. The implementation of the SSE instruction set in the parallel FDTD method has achieved a significant improvement in simulation performance. The benchmarks of the SSE acceleration on both the multi-CPU workstation and computer cluster have demonstrated the advantages of vector arithmetic logic unit (VALU) acceleration over GPU acceleration. Several engineering applications are employed to demonstrate the performance of the parallel FDTD method enhanced by the SSE instruction set.
Fast parallel diffractive multi-beam femtosecond laser surface micro-structuring

Energy Technology Data Exchange (ETDEWEB)

Zheng Kuang, E-mail: z.kuang@liv.ac.uk [Laser Group, Department of Engineering, University of Liverpool, Brodie Building, Liverpool L69 3GQ (United Kingdom); Dun Liu; Perrie, Walter; Edwardson, Stuart; Sharp, Martin; Fearon, Eamonn; Dearden, Geoff; Watkins, Ken [Laser Group, Department of Engineering, University of Liverpool, Brodie Building, Liverpool L69 3GQ (United Kingdom)

2009-04-15

Fast parallel femtosecond laser surface micro-structuring is demonstrated using a spatial light modulator (SLM). The Gratings and Lenses algorithm, which is simple and computationally fast, is used to calculate computer generated holograms (CGHs) producing diffractive multiple beams for the parallel processing. The results show that the finite laser bandwidth can significantly alter the intensity distribution of diffracted beams at higher angles resulting in elongated hole shapes. In addition, by synchronisation of applied CGHs and the scanning system, true 3D micro-structures are created on Ti6Al4V.
Parallel implementation of a dynamic unstructured chimera method in the DLR finite volume TAU-code

Energy Technology Data Exchange (ETDEWEB)

Madrane, A.; Raichle, A.; Stuermer, A. [German Aerospace Center, DLR, Numerical Methods, Inst. of Aerodynamics and Flow Technology, Braunschweig (Germany)]. E-mail: aziz.madrane@dlr.de

2004-07-01

Aerodynamic problems involving moving geometries have many applications, including store separation, high-speed train entering into a tunnel, simulation of full configurations of the helicopter and fast maneuverability. Overset grid method offers the option of calculating these procedures. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping unstructured grids that update and exchange boundary information through interpolation. However, such computations are complicated and time consuming. Parallel computing offers a very effective way to improve the productivity in doing computational fluid dynamics (CFD). Therefore the purpose of this study is to develop an efficient parallel computation algorithm for analyzing the flowfield of complex geometries using overset grids method. The strategy adopted in the parallelization of the overset grids method including the use of data structures and communication, is described. Numerical results are presented to demonstrate the efficiency of the resulting parallel overset grids method. (author)
Parallel implementation of a dynamic unstructured chimera method in the DLR finite volume TAU-code

International Nuclear Information System (INIS)

Madrane, A.; Raichle, A.; Stuermer, A.

2004-01-01

Aerodynamic problems involving moving geometries have many applications, including store separation, high-speed train entering into a tunnel, simulation of full configurations of the helicopter and fast maneuverability. Overset grid method offers the option of calculating these procedures. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping unstructured grids that update and exchange boundary information through interpolation. However, such computations are complicated and time consuming. Parallel computing offers a very effective way to improve the productivity in doing computational fluid dynamics (CFD). Therefore the purpose of this study is to develop an efficient parallel computation algorithm for analyzing the flowfield of complex geometries using overset grids method. The strategy adopted in the parallelization of the overset grids method including the use of data structures and communication, is described. Numerical results are presented to demonstrate the efficiency of the resulting parallel overset grids method. (author)
An implicit multigrid algorithm for computing hypersonic, chemically reacting viscous flows

International Nuclear Information System (INIS)

Edwards, J.R.

1996-01-01

An implicit algorithm for computing viscous flows in chemical nonequilibrium is presented. Emphasis is placed on the numerical efficiency of the time integration scheme, both in terms of per-iteration workload and overall convergence rate. In this context, several techniques are introduced, including a stable, $O(m^2)$ approximate factorization of the chemical source Jacobian and implementations of V-cycle and filtered multigrid acceleration methods. A five-species, seventeen-reaction air model is used to calculate hypersonic viscous flow over a cylinder at conditions corresponding to flight at 5 km/s, 60 km altitude and at 11.36 km/s, 76.42 km altitude. Inviscid calculations using an eleven-species reaction mechanism including ionization are presented for a case involving 11.37 km/s flow at an altitude of 84.6 km. Comparisons among various options for the implicit treatment of the chemical source terms and among different multilevel approaches for convergence acceleration are presented for all simulations.
Parallel Representation of Value-Based and Finite State-Based Strategies in the Ventral and Dorsal Striatum.

Directory of Open Access Journals (Sweden)

Makoto Ito

2015-11-01

Previous theoretical studies of animal and human behavioral learning have focused on the dichotomy of the value-based strategy using action value functions to predict rewards and the model-based strategy using internal models to predict environmental states. However, animals and humans often take simple procedural behaviors, such as the "win-stay, lose-switch" strategy without explicit prediction of rewards or states. Here we consider another strategy, the finite state-based strategy, in which a subject selects an action depending on its discrete internal state and updates the state depending on the action chosen and the reward outcome. By analyzing choice behavior of rats in a free-choice task, we found that the finite state-based strategy fitted their behavioral choices more accurately than value-based and model-based strategies did. When fitted models were run autonomously with the same task, only the finite state-based strategy could reproduce the key feature of choice sequences. Analyses of neural activity recorded from the dorsolateral striatum (DLS), the dorsomedial striatum (DMS), and the ventral striatum (VS) identified significant fractions of neurons in all three subareas for which activities were correlated with individual states of the finite state-based strategy. The signal of internal states at the time of choice was found in DMS, and for clusters of states was found in VS. In addition, action values and state values of the value-based strategy were encoded in DMS and VS, respectively. These results suggest that both the value-based strategy and the finite state-based strategy are implemented in the striatum.

Parallel Representation of Value-Based and Finite State-Based Strategies in the Ventral and Dorsal Striatum.

Science.gov (United States)

Ito, Makoto; Doya, Kenji

2015-11-01

Previous theoretical studies of animal and human behavioral learning have focused on the dichotomy of the value-based strategy using action value functions to predict rewards and the model-based strategy using internal models to predict environmental states. However, animals and humans often take simple procedural behaviors, such as the "win-stay, lose-switch" strategy without explicit prediction of rewards or states. Here we consider another strategy, the finite state-based strategy, in which a subject selects an action depending on its discrete internal state and updates the state depending on the action chosen and the reward outcome. By analyzing choice behavior of rats in a free-choice task, we found that the finite state-based strategy fitted their behavioral choices more accurately than value-based and model-based strategies did. When fitted models were run autonomously with the same task, only the finite state-based strategy could reproduce the key feature of choice sequences. Analyses of neural activity recorded from the dorsolateral striatum (DLS), the dorsomedial striatum (DMS), and the ventral striatum (VS) identified significant fractions of neurons in all three subareas for which activities were correlated with individual states of the finite state-based strategy. The signal of internal states at the time of choice was found in DMS, and for clusters of states was found in VS. In addition, action values and state values of the value-based strategy were encoded in DMS and VS, respectively. These results suggest that both the value-based strategy and the finite state-based strategy are implemented in the striatum.
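As an illustration of the finite state-based strategy described in this abstract, here is a minimal sketch of an agent whose action depends only on a discrete internal state and whose state is updated from the chosen action and the reward outcome. The two-state "win-stay, lose-switch" table, the deterministic policy, and all class and function names are assumptions made for the example; the fitted models in the study may use more states and probabilistic choices.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# Transition table: (state, action, rewarded) -> next state.
Transition = Dict[Tuple[State, Action, bool], State]


class FiniteStateAgent:
    """Agent that acts from a discrete internal state (hypothetical sketch)."""

    def __init__(self, policy: Dict[State, Action], transition: Transition, initial: State):
        self.policy = policy
        self.transition = transition
        self.state = initial

    def act(self) -> Action:
        # The action depends only on the current internal state.
        return self.policy[self.state]

    def update(self, action: Action, rewarded: bool) -> None:
        # The next state depends on the state, the action, and the outcome.
        self.state = self.transition[(self.state, action, rewarded)]


def win_stay_lose_switch() -> FiniteStateAgent:
    """Two-state machine: keep the side after a reward, switch after none."""
    policy = {"L": "left", "R": "right"}
    transition = {
        ("L", "left", True): "L",
        ("L", "left", False): "R",
        ("R", "right", True): "R",
        ("R", "right", False): "L",
    }
    return FiniteStateAgent(policy, transition, initial="L")


def run(agent: FiniteStateAgent, outcomes: List[bool]) -> List[Action]:
    """Feed a fixed sequence of reward outcomes and record the actions."""
    actions = []
    for rewarded in outcomes:
        action = agent.act()
        actions.append(action)
        agent.update(action, rewarded)
    return actions


if __name__ == "__main__":
    print(run(win_stay_lose_switch(), [True, False, False, True]))
```

Note that the agent never estimates a reward value or predicts an environmental state, which is the distinction the abstract draws from value-based and model-based strategies.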
Finite-size polyelectrolyte bundles at thermodynamic equilibrium

Science.gov (United States)

Sayar, M.; Holm, C.

2007-01-01

We present the results of extensive computer simulations performed on solutions of monodisperse charged rod-like polyelectrolytes in the presence of trivalent counterions. To overcome energy barriers we used a combination of parallel tempering and hybrid Monte Carlo techniques. Our results show that for small values of the electrostatic interaction the solution mostly consists of dispersed single rods. The potential of mean force between the polyelectrolyte monomers yields an attractive interaction at short distances. For a range of larger values of the Bjerrum length, we find finite-size polyelectrolyte bundles at thermodynamic equilibrium. Further increase of the Bjerrum length eventually leads to phase separation and precipitation. We discuss the origin of the observed thermodynamic stability of the finite-size aggregates.
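The parallel tempering technique mentioned in this abstract exchanges configurations between replicas simulated at different temperatures. The sketch below shows only the standard Metropolis swap test between neighbouring replicas; the hybrid Monte Carlo moves within each replica, the temperature ladder, and the energies used in the study are not given in the abstract, so the inputs and function names here are illustrative assumptions.

```python
import math
import random
from typing import List


def swap_probability(beta_i: float, beta_j: float, energy_i: float, energy_j: float) -> float:
    """Metropolis acceptance probability for exchanging two replicas.

    Uses the standard replica-exchange criterion
    min(1, exp((beta_i - beta_j) * (E_i - E_j))).
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swaps(betas: List[float], energies: List[float], rng: random.Random) -> List[float]:
    """Try to swap each pair of neighbouring replicas once, left to right.

    Energies stand in for full configurations in this sketch: swapping two
    entries means the replicas at those temperatures trade configurations.
    """
    energies = list(energies)
    for k in range(len(betas) - 1):
        p = swap_probability(betas[k], betas[k + 1], energies[k], energies[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies


if __name__ == "__main__":
    rng = random.Random(0)
    # The cold replica (larger beta) holds the higher energy, so the swap is certain.
    print(attempt_swaps([2.0, 1.0], [5.0, 3.0], rng))
```

In a full simulation these swap attempts would alternate with ordinary sampling moves inside each replica, which lets configurations trapped behind energy barriers escape through the high-temperature replicas.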
A language for data-parallel and task parallel programming dedicated to multi-SIMD computers. Contributions to hydrodynamic simulation with lattice gases

International Nuclear Information System (INIS)

Pic, Marc Michel

1995-01-01

Parallel programming covers task-parallelism and data-parallelism. Many problems need both forms of parallelism. Multi-SIMD computers allow a hierarchical approach to these forms of parallelism. The T++ language, based on C++, is dedicated to exploiting Multi-SIMD computers using a programming paradigm which is an extension of array-programming to task management. Our language introduces arrays of independent tasks to be executed separately (MIMD), on subsets of processors of identical behaviour (SIMD), in order to express the hierarchical inclusion of data-parallelism in task-parallelism. To manipulate tasks and data in a symmetrical way we propose meta-operations which have the same behaviour on task arrays and on data arrays. We explain how to implement this language on our parallel computer SYMPHONIE in order to profit from the locally-shared memory, the hardware virtualization, and the multiplicity of communication networks. We simultaneously analyse a typical application of such an architecture. Finite element schemes for fluid mechanics need powerful parallel computers and require large floating-point capabilities. Lattice gases are an alternative to such simulations. Boolean lattice gases are simple, stable, modular, and need no floating-point computation, but include numerical noise. Boltzmann lattice gases offer high precision of computation, but need floating-point arithmetic and are only locally stable. We propose a new scheme, called multi-bit, which keeps the advantages of each boolean model to which it is applied, with large numerical precision and reduced noise. Experiments on viscosity, physical behaviour, noise reduction and spurious invariants are shown and implementation techniques for parallel Multi-SIMD computers detailed. (author)
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

Science.gov (United States)

Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

2017-01-01

Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
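The Jacobian computation discussed in this abstract is embarrassingly parallel because each column can be estimated independently from one perturbed function evaluation. The sketch below illustrates that column-wise split with forward differences and a thread pool; it is not the MATLAB spmd implementation from the study, and the function names, step size, and worker count are assumptions. In CPython a thread pool generally will not speed up pure-Python arithmetic, so the sketch shows the decomposition rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = List[float]
Function = Callable[[Sequence[float]], Vector]


def jacobian_column(f: Function, x: Sequence[float], j: int, h: float = 1e-6) -> Vector:
    """Forward-difference estimate of column j, i.e. d f / d x_j."""
    x_step = list(x)
    x_step[j] += h
    f0 = f(x)
    f1 = f(x_step)
    return [(a - b) / h for a, b in zip(f1, f0)]


def parallel_jacobian(f: Function, x: Sequence[float], workers: int = 4, h: float = 1e-6) -> List[Vector]:
    """Estimate the Jacobian by computing its columns as independent tasks."""
    n = len(x)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task perturbs one variable; no task depends on another.
        columns = list(pool.map(lambda j: jacobian_column(f, x, j, h), range(n)))
    m = len(columns[0])
    # Transpose the list of columns into row-major form: J[i][j] = d f_i / d x_j.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def example_system(x: Sequence[float]) -> Vector:
    """Small nonlinear test system with a known Jacobian."""
    return [x[0] ** 2 + x[1], x[0] * x[1]]


if __name__ == "__main__":
    for row in parallel_jacobian(example_system, [1.0, 2.0]):
        print(row)
```

For `example_system` at the point (1, 2) the exact Jacobian is [[2, 1], [2, 1]], which the forward-difference estimate approaches as `h` shrinks. A process pool would be the closer analogue to the multi-worker setting of the study, at the cost of requiring picklable functions.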
Recent development of the Multi-Grid detector for large area neutron scattering instruments

International Nuclear Information System (INIS)

Guerard, Bruno

2015-01-01

Most of the Neutron Scattering facilities are committed to a continuous program of modernization of their instruments, requiring large area and high performance thermal neutron detectors. Besides scintillator detectors, ³He detectors, like linear PSDs (Position Sensitive Detectors) and MWPCs (Multi-Wires Proportional Chambers), are the most current techniques nowadays. Time Of Flight instruments are using ³He PSDs mounted side by side to cover tens of m². As a result of the so-called '³He shortage crisis', the volume of ³He which is needed to build one of these instruments is not accessible anymore. The development of alternative techniques requiring no ³He has been given high priority to secure the future of neutron scattering instrumentation. This is particularly important in the context where the future ESS (European Spallation Source) will start its operation in 2019-2020. Improved scintillators represent one of the alternative techniques. Another one is the Multi-Grid introduced at the ILL in 2009. A Multi-Grid detector is composed of several independent modules of typically 0.8 m x 3 m sensitive area, mounted side by side in air or in a vacuum TOF chamber. One module is composed of segmented boron-lined proportional counters mounted in a gas vessel; the counters, of square section, are assembled with Aluminium grids electrically insulated and stacked together. This design provides two advantages: First, magnetron sputtering techniques can be used to coat B₄C films on planar substrates, and second, the neutron position along the anode wires can be measured by reading out individually the grid signals with fast shaping amplifiers followed by comparators. Unlike charge division localisation in linear PSDs, the individual readout of the grids allows operating the Multi-Grid at a low amplification gain, hence this detector is tolerant to mechanical defects and its production accessible to laboratories equipped with standard equipment. Prototypes of
Recent development of the Multi-Grid detector for large area neutron scattering instruments

Energy Technology Data Exchange (ETDEWEB)

Guerard, Bruno [ILL-ESS-LiU collaboration, CRISP project, Institut Laue Langevin - ILL, Grenoble (France)

2015-07-01

Most of the Neutron Scattering facilities are committed to a continuous program of modernization of their instruments, requiring large area and high performance thermal neutron detectors. Besides scintillator detectors, ³He detectors, like linear PSDs (Position Sensitive Detectors) and MWPCs (Multi-Wires Proportional Chambers), are the most current techniques nowadays. Time Of Flight instruments are using ³He PSDs mounted side by side to cover tens of m². As a result of the so-called '³He shortage crisis', the volume of ³He which is needed to build one of these instruments is not accessible anymore. The development of alternative techniques requiring no ³He has been given high priority to secure the future of neutron scattering instrumentation. This is particularly important in the context where the future ESS (European Spallation Source) will start its operation in 2019-2020. Improved scintillators represent one of the alternative techniques. Another one is the Multi-Grid introduced at the ILL in 2009. A Multi-Grid detector is composed of several independent modules of typically 0.8 m x 3 m sensitive area, mounted side by side in air or in a vacuum TOF chamber. One module is composed of segmented boron-lined proportional counters mounted in a gas vessel; the counters, of square section, are assembled with Aluminium grids electrically insulated and stacked together. This design provides two advantages: First, magnetron sputtering techniques can be used to coat B₄C films on planar substrates, and second, the neutron position along the anode wires can be measured by reading out individually the grid signals with fast shaping amplifiers followed by comparators. Unlike charge division localisation in linear PSDs, the individual readout of the grids allows operating the Multi-Grid at a low amplification gain, hence this detector is tolerant to mechanical defects and its production accessible to laboratories equipped with standard
Finite element modeling of electrically rectified piezoelectric energy harvesters

International Nuclear Information System (INIS)

Wu, P H; Shu, Y C

2015-01-01

Finite element models are developed for designing electrically rectified piezoelectric energy harvesters. They account for common interface circuits such as the standard and parallel-/series-SSHI (synchronized switch harvesting on inductor) circuits, as well as complicated structural configurations such as arrays of piezoelectric oscillators. The idea is to replace the energy harvesting circuit by the proposed equivalent load impedance together with the capacitance of negative value. As a result, the proposed framework is capable of being implemented into conventional finite element solvers for direct system-level design without resorting to circuit simulators. The validation, based on COMSOL simulations, is carried out for various interface circuits by comparison with the standard modal analysis model. The framework is then applied to the investigation on how harvested power is reduced due to fabrication deviations in geometric and material properties of oscillators in an array system. Remarkably, it is found that for a standard array system with strong electromechanical coupling, the drop in peak power turns out to be insignificant if the optimal load is carefully chosen. The second application is to design broadband energy harvesting by developing array systems with suitable interface circuits. The result shows that significant broadband is observed for the parallel (series) connection of oscillators endowed with the parallel-SSHI (series-SSHI) circuit technique. (paper)
Finite element modeling of electrically rectified piezoelectric energy harvesters

Science.gov (United States)

Wu, P. H.; Shu, Y. C.

2015-09-01

Finite element models are developed for designing electrically rectified piezoelectric energy harvesters. They account for common interface circuits such as the standard and parallel-/series-SSHI (synchronized switch harvesting on inductor) circuits, as well as complicated structural configurations such as arrays of piezoelectric oscillators. The idea is to replace the energy harvesting circuit by the proposed equivalent load impedance together with the capacitance of negative value. As a result, the proposed framework is capable of being implemented into conventional finite element solvers for direct system-level design without resorting to circuit simulators. The validation, based on COMSOL simulations, is carried out for various interface circuits by comparison with the standard modal analysis model. The framework is then applied to the investigation on how harvested power is reduced due to fabrication deviations in geometric and material properties of oscillators in an array system. Remarkably, it is found that for a standard array system with strong electromechanical coupling, the drop in peak power turns out to be insignificant if the optimal load is carefully chosen. The second application is to design broadband energy harvesting by developing array systems with suitable interface circuits. The result shows that significant broadband is observed for the parallel (series) connection of oscillators endowed with the parallel-SSHI (series-SSHI) circuit technique.
Impact of computer advances on future finite elements computations. [for aircraft and spacecraft design

Science.gov (United States)

Fulton, Robert E.

1985-01-01

Research performed over the past 10 years in engineering data base management and parallel computing is discussed, and certain opportunities for research toward the next generation of structural analysis capability are proposed. Particular attention is given to data base management associated with the IPAD project and parallel processing associated with the Finite Element Machine project, both sponsored by NASA, and a near term strategy for a distributed structural analysis capability based on relational data base management software and parallel computers for a future structural analysis system.
Finite element analysis of three patterns of internal fixation of fractures of the mandibular condyle.

Science.gov (United States)

Aquilina, Peter; Chamoli, Uphar; Parr, William C H; Clausen, Philip D; Wroe, Stephen

2013-06-01

The most stable pattern of internal fixation for fractures of the mandibular condyle is a matter for ongoing discussion. In this study we investigated the stability of three commonly used patterns of plate fixation, and constructed finite element models of a simulated mandibular condylar fracture. The completed models were heterogeneous in the distribution of bony material properties, contained about 1.2 million elements, and incorporated simulated jaw-adducting musculature. Models were run assuming linear elasticity and isotropic material properties for bone. This model was considerably larger and more complex than previous finite element models that have been used to analyse the biomechanical behaviour of differing plating techniques. The use of two parallel 2.0 titanium miniplates gave a more stable configuration with lower mean element stresses and displacements over the use of a single miniplate. In addition, a parallel orientation of two miniplates resulted in lower stresses and displacements than did the use of two miniplates in an offset pattern. The use of two parallel titanium plates resulted in a superior biomechanical result as defined by mean element stresses and relative movement between the fractured fragments in these finite element models. Copyright © 2012 The British Association of Oral and Maxillofacial Surgeons. Published by Elsevier Ltd. All rights reserved.
Simulations of finite β turbulence in tokamaks and stellarators

International Nuclear Information System (INIS)

Jenko, F.; Scott, B.; Kendl, A.; Strintzi, D.; Dorland, W.

2003-01-01

One of the central open questions in our attempt to understand microturbulence in fusion plasmas concerns the role of finite β effects. Nonlinear codes trying to investigate this issue must go beyond the commonly used adiabatic electron approximation - a task which turns out to be a serious computational challenge. This step is necessary because the passing electrons are the prime contributor to the parallel currents which in turn produce the magnetic field fluctuations. Results at both ion and electron space-time scales from gyrokinetic and gyrofluid models are presented which shed light on the character of finite β turbulence in tokamaks and stellarators. (author)
A finite parallel zone model to interpret and extend Giddings' coupling theory for the eddy-dispersion in porous chromatographic media.

Science.gov (United States)

Desmet, Gert

2013-11-01

The finite length parallel zone (FPZ)-model is proposed as an alternative model for the axial- or eddy-dispersion caused by the occurrence of local velocity biases or flow heterogeneities in porous media such as those used in liquid chromatography columns. The mathematical plate height expression evolving from the model shows that the A- and C-term band broadening effects that can originate from a given velocity bias should be coupled in an exponentially decaying way instead of harmonically as proposed in Giddings' coupling theory. In the low and high velocity limit both models converge, while a 12% difference can be observed in the (practically most relevant) intermediate range of reduced velocities. Explicit expressions for the A- and C-constants appearing in the exponential decay-based plate height expression have been derived for each of the different possible velocity bias levels (single through-pore and particle level, multi-particle level and trans-column level). These expressions allow one to directly relate the band broadening originating from these different levels to the local fundamental transport parameters, hence offering the possibility to include a velocity-dependent and, if needed, retention factor-dependent transversal dispersion coefficient. Having developed the mathematics for the general case wherein a difference in retention equilibrium establishes between the two parallel zones, the effect of any possible local variations in packing density and/or retention capacity on the eddy-dispersion can be explicitly accounted for as well. It is furthermore also shown that, whereas the lumped transport parameter model used in the basic variant of the FPZ-model only provides a first approximation of the true decay constant, the model can be extended by introducing a constant correction factor to correctly account for the continuous transversal dispersion transport in the velocity bias zones. Copyright © 2013 Elsevier B.V. All rights reserved.
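For orientation, the harmonic coupling that this abstract contrasts with the FPZ-model can be written out as follows. This is the commonly quoted reduced form of Giddings' coupling expression, with one term per velocity bias level; the symbols $h_{\text{eddy}}$, $A_i$, $C_i$ and the reduced velocity $\nu$ are notation assumed here, since the abstract gives no formulas. The exponentially decaying FPZ expression itself is not stated in the abstract and is therefore not reproduced.

```latex
% Giddings' coupling theory: A- and C-term contributions of each
% velocity bias level i are combined harmonically.
h_{\text{eddy}} = \sum_i \left( \frac{1}{A_i} + \frac{1}{C_i\,\nu} \right)^{-1}

% Limiting behaviour of each term:
%   low velocity:   \nu \to 0       gives  C_i \nu
%   high velocity:  \nu \to \infty  gives  A_i
```

These two limits are the ones in which, according to the abstract, the harmonic and the exponential-decay expressions converge.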
Parallel processing of structural integrity analysis codes

International Nuclear Information System (INIS)

Swami Prasad, P.; Dutta, B.K.; Kushwaha, H.S.

1996-01-01

Structural integrity analysis plays an important role in assessing and demonstrating the safety of nuclear reactor components. This analysis is performed using analytical tools such as the Finite Element Method (FEM) with the help of digital computers. The complexity of the problems involved in nuclear engineering demands high speed computation facilities to obtain solutions in a reasonable amount of time. Parallel processing systems such as ANUPAM provide an efficient platform for realising the high speed computation. The development and implementation of software on parallel processing systems is an interesting and challenging task. The data and algorithm structure of the codes plays an important role in exploiting the parallel processing system capabilities. Structural analysis codes based on FEM can be divided into two categories with respect to their implementation on parallel processing systems. Codes in the first category, such as those used for harmonic analysis and mechanistic fuel performance codes, do not require the parallelisation of individual modules of the codes. Codes in the second category, such as conventional FEM codes, require parallelisation of individual modules. In this category, parallelisation of the equation solution module poses major difficulties. Different solution schemes such as the domain decomposition method (DDM), parallel active column solver and substructuring method are currently used on parallel processing systems. Two codes, FAIR and TABS, one belonging to each of these categories, have been implemented on ANUPAM. The implementation details of these codes and the performance of different equation solvers are highlighted. (author). 5 refs., 12 figs., 1 tab
Parallel computation of Euler and Navier-Stokes flows

International Nuclear Information System (INIS)

Swisshelm, J.M.; Johnson, G.M.; Kumar, S.P.

1986-01-01

A multigrid technique useful for accelerating the convergence of Euler and Navier-Stokes flow computations has been restructured to improve its performance on both SIMD and MIMD computers. The new algorithm allows both the construction of longer coarse-grid vectors and the multitasking of entire grids. Computational results are presented for the CDC Cyber 205, Cray X-MP, and Denelcor HEP I. 15 references
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows

Science.gov (United States)

Moitra, Stuti; Gatski, Thomas B.

1997-01-01

A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
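For context, a tridiagonal system of the kind mentioned in this abstract is conventionally solved on a single processor with the Thomas algorithm, whose forward and backward sweeps are inherently sequential. The sketch below is that textbook serial algorithm, not the authors' method; the function name `solve_tridiagonal` and its argument layout are assumptions made for illustration.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the (serial) Thomas algorithm.

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored. No pivoting is done, so the matrix
    is assumed to be diagonally dominant or otherwise well behaved.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination: each step depends on the previous one.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution: again strictly sequential.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # Hypothetical 4x4 test system with 2 on the diagonal and -1 off it.
    print(solve_tridiagonal([0, -1, -1, -1], [2, 2, 2, 2],
                            [-1, -1, -1, 0], [0, 0, 0, 5]))
```

The loop-carried dependence in both sweeps is what makes a direct parallelization awkward on distributed-memory machines.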
Fast algorithms for transport models. Final report, June 1, 1993--May 31, 1994

International Nuclear Information System (INIS)

Manteuffel, T.

1994-12-01

The focus of this project is the study of multigrid and multilevel algorithms for the numerical solution of Boltzmann models of the transport of neutral and charged particles. In previous work a fast multigrid algorithm was developed for the numerical solution of the Boltzmann model of neutral particle transport in slab geometry assuming isotropic scattering. The new algorithm is extremely fast in the thick diffusion limit; the multigrid V-cycle convergence factor approaches zero as the mean-free-path between collisions approaches zero, independent of the mesh. Also, a fast multilevel method was developed for the numerical solution of the Boltzmann model of charged particle transport in the thick Fokker-Planck limit for slab geometry. Parallel implementations were developed for both algorithms.
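As a rough illustration of what a multigrid V-cycle does, the sketch below applies one to the 1-D Poisson problem -u'' = f with zero boundary values. This is a swapped-in model problem, not the Boltzmann transport solver of the report. The function names, the weighted-Jacobi smoother, the full-weighting restriction and the linear interpolation are all assumptions chosen for the illustration, and the grid is assumed to have 2^k - 1 interior points.

```python
def residual(u, f, h):
    """r = f - A u for the 3-point Laplacian with zero Dirichlet boundaries."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; the diagonal of A is 2 / h^2."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * (h * h / 2.0) * ri for ui, ri in zip(u, r)]
    return u


def restrict(r):
    """Full weighting; coarse point j sits at fine index 2j + 1."""
    nc = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(nc)]


def prolong(ec, n):
    """Linear interpolation of a coarse correction to n fine points."""
    e = [0.0] * n
    for j, v in enumerate(ec):
        e[2 * j + 1] += v
        e[2 * j] += 0.5 * v
        e[2 * j + 2] += 0.5 * v
    return e


def v_cycle(u, f, h):
    """One V(2,2) cycle: smooth, correct from the coarser grid, smooth."""
    n = len(u)
    if n == 1:
        return [f[0] * h * h / 2.0]  # exact solve on the coarsest grid
    u = smooth(u, f, h, 2)
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h)
    u = [ui + ei for ui, ei in zip(u, prolong(ec, n))]
    return smooth(u, f, h, 2)


if __name__ == "__main__":
    n = 31
    h = 1.0 / (n + 1)
    u, f = [0.0] * n, [1.0] * n
    for cycle in range(8):
        u = v_cycle(u, f, h)
        print(cycle, max(abs(r) for r in residual(u, f, h)))
```

The ratio of successive residual norms printed by the driver is an empirical estimate of the V-cycle convergence factor referred to in the abstract.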
Computational fluid dynamics on a massively parallel computer

Science.gov (United States)

Jespersen, Dennis C.; Levit, Creon

1989-01-01

A finite difference code was implemented for the compressible Navier-Stokes equations on the Connection Machine, a massively parallel computer. The code is based on the ARC2D/ARC3D program and uses the implicit factored algorithm of Beam and Warming. The code uses odd-even elimination to solve linear systems. Timings and computation rates are given for the code, and a comparison is made with a Cray XMP.
Parallel computation of rotating flows

DEFF Research Database (Denmark)

Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær

1999-01-01

This paper deals with the simulation of 3‐D rotating flows based on the velocity‐vorticity formulation of the Navier‐Stokes equations in cylindrical coordinates. The governing equations are discretized by a finite difference method. The solution is advanced to a new time level by a two‐step process...... is that of solving a singular, large, sparse, over‐determined linear system of equations, and the iterative method CGLS is applied for this purpose. We discuss some of the mathematical and numerical aspects of this procedure and report on the performance of our software on a wide range of parallel computers. Darbe...
Parallel supercomputing: Advanced methods, algorithms, and software for large-scale linear and nonlinear problems

Energy Technology Data Exchange (ETDEWEB)

Carey, G.F.; Young, D.M.

1993-12-31

The program outlined here is directed to research on methods, algorithms, and software for distributed parallel supercomputers. Of particular interest are finite element methods and finite difference methods together with sparse iterative solution schemes for scientific and engineering computations of very large-scale systems. Both linear and nonlinear problems will be investigated. In the nonlinear case, applications with bifurcation to multiple solutions will be considered using continuation strategies. The parallelizable numerical methods of particular interest are a family of partitioning schemes embracing domain decomposition, element-by-element strategies, and multi-level techniques. The methods will be further developed incorporating parallel iterative solution algorithms with associated preconditioners in parallel computer software. The schemes will be implemented on distributed memory parallel architectures such as the CRAY MPP, Intel Paragon, the NCUBE3, and the Connection Machine. We will also consider other new architectures such as the Kendall-Square (KSQ) and proposed machines such as the TERA. The applications will focus on large-scale three-dimensional nonlinear flow and reservoir problems with strong convective transport contributions. These are legitimate grand challenge class computational fluid dynamics (CFD) problems of significant practical interest to DOE. The methods developed and algorithms will, however, be of wider interest.
An M-step preconditioned conjugate gradient method for parallel computation

Science.gov (United States)

Adams, L.

1983-01-01

This paper describes a preconditioned conjugate gradient method that can be effectively implemented on both vector machines and parallel arrays to solve sparse symmetric and positive definite systems of linear equations. The implementation on the CYBER 203/205 and on the Finite Element Machine is discussed and results obtained using the method on these machines are given.
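One common reading of an "m-step" preconditioner is that each preconditioning solve is replaced by m sweeps of a simple stationary iteration, which vectorizes and parallelizes well. The sketch below uses m Jacobi sweeps inside a standard preconditioned conjugate gradient loop. The choice of Jacobi, the dense list-of-lists matrix and all function names are assumptions made for illustration and are not taken from the paper.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def m_step_jacobi(A, r, m):
    """Approximate A^{-1} r by m Jacobi sweeps started from zero."""
    z = [0.0] * len(r)
    for _ in range(m):
        Az = matvec(A, z)
        z = [z[i] + (r[i] - Az[i]) / A[i][i] for i in range(len(r))]
    return z


def pcg(A, b, m=2, tol=1e-10, max_iter=200):
    """Preconditioned CG for symmetric positive definite A.

    The m-step Jacobi operator must itself be symmetric positive definite,
    which is assumed here (it holds, for example, when A is strictly
    diagonally dominant).
    """
    x = [0.0] * len(b)
    r = list(b)
    z = m_step_jacobi(A, r, m)
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = m_step_jacobi(A, r, m)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


if __name__ == "__main__":
    # Hypothetical 5x5 SPD test matrix: 4 on the diagonal, -1 next to it.
    n = 5
    A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
          for j in range(n)] for i in range(n)]
    b = matvec(A, [1.0] * n)
    print(pcg(A, b, m=2))
```

Every operation in the preconditioner is a matrix-vector product or a pointwise update, which is the property that makes this family attractive on vector and array machines.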

Parallel computing for homogeneous diffusion and transport equations in neutronics; Calcul parallele pour les equations de diffusion et de transport homogenes en neutronique

Energy Technology Data Exchange (ETDEWEB)

Pinchedez, K

1999-06-01

Parallel computing meets the ever-increasing requirements for neutronic computer code speed and accuracy. In this work, two different approaches have been considered. We first parallelized the sequential algorithm used by the neutronics code CRONOS developed at the French Atomic Energy Commission. The algorithm computes the dominant eigenvalue associated with PN simplified transport equations by a mixed finite element method. Several parallel algorithms have been developed on distributed memory machines. The performances of the parallel algorithms have been studied experimentally by implementation on a T3D Cray and theoretically by complexity models. A comparison of various parallel algorithms has confirmed the chosen implementations. We next applied a domain sub-division technique to the two-group diffusion Eigen problem. In the modal synthesis-based method, the global spectrum is determined from the partial spectra associated with sub-domains. Then the Eigen problem is expanded on a family composed, on the one hand, from eigenfunctions associated with the sub-domains and, on the other hand, from functions corresponding to the contribution from the interface between the sub-domains. For a 2-D homogeneous core, this modal method has been validated and its accuracy has been measured. (author)
MAPCUMBA: A fast iterative multi-grid map-making algorithm for CMB experiments

Science.gov (United States)

Doré, O.; Teyssier, R.; Bouchet, F. R.; Vibert, D.; Prunet, S.

2001-07-01

The data analysis of current Cosmic Microwave Background (CMB) experiments like BOOMERanG or MAXIMA poses severe challenges which already stretch the limits of current (super-) computer capabilities, if brute force methods are used. In this paper we present a practical solution for the optimal map making problem which can be used directly for next generation CMB experiments like ARCHEOPS and TopHat, and can probably be extended relatively easily to the full PLANCK case. This solution is based on an iterative multi-grid Jacobi algorithm which is both fast and memory sparing. Indeed, if there are Ntod data points along the one dimensional timeline to analyse, the number of operations is O(Ntod ln Ntod) and the memory requirement is O(Ntod). Timing and accuracy issues have been analysed on simulated ARCHEOPS and TopHat data, and we discuss as well the issue of the joint evaluation of the signal and noise statistical properties.
Parallel-Vector Algorithm For Rapid Structural Analysis

Science.gov (United States)

Agarwal, Tarun R.; Nguyen, Duc T.; Storaasli, Olaf O.

1993-01-01

New algorithm developed to overcome deficiency of skyline storage scheme by use of variable-band storage scheme. Exploits both parallel and vector capabilities of modern high-performance computers. Gives engineers and designers opportunity to include more design variables and constraints during optimization of structures. Enables use of more refined finite-element meshes to obtain improved understanding of complex behaviors of aerospace structures leading to better, safer designs. Not only attractive for current supercomputers but also for next generation of shared-memory supercomputers.
An Experiment of Robust Parallel Algorithm for the Eigenvalue problem of a Multigroup Neutron Diffusion based on modified FETI-DP

Energy Technology Data Exchange (ETDEWEB)

Chang, Jonghwa [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

2014-05-15

Parallelization of Monte Carlo simulation is widely adopted. There are also several parallel algorithms developed for the SN transport theory using the parallel wave sweeping algorithm and for the CPM using parallel ray tracing. For practical reactor physics applications, the thermal feedback and burnup effects on the multigroup cross section should be considered. In this respect, the domain decomposition method (DDM) is suitable for distributing the expensive cross section calculation work. Parallel transport and diffusion codes based on the Raviart-Thomas mixed finite element method have been developed. However, most of the developed methods rely on the heuristic convergence of flux and current at the domain interfaces, and convergence was not attained in some cases. The mechanical stress computation community has also worked on the DDM to solve the stress-strain equation using finite element methods. The most successful domain decomposition method in terms of robustness is FETI-DP. We have modified the original FETI-DP to solve the eigenvalue problem for the multigroup diffusion problem in this study.
Optimising a parallel conjugate gradient solver

Energy Technology Data Exchange (ETDEWEB)

Field, M.R. [O'Reilly Institute, Dublin (Ireland)

1996-12-31

This work arises from the introduction of a parallel iterative solver to a large structural analysis finite element code. The code is called FEX and it was developed at Hitachi's Mechanical Engineering Laboratory. The FEX package can deal with a large range of structural analysis problems using a large number of finite element techniques. FEX can solve either stress or thermal analysis problems of a range of different types from plane stress to a full three-dimensional model. These problems can consist of a number of different materials which can be modelled by a range of material models. The structure being modelled can have the load applied at either a point or a surface, or by a pressure, a centrifugal force or just gravity. Alternatively a thermal load can be applied with a given initial temperature. The displacement of the structure can be constrained by having a fixed boundary or by prescribing the displacement at a boundary.
Goal-Oriented Self-Adaptive hp Finite Element Simulation of 3D DC Borehole Resistivity Simulations

KAUST Repository

Calo, Victor M.; Pardo, David; Paszyński, Maciej R.

2011-01-01

(adjusting polynomial orders of approximation) or hp (both) refinements on the finite elements. The new parallel implementation utilizes a computational mesh shared between multiple processors. All computational algorithms, including automatic hp goal
Fast parallel algorithms for the x-ray transform and its adjoint.

Science.gov (United States)

Gao, Hao

2012-11-01

Iterative reconstruction methods often offer better imaging quality and allow for reconstructions with lower imaging dose than classical methods in computed tomography. However, computational speed is a major concern for these iterative methods, for which the x-ray transform and its adjoint are the two most time-consuming components. The speed issue becomes even more notable for 3D imaging such as cone beam scans or helical scans, since the x-ray transform and its adjoint are frequently recomputed because there is usually not enough computer memory to store the corresponding system matrix. The purpose of this paper is to optimize the algorithm for computing the x-ray transform and its adjoint, and their parallel computation. Fast and highly parallelizable algorithms for the x-ray transform and its adjoint are proposed for the infinitely narrow beam in both 2D and 3D. The extension of these fast algorithms to the finite-size beam is proposed in 2D and discussed in 3D. The CPU and GPU codes are available at https://sites.google.com/site/fastxraytransform. The proposed algorithm is faster than Siddon's algorithm for computing the x-ray transform. In particular, the improvement for the parallel computation can be an order of magnitude. The authors have proposed fast and highly parallelizable algorithms for the x-ray transform and its adjoint, which are extendable to the finite-size beam. The proposed algorithms are suitable for parallel computing in the sense that the computational cost per parallel thread is O(1).
Sharp asymptotics for stochastic dynamics with parallel updating rule with self-interaction

NARCIS (Netherlands)

Bovier, A.; Nardi, F.R.; Spitoni, C.

2011-01-01

In this paper we study metastability for a stochastic dynamics with a parallel updating rule, in particular for a probabilistic cellular automaton. The problem is addressed in the Freidlin-Wentzell regime, i.e., finite volume, small magnetic field, and in the limit when temperature tends to zero. We
Distributed Finite Element Analysis Using a Transputer Network

Science.gov (United States)

Watson, James; Favenesi, James; Danial, Albert; Tombrello, Joseph; Yang, Dabby; Reynolds, Brian; Turrentine, Ronald; Shephard, Mark; Baehmann, Peggy

1989-01-01

The principal objective of this research effort was to demonstrate the extraordinarily cost effective acceleration of finite element structural analysis problems using a transputer-based parallel processing network. This objective was accomplished in the form of a commercially viable parallel processing workstation. The workstation is a desktop size, low-maintenance computing unit capable of supercomputer performance yet costs two orders of magnitude less. To achieve the principal research objective, a transputer based structural analysis workstation termed XPFEM was implemented with linear static structural analysis capabilities resembling commercially available NASTRAN. Finite element model files, generated using the on-line preprocessing module or external preprocessing packages, are downloaded to a network of 32 transputers for accelerated solution. The system currently executes at about one third Cray X-MP24 speed but additional acceleration appears likely. For the NASA selected demonstration problem of a Space Shuttle main engine turbine blade model with about 1500 nodes and 4500 independent degrees of freedom, the Cray X-MP24 required 23.9 seconds to obtain a solution while the transputer network, operated from an IBM PC-AT compatible host computer, required 71.7 seconds. Consequently, the $80,000 transputer network demonstrated a cost-performance ratio about 60 times better than the $15,000,000 Cray X-MP24 system.
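The "about 60 times" figure can be reproduced from the numbers quoted in the abstract if cost-performance is taken to mean purchase price multiplied by time to solution (lower is better). That definition is an assumption, since the abstract does not state how the ratio was computed, and the variable names below are illustrative only.

```python
# Figures quoted in the abstract.
cray_cost_usd, cray_seconds = 15_000_000, 23.9
network_cost_usd, network_seconds = 80_000, 71.7

# Speed of the transputer network relative to the Cray X-MP24.
speed_fraction = cray_seconds / network_seconds

# Assumed metric: price x time to solution, compared between the two systems.
cost_performance_gain = (cray_cost_usd * cray_seconds) / (
    network_cost_usd * network_seconds
)

if __name__ == "__main__":
    print(f"relative speed: {speed_fraction:.3f}")
    print(f"cost-performance gain: {cost_performance_gain:.1f}")
```

Under this assumption the network runs at roughly one third of the Cray's speed and comes out roughly 62.5 times better in cost-performance, consistent with both statements in the abstract.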
Bistatic scattering from a three-dimensional object above a two-dimensional randomly rough surface modeled with the parallel FDTD approach.

Science.gov (United States)

Guo, L-X; Li, J; Zeng, H

2009-11-01

We present an investigation of the electromagnetic scattering from a three-dimensional (3-D) object above a two-dimensional (2-D) randomly rough surface. A Message Passing Interface-based parallel finite-difference time-domain (FDTD) approach is used, and the uniaxial perfectly matched layer (UPML) medium is adopted for truncation of the FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different number of processors is illustrated for one rough surface realization and shows that the computation time of our parallel FDTD algorithm is dramatically reduced relative to a single-processor implementation. Finally, the composite scattering coefficients versus scattered and azimuthal angle are presented and analyzed for different conditions, including the surface roughness, the dielectric constants, the polarization, and the size of the 3-D object.
A multilevel in space and energy solver for multigroup diffusion eigenvalue problems

Directory of Open Access Journals (Sweden)

Ben C. Yee

2017-09-01

In this paper, we present a new multilevel in space and energy diffusion (MSED) method for solving multigroup diffusion eigenvalue problems. The MSED method can be described as a PI scheme with three additional features: (1) a grey (one-group) diffusion equation used to efficiently converge the fission source and eigenvalue, (2) a space-dependent Wielandt shift technique used to reduce the number of PIs required, and (3) a multigrid-in-space linear solver for the linear solves required by each PI step. In MSED, the convergence of the solution of the multigroup diffusion eigenvalue problem is accelerated by performing work on lower-order equations with only one group and/or coarser spatial grids. Results from several Fourier analyses and a one-dimensional test code are provided to verify the efficiency of the MSED method and to justify the incorporation of the grey diffusion equation and the multigrid linear solver. These results highlight the potential efficiency of the MSED method as a solver for multidimensional multigroup diffusion eigenvalue problems, and they serve as a proof of principle for future work. Our ultimate goal is to implement the MSED method as an efficient solver for the two-dimensional/three-dimensional coarse mesh finite difference diffusion system in the Michigan parallel characteristics transport code. The work in this paper represents a necessary step towards that goal.
Two parallel finite queues with simultaneous services and Markovian arrivals

Directory of Open Access Journals (Sweden)

S. R. Chakravarthy

1997-01-01

In this paper, we consider a finite capacity single server queueing model with two buffers, A and B, of sizes K and N respectively. Messages arrive one at a time according to a Markovian arrival process. Messages that arrive at buffer A are of a different type from the messages that arrive at buffer B. Messages are processed according to the following rules: 1. When buffer A (B) has a message and buffer B (A) is empty, then one message from A (B) is processed by the server. 2. When both buffers, A and B, have messages, then two messages, one from A and one from B, are processed simultaneously by the server. The service times are assumed to be exponentially distributed with parameters that may depend on the type of service. This queueing model is studied as a Markov process with a large state space and efficient algorithmic procedures for computing various system performance measures are given. Some numerical examples are discussed.
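The two service rules in this abstract can be written as a small state-transition function on the buffer occupancies. The sketch below is only that bookkeeping, not the paper's Markov-process analysis. The function names are hypothetical, and two modelling details are assumptions the abstract does not state: a message leaves its buffer when its service starts, and a message arriving at a full buffer is lost.

```python
def start_service(a, b):
    """Apply the service rules to occupancies (a, b) of buffers A and B.

    Returns the occupancies after the server picks up its next job, plus a
    label for the service started: 'A', 'B', 'AB' (simultaneous), or None
    when both buffers are empty and the server stays idle.
    """
    if a > 0 and b > 0:
        return a - 1, b - 1, "AB"   # rule 2: one message from each buffer
    if a > 0:
        return a - 1, b, "A"        # rule 1 with B empty
    if b > 0:
        return a, b - 1, "B"        # rule 1 with A empty
    return a, b, None


def arrive(a, b, buffer_name, K, N):
    """Admit one arriving message unless its buffer is already full."""
    if buffer_name == "A" and a < K:
        return a + 1, b
    if buffer_name == "B" and b < N:
        return a, b + 1
    return a, b  # assumed: the message is lost when the buffer is full


if __name__ == "__main__":
    state = (0, 0)
    for name in ["A", "A", "B"]:
        state = arrive(*state, name, K=2, N=3)
    print(state, start_service(*state))
```

A full simulation would add exponentially distributed service times whose rate depends on the returned label, as the abstract describes.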
Multigrid for refined triangle meshes

Energy Technology Data Exchange (ETDEWEB)

Shapira, Yair

1997-02-01

A two-level preconditioning method for the solution of (locally) refined finite element schemes using triangle meshes is introduced. In the isotropic SPD case, it is shown that the condition number of the preconditioned stiffness matrix is bounded uniformly for all sufficiently regular triangulations. This is also verified numerically for an isotropic diffusion problem with highly discontinuous coefficients.
Linear theory of a cold relativistic beam in a strongly magnetized finite-geometry plasma

International Nuclear Information System (INIS)

Gagne, R.R.J.; Shoucri, M.M.

1976-01-01

The linear theory of a finite-geometry cold relativistic beam propagating in a cold homogeneous finite-geometry plasma is investigated in the case of a strongly magnetized plasma. The beam is assumed to propagate parallel to the external magnetic field. It is shown that the instability which takes place at the Cherenkov resonance ω ≈ k_z v_b is of the convective type. The effect of the finite geometry on the instability growth rate is studied and is shown to decrease the growth rate, with respect to the infinite geometry, by a factor depending on the ratio of the beam-to-plasma radius.
Numerical Methods for Forward and Inverse Problems in Discontinuous Media

Energy Technology Data Exchange (ETDEWEB)

Chartier, Timothy P.

2011-03-08

The research emphasis under this grant's funding is in the area of algebraic multigrid methods. The research has two main branches: 1) exploring interdisciplinary applications in which algebraic multigrid can make an impact and 2) extending the scope of algebraic multigrid methods with algorithmic improvements that are based in strong analysis. The work in interdisciplinary applications falls primarily in the field of biomedical imaging. Work under this grant demonstrated the effectiveness and robustness of multigrid for solving linear systems that result from highly heterogeneous finite element method models of the human head. The results in this work also give promise to medical advances possible with software that may be developed. Research to extend the scope of algebraic multigrid has been focused in several areas. In collaboration with researchers at the University of Colorado, Lawrence Livermore National Laboratory, and Los Alamos National Laboratory, the PI developed an adaptive multigrid with subcycling via complementary grids. This method has very cheap computing costs per iterate and is showing promise as a preconditioner for conjugate gradient. Recent work with Los Alamos National Laboratory concentrates on developing algorithms that take advantage of the recent advances in adaptive multigrid research. The results of the various efforts in this research could ultimately have direct use and impact for researchers in a wide variety of applications, including astrophysics, neuroscience, contaminant transport in porous media, bi-domain heart modeling, modeling of tumor growth, and flow in heterogeneous porous media. This work has already led to basic advances in computational mathematics and numerical linear algebra and will continue to do so into the future.
Transmission Index Research of Parallel Manipulators Based on Matrix Orthogonal Degree

Science.gov (United States)

Shao, Zhu-Feng; Mo, Jiao; Tang, Xiao-Qiang; Wang, Li-Ping

2017-11-01

The performance index is the standard of performance evaluation, and is the foundation of both performance analysis and optimal design for the parallel manipulator. Finding suitable kinematic indices is always an important and challenging issue for the parallel manipulator. So far, there are extensive studies in this field, but few existing indices can meet all the requirements, such as being simple, intuitive, and universal. To solve this problem, the matrix orthogonal degree is adopted, and generalized transmission indices that can evaluate motion/force transmissibility of fully parallel manipulators are proposed. Transmission performance analysis of typical branches, end effectors, and parallel manipulators is given to illustrate the proposed indices and analysis methodology. Simulation and analysis results reveal that the proposed transmission indices possess significant advantages, such as being normalized and finite (ranging from 0 to 1), dimensionally homogeneous, frame-free, intuitive and easy to calculate. Besides, the proposed indices well indicate the good transmission region and relativity to the singularity with better resolution than the traditional local conditioning index, and provide a novel tool for kinematic analysis and optimal design of fully parallel manipulators.
Parallel gradient effects on ICRH (Ion Cyclotron Resonance Heating) in Tokamaks

International Nuclear Information System (INIS)

Smithe, D.N.

1987-01-01

This dissertation examines the effects on Ion Cyclotron Resonance Heating of parallel nonuniformity in the magnetic field which arises from the poloidal field in a tokamak and the universal (major radius)^(-1) scaling of the cyclotron frequency. The goal of the analysis is the macroscopic warm plasma current including temperature in the sense of the finite Larmor radius expansion and the quasilocal approximation of the parallel guiding center motion. A 1-D numerical application of the fully nonlocal integral dielectric is performed. Parallel gradient effects are studied for He-3 minority, 2nd harmonic deuterium, and hydrogen minority heating in tokamaks. The results show quite significant alteration of the toroidal wavenumber absorption spectrum, and a wealth of new behavior on the local propagation scale. 95 refs., 37 figs
Parallel algorithms and architectures for computational structural mechanics

Science.gov (United States)

Patrick, Merrell; Ma, Shing; Mahajan, Umesh

1989-01-01

The determination of the fundamental (lowest) natural vibration frequencies and associated mode shapes is a key step used to uncover and correct potential failures or problem areas in most complex structures. However, the computation time taken by finite element codes to evaluate these natural frequencies is significant, often the most computationally intensive part of structural analysis calculations. There is continuing need to reduce this computation time. This study addresses this need by developing methods for parallel computation.
Finite element analysis of multi-material models using a balancing domain decomposition method combined with the diagonal scaling preconditioner

International Nuclear Information System (INIS)

Ogino, Masao

2016-01-01

Actual problems in science and industrial applications are modeled with multiple materials and large-scale unstructured meshes, and finite element analysis has been widely used to solve such problems on parallel computers. However, for large-scale problems, iterative methods for linear finite element equations suffer from slow or no convergence. Therefore, numerical methods having both robust convergence and scalable parallel efficiency are in great demand. The domain decomposition method is well known as an iterative substructuring method, and is an efficient approach for parallel finite element methods. Moreover, the balancing preconditioner achieves robust convergence. However, for problems consisting of very different materials, the convergence deteriorates. Some research has addressed this issue, but the resulting methods are not suitable for complex shapes and composite materials. In this study, to improve convergence of the balancing preconditioner for multi-materials, a balancing preconditioner combined with the diagonal scaling preconditioner, called the Scaled-BDD method, is proposed. Some numerical results are included which indicate that the proposed method has robust convergence with respect to the number of subdomains and shows high performance compared with the original balancing preconditioner. (author)
Parallel computing for homogeneous diffusion and transport equations in neutronics

International Nuclear Information System (INIS)

Pinchedez, K.

1999-06-01

Parallel computing meets the ever-increasing requirements for neutronic computer code speed and accuracy. In this work, two different approaches have been considered. We first parallelized the sequential algorithm used by the neutronics code CRONOS developed at the French Atomic Energy Commission. The algorithm computes the dominant eigenvalue associated with PN simplified transport equations by a mixed finite element method. Several parallel algorithms have been developed on distributed memory machines. The performances of the parallel algorithms have been studied experimentally by implementation on a T3D Cray and theoretically by complexity models. A comparison of various parallel algorithms has confirmed the chosen implementations. We next applied a domain sub-division technique to the two-group diffusion Eigen problem. In the modal synthesis-based method, the global spectrum is determined from the partial spectra associated with sub-domains. Then the Eigen problem is expanded on a family composed, on the one hand, from eigenfunctions associated with the sub-domains and, on the other hand, from functions corresponding to the contribution from the interface between the sub-domains. For a 2-D homogeneous core, this modal method has been validated and its accuracy has been measured. (author)

Development of parallel benchmark code by sheet metal forming simulator 'ITAS'

International Nuclear Information System (INIS)

Watanabe, Hiroshi; Suzuki, Shintaro; Minami, Kazuo

1999-03-01

This report describes the development of a parallel benchmark code based on the sheet metal forming simulator 'ITAS'. ITAS is a nonlinear elasto-plastic analysis program using the finite element method for the simulation of sheet metal forming. ITAS adopts the dynamic analysis method that computes displacement of sheet metal at every time unit and utilizes the implicit method with the direct linear equation solver. Therefore the simulator is very robust. However, it requires a lot of computational time and memory capacity. In the development of the parallel benchmark code, we designed the code with MPI programming to reduce the computational time. In numerical experiments on the five kinds of parallel supercomputers at CCSE JAERI, i.e., SP2, SR2201, SX-4, T94 and VPP300, good performance is observed. The results will be made public through the WWW so that the benchmark results may become a guideline for research and development of parallel programs. (author)
A parallel graded-mesh FDTD algorithm for human-antenna interaction problems.

Science.gov (United States)

Catarinucci, Luca; Tarricone, Luciano

2009-01-01

The finite difference time domain method (FDTD) is frequently used for the numerical solution of a wide variety of electromagnetic (EM) problems and, among them, those concerning human exposure to EM fields. In many practical cases related to the assessment of occupational EM exposure, large simulation domains are modeled and high space resolution adopted, so that strong memory and central processing unit power requirements have to be satisfied. To better afford the computational effort, the use of parallel computing is a winning approach; alternatively, subgridding techniques are often implemented. However, the simultaneous use of subgridding schemes and parallel algorithms is very new. In this paper, an easy-to-implement and highly-efficient parallel graded-mesh (GM) FDTD scheme is proposed and applied to human-antenna interaction problems, demonstrating its appropriateness in dealing with complex occupational tasks and showing its capability to guarantee the advantages of a traditional subgridding technique without affecting the parallel FDTD performance.
Message-passing-interface-based parallel FDTD investigation on the EM scattering from a 1-D rough sea surface using uniaxial perfectly matched layer absorbing boundary.

Science.gov (United States)

Li, J; Guo, L-X; Zeng, H; Han, X-B

2009-06-01

A message-passing-interface (MPI)-based parallel finite-difference time-domain (FDTD) algorithm for the electromagnetic scattering from a 1-D randomly rough sea surface is presented. The uniaxial perfectly matched layer (UPML) medium is adopted for truncation of FDTD lattices, in which the finite-difference equations can be used for the total computation domain by properly choosing the uniaxial parameters. This makes the parallel FDTD algorithm easier to implement. The parallel performance with different processors is illustrated for one sea surface realization, and the computation time of the parallel FDTD algorithm is dramatically reduced compared to a single-process implementation. Finally, some numerical results are shown, including the backscattering characteristics of sea surface for different polarization and the bistatic scattering from a sea surface with large incident angle and large wind speed.
Development of new multigrid schemes for the method of characteristics in neutron transport theory

International Nuclear Information System (INIS)

Grassi, G.

2006-01-01

This dissertation is based upon our doctoral research that dealt with the conception and development of new non-linear multigrid techniques for the Method of Characteristics (MOC) within the TDT code. Here we focus upon a two-level scheme consisting of a fine level on which the neutron transport equation is iteratively solved using the MOC algorithm, and a coarse level defined by a more coarsely discretized phase space on which a low-order problem is considered. The solution of this problem is then used in order to correct the angular flux moments resulting from the previous transport iteration. A flux-volume homogenization procedure is employed to evaluate the coarse-level material properties after each transport iteration. This entails the non-linearity of the methods. According to the Generalised Equivalence Theory (GET), additional degrees of freedom are introduced for the low-order problem so that the convergence of the acceleration scheme can be ensured. We present two classes of non-linear methods: transport-like methods and diffusion-like methods. Transport-like methods consider a homogenized low-order transport problem on the coarse level. This problem is iteratively solved using the same MOC algorithm as for the transport problem on the fine level. Discontinuity factors are then employed, per region or per surface, in order to reconstruct the currents evaluated by the low-order operator, which ensure the convergence of the acceleration scheme. On the other hand, diffusion-like methods consider a low-order problem inspired by diffusion. We studied the non-linear Coarse Mesh Finite Difference (CMFD) method, already present in the literature, in the perspective of integrating it into the TDT code. Then, we developed a new non-linear method on the model of CMFD. From the latter, we borrowed the idea to establish a simple relation between currents and fluxes in order to obtain a problem involving only coarse fluxes. Finally, those non-linear methods have been
New Parallel Algorithms for Landscape Evolution Model

Science.gov (United States)

Jin, Y.; Zhang, H.; Shi, Y.

2017-12-01

Most landscape evolution models (LEM) developed in the last two decades solve the diffusion equation to simulate the transportation of surface sediments. This numerical approach is difficult to parallelize because the drainage area must be computed for each node, which requires a huge amount of communication when run in parallel. To overcome this difficulty, we developed two parallel algorithms for LEM with a stream net. One algorithm partitions the grid with traditional methods and applies an efficient global reduction algorithm to compute drainage areas and transport rates for the stream net; the other algorithm is based on a new partition algorithm, which first partitions the nodes in catchments between processes and then partitions the cells according to the partition of nodes. Both methods focus on decreasing communication between processes and take advantage of massive computing techniques, and numerical experiments show that both are adequate for large-scale problems with millions of cells. We implemented the two algorithms in our program based on the widely used finite element library deal.II, so that it can be easily coupled with ASPECT.
Parallel algorithms for solving the diffusion equation by finite elements methods and by nodal methods

International Nuclear Information System (INIS)

Coulomb, F.

1989-06-01

The aim of this work is to study methods for solving the diffusion equation that are based on a primal or mixed-dual finite element discretization and are well suited for use on multiprocessor computers; domain decomposition methods are the subject of the main part of this study, the linear systems being solved by the block-Jacobi method. The origin of the diffusion equation is explained briefly, and various variational formulations are recalled. A survey of iterative methods is given. The elimination of the flux or current is treated in the case of a mixed method. Numerical tests are performed on two examples of reactors, in order to compare mixed elements and Lagrange elements. A theoretical study of domain decomposition is carried out in the case of Lagrange finite elements, and convergence conditions for the block-Jacobi method are derived; the dissection decomposition is first the subject of a separate numerical analysis. In the case of mixed-dual finite elements, a study is carried out on examples and is confirmed by numerical tests performed for the dissection decomposition; furthermore, after being justified, decompositions along axes of symmetry are numerically tested. In the case of a decomposition into two subdomains, the dissection decomposition and the decomposition with an integrated interface are compared. Alternating direction methods are defined; the convergence of those relative to Lagrange elements is shown; in the case of mixed elements, convergence conditions are found [fr
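The block-Jacobi iteration named in this abstract can be illustrated with a small sketch. This is not the author's code: it assumes a 1-D finite-difference diffusion-like matrix in place of the finite element systems studied in the report, uses contiguous index blocks as stand-ins for subdomains, and the function name `block_jacobi` and its parameters are hypothetical.

```python
import numpy as np

def block_jacobi(A, b, blocks, tol=1e-10, max_iter=10_000):
    """Solve A x = b by block-Jacobi iteration (illustrative sketch).

    `blocks` is a list of index arrays, one per subdomain (hypothetical layout).
    Each sweep solves every diagonal block independently, using the previous
    iterate for the coupling to the other blocks, so the block solves could
    run in parallel.
    """
    x = np.zeros_like(b, dtype=float)
    for it in range(1, max_iter + 1):
        x_new = np.empty_like(x)
        for idx in blocks:
            A_bb = A[np.ix_(idx, idx)]
            # Right-hand side minus the coupling to unknowns outside this block
            rhs = b[idx] - A[idx, :] @ x + A_bb @ x[idx]
            x_new[idx] = np.linalg.solve(A_bb, rhs)
        if np.linalg.norm(x_new - x) <= tol * max(1.0, np.linalg.norm(x_new)):
            return x_new, it
        x = x_new
    return x, max_iter

# Model problem (assumed): tridiagonal (-1, 2, -1) stencil plus an absorption term
n = 12
A = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
blocks = [np.arange(0, 6), np.arange(6, 12)]  # two "subdomains"
x, iterations = block_jacobi(A, b, blocks)
```

The model matrix is strictly diagonally dominant, so the iteration converges here; for other matrices, convergence has to be checked case by case, which is the kind of condition the report derives for its own discretizations.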
Accuracy of finite-difference harmonic frequencies in density functional theory.

Science.gov (United States)

Liu, Kuan-Yu; Liu, Jie; Herbert, John M

2017-07-15

Analytic Hessians are often viewed as essential for the calculation of accurate harmonic frequencies, but the implementation of analytic second derivatives is nontrivial and solution of the requisite coupled-perturbed equations engenders a sizable memory footprint for large systems, given that these equations are not required for energy and gradient calculations in density functional theory. Here, we benchmark the alternative approach to harmonic frequencies based on finite differences of analytic first derivatives, a procedure that is amenable to large-scale parallelization. Not only for absolute frequencies but also for isotopic and conformer-dependent frequency shifts in flexible molecules, we find that the finite-difference approach exhibits mean errors numbers. © 2017 Wiley Periodicals, Inc.
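The idea of building second derivatives from finite differences of analytic first derivatives can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the central-difference formula and the step size are choices made here, the quadratic toy energy stands in for a density functional theory calculation, and `hessian_from_gradients` and `toy_gradient` are hypothetical names. Mass weighting and diagonalization, which are needed to turn a Hessian into harmonic frequencies, are omitted.

```python
import numpy as np

def hessian_from_gradients(grad, x0, step=1e-3):
    """Approximate the Hessian at x0 by central differences of a gradient function.

    The 2*n displaced gradient evaluations are independent of one another,
    so they could be distributed over many workers.
    """
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    H = np.empty((n, n))
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = step
        H[:, i] = (grad(x0 + dx) - grad(x0 - dx)) / (2.0 * step)
    return 0.5 * (H + H.T)  # symmetrize to damp finite-difference noise

# Toy stand-in for an analytic gradient: E(x) = 0.5 * x^T K x, so grad E = K x
K = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

def toy_gradient(x):
    return K @ x

H = hessian_from_gradients(toy_gradient, np.zeros(3))
```

For this quadratic toy energy the recovered Hessian equals `K` up to rounding; for a real potential energy surface the result depends on the step size.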
OpenSeesPy: Python library for the OpenSees finite element framework

Science.gov (United States)

Zhu, Minjie; McKenna, Frank; Scott, Michael H.

2018-01-01

OpenSees, an open source finite element software framework, has been used broadly in the earthquake engineering community for simulating the seismic response of structural and geotechnical systems. The framework allows users to perform finite element analysis with a scripting language and for developers to create both serial and parallel finite element computer applications as interpreters. For the last 15 years, Tcl has been the primary scripting language to which the model building and analysis modules of OpenSees are linked. To provide users with different scripting language options, particularly Python, the OpenSees interpreter interface was refactored to provide multi-interpreter capabilities. This refactoring, resulting in the creation of OpenSeesPy as a Python module, is accomplished through an abstract interface for interpreter calls with concrete implementations for different scripting languages. Through this approach, users are able to develop applications that utilize the unique features of several scripting languages while taking advantage of advanced finite element analysis models and algorithms.
The Analysis of Quadrupole Magnetic Focusing Effect by Finite Element Method

International Nuclear Information System (INIS)

Utaja

2003-01-01

Quadrupole magnets introduce a focusing effect on a beam of charged particles passing parallel to the magnet faces. The focusing effect is needed to control the particle beam so that it meets the stated requirements. This paper describes the analysis of the focusing effect of quadrupole magnets by the finite element method. The finite element method is used here to solve for the potential distribution of the magnetic field. Once the magnetic potential at every node is known, a charged particle trajectory can be traced. This trajectory indicates the focusing effect of the quadrupole magnets. (author)
De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers

Energy Technology Data Exchange (ETDEWEB)

Nakano, A; Kalia, R K; Nomura, K; Sharma, A; Vashishta, P; Shimojo, F; van Duin, A; Goddard, III, W A; Biswas, R; Srivastava, D; Yang, L H

2006-09-04

We present a de novo hierarchical simulation framework for first-principles based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macroscopic material properties, into which highly accurate quantum mechanical (QM) simulations are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution. The framework includes an embedded divide-and-conquer (EDC) algorithmic framework for the design of linear-scaling simulation algorithms with minimal bandwidth complexity and tight error control. The EDC framework also enables adaptive hierarchical simulation with automated model transitioning assisted by graph-based event tracking. A tunable hierarchical cellular decomposition parallelization framework then maps the O(N) EDC algorithms onto Petaflops computers, while achieving performance tunability through a hierarchy of parameterized cell data/computation structures, as well as its implementation using hybrid Grid remote procedure call + message passing + threads programming. High-end computing platforms such as IBM BlueGene/L, SGI Altix 3000 and the NSF TeraGrid provide excellent test grounds for the framework. On these platforms, we have achieved unprecedented scales of quantum-mechanically accurate and well validated, chemically reactive atomistic simulations--1.06 billion-atom fast reactive force-field MD and 11.8 million-atom (1.04 trillion grid points) quantum-mechanical MD in the framework of the EDC density functional theory on adaptive multigrids--in addition to 134 billion-atom non-reactive space-time multiresolution MD, with the parallel efficiency as high as 0.998 on 65,536 dual-processor BlueGene/L nodes. We have also achieved an automated execution of hierarchical QM
Strong Bisimilarity and Regularity of Basic Parallel Processes is PSPACE-Hard

DEFF Research Database (Denmark)

Srba, Jirí

2002-01-01

We show that the problem of checking whether two processes definable in the syntax of Basic Parallel Processes (BPP) are strongly bisimilar is PSPACE-hard. We also demonstrate that there is a polynomial time reduction from the strong bisimilarity checking problem of regular BPP to the strong...... regularity (finiteness) checking of BPP. This implies that strong regularity of BPP is also PSPACE-hard....
Heat transfer model and finite element formulation for simulation of selective laser melting

Science.gov (United States)

Roy, Souvik; Juha, Mario; Shephard, Mark S.; Maniatty, Antoinette M.

2017-10-01

A novel approach and finite element formulation for modeling the melting, consolidation, and re-solidification process that occurs in selective laser melting additive manufacturing is presented. Two state variables are introduced to track the phase (melt/solid) and the degree of consolidation (powder/fully dense). The effect of the consolidation on the absorption of the laser energy into the material as it transforms from a porous powder to a dense melt is considered. A Lagrangian finite element formulation, which solves the governing equations on the unconsolidated reference configuration is derived, which naturally considers the effect of the changing geometry as the powder melts without needing to update the simulation domain. The finite element model is implemented into a general-purpose parallel finite element solver. Results are presented comparing to experimental results in the literature for a single laser track with good agreement. Predictions for a spiral laser pattern are also shown.
Multigrid solution of incompressible turbulent flows by using two-equation turbulence models

Energy Technology Data Exchange (ETDEWEB)

Zheng, X.; Liu, C. [Front Range Scientific Computations, Inc., Denver, CO (United States); Sung, C.H. [David Taylor Model Basin, Bethesda, MD (United States)

1996-12-31

Most practical flows are turbulent. In engineering applications, simulation of realistic flows is usually done through solution of the Reynolds-averaged Navier-Stokes equations and turbulence model equations. It has been widely accepted that turbulence modeling plays a very important role in the numerical simulation of practical flow problems, particularly when accuracy is of great concern. Among the most used turbulence models today, two-equation models appear to be favored because they are more general than algebraic models and affordable with currently available computer resources. However, investigators using two-equation models seem to have been more concerned with the solution of the N-S equations. Less attention is paid to the solution method for the turbulence model equations. In most cases, the turbulence model equations are loosely coupled with the N-S equations, and multigrid acceleration is applied only to the solution of the N-S equations, perhaps because the turbulence model equations are source-term dominant and very stiff in the sublayer region.
Finite element modelling

International Nuclear Information System (INIS)

Tonks, M.R.; Williamson, R.; Masson, R.

2015-01-01

The Finite Element Method (FEM) is a numerical technique for finding approximate solutions to boundary value problems (BVPs). While FEM is commonly used to solve solid mechanics equations, it can be applied to a large range of BVPs from many different fields. FEM has been used for reactor fuels modelling for many years. It is most often used for fuel performance modelling at the pellet and pin scale; however, it has also been used to investigate properties of the fuel material, such as thermal conductivity and fission gas release. Recently, the United States Department of Energy's Nuclear Energy Advanced Modelling and Simulation Program has begun using FEM as the basis of the MOOSE-BISON-MARMOT Project that is developing a multi-dimensional, multi-physics fuel performance capability that is massively parallel and will use multi-scale material models to provide a truly predictive modelling capability. (authors)
Eddy current testing probe optimization using a parallel genetic algorithm

Directory of Open Access Journals (Sweden)

Dolapchiev Ivaylo

2008-01-01

This paper uses the developed parallel version of Michalewicz's Genocop III Genetic Algorithm (GA) searching technique to optimize the coil geometry of an eddy current non-destructive testing probe (ECTP). The electromagnetic field is computed using the FEMM 2D finite element code. The aim of this optimization was to determine coil dimensions and positions that improve ECTP sensitivity to physical properties of the tested devices.
Switching current imbalance mitigation in power modules with parallel connected SiC MOSFETs

DEFF Research Database (Denmark)

Beczkowski, Szymon; Jørgensen, Asger Bjørn; Li, Helong

2017-01-01

Multichip power modules use parallel connected chips to achieve high current rating. Due to a finite flexibility in a DBC layout, some electrical asymmetries will occur in the module. Parallel connected transistors will exhibit uneven static and dynamic current sharing due to these asymmetries....... Especially important are the couplings between gate and power loops of individual transistors. Fast changing source currents cause gate voltage imbalances yielding uneven switching currents. Equalizing gate voltages seen by paralleled transistors, done by adjusting source bond wires, is proposed...... in this paper. Analysis is performed on an industry standard DBC layout using numerically extracted module parasitics. The method of tuning individual source inductances shows clear improvement in dynamic current balancing and prevents excessive current overshoot during transistors turn-on....
Parallel Dynamic Analysis of a Large-Scale Water Conveyance Tunnel under Seismic Excitation Using ALE Finite-Element Method

Directory of Open Access Journals (Sweden)

Xiaoqing Wang

2016-01-01

Parallel analyses of the dynamic responses of a large-scale water conveyance tunnel under seismic excitation are presented in this paper. A full three-dimensional numerical model considering the water-tunnel-soil coupling is established and adopted to investigate the tunnel’s dynamic responses. The movement and sloshing of the internal water are simulated using the multi-material Arbitrary Lagrangian Eulerian (ALE) method. Nonlinear fluid–structure interaction (FSI) between tunnel and inner water is treated by using the penalty method. Nonlinear soil-structure interaction (SSI) between soil and tunnel is dealt with by using the surface to surface contact algorithm. To overcome computing power limitations and to deal with such a large-scale calculation, a parallel algorithm based on the modified recursive coordinate bisection (MRCB) considering the balance of SSI and FSI loads is proposed and used. The whole simulation is accomplished on Dawning 5000 A using the proposed MRCB based parallel algorithm optimized to run on supercomputers. The simulation model and the proposed approaches are validated by comparison with the added mass method. Dynamic responses of the tunnel are analyzed and the parallelism is discussed. Besides, factors affecting the dynamic responses are investigated. Better speedup and parallel efficiency show the scalability of the parallel method and the analysis results can be used to aid in the design of water conveyance tunnels.
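For orientation, plain recursive coordinate bisection, the technique that the MRCB algorithm in this abstract modifies, can be sketched as below. This is the unmodified textbook form, not the paper's MRCB: the per-point `weights` are a hypothetical stand-in for per-element cost (the paper balances SSI and FSI loads in a way the abstract does not detail), the function name is invented for illustration, and the sketch assumes a power-of-two number of parts and many more points than parts.

```python
import numpy as np

def recursive_coordinate_bisection(points, weights, n_parts):
    """Assign each point a part id in [0, n_parts) by recursive coordinate bisection.

    Sketch only: n_parts must be a power of two, and every part is assumed
    to end up with at least one point.
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    labels = np.zeros(len(points), dtype=int)

    def split(indices, first_part, parts):
        if parts == 1:
            labels[indices] = first_part
            return
        coords = points[indices]
        # Cut across the coordinate direction with the longest extent
        axis = np.argmax(coords.max(axis=0) - coords.min(axis=0))
        order = indices[np.argsort(coords[:, axis], kind="stable")]
        cumulative = np.cumsum(weights[order])
        # Cut where the accumulated weight first reaches half of the total
        cut = int(np.searchsorted(cumulative, 0.5 * cumulative[-1])) + 1
        cut = min(max(cut, 1), len(order) - 1)
        split(order[:cut], first_part, parts // 2)
        split(order[cut:], first_part + parts // 2, parts // 2)

    split(np.arange(len(points)), 0, n_parts)
    return labels

# Usage with made-up data: 64 random points, unit weights, 4 parts
rng = np.random.default_rng(0)
pts = rng.random((64, 3))
labels = recursive_coordinate_bisection(pts, np.ones(64), 4)
```

With unit weights each cut falls at the median, so the four parts receive 16 points each; non-uniform weights shift the cuts so that the summed weight, rather than the point count, is balanced.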
The parallel dynamics of drift wave turbulence in the WEGA stellarator

Energy Technology Data Exchange (ETDEWEB)

Marsen, S; Endler, M; Otte, M; Wagner, F, E-mail: stefan.marsen@ipp.mpg.d [Max-Planck-Institut fuer Plasmaphysik, EURATOM Association, Wendelsteinstrasse 1, 17491 Greifswald (Germany)

2009-08-15

The three-dimensional structure of turbulence in the edge (inside the last closed flux surface) of the WEGA stellarator is studied focusing on the parallel dynamics. WEGA as a small stellarator with moderate plasma parameters offers the opportunity to study turbulence with Langmuir probes providing high spatial and temporal resolution. Multiple probes with radial, poloidal and toroidal resolution are used to measure density fluctuations. Correlation analysis is used to reconstruct a 3D picture of turbulent structures. We find that these structures originate predominantly on the low field side and have a three-dimensional character with a finite averaged parallel wavenumber. The ratio between the parallel and perpendicular wavenumber component is in the order of $10^{-2}$. The parallel dynamics are compared at magnetic inductions of 57 and 500 mT. At 500 mT, the parallel wavelength is in the order of the field line connection length $2\pi R\bar{\iota}$. A frequency resolved measure of $k_{\parallel}/k_{\theta}$ shows a constant ratio in this case. At 57 mT the observed $k_{\parallel}$ is much smaller than at 500 mT. However, the observed small average value is due to an averaging over positive and negative components pointing parallel and antiparallel to the magnetic field vector.
Untitled

Indian Academy of Sciences (India)

Various techniques have been proposed to accelerate the convergence rates. Raithby. (1976) and Patankar (1981) ... methods (Oran & Boris 1987) and multigrid techniques (Peric et al 1988) can also be used to enhance convergence rates. ... PEA involves manipulations of finite difference equations of the velocities of all ...
A PARALLEL NONOVERLAPPING DOMAIN DECOMPOSITION METHOD FOR STOKES PROBLEMS

Institute of Scientific and Technical Information of China (English)

Mei-qun Jiang; Pei-liang Dai

2006-01-01

A nonoverlapping domain decomposition iterative procedure is developed and analyzed for generalized Stokes problems and their finite element approximate problems in $\mathbb{R}^N$ ($N=2,3$). The method is based on a mixed-type consistency condition with two parameters as a transmission condition together with a derivative-free transmission data updating technique on the artificial interfaces. The method can be applied to a general multi-subdomain decomposition and implemented naturally on parallel machines with simple local communications.

Matrix equation decomposition and parallel solution of systems resulting from unstructured finite element problems in electromagnetics

Energy Technology Data Exchange (ETDEWEB)

Cwik, T. [California Institute of Technology, Pasadena, CA (United States); Katz, D.S. [Cray Research, El Segundo, CA (United States)

1996-12-31

Finite element modeling has proven useful for accurately simulating scattered or radiated electromagnetic fields from complex three-dimensional objects whose geometry varies on the scale of a fraction of an electrical wavelength. An unstructured finite element model of realistic objects leads to a large, sparse, system of equations that needs to be solved efficiently with regard to machine memory and execution time. Both factorization and iterative solvers can be used to produce solutions to these systems of equations. Factorization leads to high memory requirements that limit the electrical problem size of three-dimensional objects that can be modeled. An iterative solver can be used to efficiently solve the system without excessive memory use and in a minimal amount of time if the convergence rate is controlled.
A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

Science.gov (United States)

Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

2014-01-01

It is very time consuming to solve fractional differential equations. The computational complexity of the two-dimensional time fractional differential equation (2D-TFDE) with an iterative implicit finite difference method is $O(M_x M_y N^2)$. In this paper, we present a parallel algorithm for the 2D-TFDE and give an in-depth discussion of this algorithm. A task distribution model and data layout with virtual boundary are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We do think that parallel computing technology will become a very basic method for computationally intensive fractional applications in the near future.
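The relative efficiency figure quoted in this abstract can be unpacked with a short calculation. The sketch below assumes the usual definition of efficiency relative to a baseline run (speedup over the baseline divided by the ratio of process counts); the abstract does not state which definition the authors used, and the function names are hypothetical.

```python
def relative_efficiency(t_base, p_base, t, p):
    """Efficiency of a run on p processes relative to a baseline run on p_base processes.

    t_base and t are the wall-clock times of the two runs.
    """
    speedup = t_base / t
    return speedup / (p / p_base)

def implied_speedup(efficiency, p_base, p):
    """Speedup over the baseline run implied by a given relative efficiency."""
    return efficiency * (p / p_base)

# From the abstract: up to 88.24% efficiency for 81 processes relative to 9 processes.
# Under the assumed definition this corresponds to a speedup of about 7.94
# out of an ideal factor of 9.
speedup_81_vs_9 = implied_speedup(0.8824, 9, 81)
```

The two functions are inverses of each other for fixed process counts, so measured timings can be checked against a reported efficiency in either direction.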
Lattice gauge theory using parallel processors

International Nuclear Information System (INIS)

Lee, T.D.; Chou, K.C.; Zichichi, A.

1987-01-01

The book's contents include: Lattice Gauge Theory Lectures: Introduction and Current Fermion Simulations; Monte Carlo Algorithms for Lattice Gauge Theory; Specialized Computers for Lattice Gauge Theory; Lattice Gauge Theory at Finite Temperature: A Monte Carlo Study; Computational Method - An Elementary Introduction to the Langevin Equation, Present Status of Numerical Quantum Chromodynamics; Random Lattice Field Theory; The GF11 Processor and Compiler; and The APE Computer and First Physics Results; Columbia Supercomputer Project: Parallel Supercomputer for Lattice QCD; Statistical and Systematic Errors in Numerical Simulations; Monte Carlo Simulation for LGT and Programming Techniques on the Columbia Supercomputer; Food for Thought: Five Lectures on Lattice Gauge Theory
Similarity solutions of time-dependent relativistic radiation-hydrodynamical plane-parallel flows

Science.gov (United States)

Fukue, Jun

2018-04-01

Similarity solutions are examined for the frequency-integrated relativistic radiation-hydrodynamical flows, which are described by the comoving quantities. The flows are vertical plane-parallel time-dependent ones with a gray opacity coefficient. For adequate boundary conditions, the flows are accelerated in a somewhat homologous manner, but terminate at some singular locus, which originates from the pathological behavior in relativistic radiation moment equations truncated in finite orders.
Effective arithmetic in finite fields based on Chudnovsky's multiplication algorithm

OpenAIRE

Atighehchi, Kévin; Ballet, Stéphane; Bonnecaze, Alexis; Rolland, Robert

2016-01-01

Thanks to a new construction of the Chudnovsky and Chudnovsky multiplication algorithm, we design efficient algorithms for both the exponentiation and the multiplication in finite fields. They are tailored to hardware implementation and they allow computations to be parallelized, while maintaining a low number of bilinear multiplications.
Parallel iterative solution of the Hermite Collocation equations on GPUs II

International Nuclear Information System (INIS)

Vilanakis, N; Mathioudakis, E

2014-01-01

Hermite Collocation is a high order finite element method for Boundary Value Problems (BVPs) modelling applications in several fields of science and engineering. Application of this integration-free numerical solver to linear BVPs results in a large and sparse general system of algebraic equations, suggesting the use of an efficient iterative solver, especially for realistic simulations. In part I of this work an efficient parallel algorithm of the Schur complement method coupled with the Bi-Conjugate Gradient Stabilized (BiCGSTAB) iterative solver was designed for multicore computing architectures with a Graphics Processing Unit (GPU). In the present work the proposed algorithm has been extended to high performance computing environments consisting of multiprocessor machines with multiple GPUs. Since this is a distributed GPU and shared CPU memory parallel architecture, a hybrid memory treatment is needed for the development of the parallel algorithm. The algorithm was realized on an HP SL390 multiprocessor machine with Tesla M2070 GPUs using the OpenMP and OpenACC standards. Execution time measurements reveal the efficiency of the parallel implementation.
The FORCE: A portable parallel programming language supporting computational structural mechanics

Science.gov (United States)

Jordan, Harry F.; Benten, Muhammad S.; Brehm, Juergen; Ramanan, Aruna

1989-01-01

This project supports the conversion of codes in Computational Structural Mechanics (CSM) to a parallel form which will efficiently exploit the computational power available from multiprocessors. The work is a part of a comprehensive, FORTRAN-based system to form a basis for a parallel version of the NICE/SPAR combination which will form the CSM Testbed. The software is macro-based and rests on the force methodology developed by the principal investigator in connection with an early scientific multiprocessor. Machine independence is an important characteristic of the system so that retargeting it to the Flex/32, or any other multiprocessor on which NICE/SPAR might be implemented, is well supported. The principal investigator has experience in producing parallel software for both full and sparse systems of linear equations using the force macros. Other researchers have used the Force in finite element programs. It has been possible to rapidly develop software which performs at maximum efficiency on a multiprocessor. The inherent machine independence of the system also means that the parallelization will not be limited to a specific multiprocessor.
SPINET: A Parallel Computing Approach to Spine Simulations

Directory of Open Access Journals (Sweden)

Peter G. Kropf

1996-01-01

Research in scientific programming enables us to realize more and more complex applications, and on the other hand, application-driven demands on computing methods and power are continuously growing. Therefore, interdisciplinary approaches become more widely used. The interdisciplinary SPINET project presented in this article applies modern scientific computing tools to biomechanical simulations: parallel computing and symbolic and modern functional programming. The target application is the human spine. Simulations of the spine help us to investigate and better understand the mechanisms of back pain and spinal injury. Two approaches have been used: the first uses the finite element method for high-performance simulations of static biomechanical models, and the second generates a simulation development tool for experimenting with different dynamic models. A finite element program for static analysis has been parallelized for the MUSIC machine. To solve the sparse system of linear equations, a conjugate gradient solver (iterative method) and a frontal solver (direct method) have been implemented. The preprocessor required for the frontal solver is written in the modern functional programming language SML, the solver itself in C, thus exploiting the characteristic advantages of both functional and imperative programming. The speedup analysis of both solvers shows very satisfactory results for this irregular problem. A mixed symbolic-numeric environment for rigid body system simulations is presented. It automatically generates C code from a problem specification expressed by the Lagrange formalism using Maple.
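The conjugate gradient solver mentioned in this abstract follows a well-known iteration, sketched below in its standard unpreconditioned, serial, dense-matrix form. This is not the SPINET code, which was parallelized for the MUSIC machine; the small test matrix is an assumption chosen only to be symmetric positive definite, and the function name and parameters are hypothetical.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A by conjugate gradients."""
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs_old = r @ r
    for _ in range(max_iter or 10 * len(b)):
        if np.sqrt(rs_old) <= tol:
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)       # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p   # next direction, A-conjugate to the previous ones
        rs_old = rs_new
    return x

# Assumed test system: SPD tridiagonal matrix, as arises from a 1-D stiffness-like stencil
n = 20
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.arange(1.0, n + 1.0)
x = conjugate_gradient(A, b)
```

Each iteration needs one matrix-vector product and a few dot products, which is why the method suits sparse finite element systems; in a parallel setting the dot products become global reductions.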
In-beam test of the Boron-10 Multi-Grid neutron detector at the IN6 time-of-flight spectrometer at the ILL

Energy Technology Data Exchange (ETDEWEB)

Birch, J; Hultman, L; Höglund, C [Linköping University, Thin Film Physics Division, IFM, SE-581 83 Linköping (Sweden); Buffet, J-C; Clergeau, J-F; Correa, J; Van Esch, P; Ferraton, M; Guerard, B; Halbwachs, J; Khaplanov, A; Koza, M; Piscitelli, F; Zbiri, M [Institute Laue Langevin, Rue Jules Horowitz, FR-38000 Grenoble (France); Hall-Wilton, R [European Spallation Source ESS AB, P.O Box 176, SE-221 00 Lund (Sweden)

2014-07-24

A neutron detector concept based on solid layers of boron carbide enriched in ¹⁰B has been in development for the last few years as an alternative for ³He by collaboration between the ILL, ESS and Linköping University. This Multi-Grid detector uses layers of aluminum substrates coated with ¹⁰B₄C on both sides that are traversed by the incoming neutrons. Detection is achieved using a gas counter readout principle. By segmenting the substrate and using multiple anode wires, the detector is made inherently position sensitive. This development is aimed primarily at neutron scattering instruments with large detector areas, such as time-of-flight chopper spectrometers. The most recent prototype has been built to be interchangeable with the ³He detectors of IN6 at ILL. The ¹⁰B detector has an active area of 32 x 48 cm². It was installed at the IN6 instrument and operated for several weeks, collecting data in parallel with the regularly scheduled experiments, thus providing the first side-by-side comparison with the conventional ³He detectors. Results include an efficiency comparison, assessment of the in-detector scattering contribution, sensitivity to gamma-rays and the signal-to-noise ratio in time-of-flight spectra. The good expected performance has been confirmed with the exception of an unexpected background count rate. This has been identified as natural alpha activity in aluminum. New convertor substrates are under study to eliminate this source of background.
The DANTE Boltzmann transport solver: An unstructured mesh, 3-D, spherical harmonics algorithm compatible with parallel computer architectures

International Nuclear Information System (INIS)

McGhee, J.M.; Roberts, R.M.; Morel, J.E.

1997-01-01

A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner for scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated.
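As an illustration of the preconditioned conjugate gradient idea mentioned in the abstract above, the following is a minimal sketch in Python with NumPy. It is not the DANTE solver: the abstract's diffusion based preconditioner is replaced here by a simple diagonal (Jacobi) preconditioner, and the function name `pcg` and the tridiagonal test matrix are assumptions made only for this example.

```python
import numpy as np

def pcg(A, b, m_inv_diag, tol=1e-10, max_iter=500):
    """Preconditioned CG for a symmetric positive definite matrix A.

    m_inv_diag holds the inverse of a diagonal (Jacobi) preconditioner,
    a stand-in assumption for a problem-specific preconditioner.
    """
    x = np.zeros_like(b)
    r = b - A @ x                # initial residual
    z = m_inv_diag * r           # preconditioned residual
    p = z.copy()                 # first search direction
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)    # step length along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k + 1
        z = m_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p   # new A-conjugate direction
        rz = rz_new
    return x, max_iter

# Hypothetical test problem: 1-D diffusion-like tridiagonal SPD matrix.
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iters = pcg(A, b, 1.0 / np.diag(A))
```

In a parallel setting the matrix-vector product and the two dot products per iteration are the operations that would be distributed; the sketch keeps them serial for clarity.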
Finite element analysis of turbulent flow in fast reactor fuel subassembly elementary flow cell

International Nuclear Information System (INIS)

Muehlbauer, P.

1987-03-01

A method is described for calculating fully developed longitudinal steady-state turbulent flow of an incompressible fluid through an infinite bundle of parallel smooth rods, based on the finite element method and a one-equation turbulence model. Theoretical calculation results are compared with experimental results. (author). 5 figs., 3 refs
Sequential and parallel image restoration: neural network implementations.

Science.gov (United States)

Figueiredo, M T; Leitao, J N

1994-01-01

Sequential and parallel image restoration algorithms and their implementations on neural networks are proposed. For images degraded by linear blur and contaminated by additive white Gaussian noise, maximum a posteriori (MAP) estimation and regularization theory lead to the same high dimension convex optimization problem. The commonly adopted strategy (in using neural networks for image restoration) is to map the objective function of the optimization problem into the energy of a predefined network, taking advantage of its energy minimization properties. Departing from this approach, we propose neural implementations of iterative minimization algorithms which are first proved to converge. The developed schemes are based on modified Hopfield (1985) networks of graded elements, with both sequential and parallel updating schedules. An algorithm supported on a fully standard Hopfield network (binary elements and zero autoconnections) is also considered. Robustness with respect to finite numerical precision is studied, and examples with real images are presented.
Three-dimensional finite element analysis of different implant configurations for a mandibular fixed prosthesis.

Science.gov (United States)

Fazi, Giovanni; Tellini, Simone; Vangi, Dario; Branchi, Roberto

2011-01-01

The distribution of stresses in bone, implants, and prosthesis was analyzed via three-dimensional finite element modeling in different implant configurations for a fixed implant-supported prosthesis in an edentulous mandible. A finite element model was created with data obtained from computed tomographic scans of a human mandible. Anisotropic characteristics for cortical and cancellous bone were incorporated into the model. Six different configurations of intraforaminal implants were tested, with the number of implants varying from three to five and the distal implants inserted either parallel to the other implants or tilted distally by 17 or 34 degrees. A prosthetic structure connecting the implants was designed, with 20-mm posterior cantilevers for the parallel implant configurations, and a load of 200 N was applied to the distal portion of the cantilevers. Stresses were measured at the level of the implant, the prosthetic structure, and the bone. Bone-level stresses were analyzed at the implant-bone interface, at the external cortical bone surface, distal to the terminal implant, and in the cancellous bone along the implant body. A three-parallel-implant configuration resulted in higher stress in the implant and bone than configurations with four or five parallel implants. Configurations with the distal implants tilted resulted in a more favorable stress distribution at all levels. In parallel-implant configurations for fixed implant-supported mandibular prostheses, four and five implants resulted in similar stress distribution in the bone, framework, and implants. A distribution of four implants with the distal implants tilted 34 degrees (ie, the "All-on-Four" configuration) resulted in a favorable reduction of stresses in the bone, framework, and implants.
Behaviour of parallel girders stabilised with U-frames

DEFF Research Database (Denmark)

Virdi, Kuldeep; Azzi, Walid

2010-01-01

Lateral torsional buckling is a key factor in the design of steel girders. Stability can be enhanced by cross-bracing, reducing the effective length and thus increasing the ultimate capacity. U-frames are an option often used to brace the girders when designing through type of bridges and where...... overhead bracing is not practical. This paper investigates the effect of the U-frame spacing on the stability of the parallel girders. Eigenvalue buckling analysis was undertaken with four different spacings of the U-frames. Results were extracted from finite element analysis, interpreted and conclusions...
High performance shallow water kernels for parallel overland flow simulations based on FullSWOF2D

KAUST Repository

Wittmann, Roland; Bungartz, Hans-Joachim; Neumann, Philipp

2017-01-01

-by-step transformation of the second order finite volume scheme in FullSWOF2D towards MPI parallelization. Second, the computational kernels are optimized by the use of templates and a portable vectorization approach. We discuss the load imbalance of the flux computation
A massively-parallel electronic-structure calculations based on real-space density functional theory

International Nuclear Information System (INIS)

Iwata, Jun-Ichi; Takahashi, Daisuke; Oshiyama, Atsushi; Boku, Taisuke; Shiraishi, Kenji; Okada, Susumu; Yabana, Kazuhiro

2010-01-01

Based on the real-space finite-difference method, we have developed a first-principles density functional program that efficiently performs large-scale calculations on massively-parallel computers. In addition to efficient parallel implementation, we also implemented several computational improvements, substantially reducing the computational costs of O(N^3) operations such as the Gram-Schmidt procedure and subspace diagonalization. Using the program on a massively-parallel computer cluster with a theoretical peak performance of several TFLOPS, we perform electronic-structure calculations for a system consisting of over 10,000 Si atoms, and obtain a self-consistent electronic-structure in a few hundred hours. We analyze in detail the costs of the program in terms of computation and of inter-node communications to clarify the efficiency, the applicability, and the possibility for further improvements.
Parallel simulation of tsunami inundation on a large-scale supercomputer

Science.gov (United States)

Oishi, Y.; Imamura, F.; Sugawara, D.

2013-12-01

An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation; the computational power of recent massively parallel supercomputers can help enable faster-than-real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on the TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the
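The load-balancing step described in the abstract above (CPUs assigned to nested layers in proportion to grid-point counts, then a 1-D decomposition within each layer) might be sketched as follows. The function names, the rounding rule, and the layer sizes are assumptions for illustration; the abstract does not say how fractional CPU counts are resolved.

```python
def allocate_cpus(grid_points, n_cpus):
    """Assign CPUs to nested layers in proportion to grid-point counts.

    Assumed rounding rule: every layer gets at least one CPU, then the
    totals are adjusted toward the ideal (fractional) share.
    """
    n_layers = len(grid_points)
    if n_cpus < n_layers:
        raise ValueError("need at least one CPU per layer")
    total = sum(grid_points)
    ideal = [n_cpus * g / total for g in grid_points]
    alloc = [max(1, int(share)) for share in ideal]
    # Too many CPUs handed out: take from the most over-served layer.
    while sum(alloc) > n_cpus:
        i = max((i for i in range(n_layers) if alloc[i] > 1),
                key=lambda i: alloc[i] - ideal[i])
        alloc[i] -= 1
    # CPUs left over: give to the most under-served layer.
    while sum(alloc) < n_cpus:
        i = max(range(n_layers), key=lambda i: ideal[i] - alloc[i])
        alloc[i] += 1
    return alloc

def decompose_1d(n_rows, n_parts):
    """Split n_rows grid rows into n_parts contiguous strips of near-equal size."""
    base, extra = divmod(n_rows, n_parts)
    bounds, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        bounds.append((start, start + size))   # half-open row range
        start += size
    return bounds

# Hypothetical three-layer nesting with 10 CPUs.
layers = [100_000, 300_000, 600_000]
cpus_per_layer = allocate_cpus(layers, 10)     # [1, 3, 6]
strips = decompose_1d(10, 3)                   # [(0, 4), (4, 7), (7, 10)]
```

Each strip would then exchange boundary rows with its neighbours, which corresponds to the adjacent-neighbour communication the abstract begins to enumerate.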
On some Aitken-like acceleration of the Schwarz method

Science.gov (United States)

Garbey, M.; Tromeur-Dervout, D.

2002-12-01

In this paper we present a family of domain decomposition based on Aitken-like acceleration of the Schwarz method seen as an iterative procedure with a linear rate of convergence. We first present the so-called Aitken-Schwarz procedure for linear differential operators. The solver can be a direct solver when applied to the Helmholtz problem with five-point finite difference scheme on regular grids. We then introduce the Steffensen-Schwarz variant which is an iterative domain decomposition solver that can be applied to linear and nonlinear problems. We show that these solvers have reasonable numerical efficiency compared to classical fast solvers for the Poisson problem or multigrids for more general linear and nonlinear elliptic problems. However, the salient feature of our method is that our algorithm has high tolerance to slow network in the context of distributed parallel computing and is attractive, generally speaking, to use with computer architecture for which performance is limited by the memory bandwidth rather than the flop performance of the CPU. This is nowadays the case for most parallel computers using the RISC processor architecture. We will illustrate this highly desirable property of our algorithm with large-scale computing experiments.
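The reason Aitken acceleration can turn a linearly convergent iteration into a direct solver is visible already in the scalar case: if the error contracts by a fixed factor at every step, three successive iterates determine the limit exactly. The sketch below shows only this scalar Aitken delta-squared formula, not the authors' Aitken-Schwarz solver; the contraction factor and the fixed-point map are assumed values.

```python
def aitken(u0, u1, u2):
    """Aitken delta-squared extrapolation from three successive iterates."""
    denom = (u2 - u1) - (u1 - u0)
    if denom == 0.0:
        return u2                  # sequence already stationary
    return u2 - (u2 - u1) ** 2 / denom

# Hypothetical linearly convergent iteration u_{n+1} = rho * u_n + c,
# whose exact limit is c / (1 - rho) = 10.
rho, c = 0.9, 1.0
u = [0.0]
for _ in range(2):
    u.append(rho * u[-1] + c)      # two plain iterations: 0.0, 1.0, 1.9

limit = aitken(*u)                 # recovers the limit up to rounding
```

For the Schwarz method the same idea is applied to the interface values, where the scalar factor becomes a matrix acting on the interface error.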
Isovector pairing effect on nuclear moment of inertia at finite temperature in N = Z even–even systems

International Nuclear Information System (INIS)

Ami, I.; Fellah, M.; Allal, N.H.; Benhamouda, N.; Oudih, M.R.; Belabbas, M.

2011-01-01

Expressions of temperature-dependent perpendicular (ℑ⊥) and parallel (ℑ‖) moments of inertia, including isovector pairing effects, have been established using the cranking method. They are derived from recently proposed temperature-dependent gap equations. The obtained expressions generalize the conventional finite-temperature BCS (FTBCS) ones. Numerical calculations have been carried out within the framework of the schematic Richardson model as well as for nuclei such as N = Z, using the single-particle energies and eigenstates of a deformed Woods–Saxon mean-field. ℑ⊥ and ℑ‖ have been studied as a function of the temperature. It has been shown that the isovector pairing effect on both the perpendicular and parallel moments of inertia is non-negligible at finite temperature. These correlations must thus be taken into account in studies of warm rotating nuclei in the N ≃ Z region. (author)
Analysis of secondary particle behavior in multiaperture, multigrid accelerator for the ITER neutral beam injector.

Science.gov (United States)

Mizuno, T; Taniguchi, M; Kashiwagi, M; Umeda, N; Tobari, H; Watanabe, K; Dairaku, M; Sakamoto, K; Inoue, T

2010-02-01

Heat load on acceleration grids by secondary particles such as electrons, neutrals, and positive ions is a key issue for long pulse acceleration of negative ion beams. Complicated behaviors of the secondary particles in a multiaperture, multigrid (MAMuG) accelerator have been analyzed using an electrostatic accelerator Monte Carlo code. The analytical result is compared to the experimental one obtained in a long pulse operation of a MeV accelerator, of which the second acceleration grid (A2G) was removed for simplification of structure. The analytical results show a relatively high heat load on the third acceleration grid (A3G), since stripped electrons were deposited mainly on A3G. This heat load on the A3G can be suppressed by installing the A2G. Thus, the capability of the MAMuG accelerator is demonstrated for suppression of heat load due to secondary particles by the intermediate grids.

Simulating electron wave dynamics in graphene superlattices exploiting parallel processing advantages

Science.gov (United States)

Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel

2018-01-01

This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.
Direct and iterative algorithms for the parallel solution of the one-dimensional macroscopic Navier-Stokes equations

International Nuclear Information System (INIS)

Doster, J.M.; Sills, E.D.

1986-01-01

Current efforts are under way to develop and evaluate numerical algorithms for the parallel solution of the large sparse matrix equations associated with the finite difference representation of the macroscopic Navier-Stokes equations. Previous work has shown that these equations can be cast into smaller coupled matrix equations suitable for solution utilizing multiple computer processors operating in parallel. The individual processors themselves may exhibit parallelism through the use of vector pipelines. This work has concentrated on the one-dimensional drift flux form of the Navier-Stokes equations. Direct and iterative algorithms that may be suitable for implementation on parallel computer architectures are evaluated in terms of accuracy and overall execution speed. This work has application to engineering and training simulations, on-line process control systems, and engineering workstations where increased computational speeds are required.
Architecture and program structures for a special purpose finite element computer

Energy Technology Data Exchange (ETDEWEB)

Norrie, D.H.; Norrie, C.W.

1983-01-01

The development of very large scale integration (VLSI) has made special-purpose computers economically possible. With such a machine, the loss of flexibility compared with a general-purpose computer can be offset by the increased speed which can be obtained by tailoring the architecture to the particular problem or class of problem. The first kind of special-purpose machine has its architecture modelled on the physical structure of the problem and the second kind has its design tailored to the computational algorithm used. The parallel finite element machine (PARFEM) being designed at the University of Calgary for the solution of finite element problems is of the second kind. Its conceptual design is described and progress to date outlined. 14 references.
An object-oriented decomposition of the adaptive-hp finite element method

Energy Technology Data Exchange (ETDEWEB)

Wiley, J.C.

1994-12-13

Adaptive-hp methods are those which use a refinement control strategy driven by a local error estimate to locally modify the element size, h, and polynomial order, p. The result is an unstructured mesh in which each node may be associated with a different polynomial order and which generally requires complex data structures to implement. Object-oriented design strategies and languages which support them, e.g., C++, help control the complexity of these methods. Here an overview of the major classes and class structure of an adaptive-hp finite element code is described. The essential finite element structure is described in terms of four areas of computation each with its own dynamic characteristics. Implications of converting the code for a distributed-memory parallel environment are also discussed.
Data Parallel Line Relaxation (DPLR) Code User Manual: Acadia - Version 4.01.1

Science.gov (United States)

Wright, Michael J.; White, Todd; Mangini, Nancy

2009-01-01

Data-Parallel Line Relaxation (DPLR) code is a computational fluid dynamic (CFD) solver that was developed at NASA Ames Research Center to help mission support teams generate high-value predictive solutions for hypersonic flow field problems. The DPLR Code Package is an MPI-based, parallel, full three-dimensional Navier-Stokes CFD solver with generalized models for finite-rate reaction kinetics, thermal and chemical non-equilibrium, accurate high-temperature transport coefficients, and ionized flow physics incorporated into the code. DPLR also includes a large selection of generalized realistic surface boundary conditions and links to enable loose coupling with external thermal protection system (TPS) material response and shock layer radiation codes.
Ramses-GPU: Second order MUSCL-Handcock finite volume fluid solver

Science.gov (United States)

Kestener, Pierre

2017-10-01

RamsesGPU is a reimplementation of RAMSES (ascl:1011.007) which drops the adaptive mesh refinement (AMR) features to optimize 3D uniform grid algorithms for modern graphics processor units (GPU) to provide an efficient software package for astrophysics applications that do not need AMR features but do require a very large number of integration time steps. RamsesGPU provides a very efficient C++/CUDA/MPI software implementation of a second order MUSCL-Hancock finite volume fluid solver for compressible hydrodynamics as well as a magnetohydrodynamics solver based on the constrained transport technique. Other useful modules include static gravity, dissipative terms (viscosity, resistivity), and a forcing source term for turbulence studies, and special care was taken to enhance parallel input/output performance by using state-of-the-art libraries such as HDF5 and parallel-netcdf.
Multigrid techniques for nonlinear eigenvalue problems: Solutions of a nonlinear Schroedinger eigenvalue problem in 2D and 3D

Science.gov (United States)

Costiner, Sorin; Taasan, Shlomo

1994-01-01

This paper presents multigrid (MG) techniques for nonlinear eigenvalue problems (EP) and emphasizes an MG algorithm for a nonlinear Schrodinger EP. The algorithm overcomes the mentioned difficulties combining the following techniques: an MG projection coupled with backrotations for separation of solutions and treatment of difficulties related to clusters of close and equal eigenvalues; MG subspace continuation techniques for treatment of the nonlinearity; an MG simultaneous treatment of the eigenvectors at the same time with the nonlinearity and with the global constraints. The simultaneous MG techniques reduce the large number of self consistent iterations to only a few or one MG simultaneous iteration and keep the solutions in a right neighborhood where the algorithm converges fast.
An efficient data structure for three-dimensional vertex based finite volume method

Science.gov (United States)

Akkurt, Semih; Sahin, Mehmet

2017-11-01

A vertex based three-dimensional finite volume algorithm has been developed using an edge based data structure. The mesh data structure of the given algorithm is similar to ones that exist in the literature. However, the data structures are redesigned and simplified in order to fit the requirements of the vertex based finite volume method. In order to increase the cache efficiency, the data access patterns for the vertex based finite volume method are investigated and the data are packed/allocated in a way that they are close to each other in memory. The present data structure is not limited to tetrahedra; arbitrary polyhedra are also supported in the mesh without any additional effort. Furthermore, the present data structure also supports adaptive refinement and coarsening. For the implicit and parallel implementation of the FVM algorithm, the PETSc and MPI libraries are employed. The performance and accuracy of the present algorithm are tested on classical benchmark problems by comparing the CPU time with that of open source algorithms.
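To make the notion of an edge based data structure concrete, here is a small sketch, not the authors' implementation: edges are stored as a contiguous array of vertex pairs, and vertex residuals are accumulated in a single pass over that array. The mesh, the per-edge coefficients, and the function name `edge_residual` are hypothetical.

```python
import numpy as np

# Hypothetical tiny mesh: 4 vertices, 5 edges stored as (i, j) vertex pairs.
# Keeping the pairs in one contiguous array is what makes the loop cache friendly.
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 0], [0, 2]])
# Hypothetical per-edge coefficient (for example dual-face area over edge length).
coeff = np.array([1.0, 1.0, 1.0, 1.0, 0.5])

def edge_residual(u, edges, coeff):
    """Accumulate diffusive fluxes at the vertices with one pass over the edges."""
    i, j = edges[:, 0], edges[:, 1]
    flux = coeff * (u[j] - u[i])      # flux through the dual face of edge (i, j)
    res = np.zeros_like(u)
    np.add.at(res, i, flux)           # vertex i receives the flux
    np.add.at(res, j, -flux)          # vertex j gives up the same amount
    return res

u = np.array([1.0, 2.0, 4.0, 8.0])
res = edge_residual(u, edges, coeff)
```

Because each edge contributes equal and opposite amounts to its two vertices, the scheme is conservative by construction, and the cell shape (tetrahedron or general polyhedron) never enters the loop.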
Modeling and Control of the Redundant Parallel Adjustment Mechanism on a Deployable Antenna Panel

Directory of Open Access Journals (Sweden)

Lili Tian

2016-10-01

With the aim of developing multiple input and multiple output (MIMO) coupling systems with a redundant parallel adjustment mechanism on the deployable antenna panel, a structural control integrated design methodology is proposed in this paper. Firstly, the modal information from the finite element model of the structure of the antenna panel is extracted, and then the mathematical model is established with the Hamilton principle. Secondly, the discrete Linear Quadratic Regulator (LQR) controller is added to the model in order to control the actuators and adjust the shape of the panel. Finally, the engineering practicality of the modeling and control method based on finite element analysis simulation is verified.
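A discrete LQR controller of the kind mentioned above computes a state-feedback gain from a discrete-time Riccati equation. The sketch below iterates that equation to a fixed point with NumPy; the two-state model, the weights Q and R, and the function name `dlqr` are assumptions for illustration and are unrelated to the antenna panel model of the paper.

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    """Discrete-time LQR gain by fixed-point iteration of the Riccati equation.

    Returns the gain K (control law u = -K x) and the cost matrix P.
    """
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Hypothetical two-state model (position, velocity) sampled with step dt.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.5 * dt**2],
              [dt]])
Q = np.eye(2)              # assumed state weight
R = np.array([[1.0]])      # assumed actuator-effort weight

K, P = dlqr(A, B, Q, R)
closed_loop = A - B @ K
spectral_radius = max(abs(np.linalg.eigvals(closed_loop)))
```

A spectral radius below one indicates that the closed loop is stable; in a modal model each retained mode would contribute its own pair of states to A.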
Parallel solutions of the two-group neutron diffusion equations

International Nuclear Information System (INIS)

Zee, K.S.; Turinsky, P.J.

1987-01-01

Recent efforts to adapt various numerical solution algorithms to parallel computer architectures have addressed the possibility of substantially reducing the running time of few-group neutron diffusion calculations. The authors have developed an efficient iterative parallel algorithm and an associated computer code for the rapid solution of the finite difference method representation of the two-group neutron diffusion equations on the CRAY X/MP-48 supercomputer having multi-CPUs and vector pipelines. For realistic simulation of light water reactor cores, the code employs a macroscopic depletion model with trace capability for selected fission product transients and critical boron. In addition to this, moderator and fuel temperature feedback models are also incorporated into the code. The physics models used in the code were benchmarked against qualified codes and proved accurate. This work is an extension of previous work in that various feedback effects are accounted for in the system; the entire code is structured to accommodate extensive vectorization; and an additional parallelism by multitasking is achieved not only for the solution of the matrix equations associated with the inner iterations but also for the other segments of the code, e.g., outer iterations.
Automatic Thread-Level Parallelization in the Chombo AMR Library

Energy Technology Data Exchange (ETDEWEB)

Christen, Matthias; Keen, Noel; Ligocki, Terry; Oliker, Leonid; Shalf, John; Van Straalen, Brian; Williams, Samuel

2011-05-26

The increasing on-chip parallelism has some substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite difference type PDE solvers. In Chombo, core algorithms are specified in ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language forms an already used target language for an automatic migration of the large number of existing algorithms into a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.
Modeling of fatigue crack induced nonlinear ultrasonics using a highly parallelized explicit local interaction simulation approach

Science.gov (United States)

Shen, Yanfeng; Cesnik, Carlos E. S.

2016-04-01

This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables highly parallelized computing on powerful graphics cards. Both the explicit contact formulation and the parallel feature facilitate LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulations based on the penalty method are introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
The Development of a Finite Volume Method for Modeling Sound in Coastal Ocean Environment

Energy Technology Data Exchange (ETDEWEB)

Long, Wen; Yang, Zhaoqing; Copping, Andrea E.; Jung, Ki Won; Deng, Zhiqun

2015-10-28

With the rapid growth of marine renewable energy and offshore wind energy, there have been concerns that the noise generated from construction and operation of the devices may interfere with marine animals’ communication. In this research, an underwater sound model is developed to simulate sound propagation generated by marine-hydrokinetic energy (MHK) devices or offshore wind (OSW) energy platforms. Finite volume and finite difference methods are developed to solve the 3D Helmholtz equation of sound propagation in the coastal environment. For the finite volume method, the grid system consists of triangular grids in the horizontal plane and sigma-layers in the vertical dimension. A 3D sparse matrix solver with complex coefficients is formed for solving the resulting acoustic pressure field. The Complex Shifted Laplacian Preconditioner (CSLP) method is applied to efficiently solve the matrix system iteratively with MPI parallelization using a high performance cluster. The sound model is then coupled with the Finite Volume Community Ocean Model (FVCOM) for simulating sound propagation generated by human activities in a range-dependent setting, such as offshore wind energy platform constructions and tidal stream turbines. As a proof of concept, initial validation of the finite difference solver is presented for two coastal wedge problems. Validation of the finite volume method will be reported separately.
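The effect of a complex shifted Laplacian preconditioner can be seen on a small model problem. The sketch below is a 1-D finite difference toy, not the 3-D coastal model of the report: it builds a Helmholtz matrix A and a shifted matrix M that differs only by an imaginary shift in the wavenumber term, then inspects the eigenvalues of the preconditioned operator. The grid size, wavenumber, and shift `beta` are assumed values.

```python
import numpy as np

n = 60                      # hypothetical number of interior grid points
wavenumber = 12.0           # hypothetical wavenumber on the unit interval
beta = 0.5                  # assumed size of the imaginary shift

h = 1.0 / (n + 1)
# Second-difference matrix for -d^2/dx^2 with Dirichlet boundaries.
L = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

A = L - wavenumber**2 * np.eye(n)                        # Helmholtz operator
M = L - (1.0 - 1j * beta) * wavenumber**2 * np.eye(n)    # complex shifted Laplacian

# Eigenvalues of the preconditioned operator M^{-1} A.
lam = np.linalg.eigvals(np.linalg.solve(M, A))
distance_from_centre = np.abs(lam - 0.5)
```

For this model problem the eigenvalues of A are real and indefinite, while those of M^{-1} A all lie on the circle of radius 1/2 centred at 1/2 in the complex plane, so they are bounded regardless of grid size. That bounded spectrum is the property that makes Krylov iterations on the preconditioned system practical.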
Three-dimensional photoacoustic tomography based on graphics-processing-unit-accelerated finite element method.

Science.gov (United States)

Peng, Kuan; He, Ling; Zhu, Ziqiang; Tang, Jingtian; Xiao, Jiaying

2013-12-01

Compared with commonly used analytical reconstruction methods, the frequency-domain finite element method (FEM) based approach has proven to be an accurate and flexible algorithm for photoacoustic tomography. However, the FEM-based algorithm is computationally demanding, especially for three-dimensional cases. To enhance the algorithm's efficiency, in this work a parallel computational strategy is implemented in the framework of the FEM-based reconstruction algorithm using a graphic-processing-unit parallel frame named the "compute unified device architecture." A series of simulation experiments is carried out to test the accuracy and accelerating effect of the improved method. The results obtained indicate that the parallel calculation does not change the accuracy of the reconstruction algorithm, while its computational cost is significantly reduced by a factor of 38.9 with a GTX 580 graphics card using the improved method.
Parallelization of a three-dimensional whole core transport code DeCART

Energy Technology Data Exchange (ETDEWEB)

Jin Young, Cho; Han Gyu, Joo; Ha Yong, Kim; Moon-Hee, Chang [Korea Atomic Energy Research Institute, Yuseong-gu, Daejon (Korea, Republic of)]

2003-07-01

A parallelization of the DeCART (deterministic core analysis based on ray tracing) code is presented that reduces the burden of the tremendous computing time and memory required in three-dimensional whole core transport calculations. The parallelization employs the concept of MPI grouping as well as a mixed MPI/OpenMP scheme. Since most of the computing time and memory are used in the MOC (method of characteristics) and the multi-group CMFD (coarse mesh finite difference) calculations in DeCART, variables and subroutines related to these two modules are the primary targets for parallelization. Specifically, the ray tracing module was parallelized using a planar domain decomposition scheme and an angular domain decomposition scheme. The parallel performance of the DeCART code is evaluated by solving a rodded variation of the C5G7MOX three-dimensional benchmark problem and a simplified three-dimensional SMART PWR core problem. In the C5G7MOX problem with 24 CPUs, a maximum speedup of 21 is obtained on an IBM Regatta machine and 22 on a LINUX cluster in the MOC kernel, which indicates good parallel performance of the DeCART code. In the simplified SMART problem, the memory requirement of about 11 GBytes in the single processor case reduces to 940 MBytes with 24 processors, which means that the DeCART code can now solve large core problems with affordable LINUX clusters. (authors)
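The speedup figures quoted above can be turned into parallel efficiencies with a few lines of arithmetic. The following is only an illustrative sketch: the function names are hypothetical, and the conversion 1 GByte = 1000 MBytes is an assumption, since the abstract does not state which convention it uses.

```python
def parallel_efficiency(speedup, n_procs):
    """Parallel efficiency = speedup divided by the number of processors."""
    return speedup / n_procs


def memory_reduction_factor(serial_mbytes, parallel_mbytes):
    """Ratio of single-processor memory to per-process memory in the parallel run."""
    return serial_mbytes / parallel_mbytes


if __name__ == "__main__":
    # Figures taken from the abstract: 24 CPUs, speedups of 21 and 22.
    for machine, speedup in (("IBM Regatta", 21), ("LINUX cluster", 22)):
        eff = parallel_efficiency(speedup, 24)
        print(f"{machine}: efficiency = {eff:.3f}")
    # Assumption: 11 GBytes is treated as 11000 MBytes.
    print(f"memory reduction: {memory_reduction_factor(11000, 940):.1f}x")
```

Under these assumptions the MOC kernel runs at roughly 88 % to 92 % efficiency on 24 CPUs, and the per-process memory drops by a factor of about 12.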
Consistency between analytical and finite element predictions for safety of cylindrical pressure vessels at higher temperatures

International Nuclear Information System (INIS)

Iancu, Otto Theodor

2014-01-01

The prediction of the plastic collapse load of cylindrical pressure vessels is very often made by using expensive Finite Element computations. The calculation of the collapse load requires an elastic-plastic material model and the consideration of non-linear geometry effects. The plastic collapse load causes overall structural instability and cannot be determined directly from a Finite Element analysis. In the present paper the plastic collapse load for a cylindrical pressure vessel is determined by an analytical method based on a linear elastic perfectly plastic material model. When plasticity occurs the material is considered to be incompressible and the tensor of plastic strains to be parallel to the stress deviator tensor. In this case the finite stress-strain relationships of Hencky can be used for calculating the pressure for which plastic flow occurs. The analytical results are completely confirmed by Finite Element predictions. (orig.)
Multi-grid Beam and Warming scheme for the simulation of unsteady ...

African Journals Online (AJOL)

2010-03-08

Mar 8, 2010 ... cal methods, the implicit finite-difference method and finite element method ...... there is no explicit matrix G and there are merely a number of block matrices, the .... is usually used as an input function for flood routing analysis.
Experimental research and use of finite elements method on mechanical behaviors of honeycomb structures assembled with epoxy-based adhesives reinforced with nanoparticles

Energy Technology Data Exchange (ETDEWEB)

Akkus, Harun [Technical Sciences Vocational School, Amasya University, Amasya (Turkey); Duzcukoglu, Hayrettin; Sahin, Omer Sinan [Mechanical Engineering Department, Selcuk University, Selcuk (Turkey)

2017-01-15

This study utilized experimental and finite element methods to investigate the mechanical behavior of aluminum honeycomb structures under compression. Aluminum honeycomb composite structures were subjected to pressing experiments according to the standard ASTM C365. Resistive forces in response to compression and maximum compressive force values were measured. Structural damage was observed. In the honeycomb structure, the cell width decreased as the compressive force increased. Results obtained with finite element models generated using ANSYS Workbench 15 were validated. Experimental results paralleled the finite element modeling results. The ANSYS results were approximately 85 % reliable.
Domain decomposition multigrid for unstructured grids

Energy Technology Data Exchange (ETDEWEB)

Shapira, Yair

1997-01-01

A two-level preconditioning method for the solution of elliptic boundary value problems using finite element schemes on possibly unstructured meshes is introduced. It is based on a domain decomposition and a Galerkin scheme for the coarse level vertex unknowns. For both the implementation and the analysis, it is not required that the curves of discontinuity in the coefficients of the PDE match the interfaces between subdomains. Generalizations to nonmatching or overlapping grids are made.
Nonlinear and parallel algorithms for finite element discretizations of the incompressible Navier-Stokes equations

Science.gov (United States)

Arteaga, Santiago Egido

1998-12-01

The steady-state Navier-Stokes equations are of considerable interest because they are used to model numerous common physical phenomena. The applications encountered in practice often involve small viscosities and complicated domain geometries, and they result in challenging problems in spite of the vast attention that has been dedicated to them. In this thesis we examine methods for computing the numerical solution of the primitive variable formulation of the incompressible equations on distributed memory parallel computers. We use the Galerkin method to discretize the differential equations, although most results are stated so that they apply also to stabilized methods. We also reformulate some classical results in a single framework and discuss some issues frequently dismissed in the literature, such as the implementation of pressure space basis and non-homogeneous boundary values. We consider three nonlinear methods: Newton's method, Oseen's (or Picard) iteration, and sequences of Stokes problems. All these iterative nonlinear methods require solving a linear system at every step. Newton's method has quadratic convergence while that of the others is only linear; however, we obtain theoretical bounds showing that Oseen's iteration is more robust, and we confirm it experimentally. In addition, although Oseen's iteration usually requires more iterations than Newton's method, the linear systems it generates tend to be simpler and its overall costs (in CPU time) are lower. The Stokes problems result in linear systems which are easier to solve, but their convergence is much slower, so that the approach is competitive only for large viscosities. Inexact versions of these methods are studied, and we explain why the best timings are obtained using relatively modest error tolerances in solving the corresponding linear systems. We also present a new damping optimization strategy based on the quadratic nature of the Navier-Stokes equations, which improves the robustness of all the
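The contrast between quadratic (Newton) and linear (Picard) convergence described above can be seen on a toy problem. The sketch below is not the thesis's Navier-Stokes solver: it assumes a scalar model equation (a + u) u = b with a quadratic nonlinearity, where the Picard step freezes the coefficient (a + u) at the previous iterate, loosely mimicking the Oseen linearization. All names and parameter values are illustrative.

```python
def residual(u, a, b):
    """Residual of the model equation (a + u) * u = b."""
    return a * u + u * u - b


def picard(a, b, u0=0.0, tol=1e-12, max_iter=200):
    """Picard iteration: freeze the coefficient (a + u) at the old iterate."""
    u = u0
    for k in range(1, max_iter + 1):
        u = b / (a + u)
        if abs(residual(u, a, b)) < tol:
            return u, k
    return u, max_iter


def newton(a, b, u0=0.0, tol=1e-12, max_iter=200):
    """Newton iteration using the exact derivative a + 2u of the residual."""
    u = u0
    for k in range(1, max_iter + 1):
        u = u - residual(u, a, b) / (a + 2.0 * u)
        if abs(residual(u, a, b)) < tol:
            return u, k
    return u, max_iter


if __name__ == "__main__":
    u_p, it_p = picard(4.0, 1.0)
    u_n, it_n = newton(4.0, 1.0)
    print(f"Picard: u = {u_p:.12f} after {it_p} iterations")
    print(f"Newton: u = {u_n:.12f} after {it_n} iterations")
```

On this toy problem Newton needs fewer iterations than Picard, but each Picard step only requires a solve with the frozen coefficient, which is the trade-off the abstract describes for the full linear systems.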

Damage mapping in structural health monitoring using a multi-grid architecture

Energy Technology Data Exchange (ETDEWEB)

Mathews, V. John [Dept. of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT 84112 (United States)

2015-03-31

This paper presents a multi-grid architecture for tomography-based damage mapping of composite aerospace structures. The system employs an array of piezo-electric transducers bonded on the structure. Each transducer may be used as an actuator as well as a sensor. The structure is excited sequentially using the actuators and the guided waves arriving at the sensors in response to the excitations are recorded for further analysis. The sensor signals are compared to their baseline counterparts and a damage index is computed for each actuator-sensor pair. These damage indices are then used as inputs to the tomographic reconstruction system. Preliminary damage maps are reconstructed on multiple coordinate grids defined on the structure. These grids are shifted versions of each other where the shift is a fraction of the spatial sampling interval associated with each grid. These preliminary damage maps are then combined to provide a reconstruction that is more robust to measurement noise in the sensor signals and the ill-conditioned problem formulation for single-grid algorithms. Experimental results on a composite structure with complexity that is representative of aerospace structures included in the paper demonstrate that for sufficiently high sensor densities, the algorithm of this paper is capable of providing damage detection and characterization with accuracy comparable to traditional C-scan and A-scan-based ultrasound non-destructive inspection systems quickly and without human supervision.
Pomraning-Eddington approximation for time-dependent radiation transfer in finite slab media

International Nuclear Information System (INIS)

El-Wakil, S.A.; Degheidy, A.R.; Sallah, M.

2005-01-01

The time-dependent monoenergetic radiation transfer equation with linear anisotropic scattering is considered. The Pomraning-Eddington approximation is used to calculate the radiation intensity in finite plane-parallel media. Numerical results are presented for isotropic media. Shielding calculations are shown for reflectivity and transmissivity at different times. The medium is assumed to have specular-reflecting boundaries. Two different weight functions are introduced to force the boundary conditions to be fulfilled.
A two-dimensional linear elasticity problem for anisotropic materials, solved with a parallelization code

Directory of Open Access Journals (Sweden)

Mihai-Victor PRICOP

2010-09-01

The present paper introduces a numerical approach to the static linear elasticity equations for anisotropic materials. The domain and boundary conditions are simple, to allow an easy implementation of the finite difference scheme. SOR and gradient methods are used to solve the resulting linear system. The simplicity of the geometry is also useful for MPI parallelization of the code.
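As an illustration of the SOR solver mentioned above, here is a minimal sketch. It assumes a stand-in problem, the 2D Poisson equation -Δu = f with zero Dirichlet boundary values on a square grid and the 5-point finite difference stencil, rather than the paper's anisotropic elasticity equations. The relaxation factor, tolerance, and function names are all illustrative choices.

```python
import math


def sor_poisson(f, h, omega=1.5, tol=1e-8, max_iter=10_000):
    """Solve -Laplace(u) = f on an n x n grid with u = 0 on the boundary.

    f is a list of lists of right-hand-side values, h is the grid spacing.
    Returns the solution grid and the number of sweeps performed.
    """
    n = len(f)
    u = [[0.0] * n for _ in range(n)]
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # Gauss-Seidel value from the 5-point stencil.
                gauss_seidel = 0.25 * (
                    u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                    + h * h * f[i][j]
                )
                # Over-relax: blend old value and Gauss-Seidel value.
                new_value = (1.0 - omega) * u[i][j] + omega * gauss_seidel
                max_change = max(max_change, abs(new_value - u[i][j]))
                u[i][j] = new_value
        if max_change < tol:
            return u, iteration
    return u, max_iter


def demo(n=17):
    """Manufactured solution u = sin(pi x) sin(pi y) on the unit square."""
    h = 1.0 / (n - 1)
    exact = [[math.sin(math.pi * i * h) * math.sin(math.pi * j * h)
              for j in range(n)] for i in range(n)]
    f = [[2.0 * math.pi ** 2 * exact[i][j] for j in range(n)] for i in range(n)]
    u, iterations = sor_poisson(f, h)
    max_error = max(abs(u[i][j] - exact[i][j])
                    for i in range(n) for j in range(n))
    return iterations, max_error


if __name__ == "__main__":
    iterations, max_error = demo()
    print(f"SOR sweeps: {iterations}, max error vs exact solution: {max_error:.2e}")
```

The lexicographic sweep used here is inherently sequential; a red-black ordering of the grid points is a common way to expose parallelism in such a scheme, although the abstract does not say which ordering the authors use.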
An application of multigrid methods for a discrete elastic model for epitaxial systems

International Nuclear Information System (INIS)

Caflisch, R.E.; Lee, Y.-J.; Shu, S.; Xiao, Y.-X.; Xu, J.

2006-01-01

We apply an efficient and fast algorithm to simulate the atomistic strain model for epitaxial systems, recently introduced by Schindler et al. [Phys. Rev. B 67, 075316 (2003)]. The discrete effects in this lattice statics model are crucial for proper simulation of the influence of strain for thin film epitaxial growth, but the size of the atomistic systems of interest is in general quite large and hence the solution of the discrete elastic equations is a considerable numerical challenge. In this paper, we construct an algebraic multigrid method suitable for efficient solution of the large scale discrete strain model. Using this method, simulations are performed for several representative physical problems, including an infinite periodic step train, a layered nanocrystal, and a system of quantum dots. The results demonstrate the effectiveness and robustness of the method and show that the method attains optimal convergence properties, regardless of the problem size, the geometry and the physical parameters. The effects of substrate depth and of invariance due to traction-free boundary conditions are assessed. For a system of quantum dots, the simulated strain energy density supports the observations that trench formation near the dots provides strain relief
Risk assessment of salt contamination of groundwater under uncertain aquifer properties

KAUST Repository

Litvinenko, Alexander

2017-10-01

One of the central topics in hydrogeology and environmental science is the investigation of salinity-driven groundwater flow in heterogeneous porous media. Our goals are to model and to predict pollution of water resources. We simulate a density-driven groundwater flow with uncertain porosity and permeability. This strongly non-linear model describes the unstable transport of salt water, which forms finger-shaped patterns. The computation requires a very fine unstructured mesh and, therefore, high computational resources. We run the highly parallel multigrid solver, based on ug4, on the supercomputer Shaheen II. An MPI-based parallelization is done in the geometrical as well as in the stochastic spaces. Every scenario is computed on 32 cores and requires a mesh with ~8M grid points and 1500 or more time steps. 200 scenarios are computed concurrently. The total number of cores in the parallel computation is 200x32=6400. The main goal of this work is to estimate the propagation of uncertainties through the model and to investigate the sensitivity of the solution to the uncertain input parameters. Additionally, we demonstrate how the multigrid ug4-based solver can be applied as a black box in the uncertainty quantification framework.
Conversion of optical wave polarizations in 1D finite anisotropic photonic crystal

International Nuclear Information System (INIS)

Ouchani, N.; Nougaoui, N.; Daoudi, A.; Bria, D.

2006-07-01

We show that by using one-dimensional anisotropic photonic structures, it is possible to realize optical wave polarization conversion by transmission or by reflection. Thus a single incident S(P) polarized plane wave can produce a single reflected P(S) polarized wave and a single transmitted P(S) polarized wave. This polarization conversion property can be fulfilled with a simple finite superlattice constituted by anisotropic dielectric materials. We discuss the appropriate choices of the material and geometrical properties to realize such structures. The transmission and reflection coefficients are discussed in relation with the dispersion curves of the finite structure embedded between two isotropic substrates. Both transmission and reflection coefficients are calculated in the framework of Green's function method. The amplitude and the polarization characteristics of reflected and transmitted waves are determined as functions of frequency ω, wave vector k∥ (parallel to the interface), and the orientations of the principal axes of the layers constituting the SL. Moreover, this structure exhibits a coupling between S and P waves that does not exist in SL composed only of isotropic materials. Specific applications of these results are given for a superlattice consisting of alternating biaxial anisotropic layers NaNO2/SbSi sandwiched between two identical semi-infinite isotropic media. (author)
Study on Parallel Processing for Efficient Flexible Multibody Analysis based on Subsystem Synthesis Method

Energy Technology Data Exchange (ETDEWEB)

Han, Jong-Boo; Song, Hajun; Kim, Sung-Soo [Chungnam Nat’l Univ., Daejeon (Korea, Republic of)

2017-06-15

Flexible multibody simulations are widely used in the industry to design mechanical systems. In flexible multibody dynamics, deformation coordinates are described either relatively in the body reference frame that is floating in the space or in the inertial reference frame. Moreover, these deformation coordinates are generated based on the discretization of the body according to the finite element approach. Therefore, the formulation of the flexible multibody system always deals with a huge number of degrees of freedom and the numerical solution methods require a substantial amount of computational time. Parallel computational methods are a solution for efficient computation. However, most of the parallel computational methods are focused on the efficient solution of large-sized linear equations. For multibody analysis, we need to develop an efficient formulation that could be suitable for parallel computation. In this paper, we developed a subsystem synthesis method for a flexible multibody system and proposed efficient parallel computational schemes based on the OpenMP API in order to achieve efficient computation. Simulations of a rotating blade system, which consists of three identical blades, were carried out with two different parallel computational schemes. Actual CPU times were measured to investigate the efficiency of the proposed parallel schemes.
Fluid/Structure Interaction Studies of Aircraft Using High Fidelity Equations on Parallel Computers

Science.gov (United States)

Guruswamy, Guru; VanDalsem, William (Technical Monitor)

1994-01-01

Aeroelasticity, which involves strong coupling of fluids, structures and controls, is an important element in designing an aircraft. Computational aeroelasticity using low fidelity methods, such as the linear aerodynamic flow equations coupled with the modal structural equations, is well advanced. Though these low fidelity approaches are computationally less intensive, they are not adequate for the analysis of modern aircraft such as High Speed Civil Transport (HSCT) and Advanced Subsonic Transport (AST), which can experience complex flow/structure interactions. HSCT can experience vortex induced aeroelastic oscillations whereas AST can experience transonic buffet associated structural oscillations. Both aircraft may experience a dip in the flutter speed in the transonic regime. For accurate aeroelastic computations in these complex fluid/structure interaction situations, high fidelity equations such as the Navier-Stokes equations for fluids and finite elements for structures are needed. Computations using these high fidelity equations require large computational resources both in memory and speed. Current conventional supercomputers have reached their limitations both in memory and speed. As a result, parallel computers have evolved to overcome the limitations of conventional computers. This paper will address the transition that is taking place in computational aeroelasticity from conventional computers to parallel computers. The paper will address special techniques needed to take advantage of the architecture of new parallel computers. Results will be illustrated from computations made on iPSC/860 and IBM SP2 computers by using the ENSAERO code, which directly couples the Euler/Navier-Stokes flow equations with high resolution finite-element structural equations.
Analyzing logistic map pseudorandom number generators for periodicity induced by finite precision floating-point representation

International Nuclear Information System (INIS)

Persohn, K.J.; Povinelli, R.J.

2012-01-01

Highlights: ► A chaotic pseudorandom number generator (C-PRNG) poorly explores the key space. ► A C-PRNG is finite and periodic when implemented on a finite precision computer. ► We present a method to determine the period lengths of a C-PRNG. - Abstract: Because of the mixing and aperiodic properties of chaotic maps, such maps have been used as the basis for pseudorandom number generators (PRNGs). However, when implemented on a finite precision computer, chaotic maps have finite and periodic orbits. This manuscript explores the consequences finite precision has on the periodicity of a PRNG based on the logistic map. A comparison is made with conventional methods of generating pseudorandom numbers. The approach used to determine the number, delay, and period of the orbits of the logistic map at varying degrees of precision (3 to 23 bits) is described in detail, including the use of the Condor high-throughput computing environment to parallelize independent tasks of analyzing a large initial seed space. Results demonstrate that in terms of pathological seeds and effective bit length, a PRNG based on the logistic map performs exponentially worse than conventional PRNGs.
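The finite, periodic orbits described above are easy to reproduce in a small experiment. The sketch below is a simplification and not the authors' method: it assumes a fixed-point representation with a given number of fractional bits and truncation, rather than the floating-point formats analyzed in the paper, and it fixes the logistic map parameter at r = 4. The function names are hypothetical.

```python
from collections import Counter


def logistic_orbit(seed, bits):
    """Return (delay, period) of the orbit of x -> 4x(1-x) in fixed point.

    The state is an integer in [0, 2**bits] representing x = state / 2**bits.
    'delay' is the number of steps before the orbit enters its cycle and
    'period' is the length of that cycle.
    """
    scale = 1 << bits
    first_seen = {}
    x = seed
    step = 0
    while x not in first_seen:
        first_seen[x] = step
        # 4 * x * (1 - x) in fixed point, truncated to 'bits' fractional bits.
        x = (4 * x * (scale - x)) >> bits
        step += 1
    delay = first_seen[x]
    period = step - delay
    return delay, period


def period_census(bits):
    """Count how many seeds end up on a cycle of each period length."""
    scale = 1 << bits
    return Counter(logistic_orbit(seed, bits)[1] for seed in range(scale + 1))


if __name__ == "__main__":
    for bits in (3, 8, 12):
        census = period_census(bits)
        print(f"{bits} bits: period -> number of seeds: {dict(sorted(census.items()))}")
```

Because each seed is analyzed independently, the loop over seeds in `period_census` is the kind of task that can be distributed across many workers, as the abstract describes for its high-throughput runs.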
Recent progress in modelling 3D lithospheric deformation

Science.gov (United States)

Kaus, B. J. P.; Popov, A.; May, D. A.

2012-04-01

Modelling 3D lithospheric deformation remains a challenging task, predominantly because the variations in rock types, as well as nonlinearities due to, for example, plastic deformation, result in sharp and very large jumps in effective viscosity contrast. As a result, there are only a limited number of 3D codes available, most of which use direct solvers, which are very demanding in both computation and memory. Consequently, the resolutions for typical model runs are quite modest, despite the use of hundreds of processors (and using much larger computers is unlikely to bring much improvement in this situation). For this reason we recently developed a new 3D deformation code, called LaMEM: Lithosphere and Mantle Evolution Model. LaMEM is written on top of PETSc, and as a result it runs on massively parallel machines and we have a large number of iterative solvers available (including geometric and algebraic multigrid methods). As it remains unclear which solver combinations work best under which conditions, we have implemented most currently suggested methods (such as Schur complement reduction or fully coupled iterations). In addition, we can use either a finite element discretization (with Q1P0, stabilized Q1Q1 or Q2P-1 elements) or a staggered finite difference discretization for the same input geometry (which is based on a marker and cell technique). This gives us the flexibility to test various solver methodologies on the same model setup, in terms of accuracy, speed, memory usage etc. Here, we will report on some features of LaMEM, on recent code additions, as well as on some lessons we learned which are important for modelling 3D lithospheric deformation. Specifically we will discuss: 1) How we combine a particle-and-cell method to make it work with both a finite difference and a (Lagrangian, Eulerian or ALE) finite element formulation, with only minor code modifications 2) How finite difference and finite element discretizations compare in terms of
Parallel deposition, sorting, and reordering methods in the Hybrid Ordered Plasma Simulation (HOPS) code

International Nuclear Information System (INIS)

Anderson, D.V.; Shumaker, D.E.

1993-01-01

From a computational standpoint, particle simulation calculations for plasmas have not adapted well to the transitions from scalar to vector processing nor from serial to parallel environments. They have suffered from excessive accessing of computer memory and have been hobbled by relatively inefficient gather-scatter constructs resulting from the use of indirect indexing. Lastly, the many-to-one mapping characteristic of the deposition phase has made it difficult to perform this in parallel. The authors' code sorts and reorders the particles in a spatial order. This allows them to greatly reduce the memory references, to run in directly indexed vector mode, and to employ domain decomposition to achieve parallelization. In this hybrid simulation the electrons are modeled as a fluid and the field equations solved are obtained from the electron momentum equation together with the pre-Maxwell equations (displacement current neglected). Either zero or finite electron mass can be used in the electron model. The resulting field equations are solved with an iteratively explicit procedure which is thus trivial to parallelize. Likewise, the field interpolations and the particle pushing are simple to parallelize. The deposition, sorting, and reordering phases are less simple and it is for these that the authors present detailed algorithms. They have now successfully tested the parallel version of HOPS in serial mode and it is now being readied for parallel execution on the Cray C-90. They will then port HOPS to a massively parallel computer in the next year.
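The idea of sorting particles into spatial order so that deposition becomes a directly indexed, per-cell reduction can be sketched as follows. This is not the HOPS algorithm itself: it assumes a 1D domain, a counting sort by cell, and nearest-grid-point deposition, and every name in it is hypothetical.

```python
def sort_particles_by_cell(positions, charges, n_cells, length):
    """Counting sort of particles by cell index on the domain [0, length].

    Returns the reordered positions and charges plus 'starts', where the
    particles of cell c occupy the contiguous slice starts[c]:starts[c + 1].
    """
    cell_of = [min(int(x / length * n_cells), n_cells - 1) for x in positions]
    counts = [0] * n_cells
    for c in cell_of:
        counts[c] += 1
    starts = [0] * (n_cells + 1)
    for c in range(n_cells):
        starts[c + 1] = starts[c] + counts[c]
    next_slot = starts[:-1]  # slicing copies, so 'starts' is left untouched
    sorted_x = [0.0] * len(positions)
    sorted_q = [0.0] * len(charges)
    for x, q, c in zip(positions, charges, cell_of):
        k = next_slot[c]
        sorted_x[k] = x
        sorted_q[k] = q
        next_slot[c] += 1
    return sorted_x, sorted_q, starts


def deposit(sorted_q, starts):
    """Nearest-grid-point deposition: one independent sum per cell.

    Each cell reads only its own contiguous slice, so no two cells write to
    the same location and the loop over cells could run in parallel.
    """
    return [sum(sorted_q[starts[c]:starts[c + 1]]) for c in range(len(starts) - 1)]


if __name__ == "__main__":
    xs = [0.9, 0.1, 0.5, 0.15]
    qs = [1.0, 2.0, 3.0, 4.0]
    sorted_x, sorted_q, starts = sort_particles_by_cell(xs, qs, n_cells=2, length=1.0)
    print("sorted positions:", sorted_x)
    print("charge per cell: ", deposit(sorted_q, starts))
```

After sorting, the many-to-one scatter of the unsorted deposition is replaced by independent reductions over contiguous memory, which is the property the abstract relies on for vectorization and domain decomposition.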
A Parallel Numerical Micromagnetic Code Using FEniCS

Science.gov (United States)

Nagy, L.; Williams, W.; Mitchell, L.

2013-12-01

Many problems in the geosciences depend on understanding the ability of magnetic minerals to provide stable paleomagnetic recordings. Numerical micromagnetic modelling allows us to calculate the domain structures found in naturally occurring magnetic materials. However the computational cost rises exceedingly quickly with respect to the size and complexity of the geometries that we wish to model. This problem is compounded by the fact that the modern processor design no longer focuses on the speed at which calculations are performed, but rather on the number of computational units amongst which we may distribute our calculations. Consequently to better exploit modern computational resources our micromagnetic simulations must "go parallel". We present a parallel and scalable micromagnetics code written using FEniCS. FEniCS is a multinational collaboration involving several institutions (University of Cambridge, University of Chicago, The Simula Research Laboratory, etc.) that aims to provide a set of tools for writing scientific software; in particular software that employs the finite element method. The advantages of this approach are the leveraging of pre-existing projects from the world of scientific computing (PETSc, Trilinos, Metis/Parmetis, etc.) and exposing these so that researchers may pose problems in a manner closer to the mathematical language of their domain. Our code provides a scriptable interface (in Python) that allows users to not only run micromagnetic models in parallel, but also to perform pre/post processing of data.
The Laguerre finite difference one-way equation solver

Science.gov (United States)

Terekhov, Andrew V.

2017-05-01

This paper presents a new finite difference algorithm for solving the 2D one-way wave equation with a preliminary approximation of a pseudo-differential operator by a system of partial differential equations. As opposed to the existing approaches, the integral Laguerre transform is used instead of the Fourier transform. After carrying out the approximation of spatial variables it is possible to obtain systems of linear algebraic equations with better computing properties and to reduce computer costs for their solution. High accuracy of calculations is attained by employing finite difference approximations of higher accuracy order that are based on the dispersion-relationship-preserving method and the Richardson extrapolation in the downward continuation direction. The numerical experiments have verified that, compared to the spectral difference method based on the Fourier transform, the new algorithm allows one to calculate wave fields with a higher degree of accuracy and a lower level of numerical noise and artifacts, including those for non-smooth velocity models. In the context of solving the geophysical problem, the post-stack migration for velocity models of the types Syncline and Sigsbee2A has been carried out. It is shown that the images obtained contain less noise and are considerably better focused as compared to those obtained by the known Fourier Finite Difference and Phase-Shift Plus Interpolation methods. There is an opinion that purely finite difference approaches do not allow carrying out the seismic migration procedure with sufficient accuracy; however, the results obtained disprove this statement. For the supercomputer implementation it is proposed to use the parallel dichotomy algorithm when solving systems of linear algebraic equations with block-tridiagonal matrices.
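For orientation, the serial baseline that parallel tridiagonal solvers are usually measured against is the Thomas algorithm. The sketch below shows it for a scalar tridiagonal system. It is not the parallel dichotomy algorithm named in the abstract, and it assumes scalar rather than block entries and a matrix for which no pivoting is needed (for example, a diagonally dominant one). The argument convention is an illustrative choice.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system by forward elimination and back substitution.

    Row i reads: lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # Small test system with matrix tridiag(-1, 2, -1).
    solution = thomas([0.0, -1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0, 0.0],
                      [1.0, 0.0, 1.0])
    print(solution)
```

Both sweeps carry a dependency from one row to the next, which is why a serial elimination of this kind does not parallelize directly and dedicated parallel algorithms are proposed for supercomputer implementations.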
Peridynamic Multiscale Finite Element Methods

Energy Technology Data Exchange (ETDEWEB)

Costa, Timothy [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Bond, Stephen D. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Littlewood, David John [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Moore, Stan Gerald [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2015-12-01

art of local models with the flexibility and accuracy of the nonlocal peridynamic model. In the mixed locality method this coupling occurs across scales, so that the nonlocal model can be used to communicate material heterogeneity at scales inappropriate to local partial differential equation models. Additionally, the computational burden of the weak form of the peridynamic model is reduced dramatically by only requiring that the model be solved on local patches of the simulation domain which may be computed in parallel, taking advantage of the heterogeneous nature of next generation computing platforms. Additionally, we present a novel Galerkin framework, the 'Ambulant Galerkin Method', which represents a first step towards a unified mathematical analysis of local and nonlocal multiscale finite element methods, and whose future extension will allow the analysis of multiscale finite element methods that mix models across scales under certain assumptions of the consistency of those models.
MP Salsa: a finite element computer program for reacting flow problems. Part 1--theoretical development

Energy Technology Data Exchange (ETDEWEB)

Shadid, J.N.; Moffat, H.K.; Hutchinson, S.A.; Hennigan, G.L.; Devine, K.D.; Salinger, A.G.

1996-05-01

The theoretical background for the finite element computer program, MPSalsa, is presented in detail. MPSalsa is designed to solve laminar, low Mach number, two- or three-dimensional incompressible and variable density reacting fluid flows on massively parallel computers, using a Petrov-Galerkin finite element formulation. The code has the capability to solve coupled fluid flow, heat transport, multicomponent species transport, and finite-rate chemical reactions, and to solve coupled multiple Poisson or advection-diffusion-reaction equations. The program employs the CHEMKIN library to provide a rigorous treatment of multicomponent ideal gas kinetics and transport. Chemical reactions occurring in the gas phase and on surfaces are treated by calls to CHEMKIN and SURFACE CHEMKIN, respectively. The code employs unstructured meshes, using the EXODUS II finite element data base suite of programs for its input and output files. MPSalsa solves both transient and steady flows by using fully implicit time integration, an inexact Newton method and iterative solvers based on preconditioned Krylov methods as implemented in the Aztec solver library.
Mapping behavioral landscapes for animal movement: a finite mixture modeling approach

Science.gov (United States)

Tracey, Jeff A.; Zhu, Jun; Boydston, Erin E.; Lyren, Lisa M.; Fisher, Robert N.; Crooks, Kevin R.

2013-01-01

Because of its role in many ecological processes, movement of animals in response to landscape features is an important subject in ecology and conservation biology. In this paper, we develop models of animal movement in relation to objects or fields in a landscape. We take a finite mixture modeling approach in which the component densities are conceptually related to different choices for movement in response to a landscape feature, and the mixing proportions are related to the probability of selecting each response as a function of one or more covariates. We combine particle swarm optimization and an Expectation-Maximization (EM) algorithm to obtain maximum likelihood estimates of the model parameters. We use this approach to analyze data for movement of three bobcats in relation to urban areas in southern California, USA. A behavioral interpretation of the models revealed similarities and differences in bobcat movement response to urbanization. All three bobcats avoided urbanization by moving either parallel to urban boundaries or toward less urban areas as the proportion of urban land cover in the surrounding area increased. However, one bobcat, a male with a dispersal-like large-scale movement pattern, avoided urbanization at lower densities and responded strictly by moving parallel to the urban edge. The other two bobcats, which were both residents and occupied similar geographic areas, avoided urban areas using a combination of movements parallel to the urban edge and movement toward areas of less urbanization. However, the resident female appeared to exhibit greater repulsion at lower levels of urbanization than the resident male, consistent with empirical observations of bobcats in southern California. Using the parameterized finite mixture models, we mapped behavioral states to geographic space, creating a representation of a behavioral landscape. This approach can provide guidance for conservation planning based on analysis of animal movement data using
Solution of the within-group multidimensional discrete ordinates transport equations on massively parallel architectures

Science.gov (United States)

Zerr, Robert Joseph

2011-12-01

thousands of processors. The PGS method does outperform SI DSA for the periodic heterogeneous layers (PHL) configuration problems. Although this demonstrates a relative strength/weakness between the two methods, the practicality of these problems is much less, further limiting instances where it would be beneficial to select ITMM over SI DSA. The results strongly indicate a need for a robust, stable, and efficient acceleration method (or preconditioner for PGMRES). The spatial multigrid (SMG) method is currently incomplete in that it does not work for all cases considered and does not effectively improve the convergence rate for all values of scattering ratio c or cell dimension h. Nevertheless, it does display the desired trend for highly scattering, optically thin problems. That is, it tends to lower the rate of growth of number of iterations with increasing number of processes, P, while not increasing the number of additional operations per iteration to the extent that the total execution time of the rapidly converging accelerated iterations exceeds that of the slower unaccelerated iterations. A predictive parallel performance model has been developed for the PBJ method. Timing tests were performed such that trend lines could be fitted to the data for the different components and used to estimate the execution times. Applied to the weak scaling results, the model notably underestimates construction time, but combined with a slight overestimation in iterative solution time, the model predicts total execution time very well for large P. It also does a decent job with the strong scaling results, closely predicting the construction time and time per iteration, especially as P increases. Although not shown to be competitive up to 1,024 processing elements with the current state of the art, the parallelized ITMM exhibits promising scaling trends. 
Ultimately, compared to the KBA method, the parallelized ITMM may be found to be a very attractive option for transport calculations
Effects of parallel dynamics on vortex structures in electron temperature gradient driven turbulence

International Nuclear Information System (INIS)

Nakata, M.; Watanabe, T.-H.; Sugama, H.; Horton, W.

2011-01-01

Vortex structures and related heat transport properties in slab electron temperature gradient (ETG) driven turbulence are comprehensively investigated by means of nonlinear gyrokinetic Vlasov simulations, with the aim of elucidating the underlying physical mechanisms of the transition from turbulent to coherent states. Numerical results show three different types of vortex structures, i.e., coherent vortex streets accompanied with the transport reduction, turbulent vortices with steady transport, and a zonal-flow-dominated state, depending on the relative magnitude of the parallel compression to the diamagnetic drift. In particular, the formation of coherent vortex streets is correlated with the strong generation of zonal flows for the cases with weak parallel compression, even though the maximum growth rate of linear ETG modes is relatively large. The zonal flow generation in the ETG turbulence is investigated by the modulational instability analysis with a truncated fluid model, where the parallel dynamics such as acoustic modes for electrons is incorporated. The modulational instability for zonal flows is found to be stabilized by the effect of the finite parallel compression. The theoretical analysis qualitatively agrees with secondary growth of zonal flows found in the slab ETG turbulence simulations, where the transition of vortex structures is observed.
Simulation of finite-strain inelastic phenomena governed by creep and plasticity

Science.gov (United States)

Li, Zhen; Bloomfield, Max O.; Oberai, Assad A.

2017-11-01

Inelastic mechanical behavior plays an important role in many applications in science and engineering. Phenomenologically, this behavior is often modeled as plasticity or creep. Plasticity is used to represent the rate-independent component of inelastic deformation and creep is used to represent the rate-dependent component. In several applications, especially those at elevated temperatures and stresses, these processes occur simultaneously. In order to model these processes, we develop a rate-objective, finite-deformation constitutive model for plasticity and creep. The plastic component of this model is based on rate-independent J_2 plasticity, and the creep component is based on a thermally activated Norton model. We describe the implementation of this model within a finite element formulation, and present a radial return mapping algorithm for it. This approach reduces the additional complexity of modeling plasticity and creep, over thermoelasticity, to just solving one nonlinear scalar equation at each quadrature point. We implement this algorithm within a multiphysics finite element code and evaluate the consistent tangent through automatic differentiation. We verify and validate the implementation, apply it to modeling the evolution of stresses in the flip chip manufacturing process, and test its parallel strong-scaling performance.
Effect of reactor finiteness on the boundary condition at the surface of a booster section

International Nuclear Information System (INIS)

Wassef, W.A.

1982-01-01

Effect of reactor finiteness on the boundary condition at the surface of an absorbing booster embedded in the reactor core is studied and formulated. The model used in these calculations depends on the Pl-Transport coupling technique. This method takes into consideration the rigorous neutron transport behavior inside the booster medium, while the Pl-approximation is used in the bulk of the scattering medium surrounding the booster, which can be considered infinite in most practical applications. The neutron flux gradient parallel to the surface of the booster is considered. The geometrical configuration of the reactor core cross section is circular or rectangular. Finiteness of the reactor is introduced in the general formulation through its dimensions or buckling. Extensive numerical results are given to demonstrate the dependence of the boundary condition at the surface of the booster section on the reactor finiteness and the different physical parameters.

Computational Challenge of Fractional Differential Equations and the Potential Solutions: A Survey

Directory of Open Access Journals (Sweden)

Chunye Gong

2015-01-01

We present a survey of fractional differential equations and in particular of the computational cost for their numerical solutions from the view of computer science. The computational complexities of time fractional, space fractional, and space-time fractional equations are $O(N^2 M)$, $O(N M^2)$, and $O(NM(M + N))$ compared with $O(MN)$ for the classical partial differential equations with finite difference methods, where M, N are the number of space grid points and time steps. The potential solutions for this challenge include, but are not limited to, parallel computing, memory access optimization (fractional precomputing operator), short memory principle, fast Fourier transform (FFT) based solutions, alternating direction implicit method, multigrid method, and preconditioner technology. The relationships of these solutions for both space fractional derivative and time fractional derivative are discussed. The authors pointed out that the technologies of parallel computing should be regarded as a basic method to overcome this challenge, and some attention should be paid to the fractional killer applications, high performance iteration methods, high order schemes, and Monte Carlo methods. Since the computation of fractional equations with high dimension and variable order is even heavier, the researchers from the area of mathematics and computer science have opportunity to invent cornerstones in the area of fractional calculus.
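To illustrate why the time-fractional case costs more than a classical time stepper, and how the short memory principle named in the abstract reduces that cost, the following is a minimal sketch. It assumes a Grünwald-Letnikov discretization, which the abstract does not specify; the function names `gl_weights` and `gl_derivative` and the `memory` parameter are hypothetical and chosen only for this example.

```python
def gl_weights(alpha, n):
    """Grünwald-Letnikov weights w_k = (-1)^k * binom(alpha, k) for k = 0..n.

    Uses the standard recurrence w_k = w_{k-1} * (1 - (alpha + 1) / k).
    """
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return w


def gl_derivative(f_vals, alpha, h, memory=None):
    """Approximate the order-alpha derivative of f at every grid point.

    memory=None sums over the full history, so the total work over N time
    steps grows like N^2. memory=L keeps only the last L samples (short
    memory principle), so the total work grows like N*L, at the price of a
    truncation error that this sketch does not estimate.
    """
    n = len(f_vals)
    w = gl_weights(alpha, n - 1)
    out = []
    for i in range(n):
        kmax = i if memory is None else min(i, memory)
        s = sum(w[k] * f_vals[i - k] for k in range(kmax + 1))
        out.append(s / h ** alpha)
    return out
```

For alpha = 1 the weights reduce to 1, -1, 0, 0, ..., so the formula falls back to the ordinary backward difference, which is a convenient sanity check. For non-integer alpha every past sample contributes, which is the source of the extra factor of N in the complexity estimates quoted above.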
Parallel-fed planar dipole antenna arrays for low-observable platforms

CERN Document Server

Singh, Hema; Jha, Rakesh Mohan

2016-01-01

This book focuses on determination of scattering of parallel-fed planar dipole arrays in terms of reflection and transmission coefficients at different levels of the array system. In aerospace vehicles, the phased arrays are often in planar configuration. The radar cross section (RCS) of the vehicle is mainly due to its structure and the antennas mounted over it. There can be situations when the signatures due to antennas dominate over the structural RCS of the platform. This necessitates the study of the reduction and control of antenna/array RCS. The planar dipole array is considered as a stacked linear dipole array. A systematic, step-by-step approach is used to determine the RCS pattern including the finite dimensions of dipole antenna elements. The mutual impedance between the dipole elements for planar configuration is determined. The scattering up to the second level of couplers in the parallel feed network is taken into account. The phase shifters are modelled as delay lines. All the couplers in the feed n...
Coarse-grain parallel solution of few-group neutron diffusion equations

International Nuclear Information System (INIS)

Sarsour, H.N.; Turinsky, P.J.

1991-01-01

The authors present a parallel numerical algorithm for the solution of the finite difference representation of the few-group neutron diffusion equations. The targeted architectures are multiprocessor computers with shared memory like the Cray Y-MP and the IBM 3090/VF, where coarse granularity is important for minimizing overhead. Most of the work done in the past, which attempts to exploit concurrence, has concentrated on the inner iterations of the standard outer-inner iterative strategy. This produces very fine granularity. To coarsen granularity, the authors introduce parallelism at the nested outer-inner level. The problem's spatial domain was partitioned into contiguous subregions and assigned a processor to solve for each subregion independent of all other subregions, hence, processors; i.e., each subregion is treated as a reactor core with imposed boundary conditions. Since those boundary conditions on interior surfaces, referred to as internal boundary conditions (IBCs), are not known, a third iterative level, the recomposition iterations, is introduced to communicate results between subregions
3-D electromagnetic plasma particle simulations on the Intel Delta parallel computer

International Nuclear Information System (INIS)

Wang, J.; Liewer, P.C.

1994-01-01

A three-dimensional electromagnetic PIC code has been developed on the 512 node Intel Touchstone Delta MIMD parallel computer. This code is based on the General Concurrent PIC algorithm which uses a domain decomposition to divide the computation among the processors. The 3D simulation domain can be partitioned into 1-, 2-, or 3-dimensional sub-domains. Particles must be exchanged between processors as they move among the subdomains. The Intel Delta allows one to use this code for very-large-scale simulations (i.e. over 10^8 particles and 10^6 grid cells). The parallel efficiency of this code is measured, and the overall code performance on the Delta is compared with that on Cray supercomputers. It is shown that the code runs with a high parallel efficiency of ≥ 95% for large size problems. The particle push time achieved is 115 nsecs/particle/time step for 162 million particles on 512 nodes. Comparing with the performance on a single processor Cray C90, this represents a factor of 58 speedup. The code uses a finite-difference leapfrog method for field solve which is significantly more efficient than fast Fourier transforms on parallel computers. The performance of this code on the 128 node Cray T3D will also be discussed.
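The finite-difference leapfrog field solve mentioned above can be sketched in one dimension. This is only an illustration under stated assumptions, not the authors' 3D code: it uses normalized units (c = 1), a staggered grid with the magnetic field stored between electric-field nodes, and fixed (perfectly conducting) end points. The name `leapfrog_step` and the parameter `c_dt_dx` are hypothetical.

```python
def leapfrog_step(e, b, c_dt_dx):
    """Advance 1D fields by one leapfrog step, in place.

    e       : transverse electric field at integer grid nodes (length n)
    b       : transverse magnetic field at half-integer nodes (length n - 1),
              staggered half a time step from e
    c_dt_dx : Courant number c*dt/dx (stable for values <= 1 in 1D)
    """
    n = len(e)
    # Faraday's law: update b from the spatial difference of e.
    for i in range(n - 1):
        b[i] -= c_dt_dx * (e[i + 1] - e[i])
    # Ampere's law: update interior e from the spatial difference of b.
    # The end points e[0] and e[n-1] are left untouched (conducting walls).
    for i in range(1, n - 1):
        e[i] -= c_dt_dx * (b[i] - b[i - 1])


# Usage: a single spike in e spreads to its neighbours through b.
e_field = [0.0, 1.0, 0.0]
b_field = [0.0, 0.0]
leapfrog_step(e_field, b_field, 0.5)
```

Each update touches only nearest neighbours, so under a domain decomposition a processor needs just one layer of boundary values from adjacent subdomains per step. That locality is a plausible reason for the efficiency advantage over FFT-based field solvers that the abstract reports, since an FFT requires global data exchange.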
Three-dimensional magnetotelluric inversion including topography using deformed hexahedral edge finite elements, direct solvers and data space Gauss-Newton, parallelized on SMP computers

Science.gov (United States)

Kordy, M. A.; Wannamaker, P. E.; Maris, V.; Cherkaev, E.; Hill, G. J.

2014-12-01

We have developed an algorithm for 3D simulation and inversion of magnetotelluric (MT) responses using deformable hexahedral finite elements that permits incorporation of topography. Direct solvers parallelized on symmetric multiprocessor (SMP), single-chassis workstations with large RAM are used for the forward solution, parameter Jacobians, and model update. The forward simulator, Jacobian calculations, as well as synthetic and real data inversion are presented. We use first-order edge elements to represent the secondary electric field (E), yielding accuracy O(h) for E and its curl (magnetic field). For very low frequency or small material admittivity, the E-field requires divergence correction. Using Hodge decomposition, correction may be applied after the forward solution is calculated. It allows accurate E-field solutions in dielectric air. The system matrix factorization is computed using the MUMPS library, which shows moderately good scalability through 12 processor cores but limited gains beyond that. The factored matrix is used to calculate the forward response as well as the Jacobians of field and MT responses using the reciprocity theorem. Comparison with other codes demonstrates accuracy of our forward calculations. We consider a popular conductive/resistive double brick structure and several topographic models. In particular, the ability of finite elements to represent smooth topographic slopes permits accurate simulation of refraction of electromagnetic waves normal to the slopes at high frequencies. Run time tests indicate that for meshes as large as 150x150x60 elements, MT forward response and Jacobians can be calculated in ~2.5 hours per frequency. For inversion, we implemented a data-space Gauss-Newton method, which offers reduction in memory requirement and a significant speedup of the parameter step versus the model-space approach. For dense matrix operations we use the tiling approach of the PLASMA library, which shows very good scalability. In synthetic
Cellular Genetic Algorithm with Communicating Grids for Assembly Line Balancing Problems

Directory of Open Access Journals (Sweden)

BRUDARU, O.

2010-05-01

This paper presents a new approach with cellular multigrid genetic algorithms for the "I"-shaped and "U"-shaped assembly line balancing problems, including parallel workstations and compatibility constraints. First, a cellular hybrid genetic algorithm that uses a single grid is described. Appropriate operators for mutation, hypermutation, and crossover and two devoration techniques are proposed for creating and maintaining groups based on similarity. This monogrid algorithm is extended for handling many populations placed on different grids. In the multigrid version, the population of each grid is organized in clusters using the positional information of the chromosomes. A similarity preserving communication protocol between the clusters placed on different grids is introduced. The experimental evaluation shows that the multigrid cellular genetic algorithm with communicating grids is better than the hybrid genetic algorithm used for building it, whereas it dominates the monogrid version in all cases. Absolute performance is evaluated using classical benchmarks. The role of certain components of the cellular algorithm is explained and the effect of some parameters is evaluated.
An Improved Parallel DNA Algorithm of 3-SAT

Directory of Open Access Journals (Sweden)

Wei Liu

2007-09-01

There are many large-size and difficult computational problems in mathematics and computer science. For many of these problems, traditional computers cannot handle the mass of data in acceptable timeframes, which we call an NP problem. DNA computing is a means of solving a class of intractable computational problems in which the computing time grows exponentially with problem size. This paper proposes a parallel algorithm model for the universal 3-SAT problem based on the Adleman-Lipton model and applies biological operations to handling the mass of data in solution space. In this manner, we can control the run time of the algorithm to be finite and approximately constant.
The simplified spherical harmonics (SPL) methodology with space and moment decomposition in parallel environments

International Nuclear Information System (INIS)

Gianluca, Longoni; Alireza, Haghighat

2003-01-01

In recent years, the SP_L (simplified spherical harmonics) equations have received renewed interest for the simulation of nuclear systems. We have derived the SP_L equations starting from the even-parity form of the S_N equations. The SP_L equations form a system of (L+1)/2 second order partial differential equations that can be solved with standard iterative techniques such as the Conjugate Gradient (CG). We discretized the SP_L equations with the finite-volume approach in a 3-D Cartesian space. We developed a new 3-D general code, Pensp_L (Parallel Environment Neutral-particle SP_L). Pensp_L solves both fixed source and criticality eigenvalue problems. In order to optimize the memory management, we implemented a Compressed Diagonal Storage (CDS) to store the SP_L matrices. Pensp_L includes parallel algorithms for space and moment domain decomposition. The computational load is distributed on different processors, using a mapping function, which maps the 3-D Cartesian space and moments onto processors. The code is written in Fortran 90 using the Message Passing Interface (MPI) libraries for the parallel implementation of the algorithm. The code has been tested on the Pcpen cluster and the parallel performance has been assessed in terms of speed-up and parallel efficiency. (author)
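Compressed Diagonal Storage suits finite-volume matrices on structured Cartesian grids because their nonzeros lie on a handful of diagonals. The sketch below shows the general idea in Python rather than the Fortran 90 of the code described above; the layout (one array per diagonal offset, padded with zeros) and the names `cds_from_dense` and `cds_matvec` are assumptions for illustration and are not taken from the abstract.

```python
def cds_from_dense(a, offsets):
    """Store the listed diagonals of a square matrix.

    Returns a dict mapping each offset d to a list vals with
    vals[i] = a[i][i + d], padded with 0.0 where i + d is out of range.
    """
    n = len(a)
    return {
        d: [a[i][i + d] if 0 <= i + d < n else 0.0 for i in range(n)]
        for d in offsets
    }


def cds_matvec(diags, x):
    """Compute y = A x using only the stored diagonals."""
    n = len(x)
    y = [0.0] * n
    for d, vals in diags.items():
        lo = max(0, -d)       # first row where column i + d is valid
        hi = min(n, n - d)    # one past the last such row
        for i in range(lo, hi):
            y[i] += vals[i] * x[i + d]
    return y


# Usage: the 1D second-difference matrix needs only offsets -1, 0 and 1.
dense = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
stored = cds_from_dense(dense, [-1, 0, 1])
product = cds_matvec(stored, [1.0, 2.0, 3.0, 4.0])
```

Storage is proportional to the number of diagonals times the matrix dimension, and the matrix-vector product, which dominates each Conjugate Gradient iteration, runs over contiguous arrays without index lookups.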
Discontinuous finite element and characteristics methods for neutrons transport equation solution in heterogeneous grids

International Nuclear Information System (INIS)

Masiello, E.

2006-01-01

The principal goal of this manuscript is the investigation of a new type of heterogeneous mesh adapted to the shape of the fuel pins (fuel-clad-moderator). The new heterogeneous mesh guarantees the spatial modelling of the pin-cell with a minimum of regions. Two methods are investigated for the spatial discretization of the transport equation: the discontinuous finite element method and the method of characteristics for structured cells. These methods together with the new representation of the pin-cell result in an appreciable reduction of calculation points. They allow an exact modelling of the fuel pin-cell without spatial homogenization. A new synthetic acceleration technique based on an angular multigrid is also presented for the speed up of the inner iterations. These methods are good candidates for transport calculations for a nuclear reactor core. A second objective of this work is the application of the method of characteristics for non-structured geometries to the study of the double heterogeneity problem. The latter is characterized by fuel material with a stochastic dispersion of heterogeneous grains, and until now was solved with a model based on collision probabilities. We propose a new statistical model based on renewal-Markovian theory, which makes it possible to take into account the stochastic nature of the problem and to avoid the approximations of the collision probability model. The numerical solution of this model is guaranteed by the method of characteristics. (author)
Energy flow of electric dipole radiation in between parallel mirrors

Science.gov (United States)

Xu, Zhangjin; Arnoldus, Henk F.

2017-11-01

We have studied the energy flow patterns of the radiation emitted by an electric dipole located in between parallel mirrors. It appears that the field lines of the Poynting vector (the flow lines of energy) can have very intricate structures, including many singularities and vortices. The flow line patterns depend on the distance between the mirrors, the distance of the dipole to one of the mirrors and the angle of oscillation of the dipole moment with respect to the normal of the mirror surfaces. Already for the simplest case of a dipole moment oscillating perpendicular to the mirrors, singularities appear at regular intervals along the direction of propagation (parallel to the mirrors). For a parallel dipole, vortices appear in the neighbourhood of the dipole. For a dipole oscillating under a finite angle with the surface normal, the radiation tends to swirl around the dipole before travelling off parallel to the mirrors. For relatively large mirror separations, vortices appear in the pattern. When the dipole is off-centred with respect to the midway point between the mirrors, the flow line structure becomes even more complicated, with numerous vortices in the pattern, and tiny loops near the dipole. We have also investigated the locations of the vortices and singularities, and these can be found without any specific knowledge about the flow lines. This provides an independent means of studying the propagation of dipole radiation between mirrors.
Parallel Programming with Intel Parallel Studio XE

CERN Document Server

Blair-Chappell, Stephen

2012-01-01

Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the
Shapes of leaves with parallel venation. Modelling of the Epipactis sp. (Orchidaceae) leaves with the help of a system of coupled elastic beams

OpenAIRE

Jakubska-Busse, Anna; Janowicz, Maciej; Ochnio, Luiza; Jackowska-Zduniak, Beata

2016-01-01

Static properties of leaves with parallel venation, with particular emphasis on the genus Epipactis Zinn, 1757 (Orchidaceae, Neottieae) have been modelled with coupled quasi-parallel elastic “beams.” The non-linear theory of strongly bent beams has been employed. The resulting boundary-value problem has been solved numerically with the help of the finite-difference method. Possible dislocations resulting in additional Dirac-delta like forces have been taken into account. Morphological simila...
Parallel implementation of a Lagrangian-based model on an adaptive mesh in C++: Application to sea-ice

Science.gov (United States)

Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar

2017-12-01

We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian scheme (ALE). The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by the numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown to be sufficient to perform simulations for state-of-the-art sea ice forecasting and geophysical process studies over geographical domains of several million square kilometers like the Arctic region.
Finite element method for solving Kohn-Sham equations based on self-adaptive tetrahedral mesh

International Nuclear Information System (INIS)

Zhang Dier; Shen Lihua; Zhou Aihui; Gong Xingao

2008-01-01

A finite element (FE) method with self-adaptive mesh-refinement technique is developed for solving the density functional Kohn-Sham equations. The FE method adopts local piecewise polynomial basis functions, which produce sparsely structured Hamiltonian matrices. The method is well suited to parallel implementation without using Fourier transforms. In addition, the self-adaptive mesh-refinement technique can control the computational accuracy and efficiency with optimal mesh density in different regions.
Radiative Heat Transfer in Combustion Applications: Parallel Efficiencies of Two Gas Models, Turbulent Radiation Interactions in Particulate Laden Flows, and Coarse Mesh Finite Difference Acceleration for Improved Temporal Accuracy

Science.gov (United States)

Cleveland, Mathew A.

We investigate several aspects of the numerical solution of the radiative transfer equation in the context of coal combustion: the parallel efficiency of two commonly-used opacity models, the sensitivity of turbulent radiation interaction (TRI) effects to the presence of coal particulate, and an improvement of the order of temporal convergence using the coarse mesh finite difference (CMFD) method. There are four opacity models commonly employed to evaluate the radiative transfer equation in combustion applications: line-by-line (LBL), multigroup, band, and global. Most of these models have been rigorously evaluated for serial computations of a spectrum of problem types [1]. Studies of these models for parallel computations [2] are limited. We assessed the performance of the Spectral-Line-Based weighted sum of gray gases (SLW) model, a global method related to K-distribution methods [1], and the LBL model. The LBL model directly interpolates opacity information from large data tables. The LBL model outperforms the SLW model in almost all cases, as suggested by Wang et al. [3]. The SLW model, however, shows superior parallel scaling performance and a decreased sensitivity to load imbalancing, suggesting that for some problems, global methods such as the SLW model, could outperform the LBL model. Turbulent radiation interaction (TRI) effects are associated with the differences in the time scales of the fluid dynamic equations and the radiative transfer equations. Solving on the fluid dynamic time step size produces large changes in the radiation field over the time step. We have modified the statistically homogeneous, non-premixed flame problem of Deshmukh et al. [4] to include coal-type particulate. The addition of low mass loadings of particulate minimally impacts the TRI effects. Observed differences in the TRI effects from variations in the packing fractions and Stokes numbers are difficult to analyze because of the significant effect of variations in problem
Contact-impact algorithms on parallel computers

International Nuclear Information System (INIS)

Zhong Zhihua; Nilsson, Larsgunnar

1994-01-01

Contact-impact algorithms on parallel computers are discussed within the context of explicit finite element analysis. The algorithms concerned include a contact searching algorithm and an algorithm for contact force calculations. The contact searching algorithm is based on the territory concept of the general HITA algorithm. However, no distinction is made between different contact bodies, or between different contact surfaces. All contact segments from contact boundaries are taken as a single set. Hierarchy territories and contact territories are expanded. A three-dimensional bucket sort algorithm is used to sort contact nodes. The defence node algorithm is used in the calculation of contact forces. Both the contact searching algorithm and the defence node algorithm are implemented on the connection machine CM-200. The performance of the algorithms is examined under different circumstances, and numerical results are presented. (orig.)
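The three-dimensional bucket sort used for contact nodes can be sketched as a uniform spatial hash. This is a generic illustration, not the HITA-based implementation on the CM-200; the uniform cell size and the function names `bucket_sort_3d` and `contact_candidates` are assumptions made for the example.

```python
from collections import defaultdict


def bucket_sort_3d(nodes, cell):
    """Group node indices by the integer grid cell that contains each node.

    nodes : list of (x, y, z) coordinates
    cell  : edge length of the cubic buckets (assumed uniform here)
    """
    buckets = defaultdict(list)
    for idx, (x, y, z) in enumerate(nodes):
        key = (int(x // cell), int(y // cell), int(z // cell))
        buckets[key].append(idx)
    return buckets


def contact_candidates(buckets, point, cell):
    """Return indices of nodes in the bucket of `point` and its 26 neighbours."""
    cx, cy, cz = (int(c // cell) for c in point)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                found.extend(buckets.get((cx + dx, cy + dy, cz + dz), []))
    return sorted(found)


# Usage: only nearby nodes are returned as contact candidates.
coords = [(0.1, 0.1, 0.1), (0.9, 0.2, 0.1), (5.0, 5.0, 5.0)]
grid = bucket_sort_3d(coords, 1.0)
near_origin = contact_candidates(grid, (0.5, 0.5, 0.5), 1.0)
```

Sorting costs one pass over the nodes, and each proximity query inspects a fixed number of buckets, so the search avoids the all-pairs comparison between contact nodes and segments.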
An efficient flexible-order model for 3D nonlinear water waves

Science.gov (United States)

Engsig-Karup, A. P.; Bingham, H. B.; Lindberg, O.

2009-04-01

The flexible-order, finite difference based fully nonlinear potential flow model described in [H.B. Bingham, H. Zhang, On the accuracy of finite difference solutions for nonlinear water waves, J. Eng. Math. 58 (2007) 211-228] is extended to three dimensions (3D). In order to obtain an optimal scaling of the solution effort multigrid is employed to precondition a GMRES iterative solution of the discretized Laplace problem. A robust multigrid method based on Gauss-Seidel smoothing is found to require special treatment of the boundary conditions along solid boundaries, and in particular on the sea bottom. A new discretization scheme using one layer of grid points outside the fluid domain is presented and shown to provide convergent solutions over the full physical and discrete parameter space of interest. Linear analysis of the fundamental properties of the scheme with respect to accuracy, robustness and energy conservation are presented together with demonstrations of grid independent iteration count and optimal scaling of the solution effort. Calculations are made for 3D nonlinear wave problems for steep nonlinear waves and a shoaling problem which show good agreement with experimental measurements and other calculations from the literature.
An efficient flexible-order model for 3D nonlinear water waves

International Nuclear Information System (INIS)

Engsig-Karup, A.P.; Bingham, H.B.; Lindberg, O.

2009-01-01

The flexible-order, finite difference based fully nonlinear potential flow model described in [H.B. Bingham, H. Zhang, On the accuracy of finite difference solutions for nonlinear water waves, J. Eng. Math. 58 (2007) 211-228] is extended to three dimensions (3D). In order to obtain an optimal scaling of the solution effort multigrid is employed to precondition a GMRES iterative solution of the discretized Laplace problem. A robust multigrid method based on Gauss-Seidel smoothing is found to require special treatment of the boundary conditions along solid boundaries, and in particular on the sea bottom. A new discretization scheme using one layer of grid points outside the fluid domain is presented and shown to provide convergent solutions over the full physical and discrete parameter space of interest. Linear analysis of the fundamental properties of the scheme with respect to accuracy, robustness and energy conservation are presented together with demonstrations of grid independent iteration count and optimal scaling of the solution effort. Calculations are made for 3D nonlinear wave problems for steep nonlinear waves and a shoaling problem which show good agreement with experimental measurements and other calculations from the literature
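The grid-independent iteration count reported above is the characteristic property of multigrid with Gauss-Seidel smoothing. The sketch below shows a plain V-cycle on a much simpler problem: the 1D Poisson equation -u'' = f with homogeneous Dirichlet conditions, used here as a standalone solver rather than as a GMRES preconditioner. It is not the authors' 3D Laplace solver; all function names and the choice of two pre- and post-smoothing sweeps are assumptions for illustration.

```python
import math


def gauss_seidel(u, f, h, sweeps):
    """Lexicographic Gauss-Seidel sweeps for (2u_i - u_{i-1} - u_{i+1})/h^2 = f_i."""
    n = len(u)
    for _ in range(sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            u[i] = 0.5 * (left + right + h * h * f[i])


def residual(u, f, h):
    """Return r = f - A u for the 1D second-difference operator A."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2.0 * u[i] - left - right) / (h * h))
    return r


def v_cycle(u, f, h):
    """One V(2,2) cycle, in place; len(u) must equal 2**k - 1."""
    n = len(u)
    if n == 1:
        u[0] = 0.5 * h * h * f[0]  # exact solve on the coarsest grid
        return u
    gauss_seidel(u, f, h, 2)
    r = residual(u, f, h)
    nc = (n - 1) // 2
    # Full-weighting restriction: coarse node j sits at fine node 2j + 1.
    rc = [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
          for j in range(nc)]
    ec = v_cycle([0.0] * nc, rc, 2.0 * h)
    # Linear interpolation of the coarse-grid correction.
    for j in range(nc):
        u[2 * j + 1] += ec[j]
    for j in range(nc + 1):
        left = ec[j - 1] if j > 0 else 0.0
        right = ec[j] if j < nc else 0.0
        u[2 * j] += 0.5 * (left + right)
    gauss_seidel(u, f, h, 2)
    return u


def solve_poisson(levels=6, cycles=10):
    """Solve -u'' = pi^2 sin(pi x) on (0, 1); return (max residual, max error)."""
    n = 2 ** levels - 1
    h = 1.0 / (n + 1)
    x = [(i + 1) * h for i in range(n)]
    f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
    u = [0.0] * n
    for _ in range(cycles):
        v_cycle(u, f, h)
    res = max(abs(ri) for ri in residual(u, f, h))
    err = max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, x))
    return res, err
```

Calling `solve_poisson` with different values of `levels` gives a quick way to check that a fixed number of cycles reduces the residual by a similar factor on coarse and fine grids. In the setting of the abstract, one such cycle would be applied as the preconditioner inside each GMRES iteration instead of being iterated to convergence on its own.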
Radiative transfer with finite elements. Pt. 1. Basic method and tests

Energy Technology Data Exchange (ETDEWEB)

Richling, S. [Heidelberg Univ. (Germany). Inst. fuer Theoretische Astrophysik; Meinkoehn, E. [Heidelberg Univ. (Germany). Inst. fuer Theoretische Astrophysik]|[Heidelberg Univ. (Germany). Inst. fuer Angewandte Mathematik; Kryzhevoi, N. [Heidelberg Univ. (Germany). Inst. fuer Theoretische Astrophysik]|[Heidelberg Univ. (DE). Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen (IWR); Kanschat, G. [Heidelberg Univ. (Germany). Inst. fuer Angewandte Mathematik]|[Heidelberg Univ. (DE). Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen (IWR)

2001-10-01

A finite element method for solving the monochromatic radiation transfer equation including scattering in three dimensions is presented. The algorithm employs unstructured grids which are adaptively refined. Adaptivity as well as ordinate parallelization reduce memory requirements and execution time and make it possible to calculate the radiation field across several length scales for objects with strong opacity gradients. An a posteriori error estimate for one particular quantity is obtained by solving the dual problem. The application to a sample of test problems reveals the properties of the implementation. (orig.)
Finite-element analysis and modal testing of a rotating wind turbine

Science.gov (United States)

Carne, T. G.; Lobitz, D. W.; Nord, A. R.; Watson, R. A.

1982-10-01

A finite element procedure, which includes geometric stiffening, and centrifugal and Coriolis terms resulting from the use of a rotating coordinate system, was developed to compute the mode shapes and frequencies of rotating structures. Special application of this capability was made to Darrieus vertical axis wind turbines. In a parallel development effort, a technique for the modal testing of a rotating vertical axis wind turbine is established to measure modal parameters directly. Results from the predictive and experimental techniques for the modal frequencies and mode shapes are compared over a wide range of rotational speeds.

A finite Hankel algorithm for intense optical beam propagation in saturable medium

International Nuclear Information System (INIS)

Bardin, C.; Babuel-Peyrissac, J.P.; Marinier, J.P.; Mattar, F.P.

1985-01-01

Many physical problems, especially light propagation, that involve the Laplacian operator are naturally connected with Fourier or Hankel transforms (in the case of axial symmetry), both of which remove the Laplacian term in the transformed space. Sometimes the analytical calculation can be carried through to the end, giving a series or an integral representation of the solution. Otherwise, an analytical pre-treatment of the original equation may be done, leading to numerical computation techniques as opposed to self-adaptive stretching and rezoning techniques, which do not use Fourier or Hankel transforms. The authors present here some basic mathematical properties of the infinite and finite Hankel transforms, their connection with physics and their adaptation to numerical calculation. The finite Hankel transform is well suited to numerical computation, because it deals with a finite interval, and the precision of the calculation can be easily controlled by the number of zeros of J_0(x) that are taken. Moreover, they use a special quadrature formula which is well connected to integral conservation laws. The inconvenience of having to sum a series is reduced by the use of vectorized computers, and in the future will be reduced still further with parallel processors. A finite-Hankel code has been run on a CRAY-XMP in order to solve the propagation of a CW optical beam in a saturable absorber. For large diffractions or when a very small radial grid is required for the description of the optical field, this FHT algorithm has been found to perform better than a direct finite-difference code.
Two-Level Hierarchical FEM Method for Modeling Passive Microwave Devices

Science.gov (United States)

Polstyanko, Sergey V.; Lee, Jin-Fa

1998-03-01

In recent years multigrid methods have been proven to be very efficient for solving large systems of linear equations resulting from the discretization of positive definite differential equations by either the finite difference method or the h-version of the finite element method. In this paper an iterative method of the multiple level type is proposed for solving systems of algebraic equations which arise from the p-version of the finite element analysis applied to indefinite problems. A two-level V-cycle algorithm has been implemented and studied with a Gauss-Seidel iterative scheme used as a smoother. The convergence of the method has been investigated, and numerical results for a number of numerical examples are presented.
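The structure of a two-level cycle with a Gauss-Seidel smoother can be sketched on a much simpler problem than the one treated in the paper. The example below is only an illustration under stated assumptions: it uses a 1D finite-difference Poisson problem (a positive definite h-version setting, not the p-version indefinite case of the abstract), and all function names are hypothetical.

```python
import math


def gauss_seidel(u, f, h, sweeps):
    """Lexicographic Gauss-Seidel sweeps for -u'' = f, u = 0 at both ends."""
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])


def residual(u, f, h):
    """Residual r = f - A u of the 3-point finite-difference Laplacian."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r


def coarse_solve(rc, hc):
    """Exact coarse-grid solve (Thomas algorithm) of A_c e = r_c."""
    m = len(rc) - 2                      # number of interior coarse unknowns
    d = [hc * hc * rc[j + 1] for j in range(m)]
    cp, dp = [0.0] * m, [0.0] * m
    cp[0], dp[0] = -0.5, d[0] / 2.0
    for j in range(1, m):                # forward elimination
        denom = 2.0 + cp[j - 1]
        cp[j] = -1.0 / denom
        dp[j] = (d[j] + dp[j - 1]) / denom
    e = [0.0] * len(rc)                  # boundary entries stay zero
    e[m] = dp[m - 1]
    for j in range(m - 2, -1, -1):       # back substitution
        e[j + 1] = dp[j] - cp[j] * e[j + 2]
    return e


def two_grid_cycle(u, f, h, pre=2, post=2):
    """One two-level cycle: pre-smooth, coarse-grid correction, post-smooth."""
    n = len(u) - 1                       # number of fine intervals (must be even)
    gauss_seidel(u, f, h, pre)
    r = residual(u, f, h)
    nc = n // 2
    rc = [0.0] * (nc + 1)
    for j in range(1, nc):               # full-weighting restriction
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    ec = coarse_solve(rc, 2.0 * h)
    for j in range(nc):                  # linear interpolation of the correction
        u[2 * j] += ec[j]
        u[2 * j + 1] += 0.5 * (ec[j] + ec[j + 1])
    gauss_seidel(u, f, h, post)
    return u


# Model problem: -u'' = pi^2 sin(pi x) on (0, 1), exact solution sin(pi x).
n = 64
h = 1.0 / n
x = [i * h for i in range(n + 1)]
f = [math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
u = [0.0] * (n + 1)
r0 = max(abs(v) for v in residual(u, f, h))
for _ in range(10):
    two_grid_cycle(u, f, h)
r10 = max(abs(v) for v in residual(u, f, h))
err = max(abs(ui - math.sin(math.pi * xi)) for ui, xi in zip(u, x))
print(f"residual reduction: {r10 / r0:.2e}, max error: {err:.2e}")
```

The smoother damps the oscillatory error components, and the coarse-grid solve removes the smooth remainder; a full V-cycle would replace the exact coarse solve with a recursive call.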
Periodic Boundary Conditions in the ALEGRA Finite Element Code

International Nuclear Information System (INIS)

Aidun, John B.; Robinson, Allen C.; Weatherby, Joe R.

1999-01-01

This document describes the implementation of periodic boundary conditions in the ALEGRA finite element code. ALEGRA is an arbitrary Lagrangian-Eulerian multi-physics code with both explicit and implicit numerical algorithms. The periodic boundary implementation requires a consistent set of boundary input sets which are used to describe virtual periodic regions. The implementation is noninvasive to the majority of the ALEGRA coding and is based on the distributed memory parallel framework in ALEGRA. The technique involves extending the ghost element concept for interprocessor boundary communications in ALEGRA to additionally support on- and off-processor periodic boundary communications. The user interface, algorithmic details and sample computations are given.
Finite-size resonance dielectric cylinder in a rectangular waveguide

International Nuclear Information System (INIS)

Chuprina, V.N.; Khizhnyak, N.A.

1988-01-01

The problem of resonance scattering of an electromagnetic wave by a dielectric circular cylinder of finite size in a rectangular waveguide is solved by a numerical-analytical method. The cylinder axes are parallel. The cylinder can be used as a resonance tuning element in accelerating SHF-sections. Problems of truncating the systems of linear algebraic equations, to which the relations of macroscopic electrodynamics in integro-differential form, written for the specific problem considered here, are reduced by analytical transformations, are investigated at the stage of numerical analysis. Theoretical dependences of the insertion of the voltage standing wave coefficient on the generator wavelength, calculated for different values of the problem parameters, are constructed.
An efficient numerical scheme for the simulation of parallel-plate active magnetic regenerators

DEFF Research Database (Denmark)

Torregrosa-Jaime, Bárbara; Corberán, José M.; Payá, Jorge

2015-01-01

A one-dimensional model of a parallel-plate active magnetic regenerator (AMR) is presented in this work. The model is based on an efficient numerical scheme which has been developed after analysing the heat transfer mechanisms in the regenerator bed. The new finite difference scheme optimally com...... to the fully implicit scheme, the proposed scheme achieves more accurate results, prevents numerical errors and requires less computational effort. In AMR simulations the new scheme can reduce the computational time by 88%....
Computation of the Spitzer function in stellarators and tokamaks with finite collisionality

Directory of Open Access Journals (Sweden)

Kernbichler Winfried

2015-01-01

The generalized Spitzer function, which determines the current drive efficiency in tokamaks and stellarators, is modelled for finite plasma collisionality with the help of the drift kinetic equation solver NEO-2 [1]. The effect of finite collisionality on the global ECCD efficiency in a tokamak is studied using results of the code NEO-2 as input to the ray tracing code TRAVIS [2]. As is known [3], specific features of the generalized Spitzer function, which are absent in asymptotic (collisionless or highly collisional) regimes, result in current drive from a symmetric microwave spectrum with respect to parallel wave numbers. Due to this effect the direction of the current may become independent of the microwave beam launch angle in advanced ECCD scenarios (O2 and X3) where, due to relatively low optical depth, a significant amount of power is absorbed by trapped particles.
A Study on GPU Computing of Bi-conjugate Gradient Method for Finite Element Analysis of the Incompressible Navier-Stokes Equations

International Nuclear Information System (INIS)

Yoon, Jong Seon; Choi, Hyoung Gwon; Jeon, Byoung Jin; Jung, Hye Dong

2016-01-01

A parallel algorithm for the bi-conjugate gradient method was developed based on CUDA for parallel computation of the incompressible Navier-Stokes equations. The governing equations were discretized using the splitting P2P1 finite element method. An asymmetric stenotic flow problem was solved to validate the proposed algorithm, and then the parallel performance of the GPU was examined by measuring the elapsed times. Further, the GPU performance for sparse matrix-vector multiplication was also investigated with a matrix from a fluid-structure interaction problem. A kernel was generated to simultaneously compute the inner product of each row of the sparse matrix and a vector. In addition, the kernel was optimized to improve the performance by using both parallel reduction and memory coalescing. In the kernel construction, the effect of warp on the parallel performance of the present CUDA was also examined. The present GPU computation was more than 7 times faster than a single CPU in double precision.
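The row-wise inner products mentioned in the abstract can be illustrated with a sequential sketch. This is not the authors' CUDA kernel: it assumes the common compressed sparse row (CSR) storage format, which the abstract does not name, and the function and variable names are hypothetical. Each iteration of the outer loop is independent, which is what lets a GPU assign one row to one thread (or one warp, with a parallel reduction over the row's entries).

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A x for a matrix A stored in CSR format.

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i inside
    col_idx (column indices) and values (matrix entries).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):            # rows are independent of each other
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]   # inner product of row with x
        y[row] = acc
    return y


# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
print(y)  # [7.0, 6.0, 17.0]
```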
A Study on GPU Computing of Bi-conjugate Gradient Method for Finite Element Analysis of the Incompressible Navier-Stokes Equations

Energy Technology Data Exchange (ETDEWEB)

Yoon, Jong Seon; Choi, Hyoung Gwon [Seoul Nat’l Univ. of Science and Technology, Seoul (Korea, Republic of); Jeon, Byoung Jin [Yonsei Univ., Seoul (Korea, Republic of); Jung, Hye Dong [Korea Electronics Technology Institute, Seongnam (Korea, Republic of)

2016-09-15

A parallel algorithm for the bi-conjugate gradient method was developed based on CUDA for parallel computation of the incompressible Navier-Stokes equations. The governing equations were discretized using the splitting P2P1 finite element method. An asymmetric stenotic flow problem was solved to validate the proposed algorithm, and then the parallel performance of the GPU was examined by measuring the elapsed times. Further, the GPU performance for sparse matrix-vector multiplication was also investigated with a matrix from a fluid-structure interaction problem. A kernel was generated to simultaneously compute the inner product of each row of the sparse matrix and a vector. In addition, the kernel was optimized to improve the performance by using both parallel reduction and memory coalescing. In the kernel construction, the effect of warp on the parallel performance of the present CUDA was also examined. The present GPU computation was more than 7 times faster than a single CPU in double precision.
A non overlapping parallel domain decomposition method applied to the simplified transport equations

International Nuclear Information System (INIS)

Lathuiliere, B.; Barrault, M.; Ramet, P.; Roman, J.

2009-01-01

A reactivity computation requires computing the highest eigenvalue of a generalized eigenvalue problem. An inverse power algorithm is commonly used. Very fine models are difficult to tackle with our sequential solver, based on the simplified transport equations, in terms of memory consumption and computational time. We therefore propose a non-overlapping domain decomposition method for the approximate solution of the linear system to be solved at each inverse power iteration. Our method requires little development effort, as the inner multigroup solver can be reused without modification, and allows us to adapt the numerical resolution locally (mesh, finite element order). Numerical results are obtained by a parallel implementation of the method on two different cases with a pin by pin discretization. These results are analyzed in terms of memory consumption and parallel efficiency. (authors)
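The outer iteration described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' solver: the problem is written as B x = k A x with small dense matrices, the linear system at each iteration is solved by Gaussian elimination rather than by domain decomposition, and the names `solve_dense`, `matvec` and `inverse_power` are hypothetical.

```python
def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def matvec(m, v):
    """Dense matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def inverse_power(a, b, iterations=60):
    """Largest k with B x = k A x, found by repeatedly solving A y = B x."""
    x = [1.0] * len(a)
    k = 1.0
    for _ in range(iterations):
        source = matvec(b, x)             # right-hand side B x
        y = solve_dense(a, source)        # the linear solve of each iteration
        new_source = sum(matvec(b, y))
        k = new_source / sum(source)      # eigenvalue estimate
        x = [yi / new_source for yi in y]  # normalise so that sum(B x) = 1
    return k, x


A = [[2.0, -1.0], [-1.0, 2.0]]
B = [[1.0, 0.0], [0.0, 2.0]]
k, x = inverse_power(A, B)
print(f"dominant eigenvalue k = {k:.7f}")
```

For this 2x2 example the eigenvalues of the matrix A^{-1}B are 1 ± 1/sqrt(3), so the iteration converges to the larger one. In the setting of the abstract, the call to `solve_dense` is the step that the domain decomposition method would replace.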
Application Of Multi-grid Method On China Seas' Temperature Forecast

Science.gov (United States)

Li, W.; Xie, Y.; He, Z.; Liu, K.; Han, G.; Ma, J.; Li, D.

2006-12-01

Correlation scales have been used for decades in the traditional scheme of 3-dimensional variational (3D-Var) data assimilation to estimate the background error covariance for the numerical forecast and reanalysis of the atmosphere and ocean. However, this scheme still has some drawbacks. First, the correlation scales are difficult to determine accurately. Second, the positive definiteness of the first-guess error covariance matrix cannot be guaranteed unless the correlation scales are sufficiently small. Xie et al. (2005) indicated that a traditional 3D-Var corrects errors only at certain wavelengths and that its accuracy depends on the accuracy of the first-guess covariance. In general, short-wavelength errors cannot be well corrected until long-wavelength errors are corrected, so an inaccurate first-guess covariance may mistake long-wave errors for short-wave ones and result in an erroneous analysis. For the purpose of quickly minimizing the errors of long and short waves successively, a new 3D-Var data assimilation scheme, called the multi-grid data assimilation scheme, is proposed in this paper. By assimilating shipboard SST and temperature profile data into a numerical model of the China Seas, we applied this scheme in a two-month data assimilation and forecast experiment, which ended with a favorable result. Compared with the traditional 3D-Var scheme, the new scheme has higher forecast accuracy and a lower forecast root-mean-square (RMS) error. Furthermore, this scheme was applied to assimilate shipboard SST, AVHRR Pathfinder Version 5.0 SST and temperature profiles at the same time, and a ten-month forecast experiment on the sea temperature of the China Seas was carried out, in which a successful forecast result was obtained. In particular, the new scheme demonstrated great numerical efficiency in these analyses.
Nambu-Jona-Lasinio model in a parallel electromagnetic field

Science.gov (United States)

Wang, Lingxiao; Cao, Gaoqing; Huang, Xu-Guang; Zhuang, Pengfei

2018-05-01

We explore the features of the U_A(1) and chiral symmetry breaking of the Nambu-Jona-Lasinio model without the Kobayashi-Maskawa-'t Hooft determinant term in the presence of a parallel electromagnetic field. We show that the electromagnetic chiral anomaly can induce both finite neutral pion condensate and isospin-singlet pseudo-scalar η condensate and thus modifies the chiral symmetry breaking pattern. In order to characterize the strength of the U_A(1) symmetry breaking, we evaluate the susceptibility associated with the U_A(1) charge. The result shows that the susceptibility contributed from the chiral anomaly is consistent with the behavior of the corresponding η condensate. The spectra of the mesonic excitations are also studied.
A Computational Fluid Dynamics Algorithm on a Massively Parallel Computer

Science.gov (United States)

Jespersen, Dennis C.; Levit, Creon

1989-01-01

The discipline of computational fluid dynamics is demanding ever-increasing computational power to deal with complex fluid flow problems. We investigate the performance of a finite-difference computational fluid dynamics algorithm on a massively parallel computer, the Connection Machine. Of special interest is an implicit time-stepping algorithm; to obtain maximum performance from the Connection Machine, it is necessary to use a nonstandard algorithm to solve the linear systems that arise in the implicit algorithm. We find that the Connection Machine can achieve very high computation rates on both explicit and implicit algorithms. The performance of the Connection Machine puts it in the same class as today's most powerful conventional supercomputers.
Parallel application of plasma equilibrium fitting based on inhomogeneous platforms

International Nuclear Information System (INIS)

Liao Min; Zhang Jinhua; Chen Liaoyuan; Li Yongge; Pan Wei; Pan Li

2008-01-01

An online analysis and online display platform EFIT, which is based on the equilibrium-fitting mode, is introduced in this paper. This application can realize large data transportation between inhomogeneous platforms by designing a communication mechanism using sockets. It takes approximately one minute to complete the equilibrium-fitting reconstruction, using a finite state machine to describe the management node and several node computers of a cluster system to carry out the parallel computation; this satisfies the requirement of online display during the discharge interval. An effective communication model between inhomogeneous platforms is provided, which can transport the computing results from the Linux platform to the Windows platform for online analysis and display. (authors)
Selective maintenance for multi-state series–parallel systems under economic dependence

International Nuclear Information System (INIS)

Dao, Cuong D.; Zuo, Ming J.; Pandey, Mayank

2014-01-01

This paper presents a study on selective maintenance for multi-state series–parallel systems with economically dependent components. In the selective maintenance problem, the maintenance manager has to decide which components should receive maintenance activities within a finite break between missions. The system reliability in the next operating mission, the available budget and the maintenance time for each component from its current state to a higher state are all taken into account in the optimization models. In addition, the components in series–parallel systems are considered to be economically dependent. Time and cost savings will be achieved when several components are simultaneously repaired in a selective maintenance strategy. As the number of repaired components increases, the saved time and cost will also increase due to the sharing of set-up between components and an additional reduction resulting from the repair of multiple identical components. Different optimization models are derived to find the best maintenance strategy for multi-state series–parallel systems. A genetic algorithm is used to solve the optimization models. The decision makers may select different components to be repaired to different working states based on the maintenance objective, resource availabilities and how dependent the repair time and cost of each component are.
Compressibility effects on ideal and kinetic ballooning modes and elimination of finite Larmor radius stabilization

International Nuclear Information System (INIS)

Kotschenreuther, M.

1985-07-01

The dynamics of ideal and kinetic ballooning modes are considered analytically including parallel ion dynamics, but without electron dissipation. For ideal modes, parallel dynamics predominantly determine the growth rate when β is within approximately 30% of the ideal threshold, resulting in a substantial reduction in growth rate. Compressibility also eliminates the stabilization effects of finite Larmor radius (FLR); FLR effects (when temperature gradients are neglected) can even increase the growth rate above the MHD value. Temperature gradients accentuate this by adding a new source of free energy independent of the MHD drive, in this region of ballooning coordinate corresponding in MHD to the continuum. Analytic dispersion relations are derived demonstrating the effects above; the formalism emphasizes the similarities between the ideal MHD and kinetic cases.
Parallel time domain solvers for electrically large transient scattering problems

KAUST Repository

Liu, Yang

2014-09-26

Marching on in time (MOT)-based integral equation solvers represent an increasingly appealing avenue for analyzing transient electromagnetic interactions with large and complex structures. MOT integral equation solvers for analyzing electromagnetic scattering from perfect electrically conducting objects are obtained by enforcing electric field boundary conditions and implicitly time-advancing electric surface current densities by iteratively solving sparse systems of equations at all time steps. In contrast to finite difference and finite element competitors, these solvers apply to nonlinear and multi-scale structures comprising geometrically intricate and deep sub-wavelength features residing atop electrically large platforms. Moreover, they are high-order accurate, stable in the low- and high-frequency limits, and applicable to conducting and penetrable structures represented by highly irregular meshes. This presentation reviews some recent advances in the parallel implementations of time domain integral equation solvers, specifically those that leverage the multilevel plane-wave time-domain (PWTD) algorithm on modern manycore computer architectures including graphics processing units (GPUs) and distributed memory supercomputers. The GPU-based implementation achieves speedups of at least one order of magnitude compared to serial implementations, while the distributed parallel implementation is highly scalable to thousands of compute nodes. A distributed parallel PWTD kernel has been adopted to solve time domain surface/volume integral equations (TDSIE/TDVIE) for analyzing transient scattering from large and complex-shaped perfectly electrically conducting (PEC)/dielectric objects involving ten million/tens of millions of spatial unknowns.
Moving magnets in a micromagnetic finite-difference framework

Science.gov (United States)

Rissanen, Ilari; Laurson, Lasse

2018-05-01

We present a method and an implementation for smooth linear motion in a finite-difference-based micromagnetic simulation code, to be used in simulating magnetic friction and other phenomena involving moving microscale magnets. Our aim is to accurately simulate the magnetization dynamics and relative motion of magnets while retaining high computational speed. To this end, we combine techniques for fast scalar potential calculation and cubic b-spline interpolation, parallelizing them on a graphics processing unit (GPU). The implementation also includes the possibility of explicitly simulating eddy currents in the case of conducting magnets. We test our implementation by providing numerical examples of stick-slip motion of thin films pulled by a spring and the effect of eddy currents on the switching time of magnetic nanocubes.
Parallel computation of electrostatic potentials and fields in technical geometries on SUPRENUM

International Nuclear Information System (INIS)

Alef, M.

1990-02-01

The programs EPOTZR and EFLDZR have been developed in order to compute electrostatic potentials and the corresponding fields in technical geometries (example: diode geometry for optimum focussing of ion beams in pulsed high-current ion diodes). The Poisson equation is discretized in a two-dimensional boundary-fitted grid in the (r,z)-plane and solved using multigrid methods. The z- and r-components of the field are determined by numerical differentiation of the potential. This report contains the user's guide of the SUPRENUM versions EPOTZR-P and EFLDZR-P. (orig./HP)
Finite element and finite difference methods in electromagnetic scattering

CERN Document Server

Morgan, MA

2013-01-01

This second volume in the Progress in Electromagnetic Research series examines recent advances in computational electromagnetics, with emphasis on scattering, as brought about by new formulations and algorithms which use finite element or finite difference techniques. Containing contributions by some of the world's leading experts, the papers thoroughly review and analyze this rapidly evolving area of computational electromagnetics. Covering topics ranging from the new finite-element based formulation for representing time-harmonic vector fields in 3-D inhomogeneous media using two coupled sca
Parallel Numerical Simulations of Water Reservoirs

Science.gov (United States)

Torres, Pedro; Mangiavacchi, Norberto

2010-11-01

The study of the water flow and scalar transport in water reservoirs is important for the determination of the water quality during the initial stages of the reservoir filling and during the life of the reservoir. For this purpose, a parallel 2D finite element code for solving the incompressible Navier-Stokes equations coupled with scalar transport was implemented using the message-passing programming model, in order to perform simulations of hydropower water reservoirs in a computer cluster environment. The spatial discretization is based on the MINI element that satisfies the Babuska-Brezzi (BB) condition, which provides sufficient conditions for a stable mixed formulation. All the distributed data structures needed in the different stages of the code, such as preprocessing, solving and post processing, were implemented using the PETSc library. The resulting linear systems for the velocity and the pressure fields were solved using the projection method, implemented by an approximate block LU factorization. In order to increase the parallel performance in the solution of the linear systems, we employ the static condensation method for solving the intermediate velocity at vertex and centroid nodes separately. We compare performance results of the static condensation method with the approach of solving the complete system. In our tests the static condensation method shows better performance for large problems, at the cost of an increased memory usage. Performance results for other intensive parts of the code in a computer cluster are also presented.
