Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María
2014-06-01
We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for a coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with biconjugate gradient stabilized method. The results have shown that, the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, when it comes to cases in which other preconditioners succeed to converge to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code-up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the
A survey of parallel multigrid algorithms
Chan, Tony F.; Tuminaro, Ray S.
1987-01-01
A typical multigrid algorithm applied to well-behaved linear-elliptic partial-differential equations (PDEs) is described. Criteria for designing and evaluating parallel algorithms are presented. Before evaluating the performance of some parallel multigrid algorithms, consideration is given to some theoretical complexity results for solving PDEs in parallel and for executing the multigrid algorithm. The effect of mapping and load imbalance on the partial efficiency of the algorithm is studied.
Analysis of a parallel multigrid algorithm
Chan, Tony F.; Tuminaro, Ray S.
1989-01-01
The parallel multigrid algorithm of Frederickson and McBryan (1987) is considered. This algorithm uses multiple coarse-grid problems (instead of one problem) in the hope of accelerating convergence and is found to have a close relationship to traditional multigrid methods. Specifically, the parallel coarse-grid correction operator is identical to a traditional multigrid coarse-grid correction operator, except that the mixing of high and low frequencies caused by aliasing error is removed. Appropriate relaxation operators can be chosen to take advantage of this property. Comparisons between the standard multigrid and the new method are made.
Parallel Element Agglomeration Algebraic Multigrid and Upscaling Library
Energy Technology Data Exchange (ETDEWEB)
2017-10-24
ParELAG is a parallel C++ library for numerical upscaling of finite element discretizations and element-based algebraic multigrid solvers. It provides optimal complexity algorithms to build multilevel hierarchies and solvers that can be used for solving a wide class of partial differential equations (elliptic, hyperbolic, saddle point problems) on general unstructured meshes. Additionally, a novel multilevel solver for saddle point problems with divergence constraint is implemented.
High Performance Parallel Multigrid Algorithms for Unstructured Grids
Frederickson, Paul O.
1996-01-01
We describe a high performance parallel multigrid algorithm for a rather general class of unstructured grid problems in two and three dimensions. The algorithm PUMG, for parallel unstructured multigrid, is related in structure to the parallel multigrid algorithm PSMG introduced by McBryan and Frederickson, for they both obtain a higher convergence rate through the use of multiple coarse grids. Another reason for the high convergence rate of PUMG is its smoother, an approximate inverse developed by Baumgardner and Frederickson.
Energy Technology Data Exchange (ETDEWEB)
Milind Deo; Chung-Kan Huang; Huabing Wang
2008-08-31
volume of injection at lower rates. However, if oil production can be continued at high water cuts, the discounted cumulative production usually favors higher production rates. The workflow developed during the project was also used to perform multiphase simulations in heterogeneous, fracture-matrix systems. Compositional and thermal-compositional simulators were developed for fractured reservoirs using the generalized framework. The thermal-compositional simulator was based on a novel 'equation-alignment' approach that helped choose the correct variables to solve depending on the number of phases present and the prescribed component partitioning. The simulators were used in steamflooding and in insitu combustion applications. The framework was constructed to be inherently parallel. The partitioning routines employed in the framework allowed generalized partitioning on highly complex fractured reservoirs and in instances when wells (incorporated in these models as line sources) were divided between two or more processors.
Semi-coarsening multigrid methods for parallel computing
Energy Technology Data Exchange (ETDEWEB)
Jones, J.E.
1996-12-31
Standard multigrid methods are not well suited for problems with anisotropic coefficients which can occur, for example, on grids that are stretched to resolve a boundary layer. There are several different modifications of the standard multigrid algorithm that yield efficient methods for anisotropic problems. In the paper, we investigate the parallel performance of these multigrid algorithms. Multigrid algorithms which work well for anisotropic problems are based on line relaxation and/or semi-coarsening. In semi-coarsening multigrid algorithms a grid is coarsened in only one of the coordinate directions unlike standard or full-coarsening multigrid algorithms where a grid is coarsened in each of the coordinate directions. When both semi-coarsening and line relaxation are used, the resulting multigrid algorithm is robust and automatic in that it requires no knowledge of the nature of the anisotropy. This is the basic multigrid algorithm whose parallel performance we investigate in the paper. The algorithm is currently being implemented on an IBM SP2 and its performance is being analyzed. In addition to looking at the parallel performance of the basic semi-coarsening algorithm, we present algorithmic modifications with potentially better parallel efficiency. One modification reduces the amount of computational work done in relaxation at the expense of using multiple coarse grids. This modification is also being implemented with the aim of comparing its performance to that of the basic semi-coarsening algorithm.
A multigrid solution method for mixed hybrid finite elements
Energy Technology Data Exchange (ETDEWEB)
Schmid, W. [Universitaet Augsburg (Germany)
1996-12-31
We consider the multigrid solution of linear equations arising within the discretization of elliptic second order boundary value problems of the form by mixed hybrid finite elements. Using the equivalence of mixed hybrid finite elements and non-conforming nodal finite elements, we construct a multigrid scheme for the corresponding non-conforming finite elements, and, by this equivalence, for the mixed hybrid finite elements, following guidelines from Arbogast/Chen. For a rectangular triangulation of the computational domain, this non-conforming schemes are the so-called nodal finite elements. We explicitly construct prolongation and restriction operators for this type of non-conforming finite elements. We discuss the use of plain multigrid and the multilevel-preconditioned cg-method and compare their efficiency in numerical tests.
Mapping robust parallel multigrid algorithms to scalable memory architectures
Overman, Andrea; Vanrosendale, John
1993-01-01
The convergence rate of standard multigrid algorithms degenerates on problems with stretched grids or anisotropic operators. The usual cure for this is the use of line or plane relaxation. However, multigrid algorithms based on line and plane relaxation have limited and awkward parallelism and are quite difficult to map effectively to highly parallel architectures. Newer multigrid algorithms that overcome anisotropy through the use of multiple coarse grids rather than relaxation are better suited to massively parallel architectures because they require only simple point-relaxation smoothers. In this paper, we look at the parallel implementation of a V-cycle multiple semicoarsened grid (MSG) algorithm on distributed-memory architectures such as the Intel iPSC/860 and Paragon computers. The MSG algorithms provide two levels of parallelism: parallelism within the relaxation or interpolation on each grid and across the grids on each multigrid level. Both levels of parallelism must be exploited to map these algorithms effectively to parallel architectures. This paper describes a mapping of an MSG algorithm to distributed-memory architectures that demonstrates how both levels of parallelism can be exploited. The result is a robust and effective multigrid algorithm for distributed-memory machines.
Parallel multigrid smoothing: polynomial versus Gauss-Seidel
International Nuclear Information System (INIS)
Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray
2003-01-01
Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines
Parallel multigrid smoothing: polynomial versus Gauss-Seidel
Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray
2003-07-01
Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
Analysis of multigrid methods on massively parallel computers: Architectural implications
Matheson, Lesley R.; Tarjan, Robert E.
1993-01-01
We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10(exp 6) and 10(exp 9), respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages, (up to 1000 words) or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.
The Mixed Finite Element Multigrid Method for Stokes Equations
Muzhinji, K.; Shateyi, S.; Motsa, S. S.
2015-01-01
The stable finite element discretization of the Stokes problem produces a symmetric indefinite system of linear algebraic equations. A variety of iterative solvers have been proposed for such systems in an attempt to construct efficient, fast, and robust solution techniques. This paper investigates one of such iterative solvers, the geometric multigrid solver, to find the approximate solution of the indefinite systems. The main ingredient of the multigrid method is the choice of an appropriate smoothing strategy. This study considers the application of different smoothers and compares their effects in the overall performance of the multigrid solver. We study the multigrid method with the following smoothers: distributed Gauss Seidel, inexact Uzawa, preconditioned MINRES, and Braess-Sarazin type smoothers. A comparative study of the smoothers shows that the Braess-Sarazin smoothers enhance good performance of the multigrid method. We study the problem in a two-dimensional domain using stable Hood-Taylor Q 2-Q 1 pair of finite rectangular elements. We also give the main theoretical convergence results. We present the numerical results to demonstrate the efficiency and robustness of the multigrid method and confirm the theoretical results. PMID:25945361
A parallel version of a multigrid algorithm for isotropic transport equations
International Nuclear Information System (INIS)
Manteuffel, T.; McCormick, S.; Yang, G.; Morel, J.; Oliveira, S.
1994-01-01
The focus of this paper is on a parallel algorithm for solving the transport equations in a slab geometry using multigrid. The spatial discretization scheme used is a finite element method called the modified linear discontinuous (MLD) scheme. The MLD scheme represents a lumped version of the standard linear discontinuous (LD) scheme. The parallel algorithm was implemented on the Connection Machine 2 (CM2). Convergence rates and timings for this algorithm on the CM2 and Cray-YMP are shown
Trottenberg, Ulrich; Schuller, Anton
2000-01-01
Multigrid presents both an elementary introduction to multigrid methods for solving partial differential equations and a contemporary survey of advanced multigrid techniques and real-life applications.Multigrid methods are invaluable to researchers in scientific disciplines including physics, chemistry, meteorology, fluid and continuum mechanics, geology, biology, and all engineering disciplines. They are also becoming increasingly important in economics and financial mathematics.Readers are presented with an invaluable summary covering 25 years of practical experience acquired by the multigrid research group at the Germany National Research Center for Information Technology. The book presents both practical and theoretical points of view.* Covers the whole field of multigrid methods from its elements up to the most advanced applications* Style is essentially elementary but mathematically rigorous* No other book is so comprehensive and written for both practitioners and students
Efficient relaxed-Jacobi smoothers for multigrid on parallel computers
Yang, Xiang; Mittal, Rajat
2017-03-01
In this Technical Note, we present a family of Jacobi-based multigrid smoothers suitable for the solution of discretized elliptic equations. These smoothers are based on the idea of scheduled-relaxation Jacobi proposed recently by Yang & Mittal (2014) [18] and employ two or three successive relaxed Jacobi iterations with relaxation factors derived so as to maximize the smoothing property of these iterations. The performance of these new smoothers measured in terms of convergence acceleration and computational workload, is assessed for multi-domain implementations typical of parallelized solvers, and compared to the lexicographic point Gauss-Seidel smoother. The tests include the geometric multigrid method on structured grids as well as the algebraic grid method on unstructured grids. The tests demonstrate that unlike Gauss-Seidel, the convergence of these Jacobi-based smoothers is unaffected by domain decomposition, and furthermore, they outperform the lexicographic Gauss-Seidel by factors that increase with domain partition count.
A Parallel Algebraic Multigrid Solver on Graphics Processing Units
Haase, Gundolf
2010-01-01
The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a singe Nvidia Tesla C1060 GPU board delivers the performance of a sixteen node Infiniband cluster and a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core. © 2010 Springer-Verlag.
Large-Scale Parallel Viscous Flow Computations using an Unstructured Multigrid Algorithm
Mavriplis, Dimitri J.
1999-01-01
The development and testing of a parallel unstructured agglomeration multigrid algorithm for steady-state aerodynamic flows is discussed. The agglomeration multigrid strategy uses a graph algorithm to construct the coarse multigrid levels from the given fine grid, similar to an algebraic multigrid approach, but operates directly on the non-linear system using the FAS (Full Approximation Scheme) approach. The scalability and convergence rate of the multigrid algorithm are examined on the SGI Origin 2000 and the Cray T3E. An argument is given which indicates that the asymptotic scalability of the multigrid algorithm should be similar to that of its underlying single grid smoothing scheme. For medium size problems involving several million grid points, near perfect scalability is obtained for the single grid algorithm, while only a slight drop-off in parallel efficiency is observed for the multigrid V- and W-cycles, using up to 128 processors on the SGI Origin 2000, and up to 512 processors on the Cray T3E. For a large problem using 25 million grid points, good scalability is observed for the multigrid algorithm using up to 1450 processors on a Cray T3E, even when the coarsest grid level contains fewer points than the total number of processors.
Asynchronous Task-Based Parallelization of Algebraic Multigrid
AlOnazi, Amani A.
2017-06-23
As processor clock rates become more dynamic and workloads become more adaptive, the vulnerability to global synchronization that already complicates programming for performance in today\\'s petascale environment will be exacerbated. Algebraic multigrid (AMG), the solver of choice in many large-scale PDE-based simulations, scales well in the weak sense, with fixed problem size per node, on tightly coupled systems when loads are well balanced and core performance is reliable. However, its strong scaling to many cores within a node is challenging. Reducing synchronization and increasing concurrency are vital adaptations of AMG to hybrid architectures. Recent communication-reducing improvements to classical additive AMG by Vassilevski and Yang improve concurrency and increase communication-computation overlap, while retaining convergence properties close to those of standard multiplicative AMG, but remain bulk synchronous.We extend the Vassilevski and Yang additive AMG to asynchronous task-based parallelism using a hybrid MPI+OmpSs (from the Barcelona Supercomputer Center) within a node, along with MPI for internode communications. We implement a tiling approach to decompose the grid hierarchy into parallel units within task containers. We compare against the MPI-only BoomerAMG and the Auxiliary-space Maxwell Solver (AMS) in the hypre library for the 3D Laplacian operator and the electromagnetic diffusion, respectively. In time to solution for a full solve an MPI-OmpSs hybrid improves over an all-MPI approach in strong scaling at full core count (32 threads per single Haswell node of the Cray XC40) and maintains this per node advantage as both weak scale to thousands of cores, with MPI between nodes.
International Nuclear Information System (INIS)
Solchenbach, K.; Thole, C.A.; Trottenberg, U.
1987-01-01
For a wide class of problems in scientific computing, in particular for partial differential equations, the multigrid principle has proved to yield highly efficient numerical methods. However, the principle has to be applied carefully: if the multigrid components are not chosen adequately with respect to the given problem, the efficiency may be much smaller than possible. This has been demonstrated for many practical problems. Unfortunately, the general theories on multigrid convergence do not give much help in constructing really efficient multigrid algorithms. Although some progress has been made in bridging the gap between theory and practice during the last few years, there are still several theoretical approaches which are misleading rather than helpful with respect to the objective of real efficiency. The research in finding highly efficient algorithms for non-model applications therefore is still a sophisticated mixture of theoretical considerations, a transfer of experiences from model to real life problems and systematical experimental work. The emphasis of the practical research activity today lies - among others - in the following fields: - finding efficient multigrid components for really complex problems, - combining the multigrid approach with advanced discretizative techniques: - constructing highly parallel multigrid algorithms. In this paper, we want to deal mainly with the last topic
Lv, X.; Zhao, Y.; Huang, X. Y.; Xia, G. H.; Su, X. H.
2007-07-01
A new three-dimensional (3D) matrix-free implicit unstructured multigrid finite volume (FV) solver for structural dynamics is presented in this paper. The solver is first validated using classical 2D and 3D cantilever problems. It is shown that very accurate predictions of the fundamental natural frequencies of the problems can be obtained by the solver with fast convergence rates. This method has been integrated into our existing FV compressible solver [X. Lv, Y. Zhao, et al., An efficient parallel/unstructured-multigrid preconditioned implicit method for simulating 3d unsteady compressible flows with moving objects, Journal of Computational Physics 215(2) (2006) 661-690] based on the immersed membrane method (IMM) [X. Lv, Y. Zhao, et al., as mentioned above]. Results for the interaction between the fluid and an immersed fixed-free cantilever are also presented to demonstrate the potential of this integrated fluid-structure interaction approach.
Massively Parallel Finite Element Programming
Heister, Timo
2010-01-01
Today\\'s large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
Massively Parallel Finite Element Programming
Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang
2010-01-01
Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
A multigrid algorithm for the cell-centered finite difference scheme
Ewing, Richard E.; Shen, Jian
1993-01-01
In this article, we discuss a non-variational V-cycle multigrid algorithm based on the cell-centered finite difference scheme for solving a second-order elliptic problem with discontinuous coefficients. Due to the poor approximation property of piecewise constant spaces and the non-variational nature of our scheme, one step of symmetric linear smoothing in our V-cycle multigrid scheme may fail to be a contraction. Again, because of the simple structure of the piecewise constant spaces, prolongation and restriction are trivial; we save significant computation time with very promising computational results.
Czech Academy of Sciences Publication Activity Database
Bauer, Petr; Klement, V.; Oberhuber, T.; Žabka, V.
2016-01-01
Roč. 200, March (2016), s. 50-56 ISSN 0010-4655 R&D Projects: GA ČR GB14-36566G Institutional support: RVO:61388998 Keywords : Navier–Stokes equations * mixed finite elements * multigrid * Vanka-type smoothers * Gauss–Seidel * red–black coloring * parallelization * GPU Subject RIV: BK - Fluid Dynamics Impact factor: 3.936, year: 2016
Multigrid Finite Element Method in Calculation of 3D Homogeneous and Composite Solids
Directory of Open Access Journals (Sweden)
A.D. Matveev
2016-12-01
Full Text Available In the present paper, a method of multigrid finite elements to calculate elastic three-dimensional homogeneous and composite solids under static loading has been suggested. The method has been developed based on the finite element method algorithms using homogeneous and composite three-dimensional multigrid finite elements (MFE. The procedures for construction of MFE of both rectangular parallelepiped and complex shapes have been shown. The advantages of MFE are that they take into account, following the rules of the microapproach, heterogeneous and microhomogeneous structures of the bodies, describe the three-dimensional stress-strain state (without any simplifying hypotheses in homogeneous and composite solids, as well as generate small dimensional discrete models and numerical solutions with a high accuracy.
Hano, Mitsuo; Hotta, Masashi
A new multigrid method based on high-order vector finite elements is proposed in this paper. Low level discretizations in this method are obtained by using low-order vector finite elements for the same mesh. Gauss-Seidel method is used as a smoother, and a linear equation of lowest level is solved by ICCG method. But it is often found that multigrid solutions do not converge into ICCG solutions. An elimination algolithm of constant term using a null space of the coefficient matrix is also described. In three dimensional magnetostatic field analysis, convergence time and number of iteration of this multigrid method are discussed with the convectional ICCG method.
Distance-two interpolation for parallel algebraic multigrid
International Nuclear Information System (INIS)
Sterck, H de; Falgout, R D; Nolting, J W; Yang, U M
2007-01-01
In this paper we study the use of long distance interpolation methods with the low complexity coarsening algorithm PMIS. AMG performance and scalability is compared for classical as well as long distance interpolation methods on parallel computers. It is shown that the increased interpolation accuracy largely restores the scalability of AMG convergence factors for PMIS-coarsened grids, and in combination with complexity reducing methods, such as interpolation truncation, one obtains a class of parallel AMG methods that enjoy excellent scalability properties on large parallel computers
A Parallel Multigrid Solver for Viscous Flows on Anisotropic Structured Grids
Prieto, Manuel; Montero, Ruben S.; Llorente, Ignacio M.; Bushnell, Dennis M. (Technical Monitor)
2001-01-01
This paper presents an efficient parallel multigrid solver for speeding up the computation of a 3-D model that treats the flow of a viscous fluid over a flat plate. The main interest of this simulation lies in exhibiting some basic difficulties that prevent optimal multigrid efficiencies from being achieved. As the computing platform, we have used Coral, a Beowulf-class system based on Intel Pentium processors and equipped with GigaNet cLAN and switched Fast Ethernet networks. Our study not only examines the scalability of the solver but also includes a performance evaluation of Coral where the investigated solver has been used to compare several of its design choices, namely, the interconnection network (GigaNet versus switched Fast-Ethernet) and the node configuration (dual nodes versus single nodes). As a reference, the performance results have been compared with those obtained with the NAS-MG benchmark.
Lavery, N.; Taylor, C.
1999-07-01
Multigrid and iterative methods are used to reduce the solution time of the matrix equations which arise from the finite element (FE) discretisation of the time-independent equations of motion of the incompressible fluid in turbulent motion. Incompressible flow is solved by using the method of reduce interpolation for the pressure to satisfy the Brezzi-Babuska condition. The k-l model is used to complete the turbulence closure problem. The non-symmetric iterative matrix methods examined are the methods of least squares conjugate gradient (LSCG), biconjugate gradient (BCG), conjugate gradient squared (CGS), and the biconjugate gradient squared stabilised (BCGSTAB). The multigrid algorithm applied is based on the FAS algorithm of Brandt, and uses two and three levels of grids with a V-cycling schedule. These methods are all compared to the non-symmetric frontal solver. Copyright
Codd, A. L.; Gross, L.
2018-03-01
We present a new inversion method for Electrical Resistivity Tomography which, in contrast to established approaches, minimizes the cost function prior to finite element discretization for the unknown electric conductivity and electric potential. Minimization is performed with the Broyden-Fletcher-Goldfarb-Shanno method (BFGS) in an appropriate function space. BFGS is self-preconditioning and avoids construction of the dense Hessian which is the major obstacle to solving large 3-D problems using parallel computers. In addition to the forward problem predicting the measurement from the injected current, the so-called adjoint problem also needs to be solved. For this problem a virtual current is injected through the measurement electrodes and an adjoint electric potential is obtained. The magnitude of the injected virtual current is equal to the misfit at the measurement electrodes. This new approach has the advantage that the solution process of the optimization problem remains independent to the meshes used for discretization and allows for mesh adaptation during inversion. Computation time is reduced by using superposition of pole loads for the forward and adjoint problems. A smoothed aggregation algebraic multigrid (AMG) preconditioned conjugate gradient is applied to construct the potentials for a given electric conductivity estimate and for constructing a first level BFGS preconditioner. Through the additional reuse of AMG operators and coarse grid solvers inversion time for large 3-D problems can be reduced further. We apply our new inversion method to synthetic survey data created by the resistivity profile representing the characteristics of subsurface fluid injection. We further test it on data obtained from a 2-D surface electrode survey on Heron Island, a small tropical island off the east coast of central Queensland, Australia.
A Parallel Algebraic Multigrid Solver on Graphics Processing Units
Haase, Gundolf; Liebmann, Manfred; Douglas, Craig C.; Plank, Gernot
2010-01-01
-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a singe Nvidia Tesla C1060 GPU board delivers the performance of a sixteen node Infiniband cluster
A parallel finite-difference method for computational aerodynamics
International Nuclear Information System (INIS)
Swisshelm, J.M.
1989-01-01
A finite-difference scheme for solving complex three-dimensional aerodynamic flow on parallel-processing supercomputers is presented. The method consists of a basic flow solver with multigrid convergence acceleration, embedded grid refinements, and a zonal equation scheme. Multitasking and vectorization have been incorporated into the algorithm. Results obtained include multiprocessed flow simulations from the Cray X-MP and Cray-2. Speedups as high as 3.3 for the two-dimensional case and 3.5 for segments of the three-dimensional case have been achieved on the Cray-2. The entire solver attained a factor of 2.7 improvement over its unitasked version on the Cray-2. The performance of the parallel algorithm on each machine is analyzed. 14 refs
Adaptive parallel multigrid for Euler and incompressible Navier-Stokes equations
Energy Technology Data Exchange (ETDEWEB)
Trottenberg, U.; Oosterlee, K.; Ritzdorf, H. [and others
1996-12-31
The combination of (1) very efficient solution methods (Multigrid), (2) adaptivity, and (3) parallelism (distributed memory) clearly is absolutely necessary for future oriented numerics but still regarded as extremely difficult or even unsolved. We show that very nice results can be obtained for real life problems. Our approach is straightforward (based on {open_quotes}MLAT{close_quotes}). But, of course, reasonable refinement and load-balancing strategies have to be used. Our examples are 2D, but 3D is on the way.
Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization
Energy Technology Data Exchange (ETDEWEB)
Clark, M. A. [NVIDIA Corp., Santa Clara; Joó, Bálint [Jefferson Lab; Strelchenko, Alexei [Fermilab; Cheng, Michael [Boston U., Ctr. Comp. Sci.; Gambhir, Arjun [William-Mary Coll.; Brower, Richard [Boston U.
2016-12-22
The past decade has witnessed a dramatic acceleration of lattice quantum chromodynamics calculations in nuclear and particle physics. This has been due to both significant progress in accelerating the iterative linear solvers using multi-grid algorithms, and due to the throughput improvements brought by GPUs. Deploying hierarchical algorithms optimally on GPUs is non-trivial owing to the lack of parallelism on the coarse grids, and as such, these advances have not proved multiplicative. Using the QUDA library, we demonstrate that by exposing all sources of parallelism that the underlying stencil problem possesses, and through appropriate mapping of this parallelism to the GPU architecture, we can achieve high efficiency even for the coarsest of grids. Results are presented for the Wilson-Clover discretization, where we demonstrate up to 10x speedup over present state-of-the-art GPU-accelerated methods on Titan. Finally, we look to the future, and consider the software implications of our findings.
The finite volume element (FVE) and multigrid method for the incompressible Navier-Stokes equations
International Nuclear Information System (INIS)
Gu Lizhen; Bao Weizhu
1992-01-01
The authors apply FVE method to discrete INS equations with the original variable, in which the bilinear square finite element and the square finite volume are chosen. The discrete schemes of INS equations are presented. The FMV multigrid algorithm is applied to solve that discrete system, where DGS iteration is used as smoother, DGS distributive mode for the INS discrete system is also presented. The sample problems for the square cavity flow with Reynolds number Re≤100 are successfully calculated. The numerical solutions show that the results with 1 FMV is satisfactory and when Re is not large, The FVE discrete scheme of the conservative INS equations and that of non-conservative INS equations with linearization both can provide almost same accuracy
Multigrid solution of the convection-diffusion equation with high-Reynolds number
Energy Technology Data Exchange (ETDEWEB)
Zhang, Jun [George Washington Univ., Washington, DC (United States)
1996-12-31
A fourth-order compact finite difference scheme is employed with the multigrid technique to solve the variable coefficient convection-diffusion equation with high-Reynolds number. Scaled inter-grid transfer operators and potential on vectorization and parallelization are discussed. The high-order multigrid method is unconditionally stable and produces solution of 4th-order accuracy. Numerical experiments are included.
Segmental Refinement: A Multigrid Technique for Data Locality
Energy Technology Data Exchange (ETDEWEB)
Adams, Mark [Columbia Univ., New York, NY (United States). Applied Physics and Applied Mathematics Dept.; Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
2014-10-27
We investigate a technique - segmental refinement (SR) - proposed by Brandt in the 1970s as a low memory multigrid method. The technique is attractive for modern computer architectures because it provides high data locality, minimizes network communication, is amenable to loop fusion, and is naturally highly parallel and asynchronous. The network communication minimization property was recognized by Brandt and Diskin in 1994; we continue this work by developing a segmental refinement method for a finite volume discretization of the 3D Laplacian on massively parallel computers. An understanding of the asymptotic complexities, required to maintain textbook multigrid efficiency, are explored experimentally with a simple SR method. A two-level memory model is developed to compare the asymptotic communication complexity of a proposed SR method with traditional parallel multigrid. Performance and scalability are evaluated with a Cray XC30 with up to 64K cores. We achieve modest improvement in scalability from traditional parallel multigrid with a simple SR implementation.
Finite volume multigrid method of the planar contraction flow of a viscoelastic fluid
Moatssime, H. Al; Esselaoui, D.; Hakim, A.; Raghay, S.
2001-08-01
This paper reports on a numerical algorithm for the steady flow of viscoelastic fluid. The conservative and constitutive equations are solved using the finite volume method (FVM) with a hybrid scheme for the velocities and first-order upwind approximation for the viscoelastic stress. A non-uniform staggered grid system is used. The iterative SIMPLE algorithm is employed to relax the coupled momentum and continuity equations. The non-linear algebraic equations over the flow domain are solved iteratively by the symmetrical coupled Gauss-Seidel (SCGS) method. In both, the full approximation storage (FAS) multigrid algorithm is used. An Oldroyd-B fluid model was selected for the calculation. Results are reported for planar 4:1 abrupt contraction at various Weissenberg numbers. The solutions are found to be stable and smooth. The solutions show that at high Weissenberg number the domain must be long enough. The convergence of the method has been verified with grid refinement. All the calculations have been performed on a PC equipped with a Pentium III processor at 550 MHz. Copyright
An evaluation of parallel multigrid as a solver and a preconditioner for singular perturbed problems
Energy Technology Data Exchange (ETDEWEB)
Oosterlee, C.W. [Inst. for Algorithms and Scientific Computing, Sankt Augustin (Germany); Washio, T. [C& C Research Lab., Sankt Augustin (Germany)
1996-12-31
In this paper we try to achieve h-independent convergence with preconditioned GMRES and BiCGSTAB for 2D singular perturbed equations. Three recently developed multigrid methods are adopted as a preconditioner. They are also used as solution methods in order to compare the performance of the methods as solvers and as preconditioners. Two of the multigrid methods differ only in the transfer operators. One uses standard matrix- dependent prolongation operators from. The second uses {open_quotes}upwind{close_quotes} prolongation operators, developed. Both employ the Galerkin coarse grid approximation and an alternating zebra line Gauss-Seidel smoother. The third method is based on the block LU decomposition of a matrix and on an approximate Schur complement. This multigrid variant is presented in. All three multigrid algorithms are algebraic methods.
Abstract Level Parallelization of Finite Difference Methods
Directory of Open Access Journals (Sweden)
Edwin Vollebregt
1997-01-01
Full Text Available A formalism is proposed for describing finite difference calculations in an abstract way. The formalism consists of index sets and stencils, for characterizing the structure of sets of data items and interactions between data items (“neighbouring relations”. The formalism provides a means for lifting programming to a more abstract level. This simplifies the tasks of performance analysis and verification of correctness, and opens the way for automaticcode generation. The notation is particularly useful in parallelization, for the systematic construction of parallel programs in a process/channel programming paradigm (e.g., message passing. This is important because message passing, unfortunately, still is the only approach that leads to acceptable performance for many more unstructured or irregular problems on parallel computers that have non-uniform memory access times. It will be shown that the use of index sets and stencils greatly simplifies the determination of which data must be exchanged between different computing processes.
Directory of Open Access Journals (Sweden)
A. Averbuch
1994-01-01
Full Text Available Parallel elliptic single/multigrid solutions around an aligned and nonaligned body are presented and implemented on two multi-user and single-user shared memory multiprocessors (Sequent Symmetry and MOS and on a distributed memory multiprocessor (a Transputer network. Our parallel implementation uses the Virtual Machine for Muli-Processors (VMMP, a software package that provides a coherent set of services for explicitly parallel application programs running on diverse multiple instruction multiple data (MIMD multiprocessors, both shared memory and message passing. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. Furthermore, it ensures high portability of application programs by implementing the same services on all target multiprocessors. The performance of our algorithm is investigated in detail. It is seen to fit well the above architectures when the number of processors is less than the maximal number of grid points along the axes. In general, the efficiency in the nonaligned case is higher than in the aligned case. Alignment overhead is observed to be up to 200% in the shared-memory case and up to 65% in the message-passing case. We have demonstrated that when using VMMP, the portability of the algorithms is straightforward and efficient.
Coco, Armando; Russo, Giovanni
2018-05-01
In this paper we propose a second-order accurate numerical method to solve elliptic problems with discontinuous coefficients (with general non-homogeneous jumps in the solution and its gradient) in 2D and 3D. The method consists of a finite-difference method on a Cartesian grid in which complex geometries (boundaries and interfaces) are embedded, and is second order accurate in the solution and the gradient itself. In order to avoid the drop in accuracy caused by the discontinuity of the coefficients across the interface, two numerical values are assigned on grid points that are close to the interface: a real value, that represents the numerical solution on that grid point, and a ghost value, that represents the numerical solution extrapolated from the other side of the interface, obtained by enforcing the assigned non-homogeneous jump conditions on the solution and its flux. The method is also extended to the case of matrix coefficient. The linear system arising from the discretization is solved by an efficient multigrid approach. Unlike the 1D case, grid points are not necessarily aligned with the normal derivative and therefore suitable stencils must be chosen to discretize interface conditions in order to achieve second order accuracy in the solution and its gradient. A proper treatment of the interface conditions will allow the multigrid to attain the optimal convergence factor, comparable with the one obtained by Local Fourier Analysis for rectangular domains. The method is robust enough to handle large jump in the coefficients: order of accuracy, monotonicity of the errors and good convergence factor are maintained by the scheme.
Toward robust scalable algebraic multigrid solvers
International Nuclear Information System (INIS)
Waisman, Haim; Schroder, Jacob; Olson, Luke; Hiriyur, Badri; Gaidamour, Jeremie; Siefert, Christopher; Hu, Jonathan Joseph; Tuminaro, Raymond Stephen
2010-01-01
This talk highlights some multigrid challenges that arise from several application areas including structural dynamics, fluid flow, and electromagnetics. A general framework is presented to help introduce and understand algebraic multigrid methods based on energy minimization concepts. Connections between algebraic multigrid prolongators and finite element basis functions are made to explored. It is shown how the general algebraic multigrid framework allows one to adapt multigrid ideas to a number of different situations. Examples are given corresponding to linear elasticity and specifically in the solution of linear systems associated with extended finite elements for fracture problems.
Adaptive Algebraic Multigrid for Finite Element Elliptic Equations with Random Coefficients
Energy Technology Data Exchange (ETDEWEB)
Kalchev, D
2012-04-02
This thesis presents a two-grid algorithm based on Smoothed Aggregation Spectral Element Agglomeration Algebraic Multigrid (SA-{rho}AMGe) combined with adaptation. The aim is to build an efficient solver for the linear systems arising from discretization of second-order elliptic partial differential equations (PDEs) with stochastic coefficients. Examples include PDEs that model subsurface flow with random permeability field. During a Markov Chain Monte Carlo (MCMC) simulation process, that draws PDE coefficient samples from a certain distribution, the PDE coefficients change, hence the resulting linear systems to be solved change. At every such step the system (discretized PDE) needs to be solved and the computed solution used to evaluate some functional(s) of interest that then determine if the coefficient sample is acceptable or not. The MCMC process is hence computationally intensive and requires the solvers used to be efficient and fast. This fact that at every step of MCMC the resulting linear system changes, makes an already existing solver built for the old problem perhaps not as efficient for the problem corresponding to the new sampled coefficient. This motivates the main goal of our study, namely, to adapt an already existing solver to handle the problem (with changed coefficient) with the objective to achieve this goal to be faster and more efficient than building a completely new solver from scratch. Our approach utilizes the local element matrices (for the problem with changed coefficients) to build local problems associated with constructed by the method agglomerated elements (a set of subdomains that cover the given computational domain). We solve a generalized eigenproblem for each set in a subspace spanned by the previous local coarse space (used for the old solver) and a vector, component of the error, that the old solver cannot handle. A portion of the spectrum of these local eigen-problems (corresponding to eigenvalues close to zero) form the
Womack, James C; Anton, Lucian; Dziedzic, Jacek; Hasnip, Phil J; Probert, Matt I J; Skylaris, Chris-Kriton
2018-03-13
The solution of the Poisson equation is a crucial step in electronic structure calculations, yielding the electrostatic potential-a key component of the quantum mechanical Hamiltonian. In recent decades, theoretical advances and increases in computer performance have made it possible to simulate the electronic structure of extended systems in complex environments. This requires the solution of more complicated variants of the Poisson equation, featuring nonhomogeneous dielectric permittivities, ionic concentrations with nonlinear dependencies, and diverse boundary conditions. The analytic solutions generally used to solve the Poisson equation in vacuum (or with homogeneous permittivity) are not applicable in these circumstances, and numerical methods must be used. In this work, we present DL_MG, a flexible, scalable, and accurate solver library, developed specifically to tackle the challenges of solving the Poisson equation in modern large-scale electronic structure calculations on parallel computers. Our solver is based on the multigrid approach and uses an iterative high-order defect correction method to improve the accuracy of solutions. Using two chemically relevant model systems, we tested the accuracy and computational performance of DL_MG when solving the generalized Poisson and Poisson-Boltzmann equations, demonstrating excellent agreement with analytic solutions and efficient scaling to ∼10 9 unknowns and 100s of CPU cores. We also applied DL_MG in actual large-scale electronic structure calculations, using the ONETEP linear-scaling electronic structure package to study a 2615 atom protein-ligand complex with routinely available computational resources. In these calculations, the overall execution time with DL_MG was not significantly greater than the time required for calculations using a conventional FFT-based solver.
Implementation of a high performance parallel finite element micromagnetics package
International Nuclear Information System (INIS)
Scholz, W.; Suess, D.; Dittrich, R.; Schrefl, T.; Tsiantos, V.; Forster, H.; Fidler, J.
2004-01-01
A new high performance scalable parallel finite element micromagnetics package has been implemented. It includes solvers for static energy minimization, time integration of the Landau-Lifshitz-Gilbert equation, and the nudged elastic band method
3D magnetospheric parallel hybrid multi-grid method applied to planet–plasma interactions
Energy Technology Data Exchange (ETDEWEB)
Leclercq, L., E-mail: ludivine.leclercq@latmos.ipsl.fr [LATMOS/IPSL, UVSQ Université Paris-Saclay, UPMC Univ. Paris 06, CNRS, Guyancourt (France); Modolo, R., E-mail: ronan.modolo@latmos.ipsl.fr [LATMOS/IPSL, UVSQ Université Paris-Saclay, UPMC Univ. Paris 06, CNRS, Guyancourt (France); Leblanc, F. [LATMOS/IPSL, UPMC Univ. Paris 06 Sorbonne Universités, UVSQ, CNRS, Paris (France); Hess, S. [ONERA, Toulouse (France); Mancini, M. [LUTH, Observatoire Paris-Meudon (France)
2016-03-15
We present a new method to exploit multiple refinement levels within a 3D parallel hybrid model, developed to study planet–plasma interactions. This model is based on the hybrid formalism: ions are kinetically treated whereas electrons are considered as a inertia-less fluid. Generally, ions are represented by numerical particles whose size equals the volume of the cells. Particles that leave a coarse grid subsequently entering a refined region are split into particles whose volume corresponds to the volume of the refined cells. The number of refined particles created from a coarse particle depends on the grid refinement rate. In order to conserve velocity distribution functions and to avoid calculations of average velocities, particles are not coalesced. Moreover, to ensure the constancy of particles' shape function sizes, the hybrid method is adapted to allow refined particles to move within a coarse region. Another innovation of this approach is the method developed to compute grid moments at interfaces between two refinement levels. Indeed, the hybrid method is adapted to accurately account for the special grid structure at the interfaces, avoiding any overlapping grid considerations. Some fundamental test runs were performed to validate our approach (e.g. quiet plasma flow, Alfven wave propagation). Lastly, we also show a planetary application of the model, simulating the interaction between Jupiter's moon Ganymede and the Jovian plasma.
Energy Technology Data Exchange (ETDEWEB)
Kim, S. [Purdue Univ., West Lafayette, IN (United States)
1994-12-31
Parallel iterative procedures based on domain decomposition techniques are defined and analyzed for the numerical solution of wave propagation by finite element and finite difference methods. For finite element methods, in a Lagrangian framework, an efficient way for choosing the algorithm parameter as well as the algorithm convergence are indicated. Some heuristic arguments for finding the algorithm parameter for finite difference schemes are addressed. Numerical results are presented to indicate the effectiveness of the methods.
Measuring Communication in Parallel Communicating Finite Automata
Directory of Open Access Journals (Sweden)
Henning Bordihn
2014-05-01
Full Text Available Systems of deterministic finite automata communicating by sending their states upon request are investigated, when the amount of communication is restricted. The computational power and decidability properties are studied for the case of returning centralized systems, when the number of necessary communications during the computations of the system is bounded by a function depending on the length of the input. It is proved that an infinite hierarchy of language families exists, depending on the number of messages sent during their most economical recognitions. Moreover, several properties are shown to be not semi-decidable for the systems under consideration.
Element-topology-independent preconditioners for parallel finite element computations
Park, K. C.; Alexander, Scott
1992-01-01
A family of preconditioners for the solution of finite element equations are presented, which are element-topology independent and thus can be applicable to element order-free parallel computations. A key feature of the present preconditioners is the repeated use of element connectivity matrices and their left and right inverses. The properties and performance of the present preconditioners are demonstrated via beam and two-dimensional finite element matrices for implicit time integration computations.
A finite element solution method for quadrics parallel computer
International Nuclear Information System (INIS)
Zucchini, A.
1996-08-01
A distributed preconditioned conjugate gradient method for finite element analysis has been developed and implemented on a parallel SIMD Quadrics computer. The main characteristic of the method is that it does not require any actual assembling of all element equations in a global system. The physical domain of the problem is partitioned in cells of n p finite elements and each cell element is assigned to a different node of an n p -processors machine. Element stiffness matrices are stored in the data memory of the assigned processing node and the solution process is completely executed in parallel at element level. Inter-element and therefore inter-processor communications are required once per iteration to perform local sums of vector quantities between neighbouring elements. A prototype implementation has been tested on an 8-nodes Quadrics machine in a simple 2D benchmark problem
Multigrid methods in structural mechanics
Raju, I. S.; Bigelow, C. A.; Taasan, S.; Hussaini, M. Y.
1986-01-01
Although the application of multigrid methods to the equations of elasticity has been suggested, few such applications have been reported in the literature. In the present work, multigrid techniques are applied to the finite element analysis of a simply supported Bernoulli-Euler beam, and various aspects of the multigrid algorithm are studied and explained in detail. In this study, six grid levels were used to model half the beam. With linear prolongation and sequential ordering, the multigrid algorithm yielded results which were of machine accuracy with work equivalent to 200 standard Gauss-Seidel iterations on the fine grid. Also with linear prolongation and sequential ordering, the V(1,n) cycle with n greater than 2 yielded better convergence rates than the V(n,1) cycle. The restriction and prolongation operators were derived based on energy principles. Conserving energy during the inter-grid transfers required that the prolongation operator be the transpose of the restriction operator, and led to improved convergence rates. With energy-conserving prolongation and sequential ordering, the multigrid algorithm yielded results of machine accuracy with a work equivalent to 45 Gauss-Seidel iterations on the fine grid. The red-black ordering of relaxations yielded solutions of machine accuracy in a single V(1,1) cycle, which required work equivalent to about 4 iterations on the finest grid level.
The multigrid preconditioned conjugate gradient method
Tatebe, Osamu
1993-01-01
A multigrid preconditioned conjugate gradient method (MGCG method), which uses the multigrid method as a preconditioner of the PCG method, is proposed. The multigrid method has inherent high parallelism and improves convergence of long wavelength components, which is important in iterative methods. By using this method as a preconditioner of the PCG method, an efficient method with high parallelism and fast convergence is obtained. First, it is considered a necessary condition of the multigrid preconditioner in order to satisfy requirements of a preconditioner of the PCG method. Next numerical experiments show a behavior of the MGCG method and that the MGCG method is superior to both the ICCG method and the multigrid method in point of fast convergence and high parallelism. This fast convergence is understood in terms of the eigenvalue analysis of the preconditioned matrix. From this observation of the multigrid preconditioner, it is realized that the MGCG method converges in very few iterations and the multigrid preconditioner is a desirable preconditioner of the conjugate gradient method.
A parallel adaptive finite difference algorithm for petroleum reservoir simulation
Energy Technology Data Exchange (ETDEWEB)
Hoang, Hai Minh
2005-07-01
Adaptive finite differential for problems arising in simulation of flow in porous medium applications are considered. Such methods have been proven useful for overcoming limitations of computational resources and improving the resolution of the numerical solutions to a wide range of problems. By local refinement of the computational mesh where it is needed to improve the accuracy of solutions, yields better solution resolution representing more efficient use of computational resources than is possible with traditional fixed-grid approaches. In this thesis, we propose a parallel adaptive cell-centered finite difference (PAFD) method for black-oil reservoir simulation models. This is an extension of the adaptive mesh refinement (AMR) methodology first developed by Berger and Oliger (1984) for the hyperbolic problem. Our algorithm is fully adaptive in time and space through the use of subcycling, in which finer grids are advanced at smaller time steps than the coarser ones. When coarse and fine grids reach the same advanced time level, they are synchronized to ensure that the global solution is conservative and satisfy the divergence constraint across all levels of refinement. The material in this thesis is subdivided in to three overall parts. First we explain the methodology and intricacies of AFD scheme. Then we extend a finite differential cell-centered approximation discretization to a multilevel hierarchy of refined grids, and finally we are employing the algorithm on parallel computer. The results in this work show that the approach presented is robust, and stable, thus demonstrating the increased solution accuracy due to local refinement and reduced computing resource consumption. (Author)
Finite element electromagnetic field computation on the Sequent Symmetry 81 parallel computer
International Nuclear Information System (INIS)
Ratnajeevan, S.; Hoole, H.
1990-01-01
Finite element field analysis algorithms lend themselves to parallelization and this fact is exploited in this paper to implement a finite element analysis program for electromagnetic field computation on the Sequent Symmetry 81 parallel computer with three processors. In terms of waiting time, the maximum gains are to be made in matrix solution and therefore this paper concentrates on the gains in parallelizing the solution part of finite element analysis. An outline of how parallelization could be exploited in most finite element operations is given in this paper although the actual implemention of parallelism on the Sequent Symmetry 81 parallel computer was in sparsity computation, matrix assembly and the matrix solution areas. In all cases, the algorithms were modified suit the parallel programming application rather than allowing the compiler to parallelize on existing algorithms
Byun, Chansup; Guruswamy, Guru P.; Kutler, Paul (Technical Monitor)
1994-01-01
In recent years significant advances have been made for parallel computers in both hardware and software. Now parallel computers have become viable tools in computational mechanics. Many application codes developed on conventional computers have been modified to benefit from parallel computers. Significant speedups in some areas have been achieved by parallel computations. For single-discipline use of both fluid dynamics and structural dynamics, computations have been made on wing-body configurations using parallel computers. However, only a limited amount of work has been completed in combining these two disciplines for multidisciplinary applications. The prime reason is the increased level of complication associated with a multidisciplinary approach. In this work, procedures to compute aeroelasticity on parallel computers using direct coupling of fluid and structural equations will be investigated for wing-body configurations. The parallel computer selected for computations is an Intel iPSC/860 computer which is a distributed-memory, multiple-instruction, multiple data (MIMD) computer with 128 processors. In this study, the computational efficiency issues of parallel integration of both fluid and structural equations will be investigated in detail. The fluid and structural domains will be modeled using finite-difference and finite-element approaches, respectively. Results from the parallel computer will be compared with those from the conventional computers using a single processor. This study will provide an efficient computational tool for the aeroelastic analysis of wing-body structures on MIMD type parallel computers.
Parallel direct solver for finite element modeling of manufacturing processes
DEFF Research Database (Denmark)
Nielsen, Chris Valentin; Martins, P.A.F.
2017-01-01
The central processing unit (CPU) time is of paramount importance in finite element modeling of manufacturing processes. Because the most significant part of the CPU time is consumed in solving the main system of equations resulting from finite element assemblies, different approaches have been...
Advanced Algebraic Multigrid Solvers for Subsurface Flow Simulation
Chen, Meng-Huo; Sun, Shuyu; Salama, Amgad
2015-01-01
and issues will be addressed and the corresponding remedies will be studied. As the multigrid methods are used as the linear solver, the simulator can be parallelized (although not trivial) and the high-resolution simulation become feasible, the ultimately
Toward textbook multigrid efficiency for fully implicit resistive magnetohydrodynamics
International Nuclear Information System (INIS)
Adams, Mark F.; Samtaney, Ravi; Brandt, Achi
2013-01-01
Multigrid methods can solve some classes of elliptic and parabolic equations to accuracy below the truncation error with a work-cost equivalent to a few residual calculations so-called textbook multigrid efficiency. We investigate methods to solve the system of equations that arise in time dependent magnetohydrodynamics (MHD) simulations with textbook multigrid efficiency. We apply multigrid techniques such as geometric interpolation, full approximate storage, Gauss-Seidel smoothers, and defect correction for fully implicit, nonlinear, second-order finite volume discretizations of MHD. We apply these methods to a standard resistive MHD benchmark problem, the GEM reconnection problem, and add a strong magnetic guide field, which is a critical characteristic of magnetically confined fusion plasmas. We show that our multigrid methods can achieve near textbook efficiency on fully implicit resistive MHD simulations.
Toward textbook multigrid efficiency for fully implicit resistive magnetohydrodynamics
Adams, Mark F.; Samtaney, Ravi; Brandt, Achi
2010-01-01
Multigrid methods can solve some classes of elliptic and parabolic equations to accuracy below the truncation error with a work-cost equivalent to a few residual calculations so-called "textbook" multigrid efficiency. We investigate methods to solve the system of equations that arise in time dependent magnetohydrodynamics (MHD) simulations with textbook multigrid efficiency. We apply multigrid techniques such as geometric interpolation, full approximate storage, Gauss-Seidel smoothers, and defect correction for fully implicit, nonlinear, second-order finite volume discretizations of MHD. We apply these methods to a standard resistive MHD benchmark problem, the GEM reconnection problem, and add a strong magnetic guide field, which is a critical characteristic of magnetically confined fusion plasmas. We show that our multigrid methods can achieve near textbook efficiency on fully implicit resistive MHD simulations. (C) 2010 Elsevier Inc. All rights reserved.
Toward textbook multigrid efficiency for fully implicit resistive magnetohydrodynamics
Adams, Mark F.
2010-09-01
Multigrid methods can solve some classes of elliptic and parabolic equations to accuracy below the truncation error with a work-cost equivalent to a few residual calculations so-called "textbook" multigrid efficiency. We investigate methods to solve the system of equations that arise in time dependent magnetohydrodynamics (MHD) simulations with textbook multigrid efficiency. We apply multigrid techniques such as geometric interpolation, full approximate storage, Gauss-Seidel smoothers, and defect correction for fully implicit, nonlinear, second-order finite volume discretizations of MHD. We apply these methods to a standard resistive MHD benchmark problem, the GEM reconnection problem, and add a strong magnetic guide field, which is a critical characteristic of magnetically confined fusion plasmas. We show that our multigrid methods can achieve near textbook efficiency on fully implicit resistive MHD simulations. (C) 2010 Elsevier Inc. All rights reserved.
Toward textbook multigrid efficiency for fully implicit resistive magnetohydrodynamics
International Nuclear Information System (INIS)
Adams, Mark F.; Samtaney, Ravi; Brandt, Achi
2010-01-01
Multigrid methods can solve some classes of elliptic and parabolic equations to accuracy below the truncation error with a work-cost equivalent to a few residual calculations - so-called 'textbook' multigrid efficiency. We investigate methods to solve the system of equations that arise in time dependent magnetohydrodynamics (MHD) simulations with textbook multigrid efficiency. We apply multigrid techniques such as geometric interpolation, full approximate storage, Gauss-Seidel smoothers, and defect correction for fully implicit, nonlinear, second-order finite volume discretizations of MHD. We apply these methods to a standard resistive MHD benchmark problem, the GEM reconnection problem, and add a strong magnetic guide field, which is a critical characteristic of magnetically confined fusion plasmas. We show that our multigrid methods can achieve near textbook efficiency on fully implicit resistive MHD simulations.
A multigrid method for variational inequalities
Energy Technology Data Exchange (ETDEWEB)
Oliveira, S.; Stewart, D.E.; Wu, W.
1996-12-31
Multigrid methods have been used with great success for solving elliptic partial differential equations. Penalty methods have been successful in solving finite-dimensional quadratic programs. In this paper these two techniques are combined to give a fast method for solving obstacle problems. A nonlinear penalized problem is solved using Newton`s method for large values of a penalty parameter. Multigrid methods are used to solve the linear systems in Newton`s method. The overall numerical method developed is based on an exterior penalty function, and numerical results showing the performance of the method have been obtained.
Some multigrid algorithms for SIMD machines
Energy Technology Data Exchange (ETDEWEB)
Dendy, J.E. Jr. [Los Alamos National Lab., NM (United States)
1996-12-31
Previously a semicoarsening multigrid algorithm suitable for use on SIMD architectures was investigated. Through the use of new software tools, the performance of this algorithm has been considerably improved. The method has also been extended to three space dimensions. The method performs well for strongly anisotropic problems and for problems with coefficients jumping by orders of magnitude across internal interfaces. The parallel efficiency of this method is analyzed, and its actual performance on the CM-5 is compared with its performance on the CRAY-YMP. A standard coarsening multigrid algorithm is also considered, and we compare its performance on these two platforms as well.
Gyrokinetic simulation of finite-β plasmas on parallel architectures
International Nuclear Information System (INIS)
Reynders, J.V.W.
1993-01-01
Much research exists on the linear and non-linear properties of plasma microinstabilities induced by density and temperature gradients. There has been an interest in the electromagnetic or finite-β effects on these microinstabilities. This thesis focuses on the finite-β modification of an ion temperature gradient (ITG) driven microinstability in a two-dimensional shearless and sheared-slab geometries. A gyrokinetic model is employed in the numerical and analytic studies of this instability. Chapter 1 introduces the electromagnetic gyrokinetic model employed in the numerical and analytic studies of the ITG instability. Some discussion of the Klimontovich particle representation of the gyrokinetic Vlasov equation and a multiple scale model of the background plasma gradient is presented. Chapter 2 details the computational issues facing an electromagnetic gyrokinetic particle simulation of the ITG mode. An electromagnetic extension of the partially linearized algorithm is presented with a comparison of quiet particle initialization routines. Chapter 3 presents and compares algorithms for the gyrokinetic particle simulation technique on SIMD and MIMD computing platforms. Chapter 4 discusses electromagnetic gyrokinetic fluctuation theory and provides a comparison of analytic and numerical results. Chapter 5 contains a linear and a non-linear three-wave coupling analysis of the finite-β modified ITG mode in a shearless slab geometry. Comparisons are made with linear and partially linearized gyrokinetic simulation results. Chapter 6 presents results from a finite-β modified ITG mode in a sheared slab geometry. The linear dispersion relation is derived and results from an integral eigenvalue code are presented. Comparisons are made with the gyrokinetic particle code in a variety of limits with both adiabatic and non-adiabatic electrons. Evidence of ITG driven microtearing is presented
Segmental Refinement: A Multigrid Technique for Data Locality
Adams, Mark F.; Brown, Jed; Knepley, Matt; Samtaney, Ravi
2016-01-01
We investigate a domain decomposed multigrid technique, termed segmental refinement, for solving general nonlinear elliptic boundary value problems. We extend the method first proposed in 1994 by analytically and experimentally investigating its complexity. We confirm that communication of traditional parallel multigrid is eliminated on fine grids, with modest amounts of extra work and storage, while maintaining the asymptotic exactness of full multigrid. We observe an accuracy dependence on the segmental refinement subdomain size, which was not considered in the original analysis. We present a communication complexity analysis that quantifies the communication costs ameliorated by segmental refinement and report performance results with up to 64K cores on a Cray XC30.
Segmental Refinement: A Multigrid Technique for Data Locality
Adams, Mark F.
2016-08-04
We investigate a domain decomposed multigrid technique, termed segmental refinement, for solving general nonlinear elliptic boundary value problems. We extend the method first proposed in 1994 by analytically and experimentally investigating its complexity. We confirm that communication of traditional parallel multigrid is eliminated on fine grids, with modest amounts of extra work and storage, while maintaining the asymptotic exactness of full multigrid. We observe an accuracy dependence on the segmental refinement subdomain size, which was not considered in the original analysis. We present a communication complexity analysis that quantifies the communication costs ameliorated by segmental refinement and report performance results with up to 64K cores on a Cray XC30.
Chen, Zhangxin; Ewing, Richard E.
1996-01-01
Multigrid algorithms for nonconforming and mixed finite element methods for second order elliptic problems on triangular and rectangular finite elements are considered. The construction of several coarse-to-fine intergrid transfer operators for nonconforming multigrid algorithms is discussed. The equivalence between the nonconforming and mixed finite element methods with and without projection of the coefficient of the differential problems into finite element spaces is described.
Unweighted least squares phase unwrapping by means of multigrid techniques
Pritt, Mark D.
1995-11-01
We present a multigrid algorithm for unweighted least squares phase unwrapping. This algorithm applies Gauss-Seidel relaxation schemes to solve the Poisson equation on smaller, coarser grids and transfers the intermediate results to the finer grids. This approach forms the basis of our multigrid algorithm for weighted least squares phase unwrapping, which is described in a separate paper. The key idea of our multigrid approach is to maintain the partial derivatives of the phase data in separate arrays and to correct these derivatives at the boundaries of the coarser grids. This maintains the boundary conditions necessary for rapid convergence to the correct solution. Although the multigrid algorithm is an iterative algorithm, we demonstrate that it is nearly as fast as the direct Fourier-based method. We also describe how to parallelize the algorithm for execution on a distributed-memory parallel processor computer or a network-cluster of workstations.
Trottenberg, U; Third European Conference on Multigrid Methods
1991-01-01
These proceedings contain a selection of papers presented at the Third European Conference on Multigrid Methods which was held in Bonn on October 1-4, 1990. Following conferences in 1981 and 1985, a platform for the presentation of new Multigrid results was provided for a third time. Multigrid methods no longer have problems being accepted by numerical analysts and users of numerical methods; on the contrary, they have been further developed in such a successful way that they have penetrated a variety of new fields of application. The high number of 154 participants from 18 countries and 76 presented papers show the need to continue the series of the European Multigrid Conferences. The papers of this volume give a survey on the current Multigrid situation; in particular, they correspond to those fields where new developments can be observed. For example, se veral papers study the appropriate treatment of time dependent problems. Improvements can also be noticed in the Multigrid approach for semiconductor eq...
Vectorization and parallelization of the finite strip method for dynamic Mindlin plate problems
Chen, Hsin-Chu; He, Ai-Fang
1993-01-01
The finite strip method is a semi-analytical finite element process which allows for a discrete analysis of certain types of physical problems by discretizing the domain of the problem into finite strips. This method decomposes a single large problem into m smaller independent subproblems when m harmonic functions are employed, thus yielding natural parallelism at a very high level. In this paper we address vectorization and parallelization strategies for the dynamic analysis of simply-supported Mindlin plate bending problems and show how to prevent potential conflicts in memory access during the assemblage process. The vector and parallel implementations of this method and the performance results of a test problem under scalar, vector, and vector-concurrent execution modes on the Alliant FX/80 are also presented.
Two parallel finite queues with simultaneous services and Markovian arrivals
Directory of Open Access Journals (Sweden)
S. R. Chakravarthy
1997-01-01
Full Text Available In this paper, we consider a finite capacity single server queueing model with two buffers, A and B, of sizes K and N respectively. Messages arrive one at a time according to a Markovian arrival process. Messages that arrive at buffer A are of a different type from the messages that arrive at buffer B. Messages are processed according to the following rules: 1. When buffer A(B has a message and buffer B(A is empty, then one message from A(B is processed by the server. 2. When both buffers, A and B, have messages, then two messages, one from A and one from B, are processed simultaneously by the server. The service times are assumed to be exponentially distributed with parameters that may depend on the type of service. This queueing model is studied as a Markov process with a large state space and efficient algorithmic procedures for computing various system performance measures are given. Some numerical examples are discussed.
Node-based finite element method for large-scale adaptive fluid analysis in parallel environments
Energy Technology Data Exchange (ETDEWEB)
Toshimitsu, Fujisawa [Tokyo Univ., Collaborative Research Center of Frontier Simulation Software for Industrial Science, Institute of Industrial Science (Japan); Genki, Yagawa [Tokyo Univ., Department of Quantum Engineering and Systems Science (Japan)
2003-07-01
In this paper, a FEM-based (finite element method) mesh free method with a probabilistic node generation technique is presented. In the proposed method, all computational procedures, from the mesh generation to the solution of a system of equations, can be performed fluently in parallel in terms of nodes. Local finite element mesh is generated robustly around each node, even for harsh boundary shapes such as cracks. The algorithm and the data structure of finite element calculation are based on nodes, and parallel computing is realized by dividing a system of equations by the row of the global coefficient matrix. In addition, the node-based finite element method is accompanied by a probabilistic node generation technique, which generates good-natured points for nodes of finite element mesh. Furthermore, the probabilistic node generation technique can be performed in parallel environments. As a numerical example of the proposed method, we perform a compressible flow simulation containing strong shocks. Numerical simulations with frequent mesh refinement, which are required for such kind of analysis, can effectively be performed on parallel processors by using the proposed method. (authors)
Node-based finite element method for large-scale adaptive fluid analysis in parallel environments
International Nuclear Information System (INIS)
Toshimitsu, Fujisawa; Genki, Yagawa
2003-01-01
In this paper, a FEM-based (finite element method) mesh free method with a probabilistic node generation technique is presented. In the proposed method, all computational procedures, from the mesh generation to the solution of a system of equations, can be performed fluently in parallel in terms of nodes. Local finite element mesh is generated robustly around each node, even for harsh boundary shapes such as cracks. The algorithm and the data structure of finite element calculation are based on nodes, and parallel computing is realized by dividing a system of equations by the row of the global coefficient matrix. In addition, the node-based finite element method is accompanied by a probabilistic node generation technique, which generates good-natured points for nodes of finite element mesh. Furthermore, the probabilistic node generation technique can be performed in parallel environments. As a numerical example of the proposed method, we perform a compressible flow simulation containing strong shocks. Numerical simulations with frequent mesh refinement, which are required for such kind of analysis, can effectively be performed on parallel processors by using the proposed method. (authors)
Parallel finite elements with domain decomposition and its pre-processing
International Nuclear Information System (INIS)
Yoshida, A.; Yagawa, G.; Hamada, S.
1993-01-01
This paper describes a parallel finite element analysis using a domain decomposition method, and the pre-processing for the parallel calculation. Computer simulations are about to replace experiments in various fields, and the scale of model to be simulated tends to be extremely large. On the other hand, computational environment has drastically changed in these years. Especially, parallel processing on massively parallel computers or computer networks is considered to be promising techniques. In order to achieve high efficiency on such parallel computation environment, large granularity of tasks, a well-balanced workload distribution are key issues. It is also important to reduce the cost of pre-processing in such parallel FEM. From the point of view, the authors developed the domain decomposition FEM with the automatic and dynamic task-allocation mechanism and the automatic mesh generation/domain subdivision system for it. (author)
Large parallel volumes of finite and compact sets in d-dimensional Euclidean space
DEFF Research Database (Denmark)
Kampf, Jürgen; Kiderlen, Markus
The r-parallel volume V (Cr) of a compact subset C in d-dimensional Euclidean space is the volume of the set Cr of all points of Euclidean distance at most r > 0 from C. According to Steiner’s formula, V (Cr) is a polynomial in r when C is convex. For finite sets C satisfying a certain geometric...
Parallel eigenanalysis of finite element models in a completely connected architecture
Akl, F. A.; Morel, M. R.
1989-01-01
A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is order of q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in parallel environment is analyzed.
Directory of Open Access Journals (Sweden)
José Miguel Vargas-Félix
2012-11-01
Full Text Available The Finite Element Method (FEM is used to solve problems like solid deformation and heat diffusion in domains with complex geometries. This kind of geometries requires discretization with millions of elements; this is equivalent to solve systems of equations with sparse matrices and tens or hundreds of millions of variables. The aim is to use computer clusters to solve these systems. The solution method used is Schur substructuration. Using it is possible to divide a large system of equations into many small ones to solve them more efficiently. This method allows parallelization. MPI (Message Passing Interface is used to distribute the systems of equations to solve each one in a computer of a cluster. Each system of equations is solved using a solver implemented to use OpenMP as a local parallelization method.The Finite Element Method (FEM is used to solve problems like solid deformation and heat diffusion in domains with complex geometries. This kind of geometries requires discretization with millions of elements; this is equivalent to solve systems of equations with sparse matrices and tens or hundreds of millions of variables. The aim is to use computer clusters to solve these systems. The solution method used is Schur substructuration. Using it is possible to divide a large system of equations into many small ones to solve them more efficiently. This method allows parallelization. MPI (Message Passing Interface is used to distribute the systems of equations to solve each one in a computer of a cluster. Each system of equations is solved using a solver implemented to use OpenMP as a local parallelization method.
A framework for grand scale parallelization of the combined finite discrete element method in 2d
Lei, Z.; Rougier, E.; Knight, E. E.; Munjiza, A.
2014-09-01
Within the context of rock mechanics, the Combined Finite-Discrete Element Method (FDEM) has been applied to many complex industrial problems such as block caving, deep mining techniques (tunneling, pillar strength, etc.), rock blasting, seismic wave propagation, packing problems, dam stability, rock slope stability, rock mass strength characterization problems, etc. The reality is that most of these were accomplished in a 2D and/or single processor realm. In this work a hardware independent FDEM parallelization framework has been developed using the Virtual Parallel Machine for FDEM, (V-FDEM). With V-FDEM, a parallel FDEM software can be adapted to different parallel architecture systems ranging from just a few to thousands of cores.
A parallel finite element method for the analysis of crystalline solids
DEFF Research Database (Denmark)
Sørensen, N.J.; Andersen, B.S.
1996-01-01
A parallel finite element method suitable for the analysis of 3D quasi-static crystal plasticity problems has been developed. The method is based on substructuring of the original mesh into a number of substructures which are treated as isolated finite element models related via the interface...... conditions. The resulting interface equations are solved using a direct solution method. The method shows a good speedup when increasing the number of processors from 1 to 8 and the effective solution of 3D crystal plasticity problems whose size is much too large for a single work station becomes possible....
Large-Scale Parallel Finite Element Analysis of the Stress Singular Problems
International Nuclear Information System (INIS)
Noriyuki Kushida; Hiroshi Okuda; Genki Yagawa
2002-01-01
In this paper, the convergence behavior of large-scale parallel finite element method for the stress singular problems was investigated. The convergence behavior of iterative solvers depends on the efficiency of the pre-conditioners. However, efficiency of pre-conditioners may be influenced by the domain decomposition that is necessary for parallel FEM. In this study the following results were obtained: Conjugate gradient method without preconditioning and the diagonal scaling preconditioned conjugate gradient method were not influenced by the domain decomposition as expected. symmetric successive over relaxation method preconditioned conjugate gradient method converged 6% faster as maximum if the stress singular area was contained in one sub-domain. (authors)
Non-Galerkin Coarse Grids for Algebraic Multigrid
Energy Technology Data Exchange (ETDEWEB)
Falgout, Robert D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Schroder, Jacob B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2014-06-26
Algebraic multigrid (AMG) is a popular and effective solver for systems of linear equations that arise from discretized partial differential equations. And while AMG has been effectively implemented on large scale parallel machines, challenges remain, especially when moving to exascale. Particularly, stencil sizes (the number of nonzeros in a row) tend to increase further down in the coarse grid hierarchy, and this growth leads to more communication. Therefore, as problem size increases and the number of levels in the hierarchy grows, the overall efficiency of the parallel AMG method decreases, sometimes dramatically. This growth in stencil size is due to the standard Galerkin coarse grid operator, $P^T A P$, where $P$ is the prolongation (i.e., interpolation) operator. For example, the coarse grid stencil size for a simple three-dimensional (3D) seven-point finite differencing approximation to diffusion can increase into the thousands on present day machines, causing an associated increase in communication costs. We therefore consider algebraically truncating coarse grid stencils to obtain a non-Galerkin coarse grid. First, the sparsity pattern of the non-Galerkin coarse grid is determined by employing a heuristic minimal “safe” pattern together with strength-of-connection ideas. Second, the nonzero entries are determined by collapsing the stencils in the Galerkin operator using traditional AMG techniques. The result is a reduction in coarse grid stencil size, overall operator complexity, and parallel AMG solve phase times.
Parallel algorithms for testing finite state machines:Generating UIO sequences
Hierons, RM; Turker, UC
2016-01-01
This paper describes an efficient parallel algorithm that uses many-core GPUs for automatically deriving Unique Input Output sequences (UIOs) from Finite State Machines. The proposed algorithm uses the global scope of the GPU's global memory through coalesced memory access and minimises the transfer between CPU and GPU memory. The results of experiments indicate that the proposed method yields considerably better results compared to a single core UIO construction algorithm. Our algorithm is s...
Multigrid and defect correction for the steady Navier-Stokes equations : application to aerodynamics
Koren, B.
1991-01-01
Theoretical and expcrimental convergence results are presented for nonlinear multigrid and iterative defect correction applied to finite volume discretizations of the full, steady, 2D, compressible NavierStokes equations. lterative defect correction is introduced for circumventing the difficulty in
Multigrid and defect correction for the steady Navier-Stokes equations
Koren, B.
1990-01-01
Theoretical and experimental convergence results are presented for nonlinear multigrid and iterative defect correction applied to finite volume discretizations of the full, steady, 2D, compressible Navier-Stokes equations. Iterative defect correction is introduced for circumventing the difficulty in
Multigrid Methods for EHL Problems
Nurgat, Elyas; Berzins, Martin
1996-01-01
In many bearings and contacts, forces are transmitted through thin continuous fluid films which separate two contacting elements. Objects in contact are normally subjected to friction and wear which can be reduced effectively by using lubricants. If the lubricant film is sufficiently thin to prevent the opposing solids from coming into contact and carries the entire load, then we have hydrodynamic lubrication, where the lubricant film is determined by the motion and geometry of the solids. However, for loaded contacts of low geometrical conformity, such as gears, rolling contact bearings and cams, this is not the case due to high pressures and this is referred to as Elasto-Hydrodynamic Lubrication (EHL) In EHL, elastic deformation of the contacting elements and the increase in fluid viscosity with pressure are very significant and cannot be ignored. Since the deformation results in changing the geometry of the lubricating film, which in turn determines the pressure distribution, an EHL mathematical model must simultaneously satisfy the complex elasticity (integral) and the Reynolds lubrication (differential) equations. The nonlinear and coupled nature of the two equations makes numerical calculations computationally intensive. This is especially true for highly loaded problems found in practice. One novel feature of these problems is that the solution may exhibit sharp pressure spikes in the outlet region. To this date both finite element and finite difference methods have been used to solve EHL problems with perhaps greater emphasis on the use of the finite difference approach. In both cases, a major computational difficulty is ensuring convergence of the nonlinear equations solver to a steady state solution. Two successful methods for achieving this are direct iteration and multigrid methods. Direct iteration methods (e.g Gauss Seidel) have long been used in conjunction with finite difference discretizations on regular meshes. Perhaps one of the best examples of
DEFF Research Database (Denmark)
Cai, Hongzhu; Čuma, Martin; Zhdanov, Michael
2015-01-01
This paper presents a parallelized version of the edge-based finite element method with a novel post-processing approach for numerical modeling of an electromagnetic field in complex media. The method uses an unstructured tetrahedral mesh which can reduce the number of degrees of freedom signific......This paper presents a parallelized version of the edge-based finite element method with a novel post-processing approach for numerical modeling of an electromagnetic field in complex media. The method uses an unstructured tetrahedral mesh which can reduce the number of degrees of freedom...... significantly. The linear system of finite element equations is solved using parallel direct solvers which are robust for ill-conditioned systems and efficient for multiple source electromagnetic (EM) modeling. We also introduce a novel approach to compute the scalar components of the electric field from...... the tangential components along each edge based on field redatuming. The method can produce a more accurate result as compared to conventional approach. We have applied the developed algorithm to compute the EM response for a typical 3D anisotropic geoelectrical model of the off-shore HC reservoir with complex...
Eigensolution of finite element problems in a completely connected parallel architecture
Akl, Fred A.; Morel, Michael R.
1989-01-01
A parallel algorithm for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi)=(M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q is presented. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
Parallel Object-Oriented Computation Applied to a Finite Element Problem
Directory of Open Access Journals (Sweden)
Jon B. Weissman
1993-01-01
Full Text Available The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application. Mentat, the implementation, and present performance results for both a Mentat and a hand-coded parallel Fortran version.
Parallel DSMC Solution of Three-Dimensional Flow Over a Finite Flat Plate
Nance, Robert P.; Wilmoth, Richard G.; Moon, Bongki; Hassan, H. A.; Saltz, Joel
1994-01-01
This paper describes a parallel implementation of the direct simulation Monte Carlo (DSMC) method. Runtime library support is used for scheduling and execution of communication between nodes, and domain decomposition is performed dynamically to maintain a good load balance. Performance tests are conducted using the code to evaluate various remapping and remapping-interval policies, and it is shown that a one-dimensional chain-partitioning method works best for the problems considered. The parallel code is then used to simulate the Mach 20 nitrogen flow over a finite-thickness flat plate. It is shown that the parallel algorithm produces results which compare well with experimental data. Moreover, it yields significantly faster execution times than the scalar code, as well as very good load-balance characteristics.
Algorithms and data structures for massively parallel generic adaptive finite element codes
Bangerth, Wolfgang
2011-12-01
Today\\'s largest supercomputers have 100,000s of processor cores and offer the potential to solve partial differential equations discretized by billions of unknowns. However, the complexity of scaling to such large machines and problem sizes has so far prevented the emergence of generic software libraries that support such computations, although these would lower the threshold of entry and enable many more applications to benefit from large-scale computing. We are concerned with providing this functionality for mesh-adaptive finite element computations. We assume the existence of an "oracle" that implements the generation and modification of an adaptive mesh distributed across many processors, and that responds to queries about its structure. Based on querying the oracle, we develop scalable algorithms and data structures for generic finite element methods. Specifically, we consider the parallel distribution of mesh data, global enumeration of degrees of freedom, constraints, and postprocessing. Our algorithms remove the bottlenecks that typically limit large-scale adaptive finite element analyses. We demonstrate scalability of complete finite element workflows on up to 16,384 processors. An implementation of the proposed algorithms, based on the open source software p4est as mesh oracle, is provided under an open source license through the widely used deal.II finite element software library. © 2011 ACM 0098-3500/2011/12-ART10 $10.00.
Higher-order ice-sheet modelling accelerated by multigrid on graphics cards
Brædstrup, Christian; Egholm, David
2013-04-01
Higher-order ice flow modelling is a very computer intensive process owing primarily to the nonlinear influence of the horizontal stress coupling. When applied for simulating long-term glacial landscape evolution, the ice-sheet models must consider very long time series, while both high temporal and spatial resolution is needed to resolve small effects. The use of higher-order and full stokes models have therefore seen very limited usage in this field. However, recent advances in graphics card (GPU) technology for high performance computing have proven extremely efficient in accelerating many large-scale scientific computations. The general purpose GPU (GPGPU) technology is cheap, has a low power consumption and fits into a normal desktop computer. It could therefore provide a powerful tool for many glaciologists working on ice flow models. Our current research focuses on utilising the GPU as a tool in ice-sheet and glacier modelling. To this extent we have implemented the Integrated Second-Order Shallow Ice Approximation (iSOSIA) equations on the device using the finite difference method. To accelerate the computations, the GPU solver uses a non-linear Red-Black Gauss-Seidel iterator coupled with a Full Approximation Scheme (FAS) multigrid setup to further aid convergence. The GPU finite difference implementation provides the inherent parallelization that scales from hundreds to several thousands of cores on newer cards. We demonstrate the efficiency of the GPU multigrid solver using benchmark experiments.
International Nuclear Information System (INIS)
Masoud Ziaei-Rad
2010-01-01
In this paper, a two-dimensional numerical scheme is presented for the simulation of turbulent, viscous, transient compressible flows in the simultaneously developing hydraulic and thermal boundary layer region. The numerical procedure is a finite-volume-based finite-element method applied to unstructured grids. This combination together with a new method applied for the boundary conditions allows for accurate computation of the variables in the entrance region and for a wide range of flow fields from subsonic to transonic. The Roe-Riemann solver is used for the convective terms, whereas the standard Galerkin technique is applied for the viscous terms. A modified κ-ε model with a two-layer equation for the near-wall region combined with a compressibility correction is used to predict the turbulent viscosity. Parallel processing is also employed to divide the computational domain among the different processors to reduce the computational time. The method is applied to some test cases in order to verify the numerical accuracy. The results show significant differences between incompressible and compressible flows in the friction coefficient, Nusselt number, shear stress and the ratio of the compressible turbulent viscosity to the molecular viscosity along the developing region. A transient flow generated after an accidental rupture in a pipeline was also studied as a test case. The results show that the present numerical scheme is stable, accurate and efficient enough to solve the problem of transient wall-bounded flow.
Simulation of incompressible flows with heat and mass transfer using parallel finite element method
Directory of Open Access Journals (Sweden)
Jalal Abedi
2003-02-01
Full Text Available The stabilized finite element formulations based on the SUPG (Stream-line-Upwind/Petrov-Galerkin and PSPG (Pressure-Stabilization/Petrov-Galerkin methods are developed and applied to solve buoyancy-driven incompressible flows with heat and mass transfer. The SUPG stabilization term allows us to solve flow problems at high speeds (advection dominant flows and the PSPG term eliminates instabilities associated with the use of equal order interpolation functions for both pressure and velocity. The finite element formulations are implemented in parallel using MPI. In parallel computations, the finite element mesh is partitioned into contiguous subdomains using METIS, which are then assigned to individual processors. To ensure a balanced load, the number of elements assigned to each processor is approximately equal. To solve nonlinear systems in large-scale applications, we developed a matrix-free GMRES iterative solver. Here we totally eliminate a need to form any matrices, even at the element levels. To measure the accuracy of the method, we solve 2D and 3D example of natural convection flows at moderate to high Rayleigh numbers.
Parallelized implicit propagators for the finite-difference Schrödinger equation
Parker, Jonathan; Taylor, K. T.
1995-08-01
We describe the application of block Gauss-Seidel and block Jacobi iterative methods to the design of implicit propagators for finite-difference models of the time-dependent Schrödinger equation. The block-wise iterative methods discussed here are mixed direct-iterative methods for solving simultaneous equations, in the sense that direct methods (e.g. LU decomposition) are used to invert certain block sub-matrices, and iterative methods are used to complete the solution. We describe parallel variants of the basic algorithm that are well suited to the medium- to coarse-grained parallelism of work-station clusters, and MIMD supercomputers, and we show that under a wide range of conditions, fine-grained parallelism of the computation can be achieved. Numerical tests are conducted on a typical one-electron atom Hamiltonian. The methods converge robustly to machine precision (15 significant figures), in some cases in as few as 6 or 7 iterations. The rate of convergence is nearly independent of the finite-difference grid-point separations.
A Dual Super-Element Domain Decomposition Approach for Parallel Nonlinear Finite Element Analysis
Jokhio, G. A.; Izzuddin, B. A.
2015-05-01
This article presents a new domain decomposition method for nonlinear finite element analysis introducing the concept of dual partition super-elements. The method extends ideas from the displacement frame method and is ideally suited for parallel nonlinear static/dynamic analysis of structural systems. In the new method, domain decomposition is realized by replacing one or more subdomains in a "parent system," each with a placeholder super-element, where the subdomains are processed separately as "child partitions," each wrapped by a dual super-element along the partition boundary. The analysis of the overall system, including the satisfaction of equilibrium and compatibility at all partition boundaries, is realized through direct communication between all pairs of placeholder and dual super-elements. The proposed method has particular advantages for matrix solution methods based on the frontal scheme, and can be readily implemented for existing finite element analysis programs to achieve parallelization on distributed memory systems with minimal intervention, thus overcoming memory bottlenecks typically faced in the analysis of large-scale problems. Several examples are presented in this article which demonstrate the computational benefits of the proposed parallel domain decomposition approach and its applicability to the nonlinear structural analysis of realistic structural systems.
Multigrid treatment of implicit continuum diffusion
Francisquez, Manaure; Zhu, Ben; Rogers, Barrett
2017-10-01
Implicit treatment of diffusive terms of various differential orders common in continuum mechanics modeling, such as computational fluid dynamics, is investigated with spectral and multigrid algorithms in non-periodic 2D domains. In doubly periodic time dependent problems these terms can be efficiently and implicitly handled by spectral methods, but in non-periodic systems solved with distributed memory parallel computing and 2D domain decomposition, this efficiency is lost for large numbers of processors. We built and present here a multigrid algorithm for these types of problems which outperforms a spectral solution that employs the highly optimized FFTW library. This multigrid algorithm is not only suitable for high performance computing but may also be able to efficiently treat implicit diffusion of arbitrary order by introducing auxiliary equations of lower order. We test these solvers for fourth and sixth order diffusion with idealized harmonic test functions as well as a turbulent 2D magnetohydrodynamic simulation. It is also shown that an anisotropic operator without cross-terms can improve model accuracy and speed, and we examine the impact that the various diffusion operators have on the energy, the enstrophy, and the qualitative aspect of a simulation. This work was supported by DOE-SC-0010508. This research used resources of the National Energy Research Scientific Computing Center (NERSC).
Evaluating the performance of the particle finite element method in parallel architectures
Gimenez, Juan M.; Nigro, Norberto M.; Idelsohn, Sergio R.
2014-05-01
This paper presents a high performance implementation for the particle-mesh based method called particle finite element method two (PFEM-2). It consists of a material derivative based formulation of the equations with a hybrid spatial discretization which uses an Eulerian mesh and Lagrangian particles. The main aim of PFEM-2 is to solve transport equations as fast as possible keeping some level of accuracy. The method was found to be competitive with classical Eulerian alternatives for these targets, even in their range of optimal application. To evaluate the goodness of the method with large simulations, it is imperative to use of parallel environments. Parallel strategies for Finite Element Method have been widely studied and many libraries can be used to solve Eulerian stages of PFEM-2. However, Lagrangian stages, such as streamline integration, must be developed considering the parallel strategy selected. The main drawback of PFEM-2 is the large amount of memory needed, which limits its application to large problems with only one computer. Therefore, a distributed-memory implementation is urgently needed. Unlike a shared-memory approach, using domain decomposition the memory is automatically isolated, thus avoiding race conditions; however new issues appear due to data distribution over the processes. Thus, a domain decomposition strategy for both particle and mesh is adopted, which minimizes the communication between processes. Finally, performance analysis running over multicore and multinode architectures are presented. The Courant-Friedrichs-Lewy number used influences the efficiency of the parallelization and, in some cases, a weighted partitioning can be used to improve the speed-up. However the total cputime for cases presented is lower than that obtained when using classical Eulerian strategies.
Design Considerations for a Flexible Multigrid Preconditioning Library
Directory of Open Access Journals (Sweden)
Jérémie Gaidamour
2012-01-01
Full Text Available MueLu is a library within the Trilinos software project [An overview of Trilinos, Technical Report SAND2003-2927, Sandia National Laboratories, 2003] and provides a framework for parallel multigrid preconditioning methods for large sparse linear systems. While providing efficient implementations of modern multigrid methods based on smoothed aggregation and energy minimization concepts, MueLu is designed to be customized and extended. This article gives an overview of design considerations for the MueLu package: user interfaces, internal design, data management, usage of modern software constructs, leveraging Trilinos capabilities, linear algebra operations and advanced application.
Advanced Algebraic Multigrid Solvers for Subsurface Flow Simulation
Chen, Meng-Huo
2015-09-13
In this research we are particularly interested in extending the robustness of multigrid solvers to encounter complex systems related to subsurface reservoir applications for flow problems in porous media. In many cases, the step for solving the pressure filed in subsurface flow simulation becomes a bottleneck for the performance of the simulator. For solving large sparse linear system arising from MPFA discretization, we choose multigrid methods as the linear solver. The possible difficulties and issues will be addressed and the corresponding remedies will be studied. As the multigrid methods are used as the linear solver, the simulator can be parallelized (although not trivial) and the high-resolution simulation become feasible, the ultimately goal which we desire to achieve.
Mesh Partitioning Algorithm Based on Parallel Finite Element Analysis and Its Actualization
Directory of Open Access Journals (Sweden)
Lei Zhang
2013-01-01
Full Text Available In parallel computing based on finite element analysis, domain decomposition is a key technique for its preprocessing. Generally, a domain decomposition of a mesh can be realized through partitioning of a graph which is converted from a finite element mesh. This paper discusses the method for graph partitioning and the way to actualize mesh partitioning. Relevant softwares are introduced, and the data structure and key functions of Metis and ParMetis are introduced. The writing, compiling, and testing of the mesh partitioning interface program based on these key functions are performed. The results indicate some objective law and characteristics to guide the users who use the graph partitioning algorithm and software to write PFEM program, and ideal partitioning effects can be achieved by actualizing mesh partitioning through the program. The interface program can also be used directly by the engineering researchers as a module of the PFEM software. So that it can reduce the application of the threshold of graph partitioning algorithm, improve the calculation efficiency, and promote the application of graph theory and parallel computing.
A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows
Bui, Trong T.
1999-01-01
A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
International Nuclear Information System (INIS)
Nakamachi, Eiji
2005-01-01
A crystallographic homogenization procedure is introduced to the conventional static-explicit and dynamic-explicit finite element formulation to develop a multi scale - double scale - analysis code to predict the plastic strain induced texture evolution, yield loci and formability of sheet metal. The double-scale structure consists of a crystal aggregation - micro-structure - and a macroscopic elastic plastic continuum. At first, we measure crystal morphologies by using SEM-EBSD apparatus, and define a unit cell of micro structure, which satisfy the periodicity condition in the real scale of polycrystal. Next, this crystallographic homogenization FE code is applied to 3N pure-iron and 'Benchmark' aluminum A6022 polycrystal sheets. It reveals that the initial crystal orientation distribution - the texture - affects very much to a plastic strain induced texture and anisotropic hardening evolutions and sheet deformation. Since, the multi-scale finite element analysis requires a large computation time, a parallel computing technique by using PC cluster is developed for a quick calculation. In this parallelization scheme, a dynamic workload balancing technique is introduced for quick and efficient calculations
Fast multigrid-based computation of the induced electric field for transcranial magnetic stimulation
Laakso, Ilkka; Hirata, Akimasa
2012-12-01
In transcranial magnetic stimulation (TMS), the distribution of the induced electric field, and the affected brain areas, depends on the position of the stimulation coil and the individual geometry of the head and brain. The distribution of the induced electric field in realistic anatomies can be modelled using computational methods. However, existing computational methods for accurately determining the induced electric field in realistic anatomical models have suffered from long computation times, typically in the range of tens of minutes or longer. This paper presents a matrix-free implementation of the finite-element method with a geometric multigrid method that can potentially reduce the computation time to several seconds or less even when using an ordinary computer. The performance of the method is studied by computing the induced electric field in two anatomically realistic models. An idealized two-loop coil is used as the stimulating coil. Multiple computational grid resolutions ranging from 2 to 0.25 mm are used. The results show that, for macroscopic modelling of the electric field in an anatomically realistic model, computational grid resolutions of 1 mm or 2 mm appear to provide good numerical accuracy compared to higher resolutions. The multigrid iteration typically converges in less than ten iterations independent of the grid resolution. Even without parallelization, each iteration takes about 1.0 s or 0.1 s for the 1 and 2 mm resolutions, respectively. This suggests that calculating the electric field with sufficient accuracy in real time is feasible.
Parallelized Three-Dimensional Resistivity Inversion Using Finite Elements And Adjoint State Methods
Schaa, Ralf; Gross, Lutz; Du Plessis, Jaco
2015-04-01
resistivity. The Hessian of the regularization term is used as preconditioner which requires an additional PDE solution in each iteration step. As it turns out, the relevant PDEs are naturally formulated in the finite element framework. Using the domain decomposition method provided in Escript, the inversion scheme has been parallelized for distributed memory computers with multi-core shared memory nodes. We show numerical examples from simple layered models to complex 3D models and compare with the results from other methods. The inversion scheme is furthermore tested on a field data example to characterise localised freshwater discharge in a coastal environment.. References: L. Gross and C. Kemp (2013) Large Scale Joint Inversion of Geophysical Data using the Finite Element Method in escript. ASEG Extended Abstracts 2013, http://dx.doi.org/10.1071/ASEG2013ab306
A parallel direct solver for the self-adaptive hp Finite Element Method
Paszyński, Maciej R.
2010-03-01
In this paper we present a new parallel multi-frontal direct solver, dedicated for the hp Finite Element Method (hp-FEM). The self-adaptive hp-FEM generates in a fully automatic mode, a sequence of hp-meshes delivering exponential convergence of the error with respect to the number of degrees of freedom (d.o.f.) as well as the CPU time, by performing a sequence of hp refinements starting from an arbitrary initial mesh. The solver constructs an initial elimination tree for an arbitrary initial mesh, and expands the elimination tree each time the mesh is refined. This allows us to keep track of the order of elimination for the solver. The solver also minimizes the memory usage, by de-allocating partial LU factorizations computed during the elimination stage of the solver, and recomputes them for the backward substitution stage, by utilizing only about 10% of the computational time necessary for the original computations. The solver has been tested on 3D Direct Current (DC) borehole resistivity measurement simulations problems. We measure the execution time and memory usage of the solver over a large regular mesh with 1.5 million degrees of freedom as well as on the highly non-regular mesh, generated by the self-adaptive h p-FEM, with finite elements of various sizes and polynomial orders of approximation varying from p = 1 to p = 9. From the presented experiments it follows that the parallel solver scales well up to the maximum number of utilized processors. The limit for the solver scalability is the maximum sequential part of the algorithm: the computations of the partial LU factorizations over the longest path, coming from the root of the elimination tree down to the deepest leaf. © 2009 Elsevier Inc. All rights reserved.
International Nuclear Information System (INIS)
Lu Yujie; Zhu Banghe; Rasmussen, John C; Sevick-Muraca, Eva M; Shen Haiou; Wang Ge
2010-01-01
Fluorescence molecular imaging/tomography may play an important future role in preclinical research and clinical diagnostics. Time- and frequency-domain fluorescence imaging can acquire more measurement information than the continuous wave (CW) counterpart, improving the image quality of fluorescence molecular tomography. Although diffusion approximation (DA) theory has been extensively applied in optical molecular imaging, high-order photon migration models need to be further investigated to match quantitation provided by nuclear imaging. In this paper, a frequency-domain parallel adaptive finite element solver is developed with simplified spherical harmonics (SP N ) approximations. To fully evaluate the performance of the SP N approximations, a fast time-resolved tetrahedron-based Monte Carlo fluorescence simulator suitable for complex heterogeneous geometries is developed using a convolution strategy to realize the simulation of the fluorescence excitation and emission. The validation results show that high-order SP N can effectively correct the modeling errors of the diffusion equation, especially when the tissues have high absorption characteristics or when high modulation frequency measurements are used. Furthermore, the parallel adaptive mesh evolution strategy improves the modeling precision and the simulation speed significantly on a realistic digital mouse phantom. This solver is a promising platform for fluorescence molecular tomography using high-order approximations to the radiative transfer equation.
Solving the Fluid Pressure Poisson Equation Using Multigrid-Evaluation and Improvements.
Dick, Christian; Rogowsky, Marcus; Westermann, Rudiger
2016-11-01
In many numerical simulations of fluids governed by the incompressible Navier-Stokes equations, the pressure Poisson equation needs to be solved to enforce mass conservation. Multigrid solvers show excellent convergence in simple scenarios, yet they can converge slowly in domains where physically separated regions are combined at coarser scales. Moreover, existing multigrid solvers are tailored to specific discretizations of the pressure Poisson equation, and they cannot easily be adapted to other discretizations. In this paper we analyze the convergence properties of existing multigrid solvers for the pressure Poisson equation in different simulation domains, and we show how to further improve the multigrid convergence rate by using a graph-based extension to determine the coarse grid hierarchy. The proposed multigrid solver is generic in that it can be applied to different kinds of discretizations of the pressure Poisson equation, by using solely the specification of the simulation domain and pre-assembled computational stencils. We analyze the proposed solver in combination with finite difference and finite volume discretizations of the pressure Poisson equation. Our evaluations show that, despite the common assumption, multigrid schemes can exploit their potential even in the most complicated simulation scenarios, yet this behavior is obtained at the price of higher memory consumption.
On a multigrid method for the coupled Stokes and porous media flow problem
Luo, P.; Rodrigo, C.; Gaspar, F. J.; Oosterlee, C. W.
2017-07-01
The multigrid solution of coupled porous media and Stokes flow problems is considered. The Darcy equation as the saturated porous medium model is coupled to the Stokes equations by means of appropriate interface conditions. We focus on an efficient multigrid solution technique for the coupled problem, which is discretized by finite volumes on staggered grids, giving rise to a saddle point linear system. Special treatment is required regarding the discretization at the interface. An Uzawa smoother is employed in multigrid, which is a decoupled procedure based on symmetric Gauss-Seidel smoothing for velocity components and a simple Richardson iteration for the pressure field. Since a relaxation parameter is part of a Richardson iteration, Local Fourier Analysis (LFA) is applied to determine the optimal parameters. Highly satisfactory multigrid convergence is reported, and, moreover, the algorithm performs well for small values of the hydraulic conductivity and fluid viscosity, that are relevant for applications.
Cai, Yong; Cui, Xiangyang; Li, Guangyao; Liu, Wenyang
2018-04-01
The edge-smooth finite element method (ES-FEM) can improve the computational accuracy of triangular shell elements and the mesh partition efficiency of complex models. In this paper, an approach is developed to perform explicit finite element simulations of contact-impact problems with a graphical processing unit (GPU) using a special edge-smooth triangular shell element based on ES-FEM. Of critical importance for this problem is achieving finer-grained parallelism to enable efficient data loading and to minimize communication between the device and host. Four kinds of parallel strategies are then developed to efficiently solve these ES-FEM based shell element formulas, and various optimization methods are adopted to ensure aligned memory access. Special focus is dedicated to developing an approach for the parallel construction of edge systems. A parallel hierarchy-territory contact-searching algorithm (HITA) and a parallel penalty function calculation method are embedded in this parallel explicit algorithm. Finally, the program flow is well designed, and a GPU-based simulation system is developed, using Nvidia's CUDA. Several numerical examples are presented to illustrate the high quality of the results obtained with the proposed methods. In addition, the GPU-based parallel computation is shown to significantly reduce the computing time.
Feasibility Study of Parallel Finite Element Analysis on Cluster-of-Clusters
Muraoka, Masae; Okuda, Hiroshi
With the rapid growth of WAN infrastructure and development of Grid middleware, it's become a realistic and attractive methodology to connect cluster machines on wide-area network for the execution of computation-demanding applications. Many existing parallel finite element (FE) applications have been, however, designed and developed with a single computing resource in mind, since such applications require frequent synchronization and communication among processes. There have been few FE applications that can exploit the distributed environment so far. In this study, we explore the feasibility of FE applications on the cluster-of-clusters. First, we classify FE applications into two types, tightly coupled applications (TCA) and loosely coupled applications (LCA) based on their communication pattern. A prototype of each application is implemented on the cluster-of-clusters. We perform numerical experiments executing TCA and LCA on both the cluster-of-clusters and a single cluster. Thorough these experiments, by comparing the performances and communication cost in each case, we evaluate the feasibility of FEA on the cluster-of-clusters.
Inversion of potential field data using the finite element method on parallel computers
Gross, L.; Altinay, C.; Shaw, S.
2015-11-01
In this paper we present a formulation of the joint inversion of potential field anomaly data as an optimization problem with partial differential equation (PDE) constraints. The problem is solved using the iterative Broyden-Fletcher-Goldfarb-Shanno (BFGS) method with the Hessian operator of the regularization and cross-gradient component of the cost function as preconditioner. We will show that each iterative step requires the solution of several PDEs namely for the potential fields, for the adjoint defects and for the application of the preconditioner. In extension to the traditional discrete formulation the BFGS method is applied to continuous descriptions of the unknown physical properties in combination with an appropriate integral form of the dot product. The PDEs can easily be solved using standard conforming finite element methods (FEMs) with potentially different resolutions. For two examples we demonstrate that the number of PDE solutions required to reach a given tolerance in the BFGS iteration is controlled by weighting regularization and cross-gradient but is independent of the resolution of PDE discretization and that as a consequence the method is weakly scalable with the number of cells on parallel computers. We also show a comparison with the UBC-GIF GRAV3D code.
Multigrid solution of diffusion equations on distributed memory multiprocessor systems
International Nuclear Information System (INIS)
Finnemann, H.
1988-01-01
The subject is the solution of partial differential equations for simulation of the reactor core on high-performance computers. The parallelization and implementation of nodal multigrid diffusion algorithms on array and ring configurations of the DIRMU multiprocessor system is outlined. The particular iteration scheme employed in the nodal expansion method appears similarly efficient in serial and parallel environments. The combination of modern multi-level techniques with innovative hardware (vector-multiprocessor systems) provides powerful tools needed for real time simulation of physical systems. The parallel efficiencies range from 70 to 90%. The same performance is estimated for large problems on large multiprocessor systems being designed at present. (orig.) [de
Self-correcting Multigrid Solver
International Nuclear Information System (INIS)
Lewandowski, Jerome L.V.
2004-01-01
A new multigrid algorithm based on the method of self-correction for the solution of elliptic problems is described. The method exploits information contained in the residual to dynamically modify the source term (right-hand side) of the elliptic problem. It is shown that the self-correcting solver is more efficient at damping the short wavelength modes of the algebraic error than its standard equivalent. When used in conjunction with a multigrid method, the resulting solver displays an improved convergence rate with no additional computational work
Schwing, Alan Michael
For computational fluid dynamics, the governing equations are solved on a discretized domain of nodes, faces, and cells. The quality of the grid or mesh can be a driving source for error in the results. While refinement studies can help guide the creation of a mesh, grid quality is largely determined by user expertise and understanding of the flow physics. Adaptive mesh refinement is a technique for enriching the mesh during a simulation based on metrics for error, impact on important parameters, or location of important flow features. This can offload from the user some of the difficult and ambiguous decisions necessary when discretizing the domain. This work explores the implementation of adaptive mesh refinement in an implicit, unstructured, finite-volume solver. Consideration is made for applying modern computational techniques in the presence of hanging nodes and refined cells. The approach is developed to be independent of the flow solver in order to provide a path for augmenting existing codes. It is designed to be applicable for unsteady simulations and refinement and coarsening of the grid does not impact the conservatism of the underlying numerics. The effect on high-order numerical fluxes of fourth- and sixth-order are explored. Provided the criteria for refinement is appropriately selected, solutions obtained using adapted meshes have no additional error when compared to results obtained on traditional, unadapted meshes. In order to leverage large-scale computational resources common today, the methods are parallelized using MPI. Parallel performance is considered for several test problems in order to assess scalability of both adapted and unadapted grids. Dynamic repartitioning of the mesh during refinement is crucial for load balancing an evolving grid. Development of the methods outlined here depend on a dual-memory approach that is described in detail. Validation of the solver developed here against a number of motivating problems shows favorable
Layout optimization with algebraic multigrid methods
Regler, Hans; Ruede, Ulrich
1993-01-01
Finding the optimal position for the individual cells (also called functional modules) on the chip surface is an important and difficult step in the design of integrated circuits. This paper deals with the problem of relative placement, that is the minimization of a quadratic functional with a large, sparse, positive definite system matrix. The basic optimization problem must be augmented by constraints to inhibit solutions where cells overlap. Besides classical iterative methods, based on conjugate gradients (CG), we show that algebraic multigrid methods (AMG) provide an interesting alternative. For moderately sized examples with about 10000 cells, AMG is already competitive with CG and is expected to be superior for larger problems. Besides the classical 'multiplicative' AMG algorithm where the levels are visited sequentially, we propose an 'additive' variant of AMG where levels may be treated in parallel and that is suitable as a preconditioner in the CG algorithm.
Nonlinear Multigrid solver exploiting AMGe Coarse Spaces with Approximation Properties
DEFF Research Database (Denmark)
Christensen, Max la Cour; Villa, Umberto; Engsig-Karup, Allan Peter
The paper introduces a nonlinear multigrid solver for mixed finite element discretizations based on the Full Approximation Scheme (FAS) and element-based Algebraic Multigrid (AMGe). The main motivation to use FAS for unstructured problems is the guaranteed approximation property of the AMGe coarse...... properties of the coarse spaces. With coarse spaces with approximation properties, our FAS approach on unstructured meshes has the ability to be as powerful/successful as FAS on geometrically refined meshes. For comparison, Newton’s method and Picard iterations with an inner state-of-the-art linear solver...... are compared to FAS on a nonlinear saddle point problem with applications to porous media flow. It is demonstrated that FAS is faster than Newton’s method and Picard iterations for the experiments considered here. Due to the guaranteed approximation properties of our AMGe, the coarse spaces are very accurate...
Multigrid for Staggered Lattice Fermions
Energy Technology Data Exchange (ETDEWEB)
Brower, Richard C. [Boston U.; Clark, M. A. [Unlisted, US; Strelchenko, Alexei [Fermilab; Weinberg, Evan [Boston U.
2018-01-23
Critical slowing down in Krylov methods for the Dirac operator presents a major obstacle to further advances in lattice field theory as it approaches the continuum solution. Here we formulate a multi-grid algorithm for the Kogut-Susskind (or staggered) fermion discretization which has proven difficult relative to Wilson multigrid due to its first-order anti-Hermitian structure. The solution is to introduce a novel spectral transformation by the K\\"ahler-Dirac spin structure prior to the Galerkin projection. We present numerical results for the two-dimensional, two-flavor Schwinger model, however, the general formalism is agnostic to dimension and is directly applicable to four-dimensional lattice QCD.
International Nuclear Information System (INIS)
Barry, J.M.; Pollard, J.P.
1986-11-01
A FORTRAN subroutine MLTGRD is provided to solve efficiently the large systems of linear equations arising from a five-point finite difference discretisation of some elliptic partial differential equations. MLTGRD is a multigrid algorithm which provides multiplicative correction to iterative solution estimates from successively reduced systems of linear equations. It uses the method of implicit non-stationary iteration for all grid levels
Multigrid Reduction in Time for Nonlinear Parabolic Problems
Energy Technology Data Exchange (ETDEWEB)
Falgout, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Manteuffel, T. A. [Univ. of Colorado, Boulder, CO (United States); O' Neill, B. [Univ. of Colorado, Boulder, CO (United States); Schroder, J. B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2016-01-04
The need for parallel-in-time is being driven by changes in computer architectures, where future speed-ups will be available through greater concurrency, but not faster clock speeds, which are stagnant.This leads to a bottleneck for sequential time marching schemes, because they lack parallelism in the time dimension. Multigrid Reduction in Time (MGRIT) is an iterative procedure that allows for temporal parallelism by utilizing multigrid reduction techniques and a multilevel hierarchy of coarse time grids. MGRIT has been shown to be effective for linear problems, with speedups of up to 50 times. The goal of this work is the efficient solution of nonlinear problems with MGRIT, where efficient is defined as achieving similar performance when compared to a corresponding linear problem. As our benchmark, we use the p-Laplacian, where p = 4 corresponds to a well-known nonlinear diffusion equation and p = 2 corresponds to our benchmark linear diffusion problem. When considering linear problems and implicit methods, the use of optimal spatial solvers such as spatial multigrid imply that the cost of one time step evaluation is fixed across temporal levels, which have a large variation in time step sizes. This is not the case for nonlinear problems, where the work required increases dramatically on coarser time grids, where relatively large time steps lead to worse conditioned nonlinear solves and increased nonlinear iteration counts per time step evaluation. This is the key difficulty explored by this paper. We show that by using a variety of strategies, most importantly, spatial coarsening and an alternate initial guess to the nonlinear time-step solver, we can reduce the work per time step evaluation over all temporal levels to a range similar with the corresponding linear problem. This allows for parallel scaling behavior comparable to the corresponding linear problem.
Parallel implementation of a dynamic unstructured chimera method in the DLR finite volume TAU-code
International Nuclear Information System (INIS)
Madrane, A.; Raichle, A.; Stuermer, A.
2004-01-01
Aerodynamic problems involving moving geometries have many applications, including store separation, high-speed train entering into a tunnel, simulation of full configurations of the helicopter and fast maneuverability. Overset grid method offers the option of calculating these procedures. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping unstructured grids that update and exchange boundary information through interpolation. However, such computations are complicated and time consuming. Parallel computing offers a very effective way to improve the productivity in doing computational fluid dynamics (CFD). Therefore the purpose of this study is to develop an efficient parallel computation algorithm for analyzing the flowfield of complex geometries using overset grids method. The strategy adopted in the parallelization of the overset grids method including the use of data structures and communication, is described. Numerical results are presented to demonstrate the efficiency of the resulting parallel overset grids method. (author)
Parallel implementation of a dynamic unstructured chimera method in the DLR finite volume TAU-code
Energy Technology Data Exchange (ETDEWEB)
Madrane, A.; Raichle, A.; Stuermer, A. [German Aerospace Center, DLR, Numerical Methods, Inst. of Aerodynamics and Flow Technology, Braunschweig (Germany)]. E-mail: aziz.madrane@dlr.de
2004-07-01
Aerodynamic problems involving moving geometries have many applications, including store separation, high-speed train entering into a tunnel, simulation of full configurations of the helicopter and fast maneuverability. Overset grid method offers the option of calculating these procedures. The solution process uses a grid system that discretizes the problem domain by using separately generated but overlapping unstructured grids that update and exchange boundary information through interpolation. However, such computations are complicated and time consuming. Parallel computing offers a very effective way to improve the productivity in doing computational fluid dynamics (CFD). Therefore the purpose of this study is to develop an efficient parallel computation algorithm for analyzing the flowfield of complex geometries using overset grids method. The strategy adopted in the parallelization of the overset grids method including the use of data structures and communication, is described. Numerical results are presented to demonstrate the efficiency of the resulting parallel overset grids method. (author)
Discrete Fourier analysis of multigrid algorithms
van der Vegt, Jacobus J.W.; Rhebergen, Sander
2011-01-01
The main topic of this report is a detailed discussion of the discrete Fourier multilevel analysis of multigrid algorithms. First, a brief overview of multigrid methods is given for discretizations of both linear and nonlinear partial differential equations. Special attention is given to the
International Nuclear Information System (INIS)
El-Wakil, S.A.; Sallah, M.; Degheidy, A.R.
2005-01-01
The time-dependent radiation transfer equation in plane geometry with Rayleigh scattering is studied. The traveling wave transformation is used to obtain the corresponding stationary-like equation. Pomraning-Eddington approximation is then used to calculate the radiation intensity in finite plane-parallel media. Numerical results and shielding calculations are shown for reflectivity and transmissivity at different times. The medium is assumed to have specular-reflecting boundaries. For the sake of comparison, two different weight functions are introduced and to force the boundary conditions to be fulfilled
International Nuclear Information System (INIS)
Candel, A.; Kabel, A.; Ko, K.; Lee, L.; Li, Z.; Limborg, C.; Ng, C.; Prudencio, E.; Schussman, G.; Uplenchwar, R.
2007-01-01
Over the past years, SLAC's Advanced Computations Department (ACD) has developed the parallel finite element (FE) particle-in-cell code Pic3P (Pic2P) for simulations of beam-cavity interactions dominated by space-charge effects. As opposed to standard space-charge dominated beam transport codes, which are based on the electrostatic approximation, Pic3P (Pic2P) includes space-charge, retardation and boundary effects as it self-consistently solves the complete set of Maxwell-Lorentz equations using higher-order FE methods on conformal meshes. Use of efficient, large-scale parallel processing allows for the modeling of photoinjectors with unprecedented accuracy, aiding the design and operation of the next-generation of accelerator facilities. Applications to the Linac Coherent Light Source (LCLS) RF gun are presented
An Optimal Order Nonnested Mixed Multigrid Method for Generalized Stokes Problems
Deng, Qingping
1996-01-01
A multigrid algorithm is developed and analyzed for generalized Stokes problems discretized by various nonnested mixed finite elements within a unified framework. It is abstractly proved by an element-independent analysis that the multigrid algorithm converges with an optimal order if there exists a 'good' prolongation operator. A technique to construct a 'good' prolongation operator for nonnested multilevel finite element spaces is proposed. Its basic idea is to introduce a sequence of auxiliary nested multilevel finite element spaces and define a prolongation operator as a composite operator of two single grid level operators. This makes not only the construction of a prolongation operator much easier (the final explicit forms of such prolongation operators are fairly simple), but the verification of the approximate properties for prolongation operators is also simplified. Finally, as an application, the framework and technique is applied to seven typical nonnested mixed finite elements.
Fast multigrid solution of the advection problem with closed characteristics
Energy Technology Data Exchange (ETDEWEB)
Yavneh, I. [Israel Inst. of Technology, Haifa (Israel); Venner, C.H. [Univ. of Twente, Enschede (Netherlands); Brandt, A. [Weizmann Inst. of Science, Rehovot (Israel)
1996-12-31
The numerical solution of the advection-diffusion problem in the inviscid limit with closed characteristics is studied as a prelude to an efficient high Reynolds-number flow solver. It is demonstrated by a heuristic analysis and numerical calculations that using upstream discretization with downstream relaxation-ordering and appropriate residual weighting in a simple multigrid V cycle produces an efficient solution process. We also derive upstream finite-difference approximations to the advection operator, whose truncation terms approximate {open_quotes}physical{close_quotes} (Laplacian) viscosity, thus avoiding spurious solutions to the homogeneous problem when the artificial diffusivity dominates the physical viscosity.
Kim, Jae Wook
2013-05-01
This paper proposes a novel systematic approach for the parallelization of pentadiagonal compact finite-difference schemes and filters based on domain decomposition. The proposed approach allows a pentadiagonal banded matrix system to be split into quasi-disjoint subsystems by using a linear-algebraic transformation technique. As a result the inversion of pentadiagonal matrices can be implemented within each subdomain in an independent manner subject to a conventional halo-exchange process. The proposed matrix transformation leads to new subdomain boundary (SB) compact schemes and filters that require three halo terms to exchange with neighboring subdomains. The internode communication overhead in the present approach is equivalent to that of standard explicit schemes and filters based on seven-point discretization stencils. The new SB compact schemes and filters demand additional arithmetic operations compared to the original serial ones. However, it is shown that the additional cost becomes sufficiently low by choosing optimal sizes of their discretization stencils. Compared to earlier published results, the proposed SB compact schemes and filters successfully reduce parallelization artifacts arising from subdomain boundaries to a level sufficiently negligible for sophisticated aeroacoustic simulations without degrading parallel efficiency. The overall performance and parallel efficiency of the proposed approach are demonstrated by stringent benchmark tests.
A parallel direct solver for the self-adaptive hp Finite Element Method
Paszyński, Maciej R.; Pardo, David; Torres-Verdí n, Carlos; Demkowicz, Leszek F.; Calo, Victor M.
2010-01-01
measurement simulations problems. We measure the execution time and memory usage of the solver over a large regular mesh with 1.5 million degrees of freedom as well as on the highly non-regular mesh, generated by the self-adaptive h p-FEM, with finite elements
International Nuclear Information System (INIS)
Vlad, M.; Spineanu, F.; Misguich, J.H.; Balescu, R.
2001-01-01
Particle diffusion in a given electrostatic turbulence with a finite correlation length along the confining magnetic field is studied in the test particle approach. An anomalous diffusion regime of amplified diffusion coefficients is found in the conditions when particle trapping in the structure of the stochastic potential is effective. The auto-generated radial electric field is calculated. (author)
Totally parallel multilevel algorithms
Frederickson, Paul O.
1988-01-01
Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which are referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.
Energy Technology Data Exchange (ETDEWEB)
Cwik, T. [California Institute of Technology, Pasadena, CA (United States); Katz, D.S. [Cray Research, El Segundo, CA (United States)
1996-12-31
Finite element modeling has proven useful for accurately simulating scattered or radiated electromagnetic fields from complex three-dimensional objects whose geometry varies on the scale of a fraction of an electrical wavelength. An unstructured finite element model of realistic objects leads to a large, sparse, system of equations that needs to be solved efficiently with regard to machine memory and execution time. Both factorization and iterative solvers can be used to produce solutions to these systems of equations. Factorization leads to high memory requirements that limit the electrical problem size of three-dimensional objects that can be modeled. An iterative solver can be used to efficiently solve the system without excessive memory use and in a minimal amount of time if the convergence rate is controlled.
Parametric instabilities of parallel propagating incoherent Alfven waves in a finite ion beta plasma
International Nuclear Information System (INIS)
Nariyuki, Y.; Hada, T.; Tsubouchi, K.
2007-01-01
Large amplitude, low-frequency Alfven waves constitute one of the most essential elements of magnetohydrodynamic (MHD) turbulence in the fast solar wind. Due to small collisionless dissipation rates, the waves can propagate long distances and efficiently convey such macroscopic quantities as momentum, energy, and helicity. Since loading of such quantities is completed when the waves damp away, it is important to examine how the waves can dissipate in the solar wind. Among various possible dissipation processes of the Alfven waves, parametric instabilities have been believed to be important. In this paper, we numerically discuss the parametric instabilities of coherent/incoherent Alfven waves in a finite ion beta plasma using a one-dimensional hybrid (superparticle ions plus an electron massless fluid) simulation, in order to explain local production of sunward propagating Alfven waves, as suggested by Helios/Ulysses observation results. Parameter studies clarify the dependence of parametric instabilities of coherent/incoherent Alfven waves on the ion and electron beta ratio. Parametric instabilities of coherent Alfven waves in a finite ion beta plasma are vastly different from those in the cold ions (i.e., MHD and/or Hall-MHD systems), even if the collisionless damping of the Alfven waves are neglected. Further, ''nonlinearly driven'' modulational instability is important for the dissipation of incoherent Alfven waves in a finite ion beta plasma regardless of their polarization, since the ion kinetic effects let both the right-hand and left-hand polarized waves become unstable to the modulational instability. The present results suggest that, although the antisunward propagating dispersive Alfven waves are efficiently dissipated through the parametric instabilities in a finite ion beta plasma, these instabilities hardly produce the sunward propagating waves
International Nuclear Information System (INIS)
Coulomb, F.
1989-06-01
The aim of this work is to study methods for solving the diffusion equation, based on a primal or mixed-dual finite elements discretization and well suited for use on multiprocessors computers; domain decomposition methods are the subject of the main part of this study, the linear systems being solved by the block-Jacobi method. The origin of the diffusion equation is explained in short, and various variational formulations are reminded. A survey of iterative methods is given. The elemination of the flux or current is treated in the case of a mixed method. Numerical tests are performed on two examples of reactors, in order to compare mixed elements and Lagrange elements. A theoretical study of domain decomposition is led in the case of Lagrange finite elements, and convergence conditions for the block-Jacobi method are derived; the dissection decomposition is previously the purpose of a particular numerical analysis. In the case of mixed-dual finite elements, a study is led on examples and is confirmed by numerical tests performed for the dissection decomposition; furthermore, after being justified, decompositions along axes of symmetry are numerically tested. In the case of a decomposition into two subdomains, the dissection decomposition and the decomposition with an integrated interface are compared. Alternative directions methods are defined; the convergence of those relative to Lagrange elements is shown; in the case of mixed elements, convergence conditions are found [fr
Extending the applicability of multigrid methods
International Nuclear Information System (INIS)
Brannick, J; Brezina, M; Falgout, R; Manteuffel, T; McCormick, S; Ruge, J; Sheehan, B; Xu, J; Zikatanov, L
2006-01-01
Multigrid methods are ideal for solving the increasingly large-scale problems that arise in numerical simulations of physical phenomena because of their potential for computational costs and memory requirements that scale linearly with the degrees of freedom. Unfortunately, they have been historically limited by their applicability to elliptic-type problems and the need for special handling in their implementation. In this paper, we present an overview of several recent theoretical and algorithmic advances made by the TOPS multigrid partners and their collaborators in extending applicability of multigrid methods. specific examples that are presented include quantum chromodynamics, radiation transport, and electromagnetics
Arteaga, Santiago Egido
1998-12-01
The steady-state Navier-Stokes equations are of considerable interest because they are used to model numerous common physical phenomena. The applications encountered in practice often involve small viscosities and complicated domain geometries, and they result in challenging problems in spite of the vast attention that has been dedicated to them. In this thesis we examine methods for computing the numerical solution of the primitive variable formulation of the incompressible equations on distributed memory parallel computers. We use the Galerkin method to discretize the differential equations, although most results are stated so that they apply also to stabilized methods. We also reformulate some classical results in a single framework and discuss some issues frequently dismissed in the literature, such as the implementation of pressure space basis and non- homogeneous boundary values. We consider three nonlinear methods: Newton's method, Oseen's (or Picard) iteration, and sequences of Stokes problems. All these iterative nonlinear methods require solving a linear system at every step. Newton's method has quadratic convergence while that of the others is only linear; however, we obtain theoretical bounds showing that Oseen's iteration is more robust, and we confirm it experimentally. In addition, although Oseen's iteration usually requires more iterations than Newton's method, the linear systems it generates tend to be simpler and its overall costs (in CPU time) are lower. The Stokes problems result in linear systems which are easier to solve, but its convergence is much slower, so that it is competitive only for large viscosities. Inexact versions of these methods are studied, and we explain why the best timings are obtained using relatively modest error tolerances in solving the corresponding linear systems. We also present a new damping optimization strategy based on the quadratic nature of the Navier-Stokes equations, which improves the robustness of all the
Directory of Open Access Journals (Sweden)
Daniel Marcsa
2015-01-01
Full Text Available The analysis and design of electromechanical devices involve the solution of large sparse linear systems, and require therefore high performance algorithms. In this paper, the primal Domain Decomposition Method (DDM with parallel forward-backward and with parallel Preconditioned Conjugate Gradient (PCG solvers are introduced in two-dimensional parallel time-stepping finite element formulation to analyze rotating machine considering the electromagnetic field, external circuit and rotor movement. The proposed parallel direct and the iterative solver with two preconditioners are analyzed concerning its computational efficiency and number of iterations of the solver with different preconditioners. Simulation results of a rotating machine is also presented.
Directory of Open Access Journals (Sweden)
Rek Václav
2016-11-01
Full Text Available In this paper, the form of modifications of the existing sequential code written in C or C++ programming language for the calculation of various kind of structures using the explicit form of the Finite Element Method (Dynamic Relaxation Method, Explicit Dynamics in the NEXX system is introduced. The NEXX system is the core of engineering software NEXIS, Scia Engineer, RFEM and RENEX. It has the possibilities of multithreaded running, which can now be supported at the level of native C++ programming language using standard libraries. Thanks to the high degree of abstraction that a contemporary C++ programming language provides, a respective library created in this way can be very generalized for other purposes of usage of parallelism in computational mechanics.
Summary Report: Multigrid for Systems of Elliptic PDEs
Energy Technology Data Exchange (ETDEWEB)
Lee, Barry [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2016-11-17
We are interested in determining if multigrid can be effectively applied to the system. The conclusion that I seem to be drawn to is that it is impossible to develop a blackbox multigrid solver for these general systems. Analysis of the system of PDEs must be conducted first to determine pre-processing procedures on the continuous problem before applying a multigrid method. Determining this pre-processing is currently not incorporated in black-box multigrid strategies. Nevertheless, we characterize some system features that will make the system more amenable to multigrid approaches, techniques that may lead to more amenable systems, and multigrid procedures that are generally more appropriate for these systems.
Directory of Open Access Journals (Sweden)
R. Daud
2013-06-01
Full Text Available Shielding interaction effects of two parallel edge cracks in finite thickness plates subjected to remote tension load is analyzed using a developed finite element analysis program. In the present study, the crack interaction limit is evaluated based on the fitness of service (FFS code, and focus is given to the weak crack interaction region as the crack interval exceeds the length of cracks (b > a. Crack interaction factors are evaluated based on stress intensity factors (SIFs for Mode I SIFs using a displacement extrapolation technique. Parametric studies involved a wide range of crack-to-width (0.05 ≤ a/W ≤ 0.5 and crack interval ratios (b/a > 1. For validation, crack interaction factors are compared with single edge crack SIFs as a state of zero interaction. Within the considered range of parameters, the proposed numerical evaluation used to predict the crack interaction factor reduces the error of existing analytical solution from 1.92% to 0.97% at higher a/W. In reference to FFS codes, the small discrepancy in the prediction of the crack interaction factor validates the reliability of the numerical model to predict crack interaction limits under shielding interaction effects. In conclusion, the numerical model gave a successful prediction in estimating the crack interaction limit, which can be used as a reference for the shielding orientation of other cracks.
Directory of Open Access Journals (Sweden)
Julián A García-Grajales
Full Text Available With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computational expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite--explicit and implicit--were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon
Directory of Open Access Journals (Sweden)
W.R. Azzam
2015-08-01
Full Text Available This paper reports the application of using a skirted foundation system to study the behavior of foundations with structural skirts adjacent to a sand slope and subjected to earthquake loading. The effect of the adopted skirts to safeguard foundation and slope from collapse is studied. The skirts effect on controlling horizontal soil movement and decreasing pore water pressure beneath foundations and beside the slopes during earthquake is investigated. This technique is investigated numerically using finite element analysis. A four story reinforced concrete building that rests on a raft foundation is idealized as a two-dimensional model with and without skirts. A two dimensional plain strain program PLAXIS, (dynamic version is adopted. A series of models for the problem under investigation were run under different skirt depths and lactation from the slope crest. The effect of subgrade relative density and skirts thickness is also discussed. Nodal displacement and element strains were analyzed for the foundation with and without skirts and at different studied parameters. The research results showed a great effectiveness in increasing the overall stability of the slope and foundation. The confined soil footing system by such skirts reduced the foundation acceleration therefore it can be tended to damping element and relieved the transmitted disturbance to the adjacent slope. This technique can be considered as a good method to control the slope deformation and decrease the slope acceleration during earthquakes.
Schultz, A.
2010-12-01
describe our ongoing efforts to achieve massive parallelization on a novel hybrid GPU testbed machine currently configured with 12 Intel Westmere Xeon CPU cores (or 24 parallel computational threads) with 96 GB DDR3 system memory, 4 GPU subsystems which in aggregate contain 960 NVidia Tesla GPU cores with 16 GB dedicated DDR3 GPU memory, and a second interleved bank of 4 GPU subsystems containing in aggregate 1792 NVidia Fermi GPU cores with 12 GB dedicated DDR5 GPU memory. We are applying domain decomposition methods to a modified version of Weiss' (2001) 3D frequency domain full physics EM finite difference code, an open source GPL licensed f90 code available for download from www.OpenEM.org. This will be the core of a new hybrid 3D inversion that parallelizes frequencies across CPUs and individual forward solutions across GPUs. We describe progress made in modifying the code to use direct solvers in GPU cores dedicated to each small subdomain, iteratively improving the solution by matching adjacent subdomain boundary solutions, rather than iterative Krylov space sparse solvers as currently applied to the whole domain.
Algorithms for computational fluid dynamics n parallel processors
International Nuclear Information System (INIS)
Van de Velde, E.F.
1986-01-01
A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries
Multigrid methods for partial differential equations - a short introduction
International Nuclear Information System (INIS)
Linden, J.; Stueben, K.
1993-01-01
These notes summarize the multigrid methods and emphasis is laid on the algorithmic concepts of multigrid for solving linear and non-linear partial differential equations. In this paper there is brief description of the basic structure of multigrid methods. Detailed introduction is also contained with applications to VLSI process simulation. (A.B.)
Energy Technology Data Exchange (ETDEWEB)
Paula, A.V. de, E-mail: vagtinski@mecanica.ufrgs.br [PROMEC – Programa de Pós Graduação em Engenharia Mecânica, UFRGS – Universidade Federal do Rio Grande do Sul, Porto Alegre, RS (Brazil); Möller, S.V., E-mail: svmoller@ufrgs.br [PROMEC – Programa de Pós Graduação em Engenharia Mecânica, UFRGS – Universidade Federal do Rio Grande do Sul, Porto Alegre, RS (Brazil)
2013-11-15
This paper presents a study of the bistable phenomenon which occurs in the turbulent flow impinging on circular cylinders placed side-by-side. Time series of axial and transversal velocity obtained with the constant temperature hot wire anemometry technique in an aerodynamic channel are used as input data in a finite mixture model, to classify the observed data according to a family of probability density functions. Wavelet transforms are applied to analyze the unsteady turbulent signals. Results of flow visualization show that the flow is predominantly two-dimensional. A double-well energy model is suggested to describe the behavior of the bistable phenomenon in this case. -- Highlights: ► Bistable flow on two parallel cylinders is studied with hot wire anemometry as a first step for the application on the analysis to tube bank flow. ► The method of maximum likelihood estimation is applied to hot wire experimental series to classify the data according to PDF functions in a mixture model approach. ► Results show no evident correlation between the changes of flow modes with time. ► An energy model suggests the presence of more than two flow modes.
A Pseudo-Temporal Multi-Grid Relaxation Scheme for Solving the Parabolized Navier-Stokes Equations
White, J. A.; Morrison, J. H.
1999-01-01
A multi-grid, flux-difference-split, finite-volume code, VULCAN, is presented for solving the elliptic and parabolized form of the equations governing three-dimensional, turbulent, calorically perfect and non-equilibrium chemically reacting flows. The space marching algorithms developed to improve convergence rate and or reduce computational cost are emphasized. The algorithms presented are extensions to the class of implicit pseudo-time iterative, upwind space-marching schemes. A full approximate storage, full multi-grid scheme is also described which is used to accelerate the convergence of a Gauss-Seidel relaxation method. The multi-grid algorithm is shown to significantly improve convergence on high aspect ratio grids.
Multicloud: Multigrid convergence with a meshless operator
International Nuclear Information System (INIS)
Katz, Aaron; Jameson, Antony
2009-01-01
The primary objective of this work is to develop and test a new convergence acceleration technique we call multicloud. Multicloud is well-founded in the mathematical basis of multigrid, but relies on a meshless operator on coarse levels. The meshless operator enables extremely simple and automatic coarsening procedures for arbitrary meshes using arbitrary fine level discretization schemes. The performance of multicloud is compared with established multigrid techniques for structured and unstructured meshes for the Euler equations on two-dimensional test cases. Results indicate comparable convergence rates per unit work for multicloud and multigrid. However, because of its mesh and scheme transparency, multicloud may be applied to a wide array of problems with no modification of fine level schemes as is often required with agglomeration techniques. The implication is that multicloud can be implemented in a completely modular fashion, allowing researchers to develop fine level algorithms independent of the convergence accelerator for complex three-dimensional problems.
New multigrid solver advances in TOPS
International Nuclear Information System (INIS)
Falgout, R D; Brannick, J; Brezina, M; Manteuffel, T; McCormick, S
2005-01-01
In this paper, we highlight new multigrid solver advances in the Terascale Optimal PDE Simulations (TOPS) project in the Scientific Discovery Through Advanced Computing (SciDAC) program. We discuss two new algebraic multigrid (AMG) developments in TOPS: the adaptive smoothed aggregation method (αSA) and a coarse-grid selection algorithm based on compatible relaxation (CR). The αSA method is showing promising results in initial studies for Quantum Chromodynamics (QCD) applications. The CR method has the potential to greatly improve the applicability of AMG
Experiences using multigrid for geothermal simulation
Energy Technology Data Exchange (ETDEWEB)
Bullivant, D.P.; O`Sullivan, M.J. [Univ. of Auckland (New Zealand); Yang, Z. [Univ. of New South Wales (Australia)
1995-03-01
Experiences of applying multigrid to the calculation of natural states for geothermal simulations are discussed. The modelling of natural states was chosen for this study because they can take a long time to compute and the computation is often dominated by the development of phase change boundaries that take up a small region in the simulation. For the first part of this work a modified version of TOUGH was used for 2-D vertical problems. A {open_quotes}test-bed{close_quotes} program is now being used to investigate some of the problems encountered with implementing multigrid. This is ongoing work. To date, there have been some encouraging but not startling results.
Highly indefinite multigrid for eigenvalue problems
Energy Technology Data Exchange (ETDEWEB)
Borges, L.; Oliveira, S.
1996-12-31
Eigenvalue problems are extremely important in understanding dynamic processes such as vibrations and control systems. Large scale eigenvalue problems can be very difficult to solve, especially if a large number of eigenvalues and the corresponding eigenvectors need to be computed. For solving this problem a multigrid preconditioned algorithm is presented in {open_quotes}The Davidson Algorithm, preconditioning and misconvergence{close_quotes}. Another approach for solving eigenvalue problems is by developing efficient solutions for highly indefinite problems. In this paper we concentrate on the use of new highly indefinite multigrid algorithms for the eigenvalue problem.
Efficient multigrid computation of steady hypersonic flows
Koren, B.; Hemker, P.W.; Murthy, T.K.S.
1991-01-01
In steady hypersonic flow computations, Newton iteration as a local relaxation procedure and nonlinear multigrid iteration as an acceleration procedure may both easily fail. In the present chapter, same remedies are presented for overcoming these problems. The equations considered are the steady,
The multigrid method for reactor calculations
International Nuclear Information System (INIS)
Douglas, S.R.
1991-07-01
Iterative solutions to linear systems of equations are discussed. The emphasis is on the concepts that affect convergence rates of these solution methods. The multigrid method is described, including the smoothing property, restriction, and prolongation. A simple example is used to illustrate the ideas
The Closest Point Method and Multigrid Solvers for Elliptic Equations on Surfaces
Chen, Yujia
2015-01-01
© 2015 Society for Industrial and Applied Mathematics. Elliptic partial differential equations are important from both application and analysis points of view. In this paper we apply the closest point method to solve elliptic equations on general curved surfaces. Based on the closest point representation of the underlying surface, we formulate an embedding equation for the surface elliptic problem, then discretize it using standard finite differences and interpolation schemes on banded but uniform Cartesian grids. We prove the convergence of the difference scheme for the Poisson\\'s equation on a smooth closed curve. In order to solve the resulting large sparse linear systems, we propose a specific geometric multigrid method in the setting of the closest point method. Convergence studies in both the accuracy of the difference scheme and the speed of the multigrid algorithm show that our approaches are effective.
Parallel computing solution of Boltzmann neutron transport equation
International Nuclear Information System (INIS)
Ansah-Narh, T.
2010-01-01
The focus of the research was on developing parallel computing algorithm for solving Eigen-values of the Boltzmam Neutron Transport Equation (BNTE) in a slab geometry using multi-grid approach. In response to the problem of slow execution of serial computing when solving large problems, such as BNTE, the study was focused on the design of parallel computing systems which was an evolution of serial computing that used multiple processing elements simultaneously to solve complex physical and mathematical problems. Finite element method (FEM) was used for the spatial discretization scheme, while angular discretization was accomplished by expanding the angular dependence in terms of Legendre polynomials. The eigenvalues representing the multiplication factors in the BNTE were determined by the power method. MATLAB Compiler Version 4.1 (R2009a) was used to compile the MATLAB codes of BNTE. The implemented parallel algorithms were enabled with matlabpool, a Parallel Computing Toolbox function. The option UseParallel was set to 'always' and the default value of the option was 'never'. When those conditions held, the solvers computed estimated gradients in parallel. The parallel computing system was used to handle all the bottlenecks in the matrix generated from the finite element scheme and each domain of the power method generated. The parallel algorithm was implemented on a Symmetric Multi Processor (SMP) cluster machine, which had Intel 32 bit quad-core x 86 processors. Convergence rates and timings for the algorithm on the SMP cluster machine were obtained. Numerical experiments indicated the designed parallel algorithm could reach perfect speedup and had good stability and scalability. (au)
International Nuclear Information System (INIS)
Da Silva, R S; De Carvalho, D K E; Antunes, A R E; Lyra, P R M; Willmersdorf, R B
2010-01-01
In this paper a finite volume method with a 'Modified Implicit Pressure, Explicit Saturation' (MIMPES) approach is used to model the 3-D incompressible and immiscible two-phase flow of water and oil in heterogeneous and anisotropic porous media. A vertex centered finite volume method with an edge-based data structure is adopted to discretize both the elliptic pressure and the hyperbolic saturation equations using parallel computers with distributed memory. Due to the explicit solution of the saturation equation in the IMPES method, severe time step restrictions are imposed on the simulation. In order to circumvent this problem, an edge-based implementation of the MIMPES method was used. In this method, the pressure equation is solved and the velocity field is computed much less frequently than the saturation field. Following the work of Hurtado, a mean relative variation of the velocity field throughout the simulation is used to automatically control the updating process, allowing for much larger time-steps in a very simple way. In order to run large scale problems, we have developed a parallel implementation using clusters of PC's. The simulator uses open source parallel libraries like FMDB, ParMetis and PETSc. Results of speed-up and efficiency are presented to validate the performance of the parallel simulator.
Multigrid for refined triangle meshes
Energy Technology Data Exchange (ETDEWEB)
Shapira, Yair
1997-02-01
A two-level preconditioning method for the solution of (locally) refined finite element schemes using triangle meshes is introduced. In the isotropic SPD case, it is shown that the condition number of the preconditioned stiffness matrix is bounded uniformly for all sufficiently regular triangulations. This is also verified numerically for an isotropic diffusion problem with highly discontinuous coefficients.
Energy Technology Data Exchange (ETDEWEB)
Seidl, V.
1997-11-01
A finite vomume method for calculation of steady and unsteady flow on unstructured grids is parallelized by local spatial and time decomposition. In the first case, a parallel variant of the conjugated gradient method with multiple local preconditioning is formulated and analyzed. The method is tested for simple applications (e.g. flow around a cylinder). The second part of the publication describes a direct numerical simulation of turbulent flow around a sphere at a Reynolds number of 5000 (based on flow velocity and sphere diameter). Current and Reynolds-averaged flow fields are discussed. Particular emphasis is placed on coordinate-independent representation of the anisotropy ratios of the Reynolds tensor and dissipation tensor. (orig.) [Deutsch] Ein Finite-Volumen-Verfahren fuer die Berechnung stationaerer und instationaerer Stroemungen auf unstrukturierten Netzen wird durch Gebietszerlegung im Raum und Zeit parallelisiert. Fuer die raeumliche Zerlegung wird eine parallele Variante der konjugierten Gradienten Methode mit mehrfacher, lokaler Vorkonditionierung formuliert und analysiert. Anhand einfacher Anwendungsbeispiele (Zylinderumstroemung, deckelgetriebene Nischenstroemung) wird das entwickelte Gesamtverfahren getestet und seine Effizienz bestimmt. Der zweite Teil der Arbeit beschreibt eine direkte numerische Simulation der turbulenten Kugelumstroemung bei einer Reynolds-Zahl von 5 000 (basierend auf Anstroemgeschwindigkeit und Kugeldurchmesser). In der Ergebnisauswertung werden augenblickliche und Reynolds-gemittelte Stroemungsfelder diskutiert und besonderer Wert auf eine koordinatenunabhaengige Darstellung der Anisotropieverhaeltnisse des Reynolds-Tensors und des Dissipationstensors gelegt. (orig.)
Moghaderi, Hamid; Dehghan, Mehdi; Donatelli, Marco; Mazza, Mariarosa
2017-12-01
Fractional diffusion equations (FDEs) are a mathematical tool used for describing some special diffusion phenomena arising in many different applications like porous media and computational finance. In this paper, we focus on a two-dimensional space-FDE problem discretized by means of a second order finite difference scheme obtained as combination of the Crank-Nicolson scheme and the so-called weighted and shifted Grünwald formula. By fully exploiting the Toeplitz-like structure of the resulting linear system, we provide a detailed spectral analysis of the coefficient matrix at each time step, both in the case of constant and variable diffusion coefficients. Such a spectral analysis has a very crucial role, since it can be used for designing fast and robust iterative solvers. In particular, we employ the obtained spectral information to define a Galerkin multigrid method based on the classical linear interpolation as grid transfer operator and damped-Jacobi as smoother, and to prove the linear convergence rate of the corresponding two-grid method. The theoretical analysis suggests that the proposed grid transfer operator is strong enough for working also with the V-cycle method and the geometric multigrid. On this basis, we introduce two computationally favourable variants of the proposed multigrid method and we use them as preconditioners for Krylov methods. Several numerical results confirm that the resulting preconditioning strategies still keep a linear convergence rate.
Parallel discontinuous Galerkin FEM for computing hyperbolic conservation law on unstructured grids
Ma, Xinrong; Duan, Zhijian
2018-04-01
High-order resolution Discontinuous Galerkin finite element methods (DGFEM) has been known as a good method for solving Euler equations and Navier-Stokes equations on unstructured grid, but it costs too much computational resources. An efficient parallel algorithm was presented for solving the compressible Euler equations. Moreover, the multigrid strategy based on three-stage three-order TVD Runge-Kutta scheme was used in order to improve the computational efficiency of DGFEM and accelerate the convergence of the solution of unsteady compressible Euler equations. In order to make each processor maintain load balancing, the domain decomposition method was employed. Numerical experiment performed for the inviscid transonic flow fluid problems around NACA0012 airfoil and M6 wing. The results indicated that our parallel algorithm can improve acceleration and efficiency significantly, which is suitable for calculating the complex flow fluid.
Asynchronous Task-Based Parallelization of Algebraic Multigrid
AlOnazi, Amani A.; Markomanolis, George S.; Keyes, David E.
2017-01-01
As processor clock rates become more dynamic and workloads become more adaptive, the vulnerability to global synchronization that already complicates programming for performance in today's petascale environment will be exacerbated. Algebraic
High-Fidelity RF Gun Simulations with the Parallel 3D Finite Element Particle-In-Cell Code Pic3P
Energy Technology Data Exchange (ETDEWEB)
Candel, A; Kabel, A.; Lee, L.; Li, Z.; Limborg, C.; Ng, C.; Schussman, G.; Ko, K.; /SLAC
2009-06-19
SLAC's Advanced Computations Department (ACD) has developed the first parallel Finite Element 3D Particle-In-Cell (PIC) code, Pic3P, for simulations of RF guns and other space-charge dominated beam-cavity interactions. Pic3P solves the complete set of Maxwell-Lorentz equations and thus includes space charge, retardation and wakefield effects from first principles. Pic3P uses higher-order Finite Elementmethods on unstructured conformal meshes. A novel scheme for causal adaptive refinement and dynamic load balancing enable unprecedented simulation accuracy, aiding the design and operation of the next generation of accelerator facilities. Application to the Linac Coherent Light Source (LCLS) RF gun is presented.
Directory of Open Access Journals (Sweden)
Xiaoqing Wang
2016-01-01
Full Text Available Parallel analyses about the dynamic responses of a large-scale water conveyance tunnel under seismic excitation are presented in this paper. A full three-dimensional numerical model considering the water-tunnel-soil coupling is established and adopted to investigate the tunnel’s dynamic responses. The movement and sloshing of the internal water are simulated using the multi-material Arbitrary Lagrangian Eulerian (ALE method. Nonlinear fluid–structure interaction (FSI between tunnel and inner water is treated by using the penalty method. Nonlinear soil-structure interaction (SSI between soil and tunnel is dealt with by using the surface to surface contact algorithm. To overcome computing power limitations and to deal with such a large-scale calculation, a parallel algorithm based on the modified recursive coordinate bisection (MRCB considering the balance of SSI and FSI loads is proposed and used. The whole simulation is accomplished on Dawning 5000 A using the proposed MRCB based parallel algorithm optimized to run on supercomputers. The simulation model and the proposed approaches are validated by comparison with the added mass method. Dynamic responses of the tunnel are analyzed and the parallelism is discussed. Besides, factors affecting the dynamic responses are investigated. Better speedup and parallel efficiency show the scalability of the parallel method and the analysis results can be used to aid in the design of water conveyance tunnels.
International Nuclear Information System (INIS)
Belliard, M.; Grandotto, M.
2003-01-01
In the framework of the two-phase fluid simulations of the steam generators of pressurized water nuclear reactors, we present in this paper a geometric version of a pseudo-Full MultiGrid (pseudo- FMG) Full Approximation Storage (FAS) preconditioning of balance equations in the GENEPI code. In our application, the 3D steady state flow is reached by a transient computation using a semi-implicit fractional step algorithm for the averaged two-phase mixture balance equations (mass, momentum and energy for the secondary flow). Our application, running on workstation clusters, is based on a CEA code-linker and the PVM package. The difficulties to apply the geometric FAS multigrid method to the momentum and mass balance equations are addressed. The use of a sequential pseudo-FMG FAS twogrid method for both energy and mass/momentum balance equations, using dynamic multigrid cycles, leads to perceptibly improvements in the computation convergences. An original parallel red-black pseudo-FMG FAS three-grid algorithm is presented too. The numerical tests (steam generator mockup simulations) underline the sizable increase in speed of convergence of the computations, essentially for the ones involving a large number of freedom degrees (about 100 thousand cells). The two-phase mixture balance equation residuals are quickly reduced: the reached speed-up stands between 2 and 3 following the number of grids. The effects on the convergence behavior of the numerical parameters are investigated
Multi-Grid detector for neutron spectroscopy: results obtained on time-of-flight spectrometer CNCS
Anastasopoulos, M.; Bebb, R.; Berry, K.; Birch, J.; Bryś, T.; Buffet, J.-C.; Clergeau, J.-F.; Deen, P. P.; Ehlers, G.; van Esch, P.; Everett, S. M.; Guerard, B.; Hall-Wilton, R.; Herwig, K.; Hultman, L.; Höglund, C.; Iruretagoiena, I.; Issa, F.; Jensen, J.; Khaplanov, A.; Kirstein, O.; Lopez Higuera, I.; Piscitelli, F.; Robinson, L.; Schmidt, S.; Stefanescu, I.
2017-04-01
The Multi-Grid detector technology has evolved from the proof-of-principle and characterisation stages. Here we report on the performance of the Multi-Grid detector, the MG.CNCS prototype, which has been installed and tested at the Cold Neutron Chopper Spectrometer, CNCS at SNS. This has allowed a side-by-side comparison to the performance of 3He detectors on an operational instrument. The demonstrator has an active area of 0.2 m2. It is specifically tailored to the specifications of CNCS. The detector was installed in June 2016 and has operated since then, collecting neutron scattering data in parallel to the He-3 detectors of CNCS. In this paper, we present a comprehensive analysis of this data, in particular on instrument energy resolution, rate capability, background and relative efficiency. Stability, gamma-ray and fast neutron sensitivity have also been investigated. The effect of scattering in the detector components has been measured and provides input to comparison for Monte Carlo simulations. All data is presented in comparison to that measured by the 3He detectors simultaneously, showing that all features recorded by one detector are also recorded by the other. The energy resolution matches closely. We find that the Multi-Grid is able to match the data collected by 3He, and see an indication of a considerable advantage in the count rate capability. Based on these results, we are confident that the Multi-Grid detector will be capable of producing high quality scientific data on chopper spectrometers utilising the unprecedented neutron flux of the ESS.
Directory of Open Access Journals (Sweden)
Makoto Ito
2015-11-01
Full Text Available Previous theoretical studies of animal and human behavioral learning have focused on the dichotomy of the value-based strategy using action value functions to predict rewards and the model-based strategy using internal models to predict environmental states. However, animals and humans often take simple procedural behaviors, such as the "win-stay, lose-switch" strategy without explicit prediction of rewards or states. Here we consider another strategy, the finite state-based strategy, in which a subject selects an action depending on its discrete internal state and updates the state depending on the action chosen and the reward outcome. By analyzing choice behavior of rats in a free-choice task, we found that the finite state-based strategy fitted their behavioral choices more accurately than value-based and model-based strategies did. When fitted models were run autonomously with the same task, only the finite state-based strategy could reproduce the key feature of choice sequences. Analyses of neural activity recorded from the dorsolateral striatum (DLS, the dorsomedial striatum (DMS, and the ventral striatum (VS identified significant fractions of neurons in all three subareas for which activities were correlated with individual states of the finite state-based strategy. The signal of internal states at the time of choice was found in DMS, and for clusters of states was found in VS. In addition, action values and state values of the value-based strategy were encoded in DMS and VS, respectively. These results suggest that both the value-based strategy and the finite state-based strategy are implemented in the striatum.
Ito, Makoto; Doya, Kenji
2015-11-01
Previous theoretical studies of animal and human behavioral learning have focused on the dichotomy of the value-based strategy using action value functions to predict rewards and the model-based strategy using internal models to predict environmental states. However, animals and humans often take simple procedural behaviors, such as the "win-stay, lose-switch" strategy without explicit prediction of rewards or states. Here we consider another strategy, the finite state-based strategy, in which a subject selects an action depending on its discrete internal state and updates the state depending on the action chosen and the reward outcome. By analyzing choice behavior of rats in a free-choice task, we found that the finite state-based strategy fitted their behavioral choices more accurately than value-based and model-based strategies did. When fitted models were run autonomously with the same task, only the finite state-based strategy could reproduce the key feature of choice sequences. Analyses of neural activity recorded from the dorsolateral striatum (DLS), the dorsomedial striatum (DMS), and the ventral striatum (VS) identified significant fractions of neurons in all three subareas for which activities were correlated with individual states of the finite state-based strategy. The signal of internal states at the time of choice was found in DMS, and for clusters of states was found in VS. In addition, action values and state values of the value-based strategy were encoded in DMS and VS, respectively. These results suggest that both the value-based strategy and the finite state-based strategy are implemented in the striatum.
International Nuclear Information System (INIS)
Malinowski, S.
1984-01-01
It is shown that the total field energy for general solutions of the sourceless Maxwell's equations with E-vector parallel to B-vector is infinite. Moreover the action and/or the ''pseudoscalar charge'' must be infinite too in this case. Therefore the expected similarity to the instanton or meron solutions of nonabelian gauge theories is illusory. 5 refs. (author)
Energy Technology Data Exchange (ETDEWEB)
McLay, R.T.; Carey, G.F.
1996-12-31
In this study we consider parallel solution of sparse linear systems arising from discretized PDE`s. As part of our continuing work on our parallel PCG Solver package, we have made improvements in two areas. The first is improving the performance of the matrix-vector product. Here on regular finite-difference grids, we are able to use the cache memory more efficiently for smaller domains or where there are multiple degrees of freedom. The second problem of interest in the present work is the construction of preconditioners in the context of the parallel PCG solver we are developing. Here the problem is partitioned over a set of processors subdomains and the matrix-vector product for PCG is carried out in parallel for overlapping grid subblocks. For problems of scaled speedup, the actual rate of convergence of the unpreconditioned system deteriorates as the mesh is refined. Multigrid and subdomain strategies provide a logical approach to resolving the problem. We consider the parallel trade-offs between communication and computation and provide a complexity analysis of a representative algorithm. Some preliminary calculations using the parallel package and comparisons with other preconditioners are provided together with parallel performance results.
International Nuclear Information System (INIS)
Onsager, T.G.; Winske, D.; Thomsen, M.F.
1991-01-01
The coupling of a finite-length, field-aligned, ion beam with a uniform background plasma is investigated using one-dimensional hybrid computer simulations. The finite-length beam is used to study the interaction between the incident solar wind and ions reflected from the Earth's quasi-parallel bow shock, where the reflection process may vary with time. The coupling between the reflected ions and the solar wind is relevant to ion heating at the bow shock and possibly to the formation of hot, flow anomalies and re-formation of the shock itself. The authors find that although there are many similarities between the instabilities driven by the finite-length beam and those predicted by linear theory for an infinite, homogeneous beam, there are also some important differences. Consistent with linear theory, the waves which dominate the interaction are the electromagnetic right-hand polarized resonant and nonresonant modes. However, in addition to the instability growth rates, the length of time that the waves are in contact with the beam is also an important factor in determining which wave mode will dominate the interaction. Whereas linear theory predicts the nonresonant mode to have the larger growth rate for the parameters they investigate, with finite-length beam they find that both the nonresonant and resonant modes contribute to the interaction. They find that the interaction will result in strong coupling, where a significant fraction of the available free energy is converted into thermal energy in a short time, provided the beam is sufficiently dense or sufficiently long
Kifonidis, K.; Müller, E.
2012-08-01
Aims: We describe and study a family of new multigrid iterative solvers for the multidimensional, implicitly discretized equations of hydrodynamics. Schemes of this class are free of the Courant-Friedrichs-Lewy condition. They are intended for simulations in which widely differing wave propagation timescales are present. A preferred solver in this class is identified. Applications to some simple stiff test problems that are governed by the compressible Euler equations, are presented to evaluate the convergence behavior, and the stability properties of this solver. Algorithmic areas are determined where further work is required to make the method sufficiently efficient and robust for future application to difficult astrophysical flow problems. Methods: The basic equations are formulated and discretized on non-orthogonal, structured curvilinear meshes. Roe's approximate Riemann solver and a second-order accurate reconstruction scheme are used for spatial discretization. Implicit Runge-Kutta (ESDIRK) schemes are employed for temporal discretization. The resulting discrete equations are solved with a full-coarsening, non-linear multigrid method. Smoothing is performed with multistage-implicit smoothers. These are applied here to the time-dependent equations by means of dual time stepping. Results: For steady-state problems, our results show that the efficiency of the present approach is comparable to the best implicit solvers for conservative discretizations of the compressible Euler equations that can be found in the literature. The use of red-black as opposed to symmetric Gauss-Seidel iteration in the multistage-smoother is found to have only a minor impact on multigrid convergence. This should enable scalable parallelization without having to seriously compromise the method's algorithmic efficiency. For time-dependent test problems, our results reveal that the multigrid convergence rate degrades with increasing Courant numbers (i.e. time step sizes). Beyond a
International Nuclear Information System (INIS)
Yamada, Tomonori
2010-01-01
The safety requirement of nuclear power plant attracts much attention nowadays. With the growing computing power, numerical simulation is one of key technologies to meet this safety requirement. Center for Computational Science and e-Systems of Japan Atomic Energy Agency has been developing a finite element analysis code for assembled structure to accurately evaluate the structural integrity of nuclear power plant in its entirety under seismic events. Because nuclear power plant is very huge assembled structure with tens of millions of mechanical components, the finite element model of each component is assembled into one structure and non-conforming meshes of mechanical components are bonded together inside the code. The main technique to bond these mechanical components is triple sparse matrix multiplication with multiple point constrains and global stiffness matrix. In our code, this procedure is conducted in a component by component manner, so that the working memory size and computing time for this multiplication are available on the current computing environment. As an illustrative example, seismic simulation of a real nuclear reactor of High Temperature engineering Test Reactor, which is located at the O-arai research and development center of JAEA, with 80 major mechanical components was conducted. Consequently, our code successfully simulated detailed elasto-plastic deformation of nuclear reactor and its computational performance was investigated. (author)
Multigrid methods for the computation of propagators in gauge fields
International Nuclear Information System (INIS)
Kalkreuter, T.
1992-11-01
In the present work generalizations of multigrid methods for propagators in gauge fields are investigated. We discuss proper averaging operations for bosons and for staggered fermions. An efficient algorithm for computing C numerically is presented. The averaging kernels C can be used not only in deterministic multigrid computations, but also in multigrid Monte Carlo simulations, and for the definition of block spins and blocked gauge fields in Monte Carlo renormalization group studies of gauge theories. Actual numerical computations of kernels and propagators are performed in compact four-dimensional SU(2) gauge fields. (orig./HSI)
Scalable multi-grid preconditioning techniques for the even-parity S_N solver in UNIC
International Nuclear Information System (INIS)
Mahadevan, Vijay S.; Smith, Michael A.
2011-01-01
The Even-parity neutron transport equation with FE-S_N discretization is solved traditionally using SOR preconditioned CG method at the lowest level of iterations in order to compute the criticality in reactor analysis problems. The use of high order isoparametric finite elements prohibits the formation of the discrete operator explicitly due to memory constraints in peta scale architectures. Hence, a h-p multi-grid preconditioner based on linear tessellation of the higher order mesh is introduced here for the space-angle system and compared against SOR and Algebraic MG black-box solvers. The performance and scalability of the multi-grid scheme was determined for two test problems and found to be competitive in terms of both computational time and memory requirements. The implementation of this preconditioner in an even-parity solver like UNIC from ANL can further enable high fidelity calculations in a scalable manner on peta flop machines. (author)
Progress with multigrid schemes for hypersonic flow problems
International Nuclear Information System (INIS)
Radespiel, R.; Swanson, R.C.
1995-01-01
Several multigrid schemes are considered for the numerical computation of viscous hypersonic flows. For each scheme, the basic solution algorithm employs upwind spatial discretization with explicit multistage time stepping. Two-level versions of the various multigrid algorithms are applied to the two-dimensional advection equation, and Fourier analysis is used to determine their damping properties. The capabilities of the multigrid methods are assessed by solving three different hypersonic flow problems. Some new multigrid schemes based on semicoarsening strategies are shown to be quite effective in relieving the stiffness caused by the high-aspect-ratio cells required to resolve high Reynolds number flows. These schemes exhibit good convergence rates for Reynolds numbers up to 200 X 10 6 and Mach numbers up to 25. 32 refs., 31 figs., 1 tab
Parallel Solver for H(div) Problems Using Hybridization and AMG
Energy Technology Data Exchange (ETDEWEB)
Lee, Chak S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Vassilevski, Panayot S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2016-01-15
In this paper, a scalable parallel solver is proposed for H(div) problems discretized by arbitrary order finite elements on general unstructured meshes. The solver is based on hybridization and algebraic multigrid (AMG). Unlike some previously studied H(div) solvers, the hybridization solver does not require discrete curl and gradient operators as additional input from the user. Instead, only some element information is needed in the construction of the solver. The hybridization results in a H1-equivalent symmetric positive definite system, which is then rescaled and solved by AMG solvers designed for H1 problems. Weak and strong scaling of the method are examined through several numerical tests. Our numerical results show that the proposed solver provides a promising alternative to ADS, a state-of-the-art solver [12], for H(div) problems. In fact, it outperforms ADS for higher order elements.
Neural multigrid for gauge theories and other disordered systems
International Nuclear Information System (INIS)
Baeker, M.; Kalkreuter, T.; Mack, G.; Speh, M.
1992-09-01
We present evidence that multigrid works for wave equations in disordered systems, e.g. in the presence of gauge fields, no matter how strong the disorder, but one needs to introduce a 'neural computations' point of view into large scale simulations: First, the system must learn how to do the simulations efficiently, then do the simulation (fast). The method can also be used to provide smooth interpolation kernels which are needed in multigrid Monte Carlo updates. (orig.)
Matrix-dependent multigrid-homogenization for diffusion problems
Energy Technology Data Exchange (ETDEWEB)
Knapek, S. [Institut fuer Informatik tu Muenchen (Germany)
1996-12-31
We present a method to approximately determine the effective diffusion coefficient on the coarse scale level of problems with strongly varying or discontinuous diffusion coefficients. It is based on techniques used also in multigrid, like Dendy`s matrix-dependent prolongations and the construction of coarse grid operators by means of the Galerkin approximation. In numerical experiments, we compare our multigrid-homogenization method with homogenization, renormalization and averaging approaches.
Scalable smoothing strategies for a geometric multigrid method for the immersed boundary equations
Energy Technology Data Exchange (ETDEWEB)
Bhalla, Amneet Pal Singh [Univ. of North Carolina, Chapel Hill, NC (United States); Knepley, Matthew G. [Rice Univ., Houston, TX (United States); Adams, Mark F. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Guy, Robert D. [Univ. of California, Davis, CA (United States); Griffith, Boyce E. [Univ. of North Carolina, Chapel Hill, NC (United States)
2016-12-20
The immersed boundary (IB) method is a widely used approach to simulating fluid-structure interaction (FSI). Although explicit versions of the IB method can suffer from severe time step size restrictions, these methods remain popular because of their simplicity and generality. In prior work (Guy et al., Adv Comput Math, 2015), some of us developed a geometric multigrid preconditioner for a stable semi-implicit IB method under Stokes flow conditions; however, this solver methodology used a Vanka-type smoother that presented limited opportunities for parallelization. This work extends this Stokes-IB solver methodology by developing smoothing techniques that are suitable for parallel implementation. Specifically, we demonstrate that an additive version of the Vanka smoother can yield an effective multigrid preconditioner for the Stokes-IB equations, and we introduce an efficient Schur complement-based smoother that is also shown to be effective for the Stokes-IB equations. We investigate the performance of these solvers for a broad range of material stiffnesses, both for Stokes flows and flows at nonzero Reynolds numbers, and for thick and thin structural models. We show here that linear solver performance degrades with increasing Reynolds number and material stiffness, especially for thin interface cases. Nonetheless, the proposed approaches promise to yield effective solution algorithms, especially at lower Reynolds numbers and at modest-to-high elastic stiffnesses.
A Critical Study of Agglomerated Multigrid Methods for Diffusion
Nishikawa, Hiroaki; Diskin, Boris; Thomas, James L.
2011-01-01
Agglomerated multigrid techniques used in unstructured-grid methods are studied critically for a model problem representative of laminar diffusion in the incompressible limit. The studied target-grid discretizations and discretizations used on agglomerated grids are typical of current node-centered formulations. Agglomerated multigrid convergence rates are presented using a range of two- and three-dimensional randomly perturbed unstructured grids for simple geometries with isotropic and stretched grids. Two agglomeration techniques are used within an overall topology-preserving agglomeration framework. The results show that multigrid with an inconsistent coarse-grid scheme using only the edge terms (also referred to in the literature as a thin-layer formulation) provides considerable speedup over single-grid methods but its convergence deteriorates on finer grids. Multigrid with a Galerkin coarse-grid discretization using piecewise-constant prolongation and a heuristic correction factor is slower and also grid-dependent. In contrast, grid-independent convergence rates are demonstrated for multigrid with consistent coarse-grid discretizations. Convergence rates of multigrid cycles are verified with quantitative analysis methods in which parts of the two-grid cycle are replaced by their idealized counterparts.
DEFF Research Database (Denmark)
Kolmogorov, Dmitry
turbine computations, collocated grid-based SIMPLE-like algorithms are developed for computations on block-structured grids with nonconformal interfaces. A technique to enhance both the convergence speed and the solution accuracy of the SIMPLE-like algorithms is presented. The erroneous behavior, which...... versions of the SIMPLE algorithm. The new technique is implemented in an existing conservative 2nd order finite-volume scheme flow solver (EllipSys), which is extended to cope with grids with nonconformal interfaces. The behavior of the discrete Navier-Stokes equations is discussed in detail...... Block LU relaxation scheme is shown to possess several optimal conditions, which enables to preserve high efficiency of the multigrid solver on both conformal and nonconformal grids. The developments are done using a parallel MPI algorithm, which can handle multiple numbers of interfaces with multiple...
Energy Technology Data Exchange (ETDEWEB)
White, M.J.; Iskander, M.F. [Univ. of Utah, Salt Lake City, UT (United States). Electrical Engineering Dept.; Kimrey, H.D. [Oak Ridge National Lab., TN (United States)
1996-12-31
The Finite-Difference Time-Domain (FDTD) code available at the University of Utah has been used to simulate sintering of ceramics in single and multimode cavities, and many useful results have been reported in literature. More detailed and accurate results, specifically around and including the ceramic sample, are often desired to help evaluate the adequacy of the heating procedure. In electrically large multimode cavities, however, computer memory requirements limit the number of the mathematical cells, and the desired resolution is impractical to achieve due to limited computer resources. Therefore, an FDTD algorithm which incorporates multiple-grid regions with variable-grid sizes is required to adequately perform the desired simulations. In this paper the authors describe the development of a three-dimensional multi-grid FDTD code to help focus a large number of cells around the desired region. Test geometries were solved using a uniform-grid and the developed multi-grid code to help validate the results from the developed code. Results from these comparisons, as well as the results of comparisons between the developed FDTD code and other available variable-grid codes are presented. In addition, results from the simulation of realistic microwave sintering experiments showed improved resolution in critical sites inside the three-dimensional sintering cavity. With the validation of the FDTD code, simulations were performed for electrically large, multimode, microwave sintering cavities to fully demonstrate the advantages of the developed multi-grid FDTD code.
Conservative multigrid methods for Cahn-Hilliard fluids
International Nuclear Information System (INIS)
Kim, Junseok; Kang, Kyungkeun; Lowengrub, John
2004-01-01
We develop a conservative, second-order accurate fully implicit discretization of the Navier-Stokes (NS) and Cahn-Hilliard (CH) system that has an associated discrete energy functional. This system provides a diffuse-interface description of binary fluid flows with compressible or incompressible flow components [R. Soc. Lond. Proc. Ser. A Math. Phys. Eng. Sci. 454 (1998) 2617]. In this work, we focus on the case of flows containing two immiscible, incompressible and density-matched components. The scheme, however, has a straightforward extension to multi-component systems. To efficiently solve the discrete system at the implicit time-level, we develop a nonlinear multigrid method to solve the CH equation which is then coupled to a projection method that is used to solve the NS equation. We demonstrate convergence of our scheme numerically in both the presence and absence of flow and perform simulations of phase separation via spinodal decomposition. We examine the separate effects of surface tension and external flow on the decomposition. We find surface tension driven flow alone increases coalescence rates through the retraction of interfaces. When there is an applied external shear, the evolution of the flow is nontrivial and the flow morphology repeats itself in time as multiple pinchoff and reconnection events occur. Eventually, the periodic motion ceases and the system relaxes to a global equilibrium. The equilibria we observe appears has a similar structure in all cases although the dynamics of the evolution is quite different. We view the work presented in this paper as preparatory for a detailed investigation of liquid-liquid interfaces with surface tension where the interfaces separate two immiscible fluids [On the pinchoff of liquid-liquid jets with surface tension, in preparation]. To this end, we also include a simulation of the pinchoff of a liquid thread under the Rayleigh instability at finite Reynolds number
Dinar, N.
1978-01-01
Several aspects of multigrid methods are briefly described. The main subjects include the development of very efficient multigrid algorithms for systems of elliptic equations (Cauchy-Riemann, Stokes, Navier-Stokes), as well as the development of control and prediction tools (based on local mode Fourier analysis), used to analyze, check and improve these algorithms. Preliminary research on multigrid algorithms for time dependent parabolic equations is also described. Improvements in existing multigrid processes and algorithms for elliptic equations were studied.
Zaghi, S.
2014-07-01
OFF, an open source (free software) code for performing fluid dynamics simulations, is presented. The aim of OFF is to solve, numerically, the unsteady (and steady) compressible Navier-Stokes equations of fluid dynamics by means of finite volume techniques: the research background is mainly focused on high-order (WENO) schemes for multi-fluids, multi-phase flows over complex geometries. To this purpose a highly modular, object-oriented application program interface (API) has been developed. In particular, the concepts of data encapsulation and inheritance available within Fortran language (from standard 2003) have been stressed in order to represent each fluid dynamics "entity" (e.g. the conservative variables of a finite volume, its geometry, etc…) by a single object so that a large variety of computational libraries can be easily (and efficiently) developed upon these objects. The main features of OFF can be summarized as follows: Programming LanguageOFF is written in standard (compliant) Fortran 2003; its design is highly modular in order to enhance simplicity of use and maintenance without compromising the efficiency; Parallel Frameworks Supported the development of OFF has been also targeted to maximize the computational efficiency: the code is designed to run on shared-memory multi-cores workstations and distributed-memory clusters of shared-memory nodes (supercomputers); the code's parallelization is based on Open Multiprocessing (OpenMP) and Message Passing Interface (MPI) paradigms; Usability, Maintenance and Enhancement in order to improve the usability, maintenance and enhancement of the code also the documentation has been carefully taken into account; the documentation is built upon comprehensive comments placed directly into the source files (no external documentation files needed): these comments are parsed by means of doxygen free software producing high quality html and latex documentation pages; the distributed versioning system referred as git
Monolithic multigrid method for the coupled Stokes flow and deformable porous medium system
Luo, P.; Rodrigo, C.; Gaspar, F. J.; Oosterlee, C. W.
2018-01-01
The interaction between fluid flow and a deformable porous medium is a complicated multi-physics problem, which can be described by a coupled model based on the Stokes and poroelastic equations. A monolithic multigrid method together with either a coupled Vanka smoother or a decoupled Uzawa smoother is employed as an efficient numerical technique for the linear discrete system obtained by finite volumes on staggered grids. A specialty in our modeling approach is that at the interface of the fluid and poroelastic medium, two unknowns from the different subsystems are defined at the same grid point. We propose a special discretization at and near the points on the interface, which combines the approximation of the governing equations and the considered interface conditions. In the decoupled Uzawa smoother, Local Fourier Analysis (LFA) helps us to select optimal values of the relaxation parameter appearing. To implement the monolithic multigrid method, grid partitioning is used to deal with the interface updates when communication is required between two subdomains. Numerical experiments show that the proposed numerical method has an excellent convergence rate. The efficiency and robustness of the method are confirmed in numerical experiments with typically small realistic values of the physical coefficients.
Chen, Hui; Deng, Ju-Zhi; Yin, Min; Yin, Chang-Chun; Tang, Wen-Wu
2017-03-01
To speed up three-dimensional (3D) DC resistivity modeling, we present a new multigrid method, the aggregation-based algebraic multigrid method (AGMG). We first discretize the differential equation of the secondary potential field with mixed boundary conditions by using a seven-point finite-difference method to obtain a large sparse system of linear equations. Then, we introduce the theory behind the pairwise aggregation algorithms for AGMG and use the conjugate-gradient method with the V-cycle AGMG preconditioner (AGMG-CG) to solve the linear equations. We use typical geoelectrical models to test the proposed AGMG-CG method and compare the results with analytical solutions and the 3DDCXH algorithm for 3D DC modeling (3DDCXH). In addition, we apply the AGMG-CG method to different grid sizes and geoelectrical models and compare it to different iterative methods, such as ILU-BICGSTAB, ILU-GCR, and SSOR-CG. The AGMG-CG method yields nearly linearly decreasing errors, whereas the number of iterations increases slowly with increasing grid size. The AGMG-CG method is precise and converges fast, and thus can improve the computational efficiency in forward modeling of three-dimensional DC resistivity.
International Nuclear Information System (INIS)
Cole, C.R.; Foote, H.P.
1987-02-01
Reliable ground water transport prediction requires accurate spatial and temporal characterization of a hydrogeologic system. However, cost constraints and the desire to maintain site integrity by minimizing drilling can restrict the amount of spatial sampling that can be obtained to resolve the flow parameter variability associated with heterogeneities. This study quantifies the errors in subsurface transport predictions resulting from incomplete characterization of hydraulic conductivity heterogeneity. A multigrid technique was used to simulate two-dimensional flow velocity fields with high resolution. To obtain these velocity fields, the finite difference code MGRID, which implements a multigrid solution technique, was applied to compute stream functions on a 256-by-256 grid for a variety of hypothetical systems having detailed distributions of hydraulic conductivity. Spatial variability in hydraulic conductivity distributions was characterized by the components in the spectrum of spatial frequencies. A low-pass spatial filtering technique was applied to the base case hydraulic conductivity distribution to produce a data set with lower spatial frequency content. Arrival time curves were then calculated for filtered hydraulic conductivity distribution and compared to base case results to judge the relative importance of the higher spatial frequency components. Results indicate a progression from multimode to single-mode arrival time curves as the number and extent of distinct flow pathways are reduced by low-pass filtering. This relationship between transport predictions and spatial frequencies was used to judge the consequences of sampling the hydraulic conductivity with reduced spatial resolution. 22 refs., 17 figs
Desmet, Gert
2013-11-01
The finite length parallel zone (FPZ)-model is proposed as an alternative model for the axial- or eddy-dispersion caused by the occurrence of local velocity biases or flow heterogeneities in porous media such as those used in liquid chromatography columns. The mathematical plate height expression evolving from the model shows that the A- and C-term band broadening effects that can originate from a given velocity bias should be coupled in an exponentially decaying way instead of harmonically as proposed in Giddings' coupling theory. In the low and high velocity limit both models converge, while a 12% difference can be observed in the (practically most relevant) intermediate range of reduced velocities. Explicit expressions for the A- and C-constants appearing in the exponential decay-based plate height expression have been derived for each of the different possible velocity bias levels (single through-pore and particle level, multi-particle level and trans-column level). These expressions allow to directly relate the band broadening originating from these different levels to the local fundamental transport parameters, hence offering the possibility to include a velocity-dependent and, if, needed retention factor-dependent transversal dispersion coefficient. Having developed the mathematics for the general case wherein a difference in retention equilibrium establishes between the two parallel zones, the effect of any possible local variations in packing density and/or retention capacity on the eddy-dispersion can be explicitly accounted for as well. It is furthermore also shown that, whereas the lumped transport parameter model used in the basic variant of the FPZ-model only provides a first approximation of the true decay constant, the model can be extended by introducing a constant correction factor to correctly account for the continuous transversal dispersion transport in the velocity bias zones. Copyright © 2013 Elsevier B.V. All rights reserved.
Multigrid Methods for Fully Implicit Oil Reservoir Simulation
Molenaar, J.
1996-01-01
In this paper we consider the simultaneous flow of oil and water in reservoir rock. This displacement process is modeled by two basic equations: the material balance or continuity equations and the equation of motion (Darcy's law). For the numerical solution of this system of nonlinear partial differential equations there are two approaches: the fully implicit or simultaneous solution method and the sequential solution method. In the sequential solution method the system of partial differential equations is manipulated to give an elliptic pressure equation and a hyperbolic (or parabolic) saturation equation. In the IMPES approach the pressure equation is first solved, using values for the saturation from the previous time level. Next the saturations are updated by some explicit time stepping method; this implies that the method is only conditionally stable. For the numerical solution of the linear, elliptic pressure equation multigrid methods have become an accepted technique. On the other hand, the fully implicit method is unconditionally stable, but it has the disadvantage that in every time step a large system of nonlinear algebraic equations has to be solved. The most time-consuming part of any fully implicit reservoir simulator is the solution of this large system of equations. Usually this is done by Newton's method. The resulting systems of linear equations are then either solved by a direct method or by some conjugate gradient type method. In this paper we consider the possibility of applying multigrid methods for the iterative solution of the systems of nonlinear equations. There are two ways of using multigrid for this job: either we use a nonlinear multigrid method or we use a linear multigrid method to deal with the linear systems that arise in Newton's method. So far only a few authors have reported on the use of multigrid methods for fully implicit simulations. Two-level FAS algorithm is presented for the black-oil equations, and linear multigrid for
Márquez, Andrés; Francés, Jorge; Martínez, Francisco J.; Gallego, Sergi; Álvarez, Mariela L.; Calzado, Eva M.; Pascual, Inmaculada; Beléndez, Augusto
2018-03-01
Simplified analytical models with predictive capability enable simpler and faster optimization of the performance in applications of complex photonic devices. We recently demonstrated the most simplified analytical model still showing predictive capability for parallel-aligned liquid crystal on silicon (PA-LCoS) devices, which provides the voltage-dependent retardance for a very wide range of incidence angles and any wavelength in the visible. We further show that the proposed model is not only phenomenological but also physically meaningful, since two of its parameters provide the correct values for important internal properties of these devices related to the birefringence, cell gap, and director profile. Therefore, the proposed model can be used as a means to inspect internal physical properties of the cell. As an innovation, we also show the applicability of the split-field finite-difference time-domain (SF-FDTD) technique for phase-shift and retardance evaluation of PA-LCoS devices under oblique incidence. As a simplified model for PA-LCoS devices, we also consider the exact description of homogeneous birefringent slabs. However, we show that, despite its higher degree of simplification, the proposed model is more robust, providing unambiguous and physically meaningful solutions when fitting its parameters.
International Nuclear Information System (INIS)
Vadlamani, Srinath; Kruger, Scott; Austin, Travis
2008-01-01
Extended magnetohydrodynamic (MHD) codes are used to model the large, slow-growing instabilities that are projected to limit the performance of International Thermonuclear Experimental Reactor (ITER). The multiscale nature of the extended MHD equations requires an implicit approach. The current linear solvers needed for the implicit algorithm scale poorly because the resultant matrices are so ill-conditioned. A new solver is needed, especially one that scales to the petascale. The most successful scalable parallel processor solvers to date are multigrid solvers. Applying multigrid techniques to a set of equations whose fundamental modes are dispersive waves is a promising solution to CEMM problems. For the Phase 1, we implemented multigrid preconditioners from the HYPRE project of the Center for Applied Scientific Computing at LLNL via PETSc of the DOE SciDAC TOPS for the real matrix systems of the extended MHD code NIMROD which is a one of the primary modeling codes of the OFES-funded Center for Extended Magnetohydrodynamic Modeling (CEMM) SciDAC. We implemented the multigrid solvers on the fusion test problem that allows for real matrix systems with success, and in the process learned about the details of NIMROD data structures and the difficulties of inverting NIMROD operators. The further success of this project will allow for efficient usage of future petascale computers at the National Leadership Facilities: Oak Ridge National Laboratory, Argonne National Laboratory, and National Energy Research Scientific Computing Center. The project will be a collaborative effort between computational plasma physicists and applied mathematicians at Tech-X Corporation, applied mathematicians Front Range Scientific Computations, Inc. (who are collaborators on the HYPRE project), and other computational plasma physicists involved with the CEMM project.
van der Vegt, Jacobus J.W.; Rhebergen, Sander
2011-01-01
The hp-Multigrid as Smoother algorithm (hp-MGS) for the solution of higher order accurate space-(time) discontinuous Galerkin discretizations of advection dominated flows is presented. This algorithm combines p-multigrid with h-multigrid at all p-levels, where the h-multigrid acts as smoother in the
Botti, L.; Colombo, A.; Bassi, F.
2017-10-01
In this work we exploit agglomeration based h-multigrid preconditioners to speed-up the iterative solution of discontinuous Galerkin discretizations of the Stokes and Navier-Stokes equations. As a distinctive feature h-coarsened mesh sequences are generated by recursive agglomeration of a fine grid, admitting arbitrarily unstructured grids of complex domains, and agglomeration based discontinuous Galerkin discretizations are employed to deal with agglomerated elements of coarse levels. Both the expense of building coarse grid operators and the performance of the resulting multigrid iteration are investigated. For the sake of efficiency coarse grid operators are inherited through element-by-element L2 projections, avoiding the cost of numerical integration over agglomerated elements. Specific care is devoted to the projection of viscous terms discretized by means of the BR2 dG method. We demonstrate that enforcing the correct amount of stabilization on coarse grids levels is mandatory for achieving uniform convergence with respect to the number of levels. The numerical solution of steady and unsteady, linear and non-linear problems is considered tackling challenging 2D test cases and 3D real life computations on parallel architectures. Significant execution time gains are documented.
Esmaily, M.; Jofre, L.; Mani, A.; Iaccarino, G.
2018-03-01
A geometric multigrid algorithm is introduced for solving nonsymmetric linear systems resulting from the discretization of the variable density Navier-Stokes equations on nonuniform structured rectilinear grids and high-Reynolds number flows. The restriction operation is defined such that the resulting system on the coarser grids is symmetric, thereby allowing for the use of efficient smoother algorithms. To achieve an optimal rate of convergence, the sequence of interpolation and restriction operations are determined through a dynamic procedure. A parallel partitioning strategy is introduced to minimize communication while maintaining the load balance between all processors. To test the proposed algorithm, we consider two cases: 1) homogeneous isotropic turbulence discretized on uniform grids and 2) turbulent duct flow discretized on stretched grids. Testing the algorithm on systems with up to a billion unknowns shows that the cost varies linearly with the number of unknowns. This O (N) behavior confirms the robustness of the proposed multigrid method regarding ill-conditioning of large systems characteristic of multiscale high-Reynolds number turbulent flows. The robustness of our method to density variations is established by considering cases where density varies sharply in space by a factor of up to 104, showing its applicability to two-phase flow problems. Strong and weak scalability studies are carried out, employing up to 30,000 processors, to examine the parallel performance of our implementation. Excellent scalability of our solver is shown for a granularity as low as 104 to 105 unknowns per processor. At its tested peak throughput, it solves approximately 4 billion unknowns per second employing over 16,000 processors with a parallel efficiency higher than 50%.
Investigations on application of multigrid method to MHD equilibrium analysis
International Nuclear Information System (INIS)
Ikuno, Soichiro
2000-01-01
The potentiality of application for Multi-grid method to MHD equilibrium analysis is investigated. The nonlinear eigenvalue problem often appears when the MHD equilibria are determined by solving the Grad-Shafranov equation numerically. After linearization of the equation, the problem is solved by use of the iterative method. Although the Red-Black SOR method or Gauss-Seidel method is often used for the solution of the linearized equation, it takes much CPU time to solve the problem. The Multi-grid method is compared with the SOR method for the Poisson Problem. The results of computations show that the CPU time required for the Multi-grid method is about 1000 times as small as that for the SOR method. (author)
Multigrid and multilevel domain decomposition for unstructured grids
Energy Technology Data Exchange (ETDEWEB)
Chan, T.; Smith, B.
1994-12-31
Multigrid has proven itself to be a very versatile method for the iterative solution of linear and nonlinear systems of equations arising from the discretization of PDES. In some applications, however, no natural multilevel structure of grids is available, and these must be generated as part of the solution procedure. In this presentation the authors will consider the problem of generating a multigrid algorithm when only a fine, unstructured grid is given. Their techniques generate a sequence of coarser grids by first forming an approximate maximal independent set of the vertices and then applying a Cavendish type algorithm to form the coarser triangulation. Numerical tests indicate that convergence using this approach can be as fast as standard multigrid on a structured mesh, at least in two dimensions.
Kordy, M. A.; Wannamaker, P. E.; Maris, V.; Cherkaev, E.; Hill, G. J.
2014-12-01
We have developed an algorithm for 3D simulation and inversion of magnetotelluric (MT) responses using deformable hexahedral finite elements that permits incorporation of topography. Direct solvers parallelized on symmetric multiprocessor (SMP), single-chassis workstations with large RAM are used for the forward solution, parameter jacobians, and model update. The forward simulator, jacobians calculations, as well as synthetic and real data inversion are presented. We use first-order edge elements to represent the secondary electric field (E), yielding accuracy O(h) for E and its curl (magnetic field). For very low frequency or small material admittivity, the E-field requires divergence correction. Using Hodge decomposition, correction may be applied after the forward solution is calculated. It allows accurate E-field solutions in dielectric air. The system matrix factorization is computed using the MUMPS library, which shows moderately good scalability through 12 processor cores but limited gains beyond that. The factored matrix is used to calculate the forward response as well as the jacobians of field and MT responses using the reciprocity theorem. Comparison with other codes demonstrates accuracy of our forward calculations. We consider a popular conductive/resistive double brick structure and several topographic models. In particular, the ability of finite elements to represent smooth topographic slopes permits accurate simulation of refraction of electromagnetic waves normal to the slopes at high frequencies. Run time tests indicate that for meshes as large as 150x150x60 elements, MT forward response and jacobians can be calculated in ~2.5 hours per frequency. For inversion, we implemented data space Gauss-Newton method, which offers reduction in memory requirement and a significant speedup of the parameter step versus model space approach. For dense matrix operations we use tiling approach of PLASMA library, which shows very good scalability. In synthetic
Final report on the Copper Mountain conference on multigrid methods
Energy Technology Data Exchange (ETDEWEB)
NONE
1997-10-01
The Copper Mountain Conference on Multigrid Methods was held on April 6-11, 1997. It took the same format used in the previous Copper Mountain Conferences on Multigrid Method conferences. Over 87 mathematicians from all over the world attended the meeting. 56 half-hour talks on current research topics were presented. Talks with similar content were organized into sessions. Session topics included: fluids; domain decomposition; iterative methods; basics; adaptive methods; non-linear filtering; CFD; applications; transport; algebraic solvers; supercomputing; and student paper winners.
Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.
2016-01-01
Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ˜1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.
Primal-Dual Interior Point Multigrid Method for Topology Optimization
Czech Academy of Sciences Publication Activity Database
Kočvara, Michal; Mohammed, S.
2016-01-01
Roč. 38, č. 5 (2016), B685-B709 ISSN 1064-8275 Grant - others:European Commission - EC(XE) 313781 Institutional support: RVO:67985556 Keywords : topology optimization * multigrid method s * interior point method Subject RIV: BA - General Mathematics Impact factor: 2.195, year: 2016 http://library.utia.cas.cz/separaty/2016/MTR/kocvara-0462418.pdf
On multigrid-CG for efficient topology optimization
DEFF Research Database (Denmark)
Amir, Oded; Aage, Niels; Lazarov, Boyan Stefanov
2014-01-01
reduction is obtained by exploiting specific characteristics of a multigrid preconditioned conjugate gradients (MGCG) solver. In particular, the number of MGCG iterations is reduced by relating it to the geometric parameters of the problem. At the same time, accurate outcome of the optimization process...
Multigrid Methods for the Computation of Propagators in Gauge Fields
Kalkreuter, Thomas
Multigrid methods were invented for the solution of discretized partial differential equations in order to overcome the slowness of traditional algorithms by updates on various length scales. In the present work generalizations of multigrid methods for propagators in gauge fields are investigated. Gauge fields are incorporated in algorithms in a covariant way. The kernel C of the restriction operator which averages from one grid to the next coarser grid is defined by projection on the ground-state of a local Hamiltonian. The idea behind this definition is that the appropriate notion of smoothness depends on the dynamics. The ground-state projection choice of C can be used in arbitrary dimension and for arbitrary gauge group. We discuss proper averaging operations for bosons and for staggered fermions. The kernels C can also be used in multigrid Monte Carlo simulations, and for the definition of block spins and blocked gauge fields in Monte Carlo renormalization group studies. Actual numerical computations are performed in four-dimensional SU(2) gauge fields. We prove that our proposals for block spins are “good”, using renormalization group arguments. A central result is that the multigrid method works in arbitrarily disordered gauge fields, in principle. It is proved that computations of propagators in gauge fields without critical slowing down are possible when one uses an ideal interpolation kernel. Unfortunately, the idealized algorithm is not practical, but it was important to answer questions of principle. Practical methods are able to outperform the conjugate gradient algorithm in case of bosons. The case of staggered fermions is harder. Multigrid methods give considerable speed-ups compared to conventional relaxation algorithms, but on lattices up to 184 conjugate gradient is superior.
Ground-state projection multigrid for propagators in 4-dimensional SU(2) gauge fields
International Nuclear Information System (INIS)
Kalkreuter, T.
1991-09-01
The ground-state projection multigrid method is studied for computations of slowly decaying bosonic propagators in 4-dimensional SU(2) lattice gauge theory. The defining eigenvalue equation for the restriction operator is solved exactly. Although the critical exponent z is not reduced in nontrivial gauge fields, multigrid still yields considerable speedup compared with conventional relaxation. Multigrid is also able to outperform the conjugate gradient algorithm. (orig.)
Achieving Textbook Multigrid Efficiency for Hydrostatic Ice Sheet Flow
Brown, Jed
2013-03-12
The hydrostatic equations for ice sheet flow offer improved fidelity compared with the shallow ice approximation and shallow stream approximation popular in today\\'s ice sheet models. Nevertheless, they present a serious bottleneck because they require the solution of a three-dimensional (3D) nonlinear system, as opposed to the two-dimensional system present in the shallow stream approximation. This 3D system is posed on high-aspect domains with strong anisotropy and variation in coefficients, making it expensive to solve with current methods. This paper presents a Newton--Krylov multigrid solver for the hydrostatic equations that demonstrates textbook multigrid efficiency (an order of magnitude reduction in residual per iteration and solution of the fine-level system at a small multiple of the cost of a residual evaluation). Scalability on Blue Gene/P is demonstrated, and the method is compared to various algebraic methods that are in use or have been proposed as viable approaches.
Achieving Textbook Multigrid Efficiency for Hydrostatic Ice Sheet Flow
Brown, Jed; Smith, Barry; Ahmadia, Aron
2013-01-01
The hydrostatic equations for ice sheet flow offer improved fidelity compared with the shallow ice approximation and shallow stream approximation popular in today's ice sheet models. Nevertheless, they present a serious bottleneck because they require the solution of a three-dimensional (3D) nonlinear system, as opposed to the two-dimensional system present in the shallow stream approximation. This 3D system is posed on high-aspect domains with strong anisotropy and variation in coefficients, making it expensive to solve with current methods. This paper presents a Newton--Krylov multigrid solver for the hydrostatic equations that demonstrates textbook multigrid efficiency (an order of magnitude reduction in residual per iteration and solution of the fine-level system at a small multiple of the cost of a residual evaluation). Scalability on Blue Gene/P is demonstrated, and the method is compared to various algebraic methods that are in use or have been proposed as viable approaches.
Towards a multigrid scheme in SU(2) lattice gauge theory
International Nuclear Information System (INIS)
Gutbrod, F.
1992-12-01
The task of constructing a viable updating multigrid scheme for SU(2) lattice gauge theory is discussed in connection with the classical eigenvalue problem. For a nonlocal overrelaxation Monte Carlo update step, the central numerical problem is the search for the minimum of a quadratic approximation to the action under nonlocal constraints. Here approximate eigenfunctions are essential to reduce the numerical work, and these eigenfunctions are to be constructed with multigrid techniques. A simple implementation on asymmetric lattices is described, where the grids are restricted to 3-dimensional hyperplanes. The scheme is shown to be moderately successful in the early stages of the updating history (starting from a cold configuration). The main results of another, less asymmetric scheme are presented briefly. (orig.)
Domain decomposition multigrid for unstructured grids
Energy Technology Data Exchange (ETDEWEB)
Shapira, Yair
1997-01-01
A two-level preconditioning method for the solution of elliptic boundary value problems using finite element schemes on possibly unstructured meshes is introduced. It is based on a domain decomposition and a Galerkin scheme for the coarse level vertex unknowns. For both the implementation and the analysis, it is not required that the curves of discontinuity in the coefficients of the PDE match the interfaces between subdomains. Generalizations to nonmatching or overlapping grids are made.
Multigrid time-accurate integration of Navier-Stokes equations
Arnone, Andrea; Liou, Meng-Sing; Povinelli, Louis A.
1993-01-01
Efficient acceleration techniques typical of explicit steady-state solvers are extended to time-accurate calculations. Stability restrictions are greatly reduced by means of a fully implicit time discretization. A four-stage Runge-Kutta scheme with local time stepping, residual smoothing, and multigridding is used instead of traditional time-expensive factorizations. Some applications to natural and forced unsteady viscous flows show the capability of the procedure.
Multigrid Computation of Stratified Flow over Two-Dimensional Obstacles
Paisley, M. F.
1997-09-01
A robust multigrid method for the incompressible Navier-Stokes equations is presented and applied to the computation of viscous flow over obstacles in a bounded domain under conditions of neutral stability and stable density stratification. Two obstacle shapes have been used, namely a vertical barrier, for which the grid is Cartesian, and a smooth cosine-shaped obstacle, for which a boundary-conforming transformation is incorporated. Results are given for laminar flows at low Reynolds numbers and turbulent flows at a high Reynolds number, when a simple mixing length turbulence model is included. The multigrid algorithm is used to compute steady flows for each obstacle at low and high Reynolds numbers in conditions of weak static stability, defined byK=ND/πU≤ 1, whereU,N, andDare the upstream velocity, bouyancy frequency, and domain height respectively. Results are also presented for the vertical barrier at low and high Reynolds number in conditions of strong static stability,K> 1, when lee wave motions ensure that the flow is unsteady, and the multigrid algorithm is used to compute the flow at each timestep.
Cao, Jian; Chen, Jing-Bo; Dai, Meng-Xue
2018-01-01
An efficient finite-difference frequency-domain modeling of seismic wave propagation relies on the discrete schemes and appropriate solving methods. The average-derivative optimal scheme for the scalar wave modeling is advantageous in terms of the storage saving for the system of linear equations and the flexibility for arbitrary directional sampling intervals. However, using a LU-decomposition-based direct solver to solve its resulting system of linear equations is very costly for both memory and computational requirements. To address this issue, we consider establishing a multigrid-preconditioned BI-CGSTAB iterative solver fit for the average-derivative optimal scheme. The choice of preconditioning matrix and its corresponding multigrid components is made with the help of Fourier spectral analysis and local mode analysis, respectively, which is important for the convergence. Furthermore, we find that for the computation with unequal directional sampling interval, the anisotropic smoothing in the multigrid precondition may affect the convergence rate of this iterative solver. Successful numerical applications of this iterative solver for the homogenous and heterogeneous models in 2D and 3D are presented where the significant reduction of computer memory and the improvement of computational efficiency are demonstrated by comparison with the direct solver. In the numerical experiments, we also show that the unequal directional sampling interval will weaken the advantage of this multigrid-preconditioned iterative solver in the computing speed or, even worse, could reduce its accuracy in some cases, which implies the need for a reasonable control of directional sampling interval in the discretization.
Uzawa smoother in multigrid for the coupleD porous medium and stokes flow system
P. Luo (Peiyao); C. Rodrigo (Carmen); F.J. Gaspar Lorenz (Franscisco); C.W. Oosterlee (Kees)
2017-01-01
textabstractThe multigrid solution of coupled porous media and Stokes flow problems is considered. The Darcy equation as the saturated porous medium model is coupled to the Stokes equations by means of appropriate interface conditions. We focus on an efficient multigrid solution technique for the
Multigrid for high dimensional elliptic partial differential equations on non-equidistant grids
bin Zubair, H.; Oosterlee, C.E.; Wienands, R.
2006-01-01
This work presents techniques, theory and numbers for multigrid in a general d-dimensional setting. The main focus is the multigrid convergence for high-dimensional partial differential equations (PDEs). As a model problem we have chosen the anisotropic diffusion equation, on a unit hypercube. We
Simulating streamer discharges in 3D with the parallel adaptive Afivo framework
H.J. Teunissen (Jannis); U. M. Ebert (Ute)
2017-01-01
htmlabstractWe present an open-source plasma fluid code for 2D, cylindrical and 3D simulations of streamer discharges, based on the Afivo framework that features adaptive mesh refinement, geometric multigrid methods for Poisson's equation, and OpenMP parallelism. We describe the numerical
Finite approximations in fluid mechanics
International Nuclear Information System (INIS)
Hirschel, E.H.
1986-01-01
This book contains twenty papers on work which was conducted between 1983 and 1985 in the Priority Research Program ''Finite Approximations in Fluid Mechanics'' of the German Research Society (Deutsche Forschungsgemeinschaft). Scientists from numerical mathematics, fluid mechanics, and aerodynamics present their research on boundary-element methods, factorization methods, higher-order panel methods, multigrid methods for elliptical and parabolic problems, two-step schemes for the Euler equations, etc. Applications are made to channel flows, gas dynamical problems, large eddy simulation of turbulence, non-Newtonian flow, turbomachine flow, zonal solutions for viscous flow problems, etc. The contents include: multigrid methods for problems from fluid dynamics, development of a 2D-Transonic Potential Flow Solver; a boundary element spectral method for nonstationary viscous flows in 3 dimensions; navier-stokes computations of two-dimensional laminar flows in a channel with a backward facing step; calculations and experimental investigations of the laminar unsteady flow in a pipe expansion; calculation of the flow-field caused by shock wave and deflagration interaction; a multi-level discretization and solution method for potential flow problems in three dimensions; solutions of the conservation equations with the approximate factorization method; inviscid and viscous flow through rotating meridional contours; zonal solutions for viscous flow problems
DEFF Research Database (Denmark)
Christensen, Max la Cour; Villa, Umberto; Engsig-Karup, Allan Peter
2017-01-01
associated with non-planar interfaces between agglomerates, the coarse velocity space has guaranteed approximation properties. The employed AMGe technique provides coarse spaces with desirable local mass conservation and stability properties analogous to the original pair of Raviart-Thomas and piecewise......We study the application of a finite element numerical upscaling technique to the incompressible two-phase porous media total velocity formulation. Specifically, an element agglomeration based Algebraic Multigrid (AMGe) technique with improved approximation proper ties [37] is used, for the first...... discontinuous polynomial spaces, resulting in strong mass conservation for the upscaled systems. Due to the guaranteed approximation properties and the generic nature of the AMGe method, recursive multilevel upscaling is automatically obtained. Furthermore, this technique works for both structured...
Energy Technology Data Exchange (ETDEWEB)
Deb, M.K.; Kennon, S.R.
1998-04-01
A cooperative R&D effort between industry and the US government, this project, under the HPPP (High Performance Parallel Processing) initiative of the Dept. of Energy, started the investigations into parallel object-oriented (OO) numerics. The basic goal was to research and utilize the emerging technologies to create a physics-independent computational kernel for applications using adaptive finite element method. The industrial team included Computational Mechanics Co., Inc. (COMCO) of Austin, TX (as the primary contractor), Scientific Computing Associates, Inc. (SCA) of New Haven, CT, Texaco and CONVEX. Sandia National Laboratory (Albq., NM) was the technology partner from the government side. COMCO had the responsibility of the main kernel design and development, SCA had the lead in parallel solver technology and guidance on OO technologies was Sandia`s main expertise in this venture. CONVEX and Texaco supported the partnership by hardware resource and application knowledge, respectively. As such, a minimum of fifty-percent cost-sharing was provided by the industry partnership during this project. This report describes the R&D activities and provides some details about the prototype kernel and example applications.
Multigrid methods for fully implicit oil reservoir simulation
Energy Technology Data Exchange (ETDEWEB)
Molenaar, J.
1995-12-31
In this paper, the authors consider the simultaneous flow of oil and water in reservoir rock. This displacement process is modeled by two basic equations the material balance or continuity equations, and the equation of motion (Darcy`s law). For the numerical solution of this system of nonlinear partial differential equations, there are two approaches: the fully implicit or simultaneous solution method, and the sequential solution method. In this paper, the authors consider the possibility of applying multigrid methods for the iterative solution of the systems of nonlinear equations.
Adaptive Multigrid Algorithm for the Lattice Wilson-Dirac Operator
International Nuclear Information System (INIS)
Babich, R.; Brower, R. C.; Rebbi, C.; Brannick, J.; Clark, M. A.; Manteuffel, T. A.; McCormick, S. F.; Osborn, J. C.
2010-01-01
We present an adaptive multigrid solver for application to the non-Hermitian Wilson-Dirac system of QCD. The key components leading to the success of our proposed algorithm are the use of an adaptive projection onto coarse grids that preserves the near null space of the system matrix together with a simplified form of the correction based on the so-called γ 5 -Hermitian symmetry of the Dirac operator. We demonstrate that the algorithm nearly eliminates critical slowing down in the chiral limit and that it has weak dependence on the lattice volume.
Conjugate gradient coupled with multigrid for an indefinite problem
Gozani, J.; Nachshon, A.; Turkel, E.
1984-01-01
An iterative algorithm for the Helmholtz equation is presented. This scheme was based on the preconditioned conjugate gradient method for the normal equations. The preconditioning is one cycle of a multigrid method for the discrete Laplacian. The smoothing algorithm is red-black Gauss-Seidel and is constructed so it is a symmetric operator. The total number of iterations needed by the algorithm is independent of h. By varying the number of grids, the number of iterations depends only weakly on k when k(3)h(2) is constant. Comparisons with a SSOR preconditioner are presented.
Local multigrid mesh refinement in view of nuclear fuel 3D modelling in pressurised water reactors
International Nuclear Information System (INIS)
Barbie, L.
2013-01-01
The aim of this study is to improve the performances, in terms of memory space and computational time, of the current modelling of the Pellet-Cladding mechanical Interaction (PCI), complex phenomenon which may occurs during high power rises in pressurised water reactors. Among the mesh refinement methods - methods dedicated to efficiently treat local singularities - a local multi-grid approach was selected because it enables the use of a black-box solver while dealing few degrees of freedom at each level. The Local Defect Correction (LDC) method, well suited to a finite element discretization, was first analysed and checked in linear elasticity, on configurations resulting from the PCI, since its use in solid mechanics is little widespread. Various strategies concerning the implementation of the multilevel algorithm were also compared. Coupling the LDC method with the Zienkiewicz-Zhu a posteriori error estimator in order to automatically detect the zones to be refined, was then tested. Performances obtained on two-dimensional and three-dimensional cases are very satisfactory, since the algorithm proposed is more efficient than h-adaptive refinement methods. Lastly, the LDC algorithm was extended to nonlinear mechanics. Space/time refinement as well as transmission of the initial conditions during the re-meshing step were looked at. The first results obtained are encouraging and show the interest of using the LDC method for PCI modelling. (author) [fr
Multigrid direct numerical simulation of the whole process of flow transition in 3-D boundary layers
Liu, Chaoqun; Liu, Zhining
1993-01-01
A new technology was developed in this study which provides a successful numerical simulation of the whole process of flow transition in 3-D boundary layers, including linear growth, secondary instability, breakdown, and transition at relatively low CPU cost. Most other spatial numerical simulations require high CPU cost and blow up at the stage of flow breakdown. A fourth-order finite difference scheme on stretched and staggered grids, a fully implicit time marching technique, a semi-coarsening multigrid based on the so-called approximate line-box relaxation, and a buffer domain for the outflow boundary conditions were all used for high-order accuracy, good stability, and fast convergence. A new fine-coarse-fine grid mapping technique was developed to keep the code running after the laminar flow breaks down. The computational results are in good agreement with linear stability theory, secondary instability theory, and some experiments. The cost for a typical case with 162 x 34 x 34 grid is around 2 CRAY-YMP CPU hours for 10 T-S periods.
Parallel Reservoir Simulations with Sparse Grid Techniques and Applications to Wormhole Propagation
Wu, Yuanqing
2015-09-08
In this work, two topics of reservoir simulations are discussed. The first topic is the two-phase compositional flow simulation in hydrocarbon reservoir. The major obstacle that impedes the applicability of the simulation code is the long run time of the simulation procedure, and thus speeding up the simulation code is necessary. Two means are demonstrated to address the problem: parallelism in physical space and the application of sparse grids in parameter space. The parallel code can gain satisfactory scalability, and the sparse grids can remove the bottleneck of flash calculations. Instead of carrying out the flash calculation in each time step of the simulation, a sparse grid approximation of all possible results of the flash calculation is generated before the simulation. Then the constructed surrogate model is evaluated to approximate the flash calculation results during the simulation. The second topic is the wormhole propagation simulation in carbonate reservoir. In this work, different from the traditional simulation technique relying on the Darcy framework, we propose a new framework called Darcy-Brinkman-Forchheimer framework to simulate wormhole propagation. Furthermore, to process the large quantity of cells in the simulation grid and shorten the long simulation time of the traditional serial code, standard domain-based parallelism is employed, using the Hypre multigrid library. In addition to that, a new technique called “experimenting field approach” to set coefficients in the model equations is introduced. In the 2D dissolution experiments, different configurations of wormholes and a series of properties simulated by both frameworks are compared. We conclude that the numerical results of the DBF framework are more like wormholes and more stable than the Darcy framework, which is a demonstration of the advantages of the DBF framework. The scalability of the parallel code is also evaluated, and good scalability can be achieved. Finally, a mixed
Annual Copper Mountain Conferences on Multigrid and Iterative Methods, Copper Mountain, Colorado
International Nuclear Information System (INIS)
McCormick, Stephen F.
2016-01-01
This project supported the Copper Mountain Conference on Multigrid and Iterative Methods, held from 2007 to 2015, at Copper Mountain, Colorado. The subject of the Copper Mountain Conference Series alternated between Multigrid Methods in odd-numbered years and Iterative Methods in even-numbered years. Begun in 1983, the Series represents an important forum for the exchange of ideas in these two closely related fields. This report describes the Copper Mountain Conference on Multigrid and Iterative Methods, 2007-2015. Information on the conference series is available at http://grandmaster.colorado.edu/~copper/
Annual Copper Mountain Conferences on Multigrid and Iterative Methods, Copper Mountain, Colorado
Energy Technology Data Exchange (ETDEWEB)
McCormick, Stephen F. [Front Range Scientific, Inc., Lake City, CO (United States)
2016-03-25
This project supported the Copper Mountain Conference on Multigrid and Iterative Methods, held from 2007 to 2015, at Copper Mountain, Colorado. The subject of the Copper Mountain Conference Series alternated between Multigrid Methods in odd-numbered years and Iterative Methods in even-numbered years. Begun in 1983, the Series represents an important forum for the exchange of ideas in these two closely related fields. This report describes the Copper Mountain Conference on Multigrid and Iterative Methods, 2007-2015. Information on the conference series is available at http://grandmaster.colorado.edu/~copper/.
NONLINEAR MULTIGRID SOLVER EXPLOITING AMGe COARSE SPACES WITH APPROXIMATION PROPERTIES
Energy Technology Data Exchange (ETDEWEB)
Christensen, Max La Cour [Technical Univ. of Denmark, Lyngby (Denmark); Villa, Umberto E. [Univ. of Texas, Austin, TX (United States); Engsig-Karup, Allan P. [Technical Univ. of Denmark, Lyngby (Denmark); Vassilevski, Panayot S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2016-01-22
The paper introduces a nonlinear multigrid solver for mixed nite element discretizations based on the Full Approximation Scheme (FAS) and element-based Algebraic Multigrid (AMGe). The main motivation to use FAS for unstruc- tured problems is the guaranteed approximation property of the AMGe coarse spaces that were developed recently at Lawrence Livermore National Laboratory. These give the ability to derive stable and accurate coarse nonlinear discretization problems. The previous attempts (including ones with the original AMGe method, [5, 11]), were less successful due to lack of such good approximation properties of the coarse spaces. With coarse spaces with approximation properties, our FAS approach on un- structured meshes should be as powerful/successful as FAS on geometrically re ned meshes. For comparison, Newton's method and Picard iterations with an inner state-of-the-art linear solver is compared to FAS on a nonlinear saddle point problem with applications to porous media ow. It is demonstrated that FAS is faster than Newton's method and Picard iterations for the experiments considered here. Due to the guaranteed approximation properties of our AMGe, the coarse spaces are very accurate, providing a solver with the potential for mesh-independent convergence on general unstructured meshes.
Velivelli, A. C.; Bryden, K. M.
2006-03-01
Lattice Boltzmann methods are gaining recognition in the field of computational fluid dynamics due to their computational efficiency. In order to quantify the computational efficiency and accuracy of the lattice Boltzmann method, it is compared with efficient traditional finite difference methods such as the alternating direction implicit scheme. The lattice Boltzmann algorithm implemented in previous studies does not approach peak performance for simulations where the data involved in computation per time step is more than the cache size. Due to this, data is obtained from the main memory and this access is much slower than access to cache memory. Using a cache-optimized lattice Boltzmann algorithm, this paper takes into account the full computational strength of the lattice Boltzmann method. The com parison is performed on both a single processor and multiple processors.
Energy Technology Data Exchange (ETDEWEB)
Druinsky, A; Ghysels, P; Li, XS; Marques, O; Williams, S; Barker, A; Kalchev, D; Vassilevski, P
2016-04-02
In this paper, we study the performance of a two-level algebraic-multigrid algorithm, with a focus on the impact of the coarse-grid solver on performance. We consider two algorithms for solving the coarse-space systems: the preconditioned conjugate gradient method and a new robust HSS-embedded low-rank sparse-factorization algorithm. Our test data comes from the SPE Comparative Solution Project for oil-reservoir simulations. We contrast the performance of our code on one 12-core socket of a Cray XC30 machine with performance on a 60-core Intel Xeon Phi coprocessor. To obtain top performance, we optimized the code to take full advantage of fine-grained parallelism and made it thread-friendly for high thread count. We also developed a bounds-and-bottlenecks performance model of the solver which we used to guide us through the optimization effort, and also carried out performance tuning in the solver’s large parameter space. Finally, as a result, significant speedups were obtained on both machines.
3D inversion based on multi-grid approach of magnetotelluric data from Northern Scandinavia
Cherevatova, M.; Smirnov, M.; Korja, T. J.; Egbert, G. D.
2012-12-01
In this work we investigate the geoelectrical structure of the cratonic margin of Fennoscandian Shield by means of magnetotelluric (MT) measurements carried out in Northern Norway and Sweden during summer 2011-2012. The project Magnetotellurics in the Scandes (MaSca) focuses on the investigation of the crust, upper mantle and lithospheric structure in a transition zone from a stable Precambrian cratonic interior to a passive continental margin beneath the Caledonian Orogen and the Scandes Mountains in western Fennoscandia. Recent MT profiles in the central and southern Scandes indicated a large contrast in resistivity between Caledonides and Precambrian basement. The alum shales as a highly conductive layers between the resistive Precambrian basement and the overlying Caledonian nappes are revealed from this profiles. Additional measurements in the Northern Scandes were required. All together data from 60 synchronous long period (LMT) and about 200 broad band (BMT) sites were acquired. The array stretches from Lofoten and Bodo (Norway) in the west to Kiruna and Skeleftea (Sweden) in the east covering an area of 500x500 square kilometers. LMT sites were occupied for about two months, while most of the BMT sites were measured during one day. We have used new multi-grid approach for 3D electromagnetic (EM) inversion and modelling. Our approach is based on the OcTree discretization where the spatial domain is represented by rectangular cells, each of which might be subdivided (recursively) into eight sub-cells. In this simplified implementation the grid is refined only in the horizontal direction, uniformly in each vertical layer. Using multi-grid we manage to have a high grid resolution near the surface (for instance, to tackle with galvanic distortions) and lower resolution at greater depth as the EM fields decay in the Earth according to the diffusion equation. We also have a benefit in computational costs as number of unknowns decrease. The multi-grid forward
Uniform convergence of multigrid V-cycle iterations for indefinite and nonsymmetric problems
Bramble, James H.; Kwak, Do Y.; Pasciak, Joseph E.
1993-01-01
In this paper, we present an analysis of a multigrid method for nonsymmetric and/or indefinite elliptic problems. In this multigrid method various types of smoothers may be used. One type of smoother which we consider is defined in terms of an associated symmetric problem and includes point and line, Jacobi, and Gauss-Seidel iterations. We also study smoothers based entirely on the original operator. One is based on the normal form, that is, the product of the operator and its transpose. Other smoothers studied include point and line, Jacobi, and Gauss-Seidel. We show that the uniform estimates for symmetric positive definite problems carry over to these algorithms. More precisely, the multigrid iteration for the nonsymmetric and/or indefinite problem is shown to converge at a uniform rate provided that the coarsest grid in the multilevel iteration is sufficiently fine (but not depending on the number of multigrid levels).
Copper Mountain conference on multigrid methods. Preliminary proceedings -- List of abstracts
Energy Technology Data Exchange (ETDEWEB)
NONE
1995-12-31
This report contains abstracts of the papers presented at the conference. Papers cover multigrid algorithms and applications of multigrid methods. Applications include the following: solution of elliptical problems; electric power grids; fluid mechanics; atmospheric data assimilation; thermocapillary effects on weld pool shape; boundary-value problems; prediction of hurricane tracks; modeling multi-dimensional combustion and detailed chemistry; black-oil reservoir simulation; image processing; and others.
International Nuclear Information System (INIS)
Kucukboyaci, Vefa; Haghighat, Alireza
2001-01-01
We have developed new angular multigrid formulations, including the Simplified Angular Multigrid (SAM), Nested Iteration (NI), and V-Cycle schemes, that are compatible with the parallel environment and the adaptive differencing strategy of the PENTRAN three-dimensional parallel S N code. Using the Fourier analysis method for an infinite, homogenous medium, we have investigated the effectiveness of the V-Cycle scheme for different problem parameters including scattering ratio, spatial differencing weights, quadrature order, and mesh size. We have further investigated the effectiveness of the new schemes for practical shielding applications such as the Kobayashi benchmark problem and the boiling water reactor core shroud problem. In this paper, we summarize the angular V-Cycle scheme implemented in the PENTRAN code, the Fourier Analysis of the V-Cycle scheme, and results of convergence analysis of the V-Cycle scheme using different problem parameters. The theoretical analysis reveals that the V-Cycle scheme is effective for a large range of scattering ratios and is insensitive to mesh size. Besides the theoretical analysis, we have applied the new angular multigrid schemes to shielding problems. In comparison to the standard PCR formulation, combinations of the new angular multigrid schemes and PCR (e.g., SAM+V-Cycle+PCR) have proved to be very effective for scattering ratios in a range of 0.6 to 0.9. (authors)
Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems
Frohne, Jörg
2015-08-06
© 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.
Efficient numerical methods for the large-scale, parallel solution of elastoplastic contact problems
Frohne, Jö rg; Heister, Timo; Bangerth, Wolfgang
2015-01-01
© 2016 John Wiley & Sons, Ltd. Quasi-static elastoplastic contact problems are ubiquitous in many industrial processes and other contexts, and their numerical simulation is consequently of great interest in accurately describing and optimizing production processes. The key component in these simulations is the solution of a single load step of a time iteration. From a mathematical perspective, the problems to be solved in each time step are characterized by the difficulties of variational inequalities for both the plastic behavior and the contact problem. Computationally, they also often lead to very large problems. In this paper, we present and evaluate a complete set of methods that are (1) designed to work well together and (2) allow for the efficient solution of such problems. In particular, we use adaptive finite element meshes with linear and quadratic elements, a Newton linearization of the plasticity, active set methods for the contact problem, and multigrid-preconditioned linear solvers. Through a sequence of numerical experiments, we show the performance of these methods. This includes highly accurate solutions of a three-dimensional benchmark problem and scaling our methods in parallel to 1024 cores and more than a billion unknowns.
Multi-grid and ICCG for problems with interfaces
International Nuclear Information System (INIS)
Dendy, J.E.; Hyman, J.M.
1980-01-01
Computation times for the multi-grid (MG) algorithm, the incomplete Cholesky conjugate gradient (ICCG) algorithm [J. Comp. Phys. 26, 43-65 (1978); Math. Comp. 31, 148-162 (1977)], and the modified ICCG (MICCG) algorithm [BIT 18, 142-156 (1978)] to solve elliptic partial differential equations are compared. The MICCG and ICCG algorithms are more robust than the MG for general positive definite systems. A major advantage of the MG algorithm is that the structure of the problem can be exploited to reduce the solution time significantly. Five example problems are discussed. For problems with little structure and for one-shot calculations ICCG is recommended over MG, and MICCG, over ICCG. For problems that are done many times, it is worth investing the effort to study methods like MG. 1 table
A Parallel Boltzmann Simulation for Multi-grid Inertial Electrostatic Confinement Fusion
National Aeronautics and Space Administration — Inertial electrostatic confinement (IEC) is a means of confining a non-neutral, non-Maxwellian plasma with an electric field, with the goal of creating fusion...
Cleveland, Mathew A.
We investigate several aspects of the numerical solution of the radiative transfer equation in the context of coal combustion: the parallel efficiency of two commonly-used opacity models, the sensitivity of turbulent radiation interaction (TRI) effects to the presence of coal particulate, and an improvement of the order of temporal convergence using the coarse mesh finite difference (CMFD) method. There are four opacity models commonly employed to evaluate the radiative transfer equation in combustion applications; line-by-line (LBL), multigroup, band, and global. Most of these models have been rigorously evaluated for serial computations of a spectrum of problem types [1]. Studies of these models for parallel computations [2] are limited. We assessed the performance of the Spectral-Line-Based weighted sum of gray gasses (SLW) model, a global method related to K-distribution methods [1], and the LBL model. The LBL model directly interpolates opacity information from large data tables. The LBL model outperforms the SLW model in almost all cases, as suggested by Wang et al. [3]. The SLW model, however, shows superior parallel scaling performance and a decreased sensitivity to load imbalancing, suggesting that for some problems, global methods such as the SLW model, could outperform the LBL model. Turbulent radiation interaction (TRI) effects are associated with the differences in the time scales of the fluid dynamic equations and the radiative transfer equations. Solving on the fluid dynamic time step size produces large changes in the radiation field over the time step. We have modified the statistically homogeneous, non-premixed flame problem of Deshmukh et al. [4] to include coal-type particulate. The addition of low mass loadings of particulate minimally impacts the TRI effects. Observed differences in the TRI effects from variations in the packing fractions and Stokes numbers are difficult to analyze because of the significant effect of variations in problem
Multigrid methods for S/sub N/ problems
International Nuclear Information System (INIS)
Nowak, P.F.; Larsen, E.W.; Martin, W.R.
1987-01-01
It has long been known that the standard source iteration (SI) method for obtaining iterative solutions of S/sub N/ problems is very slowly converging in optically thick regions with low absorption. The rebalance and diffusion synthetic acceleration (DSA) methods are generalizations of SI that have been developed to accelerate convergence, but neither of these methods has been completely successful. In particular, the rebalance method tends to become unstable in problems where it is needed most (problems with high scattering ratios c = 1), while the DSA method, to be implemented in a stable fashion, requires the solution of a particular system of acceleration equations, and this has been done efficiently in two-dimensional geometries only for the diamond difference S/sub N/ equations. This paper discusses another extension of the SI method, namely, SI combined with the spatial multigrid algorithm (SIMG). This appears to be a viable way to accelerate many S/sub N/ problems in multidimensional geometries, provided the finest mesh consists of cells that are not optically thick
Multigrid on unstructured grids using an auxiliary set of structured grids
Energy Technology Data Exchange (ETDEWEB)
Douglas, C.C.; Malhotra, S.; Schultz, M.H. [Yale Univ., New Haven, CT (United States)
1996-12-31
Unstructured grids do not have a convenient and natural multigrid framework for actually computing and maintaining a high floating point rate on standard computers. In fact, just the coarsening process is expensive for many applications. Since unstructured grids play a vital role in many scientific computing applications, many modifications have been proposed to solve this problem. One suggested solution is to map the original unstructured grid onto a structured grid. This can be used as a fine grid in a standard multigrid algorithm to precondition the original problem on the unstructured grid. We show that unless extreme care is taken, this mapping can lead to a system with a high condition number which eliminates the usefulness of the multigrid method. Theorems with lower and upper bounds are provided. Simple examples show that the upper bounds are sharp.
Rasthofer, U.; Wall, W. A.; Gravemeier, V.
2018-04-01
A novel and comprehensive computational method, referred to as the eXtended Algebraic Variational Multiscale-Multigrid-Multifractal Method (XAVM4), is proposed for large-eddy simulation of the particularly challenging problem of turbulent two-phase flow. The XAVM4 involves multifractal subgrid-scale modeling as well as a Nitsche-type extended finite element method as an approach for two-phase flow. The application of an advanced structural subgrid-scale modeling approach in conjunction with a sharp representation of the discontinuities at the interface between two bulk fluids promise high-fidelity large-eddy simulation of turbulent two-phase flow. The high potential of the XAVM4 is demonstrated for large-eddy simulation of turbulent two-phase bubbly channel flow, that is, turbulent channel flow carrying a single large bubble of the size of the channel half-width in this particular application.
Multi-grid Particle-in-cell Simulations of Plasma Microturbulence
International Nuclear Information System (INIS)
Lewandowski, J.L.V.
2003-01-01
A new scheme to accurately retain kinetic electron effects in particle-in-cell (PIC) simulations for the case of electrostatic drift waves is presented. The splitting scheme, which is based on exact separation between adiabatic and on adiabatic electron responses, is shown to yield more accurate linear growth rates than the standard df scheme. The linear and nonlinear elliptic problems that arise in the splitting scheme are solved using a multi-grid solver. The multi-grid particle-in-cell approach offers an attractive path, both from the physics and numerical points of view, to simulate kinetic electron dynamics in global toroidal plasmas
A Cost-Effective Smoothed Multigrid with Modified Neighborhood-Based Aggregation for Markov Chains
Directory of Open Access Journals (Sweden)
Zhao-Li Shen
2015-01-01
Full Text Available Smoothed aggregation multigrid method is considered for computing stationary distributions of Markov chains. A judgement which determines whether to implement the whole aggregation procedure is proposed. Through this strategy, a large amount of time in the aggregation procedure is saved without affecting the convergence behavior. Besides this, we explain the shortage and irrationality of the Neighborhood-Based aggregation which is commonly used in multigrid methods. Then a modified version is presented to remedy and improve it. Numerical experiments on some typical Markov chain problems are reported to illustrate the performance of these methods.
Multigrid for the Galerkin least squares method in linear elasticity: The pure displacement problem
Energy Technology Data Exchange (ETDEWEB)
Yoo, Jaechil [Univ. of Wisconsin, Madison, WI (United States)
1996-12-31
Franca and Stenberg developed several Galerkin least squares methods for the solution of the problem of linear elasticity. That work concerned itself only with the error estimates of the method. It did not address the related problem of finding effective methods for the solution of the associated linear systems. In this work, we prove the convergence of a multigrid (W-cycle) method. This multigrid is robust in that the convergence is uniform as the parameter, v, goes to 1/2 Computational experiments are included.
Energy Technology Data Exchange (ETDEWEB)
Kohn, S.; Weare, J.; Ong, E.; Baden, S.
1997-05-01
We have applied structured adaptive mesh refinement techniques to the solution of the LDA equations for electronic structure calculations. Local spatial refinement concentrates memory resources and numerical effort where it is most needed, near the atomic centers and in regions of rapidly varying charge density. The structured grid representation enables us to employ efficient iterative solver techniques such as conjugate gradient with FAC multigrid preconditioning. We have parallelized our solver using an object- oriented adaptive mesh refinement framework.
Energy Technology Data Exchange (ETDEWEB)
Kalchev, D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Ketelsen, C. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Vassilevski, P. S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2013-11-07
Our paper proposes an adaptive strategy for reusing a previously constructed coarse space by algebraic multigrid to construct a two-level solver for a problem with nearby characteristics. Furthermore, a main target application is the solution of the linear problems that appear throughout a sequence of Markov chain Monte Carlo simulations of subsurface flow with uncertain permeability field. We demonstrate the efficacy of the method with extensive set of numerical experiments.
Multi-grid Beam and Warming scheme for the simulation of unsteady ...
African Journals Online (AJOL)
In this paper, a multi-grid algorithm is applied to a large-scale block matrix that is produced from a Beam and Warming scheme. The Beam and Warming scheme is used in the simulation of unsteady flow in an open channel. The Gauss-Seidel block-wise iteration method is used for a smoothing process with a few iterations.
A Multigrid NLS-4DVar Data Assimilation Scheme with Advanced Research WRF (ARW)
Zhang, H.; Tian, X.
2017-12-01
The motions of the atmosphere have multiscale properties in space and/or time, and the background error covariance matrix (Β) should thus contain error information at different correlation scales. To obtain an optimal analysis, the multigrid three-dimensional variational data assimilation scheme is used widely when sequentially correcting errors from large to small scales. However, introduction of the multigrid technique into four-dimensional variational data assimilation is not easy, due to its strong dependence on the adjoint model, which has extremely high computational costs in data coding, maintenance, and updating. In this study, the multigrid technique was introduced into the nonlinear least-squares four-dimensional variational assimilation (NLS-4DVar) method, which is an advanced four-dimensional ensemble-variational method that can be applied without invoking the adjoint models. The multigrid NLS-4DVar (MG-NLS-4DVar) scheme uses the number of grid points to control the scale, with doubling of this number when moving from a coarse to a finer grid. Furthermore, the MG-NLS-4DVar scheme not only retains the advantages of NLS-4DVar, but also sufficiently corrects multiscale errors to achieve a highly accurate analysis. The effectiveness and efficiency of the proposed MG-NLS-4DVar scheme were evaluated by several groups of observing system simulation experiments using the Advanced Research Weather Research and Forecasting Model. MG-NLS-4DVar outperformed NLS-4DVar, with a lower computational cost.
Monolithic multigrid method for the coupled Stokes flow and deformable porous medium system
P. Luo (Peiyao); C. Rodrigo (Carmen); F.J. Gaspar Lorenz (Franscisco); C.W. Oosterlee (Cornelis)
2018-01-01
textabstractThe interaction between fluid flow and a deformable porous medium is a complicated multi-physics problem, which can be described by a coupled model based on the Stokes and poroelastic equations. A monolithic multigrid method together with either a coupled Vanka smoother or a decoupled
Analysis of preconditioning and multigrid for Euler flows with low-subsonic regions
Koren, B.; Leer, van B.
1995-01-01
For subsonic flows and upwind-discretized, linearized 1-D Euler equations, the smoothing behavior of multigrid-accelerated point Gauss-Seidel relaxation is analyzed. Error decay by convection across domain boundaries is also discussed. A fix to poor convergence rates at low Mach numbers is sought in
Two-level Fourier analysis of a multigrid approach for discontinuous Galerkin discretisation
P.W. Hemker (Piet); W. Hoffmann; M.H. van Raalte (Marc)
2002-01-01
textabstractIn this paper we study a multigrid method for the solution of a linear second order elliptic equation, discretized by discontinuous Galerkin (DG) methods, andwe give a detailed analysis of the convergence for different block-relaxation strategies.We find that point-wise
A multigrid based 3D space-charge routine in the tracking code GPT
Pöplau, G.; Rienen, van U.; Loos, de M.J.; Geer, van der S.B.; Berz, M.; Makino, K.
2005-01-01
Fast calculation of3D non-linear space-charge fields is essential for the simulation ofhigh-brightness charged particle beams. We report on our development of a new 3D spacecharge routine in the General Particle Tracer (GPT) code. The model is based on a nonequidistant multigrid Poisson solver that
DEFF Research Database (Denmark)
Kolmogorov, Dmitry; Sørensen, Niels N.; Shen, Wen Zhong
2013-01-01
An Optimized Schwarz method using Robin boundary conditions for relaxation scheme is presented in the frame of Multigrid method on discontinuous grids. At each iteration the relaxation scheme is performed in two steps: one step with Dirichlet and another step with Robin boundary conditions at inn...
A multigrid Newton-Krylov method for flux-limited radiation diffusion
International Nuclear Information System (INIS)
Rider, W.J.; Knoll, D.A.; Olson, G.L.
1998-01-01
The authors focus on the integration of radiation diffusion including flux-limited diffusion coefficients. The nonlinear integration is accomplished with a Newton-Krylov method preconditioned with a multigrid Picard linearization of the governing equations. They investigate the efficiency of the linear and nonlinear iterative techniques
Analysis and development of stochastic multigrid methods in lattice field theory
International Nuclear Information System (INIS)
Grabenstein, M.
1994-01-01
We study the relation between the dynamical critical behavior and the kinematics of stochastic multigrid algorithms. The scale dependence of acceptance rates for nonlocal Metropolis updates is analyzed with the help of an approximation formula. A quantitative study of the kinematics of multigrid algorithms in several interacting models is performed. We find that for a critical model with Hamiltonian H(Φ) absence of critical slowing down can only be expected if the expansion of (H(Φ+ψ)) in terms of the shift ψ contains no relevant term (mass term). The predictions of this rule was verified in a multigrid Monte Carlo simulation of the Sine Gordon model in two dimensions. Our analysis can serve as a guideline for the development of new algorithms: We propose a new multigrid method for nonabelian lattice gauge theory, the time slice blocking. For SU(2) gauge fields in two dimensions, critical slowing down is almost completely eliminated by this method, in accordance with the theoretical prediction. The generalization of the time slice blocking to SU(2) in four dimensions is investigated analytically and by numerical simulations. Compared to two dimensions, the local disorder in the four dimensional gauge field leads to kinematical problems. (orig.)
1980-10-01
solving (1.3); PFAS combines the concepts of multigrid algorithms with those of projected SOR. In Section 3, we discuss the implementation of PFAS, and...numerique de la torsion elasto- plastique d’une barre cylindrique. In Approximation et Methodes Iteratives de Resolution d’Inequations Variationelles et
Development of new multigrid schemes for the method of characteristics in neutron transport theory
International Nuclear Information System (INIS)
Grassi, G.
2006-01-01
This dissertation is based upon our doctoral research that dealt with the conception and development of new non-linear multigrid techniques for the Method of the Characteristics (MOC) within the TDT code. Here we focus upon a two-level scheme consisting of a fine level on which the neutron transport equation is iteratively solved using the MOC algorithm, and a coarse level defined by a more coarsely discretized phase space on which a low-order problem is considered. The solution of this problem is then used in order to correct the angular flux moments resulting from the previous transport iteration. A flux-volume homogenization procedure is employed to evaluate the coarse-level material properties after each transport iteration. This entails the non-linearity of the methods. According to the Generalised Equivalence Theory (GET), additional degrees of freedom are introduced for the low-order problem so that the convergence of the acceleration scheme can be ensured. We present two classes of non-linear methods: transport-like methods and discussion-like methods. Transport-like methods consider a homogenized low-order transport problem on the coarse level. This problem is iteratively solved using the same MOC algorithm as for the transport problem on the fine level. Discontinuity factors are then employed, per region or per surface, in order to reconstruct the currents evaluated by the low-order operator, which ensure the convergence of the acceleration scheme. On the other hand, discussion-like methods consider a low-order problem inspired by diffusion. We studied the non-linear Coarse Mesh Finite Difference (CMFD) method, already present in literature, in the perspective of integrating it into TDT code. Then, we developed a new non-linear method on the model of CMFD. From the latter, we borrowed the idea to establish a simple relation between currents and fluxes in order to obtain a problem involving only coarse fluxes. Finally, those non-linear methods have been
Crockett, Thomas W.
1995-01-01
This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
1982-01-01
Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed.Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn
Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.
2017-12-01
This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
s-Step Krylov Subspace Methods as Bottom Solvers for Geometric Multigrid
Energy Technology Data Exchange (ETDEWEB)
Williams, Samuel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Lijewski, Mike [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Almgren, Ann [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Straalen, Brian Van [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Carson, Erin [Univ. of California, Berkeley, CA (United States); Knight, Nicholas [Univ. of California, Berkeley, CA (United States); Demmel, James [Univ. of California, Berkeley, CA (United States)
2014-08-14
Geometric multigrid solvers within adaptive mesh refinement (AMR) applications often reach a point where further coarsening of the grid becomes impractical as individual sub domain sizes approach unity. At this point the most common solution is to use a bottom solver, such as BiCGStab, to reduce the residual by a fixed factor at the coarsest level. Each iteration of BiCGStab requires multiple global reductions (MPI collectives). As the number of BiCGStab iterations required for convergence grows with problem size, and the time for each collective operation increases with machine scale, bottom solves in large-scale applications can constitute a significant fraction of the overall multigrid solve time. In this paper, we implement, evaluate, and optimize a communication-avoiding s-step formulation of BiCGStab (CABiCGStab for short) as a high-performance, distributed-memory bottom solver for geometric multigrid solvers. This is the first time s-step Krylov subspace methods have been leveraged to improve multigrid bottom solver performance. We use a synthetic benchmark for detailed analysis and integrate the best implementation into BoxLib in order to evaluate the benefit of a s-step Krylov subspace method on the multigrid solves found in the applications LMC and Nyx on up to 32,768 cores on the Cray XE6 at NERSC. Overall, we see bottom solver improvements of up to 4.2x on synthetic problems and up to 2.7x in real applications. This results in as much as a 1.5x improvement in solver performance in real applications.
An h-adaptive finite element solver for the calculations of the electronic structures
International Nuclear Information System (INIS)
Bao Gang; Hu Guanghui; Liu Di
2012-01-01
In this paper, a framework of using h-adaptive finite element method for the Kohn–Sham equation on the tetrahedron mesh is presented. The Kohn–Sham equation is discretized by the finite element method, and the h-adaptive technique is adopted to optimize the accuracy and the efficiency of the algorithm. The locally optimal block preconditioned conjugate gradient method is employed for solving the generalized eigenvalue problem, and an algebraic multigrid preconditioner is used to accelerate the solver. A variety of numerical experiments demonstrate the effectiveness of our algorithm for both the all-electron and the pseudo-potential calculations.
International Nuclear Information System (INIS)
Masiello, E.; Sanchez, R.
2007-01-01
A discontinuous heterogeneous finite element method is presented and discussed. The method is intended for realistic numerical pin-by-pin lattice calculations when an exact representation of the geometric shape of the pins is made without need for homogenization. The method keeps the advantages of conventional discrete ordinate methods, such as fast execution together with the possibility to deal with a large number of spatial meshes, while minimizing the need for geometric modeling. It also provides a complete factorization in space, angle, and energy for the discretized matrices and minimizes, thus, storage requirements. An angular multigrid acceleration technique has also been developed to speed up the rate of convergence of the inner iterations. A particular aspect of this acceleration is the introduction of boundary restriction and prolongation operators that minimize oscillatory behavior and enhance positivity. Numerical tests are presented that show the high precision of the method and the efficiency of the angular multigrid acceleration. (authors)
Programming the finite element method
Smith, I M; Margetts, L
2013-01-01
Many students, engineers, scientists and researchers have benefited from the practical, programming-oriented style of the previous editions of Programming the Finite Element Method, learning how to develop computer programs to solve specific engineering problems using the finite element method. This new fifth edition offers timely revisions that include programs and subroutine libraries fully updated to Fortran 2003, which are freely available online, and provides updated material on advances in parallel computing, thermal stress analysis, plasticity return algorithms, convection boundary c
Casanova, Henri; Robert, Yves
2008-01-01
""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi
Massively Parallel Geostatistical Inversion of Coupled Processes in Heterogeneous Porous Media
Ngo, A.; Schwede, R. L.; Li, W.; Bastian, P.; Ippisch, O.; Cirpka, O. A.
2012-04-01
The quasi-linear geostatistical approach is an inversion scheme that can be used to estimate the spatial distribution of a heterogeneous hydraulic conductivity field. The estimated parameter field is considered to be a random variable that varies continuously in space, meets the measurements of dependent quantities (such as the hydraulic head, the concentration of a transported solute or its arrival time) and shows the required spatial correlation (described by certain variogram models). This is a method of conditioning a parameter field to observations. Upon discretization, this results in as many parameters as elements of the computational grid. For a full three dimensional representation of the heterogeneous subsurface it is hardly sufficient to work with resolutions (up to one million parameters) of the model domain that can be achieved on a serial computer. The forward problems to be solved within the inversion procedure consists of the elliptic steady-state groundwater flow equation and the formally elliptic but nearly hyperbolic steady-state advection-dominated solute transport equation in a heterogeneous porous medium. Both equations are discretized by Finite Element Methods (FEM) using fully scalable domain decomposition techniques. Whereas standard conforming FEM is sufficient for the flow equation, for the advection dominated transport equation, which rises well known numerical difficulties at sharp fronts or boundary layers, we use the streamline diffusion approach. The arising linear systems are solved using efficient iterative solvers with an AMG (algebraic multigrid) pre-conditioner. During each iteration step of the inversion scheme one needs to solve a multitude of forward and adjoint problems in order to calculate the sensitivities of each measurement and the related cross-covariance matrix of the unknown parameters and the observations. In order to reduce interprocess communications and to improve the scalability of the code on larger clusters
Iterative algorithms for large sparse linear systems on parallel computers
Adams, L. M.
1982-01-01
Algorithms for assembling in parallel the sparse system of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering are developed. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
Comparison of multihardware parallel implementations for a phase unwrapping algorithm
Hernandez-Lopez, Francisco Javier; Rivera, Mariano; Salazar-Garibay, Adan; Legarda-Sáenz, Ricardo
2018-04-01
Phase unwrapping is an important problem in the areas of optical metrology, synthetic aperture radar (SAR) image analysis, and magnetic resonance imaging (MRI) analysis. These images are becoming larger in size and, particularly, the availability and need for processing of SAR and MRI data have increased significantly with the acquisition of remote sensing data and the popularization of magnetic resonators in clinical diagnosis. Therefore, it is important to develop faster and accurate phase unwrapping algorithms. We propose a parallel multigrid algorithm of a phase unwrapping method named accumulation of residual maps, which builds on a serial algorithm that consists of the minimization of a cost function; minimization achieved by means of a serial Gauss-Seidel kind algorithm. Our algorithm also optimizes the original cost function, but unlike the original work, our algorithm is a parallel Jacobi class with alternated minimizations. This strategy is known as the chessboard type, where red pixels can be updated in parallel at same iteration since they are independent. Similarly, black pixels can be updated in parallel in an alternating iteration. We present parallel implementations of our algorithm for different parallel multicore architecture such as CPU-multicore, Xeon Phi coprocessor, and Nvidia graphics processing unit. In all the cases, we obtain a superior performance of our parallel algorithm when compared with the original serial version. In addition, we present a detailed comparative performance of the developed parallel versions.
Multigrid solution of the Navier-Stokes equations at low speeds with large temperature variations
International Nuclear Information System (INIS)
Sockol, Peter M.
2003-01-01
Multigrid methods for the Navier-Stokes equations at low speeds and large temperature variations are investigated. The compressible equations with time-derivative preconditioning and preconditioned flux-difference splitting of the inviscid terms are used. Three implicit smoothers have been incorporated into a common multigrid procedure. Both full coarsening and semi-coarsening with directional fine-grid defect correction have been studied. The resulting methods have been tested on four 2D laminar problems over a range of Reynolds numbers on both uniform and highly stretched grids. Two of the three methods show efficient and robust performance over the entire range of conditions. In addition, none of the methods has any difficulty with the large temperature variations
Blockspin and multigrid for staggered fermions in non-abelian gauge fields
International Nuclear Information System (INIS)
Kalkreuter, T.; Mack, G.; Speh, M.
1991-07-01
We discuss blockspins for staggered fermions, i.e. averaging and interpolation procedures which are needed in a real space renormalization group approach to gauge theories with staggered fermions and in a multigrid approach to the computation of gauge covariant propagators. The discussion starts from the requirement that the symmetries of the free action should be preserved by the blocking procedure in the limit of a pure gauge. A definition of an averaging kernel as a solution of a gauge covariant eigenvalue equation is proposed, and the properties of a corresponding interpolation kernel are examined in the light of general criteria for good choices of blockspins. Some results of multigrid computation of bosonic propagation in an SU(2) gauge field in 4 dimensions are also presented. (orig.)
International Nuclear Information System (INIS)
Kotiluoto, P.
2007-05-01
A new deterministic three-dimensional neutral and charged particle transport code, MultiTrans, has been developed. In the novel approach, the adaptive tree multigrid technique is used in conjunction with simplified spherical harmonics approximation of the Boltzmann transport equation. The development of the new radiation transport code started in the framework of the Finnish boron neutron capture therapy (BNCT) project. Since the application of the MultiTrans code to BNCT dose planning problems, the testing and development of the MultiTrans code has continued in conventional radiotherapy and reactor physics applications. In this thesis, an overview of different numerical radiation transport methods is first given. Special features of the simplified spherical harmonics method and the adaptive tree multigrid technique are then reviewed. The usefulness of the new MultiTrans code has been indicated by verifying and validating the code performance for different types of neutral and charged particle transport problems, reported in separate publications. (orig.)
Optimal multigrid algorithms for the massive Gaussian model and path integrals
International Nuclear Information System (INIS)
Brandt, A.; Galun, M.
1996-01-01
Multigrid algorithms are presented which, in addition to eliminating the critical slowing down, can also eliminate the open-quotes volume factorclose quotes. The elimination of the volume factor removes the need to produce many independent fine-grid configurations for averaging out their statistical deviations, by averaging over the many samples produced on coarse grids during the multigrid cycle. Thermodynamic limits of observables can be calculated to relative accuracy var-epsilon r in just O(var-epsilon r -2 ) computer operations, where var-epsilon r is the error relative to the standard deviation of the observable. In this paper, we describe in detail the calculation of the susceptibility in the one-dimensional massive Gaussian model, which is also a simple example of path integrals. Numerical experiments show that the susceptibility can be calculated to relative accuracy var-epsilon r in about 8 var-epsilon r -2 random number generations, independent of the mass size
Multilevel local refinement and multigrid methods for 3-D turbulent flow
Energy Technology Data Exchange (ETDEWEB)
Liao, C.; Liu, C. [UCD, Denver, CO (United States); Sung, C.H.; Huang, T.T. [David Taylor Model Basin, Bethesda, MD (United States)
1996-12-31
A numerical approach based on multigrid, multilevel local refinement, and preconditioning methods for solving incompressible Reynolds-averaged Navier-Stokes equations is presented. 3-D turbulent flow around an underwater vehicle is computed. 3 multigrid levels and 2 local refinement grid levels are used. The global grid is 24 x 8 x 12. The first patch is 40 x 16 x 20 and the second patch is 72 x 32 x 36. 4th order artificial dissipation are used for numerical stability. The conservative artificial compressibility method are used for further improvement of convergence. To improve the accuracy of coarse/fine grid interface of local refinement, flux interpolation method for refined grid boundary is used. The numerical results are in good agreement with experimental data. The local refinement can improve the prediction accuracy significantly. The flux interpolation method for local refinement can keep conservation for a composite grid, therefore further modify the prediction accuracy.
Multigrid techniques with non-standard coarsening and group relaxation methods
International Nuclear Information System (INIS)
Danaee, A.
1989-06-01
In the usual (standard) multigrid methods, doubling of grid sizes with different smoothing iterations (pointwise, or blockwise) has been considered by different authors. Some have indicated that a large coarsening can also be used, but is not beneficial (cf. H3, p.59). In this paper, it is shown that with a suitable blockwise smoothing scheme, some advantages could be achieved even with a factor of H l-1 /h l = 3. (author). 10 refs, 2 figs, 6 tabs
Design and fabrication of multigrid X-ray collimators. [For airborne x-ray spectroscopy
Energy Technology Data Exchange (ETDEWEB)
Acton, L W; Joki, E G; Salmon, R J [Lockheed Missiles and Space Co., Palo Alto, Calif. (USA). Lockheed Palo Alto Research Lab.
1976-08-01
Multigrid X-ray collimators continue to find wide application in space research. This paper treats the principles of their design and fabrication and summarizes the experience obtained in making and flying thirteen such collimators ranging in angular resolution from 10 to 0.7 arc min FWHM. Included is a summary of a survey of scientist-users and industrial producers of collimator grids regarding grid materials, precision, plating, hole quality and results of acceptance testing.
A first-order multigrid method for bound-constrained convex optimization
Czech Academy of Sciences Publication Activity Database
Kočvara, Michal; Mohammed, S.
2016-01-01
Roč. 31, č. 3 (2016), s. 622-644 ISSN 1055-6788 R&D Projects: GA ČR(CZ) GAP201/12/0671 Grant - others:European Commission - EC(XE) 313781 Institutional support: RVO:67985556 Keywords : bound-constrained optimization * multigrid methods * linear complementarity problems Subject RIV: BA - General Mathematics Impact factor: 1.023, year: 2016 http://library.utia.cas.cz/separaty/2016/MTR/kocvara-0460326.pdf
Multidimensional radiative transfer with multilevel atoms. II. The non-linear multigrid method.
Fabiani Bendicho, P.; Trujillo Bueno, J.; Auer, L.
1997-08-01
A new iterative method for solving non-LTE multilevel radiative transfer (RT) problems in 1D, 2D or 3D geometries is presented. The scheme obtains the self-consistent solution of the kinetic and RT equations at the cost of only a few (iteration (Brandt, 1977, Math. Comp. 31, 333; Hackbush, 1985, Multi-Grid Methods and Applications, springer-Verlag, Berlin), an efficient multilevel RT scheme based on Gauss-Seidel iterations (cf. Trujillo Bueno & Fabiani Bendicho, 1995ApJ...455..646T), and accurate short-characteristics formal solution techniques. By combining a valid stopping criterion with a nested-grid strategy a converged solution with the desired true error is automatically guaranteed. Contrary to the current operator splitting methods the very high convergence speed of the new RT method does not deteriorate when the grid spatial resolution is increased. With this non-linear multigrid method non-LTE problems discretized on N grid points are solved in O(N) operations. The nested multigrid RT method presented here is, thus, particularly attractive in complicated multilevel transfer problems where small grid-sizes are required. The properties of the method are analyzed both analytically and with illustrative multilevel calculations for Ca II in 1D and 2D schematic model atmospheres.
Algebraic multigrid preconditioners for two-phase flow in porous media with phase transitions
Bui, Quan M.; Wang, Lu; Osei-Kuffuor, Daniel
2018-04-01
Multiphase flow is a critical process in a wide range of applications, including oil and gas recovery, carbon sequestration, and contaminant remediation. Numerical simulation of multiphase flow requires solving of a large, sparse linear system resulting from the discretization of the partial differential equations modeling the flow. In the case of multiphase multicomponent flow with miscible effect, this is a very challenging task. The problem becomes even more difficult if phase transitions are taken into account. A new approach to handle phase transitions is to formulate the system as a nonlinear complementarity problem (NCP). Unlike in the primary variable switching technique, the set of primary variables in this approach is fixed even when there is phase transition. Not only does this improve the robustness of the nonlinear solver, it opens up the possibility to use multigrid methods to solve the resulting linear system. The disadvantage of the complementarity approach, however, is that when a phase disappears, the linear system has the structure of a saddle point problem and becomes indefinite, and current algebraic multigrid (AMG) algorithms cannot be applied directly. In this study, we explore the effectiveness of a new multilevel strategy, based on the multigrid reduction technique, to deal with problems of this type. We demonstrate the effectiveness of the method through numerical results for the case of two-phase, two-component flow with phase appearance/disappearance. We also show that the strategy is efficient and scales optimally with problem size.
Block-accelerated aggregation multigrid for Markov chains with application to PageRank problems
Shen, Zhao-Li; Huang, Ting-Zhu; Carpentieri, Bruno; Wen, Chun; Gu, Xian-Ming
2018-06-01
Recently, the adaptive algebraic aggregation multigrid method has been proposed for computing stationary distributions of Markov chains. This method updates aggregates on every iterative cycle to keep high accuracies of coarse-level corrections. Accordingly, its fast convergence rate is well guaranteed, but often a large proportion of time is cost by aggregation processes. In this paper, we show that the aggregates on each level in this method can be utilized to transfer the probability equation of that level into a block linear system. Then we propose a Block-Jacobi relaxation that deals with the block system on each level to smooth error. Some theoretical analysis of this technique is presented, meanwhile it is also adapted to solve PageRank problems. The purpose of this technique is to accelerate the adaptive aggregation multigrid method and its variants for solving Markov chains and PageRank problems. It also attempts to shed some light on new solutions for making aggregation processes more cost-effective for aggregation multigrid methods. Numerical experiments are presented to illustrate the effectiveness of this technique.
Energy Technology Data Exchange (ETDEWEB)
Gjesdal, Thor
1997-12-31
This thesis discusses the development and application of efficient numerical methods for the simulation of fluid flows, in particular the flow of incompressible fluids. The emphasis is on practical aspects of algorithm development and on application of the methods either to linear scalar model equations or to the non-linear incompressible Navier-Stokes equations. The first part deals with cell centred multigrid methods and linear correction scheme and presents papers on (1) generalization of the method to arbitrary sized grids for diffusion problems, (2) low order method for advection-diffusion problems, (3) attempt to extend the basic method to advection-diffusion problems, (4) Fourier smoothing analysis of multicolour relaxation schemes, and (5) analysis of high-order discretizations for advection terms. The second part discusses a multigrid based on pressure correction methods, non-linear full approximation scheme, and papers on (1) systematic comparison of the performance of different pressure correction smoothers and some other algorithmic variants, low to moderate Reynolds numbers, and (2) systematic study of implementation strategies for high order advection schemes, high-Re flow. An appendix contains Fortran 90 data structures for multigrid development. 160 refs., 26 figs., 22 tabs.
Parallel knock-out schemes in networks
Broersma, H.J.; Fomin, F.V.; Woeginger, G.J.
2004-01-01
We consider parallel knock-out schemes, a procedure on graphs introduced by Lampert and Slater in 1997 in which each vertex eliminates exactly one of its neighbors in each round. We are considering cases in which after a finite number of rounds, where the minimimum number is called the parallel
Exploiting Symmetry on Parallel Architectures.
Stiller, Lewis Benjamin
1995-01-01
This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry -exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and discovered it a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
van der Vegt, Jacobus J.W.; Rhebergen, Sander
2012-01-01
Using a detailed multilevel analysis of the complete hp-Multigrid as Smoother algorithm accurate predictions are obtained of the spectral radius and operator norms of the multigrid error transformation operator. This multilevel analysis is used to optimize the coefficients in the semi-implicit
International Nuclear Information System (INIS)
Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.
1997-01-01
The work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications utilizing the GEANT code development or improvement works were done on parts simulating low energy physical phenomena like radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program of neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits in a single pass the simulation of a hybrid reactor core receiving a proton burst. Other works in this field refers to simulations for nuclear medicine applications like, for instance, development of biological probes, evaluation and characterization of the gamma cameras (collimators, crystal thickness) as well as the method for dosimetric calculations. Particularly, these calculations are suited for a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other works mentioned in the same field refer to simulation of the electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for natural and artificial radioactivity monitoring of environment
Generalization of mixed multiscale finite element methods with applications
Energy Technology Data Exchange (ETDEWEB)
Lee, C S [Texas A & M Univ., College Station, TX (United States)
2016-08-01
Many science and engineering problems exhibit scale disparity and high contrast. The small scale features cannot be omitted in the physical models because they can affect the macroscopic behavior of the problems. However, resolving all the scales in these problems can be prohibitively expensive. As a consequence, some types of model reduction techniques are required to design efficient solution algorithms. For practical purpose, we are interested in mixed finite element problems as they produce solutions with certain conservative properties. Existing multiscale methods for such problems include the mixed multiscale finite element methods. We show that for complicated problems, the mixed multiscale finite element methods may not be able to produce reliable approximations. This motivates the need of enrichment for coarse spaces. Two enrichment approaches are proposed, one is based on generalized multiscale finte element metthods (GMsFEM), while the other is based on spectral element-based algebraic multigrid (rAMGe). The former one, which is called mixed GMsFEM, is developed for both Darcy’s flow and linear elasticity. Application of the algorithm in two-phase flow simulations are demonstrated. For linear elasticity, the algorithm is subtly modified due to the symmetry requirement of the stress tensor. The latter enrichment approach is based on rAMGe. The algorithm differs from GMsFEM in that both of the velocity and pressure spaces are coarsened. Due the multigrid nature of the algorithm, recursive application is available, which results in an efficient multilevel construction of the coarse spaces. Stability, convergence analysis, and exhaustive numerical experiments are carried out to validate the proposed enrichment approaches. iii
Zhebel, E.; Minisini, S.; Kononov, A.; Mulder, W.A.
2013-01-01
With the rapid developments in parallel compute architectures, algorithms for seismic modeling and imaging need to be reconsidered in terms of parallelization. The aim of this paper is to compare scalability of seismic modeling algorithms: finite differences, continuous mass-lumped finite elements
McCallum, Ethan
2011-01-01
It's tough to argue with R as a high-quality, cross-platform, open source statistical software product-unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.
Impact of new computing systems on finite element computations
International Nuclear Information System (INIS)
Noor, A.K.; Fulton, R.E.; Storaasi, O.O.
1983-01-01
Recent advances in computer technology that are likely to impact finite element computations are reviewed. The characteristics of supersystems, highly parallel systems, and small systems (mini and microcomputers) are summarized. The interrelations of numerical algorithms and software with parallel architectures are discussed. A scenario is presented for future hardware/software environment and finite element systems. A number of research areas which have high potential for improving the effectiveness of finite element analysis in the new environment are identified
Combinatorics of spreads and parallelisms
Johnson, Norman
2010-01-01
Partitions of Vector Spaces Quasi-Subgeometry Partitions Finite Focal-SpreadsGeneralizing André SpreadsThe Going Up Construction for Focal-SpreadsSubgeometry Partitions Subgeometry and Quasi-Subgeometry Partitions Subgeometries from Focal-SpreadsExtended André SubgeometriesKantor's Flag-Transitive DesignsMaximal Additive Partial SpreadsSubplane Covered Nets and Baer Groups Partial Desarguesian t-Parallelisms Direct Products of Affine PlanesJha-Johnson SL(2,
Wakefield calculations on parallel computers
International Nuclear Information System (INIS)
Schoessow, P.
1990-01-01
The use of parallelism in the solution of wakefield problems is illustrated for two different computer architectures (SIMD and MIMD). Results are given for finite difference codes which have been implemented on a Connection Machine and an Alliant FX/8 and which are used to compute wakefields in dielectric loaded structures. Benchmarks on code performance are presented for both cases. 4 refs., 3 figs., 2 tabs
Multigrid preconditioned conjugate-gradient method for large-scale wave-front reconstruction.
Gilles, Luc; Vogel, Curtis R; Ellerbroek, Brent L
2002-09-01
We introduce a multigrid preconditioned conjugate-gradient (MGCG) iterative scheme for computing open-loop wave-front reconstructors for extreme adaptive optics systems. We present numerical simulations for a 17-m class telescope with n = 48756 sensor measurement grid points within the aperture, which indicate that our MGCG method has a rapid convergence rate for a wide range of subaperture average slope measurement signal-to-noise ratios. The total computational cost is of order n log n. Hence our scheme provides for fast wave-front simulation and control in large-scale adaptive optics systems.
A Multigrid Algorithm for an Elliptic Problem with a Perturbed Boundary Condition
Bonito, Andrea; Pasciak, Joseph E.
2013-01-01
We discuss the preconditioning of systems coupling elliptic operators in Ω⊂Rd, d=2,3, with elliptic operators defined on hypersurfaces. These systems arise naturally when physical phenomena are affected by geometric boundary forces, such as the evolution of liquid drops subject to surface tension. The resulting operators are sums of interior and boundary terms weighted by parameters. We investigate the behavior of multigrid algorithms suited to this context and demonstrate numerical results which suggest uniform preconditioning bounds that are level and parameter independent.
The development of an algebraic multigrid algorithm for symmetric positive definite linear systems
Energy Technology Data Exchange (ETDEWEB)
Vanek, P.; Mandel, J.; Brezina, M. [Univ. of Colorado, Denver, CO (United States)
1996-12-31
An algebraic multigrid algorithm for symmetric, positive definite linear systems is developed based on the concept of prolongation by smoothed aggregation. Coarse levels are generated automatically. We present a set of requirements motivated heuristically by a convergence theory. The algorithm then attempts to satisfy the requirements. Input to the method are the coefficient matrix and zero energy modes, which are determined from nodal coordinates and knowledge of the differential equation. Efficiency of the resulting algorithm is demonstrated by computational results on real world problems from solid elasticity, plate blending, and shells.
A multilevel correction adaptive finite element method for Kohn-Sham equation
Hu, Guanghui; Xie, Hehu; Xu, Fei
2018-02-01
In this paper, an adaptive finite element method is proposed for solving Kohn-Sham equation with the multilevel correction technique. In the method, the Kohn-Sham equation is solved on a fixed and appropriately coarse mesh with the finite element method in which the finite element space is kept improving by solving the derived boundary value problems on a series of adaptively and successively refined meshes. A main feature of the method is that solving large scale Kohn-Sham system is avoided effectively, and solving the derived boundary value problems can be handled efficiently by classical methods such as the multigrid method. Hence, the significant acceleration can be obtained on solving Kohn-Sham equation with the proposed multilevel correction technique. The performance of the method is examined by a variety of numerical experiments.
Directory of Open Access Journals (Sweden)
James G. Worner
2017-05-01
Full Text Available James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship. ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.
National Research Council Canada - National Science Library
Liu, Chaoqun
1999-01-01
.... Four transitional stages are observed: the linear and weakly nonlinear growth, the appearance of staggered A-vortex patterns, the evolution of A-vortex into hairpin vortex, the breakdown of hairpin vortical structures...
Compiler generation and autotuning of communication-avoiding operators for geometric multigrid
Energy Technology Data Exchange (ETDEWEB)
Basu, Protonu [Univ. of Utah, Salt Lake City, UT (United States); Venkat, Anand [Univ. of Utah, Salt Lake City, UT (United States); Hall, Mary [Univ. of Utah, Salt Lake City, UT (United States); Williams, Samuel [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Van Straalen, Brian [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
2014-04-17
This paper describes a compiler approach to introducing communication-avoiding optimizations in geometric multigrid (GMG), one of the most popular methods for solving partial differential equations. Communication-avoiding optimizations reduce vertical communication through the memory hierarchy and horizontal communication across processes or threads, usually at the expense of introducing redundant computation. We focus on applying these optimizations to the smooth operator, which successively reduces the error and accounts for the largest fraction of the GMG execution time. Our compiler technology applies both novel and known transformations to derive an implementation comparable to manually-tuned code. To make the approach portable, an underlying autotuning system explores the tradeoff between reduced communication and increased computation, as well as tradeoffs in threading schemes, to automatically identify the best implementation for a particular architecture and at each computation phase. Results show that we are able to quadruple the performance of the smooth operation on the finest grids while attaining performance within 94% of manually-tuned code. Overall we improve the overall multigrid solve time by 2.5× without sacrificing programer productivity.
Is the Multigrid Method Fault Tolerant? The Two-Grid Case
Energy Technology Data Exchange (ETDEWEB)
Ainsworth, Mark [Brown Univ., Providence, RI (United States). Division of Applied Mathematics; Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Computer Science and Mathematics Division; Glusa, Christian [Brown Univ., Providence, RI (United States). Division of Applied Mathematics
2016-06-30
The predicted reduced resiliency of next-generation high performance computers means that it will become necessary to take into account the effects of randomly occurring faults on numerical methods. Further, in the event of a hard fault occurring, a decision has to be made as to what remedial action should be taken in order to resume the execution of the algorithm. The action that is chosen can have a dramatic effect on the performance and characteristics of the scheme. Ideally, the resulting algorithm should be subjected to the same kind of mathematical analysis that was applied to the original, deterministic variant. The purpose of this work is to provide an analysis of the behaviour of the multigrid algorithm in the presence of faults. Multigrid is arguably the method of choice for the solution of large-scale linear algebra problems arising from discretization of partial differential equations and it is of considerable importance to anticipate its behaviour on an exascale machine. The analysis of resilience of algorithms is in its infancy and the current work is perhaps the first to provide a mathematical model for faults and analyse the behaviour of a state-of-the-art algorithm under the model. It is shown that the Two Grid Method fails to be resilient to faults. Attention is then turned to identifying the minimal necessary remedial action required to restore the rate of convergence to that enjoyed by the ideal fault-free method.
Cellular Automaton Modeling of Dendritic Growth Using a Multi-grid Method
International Nuclear Information System (INIS)
Natsume, Y; Ohsasa, K
2015-01-01
A two-dimensional cellular automaton model with a multi-grid method was developed to simulate dendritic growth. In the present model, we used a triple-grid system for temperature, solute concentration and solid fraction fields as a new approach of the multi-grid method. In order to evaluate the validity of the present model, we carried out simulations of single dendritic growth, secondary dendrite arm growth, multi-columnar dendritic growth and multi-equiaxed dendritic growth. From the results of the grid dependency from the simulation of single dendritic growth, we confirmed that the larger grid can be used in the simulation and that the computational time can be reduced dramatically. In the simulation of secondary dendrite arm growth, the results from the present model were in good agreement with the experimental data and the simulated results from a phase-field model. Thus, the present model can quantitatively simulate dendritic growth. From the simulated results of multi-columnar and multi-equiaxed dendrites, we confirmed that the present model can perform simulations under practical solidification conditions. (paper)
{sup 10}B multi-grid proportional gas counters for large area thermal neutron detectors
Energy Technology Data Exchange (ETDEWEB)
Andersen, K. [ESS, P.O. Box 176, SE-221 00 Lund (Sweden); Bigault, T. [ILL, BP 156, 6, rue Jules Horowitz, 38042 Grenoble Cedex 9 (France); Birch, J. [Linköping University, SE-581, 83 Linköping (Sweden); Buffet, J. C.; Correa, J. [ILL, BP 156, 6, rue Jules Horowitz, 38042 Grenoble Cedex 9 (France); Hall-Wilton, R. [ESS, P.O. Box 176, SE-221 00 Lund (Sweden); Hultman, L. [Linköping University, SE-581, 83 Linköping (Sweden); Höglund, C. [ESS, P.O. Box 176, SE-221 00 Lund (Sweden); Linköping University, SE-581, 83 Linköping (Sweden); Guérard, B., E-mail: guerard@ill.fr [ILL, BP 156, 6, rue Jules Horowitz, 38042 Grenoble Cedex 9 (France); Jensen, J. [Linköping University, SE-581, 83 Linköping (Sweden); Khaplanov, A. [ILL, BP 156, 6, rue Jules Horowitz, 38042 Grenoble Cedex 9 (France); ESS, P.O. Box 176, SE-221 00 Lund (Sweden); Kirstein, O. [Linköping University, SE-581, 83 Linköping (Sweden); Piscitelli, F.; Van Esch, P. [ILL, BP 156, 6, rue Jules Horowitz, 38042 Grenoble Cedex 9 (France); Vettier, C. [ESS, P.O. Box 176, SE-221 00 Lund (Sweden)
2013-08-21
{sup 3}He was a popular material in neutrons detectors until its availability dropped drastically in 2008. The development of techniques based on alternative convertors is now of high priority for neutron research institutes. Thin films of {sup 10}B or {sup 10}B{sub 4}C have been used in gas proportional counters to detect neutrons, but until now, only for small or medium sensitive area. We present here the multi-grid design, introduced at the ILL and developed in collaboration with ESS for LAN (large area neutron) detectors. Typically thirty {sup 10}B{sub 4}C films of 1 μm thickness are used to convert neutrons into ionizing particles which are subsequently detected in a proportional gas counter. The principle and the fabrication of the multi-grid are described and some preliminary results obtained with a prototype of 200 cm×8 cm are reported; a detection efficiency of 48% has been measured at 2.5 Å with a monochromatic neutron beam line, showing the good potential of this new technique.
Atkins, H. L.; Helenbrook, B. T.
2005-01-01
This paper describes numerical experiments with P-multigrid to corroborate analysis, validate the present implementation, and to examine issues that arise in the implementations of the various combinations of relaxation schemes, discretizations and P-multigrid methods. The two approaches to implement P-multigrid presented here are equivalent for most high-order discretization methods such as spectral element, SUPG, and discontinuous Galerkin applied to advection; however it is discovered that the approach that mimics the common geometric multigrid implementation is less robust, and frequently unstable when applied to discontinuous Galerkin discretizations of di usion. Gauss-Seidel relaxation converges 40% faster than block Jacobi, as predicted by analysis; however, the implementation of Gauss-Seidel is considerably more expensive that one would expect because gradients in most neighboring elements must be updated. A compromise quasi Gauss-Seidel relaxation method that evaluates the gradient in each element twice per iteration converges at rates similar to those predicted for true Gauss-Seidel.
A NetCDF version of the two-dimensional energy balance model based on the full multigrid algorithm
Zhuang, Kelin; North, Gerald R.; Stevens, Mark J.
A NetCDF version of the two-dimensional energy balance model based on the full multigrid method in Fortran is introduced for both pedagogical and research purposes. Based on the land-sea-ice distribution, orbital elements, greenhouse gases concentration, and albedo, the code calculates the global seasonal surface temperature. A step-by-step guide with examples is provided for practice.
Directory of Open Access Journals (Sweden)
S. Sleesongsom
2015-01-01
Full Text Available This paper has twin aims. Firstly, a multigrid design approach for optimization of an unconventional morphing wing is proposed. The structural design problem is assigned to optimize wing mass, lift effectiveness, and buckling factor subject to structural safety requirements. Design variables consist of partial topology, nodal positions, and component sizes of a wing internal structure. Such a design process can be accomplished by using multiple resolutions of ground elements, which is called a multigrid approach. Secondly, an opposite-based multiobjective population-based incremental learning (OMPBIL is proposed for comparison with the original multiobjective population-based incremental learning (MPBIL. Multiobjective design problems with single-grid and multigrid design variables are then posed and tackled by OMPBIL and MPBIL. The results show that using OMPBIL in combination with a multigrid design approach is the best design strategy. OMPBIL is superior to MPBIL since the former provides better population diversity. Aeroelastic trim for an elastic morphing wing is also presented.
On the fixed-stress split scheme as smoother in multigrid methods for coupling flow and geomechanics
F.J. Gaspar Lorenz (Franscisco); C. Rodrigo (Carmen)
2017-01-01
textabstractThe fixed-stress split method has been widely used as solution method in the coupling of flow and geomechanics. In this work, we analyze the behavior of an inexact version of this algorithm as smoother within a geometric multigrid method, in order to obtain an efficient monolithic solver
International Nuclear Information System (INIS)
Tonks, M.R.; Williamson, R.; Masson, R.
2015-01-01
The Finite Element Method (FEM) is a numerical technique for finding approximate solutions to boundary value problems. While FEM is commonly used to solve solid mechanics equations, it can be applied to a large range of BVPs from many different fields. FEM has been used for reactor fuels modelling for many years. It is most often used for fuel performance modelling at the pellet and pin scale, however, it has also been used to investigate properties of the fuel material, such as thermal conductivity and fission gas release. Recently, the United Stated Department Nuclear Energy Advanced Modelling and Simulation Program has begun using FEM as the basis of the MOOSE-BISON-MARMOT Project that is developing a multi-dimensional, multi-physics fuel performance capability that is massively parallel and will use multi-scale material models to provide a truly predictive modelling capability. (authors)
Massively parallel evolutionary computation on GPGPUs
Tsutsui, Shigeyoshi
2013-01-01
Evolutionary algorithms (EAs) are metaheuristics that learn from natural collective behavior and are applied to solve optimization problems in domains such as scheduling, engineering, bioinformatics, and finance. Such applications demand acceptable solutions with high-speed execution using finite computational resources. Therefore, there have been many attempts to develop platforms for running parallel EAs using multicore machines, massively parallel cluster machines, or grid computing environments. Recent advances in general-purpose computing on graphics processing units (GPGPU) have opened u
Bonito, Andrea
2012-09-01
We design and analyze variational and non-variational multigrid algorithms for the Laplace-Beltrami operator on a smooth and closed surface. In both cases, a uniform convergence for the V -cycle algorithm is obtained provided the surface geometry is captured well enough by the coarsest grid. The main argument hinges on a perturbation analysis from an auxiliary variational algorithm defined directly on the smooth surface. In addition, the vanishing mean value constraint is imposed on each level, thereby avoiding singular quadratic forms without adding additional computational cost. Numerical results supporting our analysis are reported. In particular, the algorithms perform well even when applied to surfaces with a large aspect ratio. © 2011 American Mathematical Society.
Layer-oriented multigrid wavefront reconstruction algorithms for multi-conjugate adaptive optics
Gilles, Luc; Ellerbroek, Brent L.; Vogel, Curtis R.
2003-02-01
Multi-conjugate adaptive optics (MCAO) systems with 104-105 degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wavefront control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of AO degrees of freedom. In this paper, we develop an iterative sparse matrix implementation of minimum variance wavefront reconstruction for telescope diameters up to 32m with more than 104 actuators. The basic approach is the preconditioned conjugate gradient method, using a multigrid preconditioner incorporating a layer-oriented (block) symmetric Gauss-Seidel iterative smoothing operator. We present open-loop numerical simulation results to illustrate algorithm convergence.
Mizuno, T; Taniguchi, M; Kashiwagi, M; Umeda, N; Tobari, H; Watanabe, K; Dairaku, M; Sakamoto, K; Inoue, T
2010-02-01
Heat load on acceleration grids by secondary particles such as electrons, neutrals, and positive ions, is a key issue for long pulse acceleration of negative ion beams. Complicated behaviors of the secondary particles in multiaperture, multigrid (MAMuG) accelerator have been analyzed using electrostatic accelerator Monte Carlo code. The analytical result is compared to experimental one obtained in a long pulse operation of a MeV accelerator, of which second acceleration grid (A2G) was removed for simplification of structure. The analytical results show that relatively high heat load on the third acceleration grid (A3G) since stripped electrons were deposited mainly on A3G. This heat load on the A3G can be suppressed by installing the A2G. Thus, capability of MAMuG accelerator is demonstrated for suppression of heat load due to secondary particles by the intermediate grids.
MAPCUMBA: A fast iterative multi-grid map-making algorithm for CMB experiments
Doré, O.; Teyssier, R.; Bouchet, F. R.; Vibert, D.; Prunet, S.
2001-07-01
The data analysis of current Cosmic Microwave Background (CMB) experiments like BOOMERanG or MAXIMA poses severe challenges which already stretch the limits of current (super-) computer capabilities, if brute force methods are used. In this paper we present a practical solution for the optimal map making problem which can be used directly for next generation CMB experiments like ARCHEOPS and TopHat, and can probably be extended relatively easily to the full PLANCK case. This solution is based on an iterative multi-grid Jacobi algorithm which is both fast and memory sparing. Indeed, if there are Ntod data points along the one dimensional timeline to analyse, the number of operations is of O (Ntod \\ln Ntod) and the memory requirement is O (Ntod). Timing and accuracy issues have been analysed on simulated ARCHEOPS and TopHat data, and we discuss as well the issue of the joint evaluation of the signal and noise statistical properties.
Southworth, Benjamin Scott
PART I: One of the most fascinating questions to humans has long been whether life exists outside of our planet. To our knowledge, water is a fundamental building block of life, which makes liquid water on other bodies in the universe a topic of great interest. In fact, there are large bodies of water right here in our solar system, underneath the icy crust of moons around Saturn and Jupiter. The NASA-ESA Cassini Mission spent two decades studying the Saturnian system. One of the many exciting discoveries was a "plume" on the south pole of Enceladus, emitting hundreds of kg/s of water vapor and frozen water-ice particles from Enceladus' subsurface ocean. It has since been determined that Enceladus likely has a global liquid water ocean separating its rocky core from icy surface, with conditions that are relatively favorable to support life. The plume is of particular interest because it gives direct access to ocean particles from space, by flying through the plume. Recently, evidence has been found for similar geological activity occurring on Jupiter's moon Europa, long considered one of the most likely candidate bodies to support life in our solar system. Here, a model for plume-particle dynamics is developed based on studies of the Enceladus plume and data from the Cassini Cosmic Dust Analyzer. A C++, OpenMP/MPI parallel software package is then built to run large scale simulations of dust plumes on planetary satellites. In the case of Enceladus, data from simulations and the Cassini mission provide insight into the structure of emissions on the surface, the total mass production of the plume, and the distribution of particles being emitted. Each of these are fundamental to understanding the plume and, for Europa and Enceladus, simulation data provide important results for the planning of future missions to these icy moons. In particular, this work has contributed to the Europa Clipper mission and proposed Enceladus Life Finder. PART II: Solving large, sparse
Leamer, Micah J.
2004-01-01
Let K be a field and Q a finite directed multi-graph. In this paper I classify all path algebras KQ and admissible orders with the property that all of their finitely generated ideals have finite Groebner bases. MS
Locally Finite Root Supersystems
Yousofzadeh, Malihe
2013-01-01
We introduce the notion of locally finite root supersystems as a generalization of both locally finite root systems and generalized root systems. We classify irreducible locally finite root supersystems.
Vector and parallel processors in computational science
International Nuclear Information System (INIS)
Duff, I.S.; Reid, J.K.
1985-01-01
This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing)
Parallel computation of rotating flows
DEFF Research Database (Denmark)
Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær
1999-01-01
This paper deals with the simulation of 3‐D rotating flows based on the velocity‐vorticity formulation of the Navier‐Stokes equations in cylindrical coordinates. The governing equations are discretized by a finite difference method. The solution is advanced to a new time level by a two‐step process...... is that of solving a singular, large, sparse, over‐determined linear system of equations, and the iterative method CGLS is applied for this purpose. We discuss some of the mathematical and numerical aspects of this procedure and report on the performance of our software on a wide range of parallel computers. Darbe...
An inherently parallel method for solving discretized diffusion equations
International Nuclear Information System (INIS)
Eccleston, B.R.; Palmer, T.S.
1999-01-01
A Monte Carlo approach to solving linear systems of equations is being investigated in the context of the solution of discretized diffusion equations. While the technique was originally devised decades ago, changes in computer architectures (namely, massively parallel machines) have driven the authors to revisit this technique. There are a number of potential advantages to this approach: (1) Analog Monte Carlo techniques are inherently parallel; this is not necessarily true to today's more advanced linear equation solvers (multigrid, conjugate gradient, etc.); (2) Some forms of this technique are adaptive in that they allow the user to specify locations in the problem where resolution is of particular importance and to concentrate the work at those locations; and (3) These techniques permit the solution of very large systems of equations in that matrix elements need not be stored. The user could trade calculational speed for storage if elements of the matrix are calculated on the fly. The goal of this study is to compare the parallel performance of Monte Carlo linear solvers to that of a more traditional parallelized linear solver. The authors observe the linear speedup that they expect from the Monte Carlo algorithm, given that there is no domain decomposition to cause significant communication overhead. Overall, PETSc outperforms the Monte Carlo solver for the test problem. The PETSc parallel performance improves with larger numbers of unknowns for a given number of processors. Parallel performance of the Monte Carlo technique is independent of the size of the matrix and the number of processes. They are investigating modifications to the scheme to accommodate matrix problems with positive off-diagonal elements. They are also currently coding an on-the-fly version of the algorithm to investigate the solution of very large linear systems
A NetCDF version of the two-dimensional energy balance model based on the full multigrid algorithm
Directory of Open Access Journals (Sweden)
Kelin Zhuang
2017-01-01
Full Text Available A NetCDF version of the two-dimensional energy balance model based on the full multigrid method in Fortran is introduced for both pedagogical and research purposes. Based on the land–sea–ice distribution, orbital elements, greenhouse gases concentration, and albedo, the code calculates the global seasonal surface temperature. A step-by-step guide with examples is provided for practice.
Turcksin, Bruno; Ragusa, Jean C.; Morel, Jim E.
2012-01-01
It is well known that the diffusion synthetic acceleration (DSA) methods for the Sn equations become ineffective in the Fokker-Planck forward-peaked scattering limit. In response to this deficiency, Morel and Manteuffel (1991) developed an angular multigrid method for the 1-D Sn equations. This method is very effective, costing roughly twice as much as DSA per source iteration, and yielding a maximum spectral radius of approximately 0.6 in the Fokker-Planck limit. Pautz, Adams, and Morel (PAM) (1999) later generalized the angular multigrid to 2-D, but it was found that the method was unstable with sufficiently forward-peaked mappings between the angular grids. The method was stabilized via a filtering technique based on diffusion operators, but this filtering also degraded the effectiveness of the overall scheme. The spectral radius was not bounded away from unity in the Fokker-Planck limit, although the method remained more effective than DSA. The purpose of this article is to recast the multidimensional PAM angular multigrid method without the filtering as an Sn preconditioner and use it in conjunction with the Generalized Minimal RESidual (GMRES) Krylov method. The approach ensures stability and our computational results demonstrate that it is also significantly more efficient than an analogous DSA-preconditioned Krylov method.
3D, parallel fluid-structure interaction code
CSIR Research Space (South Africa)
Oxtoby, Oliver F
2011-01-01
Full Text Available The authors describe the development of a 3D parallel Fluid–Structure–Interaction (FSI) solver and its application to benchmark problems. Fluid and solid domains are discretised using and edge-based finite-volume scheme for efficient parallel...
On Chudnovsky-Based Arithmetic Algorithms in Finite Fields
Atighehchi, Kevin; Ballet, Stéphane; Bonnecaze, Alexis; Rolland, Robert
2015-01-01
Thanks to a new construction of the so-called Chudnovsky-Chudnovsky multiplication algorithm, we design efficient algorithms for both the exponentiation and the multiplication in finite fields. They are tailored to hardware implementation and they allow computations to be parallelized while maintaining a low number of bilinear multiplications. We give an example with the finite field ${\\mathbb F}_{16^{13}}$.
Finite-Larmor-radius stability theory of EBT plasmas
International Nuclear Information System (INIS)
Berk, H.L.; Cheng, C.Z.; Rosenbluth, M.N.; Van Dam, J.W.
1982-11-01
An eikonal ballooning-mode formalism is developed to describe curvature-driven modes of hot electron plasmas in bumpy tori. The formalism treats frequencies comparable to the ion-cyclotron frequency, as well as arbitrary finite Larmor radius and field polarization, although the detailed analysis is restricted to E/sub parallel/ = 0. Moderate hot-electron finite-Larmor-radius effects are found to lower the background beta core limit, whereas strong finite-Lamor-radius effects produce stabilization
Parallel Programming with Intel Parallel Studio XE
Blair-Chappell , Stephen
2012-01-01
Optimize code for multi-core processors with Intel's Parallel Studio Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the
Recent development of the Multi-Grid detector for large area neutron scattering instruments
International Nuclear Information System (INIS)
Guerard, Bruno
2015-01-01
Most of the Neutron Scattering facilities are committed in a continuous program of modernization of their instruments, requiring large area and high performance thermal neutron detectors. Beside scintillators detectors, 3 He detectors, like linear PSDs (Position Sensitive Detectors) and MWPCs (Multi-Wires Proportional Chambers), are the most current techniques nowadays. Time Of Flight instruments are using 3 He PSDs mounted side by side to cover tens of m 2 . As a result of the so-called ' 3 He shortage crisis , the volume of 3He which is needed to build one of these instruments is not accessible anymore. The development of alternative techniques requiring no 3He, has been given high priority to secure the future of neutron scattering instrumentation. This is particularly important in the context where the future ESS (European Spallation Source) will start its operation in 2019-2020. Improved scintillators represent one of the alternative techniques. Another one is the Multi-Grid introduced at the ILL in 2009. A Multi-Grid detector is composed of several independent modules of typically 0.8 m x 3 m sensitive area, mounted side by side in air or in a vacuum TOF chamber. One module is composed of segmented boron-lined proportional counters mounted in a gas vessel; the counters, of square section, are assembled with Aluminium grids electrically insulated and stacked together. This design provides two advantages: First, magnetron sputtering techniques can be used to coat B 4 C films on planar substrates, and second, the neutron position along the anode wires can be measured by reading out individually the grid signals with fast shaping amplifiers followed by comparators. Unlike charge division localisation in linear PSDs, the individual readout of the grids allows operating the Multi-Grid at a low amplification gain, hence this detector is tolerant to mechanical defects and its production accessible to laboratories equipped with standard equipment. Prototypes of
Recent development of the Multi-Grid detector for large area neutron scattering instruments
Energy Technology Data Exchange (ETDEWEB)
Guerard, Bruno [ILL-ESS-LiU collaboration, CRISP project, Institut Laue Langevin - ILL, Grenoble (France)
2015-07-01
Most of the Neutron Scattering facilities are committed in a continuous program of modernization of their instruments, requiring large area and high performance thermal neutron detectors. Beside scintillators detectors, {sup 3}He detectors, like linear PSDs (Position Sensitive Detectors) and MWPCs (Multi-Wires Proportional Chambers), are the most current techniques nowadays. Time Of Flight instruments are using {sup 3}He PSDs mounted side by side to cover tens of m{sup 2}. As a result of the so-called '{sup 3}He shortage crisis{sup ,} the volume of 3He which is needed to build one of these instruments is not accessible anymore. The development of alternative techniques requiring no 3He, has been given high priority to secure the future of neutron scattering instrumentation. This is particularly important in the context where the future ESS (European Spallation Source) will start its operation in 2019-2020. Improved scintillators represent one of the alternative techniques. Another one is the Multi-Grid introduced at the ILL in 2009. A Multi-Grid detector is composed of several independent modules of typically 0.8 m x 3 m sensitive area, mounted side by side in air or in a vacuum TOF chamber. One module is composed of segmented boron-lined proportional counters mounted in a gas vessel; the counters, of square section, are assembled with Aluminium grids electrically insulated and stacked together. This design provides two advantages: First, magnetron sputtering techniques can be used to coat B{sub 4}C films on planar substrates, and second, the neutron position along the anode wires can be measured by reading out individually the grid signals with fast shaping amplifiers followed by comparators. Unlike charge division localisation in linear PSDs, the individual readout of the grids allows operating the Multi-Grid at a low amplification gain, hence this detector is tolerant to mechanical defects and its production accessible to laboratories equipped with standard
Energy Technology Data Exchange (ETDEWEB)
Birch, J; Hultman, L; Höglund, C [Linköping University, Thin Film Physics Division, IFM, SE-581 83 Linköping (Sweden); Buffet, J-C; Clergeau, J-F; Correa, J; Van Esch, P; Ferraton, M; Guerard, B; Halbwachs, J; Khaplanov, A; Koza, M; Piscitelli, F; Zbiri, M [Institute Laue Langevin, Rue Jules Horowitz, FR-38000 Grenoble (France); Hall-Wilton, R [European Spallation Source ESS AB, P.O Box 176, SE-221 00 Lund (Sweden)
2014-07-24
A neutron detector concept based on solid layers of boron carbide enriched in {sup 10}B has been in development for the last few years as an alternative for {sup 3}He by collaboration between the ILL, ESS and Linköping University. This Multi-Grid detector uses layers of aluminum substrates coated with {sup 10}B{sub 4}C on both sides that are traversed by the incoming neutrons. Detection is achieved using a gas counter readout principle. By segmenting the substrate and using multiple anode wires, the detector is made inherently position sensitive. This development is aimed primarily at neutron scattering instruments with large detector areas, such as time-of-flight chopper spectrometers. The most recent prototype has been built to be interchangeable with the {sup 3}He detectors of IN6 at ILL. The {sup 10}B detector has an active area of 32 x 48cm{sup 2}. It was installed at the IN6 instrument and operated for several weeks, collecting data in parallel with the regularly scheduled experiments, thus providing the first side-by-side comparison with the conventional {sup 3}He detectors. Results include an efficiency comparison, assessment of the in-detector scattering contribution, sensitivity to gamma-rays and the signal-to-noise ratio in time-of-flight spectra. The good expected performance has been confirmed with the exception of an unexpected background count rate. This has been identified as natural alpha activity in aluminum. New convertor substrates are under study to eliminate this source of background.
Domain decomposition methods and parallel computing
International Nuclear Information System (INIS)
Meurant, G.
1991-01-01
In this paper, we show how to efficiently solve large linear systems on parallel computers. These linear systems arise from discretization of scientific computing problems described by systems of partial differential equations. We show how to get a discrete finite dimensional system from the continuous problem and the chosen conjugate gradient iterative algorithm is briefly described. Then, the different kinds of parallel architectures are reviewed and their advantages and deficiencies are emphasized. We sketch the problems found in programming the conjugate gradient method on parallel computers. For this algorithm to be efficient on parallel machines, domain decomposition techniques are introduced. We give results of numerical experiments showing that these techniques allow a good rate of convergence for the conjugate gradient algorithm as well as computational speeds in excess of a billion of floating point operations per second. (author). 5 refs., 11 figs., 2 tabs., 1 inset
Damage mapping in structural health monitoring using a multi-grid architecture
Energy Technology Data Exchange (ETDEWEB)
Mathews, V. John [Dept. of Electrical and Computer Engineering, University of Utah, Salt Lake City, UT 84112 (United States)
2015-03-31
This paper presents a multi-grid architecture for tomography-based damage mapping of composite aerospace structures. The system employs an array of piezo-electric transducers bonded on the structure. Each transducer may be used as an actuator as well as a sensor. The structure is excited sequentially using the actuators and the guided waves arriving at the sensors in response to the excitations are recorded for further analysis. The sensor signals are compared to their baseline counterparts and a damage index is computed for each actuator-sensor pair. These damage indices are then used as inputs to the tomographic reconstruction system. Preliminary damage maps are reconstructed on multiple coordinate grids defined on the structure. These grids are shifted versions of each other where the shift is a fraction of the spatial sampling interval associated with each grid. These preliminary damage maps are then combined to provide a reconstruction that is more robust to measurement noise in the sensor signals and the ill-conditioned problem formulation for single-grid algorithms. Experimental results on a composite structure with complexity that is representative of aerospace structures included in the paper demonstrate that for sufficiently high sensor densities, the algorithm of this paper is capable of providing damage detection and characterization with accuracy comparable to traditional C-scan and A-scan-based ultrasound non-destructive inspection systems quickly and without human supervision.
An application of multigrid methods for a discrete elastic model for epitaxial systems
International Nuclear Information System (INIS)
Caflisch, R.E.; Lee, Y.-J.; Shu, S.; Xiao, Y.-X.; Xu, J.
2006-01-01
We apply an efficient and fast algorithm to simulate the atomistic strain model for epitaxial systems, recently introduced by Schindler et al. [Phys. Rev. B 67, 075316 (2003)]. The discrete effects in this lattice statics model are crucial for proper simulation of the influence of strain for thin film epitaxial growth, but the size of the atomistic systems of interest is in general quite large and hence the solution of the discrete elastic equations is a considerable numerical challenge. In this paper, we construct an algebraic multigrid method suitable for efficient solution of the large scale discrete strain model. Using this method, simulations are performed for several representative physical problems, including an infinite periodic step train, a layered nanocrystal, and a system of quantum dots. The results demonstrate the effectiveness and robustness of the method and show that the method attains optimal convergence properties, regardless of the problem size, the geometry and the physical parameters. The effects of substrate depth and of invariance due to traction-free boundary conditions are assessed. For a system of quantum dots, the simulated strain energy density supports the observations that trench formation near the dots provides strain relief
An implicit multigrid algorithm for computing hypersonic, chemically reacting viscous flows
International Nuclear Information System (INIS)
Edwards, J.R.
1996-01-01
An implicit algorithm for computing viscous flows in chemical nonequilibrium is presented. Emphasis is placed on the numerical efficiency of the time integration scheme, both in terms of periteration workload and overall convergence rate. In this context, several techniques are introduced, including a stable, O(m 2 ) approximate factorization of the chemical source Jacobian and implementations of V-cycle and filtered multigrid acceleration methods. A five species-seventeen reaction air model is used to calculate hypersonic viscous flow over a cylinder at conditions corresponding to flight at 5 km/s, 60 km altitude and at 11.36 km/s, 76.42 km altitude. Inviscid calculations using an eleven-species reaction mechanism including ionization are presented for a case involving 11.37 km/s flow at an altitude of 84.6 km. Comparisons among various options for the implicit treatment of the chemical source terms and among different multilevel approaches for convergence acceleration are presented for all simulations
Multigrid solution of incompressible turbulent flows by using two-equation turbulence models
Energy Technology Data Exchange (ETDEWEB)
Zheng, X.; Liu, C. [Front Range Scientific Computations, Inc., Denver, CO (United States); Sung, C.H. [David Taylor Model Basin, Bethesda, MD (United States)
1996-12-31
Most of practical flows are turbulent. From the interest of engineering applications, simulation of realistic flows is usually done through solution of Reynolds-averaged Navier-Stokes equations and turbulence model equations. It has been widely accepted that turbulence modeling plays a very important role in numerical simulation of practical flow problem, particularly when the accuracy is of great concern. Among the most used turbulence models today, two-equation models appear to be favored for the reason that they are more general than algebraic models and affordable with current available computer resources. However, investigators using two-equation models seem to have been more concerned with the solution of N-S equations. Less attention is paid to the solution method for the turbulence model equations. In most cases, the turbulence model equations are loosely coupled with N-S equations, multigrid acceleration is only applied to the solution of N-S equations due to perhaps the fact the turbulence model equations are source-term dominant and very stiff in sublayer region.
Morse, H Stephen
1994-01-01
Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi
Akl, Selim G
1985-01-01
Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms, such as linear arrays, mesh-connected computers, cube-connected computers. Another example where algorithm can be applied is on the shared-memory SIMD (single instruction stream multiple data stream) computers in which the whole sequence to be sorted can fit in the
International Nuclear Information System (INIS)
Acharya, B.S.; Douglas, M.R.
2006-06-01
We present evidence that the number of string/M theory vacua consistent with experiments is finite. We do this both by explicit analysis of infinite sequences of vacua and by applying various mathematical finiteness theorems. (author)
Nilpotent -local finite groups
Cantarero, José; Scherer, Jérôme; Viruel, Antonio
2014-10-01
We provide characterizations of -nilpotency for fusion systems and -local finite groups that are inspired by known result for finite groups. In particular, we generalize criteria by Atiyah, Brunetti, Frobenius, Quillen, Stammbach and Tate.
International Nuclear Information System (INIS)
Lee, Byeong Hae
1992-02-01
This book gives descriptions of basic finite element method, which includes basic finite element method and data, black box, writing of data, definition of VECTOR, definition of matrix, matrix and multiplication of matrix, addition of matrix, and unit matrix, conception of hardness matrix like spring power and displacement, governed equation of an elastic body, finite element method, Fortran method and programming such as composition of computer, order of programming and data card and Fortran card, finite element program and application of nonelastic problem.
Alabdulmohsin, Ibrahim M.
2018-01-01
In this chapter, we extend the previous results of Chap. 2 to the more general case of composite finite sums. We describe what composite finite sums are and how their analysis can be reduced to the analysis of simple finite sums using the chain rule. We apply these techniques, next, on numerical integration and on some identities of Ramanujan.
Alabdulmohsin, Ibrahim M.
2018-03-07
In this chapter, we extend the previous results of Chap. 2 to the more general case of composite finite sums. We describe what composite finite sums are and how their analysis can be reduced to the analysis of simple finite sums using the chain rule. We apply these techniques, next, on numerical integration and on some identities of Ramanujan.
Introduction to parallel programming
Brawer, Steven
1989-01-01
Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race
Fox, Geoffrey C; Messina, Guiseppe C
2014-01-01
A clear illustration of how parallel computers can be successfully appliedto large-scale scientific computations. This book demonstrates how avariety of applications in physics, biology, mathematics and other scienceswere implemented on real parallel computers to produce new scientificresults. It investigates issues of fine-grained parallelism relevant forfuture supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configuredifferent massively parallel machines, design and implement basic systemsoftware, and develop
Multi-grid Beam and Warming scheme for the simulation of unsteady ...
African Journals Online (AJOL)
2010-03-08
Mar 8, 2010 ... cal methods, the implicit finite-difference method and finite element method ...... there is no explicit matrix G and there are merely a number of block matrices, the .... is usually used as an input function for flood routing analysis.
Hsieh, Shang-Hsien
1993-01-01
The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.
International Nuclear Information System (INIS)
Baker, K.L.
2005-01-01
This article details a multigrid algorithm that is suitable for least-squares wave-front reconstruction of Shack-Hartmann and shearing interferometer wave-front sensors. The algorithm detailed in this article is shown to scale with the number of subapertures in the same fashion as fast Fourier transform techniques, making it suitable for use in applications requiring a large number of subapertures and high Strehl ratio systems such as for high spatial frequency characterization of high-density plasmas, optics metrology, and multiconjugate and extreme adaptive optics systems
Parallel Atomistic Simulations
Energy Technology Data Exchange (ETDEWEB)
HEFFELFINGER,GRANT S.
2000-01-18
Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
Application Of Multi-grid Method On China Seas' Temperature Forecast
Li, W.; Xie, Y.; He, Z.; Liu, K.; Han, G.; Ma, J.; Li, D.
2006-12-01
Correlation scales have been used in traditional scheme of 3-dimensional variational (3D-Var) data assimilation to estimate the background error covariance for the numerical forecast and reanalysis of atmosphere and ocean for decades. However there are still some drawbacks of this scheme. First, the correlation scales are difficult to be determined accurately. Second, the positive definition of the first-guess error covariance matrix cannot be guaranteed unless the correlation scales are sufficiently small. Xie et al. (2005) indicated that a traditional 3D-Var only corrects some certain wavelength errors and its accuracy depends on the accuracy of the first-guess covariance. And in general, short wavelength error can not be well corrected until long one is corrected and then inaccurate first-guess covariance may mistakenly take long wave error as short wave ones and result in erroneous analysis. For the purpose of quickly minimizing the errors of long and short waves successively, a new 3D-Var data assimilation scheme, called multi-grid data assimilation scheme, is proposed in this paper. By assimilating the shipboard SST and temperature profiles data into a numerical model of China Seas, we applied this scheme in two-month data assimilation and forecast experiment which ended in a favorable result. Comparing with the traditional scheme of 3D-Var, the new scheme has higher forecast accuracy and a lower forecast Root-Mean-Square (RMS) error. Furthermore, this scheme was applied to assimilate the SST of shipboard, AVHRR Pathfinder Version 5.0 SST and temperature profiles at the same time, and a ten-month forecast experiment on sea temperature of China Seas was carried out, in which a successful forecast result was obtained. Particularly, the new scheme is demonstrated a great numerical efficiency in these analyses.
Whiteley, J. P.
2017-10-01
Large, incompressible elastic deformations are governed by a system of nonlinear partial differential equations. The finite element discretisation of these partial differential equations yields a system of nonlinear algebraic equations that are usually solved using Newton's method. On each iteration of Newton's method, a linear system must be solved. We exploit the structure of the Jacobian matrix to propose a preconditioner, comprising two steps. The first step is the solution of a relatively small, symmetric, positive definite linear system using the preconditioned conjugate gradient method. This is followed by a small number of multigrid V-cycles for a larger linear system. Through the use of exemplar elastic deformations, the preconditioner is demonstrated to facilitate the iterative solution of the linear systems arising. The number of GMRES iterations required has only a very weak dependence on the number of degrees of freedom of the linear systems.
Parallel processing of structural integrity analysis codes
International Nuclear Information System (INIS)
Swami Prasad, P.; Dutta, B.K.; Kushwaha, H.S.
1996-01-01
Structural integrity analysis forms an important role in assessing and demonstrating the safety of nuclear reactor components. This analysis is performed using analytical tools such as Finite Element Method (FEM) with the help of digital computers. The complexity of the problems involved in nuclear engineering demands high speed computation facilities to obtain solutions in reasonable amount of time. Parallel processing systems such as ANUPAM provide an efficient platform for realising the high speed computation. The development and implementation of software on parallel processing systems is an interesting and challenging task. The data and algorithm structure of the codes plays an important role in exploiting the parallel processing system capabilities. Structural analysis codes based on FEM can be divided into two categories with respect to their implementation on parallel processing systems. The first category codes such as those used for harmonic analysis, mechanistic fuel performance codes need not require the parallelisation of individual modules of the codes. The second category of codes such as conventional FEM codes require parallelisation of individual modules. In this category, parallelisation of equation solution module poses major difficulties. Different solution schemes such as domain decomposition method (DDM), parallel active column solver and substructuring method are currently used on parallel processing systems. Two codes, FAIR and TABS belonging to each of these categories have been implemented on ANUPAM. The implementation details of these codes and the performance of different equation solvers are highlighted. (author). 5 refs., 12 figs., 1 tab
A scalable approach to modeling groundwater flow on massively parallel computers
International Nuclear Information System (INIS)
Ashby, S.F.; Falgout, R.D.; Tompson, A.F.B.
1995-12-01
We describe a fully scalable approach to the simulation of groundwater flow on a hierarchy of computing platforms, ranging from workstations to massively parallel computers. Specifically, we advocate the use of scalable conceptual models in which the subsurface model is defined independently of the computational grid on which the simulation takes place. We also describe a scalable multigrid algorithm for computing the groundwater flow velocities. We axe thus able to leverage both the engineer's time spent developing the conceptual model and the computing resources used in the numerical simulation. We have successfully employed this approach at the LLNL site, where we have run simulations ranging in size from just a few thousand spatial zones (on workstations) to more than eight million spatial zones (on the CRAY T3D)-all using the same conceptual model
CERN. Geneva
2016-01-01
The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...
Parallelism in matrix computations
Gallopoulos, Efstratios; Sameh, Ahmed H
2016-01-01
This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded ,Vandermonde ,Toeplitz ,and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...
Fractional finite Fourier transform.
Khare, Kedar; George, Nicholas
2004-07-01
We show that a fractional version of the finite Fourier transform may be defined by using prolate spheroidal wave functions of order zero. The transform is linear and additive in its index and asymptotically goes over to Namias's definition of the fractional Fourier transform. As a special case of this definition, it is shown that the finite Fourier transform may be inverted by using information over a finite range of frequencies in Fourier space, the inversion being sensitive to noise. Numerical illustrations for both forward (fractional) and inverse finite transforms are provided.
International Nuclear Information System (INIS)
Lucha, W.; Neufeld, H.
1986-01-01
We investigate the relation between finiteness of a four-dimensional quantum field theory and global supersymmetry. To this end we consider the most general quantum field theory and analyse the finiteness conditions resulting from the requirement of the absence of divergent contributions to the renormalizations of the parameters of the theory. In addition to the gauge bosons, both fermions and scalar bosons turn out to be a necessary ingredient in a non-trivial finite gauge theory. In all cases discussed, the supersymmetric theory restricted by two well-known constraints on the dimensionless couplings proves to be the unique solution of the finiteness conditions. (Author)
DEFF Research Database (Denmark)
Sitchinava, Nodar; Zeh, Norbert
2012-01-01
We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of...... in the optimal OhOf(psortN + K/PB) parallel I/O complexity, where K is the size of the output reported in the process and psortN is the parallel I/O complexity of sorting N elements using P processors....
Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole
2012-07-01
Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.
Parallel Algorithms and Patterns
Energy Technology Data Exchange (ETDEWEB)
Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
2016-06-16
This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.
Application Portable Parallel Library
Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott
1995-01-01
Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
Parallel linear solvers for simulations of reactor thermal hydraulics
International Nuclear Information System (INIS)
Yan, Y.; Antal, S.P.; Edge, B.; Keyes, D.E.; Shaver, D.; Bolotnov, I.A.; Podowski, M.Z.
2011-01-01
The state-of-the-art multiphase fluid dynamics code, NPHASE-CMFD, performs multiphase flow simulations in complex domains using implicit nonlinear treatment of the governing equations and in parallel, which is a very challenging environment for the linear solver. The present work illustrates how the Portable, Extensible Toolkit for Scientific Computation (PETSc) and scalable Algebraic Multigrid (AMG) preconditioner from Hypre can be utilized to construct robust and scalable linear solvers for the Newton correction equation obtained from the discretized system of governing conservation equations in NPHASE-CMFD. The overall long-tem objective of this work is to extend the NPHASE-CMFD code into a fully-scalable solver of multiphase flow and heat transfer problems, applicable to both steady-state and stiff time-dependent phenomena in complete fuel assemblies of nuclear reactors and, eventually, the entire reactor core (such as the Virtual Reactor concept envisioned by CASL). This campaign appropriately begins with the linear algebraic equation solver, which is traditionally a bottleneck to scalability in PDE-based codes. The computational complexity of the solver is usually superlinear in problem size, whereas the rest of the code, the “physics” portion, usually has its complexity linear in the problem size. (author)
Parallel state transfer and efficient quantum routing on quantum networks.
Chudzicki, Christopher; Strauch, Frederick W
2010-12-31
We study the routing of quantum information in parallel on multidimensional networks of tunable qubits and oscillators. These theoretical models are inspired by recent experiments in superconducting circuits. We show that perfect parallel state transfer is possible for certain networks of harmonic oscillator modes. We extend this to the distribution of entanglement between every pair of nodes in the network, finding that the routing efficiency of hypercube networks is optimal and robust in the presence of dissipation and finite bandwidth.
Parallel discrete event simulation
Overeinder, B.J.; Hertzberger, L.O.; Sloot, P.M.A.; Withagen, W.J.
1991-01-01
In simulating applications for execution on specific computing systems, the simulation performance figures must be known in a short period of time. One basic approach to the problem of reducing the required simulation time is the exploitation of parallelism. However, in parallelizing the simulation
Parallel reservoir simulator computations
International Nuclear Information System (INIS)
Hemanth-Kumar, K.; Young, L.C.
1995-01-01
The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and relative to scalar calculations it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively on the CRAY C-90
Sman, van der R.G.M.
2006-01-01
In the special case of relaxation parameter = 1 lattice Boltzmann schemes for (convection) diffusion and fluid flow are equivalent to finite difference/volume (FD) schemes, and are thus coined finite Boltzmann (FB) schemes. We show that the equivalence is inherent to the homology of the
1996-01-01
Designs and Finite Geometries brings together in one place important contributions and up-to-date research results in this important area of mathematics. Designs and Finite Geometries serves as an excellent reference, providing insight into some of the most important research issues in the field.
Supersymmetric theories and finiteness
International Nuclear Information System (INIS)
Helayel-Neto, J.A.
1989-01-01
We attempt here to present a short survey of the all-order finite Lagrangian field theories known at present in four-and two-dimensional space-times. The question of the possible relevance of these ultraviolet finite models in the formulation of consistent unified frameworks for the fundamental forces is also addressed to. (author)
International Nuclear Information System (INIS)
Wu, Vincent W.C.; Tse, Teddy K.H.; Ho, Cola L.M.; Yeung, Eric C.Y.
2013-01-01
Monte Carlo (MC) simulation is currently the most accurate dose calculation algorithm in radiotherapy planning but requires relatively long processing time. Faster model-based algorithms such as the anisotropic analytical algorithm (AAA) by the Eclipse treatment planning system and multigrid superposition (MGS) by the XiO treatment planning system are 2 commonly used algorithms. This study compared AAA and MGS against MC, as the gold standard, on brain, nasopharynx, lung, and prostate cancer patients. Computed tomography of 6 patients of each cancer type was used. The same hypothetical treatment plan using the same machine and treatment prescription was computed for each case by each planning system using their respective dose calculation algorithm. The doses at reference points including (1) soft tissues only, (2) bones only, (3) air cavities only, (4) soft tissue-bone boundary (Soft/Bone), (5) soft tissue-air boundary (Soft/Air), and (6) bone-air boundary (Bone/Air), were measured and compared using the mean absolute percentage error (MAPE), which was a function of the percentage dose deviations from MC. Besides, the computation time of each treatment plan was recorded and compared. The MAPEs of MGS were significantly lower than AAA in all types of cancers (p<0.001). With regards to body density combinations, the MAPE of AAA ranged from 1.8% (soft tissue) to 4.9% (Bone/Air), whereas that of MGS from 1.6% (air cavities) to 2.9% (Soft/Bone). The MAPEs of MGS (2.6%±2.1) were significantly lower than that of AAA (3.7%±2.5) in all tissue density combinations (p<0.001). The mean computation time of AAA for all treatment plans was significantly lower than that of the MGS (p<0.001). Both AAA and MGS algorithms demonstrated dose deviations of less than 4.0% in most clinical cases and their performance was better in homogeneous tissues than at tissue boundaries. In general, MGS demonstrated relatively smaller dose deviations than AAA but required longer computation time
Energy Technology Data Exchange (ETDEWEB)
1991-10-23
An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
Massively parallel mathematical sieves
Energy Technology Data Exchange (ETDEWEB)
Montry, G.R.
1989-01-01
The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
Alabdulmohsin, Ibrahim M.
2018-03-07
We will begin our treatment of summability calculus by analyzing what will be referred to, throughout this book, as simple finite sums. Even though the results of this chapter are particular cases of the more general results presented in later chapters, they are important to start with for a few reasons. First, this chapter serves as an excellent introduction to what summability calculus can markedly accomplish. Second, simple finite sums are encountered more often and, hence, they deserve special treatment. Third, the results presented in this chapter for simple finite sums will, themselves, be used as building blocks for deriving the most general results in subsequent chapters. Among others, we establish that fractional finite sums are well-defined mathematical objects and show how various identities related to the Euler constant as well as the Riemann zeta function can actually be derived in an elementary manner using fractional finite sums.
Alabdulmohsin, Ibrahim M.
2018-01-01
We will begin our treatment of summability calculus by analyzing what will be referred to, throughout this book, as simple finite sums. Even though the results of this chapter are particular cases of the more general results presented in later chapters, they are important to start with for a few reasons. First, this chapter serves as an excellent introduction to what summability calculus can markedly accomplish. Second, simple finite sums are encountered more often and, hence, they deserve special treatment. Third, the results presented in this chapter for simple finite sums will, themselves, be used as building blocks for deriving the most general results in subsequent chapters. Among others, we establish that fractional finite sums are well-defined mathematical objects and show how various identities related to the Euler constant as well as the Riemann zeta function can actually be derived in an elementary manner using fractional finite sums.
Finite fields and applications
Mullen, Gary L
2007-01-01
This book provides a brief and accessible introduction to the theory of finite fields and to some of their many fascinating and practical applications. The first chapter is devoted to the theory of finite fields. After covering their construction and elementary properties, the authors discuss the trace and norm functions, bases for finite fields, and properties of polynomials over finite fields. Each of the remaining chapters details applications. Chapter 2 deals with combinatorial topics such as the construction of sets of orthogonal latin squares, affine and projective planes, block designs, and Hadamard matrices. Chapters 3 and 4 provide a number of constructions and basic properties of error-correcting codes and cryptographic systems using finite fields. Each chapter includes a set of exercises of varying levels of difficulty which help to further explain and motivate the material. Appendix A provides a brief review of the basic number theory and abstract algebra used in the text, as well as exercises rel...
Optimization of Finite-Differencing Kernels for Numerical Relativity Applications
Directory of Open Access Journals (Sweden)
Roberto Alfieri
2018-05-01
Full Text Available A simple optimization strategy for the computation of 3D finite-differencing kernels on many-cores architectures is proposed. The 3D finite-differencing computation is split direction-by-direction and exploits two level of parallelism: in-core vectorization and multi-threads shared-memory parallelization. The main application of this method is to accelerate the high-order stencil computations in numerical relativity codes. Our proposed method provides substantial speedup in computations involving tensor contractions and 3D stencil calculations on different processor microarchitectures, including Intel Knight Landing.
A Simple Method for Static Load Balancing of Parallel FDTD Codes
DEFF Research Database (Denmark)
Franek, Ondrej
2016-01-01
A static method for balancing computational loads in parallel implementations of the finite-difference timedomain method is presented. The procedure is fairly straightforward and computationally inexpensive, thus providing an attractive alternative to optimization techniques. The method is descri...
Finite elements and approximation
Zienkiewicz, O C
2006-01-01
A powerful tool for the approximate solution of differential equations, the finite element is extensively used in industry and research. This book offers students of engineering and physics a comprehensive view of the principles involved, with numerous illustrative examples and exercises.Starting with continuum boundary value problems and the need for numerical discretization, the text examines finite difference methods, weighted residual methods in the context of continuous trial functions, and piecewise defined trial functions and the finite element method. Additional topics include higher o
Costiner, Sorin; Taasan, Shlomo
1994-01-01
This paper presents multigrid (MG) techniques for nonlinear eigenvalue problems (EP) and emphasizes an MG algorithm for a nonlinear Schrodinger EP. The algorithm overcomes the mentioned difficulties combining the following techniques: an MG projection coupled with backrotations for separation of solutions and treatment of difficulties related to clusters of close and equal eigenvalues; MG subspace continuation techniques for treatment of the nonlinearity; an MG simultaneous treatment of the eigenvectors at the same time with the nonlinearity and with the global constraints. The simultaneous MG techniques reduce the large number of self consistent iterations to only a few or one MG simultaneous iteration and keep the solutions in a right neighborhood where the algorithm converges fast.
Algorithms for parallel computers
International Nuclear Information System (INIS)
Churchhouse, R.F.
1985-01-01
Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed but it also raises some fundamental questions, including: (i) which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode. (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria. (iii) How can we design new algorithms specifically for parallel systems. (iv) For multi-processor systems how can we handle the software aspects of the interprocessor communications. Aspects of these questions illustrated by examples are considered in these lectures. (orig.)
Parallelism and array processing
International Nuclear Information System (INIS)
Zacharov, V.
1983-01-01
Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and that of applications that can exploit such systems. The lectures seek to discuss parallelism in a historical context, and to identify all the main aspects of concurrency in computation right up to the present time. Included will be consideration of the important question as to what use parallelism might be in the field of data processing. (orig.)
A non-linear multigrid method for the steady Euler equations
Hemker, P.W.; Koren, B.; Dervieux, A.; Leer, van B.; Periaux, J.; Rizzi, A.
1989-01-01
Higher-order accurate Euler-flow solutions are presented for some airfoil test cases. Second-order accurate solutions are computed by an Iterative Defect Correction process. For two test cases even higher accuracy is obtained by the additional use of a ~xtrapolation technique. Finite volume
Jönsthövel, T.B.; Van Gijzen, M.B.; MacLachlan, S.; Vuik, C.; Scarpas, A.
2012-01-01
Many applications in computational science and engineering concern composite materials, which are characterized by large discontinuities in the material properties. Such applications require fine-scale finite-element meshes, which lead to large linear systems that are challenging to solve with
Sharp asymptotics for stochastic dynamics with parallel updating rule
Nardi, F.R.; Spitoni, C.
2012-01-01
In this paper we study the metastability problem for a stochastic dynamics with a parallel updating rule; in particular we consider a finite volume Probabilistic Cellular Automaton (PCA) in a small external field at low temperature regime. We are interested in the nucleation of the system, i.e., the
Sharp Asymptotics for Stochastic Dynamics with Parallel Updating Rule
Nardi, F.R.; Spitoni, C.
2012-01-01
In this paper we study the metastability problem for a stochastic dynamics with a parallel updating rule; in particular we consider a finite volume Probabilistic Cellular Automaton (PCA) in a small external field at low temperature regime. We are interested in the nucleation of the system, i.e.,
Locating an axis-parallel rectangle on a Manhattan plane
DEFF Research Database (Denmark)
Brimberg, Jack; Juel, Henrik; Körner, Mark-Christoph
2014-01-01
In this paper we consider the problem of locating an axis-parallel rectangle in the plane such that the sum of distances between the rectangle and a finite point set is minimized, where the distance is measured by the Manhattan norm 1. In this way we solve an extension of the Weber problem...
Optimising a parallel conjugate gradient solver
Energy Technology Data Exchange (ETDEWEB)
Field, M.R. [O`Reilly Institute, Dublin (Ireland)
1996-12-31
This work arises from the introduction of a parallel iterative solver to a large structural analysis finite element code. The code is called FEX and it was developed at Hitachi`s Mechanical Engineering Laboratory. The FEX package can deal with a large range of structural analysis problems using a large number of finite element techniques. FEX can solve either stress or thermal analysis problems of a range of different types from plane stress to a full three-dimensional model. These problems can consist of a number of different materials which can be modelled by a range of material models. The structure being modelled can have the load applied at either a point or a surface, or by a pressure, a centrifugal force or just gravity. Alternatively a thermal load can be applied with a given initial temperature. The displacement of the structure can be constrained by having a fixed boundary or by prescribing the displacement at a boundary.
Indian Academy of Sciences (India)
IAS Admin
wavelength, they are called shallow water waves. In the ... Deep and intermediate water waves are dispersive as the velocity of these depends on wavelength. This is not the ..... generation processes, the finite amplitude wave theories are very ...
Finite Discrete Gabor Analysis
DEFF Research Database (Denmark)
Søndergaard, Peter Lempel
2007-01-01
frequency bands at certain times. Gabor theory can be formulated for both functions on the real line and for discrete signals of finite length. The two theories are largely the same because many aspects come from the same underlying theory of locally compact Abelian groups. The two types of Gabor systems...... can also be related by sampling and periodization. This thesis extends on this theory by showing new results for window construction. It also provides a discussion of the problems associated to discrete Gabor bases. The sampling and periodization connection is handy because it allows Gabor systems...... on the real line to be well approximated by finite and discrete Gabor frames. This method of approximation is especially attractive because efficient numerical methods exists for doing computations with finite, discrete Gabor systems. This thesis presents new algorithms for the efficient computation of finite...
International Nuclear Information System (INIS)
Rittenberg, V.
1983-01-01
Fischer's finite-size scaling describes the cross over from the singular behaviour of thermodynamic quantities at the critical point to the analytic behaviour of the finite system. Recent extensions of the method--transfer matrix technique, and the Hamiltonian formalism--are discussed in this paper. The method is presented, with equations deriving scaling function, critical temperature, and exponent v. As an application of the method, a 3-states Hamiltonian with Z 3 global symmetry is studied. Diagonalization of the Hamiltonian for finite chains allows one to estimate the critical exponents, and also to discover new phase transitions at lower temperatures. The critical points lambda, and indices v estimated for finite-scaling are given
Parallel magnetic resonance imaging
International Nuclear Information System (INIS)
Larkman, David J; Nunes, Rita G
2007-01-01
Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts are shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed. (invited topical review)
Linear parallel processing machines I
Energy Technology Data Exchange (ETDEWEB)
Von Kunze, M
1984-01-01
As is well-known, non-context-free grammars for generating formal languages happen to be of a certain intrinsic computational power that presents serious difficulties to efficient parsing algorithms as well as for the development of an algebraic theory of contextsensitive languages. In this paper a framework is given for the investigation of the computational power of formal grammars, in order to start a thorough analysis of grammars consisting of derivation rules of the form aB ..-->.. A/sub 1/ ... A /sub n/ b/sub 1/...b /sub m/ . These grammars may be thought of as automata by means of parallel processing, if one considers the variables as operators acting on the terminals while reading them right-to-left. This kind of automata and their 2-dimensional programming language prove to be useful by allowing a concise linear-time algorithm for integer multiplication. Linear parallel processing machines (LP-machines) which are, in their general form, equivalent to Turing machines, include finite automata and pushdown automata (with states encoded) as special cases. Bounded LP-machines yield deterministic accepting automata for nondeterministic contextfree languages, and they define an interesting class of contextsensitive languages. A characterization of this class in terms of generating grammars is established by using derivation trees with crossings as a helpful tool. From the algebraic point of view, deterministic LP-machines are effectively represented semigroups with distinguished subsets. Concerning the dualism between generating and accepting devices of formal languages within the algebraic setting, the concept of accepting automata turns out to reduce essentially to embeddability in an effectively represented extension monoid, even in the classical cases.
Energy Technology Data Exchange (ETDEWEB)
Lober, R.R.; Tautges, T.J.; Vaughan, C.T.
1997-03-01
Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving`s meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia`s {open_quotes}tiling{close_quotes} dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.
Supersymmetry at finite temperature
International Nuclear Information System (INIS)
Clark, T.E.; Love, S.T.
1983-01-01
Finite-temperature supersymmetry (SUSY) is characterized by unbroken Ward identities for SUSY variations of ensemble averages of Klein-operator inserted imaginary time-ordered products of fields. Path-integral representations of these products are defined and the Feynman rules in superspace are given. The finite-temperature no-renormalization theorem is derived. Spontaneously broken SUSY at zero temperature is shown not to be restored at high temperature. (orig.)
Peridynamic Multiscale Finite Element Methods
Energy Technology Data Exchange (ETDEWEB)
Costa, Timothy [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Bond, Stephen D. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Littlewood, David John [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Moore, Stan Gerald [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
2015-12-01
art of local models with the flexibility and accuracy of the nonlocal peridynamic model. In the mixed locality method this coupling occurs across scales, so that the nonlocal model can be used to communicate material heterogeneity at scales inappropriate to local partial differential equation models. Additionally, the computational burden of the weak form of the peridynamic model is reduced dramatically by only requiring that the model be solved on local patches of the simulation domain which may be computed in parallel, taking advantage of the heterogeneous nature of next generation computing platforms. Addition- ally, we present a novel Galerkin framework, the 'Ambulant Galerkin Method', which represents a first step towards a unified mathematical analysis of local and nonlocal multiscale finite element methods, and whose future extension will allow the analysis of multiscale finite element methods that mix models across scales under certain assumptions of the consistency of those models.
Lattice gauge theory using parallel processors
International Nuclear Information System (INIS)
Lee, T.D.; Chou, K.C.; Zichichi, A.
1987-01-01
The book's contents include: Lattice Gauge Theory Lectures: Introduction and Current Fermion Simulations; Monte Carlo Algorithms for Lattice Gauge Theory; Specialized Computers for Lattice Gauge Theory; Lattice Gauge Theory at Finite Temperature: A Monte Carlo Study; Computational Method - An Elementary Introduction to the Langevin Equation, Present Status of Numerical Quantum Chromodynamics; Random Lattice Field Theory; The GF11 Processor and Compiler; and The APE Computer and First Physics Results; Columbia Supercomputer Project: Parallel Supercomputer for Lattice QCD; Statistical and Systematic Errors in Numerical Simulations; Monte Carlo Simulation for LGT and Programming Techniques on the Columbia Supercomputer; Food for Thought: Five Lectures on Lattice Gauge Theory
Use of massively parallel computing to improve modelling accuracy within the nuclear sector
Directory of Open Access Journals (Sweden)
L M Evans
2016-06-01
This work presents recent advancements in three techniques: Uncertainty quantification (UQ; Cellular automata finite element (CAFE; Image based finite element methods (IBFEM. Case studies are presented demonstrating their suitability for use in nuclear engineering made possible by advancements in parallel computing hardware that is projected to be available for industry within the next decade costing of the order of $100k.
Run-Time and Compiler Support for Programming in Adaptive Parallel Environments
Directory of Open Access Journals (Sweden)
Guy Edjlali
1997-01-01
Full Text Available For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at run-time. In this article, we discuss run-time support for data-parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a run-time library to provide this support. We discuss how the run-time library can be used by compilers of high-performance Fortran (HPF-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver and a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of nondedicated workstations, which are likely to be an important resource for parallel programming in the future.
The STAPL Parallel Graph Library
Harshvardhan,; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence
2013-01-01
This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable
Design, development and use of the finite element machine
Adams, L. M.; Voigt, R. C.
1983-01-01
Some of the considerations that went into the design of the Finite Element Machine, a research asynchronous parallel computer are described. The present status of the system is also discussed along with some indication of the type of results that were obtained.
Massively parallel multicanonical simulations
Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard
2018-03-01
Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 104 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
SPINning parallel systems software
International Nuclear Information System (INIS)
Matlin, O.S.; Lusk, E.; McCune, W.
2002-01-01
We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin
Parallel programming with Python
Palach, Jan
2014-01-01
A fast, easy-to-follow and clear tutorial to help you develop Parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you in implementing these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most of this book.
Parallel algorithms for 2-D cylindrical transport equations of Eigenvalue problem
International Nuclear Information System (INIS)
Wei, J.; Yang, S.
2013-01-01
In this paper, aimed at the neutron transport equations of eigenvalue problem under 2-D cylindrical geometry on unstructured grid, the discrete scheme of Sn discrete ordinate and discontinuous finite is built, and the parallel computation for the scheme is realized on MPI systems. Numerical experiments indicate that the designed parallel algorithm can reach perfect speedup, it has good practicality and scalability. (authors)
Solving the Flood Propagation Problem with Newton Algorithm on Parallel Systems
Directory of Open Access Journals (Sweden)
Chefi Triki
2012-04-01
Full Text Available In this paper we propose a parallel implementation for the flood propagation method Flo2DH. The model is built on a finite element spatial approximation combined with a Newton algorithm that uses a direct LU linear solver. The parallel implementation has been developed by using the standard MPI protocol and has been tested on a set of real world problems.
International Nuclear Information System (INIS)
Feinsilver, Philip; Schott, Rene
2009-01-01
We discuss topics related to finite-dimensional calculus in the context of finite-dimensional quantum mechanics. The truncated Heisenberg-Weyl algebra is called a TAA algebra after Tekin, Aydin and Arik who formulated it in terms of orthofermions. It is shown how to use a matrix approach to implement analytic representations of the Heisenberg-Weyl algebra in univariate and multivariate settings. We provide examples for the univariate case. Krawtchouk polynomials are presented in detail, including a review of Krawtchouk polynomials that illustrates some curious properties of the Heisenberg-Weyl algebra, as well as presenting an approach to computing Krawtchouk expansions. From a mathematical perspective, we are providing indications as to how to implement infinite terms Rota's 'finite operator calculus'.
Three-dimensional magnetic field computation on a distributed memory parallel processor
International Nuclear Information System (INIS)
Barion, M.L.
1990-01-01
The analysis of three-dimensional magnetic fields by finite element methods frequently proves too onerous a task for the computing resource on which it is attempted. When non-linear and transient effects are included, it may become impossible to calculate the field distribution to sufficient resolution. One approach to this problem is to exploit the natural parallelism in the finite element method via parallel processing. This paper reports on an implementation of a finite element code for non-linear three-dimensional low-frequency magnetic field calculation on Intel's iPSC/2
International Nuclear Information System (INIS)
Li Hanyu; Zhou Haijing; Dong Zhiwei; Liao Cheng; Chang Lei; Cao Xiaolin; Xiao Li
2010-01-01
A large-scale parallel electromagnetic field simulation program JEMS-FDTD(J Electromagnetic Solver-Finite Difference Time Domain) is designed and implemented on JASMIN (J parallel Adaptive Structured Mesh applications INfrastructure). This program can simulate propagation, radiation, couple of electromagnetic field by solving Maxwell equations on structured mesh explicitly with FDTD method. JEMS-FDTD is able to simulate billion-mesh-scale problems on thousands of processors. In this article, the program is verified by simulating the radiation of an electric dipole. A beam waveguide is simulated to demonstrate the capability of large scale parallel computation. A parallel performance test indicates that a high parallel efficiency is obtained. (authors)
Finite temperature field theory
Das, Ashok
1997-01-01
This book discusses all three formalisms used in the study of finite temperature field theory, namely the imaginary time formalism, the closed time formalism and thermofield dynamics. Applications of the formalisms are worked out in detail. Gauge field theories and symmetry restoration at finite temperature are among the practical examples discussed in depth. The question of gauge dependence of the effective potential and the Nielsen identities are explained. The nonrestoration of some symmetries at high temperature (such as supersymmetry) and theories on nonsimply connected space-times are al
International Nuclear Information System (INIS)
Wachspress, E.
2009-01-01
Triangles and rectangles are the ubiquitous elements in finite element studies. Only these elements admit polynomial basis functions. Rational functions provide a basis for elements having any number of straight and curved sides. Numerical complexities initially associated with rational bases precluded extensive use. Recent analysis has reduced these difficulties and programs have been written to illustrate effectiveness. Although incorporation in major finite element software requires considerable effort, there are advantages in some applications which warrant implementation. An outline of the basic theory and of recent innovations is presented here. (authors)
Expressing Parallelism with ROOT
Energy Technology Data Exchange (ETDEWEB)
Piparo, D. [CERN; Tejedor, E. [CERN; Guiraud, E. [CERN; Ganis, G. [CERN; Mato, P. [CERN; Moneta, L. [CERN; Valls Pla, X. [CERN; Canal, P. [Fermilab
2017-11-22
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Expressing Parallelism with ROOT
Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.
2017-10-01
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Parallel Fast Legendre Transform
Alves de Inda, M.; Bisseling, R.H.; Maslen, D.K.
1998-01-01
We discuss a parallel implementation of a fast algorithm for the discrete polynomial Legendre transform We give an introduction to the DriscollHealy algorithm using polynomial arithmetic and present experimental results on the eciency and accuracy of our implementation The algorithms were
Practical parallel programming
Bauer, Barr E
2014-01-01
This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.
Parallel hierarchical radiosity rendering
Energy Technology Data Exchange (ETDEWEB)
Carter, Michael [Iowa State Univ., Ames, IA (United States)
1993-07-01
In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.
Parallel universes beguile science
2007-01-01
A staple of mind-bending science fiction, the possibility of multiple universes has long intrigued hard-nosed physicists, mathematicians and cosmologists too. We may not be able -- as least not yet -- to prove they exist, many serious scientists say, but there are plenty of reasons to think that parallel dimensions are more than figments of eggheaded imagination.
Energy Technology Data Exchange (ETDEWEB)
2017-04-04
A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
International Nuclear Information System (INIS)
Gardes, D.; Volkov, P.
1981-01-01
A 5x3cm 2 (timing only) and a 15x5cm 2 (timing and position) parallel plate avalanche counters (PPAC) are considered. The theory of operation and timing resolution is given. The measurement set-up and the curves of experimental results illustrate the possibilities of the two counters [fr
Parallel hierarchical global illumination
Energy Technology Data Exchange (ETDEWEB)
Snell, Quinn O. [Iowa State Univ., Ames, IA (United States)
1997-10-08
Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.
International Nuclear Information System (INIS)
Meszaros, A.
1984-05-01
In case the graviton has a very small non-zero mass, the existence of six additional massive gravitons with very big masses leads to a finite quantum gravity. There is an acausal behaviour on the scales that is determined by the masses of additional gravitons. (author)
Finite lattice extrapolation algorithms
International Nuclear Information System (INIS)
Henkel, M.; Schuetz, G.
1987-08-01
Two algorithms for sequence extrapolation, due to von den Broeck and Schwartz and Bulirsch and Stoer are reviewed and critically compared. Applications to three states and six states quantum chains and to the (2+1)D Ising model show that the algorithm of Bulirsch and Stoer is superior, in particular if only very few finite lattice data are available. (orig.)
Energy Technology Data Exchange (ETDEWEB)
Kapetanakis, D. (Technische Univ. Muenchen, Garching (Germany). Physik Dept.); Mondragon, M. (Technische Univ. Muenchen, Garching (Germany). Physik Dept.); Zoupanos, G. (National Technical Univ., Athens (Greece). Physics Dept.)
1993-09-01
We present phenomenologically viable SU(5) unified models which are finite to all orders before the spontaneous symmetry breaking. In the case of two models with three families the top quark mass is predicted to be 178.8 GeV. (orig.)
International Nuclear Information System (INIS)
Kapetanakis, D.; Mondragon, M.; Zoupanos, G.
1993-01-01
We present phenomenologically viable SU(5) unified models which are finite to all orders before the spontaneous symmetry breaking. In the case of two models with three families the top quark mass is predicted to be 178.8 GeV. (orig.)
International Nuclear Information System (INIS)
Kapetanakis, D.; Mondragon, M.
1993-01-01
It is shown how to obtain phenomenologically viable SU(5) unified models which are finite to all orders before the spontaneous symmetry breaking. A very interesting feature of the models with three families is that they predict the top quark mass to be around 178 GeV. 16 refs
Czech Academy of Sciences Publication Activity Database
Šorel, Michal; Šíma, Jiří
2004-01-01
Roč. 62, - (2004), s. 93-110 ISSN 0925-2312 R&D Projects: GA AV ČR IAB2030007; GA MŠk LN00A056 Keywords : radial basis function * neural network * finite automaton * Boolean circuit * computational power Subject RIV: BA - General Mathematics Impact factor: 0.641, year: 2004
Weiser, Martin
2016-01-01
All relevant implementation aspects of finite element methods are discussed in this book. The focus is on algorithms and data structures as well as on their concrete implementation. Theory is covered as far as it gives insight into the construction of algorithms. Throughout the exercises a complete FE-solver for scalar 2D problems will be implemented in Matlab/Octave.
International Nuclear Information System (INIS)
Baron, E.; Hauschildt, Peter H.
1998-01-01
We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000 - 300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard message passing interface (MPI) library calls and is fully portable between serial and parallel computers. copyright 1998 The American Astronomical Society
Finite Volume Element (FVE) discretization and multilevel solution of the axisymmetric heat equation
Litaker, Eric T.
1994-12-01
The axisymmetric heat equation, resulting from a point-source of heat applied to a metal block, is solved numerically; both iterative and multilevel solutions are computed in order to compare the two processes. The continuum problem is discretized in two stages: finite differences are used to discretize the time derivatives, resulting is a fully implicit backward time-stepping scheme, and the Finite Volume Element (FVE) method is used to discretize the spatial derivatives. The application of the FVE method to a problem in cylindrical coordinates is new, and results in stencils which are analyzed extensively. Several iteration schemes are considered, including both Jacobi and Gauss-Seidel; a thorough analysis of these schemes is done, using both the spectral radii of the iteration matrices and local mode analysis. Using this discretization, a Gauss-Seidel relaxation scheme is used to solve the heat equation iteratively. A multilevel solution process is then constructed, including the development of intergrid transfer and coarse grid operators. Local mode analysis is performed on the components of the amplification matrix, resulting in the two-level convergence factors for various combinations of the operators. A multilevel solution process is implemented by using multigrid V-cycles; the iterative and multilevel results are compared and discussed in detail. The computational savings resulting from the multilevel process are then discussed.
A parallel algorithm for transient solid dynamics simulations with contact detection
International Nuclear Information System (INIS)
Attaway, S.; Hendrickson, B.; Plimpton, S.; Gardner, D.; Vaughan, C.; Heinstein, M.; Peery, J.
1996-01-01
Solid dynamics simulations with Lagrangian finite elements are used to model a wide variety of problems, such as the calculation of impact damage to shipping containers for nuclear waste and the analysis of vehicular crashes. Using parallel computers for these simulations has been hindered by the difficulty of searching efficiently for material surface contacts in parallel. A new parallel algorithm for calculation of arbitrary material contacts in finite element simulations has been developed and implemented in the PRONTO3D transient solid dynamics code. This paper will explore some of the issues involved in developing efficient, portable, parallel finite element models for nonlinear transient solid dynamics simulations. The contact-detection problem poses interesting challenges for efficient implementation of a solid dynamics simulation on a parallel computer. The finite element mesh is typically partitioned so that each processor owns a localized region of the finite element mesh. This mesh partitioning is optimal for the finite element portion of the calculation since each processor must communicate only with the few connected neighboring processors that share boundaries with the decomposed mesh. However, contacts can occur between surfaces that may be owned by any two arbitrary processors. Hence, a global search across all processors is required at every time step to search for these contacts. Load-imbalance can become a problem since the finite element decomposition divides the volumetric mesh evenly across processors but typically leaves the surface elements unevenly distributed. In practice, these complications have been limiting factors in the performance and scalability of transient solid dynamics on massively parallel computers. In this paper the authors present a new parallel algorithm for contact detection that overcomes many of these limitations
Wald, Ingo; Ize, Santiago
2015-07-28
Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
Ultrascalable petaflop parallel supercomputer
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY
2010-07-20
A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
DEFF Research Database (Denmark)
Gregersen, Frans; Josephson, Olle; Kristoffersen, Gjert
of departure that English may be used in parallel with the various local, in this case Nordic, languages. As such, the book integrates the challenge of internationalization faced by any university with the wish to improve quality in research, education and administration based on the local language......Abstract [en] More parallel, please is the result of the work of an Inter-Nordic group of experts on language policy financed by the Nordic Council of Ministers 2014-17. The book presents all that is needed to plan, practice and revise a university language policy which takes as its point......(s). There are three layers in the text: First, you may read the extremely brief version of the in total 11 recommendations for best practice. Second, you may acquaint yourself with the extended version of the recommendations and finally, you may study the reasoning behind each of them. At the end of the text, we give...
PARALLEL MOVING MECHANICAL SYSTEMS
Directory of Open Access Journals (Sweden)
Florian Ion Tiberius Petrescu
2014-09-01
Full Text Available Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 Moving mechanical systems parallel structures are solid, fast, and accurate. Between parallel systems it is to be noticed Stewart platforms, as the oldest systems, fast, solid and precise. The work outlines a few main elements of Stewart platforms. Begin with the geometry platform, kinematic elements of it, and presented then and a few items of dynamics. Dynamic primary element on it means the determination mechanism kinetic energy of the entire Stewart platforms. It is then in a record tail cinematic mobile by a method dot matrix of rotation. If a structural mottoelement consists of two moving elements which translates relative, drive train and especially dynamic it is more convenient to represent the mottoelement as a single moving components. We have thus seven moving parts (the six motoelements or feet to which is added mobile platform 7 and one fixed.
Xyce parallel electronic simulator.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
Betchov, R
2012-01-01
Stability of Parallel Flows provides information pertinent to hydrodynamical stability. This book explores the stability problems that occur in various fields, including electronics, mechanics, oceanography, administration, economics, as well as naval and aeronautical engineering. Organized into two parts encompassing 10 chapters, this book starts with an overview of the general equations of a two-dimensional incompressible flow. This text then explores the stability of a laminar boundary layer and presents the equation of the inviscid approximation. Other chapters present the general equation
Algorithmically specialized parallel computers
Snyder, Lawrence; Gannon, Dennis B
1985-01-01
Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer.This book discusses the algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster
New Parallel Algorithms for Landscape Evolution Model
Jin, Y.; Zhang, H.; Shi, Y.
2017-12-01
Most landscape evolution models (LEM) developed in the last two decades solve the diffusion equation to simulate the transportation of surface sediments. This numerical approach is difficult to parallelize due to the computation of drainage area for each node, which needs huge amount of communication if run in parallel. In order to overcome this difficulty, we developed two parallel algorithms for LEM with a stream net. One algorithm handles the partition of grid with traditional methods and applies an efficient global reduction algorithm to do the computation of drainage areas and transport rates for the stream net; the other algorithm is based on a new partition algorithm, which partitions the nodes in catchments between processes first, and then partitions the cells according to the partition of nodes. Both methods focus on decreasing communication between processes and take the advantage of massive computing techniques, and numerical experiments show that they are both adequate to handle large scale problems with millions of cells. We implemented the two algorithms in our program based on the widely used finite element library deal.II, so that it can be easily coupled with ASPECT.
International Nuclear Information System (INIS)
Brown, P.; Chang, B.
1998-01-01
The linear Boltzmann transport equation (BTE) is an integro-differential equation arising in deterministic models of neutral and charged particle transport. In slab (one-dimensional Cartesian) geometry and certain higher-dimensional cases, Diffusion Synthetic Acceleration (DSA) is known to be an effective algorithm for the iterative solution of the discretized BTE. Fourier and asymptotic analyses have been applied to various idealizations (e.g., problems on infinite domains with constant coefficients) to obtain sharp bounds on the convergence rate of DSA in such cases. While DSA has been shown to be a highly effective acceleration (or preconditioning) technique in one-dimensional problems, it has been observed to be less effective in higher dimensions. This is due in part to the expense of solving the related diffusion linear system. We investigate here the effectiveness of a parallel semicoarsening multigrid (SMG) solution approach to DSA preconditioning in several three dimensional problems. In particular, we consider the algorithmic and implementation scalability of a parallel SMG-DSA preconditioner on several types of test problems
Strong interaction at finite temperature
Indian Academy of Sciences (India)
Quantum chromodynamics; finite temperature; chiral perturbation theory; QCD sum rules. PACS Nos 11.10. ..... at finite temperature. The self-energy diagrams of figure 2 modify it to ..... method of determination at present. Acknowledgement.
International Nuclear Information System (INIS)
Liu, Hao
2016-01-01
This Ph.D. work takes place within the framework of studies on Pellet-Cladding mechanical Interaction (PCI) which occurs in the fuel rods of pressurized water reactor. This manuscript focuses on automatic mesh refinement to simulate more accurately this phenomena while maintaining acceptable computational time and memory space for industrial calculations. An automatic mesh refinement strategy based on the combination of the Local Defect Correction multigrid method (LDC) with the Zienkiewicz and Zhu a posteriori error estimator is proposed. The estimated error is used to detect the zones to be refined, where the local sub-grids of the LDC method are generated. Several stopping criteria are studied to end the refinement process when the solution is accurate enough or when the refinement does not improve the global solution accuracy anymore. Numerical results for elastic 2D test cases with pressure discontinuity show the efficiency of the proposed strategy. The automatic mesh refinement in case of unilateral contact problems is then considered. The strategy previously introduced can be easily adapted to the multi-body refinement by estimating solution error on each body separately. Post-processing is often necessary to ensure the conformity of the refined areas regarding the contact boundaries. A variety of numerical experiments with elastic contact (with or without friction, with or without an initial gap) confirms the efficiency and adaptability of the proposed strategy. (author) [fr
Supersymmetry at finite temperature
International Nuclear Information System (INIS)
Oliveira, M.W. de.
1986-01-01
The consequences of the incorporation of finite temperature effects in fields theories are investigated. Particularly, we consider the sypersymmetric non-linear sigma model, calculating the effective potencial in the large N limit. Initially, we present the 1/N expantion formalism and, for the O(N) model of scalar field, we show the impossibility of spontaneous symmetry breaking. Next, we study the same model at finite temperature and in the presence of conserved charges (the O(N) symmetry's generator). We conclude that these conserved charges explicitly break the symmetry. We introduce a calculation method for the thermodynamic potential of the theory in the presence of chemical potentials. We present an introduction to Supersymmetry in the aim of describing some important concepts for the treatment at T>0. We show that Suppersymmetry is broken for any T>0, in opposition to what one expects, by the solution of the Hierachy Problem. (author) [pt
Directory of Open Access Journals (Sweden)
M.H.R. Ghoreishy
2008-02-01
Full Text Available This research work is devoted to the footprint analysis of a steel-belted radial tyre (185/65R14 under vertical static load using finite element method. Two models have been developed in which in the first model the tread patterns were replaced by simple ribs while the second model was consisted of details of the tread blocks. Linear elastic and hyper elastic (Arruda-Boyce material models were selected to describe the mechanical behavior of the reinforcing and rubbery parts, respectively. The above two finite element models of the tyre were analyzed under inflation pressure and vertical static loads. The second model (with detailed tread patterns was analyzed with and without friction effect between tread and contact surfaces. In every stage of the analysis, the results were compared with the experimental data to confirm the accuracy and applicability of the model. Results showed that neglecting the tread pattern design not only reduces the computational cost and effort but also the differences between computed deformations do not show significant changes. However, more complicated variables such as shape and area of the footprint zone and contact pressure are affected considerably by the finite element model selected for the tread blocks. In addition, inclusion of friction even in static state changes these variables significantly.
Belytschko, Ted; Wing, Kam Liu
1987-01-01
In the Probabilistic Finite Element Method (PFEM), finite element methods have been efficiently combined with second-order perturbation techniques to provide an effective method for informing the designer of the range of response which is likely in a given problem. The designer must provide as input the statistical character of the input variables, such as yield strength, load magnitude, and Young's modulus, by specifying their mean values and their variances. The output then consists of the mean response and the variance in the response. Thus the designer is given a much broader picture of the predicted performance than with simply a single response curve. These methods are applicable to a wide class of problems, provided that the scale of randomness is not too large and the probabilistic density functions possess decaying tails. By incorporating the computational techniques we have developed in the past 3 years for efficiency, the probabilistic finite element methods are capable of handling large systems with many sources of uncertainties. Sample results for an elastic-plastic ten-bar structure and an elastic-plastic plane continuum with a circular hole subject to cyclic loadings with the yield stress on the random field are given.
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows
Moitra, Stuti; Gatski, Thomas B.
1997-01-01
A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
Resistor Combinations for Parallel Circuits.
McTernan, James P.
1978-01-01
To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
SOFTWARE FOR DESIGNING PARALLEL APPLICATIONS
Directory of Open Access Journals (Sweden)
M. K. Bouza
2017-01-01
Full Text Available The object of research is the tools to support the development of parallel programs in C/C ++. The methods and software which automates the process of designing parallel applications are proposed.
Simulations of finite beta turbulence in tokamaks and stellarators
International Nuclear Information System (INIS)
Jenko, F.
2002-01-01
One of the central open questions in our attempt to understand microturbulence in fusion plasmas concerns the role of finite beta effects. Nonlinear codes trying to investigate this issue must go beyond the commonly used adiabatic electron approximation - a task which turns out to be a serious computational challenge. This step is necessary because the electrons are the prime contributor to the parallel currents which in turn produce the magnetic field fluctuations. Results at both ion and electron space-time scales from gyrokinetic and gyrofluid models are presented which shed light on the character of finite beta turbulence in tokamaks and stellarators. (author)
Simulations of finite β turbulence in tokamaks and stellarators
International Nuclear Information System (INIS)
Jenko, F.; Scott, B.; Kendl, A.; Strintzi, D.; Dorland, W.
2003-01-01
One of the central open questions in our attempt to understand microturbulence in fusion plasmas concerns the role of finite β effects. Nonlinear codes trying to investigate this issue must go beyond the commonly used adiabatic electron approximation - a task which turns out to be a serious computational challenge. This step is necessary because the passing electrons are the prime contributor to the parallel currents which in turn produce the magnetic field fluctuations. Results at both ion and electron space-time scales from gyrokinetic and gyro fluid models are presented which shed light on the character of finite β turbulence in tokamaks and stellarators. (author)
Parallel External Memory Graph Algorithms
DEFF Research Database (Denmark)
Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari
2010-01-01
In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one o f the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest...... an optimal speedup of Â¿(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....
Parallel computation of transverse wakes in linear colliders
International Nuclear Information System (INIS)
Zhan, Xiaowei; Ko, Kwok.
1996-11-01
SLAC has proposed the detuned structure (DS) as one possible design to control the emittance growth of long bunch trains due to transverse wakefields in the Next Linear Collider (NLC). The DS consists of 206 cells with tapering from cell to cell of the order of few microns to provide Gaussian detuning of the dipole modes. The decoherence of these modes leads to two orders of magnitude reduction in wakefield experienced by the trailing bunch. To model such a large heterogeneous structure realistically is impractical with finite-difference codes using structured grids. The authors have calculated the wakefield in the DS on a parallel computer with a finite-element code using an unstructured grid. The parallel implementation issues are presented along with simulation results that include contributions from higher dipole bands and wall dissipation
Parallel Auxiliary Space AMG Solver for $H(div)$ Problems
Energy Technology Data Exchange (ETDEWEB)
Kolev, Tzanio V. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Vassilevski, Panayot S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2012-12-18
We present a family of scalable preconditioners for matrices arising in the discretization of $H(div)$ problems using the lowest order Raviart--Thomas finite elements. Our approach belongs to the class of “auxiliary space''--based methods and requires only the finite element stiffness matrix plus some minimal additional discretization information about the topology and orientation of mesh entities. Also, we provide a detailed algebraic description of the theory, parallel implementation, and different variants of this parallel auxiliary space divergence solver (ADS) and discuss its relations to the Hiptmair--Xu (HX) auxiliary space decomposition of $H(div)$ [SIAM J. Numer. Anal., 45 (2007), pp. 2483--2509] and to the auxiliary space Maxwell solver AMS [J. Comput. Math., 27 (2009), pp. 604--623]. Finally, an extensive set of numerical experiments demonstrates the robustness and scalability of our implementation on large-scale $H(div)$ problems with large jumps in the material coefficients.
Parallel inter channel interaction mechanisms
International Nuclear Information System (INIS)
Jovic, V.; Afgan, N.; Jovic, L.
1995-01-01
Parallel channels interactions are examined. For experimental researches of nonstationary regimes flow in three parallel vertical channels results of phenomenon analysis and mechanisms of parallel channel interaction for adiabatic condition of one-phase fluid and two-phase mixture flow are shown. (author)
International Nuclear Information System (INIS)
Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G
2007-01-01
The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results
A Parallel Butterfly Algorithm
Poulson, Jack; Demanet, Laurent; Maxwell, Nicholas; Ying, Lexing
2014-01-01
The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(Nd) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r2Nd logN). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms, and an analogue of a three-dimensional generalized Radon transform were, respectively, observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.
A Parallel Butterfly Algorithm
Poulson, Jack
2014-02-04
The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(Nd) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r2Nd logN). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms, and an analogue of a three-dimensional generalized Radon transform were, respectively, observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.
Fast parallel event reconstruction
CERN. Geneva
2010-01-01
On-line processing of large data volumes produced in modern HEP experiments requires using maximum capabilities of modern and future many-core CPU and GPU architectures.One of such powerful feature is a SIMD instruction set, which allows packing several data items in one register and to operate on all of them, thus achievingmore operations per clock cycle. Motivated by the idea of using the SIMD unit ofmodern processors, the KF based track fit has been adapted for parallelism, including memory optimization, numerical analysis, vectorization with inline operator overloading, and optimization using SDKs. The speed of the algorithm has been increased in 120000 times with 0.1 ms/track, running in parallel on 16 SPEs of a Cell Blade computer. Running on a Nehalem CPU with 8 cores it shows the processing speed of 52 ns/track using the Intel Threading Building Blocks. The same KF algorithm running on an Nvidia GTX 280 in the CUDA frameworkprovi...
Goal-Oriented Self-Adaptive hp Finite Element Simulation of 3D DC Borehole Resistivity Simulations
Calo, Victor M.; Pardo, David; Paszyński, Maciej R.
2011-01-01
(adjusting polynomial orders of approximation) or hp (both) refinements on the finite elements. The new parallel implementation utilizes a computational mesh shared between multiple processors. All computational algorithms, including automatic hp goal
Computational fluid dynamics on a massively parallel computer
Jespersen, Dennis C.; Levit, Creon
1989-01-01
A finite difference code was implemented for the compressible Navier-Stokes equations on the Connection Machine, a massively parallel computer. The code is based on the ARC2D/ARC3D program and uses the implicit factored algorithm of Beam and Warming. The codes uses odd-even elimination to solve linear systems. Timings and computation rates are given for the code, and a comparison is made with a Cray XMP.
A PARALLEL NONOVERLAPPING DOMAIN DECOMPOSITION METHOD FOR STOKES PROBLEMS
Institute of Scientific and Technical Information of China (English)
Mei-qun Jiang; Pei-liang Dai
2006-01-01
A nonoverlapping domain decomposition iterative procedure is developed and analyzed for generalized Stokes problems and their finite element approximate problems in RN(N=2,3). The method is based on a mixed-type consistency condition with two parameters as a transmission condition together with a derivative-free transmission data updating technique on the artificial interfaces. The method can be applied to a general multi-subdomain decomposition and implemented on parallel machines with local simple communications naturally.
Parallel algorithms and archtectures for computational structural mechanics
Patrick, Merrell; Ma, Shing; Mahajan, Umesh
1989-01-01
The determination of the fundamental (lowest) natural vibration frequencies and associated mode shapes is a key step used to uncover and correct potential failures or problem areas in most complex structures. However, the computation time taken by finite element codes to evaluate these natural frequencies is significant, often the most computationally intensive part of structural analysis calculations. There is continuing need to reduce this computation time. This study addresses this need by developing methods for parallel computation.
Eddy current testing probe optimization using a parallel genetic algorithm
Directory of Open Access Journals (Sweden)
Dolapchiev Ivaylo
2008-01-01
Full Text Available This paper uses the developed parallel version of Michalewicz's Genocop III Genetic Algorithm (GA searching technique to optimize the coil geometry of an eddy current non-destructive testing probe (ECTP. The electromagnetic field is computed using FEMM 2D finite element code. The aim of this optimization was to determine coil dimensions and positions that improve ECTP sensitivity to physical properties of the tested devices.
International Nuclear Information System (INIS)
Barletta, A.
2008-01-01
The necessary condition for the onset of parallel flow in the fully developed region of an inclined duct is applied to the case of a circular tube. Parallel flow in inclined ducts is an uncommon regime, since in most cases buoyancy tends to produce the onset of secondary flow. The present study shows how proper thermal boundary conditions may preserve parallel flow regime. Mixed convection flow is studied for a special non-axisymmetric thermal boundary condition that, with a proper choice of a switch parameter, may be compatible with parallel flow. More precisely, a circumferentially variable heat flux distribution is prescribed on the tube wall, expressed as a sinusoidal function of the azimuthal coordinate θ with period 2π. A π/2 rotation in the position of the maximum heat flux, achieved by setting the switch parameter, may allow or not the existence of parallel flow. Two cases are considered corresponding to parallel and non-parallel flow. In the first case, the governing balance equations allow a simple analytical solution. On the contrary, in the second case, the local balance equations are solved numerically by employing a finite element method
Optical Finite Element Processor
Casasent, David; Taylor, Bradley K.
1986-01-01
A new high-accuracy optical linear algebra processor (OLAP) with many advantageous features is described. It achieves floating point accuracy, handles bipolar data by sign-magnitude representation, performs LU decomposition using only one channel, easily partitions and considers data flow. A new application (finite element (FE) structural analysis) for OLAPs is introduced and the results of a case study presented. Error sources in encoded OLAPs are addressed for the first time. Their modeling and simulation are discussed and quantitative data are presented. Dominant error sources and the effects of composite error sources are analyzed.
Anderson, Ian
2011-01-01
Coherent treatment provides comprehensive view of basic methods and results of the combinatorial study of finite set systems. The Clements-Lindstrom extension of the Kruskal-Katona theorem to multisets is explored, as is the Greene-Kleitman result concerning k-saturated chain partitions of general partially ordered sets. Connections with Dilworth's theorem, the marriage problem, and probability are also discussed. Each chapter ends with a helpful series of exercises and outline solutions appear at the end. ""An excellent text for a topics course in discrete mathematics."" - Bulletin of the Ame
Algébrico: Parte II - Algoritmo Paralelo
Directory of Open Access Journals (Sweden)
Fabio Henrique Pereira
2007-01-01
Full Text Available In this work, it is presented a new parallel wavelet- based algorithm for the Algebraic Multigrid Method (PWAMG. A variation of the standard parallel implementation of discrete wavelet transforms is used in the construction of a hierarchy of matrices and of intergrid transfer operators for Algebraic Multigrid. The PWAMG method has been tested as a parallel solver for the two dimensional Poisson equation, for different numbers of finite difference mesh nodes and comparisons are made with the sequential version of this method.
International Nuclear Information System (INIS)
DeHart, Mark D.; Williams, Mark L.; Bowman, Stephen M.
2010-01-01
The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement
Contact-impact algorithms on parallel computers
International Nuclear Information System (INIS)
Zhong Zhihua; Nilsson, Larsgunnar
1994-01-01
Contact-impact algorithms on parallel computers are discussed within the context of explicit finite element analysis. The algorithms concerned include a contact searching algorithm and an algorithm for contact force calculations. The contact searching algorithm is based on the territory concept of the general HITA algorithm. However, no distinction is made between different contact bodies, or between different contact surfaces. All contact segments from contact boundaries are taken as a single set. Hierarchy territories and contact territories are expanded. A three-dimensional bucket sort algorithm is used to sort contact nodes. The defence node algorithm is used in the calculation of contact forces. Both the contact searching algorithm and the defence node algorithm are implemented on the connection machine CM-200. The performance of the algorithms is examined under different circumstances, and numerical results are presented. ((orig.))
Parallel Polarization State Generation.
She, Alan; Capasso, Federico
2016-05-17
The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.
Parallel imaging microfluidic cytometer.
Ehrlich, Daniel J; McKenna, Brian K; Evans, James G; Belkina, Anna C; Denis, Gerald V; Sherr, David H; Cheung, Man Ching
2011-01-01
By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of fluorescence-activated flow cytometry (FCM) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in ∼6-10 min, about 30 times the speed of most current FCM systems. In 1D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times for the sample throughput of charge-coupled device (CCD)-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take. Copyright © 2011 Elsevier Inc. All rights reserved.
Kalita, Jiten C.; Biswas, Sougata; Panda, Swapnendu
2018-04-01
Till date, the sequence of vortices present in the solid corners of steady internal viscous incompressible flows was thought to be infinite. However, the already existing and most recent geometric theories on incompressible viscous flows that express vortical structures in terms of critical points in bounded domains indicate a strong opposition to this notion of infiniteness. In this study, we endeavor to bridge the gap between the two opposing stream of thoughts by diagnosing the assumptions of the existing theorems on such vortices. We provide our own set of proofs for establishing the finiteness of the sequence of corner vortices by making use of the continuum hypothesis and Kolmogorov scale, which guarantee a nonzero scale for the smallest vortex structure possible in incompressible viscous flows. We point out that the notion of infiniteness resulting from discrete self-similarity of the vortex structures is not physically feasible. Making use of some elementary concepts of mathematical analysis and our own construction of diametric disks, we conclude that the sequence of corner vortices is finite.
Effective arithmetic in finite fields based on Chudnovsky's multiplication algorithm
Atighehchi , Kévin; Ballet , Stéphane; Bonnecaze , Alexis; Rolland , Robert
2016-01-01
International audience; Thanks to a new construction of the Chudnovsky and Chudnovsky multiplication algorithm, we design efficient algorithms for both the exponentiation and the multiplication in finite fields. They are tailored to hardware implementation and they allow computations to be parallelized, while maintaining a low number of bilinear multiplications.À partir d'une nouvelle construction de l'algorithme de multiplication de Chudnovsky et Chudnovsky, nous concevons des algorithmes ef...
About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems
Directory of Open Access Journals (Sweden)
Loredana MOCEAN
2009-01-01
Full Text Available In the last years, there were made efforts for delineation of a stabile and unitary frame, where the problems of logical parallel processing must find solutions at least at the level of imperative languages. The results obtained by now are not at the level of the made efforts. This paper wants to be a little contribution at these efforts. We propose an overview in parallel programming, parallel execution and collaborative systems.
Directory of Open Access Journals (Sweden)
Jos C. M. Baeten
2010-11-01
Full Text Available The languages accepted by finite automata are precisely the languages denoted by regular expressions. In contrast, finite automata may exhibit behaviours that cannot be described by regular expressions up to bisimilarity. In this paper, we consider extensions of the theory of regular expressions with various forms of parallel composition and study the effect on expressiveness. First we prove that adding pure interleaving to the theory of regular expressions strictly increases its expressiveness up to bisimilarity. Then, we prove that replacing the operation for pure interleaving by ACP-style parallel composition gives a further increase in expressiveness. Finally, we prove that the theory of regular expressions with ACP-style parallel composition and encapsulation is expressive enough to express all finite automata up to bisimilarity. Our results extend the expressiveness results obtained by Bergstra, Bethke and Ponse for process algebras with (the binary variant of Kleene's star operation.
A two-level parallel direct search implementation for arbitrarily sized objective functions
Energy Technology Data Exchange (ETDEWEB)
Hutchinson, S.A.; Shadid, N.; Moffat, H.K. [Sandia National Labs., Albuquerque, NM (United States)] [and others
1994-12-31
In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p{sub 2}) of processors. If, p, the number of processors available, is greater than or equal to 2p{sub 2} then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency and then further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for a nCUBE 2 implementation.
Parallel Numerical Simulations of Water Reservoirs
Torres, Pedro; Mangiavacchi, Norberto
2010-11-01
The study of the water flow and scalar transport in water reservoirs is important for the determination of the water quality during the initial stages of the reservoir filling and during the life of the reservoir. For this scope, a parallel 2D finite element code for solving the incompressible Navier-Stokes equations coupled with scalar transport was implemented using the message-passing programming model, in order to perform simulations of hidropower water reservoirs in a computer cluster environment. The spatial discretization is based on the MINI element that satisfies the Babuska-Brezzi (BB) condition, which provides sufficient conditions for a stable mixed formulation. All the distributed data structures needed in the different stages of the code, such as preprocessing, solving and post processing, were implemented using the PETSc library. The resulting linear systems for the velocity and the pressure fields were solved using the projection method, implemented by an approximate block LU factorization. In order to increase the parallel performance in the solution of the linear systems, we employ the static condensation method for solving the intermediate velocity at vertex and centroid nodes separately. We compare performance results of the static condensation method with the approach of solving the complete system. In our tests the static condensation method shows better performance for large problems, at the cost of an increased memory usage. Performance results for other intensive parts of the code in a computer cluster are also presented.
The Determining Finite Automata Process
Directory of Open Access Journals (Sweden)
M. S. Vinogradova
2017-01-01
Full Text Available The theory of formal languages widely uses finite state automata both in implementation of automata-based approach to programming, and in synthesis of logical control algorithms.To ensure unambiguous operation of the algorithms, the synthesized finite state automata must be deterministic. Within the approach to the synthesis of the mobile robot controls, for example, based on the theory of formal languages, there are problems concerning the construction of various finite automata, but such finite automata, as a rule, will not be deterministic. The algorithm of determinization can be applied to the finite automata, as specified, in various ways. The basic ideas of the algorithm of determinization can be most simply explained using the representations of a finite automaton in the form of a weighted directed graph.The paper deals with finite automata represented as weighted directed graphs, and discusses in detail the procedure for determining the finite automata represented in this way. Gives a detailed description of the algorithm for determining finite automata. A large number of examples illustrate a capability of the determinization algorithm.
Parallel Framework for Cooperative Processes
Directory of Open Access Journals (Sweden)
Mitică Craus
2005-01-01
Full Text Available This paper describes the work of an object oriented framework designed to be used in the parallelization of a set of related algorithms. The idea behind the system we are describing is to have a re-usable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles and the work should be possible to be split between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO parallel algorithm for the Travelling Salesman Problem (TSP and an Image Processing (IP parallel algorithm for the Symmetrical Neighborhood Filter (SNF. The implementations of these applications by means of the parallel framework prove to have good performances: approximatively linear speedup and low communication cost.
Finite energy electroweak dyon
Energy Technology Data Exchange (ETDEWEB)
Kimm, Kyoungtae [Seoul National University, Faculty of Liberal Education, Seoul (Korea, Republic of); Yoon, J.H. [Konkuk University, Department of Physics, College of Natural Sciences, Seoul (Korea, Republic of); Cho, Y.M. [Konkuk University, Administration Building 310-4, Seoul (Korea, Republic of); Seoul National University, School of Physics and Astronomy, Seoul (Korea, Republic of)
2015-02-01
The latest MoEDAL experiment at LHC to detect the electroweak monopole makes the theoretical prediction of the monopole mass an urgent issue. We discuss three different ways to estimate the mass of the electroweak monopole. We first present the dimensional and scaling arguments which indicate the monopole mass to be around 4 to 10 TeV. To justify this we construct finite energy analytic dyon solutions which could be viewed as the regularized Cho-Maison dyon, modifying the coupling strength at short distance. Our result demonstrates that a genuine electroweak monopole whose mass scale is much smaller than the grand unification scale can exist, which can actually be detected at the present LHC. (orig.)
Probabilistic fracture finite elements
Liu, W. K.; Belytschko, T.; Lua, Y. J.
1991-05-01
The Probabilistic Fracture Mechanics (PFM) is a promising method for estimating the fatigue life and inspection cycles for mechanical and structural components. The Probability Finite Element Method (PFEM), which is based on second moment analysis, has proved to be a promising, practical approach to handle problems with uncertainties. As the PFEM provides a powerful computational tool to determine first and second moment of random parameters, the second moment reliability method can be easily combined with PFEM to obtain measures of the reliability of the structural system. The method is also being applied to fatigue crack growth. Uncertainties in the material properties of advanced materials such as polycrystalline alloys, ceramics, and composites are commonly observed from experimental tests. This is mainly attributed to intrinsic microcracks, which are randomly distributed as a result of the applied load and the residual stress.
Parallel Monte Carlo reactor neutronics
International Nuclear Information System (INIS)
Blomquist, R.N.; Brown, F.B.
1994-01-01
The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and workstation network with uneven interactive load. Speedups linear with the number of nodes were achieved
DEFF Research Database (Denmark)
Kosbar, Tamer R.; Sofan, Mamdouh A.; Waly, Mohamed A.
2015-01-01
about 6.1 °C when the TFO strand was modified with Z and the Watson-Crick strand with adenine-LNA (AL). The molecular modeling results showed that, in case of nucleobases Y and Z a hydrogen bond (1.69 and 1.72 Å, respectively) was formed between the protonated 3-aminopropyn-1-yl chain and one...... of the phosphate groups in Watson-Crick strand. Also, it was shown that the nucleobase Y made a good stacking and binding with the other nucleobases in the TFO and Watson-Crick duplex, respectively. In contrast, the nucleobase Z with LNA moiety was forced to twist out of plane of Watson-Crick base pair which......The phosphoramidites of DNA monomers of 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine (Y) and 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine LNA (Z) are synthesized, and the thermal stability at pH 7.2 and 8.2 of anti-parallel triplexes modified with these two monomers is determined. When, the anti...
Parallel consensual neural networks.
Benediktsson, J A; Sveinsson, J R; Ersoy, O K; Swain, P H
1997-01-01
A new type of a neural-network architecture, the parallel consensual neural network (PCNN), is introduced and applied in classification/data fusion of multisource remote sensing and geographic data. The PCNN architecture is based on statistical consensus theory and involves using stage neural networks with transformed input data. The input data are transformed several times and the different transformed data are used as if they were independent inputs. The independent inputs are first classified using the stage neural networks. The output responses from the stage networks are then weighted and combined to make a consensual decision. In this paper, optimization methods are used in order to weight the outputs from the stage networks. Two approaches are proposed to compute the data transforms for the PCNN, one for binary data and another for analog data. The analog approach uses wavelet packets. The experimental results obtained with the proposed approach show that the PCNN outperforms both a conjugate-gradient backpropagation neural network and conventional statistical methods in terms of overall classification accuracy of test data.
Axial anomaly at finite temperature and finite density
International Nuclear Information System (INIS)
Qian Zhixin; Su Rukeng; Yu, P.K.N.
1994-01-01
The U(1) axial anomaly in a hot fermion medium is investigated by using the real time Green's function method. After calculating the lowest order triangle diagrams, we find that finite temperature as well as finite fermion density does not affect the axial anomaly. The higher order corrections for the axial anomaly are discussed. (orig.)
A Parallel Particle Swarm Optimizer
National Research Council Canada - National Science Library
Schutte, J. F; Fregly, B .J; Haftka, R. T; George, A. D
2003-01-01
.... Motivated by a computationally demanding biomechanical system identification problem, we introduce a parallel implementation of a stochastic population based global optimizer, the Particle Swarm...
Patterns for Parallel Software Design
Ortega-Arjona, Jorge Luis
2010-01-01
Essential reading to understand patterns for parallel programming Software patterns have revolutionized the way we think about how software is designed, built, and documented, and the design of parallel software requires you to consider other particular design aspects and special skills. From clusters to supercomputers, success heavily depends on the design skills of software developers. Patterns for Parallel Software Design presents a pattern-oriented software architecture approach to parallel software design. This approach is not a design method in the classic sense, but a new way of managin
DEFF Research Database (Denmark)
Christensen, Mark Schram; Ehrsson, H Henrik; Nielsen, Jens Bo
2013-01-01
a different network, involving bilateral dorsal premotor cortex (PMd), primary motor cortex, and SMA, was more active when subjects viewed parallel movements while performing either symmetrical or parallel movements. Correlations between behavioral instability and brain activity were present in right lateral...... adduction-abduction movements symmetrically or in parallel with real-time congruent or incongruent visual feedback of the movements. One network, consisting of bilateral superior and middle frontal gyrus and supplementary motor area (SMA), was more active when subjects performed parallel movements, whereas...
International Nuclear Information System (INIS)
Masiello, E.
2006-01-01
The principal goal of this manuscript is devoted to the investigation of a new type of heterogeneous mesh adapted to the shape of the fuel pins (fuel-clad-moderator). The new heterogeneous mesh guarantees the spatial modelling of the pin-cell with a minimum of regions. Two methods are investigated for the spatial discretization of the transport equation: the discontinuous finite element method and the method of characteristics for structured cells. These methods together with the new representation of the pin-cell result in an appreciable reduction of calculation points. They allow an exact modelling of the fuel pin-cell without spatial homogenization. A new synthetic acceleration technique based on an angular multigrid is also presented for the speed up of the inner iterations. These methods are good candidates for transport calculations for a nuclear reactor core. A second objective of this work is the application of method of characteristics for non-structured geometries to the study of double heterogeneity problem. The letters is characterized by fuel material with a stochastic dispersion of heterogeneous grains, and until now was solved with a model based on collision probabilities. We propose a new statistical model based on renewal-Markovian theory, which makes possible to take into account the stochastic nature of the problem and to avoid the approximations of the collision probability model. The numerical solution of this model is guaranteed by the method of characteristics. (author)
Self-balanced modulation and magnetic rebalancing method for parallel multilevel inverters
Li, Hui; Shi, Yanjun
2017-11-28
A self-balanced modulation method and a closed-loop magnetic flux rebalancing control method for parallel multilevel inverters. The combination of the two methods provides for balancing of the magnetic flux of the inter-cell transformers (ICTs) of the parallel multilevel inverters without deteriorating the quality of the output voltage. In various embodiments a parallel multi-level inverter modulator is provide including a multi-channel comparator to generate a multiplexed digitized ideal waveform for a parallel multi-level inverter and a finite state machine (FSM) module coupled to the parallel multi-channel comparator, the FSM module to receive the multiplexed digitized ideal waveform and to generate a pulse width modulated gate-drive signal for each switching device of the parallel multi-level inverter. The system and method provides for optimization of the output voltage spectrum without influence the magnetic balancing.
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB
Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.
2017-01-01
Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
Axial anomaly at finite temperature
International Nuclear Information System (INIS)
Chaturvedi, S.; Gupte, Neelima; Srinivasan, V.
1985-01-01
The Jackiw-Bardeen-Adler anomaly for QED 4 and QED 2 are calculated at finite temperature. It is found that the anomaly is independent of temperature. Ishikawa's method [1984, Phys. Rev. Lett. vol. 53 1615] for calculating the quantised Hall effect is extended to finite temperature. (author)
Finite flavour groups of fermions
International Nuclear Information System (INIS)
Grimus, Walter; Ludl, Patrick Otto
2012-01-01
We present an overview of the theory of finite groups, with regard to their application as flavour symmetries in particle physics. In a general part, we discuss useful theorems concerning group structure, conjugacy classes, representations and character tables. In a specialized part, we attempt to give a fairly comprehensive review of finite subgroups of SO(3) and SU(3), in which we apply and illustrate the general theory. Moreover, we also provide a concise description of the symmetric and alternating groups and comment on the relationship between finite subgroups of U(3) and finite subgroups of SU(3). Although in this review we give a detailed description of a wide range of finite groups, the main focus is on the methods which allow the exploration of their different aspects. (topical review)
On finite quantum field theories
International Nuclear Information System (INIS)
Rajpoot, S.; Taylor, J.G.
1984-01-01
The properties that make massless versions of N = 4 super Yang-Mills theory and a class of N = 2 supersymmetric theories finite are: (I) a universal coupling for the gauge and matter interactions, (II) anomaly-free representations to which the bosonic and fermionic matter belong, and (III) no charge renormalisation, i.e. β(g) = 0. It was conjectured that field theories constructed out of N = 1 matter multiplets are also finite if they too share the above properties. Explicit calculations have verified these theories to be finite up to two loops. The implications of the finiteness conditions for N = 1 finite field theories with SU(M) gauge symmetry are discussed. (orig.)
PARALLEL IMPORT: REALITY FOR RUSSIA
Directory of Open Access Journals (Sweden)
Т. А. Сухопарова
2014-01-01
Full Text Available Problem of parallel import is urgent question at now. Parallel import legalization in Russia is expedient. Such statement based on opposite experts opinion analysis. At the same time it’s necessary to negative consequences consider of this decision and to apply remedies to its minimization.Purchase on Elibrary.ru > Buy now
The Galley Parallel File System
Nieuwejaar, Nils; Kotz, David
1996-01-01
Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/0 requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.
Parallelization of the FLAPW method
International Nuclear Information System (INIS)
Canning, A.; Mannstadt, W.; Freeman, A.J.
1999-01-01
The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about one hundred atoms due to a lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel computer
Parallelization of the FLAPW method
Canning, A.; Mannstadt, W.; Freeman, A. J.
2000-08-01
The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.
Traoré, Philippe; Ahipo, Yves Marcel; Louste, Christophe
2009-08-01
In this paper an improved finite volume scheme to discretize diffusive flux on a non-orthogonal mesh is proposed. This approach, based on an iterative technique initially suggested by Khosla [P.K. Khosla, S.G. Rubin, A diagonally dominant second-order accurate implicit scheme, Computers and Fluids 2 (1974) 207-209] and known as deferred correction, has been intensively utilized by Muzaferija [S. Muzaferija, Adaptative finite volume method for flow prediction using unstructured meshes and multigrid approach, Ph.D. Thesis, Imperial College, 1994] and later Fergizer and Peric [J.H. Fergizer, M. Peric, Computational Methods for Fluid Dynamics, Springer, 2002] to deal with the non-orthogonality of the control volumes. Using a more suitable decomposition of the normal gradient, our scheme gives accurate solutions in geometries where the basic idea of Muzaferija fails. First the performances of both schemes are compared for a Poisson problem solved in quadrangular domains where control volumes are increasingly skewed in order to test their robustness and efficiency. It is shown that convergence properties and the accuracy order of the solution are not degraded even on extremely skewed mesh. Next, the very stable behavior of the method is successfully demonstrated on a randomly distorted grid as well as on an anisotropically distorted one. Finally we compare the solution obtained for quadrilateral control volumes to the ones obtained with a finite element code and with an unstructured version of our finite volume code for triangular control volumes. No differences can be observed between the different solutions, which demonstrates the effectiveness of our approach.
International Nuclear Information System (INIS)
Souza, Manoelito M. de
1997-01-01
We discuss the physical meaning and the geometric interpretation of implementation in classical field theories. The origin of infinities and other inconsistencies in field theories is traced to fields defined with support on the light cone; a finite and consistent field theory requires a light-cone generator as the field support. Then, we introduce a classical field theory with support on the light cone generators. It results on a description of discrete (point-like) interactions in terms of localized particle-like fields. We find the propagators of these particle-like fields and discuss their physical meaning, properties and consequences. They are conformally invariant, singularity-free, and describing a manifestly covariant (1 + 1)-dimensional dynamics in a (3 = 1) spacetime. Remarkably this conformal symmetry remains even for the propagation of a massive field in four spacetime dimensions. We apply this formalism to Classical electrodynamics and to the General Relativity Theory. The standard formalism with its distributed fields is retrieved in terms of spacetime average of the discrete field. Singularities are the by-products of the averaging process. This new formalism enlighten the meaning and the problem of field theory, and may allow a softer transition to a quantum theory. (author)
Mimetic finite difference method
Lipnikov, Konstantin; Manzini, Gianmarco; Shashkov, Mikhail
2014-01-01
The mimetic finite difference (MFD) method mimics fundamental properties of mathematical and physical systems including conservation laws, symmetry and positivity of solutions, duality and self-adjointness of differential operators, and exact mathematical identities of the vector and tensor calculus. This article is the first comprehensive review of the 50-year long history of the mimetic methodology and describes in a systematic way the major mimetic ideas and their relevance to academic and real-life problems. The supporting applications include diffusion, electromagnetics, fluid flow, and Lagrangian hydrodynamics problems. The article provides enough details to build various discrete operators on unstructured polygonal and polyhedral meshes and summarizes the major convergence results for the mimetic approximations. Most of these theoretical results, which are presented here as lemmas, propositions and theorems, are either original or an extension of existing results to a more general formulation using polyhedral meshes. Finally, flexibility and extensibility of the mimetic methodology are shown by deriving higher-order approximations, enforcing discrete maximum principles for diffusion problems, and ensuring the numerical stability for saddle-point systems.
Finite element and finite difference methods in electromagnetic scattering
Morgan, MA
2013-01-01
This second volume in the Progress in Electromagnetic Research series examines recent advances in computational electromagnetics, with emphasis on scattering, as brought about by new formulations and algorithms which use finite element or finite difference techniques. Containing contributions by some of the world's leading experts, the papers thoroughly review and analyze this rapidly evolving area of computational electromagnetics. Covering topics ranging from the new finite-element based formulation for representing time-harmonic vector fields in 3-D inhomogeneous media using two coupled sca
Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.
2009-12-01
FILMPAR is a highly efficient and portable parallel multigrid algorithm for solving a discretised form of the lubrication approximation to three-dimensional, gravity-driven, continuous thin film free-surface flow over substrates containing micro-scale topography. While generally applicable to problems involving heterogeneous and distributed features, for illustrative purposes the algorithm is benchmarked on a distributed memory IBM BlueGene/P computing platform for the case of flow over a single trench topography, enabling direct comparison with complementary experimental data and existing serial multigrid solutions. Parallel performance is assessed as a function of the number of processors employed and shown to lead to super-linear behaviour for the production of mesh-independent solutions. In addition, the approach is used to solve for the case of flow over a complex inter-connected topographical feature and a description provided of how FILMPAR could be adapted relatively simply to solve for a wider class of related thin film flow problems. Program summaryProgram title: FILMPAR Catalogue identifier: AEEL_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEEL_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 530 421 No. of bytes in distributed program, including test data, etc.: 1 960 313 Distribution format: tar.gz Programming language: C++ and MPI Computer: Desktop, server Operating system: Unix/Linux Mac OS X Has the code been vectorised or parallelised?: Yes. Tested with up to 128 processors RAM: 512 MBytes Classification: 12 External routines: GNU C/C++, MPI Nature of problem: Thin film flows over functional substrates containing well-defined single and complex topographical features are of enormous significance, having a wide variety of engineering
Parallel computing in plasma physics: Nonlinear instabilities
International Nuclear Information System (INIS)
Pohn, E.; Kamelander, G.; Shoucri, M.
2000-01-01
processors. Afterwards the υ x - and υ y -shifts also can be performed on the same slices in parallel without communication. Before performing theυ z -shifts communication is necessary. Each processor sends to each other processor a proper part of its υ z -slice to obtain y-slices. On these y-slices the υ z -shifts are performed, again in parallel. Then the y-slices are transformed back to υ z -slices. The nonlinear Poisson-equation is solved iteratively using a Newton-method. From the known potential the electric field was calculated using cubic-spline interpolation. No parallelization is necessary for these not time consuming calculations. In the spatial one-dimensional simulation 6000 time-steps were performed at a grid of N x x N υx x N υy x N υz =160 x 48 x 48 x 48 points. Up to 8 processors were used in parallel. The results show initially the development of a potential, oscillating with a frequency near the ion cyclotron frequency, which is followed by the evolution of equilibrium of the potential, showing a positive potential bump towards the plasma. The results show the importance of a fully-kinetic solution of the charge-separation at a plasma-edge and the important role played by the finite ion gyro-radius in establishing such equilibrium. (author)
Parallel computing for homogeneous diffusion and transport equations in neutronics
International Nuclear Information System (INIS)
Pinchedez, K.
1999-06-01
Parallel computing meets the ever-increasing requirements for neutronic computer code speed and accuracy. In this work, two different approaches have been considered. We first parallelized the sequential algorithm used by the neutronics code CRONOS developed at the French Atomic Energy Commission. The algorithm computes the dominant eigenvalue associated with PN simplified transport equations by a mixed finite element method. Several parallel algorithms have been developed on distributed memory machines. The performances of the parallel algorithms have been studied experimentally by implementation on a T3D Cray and theoretically by complexity models. A comparison of various parallel algorithms has confirmed the chosen implementations. We next applied a domain sub-division technique to the two-group diffusion Eigen problem. In the modal synthesis-based method, the global spectrum is determined from the partial spectra associated with sub-domains. Then the Eigen problem is expanded on a family composed, on the one hand, from eigenfunctions associated with the sub-domains and, on the other hand, from functions corresponding to the contribution from the interface between the sub-domains. For a 2-D homogeneous core, this modal method has been validated and its accuracy has been measured. (author)
Scalable Adaptive Multilevel Solvers for Multiphysics Problems
Energy Technology Data Exchange (ETDEWEB)
Xu, Jinchao [Pennsylvania State Univ., University Park, PA (United States). Dept. of Mathematics
2014-11-26
In this project, we carried out many studies on adaptive and parallel multilevel methods for numerical modeling for various applications, including Magnetohydrodynamics (MHD) and complex fluids. We have made significant efforts and advances in adaptive multilevel methods of the multiphysics problems: multigrid methods, adaptive finite element methods, and applications.
Is Monte Carlo embarrassingly parallel?
Energy Technology Data Exchange (ETDEWEB)
Hoogenboom, J. E. [Delft Univ. of Technology, Mekelweg 15, 2629 JB Delft (Netherlands); Delft Nuclear Consultancy, IJsselzoom 2, 2902 LB Capelle aan den IJssel (Netherlands)
2012-07-01
Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turn out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Also other time losses in the parallel calculation are identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)
Is Monte Carlo embarrassingly parallel?
International Nuclear Information System (INIS)
Hoogenboom, J. E.
2012-01-01
Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turn out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Also other time losses in the parallel calculation are identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)
Parallel integer sorting with medium and fine-scale parallelism
Dagum, Leonardo
1993-01-01
Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.