Fast implementations of 3D PET reconstruction using vector and parallel programming techniques
Computationally intensive techniques with potential clinical use have arisen in nuclear medicine. Examples include iterative reconstruction, 3D PET data acquisition and reconstruction, and 3D image volume manipulation, including image registration. One obstacle to clinical acceptance of these techniques is the computation time required. This study focuses on methods to reduce the computation time for 3D PET reconstruction through the use of fast computer hardware, vector and parallel programming techniques, and algorithm optimization. The strengths and weaknesses of i860-microprocessor-based workstation accelerator boards are investigated in implementations of 3D PET reconstruction.
Sofronov, I.D.; Voronin, B.L.; Butnev, O.I. [VNIIEF (Russian Federation)] [and others]
1997-12-31
The aim of this work is to develop a 3D parallel program for the numerical solution of a gas-dynamics problem with heat conduction on distributed-memory computing systems (CS), satisfying the condition that the numerical results be independent of the number of processors involved. Two fundamentally different approaches to structuring massively parallel computations have been developed. The first approach uses a 3D data-matrix decomposition that is rebuilt at each time step, and is a development of parallelization algorithms for multiprocessor CS with shared memory. The second approach is based on a 3D data-matrix decomposition that is not rebuilt during a time step. The program was developed on the 8-processor CS MP-3 built at VNIIEF and was adapted to the massively parallel CS Meiko-2 at LLNL by the joint efforts of the VNIIEF and LLNL staffs. A large number of numerical experiments have been carried out with up to 256 processors, and the parallelization efficiency has been evaluated as a function of the number of processors and their parameters.
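The second approach's fixed decomposition amounts to a static 3D block partition of the global mesh over a process grid, which is what makes the results independent of the processor count. A minimal Python sketch of such a partition; all names are illustrative and not taken from the VNIIEF code:

```python
def split_axis(n, parts):
    """Partition n cells into `parts` contiguous, near-equal ranges."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def block_decomposition(shape, pgrid):
    """Map each rank in a 3D process grid to its (x, y, z) cell ranges."""
    axes = [split_axis(n, p) for n, p in zip(shape, pgrid)]
    blocks, rank = {}, 0
    for rx in axes[0]:
        for ry in axes[1]:
            for rz in axes[2]:
                blocks[rank] = (rx, ry, rz)
                rank += 1
    return blocks

# 8 ranks on a 2 x 2 x 2 process grid, each owning a 32^3 brick of a 64^3 mesh
blocks = block_decomposition((64, 64, 64), (2, 2, 2))
```

Because the cell-to-rank mapping depends only on the mesh shape and the process grid, the same physical cells are computed regardless of how many processors participate.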
Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver
Moustafa, Salli; Dutka Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre
2014-01-01
This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multi-core + SIMD, Single Instruction on Multiple Data) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations that usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46·10⁶ spatial cells and 1·10¹² DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-node nuclear simulation tool. (authors)
Parallel FEM simulation of 3-D crack propagation
Full text: Crack propagation simulation is an important topic in many fields, e.g., aeronautical engineering, material sciences, and geophysics. This type of simulation requires high computational power, mainly at the three-dimensional mesh generation and structural analysis steps, which usually consume a large amount of computing time and machine resources. The main objective of this work is to provide a fast and accurate system for crack growth simulation in three-dimensional models. The main idea of the methodology presented is to parallelize the mesh generation and structural analysis procedures, and to integrate them into a computational environment able to perform automatic, arbitrary crack propagation. A parallel mesh generation algorithm has been developed. This algorithm is capable of generating three-dimensional meshes of tetrahedral elements in arbitrary domains with one or multiple embedded cracks. A finite element method program called FEMOOP has been adapted to implement the parallel features. The parallel strategy to solve the set of linear equations is based on an element-by-element scheme in conjunction with an iterative gradient solver. A program called FRANC3D, which is completely integrated with the other components of the system, performs crack propagation and geometry updates. The entire system is described in detail and a set of parallel simulations of crack propagation is presented to show the reliability of the system. Refs. 4 (author)
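The element-by-element scheme mentioned above avoids ever assembling the global stiffness matrix: each iteration of a gradient-type solver only needs the product y = K u, which can be accumulated one element at a time. A minimal sketch of that product, using hypothetical names and 1D two-node bar elements for brevity (the actual FEMOOP implementation uses 3D tetrahedra):

```python
def ebe_matvec(elements, u):
    """Element-by-element product y = K u.

    elements: list of (dof_indices, dense_ke) pairs, where dense_ke is
    the small element stiffness matrix; u: global DoF vector.
    """
    y = [0.0] * len(u)
    for dofs, ke in elements:
        # gather the element's local DoFs, apply its dense matrix,
        # and scatter-add the result back into the global vector
        ul = [u[d] for d in dofs]
        for i, di in enumerate(dofs):
            y[di] += sum(ke[i][j] * ul[j] for j in range(len(dofs)))
    return y

# two 1D two-node bar elements of unit stiffness sharing node 1
k = [[1.0, -1.0], [-1.0, 1.0]]
elements = [((0, 1), k), ((1, 2), k)]
y = ebe_matvec(elements, [0.0, 1.0, 2.0])
```

Each element's contribution is independent, which is exactly what makes the scheme attractive for parallel execution.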
DYNA3D2000*, Explicit 3-D Hydrodynamic FEM Program
1 - Description of program or function: DYNA3D2000 is a nonlinear explicit finite element code for analyzing 3-D structures and solid continua. The code is vectorized and available on several computer platforms. The element library includes continuum, shell, beam, truss and spring/damper elements to allow maximum flexibility in modeling physical problems. Many material models are available to represent a wide range of material behavior, including elasticity, plasticity, composites, thermal effects and rate dependence. In addition, DYNA3D has a sophisticated contact interface capability, including frictional sliding, single-surface contact and automatic contact generation. 2 - Method of solution: Discretization of a continuous model transforms partial differential equations into algebraic equations. A numerical solution is then obtained by solving these algebraic equations through a direct time marching scheme. 3 - Restrictions on the complexity of the problem: Recent software improvements, including dynamic memory allocation and a very large format description, have eliminated most of the user-identified limitations and pushed potential problem sizes beyond the reach of most users. The dominant restrictions remain code execution speed and robustness, which the developers constantly strive to improve.
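The direct time-marching scheme described in the method of solution can be illustrated on a single degree of freedom. The update below is a generic explicit (semi-implicit Euler / central-difference style) step of the textbook form, not DYNA3D's actual implementation:

```python
def explicit_step(x, v, dt, mass, f_int, f_ext=0.0):
    """One explicit step: solve M a = f_ext - f_int(x), then update
    velocity and position in turn (semi-implicit Euler form)."""
    a = (f_ext - f_int(x)) / mass
    v = v + a * dt
    x = x + v * dt
    return x, v

# undamped spring-mass system with k = m = 1, released from x = 1
x, v = 1.0, 0.0
for _ in range(3):
    x, v = explicit_step(x, v, 0.1, 1.0, lambda x: x)
```

No linear system is solved; stability instead requires the time step to stay below a limit set by the highest element frequency, which is why explicit codes take many small steps.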
3D modelling of edge parallel flow asymmetries
The issue of parallel flow asymmetries in the edge plasma is tackled with a new first-principles transport and turbulence code. TOKAM-3D is a 3D full-torus fluid code that can be used in both diffusive and turbulent regimes and covers either exclusively closed flux surfaces or both open and closed field lines in limiter geometry. Two independent mechanisms liable to lead to large amplitude asymmetric parallel flows are identified. Global ExB drifts coupled with the presence of the limiter break the poloidal symmetry and can generate large amplitude parallel flows even with poloidally uniform transport coefficients. On the other hand, turbulent transport in the edge exhibits a strong ballooning of the radial particle flux, generating an up-down m = 1, n = 0 structure in the parallel velocity. The combination of both mechanisms in complete simulations leads to a poloidal and radial distribution of the parallel velocity comparable to experimental results.
Parallel Processor for 3D Recovery from Optical Flow
Jose Hugo Barron-Zambrano
2009-01-01
3D recovery from motion has received major attention in computer vision in recent years. The main problem lies in the number of operations and memory accesses required by the majority of existing techniques when translated to hardware or software implementations. This paper proposes a parallel processor for 3D recovery from optical flow. Its main features are maximal reuse of data and a low number of clock cycles to calculate the optical flow, along with the precision with which 3D recovery is achieved. The results of the proposed architecture, as well as those from processor synthesis, are presented.
Parallel 3-D SN performance for DANTSYS/MPI on the Cray T3D
A data parallel version of the 3-D transport solver in DANTSYS has been in use on the SIMD CM-200s at LANL since 1994. This version typically obtains grind times of 150-200 nanoseconds on a 2,048-PE CM-200. The authors have now implemented a new message-passing parallel version of DANTSYS, referred to as DANTSYS/MPI, on the 512-PE Cray T3D at Los Alamos. By taking advantage of the SPMD architecture of the Cray T3D, as well as its low-latency communications network, they have managed to achieve grind times of less than 10 nanoseconds on real problems. DANTSYS/MPI is fully accelerated using DSA on both the inner and outer iterations. This paper describes the implementation of DANTSYS/MPI on the Cray T3D, and presents two simple performance models for the transport sweep which accurately predict the grind time as a function of the number of PEs and problem size, or scalability.
Powerful supercomputers are available today; MBC-1000M is a Russian supercomputer that can be used through remote access. The programs LUCKY and LUCKYC were created for multiprocessor systems. They use algorithms designed specifically for such computers, with the MPI (Message Passing Interface) service for exchanges between processors. LUCKY solves shielding problems by the multigroup discrete ordinates method; LUCKYC solves criticality problems by the same method. Only XYZ orthogonal geometry is available; with sufficiently small spatial steps in the approximation of the discrete operator, this geometry can serve as a universal one for describing complex geometrical structures. Cross-section libraries in GIT format are used, with nuclear data up to the P8 approximation in Legendre polynomials. The programming language is Fortran-90. Vector processors could yield a speedup of up to 30 times, but unfortunately MBC-1000M does not have them. Nevertheless, satisfactory parallel efficiency was obtained with 'space' (LUCKY) and 'space and energy' (LUCKYC) decompositions. The AUTOCAD program is used to check the geometry after the input data have been processed. The programs have a powerful geometry module, a convenient tool for describing any geometry. Output results can be processed by graphics programs on a personal computer. (authors)
Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU
Yong Xia
2015-01-01
Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This poses a major challenge to traditional CPU-based computing environments, which either cannot meet the full computational demand or are not easily available due to high cost. GPUs, as a parallel computing environment, therefore provide an alternative for solving the large-scale computational problems of whole-heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, the multicellular tissue model was split into two components: the single-cell model (ordinary differential equations) and the diffusion term of the monodomain model (partial differential equation). Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole-heart simulations.
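The ODE/PDE decoupling described above can be sketched in a few lines on a toy 1D fibre. Here a linear decay term stands in for the real ionic cell model, the diffusion sub-step is an explicit finite-difference Laplacian with zero-flux boundaries, and every parameter value is illustrative:

```python
import math

def split_step(V, dt, dx, D, tau):
    """One operator-splitting step: ODE sub-step, then PDE sub-step."""
    # 1) "cell model" ODE sub-step: a linear decay dV/dt = -V/tau,
    #    standing in for a real ionic model, integrated exactly per cell
    V = [v * math.exp(-dt / tau) for v in V]
    # 2) diffusion PDE sub-step: explicit Laplacian, zero-flux boundaries
    n, out = len(V), []
    for i in range(n):
        left = V[i - 1] if i > 0 else V[i]
        right = V[i + 1] if i < n - 1 else V[i]
        out.append(V[i] + dt * D / dx ** 2 * (left - 2.0 * V[i] + right))
    return out
```

Because step 1 is independent per cell and step 2 touches only nearest neighbours, both sub-steps map naturally onto one-thread-per-cell GPU kernels, which is the decoupling the paper exploits.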
A parallel sweeping preconditioner for heterogeneous 3D Helmholtz equations
Poulson, Jack; Engquist, Björn; Li, Siwei; Ying, Lexing
2012-01-01
A parallelization of a sweeping preconditioner for 3D Helmholtz equations without large cavities is introduced and benchmarked for several challenging velocity models. The setup and application costs of the sequential preconditioner are shown to be O(γ²N^{4/3}) and O(γN log N), where γ(ω) denotes the modestly frequency-dependent number of grid points per Perfectly Matched Layer. Several computational and memory improvements are introduced relative to using black-box sparse-direct solvers for the auxiliary problems, and competitive runtimes and iteration counts are reported for high-frequency problems distributed over thousands of cores.
Parallel acquisition of 3D-HA(CA)NH and 3D-HACACO spectra
Reddy, Jithender G.; Hosur, Ramakrishna V., E-mail: hosur@tifr.res.in [Tata Institute of Fundamental Research, Department of Chemical Sciences (India)
2013-06-15
We present here an NMR pulse sequence with 5 independent incrementable time delays within the frame of a 3-dimensional experiment, by incorporating polarization sharing and dual receiver concepts. This has been applied to directly record 3D-HA(CA)NH and 3D-HACACO spectra of proteins simultaneously using parallel detection of ¹H and ¹³C nuclei. While both experiments display intra-residue backbone correlations, the 3D-HA(CA)NH also provides sequential 'i − 1 → i' correlations along the ¹Hα dimension. Both spectra contain special peak patterns at glycine locations which serve as check points during the sequential assignment process. The 3D-HACACO spectrum contains, in addition, information on prolines and side chains of residues having an H-C-CO network (i.e., ¹Hβ, ¹³Cβ and ¹³COγ of Asp and Asn, and ¹Hγ, ¹³Cγ and ¹³COδ of Glu and Gln), which are generally absent in most conventional proton-detected experiments.
Parallel PAB3D: Experiences with a Prototype in MPI
Guerinoni, Fabio; Abdol-Hamid, Khaled S.; Pao, S. Paul
1998-01-01
PAB3D is a three-dimensional Navier-Stokes solver that has gained acceptance in the research and industrial communities. It takes as its computational domain a set of disjoint blocks covering the physical domain. This is the first report on the implementation of PAB3D using the Message Passing Interface (MPI), a standard for parallel processing. We discuss briefly the characteristics of the code and define a prototype for testing. The principal data structure used for communication is derived from preprocessing "patching". We describe a simple interface (COMMSYS) for MPI communication, and some general techniques likely to be encountered when working on problems of this nature. Last, we identify levels of improvement over the current version and outline future work.
INGRID, 3-D Mesh Generator for Program DYNA3D and NIKE3D and FACET and TOPAZ3D
1 - Description of program or function: INGRID is a general-purpose, three-dimensional mesh generator developed for use with finite element, nonlinear, structural dynamics codes. INGRID generates the large and complex input data files for DYNA3D (NESC 9909), NIKE3D (NESC 9725), FACET, and TOPAZ3D. One of the greatest advantages of INGRID is that virtually any shape can be described without resorting to wedge elements, tetrahedrons, triangular elements or highly distorted quadrilateral or hexahedral elements. Other capabilities available are in the areas of geometry and graphics. Exact surface equations and surface intersections considerably improve the ability to deal with accurate models, and a hidden line graphics algorithm is included which is efficient on the most complicated meshes. The most important new capability is associated with the boundary conditions, loads, and material properties required by nonlinear mechanics programs. Commands have been designed for each case to minimize user effort. This is particularly important since special processing is almost always required for each load or boundary condition. 2 - Method of solution: Geometries are described primarily using the index space notation of the INGEN program (NESC 975) with an additional type of notation, index progression. Index progressions provide a concise and simple method for describing complex structures; the concept was developed to facilitate defining multiple regions in index space. Rather than specifying the minimum and maximum indices for a region, one specifies the progression of indices along the I, J and K directions, respectively. The index progression method allows the analyst to describe most geometries including nodes and elements with roughly the same amount of input as a solids modeler
DPGL: The Direct3D9-based Parallel Graphics Library for Multi-display Environment
Zhen Liu; Jiao-Ying Shi
2007-01-01
The emergence of high performance 3D graphics cards has opened the way to PC clusters for high performance multi-display environments. In order to exploit the rendering ability of PC clusters, we should design appropriate parallel rendering algorithms and parallel graphics library interfaces. Due to the rapid development of Direct3D, we bring forward DPGL, the Direct3D9-based parallel graphics library in the D3DPR parallel rendering system, which implements the Direct3D9 interfaces to support parallelization of existing Direct3D9 applications with no modification. Based on a parallelism analysis of the Direct3D9 rendering pipeline, we briefly introduce the D3DPR parallel rendering system. DPGL is the fundamental component of D3DPR. After presenting DPGL's three-layer architecture, we discuss rendering resource interception and management. Finally, we describe the design and implementation of DPGL in detail, including the rendering command interception layer, the rendering command interpretation layer and the rendering resource parallelization layer.
PIXIE3D: An efficient, fully implicit, parallel, 3D extended MHD code for fusion plasma modeling
PIXIE3D is a modern, parallel, state-of-the-art extended MHD code that employs fully implicit methods for efficiency and accuracy. It features a general geometry formulation, and is therefore suitable for the study of many magnetic fusion configurations of interest. PIXIE3D advances the state of the art in extended MHD modeling in two fundamental ways. Firstly, it employs a novel conservative finite volume scheme which is remarkably robust and stable, and demands very little physical and/or numerical dissipation. This is a fundamental requirement when one wants to study fusion plasmas with realistic conductivities. Secondly, PIXIE3D features fully implicit time stepping, employing Newton-Krylov methods for inverting the associated nonlinear systems. These methods have been shown to be scalable and efficient when preconditioned properly. Novel preconditioning ideas (so-called physics-based), which were prototyped in the context of reduced MHD, have been adapted for 3D primitive-variable resistive MHD in PIXIE3D, and are currently being extended to Hall MHD. PIXIE3D is fully parallel, employing PETSc for parallelism. PIXIE3D has been thoroughly benchmarked against linear theory and against other available extended MHD codes on nonlinear test problems (such as the GEM reconnection challenge). We are currently in the process of extending such comparisons to fusion-relevant problems in realistic geometries. In this talk, we will describe both the spatial discretization approach and the preconditioning strategy employed for extended MHD in PIXIE3D. We will report on recent benchmarking studies between PIXIE3D and other 3D extended MHD codes, and will demonstrate its usefulness in a variety of fusion-relevant configurations such as tokamaks and reversed-field pinches. (Author)
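A reason Newton-Krylov methods suit large implicit codes of this kind is that Krylov solvers never need the Jacobian matrix itself, only Jacobian-vector products, and one extra residual evaluation approximates such a product. A generic sketch of that finite-difference trick (this is the standard Jacobian-free idea, not PIXIE3D's actual code):

```python
def jac_vec(F, u, v, eps=1e-7):
    """Approximate the Jacobian-vector product J(u) v of residual F
    with one extra residual evaluation: J v ~ (F(u + eps v) - F(u)) / eps."""
    Fu = F(u)
    Fp = F([ui + eps * vi for ui, vi in zip(u, v)])
    return [(a - b) / eps for a, b in zip(Fp, Fu)]

# toy nonlinear residual F(u) = (u0^2 - 1, u0 u1 - 2);
# its exact Jacobian at u = (2, 3) gives J (1, 1) = (4, 5)
F = lambda u: [u[0] ** 2 - 1.0, u[0] * u[1] - 2.0]
Jv = jac_vec(F, [2.0, 3.0], [1.0, 1.0])
```

Each GMRES or similar Krylov iteration then costs one residual evaluation, which is why the preconditioner, not the Jacobian, becomes the design focus.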
A parallel algorithm for 3D particle tracking and Lagrangian trajectory reconstruction
Particle-tracking methods are widely used in fluid mechanics and multi-target tracking research because of their unique ability to reconstruct long trajectories with high spatial and temporal resolution. Researchers have recently demonstrated 3D tracking of several objects in real time, but as the number of objects is increased, real-time tracking becomes impossible due to data transfer and processing bottlenecks. This problem may be solved by using parallel processing. In this paper, a parallel-processing framework has been developed based on frame decomposition and is programmed using the asynchronous object-oriented Charm++ paradigm. This framework can be a key step in achieving a scalable Lagrangian measurement system for particle-tracking velocimetry and may lead to real-time measurement capabilities. The parallel tracking algorithm was evaluated with three data sets including the particle image velocimetry standard 3D images data set #352, a uniform data set for optimal parallel performance and a computational-fluid-dynamics-generated non-uniform data set to test trajectory reconstruction accuracy, consistency with the sequential version and scalability to more than 500 processors. The algorithm showed strong scaling up to 512 processors and no inherent limits of scalability were seen. Ultimately, up to a 200-fold speedup is observed compared to the serial algorithm when 256 processors were used. The parallel algorithm is adaptable and could be easily modified to use any sequential tracking algorithm, which inputs frames of 3D particle location data and outputs particle trajectories
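The per-frame work that such a framework distributes reduces, at its simplest, to linking particle positions between consecutive frames. A deliberately naive greedy nearest-neighbour sketch of that linking step (real PTV algorithms use more robust predictors; all names here are illustrative):

```python
def link_frames(prev_pts, next_pts, max_disp):
    """Greedily link 3D particle positions between two consecutive frames.

    Returns a list of (i_prev, j_next) matches; a particle is linked to
    the nearest unclaimed candidate within max_disp."""
    links, used = [], set()
    for i, p in enumerate(prev_pts):
        best, best_d2 = None, max_disp ** 2
        for j, q in enumerate(next_pts):
            if j in used:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            if d2 <= best_d2:
                best, best_d2 = j, d2
        if best is not None:
            links.append((i, best))
            used.add(best)
    return links

links = link_frames([(0, 0, 0), (10, 0, 0)],
                    [(10.2, 0, 0), (0.1, 0, 0)], max_disp=1.0)
```

Because each frame pair can be linked independently, frames can be dealt out to workers, which is the frame-decomposition idea the paper builds on.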
Parallel Hall effect from 3D single-component metamaterials
Kern, Christian; Kadic, Muamer; Wegener, Martin
2015-01-01
We propose a class of three-dimensional metamaterial architectures composed of a single doped semiconductor (e.g., n-Si) in air or vacuum that lead to unusual effective behavior of the classical Hall effect. Using an anisotropic structure, we numerically demonstrate a Hall voltage that is parallel, rather than orthogonal, to the external static magnetic-field vector ("parallel Hall effect"). The sign of this parallel Hall voltage can be determined by a structure parameter. Together with the previously demonstrated positive or negative orthogonal Hall voltage, we demonstrate four different sign combinations.
Ranjan Sen
2012-01-01
Parallel programming is an extension of sequential programming; today, it is becoming the mainstream paradigm in day-to-day information processing. Its aim is to build the fastest programs on parallel computers. The methodologies for developing a parallel program can be put into integrated frameworks. Development focuses on the algorithm, the language, and how the program is deployed on the parallel computer.
Parallel Simulation of 3-D Turbulent Flow Through Hydraulic Machinery
徐宇; 吴玉林
2003-01-01
Parallel computational methods were used to analyze incompressible turbulent flow through hydraulic machinery. Two parallel methods were used to simulate the complex flow field: the spatial decomposition method divides the computational domain into several sub-domains, while parallel discrete-event simulation divides the whole task into several parts according to their function. The simulation results were compared with serial simulation results and with particle image velocimetry (PIV) experimental results. The results give the distribution and configuration of the complex vortices and illustrate the effectiveness of the parallel algorithms for numerical simulation of turbulent flows.
Introduction to parallel programming
Brawer, Steven
1989-01-01
Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers, and race conditions.
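The barrier construct named in the chapter summary above translates directly to modern threading APIs. A minimal Python sketch of a two-phase shared-memory computation synchronized by a barrier (the book itself works in Fortran-era constructs; this is only an illustration of the same idea):

```python
import threading

results = []                    # shared memory written in phase 1
barrier = threading.Barrier(4)  # all 4 workers must arrive before any proceeds

def worker(wid):
    # phase 1: each thread contributes its partial result
    results.append(wid * wid)   # list.append is atomic under CPython's GIL
    barrier.wait()
    # phase 2: past the barrier, every thread may safely read all
    # phase-1 results, since all writers have finished

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the barrier, a fast thread could read `results` before slower threads had written, which is exactly the race condition the chapter pairs with barriers.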
Parallelism in Constraint Programming
Rolf, Carl Christian
2011-01-01
Writing efficient parallel programs is the biggest challenge of the software industry for the foreseeable future. We are currently in a time when parallel computers are the norm, not the exception. Soon, parallel processors will be standard even in cell phones. Without drastic changes in hardware development, all software must be parallelized to its fullest extent. Parallelism can increase performance and reduce power consumption at the same time. Many programs will execute faster on a...
Parallel logic programming systems
Chassin De Kergommeaux, J.; Codognet, Philippe
1992-01-01
Parallelizing logic programming has attracted much interest in the research community, because of the intrinsic or- and and-parallelism of logic programs. One research stream aims at transparent exploitation of parallelism in existing logic programming languages such as Prolog, while the family of concurrent logic languages develops constructs allowing programmers to express the concurrency, that is, the communication and synchronization between parallel processes, inside their algorithms. This p...
DANTSYS/MPI: a system for 3-D deterministic transport on parallel architectures
Baker, R.S.; Alcouffe, R.E.
1996-12-31
Since 1994, we have been using a data parallel form of our deterministic transport code DANTSYS to perform time-independent fixed source and eigenvalue calculations on the CM-200s at Los Alamos National Laboratory (LANL). Parallelization of the transport sweep is obtained by using a 2-D spatial decomposition which retains the ability to invert the source iteration equation in a single iteration (i.e., the diagonal plane sweep). We have now implemented a message passing version of DANTSYS, referred to as DANTSYS/MPI, on the Cray T3D installed at Los Alamos in 1995. By taking advantage of the SPMD (Single Program, Multiple Data) architecture of the Cray T3D, as well as its low latency communications network, we have managed to achieve grind times (time to solve a single cell in phase space) of less than 10 nanoseconds on the 512-PE (Processing Element) T3D, as opposed to typical grind times of 150-200 nanoseconds on a 2048-PE CM-200, or 300-400 nanoseconds on a single PE of a Cray Y-MP. In addition, we have also parallelized the Diffusion Synthetic Acceleration (DSA) equations which are used to accelerate the convergence of the transport equation. DANTSYS/MPI currently runs on traditional Cray PVPs and the Cray T3D, and its computational kernel (Sweep3D) has been ported to and tested on an array of SGI SMPs (symmetric multiprocessors), a network of IBM 590 workstations, an IBM SP2, and the Intel TFLOPs machine at Sandia National Laboratory. This paper describes the implementation of DANTSYS/MPI on the Cray T3D, and presents a simple performance model which accurately predicts the grind time as a function of the number of PEs and problem size, or scalability. This paper also describes the parallel implementation and performance of the elliptic solver used in DANTSYS/MPI for solving the synthetic acceleration equations.
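A performance model of the kind described in the last sentences typically has two terms: per-cell compute work divided across the PEs, plus a communication term that grows with the PE grid (pipeline fill for the sweep). A generic sketch; the coefficients below are invented for illustration and are not measured DANTSYS/MPI values:

```python
def sweep_time(cells, angles, pes, t_cell=200e-9, t_msg=50e-6, sweeps=1):
    """Toy two-term model: perfectly parallel per-cell work plus a
    pipeline-fill communication term on a square 2D PE grid."""
    compute = cells * angles * t_cell / pes
    comm = sweeps * (pes ** 0.5) * t_msg
    return compute + comm

def grind_time(cells, angles, pes, **kw):
    """Effective wall-clock time per phase-space cell (the 'grind time')."""
    return sweep_time(cells, angles, pes, **kw) / (cells * angles)
```

Such a model makes the scalability trade-off explicit: for large problems the compute term dominates and grind time falls with PE count, while for small problems the communication term puts a floor under it.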
A Parallel Sweeping Preconditioner for Heterogeneous 3D Helmholtz Equations
Poulson, Jack
2013-05-02
A parallelization of a sweeping preconditioner for three-dimensional Helmholtz equations without large cavities is introduced and benchmarked for several challenging velocity models. The setup and application costs of the sequential preconditioner are shown to be O(γ²N^{4/3}) and O(γN log N), where γ(ω) denotes the modestly frequency-dependent number of grid points per perfectly matched layer. Several computational and memory improvements are introduced relative to using black-box sparse-direct solvers for the auxiliary problems, and competitive runtimes and iteration counts are reported for high-frequency problems distributed over thousands of cores. Two open-source packages are released along with this paper: Parallel Sweeping Preconditioner (PSP) and the underlying distributed multifrontal solver, Clique. © 2013 Society for Industrial and Applied Mathematics.
Parallel deterministic neutronics with AMR in 3D
Clouse, C.; Ferguson, J.; Hendrickson, C. [Lawrence Livermore National Lab., CA (United States)
1997-12-31
AMTRAN, a three-dimensional Sn neutronics code with adaptive mesh refinement (AMR), has been parallelized over spatial domains and energy groups and runs on the Meiko CS-2 with MPI message passing. Block-refined AMR is used with linear finite element representations for the fluxes, which allows for a straightforward interpretation of fluxes at block interfaces with zoning differences. The load balancing algorithm assumes 8 spatial domains, which minimizes idle time among processors.
Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh
2015-07-01
This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for the current class of multicore architectures. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism; however, when it comes to 3D data migration, the resource requirements of the algorithm grow with the data size. This challenges its practical implementation even on current-generation high performance computing systems. Therefore a smart parallelization approach is essential to handle 3D data for migration. The most compute-intensive part of the Kirchhoff depth migration algorithm is the calculation of traveltime tables, due to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for post- and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce a concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept provides flexibility to the algorithm by migrating data in a number of depth iterations, which depends upon the available node memory and the size of the data to be migrated at runtime. Furthermore, it minimizes the requirements of storage, I/O and inter-node communication, thus making it advantageous over conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a PARAM series supercomputer. Optimization, performance and scalability experiment results along with the migration outcome show the effectiveness of the parallel algorithm.
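The flexi-depth idea, as described, picks the number of depth iterations at runtime from the node's memory budget: each pass migrates only as many depth slices as fit in memory. A hypothetical sketch of that sizing rule (the function name and all figures are invented for illustration):

```python
import math

def depth_iterations(total_depth_slices, bytes_per_slice, node_mem_bytes):
    """Number of flexi-depth passes: migrate as many depth slices per
    pass as the node's memory budget allows, never fewer than one."""
    slices_per_pass = max(1, node_mem_bytes // bytes_per_slice)
    return math.ceil(total_depth_slices / slices_per_pass)

# e.g. 1000 depth slices of 2 GB each on a 64 GB node -> 32 passes
n_iter = depth_iterations(1000, 2 * 10**9, 64 * 10**9)
```

Deriving the iteration count from the runtime memory budget is what lets the same code run unchanged on nodes of different sizes, trading extra passes for a smaller footprint.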
Gamble, James Graham
1990-01-01
While many parallel programming languages exist, they rarely approach language design from the standpoint of communication (which implies expressibility and readability). A new language, called Explicit Parallel Programming (EPP), attempts to provide this quality by separating the responsibility for the execution of run-time actions from the responsibility for deciding the order in which they occur. The ordering of a parallel algorithm is specified in the new EPP language; run ti...
Parallel processing for efficient 3D slope stability modelling
Marchesini, Ivan; Mergili, Martin; Alvioli, Massimiliano; Metz, Markus; Schneider-Muntau, Barbara; Rossi, Mauro; Guzzetti, Fausto
2014-05-01
We test the performance of the GIS-based, three-dimensional slope stability model r.slope.stability. The model was developed as a C- and Python-based raster module of the GRASS GIS software. It considers the three-dimensional geometry of the sliding surface, adopting a modification of the model proposed by Hovland (1977), as revised and extended by Xie and co-workers (2006). Given a terrain elevation map and a set of relevant thematic layers, the model evaluates the stability of slopes for a large number of randomly selected potential slip surfaces, ellipsoidal or truncated in shape. Any single raster cell may be intersected by multiple sliding surfaces, each associated with a value of the factor of safety, FS. For each pixel, the minimum value of FS and the depth of the associated slip surface are stored. This information is used to obtain a spatial overview of the potentially unstable slopes in the study area. We test the model in the Collazzone area, Umbria, central Italy, an area known to be susceptible to landslides of different types and sizes. The availability of a comprehensive and detailed landslide inventory map allowed for a critical evaluation of the model results. The r.slope.stability code automatically splits the study area into a defined number of tiles, with proper overlap in order to provide the same statistical significance over the entire study area. The tiles are then processed in parallel by a given number of processors, exploiting a multi-purpose computing environment at CNR IRPI, Perugia. The map of the FS is obtained by collecting the individual results and taking the minimum values on the overlapping cells. This procedure significantly reduces the processing time. We show how the gain in terms of processing time depends on the tile dimensions and on the number of cores.
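The tile-and-merge step described above, processing overlapping tiles independently and then taking the minimum factor of safety on overlapping cells, can be sketched as follows. This is an illustrative sketch along one axis, not the r.slope.stability code; the tiling layout and function names are assumptions:

```python
def split_tiles(nrows, tile_rows, overlap):
    """Yield (start, stop) ranges of overlapping tiles along one axis."""
    start = 0
    while start < nrows:
        stop = min(start + tile_rows, nrows)
        yield (max(0, start - overlap), min(nrows, stop + overlap))
        start = stop

def merge_min(nrows, tile_results):
    """Combine per-tile FS profiles, keeping the minimum on overlapping cells."""
    fs = [float("inf")] * nrows
    for (r0, r1), tile_fs in tile_results:
        for i, v in enumerate(tile_fs):
            fs[r0 + i] = min(fs[r0 + i], v)
    return fs

# Toy example: 10 cells split into two tiles with one cell of overlap;
# tile 1 reports FS = 2.0 everywhere, tile 2 reports FS = 1.5
ranges = list(split_tiles(10, 5, 1))              # [(0, 6), (4, 10)]
tiles = [(rg, [v] * (rg[1] - rg[0])) for rg, v in zip(ranges, (2.0, 1.5))]
fs_map = merge_min(10, tiles)
```

Because each tile is independent until the final minimum-merge, the per-tile work distributes naturally over processors, which is what yields the reported reduction in processing time.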
Foster, I.; Tuecke, S.
1991-12-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).
Saez, Fernando; Printista, Alicia Marcela; Piccoli, María Fabiana
2007-01-01
Recently, the high-performance programming community has worked to find new templates or skeletons for several parallel programming paradigms. This new form of programming allows the programmer to reduce development time, since it saves time in the design, testing and coding phases. We are concerned with some issues of skeletons that are fundamental to the definition of any skeletal parallel programming system. This paper presents commentaries on these issues in the c...
A parallel multigrid-based preconditioner for the 3D heterogeneous high-frequency Helmholtz equation
We investigate the parallel performance of an iterative solver for 3D heterogeneous Helmholtz problems related to applications in seismic wave propagation. For large 3D problems, the computation is no longer feasible on a single processor, and the memory requirements increase rapidly. Therefore, parallelization of the solver is needed. We employ a complex shifted-Laplace preconditioner combined with the Bi-CGSTAB iterative method and use a multigrid method to approximate the inverse of the resulting preconditioning operator. A 3D multigrid method with 2D semi-coarsening is employed. We show numerical results for large problems arising in geophysical applications
Programming Parallel Computers
Chandy, K. Mani
1988-01-01
This paper is from a keynote address to the IEEE International Conference on Computer Languages, October 9, 1988. Keynote addresses are expected to be provocative (and perhaps even entertaining), but not necessarily scholarly. The reader should be warned that this talk was prepared with these expectations in mind. Parallel computers offer the potential of great speed at low cost. The promise of parallelism is limited by the ability to program parallel machines effectively. This paper explores ...
Compositional C++: Compositional Parallel Programming
Chandy, K. Mani; Kesselman, Carl
1992-01-01
A compositional parallel program is a program constructed by composing component programs in parallel, where the composed program inherits properties of its components. In this paper, we describe a small extension of C++ called Compositional C++ or CC++ which is an object-oriented notation that supports compositional parallel programming. CC++ integrates different paradigms of parallel programming: data-parallel, task-parallel and object-parallel paradigms; imperative and declarative programm...
The 3D Elevation Program: summary of program direction
Snyder, Gregory I.
2012-01-01
The 3D Elevation Program (3DEP) initiative responds to a growing need for high-quality topographic data and a wide range of other three-dimensional representations of the Nation's natural and constructed features. The National Enhanced Elevation Assessment (NEEA), which was completed in 2011, clearly documented this need within government and industry sectors. The results of the NEEA indicated that enhanced elevation data have the potential to generate $13 billion in new benefits annually. The benefits apply to flood risk management, agriculture, water supply, homeland security, renewable energy, aviation safety, and other areas. The 3DEP initiative was recommended by the National Digital Elevation Program and its 12 Federal member agencies and was endorsed by the National States Geographic Information Council (NSGIC) and the National Geospatial Advisory Committee (NGAC).
DANTSYS/MPI- a system for 3-D deterministic transport on parallel architectures
A data parallel version of the 3-D transport solver in DANTSYS has been in use on the SIMD CM-200s at LANL since 1994. This version typically obtains grind times of 150-200 nanoseconds on a 2048-PE CM-200. A new message-passing parallel version of DANTSYS, referred to as DANTSYS/MPI, has been implemented on the 512-PE Cray T3D at Los Alamos. By taking advantage of the SPMD architecture of the Cray T3D, as well as its low-latency communications network, we have managed to achieve grind times of less than 10 nanoseconds on real problems. DANTSYS/MPI is fully accelerated using DSA on both the inner and outer iterations. The implementation of DANTSYS/MPI on the Cray T3D is described, and a simple performance model is presented which accurately predicts the grind time as a function of the number of PEs and the problem size, i.e. its scalability. (author)
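A grind-time model of the kind mentioned above is typically a sum of a per-cell compute term and a communication term that scales with the surface-to-volume ratio of each processor's subdomain. The sketch below is a generic illustration of such a model, not the actual DANTSYS/MPI formula; all names and coefficients are invented:

```python
def grind_time_ns(n_cells, n_pe, t_compute=8.0, t_comm=150.0, surface_factor=6.0):
    """Estimate grind time (ns per unknown) for a domain-decomposed sweep.

    t_compute : pure computation cost per unknown on one PE (ns)
    t_comm    : cost of exchanging one boundary value between PEs (ns)

    The communication term scales with the surface-to-volume ratio of each
    PE's cubic subdomain, so it shrinks as the local problem size grows.
    """
    cells_per_pe = n_cells / n_pe
    boundary_cells = surface_factor * cells_per_pe ** (2.0 / 3.0)
    return t_compute + t_comm * boundary_cells / cells_per_pe

# Grind time approaches the pure compute cost as the local problem grows
small = grind_time_ns(64**3, 512)
large = grind_time_ns(128**3, 512)
```

A model of this shape explains why fixed processor counts reward larger problems: the communication term is amortized over more interior cells.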
Foster, I.; Tuecke, S.
1993-01-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.
Dharmaraj, Christopher D.; Thadikonda, Kishan; Fletcher, Anthony R.; Doan, Phuc N.; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A.; Cook, John A.; Mitchell, James B.; Subramanian, Sankaran; Krishna, Murali C.
2009-01-01
Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment, involving digital band-pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow on a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speedup factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23×23×23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real time. PMID:19672315
Parallel Isosurface Extraction for 3D Data Analysis Workflows in Distributed Environments
D'Agostino, Daniele; Clematis, Andrea; Gianuzzi, Vittoria
2011-01-01
In this paper we discuss the issues related to the development of efficient parallel implementations of the Marching Cubes algorithm, one of the most widely used methods for isosurface extraction, which is a fundamental operation for 3D data analysis and visualization. We present three possible parallelization strategies and we outline the pros and cons of each of them, considering isosurface extraction as a stand-alone operation or as part of a dynamic workflow. Our analysis shows tha...
Parallel Programming with Intel Parallel Studio XE
Blair-Chappell , Stephen
2012-01-01
Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore processors and leverage their power in your programs. Sharing hands-on case studies and real-world examples, the
Rotation symmetry axes and the quality index in a 3D octahedral parallel robot manipulator system
Tanev, T. K.; Rooney, J.
2002-01-01
The geometry of a 3D octahedral parallel robot manipulator system is specified in terms of two rigid octahedral structures (the fixed and moving platforms) and six actuation legs. The symmetry of the system is exploited to determine the behaviour of (a new version of) the quality index for various motions. The main results are presented graphically.
A burnup corrected 3-D nodal depletion method for vector and parallel computer architectures
The 2- and 3-D nodal depletion code NOMAD-BC was parallelized and vectorized (3-D only). A 3-D, 2-cycle depletion problem was devised and successfully solved with the NOMAD-BC code in less than 35 seconds on two CPUs of a Cray X-MP/48. This shows a combined vectorization and parallelization speedup of 8.6. The same problem was solved on a 2-CPU 16 MHz SGI workstation in less than one hour, exhibiting a 1.78 speedup over the single processor solution on the same machine. It is shown in this work that complex and detailed burnup computations can be successfully optimized. In addition, the performance achieved demonstrates the possibility of obtaining results within very reasonable times, even on inexpensive workstations. Finally, the small CPU time requirements should make possible the routine evaluation of fuel cycles at great savings of the engineer's time. (author)
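Speedup and efficiency figures like those quoted above follow directly from wall-clock timings; a minimal generic helper (not from the NOMAD-BC work) makes the arithmetic explicit:

```python
def speedup(t_serial, t_parallel):
    """Classic speedup: serial wall-clock time over parallel wall-clock time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_processors):
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return speedup(t_serial, t_parallel) / n_processors

# e.g. the 1.78 speedup on 2 CPUs quoted above implies 89% parallel efficiency
eff = efficiency(1.78, 1.0, 2)
```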
RELAP5-3D Developer Guidelines and Programming Practices
Dr. George L Mesina
2014-03-01
Our ultimate goal is to create and maintain RELAP5-3D as the best software tool available to analyze nuclear power plants. This begins with writing excellent code and requires thorough testing. This document covers development of RELAP5-3D software, the behavior of the RELAP5-3D program that must be maintained, and code testing. RELAP5-3D must perform in a manner consistent with previous code versions, with backward compatibility for the sake of the users. Thus file operations, code termination, input and output must remain consistent in form and content while adding appropriate new files, input and output as new features are developed. As computer hardware, operating systems, and other software change, RELAP5-3D must adapt and maintain performance. The code must be thoroughly tested to ensure that it continues to perform robustly on the supported platforms. The code must be written in a consistent manner that makes the program easy to read, to reduce the time and cost of development, maintenance and error resolution. The programming guidelines presented here are intended to institutionalize a consistent way of writing FORTRAN code for the RELAP5-3D computer program that will minimize errors and rework. A common format and organization of program units creates a unifying look and feel to the code. This in turn increases readability and reduces the time required for maintenance, development and debugging. It also aids new programmers in reading and understanding the program. Therefore, when undertaking development of the RELAP5-3D computer program, the programmer must write computer code that follows these guidelines. This set of programming guidelines creates a framework of good programming practices, such as initialization, structured programming, and vector-friendly coding. It sets out formatting rules for lines of code, such as indentation, capitalization, spacing, etc. It creates limits on program units, such as subprograms, functions, and modules. It
Programming standards for effective S-3D game development
Schneider, Neil; Matveev, Alexander
2008-02-01
When a video game is in development, more often than not it is being rendered in three dimensions - complete with volumetric depth. It's the PC monitor that is taking this three-dimensional information, and artificially displaying it in a flat, two-dimensional format. Stereoscopic drivers take the three-dimensional information captured from DirectX and OpenGL calls and properly display it with a unique left and right sided view for each eye so a proper stereoscopic 3D image can be seen by the gamer. The two-dimensional limitation of how information is displayed on screen has encouraged programming short-cuts and work-arounds that stifle this stereoscopic 3D effect, and the purpose of this guide is to outline techniques to get the best of both worlds. While the programming requirements do not significantly add to the game development time, following these guidelines will greatly enhance your customer's stereoscopic 3D experience, increase your likelihood of earning Meant to be Seen certification, and give you instant cost-free access to the industry's most valued consumer base. While this outline is mostly based on NVIDIA's programming guide and iZ3D resources, it is designed to work with all stereoscopic 3D hardware solutions and is not proprietary in any way.
Greenwood, J.; Rucker, D.; Levitt, M.; Yang, X.; Lagmanson, M.
2007-12-01
High Resolution Resistivity data is currently used by hydroGEOPHYSICS, Inc. to detect and characterize the distribution of suspected contaminant plumes beneath leaking tanks and disposal sites within the U.S. Department of Energy Hanford Site, in Eastern Washington State. The success of the characterization effort has led to resistivity data acquisition in extremely large survey areas exceeding 0.6 km² and containing over 6,000 electrodes. Optimal data processing results are achieved by utilizing 10⁵ data points within a single finite difference or finite element model domain. The large number of measurements and electrodes and the high resolution of the modeling domain require a model mesh of over 10⁶ nodes. Existing commercially available resistivity inversion software could not support the domain size due to software and hardware limitations. hydroGEOPHYSICS, Inc. teamed with Advanced Geosciences, Inc. to advance the existing EarthImager3D inversion software to allow for parallel processing and large memory support under a 64-bit operating system. The basis for the selection of EarthImager3D is demonstrated with a series of verification tests and benchmark comparisons using synthetic test models, field scale experiments and 6 months of intensive modeling using an array of multi-processor servers. The results of benchmark testing show equivalence to other industry standard inversion codes that perform the same function on significantly smaller domain models. hydroGEOPHYSICS, Inc. included the use of 214 steel-cased monitoring wells as "long electrodes", 6000 surface electrodes and 8 buried point source electrodes. Advanced Geosciences, Inc. implemented a long electrode modeling function to support the Hanford Site well casing data. This utility is unique to commercial resistivity inversion software, and was evaluated through a series of laboratory and field scale tests using engineered subsurface plumes. The Hanford site is an ideal proving ground for these methods due
Li, Yong Gang; Yang, Yang; Short, Michael P.; Ding, Ze Jun; Zeng, Zhi; Li, Ju
2015-12-01
SRIM-like codes have limitations in describing general 3D geometries, for modeling radiation displacements and damage in nanostructured materials. A universal, computationally efficient and massively parallel 3D Monte Carlo code, IM3D, has been developed with excellent parallel scaling performance. IM3D is based on fast indexing of scattering integrals and the SRIM stopping power database, and allows the user a choice of Constructive Solid Geometry (CSG) or Finite Element Triangle Mesh (FETM) method for constructing 3D shapes and microstructures. For 2D films and multilayers, IM3D perfectly reproduces SRIM results, and can be ∼10² times faster in serial execution and >10⁴ times faster using parallel computation. For 3D problems, it provides a fast approach for analyzing the spatial distributions of primary displacements and defect generation under ion irradiation. Herein we also provide a detailed discussion of our open-source collision cascade physics engine, revealing the true meaning and limitations of the “Quick Kinchin-Pease” and “Full Cascades” options. The issues of femtosecond to picosecond timescales in defining displacement versus damage, and the limitations of the displacements per atom (DPA) unit in quantifying radiation damage (such as inadequacy in quantifying the degree of chemical mixing), are discussed.
MPI is a practical, portable, efficient and flexible standard for message passing, which has been implemented on most MPPs and network of workstations by machine vendors, universities and national laboratories. MPI avoids specifying how operations will take place and superfluous work to achieve efficiency as well as portability, and is also designed to encourage overlapping communication and computation to hide communication latencies. This presentation briefly explains the MPI standard, and comments on efficient parallel programming to improve performance. (author)
Gust Acoustics Computation with a Space-Time CE/SE Parallel 3D Solver
Wang, X. Y.; Himansu, A.; Chang, S. C.; Jorgenson, P. C. E.; Reddy, D. R. (Technical Monitor)
2002-01-01
The benchmark Problem 2 in Category 3 of the Third Computational Aero-Acoustics (CAA) Workshop is solved using the space-time conservation element and solution element (CE/SE) method. This problem concerns the unsteady response of an isolated finite-span swept flat-plate airfoil bounded by two parallel walls to an incident gust. The acoustic field generated by the interaction of the gust with the flat-plate airfoil is computed by solving the 3D (three-dimensional) Euler equations in the time domain using a parallel version of a 3D CE/SE solver. The effect of the gust orientation on the far-field directivity is studied. Numerical solutions are presented and compared with analytical solutions, showing a reasonable agreement.
2D/3D Program work summary report
The 2D/3D Program was carried out by Germany, Japan and the United States to investigate the thermal-hydraulics of a PWR large-break LOCA. A contributory approach was utilized in which each country contributed significant effort to the program and all three countries shared the research results. Germany constructed and operated the Upper Plenum Test Facility (UPTF), and Japan constructed and operated the Cylindrical Core Test Facility (CCTF) and the Slab Core Test Facility (SCTF). The US contribution consisted of the provision of advanced instrumentation to each of the three test facilities, and assessment of the TRAC computer code against the test results. Evaluations of the test results were carried out in all three countries. This report summarizes the 2D/3D Program in terms of the contributing efforts of the participants, and was prepared in coordination among the three countries. The US and Germany have published the report as NUREG/IA-0126 and GRS-100, respectively. (author)
Stiffness Analysis of 3-d.o.f. Overconstrained Translational Parallel Manipulators
Pashkevich, Anatoly; Wenger, Philippe
2008-01-01
The paper presents a new stiffness modelling method for overconstrained parallel manipulators, which is applied to 3-d.o.f. translational mechanisms. It is based on a multidimensional lumped-parameter model that replaces the link flexibility by localized 6-d.o.f. virtual springs. In contrast to other works, the method includes a FEA-based link stiffness evaluation and employs a new solution strategy of the kinetostatic equations, which allows computing the stiffness matrix for the overconstrained architectures and for the singular manipulator postures. The advantages of the developed technique are confirmed by application examples, which deal with comparative stiffness analysis of two translational parallel manipulators.
3D magnetospheric parallel hybrid multi-grid method applied to planet-plasma interactions
Leclercq, Ludivine; Modolo, Ronan; Leblanc, François; Hess, Sebastien; Mancini, Marco
2016-01-01
We present a new method to exploit multiple refinement levels within a 3D parallel hybrid model, developed to study planet-plasma interactions. This model is based on the hybrid formalism: ions are kinetically treated whereas electrons are considered as an inertia-less fluid. Generally, ions are represented by numerical particles whose size equals the volume of the cells. Particles that leave a coarse grid subsequently entering a refined region are split into particles whose volume corresponds...
A Parallel Implementation of the Mortar Element Method in 2D and 3D
Samake A.
2013-12-01
We present here the generic parallel computational framework in C++ called Feel++ for the mortar finite element method with an arbitrary number of subdomain partitions in 2D and 3D. An iterative method with block-diagonal preconditioners is used for solving the algebraic saddle-point problem arising from the finite element discretization. Finally we present a scalability study and the numerical results obtained using the Feel++ library.
Edge-based electric field formulation in 3D CSEM simulations: A parallel approach
Castillo-Reyes, Octavio; de la Puente, Josep; Puzyrev, Vladimir; Cela, José M.
2015-01-01
This paper presents a parallel computing scheme for the data computation that arises when applying one of the most popular electromagnetic methods in exploration geophysics, namely, the controlled-source electromagnetic (CSEM) method. The computational approach is based on the linear edge finite element method in 3D isotropic domains. The total electromagnetic field is decomposed into primary and secondary electromagnetic fields. The primary field is calculated analytically using a horizontal layered-e...
Tolerant (parallel) Programming
DiNucci, David C.; Bailey, David H. (Technical Monitor)
1997-01-01
In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2³ is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
Spatial Parallelism of a 3D Finite Difference, Velocity-Stress Elastic Wave Propagation Code
MINKOFF,SUSAN E.
1999-12-09
Finite difference methods for solving the wave equation more accurately capture the physics of waves propagating through the earth than asymptotic solution methods. Unfortunately, finite difference simulations for 3D elastic wave propagation are expensive. We model waves in a 3D isotropic elastic earth. The wave equation solution consists of three velocity components and six stresses. The partial derivatives are discretized using 2nd-order in time and 4th-order in space staggered finite difference operators. Staggered schemes allow one to obtain additional accuracy (via centered finite differences) without requiring additional storage. The serial code is most unique in its ability to model a number of different types of seismic sources. The parallel implementation uses the MPI library, thus allowing for portability between platforms. Spatial parallelism provides a highly efficient strategy for parallelizing finite difference simulations. In this implementation, one can decompose the global problem domain into one-, two-, and three-dimensional processor decompositions, with 3D decompositions generally producing the best parallel speedup. Because I/O is handled largely outside of the time-step loop (the most expensive part of the simulation), we have opted for straightforward broadcast and reduce operations to handle I/O. The majority of the communication in the code consists of passing subdomain face information to neighboring processors for use as "ghost cells". When this communication is balanced against computation by allocating subdomains of reasonable size, we observe excellent scaled speedup. Allocating subdomains of size 25 × 25 × 25 on each node, we achieve efficiencies of 94% on 128 processors. Numerical examples for both a layered earth model and a homogeneous medium with a high-velocity blocky inclusion illustrate the accuracy of the parallel code.
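The ghost-cell pattern described above can be illustrated with a serial mock-up of a 1D domain decomposition. This is a simplified sketch in plain Python, not the authors' code; the real implementation exchanges subdomain faces between processors via MPI, and the boundary treatment here is an invented zero-gradient choice:

```python
def exchange_ghost_cells(subdomains):
    """Fill one ghost cell on each side of every subdomain from its neighbors.

    Each subdomain is a list [ghost_lo, interior..., ghost_hi]; subdomains at
    the physical boundary copy their own edge value into the ghost slot.
    """
    for i, sub in enumerate(subdomains):
        # left ghost: last interior value of the left neighbor (or own edge)
        sub[0] = subdomains[i - 1][-2] if i > 0 else sub[1]
        # right ghost: first interior value of the right neighbor (or own edge)
        sub[-1] = subdomains[i + 1][1] if i < len(subdomains) - 1 else sub[-2]
    return subdomains

# Global field 0..5 split into two subdomains with empty ghost slots (None)
left = [None, 0, 1, 2, None]
right = [None, 3, 4, 5, None]
exchange_ghost_cells([left, right])
```

After the exchange each subdomain can apply a centered stencil to all of its interior points without further communication, which is why the cost scales with the subdomain's faces rather than its volume.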
New adaptive differencing strategy in the PENTRAN 3-d parallel Sn code
It is known that three-dimensional (3-D) discrete ordinates (Sn) transport problems require an immense amount of storage and computational effort to solve. For this reason, parallel codes that offer a capability to completely decompose the angular, energy, and spatial domains among a distributed network of processors are required. One such code recently developed is PENTRAN, which iteratively solves 3-D multi-group, anisotropic Sn problems on distributed-memory platforms, such as the IBM-SP2. Because large problems typically contain several different material zones with various properties, available differencing schemes should automatically adapt to the transport physics in each material zone. To minimize the memory and message-passing overhead required for massively parallel Sn applications, available differencing schemes in an adaptive strategy should also offer reasonable accuracy and positivity, yet require only the zeroth spatial moment of the transport equation; differencing schemes based on higher spatial moments, in spite of their greater accuracy, require at least twice the amount of storage and communication cost for implementation in a massively parallel transport code. This paper discusses a new adaptive differencing strategy that uses increasingly accurate schemes with low parallel memory and communication overhead. This strategy, implemented in PENTRAN, includes a new scheme, exponential directional averaged (EDA) differencing.
3-D electromagnetic plasma particle simulations on the Intel Delta parallel computer
A three-dimensional electromagnetic PIC code has been developed on the 512-node Intel Touchstone Delta MIMD parallel computer. This code is based on the General Concurrent PIC algorithm, which uses a domain decomposition to divide the computation among the processors. The 3D simulation domain can be partitioned into 1-, 2-, or 3-dimensional sub-domains. Particles must be exchanged between processors as they move among the subdomains. The Intel Delta allows one to use this code for very-large-scale simulations (i.e. over 10{sup 8} particles and 10{sup 6} grid cells). The parallel efficiency of this code is measured, and the overall code performance on the Delta is compared with that on Cray supercomputers. It is shown that the code runs with a high parallel efficiency of ≥ 95% for large problems. The particle push time achieved is 115 ns/particle/time step for 162 million particles on 512 nodes. Compared with the performance on a single-processor Cray C90, this represents a factor of 58 speedup. The code uses a finite-difference leapfrog method for the field solve, which is significantly more efficient than fast Fourier transforms on parallel computers. The performance of this code on the 128-node Cray T3D will also be discussed.
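The particle exchange step described above can be mocked up serially. In this illustrative sketch (plain Python, no MPI; names are not from the actual code), each "processor" owns one slab of a periodic 1D domain; after the position update, any particle that crossed a slab boundary is handed to the list owned by its new slab.

```python
# Serial mock-up of inter-processor particle exchange in a
# domain-decomposed PIC code: particles are (position, velocity)
# pairs, and slab ownership is recomputed after each push.

def owner(x, nproc, length):
    # map a position in [0, length) to the slab ("processor") that owns it
    return min(int(x / (length / nproc)), nproc - 1)

def push_and_exchange(slabs, dt, nproc, length):
    new_slabs = [[] for _ in range(nproc)]
    for plist in slabs:
        for x, v in plist:
            x = (x + v * dt) % length            # periodic global domain
            new_slabs[owner(x, nproc, length)].append((x, v))
    return new_slabs

# 4 slabs on a domain of length 4.0; one particle per slab moving right
slabs = [[(0.5 + i, 1.0)] for i in range(4)]
slabs = push_and_exchange(slabs, dt=1.0, nproc=4, length=4.0)
```

In the real code the `new_slabs` append becomes a message to the neighboring processor; the load stays balanced here because each slab receives exactly the particles it loses.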
Description of a parallel, 3D, finite element, hydrodynamics-diffusion code
We describe a parallel, 3D, unstructured grid finite element, hydrodynamic diffusion code for inertial confinement fusion (ICF) applications and the ancillary software used to run it. The code system is divided into two entities, a controller and a stand-alone physics code. The code system may reside on different computers; the controller on the user's workstation and the physics code on a supercomputer. The physics code is composed of separate hydrodynamic, equation-of-state, laser energy deposition, heat conduction, and radiation transport packages and is parallelized for distributed memory architectures. For parallelization, a SPMD model is adopted; the domain is decomposed into a disjoint collection of sub-domains, one per processing element (PE). The PEs communicate using MPI. The code is used to simulate the hydrodynamic implosion of a spherical bubble.
Parallel computation of 3-D Navier-Stokes flowfields for supersonic vehicles
Ryan, James S.; Weeratunga, Sisira
1993-01-01
Multidisciplinary design optimization of aircraft will require unprecedented capabilities of both analysis software and computer hardware. The speed and accuracy of the analysis will depend heavily on the computational fluid dynamics (CFD) module which is used. A new CFD module has been developed to combine the robust accuracy of conventional codes with the ability to run on parallel architectures. This is achieved by parallelizing the ARC3D algorithm, a central-differenced Navier-Stokes method, on the Intel iPSC/860. The computed solutions are identical to those from conventional machines. Computational speed on 64 processors is comparable to the rate on one Cray Y-MP processor and will increase as new generations of parallel computers become available.
Kolotilina, L.; Nikishin, A.; Yeremin, A. [and others]
1994-12-31
The solution of large systems of linear equations is a crucial bottleneck when performing 3D finite element analysis of structures. Also, in many cases the reliability and robustness of iterative solution strategies, and their efficiency when exploiting hardware resources, fully determine the scope of industrial applications which can be solved on a particular computer platform. This is especially true for modern vector/parallel supercomputers with large vector length and for modern massively parallel supercomputers. Preconditioned iterative methods have been successfully applied to industrial-class finite element analysis of structures. The construction and application of high quality preconditioners constitutes a high percentage of the total solution time. Parallel implementation of high quality preconditioners on such architectures is a formidable challenge. Two common types of existing preconditioners are implicit preconditioners and explicit preconditioners. The implicit preconditioners (e.g. incomplete factorizations of several types) are generally of high quality but require the solution of lower and upper triangular systems of equations per iteration, which are difficult to parallelize without deteriorating the convergence rate. The explicit preconditioners (e.g. polynomial or Jacobi-like preconditioners) require sparse matrix-vector multiplications and can be parallelized, but their preconditioning quality is less than desirable. The authors present results of numerical experiments with Factorized Sparse Approximate Inverses (FSAI) for symmetric positive definite linear systems. These are high quality preconditioners that possess a large resource of parallelism by construction without increasing the serial complexity.
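The contrast drawn above between implicit and explicit preconditioners can be made concrete with a minimal sketch: preconditioned conjugate gradients in which M{sup -1} is applied as a plain matrix-vector-style operation. The Jacobi preconditioner below is the simplest explicit choice; an FSAI preconditioner of the kind the authors study would instead apply x -> G{sup T}(Gx) for a sparse triangular factor G. All names are illustrative; this is not the authors' code.

```python
# Preconditioned conjugate gradients with an explicit (Jacobi)
# preconditioner: applying M^{-1} is a per-entry scaling, i.e. a
# trivially parallel operation, unlike the triangular solves needed
# by implicit (incomplete-factorization) preconditioners.

def matvec(A, x):
    return [sum(a*b for a, b in zip(row, x)) for row in A]

def pcg(A, b, apply_Minv, tol=1e-12, maxit=100):
    n = len(b)
    x = [0.0]*n
    r = b[:]
    z = apply_Minv(r)
    p = z[:]
    rz = sum(ri*zi for ri, zi in zip(r, z))
    for _ in range(maxit):
        Ap = matvec(A, p)
        alpha = rz / sum(pi*api for pi, api in zip(p, Ap))
        x = [xi + alpha*pi for xi, pi in zip(x, p)]
        r = [ri - alpha*api for ri, api in zip(r, Ap)]
        if sum(ri*ri for ri in r) < tol:
            break
        z = apply_Minv(r)
        rz_new = sum(ri*zi for ri, zi in zip(r, z))
        p = [zi + (rz_new/rz)*pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # SPD test matrix
b = [1.0, 2.0, 3.0]
jacobi = lambda r: [ri / A[i][i] for i, ri in enumerate(r)]  # explicit M^{-1}
x = pcg(A, b, jacobi)
```

Swapping `jacobi` for a higher-quality explicit operator (polynomial, or FSAI's sparse factors) changes only `apply_Minv`; the iteration itself stays matvec-dominated and hence parallelizable.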
Object-Oriented Parallel Programming
Givelberg, Edward
2014-01-01
We introduce an object-oriented framework for parallel programming, which is based on the observation that programming objects can be naturally interpreted as processes. A parallel program consists of a collection of persistent processes that communicate by executing remote methods. We discuss code parallelization and process persistence, and explain the main ideas in the context of computations with very large data objects.
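The central observation above, that a programming object can be interpreted as a persistent process whose methods are invoked remotely, can be mocked up serially. In this sketch (illustrative names only, not the authors' framework), a proxy turns a method call into a message placed in a mailbox and then executes it against the persistent object; in a real implementation the mailbox would be an interprocess communication channel.

```python
# Serial mock-up of "objects as processes": method calls become
# messages delivered to a persistent object that retains its state
# across remote invocations.

class Counter:
    # the persistent "process": its state survives between calls
    def __init__(self):
        self.value = 0
    def add(self, n):
        self.value += n
        return self.value

class RemoteProxy:
    def __init__(self, obj):
        self.mailbox = []     # stands in for a communication channel
        self.obj = obj
    def call(self, method, *args):
        self.mailbox.append((method, args))            # "send" the request
        method_name, arguments = self.mailbox.pop(0)   # "receive" it
        return getattr(self.obj, method_name)(*arguments)  # execute remotely

proxy = RemoteProxy(Counter())
proxy.call("add", 5)
result = proxy.call("add", 37)   # state persisted across the two calls
```

A parallel program in this model is then a collection of such persistent objects, each living in its own process, exchanging remote method invocations instead of sharing memory.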
Advanced quadratures and periodic boundary conditions in parallel 3D S{sub n} transport
Manalo, K.; Yi, C.; Huang, M.; Sjoden, G. [Nuclear and Radiological Engineering Program, G.W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, 770 State Street, Atlanta, GA 30332-0745 (United States)
2013-07-01
Significant updates in numerical quadratures have warranted investigation with 3D Sn discrete ordinates transport. We show new applications of quadrature departing from level symmetric (S{sub 2}o), investigating three recently developed quadratures: Even-Odd (EO), Linear-Discontinuous Finite Element - Surface Area (LDFE-SA), and the non-symmetric Icosahedral Quadrature (IC). We discuss implementation changes to 3D Sn codes (applied to the hybrid MOC-Sn TITAN code and 3D parallel PENTRAN) that can be performed to accommodate the Icosahedral Quadrature, as this quadrature is not invariant under 90-degree rotations. In particular, as demonstrated using PENTRAN, the properties of the Icosahedral Quadrature make it suitable for trivial application using periodic BCs versus reflective BCs. In addition to implementing periodic BCs for 3D Sn PENTRAN, we implemented a technique termed 'angular re-sweep' which properly conditions periodic BCs for convergence of the outer eigenvalue iterative loop. As demonstrated by two simple transport problems (3-group fixed source and 3-group reflected/periodic eigenvalue pin cell), we remark that all of the quadratures investigated are generally superior to level symmetric quadrature, with the Icosahedral Quadrature performing most efficiently for the problems tested. (authors)
Schultz, Anthony [Nouvel Hopital Civil, Strasbourg University Hospital, Radiology Department, Strasbourg Cedex (France); Nouvel Hopital Civil, Service de Radiologie, Strasbourg Cedex (France); Caspar, Thibault [Nouvel Hopital Civil, Strasbourg University Hospital, Cardiology Department, Strasbourg Cedex (France); Schaeffer, Mickael [Nouvel Hopital Civil, Strasbourg University Hospital, Public Health and Biostatistics Department, Strasbourg Cedex (France); Labani, Aissam; Jeung, Mi-Young; El Ghannudi, Soraya; Roy, Catherine [Nouvel Hopital Civil, Strasbourg University Hospital, Radiology Department, Strasbourg Cedex (France); Ohana, Mickael [Nouvel Hopital Civil, Strasbourg University Hospital, Radiology Department, Strasbourg Cedex (France); Universite de Strasbourg / CNRS, UMR 7357, iCube Laboratory, Illkirch (France)
2016-06-15
To qualitatively and quantitatively compare different late gadolinium enhancement (LGE) sequences acquired at 3T with a parallel RF transmission technique. One hundred and sixty prospectively enrolled participants underwent a 3T cardiac MRI with 3 different LGE sequences: 3D Phase-Sensitive Inversion-Recovery (3D-PSIR) acquired 5 minutes after injection, 3D Inversion-Recovery (3D-IR) at 9 minutes and 3D-PSIR at 13 minutes. All LGE-positive patients were qualitatively evaluated independently and blindly by two radiologists using a 4-level scale, and quantitatively assessed by measurement of the contrast-to-noise ratio and the maximal LGE surface. Statistical analyses were performed under a Bayesian paradigm using MCMC methods. Fifty patients (70 % men, aged 56 ± 19 years) exhibited LGE (62 % post-ischemic, 30 % related to cardiomyopathy and 8 % post-myocarditis). Early and late 3D-PSIR were superior to 3D-IR sequences (global quality, estimated coefficient IR > early-PSIR: -2.37 CI = [-3.46; -1.38], prob(coef > 0) = 0 % and late-PSIR > IR: 3.12 CI = [0.62; 4.41], prob(coef > 0) = 100 %; LGE surface, estimated coefficient IR > early-PSIR: -0.09 CI = [-1.11; -0.74], prob(coef > 0) = 0 % and late-PSIR > IR: 0.96 CI = [0.77; 1.15], prob(coef > 0) = 100 %). Probabilities of late PSIR being superior to early PSIR for global quality and CNR were over 90 %, regardless of the aetiological subgroup. In 3T cardiac MRI acquired with a parallel RF transmission technique, 3D-PSIR is qualitatively and quantitatively superior to 3D-IR. (orig.)
Comparison of 3-D Synthetic Aperture Phased-Array Ultrasound Imaging and Parallel Beamforming
Rasmussen, Morten Fischer; Jensen, Jørgen Arendt
2014-01-01
This paper demonstrates that synthetic aperture imaging (SAI) can be used to achieve real-time 3-D ultrasound phased-array imaging. It investigates whether SAI increases the image quality compared with the parallel beamforming (PB) technique for real-time 3-D imaging. Data are obtained using both… simulations and measurements with an ultrasound research scanner and a commercially available 3.5-MHz 1024-element 2-D transducer array. To limit the probe cable thickness, 256 active elements are used in transmit and receive for both techniques. The two imaging techniques were designed for cardiac imaging, which… requires sequences designed for imaging down to 15 cm of depth and a frame rate of at least 20 Hz. The imaging quality of the two techniques is investigated through simulations as a function of depth and angle. SAI improved the full-width at half-maximum (FWHM) at low steering angles by 35%, and the 20-d…
Parallel Imaging of 3D Surface Profile with Space-Division Multiplexing
Lee, Hyung Seok; Cho, Soon-Woo; Kim, Gyeong Hun; Jeong, Myung Yung; Won, Young Jae; Kim, Chang-Seok
2016-01-01
We have developed a modified optical frequency domain imaging (OFDI) system that performs parallel imaging of three-dimensional (3D) surface profiles by using the space division multiplexing (SDM) method with dual-area swept sourced beams. We have also demonstrated that 3D surface information for two different areas could be obtained well at the same time with only one camera by our method. In this study, two fields of view (FOVs) of 11.16 mm × 5.92 mm were achieved within 0.5 s. The height range for each FOV was 460 µm, and the axial and transverse resolutions were 3.6 and 5.52 µm, respectively. PMID:26805840
A Parallelized 3D Particle-In-Cell Method With Magnetostatic Field Solver And Its Applications
Hsu, Kuo-Hsien; Chen, Yen-Sen; Wu, Men-Zan Bill; Wu, Jong-Shinn
2008-10-01
A parallelized 3D self-consistent electrostatic particle-in-cell finite element (PIC-FEM) code using an unstructured tetrahedral mesh was developed. For simulating applications with an external permanent magnet set, the distribution of the magnetostatic field usually also needs to be considered and determined accurately. In this paper, we first present the development of a 3D magnetostatic field solver with an unstructured mesh for the flexibility of modeling objects with complex geometry. The vector Poisson equation for the magnetostatic field is formulated using the Galerkin nodal finite element method, and the resulting matrix is solved by a parallel conjugate gradient method. A parallel adaptive mesh refinement module is coupled to this solver for better resolution. The completed solver is then verified by simulating a permanent magnet array, with results comparable to previous experimental observations and simulations. By taking advantage of the same unstructured grid format, the developed PIC-FEM code can directly and easily read the magnetostatic field for particle simulation. In the conference presentation, a magnetron simulation is presented to demonstrate the capability of this code.
Parallel Programming with Declarative Ada
Thornley, John
1993-01-01
Declarative programming languages (e.g., functional and logic programming languages) are semantically elegant and implicitly express parallelism at a high level. We show how a parallel declarative language can be based on a modern structured imperative language with single-assignment variables. Such a language combines the advantages of parallel declarative programming with the strengths and familiarity of the underlying imperative language. We introduce Declarative Ada, a parallel declarativ...
Jung, Jaewoon; Kobayashi, Chigusa; Imamura, Toshiyuki; Sugita, Yuji
2016-03-01
Three-dimensional Fast Fourier Transform (3D FFT) plays an important role in a wide variety of computer simulations and data analyses, including molecular dynamics (MD) simulations. In this study, we develop hybrid (MPI+OpenMP) parallelization schemes of 3D FFT based on two new volumetric decompositions, mainly for the particle mesh Ewald (PME) calculation in MD simulations. In one scheme (1d_Alltoall), five all-to-all communications in one dimension are carried out; in the other (2d_Alltoall), one two-dimensional all-to-all communication is combined with two all-to-all communications in one dimension. 2d_Alltoall is similar to the conventional volumetric decomposition scheme. We performed benchmark tests of 3D FFT for systems with different grid sizes using a large number of processors on the K computer at RIKEN AICS. The two schemes show comparable performance and are better than existing 3D FFTs. The performance of 1d_Alltoall and 2d_Alltoall depends on the supercomputer network system and the number of processors in each dimension. There is enough leeway for users to optimize performance for their conditions. In the PME method, short-range real-space interactions as well as long-range reciprocal-space interactions are calculated. Our volumetric decomposition schemes are particularly useful when used in conjunction with the recently developed midpoint cell method for short-range interactions, due to the same decompositions of real and reciprocal spaces. The 1d_Alltoall scheme of 3D FFT takes 4.7 ms to simulate one MD cycle for a virus system containing more than 1 million atoms using 32,768 cores on the K computer.
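The volumetric-decomposition 3D FFT above interleaves local 1D transforms with all-to-all exchanges. A serial 2D analogue (plain Python with a naive DFT; illustrative, not the authors' scheme) shows the structure: transform along the locally held rows, transpose (the step that becomes an all-to-all communication when rows are distributed across processors), then transform again.

```python
import cmath

# Serial sketch of the transpose/all-to-all structure of a
# distributed multi-dimensional FFT, reduced to 2D with a naive DFT.

def dft(v):
    n = len(v)
    return [sum(v[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n))
            for j in range(n)]

def transpose(a):
    # on a distributed machine this data reshuffle is the all-to-all step
    return [list(row) for row in zip(*a)]

def dft2(a):
    rows = [dft(r) for r in a]       # local 1D transforms along rows
    cols = transpose(rows)           # "all-to-all" communication
    return transpose([dft(c) for c in cols])  # transforms along new rows

a = [[1.0, 2.0], [3.0, 4.0]]
A = dft2(a)                          # DC component A[0][0] is the sum, 10
```

In 3D the same pattern needs two reshuffles instead of one, which is where the choice between five 1D all-to-alls (1d_Alltoall) and a 2D-plus-1D combination (2d_Alltoall) arises.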
A parallel sweeping preconditioner for high frequency heterogeneous 3D Helmholtz equations
Poulson, Jack; Fomel, Sergey; Li, Siwei; Ying, Lexing
2012-01-01
A parallelization of a recently introduced sweeping preconditioner for high frequency heterogeneous Helmholtz equations is presented along with experimental results for the full SEG/EAGE Overthrust seismic model at 30 Hz, using eight grid points per characteristic wavelength; to the best of our knowledge, this is the largest 3D Helmholtz calculation to date, and our algorithm required only fifteen minutes to complete on 8192 cores. While the setup and application costs of the sweeping preconditioner are trivially $\Theta(N^{4/3})$ and $\Theta(N \log N)$, this paper provides strong empirical evidence that the number of iterations required for the convergence of GMRES equipped with the sweeping preconditioner is essentially independent of the frequency of the problem. Generalizations to time-harmonic Maxwell and linear-elastic wave equations are also briefly discussed, since the techniques behind our parallelization are not specific to the Helmholtz equation.
Parallel 3D Finite Element Particle-in-Cell Simulations with Pic3P
Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; /SLAC; Ben-Zvi, I.; Kewisch, J.; /Brookhaven
2009-06-19
SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic Particle-In-Cell code Pic3P. Designed for simulations of beam-cavity interactions dominated by space charge effects, Pic3P solves the complete set of Maxwell-Lorentz equations self-consistently and includes space-charge, retardation and boundary effects from first principles. Higher-order Finite Element methods with adaptive refinement on conformal unstructured meshes lead to highly efficient use of computational resources. Massively parallel processing with dynamic load balancing enables large-scale modeling of photoinjectors with unprecedented accuracy, aiding the design and operation of next-generation accelerator facilities. Applications include the LCLS RF gun and the BNL polarized SRF gun.
The 3D Elevation Program initiative: a call for action
Sugarbaker, Larry J.; Constance, Eric W.; Heidemann, Hans Karl; Jason, Allyson L.; Lukas, Vicki; Saghy, David L.; Stoker, Jason M.
2014-01-01
The 3D Elevation Program (3DEP) initiative is accelerating the rate of three-dimensional (3D) elevation data collection in response to a call for action to address a wide range of urgent needs nationwide. It began in 2012 with the recommendation to collect (1) high-quality light detection and ranging (lidar) data for the conterminous United States (CONUS), Hawaii, and the U.S. territories and (2) interferometric synthetic aperture radar (ifsar) data for Alaska. Specifications were created for collecting 3D elevation data, and the data management and delivery systems are being modernized. The National Elevation Dataset (NED) will be completely refreshed with new elevation data products and services. The call for action requires broad support from a large partnership community committed to the achievement of national 3D elevation data coverage. The initiative is being led by the U.S. Geological Survey (USGS) and includes many partners—Federal agencies and State, Tribal, and local governments—who will work together to build on existing programs to complete the national collection of 3D elevation data in 8 years. Private sector firms, under contract to the Government, will continue to collect the data and provide essential technology solutions for the Government to manage and deliver these data and services. The 3DEP governance structure includes (1) an executive forum established in May 2013 to have oversight functions and (2) a multiagency coordinating committee based upon the committee structure already in place under the National Digital Elevation Program (NDEP). The 3DEP initiative is based on the results of the National Enhanced Elevation Assessment (NEEA) that was funded by NDEP agencies and completed in 2011. The study, led by the USGS, identified more than 600 requirements for enhanced (3D) elevation data to address mission-critical information requirements of 34 Federal agencies, all 50 States, and a sample of private sector companies and Tribal and local
Three-dimensional parallel UNIPIC-3D code for simulations of high-power microwave devices
Wang, Jianguo; Chen, Zaigao; Wang, Yue; Zhang, Dianhui; Liu, Chunliang; Li, Yongdong; Wang, Hongguang; Qiao, Hailiang; Fu, Meiyan; Yuan, Yuan
2010-07-01
This paper introduces a self-developed, three-dimensional, parallel, fully electromagnetic particle simulation code, UNIPIC-3D. In this code, the electromagnetic fields are updated using the second-order finite-difference time-domain method, and the particles are moved using the relativistic Newton-Lorentz force equation. The electromagnetic field and particles are coupled through the current term in Maxwell's equations. Two numerical examples are used to verify the algorithms adopted in this code; the numerical results agree well with theoretical ones. This code can be used to simulate high-power microwave (HPM) devices, such as the relativistic backward wave oscillator, coaxial vircator, and magnetically insulated line oscillator. UNIPIC-3D is written in the object-oriented C++ language and can be run on a variety of platforms including WINDOWS, LINUX, and UNIX. Users can use the graphical user interface to create the complex geometric structures of the simulated HPM devices, which can be automatically meshed by the UNIPIC-3D code. This code has a powerful postprocessor which can display the electric field, magnetic field, current, voltage, power, spectrum, momentum of particles, etc. For the sake of comparison, results computed using the two-and-a-half-dimensional UNIPIC code are also provided for the same HPM device parameters; the numerical results from the two codes agree well with each other.
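The field update described above, second-order finite-difference time-domain with leapfrogged fields, can be illustrated in 1D. This is a generic FDTD leapfrog sketch in normalized units with illustrative parameters, not code from UNIPIC-3D:

```python
# 1D FDTD leapfrog sketch: B is advanced from the spatial difference
# of E, then E from the updated B, with the two fields staggered in
# space and time (normalized units, simple fixed boundaries).

def fdtd_step(E, B, c=1.0, dt=0.5, dx=1.0):
    n = len(E)
    for i in range(n - 1):
        B[i] -= c * dt / dx * (E[i+1] - E[i])   # curl-E update
    for i in range(1, n):
        E[i] -= c * dt / dx * (B[i] - B[i-1])   # curl-B update
    return E, B

E = [0.0] * 8
E[4] = 1.0          # initial field pulse at one grid point
B = [0.0] * 8
E, B = fdtd_step(E, B)   # one leapfrog step spreads the pulse
```

In the full code the same staggered update runs over three field components on a 3D grid, with the particle current accumulated into the curl-B step to close the Maxwell-Lorentz coupling.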
3D Body Scanning Measurement System Associated with RF Imaging, Zero-padding and Parallel Processing
Kim Hyung Tae
2016-04-01
This work presents a novel signal processing method for high-speed 3D body measurements using millimeter waves with a general processing unit (GPU) and zero-padding fast Fourier transform (ZPFFT). The proposed measurement system consists of a radio-frequency (RF) antenna array for penetrable measurement, a high-speed analog-to-digital converter (ADC) for significant data acquisition, and a general processing unit for fast signal processing. The RF waves of the transmitter and the receiver are converted to real and imaginary signals that are sampled by the high-speed ADC and synchronized with the kinematic positions of the scanner. Because the distance between the surface and the antenna is related to the peak frequency of the conjugate signals, a fast Fourier transform (FFT) is applied in the signal processing after the sampling. Since the sampling time is finite owing to the short scanning time, while the physical resolution needs to be increased, zero-padding is applied to interpolate the spectra of the sampled signals, allowing a 1/m floating point frequency to be considered. The GPU and a parallel algorithm are applied to accelerate the ZPFFT because of the large number of additional mathematical operations it requires. 3D body images are finally obtained as spectrograms, i.e. the arrangement of the ZPFFT results in 3D space.
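The zero-padding interpolation described above can be demonstrated with a toy spectrum (plain Python with a naive DFT; all parameters are illustrative): a complex tone at 1.25 cycles per record falls between the bins of the plain transform, while padding the record to four times its length refines the frequency grid so the spectral peak lands on the true frequency.

```python
import cmath

# Zero-padding sketch: padding the time record interpolates the
# spectrum onto a finer frequency grid, sharpening the peak estimate
# (it adds no new information, only denser spectral samples).

def dft_mag(v):
    n = len(v)
    return [abs(sum(v[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                    for k in range(n)))
            for j in range(n)]

N, pad = 8, 4
tone = [cmath.exp(2j * cmath.pi * 1.25 * n / N) for n in range(N)]  # 1.25 cyc
plain = dft_mag(tone)                            # bins at integer cycles
padded = dft_mag(tone + [0.0] * (N * (pad - 1)))  # zero-padded to 4N
f_plain = plain.index(max(plain))                 # coarse estimate: 1 cycle
f_padded = padded.index(max(padded)) / pad        # refined estimate
```

This is the same reason the authors pad the sampled RF signals: the peak frequency encodes distance, so a finer spectral grid directly improves the height resolution of the surface profile.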
Introducing ZEUS-MP A 3D, Parallel, Multiphysics Code for Astrophysical Fluid Dynamics
Norman, M L
2000-01-01
We describe ZEUS-MP: a Multi-Physics, Massively-Parallel, Message-Passing code for astrophysical fluid dynamics simulations in three dimensions. ZEUS-MP is a follow-on to the sequential ZEUS-2D and ZEUS-3D codes developed and disseminated by the Laboratory for Computational Astrophysics (lca.ncsa.uiuc.edu) at NCSA. V1.0, released 1/1/2000, includes the following physics modules: ideal hydrodynamics, ideal MHD, and self-gravity. Future releases will include flux-limited radiation diffusion, thermal heat conduction, two-temperature plasma, and heating and cooling functions. The covariant equations are cast on a moving Eulerian grid, with Cartesian, cylindrical, and spherical polar coordinates currently supported. Parallelization is done by domain decomposition and implemented in F77 and MPI. The code is portable across a wide range of platforms, from networks of workstations to massively parallel processors. Some parallel performance results are presented, as well as an application to turbulent star formation.
3D Navier-Stokes Time Accurate Solutions Using Multipartitioning Parallel Computation Methodology
Zha, Ge-Cheng
1998-01-01
A parallel CFD code solving the 3D time-accurate Navier-Stokes equations with a multipartitioning parallel methodology is being developed in collaboration with Ohio State University within the Air Vehicle Directorate at Wright Patterson Air Force Base. The advantage of the multipartitioning parallel method is that the domain decomposition does not introduce domain boundaries for the implicit operators. A ring-structured data communication is employed so that the implicit time-accurate method can be implemented for multiple processors with the same accuracy as for a single processor. No sub-iteration is needed at the domain boundaries. The code has been validated for some typical unsteady flows, including Couette flow and flow past a cylinder. The code is now being employed for a large-scale time-accurate wall jet transient flow computation. The preliminary results are promising. The mesh has been refined to capture more details of the flow field. The mesh refinement computation is in progress and would be difficult to implement successfully without the parallel computation techniques used. A modified version of the code with more efficient inversion of the diagonalized block matrix is currently being tested.
Parallel CAE system for large-scale 3-D finite element analyses
This paper describes a new pre- and post-processing system for the automation of large-scale 3D finite element analyses. In the pre-processing stage, a geometry model to be analyzed is defined by the user through interactive operation with a 3D graphics editor. The analysis model is constructed by adding analysis conditions and mesh refinement information to the geometry model. The mesh refinement information, i.e. a nodal density distribution over the whole analysis domain, is initially defined by superposing several locally optimum nodal patterns stored in the nodal pattern database of the system. Nodes and tetrahedral elements are generated using computational geometry techniques whose processing speed is almost proportional to the total number of nodes. In the post-processing stage, scalar and vector values are evaluated at arbitrary points in the analysis domain and displayed as equi-contours, vector lines, iso-surfaces, particle plots and real-time animation by means of scientific visualization techniques. The present system is also capable of mesh optimization. An a posteriori error distribution over the whole analysis domain is obtained based on the simple error estimator proposed by Zienkiewicz and Zhu. The nodal density distribution used for mesh generation is optimized with reference to the obtained error distribution. Finally, nodes and tetrahedral elements are re-generated. The present remeshing method is one of the global hr-version mesh adaptation methods. To handle large-scale 3D finite element analyses within reasonable computational time and memory requirements, a distributed/parallel processing technique is applied to parts of the present system. The fundamental performance of the present system is clearly demonstrated through 3D thermal conduction analyses. (author)
Design and verification of an ultra-precision 3D-coordinate measuring machine with parallel drives
Bos, Edwin; Moers, Ton; van Riel, Martijn
2015-08-01
An ultra-precision 3D coordinate measuring machine (CMM), the TriNano N100, has been developed. In our design, the workpiece is mounted on a 3D stage, which is driven by three parallel drives that are mutually orthogonal. The linear drives support the 3D stage using vacuum preloaded (VPL) air bearings, whereby each drive determines the position of the 3D stage along one translation direction only. An exactly constrained design results in highly repeatable machine behavior. Furthermore, the machine complies with the Abbé principle over its full measurement range and the application of parallel drives allows for excellent dynamic behavior. The design allows a 3D measurement uncertainty of 100 nanometers in a measurement range of 200 cubic centimeters. Verification measurements using a Gannen XP 3D tactile probing system on a spherical artifact show a standard deviation in single point repeatability of around 2 nm in each direction.
Approach of generating parallel programs from parallelized algorithm design strategies
WAN Jian-yi; LI Xiao-ying
2008-01-01
Today, parallel programming is dominated by message passing libraries such as the message passing interface (MPI). This article intends to simplify parallel programming by generating parallel programs from parallelized algorithm design strategies. It uses skeletons to abstract parallelized algorithm design strategies as well as parallel architectures. Starting from a problem specification, an abstract parallel programming language+ (Apla+) program is generated from parallelized algorithm design strategies and problem-specific function definitions. By combining with parallel architectures, the implicit parallelism inside the parallelized algorithm design strategies is exploited. With implementation and transformation, a C++ with parallel virtual machine (CPPVM) parallel program is finally generated. The parallelized branch and bound (B&B) algorithm design strategy and the parallelized divide and conquer (D&C) algorithm design strategy are studied in this article as examples. The approach is also illustrated with a case study.
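The skeleton idea above can be sketched as follows: the D&C strategy is written once as a higher-order function, and a concrete algorithm (merge sort here) is obtained purely by supplying problem-specific functions, mirroring how an Apla+ program combines a strategy with function definitions. This is a serial Python stand-in, not the article's generator:

```python
# Divide-and-conquer skeleton: the strategy is fixed, the algorithm
# is determined entirely by the four problem-specific functions.

def divide_and_conquer(problem, is_trivial, solve, divide, combine):
    if is_trivial(problem):
        return solve(problem)
    subproblems = divide(problem)     # in the generated parallel program
    subsolutions = [divide_and_conquer(p, is_trivial, solve,
                                       divide, combine)
                    for p in subproblems]   # ...these run on different PEs
    return combine(subsolutions)

def merge(parts):
    # problem-specific combine step for merge sort
    left, right = parts
    out = []
    while left and right:
        out.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return out + left + right

sorted_list = divide_and_conquer(
    [5, 2, 9, 1, 7, 3],
    is_trivial=lambda p: len(p) <= 1,
    solve=lambda p: p,
    divide=lambda p: [p[:len(p)//2], p[len(p)//2:]],
    combine=merge,
)
```

The same skeleton instantiated with different `divide`/`combine` functions yields other D&C algorithms; in the article's approach the recursive calls are what get mapped onto processors of the target architecture.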
Recent progress in 3D EM/EM-PIC simulation with ARGUS and parallel ARGUS
ARGUS is an integrated, 3-D, volumetric simulation model for systems involving electric and magnetic fields and charged particles, including materials embedded in the simulation region. The code offers the capability to carry out time-domain and frequency-domain electromagnetic simulations of complex physical systems. ARGUS offers a boolean solid model structure input capability that can include essentially arbitrary structures in the computational domain, and a modular architecture that allows multiple physics packages to access the same data structure and to share common code utilities. Physics modules are in place to compute electrostatic and electromagnetic fields, the normal modes of RF structures, and self-consistent particle-in-cell (PIC) simulation in either a time-dependent mode or a steady-state mode. The PIC modules include multiple particle species, the Lorentz equations of motion, and algorithms for the creation of particles by emission from material surfaces, injection onto the grid, and ionization. In this paper, we present an updated overview of ARGUS, with particular emphasis given to recent algorithmic and computational advances. These include a completely rewritten frequency-domain solver which efficiently treats lossy materials and periodic structures, a parallel version of ARGUS with support for both shared memory parallel vector (i.e. CRAY) machines and distributed memory massively parallel MIMD systems, and numerous new applications of the code.
The development of laser-plasma interaction program LAP3D on thousands of processors
Xiaoyan Hu
2015-08-01
Full Text Available Modeling laser-plasma interaction (LPI processes in real-size experiments scale is recognized as a challenging task. For explorering the influence of various instabilities in LPI processes, a three-dimensional laser and plasma code (LAP3D has been developed, which includes filamentation, stimulated Brillouin backscattering (SBS, stimulated Raman backscattering (SRS, non-local heat transport and plasmas flow computation modules. In this program, a second-order upwind scheme is applied to solve the plasma equations which are represented by an Euler fluid model. Operator splitting method is used for solving the equations of the light wave propagation, where the Fast Fourier translation (FFT is applied to compute the diffraction operator and the coordinate translations is used to solve the acoustic wave equation. The coupled terms of the different physics processes are computed by the second-order interpolations algorithm. In order to simulate the LPI processes in massively parallel computers well, several parallel techniques are used, such as the coupled parallel algorithm of FFT and fluid numerical computation, the load balance algorithm, and the data transfer algorithm. Now the phenomena of filamentation, SBS and SRS have been studied in low-density plasma successfully with LAP3D. Scalability of the program is demonstrated with a parallel efficiency above 50% on about ten thousand of processors.
The development of laser-plasma interaction program LAP3D on thousands of processors
Hu, Xiaoyan, E-mail: hu-xiaoyan@iapcm.ac.cn; Hao, Liang; Liu, Zhanjun; Zheng, Chunyang; Li, Bin, E-mail: li.bin@iapcm.ac.cn; Guo, Hong [Institute of Applied Physics and Computational Mathematics, Beijing 100088 (China)
2015-08-15
Modeling laser-plasma interaction (LPI) processes in real-size experiments scale is recognized as a challenging task. For explorering the influence of various instabilities in LPI processes, a three-dimensional laser and plasma code (LAP3D) has been developed, which includes filamentation, stimulated Brillouin backscattering (SBS), stimulated Raman backscattering (SRS), non-local heat transport and plasmas flow computation modules. In this program, a second-order upwind scheme is applied to solve the plasma equations which are represented by an Euler fluid model. Operator splitting method is used for solving the equations of the light wave propagation, where the Fast Fourier translation (FFT) is applied to compute the diffraction operator and the coordinate translations is used to solve the acoustic wave equation. The coupled terms of the different physics processes are computed by the second-order interpolations algorithm. In order to simulate the LPI processes in massively parallel computers well, several parallel techniques are used, such as the coupled parallel algorithm of FFT and fluid numerical computation, the load balance algorithm, and the data transfer algorithm. Now the phenomena of filamentation, SBS and SRS have been studied in low-density plasma successfully with LAP3D. Scalability of the program is demonstrated with a parallel efficiency above 50% on about ten thousand of processors.
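The operator-splitting treatment of light-wave propagation described above, with the diffraction operator handled in Fourier space, can be illustrated by a generic paraxial split-step sketch. This is not LAP3D's actual code; the symbol `k0` for the laser wavenumber and the grid parameters are assumptions:

```python
import numpy as np

def diffraction_step(E, dz, dx, k0):
    """Advance a transverse field E by one step dz, applying the paraxial
    diffraction operator exactly in Fourier space: each transverse mode
    (kx, ky) picks up the phase exp(-i (kx^2 + ky^2) dz / (2 k0))."""
    n = E.shape[0]
    kx = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)
    KX, KY = np.meshgrid(kx, kx, indexing="ij")
    phase = np.exp(-1j * (KX**2 + KY**2) * dz / (2.0 * k0))
    return np.fft.ifft2(np.fft.fft2(E) * phase)
```

Because the step is a pure phase multiplication in k-space, it conserves the beam energy exactly, which makes a convenient sanity check on any implementation.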
Patterns For Parallel Programming
Mattson, Timothy G; Massingill, Berna L
2005-01-01
From grids and clusters to next-generation game consoles, parallel computing is going mainstream. Innovations such as Hyper-Threading Technology, HyperTransport Technology, and multicore microprocessors from IBM, Intel, and Sun are accelerating the movement's growth. Only one thing is missing: programmers with the skills to meet the soaring demand for parallel software.
3-D readout-electronics packaging for high-bandwidth massively paralleled imager
Kwiatkowski, Kris; Lyke, James
2007-12-18
Dense, massively parallel signal processing electronics are co-packaged behind associated sensor pixels. Microchips containing a linear or bilinear arrangement of photo-sensors, together with associated complex electronics, are integrated into a simple 3-D structure (a "mirror cube"). An array of photo-sensitive cells is disposed on a stacked CMOS chip's surface at a 45° angle from light-reflecting mirror surfaces formed on a neighboring CMOS chip surface. Image processing electronics are held within the stacked CMOS chip layers. Electrical connections couple each of said stacked CMOS chip layers and a distribution grid, the connections distributing power and signals to components associated with each stacked CMOS chip layer.
The new Exponential Directional Iterative (EDI) 3-D Sn scheme for parallel adaptive differencing
The new Exponential Directional Iterative (EDI) discrete ordinates (Sn) scheme for 3-D Cartesian Coordinates is presented. The EDI scheme is a logical extension of the positive, efficient Exponential Directional Weighted (EDW) Sn scheme currently used as the third level of the adaptive spatial differencing algorithm in the PENTRAN parallel discrete ordinates solver. Here, the derivation and advantages of the EDI scheme are presented; EDI uses EDW-rendered exponential coefficients as initial starting values to begin a fixed point iteration of the exponential coefficients. One issue that required evaluation was an iterative cutoff criterion to prevent the application of an unstable fixed point iteration; although this was needed in some cases, it was readily treated with a default to EDW. Iterative refinement of the exponential coefficients in EDI typically converged in fewer than four fixed point iterations. Moreover, EDI yielded more accurate angular fluxes compared to the other schemes tested, particularly in streaming conditions. Overall, it was found that the EDI scheme was up to an order of magnitude more accurate than the EDW scheme on a given mesh interval in streaming cases, and is potentially a good candidate as a fourth-level differencing scheme in the PENTRAN adaptive differencing sequence. The 3-D Cartesian computational cost of EDI was only about 20% more than the EDW scheme, and about 40% more than Diamond Zero (DZ). More evaluation and testing are required to determine suitable upgrade metrics for EDI to be fully integrated into the current adaptive spatial differencing sequence in PENTRAN. (author)
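The refinement idea described above, starting from an EDW-style initial estimate, iterating a fixed-point map, and defaulting back to the starting value when the iteration shows signs of instability, can be sketched generically. This is a scalar toy version under assumed names; the actual EDI update for the exponential coefficients is not reproduced here:

```python
def refine_fixed_point(f, x0, tol=1e-8, max_iter=4, growth_cutoff=10.0):
    """Fixed-point refinement x <- f(x) from an initial estimate x0
    (playing the role of EDW-rendered coefficients). If an iterate grows
    past a cutoff -- a sign of an unstable fixed-point iteration -- fall
    back to x0, mirroring the default-to-EDW safeguard in the abstract."""
    x = x0
    for _ in range(max_iter):
        x_new = f(x)
        if abs(x_new) > growth_cutoff * max(abs(x0), 1.0):
            return x0          # unstable: keep the starting estimate
        if abs(x_new - x) < tol:
            return x_new       # converged
        x = x_new
    return x
```

The default of four iterations reflects the abstract's observation that refinement typically converged in fewer than four fixed-point iterations.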
Combining parallel search and parallel consistency in constraint programming
Rolf, Carl Christian; Kuchcinski, Krzysztof
2010-01-01
Program parallelization becomes increasingly important when new multi-core architectures provide ways to improve performance. One of the greatest challenges of this development lies in programming parallel applications. Declarative languages, such as constraint programming, can make the transition to parallelism easier by hiding the parallelization details in a framework. Automatic parallelization in constraint programming has mostly focused on parallel search. While search and consist...
In situ patterned micro 3D liver constructs for parallel toxicology testing in a fluidic device.
Skardal, Aleksander; Devarasetty, Mahesh; Soker, Shay; Hall, Adam R
2015-09-01
3D tissue models are increasingly being implemented for drug and toxicology testing. However, the creation of tissue-engineered constructs for this purpose often relies on complex biofabrication techniques that are time consuming, expensive, and difficult to scale up. Here, we describe a strategy for realizing multiple tissue constructs in a parallel microfluidic platform using an approach that is simple and can be easily scaled for high-throughput formats. Liver cells mixed with a UV-crosslinkable hydrogel solution are introduced into parallel channels of a sealed microfluidic device and photopatterned to produce stable tissue constructs in situ. The remaining uncrosslinked material is washed away, leaving the structures in place. By using a hydrogel that specifically mimics the properties of the natural extracellular matrix, we closely emulate native tissue, resulting in constructs that remain stable and functional in the device during a 7-day culture time course under recirculating media flow. As proof of principle for toxicology analysis, we expose the constructs to ethyl alcohol (0-500 mM) and show that the cell viability and the secretion of urea and albumin decrease with increasing alcohol exposure, while markers for cell damage increase. PMID:26355538
Parallel programming with Python
Palach, Jan
2014-01-01
A fast, easy-to-follow and clear tutorial to help you develop parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you implement these techniques in the real world. If you are an experienced Python programmer willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. A basic knowledge of Python development is required to get the most out of this book.
Practical parallel programming
Bauer, Barr E
2014-01-01
This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.
Application of the SDD-CMFD acceleration method to parallel 3-D MOC transport
In this paper the spatial domain decomposed coarse mesh finite difference (SDD-CMFD) method is applied as an acceleration technique to a parallel implementation of the 3-D method of characteristics (MOC) for a series of problems to assess the effectiveness of the method for practical applications. The SDD-CMFD method assumes the problem domain is divided into independent parallelizable sweep regions globally linked within the framework of a CMFD-like system. Results obtained with the MPACT code are examined for three problems. The first analysis is of multi-dimensional, 1-group, infinite homogeneous media problems that compare the numerically-measured rate of convergence to that predicted by the 1-D Fourier analysis performed in previous work. It is observed that the rate of convergence of the numerical experiments has similar behavior to that predicted by the Fourier analysis for variations of optical thickness in the coarse cell and spatial subdomain. However, the rate of convergence is measured to be slightly less than that predicted by Fourier analysis. The algorithm is applied to the Takeda 3-D neutron transport benchmark, and compared to a standard source iteration. In the analysis of this problem, the method is observed to speed up convergence, significantly reducing the number of outer iterations by a factor of nearly 20x and reducing the overall run time by a factor of about 10x. Finally, the method is applied to a realistic PWR assembly, which is observed to converge in 7 outer iterations, a factor of 150x less than source iteration, using the SDD-CMFD acceleration method, and have an estimated speedup of ∼34x over conventional source iteration. (author)
The 3D Elevation Program: summary for Minnesota
Carswell, William J., Jr.
2013-01-01
Elevation data are essential to a broad range of applications, including forest resources management, wildlife and habitat management, national security, recreation, and many others. For the State of Minnesota, elevation data are critical for agriculture and precision farming, natural resources conservation, flood risk management, infrastructure and construction management, water supply and quality, coastal zone management, and other business uses. Today, high-quality light detection and ranging (lidar) data are the sources for creating elevation models and other elevation datasets. Federal, State, and local agencies work in partnership to (1) replace data, on a national basis, that are (on average) 30 years old and of lower quality and (2) provide coverage where publicly accessible data do not exist. A joint goal of State and Federal partners is to acquire consistent, statewide coverage to support existing and emerging applications enabled by lidar data. The new 3D Elevation Program (3DEP) initiative, managed by the U.S. Geological Survey (USGS), responds to the growing need for high-quality topographic data and a wide range of other three-dimensional representations of the Nation’s natural and constructed features.
The 3D Elevation Program: summary for Rhode Island
Carswell, William J., Jr.
2013-01-01
Elevation data are essential to a broad range of applications, including forest resources management, wildlife and habitat management, national security, recreation, and many others. For the State of Rhode Island, elevation data are critical for flood risk management, natural resources conservation, coastal zone management, sea level rise and subsidence, agriculture and precision farming, and other business uses. Today, high-quality light detection and ranging (lidar) data are the sources for creating elevation models and other elevation datasets. Federal, State, and local agencies work in partnership to (1) replace data, on a national basis, that are (on average) 30 years old and of lower quality and (2) provide coverage where publicly accessible data do not exist. A joint goal of State and Federal partners is to acquire consistent, statewide coverage to support existing and emerging applications enabled by lidar data. The new 3D Elevation Program (3DEP) initiative (Snyder, 2012a,b), managed by the U.S. Geological Survey (USGS), responds to the growing need for high-quality topographic data and a wide range of other three-dimensional representations of the Nation’s natural and constructed features.
The 3D Elevation Program: summary for Wisconsin
Carswell, William J., Jr.
2013-01-01
Elevation data are essential to a broad range of applications, including forest resources management, wildlife and habitat management, national security, recreation, and many others. For the State of Wisconsin, elevation data are critical for agriculture and precision farming, natural resources conservation, flood risk management, infrastructure and construction management, water supply and quality, and other business uses. Today, high-quality light detection and ranging (lidar) data are the sources for creating elevation models and other elevation datasets. Federal, State, and local agencies work in partnership to (1) replace data, on a national basis, that are (on average) 30 years old and of lower quality and (2) provide coverage where publicly accessible data do not exist. A joint goal of State and Federal partners is to acquire consistent, statewide coverage to support existing and emerging applications enabled by lidar data. The new 3D Elevation Program (3DEP) initiative, managed by the U.S. Geological Survey (USGS), responds to the growing need for high-quality topographic data and a wide range of other three-dimensional representations of the Nation’s natural and constructed features.
The 3D Elevation Program: summary for California
Carswell, William J., Jr.
2013-01-01
Elevation data are essential to a broad range of applications, including forest resources management, wildlife and habitat management, national security, recreation, and many others. For the State of California, elevation data are critical for infrastructure and construction management; natural resources conservation; flood risk management; wildfire management, planning, and response; agriculture and precision farming; geologic resource assessment and hazard mitigation; and other business uses. Today, high-quality light detection and ranging (lidar) data are the sources for creating elevation models and other elevation datasets. Federal, State, and local agencies work in partnership to (1) replace data, on a national basis, that are (on average) 30 years old and of lower quality and (2) provide coverage where publicly accessible data do not exist. A joint goal of State and Federal partners is to acquire consistent, statewide coverage to support existing and emerging applications enabled by lidar data. The new 3D Elevation Program (3DEP) initiative, managed by the U.S. Geological Survey (USGS), responds to the growing need for high-quality topographic data and a wide range of other three-dimensional representations of the Nation’s natural and constructed features.
CH5M3D: an HTML5 program for creating 3D molecular structures
Earley, Clarke W
2013-01-01
Background While a number of programs and web-based applications are available for the interactive display of 3-dimensional molecular structures, few of these provide the ability to edit these structures. For this reason, we have developed a library written in JavaScript to allow for the simple creation of web-based applications that should run on any browser capable of rendering HTML5 web pages. While our primary interest in developing this application was for educational use, it may also pr...
3D magnetospheric parallel hybrid multi-grid method applied to planet-plasma interactions
Leclercq, L.; Modolo, R.; Leblanc, F.; Hess, S.; Mancini, M.
2016-03-01
We present a new method to exploit multiple refinement levels within a 3D parallel hybrid model, developed to study planet-plasma interactions. This model is based on the hybrid formalism: ions are kinetically treated whereas electrons are considered as an inertia-less fluid. Generally, ions are represented by numerical particles whose size equals the volume of the cells. Particles that leave a coarse grid subsequently entering a refined region are split into particles whose volume corresponds to the volume of the refined cells. The number of refined particles created from a coarse particle depends on the grid refinement rate. In order to conserve velocity distribution functions and to avoid calculations of average velocities, particles are not coalesced. Moreover, to ensure the constancy of particles' shape function sizes, the hybrid method is adapted to allow refined particles to move within a coarse region. Another innovation of this approach is the method developed to compute grid moments at interfaces between two refinement levels. Indeed, the hybrid method is adapted to accurately account for the special grid structure at the interfaces, avoiding any overlapping grid considerations. Some fundamental test runs were performed to validate our approach (e.g. quiet plasma flow, Alfvén wave propagation). Lastly, we also show a planetary application of the model, simulating the interaction between Jupiter's moon Ganymede and the Jovian plasma.
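The splitting rule described above (a coarse particle entering a refined region becomes children matched to the refined cell volume, with no coalescing and with the velocity distribution preserved) can be sketched as follows. The dict layout and names are illustrative assumptions, not the model's actual data structures:

```python
def split_particle(particle, refinement_rate):
    """Split one coarse-grid macro-particle into refinement_rate**3
    refined particles (3D refinement), conserving total statistical
    weight. Each child keeps the parent's velocity exactly, so the
    velocity distribution function is unchanged -- no averaging occurs."""
    n_children = refinement_rate ** 3
    child_weight = particle["weight"] / n_children
    return [{"weight": child_weight, "velocity": particle["velocity"]}
            for _ in range(n_children)]
```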
Teaching Parallel Programming Using Java
Shafi, Aamir; Akhtar, Aleem; Javed, Ansar; Carpenter, Bryan
2014-01-01
This paper presents an overview of the "Applied Parallel Computing" course taught to final year Software Engineering undergraduate students in Spring 2014 at NUST, Pakistan. The main objective of the course was to introduce practical parallel programming tools and techniques for shared and distributed memory concurrent systems. A unique aspect of the course was that Java was used as the principal programming language. The course was divided into three sections. The first section covered paral...
Explicit Parallel Programming: System Description
Gamble, Jim; Ribbens, Calvin J.
1991-01-01
The implementation of the Explicit Parallel Programming (EPP) system is described. EPP is a prototype implementation of a language for writing parallel programs for shared memory multiprocessors. EPP may be viewed as a coordination language, since it is used to define the sequencing or ordering of various tasks, while the tasks themselves are defined in some other compilable language. The two main components of the EPP system---a compiler and an executive---are described in this report. An...
Explicit Parallel Programming: User's Guide
Gamble, Jim; Ribbens, Calvin J.
1991-01-01
The Explicit Parallel Programming (EPP) language is defined and illustrated with several examples. EPP is a prototype implementation of a language for writing parallel programs for shared memory multiprocessors. EPP may be viewed as a coordination language, since it is used to define the sequencing or ordering of various tasks, while the tasks themselves are defined in some other compilable language. The prototype described here requires FORTRAN as the base language, but there is no inheren...
Chien-Lun Hou; Hao-Ting Lin; Mao-Hsiung Chiang
2011-01-01
In this paper, a stereo vision 3D position measurement system for a three-axial pneumatic parallel mechanism robot arm is presented. The stereo vision 3D position measurement system aims to measure the 3D trajectories of the end-effector of the robot arm. To track the end-effector of the robot arm, the circle detection algorithm is used to detect the desired target and the SAD algorithm is used to track the moving target and to search the corresponding target location along the conjugate epip...
Task-parallel implementation of 3D shortest path raytracing for geophysical applications
Giroux, Bernard; Larouche, Benoît
2013-04-01
This paper discusses two variants of the shortest path method and their parallel implementation on a shared-memory system. One variant is designed to perform raytracing in models with stepwise distributions of interval velocity while the other is better suited for continuous velocity models. Both rely on a discretization scheme where primary nodes are located at the corners of cuboid cells and where secondary nodes are found on the edges and sides of the cells. The parallel implementations allow raytracing concurrently for different sources, providing an attractive framework for ray-based tomography. The accuracy and performance of the implementations were measured by comparison with the analytic solution for a layered model and for a vertical gradient model. Mean relative error less than 0.2% was obtained with 5 secondary nodes for the layered model and 9 secondary nodes for the gradient model. Parallel performance depends on the level of discretization refinement, on the number of threads, and on the problem size, with the most determinant variable being the level of discretization refinement (number of secondary nodes). The results indicate that a good trade-off between speed and accuracy is achieved with the number of secondary nodes equal to 5. The programs are written in C++ and rely on the Standard Template Library and OpenMP.
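The per-source independence that the paper exploits for shared-memory parallelism can be illustrated with a toy shortest-path traveltime solver run concurrently for several sources. The graph layout and the edge-cost metric (length times average slowness) are illustrative assumptions; the actual code is C++ with OpenMP:

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def traveltimes(source, slowness, neighbors):
    """Dijkstra-style first-arrival traveltimes from one source node.
    neighbors maps node -> list of (node, edge_length); the edge cost is
    length times the average slowness of its two endpoints."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > dist.get(u, float("inf")):
            continue
        for v, length in neighbors[u]:
            cost = length * 0.5 * (slowness[u] + slowness[v])
            if t + cost < dist.get(v, float("inf")):
                dist[v] = t + cost
                heapq.heappush(heap, (t + cost, v))
    return dist

def all_sources(sources, slowness, neighbors):
    """Each source is an independent computation, so -- as in the paper's
    shared-memory implementation -- sources are processed concurrently."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(
            lambda s: traveltimes(s, slowness, neighbors), sources))
```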
Hao-Ting Lin; Mao-Hsiung Chiang
2011-01-01
This study aimed to develop a novel 3D parallel mechanism robot driven by three vertical-axial pneumatic actuators with a stereo vision system for path tracking control. The mechanical system and the control system are the primary novel parts for developing a 3D parallel mechanism robot. In the mechanical system, a 3D parallel mechanism robot contains three serial chains, a fixed base, a movable platform and a pneumatic servo system. The parallel mechanism are designed and analyzed first for ...
Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P
Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; /SLAC; Syratchev, I.; /CERN
2009-06-19
In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).
Influence of intrinsic and extrinsic forces on 3D stress distribution using CUDA programming
Räss, Ludovic; Omlin, Samuel; Podladchikov, Yuri
2013-04-01
In order to have a better understanding of the influence of buoyancy (intrinsic) and boundary (extrinsic) forces in a nonlinear rheology due to a power-law fluid, some basics need to be explored through 3D numerical calculation. As a first approach, the already-studied Stokes setup of a rising sphere is used to calibrate the 3D model. Far-field horizontal tectonic stress is applied to the sphere, which generates a buoyancy-driven vertical acceleration. This simple, well-known setup allows benchmarking through systematic runs. The relative importance of intrinsic and extrinsic forces producing the wide variety of rates and styles of deformation, including absence of deformation, and generating 3D stress patterns will be determined. The relation between vertical motion and power-law exponent will also be explored. The goal of these investigations is to run models having topography and density structure from geophysical imaging as input, and a 3D stress field as output. The stress distribution in the Swiss Alps and Plateau and its implication for risk analysis is one of the perspectives for this research. In fact, proximity of the stress state to failure is fundamental for risk assessment. Sensitivity of this to an accurate topography representation can then be evaluated. The developed 3D numerical codes, tuned for mid-sized clusters, need to be optimized, especially when running at good resolution in full 3D. Therefore, two widely used computing platforms, MATLAB and FORTRAN 90, are explored, starting with an easily adaptable and as-short-as-possible MATLAB code, which is then upgraded to reach higher performance in simulation times and resolution. A significant speedup using the emerging NVIDIA CUDA technology and resources is also possible. Programming in C-CUDA, creating synchronization features, and comparing the results with previous runs helps us to investigate the new speedup possibilities allowed through GPU parallel computing. These codes
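The power-law rheology underlying this study can be illustrated with the standard effective-viscosity relation. This is a textbook sketch, not the authors' code; `eta0` and `strain_rate_II` (the second strain-rate invariant) are assumed names:

```python
def power_law_viscosity(eta0, strain_rate_II, n):
    """Effective viscosity of a power-law fluid:
        eta_eff = eta0 * eII**(1/n - 1)
    where eII is the second strain-rate invariant and n the power-law
    exponent. n = 1 recovers the linear (Newtonian) case, eta_eff = eta0;
    n > 1 gives shear-thinning behavior (viscosity drops as eII grows)."""
    return eta0 * strain_rate_II ** (1.0 / n - 1.0)
```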
Design and verification of an ultra-precision 3D-coordinate measuring machine with parallel drives
An ultra-precision 3D coordinate measuring machine (CMM), the TriNano N100, has been developed. In our design, the workpiece is mounted on a 3D stage, which is driven by three parallel drives that are mutually orthogonal. The linear drives support the 3D stage using vacuum preloaded (VPL) air bearings, whereby each drive determines the position of the 3D stage along one translation direction only. An exactly constrained design results in highly repeatable machine behavior. Furthermore, the machine complies with the Abbé principle over its full measurement range and the application of parallel drives allows for excellent dynamic behavior. The design allows a 3D measurement uncertainty of 100 nanometers in a measurement range of 200 cubic centimeters. Verification measurements using a Gannen XP 3D tactile probing system on a spherical artifact show a standard deviation in single point repeatability of around 2 nm in each direction. (paper)
Recent trends in parallel programming
Jakl, Ondřej
Ostrava: ÚGN AV ČR, 2007 - (Blaheta, R.; Starý, J.), s. 54-58 ISBN 978-80-86407-12-8. [Seminar on Numerical Analysis. Modelling and Simulation of Challenging Engineering Problems. Winter School. High-performance and Parallel Computers, Programming Technologies & Numerical Linear Algebra. Ostrava (CZ), 22.01.2007-26.01.2007] R&D Projects: GA AV ČR 1ET400300415; GA MŠk 1N04035 Institutional research plan: CEZ:AV0Z30860518 Keywords: high performance computing * parallel programming * MPI Subject RIV: BA - General Mathematics
Chien-Lun Hou
2011-02-01
In this paper, a stereo vision 3D position measurement system for a three-axial pneumatic parallel mechanism robot arm is presented. The stereo vision 3D position measurement system aims to measure the 3D trajectories of the end-effector of the robot arm. To track the end-effector of the robot arm, the circle detection algorithm is used to detect the desired target and the SAD algorithm is used to track the moving target and to search the corresponding target location along the conjugate epipolar line in the stereo pair. After camera calibration, both intrinsic and extrinsic parameters of the stereo rig can be obtained, so images can be rectified according to the camera parameters. Thus, through the epipolar rectification, the stereo matching process is reduced to a horizontal search along the conjugate epipolar line. Finally, 3D trajectories of the end-effector are computed by stereo triangulation. The experimental results show that the stereo vision 3D position measurement system proposed in this paper can successfully track and measure the fifth-order polynomial trajectory and sinusoidal trajectory of the end-effector of the three-axial pneumatic parallel mechanism robot arm.
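The final triangulation step, recovering depth from disparity along rectified epipolar lines, reduces to similar triangles. A minimal sketch assuming a pinhole model with focal length in pixels and baseline in metres (names and units are illustrative, not the paper's implementation):

```python
def triangulate(xl, xr, y, focal_px, baseline):
    """Recover a 3D point from a rectified stereo pair. After epipolar
    rectification a match lies on the same image row in both views, so
    the disparity is d = xl - xr and depth follows from similar
    triangles: Z = f * B / d. X and Y then come from back-projection."""
    d = xl - xr
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front")
    Z = focal_px * baseline / d
    X = xl * Z / focal_px
    Y = y * Z / focal_px
    return X, Y, Z
```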
Bristeau, Marie-Odile; Glowinski, Roland; Périaux, Jacques; Rossi, Tuomo
1999-01-01
We consider the scattering problem for 3-D electromagnetic harmonic waves. The time-domain Maxwell's equations are solved and Exact Controllability methods improve the convergence of the solutions to the time-periodic ones for nonconvex obstacles. A least-squares formulation solved by a preconditioned conjugate gradient is introduced. The discretization is achieved in time by a centered finite difference scheme and in space by Lagrange finite elements. Numerical results for 3-D nonconvex scat...
Gabriele Jost
2010-01-01
Today most systems in high-performance computing (HPC) feature a hierarchical hardware design: shared-memory nodes with several multi-core CPUs are connected via a network infrastructure. When parallelizing an application for these architectures it seems natural to employ a hierarchical programming model such as combining MPI and OpenMP. Nevertheless, there is the general lore that pure MPI outperforms the hybrid MPI/OpenMP approach. In this paper, we describe the hybrid MPI/OpenMP parallelization of IR3D (Incompressible Realistic 3-D) code, a full-scale real-world application, which simulates the environmental effects on the evolution of vortices trailing behind control surfaces of underwater vehicles. We discuss performance, scalability and limitations of the pure MPI version of the code on a variety of hardware platforms and show how the hybrid approach can help to overcome certain limitations.
Efficient Parallel Programming with Linda
Ashish Deshpande; Martin Schultz
1992-01-01
Linda is a coordination language invented by David Gelernter at Yale University, which when combined with a computation language (like C) yields a high-level parallel programming language for MIMD machines. Linda is based on a virtual shared associative memory containing objects called tuples. Skeptics have long claimed that Linda programs could not be efficient on distributed memory architectures. In this paper, we address this claim by discussing C-Linda's performance in solving a particula...
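Linda's associative shared-memory model can be illustrated with a toy, single-threaded sketch. Real Linda implementations block on `in`/`rd` until a matching tuple appears and span multiple processes; none of that is modeled here, and the method names are only loose analogues of Linda's operations:

```python
class TupleSpace:
    """Toy sketch of a Linda-style tuple space: `out` deposits a tuple,
    `rd` finds one matching a template (None = wildcard) without removing
    it, and `inp` finds and removes one. Returns None when no tuple
    matches, where real Linda would block the caller instead."""
    def __init__(self):
        self.tuples = []

    def out(self, tup):
        self.tuples.append(tup)

    def _match(self, template, tup):
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup))

    def rd(self, template):
        return next((t for t in self.tuples if self._match(template, t)), None)

    def inp(self, template):
        t = self.rd(template)
        if t is not None:
            self.tuples.remove(t)
        return t
```

Associative matching (by template, not by address) is what lets Linda decouple producers and consumers in both space and time.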
Characterization of a parallel-beam CCD optical-CT apparatus for 3D radiation dosimetry
Krstajic, Nikola; Doran, Simon J.
2007-07-01
3D measurement of optical attenuation is of interest in a variety of fields of biomedical importance, including spectrophotometry, optical projection tomography (OPT) and analysis of 3D radiation dosimeters. Accurate, precise and economical 3D measurements of optical density (OD) are a crucial step in enabling 3D radiation dosimeters to enter wider use in clinics. Polymer gels and Fricke gels, as well as dosimeters not based around gels, have been characterized for 3D dosimetry over the last two decades. A separate problem is the verification of the best readout method. A number of different imaging modalities (magnetic resonance imaging (MRI), optical CT, x-ray CT and ultrasound) have been suggested for the readout of information from 3D dosimeters. To date only MRI and laser-based optical CT have been characterized in detail. This paper describes some initial steps we have taken in establishing charge coupled device (CCD)-based optical CT as a viable alternative to MRI for readout of 3D radiation dosimeters. The main advantage of CCD-based optical CT over traditional laser-based optical CT is a speed increase of at least an order of magnitude, while the simplicity of its architecture would lend itself to cheaper implementation than both MRI and laser-based optical CT if the camera itself were inexpensive enough. Specifically, we study the following aspects of optical metrology, using high quality test targets: (i) calibration and quality of absorbance measurements and the camera requirements for 3D dosimetry; (ii) the modulation transfer function (MTF) of individual projections; (iii) signal-to-noise ratio (SNR) in the projection and reconstruction domains; (iv) distortion in the projection domain, depth-of-field (DOF) and telecentricity. The principal results for our current apparatus are as follows: (i) SNR of optical absorbance in projections is better than 120:1 for uniform phantoms in absorbance range 0.3 to 1.6 (and better than 200:1 for absorbances 1.0 to
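The projection-domain absorbance that the paper calibrates follows the Beer-Lambert law. A minimal sketch assuming a flood-field reference image for normalization (not the authors' actual processing pipeline):

```python
import numpy as np

def absorbance(I, I0):
    """Convert a CCD projection image to optical absorbance via the
    Beer-Lambert law, A = -log10(I / I0), where I0 is a reference
    (flood) image acquired without the attenuating sample. One unit of
    absorbance corresponds to a tenfold attenuation of the light."""
    I = np.asarray(I, dtype=float)
    I0 = np.asarray(I0, dtype=float)
    return -np.log10(I / I0)
```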
Novel Kinetic 3D MHD Algorithm for High Performance Parallel Computing Systems
Chetverushkin, B; Saveliev, V
2013-01-01
The impressive progress of kinetic schemes in the solution of gas dynamics problems, together with the development of effective parallel algorithms for modern high-performance parallel computing systems, has led to advanced methods for solving the magnetohydrodynamics problem in the important area of plasma physics. The novel feature of the method is the formulation of a complex Boltzmann-like distribution function for the kinetic method, with the implementation of electromagnetic interaction terms. The numerical method is based on explicit schemes. Due to its logical simplicity and efficiency, the algorithm is easily adapted to modern high-performance parallel computer systems, including hybrid computing systems with graphics processors.
Parallel Adaptive Computation of Blood Flow in a 3D ``Whole'' Body Model
Zhou, M.; Figueroa, C. A.; Taylor, C. A.; Sahni, O.; Jansen, K. E.
2008-11-01
Accurate numerical simulations of vascular trauma require the consideration of a larger portion of the vasculature than previously considered, due to the systemic nature of the human body's response. A patient-specific 3D model composed of 78 connected arterial branches extending from the neck to the lower legs is constructed to effectively represent the entire body. Recently developed outflow boundary conditions that appropriately represent the downstream vasculature bed which is not included in the 3D computational domain are applied at 78 outlets. In this work, the pulsatile blood flow simulations are started on a fairly uniform, unstructured mesh that is subsequently adapted using a solution-based approach to efficiently resolve the flow features. The adapted mesh contains non-uniform, anisotropic elements resulting in resolution that conforms with the physical length scales present in the problem. The effects of the mesh resolution on the flow field are studied, specifically on relevant quantities of pressure, velocity and wall shear stress.
A first 3D parallel diffusion solver based on a mixed dual finite element approximation
This paper presents a new extension of the mixed dual finite element approximation of the diffusion equation in rectangular geometry. The mixed dual formulation has been extended in order to take discontinuity conditions into account. The iterative method is based on an alternating-direction method which uses the current as the unknown. This method is parallelizable and has very fast convergence properties. Some results for a 3D calculation on the CRAY computer are presented. (orig.)
Compensation of errors in robot machining with a parallel 3D-piezo compensation mechanism
Schneider, Ulrich; Drust, Manuel; Puzik, Arnold; Verl, Alexander
2013-01-01
This paper proposes an approach for a 3D-Piezo Compensation Mechanism unit that is capable of fast and accurate adaptation of the spindle position to enhance machining by robots. The mechanical design is explained, which focuses on low mass, good stiffness and high bandwidth in order to allow compensating for errors beyond the bandwidth of the robot. In addition to previous works [7] and [9], an advanced actuation design is presented enabling movements in three translational axes allowing a work...
BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations
Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul
2015-01-01
Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and de...
Wang, S.; De Hoop, M. V.; Xia, J.; Li, X.
2011-12-01
We consider the modeling of elastic seismic wave propagation on a rectangular domain via the discretization and solution of the inhomogeneous coupled Helmholtz equation in 3D, by exploiting a parallel multifrontal sparse direct solver equipped with Hierarchically Semi-Separable (HSS) structure to reduce the computational complexity and storage. In particular, we are concerned with solving this equation on a large domain, for a large number of different forcing terms, in the context of seismic problems in general and modeling in particular. We resort to a parsimonious mixed-grid finite differences scheme for discretizing the Helmholtz operator and Perfectly Matched Layer boundaries, resulting in a non-Hermitian matrix. We make use of a nested dissection based domain decomposition, and introduce an approximate direct solver by developing a parallel HSS matrix compression, factorization, and solution approach. We cast our massive parallelization in the framework of the multifrontal method. The assembly tree is partitioned into local trees and a global tree. The local trees are eliminated independently in each processor, while the global tree is eliminated through massive communication. The solver for the inhomogeneous equation is a parallel hybrid between multifrontal and HSS structure. The computational complexity associated with the factorization is almost linear in the size of the Helmholtz matrix. Our numerical approach can be compared with the spectral element method in 3D seismic applications.
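The nested-dissection ordering behind the local-tree/global-tree split described above can be pictured with a small sketch (a hypothetical illustration in 1-D, not the authors' solver): subdomain nodes are ordered first and separators last, so the subdomains can be factored independently before the separators that couple them.

```python
def nested_dissection(nodes):
    """Recursively order grid nodes: subdomain nodes first, separators last.

    Eliminating the two halves before their separator is what lets the
    'local trees' be factored independently on each processor, with the
    separators forming the 'global tree' that requires communication.
    """
    if len(nodes) <= 2:
        return list(nodes)
    mid = len(nodes) // 2
    left, sep, right = nodes[:mid], [nodes[mid]], nodes[mid + 1:]
    return nested_dissection(left) + nested_dissection(right) + sep

order = nested_dissection(list(range(7)))
# the middle node is the top-level separator and is eliminated last
```

In a real 3-D multifrontal solver the separators are planes of grid nodes, and the frontal matrices associated with the large top-level separators are the ones compressed with HSS structure.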
Parallel load balancing strategy for Volume-of-Fluid methods on 3-D unstructured meshes
Jofre, Lluís; Borrell, Ricard; Lehmkuhl, Oriol; Oliva, Assensi
2015-02-01
Volume-of-Fluid (VOF) is one of the methods of choice to reproduce the interface motion in the simulation of multi-fluid flows. One of its main strengths is its accuracy in capturing sharp interface geometries, although requiring for it a number of geometric calculations. Under these circumstances, achieving parallel performance on current supercomputers is a must. The main obstacle for the parallelization is that the computing costs are concentrated only in the discrete elements that lie on the interface between fluids. Consequently, if the interface is not homogeneously distributed throughout the domain, standard domain decomposition (DD) strategies lead to imbalanced workload distributions. In this paper, we present a new parallelization strategy for general unstructured VOF solvers, based on a dynamic load balancing process complementary to the underlying DD. Its parallel efficiency has been analyzed and compared to the DD one using up to 1024 CPU-cores on an Intel SandyBridge based supercomputer. The results obtained on the solution of several artificially generated test cases show a speedup of up to ∼12× with respect to the standard DD, depending on the interface size, the initial distribution and the number of parallel processes engaged. Moreover, the new parallelization strategy presented is of general purpose, therefore, it could be used to parallelize any VOF solver without requiring changes on the coupled flow solver. Finally, note that although designed for the VOF method, our approach could be easily adapted to other interface-capturing methods, such as the Level-Set, which may present similar workload imbalances.
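The imbalance the paper addresses can be sketched in a few lines (a simplified, hypothetical stand-in for the paper's dynamic strategy): only interface cells carry the expensive geometric work, so if the decomposition leaves them all on one rank, a complementary step can redistribute just those cells.

```python
def rebalance(interface_cells, nranks):
    """Round-robin interface cells over all ranks.

    interface_cells: list of (cell_id, owner_rank) pairs from the domain
    decomposition; only these cells carry the costly geometric work.
    Returns {rank: [cell_id, ...]} with counts differing by at most one,
    leaving the underlying flow-solver decomposition untouched.
    """
    assignment = {r: [] for r in range(nranks)}
    for i, (cell, _owner) in enumerate(interface_cells):
        assignment[i % nranks].append(cell)
    return assignment

# all 8 interface cells initially owned by rank 0 -> worst-case imbalance
cells = [(c, 0) for c in range(8)]
new = rebalance(cells, 4)
```

A production scheme would also weight cells by geometric cost and minimize data movement, but the separation of the VOF workload from the flow-solver decomposition is the key idea.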
A Framework for Parallel Programming in Java
Launay, Pascale; Pazat, Jean-Louis
1997-01-01
To ease the task of programming parallel and distributed applications, the Do! project aims at the automatic generation of distributed code from multi-threaded Java programs. We provide a parallel programming model, embedded in a framework that constraints parallelism without any extension to the Java language. This framework is described here and is used as a basis to generate distributed programs.
Exploiting Parallelism in Coalgebraic Logic Programming
Komendantskaya, Ekaterina; Schmidt, Martin; Heras, Jónathan
2013-01-01
We present a parallel implementation of Coalgebraic Logic Programming (CoALP) in the programming language Go. CoALP was initially introduced to reflect coalgebraic semantics of logic programming, with coalgebraic derivation algorithm featuring both corecursion and parallelism. Here, we discuss how the coalgebraic semantics influenced our parallel implementation of logic programming.
C# game programming cookbook for Unity 3D
Murray, Jeff W
2014-01-01
Contents: Making Games the Modular Way; Important Programming Concepts; Building the Core Game Framework; Controllers and Managers; Building the Core Framework Scripts; Player Structure; Game-Specific Player Controller; Dealing with Input; Player Manager; User Data Manager; Recipes: Common Components; Introduction; The Timer Class; Spawn Scripts; Set Gravity; Pretend Friction (Friction Simulation to Prevent Slipping Around); Cameras; Input Scripts; Automatic Self-Destruction Script; Automatic Object Spinner; Scene Manager; Building Player Movement Controllers; Shoot 'Em Up Spaceship; Humanoid Character; Wheeled Vehicle; Weapon Systems
Parallel 3-D particle-in-cell modelling of charged ultrarelativistic beam dynamics
Boronina, Marina A.; Vshivkov, Vitaly A.
2015-12-01
[...] in supercolliders. We use the 3-D set of Maxwell's equations for the electromagnetic fields, and the Vlasov equation for the distribution function of the beam particles. The model automatically incorporates the longitudinal effects, which can play a significant role in cases of super-high densities. We present numerical results for the dynamics of two focused ultrarelativistic beams with a size ratio 10:1:100. The results demonstrate the high efficiency of the proposed computational methods and algorithms, which are applicable to a variety of problems in relativistic plasma physics.
Contributions to computational stereology and parallel programming
Rasmusson, Allan
[...] between computer science and stereology, we try to overcome these problems by developing new virtual stereological probes and virtual tissue sections. A concrete result is the development of a new virtual 3D probe, the spatial rotator, which was found to have lower variance than the widely used planar rotator, even without the need for isotropic sections. To meet the need for computational power to perform image restoration of virtual tissue sections, parallel programming on GPUs has also been part of the project. This has led to a significant change in paradigm for a previously developed surgical simulator and a memory-efficient GPU implementation of connected components labeling. This was furthermore extended to produce signed distance fields and Voronoi diagrams, all with real-time performance. It has during the course of the project been realized that many disciplines within computer science [...]
Schultz, A.
2010-12-01
3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. We
Simulation of the 3D viscoelastic free surface flow by a parallel corrected particle scheme
Jin-Lian, Ren; Tao, Jiang
2016-02-01
In this work, the behavior of three-dimensional (3D) jet coiling based on the viscoelastic Oldroyd-B model is investigated by a corrected particle scheme, named the smoothed particle hydrodynamics with corrected symmetric kernel gradient and shifting particle technique (SPH_CS_SP) method. The accuracy and stability of the SPH_CS_SP method are first tested by solving Poiseuille flow and Taylor-Green flow. Then the capacity of the SPH_CS_SP method to solve viscoelastic fluids is verified by the polymer flow through a periodic array of cylinders. Moreover, the convergence of the SPH_CS_SP method is also investigated. Finally, the proposed method is further applied to the 3D viscoelastic jet coiling problem, and the influences of macroscopic parameters on the jet coiling are discussed. The numerical results show that the SPH_CS_SP method has higher accuracy and better stability than the traditional SPH method and other corrected SPH methods, and can improve the tensile instability. Project supported by the Natural Science Foundation of Jiangsu Province, China (Grant Nos. BK20130436 and BK20150436) and the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China (Grant No. 15KJB110025).
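The kernel-gradient correction at the heart of such corrected SPH schemes can be illustrated in 1-D (a simplified sketch: the paper's SPH_CS_SP method is 3-D and adds a particle-shifting step we omit; the kernel, particle positions and volume below are illustrative). The correction rescales the raw SPH gradient so that linear fields are differentiated exactly, even on non-uniform particle distributions.

```python
import numpy as np

def cubic_spline_grad(dx, h):
    """Derivative of the standard 1-D cubic-spline SPH kernel."""
    q = abs(dx) / h
    sigma = 2.0 / (3.0 * h)          # 1-D normalization constant
    if q < 1.0:
        dw = sigma * (-3.0 * q + 2.25 * q * q) / h
    elif q < 2.0:
        dw = -sigma * 0.75 * (2.0 - q) ** 2 / h
    else:
        return 0.0
    return dw * np.sign(dx)

def corrected_gradient(x, f, i, h, V):
    """Gradient of field f at particle i with a kernel-gradient correction.

    The factor L = sum_j V * dW_ij * (x_j - x_i) rescales the raw SPH
    gradient so that linear fields are reproduced exactly.
    """
    raw, L = 0.0, 0.0
    for j in range(len(x)):
        if j == i:
            continue
        dw = cubic_spline_grad(x[i] - x[j], h)   # dW/dx at particle i
        raw += V * (f[j] - f[i]) * dw
        L += V * (x[j] - x[i]) * dw
    return raw / L

x = np.array([0.0, 0.09, 0.21, 0.3, 0.42, 0.5])  # non-uniform particles
f = 2.0 * x + 1.0                                 # linear field, slope 2
g = corrected_gradient(x, f, 2, h=0.15, V=0.1)    # exactly 2 after correction
```

Without the division by L, the irregular particle spacing would bias the gradient; the correction is what makes schemes of this family first-order consistent.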
1 - Description of program or function: PARTISN (Parallel, Time-Dependent SN) is the evolutionary successor to CCC-0547/DANTSYS. User input and cross section formats are very similar to those of DANTSYS. The linear Boltzmann transport equation is solved for neutral particles using the deterministic (SN) method. Both the static (fixed source or eigenvalue) and time-dependent forms of the transport equation are solved in forward or adjoint mode. Vacuum, reflective, periodic, white, or inhomogeneous boundary conditions are solved. General anisotropic scattering and inhomogeneous sources are permitted. PARTISN solves the transport equation on orthogonal (single level or block-structured AMR) grids in 1-D (slab, two-angle slab, cylindrical, or spherical), 2-D (X-Y, R-Z, or R-T) and 3-D (X-Y-Z or R-Z-T) geometries. 2 - Methods: PARTISN numerically solves the multigroup form of the neutral-particle Boltzmann transport equation. The discrete-ordinates form of approximation is used for treating the angular variation of the particle distribution. For curvilinear geometries, diamond differencing is used for angular discretization. The spatial discretizations may be either low-order (diamond difference or Adaptive Weighted Diamond Difference (AWDD)) or higher-order (linear discontinuous or exponential discontinuous). Negative fluxes are eliminated by a local set-to-zero-and-correct algorithm for the diamond case (DD/STZ). Time differencing is Crank-Nicolson (diamond), also with a set-to-zero fix-up scheme. Both inner and outer iterations can be accelerated using the diffusion synthetic acceleration method, or transport synthetic acceleration can be used to accelerate the inner iterations. The diffusion solver uses either the conjugate gradient or multigrid method. Chebyshev acceleration of the fission source is used. The angular source terms may be treated either via standard PN expansions or Galerkin scattering. An option is provided for strictly positive scattering sources
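The DD/STZ fix-up named above can be sketched in a minimal 1-D slab sweep (an illustration only; PARTISN itself is multi-dimensional and multigroup). Diamond difference relates the cell-average flux to the edge fluxes; when the extrapolated outgoing edge flux goes negative, it is set to zero and the cell average is recomputed from particle balance.

```python
def dd_sweep(n, dx, mu, sigma, q, psi_in):
    """One diamond-difference sweep with the set-to-zero-and-correct fix-up.

    Per cell: balance  mu*(psi_out - psi_in)/dx + sigma*psi_avg = q,
    diamond relation   psi_avg = (psi_in + psi_out)/2.
    A negative outgoing flux is set to zero and psi_avg is recomputed
    from balance so particle conservation is preserved (DD/STZ).
    """
    avg = []
    for _ in range(n):
        a = (q + 2.0 * mu * psi_in / dx) / (sigma + 2.0 * mu / dx)
        out = 2.0 * a - psi_in
        if out < 0.0:                      # fix-up: eliminate negative flux
            out = 0.0
            a = (q + mu * psi_in / dx) / sigma
        avg.append(a)
        psi_in = out                       # feeds the next cell downstream
    return avg, psi_in

# a strongly absorbing slab where plain diamond difference goes negative
avg, exiting = dd_sweep(n=4, dx=1.0, mu=0.5, sigma=10.0, q=0.0, psi_in=1.0)
```

In optically thick cells like these, unfixed diamond difference would oscillate to negative fluxes; the fix-up clips them at the cost of locally reduced accuracy.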
Chu, Chunlei
2009-01-01
The major performance bottleneck of the parallel Fourier method on distributed memory systems is the network communication cost. In this study, we investigate the potential of using non‐blocking all‐to‐all communications to solve this problem by overlapping computation and communication. We present the runtime comparison of a 3D seismic modeling problem with the Fourier method using non‐blocking and blocking calls, respectively, on a Linux cluster. The data demonstrate that a performance improvement of up to 40% can be achieved by simply changing blocking all‐to‐all communication calls to non‐blocking ones to introduce the overlapping capability. A 3D reverse‐time migration result is also presented as an extension to the modeling work based on non‐blocking collective communications.
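The overlap pattern described above (post the exchange, do the communication-independent part of the computation, then wait) can be sketched as follows. This is a hypothetical, runnable analogue: with MPI the three steps would be `Ialltoall`, the local transforms, and `Wait`; here a thread stands in for the in-flight communication.

```python
import threading

def overlapped_step(local_block, exchange):
    """Overlap computation with communication, non-blocking style.

    1. 'Post' the exchange (runs in the background).
    2. Do the work that needs only local data while data is in flight.
    3. 'Wait' before touching the remote data.
    """
    result = {}
    comm = threading.Thread(target=lambda: result.update(remote=exchange()))
    comm.start()                        # analogous to MPI_Ialltoall
    result["local"] = sum(x * x for x in local_block)   # overlapped work
    comm.join()                         # analogous to MPI_Wait
    return result

out = overlapped_step([1, 2, 3], lambda: [4, 5, 6])
```

The speedup reported in the abstract comes precisely from step 2 hiding the network latency of step 1; if the local work is shorter than the transfer, the remaining communication time is still exposed.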
Parallel Programming Environment for OpenMP
Insung Park; Michael J. Voss; Seon Wook Kim; Rudolf Eigenmann
2001-01-01
We present our effort to provide a comprehensive parallel programming environment for the OpenMP parallel directive language. This environment includes a parallel programming methodology for the OpenMP programming model and a set of tools (Ursa Minor and InterPol) that support this methodology. Our toolset provides automated and interactive assistance to parallel programmers in time-consuming tasks of the proposed methodology. The features provided by our tools include performance and program...
High-Performance Computation of Distributed-Memory Parallel 3D Voronoi and Delaunay Tessellation
Peterka, Tom; Morozov, Dmitriy; Phillips, Carolyn
2014-11-14
Computing a Voronoi or Delaunay tessellation from a set of points is a core part of the analysis of many simulated and measured datasets: N-body simulations, molecular dynamics codes, and LIDAR point clouds are just a few examples. Such computational geometry methods are common in data analysis and visualization; but as the scale of simulations and observations surpasses billions of particles, the existing serial and shared-memory algorithms no longer suffice. A distributed-memory scalable parallel algorithm is the only feasible approach. The primary contribution of this paper is a new parallel Delaunay and Voronoi tessellation algorithm that automatically determines which neighbor points need to be exchanged among the subdomains of a spatial decomposition. Other contributions include periodic and wall boundary conditions, comparison of our method using two popular serial libraries, and application to numerous science datasets.
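A simplified picture of the neighbor-exchange problem (a fixed-radius ghost-zone sketch; the paper's algorithm determines the needed exchanges automatically rather than from a prescribed radius): a point near a face of its subdomain can affect Delaunay cells owned by the neighbor across that face, so it must be sent there.

```python
def points_to_exchange(points, bounds, ghost):
    """Select points that must be sent to neighbouring subdomains.

    bounds: ((xmin, xmax), (ymin, ymax)) of the local block.
    A point closer than `ghost` to any face may influence the
    tessellation owned by the neighbour across that face.
    """
    out = []
    for p in points:
        near = any(p[d] - lo < ghost or hi - p[d] < ghost
                   for d, (lo, hi) in enumerate(bounds))
        if near:
            out.append(p)
    return out

pts = [(0.5, 0.5), (0.05, 0.5), (0.5, 0.97)]
send = points_to_exchange(pts, [(0.0, 1.0), (0.0, 1.0)], ghost=0.1)
```

A fixed radius either over-sends or misses points when Delaunay cells are elongated, which is why an adaptive determination of the exchange set, as in the paper, is the harder and more valuable problem.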
Calibration of 3-d.o.f. Translational Parallel Manipulators Using Leg Observations
Pashkevich, Anatoly; Wenger, Philippe; Gomolitsky, Roman
2009-01-01
The paper proposes a novel approach for the geometrical model calibration of quasi-isotropic parallel kinematic mechanisms of the Orthoglide family. It is based on the observations of the manipulator leg parallelism during motions between the specific test postures and employs a low-cost measuring system composed of standard comparator indicators attached to the universal magnetic stands. They are sequentially used for measuring the deviation of the relevant leg location while the manipulator moves the TCP along the Cartesian axes. Using the measured differences, the developed algorithm estimates the joint offsets and the leg lengths that are treated as the most essential parameters. Validity of the proposed calibration technique is confirmed by the experimental results.
Large-scale Parallel Unstructured Mesh Computations for 3D High-lift Analysis
Mavriplis, Dimitri J.; Pirzadeh, S.
1999-01-01
A complete "geometry to drag-polar" analysis capability for the three-dimensional high-lift configurations is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround for complicated geometries that arise in high-lift configurations. Special attention is devoted to creating a capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are initially generated on a work-station, and subsequently refined on a supercomputer. The flow is solved on these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off configuration using up to 24.7 million grid points. The feasibility of using this approach in a production environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines using up to 1450 processors.
Multiscale simulation of mixing processes using 3D-parallel, fluid-structure interaction techniques
Valette, Rudy; Vergnes, Bruno; Coupez, Thierry
2008-01-01
This work focuses on the development of a general finite element code, called Ximex®, devoted to the three-dimensional direct simulation of mixing processes of complex fluids. The code is based on a simplified fictitious domain method coupled with a "level-set" approach to represent the rigid moving boundaries, such as screws and rotors, as well as free surfaces. These techniques, combined with the use of parallel computing, allow computing the time-dependent flow of...
A new ray-tracing scheme for 3D diffuse radiation transfer on highly parallel architectures
Tanaka, Satoshi; Yoshikawa, Kohji; Okamoto, Takashi; Hasegawa, Kenji
2014-01-01
We present a new numerical scheme to solve the transfer of diffuse radiation on three-dimensional mesh grids which is efficient on processors with highly parallel architecture such as recently popular GPUs and CPUs with multi- and many-core architectures. The scheme is based on the ray-tracing method and the computational cost is proportional to $N_{\rm m}^{5/3}$, where $N_{\rm m}$ is the number of mesh grids, and is devised to compute the radiation transfer along each light-ray completely in parallel with appropriate grouping of the light-rays. We find that the performance of our scheme scales well with the number of adopted CPU cores and GPUs, and also that our scheme is nicely parallelized on a multi-node system by adopting the multiple wave front scheme, and the performance scales well with the amount of the computational resources. As numerical tests to validate our scheme and to give a physical criterion for the angular resolution of our ray-tracing scheme, we perform several numerical simulations of the...
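The wavefront idea behind such parallel sweeps can be pictured with a small sketch (a generic illustration of wavefront grouping, not the authors' ray-grouping code): for a sweep in the (+x,+y,+z) direction, all cells with the same i+j+k depend only on cells in earlier wavefronts, so each group can be processed fully in parallel.

```python
def wavefronts(n):
    """Group the cells of an n^3 grid into wavefronts for a (+x,+y,+z) sweep.

    Cells sharing the same index sum i+j+k form one wavefront: their
    upstream neighbours all lie in earlier fronts, so the whole front
    can be updated concurrently.
    """
    groups = {}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                groups.setdefault(i + j + k, []).append((i, j, k))
    return [groups[s] for s in sorted(groups)]

fronts = wavefronts(3)   # 7 fronts, from the single corner cell outward
```

The available parallelism grows and then shrinks as the front sweeps across the grid, which is why multiple simultaneous fronts (one per sweep direction or node) help keep many-core hardware busy.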
Hybrid shared/distributed parallelism for 3D characteristics transport solvers
In this paper, we present a new hybrid parallel model for solving large-scale 3-dimensional neutron transport problems used in nuclear reactor simulations. Large heterogeneous reactor problems, like the ones that occur when simulating Candu cores, have remained computationally intensive and impractical for routine applications on single-node or even vector computers. Based on the characteristics method, this new model is designed to solve the transport equation after distributing the calculation load on a network of shared-memory multiprocessors. The tracks are either generated on the fly at each characteristics sweep or stored in sequential files. Load balancing is taken into account by estimating the calculation load of tracks and by distributing batches of uniform load on each node of the network. Moreover, the communication overhead can be predicted after benchmarking the latency and bandwidth using an appropriate network test suite. These models are useful for predicting the performance of the parallel applications and for analyzing the scalability of the parallel systems. (authors)
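The distribution of track batches by estimated load can be sketched with a greedy longest-processing-time heuristic (a hypothetical stand-in for the paper's estimate-based balancing): sort tracks by cost and always assign the next one to the currently lightest node.

```python
def batch_tracks(track_loads, nnodes):
    """Distribute characteristic tracks into per-node batches of near-equal load.

    Greedy LPT heuristic: process tracks from most to least expensive,
    giving each to the node with the smallest accumulated load so far.
    """
    batches = [[] for _ in range(nnodes)]
    loads = [0.0] * nnodes
    for cost in sorted(track_loads, reverse=True):
        i = loads.index(min(loads))    # lightest node gets the next track
        batches[i].append(cost)
        loads[i] += cost
    return batches, loads

batches, loads = batch_tracks([5, 4, 3, 3, 2, 2, 1], nnodes=2)
```

With good per-track cost estimates, this kind of static balancing keeps every node busy for a full characteristics sweep without the overhead of dynamic work stealing.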
3D parallel-detection microwave tomography for clinical breast imaging
Epstein, N. R., E-mail: nepstein@ucalgary.ca [Schulich School of Engineering, University of Calgary, 2500 University Dr. NW, Calgary, Alberta T2N 1N4 (Canada); Meaney, P. M. [Thayer School of Engineering, Dartmouth College, 14 Engineering Dr., Hanover, New Hampshire 03755 (United States); Paulsen, K. D. [Thayer School of Engineering, Dartmouth College, 14 Engineering Dr., Hanover, New Hampshire 03755 (United States); Department of Radiology, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire 03755 (United States); Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire 03756 (United States); Advanced Surgical Center, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire 03756 (United States)
2014-12-15
A biomedical microwave tomography system with 3D-imaging capabilities has been constructed and translated to the clinic. Updates to the hardware and reconfiguration of the electronic-network layouts in a more compartmentalized construct have streamlined system packaging. Upgrades to the data acquisition and microwave components have increased data-acquisition speeds and improved system performance. By incorporating analog-to-digital boards that accommodate the linear amplification and dynamic-range coverage our system requires, a complete set of data (for a fixed array position at a single frequency) is now acquired in 5.8 s. Replacement of key components (e.g., switches and power dividers) by devices with improved operational bandwidths has enhanced system response over a wider frequency range. High-integrity, low-power signals are routinely measured down to −130 dBm for frequencies ranging from 500 to 2300 MHz. Adequate inter-channel isolation has been maintained, and a dynamic range >110 dB has been achieved for the full operating frequency range (500–2900 MHz). For our primary band of interest, the associated measurement deviations are less than 0.33% and 0.5° for signal amplitude and phase values, respectively. A modified monopole antenna array (composed of two interwoven eight-element sub-arrays), in conjunction with an updated motion-control system capable of independently moving the sub-arrays to various in-plane and cross-plane positions within the illumination chamber, has been configured in the new design for full volumetric data acquisition. Signal-to-noise ratios (SNRs) are more than adequate for all transmit/receive antenna pairs over the full frequency range and for the variety of in-plane and cross-plane configurations. For proximal receivers, in-plane SNRs greater than 80 dB are observed up to 2900 MHz, while cross-plane SNRs greater than 80 dB are seen for 6 cm sub-array spacing (for frequencies up to 1500 MHz). We demonstrate accurate
The Experience of Large Computational Programs Parallelization. Parallel Version of MINUIT Program
Sapozhnikov, A P
2003-01-01
Problems arising in the parallelization of large computational programs are discussed. As an example, a parallel version of MINUIT, a widely used minimization program, is introduced. Results of testing the MPI-based MINUIT on a multiprocessor system demonstrate the parallelism actually achieved.
Knowledge rule base for the beam optics program TRACE 3-D
An expert-system type of knowledge rule base has been developed for the input parameters used by the particle beam transport program TRACE 3-D. The goal has been to provide the program's user with adequate on-screen information to allow him to initially set up a problem with minimal "off-line" calculations. The focus of this work has been in developing rules for the parameters which define the beam line transport elements. Ten global parameters, the particle mass and charge, beam energy, etc., are used to provide "expert" estimates of lower and upper limits for each of the transport element parameters. For example, the limits for the field strength of the quadrupole element are based on a water-cooled, iron-core electromagnet with dimensions derived from practical engineering constraints, and the upper limit for the effective length is scaled with the particle momentum so that initially parallel trajectories do not cross the axis inside the magnet. Limits for the quadrupole doublet and triplet parameters incorporate these rules and additional rules based on stable FODO lattices and bidirectional focusing requirements. The structure of the rule base is outlined and examples for the quadrupole singlet, doublet and triplet are described. The rule base has been implemented within the Shell for Particle Accelerator Related Codes (SPARC) graphical user interface (GUI)
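The shape of such a rule base can be sketched as parameter limits derived from a few global quantities (everything below is an illustrative placeholder: the parameter names, bounds, and the momentum scaling are invented for the sketch and are not the actual TRACE 3-D rules).

```python
def check_limits(params, rules):
    """Flag transport-element parameters against rule-base limits.

    rules maps a parameter name to a (lo, hi) pair; returns a dict of
    name -> bool indicating whether the value lies within its limits.
    """
    return {name: (lo <= params[name] <= hi)
            for name, (lo, hi) in rules.items()}

momentum = 500.0                        # MeV/c, hypothetical beam
rules = {
    # fixed bound, standing in for an iron-core-magnet constraint
    "quad_gradient_T_per_m": (0.1, 25.0),
    # upper limit grows with momentum, standing in for the no-axis-crossing rule
    "quad_eff_length_m": (0.05, 0.002 * momentum),
}
ok = check_limits({"quad_gradient_T_per_m": 12.0, "quad_eff_length_m": 1.5},
                  rules)
```

The point of the design is that the user sees, next to each input field, limits that already reflect the current beam energy and species rather than static defaults.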
The linear Boltzmann transport equation (BTE) is an integro-differential equation arising in deterministic models of neutral and charged particle transport. In slab (one-dimensional Cartesian) geometry and certain higher-dimensional cases, Diffusion Synthetic Acceleration (DSA) is known to be an effective algorithm for the iterative solution of the discretized BTE. Fourier and asymptotic analyses have been applied to various idealizations (e.g., problems on infinite domains with constant coefficients) to obtain sharp bounds on the convergence rate of DSA in such cases. While DSA has been shown to be a highly effective acceleration (or preconditioning) technique in one-dimensional problems, it has been observed to be less effective in higher dimensions. This is due in part to the expense of solving the related diffusion linear system. We investigate here the effectiveness of a parallel semicoarsening multigrid (SMG) solution approach to DSA preconditioning in several three dimensional problems. In particular, we consider the algorithmic and implementation scalability of a parallel SMG-DSA preconditioner on several types of test problems
Parallel unstructured mesh optimisation for 3D radiation transport and fluids modelling
In this paper we describe the theory and application of a parallel mesh optimisation procedure to obtain self-adapting finite element solutions on unstructured tetrahedral grids. The optimisation procedure adapts the tetrahedral mesh to the solution of a radiation transport or fluid flow problem without sacrificing the integrity of the boundary (geometry), or internal boundaries (regions) of the domain. The objective is to obtain a mesh which has both a uniform interpolation error in any direction and the element shapes are of good quality. This is accomplished with use of a non-Euclidean (anisotropic) metric which is related to the Hessian of the solution field. Appropriate scaling of the metric enables the resolution of multi-scale phenomena as encountered in transient incompressible fluids and multigroup transport calculations. The resulting metric is used to calculate element size and shape quality. The mesh optimisation method is based on a series of mesh connectivity and node position searches of the landscape defining mesh quality which is gauged by a functional. The mesh modification thus fits the solution field(s) in an optimal manner. The parallel mesh optimisation/adaptivity procedure presented in this paper is of general applicability. We illustrate this by applying it to a transient CFD (computational fluid dynamics) problem. Incompressible flow past a cylinder at moderate Reynolds numbers is modelled to demonstrate that the mesh can follow transient flow features. (authors)
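The Hessian-based anisotropic metric described above can be sketched as follows (a minimal illustration of the standard construction, with illustrative clipping bounds; the paper additionally scales the metric for multi-scale resolution): take absolute eigenvalues of the solution Hessian, clip them to bound element sizes, and rebuild the metric tensor.

```python
import numpy as np

def hessian_metric(H, eps_lo=1e-3, eps_hi=1e3):
    """Anisotropic mesh metric from a (symmetric) solution Hessian.

    M = R |L| R^T with eigenvalues clipped to [eps_lo, eps_hi]. The desired
    edge length along eigendirection d is ~ 1/sqrt(lambda_d), so the mesh
    is refined across sharp solution features and stretched along them.
    """
    lam, R = np.linalg.eigh(H)
    lam = np.clip(np.abs(lam), eps_lo, eps_hi)
    return R @ np.diag(lam) @ R.T

H = np.array([[100.0, 0.0], [0.0, -1.0]])     # sharp variation in x only
M = hessian_metric(H)
# edge lengths implied by the metric, smallest across the sharp feature
sizes = 1.0 / np.sqrt(np.sort(np.linalg.eigvalsh(M)))
```

Measuring edge lengths in this non-Euclidean metric is what lets one functional drive both the element size and the element shape during the connectivity and node-position searches.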
A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner for scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated
3D interconnect architecture for high-bandwidth massively paralleled imager
Kwiatkowski, K. E-mail: krisk@lanl.gov; Lyke, J.C.; Wojnarowski, R.J.; Beche, J.-F.; Fillion, R.; Kapusta, C.; Millaud, J.; Saia, R.; Wilke, M.D
2003-08-21
The proton radiography group at LANL is developing a fast (5×10^6 frames/s, or 5 megaframe/s) multi-frame imager for use in dynamic radiographic experiments with high-energy protons. The mega-pixel imager will acquire and process a burst of 32 frames captured at an inter-frame time of ∼200 ns. Real-time signal processing and storage requirements for entire frames of rapidly acquired pixels impose severe demands on the space available for the electronics in a standard monolithic approach. As such, a 3D arrangement of detector and circuit elements is under development. In this scheme, the readout integrated circuits (ROICs) are stacked vertically (like playing cards) into a cube configuration. Another die, a fully depleted pixel photo-diode focal plane array (FPA), is bump-bonded to one of the edge surfaces formed by the resulting ROIC cube. Recently, an assembly of the proof-of-principle test cube and sensor has been completed.
Awatsuji, Yasuhiro; Xia, Peng; Wang, Yexin; Matoba, Osamu
2016-03-01
Digital holography is a technique for 3D measurement of objects. The technique uses an image sensor to record an interference fringe image containing the complex amplitude of the object, and numerically reconstructs that complex amplitude by computer. Parallel phase-shifting digital holography is capable of accurate 3D measurement of dynamic objects, because it can reconstruct the complex amplitude of the object, free of the undesired images, from a single hologram. The undesired images are the non-diffraction wave and the conjugate image that are associated with holography. In parallel phase-shifting digital holography, a hologram whose reference-wave phase is shifted spatially and periodically every other pixel is recorded, so that the complex amplitude of the object is obtained in a single-shot exposure. The recorded hologram is decomposed into the multiple holograms required for phase-shifting digital holography, and the complex amplitude of the object, free from the undesired images, is reconstructed from them. To validate parallel phase-shifting digital holography, a high-speed parallel phase-shifting digital holography system was constructed. The system consists of a Mach-Zehnder interferometer, a continuous-wave laser, and a high-speed polarization imaging camera. A phase motion picture of dynamic air flow sprayed from a nozzle was recorded by the system at 180,000 frames per second (FPS). A phase motion picture of dynamic air induced by discharge between two electrodes, with high voltage applied between them, was also recorded at 1,000,000 FPS.
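The pixel-wise decomposition described in this abstract can be sketched in a few lines. This is an illustrative toy, not code from the paper: the 2×2 assignment of phase-shift channels and all names are assumptions, and the interpolation step that fills in the missing pixels of each sub-hologram is omitted.

```python
# Hedged sketch: split a parallel phase-shifting hologram into the four
# sub-holograms needed for four-step phase-shifting reconstruction.
# Assumes the reference-wave phase follows a 2x2 periodic pattern
# (0, pi/2 / pi, 3*pi/2); the paper's exact pixel layout may differ.

def decompose_hologram(hologram):
    """Split a 2D hologram (list of rows) into four phase-shift channels.

    Pixel (y, x) goes to channel 2*(y % 2) + (x % 2). Each channel keeps
    ((y, x), value) pairs; a real implementation would then interpolate
    each sparse channel onto the full grid before reconstruction.
    """
    channels = [[], [], [], []]
    for y, row in enumerate(hologram):
        for x, value in enumerate(row):
            channels[2 * (y % 2) + (x % 2)].append(((y, x), value))
    return channels
```

Each channel then plays the role of one conventionally phase-shifted hologram in the standard four-step reconstruction formula.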
About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems
Loredana MOCEAN; Monica CEACA
2009-01-01
In recent years, efforts have been made to delineate a stable and unified framework in which the problems of logical parallel processing can find solutions, at least at the level of imperative languages. The results obtained so far are not commensurate with the effort invested. This paper is intended as a small contribution to these efforts. We propose an overview of parallel programming, parallel execution and collaborative systems.
Structured Parallel Programming Patterns for Efficient Computation
McCool, Michael; Robison, Arch
2012-01-01
Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th
Implementation of a 3D plasma particle-in-cell code on a MIMD parallel computer
A three-dimensional plasma particle-in-cell (PIC) code has been implemented on the Intel Delta MIMD parallel supercomputer using the General Concurrent PIC algorithm. The GCPIC algorithm uses a domain decomposition to divide the computation among the processors: A processor is assigned a subdomain and all the particles in it. Particles must be exchanged between processors as they move. Results are presented comparing the efficiency for 1-, 2- and 3-dimensional partitions of the three dimensional domain. This algorithm has been found to be very efficient even when a large fraction (e.g. 30%) of the particles must be exchanged at every time step. On the 512-node Intel Delta, up to 125 million particles have been pushed with an electrostatic push time of under 500 nsec/particle/time step
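As a hedged illustration of the domain-decomposition idea in this abstract (not the GCPIC implementation itself: the uniform 1-D slab decomposition and all names are assumptions for the sketch), the exchange step that hands particles to the processor owning their new positions might look like:

```python
# Illustrative sketch of the GCPIC-style particle exchange: each "rank"
# owns a uniform slab of a periodic-free 1D domain [0, length); after the
# push, particles that left the local slab are handed to the new owner.
# A real code would send them via message passing (e.g. MPI) instead of
# rebuilding lists in shared memory.

def owner(x, length, nproc):
    """Rank of the uniform slab subdomain containing position x."""
    rank = int(x / (length / nproc))
    return min(max(rank, 0), nproc - 1)  # clamp boundary round-off

def exchange(particles_per_rank, length):
    """Redistribute particles to the ranks that own their new positions."""
    nproc = len(particles_per_rank)
    new_lists = [[] for _ in range(nproc)]
    for plist in particles_per_rank:
        for x in plist:
            new_lists[owner(x, length, nproc)].append(x)
    return new_lists
```

The abstract's observation that efficiency survives even when ~30% of particles move each step corresponds to this exchange being cheap relative to the field solve and particle push.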
A Case Study of a Hybrid Parallel 3D Surface Rendering Graphics Architecture
Holten-Lund, Hans Erik; Madsen, Jan; Pedersen, Steen
1997-01-01
This paper presents a case study in the design strategy used in building a graphics computer for drawing very complex 3D geometric surfaces. The goal is to build a PC-based computer system capable of handling surfaces built from about 2 million triangles, and to render a perspective view of these on a computer display at interactive frame rates, i.e. processing around 50 million triangles per second. The paper presents a hardware/software architecture called HPGA (Hybrid Parallel Graphics Architecture) which is likely to be able to carry out this task. The case study focuses on techniques to increase the clock frequency as well as the parallelism of the system, and in particular on the back-end graphics pipeline, which is responsible for rasterizing triangles. A pure software implementation of the proposed architecture is currently able to process 300...
Parallel CFD simulation of flow in a 3D model of vibrating human vocal folds
Šidlof, Petr; Horáček, Jaromír; Řidký, V.
2013-01-01
Roč. 80, č. 1 (2013), s. 290-300. ISSN 0045-7930. R&D Projects: GA ČR(CZ) GAP101/11/0207. Institutional research plan: CEZ:AV0Z20760514. Keywords: numerical simulation * vocal folds * glottal airflow * finite volume method * parallel CFD. Subject RIV: BI - Acoustics. Impact factor: 1.532, year: 2013
Hybrid Characteristics: 3D radiative transfer for parallel adaptive mesh refinement hydrodynamics
Rijkhorst, E J; Dubey, A; Mellema, G R; Rijkhorst, Erik-Jan; Plewa, Tomasz; Dubey, Anshu; Mellema, Garrelt
2005-01-01
We have developed a three-dimensional radiative transfer method designed specifically for use with parallel adaptive mesh refinement hydrodynamics codes. This new algorithm, which we call hybrid characteristics, introduces a novel form of ray tracing that can neither be classified as long, nor as short characteristics, but which applies the underlying principles, i.e. efficient execution through interpolation and parallelizability, of both. Primary applications of the hybrid characteristics method are radiation hydrodynamics problems that take into account the effects of photoionization and heating due to point sources of radiation. The method is implemented in the hydrodynamics package FLASH. The ionization, heating, and cooling processes are modelled using the DORIC ionization package. Upon comparison with the long characteristics method, we find that our method calculates the column density with a similarly high accuracy and produces sharp and well defined shadows. We show the quality of the new algorithm ...
A parallel block multi-level preconditioner for the 3D incompressible Navier-Stokes equations
The development of robust and efficient algorithms for both steady-state simulations and fully implicit time integration of the Navier-Stokes equations is an active research topic. To be effective, the linear subproblems generated by these methods require solution techniques that exhibit robust and rapid convergence. In particular, they should be insensitive to parameters in the problem such as mesh size, time step, and Reynolds number. In this context, we explore a parallel preconditioner based on a block factorization of the coefficient matrix generated in an Oseen nonlinear iteration for the primitive variable formulation of the system. The key to this preconditioner is the approximation of a certain Schur complement operator by a technique first proposed by Kay, Loghin, and Wathen [SIAM J. Sci. Comput., 2002] and Silvester, Elman, Kay, and Wathen [J. Comput. Appl. Math. 128 (2001) 261]. The resulting operator entails subsidiary computations (solutions of pressure Poisson and convection-diffusion subproblems) that are similar to those required for decoupled solution methods; however, in this case these solutions are applied as preconditioners to the coupled Oseen system. One important aspect of this approach is that the convection-diffusion and Poisson subproblems are significantly easier to solve than the entire coupled system, and a solver can be built using tools developed for the subproblems. In this paper, we apply smoothed aggregation algebraic multigrid to both subproblems. Previous work has focused on demonstrating the optimality of these preconditioners with respect to mesh size on serial, two-dimensional, steady-state computations employing geometric multi-grid methods; we focus on extending these methods to large-scale, parallel, three-dimensional, transient and steady-state simulations employing algebraic multigrid (AMG) methods. Our results display nearly optimal convergence rates for steady-state solutions as well as for transient solutions over a
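The block factorization this abstract refers to can be written compactly. The sketch below uses the standard notation of the Kay-Loghin-Wathen literature rather than symbols taken from the abstract itself, and sign conventions for the Schur complement vary between papers:

```latex
% Oseen saddle-point system and its block LU factorization
\begin{equation*}
\begin{bmatrix} F & B^{T} \\ B & 0 \end{bmatrix}
=
\begin{bmatrix} I & 0 \\ B F^{-1} & I \end{bmatrix}
\begin{bmatrix} F & B^{T} \\ 0 & S \end{bmatrix},
\qquad S = -B F^{-1} B^{T}.
\end{equation*}
% The pressure convection-diffusion approximation replaces the dense
% Schur complement S by products of sparse pressure-space operators:
\begin{equation*}
S^{-1} \approx - M_{p}^{-1} F_{p} A_{p}^{-1},
\end{equation*}
% where M_p is a pressure mass matrix, F_p a pressure convection-diffusion
% operator, and A_p a pressure Poisson (Laplacian) operator -- the
% "subsidiary computations" mentioned in the abstract, each of which the
% paper solves with smoothed aggregation algebraic multigrid.
```

Applying the preconditioner therefore costs one convection-diffusion solve with F, one pressure Poisson solve with A_p, and a mass-matrix solve, which is why it reuses tools built for decoupled methods.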
3D Profile Filter Algorithm Based on Parallel Generalized B-spline Approximating Gaussian
REN Zhiying; GAO Chenghui; SHEN Ding
2015-01-01
Currently, approximation methods for the Gaussian filter by other spline filters have been developed. However, these methods are only suitable for one-dimensional filtering; when they are used for three-dimensional filtering, a rounding error and a quantization error are passed along at every stage. In this paper, a new, high-precision implementation approach for the Gaussian filter is described, which is suitable for three-dimensional reference filtering. Based on the theory of the generalized B-spline function and the variational principle, the transmission characteristics of a digital filter can be changed through the sensitivity of the parameters (t1, t2), and the rounding and quantization errors can be reduced by arranging the filter in a parallel form instead of the cascade form. Finally, the approximating filter for the Gaussian filter is obtained. In order to verify the feasibility of the new algorithm, reference extraction by the conventional methods is also carried out for comparison. The experiments are conducted on a measured optical surface, and the results show that the total calculation by the new algorithm requires only 0.07 s for 480×480 data points; the amplitude deviation between the reference of the parallel-form filter and the Gaussian filter is smaller; and, by an analysis of three-dimensional roughness parameters, the new method is closer to the characteristics of the Gaussian filter than the cascade generalized B-spline approximation. The new algorithm is thus both efficient and accurate for implementing the Gaussian filter in surface roughness measurement.
The ParaScope parallel programming environment
Cooper, Keith D.; Hall, Mary W.; Hood, Robert T.; Kennedy, Ken; Mckinley, Kathryn S.; Mellor-Crummey, John M.; Torczon, Linda; Warren, Scott K.
1993-01-01
The ParaScope parallel programming environment, developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope's compilation system, its parallel program editor, and its parallel debugging system. The compilation system extends the traditional single-procedure compiler by providing a mechanism for managing the compilation of complete programs. Thus, ParaScope can support both traditional single-procedure optimization and optimization across procedure boundaries. The ParaScope editor brings both compiler analysis and user expertise to bear on program parallelization. It assists the knowledgeable user by displaying and managing analysis and by providing a variety of interactive program transformations that are effective in exposing parallelism. The debugging system detects and reports timing-dependent errors, called data races, in execution of parallel programs. The system combines static analysis, program instrumentation, and run-time reporting to provide a mechanical system for isolating errors in parallel program executions. Finally, we describe a new project to extend ParaScope to support programming in FORTRAN D, a machine-independent parallel programming language intended for use with both distributed-memory and shared-memory parallel computers.
Parallel Programming in the Age of Ubiquitous Parallelism
Pingali, Keshav
2014-04-01
Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. During this time, our community has explored many promising approaches including functional and dataflow languages, logic programming, and automatic parallelization using program analysis and restructuring, but none of these approaches has succeeded except in a few niche application areas. In this talk, I will argue that these problems arise largely from the computation-centric foundations and abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation in which algorithms are described in terms of actions on data. The operator formulation shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation/refinement/partitioning and SAT solvers. Regular algorithms emerge as a special case of irregular ones, and many application-specific optimization techniques can be generalized to a broader context. The operator formulation also leads to a structural analysis of algorithms called TAO-analysis that provides implementation guidelines for exploiting parallelism efficiently. Finally, I will describe a system called Galois based on these ideas for exploiting amorphous data-parallelism on multicores and GPUs
Synergia: A hybrid, parallel beam dynamics code with 3D space charge
James F. Amundson; Panagiotis Spentzouris
2003-07-09
We describe Synergia, a hybrid code developed under the DOE SciDAC-supported Accelerator Simulation Program. The code combines and extends the existing accelerator modeling packages IMPACT and beamline/mxyzptlk. We discuss the design and implementation of Synergia, its performance on different architectures, and its potential applications.
A 3D MPI-Parallel GPU-accelerated framework for simulating ocean wave energy converters
Pathak, Ashish; Raessi, Mehdi
2015-11-01
We present an MPI-parallel GPU-accelerated computational framework for studying the interaction between ocean waves and wave energy converters (WECs). The computational framework captures the viscous effects, nonlinear fluid-structure interaction (FSI), and breaking of waves around the structure, which cannot be captured in many potential flow solvers commonly used for WEC simulations. The full Navier-Stokes equations are solved using the two-step projection method, which is accelerated by porting the pressure Poisson equation to GPUs. The FSI is captured using the numerically stable fictitious domain method. A novel three-phase interface reconstruction algorithm is used to resolve three phases in a VOF-PLIC context. A consistent mass and momentum transport approach enables simulations at high density ratios. The accuracy of the overall framework is demonstrated via an array of test cases. Numerical simulations of the interaction between ocean waves and WECs are presented. Funding from the National Science Foundation CBET-1236462 grant is gratefully acknowledged.
Borazjani, Iman; Ge, Liang; Le, Trung; Sotiropoulos, Fotis
2013-04-01
We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to simulate a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. Complex bodies undergoing arbitrarily large deformations may be embedded within the overset-curvilinear background grid and treated as sharp interfaces using the curvilinear immersed boundary (CURVIB) method (Ge and Sotiropoulos, Journal of Computational Physics, 2007). The incompressible flow equations are formulated in a general non-inertial frame of reference to enhance the overall versatility and efficiency of the numerical approach. Efficient search algorithms to identify areas requiring blanking, donor cells, and interpolation coefficients for constructing the boundary conditions at grid interfaces of the overset grid are developed and implemented using efficient parallel computing communication strategies to transfer information among sub-domains. The governing equations are discretized using a second-order accurate finite-volume approach and integrated in time via an efficient fractional-step method. Various strategies for ensuring globally conservative interpolation at grid interfaces suitable for incompressible flow fractional step methods are implemented and evaluated. The method is verified and validated against experimental data, and its capabilities are demonstrated by simulating the flow past multiple aquatic swimmers and the systolic flow in an anatomic left ventricle with a mechanical heart valve implanted in the aortic position. PMID:23833331
Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María
2014-06-01
We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for a coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with the biconjugate gradient stabilized method. The results have shown that the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, in cases in which other preconditioners succeed in converging to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code, by up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures a grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the
A Parallel Programming Model with Sequential Semantics
Thornley, John
1996-01-01
Parallel programming is more difficult than sequential programming in part because of the complexity of reasoning, testing, and debugging in the context of concurrency. In this thesis, we present and investigate a parallel programming model that provides direct control of parallelism in a notation with sequential semantics. Our model consists of a standard sequential imperative programming notation extended with the following three pragmas: (1) The parallelizable sequence of statements pragma...
Adobe Flash 11 Stage3D (Molehill) Game Programming Beginner's Guide
Kaitila, Christer
2011-01-01
Written in an informal and friendly manner, the style and approach of this book will take you on an exciting adventure. Piece by piece, detailed examples help you along the way by providing real-world game code required to make a complete 3D video game. Each chapter builds upon the experience and achievements earned in the last, culminating in the ultimate prize - your game! If you ever wanted to make your own 3D game in Flash, then this book is for you. This book is a perfect introduction to 3D game programming in Adobe Molehill for complete beginners. You do not need to know anything about S
Non-Iterative Rigid 2D/3D Point-Set Registration Using Semidefinite Programming
Khoo, Yuehaw; Kapoor, Ankur
2016-07-01
We describe a convex programming framework for pose estimation in 2D/3D point-set registration with unknown point correspondences. We give two mixed-integer nonlinear program (MINP) formulations of the 2D/3D registration problem when there are multiple 2D images, and propose convex relaxations for both of the MINPs to semidefinite programs (SDP) that can be solved efficiently by interior point methods. Our approach to the 2D/3D registration problem is non-iterative in nature as we jointly solve for pose and correspondence. Furthermore, these convex programs can readily incorporate feature descriptors of points to enhance registration results. We prove that the convex programs exactly recover the solution to the original nonconvex 2D/3D registration problem under noiseless condition. We apply these formulations to the registration of 3D models of coronary vessels to their 2D projections obtained from multiple intra-operative fluoroscopic images. For this application, we experimentally corroborate the exact recovery property in the absence of noise and further demonstrate robustness of the convex programs in the presence of noise.
Productive Parallel Programming: The PCN Approach
Ian Foster; Robert Olson; Steven Tuecke
1992-01-01
We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel compute...
Towards Parallel Programming Models for Predictability
Lisper, Björn
2012-01-01
Future embedded systems for performance-demanding applications will be massively parallel. High performance tasks will be parallel programs, running on several cores, rather than single threads running on single cores. For hard real-time applications, WCETs for such tasks must be bounded. Low-level parallel programming models, based on concurrent threads, are notoriously hard to use due to their inherent nondeterminism. Therefore the parallel processing community has long considered high-l...
Gainullin, I. K.; Sonkin, M. A.
2015-03-01
A parallelized three-dimensional (3D) time-dependent Schrödinger equation (TDSE) solver for one-electron systems is presented in this paper. The TDSE Solver is based on the finite-difference method (FDM) in Cartesian coordinates and uses a simple and explicit leap-frog numerical scheme. The simplicity of the numerical method provides very efficient parallelization and high performance of calculations using Graphics Processing Units (GPUs). For example, calculation of 10^6 time-steps on the 1000×1000×1000 numerical grid (10^9 points) takes only 16 hours on 16 Tesla M2090 GPUs. The TDSE Solver demonstrates scalability (parallel efficiency) close to 100% with some limitations on the problem size. The TDSE Solver is validated by calculation of energy eigenstates of the hydrogen atom (13.55 eV) and affinity level of H- ion (0.75 eV). The comparison with other TDSE solvers shows that a GPU-based TDSE Solver is 3 times faster for the problems of the same size and with the same cost of computational resources. The usage of a non-regular Cartesian grid or problem-specific non-Cartesian coordinates increases this benefit up to 10 times. The TDSE Solver was applied to the calculation of the resonant charge transfer (RCT) in nanosystems, including several related physical problems, such as electron capture during H+-H0 collision and electron tunneling between H- ion and thin metallic island film.
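A minimal 1-D sketch of the explicit leap-frog scheme this abstract describes (psi^{n+1} = psi^{n-1} - 2i*dt*H*psi^n). The real solver is 3-D, GPU-parallel, and far more elaborate; the grid details, zero boundary conditions, atomic units (hbar = m = 1), and names below are illustrative assumptions only:

```python
# Hedged sketch of the leap-frog TDSE time step in 1D, with H discretized
# by central finite differences and zero (hard-wall) boundary values.

def hamiltonian(psi, dx, potential):
    """Apply H = -0.5 d^2/dx^2 + V to a wavefunction (list of complex)."""
    n = len(psi)
    out = [0j] * n  # boundary values stay zero
    for i in range(1, n - 1):
        lap = (psi[i - 1] - 2 * psi[i] + psi[i + 1]) / dx**2
        out[i] = -0.5 * lap + potential[i] * psi[i]
    return out

def leapfrog_step(psi_prev, psi_cur, dt, dx, potential):
    """Advance one step: psi_next = psi_prev - 2j * dt * H(psi_cur)."""
    h = hamiltonian(psi_cur, dx, potential)
    return [p - 2j * dt * hp for p, hp in zip(psi_prev, h)]
```

Because the update touches only nearest neighbours, the scheme maps naturally onto the domain-decomposed GPU parallelization the abstract reports, with only halo exchanges between subdomains each step.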
The analysis of visual parallel programming languages
Vladimir Averbukh; Mikhail Bakhterev
2013-01-01
The paper is devoted to the analysis of state of the art in visual parallel programming languages. The brief history of this domain is described. The diagrammatic imagery of visual languages is analyzed. Limitations of the diagrammatic approach are revealed. The additional type of visual parallel programming languages (action language) is described. Some problems of perception of visualization for parallel computing are considered. Some approaches to the evaluation of visual programming langu...
Ma, Yingliang; Saetzler, Kurt
2008-01-01
In this paper we describe a novel 3D subdivision strategy to extract the surface of binary image data. This iterative approach generates a series of surface meshes that capture different levels of detail of the underlying structure. At the highest level of detail, the resulting surface mesh generated by our approach uses only about 10% of the triangles in comparison to the marching cube algorithm (MC), even in settings where almost no image noise is present. Our approach also eliminates the so-called "staircase effect" which voxel based algorithms like the MC are likely to show, particularly if non-uniformly sampled images are processed. Finally, we show how the presented algorithm can be parallelized by subdividing 3D image space into rectilinear blocks of subimages. As the algorithm scales very well with an increasing number of processors in a multi-threaded setting, this approach is suited to process large image data sets of several gigabytes. Although the presented work is still computationally more expensive than simple voxel-based algorithms, it produces fewer surface triangles while capturing the same level of detail, is more robust towards image noise and eliminates the above-mentioned "staircase" effect in anisotropic settings. These properties make it particularly useful for biomedical applications, where these conditions are often encountered. PMID:17993710
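The block-decomposition parallelization sketched in this abstract can be illustrated schematically. In the toy below, the per-block work is replaced by a trivial foreground-voxel count (the real per-block work would be the surface-subdivision step), and all names are illustrative assumptions:

```python
# Hedged sketch: tile a 3D volume into z-slabs and process the slabs in
# worker threads, then combine the per-block results.
from concurrent.futures import ThreadPoolExecutor

def split_extent(n, size):
    """Tile [0, n) into half-open chunks of at most `size`."""
    return [(s, min(s + size, n)) for s in range(0, n, size)]

def process_volume(volume, block=2, workers=4):
    """Process each z-slab of a 3D volume (nested lists) in parallel.

    Here the per-block work just counts foreground voxels; a real
    implementation would run surface extraction per block and stitch the
    meshes at block faces.
    """
    def work(zrange):
        z0, z1 = zrange
        return sum(v for plane in volume[z0:z1] for row in plane for v in row)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work, split_extent(len(volume), block)))
```

The scaling claim in the abstract corresponds to the blocks being independent except at their shared faces, so work distributes evenly across threads.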
iPhone 3D Programming Developing Graphical Applications with OpenGL ES
Rideout, Philip
2010-01-01
What does it take to build an iPhone app with stunning 3D graphics? This book will show you how to apply OpenGL graphics programming techniques to any device running the iPhone OS -- including the iPad and iPod Touch -- with no iPhone development or 3D graphics experience required. iPhone 3D Programming provides clear step-by-step instructions, as well as lots of practical advice, for using the iPhone SDK and OpenGL. You'll build several graphics programs -- progressing from simple to more complex examples -- that focus on lighting, textures, blending, augmented reality, optimization for pe
PDDP, A Data Parallel Programming Model
Karen H. Warren
1996-01-01
PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements high-performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.
HEXBU-3D, a three-dimensional PWR-simulator program for hexagonal fuel assemblies
HEXBU-3D is a three-dimensional nodal simulator program for PWR reactors. It is designed for a reactor core that consists of hexagonal fuel assemblies and of big follower-type control assemblies. The program solves two-group diffusion equations in homogenized fuel assembly geometry by a sophisticated nodal method. The treatment of feedback effects from xenon poisoning, fuel temperature, moderator temperature and density, and soluble boron concentration is included in the program. The nodal equations are solved by a fast two-level iteration technique, and the eigenvalue can be either the effective multiplication factor or the boron concentration of the moderator. Burnup calculations are performed by tabulated sets of burnup-dependent cross sections evaluated by a cell burnup program. HEXBU-3D was originally programmed in FORTRAN V for the UNIVAC 1108 computer, but another version is operable on the CDC CYBER 170 computer. (author)
Parallel Programming Archetypes in Combinatorics and Optimization
Kryukova, Svetlana A
1995-01-01
A Parallel Programming Archetype is a language-independent program design strategy. We describe two archetypes in combinatorics and optimization, their components, implementations, and example applications developed using an archetype.
Generation of Distributed Parallel Java Programs
Launay, Pascale; Pazat, Jean-Louis
1998-01-01
The aim of the Do! project is to ease the standard task of programming distributed applications using Java. This paper gives an overview of the parallel and distributed frameworks and describes the mechanisms developed to distribute programs with Do!.
Parallel programming with PCN. Revision 1
Foster, I.; Tuecke, S.
1991-12-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).
Parallel programming characteristics of a DSP-based parallel system
GAO Shu; GUO Qing-ping
2006-01-01
This paper first introduces the structure and working principle of a DSP-based parallel system, the parallel accelerating board, and the SHARC DSP chip. It then investigates the system's programming characteristics, especially the mode of communication, discussing how to design parallel algorithms and presenting a domain-decomposition-based complete multi-grid parallel algorithm with virtual boundary forecast (VBF) to solve large-scale, complicated heat problems. Finally, the Mandelbrot Set and a non-linear heat transfer equation of a ceramic/metal composite material are taken as examples to illustrate the implementation of the proposed algorithm. The results show that the solutions are highly efficient and achieve linear speedup.
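The domain-decomposition idea at the heart of such algorithms can be sketched for a 1D explicit heat-conduction step (a minimal illustration, not the paper's VBF multi-grid method): the grid is split into two subdomains, each carrying one ghost cell from its neighbour, and the decomposed update must reproduce the undivided one.

```python
import numpy as np

def heat_step(u, alpha=0.25):
    # Explicit update of the 1D heat equation; endpoints are fixed boundaries
    new = u.copy()
    new[1:-1] = u[1:-1] + alpha * (u[:-2] - 2 * u[1:-1] + u[2:])
    return new

def heat_step_decomposed(u, alpha=0.25):
    # Split the grid between two "processors" with a one-cell overlap:
    # each subdomain's extra cell is the ghost value it needs from its neighbour
    mid = len(u) // 2
    left, right = u[:mid + 1], u[mid - 1:]
    left, right = heat_step(left, alpha), heat_step(right, alpha)
    # Drop the ghost cells when stitching the global solution back together
    return np.concatenate([left[:-1], right[1:]])

u0 = np.zeros(16)
u0[8] = 1.0
a, b = heat_step(u0), heat_step_decomposed(u0)
assert np.allclose(a, b)  # decomposed update matches the global one
```

In the paper's VBF scheme, values at these virtual boundaries are forecast rather than freshly exchanged every iteration, reducing synchronization; in this sketch the exchange is exact.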
Massively Parallel Finite Element Programming
Heister, Timo
2010-01-01
Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
Parallel programming of industrial applications
Heroux, M; Koniges, A; Simon, H
1998-07-21
In the introductory material, we overview the typical MPP environment for real application computing and the special tools available, such as parallel debuggers and performance analyzers. Next, we draw from a series of real application codes and discuss the specific challenges and problems that are encountered in parallelizing these individual applications. The application areas drawn from include biomedical sciences, materials processing and design, plasma and fluid dynamics, and others. We show how it was possible to get a particular application to run efficiently and what steps were necessary. Finally we end with a summary of the lessons learned from these applications and predictions for the future of industrial parallel computing. This tutorial is based on material from a forthcoming book entitled "Industrial Strength Parallel Computing", to be published by Morgan Kaufmann Publishers (ISBN 1-55860-54).
On the Dynamic Programming Approach for the 3D Navier-Stokes Equations
The dynamic programming approach for the control of a 3D flow governed by the stochastic Navier-Stokes equations for incompressible fluid in a bounded domain is studied. By a compactness argument, existence of solutions for the associated Hamilton-Jacobi-Bellman equation is proved. Finally, existence of an optimal control through the feedback formula and of an optimal state is discussed
Towards Distributed Memory Parallel Program Analysis
Quinlan, D; Barany, G; Panas, T
2008-06-17
This paper presents a parallel attribute evaluation for distributed memory parallel computer architectures, where previously only shared memory parallel support for this technique had been developed. Attribute evaluation is part of how attribute grammars are used for program analysis within modern compilers. Within this work, we have extended ROSE, an open compiler infrastructure, with a distributed memory parallel attribute evaluation mechanism to support user-defined global program analysis required for some forms of security analysis which cannot be addressed by a file-by-file view of large scale applications. As a result, user-defined security analyses may now run in parallel without the user having to specify the way data is communicated between processors. The automation of communication enables an extensible open-source parallel program analysis infrastructure.
Development and application of 3D core fuel management program for HFETR
The author introduces the principle and function of the 3D core fuel management program HFM for the High Flux Engineering Test Reactor (HFETR). Calculations are performed for five reactor cores on the HFETR critical assembly and the first three cycles of HFETR. Results show that the adopted cell and core calculation model and method are correct, and good consistency is obtained between calculational results and experimental values. Therefore, the HFM program can be used for core fuel management of HFETR rapidly and accurately.
Productive Parallel Programming: The PCN Approach
Ian Foster
1992-01-01
We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.
A survey of parallel programming tools
Cheng, Doreen Y.
1991-01-01
This survey examines 39 parallel programming tools. Focus is placed on those tool capabilities needed for parallel scientific programming rather than for general computer science. The tools are classified with current and future needs of the Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions; tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.
2D/3D Program work summary report, January 1988--December 1992
The 2D/3D Program was carried out by Germany, Japan and the United States to investigate the thermal-hydraulics of a PWR large-break LOCA. A contributory approach was utilized in which each country contributed significant effort to the program and all three countries shared the research results. Germany constructed and operated the Upper Plenum Test Facility (UPTF), and Japan constructed and operated the Cylindrical Core Test Facility (CCTF) and the Slab Core Test Facility (SCTF). The US contribution consisted of provision of advanced instrumentation to each of the three test facilities, and assessment of the TRAC computer code against the test results. Evaluations of the test results were carried out in all three countries. This report summarizes the 2D/3D Program in terms of the contributing efforts of the participants
Analysis results from the Los Alamos 2D/3D program
Los Alamos National Laboratory is a participant in the 2D/3D program. Activities conducted at Los Alamos National Laboratory in support of 2D/3D program goals include analysis support of facility design, construction, and operation; provision of boundary and initial conditions for test-facility operations based on analysis of pressurized water reactors; performance of pretest and posttest predictions and analyses; and use of experimental results to validate and assess the single- and multi-dimensional, nonequilibrium features in the Transient Reactor Analysis Code (TRAC). During fiscal year 1987, Los Alamos conducted analytical assessment activities using data from the Slab Core Test Facility, The Cylindrical Core Test Facility, and the Upper Plenum Test Facility. Finally, Los Alamos continued work to provide TRAC improvements. In this paper, Los Alamos activities during fiscal year 1987 will be summarized; several significant accomplishments will be described in more detail to illustrate the work activities at Los Alamos
Advanced 3D Audio Algorithms by a Flexible, Low Level Application Programming Interface
Simeonov, A; Zoia, G.; Lluis Garcia, R.; Mlynek, D.
2004-01-01
The constantly increasing demand for a better quality in sound and video for multimedia content and virtual reality compels the implementation of more and more sophisticated 3D audio models in authoring and playback tools. A very careful and systematic analysis of the best available development libraries in this area was carried out, considering different Application Programming Interfaces, their features, extensibility, and portability among each other. The results show that it is often diff...
Integrated Task and Data Parallel Programming
Grimshaw, A. S.
1998-01-01
This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities: During the fall I collaborated
Koldan, Jelena
2013-01-01
The growing significance, technical development and employment of electromagnetic (EM) methods in exploration geophysics have led to the increasing need for reliable and fast techniques of interpretation of 3-D EM data sets acquired in complex geological environments. The first and most important step to creating an inversion method is the development of a solver for the forward problem. In order to create an efficient, reliable and practical 3-D EM inversion, it is necessary to have a 3-D EM...
The kpx, a program analyzer for parallelization
The kpx is a program analyzer, developed as a common technological basis for promoting parallel processing. The kpx consists of three tools. The first is ktool, which shows how much execution time is spent in program segments. The second is ptool, which shows parallelization overhead on the Paragon system. The last is xtool, which shows parallelization overhead on the VPP system. The kpx, designed to work for any FORTRAN code on any UNIX computer, is confirmed to work well after testing on Paragon, SP2, SR2201, VPP500, VPP300, Monte-4, SX-4 and T90. (author)
Speedup predictions on large scientific parallel programs
How much speedup can we expect for large scientific parallel programs running on supercomputers? For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N is greater than or equal to 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory.
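The measured figure invites an Amdahl's-law reading (our illustration, not the paper's prediction model): a speedup of 14.4 on 16 processors corresponds to a serial fraction below one percent.

```python
def amdahl_speedup(serial_frac, n_procs):
    """Amdahl's law: predicted speedup for serial fraction f on N processors."""
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n_procs)

def serial_fraction(speedup, n_procs):
    """Invert Amdahl's law to estimate the serial fraction from a measurement."""
    return (1.0 / speedup - 1.0 / n_procs) / (1.0 - 1.0 / n_procs)

# The reported speedup of 14.4 on 16 processors implies f ~ 0.0074,
# i.e. less than 1% of the work is serial
f = serial_fraction(14.4, 16)
print(f"serial fraction ~ {f:.4f}")
print(f"predicted speedup on 64 processors ~ {amdahl_speedup(f, 64):.1f}")
```

Note that Amdahl's law ignores communication and load-imbalance costs, so such extrapolations are upper bounds rather than predictions.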
The PISCES 2 parallel programming environment
Pratt, Terrence W.
1987-01-01
PISCES 2 is a programming environment for scientific and engineering computations on MIMD parallel computers. It is currently implemented on a flexible FLEX/32 at NASA Langley, a 20 processor machine with both shared and local memories. The environment provides an extended Fortran for applications programming, a configuration environment for setting up a run on the parallel machine, and a run-time environment for monitoring and controlling program execution. This paper describes the overall design of the system and its implementation on the FLEX/32. Emphasis is placed on several novel aspects of the design: the use of a carefully defined virtual machine, programmer control of the mapping of virtual machine to actual hardware, forces for medium-granularity parallelism, and windows for parallel distribution of data. Some preliminary measurements of storage use are included.
Steam generator experiment for 3-D computer code qualification - CLOTAIRE international program
The current 1988/89 test program focuses on the production of accurate data sets dedicated to the qualification of both 3-D thermalhydraulic codes and flow-induced vibration predictive tools. In order to meet these challenging objectives the test program includes: detailed measurements of two-phase flow distributions relying on advanced optical probe techniques, throughout the bundle straight part; investigations at the same time of flow distributions and of the tubes' vibratory responses in the U-bend region; for a limited number of preselected positions, measurements of the emulsion's changing characteristics during transient sequences similar to those in an actual plant. (orig./DG)
Portable parallel programming in a Fortran environment
Experience using the Argonne-developed PARMACS macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network-based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a ''nearly realistic'' lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs
Parallel programming with PCN. Revision 2
Foster, I.; Tuecke, S.
1993-01-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.
Wireless Rover Meets 3D Design and Product Development
Deal, Walter F., III; Hsiung, Steve C.
2016-01-01
Today there are a number of 3D printing technologies that are low cost and within the budgets of middle and high school programs. Educational technology companies offer a variety of 3D printing technologies and parallel curriculum materials to enable technology and engineering teachers to easily add 3D learning activities to their programs.
A Fortran program (RELAX3D) to solve the 3 dimensional Poisson (Laplace) equation
RELAX3D is an efficient, user-friendly, interactive FORTRAN program which solves the Poisson (Laplace) equation ∇²Φ = ρ for a general 3-dimensional geometry consisting of Dirichlet and Neumann boundaries approximated to lie on a regular 3-dimensional mesh. The finite difference equations at these nodes are solved using a successive point-iterative over-relaxation method. A menu of commands, supplemented by a HELP facility, controls the dynamic loading of the subroutine describing the problem case, the iterations to converge to a solution, and the contour plotting of any desired slices, etc.
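The successive point-iterative over-relaxation method RELAX3D uses can be sketched in 2D for brevity (a minimal Python illustration, not the program's FORTRAN; `sor_laplace`, the grid, and the boundary values are ours): each interior node is replaced by the average of its neighbours, over-relaxed by a factor omega, until updates fall below a tolerance.

```python
import numpy as np

def sor_laplace(phi, omega=1.8, tol=1e-6, max_iter=10000):
    """Solve Laplace's equation on a 2D grid with fixed (Dirichlet) boundaries."""
    for _ in range(max_iter):
        max_delta = 0.0
        for i in range(1, phi.shape[0] - 1):
            for j in range(1, phi.shape[1] - 1):
                # Gauss-Seidel update of the 5-point stencil, over-relaxed by omega
                new = 0.25 * (phi[i-1, j] + phi[i+1, j] + phi[i, j-1] + phi[i, j+1])
                delta = omega * (new - phi[i, j])
                phi[i, j] += delta
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    return phi

# Square region: top edge held at potential 1, other edges at 0
phi = np.zeros((20, 20))
phi[0, :] = 1.0
sor_laplace(phi)
```

Over-relaxation (1 < omega < 2) accelerates the plain Gauss-Seidel iteration; the 3D case differs only in using a 7-point stencil with a 1/6 weight.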
Reactor safety issues resolved by the 2D/3D program
The 2D/3D Program studied multidimensional thermal-hydraulics in a PWR core and primary system during the end-of-blowdown and post-blowdown phases of a large-break LOCA (LBLOCA), and during selected small-break LOCA (SBLOCA) transients. The program included tests at the Cylindrical Core Test Facility (CCTF), the Slab Core Test Facility (SCTF), and the Upper Plenum Test Facility (UPTF), and computer analyses using TRAC. Tests at CCTF investigated core thermal-hydraulics and overall system behavior while tests at SCTF concentrated on multidimensional core thermal-hydraulics. The UPTF tests investigated two-phase flow behavior in the downcomer, upper plenum, tie plate region, and primary loops. TRAC analyses evaluated thermal-hydraulic behavior throughout the primary system in tests as well as in PWRs. This report summarizes the test and analysis results in each of the main areas where improved information was obtained in the 2D/3D Program. The discussion is organized in terms of the reactor safety issues investigated. This report was prepared in coordination among the US, Germany, and Japan. The US and Germany have published the report as NUREG/IA-0127 and GRS-101, respectively. (author)
Development of a 3D multigroup program for Dancoff factor calculation in pebble bed reactors
Highlights: • Development of a 3D Monte Carlo based code for pebble bed reactors. • Dancoff sensitivity to clad, moderator and fuel cross sections is considered. • Sensitivity of Dancoff to number of energy groups is considered. • Sensitivity of Dancoff to number of fuels and their arrangement is considered. • Excellent agreement vs. the MCNP code. - Abstract: The evaluation of multigroup constants in reactor calculations depends on several parameters. One of these parameters is the Dancoff factor, which is used for calculating the resonance integral and flux depression in the resonance region in heterogeneous systems. In the current paper, a computer program (MCDAN-3D) is developed for calculating three-dimensional black and gray Dancoff coefficients, based on Monte Carlo, escape probability and neutron free-flight methods. The developed program is capable of calculating the Dancoff factor for an arbitrary arrangement of fuel and moderator pebbles. Moreover, this program can simulate fuels with homogeneous and heterogeneous compositions, and it can randomly generate the positions of TRISO particles in fuel pebbles. It can calculate both black and gray Dancoff coefficients, since the fuel regions may have different cross sections. Finally, the effects of clad and moderator are considered, and the sensitivity of the Dancoff factor to fuel arrangement, number of TRISO particles and neutron energy has been studied
Ingram-Goble, Adam
This is an exploratory design study of a novel system for learning programming and 3D role-playing game design as tools for social change. This study was conducted at two sites. Participants in the study were ages 9-14 and worked for up to 15 hours with the platform to learn how to program and design video games with personally or socially relevant narratives. This first study was successful in that students learned to program a narrative game, and they viewed the social-problem framing of the practices as an interesting aspect of the experience. The second study provided illustrative examples of how providing less general structure up-front afforded players the opportunity to produce the necessary structures as needed for their particular designs, and they therefore had a richer understanding of what those structures represented. This study demonstrates that not only were participants able to use computational thinking skills such as Boolean and conditional logic, planning, modeling, abstraction, and encapsulation, they were able to bridge these skills to social domains they cared about. In particular, participants created stories about socially relevant topics without explicit pushes by the instructors. The findings also suggest that the rapid uptake and successful creation of personally and socially relevant narratives may have been facilitated by close alignment between the conceptual tools represented in the platform and the domain of 3D role-playing games.
Detecting drug use in adolescents using a 3D simulation program
Luis Iribarne
2010-11-01
This work presents a new 3D simulation program, called Mii School, and its application to the detection of problem behaviours appearing in school settings. We begin by describing some of the main features of the Mii School program. Then, we present the results of a study in which adolescents responded to Mii School simulations involving the consumption of alcoholic drinks, cigarettes, cannabis, cocaine, and MDMA (ecstasy). We established a “risk profile” based on the observed response patterns. We also present results concerning user satisfaction with the program and the extent to which users felt that the simulated scenes were realistic. Lastly, we discuss the usefulness of Mii School as a tool for assessing drug use in school settings.
PLOT 3D: a multipurpose, interactive program for plotting three dimensional graphs
PLOT3D is a general-purpose, interactive, three-dimensional display and plotter program. Written in Fortran-77, it uses a two-dimensional plotter software package (PLOT-10) to draw orthographic axonometric projections of a three-dimensional graph comprising a smooth surface or cubical histogram, in any desired orientation, magnification and window, employing throughout highly accurate hidden-line removal techniques. The figure, so generated, can optionally be clipped, smoothened by interpolation, or shaded selectively to distinguish among its different faces. The program accepts data from an external file, or generates them through built-in functions. It can be used for graphical representation of data such as neutron flux, theta(x,y), in the form of a smooth surface, even if the available data are very few in number. It is also capable of drawing histograms of quantities such as fuel power in a reactor lattice. A listing of the program is given. (author)
Optics Program Modified for Multithreaded Parallel Computing
Lou, John; Bedding, Dave; Basinger, Scott
2006-01-01
A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application interface software, that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads (POSIX threads, where "POSIX" signifies Portable Operating System Interface). In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
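The structure of the parallelized ray trace can be suggested with a thread-pool analogue (a hypothetical Python sketch, not MACOS code; `trace_ray` is a toy stand-in for a real ray-trace routine): rays propagate independently, so they can be fanned out to workers much as OpenMP's parallel loop distributes them in the real program.

```python
from concurrent.futures import ThreadPoolExecutor

def trace_ray(height, focal_length=2.0):
    # Toy thin-lens deflection standing in for a full ray trace
    return height / focal_length

def trace_all(heights, workers=4):
    # Each ray is an independent task; map preserves input order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(trace_ray, heights))

angles = trace_all([0.0, 1.0, 2.0, 3.0])
```

In CPython the GIL limits speedup for pure-Python work; the point of the sketch is the independence of rays, which is what lets the OpenMP loop in MACOS scale linearly with processor count.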
OpenCL parallel programming development cookbook
Tay, Raymond
2013-01-01
OpenCL Parallel Programming Development Cookbook will provide a set of advanced recipes that can be utilized to optimize existing code. This book is therefore ideal for experienced developers with a working knowledge of C/C++ and OpenCL. It is intended for software developers who have often wondered what to do with that newly bought CPU or GPU other than using it for playing computer games; it is also for developers who have a working knowledge of C/C++ and who want to learn how to write parallel programs in OpenCL so that life isn't too boring.
Parallelism and programming in classifier systems
Forrest, Stephanie
1990-01-01
Parallelism and Programming in Classifier Systems deals with the computational properties of the underlying parallel machine, including computational completeness, programming and representation techniques, and efficiency of algorithms. In particular, efficient classifier system implementations of symbolic data structures and reasoning procedures are presented and analyzed in detail. The book shows how classifier systems can be used to implement a set of useful operations for the classification of knowledge in semantic networks. A subset of the KL-ONE language was chosen to demonstrate these o
The JLAB 3D program at 12 GeV (TMDs + GPDs)
Pisano, Silvia [Istituto Nazionale di Fisica Nucleare (INFN), Frascati (Italy)]
2015-01-01
The Jefferson Lab CEBAF accelerator is undergoing an upgrade that will increase the beam energy up to 12 GeV. The three experimental Halls operating in the 6-GeV era are upgrading their detectors to adapt their performance to the newly available kinematics, and a new Hall (D) is being built. The investigation of the three-dimensional nucleon structure both in coordinate and in momentum space represents an essential part of the 12-GeV physics program, and several proposals aiming at the extraction of related observables have already been approved in Halls A, B and C. In these proceedings, the focus of the JLab 3D program is described, and a selection of proposals is discussed.
Parallel Programming of General-Purpose Programs Using Task-Based Programming Models
Vandierendonck, Hans; Pratikakis, Polyvios; Nikolopoulos, Dimitrios
2011-01-01
The prevalence of multicore processors is bound to drive most kinds of software development towards parallel programming. To limit the difficulty and overhead of parallel software design and maintenance, it is crucial that parallel programming models allow an easy-to-understand, concise and dense representation of parallelism. Parallel programming models such as Cilk++ and Intel TBBs attempt to offer a better, higher-level abstraction for parallel programming than threads and locking synchron...
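The task abstraction such models offer can be illustrated with a minimal sketch. The Python below is purely illustrative (it uses the standard-library thread pool, not Cilk++ or Intel TBB): the programmer declares independent tasks and a runtime pool schedules them, hiding threads and locking.

```python
# Hypothetical sketch of task-based parallelism: the programmer names the
# tasks; the pool decides how they map onto worker threads.
from concurrent.futures import ThreadPoolExecutor

def fib(n):
    # Naive recursive Fibonacci; each top-level call is an independent task.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

def parallel_map_fib(values):
    # pool.map distributes the calls across workers; no explicit locking.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fib, values))

results = parallel_map_fib([10, 12, 14])
```

The point of the abstraction is that the call site stays a one-line map; swapping the pool implementation changes the parallel execution without touching the task code.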
Parallel GRISYS/Power Challenge System Version 1.0 and 3D Prestack Depth Migration Package
Zhao Zhenwen
1995-01-01
Based on the achievements and experience in parallel seismic data processing gained in past years by Beijing Global Software Corporation (GS) of CNPC, the Parallel GRISYS/Power Challenge seismic data processing system version 1.0 has been cooperatively developed and integrated on the Power Challenge computer by GS, SGI (USA) and the Shuangyuan Company of Academia Sinica.
Distributed parallel computing using navigational programming
Pan, Lei; Lai, M. K.; Noguchi, K; Huseynov, J J; L. F. Bic; Dillencourt, M B
2004-01-01
Message Passing (MP) and Distributed Shared Memory (DSM) are the two most common approaches to distributed parallel computing. MP is difficult to use, whereas DSM is not scalable. Performance scalability and ease of programming can be achieved at the same time by using navigational programming (NavP). This approach combines the advantages of MP and DSM, and it balances convenience and flexibility. Similar to MP, NavP suggests to its programmers the principle of pivot-computes and hence is ef...
Trelease, R B
1996-01-01
Advances in computer visualization and user interface technologies have enabled development of "virtual reality" programs that allow users to perceive and to interact with objects in artificial three-dimensional environments. Such technologies were used to create an image database and program for studying the human skull, a specimen that has become increasingly expensive and scarce. Stereoscopic image pairs of a museum-quality skull were digitized from multiple views. For each view, the stereo pairs were interlaced into a single, field-sequential stereoscopic picture using an image processing program. The resulting interlaced image files are organized in an interactive multimedia program. At run-time, gray-scale 3-D images are displayed on a large-screen computer monitor and observed through liquid-crystal shutter goggles. Users can then control the program and change views with a mouse, pointing and clicking on screen-level control words ("buttons"). For each view of the skull, an ID control button can be used to overlay pointers and captions for important structures. Pointing and clicking on "hidden buttons" overlying certain structures triggers digitized spoken-word descriptions or mini-lectures. PMID:8793223
Functions, objects and parallelism programming in Balinda K
Kwong, Yuen Chung
1999-01-01
Despite many years of research and development, parallel programming remains a difficult and specialized task. A simple but general model for parallel processing is still lacking. This book proposes a model that adds parallelism to functions and objects, allowing simple specification of both parallel execution and inter-process communication. Many examples of applying parallel programming are given.
Control cycloconverter using transputer based parallel programming
Chiu, D.M.; Li, S.J. [Victoria Univ. of Technology, Melbourne, Victoria (Australia). Dept. of Electrical and Electronic Engineering]
1995-12-31
The naturally commutated cycloconverter is a valuable tool for speed control in AC machines. Over the years, its control circuits have been designed and implemented using vacuum tubes, transistors, integrated circuits, and single processors. However, the problem of obtaining accurate data on triggering pulse generation for quantitative analysis has remained unsolved. Triggering instants of the cycloconverter have been precisely controlled using transputer-based parallel computing techniques. The HELIOS operating system is found to be an efficient tool for the development of parallel programs. Different topology configurations and various communication mechanisms of HELIOS are employed to determine and select the fastest technique suitable for the present work.
Concurrency-based approaches to parallel programming
Kale, L.V.; Chrisochoides, N.; Kohl, J.; Yelick, K.
1995-01-01
The inevitable transition to parallel programming can be facilitated by appropriate tools, including languages and libraries. After describing the needs of applications developers, this paper presents three specific approaches aimed at development of efficient and reusable parallel software for irregular and dynamic-structured problems. A salient feature of all three approaches is their exploitation of concurrency within a processor. Benefits of individual approaches such as these can be leveraged by an interoperability environment which permits modules written using different approaches to co-exist in single applications.
Design patterns percolating to parallel programming framework implementation
Aldinucci, M.; Campa, S.; Danelutto, M.; Kilpatrick, P.; Torquati, M.
2014-01-01
Structured parallel programming is recognised as a viable and effective means of tackling parallel programming problems. Recently, a set of simple and powerful parallel building blocks (RISC pb2l) has been proposed to support modelling and implementation of parallel frameworks. In this work we demonstrate how that same parallel building block set may be used to model both general purpose parallel programming abstractions, not usually listed in classical skeleton sets, and more specialized doma...
Skeleton based parallel programming: functional and parallel semantics in a single shot
Aldinucci, Marco; Danelutto, Marco
2004-01-01
Different skeleton based parallel programming systems have been developed in past years. The main goal of these programming environments is to provide programmers with handy, effective ways of writing parallel applications. In particular, skeleton based parallel programming environments automatically deal with most of the difficult, cumbersome programming problems that must be usually handled by programmers of parallel applications using traditional programming environments (e.g. environments...
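The skeleton idea can be made concrete with a short sketch. The Python below is a hypothetical illustration (not any of the cited systems): reusable `farm` and `pipe` skeletons hide all concurrency management, and the application programmer supplies only sequential functions.

```python
# Hypothetical skeleton library in miniature: the programmer composes
# skeletons; the skeletons own the parallel machinery.
from concurrent.futures import ThreadPoolExecutor

def farm(worker, n_workers=4):
    # Skeleton: apply `worker` to every item of a stream in parallel.
    def run(stream):
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            return list(pool.map(worker, stream))
    return run

def pipe(*stages):
    # Skeleton: feed the output of each stage into the next.
    def run(stream):
        for stage in stages:
            stream = stage(stream)
        return stream
    return run

# Application code: purely sequential functions, no threads in sight.
app = pipe(farm(lambda x: x * x), farm(lambda x: x + 1))
result = app([1, 2, 3])
```

Because the parallel semantics live entirely inside `farm` and `pipe`, the same application code could run on a different backend by swapping the skeleton implementations, which is the portability argument skeleton systems make.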
Kaldestad, Knut B.; Haddadin, Sami; Belder, Rico; Hovland, Geir; Anisi, David A.
2014-01-01
In this paper we present an experimental study on real-time collision avoidance with potential fields that are based on 3D point cloud data and processed on the Graphics Processing Unit (GPU). The virtual forces from the potential fields serve two purposes. First, they are used for changing the reference trajectory. Second, they are projected to and applied at the torque control level for generating corresponding nullspace behavior together with a Cartesian impedance main control ...
Programming massively parallel processors a hands-on approach
Kirk, David B
2010-01-01
Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. ""Massively parallel"" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...
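As a hedged illustration of the data-parallel style such texts teach, the classic SAXPY kernel is mimicked below in plain Python, with a thread pool standing in for GPU threads. Real CUDA code would express the same one-thread-per-element mapping; this sketch only shows the shape of the idea.

```python
# Illustrative sketch only: SAXPY (out[i] = a * x[i] + y[i]) in the
# one-logical-thread-per-element style of a CUDA kernel.
from concurrent.futures import ThreadPoolExecutor

def saxpy(a, x, y):
    def kernel(i):
        # What each "thread" computes for its own element index i.
        return a * x[i] + y[i]
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(kernel, range(len(x))))

out = saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 10.0, 10.0])
```

On a GPU the same kernel body would run on thousands of hardware threads at once, which is why memory access patterns, a recurring theme in the book, dominate performance.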
PSHED: a simplified approach to developing parallel programs
This paper presents a simplified approach in the form of a tree-structured computational model for parallel application programs. An attempt is made to provide a standard user interface to execute programs on the BARC Parallel Processing System (BPPS), a scalable distributed-memory multiprocessor. The interface package, called PSHED, provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs
Automatic Performance Debugging of SPMD Parallel Programs
Liu, Xu; Zhan, Jianfeng; Tu, Bibo; Meng, Dan
2010-01-01
Automatic performance debugging of parallel applications usually involves two steps: automatic detection of performance bottlenecks and uncovering their root causes for performance optimization. Previous work fails to resolve this challenging issue in several ways: first, several previous efforts automate analysis processes, but present the results in a confined way that only identifies performance problems with a priori knowledge; second, several tools take exploratory or confirmatory data analysis to automatically discover relevant performance data relationships. However, these efforts do not focus on locating performance bottlenecks or uncovering their root causes. In this paper, we design and implement an innovative system, AutoAnalyzer, to automatically debug the performance problems of single program multi-data (SPMD) parallel programs. Our system is unique in terms of two dimensions: first, without any a priori knowledge, we automatically locate bottlenecks and uncover their root causes for performance o...
Hybrid MPI+OpenMP parallelization of an FFT-based 3D Poisson solver with one periodic direction
Gorobets, Andrei; Trias Miquel, Francesc Xavier; Borrell Pol, Ricard; Lehmkuhl Barba, Oriol; Oliva Llena, Asensio
2011-01-01
This work is devoted to the development of efficient parallel algorithms for the direct numerical simulation (DNS) of incompressible flows on modern supercomputers. In doing so, a Poisson equation needs to be solved at each time-step to project the velocity field onto a divergence-free space. Due to the non-local nature of its solution, this elliptic system is the part of the algorithm that is most difficult to parallelize. The Poisson solver presented here is restricted to problems with o...
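The trick named in the title can be sketched briefly: a Fourier transform along the periodic direction diagonalizes the Laplacian, turning the elliptic solve into independent algebraic equations per wavenumber. The one-dimensional NumPy toy below (an assumption, not the authors' solver) shows the idea for u'' = f with periodic boundaries.

```python
# Toy sketch: FFT-based periodic Poisson solve. In Fourier space,
# u'' = f becomes (ik)^2 * uhat = fhat, i.e. uhat = -fhat / k^2 per mode.
import numpy as np

def poisson_periodic_1d(f, L=2 * np.pi):
    n = f.size
    k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)   # angular wavenumbers
    fhat = np.fft.fft(f)
    uhat = np.zeros_like(fhat)
    nz = k != 0
    uhat[nz] = -fhat[nz] / k[nz] ** 2            # invert -k^2 mode by mode
    return np.fft.ifft(uhat).real                # zero-mean solution

x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
u = poisson_periodic_1d(np.sin(x))               # exact solution is -sin(x)
```

In the 3D solver of the paper the same diagonalization is applied along the one periodic axis, leaving a set of mutually independent 2D problems that can be distributed across MPI processes and threads.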
The finite element-based Continuum Damage Mechanics (CDM) software DAMAGE XXX has been developed: to model high-temperature creep damage initiation, evolution and crack growth in 3-D engineering components; and, to run on parallel computer architectures. The driver has been to achieve computational speed through computer parallelism. The development and verification of the software have been carried out using uni-axial crosswelded testpieces in which the plane of symmetry of the V-weld preparation is orthogonal to the tensile loading axis. The welds were manufactured using 0.5Cr-0.5Mo-0.25V ferritic parent steel, and a matching 2.25Cr-1Mo ferritic steel weld filler metal. The Heat Affected Zones (HAZ) of welds were assumed to be divided into three sub-regions: Coarse grained-HAZ (CG-HAZ); Refined grained-HAZ (R-HAZ); and, the inter-critical HAZ regions (Type IV-HAZ). Constitutive equations and associated parameters are summarised for weld, CG-HAZ, R-HAZ, Type IV-HAZ, and parent materials, at 575, 590, and 600 deg. C. These are used to make finite element-based predictions of crossweld testpiece lifetimes and failure modes using the newly developed 3-D parallel computer software, and independent 2-D serial software, at an average minimum cross-section stress of 69.5 MPa. Crossweld testpiece analyses, done using the newly developed 3-D parallel software, have been verified using independent results of 2-D serial software; and, of laboratory experiments.
Some important issues of the computational process in parallel programming
Lyazzat Kh. Zhunussova
2015-01-01
The modern approach to education in parallel programming has a rather strong “technological” focus: the main emphasis in presenting educational material is on aspects of parallel computing architectures and practical parallel programming techniques. In other words, the issue of creating parallel software becomes only one aspect of a more general discipline — engineering parallel software applications as a set of mathematical models, numerical methods for their implementation, parallel algorithms ...
Thermal-hydraulic system computer codes are extensively used worldwide for analysis of nuclear facilities by utilities, regulatory bodies, nuclear power plant designers and vendors, nuclear fuel companies, research organizations, consulting companies, and technical support organizations. The computer code user represents a source of uncertainty that can influence the results of system code calculations. This influence is commonly known as the 'user effect' and stems from the limitations embedded in the codes as well as from the limited capability of the analysts to use the codes. Code user training and qualification is an effective means for reducing the variation of results caused by the application of the codes by different users. This paper describes a systematic approach to training code users who, upon completion of the training, should be able to perform calculations making the best possible use of the capabilities of best estimate codes. In other words, the program aims at contributing towards solving the problem of user effect. The 3D S.UN.COP (Scaling, Uncertainty and 3D COuPled code calculations) seminars have been organized as follow-up of the proposal to IAEA for the Permanent Training Course for System Code Users (D'Auria, 1998). Four seminars have been held at University of Pisa (2003, 2004), at The Pennsylvania State University (2004) and at University of Zagreb (2005). It was recognized that such courses represented both a source of continuing education for current code users and a means for current code users to enter the formal training structure of a proposed 'permanent' stepwise approach to user training. The 3D S.UN.COP 2005 was successfully held with the participation of 19 persons coming from 9 countries and 14 different institutions (universities, vendors, national laboratories and regulatory bodies). More than 15 scientists were involved in the organization of the seminar, presenting theoretical aspects of the proposed methodologies and ...
Thermal-hydraulic system computer codes are extensively used worldwide for analysis of nuclear facilities by utilities, regulatory bodies, nuclear power plant designers and vendors, nuclear fuel companies, research organizations, consulting companies, and technical support organizations. The computer code user represents a source of uncertainty that can influence the results of system code calculations. This influence is commonly known as the 'user effect' and stems from the limitations embedded in the codes as well as from the limited capability of the analysts to use the codes. Code user training and qualification is an effective means for reducing the variation of results caused by the application of the codes by different users. This paper describes a systematic approach to training code users who, upon completion of the training, should be able to perform calculations making the best possible use of the capabilities of best estimate codes. In other words, the program aims at contributing towards solving the problem of user effect. The 3D S.UN.COP (Scaling, Uncertainty and 3D COuPled code calculations) seminars have been organized as follow-up of the proposal to IAEA for the Permanent Training Course for System Code Users [1]. Five seminars have been held at University of Pisa (2003, 2004), at The Pennsylvania State University (2004), at University of Zagreb (2005) and at the School of Industrial Engineering of Barcelona (2006). It was recognized that such courses represented both a source of continuing education for current code users and a means for current code users to enter the formal training structure of a proposed 'permanent' stepwise approach to user training. The 3D S.UN.COP 2006 was successfully held with the attendance of 33 participants coming from 18 countries and 28 different institutions (universities, vendors, national laboratories and regulatory bodies). More than 30 scientists (coming from 13 countries and 23 different institutions) were ...
Thermal-hydraulic system computer codes are extensively used worldwide for analysis of nuclear facilities by utilities, regulatory bodies, nuclear power plant designers and vendors, nuclear fuel companies, research organizations, consulting companies, and technical support organizations. The computer code user represents a source of uncertainty that can influence the results of system code calculations. This influence is commonly known as the 'user effect' and stems from the limitations embedded in the codes as well as from the limited capability of the analysts to use the codes. Code user training and qualification is an effective means for reducing the variation of results caused by the application of the codes by different users. This paper describes a systematic approach to training code users who, upon completion of the training, should be able to perform calculations making the best possible use of the capabilities of best estimate codes. In other words, the program aims at contributing towards solving the problem of user effect. The 3D S.UN.COP (Scaling, Uncertainty and 3D COuPled code calculations) seminars have been organized as follow-up of the proposal to IAEA for the Permanent Training Course for System Code Users. Six seminars have been held at University of Pisa (2003, 2004), at The Pennsylvania State University (2004), at University of Zagreb (2005), at the School of Industrial Engineering of Barcelona (January-February 2006) and in Buenos Aires, Argentina (October 2006), the last of these requested by ARN (Autoridad Regulatoria Nuclear), NA-SA (Nucleoelectrica Argentina S.A) and CNEA (Comision Nacional de Energia Atomica). It was recognized that such courses represented both a source of continuing education for current code users and a means for current code users to enter the formal training structure of a proposed 'permanent' stepwise approach to user training. The 3D S.UN.COP 2006 in Barcelona was successfully held with the attendance of 33 ...
Thermal-hydraulic system computer codes are extensively used worldwide for analysis of nuclear facilities by utilities, regulatory bodies, nuclear power plant designers and vendors, nuclear fuel companies, research organizations, consulting companies, and technical support organizations. The computer code user represents a source of uncertainty that can influence the results of system code calculations. This influence is commonly known as the 'user effect' and stems from the limitations embedded in the codes as well as from the limited capability of the analysts to use the codes. Code user training and qualification is an effective means for reducing the variation of results caused by the application of the codes by different users. This paper describes a systematic approach to training code users who, upon completion of the training, should be able to perform calculations making the best possible use of the capabilities of best estimate codes. In other words, the program aims at contributing towards solving the problem of user effect. The 3D S.UN.COP 2005 (Scaling, Uncertainty and 3D COuPled code calculations) seminar has been organized by University of Pisa and University of Zagreb as follow-up of the proposal to IAEA for the Permanent Training Course for System Code Users (D'Auria, 1998). It was recognized that such a course represented both a source of continuing education for current code users and a means for current code users to enter the formal training structure of a proposed 'permanent' stepwise approach to user training. The seminar-training was successfully held with the participation of 19 persons coming from 9 countries and 14 different institutions (universities, vendors, national laboratories and regulatory bodies). More than 15 scientists were involved in the organization of the seminar, presenting theoretical aspects of the proposed methodologies and holding the training and the final examination. A certificate (LA Code User grade) was released
Castillo-Reyes, Octavio; de la Puente, Josep; Puzyrev, Vladimir; Cela, José M.
2015-01-01
This paper deals with the most relevant parallel and numerical issues that arise when applying the Edge Element Method in the solution of electromagnetic problems in exploration geophysics. In this sense, in recent years the application of land and marine controlled-source electromagnetic (CSEM) surveys has gained tremendous interest among the offshore exploration community. This method is especially significant in detecting hydrocarbon in shallow/deep waters. On the other hand, in Finite Ele...
Bulovyatov, Alexander
2010-01-01
The band structure computation turns into solving a family of Maxwell eigenvalue problems on the periodicity domain. The discretization is done by the finite element method with special higher order H(curl)- and H1-conforming modified elements. The eigenvalue problem is solved by a preconditioned iterative eigenvalue solver with a projection onto the divergence-free vector fields. As a preconditioner we use the parallel multigrid method with a special Hiptmair smoother.
Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; José M. Cela
2014-01-01
We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element solvers for three-dimensional electromagnetic numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation and Gauss-Seidel, as smoothers and the wav...
Aftosmis, M. J.; Berger, M. J.; Murman, S. M.; Kwak, Dochan (Technical Monitor)
2002-01-01
The proposed paper will present recent extensions in the development of an efficient Euler solver for adaptively-refined Cartesian meshes with embedded boundaries. The paper will focus on extensions of the basic method to include solution adaptation, time-dependent flow simulation, and arbitrary rigid domain motion. The parallel multilevel method makes use of on-the-fly parallel domain decomposition to achieve extremely good scalability on large numbers of processors, and is coupled with an automatic coarse mesh generation algorithm for efficient processing by a multigrid smoother. Numerical results are presented demonstrating parallel speed-ups of up to 435 on 512 processors. Solution-based adaptation may be keyed off truncation error estimates using tau-extrapolation or a variety of feature detection based refinement parameters. The multigrid method is extended to time-dependent flows through the use of a dual-time approach. The extension to rigid domain motion uses an Arbitrary Lagrangian-Eulerian (ALE) formulation, and results will be presented for a variety of two- and three-dimensional example problems with both simple and complex geometry.
Kressler, Bryan; Spincemaille, Pascal; Prince, Martin R; Wang, Yi
2006-09-01
Time-resolved 3D MRI with high spatial and temporal resolution can be achieved using spiral sampling and sliding-window reconstruction. Image reconstruction is computationally intensive because of the need for data regridding, a large number of temporal phases, and multiple RF receiver coils. Inhomogeneity blurring correction for spiral sampling further increases the computational workload by an order of magnitude, hindering the clinical utility of spiral trajectories. In this work the reconstruction time is reduced by a factor of >40 compared to reconstruction using a single processor. This is achieved by using a cluster of 32 commercial off-the-shelf computers, commodity networking hardware, and readily available software. The reconstruction system is demonstrated for time-resolved spiral contrast-enhanced (CE) peripheral MR angiography (MRA), and a reduction of reconstruction time from 80 min to 1.8 min is achieved. PMID:16892189
Modifications of the PRONTO 3D finite element program tailored to fast burst nuclear reactor design
This update discusses modifications of PRONTO 3D tailored to the design of fast burst nuclear reactors. A thermoelastic constitutive model and spatially variant thermal history load were added for this special application. Included are descriptions of the thermoelastic constitutive model and the thermal loading algorithm, two example problems used to benchmark the new capability, a user's guide, and PRONTO 3D input files for the example problems. The results from PRONTO 3D thermoelastic finite element analysis are benchmarked against measured data and finite difference calculations. PRONTO 3D is a three-dimensional transient solid dynamics code for analyzing large deformations of highly non-linear materials subjected to high strain rates. The code modifications are implemented in PRONTO 3D Version 5.3.3. 12 refs., 30 figs., 9 tabs
Sung, Chul
2013-08-01
Accurate estimation of neuronal count and distribution is central to the understanding of the organization and layout of cortical maps in the brain, and changes in the cell population induced by brain disorders. High-throughput 3D microscopy techniques such as Knife-Edge Scanning Microscopy (KESM) are enabling whole-brain survey of neuronal distributions. Data from such techniques pose serious challenges to quantitative analysis due to the massive, growing, and sparsely labeled nature of the data. In this paper, we present a scalable, incremental learning algorithm for cell body detection that can address these issues. Our algorithm is computationally efficient (linear mapping, non-iterative) and does not require retraining (unlike gradient-based approaches) or retention of old raw data (unlike instance-based learning). We tested our algorithm on our rat brain Nissl data set, showing superior performance compared to an artificial neural network-based benchmark, and also demonstrated robust performance in a scenario where the data set is rapidly growing in size. Our algorithm is also highly parallelizable due to its incremental nature, and we demonstrated this empirically using a MapReduce-based implementation of the algorithm. We expect our scalable, incremental learning approach to be widely applicable to medical imaging domains where there is a constant flux of new data. © 2013 IEEE.
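The non-iterative, no-retraining property claimed above can be illustrated with a generic incremental least-squares sketch. This is a hypothetical stand-in, not the paper's detector: the model accumulates sufficient statistics batch by batch, so old raw data never needs to be retained and new data never forces retraining from scratch.

```python
# Hypothetical illustration of incremental, non-iterative learning:
# a linear least-squares model updated via sufficient statistics
# (X^T X and X^T y), solved in closed form whenever weights are needed.
import numpy as np

class IncrementalLinearModel:
    def __init__(self, dim):
        self.xtx = np.zeros((dim, dim))
        self.xty = np.zeros(dim)

    def partial_fit(self, X, y):
        # Accumulate; the raw batch can be discarded after this call.
        self.xtx += X.T @ X
        self.xty += X.T @ y

    def weights(self):
        # Closed-form solve: linear mapping, no gradient iterations.
        return np.linalg.solve(self.xtx, self.xty)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
model = IncrementalLinearModel(2)
for _ in range(4):                      # data arrives in growing batches
    X = rng.normal(size=(50, 2))
    model.partial_fit(X, X @ true_w)    # noiseless targets for the demo
w = model.weights()
```

Because each batch only touches the fixed-size accumulators, the update cost is independent of how much data has already been seen, which is also what makes this pattern easy to parallelize in a MapReduce setting.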
Professional WebGL Programming Developing 3D Graphics for the Web
Anyuru, Andreas
2012-01-01
Everything you need to know about developing hardware-accelerated 3D graphics with WebGL! As the newest technology for creating 3D graphics on the web, in games, applications, and regular websites, WebGL gives web developers the capability to produce eye-popping graphics. This book teaches you how to use WebGL to create stunning cross-platform apps. The book features several detailed examples that show you how to develop 3D graphics with WebGL, including explanations of code snippets that help you understand the why behind the how. You will also develop a stronger understanding of W...
Development of parallel/serial program analyzing tool
Japan Atomic Energy Research Institute has been developing 'KMtool', a parallel/serial program analyzing tool, in order to promote the parallelization of science and engineering computation programs. KMtool analyzes the performance of programs written in FORTRAN77 and MPI, and it reduces the effort required for parallelization. This paper describes the development purpose, design, utilization and evaluation of KMtool. (author)
Profiling parallel Mercury programs with ThreadScope
Bone, Paul
2011-01-01
The behavior of parallel programs is even harder to understand than the behavior of sequential programs. Parallel programs may suffer from any of the performance problems affecting sequential programs, as well as from several problems unique to parallel systems. Many of these problems are quite hard (or even practically impossible) to diagnose without help from specialized tools. We present a proposal for a tool for profiling the parallel execution of Mercury programs, a proposal whose implementation we have already started. This tool is an adaptation and extension of the ThreadScope profiler that was first built to help programmers visualize the execution of parallel Haskell programs.
Ervik, Åsmund; Müller, Bernhard
2014-01-01
To leverage the last two decades' transition in High-Performance Computing (HPC) towards clusters of compute nodes bound together with fast interconnects, a modern scalable CFD code must be able to efficiently distribute work amongst several nodes using the Message Passing Interface (MPI). MPI can enable very large simulations running on very large clusters, but it is necessary that the bulk of the CFD code be written with MPI in mind, an obstacle to parallelizing an existing serial code. In this work we present the results of extending an existing two-phase 3D Navier-Stokes solver, which was completely serial, to a parallel execution model using MPI. The 3D Navier-Stokes equations for two immiscible incompressible fluids are solved by the continuum surface force method, while the location of the interface is determined by the level-set method. We employ the Portable Extensible Toolkit for Scientific Computing (PETSc) for domain decomposition (DD) in a framework where only a fraction of the code needs to be a...
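The domain-decomposition pattern behind such MPI parallelization can be sketched without MPI at all: split the grid among ranks, then exchange ghost (halo) cells with neighbours each step. The toy below is a hypothetical, serial illustration with periodic boundaries, showing only the data movement that the message passing implements.

```python
# Toy sketch of 1D domain decomposition with ghost-cell exchange,
# the basic communication pattern of MPI-parallel grid solvers.
def decompose(field, nranks):
    # Split the global array into equal per-rank chunks.
    n = len(field) // nranks
    return [field[i * n:(i + 1) * n] for i in range(nranks)]

def exchange_ghosts(chunks):
    # Each rank receives one ghost value from each neighbour (periodic),
    # standing in for an MPI send/recv pair per interface.
    padded = []
    for r, c in enumerate(chunks):
        left = chunks[(r - 1) % len(chunks)][-1]
        right = chunks[(r + 1) % len(chunks)][0]
        padded.append([left] + c + [right])
    return padded

chunks = decompose(list(range(8)), nranks=4)   # [[0,1],[2,3],[4,5],[6,7]]
padded = exchange_ghosts(chunks)               # rank 0 sees [7, 0, 1, 2]
```

After the exchange, each rank can apply a local stencil to its interior points independently, which is why only the halo traffic, not the bulk of the computation, needs MPI.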
A 3D point-kernel multiple scatter model for parallel-beam SPECT based on a gamma-ray buildup factor
Marinkovic, Predrag; Ilic, Radovan; Spaic, Rajko
2007-09-01
A three-dimensional (3D) point-kernel multiple scatter model for point spread function (PSF) determination in parallel-beam single-photon emission computed tomography (SPECT), based on a dose gamma-ray buildup factor, is proposed. The model embraces nonuniform attenuation in a voxelized imaged object (the patient body) and multiple scattering, which is treated as in point-kernel integration for gamma-ray shielding problems. First-order Compton scattering is computed by means of the Klein-Nishina formula, while multiple scattering is accounted for through a dose buildup factor. An asset of the present model is the possibility of generating a complete two-dimensional (2D) PSF that can be used for 3D SPECT reconstruction by means of iterative algorithms. The proposed model is convenient in those situations where more exact techniques are not economical. To test the proposed model, calculations were performed for a point source in a nonuniform scattering object with parallel-beam collimator geometry; the multiple-order scatter PSFs generated by the model matched well with those obtained from Monte Carlo (MC) simulations. Discrepancies are observed only at the exponential tails, mostly due to the high statistical uncertainty of the MC simulations in this region, and not because of any inappropriateness of the model.
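The two ingredients of such a model, the Klein-Nishina cross-section for first-order scatter and a buildup factor applied to the attenuated point kernel for multiple scatter, can be sketched in a few lines. This is an illustrative implementation, not the authors' code; the Taylor-form buildup coefficients below are placeholders, not fitted data for any real material.

```python
import math

R_E = 2.8179403262e-13   # classical electron radius [cm]
MEC2 = 0.511             # electron rest energy [MeV]

def klein_nishina(e_mev, theta):
    """Klein-Nishina differential cross-section dsigma/dOmega [cm^2/sr]
    for a photon of energy e_mev scattering through angle theta."""
    k = e_mev / MEC2
    ratio = 1.0 / (1.0 + k * (1.0 - math.cos(theta)))   # E'/E
    return 0.5 * R_E**2 * ratio**2 * (ratio + 1.0 / ratio - math.sin(theta)**2)

def taylor_buildup(mu_r, A=24.6, a1=-0.0949, a2=0.0282):
    """Taylor-form dose buildup factor
    B(mu*r) = A*exp(-a1*mu*r) + (1-A)*exp(-a2*mu*r).
    Coefficients here are illustrative placeholders only."""
    return A * math.exp(-a1 * mu_r) + (1.0 - A) * math.exp(-a2 * mu_r)

def point_kernel_flux(source, mu, r):
    """Uncollided 1/(4*pi*r^2) kernel with exponential attenuation,
    multiplied by the buildup factor to account for multiple scatter."""
    return source / (4.0 * math.pi * r * r) * math.exp(-mu * r) * taylor_buildup(mu * r)
```

By construction B(0) = 1, so the kernel reduces to the unscattered attenuated flux at the source, and B grows with optical depth to represent the accumulated scattered contribution.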
Parallel Programming Strategies for Irregular Adaptive Applications
Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2001-01-01
Achieving scalable performance for dynamic irregular applications is extremely challenging. Traditional message-passing approaches have made steady progress toward this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade performance for such computations. In this work, we examine two typical irregular adaptive applications, Dynamic Remeshing and N-Body, under competing programming methodologies and across various parallel architectures. The Dynamic Remeshing application simulates flow over an airfoil and refines localized regions of the underlying unstructured mesh. The N-Body experiment models two neighboring Plummer galaxies that are about to undergo a merger. Both problems exhibit dramatic changes in processor workloads and interprocessor communication over time; thus, dynamic load balancing is a required component.
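A minimal stand-in for the dynamic load-balancing step, reassigning work units whose costs have changed after refinement, is the greedy longest-processing-time heuristic sketched below. This is illustrative only and is not the balancer used in the paper.

```python
import heapq

def balance(work_items, nprocs):
    """Greedy longest-processing-time assignment: sort work items by
    cost, descending, and always hand the next item to the currently
    least-loaded processor. Rerunning this after each adaptation step
    is a simple form of dynamic load balancing."""
    heap = [(0.0, p) for p in range(nprocs)]   # (load, processor id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(nprocs)]
    for w in sorted(work_items, reverse=True):
        load, p = heapq.heappop(heap)          # least-loaded processor
        assignment[p].append(w)
        heapq.heappush(heap, (load + w, p))
    return assignment
```

For example, balancing costs [5, 4, 3, 3, 3] over two processors yields per-processor loads of 8 and 10, close to the ideal split of 9 and 9.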
Flexible language constructs for large parallel programs
Rosing, Matthew; Schnabel, Robert
1993-01-01
The goal of the research described is to develop flexible language constructs for writing large data-parallel numerical programs for distributed-memory (MIMD) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include SIMD (Single Instruction Multiple Data), SPMD (Single Program Multiple Data), and sequential programs annotated with data distribution statements. The two primary models for communication are implicit communication based on shared memory and explicit communication based on messages. None of these models by itself seems sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. An overview of a new language that combines many of these programming models in a clean manner is given. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. An overview of the language and a discussion of some of the critical implementation details are given.
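The two communication models can be contrasted in a few lines. The sketch below (plain Python threads, hypothetical function names, not the language described above) shows the same producer/consumer exchange written once with explicit messages over a channel and once through implicitly shared state with an explicit synchronization point.

```python
import queue
import threading

def explicit_messages():
    """Explicit communication: the producer sends its result as a
    message on a channel and the consumer receives it. No state is
    shared between the two tasks."""
    chan = queue.Queue()
    out = []
    def producer():
        chan.put(sum(range(10)))       # send
    def consumer():
        out.append(chan.get())         # receive (blocks until sent)
    ts = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in ts: t.start()
    for t in ts: t.join()
    return out[0]

def implicit_shared():
    """Implicit communication: both tasks touch the same shared cell;
    an event supplies the synchronization the model must still provide."""
    cell = {}
    ready = threading.Event()
    def producer():
        cell["x"] = sum(range(10))
        ready.set()
    def consumer(out):
        ready.wait()                   # wait until the shared value is valid
        out.append(cell["x"])
    out = []
    ts = [threading.Thread(target=producer),
          threading.Thread(target=consumer, args=(out,))]
    for t in ts: t.start()
    for t in ts: t.join()
    return out[0]
```

Both variants compute the same value; the difference is where the synchronization lives, inside the channel or alongside the shared data.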
The best-estimate thermal-hydraulic codes used in the area of nuclear reactor safety have reached a marked level of sophistication, and they must be used by competent analysts. The need for user qualification and training is clearly recognized. An effort is being made to develop a proposal for a systematic approach to user training. The estimated duration of training at the course venue, including a set of training seminars, workshops, and practical exercises, is approximately two years. In addition, the specification and assignment of tasks to be performed by the participants at their home institutions, with continuous supervision from the training center, has been foreseen. The 3D S.UN.COP seminars constitute the follow-up of the presented proposal. The seminar is subdivided into three main parts, each with a program to be developed in one week: the first week is dedicated to fundamental theoretical aspects, the second week deals with industrial applications, coupling methodologies, and hands-on training, and the third week focuses on training for transient analysis in the interaction between thermal-hydraulics and fuel behaviour. The responses of the participants during the training have demonstrated an increase in their capability to develop and/or modify nodalizations and to perform a qualitative and quantitative accuracy evaluation. It is expected that the participants will be able to set up more accurate, reliable, and efficient simulation models, applying the procedures for qualifying thermal-hydraulic system code calculations and for the evaluation of uncertainty.
Evaluating the state of the art of parallel programming systems
Süß, Michael; Leopold, Claudia
2005-01-01
This paper describes our plans to evaluate the present state of affairs concerning parallel programming and its systems. Three subprojects are proposed: a survey among programmers and scientists, a comparison of parallel programming systems using a standard set of test programs, and a wiki resource for the parallel programming community - the Parawiki. We would like to invite you to participate and turn these subprojects into true community efforts.
Data-Parallel Programming in a Multithreaded Environment
Matthew Haines; Piyush Mehrotra; David Cronk
1997-01-01
Research on programming distributed memory multiprocessors has resulted in a well-understood programming model, namely data-parallel programming. However, data-parallel programming in a multithreaded environment is far less understood. For example, if multiple threads within the same process belong to different data-parallel computations, then the architecture, compiler, or run-time system must ensure that relative indexing and collective operations are handled properly and efficiently. We in...
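The core difficulty named here, keeping collective operations from different data-parallel computations separate when their threads share a process, can be sketched with a per-computation context object. This is an illustrative toy (hypothetical `Group` class), not the run-time system described in the paper.

```python
import threading

class Group:
    """Minimal stand-in for a per-computation collective context.
    Each data-parallel computation owns its own barrier and buffer, so
    collective operations from different computations cannot interfere
    even when all their threads live in the same process."""
    def __init__(self, size):
        self.barrier = threading.Barrier(size)
        self.partial = [None] * size

    def allreduce(self, rank, value):
        self.partial[rank] = value
        self.barrier.wait()    # wait only for members of THIS group
        total = sum(self.partial)
        self.barrier.wait()    # keep the buffer stable until all have read
        return total

def demo():
    # Two independent data-parallel computations in one process:
    # each runs a two-thread all-reduce through its own Group.
    g1, g2 = Group(2), Group(2)
    out = {}
    def worker(group, rank, value, key):
        out[key] = group.allreduce(rank, value)
    threads = [
        threading.Thread(target=worker, args=(g1, 0, 1, "g1")),
        threading.Thread(target=worker, args=(g1, 1, 2, "g1b")),
        threading.Thread(target=worker, args=(g2, 0, 10, "g2")),
        threading.Thread(target=worker, args=(g2, 1, 20, "g2b")),
    ]
    for t in threads: t.start()
    for t in threads: t.join()
    return out
```

Because each `Group` carries its own barrier, the two reductions complete correctly regardless of how the four threads interleave.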