Fast implementations of 3D PET reconstruction using vector and parallel programming techniques
Computationally intensive techniques that offer potential clinical use have arisen in nuclear medicine. Examples include iterative reconstruction, 3D PET data acquisition and reconstruction, and 3D image volume manipulation including image registration. One obstacle in achieving clinical acceptance of these techniques is the computational time required. This study focuses on methods to reduce the computation time for 3D PET reconstruction through the use of fast computer hardware, vector and parallel programming techniques, and algorithm optimization. The strengths and weaknesses of i860 microprocessor based workstation accelerator boards are investigated in implementations of 3D PET reconstruction
Sofronov, I.D.; Voronin, B.L.; Butnev, O.I. [VNIIEF (Russian Federation)] [and others
1997-12-31
The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problem with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle. The program was developed on 8-processor CS MP-3 made in VNIIEF and was adapted to a massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different number of processors up to 256 and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.
Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver
Moustafa, Salli; Dutka Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre
2014-01-01
This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore+SIMD) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOM...
Shared memory parallelism for 3D cartesian discrete ordinates solver
This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multi-core + SIMD - Single Instruction on Multiple Data) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, that usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46*106 spatial cells and 1*1012 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-nodes nuclear simulation tool. (authors)
Parallel FEM simulation of 3-D crack propagation
Full text: Crack propagation simulation is an important topic in many fields, e.g., aeronautical engineering, material sciences, and geophysics. This type of simulation requires a high computational power, mainly at three-dimensional mesh generation and structural analysis steps. These steps usually spend a large amount of computing time and machine resources. The main objective of this work is to provide a fast and accurate system for crack growth simulation in three-dimensional models. The main idea of the methodology presented is to parallelize mesh generation and structural analysis procedures, and to integrate these procedures into a computational environment able to perform automatic arbitrary crack propagation. A parallel mesh generation algorithm has been developed. This algorithm is capable of generating three-dimensional meshes of tetrahedral elements in arbitrary domains with one or multiple embedded cracks. A finite element method program called FEMOOP has been adapted to implement the parallel features. The parallel strategy to solve the set of linear equations is based on an element-by-element scheme in conjunction with a gradient iterative solution. A program called FRANC3D, which is completely integrated with other components of the system, performs crack propagation and geometry updates. The entire system is described in details and a set of parallel simulations of crack propagation are presented to show the reliability of the system. Refs. 4 (author)
DYNA3D2000*, Explicit 3-D Hydrodynamic FEM Program
1 - Description of program or function: DYNA3D2000 is a nonlinear explicit finite element code for analyzing 3-D structures and solid continuum. The code is vectorized and available on several computer platforms. The element library includes continuum, shell, beam, truss and spring/damper elements to allow maximum flexibility in modeling physical problems. Many materials are available to represent a wide range of material behavior, including elasticity, plasticity, composites, thermal effects and rate dependence. In addition, DYNA3D has a sophisticated contact interface capability, including frictional sliding, single surface contact and automatic contact generation. 2 - Method of solution: Discretization of a continuous model transforms partial differential equations into algebraic equations. A numerical solution is then obtained by solving these algebraic equations through a direct time marching scheme. 3 - Restrictions on the complexity of the problem: Recent software improvements have eliminated most of the user identified limitations with dynamic memory allocation and a very large format description that has pushed potential problem sizes beyond the reach of most users. The dominant restrictions remain in code execution speed and robustness, which the developers constantly strive to improve
3D modelling of edge parallel flow asymmetries
The issue of parallel flows asymmetries in the edge plasma is tackled with a new first principle transport and turbulence code. TOKAM-3D is a 3D full-torus fluid code that can be used both in diffusive and turbulent regimes and covers either exclusively closed flux surfaces or both open and closed field lines in limiter geometry. Two independent mechanisms susceptible to lead to large amplitude asymmetric parallel flows are evidenced. Global ExB drifts coupled with the presence of the limiter break the poloidal symmetry and can generate large amplitude parallel flows even with poloidally uniform transport coefficients. On the other hand, turbulent transport in the edge exhibits a strong ballooning of the radial particle flux generating an up-down m = 1, n = 0 structure on the parallel velocity. The combination of both mechanisms in complete simulations leads to a poloidal and radial distribution of the parallel velocity comparable to experimental results.
Parallel Processor for 3D Recovery from Optical Flow
Jose Hugo Barron-Zambrano
2009-01-01
Full Text Available 3D recovery from motion has received a major effort in computer vision systems in the recent years. The main problem lies in the number of operations and memory accesses to be performed by the majority of the existing techniques when translated to hardware or software implementations. This paper proposes a parallel processor for 3D recovery from optical flow. Its main feature is the maximum reuse of data and the low number of clock cycles to calculate the optical flow, along with the precision with which 3D recovery is achieved. The results of the proposed architecture as well as those from processor synthesis are presented.
Parallel 3-D SN performance for DANTSYS/MPI on the Cray T3D
A data parallel version of the 3-D transport solver in DANTSYS has been in use on the SIMD CM-200's at LANL since 1994. This version typically obtains grind times of 150--200 nanoseconds on a 2,048 PE CM-200. The authors have now implemented a new message passing parallel version of DANTSYS, referred to as DANTSYS/MPI, on the 512 PE Cray T3D at Los Alamos. By taking advantage of the SPMD architecture of the Cray T3D, as well as its low latency communications network, they have managed to achieve grind times of less than 10 nanoseconds on real problems. DANTSYS/MPI is fully accelerated using DSA on both the inner and outer iterations. This paper describes the implementation of DANTSYS/MPI on the Cray T3D, and presents two simple performance models for the transport sweep which accurately predict the grind time as a function of the number of PE's and problem size, or scalability
Powerful supercomputers are available today. MBC-1000M is one of Russian supercomputers that may be used by distant way access. Programs LUCKY and LUCKYC were created to work for multi-processors systems. These programs have algorithms created especially for these computers and used MPI (message passing interface) service for exchanges between processors. LUCKY may resolved shielding tasks by multigroup discreet ordinate method. LUCKYC may resolve critical tasks by same method. Only XYZ orthogonal geometry is available. Under little space steps to approximate discreet operator this geometry may be used as universal one to describe complex geometrical structures. Cross section libraries are used up to P8 approximation by Legendre polynomials for nuclear data in GIT format. Programming language is Fortran-90. 'Vector' processors may be used that lets get a time profit up to 30 times. But unfortunately MBC-1000M has not these processors. Nevertheless sufficient value for efficiency of parallel calculations was obtained under 'space' (LUCKY) and 'space and energy' (LUCKYC) paralleling. AUTOCAD program is used to control geometry after a treatment of input data. Programs have powerful geometry module, it is a beautiful tool to achieve any geometry. Output results may be processed by graphic programs on personal computer. (authors)
Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU
Yong Xia
2015-01-01
Full Text Available Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This imposes a big challenge to the traditional computation resources based on CPU environment, which already cannot meet the requirement of the whole computation demands or are not easily available due to expensive costs. GPU as a parallel computing environment therefore provides an alternative to solve the large-scale computational problems of whole heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: one is the single cell model (ordinary differential equation and the other is the diffusion term of the monodomain model (partial differential equation. Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations.
A parallel sweeping preconditioner for heterogeneous 3D Helmholtz equations
Poulson, Jack; Engquist, Björn; Li, Siwei; Ying, Lexing
2012-01-01
A parallelization of a sweeping preconditioner for 3D Helmholtz equations without large cavities is introduced and benchmarked for several challenging velocity models. The setup and application costs of the sequential preconditioner are shown to be O({\\gamma}^2 N^{4/3}) and O({\\gamma} N log N), where {\\gamma}({\\omega}) denotes the modestly frequency-dependent number of grid points per Perfectly Matched Layer. Several computational and memory improvements are introduced relative to using black...
Parallel acquisition of 3D-HA(CA)NH and 3D-HACACO spectra
Reddy, Jithender G.; Hosur, Ramakrishna V., E-mail: hosur@tifr.res.in [Tata Institute of Fundamental Research, Department of Chemical Sciences (India)
2013-06-15
We present here an NMR pulse sequence with 5 independent incrementable time delays within the frame of a 3-dimensional experiment, by incorporating polarization sharing and dual receiver concepts. This has been applied to directly record 3D-HA(CA)NH and 3D-HACACO spectra of proteins simultaneously using parallel detection of {sup 1}H and {sup 13}C nuclei. While both the experiments display intra-residue backbone correlations, the 3D-HA(CA)NH provides also sequential 'i - 1 {yields} i' correlation along the {sup 1}H{alpha} dimension. Both the spectra contain special peak patterns at glycine locations which serve as check points during the sequential assignment process. The 3D-HACACO spectrum contains, in addition, information on prolines and side chains of residues having H-C-CO network (i.e., {sup 1}H{beta}, {sup 13}C{beta} and {sup 13}CO{gamma} of Asp and Asn, and {sup 1}H{gamma}, {sup 13}C{gamma} and {sup 13}CO{delta} of Glu and Gln), which are generally absent in most conventional proton detected experiments.
Parallel acquisition of 3D-HA(CA)NH and 3D-HACACO spectra
We present here an NMR pulse sequence with 5 independent incrementable time delays within the frame of a 3-dimensional experiment, by incorporating polarization sharing and dual receiver concepts. This has been applied to directly record 3D-HA(CA)NH and 3D-HACACO spectra of proteins simultaneously using parallel detection of 1H and 13C nuclei. While both the experiments display intra-residue backbone correlations, the 3D-HA(CA)NH provides also sequential ‘i − 1 → i’ correlation along the 1Hα dimension. Both the spectra contain special peak patterns at glycine locations which serve as check points during the sequential assignment process. The 3D-HACACO spectrum contains, in addition, information on prolines and side chains of residues having H–C–CO network (i.e., 1Hβ, 13Cβ and 13COγ of Asp and Asn, and 1Hγ, 13Cγ and 13COδ of Glu and Gln), which are generally absent in most conventional proton detected experiments.
Parallel PAB3D: Experiences with a Prototype in MPI
Guerinoni, Fabio; Abdol-Hamid, Khaled S.; Pao, S. Paul
1998-01-01
PAB3D is a three-dimensional Navier Stokes solver that has gained acceptance in the research and industrial communities. It takes as computational domain, a set disjoint blocks covering the physical domain. This is the first report on the implementation of PAB3D using the Message Passing Interface (MPI), a standard for parallel processing. We discuss briefly the characteristics of tile code and define a prototype for testing. The principal data structure used for communication is derived from preprocessing "patching". We describe a simple interface (COMMSYS) for MPI communication, and some general techniques likely to be encountered when working on problems of this nature. Last, we identify levels of improvement from the current version and outline future work.
INGRID, 3-D Mesh Generator for Program DYNA3D and NIKE3D and FACET and TOPAZ3D
1 - Description of program or function: INGRID is a general-purpose, three-dimensional mesh generator developed for use with finite element, nonlinear, structural dynamics codes. INGRID generates the large and complex input data files for DYNA3D (NESC 9909), NIKE3D (NESC 9725), FACET, and TOPAZ3D. One of the greatest advantages of INGRID is that virtually any shape can be described without resorting to wedge elements, tetrahedrons, triangular elements or highly distorted quadrilateral or hexahedral elements. Other capabilities available are in the areas of geometry and graphics. Exact surface equations and surface intersections considerably improve the ability to deal with accurate models, and a hidden line graphics algorithm is included which is efficient on the most complicated meshes. The most important new capability is associated with the boundary conditions, loads, and material properties required by nonlinear mechanics programs. Commands have been designed for each case to minimize user effort. This is particularly important since special processing is almost always required for each load or boundary condition. 2 - Method of solution: Geometries are described primarily using the index space notation of the INGEN program (NESC 975) with an additional type of notation, index progression. Index progressions provide a concise and simple method for describing complex structures; the concept was developed to facilitate defining multiple regions in index space. Rather than specifying the minimum and maximum indices for a region, one specifies the progression of indices along the I, J and K directions, respectively. The index progression method allows the analyst to describe most geometries including nodes and elements with roughly the same amount of input as a solids modeler
DPGL: The Direct3D9-based Parallel Graphics Library for Multi-display Environment
Zhen Liu; Jiao-Ying Shi
2007-01-01
The emergence of high performance 3D graphics cards has opened the way to PC clusters for high performance multidisplay environment. In order to exploit the rendering ability of PC clusters, we should design appropriate parallel rendering algorithms and parallel graphics library interfaces. Due to the rapid development of Direct3D, we bring forward DPGL, the Direct3D9-based parallel graphics library in D3DPR parallel rendering system, which implements Direct3D9 interfaces to support existing Direct3D9 application parallelization with no modification. Based on the parallelism analysis of Direct3D9 rendering pipeline, we briefly introduce D3DPR parallel rendering system. DPGL is the fundamental component of D3DPR. After presenting DPGL three layers architecture,we discuss the rendering resource interception and management. Finally, we describe the design and implementation of DPGL in detail,including rendering command interception layer, rendering command interpretation layer and rendering resource parallelization layer.
PIXIE3D: An efficient, fully implicit, parallel, 3D extended MHD code for fusion plasma modeling
PIXIE3D is a modern, parallel, state-of-the-art extended MHD code that employs fully implicit methods for efficiency and accuracy. It features a general geometry formulation, and is therefore suitable for the study of many magnetic fusion configurations of interest. PIXIE3D advances the state of the art in extended MHD modeling in two fundamental ways. Firstly, it employs a novel conservative finite volume scheme which is remarkably robust and stable, and demands very small physical and/or numerical dissipation. This is a fundamental requirement when one wants to study fusion plasmas with realistic conductivities. Secondly, PIXIE3D features fully-implicit time stepping, employing Newton-Krylov methods for inverting the associated nonlinear systems. These methods have been shown to be scalable and efficient when preconditioned properly. Novel preconditioned ideas (so-called physics based), which were prototypes in the context of reduced MHD, have been adapted for 3D primitive-variable resistive MHD in PIXIE3D, and are currently being extended to Hall MHD. PIXIE3D is fully parallel, employing PETSc for parallelism. PIXIE3D has been thoroughly benchmarked against linear theory and against other available extended MHD codes on nonlinear test problems (such as the GEM reconnection challenge). We are currently in the process of extending such comparisons to fusion-relevant problems in realistic geometries. In this talk, we will describe both the spatial discretization approach and the preconditioning strategy employed for extended MHD in PIXIE3D. We will report on recent benchmarking studies between PIXIE3D and other 3D extended MHD codes, and will demonstrate its usefulness in a variety of fusion-relevant configurations such as Tokamaks and Reversed Field Pinches. (Author)
A parallel algorithm for 3D particle tracking and Lagrangian trajectory reconstruction
Particle-tracking methods are widely used in fluid mechanics and multi-target tracking research because of their unique ability to reconstruct long trajectories with high spatial and temporal resolution. Researchers have recently demonstrated 3D tracking of several objects in real time, but as the number of objects is increased, real-time tracking becomes impossible due to data transfer and processing bottlenecks. This problem may be solved by using parallel processing. In this paper, a parallel-processing framework has been developed based on frame decomposition and is programmed using the asynchronous object-oriented Charm++ paradigm. This framework can be a key step in achieving a scalable Lagrangian measurement system for particle-tracking velocimetry and may lead to real-time measurement capabilities. The parallel tracking algorithm was evaluated with three data sets including the particle image velocimetry standard 3D images data set #352, a uniform data set for optimal parallel performance and a computational-fluid-dynamics-generated non-uniform data set to test trajectory reconstruction accuracy, consistency with the sequential version and scalability to more than 500 processors. The algorithm showed strong scaling up to 512 processors and no inherent limits of scalability were seen. Ultimately, up to a 200-fold speedup is observed compared to the serial algorithm when 256 processors were used. The parallel algorithm is adaptable and could be easily modified to use any sequential tracking algorithm, which inputs frames of 3D particle location data and outputs particle trajectories
Ranjan Sen
2012-01-01
Parallel programming is an extension of sequential programming; today, it is becoming the mainstream paradigm in day-to-day information processing. Its aim is to build the fastest programs on parallel computers. The methodologies for developing a parallelprogram can be put into integrated frameworks. Development focuses on algorithm, languages, and how the program is deployed on the parallel computer.
Introduction to parallel programming
Brawer, Steven
1989-01-01
Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race
Parallel Hall effect from 3D single-component metamaterials
Kern, Christian; Kadic, Muamer; Wegener, Martin
2015-01-01
We propose a class of three-dimensional metamaterial architectures composed of a single doped semiconductor (e.g., n-Si) in air or vacuum that lead to unusual effective behavior of the classical Hall effect. Using an anisotropic structure, we numerically demonstrate a Hall voltage that is parallel---rather than orthogonal---to the external static magnetic-field vector ("parallel Hall effect"). The sign of this parallel Hall voltage can be determined by a structure parameter. Together with the...
Parallel Simulation of 3-D Turbulent Flow Through Hydraulic Machinery
徐宇; 吴玉林
2003-01-01
Parallel calculational methods were used to analyze incompressible turbulent flow through hydraulic machinery. Two parallel methods were used to simulate the complex flow field. The space decomposition method divides the computational domain into several sub-ranges. Parallel discrete event simulation divides the whole task into several parts according to their functions. The simulation results were compared with the serial simulation results and particle image velocimetry (PIV) experimental results. The results give the distribution and configuration of the complex vortices and illustrate the effectiveness of the parallel algorithms for numerical simulation of turbulent flows.
Parallel Hall effect from 3D single-component metamaterials
Kern, Christian; Wegener, Martin
2015-01-01
We propose a class of three-dimensional metamaterial architectures composed of a single doped semiconductor (e.g., n-Si) in air or vacuum that lead to unusual effective behavior of the classical Hall effect. Using an anisotropic structure, we numerically demonstrate a Hall voltage that is parallel---rather than orthogonal---to the external static magnetic-field vector ("parallel Hall effect"). The sign of this parallel Hall voltage can be determined by a structure parameter. Together with the previously demonstrated positive or negative orthogonal Hall voltage, we demonstrate four different sign combinations
Parallelism in Constraint Programming
Rolf, Carl Christian
2011-01-01
Writing efficient parallel programs is the biggest challenge of the software industry for the foreseeable future. We are currently in a time when parallel computers are the norm, not the exception. Soon, parallel processors will be standard even in cell phones. Without drastic changes in hardware development, all software must be parallelized to its fullest extent. Parallelism can increase performance and reduce power consumption at the same time. Many programs will execute faster on a...
Parallel logic programming systems
Chassin De Kergommeaux, J.; Codognet, Philippe
1992-01-01
Parallelizing logic programming has attracted much interest in the research community, because of the intrinsic or and and parallelisms of logic programs. One research stream aims at transparent exploitation of parallelism in existing logic programming languages such as Prolog while the family of concurrent logic languages develops constructs allowing programmers to express the concurrency, that is the communication and synchronization between parallel process, inside their algorithms. This p...
DANTSYS/MPI: a system for 3-D deterministic transport on parallel architectures
Baker, R.S.; Alcouffe, R.E.
1996-12-31
Since 1994, we have been using a data parallel form of our deterministic transport code DANTSYS to perform time-independent fixed source and eigenvalue calculations on the CM-200`s at Los Alamos National Laboratory (LANL). Parallelization of the transport sweep is obtained by using a 2-D spatial decomposition which retains the ability to invert the source iteration equation in a single iteration (i.e., the diagonal plane sweep). We have now implemented a message passing version of DANTSYS, referred to as DANTSYS/MPI, on the Cray T3D installed at Los Alamos in 1995. By taking advantage of the SPMD (Single Program, Multiple Data) architecture of the Cray T3D, as well as its low latency communications network, we have managed to achieve grind times (time to solve a single cell in phase space) of less than 10 nanoseconds on the 512 PE (Processing Element) T3D, as opposed to typical grind times of 150-200 nanoseconds on a 2048 PE CM-200, or 300-400 nanoseconds on a single PE of a Cray Y-MP. In addition, we have also parallelized the Diffusion Synthetic Accelerator (DSA) equations which are used to accelerate the convergence of the transport equation. DANTSYS/MPI currently runs on traditional Cray PVP`s and the Cray T3D, and it`s computational kernel (Sweep3D) has been ported to and tested on an array of SGI SMP`s (Symmetric Memory Processors), a network of IBM 590 workstations, an IBM SP2, and the Intel TFLOPs machine at Sandia National Laboratory. This paper describes the implementation of DANTSYS/MPI on the Cray T3D, and presents a simple performance model which accurately predicts the grind time as a function of the number of PE`s and problem size, or scalability. This paper also describes the parallel implementation and performance of the elliptic solver used in DANTSYS/MPI for solving the synthetic acceleration equations.
A Parallel Sweeping Preconditioner for Heterogeneous 3D Helmholtz Equations
Poulson, Jack
2013-05-02
A parallelization of a sweeping preconditioner for three-dimensional Helmholtz equations without large cavities is introduced and benchmarked for several challenging velocity models. The setup and application costs of the sequential preconditioner are shown to be O(γ2N4/3) and O(γN logN), where γ(ω) denotes the modestly frequency-dependent number of grid points per perfectly matched layer. Several computational and memory improvements are introduced relative to using black-box sparse-direct solvers for the auxiliary problems, and competitive runtimes and iteration counts are reported for high-frequency problems distributed over thousands of cores. Two open-source packages are released along with this paper: Parallel Sweeping Preconditioner (PSP) and the underlying distributed multifrontal solver, Clique. © 2013 Society for Industrial and Applied Mathematics.
Parallel deterministic neutronics with AMR in 3D
Clouse, C.; Ferguson, J.; Hendrickson, C. [Lawrence Livermore National Lab., CA (United States)
1997-12-31
AMTRAN, a three dimensional Sn neutronics code with adaptive mesh refinement (AMR) has been parallelized over spatial domains and energy groups and runs on the Meiko CS-2 with MPI message passing. Block refined AMR is used with linear finite element representations for the fluxes, which allows for a straight forward interpretation of fluxes at block interfaces with zoning differences. The load balancing algorithm assumes 8 spatial domains, which minimizes idle time among processors.
Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh
2015-07-01
This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for current class of multicore architecture. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism however, when it comes to 3D data migration, as the data size increases the resource requirement of the algorithm also increases. This challenges its practical implementation even on current generation high performance computing systems. Therefore a smart parallelization approach is essential to handle 3D data for migration. The most compute intensive part of Kirchhoff depth migration algorithm is the calculation of traveltime tables due to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for post and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce a concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept provides flexibility to the algorithm by migrating data in a number of depth iterations, which depends upon the available node memory and the size of data to be migrated during runtime. Furthermore, it minimizes the requirements of storage, I/O and inter-node communication, thus making it advantageous over the conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a PARAM series of supercomputers. Optimization, performance and scalability experiment results along with the migration outcome show the effectiveness of the parallel algorithm.
Gamble, James Graham
1990-01-01
While many parallel programming languages exist, they rarely address programming languages from the issue of communication (implying expressability, and readability). A new language called Explicit Parallel Programming (EPP), attempts to provide this quality by separating the responsibility for the execution of run time actions from the responsibility for deciding the order in which they occur. The ordering of a parallel algorithm is specified in the new EPP language; run ti...
Parallel processing for efficient 3D slope stability modelling
Marchesini, Ivan; Mergili, Martin; Alvioli, Massimiliano; Metz, Markus; Schneider-Muntau, Barbara; Rossi, Mauro; Guzzetti, Fausto
2014-05-01
We test the performance of the GIS-based, three-dimensional slope stability model r.slope.stability. The model was developed as a C- and python-based raster module of the GRASS GIS software. It considers the three-dimensional geometry of the sliding surface, adopting a modification of the model proposed by Hovland (1977), and revised and extended by Xie and co-workers (2006). Given a terrain elevation map and a set of relevant thematic layers, the model evaluates the stability of slopes for a large number of randomly selected potential slip surfaces, ellipsoidal or truncated in shape. Any single raster cell may be intersected by multiple sliding surfaces, each associated with a value of the factor of safety, FS. For each pixel, the minimum value of FS and the depth of the associated slip surface are stored. This information is used to obtain a spatial overview of the potentially unstable slopes in the study area. We test the model in the Collazzone area, Umbria, central Italy, an area known to be susceptible to landslides of different type and size. Availability of a comprehensive and detailed landslide inventory map allowed for a critical evaluation of the model results. The r.slope.stability code automatically splits the study area into a defined number of tiles, with proper overlap in order to provide the same statistical significance for the entire study area. The tiles are then processed in parallel by a given number of processors, exploiting a multi-purpose computing environment at CNR IRPI, Perugia. The map of the FS is obtained collecting the individual results, taking the minimum values on the overlapping cells. This procedure significantly reduces the processing time. We show how the gain in terms of processing time depends on the tile dimensions and on the number of cores.
Foster, I.; Tuecke, S.
1991-12-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. In includes both tutorial and reference material. It also presents the basic concepts that underly PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (c.f. Appendix A).
Saez, Fernando; Printista, Alicia Marcela; Piccoli, María Fabiana
2007-01-01
In the last time the high-performance programming community has worked to look for new templates or skeletons for several parallel programming paradigms. This new form of programming allows to programmer to reduce the time of development, since it saves time in the phase of design, testing and codification. We are concerned in some issues of skeletons that are fundamental to the definition of any skeletal parallel programming system. This paper present commentaries about these issues in the c...
A parallel multigrid-based preconditioner for the 3D heterogeneous high-frequency Helmholtz equation
We investigate the parallel performance of an iterative solver for 3D heterogeneous Helmholtz problems related to applications in seismic wave propagation. For large 3D problems, the computation is no longer feasible on a single processor, and the memory requirements increase rapidly. Therefore, parallelization of the solver is needed. We employ a complex shifted-Laplace preconditioner combined with the Bi-CGSTAB iterative method and use a multigrid method to approximate the inverse of the resulting preconditioning operator. A 3D multigrid method with 2D semi-coarsening is employed. We show numerical results for large problems arising in geophysical applications
Programming Parallel Computers
Chandy, K. Mani
1988-01-01
This paper is from a keynote address to the IEEE International Conference on Computer Languages, October 9, 1988. Keynote addresses are expected to be provocative (and perhaps even entertaining), but not necessarily scholarly. The reader should be warned that this talk was prepared with these expectations in mind.Parallel computers offer the potential of great speed at low cost. The promise of parallelism is limited by the ability to program parallel machines effectively. This paper explores ...
Compositional C++: Compositional Parallel Programming
Chandy, K. Mani; Kesselman, Carl
1992-01-01
A compositional parallel program is a program constructed by composing component programs in parallel, where the composed program inherits properties of its components. In this paper, we describe a small extension of C++ called Compositional C++ or CC++ which is an object-oriented notation that supports compositional parallel programming. CC++ integrates different paradigms of parallel programming: data-parallel, task-parallel and object-parallel paradigms; imperative and declarative programm...
The 3D Elevation Program: summary of program direction
Snyder, Gregory I.
2012-01-01
The 3D Elevation Program (3DEP) initiative responds to a growing need for high-quality topographic data and a wide range of other three-dimensional representations of the Nation's natural and constructed features. The National Enhanced Elevation Assessment (NEEA), which was completed in 2011, clearly documented this need within government and industry sectors. The results of the NEEA indicated that enhanced elevation data have the potential to generate $13 billion in new benefits annually. The benefits apply to food risk management, agriculture, water supply, homeland security, renewable energy, aviation safety, and other areas. The 3DEP initiative was recommended by the National Digital Elevation Program and its 12 Federal member agencies and was endorsed by the National States Geographic Information Council (NSGIC) and the National Geospatial Advisory Committee (NGAC).
DANTSYS/MPI- a system for 3-D deterministic transport on parallel architectures
A data parallel version of the 3-D transport solver in DANTSYS has been in use on the SIMD CM-200s at LANL since 1994. This version typically obtains grind times of 150-200 nanoseconds on a 2048 PE CM-200. A new message passing parallel version of DANTSYS has been implemented referred to as DANTSYS/MPI, on the 512 PE Cray T3D at Los Alamos. By taking advantage of the SPMD architecture of the Cray T3D, as well as its low latency communications network, we have managed to achieve grind times of less than 10 nanoseconds on real problems. DANTSYS/MPI is fully accelerated using DSA on both the inner and outer iterations. The implementation is described of DANTSYS/MPI on the Cray T3D, and presents a simple performance model which accurately predicts the grind time as a function of the number of PE's and problem size, or scalableness. (author)
Foster, I.; Tuecke, S.
1993-01-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and Cthat allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs. ani.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.
Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C
2009-01-01
Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time. PMID:19672315
Christopher D. Dharmaraj
2009-01-01
Full Text Available Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23×23×23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet. The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.
Parallel Isosurface Extraction for 3D Data Analysis Workflows in Distributed Environments
D'Agostino, Daniele; Clematis, Andrea; Gianuzzi, Vittoria
2011-01-01
Abstract In this paper we discuss the issues related to the development of efficient parallel implementations of the Marching Cubes algorithm, one of the most used methods for isosurface extraction, which is a fundamental operation for 3D data analysis and visualization. We present three possible parallelization strategies and we outline pros and cons of each of them, considering isosurface extraction as stand-alone operation or as part of a dynamic workflow. Our analysis shows tha...
Parallel Programming with Intel Parallel Studio XE
Blair-Chappell , Stephen
2012-01-01
Optimize code for multi-core processors with Intel's Parallel Studio Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the
Rotation symmetry axes and the quality index in a 3D octahedral parallel robot manipulator system
Tanev, T. K.; Rooney, J.
2002-01-01
The geometry of a 3D octahedral parallel robot manipulator system is specified in terms of two rigid octahedral structures (the fixed and moving platforms) and six actuation legs. The symmetry of the system is exploited to determine the behaviour of (a new version of) the quality index for various motions. The main results are presented graphically.
RELAP5-3D Developer Guidelines and Programming Practices
Dr. George L Mesina
2014-03-01
Our ultimate goal is to create and maintain RELAP5-3D as the best software tool available to analyze nuclear power plants. This begins with writing excellent programming and requires thorough testing. This document covers development of RELAP5-3D software, the behavior of the RELAP5-3D program that must be maintained, and code testing. RELAP5-3D must perform in a manner consistent with previous code versions with backward compatibility for the sake of the users. Thus file operations, code termination, input and output must remain consistent in form and content while adding appropriate new files, input and output as new features are developed. As computer hardware, operating systems, and other software change, RELAP5-3D must adapt and maintain performance. The code must be thoroughly tested to ensure that it continues to perform robustly on the supported platforms. The coding must be written in a consistent manner that makes the program easy to read to reduce the time and cost of development, maintenance and error resolution. The programming guidelines presented her are intended to institutionalize a consistent way of writing FORTRAN code for the RELAP5-3D computer program that will minimize errors and rework. A common format and organization of program units creates a unifying look and feel to the code. This in turn increases readability and reduces time required for maintenance, development and debugging. It also aids new programmers in reading and understanding the program. Therefore, when undertaking development of the RELAP5-3D computer program, the programmer must write computer code that follows these guidelines. This set of programming guidelines creates a framework of good programming practices, such as initialization, structured programming, and vector-friendly coding. It sets out formatting rules for lines of code, such as indentation, capitalization, spacing, etc. It creates limits on program units, such as subprograms, functions, and modules. It
A burnup corrected 3-D nodal depletion method for vector and parallel computer architectures
The 2- and 3-D nodal depletion code NOMAD-BC was parallelized and vectorized (3-D only). A 3-D, 2-cycle depletion problem was devised and successfully solved with the NOMAD-BC code in less than 35 seconds on two CPUs of a Cray X-MP/48. This shows a combined vectorization and parallelization speedup of 8.6. The same problem was solved on a 2-CPU 16 MHz SGI workstation in less than one hour, exhibiting a 1.78 speedup over the single processor solution on the same machine. It is shown in this work that complex and detailed burnup computations can be successfully optimized. In addition, the performance achieved demonstrates the possibility of obtaining results within very reasonable times, even on inexpensive workstations. Finally, the small CPU time requirements should make possible the routine evaluation of fuel cycles at great savings of the engineer's time. (author)
Programming standards for effective S-3D game development
Schneider, Neil; Matveev, Alexander
2008-02-01
When a video game is in development, more often than not it is being rendered in three dimensions - complete with volumetric depth. It's the PC monitor that is taking this three-dimensional information, and artificially displaying it in a flat, two-dimensional format. Stereoscopic drivers take the three-dimensional information captured from DirectX and OpenGL calls and properly display it with a unique left and right sided view for each eye so a proper stereoscopic 3D image can be seen by the gamer. The two-dimensional limitation of how information is displayed on screen has encouraged programming short-cuts and work-arounds that stifle this stereoscopic 3D effect, and the purpose of this guide is to outline techniques to get the best of both worlds. While the programming requirements do not significantly add to the game development time, following these guidelines will greatly enhance your customer's stereoscopic 3D experience, increase your likelihood of earning Meant to be Seen certification, and give you instant cost-free access to the industry's most valued consumer base. While this outline is mostly based on NVIDIA's programming guide and iZ3D resources, it is designed to work with all stereoscopic 3D hardware solutions and is not proprietary in any way.
Greenwood, J.; Rucker, D.; Levitt, M.; Yang, X.; Lagmanson, M.
2007-12-01
High Resolution Resistivity data is currently used by hydroGEOPHYSICS, Inc to detect and characterize the distribution of suspected contaminant plumes beneath leaking tanks and disposal sites within the U.S. Department of Energy Hanford Site, in Eastern Washington State. The success of the characterization effort has led to resistivity data acquisition in extremely large survey areas exceeding 0.6 km2 and containing over 6,000 electrodes. Optimal data processing results are achieved by utilizing 105 data points within a single finite difference or finite element model domain. The large number of measurements and electrodes and high resolution of the modeling domain requires a model mesh of over 106 nodes. Existing commercially available resistivity inversion software could not support the domain size due to software and hardware limitations. hydroGEOPHYSICS, Inc teamed with Advanced Geosciences, Inc to advance the existing EarthImager3D inversion software to allow for parallel-processing and large memory support under a 64 bit operating system. The basis for the selection of EarthImager3D is demonstrated with a series of verification tests and benchmark comparisons using synthetic test models, field scale experiments and 6 months of intensive modeling using an array of multi-processor servers. The results of benchmark testing show equivalence to other industry standard inversion codes that perform the same function on significantly smaller domain models. hydroGEOPHYSICS, Inc included the use of 214 steel-cased monitoring wells as "long electrodes", 6000 surface electrodes and 8 buried point source electrodes. Advanced Geosciences, Inc. implemented a long electrode modeling function to support the Hanford Site well casing data. This utility is unique to commercial resistivity inversion software, and was evaluated through a series of laboratory and field scale tests using engineered subsurface plumes. The Hanford site is an ideal proving ground for these methods due
Li, Yong Gang; Yang, Yang; Short, Michael P.; Ding, Ze Jun; Zeng, Zhi; Li, Ju
2015-12-01
SRIM-like codes have limitations in describing general 3D geometries, for modeling radiation displacements and damage in nanostructured materials. A universal, computationally efficient and massively parallel 3D Monte Carlo code, IM3D, has been developed with excellent parallel scaling performance. IM3D is based on fast indexing of scattering integrals and the SRIM stopping power database, and allows the user a choice of Constructive Solid Geometry (CSG) or Finite Element Triangle Mesh (FETM) method for constructing 3D shapes and microstructures. For 2D films and multilayers, IM3D perfectly reproduces SRIM results, and can be ∼102 times faster in serial execution and > 104 times faster using parallel computation. For 3D problems, it provides a fast approach for analyzing the spatial distributions of primary displacements and defect generation under ion irradiation. Herein we also provide a detailed discussion of our open-source collision cascade physics engine, revealing the true meaning and limitations of the “Quick Kinchin-Pease” and “Full Cascades” options. The issues of femtosecond to picosecond timescales in defining displacement versus damage, the limitation of the displacements per atom (DPA) unit in quantifying radiation damage (such as inadequacy in quantifying degree of chemical mixing), are discussed.
MPI is a practical, portable, efficient and flexible standard for message passing, which has been implemented on most MPPs and network of workstations by machine vendors, universities and national laboratories. MPI avoids specifying how operations will take place and superfluous work to achieve efficiency as well as portability, and is also designed to encourage overlapping communication and computation to hide communication latencies. This presentation briefly explains the MPI standard, and comments on efficient parallel programming to improve performance. (author)
2D/3D Program work summary report
The 2D/3D Program was carried out by Germany, Japan and the United States to investigate the thermal-hydraulics of a PWR large-break LOCA. A contributory approach was utilized in which each country contributed significant effort to the program and all three countries shared the research results. Germany constructed and operated the Upper Plenum Test Facility (UPTF), and Japan constructed and operated the Cylindrical Core Test Facility (CCTF) and the Slab Core Test Facility (SCTF). The US contribution consisted of provision of advanced instrumentation to each of the three test facilities, and assessment of the TRAC computer code against the test results. Evaluations of the test results were carried out in all three countries. This report summarizes the 2D/3D Program in terms of the contributing efforts of the participants, and was prepared in a coordination among three countries. US and Germany have published the report as NUREG/IA-0126 and GRS-100, respectively. (author)
Gust Acoustics Computation with a Space-Time CE/SE Parallel 3D Solver
Wang, X. Y.; Himansu, A.; Chang, S. C.; Jorgenson, P. C. E.; Reddy, D. R. (Technical Monitor)
2002-01-01
The benchmark Problem 2 in Category 3 of the Third Computational Aero-Acoustics (CAA) Workshop is solved using the space-time conservation element and solution element (CE/SE) method. This problem concerns the unsteady response of an isolated finite-span swept flat-plate airfoil bounded by two parallel walls to an incident gust. The acoustic field generated by the interaction of the gust with the flat-plate airfoil is computed by solving the 3D (three-dimensional) Euler equations in the time domain using a parallel version of a 3D CE/SE solver. The effect of the gust orientation on the far-field directivity is studied. Numerical solutions are presented and compared with analytical solutions, showing a reasonable agreement.
Stiffness Analysis of 3-d.o.f. Overconstrained Translational Parallel Manipulators
Pashkevich, Anatoly; Wenger, Philippe
2008-01-01
The paper presents a new stiffness modelling method for overconstrained parallel manipulators, which is applied to 3-d.o.f. translational mechanisms. It is based on a multidimensional lumped-parameter model that replaces the link flexibility by localized 6-d.o.f. virtual springs. In contrast to other works, the method includes a FEA-based link stiffness evaluation and employs a new solution strategy of the kinetostatic equations, which allows computing the stiffness matrix for the overconstrained architectures and for the singular manipulator postures. The advantages of the developed technique are confirmed by application examples, which deal with comparative stiffness analysis of two translational parallel manipulators.
Tolerant (parallel) Programming
DiNucci, David C.; Bailey, David H. (Technical Monitor)
1997-01-01
In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package for implementing communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2(sup 3) is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
3D magnetospheric parallel hybrid multi-grid method applied to planet-plasma interactions
Leclercq, Ludivine; Modolo, Ronan; Leblanc, François; Hess, Sebastien; Mancini, Marco
2016-01-01
We present a new method to exploit multiple refinement levels within a 3D parallel hybrid model, developed to study planet-plasma interactions. This model is based on the hybrid formalism: ions are kinetically treated whereas electrons are considered as a inertia-less fluid. Generally, ions are represented by numerical particles whose size equals the volume of the cells. Particles that leave a coarse grid subsequently entering a refined region are split into particles whose volume corresponds...
A Parallel Implementation of the Mortar Element Method in 2D and 3D
Samake A.
2013-12-01
Full Text Available We present here the generic parallel computational framework in C++called Feel++for the mortar finite element method with the arbitrary number of subdomain partitions in 2D and 3D. An iterative method with block-diagonal preconditioners is used for solving the algebraic saddle-point problem arising from the finite element discretization. Finally we present a scalability study and the numerical results obtained using Feel++ library.
Edge-based electric field formulation in 3D CSEM simulations: A parallel approach
Castillo-Reyes, Octavio; de la Puente, Josep; Puzyrev, Vladimir; Cela, José M.
2015-01-01
This paper presents a parallel computing scheme for the data computation that arise when applying one of the most popular electromagnetic methods in exploration geophysics, namely, controlled-source electromagnetic (CSEM). The computational approach is based on linear edge finite element method in 3D isotropic domains. The total electromagnetic field is decomposed into primary and secondary electromagnetic field. The primary field is calculated analytically using an horizontal layered-e...
Spatial Parallelism of a 3D Finite Difference, Velocity-Stress Elastic Wave Propagation Code
MINKOFF,SUSAN E.
1999-12-09
Finite difference methods for solving the wave equation more accurately capture the physics of waves propagating through the earth than asymptotic solution methods. Unfortunately. finite difference simulations for 3D elastic wave propagation are expensive. We model waves in a 3D isotropic elastic earth. The wave equation solution consists of three velocity components and six stresses. The partial derivatives are discretized using 2nd-order in time and 4th-order in space staggered finite difference operators. Staggered schemes allow one to obtain additional accuracy (via centered finite differences) without requiring additional storage. The serial code is most unique in its ability to model a number of different types of seismic sources. The parallel implementation uses the MP1 library, thus allowing for portability between platforms. Spatial parallelism provides a highly efficient strategy for parallelizing finite difference simulations. In this implementation, one can decompose the global problem domain into one-, two-, and three-dimensional processor decompositions with 3D decompositions generally producing the best parallel speed up. Because i/o is handled largely outside of the time-step loop (the most expensive part of the simulation) we have opted for straight-forward broadcast and reduce operations to handle i/o. The majority of the communication in the code consists of passing subdomain face information to neighboring processors for use as ''ghost cells''. When this communication is balanced against computation by allocating subdomains of reasonable size, we observe excellent scaled speed up. Allocating subdomains of size 25 x 25 x 25 on each node, we achieve efficiencies of 94% on 128 processors. Numerical examples for both a layered earth model and a homogeneous medium with a high-velocity blocky inclusion illustrate the accuracy of the parallel code.
New adaptive differencing strategy in the PENTRAN 3-d parallel Sn code
It is known that three-dimensional (3-D) discrete ordinates (Sn) transport problems require an immense amount of storage and computational effort to solve. For this reason, parallel codes that offer a capability to completely decompose the angular, energy, and spatial domains among a distributed network of processors are required. One such code recently developed is PENTRAN, which iteratively solves 3-D multi-group, anisotropic Sn problems on distributed-memory platforms, such as the IBM-SP2. Because large problems typically contain several different material zones with various properties, available differencing schemes should automatically adapt to the transport physics in each material zone. To minimize the memory and message-passing overhead required for massively parallel Sn applications, available differencing schemes in an adaptive strategy should also offer reasonable accuracy and positivity, yet require only the zeroth spatial moment of the transport equation; differencing schemes based on higher spatial moments, in spite of their greater accuracy, require at least twice the amount of storage and communication cost for implementation in a massively parallel transport code. This paper discusses a new adaptive differencing strategy that uses increasingly accurate schemes with low parallel memory and communication overhead. This strategy, implemented in PENTRAN, includes a new scheme, exponential directional averaged (EDA) differencing
3-D electromagnetic plasma particle simulations on the Intel Delta parallel computer
A three-dimensional electromagnetic PIC code has been developed on the 512 node Intel Touchstone Delta MIMD parallel computer. This code is based on the General Concurrent PIC algorithm which uses a domain decomposition to divide the computation among the processors. The 3D simulation domain can be partitioned into 1-, 2-, or 3-dimensional sub-domains. Particles must be exchanged between processors as they move among the subdomains. The Intel Delta allows one to use this code for very-large-scale simulations (i.e. over 108 particles and 106 grid cells). The parallel efficiency of this code is measured, and the overall code performance on the Delta is compared with that on Cray supercomputers. It is shown that their code runs with a high parallel efficiency of ≥ 95% for large size problems. The particle push time achieved is 115 nsecs/particle/time step for 162 million particles on 512 nodes. Comparing with the performance on a single processor Cray C90, this represents a factor of 58 speedup. The code uses a finite-difference leap frog method for field solve which is significantly more efficient than fast fourier transforms on parallel computers. The performance of this code on the 128 node Cray T3D will also be discussed
Description of a parallel, 3D, finite element, hydrodynamics-diffusion code
We describe a parallel, 3D, unstructured grid finite element, hydrodynamic diffusion code for inertial confinement fusion (ICF) applications and the ancillary software used to run it. The code system is divided into two entities, a controller and a stand-alone physics code. The code system may reside on different computers; the controller on the user s workstation and the physics code on a supercomputer. The physics code is composed of separate hydrodynamic, equation-of-state, laser energy deposition, heat conduction, and radiation transport packages and is parallelized for distributed memory architectures. For parallelization, a SPMD model is adopted; the domain is decomposed into a disjoint collection of sub-domains, one per processing element (PE). The PEs communicate using MPI. The code is used to simulate the hydrodynamic implosion of a spherical bubble
Parallel computation of 3-D Navier-Stokes flowfields for supersonic vehicles
Ryan, James S.; Weeratunga, Sisira
1993-01-01
Multidisciplinary design optimization of aircraft will require unprecedented capabilities of both analysis software and computer hardware. The speed and accuracy of the analysis will depend heavily on the computational fluid dynamics (CFD) module which is used. A new CFD module has been developed to combine the robust accuracy of conventional codes with the ability to run on parallel architectures. This is achieved by parallelizing the ARC3D algorithm, a central-differenced Navier-Stokes method, on the Intel iPSC/860. The computed solutions are identical to those from conventional machines. Computational speed on 64 processors is comparable to the rate on one Cray Y-MP processor and will increase as new generations of parallel computers become available.
Object-Oriented Parallel Programming
Givelberg, Edward
2014-01-01
We introduce an object-oriented framework for parallel programming, which is based on the observation that programming objects can be naturally interpreted as processes. A parallel program consists of a collection of persistent processes that communicate by executing remote methods. We discuss code parallelization and process persistence, and explain the main ideas in the context of computations with very large data objects.
Kolotilina, L.; Nikishin, A.; Yeremin, A. [and others
1994-12-31
The solution of large systems of linear equations is a crucial bottleneck when performing 3D finite element analysis of structures. Also, in many cases the reliability and robustness of iterative solution strategies, and their efficiency when exploiting hardware resources, fully determine the scope of industrial applications which can be solved on a particular computer platform. This is especially true for modern vector/parallel supercomputers with large vector length and for modern massively parallel supercomputers. Preconditioned iterative methods have been successfully applied to industrial class finite element analysis of structures. The construction and application of high quality preconditioners constitutes a high percentage of the total solution time. Parallel implementation of high quality preconditioners on such architectures is a formidable challenge. Two common types of existing preconditioners are the implicit preconditioners and the explicit preconditioners. The implicit preconditioners (e.g. incomplete factorizations of several types) are generally high quality but require solution of lower and upper triangular systems of equations per iteration which are difficult to parallelize without deteriorating the convergence rate. The explicit type of preconditionings (e.g. polynomial preconditioners or Jacobi-like preconditioners) require sparse matrix-vector multiplications and can be parallelized but their preconditioning qualities are less than desirable. The authors present results of numerical experiments with Factorized Sparse Approximate Inverses (FSAI) for symmetric positive definite linear systems. These are high quality preconditioners that possess a large resource of parallelism by construction without increasing the serial complexity.
Advanced quadratures and periodic boundary conditions in parallel 3D S{sub n} transport
Manalo, K.; Yi, C.; Huang, M.; Sjoden, G. [Nuclear and Radiological Engineering Program, G.W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, 770 State Street, Atlanta, GA 30332-0745 (United States)
2013-07-01
Significant updates in numerical quadratures have warranted investigation with 3D Sn discrete ordinates transport. We show new applications of quadrature departing from level symmetric (S{sub 2}o). investigating 3 recently developed quadratures: Even-Odd (EO), Linear-Discontinuous Finite Element - Surface Area (LDFE-SA), and the non-symmetric Icosahedral Quadrature (IC). We discuss implementation changes to 3D Sn codes (applied to Hybrid MOC-Sn TITAN and 3D parallel PENTRAN) that can be performed to accommodate Icosahedral Quadrature, as this quadrature is not 90-degree rotation invariant. In particular, as demonstrated using PENTRAN, the properties of Icosahedral Quadrature are suitable for trivial application using periodic BCs versus that of reflective BCs. In addition to implementing periodic BCs for 3D Sn PENTRAN, we implemented a technique termed 'angular re-sweep' which properly conditions periodic BCs for outer eigenvalue iterative loop convergence. As demonstrated by two simple transport problems (3-group fixed source and 3-group reflected/periodic eigenvalue pin cell), we remark that all of the quadratures we investigated are generally superior to level symmetric quadrature, with Icosahedral Quadrature performing the most efficiently for problems tested. (authors)
Schultz, Anthony [Nouvel Hopital Civil, Strasbourg University Hospital, Radiology Department, Strasbourg Cedex (France); Nouvel Hopital Civil, Service de Radiologie, Strasbourg Cedex (France); Caspar, Thibault [Nouvel Hopital Civil, Strasbourg University Hospital, Cardiology Department, Strasbourg Cedex (France); Schaeffer, Mickael [Nouvel Hopital Civil, Strasbourg University Hospital, Public Health and Biostatistics Department, Strasbourg Cedex (France); Labani, Aissam; Jeung, Mi-Young; El Ghannudi, Soraya; Roy, Catherine [Nouvel Hopital Civil, Strasbourg University Hospital, Radiology Department, Strasbourg Cedex (France); Ohana, Mickael [Nouvel Hopital Civil, Strasbourg University Hospital, Radiology Department, Strasbourg Cedex (France); Universite de Strasbourg / CNRS, UMR 7357, iCube Laboratory, Illkirch (France)
2016-06-15
To qualitatively and quantitatively compare different late gadolinium enhancement (LGE) sequences acquired at 3T with a parallel RF transmission technique. One hundred and sixty participants prospectively enrolled underwent a 3T cardiac MRI with 3 different LGE sequences: 3D Phase-Sensitive Inversion-Recovery (3D-PSIR) acquired 5 minutes after injection, 3D Inversion-Recovery (3D-IR) at 9 minutes and 3D-PSIR at 13 minutes. All LGE-positive patients were qualitatively evaluated both independently and blindly by two radiologists using a 4-level scale, and quantitatively assessed with measurement of contrast-to-noise ratio and LGE maximal surface. Statistical analyses were calculated under a Bayesian paradigm using MCMC methods. Fifty patients (70 % men, 56yo ± 19) exhibited LGE (62 % were post-ischemic, 30 % related to cardiomyopathy and 8 % post-myocarditis). Early and late 3D-PSIR were superior to 3D-IR sequences (global quality, estimated coefficient IR > early-PSIR: -2.37 CI = [-3.46; -1.38], prob(coef > 0) = 0 % and late-PSIR > IR: 3.12 CI = [0.62; 4.41], prob(coef > 0) = 100 %), LGE surface estimated coefficient IR > early-PSIR: -0.09 CI = [-1.11; -0.74], prob(coef > 0) = 0 % and late-PSIR > IR: 0.96 CI = [0.77; 1.15], prob(coef > 0) = 100 %. Probabilities for late PSIR being superior to early PSIR concerning global quality and CNR were over 90 %, regardless of the aetiological subgroup. In 3T cardiac MRI acquired with parallel RF transmission technique, 3D-PSIR is qualitatively and quantitatively superior to 3D-IR. (orig.)
Comparison of 3-D Synthetic Aperture Phased-Array Ultrasound Imaging and Parallel Beamforming
Rasmussen, Morten Fischer; Jensen, Jørgen Arendt
2014-01-01
This paper demonstrates that synthetic apertureimaging (SAI) can be used to achieve real-time 3-D ultra-sound phased-array imaging. It investigates whether SAI in-creases the image quality compared with the parallel beam-forming (PB) technique for real-time 3-D imaging. Data areobtained using both...... simulations and measurements with anultrasound research scanner and a commercially available 3.5-MHz 1024-element 2-D transducer array. To limit the probecable thickness, 256 active elements are used in transmit andreceive for both techniques. The two imaging techniques weredesigned for cardiac imaging, which...... requires sequences de-signed for imaging down to 15cm of depth and a frame rateof at least 20Hz. The imaging quality of the two techniquesis investigated through simulations as a function of depth andangle. SAI improved the full-width at half-maximum (FWHM) at low steering angles by 35%, and the 20-d...
Parallel Imaging of 3D Surface Profile with Space-Division Multiplexing
Hyung Seok Lee
2016-01-01
Full Text Available We have developed a modified optical frequency domain imaging (OFDI system that performs parallel imaging of three-dimensional (3D surface profiles by using the space division multiplexing (SDM method with dual-area swept sourced beams. We have also demonstrated that 3D surface information for two different areas could be well obtained in a same time with only one camera by our method. In this study, double field of views (FOVs of 11.16 mm × 5.92 mm were achieved within 0.5 s. Height range for each FOV was 460 µm and axial and transverse resolutions were 3.6 and 5.52 µm, respectively.
Parallel Imaging of 3D Surface Profile with Space-Division Multiplexing
Lee, Hyung Seok; Cho, Soon-Woo; Kim, Gyeong Hun; Jeong, Myung Yung; Won, Young Jae; Kim, Chang-Seok
2016-01-01
We have developed a modified optical frequency domain imaging (OFDI) system that performs parallel imaging of three-dimensional (3D) surface profiles by using the space division multiplexing (SDM) method with dual-area swept sourced beams. We have also demonstrated that 3D surface information for two different areas could be well obtained in a same time with only one camera by our method. In this study, double field of views (FOVs) of 11.16 mm × 5.92 mm were achieved within 0.5 s. Height range for each FOV was 460 µm and axial and transverse resolutions were 3.6 and 5.52 µm, respectively. PMID:26805840
A Parallelized 3D Particle-In-Cell Method With Magnetostatic Field Solver And Its Applications
Hsu, Kuo-Hsien; Chen, Yen-Sen; Wu, Men-Zan Bill; Wu, Jong-Shinn
2008-10-01
A parallelized 3D self-consistent electrostatic particle-in-cell finite element (PIC-FEM) code using an unstructured tetrahedral mesh was developed. For simulating some applications with external permanent magnet set, the distribution of the magnetostatic field usually also need to be considered and determined accurately. In this paper, we will firstly present the development of a 3D magnetostatic field solver with an unstructured mesh for the flexibility of modeling objects with complex geometry. The vector Poisson equation for magnetostatic field is formulated using the Galerkin nodal finite element method and the resulting matrix is solved by parallel conjugate gradient method. A parallel adaptive mesh refinement module is coupled to this solver for better resolution. Completed solver is then verified by simulating a permanent magnet array with results comparable to previous experimental observations and simulations. By taking the advantage of the same unstructured grid format of this solver, the developed PIC-FEM code could directly and easily read the magnetostatic field for particle simulation. In the upcoming conference, magnetron is simulated and presented for demonstrating the capability of this code.
Parallel Programming with Declarative Ada
Thornley, John
1993-01-01
Declarative programming languages (e.g., functional and logic programming languages) are semantically elegant and implicitly express parallelism at a high level. We show how a parallel declarative language can be based on a modern structured imperative language with single-assignment variables. Such a language combines the advantages of parallel declarative programming with the strengths and familiarity of the underlying imperative language. We introduce Declarative Ada, a parallel declarativ...
Jung, Jaewoon; Kobayashi, Chigusa; Imamura, Toshiyuki; Sugita, Yuji
2016-03-01
Three-dimensional Fast Fourier Transform (3D FFT) plays an important role in a wide variety of computer simulations and data analyses, including molecular dynamics (MD) simulations. In this study, we develop hybrid (MPI+OpenMP) parallelization schemes of 3D FFT based on two new volumetric decompositions, mainly for the particle mesh Ewald (PME) calculation in MD simulations. In one scheme, (1d_Alltoall), five all-to-all communications in one dimension are carried out, and in the other, (2d_Alltoall), one two-dimensional all-to-all communication is combined with two all-to-all communications in one dimension. 2d_Alltoall is similar to the conventional volumetric decomposition scheme. We performed benchmark tests of 3D FFT for the systems with different grid sizes using a large number of processors on the K computer in RIKEN AICS. The two schemes show comparable performances, and are better than existing 3D FFTs. The performances of 1d_Alltoall and 2d_Alltoall depend on the supercomputer network system and number of processors in each dimension. There is enough leeway for users to optimize performance for their conditions. In the PME method, short-range real-space interactions as well as long-range reciprocal-space interactions are calculated. Our volumetric decomposition schemes are particularly useful when used in conjunction with the recently developed midpoint cell method for short-range interactions, due to the same decompositions of real and reciprocal spaces. The 1d_Alltoall scheme of 3D FFT takes 4.7 ms to simulate one MD cycle for a virus system containing more than 1 million atoms using 32,768 cores on the K computer.
A parallel sweeping preconditioner for high frequency heterogeneous 3D Helmholtz equations
Poulson, Jack; Fomel, Sergey; Li, Siwei; Ying, Lexing
2012-01-01
A parallelization of a recently introduced sweeping preconditioner for high frequency heterogeneous Helmholtz equations is presented along with experimental results for the full SEG/EAGE Overthrust seismic model at 30 Hz, using eight grid points per characteristic wavelength; to the best of our knowledge, this is the largest 3D Helmholtz calculation to date, and our algorithm only required fifteen minutes to complete on 8192 cores. While the setup and application costs of the sweeping preconditioner are trivially $\\Theta(N^{4/3})$ and $\\Theta(N \\log N)$, this paper provides strong empirical evidence that the number of iterations required for the convergence of GMRES equipped with the sweeping preconditioner is essentially independent of the frequency of the problem. Generalizations to time-harmonic Maxwell and linear-elastic wave equations are also briefly discussed since the techniques behind our parallelization are not specific to the Helmholtz equation.
Parallel 3D Finite Element Particle-in-Cell Simulations with Pic3P
Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; /SLAC; Ben-Zvi, I.; Kewisch, J.; /Brookhaven
2009-06-19
SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic Particle-In-Cell code Pic3P. Designed for simulations of beam-cavity interactions dominated by space charge effects, Pic3P solves the complete set of Maxwell-Lorentz equations self-consistently and includes space-charge, retardation and boundary effects from first principles. Higher-order Finite Element methods with adaptive refinement on conformal unstructured meshes lead to highly efficient use of computational resources. Massively parallel processing with dynamic load balancing enables large-scale modeling of photoinjectors with unprecedented accuracy, aiding the design and operation of next-generation accelerator facilities. Applications include the LCLS RF gun and the BNL polarized SRF gun.
The 3D Elevation Program initiative: a call for action
Sugarbaker, Larry J.; Constance, Eric W.; Heidemann, Hans Karl; Jason, Allyson L.; Lukas, Vicki; Saghy, David L.; Stoker, Jason M.
2014-01-01
The 3D Elevation Program (3DEP) initiative is accelerating the rate of three-dimensional (3D) elevation data collection in response to a call for action to address a wide range of urgent needs nationwide. It began in 2012 with the recommendation to collect (1) high-quality light detection and ranging (lidar) data for the conterminous United States (CONUS), Hawaii, and the U.S. territories and (2) interferometric synthetic aperture radar (ifsar) data for Alaska. Specifications were created for collecting 3D elevation data, and the data management and delivery systems are being modernized. The National Elevation Dataset (NED) will be completely refreshed with new elevation data products and services. The call for action requires broad support from a large partnership community committed to the achievement of national 3D elevation data coverage. The initiative is being led by the U.S. Geological Survey (USGS) and includes many partners—Federal agencies and State, Tribal, and local governments—who will work together to build on existing programs to complete the national collection of 3D elevation data in 8 years. Private sector firms, under contract to the Government, will continue to collect the data and provide essential technology solutions for the Government to manage and deliver these data and services. The 3DEP governance structure includes (1) an executive forum established in May 2013 to have oversight functions and (2) a multiagency coordinating committee based upon the committee structure already in place under the National Digital Elevation Program (NDEP). The 3DEP initiative is based on the results of the National Enhanced Elevation Assessment (NEEA) that was funded by NDEP agencies and completed in 2011. The study, led by the USGS, identified more than 600 requirements for enhanced (3D) elevation data to address mission-critical information requirements of 34 Federal agencies, all 50 States, and a sample of private sector companies and Tribal and local
Three-dimensional parallel UNIPIC-3D code for simulations of high-power microwave devices
Wang, Jianguo; Chen, Zaigao; Wang, Yue; Zhang, Dianhui; Liu, Chunliang; Li, Yongdong; Wang, Hongguang; Qiao, Hailiang; Fu, Meiyan; Yuan, Yuan
2010-07-01
This paper introduces a self-developed, three-dimensional parallel fully electromagnetic particle simulation code UNIPIC-3D. In this code, the electromagnetic fields are updated using the second-order, finite-difference time-domain method, and the particles are moved using the relativistic Newton-Lorentz force equation. The electromagnetic field and particles are coupled through the current term in Maxwell's equations. Two numerical examples are used to verify the algorithms adopted in this code, numerical results agree well with theoretical ones. This code can be used to simulate the high-power microwave (HPM) devices, such as the relativistic backward wave oscillator, coaxial vircator, and magnetically insulated line oscillator, etc. UNIPIC-3D is written in the object-oriented C++ language and can be run on a variety of platforms including WINDOWS, LINUX, and UNIX. Users can use the graphical user's interface to create the complex geometric structures of the simulated HPM devices, which can be automatically meshed by UNIPIC-3D code. This code has a powerful postprocessor which can display the electric field, magnetic field, current, voltage, power, spectrum, momentum of particles, etc. For the sake of comparison, the results computed by using the two-and-a-half-dimensional UNIPIC code are also provided for the same parameters of HPM devices, the numerical results computed from these two codes agree well with each other.
3D Body Scanning Measurement System Associated with RF Imaging, Zero-padding and Parallel Processing
Kim Hyung Tae
2016-04-01
Full Text Available This work presents a novel signal processing method for high-speed 3D body measurements using millimeter waves with a general processing unit (GPU and zero-padding fast Fourier transform (ZPFFT. The proposed measurement system consists of a radio-frequency (RF antenna array for a penetrable measurement, a high-speed analog-to-digital converter (ADC for significant data acquisition, and a general processing unit for fast signal processing. The RF waves of the transmitter and the receiver are converted to real and imaginary signals that are sampled by a high-speed ADC and synchronized with the kinematic positions of the scanner. Because the distance between the surface and the antenna is related to the peak frequency of the conjugate signals, a fast Fourier transform (FFT is applied to the signal processing after the sampling. The sampling time is finite owing to a short scanning time, and the physical resolution needs to be increased; further, zero-padding is applied to interpolate the spectra of the sampled signals to consider a 1/m floating point frequency. The GPU and parallel algorithm are applied to accelerate the speed of the ZPFFT because of the large number of additional mathematical operations of the ZPFFT. 3D body images are finally obtained by spectrograms that are the arrangement of the ZPFFT in a 3D space.
Introducing ZEUS-MP A 3D, Parallel, Multiphysics Code for Astrophysical Fluid Dynamics
Norman, M L
2000-01-01
We describe ZEUS-MP: a Multi-Physics, Massively-Parallel, Message-Passing code for astrophysical fluid dynamics simulations in 3 dimensions. ZEUS-MP is a follow-on to the sequential ZEUS-2D and ZEUS-3D codes developed and disseminated by the Laboratory for Computational Astrophysics (lca.ncsa.uiuc.edu) at NCSA. V1.0 released 1/1/2000 includes the following physics modules: ideal hydrodynamics, ideal MHD, and self-gravity. Future releases will include flux-limited radiation diffusion, thermal heat conduction, two-temperature plasma, and heating and cooling functions. The covariant equations are cast on a moving Eulerian grid with Cartesian, cylindrical, and spherical polar coordinates currently supported. Parallelization is done by domain decomposition and implemented in F77 and MPI. The code is portable across a wide range of platforms from networks of workstations to massively parallel processors. Some parallel performance results are presented as well as an application to turbulent star formation.
3D Navier-Stokes Time Accurate Solutions Using Multipartitioning Parallel Computation Methodology
Zha, Ge-Cheng
1998-01-01
A parallel CFD code solving 3D time accurate Navier-Stokes equations with multipartitioning parallel Methodology is being developed in collaboration with Ohio State University within the Air Vehicle Directorate, at Wright Patterson Air Force Base. The advantage of the multipartitioning parallel method is that the domain decomposition will not introduce domain boundaries for the implicit operators. A ring structure data communication is employed so that the implicit time accurate method can be implemented for multi-processors with the same accuracy as for the single processor. No sub-iteration is needed at the domain boundaries. The code has been validated for some typical unsteady flows, which include Coutte Flow, flow passing a cylinder. The code now is being employed for a large scale time accurate wall jet transient flow computation. 'ne preliminary results are promising. The mesh has been refined to capture more details of the flow field. The mesh refinement computation is in progress and would be difficult to successfully implement without the parallel computation techniques used. A modified version of the code with more efficient inversion of the diagonalized block matrix is currently being tested.
Parallel CAE system for large-scale 3-D finite element analyses
This paper describes a new pre- and post-processing system for the automation of large-scale 3D finite element analyses. In the pre-processing stage, a geometry model lo be analyzed is defined by a user through an interactive operation with a 3D graphics editor. The analysis model is constructed by adding analysis conditions and a mesh refinement information lo the geometry model. The mesh refinement information, i.e. a nodal density distribution over the whole analysis domain is initially defined by superposing several locally optimum nodal patterns stored in the nodal pattern database of the system. Nodes and tetrahedral elements are generated using some computational geometry techniques whose processing speed is almost proportional to the total number of nodes. In the post-processing stage, scalar and vector values are evaluated at arbitrary points in the analysis domain, and displayed as equi-contours, vector lines, iso-surfaces, particle plots and realtime animation by means of scientific visualization techniques. The present system is also capable of mesh optimization. A posteriori error distribution over the whole analysis domain is obtained based on the simple error estimator proposed by Zienkiewicz and Zhu. The nodal density distribution to be used for mesh generation is optimized referring the obtained error distribution. Finally nodes and tetrahedral elements are re-generated. The present remeshing method is one of the global hr-version mesh adaptation methods. To deal with large-scale 3D finite element analyses in a reasonable computational time and memory requirement, a distributed/parallel processing technique is applied to some part of the present system. Fundamental performances of the present system are clearly demonstrated through 3D thermal conduction analyses. (author)
Design and verification of an ultra-precision 3D-coordinate measuring machine with parallel drives
Bos, Edwin; Moers, Ton; van Riel, Martijn
2015-08-01
An ultra-precision 3D coordinate measuring machine (CMM), the TriNano N100, has been developed. In our design, the workpiece is mounted on a 3D stage, which is driven by three parallel drives that are mutually orthogonal. The linear drives support the 3D stage using vacuum preloaded (VPL) air bearings, whereby each drive determines the position of the 3D stage along one translation direction only. An exactly constrained design results in highly repeatable machine behavior. Furthermore, the machine complies with the Abbé principle over its full measurement range and the application of parallel drives allows for excellent dynamic behavior. The design allows a 3D measurement uncertainty of 100 nanometers in a measurement range of 200 cubic centimeters. Verification measurements using a Gannen XP 3D tactile probing system on a spherical artifact show a standard deviation in single point repeatability of around 2 nm in each direction.
Approach of generating parallel programs from parallelized algorithm design strategies
WAN Jian-yi; LI Xiao-ying
2008-01-01
Today, parallel programming is dominated by message passing libraries, such as message passing interface (MPI). This article intends to simplify parallel programming by generating parallel programs from parallelized algorithm design strategies. It uses skeletons to abstract parallelized algorithm design strategies, as well as parallel architectures. Starting from problem specification, an abstract parallel abstract programming language+ (Apla+) program is generated from parallelized algorithm design strategies and problem-specific function definitions. By combining with parallel architectures, implicity of parallelism inside the parallelized algorithm design strategies is exploited. With implementation and transformation, C++ and parallel virtual machine (CPPVM) parallel program is finally generated. Parallelized branch and bound (B&B) algorithm design strategy and parallelized divide and conquer (D & C) algorithm design strategy are studied in this article as examples. And it also illustrates the approach with a case study.
Recent progress in 3D EM/EM-PIC simulation with ARGUS and parallel ARGUS
ARGUS is an integrated, 3-D, volumetric simulation model for systems involving electric and magnetic fields and charged particles, including materials embedded in the simulation region. The code offers the capability to carry out time domain and frequency domain electromagnetic simulations of complex physical systems. ARGUS offers a boolean solid model structure input capability that can include essentially arbitrary structures on the computational domain, and a modular architecture that allows multiple physics packages to access the same data structure and to share common code utilities. Physics modules are in place to compute electrostatic and electromagnetic fields, the normal modes of RF structures, and self-consistent particle-in-cell (PIC) simulation in either a time dependent mode or a steady state mode. The PIC modules include multiple particle species, the Lorentz equations of motion, and algorithms for the creation of particles by emission from material surfaces, injection onto the grid, and ionization. In this paper, we present an updated overview of ARGUS, with particular emphasis given in recent algorithmic and computational advances. These include a completely rewritten frequency domain solver which efficiently treats lossy materials and periodic structures, a parallel version of ARGUS with support for both shared memory parallel vector (i.e. CRAY) machines and distributed memory massively parallel MIMD systems, and numerous new applications of the code
The development of laser-plasma interaction program LAP3D on thousands of processors
Xiaoyan Hu
2015-08-01
Full Text Available Modeling laser-plasma interaction (LPI processes in real-size experiments scale is recognized as a challenging task. For explorering the influence of various instabilities in LPI processes, a three-dimensional laser and plasma code (LAP3D has been developed, which includes filamentation, stimulated Brillouin backscattering (SBS, stimulated Raman backscattering (SRS, non-local heat transport and plasmas flow computation modules. In this program, a second-order upwind scheme is applied to solve the plasma equations which are represented by an Euler fluid model. Operator splitting method is used for solving the equations of the light wave propagation, where the Fast Fourier translation (FFT is applied to compute the diffraction operator and the coordinate translations is used to solve the acoustic wave equation. The coupled terms of the different physics processes are computed by the second-order interpolations algorithm. In order to simulate the LPI processes in massively parallel computers well, several parallel techniques are used, such as the coupled parallel algorithm of FFT and fluid numerical computation, the load balance algorithm, and the data transfer algorithm. Now the phenomena of filamentation, SBS and SRS have been studied in low-density plasma successfully with LAP3D. Scalability of the program is demonstrated with a parallel efficiency above 50% on about ten thousand of processors.
The development of laser-plasma interaction program LAP3D on thousands of processors
Modeling laser-plasma interaction (LPI) processes in real-size experiments scale is recognized as a challenging task. For explorering the influence of various instabilities in LPI processes, a three-dimensional laser and plasma code (LAP3D) has been developed, which includes filamentation, stimulated Brillouin backscattering (SBS), stimulated Raman backscattering (SRS), non-local heat transport and plasmas flow computation modules. In this program, a second-order upwind scheme is applied to solve the plasma equations which are represented by an Euler fluid model. Operator splitting method is used for solving the equations of the light wave propagation, where the Fast Fourier translation (FFT) is applied to compute the diffraction operator and the coordinate translations is used to solve the acoustic wave equation. The coupled terms of the different physics processes are computed by the second-order interpolations algorithm. In order to simulate the LPI processes in massively parallel computers well, several parallel techniques are used, such as the coupled parallel algorithm of FFT and fluid numerical computation, the load balance algorithm, and the data transfer algorithm. Now the phenomena of filamentation, SBS and SRS have been studied in low-density plasma successfully with LAP3D. Scalability of the program is demonstrated with a parallel efficiency above 50% on about ten thousand of processors
The development of laser-plasma interaction program LAP3D on thousands of processors
Hu, Xiaoyan, E-mail: hu-xiaoyan@iapcm.ac.cn; Hao, Liang; Liu, Zhanjun; Zheng, Chunyang; Li, Bin, E-mail: li.bin@iapcm.ac.cn; Guo, Hong [Institute of Applied Physics and Computational Mathematics, Beijing 100088 (China)
2015-08-15
Modeling laser-plasma interaction (LPI) processes in real-size experiments scale is recognized as a challenging task. For explorering the influence of various instabilities in LPI processes, a three-dimensional laser and plasma code (LAP3D) has been developed, which includes filamentation, stimulated Brillouin backscattering (SBS), stimulated Raman backscattering (SRS), non-local heat transport and plasmas flow computation modules. In this program, a second-order upwind scheme is applied to solve the plasma equations which are represented by an Euler fluid model. Operator splitting method is used for solving the equations of the light wave propagation, where the Fast Fourier translation (FFT) is applied to compute the diffraction operator and the coordinate translations is used to solve the acoustic wave equation. The coupled terms of the different physics processes are computed by the second-order interpolations algorithm. In order to simulate the LPI processes in massively parallel computers well, several parallel techniques are used, such as the coupled parallel algorithm of FFT and fluid numerical computation, the load balance algorithm, and the data transfer algorithm. Now the phenomena of filamentation, SBS and SRS have been studied in low-density plasma successfully with LAP3D. Scalability of the program is demonstrated with a parallel efficiency above 50% on about ten thousand of processors.
Patterns For Parallel Programming
Mattson, Timothy G; Massingill, Berna L
2005-01-01
From grids and clusters to next-generation game consoles, parallel computing is going mainstream. Innovations such as Hyper-Threading Technology, HyperTransport Technology, and multicore microprocessors from IBM, Intel, and Sun are accelerating the movement's growth. Only one thing is missing: programmers with the skills to meet the soaring demand for parallel software.
3-D readout-electronics packaging for high-bandwidth massively paralleled imager
Kwiatkowski, Kris; Lyke, James
2007-12-18
Dense, massively parallel signal processing electronics are co-packaged behind associated sensor pixels. Microchips containing a linear or bilinear arrangement of photo-sensors, together with associated complex electronics, are integrated into a simple 3-D structure (a "mirror cube"). An array of photo-sensitive cells are disposed on a stacked CMOS chip's surface at a 45.degree. angle from light reflecting mirror surfaces formed on a neighboring CMOS chip surface. Image processing electronics are held within the stacked CMOS chip layers. Electrical connections couple each of said stacked CMOS chip layers and a distribution grid, the connections for distributing power and signals to components associated with each stacked CSMO chip layer.
The new Exponential Directional Iterative (EDI) 3-D Sn scheme for parallel adaptive differencing
The new Exponential Directional Iterative (EDI) discrete ordinates (Sn) scheme for 3-D Cartesian Coordinates is presented. The EDI scheme is a logical extension of the positive, efficient Exponential Directional Weighted (EDW) Sn scheme currently used as the third level of the adaptive spatial differencing algorithm in the PENTRAN parallel discrete ordinates solver. Here, the derivation and advantages of the EDI scheme are presented; EDI uses EDW-rendered exponential coefficients as initial starting values to begin a fixed point iteration of the exponential coefficients. One issue that required evaluation was an iterative cutoff criterion to prevent the application of an unstable fixed point iteration; although this was needed in some cases, it was readily treated with a default to EDW. Iterative refinement of the exponential coefficients in EDI typically converged in fewer than four fixed point iterations. Moreover, EDI yielded more accurate angular fluxes compared to the other schemes tested, particularly in streaming conditions. Overall, it was found that the EDI scheme was up to an order of magnitude more accurate than the EDW scheme on a given mesh interval in streaming cases, and is potentially a good candidate as a fourth-level differencing scheme in the PENTRAN adaptive differencing sequence. The 3-D Cartesian computational cost of EDI was only about 20% more than the EDW scheme, and about 40% more than Diamond Zero (DZ). More evaluation and testing are required to determine suitable upgrade metrics for EDI to be fully integrated into the current adaptive spatial differencing sequence in PENTRAN. (author)
Combining parallel search and parallel consistency in constraint programming
Rolf, Carl Christian; Kuchcinski, Krzysztof
2010-01-01
Program parallelization becomes increasingly important when new multi-core architectures provide ways to improve performance. One of the greatest challenges of this development lies in programming parallel applications. Declarative languages, such as constraint programming, can make the transition to parallelism easier by hiding the parallelization details in a framework. Automatic parallelization in constraint programming has mostly focused on parallel search. While search and consist...
In situ patterned micro 3D liver constructs for parallel toxicology testing in a fluidic device.
Skardal, Aleksander; Devarasetty, Mahesh; Soker, Shay; Hall, Adam R
2015-09-01
3D tissue models are increasingly being implemented for drug and toxicology testing. However, the creation of tissue-engineered constructs for this purpose often relies on complex biofabrication techniques that are time consuming, expensive, and difficult to scale up. Here, we describe a strategy for realizing multiple tissue constructs in a parallel microfluidic platform using an approach that is simple and can be easily scaled for high-throughput formats. Liver cells mixed with a UV-crosslinkable hydrogel solution are introduced into parallel channels of a sealed microfluidic device and photopatterned to produce stable tissue constructs in situ. The remaining uncrosslinked material is washed away, leaving the structures in place. By using a hydrogel that specifically mimics the properties of the natural extracellular matrix, we closely emulate native tissue, resulting in constructs that remain stable and functional in the device during a 7-day culture time course under recirculating media flow. As proof of principle for toxicology analysis, we expose the constructs to ethyl alcohol (0-500 mM) and show that the cell viability and the secretion of urea and albumin decrease with increasing alcohol exposure, while markers for cell damage increase. PMID:26355538
Parallel programming with Python
Palach, Jan
2014-01-01
A fast, easy-to-follow and clear tutorial to help you develop Parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you in implementing these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most of this book.
Practical parallel programming
Bauer, Barr E
2014-01-01
This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.
The 3D Elevation Program: summary for Minnesota
Carswell, William J., Jr.
2013-01-01
Elevation data are essential to a broad range of applications, including forest resources management, wildlife and habitat management, national security, recreation, and many others. For the State of Minnesota, elevation data are critical for agriculture and precision farming, natural resources conservation, flood risk management, infrastructure and construction management, water supply and quality, coastal zone management, and other business uses. Today, high-quality light detection and ranging (lidar) data are the sources for creating elevation models and other elevation datasets. Federal, State, and local agencies work in partnership to (1) replace data, on a national basis, that are (on average) 30 years old and of lower quality and (2) provide coverage where publicly accessible data do not exist. A joint goal of State and Federal partners is to acquire consistent, statewide coverage to support existing and emerging applications enabled by lidar data. The new 3D Elevation Program (3DEP) initiative, managed by the U.S. Geological Survey (USGS), responds to the growing need for high-quality topographic data and a wide range of other three-dimensional representations of the Nation’s natural and constructed features.
The 3D Elevation Program: summary for Rhode Island
Carswell, William J., Jr.
2013-01-01
Elevation data are essential to a broad range of applications, including forest resources management, wildlife and habitat management, national security, recreation, and many others. For the State of Rhode Island, elevation data are critical for flood risk management, natural resources conservation, coastal zone management, sea level rise and subsidence, agriculture and precision farming, and other business uses. Today, high-quality light detection and ranging (lidar) data are the sources for creating elevation models and other elevation datasets. Federal, State, and local agencies work in partnership to (1) replace data, on a national basis, that are (on average) 30 years old and of lower quality and (2) provide coverage where publicly accessible data do not exist. A joint goal of State and Federal partners is to acquire consistent, statewide coverage to support existing and emerging applications enabled by lidar data. The new 3D Elevation Program (3DEP) initiative (Snyder, 2012a,b), managed by the U.S. Geological Survey (USGS), responds to the growing need for high-quality topographic data and a wide range of other three-dimensional representations of the Nation’s natural and constructed features.
The 3D Elevation Program: summary for Wisconsin
Carswell, William J., Jr.
2013-01-01
Elevation data are essential to a broad range of applications, including forest resources management, wildlife and habitat management, national security, recreation, and many others. For the State of Wisconsin, elevation data are critical for agriculture and precision farming, natural resources conservation, flood risk management, infrastructure and construction management, water supply and quality, and other business uses. Today, high-quality light detection and ranging (lidar) data are the sources for creating elevation models and other elevation datasets. Federal, State, and local agencies work in partnership to (1) replace data, on a national basis, that are (on average) 30 years old and of lower quality and (2) provide coverage where publicly accessible data do not exist. A joint goal of State and Federal partners is to acquire consistent, statewide coverage to support existing and emerging applications enabled by lidar data. The new 3D Elevation Program (3DEP) initiative, managed by the U.S. Geological Survey (USGS), responds to the growing need for high-quality topographic data and a wide range of other three-dimensional representations of the Nation’s natural and constructed features.
The 3D Elevation Program: summary for California
Carswell, William J., Jr.
2013-01-01
Elevation data are essential to a broad range of applications, including forest resources management, wildlife and habitat management, national security, recreation, and many others. For the State of California, elevation data are critical for infrastructure and construction management; natural resources conservation; flood risk management; wildfire management, planning, and response; agriculture and precision farming; geologic resource assessment and hazard mitigation; and other business uses. Today, high-quality light detection and ranging (lidar) data are the sources for creating elevation models and other elevation datasets. Federal, State, and local agencies work in partnership to (1) replace data, on a national basis, that are (on average) 30 years old and of lower quality and (2) provide coverage where publicly accessible data do not exist. A joint goal of State and Federal partners is to acquire consistent, statewide coverage to support existing and emerging applications enabled by lidar data. The new 3D Elevation Program (3DEP) initiative, managed by the U.S. Geological Survey (USGS), responds to the growing need for high-quality topographic data and a wide range of other three-dimensional representations of the Nation’s natural and constructed features.
Application of the SDD-CMFD acceleration method to parallel 3-D MOC transport
In this paper the spatial domain decomposed coarse mesh finite difference (SDD-CMFD) method is applied as an acceleration technique to a parallel implementation of the 3-D method of characteristics (MOC) for a series of problems to assess the effectiveness of the method for practical applications. The SDD-CMFD method assumes the problem domain is divided into independent parallelizable sweep regions globally linked within the framework of a CMFD-like system. Results obtained with the MPACT code are examined for three problems. The first analysis is of multi-dimensional, 1-group, infinite homogeneous media problems that compare the numerically-measured rate of convergence to that predicted by the 1-D Fourier analysis performed in previous work. It is observed that the rate of convergence of the numerical experiments has similar behavior to that predicted by the Fourier analysis for variations of optical thickness in the coarse cell and spatial subdomain. However, the rate of convergence is measured to be slightly less than that predicted by Fourier analysis. The algorithm is applied to the Takeda 3-D neutron transport benchmark, and compared to a standard source iteration. In the analysis of this problem, the method is observed to speed up convergence, significantly reducing the number of outer iterations by a factor of nearly 20x and reducing the overall run time by a factor of about 10x. Finally, the method is applied to a realistic PWR assembly, which is observed to converge in 7 outer iterations, a factor of 150x less than source iteration, using the SDD-CMFD acceleration method, and have an estimated speedup of ∼34x over conventional source iteration. (author)
CH5M3D: an HTML5 program for creating 3D molecular structures
Earley, Clarke W
2013-01-01
Background While a number of programs and web-based applications are available for the interactive display of 3-dimensional molecular structures, few of these provide the ability to edit these structures. For this reason, we have developed a library written in JavaScript to allow for the simple creation of web-based applications that should run on any browser capable of rendering HTML5 web pages. While our primary interest in developing this application was for educational use, it may also pr...
3D magnetospheric parallel hybrid multi-grid method applied to planet-plasma interactions
Leclercq, L.; Modolo, R.; Leblanc, F.; Hess, S.; Mancini, M.
2016-03-01
We present a new method to exploit multiple refinement levels within a 3D parallel hybrid model, developed to study planet-plasma interactions. This model is based on the hybrid formalism: ions are kinetically treated whereas electrons are considered as a inertia-less fluid. Generally, ions are represented by numerical particles whose size equals the volume of the cells. Particles that leave a coarse grid subsequently entering a refined region are split into particles whose volume corresponds to the volume of the refined cells. The number of refined particles created from a coarse particle depends on the grid refinement rate. In order to conserve velocity distribution functions and to avoid calculations of average velocities, particles are not coalesced. Moreover, to ensure the constancy of particles' shape function sizes, the hybrid method is adapted to allow refined particles to move within a coarse region. Another innovation of this approach is the method developed to compute grid moments at interfaces between two refinement levels. Indeed, the hybrid method is adapted to accurately account for the special grid structure at the interfaces, avoiding any overlapping grid considerations. Some fundamental test runs were performed to validate our approach (e.g. quiet plasma flow, Alfven wave propagation). Lastly, we also show a planetary application of the model, simulating the interaction between Jupiter's moon Ganymede and the Jovian plasma.
Explicit Parallel Programming: System Description
Gamble, Jim; Ribbens, Calvin J.
1991-01-01
The implementation of the Explicit Parallel Programming (EPP) system is described. EPP is a prototype implementation of a language for writing parallel programs for shared memory multiprocessors. EPP may be viewed as a coordination language, since it is used to define the sequencing or ordering of various tasks, while the tasks themselves are defined in some other compilable language. The two main components of the EPP system---a compiler and an executive---are described in this report. An...
Explicit Parallel Programming: User's Guide
Gamble, Jim; Ribbens, Calvin J.
1991-01-01
The Explicit Parallel Programming (EPP) language is defined and illustrated with several examples. EPP is a prototype implementation of a language for writing parallel programs for shared memory multiprocessors. EPP may be viewed as a coordination language, since it is used to define the sequencing or ordering of various tasks, while the tasks themselves are defined in some other compilable language. The prototype described here requires FORTRAN as the base language, but there is no inheren...
Teaching Parallel Programming Using Java
Shafi, Aamir; Akhtar, Aleem; Javed, Ansar; Carpenter, Bryan
2014-01-01
This paper presents an overview of the "Applied Parallel Computing" course taught to final year Software Engineering undergraduate students in Spring 2014 at NUST, Pakistan. The main objective of the course was to introduce practical parallel programming tools and techniques for shared and distributed memory concurrent systems. A unique aspect of the course was that Java was used as the principle programming language. The course was divided into three sections. The first section covered paral...
Chien-Lun Hou; Hao-Ting Lin; Mao-Hsiung Chiang
2011-01-01
In this paper, a stereo vision 3D position measurement system for a three-axial pneumatic parallel mechanism robot arm is presented. The stereo vision 3D position measurement system aims to measure the 3D trajectories of the end-effector of the robot arm. To track the end-effector of the robot arm, the circle detection algorithm is used to detect the desired target and the SAD algorithm is used to track the moving target and to search the corresponding target location along the conjugate epip...
Task-parallel implementation of 3D shortest path raytracing for geophysical applications
Giroux, Bernard; Larouche, Benoît
2013-04-01
This paper discusses two variants of the shortest path method and their parallel implementation on a shared-memory system. One variant is designed to perform raytracing in models with stepwise distributions of interval velocity while the other is better suited for continuous velocity models. Both rely on a discretization scheme where primary nodes are located at the corners of cuboid cells and where secondary nodes are found on the edges and sides of the cells. The parallel implementations allow raytracing concurrently for different sources, providing an attractive framework for ray-based tomography. The accuracy and performance of the implementations were measured by comparison with the analytic solution for a layered model and for a vertical gradient model. Mean relative error less than 0.2% was obtained with 5 secondary nodes for the layered model and 9 secondary nodes for the gradient model. Parallel performance depends on the level of discretization refinement, on the number of threads, and on the problem size, with the most determinant variable being the level of discretization refinement (number of secondary nodes). The results indicate that a good trade-off between speed and accuracy is achieved with the number of secondary nodes equal to 5. The programs are written in C++ and rely on the Standard Template Library and OpenMP.
Hao-Ting Lin; Mao-Hsiung Chiang
2011-01-01
This study aimed to develop a novel 3D parallel mechanism robot driven by three vertical-axial pneumatic actuators with a stereo vision system for path tracking control. The mechanical system and the control system are the primary novel parts for developing a 3D parallel mechanism robot. In the mechanical system, a 3D parallel mechanism robot contains three serial chains, a fixed base, a movable platform and a pneumatic servo system. The parallel mechanism are designed and analyzed first for ...
Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P
Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; /SLAC; Syratchev, I.; /CERN
2009-06-19
In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).
Influence of intrinsic and extrinsic forces on 3D stress distribution using CUDA programming
Räss, Ludovic; Omlin, Samuel; Podladchikov, Yuri
2013-04-01
In order to have a better understanding of the influence of buoyancy (intrinsic) and boundary (extrinsic) forces in a nonlinear rheology due to a power law fluid, some basics needs to be explored through 3D numerical calculation. As first approach, the already studied Stokes setup of a rising sphere will be used to calibrate the 3D model. Far field horizontal tectonic stress is applied to the sphere, which generates a vertical acceleration, buoyancy driven. This simple and known setup allows some benchmarking performed through systematic runs. The relative importance of intrinsic and extrinsic forces producing the wide variety of rates and styles of deformation, including absence of deformation and generating 3D stress patterns, will be determined. Relation between vertical motion and power law exponent will also be explored. The goal of these investigations will be to run models having topography and density structure from geophysical imaging as input, and 3D stress field as output. The stress distribution in Swiss Alps and Plateau and its implication for risk analysis is one of the perspective for this research. In fact, proximity of the stress to the failure is fundamental for risk assessment. Sensitivity of this to the accurate topography representation can then be evaluated. The developed 3D numerical codes, tuned for mid-sized cluster, need to be optimized, especially while running good resolution in full 3D. Therefor, two largely used computing platforms, MATLAB and FORTRAN 90 are explored. Starting with an easy adaptable and as short as possible MATLAB code, which is then upgraded in order to reach higher performance in simulation times and resolution. A significant speedup using the rising NVIDIA CUDA technology and resources is also possible. Programming in C-CUDA, creating some synchronization feature, and comparing the results with previous runs, helps us to investigate the new speedup possibilities allowed through GPU parallel computing. These codes
Design and verification of an ultra-precision 3D-coordinate measuring machine with parallel drives
An ultra-precision 3D coordinate measuring machine (CMM), the TriNano N100, has been developed. In our design, the workpiece is mounted on a 3D stage, which is driven by three parallel drives that are mutually orthogonal. The linear drives support the 3D stage using vacuum preloaded (VPL) air bearings, whereby each drive determines the position of the 3D stage along one translation direction only. An exactly constrained design results in highly repeatable machine behavior. Furthermore, the machine complies with the Abbé principle over its full measurement range and the application of parallel drives allows for excellent dynamic behavior. The design allows a 3D measurement uncertainty of 100 nanometers in a measurement range of 200 cubic centimeters. Verification measurements using a Gannen XP 3D tactile probing system on a spherical artifact show a standard deviation in single point repeatability of around 2 nm in each direction. (paper)
Recent trends in parallel programming
Jakl, Ondřej
Ostrava: ÚGN AV ČR, 2007 - (Blaheta, R.; Starý, J.), s. 54-58 ISBN 978-80-86407-12-8. [Seminar on Numerical Analysis. Modelling and Simulation of Chalenging Engineering Problems. Winter School. High-performance and Parallel Computers, Programming Technologies & Numerical Linear Algebra. Ostrava (CZ), 22.01.2007-26.01.2007] R&D Projects: GA AV ČR 1ET400300415; GA MŠk 1N04035 Institutional research plan: CEZ:AV0Z30860518 Keywords : high performance computing * parallel programming * MPI Subject RIV: BA - General Mathematics
Chien-Lun Hou
2011-02-01
Full Text Available In this paper, a stereo vision 3D position measurement system for a three-axial pneumatic parallel mechanism robot arm is presented. The stereo vision 3D position measurement system aims to measure the 3D trajectories of the end-effector of the robot arm. To track the end-effector of the robot arm, the circle detection algorithm is used to detect the desired target and the SAD algorithm is used to track the moving target and to search the corresponding target location along the conjugate epipolar line in the stereo pair. After camera calibration, both intrinsic and extrinsic parameters of the stereo rig can be obtained, so images can be rectified according to the camera parameters. Thus, through the epipolar rectification, the stereo matching process is reduced to a horizontal search along the conjugate epipolar line. Finally, 3D trajectories of the end-effector are computed by stereo triangulation. The experimental results show that the stereo vision 3D position measurement system proposed in this paper can successfully track and measure the fifth-order polynomial trajectory and sinusoidal trajectory of the end-effector of the three- axial pneumatic parallel mechanism robot arm.
Bristeau, Marie-Odile; Glowinski, Roland; Périaux, Jacques; Rossi, Tuomo
1999-01-01
We consider the scattering problem for 3-D electromagnetic harmonic waves. The time-domain Maxwell's equations are solved and Exact Controllability methods improve the convergence of the solutions to the time-periodic ones for nonconvex obstacles. A least-squares formulation solved by a preconditioned conjugate gradient is introduced. The discretization is achieved in time by a centered finite difference scheme and in space by Lagrange finite elements. Numerical results for 3-D nonconvex scat...
Gabriele Jost
2010-01-01
Full Text Available Today most systems in high-performance computing (HPC feature a hierarchical hardware design: shared-memory nodes with several multi-core CPUs are connected via a network infrastructure. When parallelizing an application for these architectures it seems natural to employ a hierarchical programming model such as combining MPI and OpenMP. Nevertheless, there is the general lore that pure MPI outperforms the hybrid MPI/OpenMP approach. In this paper, we describe the hybrid MPI/OpenMP parallelization of IR3D (Incompressible Realistic 3-D code, a full-scale real-world application, which simulates the environmental effects on the evolution of vortices trailing behind control surfaces of underwater vehicles. We discuss performance, scalability and limitations of the pure MPI version of the code on a variety of hardware platforms and show how the hybrid approach can help to overcome certain limitations.
Efficient Parallel Programming with Linda
Ashish Deshpande; Martin Schultz
1992-01-01
Linda is a coordination language inverted by David Gelernter at Yale University, which when combined with a computation language (like C) yields a high-level parallel programming language for MIMD machines. Linda is based on a virtual shared associative memory containing objects called tuples. Skeptics have long claimed that Linda programs could not be efficient on distributed memory architectures. In this paper, we address this claim by discussing C-Linda's performance in solving a particula...
Characterization of a parallel-beam CCD optical-CT apparatus for 3D radiation dosimetry
Krstajic, Nikola; Doran, Simon J.
2007-07-01
3D measurement of optical attenuation is of interest in a variety of fields of biomedical importance, including spectrophotometry, optical projection tomography (OPT) and analysis of 3D radiation dosimeters. Accurate, precise and economical 3D measurements of optical density (OD) are a crucial step in enabling 3D radiation dosimeters to enter wider use in clinics. Polymer gels and Fricke gels, as well as dosimeters not based around gels, have been characterized for 3D dosimetry over the last two decades. A separate problem is the verification of the best readout method. A number of different imaging modalities (magnetic resonance imaging (MRI), optical CT, x-ray CT and ultrasound) have been suggested for the readout of information from 3D dosimeters. To date only MRI and laser-based optical CT have been characterized in detail. This paper describes some initial steps we have taken in establishing charge coupled device (CCD)-based optical CT as a viable alternative to MRI for readout of 3D radiation dosimeters. The main advantage of CCD-based optical CT over traditional laser-based optical CT is a speed increase of at least an order of magnitude, while the simplicity of its architecture would lend itself to cheaper implementation than both MRI and laser-based optical CT if the camera itself were inexpensive enough. Specifically, we study the following aspects of optical metrology, using high quality test targets: (i) calibration and quality of absorbance measurements and the camera requirements for 3D dosimetry; (ii) the modulation transfer function (MTF) of individual projections; (iii) signal-to-noise ratio (SNR) in the projection and reconstruction domains; (iv) distortion in the projection domain, depth-of-field (DOF) and telecentricity. The principal results for our current apparatus are as follows: (i) SNR of optical absorbance in projections is better than 120:1 for uniform phantoms in absorbance range 0.3 to 1.6 (and better than 200:1 for absorbances 1.0 to
Novel Kinetic 3D MHD Algorithm for High Performance Parallel Computing Systems
Chetverushkin, B; Saveliev, V
2013-01-01
The impressive progress of the kinetic schemes in the solution of gas dynamics problems and the development of effective parallel algorithms for modern high performance parallel computing systems led to the development of advanced methods for the solution of the magnetohydrodynamics problem in the important area of plasma physics. The novel feature of the method is the formulation of the complex Boltzmann-like distribution function of kinetic method with the implementation of electromagnetic interaction terms. The numerical method is based on the explicit schemes. Due to logical simplicity and its efficiency, the algorithm is easily adapted to modern high performance parallel computer systems including hybrid computing systems with graphic processors.
Parallel Adaptive Computation of Blood Flow in a 3D ``Whole'' Body Model
Zhou, M.; Figueroa, C. A.; Taylor, C. A.; Sahni, O.; Jansen, K. E.
2008-11-01
Accurate numerical simulations of vascular trauma require the consideration of a larger portion of the vasculature than previously considered, due to the systemic nature of the human body's response. A patient-specific 3D model composed of 78 connected arterial branches extending from the neck to the lower legs is constructed to effectively represent the entire body. Recently developed outflow boundary conditions that appropriately represent the downstream vasculature bed which is not included in the 3D computational domain are applied at 78 outlets. In this work, the pulsatile blood flow simulations are started on a fairly uniform, unstructured mesh that is subsequently adapted using a solution-based approach to efficiently resolve the flow features. The adapted mesh contains non-uniform, anisotropic elements resulting in resolution that conforms with the physical length scales present in the problem. The effects of the mesh resolution on the flow field are studied, specifically on relevant quantities of pressure, velocity and wall shear stress.
A first 3D parallel diffusion solver based on a mixed dual finite element approximation
This paper presents a new extension of the mixed dual finite element approximation of the diffusion equation in rectangular geometry. The mixed dual formulation has been extended in order to take into account discontinuity conditions. The iterative method is based on an alternating direction method which uses the current as unknown. This method is parallelizable and have very fast convergence properties. Some results for a 3D calculation on the CRAY computer are presented. (orig.)
Compensation of errors in robot machining with a parallel 3D-piezo compensation mechanism
Schneider, Ulrich; Drust, Manuel; Puzik, Arnold; Verl, Alexander
2013-01-01
This paper proposes an approach for a 3D-Piezo Compensation Mechanism unit that is capable of fast and accurate adaption of the spindle position to enhance machining by robots. The mechanical design is explained which focuses on low mass, good stiffness and high bandwidth in order to allow compensating for errors beyond the bandwidth of the robot. In addition to previous works [7] and [9], an advanced actuation design is presented enabling movements in three translational axes allowing a work...
BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations
Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul
2015-01-01
Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and de...
Wang, S.; De Hoop, M. V.; Xia, J.; Li, X.
2011-12-01
We consider the modeling of elastic seismic wave propagation on a rectangular domain via the discretization and solution of the inhomogeneous coupled Helmholtz equation in 3D, by exploiting a parallel multifrontal sparse direct solver equipped with Hierarchically Semi-Separable (HSS) structure to reduce the computational complexity and storage. In particular, we are concerned with solving this equation on a large domain, for a large number of different forcing terms in the context of seismic problems in general, and modeling in particular. We resort to a parsimonious mixed grid finite differences scheme for discretizing the Helmholtz operator and Perfect Matched Layer boundaries, resulting in a non-Hermitian matrix. We make use of a nested dissection based domain decomposition, and introduce an approximate direct solver by developing a parallel HSS matrix compression, factorization, and solution approach. We cast our massive parallelization in the framework of the multifrontal method. The assembly tree is partitioned into local trees and a global tree. The local trees are eliminated independently in each processor, while the global tree is eliminated through massive communication. The solver for the inhomogeneous equation is a parallel hybrid between multifrontal and HSS structure. The computational complexity associated with the factorization is almost linear with the size of the Helmholtz matrix. Our numerical approach can be compared with the spectral element method in 3D seismic applications.
Parallel load balancing strategy for Volume-of-Fluid methods on 3-D unstructured meshes
Jofre, Lluís; Borrell, Ricard; Lehmkuhl, Oriol; Oliva, Assensi
2015-02-01
Volume-of-Fluid (VOF) is one of the methods of choice to reproduce the interface motion in the simulation of multi-fluid flows. One of its main strengths is its accuracy in capturing sharp interface geometries, although requiring for it a number of geometric calculations. Under these circumstances, achieving parallel performance on current supercomputers is a must. The main obstacle for the parallelization is that the computing costs are concentrated only in the discrete elements that lie on the interface between fluids. Consequently, if the interface is not homogeneously distributed throughout the domain, standard domain decomposition (DD) strategies lead to imbalanced workload distributions. In this paper, we present a new parallelization strategy for general unstructured VOF solvers, based on a dynamic load balancing process complementary to the underlying DD. Its parallel efficiency has been analyzed and compared to the DD one using up to 1024 CPU-cores on an Intel SandyBridge based supercomputer. The results obtained on the solution of several artificially generated test cases show a speedup of up to ∼12× with respect to the standard DD, depending on the interface size, the initial distribution and the number of parallel processes engaged. Moreover, the new parallelization strategy presented is of general purpose, therefore, it could be used to parallelize any VOF solver without requiring changes on the coupled flow solver. Finally, note that although designed for the VOF method, our approach could be easily adapted to other interface-capturing methods, such as the Level-Set, which may present similar workload imbalances.
Exploiting Parallelism in Coalgebraic Logic Programming
Komendantskaya, Ekaterina; Schmidt, Martin; Heras, Jónathan
2013-01-01
We present a parallel implementation of Coalgebraic Logic Programming (CoALP) in the programming language Go. CoALP was initially introduced to reflect coalgebraic semantics of logic programming, with coalgebraic derivation algorithm featuring both corecursion and parallelism. Here, we discuss how the coalgebraic semantics influenced our parallel implementation of logic programming.
A Framework for Parallel Programming in Java
Launay, Pascale; Pazat, Jean-Louis
1997-01-01
To ease the task of programming parallel and distributed applications, the Do! project aims at the automatic generation of distributed code from multi-threaded Java programs. We provide a parallel programming model, embedded in a framework that constraints parallelism without any extension to the Java language. This framework is described here and is used as a basis to generate distributed programs.
C# game programming cookbook for Unity 3D
Murray, Jeff W
2014-01-01
Making Games the Modular Way Important Programming ConceptsBuilding the Core Game Framework Controllers and Managers Building the Core Framework ScriptsPlayer Structure Game-Specific Player Controller Dealing with Input Player Manager User Data ManagerRecipes: Common Components Introduction The Timer Class Spawn ScriptsSet Gravity Pretend Friction-Friction Simulation to Prevent Slipping Around Cameras Input Scripts Automatic Self-Destruction Script Automatic Object SpinnerScene Manager Building Player Movement ControllersShoot 'Em Up Spaceship Humanoid Character Wheeled Vehicle Weapon Systems
Contributions to computational stereology and parallel programming
Rasmusson, Allan
rotator, even without the need for isotropic sections. To meet the need for computational power to perform image restoration of virtual tissue sections, parallel programming on GPUs has also been part of the project. This has lead to a significant change in paradigm for a previously developed surgical...... between computer science and stereology, we try to overcome these problems by developing new virtual stereological probes and virtual tissue sections. A concrete result is the development of a new virtual 3D probe, the spatial rotator, which was found to have lower variance than the widely used planar...... simulator and a memory efficient, GPU implementation of for connected components labeling. This was furthermore extended to produce signed distance fields and Voronoi diagrams, all with real-time performance. It has during the course of the project been realized that many disciplines within computer science...
Parallel 3-D particle-in-cell modelling of charged ultrarelativistic beam dynamics
Boronina, Marina A.; Vshivkov, Vitaly A.
2015-12-01
> ) in supercolliders. We use the 3-D set of Maxwell's equations for the electromagnetic fields, and the Vlasov equation for the distribution function of the beam particles. The model incorporates automatically the longitudinal effects, which can play a significant role in the cases of super-high densities. We present numerical results for the dynamics of two focused ultrarelativistic beams with a size ratio 10:1:100. The results demonstrate high efficiency of the proposed computational methods and algorithms, which are applicable to a variety of problems in relativistic plasma physics.
Schultz, A.
2010-12-01
3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. We
Simulation of the 3D viscoelastic free surface flow by a parallel corrected particle scheme
Jin-Lian, Ren; Tao, Jiang
2016-02-01
In this work, the behavior of the three-dimensional (3D) jet coiling based on the viscoelastic Oldroyd-B model is investigated by a corrected particle scheme, which is named the smoothed particle hydrodynamics with corrected symmetric kernel gradient and shifting particle technique (SPH_CS_SP) method. The accuracy and stability of SPH_CS_SP method is first tested by solving Poiseuille flow and Taylor-Green flow. Then the capacity for the SPH_CS_SP method to solve the viscoelastic fluid is verified by the polymer flow through a periodic array of cylinders. Moreover, the convergence of the SPH_CS_SP method is also investigated. Finally, the proposed method is further applied to the 3D viscoelastic jet coiling problem, and the influences of macroscopic parameters on the jet coiling are discussed. The numerical results show that the SPH_CS_SP method has higher accuracy and better stability than the traditional SPH method and other corrected SPH method, and can improve the tensile instability. Project supported by the Natural Science Foundation of Jiangsu Province, China (Grant Nos. BK20130436 and BK20150436) and the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China (Grant No. 15KJB110025).
1 - Description of program or function: PARTISN (Parallel, Time-Dependent SN) is the evolutionary successor to CCC-0547/DANTSYS. User input and cross section formats are very similar to that of DANTSYS. The linear Boltzmann transport equation is solved for neutral particles using the deterministic (SN) method. Both the static (fixed source or eigenvalue) and time-dependent forms of the transport equation are solved in forward or adjoint mode. Vacuum, reflective, periodic, white, or inhomogeneous boundary conditions are solved. General anisotropic scattering and inhomogeneous sources are permitted. PARTISN solves the transport equation on orthogonal (single level or block-structured AMR) grids in 1-D (slab, two-angle slab, cylindrical, or spherical), 2-D (X-Y, R-Z, or R-T) and 3-D (X-Y-Z or R-Z-T) geometries. 2 - Methods:PARTISN numerically solves the multigroup form of the neutral-particle Boltzmann transport equation. The discrete-ordinates form of approximation is used for treating the angular variation of the particle distribution. For curvilinear geometries, diamond differencing is used for angular discretization. The spatial discretizations may be either low-order (diamond difference or Adaptive Weighted Diamond Difference (AWDD)) or higher-order (linear discontinuous or exponential discontinuous). Negative fluxes are eliminated by a local set-to-zero-and-correct algorithm for the diamond case (DD/STZ). Time differencing is Crank-Nicholson (diamond), also with a set-to-zero fix-up scheme. Both inner and outer iterations can be accelerated using the diffusion synthetic acceleration method, or transport synthetic acceleration can be used to accelerate the inner iterations. The diffusion solver uses either the conjugate gradient or multigrid method. Chebyshev acceleration of the fission source is used. The angular source terms may be treated either via standard PN expansions or Galerkin scattering. An option is provided for strictly positive scattering sources
Parallel Programming Environment for OpenMP
Insung Park; Michael J. Voss; Seon Wook Kim; Rudolf Eigenmann
2001-01-01
We present our effort to provide a comprehensive parallel programming environment for the OpenMP parallel directive language. This environment includes a parallel programming methodology for the OpenMP programming model and a set of tools (Ursa Minor and InterPol) that support this methodology. Our toolset provides automated and interactive assistance to parallel programmers in time-consuming tasks of the proposed methodology. The features provided by our tools include performance and program...
Chu, Chunlei
2009-01-01
The major performance bottleneck of the parallel Fourier method on distributed memory systems is the network communication cost. In this study, we investigate the potential of using non‐blocking all‐to‐all communications to solve this problem by overlapping computation and communication. We present the runtime comparison of a 3D seismic modeling problem with the Fourier method using non‐blocking and blocking calls, respectively, on a Linux cluster. The data demonstrate that a performance improvement of up to 40% can be achieved by simply changing blocking all‐to‐all communication calls to non‐blocking ones to introduce the overlapping capability. A 3D reverse‐time migration result is also presented as an extension to the modeling work based on non‐blocking collective communications.
High-Performance Computation of Distributed-Memory Parallel 3D Voronoi and Delaunay Tessellation
Peterka, Tom; Morozov, Dmitriy; Phillips, Carolyn
2014-11-14
Computing a Voronoi or Delaunay tessellation from a set of points is a core part of the analysis of many simulated and measured datasets: N-body simulations, molecular dynamics codes, and LIDAR point clouds are just a few examples. Such computational geometry methods are common in data analysis and visualization; but as the scale of simulations and observations surpasses billions of particles, the existing serial and shared-memory algorithms no longer suffice. A distributed-memory scalable parallel algorithm is the only feasible approach. The primary contribution of this paper is a new parallel Delaunay and Voronoi tessellation algorithm that automatically determines which neighbor points need to be exchanged among the subdomains of a spatial decomposition. Other contributions include periodic and wall boundary conditions, comparison of our method using two popular serial libraries, and application to numerous science datasets.
Large-scale Parallel Unstructured Mesh Computations for 3D High-lift Analysis
Mavriplis, Dimitri J.; Pirzadeh, S.
1999-01-01
A complete "geometry to drag-polar" analysis capability for the three-dimensional high-lift configurations is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround for complicated geometries that arise in high-lift configurations. Special attention is devoted to creating a capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are initially generated on a work-station, and subsequently refined on a supercomputer. The flow is solved on these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off configuration using up to 24.7 million grid points. The feasibility of using this approach in a production environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines using up to 1450 processors.
Calibration of 3-d.o.f. Translational Parallel Manipulators Using Leg Observations
Pashkevich, Anatoly; Wenger, Philippe; Gomolitsky, Roman
2009-01-01
The paper proposes a novel approach for the geometrical model calibration of quasi-isotropic parallel kinematic mechanisms of the Orthoglide family. It is based on the observations of the manipulator leg parallelism during motions between the specific test postures and employs a low-cost measuring system composed of standard comparator indicators attached to the universal magnetic stands. They are sequentially used for measuring the deviation of the relevant leg location while the manipulator moves the TCP along the Cartesian axes. Using the measured differences, the developed algorithm estimates the joint offsets and the leg lengths that are treated as the most essential parameters. Validity of the proposed calibration technique is confirmed by the experimental results.
Multiscale simulation of mixing processes using 3D-parallel, fluid-structure interaction techniques
Valette, Rudy; Vergnes, Bruno; Coupez, Thierry
2008-01-01
International audience This work focuses on the development of a general finite element code, called Ximex®, devoted to the three-dimensional direct simulation of mixing processes of complex fluids. The code is based on a simplified fictitious domain method coupled with a "level-set" approach to represent the rigid moving boundaries, such as screws and rotors, as well as free surfaces. These techniques, combined with the use of parallel computing, allow computing the time-dependent flow of...
A new ray-tracing scheme for 3D diffuse radiation transfer on highly parallel architectures
Tanaka, Satoshi; Okamoto, Takashi; Hasegawa, Kenji
2014-01-01
We present a new numerical scheme to solve the transfer of diffuse radiation on three-dimensional mesh grids which is efficient on processors with highly parallel architecture such as recently popular GPUs and CPUs with multi- and many-core architectures. The scheme is based on the ray-tracing method and the computational cost is proportional to $N_{\\rm m}^{5/3}$ where $N_{\\rm m}$ is the number of mesh grids, and is devised to compute the radiation transfer along each light-ray completely in parallel with appropriate grouping of the light-rays. We find that the performance of our scheme scales well with the number of adopted CPU cores and GPUs, and also that our scheme is nicely parallelized on a multi-node system by adopting the multiple wave front scheme, and the performance scales well with the amount of the computational resources. As numerical tests to validate our scheme and to give a physical criterion for the angular resolution of our ray-tracing scheme, we perform several numerical simulations of the...
Hybrid shared/distributed parallelism for 3D characteristics transport solvers
In this paper, we will present a new hybrid parallel model for solving large-scale 3-dimensional neutron transport problems used in nuclear reactor simulations. Large heterogeneous reactor problems, like the ones that occurs when simulating Candu cores, have remained computationally intensive and impractical for routine applications on single-node or even vector computers. Based on the characteristics method, this new model is designed to solve the transport equation after distributing the calculation load on a network of shared memory multi-processors. The tracks are either generated on the fly at each characteristics sweep or stored in sequential files. The load balancing is taken into account by estimating the calculation load of tracks and by distributing batches of uniform load on each node of the network. Moreover, the communication overhead can be predicted after benchmarking the latency and bandwidth using appropriate network test suite. These models are useful for predicting the performance of the parallel applications and to analyze the scalability of the parallel systems. (authors)
A new ray-tracing scheme for 3D diffuse radiation transfer on highly parallel architectures
Tanaka, Satoshi; Yoshikawa, Kohji; Okamoto, Takashi; HASEGAWA, Kenji
2014-01-01
We present a new numerical scheme to solve the transfer of diffuse radiation on three-dimensional mesh grids which is efficient on processors with highly parallel architecture such as recently popular GPUs and CPUs with multi- and many-core architectures. The scheme is based on the ray-tracing method and the computational cost is proportional to $N_{\\rm m}^{5/3}$ where $N_{\\rm m}$ is the number of mesh grids, and is devised to compute the radiation transfer along each light-ray completely in ...
Knowledge rule base for the beam optics program TRACE 3-D
An expert system type of knowledge rule base has been developed for the input parameters used by the particle beam transport program TRACE 3-D. The goal has been to provide the program's user with adequate on-screen information to allow him to initially set up a problem with minimal open-quotes off-lineclose quotes calculations. The focus of this work has been in developing rules for the parameters which define the beam line transport elements. Ten global parameters, the particle mass and charge, beam energy, etc., are used to provide open-quotes expertclose quotes estimates of lower and upper limits for each of the transport element parameters. For example, the limits for the field strength of the quadrupole element are based on a water-cooled, iron-core electromagnet with dimensions derived from practical engineering constraints, and the upper limit for the effective length is scaled with the particle momenta so that initially parallel trajectories do not cross the axis inside the magnet. Limits for the quadrupole doublet and triplet parameters incorporate these rules and additional rules based on stable FODO lattices and bidirectional focusing requirements. The structure of the rule base is outlined and examples for the quadrupole singlet, doublet and triplet are described. The rule base has been implemented within the Shell for Particle Accelerator Related Codes (SPARC) graphical user interface (GUI)
The Experience of Large Computational Programs Parallelization. Parallel Version of MINUIT Program
Sapozhnikov, A P
2003-01-01
The problems around large computational programs parallelization are discussed. As an example a parallel version of MINUIT, widely used program for minimization, is introduced. The given results of MPI-based MINUIT testing on multiprocessor system demonstrate really reached parallelism.
3D parallel-detection microwave tomography for clinical breast imaging
Epstein, N. R., E-mail: nepstein@ucalgary.ca [Schulich School of Engineering, University of Calgary, 2500 University Dr. NW, Calgary, Alberta T2N 1N4 (Canada); Meaney, P. M. [Thayer School of Engineering, Dartmouth College, 14 Engineering Dr., Hanover, New Hampshire 03755 (United States); Paulsen, K. D. [Thayer School of Engineering, Dartmouth College, 14 Engineering Dr., Hanover, New Hampshire 03755 (United States); Department of Radiology, Geisel School of Medicine, Dartmouth College, Hanover, New Hampshire 03755 (United States); Norris Cotton Cancer Center, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire 03756 (United States); Advanced Surgical Center, Dartmouth Hitchcock Medical Center, Lebanon, New Hampshire 03756 (United States)
2014-12-15
A biomedical microwave tomography system with 3D-imaging capabilities has been constructed and translated to the clinic. Updates to the hardware and reconfiguration of the electronic-network layouts in a more compartmentalized construct have streamlined system packaging. Upgrades to the data acquisition and microwave components have increased data-acquisition speeds and improved system performance. By incorporating analog-to-digital boards that accommodate the linear amplification and dynamic-range coverage our system requires, a complete set of data (for a fixed array position at a single frequency) is now acquired in 5.8 s. Replacement of key components (e.g., switches and power dividers) by devices with improved operational bandwidths has enhanced system response over a wider frequency range. High-integrity, low-power signals are routinely measured down to −130 dBm for frequencies ranging from 500 to 2300 MHz. Adequate inter-channel isolation has been maintained, and a dynamic range >110 dB has been achieved for the full operating frequency range (500–2900 MHz). For our primary band of interest, the associated measurement deviations are less than 0.33% and 0.5° for signal amplitude and phase values, respectively. A modified monopole antenna array (composed of two interwoven eight-element sub-arrays), in conjunction with an updated motion-control system capable of independently moving the sub-arrays to various in-plane and cross-plane positions within the illumination chamber, has been configured in the new design for full volumetric data acquisition. Signal-to-noise ratios (SNRs) are more than adequate for all transmit/receive antenna pairs over the full frequency range and for the variety of in-plane and cross-plane configurations. For proximal receivers, in-plane SNRs greater than 80 dB are observed up to 2900 MHz, while cross-plane SNRs greater than 80 dB are seen for 6 cm sub-array spacing (for frequencies up to 1500 MHz). We demonstrate accurate
The linear Boltzmann transport equation (BTE) is an integro-differential equation arising in deterministic models of neutral and charged particle transport. In slab (one-dimensional Cartesian) geometry and certain higher-dimensional cases, Diffusion Synthetic Acceleration (DSA) is known to be an effective algorithm for the iterative solution of the discretized BTE. Fourier and asymptotic analyses have been applied to various idealizations (e.g., problems on infinite domains with constant coefficients) to obtain sharp bounds on the convergence rate of DSA in such cases. While DSA has been shown to be a highly effective acceleration (or preconditioning) technique in one-dimensional problems, it has been observed to be less effective in higher dimensions. This is due in part to the expense of solving the related diffusion linear system. We investigate here the effectiveness of a parallel semicoarsening multigrid (SMG) solution approach to DSA preconditioning in several three dimensional problems. In particular, we consider the algorithmic and implementation scalability of a parallel SMG-DSA preconditioner on several types of test problems
Parallel unstructured mesh optimisation for 3D radiation transport and fluids modelling
In this paper we describe the theory and application of a parallel mesh optimisation procedure to obtain self-adapting finite element solutions on unstructured tetrahedral grids. The optimisation procedure adapts the tetrahedral mesh to the solution of a radiation transport or fluid flow problem without sacrificing the integrity of the boundary (geometry), or internal boundaries (regions) of the domain. The objective is to obtain a mesh which has both a uniform interpolation error in any direction and the element shapes are of good quality. This is accomplished with use of a non-Euclidean (anisotropic) metric which is related to the Hessian of the solution field. Appropriate scaling of the metric enables the resolution of multi-scale phenomena as encountered in transient incompressible fluids and multigroup transport calculations. The resulting metric is used to calculate element size and shape quality. The mesh optimisation method is based on a series of mesh connectivity and node position searches of the landscape defining mesh quality which is gauged by a functional. The mesh modification thus fits the solution field(s) in an optimal manner. The parallel mesh optimisation/adaptivity procedure presented in this paper is of general applicability. We illustrate this by applying it to a transient CFD (computational fluid dynamics) problem. Incompressible flow past a cylinder at moderate Reynolds numbers is modelled to demonstrate that the mesh can follow transient flow features. (authors)
A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner for scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated
3D interconnect architecture for high-bandwidth massively paralleled imager
The proton radiography group at LANL is developing a fast (5x106 frames/s or 5 megaframe/s) multi-frame imager for use in dynamic radiographic experiments with high-energy protons. The mega-pixel imager will acquire and process a burst of 32 frames captured at inter-frame time ∼200 ns. Real time signal processing and storage requirements for entire frames, of rapidly acquired pixels impose severe demands on the space available for the electronics in a standard monolithic approach. As such, a 3D arrangement of detector and circuit elements is under development. In this scheme, the readout integrated circuits (ROICs) are stacked vertically (like playing cards) into a cube configuration. Another die, a fully depleted pixel photo-diode focal plane array (FPA), is bump bonded to one of the edge surfaces formed by the resulting ROIC cube. Recently, an assembly of the proof-of-principle test cube and sensor has been completed
3D interconnect architecture for high-bandwidth massively paralleled imager
Kwiatkowski, K. E-mail: krisk@lanl.gov; Lyke, J.C.; Wojnarowski, R.J.; Beche, J.-F.; Fillion, R.; Kapusta, C.; Millaud, J.; Saia, R.; Wilke, M.D
2003-08-21
The proton radiography group at LANL is developing a fast (5x10{sup 6} frames/s or 5 megaframe/s) multi-frame imager for use in dynamic radiographic experiments with high-energy protons. The mega-pixel imager will acquire and process a burst of 32 frames captured at inter-frame time {approx}200 ns. Real time signal processing and storage requirements for entire frames, of rapidly acquired pixels impose severe demands on the space available for the electronics in a standard monolithic approach. As such, a 3D arrangement of detector and circuit elements is under development. In this scheme, the readout integrated circuits (ROICs) are stacked vertically (like playing cards) into a cube configuration. Another die, a fully depleted pixel photo-diode focal plane array (FPA), is bump bonded to one of the edge surfaces formed by the resulting ROIC cube. Recently, an assembly of the proof-of-principle test cube and sensor has been completed.
About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems
Loredana MOCEAN; Monica CEACA
2009-01-01
In the last years, there were made efforts for delineation of a stabile and unitary frame, where the problems of logical parallel processing must find solutions at least at the level of imperative languages. The results obtained by now are not at the level of the made efforts. This paper wants to be a little contribution at these efforts. We propose an overview in parallel programming, parallel execution and collaborative systems.
About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems
Loredana MOCEAN
2009-01-01
Full Text Available In the last years, there were made efforts for delineation of a stabile and unitary frame, where the problems of logical parallel processing must find solutions at least at the level of imperative languages. The results obtained by now are not at the level of the made efforts. This paper wants to be a little contribution at these efforts. We propose an overview in parallel programming, parallel execution and collaborative systems.
Awatsuji, Yasuhiro; Xia, Peng; Wang, Yexin; Matoba, Osamu
2016-03-01
Digital holography is a technique of 3D measurement of object. The technique uses an image sensor to record the interference fringe image containing the complex amplitude of object, and numerically reconstructs the complex amplitude by computer. Parallel phase-shifting digital holography is capable of accurate 3D measurement of dynamic object. This is because this technique can reconstruct the complex amplitude of object, on which the undesired images are not superimposed, form a single hologram. The undesired images are the non-diffraction wave and the conjugate image which are associated with holography. In parallel phase-shifting digital holography, a hologram, whose phase of the reference wave is spatially and periodically shifted every other pixel, is recorded to obtain complex amplitude of object by single-shot exposure. The recorded hologram is decomposed into multiple holograms required for phase-shifting digital holography. The complex amplitude of the object is free from the undesired images is reconstructed from the multiple holograms. To validate parallel phase-shifting digital holography, a high-speed parallel phase-shifting digital holography system was constructed. The system consists of a Mach-Zehnder interferometer, a continuous-wave laser, and a high-speed polarization imaging camera. Phase motion picture of dynamic air flow sprayed from a nozzle was recorded at 180,000 frames per second (FPS) have been recorded by the system. Also phase motion picture of dynamic air induced by discharge between two electrodes has been recorded at 1,000,000 FPS, when high voltage was applied between the electrodes.
Structured Parallel Programming Patterns for Efficient Computation
McCool, Michael; Robison, Arch
2012-01-01
Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th
The ParaScope parallel programming environment
Cooper, Keith D.; Hall, Mary W.; Hood, Robert T.; Kennedy, Ken; Mckinley, Kathryn S.; Mellor-Crummey, John M.; Torczon, Linda; Warren, Scott K.
1993-01-01
The ParaScope parallel programming environment, developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope's compilation system, its parallel program editor, and its parallel debugging system. The compilation system extends the traditional single-procedure compiler by providing a mechanism for managing the compilation of complete programs. Thus, ParaScope can support both traditional single-procedure optimization and optimization across procedure boundaries. The ParaScope editor brings both compiler analysis and user expertise to bear on program parallelization. It assists the knowledgeable user by displaying and managing analysis and by providing a variety of interactive program transformations that are effective in exposing parallelism. The debugging system detects and reports timing-dependent errors, called data races, in execution of parallel programs. The system combines static analysis, program instrumentation, and run-time reporting to provide a mechanical system for isolating errors in parallel program executions. Finally, we describe a new project to extend ParaScope to support programming in FORTRAN D, a machine-independent parallel programming language intended for use with both distributed-memory and shared-memory parallel computers.
A Case Study of a Hybrid Parallel 3D Surface Rendering Graphics Architecture
Holten-Lund, Hans Erik; Madsen, Jan; Pedersen, Steen
1997-01-01
This paper presents a case study in the design strategy used inbuilding a graphics computer, for drawing very complex 3Dgeometric surfaces. The goal is to build a PC based computer systemcapable of handling surfaces built from about 2 million triangles, andto be able to render a perspective view...... of these on a computer displayat interactive frame rates, i.e. processing around 50 milliontriangles per second. The paper presents a hardware/softwarearchitecture called HPGA (Hybrid Parallel Graphics Architecture) whichis likely to be able to carry out this task. The case study focuses ontechniques to increase...... the clock frequency as well as the parallelismof the system. This paper focuses on the back-end graphics pipeline,which is responsible for rasterizing triangles.%with a practically linear increase in performance. A pure software implementation of the proposed architecture iscurrently able to process 300...
Implementation of a 3D plasma particle-in-cell code on a MIMD parallel computer
A three-dimensional plasma particle-in-cell (PIC) code has been implemented on the Intel Delta MIMD parallel supercomputer using the General Concurrent PIC algorithm. The GCPIC algorithm uses a domain decomposition to divide the computation among the processors: A processor is assigned a subdomain and all the particles in it. Particles must be exchanged between processors as they move. Results are presented comparing the efficiency for 1-, 2- and 3-dimensional partitions of the three dimensional domain. This algorithm has been found to be very efficient even when a large fraction (e.g. 30%) of the particles must be exchanged at every time step. On the 512-node Intel Delta, up to 125 million particles have been pushed with an electrostatic push time of under 500 nsec/particle/time step
Parallel CFD simulation of flow in a 3D model of vibrating human vocal folds
Šidlof, Petr; Horáček, Jaromír; Řidký, V.
2013-01-01
Roč. 80, č. 1 (2013), s. 290-300. ISSN 0045-7930 R&D Projects: GA ČR(CZ) GAP101/11/0207 Institutional research plan: CEZ:AV0Z20760514 Keywords : numerical simulation * vocal folds * glottal airflow * inite volume method * parallel CFD Subject RIV: BI - Acoustics Impact factor: 1.532, year: 2013 http://www.sciencedirect.com/science?_ob=ArticleListURL&_method=list&_ArticleListID=-268060849&_sort=r&_st=13&view=c&_acct=C000034318&_version=1&_urlVersion=0&_userid=640952&md5=7c5b5539857ee9a02af5e690585b3126&searchtype=a
Hybrid Characteristics: 3D radiative transfer for parallel adaptive mesh refinement hydrodynamics
Rijkhorst, E J; Dubey, A; Mellema, G R; Rijkhorst, Erik-Jan; Plewa, Tomasz; Dubey, Anshu; Mellema, Garrelt
2005-01-01
We have developed a three-dimensional radiative transfer method designed specifically for use with parallel adaptive mesh refinement hydrodynamics codes. This new algorithm, which we call hybrid characteristics, introduces a novel form of ray tracing that can neither be classified as long, nor as short characteristics, but which applies the underlying principles, i.e. efficient execution through interpolation and parallelizability, of both. Primary applications of the hybrid characteristics method are radiation hydrodynamics problems that take into account the effects of photoionization and heating due to point sources of radiation. The method is implemented in the hydrodynamics package FLASH. The ionization, heating, and cooling processes are modelled using the DORIC ionization package. Upon comparison with the long characteristics method, we find that our method calculates the column density with a similarly high accuracy and produces sharp and well defined shadows. We show the quality of the new algorithm ...
3D Profile Filter Algorithm Based on Parallel Generalized B-spline Approximating Gaussian
REN Zhiying; GAO Chenghui; SHEN Ding
2015-01-01
Currently, the approximation methods of the Gaussian filter by some other spline filters have been developed. However, these methods are only suitable for the study of one-dimensional filtering, when these methods are used for three-dimensional filtering, it is found that a rounding error and quantization error would be passed to the next in every part. In this paper, a new and high-precision implementation approach for Gaussian filter is described, which is suitable for three-dimensional reference filtering. Based on the theory of generalized B-spline function and the variational principle, the transmission characteristics of a digital filter can be changed through the sensitivity of the parameters (t1, t2), and which can also reduce the rounding error and quantization error by the filter in a parallel form instead of the cascade form. Finally, the approximation filter of Gaussian filter is obtained. In order to verify the feasibility of the new algorithm, the reference extraction of the conventional methods are also used and compared. The experiments are conducted on the measured optical surface, and the results show that the total calculation by the new algorithm only requires 0.07 s for 480´480 data points;the amplitude deviation between the reference of the parallel form filter and the Gaussian filter is smaller;the new method is closer to the characteristic of the Gaussian filter through the analysis of three-dimensional roughness parameters, comparing with the cascade generalized B-spline approximating Gaussian. So the new algorithm is also efficient and accurate for the implementation of Gaussian filter in the application of surface roughness measurement.
A parallel block multi-level preconditioner for the 3D incompressible Navier-Stokes equations
The development of robust and efficient algorithms for both steady-state simulations and fully implicit time integration of the Navier-Stokes equations is an active research topic. To be effective, the linear subproblems generated by these methods require solution techniques that exhibit robust and rapid convergence. In particular, they should be insensitive to parameters in the problem such as mesh size, time step, and Reynolds number. In this context, we explore a parallel preconditioner based on a block factorization of the coefficient matrix generated in an Oseen nonlinear iteration for the primitive variable formulation of the system. The key to this preconditioner is the approximation of a certain Schur complement operator by a technique first proposed by Kay, Loghin, and Wathen [SIAM J. Sci. Comput., 2002] and Silvester, Elman, Kay, and Wathen [J. Comput. Appl. Math. 128 (2001) 261]. The resulting operator entails subsidiary computations (solutions of pressure Poisson and convection-diffusion subproblems) that are similar to those required for decoupled solution methods; however, in this case these solutions are applied as preconditioners to the coupled Oseen system. One important aspect of this approach is that the convection-diffusion and Poisson subproblems are significantly easier to solve than the entire coupled system, and a solver can be built using tools developed for the subproblems. In this paper, we apply smoothed aggregation algebraic multigrid to both subproblems. Previous work has focused on demonstrating the optimality of these preconditioners with respect to mesh size on serial, two-dimensional, steady-state computations employing geometric multi-grid methods; we focus on extending these methods to large-scale, parallel, three-dimensional, transient and steady-state simulations employing algebraic multigrid (AMG) methods. Our results display nearly optimal convergence rates for steady-state solutions as well as for transient solutions over a
Parallel Programming in the Age of Ubiquitous Parallelism
Pingali, Keshav
2014-04-01
Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. During this time, our community has explored many promising approaches including functional and dataflow languages, logic programming, and automatic parallelization using program analysis and restructuring, but none of these approaches has succeeded except in a few niche application areas. In this talk, I will argue that these problems arise largely from the computation-centric foundations and abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation in which algorithms are described in terms of actions on data. The operator formulation shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation/refinement/partitioning and SAT solvers. Regular algorithms emerge as a special case of irregular ones, and many application-specific optimization techniques can be generalized to a broader context. The operator formulation also leads to a structural analysis of algorithms called TAO-analysis that provides implementation guidelines for exploiting parallelism efficiently. Finally, I will describe a system called Galois based on these ideas for exploiting amorphous data-parallelism on multicores and GPUs
Synergia: A hybrid, parallel beam dynamics code with 3D space charge
James F. Amundson; Panagiotis Spentzouris
2003-07-09
We describe Synergia, a hybrid code developed under the DOE SciDAC-supported Accelerator Simulation Program. The code combines and extends the existing accelerator modeling packages IMPACT and beamline/mxyzptlk. We discuss the design and implementation of Synergia, its performance on different architectures, and its potential applications.
A 3D MPI-Parallel GPU-accelerated framework for simulating ocean wave energy converters
Pathak, Ashish; Raessi, Mehdi
2015-11-01
We present an MPI-parallel GPU-accelerated computational framework for studying the interaction between ocean waves and wave energy converters (WECs). The computational framework captures the viscous effects, nonlinear fluid-structure interaction (FSI), and breaking of waves around the structure, which cannot be captured in many potential flow solvers commonly used for WEC simulations. The full Navier-Stokes equations are solved using the two-step projection method, which is accelerated by porting the pressure Poisson equation to GPUs. The FSI is captured using the numerically stable fictitious domain method. A novel three-phase interface reconstruction algorithm is used to resolve three phases in a VOF-PLIC context. A consistent mass and momentum transport approach enables simulations at high density ratios. The accuracy of the overall framework is demonstrated via an array of test cases. Numerical simulations of the interaction between ocean waves and WECs are presented. Funding from the National Science Foundation CBET-1236462 grant is gratefully acknowledged.
Borazjani, Iman; Ge, Liang; Le, Trung; Sotiropoulos, Fotis
2013-04-01
We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to simulate a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. Complex bodies undergoing arbitrarily large deformations may be embedded within the overset-curvilinear background grid and treated as sharp interfaces using the curvilinear immersed boundary (CURVIB) method (Ge and Sotiropoulos, Journal of Computational Physics, 2007). The incompressible flow equations are formulated in a general non-inertial frame of reference to enhance the overall versatility and efficiency of the numerical approach. Efficient search algorithms to identify areas requiring blanking, donor cells, and interpolation coefficients for constructing the boundary conditions at grid interfaces of the overset grid are developed and implemented using efficient parallel computing communication strategies to transfer information among sub-domains. The governing equations are discretized using a second-order accurate finite-volume approach and integrated in time via an efficient fractional-step method. Various strategies for ensuring globally conservative interpolation at grid interfaces suitable for incompressible flow fractional step methods are implemented and evaluated. The method is verified and validated against experimental data, and its capabilities are demonstrated by simulating the flow past multiple aquatic swimmers and the systolic flow in an anatomic left ventricle with a mechanical heart valve implanted in the aortic position. PMID:23833331
A Parallel Programming Model with Sequential Semantics
Thornley, John
1996-01-01
Parallel programming is more difficult than sequential programming in part because of the complexity of reasoning, testing, and debugging in the context of concurrency. In this thesis, we present and investigate a parallel programming model that provides direct control of parallelism in a notation with sequential semantics. Our model consists of a standard sequential imperative programming notation extended with the following three pragmas: (1) The parallelizable sequence of statements pragma...
Adobe Flash 11 Stage3D (Molehill) Game Programming Beginner's Guide
Kaitila, Christer
2011-01-01
Written in an informal and friendly manner, the style and approach of this book will take you on an exciting adventure. Piece by piece, detailed examples help you along the way by providing real-world game code required to make a complete 3D video game. Each chapter builds upon the experience and achievements earned in the last, culminating in the ultimate prize - your game! If you ever wanted to make your own 3D game in Flash, then this book is for you. This book is a perfect introduction to 3D game programming in Adobe Molehill for complete beginners. You do not need to know anything about S
Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María
2014-06-01
We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for a coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with biconjugate gradient stabilized method. The results have shown that, the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, when it comes to cases in which other preconditioners succeed to converge to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code-up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the
Non-Iterative Rigid 2D/3D Point-Set Registration Using Semidefinite Programming
Khoo, Yuehaw; Kapoor, Ankur
2016-07-01
We describe a convex programming framework for pose estimation in 2D/3D point-set registration with unknown point correspondences. We give two mixed-integer nonlinear program (MINP) formulations of the 2D/3D registration problem when there are multiple 2D images, and propose convex relaxations for both of the MINPs to semidefinite programs (SDP) that can be solved efficiently by interior point methods. Our approach to the 2D/3D registration problem is non-iterative in nature as we jointly solve for pose and correspondence. Furthermore, these convex programs can readily incorporate feature descriptors of points to enhance registration results. We prove that the convex programs exactly recover the solution to the original nonconvex 2D/3D registration problem under noiseless condition. We apply these formulations to the registration of 3D models of coronary vessels to their 2D projections obtained from multiple intra-operative fluoroscopic images. For this application, we experimentally corroborate the exact recovery property in the absence of noise and further demonstrate robustness of the convex programs in the presence of noise.
Productive Parallel Programming: The PCN Approach
Ian Foster; Robert Olson; Steven Tuecke
1992-01-01
We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel compute...
Towards Parallel Programming Models for Predictability
Lisper, Björn
2012-01-01
Future embedded systems for performance-demanding applications will be massively parallel. High performance tasks will be parallel programs, running on several cores, rather than single threads running on single cores. For hard real-time applications, WCETs for such tasks must be bounded. Low-level parallel programming models, based on concurrent threads, are notoriously hard to use due to their inherent nondeterminism. Therefore the parallel processing community has long considered high-l...
PDDP, A Data Parallel Programming Model
Karen H. Warren
1996-01-01
PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements high-performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared...
Gainullin, I. K.; Sonkin, M. A.
2015-03-01
A parallelized three-dimensional (3D) time-dependent Schrodinger equation (TDSE) solver for one-electron systems is presented in this paper. The TDSE Solver is based on the finite-difference method (FDM) in Cartesian coordinates and uses a simple and explicit leap-frog numerical scheme. The simplicity of the numerical method provides very efficient parallelization and high performance of calculations using Graphics Processing Units (GPUs). For example, calculation of 106 time-steps on the 1000ṡ1000ṡ1000 numerical grid (109 points) takes only 16 hours on 16 Tesla M2090 GPUs. The TDSE Solver demonstrates scalability (parallel efficiency) close to 100% with some limitations on the problem size. The TDSE Solver is validated by calculation of energy eigenstates of the hydrogen atom (13.55 eV) and affinity level of H- ion (0.75 eV). The comparison with other TDSE solvers shows that a GPU-based TDSE Solver is 3 times faster for the problems of the same size and with the same cost of computational resources. The usage of a non-regular Cartesian grid or problem-specific non-Cartesian coordinates increases this benefit up to 10 times. The TDSE Solver was applied to the calculation of the resonant charge transfer (RCT) in nanosystems, including several related physical problems, such as electron capture during H+-H0 collision and electron tunneling between H- ion and thin metallic island film.
The analysis of visual parallel programming languages
Vladimir Averbukh; Mikhail Bakhterev
2013-01-01
The paper is devoted to the analysis of state of the art in visual parallel programming languages. The brief history of this domain is described. The diagrammatic imagery of visual languages is analyzed. Limitations of the diagrammatic approach are revealed. The additional type of visual parallel programming languages (action language) is described. Some problems of perception of visualization for parallel computing are considered. Some approaches to the evaluation of visual programming langu...
iPhone 3D Programming Developing Graphical Applications with OpenGL ES
Rideout, Philip
2010-01-01
What does it take to build an iPhone app with stunning 3D graphics? This book will show you how to apply OpenGL graphics programming techniques to any device running the iPhone OS -- including the iPad and iPod Touch -- with no iPhone development or 3D graphics experience required. iPhone 3D Programming provides clear step-by-step instructions, as well as lots of practical advice, for using the iPhone SDK and OpenGL. You'll build several graphics programs -- progressing from simple to more complex examples -- that focus on lighting, textures, blending, augmented reality, optimization for pe
Ma, Yingliang; Saetzler, Kurt
2008-01-01
In this paper we describe a novel 3D subdivision strategy to extract the surface of binary image data. This iterative approach generates a series of surface meshes that capture different levels of detail of the underlying structure. At the highest level of detail, the resulting surface mesh generated by our approach uses only about 10% of the triangles in comparison to the marching cube algorithm (MC) even in settings were almost no image noise is present. Our approach also eliminates the so-called "staircase effect" which voxel based algorithms like the MC are likely to show, particularly if non-uniformly sampled images are processed. Finally, we show how the presented algorithm can be parallelized by subdividing 3D image space into rectilinear blocks of subimages. As the algorithm scales very well with an increasing number of processors in a multi-threaded setting, this approach is suited to process large image data sets of several gigabytes. Although the presented work is still computationally more expensive than simple voxel-based algorithms, it produces fewer surface triangles while capturing the same level of detail, is more robust towards image noise and eliminates the above-mentioned "staircase" effect in anisotropic settings. These properties make it particularly useful for biomedical applications, where these conditions are often encountered. PMID:17993710
PDDP, A Data Parallel Programming Model
Karen H. Warren
1996-01-01
Full Text Available PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements high-performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.
HEXBU-3D, a three-dimensional PWR-simulator program for hexagonal fuel assemblies
HEXBU-3D is a three-dimensional nodal simulator program for PWR reactors. It is designed for a reactor core that consists of hexagonal fuel assemblies and of big follower-type control assemblies. The program solves two-group diffusion equations in homogenized fuel assembly geometry by a sophisticated nodal method. The treatment of feedback effects from xenon-poisoning, fuel temperature, moderator temperature and density and soluble boron concentration are included in the program. The nodal equations are solved by a fast two-level iteration technique and the eigenvalue can be either the effective multiplication factor or the boron concentration of the moderator. Burnup calculations are performed by tabulated sets of burnup-dependent cross sections evaluated by a cell burnup program. HEXBY-3D has been originally programmed in FORTRAN V for the UNIVAC 1108 computer, but there is also another version which is operable on the CDC CYBER 170 computer. (author)
Parallel Programming Archetypes in Combinatorics and Optimization
Kryukova, Svetlana A
1995-01-01
A Parallel Programming Archetype is a language-independent program design strategy. We describe two archetypes in combinatorics and optimization, their components, implementations, and example applications developed using an archetype
Generation of Distributed Parallel Java Programs
Launay, Pascale; Pazat, Jean-Louis
1998-01-01
The aim of the Do! project is to ease the standard task of programming distributed applications using Java. This paper gives an overview of the parallel and distributed frameworks and describes the mechanisms developed to distribute programs with Do!.
Parallel programming with PCN. Revision 1
Foster, I.; Tuecke, S.
1991-12-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. In includes both tutorial and reference material. It also presents the basic concepts that underly PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (c.f. Appendix A).
Parallel programming characteristics of a DSP-based parallel system
GAO Shu; GUO Qing-ping
2006-01-01
This paper firstly introduces the structure and working principle of DSP-based parallel system, parallel accelerating board and SHARC DSP chip. Then it pays attention to investigating the system's programming characteristics, especially the mode of communication, discussing how to design parallel algorithms and presenting a domain-decomposition-based complete multi-grid parallel algorithm with virtual boundary forecast (VBF) to solve a lot of large-scale and complicated heat problems. In the end, Mandelbrot Set and a non-linear heat transfer equation of ceramic/metal composite material are taken as examples to illustrate the implementation of the proposed algorithm. The results showed that the solutions are highly efficient and have linear speedup.
Parallel programming of industrial applications
Heroux, M; Koniges, A; Simon, H
1998-07-21
In the introductory material, we overview the typical MPP environment for real application computing and the special tools available such as parallel debuggers and performance analyzers. Next, we draw from a series of real applications codes and discuss the specific challenges and problems that are encountered in parallelizing these individual applications. The application areas drawn from include biomedical sciences, materials processing and design, plasma and fluid dynamics, and others. We show how it was possible to get a particular application to run efficiently and what steps were necessary. Finally we end with a summary of the lessons learned from these applications and predictions for the future of industrial parallel computing. This tutorial is based on material from a forthcoming book entitled: "Industrial Strength Parallel Computing" to be published by Morgan Kaufmann Publishers (ISBN l-55860-54).
Massively Parallel Finite Element Programming
Heister, Timo
2010-01-01
Today\\'s large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
On the Dynamic Programming Approach for the 3D Navier-Stokes Equations
The dynamic programming approach for the control of a 3D flow governed by the stochastic Navier-Stokes equations for incompressible fluid in a bounded domain is studied. By a compactness argument, existence of solutions for the associated Hamilton-Jacobi-Bellman equation is proved. Finally, existence of an optimal control through the feedback formula and of an optimal state is discussed
Towards Distributed Memory Parallel Program Analysis
Quinlan, D; Barany, G; Panas, T
2008-06-17
This paper presents a parallel attribute evaluation for distributed memory parallel computer architectures where previously only shared memory parallel support for this technique has been developed. Attribute evaluation is a part of how attribute grammars are used for program analysis within modern compilers. Within this work, we have extended ROSE, a open compiler infrastructure, with a distributed memory parallel attribute evaluation mechanism to support user defined global program analysis required for some forms of security analysis which can not be addressed by a file by file view of large scale applications. As a result, user defined security analyses may now run in parallel without the user having to specify the way data is communicated between processors. The automation of communication enables an extensible open-source parallel program analysis infrastructure.
Development and application of 3D core fuel management program for HFETR
The author introduces the principle and function of the 3D core fuel management program HFM for the High Flux Engineering Test Reactor (HFETR). Calculations are performed for five reactor core on HFETR critical assembly and first three cycles of HFETR. Results show the adopted cell and core calculation model and method are correct. Good consistency is obtained in calculational results with experimental values. Therefore, HFM program could be used in core fuel management of HFETR rapidly and exactly
Productive Parallel Programming: The PCN Approach
Ian Foster
1992-01-01
Full Text Available We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.
A survey of parallel programming tools
Cheng, Doreen Y.
1991-01-01
This survey examines 39 parallel programming tools. Focus is placed on those tool capabilites needed for parallel scientific programming rather than for general computer science. The tools are classified with current and future needs of Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions, tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.
2D/3D Program work summary report, [January 1988--December 1992
The 2D/3D Program was carried out by Germany, Japan and the United States to investigate the thermal-hydraulics of a PWR large-break LOCA. A contributory approach was utilized in which each country contributed significant effort to the program and all three countries shared the research results. Germany constructed and operated the Upper Plenum Test Facility (UPTF), and Japan constructed and operated the Cylindrical Core Test Facility (CCTF) and the Slab Core Test Facility (SCTF). The US contribution consisted of provision of advanced instrumentation to each of the three test facilities, and assessment of the TRAC computer code against the test results. Evaluations of the test results were carried out in all three countries. This report summarizes the 2D/3D Program in terms of the contributing efforts of the participants
Analysis results from the Los Alamos 2D/3D program
Los Alamos National Laboratory is a participant in the 2D/3D program. Activities conducted at Los Alamos National Laboratory in support of 2D/3D program goals include analysis support of facility design, construction, and operation; provision of boundary and initial conditions for test-facility operations based on analysis of pressurized water reactors; performance of pretest and posttest predictions and analyses; and use of experimental results to validate and assess the single- and multi-dimensional, nonequilibrium features in the Transient Reactor Analysis Code (TRAC). During fiscal year 1987, Los Alamos conducted analytical assessment activities using data from the Slab Core Test Facility, The Cylindrical Core Test Facility, and the Upper Plenum Test Facility. Finally, Los Alamos continued work to provide TRAC improvements. In this paper, Los Alamos activities during fiscal year 1987 will be summarized; several significant accomplishments will be described in more detail to illustrate the work activities at Los Alamos
Analysis results from the Los Alamos 2D/3D program
Los Alamos National Laboratory is a participant in the 2D/3D program. Activities conducted at Los Alamos National Laboratory in support of 2D/3D program goals include analysis support of facility design, construction, and operation; provision of boundary and initial conditions for test-facility operations based on analysis of pressurized water reactors; performance of pretest and post-test predictions and analyses; and use of experimental results to validate and assess the single- and multi-dimensional, nonequilibrium features in the Transient Reactor Analysis Code (TRAC). During fiscal year 1987, Los Alamos conducted analytical assessment activities using data from the Slab Core Test Facility, the Cylindrical Core Test Facility, and the Upper Plenum Test Facility. Finally, Los Alamos continued work to provide TRAC improvements. In this paper, Los Alamos activities during fiscal year 1987 are summarized; several significant accomplishments are described in more detail to illustrate the work activities at Los Alamos
Advanced 3D Audio Algorithms by a Flexible, Low Level Application Programming Interface
Simeonov, A; Zoia, G.; Lluis Garcia, R.; Mlynek, D.
2004-01-01
The constantly increasing demand for a better quality in sound and video for multimedia content and virtual reality compels the implementation of more and more sophisticated 3D audio models in authoring and playback tools. A very careful and systematic analysis of the best available development libraries in this area was carried out, considering different Application Programming Interfaces, their features, extensibility, and portability among each other. The results show that it is often diff...
Integrated Task and Data Parallel Programming
Grimshaw, A. S.
1998-01-01
This research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities During the fall I collaborated
The kpx, a program analyzer for parallelization
The kpx is a program analyzer, developed as a common technological basis for promoting parallel processing. The kpx consists of three tools. The first is ktool, that shows how much execution time is spent in program segments. The second is ptool, that shows parallelization overhead on the Paragon system. The last is xtool, that shows parallelization overhead on the VPP system. The kpx, designed to work for any FORTRAN cord on any UNIX computer, is confirmed to work well after testing on Paragon, SP2, SR2201, VPP500, VPP300, Monte-4, SX-4 and T90. (author)
Speedup predictions on large scientific parallel programs
How much speedup can we expect for large scientific parallel programs running on supercomputers. For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N greater than or equal to 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory
Koldan, Jelena
2013-01-01
The growing significance, technical development and employment of electromagnetic (EM) methods in exploration geophysics have led to the increasing need for reliable and fast techniques of interpretation of 3-D EM data sets acquired in complex geological environments. The first and most important step to creating an inversion method is the development of a solver for the forward problem. In order to create an efficient, reliable and practical 3-D EM inversion, it is necessary to have a 3-D EM...
Steam generator experiment for 3-D computer code qualification - CLOTAIRE international program
The current 1988/89 test program does focus on the production of accurate data sets dedicated to the qualifications of both 3-D thermalhydraulic codes and flow induced vibration predictive tools. In order to meet these challenging objectives the test program includes: detailed measurements of two-phase flow distributions relying on advanced optical probe techniques, throughout the bundle straight part; investigations at the same time of flow distributions and of the tubes' vibratory responses, in the U-band region; for a limited number of preselected positions, measurements of the emulsion's changing characteristics during transient sequences similar to those in an actual plant. (orig./DG)
The PISCES 2 parallel programming environment
Pratt, Terrence W.
1987-01-01
PISCES 2 is a programming environment for scientific and engineering computations on MIMD parallel computers. It is currently implemented on a flexible FLEX/32 at NASA Langley, a 20 processor machine with both shared and local memories. The environment provides an extended Fortran for applications programming, a configuration environment for setting up a run on the parallel machine, and a run-time environment for monitoring and controlling program execution. This paper describes the overall design of the system and its implementation on the FLEX/32. Emphasis is placed on several novel aspects of the design: the use of a carefully defined virtual machine, programmer control of the mapping of virtual machine to actual hardware, forces for medium-granularity parallelism, and windows for parallel distribution of data. Some preliminary measurements of storage use are included.
Portable parallel programming in a Fortran environment
Experience using the Argonne-developed PARMACs macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a ''nearly realistic'' lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs
Parallel programming with PCN. Revision 2
Foster, I.; Tuecke, S.
1993-01-01
PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and Cthat allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs. ani.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.
Wireless Rover Meets 3D Design and Product Development
Deal, Walter F., III; Hsiung, Steve C.
2016-01-01
Today there are a number of 3D printing technologies that are low cost and within the budgets of middle and high school programs. Educational technology companies offer a variety of 3D printing technologies and parallel curriculum materials to enable technology and engineering teachers to easily add 3D learning activities to their programs.…
A Fortran program (RELAX3D) to solve the 3 dimensional Poisson (Laplace) equation
RELAX3D is an efficient, user friendly, interactive FORTRAN program which solves the Poisson (Laplace) equation Λ2=p for a general 3 dimensional geometry consisting of Dirichlet and Neumann boundaries approximated to lie on a regular 3 dimensional mesh. The finite difference equations at these nodes are solved using a successive point-iterative over-relaxation method. A menu of commands, supplemented by HELP facility, controls the dynamic loading of the subroutine describing the problem case, the iterations to converge to a solution, and the contour plotting of any desired slices, etc
Reactor safety issues resolved by the 2D/3D program
The 2D/3D Program studied multidimensional thermal-hydraulics in a PWR core and primary system during the end-of-blowdown and post-blowdown phases of a large-break LOCA (LBLOCA), and during selected small-break LOCA (SBLOCA) transients. The program included tests at the Cylindrical Core Test Facility (CCTF), the Slab Core Test Facility (SCTF), and the Upper Plenum Test Facility (UPTF), and computer analyses using TRAC. Tests at CCTF investigated core thermal-hydraulics and overall system behavior while tests at SCTF concentrated on multidimensional core thermal-hydraulics. The UPTF tests investigated two-phase flow behavior in the downcomer, upper plenum, tie plate region, and primary loops. TRAC analyses evaluated thermal-hydraulic behavior throughout the primary system in tests as well as in PWRs. This report summarizes the test and analysis results in each of the main areas where improved information was obtained in the 2D/3D Program. The discussion is organized in terms of the reactor safety issues investigated. This report was prepared in a coordination among US, Germany and Japan. US and Germany have published the report as NUREG/IA-0127 and GRS-101 respectively. (author)
Reactor safety issues resolved by the 2D/3D Program
The 2D/3D Program studied multidimensional thermal-hydraulics in a PWR core and primary system during the end-of-blowdown and post-blowdown phases of a large-break LOCA (LBLOCA), and during selected small-break LOCA (SBLOCA) transients. The program included tests at the Cylindrical Core Test Facility (CCTF), the Slab Core Test Facility (SCTF), and the Upper Plenum Test Facility (UPTF), and computer analyses using TRAC. Tests at CCTF investigated core thermal-hydraulics and overall system behavior while tests at SCTF concentrated on multidimensional core thermal-hydraulics. The UPTF tests investigated two-phase flow behavior in the downcomer, upper plenum, tie plate region, and primary loops. TRAC analyses evaluated thermal-hydraulic behavior throughout the primary system in tests as well as in PWRs. This report summarizes the test and analysis results in each of the main areas where improved information was obtained in the 2D/3D Program. The discussion is organized in terms of the reactor safety issues investigated
Ingram-Goble, Adam
This is an exploratory design study of a novel system for learning programming and 3D role-playing game design as tools for social change. This study was conducted at two sites. Participants in the study were ages 9-14 and worked for up to 15 hours with the platform to learn how to program and design video games with personally or socially relevant narratives. This first study was successful in that students learned to program a narrative game, and they viewed the social problem framing for the practices as an interesting aspect of the experience. The second study provided illustrative examples of how providing less general structure up-front, afforded players the opportunity to produce the necessary structures as needed for their particular design, and therefore had a richer understanding of what those structures represented. This study demonstrates that not only were participants able to use computational thinking skills such as Boolean and conditional logic, planning, modeling, abstraction, and encapsulation, they were able to bridge these skills to social domains they cared about. In particular, participants created stories about socially relevant topics without to explicit pushes by the instructors. The findings also suggest that the rapid uptake, and successful creation of personally and socially relevant narratives may have been facilitated by close alignment between the conceptual tools represented in the platform, and the domain of 3D role-playing games.
Development of a 3D multigroup program for Dancoff factor calculation in pebble bed reactors
Highlights: • Development of a 3D Monte Carlo based code for pebble bed reactors. • Dancoff sensitivity to clad, moderator and fuel cross sections is considered. • Sensitivity of Dancoff to number of energy groups is considered. • Sensitivity of Dancoff to number of fuel and their arrangement is considered. • Excellent agreements vs. MCNP code. - Abstract: The evaluation of multigroup constants in reactor calculations depends on several parameters. One of these parameters is the Dancoff factor which is used for calculating the resonance integral and flux depression in the resonance region in heterogeneous systems. In the current paper, a computer program (MCDAN-3D) is developed for calculating three dimensional black and gray Dancoff coefficients, based on Monte Carlo, escape probability and neutron free flight methods. The developed program is capable to calculate the Dancoff factor for an arbitrary arrangement of fuel and moderator pebbles. Moreover this program can simulate fuels with homogeneous and heterogeneous compositions. It might generate the position of Triso particles in fuel pebbles randomly as well. It could calculate the black and gray Dancoff coefficients since fuel region might have different cross sections. Finally, the effects of clad and moderator are considered and the sensitivity of Dancoff factor with fuels arrangement variation, number of TRISO particles and neutron energy has been studied
Detecting drug use in adolescents using a 3D simulation program
Luis Iribarne
2010-11-01
Full Text Available This work presents a new 3D simulation program, called MiiSchool, and its application to the detection of problem behaviours appearing in school settings. We begin by describing some of the main features of the Mii School program. Then, we present the results of a study in which adolescents responded to Mii School simulations involving the consumption of alcoholic drinks, cigarettes, cannabis, cocaine, and MDMA (ecstasy. We established a“risk profile” based on the observed response patterns. We also present results concerning user satisfaction with the program and the extent to which users felt that the simulated scenes were realistic. Lastly, we discuss the usefulness of Mii School as a tool for assessing drug use in school settings.
PLOT 3D: a multipurpose, interactive program for plotting three dimensional graphs
PLOT3D is a general purpose, interactive, three dimensional display and plotter program. Written in Fortran-77, it uses a two dimensional plotter software (PLOT-10) to draw orthographic axonometric projection of three dimensional graph comprising of smooth surface or cubical histogram, in any desired orientation, magnification and window, employing throughout highly accurate hidden lines removal techniques. The figure, so generated, can optionally be clipped, smoothened by interpolation, or shaded selectively to distinguish among its different faces. The program accepts data from an external file, or generates them through built-in functions. It can be used for graphical representation of data such as neutron flux, theta(x,y) in form of smooth surface, even if available data are very few in number. It is also capable of drawing histograms of quantities such as fuel power in a reactor lattice. Listing of the program is given. (author)
Optics Program Modified for Multithreaded Parallel Computing
Lou, John; Bedding, Dave; Basinger, Scott
2006-01-01
A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application interface software, that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads [POSIX Thread, (where "POSIX" signifies a portable operating system for UNIX)]. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
Parallelism and programming in classifier systems
Forrest, Stephanie
1990-01-01
Parallelism and Programming in Classifier Systems deals with the computational properties of the underlying parallel machine, including computational completeness, programming and representation techniques, and efficiency of algorithms. In particular, efficient classifier system implementations of symbolic data structures and reasoning procedures are presented and analyzed in detail. The book shows how classifier systems can be used to implement a set of useful operations for the classification of knowledge in semantic networks. A subset of the KL-ONE language was chosen to demonstrate these o
OpenCL parallel programming development cookbook
Tay, Raymond
2013-01-01
OpenCL Parallel Programming Development Cookbook will provide a set of advanced recipes that can be utilized to optimize existing code. This book is therefore ideal for experienced developers with a working knowledge of C/C++ and OpenCL.This book is intended for software developers who have often wondered what to do with that newly bought CPU or GPU they bought other than using it for playing computer games; this book is also for developers who have a working knowledge of C/C++ and who want to learn how to write parallel programs in OpenCL so that life isn't too boring.
The JLAB 3D program at 12 GeV (TMDs + GPDs)
The Jefferson Lab CEBAF accelerator is undergoing an upgrade that will increase the beam energy up to 12 GeV. The three experimental Halls operating in the 6-GeV era are upgrading their detectors to adapt their performances to the new available kinematics, and a new Hall (D) is being built. The investigation of the three-dimensional nucleon structure both in the coordinate and in the momentum space represents an essential part of the 12-GeV physics program, and several proposals aiming at the extraction of related observables have been already approved in Hall A, B and C. In this proceedings, the focus of the JLab 3D program will be described, and a selection of proposals will be discussed.
The JLAB 3D program at 12 GeV (TMDs + GPDs)
Pisano, Silvia [Istituto Nazionale di Fisica Nucleare (INFN), Frascati (Italy)
2015-01-01
The Jefferson Lab CEBAF accelerator is undergoing an upgrade that will increase the beam energy up to 12 GeV. The three experimental Halls operating in the 6-GeV era are upgrading their detectors to adapt their performances to the new available kinematics, and a new Hall (D) is being built. The investigation of the three-dimensional nucleon structure both in the coordinate and in the momentum space represents an essential part of the 12-GeV physics program, and several proposals aiming at the extraction of related observables have been already approved in Hall A, B and C. In this proceedings, the focus of the JLab 3D program will be described, and a selection of proposals will be discussed.
Parallel Programming of General-Purpose Programs Using Task-Based Programming Models
Vandierendonck, Hans; Pratikakis, Polyvios; Nikolopoulos, Dimitrios
2011-01-01
The prevalence of multicore processors is bound to drive most kinds of software development towards parallel programming. To limit the difficulty and overhead of parallel software design and maintenance, it is crucial that parallel programming models allow an easy-to-understand, concise and dense representation of parallelism. Parallel programming models such as Cilk++ and Intel TBBs attempt to offer a better, higher-level abstraction for parallel programming than threads and locking synchron...
Distributed parallel computing using navigational programming
Pan, Lei; Lai, M. K.; Noguchi, K; Huseynov, J J; L. F. Bic; Dillencourt, M B
2004-01-01
Message Passing ( MP) and Distributed Shared Memory (DSM) are the two most common approaches to distributed parallel computing. MP is difficult to use, whereas DSM is not scalable. Performance scalability and ease of programming can be achieved at the same time by using navigational programming (NavP). This approach combines the advantages of MP and DSM, and it balances convenience and flexibility. Similar to MP, NavP suggests to its programmers the principle of pivot-computes and hence is ef...
Parallel GRISYS/Power Challenge System Version 1.0 and 3D Prestack Depth Migration Package
Zhao Zhenwen
1995-01-01
@@ Based on the achievements and experience of seismic data parallel processing made in the past years by Beijing Global Software Corporation (GS) of CNPC, Parallel GRISYS/Power Challenge seismic data processing system version 1.0 has been cooperatively developed and integrated on the Power Challenge computer by GS, SGI (USA) and Shuangyuan Company of Academia Sinica.
Trelease, R B
1996-01-01
Advances in computer visualization and user interface technologies have enabled development of "virtual reality" programs that allow users to perceive and to interact with objects in artificial three-dimensional environments. Such technologies were used to create an image database and program for studying the human skull, a specimen that has become increasingly expensive and scarce. Stereoscopic image pairs of a museum-quality skull were digitized from multiple views. For each view, the stereo pairs were interlaced into a single, field-sequential stereoscopic picture using an image processing program. The resulting interlaced image files are organized in an interactive multimedia program. At run-time, gray-scale 3-D images are displayed on a large-screen computer monitor and observed through liquid-crystal shutter goggles. Users can then control the program and change views with a mouse and cursor to point-and-click on screen-level control words ("buttons"). For each view of the skull, an ID control button can be used to overlay pointers and captions for important structures. Pointing and clicking on "hidden buttons" overlying certain structures triggers digitized audio spoken word descriptions or mini lectures. PMID:8793223
Functions, objects and parallelism programming in Balinda K
Kwong, Yuen Chung
1999-01-01
Despite many years of research and development, parallel programming remains a difficult and specialized task. A simple but general model for parallel processing is still lacking.This book proposes a model that adds parallelism to functions and objects, allowing simple specification of both parallel execution and inter-process communication. Many examples of applying parallel programming are given.
Concurrency-based approaches to parallel programming
Kale, L.V.; Chrisochoides, N.; Kohl, J.; Yelick, K.
1995-01-01
The inevitable transition to parallel programming can be facilitated by appropriate tools, including languages and libraries. After describing the needs of applications developers, this paper presents three specific approaches aimed at development of efficient and reusable parallel software for irregular and dynamic-structured problems. A salient feature of all three approaches in their exploitation of concurrency within a processor. Benefits of individual approaches such as these can be leveraged by an interoperability environment which permits modules written using different approaches to co-exist in single applications.
Control cycloconverter using transputer based parallel programming
Chiu, D.M.; Li, S.J. [Victoria Univ. of Technology, Melbourne, Victoria (Australia). Dept. of Electrical and Electronic Engineering
1995-12-31
The naturally commutated cycloconverter is a valuable tool for speed control in AC machines. Over the years, its control circuits have been designed and implemented using vacuum tubes, transistors, integrated circuits and single-processor. However, the problem of obtaining accurate data on the triggering pulse generation in order for quantitative analysis is still unsolved. Triggering instants of cycloconverter have been precisely controlled using transputer based parallel computing technique. The HELIOS operating system is found to be an efficient tool for the development of the parallel programming. Different topology configurations and various communication mechanisms of HELIOS are employed to determine and select the fastest technique that is suitable for the present work.
Design patterns percolating to parallel programming framework implementation
Aldinucci, M.; Campa, S.; Danelutto, M.; Kilpatrick, P.; Torquati, M.
2014-01-01
Structured parallel programming is recognised as a viable and effective means of tackling parallel programming problems. Recently, a set of simple and powerful parallel building blocks RISC pb2l) has been proposed to support modelling and implementation of parallel frameworks. In this work we demonstrate how that same parallel building block set may be used to model both general purpose parallel programming abstractions, not usually listed in classical skeleton sets, and more specialized doma...
Skeleton based parallel programming: functional and parallel semantics in a single shot
Aldinucci, Marco; Danelutto, Marco
2004-01-01
Different skeleton based parallel programming systems have been developed in past years. The main goal of these programming environments is to provide programmers with handy, effective ways of writing parallel applications. In particular, skeleton based parallel programming environments automatically deal with most of the difficult, cumbersome programming problems that must be usually handled by programmers of parallel applications using traditional programming environments (e.g. environments...
Kaldestad, Knut B.; Haddadin, Sami; Belder, Rico; Hovland, Geir; Anisi, David A.
2014-01-01
In this paper we present an experimental study on real-time collision avoidance with potential ﬁelds that are based on 3D point cloud data and processed on the Graphics Processing Unit (GPU). The virtual forces from the potential ﬁelds serve two purposes. First, they are used for changing the reference trajectory. Second they are projected to and applied on torque control level for generating according nullspace behavior together with a Cartesian impedance main control ...
Programming massively parallel processors a hands-on approach
Kirk, David B
2010-01-01
Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. ""Massively parallel"" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVI- DIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...
PSHED: a simplified approach to developing parallel programs
This paper presents a simplified approach in the forms of a tree structured computational model for parallel application programs. An attempt is made to provide a standard user interface to execute programs on BARC Parallel Processing System (BPPS), a scalable distributed memory multiprocessor. The interface package called PSHED provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs
Automatic Performance Debugging of SPMD Parallel Programs
Liu, Xu; Zhan, Jianfeng; Tu, Bibo; Meng, Dan
2010-01-01
Automatic performance debugging of parallel applications usually involves two steps: automatic detection of performance bottlenecks and uncovering their root causes for performance optimization. Previous work fails to resolve this challenging issue in several ways: first, several previous efforts automate analysis processes, but present the results in a confined way that only identifies performance problems with apriori knowledge; second, several tools take exploratory or confirmatory data analysis to automatically discover relevant performance data relationships. However, these efforts do not focus on locating performance bottlenecks or uncovering their root causes. In this paper, we design and implement an innovative system, AutoAnalyzer, to automatically debug the performance problems of single program multi-data (SPMD) parallel programs. Our system is unique in terms of two dimensions: first, without any apriori knowledge, we automatically locate bottlenecks and uncover their root causes for performance o...
Hybrid MPI+OpenMP parallelization of an FFT-based 3D Poisson solver with one periodic direction
Gorobets, Andrei; Trias Miquel, Francesc Xavier; Borrell Pol, Ricard; Lehmkuhl Barba, Oriol; Oliva Llena, Asensio
2011-01-01
This work is devoted to the development of efficient parallel algorithms for the direct numerical simulation (DNS) of incompressible flows on modern supercomputers. In doing so, a Poisson equation needs to be solved at each time-step to project the velocity field onto a divergence-free space. Due to the non-local nature of its solution, this elliptic system is the part of the algorithm that is most difficult to parallelize. The Poisson solver presented here is restricted to problems with o...
The finite element-based Continuum Damage Mechanics (CDM) software DAMAGE XXX has been developed: to model high-temperature creep damage initiation, evolution and crack growth in 3-D engineering components; and, to run on parallel computer architectures. The driver has been to achieve computational speed through computer parallelism. The development and verification of the software have been carried out using uni-axial crosswelded testpieces in which the plane of symmetry of the V-weld preparation is orthogonal to the tensile loading axis. The welds were manufactured using 0.5Cr-0.5Mo-0.25V ferritic parent steel, and a matching 2.25Cr-1Mo ferritic steel weld filler metal. The Heat Affected Zones (HAZ) of welds were assumed to be divided into three sub-regions: Coarse grained-HAZ (CG-HAZ); Refined grained-HAZ (R-HAZ); and, the inter-critical HAZ regions (Type IV-HAZ). Constitutive equations and associated parameters are summarised for weld, CG-HAZ, R-HAZ, Type IV-HAZ, and parent materials, at 575, 590, and 600 deg. C. These are used to make finite element-based predictions of crossweld testpiece lifetimes and failure modes using the newly developed 3-D parallel computer software, and independent 2-D serial software, at an average minimum cross-section stress of 69.5 MPa. Crossweld testpiece analyses, done using the newly developed 3-D parallel software, have been verified using independent results of 2-D serial software; and, of laboratory experiments.
Some important issues of the computational process in parallel programming
Lyazzat Kh. Zhunussova
2015-01-01
The modern approach to education in parallel programming has enough bright “technological” focus: the main emphasis in presenting educational material is on aspects of parallel computing architectures and practical parallel programming techniques.In other words, the issue of creating parallel software becomes only one aspect of a more general discipline — engineering parallel software application as a set of mathematical models, numerical methods for their implementation, parallel algorithms ...
Thermal-hydraulic system computer codes are extensively used worldwide for analysis of nuclear facilities by utilities, regulatory bodies, nuclear power plant designers and vendors, nuclear fuel companies, research organizations, consulting companies, and technical support organizations. The computer code user represents a source of uncertainty that can influence the results of system code calculations. This influence is commonly known as the 'user effect' and stems from the limitations embedded in the codes as well as from the limited capability of the analysts to use the codes. Code user training and qualification is an effective means for reducing the variation of results caused by the application of the codes by different users. This paper describes a systematic approach to training code users who, upon completion of the training, should be able to perform calculations making the best possible use of the capabilities of best estimate codes. In other words, the program aims at contributing towards solving the problem of user effect. The 3D S.UN.COP (Scaling, Uncertainty and 3D COuPled code calculations) seminars have been organized as follow-up of the proposal to IAEA for the Permanent Training Course for System Code Users (D'Auria, 1998). Four seminars have been held at University of Pisa (2003, 2004), at The Pennsylvania State University (2004) and at University of Zagreb (2005). It was recognized that such courses represented both a source of continuing education for current code users and a mean for current code users to enter the formal training structure of a proposed 'permanent' stepwise approach to user training. The 3D S.UN.COP 2005 was successfully held with the participation of 19 persons coming from 9 countries and 14 different institutions (universities, vendors, national laboratories and regulatory bodies). More than 15 scientists were involved in the organization of the seminar, presenting theoretical aspects of the proposed methodologies and
Thermal-hydraulic system computer codes are extensively used worldwide for analysis of nuclear facilities by utilities, regulatory bodies, nuclear power plant designers and vendors, nuclear fuel companies, research organizations, consulting companies, and technical support organizations. The computer code user represents a source of uncertainty that can influence the results of system code calculations. This influence is commonly known as the 'user effect' and stems from the limitations embedded in the codes as well as from the limited capability of the analysts to use the codes. Code user training and qualification is an effective means for reducing the variation of results caused by the application of the codes by different users. This paper describes a systematic approach to training code users who, upon completion of the training, should be able to perform calculations making the best possible use of the capabilities of best estimate codes. In other words, the program aims at contributing towards solving the problem of user effect. The 3D S.UN.COP (Scaling, Uncertainty and 3D COuPled code calculations) seminars have been organized as follow-up of the proposal to IAEA for the Permanent Training Course for System Code Users [1]. Five seminars have been held at University of Pisa (2003, 2004), at The Pennsylvania State University (2004), at University of Zagreb (2005) and at the School of Industrial Engineering of Barcelona (2006). It was recognized that such courses represented both a source of continuing education for current code users and a mean for current code users to enter the formal training structure of a proposed 'permanent' stepwise approach to user training. The 3D S.UN.COP 2006 was successfully held with the attendance of 33 participants coming from 18 countries and 28 different institutions (universities, vendors, national laboratories and regulatory bodies). More than 30 scientists (coming from 13 countries and 23 different institutions) were
Thermal-hydraulic system computer codes are extensively used worldwide for analysis of nuclear facilities by utilities, regulatory bodies, nuclear power plant designers and vendors, nuclear fuel companies, research organizations, consulting companies, and technical support organizations. The computer code user represents a source of uncertainty that can influence the results of system code calculations. This influence is commonly known as the user effect' and stems from the limitations embedded in the codes as well as from the limited capability of the analysis to use the codes. Code user training and qualification is an effective means for reducing the variation of results caused by the application of the codes by different users. This paper describes a systematic approach to training code users who, upon completion of the training, should be able to perform calculations making the best possible use of the capabilities of best estimate codes. In other words, the program aims at contributing towards solving the problem of user effect. The 3D S.UN.COP (Scaling, Uncertainty and 3D COuPled code calculations) seminars have been organized as follow-up of the proposal to IAEA for the Permanent Training Course for System Code Users. Six seminars have been held at University of Pisa (2003, 2004), at The Pennsylvania State University (2004), at University of Zagreb (2005), at the School of Industrial Engineering of Barcelona (January-February 2006) and in Buenos Aires, Argentina (October 2006), being this last one requested by ARN (Autoridad Regulatoria Nuclear), NA-SA (Nucleoelectrica Argentina S.A) and CNEA (Comision Nacional de Energia Atomica). It was recognized that such courses represented both a source of continuing education for current code users and a mean for current code users to enter the formal training structure of a proposed 'permanent' stepwise approach to user training. The 3D S.UN.COP 2006 in Barcelona was successfully held with the attendance of 33
Thermal-hydraulic system computer codes are extensively used worldwide for analysis of nuclear facilities by utilities, regulatory bodies, nuclear power plant designers and vendors, nuclear fuel companies, research organizations, consulting companies, and technical support organizations. The computer code user represents a source of uncertainty that can influence the results of system code calculations. This influence is commonly known as the 'user effect' and stems from the limitations embedded in the codes as well as from the limited capability of the analysts to use the codes. Code user training and qualification is an effective means for reducing the variation of results caused by the application of the codes by different users. This paper describes a systematic approach to training code users who, upon completion of the training, should be able to perform calculations making the best possible use of the capabilities of best estimate codes. In other words, the program aims at contributing towards solving the problem of user effect. The 3D S.UN.COP 2005 (Scaling, Uncertainty and 3D COuPled code calculations) seminar has been organized by University of Pisa and University of Zagreb as follow-up of the proposal to IAEA for the Permanent Training Course for System Code Users (D'Auria, 1998). It was recognized that such a course represented both a source of continuing education for current code users and a means for current code users to enter the formal training structure of a proposed 'permanent' stepwise approach to user training. The seminar-training was successfully held with the participation of 19 persons coming from 9 countries and 14 different institutions (universities, vendors, national laboratories and regulatory bodies). More than 15 scientists were involved in the organization of the seminar, presenting theoretical aspects of the proposed methodologies and holding the training and the final examination. A certificate (LA Code User grade) was released
Castillo-Reyes, Octavio; de la Puente, Josep; Puzyrev, Vladimir; Cela, José M.
2015-01-01
This paper deals with the most relevant parallel and numerical issues that arise when applying the Edge Element Method in the solution of electromagnetic problems in exploration geophysics. In this sense, in recent years the application of land and marine controlled-source electromagnetic (CSEM) surveys has gained tremendous interest among the offshore exploration community. This method is especially significant in detecting hydrocarbon in shallow/deep waters. On the other hand, in Finite Ele...
Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; José M. Cela
2014-01-01
We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element solvers for three-dimensional electromagnetic numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation and Gauss-Seidel, as smoothers and the wav...
Bulovyatov, Alexander
2010-01-01
The band structure computation turns into solving a family of Maxwell eigenvalue problems on the periodicity domain. The discretization is done by the finite element method with special higher order H(curl)- and H1-conforming modified elements. The eigenvalue problem is solved by a preconditioned iterative eigenvalue solver with a projection onto the divergence-free vector fields. As a preconditioner we use the parallel multigrid method with a special Hiptmair smoother.
Aftosmis, M. J.; Berger, M. J.; Murman, S. M.; Kwak, Dochan (Technical Monitor)
2002-01-01
The proposed paper will present recent extensions in the development of an efficient Euler solver for adaptively-refined Cartesian meshes with embedded boundaries. The paper will focus on extensions of the basic method to include solution adaptation, time-dependent flow simulation, and arbitrary rigid domain motion. The parallel multilevel method makes use of on-the-fly parallel domain decomposition to achieve extremely good scalability on large numbers of processors, and is coupled with an automatic coarse mesh generation algorithm for efficient processing by a multigrid smoother. Numerical results are presented demonstrating parallel speed-ups of up to 435 on 512 processors. Solution-based adaptation may be keyed off truncation error estimates using tau-extrapolation or a variety of feature detection based refinement parameters. The multigrid method is extended to for time-dependent flows through the use of a dual-time approach. The extension to rigid domain motion uses an Arbitrary Lagrangian-Eulerlarian (ALE) formulation, and results will be presented for a variety of two- and three-dimensional example problems with both simple and complex geometry.
Kressler, Bryan; Spincemaille, Pascal; Prince, Martin R; Wang, Yi
2006-09-01
Time-resolved 3D MRI with high spatial and temporal resolution can be achieved using spiral sampling and sliding-window reconstruction. Image reconstruction is computationally intensive because of the need for data regridding, a large number of temporal phases, and multiple RF receiver coils. Inhomogeneity blurring correction for spiral sampling further increases the computational work load by an order of magnitude, hindering the clinical utility of spiral trajectories. In this work the reconstruction time is reduced by a factor of >40 compared to reconstruction using a single processor. This is achieved by using a cluster of 32 commercial off-the-shelf computers, commodity networking hardware, and readily available software. The reconstruction system is demonstrated for time-resolved spiral contrast-enhanced (CE) peripheral MR angiography (MRA), and a reduction of reconstruction time from 80 min to 1.8 min is achieved. PMID:16892189
Modifications of the PRONTO 3D finite element program tailored to fast burst nuclear reactor design
This update discusses modifications of PRONTO 3D tailored to the design of fast burst nuclear reactors. A thermoelastic constitutive model and spatially variant thermal history load were added for this special application. Included are descriptions of the thermoelastic constitutive model and the thermal loading algorithm, two example problems used to benchmark the new capability, a user's guide, and PRONTO 3D input files for the example problems. The results from PRONTO 3D thermoelastic finite element analysis are benchmarked against measured data and finite difference calculations. PRONTO 3D is a three-dimensional transient solid dynamics code for analyzing large deformations of highly non-linear materials subjected to high strain rates. The code modifications are implemented in PRONTO 3D Version 5.3.3. 12 refs., 30 figs., 9 tabs
Sung, Chul
2013-08-01
Accurate estimation of neuronal count and distribution is central to the understanding of the organization and layout of cortical maps in the brain, and changes in the cell population induced by brain disorders. High-throughput 3D microscopy techniques such as Knife-Edge Scanning Microscopy (KESM) are enabling whole-brain survey of neuronal distributions. Data from such techniques pose serious challenges to quantitative analysis due to the massive, growing, and sparsely labeled nature of the data. In this paper, we present a scalable, incremental learning algorithm for cell body detection that can address these issues. Our algorithm is computationally efficient (linear mapping, non-iterative) and does not require retraining (unlike gradient-based approaches) or retention of old raw data (unlike instance-based learning). We tested our algorithm on our rat brain Nissl data set, showing superior performance compared to an artificial neural network-based benchmark, and also demonstrated robust performance in a scenario where the data set is rapidly growing in size. Our algorithm is also highly parallelizable due to its incremental nature, and we demonstrated this empirically using a MapReduce-based implementation of the algorithm. We expect our scalable, incremental learning approach to be widely applicable to medical imaging domains where there is a constant flux of new data. © 2013 IEEE.
Professional WebGL Programming Developing 3D Graphics for the Web
Anyuru, Andreas
2012-01-01
Everything you need to know about developing hardware-accelerated 3D graphics with WebGL! As the newest technology for creating 3D graphics on the web, in both games, applications, and on regular websites, WebGL gives web developers the capability to produce eye-popping graphics. This book teaches you how to use WebGL to create stunning cross-platform apps. The book features several detailed examples that show you how to develop 3D graphics with WebGL, including explanations of code snippets that help you understand the why behind the how. You will also develop a stronger understanding of W
Development of parallel/serial program analyzing tool
Japan Atomic Energy Research Institute has been developing 'KMtool', a parallel/serial program analyzing tool, in order to promote the parallelization of the science and engineering computation program. KMtool analyzes the performance of program written by FORTRAN77 and MPI, and it reduces the effort for parallelization. This paper describes development purpose, design, utilization and evaluation of KMtool. (author)
Profiling parallel Mercury programs with ThreadScope
Bone, Paul
2011-01-01
The behavior of parallel programs is even harder to understand than the behavior of sequential programs. Parallel programs may suffer from any of the performance problems affecting sequential programs, as well as from several problems unique to parallel systems. Many of these problems are quite hard (or even practically impossible) to diagnose without help from specialized tools. We present a proposal for a tool for profiling the parallel execution of Mercury programs, a proposal whose implementation we have already started. This tool is an adaptation and extension of the ThreadScope profiler that was first built to help programmers visualize the execution of parallel Haskell programs.
Ervik, Åsmund; Müller, Bernhard
2014-01-01
To leverage the last two decades' transition in High-Performance Computing (HPC) towards clusters of compute nodes bound together with fast interconnects, a modern scalable CFD code must be able to efficiently distribute work amongst several nodes using the Message Passing Interface (MPI). MPI can enable very large simulations running on very large clusters, but it is necessary that the bulk of the CFD code be written with MPI in mind, an obstacle to parallelizing an existing serial code. In this work we present the results of extending an existing two-phase 3D Navier-Stokes solver, which was completely serial, to a parallel execution model using MPI. The 3D Navier-Stokes equations for two immiscible incompressible fluids are solved by the continuum surface force method, while the location of the interface is determined by the level-set method. We employ the Portable Extensible Toolkit for Scientific Computing (PETSc) for domain decomposition (DD) in a framework where only a fraction of the code needs to be a...
A 3D point-kernel multiple scatter model for parallel-beam SPECT based on a gamma-ray buildup factor
A three-dimensional (3D) point-kernel multiple scatter model for point spread function (PSF) determination in parallel-beam single-photon emission computed tomography (SPECT), based on a dose gamma-ray buildup factor, is proposed. This model embraces nonuniform attenuation in a voxelized object of imaging (patient body) and multiple scattering that is treated as in the point-kernel integration gamma-ray shielding problems. First-order Compton scattering is done by means of the Klein-Nishina formula, but the multiple scattering is accounted for by making use of a dose buildup factor. An asset of the present model is the possibility of generating a complete two-dimensional (2D) PSF that can be used for 3D SPECT reconstruction by means of iterative algorithms. The proposed model is convenient in those situations where more exact techniques are not economical. For the proposed model's testing purpose calculations (for the point source in a nonuniform scattering object for parallel beam collimator geometry), the multiple-order scatter PSF generated by means of the proposed model matched well with those using Monte Carlo (MC) simulations. Discrepancies are observed only at the exponential tails mostly due to the high statistic uncertainty of MC simulations in this area, but not because of the inappropriateness of the model
A 3D point-kernel multiple scatter model for parallel-beam SPECT based on a gamma-ray buildup factor
Marinkovic, Predrag; Ilic, Radovan; Spaic, Rajko
2007-09-01
A three-dimensional (3D) point-kernel multiple scatter model for point spread function (PSF) determination in parallel-beam single-photon emission computed tomography (SPECT), based on a dose gamma-ray buildup factor, is proposed. This model embraces nonuniform attenuation in a voxelized object of imaging (patient body) and multiple scattering that is treated as in the point-kernel integration gamma-ray shielding problems. First-order Compton scattering is done by means of the Klein-Nishina formula, but the multiple scattering is accounted for by making use of a dose buildup factor. An asset of the present model is the possibility of generating a complete two-dimensional (2D) PSF that can be used for 3D SPECT reconstruction by means of iterative algorithms. The proposed model is convenient in those situations where more exact techniques are not economical. For the proposed model's testing purpose calculations (for the point source in a nonuniform scattering object for parallel beam collimator geometry), the multiple-order scatter PSF generated by means of the proposed model matched well with those using Monte Carlo (MC) simulations. Discrepancies are observed only at the exponential tails mostly due to the high statistic uncertainty of MC simulations in this area, but not because of the inappropriateness of the model.
Parallel Programming Strategies for Irregular Adaptive Applications
Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2001-01-01
Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance for such computations. In this work, we examine two typical irregular adaptive applications, Dynamic Remeshing and N-Body, under competing programming methodologies and across various parallel architectures. The Dynamic Remeshing application simulates flow over an airfoil, and refines localized regions of the underlying unstructured mesh. The N-Body experiment models two neighboring Plummer galaxies that are about to undergo a merger. Both problems demonstrate dramatic changes in processor workloads and interprocessor communication with time; thus, dynamic load balancing is a required component.
Flexible language constructs for large parallel programs
Rosing, Matthew; Schnabel, Robert
1993-01-01
The goal of the research described is to develop flexible language constructs for writing large data parallel numerical programs for distributed memory (MIMD) multiprocessors. Previously, several models have been developed to support synchronization and communication. Models for global synchronization include SIMD (Single Instruction Multiple Data), SPMD (Single Program Multiple Data), and sequential programs annotated with data distribution statements. The two primary models for communication include implicit communication based on shared memory and explicit communication based on messages. None of these models by themselves seem sufficient to permit the natural and efficient expression of the variety of algorithms that occur in large scientific computations. An overview of a new language that combines many of these programming models in a clean manner is given. This is done in a modular fashion such that different models can be combined to support large programs. Within a module, the selection of a model depends on the algorithm and its efficiency requirements. An overview of the language and discussion of some of the critical implementation details is given.
The best estimate thermal-hydraulic codes used in the area of nuclear reactor safety have reached a marked level of sophistication and they require to be used by competent analysts. The need for user qualification and training is clearly recognized. An effort is being made to develop a proposal for a systematic approach to user training. The estimated duration of training at the course venue, including a set of training seminars, workshops, and practical exercises, is approximately two years. In addition, the specification and assignment of tasks to be performed by the participants at their home institutions, with continuous supervision from the training center, has been foreseen. The 3D S.UN.COP seminars constitute the follow-up of the presented proposal. The seminar is subdivided into three main parts, each of one with a program to be developed in one week: the first week is dedicated to fundamental theoretical aspects, the second week deals with industrial application, coupling methodologies and hands-on training, and the third week focuses on training for transient analysis in the interaction between thermal-hydraulics and fuel behaviour. The responses of the participants during the training have demonstrated an increase in the capabilities to develop and/or modify nodalization and to perform a qualitative and quantitative accuracy evaluation. It is expected that the participants will be able to set up more accurate, reliable and efficient simulation models, applying the procedures for qualifying the thermal-hydraulic system code calculations, and for the evaluation of the uncertainty
Evaluating the state of the art of parallel programming systems
Süß, Michael; Leopold, Claudia
2005-01-01
This paper describes our plans to evaluate the present state of affairs concerning parallel programming and its systems. Three subprojects are proposed: a survey among programmers and scientists, a comparison of parallel programming systems using a standard set of test programs, and a wiki resource for the parallel programming community - the Parawiki. We would like to invite you to participate and turn these subprojects into true community efforts.
Data-Parallel Programming in a Multithreaded Environment
Matthew Haines; Piyush Mehrotra; David Cronk
1997-01-01
Research on programming distributed memory multiprocessors has resulted in a well-understood programming model, namely data-parallel programming. However, data-parallel programming in a multithreaded environment is far less understood. For example, if multiple threads within the same process belong to different data-parallel computations, then the architecture, compiler, or run-time system must ensure that relative indexing and collective operations are handled properly and efficiently. We in...
Li, Shengtai [Los Alamos National Laboratory; Li, Hui [Los Alamos National Laboratory
2012-06-14
sensitive to the position of the planet, we adopt the corotating frame that allows the planet moving only in radial direction if only one planet is present. This code has been extensively tested on a number of problems. For the earthmass planet with constant aspect ratio h = 0.05, the torque calculated using our code matches quite well with the the 3D linear theory results by Tanaka et al. (2002). The code is fully parallelized via message-passing interface (MPI) and has very high parallel efficiency. Several numerical examples for both fixed planet and moving planet are provided to demonstrate the efficacy of the numerical method and code.
An exception handling mechanism for parallel object-oriented programming
Issarny, Valérie
1992-01-01
Paradigms of parallel object-oriented programming are attractive for the design of large distributed software. They notably provide sound basis to develop applications that are easy to maintain and reuse. This paper investigates the issue of robustness for parallel object-oriented applications. An exception handling mechanism for strongly-typed, parallel object-oriented programming is introduced. The mechanism is based on a parallel exception handling model whose features enforce the developm...
Parallel programming in Go and Scala : A performance comparison
Johnell, Carl
2015-01-01
This thesis provides a performance comparison of parallel programming in Go and Scala. Go supports concurrency through goroutines and channels. Scala have parallel collections, futures and actors that can be used for concurrent and parallel programming. The experiment used two different types of algorithms to compare the performance between Go and Scala. Parallel versions of matrix multiplication and matrix chain multiplication were implemented with goroutines and channels in Go. Matrix m...
Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.
2016-05-01
In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
High-Fidelity RF Gun Simulations with the Parallel 3D Finite Element Particle-In-Cell Code Pic3P
Candel, A; Kabel, A.; Lee, L.; Li, Z.; Limborg, C.; Ng, C.; Schussman, G.; Ko, K.; /SLAC
2009-06-19
SLAC's Advanced Computations Department (ACD) has developed the first parallel Finite Element 3D Particle-In-Cell (PIC) code, Pic3P, for simulations of RF guns and other space-charge dominated beam-cavity interactions. Pic3P solves the complete set of Maxwell-Lorentz equations and thus includes space charge, retardation and wakefield effects from first principles. Pic3P uses higher-order Finite Elementmethods on unstructured conformal meshes. A novel scheme for causal adaptive refinement and dynamic load balancing enable unprecedented simulation accuracy, aiding the design and operation of the next generation of accelerator facilities. Application to the Linac Coherent Light Source (LCLS) RF gun is presented.
Parallel solution of sparse one-dimensional dynamic programming problems
Nicol, David M.
1989-01-01
Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.
Parallelization for first principles electronic state calculation program
In this report we study the parallelization for First principles electronic state calculation program. The target machines are NEC SX-4 for shared memory type parallelization and FUJITSU VPP300 for distributed memory type parallelization. The features of each parallel machine are surveyed, and the parallelization methods suitable for each are proposed. It is shown that 1.60 times acceleration is achieved with 2 CPU parallelization by SX-4 and 4.97 times acceleration is achieved with 12 PE parallelization by VPP 300. (author)
Step by step parallel programming method for molecular dynamics code
Parallel programming for a numerical simulation program of molecular dynamics is carried out with a step-by-step programming technique using the two phase method. As a result, within the range of a certain computing parameters, it is found to obtain parallel performance by using the level of parallel programming which decomposes the calculation according to indices of do-loops into each processor on the vector parallel computer VPP500 and the scalar parallel computer Paragon. It is also found that VPP500 shows parallel performance in wider range computing parameters. The reason is that the time cost of the program parts, which can not be reduced by the do-loop level of the parallel programming, can be reduced to the negligible level by the vectorization. After that, the time consuming parts of the program are concentrated on less parts that can be accelerated by the do-loop level of the parallel programming. This report shows the step-by-step parallel programming method and the parallel performance of the molecular dynamics code on VPP500 and Paragon. (author)
Data-Parallel Programming in a Multithreaded Environment
Matthew Haines
1997-01-01
Full Text Available Research on programming distributed memory multiprocessors has resulted in a well-understood programming model, namely data-parallel programming. However, data-parallel programming in a multithreaded environment is far less understood. For example, if multiple threads within the same process belong to different data-parallel computations, then the architecture, compiler, or run-time system must ensure that relative indexing and collective operations are handled properly and efficiently. We introduce a run-time-based solution for data-parallel programming in a distributed memory environment that handles the problems of relative indexing and collective communications among thread groups. As a result, the data-parallel programming model can now be executed in a multithreaded environment, such as a system using threads to support both task and data parallelism.
DeJong, Andrew
Numerical models of fluid-structure interaction have grown in importance due to increasing interest in environmental energy harvesting, airfoil-gust interactions, and bio-inspired formation flying. Powered by increasingly powerful parallel computers, such models seek to explain the fundamental physics behind the complex, unsteady fluid-structure phenomena. To this end, a high-fidelity computational model based on the high-order spectral difference method on 3D unstructured, dynamic meshes has been developed. The spectral difference method constructs continuous solution fields within each element with a Riemann solver to compute the inviscid fluxes at the element interfaces and an averaging mechanism to compute the viscous fluxes. This method has shown promise in the past as a highly accurate, yet sufficiently fast method for solving unsteady viscous compressible flows. The solver is monolithically coupled to the equations of motion of an elastically mounted 3-degree of freedom rigid bluff body undergoing flow-induced lift, drag, and torque. The mesh is deformed using 4 methods: an analytic function, Laplace equation, biharmonic equation, and a bi-elliptic equation with variable diffusivity. This single system of equations -- fluid and structure -- is advanced through time using a 5-stage, 4th-order Runge-Kutta scheme. Message Passing Interface is used to run the coupled system in parallel on up to 240 processors. The solver is validated against previously published numerical and experimental data for an elastically mounted cylinder. The effect of adding an upstream body and inducing wake galloping is observed.
Parallel phase model : a programming model for high-end parallel machines with manycores.
Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian
2009-04-01
This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.
Introduction to 3D Graphics through Excel
Benacka, Jan
2013-01-01
The article presents a method of explaining the principles of 3D graphics through making a revolvable and sizable orthographic parallel projection of cuboid in Excel. No programming is used. The method was tried in fourteen 90 minute lessons with 181 participants, which were Informatics teachers, undergraduates of Applied Informatics and gymnasium…
Gabriele Jost; Bob Robins
2010-01-01
Today most systems in high-performance computing (HPC) feature a hierarchical hardware design: shared-memory nodes with several multi-core CPUs are connected via a network infrastructure. When parallelizing an application for these architectures it seems natural to employ a hierarchical programming model such as combining MPI and OpenMP. Nevertheless, there is the general lore that pure MPI outperforms the hybrid MPI/OpenMP approach. In this paper, we describe the hybrid MPI/OpenMP paralleliz...
Optimizing FORTRAN Programs for Hierarchical Memory Parallel Processing Systems
金国华; 陈福接
1993-01-01
Parallel loops account for the greatest amount of parallelism in numerical programs.Executing nested loops in parallel with low run-time overhead is thus very important for achieving high performance in parallel processing systems.However,in parallel processing systems with caches or local memories in memory hierarchies,“thrashing problemmay”may arise whenever data move back and forth between the caches or local memories in different processors.Previous techniques can only deal with the rather simple cases with one linear function in the perfactly nested loop.In this paper,we present a parallel program optimizing technique called hybri loop interchange(HLI)for the cases with multiple linear functions and loop-carried data dependences in the nested loop.With HLI we can easily eliminate or reduce the thrashing phenomena without reucing the program parallelism.
Parallel Programming with Matrix Distributed Processing
Di Pierro, Massimo
2005-01-01
Matrix Distributed Processing (MDP) is a C++ library for fast development of efficient parallel algorithms. It constitues the core of FermiQCD. MDP enables programmers to focus on algorithms, while parallelization is dealt with automatically and transparently. Here we present a brief overview of MDP and examples of applications in Computer Science (Cellular Automata), Engineering (PDE Solver) and Physics (Ising Model).
Helper locks for fork-join parallel programming
Agrawal, Kunal; Leiserson, Charles E.; Sukha, Jim
2010-01-01
Helper locks allow programs with large parallel critical sections, called parallel regions, to execute more efficiently by enlisting processors that might otherwise be waiting on the helper lock to aid in the execution of the parallel region. Suppose that a processor p is executing a parallel region A after having acquired the lock L protecting A. If another processor p′ tries to acquire L, then instead of blocking and waiting for p to complete A, processor p′ joins p to ...
Development of a patient-specific 3D dose evaluation program for QA in radiation therapy
Lee, Suk; Chang, Kyung Hwan; Cao, Yuan Jie; Shim, Jang Bo; Yang, Dae Sik; Park, Young Je; Yoon, Won Sup; Kim, Chul Yong
2015-03-01
We present preliminary results for a 3-dimensional dose evaluation software system ( P DRESS, patient-specific 3-dimensional dose real evaluation system). Scanned computed tomography (CT) images obtained by using dosimetry were transferred to the radiation treatment planning system (ECLIPSE, VARIAN, Palo Alto, CA) where the intensity modulated radiation therapy (IMRT) nasopharynx plan was designed. We used a 10 MV photon beam (CLiX, VARIAN, Palo Alto, CA) to deliver the nasopharynx treatment plan. After irradiation, the TENOMAG dosimeter was scanned using a VISTA ™ scanner. The scanned data were reconstructed using VistaRecon software to obtain a 3D dose distribution of the optical density. An optical-CT scanner was used to readout the dose distribution in the gel dosimeter. Moreover, we developed the P DRESS by using Flatform, which were developed by our group, to display the 3D dose distribution by loading the DICOM RT data which are exported from the radiotherapy treatment plan (RTP) and the optical-CT reconstructed VFF file, into the independent P DRESS with an ioniz ation chamber and EBT film was used to compare the dose distribution calculated from the RTP with that measured by using a gel dosimeter. The agreement between the normalized EBT, the gel dosimeter and RTP data was evaluated using both qualitative and quantitative methods, such as the isodose distribution, dose difference, point value, and profile. The profiles showed good agreement between the RTP data and the gel dosimeter data, and the precision of the dose distribution was within ±3%. The results from this study showed significantly discrepancies between the dose distribution calculated from the treatment plan and the dose distribution measured by a TENOMAG gel and by scanning with an optical CT scanner. The 3D dose evaluation software system ( P DRESS, patient specific dose real evaluation system), which were developed in this study evaluates the accuracies of the three-dimensional dose
An interactive parallel programming environment applied in atmospheric science
vonLaszewski, G.
1996-01-01
This article introduces an interactive parallel programming environment (IPPE) that simplifies the generation and execution of parallel programs. One of the tasks of the environment is to generate message-passing parallel programs for homogeneous and heterogeneous computing platforms. The parallel programs are represented by using visual objects. This is accomplished with the help of a graphical programming editor that is implemented in Java and enables portability to a wide variety of computer platforms. In contrast to other graphical programming systems, reusable parts of the programs can be stored in a program library to support rapid prototyping. In addition, runtime performance data on different computing platforms is collected in a database. A selection process determines dynamically the software and the hardware platform to be used to solve the problem in minimal wall-clock time. The environment is currently being tested on a Grand Challenge problem, the NASA four-dimensional data assimilation system.
Deterministic Consistency: A Programming Model for Shared Memory Parallelism
Aviram, Amittai; Ford, Bryan
2009-01-01
The difficulty of developing reliable parallel software is generating interest in deterministic environments, where a given program and input can yield only one possible result. Languages or type systems can enforce determinism in new code, and runtime systems can impose synthetic schedules on legacy parallel code. To parallelize existing serial code, however, we would like a programming model that is naturally deterministic without language restrictions or artificial scheduling. We propose "...
Programming parallel architectures - The BLAZE family of languages
Mehrotra, Piyush
1989-01-01
This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.
Meulien Ohlmann, Odile
2013-02-01
Today the industry offers a chain of 3D products. Learning to "read" and to "create in 3D" becomes an issue of education of primary importance. 25 years professional experience in France, the United States and Germany, Odile Meulien set up a personal method of initiation to 3D creation that entails the spatial/temporal experience of the holographic visual. She will present some different tools and techniques used for this learning, their advantages and disadvantages, programs and issues of educational policies, constraints and expectations related to the development of new techniques for 3D imaging. Although the creation of display holograms is very much reduced compared to the creation of the 90ies, the holographic concept is spreading in all scientific, social, and artistic activities of our present time. She will also raise many questions: What means 3D? Is it communication? Is it perception? How the seeing and none seeing is interferes? What else has to be taken in consideration to communicate in 3D? How to handle the non visible relations of moving objects with subjects? Does this transform our model of exchange with others? What kind of interaction this has with our everyday life? Then come more practical questions: How to learn creating 3D visualization, to learn 3D grammar, 3D language, 3D thinking? What for? At what level? In which matter? for whom?
Wilson, R. B.; Bak, M. J.; Nakazawa, S.; Banerjee, P. K.
1984-01-01
A 3-D inelastic analysis methods program consists of a series of computer codes embodying a progression of mathematical models (mechanics of materials, special finite element, boundary element) for streamlined analysis of combustor liners, turbine blades, and turbine vanes. These models address the effects of high temperatures and thermal/mechanical loadings on the local (stress/strain) and global (dynamics, buckling) structural behavior of the three selected components. These models are used to solve 3-D inelastic problems using linear approximations in the sense that stresses/strains and temperatures in generic modeling regions are linear functions of the spatial coordinates, and solution increments for load, temperature and/or time are extrapolated linearly from previous information. Three linear formulation computer codes, referred to as MOMM (Mechanics of Materials Model), MHOST (MARC-Hot Section Technology), and BEST (Boundary Element Stress Technology), were developed and are described.
Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.
2016-01-01
We have developed an algorithm, which we call HexMT, for 3-D simulation and inversion of magnetotelluric (MT) responses using deformable hexahedral finite elements that permit incorporation of topography. Direct solvers parallelized on symmetric multiprocessor (SMP), single-chassis workstations with large RAM are used throughout, including the forward solution, parameter Jacobians and model parameter update. In Part I, the forward simulator and Jacobian calculations are presented. We use first-order edge elements to represent the secondary electric field (E), yielding accuracy O(h) for E and its curl (magnetic field). For very low frequencies or small material admittivities, the E-field requires divergence correction. With the help of Hodge decomposition, the correction may be applied in one step after the forward solution is calculated. This allows accurate E-field solutions in dielectric air. The system matrix factorization and source vector solutions are computed using the MKL PARDISO library, which shows good scalability through 24 processor cores. The factorized matrix is used to calculate the forward response as well as the Jacobians of electromagnetic (EM) field and MT responses using the reciprocity theorem. Comparison with other codes demonstrates accuracy of our forward calculations. We consider a popular conductive/resistive double brick structure, several synthetic topographic models and the natural topography of Mount Erebus in Antarctica. In particular, the ability of finite elements to represent smooth topographic slopes permits accurate simulation of refraction of EM waves normal to the slopes at high frequencies. Run-time tests of the parallelized algorithm indicate that for meshes as large as 176 × 176 × 70 elements, MT forward responses and Jacobians can be calculated in ˜1.5 hr per frequency. Together with an efficient inversion parameter step described in Part II, MT inversion problems of 200-300 stations are computable with total run times
Based on matlab 3d visualization programming in the application of the uranium exploration
Combined geological theory, geophysical curve and Matlab programming three dimensional visualization applied to the production of uranium exploration. With a simple Matlab programming, numerical processing and graphical visualization of convenient features, and effective in identifying ore bodies, recourse to ore, ore body delineation of the scope of analysis has played the role of sedimentary environment. (author)
Survey on present status and trend of parallel programming environments
This report intends to provide useful information on software tools for parallel programming through the survey on parallel programming environments of the following six parallel computers, Fujitsu VPP300/500, NEC SX-4, Hitachi SR2201, Cray T94, IBM SP, and Intel Paragon, all of which are installed at Japan Atomic Energy Research Institute (JAERI), moreover, the present status of R and D's on parallel softwares of parallel languages, compilers, debuggers, performance evaluation tools, and integrated tools is reported. This survey has been made as a part of our project of developing a basic software for parallel programming environment, which is designed on the concept of STA (Seamless Thinking Aid to programmers). (author)
The BLAZE language: A parallel language for scientific programming
Mehrotra, P.; Vanrosendale, J.
1985-01-01
A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with onceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described and shows how this language would be used in typical scientific programming.
The BLAZE language - A parallel language for scientific programming
Mehrotra, Piyush; Van Rosendale, John
1987-01-01
A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.
Santasusana, Miquel; Irazábal, Joaquín; Oñate, Eugenio; Carbonell, Josep Maria
2016-07-01
In this work, we present a new methodology for the treatment of the contact interaction between rigid boundaries and spherical discrete elements (DE). Rigid body parts are present in most of large-scale simulations. The surfaces of the rigid parts are commonly meshed with a finite element-like (FE) discretization. The contact detection and calculation between those DE and the discretized boundaries is not straightforward and has been addressed by different approaches. The algorithm presented in this paper considers the contact of the DEs with the geometric primitives of a FE mesh, i.e. facet, edge or vertex. To do so, the original hierarchical method presented by Horner et al. (J Eng Mech 127(10):1027-1032, 2001) is extended with a new insight leading to a robust, fast and accurate 3D contact algorithm which is fully parallelizable. The implementation of the method has been developed in order to deal ideally with triangles and quadrilaterals. If the boundaries are discretized with another type of geometries, the method can be easily extended to higher order planar convex polyhedra. A detailed description of the procedure followed to treat a wide range of cases is presented. The description of the developed algorithm and its validation is verified with several practical examples. The parallelization capabilities and the obtained performance are presented with the study of an industrial application example.
Pathak, Ashish; Raessi, Mehdi
2014-11-01
We present a 3D MPI-parallel, GPU-accelerated computational tool that captures the interaction between a moving rigid body and two-fluid flows. Although the immediate application is the study of ocean wave energy converters (WECs), the model was developed at a general level and can be used in other applications. Solving the full Navier-Stokes equations, the model is able to capture non-linear effects, including wave-breaking and fluid-structure interaction, that have significant impact on WEC performance. To transport mass and momentum, we use a consistent scheme that can handle large density ratios (e.g. air/water). We present a novel reconstruction scheme for resolving three-phase (solid-liquid-gas) cells in the volume-of-fluid context, where the fluid interface orientation is estimated via a minimization procedure, while imposing a contact angle. The reconstruction allows for accurate mass and momentum transport in the vicinity of three-phase cells. The fast-fictitious-domain method is used for capturing the interaction between a moving rigid body and two-fluid flow. The pressure Poisson solver is accelerated using GPUs in the MPI framework. We present results of an array of test cases devised to assess the performance and accuracy of the computational tool.
The SMC (Short Model Coil) Nb3Sn Program: FE Analysis with 3D Modeling
Kokkinos, C; Guinchard, M; Karppinen, M; Manil, P; Perez, J C; Regis, F
2012-01-01
The SMC (Short Model Coil) project aims at testing superconducting coils in racetrack configuration, wound with Nb3Sn cable. The degradation of the magnetic properties of the cable is studied by applying different levels of pre-stress. It is an essential step in the validation of procedures for the construction of superconducting magnets with high performance conductor. Two SMC assemblies have been completed and cold tested in the frame of a European collaboration between CEA (FR), CERN and STFC (UK), with the technical support from LBNL (US). The second assembly showed remarkable good quench results, reaching a peak field of 12.5T. This paper details the new 3D modeling method of the SMC, implemented using the ANSYS® Workbench environment. Advanced computer-aided-design (CAD) tools are combined with multi-physics Finite Element Analyses (FEA), in the same integrated graphic interface, forming a fully parametric model that enables simulation driven development of the SMC project. The magnetic and structural ...
A Parallel Program Analysis Framework for the ACTS Toolkit; FINAL
OAK 270 - The final report summarizes the technical progress achieved during the project. A Parallel Program Analysis Framework for the acts toolkit, referred to as the TAU project. Described are the results in four work areas: (1) creation of a performance system for integrated instrumentation, measurement, analysis and visualization. (2) development of a performance measurement system for parallel profiling and tracing (3) development of an advanced program analysis system to enable creation of source-based performance and programing tools (4) development of parallel program interaction technology for accessing, performance information and application data during execution
Professional Parallel Programming with C# Master Parallel Extensions with NET 4
Hillar, Gastón
2010-01-01
Expert guidance for those programming today's dual-core processors PCs As PC processors explode from one or two to now eight processors, there is an urgent need for programmers to master concurrent programming. This book dives deep into the latest technologies available to programmers for creating professional parallel applications using C#, .NET 4, and Visual Studio 2010. The book covers task-based programming, coordination data structures, PLINQ, thread pools, asynchronous programming model, and more. It also teaches other parallel programming techniques, such as SIMD and vectorization.Teach
Grundy - Parallel processor architecture makes programming easy
Meier, R. J., Jr.
1985-01-01
The hardware, software, and firmware of the parallel processor, Grundy, are examined. The Grundy processor uses a simple processor that has a totally orthogonal three-address instruction set. The system contains a relative and indirect processing mode to support the high-level language, and uses pseudoprocessors and read-only memory. The system supports high-level language in which arbitrary degrees of algorithmic parallelism is expressed. The functions of the compiler and invocation frame are described. Grundy uses an operating system that can be accessed by an arbitrary number of processes simultaneously, and the access time grows only as the logarithm of the number of active processes. Applications for the parallel processor are discussed.
Simulation in 3 dimensions of a cycle 18 months for an BWR type reactor using the Nod3D program
The development of own codes that you/they allow the simulation in 3 dimensions of the nucleus of a reactor and be of easy maintenance, without the consequent payment of expensive use licenses, it can be a factor that propitiates the technological independence. In the Department of Nuclear Engineering (DIN) of the Superior School of Physics and Mathematics (ESFM) of the National Polytechnic Institute (IPN) a denominated program Nod3D has been developed with the one that one can simulate the operation of a reactor BWR in 3 dimensions calculating the effective multiplication factor (kJJ3, as well as the distribution of the flow neutronic and of the axial and radial profiles of the power, inside a means of well-known characteristics solving the equations of diffusion of neutrons numerically in stationary state and geometry XYZ using the mathematical nodal method RTN0 (Raviart-Thomas-Nedelec of index zero). One of the limitations of the program Nod3D is that it doesn't allow to consider the burnt of the fuel in an independent way considering feedback, this makes it in an implicit way considering the effective sections in each step of burnt and these sections are obtained of the code Core Master LEND. However even given this limitation, the results obtained in the simulation of a cycle of typical operation of a reactor of the type BWR are similar to those reported by the code Core Master LENDS. The results of the keJ - that were obtained with the program Nod3D they were compared with the results of the code Core Master LEND, presenting a difference smaller than 0.2% (200 pcm), and in the case of the axial profile of power, the maxim differs it was of 2.5%. (Author)
Generating Complex Molecular Graphics Using Automated Programs that Work with Raster 3D
MEHLHORN,DEREK T.
2000-01-01
Two programs have been written in C++ to greatly automate the process of computer simulation visualization inmost cases. These programs, rasterize.C and tracker.C, can be used to generate numerous images in order to create a video or still ties. In order to limit the amount of time and work involved in visualizing simulations, both of these programs have their own specific output formats. The first output format, from rasterize.C, is best suited for those who need only to visualize the actions of a single element, or elements that work on roughly the same time scale. The second format, from tracker.C, is best suited for simulations which involve multiple elements that work on different time scales and thus must be represented in a manner other than straight forward visualization.
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++
Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis
1994-01-01
Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
Declarative Parallel Programming in Spreadsheet End-User Development
Biermann, Florian
2016-01-01
Spreadsheets are first-order functional languages and are widely used in research and industry as a tool to conveniently perform all kinds of computations. Because cells on a spreadsheet are immutable, there are possibilities for implicit parallelization of spreadsheet computations....... In this literature study, we provide an overview of the publications on spreadsheet end-user programming and declarative array programming to inform further research on parallel programming in spreadsheets. Our results show that there is a clear overlap between spreadsheet programming and array programming and we...... can directly apply results from functional array programming to a spreadsheet model of computations....
3D dosimetry by compass program with array detector for volumetric modulated arc therapy
The aim of this study was to analyze the accuracy of dose of volumetric modulated arc therapy using the home-made phantom, a glass detector, GafChromic Film, ion chamber and a compass program with matrixx detector. We measured the isodose curve in the RTtarget, LTtagert and G4 using the compass program and Matrixx detector with homemade multi-purpose VMAT phantom, three times a day measured by five days. Measurements were compared with the calculated values. Compass analysis program was used to analysis relative iso dose curve. As a result, average passing rate were 85.22% ± 1., 89.96% ± 2. and 95.14 % ± 1.18. Compass analysis program and Matrixx detector are useful dose verification tools for Volumetric Modulated Arc Therapy. However, doses were somewhat different between calculated dose and measured dose at steep dose gradient region and low dose region. We recommend that absolute dose be necessary to be measured using the glass detector and ion chamber at region.
A Performance Analysis Tool for PVM Parallel Programs
Chen Wang; Yin Liu; Changjun Jiang; Zhaoqing Zhang
2004-01-01
In this paper,we introduce the design and implementation of ParaVT,which is a visual performance analysis and parallel debugging tool.In ParaVT,we propose an automated instrumentation mechanism. Based on this mechanism,ParaVT automatically analyzes the performance bottleneck of parallel applications and provides a visual user interface to monitor and analyze the performance of parallel programs.In addition ,it also supports certain extensions.
Programming parallel architectures: The BLAZE family of languages
Mehrotra, Piyush
1988-01-01
Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.
Denis, Alexandre; Pérez, Christian; Priol, Thierry
2001-01-01
With the availability of Computational Grids, new kinds of applications that will soon emerge will raise the problem of how to program them on such computing systems. In this paper, we advocate a programming model that is based on a combination of parallel and distributed programming models. Compared to previous approaches, this work aims at bringing SPMD programming into CORBA. For example, we want to interconnect two MPI codes by CORBA without modifying MPI or CORBA. We show that such an ap...
Multi-coupling dynamic model and 3d simulation program for in-situ leaching of uranium mining
The in-situ leaching of uranium mining is a very complicated non-linear dynamic system, which involves couplings and positive/negative feedback among many factors and processes. A comprehensive, coupled multi-factors and processes dynamic model and simulation method was established to study the in-situ leaching of uranium mining. The model accounts for most coupling among various processes as following: (1) rock texture mechanics and its evolution, (2)the incremental stress rheology of rock deformation, (3) 3-D viscoelastic/ plastic multi-deformation processes, (4) hydrofracturing, (5) tensorial (anisotropic) fracture and rock permeability, (6) water-rock interactions and mass-transport (both advective and diffusive), (7) dissolution-induced chemical compaction, (8) multi-phase fluid flow. A 3-D simulation program was compiled based on Fortran and C++. An example illustrating the application of this model to simulating acidification, production and terminal stage of in situ leaching of uranium mining is presented for the some mine in Xinjiang, China. This model and program can be used for theoretical study, mine design, production management, the study of contaminant transport and restoration in groundwater of in-situ leaching of uranium mining. (authors)
The 3D structure of the hadrons: recents results and experimental program at Jefferson Lab
Muñoz Camacho C.
2014-04-01
Full Text Available The understanding of Quantum Chromodynamics (QCD at large distances still remains one of the main outstanding problems of nuclear physics. Studying the internal structure of hadrons provides a way to probe QCD in the non-perturbative domain and can help us unravel the internal structure of the most elementary blocks of matter. Jefferson Lab (JLab has already delivered results on how elementary quarks and gluons create nucleon structure and properties. The upgrade of JLab to 12 GeV will allow the full exploration of the valence-quark structure of nucleons and the extraction of real threedimensional pictures. I will present recent results and review the future experimental program at JLab.
Programming Probabilistic Structural Analysis for Parallel Processing Computer
Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.
1991-01-01
The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.
Defeasible logic programming: language definition, operational semantics, and parallelism
García, Alejandro Javier
2001-01-01
This thesis defines Defeasible Logic Programming and provides a concrete specification of this new language through its operational semantics. Defeasible Logic Programming, or DeLP for short, has been defined based on the Logic Programming paradigm and considering features of recent developments in the area of Defeasible Argumentation. DeLP relates and improves many aspects of the areas of Logic Programming, Defeasible Argumentation, Intelligent Agents, and Parallel Logic Programming
Using parallel programming environments on clusters of workstations
da Cunha, Rudnei Dias; Hopkins, Tim
1993-01-01
We report our experiences using the parallel programming environments, PVM, HeNCE, p4 and TCGMSG and discuss some aspects concerning the performance and software engineering issues. A brief overview of each environment is given and a number of case studies written using a number of different programming paradigms are presented. Some of the examples presented are simple ``building-blocks'' which may enhance the performance of parallel applications, others are complete applications.
The FORCE - A highly portable parallel programming language
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
The FORCE: A highly portable parallel programming language
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory microprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.
Integrated Task And Data Parallel Programming: Language Design
Grimshaw, Andrew S.; West, Emily A.
1998-01-01
his research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program m. Additional 1995 Activities During the fall I collaborated
INGEN, 2-D, 3-D Mesh Generator for Finite Elements Program
1 - Description of problem or function: INGEN is a general-purpose mesh generator for use in conjunction with two- and three-dimensional finite element programs. The basic components of INGEN are surface and three-dimensional region generators that use linear-blending interpolation formulae. These generators are based on an i, j, k index scheme, which is used to number nodal points, construct elements, and develop displacement and traction boundary conditions. 2 - Method of solution: The user of INGEN develops a mesh grading by first generating the boundary edges of the mesh with the desired spacing of nodal points using the line and circular-arc generators and then using surface and volume (three-dimensional region) generators, both of which preserve this spacing. The surface nodal-point generator preserves this spacing by using the nodal points as they are distributed along the boundary edges as the criteria for spacing the surface nodal points. Similarly, the volume nodal-point generator calculates the interior nodal points, using the surface nodal points as the criteria for spacing the interior nodal points. Both the surface and volume generators use linear-blending interpolation equations for calculating nodal point coordinates. 3 - Restrictions on the complexity of the problem: The origin cannot be used as the coordinates of a nodal point because the zero coordinates are a test used for nonexistent nodal points
Huang, L. C. P.; Cook, R. A.
1973-01-01
Models utilizing various sub-sets of the six degrees of freedom are used in trajectory simulation. A 3-D model with only linear degrees of freedom is especially attractive, since the coefficients for the angular degrees of freedom are the most difficult to determine and the angular equations are the most time consuming for the computer to evaluate. A computer program is developed that uses three separate subsections to predict trajectories. A launch rail subsection is used until the rocket has left its launcher. The program then switches to a special 3-D section which computes motions in two linear and one angular degrees of freedom. When the rocket trims out, the program switches to the standard, three linear degrees of freedom model.
User's guide of parallel program development environment (PPDE)
The Center for Promotion of Computational Science and Engineering has conducted R and D on the technology basis of parallel processing and has developed a software environment of the STA (Seamless Thinking Aid) basic system to support 'Seamless Thinking' in parallel programming on various kinds of parallel computers. In the STA basic system, the PPDF (Parallel Program Development Environment) provides the environment for the integrated use of tools such as an editor, a compiler, a debugger and a performance evaluation tool. The PPDE facilitates information exchange between an editor and other tools so that the analysis of the information done with each tool is displayed on the editor, in place of the source line of programs. This report describes the use of the PPDE. (author)
Kumar, D.
1980-01-01
The computer program AFTBDY generates a body fitted curvilinear coordinate system for a wedge curved after body. This wedge curved after body is being used in an experimental program. The coordinate system generated by AFTBDY is used to solve 3D compressible N.S. equations. The coordinate system in the physical plane is a cartesian x,y,z system, whereas, in the transformed plane a rectangular xi, eta, zeta system is used. The coordinate system generated is such that in the transformed plane coordinate spacing in the xi, eta, zeta direction is constant and equal to unity. The physical plane coordinate lines in the different regions are clustered heavily or sparsely depending on the regions where physical quantities to be solved for by the N.S. equations have high or low gradients. The coordinate distribution in the physical plane is such that x stays constant in eta and zeta direction, whereas, z stays constant in xi and eta direction. The desired distribution in x and z is input to the program. Consequently, only the y-coordinate is solved for by the program AFTBDY.
The neutron diffusion programs D3D and D3E solve the multigroup diffusion equations by the technique, outer iteration for the distribution of the source and inner iterations for the distribution of the group fluxes. The inner iterations consist of two nested block overrelaxations: with the planes as the blocks within the energy groups (called group iterations) and with tupels of lines as blocks within the planes (called plane iterations). The convergence behaviour of the outer iteration depends on the number of the group iterations, the behaviour of the group iterations on the number of the plane iterations. These numbers of the inner iterations are determined by upper bounds for the relative flux changes and, therefore, vary from outer iteration to outer iteration causing non-monotonic convergence behaviour and irregular jumps of numerical values of the outer iteration. This is demostrated by means of examples taken from the literature. In addition, ways to overcome such difficulties are indicated. (orig.)
Evaluating integration of inland bathymetry in the U.S. Geological Survey 3D Elevation Program, 2014
Miller-Corbett, Cynthia
2016-01-01
Inland bathymetry survey collections, survey data types, features, sources, availability, and the effort required to integrate inland bathymetric data into the U.S. Geological Survey 3D Elevation Program are assessed to help determine the feasibility of integrating three-dimensional water feature elevation data into The National Map. Available data from wading, acoustic, light detection and ranging, and combined technique surveys are provided by the U.S. Geological Survey, National Oceanic and Atmospheric Administration, U.S. Army Corps of Engineers, and other sources. Inland bathymetric data accessed through Web-hosted resources or contacts provide useful baseline parameters for evaluating survey types and techniques used for collection and processing, and serve as a basis for comparing survey methods and the quality of results. Historically, boat-mounted acoustic surveys have provided most inland bathymetry data. Light detection and ranging techniques that are beneficial in areas hard to reach by boat, that can collect dense data in shallow water to provide comprehensive coverage, and that can be cost effective for surveying large areas with good water clarity are becoming more common; however, optimal conditions and techniques for collecting and processing light detection and ranging inland bathymetry surveys are not yet well defined.Assessment of site condition parameters important for understanding inland bathymetry survey issues and results, and an evaluation of existing inland bathymetry survey coverage are proposed as steps to develop criteria for implementing a useful and successful inland bathymetry survey plan in the 3D Elevation Program. These survey parameters would also serve as input for an inland bathymetry survey data baseline. Integration and interpolation techniques are important factors to consider in developing a robust plan; however, available survey data are usually in a triangulated irregular network format or other format compatible with
VISUAL PARALLEL PROGRAMMING AS PAAS CLOUD SERVICE WITH GRAPH-SYMBOLIC PROGRAMMING TECHNOLOGY
EGOROVA DARYA; ZHIDCHENKO VICTOR
2015-01-01
In this paper we present the visual approach to parallel programming provided by Graph-Symbolic Programming Technology. The basics of this technology are described as well as advantages and disadvantages of visual parallel programming. The technology is being implemented as a PaaS cloud service that provides the tools for creation, validation and execution of parallel programs on cluster systems. The current state of this work is also presented.
Highlights: • Code works based on Monte Carlo and escape probability methods. • Sensitivity of Dancoff factor to number of energy groups and type and arrangement of neighbor’s fuels is considered. • Sensitivity of Dancoff factor to control rod’s height is considered. • Dancoff factor high efficiency is achieved versus method sampling neutron flight direction from the fuel surface. • Sensitivity of K to Dancoff factor is considered. - Abstract: Evaluation of multigroup constants in reactor calculations depends on several parameters, the Dancoff factor amid them is used for calculation of the resonance integral as well as flux depression in the resonance region in the heterogeneous systems. This paper focuses on the computer program (MCDAN-3D) developed for calculation of the multigroup black and gray Dancoff factor in three dimensional geometry based on Monte Carlo and escape probability methods. The developed program is capable to calculate the Dancoff factor for an arbitrary arrangement of fuel rods with different cylindrical fuel dimensions and control rods with various lengths inserted in the reactor core. The initiative calculates the black and gray Dancoff factor versus generated neutron flux in cosine and constant shapes in axial fuel direction. The effects of clad and moderator are followed by studying of Dancoff factor’s sensitivity with variation of fuel arrangements and neutron’s energy group for CANDU37 and VVER1000 fuel assemblies. MCDAN-3D outcomes poses excellent agreement with the MCNPX code. The calculated Dancoff factors are then used for cell criticality calculations by the WIMS code
Parallelization of detailed thermal-hydraulic analysis program SPIRAL
The detailed thermal-hydraulic analysis computer program APIRAL is under development for the evaluation of local flow and temperature fields in wire-wrapped fuel pin bundles deformed by the influence of high burn-up, which are hard to reveal by experiment due to measurement difficulty. The coupling utilization of this program and a subchannel analysis program can offer a practical method to evaluate thermal-hydraulic behavior in a whole fuel assembly with high accuracy. This report describes the parallelization of SPIRAL for improving applicability to larger numerical simulations. The domain decomposition method using overlapped elements was adopted to the parallelization because SPIRAL is based on finite element method and it can minimize the number of communications between processor elements. As a parallelization programming library, Massage Passing Interface (MPI) was applied. Several numerical simulations were carried out to verify the parallelized version of SPIRAL and to evaluate parallelization efficiency. From These simulation results, the validity of this version was confirmed. Although no good parallelization efficiency was obtained in the case of small scale simulations due to overhead processes, approximately twelve times processing speed was achieved by using 16 processor elements in larger scale simulations. (author)
Development of massively parallel quantum chemistry program SMASH
A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C150H30)2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer
Development of massively parallel quantum chemistry program SMASH
Ishimura, Kazuya [Department of Theoretical and Computational Molecular Science, Institute for Molecular Science 38 Nishigo-Naka, Myodaiji, Okazaki, Aichi 444-8585 (Japan)
2015-12-31
A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C{sub 150}H{sub 30}){sub 2} with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.
Development of massively parallel quantum chemistry program SMASH
Ishimura, Kazuya
2015-12-01
A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C150H30)2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.
Web Based Parallel Programming Workshop for Undergraduate Education.
Marcus, Robert L.; Robertson, Douglass
Central State University (Ohio), under a contract with Nichols Research Corporation, has developed a World Wide web based workshop on high performance computing entitled "IBN SP2 Parallel Programming Workshop." The research is part of the DoD (Department of Defense) High Performance Computing Modernization Program. The research activities included…
Protocol-Based Verification of Message-Passing Parallel Programs
López-Acosta, Hugo-Andrés; Eduardo R. B. Marques, Eduardo R. B.; Martins, Francisco;
2015-01-01
a protocol language based on a dependent type system for message-passing parallel programs, which includes various communication operators, such as point-to-point messages, broadcast, reduce, array scatter and gather. For the verification of a program against a given protocol, the protocol is first...
Frumkin, Michael; Yan, Jerry
1999-01-01
We present an HPF (High Performance Fortran) implementation of ARC3D code along with the profiling and performance data on SGI Origin 2000. Advantages and limitations of HPF as a parallel programming language for CFD applications are discussed. For achieving good performance results we used the data distributions optimized for implementation of implicit and explicit operators of the solver and boundary conditions. We compare the results with MPI and directive based implementations.
Álvaro Sánchez Climent
2014-10-01
Full Text Available Nowadays the free-software programs have been converted into the ideal tools for the archaeological researches, reaching the same level as other commercial programs. For that reason, the 3D modeling tool Blender has reached in the last years a great popularity offering similar characteristics like other commercial 3D editing programs such as 3D Studio Max or AutoCAD. Recently, it has been developed the necessary script for the volumetric calculations of three-dimnesional objects, offering great possibilities to calculate the volume of the archaeological ceramics. In this paper, we present a methodological approach for the volumetric studies with Blender and a study case of funerary urns from several celtiberians cemeteries of the Spanish Meseta. The goal is to demonstrate the great possibilities that the 3D editing free-software tools have in the volumetric studies at the present time.
Heterogeneous Multicore Parallel Programming for Graphics Processing Units
Francois Bodin; Stephane Bihan
2009-01-01
Hybrid parallel multicore architectures based on graphics processing units (GPUs) can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware display a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allow...
Parallel Programming with MatlabMPI
Kepner, Jeremy
2001-01-01
MatlabMPI is a Matlab implementation of the Message Passing Interface (MPI) standard and allows any Matlab program to exploit multiple processors. MatlabMPI currently implements the basic six functions that are the core of the MPI point-to-point communications standard. The key technical innovation of MatlabMPI is that it implements the widely used MPI ``look and feel'' on top of standard Matlab file I/O, resulting in an extremely compact (~100 lines) and ``pure'' implementation which runs an...
Optimized Parallel Execution of Declarative Programs on Distributed Memory Multiprocessors
沈美明; 田新民; 等
1993-01-01
In this paper,we focus on the compiling implementation of parlalel logic language PARLOG and functional language ML on distributed memory multiprocessors.Under the graph rewriting framework, a Heterogeneous Parallel Graph Rewriting Execution Model(HPGREM)is presented firstly.Then based on HPGREM,a parallel abstact machine PAM/TGR is described.Furthermore,several optimizing compilation schemes for executing declarative programs on transputer array are proposed.The performance statistics on transputer array demonstrate the effectiveness of our model,parallel abstract machine,optimizing compilation strategies and compiler.
On program restructuring, scheduling, and communication for parallel processor systems
Polychronopoulos, Constantine D.
1986-08-01
This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed focusing on a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler was used to transform programs in a parallel form and conduct experiments. Two new program restructuring techniques are presented, loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented. 69 refs., 74 figs., 14 tabs.
Aboufadel, Edward F.
2014-01-01
The purpose of this short paper is to describe a project to manufacture a regular octohedron on a 3D printer. We assume that the reader is familiar with the basics of 3D printing. In the project, we use fundamental ideas to calculate the vertices and faces of an octohedron. Then, we utilize the OPENSCAD program to create a virtual 3D model and an STereoLithography (.stl) file that can be used by a 3D printer.
Structural synthesis of parallel programs (Methodology and Tools
G. Cejtlin
1995-11-01
Full Text Available Concepts of structured programming and propositional program logics were anticipated in the systems of algorithmic algebras (SAAs introduced by V.M.Glushkov in 1965. This paper correlates the SAA language with the known formalisms describing logic schemes of structured and unstructured programs. Complete axiomatics is constructed for modified SAAs (SAA-M oriented towards the formalization of parallel computations and abstract data types. An apparatus formalizing (top-down, bottom-up, or combined design of programs is suggested that incorporates SAA-M, grammar and automaton models oriented towards multiprocessing. A method and tools of parallel program design are developed. They feature orientation towards validation, transformations and synthesis of programs.
Programming frames for the efficient use of parallel systems
Thomas, Römke; Petit Silvestre, Jordi
1997-01-01
Frames will provide support for the programming of distributed memory machines via a library of basic algorithms, data structures and so-called programming frames (or frameworks). The latter are skeletons with problem dependent parameters to be provided by the users. Frames focuses on re-usability and portability as well as on small and easy-to-learn interfaces. Thus, expert and non-expert users will be provided with tools to program and exploit parallel machines efficiently...
Deductive Verification of Parallel Programs Using Why3
Santos, César; Martins, Francisco; Vasconcelos, Vasco Thudichum
2015-01-01
The Message Passing Interface specification (MPI) defines a portable message-passing API used to program parallel computers. MPI programs manifest a number of challenges on what concerns correctness: sent and expected values in communications may not match, resulting in incorrect computations possibly leading to crashes; and programs may deadlock resulting in wasted resources. Existing tools are not completely satisfactory: model-checking does not scale with the number of processes; testing t...
Nakazawa, Shohei
1991-01-01
Formulations and algorithms implemented in the MHOST finite element program are discussed. The code uses a novel concept of the mixed iterative solution technique for the efficient 3-D computations of turbine engine hot section components. The general framework of variational formulation and solution algorithms are discussed which were derived from the mixed three field Hu-Washizu principle. This formulation enables the use of nodal interpolation for coordinates, displacements, strains, and stresses. Algorithmic description of the mixed iterative method includes variations for the quasi static, transient dynamic and buckling analyses. The global-local analysis procedure referred to as the subelement refinement is developed in the framework of the mixed iterative solution, of which the detail is presented. The numerically integrated isoparametric elements implemented in the framework is discussed. Methods to filter certain parts of strain and project the element discontinuous quantities to the nodes are developed for a family of linear elements. Integration algorithms are described for linear and nonlinear equations included in MHOST program.
Development of LGA & LBE 2D Parallel Programs
Ujita, Hiroshi; Nagata, Satoru; Akiyama, Minoru; Naitoh, Masanori; Ohashi, Hirotada
A lattice-gas Automata two-dimensional program was developed for analysis of single and two-phase flow behaviors, to support the development of integrated software modules for Nuclear Power Plant mechanistic simulations. The program has single-color, which includes FHP I, II, and III models, two-color (Immiscible lattice gas), and two-velocity methods including a gravity effect model. Parameter surveys have been performed for Karman vortex street, two-phase separation for understanding flow regimes, and natural circulation flow for demonstrating passive reactor safety due to the chimney structure vessel. In addition, lattice-Boltzmann Equation two-dimensional programs were also developed. For analyzing single-phase flow behavior, a lattice-Boltzmann-BGK program was developed, which has multi-block treatments. A Finite Differential lattice-Boltzmann Equation program of parallelized version was introduced to analyze boiling two-phase flow behaviors. Parameter surveys have been performed for backward facing flow, Karman vortex street, bent piping flow with/without obstacles for piping system applications, flow in the porous media for demonstrating porous debris coolability, Couette flow, and spinodal decomposition to understand basic phase separation mechanisms. Parallelization was completed by using a domain decomposition method for all of the programs. An increase in calculation speed of at least 25 times, by parallel processing on 32 processors, demonstrated high parallelization efficiency. Application fields for microscopic model simulation to hypothetical severe conditions in large plants were also discussed.
Basic design of parallel computational program for probabilistic structural analysis
In our laboratory, for 'development of damage evaluation method of structural brittle materials by microscopic fracture mechanics and probabilistic theory' (nuclear computational science cross-over research) we examine computational method related to super parallel computation system which is coupled with material strength theory based on microscopic fracture mechanics for latent cracks and continuum structural model to develop new structural reliability evaluation methods for ceramic structures. This technical report is the review results regarding probabilistic structural mechanics theory, basic terms of formula and program methods of parallel computation which are related to principal terms in basic design of computational mechanics program. (author)
JCOGIN. A parallel programming infrastructure for Monte Carlo particle transport
The advantages of the Monte Carlo method for reactor analysis are well known, but the full-core reactor analysis challenges the computational time and computer memory. Meanwhile, the exponential growth of computer power in the last 10 years is now creating a great opportunity for large scale parallel computing on the Monte Carlo full-core reactor analysis. In this paper, a parallel programming infrastructure is introduced for Monte Carlo particle transport, named JCOGIN, which aims at accelerating the development of Monte Carlo codes for the large scale parallelism simulations of the full-core reactor. Now, JCOGIN implements the hybrid parallelism of the spatial decomposition and the traditional particle parallelism on MPI and OpenMP. Finally, JMCT code is developed on JCOGIN, which reaches the parallel efficiency of 70% on 20480 cores for fixed source problem. By the hybrid parallelism, the full-core pin-by-pin simulation of the Dayawan reactor was implemented, with the number of the cells up to 10 million and the tallies of the fluxes utilizing over 40GB of memory. (author)
Serdal Baltaci
2015-01-01
Full Text Available Each new version of the GeoGebra dynamic mathematics software goes through updates and innovations. One of these innovations is the GeoGebra 5.0 version. This version aims to facilitate 3D instruction by offering opportunities for students to analyze 3D objects. While scanning the previous studies of GeoGebra 3D, it is seen that they mainly focus on the visualization of a problem in daily life and the dimensions of the evaluation of the process of problem solving with various variables. Therefore, this research problem was determined to reveal the opinions of pre-service elementary mathematics teachers who can use multiple software programs very well, about the usability of GeoGebra 3D. Compared to other studies conducted in this field, this study is thought to add a new dimension to the literature on GeoGebra 3D because the participants in the study had received training in using the Derive, Cabri, Cabri 3D, GeoGebra and GeoGebra 3D programs and had developed activities throughout their undergraduate programs and in some cases they were held responsible for those programs in their exams. In this research, we used the method of case study. The participants consisted of five elementary pre-service mathematics teachers who were enrolled in fourth year courses. We employed semi-structured interviews to collect data. It is concluded that pre-service elementary mathematics teachers expressed a great deal of opinions about the positive contribution of the GeoGebra 3D dynamic mathematics software.
X3D: Extensible 3D Graphics Standard
Daly, Leonard; Brutzman, Don
2007-01-01
The article of record as published may be located at http://dx.doi.org/10.1109/MSP.2007.905889 Extensible 3D (X3D) is the open standard for Web-delivered three-dimensional (3D) graphics. It specifies a declarative geometry definition language, a run-time engine, and an application program interface (API) that provide an interactive, animated, real-time environment for 3D graphics. The X3D specification documents are freely available, the standard can be used without paying any royalties,...
Hernandez, N.; Alonso, G. [ININ, A.P. 18-1027, 11801 Mexico D.F. (Mexico)]. E-mail: nhm@nuclear.inin.mx; Valle, E. del [IPN, ESFM, 07738 Mexico D.F. (Mexico)
2004-07-01
The development of own codes that you/they allow the simulation in 3 dimensions of the nucleus of a reactor and be of easy maintenance, without the consequent payment of expensive use licenses, it can be a factor that propitiates the technological independence. In the Department of Nuclear Engineering (DIN) of the Superior School of Physics and Mathematics (ESFM) of the National Polytechnic Institute (IPN) a denominated program Nod3D has been developed with the one that one can simulate the operation of a reactor BWR in 3 dimensions calculating the effective multiplication factor (kJJ3, as well as the distribution of the flow neutronic and of the axial and radial profiles of the power, inside a means of well-known characteristics solving the equations of diffusion of neutrons numerically in stationary state and geometry XYZ using the mathematical nodal method RTN0 (Raviart-Thomas-Nedelec of index zero). One of the limitations of the program Nod3D is that it doesn't allow to consider the burnt of the fuel in an independent way considering feedback, this makes it in an implicit way considering the effective sections in each step of burnt and these sections are obtained of the code Core Master LEND. However even given this limitation, the results obtained in the simulation of a cycle of typical operation of a reactor of the type BWR are similar to those reported by the code Core Master LENDS. The results of the keJ - that were obtained with the program Nod3D they were compared with the results of the code Core Master LEND, presenting a difference smaller than 0.2% (200 pcm), and in the case of the axial profile of power, the maxim differs it was of 2.5%. (Author)
Shih, T. I.-P.; Bailey, R. T.; Nguyen, H. L.; Roelke, R. J.
1990-01-01
An efficient computer program, called GRID2D/3D was developed to generate single and composite grid systems within geometrically complex two- and three-dimensional (2- and 3-D) spatial domains that can deform with time. GRID2D/3D generates single grid systems by using algebraic grid generation methods based on transfinite interpolation in which the distribution of grid points within the spatial domain is controlled by stretching functions. All single grid systems generated by GRID2D/3D can have grid lines that are continuous and differentiable everywhere up to the second-order. Also, grid lines can intersect boundaries of the spatial domain orthogonally. GRID2D/3D generates composite grid systems by patching together two or more single grid systems. The patching can be discontinuous or continuous. For continuous composite grid systems, the grid lines are continuous and differentiable everywhere up to the second-order except at interfaces where different single grid systems meet. At interfaces where different single grid systems meet, the grid lines are only differentiable up to the first-order. For 2-D spatial domains, the boundary curves are described by using either cubic or tension spline interpolation. For 3-D spatial domains, the boundary surfaces are described by using either linear Coon's interpolation, bi-hyperbolic spline interpolation, or a new technique referred to as 3-D bi-directional Hermite interpolation. Since grid systems generated by algebraic methods can have grid lines that overlap one another, GRID2D/3D contains a graphics package for evaluating the grid systems generated. With the graphics package, the user can generate grid systems in an interactive manner with the grid generation part of GRID2D/3D. GRID2D/3D is written in FORTRAN 77 and can be run on any IBM PC, XT, or AT compatible computer. In order to use GRID2D/3D on workstations or mainframe computers, some minor modifications must be made in the graphics part of the program; no
We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000 - 300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard message passing interface (MPI) library calls and is fully portable between serial and parallel computers. copyright 1998 The American Astronomical Society
Parallel dynamic programming for on-line flight path optimization
Slater, G. L.; Hu, K.
1989-01-01
Parallel systolic algorithms for dynamic programming(DP) and their respective hardware implementations are presented for a problem in on-line trajectory optimization. The method is applied to a model for helicopter flight path optimization through a complex constraint region. This problem has application to an air traffic control problem and also to a terrain following/threat avoidance problem.
Session-Based Programming for Parallel Algorithms: Expressiveness and Performance
Andi Bejleri; Raymond Hu; Nobuko Yoshida
2010-01-01
This paper investigates session programming and typing of benchmark examples to compare productivity, safety and performance with other communications programming languages. Parallel algorithms are used to examine the above aspects due to their extensive use of message passing for interaction, and their increasing prominence in algorithmic research with the rising availability of hardware resources such as multicore machines and clusters. We contribute new benchmark results for SJ, an extensi...
Oberholzer, K.; Romaneehsen, B.; Kunz, P.; Thelen, M.; Kreitner, K.F. [Klinik fuer Radiologie, Johannes Gutenberg-Univ. Mainz (Germany); Kramm, T. [Klinik fuer Herz-, Thorax- und Gefaesschirurgie, Johannes Gutenberg-Univ. Mainz (Germany)
2004-04-01
Purpose: Comparison of two different types of contrast-enhanced 3D-MR angiography (CE-MRA) with integrated parallel acquisition technique (iPAT) in patients with chronic-thromboembolic pulmonary hypertension (CTEPH) and evaluation whether sagittal acquisition with higher resolution and minimized acquisition time is superior to common coronal orientation. Materials and Methods: CE-MRA was performed on 15 patients with CTEPH preoperatively and on 10 patients also postoperatively, while 5 other patients received only a postoperative MRA. All 30 MR studies with one coronal and two sagittal acquisitions were blindly evaluated and compared. The resolution of coronal and sagittal MRA was 1.3 x 0.6 x 1.4 mm{sup 3} and 1.2 x 1.2 x 1.2 mm{sup 3}, and acquisition time 20 and 17 sec (iPAT factor 2, GRAPPA), respectively. Image quality, coverage of the pulmonary arteries, delineation of patent segmental and subsegmental vessels and pathological findings were assessed. A total of 1980 vessels were evaluated. Results: Sagittal 3D-MRA was superior in overall image quality and complete coverage of the vessels compared to coronal MRA, 18% of subsegmental and 4.3% of segmental arteries as well as 1.1% of the lobar vessels were not covered by coronal acquisition. Only 0.5% of sagittal subsegments were missed. The number of depicted patent segmental and subsegmental arteries was higher in sagittal MRA (460 vs 489 and 573 vs. 649, respectively), the total difference of patent vessels was 105. Sagittal MRA revealed more pathological findings in segmental arteries (especially thrombotic material and stenoses). (orig.) [German] Ziel: Vergleich zweier kontrastmittelverstaerkter MR-Angiographie-Techniken der Pulmonalarterien mit integrierter paralleler Akquisitionstechnik (iPAT) bei Patienten mit chronisch-thromboembolischer pulmonaler Hypertonie (CTEPH), Ueberpruefung der Hypothese, dass mit sagittaler Datenaufnahme eine bessere Bildqualitaet und Detailerkennbarkeit durch hoehere Aufloesung
Heterogeneous Multicore Parallel Programming for Graphics Processing Units
Francois Bodin
2009-01-01
Full Text Available Hybrid parallel multicore architectures based on graphics processing units (GPUs can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware display a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allows the integration of heterogeneous hardware accelerators in a unintrusive manner while preserving the legacy code.
Advanced parallel programming models research and development opportunities.
Wen, Zhaofang.; Brightwell, Ronald Brian
2004-07-01
There is currently a large research and development effort within the high-performance computing community on advanced parallel programming models. This research can potentially have an impact on parallel applications, system software, and computing architectures in the next several years. Given Sandia's expertise and unique perspective in these areas, particularly on very large-scale systems, there are many areas in which Sandia can contribute to this effort. This technical report provides a survey of past and present parallel programming model research projects and provides a detailed description of the Partitioned Global Address Space (PGAS) programming model. The PGAS model may offer several improvements over the traditional distributed memory message passing model, which is the dominant model currently being used at Sandia. This technical report discusses these potential benefits and outlines specific areas where Sandia's expertise could contribute to current research activities. In particular, we describe several projects in the areas of high-performance networking, operating systems and parallel runtime systems, compilers, application development, and performance evaluation.
Efficient Thread Labeling for Monitoring Programs with Nested Parallelism
Ha, Ok-Kyoon; Kim, Sun-Sook; Jun, Yong-Kee
It is difficult and cumbersome to detect data races occurred in an execution of parallel programs. Any on-the-fly race detection techniques using Lamport's happened-before relation needs a thread labeling scheme for generating unique identifiers which maintain logical concurrency information for the parallel threads. NR labeling is an efficient thread labeling scheme for the fork-join program model with nested parallelism, because its efficiency depends only on the nesting depth for every fork and join operation. This paper presents an improved NR labeling, called e-NR labeling, in which every thread generates its label by inheriting the pointer to its ancestor list from the parent threads or by updating the pointer in a constant amount of time and space. This labeling is more efficient than the NR labeling, because its efficiency does not depend on the nesting depth for every fork and join operation. Some experiments were performed with OpenMP programs having nesting depths of three or four and maximum parallelisms varying from 10,000 to 1,000,000. The results show that e-NR is 5 times faster than NR labeling and 4.3 times faster than OS labeling in the average time for creating and maintaining the thread labels. In average space required for labeling, it is 3.5 times smaller than NR labeling and 3 times smaller than OS labeling.
Metere, Alfredo; Dzugutov, Mikhail
2015-01-01
We present a new program able to perform unique visual analysis on generic particle systems: PASYVAT (PArticle SYstem Visual Analysis Tool). More specifically, it can perform a selection of multiple interparticle distance ranges from a radial distribution function (RDF) plot and display them in 3D as bonds. This software can be used with any data set representing a system of particles in 3D. In this manuscript the reader will find a description of the program and its internal structure, with emphasis on its applicability in the study of certain particle configurations, obtained from classical molecular dynamics simulation in condensed matter physics.
CaKernel – A Parallel Application Programming Framework for Heterogenous Computing Architectures
Marek Blazewicz
2011-01-01
Full Text Available With the recent advent of new heterogeneous computing architectures there is still a lack of parallel problem solving environments that can help scientists to use easily and efficiently hybrid supercomputers. Many scientific simulations that use structured grids to solve partial differential equations in fact rely on stencil computations. Stencil computations have become crucial in solving many challenging problems in various domains, e.g., engineering or physics. Although many parallel stencil computing approaches have been proposed, in most cases they solve only particular problems. As a result, scientists are struggling when it comes to the subject of implementing a new stencil-based simulation, especially on high performance hybrid supercomputers. In response to the presented need we extend our previous work on a parallel programming framework for CUDA – CaCUDA that now supports OpenCL. We present CaKernel – a tool that simplifies the development of parallel scientific applications on hybrid systems. CaKernel is built on the highly scalable and portable Cactus framework. In the CaKernel framework, Cactus manages the inter-process communication via MPI while CaKernel manages the code running on Graphics Processing Units (GPUs and interactions between them. As a non-trivial test case we have developed a 3D CFD code to demonstrate the performance and scalability of the automatically generated code.
Parallel Libraries to support High-Level Programming
Larsen, Morten Nørgaard
a new programming language or at least forces them to learn new methods and/or ways of writing code. For the first part, this thesis will focus on simplifying the task of writing parallel programs for programmers, but especially for the large group of noncomputer scientists. I will start by presenting...... so is not a simple task and for many non-computer scientists, like chemists and physicists writing programs for simulating their experiments, the task can easily become overwhelming. During the last decades, a lot of research efforts have been put into how to create tools that will simplify writing...
Programming Massively Parallel Architectures using MARTE: a Case Study
Rodrigues, Wendell; Dekeyser, Jean-Luc
2011-01-01
Nowadays, several industrial applications are being ported to parallel architectures. These applications take advantage of the potential parallelism provided by multiple core processors. Many-core processors, especially the GPUs(Graphics Processing Unit), have led the race of floating-point performance since 2003. While the performance improvement of general- purpose microprocessors has slowed significantly, the GPUs have continued to improve relentlessly. As of 2009, the ratio between many-core GPUs and multicore CPUs for peak floating-point calculation throughput is about 10 times. However, as parallel programming requires a non-trivial distribution of tasks and data, developers find it hard to implement their applications effectively. Aiming to improve the use of many-core processors, this work presents an case-study using UML and MARTE profile to specify and generate OpenCL code for intensive signal processing applications. Benchmark results show us the viability of the use of MDE approaches to generate G...
Testing New Programming Paradigms with NAS Parallel Benchmarks
Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.
2000-01-01
Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage
Scientific programming on massively parallel processor CP-PACS
The massively parallel processor CP-PACS takes various problems of calculation physics as the object, and it has been designed so that its architecture has been devised to do various numerical processings. In this report, the outline of the CP-PACS and the example of programming in the Kernel CG benchmark in NAS Parallel Benchmarks, version 1, are shown, and the pseudo vector processing mechanism and the parallel processing tuning of scientific and technical computation utilizing the three-dimensional hyper crossbar net, which are two great features of the architecture of the CP-PACS are described. As for the CP-PACS, the PUs based on RISC processor and added with pseudo vector processor are used. Pseudo vector processing is realized as the loop processing by scalar command. The features of the connection net of PUs are explained. The algorithm of the NPB version 1 Kernel CG is shown. The part that takes the time for processing most in the main loop is the product of matrix and vector (matvec), and the parallel processing of the matvec is explained. The time for the computation by the CPU is determined. As the evaluation of the performance, the evaluation of the time for execution, the short vector processing of pseudo vector processor based on slide window, and the comparison with other parallel computers are reported. (K.I.)
On the utility of threads for data parallel programming
Fahringer, Thomas; Haines, Matthew; Mehrotra, Piyush
1995-01-01
Threads provide a useful programming model for asynchronous behavior because of their ability to encapsulate units of work that can then be scheduled for execution at runtime, based on the dynamic state of a system. Recently, the threaded model has been applied to the domain of data parallel scientific codes, and initial reports indicate that the threaded model can produce performance gains over non-threaded approaches, primarily through the use of overlapping useful computation with communication latency. However, overlapping computation with communication is possible without the benefit of threads if the communication system supports asynchronous primitives, and this comparison has not been made in previous papers. This paper provides a critical look at the utility of lightweight threads as applied to data parallel scientific programming.
Final Report: Center for Programming Models for Scalable Parallel Computing
Mellor-Crummey, John [William Marsh Rice University
2011-09-13
As part of the Center for Programming Models for Scalable Parallel Computing, Rice University collaborated with project partners in the design, development and deployment of language, compiler, and runtime support for parallel programming models to support application development for the “leadership-class” computer systems at DOE national laboratories. Work over the course of this project has focused on the design, implementation, and evaluation of a second-generation version of Coarray Fortran. Research and development efforts of the project have focused on the CAF 2.0 language, compiler, runtime system, and supporting infrastructure. This has involved working with the teams that provide infrastructure for CAF that we rely on, implementing new language and runtime features, producing an open source compiler that enabled us to evaluate our ideas, and evaluating our design and implementation through the use of benchmarks. The report details the research, development, findings, and conclusions from this work.
3D Elevation Program—Virtual USA in 3D
Lukas, Vicki; Stoker, J.M.
2016-01-01
The U.S. Geological Survey (USGS) 3D Elevation Program (3DEP) uses a laser system called ‘lidar’ (light detection and ranging) to create a virtual reality map of the Nation that is very accurate. 3D maps have many uses with new uses being discovered all the time.
07361 Abstracts Collection -- Programming Models for Ubiquitous Parallelism
Wong, David Chi-Leung; Cohen, Albert; Garzarán, María J.; Lengauer, Christian; Midkiff, Samuel P.
2008-01-01
From 02.09. to 07.09.2007, the Dagstuhl Seminar 07361 ``Programming Models for Ubiquitous Parallelism'' was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section des...
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems
Taft, James R.
1999-01-01
Recent developments at the NASA AMES Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
Organization of an interphase system for the coupling of WINS-D4 and SNAP-3D programs
In this report a modular system developed for the CC-1 critical assembly's physical calculation is described. It was based upon the WINS-D4 and SNAP-3D codes, which are coupled by means of an interphase module and a groups diffusion cross sections library
A Portable Debugger for Parallel and Distributed Programs
Cheng, Doreen Y.; Hood, Robert; Cooper, D. M. (Technical Monitor)
1994-01-01
In this paper, we describe the design and implementation of a portable debugger for parallel and distributed programs. The design incorporates a client-server model in order to isolate non-portable debugger code from the user interface. The precise definition of a protocol for client-server interaction permits a high degree of portability of the client user interface. Replication of server components permits the implementation of a debugger for distributed computations. Portability across message passing implementations is achieved with a protocol that dictates the interaction between a message passing library and the debugger. This permits the same debugger to be used both on PVM and MTI programs. The process abstractions used for debugging message-passing programs can be easily adapted to debug HPF programs at the source level. This allows the debugger to present information hidden in tool-generated code in a meaningful manner.
Session-Based Programming for Parallel Algorithms: Expressiveness and Performance
Bejleri, Andi; Yoshida, Nobuko; 10.4204/EPTCS.17.2
2010-01-01
This paper investigates session programming and typing of benchmark examples to compare productivity, safety and performance with other communications programming languages. Parallel algorithms are used to examine the above aspects due to their extensive use of message passing for interaction, and their increasing prominence in algorithmic research with the rising availability of hardware resources such as multicore machines and clusters. We contribute new benchmark results for SJ, an extension of Java for type-safe, binary session programming, against MPJ Express, a Java messaging system based on the MPI standard. In conclusion, we observe that (1) despite rich libraries and functionality, MPI remains a low-level API, and can suffer from commonly perceived disadvantages of explicit message passing such as deadlocks and unexpected message types, and (2) the benefits of high-level session abstraction, which has significant impact on program structure to improve readability and reliability, and session type-saf...
Zheng, Xiang
2015-03-01
We present a numerical algorithm for simulating the spinodal decomposition described by the three dimensional Cahn-Hilliard-Cook (CHC) equation, which is a fourth-order stochastic partial differential equation with a noise term. The equation is discretized in space and time based on a fully implicit, cell-centered finite difference scheme, with an adaptive time-stepping strategy designed to accelerate the progress to equilibrium. At each time step, a parallel Newton-Krylov-Schwarz algorithm is used to solve the nonlinear system. We discuss various numerical and computational challenges associated with the method. The numerical scheme is validated by a comparison with an explicit scheme of high accuracy (and unreasonably high cost). We present steady state solutions of the CHC equation in two and three dimensions. The effect of the thermal fluctuation on the spinodal decomposition process is studied. We show that the existence of the thermal fluctuation accelerates the spinodal decomposition process and that the final steady morphology is sensitive to the stochastic noise. We also show the evolution of the energies and statistical moments. In terms of the parallel performance, it is found that the implicit domain decomposition approach scales well on supercomputers with a large number of processors. © 2015 Elsevier Inc.
We present a numerical algorithm for simulating the spinodal decomposition described by the three dimensional Cahn–Hilliard–Cook (CHC) equation, which is a fourth-order stochastic partial differential equation with a noise term. The equation is discretized in space and time based on a fully implicit, cell-centered finite difference scheme, with an adaptive time-stepping strategy designed to accelerate the progress to equilibrium. At each time step, a parallel Newton–Krylov–Schwarz algorithm is used to solve the nonlinear system. We discuss various numerical and computational challenges associated with the method. The numerical scheme is validated by a comparison with an explicit scheme of high accuracy (and unreasonably high cost). We present steady state solutions of the CHC equation in two and three dimensions. The effect of the thermal fluctuation on the spinodal decomposition process is studied. We show that the existence of the thermal fluctuation accelerates the spinodal decomposition process and that the final steady morphology is sensitive to the stochastic noise. We also show the evolution of the energies and statistical moments. In terms of the parallel performance, it is found that the implicit domain decomposition approach scales well on supercomputers with a large number of processors
Parallelizing Deadlock Resolution in Symbolic Synthesis of Distributed Programs
Fuad Abujarad
2009-12-01
Full Text Available Previous work has shown that there are two major complexity barriers in the synthesis of fault-tolerant distributed programs: (1 generation of fault-span, the set of states reachable in the presence of faults, and (2 resolving deadlock states, from where the program has no outgoing transitions. Of these, the former closely resembles with model checking and, hence, techniques for efficient verification are directly applicable to it. Hence, we focus on expediting the latter with the use of multi-core technology. We present two approaches for parallelization by considering different design choices. The first approach is based on the computation of equivalence classes of program transitions (called group computation that are needed due to the issue of distribution (i.e., inability of processes to atomically read and write all program variables. We show that in most cases the speedup of this approach is close to the ideal speedup and in some cases it is superlinear. The second approach uses traditional technique of partitioning deadlock states among multiple threads. However, our experiments show that the speedup for this approach is small. Consequently, our analysis demonstrates that a simple approach of parallelizing the group computation is likely to be the effective method for using multi-core computing in the context of deadlock resolution.
Users manual for the Chameleon parallel programming tools
Gropp, W. [Argonne National Lab., IL (United States); Smith, B. [Univ. of California, Los Angeles, CA (United States). Dept. of Mathematics
1993-06-01
Message passing is a common method for writing programs for distributed-memory parallel computers. Unfortunately, the lack of a standard for message passing has hampered the construction of portable and efficient parallel programs. In an attempt to remedy this problem, a number of groups have developed their own message-passing systems, each with its own strengths and weaknesses. Chameleon is a second-generation system of this type. Rather than replacing these existing systems, Chameleon is meant to supplement them by providing a uniform way to access many of these systems. Chameleon`s goals are to (a) be very lightweight (low over-head), (b) be highly portable, and (c) help standardize program startup and the use of emerging message-passing operations such as collective operations on subsets of processors. Chameleon also provides a way to port programs written using PICL or Intel NX message passing to other systems, including collections of workstations. Chameleon is tracking the Message-Passing Interface (MPI) draft standard and will provide both an MPI implementation and an MPI transport layer. Chameleon provides support for heterogeneous computing by using p4 and PVM. Chameleon`s support for homogeneous computing includes the portable libraries p4, PICL, and PVM and vendor-specific implementation for Intel NX, IBM EUI (SP-1), and Thinking Machines CMMD (CM-5). Support for Ncube and PVM 3.x is also under development.
Advanced Programming Platform for efficient use of Data Parallel Hardware
Cabellos, Luis
2012-01-01
Graphics processing units (GPU) had evolved from a specialized hardware capable to render high quality graphics in games to a commodity hardware for effective processing blocks of data in a parallel schema. This evolution is particularly interesting for scientific groups, which traditionally use mainly CPU as a work horse, and now can profit of the arrival of GPU hardware to HPC clusters. This new GPU hardware promises a boost in peak performance, but it is not trivial to use. In this article a programming platform designed to promote a direct use of this specialized hardware is presented. This platform includes a visual editor of parallel data flows and it is oriented to the execution in distributed clusters with GPUs. Examples of application in two characteristic problems, Fast Fourier Transform and Image Compression, are also shown.
NIF Ignition Target 3D Point Design
Jones, O; Marinak, M; Milovich, J; Callahan, D
2008-11-05
We have developed an input file for running 3D NIF hohlraums that is optimized such that it can be run in 1-2 days on parallel computers. We have incorporated increasing levels of automation into the 3D input file: (1) Configuration controlled input files; (2) Common file for 2D and 3D, different types of capsules (symcap, etc.); and (3) Can obtain target dimensions, laser pulse, and diagnostics settings automatically from NIF Campaign Management Tool. Using 3D Hydra calculations to investigate different problems: (1) Intrinsic 3D asymmetry; (2) Tolerance to nonideal 3D effects (e.g. laser power balance, pointing errors); and (3) Synthetic diagnostics.
SIMULATION PROCESS OF REMOVING NON-METALLIC INCLUSIONS IN ALUMINUM ALLOYS USING THE PROGRAM FLOW-3D
N. V. Sletova; I. N. Volnov; S. P. Zadrutsky; V. A. Chaikin
2015-01-01
The perspective materials for making fining preparations for the silumins are the calcium and strontium carbonates from the environmental safety point of view are shown. Principle possibility of using dispersed carbonates in the fining mixtures is confirmed by late inoculation process research using simulation FLOW-3D.The high efficiency of the fining mixture with the inoculants effect is confirmed by the industrial tests
SIMULATION PROCESS OF REMOVING NON-METALLIC INCLUSIONS IN ALUMINUM ALLOYS USING THE PROGRAM FLOW-3D
N. V. Sletova
2015-05-01
Full Text Available The perspective materials for making fining preparations for the silumins are the calcium and strontium carbonates from the environmental safety point of view are shown. Principle possibility of using dispersed carbonates in the fining mixtures is confirmed by late inoculation process research using simulation FLOW-3D.The high efficiency of the fining mixture with the inoculants effect is confirmed by the industrial tests
Feedback Driven Annotation and Refactoring of Parallel Programs
Larsen, Per
performes signicantly faster - up to 12.5x - after modication directed by the compilation feedback system. The last aspect is renement of compilation feedback. Out of numerous issues reported, few are important to solve. Different compilers and compilation flags are used to estimate whether an issue can be......This thesis combines programmer knowledge and feedback to improve modeling and optimization of software. The research is motivated by two observations. First, there is a great need for automatic analysis of software for embedded systems - to expose and model parallelism inherent in programs. Second...... communication in embedded programs. Runtime checks are developed to ensure that annotations correctly describe observable program behavior. The performance impact of runtime checking is evaluated on several benchmark kernels and is negligible in all cases. The second aspect is compilation feedback. Annotations...
Automatic Performance Debugging of SPMD-style Parallel Programs
Liu, Xu; Zhan, Kunlin; Shi, Weisong; Yuan, Lin; Meng, Dan; Wang, Lei
2011-01-01
The simple program and multiple data (SPMD) programming model is widely used for both high performance computing and Cloud computing. In this paper, we design and implement an innovative system, AutoAnalyzer, that automates the process of debugging performance problems of SPMD-style parallel programs, including data collection, performance behavior analysis, locating bottlenecks, and uncovering their root causes. AutoAnalyzer is unique in terms of two features: first, without any apriori knowledge, it automatically locates bottlenecks and uncovers their root causes for performance optimization; second, it is lightweight in terms of the size of performance data to be collected and analyzed. Our contributions are three-fold: first, we propose two effective clustering algorithms to investigate the existence of performance bottlenecks that cause process behavior dissimilarity or code region behavior disparity, respectively; meanwhile, we present two searching algorithms to locate bottlenecks; second, on a basis o...
Lucas, Laurent; Loscos, Céline
2013-01-01
While 3D vision has existed for many years, the use of 3D cameras and video-based modeling by the film industry has induced an explosion of interest for 3D acquisition technology, 3D content and 3D displays. As such, 3D video has become one of the new technology trends of this century.The chapters in this book cover a large spectrum of areas connected to 3D video, which are presented both theoretically and technologically, while taking into account both physiological and perceptual aspects. Stepping away from traditional 3D vision, the authors, all currently involved in these areas, provide th
Beane, Andy
2012-01-01
The essential fundamentals of 3D animation for aspiring 3D artists 3D is everywhere--video games, movie and television special effects, mobile devices, etc. Many aspiring artists and animators have grown up with 3D and computers, and naturally gravitate to this field as their area of interest. Bringing a blend of studio and classroom experience to offer you thorough coverage of the 3D animation industry, this must-have book shows you what it takes to create compelling and realistic 3D imagery. Serves as the first step to understanding the language of 3D and computer graphics (CG)Covers 3D anim
NavP: Structured and Multithreaded Distributed Parallel Programming
Pan, Lei
2007-01-01
We present Navigational Programming (NavP) -- a distributed parallel programming methodology based on the principles of migrating computations and multithreading. The four major steps of NavP are: (1) Distribute the data using the data communication pattern in a given algorithm; (2) Insert navigational commands for the computation to migrate and follow large-sized distributed data; (3) Cut the sequential migrating thread and construct a mobile pipeline; and (4) Loop back for refinement. NavP is significantly different from the current prevailing Message Passing (MP) approach. The advantages of NavP include: (1) NavP is structured distributed programming and it does not change the code structure of an original algorithm. This is in sharp contrast to MP as MP implementations in general do not resemble the original sequential code; (2) NavP implementations are always competitive with the best MPI implementations in terms of performance. Approaches such as DSM or HPF have failed to deliver satisfying performance as of today in contrast, even if they are relatively easy to use compared to MP; (3) NavP provides incremental parallelization, which is beyond the reach of MP; and (4) NavP is a unifying approach that allows us to exploit both fine- (multithreading on shared memory) and coarse- (pipelined tasks on distributed memory) grained parallelism. This is in contrast to the currently popular hybrid use of MP+OpenMP, which is known to be complex to use. We present experimental results that demonstrate the effectiveness of NavP.
The driver linac of the proposed Rare Isotope Accelerator (RIA) requires a great variety of high intensity, high charge state ion beams. In order to design and to optimize the low energy beamline optics of the RIA front end,we have developed a new parallel three-dimensional model to simulate the low energy, multi-species ion beam formation and transport from the ECR ion source extraction region to the focal plane of the analyzing magnet. A multisection overlapped computational domain has been used to break the original transport system into a number of each subsystem, macro-particle tracking is used to obtain the charge density distribution in this subdomain. The three-dimensional Poisson equation is solved within the subdomain and particle tracking is repeated until the solution converges. Two new Poisson solvers based on a combination of the spectral method and the multigrid method have been developed to solve the Poisson equation in cylindrical coordinates for the beam extraction region and in the Frenet-Serret coordinates for the bending magnet region. Some test examples and initial applications will also be presented
A scalable parallel algorithm for multiple objective linear programs
Wiecek, Malgorzata M.; Zhang, Hong
1994-01-01
This paper presents an ADBASE-based parallel algorithm for solving multiple objective linear programs (MOLP's). Job balance, speedup and scalability are of primary interest in evaluating efficiency of the new algorithm. Implementation results on Intel iPSC/2 and Paragon multiprocessors show that the algorithm significantly speeds up the process of solving MOLP's, which is understood as generating all or some efficient extreme points and unbounded efficient edges. The algorithm gives specially good results for large and very large problems. Motivation and justification for solving such large MOLP's are also included.
Parallelization and checkpointing of GPU applications through program transformation
Solano-Quinde, Lizandro Damian [Iowa State Univ., Ames, IA (United States)
2012-01-01
GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that makes writing general-purpose applications for running on GPUs tractable have consolidated GPUs as an alternative for accelerating general purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) Industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solve the GPU memory limitation for applications with large application memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime, however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems present new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and
A small scale experimental facility was designed to study the thermal hydraulic phenomena in the Reactor Cavity Cooling System (RCCS). The facility was scaled down from the full scale RCCS system by applying scaling laws. A set of RELAP5-3D simulations were performed to confirm the scaling calculations, and to refine and optimize the facility's configuration, instrumentation selection, and layout. Computational Fluid Dynamics (CFD) calculations using StarCCM+ were performed in order to study the flow patterns and two-phase water behavior in selected locations of the facility where expected complex flow structure occurs. (author)
PLOT3D/AMES, APOLLO UNIX VERSION USING GMR3D (WITHOUT TURB3D)
Buning, P.
1994-01-01
PLOT3D is an interactive graphics program designed to help scientists visualize computational fluid dynamics (CFD) grids and solutions. Today, supercomputers and CFD algorithms can provide scientists with simulations of such highly complex phenomena that obtaining an understanding of the simulations has become a major problem. Tools which help the scientist visualize the simulations can be of tremendous aid. PLOT3D/AMES offers more functions and features, and has been adapted for more types of computers than any other CFD graphics program. Version 3.6b+ is supported for five computers and graphic libraries. Using PLOT3D, CFD physicists can view their computational models from any angle, observing the physics of problems and the quality of solutions. As an aid in designing aircraft, for example, PLOT3D's interactive computer graphics can show vortices, temperature, reverse flow, pressure, and dozens of other characteristics of air flow during flight. As critical areas become obvious, they can easily be studied more closely using a finer grid. PLOT3D is part of a computational fluid dynamics software cycle. First, a program such as 3DGRAPE (ARC-12620) helps the scientist generate computational grids to model an object and its surrounding space. Once the grids have been designed and parameters such as the angle of attack, Mach number, and Reynolds number have been specified, a "flow-solver" program such as INS3D (ARC-11794 or COS-10019) solves the system of equations governing fluid flow, usually on a supercomputer. Grids sometimes have as many as two million points, and the "flow-solver" produces a solution file which contains density, x- y- and z-momentum, and stagnation energy for each grid point. With such a solution file and a grid file containing up to 50 grids as input, PLOT3D can calculate and graphically display any one of 74 functions, including shock waves, surface pressure, velocity vectors, and particle traces. PLOT3D's 74 functions are organized into
PLOT3D/AMES, APOLLO UNIX VERSION USING GMR3D (WITH TURB3D)
Buning, P.
1994-01-01
PLOT3D is an interactive graphics program designed to help scientists visualize computational fluid dynamics (CFD) grids and solutions. Today, supercomputers and CFD algorithms can provide scientists with simulations of such highly complex phenomena that obtaining an understanding of the simulations has become a major problem. Tools which help the scientist visualize the simulations can be of tremendous aid. PLOT3D/AMES offers more functions and features, and has been adapted for more types of computers than any other CFD graphics program. Version 3.6b+ is supported for five computers and graphic libraries. Using PLOT3D, CFD physicists can view their computational models from any angle, observing the physics of problems and the quality of solutions. As an aid in designing aircraft, for example, PLOT3D's interactive computer graphics can show vortices, temperature, reverse flow, pressure, and dozens of other characteristics of air flow during flight. As critical areas become obvious, they can easily be studied more closely using a finer grid. PLOT3D is part of a computational fluid dynamics software cycle. First, a program such as 3DGRAPE (ARC-12620) helps the scientist generate computational grids to model an object and its surrounding space. Once the grids have been designed and parameters such as the angle of attack, Mach number, and Reynolds number have been specified, a "flow-solver" program such as INS3D (ARC-11794 or COS-10019) solves the system of equations governing fluid flow, usually on a supercomputer. Grids sometimes have as many as two million points, and the "flow-solver" produces a solution file which contains density, x- y- and z-momentum, and stagnation energy for each grid point. With such a solution file and a grid file containing up to 50 grids as input, PLOT3D can calculate and graphically display any one of 74 functions, including shock waves, surface pressure, velocity vectors, and particle traces. PLOT3D's 74 functions are organized into
Session-Based Programming for Parallel Algorithms: Expressiveness and Performance
Andi Bejleri
2010-02-01
Full Text Available This paper investigates session programming and typing of benchmark examples to compare productivity, safety and performance with other communications programming languages. Parallel algorithms are used to examine the above aspects due to their extensive use of message passing for interaction, and their increasing prominence in algorithmic research with the rising availability of hardware resources such as multicore machines and clusters. We contribute new benchmark results for SJ, an extension of Java for type-safe, binary session programming, against MPJ Express, a Java messaging system based on the MPI standard. In conclusion, we observe that (1 despite rich libraries and functionality, MPI remains a low-level API, and can suffer from commonly perceived disadvantages of explicit message passing such as deadlocks and unexpected message types, and (2 the benefits of high-level session abstraction, which has significant impact on program structure to improve readability and reliability, and session type-safety can greatly facilitate the task of communications programming whilst retaining competitive performance.
Energy consumption model over parallel programs implemented on multicore architectures
Ricardo Isidro-Ramirez
2015-06-01
Full Text Available In High Performance Computing, energy consump-tion is becoming an important aspect to consider. Due to the high costs that represent energy production in all countries it holds an important role and it seek to find ways to save energy. It is reflected in some efforts to reduce the energy requirements of hardware components and applications. Some options have been appearing in order to scale down energy use and, con-sequently, scale up energy efficiency. One of these strategies is the multithread programming paradigm, whose purpose is to produce parallel programs able to use the full amount of computing resources available in a microprocessor. That energy saving strategy focuses on efficient use of multicore processors that are found in various computing devices, like mobile devices. Actually, as a growing trend, multicore processors are found as part of various specific purpose computers since 2003, from High Performance Computing servers to mobile devices. However, it is not clear how multiprogramming affects energy efficiency. This paper presents an analysis of different types of multicore-based architectures used in computing, and then a valid model is presented. Based on Amdahl’s Law, a model that considers different scenarios of energy use in multicore architectures it is proposed. Some interesting results were found from experiments with the developed algorithm, that it was execute of a parallel and sequential way. A lower limit of energy consumption was found in a type of multicore architecture and this behavior was observed experimentally.
A Programming Model Performance Study Using the NAS Parallel Benchmarks
Hongzhang Shan
2010-01-01
Full Text Available Harnessing the power of multicore platforms is challenging due to the additional levels of parallelism present. In this paper we use the NAS Parallel Benchmarks to study three programming models, MPI, OpenMP and PGAS to understand their performance and memory usage characteristics on current multicore architectures. To understand these characteristics we use the Integrated Performance Monitoring tool and other ways to measure communication versus computation time, as well as the fraction of the run time spent in OpenMP. The benchmarks are run on two different Cray XT5 systems and an Infiniband cluster. Our results show that in general the three programming models exhibit very similar performance characteristics. In a few cases, OpenMP is significantly faster because it explicitly avoids communication. For these particular cases, we were able to re-write the UPC versions and achieve equal performance to OpenMP. Using OpenMP was also the most advantageous in terms of memory usage. Also we compare performance differences between the two Cray systems, which have quad-core and hex-core processors. We show that at scale the performance is almost always slower on the hex-core system because of increased contention for network resources.
An object-oriented programming paradigm for parallelization of computational fluid dynamics
We propose an object-oriented programming paradigm for parallelization of scientific computing programs, and show that the approach can be a very useful strategy. Generally, parallelization of scientific programs tends to be complicated and unportable due to the specific requirements of each parallel computer or compiler. In this paper, we show that the object-oriented programming design, which separates the parallel processing parts from the solver of the applications, can achieve the large improvement in the maintenance of the codes, as well as the high portability. We design the program for the two-dimensional Euler equations according to the paradigm, and evaluate the parallel performance on IBM SP2. (author)
Center for Programming Models for Scalable Parallel Computing: Future Programming Models
Gao, Guang, R.
2008-07-24
The mission of the pmodel center project is to develop software technology to support scalable parallel programming models for terascale systems. The goal of the specific UD subproject is in the context developing an efficient and robust methodology and tools for HPC programming. More specifically, the focus is on developing new programming models which facilitate programmers in porting their application onto parallel high performance computing systems. During the course of the research in the past 5 years, the landscape of microprocessor chip architecture has witnessed a fundamental change – the emergence of multi-core/many-core chip architecture appear to become the mainstream technology and will have a major impact to for future generation parallel machines. The programming model for shared-address space machines is becoming critical to such multi-core architectures. Our research highlight is the in-depth study of proposed fine-grain parallelism/multithreading support on such future generation multi-core architectures. Our research has demonstrated the significant impact such fine-grain multithreading model can have on the productivity of parallel programming models and their efficient implementation.
A Parallel Vector Machine for the PM Programming Language
Bellerby, Tim
2016-04-01
PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. Inter-node communication is accomplished using
D. Pletinckx
2012-09-01
Full Text Available The current 3D hype creates a lot of interest in 3D. People go to 3D movies, but are we ready to use 3D in our homes, in our offices, in our communication? Are we ready to deliver real 3D to a general public and use interactive 3D in a meaningful way to enjoy, learn, communicate? The CARARE project is realising this for the moment in the domain of monuments and archaeology, so that real 3D of archaeological sites and European monuments will be available to the general public by 2012. There are several aspects to this endeavour. First of all is the technical aspect of flawlessly delivering 3D content over all platforms and operating systems, without installing software. We have currently a working solution in PDF, but HTML5 will probably be the future. Secondly, there is still little knowledge on how to create 3D learning objects, 3D tourist information or 3D scholarly communication. We are still in a prototype phase when it comes to integrate 3D objects in physical or virtual museums. Nevertheless, Europeana has a tremendous potential as a multi-facetted virtual museum. Finally, 3D has a large potential to act as a hub of information, linking to related 2D imagery, texts, video, sound. We describe how to create such rich, explorable 3D objects that can be used intuitively by the generic Europeana user and what metadata is needed to support the semantic linking.
In light water reactors, particularly the pressurized water reactors, the severity of loss-of-coolant accidents (LOCAs) will limit how high the reactor power can extend. Although the best-estimate LOCA methodology can provide the greatest margin on the peak cladding temperature (PCT) evaluation during LOCA, it will take many more resources to develop and to get final approval from the licensing authority. Instead, implementation of evaluation models required by Appendix K of the Code of Federal Regulations, Title 10, Part 50 (10 CFR 50), upon an advanced thermal-hydraulic platform can also gain significant margin on the PCT calculation. A program to modify RELAP5-3D in accordance with Appendix K of 10 CFR 50 was launched by the Institute of Nuclear Energy Research, Taiwan, and it consists of six sequential phases of work. The compliance of the current RELAP5-3D with Appendix K of 10 CFR 50 has been evaluated, and it was found that there are 11 areas where the code modifications are required to satisfy the requirements set forth in Appendix K of 10 CFR 50. To verify and assess the development of the Appendix K version of RELAP5-3D, nine kinds of separate-effect experiments and six sets of integral-effect experiments will be adopted. Through the assessments program, all the model changes will be verified
This book explains modeling of solid works 3D and application of 3D CAD/CAM. The contents of this book are outline of modeling such as CAD and 2D and 3D, solid works composition, method of sketch, writing measurement fixing, selecting projection, choosing condition of restriction, practice of sketch, making parts, reforming parts, modeling 3D, revising 3D modeling, using pattern function, modeling necessaries, assembling, floor plan, 3D modeling method, practice floor plans for industrial engineer data aided manufacturing, processing of CAD/CAM interface.
Traditional two-dimensional (2D)/one-dimensional (1D) SYNTHESIS methodology has been widely used to calculate fast neutron (>1.0 MeV) fluence exposure to reactor pressure vessel in the belt-line region. However, it is expected that this methodology cannot provide accurate fast neutron fluence calculation at elevations far above or below the active core region. A three-dimensional (3D) parallel discrete ordinates calculation for ex-vessel neutron dosimetry on a Westinghouse 4-Loop XL Pressurized Water Reactor has been done. It shows good agreement between the calculated results and measured results. Furthermore, the results show very different fast neutron flux values at some of the former plate locations and elevations above and below an active core than those calculated by a 2D/1D SYNTHESIS method. This indicates that for certain irregular reactor internal structures, where the fast neutron flux has a very strong local effect, it is required to use a 3D transport method to calculate accurate fast neutron exposure. (authors)
The Data-Parallel Programming Model: a Semantic Perspective (Final Version)
Bougé, Luc
1996-01-01
We provide a short introduction to the data-parallel programming model. We argue that parallel computing often makes little distinction between the execution model and the programming model. This results in poor programming and low portability. Using the «GOTO considered harmful» analogy, we show that data parallelism can be seen as a way out of this difficulty. We show that important aspects of the data-parallel model were already present in earlier approaches to parallel programming, and de...
Open-MP与并行程序设计%Open-MP and Parallel Programming
陈崚; 陈宏建; 秦玲
2003-01-01
The application programming interface Open-MP for the shared memory parallel computer system and its characteristics are illustrated. We also compare Open-MP with parallel programming tool MPI.To overcome the disadvantage of large overhead in Open-MP program,several optimization methods in Open-MP programming are presented to increase the efficiency of its execution.
Performance Measurement, Visualization and Modeling of Parallel and Distributed Programs
Yan, Jerry C.; Sarukkai, Sekhar R.; Mehra, Pankaj; Lum, Henry, Jr. (Technical Monitor)
1994-01-01
This paper presents a methodology for debugging the performance of message-passing programs on both tightly coupled and loosely coupled distributed-memory machines. The AIMS (Automated Instrumentation and Monitoring System) toolkit, a suite of software tools for measurement and analysis of performance, is introduced and its application illustrated using several benchmark programs drawn from the field of computational fluid dynamics. AIMS includes (i) Xinstrument, a powerful source-code instrumentor, which supports both Fortran77 and C as well as a number of different message-passing libraries including Intel's NX Thinking Machines' CMMD, and PVM; (ii) Monitor, a library of timestamping and trace -collection routines that run on supercomputers (such as Intel's iPSC/860, Delta, and Paragon and Thinking Machines' CM5) as well as on networks of workstations (including Convex Cluster and SparcStations connected by a LAN); (iii) Visualization Kernel, a trace-animation facility that supports source-code clickback, simultaneous visualization of computation and communication patterns, as well as analysis of data movements; (iv) Statistics Kernel, an advanced profiling facility, that associates a variety of performance data with various syntactic components of a parallel program; (v) Index Kernel, a diagnostic tool that helps pinpoint performance bottlenecks through the use of abstract indices; (vi) Modeling Kernel, a facility for automated modeling of message-passing programs that supports both simulation -based and analytical approaches to performance prediction and scalability analysis; (vii) Intrusion Compensator, a utility for recovering true performance from observed performance by removing the overheads of monitoring and their effects on the communication pattern of the program; and (viii) Compatibility Tools, that convert AIMS-generated traces into formats used by other performance-visualization tools, such as ParaGraph, Pablo, and certain AVS/Explorer modules.
Amin Chamani
2013-06-01
Full Text Available The development of yielded or failure zone due to an engineering construction is a subject of study in different disciplines. In Petroleum engineering, depletion from and injection of gas into a porous rock can cause development of a yield zone around the reservoir. Studying this phenomenon requires elasto-plastic analysis of geomaterial, in this case the porous rocks. In this study, which is a continuation of a previous study investigating the elastic behaviour of geomaterial, the elasto-plastic responses of geomaterial were studied. A 3D finite element code (FEM was developed, which can consider different constitutive models. The code features were explained and some case studies were presented to validate the output results of the code. The numerical model was, then, applied to study the development of the plastic zone around a horizontal porous formation subjected to the injection of gas. The model is described in detail and the results are presented. It was observed that by reducing the cohesion of rocks the extension of the plastic zone increased. Comparing to the elastic model, the ability to estimate the extension of the yield and failure zone is the main advantage of an elasto-plastic model.
Bailey, R. T.; Shih, T. I.-P.; Nguyen, H. L.; Roelke, R. J.
1990-01-01
An efficient computer program, called GRID2D/3D, was developed to generate single and composite grid systems within geometrically complex two- and three-dimensional (2- and 3-D) spatial domains that can deform with time. GRID2D/3D generates single grid systems by using algebraic grid generation methods based on transfinite interpolation in which the distribution of grid points within the spatial domain is controlled by stretching functions. All single grid systems generated by GRID2D/3D can have grid lines that are continuous and differentiable everywhere up to the second-order. Also, grid lines can intersect boundaries of the spatial domain orthogonally. GRID2D/3D generates composite grid systems by patching together two or more single grid systems. The patching can be discontinuous or continuous. For continuous composite grid systems, the grid lines are continuous and differentiable everywhere up to the second-order except at interfaces where different single grid systems meet. At interfaces where different single grid systems meet, the grid lines are only differentiable up to the first-order. For 2-D spatial domains, the boundary curves are described by using either cubic or tension spline interpolation. For 3-D spatial domains, the boundary surfaces are described by using either linear Coon's interpolation, bi-hyperbolic spline interpolation, or a new technique referred to as 3-D bi-directional Hermite interpolation. Since grid systems generated by algebraic methods can have grid lines that overlap one another, GRID2D/3D contains a graphics package for evaluating the grid systems generated. With the graphics package, the user can generate grid systems in an interactive manner with the grid generation part of GRID2D/3D. GRID2D/3D is written in FORTRAN 77 and can be run on any IBM PC, XT, or AT compatible computer. In order to use GRID2D/3D on workstations or mainframe computers, some minor modifications must be made in the graphics part of the program; no
Felician ALECU
2010-01-01
Full Text Available Many professionals and 3D artists consider Blender as being the best open source solution for 3D computer graphics. The main features are related to modeling, rendering, shading, imaging, compositing, animation, physics and particles and realtime 3D/game creation.
The program for determination of the scintillator decay time with the use of an autocorrelated time spectrometer is submitted. The result of measurement of the LaBr3:Ce decay time is given: τ = 22.5(2) ns.
3d-3d correspondence revisited
Chung, Hee-Joong; Dimofte, Tudor; Gukov, Sergei; Sułkowski, Piotr
2016-04-01
In fivebrane compactifications on 3-manifolds, we point out the importance of all flat connections in the proper definition of the effective 3d {N}=2 theory. The Lagrangians of some theories with the desired properties can be constructed with the help of homological knot invariants that categorify colored Jones polynomials. Higgsing the full 3d theories constructed this way recovers theories found previously by Dimofte-Gaiotto-Gukov. We also consider the cutting and gluing of 3-manifolds along smooth boundaries and the role played by all flat connections in this operation.
Brdnik, Lovro
2015-01-01
Diplomsko delo analizira trenutno stanje 3D tiskalnikov na trgu. Prikazan je razvoj in principi delovanja 3D tiskalnikov. Predstavljeni so tipi 3D tiskalnikov, njihove prednosti in slabosti. Podrobneje je predstavljena zgradba in delovanje koračnih motorjev. Opravljene so meritve koračnih motorjev. Opisana je programska oprema za rokovanje s 3D tiskalniki in komponente, ki jih potrebujemo za izdelavo. Diploma se oklepa vprašanja, ali je izdelava 3D tiskalnika bolj ekonomična kot pa naložba v ...
Tinetti, Fernando Gustavo
2000-01-01
This book makes a clear presentation of the traditional topics included in a course of undergraduate parallel programming. As explained by the authors, it was developed from their own experience in classrooms, introducing their students to parallel programming. It can be used almost directly to teach basic parallel programming.
Full text: A computational program was developed for reactor fuel management in three dimensional Cartesian coordinates using two-group neutron diffusion theory (fast neutron and thermal neutron energy group). Three fuel loading patterns were considered as follow: 1. uniform loading, 2. out-in loading and 3. in-scatter loading. Criticality, peak power distribution and loaded fuel depletion measured in megawatt-day per kilogram (MW d/kg) of uranium were also calculated by the developed program. The results showed that the in-scatter loading pattern gave the best power peaking for fuel management
并行程序设计语言发展现状%Current Development of Parallel Programming Language
韩卫; 郝红宇; 代丽
2003-01-01
In this paper we introduce the history of the parallel programming language and list some of currently parallel programming languages. Then according to the classified principle. We analyze some of the representative parallel programming languages in detail. Finally, we show a further feature to the parallel programming language.
Lipatov, A. S.; Farrell, W. M.; Cooper, J. F.; Sittler, E. C., Jr.; Hartle, R. E.
2015-01-01
The interactions between the solar wind and Moon-sized objects are determined by a set of the solar wind parameters and plasma environment of the space objects. The orientation of upstream magnetic field is one of the key factors which determines the formation and structure of bow shock wave/Mach cone or Alfven wing near the obstacle. The study of effects of the direction of the upstream magnetic field on lunar-like plasma environment is the main subject of our investigation in this paper. Photoionization, electron-impact ionization and charge exchange are included in our hybrid model. The computational model includes the self-consistent dynamics of the light (hydrogen (+), helium (+)) and heavy (sodium (+)) pickup ions. The lunar interior is considered as a weakly conducting body. Our previous 2013 lunar work, as reported in this journal, found formation of a triple structure of the Mach cone near the Moon in the case of perpendicular upstream magnetic field. Further advances in modeling now reveal the presence of strong wave activity in the upstream solar wind and plasma wake in the cases of quasiparallel and parallel upstream magnetic fields. However, little wave activity is found for the opposite case with a perpendicular upstream magnetic field. The modeling does not show a formation of the Mach cone in the case of theta(Sub B,U) approximately equal to 0 degrees.
Reng, Lars
2012-01-01
between technical and artistic minded students is, however, increased once the students reach the sixth semester. The complex algorithms of the artificial intelligence course seemed to demotivate the artistic minded students even before the course began. This paper will present the extensive changes made...... to the sixth semester artificial intelligence programming course, in order to provide a highly motivating direct visual feedback, and thereby remove the steep initial learning curve for artistic minded students. The framework was developed with close dialog to both the game industry and experienced master...
Processor Allocation for Optimistic Parallelization of Irregular Programs
Versaci, Francesco
2012-01-01
Optimistic parallelization is a promising approach for the parallelization of irregular algorithms: potentially interfering tasks are launched dynamically, and the runtime system detects conflicts between concurrent activities, aborting and rolling back conflicting tasks. However, parallelism in irregular algorithms is very complex. In a regular algorithm like dense matrix multiplication, the amount of parallelism can usually be expressed as a function of the problem size, so it is reasonably straightforward to determine how many processors should be allocated to execute a regular algorithm of a certain size (this is called the processor allocation problem). In contrast, parallelism in irregular algorithms can be a function of input parameters, and the amount of parallelism can vary dramatically during the execution of the irregular algorithm. Therefore, the processor allocation problem for irregular algorithms is very difficult. In this paper, we describe the first systematic strategy for addressing this pro...
JAC3D is a three-dimensional finite element program designed to solve quasi-static nonlinear mechanics problems. A set of continuum equations describes the nonlinear mechanics involving large rotation and strain. A nonlinear conjugate gradient method is used to solve the equation. The method is implemented in a three-dimensional setting with various methods for accelerating convergence. Sliding interface logic is also implemented. An eight-node Lagrangian uniform strain element is used with hourglass stiffness to control the zero-energy modes. This report documents the elastic and isothermal elastic-plastic material model. Other material models, documented elsewhere, are also available. The program is vectorized for efficient performance on Cray computers. Sample problems described are the bending of a thin beam, the rotation of a unit cube, and the pressurization and thermal loading of a hollow sphere
FastScript3D - A Companion to Java 3D
Koenig, Patti
2005-01-01
FastScript3D is a computer program, written in the Java 3D(TM) programming language, that establishes an alternative language that helps users who lack expertise in Java 3D to use Java 3D for constructing three-dimensional (3D)-appearing graphics. The FastScript3D language provides a set of simple, intuitive, one-line text-string commands for creating, controlling, and animating 3D models. The first word in a string is the name of a command; the rest of the string contains the data arguments for the command. The commands can also be used as an aid to learning Java 3D. Developers can extend the language by adding custom text-string commands. The commands can define new 3D objects or load representations of 3D objects from files in formats compatible with such other software systems as X3D. The text strings can be easily integrated into other languages. FastScript3D facilitates communication between scripting languages [which enable programming of hyper-text markup language (HTML) documents to interact with users] and Java 3D. The FastScript3D language can be extended and customized on both the scripting side and the Java 3D side.
Tournay, Bruno; Rüdiger, Bjarne
2006-01-01
3d digital model af Arkitektskolens gård med virtuel udstilling af afgangsprojekter fra afgangen sommer 2006. 10 s.......3d digital model af Arkitektskolens gård med virtuel udstilling af afgangsprojekter fra afgangen sommer 2006. 10 s....
Roberto Rinaldi
2014-12-01
Full Text Available After an experimental phase of many years, 3D filming is now effective and successful. Improvements are still possible, but the film industry achieved memorable success on 3D movie’s box offices due to the overall quality of its products. Special environments such as space (“Gravity” and the underwater realm look perfect to be reproduced in 3D. “Filming in space” was possible in “Gravity” using special effects and computer graphic. The underwater realm is still difficult to be handled. Underwater filming in 3D was not that easy and effective as filming in 2D, since not long ago. After almost 3 years of research, a French, Austrian and Italian team realized a perfect tool to film underwater, in 3D, without any constrains. This allows filmmakers to bring the audience deep inside an environment where they most probably will never have the chance to be.
Guide to development of a scalar massive parallel programming on Paragon
Parallel calculations using more than hundred computers had begun in Japan only several years ago. The Intel Paragon XP/S 15GP256 , 75MP834 were introduced as pioneers in Japan Atomic Energy Research Institute (JAERI) to pursue massive parallel simulations for advanced photon and fusion researches. Recently, large number of parallel programs have been transplanted or newly produced to perform the parallel calculations with those computers. However, these programs are developed based on software technologies for conventional super computer, therefore they sometimes cause troubles in the massive parallel computing. In principle, when programs are developed under different computer and operating system (OS), prudent directions and knowledge are needed. However, integration of knowledge and standardization of environment are quite difficult because number of Paragon system and Paragon's users are very small in Japan. Therefore, we summarized information which was got through the process of development of a massive parallel program in the Paragon XP/S 75MP834. (author)
PLOT3D- DRAWING THREE DIMENSIONAL SURFACES
Canright, R. B.
1994-01-01
PLOT3D is a package of programs to draw three-dimensional surfaces of the form z = f(x,y). The function f and the boundary values for x and y are the input to PLOT3D. The surface thus defined may be drawn after arbitrary rotations. However, it is designed to draw only functions in rectangular coordinates expressed explicitly in the above form. It cannot, for example, draw a sphere. Output is by off-line incremental plotter or online microfilm recorder. This package, unlike other packages, will plot any function of the form z = f(x,y) and portrays continuous and bounded functions of two independent variables. With curve fitting; however, it can draw experimental data and pictures which cannot be expressed in the above form. The method used is division into a uniform rectangular grid of the given x and y ranges. The values of the supplied function at the grid points (x, y) are calculated and stored; this defines the surface. The surface is portrayed by connecting successive (y,z) points with straight-line segments for each x value on the grid and, in turn, connecting successive (x,z) points for each fixed y value on the grid. These lines are then projected by parallel projection onto the fixed yz-plane for plotting. This program has been implemented on the IBM 360/67 with on-line CDC microfilm recorder.
Detection Of Control Flow Errors In Parallel Programs At Compile Time
Bruce P. Lester
2010-12-01
Full Text Available This paper describes a general technique to identify control flow errors in parallel programs, which can be automated into a compiler. The compiler builds a system of linear equations that describes the global control flow of the whole program. Solving these equations using standard techniques of linear algebra can locate a wide range of control flow bugs at compile time. This paper also describes an implementation of this control flow analysis technique in a prototype compiler for a well-known parallel programming language. In contrast to previous research in automated parallel program analysis, our technique is efficient for large programs, and does not limit the range of language features.
OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems
Stone, John E.; Gohara, David; Shi, Guochun
2010-01-01
We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures.
James, Tamara; Hsieh, Meng-Lun; Knipling, Leslie; Hinton, Deborah
2015-01-01
Determining the structure of a protein-DNA complex can be difficult, particularly if the protein does not bind tightly to the DNA, if there are no homologous proteins from which the DNA binding can be inferred, and/or if only portions of the protein can be crystallized. If the protein comprises just a part of a large multi-subunit complex, other complications can arise such as the complex being too large for NMR studies, or it is not possible to obtain the amounts of protein and nucleic acids needed for crystallographic analyses. Here, we describe a technique we used to map the position of an activator protein relative to the DNA within a large transcription complex. We determined the position of the activator on the DNA from data generated using activator proteins that had been conjugated at specific residues with the chemical cleaving reagent, iron bromoacetamidobenzyl-EDTA (FeBABE). These analyses were combined with 3-D models of the available structures of portions of the activator protein and B-form DNA to obtain a 3-D picture of the protein relative to the DNA. Finally, the Molsoft program was used to refine the position, revealing the architecture of the protein-DNA within the transcription complex. PMID:26404142
A Verified Integration of Imperative Parallel Programming Paradigms in an Object-Oriented Language
Sivilotti, Paul
1993-01-01
CC++ is a parallel object-oriented programming language that uses parallel composition, atomic functions, and single- assignment variables to express concurrency. We show that this programming paradigm is equivalent to several traditional imperative communication and synchronization models, namely: semaphores, monitors, and asynchronous channels. A collection of libraries which integrates these traditional models with CC++ is specified, implemented, and formally verified.
About the Capability of Some Parallel Program Metric Prediction Using Neural Network Approach
Vera, Yu; Nina, N.
2007-01-01
Parallel program execution on multiprocessor system is affected by many factors. Often it is rather difficult to estimate how program would behave when running on some definite number of processors. Most tools designed for performance and other parallel program characteristics evaluation affect source code of a program. In this paper we consider neural network approach to predict job run time (when using a queue-based submission system). All analysis that is needed for the prediction can be p...
Towards a Parallel Virtual Machine for Functional Logic Programming
Alqaddoumi, Abdulla
2010-01-01
Functional logic programming is a multi-paradigm programming that combines the best features of functional programming and logic programming. Functional programming provides mechanisms for demand-driven evaluation, higher order functions and polymorphic typing. Logic programming deals with non-determinism, partial information and constraints. Both programming paradigms fall under the umbrella of declarative programming. For the most part, the current implementations of functional logic langua...
Cache-aware Parallel Programming for Manycore Processors
Tousimojarad, Ashkan; Vanderbauwhede, Wim
2014-01-01
With rapidly evolving technology, multicore and manycore processors have emerged as promising architectures to benefit from increasing transistor numbers. The transition towards these parallel architectures makes today an exciting time to investigate challenges in parallel computing. The TILEPro64 is a manycore accelerator, composed of 64 tiles interconnected via multiple 8x8 mesh networks. It contains per-tile caches and supports cache-coherent shared memory by default. In this paper we pres...
Valenza, Enrico
2015-01-01
This book is aimed at the professionals that already have good 3D CGI experience with commercial packages and have now decided to try the open source Blender and want to experiment with something more complex than the average tutorials on the web. However, it's also aimed at the intermediate Blender users who simply want to go some steps further.It's taken for granted that you already know how to move inside the Blender interface, that you already have 3D modeling knowledge, and also that of basic 3D modeling and rendering concepts, for example, edge-loops, n-gons, or samples. In any case, it'
A parallel dynamic programming algorithm for multi-reservoir system optimization
Li, Xiang; Wei, Jiahua; Li, Tiejian; Wang, Guangqian; Yeh, William W.-G.
2014-05-01
This paper develops a parallel dynamic programming algorithm to optimize the joint operation of a multi-reservoir system. First, a multi-dimensional dynamic programming (DP) model is formulated for a multi-reservoir system. Second, the DP algorithm is parallelized using a peer-to-peer parallel paradigm. The parallelization is based on the distributed memory architecture and the message passing interface (MPI) protocol. We consider both the distributed computing and distributed computer memory in the parallelization. The parallel paradigm aims at reducing the computation time as well as alleviating the computer memory requirement associated with running a multi-dimensional DP model. Next, we test the parallel DP algorithm on the classic, benchmark four-reservoir problem on a high-performance computing (HPC) system with up to 350 cores. Results indicate that the parallel DP algorithm exhibits good performance in parallel efficiency; the parallel DP algorithm is scalable and will not be restricted by the number of cores. Finally, the parallel DP algorithm is applied to a real-world, five-reservoir system in China. The results demonstrate the parallel efficiency and practical utility of the proposed methodology.
Quealy, Angela; Cole, Gary L.; Blech, Richard A.
1993-01-01
The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.
Probe Trajectory Interpolation for 3D Reconstruction of Freehand Ultrasound
Coupé, Pierrick; Hellier, Pierre; Morandi, Xavier; Barillot, Christian
2007-01-01
Three-dimensional (3D) Freehand ultrasound uses the acquisition of non parallel B-scans localized in 3D by a tracking system (optic, mechanical or magnetic). Using the positions of the irregularly spaced B-scans, a regular 3D lattice volume can be reconstructed, to which conventional 3D computer vision algorithms (registration and segmentation) can be applied. This paper presents a new 3D reconstruction method which explicitly accounts for the probe trajectory. Experiments were conducted on p...
Allan R. Larrabee
1993-01-01
The first digital computers consisted of a single processor acting on a single stream of data. In this so-called "von Neumann" architecture, computation speed is limited mainly by the time required to transfer data between the processor and memory. This limiting factor has been referred to as the "von Neumann bottleneck". The concern that the miniaturization of silicon-based integrated circuits will soon reach theoretical limits of size and gate times has led to increased interest in parallel...
3D Reconstruction of NMR Images
Peter Izak; Milan Smetana; Libor Hargas; Miroslav Hrianka; Pavol Spanik
2007-01-01
This paper introduces experiment of 3D reconstruction NMR images scanned from magnetic resonance device. There are described methods which can be used for 3D reconstruction magnetic resonance images in biomedical application. The main idea is based on marching cubes algorithm. For this task was chosen sophistication method by program Vision Assistant, which is a part of program LabVIEW.
Hundebøl, Jesper
wave of new building information modelling tools demands further investigation, not least because of industry representatives' somewhat coarse parlance: Now the word is spreading -3D digital modelling is nothing less than a revolution, a shift of paradigm, a new alphabet... Research qeustions. Based...... on empirical probes (interviews, observations, written inscriptions) within the Danish construction industry this paper explores the organizational and managerial dynamics of 3D Digital Modelling. The paper intends to - Illustrate how the network of (non-)human actors engaged in the promotion (and arrest) of 3......D Modelling (in Denmark) stabilizes - Examine how 3D Modelling manifests itself in the early design phases of a construction project with a view to discuss the effects hereof for i.a. the management of the building process. Structure. The paper introduces a few, basic methodological concepts...
Lively, Michael
2010-01-01
Professional Papervision3D describes how Papervision3D works and how real world applications are built, with a clear look at essential topics such as building websites and games, creating virtual tours, and Adobe's Flash 10. Readers learn important techniques through hands-on applications, and build on those skills as the book progresses. The companion website contains all code examples, video step-by-step explanations, and a collada repository.
HPC parallel programming model for gyrokinetic MHD simulation
The 3-dimensional gyrokinetic PIC (particle-in-cell) code for MHD simulation, Gpic-MHD, was installed on SR16000 (“Plasma Simulator”), which is a scalar cluster system consisting of 8,192 logical cores. The Gpic-MHD code advances particle and field quantities in time. In order to distribute calculations over large number of logical cores, the total simulation domain in cylindrical geometry was broken up into NDD-r × NDD-z (number of radial decomposition times number of axial decomposition) small domains including approximately the same number of particles. The axial direction was uniformly decomposed, while the radial direction was non-uniformly decomposed. NRP replicas (copies) of each decomposed domain were used (“particle decomposition”). The hybrid parallelization model of multi-threads and multi-processes was employed: threads were parallelized by the auto-parallelization and NDD-r × NDD-z × NRP processes were parallelized by MPI (message-passing interface). The parallelization performance of Gpic-MHD was investigated for the medium size system of Nr × Nθ × Nz = 1025 × 128 × 128 mesh with 4.196 or 8.192 billion particles. The highest speed for the fixed number of logical cores was obtained for two threads, the maximum number of NDD-z, and optimum combination of NDD-r and NRP. The observed optimum speeds demonstrated good scaling up to 8,192 logical cores. (author)
Declarative Parallel Programming in Spreadsheet End-User Development:A Literature Review
Biermann, Florian
2016-01-01
Spreadsheets are first-order functional languages and are widely used in research and industry as a tool to conveniently perform all kinds of computations. Because cells on a spreadsheet are immutable, there are possibilities for implicit parallelization of spreadsheet computations. In this literature study, we provide an overview of the publications on spreadsheet end-user programming and declarative array programming to inform further research on parallel programming in spreadsheets. Our re...
[Real time 3D echocardiography
Bauer, F.; Shiota, T.; Thomas, J. D.
2001-01-01
Three-dimensional representation of the heart is an old concern. Usually, 3D reconstruction of the cardiac mass is made by successive acquisition of 2D sections, the spatial localisation and orientation of which require complex guiding systems. More recently, the concept of volumetric acquisition has been introduced. A matricial emitter-receiver probe complex with parallel data processing provides instantaneous of a pyramidal 64 degrees x 64 degrees volume. The image is restituted in real time and is composed of 3 planes (planes B and C) which can be displaced in all spatial directions at any time during acquisition. The flexibility of this system of acquisition allows volume and mass measurement with greater accuracy and reproducibility, limiting inter-observer variability. Free navigation of the planes of investigation allows reconstruction for qualitative and quantitative analysis of valvular heart disease and other pathologies. Although real time 3D echocardiography is ready for clinical usage, some improvements are still necessary to improve its conviviality. Then real time 3D echocardiography could be the essential tool for understanding, diagnosis and management of patients.
PRAM C:a new programming environment for fine-grain and coarse-grain parallelism.
Brown, Jonathan Leighton; Wen, Zhaofang.
2004-11-01
In the search for ''good'' parallel programming environments for Sandia's current and future parallel architectures, they revisit a long-standing open question. Can the PRAM parallel algorithms designed by theoretical computer scientists over the last two decades be implemented efficiently? This open question has co-existed with ongoing efforts in the HPC community to develop practical parallel programming models that can simultaneously provide ease of use, expressiveness, performance, and scalability. Unfortunately, no single model has met all these competing requirements. Here they propose a parallel programming environment, PRAM C, to bridge the gap between theory and practice. This is an attempt to provide an affirmative answer to the PRAM question, and to satisfy these competing practical requirements. This environment consists of a new thin runtime layer and an ANSI C extension. The C extension has two control constructs and one additional data type concept, ''shared''. This C extension should enable easy translation from PRAM algorithms to real parallel programs, much like the translation from sequential algorithms to C programs. The thin runtime layer bundles fine-grained communication requests into coarse-grained communication to be served by message-passing. Although the PRAM represents SIMD-style fine-grained parallelism, a stand-alone PRAM C environment can support both fine-grained and coarse-grained parallel programming in either a MIMD or SPMD style, interoperate with existing MPI libraries, and use existing hardware. The PRAM C model can also be integrated easily with existing models. Unlike related efforts proposing innovative hardware with the goal to realize the PRAM, ours can be a pure software solution with the purpose to provide a practical programming environment for existing parallel machines; it also has the potential to perform well on future parallel architectures.
Stanitz, J. D.
1985-01-01
The general design method for three-dimensional, potential, incompressible or subsonic-compressible flow developed in part 1 of this report is applied to the design of simple, unbranched ducts. A computer program, DIN3D1, is developed and five numerical examples are presented: a nozzle, two elbows, an S-duct, and the preliminary design of a side inlet for turbomachines. The two major inputs to the program are the upstream boundary shape and the lateral velocity distribution on the duct wall. As a result of these inputs, boundary conditions are overprescribed and the problem is ill posed. However, it appears that there are degrees of compatibility between these two major inputs and that, for reasonably compatible inputs, satisfactory solutions can be obtained. By not prescribing the shape of the upstream boundary, the problem presumably becomes well posed, but it is not clear how to formulate a practical design method under this circumstance. Nor does it appear desirable, because the designer usually needs to retain control over the upstream (or downstream) boundary shape. The problem is further complicated by the fact that, unlike the two-dimensional case, and irrespective of the upstream boundary shape, some prescribed lateral velocity distributions do not have proper solutions.
Communications oriented programming of parallel iterative solutions of sparse linear systems
Patrick, M. L.; Pratt, T. W.
1986-01-01
Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.
3D Spectroscopic Instrumentation
Bershady, Matthew A
2009-01-01
In this Chapter we review the challenges of, and opportunities for, 3D spectroscopy, and how these have lead to new and different approaches to sampling astronomical information. We describe and categorize existing instruments on 4m and 10m telescopes. Our primary focus is on grating-dispersed spectrographs. We discuss how to optimize dispersive elements, such as VPH gratings, to achieve adequate spectral resolution, high throughput, and efficient data packing to maximize spatial sampling for 3D spectroscopy. We review and compare the various coupling methods that make these spectrographs ``3D,'' including fibers, lenslets, slicers, and filtered multi-slits. We also describe Fabry-Perot and spatial-heterodyne interferometers, pointing out their advantages as field-widened systems relative to conventional, grating-dispersed spectrographs. We explore the parameter space all these instruments sample, highlighting regimes open for exploitation. Present instruments provide a foil for future development. We give an...
Halskov, Kim; Johansen, Stine Liv; Bach Mikkelsen, Michelle
2014-01-01
Three-dimensional projection installations are particular kinds of augmented spaces in which a digital 3-D model is projected onto a physical three-dimensional object, thereby fusing the digital content and the physical object. Based on interaction design research and media studies, this article...... contributes to the understanding of the distinctive characteristics of such a new medium, and identifies three strategies for designing 3-D projection installations: establishing space; interplay between the digital and the physical; and transformation of materiality. The principal empirical case, From...... Fingerplan to Loop City, is a 3-D projection installation presenting the history and future of city planning for the Copenhagen area in Denmark. The installation was presented as part of the 12th Architecture Biennale in Venice in 2010....
Muhammad Aqib
2016-02-01
Full Text Available Big and complex applications need many resources and long computation time to execute sequentially. In this scenario, all application's processes are handled in sequential fashion even if they are independent of each other. In high- performance computing environment, multiple processors are available to running applications in parallel. So mutually independent blocks of codes could run in parallel. This approach not only increases the efficiency of the system without affecting the results but also saves a significant amount of energy. Many parallel programming models or APIs like Open MPI, Open MP, CUDA, etc. are available to running multiple instructions in parallel. In this paper, the efficiency and energy consumption of two known tasks i.e. matrix multiplication and quicksort are analyzed using different parallel programming models and a multiprocessor machine. The obtained results, which can be generalized, outline the effect of choosing a programming model on the efficiency and energy consumption when running different codes on different machines.
User's guide of parallel program development environment (PPDE). The 2nd edition
The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D on the technology of parallel processing. The enhancement has been made through extending the function of the PPDF, Parallel Program Development Environment in the STA basic system. The extended PPDE has the function to make: 1) the automatic creation of a 'makefile' and a shell script file for its execution, 2) the multi-tools execution which makes the tools on heterogeneous computers to execute with one operation a task on a computer, and 3) the mirror composition to reflect editing results of a file on a computer into all related files on other computers. These additional functions will enhance the work efficiency for program development on some computers. More functions have been added to the PPDE to provide help for parallel program development. New functions were also designed to complement a HPF translator and a parallelizing support tool when working together so that a sequential program is efficiently converted to a parallel program. This report describes the use of extended PPDE. (author)
Francisco R. Feito Higueruela
2010-04-01
Full Text Available Applications of Geographical Information Systems on several Archeology fields have been increasing during the last years. Recent avances in these technologies make possible to work with more realistic 3D models. In this paper we introduce a new paradigm for this system, the GIS Thetrahedron, in which we define the fundamental elements of GIS, in order to provide a better understanding of their capabilities. At the same time the basic 3D characteristics of some comercial and open source software are described, as well as the application to some samples on archeological researchs
Iliesiu, Luca; Kos, Filip; Poland, David; Pufu, Silviu S.; Simmons-Duffin, David; Yacoby, Ran
2016-03-01
We study the conformal bootstrap for a 4-point function of fermions in 3D. We first introduce an embedding formalism for 3D spinors and compute the conformal blocks appearing in fermion 4-point functions. Using these results, we find general bounds on the dimensions of operators appearing in the ψ × ψ OPE, and also on the central charge C T . We observe features in our bounds that coincide with scaling dimensions in the GrossNeveu models at large N . We also speculate that other features could coincide with a fermionic CFT containing no relevant scalar operators.
Villaume, René Domine; Ørstrup, Finn Rude
2002-01-01
Projektet undersøger potentialet for interaktiv 3D design via Internettet. Arkitekt Jørn Utzons projekt til Espansiva blev udviklet som et byggesystem med det mål, at kunne skabe mangfoldige planmuligheder og mangfoldige facade- og rumudformninger. Systemets bygningskomponenter er digitaliseret som...... 3D elementer og gjort tilgængelige. Via Internettet er det nu muligt at sammenstille og afprøve en uendelig række bygningstyper som systemet blev tænkt og udviklet til....
Kotek, L.
2015-01-01
This paper is about 3D scan of plaster dental casts. The main aim of the work is a hardware and software proposition of 3D scan system for scanning of dental casts. There were used camera, projector and rotate table for this scanning system. Surface triangulation was used, taking benefits of projections of structured light on object, which is being scanned. The rotate table is controlled by PC. The camera, projector and rotate table are synchronized by PC. Controlling of stepper motor is prov...
Ms. Swapnali R. Ghadge
2013-01-01
In today’s ever-shifting media landscape, it can be a complex task to find effective ways to reach your desired audience. As traditional media such as television continue to lose audience share, one venue in particular stands out for its ability to attract highly motivated audiences and for its tremendous growth potential the 3D Internet. The concept of '3D Internet' has recently come into the spotlight in the R&D arena, catching the attention of many people, and leading to a lot o...
Buffered coscheduling for parallel programming and enhanced fault tolerance
Petrini, Fabrizio; Feng, Wu-chun
2006-01-31
A computer implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval is accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors
Microcoding an abstract machine for parallel logic programming
Rizk, A; Garcia, J.
1989-01-01
This paper shows the advantages of implementing an abstract intermediate machine for a parallel logical language, on the lower hardware level the FIRMWARE level - of a physical machine which implements basic hardware mechanisms for fast symbolic computation. µSyC, which is a µprogrammable Symbolic Coprocessor under development at the Bull Research Center, has been chosen as a target architecture and the abstract machine in question is the Sequential Parlog Machine which is based on the AND/OR...
Parallel graph reduction for divide-and-conquer applications - Part I: program transformations
Vree, Willem G.; Hartel, Pieter H.
1988-01-01
A proposal is made to base parallel evaluation of functional programs on graph reduction combined with a form of string reduction that avoids duplication of work. Pure graph reduction poses some rather difficult problems to implement on a parallel reduction machine, but with certain restrictions, pa
Hejlesen, Aske K.; Ovesen, Nis
2012-01-01
This paper presents an experimental approach to teaching 3D modelling techniques in an Industrial Design programme. The approach includes the use of tangible free form models as tools for improving the overall learning. The paper is based on lecturer and student experiences obtained through...
Stenholt, Rasmus; Madsen, Claus B.
2011-01-01
Enabling users to shape 3-D boxes in immersive virtual environments is a non-trivial problem. In this paper, a new family of techniques for creating rectangular boxes of arbitrary position, orientation, and size is presented and evaluated. These new techniques are based solely on position data...
M.M. Voormolen
2007-01-01
textabstractThree dimensional (3D) echocardiography has recently developed from an experimental technique in the â€™90 towards an imaging modality for the daily clinical practice. This dissertation describes the considerations, implementation, validation and clinical application of a unique
Detecting and Using Critical Paths at Runtime in Message Driven Parallel Programs
Laxmikant V. Kalé
2011-01-01
Full Text Available Detecting critical paths in traditional message passing parallel programs can be useful for post-mortem performance analysis. This paper presents an efficient online algorithm for detecting critical paths for message-driven parallel programs. Initial implementations of the algorithm have been created in three message-driven parallel languages: Charm++, Charisma, and Structured Dagger. Not only does this work describe a novel implementation of critical path detection for the message-driven programs, but also the resulting critical paths are successfully used as the program runs for automatic performance tuning. The actionable information provided by the critical path is shown to be useful for online performance tuning within the context of the message driven parallel model, whereas it has never been used for online purposes within the traditional message passing model.
F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming
DiNucci, David C.; Saini, Subhash (Technical Monitor)
1998-01-01
Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called Soviets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).
Meléndez, Adrià
2014-01-01
[eng] This dissertation is devoted to seismic tomography. I have implemented a new modelling tool for 3-D joint refraction and reflection travel-time tomography of wide-angle seismic data (TOMO3D). The reason behind this central objective is the evidence that the information based on 2-D seismic data does not allow to capture the structural complexity of many 3-D targets, and in particular that of the seismogenic zone in subduction margins. The scientific rationale for this statement, which j...
Popov, Anton; Kaus, Boris
2015-04-01
This software project aims at bringing the 3D lithospheric deformation modeling to a qualitatively different level. Our code LaMEM (Lithosphere and Mantle Evolution Model) is based on the following building blocks: * Massively-parallel data-distributed implementation model based on PETSc library * Light, stable and accurate staggered-grid finite difference spatial discretization * Marker-in-Cell pedictor-corector time discretization with Runge-Kutta 4-th order * Elastic stress rotation algorithm based on the time integration of the vorticity pseudo-vector * Staircase-type internal free surface boundary condition without artificial viscosity contrast * Geodynamically relevant visco-elasto-plastic rheology * Global velocity-pressure-temperature Newton-Raphson nonlinear solver * Local nonlinear solver based on FZERO algorithm * Coupled velocity-pressure geometric multigrid preconditioner with Galerkin coarsening Staggered grid finite difference, being inherently Eulerian and rather complicated discretization method, provides no natural treatment of free surface boundary condition. The solution based on the quasi-viscous sticky-air phase introduces significant viscosity contrasts and spoils the convergence of the iterative solvers. In LaMEM we are currently implementing an approximate stair-case type of the free surface boundary condition which excludes the empty cells and restores the solver convergence. Because of the mutual dependence of the stress and strain-rate tensor components, and their different spatial locations in the grid, there is no straightforward way of implementing the nonlinear rheology. In LaMEM we have developed and implemented an efficient interpolation scheme for the second invariant of the strain-rate tensor, that solves this problem. Scalable efficient linear solvers are the key components of the successful nonlinear problem solution. In LaMEM we have a range of PETSc-based preconditioning techniques that either employ a block factorization of
Xing Cai; Hans Petter Langtangen; Halvard Moe
2005-01-01
This article addresses the performance of scientific applications that use the Python programming language. First, we investigate several techniques for improving the computational efficiency of serial Python codes. Then, we discuss the basic programming techniques in Python for parallelizing serial scientific applications. It is shown that an efficient implementation of the array-related operations is essential for achieving good parallel performance, as for the serial case. Once the array-r...
D.T Hasta; Mutiara, A. B.
2010-01-01
The current trend of multicore architectures on shared memory systems underscores the need of parallelism. While there are some programming model to express parallelism, thread programming model has become a standard to support these system such as OpenMP, and POSIX threads. MPI (Message Passing Interface) which remains the dominant model used in high-performance computing today faces this challenge. Previous version of MPI which is MPI-1 has no shared memory concept, and Current MPI version ...
Parallel Programming Model for the Epiphany Many-Core Coprocessor Using Threaded MPI
Ross, James A.; Richie, David A.; Park, Song J.; Shires, Dale R.
2015-01-01
The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating point calculations as well as parallel scalability. Yet despite the interesting architectural features, a compelling programming model has not been presented to date. This paper demonstrates an efficient parallel programming model for the Epiphany architecture based o...
Xu, Minghui
2014-01-01
3D game has widely been accepted and loved by many game players. More and more different kinds of 3D games were developed to feed people’s needs. The most common programming language for development of 3D game is C++ nowadays. Python is a high-level scripting language. It is simple and clear. The concise syntax could speed up the development cycle. This project was to develop a 3D game using only Python. The game is about how a cat lives in the street. In order to live, the player need...
From functional programming to multicore parallelism: A case study based on Presburger Arithmetic
Dung, Phan Anh; Hansen, Michael Reichhardt
2011-01-01
in the SMT-solver Z3 [8] which has the capability of solving Presburger formulas. Functional programming is well-suited for the domain of decision procedures, and its immutability feature helps to reduce parallelization effort. While Haskell has progressed with a lot of parallelismrelated research [6], we......The overall goal of this work is studying parallelization of functional programs with the specific case study of decision procedures for Presburger Arithmetic (PA). PA is a first order theory of integers accepting addition as its only operation. Whereas it has wide applications in different areas...... bound [7]. We investigate these decision procedures in the context of multicore parallelism with the hope of exploiting multicore powers. Unfortunately, we are not aware of any prior parallelism research related to decision procedures for PA. The closest work is the preliminary results on parallelism...
A Review on New Paradigm’s of Parallel Programming Models in High Performance Computing
Mr.Amitkumar S Manekar
2012-08-01
Full Text Available High Performance Computing (HPC is use of multiplecomputer resources to solve large critical problems.Multiprocessor and Multicore is two broad parallelcomputers which support parallelism. ClusteredSymmetric Multiprocessors (SMP is the most fruitful wayout for large scale applications. Enhancing theperformance of computer application is the main role ofparallel processing. Single processor performance onhigh-end systems often enjoys a noteworthy outlayadvantage when implemented in parallel on systemsutilizing multiple, lower-cost, and commoditymicroprocessors. Parallel computers are going mainstream because clusters of SMP (SymmetricMultiprocessors nodes provide support for an amplecollection of parallel programming paradigms. MPI andOpenMP are the trendy flavors in a parallel programming.In this paper we have taken a review on parallelparadigm’s available in multiprocessor and multicoresystem.
User's guide of parallel program development environment (PPDE). The 2nd edition
Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio [Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Tokyo (Japan); Ohta, Hirofumi [Hitachi Ltd., Tokyo (Japan)
2000-03-01
The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D on the technology of parallel processing. The enhancement has been made through extending the function of the PPDF, Parallel Program Development Environment in the STA basic system. The extended PPDE has the function to make: 1) the automatic creation of a 'makefile' and a shell script file for its execution, 2) the multi-tools execution which makes the tools on heterogeneous computers to execute with one operation a task on a computer, and 3) the mirror composition to reflect editing results of a file on a computer into all related files on other computers. These additional functions will enhance the work efficiency for program development on some computers. More functions have been added to the PPDE to provide help for parallel program development. New functions were also designed to complement a HPF translator and a paralleilizing support tool when working together so that a sequential program is efficiently converted to a parallel program. This report describes the use of extended PPDE. (author)
Parallel Instantiation of ASP Programs: Techniques and Experiments
Perri, Simona; Sirianni, Marco
2011-01-01
Answer Set Programming (ASP) is a powerful logic-based programming language, which is enjoying increasing interest within the scientific community and (very recently) in industry. The evaluation of ASP programs is traditionally carried out in two steps. At the first step an input program P undergoes the so-called instantiation (or grounding) process, which produces a program P' semantically equivalent to P, but not containing any variable; in turn, P' is evaluated by using a backtracking search algorithm in the second step. It is well-known that instantiation is important for the efficiency of the whole evaluation, might become a bottleneck in common situations, is crucial in several realworld applications, and is particularly relevant when huge input data has to be dealt with. At the time of this writing, the available instantiator modules are not able to exploit satisfactorily the latest hardware, featuring multi-core/multi-processor SMP (Symmetric MultiProcessing) technologies. This paper presents some par...
3-D Technology Approaches for Biological Ecologies
Liu, Liyu; Austin, Robert; U. S-China Physical-Oncology Sciences Alliance (PS-OA) Team
Constructing three dimensional (3-D) landscapes is an inevitable issue in deep study of biological ecologies, because in whatever scales in nature, all of the ecosystems are composed by complex 3-D environments and biological behaviors. Just imagine if a 3-D technology could help complex ecosystems be built easily and mimic in vivo microenvironment realistically with flexible environmental controls, it will be a fantastic and powerful thrust to assist researchers for explorations. For years, we have been utilizing and developing different technologies for constructing 3-D micro landscapes for biophysics studies in in vitro. Here, I will review our past efforts, including probing cancer cell invasiveness with 3-D silicon based Tepuis, constructing 3-D microenvironment for cell invasion and metastasis through polydimethylsiloxane (PDMS) soft lithography, as well as explorations of optimized stenting positions for coronary bifurcation disease with 3-D wax printing and the latest home designed 3-D bio-printer. Although 3-D technologies is currently considered not mature enough for arbitrary 3-D micro-ecological models with easy design and fabrication, I hope through my talk, the audiences will be able to sense its significance and predictable breakthroughs in the near future. This work was supported by the State Key Development Program for Basic Research of China (Grant No. 2013CB837200), the National Natural Science Foundation of China (Grant No. 11474345) and the Beijing Natural Science Foundation (Grant No. 7154221).
A parallel clustered dynamic programming algorithm for discrete time optimal control problems
Optimal control of dynamical systems is a problem that arises in many areas of engineering and physical science. Due to the special structure of optimal control problems, currently there is no parallel algorithm that can solve optimal control problems efficiently on computers with a large number of processors. In this paper, we will introduce a new optimal control algorithm that permits massively parallel processing. The proposed algorithm, called Cluster Dynamic Programming, is a combination of two efficient serial algorithms, differential dynamic programming and a stagewise Newton's method. Parallel numerical results on an Intel iPSC/860 will be presented
Serdal Baltaci; Avni Yildiz
2015-01-01
Each new version of the GeoGebra dynamic mathematics software goes through updates and innovations. One of these innovations is the GeoGebra 5.0 version. This version aims to facilitate 3D instruction by offering opportunities for students to analyze 3D objects. While scanning the previous studies of GeoGebra 3D, it is seen that they mainly focus on the visualization of a problem in daily life and the dimensions of the evaluation of the process of problem solving with various variables. There...
Klusoň, Jindřich
2010-01-01
Computer animation has a growing importance and application in the world. With expansion of technologies increases quality of the final animation as well as number of 3D animation software. This thesis is currently mapped animation software for creating animation in film, television industry and video games which are advisable users requirements. Of them were selected according to criteria the best - Autodesk Maya 2011. This animation software is unique with tools for creating special effects...
Xing Cai
2005-01-01
Full Text Available This article addresses the performance of scientific applications that use the Python programming language. First, we investigate several techniques for improving the computational efficiency of serial Python codes. Then, we discuss the basic programming techniques in Python for parallelizing serial scientific applications. It is shown that an efficient implementation of the array-related operations is essential for achieving good parallel performance, as for the serial case. Once the array-related operations are efficiently implemented, probably using a mixed-language implementation, good serial and parallel performance become achievable. This is confirmed by a set of numerical experiments. Python is also shown to be well suited for writing high-level parallel programs.
Weeks, Cindy Lou
1986-01-01
Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.
Coudarcher, Rémi; Duculty, Florent; Serot, Jocelyn; Jurie, Frédéric; Derutin, Jean-Pierre; Dhome, Michel
2005-12-01
SKiPPER is a SKeleton-based Parallel Programming EnviRonment being developed since 1996 and running at LASMEA Laboratory, the Blaise-Pascal University, France. The main goal of the project was to demonstrate the applicability of skeleton-based parallel programming techniques to the fast prototyping of reactive vision applications. This paper deals with the special features embedded in the latest version of the project: algorithmic skeleton nesting capabilities and a fully dynamic operating model. Throughout the case study of a complete and realistic image processing application, in which we have pointed out the requirement for skeleton nesting, we are presenting the operating model of this feature. The work described here is one of the few reported experiments showing the application of skeleton nesting facilities for the parallelisation of a realistic application, especially in the area of image processing. The image processing application we have chosen is a 3D face-tracking algorithm from appearance.
Particle Acceleration in 3D Magnetic Reconnection
Dahlin, J.; Drake, J. F.; Swisdak, M.
2015-12-01
Magnetic reconnection is an important driver of energetic particles in phenomena such as magnetospheric storms and solar flares. Using kinetic particle-in-cell (PIC) simulations, we show that the stochastic magnetic field structure which develops during 3D reconnection plays a vital role in particle acceleration and transport. In a 2D system, electrons are trapped in magnetic islands which limits their energy gain. In a 3D system, however, the stochastic magnetic field enables the energetic electrons to access volume-filling acceleration regions and therefore gain energy much more efficiently than in the 2D system. We also examine the relative roles of two important acceleration drivers: parallel electric fields and a Fermi mechanism associated with reflection of charged particles from contracting field lines. We find that parallel electric fields are most important for accelerating low energy particles, whereas Fermi reflection dominates energetic particle production. We also find that proton energization is reduced in the 3D system.
Using CLIPS in the domain of knowledge-based massively parallel programming
Dvorak, Jiri J.
1994-01-01
The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency in respect of parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering are discussed.
Process-Oriented Parallel Programming with an Application to Data-Intensive Computing
Givelberg, Edward
2014-01-01
We introduce process-oriented programming as a natural extension of object-oriented programming for parallel computing. It is based on the observation that every class of an object-oriented language can be instantiated as a process, accessible via a remote pointer. The introduction of process pointers requires no syntax extension, identifies processes with programming objects, and enables processes to exchange information simply by executing remote methods. Process-oriented programming is a h...
Performance and productivity of parallel python programming: a study with a CFD test case
Basermann, Achim; Röhrig-Zöllner, Melven; Illmer, Joachim
2015-01-01
The programming language Python is widely used to create rapidly compact software. However, compared to low-level programming languages like C or Fortran low performance is preventing its use for HPC applications. Efficient parallel programming of multi-core systems and graphic cards is generally a complex task. Python with add-ons might provide a simple approach to program those systems. This paper evaluates the performance of Python implementations with different libraries and compares it t...
Parallel Goals of the Early Childhood Music Program.
Cohen, Veronica Wolf
Early childhood music programs should be based on two interacting goals: (1) to teach those skills most appropriate to a particular level and (2) to nurture musical creativity and self-expression. Early childhood is seen as the optimum time for acquiring certain musical skills, of which the ability to sing in tune is considered primary. The vocal…
Schnell, R. C.; Sheridan, P. J.
2009-12-01
A NOAA WP-3D instrumented for gas, aerosol and radiation measurements was flown 400 research hours over four periods (March-April: 1983, 1986, 1989 and 1992) covering large areas of the Arctic Basin from Alaska to Norway studying Arctic Haze and air chemistry. In 1986 the program included aircraft from the University of Washington; AES, Canada; and NILU, Norway. Profiles were conducted above the Barrow, Alert and Ny Alesund atmospheric baseline stations, and numerous profiles across the low level inversion layer over the ice cap to put surface, boundary layer and free troposphere measurements into perspective. Highlights from AGASP include observations of up to 6 stacked layers of air pollution >5,000 km from the nearest possible source regions; layers of air pollution containing high concentrations of black carbon and anthropogenic gases; photochemical ozone depletion in the Arctic boundary layer; intrusions of stratospheric air injecting stratospheric gases and aerosols deep into the Arctic troposphere; haze optical depths of up to 0.5; and data showing that heat and moisture from open leads in the Arctic ice pack can breach the boundary layer inversion and rise to near the tropopause. In most profiles,aerosol light scattering, and ozone, black carbon and condensation nucleus concentrations were much reduced beneath boundary layer temperature inversion (~1 km above the ice). Since most of the AGASP and related publications pre-date current easy electronic access, a file listing the titles and sources of 185 papers published in journals, books, and NOAA Technical Memos is available at http://www.esrl.noaa.gov/gmd/obop/schnell/.
Discrete Method of Images for 3D Radio Propagation Modeling
Novak, Roman
2016-09-01
Discretization by rasterization is introduced into the method of images (MI) in the context of 3D deterministic radio propagation modeling as a way to exploit spatial coherence of electromagnetic propagation for fine-grained parallelism. Traditional algebraic treatment of bounding regions and surfaces is replaced by computer graphics rendering of 3D reflections and double refractions while building the image tree. The visibility of reception points and surfaces is also resolved by shader programs. The proposed rasterization is shown to be of comparable run time to that of the fundamentally parallel shooting and bouncing rays. The rasterization does not affect the signal evaluation backtracking step, thus preserving its advantage over the brute force ray-tracing methods in terms of accuracy. Moreover, the rendering resolution may be scaled back for a given level of scenario detail with only marginal impact on the image tree size. This allows selection of scene optimized execution parameters for faster execution, giving the method a competitive edge. The proposed variant of MI can be run on any GPU that supports real-time 3D graphics.
Shuang, Bo; Wang, Wenxiao; Shen, Hao; Tauzin, Lawrence J.; Flatebo, Charlotte; Chen, Jianbo; Moringo, Nicholas A.; Bishop, Logan D. C.; Kelly, Kevin F.; Landes, Christy F.
2016-08-01
Super-resolution microscopy with phase masks is a promising technique for 3D imaging and tracking. Due to the complexity of the resultant point spread functions, generalized recovery algorithms are still missing. We introduce a 3D super-resolution recovery algorithm that works for a variety of phase masks generating 3D point spread functions. A fast deconvolution process generates initial guesses, which are further refined by least squares fitting. Overfitting is suppressed using a machine learning determined threshold. Preliminary results on experimental data show that our algorithm can be used to super-localize 3D adsorption events within a porous polymer film and is useful for evaluating potential phase masks. Finally, we demonstrate that parallel computation on graphics processing units can reduce the processing time required for 3D recovery. Simulations reveal that, through desktop parallelization, the ultimate limit of real-time processing is possible. Our program is the first open source recovery program for generalized 3D recovery using rotating point spread functions.
Shuang, Bo; Wang, Wenxiao; Shen, Hao; Tauzin, Lawrence J; Flatebo, Charlotte; Chen, Jianbo; Moringo, Nicholas A; Bishop, Logan D C; Kelly, Kevin F; Landes, Christy F
2016-01-01
Super-resolution microscopy with phase masks is a promising technique for 3D imaging and tracking. Due to the complexity of the resultant point spread functions, generalized recovery algorithms are still missing. We introduce a 3D super-resolution recovery algorithm that works for a variety of phase masks generating 3D point spread functions. A fast deconvolution process generates initial guesses, which are further refined by least squares fitting. Overfitting is suppressed using a machine learning determined threshold. Preliminary results on experimental data show that our algorithm can be used to super-localize 3D adsorption events within a porous polymer film and is useful for evaluating potential phase masks. Finally, we demonstrate that parallel computation on graphics processing units can reduce the processing time required for 3D recovery. Simulations reveal that, through desktop parallelization, the ultimate limit of real-time processing is possible. Our program is the first open source recovery program for generalized 3D recovery using rotating point spread functions. PMID:27488312
Shuang, Bo; Wang, Wenxiao; Shen, Hao; Tauzin, Lawrence J.; Flatebo, Charlotte; Chen, Jianbo; Moringo, Nicholas A.; Bishop, Logan D. C.; Kelly, Kevin F.; Landes, Christy F.
2016-01-01
Super-resolution microscopy with phase masks is a promising technique for 3D imaging and tracking. Due to the complexity of the resultant point spread functions, generalized recovery algorithms are still missing. We introduce a 3D super-resolution recovery algorithm that works for a variety of phase masks generating 3D point spread functions. A fast deconvolution process generates initial guesses, which are further refined by least squares fitting. Overfitting is suppressed using a machine learning determined threshold. Preliminary results on experimental data show that our algorithm can be used to super-localize 3D adsorption events within a porous polymer film and is useful for evaluating potential phase masks. Finally, we demonstrate that parallel computation on graphics processing units can reduce the processing time required for 3D recovery. Simulations reveal that, through desktop parallelization, the ultimate limit of real-time processing is possible. Our program is the first open source recovery program for generalized 3D recovery using rotating point spread functions. PMID:27488312
3D Reconstruction of NMR Images
Peter Izak
2007-01-01
Full Text Available This paper introduces experiment of 3D reconstruction NMR images scanned from magnetic resonance device. There are described methods which can be used for 3D reconstruction magnetic resonance images in biomedical application. The main idea is based on marching cubes algorithm. For this task was chosen sophistication method by program Vision Assistant, which is a part of program LabVIEW.
Andringa, Roel; de Roo, Mees; Hohm, Olaf; Sezgin, Ergin; Townsend, Paul K
2009-01-01
We construct the N=1 three-dimensional supergravity theory with cosmological, Einstein-Hilbert, Lorentz Chern-Simons, and general curvature squared terms. We determine the general supersymmetric configuration, and find a family of supersymmetric adS vacua with the supersymmetric Minkowski vacuum as a limiting case. Linearizing about the Minkowski vacuum, we find three classes of unitary theories; one is the supersymmetric extension of the recently discovered `massive 3D gravity'. Another is a `new topologically massive supergravity' (with no Einstein-Hilbert term) that propagates a single (2,3/2) helicity supermultiplet.
Andringa, Roel; Bergshoeff, Eric A; De Roo, Mees; Hohm, Olaf [Centre for Theoretical Physics, University of Groningen, Nijenborgh 4, 9747 AG Groningen (Netherlands); Sezgin, Ergin [George and Cynthia Woods Mitchell Institute for Fundamental Physics and Astronomy, Texas A and M University, College Station, TX 77843 (United States); Townsend, Paul K, E-mail: E.A.Bergshoeff@rug.n, E-mail: O.Hohm@rug.n, E-mail: sezgin@tamu.ed, E-mail: P.K.Townsend@damtp.cam.ac.u [Department of Applied Mathematics and Theoretical Physics, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Road, Cambridge, CB3 0WA (United Kingdom)
2010-01-21
We construct the N=1 three-dimensional supergravity theory with cosmological, Einstein-Hilbert, Lorentz Chern-Simons, and general curvature squared terms. We determine the general supersymmetric configuration, and find a family of supersymmetric adS vacua with the supersymmetric Minkowski vacuum as a limiting case. Linearizing about the Minkowski vacuum, we find three classes of unitary theories; one is the supersymmetric extension of the recently discovered 'massive 3D gravity'. Another is a 'new topologically massive supergravity' (with no Einstein-Hilbert term) that propagates a single (2,3/2) helicity supermultiplet.
Programming a massively parallel, computation universal system: static behavior
Lapedes, A.; Farber, R.
1986-01-01
In previous work by the authors, the ''optimum finding'' properties of Hopfield neural nets were applied to the nets themselves to create a ''neural compiler.'' This was done in such a way that the problem of programming the attractors of one neural net (called the Slave net) was expressed as an optimization problem that was in turn solved by a second neural net (the Master net). In this series of papers that approach is extended to programming nets that contain interneurons (sometimes called ''hidden neurons''), and thus deals with nets capable of universal computation. 22 refs.
Ms. Swapnali R. Ghadge
2013-08-01
Full Text Available In today’s ever-shifting media landscape, it can be a complex task to find effective ways to reach your desired audience. As traditional media such as television continue to lose audience share, one venue in particular stands out for its ability to attract highly motivated audiences and for its tremendous growth potential the 3D Internet. The concept of '3D Internet' has recently come into the spotlight in the R&D arena, catching the attention of many people, and leading to a lot of discussions. Basically, one can look into this matter from a few different perspectives: visualization and representation of information, and creation and transportation of information, among others. All of them still constitute research challenges, as no products or services are yet available or foreseen for the near future. Nevertheless, one can try to envisage the directions that can be taken towards achieving this goal. People who take part in virtual worlds stay online longer with a heightened level of interest. To take advantage of that interest, diverse businesses and organizations have claimed an early stake in this fast-growing market. They include technology leaders such as IBM, Microsoft, and Cisco, companies such as BMW, Toyota, Circuit City, Coca Cola, and Calvin Klein, and scores of universities, including Harvard, Stanford and Penn State.
Parallel Processing of Economic Programs, a New Strategy in Groups of Firms
Loredana MOCEAN; Monica-Iuliana CIACA; Alexandru VANCEA
2014-01-01
In recent years parallel and distributed systems have become increasingly attractive for applications with high computational demands such as simulation of complex systems from groups of companies. The main advantage of such systems is the ratio, rather than attractive, between the price and performance that can be achieved. In the present paper, authors describe some possibilities of parallel processing at the level of economic programs in groups of firms. The architecture, model and future ...
MPI parallel programming of mixed integer optimization problems using CPLEX with COIN-OR
Aldasoro Marcellan, Unai; Garín Martín, María Araceli; Merino Maestre, María; Pérez Sainz de Rozas, Gloria
2012-01-01
The aim of this technical report is to present some detailed explanations in order to help to understand and use the Message Passing Interface (MPI) parallel programming for solving several mixed integer optimization problems. We have developed a C++ experimental code that uses the IBM ILOG CPLEX optimizer within the COmputational INfrastructure for Operations Research (COIN-OR) and MPI parallel computing for solving the optimization models under UNIX-like systems. The computational experienc...
Task scheduling of parallel programs to optimize communications for cluster of SMPs
郑纬民; 杨博; 林伟坚; 李志光
2001-01-01
This paper discusses the compile time task scheduling of parallel program running on cluster of SMP workstations. Firstly, the problem is stated formally and transformed into a graph partition problem and proved to be NP-Complete. A heuristic algorithm MMP-Solver is then proposed to solve the problem. Experiment result shows that the task scheduling can reduce communication overhead of parallel applications greatly and MMP-Solver outperforms the existing algorithms.
Dynamic programming parallel implementations for the knapsack problem
Andonov, Rumen; Raimbault, Frédéric; Quinton, Patrice
1993-01-01
A systolic algorithm for the dynamic programming approach to the knapsack problem is presented. The algorithm can run on any number of processors and has optimal time speedup and processor efficiency. The running time of the algorithm is [??](mc/q+m) on a ring of q processors, where c is the knapsack size and m is the number of object types. A new procedure for the backtracking phase of the algorithm with a time complexity [??](m) is also proposed which is an improvement on the usual strategi...
Arild Helseth
2015-12-01
Full Text Available Stochastic dual dynamic programming (SDDP has become a popular algorithm used in practical long-term scheduling of hydropower systems. The SDDP algorithm is computationally demanding, but can be designed to take advantage of parallel processing. This paper presents a novel parallel scheme for the SDDP algorithm, where the stage-wise synchronization point traditionally used in the backward iteration of the SDDP algorithm is partially relaxed. The proposed scheme was tested on a realistic model of a Norwegian water course, proving that the synchronization point relaxation significantly improves parallel efficiency.
“3D Turtle Graphics” by using a 3D Printer
Yasusi Kanada
2015-04-01
Full Text Available When creating shapes by using a 3D printer, usually, a static (declarative model designed by using a 3D CAD system is translated to a CAM program and it is sent to the printer. However, widely-used FDM-type 3D printers input a dynamical (procedural program that describes control of motions of the print head and extrusion of the filament. If the program is expressed by using a programming language or a library in a straight manner, solids can be created by a method similar to turtle graphics. An open-source library that enables “turtle 3D printing” method was described by Python and tested. Although this method currently has a problem that it cannot print in the air; however, if this problem is solved by an appropriate method, shapes drawn by 3D turtle graphics freely can be embodied by this method.
A Tool for Performance Modeling of Parallel Programs
J.A. González
2003-01-01
Full Text Available Current performance prediction analytical models try to characterize the performance behavior of actual machines through a small set of parameters. In practice, substantial deviations are observed. These differences are due to factors as memory hierarchies or network latency. A natural approach is to associate a different proportionality constant with each basic block, and analogously, to associate different latencies and bandwidths with each "communication block". Unfortunately, to use this approach implies that the evaluation of parameters must be done for each algorithm. This is a heavy task, implying experiment design, timing, statistics, pattern recognition and multi-parameter fitting algorithms. Software support is required. We present a compiler that takes as source a C program annotated with complexity formulas and produces as output an instrumented code. The trace files obtained from the execution of the resulting code are analyzed with an interactive interpreter, giving us, among other information, the values of those parameters.
“3D Turtle Graphics” by using a 3D Printer
Yasusi Kanada
2015-01-01
When creating shapes by using a 3D printer, usually, a static (declarative) model designed by using a 3D CAD system is translated to a CAM program and it is sent to the printer. However, widely-used FDM-type 3D printers input a dynamical (procedural) program that describes control of motions of the print head and extrusion of the filament. If the program is expressed by using a programming language or a library in a straight manner, solids can be created by a method similar to tur...
RAG-3D: a search tool for RNA 3D substructures.
Zahran, Mai; Sevim Bayrak, Cigdem; Elmetwaly, Shereef; Schlick, Tamar
2015-10-30
To address many challenges in RNA structure/function prediction, the characterization of RNA's modular architectural units is required. Using the RNA-As-Graphs (RAG) database, we have previously explored the existence of secondary structure (2D) submotifs within larger RNA structures. Here we present RAG-3D-a dataset of RNA tertiary (3D) structures and substructures plus a web-based search tool-designed to exploit graph representations of RNAs for the goal of searching for similar 3D structural fragments. The objects in RAG-3D consist of 3D structures translated into 3D graphs, cataloged based on the connectivity between their secondary structure elements. Each graph is additionally described in terms of its subgraph building blocks. The RAG-3D search tool then compares a query RNA 3D structure to those in the database to obtain structurally similar structures and substructures. This comparison reveals conserved 3D RNA features and thus may suggest functional connections. Though RNA search programs based on similarity in sequence, 2D, and/or 3D structural elements are available, our graph-based search tool may be advantageous for illuminating similarities that are not obvious; using motifs rather than sequence space also reduces search times considerably. Ultimately, such substructuring could be useful for RNA 3D structure prediction, structure/function inference and inverse folding. PMID:26304547
Dung, Phan Anh; Hansen, Michael Reichhardt
2015-01-01
In this paper we investigate multicore parallelism in the context of functional programming by means of two quantifier-elimination procedures for Presburger Arithmetic: one is based on Cooper’s algorithm and the other is based on the Omega Test. We first develop correct-by-construction prototype...... into account negative factors such as cache misses, garbage collection and overhead due to task creations, because such factors may introduce sequential bottlenecks with severe consequences for the parallel efficiency. The experiments were conducted using the functional programming language F# and .NET...... reveals more general applicable techniques and guideline for deriving parallel algorithms from sequential ones in the context of data-intensive tree algorithms. The obtained insights should apply for any strict and impure functional programming language. Furthermore, the results obtained for the exact...
Remote Memory Access: A Case for Portable, Efficient and Library Independent Parallel Programming
Alexandros V. Gerbessiotis
2004-01-01
Full Text Available In this work we make a strong case for remote memory access (RMA as the effective way to program a parallel computer by proposing a framework that supports RMA in a library independent, simple and intuitive way. If one uses our approach the parallel code one writes will run transparently under MPI-2 enabled libraries but also bulk-synchronous parallel libraries. The advantage of using RMA is code simplicity, reduced programming complexity, and increased efficiency. We support the latter claims by implementing under this framework a collection of benchmark programs consisting of a communication and synchronization performance assessment program, a dense matrix multiplication algorithm, and two variants of a parallel radix-sort algorithm and examine their performance on a LINUX-based PC cluster under three different RMA enabled libraries: LAM MPI, BSPlib, and PUB. We conclude that implementations of such parallel algorithms using RMA communication primitives lead to code that is as efficient as the message-passing equivalent code and in the case of radix-sort substantially more efficient. In addition our work can be used as a comparative study of the relevant capabilities of the three libraries.
3D Membrane Imaging and Porosity Visualization
Sundaramoorthi, Ganesh
2016-03-03
Ultrafiltration asymmetric porous membranes were imaged by two microscopy methods, which allow 3D reconstruction: Focused Ion Beam and Serial Block Face Scanning Electron Microscopy. A new algorithm was proposed to evaluate porosity and average pore size in different layers orthogonal and parallel to the membrane surface. The 3D-reconstruction enabled additionally the visualization of pore interconnectivity in different parts of the membrane. The method was demonstrated for a block copolymer porous membrane and can be extended to other membranes with application in ultrafiltration, supports for forward osmosis, etc, offering a complete view of the transport paths in the membrane.
Zhou, Jun
The 1994 Northridge earthquake in Los Angeles, California, killed 57 people, injured over 8,700 and caused an estimated $20 billion in damage. Petascale simulations are needed in California and elsewhere to provide society with a better understanding of the rupture and wave dynamics of the largest earthquakes at shaking frequencies required to engineer safe structures. As the heterogeneous supercomputing infrastructures are becoming more common, numerical developments in earthquake system research are particularly challenged by the dependence on the accelerator elements to enable "the Big One" simulations with higher frequency and finer resolution. Reducing time to solution and power consumption are two primary focus area today for the enabling technology of fault rupture dynamics and seismic wave propagation in realistic 3D models of the crust's heterogeneous structure. This dissertation presents scalable parallel programming techniques for high performance seismic simulation running on petascale heterogeneous supercomputers. A real world earthquake simulation code, AWP-ODC, one of the most advanced earthquake codes to date, was chosen as the base code in this research, and the testbed is based on Titan at Oak Ridge National Laboraratory, the world's largest hetergeneous supercomputer. The research work is primarily related to architecture study, computation performance tuning and software system scalability. An earthquake simulation workflow has also been developed to support the efficient production sets of simulations. The highlights of the technical development are an aggressive performance optimization focusing on data locality and a notable data communication model that hides the data communication latency. This development results in the optimal computation efficiency and throughput for the 13-point stencil code on heterogeneous systems, which can be extended to general high-order stencil codes. Started from scratch, the hybrid CPU/GPU version of AWP
Design for scalability in 3D computer graphics architectures
Holten-Lund, Hans Erik
2002-01-01
This thesis describes useful methods and techniques for designing scalable hybrid parallel rendering architectures for 3D computer graphics. Various techniques for utilizing parallelism in a pipelines system are analyzed. During the Ph.D study a prototype 3D graphics architecture named Hybris has...... been developed. Hybris is a prototype rendering architeture which can be tailored to many specific 3D graphics applications and implemented in various ways. Parallel software implementations for both single and multi-processor Windows 2000 system have been demonstrated. Working hardware...... as a case study and an application of the Hybris graphics architecture....
High performance parallelism pearls 2 multicore and many-core programming approaches
Jeffers, Jim
2015-01-01
High Performance Parallelism Pearls Volume 2 offers another set of examples that demonstrate how to leverage parallelism. Similar to Volume 1, the techniques included here explain how to use processors and coprocessors with the same programming - illustrating the most effective ways to combine Xeon Phi coprocessors with Xeon and other multicore processors. The book includes examples of successful programming efforts, drawn from across industries and domains such as biomed, genetics, finance, manufacturing, imaging, and more. Each chapter in this edited work includes detailed explanations of t
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers
Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)
2002-01-01
The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
Teaching Parallel Programming Using Both High-Level and Low-Level Languages
Pan, Yi
2001-01-01
We discuss the use of both MPI and OpenMP in the teaching of senior undergraduate and junior graduate classes in parallel programming. We briefly introduce the OpenMP standard and discuss why we have chosen to use it in parallel programming classes. Advantages of using OpenMP over message passing methods are discussed. We also include a brief enumeration of some of the drawbacks of using OpenMP and how these drawbacks are being addressed by supplementing OpenMP with additional MPI codes and p...
An Improved Version of TOPAZ 3D
Krasnykh, Anatoly K
2003-01-01
An improved version of the TOPAZ 3D gun code is presented as a powerful tool for beam optics simulation. In contrast to the previous version of TOPAZ 3D, the geometry of the device under test is introduced into TOPAZ 3D directly from a CAD program, such as Solid Edge or AutoCAD. In order to have this new feature, an interface was developed, using the GiD software package as a meshing code. The article describes this method with two models to illustrate the results.
An Improved Version of TOPAZ 3D
An improved version of the TOPAZ 3D gun code is presented as a powerful tool for beam optics simulation. In contrast to the previous version of TOPAZ 3D, the geometry of the device under test is introduced into TOPAZ 3D directly from a CAD program, such as Solid Edge or AutoCAD. In order to have this new feature, an interface was developed, using the GiD software package as a meshing code. The article describes this method with two models to illustrate the results
FROILAN G. DESTREZA
2014-02-01
Full Text Available This study is for the BSHRM Students of Batangas State University (BatStateU ARASOF for the researchers believe that the Wireless 3D Chocolate Printer would be helpful in their degree program especially on making creative, artistic, personalized and decorative chocolate designs. The researchers used the Prototyping model as procedural method for the successful development and implementation of the hardware and software. This method has five phases which are the following: quick plan, quick design, prototype construction, delivery and feedback and communication. This study was evaluated by the BSHRM Students and the assessment of the respondents regarding the software and hardware application are all excellent in terms of Accuracy, Effecitveness, Efficiency, Maintainability, Reliability and User-friendliness. Also, the overall level of acceptability of the design project as evaluated by the respondents is excellent. With regard to the observation about the best raw material to use in 3D printing, the chocolate is good to use as the printed material is slightly distorted,durable and very easy to prepare; the icing is also good to use as the printed material is not distorted and is very durable but consumes time to prepare; the flour is not good as the printed material is distorted, not durable but it is easy to prepare. The computation of the economic viability level of 3d printer with reference to ROI is 37.14%. The recommendation of the researchers in the design project are as follows: adding a cooling system so that the raw material will be more durable, development of a more simplified version and improving the extrusion process wherein the user do not need to stop the printing process just to replace the empty syringe with a new one.
Hausman, Kalani Kirk
2014-01-01
Get started printing out 3D objects quickly and inexpensively! 3D printing is no longer just a figment of your imagination. This remarkable technology is coming to the masses with the growing availability of 3D printers. 3D printers create 3-dimensional layered models and they allow users to create prototypes that use multiple materials and colors. This friendly-but-straightforward guide examines each type of 3D printing technology available today and gives artists, entrepreneurs, engineers, and hobbyists insight into the amazing things 3D printing has to offer. You'll discover methods for
Fermilab's Advanced Computer Program (ACP) has been developing highly cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor, has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 MFlops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction. 10 refs., 7 figs
Grid Service Framework:Supporting Multi-Models Parallel Grid Programming
邓倩妮; 陆鑫达
2004-01-01
Web service is a grid computing technology that promises greater ease-of-use and interoperability than previous distributed computing technologies. This paper proposed Group Service Framework, a grid computing platform based on Microsoft. NET that use web service to: (1) locate and harness volunteer computing resources for different applications, and (2) support multi-models such as Master/Slave, Divide and Conquer, Phase Parallel and so forth parallel programming paradigms in Grid environment, (3) allocate data and balance load dynamically and transparently for grid computing application. The Grid Service Framework based on Microsoft. NET was used to implement several simple parallel computing applications. The results show that the proposed Group Service Framework is suitable for generic parallel numerical computing.
Szkandera, Jan
2009-01-01
Tato bakalářská práce se zabývá návrhem a realizací systému, který umožní obraz scény zobrazovaný na ploše vnímat prostorově. Prostorové vnímání 2D obrazové informace je umožněno jednak stereopromítáním a jednak tím, že se obraz mění v závislosti na poloze pozorovatele. Tato práce se zabývá hlavně druhým z těchto problémů. This Bachelor's thesis goal is to design and realize system, which allows user to perceive 2D visual information as three-dimensional. 3D visual preception of 2D image i...
Mobile tomographs often have the problem that high spatial resolution is impossible owing to the position or setup of the tomograph. While the tree tomograph developed by Messrs. Isotopenforschung Dr. Sauerwein GmbH worked well in practice, it is no longer used as the spatial resolution and measuring time are insufficient for many modern applications. The paper shows that the mechanical base of the method is sufficient for 3D CT measurements with modern detectors and X-ray tubes. CT measurements with very good statistics take less than 10 min. This means that mobile systems can be used, e.g. in examinations of non-transportable cultural objects or monuments. Enhancement of the spatial resolution of mobile tomographs capable of measuring in any position is made difficult by the fact that the tomograph has moving parts and will therefore have weight shifts. With the aid of tomographies whose spatial resolution is far higher than the mechanical accuracy, a correction method is presented for direct integration of the Feldkamp algorithm
Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming
Dorband, John E.; Aburdene, Maurice F.
2002-01-01
Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.
3D game environments create professional 3D game worlds
Ahearn, Luke
2008-01-01
The ultimate resource to help you create triple-A quality art for a variety of game worlds; 3D Game Environments offers detailed tutorials on creating 3D models, applying 2D art to 3D models, and clear concise advice on issues of efficiency and optimization for a 3D game engine. Using Photoshop and 3ds Max as his primary tools, Luke Ahearn explains how to create realistic textures from photo source and uses a variety of techniques to portray dynamic and believable game worlds.From a modern city to a steamy jungle, learn about the planning and technological considerations for 3D modelin
McCracken, Joselle M; Badea, Adina; Kandel, Mikhail E; Gladman, A Sydney; Wetzel, David J; Popescu, Gabriel; Lewis, Jennifer A; Nuzzo, Ralph G
2016-05-01
R. Nuzzo and co-workers show on page 1025 how compositional differences in hydrogels are used to tune their cellular compliance by controlling their polymer mesh properties and subsequent uptake of the protein poly-l-lysine (green spheres in circled inset). The cover image shows pyramid micro-scaffolds prepared using direct ink writing (DIW) that differentially direct fibroblast and preosteoblast growth in 3D, depending on cell motility and surface treatment. PMID:27166616