mesh-connected parallel computer: Topics by WorldWideScience.org

Sample records for mesh-connected parallel computer

Data-Parallel Mesh Connected Components Labeling and Analysis

Energy Technology Data Exchange (ETDEWEB)

Harrison, Cyrus; Childs, Hank; Gaither, Kelly

2011-04-10

We present a data-parallel algorithm for identifying and labeling the connected sub-meshes within a domain-decomposed 3D mesh. The identification task is challenging in a distributed-memory parallel setting because connectivity is transitive and the cells composing each sub-mesh may span many or all processors. Our algorithm employs a multi-stage application of the Union-find algorithm and a spatial partitioning scheme to efficiently merge information across processors and produce a global labeling of connected sub-meshes. Marking each vertex with its corresponding sub-mesh label allows us to isolate mesh features based on topology, enabling new analysis capabilities. We briefly discuss two specific applications of the algorithm and present results from a weak scaling study. We demonstrate the algorithm at concurrency levels up to 2197 cores and analyze meshes containing up to 68 billion cells.
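
The abstract above names Union-find as the core of the labeling step. The following is a minimal serial sketch of that idea only, not the authors' multi-stage distributed algorithm; the function name `label_components` and the face-adjacency pair input are assumptions made for illustration.

```python
def label_components(num_cells, adjacency):
    """Label connected sub-meshes with Union-find.

    num_cells: number of cells, identified as 0..num_cells-1 (assumed layout).
    adjacency: iterable of (a, b) pairs of cells that share a face.
    Returns a list mapping each cell to the smallest cell id in its component.
    """
    parent = list(range(num_cells))

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in adjacency:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as the root so labels are deterministic.
            if root_a < root_b:
                parent[root_b] = root_a
            else:
                parent[root_a] = root_b

    return [find(cell) for cell in range(num_cells)]


labels = label_components(6, [(0, 1), (1, 2), (4, 5)])
print(labels)
```

In a distributed setting each processor could run such a pass on its own cells first; merging labels across processor boundaries is the hard part the paper addresses and is not shown here.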
Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

Energy Technology Data Exchange (ETDEWEB)

Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

1997-03-01

Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving's meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia's "tiling" dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.
Sierra toolkit computational mesh conceptual model

International Nuclear Information System (INIS)

Baur, David G.; Edwards, Harold Carter; Cochran, William K.; Williams, Alan B.; Sjaardema, Gregory D.

2010-01-01

The Sierra Toolkit computational mesh is a software library intended to support massively parallel multi-physics computations on dynamically changing unstructured meshes. This domain of intended use is inherently complex due to distributed memory parallelism, parallel scalability, heterogeneity of physics, heterogeneous discretization of an unstructured mesh, and runtime adaptation of the mesh. Management of this inherent complexity begins with a conceptual analysis and modeling of this domain of intended use; i.e., development of a domain model. The Sierra Toolkit computational mesh software library is designed and implemented based upon this domain model. Software developers using, maintaining, or extending the Sierra Toolkit computational mesh library must be familiar with the concepts/domain model presented in this report.
Parallel adaptive simulations on unstructured meshes

International Nuclear Information System (INIS)

Shephard, M S; Jansen, K E; Sahni, O; Diachin, L A

2007-01-01

This paper discusses methods being developed by the ITAPS center to support the execution of parallel adaptive simulations on unstructured meshes. The paper first outlines the ITAPS approach to the development of interoperable mesh, geometry and field services to support the needs of SciDAC applications in these areas. The paper then demonstrates the ability of unstructured adaptive meshing methods built on such interoperable services to effectively solve important physics problems. Attention is then focused on ITAPS' developing ability to solve adaptive unstructured mesh problems on massively parallel computers.
Link failure detection in a parallel computer

Science.gov (United States)

Archer, Charles J.; Blocksome, Michael A.; Megerian, Mark G.; Smith, Brian E.

2010-11-09

Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that includes: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received.
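
The grouping rule in this abstract (adjacent nodes always land in different groups) matches a checkerboard coloring of the mesh. Below is a small single-process simulation of that scheme, offered as a sketch only: the coordinate-parity rule, the function names, and the `broken` set of directed links are illustrative assumptions, not details taken from the patent.

```python
def assign_group(x, y):
    """Checkerboard rule: 0 = first group (senders), 1 = second group (receivers)."""
    return (x + y) % 2


def neighbors(x, y, width, height):
    """Yield the in-bounds neighbors of (x, y) in a rectangular mesh."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield nx, ny


def detect_failed_links(width, height, broken):
    """Simulate one round of test messages from group 0 to group 1.

    broken: set of (sender, receiver) coordinate pairs whose link drops
            messages (a hypothetical fault model for this sketch).
    Returns the list of links on which a receiver saw no test message.
    """
    failures = []
    for y in range(height):
        for x in range(width):
            if assign_group(x, y) != 1:
                continue  # only second-group nodes check for messages
            for sender in neighbors(x, y, width, height):
                # Every neighbor of a group-1 node is in group 0 by construction.
                if (sender, (x, y)) in broken:
                    failures.append((sender, (x, y)))
    return failures


print(detect_failed_links(3, 3, {((0, 0), (1, 0))}))
```

Because each pair of adjacent nodes is joined by a pair of links, a second round with the roles of the two groups swapped would presumably be needed to exercise the opposite direction.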
Parallel unstructured mesh optimisation for 3D radiation transport and fluids modelling

International Nuclear Information System (INIS)

Gorman, G.J.; Pain, Ch. C.; Oliveira, C.R.E. de; Umpleby, A.P.; Goddard, A.J.H.

2003-01-01

In this paper we describe the theory and application of a parallel mesh optimisation procedure to obtain self-adapting finite element solutions on unstructured tetrahedral grids. The optimisation procedure adapts the tetrahedral mesh to the solution of a radiation transport or fluid flow problem without sacrificing the integrity of the boundary (geometry), or internal boundaries (regions) of the domain. The objective is to obtain a mesh which has both a uniform interpolation error in any direction and element shapes of good quality. This is accomplished with use of a non-Euclidean (anisotropic) metric which is related to the Hessian of the solution field. Appropriate scaling of the metric enables the resolution of multi-scale phenomena as encountered in transient incompressible fluids and multigroup transport calculations. The resulting metric is used to calculate element size and shape quality. The mesh optimisation method is based on a series of mesh connectivity and node position searches of the landscape defining mesh quality which is gauged by a functional. The mesh modification thus fits the solution field(s) in an optimal manner. The parallel mesh optimisation/adaptivity procedure presented in this paper is of general applicability. We illustrate this by applying it to a transient CFD (computational fluid dynamics) problem. Incompressible flow past a cylinder at moderate Reynolds numbers is modelled to demonstrate that the mesh can follow transient flow features. (authors)
Parallel Adaptive Mesh Refinement for High-Order Finite-Volume Schemes in Computational Fluid Dynamics

Science.gov (United States)

Schwing, Alan Michael

For computational fluid dynamics, the governing equations are solved on a discretized domain of nodes, faces, and cells. The quality of the grid or mesh can be a driving source for error in the results. While refinement studies can help guide the creation of a mesh, grid quality is largely determined by user expertise and understanding of the flow physics. Adaptive mesh refinement is a technique for enriching the mesh during a simulation based on metrics for error, impact on important parameters, or location of important flow features. This can offload from the user some of the difficult and ambiguous decisions necessary when discretizing the domain. This work explores the implementation of adaptive mesh refinement in an implicit, unstructured, finite-volume solver. Consideration is made for applying modern computational techniques in the presence of hanging nodes and refined cells. The approach is developed to be independent of the flow solver in order to provide a path for augmenting existing codes. It is designed to be applicable for unsteady simulations, and refinement and coarsening of the grid do not impact the conservatism of the underlying numerics. The effects on fourth- and sixth-order numerical fluxes are explored. Provided the criteria for refinement are appropriately selected, solutions obtained using adapted meshes have no additional error when compared to results obtained on traditional, unadapted meshes. In order to leverage large-scale computational resources common today, the methods are parallelized using MPI. Parallel performance is considered for several test problems in order to assess scalability of both adapted and unadapted grids. Dynamic repartitioning of the mesh during refinement is crucial for load balancing an evolving grid. Development of the methods outlined here depends on a dual-memory approach that is described in detail. Validation of the solver developed here against a number of motivating problems shows favorable
Optimal data replication: A new approach to optimizing parallel EM algorithms on a mesh-connected multiprocessor for 3D PET image reconstruction

International Nuclear Information System (INIS)

Chen, C.M.; Lee, S.Y.

1995-01-01

The EM algorithm promises an estimated image with the maximal likelihood for 3D PET image reconstruction. However, due to its long computation time, the EM algorithm has not been widely used in practice. While several parallel implementations of the EM algorithm have been developed to make the EM algorithm feasible, they do not guarantee an optimal parallelization efficiency. In this paper, the authors propose a new parallel EM algorithm which maximizes the performance by optimizing data replication on a mesh-connected message-passing multiprocessor. To optimize data replication, the authors have formally derived the optimal allocation of shared data, group sizes, integration and broadcasting of replicated data as well as the scheduling of shared data accesses. The proposed parallel EM algorithm has been implemented on an iPSC/860 with 16 PEs. The experimental and theoretical results, which are consistent with each other, have shown that the proposed parallel EM algorithm could improve performance substantially over those using unoptimized data replication
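
For readers unfamiliar with the EM update being parallelized, here is a minimal serial sketch of one standard maximum-likelihood EM (MLEM) iteration for emission tomography. It is not the authors' parallel algorithm and says nothing about their data replication scheme; the dense list-of-lists system matrix `a` and the function name `mlem_step` are assumptions chosen to keep the example small.

```python
def mlem_step(a, y, x):
    """One MLEM update.

    a: system matrix as a list of rows, a[i][j] = contribution of voxel j to bin i.
    y: measured counts per detector bin.
    x: current image estimate (one non-negative value per voxel).
    Returns the updated image estimate.
    """
    n_bins, n_vox = len(a), len(x)

    # Forward-project the current estimate into detector space.
    forward = [sum(a[i][j] * x[j] for j in range(n_vox)) for i in range(n_bins)]

    # Ratio of measured to predicted counts (guard against empty bins).
    ratio = [y[i] / forward[i] if forward[i] > 0 else 0.0 for i in range(n_bins)]

    new_x = []
    for j in range(n_vox):
        sensitivity = sum(a[i][j] for i in range(n_bins))
        back = sum(a[i][j] * ratio[i] for i in range(n_bins))
        # Multiplicative update, normalized by voxel sensitivity.
        new_x.append(x[j] * back / sensitivity if sensitivity > 0 else 0.0)
    return new_x


# With an identity system matrix, one step reproduces the measurements.
print(mlem_step([[1.0, 0.0], [0.0, 1.0]], [2.0, 3.0], [1.0, 1.0]))
```

The forward and back projections both touch the whole system matrix on every iteration, which is what makes the computation long and the placement of shared data across processors matter.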
Massively parallel computation of conservation laws

Energy Technology Data Exchange (ETDEWEB)

Garbey, M [Univ. Claude Bernard, Villeurbanne (France); Levine, D [Argonne National Lab., IL (United States)

1990-01-01

The authors present a new method for computing solutions of conservation laws based on the use of cellular automata with the method of characteristics. The method exploits the high degree of parallelism available with cellular automata and retains important features of the method of characteristics. It yields high numerical accuracy and extends naturally to adaptive meshes and domain decomposition methods for perturbed conservation laws. They describe the method and its implementation for a Dirichlet problem with a single conservation law for the one-dimensional case. Numerical results for the one-dimensional law with the classical Burgers nonlinearity or the Buckley-Leverett equation show good numerical accuracy outside the neighborhood of the shocks. The error in the area of the shocks is of the order of the mesh size. The algorithm is well suited for execution on both massively parallel computers and vector machines. They present timing results for an Alliant FX/8, Connection Machine Model 2, and CRAY X-MP.
COMPUTATIONAL EFFICIENCY OF A MODIFIED SCATTERING KERNEL FOR FULL-COUPLED PHOTON-ELECTRON TRANSPORT PARALLEL COMPUTING WITH UNSTRUCTURED TETRAHEDRAL MESHES

Directory of Open Access Journals (Sweden)

JONG WOON KIM

2014-04-01

In this paper, we introduce a modified scattering kernel approach to avoid the unnecessarily repeated calculations involved with the scattering source calculation, and used it with parallel computing to effectively reduce the computation time. Its computational efficiency was tested for three-dimensional full-coupled photon-electron transport problems using our computer program which solves the multi-group discrete ordinates transport equation by using the discontinuous finite element method with unstructured tetrahedral meshes for complicated geometrical problems. The numerical tests show that we can improve speed up to 17∼42 times for the elapsed time per iteration using the modified scattering kernel, not only in the single CPU calculation but also in the parallel computing with several CPUs.
Parallel sorting algorithms

CERN Document Server

Akl, Selim G

1985-01-01

Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms, such as linear arrays, mesh-connected computers, cube-connected computers. Another example where algorithm can be applied is on the shared-memory SIMD (single instruction stream multiple data stream) computers in which the whole sequence to be sorted can fit in the
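
As one concrete instance of sorting on a linear array of processors, the sketch below simulates odd-even transposition sort in a single process. It is a generic textbook illustration and is not claimed to reproduce any specific algorithm from the book; each inner-loop comparison stands in for a compare-exchange between two neighboring processors that would run simultaneously on real hardware.

```python
def odd_even_transposition_sort(values):
    """Sort by alternating compare-exchange phases between array neighbors.

    With n items (one per simulated processor), n phases are sufficient.
    """
    a = list(values)
    n = len(a)
    for phase in range(n):
        # Even phases pair (0,1), (2,3), ...; odd phases pair (1,2), (3,4), ...
        start = phase % 2
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


print(odd_even_transposition_sort([5, 1, 4, 2, 3]))
```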
Connectivity editing for quad-dominant meshes

KAUST Repository

Peng, Chihan

2013-08-01

We propose a connectivity editing framework for quad-dominant meshes. In our framework, the user can edit the mesh connectivity to control the location, type, and number of irregular vertices (with more or fewer than four neighbors) and irregular faces (non-quads). We provide a theoretical analysis of the problem, discuss what edits are possible and impossible, and describe how to implement an editing framework that realizes all possible editing operations. In the results, we show example edits and illustrate the advantages and disadvantages of different strategies for quad-dominant mesh design. © 2013 The Author(s) Computer Graphics Forum © 2013 The Eurographics Association and John Wiley & Sons Ltd.
Parallel adaptation of general three-dimensional hybrid meshes

International Nuclear Information System (INIS)

Kavouklis, Christos; Kallinderis, Yannis

2010-01-01

A new parallel dynamic mesh adaptation and load balancing algorithm for general hybrid grids has been developed. The meshes considered in this work are composed of four kinds of elements: tetrahedra, prisms, hexahedra and pyramids, which poses a challenge to parallel mesh adaptation. Additional complexity imposed by the presence of multiple types of elements especially affects data migration, updates of local data structures and interpartition data structures. Efficient partition of hybrid meshes has been accomplished by transforming them to suitable graphs and using serial graph partitioning algorithms. Communication among processors is based on the faces of the interpartition boundary and the termination detection algorithm of Dijkstra is employed to ensure proper flagging of edges for refinement. An inexpensive dynamic load balancing strategy is introduced to redistribute work load among processors after adaptation. In particular, only the initial coarse mesh, with proper weighting, is balanced which yields savings in computation time and relatively simple implementation of mesh quality preservation rules, while facilitating coarsening of refined elements. Special algorithms are employed for (i) data migration and dynamic updates of the local data structures, (ii) determination of the resulting interpartition boundary and (iii) identification of the communication pattern of processors. Several representative applications are included to evaluate the method.
Parallel-In-Time For Moving Meshes

Energy Technology Data Exchange (ETDEWEB)

Falgout, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Manteuffel, T. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Southworth, B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Schroder, J. B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2016-02-04

With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High performance, highly parallel software has become a standard. However until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
Refficientlib: an efficient load-rebalanced adaptive mesh refinement algorithm for high-performance computational physics meshes

OpenAIRE

Baiges Aznar, Joan; Bayona Roa, Camilo Andrés

2017-01-01

In this paper we present a novel algorithm for adaptive mesh refinement in computational physics meshes in a distributed memory parallel setting. The proposed method is developed for nodally based parallel domain partitions where the nodes of the mesh belong to a single processor, whereas the elements can belong to multiple processors. Some of the main features of the algorithm presented in this paper a...
Mesh Partitioning Algorithm Based on Parallel Finite Element Analysis and Its Actualization

Directory of Open Access Journals (Sweden)

Lei Zhang

2013-01-01

In parallel computing based on finite element analysis, domain decomposition is a key preprocessing technique. Generally, a domain decomposition of a mesh can be realized by partitioning a graph converted from the finite element mesh. This paper discusses the method for graph partitioning and the way to implement mesh partitioning. Relevant software is introduced, along with the data structures and key functions of Metis and ParMetis. A mesh partitioning interface program based on these key functions is written, compiled, and tested. The results indicate some objective laws and characteristics to guide users who apply graph partitioning algorithms and software when writing PFEM programs, and good partitioning results can be achieved by implementing mesh partitioning through the program. The interface program can also be used directly by engineering researchers as a module of PFEM software, so that it lowers the threshold for applying graph partitioning algorithms, improves calculation efficiency, and promotes the application of graph theory and parallel computing.
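
The mesh-to-graph conversion described above can be sketched as follows. This is an illustrative example rather than the paper's interface program: it builds the dual graph of a triangle mesh (elements are adjacent when they share an edge) and stores it in the compressed adjacency layout, with arrays conventionally named `xadj` and `adjncy`, that Metis-style partitioners take as input. No partitioner is called, and the function name `dual_graph_csr` is hypothetical.

```python
from collections import defaultdict


def dual_graph_csr(triangles):
    """Build the dual graph of a triangle mesh in CSR form.

    triangles: list of (n0, n1, n2) node-id tuples, one per element.
    Returns (xadj, adjncy): neighbors of element e are
    adjncy[xadj[e]:xadj[e + 1]].
    """
    # Map each undirected mesh edge to the elements that contain it.
    edge_to_elems = defaultdict(list)
    for elem, (a, b, c) in enumerate(triangles):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_to_elems[(min(u, v), max(u, v))].append(elem)

    # Elements sharing an edge are neighbors in the dual graph.
    neighbors = [set() for _ in triangles]
    for elems in edge_to_elems.values():
        for i in elems:
            for j in elems:
                if i != j:
                    neighbors[i].add(j)

    # Flatten the adjacency sets into the two CSR arrays.
    xadj, adjncy = [0], []
    for nbrs in neighbors:
        adjncy.extend(sorted(nbrs))
        xadj.append(len(adjncy))
    return xadj, adjncy


xadj, adjncy = dual_graph_csr([(0, 1, 2), (1, 3, 2), (3, 4, 2)])
print(xadj, adjncy)
```

A graph partitioner would then assign each element (graph vertex) to a subdomain while trying to balance part sizes and cut few dual-graph edges, which corresponds to little inter-processor communication.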
Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes

Science.gov (United States)

Feng, Huiyu; Mavriplis, Catherine; VanderWijngaart, Rob; Biswas, Rupak

2004-01-01

High order methods are frequently used in computational simulation for their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics or flame combustion. In this work, we select the Mortar Element Method (MEM) to handle the non-conforming interfaces between elements. A new technique is introduced to efficiently implement MEM in 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps. In each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming/deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. This method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from the dynamic mesh adaptation are demonstrated by the reduction of the number of elements used and CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable. We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel
An Algorithm for Parallel Sn Sweeps on Unstructured Meshes

International Nuclear Information System (INIS)

Pautz, Shawn D.

2002-01-01

A new algorithm for performing parallel Sn sweeps on unstructured meshes is developed. The algorithm uses a low-complexity list ordering heuristic to determine a sweep ordering on any partitioned mesh. For typical problems and with 'normal' mesh partitionings, nearly linear speedups on up to 126 processors are observed. This is an important and desirable result, since although analyses of structured meshes indicate that parallel sweeps will not scale with normal partitioning approaches, no severe asymptotic degradation in the parallel efficiency is observed with modest (≤100) levels of parallelism. This result is a fundamental step in the development of efficient parallel Sn methods.
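
A sweep along one angular direction must visit each cell after all of its upwind neighbors. The sketch below shows the underlying ordering constraint with a plain topological sort (Kahn's algorithm). It is a stand-in for illustration and is not the list ordering heuristic developed in the paper; the `(upstream, downstream)` edge list and the function name `sweep_order` are assumptions.

```python
from collections import deque


def sweep_order(num_cells, upwind_edges):
    """Return a cell ordering that respects upwind dependencies.

    upwind_edges: (upstream, downstream) pairs for a single sweep direction.
    Raises ValueError if the dependencies contain a cycle.
    """
    indegree = [0] * num_cells
    downstream = [[] for _ in range(num_cells)]
    for up, down in upwind_edges:
        downstream[up].append(down)
        indegree[down] += 1

    # Cells with no upwind neighbors can be solved immediately.
    ready = deque(c for c in range(num_cells) if indegree[c] == 0)
    order = []
    while ready:
        cell = ready.popleft()
        order.append(cell)
        for d in downstream[cell]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)

    if len(order) != num_cells:
        raise ValueError("cyclic dependency: no valid sweep ordering")
    return order


print(sweep_order(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))
```

On a partitioned mesh the cells in the `ready` queue at any moment may belong to different processors, so the choice of which ready cell to process first affects how soon other processors can start, which is where a scheduling heuristic comes in.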
Parallel Performance Optimizations on Unstructured Mesh-based Simulations

Energy Technology Data Exchange (ETDEWEB)

Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid

2015-01-01

© The Authors. Published by Elsevier B.V. This paper addresses two key parallelization challenges of the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
An expert system for automatic mesh generation for Sn particle transport simulation in parallel environment

International Nuclear Information System (INIS)

Apisit, Patchimpattapong; Alireza, Haghighat; Shedlock, D.

2003-01-01

An expert system for generating an effective mesh distribution for the SN particle transport simulation has been developed. This expert system consists of two main parts: 1) an algorithm for generating an effective mesh distribution in a serial environment, and 2) an algorithm for inference of an effective domain decomposition strategy for parallel computing. For the first part, the algorithm prepares an effective mesh distribution considering problem physics and the spatial differencing scheme. For the second part, the algorithm determines a parallel-performance-index (PPI), which is defined as the ratio of the granularity to the degree-of-coupling. The parallel-performance-index provides expected performance of an algorithm depending on computing environment and resources. A large index indicates a high granularity algorithm with relatively low coupling among processors. This expert system has been successfully tested within the PENTRAN (Parallel Environment Neutral-Particle Transport) code system for simulating real-life shielding problems. (authors)

An expert system for automatic mesh generation for Sn particle transport simulation in parallel environment

Energy Technology Data Exchange (ETDEWEB)

Apisit, Patchimpattapong [Electricity Generating Authority of Thailand, Office of Corporate Planning, Bangkruai, Nonthaburi (Thailand); Alireza, Haghighat; Shedlock, D. [Florida Univ., Department of Nuclear and Radiological Engineering, Gainesville, FL (United States)

2003-07-01

An expert system for generating an effective mesh distribution for the SN particle transport simulation has been developed. This expert system consists of two main parts: 1) an algorithm for generating an effective mesh distribution in a serial environment, and 2) an algorithm for inference of an effective domain decomposition strategy for parallel computing. For the first part, the algorithm prepares an effective mesh distribution considering problem physics and the spatial differencing scheme. For the second part, the algorithm determines a parallel-performance-index (PPI), which is defined as the ratio of the granularity to the degree-of-coupling. The parallel-performance-index provides expected performance of an algorithm depending on computing environment and resources. A large index indicates a high granularity algorithm with relatively low coupling among processors. This expert system has been successfully tested within the PENTRAN (Parallel Environment Neutral-Particle Transport) code system for simulating real-life shielding problems. (authors)
A parallel adaptive mesh refinement algorithm for predicting turbulent non-premixed combusting flows

International Nuclear Information System (INIS)

Gao, X.; Groth, C.P.T.

2005-01-01

A parallel adaptive mesh refinement (AMR) algorithm is proposed for predicting turbulent non-premixed combusting flows characteristic of gas turbine engine combustors. The Favre-averaged Navier-Stokes equations governing mixture and species transport for a reactive mixture of thermally perfect gases in two dimensions, the two transport equations of the k-ω turbulence model, and the time-averaged species transport equations, are all solved using a fully coupled finite-volume formulation. A flexible block-based hierarchical data structure is used to maintain the connectivity of the solution blocks in the multi-block mesh and facilitate automatic solution-directed mesh adaptation according to physics-based refinement criteria. This AMR approach allows for anisotropic mesh refinement and the block-based data structure readily permits efficient and scalable implementations of the algorithm on multi-processor architectures. Numerical results for turbulent non-premixed diffusion flames, including cold- and hot-flow predictions for a bluff body burner, are described and compared to available experimental data. The numerical results demonstrate the validity and potential of the parallel AMR approach for predicting complex non-premixed turbulent combusting flows. (author)
Parallel Computing Characteristics of CUPID code under MPI and Hybrid environment

Energy Technology Data Exchange (ETDEWEB)

Lee, Jae Ryong; Yoon, Han Young [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of); Jeon, Byoung Jin; Choi, Hyoung Gwon [Seoul National Univ. of Science and Technology, Seoul (Korea, Republic of)

2014-05-15

In this paper, the characteristics of a parallel algorithm are presented for solving an elliptic-type equation of CUPID via the domain decomposition method using MPI, and the parallel performance is estimated in terms of scalability, expressed as the speedup ratio. In addition, the time-consuming pattern of major subroutines is studied. Two different grid systems are taken into account: 40,000 meshes for the coarse system and 320,000 meshes for the fine system. Since the matrix of the CUPID code differs according to whether the flow is single-phase or two-phase, the effect of matrix shape is evaluated. The effect of the preconditioner for the matrix solver is also investigated. Finally, the hybrid (OpenMP+MPI) parallel algorithm is introduced and discussed in detail for the pressure solver. The component-scale thermal-hydraulics code CUPID has been developed for two-phase flow analysis; it adopts a three-dimensional, transient, three-field model and has been parallelized to meet recent demand for long-transient and highly resolved multi-phase flow behavior. In this study, the parallel performance of the CUPID code was investigated in terms of scalability. The CUPID code was parallelized with the domain decomposition method. The MPI library was adopted to communicate information between neighboring domains. To manage the sparse matrix effectively, the CSR storage format is used. To take into account the characteristics of the pressure matrix, which becomes asymmetric for two-phase flow, both single-phase and two-phase calculations were run. In addition, the effect of the matrix size and preconditioning was also investigated. The fine mesh calculation shows better scalability than the coarse mesh because the coarse mesh is too small to warrant decomposing the computational domain excessively. The fine mesh can show good scalability when the geometry is divided with consideration of the ratio between computation and communication time. For a given mesh, single-phase flow
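
The abstract mentions the CSR (compressed sparse row) storage format for the sparse matrix. As a reminder of what that layout looks like, here is a minimal sketch of a CSR matrix-vector product, the kernel at the heart of iterative pressure solvers. The array names `values`, `col_index`, and `row_ptr` are conventional choices assumed for this example, not identifiers from the CUPID code.

```python
def csr_matvec(values, col_index, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values:    nonzero entries, stored row by row.
    col_index: column of each entry in `values`.
    row_ptr:   row r occupies values[row_ptr[r]:row_ptr[r + 1]].
    """
    y = []
    for row in range(len(row_ptr) - 1):
        total = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            total += values[k] * x[col_index[k]]
        y.append(total)
    return y


# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_index = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(csr_matvec(values, col_index, row_ptr, [1.0, 2.0, 3.0]))
```

Each row is processed independently, so under domain decomposition a process can own a contiguous block of rows and only needs the entries of `x` owned by neighboring domains, which is the information exchanged over MPI.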
Connectivity editing for quadrilateral meshes

KAUST Repository

Peng, Chihan

2011-12-12

We propose new connectivity editing operations for quadrilateral meshes with the unique ability to explicitly control the location, orientation, type, and number of the irregular vertices (valence not equal to four) in the mesh while preserving sharp edges. We provide theoretical analysis on what editing operations are possible and impossible and introduce three fundamental operations to move and re-orient a pair of irregular vertices. We argue that our editing operations are fundamental, because they only change the quad mesh in the smallest possible region and involve the fewest irregular vertices (i.e., two). The irregular vertex movement operations are supplemented by operations for the splitting, merging, canceling, and aligning of irregular vertices. We explain how the proposed high-level operations are realized through graph-level editing operations such as quad collapses, edge flips, and edge splits. The utility of these mesh editing operations is demonstrated by improving the connectivity of quad meshes generated from state-of-the-art quadrangulation techniques. © 2011 ACM.
Connectivity editing for quadrilateral meshes

KAUST Repository

Peng, Chihan; Zhang, Eugene; Kobayashi, Yoshihiro; Wonka, Peter

2011-01-01

We propose new connectivity editing operations for quadrilateral meshes with the unique ability to explicitly control the location, orientation, type, and number of the irregular vertices (valence not equal to four) in the mesh while preserving sharp edges. We provide theoretical analysis on what editing operations are possible and impossible and introduce three fundamental operations to move and re-orient a pair of irregular vertices. We argue that our editing operations are fundamental, because they only change the quad mesh in the smallest possible region and involve the fewest irregular vertices (i.e., two). The irregular vertex movement operations are supplemented by operations for the splitting, merging, canceling, and aligning of irregular vertices. We explain how the proposed high-level operations are realized through graph-level editing operations such as quad collapses, edge flips, and edge splits. The utility of these mesh editing operations is demonstrated by improving the connectivity of quad meshes generated from state-of-the-art quadrangulation techniques. © 2011 ACM.
High performance parallel computing of flows in complex geometries: I. Methods

International Nuclear Information System (INIS)

Gourdain, N; Gicquel, L; Montagnac, M; Vermorel, O; Staffelbach, G; Garcia, M; Boussuge, J-F; Gazaix, M; Poinsot, T

2009-01-01

Efficient numerical tools coupled with high-performance computers have become a key element of the design process in the fields of energy supply and transportation. However, flow phenomena that occur in complex systems such as gas turbines and aircraft are still not understood, mainly because of the models that are needed. In fact, most computational fluid dynamics (CFD) predictions as found today in industry focus on a reduced or simplified version of the real system (such as a periodic sector) and are usually solved with a steady-state assumption. This paper shows how to overcome such barriers and how such a new challenge can be addressed by developing flow solvers running on high-end computing platforms, using thousands of computing cores. Parallel strategies used by modern flow solvers are discussed with particular emphasis on mesh-partitioning, load balancing and communication. Two examples are used to illustrate these concepts: a multi-block structured code and an unstructured code. Parallel computing strategies used with both flow solvers are detailed and compared. This comparison indicates that mesh-partitioning and load balancing are more straightforward with unstructured grids than with multi-block structured meshes. However, the mesh-partitioning stage can be challenging for unstructured grids, mainly due to memory limitations of the newly developed massively parallel architectures. Finally, detailed investigations show that the impact of mesh-partitioning on the numerical CFD solutions, due to rounding errors and block splitting, may be of importance and should be accurately addressed before qualifying massively parallel CFD tools for routine industrial use.
Domain decomposition parallel computing for transient two-phase flow of nuclear reactors

Energy Technology Data Exchange (ETDEWEB)

Lee, Jae Ryong; Yoon, Han Young [KAERI, Daejeon (Korea, Republic of)]; Choi, Hyoung Gwon [Seoul National University, Seoul (Korea, Republic of)]

2016-05-15

KAERI (Korea Atomic Energy Research Institute) has been developing a multi-dimensional two-phase flow code named CUPID for multi-physics and multi-scale thermal hydraulics analysis of light water reactors (LWRs). The CUPID code has been validated against a set of conceptual problems and experimental data. In this work, the CUPID code has been parallelized based on the domain decomposition method with the Message Passing Interface (MPI) library. For domain decomposition, the CUPID code provides both manual and automatic methods with the METIS library. For effective memory management, the compressed sparse row (CSR) format is adopted, which is one of the methods for representing a sparse asymmetric matrix. The CSR format stores only the non-zero values and their positions (row and column). Verification against the fundamental problem set confirmed that the parallelization of CUPID was successful. Since the scalability of a parallel simulation is generally known to be better for a fine mesh system, three different scales of mesh system are considered: 40000 meshes for the coarse mesh system, 320000 meshes for the mid-size mesh system, and 2560000 meshes for the fine mesh system. In the given geometry, both single- and two-phase calculations were conducted. In addition, two types of preconditioners for the matrix solver were compared: diagonal and incomplete LU preconditioners. To enhance the parallel performance, OpenMP and MPI hybrid parallel computing for the pressure solver was examined. The hybrid calculation was found to improve scalability for multi-core parallel computation.
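
The CSR storage mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common three-array CSR layout (values, column indices, row pointers), in which the row position is stored implicitly through the row pointers; the function names `dense_to_csr` and `csr_matvec` are hypothetical and are not taken from the CUPID code.

```python
def dense_to_csr(matrix):
    """Convert a dense matrix (list of rows) to the three CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # non-zero value
                col_idx.append(j)  # its column index
        # row_ptr[i + 1] marks where row i ends in values/col_idx
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a dense vector x."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# A small asymmetric matrix with five non-zeros out of nine entries.
A = [
    [4.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [2.0, 0.0, 5.0],
]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

Only the five non-zero entries are stored, and the matrix-vector product touches only those entries, which is what makes the format attractive for the large sparse systems arising from a mesh.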
Representing and computing regular languages on massively parallel networks

Energy Technology Data Exchange (ETDEWEB)

Miller, M.I.; O'Sullivan, J.A. (Electronic Systems and Research Lab., of Electrical Engineering, Washington Univ., St. Louis, MO (US)); Boysam, B. (Dept. of Electrical, Computer and Systems Engineering, Rensselaer Polytechnic Inst., Troy, NY (US)); Smith, K.R. (Dept. of Electrical Engineering, Southern Illinois Univ., Edwardsville, IL (US))

1991-01-01

This paper proposes a general method for incorporating rule-based constraints corresponding to regular languages into stochastic inference problems, thereby allowing for a unified representation of stochastic and syntactic pattern constraints. The authors' approach first establishes the formal connection of rules to Chomsky grammars, and generalizes the original work of Shannon on the encoding of rule-based channel sequences to Markov chains of maximum entropy. This maximum entropy probabilistic view leads to Gibbs representations with potentials which have their number of minima growing at precisely the exponential rate that the language of deterministically constrained sequences grows. These representations are coupled to stochastic diffusion algorithms, which sample the language-constrained sequences by visiting the energy minima according to the underlying Gibbs probability law. The coupling to stochastic search methods yields the all-important practical result that fully parallel stochastic cellular automata may be derived to generate samples from the rule-based constraint sets. The production rules and neighborhood state structure of the language of sequences directly determine the necessary connection structures of the required parallel computing surface. Representations of this type have been mapped to the DAP-510 massively-parallel processor consisting of 1024 mesh-connected bit-serial processing elements for performing automated segmentation of electron-micrograph images.
LR: Compact connectivity representation for triangle meshes

Energy Technology Data Exchange (ETDEWEB)

Gurung, T; Luffel, M; Lindstrom, P; Rossignac, J

2011-01-28

We propose LR (Laced Ring) - a simple data structure for representing the connectivity of manifold triangle meshes. LR provides the option to store on average either 1.08 references per triangle or 26.2 bits per triangle. Its construction, from an input mesh that supports constant-time adjacency queries, has linear space and time complexity, and involves ordering most vertices along a nearly-Hamiltonian cycle. LR is best suited for applications that process meshes with fixed connectivity, as any changes to the connectivity require the data structure to be rebuilt. We provide an implementation of the set of standard random-access, constant-time operators for traversing a mesh, and show that LR often saves both space and traversal time over competing representations.
Software abstractions and computational issues in parallel structure adaptive mesh methods for electronic structure calculations

Energy Technology Data Exchange (ETDEWEB)

Kohn, S.; Weare, J.; Ong, E.; Baden, S.

1997-05-01

We have applied structured adaptive mesh refinement techniques to the solution of the LDA equations for electronic structure calculations. Local spatial refinement concentrates memory resources and numerical effort where it is most needed, near the atomic centers and in regions of rapidly varying charge density. The structured grid representation enables us to employ efficient iterative solver techniques such as conjugate gradient with FAC multigrid preconditioning. We have parallelized our solver using an object-oriented adaptive mesh refinement framework.
A parallel graded-mesh FDTD algorithm for human-antenna interaction problems.

Science.gov (United States)

Catarinucci, Luca; Tarricone, Luciano

2009-01-01

The finite difference time domain method (FDTD) is frequently used for the numerical solution of a wide variety of electromagnetic (EM) problems and, among them, those concerning human exposure to EM fields. In many practical cases related to the assessment of occupational EM exposure, large simulation domains are modeled and high space resolution adopted, so that strong memory and central processing unit power requirements have to be satisfied. To better afford the computational effort, the use of parallel computing is a winning approach; alternatively, subgridding techniques are often implemented. However, the simultaneous use of subgridding schemes and parallel algorithms is very new. In this paper, an easy-to-implement and highly-efficient parallel graded-mesh (GM) FDTD scheme is proposed and applied to human-antenna interaction problems, demonstrating its appropriateness in dealing with complex occupational tasks and showing its capability to guarantee the advantages of a traditional subgridding technique without affecting the parallel FDTD performance.
Parallelization of Unsteady Adaptive Mesh Refinement for Unstructured Navier-Stokes Solvers

Science.gov (United States)

Schwing, Alan M.; Nompelis, Ioannis; Candler, Graham V.

2014-01-01

This paper explores the implementation of the MPI parallelization in a Navier-Stokes solver using adaptive mesh refinement. Viscous and inviscid test problems are considered for the purpose of benchmarking, as are implicit and explicit time advancement methods. The main test problem for comparison includes effects from boundary layers and other viscous features and requires a large number of grid points for accurate computation. Experimental validation against double cone experiments in hypersonic flow is shown. The adaptive mesh refinement shows promise for a staple test problem in the hypersonic community. Extension to more advanced techniques for more complicated flows is described.
The DANTE Boltzmann transport solver: An unstructured mesh, 3-D, spherical harmonics algorithm compatible with parallel computer architectures

International Nuclear Information System (INIS)

McGhee, J.M.; Roberts, R.M.; Morel, J.E.

1997-01-01

A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner for scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated.
Applications of the parallel computing system using network

International Nuclear Information System (INIS)

Ido, Shunji; Hasebe, Hiroki

1994-01-01

Parallel programming is applied to multiple processors connected by Ethernet. Data exchange between the tasks located on each processing element is realized in two ways. One is sockets, a standard library on recent UNIX operating systems. The other is Parallel Virtual Machine (PVM), free network-connection software developed by ORNL that allows many workstations connected to a network to be used as a parallel computer. This paper discusses the availability of parallel computing using a network of UNIX workstations and compares it with specialized parallel systems (Transputer and iPSC/860) in a Monte Carlo simulation, which generally shows a high parallelization ratio. (author)
Parallel Block Structured Adaptive Mesh Refinement on Graphics Processing Units

Energy Technology Data Exchange (ETDEWEB)

Beckingsale, D. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Gaudin, W. P. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Hornung, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Gunney, B. T. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Gamblin, T. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Herdman, J. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Jarvis, S. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom)

2014-11-17

Block-structured adaptive mesh refinement is a technique that can be used when solving partial differential equations to reduce the number of zones necessary to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. Despite the potential for large savings in computational requirements and memory usage without a corresponding reduction in accuracy, AMR adds overhead in managing the mesh hierarchy, adding complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a native GPU-based AMR library, including: the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an eight-node cluster, and over four thousand nodes of Oak Ridge National Laboratory’s Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87× faster than the CPU-based implementation, and has been scaled to over four thousand GPUs using a combination of MPI and CUDA.
Parallel fuzzy connected image segmentation on GPU

OpenAIRE

Zhuge, Ying; Cao, Yong; Udupa, Jayaram K.; Miller, Robert W.

2011-01-01

Purpose: Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm impleme...
Parallel CFD Algorithms for Aerodynamical Flow Solvers on Unstructured Meshes. Parts 1 and 2

Science.gov (United States)

Barth, Timothy J.; Kwak, Dochan (Technical Monitor)

1995-01-01

The Advisory Group for Aerospace Research and Development (AGARD) has requested my participation in the lecture series entitled Parallel Computing in Computational Fluid Dynamics to be held at the von Karman Institute in Brussels, Belgium on May 15-19, 1995. In addition, a request has been made from the US Coordinator for AGARD at the Pentagon for NASA Ames to hold a repetition of the lecture series on October 16-20, 1995. I have been asked to be a local coordinator for the Ames event. All AGARD lecture series events have attendance limited to NATO allied countries. A brief of the lecture series is provided in the attached enclosure. Specifically, I have been asked to give two lectures of approximately 75 minutes each on the subject of parallel solution techniques for the fluid flow equations on unstructured meshes. The title of my lectures is "Parallel CFD Algorithms for Aerodynamical Flow Solvers on Unstructured Meshes" (Parts I-II). The contents of these lectures will be largely review in nature and will draw upon previously published work in this area. Topics of my lectures will include: (1) Mesh partitioning algorithms. Recursive techniques based on coordinate bisection, Cuthill-McKee level structures, and spectral bisection. (2) Newton's method for large scale CFD problems. Size and complexity estimates for Newton's method, modifications for ensuring global convergence. (3) Techniques for constructing the Jacobian matrix. Analytic and numerical techniques for Jacobian matrix-vector products, constructing the transposed matrix, extensions to optimization and homotopy theories. (4) Iterative solution algorithms. Practical experience with GMRES and BICG-STAB matrix solvers. (5) Parallel matrix preconditioning. Incomplete Lower-Upper (ILU) factorization, domain-decomposed ILU, approximate Schur complement strategies.
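
Of the partitioning techniques listed under topic (1), recursive coordinate bisection is the simplest to sketch. The version below is only an illustration under stated assumptions: it cuts at the median and cycles through the coordinate axes, whereas other variants cut along the longest extent of each subdomain; the function name `recursive_coordinate_bisection` is hypothetical and does not come from the lectures.

```python
def recursive_coordinate_bisection(points, levels, axis=0):
    """Split points into up to 2**levels parts by repeated median cuts."""
    if levels == 0 or len(points) <= 1:
        return [points]
    # Sort along the current axis and cut at the median position.
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    # Assumption: cycle through the axes on successive cuts.
    next_axis = (axis + 1) % len(points[0])
    left = recursive_coordinate_bisection(ordered[:mid], levels - 1, next_axis)
    right = recursive_coordinate_bisection(ordered[mid:], levels - 1, next_axis)
    return left + right


# Cell centers of a 4 x 4 mesh, split into four parts (two levels of cuts).
points = [(x, y) for x in range(4) for y in range(4)]
parts = recursive_coordinate_bisection(points, levels=2)
```

Each cut balances the number of points on both sides, so the four parts here hold four cells each; the method uses geometry only and ignores mesh connectivity, which is the gap that spectral bisection addresses.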
Mesh-based parallel code coupling interface

Energy Technology Data Exchange (ETDEWEB)

Wolf, K.; Steckel, B. (eds.) [GMD - Forschungszentrum Informationstechnik GmbH, St. Augustin (DE). Inst. fuer Algorithmen und Wissenschaftliches Rechnen (SCAI)

2001-04-01

MpCCI (mesh-based parallel code coupling interface) is an interface for multidisciplinary simulations. It provides industrial end-users as well as commercial code-owners with the facility to combine different simulation tools in one environment. Thereby new solutions for multidisciplinary problems will be created. This opens new application dimensions for existing simulation tools. This Book of Abstracts gives a short overview of ongoing activities in industry and research, all presented at the 2nd MpCCI User Forum in February 2001 at GMD Sankt Augustin. (orig.)
Discussion about the design for mesh data structure within the parallel framework

International Nuclear Information System (INIS)

Shi Guangmei; Wu Ruian; Wang Keying; Ji Xiaoyu; Hao Zhiming; Mo Jun; He Yingbo

2010-01-01

The mesh data structure is one of the fundamental data structures within a parallel framework, and the quality of its design and implementation affects the parallel capability of the framework. Through a comparative analysis of the architecture and fundamental data structures of some typical parallel frameworks, such as JASMIN, SIERRA, and ITAPS, the design philosophy of parallel frameworks is discussed. Borrowing ideas from the layered set of services design of the SIERRA framework, and taking into account the near-term objectives of the PANDA framework, this paper presents a rudimentary layered set of services for the PANDA framework. On this foundation, the definition and management of the mesh data structure, which sits in the bottom layer of the PANDA framework, are introduced in detail. The design and implementation of the parallel distributed mesh data structure of PANDA are discussed in particular. Extension of the PANDA framework and development of application programs based on it build on this work.
Parallel fuzzy connected image segmentation on GPU.

Science.gov (United States)

Zhuge, Ying; Cao, Yong; Udupa, Jayaram K; Miller, Robert W

2011-07-01

Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm implementation on NVIDIA's Compute Unified Device Architecture (CUDA) platform for segmenting medical image data sets. In the FC algorithm, there are two major computational tasks: (i) computing the fuzzy affinity relations and (ii) computing the fuzzy connectedness relations. These two tasks are implemented as CUDA kernels and executed on GPU. A dramatic improvement in speed for both tasks is achieved as a result. Our experiments based on three data sets of small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 24.4x, 18.1x, and 10.3x, respectively, for the three data sets on the NVIDIA Tesla C1060 over the implementation of the algorithm on CPU, and takes 0.25, 0.72, and 15.04 s, respectively, for the three data sets. The authors developed a parallel algorithm of the widely used fuzzy connected image segmentation method on the NVIDIA GPUs, which are far more cost- and speed-effective than both clusters of workstations and multiprocessing systems. A near-interactive speed of segmentation has been achieved, even for the large data set.

A software framework for the portable parallelization of particle-mesh simulations

DEFF Research Database (Denmark)

Sbalzarini, I.F.; Walther, Jens Honore; Polasek, B.

2006-01-01

Abstract: We present a software framework for the transparent and portable parallelization of simulations using particle-mesh methods. Particles are used to transport physical properties and a mesh is required in order to reinitialize the distorted particle locations, ensuring the convergence...
Broadcasting a message in a parallel computer

Science.gov (United States)

Berg, Jeremy E [Rochester, MN; Faraj, Ahmad A [Rochester, MN

2011-08-02

Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
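
The idea of a Hamiltonian path through one plane of a mesh network can be illustrated with a small simulation. This is a sketch under stated assumptions, not the disclosed method: the plane is taken to be a rectangular 2-D mesh, the path is a simple row-by-row "snake", and the root forwards the message hop by hop toward both ends of the path. The names `snake_path` and `broadcast_along_path` are hypothetical.

```python
def snake_path(rows, cols):
    """Visit every node of a rows x cols mesh once, moving between neighbors."""
    path = []
    for r in range(rows):
        # Even rows run left to right, odd rows right to left.
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in cs)
    return path


def broadcast_along_path(path, root, message):
    """Simulate hop-by-hop forwarding from root toward both ends of the path.

    Returns a dict mapping each node to (message, hops from the root).
    """
    start = path.index(root)
    received = {root: (message, 0)}
    for hops, node in enumerate(path[start + 1:], start=1):
        received[node] = (message, hops)
    for hops, node in enumerate(reversed(path[:start]), start=1):
        received[node] = (message, hops)
    return received


path = snake_path(3, 4)
received = broadcast_along_path(path, root=(1, 1), message="hello")
```

Because consecutive nodes on the path are physical neighbors, every hop uses a single point-to-point link, which suits a network optimized for point to point communication.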
A Computational Fluid Dynamics Algorithm on a Massively Parallel Computer

Science.gov (United States)

Jespersen, Dennis C.; Levit, Creon

1989-01-01

The discipline of computational fluid dynamics is demanding ever-increasing computational power to deal with complex fluid flow problems. We investigate the performance of a finite-difference computational fluid dynamics algorithm on a massively parallel computer, the Connection Machine. Of special interest is an implicit time-stepping algorithm; to obtain maximum performance from the Connection Machine, it is necessary to use a nonstandard algorithm to solve the linear systems that arise in the implicit algorithm. We find that the Connection Machine can achieve very high computation rates on both explicit and implicit algorithms. The performance of the Connection Machine puts it in the same class as today's most powerful conventional supercomputers.
Automatic mesh refinement and parallel load balancing for Fokker-Planck-DSMC algorithm

Science.gov (United States)

Küchlin, Stephan; Jenny, Patrick

2018-06-01

Recently, a parallel Fokker-Planck-DSMC algorithm for rarefied gas flow simulation in complex domains at all Knudsen numbers was developed by the authors. Fokker-Planck-DSMC (FP-DSMC) is an augmentation of the classical DSMC algorithm, which mitigates the near-continuum deficiencies in terms of computational cost of pure DSMC. At each time step, based on a local Knudsen number criterion, the discrete DSMC collision operator is dynamically switched to the Fokker-Planck operator, which is based on the integration of continuous stochastic processes in time, and has fixed computational cost per particle, rather than per collision. In this contribution, we present an extension of the previous implementation with automatic local mesh refinement and parallel load-balancing. In particular, we show how the properties of discrete approximations to space-filling curves enable an efficient implementation. Exemplary numerical studies highlight the capabilities of the new code.
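
Load balancing along a space-filling curve can be sketched as follows. The abstract does not say which curve is used, so this example assumes a Morton (Z-order) curve on a 2-D grid of cells, with one weight per cell standing in for its particle count; `morton_key` and `partition_along_curve` are hypothetical names, not functions of the authors' code.

```python
def morton_key(ix, iy, bits=16):
    """Interleave the bits of (ix, iy) to get a Z-order curve index."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


def partition_along_curve(cells, weights, n_ranks):
    """Cut the curve-ordered cells into n_ranks contiguous chunks of similar weight.

    Assumes the total weight is positive.
    """
    order = sorted(range(len(cells)), key=lambda i: morton_key(*cells[i]))
    total = sum(weights)
    owner = [0] * len(cells)
    running = 0.0
    for i in order:
        # Assign by the midpoint of this cell's weight interval along the curve.
        midpoint = running + 0.5 * weights[i]
        owner[i] = min(int(midpoint * n_ranks / total), n_ranks - 1)
        running += weights[i]
    return owner


# A 4 x 4 grid of cells with uniform weights, split among four ranks.
cells = [(x, y) for x in range(4) for y in range(4)]
owner = partition_along_curve(cells, [1.0] * len(cells), n_ranks=4)
```

The curve turns the multi-dimensional partitioning problem into cutting a one-dimensional list, and cells that are close on the curve tend to be close in space, so each rank receives a compact region (here one 2 x 2 quadrant per rank).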
A parallel algorithm for transient solid dynamics simulations with contact detection

International Nuclear Information System (INIS)

Attaway, S.; Hendrickson, B.; Plimpton, S.; Gardner, D.; Vaughan, C.; Heinstein, M.; Peery, J.

1996-01-01

Solid dynamics simulations with Lagrangian finite elements are used to model a wide variety of problems, such as the calculation of impact damage to shipping containers for nuclear waste and the analysis of vehicular crashes. Using parallel computers for these simulations has been hindered by the difficulty of searching efficiently for material surface contacts in parallel. A new parallel algorithm for calculation of arbitrary material contacts in finite element simulations has been developed and implemented in the PRONTO3D transient solid dynamics code. This paper will explore some of the issues involved in developing efficient, portable, parallel finite element models for nonlinear transient solid dynamics simulations. The contact-detection problem poses interesting challenges for efficient implementation of a solid dynamics simulation on a parallel computer. The finite element mesh is typically partitioned so that each processor owns a localized region of the finite element mesh. This mesh partitioning is optimal for the finite element portion of the calculation since each processor must communicate only with the few connected neighboring processors that share boundaries with the decomposed mesh. However, contacts can occur between surfaces that may be owned by any two arbitrary processors. Hence, a global search across all processors is required at every time step to search for these contacts. Load-imbalance can become a problem since the finite element decomposition divides the volumetric mesh evenly across processors but typically leaves the surface elements unevenly distributed. In practice, these complications have been limiting factors in the performance and scalability of transient solid dynamics on massively parallel computers. In this paper the authors present a new parallel algorithm for contact detection that overcomes many of these limitations.
Broadcasting collective operation contributions throughout a parallel computer

Science.gov (United States)

Faraj, Ahmad [Rochester, MN

2012-02-21

Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
A Parallel, Multi-Scale Watershed-Hydrologic-Inundation Model with Adaptively Switching Mesh for Capturing Flooding and Lake Dynamics

Science.gov (United States)

Ji, X.; Shen, C.

2017-12-01

Flood inundation presents substantial societal hazards and also changes biogeochemistry for systems like the Amazon. It is often expensive to simulate high-resolution flood inundation and propagation in a long-term watershed-scale model. Due to the Courant-Friedrichs-Lewy (CFL) restriction, high resolution and large local flow velocity both demand prohibitively small time steps even for parallel codes. Here we develop a parallel surface-subsurface process-based model enhanced by multi-resolution meshes that are adaptively switched on or off. The high-resolution overland flow meshes are enabled only when the flood wave invades the floodplains. This model applies a semi-implicit, semi-Lagrangian (SISL) scheme in solving dynamic wave equations, and with the assistance of the multi-mesh method, it also adaptively chooses the dynamic wave equation only in the area of deep inundation. Therefore, the model achieves a balance between accuracy and computational cost.
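
The effect of the CFL restriction on the time step can be shown with a short calculation. This sketch assumes the generic one-dimensional form of the condition for an explicit scheme, dt <= C * dx / |u|, with Courant number C; the function name `cfl_time_step` and the mesh sizes and speed below are illustrative assumptions, not values from the model described above.

```python
def cfl_time_step(dx, max_speed, courant=1.0):
    """Largest stable explicit time step under dt <= C * dx / |u|."""
    if max_speed <= 0:
        raise ValueError("max_speed must be positive")
    return courant * dx / max_speed


# Same wave speed (2 m/s) on a coarse 100 m mesh and a fine 10 m mesh.
coarse_dt = cfl_time_step(dx=100.0, max_speed=2.0)
fine_dt = cfl_time_step(dx=10.0, max_speed=2.0)
```

Refining the mesh tenfold shrinks the allowed time step tenfold, and a faster local flow shrinks it further, which is why enabling the high-resolution meshes only where and when they are needed saves so much work.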
On efficiency of fire simulation realization: parallelization with greater number of computational meshes

Science.gov (United States)

Valasek, Lukas; Glasa, Jan

2017-12-01

Current fire simulation systems are capable of utilizing the advantages of the high-performance computer (HPC) platforms available and of modeling fires efficiently in parallel. In this paper, efficiency of a corridor fire simulation on an HPC computer cluster is discussed. The parallel MPI version of Fire Dynamics Simulator is used for testing efficiency of selected strategies of allocation of computational resources of the cluster using a greater number of computational cores. Simulation results indicate that if the number of cores used is not equal to a multiple of the total number of cluster node cores there are allocation strategies which provide more efficient calculations.
A Generic Mesh Data Structure with Parallel Applications

Science.gov (United States)

Cochran, William Kenneth, Jr.

2009-01-01

High performance, massively-parallel multi-physics simulations are built on efficient mesh data structures. Most data structures are designed from the bottom up, focusing on the implementation of linear algebra routines. In this thesis, we explore a top-down approach to design, evaluating the various needs of many aspects of simulation, not just…
A highly efficient parallel algorithm for solving the neutron diffusion nodal equations on shared-memory computers

International Nuclear Information System (INIS)

Azmy, Y.Y.; Kirk, B.L.

1990-01-01

Modern parallel computer architectures offer an enormous potential for reducing CPU and wall-clock execution times of large-scale computations commonly performed in various applications in science and engineering. Recently, several authors have reported their efforts in developing and implementing parallel algorithms for solving the neutron diffusion equation on a variety of shared- and distributed-memory parallel computers. Testing of these algorithms for a variety of two- and three-dimensional meshes showed significant speedup of the computation. Even for very large problems (i.e., three-dimensional fine meshes) executed concurrently on a few nodes in serial (nonvector) mode, however, the measured computational efficiency is very low (40 to 86%). In this paper, the authors present a highly efficient (∼85 to 99.9%) algorithm for solving the two-dimensional nodal diffusion equations on the Sequent Balance 8000 parallel computer. Also presented is a model for the performance, represented by the efficiency, as a function of problem size and the number of participating processors. The model is validated through several tests and then extrapolated to larger problems and more processors to predict the performance of the algorithm in more computationally demanding situations.
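
The efficiency percentages quoted in this abstract can be read against the usual definitions of speedup and parallel efficiency. The sketch below assumes those standard definitions (speedup = serial time / parallel time, efficiency = speedup / number of processors); it is not the performance model of the paper, and the timings are made-up illustrative numbers.

```python
def speedup(t_serial, t_parallel):
    """Ratio of serial run time to parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Speedup divided by the number of processors (1.0 is ideal)."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical timings: 100 s on one processor, 16 s on eight processors.
s = speedup(100.0, 16.0)
e = efficiency(100.0, 16.0, n_procs=8)
```

With these numbers the speedup is 6.25 and the efficiency about 78%, meaning that roughly a fifth of the aggregate processor time is lost to overheads such as communication and synchronization.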
Parallel Implementation and Scaling of an Adaptive Mesh Discrete Ordinates Algorithm for Transport

International Nuclear Information System (INIS)

Howell, L H

2004-01-01

Block-structured adaptive mesh refinement (AMR) uses a mesh structure built up out of locally-uniform rectangular grids. In the BoxLib parallel framework used by the Raptor code, each processor operates on one or more of these grids at each refinement level. The decomposition of the mesh into grids and the distribution of these grids among processors may change every few timesteps as a calculation proceeds. Finer grids use smaller timesteps than coarser grids, requiring additional work to keep the system synchronized and ensure conservation between different refinement levels. In a paper for NECDC 2002 I presented preliminary results on implementation of parallel transport sweeps on the AMR mesh, conjugate gradient acceleration, accuracy of the AMR solution, and scalar speedup of the AMR algorithm compared to a uniform fully-refined mesh. This paper continues with a more in-depth examination of the parallel scaling properties of the scheme, both in single-level and multi-level calculations. Both sweeping and setup costs are considered. The algorithm scales with acceptable performance to several hundred processors. Trends suggest, however, that this is the limit for efficient calculations with traditional transport sweeps, and that modifications to the sweep algorithm will be increasingly needed as job sizes in the thousands of processors become common
Implementation of a cell-wise block-Gauss-Seidel iterative method for SN transport on a hybrid parallel computer architecture

International Nuclear Information System (INIS)

Rosa, Massimiliano; Warsa, James S.; Perks, Michael

2011-01-01

We have implemented a cell-wise, block-Gauss-Seidel (bGS) iterative algorithm for the solution of the S_n transport equations on the Roadrunner hybrid, parallel computer architecture. A compute node of this massively parallel machine comprises AMD Opteron cores that are linked to a Cell Broadband Engine™ (Cell/B.E.). LAPACK routines have been ported to the Cell/B.E. in order to make use of its parallel Synergistic Processing Elements (SPEs). The bGS algorithm is based on the LU factorization and solution of a linear system that couples the fluxes for all S_n angles and energy groups on a mesh cell. For every cell of a mesh that has been parallel decomposed on the higher-level Opteron processors, a linear system is transferred to the Cell/B.E. and the parallel LAPACK routines are used to compute a solution, which is then transferred back to the Opteron, where the rest of the computations for the S_n transport problem take place. Compared to standard parallel machines, a hundred-fold speedup of the bGS was observed on the hybrid Roadrunner architecture. Numerical experiments with strong and weak parallel scaling demonstrate the bGS method is viable and compares favorably to full parallel sweeps (FPS) on two-dimensional, unstructured meshes when it is applied to optically thick, multi-material problems. As expected, however, it is not as efficient as FPS in optically thin problems. (author)
Grouper: a compact, streamable triangle mesh data structure.

Science.gov (United States)

Luffel, Mark; Gurung, Topraj; Lindstrom, Peter; Rossignac, Jarek

2014-01-01

We present Grouper: an all-in-one compact file format, random-access data structure, and streamable representation for large triangle meshes. Similarly to the recently published SQuad representation, Grouper represents the geometry and connectivity of a mesh by grouping vertices and triangles into fixed-size records, most of which store two adjacent triangles and a shared vertex. Unlike SQuad, however, Grouper interleaves geometry with connectivity and uses a new connectivity representation to ensure that vertices and triangles can be stored in a coherent order that enables memory-efficient sequential stream processing. We present a linear-time construction algorithm that allows streaming out Grouper meshes using a small memory footprint while preserving the initial ordering of vertices. As a part of this construction, we show how the problem of assigning vertices and triangles to groups reduces to a well-known NP-hard optimization problem, and present a simple yet effective heuristic solution that performs well in practice. Our array-based Grouper representation also doubles as a triangle mesh data structure that allows direct access to vertices and triangles. Storing only about two integer references per triangle--i.e., less than the three vertex references stored with each triangle in a conventional indexed mesh format--Grouper answers both incidence and adjacency queries in amortized constant time. Our compact representation enables data-parallel processing on multicore computers, instant partitioning and fast transmission for distributed processing, as well as efficient out-of-core access. We demonstrate the versatility and performance benefits of Grouper using a suite of example meshes and processing kernels.
Grouper: A Compact, Streamable Triangle Mesh Data Structure

Energy Technology Data Exchange (ETDEWEB)

Luffel, Mark [Georgia Inst. of Technology, Atlanta, GA (United States). Visualization and Usability Center (GVU); Gurung, Topraj [Georgia Inst. of Technology, Atlanta, GA (United States). Visualization and Usability Center (GVU); Lindstrom, Peter [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Rossignac, Jarek [Georgia Inst. of Technology, Atlanta, GA (United States). Visualization and Usability Center (GVU)

2014-01-01

Here, we present Grouper: an all-in-one compact file format, random-access data structure, and streamable representation for large triangle meshes. Similarly to the recently published SQuad representation, Grouper represents the geometry and connectivity of a mesh by grouping vertices and triangles into fixed-size records, most of which store two adjacent triangles and a shared vertex. Unlike SQuad, however, Grouper interleaves geometry with connectivity and uses a new connectivity representation to ensure that vertices and triangles can be stored in a coherent order that enables memory-efficient sequential stream processing. We also present a linear-time construction algorithm that allows streaming out Grouper meshes using a small memory footprint while preserving the initial ordering of vertices. In this construction, we show how the problem of assigning vertices and triangles to groups reduces to a well-known NP-hard optimization problem, and present a simple yet effective heuristic solution that performs well in practice. Our array-based Grouper representation also doubles as a triangle mesh data structure that allows direct access to vertices and triangles. Storing only about two integer references per triangle--i.e., less than the three vertex references stored with each triangle in a conventional indexed mesh format--Grouper answers both incidence and adjacency queries in amortized constant time. Our compact representation enables data-parallel processing on multicore computers, instant partitioning and fast transmission for distributed processing, as well as efficient out-of-core access. We demonstrate the versatility and performance benefits of Grouper using a suite of example meshes and processing kernels.
Wakefield calculations on parallel computers

International Nuclear Information System (INIS)

Schoessow, P.

1990-01-01

The use of parallelism in the solution of wakefield problems is illustrated for two different computer architectures (SIMD and MIMD). Results are given for finite difference codes which have been implemented on a Connection Machine and an Alliant FX/8 and which are used to compute wakefields in dielectric loaded structures. Benchmarks on code performance are presented for both cases. 4 refs., 3 figs., 2 tabs
Parallel FE Electron-Photon Transport Analysis on 2-D Unstructured Mesh

International Nuclear Information System (INIS)

Drumm, C.R.; Lorenz, J.

1999-01-01

A novel solution method has been developed to solve the coupled electron-photon transport problem on an unstructured triangular mesh. Instead of tackling the first-order form of the linear Boltzmann equation, this approach is based on the second-order form in conjunction with the conventional multi-group discrete-ordinates approximation. The highly forward-peaked electron scattering is modeled with a multigroup Legendre expansion derived from the Goudsmit-Saunderson theory. The finite element method is used to treat the spatial dependence. The solution method is unique in that the space-direction dependence is solved simultaneously, eliminating the need for the conventional inner iterations, a method that is well suited for massively parallel computers
Parallel computations

CERN Document Server

1982-01-01

Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed. Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn
Computational fluid dynamics on a massively parallel computer

Science.gov (United States)

Jespersen, Dennis C.; Levit, Creon

1989-01-01

A finite difference code was implemented for the compressible Navier-Stokes equations on the Connection Machine, a massively parallel computer. The code is based on the ARC2D/ARC3D program and uses the implicit factored algorithm of Beam and Warming. The code uses odd-even elimination to solve linear systems. Timings and computation rates are given for the code, and a comparison is made with a Cray XMP.
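Odd-even elimination suits a SIMD machine because every equation is updated in the same way at each step. The following is a minimal sketch for a scalar tridiagonal system, written serially; it is an illustration under that assumption, not the code described in the abstract, and the function name and argument layout are hypothetical.

```python
def odd_even_elimination(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for x.

    a[0] and c[-1] are treated as zero. No pivoting is done, so the
    sketch assumes a well-conditioned (e.g. diagonally dominant) system.
    """
    n = len(b)
    if n == 0:
        return []
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0] = 0.0
    c[-1] = 0.0
    stride = 1
    while stride < n:
        na, nb, nc, nd = a[:], b[:], c[:], d[:]
        # Each i is independent of the others, so on a SIMD machine
        # all n updates of one step could run at the same time.
        for i in range(n):
            lo, hi = i - stride, i + stride
            na[i] = 0.0
            nc[i] = 0.0
            if lo >= 0:  # eliminate x[lo] using equation lo
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:  # eliminate x[hi] using equation hi
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2  # coupling distance doubles each step
    # After about log2(n) steps the system is diagonal.
    return [d[i] / b[i] for i in range(n)]
```

Each pass doubles the distance between coupled unknowns, so the number of passes grows like log2(n) while the work per pass stays uniform across equations.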
An Implementation and Parallelization of the Scale Space Meshing Algorithm

Directory of Open Access Journals (Sweden)

Julie Digne

2015-11-01

Creating an interpolating mesh from an unorganized set of oriented points is a difficult problem which is often overlooked. Most methods focus indeed on building a watertight smoothed mesh by defining some function whose zero level set is the surface of the object. However in some cases it is crucial to build a mesh that interpolates the points and does not fill the acquisition holes: either because the data are sparse and trying to fill the holes would create spurious artifacts or because the goal is to explore visually the data exactly as they were acquired without any smoothing process. In this paper we detail a parallel implementation of the Scale-Space Meshing algorithm, which builds on the scale-space framework for reconstructing a high precision mesh from an input oriented point set. This algorithm first smoothes the point set, producing a singularity free shape. It then uses a standard mesh reconstruction technique, the Ball Pivoting Algorithm, to build a mesh from the smoothed point set. The final step consists in back-projecting the mesh built on the smoothed positions onto the original point set. The result of this process is an interpolating, hole-preserving surface mesh reconstruction.
High performance parallel computers for science

International Nuclear Information System (INIS)

Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

1989-01-01

This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction

Connectivity editing for quad-dominant meshes

KAUST Repository

Peng, Chihan; Wonka, Peter

2013-01-01

and illustrate the advantages and disadvantages of different strategies for quad-dominant mesh design. © 2013 The Author(s) Computer Graphics Forum © 2013 The Eurographics Association and John Wiley & Sons Ltd.
A new adaptive mesh refinement data structure with an application to detonation

Science.gov (United States)

Ji, Hua; Lien, Fue-Sang; Yee, Eugene

2010-11-01

A new Cell-based Structured Adaptive Mesh Refinement (CSAMR) data structure is developed. In our CSAMR data structure, Cartesian-like indices are used to identify each cell. With these stored indices, the information on the parent, children and neighbors of a given cell can be accessed simply and efficiently. Owing to the usage of these indices, the computer memory required for storage of the proposed AMR data structure is only 5/8 word per cell, in contrast to the conventional oct-tree [P. MacNeice, K.M. Olson, C. Mobary, R. deFainchtein, C. Packer, PARAMESH: a parallel adaptive mesh refinement community toolkit, Comput. Phys. Commun. 330 (2000) 126] and the fully threaded tree (FTT) [A.M. Khokhlov, Fully threaded tree algorithms for adaptive mesh fluid dynamics simulations, J. Comput. Phys. 143 (1998) 519] data structures which require, respectively, 19 and 2 3/8 words per cell for storage of the connectivity information. Because the connectivity information (e.g., parent, children and neighbors) of a cell in our proposed AMR data structure can be accessed using only the cell indices, a tree structure which was required in previous approaches for the organization of the AMR data is no longer needed for this new data structure. Instead, a much simpler hash table structure is used to maintain the AMR data, with the entry keys in the hash table obtained directly from the explicitly stored cell indices. The proposed AMR data structure simplifies the implementation and parallelization of an AMR code. Two three-dimensional test cases are used to illustrate and evaluate the computational performance of the new CSAMR data structure.
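The idea of deriving connectivity from cell indices and keeping cells in a hash table can be sketched as follows. The key layout `(level, i, j, k)`, the factor-of-two refinement from a single root cell, and all function names are assumptions made for illustration; the abstract does not specify the actual index encoding.

```python
from itertools import product


def parent(key):
    """Key of the parent cell, or None for the root level."""
    level, i, j, k = key
    if level == 0:
        return None
    return (level - 1, i // 2, j // 2, k // 2)


def children(key):
    """Keys of the eight children produced by refining a cell."""
    level, i, j, k = key
    return [(level + 1, 2 * i + a, 2 * j + b, 2 * k + c)
            for a, b, c in product((0, 1), repeat=3)]


def face_neighbors(key):
    """Same-level face neighbors that lie inside the domain."""
    level, i, j, k = key
    n = 1 << level  # cells per axis at this level, assuming one root cell
    result = []
    for axis in range(3):
        for step in (-1, 1):
            idx = [i, j, k]
            idx[axis] += step
            if 0 <= idx[axis] < n:
                result.append((level, idx[0], idx[1], idx[2]))
    return result


def refine(cells, key):
    """Replace a leaf in the hash table by its children (values copied)."""
    value = cells.pop(key)
    for child in children(key):
        cells[child] = value
```

Here a Python `dict` plays the role of the hash table: no pointers to parents, children or neighbors are stored, because each of those keys is computed from the cell's own indices and then looked up.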
Parallel algorithms and cluster computing

CERN Document Server

Hoffmann, Karl Heinz

2007-01-01

This book presents major advances in high performance computing as well as major advances due to high performance computing. It contains a collection of papers in which results achieved in the collaboration of scientists from computer science, mathematics, physics, and mechanical engineering are presented. From the science problems to the mathematical algorithms and on to the effective implementation of these algorithms on massively parallel and cluster computers we present state-of-the-art methods and technology as well as exemplary results in these fields. This book shows that problems which seem superficially distinct become intimately connected on a computational level.
A scalable PC-based parallel computer for lattice QCD

International Nuclear Information System (INIS)

Fodor, Z.; Katz, S.D.; Papp, G.

2003-01-01

A PC-based parallel computer for medium/large scale lattice QCD simulations is suggested. The Eoetvoes Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7 GHz nodes. Gigabit Ethernet cards are used for nearest neighbor communication in a two-dimensional mesh. The sustained performance for dynamical staggered (Wilson) quarks on large lattices is around 70 (110) GFlops. The exceptional price/performance ratio is below $1/Mflop
A scalable PC-based parallel computer for lattice QCD

International Nuclear Information System (INIS)

Fodor, Z.; Papp, G.

2002-09-01

A PC-based parallel computer for medium/large scale lattice QCD simulations is suggested. The Eoetvoes Univ., Inst. Theor. Phys. cluster consists of 137 Intel P4-1.7 GHz nodes. Gigabit Ethernet cards are used for nearest neighbor communication in a two-dimensional mesh. The sustained performance for dynamical staggered (Wilson) quarks on large lattices is around 70 (110) GFlops. The exceptional price/performance ratio is below $1/Mflop. (orig.)
Accurate reaction-diffusion operator splitting on tetrahedral meshes for parallel stochastic molecular simulations

Energy Technology Data Exchange (ETDEWEB)

Hepburn, I.; De Schutter, E., E-mail: erik@oist.jp [Computational Neuroscience Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904 0495 (Japan); Theoretical Neurobiology & Neuroengineering, University of Antwerp, Antwerp 2610 (Belgium); Chen, W. [Computational Neuroscience Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904 0495 (Japan)

2016-08-07

Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.
Ordering schemes for parallel processing of certain mesh problems

International Nuclear Information System (INIS)

O'Leary, D.

1984-01-01

In this work, some ordering schemes for mesh points are presented which enable algorithms such as the Gauss-Seidel or SOR iteration to be performed efficiently for the nine-point operator finite difference method on computers consisting of a two-dimensional grid of processors. Convergence results are presented for the discretization of u_xx + u_yy on a uniform mesh over a square, showing that the spectral radius of the iteration for these orderings is no worse than that for the standard row by row ordering of mesh points. Further applications of these mesh point orderings to network problems, more general finite difference operators, and picture processing problems are noted.
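One well-known ordering of this kind is the four-color ordering, sketched below for the standard nine-point Laplacian. The abstract does not say which orderings the paper proposes, so this is a swapped-in illustration, and the function names are hypothetical. No two points coupled by the nine-point stencil share a color, so all points of one color can be updated at the same time.

```python
def color(i, j):
    """Four-color label from the parities of the row and column index."""
    return (i % 2) + 2 * (j % 2)


def coloring_is_valid(n):
    """Check that no two nine-point-stencil neighbors share a color."""
    for i in range(n):
        for j in range(n):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if di == 0 and dj == 0:
                        continue
                    ni, nj = i + di, j + dj
                    if 0 <= ni < n and 0 <= nj < n:
                        if color(ni, nj) == color(i, j):
                            return False
    return True


def gauss_seidel_sweep(u, f, h):
    """One Gauss-Seidel sweep for u_xx + u_yy = f, color by color.

    Uses the nine-point stencil
    (4*edges + corners - 20*u) / (6*h*h) = f
    and leaves the boundary values of u unchanged.
    """
    n = len(u)
    for c in range(4):
        # Points of color c do not depend on each other, so this
        # inner double loop could be executed in parallel.
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if color(i, j) != c:
                    continue
                edges = u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                corners = (u[i - 1][j - 1] + u[i - 1][j + 1]
                           + u[i + 1][j - 1] + u[i + 1][j + 1])
                u[i][j] = (4.0 * edges + corners - 6.0 * h * h * f[i][j]) / 20.0
    return u
```

A two-color (red-black) ordering would not be enough here, because the diagonal couplings of the nine-point stencil connect points of the same red-black color.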
Interoperable mesh components for large-scale, distributed-memory simulations

International Nuclear Information System (INIS)

Devine, K; Leung, V; Diachin, L; Miller, M

2009-01-01

SciDAC applications have a demonstrated need for advanced software tools to manage the complexities associated with sophisticated geometry, mesh, and field manipulation tasks, particularly as computer architectures move toward the petascale. In this paper, we describe a software component - an abstract data model and programming interface - designed to provide support for parallel unstructured mesh operations. We describe key issues that must be addressed to successfully provide high-performance, distributed-memory unstructured mesh services and highlight some recent research accomplishments in developing new load balancing and MPI-based communication libraries appropriate for leadership class computing. Finally, we give examples of the use of parallel adaptive mesh modification in two SciDAC applications.
Parallel computation for distributed parameter system-from vector processors to Adena computer

Energy Technology Data Exchange (ETDEWEB)

Nogi, T

1983-04-01

Research on advanced parallel hardware and software architectures for very high-speed computation deserves and needs more support and attention to fulfil its promise. Novel architectures for parallel processing are being made ready. Architectures for parallel processing can be roughly divided into two groups. One is a vector processor in which a single central processing unit involves multiple vector-arithmetic registers. The other is a processor array in which slave processors are connected to a host processor to perform parallel computation. In this review, the concept and data structure of the Adena (alternating-direction edition nexus array) architecture, which is conformable to distributed-parameter simulation algorithms, are described. 5 references.
Fast electrostatic force calculation on parallel computer clusters

International Nuclear Information System (INIS)

Kia, Amirali; Kim, Daejoong; Darve, Eric

2008-01-01

The fast multipole method (FMM) and smooth particle mesh Ewald (SPME) are well known fast algorithms to evaluate long range electrostatic interactions in molecular dynamics and other fields. FMM is a multi-scale method which reduces the computation cost by approximating the potential due to a group of particles at a large distance using few multipole functions. This algorithm scales like O(N) for N particles. The SPME algorithm is an O(N ln N) method which is based on an interpolation of the Fourier space part of the Ewald sum and evaluating the resulting convolutions using the fast Fourier transform (FFT). Those algorithms suffer from relatively poor efficiency on large parallel machines, especially for mid-size problems around hundreds of thousands of atoms. A variation of the FMM, called PWA, based on plane wave expansions is presented in this paper. A new parallelization strategy for PWA, which takes advantage of the specific form of this expansion, is described. Its parallel efficiency is compared with SPME through detailed timing measurements on two different computer clusters.
Frontiers of massively parallel scientific computation

International Nuclear Information System (INIS)

Fischer, J.R.

1987-07-01

Practical applications using massively parallel computer hardware first appeared during the 1980s. Their development was motivated by the need for computing power orders of magnitude beyond that available today for tasks such as numerical simulation of complex physical and biological processes, generation of interactive visual displays, satellite image analysis, and knowledge based systems. Representative of the first generation of this new class of computers is the Massively Parallel Processor (MPP). A team of scientists was provided the opportunity to test and implement their algorithms on the MPP. The first results are presented. The research spans a broad variety of applications including Earth sciences, physics, signal and image processing, computer science, and graphics. The performance of the MPP was very good. Results obtained using the Connection Machine and the Distributed Array Processor (DAP) are presented
Development of real-time visualization system for Computational Fluid Dynamics on parallel computers

International Nuclear Information System (INIS)

Muramatsu, Kazuhiro; Otani, Takayuki; Matsumoto, Hideki; Takei, Toshifumi; Doi, Shun

1998-03-01

A real-time visualization system for computational fluid dynamics was developed for a network connecting a parallel computing server and a client terminal. Using the system, a user at the client terminal can visualize the results of a CFD (Computational Fluid Dynamics) simulation while the computation is still running on the parallel server. Using a GUI (Graphical User Interface) on the client terminal, the user is also able to change parameters of the analysis and visualization while the calculation runs. The system carries out both the CFD simulation and the generation of pixel image data on the parallel computer, and compresses the data. The amount of data sent from the parallel computer to the client is therefore much smaller than without compression, so images appear quickly on the client. Parallelization of the image data generation is based on the Owner Computation Rule. The GUI on the client is built as a Java applet, so real-time visualization is possible on any client PC that has a Web browser. (author)
Climate models on massively parallel computers

International Nuclear Information System (INIS)

Vitart, F.; Rouvillois, P.

1993-01-01

First results obtained on massively parallel computers (Multiple Instruction Multiple Data and Single Instruction Multiple Data) make it possible to consider building coupled models with high resolution. This would allow simulation of the thermohaline circulation and other interaction phenomena between atmosphere and ocean. The increase in computing power, and the resulting improvement in resolution, will lead us to revise our approximations. The hydrostatic approximation (in ocean circulation) will no longer be valid when the grid mesh is smaller than a few kilometers: other models will have to be found. The expertise in numerical analysis acquired at the Center of Limeil-Valenton (CEL-V) will be used again to devise global models that take into account atmosphere, ocean, ice floe and biosphere, allowing climate simulation down to a regional scale.
Nested dissection on a mesh-connected processor array

International Nuclear Information System (INIS)

Worley, P.H.; Schreiber, R.

1986-01-01

The authors present a parallel implementation of Gaussian elimination without pivoting using the nested dissection ordering for solving Ax=b where A is an N x N symmetric positive definite matrix. If the graph of A is a √N x √N finite element mesh then a parallel complexity of O(√N) can be achieved for Gaussian elimination with the nested dissection ordering. The authors' implementation achieves this parallel complexity on a two dimensional MIMD processor array with N processors and nearest neighbors interconnections. Thus nested dissection is a near optimal algorithm for this problem on this interconnection topology. The parallel implementation on this architecture requires 158√N + O(log_2(√N)) parallel floating point multiplications. It is faster than a Kung-Leiserson systolic array for banded matrices for N≥961, and faster than a serial implementation for N as small as 9
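The nested dissection ordering itself can be sketched for a rectangular grid of unknowns. This is a generic recursive version with one-line separators, offered as an illustration; it says nothing about how the authors map the elimination onto the processor array, and the function name is hypothetical.

```python
def nested_dissection_order(r0, r1, c0, c1):
    """Elimination order for grid points with rows r0..r1 and columns c0..c1.

    The region is split by a one-line separator across its longer side.
    Both halves are ordered first (recursively), and the separator
    points are ordered last.
    """
    if r0 > r1 or c0 > c1:
        return []
    height = r1 - r0 + 1
    width = c1 - c0 + 1
    if height * width == 1:
        return [(r0, c0)]
    if height >= width:
        mid = (r0 + r1) // 2
        first = nested_dissection_order(r0, mid - 1, c0, c1)
        second = nested_dissection_order(mid + 1, r1, c0, c1)
        separator = [(mid, c) for c in range(c0, c1 + 1)]
    else:
        mid = (c0 + c1) // 2
        first = nested_dissection_order(r0, r1, c0, mid - 1)
        second = nested_dissection_order(r0, r1, mid + 1, c1)
        separator = [(r, mid) for r in range(r0, r1 + 1)]
    return first + second + separator
```

Because the two halves on either side of a separator share no mesh edges, their eliminations are independent, which is the source of parallelism that such an ordering exposes.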
Solving the Fokker-Planck equation on a massively parallel computer

International Nuclear Information System (INIS)

Mirin, A.A.

1990-01-01

The Fokker-Planck package FPPAC has been converted to the Connection Machine 2 (CM2). For fine mesh cases the CM2 outperforms the Cray-2 when it comes to time-integrating the difference equations. For long Legendre expansions the CM2 is also faster at computing the Fokker-Planck coefficients. 3 refs
Measurement campaign on connectivity of mesh networks formed by mobile devices

DEFF Research Database (Denmark)

Pietrarca, Beatrice; Sasso, Giovanni; Perrucci, Gian Paolo

2007-01-01

This paper reports the results of a measurement campaign on the connectivity level of mobile devices using Bluetooth (BT) to form cooperative mobile mesh networks. Such mobile mesh networks composed of mobile devices are the basis for any peer-to-peer communication like wireless grids or social...
Parallel computing works

Energy Technology Data Exchange (ETDEWEB)

1991-10-23

An account of the Caltech Concurrent Computation Program (C^3P), a five year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C^3P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C^3P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
Computational mesh generation for vascular structures with deformable surfaces

International Nuclear Information System (INIS)

Putter, S. de; Laffargue, F.; Breeuwer, M.; Vosse, F.N. van de; Gerritsen, F.A.; Philips Medical Systems, Best

2006-01-01

Computational blood flow and vessel wall mechanics simulations for vascular structures are becoming an important research tool for patient-specific surgical planning and intervention. An important step in the modelling process for patient-specific simulations is the creation of the computational mesh based on the segmented geometry. Most known solutions either require a large amount of manual processing or lead to a substantial difference between the segmented object and the actual computational domain. We have developed a chain of algorithms that lead to a closely related implementation of image segmentation with deformable models and 3D mesh generation. The resulting processing chain is very robust and leads both to an accurate geometrical representation of the vascular structure as well as high quality computational meshes. The chain of algorithms has been tested on a wide variety of shapes. A benchmark comparison of our mesh generation application with five other available meshing applications clearly indicates that the new approach outperforms the existing methods in the majority of cases. (orig.)
Parallel Computing Characteristics of Two-Phase Thermal-Hydraulics code, CUPID

International Nuclear Information System (INIS)

Lee, Jae Ryong; Yoon, Han Young

2013-01-01

The parallelized CUPID code has been shown to reproduce multi-dimensional thermal-hydraulic behavior through validation against various conceptual problems and experimental data. In this paper, the parallel computing characteristics of the CUPID code were investigated in terms of scalability. The code was parallelized with the domain decomposition method, and the MPI library was adopted to communicate information at the interface cells. Both single- and two-phase simulations are taken into account. Since the scalability of a parallel simulation is known to be better for fine mesh systems, two types of mesh system are considered. In addition, the dependence on the preconditioner for the matrix solver was also compared. The scalability for single-phase flow is better than that for two-phase flow because fewer iterations are needed to solve the pressure matrix. As the number of mesh cells increases, the scalability improves. For a given mesh, single-phase flow simulation with the diagonal preconditioner shows the best speedup. However, for two-phase flow simulation, the ILU preconditioner is recommended since it reduces the overall simulation time.
Parallel computing works!

CERN Document Server

Fox, Geoffrey C; Messina, Guiseppe C

2014-01-01

A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop

Acceleration and parallelization calculation of EFEN-SP_3 method

International Nuclear Information System (INIS)

Yang Wen; Zheng Youqi; Wu Hongchun; Cao Liangzhi; Li Yunzhao

2013-01-01

Because the exponential function expansion nodal-SP_3 (EFEN-SP_3) method needs further improvement in computational efficiency to routinely carry out PWR whole core pin-by-pin calculation, coarse mesh acceleration and spatial parallelization were investigated in this paper. The coarse mesh acceleration was built by considering the discontinuity factor on each coarse mesh interface and preserving neutron balance within each coarse mesh in space, angle and energy. The spatial parallelization based on MPI was implemented by guaranteeing load balancing and minimizing communication cost to take full advantage of modern computing and storage capabilities. Numerical results based on a commercial nuclear power reactor demonstrate a speedup ratio of about 40 for the coarse mesh acceleration and a parallel efficiency of higher than 60% with 40 CPUs for the spatial parallelization. With these two improvements, the EFEN code can complete a PWR whole core pin-by-pin calculation with 289 × 289 × 218 meshes and 4 energy groups within 100 s by using 48 CPUs (2.40 GHz frequency). (authors)
A class of parallel algorithms for computation of the manipulator inertia matrix

Science.gov (United States)

Fijany, Amir; Bejczy, Antal K.

1989-01-01

Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.
Parallel implementation of a Lagrangian-based model on an adaptive mesh in C++: Application to sea-ice

Science.gov (United States)

Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar

2017-12-01

We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian scheme (ALE). The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown to be sufficient to perform simulations for state-of-the-art sea ice forecasting and geophysical process studies over geographical domains of several million square kilometers, like the Arctic region.
Reference Computational Meshing Strategy for Computational Fluid Dynamics Simulation of Departure from Nucleate Boiling

Energy Technology Data Exchange (ETDEWEB)

Pointer, William David [ORNL

2017-08-01

The objective of this effort is to establish a strategy and process for generation of suitable computational mesh for computational fluid dynamics simulations of departure from nucleate boiling in a 5 by 5 fuel rod assembly held in place by PWR mixing vane spacer grids. This mesh generation process will support ongoing efforts to develop, demonstrate and validate advanced multi-phase computational fluid dynamics methods that enable more robust identification of dryout conditions and DNB occurrence. Building upon prior efforts and experience, multiple computational meshes were developed using the native mesh generation capabilities of the commercial CFD code STAR-CCM+. These meshes were used to simulate two test cases from the Westinghouse 5 by 5 rod bundle facility. The sensitivity of predicted quantities of interest to the mesh resolution was then established using two evaluation methods, the Grid Convergence Index method and the Least Squares method. This evaluation suggests that the Least Squares method can reliably establish the uncertainty associated with local parameters such as vector velocity components at a point in the domain or surface averaged quantities such as outlet velocity magnitude. However, neither method is suitable for characterization of uncertainty in global extrema such as peak fuel surface temperature, primarily because such parameters are not necessarily associated with a fixed point in space. This shortcoming is significant because the current generation algorithm for identification of DNB event conditions relies on identification of such global extrema. Ongoing efforts to identify DNB based on local surface conditions will address this challenge
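For readers unfamiliar with the Grid Convergence Index mentioned above, the following is a minimal sketch of the commonly used three-grid form, assuming a constant refinement ratio r and a safety factor of 1.25. The function names and the synthetic values are illustrative assumptions; the report does not state which GCI variant or which data it used.

```python
import math


def observed_order(f_fine: float, f_medium: float, f_coarse: float, r: float) -> float:
    """Observed order of convergence p from three solutions on grids refined by ratio r."""
    return math.log(abs((f_coarse - f_medium) / (f_medium - f_fine))) / math.log(r)


def gci_fine(f_fine: float, f_medium: float, r: float, p: float, safety: float = 1.25) -> float:
    """Fine-grid GCI as a relative uncertainty band around the fine-grid value."""
    rel_err = abs((f_medium - f_fine) / f_fine)
    return safety * rel_err / (r ** p - 1.0)


def richardson(f_fine: float, f_medium: float, r: float, p: float) -> float:
    """Richardson-extrapolated estimate of the grid-independent value."""
    return f_fine + (f_fine - f_medium) / (r ** p - 1.0)


# Synthetic data (not from the report): f(h) = 1 + 0.1 * h**2 at h = 1, 2, 4.
f1, f2, f3 = 1.1, 1.4, 2.6
r = 2.0
p = observed_order(f1, f2, f3, r)
gci = gci_fine(f1, f2, r, p)
f_ext = richardson(f1, f2, r, p)
print(f"p = {p:.3f}, GCI = {gci:.2%}, extrapolated = {f_ext:.3f}")
```

The sketch evaluates a quantity at a fixed location on each grid, which is the situation the abstract says such methods handle; a global extremum that moves between grids does not fit this pattern.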
Fast precalculated triangular mesh algorithm for 3D binary computer-generated holograms.

Science.gov (United States)

Yang, Fan; Kaczorowski, Andrzej; Wilkinson, Tim D

2014-12-10

A new method for constructing computer-generated holograms using a precalculated triangular mesh is presented. The speed of calculation can be increased dramatically by exploiting both the precalculated base triangle and GPU parallel computing. Unlike algorithms using point-based sources, this method can reconstruct a more vivid 3D object instead of a "hollow image." In addition, there is no need to do a fast Fourier transform for each 3D element every time. A ferroelectric liquid crystal spatial light modulator is used to display the binary hologram within our experiment and the hologram of a base right triangle is produced by utilizing just a one-step Fourier transform in the 2D case, which can be expanded to the 3D case by multiplying by a suitable Fresnel phase plane. All 3D holograms generated in this paper are based on Fresnel propagation; thus, the Fresnel plane is treated as a vital element in producing the hologram. A GeForce GTX 770 graphics card with 2 GB memory is used to achieve parallel computing.
Parallel ray tracing for one-dimensional discrete ordinate computations

International Nuclear Information System (INIS)

Jarvis, R.D.; Nelson, P.

1996-01-01

The ray-tracing sweep in discrete-ordinates, spatially discrete numerical approximation methods applied to the linear, steady-state, plane-parallel, mono-energetic, azimuthally symmetric, neutral-particle transport equation can be reduced to a parallel prefix computation. In so doing, the often severe penalty in convergence rate of the source iteration, suffered by most current parallel algorithms using spatial domain decomposition, can be avoided while attaining parallelism in the spatial domain to whatever extent desired. In addition, the reduction implies parallel algorithm complexity limits for the ray-tracing sweep. The reduction applies to all closed, linear, one-cell functional (CLOF) spatial approximation methods, which encompasses most in current popular use. Scalability test results of an implementation of the algorithm on a 64-node nCube-2S hypercube-connected, message-passing, multi-computer are described. (author)
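The reduction to a prefix computation can be illustrated with a small sketch. It assumes that each cell maps its incoming angular flux to its outgoing flux through an affine relation psi_out = a * psi_in + b, which is one reading of a linear one-cell method; the coefficients below are made up, and the code is not the authors' implementation. Because composition of affine maps is associative, the whole sweep becomes an inclusive scan.

```python
from typing import List, Tuple

Affine = Tuple[float, float]  # (a, b) represents the map x -> a * x + b


def compose(first: Affine, second: Affine) -> Affine:
    """Return the map equivalent to applying `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def serial_sweep(cells: List[Affine], inflow: float) -> List[float]:
    """Conventional cell-by-cell sweep along the ray."""
    out, psi = [], inflow
    for a, b in cells:
        psi = a * psi + b
        out.append(psi)
    return out


def prefix_sweep(cells: List[Affine], inflow: float) -> List[float]:
    """Hillis-Steele inclusive scan over the cell maps.

    Each pass of the while loop touches every cell independently, so it
    could run in parallel; only about log2(len(cells)) passes are needed.
    """
    scan = list(cells)
    step = 1
    while step < len(scan):
        prev = scan
        scan = [
            compose(prev[i - step], prev[i]) if i >= step else prev[i]
            for i in range(len(prev))
        ]
        step *= 2
    return [a * inflow + b for a, b in scan]


# Hypothetical attenuation/source coefficients for five cells.
cells = [(0.5, 1.0), (0.8, 0.2), (0.9, 0.0), (0.7, 0.3), (0.6, 0.1)]
serial = serial_sweep(cells, inflow=2.0)
scanned = prefix_sweep(cells, inflow=2.0)
```

Both functions produce the same cell-edge fluxes up to rounding, which is why the sweep itself does not have to be split by iterating across subdomain boundaries.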
Direct numerical simulation of bubbles with parallelized adaptive mesh refinement

International Nuclear Information System (INIS)

Talpaert, A.

2015-01-01

The study of two-phase thermal-hydraulics is a major topic in nuclear engineering for both the security and the efficiency of nuclear facilities. In addition to experiments, numerical modeling helps to determine precisely where bubbles appear and how they behave, in the core as well as in the steam generators. This work presents the finest scale of representation of two-phase flows, Direct Numerical Simulation of bubbles. We use the 'Di-phasic Low Mach Number' equation model. It is particularly suited to low-Mach number flows, that is to say flows whose velocity is much lower than the speed of sound; this is very typical of nuclear thermal-hydraulics conditions. Because we study bubbles, we capture the front between vapor and liquid phases thanks to a downward flux limiting numerical scheme. The specific discrete analysis technique this work introduces is well-balanced parallel Adaptive Mesh Refinement (AMR). With AMR, we refine the coarse grid on a batch of patches in order to locally increase precision in areas which matter more, and capture fine changes in the front location and its topology. We show that patch-based AMR is well suited to parallel computing. We use a variety of physical examples: forced advection, heat transfer, phase changes represented by a Stefan model, as well as the combination of all those models. We present the results of those numerical simulations, as well as the speed-up compared to an equivalent non-AMR simulation and to serial computation of the same problems. This document is made up of an abstract and the slides of the presentation. (author)
Coupling parallel adaptive mesh refinement with a nonoverlapping domain decomposition solver

Czech Academy of Sciences Publication Activity Database

Kůs, Pavel; Šístek, Jakub

2017-01-01

Vol. 110, August (2017), pp. 34-54 ISSN 0965-9978 R&D Projects: GA ČR GA14-02067S Institutional support: RVO:67985840 Keywords: adaptive mesh refinement * parallel algorithms * domain decomposition Subject RIV: BA - General Mathematics OECD field: Applied mathematics Impact factor: 3.000, year: 2016 http://www.sciencedirect.com/science/article/pii/S0965997816305737
3-D inversion of airborne electromagnetic data parallelized and accelerated by local mesh and adaptive soundings

Science.gov (United States)

Yang, Dikun; Oldenburg, Douglas W.; Haber, Eldad

2014-03-01

Airborne electromagnetic (AEM) methods are highly efficient tools for assessing the Earth's conductivity structures in a large area at low cost. However, the configuration of AEM measurements, which typically have widely distributed transmitter-receiver pairs, makes the rigorous modelling and interpretation extremely time-consuming in 3-D. Excessive overcomputing can occur when working on a large mesh covering the entire survey area and inverting all soundings in the data set. We propose two improvements. The first is to use a locally optimized mesh for each AEM sounding for the forward modelling and calculation of sensitivity. This dedicated local mesh is small with fine cells near the sounding location and coarse cells far away in accordance with EM diffusion and the geometric decay of the signals. Once the forward problem is solved on the local meshes, the sensitivity for the inversion on the global mesh is available through quick interpolation. Using local meshes for AEM forward modelling avoids unnecessary computing on fine cells on a global mesh that are far away from the sounding location. Since local meshes are highly independent, the forward modelling can be efficiently parallelized over an array of processors. The second improvement is random and dynamic down-sampling of the soundings. Each inversion iteration only uses a random subset of the soundings, and the subset is reselected for every iteration. The number of soundings in the random subset, determined by an adaptive algorithm, is tied to the degree of model regularization. This minimizes the overcomputing caused by working with redundant soundings. Our methods are compared against conventional methods and tested with a synthetic example. We also invert a field data set that was previously considered to be too large to be practically inverted in 3-D. These examples show that our methodology can dramatically reduce the processing time of 3-D inversion to a practical level without losing resolution
The specification of Stampi, a message passing library for distributed parallel computing

International Nuclear Information System (INIS)

Imamura, Toshiyuki; Takemiya, Hiroshi; Koide, Hiroshi

2000-03-01

At CCSE, the Center for Promotion of Computational Science and Engineering, a new message passing library for heterogeneous and distributed parallel computing, called Stampi, has been developed. Stampi enables communication between any combination of parallel computers and workstations. Currently, a Stampi system is constructed from the Stampi library and Stampi/Java. It provides functions to connect a Stampi application not only with those on COMPACS, the COMplex Parallel Computer System, but also with applets that run in WWW browsers. This report summarizes the specifications of Stampi and details the development of its system. (author)
Parallel octree-based hexahedral mesh generation for eulerian to lagrangian conversion.

Energy Technology Data Exchange (ETDEWEB)

Staten, Matthew L.; Owen, Steven James

2010-09-01

Computational simulation must often be performed on domains where materials are represented as scalar quantities or volume fractions at cell centers of an octree-based grid. Common examples include bio-medical, geotechnical or shock physics calculations where interface boundaries are represented only as discrete statistical approximations. In this work, we introduce new methods for generating Lagrangian computational meshes from Eulerian-based data. We focus specifically on shock physics problems that are relevant to ASC codes such as CTH and Alegra. New procedures for generating all-hexahedral finite element meshes from volume fraction data are introduced. A new primal-contouring approach is introduced for defining a geometric domain. New methods for refinement, node smoothing, resolving non-manifold conditions and defining geometry are also introduced as well as an extension of the algorithm to handle tetrahedral meshes. We also describe new scalable MPI-based implementations of these procedures. We describe a new software module, Sculptor, which has been developed for use as an embedded component of CTH. We also describe its interface and its use within the mesh generation code, CUBIT. Several examples are shown to illustrate the capabilities of Sculptor.
Performance of a fine-grained parallel model for multi-group nodal-transport calculations in three-dimensional pin-by-pin reactor geometry

International Nuclear Information System (INIS)

Masahiro, Tatsumi; Akio, Yamamoto

2003-01-01

A production code SCOPE2 was developed based on the fine-grained parallel algorithm by the red/black iterative method targeting parallel computing environments such as a PC-cluster. It can perform a depletion calculation in a few hours using a PC-cluster with the model based on a 9-group nodal-SP3 transport method in 3-dimensional pin-by-pin geometry for in-core fuel management of commercial PWRs. The present algorithm guarantees the identical convergence process as that in serial execution, which is very important from the viewpoint of quality management. The fine-mesh geometry is constructed by hierarchical decomposition with introduction of intermediate management layer as a block that is a quarter piece of a fuel assembly in radial direction. A combination of a mesh division scheme forcing even meshes on each edge and a latency-hidden communication algorithm provided simplicity and efficiency to message passing to enhance parallel performance. Inter-processor communication and parallel I/O access were realized using the MPI functions. Parallel performance was measured for depletion calculations by the 9-group nodal-SP3 transport method in 3-dimensional pin-by-pin geometry with 340 x 340 x 26 meshes for full core geometry and 170 x 170 x 26 for quarter core geometry. A PC cluster that consists of 24 Pentium-4 processors connected by the Fast Ethernet was used for the performance measurement. Calculations in full core geometry gave better speedups compared to those in quarter core geometry because of larger granularity. Fine-mesh sweep and feedback calculation parts gave almost perfect scalability since granularity is large enough, while 1-group coarse-mesh diffusion acceleration gave only around 80%. The speedup and parallel efficiency for total computation time were 22.6 and 94%, respectively, for the calculation in full core geometry with 24 processors. (authors)
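The quoted speedup and efficiency are consistent with the usual definitions, speedup S = T_1 / T_p and parallel efficiency E = S / p. A minimal sketch, with function names chosen here for illustration:

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p for wall-clock times on 1 and p processors."""
    return t_serial / t_parallel


def parallel_efficiency(s: float, processors: int) -> float:
    """Parallel efficiency E = S / p."""
    return s / processors


# Values quoted in the abstract: speedup 22.6 on 24 processors.
eff = parallel_efficiency(22.6, 24)
print(f"{eff:.1%}")  # 94.2%, reported as 94% in the abstract
```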
Parallelism in matrix computations

CERN Document Server

Gallopoulos, Efstratios; Sameh, Ahmed H

2016-01-01

This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded, Vandermonde, Toeplitz, and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...
High performance parallel computers for science: New developments at the Fermilab advanced computer program

International Nuclear Information System (INIS)

Nash, T.; Areti, H.; Atac, R.

1988-08-01

Fermilab's Advanced Computer Program (ACP) has been developing highly cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors, each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 MFlops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256-node, 5 GFlop system is under construction. 10 refs., 7 figs
Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.

Science.gov (United States)

Bhandarkar, S M; Chirravuri, S; Arnold, J

1996-01-01

Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
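The "multiple independent searches" strategy can be sketched on a toy Optimal Linear Arrangement instance. Everything specific here is an assumption for illustration: the swap perturbation, the geometric cooling schedule, the parameter values and the function names are not taken from the paper, and the runs execute one after another, whereas on a MIMD machine each seed would run on its own processor.

```python
import math
import random
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def arrangement_cost(order: Sequence[int], edges: Sequence[Edge]) -> int:
    """Sum of distances between linked items in the linear order."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in edges)


def anneal(n: int, edges: Sequence[Edge], seed: int, steps: int = 2000,
           t0: float = 2.0, cooling: float = 0.995) -> Tuple[List[int], int]:
    """One simulated-annealing search started from a random permutation."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    cost = arrangement_cost(order, edges)
    best, best_cost = order[:], cost
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        order[i], order[j] = order[j], order[i]  # perturb: swap two items
        new_cost = arrangement_cost(order, edges)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = order[:], cost
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
        temp *= cooling
    return best, best_cost


def independent_searches(n: int, edges: Sequence[Edge],
                         seeds: Sequence[int]) -> Tuple[List[int], int]:
    """Run one search per seed and keep the best result found by any of them."""
    return min((anneal(n, edges, s) for s in seeds), key=lambda result: result[1])


# Toy instance: 8 items linked in a chain, so the optimal cost is 7.
chain = [(i, i + 1) for i in range(7)]
best_order, best_cost = independent_searches(8, chain, seeds=range(4))
```

The periodically interacting variant described for the SIMD machine would additionally exchange the current best arrangement between searches at intervals; that step is omitted here.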
A compositional reservoir simulator on distributed memory parallel computers

International Nuclear Information System (INIS)

Rame, M.; Delshad, M.

1995-01-01

This paper presents the application of distributed memory parallel computers to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented
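The idea of extending each subdomain for stencil computation can be sketched in one dimension. This is an illustration under stated assumptions, not UTCHEM code: the three-point averaging stencil, the function names and the single ghost cell per side are all chosen here for simplicity, and the subdomains are processed in a loop rather than on separate processors.

```python
from typing import List


def stencil(u: List[float]) -> List[float]:
    """Three-point average at interior points; returns len(u) - 2 values."""
    return [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]


def split_even(n: int, parts: int) -> List[range]:
    """Split n cells into contiguous ranges whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def decomposed_stencil(u: List[float], parts: int) -> List[float]:
    """Apply the stencil subdomain by subdomain using one ghost cell per side."""
    interior = len(u) - 2  # global cells 1 .. len(u) - 2 are updated
    result: List[float] = []
    for owned in split_even(interior, parts):
        lo = owned.start + 1      # first owned global index
        hi = owned.stop + 1       # one past the last owned global index
        local = u[lo - 1:hi + 1]  # owned cells plus a ghost cell on each side
        result.extend(stencil(local))
    return result


field = [float(i * i) for i in range(12)]
assert decomposed_stencil(field, 4) == stencil(field)
```

On a distributed memory machine the slice that fills `local` would be replaced by a message exchange with the two neighbouring processors before each stencil application.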
Mathematics and computational methods development in U.S. department of energy-sponsored research (nuclear energy research initiative and nuclear engineering education research). 4. Development of an Expert System for Generation of an Effective Mesh Distribution for the SN Method

International Nuclear Information System (INIS)

Patchimpattapong, Apisit; Haghighat, Alireza

2001-01-01

The discrete ordinates (S_N) method is widely used to obtain numerical solutions of the transport equation. The method calls for discretization of spatial, energy, and angular variables. To generate an 'effective' spatial mesh distribution, one has to consider various factors including particle mean free path (mfp), material and source discontinuities, and problem objectives. This becomes more complicated if we consider the effect of numerics such as differencing schemes, parallel processing strategies, and computation resources. As a result, one may often over/under-mesh depending upon limitations on accuracy, computing resources, and time allotted. To overcome the foregoing issues, we are developing an expert system for input preparation of the discrete ordinates (S_N) method. This project is a part of an ongoing project sponsored by Nuclear Engineering Education Research. Our expert system consists of two parts: (a) an algorithm for generation of a mesh distribution for a serial calculation and (b) an algorithm for extension to parallel computing, which accounts for parallelization parameters including granularity, load balancing, parallel algorithms, and possible architectural issues. Thus far, we have developed a stand-alone algorithm for generation of an 'effective' mesh distribution for a serial calculation. The algorithm has been successfully tested with the Parallel Environment Neutral-Particle Transport (PENTRAN) code system. In this paper, we discuss the structure of our algorithm and present its use for simulating the VENUS-3 experimental facility. To date, we have developed and tested part 1 of this system. This part comprises four steps: creation of a geometric model and coarse meshes, calculation of un-collided flux, selection of differencing schemes, and generation of fine-mesh distribution. For the un-collided flux calculation, we have developed a parallel code called PENFC. It is capable of calculating un-collided and first-collision fluxes
MeSH Now: automatic MeSH indexing at PubMed scale via learning to rank.

Science.gov (United States)

Mao, Yuqing; Lu, Zhiyong

2017-04-17

MeSH indexing is the task of assigning relevant MeSH terms based on a manual reading of scholarly publications by human indexers. The task is highly important for improving literature retrieval and many other scientific investigations in biomedical research. Unfortunately, given its manual nature, the process of MeSH indexing is both time-consuming (new articles are not immediately indexed until 2 or 3 months later) and costly (approximately ten dollars per article). In response, automatic indexing by computers has been previously proposed and attempted but remains challenging. In order to advance the state of the art in automatic MeSH indexing, a community-wide shared task called BioASQ was recently organized. We propose MeSH Now, an integrated approach that first uses multiple strategies to generate a combined list of candidate MeSH terms for a target article. Through a novel learning-to-rank framework, MeSH Now then ranks the list of candidate terms based on their relevance to the target article. Finally, MeSH Now selects the highest-ranked MeSH terms via a post-processing module. We assessed MeSH Now on two separate benchmarking datasets using traditional precision, recall and F1-score metrics. In both evaluations, MeSH Now consistently achieved over 0.60 in F-score, ranging from 0.610 to 0.612. Furthermore, additional experiments show that MeSH Now can be optimized by parallel computing in order to process MEDLINE documents on a large scale. We conclude that MeSH Now is a robust approach with state-of-the-art performance for automatic MeSH indexing and that MeSH Now is capable of processing PubMed scale documents within a reasonable time frame. http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/MeSHNow/ .
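For reference, the precision, recall and F1-score named above can be computed per article as follows. This is a sketch with a made-up example; the term sets are hypothetical, and the abstract does not say how scores were averaged across articles.

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of a predicted term set against the gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2.0 * precision * recall / (precision + recall)


# Hypothetical terms for one article: 3 of 4 predictions are correct, 3 of 5 gold terms found.
predicted_terms = {"Humans", "Algorithms", "MEDLINE", "Mice"}
gold_terms = {"Humans", "Algorithms", "MEDLINE", "Medical Subject Headings",
              "Abstracting and Indexing"}
p, r, f1 = precision_recall_f1(predicted_terms, gold_terms)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```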
Practical parallel computing

CERN Document Server

Morse, H Stephen

1994-01-01

Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment. Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

Massively parallel red-black algorithms for x-y-z response matrix equations

International Nuclear Information System (INIS)

Hanebutte, U.R.; Laurin-Kovitz, K.; Lewis, E.E.

1992-01-01

Recently, both discrete ordinates and spherical harmonic (S_n and P_n) methods have been cast in the form of response matrices. In x-y geometry, massively parallel algorithms have been developed to solve the resulting response matrix equations on the Connection Machine family of parallel computers, the CM-2, CM-200, and CM-5. These algorithms utilize two-cycle iteration on a red-black checkerboard. In this work we examine the use of massively parallel red-black algorithms to solve response matrix equations in three dimensions. The longer term objective is to utilize massively parallel algorithms to solve S_n and/or P_n response matrix problems. In this exploratory examination, however, we consider the simple 6 x 6 response matrices that are derivable from fine-mesh diffusion approximations in three dimensions
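The two-cycle red-black iteration can be illustrated on a simpler problem than the response matrix equations of the paper. The sketch below swaps in a five-point Laplace update on a small 2-D grid, which is an assumption made purely for illustration. Cells of one colour depend only on cells of the other colour, so every update within a half-cycle is independent and could be done in parallel.

```python
from typing import List

Grid = List[List[float]]


def red_black_sweep(u: Grid) -> Grid:
    """One two-cycle iteration: update red cells ((i + j) even), then black cells."""
    rows, cols = len(u), len(u[0])
    for colour in (0, 1):
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                if (i + j) % 2 == colour:
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    return u


def solve_laplace(n: int = 8, sweeps: int = 200) -> Grid:
    """Laplace problem on an n x n grid with the top edge held at 1 and the rest at 0."""
    u = [[0.0] * n for _ in range(n)]
    for j in range(n):
        u[0][j] = 1.0
    for _ in range(sweeps):
        red_black_sweep(u)
    return u


def max_residual(u: Grid) -> float:
    """Largest mismatch between an interior cell and the mean of its four neighbours."""
    return max(
        abs(u[i][j] - 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]))
        for i in range(1, len(u) - 1)
        for j in range(1, len(u[0]) - 1)
    )


solution = solve_laplace()
residual = max_residual(solution)
```

In three dimensions the same colouring uses (i + j + k) parity, with each cell coupled to its six face neighbours.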
A Reconfigurable Mesh-Ring Topology for Bluetooth Sensor Networks

Directory of Open Access Journals (Sweden)

Ben-Yi Wang

2018-05-01

In this paper, a Reconfigurable Mesh-Ring (RMR) algorithm is proposed for Bluetooth sensor networks. The algorithm is designed in three stages to determine the optimal configuration of the mesh-ring network. Firstly, a designated root advertises and discovers its neighboring nodes. Secondly, a scatternet criterion is built to compute the minimum number of piconets and distribute the connection information for piconet and scatternet. Finally, a peak-search method is designed to determine the optimal mesh-ring configuration for various sizes of networks. To maximize the network capacity, the research problem is formulated by determining the best connectivity of available mesh links. During the formation and maintenance phases, three possible configurations (including piconet, scatternet, and hybrid) are examined to determine the optimal placement of mesh links. The peak-search method is a systematic approach, and is implemented by three functional blocks: the topology formation block generates the mesh-ring topology, the routing efficiency block computes the routing performance, and the optimum decision block introduces a decision-making criterion to determine the optimum number of mesh links. Simulation results demonstrate that the optimal mesh-ring configuration can be determined and that the scatternet case achieves better overall performance than the other two configurations. The RMR topology also outperforms the conventional ring-based and cluster-based mesh methods in terms of throughput performance for Bluetooth configurable networks.
Parallel R-matrix computation

International Nuclear Information System (INIS)

Heggarty, J.W.

1999-06-01

For almost thirty years, sequential R-matrix computation has been used by atomic physics research groups from around the world to model collision phenomena involving the scattering of electrons or positrons with atomic or molecular targets. As considerable progress has been made in the understanding of fundamental scattering processes, new data, obtained from more complex calculations, is of current interest to experimentalists. Performing such calculations, however, places considerable demands on the computational resources to be provided by the target machine, in terms of both processor speed and memory requirement. Indeed, in some instances the computational requirements are so great that the proposed R-matrix calculations are intractable, even when utilising contemporary classic supercomputers. Historically, increases in the computational requirements of R-matrix computation were accommodated by porting the problem codes to a more powerful classic supercomputer. Although this approach has been successful in the past, it is no longer considered to be a satisfactory solution due to the limitations of current (and future) Von Neumann machines. As a consequence, there has been considerable interest in the high performance multicomputers that have emerged over the last decade and that appear to offer the computational resources required by contemporary R-matrix research. Unfortunately, developing codes for these machines is not as simple a task as it was to develop codes for successive classic supercomputers. The difficulty arises from the considerable differences in the computing models that exist between the two types of machine, and it has led to the programming of multicomputers being widely acknowledged as a difficult, time consuming and error-prone task. Nevertheless, unless parallel R-matrix computation is realised, important theoretical and experimental atomic physics research will continue to be hindered. This thesis describes work that was undertaken in
Computing NLTE Opacities -- Node Level Parallel Calculation

Energy Technology Data Exchange (ETDEWEB)

Holladay, Daniel [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)]

2015-09-11

Presentation. The goal: to produce a robust library capable of computing reasonably accurate opacities inline with the assumption of LTE relaxed (non-LTE). Near term: demonstrate acceleration of non-LTE opacity computation. Far term (if funded): connect to application codes with in-line capability and compute opacities. Study science problems. Use efficient algorithms that expose many levels of parallelism and utilize good memory access patterns for use on advanced architectures. Portability to multiple types of hardware including multicore processors, manycore processors such as KNL, GPUs, etc. Easily coupled to radiation hydrodynamics and thermal radiative transfer codes.
Element-topology-independent preconditioners for parallel finite element computations

Science.gov (United States)

Park, K. C.; Alexander, Scott

1992-01-01

A family of preconditioners for the solution of finite element equations is presented. The preconditioners are element-topology independent and are thus applicable to element order-free parallel computations. A key feature of the present preconditioners is the repeated use of element connectivity matrices and their left and right inverses. The properties and performance of the present preconditioners are demonstrated via beam and two-dimensional finite element matrices for implicit time integration computations.
Sensitivity study in order to improve the Mesh-grid in environmental nuclear monitoring system using parallels codes

International Nuclear Information System (INIS)

Serrao, Bruno P.; Schirru, Roberto

2015-01-01

All nuclear power plants need a monitoring system to monitor the radioactivity that can be released into the atmosphere in case of accidents. Moreover, this system must also be capable of simulating future releases. To do so, these systems calculate the wind field, the quantity of radioactive elements and the dispersion of these elements around nuclear facilities. The site of the Angra 1, 2 and 3 (under construction) complex measures 15.75 x 10.75 kilometers (X x Y axis). The z axis is divided into 8 heights. The mesh thus has 23048 cells, each one 250 x 250 meters. This work aims to show the performance of an Environmental Nuclear Monitoring System when working with cells of 100 x 100 meters and 50 x 50 meters, where the computational effort of this approach is handled using parallel computer programs. (author)
Sensitivity study in order to improve the Mesh-grid in environmental nuclear monitoring system using parallels codes

Energy Technology Data Exchange (ETDEWEB)

Serrao, Bruno P.; Schirru, Roberto, E-mail: bruno@lmp.ufrj.br, E-mail: schirru@lmp.ufrj.br [Coordenacao dos Programas de Pos-Graduacao em Engenharia (PEN/COPPE/UFRJ), Rio de Janeiro, RJ (Brazil). Programa de Engenharia Nuclear]

2015-07-01

All nuclear power plants need a monitoring system to monitor the radioactivity that can be released into the atmosphere in case of accidents. Moreover, this system must also be capable of simulating future releases. To do so, these systems calculate the wind field, the quantity of radioactive elements and the dispersion of these elements around nuclear facilities. The site of the Angra 1, 2 and 3 (under construction) complex measures 15.75 x 10.75 kilometers (X x Y axis). The z axis is divided into 8 heights. The mesh thus has 23048 cells, each one 250 x 250 meters. This work aims to show the performance of an Environmental Nuclear Monitoring System when working with cells of 100 x 100 meters and 50 x 50 meters, where the computational effort of this approach is handled using parallel computer programs. (author)
Applied Parallel Computing Industrial Computation and Optimization

DEFF Research Database (Denmark)

Madsen, Kaj; Olesen, Dorte

Proceedings and the Third International Workshop on Applied Parallel Computing in Industrial Problems and Optimization (PARA96)...
A new method for simplification and compression of 3D meshes

OpenAIRE

Attene, Marco

2001-01-01

We focus on the lossy compression of manifold triangle meshes. Our SwingWrapper approach partitions the surface of an original mesh M into simply-connected regions, called triangloids. We compute a new mesh M'. Each triangle of M' is a close approximation of a pseudo-triangle of M. By construction, the connectivity of M' is fairly regular and can be compressed to less than a bit per triangle using EdgeBreaker or one of the other recently developed schemes. The locations of the vertices of M' ...
Parallel computing: numerics, applications, and trends

National Research Council Canada - National Science Library

Trobec, Roman; Vajteršic, Marián; Zinterhof, Peter

2009-01-01

... and/or distributed systems. The contributions to this book are focused on topics most concerned in the trends of today's parallel computing. These range from parallel algorithmics, programming, tools, network computing to future parallel computing. Particular attention is paid to parallel numerics: linear algebra, differential equations, numerica...
Development of a mesh-type computer tomography for the two-phase flow

International Nuclear Information System (INIS)

Lee, Jae Young; Lee, In Wook

1998-01-01

This paper describes the development of a mesh-type computer tomography for two-phase flow. The sensor is made of many parallel wires in orthogonal orientation. A demultiplexer circuit is developed so that the electrodes can supply the driving voltage and the data acquisition system can read the output voltage from the electrode unit. For the reconstruction of the image, a direct inversion algorithm is adopted. Full automation is provided from data sensing to image construction. Through careful calibration and field tests in horizontal and vertical two-phase loops, the present sensor detects realistic images of the solitary wave and the slug. This sensor could be a useful tool in laboratory experiments.
Massive parallel electromagnetic field simulation program JEMS-FDTD design and implementation on jasmin

International Nuclear Information System (INIS)

Li Hanyu; Zhou Haijing; Dong Zhiwei; Liao Cheng; Chang Lei; Cao Xiaolin; Xiao Li

2010-01-01

A large-scale parallel electromagnetic field simulation program, JEMS-FDTD (J Electromagnetic Solver-Finite Difference Time Domain), is designed and implemented on JASMIN (J parallel Adaptive Structured Mesh applications INfrastructure). This program can simulate the propagation, radiation and coupling of electromagnetic fields by solving the Maxwell equations explicitly on a structured mesh with the FDTD method. JEMS-FDTD is able to simulate billion-mesh-scale problems on thousands of processors. In this article, the program is verified by simulating the radiation of an electric dipole. A beam waveguide is simulated to demonstrate the capability of large-scale parallel computation. A parallel performance test indicates that a high parallel efficiency is obtained. (authors)
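As an illustration of what an explicit FDTD update on a structured mesh looks like, the following is a minimal one-dimensional sketch in normalized units. It is not taken from JEMS-FDTD: the function name `fdtd_1d`, the Gaussian source, the grid size and the Courant number are all assumptions chosen for the example.

```python
import math


def fdtd_1d(n_cells=200, n_steps=100, courant=0.5, source_cell=100):
    """Explicit 1D FDTD (Yee) update in normalized units (hypothetical helper)."""
    ez = [0.0] * n_cells          # electric field on integer grid points
    hy = [0.0] * (n_cells - 1)    # magnetic field on half-integer grid points
    for step in range(n_steps):
        # Advance H from the spatial difference of E.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # Advance interior E from the spatial difference of H.
        # The two end points are never updated, so they act as conducting walls.
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Soft Gaussian source injected at a single cell (assumed excitation).
        ez[source_cell] += math.exp(-((step - 30) / 10.0) ** 2)
    return ez, hy
```

Each cell only needs its immediate neighbours, which is why this kind of update can be split across processors with a one-cell halo exchange per step.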
An efficient Adaptive Mesh Refinement (AMR) algorithm for the Discontinuous Galerkin method: Applications for the computation of compressible two-phase flows

Science.gov (United States)

Papoutsakis, Andreas; Sazhin, Sergei S.; Begg, Steven; Danaila, Ionut; Luddens, Francky

2018-06-01

We present an Adaptive Mesh Refinement (AMR) method suitable for hybrid unstructured meshes that allows for local refinement and de-refinement of the computational grid during the evolution of the flow. The adaptive implementation of the Discontinuous Galerkin (DG) method introduced in this work (ForestDG) is based on a topological representation of the computational mesh by a hierarchical structure consisting of oct-, quad- and binary trees. Adaptive mesh refinement (h-refinement) enables us to increase the spatial resolution of the computational mesh in the vicinity of the points of interest such as interfaces, geometrical features, or flow discontinuities. The local increase in the expansion order (p-refinement) at areas of high strain rates or vorticity magnitude results in an increase of the order of accuracy in the region of shear layers and vortices. A graph of unitarian-trees, representing hexahedral, prismatic and tetrahedral elements, is used for the representation of the initial domain. The ancestral elements of the mesh can be split into self-similar elements, allowing each tree to grow branches to an arbitrary level of refinement. The connectivity of the elements, their genealogy and their partitioning are described by linked lists of pointers. An explicit calculation of these relations, presented in this paper, facilitates the on-the-fly splitting, merging and repartitioning of the computational mesh by rearranging the links of each node of the tree with a minimal computational overhead. The modal basis used in the DG implementation facilitates the mapping of the fluxes across the non-conformal faces. The AMR methodology is presented and assessed using a series of inviscid and viscous test cases. Also, the AMR methodology is used for the modelling of the interaction between droplets and the carrier phase in a two-phase flow. This approach is applied to the analysis of a spray injected into a chamber of quiescent air, using the Eulerian
Node-based finite element method for large-scale adaptive fluid analysis in parallel environments

International Nuclear Information System (INIS)

Toshimitsu, Fujisawa; Genki, Yagawa

2003-01-01

In this paper, a FEM-based (finite element method) mesh-free method with a probabilistic node generation technique is presented. In the proposed method, all computational procedures, from the mesh generation to the solution of a system of equations, can be performed fluently in parallel in terms of nodes. A local finite element mesh is generated robustly around each node, even for harsh boundary shapes such as cracks. The algorithm and the data structure of the finite element calculation are based on nodes, and parallel computing is realized by dividing a system of equations by the rows of the global coefficient matrix. In addition, the node-based finite element method is accompanied by a probabilistic node generation technique, which generates good-natured points for the nodes of the finite element mesh. Furthermore, the probabilistic node generation technique can be performed in parallel environments. As a numerical example of the proposed method, we perform a compressible flow simulation containing strong shocks. Numerical simulations with frequent mesh refinement, which are required for this kind of analysis, can effectively be performed on parallel processors by using the proposed method. (authors)
Node-based finite element method for large-scale adaptive fluid analysis in parallel environments

Energy Technology Data Exchange (ETDEWEB)

Toshimitsu, Fujisawa [Tokyo Univ., Collaborative Research Center of Frontier Simulation Software for Industrial Science, Institute of Industrial Science (Japan); Genki, Yagawa [Tokyo Univ., Department of Quantum Engineering and Systems Science (Japan)

2003-07-01

In this paper, a FEM-based (finite element method) mesh-free method with a probabilistic node generation technique is presented. In the proposed method, all computational procedures, from the mesh generation to the solution of a system of equations, can be performed fluently in parallel in terms of nodes. A local finite element mesh is generated robustly around each node, even for harsh boundary shapes such as cracks. The algorithm and the data structure of the finite element calculation are based on nodes, and parallel computing is realized by dividing a system of equations by the rows of the global coefficient matrix. In addition, the node-based finite element method is accompanied by a probabilistic node generation technique, which generates good-natured points for the nodes of the finite element mesh. Furthermore, the probabilistic node generation technique can be performed in parallel environments. As a numerical example of the proposed method, we perform a compressible flow simulation containing strong shocks. Numerical simulations with frequent mesh refinement, which are required for this kind of analysis, can effectively be performed on parallel processors by using the proposed method. (authors)
The numerical parallel computing of photon transport

International Nuclear Information System (INIS)

Huang Qingnan; Liang Xiaoguang; Zhang Lifa

1998-12-01

The parallel computing of photon transport is investigated; the parallel algorithm and the parallelization of programs on parallel computers with both shared memory and distributed memory are discussed. By analyzing the inherent law of the mathematical and physical model of photon transport according to the structural features of parallel computers, using a divide-and-conquer strategy, adjusting the algorithm structure of the program, dissolving the data dependencies, finding parallelizable components and creating large-grain parallel subtasks, the sequential computing of photon transport is efficiently transformed into parallel and vector computing. The program was run on various HP parallel computers such as the HY-1 (PVP), the Challenge (SMP) and the YH-3 (MPP), and very good parallel speedup was obtained.
Locating hardware faults in a data communications network of a parallel computer

Science.gov (United States)

Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

2010-01-12

Locating hardware faults in a data communications network of a parallel computer. Such a parallel computer includes a plurality of compute nodes and a data communications network that couples the compute nodes for data communications and organizes the compute nodes as a tree. Locating hardware faults includes identifying a next compute node as a parent node and a root of a parent test tree, identifying for each child compute node of the parent node a child test tree having the child compute node as root, running a same test suite on the parent test tree and each child test tree, and identifying the parent compute node as having a defective link connected from the parent compute node to a child compute node if the test suite fails on the parent test tree and succeeds on all the child test trees.
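The decision rule in this abstract can be sketched as follows. This is a simplified model, not the patented implementation: the tree is a plain dictionary, and `suite_passes` is a hypothetical stand-in for the real test suite that simply checks whether any link in a subtree belongs to an assumed set of defective links.

```python
from collections import deque


def subtree_links(tree, root):
    """Collect all (parent, child) links in the subtree rooted at `root`."""
    links, stack = [], [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            links.append((node, child))
            stack.append(child)
    return links


def suite_passes(tree, root, bad_links):
    """Stand-in test suite: passes only if no link under `root` is defective."""
    return all(link not in bad_links for link in subtree_links(tree, root))


def locate_faulty_parents(tree, root, bad_links):
    """Flag a parent when its test tree fails but every child test tree passes."""
    faulty, queue = [], deque([root])
    while queue:
        parent = queue.popleft()
        children = tree.get(parent, [])
        parent_ok = suite_passes(tree, parent, bad_links)
        children_ok = all(suite_passes(tree, c, bad_links) for c in children)
        if not parent_ok and children_ok:
            faulty.append(parent)   # the fault must be on one of parent's own links
        queue.extend(children)
    return faulty
```

For example, with `tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}` and a single defective link `(1, 4)`, the root's test tree fails but so does the test tree of child 1, so the root is not flagged; node 1 is flagged because both of its child test trees pass.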
Parallel algorithms for mapping pipelined and parallel computations

Science.gov (United States)

Nicol, David M.

1988-01-01

Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
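To make the mapping problem concrete, the sketch below assigns a chain of m pipelined modules to n processors of a linear array as contiguous blocks, minimizing the heaviest block. It uses a textbook binary search with a greedy feasibility probe and is not the O(nm log m) algorithm of the paper; communication costs are ignored, integer module weights are assumed, and the function names are hypothetical.

```python
def fits(weights, n_procs, limit):
    """Greedy probe: can the chain be cut into at most n_procs blocks of load <= limit?"""
    used, load = 1, 0
    for w in weights:
        if w > limit:
            return False
        if load + w > limit:
            used += 1      # start a new block on the next processor
            load = w
        else:
            load += w
    return used <= n_procs


def min_bottleneck(weights, n_procs):
    """Smallest achievable maximum block load for a contiguous mapping."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(weights, n_procs, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For module weights `[4, 2, 7, 1, 3, 5]` on three processors, the best contiguous split is `[4, 2] [7, 1] [3, 5]`, giving a bottleneck load of 8.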
Mapping method for generating three-dimensional meshes: past and present

International Nuclear Information System (INIS)

Cook, W.A.; Oakes, W.R.

1982-01-01

Two transformations are derived in this paper. One is a mapping of a unit square onto a surface and the other is a mapping of a unit cube onto a three-dimensional region. Two meshing computer programs are then discussed that use these mappings. The first is INGEN, which has been used to calculate three-dimensional meshes for approximately 15 years. This meshing program uses an index scheme to number boundaries, surfaces, and regions. With such an index scheme, it is possible to control nodal points, elements, and boundary conditions. The second is ESCHER, a meshing program now being developed. Two primary considerations governing development of ESCHER are that meshes graded using quadrilaterals are required and that edge-line geometry defined by Computer-Aided Design/Computer-Aided Manufacturing (CAD/CAM) systems will be a major source of geometry definition. This program separates the processes of nodal-point connectivity generation, computation of nodal-point mapping space coordinates, and mapping of nodal points into model space.
Compiler Technology for Parallel Scientific Computation

Directory of Open Access Journals (Sweden)

Can Özturan

1994-01-01

There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with equational programming language (EPL). Our approach is based on a program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by the source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by the compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.

Implementation of QR up- and downdating on a massively parallel computer

DEFF Research Database (Denmark)

Bendtsen, Claus; Hansen, Per Christian; Madsen, Kaj

1995-01-01

We describe an implementation of QR up- and downdating on a massively parallel computer (the Connection Machine CM-200) and show that the algorithm maps well onto the computer. In particular, we show how the use of corrected semi-normal equations for downdating can be efficiently implemented. We also illustrate the use of our algorithms in a new LP algorithm.
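For readers unfamiliar with QR updating, the following sketch shows the standard Givens-rotation update of the triangular factor R when one row is appended to the data matrix. It covers only the updating half and is not the corrected semi-normal equations downdating or the CM-200 implementation described above; the function name `qr_append_row` is hypothetical.

```python
import math


def qr_append_row(r, w):
    """Update upper-triangular R (list of lists) in place for an appended row w.

    After the call, R^T R equals the old R^T R plus the outer product of w with itself.
    """
    n = len(w)
    w = list(w)                      # work on a copy of the new row
    for k in range(n):
        a, b = r[k][k], w[k]
        rho = math.hypot(a, b)
        if rho == 0.0:
            continue                 # nothing to eliminate in this column
        c, s = a / rho, b / rho      # Givens rotation that zeroes w[k]
        for j in range(k, n):
            rkj, wj = r[k][j], w[j]
            r[k][j] = c * rkj + s * wj
            w[j] = -s * rkj + c * wj
    return r
```

Each rotation touches only one row of R and the working row, so the cost is O(n^2) per appended row rather than the O(mn^2) of a full refactorization.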
3D streamers simulation in a pin to plane configuration using massively parallel computing

Science.gov (United States)

Plewa, J.-M.; Eichwald, O.; Ducasse, O.; Dessante, P.; Jacobs, C.; Renon, N.; Yousfi, M.

2018-03-01

This paper concerns the 3D simulation of corona discharge using high performance computing (HPC) managed with the message passing interface (MPI) library. In the field of finite volume methods applied on non-adaptive mesh grids and in the case of a specific 3D dynamic benchmark test devoted to streamer studies, the great efficiency of the iterative R&B SOR and BiCGSTAB methods versus the direct MUMPS method was clearly demonstrated in solving the Poisson equation using HPC resources. The optimization of the parallelization and the resulting scalability was undertaken as a function of the HPC architecture for a number of mesh cells ranging from 8 to 512 million and a number of cores ranging from 20 to 1600. The R&B SOR method remains at least about four times faster than the BiCGSTAB method and requires significantly less memory for all tested situations. The R&B SOR method was then implemented in a 3D MPI parallelized code that solves the classical first order model of an atmospheric pressure corona discharge in air. The 3D code capabilities were tested by following the development of one, two and four coplanar streamers generated by initial plasma spots for 6 ns. The preliminary results obtained allowed us to follow in detail the formation of the tree structure of a corona discharge and the effects of the mutual interactions between the streamers in terms of streamer velocity, trajectory and diameter. The computing time for 64 million mesh cells distributed over 1000 cores using the MPI procedures is about 30 min ns^-1, regardless of the number of streamers.
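The red-and-black (R&B) SOR iteration mentioned in this abstract can be illustrated on a small 2D Poisson problem. The sketch below is a serial, pure-Python toy with a five-point stencil and fixed boundary values; the grid layout, relaxation factor and sweep count are assumptions for the example and say nothing about the authors' 3D MPI code.

```python
def red_black_sor(u, f, h, omega=1.5, sweeps=200):
    """Red-black SOR for the 2D Poisson equation laplacian(u) = f.

    u: grid (list of lists) whose outer ring holds fixed boundary values.
    f: right-hand side on the same grid; h: uniform mesh spacing.
    """
    n_rows, n_cols = len(u), len(u[0])
    for _ in range(sweeps):
        for color in (0, 1):                     # red points first, then black
            for i in range(1, n_rows - 1):
                for j in range(1, n_cols - 1):
                    if (i + j) % 2 != color:
                        continue
                    # Gauss-Seidel value from the four neighbours.
                    gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                 + u[i][j - 1] + u[i][j + 1]
                                 - h * h * f[i][j])
                    # Over-relax towards the Gauss-Seidel value.
                    u[i][j] += omega * (gs - u[i][j])
    return u
```

Points of one colour depend only on points of the other colour, so every point within a colour can be updated independently, which is what makes the scheme attractive for domain-decomposed parallel runs.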
Communication Software Performance for Linux Clusters with Mesh Connections

Energy Technology Data Exchange (ETDEWEB)

Jie Chen; William Watson

2003-09-01

Recent progress in copper based commodity Gigabit Ethernet interconnects enables constructing clusters to achieve extremely high I/O bandwidth at low cost with mesh connections. However, the TCP/IP protocol stack cannot match the improved performance of Gigabit Ethernet networks, especially in the case of multiple interconnects on a single host. In this paper, we evaluate and compare the performance characteristics of TCP/IP and M-VIA software, which is an implementation of VIA. In particular, we focus on the performance of the software systems for a mesh communication architecture and demonstrate the feasibility of using multiple Gigabit Ethernet cards on one host to achieve aggregated bandwidth and latency that are not only better than what TCP provides but also compare favorably to some of the special purpose high-speed networks. In addition, implementation of a new M-VIA driver for one type of Gigabit Ethernet card will be discussed.
Collectively loading an application in a parallel computer

Science.gov (United States)

Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.; Miller, Samuel J.; Mundy, Michael B.

2016-01-05

Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
How to model wireless mesh networks topology

International Nuclear Information System (INIS)

Sanni, M L; Hashim, A A; Anwar, F; Ali, S; Ahmed, G S M

2013-01-01

The specification of a network connectivity model or topology is the starting point of design and analysis in computer network research. A Wireless Mesh Network is an autonomic network that is dynamically self-organised and self-configured, with the mesh nodes establishing automatic connectivity with the adjacent nodes in the relay network of wireless backbone routers. Research in Wireless Mesh Networks ranges from node deployment to internetworking issues with sensor, Internet and cellular networks. This research requires modelling of relationships and interactions among nodes, including technical characteristics of the links, while satisfying the architectural requirements of the physical network. However, the existing topology generators model geographic topologies which constitute different architectures, and thus may not be suitable in Wireless Mesh Networks scenarios. The existing methods of topology generation are explored and analysed, and parameters for their characterisation are identified. Furthermore, an algorithm for the design of Wireless Mesh Networks topology based on a square grid model is proposed in this paper. The performance of the topology generated is also evaluated. This research is particularly important in the generation of a close-to-real topology for ensuring relevance of design to the intended network and validity of results obtained in Wireless Mesh Networks research.
Eigensolution of finite element problems in a completely connected parallel architecture

Science.gov (United States)

Akl, Fred A.; Morel, Michael R.

1989-01-01

A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, $[K][\Phi] = [M][\Phi][\Omega]$, where $[K]$ and $[M]$ are of order N and $[\Omega]$ is of order q. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.
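The reported speed-ups can be turned into parallel efficiencies with the usual definition, efficiency = speed-up / number of processors. The short sketch below applies that definition to the four figures quoted in the abstract; the helper name `parallel_efficiency` is an assumption for the example.

```python
def parallel_efficiency(speedup, n_procs):
    """Parallel efficiency as speed-up divided by processor count."""
    return speedup / n_procs


# Speed-ups quoted in the abstract for the 64-element rectangular plate.
reported = {2: 1.86, 4: 3.13, 6: 3.18, 8: 3.61}
efficiencies = {p: parallel_efficiency(s, p) for p, s in reported.items()}
```

With these figures the efficiency falls from about 93% on two processors to about 45% on eight, which is consistent with the small problem size leaving little work per processor.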
Parallel Alterations of Functional Connectivity during Execution and Imagination after Motor Imagery Learning

Science.gov (United States)

Zhang, Rushao; Hui, Mingqi; Long, Zhiying; Zhao, Xiaojie; Yao, Li

2012-01-01

Background Neural substrates underlying motor learning have been widely investigated with neuroimaging technologies. Investigations have illustrated the critical regions of motor learning and further revealed parallel alterations of functional activation during imagination and execution after learning. However, little is known about the functional connectivity associated with motor learning, especially motor imagery learning, although benefits from functional connectivity analysis attract more attention to the related explorations. We explored whether motor imagery (MI) and motor execution (ME) shared parallel alterations of functional connectivity after MI learning. Methodology/Principal Findings Graph theory analysis, which is widely used in functional connectivity exploration, was performed on the functional magnetic resonance imaging (fMRI) data of MI and ME tasks before and after 14 days of consecutive MI learning. The control group had no learning. Two measures, connectivity degree and interregional connectivity, were calculated and further assessed at a statistical level. Two interesting results were obtained: (1) The connectivity degree of the right posterior parietal lobe decreased in both MI and ME tasks after MI learning in the experimental group; (2) The parallel alterations of interregional connectivity related to the right posterior parietal lobe occurred in the supplementary motor area for both tasks. Conclusions/Significance These computational results may provide the following insights: (1) The establishment of motor schema through MI learning may induce the significant decrease of connectivity degree in the posterior parietal lobe; (2) The decreased interregional connectivity between the supplementary motor area and the right posterior parietal lobe in post-test implicates the dissociation between motor learning and task performing. These findings and explanations further revealed the neural substrates underpinning MI learning and supported that
Java parallel secure stream for grid computing

International Nuclear Information System (INIS)

Chen, J.; Akers, W.; Chen, Y.; Watson, W.

2001-01-01

The emergence of high speed wide area networks makes grid computing a reality. However, grid applications that need reliable data transfer still have difficulty achieving optimal TCP performance, because the TCP window size must be tuned to improve bandwidth and reduce latency on a high speed wide area network. The authors present a pure Java package called JPARSS (Java Parallel Secure Stream) that divides data into partitions that are sent over several parallel Java streams simultaneously and allows Java or Web applications to achieve optimal TCP performance in a grid environment without the necessity of tuning the TCP window size. Several experimental results are provided to show that using parallel streams is more effective than tuning the TCP window size. In addition, an X.509 certificate based single sign-on mechanism and SSL based connection establishment are integrated into this package. Finally, a few applications using this package will be discussed.
Systematic approach for deriving feasible mappings of parallel algorithms to parallel computing platforms

NARCIS (Netherlands)

Arkin, Ethem; Tekinerdogan, Bedir; Imre, Kayhan M.

2017-01-01

The need for high-performance computing together with the increasing trend from single processor to parallel computer architectures has leveraged the adoption of parallel computing. To benefit from parallel computing power, usually parallel algorithms are defined that can be mapped and executed
Depth-Averaged Non-Hydrostatic Hydrodynamic Model Using a New Multithreading Parallel Computing Method

Directory of Open Access Journals (Sweden)

Ling Kang

2017-03-01

Compared to the hydrostatic hydrodynamic model, the non-hydrostatic hydrodynamic model can accurately simulate flows that feature vertical accelerations. However, the model’s low computational efficiency severely restricts its wider application. This paper proposes a non-hydrostatic hydrodynamic model based on a multithreading parallel computing method. The horizontal momentum equation is obtained by integrating the Navier–Stokes equations from the bottom to the free surface. The vertical momentum equation is approximated by the Keller-box scheme. A two-step method is used to solve the model equations. A parallel strategy based on block decomposition computation is utilized. The original computational domain is subdivided into two subdomains that are physically connected via a virtual boundary technique. Two sub-threads are created and tasked with the computation of the two subdomains. The producer–consumer model and the thread lock technique are used to achieve synchronous communication between sub-threads. The validity of the model was verified by solitary wave propagation experiments over a flat bottom and slope, followed by two sinusoidal wave propagation experiments over a submerged breakwater. The parallel computing method proposed here was found to effectively enhance computational efficiency and save 20%–40% computation time compared to serial computing. The parallel acceleration rate and acceleration efficiency are approximately 1.45 and 72%, respectively. The parallel computing method makes a contribution to the popularization of non-hydrostatic models.
An Introduction to Parallel Computation R

Indian Academy of Sciences (India)

How are they programmed? This article provides an introduction. A parallel computer is a network of processors built for ... and have been used to solve problems much faster than a single ... in parallel computer design is to select an organization which ..... The most ambitious approach to parallel computing is to develop.
Parallel Computing Strategies for Irregular Algorithms

Science.gov (United States)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Development of a multimaterial, two-dimensional, arbitrary Lagrangian-Eulerian mesh computer program

International Nuclear Information System (INIS)

Barton, R.T.

1982-01-01

We have developed a large, multimaterial, two-dimensional Arbitrary Lagrangian-Eulerian (ALE) computer program. The special feature of an ALE mesh is that it can be either an embedded Lagrangian mesh, a fixed Eulerian mesh, or a partially embedded, partially remapped mesh. Remapping is used to remove Lagrangian mesh distortion. This general purpose program has been used for astrophysical modeling, under the guidance of James R. Wilson. The rationale behind the development of this program will be used to highlight several important issues in program design
Line-plane broadcasting in a data communications network of a parallel computer

Science.gov (United States)

Archer, Charles J.; Berg, Jeremy E.; Blocksome, Michael A.; Smith, Brian E.

2010-06-08

Methods, apparatus, and products are disclosed for line-plane broadcasting in a data communications network of a parallel computer, the parallel computer comprising a plurality of compute nodes connected together through the network, the network optimized for point to point data communications and characterized by at least a first dimension, a second dimension, and a third dimension, that include: initiating, by a broadcasting compute node, a broadcast operation, including sending a message to all of the compute nodes along an axis of the first dimension for the network; sending, by each compute node along the axis of the first dimension, the message to all of the compute nodes along an axis of the second dimension for the network; and sending, by each compute node along the axis of the second dimension, the message to all of the compute nodes along an axis of the third dimension for the network.
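The three phases of the broadcast (line, then plane, then volume) can be illustrated with a small simulation. This is only a sketch under assumptions: nodes are modeled as integer coordinates in a 3D grid, no real network calls are made, and the name `line_plane_broadcast` is hypothetical rather than taken from the patent.

```python
from itertools import product

def line_plane_broadcast(dims, root):
    """Simulate the three phases; return {node: phase in which it received}."""
    nx, ny, nz = dims
    _, ry, rz = root
    received = {root: 0}                  # the root holds the message at phase 0

    # Phase 1: the root sends along its line in the first dimension.
    for x in range(nx):
        received.setdefault((x, ry, rz), 1)

    # Phase 2: every node on that line sends along the second dimension,
    # which fills the plane z = rz.
    for x, y in product(range(nx), range(ny)):
        received.setdefault((x, y, rz), 2)

    # Phase 3: every node in that plane sends along the third dimension,
    # which fills the whole grid.
    for x, y, z in product(range(nx), range(ny), range(nz)):
        received.setdefault((x, y, z), 3)

    return received

phases = line_plane_broadcast(dims=(4, 3, 2), root=(1, 2, 0))
print(len(phases))  # 24 nodes, all reached after three phases
```

In this model the number of newly reached nodes grows from a line to a plane to the full volume, so a grid of any size is covered in three phases.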
Parallel quantum computing in a single ensemble quantum computer

International Nuclear Information System (INIS)

Long Guilu; Xiao, L.

2004-01-01

We propose a parallel quantum computing mode for an ensemble quantum computer. In this mode, some qubits are in pure states while other qubits are in mixed states. It enables a single ensemble quantum computer to perform 'single-instruction-multidata' type of parallel computation. Parallel quantum computing can provide additional speedup in Grover's algorithm and Shor's algorithm. In addition, it also makes fuller use of qubit resources in an ensemble quantum computer. As a result, some qubits discarded in the preparation of an effective pure state in the Schulman-Vazirani and the Cleve-DiVincenzo algorithms can be reutilized.
Stampi: a message passing library for distributed parallel computing. User's guide, second edition

International Nuclear Information System (INIS)

Imamura, Toshiyuki; Koide, Hiroshi; Takemiya, Hiroshi

2000-02-01

A new message passing library, Stampi, has been developed to enable computation across arbitrary combinations of different kinds of parallel computers, with MPI (Message Passing Interface) as the single interface for communication. Stampi is based on the MPI2 specification, and it realizes dynamic process creation on different machines and communication with the spawned processes within the scope of MPI semantics. The main features of Stampi are summarized as follows: (i) automatic switching between external and internal communications, (ii) message routing/relaying with a routing module, (iii) dynamic process creation, (iv) support for two types of connection, Master/Slave and Client/Server, and (v) support for communication with Java applets. Vendors have implemented MPI libraries as closed systems within one parallel machine or their own systems, and have not supported both functions, namely process creation and communication to external machines. Stampi supports both functions and enables distributed parallel computing. Currently Stampi has been implemented on COMPACS (COMplex PArallel Computer System) introduced in CCSE, five parallel computers and one graphic workstation, and also on eight kinds of parallel machines, fourteen systems in total. Stampi provides MPI communication functionality on them. This report mainly describes the usage of Stampi. (author)
Speed up of MCACE, a Monte Carlo code for evaluation of shielding safety, by parallel computer, (3)

International Nuclear Information System (INIS)

Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka; Onodera, Emi; Imawaka, Tsuneyuki; Yoda, Yoshihisa.

1993-07-01

Parallel computing of the MCACE code has been studied on two platforms: 1) the shared-memory vector-parallel computer Monte-4 and 2) several networked workstations. On the Monte-4, a disk file was allocated to collect all results computed by 4 CPUs in parallel, with a copy of the MCACE code executing on each CPU. On the networked workstations, two parallel models were evaluated: 1) a host-node model and 2) the model used on the Monte-4, in which no parallelization software is employed, only standard FORTRAN. Measured computing times showed that a speedup of about 3 times was achieved by using 4 CPUs of the Monte-4. Further, by connecting 4 workstations over a network, the parallelized computation ran faster than on our scalar mainframe computer, FACOM M-780. (author)
High-resolution multi-code implementation of unsteady Navier-Stokes flow solver based on paralleled overset adaptive mesh refinement and high-order low-dissipation hybrid schemes

Science.gov (United States)

Li, Gaohua; Fu, Xiang; Wang, Fuxin

2017-10-01

The low-dissipation high-order accurate hybrid up-winding/central scheme based on fifth-order weighted essentially non-oscillatory (WENO) and sixth-order central schemes, along with the Spalart-Allmaras (SA)-based delayed detached eddy simulation (DDES) turbulence model, and the flow feature-based adaptive mesh refinement (AMR), are implemented into a dual-mesh overset grid infrastructure with parallel computing capabilities, for the purpose of simulating vortex-dominated unsteady detached wake flows with high spatial resolutions. The overset grid assembly (OGA) process based on collision detection theory and an implicit hole-cutting algorithm achieves an automatic coupling for the near-body and off-body solvers, and a trial-and-error method is used for obtaining a globally balanced load distribution among the composed multiple codes. The results of flows over a high-Reynolds-number cylinder and a two-bladed helicopter rotor show that the combination of high-order hybrid scheme, advanced turbulence model, and overset adaptive mesh refinement can effectively enhance the spatial resolution for the simulation of turbulent wake eddies.
Finite element method for solving Kohn-Sham equations based on self-adaptive tetrahedral mesh

International Nuclear Information System (INIS)

Zhang Dier; Shen Lihua; Zhou Aihui; Gong Xingao

2008-01-01

A finite element (FE) method with a self-adaptive mesh-refinement technique is developed for solving the density functional Kohn-Sham equations. The FE method adopts local piecewise polynomial basis functions, which produce sparsely structured Hamiltonian matrices. The method is well suited to parallel implementation and does not require Fourier transforms. In addition, the self-adaptive mesh-refinement technique can control the computational accuracy and efficiency with optimal mesh density in different regions.
Parallel computers and three-dimensional computational electromagnetics

International Nuclear Information System (INIS)

Madsen, N.K.

1994-01-01

The authors have continued to enhance their ability to use new massively parallel processing computers to solve time-domain electromagnetic problems. New vectorization techniques have improved the performance of their code DSI3D by factors of 5 to 15, depending on the computer used. New radiation boundary conditions and far-field transformations now allow the computation of radar cross-section values for complex objects. A new parallel-data extraction code has been developed that allows the extraction of data subsets from large problems, which have been run on parallel computers, for subsequent post-processing on workstations with enhanced graphics capabilities. A new charged-particle-pushing version of DSI3D is under development. Finally, DSI3D has become a focal point for several new Cooperative Research and Development Agreement activities with industrial companies such as Lockheed Advanced Development Company, Varian, Hughes Electron Dynamics Division, General Atomic, and Cray

Towards a real time computation of the dose in a phantom segmented into homogeneous meshes

International Nuclear Information System (INIS)

Blanpain, B.

2009-10-01

Automatic radiation therapy treatment planning necessitates a very fast computation of the dose delivered to the patient. We propose to compute the dose by segmenting the patient's phantom into homogeneous meshes, and by associating, to the meshes, projections to dose distributions pre-computed in homogeneous phantoms, along with weights managing heterogeneities. The dose computation is divided into two steps. The first step impacts the meshes: projections and weights are set according to physical and geometrical criteria. The second step impacts the voxels: the dose is computed by evaluating the functions previously associated to their mesh. This method is very fast, in particular when there are few points of interest (several hundreds). In this case, results are obtained in less than one second. With such performance, automatic treatment planning becomes practically feasible. (author)
On synchronous parallel computations with independent probabilistic choice

International Nuclear Information System (INIS)

Reif, J.H.

1984-01-01

This paper introduces probabilistic choice to synchronous parallel machine models, in particular parallel RAMs. The power of probabilistic choice in parallel computations is illustrated by parallelizing some known probabilistic sequential algorithms. The authors characterize the computational complexity of time, space, and processor bounded probabilistic parallel RAMs in terms of the computational complexity of probabilistic sequential RAMs. They show that parallelism uniformly speeds up time bounded probabilistic sequential RAM computations by nearly a quadratic factor. They also show that probabilistic choice can be eliminated from parallel computations by introducing nonuniformity.
Modeling and Grid impedance Variation Analysis of Parallel Connected Grid Connected Inverter based on Impedance Based Harmonic Analysis

DEFF Research Database (Denmark)

Kwon, JunBum; Wang, Xiongfei; Bak, Claus Leth

2014-01-01

This paper addresses the harmonic compensation error problem existing with parallel connected inverters in the same grid interface conditions by means of impedance-based analysis and modeling. Unlike the single grid connected inverter, it is found that multiple parallel connected inverters and grid...... impedance can influence each other if they each have a harmonic compensation function. The analysis method proposed in this paper is based on the relationship between the overall output impedance and input impedance of parallel connected inverter, where controller gain design method, which can...
SU-D-207-04: GPU-Based 4D Cone-Beam CT Reconstruction Using Adaptive Meshing Method

International Nuclear Information System (INIS)

Zhong, Z; Gu, X; Iyengar, P; Mao, W; Wang, J; Guo, X

2015-01-01

Purpose: Due to the limited number of projections at each phase, the image quality of a four-dimensional cone-beam CT (4D-CBCT) is often degraded, which decreases the accuracy of subsequent motion modeling. One of the promising methods is the simultaneous motion estimation and image reconstruction (SMEIR) approach. The objective of this work is to enhance the computational speed of the SMEIR algorithm using adaptive feature-based tetrahedral meshing and GPU-based parallelization. Methods: The first step is to generate the tetrahedral mesh based on the features of a reference phase 4D-CBCT, so that the deformation can be well captured and accurately diffused from the mesh vertices to voxels of the image volume. After the mesh generation, the updated motion model and other phases of 4D-CBCT can be obtained by matching the 4D-CBCT projection images at each phase with the corresponding forward projections of the deformed reference phase of 4D-CBCT. The entire process of this 4D-CBCT reconstruction method is implemented on GPU, which significantly increases the computational efficiency owing to the GPU's parallel computing ability. Results: A 4D XCAT digital phantom was used to test the proposed mesh-based image reconstruction algorithm. The image result shows that both bone structures and the inside of the lung are well preserved and the tumor position is well captured. Compared to the previous voxel-based CPU implementation of SMEIR, the proposed method is about 157 times faster for reconstructing a 10-phase 4D-CBCT with dimension 256×256×150. Conclusion: The GPU-based parallel 4D-CBCT reconstruction method uses the feature-based mesh for estimating the motion model and demonstrates an image result equivalent to that of the previous voxel-based SMEIR approach, with significantly improved computational speed.
PCG: A software package for the iterative solution of linear systems on scalar, vector and parallel computers

Energy Technology Data Exchange (ETDEWEB)

Joubert, W. [Los Alamos National Lab., NM (United States)]; Carey, G.F. [Univ. of Texas, Austin, TX (United States)]

1994-12-31

A great need exists for high performance numerical software libraries transportable across parallel machines. This talk concerns the PCG package, which solves systems of linear equations by iterative methods on parallel computers. The features of the package are discussed, along with the techniques used to obtain both high performance and transportability across architectures. Representative numerical results are presented for several machines including the Connection Machine CM-5, Intel Paragon and Cray T3D parallel computers.
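The abstract does not list the package's solvers or interfaces. As an illustration of the kind of iterative method such a library provides, the following is a textbook Jacobi-preconditioned conjugate gradient solver in plain Python. It is a sketch under assumptions: the function `pcg` and its parameters are hypothetical, the matrix is taken to be dense, symmetric and positive definite, and nothing here reflects the PCG package's actual API.

```python
def pcg(A, b, tol=1e-10, max_iter=200):
    """Jacobi-preconditioned conjugate gradient for a dense SPD matrix A."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]   # Jacobi preconditioner

    x = [0.0] * n
    r = list(b)                                    # residual b - A x for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)                                    # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                 # converged
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                    # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = pcg(A, b)
```

The matrix-vector product and the dot products dominate the cost, and these are the operations a parallel implementation would distribute across processors.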
Massively Parallel Computing: A Sandia Perspective

Energy Technology Data Exchange (ETDEWEB)

Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.

1999-05-06

The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant breakthroughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.
Providing full point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer

Energy Technology Data Exchange (ETDEWEB)

Archer, Charles J.; Faraj, Daniel A.; Inglett, Todd A.; Ratterman, Joseph D.

2018-01-30

Methods, apparatus, and products are disclosed for providing full point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer, each compute node connected to each adjacent compute node in the global combining network through a link, that include: receiving a network packet in a compute node, the network packet specifying a destination compute node; selecting, in dependence upon the destination compute node, at least one of the links for the compute node along which to forward the network packet toward the destination compute node; and forwarding the network packet along the selected link to the adjacent compute node connected to the compute node through the selected link.
Aspects of computation on asynchronous parallel processors

International Nuclear Information System (INIS)

Wright, M.

1989-01-01

The increasing availability of asynchronous parallel processors has provided opportunities for original and useful work in scientific computing. However, the field of parallel computing is still in a highly volatile state, and researchers display a wide range of opinion about many fundamental questions such as models of parallelism, approaches for detecting and analyzing parallelism of algorithms, and tools that allow software developers and users to make effective use of diverse forms of complex hardware. This volume collects the work of researchers specializing in different aspects of parallel computing, who met to discuss the framework and the mechanics of numerical computing. The far-reaching impact of high-performance asynchronous systems is reflected in the wide variety of topics, which include scientific applications (e.g. linear algebra, lattice gauge simulation, ordinary and partial differential equations), models of parallelism, parallel language features, task scheduling, automatic parallelization techniques, tools for algorithm development in parallel environments, and system design issues
A discrete ordinate response matrix method for massively parallel computers

International Nuclear Information System (INIS)

Hanebutte, U.R.; Lewis, E.E.

1991-01-01

A discrete ordinate response matrix method is formulated for the solution of neutron transport problems on massively parallel computers. The response matrix formulation eliminates iteration on the scattering source. The nodal matrices which result from the diamond-differenced equations are utilized in a factored form which minimizes memory requirements and significantly reduces the required number of operations. The algorithm utilizes massive parallelism by assigning each spatial node to a processor. The algorithm is accelerated effectively by a synthetic method in which the low-order diffusion equations are also solved by massively parallel red/black iterations. The method has been implemented on a 16k Connection Machine-2, and S8 and S16 solutions have been obtained for fixed-source benchmark problems in X-Y geometry.
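Red/black iteration is well suited to assigning one spatial node per processor because every point of one colour depends only on points of the other colour. The sketch below shows a red/black Gauss-Seidel sweep for a simple model problem, the five-point discretization of -∇²u = f on a square grid with zero boundary values. This model problem and the name `red_black_sweep` are assumptions for illustration; the abstract's low-order diffusion equations would include further terms.

```python
def red_black_sweep(u, f, h):
    """One red/black Gauss-Seidel sweep for -laplace(u) = f on a square grid."""
    n = len(u)
    for color in (0, 1):                  # 0 = red points, 1 = black points
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 == color:
                    # All four neighbours have the other colour, so every
                    # point of this colour could be updated concurrently.
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] +
                                      u[i][j - 1] + u[i][j + 1] +
                                      h * h * f[i][j])
    return u

n = 9                                     # grid points per side, boundary included
h = 1.0 / (n - 1)
u = [[0.0] * n for _ in range(n)]         # zero initial guess and boundary values
f = [[1.0] * n for _ in range(n)]         # uniform source
for _ in range(500):
    red_black_sweep(u, f, h)
```

Here the two colour passes run sequentially in one process; on a massively parallel machine each pass would become a single concurrent update step.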
Parallel Connection of Silicon Carbide MOSFETs for Multichip Power Modules

DEFF Research Database (Denmark)

Li, Helong

challenges from the manufacture and application points of view. The less mature manufacture process limits the yield and the single die size of the SiC MOSFETs, which results in a smaller current capability of a single SiC MOSFET die. Consequently, in high current application, the paralleled connections of Si...... connections for the paralleled dies are presented and the source of the transient current imbalance is concluded. To mitigate the transient current imbalance in the traditional DBC layout, a novel DBC layout with split output is proposed. First, the working mechanism of the split output topology is studied...... the current sharing performance among the paralleled SiC MOSFET dies in the power module. The proposed DBC layout is not only limited for SiC MOSFETs, but also for Si IGBTs and other voltage controlled devices. of the circuit mismatch on the paralleled connection of SiC MOSFETs. It reveals the circuit...
The FORCE: A portable parallel programming language supporting computational structural mechanics

Science.gov (United States)

Jordan, Harry F.; Benten, Muhammad S.; Brehm, Juergen; Ramanan, Aruna

1989-01-01

This project supports the conversion of codes in Computational Structural Mechanics (CSM) to a parallel form which will efficiently exploit the computational power available from multiprocessors. The work is a part of a comprehensive, FORTRAN-based system to form a basis for a parallel version of the NICE/SPAR combination which will form the CSM Testbed. The software is macro-based and rests on the force methodology developed by the principal investigator in connection with an early scientific multiprocessor. Machine independence is an important characteristic of the system so that retargeting it to the Flex/32, or any other multiprocessor on which NICE/SPAR might be implemented, is well supported. The principal investigator has experience in producing parallel software for both full and sparse systems of linear equations using the force macros. Other researchers have used the Force in finite element programs. It has been possible to rapidly develop software which performs at maximum efficiency on a multiprocessor. The inherent machine independence of the system also means that the parallelization will not be limited to a specific multiprocessor.
Dynamic stability calculations for power grids employing a parallel computer

Energy Technology Data Exchange (ETDEWEB)

Schmidt, K

1982-06-01

The aim of dynamic contingency calculations in power systems is to estimate the effects of assumed disturbances, such as loss of generation. Due to the large dimensions of the problem these simulations require considerable computing time and costs, so that they are at present only used in the planning stage but not for routine checks in power control stations. In view of the homogeneity of the problem, where a multitude of equal generator models, having different parameters, are to be integrated simultaneously, the use of a parallel computer looks very attractive. The results of this study employing a prototype parallel computer (SMS 201) are presented. It consists of up to 128 equal microcomputers bus-connected to a control computer. Each of the modules is programmed to simulate a node of the power grid. Generators with their associated control are represented by models of 13 states each. Passive nodes are complemented by 'phantom' generators, so that the whole power grid is homogeneous, thus removing the need for load-flow iterations. Programming of microcomputers is essentially performed in FORTRAN.
Data communications in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2013-11-12

Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.
Harmonic resonance assessment of multiple paralleled grid-connected inverters system

DEFF Research Database (Denmark)

Wang, Yanbo; Wang, Xiongfei; Blaabjerg, Frede

2017-01-01

This paper presents an eigenvalue-based impedance stability analytical method of multiple paralleled grid-connected inverter system. Different from the conventional impedance-based stability criterion, this work first built the state-space model of paralleled grid-connected inverters. On the basis...... of this, a bridge between the state-space-based modelling and impedance-based stability criterion is presented. The proposed method is able to perform stability assessment locally at the connection points of the component. Meanwhile, the eigenvalue-based sensitivity analysis is adopted to identify...
Template based parallel checkpointing in a massively parallel computer system

Science.gov (United States)

Archer, Charles Jens [Rochester, MN]; Inglett, Todd Alan [Rochester, MN]

2009-01-13

A method and apparatus for a template based parallel checkpoint save for a massively parallel supercomputer system using a parallel variation of the rsync protocol and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
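The core idea, storing only the blocks that differ from a previously produced template, can be sketched with per-block checksums. The code below is a simplified illustration under assumptions: blocks are compared at fixed offsets (real rsync also uses rolling checksums to find shifted data), the node data and the template are assumed to have equal length, and the names `delta_against_template` and `restore` are hypothetical rather than taken from the patent.

```python
import hashlib

BLOCK = 4  # bytes per block; unrealistically small, for illustration only

def blocks(data: bytes):
    """Split data into fixed-size blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

def delta_against_template(node_data: bytes, template: bytes) -> dict:
    """Return {block_index: block} for blocks that differ from the template."""
    template_sums = [checksum(b) for b in blocks(template)]
    return {i: blk
            for i, blk in enumerate(blocks(node_data))
            if checksum(blk) != template_sums[i]}

def restore(template: bytes, delta: dict) -> bytes:
    """Rebuild a node's checkpoint from the template plus its stored blocks."""
    out = blocks(template)
    for i, blk in delta.items():
        out[i] = blk
    return b"".join(out)

template = b"AAAABBBBCCCCDDDD"
node = b"AAAAXXXXCCCCDDDD"                # differs from the template in block 1
delta = delta_against_template(node, template)
print(delta)                              # {1: b'XXXX'}
```

Only one of the four blocks has to be transmitted and stored for this node, and `restore(template, delta)` reproduces the node's full checkpoint.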
Intercluster Connection in Cognitive Wireless Mesh Networks Based on Intelligent Network Coding

Science.gov (United States)

Chen, Xianfu; Zhao, Zhifeng; Jiang, Tao; Grace, David; Zhang, Honggang

2009-12-01

Cognitive wireless mesh networks have great flexibility to improve spectrum resource utilization, within which secondary users (SUs) can opportunistically access the authorized frequency bands while complying with the interference constraint as well as the QoS (Quality-of-Service) requirement of primary users (PUs). In this paper, we consider intercluster connection between the neighboring clusters under the framework of cognitive wireless mesh networks. Corresponding to the colocated clusters, data flow which includes the exchanging of control channel messages usually needs four time slots in traditional relaying schemes since all involved nodes operate in half-duplex mode, resulting in significant bandwidth efficiency loss. The situation is even worse at the gateway node connecting the two colocated clusters. A novel scheme based on network coding is proposed in this paper, which needs only two time slots to exchange the same amount of information mentioned above. Our simulation shows that the network coding-based intercluster connection has the advantage of higher bandwidth efficiency compared with the traditional strategy. Furthermore, how to choose an optimal relaying transmission power level at the gateway node in an environment of coexisting primary and secondary users is discussed. We present intelligent approaches based on reinforcement learning to solve the problem. Theoretical analysis and simulation results both show that the intelligent approaches can achieve optimal throughput for the intercluster relaying in the long run.
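The abstract does not spell out the coding operation or how the two-slot schedule is obtained. One common digital form of two-way relay network coding, assumed here purely for illustration, has the gateway broadcast the bitwise XOR of the two packets once, so that each cluster recovers the other's packet using the one it already holds. The packet contents and the helper `xor_bytes` are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical control messages of equal length from the two clusters.
packet_a = b"ctrl-from-A"
packet_b = b"ctrl-from-B"

# The gateway combines both packets and broadcasts a single coded packet
# instead of forwarding each packet in its own time slot.
coded = xor_bytes(packet_a, packet_b)

# Each side removes its own packet from the coded one to obtain the other's.
recovered_at_a = xor_bytes(coded, packet_a)
recovered_at_b = xor_bytes(coded, packet_b)
```

The saving comes from replacing two separate downlink transmissions with one broadcast; this sketch shows only the recovery step, not the slot scheduling or the power-level selection discussed in the abstract.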
Analysis of series resonant converter with series-parallel connection

Science.gov (United States)

Lin, Bor-Ren; Huang, Chien-Lan

2011-02-01

In this study, a parallel inductor-inductor-capacitor (LLC) resonant converter series-connected on the primary side and parallel-connected on the secondary side is presented for server power supply systems. Based on series resonant behaviour, the power metal-oxide-semiconductor field-effect transistors are turned on at zero voltage switching and the rectifier diodes are turned off at zero current switching. Thus, the switching losses on the power semiconductors are reduced. In the proposed converter, the primary windings of the two LLC converters are connected in series. Thus, the two converters have the same primary currents to ensure that they can supply balanced load currents. On the output side, two LLC converters are connected in parallel to share the load current and to reduce the current stress on the secondary windings and the rectifier diodes. In this article, the principle of operation, steady-state analysis and design considerations of the proposed converter are provided and discussed. Experiments with a laboratory prototype with a 24 V/21 A output for server power supply were performed to verify the effectiveness of the proposed converter.
Data communications in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2013-10-29

Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.
Adaptive mesh refinement for shocks and material interfaces

Energy Technology Data Exchange (ETDEWEB)

Dai, William Wenlong [Los Alamos National Laboratory]

2010-01-01

There are three kinds of adaptive mesh refinement (AMR) in structured meshes. Block-based AMR sometimes over refines meshes. Cell-based AMR treats cells cell by cell and thus loses the advantage of the nature of structured meshes. Patch-based AMR is intended to combine advantages of block- and cell-based AMR, i.e., the nature of structured meshes and sharp regions of refinement. But, patch-based AMR has its own difficulties. For example, patch-based AMR typically cannot preserve symmetries of physics problems. In this paper, we will present an approach for a patch-based AMR for hydrodynamics simulations. The approach consists of clustering, symmetry preserving, mesh continuity, flux correction, communications, management of patches, and load balance. The special features of this patch-based AMR include symmetry preserving, efficiency of refinement across shock fronts and material interfaces, special implementation of flux correction, and patch management in parallel computing environments. To demonstrate the capability of the AMR framework, we will show both two- and three-dimensional hydrodynamics simulations with many levels of refinement.
Parallel finite elements with domain decomposition and its pre-processing

International Nuclear Information System (INIS)

Yoshida, A.; Yagawa, G.; Hamada, S.

1993-01-01

This paper describes a parallel finite element analysis using a domain decomposition method, and the pre-processing for the parallel calculation. Computer simulations are about to replace experiments in various fields, and the scale of the models to be simulated tends to be extremely large. Meanwhile, the computational environment has changed drastically in recent years. In particular, parallel processing on massively parallel computers or computer networks is considered a promising technique. To achieve high efficiency in such parallel computation environments, large task granularity and a well-balanced workload distribution are key issues. It is also important to reduce the cost of pre-processing in such parallel FEM. From this point of view, the authors developed a domain decomposition FEM with an automatic and dynamic task-allocation mechanism and an automatic mesh generation/domain subdivision system for it. (author)

Documentation for MeshKit - Reactor Geometry (&mesh) Generator

Energy Technology Data Exchange (ETDEWEB)

Jain, Rajeev [Argonne National Lab. (ANL), Argonne, IL (United States)]; Mahadevan, Vijay [Argonne National Lab. (ANL), Argonne, IL (United States)]

2015-09-30

This report gives documentation for using MeshKit’s Reactor Geometry (and mesh) Generator (RGG) GUI and also briefly documents other algorithms and tools available in MeshKit. RGG is a program designed to aid in modeling and meshing of complex/large hexagonal and rectilinear reactor cores. RGG uses Argonne’s SIGMA interfaces, Qt and VTK to produce an intuitive user interface. By integrating a 3D view of the reactor with the meshing tools and combining them into one user interface, RGG streamlines the task of preparing a simulation mesh and enables real-time feedback that reduces accidental scripting mistakes that could waste hours of meshing. RGG interfaces with MeshKit tools to consolidate the meshing process, meaning that going from model to mesh is as easy as a button click. This report is designed to explain RGG v 2.0 interface and provide users with the knowledge and skills to pilot RGG successfully. Brief documentation of MeshKit source code, tools and other algorithms available are also presented for developers to extend and add new algorithms to MeshKit. RGG tools work in serial and parallel and have been used to model complex reactor core models consisting of conical pins, load pads, several thousands of axially varying material properties of instrumentation pins and other interstices meshes.
Local adaptive mesh refinement for shock hydrodynamics

International Nuclear Information System (INIS)

Berger, M.J.; Colella, P. (Lawrence Livermore Laboratory, Livermore, California 94550)

1989-01-01

The aim of this work is the development of an automatic, adaptive mesh refinement strategy for solving hyperbolic conservation laws in two dimensions. There are two main difficulties in doing this. The first problem is due to the presence of discontinuities in the solution and the effect on them of discontinuities in the mesh. The second problem is how to organize the algorithm to minimize memory and CPU overhead. This is an important consideration and will continue to be important as more sophisticated algorithms that use data structures other than arrays are developed for use on vector and parallel computers. copyright 1989 Academic Press, Inc
Parallel computing by Monte Carlo codes MVP/GMVP

International Nuclear Information System (INIS)

Nagaya, Yasunobu; Nakagawa, Masayuki; Mori, Takamasa

2001-01-01

General-purpose Monte Carlo codes MVP/GMVP are well-vectorized and thus enable us to perform high-speed Monte Carlo calculations. In order to achieve more speedups, we parallelized the codes on the different types of parallel computing platforms or by using a standard parallelization library MPI. The platforms used for benchmark calculations are a distributed-memory vector-parallel computer Fujitsu VPP500, a distributed-memory massively parallel computer Intel Paragon and distributed-memory scalar-parallel computers Hitachi SR2201 and IBM SP2. As is generally observed, linear speedup could be obtained for large-scale problems but parallelization efficiency decreased as the batch size per processing element (PE) became smaller. It was also found that the statistical uncertainty for assembly powers was less than 0.1% in the PWR full-core calculation with more than 10 million histories, which took about 1.5 hours by massively parallel computing. (author)
Finite element electromagnetic field computation on the Sequent Symmetry 81 parallel computer

International Nuclear Information System (INIS)

Ratnajeevan, S.; Hoole, H.

1990-01-01

Finite element field analysis algorithms lend themselves to parallelization and this fact is exploited in this paper to implement a finite element analysis program for electromagnetic field computation on the Sequent Symmetry 81 parallel computer with three processors. In terms of waiting time, the maximum gains are to be made in matrix solution and therefore this paper concentrates on the gains in parallelizing the solution part of finite element analysis. An outline of how parallelization could be exploited in most finite element operations is given in this paper, although the actual implementation of parallelism on the Sequent Symmetry 81 parallel computer was in the sparsity computation, matrix assembly and matrix solution areas. In all cases, the algorithms were modified to suit the parallel programming application rather than allowing the compiler to parallelize existing algorithms.
Algorithmically specialized parallel computers

CERN Document Server

Snyder, Lawrence; Gannon, Dennis B

1985-01-01

Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer. This book discusses the algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster
Domain decomposition methods and parallel computing

International Nuclear Information System (INIS)

Meurant, G.

1991-01-01

In this paper, we show how to efficiently solve large linear systems on parallel computers. These linear systems arise from discretization of scientific computing problems described by systems of partial differential equations. We show how to get a discrete finite dimensional system from the continuous problem and the chosen conjugate gradient iterative algorithm is briefly described. Then, the different kinds of parallel architectures are reviewed and their advantages and deficiencies are emphasized. We sketch the problems found in programming the conjugate gradient method on parallel computers. For this algorithm to be efficient on parallel machines, domain decomposition techniques are introduced. We give results of numerical experiments showing that these techniques allow a good rate of convergence for the conjugate gradient algorithm as well as computational speeds in excess of a billion floating-point operations per second. (author). 5 refs., 11 figs., 2 tabs., 1 inset
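As an illustration of the conjugate gradient iteration named in this abstract, here is a minimal serial sketch in Python. It is not the author's implementation: the function names, the matrix-free `matvec` interface and the 1-D Poisson test matrix are assumptions chosen for brevity, and no domain decomposition or preconditioning is included.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the initial guess x = 0
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # stop when the residual norm is small
            break
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        # New direction is the residual made conjugate to the previous direction
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def laplacian_1d(v):
    """Apply the 1-D Poisson matrix tridiag(-1, 2, -1) to v (assumed test problem)."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


x = conjugate_gradient(laplacian_1d, [1.0] * 8)
residual = max(abs(a - 1.0) for a in laplacian_1d(x))
```

In a parallel setting the matrix-vector product and the dot products are the operations that would be distributed; the sketch keeps them serial so that the structure of the iteration stays visible.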
The Research of the Parallel Computing Development from the Angle of Cloud Computing

Science.gov (United States)

Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun

2017-10-01

Cloud computing is the development of parallel computing, distributed computing and grid computing. The development of cloud computing makes parallel computing come into people’s lives. Firstly, this paper expounds the concept of cloud computing and introduces several traditional parallel programming models. Secondly, it analyzes and studies the principles, advantages and disadvantages of OpenMP, MPI and MapReduce respectively. Finally, it compares the MPI and OpenMP models with MapReduce from the angle of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.
Parallel reservoir simulator computations

International Nuclear Information System (INIS)

Hemanth-Kumar, K.; Young, L.C.

1995-01-01

The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and, relative to scalar calculations, it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively, on the CRAY C-90.
CX: A Scalable, Robust Network for Parallel Computing

Directory of Open Access Journals (Sweden)

Peter Cappello

2002-01-01

CX, a network-based computational exchange, is presented. The system's design integrates variations of ideas from other researchers, such as work stealing, non-blocking tasks, eager scheduling, and space-based coordination. The object-oriented API is simple, compact, and cleanly separates application logic from the logic that supports interprocess communication and fault tolerance. Computations, of course, run to completion in the presence of computational hosts that join and leave the ongoing computation. Such hosts, or producers, use task caching and prefetching to overlap computation with interprocessor communication. To break a potential task server bottleneck, a network of task servers is presented. Even though task servers are envisioned as reliable, the self-organizing, scalable network of n servers, described as a sibling-connected height-balanced fat tree, tolerates a sequence of n-1 server failures. Tasks are distributed throughout the server network via a simple "diffusion" process. CX is intended as a test bed for research on automated silent auctions, reputation services, authentication services, and bonding services. CX also provides a test bed for algorithm research into network-based parallel computation.
Partitioning of unstructured meshes for load balancing

International Nuclear Information System (INIS)

Martin, O.C.; Otto, S.W.

1994-01-01

Many large-scale engineering and scientific calculations involve repeated updating of variables on an unstructured mesh. To do these types of computations on distributed memory parallel computers, it is necessary to partition the mesh among the processors so that the load balance is maximized and inter-processor communication time is minimized. This can be approximated by the problem of partitioning a graph so as to obtain a minimum cut, a well-studied combinatorial optimization problem. Graph partitioning algorithms are discussed that give good but not necessarily optimum solutions. These algorithms include local search methods, recursive spectral bisection, and more general purpose methods such as simulated annealing. It is shown that a general procedure makes it possible to combine simulated annealing with Kernighan-Lin. The resulting algorithm is both very fast and extremely effective. (authors) 23 refs., 3 figs., 1 tab
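The two quantities that this abstract says a partition should trade off, cut size and load balance, can be sketched as follows. This is an illustrative Python fragment, not the authors' algorithm: the function names, the dictionary representation of a partition and the small 2 x 4 grid graph are assumptions made for the example.

```python
from collections import Counter


def cut_size(edges, part):
    """Count edges whose two endpoints are assigned to different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def load_imbalance(part, num_parts):
    """Largest part size divided by the ideal (average) part size; 1.0 is perfect."""
    sizes = Counter(part.values())
    ideal = len(part) / num_parts
    return max(sizes.values()) / ideal


# Assumed example: a 2 x 4 grid with vertices numbered row by row
#   0 1 2 3
#   4 5 6 7
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7),
         (0, 4), (1, 5), (2, 6), (3, 7)]

left_right = {0: 0, 1: 0, 4: 0, 5: 0, 2: 1, 3: 1, 6: 1, 7: 1}
top_bottom = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}

cut_lr = cut_size(edges, left_right)   # edges crossing the vertical split
cut_tb = cut_size(edges, top_bottom)   # edges crossing the horizontal split
```

Both partitions are perfectly balanced, but the left/right split cuts fewer edges, which is the kind of difference a local search method such as Kernighan-Lin or simulated annealing tries to find by moving or swapping vertices between parts.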
Adaptive mesh refinement in titanium

Energy Technology Data Exchange (ETDEWEB)

Colella, Phillip; Wen, Tong

2005-01-21

In this paper, we evaluate Titanium's usability as a high-level parallel programming language through a case study, where we implement a subset of Chombo's functionality in Titanium. Chombo is a software package applying the Adaptive Mesh Refinement methodology to numerical Partial Differential Equations at the production level. Chombo takes a library approach to parallel programming (C++ and Fortran, with MPI), whereas Titanium is a Java dialect designed for high-performance scientific computing. The performance of our implementation is studied and compared with that of Chombo in solving Poisson's equation based on two grid configurations from a real application. Also provided are the counts of lines of code from both sides.
Parallel Computing Using Web Servers and "Servlets".

Science.gov (United States)

Lo, Alfred; Bloor, Chris; Choi, Y. K.

2000-01-01

Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…
Existence of parallel spinors on non-simply-connected Riemannian manifolds

International Nuclear Information System (INIS)

McInnes, B.

1997-04-01

It is well known, and important for applications, that Ricci-flat Riemannian manifolds of non-generic holonomy always admit a parallel [covariant constant] spinor if they are simply connected. The non-simply-connected case is much more subtle, however. We show that a parallel spinor can still be found in this case provided that the [real] dimension is not a multiple of four, and provided that the spin structure is carefully chosen. (author). 10 refs
Analysis of parallel computing performance of the code MCNP

International Nuclear Information System (INIS)

Wang Lei; Wang Kan; Yu Ganglin

2006-01-01

Parallel computing can effectively reduce the running time of the code MCNP. With MPI message-passing software, MCNP5 can run in parallel on a PC cluster with the Windows operating system. The parallel computing performance of MCNP is influenced by factors such as the type, the complexity level and the parameter configuration of the computing problem. This paper analyzes the parallel computing performance of MCNP with regard to these factors and gives measures to improve it. (authors)
Sharing of nonlinear load in parallel-connected three-phase converters

DEFF Research Database (Denmark)

Borup, Uffe; Blaabjerg, Frede; Enjeti, Prasad N.

2001-01-01

In this paper, a new control method is presented which enables equal sharing of linear and nonlinear loads in three-phase power converters connected in parallel, without communication between the converters. The paper focuses on solving the problem that arises when two converters with harmonic compensation are connected in parallel. Without the new solution, they are normally not able to distinguish the harmonic currents that flow to the load and harmonic currents that circulate between the converters. Analysis and experimental results on two 90-kVA 400-Hz converters in parallel are presented. The results show that both linear and nonlinear loads can be shared equally by the proposed concept.
Parallel computing and networking; Heiretsu keisanki to network

Energy Technology Data Exchange (ETDEWEB)

Asakawa, E; Tsuru, T [Japan National Oil Corp., Tokyo (Japan); Matsuoka, T [Japan Petroleum Exploration Co. Ltd., Tokyo (Japan)

1996-05-01

This paper describes trends in parallel computers used in geophysical exploration. Parallel computers first began to be used for geophysical exploration around 1993. At that time these computers were classified mainly as MIMD (multiple instruction stream, multiple data stream), SIMD (single instruction stream, multiple data stream) and the like. Parallel computers were publicized at the 1994 meeting of the Geophysical Exploration Society as a "high precision imaging technology". Concerning libraries for parallel computers, there was a shift to PVM (parallel virtual machine) in 1993 and to MPI (message passing interface) in 1995. In addition, a FORTRAN90 compiler was released with support for data-parallel and vector computers. In 1993, the networks used were Ethernet, FDDI, CDDI and HIPPI. In 1995, OC-3 products under ATM began to spread. However, ATM remains an interoffice high-speed network because ATM service has not yet spread to the public network. 1 ref.
Two-phase flow steam generator simulations on parallel computers using domain decomposition method

International Nuclear Information System (INIS)

Belliard, M.

2003-01-01

Within the framework of the Domain Decomposition Method (DDM), we present industrial steady state two-phase flow simulations of PWR Steam Generators (SG) using iteration-by-sub-domain methods: standard and Adaptive Dirichlet/Neumann methods (ADN). The averaged mixture balance equations are solved by a Fractional-Step algorithm, jointly with the Crank-Nicolson scheme and the Finite Element Method. The algorithm works with overlapping or non-overlapping sub-domains and with conforming or nonconforming meshing. Computations are run on PC networks or on massively parallel mainframe computers. A CEA code-linker and the PVM package are used (master-slave context). SG mock-up simulations, involving up to 32 sub-domains, highlight the efficiency (speed-up, scalability) and the robustness of the chosen approach. With the DDM, the computational problem size is easily increased to about 1,000,000 cells and the CPU time is significantly reduced. The difficulties related to industrial use are also discussed. (author)
Parallel simulation of tsunami inundation on a large-scale supercomputer

Science.gov (United States)

Oishi, Y.; Imamura, F.; Sugawara, D.

2013-12-01

An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the
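The load-balancing step described in this abstract (CPUs allocated to each nested layer in proportion to its grid points, followed by a 1-D decomposition within each layer) can be sketched as below. This Python fragment is an assumption-laden illustration rather than the authors' code: the function names, the rounding rule and the minimum of one CPU per layer are choices made for the example.

```python
def allocate_cpus(grid_points, total_cpus):
    """Split total_cpus among layers in proportion to their grid-point counts."""
    if total_cpus < len(grid_points):
        raise ValueError("need at least one CPU per layer")
    total_points = sum(grid_points)
    shares = [total_cpus * g / total_points for g in grid_points]
    counts = [max(1, int(s)) for s in shares]   # round down, but keep every layer alive
    # Hand out leftover CPUs to the layers that are furthest below their share
    while sum(counts) < total_cpus:
        i = max(range(len(counts)), key=lambda k: shares[k] - counts[k])
        counts[i] += 1
    # If the one-CPU minimum overshot the budget, take CPUs back from the most over-served layers
    while sum(counts) > total_cpus:
        i = max((k for k in range(len(counts)) if counts[k] > 1),
                key=lambda k: counts[k] - shares[k])
        counts[i] -= 1
    return counts


def split_1d(n_cells, n_cpus):
    """1-D decomposition into contiguous (start, stop) ranges whose sizes differ by at most one."""
    base, extra = divmod(n_cells, n_cpus)
    ranges, start = [], 0
    for rank in range(n_cpus):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


cpus_per_layer = allocate_cpus([100, 300, 600], 10)
strips = split_1d(10, 3)
```

With three layers of 100, 300 and 600 grid points and 10 CPUs, the layers receive 1, 3 and 6 CPUs, and each layer's grid is then cut into contiguous strips along one axis.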
Parallel eigenanalysis of finite element models in a completely connected architecture

Science.gov (United States)

Akl, F. A.; Morel, M. R.

1989-01-01

A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in a parallel environment is analyzed.
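The domain-to-processor mapping and the two performance measures mentioned in this abstract can be illustrated with a short Python sketch. The round-robin assignment is an assumption (the abstract only says that extra domains become logical tasks), and the function names and timings are hypothetical.

```python
def assign_domains(num_domains, num_processors):
    """Map each domain to a processor; surplus domains wrap around as logical tasks."""
    return {d: d % num_processors for d in range(num_domains)}


def speedup(t_serial, t_parallel):
    """Ratio of serial run time to parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, num_processors):
    """Speed-up per processor; 1.0 means ideal scaling."""
    return speedup(t_serial, t_parallel) / num_processors


mapping = assign_domains(6, 4)       # six domains on four physical processors
s = speedup(100.0, 25.0)             # hypothetical timings in seconds
e = efficiency(100.0, 25.0, 8)
```

An efficiency well below 1.0, as in the last line, would indicate that communication or load imbalance is absorbing part of the added processing power.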
Parallel Computing in SCALE

International Nuclear Information System (INIS)

DeHart, Mark D.; Williams, Mark L.; Bowman, Stephen M.

2010-01-01

The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement

Second-order particle-in-cell (PIC) computational method in the one-dimensional variable Eulerian mesh system

International Nuclear Information System (INIS)

Pyun, J.J.

1981-01-01

As part of an effort to incorporate the variable Eulerian mesh into the second-order PIC computational method, a truncation error analysis was performed to calculate the second-order error terms for the variable Eulerian mesh system. The results show that the maximum mesh size increment/decrement is limited to $\alpha(\Delta r_i)^2$, where $\Delta r_i$ is a non-dimensional mesh size of the ith cell and $\alpha$ is a constant of order one. The numerical solutions of Burgers' equation by the second-order PIC method in the variable Eulerian mesh system were compared with its exact solution. It was found that the second-order accuracy in the PIC method was maintained under the above condition. Additional problems were analyzed using the second-order PIC methods in both variable and uniform Eulerian mesh systems. The results indicate that the second-order PIC method in the variable Eulerian mesh system can provide substantial computational time saving with no loss in accuracy.
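One way to read the mesh-size limit stated in this abstract is that the change in size between neighbouring cells must not exceed a constant of order one times the square of the non-dimensional cell size. The Python check below is a sketch under that reading; the function name, the default value of the constant and the sample meshes are assumptions.

```python
def satisfies_mesh_limit(dr, alpha=1.0):
    """True if each change between neighbouring cell sizes is at most alpha * dr[i]**2.

    dr holds non-dimensional cell sizes; alpha is the order-one constant.
    """
    return all(abs(dr[i + 1] - dr[i]) <= alpha * dr[i] ** 2
               for i in range(len(dr) - 1))


gentle = [0.1, 0.105, 0.11]   # steps of about 0.005, below the limit of about 0.01
abrupt = [0.1, 0.2]           # step of 0.1, far above the limit of about 0.01

gentle_ok = satisfies_mesh_limit(gentle)
abrupt_ok = satisfies_mesh_limit(abrupt)
```

Because the bound scales with the square of the cell size, fine cells may only grow slowly, which is why the gently graded mesh passes and the abrupt doubling does not.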
A novel two-level dynamic parallel data scheme for large 3-D SN calculations

International Nuclear Information System (INIS)

Sjoden, G.E.; Shedlock, D.; Haghighat, A.; Yi, C.

2005-01-01

We introduce a new dynamic parallel memory optimization scheme for executing large scale 3-D discrete ordinates (Sn) simulations on distributed memory parallel computers. In order for parallel transport codes to be truly scalable, they must use parallel data storage, where only the variables that are locally computed are locally stored. Even with parallel data storage for the angular variables, cumulative storage requirements for large discrete ordinates calculations can be prohibitive. To address this problem, Memory Tuning has been implemented into the PENTRAN 3-D parallel discrete ordinates code as an optimized, two-level ('large' array, 'small' array) parallel data storage scheme. Memory Tuning can be described as the process of parallel data memory optimization. Memory Tuning dynamically minimizes the amount of required parallel data in allocated memory on each processor using a statistical sampling algorithm. This algorithm is based on the integral average and standard deviation of the number of fine meshes contained in each coarse mesh in the global problem. Because PENTRAN only stores the locally computed problem phase space, optimal two-level memory assignments can be unique on each node, depending upon the parallel decomposition used (hybrid combinations of angular, energy, or spatial). As demonstrated in the two large discrete ordinates models presented (a storage cask and an OECD MOX Benchmark), Memory Tuning can save a substantial amount of memory per parallel processor, allowing one to accomplish very large scale Sn computations. (authors)
Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2014-08-12

Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Parallel visualization on leadership computing resources

Energy Technology Data Exchange (ETDEWEB)

Peterka, T; Ross, R B [Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439 (United States); Shen, H-W [Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 (United States); Ma, K-L [Department of Computer Science, University of California at Davis, Davis, CA 95616 (United States); Kendall, W [Department of Electrical Engineering and Computer Science, University of Tennessee at Knoxville, Knoxville, TN 37996 (United States); Yu, H, E-mail: tpeterka@mcs.anl.go [Sandia National Laboratories, California, Livermore, CA 94551 (United States)

2009-07-01

Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.
Parallel visualization on leadership computing resources

International Nuclear Information System (INIS)

Peterka, T; Ross, R B; Shen, H-W; Ma, K-L; Kendall, W; Yu, H

2009-01-01

Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.
Parallel computing solution of Boltzmann neutron transport equation

International Nuclear Information System (INIS)

Ansah-Narh, T.

2010-01-01

The focus of the research was on developing a parallel computing algorithm for solving for the eigenvalues of the Boltzmann Neutron Transport Equation (BNTE) in a slab geometry using a multi-grid approach. In response to the problem of slow execution of serial computing when solving large problems, such as BNTE, the study was focused on the design of parallel computing systems which was an evolution of serial computing that used multiple processing elements simultaneously to solve complex physical and mathematical problems. Finite element method (FEM) was used for the spatial discretization scheme, while angular discretization was accomplished by expanding the angular dependence in terms of Legendre polynomials. The eigenvalues representing the multiplication factors in the BNTE were determined by the power method. MATLAB Compiler Version 4.1 (R2009a) was used to compile the MATLAB codes of BNTE. The implemented parallel algorithms were enabled with matlabpool, a Parallel Computing Toolbox function. The option UseParallel was set to 'always' and the default value of the option was 'never'. When those conditions held, the solvers computed estimated gradients in parallel. The parallel computing system was used to handle all the bottlenecks in the matrix generated from the finite element scheme and each domain of the power method generated. The parallel algorithm was implemented on a Symmetric Multi Processor (SMP) cluster machine, which had Intel 32 bit quad-core x86 processors. Convergence rates and timings for the algorithm on the SMP cluster machine were obtained. Numerical experiments indicated the designed parallel algorithm could reach perfect speedup and had good stability and scalability. (au)
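The power method named in this abstract can be sketched generically as follows. The study itself used MATLAB; this Python version is only an illustration, and the function name, the matrix-free `matvec` interface, the Rayleigh-quotient estimate and the 2 x 2 test matrix are all assumptions rather than details from the paper.

```python
def power_method(matvec, n, tol=1e-12, max_iter=10000):
    """Estimate the dominant eigenvalue of an n x n matrix given only x -> A x."""
    x = [1.0] * n
    eigenvalue = 0.0
    for _ in range(max_iter):
        y = matvec(x)
        norm = max(abs(v) for v in y)
        if norm == 0.0:              # x was mapped to zero; nothing to iterate on
            return 0.0, x
        x = [v / norm for v in y]    # normalise to keep the iterate bounded
        Ax = matvec(x)
        # Rayleigh quotient as the current eigenvalue estimate
        new = sum(a * b for a, b in zip(x, Ax)) / sum(a * a for a in x)
        if abs(new - eigenvalue) < tol:
            return new, x
        eigenvalue = new
    return eigenvalue, x


def small_matrix(v):
    """Apply the assumed test matrix [[2, 1], [1, 3]] to v."""
    return [2.0 * v[0] + 1.0 * v[1], 1.0 * v[0] + 3.0 * v[1]]


dominant, vector = power_method(small_matrix, 2)
```

The test matrix has eigenvalues (5 ± √5)/2, so the iteration should settle on the larger one. In a transport calculation the repeated matrix-vector product is the expensive step and the natural target for parallel execution.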
State-space-based harmonic stability analysis for paralleled grid-connected inverters

DEFF Research Database (Denmark)

Wang, Yanbo; Wang, Xiongfei; Chen, Zhe

2016-01-01

This paper addresses a state-space-based harmonic stability analysis of paralleled grid-connected inverters system. A small signal model of individual inverter is developed, where LCL filter, the equivalent delay of control system, and current controller are modeled. Then, the overall small signal model of paralleled grid-connected inverters is built. Finally, the state space-based stability analysis approach is developed to explain the harmonic resonance phenomenon. The eigenvalue traces associated with time delay and coupled grid impedance are obtained, which accounts for how the unstable inverter produces the harmonic resonance and leads to the instability of whole paralleled system. The proposed approach reveals the contributions of the grid impedance as well as the coupled effect on other grid-connected inverters under different grid conditions. Simulation and experimental results...
Massively parallel evolutionary computation on GPGPUs

CERN Document Server

Tsutsui, Shigeyoshi

2013-01-01

Evolutionary algorithms (EAs) are metaheuristics that learn from natural collective behavior and are applied to solve optimization problems in domains such as scheduling, engineering, bioinformatics, and finance. Such applications demand acceptable solutions with high-speed execution using finite computational resources. Therefore, there have been many attempts to develop platforms for running parallel EAs using multicore machines, massively parallel cluster machines, or grid computing environments. Recent advances in general-purpose computing on graphics processing units (GPGPU) have opened u
OpenCL-based vicinity computation for 3D multiresolution mesh compression

Science.gov (United States)

Hachicha, Soumaya; Elkefi, Akram; Ben Amar, Chokri

2017-03-01

3D multiresolution mesh compression systems are still widely addressed in many domains. These systems increasingly require volumetric data to be processed in real time. Performance is therefore constrained by hardware resource usage and by the need to reduce overall computational time. In this paper, our contribution lies entirely in computing, in real time, the triangle neighborhoods of 3D progressive meshes for a robust compression algorithm based on the scan-based wavelet transform (WT) technique. The originality of this algorithm is that it computes the WT with minimum memory usage by processing data as they are acquired. However, with large data, this technique is considered poor in terms of computational complexity. This work therefore exploits the GPU to accelerate the computation, using OpenCL as a heterogeneous programming language. Experiments demonstrate that, aside from the portability across various platforms and the flexibility guaranteed by the OpenCL-based implementation, this method achieves a speedup factor of 5 compared to the sequential CPU implementation.
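To make the triangle-neighborhood computation concrete, here is a small sequential sketch in Python. It is not the OpenCL kernel from the paper; it assumes that two triangles are neighbors when they share an edge, and the function name `triangle_neighbors` and the index-triple input format are hypothetical.

```python
from collections import defaultdict


def triangle_neighbors(triangles):
    """Map each triangle index to the sorted indices of edge-sharing triangles.

    `triangles` is a list of (a, b, c) vertex-index triples (assumed format).
    """
    edge_to_tris = defaultdict(list)
    for t, (a, b, c) in enumerate(triangles):
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each edge with ordered endpoints so (u, v) == (v, u)
            edge_to_tris[(min(u, v), max(u, v))].append(t)

    neighbors = {t: set() for t in range(len(triangles))}
    for tris in edge_to_tris.values():
        for t in tris:
            neighbors[t].update(o for o in tris if o != t)
    return {t: sorted(n) for t, n in neighbors.items()}
```

On a GPU, the per-triangle work in the first loop is independent, which is what makes this kind of computation a candidate for data-parallel execution.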
Concurrent particle-in-cell plasma simulation on a multi-transputer parallel computer

International Nuclear Information System (INIS)

Khare, A.N.; Jethra, A.; Patel, Kartik

1992-01-01

This report describes the parallelization of a Particle-in-Cell (PIC) plasma simulation code on a multi-transputer parallel computer. The algorithm used in the parallelization of the PIC method is described. The decomposition schemes related to the distribution of the particles among the processors are discussed. The implementation of the algorithm on a transputer network connected as a torus is presented. The solutions of the problems related to global communication of data are presented in the form of a set of generalized communication functions. The performance of the program as a function of data size and the number of transputers show that the implementation is scalable and represents an effective way of achieving high performance at acceptable cost. (author). 11 refs., 4 figs., 2 tabs., appendices
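As a rough illustration of a torus-connected processor network, the sketch below computes the four neighbors of a node with wraparound links. It assumes a two-dimensional torus with row-major rank numbering, which the abstract does not state; the function name `torus_neighbors` is hypothetical.

```python
def torus_neighbors(rank, rows, cols):
    """Return the four neighbor ranks of `rank` on a rows x cols 2D torus.

    Ranks are assumed to be numbered row by row; edges wrap around.
    """
    r, c = divmod(rank, cols)
    return {
        "north": ((r - 1) % rows) * cols + c,
        "south": ((r + 1) % rows) * cols + c,
        "west": r * cols + (c - 1) % cols,
        "east": r * cols + (c + 1) % cols,
    }
```

Because every node has the same number of links, global communication routines such as broadcasts can be written once and reused on every node.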
Analysis and Modeling of Circulating Current in Two Parallel-Connected Inverters

DEFF Research Database (Denmark)

Maheshwari, Ram Krishan; Gohil, Ghanshyamsinh Vijaysinh; Bede, Lorand

2015-01-01

Parallel-connected inverters are gaining attention for high power applications because of the limited power handling capability of the power modules. Moreover, the parallel-connected inverters may have low total harmonic distortion of the ac current if they are operated with the interleaved pulse-width modulation (PWM). However, the interleaved PWM causes a circulating current between the inverters, which in turn causes additional losses. A model describing the dynamics of the circulating current is presented in this study which shows that the circulating current depends on the common-mode voltage. Using this model, the circulating current between two parallel-connected inverters is analysed in this study. The peak and root mean square (rms) values of the normalised circulating current are calculated for different PWM methods, which makes this analysis a valuable tool to design a filter for the circulating......
Event monitoring of parallel computations

Directory of Open Access Journals (Sweden)

Gruzlikov Alexander M.

2015-06-01

The paper considers the monitoring of parallel computations for detection of abnormal events. It is assumed that computations are organized according to an event model, and monitoring is based on specific test sequences
Novel 3D Compression Methods for Geometry, Connectivity and Texture

Science.gov (United States)

Siddeq, M. M.; Rodrigues, M. A.

2016-06-01

A large number of applications in medical visualization, games, engineering design, entertainment, heritage, e-commerce and so on require the transmission of 3D models over the Internet or over local networks. 3D data compression is an important requirement for fast data storage, access and transmission within bandwidth limitations. The Wavefront OBJ (object) file format is commonly used to share models due to its clear simple design. Normally each OBJ file contains a large amount of data (e.g. vertices and triangulated faces, normals, texture coordinates and other parameters) describing the mesh surface. In this paper we introduce a new method to compress geometry, connectivity and texture coordinates by a novel Geometry Minimization Algorithm (GM-Algorithm) in connection with arithmetic coding. First, each vertex ( x, y, z) coordinates are encoded to a single value by the GM-Algorithm. Second, triangle faces are encoded by computing the differences between two adjacent vertex locations, which are compressed by arithmetic coding together with texture coordinates. We demonstrate the method on large data sets achieving compression ratios between 87 and 99 % without reduction in the number of reconstructed vertices and triangle faces. The decompression step is based on a Parallel Fast Matching Search Algorithm (Parallel-FMS) to recover the structure of the 3D mesh. A comparative analysis of compression ratios is provided with a number of commonly used 3D file formats such as VRML, OpenCTM and STL highlighting the performance and effectiveness of the proposed method.
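The idea of encoding faces through differences between adjacent vertex indices can be sketched as plain delta coding. This is only a generic illustration, not the GM-Algorithm or the paper's arithmetic coder; the function names `delta_encode` and `delta_decode` are hypothetical.

```python
def delta_encode(indices):
    """Replace each value by its difference from the previous value."""
    previous = 0
    deltas = []
    for value in indices:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    total = 0
    values = []
    for d in deltas:
        total += d
        values.append(total)
    return values
```

Neighboring faces tend to reference nearby vertex indices, so the differences cluster around small values, which is the kind of skewed distribution an entropy coder such as arithmetic coding can exploit.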
Massively parallel quantum computer simulator

NARCIS (Netherlands)

De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

2007-01-01

We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray
Parallel computational in nuclear group constant calculation

International Nuclear Information System (INIS)

Su'ud, Zaki; Rustandi, Yaddi K.; Kurniadi, Rizal

2002-01-01

This paper discusses a parallel computational method for nuclear group constant calculation using the collision probability method. The main focus is on the calculation of the collision matrix, which needs a large amount of computational time. The geometry treated here is a concentric cylinder. The calculation of the collision probability matrix is carried out using a semi-analytic method based on the Bickley-Naylor function. To accelerate the computation, several computers are used in parallel to solve the problem. We used Linux-based parallelization using PVM software with the C or Fortran language, while on Windows we used socket programming using Delphi or C Builder. The calculation results show the importance of an optimal weight for each processor when the processors have many different speeds.
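One simple way to assign a weight to each processor is to split the work in proportion to relative speed. The sketch below does this with a largest-remainder rounding step; it is an assumption made for illustration, not the weighting scheme used in the paper, and `split_work` and its arguments are hypothetical names.

```python
def split_work(n_items, speeds):
    """Divide n_items among processors in proportion to their relative speeds.

    Returns a list of integer counts that always sums to n_items.
    """
    total = sum(speeds)
    shares = [n_items * s / total for s in speeds]
    counts = [int(share) for share in shares]   # round down first
    leftover = n_items - sum(counts)
    # Hand the remaining items to the largest fractional parts
    order = sorted(range(len(speeds)),
                   key=lambda i: shares[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

For example, `split_work(10, [1, 2])` gives the faster processor seven of the ten rows of a collision matrix and the slower one three.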
Paralelno umrežavanje računara / Parallel networking of the computers

Directory of Open Access Journals (Sweden)

Milojko Jevtović

2007-04-01

This paper presents an original concept for a technical solution to the parallel networking of computers and of local area networks (LANs), that is, their connection and simultaneous communication over several different transport telecommunications networks. One parallel networking solution is described that enables reliable transfer of multimedia traffic and real-time data transmission between computers or LANs simultaneously over N (N = 1, 2, 3, 4, ...) different, mutually independent wide area networks (WANs). Parallel networking is based on the use of a universal modem, whose design is also briefly presented.
Contact-impact algorithms on parallel computers

International Nuclear Information System (INIS)

Zhong Zhihua; Nilsson, Larsgunnar

1994-01-01

Contact-impact algorithms on parallel computers are discussed within the context of explicit finite element analysis. The algorithms concerned include a contact searching algorithm and an algorithm for contact force calculations. The contact searching algorithm is based on the territory concept of the general HITA algorithm. However, no distinction is made between different contact bodies, or between different contact surfaces. All contact segments from contact boundaries are taken as a single set. Hierarchy territories and contact territories are expanded. A three-dimensional bucket sort algorithm is used to sort contact nodes. The defence node algorithm is used in the calculation of contact forces. Both the contact searching algorithm and the defence node algorithm are implemented on the Connection Machine CM-200. The performance of the algorithms is examined under different circumstances, and numerical results are presented. (orig.)
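A three-dimensional bucket sort of nodes can be sketched as binning points into a uniform grid of cubic cells. The code below is a minimal serial illustration under that assumption, not the CM-200 implementation; `bucket_sort_3d`, `candidates` and the `cell_size` parameter are hypothetical names.

```python
from collections import defaultdict


def bucket_sort_3d(points, cell_size):
    """Group point indices by the integer grid cell that contains them."""
    buckets = defaultdict(list)
    for index, (x, y, z) in enumerate(points):
        key = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        buckets[key].append(index)
    return dict(buckets)


def candidates(buckets, key):
    """Collect point indices from a cell and its 26 surrounding cells."""
    i, j, k = key
    found = []
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            for dk in (-1, 0, 1):
                found.extend(buckets.get((i + di, j + dj, k + dk), []))
    return sorted(found)
```

Restricting the contact search to nearby cells avoids comparing every node against every segment, which is the usual motivation for bucket sorting in contact searching.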
Numerical discrepancy between serial and MPI parallel computations

Directory of Open Access Journals (Sweden)

Sang Bong Lee

2016-09-01

Numerical simulations of the 1D Burgers equation and a 2D sloshing problem were carried out to study the numerical discrepancy between serial and parallel computations. The numerical domain was decomposed into 2 and 4 subdomains for parallel computations with the message passing interface. The numerical solution of the Burgers equation showed that the fully explicit boundary conditions used on the subdomains of the parallel computation were responsible for the numerical discrepancy of the transient solution between serial and parallel computations. Two-dimensional sloshing problems in a rectangular domain were solved using OpenFOAM. After the initial transient time, the sloshing patterns of water were significantly different in serial and parallel computations although the same numerical conditions were given. Based on the histograms of pressure measured at two points near the wall, the statistical characteristics of the numerical solution were not affected by the number of subdomains as much as the transient solution was.
Models of parallel computation: a survey and classification

Institute of Scientific and Technical Information of China (English)

ZHANG Yunquan; CHEN Guoliang; SUN Guangzhong; MIAO Qiankun

2007-01-01

In this paper, the state-of-the-art parallel computational model research is reviewed. We will introduce various models that were developed during the past decades. According to their targeting architecture features, especially memory organization, we classify these parallel computational models into three generations. These models and their characteristics are discussed based on three generations classification. We believe that with the ever increasing speed gap between the CPU and memory systems, incorporating non-uniform memory hierarchy into computational models will become unavoidable. With the emergence of multi-core CPUs, the parallelism hierarchy of current computing platforms becomes more and more complicated. Describing this complicated parallelism hierarchy in future computational models becomes more and more important. A semi-automatic toolkit that can extract model parameters and their values on real computers can reduce the model analysis complexity, thus allowing more complicated models with more parameters to be adopted. Hierarchical memory and hierarchical parallelism will be two very important features that should be considered in future model design and research.
Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems

International Nuclear Information System (INIS)

BAER, THOMAS A.; SACKINGER, PHILIP A.; SUBIA, SAMUEL R.

1999-01-01

Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a "pseudo-solid" mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to equally distribute computational work across an SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel computing environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speed ups for fixed problem size, a class of problems of immediate practical importance.

TU-AB-202-05: GPU-Based 4D Deformable Image Registration Using Adaptive Tetrahedral Mesh Modeling

Energy Technology Data Exchange (ETDEWEB)

Zhong, Z; Zhuang, L [Wayne State University, Detroit, MI (United States); Gu, X; Wang, J [UT Southwestern Medical Center, Dallas, TX (United States); Chen, H; Zhen, X [Southern Medical University, Guangzhou, Guangdong (China)

2016-06-15

Purpose: Deformable image registration (DIR) is used today as an automated and effective segmentation method to transfer tumor or organ contours from the planning image to daily images, instead of manual segmentation. However, the computational time and accuracy of current DIR approaches are still insufficient for online adaptive radiation therapy (ART), which requires real-time and high-quality image segmentation, especially for large datasets of 4D-CT images. The objective of this work is to propose a new DIR algorithm, with fast computational speed and high accuracy, by using adaptive feature-based tetrahedral meshing and GPU-based parallelization. Methods: The first step is to generate the adaptive tetrahedral mesh based on the image features of a reference phase of 4D-CT, so that the deformation can be well captured and accurately diffused from the mesh vertices to voxels of the image volume. Subsequently, the deformation vector fields (DVF) and other phases of 4D-CT can be obtained by matching each phase of the target 4D-CT images with the corresponding deformed reference phase. The proposed 4D DIR method is implemented on GPU, which significantly increases the computational efficiency due to its parallel computing ability. Results: A 4D NCAT digital phantom was used to test the efficiency and accuracy of our method. Both the image and DVF results show that the fine structures and shapes of lung are well preserved, and the tumor position is well captured, i.e., 3D distance error is 1.14 mm. Compared to the previous voxel-based CPU implementation of DIR, such as demons, the proposed method is about 160x faster for registering a 10-phase 4D-CT with a phase dimension of 256×256×150. Conclusion: The proposed 4D DIR method uses feature-based mesh and GPU-based parallelism, which demonstrates the capability to compute both high-quality image and motion results, with significant improvement in computational speed.
Parallel computing in genomic research: advances and applications

Directory of Open Access Journals (Sweden)

Ocaña K

2015-11-01

Kary Ocaña (1), Daniel de Oliveira (2). (1) National Laboratory of Scientific Computing, Petrópolis, Rio de Janeiro; (2) Institute of Computing, Fluminense Federal University, Niterói, Brazil. Abstract: Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of terabytes and petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic review of the literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. Keywords: high-performance computing, genomic research, cloud computing, grid computing, cluster computing, parallel computing
Visualization of Octree Adaptive Mesh Refinement (AMR) in Astrophysical Simulations

Science.gov (United States)

Labadens, M.; Chapon, D.; Pomaréde, D.; Teyssier, R.

2012-09-01

Computer simulations are important in current cosmological research. These simulations run in parallel on thousands of processors and produce huge amounts of data. Adaptive mesh refinement is used to reduce the computing cost while keeping good numerical accuracy in regions of interest. RAMSES is a cosmological code developed by the Commissariat à l'énergie atomique et aux énergies alternatives (English: Atomic Energy and Alternative Energies Commission) which uses Octree adaptive mesh refinement. Compared to grid-based AMR, Octree AMR has the advantage of fitting the adaptive resolution of the grid very precisely to the local problem complexity. However, this specific octree data type needs specific software to be visualized, as generic visualization tools work on Cartesian grid data. This is why the PYMSES software has also been developed by our team. It relies on the Python scripting language to ensure modular and easy access for exploring these specific data. In order to take advantage of the high-performance computer that runs the RAMSES simulation, it also uses MPI and multiprocessing to run some code in parallel. We present our PYMSES software in more detail, with some performance benchmarks. PYMSES currently has two visualization techniques which work directly on the AMR. The first one is a splatting technique, and the second one is a custom ray tracing technique. Both have their own advantages and drawbacks. We have also compared two parallel programming techniques: the Python multiprocessing library versus MPI. The load balancing strategy has to be carefully defined in order to achieve a good speedup in our computation. Results obtained with this software are illustrated in the context of a massive, 9000-processor parallel simulation of a Milky Way-like galaxy.
Parallel computing for event reconstruction in high-energy physics

International Nuclear Information System (INIS)

Wolbers, S.

1993-01-01

Parallel computing has been recognized as a solution to large computing problems. In High Energy Physics offline event reconstruction of detector data is a very large computing problem that has been solved with parallel computing techniques. A review of the parallel programming package CPS (Cooperative Processes Software) developed and used at Fermilab for offline reconstruction of Terabytes of data requiring the delivery of hundreds of Vax-Years per experiment is given. The Fermilab UNIX farms, consisting of 180 Silicon Graphics workstations and 144 IBM RS6000 workstations, are used to provide the computing power for the experiments. Fermilab has had a long history of providing production parallel computing starting with the ACP (Advanced Computer Project) Farms in 1986. The Fermilab UNIX Farms have been in production for over 2 years with 24 hour/day service to experimental user groups. Additional tools for management, control and monitoring these large systems will be described. Possible future directions for parallel computing in High Energy Physics will be given
Parallel multigrid smoothing: polynomial versus Gauss-Seidel

International Nuclear Information System (INIS)

Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

2003-01-01

Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines
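A Chebyshev polynomial smoother of the kind discussed above can be sketched with the standard three-term Chebyshev iteration. The version below is a minimal serial illustration in pure Python, not the authors' implementation; the function names and the eigenvalue bounds `lam_min` and `lam_max` are assumptions. In a multigrid smoother the lower bound would typically be set to a fraction of the largest eigenvalue so that only high-frequency error is targeted.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def chebyshev_smooth(A, b, x, lam_min, lam_max, sweeps):
    """Apply `sweeps` steps of Chebyshev iteration to A x = b.

    A is assumed symmetric positive definite; the interval
    [lam_min, lam_max] is the part of the spectrum to damp.
    """
    theta = 0.5 * (lam_max + lam_min)   # centre of the interval
    delta = 0.5 * (lam_max - lam_min)   # half-width of the interval
    sigma = theta / delta
    rho = 1.0 / sigma
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]   # initial residual
    d = [ri / theta for ri in r]                       # first correction
    x = list(x)
    for _ in range(sweeps):
        x = [xi + di for xi, di in zip(x, d)]
        r = [ri - adi for ri, adi in zip(r, matvec(A, d))]
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = [rho_new * rho * di + (2.0 * rho_new / delta) * ri
             for di, ri in zip(d, r)]
        rho = rho_new
    return x
```

Each sweep needs only a matrix-vector product and vector updates, with no sequential dependence between unknowns, which is why such smoothers parallelize more readily than Gauss-Seidel.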
Introduction to massively-parallel computing in high-energy physics

CERN Document Server

AUTHOR|(CDS)2083520

1993-01-01

Ever since computers were first used for scientific and numerical work, there has existed an "arms race" between the technical development of faster computing hardware, and the desires of scientists to solve larger problems in shorter time-scales. However, the vast leaps in processor performance achieved through advances in semi-conductor science have reached a hiatus as the technology comes up against the physical limits of the speed of light and quantum effects. This has led all high performance computer manufacturers to turn towards a parallel architecture for their new machines. In these lectures we will introduce the history and concepts behind parallel computing, and review the various parallel architectures and software environments currently available. We will then introduce programming methodologies that allow efficient exploitation of parallel machines, and present case studies of the parallelization of typical High Energy Physics codes for the two main classes of parallel computing architecture (S...
A parallel direct solver for the self-adaptive hp Finite Element Method

KAUST Repository

Paszyński, Maciej R.

2010-03-01

In this paper we present a new parallel multi-frontal direct solver, dedicated for the hp Finite Element Method (hp-FEM). The self-adaptive hp-FEM generates in a fully automatic mode, a sequence of hp-meshes delivering exponential convergence of the error with respect to the number of degrees of freedom (d.o.f.) as well as the CPU time, by performing a sequence of hp refinements starting from an arbitrary initial mesh. The solver constructs an initial elimination tree for an arbitrary initial mesh, and expands the elimination tree each time the mesh is refined. This allows us to keep track of the order of elimination for the solver. The solver also minimizes the memory usage, by de-allocating partial LU factorizations computed during the elimination stage of the solver, and recomputes them for the backward substitution stage, by utilizing only about 10% of the computational time necessary for the original computations. The solver has been tested on 3D Direct Current (DC) borehole resistivity measurement simulation problems. We measure the execution time and memory usage of the solver over a large regular mesh with 1.5 million degrees of freedom as well as on the highly non-regular mesh, generated by the self-adaptive hp-FEM, with finite elements of various sizes and polynomial orders of approximation varying from p = 1 to p = 9. From the presented experiments it follows that the parallel solver scales well up to the maximum number of utilized processors. The limit for the solver scalability is the maximum sequential part of the algorithm: the computations of the partial LU factorizations over the longest path, coming from the root of the elimination tree down to the deepest leaf. © 2009 Elsevier Inc. All rights reserved.
A new bipolar RRAM selector based on anti-parallel connected diodes for crossbar applications

International Nuclear Information System (INIS)

Li, Yingtao; Gong, Qingchun; Li, Rongrong; Jiang, Xinyu

2014-01-01

Crossbar arrays are the most promising application of a resistive random access memory (RRAM) device for achieving high density memory. However, cross-talk interference in the crossbar array limits the increase in the integration density. In this paper, the combination of two anti-parallel connected diodes and a bipolar RRAM cell is proposed to suppress the sneak current in a crossbar array with anti-parallel connected diodes as the selector for the bipolar RRAM. By using the anti-parallel connected diodes as a selector, the sneak current can be effectively suppressed and the high density crossbar array of more than 1 Mb can be realized as estimated by the 1/2V read voltage scheme. These results indicate that anti-parallel connected diodes can be used as a bipolar selector and have great potential for high density bipolar RRAM crossbar array applications. (papers)
Parallel connecting poloidal coil system for a doublet tokamak fusion reactor

International Nuclear Information System (INIS)

Toffolo, W.E.; Chen, W.Y.; Purcell, J.R.; Wesley, J.C.

1977-09-01

A method has been developed for parallel connection of the ohmic heating (OH) coil. The method involves subdividing the OH-coil into a number of parallel connected subcoils, with each subcoil having about 20 turns. Each of the field shaping coils (F-coils) also contains 20 turns, so that when connected to a common power supply, the OH and F-coils are decoupled. The advantages resulting from the scheme are numerous: (1) each F-coil contains a much smaller number of turns compared with the previous design concept, thus the construction and maintenance will be easier; (2) the parallel connected OH-coils form a constant flux envelope, resulting in an inherently lower error field at the plasma and the TF coil region, and this low error field is not sensitive to the variation in location of the OH-coils; (3) the voltage and current ratings of the individual OH coil conductors are reduced; and (4) the low impedance of the OH-coil system greatly improves the possibility of using a homopolar motor generator as a means of achieving flux reversal during startup and plasma current control during the burn cycle
Algorithms for parallel computers

International Nuclear Information System (INIS)

Churchhouse, R.F.

1985-01-01

Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed but it also raises some fundamental questions, including: (i) which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode. (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria. (iii) How can we design new algorithms specifically for parallel systems. (iv) For multi-processor systems how can we handle the software aspects of the interprocessor communications. Aspects of these questions illustrated by examples are considered in these lectures. (orig.)
CUBESIM, Hypercube and Denelcor Hep Parallel Computer Simulation

International Nuclear Information System (INIS)

Dunigan, T.H.

1988-01-01

1 - Description of program or function: CUBESIM is a set of subroutine libraries and programs for the simulation of message-passing parallel computers and shared-memory parallel computers. Subroutines are supplied to simulate the Intel hypercube and the Denelcor HEP parallel computers. The system permits a user to develop and test parallel programs written in C or FORTRAN on a single processor. The user may alter such hypercube parameters as message startup times, packet size, and the computation-to-communication ratio. The simulation generates a trace file that can be used for debugging, performance analysis, or graphical display. 2 - Method of solution: The CUBESIM simulator is linked with the user's parallel application routines to run as a single UNIX process. The simulator library provides a small operating system to perform process and message management. 3 - Restrictions on the complexity of the problem: Up to 128 processors can be simulated with a virtual memory limit of 6 million bytes. Up to 1000 processes can be simulated
Highly parallel machines and future of scientific computing

International Nuclear Information System (INIS)

Singh, G.S.

1992-01-01

The computing requirements of large-scale scientific computing have always been ahead of what state-of-the-art hardware could supply in the form of the supercomputers of the day. For any single-processor system, the limit to further increases in computing power was recognized a few years ago. Now, with the advent of parallel computing systems, the availability of machines with the required computing power seems a reality. In this paper the author tries to visualize the future of large-scale scientific computing in the penultimate decade of the present century. The author summarizes trends in parallel computers and emphasizes the need for a better programming environment and software tools for optimal performance. The author concludes the paper with a critique of parallel architectures, software tools and algorithms. (author). 10 refs., 2 tabs
Cache-Oblivious Mesh Layouts

International Nuclear Information System (INIS)

Yoon, S; Lindstrom, P; Pascucci, V; Manocha, D

2005-01-01

We present a novel method for computing cache-oblivious layouts of large meshes that improve the performance of interactive visualization and geometric processing algorithms. Given that the mesh is accessed in a reasonably coherent manner, we assume no particular data access patterns or cache parameters of the memory hierarchy involved in the computation. Furthermore, our formulation extends directly to computing layouts of multi-resolution and bounding volume hierarchies of large meshes. We develop a simple and practical cache-oblivious metric for estimating cache misses. Computing a coherent mesh layout is reduced to a combinatorial optimization problem. We designed and implemented an out-of-core multilevel minimization algorithm and tested its performance on unstructured meshes composed of tens to hundreds of millions of triangles. Our layouts can significantly reduce the number of cache misses. We have observed 2-20 times speedups in view-dependent rendering, collision detection, and isocontour extraction without any modification of the algorithms or runtime applications
Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

Science.gov (United States)

Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

2017-07-01

Matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Parallelization is therefore needed to speed up a calculation that usually takes a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication for matrices of arbitrary size, because graph partitioning assumes a square and symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model that was created by NVIDIA and is implemented on the GPU (graphics processing unit).
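To make the hypergraph idea concrete, the sketch below evaluates a row partition of a rectangular sparse matrix under the column-net model, in which each column is a net connecting the rows that have a nonzero in it. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which hypergraph model or partitioner was used, and the function name, the 4 x 6 sparsity pattern and both partitions are hypothetical. The returned value is the standard connectivity-minus-one metric, which counts how many copies of vector entries must be sent between parts.

```python
from collections import defaultdict


def communication_volume(nonzeros, row_part):
    """Connectivity-minus-one metric of a row partition (column-net model).

    nonzeros: iterable of (row, column) positions of the sparse matrix.
    row_part: mapping from row index to the part (processor) that owns it.
    """
    parts_per_column = defaultdict(set)
    for row, col in nonzeros:
        parts_per_column[col].add(row_part[row])
    # A column touched by k parts needs its x entry on k - 1 extra parts.
    return sum(len(parts) - 1 for parts in parts_per_column.values())


# Hypothetical 4 x 6 (rectangular, hence non-symmetric) sparsity pattern.
nonzeros = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 3), (2, 4),
            (3, 4), (3, 5), (3, 0)]

grouped = {0: 0, 1: 0, 2: 1, 3: 1}      # rows sharing columns stay together
interleaved = {0: 0, 1: 1, 2: 0, 3: 1}  # rows sharing columns are split

print(communication_volume(nonzeros, grouped))
print(communication_volume(nonzeros, interleaved))
```

Because the metric is defined on columns rather than on symmetric row-row edges, it applies unchanged to rectangular matrices, which is the limitation of graph partitioning that the abstract points to.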
Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD

International Nuclear Information System (INIS)

Fodor, Z.; Papp, G.

2002-02-01

We study the feasibility of a PC-based parallel computer for medium to large scale lattice QCD simulations. Our cluster built at the Eoetvoes Univ., Inst. Theor. Phys. consists of 137 Intel P4-1.7 GHz nodes with 512 MB RDRAM. The 32-bit, single precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD, respectively (for 64-bit applications the performance is approximately halved). The novel feature of our system is its communication architecture. In order to have a scalable, cost-effective machine we use Gigabit Ethernet cards for nearest-neighbor communications in a two-dimensional mesh. This type of communication is cost effective (only 30% of the hardware costs is spent on the communication). According to our benchmark measurements this type of communication results in around 40% communication time fraction for lattices up to 48^3 · 96 in full QCD simulations. The price/sustained-performance ratio for full QCD is better than $1/Mflops for Wilson (and around $1.5/Mflops for staggered) quarks for practically any lattice size, which can fit in our parallel computer. (orig.)
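The nearest-neighbor pattern on a two-dimensional mesh can be sketched as a mapping from a node's rank to the ranks of its four neighbors. The sketch assumes row-major rank numbering and periodic wrap-around at the mesh edges; neither is stated in the abstract, and the function name and the 4 x 4 example grid are hypothetical.

```python
def mesh_neighbors(rank, rows, cols):
    """Return the four nearest neighbors of `rank` on a rows x cols mesh.

    Assumes row-major numbering and periodic (wrap-around) boundaries.
    """
    if not 0 <= rank < rows * cols:
        raise ValueError("rank outside the mesh")
    r, c = divmod(rank, cols)
    return {
        "up": ((r - 1) % rows) * cols + c,
        "down": ((r + 1) % rows) * cols + c,
        "left": r * cols + (c - 1) % cols,
        "right": r * cols + (c + 1) % cols,
    }


# Corner node of a hypothetical 4 x 4 mesh: two neighbors come from wrap-around.
print(mesh_neighbors(0, 4, 4))
```

Each node exchanges data with only four peers regardless of the machine size, which is why a small, fixed number of network cards per node is enough for this topology.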
Impact analysis on a massively parallel computer

International Nuclear Information System (INIS)

Zacharia, T.; Aramayo, G.A.

1994-01-01

Advanced mathematical techniques and computer simulation play a major role in evaluating and enhancing the design of beverage cans, industrial, and transportation containers for improved performance. Numerical models are used to evaluate the impact requirements of containers used by the Department of Energy (DOE) for transporting radioactive materials. Many of these models are highly compute-intensive. An analysis may require several hours of computational time on current supercomputers despite the simplicity of the models being studied. As computer simulations and materials databases grow in complexity, massively parallel computers have become important tools. Massively parallel computational research at the Oak Ridge National Laboratory (ORNL) and its application to the impact analysis of shipping containers is briefly described in this paper
Magnetic Integration for Parallel Interleaved VSCs Connected in a Whiffletree Configuration

DEFF Research Database (Denmark)

Gohil, Ghanshyamsinh Vijaysinh; Bede, Lorand; Teodorescu, Remus

2016-01-01

The Voltage Source Converters (VSCs) are often connected in parallel to realize a high current rating. In such systems, the harmonic quality of the output voltage can be improved by interleaving the carrier signals of the parallel VSCs. However, an additional inductive filter is often required...
Model-driven product line engineering for mapping parallel algorithms to parallel computing platforms

NARCIS (Netherlands)

Arkin, Ethem; Tekinerdogan, Bedir

2016-01-01

Mapping parallel algorithms to parallel computing platforms requires several activities such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, the mapping of the algorithm to the logical configuration platform and the implementation of the

Parallel processing for fluid dynamics applications

International Nuclear Information System (INIS)

Johnson, G.M.

1989-01-01

The impact of parallel processing on computational science and, in particular, on computational fluid dynamics is growing rapidly. In this paper, particular emphasis is given to developments which have occurred within the past two years. Parallel processing is defined and the reasons for its importance in high-performance computing are reviewed. Parallel computer architectures are classified according to the number and power of their processing units, their memory, and the nature of their connection scheme. Architectures which show promise for fluid dynamics applications are emphasized. Fluid dynamics problems are examined for parallelism inherent at the physical level. CFD algorithms and their mappings onto parallel architectures are discussed. Several examples are presented to document the performance of fluid dynamics applications on present-generation parallel processing devices
Computer-Aided Parallelizer and Optimizer

Science.gov (United States)

Jin, Haoqiang

2011-01-01

The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
MPI to Coarray Fortran: Experiences with a CFD Solver for Unstructured Meshes

Directory of Open Access Journals (Sweden)

Anuj Sharma

2017-01-01

High-resolution numerical methods and unstructured meshes are required in many applications of Computational Fluid Dynamics (CFD). These methods are quite computationally expensive and hence benefit from being parallelized. Message Passing Interface (MPI) has been utilized traditionally as a parallelization strategy. However, the inherent complexity of MPI contributes further to the existing complexity of the CFD scientific codes. The Partitioned Global Address Space (PGAS) parallelization paradigm was introduced in an attempt to improve the clarity of the parallel implementation. We present our experiences of converting an unstructured high-resolution compressible Navier-Stokes CFD solver from MPI to PGAS Coarray Fortran. We present the challenges, methodology, and performance measurements of our approach using Coarray Fortran. With the Cray compiler, we observe Coarray Fortran as a viable alternative to MPI. We are hopeful that Intel and open-source implementations could be utilized in the future.
Parallelism in computations in quantum and statistical mechanics

International Nuclear Information System (INIS)

Clementi, E.; Corongiu, G.; Detrich, J.H.

1985-01-01

Often very fundamental biochemical and biophysical problems defy simulations because of limitations in today's computers. We present and discuss a distributed system composed of two IBM 4341s and/or an IBM 4381 as front-end processors and ten FPS-164 attached array processors. This parallel system - called LCAP - has presently a peak performance of about 110 Mflops; extensions to higher performance are discussed. Presently, the system applications use a modified version of VM/SP as the operating system: description of the modifications is given. Three applications programs have been migrated from sequential to parallel: a molecular quantum mechanical, a Metropolis-Monte Carlo and a molecular dynamics program. Descriptions of the parallel codes are briefly outlined. Use of these parallel codes has already opened up new capabilities for our research. The very positive performance comparisons with today's supercomputers allow us to conclude that parallel computers and programming, of the type we have considered, represent a pragmatic answer to many computationally intensive problems. (orig.)
Massively Parallel Finite Element Programming

KAUST Repository

Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang

2010-01-01

Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

Science.gov (United States)

Sun, Xian-He

1997-01-01

Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2000, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is to 1) develop highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, and 3) incorporate newly developed algorithms into actual simulation packages. The work plan has been well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) adopting a mathematical geometry which has a better capacity to describe the fluid, and (2) using a compact scheme to gain high-order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm
Architecture and VHDL behavioural validation of a parallel processor dedicated to computer vision

International Nuclear Information System (INIS)

Collette, Thierry

1992-01-01

Speeding up image processing is mainly obtained using parallel computers; SIMD processors (single instruction stream, multiple data stream) have been developed, and have proven highly efficient regarding low-level image processing operations. Nevertheless, their performance drops for most intermediate or high level operations, mainly when random data reorganisations in processor memories are involved. The aim of this thesis was to extend the SIMD computer capabilities to allow it to perform more efficiently at the image processing intermediate level. The study of some representative algorithms of this class points out the limits of this computer. Nevertheless, these limits can be removed by architectural modifications. This leads us to propose SYMPATIX, a new SIMD parallel computer. To validate its new concept, a behavioural model written in VHDL (Hardware Description Language) has been elaborated. With this model, the performance of the new computer has been estimated by running image processing algorithm simulations. The VHDL modelling approach supports top-down electronic design of the system, giving an easy coupling between system architectural modifications and their electronic cost. The obtained results show SYMPATIX to be an efficient computer for low and intermediate level image processing. It can be connected to a high level computer, opening up the development of new computer vision applications. This thesis also presents a top-down design method, based on VHDL, intended for electronic system architects. (author) [fr
SU-E-CAMPUS-I-02: Estimation of the Dosimetric Error Caused by the Voxelization of Hybrid Computational Phantoms Using Triangle Mesh-Based Monte Carlo Transport

Energy Technology Data Exchange (ETDEWEB)

Lee, C [Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD (United States); Badal, A [U.S. Food & Drug Administration (CDRH/OSEL), Silver Spring, MD (United States)

2014-06-15

Purpose: Computational voxel phantoms provide realistic anatomy, but the voxel structure may result in dosimetric error compared to real anatomy composed of perfect surfaces. We analyzed the dosimetric error caused by the voxel structure in hybrid computational phantoms by comparing the voxel-based doses at different resolutions with triangle mesh-based doses. Methods: We incorporated the existing adult male UF/NCI hybrid phantom in mesh format into a Monte Carlo transport code, penMesh, that supports triangle meshes. We calculated energy deposition to selected organs of interest for parallel photon beams with three mono energies (0.1, 1, and 10 MeV) in antero-posterior geometry. We also calculated organ energy deposition using three voxel phantoms with different voxel resolutions (1, 5, and 10 mm) using MCNPX2.7. Results: Comparison of organ energy deposition between the two methods showed that agreement overall improved for higher voxel resolution, but for many organs the differences were small. The difference in energy deposition at 1 MeV, for example, decreased from 11.5% to 1.7% in muscle but only from 0.6% to 0.3% in liver as voxel resolution increased from 10 mm to 1 mm. The differences were smaller at higher energies. The numbers of photon histories processed per second in voxels were 6.4×10^4, 3.3×10^4, and 1.3×10^4 for 10, 5, and 1 mm resolutions at 10 MeV, respectively, while meshes ran at 4.0×10^4 histories/sec. Conclusion: The combination of hybrid mesh phantom and penMesh proved to be accurate and of similar speed compared to the voxel phantom and MCNPX. The lowest voxel resolution caused a maximum dosimetric error of 12.6% at 0.1 MeV and 6.8% at 10 MeV but the error was insignificant in some organs. We will apply the tool to calculate dose to very thin layer tissues (e.g., radiosensitive layer in gastro intestines) which cannot be modeled by voxel phantoms.
Outcomes of Orbital Floor Reconstruction After Extensive Maxillectomy Using the Computer-Assisted Fabricated Individual Titanium Mesh Technique.

Science.gov (United States)

Zhang, Wen-Bo; Mao, Chi; Liu, Xiao-Jing; Guo, Chuan-Bin; Yu, Guang-Yan; Peng, Xin

2015-10-01

Orbital floor defects after extensive maxillectomy can cause severe esthetic and functional deformities. Orbital floor reconstruction using the computer-assisted fabricated individual titanium mesh technique is a promising method. This study evaluated the application and clinical outcomes of this technique. This retrospective study included 10 patients with orbital floor defects after maxillectomy performed from 2012 through 2014. A 3-dimensional individual stereo model based on mirror images of the unaffected orbit was obtained to fabricate an anatomically adapted titanium mesh using computer-assisted design and manufacturing. The titanium mesh was inserted into the defect using computer navigation. The postoperative globe projection and orbital volume were measured and the incidence of postoperative complications was evaluated. The average postoperative globe projection was 15.91 ± 1.80 mm on the affected side and 16.24 ± 2.24 mm on the unaffected side (P = .505), and the average postoperative orbital volume was 26.01 ± 1.28 and 25.57 ± 1.89 mL, respectively (P = .312). The mean mesh depth was 25.11 ± 2.13 mm. The mean follow-up period was 23.4 ± 7.7 months (12 to 34 months). Of the 10 patients, 9 did not develop diplopia or a decrease in visual acuity and ocular motility. Titanium mesh exposure was not observed in any patient. All patients were satisfied with their postoperative facial symmetry. Orbital floor reconstruction after extensive maxillectomy with an individual titanium mesh fabricated using computer-assisted techniques can preserve globe projection and orbital volume, resulting in successful clinical outcomes. Copyright © 2015 American Association of Oral and Maxillofacial Surgeons. Published by Elsevier Inc. All rights reserved.
Parallel Computing: Some Activities in High Energy Physics

Science.gov (United States)

Willers, Ian

This paper examines some activities in High Energy Physics that utilise parallel computing. The topic includes all computing from the proposed SIMD front end detectors, the farming applications, high-powered RISC processors and the large machines in the computer centers. We start by looking at the motivation behind using parallelism for general purpose computing. The developments around farming are then described from its simplest form to the more complex system in Fermilab. Finally, there is a list of some developments that are happening close to the experiments.
Hybrid parallel computing architecture for multiview phase shifting

Science.gov (United States)

Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun

2014-11-01

The multiview phase-shifting method shows its powerful capability in achieving high resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs, and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphics processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented on the GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained parallel computing. Experimental results verify that the developed system can perform 50 fps (frames per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the performance of the proposed technique using an NVIDIA GT560Ti graphics card rather than a sequential C implementation on a 3.4 GHz Intel Core i7 3770.
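The phase computation step lends itself to fine-grained parallelism because each pixel is processed independently. The sketch below shows the standard N-step phase-shifting formula for a single pixel, assuming equally spaced shifts of 2πk/N and intensities of the form I_k = A + B cos(φ + 2πk/N). The abstract does not state the number of steps or the exact formula used, so the function names and the fringe parameters are assumptions for illustration.

```python
import math


def wrapped_phase(intensities):
    """Recover the wrapped phase of one pixel from N >= 3 shifted intensities.

    Assumes I_k = A + B * cos(phase + 2*pi*k/N) for k = 0..N-1.
    """
    n = len(intensities)
    if n < 3:
        raise ValueError("at least three phase steps are required")
    sin_sum = sum(i * math.sin(2 * math.pi * k / n)
                  for k, i in enumerate(intensities))
    cos_sum = sum(i * math.cos(2 * math.pi * k / n)
                  for k, i in enumerate(intensities))
    # sin_sum = -(N/2) * B * sin(phase), cos_sum = (N/2) * B * cos(phase)
    return math.atan2(-sin_sum, cos_sum)


def simulate_pixel(phase, steps=4, offset=0.5, modulation=0.4):
    """Generate synthetic fringe intensities for one pixel (hypothetical values)."""
    return [offset + modulation * math.cos(phase + 2 * math.pi * k / steps)
            for k in range(steps)]


print(wrapped_phase(simulate_pixel(1.0)))
```

On a GPU the same arithmetic would typically be assigned one thread per pixel; the result is wrapped to (-π, π] and still needs unwrapping or multiview correspondence before 3-D reconstruction.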
Simulating control rod and fuel assembly motion using moving meshes

Energy Technology Data Exchange (ETDEWEB)

Gilbert, D. [Department of Electrical and Computer Engineering, McMaster University, 1280 Main Street West, Hamilton Ontario, L8S 4K1 (Canada)], E-mail: gilbertdw1@gmail.com; Roman, J.E. [Departamento de Sistemas Informaticos y Computacion, Universidad Politecnica de Valencia, Camino de Vera s/n, 46022 Valencia (Spain); Garland, Wm. J. [Department of Engineering Physics, McMaster University, 1280 Main Street West, Hamilton Ontario, L8S 4K1 (Canada); Poehlman, W.F.S. [Department of Computing and Software, McMaster University, 1280 Main Street West, Hamilton Ontario, L8S 4K1 (Canada)

2008-02-15

A prerequisite for designing a transient simulation experiment which includes the motion of control and fuel assemblies is the careful verification of a steady state model which computes k_eff versus assembly insertion distance. Previous studies in nuclear engineering have usually approached the problem of the motion of control rods with the use of nonlinear nodal models. Nodal methods employ special approximations for the leading and trailing cells of the moving assemblies to avoid the rod cusping problem which results from the naive volume weighted cell cross-section approximation. A prototype framework called the MOOSE has been developed for modeling moving components in the presence of diffusion phenomena. A linear finite difference model is constructed, solutions for which are computed by SLEPc, a high performance parallel eigenvalue solver. Design techniques for the implementation of a patched non-conformal mesh which links groups of sub-meshes that can move relative to one another are presented. The generation of matrices which represent moving meshes which conserve neutron current at their boundaries, and the performance of the framework when applied to model reactivity insertion experiments, are also discussed.
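A steady state k_eff calculation of the kind described can be illustrated on a much simpler problem. The sketch below is not the authors' framework and does not use SLEPc: it assumes a one-group, one-dimensional homogeneous slab on a fixed mesh with zero-flux boundaries, discretizes the diffusion operator with second-order finite differences, and finds the dominant eigenvalue by plain power iteration. All function names and material constants are hypothetical. The bare slab has the analytic value k = νΣf / (Σa + D(π/L)²), which gives a check on the discretization.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def k_effective(length, points, diffusion, sigma_a, nu_sigma_f, iterations=300):
    """Power iteration for -D*phi'' + sigma_a*phi = (1/k)*nu_sigma_f*phi."""
    h = length / (points + 1)            # interior points, zero flux at both ends
    off = -diffusion / h ** 2
    main = 2.0 * diffusion / h ** 2 + sigma_a
    lower, diag, upper = [off] * points, [main] * points, [off] * points
    flux, k = [1.0] * points, 1.0
    for _ in range(iterations):
        source = [nu_sigma_f * f / k for f in flux]
        new_flux = solve_tridiagonal(lower, diag, upper, source)
        k *= sum(new_flux) / sum(flux)   # ratio of successive fission sources
        flux = new_flux
    return k


# Hypothetical one-group constants (cm and 1/cm) for a 100 cm slab.
k_numeric = k_effective(100.0, 199, 1.0, 0.02, 0.025)
k_analytic = 0.025 / (0.02 + 1.0 * (math.pi / 100.0) ** 2)
print(k_numeric, k_analytic)
```

Repeating such a calculation for a sequence of absorber positions would give a k_eff versus insertion curve; handling cells that are only partly covered by the absorber is where the rod cusping issue mentioned in the abstract arises.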
GENERATION OF IRREGULAR HEXAGONAL MESHES

Directory of Open Access Journals (Sweden)

Vlasov Aleksandr Nikolaevich

2012-07-01

Decomposition is performed in a constructive way and, as an option, it involves meshless representation. Further, this mapping method is used to generate the calculation mesh. In this paper, the authors analyze different cases of mapping onto simply connected and bi-connected canonical domains. They present forward and backward mapping techniques. The potential application of these techniques to the generation of nonuniform meshes within the framework of the asymptotic homogenization theory is also considered, in order to assess and project effective characteristics of heterogeneous materials (composites).
Structured Parallel Programming Patterns for Efficient Computation

CERN Document Server

McCool, Michael; Robison, Arch

2012-01-01

Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th
A shape and mesh adaptive computational methodology for gamma ray dose from volumetric sources

International Nuclear Information System (INIS)

Mirza, N.M.; Ali, B.; Mirza, S.M.; Tufail, M.; Ahmad, N.

1991-01-01

Indoor external exposure to the population is dominated by gamma rays emitted from the walls and the floor of a room. A shape and mesh size adaptive flux calculational approach has been developed for a typical wall source. Parametric studies of the effect of mesh size on flux calculations have been done. The optimum value of the mesh size is found to depend strongly on distance from the source, permissible limits on uncertainty in flux predictions and on computer Central Processing Unit time. To test the computations, a typical wall source was reduced to a point, a line and an infinite volume source having finite thickness, and the computed flux values were compared with values from corresponding analytical expressions for these sources. Results indicate that the errors under optimum conditions remain less than 6% for the fluxes calculated from this approach when compared with the analytical values for the point and the line source approximations. Also, when the wall is simulated as an infinite volume source having finite thickness, the errors in computed to analytical flux ratios remain large for smaller wall dimensions. However, the errors become less than 10% when the wall dimensions are greater than ten mean free paths for 3 MeV gamma rays. Also, specific dose rates from this methodology remain within 15% of the values obtained by the Monte Carlo method. (author)
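The comparison against an analytical line source can be sketched in a few lines. The example assumes an unattenuated, uniform line source (no exponential attenuation or buildup factor, which the actual wall calculation would need) and a detector on the perpendicular bisector; the function names, the geometry and the source strength are hypothetical. It sums point-kernel contributions from mesh segments and compares the result with the closed-form flux S/(2πd)·atan(L/2d).

```python
import math


def line_flux_numeric(length, distance, strength, segments):
    """Sum point-source contributions from equal segments of a line source."""
    dx = length / segments
    total = 0.0
    for i in range(segments):
        x = -length / 2 + (i + 0.5) * dx          # segment midpoint
        total += strength * dx / (4 * math.pi * (distance ** 2 + x ** 2))
    return total


def line_flux_analytic(length, distance, strength):
    """Exact unattenuated flux on the perpendicular bisector of the line."""
    return strength / (2 * math.pi * distance) * math.atan(length / (2 * distance))


def relative_error(length, distance, strength, segments):
    exact = line_flux_analytic(length, distance, strength)
    return abs(line_flux_numeric(length, distance, strength, segments) - exact) / exact


# Hypothetical case: 4 m line, detector 1 m away, unit source strength.
for n in (8, 32, 128):
    print(n, relative_error(4.0, 1.0, 1.0, n))
```

Moving the detector closer to the source makes the integrand more sharply peaked, so a finer mesh is needed for the same error, which is consistent with the abstract's remark that the optimum mesh size depends strongly on distance from the source.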
Defining the best parallelization strategy for a diphasic compressible fluid mechanics code

International Nuclear Information System (INIS)

Berthou, Jean-Yves; Fayolle, Eric; Faucher, Eric; Scliffet, Laurent

2000-01-01

Nuclear plants use steam generator safety valves in order to regulate possible large pressure variations of fluids. In case of an incident these valves may be fed with pressurized liquid water (for instance a pressure of 9 MPa at a temperature of 300 °C). When a pressurized liquid is submitted to a strong pressure drop, it will start evaporating. This phenomenon is called flashing. Z. Bilicki and co-authors proposed the homogeneous relaxation model (HRM) to compute critical flashing water flows. Its computation in the case of non-stationary one-dimensional flashing flows has been carried out with the development of a dedicated time-dependent Finite Volume scheme based on a simplified version of the Godunov approach. The Electricite De France Research and Development division has developed a one-dimensional implementation of the HRM model: ECOSS, an 11000-line FORTRAN 90 code. Applied to a shock tube test case with a 20000-element one-dimensional mesh, the simulation of the physical phenomenon during 2.5 seconds requires at least 100 days of computation on a SUN Sparc-Ultra60. This execution time justifies the ECOSS parallelization. Furthermore, we plan a modeling on 2D meshes for the next few years. Knowing that multiplying the mesh dimension by a factor 10 multiplies the execution time by a factor 100, ECOSS would take years of computation with small 2D meshes (1000 x 1000) on a conventional workstation. This paper describes the parallelization analysis we have conducted and presents the experimental results we have obtained applying different programming models (MPI, OpenMP, HPF) on various platforms (a Compaq Proliant 6000 4 processors, a Cray T3E-750 300 processors, a HP class V 16 processors, a SGI Origin2000 32 processors, a cluster of PCs and a COMPAQ SC 232 processors). These experimental results will be discussed according to the following criteria: efficiency, scalability, maintainability, developing costs and portability. As a conclusion, we will present the
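The scaling argument can be made explicit with a short calculation. It rests on an interpretation that the abstract does not spell out: that "mesh dimension" means the number of elements and that run time grows with its square (a factor 10 in elements gives a factor 100 in time). The function name is hypothetical, and the resulting figure is only an order-of-magnitude extrapolation from the quoted 100 days for 20000 elements, not a number reported by the authors.

```python
def scaled_runtime_days(base_days, base_elements, new_elements, exponent=2.0):
    """Extrapolate run time assuming time ~ (number of elements) ** exponent."""
    return base_days * (new_elements / base_elements) ** exponent


# Quoted baseline: 20000 elements take at least 100 days.
days = scaled_runtime_days(100.0, 20_000, 1000 * 1000)
years = days / 365.25
print(days, years)
```

Under these assumptions a 1000 x 1000 mesh has 50 times as many elements, so the run time grows by a factor of 2500, far beyond what a single workstation could deliver.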
Defining the best parallelization strategy for a diphasic compressible fluid mechanics code

Energy Technology Data Exchange (ETDEWEB)

Berthou, Jean-Yves; Fayolle, Eric [Electricite de France, Research and Development division, Modeling and Information Technologies Department, CLAMART CEDEX (France); Faucher, Eric; Scliffet, Laurent [Electricite de France, Research and Development Division, Mechanics and Component Technology Branch Department, Moret sur Loing (France)

2000-09-01

Nuclear plants use steam generator safety valves in order to regulate possible large pressure variations of fluids. In case of an incident these valves may be fed with pressurized liquid water (for instance a pressure of 9 MPa at a temperature of 300 °C). When a pressurized liquid is subjected to a strong pressure drop, it starts evaporating. This phenomenon is called flashing. Z. Bilicki and co-authors proposed the homogeneous relaxation model (HRM) to compute critical flashing water flows. Its computation in the case of non-stationary one-dimensional flashing flows has been carried out with the development of a dedicated time-dependent Finite Volume scheme based on a simplified version of the Godunov approach. The Electricite De France Research and Development division has developed a one-dimensional implementation of the HRM model: ECOSS, an 11000-line FORTRAN 90 code. Applied to a shock tube test case with a 20000-element one-dimensional mesh, the simulation of 2.5 seconds of the physical phenomenon requires at least 100 days of computation on a SUN Sparc-Ultra60. This execution time justifies the ECOSS parallelization. Furthermore, we plan to model 2D meshes in the next few years. Knowing that multiplying the mesh dimension by a factor of 10 multiplies the execution time by a factor of 100, ECOSS would take years of computation with small 2D meshes (1000 x 1000) on a conventional workstation. This paper describes the parallelization analysis we have conducted and presents the experimental results we have obtained by applying different programming models (MPI, OpenMP, HPF) on various platforms (a 4-processor Compaq Proliant 6000, a 300-processor Cray T3E-750, a 16-processor HP class V, a 32-processor SGI Origin2000, a cluster of PCs and a 232-processor COMPAQ SC). These experimental results will be discussed according to the following criteria: efficiency, scalability, maintainability, development costs and portability. As a conclusion, we will present the
Efficient computation of clipped Voronoi diagram for mesh generation

KAUST Repository

Yan, Dongming

2013-04-01

The Voronoi diagram is a fundamental geometric structure widely used in various fields, especially in computer graphics and geometry computing. For a set of points in a compact domain (i.e. a bounded and closed 2D region or a 3D volume), some Voronoi cells of their Voronoi diagram are infinite or partially outside of the domain, but in practice only the parts of the cells inside the domain are needed, as when computing the centroidal Voronoi tessellation. Such a Voronoi diagram confined to a compact domain is called a clipped Voronoi diagram. We present an efficient algorithm to compute the clipped Voronoi diagram for a set of sites with respect to a compact 2D region or a 3D volume. We also apply the proposed method to optimal mesh generation based on the centroidal Voronoi tessellation. Crown Copyright © 2011 Published by Elsevier Ltd. All rights reserved.
Efficient computation of clipped Voronoi diagram for mesh generation

KAUST Repository

Yan, Dongming; Wang, Wen Ping; Lévy, Bruno L.; Liu, Yang

2013-01-01

The Voronoi diagram is a fundamental geometric structure widely used in various fields, especially in computer graphics and geometry computing. For a set of points in a compact domain (i.e. a bounded and closed 2D region or a 3D volume), some Voronoi cells of their Voronoi diagram are infinite or partially outside of the domain, but in practice only the parts of the cells inside the domain are needed, as when computing the centroidal Voronoi tessellation. Such a Voronoi diagram confined to a compact domain is called a clipped Voronoi diagram. We present an efficient algorithm to compute the clipped Voronoi diagram for a set of sites with respect to a compact 2D region or a 3D volume. We also apply the proposed method to optimal mesh generation based on the centroidal Voronoi tessellation. Crown Copyright © 2011 Published by Elsevier Ltd. All rights reserved.

From parallel to distributed computing for reactive scattering calculations

International Nuclear Information System (INIS)

Lagana, A.; Gervasi, O.; Baraglia, R.

1994-01-01

Some reactive scattering codes have been ported to different innovative computer architectures ranging from massively parallel machines to clustered workstations. The porting has required a drastic restructuring of the codes to single out computationally decoupled CPU-intensive subsections. The suitability of different theoretical approaches for parallel and distributed computing restructuring is discussed and the efficiency of related algorithms is evaluated.
A high performance parallel approach to medical imaging

International Nuclear Information System (INIS)

Frieder, G.; Frieder, O.; Stytz, M.R.

1988-01-01

Research into medical imaging using general purpose parallel processing architectures is described and a review of the performance of previous medical imaging machines is provided. Results demonstrating that general purpose parallel architectures can achieve performance comparable to other, specialized, medical imaging machine architectures are presented. A new back-to-front hidden-surface removal algorithm is described. Results demonstrating the computational savings obtained by using the modified back-to-front hidden-surface removal algorithm are presented. Performance figures for forming a full-scale medical image on a mesh interconnected multiprocessor are presented.
Parallel computation for solving the tridiagonal linear system of equations

International Nuclear Information System (INIS)

Ishiguro, Misako; Harada, Hiroo; Fujii, Minoru; Fujimura, Toichiro; Nakamura, Yasuhiro; Nanba, Katsumi.

1981-09-01

Recently, applications of parallel computation to scientific calculations have increased because of the need for high-speed calculation in large-scale programs. At the JAERI computing center, an array processor, the FACOM 230-75 APU, has been installed to study the applicability of parallel computation to nuclear codes. We performed numerical experiments on the APU with methods for solving tridiagonal linear systems of equations, an important problem in scientific calculations. Drawing on recent papers on parallel methods, we investigated eight methods: the Gauss elimination method, the Parallel Gauss method, the Accelerated parallel Gauss method, the Jacobi method, the Recursive doubling method, the Cyclic reduction method, the Chebyshev iteration method, and the Conjugate gradient method. The computing time and accuracy of the methods were compared on the basis of the numerical experiments. As a result, the Cyclic reduction method was found to be the best in both computing time and accuracy, and the Gauss elimination method the second best. (author)
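As a point of reference for the methods compared in this record, the sketch below shows plain serial Gauss elimination for a tridiagonal system (often called the Thomas algorithm). It is an illustrative Python version, not the code used in the report; the function name and argument layout are assumptions made for this example. Each row of the forward sweep depends on the row before it, which is the sequential dependency that methods such as cyclic reduction and recursive doubling are designed to break.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system by Gauss elimination without pivoting.

    lower[i] multiplies x[i-1] in row i (lower[0] is ignored),
    diag[i] multiplies x[i], and upper[i] multiplies x[i+1]
    (upper[-1] is ignored). No pivoting is done, so the matrix is
    assumed to be well conditioned (e.g. diagonally dominant).
    """
    n = len(diag)
    c = [0.0] * n   # modified super-diagonal
    d = [0.0] * n   # modified right-hand side

    # Forward elimination: remove the sub-diagonal row by row.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        pivot = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / pivot if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / pivot

    # Back substitution, from the last unknown to the first.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

For example, calling it with the (-1, 2, -1) stencil on four unknowns and right-hand side [0, 0, 0, 5] recovers x = [1, 2, 3, 4] up to rounding.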
The new landscape of parallel computer architecture

International Nuclear Information System (INIS)

Shalf, John

2007-01-01

The past few years have seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models
The new landscape of parallel computer architecture

Energy Technology Data Exchange (ETDEWEB)

Shalf, John [NERSC Division, Lawrence Berkeley National Laboratory 1 Cyclotron Road, Berkeley California, 94720 (United States)

2007-07-15

The past few years have seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.
Development of Parallel Computing Framework to Enhance Radiation Transport Code Capabilities for Rare Isotope Beam Facility Design

Energy Technology Data Exchange (ETDEWEB)

Kostin, Mikhail [Michigan State Univ., East Lansing, MI (United States); Mokhov, Nikolai [Fermi National Accelerator Lab. (FNAL), Batavia, IL (United States); Niita, Koji [Research Organization for Information Science and Technology, Ibaraki-ken (Japan)

2013-09-25

A parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. It is intended to be used with older radiation transport codes implemented in Fortran77, Fortran 90 or C. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was developed and tested in conjunction with the MARS15 code. It is possible to use it with other codes such as PHITS, FLUKA and MCNP after certain adjustments. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. The framework corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.
Parallel algorithms and architecture for computation of manipulator forward dynamics

Science.gov (United States)

Fijany, Amir; Bejczy, Antal K.

1989-01-01

Parallel computation of manipulator forward dynamics is investigated. Considering three classes of algorithms for the solution of the problem, that is, the O(n), the O(n^2), and the O(n^3) algorithms, parallelism in the problem is analyzed. It is shown that the problem belongs to the class NC and that the time and processor bounds are O(log^2 n) and O(n^4), respectively. However, the fastest stable parallel algorithms achieve a computation time of O(n) and can be derived by parallelization of the O(n^3) serial algorithms. Parallel computation of the O(n^3) algorithms requires the development of parallel algorithms for a set of fundamentally different problems, that is, the Newton-Euler formulation, the computation of the inertia matrix, decomposition of the symmetric, positive definite matrix, and the solution of triangular systems. Parallel algorithms for this set of problems are developed which can be efficiently implemented on a unique architecture, a triangular array of n(n+2)/2 processors with a simple nearest-neighbor interconnection. This architecture is particularly suitable for VLSI and WSI implementations. The developed parallel algorithm, compared to the best serial O(n) algorithm, achieves an asymptotic speedup of more than two orders of magnitude in the computation of the forward dynamics.
The ongoing investigation of high performance parallel computing in HEP

CERN Document Server

Peach, Kenneth J; Böck, R K; Dobinson, Robert W; Hansroul, M; Norton, Alan Robert; Willers, Ian Malcolm; Baud, J P; Carminati, F; Gagliardi, F; McIntosh, E; Metcalf, M; Robertson, L; CERN. Geneva. Detector Research and Development Committee

1993-01-01

Past and current exploitation of parallel computing in High Energy Physics is summarized and a list of R & D projects in this area is presented. The applicability of new parallel hardware and software to physics problems is investigated, in the light of the requirements for computing power of LHC experiments and the current trends in the computer industry. Four main themes are discussed (possibilities for a finer grain of parallelism; fine-grain communication mechanism; usable parallel programming environment; different programming models and architectures, using standard commercial products). Parallel computing technology is potentially of interest for offline and vital for real time applications in LHC. A substantial investment in applications development and evaluation of state of the art hardware and software products is needed. A solid development environment is required at an early stage, before mainline LHC program development begins.
Stampi: a message passing library for distributed parallel computing. User's guide

International Nuclear Information System (INIS)

Imamura, Toshiyuki; Koide, Hiroshi; Takemiya, Hiroshi

1998-11-01

A new message passing library, Stampi, has been developed to enable computation across arbitrary combinations of different kinds of parallel computers while keeping MPI (Message Passing Interface) as the single interface for communication. Stampi is based on the MPI2 specification. It realizes dynamic process creation on different machines and communication with the spawned processes within the scope of MPI semantics. Vendor-implemented MPI libraries are closed systems within one parallel machine and support neither function: process creation on, nor communication with, external machines. Stampi supports both functions and enables distributed parallel computing. Currently Stampi has been implemented on COMPACS (COMplex PArallel Computer System) introduced in CCSE, which consists of five parallel computers and one graphic workstation, and any communication among them can be processed. (author)
A Parallel Cartesian Approach for External Aerodynamics of Vehicles with Complex Geometry

Science.gov (United States)

Aftosmis, M. J.; Berger, M. J.; Adomavicius, G.

2001-01-01

This workshop paper presents the current status in the development of a new approach for the solution of the Euler equations on Cartesian meshes with embedded boundaries in three dimensions on distributed and shared memory architectures. The approach uses adaptively refined Cartesian hexahedra to fill the computational domain. Where these cells intersect the geometry, they are cut by the boundary into arbitrarily shaped polyhedra which receive special treatment by the solver. The presentation documents a newly developed multilevel upwind solver based on a flexible domain-decomposition strategy. One novel aspect of the work is its use of space-filling curves (SFC) for memory efficient on-the-fly parallelization, dynamic re-partitioning and automatic coarse mesh generation. Within each subdomain the approach employs a variety of reordering techniques so that relevant data are on the same page in memory, permitting high performance on cache-based processors. Details of the on-the-fly SFC based partitioning are presented as are construction rules for the automatic coarse mesh generation. After describing the approach, the paper uses model problems and 3-D configurations to both verify and validate the solver. The model problems demonstrate that second-order accuracy is maintained despite the presence of the irregular cut-cells in the mesh. In addition, it examines both parallel efficiency and convergence behavior. These investigations demonstrate a parallel speed-up in excess of 28 on 32 processors of an SGI Origin 2000 system and confirm that mesh partitioning has no effect on convergence behavior.
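The idea of partitioning a Cartesian mesh along a space-filling curve can be illustrated with a short Python sketch. It assumes a Morton (Z-order) curve and equal work per cell; both are simplifications chosen for this example, and the abstract does not say which curve or cell weighting the authors' code uses. The function names are hypothetical.

```python
def morton_key(ix, iy, iz, bits=10):
    """Interleave the bits of three integer cell indices into one key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key


def partition_by_curve(cells, num_parts, bits=10):
    """Sort cells along the curve and cut the ordering into contiguous chunks.

    cells is a list of (ix, iy, iz) integer index triples. Every cell is
    assumed to cost the same; a weighted cut would be needed for cut cells.
    """
    ordered = sorted(cells, key=lambda c: morton_key(c[0], c[1], c[2], bits))
    base, extra = divmod(len(ordered), num_parts)
    parts, start = [], 0
    for p in range(num_parts):
        size = base + (1 if p < extra else 0)   # spread the remainder
        parts.append(ordered[start:start + size])
        start += size
    return parts
```

Because cells that are close along the curve tend to be close in space, each contiguous chunk forms a fairly compact subdomain, and repartitioning after refinement only requires re-cutting the one-dimensional ordering.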
Passivity Enhancement in RES Based Power Plant with Paralleled Grid-Connected Inverters

DEFF Research Database (Denmark)

Bai, Haofeng; Wang, Xiongfei; Blaabjerg, Frede

2016-01-01

Harmonic instability is threatening the operation of power plants with multiple grid connected converters in parallel. To analyze and improve the stability of the grid connected converters, the passivity of the output admittance converters is first analyzed in this paper. It is shown that the non-passivity...
Locating hardware faults in a parallel computer

Science.gov (United States)

Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

2010-04-13

Locating hardware faults in a parallel computer, including defining within a tree network of the parallel computer two or more sets of non-overlapping test levels of compute nodes of the network that together include all the data communications links of the network, each non-overlapping test level comprising two or more adjacent tiers of the tree; defining test cells within each non-overlapping test level, each test cell comprising a subtree of the tree including a subtree root compute node and all descendant compute nodes of the subtree root compute node within a non-overlapping test level; performing, separately on each set of non-overlapping test levels, an uplink test on all test cells in a set of non-overlapping test levels; and performing, separately from the uplink tests and separately on each set of non-overlapping test levels, a downlink test on all test cells in a set of non-overlapping test levels.
Adaptive radial basis function mesh deformation using data reduction

Science.gov (United States)

Gillebaart, T.; Blom, D. S.; van Zuijlen, A. H.; Bijl, H.

2016-09-01

bandwidth available between CPU and memory. In terms of parallel efficiency/scaling the different studied methods perform similarly, with the greedy algorithm being the bottleneck. In terms of absolute computational work the adaptive methods are better for the cases studied due to their more efficient selection of the control points. By automating most of the RBF mesh deformation, a robust, efficient and almost user-independent mesh deformation method is presented.
An Alternative Algorithm for Computing Watersheds on Shared Memory Parallel Computers

NARCIS (Netherlands)

Meijster, A.; Roerdink, J.B.T.M.

1995-01-01

In this paper a parallel implementation of a watershed algorithm is proposed. The algorithm can easily be implemented on shared memory parallel computers. The watershed transform is generally considered to be inherently sequential since the discrete watershed of an image is defined using recursion.
General-purpose parallel simulator for quantum computing

International Nuclear Information System (INIS)

Niwa, Jumpei; Matsumoto, Keiji; Imai, Hiroshi

2002-01-01

With current technologies, it seems to be very difficult to implement quantum computers with many qubits. It is therefore of importance to simulate quantum algorithms and circuits on the existing computers. However, for a large-size problem, the simulation often requires more computational power than is available from sequential processing. Therefore, simulation methods for parallel processors are required. We have developed a general-purpose simulator for quantum algorithms/circuits on the parallel computer (Sun Enterprise4500). It can simulate algorithms/circuits with up to 30 qubits. In order to test efficiency of our proposed methods, we have simulated Shor's factorization algorithm and Grover's database search, and we have analyzed robustness of the corresponding quantum circuits in the presence of both decoherence and operational errors. The corresponding results, statistics, and analyses are presented in this paper
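To make the cost of such simulations concrete, here is a minimal Python sketch of a state-vector simulator kernel: applying one single-qubit gate to an n-qubit state. It is not the authors' simulator; the function names, the qubit ordering (qubit 0 is the least significant index bit) and the 16 bytes per amplitude (double-precision complex) are assumptions made for this example.

```python
import math


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` of a state vector, in place.

    state -- list of 2**n complex amplitudes
    gate  -- ((g00, g01), (g10, g11))
    """
    stride = 1 << target
    for base in range(0, len(state), 2 * stride):
        for offset in range(stride):
            i0 = base + offset      # index with the target bit equal to 0
            i1 = i0 + stride        # same index with the target bit equal to 1
            a0, a1 = state[i0], state[i1]
            state[i0] = gate[0][0] * a0 + gate[0][1] * a1
            state[i1] = gate[1][0] * a0 + gate[1][1] * a1
    return state


# Hadamard gate
H = ((1 / math.sqrt(2), 1 / math.sqrt(2)),
     (1 / math.sqrt(2), -1 / math.sqrt(2)))


def uniform_superposition(num_qubits):
    """Start in |0...0> and apply a Hadamard gate to every qubit."""
    state = [0j] * (1 << num_qubits)
    state[0] = 1 + 0j
    for q in range(num_qubits):
        apply_single_qubit_gate(state, H, q)
    return state


def state_vector_bytes(num_qubits, bytes_per_amplitude=16):
    """Memory needed to hold the full state vector."""
    return (1 << num_qubits) * bytes_per_amplitude
```

Under these assumptions a 30-qubit state needs 2^30 amplitudes, or 16 GiB. Each (i0, i1) pair is updated independently of the others, which is what makes this kernel a natural target for parallel processors.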
A Parallel Algorithm for Connected Component Labelling of Gray-scale Images on Homogeneous Multicore Architectures

International Nuclear Information System (INIS)

Niknam, Mehdi; Thulasiraman, Parimala; Camorlinga, Sergio

2010-01-01

Connected component labelling is an essential step in image processing. We provide a parallel version of Suzuki's sequential connected component algorithm in order to speed up the labelling process. Also, we modify the algorithm to enable labelling of gray-scale images. Due to the data dependencies in the algorithm, we used a method similar to pipelining to exploit parallelism. The parallel algorithm achieved a speedup of 2.5 for an image size of 256 x 256 pixels using 4 processing threads.
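For readers unfamiliar with the labelling task itself, the following Python sketch labels a small gray-scale image sequentially. It uses a union-find structure rather than Suzuki's algorithm, and it assumes that two 4-connected neighbouring pixels belong to the same component when their gray values are equal; the abstract does not state the connectivity criterion used by the authors, so both choices are illustrative only.

```python
def label_components(image):
    """Label 4-connected regions of equal gray value in a 2D list of ints."""
    rows, cols = len(image), len(image[0])
    parent = list(range(rows * cols))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    # First pass: merge each pixel with its left and upper neighbour
    # when the gray values match.
    for r in range(rows):
        for c in range(cols):
            idx = r * cols + c
            if c > 0 and image[r][c] == image[r][c - 1]:
                union(idx, idx - 1)
            if r > 0 and image[r][c] == image[r - 1][c]:
                union(idx, idx - cols)

    # Second pass: renumber roots as consecutive labels in scan order.
    labels, next_label = {}, 0
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            root = find(r * cols + c)
            if root not in labels:
                labels[root] = next_label
                next_label += 1
            out[r][c] = labels[root]
    return out
```

The first pass reads labels written for the pixel to the left and the pixel above, which is the kind of data dependency the abstract refers to. The reported speedup of 2.5 on 4 threads corresponds to a parallel efficiency of 2.5 / 4 = 62.5%.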
Implementations of BLAST for parallel computers.

Science.gov (United States)

Jülich, A

1995-02-01

The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared-memory machine Cray Y-MP 8/864 and the distributed-memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization on a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799-residue protein query sequence and the protein database PIR were used.
Parallel grid generation algorithm for distributed memory computers

Science.gov (United States)

Moitra, Stuti; Moitra, Anutosh

1994-01-01

A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and implementation of multiple levels of parallelism on multiple instruction multiple data machines is indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communications. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations based on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution time speed-up ratios are given.
Aggregating job exit statuses of a plurality of compute nodes executing a parallel application

Science.gov (United States)

Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.; Mundy, Michael B.

2015-07-21

Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregating each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.
GAMER: A GRAPHIC PROCESSING UNIT ACCELERATED ADAPTIVE-MESH-REFINEMENT CODE FOR ASTROPHYSICS

International Nuclear Information System (INIS)

Schive, H.-Y.; Tsai, Y.-C.; Chiueh Tzihong

2010-01-01

We present the newly developed code, GPU-accelerated Adaptive-MEsh-Refinement code (GAMER), which adopts a novel approach in improving the performance of adaptive-mesh-refinement (AMR) astrophysical simulations by a large factor with the use of the graphic processing unit (GPU). The AMR implementation is based on a hierarchy of grid patches with an oct-tree data structure. We adopt a three-dimensional relaxing total variation diminishing scheme for the hydrodynamic solver and a multi-level relaxation scheme for the Poisson solver. Both solvers have been implemented in GPU, by which hundreds of patches can be advanced in parallel. The computational overhead associated with the data transfer between the CPU and GPU is carefully reduced by utilizing the capability of asynchronous memory copies in GPU, and the computing time of the ghost-zone values for each patch is diminished by overlapping it with the GPU computations. We demonstrate the accuracy of the code by performing several standard test problems in astrophysics. GAMER is a parallel code that can be run in a multi-GPU cluster system. We measure the performance of the code by performing purely baryonic cosmological simulations in different hardware implementations, in which detailed timing analyses provide comparison between the computations with and without GPU(s) acceleration. Maximum speed-up factors of 12.19 and 10.47 are demonstrated using one GPU with 4096^3 effective resolution and 16 GPUs with 8192^3 effective resolution, respectively.

Solving the Stokes problem on a massively parallel computer

DEFF Research Database (Denmark)

Axelsson, Owe; Barker, Vincent A.; Neytcheva, Maya

2001-01-01

boundary value problem for each velocity component, are solved by the conjugate gradient method with a preconditioning based on the algebraic multi‐level iteration (AMLI) technique. The velocity is found from the computed pressure. The method is optimal in the sense that the computational work...... is proportional to the number of unknowns. Further, it is designed to exploit a massively parallel computer with distributed memory architecture. Numerical experiments on a Cray T3E computer illustrate the parallel performance of the method....
Algorithms for computational fluid dynamics on parallel processors

International Nuclear Information System (INIS)

Van de Velde, E.F.

1986-01-01

A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries
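As a small illustration of the finite difference methods named in this record, the sketch below implements one Lax-Wendroff time step in Python for the simplest hyperbolic conservation law, linear advection u_t + a u_x = 0, on a periodic grid. The choice of model equation, the periodic boundary and the function name are assumptions made for this example and are not taken from the study.

```python
def lax_wendroff_step(u, courant):
    """Advance u_t + a u_x = 0 by one time step on a periodic grid.

    courant = a * dt / dx; the scheme is stable for |courant| <= 1.
    """
    n = len(u)
    new = [0.0] * n
    for i in range(n):
        left = u[i - 1]             # index -1 wraps around (periodic)
        right = u[(i + 1) % n]
        new[i] = (u[i]
                  - 0.5 * courant * (right - left)
                  + 0.5 * courant ** 2 * (right - 2.0 * u[i] + left))
    return new
```

Each new value depends only on old values at the point and its two neighbours, so on a distributed-memory machine the grid can be split into blocks that exchange one boundary value per side before every step. With a Courant number of exactly 1 the update reduces to shifting the profile by one cell, which gives a convenient correctness check.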
Integrated computer network high-speed parallel interface

International Nuclear Information System (INIS)

Frank, R.B.

1979-03-01

As the number and variety of computers within Los Alamos Scientific Laboratory's Central Computer Facility grows, the need for a standard, high-speed intercomputer interface has become more apparent. This report details the development of a High-Speed Parallel Interface from conceptual through implementation stages to meet current and future needs for large-scale network computing within the Integrated Computer Network. 4 figures
Switching current imbalance mitigation in power modules with parallel connected SiC MOSFETs

DEFF Research Database (Denmark)

Beczkowski, Szymon; Jørgensen, Asger Bjørn; Li, Helong

2017-01-01

Multichip power modules use parallel connected chips to achieve high current rating. Due to a finite flexibility in a DBC layout, some electrical asymmetries will occur in the module. Parallel connected transistors will exhibit uneven static and dynamic current sharing due to these asymmetries....... Especially important are the couplings between gate and power loops of individual transistors. Fast changing source currents cause gate voltage imbalances yielding uneven switching currents. Equalizing gate voltages seen by paralleled transistors, done by adjusting source bond wires, is proposed...... in this paper. Analysis is performed on an industry standard DBC layout using numerically extracted module parasitics. The method of tuning individual source inductances shows clear improvement in dynamic current balancing and prevents excessive current overshoot during transistors turn-on....
Parallel computing in genomic research: advances and applications.

Science.gov (United States)

Ocaña, Kary; de Oliveira, Daniel

2015-01-01

Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic review of the literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities.
Performance evaluation for compressible flow calculations on five parallel computers of different architectures

International Nuclear Information System (INIS)

Kimura, Toshiya.

1997-03-01

A two-dimensional explicit Euler solver has been implemented for five MIMD parallel computers of different machine architectures at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. These parallel computers are the Fujitsu VPP300, NEC SX-4, CRAY T94, IBM SP2, and Hitachi SR2201. The code was parallelized by several parallelization methods, and a typical compressible flow problem has been calculated for different grid sizes while changing the number of processors. Their effective performance for parallel calculations, such as calculation speed, speed-up ratio and parallel efficiency, has been investigated and evaluated. The communication time among processors has also been measured and evaluated. As a result, the differences in performance and characteristics between vector-parallel and scalar-parallel computers can be pointed out, and the results provide basic data for the efficient use of parallel computers and for large-scale CFD simulations on parallel computers. (author)
Contributions for the optimization of the extensibility of parallel programing of turbulent plasmas

International Nuclear Information System (INIS)

Rozar, F.

2015-01-01

The work realized through this thesis focuses on the optimization of the Gysela code, which simulates plasma turbulence. Optimization of a scientific application concerns mainly one of the three following points: 1) the simulation of larger meshes, 2) the reduction of computing time and 3) the enhancement of the computation accuracy. The first part of this manuscript presents the contributions relative to the simulation of larger meshes. As with many simulation codes, getting more realistic simulations often amounts to refining the meshes. The finer the mesh, the larger the memory consumption. Moreover, during the last few years, supercomputers have tended to provide less and less memory per computing core. For these reasons, we have developed a library, the libMTM (Modeling and Tracing Memory), dedicated to studying precisely the memory consumption of parallel software. The libMTM tools allowed us to reduce the memory consumption of Gysela and to study its scalability. As far as we know, there is no other tool which provides equivalent features for studying memory scalability. The second part of the manuscript presents the work relative to the optimization of the computation time and the improvement of accuracy of the gyro-average operator. This operator is a cornerstone of the gyrokinetic model which is used by the Gysela application. The improvement of accuracy comes from a change in the computing method: a scheme based on a 2D Hermite interpolation replaces the Padé approximation. Although the new version of the gyro-average operator is more accurate, it is also more expensive in computation time than the former one. In order to keep the simulation time reasonable, different optimizations have been performed on the new computing method to make it competitive. Finally, we have developed an MPI-parallelized version of the new gyro-average operator. The good scalability of this new gyro-average computer will allow, eventually, a reduction
Challenges in Second-Generation Wireless Mesh Networks

Directory of Open Access Journals (Sweden)

Pescapé Antonio

2008-01-01

Wireless mesh networks have the potential to provide ubiquitous high-speed Internet access at low costs. The good news is that initial deployments of WiFi meshes show the feasibility of providing ubiquitous Internet connectivity. However, their performance is far below the necessary and achievable limit. Moreover, users' subscription in the existing meshes is dismal even though the technical challenges to get connectivity are low. This paper provides an overview of the current status of mesh networks' deployment, and highlights the technical, economical, and social challenges that need to be addressed in the next years. As a proof-of-principle study, we discuss the above-mentioned challenges with reference to three real networks: (i) MagNets, an operator-driven planned two-tier mesh network; (ii) the Berlin Freifunk network, a pure community-driven single-tier network; (iii) the Weimar Freifunk network, also community-driven but a two-tier network.
Advances in randomized parallel computing

CERN Document Server

Rajasekaran, Sanguthevar

1999-01-01

The technique of randomization has been employed to solve numerous problems of computing both sequentially and in parallel. Examples of randomized algorithms that are asymptotically better than their deterministic counterparts in solving various fundamental problems abound. Randomized algorithms have the advantages of simplicity and better performance both in theory and often in practice. This book is a collection of articles written by renowned experts in the area of randomized parallel computing. A brief introduction to randomized algorithms: In the analysis of algorithms, at least three different measures of performance can be used: the best case, the worst case, and the average case. Often, the average case run time of an algorithm is much smaller than the worst case. For instance, the worst case run time of Hoare's quicksort is O(n²), whereas its average case run time is only O(n log n). The average case analysis is conducted with an assumption on the input space. The assumption made to arrive at t...
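The contrast between worst-case and average-case quicksort can be illustrated with a minimal sketch, assuming a uniformly random pivot. The function name `randomized_quicksort` and the optional `rng` parameter are ours, not the book's. With a random pivot, the expected run time is O(n log n) for every input, while the O(n²) worst case remains possible but unlikely.

```python
import random


def randomized_quicksort(items, rng=None):
    """Return a sorted copy of items, choosing each pivot uniformly at random."""
    rng = rng or random.Random()
    if len(items) <= 1:
        return list(items)
    pivot = items[rng.randrange(len(items))]
    # Three-way partition so that repeated keys do not cause extra recursion.
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return randomized_quicksort(smaller, rng) + equal + randomized_quicksort(larger, rng)
```

The two recursive calls work on disjoint sublists, which is the property a parallel variant would exploit; this sketch stays sequential.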
Impact of temperature on performance of series and parallel connected mono-crystalline silicon solar cells

Directory of Open Access Journals (Sweden)

Subhash Chander

2015-11-01

This paper presents a study of the impact of temperature on the performance of series and parallel connected mono-crystalline silicon (mono-Si) solar cells using a solar simulator. The experiment was carried out at a constant light intensity of 550 W/m² with cell temperature in the range 25–60 °C for single, series and parallel connected mono-Si solar cells. The performance parameters such as open circuit voltage, maximum power, fill factor and efficiency are found to decrease with cell temperature, while the short circuit current is observed to increase. The experimental results reveal that silicon solar cells connected in series and parallel combinations follow Kirchhoff's laws and that temperature has a significant effect on the performance parameters of the solar cells.
Research in Parallel Algorithms and Software for Computational Aerosciences

Science.gov (United States)

Domel, Neal D.

1996-01-01

Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
The 2nd Symposium on the Frontiers of Massively Parallel Computations

Science.gov (United States)

Mills, Ronnie (Editor)

1988-01-01

Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
DIMACS Workshop on Interconnection Networks and Mapping, and Scheduling Parallel Computations

CERN Document Server

Rosenberg, Arnold L; Sotteau, Dominique; NSF Science and Technology Center in Discrete Mathematics and Theoretical Computer Science; Interconnection networks and mapping and scheduling parallel computations

1995-01-01

The interconnection network is one of the most basic components of a massively parallel computer system. Such systems consist of hundreds or thousands of processors interconnected to work cooperatively on computations. One of the central problems in parallel computing is the task of mapping a collection of processes onto the processors and routing network of a parallel machine. Once this mapping is done, it is critical to schedule computations within and communication among processor from universities and laboratories, as well as practitioners involved in the design, implementation, and application of massively parallel systems. Focusing on interconnection networks of parallel architectures of today and of the near future, the book includes topics such as network topologies, network properties, message routing, network embeddings, network emulation, mappings, and efficient scheduling. inputs for a process are available where and when the process is scheduled to be computed. This book contains the refereed pro...
Parallel Object-Oriented Computation Applied to a Finite Element Problem

Directory of Open Access Journals (Sweden)

Jon B. Weissman

1993-01-01

The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application, Mentat, and the implementation, and present performance results for both a Mentat and a hand-coded parallel Fortran version.
Wing-Body Aeroelasticity Using Finite-Difference Fluid/Finite-Element Structural Equations on Parallel Computers

Science.gov (United States)

Byun, Chansup; Guruswamy, Guru P.; Kutler, Paul (Technical Monitor)

1994-01-01

In recent years significant advances have been made for parallel computers in both hardware and software. Now parallel computers have become viable tools in computational mechanics. Many application codes developed on conventional computers have been modified to benefit from parallel computers. Significant speedups in some areas have been achieved by parallel computations. For single-discipline use of both fluid dynamics and structural dynamics, computations have been made on wing-body configurations using parallel computers. However, only a limited amount of work has been completed in combining these two disciplines for multidisciplinary applications. The prime reason is the increased level of complication associated with a multidisciplinary approach. In this work, procedures to compute aeroelasticity on parallel computers using direct coupling of fluid and structural equations will be investigated for wing-body configurations. The parallel computer selected for computations is an Intel iPSC/860 computer which is a distributed-memory, multiple-instruction, multiple data (MIMD) computer with 128 processors. In this study, the computational efficiency issues of parallel integration of both fluid and structural equations will be investigated in detail. The fluid and structural domains will be modeled using finite-difference and finite-element approaches, respectively. Results from the parallel computer will be compared with those from the conventional computers using a single processor. This study will provide an efficient computational tool for the aeroelastic analysis of wing-body structures on MIMD type parallel computers.
Fluid dynamics parallel computer development at NASA Langley Research Center

Science.gov (United States)

Townsend, James C.; Zang, Thomas A.; Dwoyer, Douglas L.

1987-01-01

To accomplish more detailed simulations of highly complex flows, such as the transition to turbulence, fluid dynamics research requires computers much more powerful than any available today. Only parallel processing on multiple-processor computers offers hope for achieving the required effective speeds. Looking ahead to the use of these machines, the fluid dynamicist faces three issues: algorithm development for near-term parallel computers, architecture development for future computer power increases, and assessment of possible advantages of special purpose designs. Two projects at NASA Langley address these issues. Software development and algorithm exploration is being done on the FLEX/32 Parallel Processing Research Computer. New architecture features are being explored in the special purpose hardware design of the Navier-Stokes Computer. These projects are complementary and are producing promising results.
The finite element solution of two-dimensional transverse magnetic scattering problems on the connection machine

International Nuclear Information System (INIS)

Hutchinson, S.; Costillo, S.; Dalton, K.; Hensel, E.

1990-01-01

A study is conducted of the finite element solution of the partial differential equations governing two-dimensional electromagnetic field scattering problems on a SIMD computer. A nodal assembly technique is introduced which maps a single node to a single processor. The physical domain is first discretized in parallel to yield the node locations of an O-grid mesh. Next, the system of equations is assembled and then solved in parallel using a conjugate gradient algorithm for complex-valued, non-symmetric, non-positive definite systems. Using this technique and Thinking Machines Corporation's Connection Machine-2 (CM-2), problems with more than 250k nodes are solved. Results of electromagnetic scattering, governed by the 2-D scalar Helmholtz wave equations, are presented in this paper. Solutions are demonstrated for a wide range of objects. A summary of performance data is given for the set of test problems.
Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

Science.gov (United States)

Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

2010-10-01

Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel CPU of Core (TM) 2 Quad Q6600 and a GPU of Geforce 8800GT, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
A parallelization study of the general purpose Monte Carlo code MCNP4 on a distributed memory highly parallel computer

International Nuclear Information System (INIS)

Yamazaki, Takao; Fujisaki, Masahide; Okuda, Motoi; Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka

1993-01-01

The general purpose Monte Carlo code MCNP4 has been implemented on the Fujitsu AP1000 distributed memory highly parallel computer. Parallelization techniques developed and studied are reported. A shielding analysis function of the MCNP4 code is parallelized in this study. A technique to map a history to each processor dynamically and to map the control process to a certain processor was applied. The efficiency of the parallelized code is up to 80% for a typical practical problem with 512 processors. These results demonstrate the advantages of a highly parallel computer over conventional computers in the field of shielding analysis by the Monte Carlo method. (orig.)
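To put the quoted efficiency figure in perspective, here is a minimal sketch assuming the usual definition of parallel efficiency, E = S / P, where S is the speedup and P the number of processors. The abstract does not state which definition it uses, so this is an assumption, and the function names are ours.

```python
def speedup_from_efficiency(efficiency, processors):
    """Speedup implied by a parallel efficiency, assuming E = S / P."""
    return efficiency * processors


def efficiency_from_speedup(speedup, processors):
    """Parallel efficiency implied by a speedup, assuming E = S / P."""
    return speedup / processors


# Under this assumption, 80% efficiency on 512 processors corresponds to a
# speedup of about 410 over a single processor.
implied_speedup = speedup_from_efficiency(0.80, 512)
```

Under the same definition, a code that ran 410 times faster on 512 processors would be reported as roughly 80% efficient.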
Practical implementation of tetrahedral mesh reconstruction in emission tomography

Science.gov (United States)

Boutchko, R.; Sitek, A.; Gullberg, G. T.

2013-05-01

This paper presents a practical implementation of image reconstruction on tetrahedral meshes optimized for emission computed tomography with parallel beam geometry. Tetrahedral mesh built on a point cloud is a convenient image representation method, intrinsically three-dimensional and with a multi-level resolution property. Image intensities are defined at the mesh nodes and linearly interpolated inside each tetrahedron. For the given mesh geometry, the intensities can be computed directly from tomographic projections using iterative reconstruction algorithms with a system matrix calculated using an exact analytical formula. The mesh geometry is optimized for a specific patient using a two stage process. First, a noisy image is reconstructed on a finely-spaced uniform cloud. Then, the geometry of the representation is adaptively transformed through boundary-preserving node motion and elimination. Nodes are removed in constant intensity regions, merged along the boundaries, and moved in the direction of the mean local intensity gradient in order to provide higher node density in the boundary regions. Attenuation correction and detector geometric response are included in the system matrix. Once the mesh geometry is optimized, it is used to generate the final system matrix for ML-EM reconstruction of node intensities and for visualization of the reconstructed images. In dynamic PET or SPECT imaging, the system matrix generation procedure is performed using a quasi-static sinogram, generated by summing projection data from multiple time frames. This system matrix is then used to reconstruct the individual time frame projections. Performance of the new method is evaluated by reconstructing simulated projections of the NCAT phantom and the method is then applied to dynamic SPECT phantom and patient studies and to a dynamic microPET rat study. Tetrahedral mesh-based images are compared to the standard voxel-based reconstruction for both high and low signal-to-noise ratio

Practical implementation of tetrahedral mesh reconstruction in emission tomography

International Nuclear Information System (INIS)

Boutchko, R; Gullberg, G T; Sitek, A

2013-01-01

This paper presents a practical implementation of image reconstruction on tetrahedral meshes optimized for emission computed tomography with parallel beam geometry. Tetrahedral mesh built on a point cloud is a convenient image representation method, intrinsically three-dimensional and with a multi-level resolution property. Image intensities are defined at the mesh nodes and linearly interpolated inside each tetrahedron. For the given mesh geometry, the intensities can be computed directly from tomographic projections using iterative reconstruction algorithms with a system matrix calculated using an exact analytical formula. The mesh geometry is optimized for a specific patient using a two stage process. First, a noisy image is reconstructed on a finely-spaced uniform cloud. Then, the geometry of the representation is adaptively transformed through boundary-preserving node motion and elimination. Nodes are removed in constant intensity regions, merged along the boundaries, and moved in the direction of the mean local intensity gradient in order to provide higher node density in the boundary regions. Attenuation correction and detector geometric response are included in the system matrix. Once the mesh geometry is optimized, it is used to generate the final system matrix for ML-EM reconstruction of node intensities and for visualization of the reconstructed images. In dynamic PET or SPECT imaging, the system matrix generation procedure is performed using a quasi-static sinogram, generated by summing projection data from multiple time frames. This system matrix is then used to reconstruct the individual time frame projections. Performance of the new method is evaluated by reconstructing simulated projections of the NCAT phantom and the method is then applied to dynamic SPECT phantom and patient studies and to a dynamic microPET rat study. Tetrahedral mesh-based images are compared to the standard voxel-based reconstruction for both high and low signal-to-noise ratio
Non-Almost Periodicity of Parallel Transports for Homogeneous Connections

International Nuclear Information System (INIS)

Brunnemann, Johannes; Fleischhack, Christian

2012-01-01

Let A be the affine space of all connections in an SU(2) principal fibre bundle over ℝ³. The set of homogeneous isotropic connections forms a line l in A. We prove that the parallel transports for general, non-straight paths in the base manifold do not depend almost periodically on l. Consequently, the embedding l ↪ A does not continuously extend to an embedding l-bar ↪ A-bar of the respective compactifications. Here, the Bohr compactification l-bar corresponds to the configuration space of homogeneous isotropic loop quantum cosmology and A-bar to that of loop quantum gravity. Analogous results are given for the anisotropic case.
FILMPAR: A parallel algorithm designed for the efficient and accurate computation of thin film flow on functional surfaces containing micro-structure

Science.gov (United States)

Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.

2009-12-01

FILMPAR is a highly efficient and portable parallel multigrid algorithm for solving a discretised form of the lubrication approximation to three-dimensional, gravity-driven, continuous thin film free-surface flow over substrates containing micro-scale topography. While generally applicable to problems involving heterogeneous and distributed features, for illustrative purposes the algorithm is benchmarked on a distributed memory IBM BlueGene/P computing platform for the case of flow over a single trench topography, enabling direct comparison with complementary experimental data and existing serial multigrid solutions. Parallel performance is assessed as a function of the number of processors employed and shown to lead to super-linear behaviour for the production of mesh-independent solutions. In addition, the approach is used to solve for the case of flow over a complex inter-connected topographical feature and a description provided of how FILMPAR could be adapted relatively simply to solve for a wider class of related thin film flow problems. Program summary: Program title: FILMPAR; Catalogue identifier: AEEL_v1_0; Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEL_v1_0.html; Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland; Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html; No. of lines in distributed program, including test data, etc.: 530 421; No. of bytes in distributed program, including test data, etc.: 1 960 313; Distribution format: tar.gz; Programming language: C++ and MPI; Computer: Desktop, server; Operating system: Unix/Linux, Mac OS X; Has the code been vectorised or parallelised?: Yes. Tested with up to 128 processors; RAM: 512 MBytes; Classification: 12; External routines: GNU C/C++, MPI; Nature of problem: Thin film flows over functional substrates containing well-defined single and complex topographical features are of enormous significance, having a wide variety of engineering
A Novel Parallel Algorithm for Edit Distance Computation

Directory of Open Access Journals (Sweden)

Muhammad Murtaza Yousaf

2018-01-01

The edit distance between two sequences is the minimum number of weighted transformation-operations that are required to transform one string into the other. The weighted transformation-operations are insert, remove, and substitute. A dynamic programming solution for finding the edit distance exists, but it becomes computationally intensive when the strings become very long. This work presents a novel parallel algorithm to solve the edit distance problem of string matching. The algorithm is based on resolving dependencies in the dynamic programming solution of the problem and it is able to compute each row of the edit distance table in parallel. In this way, it becomes possible to compute the complete table in min(m,n) iterations for strings of size m and n, whereas the state-of-the-art parallel algorithm solves the problem in max(m,n) iterations. The proposed algorithm also increases the amount of parallelism in each of its iterations. The algorithm is also capable of exploiting spatial locality in its implementation. Additionally, the algorithm works in a load-balanced way that further improves its performance. The algorithm is implemented for multicore systems with shared memory. An implementation of the algorithm in OpenMP shows linear speedup and better execution time compared to the state-of-the-art parallel approach. The efficiency of the algorithm is also shown to be better than that of its competitor.
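For orientation, the dynamic programming solution that the abstract refers to can be sketched as follows. This is the standard serial row-by-row recurrence with unit weights assumed for all three operations; it is not the parallel algorithm proposed in the paper, and the function name `edit_distance` is ours.

```python
def edit_distance(a, b):
    """Serial dynamic-programming edit distance with unit operation costs."""
    # prev holds row i-1 of the table; row 0 is the cost of inserting b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # column 0: cost of removing a[:i]
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ca == cb else 1)
            remove = prev[j] + 1
            insert = curr[j - 1] + 1  # depends on the cell to the left
            curr.append(min(substitute, remove, insert))
        prev = curr
    return prev[-1]
```

The `insert` term reads `curr[j - 1]`, a cell in the row currently being filled. That left-neighbour dependency is what prevents the cells of one row from being computed independently in this plain formulation, and it is the kind of dependency a row-parallel algorithm has to resolve.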
RGG: Reactor geometry (and mesh) generator

International Nuclear Information System (INIS)

Jain, R.; Tautges, T.

2012-01-01

The reactor geometry (and mesh) generator RGG takes advantage of information about repeated structures in both assembly and core lattices to simplify the creation of geometry and mesh. It is released as open source software as a part of the MeshKit mesh generation library. The methodology operates in three stages. First, assembly geometry models of various types are generated by a tool called AssyGen. Next, the assembly model or models are meshed by using MeshKit tools or the CUBIT mesh generation tool-kit, optionally based on a journal file output by AssyGen. After one or more assembly model meshes have been constructed, a tool called CoreGen uses a copy/move/merge process to arrange the model meshes into a core model. In this paper, we present the current state of tools and new features in RGG. We also discuss the parallel-enabled CoreGen, which in several cases achieves super-linear speedups since the problems fit in available RAM at higher processor counts. Several RGG applications - 1/6 VHTR model, 1/4 PWR reactor core, and a full-core model for Monju - are reported. (authors)
Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

Science.gov (United States)

Moon, Hongsik

What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the
Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

2014-11-11

Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
Comparison of the deflated preconditioned conjugate gradient method and parallel direct solver for composite materials

NARCIS (Netherlands)

Jönsthövel, T.B.; Van Gijzen, M.B.; MacLachlan, S.; Vuik, C.; Scarpas, A.

2011-01-01

The demand for large FE meshes increases as parallel computing becomes the standard in FE simulations. Direct and iterative solution methods are used to solve the resulting linear systems. Many applications concern composite materials, which are characterized by large discontinuities in the material
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

Science.gov (United States)

Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

2017-01-01

Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
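The reason Jacobian evaluation counts as embarrassingly parallel is that each column can be approximated independently of the others. The sketch below assumes a forward-difference approximation and uses a Python thread pool in place of the MATLAB spmd construct studied in the paper; the names `jacobian_column` and `parallel_jacobian`, the step size `h`, and the worker count are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def jacobian_column(f, x, j, h=1e-6):
    """Forward-difference estimate of column j of the Jacobian of f at x."""
    base = f(x)
    shifted = list(x)
    shifted[j] += h
    perturbed = f(shifted)
    return [(p - b) / h for p, b in zip(perturbed, base)]


def parallel_jacobian(f, x, h=1e-6, workers=4):
    """Estimate the Jacobian J[i][j] = d f_i / d x_j, one task per column."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Columns share no state, so they can be evaluated in any order.
        columns = list(pool.map(lambda j: jacobian_column(f, x, j, h), range(len(x))))
    # Transpose the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]
```

In standard CPython, threads do not speed up pure-Python arithmetic, so this shows only the task structure. Real gains would need separate processes or native numerical code, and even then overheads may limit speedup, consistent with the sub-linear result reported above.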
Improved mesh generator for the POISSON Group Codes

International Nuclear Information System (INIS)

Gupta, R.C.

1987-01-01

This paper describes the improved mesh generator of the POISSON Group Codes. These improvements enable one to have full control over the way the mesh is generated and in particular the way the mesh density is distributed throughout this model. A higher mesh density in certain regions coupled with a successively lower mesh density in others keeps the accuracy of the field computation high and the requirements on the computer time and computer memory low. The mesh is generated with the help of codes AUTOMESH and LATTICE; both have gone through a major upgrade. Modifications have also been made in the POISSON part of these codes. We shall present an example of a superconducting dipole magnet to explain how to use this code. The results of field computations are found to be reliable within a few parts in a hundred thousand even in such complex geometries
Notes on the Mesh Handler and Mesh Data Conversion

International Nuclear Information System (INIS)

Lee, Sang Yong; Park, Chan Eok

2009-01-01

From the outset of the development of the thermal-hydraulic code (THC), efforts have been made to use recent computational fluid dynamics technology. In particular, the unstructured mesh approach was adopted to relax the restrictions of the grid handling system. As a natural consequence, a mesh handler (MH) has been developed to manipulate the complex mesh data produced by the mesh generator. The mesh generator Gambit was chosen at the beginning of the code development, but a newer mesh generator, Pointwise, was later introduced for more flexible mesh generation. The open source code Paraview, which can handle unstructured as well as structured mesh data, was chosen as the post-processor. The overall data processing system for THC is shown in Figure-1. Mesh data can be saved to permanent storage in various file formats; a couple of dozen formats are found in the above-mentioned programs alone. A competent mesh handler should be able to import and export mesh data in as many formats as possible. In practice, two factors make this difficult. The first is the time and effort needed to program the interface code. The second, which is even harder to overcome, is that many mesh data file formats are proprietary. This paper presents some experience from the development of the format conversion programs. The file formats involved are the Gambit neutral format, the Ansys-CFX grid file format, the VTK legacy file format, the Nastran format, and CGNS.
Parallel computing for data science with examples in R, C++ and CUDA

CERN Document Server

Matloff, Norman

2015-01-01

Parallel Computing for Data Science: With Examples in R, C++ and CUDA is one of the first parallel computing books to concentrate exclusively on parallel data structures, algorithms, software tools, and applications in data science. It includes examples not only from the classic "n observations, p variables" matrix format but also from time series, network graph models, and numerous other structures common in data science. The examples illustrate the range of issues encountered in parallel programming. With the main focus on computation, the book shows how to compute on three types of platfor
Temporal fringe pattern analysis with parallel computing

International Nuclear Information System (INIS)

Tuck Wah Ng; Kar Tien Ang; Argentini, Gianluca

2005-01-01

Temporal fringe pattern analysis is invaluable in transient phenomena studies but necessitates long processing times. Here we describe a parallel computing strategy based on the single-program multiple-data model and hyperthreading processor technology to reduce the execution time. In a two-node cluster workstation configuration we found that execution periods were reduced by 1.6 times when four virtual processors were used. To allow even lower execution times with an increasing number of processors, the time allocated for data transfer, data read, and waiting should be minimized. Parallel computing is found here to present a feasible approach to reduce execution times in temporal fringe pattern analysis
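The reported figures can be put in perspective with a short calculation. The sketch below assumes that Amdahl's law applies, which the abstract does not claim, and treats the four virtual processors as four equal workers, which hyperthreading does not guarantee. Under those assumptions it derives the parallel efficiency and the non-parallel fraction of the run time (data transfer, data read, waiting) implied by a 1.6-fold reduction. The function names are hypothetical.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / workers


def amdahl_serial_fraction(speedup, workers):
    """Solve S = 1 / (f + (1 - f) / p) for the non-parallel fraction f."""
    return (workers / speedup - 1.0) / (workers - 1.0)


eff = parallel_efficiency(1.6, 4)      # figures quoted in the abstract
frac = amdahl_serial_fraction(1.6, 4)
print(f"efficiency = {eff:.2f}, implied serial fraction = {frac:.2f}")
```

Under these assumptions the efficiency is about 0.40 and roughly half of the original run time behaves as non-parallel work, which is consistent with the abstract's remark that transfer, read, and waiting times must shrink before more processors can help.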
A Modified Droop Control Method for Parallel-Connected Current Source Inverters

DEFF Research Database (Denmark)

Wei, Baoze; Guerrero, Josep M.; Quintero, Juan Carlos Vasquez

2016-01-01

In this paper, a novel control method was proposed for current source inverters under the grid-connected working mode. The control scheme is based on a modified droop control method, with an additional current reference signal that will be generated instead of the voltage reference. Hence......, there is only a current control loop with droop control in the whole control scheme without voltage control loop. So it is very suitable for grid-connected current source inverter which will simplify the design of the control scheme and combine the advantage of droop control. The parallel configuration...... is widely used to acquire high power demand, but the circulating current problem is a key issue that should be considered. In this paper, a simulation based on parallel current source inverters using the proposed control scheme is provided. Simulation results showed that a good circulating current...
Unstructured mesh adaptivity for urban flooding modelling

Science.gov (United States)

Hu, R.; Fang, F.; Salinas, P.; Pain, C. C.

2018-05-01

Over the past few decades, urban floods have been gaining more attention due to their increase in frequency. To provide reliable flooding predictions in urban areas, various numerical models have been developed to perform high-resolution flood simulations. However, the use of high-resolution meshes across the whole computational domain causes a high computational burden. In this paper, a 2D control-volume and finite-element flood model using adaptive unstructured mesh technology has been developed. This adaptive unstructured mesh technique enables meshes to be adapted optimally in time and space in response to the evolving flow features, thus providing sufficient mesh resolution where and when it is required. It has the advantage of capturing the details of local flows and wetting and drying front while reducing the computational cost. Complex topographic features are represented accurately during the flooding process. For example, the high-resolution meshes around the buildings and steep regions are placed when the flooding water reaches these regions. In this work a flooding event that happened in 2002 in Glasgow, Scotland, United Kingdom has been simulated to demonstrate the capability of the adaptive unstructured mesh flooding model. The simulations have been performed using both fixed and adaptive unstructured meshes, and then results have been compared with those published 2D and 3D results. The presented method shows that the 2D adaptive mesh model provides accurate results while having a low computational cost.
Parallelism, fractal geometry and other aspects of computational mathematics

International Nuclear Information System (INIS)

Churchhouse, R.F.

1991-01-01

In some fields such as meteorology, theoretical physics, quantum chemistry and hydrodynamics there are problems which involve so much computation that computers of the power of a thousand times a Cray 2 could be fully utilised if they were available. Since it is unlikely that uniprocessors of such power will be available, such large scale problems could be solved by using systems of computers running in parallel. This approach, of course, requires to find appropriate algorithms for the solution of such problems which can efficiently make use of a large number of computers working in parallel. 11 refs, 10 figs, 1 tab
Scalable air cathode microbial fuel cells using glass fiber separators, plastic mesh supporters, and graphite fiber brush anodes

KAUST Repository

Zhang, Xiaoyuan

2011-01-01

Brush anodes, glass fiber (GF1) separators, and plastic mesh supporters were combined here for the first time to create a scalable microbial fuel cell (MFC) architecture. Separators prevented short circuiting of closely-spaced electrodes, and cathode supporters were used to avoid water gaps between the separator and cathode that can reduce power production. The maximum power density with a separator, a supporter, and a single cathode was 75 ± 1 W/m³. Removing the separator decreased power by 8%. Adding a second cathode increased power to 154 ± 1 W/m³. Current was increased by connecting two MFCs in parallel. These results show that brush anodes, combined with a glass fiber separator and a plastic mesh supporter, produce a useful MFC architecture that is inherently scalable due to good insulation between the electrodes and a compact architecture. © 2010 Elsevier Ltd.
Cartesian Mesh Linearized Euler Equations Solver for Aeroacoustic Problems around Full Aircraft

Directory of Open Access Journals (Sweden)

Yuma Fukushima

2015-01-01

A linearized Euler equations (LEEs) solver for aeroacoustic problems has been developed on a block-structured Cartesian mesh to address complex geometry. Taking advantage of the benefits of the Cartesian mesh, we employ high-order schemes for spatial derivatives and for time integration. On the other hand, the difficulty of accommodating curved wall boundaries is addressed by the immersed boundary method. The resulting LEEs solver is robust to complex geometry and numerically efficient in a parallel environment. The accuracy and effectiveness of the present solver are validated by one-dimensional and three-dimensional test cases. Acoustic scattering around a sphere and noise propagation from the JT15D nacelle are computed. The results show good agreement with analytical, computational, and experimental results. Finally, noise propagation around fuselage-wing-nacelle configurations is computed as a practical example. The results show that the sound pressure level below the over-the-wing nacelle (OWN) configuration is much lower than that of the conventional DLR-F6 aircraft configuration due to the shielding effect of the OWN configuration.
Prototyping and Simulating Parallel, Distributed Computations with VISA

National Research Council Canada - National Science Library

Demeure, Isabelle M; Nutt, Gary J

1989-01-01

...] to support the design, prototyping, and simulation of parallel, distributed computations. In particular, VISA is meant to guide the choice of partitioning and communication strategies for such computations, based on their performance...
Introducing a distributed unstructured mesh into gyrokinetic particle-in-cell code, XGC

Science.gov (United States)

Yoon, Eisung; Shephard, Mark; Seol, E. Seegyoung; Kalyanaraman, Kaushik

2017-10-01

XGC has shown good scalability for large leadership supercomputers. The current production version uses a copy of the entire unstructured finite element mesh on every MPI rank. Although an obvious scalability issue if the mesh sizes are to be dramatically increased, the current approach is also not optimal with respect to data locality of particles and mesh information. To address these issues we have initiated the development of a distributed mesh PIC method. This approach directly addresses the base scalability issue with respect to mesh size and, through the use of a mesh entity centric view of the particle mesh relationship, provides opportunities to address data locality needs of many core and GPU supported heterogeneous systems. The parallel mesh PIC capabilities are being built on the Parallel Unstructured Mesh Infrastructure (PUMI). The presentation will first overview the form of mesh distribution used and indicate the structures and functions used to support the mesh, the particles and their interaction. Attention will then focus on the node-level optimizations being carried out to ensure performant operation of all PIC operations on the distributed mesh. Partnership for Edge Physics Simulation (EPSI) Grant No. DE-SC0008449 and Center for Extended Magnetohydrodynamic Modeling (CEMM) Grant No. DE-SC0006618.

A chimera grid scheme. [multiple overset body-conforming mesh system for finite difference adaptation to complex aircraft configurations

Science.gov (United States)

Steger, J. L.; Dougherty, F. C.; Benek, J. A.

1983-01-01

A mesh system composed of multiple overset body-conforming grids is described for adapting finite-difference procedures to complex aircraft configurations. In this so-called 'chimera mesh,' a major grid is generated about a main component of the configuration and overset minor grids are used to resolve all other features. Methods for connecting overset multiple grids and modifications of flow-simulation algorithms are discussed. Computational tests in two dimensions indicate that the use of multiple overset grids can simplify the task of grid generation without an adverse effect on flow-field algorithms and computer code complexity.
An automatic way of finding robust elimination trees for a multi-frontal sparse solver for radical 2D hierarchical meshes

KAUST Repository

AbouEisha, Hassan M.

2014-01-01

In this paper we present a dynamic programming algorithm for finding optimal elimination trees for the multi-frontal direct solver algorithm executed over two dimensional meshes with point singularities. The elimination tree found by the optimization algorithm results in a linear computational cost of sequential direct solver. Based on the optimal elimination tree found by the optimization algorithm we construct heuristic sequential multi-frontal direct solver algorithm resulting in a linear computational cost as well as heuristic parallel multi-frontal direct solver algorithm resulting in a logarithmic computational cost. The resulting parallel algorithm is implemented on NVIDIA CUDA GPU architecture based on our graph-grammar approach. © 2014 Springer-Verlag.
Parallel evolutionary computation in bioinformatics applications.

Science.gov (United States)

Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

2013-05-01

A large number of optimization problems within the field of Bioinformatics require methods able to handle its inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of its efficient execution on a wide range of parallel architectures. The proposed approach focuses on the easiness of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism related modules allows the user to easily configure its environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Parallel computation with molecular-motor-propelled agents in nanofabricated networks.

Science.gov (United States)

Nicolau, Dan V; Lard, Mercy; Korten, Till; van Delft, Falco C M J M; Persson, Malin; Bengtsson, Elina; Månsson, Alf; Diez, Stefan; Linke, Heiner; Nicolau, Dan V

2016-03-08

The combinatorial nature of many important mathematical problems, including nondeterministic-polynomial-time (NP)-complete problems, places a severe limitation on the problem size that can be solved with conventional, sequentially operating electronic computers. There have been significant efforts in conceiving parallel-computation approaches in the past, for example: DNA computation, quantum computation, and microfluidics-based computation. However, these approaches have not proven, so far, to be scalable and practical from a fabrication and operational perspective. Here, we report the foundations of an alternative parallel-computation system in which a given combinatorial problem is encoded into a graphical, modular network that is embedded in a nanofabricated planar device. Exploring the network in a parallel fashion using a large number of independent, molecular-motor-propelled agents then solves the mathematical problem. This approach uses orders of magnitude less energy than conventional computers, thus addressing issues related to power consumption and heat dissipation. We provide a proof-of-concept demonstration of such a device by solving, in a parallel fashion, the small instance {2, 5, 9} of the subset sum problem, which is a benchmark NP-complete problem. Finally, we discuss the technical advances necessary to make our system scalable with presently available technology.
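For the instance {2, 5, 9}, the set of answers that such a network has to reveal can be enumerated directly. The sketch below is a conventional sequential enumeration in Python, not a model of the molecular-motor device. It assumes that each agent's path amounts to choosing, for every number in the set, whether to add it or not, so the set of exits reached matches the set of achievable subset sums. The function name `reachable_sums` is hypothetical.

```python
def reachable_sums(numbers):
    """Return every total obtainable by summing some subset of `numbers`."""
    sums = {0}  # the empty subset
    for n in numbers:
        # Every existing total either skips n or adds it.
        sums |= {s + n for s in sums}
    return sorted(sums)


print(reachable_sums([2, 5, 9]))  # [0, 2, 5, 7, 9, 11, 14, 16]
```

The number of candidate subsets doubles with every added element, which is the combinatorial growth that motivates exploring all paths at once with many independent agents.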
Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

Science.gov (United States)

Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

2015-09-01

The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
Unstructured Adaptive Meshes: Bad for Your Memory?

Science.gov (United States)

Biswas, Rupak; Feng, Hui-Yu; VanderWijngaart, Rob

2003-01-01

This viewgraph presentation explores the need for a NASA Advanced Supercomputing (NAS) parallel benchmark for problems with irregular dynamical memory access. This benchmark is important and necessary because: 1) Problems with localized error source benefit from adaptive nonuniform meshes; 2) Certain machines perform poorly on such problems; 3) Parallel implementation may provide further performance improvement but is difficult. Some examples of problems which use irregular dynamical memory access include: 1) Heat transfer problem; 2) Heat source term; 3) Spectral element method; 4) Base functions; 5) Elemental discrete equations; 6) Global discrete equations. Nonconforming Mesh and Mortar Element Method are covered in greater detail in this presentation.
Curious parallels and curious connections--phylogenetic thinking in biology and historical linguistics.

Science.gov (United States)

Atkinson, Quentin D; Gray, Russell D

2005-08-01

In The Descent of Man (1871), Darwin observed "curious parallels" between the processes of biological and linguistic evolution. These parallels mean that evolutionary biologists and historical linguists seek answers to similar questions and face similar problems. As a result, the theory and methodology of the two disciplines have evolved in remarkably similar ways. In addition to Darwin's curious parallels of process, there are a number of equally curious parallels and connections between the development of methods in biology and historical linguistics. Here we briefly review the parallels between biological and linguistic evolution and contrast the historical development of phylogenetic methods in the two disciplines. We then look at a number of recent studies that have applied phylogenetic methods to language data and outline some current problems shared by the two fields.
A language for data-parallel and task parallel programming dedicated to multi-SIMD computers. Contributions to hydrodynamic simulation with lattice gases

International Nuclear Information System (INIS)

Pic, Marc Michel

1995-01-01

Parallel programming covers task-parallelism and data-parallelism. Many problems need both kinds of parallelism. Multi-SIMD computers allow a hierarchical approach to these parallelisms. The T++ language, based on C++, is dedicated to exploiting Multi-SIMD computers using a programming paradigm that extends array programming to task management. Our language introduces arrays of independent tasks to be executed separately (MIMD) on subsets of processors with identical behaviour (SIMD), in order to express the hierarchical inclusion of data-parallelism in task-parallelism. To manipulate tasks and data in a symmetrical way, we propose meta-operations which have the same behaviour on task arrays and on data arrays. We explain how to implement this language on our parallel computer SYMPHONIE in order to profit from the locally-shared memory, the hardware virtualization, and the multiplicity of communication networks. We also analyse a typical application of such an architecture. Finite element schemes for fluid mechanics need powerful parallel computers and require strong floating-point capabilities. Lattice gases are an alternative for such simulations. Boolean lattice gases are simple, stable, and modular, and need no floating-point computation, but include numerical noise. Boltzmann lattice gases offer high precision of computation, but need floating-point arithmetic and are only locally stable. We propose a new scheme, called multi-bit, which keeps the advantages of each boolean model to which it is applied, with high numerical precision and reduced noise. Experiments on viscosity, physical behaviour, noise reduction and spurious invariants are shown, and implementation techniques for parallel Multi-SIMD computers are detailed. (author) [fr
Passivity Enhancement in Renewable Energy Source Based Power Plant With Paralleled Grid-Connected VSIs

DEFF Research Database (Denmark)

Bai, Haofeng; Wang, Xiongfei; Blaabjerg, Frede

2017-01-01

Harmonic instability is threatening the operation of renewable energy based power plants where multiple grid-connected VSIs are connected in parallel. To analyze and improve the stability of the grid-connected VSIs, the real part of the output admittance of the VSIs is first investigated......-connected VSIs can improve the stability of the renewable power plant....
Monte Carlo charged-particle tracking and energy deposition on a Lagrangian mesh.

Science.gov (United States)

Yuan, J; Moses, G A; McKenty, P W

2005-10-01

A Monte Carlo algorithm for alpha particle tracking and energy deposition on a cylindrical computational mesh in a Lagrangian hydrodynamics code used for inertial confinement fusion (ICF) simulations is presented. The straight line approximation is used to follow propagation of "Monte Carlo particles" which represent collections of alpha particles generated from thermonuclear deuterium-tritium (DT) reactions. Energy deposition in the plasma is modeled by the continuous slowing down approximation. The scheme addresses various aspects arising in the coupling of Monte Carlo tracking with Lagrangian hydrodynamics; such as non-orthogonal severely distorted mesh cells, particle relocation on the moving mesh and particle relocation after rezoning. A comparison with the flux-limited multi-group diffusion transport method is presented for a polar direct drive target design for the National Ignition Facility. Simulations show the Monte Carlo transport method predicts about earlier ignition than predicted by the diffusion method, and generates higher hot spot temperature. Nearly linear speed-up is achieved for multi-processor parallel simulations.
Parallel computing techniques for rotorcraft aerodynamics

Science.gov (United States)

Ekici, Kivanc

The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, major modifications are made to the implicit operator, Lower-Upper Symmetric Gauss Seidel (LU-SGS), originally used in TURNS. Second, an inexact Newton method coupled with a Krylov subspace iterative method (Newton-Krylov method) is applied. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient implementation of Newton-Krylov methods in the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).
Tutorial: Parallel Computing of Simulation Models for Risk Analysis.

Science.gov (United States)

Reilly, Allison C; Staid, Andrea; Gao, Michael; Guikema, Seth D

2016-10-01

Simulation models are widely used in risk analysis to study the effects of uncertainties on outcomes of interest in complex problems. Often, these models are computationally complex and time consuming to run. This latter point may be at odds with time-sensitive evaluations or may limit the number of parameters that are considered. In this article, we give an introductory tutorial focused on parallelizing simulation code to better leverage modern computing hardware, enabling risk analysts to better utilize simulation-based methods for quantifying uncertainty in practice. This article is aimed primarily at risk analysts who use simulation methods but do not yet utilize parallelization to decrease the computational burden of these models. The discussion is focused on conceptual aspects of embarrassingly parallel computer code and software considerations. Two complementary examples are shown using the languages MATLAB and R. A brief discussion of hardware considerations is located in the Appendix. © 2016 Society for Risk Analysis.
Computer hardware fault administration

Science.gov (United States)

Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

2010-09-14

Computer hardware fault administration carried out in a parallel computer, where the parallel computer includes a plurality of compute nodes. The compute nodes are coupled for data communications by at least two independent data communications networks, where each data communications network includes data communications links connected to the compute nodes. Typical embodiments carry out hardware fault administration by identifying a location of a defective link in the first data communications network of the parallel computer and routing communications data around the defective link through the second data communications network of the parallel computer.
Small file aggregation in a parallel computing system

Science.gov (United States)

Faibish, Sorin; Bent, John M.; Tzelnic, Percy; Grider, Gary; Zhang, Jingwang

2014-09-02

Techniques are provided for small file aggregation in a parallel computing system. An exemplary method for storing a plurality of files generated by a plurality of processes in a parallel computing system comprises aggregating the plurality of files into a single aggregated file; and generating metadata for the single aggregated file. The metadata comprises an offset and a length of each of the plurality of files in the single aggregated file. The metadata can be used to unpack one or more of the files from the single aggregated file.
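The aggregation scheme described above, one combined file plus per-file offset and length metadata, can be sketched in a few lines. This is a minimal in-memory Python illustration under stated assumptions, not the patented method: the function names `aggregate` and `unpack`, the dictionary-based metadata layout, and the file names in the example are all hypothetical.

```python
def aggregate(files):
    """Concatenate small files into one blob.

    `files` maps a file name to its bytes. Returns the blob and a metadata
    dictionary mapping each name to its (offset, length) within the blob.
    """
    blob = bytearray()
    metadata = {}
    for name, data in files.items():
        metadata[name] = (len(blob), len(data))  # offset is the current end
        blob.extend(data)
    return bytes(blob), metadata


def unpack(blob, metadata, name):
    """Recover one original file from the blob using its metadata entry."""
    offset, length = metadata[name]
    return blob[offset:offset + length]


blob, meta = aggregate({"rank0.out": b"abc", "rank1.out": b"hello"})
assert unpack(blob, meta, "rank1.out") == b"hello"
```

Because each entry records both offset and length, any single file can be extracted without scanning or unpacking the others.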
Three-dimensional parallel edge-based finite element modeling of electromagnetic data with field redatuming

DEFF Research Database (Denmark)

Cai, Hongzhu; Čuma, Martin; Zhdanov, Michael

2015-01-01

This paper presents a parallelized version of the edge-based finite element method with a novel post-processing approach for numerical modeling of an electromagnetic field in complex media. The method uses an unstructured tetrahedral mesh which can reduce the number of degrees of freedom...... significantly. The linear system of finite element equations is solved using parallel direct solvers which are robust for ill-conditioned systems and efficient for multiple source electromagnetic (EM) modeling. We also introduce a novel approach to compute the scalar components of the electric field from...... the tangential components along each edge based on field redatuming. The method can produce a more accurate result as compared to conventional approach. We have applied the developed algorithm to compute the EM response for a typical 3D anisotropic geoelectrical model of the off-shore HC reservoir with complex...
Coarse mesh code development

Energy Technology Data Exchange (ETDEWEB)

Lieberoth, J.

1975-06-15

The numerical solution of the neutron diffusion equation plays a very important role in the analysis of nuclear reactors. A wide variety of numerical procedures has been proposed, and most of the frequently used methods are fundamentally based on the finite-difference approximation, in which the partial derivatives are approximated by finite differences. For complex geometries, typical of practical reactor problems, the computational accuracy of the finite-difference method is seriously affected by the size of the mesh width relative to the neutron diffusion length and by the heterogeneity of the medium. Thus, a very large number of mesh points is generally required to obtain a reasonably accurate approximate solution of the multi-dimensional diffusion equation. Since the computation time is approximately proportional to the number of mesh points, a detailed multidimensional analysis based on the conventional finite-difference method is still expensive even with modern large-scale computers. Accordingly, there is a strong incentive to develop alternatives that can reduce the number of mesh points and still retain accuracy. One of the promising alternatives is the finite element method, which consists of the expansion of the neutron flux by piecewise polynomials. One of the advantages of this procedure is its flexibility in selecting the locations of the mesh points and the degree of the expansion polynomial. The small number of mesh points of the coarse grid makes it possible to store the results of several of the most recent outer iterations and to calculate well-extrapolated values from them using convenient formalisms. This holds especially if only one energy distribution of fission neutrons is assumed for all fission processes in the reactor, because the whole information of an outer iteration is contained in a field of fission rates which has the size of all mesh points of the coarse grid.
Performing an allreduce operation on a plurality of compute nodes of a parallel computer

Science.gov (United States)

Faraj, Ahmad [Rochester, MN]

2012-04-17

Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
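The two stages described above can be simulated on plain Python lists to see why every core ends up with the full result. This is a sketch under stated assumptions rather than the disclosed implementation: the reduction is taken to be a sum, logical ring `c` is assumed to link core `c` of every node, and the in-ring exchange is replaced by a simple running total instead of real message passing. All function names are hypothetical.

```python
def ring_allreduce(values):
    """Simulate a sum allreduce over one logical ring.

    A running total is passed once around the ring, then the final total is
    handed to every member, so each member ends with the same value.
    """
    total = 0
    for v in values:
        total += v
    return [total] * len(values)


def two_stage_allreduce(contrib):
    """contrib[n][c] is the contribution of core c on compute node n."""
    n_nodes, n_cores = len(contrib), len(contrib[0])
    # Stage 1: one global allreduce per logical ring (core c of every node).
    rings = [ring_allreduce([contrib[n][c] for n in range(n_nodes)])
             for c in range(n_cores)]
    # Stage 2: on each node, a local allreduce over that node's ring results.
    result = []
    for n in range(n_nodes):
        local_total = sum(rings[c][n] for c in range(n_cores))
        result.append([local_total] * n_cores)
    return result


print(two_stage_allreduce([[1, 2], [3, 4], [5, 6]]))
# [[21, 21], [21, 21], [21, 21]]
```

Stage 1 leaves each core holding the sum over its own ring (9 and 12 in the example); stage 2 combines those partial sums within a node, so inter-node traffic is only needed in the first stage.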
Plane-wave electronic structure calculations on a parallel supercomputer

International Nuclear Information System (INIS)

Nelson, J.S.; Plimpton, S.J.; Sears, M.P.

1993-01-01

The development of iterative solutions of Schrödinger's equation in a plane-wave (pw) basis over the last several years has coincided with great advances in the computational power available for performing the calculations. These dual developments have enabled many new and interesting condensed matter phenomena to be studied from a first-principles approach. The authors present a detailed description of the implementation on a parallel supercomputer (hypercube) of the first-order equation-of-motion solution to Schrödinger's equation, using plane-wave basis functions and ab initio separable pseudopotentials. By distributing the plane waves across the processors of the hypercube, many of the computations can be performed in parallel, resulting in decreases in the overall computation time relative to conventional vector supercomputers. This partitioning also provides ample memory for large Fast Fourier Transform (FFT) meshes and the storage of plane-wave coefficients for many hundreds of energy bands. The usefulness of the parallel techniques is demonstrated by benchmark timings for both the FFTs and iterations of the self-consistent solution of Schrödinger's equation for different sized Si unit cells of up to 512 atoms.
Fast Parallel Computation of Polynomials Using Few Processors

DEFF Research Database (Denmark)

Valiant, Leslie G.; Skyum, Sven; Berkowitz, S.

1983-01-01

It is shown that any multivariate polynomial of degree $d$ that can be computed sequentially in $C$ steps can be computed in parallel in $O((\log d)(\log C + \log d))$ steps using only $(Cd)^{O(1)}$ processors....
Fast parallel computation of polynomials using few processors

DEFF Research Database (Denmark)

Valiant, Leslie; Skyum, Sven

1981-01-01

It is shown that any multivariate polynomial that can be computed sequentially in $C$ steps and has degree $d$ can be computed in parallel in $O((\log d)(\log C + \log d))$ steps using only $(Cd)^{O(1)}$ processors....

Parallel computing in enterprise modeling.

Energy Technology Data Exchange (ETDEWEB)

Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

2008-08-01

This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language, which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.
Basic design of parallel computational program for probabilistic structural analysis

International Nuclear Information System (INIS)

Kaji, Yoshiyuki; Arai, Taketoshi; Gu, Wenwei; Nakamura, Hitoshi

1999-06-01

In our laboratory, for 'development of damage evaluation method of structural brittle materials by microscopic fracture mechanics and probabilistic theory' (nuclear computational science cross-over research) we examine computational method related to super parallel computation system which is coupled with material strength theory based on microscopic fracture mechanics for latent cracks and continuum structural model to develop new structural reliability evaluation methods for ceramic structures. This technical report is the review results regarding probabilistic structural mechanics theory, basic terms of formula and program methods of parallel computation which are related to principal terms in basic design of computational mechanics program. (author)
Basic design of parallel computational program for probabilistic structural analysis

Energy Technology Data Exchange (ETDEWEB)

Kaji, Yoshiyuki; Arai, Taketoshi [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Gu, Wenwei; Nakamura, Hitoshi

1999-06-01

In our laboratory, for 'development of damage evaluation method of structural brittle materials by microscopic fracture mechanics and probabilistic theory' (nuclear computational science cross-over research) we examine computational method related to super parallel computation system which is coupled with material strength theory based on microscopic fracture mechanics for latent cracks and continuum structural model to develop new structural reliability evaluation methods for ceramic structures. This technical report is the review results regarding probabilistic structural mechanics theory, basic terms of formula and program methods of parallel computation which are related to principal terms in basic design of computational mechanics program. (author)
Enhanced Computer Aided Simulation of Meshing and Contact With Application for Spiral Bevel Gear Drives

National Research Council Canada - National Science Library

Litvin, F

1999-01-01

An integrated tooth contact analysis (TCA) computer program for the simulation of meshing and contact of gear drives that calculates transmission errors and shift of bearing contact for misaligned gear drives has been developed...
THE PLUTO CODE FOR ADAPTIVE MESH COMPUTATIONS IN ASTROPHYSICAL FLUID DYNAMICS

International Nuclear Information System (INIS)

Mignone, A.; Tzeferacos, P.; Zanni, C.; Bodo, G.; Van Straalen, B.; Colella, P.

2012-01-01

We present a description of the adaptive mesh refinement (AMR) implementation of the PLUTO code for solving the equations of classical and special relativistic magnetohydrodynamics (MHD and RMHD). The current release exploits, in addition to the static grid version of the code, the distributed infrastructure of the CHOMBO library for multidimensional parallel computations over block-structured, adaptively refined grids. We employ a conservative finite-volume approach where primary flow quantities are discretized at the cell center in a dimensionally unsplit fashion using the Corner Transport Upwind method. Time stepping relies on a characteristic tracing step where piecewise parabolic method, weighted essentially non-oscillatory, or slope-limited linear interpolation schemes can be handily adopted. A characteristic decomposition-free version of the scheme is also illustrated. The solenoidal condition of the magnetic field is enforced by augmenting the equations with a generalized Lagrange multiplier providing propagation and damping of divergence errors through a mixed hyperbolic/parabolic explicit cleaning step. Among the novel features, we describe an extension of the scheme to include non-ideal dissipative processes, such as viscosity, resistivity, and anisotropic thermal conduction without operator splitting. Finally, we illustrate an efficient treatment of point-local, potentially stiff source terms over hierarchical nested grids by taking advantage of the adaptivity in time. Several multidimensional benchmarks and applications to problems of astrophysical relevance assess the potentiality of the AMR version of PLUTO in resolving flow features separated by large spatial and temporal disparities.
Identifying failure in a tree network of a parallel computer

Science.gov (United States)

Archer, Charles J.; Pinnow, Kurt W.; Wallenfelt, Brian P.

2010-08-24

Methods, parallel computers, and products are provided for identifying failure in a tree network of a parallel computer. The parallel computer includes one or more processing sets including an I/O node and a plurality of compute nodes. For each processing set embodiments include selecting a set of test compute nodes, the test compute nodes being a subset of the compute nodes of the processing set; measuring the performance of the I/O node of the processing set; measuring the performance of the selected set of test compute nodes; calculating a current test value in dependence upon the measured performance of the I/O node of the processing set, the measured performance of the set of test compute nodes, and a predetermined value for I/O node performance; and comparing the current test value with a predetermined tree performance threshold. If the current test value is below the predetermined tree performance threshold, embodiments include selecting another set of test compute nodes. If the current test value is not below the predetermined tree performance threshold, embodiments include selecting from the test compute nodes one or more potential problem nodes and testing individually potential problem nodes and links to potential problem nodes.
Parallel-Vector Algorithm For Rapid Structural Analysis

Science.gov (United States)

Agarwal, Tarun R.; Nguyen, Duc T.; Storaasli, Olaf O.

1993-01-01

New algorithm developed to overcome deficiency of skyline storage scheme by use of variable-band storage scheme. Exploits both parallel and vector capabilities of modern high-performance computers. Gives engineers and designers opportunity to include more design variables and constraints during optimization of structures. Enables use of more refined finite-element meshes to obtain improved understanding of complex behaviors of aerospace structures leading to better, safer designs. Not only attractive for current supercomputers but also for next generation of shared-memory supercomputers.
Adaptive hybrid mesh refinement for multiphysics applications

International Nuclear Information System (INIS)

Khamayseh, Ahmed; Almeida, Valmor de

2007-01-01

The accuracy and convergence of computational solutions of mesh-based methods are strongly dependent on the quality of the mesh used. We have developed methods for optimizing meshes that are composed of elements of arbitrary polygonal and polyhedral type. We present in this research the development of r-h hybrid adaptive meshing technology tailored to application areas relevant to multi-physics modeling and simulation. Solution-based adaptation methods are used to reposition mesh nodes (r-adaptation) or to refine the mesh cells (h-adaptation) to minimize solution error. The numerical methods perform either the r-adaptive mesh optimization or the h-adaptive mesh refinement method on the initial isotropic or anisotropic meshes to equidistribute a weighted geometric and/or solution error function. We have successfully introduced r-h adaptivity to a least-squares method with spherical harmonics basis functions for the solution of the spherical shallow atmosphere model used in climate modeling. In addition, application of this technology also covers a wide range of disciplines in computational sciences, most notably time-dependent multi-physics, multi-scale modeling and simulation.
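The equidistribution idea behind r-adaptation can be illustrated in one dimension: nodes are repositioned so that every cell carries the same integral of a weight (error) function. The sketch below is a textbook-style illustration under stated assumptions, not the method of this record: `equidistribute` and `peaked_weight` are hypothetical names, the weight is assumed strictly positive, and the integral is approximated with the trapezoidal rule on a fine background grid.

```python
import math
from typing import Callable, List


def equidistribute(w: Callable[[float], float], a: float, b: float,
                   n_cells: int, n_samples: int = 2000) -> List[float]:
    """Place n_cells + 1 nodes on [a, b] so each cell holds an equal share of
    the integral of the positive weight function w."""
    # Fine uniform background grid used only to integrate the weight.
    xs = [a + (b - a) * i / n_samples for i in range(n_samples + 1)]
    ws = [w(x) for x in xs]
    # Cumulative trapezoidal integral of w from a to each background point.
    cum = [0.0]
    for i in range(n_samples):
        cum.append(cum[-1] + 0.5 * (ws[i] + ws[i + 1]) * (xs[i + 1] - xs[i]))
    total = cum[-1]
    nodes = [a]
    j = 0
    for k in range(1, n_cells):
        target = total * k / n_cells
        # Advance to the background cell that contains the target value.
        while cum[j + 1] < target:
            j += 1
        # Invert the cumulative integral by linear interpolation in that cell.
        t = (target - cum[j]) / (cum[j + 1] - cum[j])
        nodes.append(xs[j] + t * (xs[j + 1] - xs[j]))
    nodes.append(b)
    return nodes


def peaked_weight(x: float) -> float:
    """Hypothetical error indicator with a sharp feature at x = 0.5."""
    return 1.0 + 100.0 * math.exp(-((x - 0.5) / 0.05) ** 2)


if __name__ == "__main__":
    for x in equidistribute(peaked_weight, 0.0, 1.0, 10):
        print(f"{x:.4f}")
```

With a constant weight the nodes stay uniform; with `peaked_weight` they cluster around x = 0.5, which is the behaviour r-adaptation aims for without changing the number of cells.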
The Space-Time Conservative Schemes for Large-Scale, Time-Accurate Flow Simulations with Tetrahedral Meshes

Science.gov (United States)

Venkatachari, Balaji Shankar; Streett, Craig L.; Chang, Chau-Lyan; Friedlander, David J.; Wang, Xiao-Yen; Chang, Sin-Chung

2016-01-01

Despite decades of development of unstructured mesh methods, high-fidelity time-accurate simulations are still predominantly carried out on structured, or unstructured hexahedral meshes by using high-order finite-difference, weighted essentially non-oscillatory (WENO), or hybrid schemes formed by their combinations. In this work, the space-time conservation element solution element (CESE) method is used to simulate several flow problems including supersonic jet/shock interaction and its impact on launch vehicle acoustics, and direct numerical simulations of turbulent flows using tetrahedral meshes. This paper provides a status report for the continuing development of the space-time conservation element solution element (CESE) numerical and software framework under the Revolutionary Computational Aerosciences (RCA) project. Solution accuracy and large-scale parallel performance of the numerical framework is assessed with the goal of providing a viable paradigm for future high-fidelity flow physics simulations.
Computationally efficient implementation of combustion chemistry in parallel PDF calculations

International Nuclear Information System (INIS)

Lu Liuyan; Lantz, Steven R.; Ren Zhuyin; Pope, Stephen B.

2009-01-01

In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel
Parallel algorithms for computation of the manipulator inertia matrix

Science.gov (United States)

Amin-Javaheri, Masoud; Orin, David E.

1989-01-01

The development of an O(log₂ N) parallel algorithm for the manipulator inertia matrix is presented. It is based on the most efficient serial algorithm, which uses the composite rigid body method. Recursive doubling is used to reformulate the linear recurrence equations which are required to compute the diagonal elements of the matrix. It results in O(log₂ N) levels of computation. Computation of the off-diagonal elements involves N linear recurrences of varying size, and a new method, which avoids redundant computation of position and orientation transforms for the manipulator, is developed. The O(log₂ N) algorithm is presented in both equation and graphic forms which clearly show the parallelism inherent in the algorithm.
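Recursive doubling, the technique named in this abstract, can be shown on a generic first-order linear recurrence x_i = a_i * x_(i-1) + b_i. The sketch below is an illustration under stated assumptions, not the paper's formulation: the abstract does not give the actual recurrence, so scalar coefficients and the function names are hypothetical. Each step is treated as an affine map, and all prefix compositions are built in ceil(log₂ N) levels, where every update inside a level is independent and could run on its own processor.

```python
from typing import List, Tuple

AffineMap = Tuple[float, float]  # (a, b) represents x -> a * x + b


def compose(first: AffineMap, second: AffineMap) -> AffineMap:
    """Return the map that applies `first` and then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def recurrence_serial(a: List[float], b: List[float], x0: float) -> List[float]:
    """Reference solution: N dependent steps."""
    xs, x = [], x0
    for ai, bi in zip(a, b):
        x = ai * x + bi
        xs.append(x)
    return xs


def recurrence_by_doubling(a: List[float], b: List[float],
                           x0: float) -> Tuple[List[float], int]:
    """Solve the same recurrence in ceil(log2 N) levels of independent updates."""
    n = len(a)
    maps = list(zip(a, b))  # maps[i] takes x_(i-1) to x_i
    stride, levels = 1, 0
    while stride < n:
        # Every entry of the new list depends only on the old list, so the
        # whole level is one parallel step.
        maps = [compose(maps[i - stride], maps[i]) if i >= stride else maps[i]
                for i in range(n)]
        stride *= 2
        levels += 1
    # maps[i] now takes x0 directly to x_i.
    return [am * x0 + bm for am, bm in maps], levels


if __name__ == "__main__":
    a, b = [2, 3, 1, 2], [1, 0, 4, 1]
    print(recurrence_serial(a, b, 1))       # [3, 9, 13, 27]
    print(recurrence_by_doubling(a, b, 1))  # ([3, 9, 13, 27], 2)
```

The trade-off is extra total work (each level touches all N entries) in exchange for a dependency depth that grows only logarithmically with N.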
Fencing data transfers in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Blocksome, Michael A.; Mamidala, Amith R.

2015-06-02

Fencing data transfers in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI including data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task; the compute nodes coupled for data communications through the PAMI and through data communications resources including at least one segment of shared random access memory; including initiating execution through the PAMI of an ordered sequence of active SEND instructions for SEND data transfers between two endpoints, effecting deterministic SEND data transfers through a segment of shared memory; and executing through the PAMI, with no FENCE accounting for SEND data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all SEND instructions initiated prior to execution of the FENCE instruction for SEND data transfers between the two endpoints.
Parallel peak pruning for scalable SMP contour tree computation

Energy Technology Data Exchange (ETDEWEB)

Carr, Hamish A. [Univ. of Leeds (United Kingdom); Weber, Gunther H. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Davis, CA (United States); Sewell, Christopher M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Ahrens, James P. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

2017-03-09

As data sets grow to exascale, automated data analysis and visualisation are increasingly important, to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in architecture of high performance computing systems necessitate analysis algorithms to make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial, and founded on serial metaphors, which has limited the scalability of this form of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. In this paper, we report the first shared SMP algorithm for fully parallel contour tree computation, with formal guarantees of O(lg n lg t) parallel steps and O(n lg n) work, and implementations with up to 10x parallel speed up in OpenMP and up to 50x speed up in NVIDIA Thrust.
Flexible operation of parallel grid-connecting converters under unbalanced grid voltage

DEFF Research Database (Denmark)

Lu, Jinghang; Savaghebi, Mehdi; Guerrero, Josep M.

2017-01-01

-link voltage ripple, and overloading. Moreover, under grid voltage unbalance, the active power delivery ability is decreased due to the converter's current rating limitation. In this paper, a thorough study on the current limitation of the grid-connecting converter under grid voltage unbalance is conducted....... In addition, based on the principle that total output active power should be oscillation free, a coordinated control strategy is proposed for the parallel grid-connecting converters. The case study has been conducted to demonstrate the effectiveness of this proposed control strategy....
The Quick Measure of a Nurbs Surface Curvature for Accurate Triangular Meshing

Directory of Open Access Journals (Sweden)

Kniat Aleksander

2014-04-01

NURBS surfaces are the most widely used surfaces for three-dimensional models in CAD/CAE programs. When a model for FEM calculation is prepared with a CAD program, it must eventually be meshed. There are many algorithms for meshing planar regions. Some of them may be used for meshing surfaces, but it is necessary to take the curvature of the surface into consideration to avoid a poor-quality mesh. The mesh must be denser in the curved regions of the surface. In this paper, instead of analysing the surface curvature, a method is presented to assess how close a mesh triangle is to the surface to which its vertices belong. The distance between a mesh triangle and a parallel tangent plane through a point on the surface is the measure of the triangle quality. Finding the surface point whose projection is located inside the mesh triangle and which is the tangency point of the plane parallel to this triangle is an optimization problem. A mathematical description of the problem and the algorithm to find its solution are also presented in the paper.
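The quality measure described here can be approximated by brute force, which helps to make the geometry concrete. The sketch below is an assumption-laden stand-in, not the optimization algorithm of the paper: it samples a parametric surface on a grid, keeps the samples whose orthogonal projection falls inside the triangle, and returns the largest height above the triangle plane. For a smooth patch the height peaks in the interior exactly where the tangent plane is parallel to the triangle. The names `triangle_deviation` and `unit_sphere` are hypothetical, and a unit sphere replaces a real NURBS surface so that the result can be checked analytically.

```python
import math
from typing import Callable, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(p: Vec, q: Vec) -> Vec:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def dot(p: Vec, q: Vec) -> float:
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def cross(p: Vec, q: Vec) -> Vec:
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def triangle_deviation(surface: Callable[[float, float], Vec],
                       tri_uv: Sequence[Tuple[float, float]],
                       u_range: Tuple[float, float],
                       v_range: Tuple[float, float],
                       steps: int = 200) -> float:
    """Largest distance from the triangle plane to surface samples whose
    orthogonal projection lies inside the triangle."""
    a, b, c = (surface(u, v) for u, v in tri_uv)
    e0, e1 = sub(b, a), sub(c, a)
    n = cross(e0, e1)
    length = math.sqrt(dot(n, n))
    n = (n[0] / length, n[1] / length, n[2] / length)
    d00, d01, d11 = dot(e0, e0), dot(e0, e1), dot(e1, e1)
    det = d00 * d11 - d01 * d01
    best = 0.0
    for i in range(steps + 1):
        u = u_range[0] + (u_range[1] - u_range[0]) * i / steps
        for j in range(steps + 1):
            v = v_range[0] + (v_range[1] - v_range[0]) * j / steps
            r = sub(surface(u, v), a)
            height = dot(n, r)  # signed distance to the triangle plane
            # Barycentric coordinates of the projected point; e0 and e1 are
            # perpendicular to n, so dot(r, e) already ignores the height.
            d20, d21 = dot(r, e0), dot(r, e1)
            s = (d11 * d20 - d01 * d21) / det
            t = (d00 * d21 - d01 * d20) / det
            if s >= 0.0 and t >= 0.0 and s + t <= 1.0:
                best = max(best, abs(height))
    return best


def unit_sphere(theta: float, phi: float) -> Vec:
    """Test surface (an assumption standing in for a NURBS patch)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


if __name__ == "__main__":
    half_pi = math.pi / 2
    # Triangle with vertices (1,0,0), (0,1,0), (0,0,1) on the unit sphere.
    tri = [(half_pi, 0.0), (half_pi, half_pi), (0.0, 0.0)]
    d = triangle_deviation(unit_sphere, tri, (0.0, half_pi), (0.0, half_pi))
    print(d, 1.0 - 1.0 / math.sqrt(3.0))  # sampled value vs analytic value
```

For this sphere example the tangency point lies along the triangle normal, so the exact distance is 1 - 1/sqrt(3); a mesher could compare such a value with a tolerance to decide whether the triangle needs subdividing.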
Progress on H5Part: A Portable High Performance Parallel Data Interface for Electromagnetics Simulations

International Nuclear Information System (INIS)

Adelmann, Andreas; Gsell, Achim; Oswald, Benedikt; Schietinger, Thomas; Bethel, Wes; Shalf, John; Siegerist, Cristina; Stockinger, Kurt

2007-01-01

Significant problems facing all experimental and computational sciences arise from growing data size and complexity. Common to all these problems is the need to perform efficient data I/O on diverse computer architectures. In our scientific application, the largest parallel particle simulations generate vast quantities of six-dimensional data. Such a simulation run produces data for an aggregate data size up to several TB per run. Motivated by the need to address data I/O and access challenges, we have implemented H5Part, an open source data I/O API that simplifies the use of the Hierarchical Data Format v5 library (HDF5). HDF5 is an industry standard for high performance, cross-platform data storage and retrieval that runs on all contemporary architectures from large parallel supercomputers to laptops. H5Part, which is oriented to the needs of the particle physics and cosmology communities, provides support for parallel storage and retrieval of particles, structured and in the future unstructured meshes. In this paper, we describe recent work focusing on I/O support for particles and structured meshes and provide data showing performance on modern supercomputer architectures like the IBM POWER 5
An Introduction to Parallel Cluster Computing Using PVM for Computer Modeling and Simulation of Engineering Problems

International Nuclear Information System (INIS)

Spencer, VN

2001-01-01

An investigation has been conducted regarding the ability of clustered personal computers to improve the performance of executing software simulations for solving engineering problems. The power and utility of personal computers continues to grow exponentially through advances in computing capabilities such as newer microprocessors, advances in microchip technologies, electronic packaging, and cost effective gigabyte-size hard drive capacity. Many engineering problems require significant computing power. Therefore, the computation has to be done by high-performance computer systems that cost millions of dollars and need gigabytes of memory to complete the task. Alternately, it is feasible to provide adequate computing in the form of clustered personal computers. This method cuts the cost and size by linking (clustering) personal computers together across a network. Clusters also have the advantage that they can be used as stand-alone computers when they are not operating as a parallel computer. Parallel computing software to exploit clusters is available for computer operating systems like Unix, Windows NT, or Linux. This project concentrates on the use of Windows NT, and the Parallel Virtual Machine (PVM) system to solve an engineering dynamics problem in Fortran
Distributed parallel computing in stochastic modeling of groundwater systems.

Science.gov (United States)

Dong, Yanhui; Li, Guomin; Xu, Haizhen

2013-03-01

Stochastic modeling is a rapidly evolving, popular approach to the study of the uncertainty and heterogeneity of groundwater systems. However, the use of Monte Carlo-type simulations to solve practical groundwater problems often encounters computational bottlenecks that hinder the acquisition of meaningful results. To improve the computational efficiency, a system that combines stochastic model generation with MODFLOW-related programs and distributed parallel processing is investigated. The distributed computing framework, called the Java Parallel Processing Framework, is integrated into the system to allow the batch processing of stochastic models in distributed and parallel systems. As an example, the system is applied to the stochastic delineation of well capture zones in the Pinggu Basin in Beijing. Through the use of 50 processing threads on a cluster with 10 multicore nodes, the execution times of 500 realizations are reduced to 3% compared with those of a serial execution. Through this application, the system demonstrates its potential in solving difficult computational problems in practical stochastic modeling. © 2012, The Author(s). Groundwater © 2012, National Ground Water Association.
An object-oriented programming paradigm for parallelization of computational fluid dynamics

International Nuclear Information System (INIS)

Ohta, Takashi.

1997-03-01

We propose an object-oriented programming paradigm for parallelization of scientific computing programs, and show that the approach can be a very useful strategy. Generally, parallelization of scientific programs tends to be complicated and unportable due to the specific requirements of each parallel computer or compiler. In this paper, we show that the object-oriented programming design, which separates the parallel processing parts from the solver of the applications, can achieve the large improvement in the maintenance of the codes, as well as the high portability. We design the program for the two-dimensional Euler equations according to the paradigm, and evaluate the parallel performance on IBM SP2. (author)
Monte Carlo calculations on a parallel computer using MORSE-C.G

International Nuclear Information System (INIS)

Wood, J.

1995-01-01

The general purpose particle transport Monte Carlo code, MORSE-C.G., is implemented on a parallel computing transputer-based system having MIMD architecture. Example problems are solved which are representative of the three principal types of problem that can be solved by the original serial code, namely, fixed source, eigenvalue (k-eff) and time-dependent. The results from the parallelized version of the code are compared in tables with the serial code run on a mainframe serial computer, and with an independent, deterministic transport code. The performance of the parallel computer as the number of processors is varied is shown graphically. For the parallel strategy used, the loss of efficiency as the number of processors is increased is investigated. (author)

Shape space exploration of constrained meshes

KAUST Repository

Yang, Yongliang; Yang, Yijun; Pottmann, Helmut; Mitra, Niloy J.

2011-01-01

We present a general computational framework to locally characterize any shape space of meshes implicitly prescribed by a collection of non-linear constraints. We computationally access such manifolds, typically of high dimension and co-dimension, through first and second order approximants, namely tangent spaces and quadratically parameterized osculant surfaces. Exploration and navigation of desirable subspaces of the shape space with regard to application specific quality measures are enabled using approximants that are intrinsic to the underlying manifold and directly computable in the parameter space of the osculant surface. We demonstrate our framework on shape spaces of planar quad (PQ) meshes, where each mesh face is constrained to be (nearly) planar, and circular meshes, where each face has a circumcircle. We evaluate our framework for navigation and design exploration on a variety of inputs, while keeping context specific properties such as fairness, proximity to a reference surface, etc. © 2011 ACM.
Shape space exploration of constrained meshes

KAUST Repository

Yang, Yongliang

2011-12-12

We present a general computational framework to locally characterize any shape space of meshes implicitly prescribed by a collection of non-linear constraints. We computationally access such manifolds, typically of high dimension and co-dimension, through first and second order approximants, namely tangent spaces and quadratically parameterized osculant surfaces. Exploration and navigation of desirable subspaces of the shape space with regard to application specific quality measures are enabled using approximants that are intrinsic to the underlying manifold and directly computable in the parameter space of the osculant surface. We demonstrate our framework on shape spaces of planar quad (PQ) meshes, where each mesh face is constrained to be (nearly) planar, and circular meshes, where each face has a circumcircle. We evaluate our framework for navigation and design exploration on a variety of inputs, while keeping context specific properties such as fairness, proximity to a reference surface, etc. © 2011 ACM.
Parallel processing of two-dimensional Sn transport calculations

International Nuclear Information System (INIS)

Uematsu, M.

1997-01-01

A parallel processing method for the two-dimensional Sn transport code DOT3.5 has been developed to achieve a drastic reduction in computation time. In the proposed method, parallelization is achieved with angular domain decomposition and/or space domain decomposition. The calculational speed of parallel processing by angular domain decomposition is largely influenced by frequent communications between processing elements. To assess parallelization efficiency, sample problems with up to 32 x 32 spatial meshes were solved with a Sun workstation using the PVM message-passing library. As a result, parallel calculation using 16 processing elements, for example, was found to be nine times as fast as that with one processing element. As for parallel processing by geometry segmentation, the influence of processing element communications on computation time is small; however, discontinuity at the segment boundary degrades convergence speed. To accelerate the convergence, an alternate sweep of angular flux in conjunction with space domain decomposition and a two-step rescaling method consisting of segmentwise rescaling and ordinary pointwise rescaling have been developed. By applying the developed method, the number of iterations needed to obtain a converged flux solution was reduced by a factor of 2. As a result, parallel calculation using 16 processing elements was found to be 5.98 times as fast as the original DOT3.5 calculation.
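The speedups quoted in this abstract can be turned into parallel efficiencies with two standard formulas. The sketch below is a small worked calculation, not part of the cited work: the function names are hypothetical, efficiency is speedup divided by processor count, and the second function is the Karp-Flatt metric, which estimates the serial (non-parallelizable) fraction implied by a measured speedup. Treating the original DOT3.5 run as the one-processor baseline for the 5.98 figure is an assumption.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def karp_flatt_serial_fraction(speedup: float, processors: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric).

    e = (1/S - 1/p) / (1 - 1/p), valid for p > 1.
    """
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


if __name__ == "__main__":
    # Angular decomposition: 16 processing elements, nine times as fast.
    print(parallel_efficiency(9.0, 16))         # 0.5625
    print(karp_flatt_serial_fraction(9.0, 16))  # about 0.052
    # Space decomposition with rescaling: 16 processing elements, 5.98 times
    # as fast (baseline assumed to be the serial DOT3.5 run).
    print(parallel_efficiency(5.98, 16))        # about 0.37
```

Read this way, the angular decomposition reaches roughly 56% efficiency on 16 processing elements, and the space decomposition roughly 37%, consistent with the abstract's remarks about communication cost and slower convergence.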
Mesh-free Hamiltonian implementation of two dimensional Darwin model

Science.gov (United States)

Siddi, Lorenzo; Lapenta, Giovanni; Gibbon, Paul

2017-08-01

A new approach to Darwin or magnetoinductive plasma simulation is presented, which combines a mesh-free field solver with a robust time-integration scheme avoiding numerical divergence errors in the solenoidal field components. The mesh-free formulation employs an efficient parallel Barnes-Hut tree algorithm to speed up the computation of fields summed directly from the particles, avoiding the necessity of divergence cleaning procedures typically required by particle-in-cell methods. The time-integration scheme employs a Hamiltonian formulation of the Lorentz force, circumventing the development of violent numerical instabilities associated with time differentiation of the vector potential. It is shown that a semi-implicit scheme converges rapidly and is robust to further numerical instabilities which can develop from a dominant contribution of the vector potential to the canonical momenta. The model is validated by various static and dynamic benchmark tests, including a simulation of the Weibel-like filamentation instability in beam-plasma interactions.
Development of skeletal system for mesh-type ICRP reference adult phantoms

Science.gov (United States)

Yeom, Yeon Soo; Wang, Zhao Jun; Tat Nguyen, Thang; Kim, Han Sung; Choi, Chansoo; Han, Min Cheol; Kim, Chan Hyeong; Lee, Jai Ki; Chung, Beom Sun; Zankl, Maria; Petoussi-Henss, Nina; Bolch, Wesley E.; Lee, Choonsik

2016-10-01

The reference adult computational phantoms of the International Commission on Radiological Protection (ICRP) described in Publication 110 are voxel-type computational phantoms based on whole-body computed tomography (CT) images of adult male and female patients. The voxel resolutions of these phantoms are on the order of a few millimeters, and smaller tissues such as the eye lens, the skin, and the walls of some organs cannot be properly defined in the phantoms, resulting in limitations in dose coefficient calculations for weakly penetrating radiations. In order to address the limitations of the ICRP-110 phantoms, an ICRP Task Group has recently been formed and the voxel phantoms are now being converted to a high-quality mesh format. As a part of the conversion project, in the present study, the skeleton models, representing one of the most important and complex organs of the body, were constructed. The constructed skeleton models were then tested by calculating red bone marrow (RBM) and endosteum dose coefficients (DCs) for broad parallel beams of photons and electrons and comparing the calculated values with those of the original ICRP-110 phantoms. The results show that for the photon exposures, there is a generally good agreement in the DCs between the mesh-type phantoms and the original voxel-type ICRP-110 phantoms; that is, the dose discrepancies were less than 7% in all cases except for the 0.03 MeV cases, for which the maximum difference was 14%. On the other hand, for the electron exposures (⩽4 MeV), the DCs of the mesh-type phantoms deviate from those of the ICRP-110 phantoms by up to ~1600 times at 0.03 MeV, which is indeed due to the improvement of the skeletal anatomy of the developed skeleton mesh models.
Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems

KAUST Repository

Mudigere, Dheevatsa; Sridharan, Srinivas; Deshpande, Anand; Park, Jongsoo; Heinecke, Alexander; Smelyanskiy, Mikhail; Kaul, Bharat; Dubey, Pradeep; Kaushik, Dinesh; Keyes, David E.

2015-01-01

-grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working sets, unstructured memory accesses, and variable/limited amount of parallelism. This code, based on a domain
Parallel diffusion calculation for the PHAETON on-line multiprocessor computer

International Nuclear Information System (INIS)

Collart, J.M.; Fedon-Magnaud, C.; Lautard, J.J.

1987-04-01

The aim of the PHAETON project is the design of an on-line computer in order to increase the immediate knowledge of the main operating and safety parameters in power plants. A significant stage is the computation of the three-dimensional flux distribution. For cost and safety reasons, a computer based on a parallel microprocessor architecture has been studied. This paper presents a first approach to parallelized three-dimensional diffusion calculation. The computing software has been written and built into a four-processor demonstrator. We present the implementation in progress for the final equipment. 8 refs
THM-GTRF: New Spider meshes, New Hydra-TH runs

Energy Technology Data Exchange (ETDEWEB)

Bakosi, Jozsef [Los Alamos National Laboratory; Christon, Mark A. [Los Alamos National Laboratory; Francois, Marianne M. [Los Alamos National Laboratory; Lowrie, Robert B. [Los Alamos National Laboratory; Nourgaliev, Robert [Los Alamos National Laboratory

2012-06-20

Progress is reported on computational capabilities for the grid-to-rod-fretting (GTRF) problem of pressurized water reactors. Numeca's Hexpress/Hybrid mesh generator is demonstrated as an excellent alternative to generating computational meshes for complex flow geometries, such as in GTRF. Mesh assessment is carried out using standard industrial computational fluid dynamics practices. Hydra-TH, a simulation code developed at LANL for reactor thermal-hydraulics, is demonstrated on hybrid meshes, containing different element types. A series of new Hydra-TH calculations has been carried out collecting turbulence statistics. Preliminary results on the newly generated meshes are discussed; full analysis will be documented in the L3 milestone, THM.CFD.P5.05, Sept. 2012.
Parallel computation of nondeterministic algorithms in VLSI

Energy Technology Data Exchange (ETDEWEB)

Hortensius, P D

1987-01-01

This work examines parallel VLSI implementations of nondeterministic algorithms. It is demonstrated that conventional pseudorandom number generators are unsuitable for highly parallel applications. Efficient parallel pseudorandom sequence generation can be accomplished using certain classes of elementary one-dimensional cellular automata. The pseudorandom numbers appear in parallel on each clock cycle. Extensive study of the properties of these new pseudorandom number generators is made using standard empirical random number tests, cycle length tests, and implementation considerations. Furthermore, it is shown these particular cellular automata can form the basis of efficient VLSI architectures for computations involved in the Monte Carlo simulation of both the percolation and Ising models from statistical mechanics. Finally, a variation on a Built-In Self-Test technique based upon cellular automata is presented. These Cellular Automata-Logic-Block-Observation (CALBO) circuits improve upon conventional design for testability circuitry.
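To make the idea of a cellular-automaton random number generator concrete, here is a minimal sketch of an elementary one-dimensional cellular automaton in which every cell updates simultaneously and a full word of bits is read out on each step. The choice of rule 30, the periodic boundary, the register width and the function names are all assumptions for illustration; the abstract only says that "certain classes" of elementary automata are suitable and does not name them here.

```python
def ca_step(cells, rule=30):
    """Advance an elementary 1-D cellular automaton by one step.

    cells is a list of 0/1 values with periodic (wrap-around) boundaries.
    rule is the Wolfram rule number (0..255); rule 30 is an assumption.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right  # neighbourhood as a 3-bit number
        nxt.append((rule >> index) & 1)              # look up the new state in the rule
    return nxt


def ca_random_words(n_cells=16, n_words=4, rule=30):
    """Return n_words integers, one per step, read from all cells in parallel."""
    cells = [0] * n_cells
    cells[n_cells // 2] = 1  # single seeded cell (hypothetical seed choice)
    words = []
    for _ in range(n_words):
        cells = ca_step(cells, rule)
        words.append(int("".join(str(c) for c in cells), 2))
    return words


print(ca_random_words())
```

Each cell depends only on its two neighbours, which is what makes this kind of generator attractive for a regular VLSI layout: the update maps onto identical local logic blocks with no global wiring.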
Emerging Nanophotonic Applications Explored with Advanced Scientific Parallel Computing

Science.gov (United States)

Meng, Xiang

The domain of nanoscale optical science and technology is a combination of the classical world of electromagnetics and the quantum mechanical regime of atoms and molecules. Recent advancements in fabrication technology allow optical structures to be scaled down to nanoscale size or even to the atomic level, far smaller than the wavelength they are designed for. These nanostructures can have unique, controllable, and tunable optical properties, and their interactions with quantum materials can have important near-field and far-field optical response. Undoubtedly, these optical properties can have many important applications, ranging from efficient and tunable light sources, detectors, filters, modulators, and high-speed all-optical switches to next-generation classical and quantum computation and biophotonic medical sensors. This emerging research area of nanoscience, known as nanophotonics, is a highly interdisciplinary field requiring expertise in materials science, physics, electrical engineering, and scientific computing, modeling and simulation. It has also become an important research field for investigating the science and engineering of light-matter interactions that take place on wavelength and subwavelength scales where the nature of the nanostructured matter controls the interactions. In addition, fast advancements in computing capabilities, such as parallel computing, have also become a critical element for investigating advanced nanophotonic devices. This role has taken on even greater urgency with the scale-down of device dimensions, and the design of these devices requires extensive memory and extremely long core hours. Thus distributed computing platforms associated with parallel computing are required for faster design processes. Scientific parallel computing constructs mathematical models and quantitative analysis techniques, and uses the computing machines to analyze and solve otherwise intractable scientific challenges. In
Simulation and parallel connection of step-down piezoelectric transformers

International Nuclear Information System (INIS)

Thang, Vo Viet; Kim, In Sung; Jeong, Soon Jong; Kim, Min Soo; Song, Jae Sung

2012-01-01

Piezoelectric transformers have been used widely in electronic circuits due to advantages such as high efficiency, miniaturization and nonflammability; however, the output power has been limited. To overcome this drawback, some research has recently focused on connections between piezoelectric transformers. With such connected operation, the output power has been improved compared to single operation. Parallel operation of step-down piezoelectric transformers is presented in this paper. An important factor affecting the parallel operation of piezoelectric transformers was the resonance frequency, and a small difference in resonance frequencies was obtained with transformers having the same dimensions and fabrication processes. The piezoelectric transformers were found to operate in the first radial mode at a frequency of 68 kHz. An equivalent circuit was used to investigate parallel driving of piezoelectric transformers and then to compare the result with experimental observations. The electrical characteristics, including the output voltage, output power and efficiency, were measured at a matching resistive load. Effects of frequency on the step-down ratio and of the input voltage on the power properties in the simulation were similar to the experimental results. The output power of the parallel operation was 35 W at a load of 50 Ω and an input voltage of 100 V; the temperature rise was 30 °C and the efficiency was 88%.
A parallel adaptive finite difference algorithm for petroleum reservoir simulation

Energy Technology Data Exchange (ETDEWEB)

Hoang, Hai Minh

2005-07-01

Adaptive finite difference methods for problems arising in the simulation of flow in porous media are considered. Such methods have proven useful for overcoming limitations of computational resources and improving the resolution of the numerical solutions to a wide range of problems. Local refinement of the computational mesh where it is needed to improve the accuracy of solutions yields better solution resolution and represents a more efficient use of computational resources than is possible with traditional fixed-grid approaches. In this thesis, we propose a parallel adaptive cell-centered finite difference (PAFD) method for black-oil reservoir simulation models. This is an extension of the adaptive mesh refinement (AMR) methodology first developed by Berger and Oliger (1984) for hyperbolic problems. Our algorithm is fully adaptive in time and space through the use of subcycling, in which finer grids are advanced at smaller time steps than the coarser ones. When coarse and fine grids reach the same advanced time level, they are synchronized to ensure that the global solution is conservative and satisfies the divergence constraint across all levels of refinement. The material in this thesis is subdivided into three overall parts. First, we explain the methodology and intricacies of the AFD scheme. Then we extend a cell-centered finite difference discretization to a multilevel hierarchy of refined grids, and finally we implement the algorithm on a parallel computer. The results in this work show that the approach presented is robust and stable, and they demonstrate the increased solution accuracy due to local refinement and the reduced consumption of computing resources. (Author)
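The subcycling described in this abstract, where finer grids take smaller time steps and are synchronized with the coarser grid once both reach the same time level, is commonly written as a recursion over refinement levels. The sketch below only records the order of operations; it does not solve any equations. The refinement ratio of 2, the function name and the event format are assumptions and are not taken from the thesis.

```python
def advance(level, t, dt, n_levels, ratio, events):
    """Advance one refinement level by dt, subcycling all finer levels.

    events collects ("advance", level, t, dt) and ("sync", level, t_new) tuples
    so the ordering of the time steps can be inspected.
    """
    events.append(("advance", level, t, dt))
    if level + 1 < n_levels:
        fine_dt = dt / ratio
        for k in range(ratio):
            # The finer level takes `ratio` smaller steps to cover the same interval.
            advance(level + 1, t + k * fine_dt, fine_dt, n_levels, ratio, events)
        # Coarse and fine grids have reached the same time level: synchronize here.
        events.append(("sync", level, t + dt))
    return events


# Two levels, refinement ratio 2 (assumed), one coarse step of size 1.0.
for event in advance(0, 0.0, 1.0, n_levels=2, ratio=2, events=[]):
    print(event)
```

In a real solver the "sync" step is where the coarse solution would be corrected from the fine one so that the combined solution stays conservative, as the abstract requires.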
Implementation of PHENIX trigger algorithms on massively parallel computers

International Nuclear Information System (INIS)

Petridis, A.N.; Wohn, F.K.

1995-01-01

The event selection requirements of contemporary high energy and nuclear physics experiments are met by the introduction of on-line trigger algorithms which identify potentially interesting events and reduce the data acquisition rate to levels that are manageable by the electronics. Such algorithms being parallel in nature can be simulated off-line using massively parallel computers. The PHENIX experiment intends to investigate the possible existence of a new phase of matter called the quark gluon plasma which has been theorized to have existed in very early stages of the evolution of the universe by studying collisions of heavy nuclei at ultra-relativistic energies. Such interactions can also reveal important information regarding the structure of the nucleus and mandate a thorough investigation of the simpler proton-nucleus collisions at the same energies. The complexity of PHENIX events and the need to analyze and also simulate them at rates similar to the data collection ones imposes enormous computation demands. This work is a first effort to implement PHENIX trigger algorithms on parallel computers and to study the feasibility of using such machines to run the complex programs necessary for the simulation of the PHENIX detector response. Fine and coarse grain approaches have been studied and evaluated. Depending on the application the performance of a massively parallel computer can be much better or much worse than that of a serial workstation. A comparison between single instruction and multiple instruction computers is also made and possible applications of the single instruction machines to high energy and nuclear physics experiments are outlined. copyright 1995 American Institute of Physics
Parallel computing for homogeneous diffusion and transport equations in neutronics

International Nuclear Information System (INIS)

Pinchedez, K.

1999-06-01

Parallel computing meets the ever-increasing requirements for neutronic computer code speed and accuracy. In this work, two different approaches have been considered. We first parallelized the sequential algorithm used by the neutronics code CRONOS developed at the French Atomic Energy Commission. The algorithm computes the dominant eigenvalue associated with PN simplified transport equations by a mixed finite element method. Several parallel algorithms have been developed on distributed memory machines. The performances of the parallel algorithms have been studied experimentally by implementation on a T3D Cray and theoretically by complexity models. A comparison of various parallel algorithms has confirmed the chosen implementations. We next applied a domain sub-division technique to the two-group diffusion Eigen problem. In the modal synthesis-based method, the global spectrum is determined from the partial spectra associated with sub-domains. Then the Eigen problem is expanded on a family composed, on the one hand, from eigenfunctions associated with the sub-domains and, on the other hand, from functions corresponding to the contribution from the interface between the sub-domains. For a 2-D homogeneous core, this modal method has been validated and its accuracy has been measured. (author)
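The abstract refers to computing a dominant eigenvalue but does not say which iteration is used. As an illustration only, the sketch below uses plain power iteration, the textbook method for a dominant eigenvalue, on a small dense symmetric matrix. The function name, tolerance and test matrix are hypothetical, and nothing here reproduces the CRONOS algorithm or its mixed finite element discretization.

```python
import math


def power_iteration(matrix, n_iter=500, tol=1e-12):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix.

    Uses the Rayleigh quotient of the normalized iterate as the estimate.
    """
    n = len(matrix)
    x = [1.0 / math.sqrt(n)] * n  # unit-norm starting vector
    eigenvalue = 0.0
    for _ in range(n_iter):
        # Matrix-vector product y = A x; in a parallel code the rows would be distributed.
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        new_eigenvalue = sum(xi * yi for xi, yi in zip(x, y))  # x has unit norm
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            raise ValueError("iterate collapsed to the zero vector")
        x = [v / norm for v in y]
        if abs(new_eigenvalue - eigenvalue) < tol:
            return new_eigenvalue, x
        eigenvalue = new_eigenvalue
    return eigenvalue, x


value, vector = power_iteration([[2.0, 1.0], [1.0, 3.0]])
print(value)
```

The matrix-vector product is the dominant cost per iteration and is the natural place to distribute work across processors, while the norm and the Rayleigh quotient require global reductions.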
AUTOMATIC MESH GENERATION OF 3-D GEOMETRIC MODELS

Institute of Scientific and Technical Information of China (English)

刘剑飞

2003-01-01

In this paper the presentation of the ball-packing method is reviewed, and a scheme to generate meshes for complex 3-D geometric models is given, which consists of 4 steps: (1) create nodes in 3-D models by the ball-packing method, (2) connect nodes to generate a mesh by 3-D Delaunay triangulation, (3) retrieve the boundary of the model after Delaunay triangulation, (4) improve the mesh.
RAMA: A file system for massively parallel computers

Science.gov (United States)

Miller, Ethan L.; Katz, Randy H.

1993-01-01

This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated into the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
Performance modeling of parallel algorithms for solving neutron diffusion problems

International Nuclear Information System (INIS)

Azmy, Y.Y.; Kirk, B.L.

1995-01-01

Neutron diffusion calculations are the most common computational methods used in the design, analysis, and operation of nuclear reactors and related activities. Here, mathematical performance models are developed for the parallel algorithm used to solve the neutron diffusion equation on message passing and shared memory multiprocessors represented by the Intel iPSC/860 and the Sequent Balance 8000, respectively. The performance models are validated through several test problems, and these models are used to estimate the performance of each of the two considered architectures in situations typical of practical applications, such as fine meshes and a large number of participating processors. While message passing computers are capable of producing speedup, the parallel efficiency deteriorates rapidly as the number of processors increases. Furthermore, the speedup fails to improve appreciably for massively parallel computers so that only small- to medium-sized message passing multiprocessors offer a reasonable platform for this algorithm. In contrast, the performance model for the shared memory architecture predicts very high efficiency over a wide range of number of processors reasonable for this architecture. Furthermore, the model efficiency of the Sequent remains superior to that of the hypercube if its model parameters are adjusted to make its processors as fast as those of the iPSC/860. It is concluded that shared memory computers are better suited for this parallel algorithm than message passing computers
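A performance model of the kind this abstract describes typically splits the run time into a compute term that shrinks with the processor count and a communication term that does not. The sketch below is a deliberately simple stand-in, not the authors' model: the logarithmic communication term (chosen because a global exchange on a hypercube takes about log2(p) stages), the parameter values and all names are assumptions.

```python
import math


def predicted_time(n_cells, p, t_cell=1e-6, t_msg=1e-4):
    """Hypothetical run-time model: perfectly divided compute plus communication."""
    compute = n_cells * t_cell / p
    communicate = t_msg * math.log2(p) if p > 1 else 0.0  # assumed log2(p) stages
    return compute + communicate


def predicted_efficiency(n_cells, p, **params):
    """Parallel efficiency predicted by the model: T(1) / (p * T(p))."""
    return predicted_time(n_cells, 1, **params) / (p * predicted_time(n_cells, p, **params))


for p in (1, 4, 16, 64):
    print(p, round(predicted_efficiency(10_000, p), 3))
```

Even this crude model reproduces the qualitative trend reported for the message passing machine: for a fixed problem size, efficiency drops as processors are added because the communication term stops shrinking.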
Algorithms and data structures for massively parallel generic adaptive finite element codes

KAUST Repository

Bangerth, Wolfgang

2011-12-01

Today's largest supercomputers have 100,000s of processor cores and offer the potential to solve partial differential equations discretized by billions of unknowns. However, the complexity of scaling to such large machines and problem sizes has so far prevented the emergence of generic software libraries that support such computations, although these would lower the threshold of entry and enable many more applications to benefit from large-scale computing. We are concerned with providing this functionality for mesh-adaptive finite element computations. We assume the existence of an "oracle" that implements the generation and modification of an adaptive mesh distributed across many processors, and that responds to queries about its structure. Based on querying the oracle, we develop scalable algorithms and data structures for generic finite element methods. Specifically, we consider the parallel distribution of mesh data, global enumeration of degrees of freedom, constraints, and postprocessing. Our algorithms remove the bottlenecks that typically limit large-scale adaptive finite element analyses. We demonstrate scalability of complete finite element workflows on up to 16,384 processors. An implementation of the proposed algorithms, based on the open source software p4est as mesh oracle, is provided under an open source license through the widely used deal.II finite element software library. © 2011 ACM 0098-3500/2011/12-ART10 $10.00.
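One common way to enumerate degrees of freedom globally without any process holding the whole index set is to let each process count the degrees of freedom it owns and then shift its local numbering by the sum of the counts on all lower-ranked processes (an exclusive prefix sum, which MPI provides as `MPI_Exscan`). The sketch below simulates that step serially. It is an assumption about how such an enumeration can be organized, not a description of the paper's actual algorithm, and the function name is hypothetical.

```python
from itertools import accumulate


def global_dof_ranges(owned_counts):
    """Given the number of owned DoFs per process, return each process's global index range."""
    # Exclusive prefix sum: process r starts after all DoFs owned by ranks 0..r-1.
    offsets = [0] + list(accumulate(owned_counts))[:-1]
    return [range(start, start + count) for start, count in zip(offsets, owned_counts)]


# Three simulated processes owning 3, 0 and 2 degrees of freedom.
for rank, dofs in enumerate(global_dof_ranges([3, 0, 2])):
    print(rank, list(dofs))
```

Each process needs only one integer from the others, so the memory and communication cost per process does not grow with the global problem size.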
Polyhedral meshing as an innovative approach to computational domain discretization of a cyclone in a fluidized bed CLC unit

Directory of Open Access Journals (Sweden)

Sosnowski Marcin

2017-01-01

Chemical Looping Combustion (CLC) is a technology that allows the separation of CO2, which is generated by the combustion of fossil fuels. The majority of process designs currently under investigation are systems of coupled fluidized beds. Advances in the development of power generation systems using CLC cannot be introduced without using numerical modelling as a research tool. The primary and critical activity in numerical modelling is the computational domain discretization. It influences the numerical diffusion as well as the convergence of the model and therefore the overall accuracy of the obtained results. Hence an innovative approach to computational domain discretization using a polyhedral (POLY) mesh is proposed in the paper. This method reduces both the numerical diffusion of the mesh and the time cost of preparing the model for subsequent calculation. The major advantage of the POLY mesh is that each individual cell has many neighbours, so gradients can be much better approximated in comparison to the commonly used tetrahedral (TET) mesh. POLYs are also less sensitive to stretching than TETs, which results in better numerical stability of the model. Therefore a detailed comparison of numerical modelling results for a subsection of the CLC system using tetrahedral and polyhedral meshes is covered in the paper.
Modeling and design of a multivariable control system for multi-paralleled grid-connected inverters with LCL filter

DEFF Research Database (Denmark)

Akhavan, Ali; Mohammadi, Hamid Reza; Guerrero, Josep M.

2018-01-01

The quality of injected current in multi-paralleled grid-connected inverters is a matter of concern. The current controlled grid-connected inverters with LCL filter are widely used in the distributed generation (DG) systems due to their fast dynamic response and better power features. However...... with resonances in the system, damping methods such as passive or active damping is necessary. Secondly and perhaps more importantly, paralleled grid-connected inverters in a microgrid are coupled due to grid impedance. Generally, the coupling effect is not taken into account when designing the control systems...

Massive parallel 3D PIC simulation of negative ion extraction

Science.gov (United States)

Revel, Adrien; Mochalskyy, Serhiy; Montellano, Ivar Mauricio; Wünderlich, Dirk; Fantz, Ursel; Minea, Tiberiu

2017-09-01

The 3D PIC-MCC code ONIX is dedicated to modeling Negative hydrogen/deuterium Ion (NI) extraction and co-extraction of electrons from radio-frequency driven, low pressure plasma sources. It provides valuable insight on the complex phenomena involved in the extraction process. In previous calculations, a mesh size larger than the Debye length was used, implying numerical electron heating. Important steps have been achieved in terms of computation performance and parallelization efficiency allowing successful massive parallel calculations (4096 cores), imperative to resolve the Debye length. In addition, the numerical algorithms have been improved in terms of grid treatment, i.e., the electric field near the complex geometry boundaries (plasma grid) is calculated more accurately. The revised model preserves the full 3D treatment, but can take advantage of a highly refined mesh. ONIX was used to investigate the role of the mesh size, the re-injection scheme for lost particles (extracted or wall absorbed), and the electron thermalization process on the calculated extracted current and plasma characteristics. It is demonstrated that all numerical schemes give the same NI current distribution for extracted ions. Concerning the electrons, the pair-injection technique is found well-adapted to simulate the sheath in front of the plasma grid.
A method of paralleling computer calculation for two-dimensional kinetic plasma model

International Nuclear Information System (INIS)

Brazhnik, V.A.; Demchenko, V.V.; Dem'yanov, V.G.; D'yakov, V.E.; Ol'shanskij, V.V.; Panchenko, V.I.

1987-01-01

A method for parallel computer calculation is described, together with the OSIRIS program complex that implements it and is designed for numerical plasma simulation by the macroparticle method. The calculation can be carried out on either one or two BESM-6 computers simultaneously, which is provided by a package of interacting programs running on each computer. Program interaction on each computer is based on event techniques implemented in OS DISPAK. Parallel calculation with two BESM-6 computers accelerates the computation by a factor of 1.5.
Performance of Air Pollution Models on Massively Parallel Computers

DEFF Research Database (Denmark)

Brown, John; Hansen, Per Christian; Wasniewski, Jerzy

1996-01-01

To compare the performance and use of three massively parallel SIMD computers, we implemented a large air pollution model on the computers. Using a realistic large-scale model, we gain detailed insight about the performance of the three computers when used to solve large-scale scientific problems...
A Parallel Computational Model for Multichannel Phase Unwrapping Problem

Science.gov (United States)

Imperatore, Pasquale; Pepe, Antonio; Lanari, Riccardo

2015-05-01

In this paper, a parallel model for the solution of the computationally intensive multichannel phase unwrapping (MCh-PhU) problem is proposed. Firstly, the Extended Minimum Cost Flow (EMCF) algorithm for solving the MCh-PhU problem is revised within the rigorous mathematical framework of the discrete calculus, which makes it possible to capture its topological structure in terms of meaningful discrete differential operators. Secondly, emphasis is placed on those methodological and practical aspects which lead to a parallel reformulation of the EMCF algorithm. Thus, a novel dual-level parallel computational model, in which the parallelism is hierarchically implemented at two different (i.e., process and thread) levels, is presented. The validity of our approach has been demonstrated through a series of experiments that have revealed a significant speedup. Therefore, the attained high-performance prototype is suitable for the solution of large-scale phase unwrapping problems in reasonable time frames, with a significant impact on the systematic exploitation of the existing, and rapidly growing, large archives of SAR data.
MEDUSA - An overset grid flow solver for network-based parallel computer systems

Science.gov (United States)

Smith, Merritt H.; Pallis, Jani M.

1993-01-01

Continuing improvement in processing speed has made it feasible to solve the Reynolds-Averaged Navier-Stokes equations for simple three-dimensional flows on advanced workstations. Combining multiple workstations into a network-based heterogeneous parallel computer allows the application of programming principles learned on MIMD (Multiple Instruction Multiple Data) distributed memory parallel computers to the solution of larger problems. An overset-grid flow solution code has been developed which uses a cluster of workstations as a network-based parallel computer. Inter-process communication is provided by the Parallel Virtual Machine (PVM) software. Solution speed equivalent to one-third of a Cray-YMP processor has been achieved from a cluster of nine commonly used engineering workstation processors. Load imbalance and communication overhead are the principal impediments to parallel efficiency in this application.
Parallel Resolved Open Source CFD-DEM: Method, Validation and Application

Directory of Open Access Journals (Sweden)

A. Hager

2014-03-01

In the following paper the authors present a fully parallelized Open Source method for calculating the interaction of immersed bodies and surrounding fluid. A combination of computational fluid dynamics (CFD) and a discrete element method (DEM) accounts for the physics of both the fluid and the particles. The objects considered are relatively big compared to the cells of the fluid mesh, i.e. they cover several cells each. Thus this fictitious domain method (FDM) is called resolved. The implementation is realized within the Open Source framework CFDEMcOupling (www.cfdem.com), which provides an interface between OpenFOAM® based CFD-solvers and the DEM software LIGGGHTS (www.liggghts.com). While both LIGGGHTS and OpenFOAM® were already parallelized, only a recent improvement of the algorithm permits the fully parallel computation of resolved problems. Along with a detailed description of the method, its implementation and recent improvements, a number of application and validation examples are presented in the scope of this paper.
Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2016-03-15

Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
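The wait-and-awaken pattern in this abstract, where an advance function with nothing to do parks its thread and is awakened when an event arrives for its context, can be sketched with a condition variable. The class and method names below are hypothetical and this is not the PAMI interface; it only illustrates the control flow using Python's standard `threading` module.

```python
import threading
from collections import deque


class Context:
    """Toy communication context with a queue of pending events."""

    def __init__(self):
        self._events = deque()
        self._cond = threading.Condition()
        self.processed = []

    def post(self, event):
        """Deliver an event to the context and wake a waiting advance thread."""
        with self._cond:
            self._events.append(event)
            self._cond.notify()

    def advance(self):
        """Process one event, waiting if no actionable event is pending."""
        with self._cond:
            while not self._events:  # nothing pending: enter the wait state
                self._cond.wait()
            event = self._events.popleft()
        self.processed.append(event)  # handle the event outside the lock
        return event


def demo():
    ctx = Context()
    worker = threading.Thread(target=ctx.advance)
    worker.start()               # the advance thread may block waiting for an event
    ctx.post("message-arrived")  # a subsequent event awakens it
    worker.join(timeout=5.0)
    return ctx.processed


print(demo())
```

The `while` loop around `wait()` makes the sketch correct whether the event is posted before or after the advance thread starts waiting.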
A software platform for continuum modeling of ion channels based on unstructured mesh

International Nuclear Information System (INIS)

Tu, B; Bai, S Y; Xie, Y; Zhang, L B; Lu, B Z; Chen, M X

2014-01-01

Most traditional continuum molecular modeling adopted finite difference or finite volume methods which were based on a structured mesh (grid). Unstructured meshes were only occasionally used, but an increasing number of applications are emerging in molecular simulations. To facilitate the continuum modeling of biomolecular systems based on unstructured meshes, we are developing a software platform with tools which are particularly beneficial to those approaches. This work describes the software system specifically for the simulation of a typical, complex molecular procedure: ion transport through a three-dimensional channel system that consists of a protein and a membrane. The platform contains three parts: a meshing tool chain for ion channel systems, a parallel finite element solver for the Poisson–Nernst–Planck equations describing the electrodiffusion process of ion transport, and a visualization program for continuum molecular modeling. The meshing tool chain in the platform, which consists of a set of mesh generation tools, is able to generate high-quality surface and volume meshes for ion channel systems. The parallel finite element solver in our platform is based on the parallel adaptive finite element package PHG which was developed by one of the authors [1]. As a featured component of the platform, a new visualization program, VCMM, has specifically been developed for continuum molecular modeling with an emphasis on providing useful facilities for unstructured mesh-based methods and for their output analysis and visualization. VCMM provides a graphic user interface and consists of three modules: a molecular module, a meshing module and a numerical module. A demonstration of the platform is provided with a study of two real proteins, the connexin 26 and hemolysin ion channels. (paper)
Time complexity analysis for distributed memory computers: implementation of parallel conjugate gradient method

NARCIS (Netherlands)

Hoekstra, A.G.; Sloot, P.M.A.; Haan, M.J.; Hertzberger, L.O.; van Leeuwen, J.

1991-01-01

New developments in Computer Science, both hardware and software, offer researchers, such as physicists, unprecedented possibilities to solve their computationally intensive problems. However, full exploitation of e.g. new massively parallel computers, parallel languages or runtime environments
A 3D gyrokinetic particle-in-cell simulation of fusion plasma microturbulence on parallel computers

Science.gov (United States)

Williams, T. J.

1992-12-01

One of the grand challenge problems now supported by HPCC is the Numerical Tokamak Project. A goal of this project is the study of low-frequency micro-instabilities in tokamak plasmas, which are believed to cause energy loss via turbulent thermal transport across the magnetic field lines. An important tool in this study is gyrokinetic particle-in-cell (PIC) simulation. Gyrokinetic, as opposed to fully-kinetic, methods are particularly well suited to the task because they are optimized to study the frequency and wavelength domain of the microinstabilities. Furthermore, many researchers now employ low-noise delta(f) methods to greatly reduce statistical noise by modelling only the perturbation of the gyrokinetic distribution function from a fixed background, not the entire distribution function. In spite of the increased efficiency of these improved algorithms over conventional PIC algorithms, gyrokinetic PIC simulations of tokamak micro-turbulence are still highly demanding of computer power--even fully-vectorized codes on vector supercomputers. For this reason, we have worked for several years to redevelop these codes on massively parallel computers. We have developed 3D gyrokinetic PIC simulation codes for SIMD and MIMD parallel processors, using control-parallel, data-parallel, and domain-decomposition message-passing (DDMP) programming paradigms. This poster summarizes our earlier work on codes for the Connection Machine and BBN TC2000 and our development of a generic DDMP code for distributed-memory parallel machines. We discuss the memory-access issues which are of key importance in writing parallel PIC codes, with special emphasis on issues peculiar to gyrokinetic PIC. We outline the domain decompositions in our new DDMP code and discuss the interplay of different domain decompositions suited for the particle-pushing and field-solution components of the PIC algorithm.
Parallel algorithms and architectures for computational structural mechanics

Science.gov (United States)

Patrick, Merrell; Ma, Shing; Mahajan, Umesh

1989-01-01

The determination of the fundamental (lowest) natural vibration frequencies and associated mode shapes is a key step used to uncover and correct potential failures or problem areas in most complex structures. However, the computation time taken by finite element codes to evaluate these natural frequencies is significant, often the most computationally intensive part of structural analysis calculations. There is continuing need to reduce this computation time. This study addresses this need by developing methods for parallel computation.
Iteration schemes for parallelizing models of superconductivity

Energy Technology Data Exchange (ETDEWEB)

Gray, P.A. [Michigan State Univ., East Lansing, MI (United States)]

1996-12-31

The time dependent Lawrence-Doniach model, valid for high fields and high values of the Ginzburg-Landau parameter, is often used for studying vortex dynamics in layered high-$T_c$ superconductors. When solving these equations numerically, the added degrees of complexity due to the coupling and nonlinearity of the model often warrant the use of high-performance computers for their solution. However, the interdependence between the layers can be manipulated so as to allow parallelization of the computations at an individual layer level. The reduced parallel tasks may then be solved independently using a heterogeneous cluster of networked workstations connected together with Parallel Virtual Machine (PVM) software. Here, this parallelization of the model is discussed and several computational implementations of varying degrees of parallelism are presented. Computational results are also given which contrast properties of convergence speed, stability, and consistency of these implementations. Included in these results are models involving the motion of vortices due to an applied current and pinning effects due to various material properties.
A method for data handling numerical results in parallel OpenFOAM simulations

International Nuclear Information System (INIS)

Anton, Alin; Muntean, Sebastian

2015-01-01

Parallel computational fluid dynamics simulations produce vast amounts of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM® toolkit [1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large scale simulation results than the regular algorithms.
A method for data handling numerical results in parallel OpenFOAM simulations

Energy Technology Data Exchange (ETDEWEB)

Anton, Alin [Faculty of Automatic Control and Computing, Politehnica University of Timişoara, 2nd Vasile Pârvan Ave., 300223, TM Timişoara, Romania, alin.anton@cs.upt.ro (Romania)]; Muntean, Sebastian [Center for Advanced Research in Engineering Science, Romanian Academy – Timişoara Branch, 24th Mihai Viteazu Ave., 300221, TM Timişoara (Romania)]

2015-12-31

Parallel computational fluid dynamics simulations produce vast amounts of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM® toolkit [1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large scale simulation results than the regular algorithms.
Event parallelism: Distributed memory parallel computing for high energy physics experiments

International Nuclear Information System (INIS)

Nash, T.

1989-05-01

This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. 6 figs
Event parallelism: Distributed memory parallel computing for high energy physics experiments

International Nuclear Information System (INIS)

Nash, T.

1989-01-01

This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. (orig.)
Event parallelism: Distributed memory parallel computing for high energy physics experiments

Science.gov (United States)

Nash, Thomas

1989-12-01

This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.
Streaming simplification of tetrahedral meshes.

Science.gov (United States)

Vo, Huy T; Callahan, Steven P; Lindstrom, Peter; Pascucci, Valerio; Silva, Cláudio T

2007-01-01

Unstructured tetrahedral meshes are commonly used in scientific computing to represent scalar, vector, and tensor fields in three dimensions. Visualization of these meshes can be difficult to perform interactively due to their size and complexity. By reducing the size of the data, we can accomplish real-time visualization necessary for scientific analysis. We propose a two-step approach for streaming simplification of large tetrahedral meshes. Our algorithm arranges the data on disk in a streaming, I/O-efficient format that allows coherent access to the tetrahedral cells. A quadric-based simplification is sequentially performed on small portions of the mesh in-core. Our output is a coherent streaming mesh which facilitates future processing. Our technique is fast, produces high quality approximations, and operates out-of-core to process meshes too large for main memory.
GPU-accelerated Lattice Boltzmann method for anatomical extraction in patient-specific computational hemodynamics

Science.gov (United States)

Yu, H.; Wang, Z.; Zhang, C.; Chen, N.; Zhao, Y.; Sawchuk, A. P.; Dalsing, M. C.; Teague, S. D.; Cheng, Y.

2014-11-01

Existing research on patient-specific computational hemodynamics (PSCH) relies heavily on software for anatomical extraction of blood arteries. Data reconstruction and mesh generation have to be done using existing commercial software because of the gap between medical image processing and CFD, which increases the computational burden and introduces inaccuracy during data transformation, and thus limits the medical applications of PSCH. We use the lattice Boltzmann method (LBM) to solve the level-set equation over an Eulerian distance field and implicitly and dynamically segment the artery surfaces from radiological CT/MRI imaging data. The segments feed seamlessly into the LBM-based CFD computation of PSCH, so explicit mesh construction and extra data management are avoided. The LBM is ideally suited for GPU (graphics processing unit)-based parallel computing. The parallel acceleration on the GPU achieves excellent performance in PSCH computation. An application study will be presented which segments an aortic artery from a chest CT dataset and models PSCH of the segmented artery.
Image-Based Geometric Modeling and Mesh Generation

CERN Document Server

2013-01-01

As a new interdisciplinary research area, “image-based geometric modeling and mesh generation” integrates image processing, geometric modeling and mesh generation with finite element method (FEM) to solve problems in computational biomedicine, materials sciences and engineering. It is well known that FEM is currently well-developed and efficient, but mesh generation for complex geometries (e.g., the human body) still takes about 80% of the total analysis time and is the major obstacle to reduce the total computation time. It is mainly because none of the traditional approaches is sufficient to effectively construct finite element meshes for arbitrarily complicated domains, and generally a great deal of manual interaction is involved in mesh generation. This contributed volume, the first for such an interdisciplinary topic, collects the latest research by experts in this area. These papers cover a broad range of topics, including medical imaging, image alignment and segmentation, image-to-mesh conversion,...

IPython: components for interactive and parallel computing across disciplines. (Invited)

Science.gov (United States)

Perez, F.; Bussonnier, M.; Frederic, J. D.; Froehle, B. M.; Granger, B. E.; Ivanov, P.; Kluyver, T.; Patterson, E.; Ragan-Kelley, B.; Sailer, Z.

2013-12-01

Scientific computing is an inherently exploratory activity that requires constantly cycling between code, data and results, each time adjusting the computations as new insights and questions arise. To support such a workflow, good interactive environments are critical. The IPython project (http://ipython.org) provides a rich architecture for interactive computing with: 1. Terminal-based and graphical interactive consoles. 2. A web-based Notebook system with support for code, text, mathematical expressions, inline plots and other rich media. 3. Easy to use, high performance tools for parallel computing. Despite its roots in Python, the IPython architecture is designed in a language-agnostic way to facilitate interactive computing in any language. This allows users to mix Python with Julia, R, Octave, Ruby, Perl, Bash and more, as well as to develop native clients in other languages that reuse the IPython clients. In this talk, I will show how IPython supports all stages in the lifecycle of a scientific idea: 1. Individual exploration. 2. Collaborative development. 3. Production runs with parallel resources. 4. Publication. 5. Education. In particular, the IPython Notebook provides an environment for "literate computing" with a tight integration of narrative and computation (including parallel computing). These Notebooks are stored in a JSON-based document format that provides an "executable paper": notebooks can be version controlled, exported to HTML or PDF for publication, and used for teaching.
Parallel computing of a climate model on the dawn 1000 by domain decomposition method

Science.gov (United States)

Bi, Xunqiang

1997-12-01

This paper introduces the parallel computing of a grid-point nine-level atmospheric general circulation model on the Dawn 1000. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massively parallel computer made by the National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. Potential ways to increase the speed-up ratio and exploit more resources of future massively parallel supercomputers are also discussed.
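As an illustration of the two-dimensional domain decomposition mentioned in this abstract, the following is a minimal sketch, not the authors' code. It assumes a regular longitude-latitude grid split into a `px` by `py` array of blocks; the function names, the row-major rank ordering, and the grid sizes in the usage note are all hypothetical.

```python
def block_range(n, parts, index):
    """Return the half-open index range [start, stop) of block `index`
    when `n` grid points are split into `parts` nearly equal blocks."""
    base, extra = divmod(n, parts)
    # The first `extra` blocks each receive one additional point.
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def decompose_2d(nlon, nlat, px, py, rank):
    """Map a processor rank to its (longitude, latitude) sub-domain.

    Ranks are assumed to be laid out row-major on a px-by-py processor grid.
    """
    ix, iy = rank % px, rank // px
    return block_range(nlon, px, ix), block_range(nlat, py, iy)
```

For example, `decompose_2d(72, 46, 4, 2, 5)` assigns rank 5 the longitude indices 18 to 35 and the latitude indices 23 to 45. Each processor then needs halo exchanges only with its neighbours in the processor grid.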
Efficiency Analysis of the access method with the cascading Bloom filter to the data warehouse on the parallel computing platform

Science.gov (United States)

Grigoriev, Yu A.; Proletarskaya, V. A.; Ermakov, E. Yu; Ermakov, O. Yu

2017-10-01

A new method using a cascading Bloom filter (CBF) was developed for executing SQL queries in the Apache Spark parallel computing environment. It comprises representing the original query as several subqueries, constructing a connection graph and transforming the subqueries, identifying the connections where Bloom filters are needed, and expressing the graph in terms of Spark. Full-scale experiments on query Q3 of the TPC-H test confirmed the effectiveness of the developed method.
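The general idea of cascading Bloom filters across joins can be sketched in plain Python. This is an illustrative sketch only: it does not use Spark and is not the authors' implementation. The table layout (customers, orders, line items keyed by `custkey` and `orderkey`), the filter sizes, and all names are assumptions chosen to resemble a three-table join.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter with k hash positions derived from SHA-256."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


def cascade_filter(customers, orders, lineitems, segment):
    """Pre-filter two join inputs with a cascade of Bloom filters."""
    # Stage 1: filter built from the customers that pass the predicate.
    cust_bf = BloomFilter()
    for c in customers:
        if c["segment"] == segment:
            cust_bf.add(c["custkey"])
    kept_orders = [o for o in orders if cust_bf.might_contain(o["custkey"])]

    # Stage 2: filter built from the surviving orders, applied to line items.
    order_bf = BloomFilter()
    for o in kept_orders:
        order_bf.add(o["orderkey"])
    kept_items = [l for l in lineitems if order_bf.might_contain(l["orderkey"])]
    return kept_orders, kept_items
```

Because a Bloom filter never produces false negatives, every row that would survive the exact join is retained. The rows dropped early are ones that would otherwise be shuffled between workers before being discarded.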
Fine-grain Parallel Processing On A Commodity Platform: A Solution For The Atlas Second-level Trigger

CERN Document Server

Boosten, M

2003-01-01

From 2005 on, CERN expects to have a new accelerator available for experiments: the Large Hadron Collider (LHC), with a circumference of 27 kilometres. The ATLAS detector produces 40 TeraBytes/s of data. Only a fraction of all data is interesting. A computer system, called the trigger, selects the interesting data through real-time data analysis. The trigger consists of three subsequent filtering levels: LVL1, LVL2, and LVL3. LVL1 will be implemented using special-purpose hardware. LVL2 and LVL3 will be implemented using a Network Of Workstations (NOW). A major problem is to make efficient use of the computing power available in each workstation. The major contribution of this designer's project is an infrastructure named MESH. MESH enables CERN to cost-effectively implement the LVL2 trigger. Furthermore, due to the use of commodity technology, MESH enables the LVL2 trigger to be cost-effectively upgraded and supported during its 20 year lifecycle. MESH facilitates efficient parallel processing on PCs interc...
Mesh influence on the fire computer modeling in nuclear power plants

Directory of Open Access Journals (Sweden)

D. Lázaro

2018-04-01

Fire computer models make it possible to study the consequences of real fire scenarios. Their use in nuclear power plants has increased with the new regulations to apply risk-informed, performance-based methods for the analysis and design of fire safety solutions. The selection of the cell size is very important in these kinds of models. The mesh must strike a compromise between geometric fidelity, the resolution of the equations and the computation time. This paper studies the impact of several cell sizes, using the fire computer model FDS, to evaluate their relative effect on the final simulation results. To validate this, we employed several scenarios of interest for nuclear power plants. The conclusions offer relevant data for users and indicate cell sizes that can be selected to guarantee the quality of the simulations and reduce the uncertainty of the results.
Parallelization of MCNP4 code by using simple FORTRAN algorithms

International Nuclear Information System (INIS)

Yazid, P.I.; Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka.

1993-12-01

Simple FORTRAN algorithms that rely only on open, close, read and write statements, together with disk files and some UNIX commands, have been applied to the parallelization of MCNP4. The code, named MCNPNFS, maintains almost all capabilities of MCNP4 in solving shielding problems. It is able to perform parallel computing on any set of UNIX workstations connected by a network, regardless of the heterogeneity of the hardware, provided that all processors produce a binary file in the same format. Further, it is confirmed that MCNPNFS can also be executed on the Monte-4 vector-parallel computer. MCNPNFS has been tested intensively by executing 5 photon-neutron benchmark problems, a spent fuel cask problem and 17 sample problems included in the original code package of MCNP4. Three different workstations, connected by a network, have been used to execute MCNPNFS in parallel. By measuring CPU time, the parallel efficiency is determined to be 58% to 99%, and 86% on average. On Monte-4, MCNPNFS has been executed using 4 processors concurrently and has achieved a parallel efficiency of 79% on average. (author)
Internode data communications in a parallel computer

Science.gov (United States)

Archer, Charles J.; Blocksome, Michael A.; Miller, Douglas R.; Parker, Jeffrey J.; Ratterman, Joseph D.; Smith, Brian E.

2013-09-03

Internode data communications in a parallel computer that includes compute nodes that each include main memory and a messaging unit, the messaging unit including computer memory and coupling compute nodes for data communications, in which, for each compute node at compute node boot time: a messaging unit allocates, in the messaging unit's computer memory, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; receives, prior to initialization of a particular process on the compute node, a data communications message intended for the particular process; and stores the data communications message in the message buffer associated with the particular process. Upon initialization of the particular process, the process establishes a messaging buffer in main memory of the compute node and copies the data communications message from the message buffer of the messaging unit into the message buffer of main memory.
Performance assessment of the SIMFAP parallel cluster at IFIN-HH Bucharest

International Nuclear Information System (INIS)

Adam, Gh.; Adam, S.; Ayriyan, A.; Dushanov, E.; Hayryan, E.; Korenkov, V.; Lutsenko, A.; Mitsyn, V.; Sapozhnikova, T.; Sapozhnikov, A; Streltsova, O.; Buzatu, F.; Dulea, M.; Vasile, I.; Sima, A.; Visan, C.; Busa, J.; Pokorny, I.

2008-01-01

Performance assessment and case study outputs of the parallel SIMFAP cluster at IFIN-HH Bucharest point to its effective and reliable operation. A comparison with results on the supercomputing system in LIT-JINR Dubna adds insight on resource allocation for problem solving by parallel computing. The solution of models asking for very large numbers of knots in the discretization mesh needs the migration to high performance computing based on parallel cluster architectures. The acquisition of ready-to-use parallel computing facilities being beyond limited budgetary resources, the solution at IFIN-HH was to buy the hardware and the inter-processor network, and to implement in-house the open software concerning both the operating system and the parallel computing standard. The present paper provides a report demonstrating the successful solution of these tasks. The implementation of the well-known HPL (High Performance LINPACK) Benchmark points to the effective and reliable operation of the cluster. The comparison of HPL outputs obtained on parallel clusters of different magnitudes shows that there is an optimum range of the order $N$ of the linear algebraic system over which a given parallel cluster provides optimum parallel solutions. For the SIMFAP cluster, this range can be inferred to correspond to about $1$ to $2 \times 10^{4}$ linear algebraic equations. For an algorithm of polynomial complexity $N^{\alpha}$, the task sharing among $p$ processors within a parallel solution mainly follows an $(N/p)^{\alpha}$ behaviour under peak performance achievement. Thus, while the problem complexity remains the same, a substantial decrease of the coefficient of the leading order of the polynomial complexity is achieved. (authors)
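The closing scaling statement of this abstract can be written out in one line. The constant $c$ and the cost functions $T_1$ and $T_p$ are notation introduced here for illustration; they do not appear in the abstract. Assuming the serial cost is dominated by its leading term and that the idealized parallel cost follows the stated behaviour:

```latex
\[
T_1(N) \approx c\,N^{\alpha},
\qquad
T_p(N) \approx c\left(\frac{N}{p}\right)^{\alpha} = \frac{c}{p^{\alpha}}\,N^{\alpha}.
\]
```

The exponent $\alpha$ is unchanged, so the complexity class stays the same, while the leading coefficient shrinks by a factor of $p^{\alpha}$ under these idealized assumptions.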
Parallel processing algorithms for hydrocodes on a computer with MIMD architecture (DENELCOR's HEP)

International Nuclear Information System (INIS)

Hicks, D.L.

1983-11-01

In real time simulation/prediction of complex systems such as water-cooled nuclear reactors, if reactor operators had fast simulator/predictors to check the consequences of their operations before implementing them, events such as the incident at Three Mile Island might be avoided. However, existing simulator/predictors such as RELAP run slower than real time on serial computers. It appears that the only way to overcome the barrier to higher computing rates is to use computers with architectures that allow concurrent computations or parallel processing. The computer architecture with the greatest degree of parallelism is labeled Multiple Instruction Stream, Multiple Data Stream (MIMD). An example of a machine of this type is the HEP computer by DENELCOR. It appears that hydrocodes are very well suited for parallelization on the HEP. It is a straightforward exercise to parallelize explicit, one-dimensional Lagrangean hydrocodes in a zone-by-zone parallelization. Similarly, implicit schemes can be parallelized in a zone-by-zone fashion via an a priori, symbolic inversion of the tridiagonal matrix that arises in an implicit scheme. These techniques are extended to Eulerian hydrocodes by using Harlow's rezone technique. The extension from single-phase Eulerian to two-phase Eulerian is straightforward. This step-by-step extension leads to hydrocodes with zone-by-zone parallelization that are capable of two-phase flow simulation. Extensions to two and three spatial dimensions can be achieved by operator splitting. It appears that a zone-by-zone parallelization is the best way to utilize the capabilities of an MIMD machine. 40 references
Software Alchemy: Turning Complex Statistical Computations into Embarrassingly-Parallel Ones

Directory of Open Access Journals (Sweden)

Norman Matloff

2016-07-01

The growth in the use of computationally intensive statistical procedures, especially with big data, has necessitated the usage of parallel computation on diverse platforms such as multicore, GPUs, clusters and clouds. However, slowdown due to interprocess communication costs typically limits such methods to "embarrassingly parallel" (EP) algorithms, especially on non-shared memory platforms. This paper develops a broadly applicable method for converting many non-EP algorithms into statistically equivalent EP ones. The method is shown to yield excellent levels of speedup for a variety of statistical computations. It also overcomes certain problems of memory limitations.
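One way such a conversion can look in practice is chunk averaging: split the data into chunks, apply the estimator to each chunk independently, and average the per-chunk estimates. The abstract does not spell out the method, so the sketch below is an assumption about its general shape rather than the paper's procedure. The function name is hypothetical, and a thread pool stands in for whatever parallel backend would actually be used.

```python
import statistics
from concurrent.futures import ThreadPoolExecutor


def chunk_average(data, estimator, num_chunks):
    """Apply `estimator` to each chunk independently and average the results.

    The per-chunk calls share no state, so they are embarrassingly parallel.
    Striding assumes the rows are in no meaningful order (e.g. i.i.d. data).
    """
    chunks = [data[i::num_chunks] for i in range(num_chunks)]
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        estimates = list(pool.map(estimator, chunks))
    return sum(estimates) / len(estimates)
```

For instance, `chunk_average(samples, statistics.median, 8)` would approximate the median of `samples` from eight independent chunk medians. The result is generally an approximation of the full-data estimate; for the mean with equal-sized chunks the two coincide.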
Dynamic Mesh Adaptation for Front Evolution Using Discontinuous Galerkin Based Weighted Condition Number Mesh Relaxation

Energy Technology Data Exchange (ETDEWEB)

Greene, Patrick T. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Schofield, Samuel P. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Nourgaliev, Robert [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2016-06-21

A new mesh smoothing method designed to cluster mesh cells near a dynamically evolving interface is presented. The method is based on weighted condition number mesh relaxation with the weight function being computed from a level set representation of the interface. The weight function is expressed as a Taylor series based discontinuous Galerkin projection, which makes the computation of the derivatives of the weight function needed during the condition number optimization process a trivial matter. For cases when a level set is not available, a fast method for generating a low-order level set from discrete cell-centered fields, such as a volume fraction or index function, is provided. Results show that the low-order level set works equally well for the weight function as the actual level set. Meshes generated for a number of interface geometries are presented, including cases with multiple level sets. Dynamic cases for moving interfaces are presented to demonstrate the method's potential usefulness to arbitrary Lagrangian Eulerian (ALE) methods.
Investigation and experimental validation of the contribution of optical interconnects in the SYMPHONIE massively parallel computer

International Nuclear Information System (INIS)

Scheer, Patrick

1998-01-01

Progress in microelectronics leads to electronic circuits which are increasingly integrated, with an operating frequency and an input/output count larger than those supported by printed circuit board and back-plane technologies. As a result, distributed systems with several boards cannot fully exploit the performance of integrated circuits. In synchronous parallel computers, the situation is worse, since the overall system performance relies on the efficiency of electrical interconnects between the integrated circuits which include the processing elements (PE). The study of a real parallel computer named SYMPHONIE shows, for instance, that the system operating frequency is far lower than the capabilities of the microelectronics technology used for the PE implementation. Optical interconnections may remove these limitations by providing more efficient connections between the PE. In particular, free-space optical interconnections based on vertical-cavity surface-emitting lasers (VCSEL), micro-lenses and PIN photodiodes are compatible with the required features of the PE communications. Zero-bias modulation of VCSEL with CMOS-compatible digital signals is studied and experimentally demonstrated. A model of the propagation of truncated Gaussian beams through micro-lenses is developed. It is then used to optimise the geometry of the detection areas. A dedicated mechanical system is also proposed and implemented for integrating free-space optical interconnects in a standard electronic environment, representative of that of parallel computer systems. A specially designed demonstrator provides the experimental validation of the above physical concepts. (author)
Parallel PDE-Based Simulations Using the Common Component Architecture

International Nuclear Information System (INIS)

McInnes, Lois C.; Allan, Benjamin A.; Armstrong, Robert; Benson, Steven J.; Bernholdt, David E.; Dahlgren, Tamara L.; Diachin, Lori; Krishnan, Manoj Kumar; Kohl, James A.; Larson, J. Walter; Lefantzi, Sophia; Nieplocha, Jarek; Norris, Boyana; Parker, Steven G.; Ray, Jaideep; Zhou, Shujia

2006-01-01

The complexity of parallel PDE-based simulations continues to increase as multimodel, multiphysics, and multi-institutional projects become widespread. A goal of component based software engineering in such large-scale simulations is to help manage this complexity by enabling better interoperability among various codes that have been independently developed by different groups. The Common Component Architecture (CCA) Forum is defining a component architecture specification to address the challenges of high-performance scientific computing. In addition, several execution frameworks, supporting infrastructure, and general purpose components are being developed. Furthermore, this group is collaborating with others in the high-performance computing community to design suites of domain-specific component interface specifications and underlying implementations. This chapter discusses recent work on leveraging these CCA efforts in parallel PDE-based simulations involving accelerator design, climate modeling, combustion, and accidental fires and explosions. We explain how component technology helps to address the different challenges posed by each of these applications, and we highlight how component interfaces built on existing parallel toolkits facilitate the reuse of software for parallel mesh manipulation, discretization, linear algebra, integration, optimization, and parallel data redistribution. We also present performance data to demonstrate the suitability of this approach, and we discuss strategies for applying component technologies to both new and existing applications
Parallel scientific computing theory, algorithms, and applications of mesh based and meshless methods

CERN Document Server

Trobec, Roman

2015-01-01

This book is concentrated on the synergy between computer science and numerical analysis. It is written to provide a firm understanding of the described approaches to computer scientists, engineers or other experts who have to solve real problems. The meshless solution approach is described in more detail, with a description of the required algorithms and the methods that are needed for the design of an efficient computer program. Most of the details are demonstrated on solutions of practical problems, from basic to more complicated ones. This book will be a useful tool for any reader interes
Pacing a data transfer operation between compute nodes on a parallel computer

Science.gov (United States)

Blocksome, Michael A [Rochester, MN]

2011-09-13

Methods, systems, and products are disclosed for pacing a data transfer between compute nodes on a parallel computer that include: transferring, by an origin compute node, a chunk of an application message to a target compute node; sending, by the origin compute node, a pacing request to a target direct memory access (`DMA`) engine on the target compute node using a remote get DMA operation; determining, by the origin compute node, whether a pacing response to the pacing request has been received from the target DMA engine; and transferring, by the origin compute node, a next chunk of the application message if the pacing response to the pacing request has been received from the target DMA engine.
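The pacing loop described in this abstract can be sketched as a small single-process simulation. This is an illustration of the control flow only, not the patented implementation. `TargetNode`, `handle_pacing_request`, and `paced_transfer` are hypothetical names, and the remote get DMA operation is modelled as a plain method call whose response is available immediately.

```python
from collections import deque


class TargetNode:
    """Hypothetical stand-in for a target compute node and its DMA engine."""

    def __init__(self):
        self.received = []        # chunks delivered so far
        self.responses = deque()  # pacing responses on their way back to the origin

    def receive_chunk(self, chunk):
        self.received.append(chunk)

    def handle_pacing_request(self, request_id):
        # Models the target DMA engine answering the origin's remote get.
        self.responses.append(request_id)


def paced_transfer(message, chunk_size, target):
    """Send `message` in chunks, sending the next chunk only after the
    pacing response for the previous one has been received."""
    sent = 0
    request_id = 0
    while sent < len(message):
        target.receive_chunk(message[sent:sent + chunk_size])
        sent += chunk_size
        target.handle_pacing_request(request_id)
        # In this synchronous model the response is already queued; a real
        # origin node would wait or do other work until it arrives.
        if not target.responses or target.responses.popleft() != request_id:
            raise RuntimeError("pacing response not received")
        request_id += 1
    return request_id  # number of chunks transferred
```

Gating each chunk on a response from the target keeps the origin from flooding the network faster than the target's DMA engine can drain it, which appears to be the purpose of the pacing request.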
Analysis of achievable capacity in irregularly-placed high performance mesh nodes

CSIR Research Space (South Africa)

Olwal, TO

2012-09-01

-directional antenna for backhaul mesh connectivity and access. The third radio interface card is attached to a 2.4 GHz omni-directional antenna for mesh client access network. As shown in Figure 2, the HPN block diagram has a weather proof Unshielded Twisted Pair... by an embedded microcontroller technology [11]. To ensure high speed performance, the innovation has the first radio interface card attached to a 5 GHz directional antenna for backhaul mesh routing; the second interface card is connected to a 5 GHz omni...
GPU accelerated fuzzy connected image segmentation by using CUDA.

Science.gov (United States)

Zhuge, Ying; Cao, Yong; Miller, Robert W

2009-01-01

Image segmentation techniques using fuzzy connectedness principles have shown their effectiveness in segmenting a variety of objects in several large applications in recent years. However, one problem of these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays commodity graphics hardware provides high parallel computing power. In this paper, we present a parallel fuzzy connected image segmentation algorithm on Nvidia's Compute Unified Device Architecture (CUDA) platform for segmenting large medical image data sets. Our experiments based on three data sets of small, medium, and large size demonstrate the efficiency of the parallel algorithm, which achieves speed-up factors of 7.2x, 7.3x, and 14.4x, respectively, for the three data sets over the sequential CPU implementation of the fuzzy connected image segmentation algorithm.
Practical integrated simulation systems for coupled numerical simulations in parallel

Energy Technology Data Exchange (ETDEWEB)

Osamu, Hazama; Zhihong, Guo [Japan Atomic Energy Research Inst., Centre for Promotion of Computational Science and Engineering, Tokyo (Japan)

2003-07-01

In order for numerical simulations to reflect 'real-world' phenomena and occurrences, the incorporation of multidisciplinary and multi-physics simulations considering various physical models and factors is becoming essential. However, there still exist many obstacles which inhibit such numerical simulations. For example, it is still difficult in many instances to develop satisfactory software packages which allow for such coupled simulations, and such simulations will require more computational resources. A precise multi-physics simulation today will require parallel processing, which again makes it a complicated process. Under the international cooperative efforts between CCSE/JAERI and Fraunhofer SCAI, a German institute, a library called MpCCI, or Mesh-based Parallel Code Coupling Interface, has been implemented together with a library called STAMPI to couple two existing codes to develop an 'integrated numerical simulation system' intended for meta-computing environments. (authors)
TME (Task Mapping Editor): tool for executing distributed parallel computing. TME user's manual

International Nuclear Information System (INIS)

Takemiya, Hiroshi; Yamagishi, Nobuhiro; Imamura, Toshiyuki

2000-03-01

At the Center for Promotion of Computational Science and Engineering, a software environment, PPExe, has been developed to support scientific computing on a parallel computer cluster (distributed parallel scientific computing). TME (Task Mapping Editor) is one of the components of PPExe and provides a visual programming environment for distributed parallel scientific computing. Users can specify data dependences among tasks (programs) visually as a data flow diagram and map these tasks onto computers interactively through the GUI of TME. The specified tasks are processed by other components of PPExe, such as the Meta-scheduler, RIM (Resource Information Monitor), and EMS (Execution Management System), according to the execution order of these tasks determined by TME. In this report, we describe the usage of TME. (author)
Influence of mesh non-orthogonality on numerical simulation of buoyant jet flows

International Nuclear Information System (INIS)

Ishigaki, Masahiro; Abe, Satoshi; Sibamoto, Yasuteru; Yonomoto, Taisuke

2017-01-01

Highlights: • Influence of mesh non-orthogonality on numerical solution of buoyant jet flows. • Buoyant jet flows are simulated with hexahedral and prismatic meshes. • Jet instability with prismatic meshes may be overestimated compared to that with hexahedral meshes. • Modified solvers that can reduce the influence of mesh non-orthogonality and reduce computation time are proposed. - Abstract: In the present research, we discuss the influence of mesh non-orthogonality on numerical solution of a type of buoyant flow. Buoyant jet flows are simulated numerically with hexahedral and prismatic mesh elements in an open source Computational Fluid Dynamics (CFD) code called “OpenFOAM”. Buoyant jet instability obtained with the prismatic meshes may be overestimated compared to that obtained with the hexahedral meshes when non-orthogonal correction is not applied in the code. Although the non-orthogonal correction method can improve the instability generated by mesh non-orthogonality, it may increase computation time required to reach a convergent solution. Thus, we propose modified solvers that can reduce the influence of mesh non-orthogonality and reduce the computation time compared to the existing solvers in OpenFOAM. It is demonstrated that calculations for a buoyant jet with a large temperature difference are performed faster by the modified solver.

Influence of mesh non-orthogonality on numerical simulation of buoyant jet flows

Energy Technology Data Exchange (ETDEWEB)

Ishigaki, Masahiro, E-mail: ishigaki.masahiro@jaea.go.jp; Abe, Satoshi; Sibamoto, Yasuteru; Yonomoto, Taisuke

2017-04-01

Highlights: • Influence of mesh non-orthogonality on numerical solution of buoyant jet flows. • Buoyant jet flows are simulated with hexahedral and prismatic meshes. • Jet instability with prismatic meshes may be overestimated compared to that with hexahedral meshes. • Modified solvers that can reduce the influence of mesh non-orthogonality and reduce computation time are proposed. - Abstract: In the present research, we discuss the influence of mesh non-orthogonality on numerical solution of a type of buoyant flow. Buoyant jet flows are simulated numerically with hexahedral and prismatic mesh elements in an open source Computational Fluid Dynamics (CFD) code called “OpenFOAM”. Buoyant jet instability obtained with the prismatic meshes may be overestimated compared to that obtained with the hexahedral meshes when non-orthogonal correction is not applied in the code. Although the non-orthogonal correction method can improve the instability generated by mesh non-orthogonality, it may increase computation time required to reach a convergent solution. Thus, we propose modified solvers that can reduce the influence of mesh non-orthogonality and reduce the computation time compared to the existing solvers in OpenFOAM. It is demonstrated that calculations for a buoyant jet with a large temperature difference are performed faster by the modified solver.
Towards Blockchain-enabled Wireless Mesh Networks

OpenAIRE

Selimi, Mennan; Kabbinale, Aniruddh Rao; Ali, Anwaar; Navarro, Leandro; Sathiaseelan, Arjuna

2018-01-01

Recently, mesh networking and blockchain are two of the hottest technologies in the telecommunications industry. Combining both can reformulate internet access and make connecting to the Internet not only easy, but affordable too. Hyperledger Fabric (HLF) is a blockchain framework implementation and one of the Hyperledger projects hosted by The Linux Foundation. We evaluate HLF in a real production mesh network and in the laboratory, quantify its performance, bottlenecks and limitations of th...
Contributions to computational stereology and parallel programming

DEFF Research Database (Denmark)

Rasmusson, Allan

rotator, even without the need for isotropic sections. To meet the need for computational power to perform image restoration of virtual tissue sections, parallel programming on GPUs has also been part of the project. This has led to a significant change in paradigm for a previously developed surgical...
A non overlapping parallel domain decomposition method applied to the simplified transport equations

International Nuclear Information System (INIS)

Lathuiliere, B.; Barrault, M.; Ramet, P.; Roman, J.

2009-01-01

A reactivity computation requires computing the highest eigenvalue of a generalized eigenvalue problem, for which an inverse power algorithm is commonly used. Very fine models are difficult for our sequential solver, based on the simplified transport equations, to tackle in terms of memory consumption and computational time. We therefore propose a non-overlapping domain decomposition method for the approximate resolution of the linear system to be solved at each inverse power iteration. Our method requires little development effort, as the inner multigroup solver can be reused without modification, and allows us to adapt the numerical resolution locally (mesh, finite element order). Numerical results are obtained by a parallel implementation of the method on two different cases with a pin-by-pin discretization. These results are analyzed in terms of memory consumption and parallel efficiency. (authors)
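To make the role of the per-iteration linear solve concrete, here is a minimal sketch of an inverse power iteration on a toy 2x2 problem. It assumes the generalized eigenproblem takes the form M phi = (1/k) F phi, a form common in reactor physics but not stated in the abstract. The matrices and the function names `solve_2x2`, `matvec`, and `inverse_power` are hypothetical, and the exact Cramer's-rule solve stands in for the approximate domain decomposition solve the authors propose.

```python
def solve_2x2(m, b):
    """Solve the 2x2 linear system m @ x = b by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    x0 = (b[0] * m[1][1] - m[0][1] * b[1]) / det
    x1 = (m[0][0] * b[1] - b[0] * m[1][0]) / det
    return [x0, x1]


def matvec(a, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]


def inverse_power(m, f, iterations=50):
    """Estimate the largest k with M phi = (1/k) F phi (assumed problem form)."""
    phi = [1.0, 1.0]
    k = 1.0
    for _ in range(iterations):
        source = matvec(f, phi)
        # One linear solve per inverse power iteration: M new_phi = F phi / k.
        new_phi = solve_2x2(m, [s / k for s in source])
        new_source = matvec(f, new_phi)
        k *= sum(new_source) / sum(source)  # update the eigenvalue estimate
        norm = max(abs(c) for c in new_phi)
        phi = [c / norm for c in new_phi]   # rescale to avoid overflow
    return k, phi


# Hypothetical toy operators, chosen only so that the iteration converges.
M = [[3.0, -1.0], [-1.0, 3.0]]
F = [[1.0, 0.0], [0.0, 2.0]]
k, phi = inverse_power(M, F)
```

The linear solve inside the loop is the step a domain decomposition method would replace; everything else in the iteration stays unchanged, which is consistent with the abstract's remark that the inner solver can be reused.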
Application of parallel connected power-MOSFET elements to high current d.c. power supply

International Nuclear Information System (INIS)

Matsukawa, Tatsuya; Shioyama, Masanori; Shimada, Katsuhiro; Takaku, Taku; Neumeyer, Charles; Tsuji-Iio, Shunji; Shimada, Ryuichi

2001-01-01

The low aspect ratio spherical torus (ST), which has a single-turn toroidal field coil, requires an extremely high d.c. current, such as 20 MA, to energize the coil. Considering the ratings of such extremely high current and low voltage, the power-MOSFET element is employed as the switching device for the a.c./d.c. converter of the power supply. One of the advantages of the power-MOSFET element is its low on-state resistance, which suits high current and low voltage operation. Recently, the capacity of power-MOSFET elements has increased and their on-state resistance has decreased, so that the possibility of constructing a high current and low voltage a.c./d.c. converter with parallel connected power-MOSFET elements has been growing. With the aim of developing a high current d.c. power supply using power-MOSFETs, the basic characteristics of parallel operation with power-MOSFET elements are experimentally investigated. In addition, synchronous rectifier type and bi-directional self-commutated type a.c./d.c. converters using parallel connected power-MOSFET elements are proposed.
Evaluating the performance of the particle finite element method in parallel architectures

Science.gov (United States)

Gimenez, Juan M.; Nigro, Norberto M.; Idelsohn, Sergio R.

2014-05-01

This paper presents a high performance implementation for the particle-mesh based method called particle finite element method two (PFEM-2). It consists of a material derivative based formulation of the equations with a hybrid spatial discretization which uses an Eulerian mesh and Lagrangian particles. The main aim of PFEM-2 is to solve transport equations as fast as possible while keeping some level of accuracy. The method was found to be competitive with classical Eulerian alternatives for these targets, even in their range of optimal application. To evaluate the goodness of the method with large simulations, it is imperative to use parallel environments. Parallel strategies for the Finite Element Method have been widely studied and many libraries can be used to solve the Eulerian stages of PFEM-2. However, Lagrangian stages, such as streamline integration, must be developed considering the parallel strategy selected. The main drawback of PFEM-2 is the large amount of memory needed, which limits its application to large problems with only one computer. Therefore, a distributed-memory implementation is urgently needed. Unlike in a shared-memory approach, with domain decomposition the memory is automatically isolated, thus avoiding race conditions; however, new issues appear due to data distribution over the processes. Thus, a domain decomposition strategy for both particles and mesh is adopted, which minimizes the communication between processes. Finally, performance analyses running over multicore and multinode architectures are presented. The Courant-Friedrichs-Lewy number used influences the efficiency of the parallelization and, in some cases, a weighted partitioning can be used to improve the speed-up. However, the total CPU time for the cases presented is lower than that obtained when using classical Eulerian strategies.
Parallel file system performances in fusion data storage

Energy Technology Data Exchange (ETDEWEB)

Iannone, F., E-mail: francesco.iannone@enea.it [Associazione EURATOM-ENEA sulla Fusione, C.R.ENEA Frascati, via E.Fermi, 45 - 00044 Frascati, Rome (Italy); Podda, S.; Bracco, G. [ENEA Information Communication Tecnologies, Lungotevere Thaon di Revel, 76 - 00196 Rome (Italy); Manduchi, G. [Associazione EURATOM-ENEA sulla Fusione, Consorzio RFX, Corso Stati Uniti, 4 - 35127 Padua (Italy); Maslennikov, A. [CASPUR Inter-University Consortium for the Application of Super-Computing for Research, via dei Tizii, 6b - 00185 Rome (Italy); Migliori, S. [ENEA Information Communication Tecnologies, Lungotevere Thaon di Revel, 76 - 00196 Rome (Italy); Wolkersdorfer, K. [Juelich Supercomputing Centre-FZJ, D-52425 Juelich (Germany)

2012-12-15

High I/O flow rates, up to 10 GB/s, are required in large fusion Tokamak experiments like ITER, where hundreds of nodes simultaneously store large amounts of data acquired during the plasma discharges. Typical network topologies such as linear arrays (systolic), rings, meshes (2-D arrays), tori (3-D arrays), trees, butterfly, hypercube in combination with high speed data transports like Infiniband or 10G-Ethernet, are the main areas in which the effort to overcome the so-called parallel I/O bottlenecks is most focused. The high I/O flow rates were modelled in an emulated testbed based on parallel file systems such as Lustre and GPFS, commonly used in High Performance Computing. The tests run on the High Performance Computing-For Fusion (8640 cores) and ENEA CRESCO (3392 cores) supercomputers. Message Passing Interface based applications were developed to emulate parallel I/O on Lustre and GPFS using data archival and access solutions like MDSPLUS and Universal Access Layer. These methods of data storage organization are widely used in nuclear fusion experiments and are being developed within the EFDA Integrated Tokamak Modelling - Task Force; the authors tried to evaluate their behaviour in a realistic emulation setup.
Communication complexity of distributed computing and a parallel algorithm for polynomial roots

International Nuclear Information System (INIS)

Tiwari, P.

1986-01-01

The first part of this thesis begins with a discussion of the minimum communication requirements in some distributed networks. The main result is a general technique for determining lower bounds on the communication complexity of problems on various distributed computer networks. This general technique is derived by simulating the general network by a linear array and then using a lower bound on the communication complexity of the problem on the linear array. Applications of this technique yield nontrivial optimal or near-optimal lower bounds on the communication complexity of distinctness, ranking, uniqueness, merging, and triangle detection on a ring, a mesh, and a complete binary tree of processors. A technique similar to the one used in proving the above results yields interesting graph theoretic results concerning decomposition of a graph into complete bipartite subgraphs. The second part of the thesis is devoted to the design of a fast parallel algorithm for determining all roots of a polynomial. Given a polynomial ρ(z) of degree n with m-bit integer coefficients and an integer μ, the author considers the problem of determining all its roots with error less than 2^(-μ). It is shown that this problem is in the class NC if ρ(z) has all real roots.
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

Science.gov (United States)

Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

2014-01-16

To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high
Accelerating Astronomy & Astrophysics in the New Era of Parallel Computing: GPUs, Phi and Cloud Computing

Science.gov (United States)

Ford, Eric B.; Dindar, Saleh; Peters, Jorg

2015-08-01

The realism of astrophysical simulations and statistical analyses of astronomical data are set by the available computational resources. Thus, astronomers and astrophysicists are constantly pushing the limits of computational capabilities. For decades, astronomers benefited from massive improvements in computational power that were driven primarily by increasing clock speeds and required relatively little attention to details of the computational hardware. For nearly a decade, increases in computational capabilities have come primarily from increasing the degree of parallelism, rather than increasing clock speeds. Further increases in computational capabilities will likely be led by many-core architectures such as Graphical Processing Units (GPUs) and Intel Xeon Phi. Successfully harnessing these new architectures requires significantly more understanding of the hardware architecture, cache hierarchy, compiler capabilities and network characteristics. I will provide an astronomer's overview of the opportunities and challenges provided by modern many-core architectures and elastic cloud computing. The primary goal is to help an astronomical audience understand what types of problems are likely to yield more than order of magnitude speed-ups and which problems are unlikely to parallelize sufficiently efficiently to be worth the development time and/or costs. I will draw on my experience leading a team in developing the Swarm-NG library for parallel integration of large ensembles of small n-body systems on GPUs, as well as several smaller software projects. I will share lessons learned from collaborating with computer scientists, including both technical and soft skills. Finally, I will discuss the challenges of training the next generation of astronomers to be proficient in this new era of high-performance computing, drawing on experience teaching a graduate class on High-Performance Scientific Computing for Astrophysics and organizing a 2014 advanced summer
Short-circuit testing of monofilar Bi-2212 coils connected in series and in parallel

International Nuclear Information System (INIS)

Polasek, A; Dias, R; Serra, E T; Filho, O O; Niedu, D

2010-01-01

Superconducting Fault Current Limiters (SCFCLs) are one of the most promising technologies for fault current limitation. In the present work, resistive SCFCL components based on Bi-2212 monofilar coils are subjected to short-circuit testing. These SCFCL components can be easily connected in series and/or in parallel by using joints and clamps. This allows considerable flexibility in developing larger SCFCL devices, since the configuration and size of the whole device can be easily adapted to the operational conditions. The single components presented critical current (Ic) values of 240-260 A, at 77 K. Short-circuits during 40-120 ms were applied. A single component can withstand a voltage drop of 126-252 V (0.3-0.6 V/cm). Components connected in series withstand higher voltage levels, whereas parallel connection allows higher rated currents during normal operation, but the limited current is also higher. Prospective currents as high as 10-40 kA (peak value) were limited to 3-9 kA (peak value) in the first half cycle.
Parallelization of a three-dimensional whole core transport code DeCART

Energy Technology Data Exchange (ETDEWEB)

Jin Young, Cho; Han Gyu, Joo; Ha Yong, Kim; Moon-Hee, Chang [Korea Atomic Energy Research Institute, Yuseong-gu, Daejon (Korea, Republic of)

2003-07-01

Parallelization of the DeCART (deterministic core analysis based on ray tracing) code is presented that reduces the burden of the tremendous computing time and memory required in three-dimensional whole core transport calculations. The parallelization employs the concept of MPI grouping as well as the MPI/OpenMP mixed scheme. Since most of the computing time and memory are used in the MOC (method of characteristics) and the multi-group CMFD (coarse mesh finite difference) calculations in DeCART, variables and subroutines related to these two modules are the primary targets for parallelization. Specifically, the ray tracing module was parallelized using a planar domain decomposition scheme and an angular domain decomposition scheme. The parallel performance of the DeCART code is evaluated by solving a rodded variation of the C5G7MOX three-dimensional benchmark problem and a simplified three-dimensional SMART PWR core problem. In the C5G7MOX problem with 24 CPUs, a maximum speedup of 21 is obtained on an IBM Regatta machine and 22 on a LINUX cluster in the MOC kernel, which indicates good parallel performance of the DeCART code. In the simplified SMART problem, the memory requirement of about 11 GBytes in the single processor case reduces to 940 Mbytes with 24 processors, which means that the DeCART code can now solve large core problems with affordable LINUX clusters. (authors)
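The speedup figures quoted in this abstract translate into parallel efficiencies with one division each. The short calculation below is only a worked illustration of that arithmetic. The helper name `parallel_efficiency` is hypothetical, and the memory ratio assumes 1 GByte = 1000 Mbytes and takes the two quoted memory figures at face value, since the abstract does not say whether the 940 Mbytes is per processor.

```python
def parallel_efficiency(speedup, processors):
    """Parallel efficiency = speedup / number of processors."""
    return speedup / processors


# Speedups quoted for the MOC kernel on the C5G7MOX problem with 24 CPUs.
for machine, speedup in [("IBM Regatta", 21), ("LINUX cluster", 22)]:
    print(f"{machine}: efficiency {parallel_efficiency(speedup, 24):.1%}")

# Memory figures quoted for the simplified SMART problem
# (assumption: 1 GByte = 1000 Mbytes).
memory_reduction = 11_000 / 940
print(f"memory reduction factor: {memory_reduction:.1f}")
```

An efficiency close to 1 means that adding processors is paying off almost linearly, which is the sense in which the abstract calls the parallel performance good.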
Vector and parallel processors in computational science

International Nuclear Information System (INIS)

Duff, I.S.; Reid, J.K.

1985-01-01

These proceedings contain the articles presented at the named conference. These concern hardware and software for vector and parallel processors, numerical methods and algorithms for the computation on such processors, as well as applications of such methods to different fields of physics and related sciences. See hints under the relevant topics. (HSI)
Modelling and simulation of multiple single - phase induction motor in parallel connection

Directory of Open Access Journals (Sweden)

Sujitjorn, S.

2006-11-01

Full Text Available A mathematical model for parallel connected n-multiple single-phase induction motors in generalized state-space form is proposed in this paper. The motor group draws electric power from one inverter. The model is developed by the dq-frame theory and was tested against four loading scenarios in which satisfactory results were obtained.
Application of parallel computing techniques to a large-scale reservoir simulation

International Nuclear Information System (INIS)

Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten

2001-01-01

Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from the intensive computational requirements for detailed modeling investigations of real-world reservoirs. This paper presents the application of a massively parallel version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance.
Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems

KAUST Repository

Mudigere, Dheevatsa

2015-05-01

In this work, we revisit the 1999 Gordon Bell Prize winning PETSc-FUN3D aerodynamics code, extending it with highly-tuned shared-memory parallelization and detailed performance analysis on modern highly parallel architectures. An unstructured-grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working sets, unstructured memory accesses, and variable/limited amount of parallelism. This code, based on a domain decomposition approach, exposes tradeoffs between the number of threads assigned to each MPI-rank sub domain, and the total number of domains. By applying several algorithm- and architecture-aware optimization techniques for unstructured grids, we show a 6.9X speed-up in performance on a single-node Intel® Xeon™ E5-2690 v2 processor relative to the out-of-the-box compilation. Our scaling studies on TACC Stampede supercomputer show that our optimizations continue to provide performance benefits over baseline implementation as we scale up to 256 nodes.
Cartesian anisotropic mesh adaptation for compressible flow

International Nuclear Information System (INIS)

Keats, W.A.; Lien, F.-S.

2004-01-01

Simulating transient compressible flows involving shock waves presents challenges to the CFD practitioner in terms of the mesh quality required to resolve discontinuities and prevent smearing. This paper discusses a novel two-dimensional Cartesian anisotropic mesh adaptation technique implemented for compressible flow. This technique, developed for laminar flow by Ham, Lien and Strong, is efficient because it refines and coarsens cells using criteria that consider the solution in each of the cardinal directions separately. In this paper the method will be applied to compressible flow. The procedure shows promise in its ability to deliver good quality solutions while achieving computational savings. The convection scheme used is the Advective Upstream Splitting Method (Plus), and the refinement/coarsening criteria are based on work done by Ham et al. Transient shock wave diffraction over a backward step and shock reflection over a forward step are considered as test cases because they demonstrate that the quality of the solution can be maintained as the mesh is refined and coarsened in time. The data structure is explained in relation to the computational mesh, and the object-oriented design and implementation of the code are presented. Refinement and coarsening algorithms are outlined. Computational savings over uniform and isotropic mesh approaches are shown to be significant. (author)
WEKA-G: Parallel data mining on computational grids

Directory of Open Access Journals (Sweden)

PIMENTA, A.

2009-12-01

Full Text Available Data mining is a technology that can extract useful information from large amounts of data. However, mining a database often requires high computational power. To resolve this problem, this paper presents a tool, Weka-G, which runs in parallel the algorithms used in the data mining process. As the environment for doing so, we use a computational grid by adding several features within a WAN.
Effecting a broadcast with an allreduce operation on a parallel computer

Science.gov (United States)

Almasi, Gheorghe; Archer, Charles J.; Ratterman, Joseph D.; Smith, Brian E.

2010-11-02

A parallel computer comprises a plurality of compute nodes organized into at least one operational group for collective parallel operations. Each compute node is assigned a unique rank and is coupled for data communications through a global combining network. One compute node is assigned to be a logical root. A send buffer and a receive buffer are configured. Each element of a contribution of the logical root in the send buffer is contributed. One or more zeros corresponding to a size of the element are injected. An allreduce operation with a bitwise OR using the element and the injected zeros is performed. The result of the allreduce operation is determined and stored in each receive buffer.
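The reason a bitwise OR allreduce can act as a broadcast is that x OR 0 = x: if only the logical root contributes data and every other contribution is zero, the reduced result equals the root's data on every node. The sketch below simulates this in a single process. It assumes that the injected zeros are the contributions of the non-root ranks, which is one reading of the abstract, and the names `allreduce_or` and `broadcast_via_allreduce` are hypothetical.

```python
from functools import reduce
from operator import or_


def allreduce_or(contributions):
    """Toy allreduce: every rank receives the element-wise bitwise OR."""
    combined = [reduce(or_, column) for column in zip(*contributions)]
    return [list(combined) for _ in contributions]  # one receive buffer per rank


def broadcast_via_allreduce(root_rank, root_data, num_ranks):
    """Broadcast `root_data` by letting every non-root rank contribute zeros."""
    zeros = [0] * len(root_data)
    send_buffers = [list(root_data) if rank == root_rank else list(zeros)
                    for rank in range(num_ranks)]
    return allreduce_or(send_buffers)


receive_buffers = broadcast_via_allreduce(root_rank=2,
                                          root_data=[0x0F, 0xA0, 7],
                                          num_ranks=4)
```

After the call, each of the four receive buffers holds a copy of the root's three elements. On a real machine the reduction would run over the global combining network rather than in a Python loop.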
Parallel performances of three 3D reconstruction methods on MIMD computers: Feldkamp, block ART and SIRT algorithms

International Nuclear Information System (INIS)

Laurent, C.; Chassery, J.M.; Peyrin, F.; Girerd, C.

1996-01-01

This paper deals with the parallel implementations of reconstruction methods in 3D tomography. 3D tomography requires voluminous data and long computation times. Parallel computing on MIMD computers seems to be a good approach to manage this problem. In this study, we present the different steps of the parallelization on an abstract parallel computer. Depending on the method, we use two main approaches to parallelize the algorithms: the local approach and the global approach. Experimental results on MIMD computers are presented. Two 3D images reconstructed from realistic data are shown.

10th International Workshop on Parallel Tools for High Performance Computing

CERN Document Server

Gracia, José; Hilbrich, Tobias; Knüpfer, Andreas; Resch, Michael; Nagel, Wolfgang

2017-01-01

This book presents the proceedings of the 10th International Parallel Tools Workshop, held October 4-5, 2016 in Stuttgart, Germany – a forum to discuss the latest advances in parallel tools. High-performance computing plays an increasingly important role for numerical simulation and modelling in academic and industrial research. At the same time, using large-scale parallel systems efficiently is becoming more difficult. A number of tools addressing parallel program development and analysis have emerged from the high-performance computing community over the last decade, and what may have started as a collection of small helper scripts has now matured into production-grade frameworks. Powerful user interfaces and an extensive body of documentation allow easy usage by non-specialists.
A learnable parallel processing architecture towards unity of memory and computing.

Science.gov (United States)

Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J

2015-08-14

Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
A learnable parallel processing architecture towards unity of memory and computing

Science.gov (United States)

Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.

2015-08-01

Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
Lagrangian fluid dynamics using the Voronoi-Delaunay mesh

International Nuclear Information System (INIS)

Dukowicz, J.K.

1981-01-01

A Lagrangian technique for numerical fluid dynamics is described. This technique makes use of the Voronoi mesh to efficiently locate new neighbors, and it uses the dual (Delaunay) triangulation to define computational cells. This removes all topological restrictions and facilitates the solution of problems containing interfaces and multiple materials. To improve computational accuracy a mesh smoothing procedure is employed
Characterization of the mechanism of drug-drug interactions from PubMed using MeSH terms.

Science.gov (United States)

Lu, Yin; Figler, Bryan; Huang, Hong; Tu, Yi-Cheng; Wang, Ju; Cheng, Feng

2017-01-01

Identifying drug-drug interactions (DDIs) is an important topic for the development of safe pharmaceutical drugs and for the optimization of multidrug regimens for complex diseases such as cancer and HIV. There have been about 150,000 publications on DDIs in PubMed, which is a great resource for DDI studies. In this paper, we introduced an automatic computational method for the systematic analysis of the mechanism of DDIs using MeSH (Medical Subject Headings) terms from PubMed literature. MeSH is a controlled vocabulary thesaurus developed by the National Library of Medicine for indexing and annotating articles. Our method can effectively identify DDI-relevant MeSH terms such as drugs, proteins and phenomena with high accuracy. The connections among these MeSH terms were investigated by using co-occurrence heatmaps and social network analysis. Our approach can be used to visualize relationships of DDI terms, which has the potential to help users better understand DDIs. As the volume of PubMed records increases, our method for automatic analysis of DDIs from the PubMed database will become more accurate.
Beam dynamics simulations using a parallel version of PARMILA

International Nuclear Information System (INIS)

Ryne, R.D.

1996-01-01

The computer code PARMILA has been the primary tool for the design of proton and ion linacs in the United States for nearly three decades. Previously it was sufficient to perform simulations with of order 10000 particles, but recently the need to perform high resolution halo studies for next-generation, high intensity linacs has made it necessary to perform simulations with of order 100 million particles. With the advent of massively parallel computers such simulations are now within reach. Parallel computers already make it possible, for example, to perform beam dynamics calculations with tens of millions of particles, requiring over 10 GByte of core memory, in just a few hours. Also, parallel computers are becoming easier to use thanks to the availability of mature, Fortran-like languages such as Connection Machine Fortran and High Performance Fortran. We will describe our experience developing a parallel version of PARMILA and the performance of the new code
Beam dynamics simulations using a parallel version of PARMILA

International Nuclear Information System (INIS)

Ryne, Robert

1996-01-01

The computer code PARMILA has been the primary tool for the design of proton and ion linacs in the United States for nearly three decades. Previously it was sufficient to perform simulations with of order 10000 particles, but recently the need to perform high resolution halo studies for next-generation, high intensity linacs has made it necessary to perform simulations with of order 100 million particles. With the advent of massively parallel computers such simulations are now within reach. Parallel computers already make it possible, for example, to perform beam dynamics calculations with tens of millions of particles, requiring over 10 GByte of core memory, in just a few hours. Also, parallel computers are becoming easier to use thanks to the availability of mature, Fortran-like languages such as Connection Machine Fortran and High Performance Fortran. We will describe our experience developing a parallel version of PARMILA and the performance of the new code. (author)
On the efficient parallel computation of Legendre transforms

NARCIS (Netherlands)

Inda, M.A.; Bisseling, R.H.; Maslen, D.K.

2001-01-01

In this article, we discuss a parallel implementation of efficient algorithms for computation of Legendre polynomial transforms and other orthogonal polynomial transforms. We develop an approach to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the
On the efficient parallel computation of Legendre transforms

NARCIS (Netherlands)

Inda, M.A.; Bisseling, R.H.; Maslen, D.K.

1999-01-01

In this article we discuss a parallel implementation of efficient algorithms for computation of Legendre polynomial transforms and other orthogonal polynomial transforms. We develop an approach to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the
Analysis of multigrid methods on massively parallel computers: Architectural implications

Science.gov (United States)

Matheson, Lesley R.; Tarjan, Robert E.

1993-01-01

We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10^6 and 10^9, respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages (up to 1000 words), or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.
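The point about a high fixed communication cost favouring long messages can be illustrated with a simple linear cost model, time = startup + per_word * words. This is a minimal sketch under that assumed model, not the cost model of the paper; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
def message_time(words, startup, per_word):
    """Assumed linear cost model: fixed startup cost plus a per-word cost."""
    return startup + per_word * words


def total_time(total_words, message_words, startup, per_word):
    """Time to move total_words using messages of at most message_words each."""
    full, rest = divmod(total_words, message_words)
    t = full * message_time(message_words, startup, per_word)
    if rest:
        t += message_time(rest, startup, per_word)
    return t


# Hypothetical parameters in arbitrary time units.
STARTUP = 100.0
PER_WORD = 1.0

# Same payload sent as 100 short messages or as a single long one.
short_messages = total_time(1000, 10, STARTUP, PER_WORD)
one_long_message = total_time(1000, 1000, STARTUP, PER_WORD)
```

With these illustrative numbers the startup cost dominates the short-message case, which is the effect the abstract attributes to the high initiation cost of a communication.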
HypGrid2D. A 2-d mesh generator

Energy Technology Data Exchange (ETDEWEB)

Soerensen, N N

1998-03-01

The implementation of a hyperbolic mesh generation procedure, based on an equation for orthogonality and an equation for the cell face area is described. The method is fast, robust and gives meshes with good smoothness and orthogonality. The procedure is implemented in a program called HypGrid2D. The HypGrid2D program is capable of generating C-, O- and 'H'-meshes for use in connection with the EllipSys2D Navier-Stokes solver. To illustrate the capabilities of the program, some test examples are shown. First a series of C-meshes are generated around a NACA-0012 airfoil. Secondly a series of O-meshes are generated around a NACA-65-418 airfoil. Finally 'H'-meshes are generated over a Gaussian hill and a linear escarpment. (au)
Parallel computing in experimental mechanics and optical measurement: A review (II)

Science.gov (United States)

Wang, Tianyi; Kemao, Qian

2018-05-01

With advantages such as non-destructiveness, high sensitivity and high accuracy, optical techniques have successfully integrated into various important physical quantities in experimental mechanics (EM) and optical measurement (OM). However, in pursuit of higher image resolutions for higher accuracy, the computation burden of optical techniques has become much heavier. Therefore, in recent years, heterogeneous platforms composed of hardware such as CPUs and GPUs have been widely employed to accelerate these techniques due to their cost-effectiveness, short development cycle, easy portability, and high scalability. In this paper, we analyze various works by first illustrating their different architectures, followed by introducing their various parallel patterns for high speed computation. Next, we review the effects of CPU and GPU parallel computing specifically in EM & OM applications in a broad scope, which include digital image/volume correlation, fringe pattern analysis, tomography, hyperspectral imaging, computer-generated holograms, and integral imaging. In our survey, we have found that high parallelism can always be exploited in such applications for the development of high-performance systems.
8th International Workshop on Parallel Tools for High Performance Computing

CERN Document Server

Gracia, José; Knüpfer, Andreas; Resch, Michael; Nagel, Wolfgang

2015-01-01

Numerical simulation and modelling using High Performance Computing has evolved into an established technique in academic and industrial research. At the same time, the High Performance Computing infrastructure is becoming ever more complex. For instance, most of the current top systems around the world use thousands of nodes in which classical CPUs are combined with accelerator cards in order to enhance their compute power and energy efficiency. This complexity can only be mastered with adequate development and optimization tools. Key topics addressed by these tools include parallelization on heterogeneous systems, performance optimization for CPUs and accelerators, debugging of increasingly complex scientific applications, and optimization of energy usage in the spirit of green IT. This book represents the proceedings of the 8th International Parallel Tools Workshop, held October 1-2, 2014 in Stuttgart, Germany – a forum to discuss the latest advancements in parallel tools.
A Navier-Stokes Chimera Code on the Connection Machine CM-5: Design and Performance

Science.gov (United States)

Jespersen, Dennis C.; Levit, Creon; Kwak, Dochan (Technical Monitor)

1994-01-01

We have implemented a three-dimensional compressible Navier-Stokes code on the Connection Machine CM-5. The code is set up for implicit time-stepping on single or multiple structured grids. For multiple grids and geometrically complex problems, we follow the 'chimera' approach, where flow data on one zone is interpolated onto another in the region of overlap. We will describe our design philosophy and give some timing results for the current code. A parallel machine like the CM-5 is well-suited for finite-difference methods on structured grids. The regular pattern of connections of a structured mesh maps well onto the architecture of the machine. So the first design choice, finite differences on a structured mesh, is natural. We use centered differences in space, with added artificial dissipation terms. When numerically solving the Navier-Stokes equations, there are liable to be some mesh cells near a solid body that are small in at least one direction. This mesh cell geometry can impose a very severe CFL (Courant-Friedrichs-Lewy) condition on the time step for explicit time-stepping methods. Thus, though explicit time-stepping is well-suited to the architecture of the machine, we have adopted implicit time-stepping. We have further taken the approximate factorization approach. This creates the need to solve large banded linear systems and creates the first possible barrier to an efficient algorithm. To overcome this first possible barrier we have considered two options. The first is just to solve the banded linear systems with data spread over the whole machine, using whatever fast method is available. This option is adequate for solving scalar tridiagonal systems, but for scalar pentadiagonal or block tridiagonal systems it is somewhat slower than desired. The second option is to 'transpose' the flow and geometry variables as part of the time-stepping process: Start with x-lines of data in-processor. Form explicit terms in x, then transpose so y-lines of data are
Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

Energy Technology Data Exchange (ETDEWEB)

NONE

1997-12-31

This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.
How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing

Science.gov (United States)

Decyk, V. K.; Dauger, D. E.

We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.
Efficient 3D geometric and Zernike moments computation from unstructured surface meshes.

Science.gov (United States)

Pozo, José María; Villa-Uriol, Maria-Cruz; Frangi, Alejandro F

2011-03-01

This paper introduces and evaluates a fast exact algorithm and a series of faster approximate algorithms for the computation of 3D geometric moments from an unstructured surface mesh of triangles. Being based on the object surface reduces the computational complexity of these algorithms with respect to volumetric grid-based algorithms. In contrast, it can only be applied for the computation of geometric moments of homogeneous objects. This advantage and restriction is shared with other proposed algorithms based on the object boundary. The proposed exact algorithm reduces the computational complexity for computing geometric moments up to order N with respect to previously proposed exact algorithms, from N^9 to N^6. The approximate series algorithm appears as a power series on the rate between triangle size and object size, which can be truncated at any desired degree. The higher the number and quality of the triangles, the better the approximation. This approximate algorithm reduces the computational complexity to N^3. In addition, the paper introduces a fast algorithm for the computation of 3D Zernike moments from the computed geometric moments, with a computational complexity N^4, while the previously proposed algorithm is of order N^6. The error introduced by the proposed approximate algorithms is evaluated in different shapes and the cost-benefit ratio in terms of error, and computational time is analyzed for different moment orders.
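As an illustration of why a closed surface mesh is enough to obtain moments of a homogeneous object, the lowest-order moments (volume and centroid) can be accumulated triangle by triangle from signed tetrahedra spanned by the origin and each face. This is a minimal sketch of that standard construction, not the exact or approximate algorithms of the paper; the function names are hypothetical, and the mesh is assumed to be closed with outward-oriented triangles.

```python
def triple_product(a, b, c):
    """Scalar triple product a . (b x c)."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            + a[1] * (b[2] * c[0] - b[0] * c[2])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def volume_and_centroid(vertices, triangles):
    """Zeroth- and first-order moments of a closed, outward-oriented mesh."""
    volume = 0.0
    first = [0.0, 0.0, 0.0]
    for i, j, k in triangles:
        a, b, c = vertices[i], vertices[j], vertices[k]
        det = triple_product(a, b, c)
        # Signed volume of the tetrahedron (origin, a, b, c).
        volume += det / 6.0
        # Integral of each coordinate over that tetrahedron.
        for axis in range(3):
            first[axis] += det * (a[axis] + b[axis] + c[axis]) / 24.0
    centroid = tuple(m / volume for m in first)
    return volume, centroid


# Unit right tetrahedron with outward-facing triangles.
VERTICES = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
TRIANGLES = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
```

Higher-order moments follow the same pattern with higher-degree polynomial integrals per tetrahedron, which is where the cost in the moment order N comes from.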
Fast parallel molecular algorithms for DNA-based computation: factoring integers.

Science.gov (United States)

Chang, Weng-Long; Guo, Minyi; Ho, Michael Shan-Hui

2005-06-01

The RSA public-key cryptosystem is an algorithm that converts input data to an unrecognizable encryption and converts the unrecognizable data back into its original decryption form. The security of the RSA public-key cryptosystem is based on the difficulty of factoring the product of two large prime numbers. This paper demonstrates how to factor the product of two large prime numbers, and is a breakthrough in basic biological operations using a molecular computer. In order to achieve this, we propose three DNA-based algorithms for parallel subtractor, parallel comparator, and parallel modular arithmetic that formally verify our designed molecular solutions for factoring the product of two large prime numbers. Furthermore, this work indicates that the cryptosystems using public-key are perhaps insecure and also presents clear evidence of the ability of molecular computing to perform complicated mathematical operations.
Electromagnetic Physics Models for Parallel Computing Architectures

Science.gov (United States)

Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

2016-10-01

The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.
Intranode data communications in a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

2014-01-07

Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.

Intranode data communications in a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Ratterman, Joseph D; Smith, Brian E

2013-07-23

Intranode data communications in a parallel computer that includes compute nodes configured to execute processes, where the data communications include: allocating, upon initialization of a first process of a compute node, a region of shared memory; establishing, by the first process, a predefined number of message buffers, each message buffer associated with a process to be initialized on the compute node; sending, to a second process on the same compute node, a data communications message without determining whether the second process has been initialized, including storing the data communications message in the message buffer of the second process; and upon initialization of the second process: retrieving, by the second process, a pointer to the second process's message buffer; and retrieving, by the second process from the second process's message buffer in dependence upon the pointer, the data communications message sent by the first process.
Algebraic mesh generation for large scale viscous-compressible aerodynamic simulation

International Nuclear Information System (INIS)

Smith, R.E.

1984-01-01

Viscous-compressible aerodynamic simulation is the numerical solution of the compressible Navier-Stokes equations and associated boundary conditions. Boundary-fitted coordinate systems are well suited for the application of finite difference techniques to the Navier-Stokes equations. An algebraic approach to boundary-fitted coordinate systems is one where an explicit functional relation describes a mesh on which a solution is obtained. This approach has the advantage of rapid-precise mesh control. The basic mathematical structure of three algebraic mesh generation techniques is described. They are transfinite interpolation, the multi-surface method, and the two-boundary technique. The Navier-Stokes equations are transformed to a computational coordinate system where boundary-fitted coordinates can be applied. Large-scale computation implies that there is a large number of mesh points in the coordinate system. Computation of viscous compressible flow using boundary-fitted coordinate systems and the application of this computational philosophy on a vector computer are presented
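Of the three techniques named above, transfinite interpolation is the easiest to show compactly: the interior mesh points are an explicit blend of the four boundary curves minus a corner correction. The sketch below uses the standard bilinear (linear blending function) form in 2D, which may differ from the variant used in the paper; the function name and the boundary-curve arguments are hypothetical.

```python
def transfinite_grid(bottom, top, left, right, n_xi, n_eta):
    """Bilinear transfinite interpolation of four boundary curves.

    Each curve maps a parameter in [0, 1] to an (x, y) point, and the curves
    are assumed to meet at the corners. Returns grid[j][i] = (x, y) for
    xi = i / (n_xi - 1) and eta = j / (n_eta - 1).
    """
    c00, c10 = bottom(0.0), bottom(1.0)
    c01, c11 = top(0.0), top(1.0)

    def blend(xi, eta):
        b, t = bottom(xi), top(xi)
        l, r = left(eta), right(eta)
        return tuple(
            (1 - eta) * b[k] + eta * t[k] + (1 - xi) * l[k] + xi * r[k]
            # Subtract the corner terms that the two blends counted twice.
            - ((1 - xi) * (1 - eta) * c00[k] + xi * (1 - eta) * c10[k]
               + (1 - xi) * eta * c01[k] + xi * eta * c11[k])
            for k in range(2)
        )

    return [[blend(i / (n_xi - 1), j / (n_eta - 1)) for i in range(n_xi)]
            for j in range(n_eta)]


# Unit square boundaries: the resulting mesh is the uniform Cartesian grid.
square = transfinite_grid(
    bottom=lambda s: (s, 0.0),
    top=lambda s: (s, 1.0),
    left=lambda s: (0.0, s),
    right=lambda s: (1.0, s),
    n_xi=3,
    n_eta=3,
)
```

Because every mesh point is given by a closed-form expression, no system of equations has to be solved, which is the source of the rapid mesh control the abstract refers to.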
Distributed-memory matrix computations

DEFF Research Database (Denmark)

Balle, Susanne Mølleskov

1995-01-01

The main goal of this project is to investigate, develop, and implement algorithms for numerical linear algebra on parallel computers in order to acquire expertise in methods for parallel computations. An important motivation for analyzing and investigating the potential for parallelism in these algorithms is that many scientific applications rely heavily on the performance of the involved dense linear algebra building blocks. Even though we consider the distributed-memory as well as the shared-memory programming paradigm, the major part of the thesis is dedicated to distributed-memory architectures. We emphasize distributed-memory massively parallel computers - such as the Connection Machines model CM-200 and model CM-5/CM-5E - available to us at UNI-C and at Thinking Machines Corporation. The CM-200 was at the time this project started one of the few existing massively parallel computers...
Adaptive mesh refinement for storm surge

KAUST Repository

Mandli, Kyle T.; Dawson, Clint N.

2014-01-01

An approach to utilizing adaptive mesh refinement algorithms for storm surge modeling is proposed. Currently numerical models exist that can resolve the details of coastal regions but are often too costly to be run in an ensemble forecasting framework without significant computing resources. The application of adaptive mesh refinement algorithms substantially lowers the computational cost of a storm surge model run while retaining much of the desired coastal resolution. The approach presented is implemented in the GeoClaw framework and compared to ADCIRC for Hurricane Ike along with observed tide gauge data and the computational cost of each model run. © 2014 Elsevier Ltd.
Adaptive mesh refinement for storm surge

KAUST Repository

Mandli, Kyle T.

2014-03-01

An approach to utilizing adaptive mesh refinement algorithms for storm surge modeling is proposed. Currently numerical models exist that can resolve the details of coastal regions but are often too costly to be run in an ensemble forecasting framework without significant computing resources. The application of adaptive mesh refinement algorithms substantially lowers the computational cost of a storm surge model run while retaining much of the desired coastal resolution. The approach presented is implemented in the GeoClaw framework and compared to ADCIRC for Hurricane Ike along with observed tide gauge data and the computational cost of each model run. © 2014 Elsevier Ltd.
Parallel transposition of sparse data structures

DEFF Research Database (Denmark)

Wang, Hao; Liu, Weifeng; Hou, Kaixi

2016-01-01

Many applications in computational sciences and social sciences exploit sparsity and connectivity of acquired data. Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel tr...... transposition in the latest vendor-supplied library on an Intel multicore CPU platform, and the MergeTrans approach achieves on average of 3.4-fold (up to 11.7-fold) speedup on an Intel Xeon Phi many-core processor....
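For readers unfamiliar with the operation, transposing a matrix stored in compressed sparse row (CSR) form amounts to a counting sort of the nonzeros by column index. The sketch below is a plain serial baseline given only to make the data movement concrete; it is not the MergeTrans approach mentioned in the abstract, and the function name is hypothetical.

```python
def csr_transpose(n_rows, n_cols, indptr, indices, data):
    """Transpose a CSR matrix; the result is the CSR form of the transpose."""
    # Count nonzeros per column, then prefix-sum into row pointers.
    t_indptr = [0] * (n_cols + 1)
    for j in indices:
        t_indptr[j + 1] += 1
    for j in range(n_cols):
        t_indptr[j + 1] += t_indptr[j]

    # Scatter each nonzero into the next free slot of its column.
    next_slot = t_indptr[:-1]  # slicing makes a copy
    t_indices = [0] * len(indices)
    t_data = [0] * len(data)
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            dst = next_slot[j]
            t_indices[dst] = i
            t_data[dst] = data[k]
            next_slot[j] += 1
    return t_indptr, t_indices, t_data


# The 2 x 3 matrix [[1, 0, 2], [0, 3, 0]] in CSR form.
T_INDPTR, T_INDICES, T_DATA = csr_transpose(
    2, 3, indptr=[0, 2, 3], indices=[0, 2, 1], data=[1, 2, 3]
)
```

The scatter step writes to data-dependent locations, which is the part that makes an efficient parallel version non-trivial.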
Fast Evaluation of Segmentation Quality with Parallel Computing

Directory of Open Access Journals (Sweden)

Henry Cruz

2017-01-01

In digital image processing and computer vision, a fairly frequent task is the performance comparison of different algorithms on enormous image databases. This task is usually time-consuming and tedious, such that any kind of tool to simplify this work is welcome. To achieve an efficient and more practical handling of a normally tedious evaluation, we implemented the automatic detection system, with the help of MATLAB®’s Parallel Computing Toolbox™. The key parts of the system have been parallelized to achieve simultaneous execution and analysis of segmentation algorithms on the one hand and the evaluation of detection accuracy for the nonforested regions, as a case study, on the other hand. As a positive side effect, CPU usage was reduced and processing time was significantly decreased by 68.54% compared to sequential processing (i.e., executing the system with each algorithm one by one).
Study and obtention of exact, and approximation, algorithms and heuristics for a mesh partitioning problem under memory constraints

International Nuclear Information System (INIS)

Morais, Sebastien

2016-01-01

In many scientific areas, the size and the complexity of numerical simulations lead to intensive use of massively parallel runs on High Performance Computing (HPC) architectures. Such computers consist of a set of processing units (PU) where memory is distributed. Distribution of simulation data is therefore crucial: it has to minimize the computation time of the simulation while ensuring that the data allocated to every PU can be locally stored in memory. For most of the numerical simulations, the physical and numerical data are based on a mesh. The computations are then performed at the cell level (for example within triangles and quadrilaterals in 2D, or within tetrahedrons and hexahedrons in 3D). More specifically, computing and memory cost can be associated to each cell. In our context, where the mathematical methods used are finite elements or finite volumes, the realization of the computations associated with a cell may require information carried by neighboring cells. The standard implementation relies on locally storing the useful data of this neighborhood on the PU, even if cells of this neighborhood are not locally computed. Such stored but non-computed cells are called ghost cells, and can have a significant impact on the memory consumption of a PU. The problem to solve is thus not only to partition a mesh into several parts by assigning each cell to one and only one part while minimizing the computational load assigned to each part. It is also necessary to take into account that the memory load of both the cells where the computations are performed and their neighbors has to fit into PU memory. This leads to partitioning the computations while the mesh is distributed with overlaps. Explicitly taking these data overlaps into account is the problem that we propose to study. (author)
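The memory constraint described above can be made concrete by counting, for one part of a partition, the owned cells plus the ghost cells it must store. This is a minimal sketch assuming a one-cell-deep neighborhood and a per-cell memory cost; the function name, the dictionary-based mesh representation, and the example mesh are hypothetical and do not come from the thesis.

```python
def part_memory(adjacency, cell_memory, part_of, part):
    """Memory load of one part: owned cells plus their ghost neighbors.

    adjacency   maps a cell to the cells it needs data from,
    cell_memory maps a cell to its memory cost,
    part_of     maps a cell to the part that computes it.
    """
    owned = {cell for cell, p in part_of.items() if p == part}
    # Ghost cells are stored locally but computed by another part.
    ghosts = {nb for cell in owned for nb in adjacency[cell]} - owned
    load = sum(cell_memory[cell] for cell in owned | ghosts)
    return load, owned, ghosts


# A chain of six cells split into two parts of three cells each.
ADJACENCY = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
CELL_MEMORY = {cell: 1 for cell in ADJACENCY}
PART_OF = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

LOAD, OWNED, GHOSTS = part_memory(ADJACENCY, CELL_MEMORY, PART_OF, part=0)
```

A partition is feasible in the sense of the abstract only if this load, and not merely the cost of the owned cells, fits in the memory of each processing unit.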
Parallel computation of rotating flows

DEFF Research Database (Denmark)

Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær

1999-01-01

This paper deals with the simulation of 3‐D rotating flows based on the velocity‐vorticity formulation of the Navier‐Stokes equations in cylindrical coordinates. The governing equations are discretized by a finite difference method. The solution is advanced to a new time level by a two‐step process...... is that of solving a singular, large, sparse, over‐determined linear system of equations, and the iterative method CGLS is applied for this purpose. We discuss some of the mathematical and numerical aspects of this procedure and report on the performance of our software on a wide range of parallel computers. Darbe...
A Deep Penetration Problem Calculation Using AETIUS: An Easy Modeling Discrete Ordinates Transport Code Using Unstructured Tetrahedral Mesh, Shared Memory Parallel

Science.gov (United States)

KIM, Jong Woon; LEE, Young-Ouk

2017-09-01

As computing power gets better and better, computer codes that use a deterministic method seem to be less useful than those using the Monte Carlo method. In addition, users do not like to think about space, angles, and energy discretization for deterministic codes. However, a deterministic method is still powerful in that we can obtain a solution of the flux throughout the problem, particularly when particles can barely penetrate, such as in a deep penetration problem with small detection volumes. Recently, a new state-of-the-art discrete-ordinates code, ATTILA, was developed and has been widely used in several applications. ATTILA provides the capabilities to solve geometrically complex 3-D transport problems by using an unstructured tetrahedral mesh. Since 2009, we have been developing our own code by benchmarking ATTILA. AETIUS is a discrete ordinates code that, like ATTILA, uses an unstructured tetrahedral mesh. For pre- and post-processing, Gmsh is used to generate an unstructured tetrahedral mesh by importing a CAD file (*.step) and to visualize the calculation results of AETIUS. Using a CAD tool, the geometry can be modeled very easily. In this paper, we give a brief overview of AETIUS and provide numerical results from both AETIUS and a Monte Carlo code, MCNP5, in a deep penetration problem with small detection volumes. The results demonstrate the effectiveness and efficiency of AETIUS for such calculations.
Establishing a group of endpoints in a parallel computer

Science.gov (United States)

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.; Xue, Hanhong

2016-02-02

A parallel computer executes a number of tasks; each task includes a number of endpoints, and the endpoints are configured to support collective operations. In such a parallel computer, establishing a group of endpoints includes: receiving a user specification of a set of endpoints included in a global collection of endpoints, where the user specification defines the set in accordance with a predefined virtual representation of the endpoints, the predefined virtual representation is a data structure setting forth an organization of tasks and endpoints included in the global collection of endpoints, and the user specification defines the set of endpoints without a user specification of a particular endpoint; and defining a group of endpoints in dependence upon the predefined virtual representation of the endpoints and the user specification.
Noise simulation in cone beam CT imaging with parallel computing

International Nuclear Information System (INIS)

Tu, S.-J.; Shaw, Chris C; Chen, Lingyun

2006-01-01

We developed a computer noise simulation model for cone beam computed tomography imaging using a general purpose PC cluster. This model uses a mono-energetic x-ray approximation and allows us to investigate three primary performance components, specifically quantum noise, detector blurring and additive system noise. A parallel random number generator based on the Weyl sequence was implemented in the noise simulation and a visualization technique was accordingly developed to validate the quality of the parallel random number generator. In our computer simulation model, three-dimensional (3D) phantoms were mathematically modelled and used to create 450 analytical projections, which were then sampled into digital image data. Quantum noise was simulated and added to the analytical projection image data, which were then filtered to incorporate flat panel detector blurring. Additive system noise was generated and added to form the final projection images. The Feldkamp algorithm was implemented and used to reconstruct the 3D images of the phantoms. A 24 dual-Xeon PC cluster was used to compute the projections and reconstructed images in parallel with each CPU processing 10 projection views for a total of 450 views. Based on this computer simulation system, simulated cone beam CT images were generated for various phantoms and technique settings. Noise power spectra for the flat panel x-ray detector and reconstructed images were then computed to characterize the noise properties. As an example among the potential applications of our noise simulation model, we showed that images of low contrast objects can be produced and used for image quality evaluation
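The abstract names a Weyl-sequence parallel random number generator but gives no details. The following is only a minimal sketch of the general idea, assuming the textbook Weyl sequence (the fractional part of n times an irrational number) and assuming that each worker is handed a disjoint block of indices. The multiplier `ALPHA` and the function names are illustrative choices, not taken from the paper.

```python
import math

ALPHA = math.sqrt(2.0) - 1.0  # assumed irrational multiplier; not from the paper


def weyl(n: int, alpha: float = ALPHA) -> float:
    """n-th term of the Weyl sequence: fractional part of n * alpha, in [0, 1)."""
    return (n * alpha) % 1.0


def worker_stream(rank: int, count: int, alpha: float = ALPHA) -> list[float]:
    """Block splitting: worker `rank` owns indices rank*count+1 .. (rank+1)*count."""
    start = rank * count + 1
    return [weyl(n, alpha) for n in range(start, start + count)]


def all_streams(workers: int, count: int) -> list[list[float]]:
    """Streams for every worker; concatenated, they reproduce the serial sequence."""
    return [worker_stream(rank, count) for rank in range(workers)]
```

Because every term depends only on its index, no worker needs state from another, which is what makes the sequence easy to split across CPUs. A bare Weyl sequence is equidistributed but strongly correlated from term to term, so a production generator would need more than this sketch.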
Energy-Efficient FPGA-Based Parallel Quasi-Stochastic Computing

Directory of Open Access Journals (Sweden)

Ramu Seva

2017-11-01

The high performance of FPGA (Field Programmable Gate Array) in image processing applications is justified by its flexible reconfigurability, its inherent parallel nature and the availability of a large amount of internal memories. Lately, the Stochastic Computing (SC) paradigm has been found to be significantly advantageous in certain application domains including image processing because of its lower hardware complexity and power consumption. However, its viability is deemed to be limited due to its serial bitstream processing and excessive run-time requirement for convergence. To address these issues, a novel approach is proposed in this work where an energy-efficient implementation of SC is accomplished by introducing fast-converging Quasi-Stochastic Number Generators (QSNGs) and parallel stochastic bitstream processing, which are well suited to leverage FPGA’s reconfigurability and abundant internal memory resources. The proposed approach has been tested on the Virtex-4 FPGA, and results have been compared with the serial and parallel implementations of conventional stochastic computation using the well-known SC edge detection and multiplication circuits. Results show that, with this approach, both execution time and power consumption are decreased by a factor of 3.5 for the edge detection circuit and 4.5 for the multiplication circuit.
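To illustrate why a low-discrepancy number source can shorten the bitstreams, here is a software sketch of the standard stochastic-computing multiplier (a bitwise AND of two streams). It assumes van der Corput sequences in bases 2 and 3 as a stand-in for the paper's QSNGs, whose actual construction the abstract does not describe. All function names are hypothetical.

```python
def van_der_corput(n: int, base: int) -> float:
    """Radical inverse of n in the given base: a low-discrepancy value in [0, 1)."""
    value, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        value += digit / denom
    return value


def bitstream(p: float, length: int, base: int) -> list[int]:
    """Encode probability p as bits: bit k is 1 when the k-th sequence value is below p."""
    return [1 if van_der_corput(k, base) < p else 0 for k in range(1, length + 1)]


def sc_multiply(pa: float, pb: float, length: int = 1024) -> float:
    """Stochastic multiply: AND two bitstreams, then count the fraction of ones."""
    a = bitstream(pa, length, base=2)
    b = bitstream(pb, length, base=3)  # a coprime base decorrelates the two streams
    return sum(x & y for x, y in zip(a, b)) / length
```

With 1024 bits, `sc_multiply(0.5, 0.25)` lands close to the exact product 0.125. A pseudo-random source of the same length would typically show a larger error, which is the convergence argument the abstract makes.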
Parallel processing of neutron transport in fuel assembly calculation

International Nuclear Information System (INIS)

Song, Jae Seung

1992-02-01

Group constants, which are used for reactor analyses by nodal method, are generated by fuel assembly calculations based on the neutron transport theory, since one or a quarter of the fuel assembly corresponds to a unit mesh in the current nodal calculation. The group constant calculation for a fuel assembly is performed through spectrum calculations, a two-dimensional fuel assembly calculation, and depletion calculations. The purpose of this study is to develop a parallel algorithm to be used in a parallel processor for the fuel assembly calculation and the depletion calculations of the group constant generation. A serial program, which solves the neutron integral transport equation using the transmission probability method and the linear depletion equation, was prepared and verified by a benchmark calculation. Small changes to the serial program were enough to parallelize the depletion calculation, which has inherent parallel characteristics. In the fuel assembly calculation, however, efficient parallelization is not simple because of the many coupling parameters in the calculation and the data communications among CPUs. In this study, the group distribution method is introduced for the parallel processing of the fuel assembly calculation to minimize the data communications. The parallel processing was performed on Quadputer with 4 CPUs operating in NURAD Lab. at KAIST. Efficiencies of 54.3 % and 78.0 % were obtained in the fuel assembly calculation and depletion calculation, respectively, which led to an overall speedup of about 2.5. As a result, it is concluded that the computing time consumed for the group constant generation can be easily reduced by parallel processing on the parallel computer with small size CPUs.
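The quoted efficiencies can be related to the overall speedup with the usual definition, efficiency = speedup / number of processors. The sketch below applies that definition to the numbers in the abstract. The way the two phases are weighted is an assumption, since the abstract does not give the serial time split, and the function names are illustrative.

```python
def speedup(efficiency: float, processors: int) -> float:
    """Speedup implied by a parallel efficiency on a given number of processors."""
    return efficiency * processors


def combined_speedup(serial_times: list[float], speedups: list[float]) -> float:
    """Overall speedup of phases run one after another (time-weighted harmonic mean)."""
    parallel_time = sum(t / s for t, s in zip(serial_times, speedups))
    return sum(serial_times) / parallel_time


assembly = speedup(0.543, 4)   # fuel assembly calculation, about 2.17
depletion = speedup(0.780, 4)  # depletion calculation, about 3.12
```

The reported overall speedup of about 2.5 lies between these two per-phase values, as it must for any split of serial time between the phases.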
Control of parallel-connected bidirectional AC-DC converters in stationary frame for microgrid application

DEFF Research Database (Denmark)

Lu, Xiaonan; Guerrero, Josep M.; Teodorescu, Remus

2011-01-01

With the penetration of renewable energy in modern power systems, microgrids have become a popular application worldwide. In this paper, parallel-connected bidirectional converters for AC and DC hybrid microgrid application are proposed as an efficient interface. To reach the goal of bidirectional...... power conversion, both rectifier and inverter modes are analyzed. In order to achieve high performance operation, a hierarchical control system is implemented. The control system is designed in stationary frame, with harmonic compensation in parallel and no coupled terms between axes. In this control...
Parallel Computing for Brain Simulation.

Science.gov (United States)

Pastur-Romay, L A; Porto-Pazos, A B; Cedron, F; Pazos, A

2017-01-01

The human brain is the most complex system in the known universe and is therefore one of the greatest mysteries. It provides human beings with extraordinary abilities. However, it is not yet understood how and why most of these abilities are produced. For decades, researchers have been trying to make computers reproduce these abilities, focusing both on understanding the nervous system and on processing data more efficiently than before. Their aim is to make computers process information similarly to the brain. Important technological developments and vast multidisciplinary projects have made it possible to create the first simulation with a number of neurons similar to that of a human brain. This paper presents an up-to-date review of the main research projects that are trying to simulate and/or emulate the human brain. They employ different types of computational models using parallel computing: digital models, analog models and hybrid models. This review includes the current applications of these works, as well as future trends. It is focused on various works that look for advanced progress in Neuroscience and still others which seek new discoveries in Computer Science (neuromorphic hardware, machine learning techniques). Their most outstanding characteristics are summarized and the latest advances and future plans are presented. In addition, this review points out the importance of considering not only neurons: computational models of the brain should also include glial cells, given the proven importance of astrocytes in information processing.
A hybrid method for the parallel computation of Green's functions

DEFF Research Database (Denmark)

Petersen, Dan Erik; Li, Song; Stokbro, Kurt

2009-01-01

of the large number of times this calculation needs to be performed, this is computationally very expensive even on supercomputers. The classical approach is based on recurrence formulas which cannot be efficiently parallelized. This practically prevents the solution of large problems with hundreds...... of thousands of atoms. We propose new recurrences for a general class of sparse matrices to calculate Green's and lesser Green's function matrices which extend formulas derived by Takahashi and others. We show that these recurrences may lead to a dramatically reduced computational cost because they only...... require computing a small number of entries of the inverse matrix. Then, we propose a parallelization strategy for block tridiagonal matrices which involves a combination of Schur complement calculations and cyclic reduction. It achieves good scalability even on problems of modest size....
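For orientation, the kind of classical recurrence the abstract refers to can be shown on a scalar symmetric tridiagonal matrix. It obtains only the diagonal of the inverse through one forward and one backward sweep. This is a textbook sketch, not the new recurrences or the parallel scheme proposed in the paper, and the function name is illustrative.

```python
def inverse_diagonal(diag: list[float], off: list[float]) -> list[float]:
    """Diagonal of the inverse of a symmetric tridiagonal matrix.

    diag[i] is A[i][i]; off[i] is A[i][i+1] == A[i+1][i].
    """
    n = len(diag)
    left = [0.0] * n   # Schur-complement correction from the rows above i
    right = [0.0] * n  # Schur-complement correction from the rows below i
    for i in range(1, n):  # forward sweep: each step needs the previous one
        left[i] = off[i - 1] ** 2 / (diag[i - 1] - left[i - 1])
    for i in range(n - 2, -1, -1):  # backward sweep, also strictly sequential
        right[i] = off[i] ** 2 / (diag[i + 1] - right[i + 1])
    return [1.0 / (diag[i] - left[i] - right[i]) for i in range(n)]
```

Each sweep step depends on the one before it, which shows in miniature why such recurrences are hard to parallelize and why a combination of Schur complements and cyclic reduction is attractive for the block case.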
Automatic mesh generation with QMESH program

International Nuclear Information System (INIS)

Ise, Takeharu; Tsutsui, Tsuneo

1977-05-01

Usage of the two-dimensional self-organizing mesh generation program, QMESH, is presented together with the descriptions and the experience, as it has recently been converted and reconstructed from the NEACPL version to the FACOM. The program package consists of the QMESH code to generate quadrilateral meshes with smoothing techniques, the QPLOT code to plot the data obtained from the QMESH on the graphic COM, and the RENUM code to renumber the meshes by using a bandwidth minimization procedure. The technique of mesh restructuring coupled with smoothing techniques is especially useful when one generates the meshes for computer codes based on the finite element method. Several typical examples are given for easy access to the QMESH program, which is registered in the R.B-disks of JAERI for users. (auth.)
Generation of hybrid meshes for the simulation of petroleum reservoirs; Generation de maillages hybrides pour la simulation de reservoirs petroliers

Energy Technology Data Exchange (ETDEWEB)

Balaven-Clermidy, S.

2001-12-01

Oil reservoir simulations study multiphase flows in porous media. These flows are described and evaluated through numerical schemes on a discretization of the reservoir domain. In this thesis, we were interested in this spatial discretization and a new kind of hybrid mesh has been proposed where the radial nature of flows in the vicinity of wells is directly taken into account in the geometry. Our modular approach describes wells and their drainage area through radial circular meshes. These well meshes are inserted in a structured reservoir mesh (a Corner Point Geometry mesh) made up of hexahedral cells. Finally, in order to generate a global conforming mesh, proper connections are realized between the different kinds of meshes through unstructured transition ones. To compute these transition meshes, which must be acceptable in terms of finite volume methods, an automatic method based on power diagrams has been developed. Our approach can deal with a homogeneous anisotropic medium and allows the user to insert vertical or horizontal wells as well as secondary faults in the reservoir mesh. Our work has been implemented, tested and validated in 2D and 2.5D. It can also be extended to 3D when the geometrical constraints are simplicial ones: points, segments and triangles. (author)
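A power diagram is a weighted Voronoi diagram: each site has a center and a weight, and a point belongs to the site with the smallest power distance (squared Euclidean distance minus the weight). The sketch below shows only this standard definition. How the thesis chooses sites and weights for the transition meshes is not stated in the abstract, and the function names are illustrative.

```python
def power_distance(point, center, weight):
    """Power distance: squared Euclidean distance to the center minus the site weight."""
    return sum((p - c) ** 2 for p, c in zip(point, center)) - weight


def power_cell(point, sites):
    """Index of the site whose power cell contains the point.

    sites is a list of (center, weight) pairs.
    """
    return min(range(len(sites)), key=lambda i: power_distance(point, *sites[i]))
```

With equal weights the cells reduce to ordinary Voronoi cells. Raising one site's weight moves the shared cell boundary away from that site while keeping it straight, which is what lets the diagram be tuned to match prescribed geometry.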
A parallel finite element procedure for contact-impact problems using edge-based smooth triangular element and GPU

Science.gov (United States)

Cai, Yong; Cui, Xiangyang; Li, Guangyao; Liu, Wenyang

2018-04-01

The edge-smooth finite element method (ES-FEM) can improve the computational accuracy of triangular shell elements and the mesh partition efficiency of complex models. In this paper, an approach is developed to perform explicit finite element simulations of contact-impact problems with a graphical processing unit (GPU) using a special edge-smooth triangular shell element based on ES-FEM. Of critical importance for this problem is achieving finer-grained parallelism to enable efficient data loading and to minimize communication between the device and host. Four kinds of parallel strategies are then developed to efficiently solve these ES-FEM based shell element formulas, and various optimization methods are adopted to ensure aligned memory access. Special focus is dedicated to developing an approach for the parallel construction of edge systems. A parallel hierarchy-territory contact-searching algorithm (HITA) and a parallel penalty function calculation method are embedded in this parallel explicit algorithm. Finally, the program flow is well designed, and a GPU-based simulation system is developed, using Nvidia's CUDA. Several numerical examples are presented to illustrate the high quality of the results obtained with the proposed methods. In addition, the GPU-based parallel computation is shown to significantly reduce the computing time.

Fast Simulation of Large-Scale Floods Based on GPU Parallel Computing

Directory of Open Access Journals (Sweden)

Qiang Liu

2018-05-01

Computing speed is a significant issue for large-scale flood simulations that support real-time response in disaster prevention and mitigation. Even today, most large-scale flood simulations are run on supercomputers due to the massive amounts of data and computations necessary. In this work, a two-dimensional shallow water model based on an unstructured Godunov-type finite volume scheme was proposed for flood simulation. To realize a fast simulation of large-scale floods on a personal computer, a Graphics Processing Unit (GPU)-based, high-performance computing method using the OpenACC application was adopted to parallelize the shallow water model. An unstructured data management method was presented to control the data transportation between the GPU and CPU (Central Processing Unit) with minimum overhead, and then both computation and data were offloaded from the CPU to the GPU, which exploited the computational capability of the GPU as much as possible. The parallel model was validated using various benchmarks and real-world case studies. The results demonstrate that speed-ups of up to one order of magnitude can be achieved in comparison with the serial model. The proposed parallel model provides a fast and reliable tool with which to quickly assess flood hazards in large-scale areas and, thus, has a bright application prospect for dynamic inundation risk identification and disaster assessment.
Reduced-Order Structure-Preserving Model for Parallel-Connected Three-Phase Grid-Tied Inverters

Energy Technology Data Exchange (ETDEWEB)

Johnson, Brian B [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Purba, Victor [University of Minnesota; Jafarpour, Saber [University of California Santa-Barbara; Bullo, Francesco [University of California Santa-Barbara; Dhople, Sairaj V. [University of Minnesota

2017-08-21

Next-generation power networks will contain large numbers of grid-connected inverters satisfying a significant fraction of system load. Since each inverter model has a relatively large number of dynamic states, it is impractical to analyze complex system models where the full dynamics of each inverter are retained. To address this challenge, we derive a reduced-order structure-preserving model for parallel-connected grid-tied three-phase inverters. Here, each inverter in the system is assumed to have a full-bridge topology, LCL filter at the point of common coupling, and the control architecture for each inverter includes a current controller, a power controller, and a phase-locked loop for grid synchronization. We outline a structure-preserving reduced-order inverter model with lumped parameters for the setting where the parallel inverters are each designed such that the filter components and controller gains scale linearly with the power rating. By structure preserving, we mean that the reduced-order three-phase inverter model is also composed of an LCL filter, a power controller, current controller, and PLL. We show that the system of parallel inverters can be modeled exactly as one aggregated inverter unit and this equivalent model has the same number of dynamical states as any individual inverter in the system. Numerical simulations validate the reduced-order model.
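The aggregation result can be made plausible with a one-component example. The sketch assumes that "scale linearly with the power rating" means each inverter's filter inductance is a base value divided by its rating factor, so that its admittance grows with rating. That reading, the symbol `l_base` and the function names are assumptions for illustration, not definitions from the paper.

```python
def parallel(values: list[float]) -> float:
    """Equivalent value of inductances (or resistances) connected in parallel."""
    return 1.0 / sum(1.0 / v for v in values)


def scaled_inductances(l_base: float, ratings: list[float]) -> list[float]:
    """Assumed scaling law: an inverter with rating factor k uses inductance l_base / k."""
    return [l_base / k for k in ratings]


ratings = [1.0, 2.0, 3.0]
aggregate = parallel(scaled_inductances(6e-3, ratings))  # equals 6e-3 / sum(ratings)
```

Under this scaling the parallel combination has the same form as a single inverter with rating factor `sum(ratings)`, which is consistent with the abstract's claim that the aggregate needs no more dynamical states than one unit.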
Nyx: Adaptive mesh, massively-parallel, cosmological simulation code

Science.gov (United States)

Almgren, Ann; Beckner, Vince; Friesen, Brian; Lukic, Zarija; Zhang, Weiqun

2017-12-01

The Nyx code solves equations of compressible hydrodynamics on an adaptive grid hierarchy coupled with an N-body treatment of dark matter. The gas dynamics in Nyx use a finite volume methodology on an adaptive set of 3-D Eulerian grids; dark matter is represented as discrete particles moving under the influence of gravity. Particles are evolved via a particle-mesh method, using a Cloud-in-Cell deposition/interpolation scheme. Both baryonic and dark matter contribute to the gravitational field. In addition, Nyx includes physics for accurately modeling the intergalactic medium; in optically thin limits and assuming ionization equilibrium, the code calculates heating and cooling processes of the primordial-composition gas in an ionizing ultraviolet background radiation field.
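Cloud-in-Cell deposition spreads each particle's mass linearly over the nearest grid cells. The following one-dimensional periodic sketch shows the standard scheme, assuming a uniform, cell-centered grid. It is not Nyx code, and the function name is illustrative.

```python
import math


def cic_deposit(positions, masses, n_cells, box_size):
    """Deposit particle masses onto a periodic 1-D cell-centered grid (density per cell)."""
    dx = box_size / n_cells
    density = [0.0] * n_cells
    for x, m in zip(positions, masses):
        s = x / dx - 0.5      # position in cell units, measured from the center of cell 0
        i = math.floor(s)     # index of the cell center at or to the left of the particle
        frac = s - i          # fractional distance toward the next cell center
        density[i % n_cells] += m * (1.0 - frac) / dx
        density[(i + 1) % n_cells] += m * frac / dx
    return density
```

A particle sitting on a cell center deposits all of its mass in that cell, and one on a cell face splits evenly between the two neighbors. The same weights are reused to interpolate the gravitational force back to the particle, which keeps deposition and interpolation consistent.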
In-cylinder diesel spray combustion simulations using parallel computation: A performance benchmarking study

International Nuclear Information System (INIS)

Pang, Kar Mun; Ng, Hoon Kiat; Gan, Suyin

2012-01-01

Highlights: ► A performance benchmarking exercise is conducted for diesel combustion simulations. ► The reduced chemical mechanism shows its advantages over base and skeletal models. ► High efficiency and great reduction of CPU runtime are achieved through 4-node solver. ► Increasing ISAT memory from 0.1 to 2 GB reduces the CPU runtime by almost 35%. ► Combustion and soot processes are predicted well with minimal computational cost. - Abstract: In the present study, in-cylinder diesel combustion simulation was performed with parallel processing on an Intel Xeon Quad-Core platform to allow both fluid dynamics and chemical kinetics of the surrogate diesel fuel model to be solved simultaneously on multiple processors. Here, Cartesian Z-Coordinate was selected as the most appropriate partitioning algorithm since it computationally bisects the domain such that the dynamic load associated with fuel particle tracking was evenly distributed during parallel computations. Other variables examined included number of compute nodes, chemistry sizes and in situ adaptive tabulation (ISAT) parameters. Based on the performance benchmarking test conducted, parallel configuration of 4-compute node was found to reduce the computational runtime most efficiently whereby a parallel efficiency of up to 75.4% was achieved. The simulation results also indicated that accuracy level was insensitive to the number of partitions or the partitioning algorithms. The effect of reducing the number of species on computational runtime was observed to be more significant than reducing the number of reactions. Besides, the study showed that an increase in the ISAT maximum storage of up to 2 GB reduced the computational runtime by 50%. Also, the ISAT error tolerance of 10⁻³ was chosen to strike a balance between results accuracy and computational runtime. The optimised parameters in parallel processing and ISAT, as well as the use of the in-house reduced chemistry model allowed accurate
Reduced-Order Structure-Preserving Model for Parallel-Connected Three-Phase Grid-Tied Inverters: Preprint

Energy Technology Data Exchange (ETDEWEB)

Johnson, Brian B [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Purba, Victor [University of Minnesota; Jafarpour, Saber [University of California, Santa Barbara; Bullo, Francesco [University of California, Santa Barbara; Dhople, Sairaj [University of Minnesota

2017-08-31

Given that next-generation infrastructures will contain large numbers of grid-connected inverters and these interfaces will be satisfying a growing fraction of system load, it is imperative to analyze the impacts of power electronics on such systems. However, since each inverter model has a relatively large number of dynamic states, it would be impractical to execute complex system models where the full dynamics of each inverter are retained. To address this challenge, we derive a reduced-order structure-preserving model for parallel-connected grid-tied three-phase inverters. Here, each inverter in the system is assumed to have a full-bridge topology, LCL filter at the point of common coupling, and the control architecture for each inverter includes a current controller, a power controller, and a phase-locked loop for grid synchronization. We outline a structure-preserving reduced-order inverter model for the setting where the parallel inverters are each designed such that the filter components and controller gains scale linearly with the power rating. By structure preserving, we mean that the reduced-order three-phase inverter model is also composed of an LCL filter, a power controller, current controller, and PLL. That is, we show that the system of parallel inverters can be modeled exactly as one aggregated inverter unit and this equivalent model has the same number of dynamical states as an individual inverter in the paralleled system. Numerical simulations validate the reduced-order models.
Teaching Scientific Computing: A Model-Centered Approach to Pipeline and Parallel Programming with C

Directory of Open Access Journals (Sweden)

Vladimiras Dolgopolovas

2015-01-01

The aim of this study is to present an approach to the introduction to pipeline and parallel computing, using a model of the multiphase queueing system. Pipeline computing, including software pipelines, is among the key concepts in modern computing and electronics engineering. The modern computer science and engineering education requires a comprehensive curriculum, so the introduction to pipeline and parallel computing is an essential topic to be included in the curriculum. At the same time, the topic is among the most motivating tasks due to the comprehensive multidisciplinary and technical requirements. To enhance the educational process, the paper proposes a novel model-centered framework and develops the relevant learning objects. It allows implementing an educational platform of constructivist learning process, thus enabling learners’ experimentation with the provided programming models, obtaining learners’ competences of the modern scientific research and computational thinking, and capturing the relevant technical knowledge. It also provides an integral platform that allows a simultaneous and comparative introduction to pipelining and parallel computing. The programming language C for developing programming models and message passing interface (MPI) and OpenMP parallelization tools have been chosen for implementation.
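A multiphase queueing system behaves like a software pipeline: a job can enter a phase only after it has left the previous phase and after the preceding job has left the current one. The sketch below encodes that recurrence, assuming unlimited buffers between phases and first-come-first-served order. It is written in Python for brevity, whereas the paper's own models use C with MPI and OpenMP, and the function name is illustrative.

```python
def departure_times(arrivals, service):
    """Departure time of every job from every phase of a tandem (multiphase) queue.

    arrivals[j] is the arrival time of job j; service[j][k] is its service time in phase k.
    """
    phases = len(service[0])
    prev = [0.0] * phases  # departure times of the previous job from each phase
    result = []
    for t_arrive, s_row in zip(arrivals, service):
        ready = t_arrive   # time at which this job is ready for the next phase
        current = []
        for k in range(phases):
            start = max(ready, prev[k])  # wait for the job itself and for the server
            ready = start + s_row[k]
            current.append(ready)
        result.append(current)
        prev = current
    return result
```

For two jobs arriving at times 0 and 1, each needing 2 time units in the first phase and 1 in the second, the second job waits for the first phase to free up and leaves the system at time 5. Because each entry depends only on its upper and left neighbors in the job-by-phase table, entries along an anti-diagonal are independent, which is the property a pipelined or parallel implementation can exploit.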
Computational acceleration for MR image reconstruction in partially parallel imaging.

Science.gov (United States)

Ye, Xiaojing; Chen, Yunmei; Huang, Feng

2011-05-01

In this paper, we present a fast numerical algorithm for solving total variation and ℓ1 (TVL1) based image reconstruction with application in partially parallel magnetic resonance imaging. Our algorithm uses a variable splitting method to reduce computational cost. Moreover, the Barzilai-Borwein step size selection method is adopted in our algorithm for much faster convergence. Experimental results on clinical partially parallel imaging data demonstrate that the proposed algorithm requires far fewer iterations and/or less computational cost than recently developed operator splitting and Bregman operator splitting methods, which can deal with a general sensing matrix in reconstruction framework, to get similar or even better quality of reconstructed images.
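The Barzilai-Borwein rule picks the step size from the last change in the iterate, s, and the last change in the gradient, y, as (s·s)/(s·y). The sketch below applies it to plain gradient descent on a smooth function, which is only an assumed stand-in for the paper's TVL1 splitting scheme. The function names and the toy quadratic are illustrative.

```python
def bb_gradient_descent(grad, x0, steps=50, alpha0=0.1):
    """Gradient descent with the Barzilai-Borwein (BB1) step size."""
    x = list(x0)
    g = grad(x)
    alpha = alpha0  # the first step has no history, so a fixed guess is used
    for _ in range(steps):
        x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]  # change in the iterate
        y = [a - b for a, b in zip(g_new, g)]  # change in the gradient
        sy = sum(a * b for a, b in zip(s, y))
        x, g = x_new, g_new
        if sy <= 1e-30:  # converged or no usable curvature information: stop
            break
        alpha = sum(a * a for a in s) / sy     # BB1 step: (s.s) / (s.y)
    return x


def toy_gradient(x):
    """Gradient of 0.5*x0**2 + 5*x1**2 - x0 - x1, whose minimizer is (1, 0.1)."""
    return [x[0] - 1.0, 10.0 * x[1] - 1.0]
```

Calling `bb_gradient_descent(toy_gradient, [0.0, 0.0])` reaches the minimizer in a handful of iterations without any line search. Avoiding a line search is the main attraction when each gradient evaluation is as costly as it is in image reconstruction.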
Electromagnetic Physics Models for Parallel Computing Architectures

International Nuclear Information System (INIS)

Amadio, G; Bianchini, C; Iope, R; Ananya, A; Apostolakis, J; Aurora, A; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S Y; Lima, G; Duhem, L

2016-01-01

The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well. (paper)
Three dimensional adaptive mesh refinement on a spherical shell for atmospheric models with lagrangian coordinates

Science.gov (United States)

Penner, Joyce E.; Andronova, Natalia; Oehmke, Robert C.; Brown, Jonathan; Stout, Quentin F.; Jablonowski, Christiane; van Leer, Bram; Powell, Kenneth G.; Herzog, Michael

2007-07-01

One of the most important advances needed in global climate models is the development of atmospheric General Circulation Models (GCMs) that can reliably treat convection. Such GCMs require high resolution in local convectively active regions, both in the horizontal and vertical directions. During previous research we have developed an Adaptive Mesh Refinement (AMR) dynamical core that can adapt its grid resolution horizontally. Our approach utilizes a finite volume numerical representation of the partial differential equations with floating Lagrangian vertical coordinates and requires resolving dynamical processes on small spatial scales. For the latter it uses a newly developed general-purpose library, which facilitates 3D block-structured AMR on spherical grids. The library manages neighbor information as the blocks adapt, and handles the parallel communication and load balancing, freeing the user to concentrate on the scientific modeling aspects of their code. In particular, this library defines and manages adaptive blocks on the sphere, provides user interfaces for interpolation routines and supports the communication and load-balancing aspects for parallel applications. We have successfully tested the library in a 2-D (longitude-latitude) implementation. During the past year, we have extended the library to treat adaptive mesh refinement in the vertical direction. Preliminary results are discussed. This research project is characterized by an interdisciplinary approach involving atmospheric science, computer science and mathematical/numerical aspects. The work is done in close collaboration between the Atmospheric Science, Computer Science and Aerospace Engineering Departments at the University of Michigan and NOAA GFDL.
Three dimensional adaptive mesh refinement on a spherical shell for atmospheric models with lagrangian coordinates

International Nuclear Information System (INIS)

Penner, Joyce E; Andronova, Natalia; Oehmke, Robert C; Brown, Jonathan; Stout, Quentin F; Jablonowski, Christiane; Leer, Bram van; Powell, Kenneth G; Herzog, Michael

2007-01-01

One of the most important advances needed in global climate models is the development of atmospheric General Circulation Models (GCMs) that can reliably treat convection. Such GCMs require high resolution in local convectively active regions, both in the horizontal and vertical directions. During previous research we have developed an Adaptive Mesh Refinement (AMR) dynamical core that can adapt its grid resolution horizontally. Our approach utilizes a finite volume numerical representation of the partial differential equations with floating Lagrangian vertical coordinates and requires resolving dynamical processes on small spatial scales. For the latter it uses a newly developed general-purpose library, which facilitates 3D block-structured AMR on spherical grids. The library manages neighbor information as the blocks adapt, and handles the parallel communication and load balancing, freeing the user to concentrate on the scientific modeling aspects of their code. In particular, this library defines and manages adaptive blocks on the sphere, provides user interfaces for interpolation routines and supports the communication and load-balancing aspects for parallel applications. We have successfully tested the library in a 2-D (longitude-latitude) implementation. During the past year, we have extended the library to treat adaptive mesh refinement in the vertical direction. Preliminary results are discussed. This research project is characterized by an interdisciplinary approach involving atmospheric science, computer science and mathematical/numerical aspects. The work is done in close collaboration between the Atmospheric Science, Computer Science and Aerospace Engineering Departments at the University of Michigan and NOAA GFDL
Connecting numeric models

International Nuclear Information System (INIS)

Caremoli, C.; Erhard, P.

1996-01-01

Computerized simulation uses calculation codes whose validation is reliable. Reactor simulators should take greater advantage of the latest computer technology, in particular in the field of parallel processing. Instead of creating more global simulation codes whose validation might be a problem, connecting several existing codes should be a promising solution. (D.L.). 3 figs
A virtual surgical training system that simulates cutting of soft tissue using a modified pre-computed elastic model.

Science.gov (United States)

Toe, Kyaw Kyar; Huang, Weimin; Yang, Tao; Duan, Yuping; Zhou, Jiayin; Su, Yi; Teo, Soo-Kng; Kumar, Selvaraj Senthil; Lim, Calvin Chi-Wan; Chui, Chee Kong; Chang, Stephen

2015-08-01

This work presents a surgical training system that incorporates cutting operation of soft tissue simulated based on a modified pre-computed linear elastic model in the Simulation Open Framework Architecture (SOFA) environment. A precomputed linear elastic model used for the simulation of soft tissue deformation involves computing the compliance matrix a priori based on the topological information of the mesh. While this process may require a few minutes to several hours, based on the number of vertices in the mesh, it needs only to be computed once and allows real-time computation of the subsequent soft tissue deformation. However, as the compliance matrix is based on the initial topology of the mesh, it does not allow any topological changes during simulation, such as cutting or tearing of the mesh. This work proposes a way to modify the pre-computed data by correcting the topological connectivity in the compliance matrix, without re-computing the compliance matrix which is computationally expensive.
Parallel, distributed and GPU computing technologies in single-particle electron microscopy.

Science.gov (United States)

Schmeisser, Martin; Heisen, Burkhard C; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

2009-07-01

Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today's technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined.
Managing internode data communications for an uninitialized process in a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E

2014-05-20

A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.
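The four steps listed in this abstract (receive, detect a full buffer, establish a temporary buffer, move the messages) can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the class names `MUMessageBuffer` and `ApplicationAgent`, the `capacity` parameter, and the use of ordinary Python containers to stand in for MU memory and main memory are all hypothetical and do not come from the patent.

```python
from collections import deque


class MUMessageBuffer:
    """Hypothetical stand-in for a fixed-size MU message buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.messages = deque()

    def is_full(self):
        return len(self.messages) >= self.capacity

    def receive(self, message):
        # Step 1: the MU receives a message into the buffer.
        if self.is_full():
            raise BufferError("MU message buffer is full")
        self.messages.append(message)


class ApplicationAgent:
    """Hypothetical agent that drains full buffers of uninitialized processes."""

    def __init__(self):
        # process id -> temporary buffer (assumed to live in main memory)
        self.temporary_buffers = {}

    def service(self, process_id, mu_buffer, process_initialized):
        # Step 2: act only if the buffer is full before initialization.
        if process_initialized or not mu_buffer.is_full():
            return False
        # Step 3: establish a temporary buffer for the process if needed.
        temporary = self.temporary_buffers.setdefault(process_id, [])
        # Step 4: move messages, preserving their arrival order.
        while mu_buffer.messages:
            temporary.append(mu_buffer.messages.popleft())
        return True
```

After `service` returns `True`, the MU buffer is empty again and can accept further messages for the still-uninitialized process.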
Optical Interconnection Via Computer-Generated Holograms

Science.gov (United States)

Liu, Hua-Kuang; Zhou, Shaomin

1995-01-01

Method of free-space optical interconnection developed for data-processing applications like parallel optical computing, neural-network computing, and switching in optical communication networks. In method, multiple optical connections between multiple sources of light in one array and multiple photodetectors in another array made via computer-generated holograms in electrically addressed spatial light modulators (ESLMs). Offers potential advantages of massive parallelism, high space-bandwidth product, high time-bandwidth product, low power consumption, low cross talk, and low time skew. Also offers advantage of programmability with flexibility of reconfiguration, including variation of strengths of optical connections in real time.
NeuroTessMesh: A Tool for the Generation and Visualization of Neuron Meshes and Adaptive On-the-Fly Refinement

Directory of Open Access Journals (Sweden)

Juan J. Garcia-Cantero

2017-06-01

Gaining a better understanding of the human brain continues to be one of the greatest challenges for science, largely because of the overwhelming complexity of the brain and the difficulty of analyzing the features and behavior of dense neural networks. Regarding analysis, 3D visualization has proven to be a useful tool for the evaluation of complex systems. However, the large number of neurons in non-trivial circuits, together with their intricate geometry, makes the visualization of a neuronal scenario an extremely challenging computational problem. Previous work in this area dealt with the generation of 3D polygonal meshes that approximated the cells’ overall anatomy but did not attempt to deal with the extremely high storage and computational cost required to manage a complex scene. This paper presents NeuroTessMesh, a tool specifically designed to cope with many of the problems associated with the visualization of neural circuits that are comprised of large numbers of cells. In addition, this method facilitates the recovery and visualization of the 3D geometry of cells included in databases, such as NeuroMorpho, and provides the tools needed to approximate missing information such as the soma’s morphology. This method takes as its only input the available compact, yet incomplete, morphological tracings of the cells as acquired by neuroscientists. It uses a multiresolution approach that combines an initial, coarse mesh generation with subsequent on-the-fly adaptive mesh refinement stages using tessellation shaders. For the coarse mesh generation, a novel approach, based on the Finite Element Method, allows approximation of the 3D shape of the soma from its incomplete description. Subsequently, the adaptive refinement process performed in the graphic card generates meshes that provide good visual quality geometries at a reasonable computational cost, both in terms of memory and rendering time. All the described techniques have been
Resonance Interaction of Multi-Parallel Grid-Connected Inverters with LCL Filter

DEFF Research Database (Denmark)

Lu, Minghui; Wang, Xiongfei; Loh, Poh Chiang

2017-01-01

This letter investigates the resonance characteristics and stability problem caused by the interactions of multiparallel LCL-filtered inverters. Compared to single grid-connected inverter, the multiinverter system presents a more challenging resonance issue, where the inverter interactions may...... excite multiple resonances at various frequencies. This letter proposes a modeling and analysis method based on the current separation scheme. It reveals that an interactive resonant current that circulates between the paralleled three-phase inverters may arise, depending on the current distribution...
Domain Immersion Technique And Free Surface Computations Applied To Extrusion And Mixing Processes

Science.gov (United States)

Valette, Rudy; Vergnes, Bruno; Basset, Olivier; Coupez, Thierry

2007-04-01

This work focuses on the development of numerical techniques devoted to the simulation of mixing processes of complex fluids such as twin-screw extrusion or batch mixing. In mixing process simulation, the absence of symmetry of the moving boundaries (the screws or the rotors) implies that their rigid body motion has to be taken into account by using a special treatment. We therefore use a mesh immersion technique (MIT), which consists in using a P1+/P1-based (MINI-element) mixed finite element method for solving the velocity-pressure problem and then solving the problem in the whole barrel cavity by imposing a rigid motion (rotation) on nodes located inside the so-called immersed domain, each subdomain (screw, rotor) being represented by a surface CAD mesh (or its mathematical equation in simple cases). The independent meshes are immersed into a unique background computational mesh by computing the distance function to their boundaries. Intersections of meshes are accounted for, which makes it possible to compute a fill factor usable as in the VOF methodology. This technique, combined with the use of parallel computing, makes it possible to compute the time-dependent flow of generalized Newtonian fluids, including yield stress fluids, in a complex system such as a twin screw extruder, including moving free surfaces, which are treated by a "level set" and Hamilton-Jacobi method.
An Implementation of Parallel and Networked Computing Schemes for the Real-Time Image Reconstruction Based on Electrical Tomography

International Nuclear Information System (INIS)

Park, Sook Hee

2001-02-01

This thesis implements and analyzes parallel and networked computing libraries based on the multiprocessor computer architecture as well as networked computers, aiming at improving the computation speed of the ET (Electrical Tomography) system, which requires enormous CPU time in reconstructing the unknown internal state of the target object. As an instance of typical tomography technology, ET partitions the cross-section of the target object into tiny elements and calculates their resistivity from signal values measured at the boundary electrodes surrounding the surface of the object after injecting a predetermined current pattern through the object. The number of elements is determined considering the trade-off between the accuracy of the reconstructed image and the computation time. As the elements become finer, the number of elements increases, and the system can produce a better image. However, the reconstruction time increases polynomially with the number of partitioned elements since the procedure consists of a number of time-consuming matrix operations such as multiplication, inverse, pseudo inverse, Jacobian and so on. Consequently, the demand for improving computation speed via multiple processors grows. Moreover, currently released PCs can be fitted with up to 4 CPUs interconnected to the shared memory, while some operating systems enable the application process to benefit from such a computer by allocating the threaded job to each CPU, resulting in concurrent processing. In addition, a networked computing or cluster computing environment is commonly available to almost every computer which contains a communication protocol and is connected to a local or global network. After partitioning the given job (numerical operation), each CPU or computer calculates the partial result independently, and the results are merged via common memory to produce the final result. It is desirable to adopt the commonly used library such as Matlab to
Connection machine: a computer architecture based on cellular automata

Energy Technology Data Exchange (ETDEWEB)

Hillis, W D

1984-01-01

This paper describes the connection machine, a programmable computer based on cellular automata. The essential idea behind the connection machine is that a regular locally-connected cellular array can be made to behave as if the processing cells are connected into any desired topology. When the topology of the machine is chosen to match the topology of the application program, the result is a fast, powerful computing engine. The connection machine was originally designed to implement knowledge retrieval operations in artificial intelligence programs, but the hardware and the programming techniques are apparently applicable to a much larger class of problems. A machine with 100000 processing cells is currently being constructed. 27 references.

AUTOMATIC MESH GENERATION OF 3-D GEOMETRIC MODELS

Institute of Scientific and Technical Information of China (English)

刘剑飞

2003-01-01

In this paper the presentation of the ball-packing method is reviewed, and a scheme to generate mesh for complex 3-D geometric models is given, which consists of 4 steps: (1) create nodes in 3-D models by ball-packing method, (2) connect nodes to generate mesh by 3-D Delaunay triangulation, (3) retrieve the boundary of the model after Delaunay triangulation, (4) improve the mesh.
Efficient Parallel Engineering Computing on Linux Workstations

Science.gov (United States)

Lou, John Z.

2010-01-01

A C software module has been developed that creates lightweight processes (LWPs) dynamically to achieve parallel computing performance in a variety of engineering simulation and analysis applications to support NASA and DoD project tasks. The required interface between the module and the application it supports is simple, minimal and almost completely transparent to the user applications, and it can achieve nearly ideal computing speed-up on multi-CPU engineering workstations of all operating system platforms. The module can be integrated into an existing application (C, C++, Fortran and others) either as part of a compiled module or as a dynamically linked library (DLL).
DVS-SOFTWARE: An Effective Tool for Applying Highly Parallelized Hardware To Computational Geophysics

Science.gov (United States)

Herrera, I.; Herrera, G. S.

2015-12-01

Most geophysical systems are macroscopic physical systems. The behavior prediction of such systems is carried out by means of computational models whose basic models are partial differential equations (PDEs) [1]. Due to the enormous size of the discretized version of such PDEs it is necessary to apply highly parallelized super-computers. For them, at present, the most efficient software is based on non-overlapping domain decomposition methods (DDM). However, a limiting feature of the present state-of-the-art techniques is due to the kind of discretizations used in them. Recently, I. Herrera and co-workers using 'non-overlapping discretizations' have produced the DVS-Software which overcomes this limitation [2]. The DVS-software can be applied to a great variety of geophysical problems and achieves very high parallel efficiencies (90%, or so [3]). It is therefore very suitable for effectively applying the most advanced parallel supercomputers available at present. In a parallel talk, in this AGU Fall Meeting, Graciela Herrera Z. will present how this software is being applied to advance MOD-FLOW. Key Words: Parallel Software for Geophysics, High Performance Computing, HPC, Parallel Computing, Domain Decomposition Methods (DDM). REFERENCES: [1] Herrera, Ismael and George F. Pinder, "Mathematical Modelling in Science and Engineering: An axiomatic approach", John Wiley, 243p., 2012. [2] Herrera, I., de la Cruz L.M. and Rosas-Medina A., "Non Overlapping Discretization Methods for Partial Differential Equations", NUMER METH PART D E, 30: 1427-1454, 2014, DOI 10.1002/num 21852. (Open source) [3] Herrera, I., & Contreras Iván, "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity", Geofísica Internacional, 2015 (In press)
Computational performance of Free Mesh Method applied to continuum mechanics problems

Science.gov (United States)

YAGAWA, Genki

2011-01-01

The free mesh method (FMM) is a kind of meshless method intended for particle-like finite element analysis of problems that are difficult to handle using global mesh generation, or a node-based finite element method that employs a local mesh generation technique and a node-by-node algorithm. The aim of the present paper is to review some unique numerical solutions of fluid and solid mechanics obtained by employing FMM as well as the Enriched Free Mesh Method (EFMM), which is a new version of FMM, including compressible flow and the sounding mechanism in air-reed instruments as applications to fluid mechanics, and automatic remeshing for slow crack growth, dynamic behavior of solids as well as large-scale eigenfrequency of an engine block as applications to solid mechanics. PMID: 21558753
Applications of parallel computer architectures to the real-time simulation of nuclear power systems

International Nuclear Information System (INIS)

Doster, J.M.; Sills, E.D.

1988-01-01

In this paper the authors report on efforts to utilize parallel computer architectures for the thermal-hydraulic simulation of nuclear power systems and current research efforts toward the development of advanced reactor operator aids and control systems based on this new technology. Many aspects of reactor thermal-hydraulic calculations are inherently parallel, and the computationally intensive portions of these calculations can be effectively implemented on modern computers. Timing studies indicate faster-than-real-time, high-fidelity physics models can be developed when the computational algorithms are designed to take advantage of the computer's architecture. These capabilities allow for the development of novel control systems and advanced reactor operator aids. Coupled with an integral real-time data acquisition system, evolving parallel computer architectures can provide operators and control room designers improved control and protection capabilities. Research efforts are currently under way in this area
Toward An Unstructured Mesh Database

Science.gov (United States)

Rezaei Mahdiraji, Alireza; Baumann, Peter

2014-05-01

Unstructured meshes are used in several application domains such as earth sciences (e.g., seismology), medicine, oceanography, climate modeling, and GIS as approximate representations of physical objects. Meshes subdivide a domain into smaller geometric elements (called cells) which are glued together by incidence relationships. The subdivision of a domain allows computational manipulation of complicated physical structures. For instance, seismologists model earthquakes using elastic wave propagation solvers on hexahedral meshes. The hexahedral mesh contains several hundred million grid points and millions of hexahedral cells. Each vertex node in the hexahedra stores a multitude of data fields. To run simulations on such meshes, one needs to iterate over all the cells, iterate over incident cells to a given cell, retrieve coordinates of cells, assign data values to cells, etc. Although meshes are used in many application domains, to the best of our knowledge there is no database vendor that supports unstructured mesh features. Currently, the main tools for querying and manipulating unstructured meshes are mesh libraries, e.g., CGAL and GRAL. Mesh libraries are dedicated libraries which include mesh algorithms and can be run on mesh representations. The libraries do not scale with dataset size, do not have a declarative query language, and need deep C++ knowledge for query implementations. Furthermore, due to high coupling between the implementations and input file structure, the implementations are less reusable and costly to maintain. A dedicated mesh database offers the following advantages: 1) declarative querying, 2) ease of maintenance, 3) hiding mesh storage structure from applications, and 4) transparent query optimization. To design a mesh database, the first challenge is to define a suitable generic data model for unstructured meshes. We proposed ImG-Complexes data model as a generic topological mesh data model which extends incidence graph model to multi
Procedure for the automatic mesh generation of innovative gear teeth

Directory of Open Access Journals (Sweden)

Radicella Andrea Chiaramonte

2016-01-01

After having described gear wheels with teeth having the two sides constituted by different involutes and their importance in engineering applications, we stress the need for an efficient procedure for the automatic mesh generation of innovative gear teeth. First, we describe the procedure for the subdivision of the tooth profile in the various possible cases, then we show the method for creating the subdivision mesh, defined by two series of curves called meridians and parallels. Finally, we describe how the above procedure for automatic mesh generation is able to solve specific cases that may arise when dealing with teeth having the two sides constituted by different involutes.
HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation

Science.gov (United States)

Sterling, Thomas; Bergman, Larry

2000-01-01

Computational Aero Sciences and other numeric intensive computation disciplines demand computing throughputs substantially greater than the Teraflops scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that in combination with sufficient resolution and advanced adaptive techniques may force performance requirements towards Petaflops. This will be especially true for compute intensive models such as Navier-Stokes or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithms techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA-led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption. The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) and one percent of the power required by convention
Computation and parallel implementation for early vision

Science.gov (United States)

Gualtieri, J. Anthony

1990-01-01

The problem of early vision is to transform one or more retinal illuminance images (pixel arrays) into image representations built out of primitive visual features such as edges, regions, disparities, and clusters. These transformed representations form the input to later vision stages that perform higher level vision tasks including matching and recognition. Researchers developed algorithms for: (1) edge finding in the scale space formulation; (2) correlation methods for computing matches between pairs of images; and (3) clustering of data by neural networks. These algorithms are formulated for parallel implementation on SIMD machines, such as the Massively Parallel Processor, a 128 x 128 array processor with 1024 bits of local memory per processor. For some cases, researchers can show speedups of three orders of magnitude over serial implementations.
Parallel, distributed and GPU computing technologies in single-particle electron microscopy

International Nuclear Information System (INIS)

Schmeisser, Martin; Heisen, Burkhard C.; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

2009-01-01

An introduction to the current paradigm shift towards concurrency in software. Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today’s technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined
Shear Alignment of Diblock Copolymers for Patterning Nanowire Meshes

Energy Technology Data Exchange (ETDEWEB)

Gustafson, Kyle T. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2016-09-08

Metallic nanowire meshes are useful as cheap, flexible alternatives to indium tin oxide – an expensive, brittle material used in transparent conductive electrodes. We have fabricated nanowire meshes over areas up to 2.5 cm² by: 1) mechanically aligning parallel rows of diblock copolymer (diBCP) microdomains; 2) selectively infiltrating those domains with metallic ions; 3) etching away the diBCP template; 4) sintering to reduce ions to metal nanowires; and, 5) repeating steps 1 – 4 on the same sample at a 90° offset. We aligned parallel rows of polystyrene-b-poly(2-vinylpyridine) [PS(48.5 kDa)-b-P2VP(14.5 kDa)] microdomains by heating above its glass transition temperature (T_g ≈ 100°C), applying mechanical shear pressure (33 kPa) and normal force (13.7 N), and cooling below T_g. DiBCP samples were submerged in aqueous solutions of metallic ions (15 – 40 mM ions; 0.1 – 0.5 M HCl) for 30 – 90 minutes, which coordinate to nitrogen in P2VP. Subsequent ozone-etching and sintering steps yielded parallel nanowires. We aimed to optimize alignment parameters (e.g. shear and normal pressures, alignment duration, and PDMS thickness) to improve the quality, reproducibility, and scalability of meshes. We also investigated metals other than Pt and Au that may be patterned using this technique (Cu, Ag).
Characterization of the mechanism of drug-drug interactions from PubMed using MeSH terms.

Directory of Open Access Journals (Sweden)

Yin Lu

Identifying drug-drug interactions (DDIs) is an important topic for the development of safe pharmaceutical drugs and for the optimization of multidrug regimens for complex diseases such as cancer and HIV. There have been about 150,000 publications on DDIs in PubMed, which is a great resource for DDI studies. In this paper, we introduced an automatic computational method for the systematic analysis of the mechanism of DDIs using MeSH (Medical Subject Headings) terms from PubMed literature. MeSH is a controlled vocabulary thesaurus developed by the National Library of Medicine for indexing and annotating articles. Our method can effectively identify DDI-relevant MeSH terms such as drugs, proteins and phenomena with high accuracy. The connections among these MeSH terms were investigated by using co-occurrence heatmaps and social network analysis. Our approach can be used to visualize relationships of DDI terms, which has the potential to help users better understand DDIs. As the volume of PubMed records increases, our method for automatic analysis of DDIs from the PubMed database will become more accurate.
6th International Meshing Roundtable '97

Energy Technology Data Exchange (ETDEWEB)

White, D.

1997-09-01

The goal of the 6th International Meshing Roundtable is to bring together researchers and developers from industry, academia, and government labs in a stimulating, open environment for the exchange of technical information related to the meshing process. In the past, the Roundtable has enjoyed significant participation from each of these groups from a wide variety of countries. The Roundtable will consist of technical presentations from contributed papers and abstracts, two invited speakers, and two invited panels of experts discussing topics related to the development and use of automatic mesh generation tools. In addition, this year we will feature a "Bring Your Best Mesh" competition and poster session to encourage discussion and participation from a wide variety of mesh generation tool users. The schedule and evening social events are designed to provide numerous opportunities for informal dialog. A proceedings will be published by Sandia National Laboratories and distributed at the Roundtable. In addition, papers of exceptionally high quality will be submitted to a special issue of the International Journal of Computational Geometry and Applications. Papers and one page abstracts were sought that present original results on the meshing process. Potential topics include but are not limited to: unstructured triangular and tetrahedral mesh generation; unstructured quadrilateral and hexahedral mesh generation; automated blocking and structured mesh generation; mixed element meshing; surface mesh generation; geometry decomposition and clean-up techniques; geometry modification techniques related to meshing; adaptive mesh refinement and mesh quality control; mesh visualization; special purpose meshing algorithms for particular applications; theoretical or novel ideas with practical potential; technical presentations from industrial researchers.
Semi-coarsening multigrid methods for parallel computing

Energy Technology Data Exchange (ETDEWEB)

Jones, J.E.

1996-12-31

Standard multigrid methods are not well suited for problems with anisotropic coefficients which can occur, for example, on grids that are stretched to resolve a boundary layer. There are several different modifications of the standard multigrid algorithm that yield efficient methods for anisotropic problems. In the paper, we investigate the parallel performance of these multigrid algorithms. Multigrid algorithms which work well for anisotropic problems are based on line relaxation and/or semi-coarsening. In semi-coarsening multigrid algorithms a grid is coarsened in only one of the coordinate directions unlike standard or full-coarsening multigrid algorithms where a grid is coarsened in each of the coordinate directions. When both semi-coarsening and line relaxation are used, the resulting multigrid algorithm is robust and automatic in that it requires no knowledge of the nature of the anisotropy. This is the basic multigrid algorithm whose parallel performance we investigate in the paper. The algorithm is currently being implemented on an IBM SP2 and its performance is being analyzed. In addition to looking at the parallel performance of the basic semi-coarsening algorithm, we present algorithmic modifications with potentially better parallel efficiency. One modification reduces the amount of computational work done in relaxation at the expense of using multiple coarse grids. This modification is also being implemented with the aim of comparing its performance to that of the basic semi-coarsening algorithm.
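The contrast drawn above between full coarsening (every coordinate direction) and semi-coarsening (only one direction) can be illustrated with a short Python sketch. It uses plain injection, that is, keeping every other grid line, as the coarsening rule. That choice, along with the function names `full_coarsen` and `semi_coarsen`, is an assumption made for illustration only; the abstract does not specify the transfer operators the paper uses.

```python
def grid_shape(grid):
    """Return (rows, columns) of a rectangular list-of-lists grid."""
    return (len(grid), len(grid[0]))


def full_coarsen(grid):
    """Coarsen in both coordinate directions (keep every other row and column)."""
    return [row[::2] for row in grid[::2]]


def semi_coarsen(grid, axis):
    """Coarsen in one coordinate direction only.

    axis=0 keeps every other row; axis=1 keeps every other column.
    """
    if axis == 0:
        return [list(row) for row in grid[::2]]
    if axis == 1:
        return [row[::2] for row in grid]
    raise ValueError("axis must be 0 or 1 for a 2D grid")


# A 9 x 9 fine grid whose entries encode their own (row, column) position.
fine = [[9 * i + j for j in range(9)] for i in range(9)]
coarse_full = full_coarsen(fine)          # 5 x 5 grid
coarse_semi = semi_coarsen(fine, axis=0)  # 5 x 9 grid
```

Semi-coarsening leaves the resolution in the uncoarsened direction untouched on every level, so a coarse grid carries more points than under full coarsening. In the algorithm the abstract describes, semi-coarsening is used together with line relaxation.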
Design and implementation of a topology control scheme for wireless mesh networks

CSIR Research Space (South Africa)

Mudali, P

2009-09-01

The Wireless Mesh Network (WMN) backbone is usually comprised of stationary nodes but the transient nature of wireless links results in changing network topologies. Topology Control (TC) aims to preserve network connectivity in ad hoc and mesh...
A parallel algorithm for 3D dislocation dynamics

International Nuclear Information System (INIS)

Wang Zhiqiang; Ghoniem, Nasr; Swaminarayan, Sriram; LeSar, Richard

2006-01-01

Dislocation dynamics (DD), a discrete dynamic simulation method in which dislocations are the fundamental entities, is a powerful tool for investigation of plasticity, deformation and fracture of materials at the micron length scale. However, severe computational difficulties arising from complex, long-range interactions between these curvilinear line defects limit the application of DD in the study of large-scale plastic deformation. We present here the development of a parallel algorithm for accelerated computer simulations of DD. By representing dislocations as a 3D set of dislocation particles, we show here that the problem of an interacting ensemble of dislocations can be converted to a problem of a particle ensemble, interacting with a long-range force field. A grid using binary space partitioning is constructed to keep track of node connectivity across domains. We demonstrate the computational efficiency of the parallel micro-plasticity code and discuss how O(N) methods map naturally onto the parallel data structure. Finally, we present results from applications of the parallel code to deformation in single crystal fcc metals
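To make the idea of a binary space partitioning of dislocation particles more concrete, the following Python sketch recursively splits a set of 3D points into spatial groups. The abstract does not say how the splitting planes are chosen, so the median split that cycles through the x, y and z axes is an assumption, and the function name `bsp_partition` and the parameter `max_leaf_size` are hypothetical.

```python
def bsp_partition(points, max_leaf_size, depth=0):
    """Recursively split 3D points into leaves of at most max_leaf_size points.

    Assumed rule: split at the median along an axis that cycles x, y, z
    with the recursion depth.
    """
    if max_leaf_size < 1:
        raise ValueError("max_leaf_size must be at least 1")
    if len(points) <= max_leaf_size:
        return [list(points)]
    axis = depth % 3
    ordered = sorted(points, key=lambda p: p[axis])
    middle = len(ordered) // 2
    # Each half is non-empty and strictly smaller, so the recursion terminates.
    lower = bsp_partition(ordered[:middle], max_leaf_size, depth + 1)
    upper = bsp_partition(ordered[middle:], max_leaf_size, depth + 1)
    return lower + upper


# Eight particle positions at the corners of a unit cube.
particles = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
domains = bsp_partition(particles, max_leaf_size=2)
```

Each leaf could then be assigned to one processor, so that nearby particles tend to share a domain. How the paper tracks node connectivity across domains is not covered by this sketch.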
Finite element meshing approached as a global minimization process

Energy Technology Data Exchange (ETDEWEB)

WITKOWSKI,WALTER R.; JUNG,JOSEPH; DOHRMANN,CLARK R.; LEUNG,VITUS J.

2000-03-01

The ability to generate a suitable finite element mesh in an automatic fashion is becoming the key to being able to automate the entire engineering analysis process. However, placing an all-hexahedron mesh in a general three-dimensional body continues to be an elusive goal. The approach investigated in this research is fundamentally different from any other known to the authors. A physical analogy is used to formulate the meshing problem, which yields a global mathematical description of the problem. The analogy used was that of minimizing the electrical potential of a system of charged particles within a charged domain. The particles in the presented analogy represent duals to mesh elements (i.e., quads or hexes). Particle movement is governed by a mathematical functional which accounts for inter-particle repulsive, attractive and alignment forces. This functional is minimized to find the optimal location and orientation of each particle. After the particles are connected, a mesh can be easily resolved. The mathematical description for this problem is as easy to formulate in three dimensions as it is in two or one dimensions. The meshing algorithm was developed within CoMeT. It can solve the two-dimensional meshing problem for convex and concave geometries in a purely automated fashion. Investigation of the robustness of the technique has shown a success rate of approximately 99% for the two-dimensional geometries tested. Run times to mesh a 100 element complex geometry were typically in the 10 minute range. Efficiency of the technique is still an issue that needs to be addressed. Performance is critical for most engineers generating meshes, but it was not for this project: the primary focus of this work was to investigate and evaluate a meshing algorithm/philosophy, with efficiency issues being secondary. The algorithm was also extended to mesh three-dimensional geometries. Unfortunately, only simple geometries were tested.
Anisotropic mesh adaptation for marine ice-sheet modelling

Science.gov (United States)

Gillet-Chaulet, Fabien; Tavard, Laure; Merino, Nacho; Peyaud, Vincent; Brondex, Julien; Durand, Gael; Gagliardini, Olivier

2017-04-01

Improving forecasts of the contribution of ice sheets to sea-level rise requires, among other things, correctly modelling the dynamics of the grounding line (GL), i.e. the line where the ice detaches from its underlying bed and goes afloat on the ocean. Many numerical studies, including the intercomparison exercises MISMIP and MISMIP3D, have shown that grid refinement in the GL vicinity is a key component to obtain reliable results. Improving model accuracy while keeping the computational cost affordable has therefore been an important target for the development of marine ice-sheet models. Adaptive mesh refinement (AMR) is a method where the accuracy of the solution is controlled by spatially adapting the mesh size. It has become popular in models using the finite element method as they naturally deal with unstructured meshes, but block-structured AMR has also been successfully applied to model GL dynamics. The main difficulty with AMR is to find efficient and reliable estimators of the numerical error to control the mesh size. Here, we use the estimator proposed by Frey and Alauzet (2015). Based on the interpolation error, it has been found effective in practice to control the numerical error, and has some flexibility, such as its ability to combine metrics for different variables, that makes it attractive. Routines to compute the anisotropic metric defining the mesh size have been implemented in the finite element ice flow model Elmer/Ice (Gagliardini et al., 2013). The mesh adaptation is performed using the freely available library MMG (Dapogny et al., 2014) called from Elmer/Ice. Using a setup based on the inter-comparison exercise MISMIP+ (Asay-Davis et al., 2016), we study the accuracy of the solution when the mesh is adapted using various variables (ice thickness, velocity, basal drag, …). We show that combining these variables reduces the number of mesh nodes by more than one order of magnitude, for the same numerical accuracy, when compared to uniform mesh
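The idea of controlling mesh size through the interpolation error can be sketched in one dimension, where piecewise-linear interpolation on an element of length h has error bounded by h^2 |u''| / 8. The sketch below is only a 1D analogue under that bound, not the anisotropic estimator used in the abstract and not the Elmer/Ice or MMG interface; the names `target_size` and `combined_size`, the bounds `h_min` and `h_max`, and the rule of taking the smallest size when combining variables are assumptions.

```python
import math


def target_size(d2u, eps, h_min=1e-3, h_max=1.0):
    """Element size keeping the 1D linear interpolation error below eps.

    Uses the bound error <= h**2 * |u''| / 8, so h = sqrt(8 * eps / |u''|),
    clamped to [h_min, h_max] (both bounds are illustrative assumptions).
    """
    curvature = abs(d2u)
    if curvature == 0.0:
        return h_max  # a locally linear field needs no refinement
    h = math.sqrt(8.0 * eps / curvature)
    return min(h_max, max(h_min, h))


def combined_size(second_derivatives, tolerances):
    """Combine several variables by keeping the most restrictive size.

    In 1D, intersecting the size requirements of several fields reduces
    to taking the minimum (an assumption standing in for metric
    intersection in higher dimensions).
    """
    return min(target_size(d, e) for d, e in zip(second_derivatives, tolerances))


# Two hypothetical fields at one point: one strongly curved, one nearly flat.
print(target_size(800.0, 1e-2))
print(combined_size([800.0, 0.5], [1e-2, 1e-2]))
```

Strong curvature, as expected near a grounding line, drives the size down, while smooth regions stay at the coarse bound, which is how an adapted mesh can use far fewer nodes than a uniform one.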
Fast parallel algorithms that compute transitive closure of a fuzzy relation

Science.gov (United States)

Kreinovich, Vladik YA.

1993-01-01

The notion of a transitive closure of a fuzzy relation is very useful for clustering in pattern recognition, for fuzzy databases, etc. The original algorithm proposed by L. Zadeh (1971) requires the computation time O(n^4), where n is the number of elements in the relation. In 1974, J. C. Dunn proposed an O(n^2) algorithm. Since we must compute n(n-1)/2 different values s(a, b) (a ≠ b) that represent the fuzzy relation, and we need at least one computational step to compute each of these values, we cannot compute all of them in less than O(n^2) steps. So, Dunn's algorithm is in this sense optimal. For small n, this is acceptable. However, for big n (e.g., for big databases), it is still a lot, so it would be desirable to decrease the computation time (this problem was formulated by J. Bezdek). Since this decrease cannot be done on a sequential computer, the only way to do it is to use a computer with several processors working in parallel. We show that on a parallel computer, transitive closure can be computed in time O((log_2 n)^2).
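One common way to see where a bound of the form O((log_2 n)^2) can come from is repeated squaring under max-min composition: about log_2 n squarings are enough, and each entry of a squaring is a maximum over n terms that a parallel machine can reduce in O(log n) steps. The abstract does not say which construction the author uses, so the sketch below is an assumption and runs sequentially with NumPy; the function names and the choice to set the diagonal to 1 (reflexivity) are illustrative.

```python
import math

import numpy as np


def maxmin_compose(a, b):
    """Max-min composition: c[i, j] = max over k of min(a[i, k], b[k, j])."""
    # Broadcast to an (i, k, j) cube, then reduce over the middle index k.
    return np.max(np.minimum(a[:, :, None], b[None, :, :]), axis=1)


def fuzzy_transitive_closure(relation):
    """Max-min transitive closure by repeated squaring.

    The relation is made reflexive first (assumption: s(a, a) = 1), so the
    k-th squaring accounts for all chains of up to 2**k links.
    """
    r = np.asarray(relation, dtype=float)
    n = r.shape[0]
    closure = np.maximum(r, np.eye(n))
    steps = math.ceil(math.log2(n)) if n > 1 else 0
    for _ in range(steps):
        closure = maxmin_compose(closure, closure)
    return closure


# Hypothetical similarity values s(a, b) for three elements.
s = np.array([
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.6],
    [0.2, 0.6, 1.0],
])
print(fuzzy_transitive_closure(s))
```

In this example the direct similarity 0.2 between the first and third elements is raised to 0.6, the strength of the chain through the second element.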
Soft Computing Methods for Disulfide Connectivity Prediction.

Science.gov (United States)

Márquez-Chamorro, Alfonso E; Aguilar-Ruiz, Jesús S

2015-01-01

The problem of protein structure prediction (PSP) is one of the main challenges in structural bioinformatics. To tackle this problem, PSP can be divided into several subproblems. One of these subproblems is the prediction of disulfide bonds. The disulfide connectivity prediction problem consists of identifying which nonadjacent cysteines would be cross-linked from all possible candidates. Determining the disulfide bond connectivity between the cysteines of a protein is desirable as a preliminary step to 3D PSP, because it greatly reduces the protein conformational search space. This paper summarizes the most representative soft computing approaches of the last decade for the disulfide connectivity prediction problem. The methods are classified according to aspects such as the underlying soft computing methodology (artificial neural network or support vector machine) and features of the algorithms.
