matpar parallel extensions: Topics by WorldWideScience.org

Sample records for matpar parallel extensions

Matpar: Parallel Extensions for MATLAB

Science.gov (United States)

Springer, P. L.

1998-01-01

Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.
A PARALLEL EXTENSION OF THE UAL ENVIRONMENT

International Nuclear Information System (INIS)

MALITSKY, N.; SHISHLO, A.

2001-01-01

The deployment of the Unified Accelerator Library (UAL) environment on the parallel cluster is presented. The approach is based on the Message-Passing Interface (MPI) library and the Perl adapter that allows one to control and mix together the existing conventional UAL components with the new MPI-based parallel extensions. In the paper, we provide timing results and describe the application of the new environment to the SNS Ring complex beam dynamics studies, particularly simulations of several physical effects, such as space charge, field errors, fringe fields, and others.
.NET 4.5 parallel extensions

CERN Document Server

Freeman, Bryan

2013-01-01

This book contains practical recipes on everything you will need to create task-based parallel programs using C#, .NET 4.5, and Visual Studio. The book is packed with illustrated code examples to create scalable programs. This book is intended to help experienced C# developers write applications that leverage the power of modern multicore processors. It provides the necessary knowledge for an experienced C# developer to work with .NET parallelism APIs. Previous experience of writing multithreaded applications is not necessary.
Extensions to the Parallel Real-Time Artificial Intelligence System (PRAIS) for fault-tolerant heterogeneous cycle-stealing reasoning

Science.gov (United States)

Goldstein, David

1991-01-01

Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.
A multi-transputer system for parallel Monte Carlo simulations of extensive air showers

International Nuclear Information System (INIS)

Gils, H.J.; Heck, D.; Oehlschlaeger, J.; Schatz, G.; Thouw, T.

1989-01-01

A multiprocessor computer system has been brought into operation at the Kernforschungszentrum Karlsruhe. It is dedicated to Monte Carlo simulations of extensive air showers induced by ultra-high energy cosmic rays. The architecture consists of two independently working VMEbus systems, each with a 68020 microprocessor as host computer and twelve T800 transputers for parallel processing. The two systems are linked via Ethernet for data exchange. The T800 transputers are equipped with 4 Mbyte RAM each, sufficient to run rather large codes. The host computers are operated under UNIX 5.3. On the transputers, compilers for PARALLEL FORTRAN, C, and PASCAL are available. The simple modular architecture of this parallel computer reflects the single purpose for which it is intended. The hardware of the multiprocessor computer is described, as well as how the user software is handled and distributed to the 24 working processors. The performance of the parallel computer is demonstrated by well-known benchmarks and by realistic Monte Carlo simulations of air showers. Comparisons with other types of microprocessors and with large universal computers are made. It is demonstrated that a cost reduction by more than a factor of 20 is achieved by this system as compared to a universal computer. (orig.)
Professional Parallel Programming with C#: Master Parallel Extensions with .NET 4

CERN Document Server

Hillar, Gastón

2010-01-01

Expert guidance for those programming today's dual-core processor PCs. As PC processors explode from one or two to now eight processors, there is an urgent need for programmers to master concurrent programming. This book dives deep into the latest technologies available to programmers for creating professional parallel applications using C#, .NET 4, and Visual Studio 2010. The book covers task-based programming, coordination data structures, PLINQ, thread pools, asynchronous programming model, and more. It also teaches other parallel programming techniques, such as SIMD and vectorization. Teach
Rubus: A compiler for seamless and extensible parallelism

Science.gov (United States)

Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

2017-01-01

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been
Rubus: A compiler for seamless and extensible parallelism.

Directory of Open Access Journals (Sweden)

Muhammad Adnan

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84
Parallel Programming with Intel Parallel Studio XE

CERN Document Server

Blair-Chappell, Stephen

2012-01-01

Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the
Possible origin and significance of extension-parallel drainages in Arizona's metamorphic core complexes

Science.gov (United States)

Spencer, J.E.

2000-01-01

The corrugated form of the Harcuvar, South Mountains, and Catalina metamorphic core complexes in Arizona reflects the shape of the middle Tertiary extensional detachment fault that projects over each complex. Corrugation axes are approximately parallel to the fault-displacement direction and to the footwall mylonitic lineation. The core complexes are locally incised by enigmatic, linear drainages that parallel corrugation axes and the inferred extension direction and are especially conspicuous on the crests of antiformal corrugations. These drainages have been attributed to erosional incision on a freshly denuded, planar, inclined fault ramp followed by folding that elevated and preserved some drainages on the crests of rising antiforms. According to this hypothesis, corrugations were produced by folding after subaerial exposure of detachment-fault footwalls. An alternative hypothesis, proposed here, is as follows. In a setting where preexisting drainages cross an active normal fault, each fault-slip event will cut each drainage into two segments separated by a freshly denuded fault ramp. The upper and lower drainage segments will remain hydraulically linked after each fault-slip event if the drainage in the hanging-wall block is incised, even if the stream is on the flank of an antiformal corrugation and there is a large component of strike-slip fault movement. Maintenance of hydraulic linkage during sequential fault-slip events will guide the lengthening stream down the fault ramp as the ramp is uncovered, and stream incision will form a progressively lengthening, extension-parallel, linear drainage segment. This mechanism for linear drainage genesis is compatible with corrugations as original irregularities of the detachment fault, and does not require folding after early to middle Miocene footwall exhumations. This is desirable because many drainages are incised into nonmylonitic crystalline footwall rocks that were probably not folded under low
Parallel algorithms

CERN Document Server

Casanova, Henri; Robert, Yves

2008-01-01

"…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi
Physics based modeling of a series parallel battery pack for asymmetry analysis, predictive control and life extension

Science.gov (United States)

Ganesan, Nandhini; Basu, Suman; Hariharan, Krishnan S.; Kolake, Subramanya Mayya; Song, Taewon; Yeo, Taejung; Sohn, Dong Kee; Doo, Seokgwang

2016-08-01

Lithium-Ion batteries used for electric vehicle applications are subject to large currents and various operation conditions, making battery pack design and life extension a challenging problem. With increase in complexity, modeling and simulation can lead to insights that ensure optimal performance and life extension. In this manuscript, an electrochemical-thermal (ECT) coupled model for a 6 series × 5 parallel pack is developed for Li ion cells with NCA/C electrodes and validated against experimental data. Contribution of the cathode to overall degradation at various operating conditions is assessed. Pack asymmetry is analyzed from a design and an operational perspective. Design based asymmetry leads to a new approach of obtaining the individual cell responses of the pack from an average ECT output. Operational asymmetry is demonstrated in terms of effects of thermal gradients on cycle life, and an efficient model predictive control technique is developed. Concept of reconfigurable battery pack is studied using detailed simulations that can be used for effective monitoring and extension of battery pack life.
Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

Energy Technology Data Exchange (ETDEWEB)

Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

1997-03-01

Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving's meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia's "tiling" dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.
A parallel buffer tree

DEFF Research Database (Denmark)

Sitchinava, Nodar; Zeh, Norbert

2012-01-01

We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of...... in the optimal O(psortN + K/PB) parallel I/O complexity, where K is the size of the output reported in the process and psortN is the parallel I/O complexity of sorting N elements using P processors....
Parallel phase model: a programming model for high-end parallel machines with manycores.

Energy Technology Data Exchange (ETDEWEB)

Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

2009-04-01

This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.
The parallel volume at large distances

DEFF Research Database (Denmark)

Kampf, Jürgen

In this paper we examine the asymptotic behavior of the parallel volume of planar non-convex bodies as the distance tends to infinity. We show that the difference between the parallel volume of the convex hull of a body and the parallel volume of the body itself tends to 0. This yields a new proof...... for the fact that a planar body can only have polynomial parallel volume, if it is convex. Extensions to Minkowski spaces and random sets are also discussed....
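As background that the abstract does not spell out: the "polynomial parallel volume" of a planar convex body is the classical Steiner formula. Writing A(K) for the area and L(K) for the perimeter of a convex body K in the plane (notation assumed here, not taken from the paper), the area of the set of points within distance r of K is

```latex
\[
  V\bigl(K + rB^{2}\bigr) \;=\; A(K) \;+\; L(K)\,r \;+\; \pi r^{2}, \qquad r \ge 0,
\]
```

where B^2 is the unit disc. For example, a unit square has A = 1 and L = 4, so its parallel volume at distance r is 1 + 4r + πr². The abstract's result says that, in the plane, only convex bodies have a parallel volume of this polynomial form.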
Parallelism in matrix computations

CERN Document Server

Gallopoulos, Efstratios; Sameh, Ahmed H

2016-01-01

This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded, Vandermonde, Toeplitz, and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...
Extension parallel to the rift zone during segmented fault growth: application to the evolution of the NE Atlantic

Directory of Open Access Journals (Sweden)

A. Bubeck

2017-11-01

The mechanical interaction of propagating normal faults is known to influence the linkage geometry of first-order faults, and the development of second-order faults and fractures, which transfer displacement within relay zones. Here we use natural examples of growth faults from two active volcanic rift zones (Koa'e, island of Hawai'i, and Krafla, northern Iceland) to illustrate the importance of horizontal-plane extension (heave) gradients, and associated vertical axis rotations, in evolving continental rift systems. Second-order extension and extensional-shear faults within the relay zones variably resolve components of regional extension, and components of extension and/or shortening parallel to the rift zone, to accommodate the inherently three-dimensional (3-D) strains associated with relay zone development and rotation. Such a configuration involves volume increase, which is accommodated at the surface by open fractures; in the subsurface this may be accommodated by veins or dikes oriented obliquely and normal to the rift axis. To consider the scalability of the effects of relay zone rotations, we compare the geometry and kinematics of fault and fracture sets in the Koa'e and Krafla rift zones with data from exhumed contemporaneous fault and dike systems developed within a > 5×10^4 km^2 relay system that developed during formation of the NE Atlantic margins. Based on the findings presented here we propose a new conceptual model for the evolution of segmented continental rift basins on the NE Atlantic margins.
Parallel processing from applications to systems

CERN Document Server

Moldovan, Dan I

1993-01-01

This text provides one of the broadest presentations of parallel processing available, including the structure of parallel processors and parallel algorithms. The emphasis is on mapping algorithms to highly parallel computers, with extensive coverage of array and multiprocessor architectures. Early chapters provide insightful coverage on the analysis of parallel algorithms and program transformations, effectively integrating a variety of material previously scattered throughout the literature. Theory and practice are well balanced across diverse topics in this concise presentation. For exceptional cla

Parallelization in Modern C++

CERN Multimedia

CERN. Geneva

2016-01-01

The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...
Streaming for Functional Data-Parallel Languages

DEFF Research Database (Denmark)

Madsen, Frederik Meisner

In this thesis, we investigate streaming as a general solution to the space inefficiency commonly found in functional data-parallel programming languages. The data-parallel paradigm maps well to parallel SIMD-style hardware. However, the traditional fully materializing execution strategy...... by extending two existing data-parallel languages: NESL and Accelerate. In the extensions we map bulk operations to data-parallel streams that can evaluate fully sequential, fully parallel or anything in between. By a dataflow, piecewise parallel execution strategy, the runtime system can adjust to any target...... flattening necessitates all sub-computations to materialize at the same time. For example, naive n by n matrix multiplication requires n^3 space in NESL because the algorithm contains n^3 independent scalar multiplications. For large values of n, this is completely unacceptable. We address the problem...
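The space argument in this abstract can be illustrated with a small sketch. The code below is not NESL or Accelerate and is not the thesis's runtime; it is a Python stand-in, with hypothetical names `products` and `matmul_streamed`, in which the n^3 scalar products are produced lazily and consumed in bounded chunks, so at most `chunk` products are materialized at once instead of all n^3.

```python
from itertools import islice


def products(a, b):
    """Lazily yield (i, j, a[i][k] * b[k][j]) for every i, j, k."""
    n = len(a)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                yield i, j, a[i][k] * b[k][j]


def matmul_streamed(a, b, chunk=4):
    """Multiply square matrices while holding at most `chunk` products in memory."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    stream = products(a, b)
    while True:
        # One chunk is the unit that a data-parallel backend could evaluate at once.
        block = list(islice(stream, chunk))
        if not block:
            break
        for i, j, p in block:
            c[i][j] += p
    return c
```

Setting `chunk` to 1 gives fully sequential evaluation and setting it to n^3 gives full materialization, which mirrors the "fully sequential, fully parallel or anything in between" range mentioned in the abstract.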
Parallel k-means++

Energy Technology Data Exchange (ETDEWEB)

2017-04-04

A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
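For readers unfamiliar with the seeding step being parallelized, here is a minimal serial sketch of k-means++ seed selection in Python. It is not the C++/CUDA/OpenMP code described in the record; the function names `sq_dist` and `kmeanspp_seeds` are illustrative assumptions. Each new seed is drawn with probability proportional to the squared distance from a point to its nearest existing seed.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeanspp_seeds(points, k, rng=None):
    """Choose k initial centers from `points` using D^2-weighted sampling."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to a uniform pick.
            new_center = rng.choice(points)
        else:
            new_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new_center)
        # Per-point update: each entry is independent of the others.
        d2 = [min(old, sq_dist(p, new_center)) for old, p in zip(d2, points)]
    return centers
```

The per-point distance update is independent across points, which makes it a natural candidate for data-parallel execution; how the record's implementations actually split the work is not stated in the abstract.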
Overview of the Force Scientific Parallel Language

Directory of Open Access Journals (Sweden)

Gita Alaghband

1994-01-01

The Force parallel programming language designed for large-scale shared-memory multiprocessors is presented. The language provides a number of parallel constructs as extensions to the ordinary Fortran language and is implemented as a two-level macro preprocessor to support portability across shared memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force have resulted in structured parallel programs that are ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples to illustrate some parallel programming approaches using the Force are also presented.
Performance Analysis of Parallel Mathematical Subroutine library PARCEL

International Nuclear Information System (INIS)

Yamada, Susumu; Shimizu, Futoshi; Kobayashi, Kenichi; Kaburaki, Hideo; Kishida, Norio

2000-01-01

The parallel mathematical subroutine library PARCEL (Parallel Computing Elements) has been developed by Japan Atomic Energy Research Institute for easy use of typical parallelized mathematical codes in any application problems on distributed parallel computers. The PARCEL includes routines for linear equations, eigenvalue problems, pseudo-random number generation, and fast Fourier transforms. It is shown that the results of performance for linear equations routines exhibit good parallelization efficiency on vector, as well as scalar, parallel computers. A comparison of the efficiency results with the PETSc (Portable, Extensible Toolkit for Scientific Computation) library has been reported. (author)
Parallel S_n iteration schemes

International Nuclear Information System (INIS)

Wienke, B.R.; Hiromoto, R.E.

1986-01-01

The iterative, multigroup, discrete ordinates (S_n) technique for solving the linear transport equation enjoys widespread usage and appeal. Serial iteration schemes and numerical algorithms developed over the years provide a timely framework for parallel extension. On the Denelcor HEP, the authors investigate three parallel iteration schemes for solving the one-dimensional S_n transport equation. The multigroup representation and serial iteration methods are also reviewed. This analysis represents a first attempt to extend serial S_n algorithms to parallel environments and provides good baseline estimates on ease of parallel implementation, relative algorithm efficiency, comparative speedup, and some future directions. The authors examine ordered and chaotic versions of these strategies, with and without concurrent rebalance and diffusion acceleration. Two strategies efficiently support high degrees of parallelization and appear to be robust parallel iteration techniques. The third strategy is a weaker parallel algorithm. Chaotic iteration, difficult to simulate on serial machines, holds promise and converges faster than ordered versions of the schemes. Actual parallel speedup and efficiency are high and payoff appears substantial.
Parallel object-oriented specification language

NARCIS (Netherlands)

Florescu, O.; Voeten, J.P.M.; Theelen, B.D.; Geilen, M.C.W.; Corporaal, H.; Burns, Alan

2008-01-01

The Parallel Object-Oriented Specification Language (POOSL) is an expressive modelling language for hardware/software systems [10]. It was originally defined in [7] as an object-oriented extension of process algebra CCS [6], supporting (conditional) synchronous message passing between
A Coupling Tool for Parallel Molecular Dynamics-Continuum Simulations

KAUST Repository

Neumann, Philipp

2012-06-01

We present a tool for coupling Molecular Dynamics and continuum solvers. It is written in C++ and is meant to support the developers of hybrid molecular - continuum simulations in terms of both realisation of the respective coupling algorithm as well as parallel execution of the hybrid simulation. We describe the implementational concept of the tool and its parallel extensions. We particularly focus on the parallel execution of particle insertions into dense molecular systems and propose a respective parallel algorithm. Our implementations are validated for serial and parallel setups in two and three dimensions. © 2012 IEEE.
Parallel transposition of sparse data structures

DEFF Research Database (Denmark)

Wang, Hao; Liu, Weifeng; Hou, Kaixi

2016-01-01

Many applications in computational sciences and social sciences exploit sparsity and connectivity of acquired data. Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel tr...... transposition in the latest vendor-supplied library on an Intel multicore CPU platform, and the MergeTrans approach achieves on average of 3.4-fold (up to 11.7-fold) speedup on an Intel Xeon Phi many-core processor....
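As a point of reference for what "sparse transposition" computes, the following is a serial baseline sketch in Python for transposing a matrix stored in CSR (compressed sparse row) form. It is a standard counting-sort style transpose (histogram, prefix sum, scatter), not the parallel algorithms evaluated in the record; the function name `csr_transpose` and its argument layout are assumptions made for illustration.

```python
def csr_transpose(n_rows, n_cols, indptr, indices, data):
    """Transpose a CSR matrix; returns (indptr, indices, data) of the transpose."""
    # Histogram: count the nonzeros in each column of the input.
    t_indptr = [0] * (n_cols + 1)
    for j in indices:
        t_indptr[j + 1] += 1
    # Prefix sum: turn the counts into row offsets of the transpose.
    for j in range(n_cols):
        t_indptr[j + 1] += t_indptr[j]
    # Scatter: place each entry at the next free slot of its destination row.
    next_slot = t_indptr[:-1]  # slicing copies, so t_indptr is left unchanged
    t_indices = [0] * len(indices)
    t_data = [0] * len(data)
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            dst = next_slot[j]
            t_indices[dst] = i
            t_data[dst] = data[k]
            next_slot[j] += 1
    return t_indptr, t_indices, t_data
```

In this baseline the scatter step updates a shared `next_slot` counter per column, which is the part that would need atomic updates or a different decomposition when parallelized naively.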
Distributed Parallel Architecture for "Big Data"

Directory of Open Access Journals (Sweden)

Catalin BOJA

2012-01-01

This paper is an extension to the "Distributed Parallel Architecture for Storing and Processing Large Datasets" paper presented at the WSEAS SEPADS’12 conference in Cambridge. In its original version the paper went over the benefits of using a distributed parallel architecture to store and process large datasets. This paper analyzes the problem of storing, processing and retrieving meaningful insight from petabytes of data. It provides a survey on current distributed and parallel data processing technologies and, based on them, will propose an architecture that can be used to solve the analyzed problem. In this version there is more emphasis put on distributed file systems and the ETL processes involved in a distributed environment.
Parallel Task Processing on a Multicore Platform in a PC-based Control System for Parallel Kinematics

Directory of Open Access Journals (Sweden)

Harald Michalik

2009-02-01

Multicore platforms have one physical processor chip with multiple cores interconnected via a chip-level bus. Because they deliver greater computing power through concurrency and offer greater system density, multicore platforms are well qualified to address the performance bottleneck encountered in PC-based control systems for parallel kinematic robots with heavy CPU load. Heavy-load control tasks are generated by new control approaches that include features like singularity prediction, structure control algorithms, vision data integration and similar tasks. In this paper we introduce the parallel task scheduling extension of a communication architecture specially tailored for the development of PC-based control of parallel kinematics. The scheduling is specially designed for processing on a multicore platform. It breaks down the serial task processing of the robot control cycle and extends it with parallel task processing paths in order to enhance the overall control performance.
Parallel processing algorithms for hydrocodes on a computer with MIMD architecture (DENELCOR's HEP)

International Nuclear Information System (INIS)

Hicks, D.L.

1983-11-01

In real time simulation/prediction of complex systems such as water-cooled nuclear reactors, if reactor operators had fast simulator/predictors to check the consequences of their operations before implementing them, events such as the incident at Three Mile Island might be avoided. However, existing simulator/predictors such as RELAP run slower than real time on serial computers. It appears that the only way to overcome the barrier to higher computing rates is to use computers with architectures that allow concurrent computations or parallel processing. The computer architecture with the greatest degree of parallelism is labeled Multiple Instruction Stream, Multiple Data Stream (MIMD). An example of a machine of this type is the HEP computer by DENELCOR. It appears that hydrocodes are very well suited for parallelization on the HEP. It is a straightforward exercise to parallelize explicit, one-dimensional Lagrangean hydrocodes in a zone-by-zone parallelization. Similarly, implicit schemes can be parallelized in a zone-by-zone fashion via an a priori, symbolic inversion of the tridiagonal matrix that arises in an implicit scheme. These techniques are extended to Eulerian hydrocodes by using Harlow's rezone technique. The extension from single-phase Eulerian to two-phase Eulerian is straightforward. This step-by-step extension leads to hydrocodes with zone-by-zone parallelization that are capable of two-phase flow simulation. Extensions to two and three spatial dimensions can be achieved by operator splitting. It appears that a zone-by-zone parallelization is the best way to utilize the capabilities of an MIMD machine. 40 references
Extending the POSIX I/O interface: a parallel file system perspective.

Energy Technology Data Exchange (ETDEWEB)

Vilayannur, M.; Lang, S.; Ross, R.; Klundt, R.; Ward, L.; Mathematics and Computer Science; VMWare, Inc.; SNL

2008-12-11

The POSIX interface does not lend itself well to enabling good performance for high-end applications. Extensions are needed in the POSIX I/O interface so that high-concurrency HPC applications running on top of parallel file systems perform well. This paper presents the rationale, design, and evaluation of a reference implementation of a subset of the POSIX I/O interfaces on a widely used parallel file system (PVFS) on clusters. Experimental results on a set of micro-benchmarks confirm that the extensions to the POSIX interface greatly improve scalability and performance.
Lempel–Ziv Data Compression on Parallel and Distributed Systems

Directory of Open Access Journals (Sweden)

Sergio De Agostino

2011-09-01

We present a survey of results concerning Lempel–Ziv data compression on parallel and distributed systems, starting from the theoretical approach to parallel time complexity to conclude with the practical goal of designing distributed algorithms with low communication cost. Storer’s extension for image compression is also discussed.
Parallel computation of nondeterministic algorithms in VLSI

Energy Technology Data Exchange (ETDEWEB)

Hortensius, P D

1987-01-01

This work examines parallel VLSI implementations of nondeterministic algorithms. It is demonstrated that conventional pseudorandom number generators are unsuitable for highly parallel applications. Efficient parallel pseudorandom sequence generation can be accomplished using certain classes of elementary one-dimensional cellular automata. The pseudorandom numbers appear in parallel on each clock cycle. Extensive study of the properties of these new pseudorandom number generators is made using standard empirical random number tests, cycle length tests, and implementation considerations. Furthermore, it is shown these particular cellular automata can form the basis of efficient VLSI architectures for computations involved in the Monte Carlo simulation of both the percolation and Ising models from statistical mechanics. Finally, a variation on a Built-In Self-Test technique based upon cellular automata is presented. These Cellular Automata-Logic-Block-Observation (CALBO) circuits improve upon conventional design for testability circuitry.
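The idea of producing a whole word of pseudorandom bits per clock cycle from a one-dimensional cellular automaton can be sketched as follows. This is a minimal illustration, not the generators studied in the work above: the choice of Rule 30, the ring (periodic) boundary, the single-cell seed, and the function names `ca_step` and `ca_random_words` are all assumptions made for the example.

```python
def ca_step(cells, rule=30):
    """Advance a ring of 0/1 cells by one step of an elementary CA rule.

    Every cell is updated from its left neighbour, itself and its right
    neighbour, so in hardware all cells could update in the same clock cycle.
    Rule 30 is an assumed example rule, not taken from the abstract.
    """
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] << 2 | cells[i] << 1 | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def ca_random_words(width=32, count=4, rule=30):
    """Return `count` integers, each read from the full CA state after one step."""
    cells = [0] * width
    cells[width // 2] = 1  # assumed seed: a single live cell in the middle
    words = []
    for _ in range(count):
        cells = ca_step(cells, rule)
        # Read all cells at once: one `width`-bit word per step.
        words.append(int("".join(map(str, cells)), 2))
    return words


if __name__ == "__main__":
    print(ca_random_words())
```

Each call to `ca_step` stands in for one clock cycle; the statistical quality of the output depends on the rule and the seed, which is what the empirical tests mentioned in the abstract would have to assess.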
Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

Science.gov (United States)

Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

1990-01-01

Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
Parallelism in computations in quantum and statistical mechanics

International Nuclear Information System (INIS)

Clementi, E.; Corongiu, G.; Detrich, J.H.

1985-01-01

Often very fundamental biochemical and biophysical problems defy simulations because of limitations in today's computers. We present and discuss a distributed system composed of two IBM 4341s and/or an IBM 4381 as front-end processors and ten FPS-164 attached array processors. This parallel system, called LCAP, presently has a peak performance of about 110 Mflops; extensions to higher performance are discussed. Presently, the system applications use a modified version of VM/SP as the operating system; a description of the modifications is given. Three application programs have been migrated from sequential to parallel: a molecular quantum mechanical, a Metropolis-Monte Carlo and a molecular dynamics program. The parallel codes are briefly outlined. Use of these parallel codes has already opened up new capabilities for our research. The very positive performance comparisons with today's supercomputers allow us to conclude that parallel computers and programming, of the type we have considered, represent a pragmatic answer to many computationally intensive problems. (orig.)
Parallel algorithms for interactive manipulation of digital terrain models

Science.gov (United States)

Davis, E. W.; Mcallister, D. F.; Nagaraj, V.

1988-01-01

Interactive three-dimensional graphics applications, such as terrain data representation and manipulation, require extensive arithmetic processing. Massively parallel machines are attractive for this application since they offer high computational rates, and grid-connected architectures provide a natural mapping for grid-based terrain models. Presented here are algorithms for data movement on the Massively Parallel Processor (MPP) in support of pan and zoom functions over large data grids. It is an extension of earlier work that demonstrated real-time performance of graphics functions on grids that were equal in size to the physical dimensions of the MPP. When the dimensions of a data grid exceed the processing array size, data is packed in the array memory. Windows of the total data grid are interactively selected for processing. Movement of packed data is needed to distribute items across the array for efficient parallel processing. Execution time for data movement was found to exceed that for arithmetic aspects of graphics functions. Performance figures are given for routines written in MPP Pascal.
Extensions of Parallel Coordinates for Interactive Exploration of Large Multi-Timepoint Data Sets

NARCIS (Netherlands)

Blaas, J.; Botha, C.P.; Post, F.H.

2008-01-01

Parallel coordinate plots (PCPs) are commonly used in information visualization to provide insight into multi-variate data. These plots help to spot correlations between variables. PCPs have been successfully applied to unstructured datasets up to a few millions of points. In this paper, we present
Efficient Parallel Algorithms for Unsteady Incompressible Flows

KAUST Repository

Guermond, Jean-Luc; Minev, Peter D.

2013-01-01

The objective of this paper is to give an overview of recent developments on splitting schemes for solving the time-dependent incompressible Navier–Stokes equations and to discuss possible extensions to the variable density/viscosity case. Particular attention is given to algorithms that can be implemented efficiently on large parallel clusters.

Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine

DEFF Research Database (Denmark)

Madsen, Kasper Grud Skat; Zhou, Yongluan; Cao, Jianneng

2017-01-01

Load balancing, operator instance collocations and horizontal scaling are critical issues in Parallel Stream Processing Engines to achieve low data processing latency, optimized cluster utilization and minimized communication cost respectively. In previous work, these issues are typically tackled...... solution called ALBIC, which supports general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. The extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches....
Parallel solutions of the two-group neutron diffusion equations

International Nuclear Information System (INIS)

Zee, K.S.; Turinsky, P.J.

1987-01-01

Recent efforts to adapt various numerical solution algorithms to parallel computer architectures have addressed the possibility of substantially reducing the running time of few-group neutron diffusion calculations. The authors have developed an efficient iterative parallel algorithm and an associated computer code for the rapid solution of the finite difference method representation of the two-group neutron diffusion equations on the CRAY X/MP-48 supercomputer having multi-CPUs and vector pipelines. For realistic simulation of light water reactor cores, the code employs a macroscopic depletion model with trace capability for selected fission product transients and critical boron. In addition, moderator and fuel temperature feedback models are incorporated into the code. The physics models used in the code were benchmarked against qualified codes and proved accurate. This work is an extension of previous work in that various feedback effects are accounted for in the system; the entire code is structured to accommodate extensive vectorization; and additional parallelism by multitasking is achieved not only for the solution of the matrix equations associated with the inner iterations but also for the other segments of the code, e.g., outer iterations.
Parallel Object-Oriented Computation Applied to a Finite Element Problem

Directory of Open Access Journals (Sweden)

Jon B. Weissman

1993-01-01

The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application, Mentat, and the implementation, and present performance results for both a Mentat and a hand-coded parallel Fortran version.
Scaling up machine learning: parallel and distributed approaches

National Research Council Canada - National Science Library

Bekkerman, Ron; Bilenko, Mikhail; Langford, John

2012-01-01

... presented in the book cover a range of parallelization platforms from FPGAs and GPUs to multi-core systems and commodity clusters; concurrent programming frameworks that include CUDA, MPI, MapReduce, and DryadLINQ; and various learning settings: supervised, unsupervised, semi-supervised, and online learning. Extensive coverage of parallelizat...
A language for data-parallel and task parallel programming dedicated to multi-SIMD computers. Contributions to hydrodynamic simulation with lattice gases

International Nuclear Information System (INIS)

Pic, Marc Michel

1995-01-01

Parallel programming covers task-parallelism and data-parallelism. Many problems need both kinds of parallelism. Multi-SIMD computers allow a hierarchical approach to these parallelisms. The T++ language, based on C++, is dedicated to exploiting Multi-SIMD computers using a programming paradigm which is an extension of array programming to task management. Our language introduces arrays of independent tasks to be executed separately (MIMD) on subsets of processors with identical behaviour (SIMD), in order to express the hierarchical inclusion of data-parallelism in task-parallelism. To manipulate tasks and data in a symmetrical way, we propose meta-operations which have the same behaviour on task arrays and on data arrays. We explain how to implement this language on our parallel computer SYMPHONIE in order to benefit from the locally-shared memory, the hardware virtualization, and the multiplicity of communication networks. We also analyse a typical application of such an architecture. Finite element schemes for fluid mechanics need powerful parallel computers and require strong floating-point capabilities. Lattice gases are an alternative for such simulations. Boolean lattice gases are simple, stable, and modular and need no floating-point computation, but they include numerical noise. Boltzmann lattice gases offer high computational precision, but need floating-point arithmetic and are only locally stable. We propose a new scheme, called multi-bit, which keeps the advantages of each Boolean model to which it is applied, with high numerical precision and reduced noise. Experiments on viscosity, physical behaviour, noise reduction and spurious invariants are shown, and implementation techniques for parallel Multi-SIMD computers are detailed. (author) [fr
Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

Directory of Open Access Journals (Sweden)

Stephen L. Olivier

2013-01-01

Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.
Scheduling Parallel Jobs Using Migration and Consolidation in the Cloud

Directory of Open Access Journals (Sweden)

Xiaocheng Liu

2012-01-01

An increasing number of high performance computing parallel applications leverage the power of the cloud for parallel processing. Scheduling these parallel applications so as to improve quality of service is the key to hosting them successfully in the cloud. The large scale of the cloud makes parallel job scheduling more complicated, as even simple parallel job scheduling problems are NP-complete. In this paper, we propose a parallel job scheduling algorithm named MEASY. MEASY adopts migration and consolidation to enhance the most popular EASY scheduling algorithm. Our extensive experiments on well-known workloads show that our algorithm delivers very good quality of service. For two common parallel job scheduling objectives, our algorithm produces an improvement of up to 41.1% (23.1% on average) in average response time, and of up to 82.9% (69.3% on average) in average slowdown. Our algorithm remains robust even when CPU usage estimates are inaccurate and migration costs are high. Our approach involves only a trivial modification of EASY and requires no additional technique; it is practical and effective in the cloud environment.
xSPDE: Extensible software for stochastic equations

Directory of Open Access Journals (Sweden)

Simon Kiesewetter

2016-01-01

We introduce an extensible software toolbox, xSPDE, for solving ordinary and partial stochastic differential equations. The toolbox makes extensive use of vector and parallel methods. Inputs are exceptionally simple, to reduce the learning curve, with default options for all of the many input parameters. The code calculates functional means, correlations and spectra, checks for errors in both time-step and sampling, and provides several choices of algorithm. Most aspects of the code, including the numerical algorithm, have a modular functional design to allow user modifications.
On Viviani's Theorem and Its Extensions

Science.gov (United States)

Abboud, Elias

2010-01-01

Viviani's theorem states that the sum of distances from any point inside an equilateral triangle to its sides is constant. Here, in an extension of this result, we show, using linear programming, that any convex polygon can be divided into parallel line segments on which the sum of the distances to the sides of the polygon is constant. Let us say…
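The classical statement can be checked numerically with a short sketch. The unit-side triangle, the sample points and the helper names `dist_to_line` and `sum_of_distances` are assumptions chosen for illustration; the linear-programming extension to general convex polygons described above is not reproduced here.

```python
import math


def dist_to_line(p, a, b):
    """Perpendicular distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return abs(cross) / math.hypot(bx - ax, by - ay)


def sum_of_distances(p, vertices):
    """Sum of the distances from p to every side of the polygon `vertices`."""
    n = len(vertices)
    return sum(dist_to_line(p, vertices[i], vertices[(i + 1) % n]) for i in range(n))


# Assumed example: equilateral triangle with side length 1.
TRIANGLE = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]

if __name__ == "__main__":
    for point in [(0.5, 0.2), (0.3, 0.1), (0.6, 0.5)]:
        print(point, sum_of_distances(point, TRIANGLE))
```

For every interior point the printed sum agrees, up to rounding, with the triangle's altitude `math.sqrt(3) / 2`, which is the constant in Viviani's theorem.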
Numerical Investigation of Startup Instabilities in Parallel-Channel Natural Circulation Boiling Systems

Directory of Open Access Journals (Sweden)

S. P. Lakshmanan

2010-01-01

The behaviour of a parallel-channel natural circulation boiling water reactor under a low-pressure low-power startup condition has been studied numerically (using RELAP5) and compared with its scaled model. The parallel-channel RELAP5 model is an extension of a single-channel model developed and validated with experimental results. Existence of in-phase and out-of-phase flashing instabilities in the parallel-channel systems is investigated through simulations under equal and unequal power boundary conditions in the channels. The effect of flow resistance on Type-I oscillations is explored. For nonidentical conditions in the channels, the flow fluctuations in the parallel-channel systems are found to be out-of-phase.
Automatic parallelization of while-Loops using speculative execution

International Nuclear Information System (INIS)

Collard, J.F.

1995-01-01

Automatic parallelization of imperative sequential programs has focused on nests of for-loops. The most recent techniques consist of finding an affine mapping with respect to the loop indices to simultaneously capture the temporal and spatial properties of the parallelized program. Such a mapping is usually called a "space-time transformation." This work describes an extension of these techniques to while-loops using speculative execution. We show that space-time transformations are a good framework for summing up previous restructuring techniques for while-loops, such as pipelining. Moreover, we show that these transformations can be derived and applied automatically.
Sn transport calculations on vector and parallel processors

International Nuclear Information System (INIS)

Rhoades, W.A.; Childs, R.L.

1987-01-01

The transport of radiation from the source to the location of people or equipment gives rise to some of the most challenging of calculations. A problem may involve as many as a billion unknowns, each evaluated several times to resolve interdependence. Such calculations run many hours on a Cray computer, and a typical study involves many such calculations. This paper will discuss the steps taken to vectorize the DOT code, which solves transport problems in two space dimensions (2-D); the extension of this code to 3-D; and the plans for extension to parallel processors
A High-Performance Parallel FDTD Method Enhanced by Using SSE Instruction Set

Directory of Open Access Journals (Sweden)

Dau-Chyrh Chang

2012-01-01

We introduce a hardware acceleration technique for the parallel finite difference time domain (FDTD) method using the SSE (streaming single instruction multiple data (SIMD) extensions) instruction set. Applying the SSE instruction set to the parallel FDTD method achieves a significant improvement in simulation performance. Benchmarks of the SSE acceleration on both a multi-CPU workstation and a computer cluster demonstrate the advantages of vector arithmetic logic unit (VALU) acceleration over GPU acceleration. Several engineering applications are employed to demonstrate the performance of the parallel FDTD method enhanced by the SSE instruction set.
Process-Oriented Parallel Programming with an Application to Data-Intensive Computing

OpenAIRE

Givelberg, Edward

2014-01-01

We introduce process-oriented programming as a natural extension of object-oriented programming for parallel computing. It is based on the observation that every class of an object-oriented language can be instantiated as a process, accessible via a remote pointer. The introduction of process pointers requires no syntax extension, identifies processes with programming objects, and enables processes to exchange information simply by executing remote methods. Process-oriented programming is a h...
Discussion about the design for mesh data structure within the parallel framework

International Nuclear Information System (INIS)

Shi Guangmei; Wu Ruian; Wang Keying; Ji Xiaoyu; Hao Zhiming; Mo Jun; He Yingbo

2010-01-01

The mesh data structure is one of the fundamental data structures of a parallel framework, and the quality of its design and implementation affects the framework's parallel capability. The design philosophy of parallel frameworks is discussed through a comparative analysis of the architecture and fundamental data structures of several typical frameworks, such as JASMIN, SIERRA, and ITAPS. Borrowing ideas from the layered set of services in the SIERRA framework, and taking into account the near-term objectives of the PANDA framework, this paper presents a preliminary layered set of services for PANDA. On this foundation, the definition and management of the mesh data structure, which sits in the bottom layer of the PANDA framework, are introduced in detail. The design and implementation of PANDA's parallel distributed mesh data structure are discussed with particular emphasis. Extensions of the PANDA framework and the development of application programs on top of it build on this work.
Xyce parallel electronic simulator : users' guide.

Energy Technology Data Exchange (ETDEWEB)

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

2011-05-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is
Locating an axis-parallel rectangle on a Manhattan plane

DEFF Research Database (Denmark)

Brimberg, Jack; Juel, Henrik; Körner, Mark-Christoph

2014-01-01

In this paper we consider the problem of locating an axis-parallel rectangle in the plane such that the sum of distances between the rectangle and a finite point set is minimized, where the distance is measured by the Manhattan norm ℓ1. In this way we solve an extension of the Weber problem...
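The objective being minimized can be illustrated with a small sketch of the Manhattan distance from a point to an axis-parallel rectangle. It assumes the rectangle is treated as a filled region, so points inside it have distance zero; the paper may instead measure distance to the rectangle's boundary. The function names and the `(x_min, y_min, x_max, y_max)` tuple layout are likewise assumptions for this example.

```python
def l1_distance_to_rectangle(point, rect):
    """Manhattan distance from `point` to the closest point of a filled rectangle.

    rect is (x_min, y_min, x_max, y_max); this layout is an assumed convention.
    """
    x, y = point
    x_min, y_min, x_max, y_max = rect
    # Per-axis gap between the point and the rectangle's extent (0 if inside it).
    dx = max(x_min - x, 0.0, x - x_max)
    dy = max(y_min - y, 0.0, y - y_max)
    return dx + dy


def total_distance(points, rect):
    """Objective value: sum of Manhattan distances from all points to the rectangle."""
    return sum(l1_distance_to_rectangle(p, rect) for p in points)


if __name__ == "__main__":
    points = [(3.0, 3.0), (1.0, 0.5), (-1.0, 0.5)]
    print(total_distance(points, (0.0, 0.0, 2.0, 1.0)))
```

Because the ℓ1 distance separates into an x-term and a y-term, the two coordinates of the rectangle can be examined independently when evaluating this objective.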
10th International Workshop on Parallel Tools for High Performance Computing

CERN Document Server

Gracia, José; Hilbrich, Tobias; Knüpfer, Andreas; Resch, Michael; Nagel, Wolfgang

2017-01-01

This book presents the proceedings of the 10th International Parallel Tools Workshop, held October 4-5, 2016 in Stuttgart, Germany – a forum to discuss the latest advances in parallel tools. High-performance computing plays an increasingly important role for numerical simulation and modelling in academic and industrial research. At the same time, using large-scale parallel systems efficiently is becoming more difficult. A number of tools addressing parallel program development and analysis have emerged from the high-performance computing community over the last decade, and what may have started as a collection of small helper scripts has now matured into production-grade frameworks. Powerful user interfaces and an extensive body of documentation allow easy usage by non-specialists.
First massively parallel algorithm to be implemented in Apollo-II code

International Nuclear Information System (INIS)

Stankovski, Z.

1994-01-01

The collision probability (CP) method in neutron transport, as applied to arbitrary 2D XY geometries, like the TDT module in APOLLO-II, is very time consuming. Consequently RZ or 3D extensions became prohibitive. Fortunately, this method is very suitable for parallelization. Massively parallel computer architectures, especially MIMD machines, bring a new breath to this method. In this paper we present a CM5 implementation of the CP method. Parallelization is applied to the energy groups, using the CMMD message passing library. In our case we use 32 processors for the standard 99-group APOLLIB-II library. The real advantage of this algorithm will appear in the calculation of the future fine multigroup library (about 8000 groups) of the SAPHYR project with a massively parallel computer (to the order of hundreds of processors). (author). 3 tabs., 4 figs., 4 refs
First massively parallel algorithm to be implemented in APOLLO-II code

International Nuclear Information System (INIS)

Stankovski, Z.

1994-01-01

The collision probability method in neutron transport, as applied to arbitrary 2-dimensional geometries, like the two-dimensional transport module in APOLLO-II, is very time consuming. Consequently, a 3-dimensional extension became prohibitive. Fortunately, this method is very suitable for parallelization. Massively parallel computer architectures, especially MIMD machines, bring new life to this method. In this paper we present a CM5 implementation of the collision probability method. Parallelization is applied to the energy groups, using the CMMD message passing library. In our case we used 32 processors for the standard 99-group APOLLIB-II library. The real advantage of this algorithm will appear in the calculation of the future multigroup library (about 8000 groups) of the SAPHYR project with a massively parallel computer (to the order of hundreds of processors). (author). 4 refs., 4 figs., 3 tabs

Compiling the parallel programming language NestStep to the CELL processor

OpenAIRE

Holm, Magnus

2010-01-01

The goal of this project is to create a source-to-source compiler which will translate NestStep code to C code. The compiler's job is to replace NestStep constructs with a series of function calls to the NestStep runtime system. NestStep is a parallel programming language extension based on the BSP model. It adds constructs for parallel programming on top of an imperative programming language. For this project, only constructs extending the C language are relevant. The output code will compil...
Parallel implementation of DNA sequences matching algorithms using PWM on GPU architecture.

Science.gov (United States)

Sharma, Rahul; Gupta, Nitin; Narang, Vipin; Mittal, Ankush

2011-01-01

Positional Weight Matrices (PWMs) are widely used in the representation and detection of Transcription Factor Binding Sites (TFBSs) on DNA. We implement an online PWM search algorithm on a parallel architecture. Large PWM data sets can be processed in parallel on Graphics Processing Unit (GPU) systems, which helps match sequences at a faster rate. Our method makes extensive use of the highly multithreaded architecture and shared memory of multi-core GPUs. Efficient use of shared memory is required to optimise parallel reduction in CUDA. Our optimised method achieves a speedup of 230-280x over the linear implementation, on a GeForce GTX 280 GPU.
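The core PWM matching step can be sketched sequentially as a sliding-window score. This is a plain CPU illustration under assumed conventions (one dictionary of per-base weights for each motif position, additive scores, the names `pwm_scan` and `best_hit`); it is not the authors' CUDA implementation.

```python
def pwm_scan(sequence, pwm):
    """Score every window of len(pwm) bases in `sequence` against the PWM.

    pwm is a list with one dict per motif position mapping a base to its
    weight (an assumed representation). Windows are scored independently of
    each other, which is what makes the scan easy to spread over many threads.
    """
    width = len(pwm)
    scores = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        scores.append(sum(column[base] for column, base in zip(pwm, window)))
    return scores


def best_hit(sequence, pwm):
    """Return (start index, score) of the highest-scoring window (first on ties)."""
    scores = pwm_scan(sequence, pwm)
    start = max(range(len(scores)), key=scores.__getitem__)
    return start, scores[start]


if __name__ == "__main__":
    # Toy two-position matrix with made-up weights.
    toy_pwm = [
        {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
        {"A": 0.0, "C": 2.0, "G": 0.0, "T": 0.0},
    ]
    print(pwm_scan("ACAC", toy_pwm), best_hit("ACAC", toy_pwm))
```

On a GPU, each window (or each matrix) would typically be assigned to its own thread, with the final maximum computed by a parallel reduction; the sketch keeps both steps as ordinary Python loops.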
A note on the nucleation with multiple steps: Parallel and series nucleation

OpenAIRE

Iwamatsu, Masao

2012-01-01

Parallel and series nucleation are the basic elements of the complex nucleation process when two saddle points exist on the free-energy landscape. It is pointed out that the nucleation rates follow formulas similar to those of parallel and series connection of resistors or conductors in an electric circuit. Necessary formulas to calculate individual nucleation rates at the saddle points and the total nucleation rate are summarized and the extension to the more complex nucleation process is su...
JTpack90: A parallel, object-based, Fortran 90 linear algebra package

Energy Technology Data Exchange (ETDEWEB)

Turner, J.A.; Kothe, D.B. [Los Alamos National Lab., NM (United States)]; Ferrell, R.C. [Cambridge Power Computing Associates, Ltd., Brookline, MA (United States)]

1997-03-01

The authors have developed an object-based linear algebra package, currently with emphasis on sparse Krylov methods, driven primarily by needs of the Los Alamos National Laboratory parallel unstructured-mesh casting simulation tool Telluride. Support for a number of sparse storage formats, methods, and preconditioners has been implemented, driven primarily by application needs. They describe the object-based Fortran 90 approach, which enhances maintainability, performance, and extensibility, the parallelization approach using a new portable gather/scatter library (PGSLib), current capabilities and future plans, and present preliminary performance results on a variety of platforms.
Transmission Index Research of Parallel Manipulators Based on Matrix Orthogonal Degree

Science.gov (United States)

Shao, Zhu-Feng; Mo, Jiao; Tang, Xiao-Qiang; Wang, Li-Ping

2017-11-01

Performance index is the standard of performance evaluation, and is the foundation of both performance analysis and optimal design for the parallel manipulator. Seeking the suitable kinematic indices is always an important and challenging issue for the parallel manipulator. So far, there are extensive studies in this field, but few existing indices can meet all the requirements, such as simple, intuitive, and universal. To solve this problem, the matrix orthogonal degree is adopted, and generalized transmission indices that can evaluate motion/force transmissibility of fully parallel manipulators are proposed. Transmission performance analysis of typical branches, end effectors, and parallel manipulators is given to illustrate proposed indices and analysis methodology. Simulation and analysis results reveal that proposed transmission indices possess significant advantages, such as normalized finite (ranging from 0 to 1), dimensionally homogeneous, frame-free, intuitive and easy to calculate. Besides, proposed indices well indicate the good transmission region and relativity to the singularity with better resolution than the traditional local conditioning index, and provide a novel tool for kinematic analysis and optimal design of fully parallel manipulators.
A note on the nucleation with multiple steps: parallel and series nucleation.

Science.gov (United States)

Iwamatsu, Masao

2012-01-28

Parallel and series nucleation are the basic elements of the complex nucleation process when two saddle points exist on the free-energy landscape. It is pointed out that the nucleation rates follow formulas similar to those of parallel and series connection of resistors or conductors in an electric circuit. Necessary formulas to calculate individual nucleation rates at the saddle points and the total nucleation rate are summarized, and the extension to the more complex nucleation process is suggested. © 2012 American Institute of Physics
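The circuit analogy can be written out as follows. This is a sketch under the assumption that the nucleation rates play the role of conductances; the symbols $J_1$, $J_2$ for the rates at the two saddle points and $J_{\text{parallel}}$, $J_{\text{series}}$ for the totals are chosen here for illustration and may differ from the paper's notation.

```latex
% Parallel nucleation: two independent pathways, rates add like conductances in parallel
J_{\text{parallel}} = J_1 + J_2

% Series nucleation: two successive steps, rates combine like conductances in series
\frac{1}{J_{\text{series}}} = \frac{1}{J_1} + \frac{1}{J_2}
\quad\Longrightarrow\quad
J_{\text{series}} = \frac{J_1 J_2}{J_1 + J_2}
```

In the series case the total rate is dominated by the slower step: if $J_1 \ll J_2$, then $J_{\text{series}} \approx J_1$.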
High-performance file I/O in Java : existing approaches and bulk I/O extensions.

Energy Technology Data Exchange (ETDEWEB)

Bonachea, D.; Dickens, P.; Thakur, R.; Mathematics and Computer Science; Univ. of California at Berkeley; Illinois Institute of Technology

2001-07-01

There is a growing interest in using Java as the language for developing high-performance computing applications. To be successful in the high-performance computing domain, however, Java must not only be able to provide high computational performance, but also high-performance I/O. In this paper, we first examine several approaches that attempt to provide high-performance I/O in Java - many of which are not obvious at first glance - and evaluate their performance on two parallel machines, the IBM SP and the SGI Origin2000. We then propose extensions to the Java I/O library that address the deficiencies in the Java I/O API and improve performance dramatically. The extensions add bulk (array) I/O operations to Java, thereby removing much of the overhead currently associated with array I/O in Java. We have implemented the extensions in two ways: in a standard JVM using the Java Native Interface (JNI) and in a high-performance parallel dialect of Java called Titanium. We describe the two implementations and present performance results that demonstrate the benefits of the proposed extensions.
Oxytocin: parallel processing in the social brain?

Science.gov (United States)

Dölen, Gül

2015-06-01

Early studies attempting to disentangle the network complexity of the brain exploited the accessibility of sensory receptive fields to reveal circuits made up of synapses connected both in series and in parallel. More recently, extension of this organisational principle beyond the sensory systems has been made possible by the advent of modern molecular, viral and optogenetic approaches. Here, evidence supporting parallel processing of social behaviours mediated by oxytocin is reviewed. Understanding oxytocinergic signalling from this perspective has significant implications for the design of oxytocin-based therapeutic interventions aimed at disorders such as autism, where disrupted social function is a core clinical feature. Moreover, identification of opportunities for novel technology development will require a better appreciation of the complexity of the circuit-level organisation of the social brain. © 2015 The Authors. Journal of Neuroendocrinology published by John Wiley & Sons Ltd on behalf of British Society for Neuroendocrinology.
Capacity Analysis for Parallel Runway through Agent-Based Simulation

Directory of Open Access Journals (Sweden)

Yang Peng

2013-01-01

Parallel runways are the mainstream structure of hub airports in China; the runway is often the bottleneck of an airport, and the evaluation of its capacity is of great importance to airport management. This study outlines a model, multiagent architecture, implementation approach, and software prototype of a simulation system for evaluating runway capacity. Agent Unified Modeling Language (AUML) is applied to illustrate the inbound and departing procedures of planes and to design the agent-based model. The model is evaluated experimentally, and its quality is studied in comparison with models created with SIMMOD and Arena. The results seem to be highly efficient, so the method can be applied to parallel runway capacity evaluation, and the model offers favorable flexibility and extensibility.
Application of Pfortran and Co-Array Fortran in the Parallelization of the GROMOS96 Molecular Dynamics Module

Directory of Open Access Journals (Sweden)

Piotr Bała

2001-01-01

After at least a decade of parallel tool development, parallelization of scientific applications remains a significant undertaking. Typically parallelization is a specialized activity supported only partially by the programming tool set, with the programmer involved with parallel issues in addition to sequential ones. The details of concern range from algorithm design down to low-level data movement details. The aim of parallel programming tools is to automate the latter without sacrificing performance and portability, allowing the programmer to focus on algorithm specification and development. We present our use of two similar parallelization tools, Pfortran and Cray's Co-Array Fortran, in the parallelization of the GROMOS96 molecular dynamics module. Our parallelization started from the GROMOS96 distribution's shared-memory implementation of the replicated algorithm, but used little of that existing parallel structure. Consequently, our parallelization was close to starting with the sequential version. We found the intuitive extensions to Pfortran and Co-Array Fortran helpful in the rapid parallelization of the project. We present performance figures for both the Pfortran and Co-Array Fortran parallelizations showing linear speedup within the range expected by these parallelization methods.
penORNL: a parallel Monte Carlo photon and electron transport package using PENELOPE

International Nuclear Information System (INIS)

Bekar, Kursat B.; Miller, Thomas Martin; Patton, Bruce W.; Weber, Charles F.

2015-01-01

The parallel Monte Carlo photon and electron transport code package penORNL was developed at Oak Ridge National Laboratory to enable advanced scanning electron microscope (SEM) simulations on high-performance computing systems. This paper discusses the implementations, capabilities and parallel performance of the new code package. penORNL uses PENELOPE for its physics calculations and provides all available PENELOPE features to the users, as well as some new features including source definitions specifically developed for SEM simulations, a pulse-height tally capability for detailed simulations of gamma and x-ray detectors, and a modified interaction forcing mechanism to enable accurate energy deposition calculations. The parallel performance of penORNL was extensively tested with several model problems, and very good linear parallel scaling was observed with up to 512 processors. penORNL, along with its new features, will be available for SEM simulations upon completion of the new pulse-height tally implementation.
A highly scalable massively parallel fast marching method for the Eikonal equation

Science.gov (United States)

Yang, Jianming; Stern, Frederick

2017-03-01

The fast marching method is a widely used numerical method for solving the Eikonal equation arising from a variety of scientific and engineering fields. It is long deemed inherently sequential and an efficient parallel algorithm applicable to large-scale practical applications is not available in the literature. In this study, we present a highly scalable massively parallel implementation of the fast marching method using a domain decomposition approach. Central to this algorithm is a novel restarted narrow band approach that coordinates the frequency of communications and the amount of computations extra to a sequential run for achieving an unprecedented parallel performance. Within each restart, the narrow band fast marching method is executed; simple synchronous local exchanges and global reductions are adopted for communicating updated data in the overlapping regions between neighboring subdomains and getting the latest front status, respectively. The independence of front characteristics is exploited through special data structures and augmented status tags to extract the masked parallelism within the fast marching method. The efficiency, flexibility, and applicability of the parallel algorithm are demonstrated through several examples. These problems are extensively tested on six grids with up to 1 billion points using different numbers of processes ranging from 1 to 65536. Remarkable parallel speedups are achieved using tens of thousands of processes. Detailed pseudo-codes for both the sequential and parallel algorithms are provided to illustrate the simplicity of the parallel implementation and its similarity to the sequential narrow band fast marching algorithm.
Expressiveness modulo Bisimilarity of Regular Expressions with Parallel Composition (Extended Abstract)

Directory of Open Access Journals (Sweden)

Jos C. M. Baeten

2010-11-01

The languages accepted by finite automata are precisely the languages denoted by regular expressions. In contrast, finite automata may exhibit behaviours that cannot be described by regular expressions up to bisimilarity. In this paper, we consider extensions of the theory of regular expressions with various forms of parallel composition and study the effect on expressiveness. First we prove that adding pure interleaving to the theory of regular expressions strictly increases its expressiveness up to bisimilarity. Then, we prove that replacing the operation for pure interleaving by ACP-style parallel composition gives a further increase in expressiveness. Finally, we prove that the theory of regular expressions with ACP-style parallel composition and encapsulation is expressive enough to express all finite automata up to bisimilarity. Our results extend the expressiveness results obtained by Bergstra, Bethke and Ponse for process algebras with (the binary variant of) Kleene's star operation.
A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

Science.gov (United States)

Jones, Mark Howard

1987-01-01

A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
BCD/CPS: An event-level GEANT3 parallelization via CPS

International Nuclear Information System (INIS)

Roberts, L.A.

1991-04-01

BCD/CPS is an implementation of the Bottom Collider Detector GEANT3 simulation for CPS processor ranches. BCD/CPS demonstrates some of the capabilities of event-parallel applications applicable to current SSC detector simulations using the CPS and CZ/CPS communications protocols. Design, implementation and usage of the BCD/CPS simulation are presented along with extensive source listings for novice GEANT3/CPS programmers. 11 refs
Contributions for the optimization of the extensibility of parallel programming of turbulent plasmas

International Nuclear Information System (INIS)

Rozar, F.

2015-01-01

The work carried out in this thesis focuses on the optimization of the Gysela code, which simulates plasma turbulence. Optimization of a scientific application concerns mainly one of the three following points: 1) the simulation of larger meshes, 2) the reduction of computing time and 3) the enhancement of the computation accuracy. The first part of this manuscript presents the contributions relative to the simulation of larger meshes. As with many simulation codes, getting more realistic simulations often amounts to refining the meshes. The finer the mesh, the larger the memory consumption. Moreover, in recent years, supercomputers have tended to provide less and less memory per computing core. For these reasons, we have developed a library, libMTM (Modeling and Tracing Memory), dedicated to studying precisely the memory consumption of parallel software. The libMTM tools allowed us to reduce the memory consumption of Gysela and to study its scalability. As far as we know, no other tool provides equivalent features for studying memory scalability. The second part of the manuscript presents the work relative to the optimization of the computation time and the improvement of accuracy of the gyro-average operator. This operator is a cornerstone of the gyrokinetic model used by the Gysela application. The improvement of accuracy stems from a change in the computing method: a scheme based on a 2D Hermite interpolation replaces the Padé approximation. Although the new version of the gyro-average operator is more accurate, it is also more expensive in computation time than the former one. In order to keep the simulation time reasonable, different optimizations have been performed on the new computing method to make it competitive. Finally, we have developed an MPI-parallelized version of the new gyro-average operator. The good scalability of this new gyro-average computer will allow, eventually, a reduction
Practical parallel computing

CERN Document Server

Morse, H Stephen

1994-01-01

Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment. Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi
Loading Variables on Implant-Supported Distal-Extension Removable Partial Dentures: An In Vitro Pilot Study.

Science.gov (United States)

Hirata, Kiyotaka; Takahashi, Toshihito; Tomita, Akiko; Gonda, Tomoya; Maeda, Yoshinobu

2016-01-01

The aim of this study was to investigate strain on implants used for adjunctive support of distal extension removable partial dentures. An implant with strain gauges was used for testing purposes in two positions, parallel and inclined. Three loading scenarios--loading apparatus (LA), artificial teeth via cotton roll (CR), and artificial teeth (UT)--were studied and strains compared via the Kruskal-Wallis test (P=.05). Strain under CR was significantly larger than UT in parallel (P<.05). However, the opposite was observed in inclined. Strain in parallel was smallest for UT, whereas in inclined it was largest for CR.
Parallel rendering

Science.gov (United States)

Crockett, Thomas W.

1995-01-01

This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
Parallel computations

CERN Document Server

1982-01-01

Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed. Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

Parallel sorting algorithms

CERN Document Server

Akl, Selim G

1985-01-01

Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms, such as linear arrays, mesh-connected computers, cube-connected computers. Another example where algorithm can be applied is on the shared-memory SIMD (single instruction stream multiple data stream) computers in which the whole sequence to be sorted can fit in the
A parallel additive Schwarz preconditioned Jacobi-Davidson algorithm for polynomial eigenvalue problems in quantum dot simulation

International Nuclear Information System (INIS)

Hwang, F-N; Wei, Z-H; Huang, T-M; Wang Weichung

2010-01-01

We develop a parallel Jacobi-Davidson approach for finding a partial set of eigenpairs of large sparse polynomial eigenvalue problems with application in quantum dot simulation. A Jacobi-Davidson eigenvalue solver is implemented based on the Portable, Extensible Toolkit for Scientific Computation (PETSc). The eigensolver thus inherits PETSc's efficient and various parallel operations, linear solvers, preconditioning schemes, and easy usages. The parallel eigenvalue solver is then used to solve higher degree polynomial eigenvalue problems arising in numerical simulations of three dimensional quantum dots governed by Schroedinger's equations. We find that the parallel restricted additive Schwarz preconditioner in conjunction with a parallel Krylov subspace method (e.g. GMRES) can solve the correction equations, the most costly step in the Jacobi-Davidson algorithm, very efficiently in parallel. Besides, the overall performance is quite satisfactory. We have observed near perfect superlinear speedup by using up to 320 processors. The parallel eigensolver can find all target interior eigenpairs of a quintic polynomial eigenvalue problem with more than 32 million variables within 12 minutes by using 272 Intel 3.0 GHz processors.
Xyce parallel electronic simulator : users' guide. Version 5.1.

Energy Technology Data Exchange (ETDEWEB)

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

2009-11-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a
Xyce Parallel Electronic Simulator : users' guide, version 4.1.

Energy Technology Data Exchange (ETDEWEB)

Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

2009-02-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a
Arc-parallel extension and fluid flow in an ancient accretionary wedge: The San Juan Islands, Washington

Science.gov (United States)

Schermer, Elizabeth R.; Gillaspy, J.R.; Lamb, R.

2007-01-01

Structural analysis of the Lopez Structural Complex, a major Late Cretaceous terrane-bounding fault zone in the San Juan thrust system, reveals a sequence of events that provides insight into accretionary wedge mechanics and regional tectonics. After formation of regional ductile flattening and shear-related fabrics, the area was crosscut by brittle structures including: (1) southwest-vergent thrusts, (2) extension veins and normal faults related to northwest-southeast extension, and (3) conjugate strike-slip structures that record northwest-southeast extension and northeast-southwest shortening. Aragonite-bearing veins are associated with thrust and normal faults, but only rarely with strike-slip faults. High-pressure, low-temperature (HP-LT) minerals constrain the conditions for brittle deformation to ~20 km and formed in an accretionary prism during active subduction, which suggests that these brittle structures record internal wedge deformation at depth and early during uplift of the San Juan nappes. The structures are consistent with orogen-normal shortening and vertical thickening followed by vertical thinning and along-strike extension. The kinematic evolution may be related initially to changes in wedge strength, followed by response to overthickening of the wedge in an unbuttressed, obliquely convergent setting. The change in vein mineralogy indicates that exhumation occurred prior to the strike-slip event. The pressure and temperature conditions and spatial and temporal extent of small faults associated with fluid flow suggest a link between these structures and the silent earthquake process. © 2007 Geological Society of America.
Parallel MR imaging.

Science.gov (United States)

Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole

2012-07-01

Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.
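The link between undersampling and aliasing mentioned above can be seen in a one-dimensional toy example. The sketch below uses a plain discrete Fourier transform as a stand-in for k-space, keeps every `factor`-th sample, and reconstructs from the kept samples alone; the function names are hypothetical, and no SENSE or GRAPPA unfolding is attempted.

```python
import cmath


def dft(samples):
    """Forward discrete Fourier transform (the toy 'k-space')."""
    n = len(samples)
    return [
        sum(samples[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
        for j in range(n)
    ]


def undersampled_reconstruction(signal, factor=2):
    """Keep every `factor`-th k-space sample and invert the reduced data.
    The signal length must be divisible by `factor`."""
    if len(signal) % factor:
        raise ValueError("signal length must be divisible by factor")
    kept = dft(signal)[::factor]
    return [value.real for value in idft(kept)]


if __name__ == "__main__":
    signal = [0, 1, 2, 3, 0, 0, 5, 0]
    # The two halves of the signal fold on top of each other.
    print([round(v, 6) for v in undersampled_reconstruction(signal)])
```

With a factor of 2 the reconstructed field of view is halved and each output point is the sum of two points of the original signal, which is the aliasing that a parallel imaging reconstruction has to undo using coil sensitivity information.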
Automatic Thread-Level Parallelization in the Chombo AMR Library

Energy Technology Data Exchange (ETDEWEB)

Christen, Matthias; Keen, Noel; Ligocki, Terry; Oliker, Leonid; Shalf, John; Van Straalen, Brian; Williams, Samuel

2011-05-26

The increasing on-chip parallelism has some substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite difference type PDE solvers. In Chombo, core algorithms are specified in the ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language forms an already used target language for an automatic migration of the large number of existing algorithms into a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.
A Fast, High Quality, and Reproducible Parallel Lagged-Fibonacci Pseudorandom Number Generator

Science.gov (United States)

Mascagni, Michael; Cuccaro, Steven A.; Pryor, Daniel V.; Robinson, M. L.

1995-07-01

We study the suitability of the additive lagged-Fibonacci pseudo-random number generator for parallel computation. This generator has a relatively short period with respect to the size of its seed. However, the short period is more than made up for by the huge number of full-period cycles it contains. These different full-period cycles are called equivalence classes. We show how to enumerate the equivalence classes and how to compute seeds to select a given equivalence class. In addition, we present some theoretical measures of quality for this generator when used in parallel. Next, we conjecture on the size of these measures of quality for this generator. Extensive empirical evidence supports this conjecture. In addition, a probabilistic interpretation of these measures leads to another conjecture similarly supported by empirical evidence. Finally, we give an explicit parallelization suitable for a fully reproducible asynchronous MIMD implementation.
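For readers unfamiliar with the generator, a minimal sketch of the additive lagged-Fibonacci recurrence x[n] = (x[n-j] + x[n-k]) mod 2^m follows. The class name, the lag pair (5, 17) and the 32-bit word size are assumptions chosen for illustration. The seeding procedure that selects a particular equivalence class is the paper's contribution and is not reproduced here.

```python
from collections import deque

class AdditiveLaggedFibonacci:
    """x[n] = (x[n - short_lag] + x[n - long_lag]) mod 2**bits."""

    def __init__(self, seed_words, short_lag=5, long_lag=17, bits=32):
        if len(seed_words) != long_lag:
            raise ValueError("seed must contain exactly long_lag words")
        if not any(w & 1 for w in seed_words):
            # An all-even seed stays even forever and wastes the low bit.
            raise ValueError("at least one seed word must be odd")
        self.mask = (1 << bits) - 1
        self.short_lag = short_lag
        # The deque holds x[n - long_lag] ... x[n - 1], oldest first.
        self.state = deque((w & self.mask for w in seed_words), maxlen=long_lag)

    def next(self):
        # state[0] is x[n - long_lag]; state[-short_lag] is x[n - short_lag].
        value = (self.state[-self.short_lag] + self.state[0]) & self.mask
        self.state.append(value)  # the oldest word drops out automatically
        return value

# Hypothetical seed: 17 small integers, several of them odd.
seed = list(range(1, 18))
gen = AdditiveLaggedFibonacci(seed)
stream = [gen.next() for _ in range(20)]
```

The whole state is the seed, so giving each process its own seed gives each process its own stream. Whether two seeds lie on different full-period cycles is exactly what the enumeration of equivalence classes in the paper decides.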
High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. Research Report. ETS RR-16-34

Science.gov (United States)

von Davier, Matthias

2016-01-01

This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…
A Parallel Approach for Frequent Subgraph Mining in a Single Large Graph Using Spark

Directory of Open Access Journals (Sweden)

Fengcai Qiao

2018-02-01

Frequent subgraph mining (FSM) plays an important role in graph mining, attracting a great deal of attention in many areas, such as bioinformatics, web data mining and social networks. In this paper, we propose SSiGraM (Spark based Single Graph Mining), a Spark-based parallel frequent subgraph mining algorithm for a single large graph. Aiming to address the two computational challenges of FSM, we conduct the subgraph extension and support evaluation in parallel across all the distributed cluster worker nodes. In addition, we employ a heuristic search strategy and three novel optimizations in the support evaluation process: load balancing, pre-search pruning and top-down pruning, which significantly improve the performance. Extensive experiments with four different real-world datasets demonstrate that the proposed algorithm outperforms the existing GraMi (Graph Mining) algorithm by an order of magnitude for all datasets and can work with a lower support threshold.
Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

Directory of Open Access Journals (Sweden)

Hari Radhakrishnan

2015-01-01

This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
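The replacement of a sequential summation by a binary tree reduction can be sketched as follows. This is a serial Python illustration of the communication pattern only, not the authors' coarray Fortran code; the function name and the choice of one partial value per image are assumptions.

```python
def tree_sum(partials):
    """Sum one partial value per image in ceil(log2(n)) pairwise rounds.

    `partials` must be non-empty. Returns (total, number_of_rounds).
    """
    values = list(partials)
    rounds = 0
    stride = 1
    while stride < len(values):
        # Within a round every addition touches a different pair of
        # slots, so on a real machine they could all run concurrently.
        for i in range(0, len(values) - stride, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
        rounds += 1
    return values[0], rounds

# Hypothetical partial sums held by 32 images.
total, rounds = tree_sum(range(1, 33))
```

With 32 partial values the tree needs 5 rounds, whereas a sequential accumulation on one image needs 31 dependent additions. That difference is the kind of bottleneck the abstract attributes to the original summation procedure.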
Optimal task mapping in safety-critical real-time parallel systems; Placement optimal de taches pour les systemes paralleles temps-reel critiques

Energy Technology Data Exchange (ETDEWEB)

Aussagues, Ch

1998-12-11

This PhD thesis deals with the correct design of safety-critical real-time parallel systems. Such systems constitute a fundamental part of high-performance systems for command and control that can be found in the nuclear domain or more generally in parallel embedded systems. The verification of their temporal correctness is the core of this thesis. Our contribution is mainly in the following three points: the analysis and extension of a programming model for such real-time parallel systems; the proposal of an original method based on a new operator of synchronized product of state-machine task graphs; the validation of the approach by its implementation and evaluation. The work addresses particularly the main problem of optimal task mapping on a parallel architecture, such that the temporal constraints are globally guaranteed, i.e. the timeliness property is valid. The results also incorporate optimality criteria for the sizing and correct dimensioning of a parallel system, for instance in the number of processing elements. These criteria are connected with operational constraints of the application domain. Our approach is based on the off-line analysis of the feasibility of the deadline-driven dynamic scheduling that is used to schedule tasks inside one processor. This leads us to define the synchronized product; a system of linear constraints is automatically generated, which then allows us to calculate a maximum load of a group of tasks and to verify their timeliness constraints. The communications, their timeliness verification and their incorporation into the mapping problem are the second main contribution of this thesis. Finally, the global solving technique dealing with both task and communication aspects has been implemented and evaluated in the framework of the OASIS project in the LETI research center at the CEA/Saclay. (author) 96 refs.
Analysis of Retransmission Policies for Parallel Data Transmission

Directory of Open Access Journals (Sweden)

I. A. Halepoto

2018-06-01

Stream control transmission protocol (SCTP) is a transport layer protocol, which is efficient, reliable, and connection-oriented as compared to transmission control protocol (TCP) and user datagram protocol (UDP). Additionally, SCTP has more innovative features like multihoming, multistreaming and unordered delivery. With multihoming, SCTP establishes multiple paths between a sender and receiver. However, it only uses the primary path for data transmission and the secondary path (or paths) for fault tolerance. The concurrent multipath transfer extension of SCTP (CMT-SCTP) allows a sender to transmit data in parallel over multiple paths, which increases the overall transmission throughput. Parallel data transmission is beneficial for higher data rates. Parallel transmission is also useful in services such as video streaming where, if one connection suffers errors, the transmission continues on alternate links. With parallel transmission, out-of-order arrival of data packets is very common at the receiver. The receiver has to wait until the missing data packets arrive, causing performance degradation while using CMT-SCTP. In order to reduce the transmission delay at the receiver, CMT-SCTP uses intelligent retransmission policies to immediately retransmit the missing packets. The retransmission policies used by CMT-SCTP are RTX-SSTHRESH, RTX-LOSSRATE and RTX-CWND. The main objective of this paper is the performance analysis of these retransmission policies. Simulations are performed in Network Simulator 2. In the simulations with various scenarios and parameters, it is observed that RTX-LOSSRATE is a suitable policy.
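The abstract names the three policies without defining them. The sketch below assumes the reading usually given in the CMT literature: retransmit on the path with the largest congestion window (RTX-CWND), the largest slow-start threshold (RTX-SSTHRESH), or the lowest estimated loss rate (RTX-LOSSRATE). The `Path` record, its field names and the sample values are hypothetical, and tie-breaking is not modelled.

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    cwnd: int         # congestion window, bytes
    ssthresh: int     # slow-start threshold, bytes
    loss_rate: float  # estimated fraction of packets lost

def pick_retransmission_path(paths, policy):
    """Choose the destination path for a retransmission (assumed semantics)."""
    if policy == "RTX-CWND":
        return max(paths, key=lambda p: p.cwnd)
    if policy == "RTX-SSTHRESH":
        return max(paths, key=lambda p: p.ssthresh)
    if policy == "RTX-LOSSRATE":
        return min(paths, key=lambda p: p.loss_rate)
    raise ValueError(f"unknown policy: {policy}")

# Hypothetical state of a two-path association.
paths = [
    Path("primary", cwnd=12000, ssthresh=8000, loss_rate=0.05),
    Path("secondary", cwnd=6000, ssthresh=16000, loss_rate=0.01),
]
chosen = {p: pick_retransmission_path(paths, p).name
          for p in ("RTX-CWND", "RTX-SSTHRESH", "RTX-LOSSRATE")}
```

In this made-up state the three policies disagree: the window-based policy picks the primary path, while the threshold-based and loss-based policies pick the secondary one. Simulation studies such as the one summarized above compare the policies on cases like this.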
A SPECT reconstruction method for extending parallel to non-parallel geometries

International Nuclear Information System (INIS)

Wen Junhai; Liang Zhengrong

2010-01-01

Due to its simplicity, parallel-beam geometry is usually assumed for the development of image reconstruction algorithms. The established reconstruction methodologies are then extended to fan-beam, cone-beam and other non-parallel geometries for practical application. This situation occurs for quantitative SPECT (single photon emission computed tomography) imaging in inverting the attenuated Radon transform. Novikov reported an explicit parallel-beam formula for the inversion of the attenuated Radon transform in 2000. Thereafter, a formula for fan-beam geometry was reported by Bukhgeim and Kazantsev (2002 Preprint N. 99 Sobolev Institute of Mathematics). At the same time, we presented a formula for varying focal-length fan-beam geometry. Sometimes, the reconstruction formula is so implicit that we cannot obtain the explicit reconstruction formula in the non-parallel geometries. In this work, we propose a unified reconstruction framework for extending parallel-beam geometry to any non-parallel geometry using ray-driven techniques. Studies by computer simulations demonstrated the accuracy of the presented unified reconstruction framework for extending parallel-beam to non-parallel geometries in inverting the attenuated Radon transform.
A general exact method for synthesizing parallel-beam projections from cone-beam projections via filtered backprojection

International Nuclear Information System (INIS)

Li Liang; Chen Zhiqiang; Xing Yuxiang; Zhang Li; Kang Kejun; Wang Ge

2006-01-01

In recent years, image reconstruction methods for cone-beam computed tomography (CT) have been extensively studied. However, few of these studies discussed computing parallel-beam projections from cone-beam projections. In this paper, we focus on the exact synthesis of complete or incomplete parallel-beam projections from cone-beam projections. First, an extended central slice theorem is described to establish a relationship between the Radon space and the Fourier space. Then, data sufficiency conditions are proposed for computing parallel-beam projection data from cone-beam data. Using these results, a general filtered backprojection algorithm is formulated that can exactly synthesize parallel-beam projection data from cone-beam projection data. As an example, we prove that parallel-beam projections can be exactly synthesized in an angular range in the case of circular cone-beam scanning. Interestingly, this angular range is larger than that derived in the Feldkamp reconstruction framework. Numerical experiments are performed in the circular scanning case to verify our method
The language parallel Pascal and other aspects of the massively parallel processor

Science.gov (United States)

Reeves, A. P.; Bruner, J. D.

1982-01-01

A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
Parallel Atomistic Simulations

Energy Technology Data Exchange (ETDEWEB)

HEFFELFINGER,GRANT S.

2000-01-18

Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
N-body simulation for self-gravitating collisional systems with a new SIMD instruction set extension to the x86 architecture, Advanced Vector eXtensions

Science.gov (United States)

Tanikawa, Ataru; Yoshikawa, Kohji; Okamoto, Takashi; Nitadori, Keigo

2012-02-01

We present a high-performance N-body code for self-gravitating collisional systems accelerated with the aid of a new SIMD instruction set extension of the x86 architecture: Advanced Vector eXtensions (AVX), an enhanced version of the Streaming SIMD Extensions (SSE). With one processor core of Intel Core i7-2600 processor (8 MB cache and 3.40 GHz) based on Sandy Bridge micro-architecture, we implemented a fourth-order Hermite scheme with individual timestep scheme (Makino and Aarseth, 1992), and achieved the performance of ~20 giga floating point number operations per second (GFLOPS) for double-precision accuracy, which is two times and five times higher than that of the previously developed code implemented with the SSE instructions (Nitadori et al., 2006b), and that of a code implemented without any explicit use of SIMD instructions with the same processor core, respectively. We have parallelized the code by using so-called NINJA scheme (Nitadori et al., 2006a), and achieved ~90 GFLOPS for a system containing more than N = 8192 particles with 8 MPI processes on four cores. We expect to achieve about 10 tera FLOPS (TFLOPS) for a self-gravitating collisional system with N ~ 10^5 on massively parallel systems with at most 800 cores with Sandy Bridge micro-architecture. This performance will be comparable to that of Graphic Processing Unit (GPU) cluster systems, such as the one with about 200 Tesla C1070 GPUs (Spurzem et al., 2010). This paper offers an alternative to collisional N-body simulations with GRAPEs and GPUs.
Parallel integer sorting with medium and fine-scale parallelism

Science.gov (United States)

Dagum, Leonardo

1993-01-01

Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
Effective damping for SSR analysis of parallel turbine-generators

International Nuclear Information System (INIS)

Agrawal, B.L.; Farmer, R.G.

1988-01-01

Damping is a dominant parameter in studies to determine SSR problem severity and countermeasure requirements. To reach valid conclusions for multi-unit plants, it is essential that the net effective damping of unequally loaded units be known. For the Palo Verde Nuclear Generating Station, extensive testing and analysis have been performed to verify and develop an accurate means of determining the effective damping of unequally loaded units in parallel. This has led to a unique and simple algorithm which correlates well with two other analytic techniques

Interactive animation of fault-tolerant parallel algorithms

Energy Technology Data Exchange (ETDEWEB)

Apgar, S.W.

1992-02-01

Animation of algorithms makes understanding them intuitively easier. This paper describes the software tool Raft (Robust Animator of Fault Tolerant Algorithms). The Raft system allows the user to animate a number of parallel algorithms which achieve fault tolerant execution. In particular, we use it to illustrate the key Write-All problem. It has an extensive user-interface which allows a choice of the number of processors, the number of elements in the Write-All array, and the adversary to control the processor failures. The novelty of the system is that the interface allows the user to create new on-line adversaries as the algorithm executes.
Fast parallel algorithm for CT image reconstruction.

Science.gov (United States)

Flores, Liubov A; Vidal, Vicent; Mayo, Patricia; Rodenas, Francisco; Verdú, Gumersindo

2012-01-01

In X-ray computed tomography (CT), X-rays are used to obtain the projection data needed to generate an image of the inside of an object. The image can be generated with different techniques. Iterative methods are more suitable for the reconstruction of images with high contrast and precision in noisy conditions and from a small number of projections. Their use may be important in portable scanners for their functionality in emergency situations. However, in practice, these methods are not widely used due to the high computational cost of their implementation. In this work we analyze iterative parallel image reconstruction with the Portable, Extensible Toolkit for Scientific Computation (PETSc).
Evaluation of thermal performance of all-GaN power module in parallel operation

International Nuclear Information System (INIS)

Chou, Po-Chien; Cheng, Stone; Chen, Szu-Hao

2014-01-01

This work presents an extensive thermal characterization of a single discrete GaN high-electron-mobility transistor (HEMT) device when operated in parallel at temperatures of 25 °C–175 °C. The maximum drain current (I_D,max), on-resistance (R_ON), pinch-off voltage (V_P) and peak transconductance (g_m) at various chamber temperatures are measured and correlations among these parameters studied. Understanding the dependence of key transistor parameters on temperature is crucial to inhibiting the generation of hot spots and to equalizing currents in the parallel operation of HEMTs. A detailed analysis of the current imbalance between two parallel HEMT cells and its consequential effect on the junction temperature is also presented. The results from variations in the characteristics of the parallel-connected devices further verify that the thermal stability and switching behavior of these cells are balanced. Two parallel HEMT cells are operated at a safe working distance from thermal runaway to prevent destruction of the hottest cell. Highlights: • This work reveals the sorting process of GaN devices for parallel operation. • The variations of I_D,max, R_ON, V_P, and g_m with temperature are established. • The temperature-dependent parameters are crucial to preventing hot-spot generation. • Safe working operation prevents thermal runaway and destruction of the hottest cell.
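To make the notion of current imbalance concrete, the following sketch treats each conducting cell as a plain resistor equal to its on-resistance, so the shared current divides in proportion to conductance. This static model and all numerical values are assumptions for illustration. The measured temperature dependence reported in the paper is not modelled.

```python
def share_current(total_current, on_resistances):
    """Split a total current among parallel devices modelled as resistors."""
    conductances = [1.0 / r for r in on_resistances]
    total_conductance = sum(conductances)
    return [total_current * g / total_conductance for g in conductances]

# Hypothetical values: two cells whose R_ON differs by 20 %.
r_on = [0.10, 0.12]                      # ohms
currents = share_current(22.0, r_on)     # amperes
# Conduction loss per cell, P = I^2 * R, in watts.
losses = [i * i * r for i, r in zip(currents, r_on)]
```

With these numbers the lower-resistance cell carries about 12 A against about 10 A for the other, and therefore dissipates more power. This is why matching (sorting) devices and tracking how R_ON changes with temperature matter for parallel operation.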
About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems

Directory of Open Access Journals (Sweden)

Loredana MOCEAN

2009-01-01

In recent years, efforts have been made to delineate a stable and unified framework in which the problems of logical parallel processing can be solved, at least at the level of imperative languages. The results obtained so far do not match the effort invested. This paper aims to make a small contribution to these efforts. We give an overview of parallel programming, parallel execution and collaborative systems.
Parallel computing works!

CERN Document Server

Fox, Geoffrey C; Messina, Guiseppe C

2014-01-01

A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop
Experimental studies in a single-phase parallel channel natural circulation system. Preliminary results

International Nuclear Information System (INIS)

Bodkha, Kapil; Pilkhwal, D.S.; Jana, S.S.; Vijayan, P.K.

2016-01-01

Natural circulation systems find extensive applications in industrial engineering systems. One of the applications is in nuclear reactor where the decay heat is removed by natural circulation of the fluid under off-normal conditions. The upcoming reactor designs make use of natural circulation in order to remove the heat from core under normal operating conditions also. These reactors employ multiple vertical fuel channels with provision of on-power refueling/defueling. Natural circulation systems are relatively simple, safe and reliable when compared to forced circulation systems. However, natural circulation systems are prone to encounter flow instabilities which are highly undesirable for various reasons. Presence of parallel channels under natural circulation makes the system more complicated. To examine the behavior of parallel channel system, studies were carried out for single-phase natural circulation flow in a multiple vertical channel system. The objective of the present work is to study the flow behavior of the parallel heated channel system under natural circulation for different operating conditions. Steady state and transient studies have been carried out in a parallel channel natural circulation system with three heated channels. The paper brings out the details of the system considered, different cases analyzed and preliminary results of studies carried out on a single-phase parallel channel system.
Development of a parallelization strategy for the VARIANT code

International Nuclear Information System (INIS)

Hanebutte, U.R.; Khalil, H.S.; Palmiotti, G.; Tatsumi, M.

1996-01-01

The VARIANT code solves the multigroup steady-state neutron diffusion and transport equation in three-dimensional Cartesian and hexagonal geometries using the variational nodal method. VARIANT consists of four major parts that must be executed sequentially: input handling, calculation of response matrices, solution algorithm (i.e. inner-outer iteration), and output of results. The objective of the parallelization effort was to reduce the overall computing time by distributing the work of the two computationally intensive (sequential) tasks, the coupling coefficient calculation and the iterative solver, equally among a group of processors. This report describes the code's calculations and gives performance results on one of the benchmark problems used to test the code. The performance analysis in the IBM SPx system shows good efficiency for well-load-balanced programs. Even for relatively small problem sizes, respectable efficiencies are seen for the SPx. An extension to achieve a higher degree of parallelism will be addressed in future work. 7 refs., 1 tab
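Because only two of the four sequential parts are distributed, the achievable speedup is bounded by the time spent in the remaining parts. A minimal sketch using Amdahl's law makes this concrete. The parallel fraction of 0.9 is a made-up figure, not a measurement from the report, and the model ignores communication and load imbalance.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup when only `parallel_fraction` of the run time scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

def parallel_efficiency(parallel_fraction, processors):
    """Speedup divided by the number of processors used."""
    return amdahl_speedup(parallel_fraction, processors) / processors

# Hypothetical case: response matrices plus the iterative solver account
# for 90 % of the serial run time; input and output stay sequential.
speedup_8 = amdahl_speedup(0.9, 8)
efficiency_8 = parallel_efficiency(0.9, 8)
```

Under these assumptions eight processors give a speedup a little below 5 and an efficiency a little below 60 %. Larger problems, in which the two parallelized tasks dominate more strongly, would move the efficiency closer to 1.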
Parallel Auxiliary Space AMG Solver for $H(div)$ Problems

Energy Technology Data Exchange (ETDEWEB)

Kolev, Tzanio V. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Vassilevski, Panayot S. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2012-12-18

We present a family of scalable preconditioners for matrices arising in the discretization of $H(div)$ problems using the lowest order Raviart-Thomas finite elements. Our approach belongs to the class of "auxiliary space"-based methods and requires only the finite element stiffness matrix plus some minimal additional discretization information about the topology and orientation of mesh entities. Also, we provide a detailed algebraic description of the theory, parallel implementation, and different variants of this parallel auxiliary space divergence solver (ADS) and discuss its relations to the Hiptmair-Xu (HX) auxiliary space decomposition of $H(div)$ [SIAM J. Numer. Anal., 45 (2007), pp. 2483-2509] and to the auxiliary space Maxwell solver AMS [J. Comput. Math., 27 (2009), pp. 604-623]. Finally, an extensive set of numerical experiments demonstrates the robustness and scalability of our implementation on large-scale $H(div)$ problems with large jumps in the material coefficients.
New method for protection of parallel generator; Novo metodo para protecao do gerador em paralelismo

Energy Technology Data Exchange (ETDEWEB)

Silva, M R.C. da [Elfa-Seg Eletronica Ltda. (Brazil)

1988-07-01

The protection of synchronous machinery, especially generators operating in parallel with the associated electric power utility, has been extensively discussed, particularly because of the growing importance of co-generation in Brazil. This work discusses existing efficient methods and suggests new ways of providing this protection. 8 refs., 2 figs.
Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2014-08-12

Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Extension as expression of social responsibility for higher education

Directory of Open Access Journals (Sweden)

Ricardo Antonio de Marco

2016-05-01

The 2004 National System of Higher Education Assessment, in Axis 2 (Institutional Development) and its dimensions 1 and 3 (Mission and Institutional Development Plan (IDP), and the Social Responsibility of the institution), highlights the need for universities to incorporate into their activities teaching, research and extension practices that demonstrate their positive involvement in social development. In this sense, this article aims to evaluate how the practice of university extension contributes to the consolidation of University Social Responsibility. The method was descriptive research and documentary analysis of the institutional documents of the University of the West of Santa Catarina (UNOESC): its mission, vision and values; the Institutional Development Plan and the extension project University of Chapecó Best Age (UMIC); and the National System of Higher Education Evaluation. The analysis revealed that UNOESC, in its constitutive principles and official documents, values civic education oriented toward social inclusion. It was found that the consolidation of MSW necessarily involves management keeping a watchful eye on the principle of the indivisibility of teaching, research and extension, the core activities of universities, which, when not properly carried out, contravene the legal provision; and that the inter- and transdisciplinary nature of extension projects such as UMIC contributes strongly to the consolidation of MSW. In parallel, it became clear that extension projects like UMIC, in isolation, do not reach the fullness of the social commitment of universities, suggesting that inseparability is achieved through the incorporation of actions that promote social development.
Testing New Programming Paradigms with NAS Parallel Benchmarks

Science.gov (United States)

Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

2000-01-01

Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developed with the aim of scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years, new efforts have been made to define new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." To test these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage
Systematic approach for deriving feasible mappings of parallel algorithms to parallel computing platforms

NARCIS (Netherlands)

Arkin, Ethem; Tekinerdogan, Bedir; Imre, Kayhan M.

2017-01-01

The need for high-performance computing together with the increasing trend from single processor to parallel computer architectures has leveraged the adoption of parallel computing. To benefit from parallel computing power, usually parallel algorithms are defined that can be mapped and executed
Parallel efficient rate control methods for JPEG 2000

Science.gov (United States)

Martínez-del-Amor, Miguel Á.; Bruns, Volker; Sparenberg, Heiko

2017-09-01

Since the introduction of JPEG 2000, several rate control methods have been proposed. Among them, post-compression rate-distortion optimization (PCRD-Opt) is the most widely used, and the one recommended by the standard. The approach followed by this method is to first compress the entire image split in code blocks, and subsequently, optimally truncate the set of generated bit streams according to the maximum target bit rate constraint. The literature proposes various strategies on how to estimate ahead of time where a block will get truncated in order to stop the execution prematurely and save time. However, none of them have been defined bearing in mind a parallel implementation. Today, multi-core and many-core architectures are becoming popular for JPEG 2000 codecs implementations. Therefore, in this paper, we analyze how some techniques for efficient rate control can be deployed in GPUs. In order to do that, the design of our GPU-based codec is extended, allowing stopping the process at a given point. This extension also harnesses a higher level of parallelism on the GPU, leading to up to 40% of speedup with 4K test material on a Titan X. In a second step, three selected rate control methods are adapted and implemented in our parallel encoder. A comparison is then carried out, and used to select the best candidate to be deployed in a GPU encoder, which gave an extra 40% of speedup in those situations where it was really employed.
A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

Directory of Open Access Journals (Sweden)

Cieślik Marcin

2011-02-01

Background: Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results: To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy-evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain specific data-containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions: PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy also can be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and
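The batching trade-off described in this abstract can be illustrated with a minimal sketch. It does not use PaPy's actual API: `batched`, `run_pipeline`, and the two stage functions are hypothetical names, the pipeline is a simple linear chain rather than a general directed acyclic graph, and a local thread pool stands in for PaPy's pooled compute resources.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def run_pipeline(items, stages, batch_size=4, workers=2):
    """Lazily push `items` through `stages`, a list of one-argument functions.

    Only one batch is materialized at a time: a larger batch_size exposes
    more parallelism, a smaller one keeps evaluation lazier (less memory).
    """
    def apply_all(item):
        for stage in stages:
            item = stage(item)
        return item

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batched(items, batch_size):
            # pool.map returns results in input order within the batch.
            yield from pool.map(apply_all, batch)


def reverse_complement(seq):
    """Hypothetical stage: reverse-complement a DNA string."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def gc_fraction(seq):
    """Hypothetical stage: fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)
```

Calling `list(run_pipeline(sequences, [reverse_complement, gc_fraction], batch_size=2))` evaluates the two stages over the input two items at a time; the result is a generator, so nothing is computed until it is consumed.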
Kemari: A Portable High Performance Fortran System for Distributed Memory Parallel Processors

Directory of Open Access Journals (Sweden)

T. Kamachi

1997-01-01

We have developed a compilation system which extends High Performance Fortran (HPF) in various aspects. We support the parallelization of well-structured problems with loop distribution and alignment directives similar to HPF's data distribution directives. Such directives give both additional control to the user and simplify the compilation process. For the support of unstructured problems, we provide directives for dynamic data distribution through user-defined mappings. The compiler also allows integration of message-passing interface (MPI) primitives. The system is part of a complete programming environment which also comprises a parallel debugger and a performance monitor and analyzer. After an overview of the compiler, we describe the language extensions and related compilation mechanisms in detail. Performance measurements demonstrate the compiler's applicability to a variety of application classes.
Parallel algorithms for mapping pipelined and parallel computations

Science.gov (United States)

Nicol, David M.

1988-01-01

Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
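To make the mapping problem concrete, here is a minimal sketch of one well-known special case: assigning a chain of m modules to n processors as contiguous blocks so that the most heavily loaded processor is as light as possible. This is not the algorithm from the paper; it is a simple probe-and-bisect approach that assumes non-negative integer module weights, and the function names are hypothetical.

```python
def fits(weights, n_procs, cap):
    """Greedy probe: can the chain be cut into at most n_procs contiguous
    blocks, each with total weight <= cap?"""
    used, load = 1, 0
    for w in weights:
        if w > cap:
            return False          # a single module already exceeds the cap
        if load + w > cap:
            used += 1             # start a new block on the next processor
            load = w
        else:
            load += w
    return used <= n_procs


def min_bottleneck(weights, n_procs):
    """Smallest achievable maximum processor load for a contiguous mapping.

    Bisection over the integer bottleneck value; each probe is O(m).
    """
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(weights, n_procs, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For the weights `[4, 2, 7, 3, 5, 1]` on three processors, the cut `[4, 2] | [7] | [3, 5, 1]` gives a bottleneck of 9, and no contiguous mapping does better.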
Cpl6: The New Extensible, High-Performance Parallel Coupler for the Community Climate System Model

Energy Technology Data Exchange (ETDEWEB)

Craig, Anthony P.; Jacob, Robert L.; Kauffman, Brain; Bettge, Tom; Larson, Jay; Ong, Everest; Ding, Chris; He, Yun

2005-03-24

Coupled climate models are large, multiphysics applications designed to simulate the Earth's climate and predict the response of the climate to any changes in the forcing or boundary conditions. The Community Climate System Model (CCSM) is a widely used state-of-the-art climate model that has released several versions to the climate community over the past ten years. Like many climate models, CCSM employs a coupler, a functional unit that coordinates the exchange of data between parts of the climate system such as the atmosphere and ocean. This paper describes the new coupler, cpl6, contained in the latest version of CCSM, CCSM3. Cpl6 introduces distributed-memory parallelism to the coupler, a class library for important coupler functions, and a standardized interface for component models. Cpl6 is implemented entirely in Fortran90 and uses the Model Coupling Toolkit as the base for most of its classes. Cpl6 gives improved performance over previous versions and scales well on multiple platforms.
On program restructuring, scheduling, and communication for parallel processor systems

Energy Technology Data Exchange (ETDEWEB)

Polychronopoulos, Constantine D. [Univ. of Illinois, Urbana, IL (United States)]

1986-08-01

This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed focusing on a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler, was used to transform programs into a parallel form and to conduct experiments. Two new program restructuring techniques are presented: loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented.
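The idea behind loop coalescing can be sketched briefly: a nest of two loops is rewritten as a single loop over a flattened index, from which the original indices are recovered arithmetically, so that one iteration space can be divided among processors. The sketch below is an illustration of that general idea under the assumption of a rectangular, dependence-free nest; the function names are hypothetical and it is not taken from the dissertation.

```python
def nested(n, m):
    """Original form: a doubly nested loop over an n-by-m iteration space."""
    visited = []
    for i in range(n):
        for j in range(m):
            visited.append((i, j))
    return visited


def coalesced(n, m):
    """Coalesced form: one loop of n*m iterations; (i, j) is recovered
    from the flat index k by integer division and remainder."""
    visited = []
    for k in range(n * m):
        i, j = divmod(k, m)
        visited.append((i, j))
    return visited


def chunk(total, n_procs, rank):
    """Contiguous share of a coalesced loop for processor `rank`.

    Sizes differ by at most one iteration, which is the load-balancing
    benefit of scheduling a single flat loop.
    """
    base, extra = divmod(total, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)
```

With a 3-by-4 nest on five processors, `chunk(12, 5, r)` hands out 3, 3, 2, 2, and 2 iterations, whereas distributing only the outer loop of three iterations would leave two processors idle.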
Implementation of a Monte Carlo algorithm for neutron transport on a massively parallel SIMD machine

International Nuclear Information System (INIS)

Baker, R.S.

1992-01-01

We present some results from the recent adaptation of a vectorized Monte Carlo algorithm to a massively parallel architecture. The performance of the algorithm on a single processor Cray Y-MP and a Thinking Machines Corporation CM-2 and CM-200 is compared for several test problems. The results show that significant speedups are obtainable for vectorized Monte Carlo algorithms on massively parallel machines, even when the algorithms are applied to realistic problems which require extensive variance reduction. However, the architecture of the Connection Machine does place some limitations on the regime in which the Monte Carlo algorithm may be expected to perform well.

Implementation of a Monte Carlo algorithm for neutron transport on a massively parallel SIMD machine

International Nuclear Information System (INIS)

Baker, R.S.

1993-01-01

We present some results from the recent adaptation of a vectorized Monte Carlo algorithm to a massively parallel architecture. The performance of the algorithm on a single processor Cray Y-MP and a Thinking Machines Corporation CM-2 and CM-200 is compared for several test problems. The results show that significant speedups are obtainable for vectorized Monte Carlo algorithms on massively parallel machines, even when the algorithms are applied to realistic problems which require extensive variance reduction. However, the architecture of the Connection Machine does place some limitations on the regime in which the Monte Carlo algorithm may be expected to perform well. (orig.)
Stage-by-Stage and Parallel Flow Path Compressor Modeling for a Variable Cycle Engine

Science.gov (United States)

Kopasakis, George; Connolly, Joseph W.; Cheng, Larry

2015-01-01

This paper covers the development of stage-by-stage and parallel flow path compressor modeling approaches for a Variable Cycle Engine. The stage-by-stage compressor modeling approach is an extension of a technique for lumped volume dynamics and performance characteristic modeling. It was developed to improve the accuracy of axial compressor dynamics over lumped volume dynamics modeling. The stage-by-stage compressor model presented here is formulated into a parallel flow path model that includes both axial and rotational dynamics. This is done to enable the study of compressor and propulsion system dynamic performance under flow distortion conditions. The approaches utilized here are generic and should be applicable for the modeling of any axial flow compressor design.
Xyce Parallel Electronic Simulator - User's Guide, Version 1.0

Energy Technology Data Exchange (ETDEWEB)

HUTCHINSON, SCOTT A; KEITER, ERIC R.; HOEKSTRA, ROBERT J.; WATERS, LON J.; RUSSO, THOMAS V.; RANKIN, ERIC LAMONT; WIX, STEVEN D.

2002-11-01

This manual describes the use of the Xyce Parallel Electronic Simulator code for simulating electrical circuits at a variety of abstraction levels. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on improving the capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows. Another feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs. These input formats include standard analytical models, behavioral models
Parallel computing works

Energy Technology Data Exchange (ETDEWEB)

1991-10-23

An account of the Caltech Concurrent Computation Program (C³P), a five year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
Optimal task mapping in safety-critical real-time parallel systems

International Nuclear Information System (INIS)

Aussagues, Ch.

1998-01-01

This PhD thesis deals with the correct design of safety-critical real-time parallel systems. Such systems constitute a fundamental part of high-performance systems for command and control that can be found in the nuclear domain or, more generally, in parallel embedded systems. The verification of their temporal correctness is the core of this thesis. Our contribution lies mainly in the following three points: the analysis and extension of a programming model for such real-time parallel systems; the proposal of an original method based on a new operator, the synchronized product of state-machine task graphs; and the validation of the approach by its implementation and evaluation. The work particularly addresses the main problem of optimal task mapping on a parallel architecture, such that the temporal constraints are globally guaranteed, i.e. the timeliness property is valid. The results also incorporate optimality criteria for the sizing and correct dimensioning of a parallel system, for instance in the number of processing elements. These criteria are connected with operational constraints of the application domain. Our approach is based on the off-line analysis of the feasibility of the deadline-driven dynamic scheduling that is used to schedule tasks inside one processor. This leads us to define the synchronized product, from which a system of linear constraints is automatically generated; this system makes it possible to calculate the maximum load of a group of tasks and then to verify their timeliness constraints. The communications, their timeliness verification and their incorporation into the mapping problem form the second main contribution of this thesis. Finally, the global solving technique dealing with both task and communication aspects has been implemented and evaluated in the framework of the OASIS project in the LETI research center at the CEA/Saclay. (author)
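As a point of reference for the off-line load check mentioned in this abstract, the classical feasibility test for deadline-driven (earliest-deadline-first) scheduling on a single processor can be sketched in a few lines. This is not the synchronized-product method of the thesis: it is the textbook utilization bound, which assumes independent, preemptible periodic tasks whose deadlines equal their periods. The task tuples and function names are hypothetical.

```python
from fractions import Fraction


def processor_load(tasks):
    """Total utilization of a task group.

    Each task is a (worst_case_execution_time, period) pair of integers in
    the same time unit; exact fractions avoid floating-point rounding when
    the load is close to 1.
    """
    return sum(Fraction(wcet, period) for wcet, period in tasks)


def edf_feasible(tasks):
    """Deadline-driven scheduling on one processor meets every deadline
    (under the stated assumptions) exactly when the load is at most 1."""
    return processor_load(tasks) <= 1
```

A mapping step could use such a check to reject any assignment that overloads a processor: for instance, `[(1, 4), (1, 3), (1, 3)]` has load 11/12 and passes, while adding a fourth task `(1, 10)` pushes the load to 61/60 and fails.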
Template based parallel checkpointing in a massively parallel computer system

Science.gov (United States)

Archer, Charles Jens [Rochester, MN]; Inglett, Todd Alan [Rochester, MN]

2009-01-13

A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
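The core idea, comparing each node's checkpoint data block by block against a stored template and keeping only the blocks that differ, can be sketched as follows. This is a simplified illustration, not the patented method: it assumes the node data and the template have the same length, compares blocks at fixed offsets only (real rsync also uses rolling checksums to find shifted blocks), and omits the network broadcast. The block size, SHA-256 checksum, zlib compression, and all function names are choices made for the sketch.

```python
import hashlib
import zlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def split_blocks(data, size=BLOCK_SIZE):
    """Cut a bytes object into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_checksums(data, size=BLOCK_SIZE):
    """Checksum of every block of the template checkpoint."""
    return [hashlib.sha256(b).digest() for b in split_blocks(data, size)]


def changed_blocks(node_data, template_sums, size=BLOCK_SIZE):
    """Return {block_index: losslessly compressed block} for every block of
    the node's data whose checksum differs from the template's."""
    blocks = split_blocks(node_data, size)
    if len(blocks) != len(template_sums):
        raise ValueError("sketch assumes node data and template match in size")
    return {
        i: zlib.compress(block)
        for i, (block, expected) in enumerate(zip(blocks, template_sums))
        if hashlib.sha256(block).digest() != expected
    }


def restore(template, saved, size=BLOCK_SIZE):
    """Rebuild a node's checkpoint from the template plus its saved blocks."""
    blocks = split_blocks(template, size)
    for index, compressed in saved.items():
        blocks[index] = zlib.decompress(compressed)
    return b"".join(blocks)
```

If a node's state differs from the template in only one block, only that block is stored, and `restore` reproduces the full checkpoint from the template and the saved block.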
Parallel vs. Convergent Evolution in Domestication and Diversification of Crops in the Americas

Directory of Open Access Journals (Sweden)

Barbara Pickersgill

2018-05-01

Domestication involves changes in various traits of the phenotype in response to human selection. Diversification may accompany or follow domestication, and results in variants within the crop adapted to different uses by humans or different agronomic conditions. Similar domestication and diversification traits may be shared by closely related species (parallel evolution) or by distantly related species (convergent evolution). Many of these traits are produced by complex genetic networks or long biosynthetic pathways that are extensively conserved even in distantly related species. Similar phenotypic changes in different species may be controlled by homologous genes (parallel evolution at the genetic level) or non-homologous genes (convergent evolution at the genetic level). It has been suggested that parallel evolution may be more frequent among closely related species, or among diversification rather than domestication traits, or among traits produced by simple metabolic pathways. Crops domesticated in the Americas span a spectrum of genetic relatedness, have been domesticated for diverse purposes, and have responded to human selection by changes in many different traits, so provide examples of both parallel and convergent evolution at various levels. However, despite the current explosion in relevant information, data are still insufficient to provide quantitative or conclusive assessments of the relative roles of these two processes in domestication and diversification.
Introduction to parallel programming

CERN Document Server

Brawer, Steven

1989-01-01

Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race
Parallel Calculations in LS-DYNA

Science.gov (United States)

Vartanovich Mkrtychev, Oleg; Aleksandrovich Reshetov, Andrey

2017-11-01

Nowadays, structural mechanics exhibits a trend towards numeric solutions being found for increasingly extensive and detailed tasks, which requires that capacities of computing systems be enhanced. Such enhancement can be achieved by different means. E.g., in case a computing system is represented by a workstation, its components can be replaced and/or extended (CPU, memory etc.). In essence, such modification eventually entails replacement of the entire workstation, i.e. replacement of certain components necessitates exchange of others (faster CPUs and memory devices require buses with higher throughput etc.). Special consideration must be given to the capabilities of modern video cards. They constitute powerful computing systems capable of running data processing in parallel. Interestingly, the tools originally designed to render high-performance graphics can be applied for solving problems not immediately related to graphics (CUDA, OpenCL, Shaders etc.). However, not all software suites utilize video cards’ capacities. Another way to increase capacity of a computing system is to implement a cluster architecture: to add cluster nodes (workstations) and to increase the network communication speed between the nodes. The advantage of this approach is extensive growth due to which a quite powerful system can be obtained by combining not particularly powerful nodes. Moreover, separate nodes may possess different capacities. This paper considers the use of a clustered computing system for solving problems of structural mechanics with LS-DYNA software. To establish a range of dependencies a mere 2-node cluster has proven sufficient.
Xyce Parallel Electronic Simulator : users' guide, version 2.0.

Energy Technology Data Exchange (ETDEWEB)

Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

2004-06-01

This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. (4) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (5) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce
Model-driven product line engineering for mapping parallel algorithms to parallel computing platforms

NARCIS (Netherlands)

Arkin, Ethem; Tekinerdogan, Bedir

2016-01-01

Mapping parallel algorithms to parallel computing platforms requires several activities such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, the mapping of the algorithm to the logical configuration platform and the implementation of the
Massively parallel mathematical sieves

Energy Technology Data Exchange (ETDEWEB)

Montry, G.R.

1989-01-01

The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
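A minimal sketch of how the sieve can be decomposed for parallel execution is given below. It uses a contiguous block decomposition, not the scattered decomposition studied in the report: the base primes up to the square root of n are computed once, and each worker then sieves its own segment independently. The function names are hypothetical, and a thread pool is used only to show the structure; in CPython, threads do not speed up this CPU-bound work, and a process pool would be the natural substitute.

```python
from concurrent.futures import ThreadPoolExecutor
from math import isqrt


def small_primes(limit):
    """Serial Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = 0
    return [i for i, is_prime in enumerate(flags) if is_prime]


def sieve_segment(lo, hi, base):
    """Primes in [lo, hi), given every prime p with p*p < hi in `base`.

    Segments share no state, so each can run on a different worker.
    """
    flags = bytearray([1]) * (hi - lo)
    for p in base:
        if p * p >= hi:
            break
        # First multiple of p inside the segment, but never p itself.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for multiple in range(start, hi, p):
            flags[multiple - lo] = 0
    return [lo + i for i, is_prime in enumerate(flags) if is_prime]


def parallel_sieve(n, workers=4):
    """All primes <= n, sieved as `workers` independent segments of [2, n]."""
    if n < 2:
        return []
    base = small_primes(isqrt(n))
    step = -(-(n - 1) // workers)  # ceiling division
    segments = [(lo, min(lo + step, n + 1)) for lo in range(2, n + 1, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda seg: sieve_segment(seg[0], seg[1], base), segments)
        return [p for part in parts for p in part]
```

The only shared input is the short list of base primes, which is why this kind of decomposition needs very little communication between processors.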
Computer-Aided Parallelizer and Optimizer

Science.gov (United States)

Jin, Haoqiang

2011-01-01

The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
Data communications in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2013-11-12

Data communications in a parallel active messaging interface ('PAMI') of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.
Application Portable Parallel Library

Science.gov (United States)

Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

1995-01-01

Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also includes heterogeneous collection of networked computers.) Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
Parallel Algorithms and Patterns

Energy Technology Data Exchange (ETDEWEB)

Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

2016-06-16

This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.
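Two of the patterns named in this abstract, reductions and prefix scans, can be sketched to show where the concurrency lies. The code below runs serially and is not taken from the presentation; it is a plain-Python illustration in which each list comprehension corresponds to one step whose elements could all be computed at the same time. The function names are hypothetical, and the operator is assumed to be associative.

```python
from operator import add


def tree_reduce(values, op=add):
    """Pairwise (tree) reduction: about log2(n) steps instead of n - 1.

    Within a step, every pair is independent of the others.
    """
    level = list(values)
    if not level:
        raise ValueError("cannot reduce an empty sequence")
    while len(level) > 1:
        paired = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # odd element carries over unchanged
        level = paired
    return level[0]


def inclusive_scan(values, op=add):
    """Inclusive prefix scan by offset doubling.

    After the step with offset d, element i holds the combination of the
    (up to) 2*d inputs ending at position i.
    """
    result = list(values)
    offset = 1
    while offset < len(result):
        result = [
            result[i] if i < offset else op(result[i - offset], result[i])
            for i in range(len(result))
        ]
        offset *= 2
    return result
```

For the input `[3, 1, 4, 1, 5]`, the scan produces the running totals `[3, 4, 8, 9, 14]` in three doubling steps, and the reduction returns the final total 14.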
PetClaw: Parallelization and Performance Optimization of a Python-Based Nonlinear Wave Propagation Solver Using PETSc

KAUST Repository

Alghamdi, Amal Mohammed

2012-04-01

Clawpack, a conservation laws package implemented in Fortran, and its Python-based version, PyClaw, are existing tools providing nonlinear wave propagation solvers that use state-of-the-art finite volume methods. Simulations using those tools can have extensive computational requirements to provide accurate results. Therefore, a number of tools, such as BearClaw and MPIClaw, have been developed based on Clawpack to achieve significant speedup by exploiting parallel architectures. However, none of them has been shown to scale on a large number of cores. Furthermore, these tools, implemented in Fortran, achieve parallelization by inserting parallelization logic and MPI standard routines throughout the serial code in a non-modular manner. Our contribution in this thesis research is three-fold. First, we demonstrate an advantageous use case of Python in implementing easy-to-use, modular, extensible, scalable scientific software tools by developing an implementation of a parallelization framework, PetClaw, for PyClaw using the well-known Portable Extensible Toolkit for Scientific Computation, PETSc, through its Python wrapper petsc4py. Second, we demonstrate the possibility of getting acceptable Python code performance when compared to Fortran performance after introducing a number of serial optimizations to the Python code, including integrating Clawpack Fortran kernels into PyClaw for low-level computationally intensive parts of the code. As a result of those optimizations, the Python overhead in PetClaw for a shallow water application is only 12 percent when compared to the corresponding Fortran Clawpack application. Third, we provide a demonstration of PetClaw scalability on up to the entirety of Shaheen, a 16-rack Blue Gene/P IBM supercomputer that comprises 65,536 cores and is located at King Abdullah University of Science and Technology (KAUST). The PetClaw solver achieved above 0.98 weak scaling efficiency for an Euler application on the whole machine excluding the
Parallel Beam-Beam Simulation Incorporating Multiple Bunches and Multiple Interaction Regions

CERN Document Server

Jones, F W; Pieloni, T

2007-01-01

The simulation code COMBI has been developed to enable the study of coherent beam-beam effects in the full collision scenario of the LHC, with multiple bunches interacting at multiple crossing points over many turns. The program structure and input are conceived in a general way which allows arbitrary numbers and placements of bunches and interaction points (IP's), together with procedural options for head-on and parasitic collisions (in the strong-strong sense), beam transport, statistics gathering, harmonic analysis, and periodic output of simulation data. The scale of this problem, once we go beyond the simplest case of a pair of bunches interacting once per turn, quickly escalates into the parallel computing arena, and herein we will describe the construction of an MPI-based version of COMBI able to utilize arbitrary numbers of processors to support efficient calculation of multi-bunch multi-IP interactions and transport. Implementing the parallel version did not require extensive disruption of the basic ...
A parallel solver for huge dense linear systems

Science.gov (United States)

Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

2011-11-01

HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) that facilitates the parallel solution of very large dense systems for scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage the secondary memory in order to solve huge linear systems of dimension O(100,000). The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of the secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending on the dimension of the system, the number of processes and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors.
New version program summary
Program title: Huge Dense System Solver (HDSS)
Catalogue identifier: AEHU_v1_1
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 87 062
No. of bytes in distributed program, including test data, etc.: 1 069 110
Distribution format: tar.gz
Programming language: Fortran90, C
Computer: Parallel architectures: multiprocessors, computer clusters
Operating system
High performance computing of density matrix renormalization group method for 2-dimensional model. Parallelization strategy toward peta computing

International Nuclear Information System (INIS)

Yamada, Susumu; Igarashi, Ryo; Machida, Masahiko; Imamura, Toshiyuki; Okumura, Masahiko; Onishi, Hiroaki

2010-01-01

We parallelize the density matrix renormalization group (DMRG) method, which is a ground-state solver for one-dimensional quantum lattice systems. The parallelization allows us to extend the applicable range of the DMRG to n-leg ladders, i.e., quasi-two-dimensional cases. Such an extension is expected to bring about several breakthroughs in, e.g., quantum physics, chemistry, and nano-engineering. However, the straightforward parallelization requires all-to-all communications between all processes, which are poorly suited to multi-core systems, the mainstream of current parallel computers. Therefore, we optimize the all-to-all communications in the following two steps. The first is the elimination of the communications between all processes by only rearranging the data distribution, with the amount of communicated data kept unchanged. The second is the avoidance of communication conflicts by rescheduling the calculation and the communication. We evaluate the performance of the DMRG method on multi-core supercomputers and confirm that our two-step tuning is quite effective. (author)

Totally parallel multilevel algorithms

Science.gov (United States)

Frederickson, Paul O.

1988-01-01

Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.
Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

Science.gov (United States)

Tam, Wing-Kin; Yang, Zhi

2018-05-01

Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
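The abstract mentions detecting peaks in parallel through a parallel compact operation but does not give details. The following is only a minimal serial sketch of the general flag, scan, and scatter pattern that stream compaction usually follows; it is not the NPE implementation, and the function name `detect_peaks` and the strict/non-strict neighbour comparison are assumptions made for illustration. Each of the three steps is data-parallel per sample, which is what makes the pattern suitable for a GPU.

```python
from itertools import accumulate


def detect_peaks(signal, threshold):
    """Return indices of local maxima above `threshold` (hypothetical helper)."""
    n = len(signal)
    # Step 1 (map): every sample decides independently whether it is a peak.
    flags = [
        1
        if 0 < i < n - 1
        and signal[i] > threshold
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
        else 0
        for i in range(n)
    ]
    # Step 2 (exclusive prefix sum): output slot of each flagged sample.
    inclusive = list(accumulate(flags))
    slots = [inc - f for inc, f in zip(inclusive, flags)]
    # Step 3 (scatter): each flagged sample writes its index into its slot.
    total = inclusive[-1] if inclusive else 0
    peaks = [0] * total
    for i in range(n):
        if flags[i]:
            peaks[slots[i]] = i
    return peaks
```

On a GPU, step 2 would typically be a library scan primitive and steps 1 and 3 one thread per sample; the serial loops here only stand in for those kernels.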
A possibility of parallel and anti-parallel diffraction measurements on ...

Indian Academy of Sciences (India)

However, a bent perfect crystal (BPC) monochromator at monochromatic focusing condition can provide a quite flat and equal resolution property at both parallel and anti-parallel positions and thus one can have a chance to use both sides for the diffraction experiment. From the data of the FWHM and the / measured ...
PVeStA: A Parallel Statistical Model Checking and Quantitative Analysis Tool

KAUST Repository

AlTurki, Musab

2011-01-01

Statistical model checking is an attractive formal analysis method for probabilistic systems such as, for example, cyber-physical systems which are often probabilistic in nature. This paper is about drastically increasing the scalability of statistical model checking, and making such scalability of analysis available to tools like Maude, where probabilistic systems can be specified at a high level as probabilistic rewrite theories. It presents PVeStA, an extension and parallelization of the VeStA statistical model checking tool [10]. PVeStA supports statistical model checking of probabilistic real-time systems specified as either: (i) discrete or continuous Markov Chains; or (ii) probabilistic rewrite theories in Maude. Furthermore, the properties that it can model check can be expressed in either: (i) PCTL/CSL, or (ii) the QuaTEx quantitative temporal logic. As our experiments show, the performance gains obtained from parallelization can be very high. © 2011 Springer-Verlag.
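Statistical model checking parallelizes well because simulation runs are independent. As an illustration only, the sketch below uses a plain Chernoff-Hoeffding sample-size bound and a toy Bernoulli "model"; this is a swapped-in textbook technique, not the estimation procedure of VeStA or PVeStA, and the names `samples_needed`, `run_batch`, and `estimate`, as well as the probability 0.3, are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def samples_needed(eps, delta):
    """Hoeffding bound: n >= ln(2/delta) / (2 * eps^2) samples give
    |estimate - p| < eps with probability at least 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def run_batch(args):
    """Simulate `count` independent runs of a toy model (assumption:
    the property holds with probability 0.3) and count the successes."""
    seed, count = args
    rng = random.Random(seed)  # one independent generator per worker
    return sum(rng.random() < 0.3 for _ in range(count))


def estimate(eps, delta, workers=4):
    """Split the required samples evenly over workers and merge the counts."""
    n = samples_needed(eps, delta)
    per_worker = math.ceil(n / workers)
    jobs = [(seed, per_worker) for seed in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(run_batch, jobs))
    return hits / (per_worker * workers)
```

The only shared state is the final sum of successes, so the same structure maps directly onto processes or cluster nodes instead of threads.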
Automatic mesh refinement and parallel load balancing for Fokker-Planck-DSMC algorithm

Science.gov (United States)

Küchlin, Stephan; Jenny, Patrick

2018-06-01

Recently, a parallel Fokker-Planck-DSMC algorithm for rarefied gas flow simulation in complex domains at all Knudsen numbers was developed by the authors. Fokker-Planck-DSMC (FP-DSMC) is an augmentation of the classical DSMC algorithm, which mitigates the near-continuum deficiencies in terms of computational cost of pure DSMC. At each time step, based on a local Knudsen number criterion, the discrete DSMC collision operator is dynamically switched to the Fokker-Planck operator, which is based on the integration of continuous stochastic processes in time, and has fixed computational cost per particle, rather than per collision. In this contribution, we present an extension of the previous implementation with automatic local mesh refinement and parallel load-balancing. In particular, we show how the properties of discrete approximations to space-filling curves enable an efficient implementation. Exemplary numerical studies highlight the capabilities of the new code.
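The abstract states that discrete approximations to space-filling curves enable efficient load balancing, without naming the curve or the cut rule. A minimal sketch of the general idea, assuming a 2D Morton (Z-order) curve and per-cell weights such as particle counts, could look as follows; `morton2d` and `balance` are hypothetical names, and the actual FP-DSMC code may use a different curve and partitioning rule.

```python
def morton2d(ix, iy, bits=16):
    """Interleave the bits of (ix, iy) into a Z-order (Morton) key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)
        key |= ((iy >> b) & 1) << (2 * b + 1)
    return key


def balance(cells, n_ranks):
    """Assign cells to ranks.

    cells: dict mapping (ix, iy) -> positive weight (e.g. particle count).
    Cells are ordered along the curve and cut into contiguous segments of
    roughly equal total weight, so each rank gets a spatially compact piece.
    """
    ordered = sorted(cells, key=lambda c: morton2d(*c))
    total = sum(cells.values())
    owner, running = {}, 0.0
    for cell in ordered:
        # The rank follows from the weight accumulated before this cell.
        owner[cell] = min(int(running * n_ranks / total), n_ranks - 1)
        running += cells[cell]
    return owner
```

Because the partition is a set of cuts on a one-dimensional ordering, rebalancing after mesh refinement only requires recomputing the running weights and moving the cut points.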
Parallel implementation of the PHOENIX generalized stellar atmosphere program. II. Wavelength parallelization

International Nuclear Information System (INIS)

Baron, E.; Hauschildt, Peter H.

1998-01-01

We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000 - 300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard message passing interface (MPI) library calls and is fully portable between serial and parallel computers. copyright 1998 The American Astronomical Society
Fast parallel algorithms for the x-ray transform and its adjoint.

Science.gov (United States)

Gao, Hao

2012-11-01

Iterative reconstruction methods often offer better imaging quality and allow for reconstructions with lower imaging dose than classical methods in computed tomography. However, computational speed is a major concern for these iterative methods, for which the x-ray transform and its adjoint are the two most time-consuming components. The speed issue becomes even more notable for 3D imaging such as cone beam scans or helical scans, since the x-ray transform and its adjoint are frequently recomputed because there is usually not enough computer memory to save the corresponding system matrix. The purpose of this paper is to optimize the algorithm for computing the x-ray transform and its adjoint, and their parallel computation. Fast and highly parallelizable algorithms for the x-ray transform and its adjoint are proposed for the infinitely narrow beam in both 2D and 3D. The extension of these fast algorithms to the finite-size beam is proposed in 2D and discussed in 3D. The CPU and GPU codes are available at https://sites.google.com/site/fastxraytransform. The proposed algorithm is faster than Siddon's algorithm for computing the x-ray transform. In particular, the improvement for the parallel computation can be an order of magnitude. The authors have proposed fast and highly parallelizable algorithms for the x-ray transform and its adjoint, which are extendable to the finite-size beam. The proposed algorithms are suitable for parallel computing in the sense that the computational cost per parallel thread is O(1).
Linear parallel processing machines I

Energy Technology Data Exchange (ETDEWEB)

Von Kunze, M

1984-01-01

As is well-known, non-context-free grammars for generating formal languages happen to be of a certain intrinsic computational power that presents serious difficulties to efficient parsing algorithms as well as for the development of an algebraic theory of context-sensitive languages. In this paper a framework is given for the investigation of the computational power of formal grammars, in order to start a thorough analysis of grammars consisting of derivation rules of the form $aB \rightarrow A_1 \ldots A_n b_1 \ldots b_m$. These grammars may be thought of as automata by means of parallel processing, if one considers the variables as operators acting on the terminals while reading them right-to-left. This kind of automata and their 2-dimensional programming language prove to be useful by allowing a concise linear-time algorithm for integer multiplication. Linear parallel processing machines (LP-machines) which are, in their general form, equivalent to Turing machines, include finite automata and pushdown automata (with states encoded) as special cases. Bounded LP-machines yield deterministic accepting automata for nondeterministic context-free languages, and they define an interesting class of context-sensitive languages. A characterization of this class in terms of generating grammars is established by using derivation trees with crossings as a helpful tool. From the algebraic point of view, deterministic LP-machines are effectively represented semigroups with distinguished subsets. Concerning the dualism between generating and accepting devices of formal languages within the algebraic setting, the concept of accepting automata turns out to reduce essentially to embeddability in an effectively represented extension monoid, even in the classical cases.
A design concept of parallel elasticity extracted from biological muscles for engineered actuators.

Science.gov (United States)

Chen, Jie; Jin, Hongzhe; Iida, Fumiya; Zhao, Jie

2016-08-23

Series elastic actuation that takes inspiration from biological muscle-tendon units has been extensively studied and used to address the challenges (e.g. energy efficiency, robustness) existing in purely stiff robots. However, there also exists another form of passive property in biological actuation, parallel elasticity within muscles themselves, and our knowledge of it is limited: for example, there is still no general design strategy for the elasticity profile. When we look at nature, on the other hand, there seems a universal agreement in biological systems: experimental evidence has suggested that a concave-upward elasticity behaviour is exhibited within the muscles of animals. Seeking to draw possible design clues for elasticity in parallel with actuators, we use a simplified joint model to investigate the mechanisms behind this biologically universal preference of muscles. Actuation of the model is identified from general biological joints and further reduced with a specific focus on muscle elasticity aspects, for the sake of easy implementation. By examining various elasticity scenarios, one without elasticity and three with elasticity of different profiles, we find that parallel elasticity generally exerts contradictory influences on energy efficiency and disturbance rejection, due to the mechanical impedance shift thus caused. The trade-off analysis between them also reveals that concave parallel elasticity is able to achieve a more advantageous balance than linear and convex ones. It is expected that the results could contribute to our further understanding of muscle elasticity and provide a theoretical guideline on how to properly design parallel elasticity behaviours for engineering systems such as artificial actuators and robotic joints.
Parallel magnetic resonance imaging

International Nuclear Information System (INIS)

Larkman, David J; Nunes, Rita G

2007-01-01

Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts are shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed. (invited topical review)
Experiences in Data-Parallel Programming

Directory of Open Access Journals (Sweden)

Terry W. Clark

1997-01-01

To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.
Non-Cartesian parallel imaging reconstruction.

Science.gov (United States)

Wright, Katherine L; Hamilton, Jesse I; Griswold, Mark A; Gulani, Vikas; Seiberlich, Nicole

2014-11-01

Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be used to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the nonhomogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian generalized autocalibrating partially parallel acquisition (GRAPPA), and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. © 2014 Wiley Periodicals, Inc.
Influence of Paralleling Dies and Paralleling Half-Bridges on Transient Current Distribution in Multichip Power Modules

DEFF Research Database (Denmark)

Li, Helong; Zhou, Wei; Wang, Xiongfei

2018-01-01

This paper addresses the transient current distribution in the multichip half-bridge power modules, where two types of paralleling connections with different current commutation mechanisms are considered: paralleling dies and paralleling half-bridges. It reveals that with paralleling dies, both t...
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

Science.gov (United States)

Choudhary, Alok Nidhi

1989-01-01

Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high-level application (e.g., object recognition). An IVS normally involves algorithms from low-level, intermediate-level, and high-level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.
Pattern-Driven Automatic Parallelization

Directory of Open Access Journals (Sweden)

Christoph W. Kessler

1996-01-01

This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, and for execution on distributed memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been "encrypted" by former machine-specific code transformations. Successful pattern recognition guides sophisticated code transformations including local algorithm replacement such that the parallelized code need not emerge from the sequential program structure by just parallelizing the loops. It allows access to an expert's knowledge on useful parallel algorithms, available machine-specific library routines, and powerful program transformations. The partially restored program semantics also supports local array alignment, distribution, and redistribution, and allows for faster and more exact prediction of the performance of the parallelized target code than is usually possible.
Data communications in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

2013-10-29

Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.
The STAPL Parallel Graph Library

KAUST Repository

Harshvardhan,; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence

2013-01-01

This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable
Parallelism and array processing

International Nuclear Information System (INIS)

Zacharov, V.

1983-01-01

Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and that of applications that can exploit such systems. The lectures seek to discuss parallelism in a historical context, and to identify all the main aspects of concurrency in computation right up to the present time. Included will be consideration of the important question as to what use parallelism might be in the field of data processing. (orig.)
Vectorization, parallelization and porting of nuclear codes (vectorization and parallelization). Progress report fiscal 1998

International Nuclear Information System (INIS)

Ishizuki, Shigeru; Kawai, Wataru; Nemoto, Toshiyuki; Ogasawara, Shinobu; Kume, Etsuo; Adachi, Masaaki; Kawasaki, Nobuo; Yatake, Yo-ichi

2000-03-01

Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system, the AP3000 system and the Paragon system at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. We dealt with 12 codes in fiscal 1998. These results are reported in 3 parts, i.e., the vectorization and parallelization on vector processors part, the parallelization on scalar processors part and the porting part. In this report, we describe the vectorization and parallelization on vector processors. In this vectorization and parallelization on vector processors part, the vectorization of General Tokamak Circuit Simulation Program code GTCSP, the vectorization and parallelization of Molecular Dynamics NTV (n-particle, Temperature and Velocity) Simulation code MSP2, Eddy Current Analysis code EDDYCAL, Thermal Analysis Code for Test of Passive Cooling System by HENDEL T2 code THANPACST2 and MHD Equilibrium code SELENEJ on the VPP500 are described. In the parallelization on scalar processors part, the parallelization of Monte Carlo N-Particle Transport code MCNP4B2, Plasma Hydrodynamics code using Cubic Interpolated Propagation Method PHCIP and Vectorized Monte Carlo code (continuous energy model / multi-group model) MVP/GMVP on the Paragon are described. In the porting part, the porting of Monte Carlo N-Particle Transport code MCNP4B2 and Reactor Safety Analysis code RELAP5 to the AP3000 is described. (author)
Parallel External Memory Graph Algorithms

DEFF Research Database (Denmark)

Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari

2010-01-01

In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one of the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest... an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....
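For readers unfamiliar with list ranking, the problem is to compute, for every node of a linked list, its distance to the end of the list. The sketch below shows the textbook pointer-jumping approach, in which every node can be updated independently in each round. It is offered only to illustrate the problem; it is not the I/O-efficient PEM algorithm of the paper, and the function name `list_rank` and the tail-points-to-itself convention are assumptions.

```python
def list_rank(succ):
    """Rank every node of a linked list by pointer jumping.

    succ[i] is the successor of node i; the tail points to itself.
    Returns rank[i] = number of links between node i and the tail.
    """
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    # About log2(n) rounds suffice; extra rounds are harmless because a
    # node that already points at the tail only adds the tail's rank, 0.
    for _ in range(max(1, n).bit_length()):
        # Both updates read only the old arrays, so all nodes could be
        # processed simultaneously within a round.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
    return rank
```

For example, the list 3 → 1 → 4 → 0 → 2 is encoded as `succ = [2, 4, 2, 1, 0]`. Pointer jumping performs O(n log n) total work, which is one reason work-optimal and I/O-efficient alternatives are studied.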

Parallel inter channel interaction mechanisms

International Nuclear Information System (INIS)

Jovic, V.; Afgan, N.; Jovic, L.

1995-01-01

Parallel channel interactions are examined. For experimental studies of nonstationary flow regimes in three parallel vertical channels, results of the phenomenon analysis and the mechanisms of parallel channel interaction are shown for adiabatic conditions of single-phase fluid and two-phase mixture flow. (author)
Passive and partially active fault tolerance for massively parallel stream processing engines

DEFF Research Database (Denmark)

Su, Li; Zhou, Yongluan

2018-01-01

. On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE...... also propose effective and efficient algorithms to optimize a partially active replication plan to maximize the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness...
Seeing or moving in parallel

DEFF Research Database (Denmark)

Christensen, Mark Schram; Ehrsson, H Henrik; Nielsen, Jens Bo

2013-01-01

a different network, involving bilateral dorsal premotor cortex (PMd), primary motor cortex, and SMA, was more active when subjects viewed parallel movements while performing either symmetrical or parallel movements. Correlations between behavioral instability and brain activity were present in right lateral...... adduction-abduction movements symmetrically or in parallel with real-time congruent or incongruent visual feedback of the movements. One network, consisting of bilateral superior and middle frontal gyrus and supplementary motor area (SMA), was more active when subjects performed parallel movements, whereas...
The numerical parallel computing of photon transport

International Nuclear Information System (INIS)

Huang Qingnan; Liang Xiaoguang; Zhang Lifa

1998-12-01

The parallel computing of photon transport is investigated, and the parallel algorithm and the parallelization of programs on parallel computers with both shared memory and distributed memory are discussed. By analyzing the inherent laws of the mathematical and physical model of photon transport according to the structural features of parallel computers, using the divide-and-conquer strategy, adjusting the algorithm structure of the program, resolving data dependencies, finding the parallelizable components, and creating coarse-grained parallel subtasks, the sequential computing of photon transport is efficiently transformed into parallel and vector computing. The program was run on various HP parallel computers such as the HY-1 (PVP), the Challenge (SMP) and the YH-3 (MPP), and very good parallel speedup was obtained.
Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

Science.gov (United States)

Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

2017-07-01

Matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Therefore, parallelization is needed to speed up a calculation that usually takes a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication for matrices of arbitrary size, because they assume a square, symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA and implemented on the GPU (graphics processing unit).
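The abstract does not say which hypergraph model the authors use. As an illustration only, the sketch below assumes the common column-net model: each row is a vertex, each column is a net connecting the rows that have a nonzero in it, and the cost of a row partition is the connectivity-minus-one sum over nets. The function names and the tiny 3 x 4 pattern are hypothetical, and the code runs on the CPU rather than in CUDA.

```python
from collections import defaultdict

def column_nets(nonzeros):
    """Map each column (net) to the set of rows (vertices) with a nonzero in it."""
    nets = defaultdict(set)
    for row, col in nonzeros:
        nets[col].add(row)
    return nets

def communication_volume(nonzeros, row_part):
    """Connectivity-minus-one metric: for every net, count the parts it
    touches beyond the first. Works for rectangular, unsymmetric patterns."""
    volume = 0
    for rows in column_nets(nonzeros).values():
        parts = {row_part[r] for r in rows}
        volume += len(parts) - 1
    return volume

# Hypothetical 3 x 4 sparsity pattern given as (row, column) pairs.
nonzeros = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)]

# Two candidate assignments of rows to two processing units.
contiguous = {0: 0, 1: 0, 2: 1}
interleaved = {0: 0, 1: 1, 2: 0}
```

Here `communication_volume(nonzeros, contiguous)` is lower than for `interleaved`, which is the kind of difference a hypergraph partitioner tries to exploit before the rows are handed to parallel workers.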
Writing parallel programs that work

CERN Multimedia

CERN. Geneva

2012-01-01

Serial algorithms typically run inefficiently on parallel machines. This may sound like an obvious statement, but it is the root cause of why parallel programming is considered to be difficult. The current state of the computer industry is still that almost all programs in existence are serial. This talk will describe the techniques used in the Intel Parallel Studio to provide a developer with the tools necessary to understand the behaviors and limitations of the existing serial programs. Once the limitations are known the developer can refactor the algorithms and reanalyze the resulting programs with the tools in the Intel Parallel Studio to create parallel programs that work. About the speaker Paul Petersen is a Sr. Principal Engineer in the Software and Solutions Group (SSG) at Intel. He received a Ph.D. degree in Computer Science from the University of Illinois in 1993. After UIUC, he was employed at Kuck and Associates, Inc. (KAI) working on auto-parallelizing compiler (KAP), and was involved in th...
Parallel Framework for Cooperative Processes

Directory of Open Access Journals (Sweden)

Mitică Craus

2005-01-01

This paper describes an object-oriented framework designed to be used in the parallelization of a set of related algorithms. The idea behind the system is to have a reusable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles, and it should be possible to split the work between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO) parallel algorithm for the Travelling Salesman Problem (TSP) and an Image Processing (IP) parallel algorithm for the Symmetrical Neighborhood Filter (SNF). The implementations of these applications by means of the parallel framework perform well, with approximately linear speedup and low communication cost.
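The cycle-based master-worker pattern described here can be sketched as follows. This is not the framework from the paper: a thread pool stands in for message passing, and `split`, `run_cycles` and `double_chunk` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def split(items, n_parts):
    """Master side: cut the work into at most n_parts contiguous chunks."""
    size = max(1, (len(items) + n_parts - 1) // n_parts)
    return [items[i:i + size] for i in range(0, len(items), size)]

def run_cycles(items, worker, n_cycles, n_workers):
    """Run n_cycles rounds; in each round the master splits the data,
    the workers process their chunks, and the master merges the results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(n_cycles):
            chunks = split(items, n_workers)
            results = pool.map(worker, chunks)  # results keep chunk order
            items = [x for chunk in results for x in chunk]
    return items

def double_chunk(chunk):
    """Toy per-cycle work done by one processing unit."""
    return [2 * x for x in chunk]

result = run_cycles([1, 2, 3, 4], double_chunk, n_cycles=3, n_workers=2)
```

The merge at the end of every cycle is the synchronization point; in a message-passing version it would be a gather on the master followed by a scatter of the next cycle's work.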
Compiler Technology for Parallel Scientific Computation

Directory of Open Access Journals (Sweden)

Can Özturan

1994-01-01

There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with the equational programming language (EPL). Our approach is based on program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by the source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by the compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.
Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators

International Nuclear Information System (INIS)

Qiang, J.; Ryne, R.D.; Habib, S.; Decyk, V.

1999-01-01

In this paper, we present an object-oriented three-dimensional parallel particle-in-cell code for beam dynamics simulation in linear accelerators. A two-dimensional parallel domain decomposition approach is employed within a message-passing programming paradigm along with dynamic load balancing. Object-oriented software design gives the code better maintainability, reusability, and extensibility than conventional structure-based code. It also helps to encapsulate the details of communications syntax. Performance tests on SGI/Cray T3E-900 and SGI Origin 2000 machines show good scalability of the object-oriented code. Other important features of this code include symplectic integration with linear maps of external focusing elements and the use of z as the independent variable, as is typical in accelerators. The code was successfully applied to simulate beam transport through three superconducting sections in the APT linac design.
Parallel computing: numerics, applications, and trends

National Research Council Canada - National Science Library

Trobec, Roman; Vajteršic, Marián; Zinterhof, Peter

2009-01-01

... and/or distributed systems. The contributions to this book are focused on topics most concerned in the trends of today's parallel computing. These range from parallel algorithmics, programming, tools, network computing to future parallel computing. Particular attention is paid to parallel numerics: linear algebra, differential equations, numerica...
Parallel Computing Strategies for Irregular Algorithms

Science.gov (United States)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Parallel computing and molecular dynamics of biological membranes

International Nuclear Information System (INIS)

La Penna, G.; Letardi, S.; Minicozzi, V.; Morante, S.; Rossi, G.C.; Salina, G.

1998-01-01

In this talk I discuss the general question of the portability of molecular dynamics codes for diffusive systems on parallel computers of the APE family. The intrinsic single precision of the currently available platforms does not seem to affect the numerical accuracy of the simulations, while the absence of integer addressing from CPU to individual nodes puts strong constraints on possible programming strategies. Liquids can be satisfactorily simulated using the "systolic" method. For more complex systems, like the biological ones in which we are ultimately interested, the "domain decomposition" approach is best suited to beat the quadratic growth of the inter-molecular computational time with the number of atoms of the system. The promising perspectives of using this strategy for extensive simulations of lipid bilayers are briefly reviewed. (orig.)
Vector and parallel processors in computational science. Proceedings

Energy Technology Data Exchange (ETDEWEB)

Duff, I S; Reid, J K

1985-01-01

This volume contains papers from most of the invited talks and from several of the contributed talks and poster sessions presented at VAPP II. The contents present an extensive coverage of all important aspects of vector and parallel processors, including hardware, languages, numerical algorithms and applications. The topics covered include descriptions of new machines (both research and commercial machines), languages and software aids, and general discussions of whole classes of machines and their uses. Numerical methods papers include Monte Carlo algorithms, iterative and direct methods for solving large systems, finite elements, optimization, random number generation and mathematical software. The specific applications covered include neutron diffusion calculations, molecular dynamics, weather forecasting, lattice gauge calculations, fluid dynamics, flight simulation, cartography, image processing and cryptography. Most machines and architecture types are being used for these applications. many refs.
P-SPARSLIB: A parallel sparse iterative solution package

Energy Technology Data Exchange (ETDEWEB)

Saad, Y. [Univ. of Minnesota, Minneapolis, MN (United States)]

1994-12-31

Iterative methods are gaining popularity in engineering and the sciences at a time when the computational environment is changing rapidly. P-SPARSLIB is a project to build a software library for sparse matrix computations on parallel computers. The emphasis is on iterative methods and the use of distributed sparse matrices, an extension of the domain decomposition approach to general sparse matrices. One of the goals of this project is to develop a software package geared towards specific applications. For example, the author will test the performance and usefulness of P-SPARSLIB modules on linear systems arising from CFD applications. Equally important is the goal of portability. In the long run, the author wishes to ensure that this package is portable on a variety of platforms, including SIMD environments and shared memory environments.
The Glasgow Parallel Reduction Machine: Programming Shared-memory Many-core Systems using Parallel Task Composition

Directory of Open Access Journals (Sweden)

Ashkan Tousimojarad

2013-12-01

We present the Glasgow Parallel Reduction Machine (GPRM), a novel, flexible framework for parallel task-composition based many-core programming. We allow the programmer to structure programs into task code, written as C++ classes, and communication code, written in a restricted subset of C++ with functional semantics and parallel evaluation. In this paper we discuss the GPRM, the virtual machine framework that enables the parallel task composition approach. We focus the discussion on GPIR, the functional language used as the intermediate representation of the bytecode running on the GPRM. Using examples in this language we show the flexibility and power of our task composition framework. We demonstrate the potential using an implementation of a merge sort algorithm on a 64-core Tilera processor, as well as on a conventional Intel quad-core processor and an AMD 48-core processor system. We also compare our framework with OpenMP tasks in a parallel pointer chasing algorithm running on the Tilera processor. Our results show that the GPRM programs outperform the corresponding OpenMP codes on all test platforms, and can greatly facilitate writing of parallel programs, in particular non-data parallel algorithms such as reductions.
Patterns for Parallel Software Design

CERN Document Server

Ortega-Arjona, Jorge Luis

2010-01-01

Essential reading to understand patterns for parallel programming. Software patterns have revolutionized the way we think about how software is designed, built, and documented, and the design of parallel software requires you to consider other particular design aspects and special skills. From clusters to supercomputers, success heavily depends on the design skills of software developers. Patterns for Parallel Software Design presents a pattern-oriented software architecture approach to parallel software design. This approach is not a design method in the classic sense, but a new way of managin
High performance parallel I/O

CERN Document Server

Prabhat

2014-01-01

Gain Critical Insight into the Parallel I/O Ecosystem. Parallel I/O is an integral component of modern high performance computing (HPC), especially in storing and processing very large datasets to facilitate scientific discovery. Revealing the state of the art in this field, High Performance Parallel I/O draws on insights from leading practitioners, researchers, software architects, developers, and scientists who shed light on the parallel I/O ecosystem. The first part of the book explains how large-scale HPC facilities scope, configure, and operate systems, with an emphasis on choices of I/O har
Parallel transport of long mean-free-path plasma along open magnetic field lines: Parallel heat flux

International Nuclear Information System (INIS)

Guo Zehua; Tang Xianzhu

2012-01-01

In a long mean-free-path plasma where temperature anisotropy can be sustained, the parallel heat flux has two components with one associated with the parallel thermal energy and the other the perpendicular thermal energy. Due to the large deviation of the distribution function from local Maxwellian in an open field line plasma with low collisionality, the conventional perturbative calculation of the parallel heat flux closure in its local or non-local form is no longer applicable. Here, a non-perturbative calculation is presented for a collisionless plasma in a two-dimensional flux expander bounded by absorbing walls. Specifically, closures of previously unfamiliar form are obtained for ions and electrons, which relate two distinct components of the species parallel heat flux to the lower order fluid moments such as density, parallel flow, parallel and perpendicular temperatures, and the field quantities such as the magnetic field strength and the electrostatic potential. The plasma source and boundary condition at the absorbing wall enter explicitly in the closure calculation. Although the closure calculation does not take into account wave-particle interactions, the results based on passing orbits from steady-state collisionless drift-kinetic equation show remarkable agreement with fully kinetic-Maxwell simulations. As an example of the physical implications of the theory, the parallel heat flux closures are found to predict a surprising observation in the kinetic-Maxwell simulation of the 2D magnetic flux expander problem, where the parallel heat flux of the parallel thermal energy flows from low to high parallel temperature region.
Is Monte Carlo embarrassingly parallel?

Energy Technology Data Exchange (ETDEWEB)

Hoogenboom, J. E. [Delft Univ. of Technology, Mekelweg 15, 2629 JB Delft (Netherlands); Delft Nuclear Consultancy, IJsselzoom 2, 2902 LB Capelle aan den IJssel (Netherlands)

2012-07-01

Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup, and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendezvous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Other time losses in the parallel calculation are also identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)
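A toy cost model can illustrate why execution time may rise again once enough processors are added. It is an assumption made for illustration, not the analysis from the paper: the per-cycle work divides evenly over `p` processors, while the end-of-cycle rendezvous costs a hypothetical `sync * p` because the source data of every processor is collected. The parameter values are invented.

```python
def run_time(p, work=1.0, sync=1e-4):
    """Time per cycle: perfectly divided work plus a rendezvous cost that
    is assumed to grow linearly with the number of processors."""
    return work / p + sync * p

def speedup(p, work=1.0, sync=1e-4):
    """Speedup relative to running the same model on one processor."""
    return run_time(1, work, sync) / run_time(p, work, sync)

def best_processor_count(max_p, work=1.0, sync=1e-4):
    """Processor count in 1..max_p with the smallest modelled run time."""
    return min(range(1, max_p + 1), key=lambda p: run_time(p, work, sync))
```

Under this model the run time is minimal near `sqrt(work / sync)` processors and grows beyond that point, so reducing the cost of each rendezvous moves the optimum to a larger processor count.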
Parallel algorithms for continuum dynamics

International Nuclear Information System (INIS)

Hicks, D.L.; Liebrock, L.M.

1987-01-01

Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; to achieve the maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss here parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. Focus is mainly on the most recent parallel processing results produced, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important. Examples of these are inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation. Great care must be taken in designing algorithms for this machine to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors
Vectorization, parallelization and porting of nuclear codes. Vectorization and parallelization. Progress report fiscal 1999

Energy Technology Data Exchange (ETDEWEB)

Adachi, Masaaki; Ogasawara, Shinobu; Kume, Etsuo [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Ishizuki, Shigeru; Nemoto, Toshiyuki; Kawasaki, Nobuo; Kawai, Wataru [Fujitsu Ltd., Tokyo (Japan); Yatake, Yo-ichi [Hitachi Ltd., Tokyo (Japan)

2001-02-01

Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system, the AP3000 system, the SX-4 system and the Paragon system at the Center for Promotion of Computational Science and Engineering in the Japan Atomic Energy Research Institute. We dealt with 18 codes in fiscal 1999. These results are reported in 3 parts, i.e., the vectorization and parallelization part on vector processors, the parallelization part on scalar processors and the porting part. In this report, we describe the vectorization and parallelization on vector processors. This part describes the vectorization of the Relativistic Molecular Orbital Calculation code RSCAT, the microscopic transport code for high energy nuclear collisions JAM, the three-dimensional non-steady thermal-fluid analysis code STREAM, the Relativistic Density Functional Theory code RDFT and the High Speed Three-Dimensional Nodal Diffusion code MOSRA-Light on the VPP500 system and the SX-4 system. (author)
Parallel R-matrix computation

International Nuclear Information System (INIS)

Heggarty, J.W.

1999-06-01

For almost thirty years, sequential R-matrix computation has been used by atomic physics research groups from around the world to model collision phenomena involving the scattering of electrons or positrons with atomic or molecular targets. As considerable progress has been made in the understanding of fundamental scattering processes, new data, obtained from more complex calculations, is of current interest to experimentalists. Performing such calculations, however, places considerable demands on the computational resources to be provided by the target machine, in terms of both processor speed and memory requirement. Indeed, in some instances the computational requirements are so great that the proposed R-matrix calculations are intractable, even when utilising contemporary classic supercomputers. Historically, increases in the computational requirements of R-matrix computation were accommodated by porting the problem codes to a more powerful classic supercomputer. Although this approach has been successful in the past, it is no longer considered to be a satisfactory solution due to the limitations of current (and future) Von Neumann machines. As a consequence, there has been considerable interest in the high performance multicomputers that have emerged over the last decade, which appear to offer the computational resources required by contemporary R-matrix research. Unfortunately, developing codes for these machines is not as simple a task as it was to develop codes for successive classic supercomputers. The difficulty arises from the considerable differences in the computing models that exist between the two types of machine, and as a result the programming of multicomputers is widely acknowledged to be a difficult, time-consuming and error-prone task. Nevertheless, unless parallel R-matrix computation is realised, important theoretical and experimental atomic physics research will continue to be hindered. This thesis describes work that was undertaken in
Implementation and performance of parallelized elegant

International Nuclear Information System (INIS)

Wang, Y.; Borland, M.

2008-01-01

The program elegant is widely used for design and modeling of linacs for free-electron lasers and energy recovery linacs, as well as storage rings and other applications. As part of a multi-year effort, we have parallelized many aspects of the code, including single-particle dynamics, wakefields, and coherent synchrotron radiation. We report on the approach used for gradual parallelization, which proved very beneficial in getting parallel features into the hands of users quickly. We also report details of parallelization of collective effects. Finally, we discuss performance of the parallelized code in various applications.
Parallelizing the spectral transform method: A comparison of alternative parallel algorithms

International Nuclear Information System (INIS)

Foster, I.; Worley, P.H.

1993-01-01

The spectral transform method is a standard numerical technique for solving partial differential equations on the sphere and is widely used in global climate modeling. In this paper, we outline different approaches to parallelizing the method and describe experiments that we are conducting to evaluate the efficiency of these approaches on parallel computers. The experiments are conducted using a testbed code that solves the nonlinear shallow water equations on a sphere, but are designed to permit evaluation in the context of a global model. They allow us to evaluate the relative merits of the approaches as a function of problem size and number of processors. The results of this study are guiding ongoing work on PCCM2, a parallel implementation of the Community Climate Model developed at the National Center for Atmospheric Research
Algorithms for parallel computers

International Nuclear Information System (INIS)

Churchhouse, R.F.

1985-01-01

Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed but it also raises some fundamental questions, including: (i) which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode. (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria. (iii) How can we design new algorithms specifically for parallel systems. (iv) For multi-processor systems how can we handle the software aspects of the interprocessor communications. Aspects of these questions illustrated by examples are considered in these lectures. (orig.)
Parallel processing for fluid dynamics applications

International Nuclear Information System (INIS)

Johnson, G.M.

1989-01-01

The impact of parallel processing on computational science and, in particular, on computational fluid dynamics is growing rapidly. In this paper, particular emphasis is given to developments which have occurred within the past two years. Parallel processing is defined and the reasons for its importance in high-performance computing are reviewed. Parallel computer architectures are classified according to the number and power of their processing units, their memory, and the nature of their connection scheme. Architectures which show promise for fluid dynamics applications are emphasized. Fluid dynamics problems are examined for parallelism inherent at the physical level. CFD algorithms and their mappings onto parallel architectures are discussed. Several examples are presented to document the performance of fluid dynamics applications on present-generation parallel processing devices.
Parallel discrete event simulation

NARCIS (Netherlands)

Overeinder, B.J.; Hertzberger, L.O.; Sloot, P.M.A.; Withagen, W.J.

1991-01-01

In simulating applications for execution on specific computing systems, the simulation performance figures must be known in a short period of time. One basic approach to the problem of reducing the required simulation time is the exploitation of parallelism. However, in parallelizing the simulation
Parallelization of Unsteady Adaptive Mesh Refinement for Unstructured Navier-Stokes Solvers

Science.gov (United States)

Schwing, Alan M.; Nompelis, Ioannis; Candler, Graham V.

2014-01-01

This paper explores the implementation of the MPI parallelization in a Navier-Stokes solver using adaptive mesh refinement. Viscous and inviscid test problems are considered for the purpose of benchmarking, as are implicit and explicit time advancement methods. The main test problem for comparison includes effects from boundary layers and other viscous features and requires a large number of grid points for accurate computation. Experimental validation against double cone experiments in hypersonic flow is shown. The adaptive mesh refinement shows promise for a staple test problem in the hypersonic community. Extension to more advanced techniques for more complicated flows is described.
Shear strength of a thermal barrier coating parallel to the bond coat

International Nuclear Information System (INIS)

Cruse, T.A.; Dommarco, R.C.; Bastias, P.C.

1998-01-01

The static and low cycle fatigue strength of an air plasma sprayed (APS) partially stabilized zirconia thermal barrier coating (TBC) is experimentally evaluated. The shear testing utilized the Iosipescu shear test arrangement. Testing was performed parallel to the TBC-substrate interface. The TBC testing required an innovative use of steel extensions with the TBC bonded between the steel extensions to form the standard Iosipescu specimen shape. The test method appears to have been successful. Fracture of the TBC was initiated in shear, although unconstrained specimen fractures propagated at the TBC-bond coat interface. The use of side grooves on the TBC was successful in keeping the failure in the gage section and did not appear to affect the shear strength values that were measured. Low cycle fatigue failures were obtained at high stress levels approaching the ultimate strength of the TBC. The static and fatigue strengths do not appear to be markedly different from tensile properties for comparable TBC material
The Galley Parallel File System

Science.gov (United States)

Nieuwejaar, Nils; Kotz, David

1996-01-01

Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.
Comparing the performance of different meta-heuristics for unweighted parallel machine scheduling

Directory of Open Access Journals (Sweden)

Adamu, Mumuni Osumah

2015-08-01

This article considers the due window scheduling problem of minimising the number of early and tardy jobs on identical parallel machines. This problem is known to be NP-complete, so finding an optimal solution efficiently is unlikely. Three meta-heuristics and their hybrids are proposed and extensive computational experiments are conducted. The purpose of this paper is to compare the performance of these meta-heuristics and their hybrids and to determine the best among them. Detailed comparative tests have also been conducted to analyse the different heuristics, with the simulated annealing hybrid giving the best result.
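To make the objective concrete, here is a minimal sketch of a plain simulated annealing search for this kind of problem. It is not the hybrid evaluated in the article. It assumes each job is a hypothetical `(processing_time, window_start, window_end)` tuple, that every machine runs its jobs back to back from time 0, and that a job counts as early or tardy when its completion time falls outside its window. The move rule, cooling schedule and parameter values are illustrative choices.

```python
import math
import random

def count_early_tardy(jobs, schedule):
    """Number of jobs finishing outside their due window."""
    count = 0
    for machine in schedule:
        t = 0
        for j in machine:
            p, start, end = jobs[j]
            t += p  # no idle time is inserted (an assumption of this sketch)
            if t < start or t > end:
                count += 1
    return count

def anneal(jobs, schedule, steps=2000, temp=1.0, cooling=0.995, seed=0):
    """Move one random job per step; accept worse schedules with a
    probability that shrinks as the temperature cools."""
    rng = random.Random(seed)
    current = [list(m) for m in schedule]
    cost = count_early_tardy(jobs, current)
    best, best_cost = [list(m) for m in current], cost
    for _ in range(steps):
        candidate = [list(m) for m in current]
        src = rng.choice([i for i, m in enumerate(candidate) if m])
        job = candidate[src].pop(rng.randrange(len(candidate[src])))
        dst = rng.randrange(len(candidate))
        candidate[dst].insert(rng.randint(0, len(candidate[dst])), job)
        new_cost = count_early_tardy(jobs, candidate)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            current, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = [list(m) for m in current], cost
        temp *= cooling
    return best, best_cost

# Hypothetical instance: four jobs on two identical machines.
jobs = [(2, 1, 3), (3, 4, 6), (1, 0, 1), (4, 5, 9)]
start = [[1, 0], [3, 2]]  # every job misses its window in this order
best, best_cost = anneal(jobs, start)
```

The search only ever returns a schedule at least as good as the starting one, because the best schedule seen so far is tracked separately from the current one.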
PDDP, A Data Parallel Programming Model

Directory of Open Access Journals (Sweden)

Karen H. Warren

1996-01-01

PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements High Performance Fortran (HPF)-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.
Design considerations for parallel graphics libraries

Science.gov (United States)

Crockett, Thomas W.

1994-01-01

Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Machine translation with minimal reliance on parallel resources

CERN Document Server

Tambouratzis, George; Sofianopoulos, Sokratis

2017-01-01

This book provides a unified view on a new methodology for Machine Translation (MT). This methodology extracts information from widely available resources (extensive monolingual corpora) while only assuming the existence of a very limited parallel corpus, thus having a unique starting point to Statistical Machine Translation (SMT). In this book, a detailed presentation of the methodology principles and system architecture is followed by a series of experiments, where the proposed system is compared to other MT systems using a set of established metrics including BLEU, NIST, Meteor and TER. Additionally, a free-to-use code is available, that allows the creation of new MT systems. The volume is addressed to both language professionals and researchers. Prerequisites for the readers are very limited and include a basic understanding of the machine translation as well as of the basic tools of natural language processing.
Automatic Loop Parallelization via Compiler Guided Refactoring

DEFF Research Database (Denmark)

Larsen, Per; Ladelsky, Razya; Lidman, Jacob

For many parallel applications, performance relies not on instruction-level parallelism, but on loop-level parallelism. Unfortunately, many modern applications are written in ways that obstruct automatic loop parallelization. Since we cannot identify sufficient parallelization opportunities...... for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler’s ability to generate loop-parallel code. We use this compilation system to modify two sequential...... benchmarks, finding that the code parallelized in this way runs up to 8.3 times faster on an octo-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core IBM POWER6 system. Benchmark performance varies significantly between the systems. This suggests that semi-automatic parallelization should...
Aspects of computation on asynchronous parallel processors

International Nuclear Information System (INIS)

Wright, M.

1989-01-01

The increasing availability of asynchronous parallel processors has provided opportunities for original and useful work in scientific computing. However, the field of parallel computing is still in a highly volatile state, and researchers display a wide range of opinion about many fundamental questions such as models of parallelism, approaches for detecting and analyzing parallelism of algorithms, and tools that allow software developers and users to make effective use of diverse forms of complex hardware. This volume collects the work of researchers specializing in different aspects of parallel computing, who met to discuss the framework and the mechanics of numerical computing. The far-reaching impact of high-performance asynchronous systems is reflected in the wide variety of topics, which include scientific applications (e.g. linear algebra, lattice gauge simulation, ordinary and partial differential equations), models of parallelism, parallel language features, task scheduling, automatic parallelization techniques, tools for algorithm development in parallel environments, and system design issues
Extensions of the results on powers of -hyponormal and -hyponormal operators

Directory of Open Access Journals (Sweden)

Yang Changsen

2006-01-01

Firstly, we will show the following extension of the results on powers of -hyponormal and -hyponormal operators: let and be positive integers, if is -hyponormal for , then: (i in case , and hold, (ii in case , and hold. Secondly, we will show an estimation on powers of -hyponormal operators for which implies the best possibility of our results. Lastly, we will show a parallel estimation on powers of -hyponormal operators as follows: let , then the following hold for each positive integer and : (i there exists a log-hyponormal operator such that , (ii there exists a -hyponormal operator such that .
Parallelization of the FLAPW method

International Nuclear Information System (INIS)

Canning, A.; Mannstadt, W.; Freeman, A.J.

1999-01-01

The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about one hundred atoms due to a lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel computer
Parallelization of the FLAPW method

Science.gov (United States)

Canning, A.; Mannstadt, W.; Freeman, A. J.

2000-08-01

The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.

Parallelization of 2-D lattice Boltzmann codes

International Nuclear Information System (INIS)

Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo.

1996-03-01

Lattice Boltzmann (LB) codes to simulate two-dimensional fluid flow are developed on the vector parallel computer Fujitsu VPP500 and the scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code so that it can be vectorized along the axis perpendicular to the direction of the decomposition. Parallel efficiencies of 95.1% for the vector parallel calculation on 16 processors with a 1152x1152 grid and 88.6% for the scalar parallel calculation on 100 processors with an 800x800 grid are obtained. Performance models are developed to analyze the performance of the LB codes. Our performance models show that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors, up to 100 processors. We also analyze the scalability while keeping the available memory size of one processor element at its maximum. Our performance model predicts that the execution time of the vector parallel code increases by about 3% on 500 processors. Although the 1-D domain decomposition method has, in general, a drawback in interprocessor communication, the vector parallel LB code is still suitable for large-scale and/or high-resolution simulations. (author)
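The efficiencies quoted above can be turned into speedups, assuming the usual definition of parallel efficiency as speedup divided by processor count. The abstract does not state which definition it uses, so the figures below are a reading under that assumption; the symbols E_p and S_p are introduced here for illustration.

```latex
E_p = \frac{S_p}{p}
\quad\Longrightarrow\quad
S_p = E_p \, p,
\qquad
S_{16} = 0.951 \times 16 \approx 15.2,
\qquad
S_{100} = 0.886 \times 100 = 88.6 .
```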
Parallelization of 2-D lattice Boltzmann codes

Energy Technology Data Exchange (ETDEWEB)

Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo

1996-03-01

Lattice Boltzmann (LB) codes to simulate two-dimensional fluid flow are developed on the vector parallel computer Fujitsu VPP500 and the scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code so that it can be vectorized along the axis perpendicular to the direction of the decomposition. Parallel efficiencies of 95.1% for the vector parallel calculation on 16 processors with a 1152x1152 grid and 88.6% for the scalar parallel calculation on 100 processors with an 800x800 grid are obtained. Performance models are developed to analyze the performance of the LB codes. Our performance models show that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors, up to 100 processors. We also analyze the scalability while keeping the available memory size of one processor element at its maximum. Our performance model predicts that the execution time of the vector parallel code increases by about 3% on 500 processors. Although the 1-D domain decomposition method has, in general, a drawback in interprocessor communication, the vector parallel LB code is still suitable for large-scale and/or high-resolution simulations. (author).
Explorations of the implementation of a parallel IDW interpolation algorithm in a Linux cluster-based parallel GIS

Science.gov (United States)

Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin

2011-04-01

To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm has been chosen as an example to explore the working model and the principle of algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper has proposed and designed a specific parallel IDW interpolation algorithm, incorporating both single program, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information, and then broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) iterations continue until all rows have been processed, after which the results are output. According to the experiments performed in the course of this work, the parallel IDW interpolation algorithm can attain an efficiency greater than 0.93 compared with similar algorithms, which indicates that the parallel algorithm can greatly reduce processing time and maximize speed and performance.
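The row-wise scheme in steps (1) to (4) can be sketched as follows. This is a minimal illustration, not the paper's implementation: a thread pool stands in for the cluster nodes, the grid coordinates are taken to be integer column and row indices, and the names `idw_value`, `idw_row` and `parallel_idw` are hypothetical. In CPython the threads only mirror the structure of the algorithm and would not give the reported speedups.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (x, y, value) of a known data point


def idw_value(x: float, y: float, samples: List[Sample], power: float = 2.0) -> float:
    """Serial inverse distance weighted estimate at (x, y)."""
    num = 0.0
    den = 0.0
    for sx, sy, sv in samples:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0.0:
            return sv  # the query point coincides with a sample
        w = 1.0 / d2 ** (power / 2.0)  # weight = 1 / distance**power
        num += w * sv
        den += w
    return num / den


def idw_row(task: Tuple[int, int, List[Sample]]) -> List[float]:
    """Work done by one node: interpolate every cell of a single row."""
    row, ncols, samples = task
    return [idw_value(float(col), float(row), samples) for col in range(ncols)]


def parallel_idw(nrows: int, ncols: int, samples: List[Sample], workers: int = 4) -> List[List[float]]:
    """Master side: hand out rows, then gather the results in row order."""
    tasks = [(row, ncols, samples) for row in range(nrows)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(idw_row, tasks))
```

Because each row depends only on the shared sample set, the rows can be computed in any order, which is what makes the master/slave distribution straightforward.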
Parallel Monte Carlo reactor neutronics

International Nuclear Information System (INIS)

Blomquist, R.N.; Brown, F.B.

1994-01-01

The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and workstation network with uneven interactive load. Speedups linear with the number of nodes were achieved.
Parallel Implicit Algorithms for CFD

Science.gov (United States)

Keyes, David E.

1998-01-01

The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in the Message Passing Interface (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSc library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSc during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSc framework.
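The remark that the Krylov iteration touches the Jacobian only through matrix-vector products can be illustrated with a finite-difference product, a common matrix-free variant. The abstract itself mentions sparse matrix-vector products, so the project may well have stored the Jacobian explicitly; the sketch below is an alternative shown only to make the idea concrete. The function name, the step size `eps` and the toy residual in the usage note are assumptions for this example.

```python
from typing import Callable, List

Vector = List[float]


def jacobian_vector_product(
    residual: Callable[[Vector], Vector],
    u: Vector,
    v: Vector,
    eps: float = 1e-7,
) -> Vector:
    """Approximate J(u) @ v without forming J.

    Uses the first-order difference (F(u + eps*v) - F(u)) / eps, so each
    product costs one extra residual evaluation.
    """
    shifted = [ui + eps * vi for ui, vi in zip(u, v)]
    f0 = residual(u)
    f1 = residual(shifted)
    return [(b - a) / eps for a, b in zip(f0, f1)]
```

For a toy residual F(u) = [u0^2 + u1, u0*u1] at u = (1, 2), the exact Jacobian is [[2, 1], [2, 1]], so its product with v = (1, 1) is (3, 3), which the function reproduces to within the finite-difference error.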
Parallel kinematics type, kinematics, and optimal design

CERN Document Server

Liu, Xin-Jun

2014-01-01

Parallel Kinematics: Type, Kinematics, and Optimal Design presents the results of 15 years' research on parallel mechanisms and parallel kinematics machines. This book covers the systematic classification of parallel mechanisms (PMs) as well as providing a large number of mechanical architectures of PMs available for use in practical applications. It focuses on the kinematic design of parallel robots. One successful application of parallel mechanisms in the field of machine tools, which is also called parallel kinematics machines, has been the emerging trend in advanced machine tools. The book describes not only the main aspects and important topics in parallel kinematics, but also references novel concepts and approaches, i.e. type synthesis based on evolution, performance evaluation and optimization based on screw theory, singularity model taking into account motion and force transmissibility, and others. This book is intended for researchers, scientists, engineers and postgraduates or above with interes...
Vectorization, parallelization and porting of nuclear codes on the VPP500 system (vectorization). Progress report fiscal 1997

International Nuclear Information System (INIS)

Kawasaki, Nobuo; Ogasawara, Shinobu; Adachi, Masaaki; Kume, Etsuo; Ishizuki, Shigeru; Tanabe, Hidenobu; Nemoto, Toshiyuki; Kawai, Wataru; Watanabe, Hideo

1999-05-01

Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system and/or the AP3000 system at the Center for Promotion of Computational Science and Engineering in the Japan Atomic Energy Research Institute. We dealt with 14 codes in fiscal 1997. These results are reported in 3 parts, i.e., the vectorization part, the parallelization part and the porting part. In this report, we describe the vectorization. In this vectorization part, the vectorization of multidimensional two-fluid model code ACE-3D for evaluation of constitutive equations, statistical decay code SD and three-dimensional thermal analysis code for in-core test section (T2) of HENDEL SSPHEAT are described. In the parallelization part, the parallelization of cylindrical direct numerical simulation code CYLDNS44N, worldwide version of system for prediction of environmental emergency dose information code WSPEEDI, extension of quantum molecular dynamics code EQMD and three-dimensional non-steady compressible fluid dynamics code STREAM are described. In the porting part, the porting of transient reactor analysis code TRAC-BF1 and Monte Carlo radiation transport code MCNP4A on the AP3000 are described. In addition, a modification of program libraries for command-driven interactive data analysis plotting program IPLOT is described. (author)
Vectorization, parallelization and porting of nuclear codes on the VPP500 system (porting). Progress report fiscal 1997

International Nuclear Information System (INIS)

Ishizuki, Shigeru; Nemoto, Toshiyuki; Kawai, Wataru; Watanabe, Hideo; Tanabe, Hidenobu; Kawasaki, Nobuo; Adachi, Masaaki; Ogasawara, Shinobu; Kume, Etsuo

1999-05-01

Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system and/or the AP3000 system at the Center for Promotion of Computational Science and Engineering in the Japan Atomic Energy Research Institute. We dealt with 14 codes in fiscal 1997. These results are reported in 3 parts, i.e., the vectorization part, the parallelization part and the porting part. In this report, we describe the porting. In this porting part, the porting of transient reactor analysis code TRAC-BF1 and Monte Carlo radiation transport code MCNP4A on the AP3000 are described. In addition, a modification of program libraries for command-driven interactive data analysis plotting program IPLOT is described. In the vectorization part, the vectorization of multidimensional two-fluid model code ACE-3D for evaluation of constitutive equations, statistical decay code SD and three-dimensional thermal analysis code for in-core test section (T2) of HENDEL SSPHEAT are described. In the parallelization part, the parallelization of cylindrical direct numerical simulation code CYLDNS44N, worldwide version of system for prediction of environmental emergency dose information code WSPEEDI, extension of quantum molecular dynamics code EQMD and three-dimensional non-steady compressible fluid dynamics code STREAM are described. (author)
CELLS v1.0: updated and parallelized version of an electrical scheme to simulate multiple electrified clouds and flashes over large domains

Directory of Open Access Journals (Sweden)

C. Barthe

2012-01-01

The paper describes the fully parallelized electrical scheme CELLS, which is suitable for explicitly simulating electrified storm systems on parallel computers. Our motivation here is to show that a cloud electricity scheme can be developed for use on large grids with complex terrain. Large computational domains are needed to perform real case meteorological simulations with many independent convective cells.

The scheme computes the bulk electric charge attached to each cloud particle and hydrometeor. Positive and negative ions are also taken into account. Several parametrizations of the dominant non-inductive charging process are included, as well as an inductive charging process. The electric field is obtained by inverting the Gauss equation with an extension to terrain-following coordinates. The new feature concerns the lightning flash scheme, which is a simplified version of an older detailed sequential scheme. Flashes are composed of a bidirectional leader phase (vertical extension from the triggering point) and a phase obeying a fractal law (with horizontal extension on electrically charged zones). The originality of the scheme lies in the way the branching phase is treated to get a parallel code.

The complete electrification scheme is tested for the 10 July 1996 STERAO case and for the 21 July 1998 EULINOX case. Flash characteristics are analysed in detail and additional sensitivity experiments are performed for the STERAO case. Although the simulations were run for flat terrain conditions, they show that the model behaves well on multiprocessor computers. This opens a wide area of application for this electrical scheme, with the next objective of running real meteorological cases on large domains.
Experiments with parallel algorithms for combinatorial problems

NARCIS (Netherlands)

G.A.P. Kindervater (Gerard); H.W.J.M. Trienekens

1985-01-01

In the last decade many models for parallel computation have been proposed and many parallel algorithms have been developed. However, few of these models have been realized and most of these algorithms are supposed to run on idealized, unrealistic parallel machines. The parallel machines
Postobductional extension along and within the Frontal Range of the Eastern Oman Mountains

Science.gov (United States)

Mattern, Frank; Scharf, Andreas

2018-04-01

The Oman Mountains formed by late Cretaceous obduction of the Tethys-derived Semail Ophiolite. This study concerns the postobductional extension on the northern flank of the mountain belt. Nine sites at the northern margins of the Jabal Akhdar/Nakhl and Saih Hatat domes of the Eastern Oman ("Hajar") Mountains were investigated. The northern margins are marked by a system of major interconnected extensional faults, the "Frontal Range Fault". While the vertical displacements along the Saih Hatat and westerly located Jabal Nakhl domes measure 2.25-6.25 km, 0.5-4.5 km and 4-7 km, respectively, it amounts to 1-5 km along the Jabal Akhdar Dome. Extension had started during the late Cretaceous, towards the end of ophiolite emplacement. Two stages of extension can be ascertained (late Cretaceous to early Eocene and probably Oligocene) at the eastern part of the Frontal Range Fault System (Wadi Kabir and Fanja Graben faults of similar strike). Along the intervening and differently striking fault segments at Sad and Sunub the same two stages of deformation are deduced. The first stage is characterized again by extension. The second stage is marked by dextral motion, including local transtension. Probable Oligocene extension affected the Batinah Coast Fault while it also affected the Wadi Kabir Fault and the Fanja Graben. It is unclear whether the western portion of the Frontal Range Fault also went through two stages of deformation. Bedding-parallel ductile and brittle deformation is a common phenomenon. Hot springs and listwaenite are associated with dextral releasing bends within the fault system, as well as a basalt intrusion of probable Oligocene age. A structural transect through the Frontal Range along the superbly exposed Wadi Bani Kharous (Jabal Akhdar Dome) revealed that extension affected the Frontal Range at least 2.5 km south of the Frontal Range Fault. Also here, bedding-parallel shearing is important, but not exclusive. A late Cretaceous thrust was
Parallel reservoir simulator computations

International Nuclear Information System (INIS)

Hemanth-Kumar, K.; Young, L.C.

1995-01-01

The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and, relative to scalar calculations, it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively, on the CRAY C-90.
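The reported figures can be read against Amdahl's law, which the abstract does not mention; it is brought in here only as a rough consistency check. If a fraction f of the work runs in vector/parallel mode and that part is accelerated by a factor s, the overall speedup is 1 / ((1 - f) + f / s). The function name below is hypothetical, and the 99% value is approximate in the source.

```python
def amdahl_speedup(parallel_fraction: float, unit_speedup: float) -> float:
    """Overall speedup when only part of the work is accelerated.

    parallel_fraction: share of the work that runs in vector/parallel mode.
    unit_speedup: acceleration factor applied to that share.
    """
    serial_part = 1.0 - parallel_fraction
    return 1.0 / (serial_part + parallel_fraction / unit_speedup)
```

With f = 0.99 the formula caps the overall speedup at about 100 however fast the vector part becomes, so the reported speedups of 65 and 81 sit below that ceiling, as expected.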
The STAPL Parallel Graph Library

KAUST Repository

Harshvardhan,

2013-01-01

This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraph pViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges. © Springer-Verlag Berlin Heidelberg 2013.
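Of the three paradigms listed, the level-synchronous one is the easiest to show in a few lines. The sketch below is a plain sequential breadth-first search written in that style; it does not use the STAPL API, and the adjacency-dictionary layout and function name are assumptions for illustration.

```python
from typing import Dict, List


def level_synchronous_bfs(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Return the BFS level of every vertex reachable from source.

    The graph is processed one frontier at a time. In a parallel runtime
    the vertices of a frontier could be expanded concurrently, with a
    barrier between successive levels.
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, []):
                if v not in level:  # first visit fixes the level
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier  # the "barrier": move to the next level
    return level
```

An asynchronous variant would drop the per-level barrier and let updates propagate as they arrive, at the cost of possibly revisiting vertices.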
Expressing Parallelism with ROOT

Energy Technology Data Exchange (ETDEWEB)

Piparo, D. [CERN; Tejedor, E. [CERN; Guiraud, E. [CERN; Ganis, G. [CERN; Mato, P. [CERN; Moneta, L. [CERN; Valls Pla, X. [CERN; Canal, P. [Fermilab

2017-11-22

The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Expressing Parallelism with ROOT

Science.gov (United States)

Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

2017-10-01

The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Parallel hierarchical radiosity rendering

Energy Technology Data Exchange (ETDEWEB)

Carter, Michael [Iowa State Univ., Ames, IA (United States)

1993-07-01

In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.
Simulating Hydrologic Flow and Reactive Transport with PFLOTRAN and PETSc on Emerging Fine-Grained Parallel Computer Architectures

Science.gov (United States)

Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.

2017-12-01

As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.
Shared Variable Oriented Parallel Precompiler for SPMD Model

Institute of Scientific and Technical Information of China (English)

(no author listed)

1995-01-01

At present, commercial parallel computer systems with distributed memory architecture are usually provided with parallel FORTRAN or parallel C compilers, which are just traditional sequential FORTRAN or C compilers extended with communication statements. Programmers suffer from having to write parallel programs with communication statements. The Shared Variable Oriented Parallel Precompiler (SVOPP) proposed in this paper can automatically generate appropriate communication statements based on shared variables for the SPMD (Single Program Multiple Data) computation model and greatly eases parallel programming while keeping communication efficiency high. The core function of the parallel C precompiler has been successfully verified on a transputer-based parallel computer. Its prominent performance shows that SVOPP is probably a breakthrough in parallel programming technique.
Evaluating parallel optimization on transputers

Directory of Open Access Journals (Sweden)

A.G. Chalmers

2003-12-01

The faster processing power of modern computers and the development of efficient algorithms have made it possible for operations researchers to tackle a much wider range of problems than ever before. Further improvements in processing speed can be achieved by using relatively inexpensive transputers to process components of an algorithm in parallel. The Davidon-Fletcher-Powell method is one of the most successful and widely used optimisation algorithms for unconstrained problems. This paper examines the algorithm and identifies the components that can be processed in parallel. The results of some experiments with these components are presented; they indicate under what conditions parallel processing with an inexpensive configuration is likely to be faster than the traditional sequential implementations. The performance of the whole algorithm with its parallel components is then compared with the original sequential algorithm. The implementation serves to illustrate the practicalities of speeding up typical OR algorithms in terms of difficulty, effort and cost. The results give an indication of the savings in time a given parallel implementation can be expected to yield.
Programming massively parallel processors a hands-on approach

CERN Document Server

Kirk, David B

2010-01-01

Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...

PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

Energy Technology Data Exchange (ETDEWEB)

Lichtner, Peter C. [OFM Research, Redmond, WA (United States); Hammond, Glenn E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Lu, Chuan [Idaho National Lab. (INL), Idaho Falls, ID (United States); Karra, Satish [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Bisht, Gautam [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Andre, Benjamin [National Center for Atmospheric Research, Boulder, CO (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Mills, Richard [Intel Corporation, Portland, OR (United States); Univ. of Tennessee, Knoxville, TN (United States); Kumar, Jitendra [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

2015-01-20

PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 2¹⁸ processor cores with problem sizes up to 2 billion degrees of freedom. Written in object-oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into the memory allotted to a single processor core. The current limit on the problem size PFLOTRAN can handle comes from the restriction of the HDF5 file format used for parallel I/O to 32-bit integers. Noting that 2³² = 4,294,967,296, this gives an estimate of the maximum problem size that can currently be run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.
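The figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the core count is to be read as 2 to the power 18, and the per-core figure further assumes that both quoted maxima were reached in the same run, which the abstract does not state. The variable names are hypothetical and are not PFLOTRAN identifiers.

```python
# Illustrative arithmetic only; names are hypothetical, not PFLOTRAN identifiers.
max_index_count = 2 ** 32        # number of values a 32-bit integer can index
max_cores_reported = 2 ** 18     # assumed reading of the quoted core count
dofs_reported = 2_000_000_000    # "2 billion degrees of freedom"

# Assumption: both maxima belong to one run (not stated in the abstract).
dofs_per_core = dofs_reported / max_cores_reported

print(f"2^32 = {max_index_count:,}")
print(f"2^18 = {max_cores_reported:,}")
print(f"degrees of freedom per core under that assumption: {dofs_per_core:.0f}")
```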
Approaches for introducing high molecular diversity in scaffolds: fast parallel synthesis of highly substituted 1H-quinolin-4-one libraries.

Science.gov (United States)

Kuznetsov, Vladimir; Gorohovsky, Sofia; Levy, Amalia; Meir, Simcha; Shkoulev, Vladimir; Menashe, Naim; Greenwald, Moshe; Aizikovich, Alexander; Ofer, Dror; Byk, Gerardo; Gellerman, Garry

2004-01-01

We have developed a two-step strategy for the parallel synthesis of highly diversified quinolin-ones. In the first step we combined and improved different synthetic methods for generating quinolin-4-ones bearing four different substitutions at specific positions using round-bottomed flasks. The synthesis was assessed for a large number of substituted quinolin-4-ones. In the second step, the improved method was adapted to a parallel array synthesis using a 12-position carousel, as demonstrated for the synthesis of 42-variable quinolin-4-ones. The first combinatorial library set 14(a-x) was obtained with a chemical purity of more than 95% without purification; the second library set 15(a-r), which included two synthetic steps, needed combinatorial purification using an innovative parallel purifier. The proposed approach contributes to a more extensive diversification of molecular scaffolds in general and provides access to highly substituted quinolinones in particular.
Exploiting Symmetry on Parallel Architectures.

Science.gov (United States)

Stiller, Lewis Benjamin

1995-01-01

This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
Advanced parallel processing with supercomputer architectures

International Nuclear Information System (INIS)

Hwang, K.

1987-01-01

This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potential of optical and neural technologies for developing future supercomputers.
A conceptual design of multidisciplinary-integrated C.F.D. simulation on parallel computers

International Nuclear Information System (INIS)

Onishi, Ryoichi; Ohta, Takashi; Kimura, Toshiya.

1996-11-01

The design of a parallel aeroelastic code for integrated aircraft simulations is presented. A method for integrating aerodynamics and structural dynamics software on parallel computers is devised using the Euler/Navier-Stokes equations coupled with wing-box finite element structures. The synthesis of a modern aircraft requires the optimization of aerodynamics, structures, controls, operability, and other design disciplines, and R&D efforts to implement Multidisciplinary Design Optimization environments on high-performance computers are under way, especially in the U.S. aerospace industry. This report describes a Multiple Program Multiple Data (MPMD) parallelization of aerodynamics and structural dynamics codes with a dynamically deforming grid. A three-dimensional computation of a flowfield with dynamic deformation caused by structural deformation is performed, and the calculated pressure data are used to compute the structural deformation, which is in turn fed back into the fluid dynamics code. This process is repeated, exchanging the computed pressures and deformations between the flowfield grids and the structural elements. This makes it possible to simulate structural motion that accounts for fluid-structure interaction. The conceptual design for achieving these functions is reported. Future extensions to incorporate control systems, which would allow a realistic aircraft configuration to be simulated and make the code a major tool for Aircraft Integrated Simulation, are also investigated. (author)
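The exchange of pressures and deformations described in this abstract is a fixed-point iteration between two solvers. A minimal sketch follows, assuming scalar toy stand-ins for both solvers; the functions `fluid_step`, `structure_step` and `couple`, and all constants, are hypothetical and are not taken from the report. The report runs the two codes as separate programs (MPMD), whereas this sketch runs them in one process purely to show the data flow.

```python
def fluid_step(deformation: float) -> float:
    """Toy stand-in for the flow solver: surface pressure for a given deformation."""
    free_stream_pressure = 100.0   # hypothetical constant
    relief = 0.5                   # hypothetical: deformation lowers the load
    return free_stream_pressure * (1.0 - relief * deformation)


def structure_step(pressure: float) -> float:
    """Toy stand-in for the structural solver: deformation under a given pressure."""
    compliance = 0.01              # hypothetical constant
    return compliance * pressure


def couple(tol: float = 1e-10, max_iters: int = 100):
    """Alternate the two solvers until the deformation stops changing."""
    deformation = 0.0
    for iteration in range(1, max_iters + 1):
        pressure = fluid_step(deformation)          # fluid code receives deformation
        new_deformation = structure_step(pressure)  # structure code receives pressure
        if abs(new_deformation - deformation) < tol:
            return new_deformation, pressure, iteration
        deformation = new_deformation
    raise RuntimeError("coupling loop did not converge")


if __name__ == "__main__":
    d, p, n = couple()
    print(f"deformation={d:.6f} pressure={p:.4f} after {n} exchanges")
```

With these toy constants each exchange halves the error, so the loop settles quickly; a real coupled simulation would exchange field data between grids and may need relaxation to converge.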
Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

2014-11-11

Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
Turning an Extension Aide into an Extension Agent

Science.gov (United States)

Seevers, Brenda; Dormody, Thomas J.

2010-01-01

For any organization to remain sustainable, a renewable source of faculty and staff needs to be available. The Extension Internship Program for Juniors and Seniors in High School is a new tool for recruiting and developing new Extension agents. Students get "hands on" experience working in an Extension office and earn college credit…
Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis

Energy Technology Data Exchange (ETDEWEB)

Hasenkamp, Daren; Sim, Alexander; Wehner, Michael; Wu, Kesheng

2010-09-30

Extensive computing power has been used to tackle issues such as climate change, fusion energy, and other pressing scientific challenges. These computations produce a tremendous amount of data; however, many of the data analysis programs currently run on only a single processor. In this work, we explore the possibility of using the emerging cloud computing platform to parallelize such sequential data analysis tasks. As a proof of concept, we wrap a program for analyzing trends of tropical cyclones in a set of virtual machines (VMs). This approach allows the user to keep their familiar data analysis environment in the VMs, while we provide the coordination and data transfer services to ensure the necessary input and output are directed to the desired locations. This work extensively exercises the networking capability of the cloud computing systems and has revealed a number of weaknesses in the current cloud system software. In our tests, we are able to scale the parallel data analysis job to a modest number of VMs and achieve a speedup that is comparable to running the same analysis task using MPI. However, compared to MPI-based parallelization, the cloud-based approach has a number of advantages. The cloud-based approach is more flexible because the VMs can capture arbitrary software dependencies without requiring the user to rewrite their programs. The cloud-based approach is also more resilient to failure: as long as a single VM is running, it can make progress, whereas an MPI job fails entirely as soon as one node fails. In short, this initial work demonstrates that a cloud computing system is a viable platform for distributed scientific data analyses traditionally conducted on dedicated supercomputing systems.
Tilted cone-beam reconstruction with row-wise fan-to-parallel rebinning

International Nuclear Information System (INIS)

Hsieh Jiang; Tang Xiangyang

2006-01-01

Reconstruction algorithms for cone-beam CT have been the focus of many studies. Several exact and approximate reconstruction algorithms were proposed for step-and-shoot and helical scanning trajectories to combat cone-beam related artefacts. In this paper, we present a new closed-form cone-beam reconstruction formula for tilted gantry data acquisition. Although several algorithms were proposed in the past to combat errors induced by the gantry tilt, none of the algorithms addresses the scenario in which the cone-beam geometry is first rebinned to a set of parallel beams prior to the filtered backprojection. We show that the image quality advantages of the rebinned parallel-beam reconstruction are significant, which makes the development of such an algorithm necessary. Because of the rebinning process, the reconstruction algorithm becomes more complex and the amount of iso-centre adjustment depends not only on the projection and tilt angles, but also on the reconstructed pixel location. In this paper, we first demonstrate the advantages of the row-wise fan-to-parallel rebinning and derive a closed-form solution for the reconstruction algorithm for the step-and-shoot and constant-pitch helical scans. The proposed algorithm requires the 'warping' of the reconstruction matrix on a view-by-view basis prior to the backprojection step. We further extend the algorithm to the variable-pitch helical scans in which the patient table travels at non-constant speeds. The algorithm was tested extensively on both the 16- and 64-slice CT scanners. The efficacy of the algorithm is clearly demonstrated by multiple experiments
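For orientation, the textbook relation behind fan-to-parallel rebinning can be sketched as follows. This is not the authors' tilted-gantry formula: it is the standard in-plane mapping for an equiangular fan beam, applied independently to each detector row, and sign conventions differ between references. The function names and the `source_radius` parameter are hypothetical.

```python
import math


def fan_to_parallel(beta: float, gamma: float, source_radius: float):
    """Map a fan-beam ray (source angle beta, fan angle gamma) to parallel
    coordinates (view angle theta, signed distance t from the iso-centre)."""
    theta = beta + gamma
    t = source_radius * math.sin(gamma)
    return theta, t


def parallel_to_fan(theta: float, t: float, source_radius: float):
    """Inverse mapping: find the fan-beam ray that coincides with a parallel ray."""
    gamma = math.asin(t / source_radius)
    beta = theta - gamma
    return beta, gamma


if __name__ == "__main__":
    theta, t = fan_to_parallel(beta=0.3, gamma=0.1, source_radius=540.0)
    print(theta, t)
    print(parallel_to_fan(theta, t, source_radius=540.0))
```

In a rebinning step, each desired parallel sample (theta, t) is looked up at the corresponding (beta, gamma) in the measured fan-beam data, usually with interpolation between neighbouring views and detector channels.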
SOFTWARE FOR DESIGNING PARALLEL APPLICATIONS

Directory of Open Access Journals (Sweden)

M. K. Bouza

2017-01-01

The object of research is the set of tools that support the development of parallel programs in C/C++. Methods and software that automate the process of designing parallel applications are proposed.
An Introduction to Parallel Computation R

Indian Academy of Sciences (India)

How are they programmed? This article provides an introduction. A parallel computer is a network of processors built for ... and have been used to solve problems much faster than a single ... in parallel computer design is to select an organization which ..... The most ambitious approach to parallel computing is to develop.
Building a parallel file system simulator

International Nuclear Information System (INIS)

Molina-Estolano, E; Maltzahn, C; Brandt, S A; Bent, J

2009-01-01

Parallel file systems are gaining in popularity in high-end computing centers as well as commercial data centers. High-end computing systems are expected to scale exponentially and to pose new challenges to their storage scalability in terms of cost and power. To address these challenges scientists and file system designers will need a thorough understanding of the design space of parallel file systems. Yet there exist few systematic studies of parallel file system behavior at petabyte and exabyte scale. An important reason is the significant cost of getting access to large-scale hardware to test parallel file systems. To contribute to this understanding we are building a parallel file system simulator that can simulate parallel file systems at very large scale. Our goal is to simulate petabyte-scale parallel file systems on a small cluster or even a single machine in reasonable time and with reasonable fidelity. With this simulator, file system experts will be able to tune existing file systems for specific workloads, scientists and file system deployment engineers will be able to better communicate workload requirements, file system designers and researchers will be able to try out design alternatives and innovations at scale, and instructors will be able to study very large-scale parallel file system behavior in the classroom. In this paper we describe our approach and provide preliminary results that are encouraging both in terms of fidelity and simulation scalability.
Parallelization for first principles electronic state calculation program

International Nuclear Information System (INIS)

Watanabe, Hiroshi; Oguchi, Tamio.

1997-03-01

In this report we study the parallelization of a first-principles electronic state calculation program. The target machines are the NEC SX-4 for shared-memory parallelization and the FUJITSU VPP300 for distributed-memory parallelization. The features of each parallel machine are surveyed, and the parallelization methods suitable for each are proposed. It is shown that a speedup of 1.60 times is achieved with 2-CPU parallelization on the SX-4 and a speedup of 4.97 times is achieved with 12-PE parallelization on the VPP300. (author)
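The two speedup figures quoted in this abstract are easier to compare once they are normalized by the number of processors. The sketch below computes the parallel efficiency (speedup divided by processor count) and, as an additional interpretive metric that the report itself does not mention, the Karp-Flatt experimentally determined serial fraction. The function names are illustrative.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup actually obtained."""
    return speedup / processors


def karp_flatt_serial_fraction(speedup: float, processors: int) -> float:
    """Karp-Flatt metric: e = (1/S - 1/P) / (1 - 1/P)."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


# Speedups and processor counts as quoted in the abstract.
runs = {"SX-4": (1.60, 2), "VPP300": (4.97, 12)}

for machine, (speedup, processors) in runs.items():
    eff = parallel_efficiency(speedup, processors)
    frac = karp_flatt_serial_fraction(speedup, processors)
    print(f"{machine}: efficiency {eff:.1%}, serial fraction {frac:.3f}")
```

On these figures the 2-CPU run reaches 80% efficiency and the 12-PE run about 41%; the abstract does not say how much of the gap is due to communication cost rather than serial code.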
Parallel computation

International Nuclear Information System (INIS)

Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

1997-01-01

Work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications using the GEANT code, development or improvement work was done on the parts simulating low-energy physical phenomena such as radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program for neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits, in a single pass, the simulation of a hybrid reactor core receiving a proton burst. Other work in this field concerns simulations for nuclear medicine applications, for instance the development of biological probes, the evaluation and characterization of gamma cameras (collimators, crystal thickness), and methods for dosimetric calculations. These calculations are particularly suited to a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other work mentioned in the same field concerns simulation of electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for monitoring natural and artificial radioactivity in the environment.
Neoclassical parallel flow calculation in the presence of external parallel momentum sources in Heliotron J

Energy Technology Data Exchange (ETDEWEB)

Nishioka, K.; Nakamura, Y. [Graduate School of Energy Science, Kyoto University, Gokasho, Uji, Kyoto 611-0011 (Japan); Nishimura, S. [National Institute for Fusion Science, 322-6 Oroshi-cho, Toki, Gifu 509-5292 (Japan); Lee, H. Y. [Korea Advanced Institute of Science and Technology, Daejeon 305-701 (Korea, Republic of); Kobayashi, S.; Mizuuchi, T.; Nagasaki, K.; Okada, H.; Minami, T.; Kado, S.; Yamamoto, S.; Ohshima, S.; Konoshima, S.; Sano, F. [Institute of Advanced Energy, Kyoto University, Gokasho, Uji, Kyoto 611-0011 (Japan)

2016-03-15

A moment approach to calculate neoclassical transport in non-axisymmetric torus plasmas composed of multiple ion species is extended to include the external parallel momentum sources due to unbalanced tangential neutral beam injections (NBIs). The momentum sources that are included in the parallel momentum balance are calculated from the collision operators of background particles with fast ions. This method is applied for the clarification of the physical mechanism of the neoclassical parallel ion flows and the multi-ion species effect on them in Heliotron J NBI plasmas. It is found that parallel ion flow can be determined by the balance between the parallel viscosity and the external momentum source in the region where the external source is much larger than the thermodynamic force driven source in the collisional plasmas. This is because the friction between C⁶⁺ and D⁺ prevents a large difference between C⁶⁺ and D⁺ flow velocities in such plasmas. The C⁶⁺ flow velocities, which are measured by the charge exchange recombination spectroscopy system, are numerically evaluated with this method. It is shown that the experimentally measured C⁶⁺ impurity flow velocities do not clearly contradict the neoclassical estimations, and the dependence of parallel flow velocities on the magnetic field ripples is consistent in both results.
Parallel performance of the angular versus spatial domain decomposition for discrete ordinates transport methods

International Nuclear Information System (INIS)

Fischer, J.W.; Azmy, Y.Y.

2003-01-01

A previously reported parallel performance model for Angular Domain Decomposition (ADD) of the Discrete Ordinates method for solving multidimensional neutron transport problems is revisited for further validation. Three communication schemes: native MPI, the bucket algorithm, and the distributed bucket algorithm, are included in the validation exercise that is successfully conducted on a Beowulf cluster. The parallel performance model is comprised of three components: serial, parallel, and communication. The serial component is largely independent of the number of participating processors, P, while the parallel component decreases like 1/P. These two components are independent of the communication scheme, in contrast with the communication component that typically increases with P in a manner highly dependent on the global reduced algorithm. Correct trends for each component and each communication scheme were measured for the Arbitrarily High Order Transport (AHOT) code, thus validating the performance models. Furthermore, extensive experiments illustrate the superiority of the bucket algorithm. The primary question addressed in this research is: for a given problem size, which domain decomposition method, angular or spatial, is best suited to parallelize Discrete Ordinates methods on a specific computational platform? We address this question for three-dimensional applications via parallel performance models that include parameters specifying the problem size and system performance: the above-mentioned ADD, and a previously constructed and validated Spatial Domain Decomposition (SDD) model. We conclude that for large problems the parallel component dwarfs the communication component even on moderately large numbers of processors. The main advantages of SDD are: (a) scalability to higher numbers of processors of the order of the number of computational cells; (b) smaller memory requirement; (c) better performance than ADD on high-end platforms and large number of
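The three-component model described in this abstract can be written as T(P) = T_serial + T_parallel / P + T_comm(P). A minimal sketch follows. The abstract says only that the communication term typically grows with P in a scheme-dependent way, so the logarithmic form used here, and every constant, are assumptions chosen for illustration rather than values from the paper.

```python
import math


def predicted_time(p: int, t_serial: float, t_parallel: float, comm_coeff: float) -> float:
    """Execution time on p processors under the three-component model.

    The communication term comm_coeff * log2(p) is an assumed form,
    not the one derived in the paper.
    """
    return t_serial + t_parallel / p + comm_coeff * math.log2(p)


def speedup(p: int, t_serial: float, t_parallel: float, comm_coeff: float) -> float:
    """Speedup relative to the single-processor prediction."""
    one = predicted_time(1, t_serial, t_parallel, comm_coeff)
    return one / predicted_time(p, t_serial, t_parallel, comm_coeff)


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary time units.
    T_SERIAL, T_PARALLEL, COMM_COEFF = 1.0, 100.0, 0.5
    for p in (1, 2, 4, 8, 16, 32, 64):
        t = predicted_time(p, T_SERIAL, T_PARALLEL, COMM_COEFF)
        s = speedup(p, T_SERIAL, T_PARALLEL, COMM_COEFF)
        print(f"P={p:3d}  time={t:8.3f}  speedup={s:6.2f}")
```

Fitting the three parameters to measured timings for each communication scheme is one way such a model could be validated, in the spirit of the exercise the abstract reports.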
Structural Properties of G,T-Parallel Duplexes

Directory of Open Access Journals (Sweden)

Anna Aviñó

2010-01-01

The structure of G,T-parallel-stranded duplexes of DNA carrying similar amounts of adenine and guanine residues is studied by means of molecular dynamics (MD) simulations and UV and CD spectroscopies. In addition, the impact of the substitution of adenine by 8-aminoadenine and guanine by 8-aminoguanine is analyzed. The presence of 8-aminoadenine and 8-aminoguanine stabilizes the parallel duplex structure. Binding of these oligonucleotides to their target polypyrimidine sequences to form the corresponding G,T-parallel triplex was not observed. Instead, when unmodified parallel-stranded duplexes were mixed with their polypyrimidine target, an interstrand Watson-Crick duplex was formed. As predicted by theoretical calculations, parallel-stranded duplexes carrying 8-aminopurines did not bind to their target. The preference for the parallel duplex over the Watson-Crick antiparallel duplex is attributed to the strong stabilization of the parallel duplex produced by the 8-aminopurines. Theoretical studies show that the isomorphism of the triads is crucial for the stability of the parallel triplex.
High-speed parallel solution of the neutron diffusion equation with the hierarchical domain decomposition boundary element method incorporating parallel communications

International Nuclear Information System (INIS)

Tsuji, Masashi; Chiba, Gou

2000-01-01

A hierarchical domain decomposition boundary element method (HDD-BEM) for solving the multiregion neutron diffusion equation (NDE) has been fully parallelized, both for numerical computations and for data communications, to accomplish a high parallel efficiency on distributed memory message passing parallel computers. Data exchanges between node processors that are repeated during iteration processes of HDD-BEM are implemented, without any intervention of the host processor that was used to supervise parallel processing in the conventional parallelized HDD-BEM (P-HDD-BEM). Thus, the parallel processing can be executed with only cooperative operations of node processors. The communication overhead was even the dominant time consuming part in the conventional P-HDD-BEM, and the parallelization efficiency decreased steeply with the increase of the number of processors. With the parallel data communication, the efficiency is affected only by the number of boundary elements assigned to decomposed subregions, and the communication overhead can be drastically reduced. This feature can be particularly advantageous in the analysis of three-dimensional problems where a large number of processors are required. The proposed P-HDD-BEM offers a promising solution to the deterioration problem of parallel efficiency and opens a new path to parallel computations of NDEs on distributed memory message passing parallel computers. (author)
Parallel education: what is it?

OpenAIRE

Amos, Michelle Peta

2017-01-01

In the history of education it has long been discussed that single-sex and coeducation are the two models of education present in schools. With the introduction of parallel schools over the last 15 years, there has been very little research into this 'new model'. Many people do not understand what it means for a school to be parallel or they confuse a parallel model with co-education, due to the presence of both boys and girls within the one institution. Therefore, the main obj...

Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.

Science.gov (United States)

Bhandarkar, S M; Chirravuri, S; Arnold, J

1996-01-01

Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
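The "multiple independent searches" strategy mentioned in this abstract can be illustrated on a toy linear-arrangement problem. The sketch below is not the authors' algorithm: it uses a plain swap move and a geometric cooling schedule, both chosen for brevity, and it runs the searches one after another in a single process, whereas on a MIMD machine each seed would run on its own processor. All function names and parameters are hypothetical.

```python
import math
import random


def arrangement_cost(order, edges):
    """Sum of |position(u) - position(v)| over all edges."""
    position = {node: i for i, node in enumerate(order)}
    return sum(abs(position[u] - position[v]) for u, v in edges)


def anneal(edges, n_nodes, seed, steps=2000, t_start=2.0, t_end=0.01):
    """One simulated-annealing search; returns (best_cost, best_order)."""
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)
    cost = arrangement_cost(order, edges)
    best_cost, best_order = cost, order[:]
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        i, j = rng.sample(range(n_nodes), 2)
        order[i], order[j] = order[j], order[i]          # propose a swap
        delta = arrangement_cost(order, edges) - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost += delta                                 # accept the move
            if cost < best_cost:
                best_cost, best_order = cost, order[:]
        else:
            order[i], order[j] = order[j], order[i]      # reject: undo the swap
    return best_cost, best_order


def independent_searches(edges, n_nodes, seeds):
    """Run one search per seed with no interaction; keep the best result."""
    return min(anneal(edges, n_nodes, seed) for seed in seeds)


if __name__ == "__main__":
    n = 8
    path_edges = [(k, k + 1) for k in range(n - 1)]  # a path: best possible cost is n - 1
    print(independent_searches(path_edges, n, seeds=range(4)))
```

The periodically interacting variant the abstract contrasts this with would, in addition, exchange the current best arrangement among the searches at intervals; that step is omitted here.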
On synchronous parallel computations with independent probabilistic choice

International Nuclear Information System (INIS)

Reif, J.H.

1984-01-01

This paper introduces probabilistic choice to synchronous parallel machine models, in particular parallel RAMs. The power of probabilistic choice in parallel computations is illustrated by parallelizing some known probabilistic sequential algorithms. The authors characterize the computational complexity of time-, space-, and processor-bounded probabilistic parallel RAMs in terms of the computational complexity of probabilistic sequential RAMs. They show that parallelism uniformly speeds up time-bounded probabilistic sequential RAM computations by nearly a quadratic factor. They also show that probabilistic choice can be eliminated from parallel computations by introducing nonuniformity.
Automatic Parallelization Tool: Classification of Program Code for Parallel Computing

Directory of Open Access Journals (Sweden)

Mustafa Basthikodi

2016-04-01

Performance growth of single-core processors came to a halt in the past decade, but was re-enabled by the introduction of parallelism in processors. Multicore frameworks, along with graphics processing units, have broadly enhanced parallelism. A couple of compilers have been updated to address the emerging challenges of synchronization and threading. Appropriate program and algorithm classifications can greatly help software engineers find opportunities for effective parallelization. In the present work we investigate current species for the classification of algorithms; related work on classification is discussed, along with a comparison of the issues that challenge classification. A set of algorithms is chosen whose structure matches the different issues and which perform a given task. We tested these algorithms using existing automatic species extraction tools along with the Bones compiler. We added functionality to the existing tool, providing a more detailed characterization. The contributions of our work include support for pointer arithmetic, conditional and incremental statements, user-defined types, constants, and mathematical functions. With this, we can retain significant data that is not captured by the original species of algorithms. We implemented the new theory in the tool, enabling automatic characterization of program code.
Resistor Combinations for Parallel Circuits.

Science.gov (United States)

McTernan, James P.

1978-01-01

To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
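A table of the kind the abstract describes can be generated in a few lines. The sketch below is not taken from the article; the function name and the 12-ohm search limit are assumptions. It lists pairs of whole-number resistances whose parallel combination R1·R2/(R1+R2) is also a whole number.

```python
def whole_number_parallel_pairs(max_ohms):
    """Return (r1, r2, total) rows with r1 <= r2 <= max_ohms and an integer parallel total."""
    rows = []
    for r1 in range(1, max_ohms + 1):
        for r2 in range(r1, max_ohms + 1):
            product, series_sum = r1 * r2, r1 + r2
            # Two resistors in parallel: R = (R1 * R2) / (R1 + R2).
            if product % series_sum == 0:
                rows.append((r1, r2, product // series_sum))
    return rows

for r1, r2, total in whole_number_parallel_pairs(12):
    print(f"{r1:>3} || {r2:>3} = {total:>3} ohms")
```

Working in integer arithmetic avoids floating-point rounding when testing whether the total is whole.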
Parallelization methods study of thermal-hydraulics codes

International Nuclear Information System (INIS)

Gaudart, Catherine

2000-01-01

The variety of parallelization methods and machines leaves programmers with a wide selection. In this study we suggest, in an industrial context, some solutions drawn from the experience acquired with different parallelization methods. The study concerns several scientific codes which simulate a large variety of thermal-hydraulics phenomena. A bibliography on parallelization methods and a first analysis of the codes showed the difficulty of applying our process to all of the applications under study. It was therefore necessary to identify and extract a representative part of these applications and parallelization methods. The linear solver part of the codes was the natural choice. On this particular part several parallelization methods were used. From these developments one could estimate the work necessary for a non-expert programmer to parallelize his application, and the impact of the development constraints. The parallelization methods tested are the numerical library PETSc, the parallelizer PAF, the language HPF, the formalism PEI, and the communications libraries MPI and PVM. In order to test several methods on different applications while respecting the constraint of minimal modifications to the codes, a tool called SPS (Server of Parallel Solvers) was developed. We propose to describe the different constraints on the optimization of codes in an industrial context, to present the solutions given by the tool SPS, to show the development of the linear solver part with the tested parallelization methods, and lastly to compare the results against the imposed criteria. (author) [fr
Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems.

Science.gov (United States)

Pan, Tony; Flick, Patrick; Jain, Chirag; Liu, Yongchao; Aluru, Srinivas

2017-10-09

Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases every 3 days. We present Kmerind, a high performance parallel k-mer indexing library for distributed memory environments. The Kmerind library provides a set of simple and consistent APIs with sequential semantics and parallel implementations that are designed to be flexible and extensible. Kmerind's k-mer counter performs similarly or better than the best existing k-mer counting tools even on shared memory systems. In a distributed memory environment, Kmerind counts k-mers in a 120 GB sequence read dataset in less than 13 seconds on 1024 Xeon CPU cores, and fully indexes their positions in approximately 17 seconds. Querying for 1% of the k-mers in these indices can be completed in 0.23 seconds and 28 seconds, respectively. Kmerind is the first k-mer indexing library for distributed memory environments, and the first extensible library for general k-mer indexing and counting. Kmerind is available at https://github.com/ParBLiSS/kmerind.
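The core idea of distributed k-mer counting can be sketched without the library itself. The code below does not use the Kmerind API; the function names, the CRC32-based owner assignment, and the simulated "ranks" (plain Python lists standing in for distributed-memory processes) are all assumptions made for illustration. Each k-mer is routed to exactly one owner, so every rank can count its own k-mers independently.

```python
from collections import Counter
import zlib

def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def owner(kmer, num_ranks):
    """Assign a k-mer to a rank with a deterministic hash (hypothetical scheme)."""
    return zlib.crc32(kmer.encode("ascii")) % num_ranks

def count_kmers_partitioned(reads, k, num_ranks):
    """Return one Counter per simulated rank; each k-mer lives on exactly one rank."""
    local_counts = [Counter() for _ in range(num_ranks)]
    for read in reads:
        for kmer in kmers(read, k):
            local_counts[owner(kmer, num_ranks)][kmer] += 1
    return local_counts

reads = ["ACGTACGT", "CGTAC"]
parts = count_kmers_partitioned(reads, k=3, num_ranks=4)
merged = sum(parts, Counter())
print(merged["CGT"])  # prints 3
```

In a real distributed setting the routing step would be an all-to-all exchange between processes rather than an index into a local list.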
Simulation Exploration through Immersive Parallel Planes

Energy Technology Data Exchange (ETDEWEB)

Brunhart-Lupo, Nicholas J [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Bush, Brian W [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Gruchalla, Kenny M [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Smith, Steve [Los Alamos Visualization Associates

2017-05-25

We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest; a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used both to explore existing data and to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
Workspace Analysis for Parallel Robot

Directory of Open Access Journals (Sweden)

Ying Sun

2013-05-01

As a new type of robot, the parallel robot possesses many advantages that the serial robot does not, such as high rigidity, great load-carrying capacity, small error, high precision, small self-weight/load ratio, good dynamic behavior, and easy control; hence its range of application is widening. In order to find the workspace of a parallel mechanism, a numerical boundary-searching algorithm based on the inverse kinematics solution and the limits on link length is introduced. This paper analyses the position workspace and orientation workspace of a six-degree-of-freedom parallel robot. The results show that changing the branch length of the parallel mechanism is the main means of increasing or decreasing its workspace, and that the radius of the moving platform has no effect on the size of the workspace but does change its position.
Massively Parallel Finite Element Programming

KAUST Repository

Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang

2010-01-01

Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
Emerging Nanophotonic Applications Explored with Advanced Scientific Parallel Computing

Science.gov (United States)

Meng, Xiang

The domain of nanoscale optical science and technology is a combination of the classical world of electromagnetics and the quantum mechanical regime of atoms and molecules. Recent advancements in fabrication technology allow optical structures to be scaled down to nanoscale size or even to the atomic level, far smaller than the wavelength they are designed for. These nanostructures can have unique, controllable, and tunable optical properties, and their interactions with quantum materials can have important near-field and far-field optical responses. Undoubtedly, these optical properties can have many important applications, ranging from efficient and tunable light sources, detectors, filters, modulators, and high-speed all-optical switches to next-generation classical and quantum computation and biophotonic medical sensors. This emerging area of nanoscience, known as nanophotonics, is a highly interdisciplinary field requiring expertise in materials science, physics, electrical engineering, and scientific computing, modeling and simulation. It has also become an important research field for investigating the science and engineering of light-matter interactions that take place on wavelength and subwavelength scales where the nature of the nanostructured matter controls the interactions. In addition, fast advancements in computing capabilities, such as parallel computing, have also become a critical element for investigating advanced nanophotonic devices. This role has taken on even greater urgency with the scale-down of device dimensions, and the design of these devices requires extensive memory and extremely long core hours. Thus distributed computing platforms associated with parallel computing are required for faster design processes. Scientific parallel computing constructs mathematical models and quantitative analysis techniques, and uses computing machines to analyze and solve otherwise intractable scientific challenges. In
Collectively loading an application in a parallel computer

Science.gov (United States)

Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.; Miller, Samuel J.; Mundy, Michael B.

2016-01-05

Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
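The four steps in the abstract (identify a subset, select a leader, have the leader retrieve the application, broadcast it) can be mimicked in a single-process sketch. This is only an illustration, not the patented system: the node names, the "first N nodes" subset rule, the "first node is leader" rule, and the dictionary standing in for a broadcast are all assumptions.

```python
def collectively_load(all_nodes, job_size, read_application):
    """Simulate collective loading; returns (leader, {node: application image})."""
    # Step 1: identify the subset of compute nodes for the job (assumed: first N).
    subset = all_nodes[:job_size]
    # Step 2: select one node of the subset as job leader (assumed: the first).
    leader = subset[0]
    # Step 3: only the leader retrieves the application from storage.
    image = read_application()
    # Step 4: the leader broadcasts the image to every node in the subset.
    loaded = {node: image for node in subset}
    return leader, loaded

storage_reads = []

def read_application():
    """Stand-in for reading the executable; records each access to storage."""
    storage_reads.append(1)
    return b"application-bytes"

nodes = [f"node{i}" for i in range(8)]
leader, loaded = collectively_load(nodes, job_size=4, read_application=read_application)
print(leader, len(loaded), len(storage_reads))
```

The point of the scheme is visible in the counter: storage is accessed once regardless of how many nodes run the job.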
Productive Parallel Programming: The PCN Approach

Directory of Open Access Journals (Sweden)

Ian Foster

1992-01-01

We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.
Parallel-In-Time For Moving Meshes

Energy Technology Data Exchange (ETDEWEB)

Falgout, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Manteuffel, T. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Southworth, B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Schroder, J. B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2016-02-04

With steadily growing computational resources available, scientists must develop effective ways to utilize the increased resources. High performance, highly parallel software has become a standard. However, until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial differential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing sequential codes with only minor modifications. In this work, a rezoning-type moving mesh is applied to a diffusion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
Integrated Task And Data Parallel Programming: Language Design

Science.gov (United States)

Grimshaw, Andrew S.; West, Emily A.

1998-01-01

This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities: During the fall I collaborated
Performance of the Galley Parallel File System

Science.gov (United States)

Nieuwejaar, Nils; Kotz, David

1996-01-01

As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance I/O to applications that access data in patterns that have been observed to be common.
Unified Singularity Modeling and Reconfiguration of 3rTPS Metamorphic Parallel Mechanisms with Parallel Constraint Screws

Directory of Open Access Journals (Sweden)

Yufeng Zhuang

2015-01-01

This paper presents a unified singularity modeling and reconfiguration analysis of variable topologies of a class of metamorphic parallel mechanisms with parallel constraint screws. The new parallel mechanisms consist of three reconfigurable rTPS limbs that have two working phases stemming from the reconfigurable Hooke (rT) joint. While one phase has full mobility, the other supplies a constraint force to the platform. Based on these, the platform constraint screw systems show that the new metamorphic parallel mechanisms have four topologies by altering the limb phases with mobility change among 1R2T (one rotation with two translations), 2R2T, and 3R2T and mobility 6. Geometric conditions of the mechanism design are investigated with some special topologies illustrated considering the limb arrangement. Following this and the actuation scheme analysis, a unified Jacobian matrix is formed using screw theory to include the change between geometric constraints and actuation constraints in the topology reconfiguration. Various singular configurations are identified by analyzing screw dependency in the Jacobian matrix. The work in this paper provides a basis for singularity-free workspace analysis and optimal design of the class of metamorphic parallel mechanisms with parallel constraint screws, which shows simple geometric constraints with potentially simple kinematics and dynamics properties.
Automated Long-Term Monitoring of Parallel Microfluidic Operations Applying a Machine Vision-Assisted Positioning Method

Science.gov (United States)

Yip, Hon Ming; Li, John C. S.; Cui, Xin; Gao, Qiannan; Leung, Chi Chiu

2014-01-01

As microfluidics has been applied extensively in many cell and biochemical applications, monitoring the related processes is an important requirement. In this work, we design and fabricate a high-throughput microfluidic device which contains 32 microchambers to perform automated parallel microfluidic operations and monitoring on an automated stage of a microscope. Images are captured at multiple spots on the device during the operations for monitoring samples in microchambers in parallel; yet the device positions may vary at different time points throughout operations as the device moves back and forth on a motorized microscopic stage. Here, we report an image-based positioning strategy to realign the chamber position before every recording of microscopic image. We fabricate alignment marks at defined locations next to the chambers in the microfluidic device as reference positions. We also develop image processing algorithms to recognize the chamber positions in real-time, followed by realigning the chambers to their preset positions in the captured images. We perform experiments to validate and characterize the device functionality and the automated realignment operation. Together, this microfluidic realignment strategy can be a platform technology to achieve precise positioning of multiple chambers for general microfluidic applications requiring long-term parallel monitoring of cell and biochemical activities. PMID:25133248

Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime

Science.gov (United States)

Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

2012-01-01

We present ℓ1-SPIRiT, a simple algorithm for autocalibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
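The soft-thresholding step with cross-channel joint sparsity can be sketched as a group shrinkage. The code below is not the authors' implementation: it assumes the common form in which, at each wavelet position, the vector of coefficients across channels is shrunk by its Euclidean norm. The function name, the nested-list data layout, and the threshold value are hypothetical.

```python
import math

def joint_soft_threshold(coeffs, lam):
    """Shrink coeffs[c][i] (channel c, position i) jointly across channels.

    For each position i the cross-channel norm is reduced by lam; positions
    whose norm is below lam are set to zero in every channel.
    """
    num_channels = len(coeffs)
    num_coeffs = len(coeffs[0])
    out = [[0j] * num_coeffs for _ in range(num_channels)]
    for i in range(num_coeffs):
        norm = math.sqrt(sum(abs(coeffs[c][i]) ** 2 for c in range(num_channels)))
        scale = max(norm - lam, 0.0) / norm if norm > 0 else 0.0
        for c in range(num_channels):
            out[c][i] = scale * coeffs[c][i]
    return out

# Two channels, two wavelet positions: the first is strong, the second is weak.
coeffs = [[3 + 0j, 0.1 + 0j],
          [4j, 0.1j]]
shrunk = joint_soft_threshold(coeffs, lam=1.0)
print(shrunk)
```

Because each position is processed independently, the loop over `i` is the natural place to parallelize, whether across CPU threads or GPU threads.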
Machine Learning and Parallelism in the Reconstruction of LHCb and its Upgrade

Science.gov (United States)

De Cian, Michel

2016-11-01

The LHCb detector at the LHC is a general purpose detector in the forward region with a focus on reconstructing decays of c- and b-hadrons. For Run II of the LHC, a new trigger strategy with a real-time reconstruction, alignment and calibration was employed. This was made possible by implementing an offline-like track reconstruction in the high level trigger. However, the ever increasing need for a higher throughput and the move to parallelism in CPU architectures in recent years necessitated the use of vectorization techniques to achieve the desired speed and a more extensive use of machine learning to veto bad events early on. This document discusses selected improvements in computationally expensive parts of the track reconstruction, like the Kalman filter, as well as an improved approach to get rid of fake tracks using fast machine learning techniques. In the last part, a short overview of the track reconstruction challenges for the upgrade of LHCb is given. Running a fully software-based trigger, a large gain in speed in the reconstruction has to be achieved to cope with the 40 MHz bunch-crossing rate. Two possible approaches for techniques exploiting massive parallelization are discussed.
Parallel and non-parallel laminar mixed convection flow in an inclined tube: The effect of the boundary conditions

International Nuclear Information System (INIS)

Barletta, A.

2008-01-01

The necessary condition for the onset of parallel flow in the fully developed region of an inclined duct is applied to the case of a circular tube. Parallel flow in inclined ducts is an uncommon regime, since in most cases buoyancy tends to produce the onset of secondary flow. The present study shows how proper thermal boundary conditions may preserve the parallel flow regime. Mixed convection flow is studied for a special non-axisymmetric thermal boundary condition that, with a proper choice of a switch parameter, may be compatible with parallel flow. More precisely, a circumferentially variable heat flux distribution is prescribed on the tube wall, expressed as a sinusoidal function of the azimuthal coordinate θ with period 2π. A π/2 rotation in the position of the maximum heat flux, achieved by setting the switch parameter, may or may not allow the existence of parallel flow. Two cases are considered, corresponding to parallel and non-parallel flow. In the first case, the governing balance equations allow a simple analytical solution. In the second case, by contrast, the local balance equations are solved numerically by employing a finite element method.
Parallel programming with Easy Java Simulations

Science.gov (United States)

Esquembre, F.; Christian, W.; Belloni, M.

2018-01-01

Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we lower the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
Parallelism and Scalability in an Image Processing Application

DEFF Research Database (Denmark)

Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

2008-01-01

The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chip. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately...
Parallelism and Scalability in an Image Processing Application

DEFF Research Database (Denmark)

Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

2009-01-01

The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately...
Parallel auto-correlative statistics with VTK.

Energy Technology Data Exchange (ETDEWEB)

Pebay, Philippe Pierre; Bennett, Janine Camille

2013-08-01

This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10] which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by the means of C++ code snippets and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the autocorrelative statistics engine.
Conformal pure radiation with parallel rays

International Nuclear Information System (INIS)

Leistner, Thomas; Nurowski, Paweł

2012-01-01

We define pure radiation metrics with parallel rays to be n-dimensional pseudo-Riemannian metrics that admit a parallel null line bundle K and whose Ricci tensor vanishes on vectors that are orthogonal to K. We give necessary conditions in terms of the Weyl, Cotton and Bach tensors for a pseudo-Riemannian metric to be conformal to a pure radiation metric with parallel rays. Then, we derive conditions in terms of the tractor calculus that are equivalent to the existence of a pure radiation metric with parallel rays in a conformal class. We also give analogous results for n-dimensional pseudo-Riemannian pp-waves. (paper)
Parallel plasma fluid turbulence calculations

International Nuclear Information System (INIS)

Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.

1994-01-01

The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated
A task parallel implementation of fast multipole methods

KAUST Repository

Taura, Kenjiro

2012-11-01

This paper describes a task parallel implementation of ExaFMM, an open source implementation of fast multipole methods (FMM), using a lightweight task parallel library MassiveThreads. Although there have been many attempts at parallelizing FMM, experiences have almost exclusively been limited to formulation based on flat homogeneous parallel loops. FMM in fact contains operations that cannot be readily expressed in such conventional but restrictive models. We show that task parallelism, or parallel recursions in particular, allows us to parallelize all operations of FMM naturally and scalably. Moreover it allows us to parallelize a "mutual interaction" for force/potential evaluation, which is roughly twice as efficient as a more conventional, unidirectional force/potential evaluation. The net result is an open source FMM that is clearly among the fastest single node implementations, including those on GPUs; with a million particles on a 32-core Sandy Bridge 2.20 GHz node, it completes a single time step including tree construction and force/potential evaluation in 65 milliseconds. The study clearly showcases both programmability and performance benefits of flexible parallel constructs over more monolithic parallel loops. © 2012 IEEE.
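The factor of roughly two attributed to mutual interaction can be illustrated with a direct pairwise sum, independent of FMM itself. The sketch below is not ExaFMM code: the function names `potential_one_way` and `potential_mutual`, the unit charges, and the 1/r kernel are assumptions made only for illustration.

```python
import math

def potential_one_way(points):
    """Unidirectional evaluation: every target loops over every source."""
    n = len(points)
    phi = [0.0] * n
    kernel_calls = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                phi[i] += 1.0 / math.dist(points[i], points[j])
                kernel_calls += 1
    return phi, kernel_calls

def potential_mutual(points):
    """Mutual evaluation: each pair is visited once and updates both ends."""
    n = len(points)
    phi = [0.0] * n
    kernel_calls = 0
    for i in range(n):
        for j in range(i + 1, n):
            inv_r = 1.0 / math.dist(points[i], points[j])
            phi[i] += inv_r   # contribution of j at i
            phi[j] += inv_r   # same value reused for the contribution of i at j
            kernel_calls += 1
    return phi, kernel_calls

# Four hypothetical particle positions with unit charge.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
phi_a, calls_a = potential_one_way(points)
phi_b, calls_b = potential_mutual(points)
print(calls_a, calls_b)
```

The mutual form halves the number of kernel evaluations but writes to two accumulators per pair, so concurrent workers must not update the same entries at the same time. That write conflict is what makes it awkward to express as a flat parallel loop.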
Second derivative parallel block backward differentiation type ...

African Journals Online (AJOL)

Second derivative parallel block backward differentiation type formulas for Stiff ODEs. ... and the methods are inherently parallel and can be distributed over parallel processors. They are ...
Parallelization of quantum molecular dynamics simulation code

International Nuclear Information System (INIS)

Kato, Kaori; Kunugi, Tomoaki; Shibahara, Masahiko; Kotake, Susumu

1998-02-01

A quantum molecular dynamics simulation code has been developed at the Kansai Research Establishment for the analysis of the thermalization of photon energies in molecules or materials. The simulation code is parallelized for both a scalar massively parallel computer (Intel Paragon XP/S75) and a vector parallel computer (Fujitsu VPP300/12). Scalable speed-up has been obtained on both parallel computers by distributing particle groups across processor units. By distributing not only the particle groups but also the fine-grained calculations that make up each particle's computation, high parallelization performance is achieved on the Intel Paragon XP/S75. (author)
A Parallel Approach to Fractal Image Compression

OpenAIRE

Lubomir Dedera

2004-01-01

The paper deals with a parallel approach to coding and decoding algorithms in fractal image compression and presents experimental results comparing sequential and parallel algorithms in terms of both the coding and decoding times achieved and the effectiveness of parallelization.
Cache-aware data structure model for parallelism and dynamic load balancing

International Nuclear Information System (INIS)

Sridi, Marwa

2016-01-01

This PhD thesis is dedicated to the implementation of innovative parallel methods in the framework of fast transient fluid-structure dynamics. It improves existing methods within the EUROPLEXUS software in order to optimize the shared memory parallel strategy, complementary to the original distributed memory approach, with both brought together into a global hybrid strategy for clusters of multi-core nodes. Starting from a sound analysis of the state of the art concerning data structuring techniques correlated to the hierarchical memory organization of current multi-processor architectures, the proposed work introduces an approach suitable for explicit time integration (i.e. with no linear system to solve at each step). A data structure of type 'Structure of arrays' is retained for the global data storage, providing flexibility and efficiency for common operations on kinematic fields (displacement, velocity and acceleration). By contrast, in the particular case of elementary operations (generic internal force computations, as well as flux computations between cell faces for fluid models), which are particularly time consuming but localized in the program, a temporary data structure of type 'Array of structures' is used instead, to force an efficient filling of the cache memory and increase performance for both serial and shared memory parallel processing. Switching from the global structure to the temporary one is based on a cell grouping strategy, following classic cache-blocking principles but handling, specifically for this work, the neighboring data necessary for the efficient treatment of ALE fluxes for cells on the group boundaries. The proposed approach is extensively tested, from the points of view of both the computation time and the cache misses, weighing the gains obtained within the elementary operations against the potential overhead generated by the data structure switch. Obtained results are very satisfactory, especially
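The switch between the two layouts can be pictured as a gather, compute, scatter loop over cell groups. The following is a minimal sketch under stated assumptions, not EUROPLEXUS code: the field values, the group size, the helper names and the toy update kernel are all hypothetical, and the neighboring data needed for ALE fluxes is left out. Python lists do not reproduce the cache behavior, so only the data movement is shown.

```python
# Global storage, structure-of-arrays style: one list per kinematic field.
# The values are illustrative assumptions.
soa = {
    "displacement": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "velocity":     [1.0, 1.1, 1.2, 1.3, 1.4, 1.5],
    "acceleration": [9.0, 9.1, 9.2, 9.3, 9.4, 9.5],
}

def cell_groups(n_cells, group_size):
    """Split cell indices into consecutive groups (a stand-in for cache blocks)."""
    return [range(start, min(start + group_size, n_cells))
            for start in range(0, n_cells, group_size)]

def gather_group(soa, group):
    """Copy one group into a temporary array-of-structures buffer."""
    return [(soa["displacement"][i], soa["velocity"][i], soa["acceleration"][i])
            for i in group]

def elementary_update(aos, dt):
    """Toy per-cell kernel: advance velocity, then displacement, by one explicit step."""
    out = []
    for d, v, a in aos:
        v_new = v + dt * a
        out.append((d + dt * v_new, v_new, a))
    return out

def scatter_group(soa, group, aos):
    """Write the temporary buffer back into the global arrays."""
    for i, (d, v, a) in zip(group, aos):
        soa["displacement"][i] = d
        soa["velocity"][i] = v
        soa["acceleration"][i] = a

dt = 0.5
for group in cell_groups(n_cells=6, group_size=4):
    local = gather_group(soa, group)          # SoA -> temporary AoS
    scatter_group(soa, group, elementary_update(local, dt))
```

The gather and scatter copies are the overhead that the thesis weighs against the gain inside the elementary operations; in a compiled language the groups would be sized to fit a cache level.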
Differences Between Distributed and Parallel Systems

Energy Technology Data Exchange (ETDEWEB)

Brightwell, R.; Maccabe, A.B.; Riesen, R.

1998-10-01

Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.
A survey of parallel multigrid algorithms

Science.gov (United States)

Chan, Tony F.; Tuminaro, Ray S.

1987-01-01

A typical multigrid algorithm applied to well-behaved linear-elliptic partial-differential equations (PDEs) is described. Criteria for designing and evaluating parallel algorithms are presented. Before evaluating the performance of some parallel multigrid algorithms, consideration is given to some theoretical complexity results for solving PDEs in parallel and for executing the multigrid algorithm. The effect of mapping and load imbalance on the parallel efficiency of the algorithm is studied.
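For readers unfamiliar with the method, a minimal multigrid V-cycle for the 1-D model problem -u'' = f with zero boundary values is sketched below. It is not taken from the survey: the weighted-Jacobi smoother, full-weighting restriction, linear interpolation, grid size and all function names are assumptions chosen for brevity. Jacobi is used because each point is updated from the previous iterate only, which is the property that makes the smoothing step easy to run in parallel.

```python
def residual(u, f, h):
    """r = f - A u for the operator (A u)[i] = (-u[i-1] + 2 u[i] - u[i+1]) / h^2."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r

def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi: all points are updated independently from the old iterate."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [u[i] + omega * (h * h / 2.0) * r[i] for i in range(len(u))]
    return u

def restrict(r):
    """Full weighting from 2m+1 fine points to m coarse points."""
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range((len(r) - 1) // 2)]

def prolong(e, n_fine):
    """Linear interpolation from the coarse grid back to the fine grid."""
    fine = [0.0] * n_fine
    for j, v in enumerate(e):
        fine[2 * j] += 0.5 * v
        fine[2 * j + 1] += v
        fine[2 * j + 2] += 0.5 * v
    return fine

def v_cycle(u, f, h):
    """One V-cycle; the number of interior points must be 2**k - 1."""
    n = len(u)
    if n == 1:
        return [f[0] * h * h / 2.0]            # exact solve on the coarsest grid
    u = smooth(u, f, h, sweeps=2)              # pre-smoothing
    coarse_rhs = restrict(residual(u, f, h))
    e = v_cycle([0.0] * len(coarse_rhs), coarse_rhs, 2.0 * h)
    correction = prolong(e, n)
    u = [u[i] + correction[i] for i in range(n)]
    return smooth(u, f, h, sweeps=2)           # post-smoothing

n = 31
h = 1.0 / (n + 1)
f = [1.0] * n
u = [0.0] * n
norms = [max(abs(x) for x in residual(u, f, h))]
for _ in range(8):
    u = v_cycle(u, f, h)
    norms.append(max(abs(x) for x in residual(u, f, h)))
```

On the coarse levels there are few points per processor, which is one place where mapping and load imbalance start to matter in a parallel implementation.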
Parallel computing by Monte Carlo codes MVP/GMVP

International Nuclear Information System (INIS)

Nagaya, Yasunobu; Nakagawa, Masayuki; Mori, Takamasa

2001-01-01

The general-purpose Monte Carlo codes MVP/GMVP are well vectorized and thus enable us to perform high-speed Monte Carlo calculations. In order to achieve further speedups, we parallelized the codes on different types of parallel computing platforms, or by using the standard parallelization library MPI. The platforms used for benchmark calculations are a distributed-memory vector-parallel computer, the Fujitsu VPP500; a distributed-memory massively parallel computer, the Intel Paragon; and distributed-memory scalar-parallel computers, the Hitachi SR2201 and IBM SP2. As is generally reported, linear speedup could be obtained for large-scale problems, but parallelization efficiency decreased as the batch size per processing element (PE) became smaller. It was also found that the statistical uncertainty for assembly powers was less than 0.1% in a PWR full-core calculation with more than 10 million histories, which took about 1.5 hours by massively parallel computing. (author)
The parallel processing of EGS4 code on distributed memory scalar parallel computer: Intel Paragon XP/S15-256

Energy Technology Data Exchange (ETDEWEB)

Takemiya, Hiroshi; Ohta, Hirofumi; Honma, Ichirou

1996-03-01

The parallelization of the Electro-Magnetic Cascade Monte Carlo Simulation Code EGS4 on a distributed memory scalar parallel computer, the Intel Paragon XP/S15-256, is described. In EGS4 the calculation time differs considerably from one incident particle to another because of the dynamic generation of secondary particles and the different behavior of each particle. Granularity for parallel processing, the parallel programming model and the algorithm for parallel random number generation are discussed, and two methods, which allocate particles dynamically or statically, are used to realize high speed parallel processing of this code. Among four problems chosen for performance evaluation, the speedup factors for three problems reached nearly 100 with 128 processors. It has been found that when both the calculation time for each incident particle and its dispersion are large, it is preferable to use the dynamic particle allocation method, which can average the load for each processor. It has also been found that when they are small, it is preferable to use the static particle allocation method, which reduces the communication overhead. Moreover, it is pointed out that to get accurate results, it is necessary to use double precision variables in the EGS4 code. Finally, the workflow of program parallelization is analyzed and tools for program parallelization are discussed in light of the experience of the EGS4 parallelization. (author).
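The trade-off between the two allocation methods can be reproduced with a toy scheduling model. This is a sketch under stated assumptions rather than the EGS4 implementation: the per-particle costs, the fixed per-hand-out `overhead`, and both function names are hypothetical.

```python
import heapq

def static_makespan(costs, n_proc):
    """Static allocation: particle k goes to processor k mod n_proc, decided up front."""
    load = [0.0] * n_proc
    for k, c in enumerate(costs):
        load[k % n_proc] += c
    return max(load)

def dynamic_makespan(costs, n_proc, overhead):
    """Dynamic allocation: the next particle goes to whichever processor frees up first.

    `overhead` is an assumed fixed communication cost paid on every hand-out.
    """
    free_at = [0.0] * n_proc      # min-heap of times at which each processor is idle
    heapq.heapify(free_at)
    for c in costs:
        t = heapq.heappop(free_at)
        heapq.heappush(free_at, t + overhead + c)
    return max(free_at)

uneven = [8.0, 1.0, 1.0, 1.0, 8.0, 1.0, 1.0, 1.0]   # large dispersion
uniform = [1.0] * 8                                  # no dispersion

print(static_makespan(uneven, 4), dynamic_makespan(uneven, 4, overhead=0.1))
print(static_makespan(uniform, 4), dynamic_makespan(uniform, 4, overhead=0.1))
```

With the uneven costs, static allocation puts both expensive particles on the same processor and dynamic allocation finishes sooner; with uniform costs the hand-out overhead makes dynamic allocation slightly slower, which matches the qualitative finding in the abstract.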
Towards a streaming model for nested data parallelism

DEFF Research Database (Denmark)

Madsen, Frederik Meisner; Filinski, Andrzej

2013-01-01

The language-integrated cost semantics for nested data parallelism pioneered by NESL provides an intuitive, high-level model for predicting performance and scalability of parallel algorithms with reasonable accuracy. However, this predictability, obtained through a uniform, parallelism-flattening... -processable in a streaming fashion. This semantics is directly compatible with previously proposed piecewise execution models for nested data parallelism, but allows the expected space usage to be reasoned about directly at the source-language level. The language definition and implementation are still very much work...

Massively Parallel Computing: A Sandia Perspective

Energy Technology Data Exchange (ETDEWEB)

Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.

1999-05-06

The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant breakthroughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large-scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.
Parallel Algorithms for the Exascale Era

Energy Technology Data Exchange (ETDEWEB)

Robey, Robert W. [Los Alamos National Laboratory]

2016-10-19

New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this work has been done by undergraduates and published in leading scientific journals.
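Why global sums can fail to be reproducible is easy to demonstrate: floating-point addition is not associative, so a different reduction order across processors can change the result. The sketch below is an illustration only; the four values are contrived, `naive_sum` is a hypothetical helper, and Python's `math.fsum` merely stands in for an order-independent summation, not for the specific method developed at LANL.

```python
import itertools
import math

def naive_sum(xs):
    """Plain left-to-right accumulation, as a simple reduction loop would do."""
    total = 0.0
    for x in xs:
        total += x
    return total

values = [1e16, 1.0, -1e16, 1.0]   # the exact sum is 2.0

# Each permutation stands in for a different reduction order across processors.
naive_results = {naive_sum(p) for p in itertools.permutations(values)}
exact_results = {math.fsum(p) for p in itertools.permutations(values)}

print(sorted(naive_results))   # more than one distinct answer
print(sorted(exact_results))   # a single answer
```

The naive loop returns several different totals for the same data, while the correctly rounded sum is identical for every ordering, which is the property a reproducible parallel reduction needs.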
A Parallel Approach to Fractal Image Compression

Directory of Open Access Journals (Sweden)

Lubomir Dedera

2004-01-01

The paper deals with a parallel approach to coding and decoding algorithms in fractal image compression and presents experimental results comparing sequential and parallel algorithms in terms of both the coding and decoding times achieved and the effectiveness of parallelization.
A parallelization study of the general purpose Monte Carlo code MCNP4 on a distributed memory highly parallel computer

International Nuclear Information System (INIS)

Yamazaki, Takao; Fujisaki, Masahide; Okuda, Motoi; Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka

1993-01-01

The general purpose Monte Carlo code MCNP4 has been implemented on the Fujitsu AP1000 distributed memory highly parallel computer. Parallelization techniques developed and studied are reported. A shielding analysis function of the MCNP4 code is parallelized in this study. A technique was applied that maps histories to processors dynamically and maps the control process to a dedicated processor. The efficiency of the parallelized code is up to 80% for a typical practical problem with 512 processors. These results demonstrate the advantages of a highly parallel computer over conventional computers in the field of shielding analysis by the Monte Carlo method. (orig.)
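As a rough reading of the quoted figure, assume that efficiency is defined in the usual way as speedup divided by processor count (the abstract does not state its definition). With S the speedup, p the number of processors and E the efficiency, symbols introduced here for illustration:

```latex
E = \frac{S}{p}
\quad\Longrightarrow\quad
S = E\,p \le 0.80 \times 512 = 409.6
```

Under that assumption, an efficiency of up to 80% on 512 processors corresponds to a speedup of up to about 410 over a single processor.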
Applications of the parallel computing system using network

International Nuclear Information System (INIS)

Ido, Shunji; Hasebe, Hiroki

1994-01-01

Parallel programming is applied to multiple processors connected by Ethernet. Data exchange between tasks located on each processing element is realized in two ways. One is sockets, a standard library facility on recent UNIX operating systems. The other is Parallel Virtual Machine (PVM), free network software developed by ORNL that lets many workstations connected to a network be used as a parallel computer. This paper discusses the availability of parallel computing using a network of UNIX workstations and compares it with specialized parallel systems (Transputer and iPSC/860) in a Monte Carlo simulation, which generally shows a high parallelization ratio. (author)
Introduction of Parallel GPGPU Acceleration Algorithms for the Solution of Radiative Transfer

Science.gov (United States)

Godoy, William F.; Liu, Xu

2011-01-01

General-purpose computing on graphics processing units (GPGPU) is a recent technique that allows the parallel graphics processing unit (GPU) to accelerate calculations performed sequentially by the central processing unit (CPU). To introduce GPGPU to radiative transfer, the Gauss-Seidel solution of the well-known expressions for 1-D and 3-D homogeneous, isotropic media is selected as a test case. Different algorithms are introduced to balance memory and GPU-CPU communication, critical aspects of GPGPU. Results show that speed-ups of one to two orders of magnitude are obtained when compared to sequential solutions. The underlying value of GPGPU is its potential extension in radiative solvers (e.g., Monte Carlo, discrete ordinates) at a minimal learning curve.
Parallel CFD Algorithms for Aerodynamical Flow Solvers on Unstructured Meshes. Parts 1 and 2

Science.gov (United States)

Barth, Timothy J.; Kwak, Dochan (Technical Monitor)

1995-01-01

The Advisory Group for Aerospace Research and Development (AGARD) has requested my participation in the lecture series entitled Parallel Computing in Computational Fluid Dynamics to be held at the von Karman Institute in Brussels, Belgium on May 15-19, 1995. In addition, a request has been made from the US Coordinator for AGARD at the Pentagon for NASA Ames to hold a repetition of the lecture series on October 16-20, 1995. I have been asked to be a local coordinator for the Ames event. All AGARD lecture series events have attendance limited to NATO allied countries. A brief of the lecture series is provided in the attached enclosure. Specifically, I have been asked to give two lectures of approximately 75 minutes each on the subject of parallel solution techniques for the fluid flow equations on unstructured meshes. The title of my lectures is "Parallel CFD Algorithms for Aerodynamical Flow Solvers on Unstructured Meshes" (Parts I-II). The contents of these lectures will be largely review in nature and will draw upon previously published work in this area. Topics of my lectures will include: (1) Mesh partitioning algorithms. Recursive techniques based on coordinate bisection, Cuthill-McKee level structures, and spectral bisection. (2) Newton's method for large scale CFD problems. Size and complexity estimates for Newton's method, modifications for ensuring global convergence. (3) Techniques for constructing the Jacobian matrix. Analytic and numerical techniques for Jacobian matrix-vector products, constructing the transposed matrix, extensions to optimization and homotopy theories. (4) Iterative solution algorithms. Practical experience with GMRES and BICG-STAB matrix solvers. (5) Parallel matrix preconditioning. Incomplete Lower-Upper (ILU) factorization, domain-decomposed ILU, approximate Schur complement strategies.
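Of the partitioning techniques listed under topic (1), recursive coordinate bisection is the simplest to show in a few lines. The sketch below is a generic illustration, not material from the lectures: the function name, the longest-extent splitting rule and the regular grid standing in for an unstructured mesh are all assumptions.

```python
def recursive_coordinate_bisection(points, depth):
    """Split a list of (x, y) points into 2**depth parts of near-equal size."""
    if depth == 0 or len(points) <= 1:
        return [points]
    # Bisect along the coordinate axis with the larger extent.
    extents = [max(p[a] for p in points) - min(p[a] for p in points) for a in (0, 1)]
    axis = 0 if extents[0] >= extents[1] else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (recursive_coordinate_bisection(ordered[:mid], depth - 1)
            + recursive_coordinate_bisection(ordered[mid:], depth - 1))

# An 8 x 4 grid of vertex coordinates stands in for an unstructured mesh.
mesh_points = [(float(i), float(j)) for i in range(8) for j in range(4)]
parts = recursive_coordinate_bisection(mesh_points, depth=2)
print([len(part) for part in parts])
```

Each part would be assigned to one processor. The method balances the vertex count but ignores mesh connectivity, which is the kind of limitation that motivates connectivity-aware alternatives such as the spectral bisection named in the same list.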
Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes

Directory of Open Access Journals (Sweden)

Cronn, Richard

2009-12-01

Background: Molecular evolutionary studies share the common goal of elucidating historical relationships, and the common challenge of adequately sampling taxa and characters. Particularly at low taxonomic levels, recent divergence, rapid radiations, and conservative genome evolution yield limited sequence variation, and dense taxon sampling is often desirable. Recent advances in massively parallel sequencing make it possible to rapidly obtain large amounts of sequence data, and multiplexing makes extensive sampling of megabase sequences feasible. Is it possible to efficiently apply massively parallel sequencing to increase phylogenetic resolution at low taxonomic levels? Results: We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome) generated using multiplexed massively parallel sequencing. 30/33 ingroup nodes resolved with ≥ 95% bootstrap support; this is a substantial improvement relative to prior studies, and shows massively parallel sequencing-based strategies can produce sufficient high quality sequence to reach support levels originally proposed for the phylogenetic bootstrap. Resampling simulations show that at least the entire plastome is necessary to fully resolve Pinus, particularly in rapidly radiating clades. Meta-analysis of 99 published infrageneric phylogenies shows that whole plastome analysis should provide similar gains across a range of plant genera. A disproportionate amount of phylogenetic information resides in two loci (ycf1, ycf2), highlighting their unusual evolutionary properties. Conclusion: Plastome sequencing is now an efficient option for increasing phylogenetic resolution at lower taxonomic levels in plant phylogenetic and population genetic analyses. With continuing improvements in sequencing capacity, the strategies herein should revolutionize efforts requiring dense taxon and character sampling.
Balanced, parallel operation of flashlamps

International Nuclear Information System (INIS)

Carder, B.M.; Merritt, B.T.

1979-01-01

A new energy store, the Compensated Pulsed Alternator (CPA), promises to be a cost effective substitute for capacitors to drive flashlamps that pump large Nd:glass lasers. Because the CPA is large and discrete, it will be necessary that it drive many parallel flashlamp circuits, presenting a problem in equal current distribution. Current division to ±20% between parallel flashlamps has been achieved, but this is marginal for laser pumping. A method is presented here that provides equal current sharing to about 1%, and it includes fused protection against short circuit faults. The method was tested with eight parallel circuits, including both open-circuit and short-circuit fault tests.
Bayer image parallel decoding based on GPU

Science.gov (United States)

Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

2012-11-01

In the photoelectrical tracking system, the Bayer image is traditionally decompressed by a CPU-based method. However, this is too slow when the images become large, for example 2K×2K×16bit. In order to accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA's Graphics Processing Unit (GPU), which supports the CUDA architecture. The decoding procedure can be divided into three parts: the first is the serial part, the second is the task-parallelism part, and the last is the data-parallelism part, including inverse quantization, inverse discrete wavelet transform (IDWT) and image post-processing. To reduce the execution time, the task-parallelism part is optimized with OpenMP techniques. The data-parallelism part gains efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization and texture memory optimization. In particular, the IDWT can be sped up significantly by rewriting the 2D (two-dimensional) serial IDWT into a 1D parallel IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallelism part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared to the CPU serial method.
Refinement of Parallel and Reactive Programs

OpenAIRE

Back, R. J. R.

1992-01-01

We show how to apply the refinement calculus to stepwise refinement of parallel and reactive programs. We use action systems as our basic program model. Action systems are sequential programs which can be implemented in a parallel fashion. Hence refinement calculus methods, originally developed for sequential programs, carry over to the derivation of parallel programs. Refinement of reactive programs is handled by data refinement techniques originally developed for the sequential refinement c...
Portable parallel programming in a Fortran environment

International Nuclear Information System (INIS)

May, E.N.

1989-01-01

Experience using the Argonne-developed PARMACs macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a "nearly realistic" lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs
Structured Parallel Programming Patterns for Efficient Computation

CERN Document Server

McCool, Michael; Robison, Arch

2012-01-01

Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th
Incorporation of parallel electrospun fibers for improved topographical guidance in 3D nerve guides

International Nuclear Information System (INIS)

Jeffries, Eric M.; Wang, Yadong

2013-01-01

Three dimensional (3D) conduits facilitate nerve regeneration. Parallel microfibers have been shown to guide axon extension and Schwann cell migration on flat sheets via topographical cues. However, incorporation of aligned microfibers into 3D conduits to accelerate nerve regeneration has proven challenging. We report an electrospinning technique to incorporate parallel microfibers into 3D constructs at high surface areas while retaining an open architecture. The nerve guide consists of many microchannels lined with a thin layer of longitudinally-aligned microfibers. This design aims to maximize benefits of topographical cues without inhibiting cellular infiltration. We support this hypothesis by demonstrating efficient cell infiltration in vitro. Additionally, this new technique reduces wall thickness compared to our previous design, providing a greater total area for tissue growth. This approach results in an architecture that very closely mimics the structure of decellularized nerve but with larger microchannel diameters to encourage cell infiltration. We believe that reproducing the native architecture is the first step toward matching autograft efficacy. Furthermore, this design can be combined with other biochemical cues to promote nerve regeneration. (paper)
A Tutorial on Parallel and Concurrent Programming in Haskell

Science.gov (United States)

Peyton Jones, Simon; Singh, Satnam

This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.
Parallel Computing Using Web Servers and "Servlets".

Science.gov (United States)

Lo, Alfred; Bloor, Chris; Choi, Y. K.

2000-01-01

Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…
Current distribution characteristics of superconducting parallel circuits

International Nuclear Information System (INIS)

Mori, K.; Suzuki, Y.; Hara, N.; Kitamura, M.; Tominaka, T.

1994-01-01

To increase the current-carrying capacity of the current path in a superconducting magnet system, parallel circuits such as insulated multi-strand cables or parallel persistent current switches (PCS) are used. In superconducting parallel circuits of an insulated multi-strand cable or a parallel PCS, the current distribution during the current sweep, the persistent mode, and the quench process was investigated. Two methods were used to measure the current distribution. (1) Each strand was surrounded with a pure iron core with an air gap, in which a Hall probe was located. The accuracy of this method was degraded by the magnetic hysteresis of iron. (2) A Rogowski coil without iron was used to measure the current in each path of a 4-parallel PCS. The results showed that the current distribution characteristics of a parallel PCS are very similar to those of an insulated multi-strand cable during the quench process
Parallel hierarchical global illumination

Energy Technology Data Exchange (ETDEWEB)

Snell, Quinn O. [Iowa State Univ., Ames, IA (United States)

1997-10-08

Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.
6th International Parallel Tools Workshop

CERN Document Server

Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang

2013-01-01

The latest advances in the High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have caused an increasing complexity of the parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and tuning work required. This process is supported by special software tools, facilitating debugging, performance analysis, and optimization and thus making a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools, which were presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.
Random-phase approximation and its extension for the O(2) anharmonic oscillator

International Nuclear Information System (INIS)

Aouissat, Z.; Martin, C.

2004-01-01

We apply the random-phase approximation (RPA) and its extension called renormalized RPA to the quantum anharmonic oscillator with an O(2) symmetry. We first obtain the equation for the RPA frequencies in the standard and in the renormalized RPAs using the equation-of-motion method. In the case where the ground state has a broken symmetry, we check the existence of a zero frequency in the standard and in the renormalized RPAs. Then we use a time-dependent approach where the standard-RPA frequencies are obtained as small oscillations around the static solution in the time-dependent Hartree-Bogolyubov equation. We draw the parallel between the two approaches. (orig.)

Random-phase approximation and its extension for the O(2) anharmonic oscillator

Energy Technology Data Exchange (ETDEWEB)

Aouissat, Z. [Institut fuer Kernphysik, Technische Hochschule Darmstadt, Schlossgarten 9, D-64289, Darmstadt (Germany); Martin, C. [Groupe de Physique Theorique, Institut de Physique Nucleaire, F-91406, Orsay Cedex (France)

2004-02-01

We apply the random-phase approximation (RPA) and its extension called renormalized RPA to the quantum anharmonic oscillator with an O(2) symmetry. We first obtain the equation for the RPA frequencies in the standard and in the renormalized RPAs using the equation-of-motion method. In the case where the ground state has a broken symmetry, we check the existence of a zero frequency in the standard and in the renormalized RPAs. Then we use a time-dependent approach where the standard-RPA frequencies are obtained as small oscillations around the static solution in the time-dependent Hartree-Bogolyubov equation. We draw the parallel between the two approaches. (orig.)
Angular parallelization of a curvilinear Sn transport theory method

International Nuclear Information System (INIS)

Haghighat, A.

1991-01-01

In this paper a parallel algorithm for angular domain decomposition (or parallelization) of an r-dependent spherical Sn transport theory method is derived. The parallel formulation is incorporated into TWOTRAN-II using the IBM Parallel Fortran compiler and implemented on an IBM 3090/400 (with four processors). The behavior of the parallel algorithm for different physical problems is studied, and it is concluded that the parallel algorithm behaves differently in the presence of a fission source as opposed to the absence of a fission source; this is attributed to the relative contributions of the source and the angular redistribution terms in the Sn algorithm. Further, the parallel performance of the algorithm is measured for various problem sizes and different combinations of angular subdomains or processors. Poor parallel efficiencies between ∼35 and 50% are achieved in situations where the relative difference of parallel to serial iterations is ∼50%. High parallel efficiencies between ∼60% and 90% are obtained in situations where the relative difference of parallel to serial iterations is <35%
Combining Compile-Time and Run-Time Parallelization

Directory of Open Access Journals (Sweden)

Sungdo Moon

1999-01-01

This paper demonstrates that significant improvements to automatic parallelization technology require that existing systems be extended in two ways: (1) they must combine high-quality compile-time analysis with low-cost run-time testing; and (2) they must take control flow into account during analysis. We support this claim with the results of an experiment that measures the safety of parallelization at run time for loops left unparallelized by the Stanford SUIF compiler's automatic parallelization system. We present results of measurements on programs from two benchmark suites (SPECFP95 and NAS sample benchmarks) which identify inherently parallel loops in these programs that are missed by the compiler. We characterize remaining parallelization opportunities, and find that most of the loops require run-time testing, analysis of control flow, or some combination of the two. We present a new compile-time analysis technique that can be used to parallelize most of these remaining loops. This technique is designed to not only improve the results of compile-time parallelization, but also to produce low-cost, directed run-time tests that allow the system to defer binding of parallelization until run-time when safety cannot be proven statically. We call this approach predicated array data-flow analysis. We augment array data-flow analysis, which the compiler uses to identify independent and privatizable arrays, by associating predicates with array data-flow values. Predicated array data-flow analysis allows the compiler to derive "optimistic" data-flow values guarded by predicates; these predicates can be used to derive a run-time test guaranteeing the safety of parallelization.
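The idea of a run-time test guarding a parallel loop can be illustrated with a small hand-written sketch. It is not the SUIF implementation and not the paper's analysis: the loop `a[i + k] = a[i] + 1`, the predicate, and the names `serial_loop`, `safe_to_parallelize` and `guarded_loop` are all assumptions chosen for illustration. For this particular loop (with k >= 0), iterations are independent exactly when k == 0 or k >= n, which cannot be decided at compile time if k is only known at run time.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_loop(a, n, k):
    """Reference loop: iteration i reads a[i] and writes a[i + k] (k >= 0)."""
    for i in range(n):
        a[i + k] = a[i] + 1


def safe_to_parallelize(n, k):
    """Hypothetical run-time predicate for this loop only.

    Iteration i writes a[i + k] and iteration j reads a[j]; the two touch the
    same element in different iterations only when 0 < k < n.
    """
    return k == 0 or k >= n


def guarded_loop(a, n, k, workers=4):
    """Run the loop in parallel if the predicate holds, else fall back to serial."""
    if safe_to_parallelize(n, k):
        def body(i):
            a[i + k] = a[i] + 1

        # Iterations are independent here, so any execution order is valid.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(body, range(n)))
        return "parallel"
    serial_loop(a, n, k)
    return "serial"


if __name__ == "__main__":
    n = 8
    for k in (0, 3, 8):
        data = list(range(n + k))
        print(k, guarded_loop(data, n, k), data)
```

The predicate is cheap (two comparisons) compared with the loop itself, which is the property the abstract calls a low-cost, directed run-time test. A real compiler would derive such a predicate from the array access pattern rather than have it written by hand.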
Parallelization characteristics of the DeCART code

International Nuclear Information System (INIS)

Cho, J. Y.; Joo, H. G.; Kim, H. Y.; Lee, C. C.; Chang, M. H.; Zee, S. Q.

2003-12-01

This report describes the parallelization characteristics of the DeCART code and examines its parallel performance. Parallel computing algorithms are implemented in DeCART to reduce the tremendous computational burden and memory requirement of the three-dimensional whole-core transport calculation. In the parallelization of the DeCART code, the axial domain decomposition is realized first using MPI (Message Passing Interface), followed by the azimuthal angle domain decomposition using either MPI or OpenMP. When MPI is used for both the axial and the angle domain decomposition, MPI grouping is employed for convenient communication within each communication world. Almost all computing modules except the thermal hydraulic module are parallelized. The parallelized modules include the MOC ray tracing, CMFD, NEM, region-wise cross section preparation, and cell homogenization modules. For distributed allocation, most MOC and CMFD/NEM variables are allocated only for the assigned planes, which reduces the required memory by the ratio of the number of assigned planes to the number of all planes. The parallel performance of the DeCART code is evaluated by solving two problems: a rodded variation of the C5G7 MOX three-dimensional benchmark problem and a simplified three-dimensional SMART PWR core problem. The DeCART code shows a good speedup of about 40.1 and 22.4 in the ray tracing module and about 37.3 and 20.2 in the total computing time when using 48 CPUs on the IBM Regatta and 24 CPUs on the LINUX cluster, respectively. In the comparison between MPI and OpenMP, OpenMP shows somewhat better performance than MPI. Therefore, it is concluded that the first priority in the parallel computation of the DeCART code is the axial domain decomposition using MPI, then the angular domain using OpenMP, and finally the angular
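The speedups quoted in this abstract can be turned into parallel efficiencies (speedup divided by the number of CPUs). The short sketch below does only that arithmetic; the dictionary layout and the function name `parallel_efficiency` are illustrative choices, and the four speedup values are the ones stated in the abstract.

```python
def parallel_efficiency(speedup, n_cpus):
    """Parallel efficiency: achieved speedup divided by the number of CPUs."""
    return speedup / n_cpus


# (module, machine) -> (speedup, CPU count), as quoted in the abstract.
reported = {
    ("ray tracing", "IBM Regatta"): (40.1, 48),
    ("ray tracing", "LINUX cluster"): (22.4, 24),
    ("total", "IBM Regatta"): (37.3, 48),
    ("total", "LINUX cluster"): (20.2, 24),
}

efficiencies = {
    key: parallel_efficiency(speedup, n_cpus)
    for key, (speedup, n_cpus) in reported.items()
}

if __name__ == "__main__":
    for (module, machine), eff in efficiencies.items():
        print(f"{module:12s} {machine:14s} efficiency = {eff:.1%}")
```

On these figures the ray tracing module runs at roughly 84% efficiency on 48 CPUs and 93% on 24 CPUs, and the total computing time at roughly 78% and 84%, respectively.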
PSHED: a simplified approach to developing parallel programs

International Nuclear Information System (INIS)

Mahajan, S.M.; Ramesh, K.; Rajesh, K.; Somani, A.; Goel, M.

1992-01-01

This paper presents a simplified approach in the form of a tree-structured computational model for parallel application programs. An attempt is made to provide a standard user interface to execute programs on the BARC Parallel Processing System (BPPS), a scalable distributed memory multiprocessor. The interface package, called PSHED, provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs
Parallel evolutionary computation in bioinformatics applications.

Science.gov (United States)

Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

2013-05-01

A large number of optimization problems within the field of Bioinformatics require methods able to handle their inherent complexity (e.g. NP-hard problems) and also demand increased computational effort. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java-based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of their efficient execution on a wide range of parallel architectures. The proposed approach focuses on ease of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism-related modules allows users to easily configure their environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
PARALLEL IMPORT: REALITY FOR RUSSIA

Directory of Open Access Journals (Sweden)

Т. А. Сухопарова

2014-01-01

The problem of parallel import is an urgent question at present. Legalizing parallel import in Russia is expedient; this conclusion is based on an analysis of opposing expert opinions. At the same time, the negative consequences of this decision must be considered and remedies applied to minimize them.
Reactor Dosimetry Applications Using RAPTOR-M3G: A New Parallel 3-D Radiation Transport Code

Science.gov (United States)

Longoni, Gianluca; Anderson, Stanwood L.

2009-08-01

The numerical solution of the Linearized Boltzmann Equation (LBE) via the Discrete Ordinates method (SN) requires extensive computational resources for large 3-D neutron and gamma transport applications due to the concurrent discretization of the angular, spatial, and energy domains. This paper will discuss the development of RAPTOR-M3G (RApid Parallel Transport Of Radiation - Multiple 3D Geometries), a new 3-D parallel radiation transport code, and its application to the calculation of ex-vessel neutron dosimetry responses in the cavity of a commercial 2-loop Pressurized Water Reactor (PWR). RAPTOR-M3G is based on domain decomposition algorithms, where the spatial and angular domains are allocated and processed on multi-processor computer architectures. As compared to traditional single-processor applications, this approach reduces the computational load as well as the memory requirement per processor, yielding an efficient solution methodology for large 3-D problems. Measured neutron dosimetry responses in the reactor cavity air gap will be compared to the RAPTOR-M3G predictions. This paper is organized as follows: Section 1 discusses the RAPTOR-M3G methodology; Section 2 describes the 2-loop PWR model and the numerical results obtained. Section 3 addresses the parallel performance of the code, and Section 4 concludes this paper with final remarks and future work.
Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements

International Nuclear Information System (INIS)

Azmy, Y.Y.; Barnett, D.A.

1999-01-01

The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead
Multitasking TORT under UNICOS: Parallel performance models and measurements

International Nuclear Information System (INIS)

Barnett, A.; Azmy, Y.Y.

1999-01-01

The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead
Parallel artificial liquid membrane extraction

DEFF Research Database (Denmark)

Gjelstad, Astrid; Rasmussen, Knut Einar; Parmer, Marthe Petrine

2013-01-01

This paper reports development of a new approach towards analytical liquid-liquid-liquid membrane extraction termed parallel artificial liquid membrane extraction. A donor plate and acceptor plate create a sandwich, in which each sample (human plasma) and acceptor solution is separated by an artificial liquid membrane. Parallel artificial liquid membrane extraction is a modification of hollow-fiber liquid-phase microextraction, where the hollow fibers are replaced by flat membranes in a 96-well plate format.
Massively parallel multicanonical simulations

Science.gov (United States)

Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

2018-03-01

Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 10^4 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
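The scheme of independent walkers that share multicanonical weights and periodically merge their histograms can be sketched in a few lines. The following is a toy illustration under stated assumptions, not the authors' GPU code: it uses a small periodic 1-D Ising chain instead of the 2-D model, runs the walkers one after another on the CPU, restarts each walker from a random configuration, uses fixed rather than variable update intervals, and applies the simple update W(E) <- W(E) / H(E). All function names and default parameters are hypothetical.

```python
import math
import random


def chain_energy(spins):
    """Energy of a periodic 1-D Ising chain with coupling J = 1."""
    n = len(spins)
    return -sum(spins[i] * spins[(i + 1) % n] for i in range(n))


def run_walker(log_w, n_spins, n_steps, rng):
    """One independent walker: single-spin flips accepted with probability
    min(1, W(E_new) / W(E_old)). Returns the walker's energy histogram."""
    spins = [rng.choice((-1, 1)) for _ in range(n_spins)]
    e = chain_energy(spins)
    hist = {}
    for _ in range(n_steps):
        i = rng.randrange(n_spins)
        # Energy change from flipping spin i (spins[i - 1] wraps around for i = 0).
        e_new = e + 2 * spins[i] * (spins[i - 1] + spins[(i + 1) % n_spins])
        log_ratio = log_w.get(e_new, 0.0) - log_w.get(e, 0.0)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            spins[i] = -spins[i]
            e = e_new
        hist[e] = hist.get(e, 0) + 1
    return hist


def multicanonical(n_spins=8, n_walkers=4, n_steps=500, n_iterations=3, seed=0):
    """Iterate: run all walkers with shared weights, merge histograms, update weights."""
    rngs = [random.Random(seed + w) for w in range(n_walkers)]  # one RNG per walker
    log_w = {}  # log W(E); energies not seen yet default to log W = 0
    merged = {}
    for _ in range(n_iterations):
        # Walkers share no state during a run, so this list comprehension is the
        # part that could be distributed over threads, processes or GPU threads.
        histograms = [run_walker(log_w, n_spins, n_steps, rng) for rng in rngs]
        merged = {}
        for hist in histograms:
            for e, count in hist.items():
                merged[e] = merged.get(e, 0) + count
        # Weight update: penalize energies that were visited often.
        for e, count in merged.items():
            log_w[e] = log_w.get(e, 0.0) - math.log(count)
    return log_w, merged


if __name__ == "__main__":
    weights, histogram = multicanonical()
    for e in sorted(histogram):
        print(f"E = {e:3d}  H = {histogram[e]:5d}  log W = {weights[e]:8.3f}")
```

Only the histogram merge and weight update require communication between walkers, which is why the approach maps well onto many-thread hardware.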
Parallel thermal radiation transport in two dimensions

International Nuclear Information System (INIS)

Smedley-Stevenson, R.P.; Ball, S.R.

2003-01-01

This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massive parallel processors) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)
Parallel thermal radiation transport in two dimensions

Energy Technology Data Exchange (ETDEWEB)

Smedley-Stevenson, R.P.; Ball, S.R. [AWE Aldermaston (United Kingdom)

2003-07-01

This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massive parallel processors) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)
Parallel processing for artificial intelligence 1

CERN Document Server

Kanal, LN; Kumar, V; Suttner, CB

1994-01-01

Parallel processing for AI problems is of great current interest because of its potential for alleviating the computational demands of AI procedures. The articles in this book consider parallel processing for problems in several areas of artificial intelligence: image processing, knowledge representation in semantic networks, production rules, mechanization of logic, constraint satisfaction, parsing of natural language, data filtering and data mining. The publication is divided into six sections. The first addresses parallel computing for processing and understanding images. The second discus
Comparison of parallel viscosity with neoclassical theory

International Nuclear Information System (INIS)

Ida, K.; Nakajima, N.

1996-04-01

Toroidal rotation profiles are measured with charge exchange spectroscopy for the plasma heated with tangential NBI in the CHS heliotron/torsatron device to estimate parallel viscosity. The parallel viscosity derived from the toroidal rotation velocity shows good agreement with the neoclassical parallel viscosity plus the perpendicular viscosity (μ⊥ = 2 m²/s). (author)
Adapting algorithms to massively parallel hardware

CERN Document Server

Sioulas, Panagiotis

2016-01-01

In recent years, the trend in computing has shifted from delivering processors with faster clock speeds to increasing the number of cores per processor. This marks a paradigm shift towards parallel programming, in which applications are programmed to exploit the power provided by multi-cores. Usually there is a gain in terms of the time-to-solution and the memory footprint. Specifically, this trend has sparked an interest in massively parallel systems that can provide a large number of processors, and possibly computing nodes, as in GPUs and MPPAs (Massively Parallel Processor Arrays). In this project, the focus was on two distinct computing problems: k-d tree searches and track seeding cellular automata. The goal was to adapt the algorithms to parallel systems and evaluate their performance in different cases.
Implementing Shared Memory Parallelism in MCBEND

Directory of Open Access Journals (Sweden)

Bird Adam

2017-01-01

MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheeler's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware, OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.
Intra-Arc extension in Central America: Links between plate motions, tectonics, volcanism, and geochemistry

Science.gov (United States)

Phipps Morgan, Jason; Ranero, Cesar; Vannucchi, Paola

2010-05-01

This study revisits the kinematics and tectonics of Central America subduction, synthesizing observations of marine bathymetry, high-resolution land topography, current plate motions, and the recent seismotectonic and magmatic history in this region. The inferred tectonic history implies that the Guatemala-El Salvador and Nicaraguan segments of this volcanic arc have been a region of significant arc tectonic extension; extension arising from the interplay between subduction roll-back of the Cocos Plate and the ~10-15 mm/yr slower westward drift of the Caribbean plate relative to the North American Plate. The ages of belts of magmatic rocks paralleling both sides of the current Nicaraguan arc are consistent with long-term arc-normal extension in Nicaragua at the rate of ~5-10 mm/yr, in agreement with rates predicted by plate kinematics. Significant arc-normal extension can ‘hide' a very large intrusive arc-magma flux; we suggest that Nicaragua is, in fact, the most magmatically robust section of the Central American arc, and that the volume of intrusive volcanism here has been previously greatly underestimated. Yet, this flux is hidden by the persistent extension and sediment infill of the rifting basin in which the current arc sits. Observed geochemical differences between the Nicaraguan arc and its neighbors which suggest that Nicaragua has a higher rate of arc-magmatism are consistent with this interpretation. Smaller-amplitude, but similar systematic geochemical correlations between arc-chemistry and arc-extension in Guatemala show the same pattern as the even larger variations between the Nicaragua arc and its neighbors. We are also exploring the potential implications of intra-arc extension for deformation processes along the subducting plate boundary and within the forearc ‘microplate'.
Parallel fabrication of macroporous scaffolds.

Science.gov (United States)

Dobos, Andrew; Grandhi, Taraka Sai Pavan; Godeshala, Sudhakar; Meldrum, Deirdre R; Rege, Kaushal

2018-07-01

Scaffolds generated from naturally occurring and synthetic polymers have been investigated in several applications because of their biocompatibility and tunable chemo-mechanical properties. Existing methods for generation of 3D polymeric scaffolds typically cannot be parallelized, suffer from low throughputs, and do not allow for quick and easy removal of the fragile structures that are formed. Current molds used in hydrogel and scaffold fabrication using solvent casting and porogen leaching are often single-use and do not facilitate 3D scaffold formation in parallel. Here, we describe a simple device and related approaches for the parallel fabrication of macroporous scaffolds. This approach was employed for the generation of macroporous and non-macroporous materials in parallel, in higher throughput and allowed for easy retrieval of these 3D scaffolds once formed. In addition, macroporous scaffolds with interconnected as well as non-interconnected pores were generated, and the versatility of this approach was employed for the generation of 3D scaffolds from diverse materials including an aminoglycoside-derived cationic hydrogel ("Amikagel"), poly(lactic-co-glycolic acid) or PLGA, and collagen. Macroporous scaffolds generated using the device were investigated for plasmid DNA binding and cell loading, indicating the use of this approach for developing materials for different applications in biotechnology. Our results demonstrate that the device-based approach is a simple technology for generating scaffolds in parallel, which can enhance the toolbox of current fabrication techniques. © 2018 Wiley Periodicals, Inc.

Event parallelism: Distributed memory parallel computing for high energy physics experiments

International Nuclear Information System (INIS)

Nash, T.

1989-05-01

This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. 6 figs
Event parallelism: Distributed memory parallel computing for high energy physics experiments

International Nuclear Information System (INIS)

Nash, T.

1989-01-01

This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. (orig.)
Event parallelism: Distributed memory parallel computing for high energy physics experiments

Science.gov (United States)

Nash, Thomas

1989-12-01

This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC system, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.
Researching the Parallel Process in Supervision and Psychotherapy

DEFF Research Database (Denmark)

Jacobsen, Claus Haugaard

Reflects upon how to do process research in supervision and in the parallel process. A single case study is presented illustrating how a study on parallel process can be carried out.
Development of parallel/serial program analyzing tool

International Nuclear Information System (INIS)

Watanabe, Hiroshi; Nagao, Saichi; Takigawa, Yoshio; Kumakura, Toshimasa

1999-03-01

Japan Atomic Energy Research Institute has been developing 'KMtool', a parallel/serial program analyzing tool, in order to promote the parallelization of science and engineering computation programs. KMtool analyzes the performance of programs written in FORTRAN77 with MPI, and it reduces the effort required for parallelization. This paper describes the development purpose, design, utilization and evaluation of KMtool. (author)
Simulation Exploration through Immersive Parallel Planes: Preprint

Energy Technology Data Exchange (ETDEWEB)

Brunhart-Lupo, Nicholas; Bush, Brian W.; Gruchalla, Kenny; Smith, Steve

2016-03-01

We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinates, which map the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest; a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used both to explore existing data and to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
Parallel programming practical aspects, models and current limitations

CERN Document Server

Tarkov, Mikhail S

2014-01-01

Parallel programming is designed for the use of parallel computer systems for solving time-consuming problems that cannot be solved on a sequential computer in a reasonable time. These problems can be divided into two classes: 1. Processing large data arrays (including processing images and signals in real time); 2. Simulation of complex physical processes and chemical reactions. For each of these classes, prospective methods are designed for solving problems. For data processing, one of the most promising technologies is the use of artificial neural networks. The particle-in-cell method and cellular automata are very useful for simulation. Problems of scalability of parallel algorithms and the transfer of existing parallel programs to future parallel computers are very acute now. An important task is to optimize the use of the equipment (including the CPU cache) of parallel computers. Along with parallelizing information processing, it is essential to ensure the processing reliability by the relevant organization ...
Semantic Language Extensions for Implicit Parallel Programming

Science.gov (United States)

2013-09-01

mobile CPU interacts with a GPU on the same device and a cloud based backend at a remote location presents endless possibilities for solving com... for his contribution to the compiler infrastructure. His creativity in solving research problems and expertise in architecting and implementing... 5.5.1 Frontend ... 5.5.2 Backend
Computing effective properties of random heterogeneous materials on heterogeneous parallel processors

Science.gov (United States)

Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto

2012-11-01

In recent decades, finite element (FE) techniques have been extensively used for predicting effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, having a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the final objective of reducing these limitations, we improved an existing freely available FE code for the computation of effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C, and we subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing performance and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel processing version of the algorithm has been validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel application version features near-linear speed-up progression when using only the CPU cores. It executes more than 20 times faster when additionally using the GPU.
Parallelization of Subchannel Analysis Code MATRA

International Nuclear Information System (INIS)

Kim, Seongjin; Hwang, Daehyun; Kwon, Hyouk

2014-01-01

A stand-alone calculation with the MATRA code takes a reasonable computing time for thermal margin calculations, whereas considerable time is needed to solve whole-core pin-by-pin problems. In addition, the computation speed of the MATRA code must be improved to satisfy the overall performance requirements of multi-physics coupling calculations. Therefore, a parallel approach to improve and optimize the computability of the MATRA code is proposed and verified in this study. The parallel algorithm is embodied in the MATRA code using the MPI communication method, and modification of the previous code structure was minimized. An improvement is confirmed by comparing the results of the single- and multiple-processor algorithms. The speedup and efficiency are also evaluated as the number of processors increases. The parallel algorithm was implemented in the subchannel code MATRA using MPI. Its performance was verified by comparing the results with those from MATRA on a single processor. The performance of the MATRA code was greatly improved by the parallel algorithm for the 1/8-core and whole-core problems.
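The speedup and efficiency evaluated in this abstract are conventionally defined as S(p) = T(1)/T(p) and E(p) = S(p)/p. A minimal sketch of that bookkeeping follows; the timings are invented for illustration and are not MATRA results.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Parallel efficiency E = S / p (1.0 would be ideal scaling)."""
    return speedup(t_serial, t_parallel) / n_procs


# Hypothetical wall-clock times in seconds, keyed by processor count.
timings = {1: 120.0, 2: 64.0, 4: 36.0, 8: 24.0}

# Map each processor count to its (speedup, efficiency) pair.
table = {
    p: (speedup(timings[1], t), efficiency(timings[1], t, p))
    for p, t in timings.items()
}
```

Efficiency falling as the processor count grows, as in these made-up numbers, is the usual sign that communication or serial sections are starting to dominate.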
Broadcasting a message in a parallel computer

Science.gov (United States)

Berg, Jeremy E [Rochester, MN; Faraj, Ahmad A [Rochester, MN

2011-08-02

Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
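To make the idea of broadcasting along a Hamiltonian path concrete, here is a small simulation for one plane of a two-dimensional mesh. It is a sketch under assumptions, not the patented method: the serpentine ("snake") ordering is just one easy way to build a Hamiltonian path on a grid, and the function names are hypothetical.

```python
def snake_path(rows, cols):
    """Return a Hamiltonian path over a rows x cols mesh.

    Even rows run left to right and odd rows right to left, so each
    step moves to a directly neighbouring node.
    """
    path = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in columns)
    return path


def broadcast(path, root, message):
    """Simulate forwarding `message` from `root` along the path.

    The root sends in both directions along the path; each node passes
    the message on to its next neighbour. Returns a dict mapping each
    node to (message, number of hops from the root).
    """
    start = path.index(root)
    received = {root: (message, 0)}
    for hops, node in enumerate(path[start + 1:], start=1):
        received[node] = (message, hops)
    for hops, node in enumerate(reversed(path[:start]), start=1):
        received[node] = (message, hops)
    return received


path = snake_path(3, 4)
received = broadcast(path, root=(1, 2), message="hello")
```

Every hop in this scheme is a point-to-point transfer between adjacent nodes, which matches the kind of network the abstract says the machine is optimized for.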
Parallel programming with Python

CERN Document Server

Palach, Jan

2014-01-01

A fast, easy-to-follow and clear tutorial to help you develop parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you in implementing these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most out of this book.
Sociologists in Extension

Science.gov (United States)

Christenson, James A.; And Others

1977-01-01

The article describes the work activities of the extension sociologist, the relative advantage and disadvantage of extension roles in relation to teaching/research roles, and the relevance of sociological training and research for extension work. (NQ)
Multistage parallel-serial time averaging filters

International Nuclear Information System (INIS)

Theodosiou, G.E.

1980-01-01

Here, a new time averaging circuit design, the 'parallel filter' is presented, which can reduce the time jitter, introduced in time measurements using counters of large dimensions. This parallel filter could be considered as a single stage unit circuit which can be repeated an arbitrary number of times in series, thus providing a parallel-serial filter type as a result. The main advantages of such a filter over a serial one are much less electronic gate jitter and time delay for the same amount of total time uncertainty reduction. (orig.)
Massively parallel Fokker-Planck code ALLAp

International Nuclear Information System (INIS)

Batishcheva, A.A.; Krasheninnikov, S.I.; Craddock, G.G.; Djordjevic, V.

1996-01-01

The Fokker-Planck code ALLA, recently developed for workstations, simulates the temporal evolution of 1V, 2V and 1D2V collisional edge plasmas. In this work we present the results of code parallelization on the CRI T3D massively parallel platform (ALLAp version). Simultaneously we benchmark the 1D2V parallel version against an analytic self-similar solution of the collisional kinetic equation. This test is not trivial as it demands a very strong spatial temperature and density variation within the simulation domain. (orig.)
Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags.

Directory of Open Access Journals (Sweden)

Paul A Hohenlohe

2010-02-01

Next-generation sequencing technology provides novel opportunities for gathering genome-scale sequence data in natural populations, laying the empirical foundation for the evolving field of population genomics. Here we conducted a genome scan of nucleotide diversity and differentiation in natural populations of threespine stickleback (Gasterosteus aculeatus). We used Illumina-sequenced RAD tags to identify and type over 45,000 single nucleotide polymorphisms (SNPs) in each of 100 individuals from two oceanic and three freshwater populations. Overall estimates of genetic diversity and differentiation among populations confirm the biogeographic hypothesis that large panmictic oceanic populations have repeatedly given rise to phenotypically divergent freshwater populations. Genomic regions exhibiting signatures of both balancing and divergent selection were remarkably consistent across multiple, independently derived populations, indicating that replicate parallel phenotypic evolution in stickleback may be occurring through extensive, parallel genetic evolution at a genome-wide scale. Some of these genomic regions co-localize with previously identified QTL for stickleback phenotypic variation identified using laboratory mapping crosses. In addition, we have identified several novel regions showing parallel differentiation across independent populations. Annotation of these regions revealed numerous genes that are candidates for stickleback phenotypic evolution and will form the basis of future genetic analyses in this and other organisms. This study represents the first high-density SNP-based genome scan of genetic diversity and differentiation for populations of threespine stickleback in the wild. These data illustrate the complementary nature of laboratory crosses and population genomic scans by confirming the adaptive significance of previously identified genomic regions, elucidating the particular evolutionary and demographic history of such
Parallel Algorithms for Groebner-Basis Reduction

Science.gov (United States)

1987-09-25

Parallel Algorithms for Groebner-Basis Reduction. Technical Report, Productivity Engineering in the UNIX Environment.
A possibility of parallel and anti-parallel diffraction measurements on ...

Indian Academy of Sciences (India)

resolution property of the other one, anti-parallel position, is very poor. .... in a wide angular region using BPC monochromator at the MF condition by showing ... and N Nimura, Proceedings of the 7th World Conference on Neutron Radiography,.
Research in Parallel Algorithms and Software for Computational Aerosciences

Science.gov (United States)

Domel, Neal D.

1996-01-01

Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
A task parallel implementation of fast multipole methods

KAUST Repository

Taura, Kenjiro; Nakashima, Jun; Yokota, Rio; Maruyama, Naoya

2012-01-01

This paper describes a task parallel implementation of ExaFMM, an open source implementation of fast multipole methods (FMM), using a lightweight task parallel library MassiveThreads. Although there have been many attempts on parallelizing FMM

Massively parallel de novo protein design for targeted therapeutics

KAUST Repository

Chevalier, Aaron

2017-09-26

De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37-43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing.
Massively parallel de novo protein design for targeted therapeutics

KAUST Repository

Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J.; Hicks, Derrick R.; Vergara, Renan; Murapa, Patience; Bernard, Steffen M.; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D.; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T.; Koday, Merika T.; Jenkins, Cody M.; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M.; Fernández-Velasco, D. Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A.; Fuller, Deborah H.; Baker, David

2017-01-01

De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37-43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing.
Massively parallel de novo protein design for targeted therapeutics

Science.gov (United States)

Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J.; Hicks, Derrick R.; Vergara, Renan; Murapa, Patience; Bernard, Steffen M.; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D.; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T.; Koday, Merika T.; Jenkins, Cody M.; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M.; Fernández-Velasco, D. Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A.; Fuller, Deborah H.; Baker, David

2018-01-01

De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37–43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing. PMID:28953867
Optimisation of a parallel ocean general circulation model

OpenAIRE

M. I. Beare; D. P. Stevens

1997-01-01

This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by...
Vectorization, parallelization and porting of nuclear codes on the VPP500 system (parallelization). Progress report fiscal 1996

Energy Technology Data Exchange (ETDEWEB)

Watanabe, Hideo; Kawai, Wataru; Nemoto, Toshiyuki [Fujitsu Ltd., Tokyo (Japan); and others

1997-12-01

Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system at the Center for Promotion of Computational Science and Engineering in Japan Atomic Energy Research Institute. These results are reported in 3 parts, i.e., the vectorization part, the parallelization part and the porting part. In this report, we describe the parallelization. In this parallelization part, the parallelization of the 2-Dimensional relativistic electromagnetic particle code EM2D, the Cylindrical Direct Numerical Simulation code CYLDNS and the molecular dynamics code for simulating radiation damages in diamond crystals DGR is described. In the vectorization part, the vectorization of the two and three dimensional discrete ordinates simulation code DORT-TORT, the gas dynamics analysis code FLOWGR and the relativistic Boltzmann-Uehling-Uhlenbeck simulation code RBUU is described. In the porting part, the porting of the reactor safety analysis codes RELAP5/MOD3.2 and RELAP5/MOD3.2.1.2, the nuclear data processing system NJOY and the 2-D multigroup discrete ordinate transport code TWOTRAN-II is described, along with a survey for the porting of the command-driven interactive data analysis plotting program IPLOT. (author)
Configuration affects parallel stent grafting results.

Science.gov (United States)

Tanious, Adam; Wooster, Mathew; Armstrong, Paul A; Zwiebel, Bruce; Grundy, Shane; Back, Martin R; Shames, Murray L

2018-05-01

A number of adjunctive "off-the-shelf" procedures have been described to treat complex aortic diseases. Our goal was to evaluate parallel stent graft configurations and to determine an optimal formula for these procedures. This is a retrospective review of all patients at a single medical center treated with parallel stent grafts from January 2010 to September 2015. Outcomes were evaluated on the basis of parallel graft orientation, type, and main body device. Primary end points included parallel stent graft compromise and overall endovascular aneurysm repair (EVAR) compromise. There were 78 patients treated with a total of 144 parallel stents for a variety of pathologic processes. There was a significant correlation between main body oversizing and snorkel compromise (P = .0195) and overall procedural complication (P = .0019) but not with endoleak rates. Patients were organized into the following oversizing groups for further analysis: 0% to 10%, 10% to 20%, and >20%. Those oversized into the 0% to 10% group had the highest rate of overall EVAR complication (73%; P = .0003). There were no significant correlations between any one particular configuration and overall procedural complication. There was also no significant correlation between total number of parallel stents employed and overall complication. Composite EVAR configuration had no significant correlation with individual snorkel compromise, endoleak, or overall EVAR or procedural complication. The configuration most prone to individual snorkel compromise and overall EVAR complication was a four-stent configuration with two stents in an antegrade position and two stents in a retrograde position (60% complication rate). The configuration most prone to endoleak was one or two stents in retrograde position (33% endoleak rate), followed by three stents in an all-antegrade position (25%). There was a significant correlation between individual stent configuration and stent compromise (P = .0385), with 31
Parallel multigrid smoothing: polynomial versus Gauss-Seidel

International Nuclear Information System (INIS)

Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

2003-01-01

Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines
Parallel multigrid smoothing: polynomial versus Gauss-Seidel

Science.gov (United States)

Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

2003-07-01

Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
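As a rough illustration of why polynomial smoothers parallelize well, the sketch below applies a few steps of the standard Chebyshev iteration to a 1D Poisson matrix. It is not the authors' implementation: the test matrix, the choice of smoothing interval [lam_max/10, lam_max] and all function names are assumptions made for this example. The point to notice is that each step needs only a matrix-vector product and vector updates, with none of the sequential data dependencies of a Gauss-Seidel sweep.

```python
import math


def poisson_matvec(x):
    """Return A x for the 1D Poisson matrix tridiag(-1, 2, -1)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def norm(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(vi * vi for vi in v))


def chebyshev_smooth(matvec, b, x, lam_min, lam_max, steps):
    """Apply `steps` Chebyshev iterations targeting [lam_min, lam_max].

    Returns the updated iterate and the recursively updated residual.
    """
    theta = 0.5 * (lam_max + lam_min)   # centre of the target interval
    delta = 0.5 * (lam_max - lam_min)   # half-width of the interval
    sigma = theta / delta
    rho = 1.0 / sigma
    r = [bi - yi for bi, yi in zip(b, matvec(x))]
    d = [ri / theta for ri in r]
    for _ in range(steps):
        x = [xi + di for xi, di in zip(x, d)]
        r = [ri - yi for ri, yi in zip(r, matvec(d))]
        rho_next = 1.0 / (2.0 * sigma - rho)
        d = [rho_next * rho * di + (2.0 * rho_next / delta) * ri
             for di, ri in zip(d, r)]
        rho = rho_next
    return x, r


n = 16
b = [1.0] * n
x0 = [0.0] * n
lam_max = 4.0              # Gershgorin upper bound for tridiag(-1, 2, -1)
lam_min = lam_max / 10.0   # assumed lower end of the smoothing interval
x, r = chebyshev_smooth(poisson_matvec, b, x0, lam_min, lam_max, steps=3)
```

In a multigrid setting the interval would normally cover only the upper part of the spectrum, leaving the smooth error components to the coarse-grid correction.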
Scalable parallel prefix solvers for discrete ordinates transport

International Nuclear Information System (INIS)

Pautz, S.; Pandya, T.; Adams, M.

2009-01-01

The well-known 'sweep' algorithm for inverting the streaming-plus-collision term in first-order deterministic radiation transport calculations has some desirable numerical properties. However, it suffers from parallel scaling issues caused by a lack of concurrency. The maximum degree of concurrency, and thus the maximum parallelism, grows more slowly than the problem size for sweeps-based solvers. We investigate a new class of parallel algorithms that involves recasting the streaming-plus-collision problem in prefix form and solving via cyclic reduction. This method, although computationally more expensive at low levels of parallelism than the sweep algorithm, offers better theoretical scalability properties. Previous work has demonstrated this approach for one-dimensional calculations; we show how to extend it to multidimensional calculations. Notably, for multiple dimensions it appears that this approach is limited to long-characteristics discretizations; other discretizations cannot be cast in prefix form. We implement two variants of the algorithm within the radlib/SCEPTRE transport code library at Sandia National Laboratories and show results on two different massively parallel systems. Both the 'forward' and 'symmetric' solvers behave similarly, scaling well to larger degrees of parallelism than sweeps-based solvers. We do observe some issues at the highest levels of parallelism (relative to the system size) and discuss possible causes. We conclude that this approach shows good potential for future parallel systems, but the parallel scalability will depend heavily on the architecture of the communication networks of these systems. (authors)
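The idea of recasting a sweep in prefix form can be illustrated on a one-dimensional model recurrence psi_i = a_i * psi_(i-1) + b_i. Each cell acts as an affine map, and composing affine maps is associative, so all partial compositions can be formed by a parallel scan. The sketch below is an assumption-laden toy, not the radlib/SCEPTRE solver: it uses a recursive-doubling (Hillis-Steele) scan in place of the cyclic reduction named in the abstract, and the coefficients are invented.

```python
def combine(first, second):
    """Compose two affine maps x -> a*x + b: apply `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def sweep(coeffs, psi_in):
    """Sequential sweep: each cell must wait for its upstream neighbour."""
    out, psi = [], psi_in
    for a, b in coeffs:
        psi = a * psi + b
        out.append(psi)
    return out


def prefix_solve(coeffs, psi_in):
    """Same result through an inclusive scan over affine maps.

    Each pass updates every entry independently of the others, so the
    n entries could be processed concurrently; about log2(n) passes
    are needed instead of n sequential steps.
    """
    maps = list(coeffs)
    n = len(maps)
    step = 1
    while step < n:
        maps = [combine(maps[i - step], maps[i]) if i >= step else maps[i]
                for i in range(n)]
        step *= 2
    return [a * psi_in + b for a, b in maps]


# Made-up attenuation and source coefficients for five cells.
coeffs = [(0.5, 1.0), (0.25, 2.0), (0.8, 0.5), (0.9, 0.1), (0.3, 3.0)]
serial = sweep(coeffs, psi_in=2.0)
scanned = prefix_solve(coeffs, psi_in=2.0)
```

The scan performs more arithmetic in total than the sweep, which mirrors the abstract's remark that the prefix approach costs more at low levels of parallelism but exposes far more concurrency.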
Domain decomposition methods and parallel computing

International Nuclear Information System (INIS)

Meurant, G.

1991-01-01

In this paper, we show how to efficiently solve large linear systems on parallel computers. These linear systems arise from discretization of scientific computing problems described by systems of partial differential equations. We show how to get a discrete finite dimensional system from the continuous problem and the chosen conjugate gradient iterative algorithm is briefly described. Then, the different kinds of parallel architectures are reviewed and their advantages and deficiencies are emphasized. We sketch the problems found in programming the conjugate gradient method on parallel computers. For this algorithm to be efficient on parallel machines, domain decomposition techniques are introduced. We give results of numerical experiments showing that these techniques allow a good rate of convergence for the conjugate gradient algorithm as well as computational speeds in excess of a billion floating-point operations per second. (author). 5 refs., 11 figs., 2 tabs., 1 inset
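For readers unfamiliar with the conjugate gradient iteration mentioned in this abstract, a minimal unpreconditioned version is sketched below. It is a textbook form, not the paper's code: the domain decomposition preconditioning that the paper relies on is omitted, and the 1D Poisson test problem is an assumption chosen because its exact solution is known.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def poisson_matvec(x):
    """Return A x for the 1D Poisson matrix tridiag(-1, 2, -1)."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A, starting from 0."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the zero initial guess
    p = list(r)            # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


n = 10
b = [1.0] * n
x = conjugate_gradient(poisson_matvec, b)
```

On a parallel machine the matrix-vector product and vector updates distribute naturally, while the two inner products per iteration require global reductions, which is one of the programming problems such papers typically discuss.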
Extensions of guiding center motion to higher order

International Nuclear Information System (INIS)

Northrop, T.G.; Rome, J.A.

1978-01-01

In a static magnetic field, some well-known guiding center equations maintain their form when extended to next order in gyroradius. In these cases, it is only necessary to include the next order term in the magnetic moment series. The differential equation for guiding center motion which describes both the parallel and perpendicular velocities correctly through first order in gyroradius is given. The question of how to define the guiding center position through second order arises and is discussed, and second order drifts are derived for one usual definition. The toroidal canonical angular momentum, $P_\phi$, of the guiding center in an axisymmetric field is shown to be conserved using the guiding center velocity correct through first order. When second-order motion is included, $P_\phi$ is no longer a constant. The above extensions of guiding center theory help to resolve the different tokamak orbits obtained either by using the guiding center equations of motion or by using conservation of $P_\phi$.
Extensions of guiding center motion to higher order

International Nuclear Information System (INIS)

Northrop, T.G.; Rome, J.A.

1977-07-01

In a static magnetic field, some well-known guiding center equations maintain their form when extended to next order in gyroradius. In these cases, it is only necessary to include the next order term in the magnetic moment series. The differential equation for guiding center motion which describes both the parallel and perpendicular velocities correctly through first order in gyroradius is given. The question of how to define the guiding center position through second order arises and is discussed, and second order drifts are derived for one usual definition. The toroidal canonical angular momentum, $P_\phi$, of the guiding center in an axisymmetric field is shown to be conserved using the guiding center velocity correct through first order. When second order motion is included, $P_\phi$ is no longer a constant. The above extensions of guiding center theory help to resolve the different tokamak orbits obtained either by using the guiding center equations of motion or by using conservation of $P_\phi$.
Parallelization and automatic data distribution for nuclear reactor simulations

Energy Technology Data Exchange (ETDEWEB)

Liebrock, L.M. [Liebrock-Hicks Research, Calumet, MI (United States)

1997-07-01

Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
Parallelization and automatic data distribution for nuclear reactor simulations

International Nuclear Information System (INIS)

Liebrock, L.M.

1997-01-01

Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed
The convergence of parallel Boltzmann machines

NARCIS (Netherlands)

Zwietering, P.J.; Aarts, E.H.L.; Eckmiller, R.; Hartmann, G.; Hauske, G.

1990-01-01

We discuss the main results obtained in a study of a mathematical model of synchronously parallel Boltzmann machines. We present supporting evidence for the conjecture that a synchronously parallel Boltzmann machine maximizes a consensus function that consists of a weighted sum of the regular
Implementations of BLAST for parallel computers.

Science.gov (United States)

Jülich, A

1995-02-01

The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.
Synchronization Of Parallel Discrete Event Simulations

Science.gov (United States)

Steinman, Jeffrey S.

1992-01-01

Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Distributed parallel messaging for multiprocessor systems

Science.gov (United States)

Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

2013-06-04

A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.
Competency Modeling in Extension Education: Integrating an Academic Extension Education Model with an Extension Human Resource Management Model

Science.gov (United States)

Scheer, Scott D.; Cochran, Graham R.; Harder, Amy; Place, Nick T.

2011-01-01

The purpose of this study was to compare and contrast an academic extension education model with an Extension human resource management model. The academic model of 19 competencies was similar across the 22 competencies of the Extension human resource management model. There were seven unique competencies for the human resource management model.…
SPINning parallel systems software

International Nuclear Information System (INIS)

Matlin, O.S.; Lusk, E.; McCune, W.

2002-01-01

We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic: processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin.

Seamless-merging-oriented parallel inverse lithography technology

International Nuclear Information System (INIS)

Yang Yiwei; Shi Zheng; Shen Shanhu

2009-01-01

Inverse lithography technology (ILT), a promising resolution enhancement technology (RET) used in next generations of IC manufacture, has the capability to push lithography to its limit. However, the existing methods of ILT are either time-consuming due to the large layout in a single process, or not accurate enough due to simple block merging in the parallel process. The seamless-merging-oriented parallel ILT method proposed in this paper is fast because of the parallel process; most importantly, the convergence enhancement penalty terms (CEPT) introduced in the parallel ILT optimization process take the environment into consideration, as well as environmental change through target updating. This method increases the similarity of the overlapped area between guard-bands and work units, makes the merging process approach seamless and hence reduces hot-spots. The experimental results show that seamless-merging-oriented parallel ILT not only accelerates the optimization process, but also significantly improves the quality of ILT.
Automatic Management of Parallel and Distributed System Resources

Science.gov (United States)

Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

1990-01-01

Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.
Synchronization Techniques in Parallel Discrete Event Simulation

OpenAIRE

Lindén, Jonatan

2018-01-01

Discrete event simulation is an important tool for evaluating system models in many fields of science and engineering. To improve the performance of large-scale discrete event simulations, several techniques to parallelize discrete event simulation have been developed. In parallel discrete event simulation, the work of a single discrete event simulation is distributed over multiple processing elements. A key challenge in parallel discrete event simulation is to ensure that causally dependent ...
Parallel sparse direct solver for integrated circuit simulation

CERN Document Server

Chen, Xiaoming; Yang, Huazhong

2017-01-01

This book describes algorithmic methods and parallelization techniques to design a parallel sparse direct solver which is specifically targeted at integrated circuit simulation problems. The authors describe a complete flow and detailed parallel algorithms of the sparse direct solver. They also show how to improve the performance by simple but effective numerical techniques. The sparse direct solver techniques described can be applied to any SPICE-like integrated circuit simulator and have been proven to be high-performance in actual circuit simulation. Readers will benefit from the state-of-the-art parallel integrated circuit simulation techniques described in this book, especially the latest parallel sparse matrix solution techniques. · Introduces complicated algorithms of sparse linear solvers, using concise principles and simple examples, without complex theory or lengthy derivations; · Describes a parallel sparse direct solver that can be adopted to accelerate any SPICE-like integrated circuit simulato...
Streaming nested data parallelism on multicores

DEFF Research Database (Denmark)

Madsen, Frederik Meisner; Filinski, Andrzej

2016-01-01

The paradigm of nested data parallelism (NDP) allows a variety of semi-regular computation tasks to be mapped onto SIMD-style hardware, including GPUs and vector units. However, some care is needed to keep down space consumption in situations where the available parallelism may vastly exceed...
Parallel Boltzmann machines : a mathematical model

NARCIS (Netherlands)

Zwietering, P.J.; Aarts, E.H.L.

1991-01-01

A mathematical model is presented for the description of parallel Boltzmann machines. The framework is based on the theory of Markov chains and combines a number of previously known results into one generic model. It is argued that parallel Boltzmann machines maximize a function consisting of a
17 CFR 12.24 - Parallel proceedings.

Science.gov (United States)

2010-04-01

...) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration proceeding... the receivership includes the resolution of claims made by customers; or (3) A petition filed under... any of the foregoing with knowledge of a parallel proceeding shall promptly notify the Commission, by...
Parallel computing solution of Boltzmann neutron transport equation

International Nuclear Information System (INIS)

Ansah-Narh, T.

2010-01-01

The research focused on developing a parallel computing algorithm for solving for the eigenvalues of the Boltzmann Neutron Transport Equation (BNTE) in slab geometry using a multi-grid approach. In response to the slow execution of serial computing on large problems such as the BNTE, the study focused on the design of parallel computing systems, an evolution of serial computing that uses multiple processing elements simultaneously to solve complex physical and mathematical problems. The finite element method (FEM) was used for the spatial discretization, while angular discretization was accomplished by expanding the angular dependence in terms of Legendre polynomials. The eigenvalues representing the multiplication factors in the BNTE were determined by the power method. MATLAB Compiler Version 4.1 (R2009a) was used to compile the MATLAB codes of BNTE. The implemented parallel algorithms were enabled with matlabpool, a Parallel Computing Toolbox function. The option UseParallel was set to 'always'; its default value was 'never'. When those conditions held, the solvers computed estimated gradients in parallel. The parallel computing system was used to handle all the bottlenecks in the matrix generated from the finite element scheme and each domain of the power method generated. The parallel algorithm was implemented on a Symmetric Multi Processor (SMP) cluster machine, which had Intel 32-bit quad-core x86 processors. Convergence rates and timings for the algorithm on the SMP cluster machine were obtained. Numerical experiments indicated the designed parallel algorithm could reach perfect speedup and had good stability and scalability. (au)
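The power method named in the abstract above can be illustrated with a minimal sketch. This is plain Python, not the study's MATLAB code; the function name `power_method`, the starting vector, the tolerance, and the 2x2 test matrix are all assumptions chosen for illustration. The matrix-vector product inside the loop is the step a parallel implementation would typically distribute.

```python
def power_method(matrix, tol=1e-12, max_iter=10_000):
    """Return (eigenvalue, eigenvector) for the dominant eigenvalue of a square matrix.

    Assumes the dominant eigenvalue is real and strictly largest in magnitude.
    """
    n = len(matrix)
    vec = [1.0] * n  # arbitrary non-zero starting guess (an assumption)
    eigenvalue = 0.0
    for _ in range(max_iter):
        # Matrix-vector product: the dominant cost of each iteration.
        product = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        # Scale by the signed component of largest magnitude; because the
        # previous iterate has that component equal to 1, the scale factor
        # converges to the dominant eigenvalue.
        pivot = max(product, key=abs)
        if pivot == 0.0:
            raise ValueError("iterate collapsed to the zero vector")
        vec = [x / pivot for x in product]
        if abs(pivot - eigenvalue) < tol:
            return pivot, vec
        eigenvalue = pivot
    raise RuntimeError("power method did not converge")


# Hypothetical symmetric test matrix with eigenvalues (5 +/- sqrt(5)) / 2.
dominant, direction = power_method([[2.0, 1.0], [1.0, 3.0]])
```

In a criticality calculation the same iteration would be applied to the discretized transport operator rather than to a small dense matrix, which is where distributing the product pays off.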
Fortran code for SU(3) lattice gauge theory with and without MPI checkerboard parallelization

Science.gov (United States)

Berg, Bernd A.; Wu, Hao

2012-10-01

We document plain Fortran and Fortran MPI checkerboard code for Markov chain Monte Carlo simulations of pure SU(3) lattice gauge theory with the Wilson action in D dimensions. The Fortran code uses periodic boundary conditions and is suitable for pedagogical purposes and small scale simulations. For the Fortran MPI code two geometries are covered: the usual torus with periodic boundary conditions and the double-layered torus as defined in the paper. Parallel computing is performed on checkerboards of sublattices, which partition the full lattice in one, two, and so on, up to D directions (depending on the parameters set). For updating, the Cabibbo-Marinari heatbath algorithm is used. We present validations and test runs of the code. Performance is reported for a number of currently used Fortran compilers and, when applicable, MPI versions. For the parallelized code, performance is studied as a function of the number of processors.
Program summary
Program title: STMC2LSU3MPI
Catalogue identifier: AEMJ_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEMJ_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 26666
No. of bytes in distributed program, including test data, etc.: 233126
Distribution format: tar.gz
Programming language: Fortran 77 compatible with the use of Fortran 90/95 compilers, in part with MPI extensions.
Computer: Any capable of compiling and executing Fortran 77 or Fortran 90/95, when needed with MPI extensions.
Operating system: Red Hat Enterprise Linux Server 6.1 with OpenMPI + pgf77 11.8-0, Centos 5.3 with OpenMPI + gfortran 4.1.2, Cray XT4 with MPICH2 + pgf90 11.2-0.
Has the code been vectorised or parallelized?: Yes, parallelized using MPI extensions.
Number of processors used: 2 to 11664
RAM: 200 Mega bytes per process.
Classification: 11
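The general idea behind a checkerboard decomposition can be sketched as follows. This is a generic Python illustration, not the STMC2LSU3MPI Fortran code: the helper names (`checkerboard_colour`, `neighbours`, `colouring_is_valid`) are hypothetical, and the sketch only colours block indices on a periodic grid, ignoring the link variables and staples that a real gauge-theory update has to handle. Blocks of one colour have no nearest neighbours of the same colour, so they could in principle be updated concurrently.

```python
from itertools import product


def checkerboard_colour(coords):
    """Colour 0 or 1 from the parity of the summed block coordinates."""
    return sum(coords) % 2


def neighbours(coords, extents):
    """Nearest neighbours of a block on a periodic grid with the given extents."""
    for axis, size in enumerate(extents):
        for step in (-1, 1):
            shifted = list(coords)
            shifted[axis] = (shifted[axis] + step) % size
            yield tuple(shifted)


def colouring_is_valid(extents):
    """True if no block shares its colour with any nearest neighbour."""
    return all(
        checkerboard_colour(block) != checkerboard_colour(nb)
        for block in product(*(range(size) for size in extents))
        for nb in neighbours(block, extents)
    )


# With periodic boundaries the two-colour scheme needs an even number of
# blocks in every partitioned direction; an odd extent wraps onto itself.
even_grid_ok = colouring_is_valid((4, 4))
odd_grid_ok = colouring_is_valid((3, 4))
```

The even-extent requirement shown here is a property of the two-colour scheme on a torus in general; how the published code chooses its sublattice counts is described in the paper itself.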
Parallel electric fields from ionospheric winds

International Nuclear Information System (INIS)

Nakada, M.P.

1987-01-01

The possible production of electric fields parallel to the magnetic field by dynamo winds in the E region is examined, using a jet stream wind model. Current return paths through the F region above the stream are examined as well as return paths through the conjugate ionosphere. The Wulf geometry with horizontal winds moving in opposite directions one above the other is also examined. Parallel electric fields are found to depend strongly on the width of current sheets at the edges of the jet stream. If these are narrow enough, appreciable parallel electric fields are produced. These appear to be sufficient to heat the electrons which reduces the conductivity and produces further increases in parallel electric fields and temperatures. Calculations indicate that high enough temperatures for optical emission can be produced in less than 0.3 s. Some properties of auroras that might be produced by dynamo winds are examined; one property is a time delay in brightening at higher and lower altitudes
Data parallel sorting for particle simulation

Science.gov (United States)

Dagum, Leonardo

1992-01-01

Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O(N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
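The sequential O(N) integer sort that the abstract takes as its baseline can be sketched with a counting sort on cell indices. This is an illustrative Python sketch, not the paper's data parallel merging algorithm; the function name `sort_particles_by_cell` and the assumption that each particle carries an integer cell index in `range(num_cells)` are hypothetical.

```python
def sort_particles_by_cell(cell_ids, num_cells):
    """Return particle indices ordered by cell, using a stable counting sort.

    Runs in O(N + num_cells) time for N particles.
    """
    # Pass 1: histogram of particles per cell.
    counts = [0] * num_cells
    for cell in cell_ids:
        counts[cell] += 1

    # Pass 2: exclusive prefix sum gives each cell's first output slot.
    offsets = [0] * num_cells
    running = 0
    for cell in range(num_cells):
        offsets[cell] = running
        running += counts[cell]

    # Pass 3: scatter each particle index into its cell's next free slot.
    order = [0] * len(cell_ids)
    for particle, cell in enumerate(cell_ids):
        order[offsets[cell]] = particle
        offsets[cell] += 1
    return order


# Five particles in three cells; particles 1 and 4 sit in cell 0.
order = sort_particles_by_cell([2, 0, 1, 2, 0], num_cells=3)
# order == [1, 4, 2, 0, 3]
```

On a parallel machine the scatter in the third pass is the communication-heavy step, which is the cost the abstract says the merging formulation is designed to reduce.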
High performance parallel computers for science

International Nuclear Information System (INIS)

Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

1989-01-01

This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter- and intra-crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction.
Parallel 3-D method of characteristics in MPACT

International Nuclear Information System (INIS)

Kochunas, B.; Downar, T. J.; Liu, Z.

2013-01-01

A new parallel 3-D MOC kernel has been developed and implemented in MPACT which makes use of the modular ray tracing technique to reduce computational requirements and to facilitate parallel decomposition. The parallel model makes use of both distributed and shared memory parallelism which are implemented with the MPI and OpenMP standards, respectively. The kernel is capable of parallel decomposition of problems in space, angle, and by characteristic rays up to O(10^4) processors. Initial verification of the parallel 3-D MOC kernel was performed using the Takeda 3-D transport benchmark problems. The eigenvalues computed by MPACT are within the statistical uncertainty of the benchmark reference and agree well with the averages of other participants. The MPACT k_eff differs from the benchmark results for rodded and un-rodded cases by 11 and -40 pcm, respectively. The calculations were performed for various numbers of processors and parallel decompositions up to 15625 processors, all producing the same result at convergence. The parallel efficiency of the worst case was 60%, while very good efficiency (>95%) was observed for cases using 500 processors. The overall run time was 231 seconds for the 500-processor case and 19 seconds for the 15625-processor case. Ongoing work is focused on developing theoretical performance models and the implementation of acceleration techniques to minimize the number of iterations to converge. (authors)
Mapping robust parallel multigrid algorithms to scalable memory architectures

Science.gov (United States)

Overman, Andrea; Vanrosendale, John

1993-01-01

The convergence rate of standard multigrid algorithms degenerates on problems with stretched grids or anisotropic operators. The usual cure for this is the use of line or plane relaxation. However, multigrid algorithms based on line and plane relaxation have limited and awkward parallelism and are quite difficult to map effectively to highly parallel architectures. Newer multigrid algorithms that overcome anisotropy through the use of multiple coarse grids rather than relaxation are better suited to massively parallel architectures because they require only simple point-relaxation smoothers. In this paper, we look at the parallel implementation of a V-cycle multiple semicoarsened grid (MSG) algorithm on distributed-memory architectures such as the Intel iPSC/860 and Paragon computers. The MSG algorithms provide two levels of parallelism: parallelism within the relaxation or interpolation on each grid and across the grids on each multigrid level. Both levels of parallelism must be exploited to map these algorithms effectively to parallel architectures. This paper describes a mapping of an MSG algorithm to distributed-memory architectures that demonstrates how both levels of parallelism can be exploited. The result is a robust and effective multigrid algorithm for distributed-memory machines.
A qualitative single case study of parallel processes

DEFF Research Database (Denmark)

Jacobsen, Claus Haugaard

2007-01-01

Parallel process in psychotherapy and supervision is a phenomenon manifest in relationships and interactions, that originates in one setting and is reflected in another. This article presents an explorative single case study of parallel processes based on qualitative analyses of two successive...... randomly chosen psychotherapy sessions with a schizophrenic patient and the supervision session given in between. The author's analysis is verified by an independent examiner's analysis. Parallel processes are identified and described. Reflections on the dynamics of parallel processes and supervisory...
Tampa Bay Extension Agents’ Views of Urban Extension: Philosophy and Program Strategies

Directory of Open Access Journals (Sweden)

Amy Harder

2017-06-01

The purpose of this article was to explore the concept of urban Extension as perceived by Extension agents within the Tampa Bay area, one of Florida’s fastest growing metropolitan areas. From a theoretical perspective, it is critical to understand Extension agents’ beliefs about urban Extension because behaviors are directly related to attitudes (Ajzen, 2012). In 2016, a qualitative investigation was undertaken to explore the perspectives of 23 agents working within the Tampa Bay area. Results showed the majority of agents believed that context and client needs are unique for urban Extension, and that to a lesser extent, unique agent expertise is required. Further, these beliefs impacted how agents reported their approach to programming, with an emphasis on providing convenience and seeking partnerships. Difficulties were identified related to identifying the role of Extension in a resource-rich environment of service providers, which contributed to the existence of a perceived disconnect between urban audiences and Extension. Opportunities exist for Extension leadership to provide strategic organizational support that will enhance agents’ abilities to succeed in the metropolitan environment.
Parallel computing and networking; Heiretsu keisanki to network

Energy Technology Data Exchange (ETDEWEB)

Asakawa, E; Tsuru, T [Japan National Oil Corp., Tokyo (Japan); Matsuoka, T [Japan Petroleum Exploration Co. Ltd., Tokyo (Japan)

1996-05-01

This paper describes trends in the parallel computers used in geophysical exploration. Parallel computers first came into use for geophysical exploration around 1993. At that time they were classified mainly as MIMD (multiple instruction stream, multiple data stream), SIMD (single instruction stream, multiple data stream) and the like. Parallel computers were publicized at the 1994 meeting of the Geophysical Exploration Society as a 'high precision imaging technology'. Concerning libraries for parallel computers, there was a shift to PVM (parallel virtual machine) in 1993 and to MPI (message passing interface) in 1995. In addition, a FORTRAN90 compiler was released with support for data parallel and vector computers. In 1993, the networks in use were Ethernet, FDDI, CDDI and HIPPI. In 1995, OC-3 products based on ATM began to spread. However, ATM remains an interoffice high speed network because ATM service has not yet spread to the public network. 1 ref.
Parallel PIC plasma simulation through particle decomposition techniques

International Nuclear Information System (INIS)

Briguglio, S.; Vlad, G.; Di Martino, B.; Naples, Univ. 'Federico II'

1998-02-01

Particle-in-cell (PIC) codes are among the major candidates to yield a satisfactory description of the detail of kinetic effects, such as the resonant wave-particle interaction, relevant in determining the transport mechanism in magnetically confined plasmas. A significant improvement of the simulation performance of such codes can be expected from parallelization, e.g., by distributing the particle population among several parallel processors. Parallelization of a hybrid magnetohydrodynamic-gyrokinetic code has been accomplished within the High Performance Fortran (HPF) framework, and tested on the IBM SP2 parallel system, using a 'particle decomposition' technique. The adopted technique requires a moderate effort in porting the code in parallel form and results in intrinsic load balancing and modest interprocessor communication. The performance tests obtained confirm the hypothesis of high effectiveness of the strategy, if targeted towards moderately parallel architectures. Optimal use of resources is also discussed with reference to a specific physics problem [it
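A particle decomposition of the kind described above can be sketched in a few lines. The Python below is a sequential emulation, not the HPF code from the report: the function names, the nearest-grid-point deposition, and the round-robin split of particles are assumptions made for illustration. Each emulated worker owns an equal share of the particles and a private copy of the whole grid, and the only communication is the final element-wise sum of the grids.

```python
def deposit(positions, num_cells, length):
    """Nearest-grid-point deposition of unit-weight particles on a periodic 1-D grid."""
    grid = [0.0] * num_cells
    dx = length / num_cells
    for x in positions:
        grid[int(x / dx) % num_cells] += 1.0
    return grid


def particle_decomposition_density(positions, num_cells, length, num_workers):
    """Emulate particle decomposition: split particles, deposit locally, then reduce."""
    # Round-robin split: every worker gets nearly the same number of
    # particles, which is where the intrinsic load balance comes from.
    chunks = [positions[w::num_workers] for w in range(num_workers)]
    # Each worker deposits onto its own private copy of the full grid.
    partial_grids = [deposit(chunk, num_cells, length) for chunk in chunks]
    # Reduction: the element-wise sum is the only inter-worker communication.
    return [sum(column) for column in zip(*partial_grids)]


# Six hypothetical particles on a domain of length 4 with 4 cells.
particles = [0.1, 0.6, 1.2, 1.9, 2.5, 3.7]
density = particle_decomposition_density(particles, num_cells=4, length=4.0, num_workers=3)
# density == [2.0, 2.0, 1.0, 1.0]
```

Because every worker holds the full grid, the memory and reduction cost grow with the number of workers, which is consistent with the abstract's remark that the strategy suits moderately parallel architectures.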
Analysis of parallel computing performance of the code MCNP

International Nuclear Information System (INIS)

Wang Lei; Wang Kan; Yu Ganglin

2006-01-01

Parallel computing can effectively reduce the running time of the code MCNP. With MPI message-passing software, MCNP5 can run in parallel on a PC cluster running the Windows operating system. The parallel computing performance of MCNP is influenced by factors such as the type, the complexity level and the parameter configuration of the computing problem. This paper analyzes the parallel computing performance of MCNP with regard to these factors and gives measures to improve it. (authors)
Parallel knock-out schemes in networks

NARCIS (Netherlands)

Broersma, H.J.; Fomin, F.V.; Woeginger, G.J.

2004-01-01

We consider parallel knock-out schemes, a procedure on graphs introduced by Lampert and Slater in 1997 in which each vertex eliminates exactly one of its neighbors in each round. We are considering cases in which after a finite number of rounds, where the minimum number is called the parallel

Parallel and vector implementation of APROS simulator code

International Nuclear Information System (INIS)

Niemi, J.; Tommiska, J.

1990-01-01

In this paper the vector and parallel processing implementation of a general purpose simulator code is discussed. In this code the utilization of vector processing is straightforward. In addition to the loop level parallel processing, the functional decomposition and the domain decomposition have been considered. Results presented for a PWR-plant simulation illustrate the potential speed-up factors of the alternatives. It turns out that the loop level parallelism and the domain decomposition are the most promising alternatives for employing parallel processing. (author)
SWAMP+: multiple subsequence alignment using associative massive parallelism

Energy Technology Data Exchange (ETDEWEB)

Steinfadt, Shannon Irene [Los Alamos National Laboratory; Baker, Johnnie W [KENT STATE UNIV.

2010-10-18

A new parallel algorithm SWAMP+ incorporates the Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that expands traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for the end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the algorithmic analysis becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.
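The O(m+n) figure quoted above reflects a general property of the Smith-Waterman recurrence: every cell on one anti-diagonal of the scoring matrix depends only on the two previous anti-diagonals, so all cells of an anti-diagonal can be computed at once. The Python sketch below walks the matrix in that wavefront order sequentially. It is not the ASC or SWAMP+ implementation; the function name, the linear gap penalty, and the scoring values are assumptions for illustration.

```python
def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of anti-diagonal wavefronts)."""
    m, n = len(a), len(b)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    wavefronts = 0
    # Cells with i + j == d form one anti-diagonal of the matrix.
    for d in range(2, m + n + 1):
        lo, hi = max(1, d - n), min(m, d - 1)
        if lo > hi:
            continue
        wavefronts += 1
        # The cells of this loop are mutually independent; a parallel
        # machine could assign one processor to each of them.
        for i in range(lo, hi + 1):
            j = d - i
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            best = max(best, score[i][j])
    return best, wavefronts


# Identical sequences of length 7: seven matches, 7 + 7 - 1 wavefronts.
best, steps = smith_waterman_wavefront("GATTACA", "GATTACA")
# best == 14, steps == 13
```

With at least min(m, n) processors each wavefront takes constant time, so the m+n-1 wavefronts give the linear step count; the sequential loop here still does O(mn) work in total.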
The impact of brand extension fit, extension strategy and product exposure on attitudinal responses to brand extensions

OpenAIRE

Farstad, Lena Kvelland; Jabran, Mohammed

2013-01-01

Brand extensions have for decades been one of the most used strategies for growth, but the sad reality is that 8 out of 10 extensions fail, making the likelihood of failure unattractively high. In addition, competition and pressure on margins increases as retailers’ power improves due to proliferation of private labels. As a result, managers are eager for new innovative strategies that can differentiate their extension and improve likelihood of success. The purpose of this paper is therefore ...
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

Science.gov (United States)

Sun, Xian-He

1997-01-01

Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2000, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is to 1) develop highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, and 3) incorporate newly developed algorithms into actual simulation packages. The work plan has been well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) adopting a mathematical geometry which has a better capacity to describe the fluid, and (2) using a compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm
A Green's function method for two-dimensional reactive solute transport in a parallel fracture-matrix system

Science.gov (United States)

Chen, Kewei; Zhan, Hongbin

2018-06-01

The reactive solute transport in a single fracture bounded by upper and lower matrices is a classical problem that captures the dominant factors affecting transport behavior beyond pore scale. A parallel fracture-matrix system which considers the interaction among multiple parallel fractures is an extension to a single fracture-matrix system. Existing analytical or semi-analytical solutions for solute transport in a parallel fracture-matrix system simplify the problem to various degrees, such as neglecting the transverse dispersion in the fracture and/or the longitudinal diffusion in the matrix. The difficulty of solving the full two-dimensional (2-D) problem lies in the calculation of the mass exchange between the fracture and matrix. In this study, we propose an innovative Green's function approach to address the 2-D reactive solute transport in a parallel fracture-matrix system. The flux at the interface is calculated numerically. It is found that the transverse dispersion in the fracture can be safely neglected due to the small scale of fracture aperture. However, neglecting the longitudinal matrix diffusion would overestimate the concentration profile near the solute entrance face and underestimate the concentration profile at the far side. The error caused by neglecting the longitudinal matrix diffusion decreases with increasing Peclet number. The longitudinal matrix diffusion does not have an obvious influence on the concentration profile in the long term. The developed model is applied to a dense non-aqueous-phase-liquid (DNAPL) contamination field case in the New Haven Arkose of Connecticut, USA, to estimate the Trichloroethylene (TCE) behavior over 40 years. The ratio of the TCE mass stored in the matrix to the injected TCE mass rises above 90% in less than 10 years.
Parallel-Processing Test Bed For Simulation Software

Science.gov (United States)

Blech, Richard; Cole, Gary; Townsend, Scott

1996-01-01

Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).
A parallel nearly implicit time-stepping scheme

OpenAIRE

Botchev, Mike A.; van der Vorst, Henk A.

2001-01-01

Across-the-space parallelism still remains the most mature, convenient and natural way to parallelize large scale problems. One of the major problems here is that implicit time stepping is often difficult to parallelize due to the structure of the system. Approximate implicit schemes have been suggested to circumvent the problem. These schemes have attractive stability properties and they are also very well parallelizable. The purpose of this article is to give an overall assessment of the pa...
[Three-dimensional parallel collagen scaffold promotes tendon extracellular matrix formation].

Science.gov (United States)

Zheng, Zefeng; Shen, Weiliang; Le, Huihui; Dai, Xuesong; Ouyang, Hongwei; Chen, Weishan

2016-03-01

To investigate the effects of three-dimensional parallel collagen scaffold on the cell shape, arrangement and extracellular matrix formation of tendon stem cells. Parallel collagen scaffold was fabricated by unidirectional freezing technique, while random collagen scaffold was fabricated by freeze-drying technique. The effects of the two scaffolds on cell shape and extracellular matrix formation were investigated in vitro by seeding tendon stem/progenitor cells and in vivo by ectopic implantation. Parallel and random collagen scaffolds were produced successfully. Parallel collagen scaffold was more akin to tendon than random collagen scaffold. Tendon stem/progenitor cells were spindle-shaped and uniformly oriented in parallel collagen scaffold, while cells on random collagen scaffold were oriented in a disordered manner. Two weeks after ectopic implantation, cells had nearly the same orientation as the collagen substance. In parallel collagen scaffold, cells had a parallel arrangement, and more spindle-shaped cells were observed. By contrast, cells in random collagen scaffold were disordered. Parallel collagen scaffold can induce cells to adopt a spindle-shaped, parallel arrangement and promote parallel extracellular matrix formation, while random collagen scaffold induces a random cell arrangement. The results indicate that parallel collagen scaffold is an ideal structure for promoting tendon repair.
Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

Science.gov (United States)

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2016-03-15

Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
Improvement of Parallel Algorithm for MATRA Code

International Nuclear Information System (INIS)

Kim, Seong-Jin; Seo, Kyong-Won; Kwon, Hyouk; Hwang, Dae-Hyun

2014-01-01

The feasibility study to parallelize the MATRA code was conducted in KAERI early this year. As a result, a parallel algorithm for the MATRA code has been developed to considerably decrease the computing time required to solve a large problem, such as a whole-core pin-by-pin problem of a general PWR reactor, and to improve the overall performance of multi-physics coupling calculations. It was shown that the performance of the MATRA code was greatly improved by implementing the parallel algorithm using MPI communication. For problems of a 1/8 core and whole core for the SMART reactor, a speedup of about 10 was obtained when 25 processors were used. However, it was also shown that the performance deteriorated as the axial node number increased. In this paper, the procedure of communication between processors is optimized to improve the previous parallel algorithm. To remedy the performance deterioration of the parallelized MATRA code, a new communication algorithm between processors is presented. It was shown that the speedup was improved and stable regardless of the axial node number.
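To put the quoted figures in context (a speedup of about 10 on 25 processors), the short Python sketch below computes the parallel efficiency and the Karp-Flatt experimentally determined serial fraction. The function names are illustrative. The Karp-Flatt metric lumps communication overhead together with truly serial work, so the result is only an indicator and not a measurement taken from MATRA itself.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def karp_flatt_serial_fraction(speedup, processors):
    """Experimentally determined serial fraction (Karp-Flatt metric)."""
    if processors < 2:
        raise ValueError("the metric needs at least two processors")
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)


# Figures quoted in the abstract: speedup of about 10 on 25 processors.
efficiency = parallel_efficiency(10.0, 25)
serial_fraction = karp_flatt_serial_fraction(10.0, 25)
print(f"efficiency = {efficiency:.2f}, serial fraction = {serial_fraction:.4f}")
```

An efficiency of 0.4 suggests that a sizeable share of the run time is not scaling, which fits the abstract's focus on optimizing inter-processor communication.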
Iteration schemes for parallelizing models of superconductivity

Energy Technology Data Exchange (ETDEWEB)

Gray, P.A. [Michigan State Univ., East Lansing, MI (United States)

1996-12-31

The time dependent Lawrence-Doniach model, valid for high fields and high values of the Ginzburg-Landau parameter, is often used for studying vortex dynamics in layered high-T{sub c} superconductors. When solving these equations numerically, the added degrees of complexity due to the coupling and nonlinearity of the model often warrant the use of high-performance computers for their solution. However, the interdependence between the layers can be manipulated so as to allow parallelization of the computations at an individual layer level. The reduced parallel tasks may then be solved independently using a heterogeneous cluster of networked workstations connected together with Parallel Virtual Machine (PVM) software. Here, this parallelization of the model is discussed and several computational implementations of varying degrees of parallelism are presented. Computational results are also given which contrast properties of convergence speed, stability, and consistency of these implementations. Included in these results are models involving the motion of vortices due to an applied current and pinning effects due to various material properties.
A Parallel Priority Queue with Constant Time Operations

DEFF Research Database (Denmark)

Brodal, Gerth Stølting; Träff, Jesper Larsson; Zaroliagis, Christos D.

1998-01-01

We present a parallel priority queue that supports the following operations in constant time: parallel insertion of a sequence of elements ordered according to key, parallel decrease key for a sequence of elements ordered according to key, deletion of the minimum key element, and deletion of an arbitrary...... application is a parallel implementation of Dijkstra's algorithm for the single-source shortest path problem, which runs in O(n) time and O(m log n) work on a CREW PRAM on graphs with n vertices and m edges. This is a logarithmic factor improvement in the running time compared with previous approaches....
Symmetric and asymmetric capillary bridges between a rough surface and a parallel surface.

Science.gov (United States)

Wang, Yongxin; Michielsen, Stephen; Lee, Hoon Joo

2013-09-03

Although the formation of a capillary bridge between two parallel surfaces has been extensively studied, the majority of research has described only symmetric capillary bridges between two smooth surfaces. In this work, an instrument was built to form a capillary bridge by squeezing a liquid drop on one surface with another surface. An analytical solution that describes the shape of symmetric capillary bridges joining two smooth surfaces has been extended to bridges that are asymmetric about the midplane and to rough surfaces. The solution, given by elliptical integrals of the first and second kind, is consistent with a constant Laplace pressure over the entire surface and has been verified for water, Kaydol, and dodecane drops forming symmetric and asymmetric bridges between parallel smooth surfaces. This solution has been applied to asymmetric capillary bridges between a smooth surface and a rough fabric surface as well as symmetric bridges between two rough surfaces. These solutions have been experimentally verified, and good agreement has been found between predicted and experimental profiles for small drops where the effect of gravity is negligible. Finally, a protocol for determining the profile from the volume and height of the capillary bridge has been developed and experimentally verified.
3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite

Directory of Open Access Journals (Sweden)

Oleksiy Kononenko

2017-10-01

Radiofrequency cavities based on superconducting technology are widely used in particle accelerators for various applications. The cavities usually have high quality factors and hence narrow bandwidths, so the field stability is sensitive to detuning from the Lorentz force and external loads, including vibrations and helium pressure variations. If not properly controlled, the detuning can result in a serious performance degradation of a superconducting accelerator, so an understanding of the underlying detuning mechanisms can be very helpful. Recent advances in the simulation suite ACE3P have enabled realistic multiphysics characterization of such complex accelerator systems on supercomputers. In this paper, we present the new capabilities in ACE3P for large-scale 3D multiphysics modeling of superconducting cavities, in particular, a parallel eigensolver for determining mechanical resonances, a parallel harmonic response solver to calculate the response of a cavity to external vibrations, and a numerical procedure to decompose mechanical loads, such as from the Lorentz force or piezoactuators, into the corresponding mechanical modes. These capabilities have been used to do an extensive rf-mechanical analysis of dressed TESLA-type superconducting cavities. The simulation results and their implications for the operational stability of the Linac Coherent Light Source-II are discussed.
An Automatic Instruction-Level Parallelization of Machine Code

Directory of Open Access Journals (Sweden)

MARINKOVIC, V.

2018-02-01

Prevailing multicores and novel manycores have created a great modern-day challenge: parallelization of embedded software that is still written as sequential code. In this paper, automatic code parallelization is considered, focusing on developing a parallelization tool at the binary level as well as on the validation of this approach. A novel instruction-level parallelization algorithm for assembly code is developed; it uses the register names after SSA to find independent blocks of code and then schedules those blocks using METIS to achieve good load balance. The sequential consistency is verified and the validation is done by measuring the program execution time on the target architecture. Great speedup, taken as the performance measure in the validation process, and optimal load balancing are achieved for multicore RISC processors with 2 to 16 cores (e.g. MIPS, MicroBlaze, etc.). In particular, for 16 cores, the average speedup is 7.92x, while in some cases it reaches 14x. The approach to automatic parallelization provided by this paper is useful to researchers and developers in the area of parallelization as the basis for further optimizations, as the back-end of a compiler, or as the code parallelization tool for an embedded system.
Step by step parallel programming method for molecular dynamics code

International Nuclear Information System (INIS)

Orii, Shigeo; Ohta, Toshio

1996-07-01

Parallel programming for a numerical simulation program of molecular dynamics is carried out with a step-by-step programming technique using the two phase method. As a result, within a certain range of computing parameters, parallel performance is obtained with do-loop-level parallel programming, which distributes the calculation among processors according to do-loop indices, on the vector parallel computer VPP500 and the scalar parallel computer Paragon. It is also found that the VPP500 shows parallel performance over a wider range of computing parameters. The reason is that the time cost of the program parts that cannot be reduced by do-loop-level parallel programming can be reduced to a negligible level by vectorization. After that, the time-consuming parts of the program are concentrated in fewer parts that can be accelerated by do-loop-level parallel programming. This report shows the step-by-step parallel programming method and the parallel performance of the molecular dynamics code on VPP500 and Paragon. (author)
A possibility of parallel and anti-parallel diffraction measurements on neutron diffractometer employing bent perfect crystal monochromator at the monochromatic focusing condition

Science.gov (United States)

Choi, Yong Nam; Kim, Shin Ae; Kim, Sung Kyu; Kim, Sung Baek; Lee, Chang-Hee; Mikula, Pavel

2004-07-01

In a conventional diffractometer having single monochromator, only one position, parallel position, is used for the diffraction experiment (i.e. detection) because the resolution property of the other one, anti-parallel position, is very poor. However, a bent perfect crystal (BPC) monochromator at monochromatic focusing condition can provide a quite flat and equal resolution property at both parallel and anti-parallel positions and thus one can have a chance to use both sides for the diffraction experiment. From the data of the FWHM and the Delta d/d measured on three diffraction geometries (symmetric, asymmetric compression and asymmetric expansion), we can conclude that the simultaneous diffraction measurement in both parallel and anti-parallel positions can be achieved.
Broadcasting collective operation contributions throughout a parallel computer

Science.gov (United States)

Faraj, Ahmad [Rochester, MN

2012-02-21

Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
[Falsified medicines in parallel trade].

Science.gov (United States)

Muckenfuß, Heide

2017-11-01

The number of falsified medicines on the German market has distinctly increased over the past few years. In particular, stolen pharmaceutical products, a form of falsified medicines, have increasingly been introduced into the legal supply chain via parallel trading. The reasons why parallel trading serves as a gateway for falsified medicines are most likely the complex supply chains and routes of transport. It is hardly possible for national authorities to trace the history of a medicinal product that was bought and sold by several intermediaries in different EU member states. In addition, the heterogeneous outward appearance of imported and relabelled pharmaceutical products facilitates the introduction of illegal products onto the market. Official batch release at the Paul-Ehrlich-Institut offers the possibility of checking some aspects that might provide an indication of a falsified medicine. In some circumstances, this may allow the identification of falsified medicines before they come onto the German market. However, this control is only possible for biomedicinal products that have not received a waiver regarding official batch release. For improved control of parallel trade, better networking among the EU member states would be beneficial. European-wide regulations, e. g., for disclosure of the complete supply chain, would help to minimise the risks of parallel trading and hinder the marketing of falsified medicines.
Applied Parallel Computing Industrial Computation and Optimization

DEFF Research Database (Denmark)

Madsen, Kaj; Olesen, Dorte

Proceedings and the Third International Workshop on Applied Parallel Computing in Industrial Problems and Optimization (PARA96)...

Algorithms for computational fluid dynamics on parallel processors

International Nuclear Information System (INIS)

Van de Velde, E.F.

1986-01-01

A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries
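The abstract names the Lax-Wendroff scheme without giving equations. As a hedged illustration of why such explicit finite difference methods parallelize well, the Python sketch below applies one Lax-Wendroff step to the linear advection equation u_t + a u_x = 0 on a periodic grid. The model problem, the function name, and the periodic boundary are assumptions made for this example and are not taken from the report.

```python
def lax_wendroff_step(u, c):
    """Advance u by one time step for u_t + a*u_x = 0 on a periodic grid.

    c = a*dt/dx is the Courant number; the scheme is stable for |c| <= 1.
    Each new value depends only on the old values at j-1, j and j+1, so
    the loop iterations are independent and could be split among
    processors, with only boundary values exchanged between neighbours.
    """
    n = len(u)
    new = [0.0] * n
    for j in range(n):
        left, mid, right = u[j - 1], u[j], u[(j + 1) % n]  # periodic wrap
        new[j] = (mid
                  - 0.5 * c * (right - left)
                  + 0.5 * c * c * (right - 2.0 * mid + left))
    return new
```

With c = 1 the update reduces to an exact shift by one cell, which gives a convenient sanity check; for other Courant numbers the periodic scheme still conserves the sum of u.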
Parallel visualization on leadership computing resources

Energy Technology Data Exchange (ETDEWEB)

Peterka, T; Ross, R B [Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439 (United States); Shen, H-W [Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 (United States); Ma, K-L [Department of Computer Science, University of California at Davis, Davis, CA 95616 (United States); Kendall, W [Department of Electrical Engineering and Computer Science, University of Tennessee at Knoxville, Knoxville, TN 37996 (United States); Yu, H, E-mail: tpeterka@mcs.anl.go [Sandia National Laboratories, California, Livermore, CA 94551 (United States)

2009-07-01

Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.
Parallel optoelectronic trinary signed-digit division

Science.gov (United States)

Alam, Mohammad S.

1999-03-01

The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of any arbitrary length operands in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator based architecture is suggested for implementation of the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of space bandwidth product of the spatial light modulators used in the optoelectronic implementation.
Parallel visualization on leadership computing resources

International Nuclear Information System (INIS)

Peterka, T; Ross, R B; Shen, H-W; Ma, K-L; Kendall, W; Yu, H

2009-01-01

Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.
Language constructs for modular parallel programs

Energy Technology Data Exchange (ETDEWEB)

Foster, I.

1996-03-01

We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrency, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.
Parallel debt in the Serbian finance law

Directory of Open Access Journals (Sweden)

Kuzman Miloš

2014-01-01

The purpose of this paper is to present the mechanism of parallel debt in the Serbian financial law. While considering whether the mechanism of parallel debt exists under the Serbian law, the Anglo-Saxon mechanism of trust is presented. Hence it is explained why the mechanism of trust is not allowed under the Serbian law. Further on, the mechanism of parallel debt is introduced as well as a debate on permissibility of its cause in the Serbian law. Comparative legal arguments about this issue are also presented in this paper. In conclusion, the author suggests that on the basis of the conclusions drawn in this paper, the parallel debt mechanism is to be declared admissible if it is ever taken into consideration by the Serbian courts.
Parallel grid population

Science.gov (United States)

Wald, Ingo; Ize, Santiago

2015-07-28

Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
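The two acts described above (each processor first classifies its own set of objects by grid portion, then each processor populates only its own grid portion) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the patented method: the grid is one-dimensional, objects are closed intervals, the function and variable names are hypothetical, and Python threads are used only to show the structure (CPython threads will not give a real speedup for this kind of work).

```python
from concurrent.futures import ThreadPoolExecutor


def populate_grid(objects, num_cells, cell_size, n):
    """Populate a 1-D grid of `num_cells` cells using `n` workers.

    objects: list of (lo, hi) intervals with lo <= hi. Returns a list in
    which entry c holds the indices of the objects overlapping cell c.
    """
    # Divide the grid into n contiguous portions and the objects into n sets.
    bounds = [(p * num_cells // n, (p + 1) * num_cells // n) for p in range(n)]
    object_sets = [range(p, len(objects), n) for p in range(n)]

    def cells_of(idx):
        lo, hi = objects[idx]
        first = max(0, int(lo // cell_size))
        last = min(num_cells - 1, int(hi // cell_size))
        return first, last

    def classify(p):
        # Phase 1: worker p finds, for each object in its own set, the
        # grid portion(s) that the object overlaps.
        out = [[] for _ in range(n)]
        for idx in object_sets[p]:
            first, last = cells_of(idx)
            for q, (start, stop) in enumerate(bounds):
                if first < stop and last >= start:
                    out[q].append(idx)
        return out

    def fill(q, candidates):
        # Phase 2: worker q writes only into the cells of its own portion,
        # so no two workers ever touch the same cell.
        start, stop = bounds[q]
        cells = [[] for _ in range(stop - start)]
        for idx in sorted(candidates):
            first, last = cells_of(idx)
            for c in range(max(first, start), min(last, stop - 1) + 1):
                cells[c - start].append(idx)
        return cells

    with ThreadPoolExecutor(max_workers=n) as pool:
        classified = list(pool.map(classify, range(n)))
        per_portion = [[i for out in classified for i in out[q]]
                       for q in range(n)]
        portions = list(pool.map(fill, range(n), per_portion))
    return [cell for portion in portions for cell in portion]
```

Because each worker owns a distinct grid portion in the second phase, the sketch needs no locking on the cells, which appears to be the main point of splitting the work into two acts.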
A high-speed linear algebra library with automatic parallelism

Science.gov (United States)

Boucher, Michael L.

1994-01-01

Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
Fringe Capacitance of a Parallel-Plate Capacitor.

Science.gov (United States)

Hale, D. P.

1978-01-01

Describes an experiment designed to measure the forces between charged parallel plates, and determines the relationship among the effective electrode area, the measured capacitance values, and the electrode spacing of a parallel plate capacitor. (GA)
Parallel processing of Monte Carlo code MCNP for particle transport problem

Energy Technology Data Exchange (ETDEWEB)

Higuchi, Kenji; Kawasaki, Takuji

1996-06-01

It is possible to vectorize or parallelize Monte Carlo codes (MC codes) for photon and neutron transport problems, making use of the independence of the calculation for each particle. The applicability of existing MC codes to parallel processing is discussed. For performance evaluation, we have used both a vector-parallel processor and a scalar-parallel processor. We have carried out (i) vector-parallel processing of the MCNP code on the Monte Carlo machine Monte-4 with four vector processors, and (ii) parallel processing on the Paragon XP/S with 256 processors. In this report we describe the methodology and results for parallel processing on two types of parallel or distributed memory computers. In addition, we mention the evaluation of parallel programming environments for the parallel computers used in the present work as a part of the work developing STA (Seamless Thinking Aid) Basic Software. (author)
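The independence of particle histories that the abstract relies on can be illustrated with a toy problem. The Python sketch below estimates the probability that a particle crosses a purely absorbing slab without colliding, splitting the histories into batches that each use a private random number generator. Everything here (the slab model, function names, parameter values, and the use of threads) is an assumption for illustration; it is not MCNP and does not reproduce the Monte-4 or Paragon implementations.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def run_batch(seed, histories, sigma_t, thickness):
    """Count particles that cross a purely absorbing slab uncollided."""
    rng = random.Random(seed)  # private generator: batches share no state
    transmitted = 0
    for _ in range(histories):
        # Sample the distance to the first collision (exponential law).
        path = -math.log(1.0 - rng.random()) / sigma_t
        if path > thickness:
            transmitted += 1
    return transmitted


def transmission(batches, histories_per_batch,
                 sigma_t=1.0, thickness=2.0, workers=4):
    """Estimate the transmission probability from independent batches."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(
            lambda seed: run_batch(seed, histories_per_batch,
                                   sigma_t, thickness),
            range(batches)))
    # Tallies from independent batches are simply summed.
    return sum(counts) / (batches * histories_per_batch)
```

Since each batch is seeded separately and only the final tallies are combined, the estimate does not depend on how many workers run the batches. For this toy model the exact answer is exp(-sigma_t * thickness), which makes the estimate easy to check.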
Parallelization of TMVA Machine Learning Algorithms

CERN Document Server

Hajili, Mammad

2017-01-01

This report reflects my work on the parallelization of TMVA machine learning algorithms integrated into the ROOT data analysis framework during a summer internship at CERN. The report consists of four important parts: the data set used in training and validation, the algorithms to which multiprocessing was applied, the parallelization techniques, and the changes in execution time as a function of the number of workers.
A parallel adaptive finite difference algorithm for petroleum reservoir simulation

Energy Technology Data Exchange (ETDEWEB)

Hoang, Hai Minh

2005-07-01

Adaptive finite difference methods for problems arising in the simulation of flow in porous media are considered. Such methods have proven useful for overcoming limitations of computational resources and improving the resolution of the numerical solutions to a wide range of problems. Local refinement of the computational mesh where it is needed to improve the accuracy of solutions yields better solution resolution and represents a more efficient use of computational resources than is possible with traditional fixed-grid approaches. In this thesis, we propose a parallel adaptive cell-centered finite difference (PAFD) method for black-oil reservoir simulation models. This is an extension of the adaptive mesh refinement (AMR) methodology first developed by Berger and Oliger (1984) for the hyperbolic problem. Our algorithm is fully adaptive in time and space through the use of subcycling, in which finer grids are advanced at smaller time steps than the coarser ones. When coarse and fine grids reach the same advanced time level, they are synchronized to ensure that the global solution is conservative and satisfies the divergence constraint across all levels of refinement. The material in this thesis is subdivided into three overall parts. First we explain the methodology and intricacies of the AFD scheme. Then we extend a cell-centered finite difference discretization to a multilevel hierarchy of refined grids, and finally we employ the algorithm on a parallel computer. The results in this work show that the approach presented is robust and stable, demonstrating the increased solution accuracy due to local refinement and reduced computing resource consumption. (Author)
Stray current interaction in the system of two extensive underground conductors

International Nuclear Information System (INIS)

Machczynski, W.

1993-01-01

The important problem, technically, is to evaluate the harmful effects that an electrified railway has on nearby earth-return circuits (cables, pipelines). This paper considers the effects of a DC-electrified railway system on two extensive metal conductors buried in parallel in the vicinity of the tracks. The interaction between currents flowing in both underground conductors is taken into account, whereas the reaction of the conductors' currents on the track current is disregarded. The analysis given is applicable to any DC railway system in which tracks may be represented by a single earth return circuit with current energization. It is assumed in the paper that the system considered is linear and that the earth is homogeneous. The technical application of the method is illustrated by an example of computer simulation
Parallel community climate model: Description and user's guide

Energy Technology Data Exchange (ETDEWEB)

Drake, J.B.; Flanery, R.E.; Semeraro, B.D.; Worley, P.H. [and others]

1996-07-15

This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.
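The patch decomposition described in this abstract can be pictured with a small index calculation. The Python sketch below is an illustration only and is not taken from PCCM2: the grid size, the processor layout, and the function names `block_ranges` and `decompose` are all assumptions, and it simplifies to one rectangular patch per processor, whereas the report assigns each processor a subset of patches.

```python
def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def decompose(nlat, nlon, plat, plon):
    """Map each processor rank to the (latitude, longitude) index ranges of its patch."""
    lat = block_ranges(nlat, plat)
    lon = block_ranges(nlon, plon)
    return {i * plon + j: (lat[i], lon[j])
            for i in range(plat) for j in range(plon)}

# Hypothetical 64 x 128 latitude-longitude grid on a 4 x 8 processor layout.
patches = decompose(nlat=64, nlon=128, plat=4, plon=8)
for rank in (0, 31):
    print(rank, patches[rank])
```

With such a mapping, column physics for the grid points inside a patch needs no data from other processors, which matches the locality property the abstract describes.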
Distance-two interpolation for parallel algebraic multigrid

International Nuclear Information System (INIS)

Sterck, H de; Falgout, R D; Nolting, J W; Yang, U M

2007-01-01

In this paper we study the use of long distance interpolation methods with the low complexity coarsening algorithm PMIS. AMG performance and scalability is compared for classical as well as long distance interpolation methods on parallel computers. It is shown that the increased interpolation accuracy largely restores the scalability of AMG convergence factors for PMIS-coarsened grids, and in combination with complexity reducing methods, such as interpolation truncation, one obtains a class of parallel AMG methods that enjoy excellent scalability properties on large parallel computers
Parallel processing of genomics data

Science.gov (United States)

Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

2016-10-01

The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per experiment, so the analysis of this flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data that can handle high-dimensional data with good response time. The proposed system is able to find statistically significant biological markers that discriminate classes of patients who respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
Parallel science and engineering applications the Charm++ approach

CERN Document Server

Kale, Laxmikant V

2016-01-01

Developed in the context of science and engineering applications, with each abstraction motivated by and further honed by specific application needs, Charm++ is a production-quality system that runs on almost all parallel computers available. Parallel Science and Engineering Applications: The Charm++ Approach surveys a diverse and scalable collection of science and engineering applications, most of which are used regularly on supercomputers by scientists to further their research. After a brief introduction to Charm++, the book presents several parallel CSE codes written in the Charm++ model, along with their underlying scientific and numerical formulations, explaining their parallelization strategies and parallel performance. These chapters demonstrate the versatility of Charm++ and its utility for a wide variety of applications, including molecular dynamics, cosmology, quantum chemistry, fracture simulations, agent-based simulations, and weather modeling. The book is intended for a wide audience of people i...
Parallel processing of two-dimensional Sn transport calculations

International Nuclear Information System (INIS)

Uematsu, M.

1997-01-01

A parallel processing method for the two-dimensional Sn transport code DOT3.5 has been developed to achieve a drastic reduction in computation time. In the proposed method, parallelization is achieved with angular domain decomposition and/or space domain decomposition. The calculational speed of parallel processing by angular domain decomposition is largely influenced by frequent communications between processing elements. To assess parallelization efficiency, sample problems with up to 32 x 32 spatial meshes were solved with a Sun workstation using the PVM message-passing library. As a result, parallel calculation using 16 processing elements, for example, was found to be nine times as fast as that with one processing element. As for parallel processing by geometry segmentation, the influence of processing element communications on computation time is small; however, discontinuity at the segment boundary degrades convergence speed. To accelerate the convergence, an alternate sweep of angular flux in conjunction with space domain decomposition and a two-step rescaling method consisting of segmentwise rescaling and ordinary pointwise rescaling have been developed. By applying the developed method, the number of iterations needed to obtain a converged flux solution was reduced by a factor of 2. As a result, parallel calculation using 16 processing elements was found to be 5.98 times as fast as the original DOT3.5 calculation.
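To put the two speedups quoted above in perspective, parallel efficiency is conventionally defined as the speedup divided by the number of processing elements. The short Python sketch below applies that standard definition to the reported figures; the function name is illustrative, and the abstract itself does not state efficiency values.

```python
def parallel_efficiency(speedup, n_elements):
    """Parallel efficiency = speedup / number of processing elements."""
    if n_elements <= 0:
        raise ValueError("n_elements must be positive")
    return speedup / n_elements

# Figures quoted in the abstract, both for 16 processing elements.
first_case = parallel_efficiency(9.0, 16)    # "nine times as fast"
second_case = parallel_efficiency(5.98, 16)  # "5.98 times as fast"
print(f"{first_case:.3f} {second_case:.3f}")
```

Both values are well below 1, which is consistent with the communication and convergence overheads the abstract describes.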
Parallel Algorithms for Switching Edges in Heterogeneous Graphs.

Science.gov (United States)

Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav

2017-06-01

An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.
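The edge switch operation defined at the start of this abstract can be sketched sequentially in a few lines. The Python code below is a minimal single-process illustration under stated assumptions, not the distributed-memory algorithm of the paper: the helper names (`norm`, `try_edge_switch`, `degrees`) are invented here, and the simple-graph check rebuilds a set on every attempt for clarity rather than speed.

```python
import random

def norm(u, v):
    """Store an undirected edge as an ordered tuple so (u, v) and (v, u) coincide."""
    return (u, v) if u < v else (v, u)

def try_edge_switch(edges, rng):
    """Attempt one edge switch in place; return True if the graph changed.

    `edges` is a list of normalised (u, v) tuples describing a simple graph.
    """
    i, j = rng.sample(range(len(edges)), 2)
    (a, b), (c, d) = edges[i], edges[j]
    if rng.random() < 0.5:       # choose which end vertices get swapped
        c, d = d, c
    if a == d or c == b:         # would create a self-loop
        return False
    new1, new2 = norm(a, d), norm(c, b)
    present = set(edges)         # O(m) per attempt; fine for a sketch
    if new1 in present or new2 in present or new1 == new2:
        return False             # would create a parallel edge
    edges[i], edges[j] = new1, new2
    return True

def degrees(edges):
    """Return a vertex -> degree mapping."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

rng = random.Random(0)
ring = [norm(k, (k + 1) % 8) for k in range(8)]  # 8-vertex cycle
before = degrees(ring)
for _ in range(100):
    try_edge_switch(ring, rng)
after = degrees(ring)
print(before == after)
```

Each accepted switch replaces edges (a, b) and (c, d) with (a, d) and (c, b), so every vertex keeps its degree. The rejection tests are the dependencies that make a parallel version hard: whether a switch is legal depends on edges that other processors may be changing at the same time.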
Parallel processing of structural integrity analysis codes

International Nuclear Information System (INIS)

Swami Prasad, P.; Dutta, B.K.; Kushwaha, H.S.

1996-01-01

Structural integrity analysis plays an important role in assessing and demonstrating the safety of nuclear reactor components. This analysis is performed using analytical tools such as the Finite Element Method (FEM) with the help of digital computers. The complexity of the problems involved in nuclear engineering demands high speed computation facilities to obtain solutions in a reasonable amount of time. Parallel processing systems such as ANUPAM provide an efficient platform for realising the high speed computation. The development and implementation of software on parallel processing systems is an interesting and challenging task. The data and algorithm structure of the codes plays an important role in exploiting the parallel processing system capabilities. Structural analysis codes based on FEM can be divided into two categories with respect to their implementation on parallel processing systems. Codes in the first category, such as those used for harmonic analysis and mechanistic fuel performance codes, do not require the parallelisation of individual modules. Codes in the second category, such as conventional FEM codes, require parallelisation of individual modules. In this category, parallelisation of the equation solution module poses major difficulties. Different solution schemes such as the domain decomposition method (DDM), the parallel active column solver and the substructuring method are currently used on parallel processing systems. Two codes, FAIR and TABS, one belonging to each of these categories, have been implemented on ANUPAM. The implementation details of these codes and the performance of different equation solvers are highlighted. (author). 5 refs., 12 figs., 1 tab

The series-parallel circuit in the treatment of fulminant hepatitis.

Science.gov (United States)

Nakae, Hajime; Yonekawa, Chikara; Moon, Sunkwi; Tajimi, Kimitaka

2004-04-01

We developed a series-parallel treatment method for combined plasma exchange (PE) and continuous hemodiafiltration (CHDF) therapy in fulminant hepatitis. We then compared total serum bilirubin, citrate, and cytokine levels obtained by the new methods to those obtained with treatment by the single and reverse-parallel PE methods. Ten adult patients with fulminant hepatitis consented to participate. Plasma exchange was conducted 25 times by the single method (PE only), 16 times by the reverse-parallel method, and 37 times by the series-parallel method. The percentage of total bilirubin removed was highest with the single method followed in order by that with the series-parallel and reverse-parallel methods; the differences were significant. The percentage increase in citrate level was highest with the single method, followed in order by that with the series-parallel and the reverse-parallel methods; these differences were also significant. There was no significant difference in serum interleukin (IL)-6 levels after PE, by the single or the reverse-parallel methods. However, the IL-6 level decreased significantly following PE by the series-parallel method. The serum IL-18 level decreased significantly following PE by each of the three methods. Thus, removal of excess bilirubin, citrate, and cytokines by the series-parallel method, a simple maneuver with excellent removal rates, was considered effective.
Cellular automata a parallel model

CERN Document Server

Mazoyer, J

1999-01-01

Cellular automata can be viewed both as computational models and modelling systems of real processes. This volume emphasises the first aspect. In articles written by leading researchers, sophisticated massive parallel algorithms (firing squad, life, Fischer's primes recognition) are treated. Their computational power and the specific complexity classes they determine are surveyed, while some recent results in relation to chaos from a new dynamic systems point of view are also presented. Audience: This book will be of interest to specialists of theoretical computer science and the parallelism challenge.
Cooperative Extension as a Framework for Health Extension: The Michigan State University Model.

Science.gov (United States)

Dwyer, Jeffrey W; Contreras, Dawn; Eschbach, Cheryl L; Tiret, Holly; Newkirk, Cathy; Carter, Erin; Cronk, Linda

2017-10-01

The Affordable Care Act charged the Agency for Healthcare Research and Quality to create the Primary Care Extension Program, but did not fund this effort. The idea to work through health extension agents to support health care delivery systems was based on the nationally known Cooperative Extension System (CES). Instead of creating new infrastructure in health care, the CES is an ideal vehicle for increasing health-related research and primary care delivery. The CES, a long-standing component of the land-grant university system, features a sustained infrastructure for providing education to communities. The Michigan State University (MSU) Model of Health Extension offers another means of developing a National Primary Care Extension Program that is replicable in part because of the presence of the CES throughout the United States. A partnership between the MSU College of Human Medicine and MSU Extension formed in 2014, emphasizing the promotion and support of human health research. The MSU Model of Health Extension includes the following strategies: building partnerships, preparing MSU Extension educators for participation in research, increasing primary care patient referrals and enrollment in health programs, and exploring innovative funding. Since the formation of the MSU Model of Health Extension, researchers and extension professionals have made 200+ connections, and grants have afforded savings in salary costs. The MSU College of Human Medicine and MSU Extension partnership can serve as a model to promote health partnerships nationwide between CES services within land-grant universities and academic health centers or community-based medical schools.
The BLAZE language - A parallel language for scientific programming

Science.gov (United States)

Mehrotra, Piyush; Van Rosendale, John

1987-01-01

A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.
The BLAZE language: A parallel language for scientific programming

Science.gov (United States)

Mehrotra, P.; Vanrosendale, J.

1985-01-01

A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described and it is shown how this language would be used in typical scientific programming.
More parallel please

DEFF Research Database (Denmark)

Gregersen, Frans; Josephson, Olle; Kristoffersen, Gjert

More parallel, please is the result of the work of an Inter-Nordic group of experts on language policy financed by the Nordic Council of Ministers 2014-17. The book presents all that is needed to plan, practice and revise a university language policy which takes as its point of departure that English may be used in parallel with the various local, in this case Nordic, languages. As such, the book integrates the challenge of internationalization faced by any university with the wish to improve quality in research, education and administration based on the local language(s). There are three layers in the text: First, you may read the extremely brief version of the in total 11 recommendations for best practice. Second, you may acquaint yourself with the extended version of the recommendations and finally, you may study the reasoning behind each of them. At the end of the text, we give...
An Algorithm for Parallel Sn Sweeps on Unstructured Meshes

International Nuclear Information System (INIS)

Pautz, Shawn D.

2002-01-01

A new algorithm for performing parallel Sn sweeps on unstructured meshes is developed. The algorithm uses a low-complexity list ordering heuristic to determine a sweep ordering on any partitioned mesh. For typical problems and with 'normal' mesh partitionings, nearly linear speedups on up to 126 processors are observed. This is an important and desirable result, since although analyses of structured meshes indicate that parallel sweeps will not scale with normal partitioning approaches, no severe asymptotic degradation in the parallel efficiency is observed with modest (≤100) levels of parallelism. This result is a fundamental step in the development of efficient parallel Sn methods.
3D, parallel fluid-structure interaction code

CSIR Research Space (South Africa)

Oxtoby, Oliver F

2011-01-01

The authors describe the development of a 3D parallel Fluid–Structure–Interaction (FSI) solver and its application to benchmark problems. Fluid and solid domains are discretised using an edge-based finite-volume scheme for efficient parallel...
Utilization pattern of extension tools and methods by Agricultural Extension Agents

Directory of Open Access Journals (Sweden)

M Surudhi

2018-05-01

A study was conducted in Krishnagiri district of Tamil Nadu state to understand the utilization pattern of extension tools and methods by agricultural extension agents. As the ICT revolution is slowly conquering the rural sector, it becomes imperative that agricultural extension agents adapt to the changing times and develop competencies in utilizing these ICTs. The study explored the usage of various extension tools and methods by the change agents and the constraints faced in utilizing them. The findings revealed that the extension functionaries frequently used the individual contact methods, viz. telephone, office calls and farm and home visits, in the process of transfer of technology. The least effort was put into SMS-based communication. Meetings were the common and frequently adopted group contact method. Demonstrations, farmer field schools, farmers' interest groups, field trips and farmer training programmes were moderately adopted. Posters, leaflets and pre-season campaigns were the widely adopted mass contact methods. The functionaries were least skilled in utilizing farm magazines and in presenting television and radio programmes, which are among the most popular and most efficient mass contact methods. The extension functionaries need to be trained adequately in the wider use of electronic communication methods such as e-mail and SMS in the local language. Efforts should be made to raise awareness of the importance of the different group and mass contact methods and to train the extension agents in their use.
Reliability allocation problem in a series-parallel system

International Nuclear Information System (INIS)

Yalaoui, Alice; Chu, Chengbin; Chatelet, Eric

2005-01-01

In order to improve system reliability, designers may introduce in a system different technologies in parallel. When each technology is composed of components in series, the configuration belongs to the series-parallel systems. This type of system has not been studied as much as the parallel-series architecture. There exist no methods dedicated to the reliability allocation in series-parallel systems with different technologies. We propose in this paper theoretical and practical results for the allocation problem in a series-parallel system. Two resolution approaches are developed. Firstly, a one stage problem is studied and the results are exploited for the multi-stages problem. A theoretical condition for obtaining the optimal allocation is developed. Since this condition is too restrictive, we secondly propose an alternative approach based on an approximated function and the results of the one-stage study. This second approach is applied to numerical examples
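For the configuration described in this abstract (technologies in parallel, each technology a chain of components in series), the system reliability has a standard closed form when component failures are assumed independent: a chain works only if all its components work, and the system fails only if every chain fails. The Python sketch below evaluates that formula; it is a textbook calculation under the independence assumption, not the allocation method proposed in the paper, and the function name and example values are illustrative.

```python
from math import prod

def series_parallel_reliability(branches):
    """Reliability of parallel branches, each a series chain of components.

    `branches` is a list of lists of component reliabilities in [0, 1].
    Assumes statistically independent component failures.
    """
    # Series chain: product of component reliabilities.
    # Parallel system: 1 - product of chain failure probabilities.
    return 1.0 - prod(1.0 - prod(chain) for chain in branches)

# Two technologies: one with two components in series, one with a single component.
r = series_parallel_reliability([[0.9, 0.9], [0.8]])
print(round(r, 3))  # 1 - (1 - 0.81) * (1 - 0.8) = 0.962
```

An allocation problem of the kind the abstract studies would then search over the component reliabilities to meet a system target at minimum cost; that optimisation step is outside the scope of this sketch.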
Parallel imaging with phase scrambling.

Science.gov (United States)

Zaitsev, Maxim; Schultz, Gerrit; Hennig, Juergen; Gruetter, Rolf; Gallichan, Daniel

2015-04-01

Most existing methods for accelerated parallel imaging in MRI require additional data, which are used to derive information about the sensitivity profile of each radiofrequency (RF) channel. In this work, a method is presented to avoid the acquisition of separate coil calibration data for accelerated Cartesian trajectories. Quadratic phase is imparted to the image to spread the signals in k-space (aka phase scrambling). By rewriting the Fourier transform as a convolution operation, a window can be introduced to the convolved chirp function, allowing a low-resolution image to be reconstructed from phase-scrambled data without prominent aliasing. This image (for each RF channel) can be used to derive coil sensitivities to drive existing parallel imaging techniques. As a proof of concept, the quadratic phase was applied by introducing an offset to the x^2 - y^2 shim and the data were reconstructed using adapted versions of the image space-based sensitivity encoding and GeneRalized Autocalibrating Partially Parallel Acquisitions algorithms. The method is demonstrated in a phantom (1 × 2, 1 × 3, and 2 × 2 acceleration) and in vivo (2 × 2 acceleration) using a 3D gradient echo acquisition. Phase scrambling can be used to perform parallel imaging acceleration without acquisition of separate coil calibration data, demonstrated here for a 3D-Cartesian trajectory. Further research is required to prove the applicability to other 2D and 3D sampling schemes. © 2014 Wiley Periodicals, Inc.
Agricultural Extension: Farm Extension Services in Australia, Britain and the United States.

Science.gov (United States)

Williams, Donald B.

By analyzing the scope and structure of agricultural extension services in Australia, Great Britain, and the United States, this work attempts to set guidelines for measuring progress and guiding extension efforts. Extension training, agricultural policy, and activities of national, international, state, and provincial bodies are examined. The…
Optimisation of a parallel ocean general circulation model

Science.gov (United States)

Beare, M. I.; Stevens, D. P.

1997-10-01

This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
Discrete Hadamard transformation algorithm's parallelism analysis and achievement

Science.gov (United States)

Hu, Hui

2009-07-01

The Discrete Hadamard Transformation (DHT) is widely applied in real-time signal processing, but its use is limited by the operation speed of DSPs. This article studies the parallelization of the DHT and analyzes its parallel performance. Based on the programming structure of the TMS320C80 multiprocessor platform, two kinds of parallel DHT algorithms are implemented. Several experiments demonstrated the effectiveness of the proposed algorithms.
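The abstract does not say which two parallel algorithms were implemented, so the following Python sketch only shows why the transform lends itself to parallelization. It is the standard unnormalised fast Walsh-Hadamard transform in natural (Hadamard) ordering, written sequentially; the function name `fwht` is an assumption, and nothing here is specific to the TMS320C80.

```python
def fwht(values):
    """Unnormalised fast Walsh-Hadamard transform, natural (Hadamard) ordering."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Within one stage, each butterfly touches a disjoint pair (k, k + h),
        # so the butterflies of a stage could be shared among processors.
        for start in range(0, n, 2 * h):
            for k in range(start, start + h):
                x, y = a[k], a[k + h]
                a[k], a[k + h] = x + y, x - y
        h *= 2
    return a

spectrum = fwht([1, 0, 1, 0, 0, 1, 1, 0])
print(spectrum)
```

The transform uses only additions and subtractions, log2(n) stages of n/2 independent butterflies each. Applying it twice returns the input scaled by n, which gives a convenient correctness check.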
A parallel implementation of 3D Zernike moment analysis

Science.gov (United States)

Berjón, Daniel; Arnaldo, Sergio; Morán, Francisco

2011-01-01

Zernike polynomials are a well known set of functions that find many applications in image or pattern characterization because they allow the construction of shape descriptors that are invariant against translations, rotations or scale changes. The concepts behind them can be extended to higher dimension spaces, making them also fit to describe volumetric data. They have been less used than their properties might suggest due to their high computational cost. We present a parallel implementation of 3D Zernike moments analysis, written in C with CUDA extensions, which makes it practical to employ Zernike descriptors in interactive applications, yielding a performance of several frames per second in voxel datasets about 200^3 in size. In our contribution, we describe the challenges of implementing 3D Zernike analysis in a general-purpose GPU. These include how to deal with numerical inaccuracies, due to the high precision demands of the algorithm, or how to deal with the high volume of input data so that it does not become a bottleneck for the system.
Transparency in stereopsis: parallel encoding of overlapping depth planes.

Science.gov (United States)

Reeves, Adam; Lynch, David

2017-08-01

We report that after extensive training, expert adults can accurately report the number, up to six, of transparent overlapping depth planes portrayed by brief (400 ms or 200 ms) random-element stereoscopic displays, and can well discriminate six from seven planes. Naïve subjects did poorly above three planes. Displays contained seven rows of 12 randomly located ×'s or +'s; jittering the disparities and number in each row to remove spurious cues had little effect on accuracy. Removing the central 3° of the 10° display to eliminate foveal vision hardly reduced the number of reportable planes. Experts could report how many of six planes contained +'s when the remainder contained ×'s, and most learned to report up to six planes in reverse contrast (left eye white +'s; right eye black +'s). Long-term training allowed some experts to reach eight depth planes. Results suggest that adult stereoscopic vision can learn to distinguish the outputs of six or more statistically independent, contrast-insensitive, narrowly tuned, asymmetric disparity channels in parallel.
Parallel object-oriented term rewriting : the booleans

NARCIS (Netherlands)

Rodenburg, P.H.; Vrancken, J.L.M.

As a first case study in parallel object-oriented term rewriting, we give two implementations of term rewriting algorithms for boolean terms, using the parallel object-oriented features of the language Pool-T. The term rewriting systems are specified in the specification formalism
3D printed soft parallel actuator

Science.gov (United States)

Zolfagharian, Ali; Kouzani, Abbas Z.; Khoo, Sui Yang; Noshadi, Amin; Kaynak, Akif

2018-04-01

This paper presents a 3-dimensional (3D) printed soft parallel contactless actuator for the first time. The actuator involves an electro-responsive parallel mechanism made of two segments namely active chain and passive chain both 3D printed. The active chain is attached to the ground from one end and constitutes two actuator links made of responsive hydrogel. The passive chain, on the other hand, is attached to the active chain from one end and consists of two rigid links made of polymer. The actuator links are printed using an extrusion-based 3D-Bioplotter with polyelectrolyte hydrogel as printer ink. The rigid links are also printed by a 3D fused deposition modelling (FDM) printer with acrylonitrile butadiene styrene (ABS) as print material. The kinematics model of the soft parallel actuator is derived via transformation matrices notations to simulate and determine the workspace of the actuator. The printed soft parallel actuator is then immersed into NaOH solution with specific voltage applied to it via two contactless electrodes. The experimental data is then collected and used to develop a parametric model to estimate the end-effector position and regulate kinematics model in response to specific input voltage over time. It is observed that the electroactive actuator demonstrates expected behaviour according to the simulation of its kinematics model. The use of 3D printing for the fabrication of parallel soft actuators opens a new chapter in manufacturing sophisticated soft actuators with high dexterity and mechanical robustness for biomedical applications such as cell manipulation and drug release.
Parallel processor programs in the Federal Government

Science.gov (United States)

Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

1985-01-01

In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.
MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

Science.gov (United States)

Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

2018-02-01

We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution; however, the efficiency in terms of computing resource usage decreases as the number of processors used in the parallel computation increases.
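The trade-off between speedup and efficiency described in this abstract can be illustrated with a small model. The sketch below is not MPI_XSTAR: it assumes a hypothetical set of independent run times and a greedy assignment in which each run goes to whichever worker frees up first. It reports speedup (serial time divided by parallel wall time) and efficiency (speedup divided by worker count). The function name and all timings are illustrative assumptions.

```python
import heapq


def simulate_task_farm(durations, workers):
    """Return (speedup, efficiency) for independent runs on `workers` workers.

    Each run is handed, in order, to the worker that becomes free first.
    """
    finish_times = [0.0] * workers
    heapq.heapify(finish_times)
    for duration in durations:
        earliest_free = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest_free + duration)
    wall_time = max(finish_times)      # parallel wall-clock time
    serial_time = sum(durations)       # one worker runs everything
    speedup = serial_time / wall_time
    return speedup, speedup / workers


# Hypothetical run times in seconds (assumed, not taken from the paper).
durations = [8.0, 4.0, 4.0, 2.0, 2.0, 2.0, 1.0, 1.0]
for p in (1, 2, 4, 8):
    speedup, efficiency = simulate_task_farm(durations, p)
    print(f"workers={p}  speedup={speedup:.2f}  efficiency={efficiency:.2f}")
```

With these made-up timings the longest single run bounds the wall time, so going from four to eight workers does not shorten the job and the efficiency falls.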

Parallel image encryption algorithm based on discretized chaotic map

International Nuclear Information System (INIS)

Zhou Qing; Wong Kwokwo; Liao Xiaofeng; Xiang Tao; Hu Yue

2008-01-01

Recently, a variety of chaos-based algorithms were proposed for image encryption. Nevertheless, none of them works efficiently in a parallel computing environment. In this paper, we propose a framework for parallel image encryption. Based on this framework, a new algorithm is designed using the discretized Kolmogorov flow map. It fulfills all the requirements for a parallel image encryption algorithm. Moreover, it is secure and fast. These properties make it a good choice for image encryption on parallel computing platforms.
Introduction to massively-parallel computing in high-energy physics

CERN Document Server

AUTHOR|(CDS)2083520

1993-01-01

Ever since computers were first used for scientific and numerical work, there has existed an "arms race" between the technical development of faster computing hardware and the desires of scientists to solve larger problems in shorter time-scales. However, the vast leaps in processor performance achieved through advances in semiconductor science have reached a hiatus as the technology comes up against the physical limits of the speed of light and quantum effects. This has led all high performance computer manufacturers to turn towards a parallel architecture for their new machines. In these lectures we will introduce the history and concepts behind parallel computing, and review the various parallel architectures and software environments currently available. We will then introduce programming methodologies that allow efficient exploitation of parallel machines, and present case studies of the parallelization of typical High Energy Physics codes for the two main classes of parallel computing architecture (S...
Improving image quality of parallel phase-shifting digital holography

International Nuclear Information System (INIS)

Awatsuji, Yasuhiro; Tahara, Tatsuki; Kaneko, Atsushi; Koyama, Takamasa; Nishio, Kenzo; Ura, Shogo; Kubota, Toshihiro; Matoba, Osamu

2008-01-01

The authors propose parallel two-step phase-shifting digital holography to improve the image quality of parallel phase-shifting digital holography. The proposed technique can double the effective number of pixels of the hologram in comparison to the conventional parallel four-step technique. The increased number of pixels makes it possible to improve the quality of the image reconstructed by parallel phase-shifting digital holography. A numerical simulation and a preliminary experiment of the proposed technique were conducted, and the effectiveness of the technique was confirmed. The proposed technique is more practical than conventional parallel phase-shifting digital holography because the composition of the digital holographic system based on the proposed technique is simpler.
Program Transformation to Identify List-Based Parallel Skeletons

Directory of Open Access Journals (Sweden)

Venkatesh Kannan

2016-07-01

Algorithmic skeletons are used as building-blocks to ease the task of parallel programming by abstracting the details of parallel implementation from the developer. Most existing libraries provide implementations of skeletons that are defined over flat data types such as lists or arrays. However, skeleton-based parallel programming is still very challenging as it requires intricate analysis of the underlying algorithm and often uses inefficient intermediate data structures. Further, the algorithmic structure of a given program may not match those of list-based skeletons. In this paper, we present a method to automatically transform any given program to one that is defined over a list and is more likely to contain instances of list-based skeletons. This facilitates the parallel execution of a transformed program using existing implementations of list-based parallel skeletons. Further, by using an existing transformation called distillation in conjunction with our method, we produce transformed programs that contain fewer inefficient intermediate data structures.
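To make the term concrete, here is a minimal sketch of what a list-based skeleton can look like. It is not the transformation proposed in the paper: it only shows a generic map-reduce skeleton over a flat list, written in Python with a thread pool for illustration. The name `map_reduce_skeleton` and its parameters are assumptions, and the combining function is assumed to be associative.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def map_reduce_skeleton(map_fn, combine, items, workers=4):
    """Apply map_fn to every item and fold the results with combine.

    The list is cut into contiguous chunks; each chunk is mapped and
    reduced independently, then the partial results are combined in order.
    `combine` must be associative for the result to match a serial fold.
    """
    if not items:
        raise ValueError("items must be non-empty")
    chunk_size = -(-len(items) // workers)  # ceiling division
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

    def run_chunk(chunk):
        return reduce(combine, (map_fn(x) for x in chunk))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(run_chunk, chunks))
    return reduce(combine, partials)


# Sum of squares of 1..10, expressed through the skeleton.
total = map_reduce_skeleton(lambda x: x * x, operator.add, list(range(1, 11)))
print(total)
```

The caller supplies only the per-element function and the combining operator; how the list is split and scheduled stays inside the skeleton, which is the abstraction the abstract refers to.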
Parallelization of a spherical Sn transport theory algorithm

International Nuclear Information System (INIS)

Haghighat, A.

1989-01-01

The work described in this paper derives a parallel algorithm for an R-dependent spherical S_N transport theory algorithm and studies its performance by testing different sample problems. The S_N transport method is one of the most accurate techniques used to solve the linear Boltzmann equation. Several studies have been done on the vectorization of the S_N algorithms; however, very few studies have been performed on the parallelization of this algorithm. Weinke and Hommoto have looked at the parallel processing of the different energy groups, and Azmy recently studied the parallel processing of the inner iterations of an X-Y S_N nodal transport theory method. Both studies have reported very encouraging results, which have prompted us to look at the parallel processing of an R-dependent S_N spherical geometry algorithm. This geometry was chosen because, in spite of its simplicity, it contains the complications of the curvilinear geometries (i.e., redistribution of neutrons over the discretized angular bins).
CUBESIM, Hypercube and Denelcor Hep Parallel Computer Simulation

International Nuclear Information System (INIS)

Dunigan, T.H.

1988-01-01

1 - Description of program or function: CUBESIM is a set of subroutine libraries and programs for the simulation of message-passing parallel computers and shared-memory parallel computers. Subroutines are supplied to simulate the Intel hypercube and the Denelcor HEP parallel computers. The system permits a user to develop and test parallel programs written in C or FORTRAN on a single processor. The user may alter such hypercube parameters as message startup times, packet size, and the computation-to-communication ratio. The simulation generates a trace file that can be used for debugging, performance analysis, or graphical display. 2 - Method of solution: The CUBESIM simulator is linked with the user's parallel application routines to run as a single UNIX process. The simulator library provides a small operating system to perform process and message management. 3 - Restrictions on the complexity of the problem: Up to 128 processors can be simulated with a virtual memory limit of 6 million bytes. Up to 1000 processes can be simulated
Numerical discrepancy between serial and MPI parallel computations

Directory of Open Access Journals (Sweden)

Sang Bong Lee

2016-09-01

Numerical simulations of the 1D Burgers equation and a 2D sloshing problem were carried out to study the numerical discrepancy between serial and parallel computations. The numerical domain was decomposed into 2 and 4 subdomains for parallel computations with the Message Passing Interface. The numerical solution of the Burgers equation disclosed that the fully explicit boundary conditions used on the subdomains of the parallel computation were responsible for the numerical discrepancy of the transient solution between serial and parallel computations. Two-dimensional sloshing problems in a rectangular domain were solved using OpenFOAM. After the initial transient, the sloshing patterns of water were significantly different in serial and parallel computations although the same numerical conditions were given. Based on the histograms of pressure measured at two points near the wall, the statistical characteristics of the numerical solution were not affected by the number of subdomains as much as the transient solution was.
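The mechanism named in this abstract, explicit (time-lagged) boundary values on subdomain interfaces, can be reproduced in a toy setting. The sketch below is an assumption-laden illustration, not the paper's scheme: it swaps the Burgers equation for the simpler 1D heat equation with backward-Euler time stepping, and compares one global implicit solve per step against two subdomain solves whose interface values are taken from the previous time step. All names and parameters are hypothetical.

```python
import math


def solve_tridiagonal(off, diag, rhs):
    """Thomas algorithm for off*x[i-1] + diag*x[i] + off*x[i+1] = rhs[i]."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - off * c[i - 1]
        c[i] = off / denom
        d[i] = (rhs[i] - off * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def step_serial(u, r):
    """One backward-Euler step on the whole domain (zero Dirichlet ends)."""
    return solve_tridiagonal(-r, 1.0 + 2.0 * r, list(u))


def step_decomposed(u, r, split):
    """Same step on two subdomains; interface neighbours use OLD values."""
    left, right = u[:split], u[split:]
    left[-1] += r * u[split]        # lagged value from the right subdomain
    right[0] += r * u[split - 1]    # lagged value from the left subdomain
    return (solve_tridiagonal(-r, 1.0 + 2.0 * r, left)
            + solve_tridiagonal(-r, 1.0 + 2.0 * r, right))


n, r, steps = 20, 1.0, 10           # assumed grid size, diffusion number, steps
u_serial = [math.sin(math.pi * (i + 1) / (n + 1)) for i in range(n)]
u_split = list(u_serial)
for _ in range(steps):
    u_serial = step_serial(u_serial, r)
    u_split = step_decomposed(u_split, r, n // 2)

max_diff = max(abs(a - b) for a, b in zip(u_serial, u_split))
print(f"max |serial - decomposed| after {steps} steps: {max_diff:.3e}")
```

The two runs start from identical data and use identical coefficients, yet the transient solutions differ because the decomposed solve sees its neighbour one time step late.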
Parallel execution of chemical software on EGEE Grid

CERN Document Server

Sterzel, Mariusz

2008-01-01

The constant interest of the chemical community in studying larger and larger molecules forces the parallelization of existing computational methods in chemistry and the development of new ones. These are the main reasons for frequent port updates and for requests from the community for Grid ports of new packages to satisfy their computational demands. Unfortunately, some parallelization schemes used by chemical packages cannot be directly used in a Grid environment. Here we present a solution for the Gaussian package. The current state of development of Grid middleware allows easy parallel execution of software using any MPI flavour. Unfortunately, many chemical packages do not use MPI for parallelization, so special treatment is needed. Gaussian can be executed in parallel on an SMP architecture or via Linda. These require the reservation of a certain number of processors/cores on a given WN and an equal number of processors/cores on each WN, respectively. The current implementation of EGEE middleware does not offer such f...
Parallel Narrative Structure in Paul Harding's "Tinkers"

Science.gov (United States)

Çirakli, Mustafa Zeki

2014-01-01

The present paper explores the implications of parallel narrative structure in Paul Harding's "Tinkers" (2009). Besides primarily recounting the two sets of parallel narratives, "Tinkers" also comprises seemingly unrelated fragments such as excerpts from clock repair manuals and diaries. The main stories, however, told…
Customizable Memory Schemes for Data Parallel Architectures

NARCIS (Netherlands)

Gou, C.

2011-01-01

Memory system efficiency is crucial for any processor to achieve high performance, especially in the case of data parallel machines. Processing capabilities of parallel lanes will be wasted, when data requests are not accomplished in a sustainable and timely manner. Irregular vector memory accesses
A parallel approach to the stable marriage problem

DEFF Research Database (Denmark)

Larsen, Jesper

1997-01-01

This paper describes two parallel algorithms for the stable marriage problem implemented on a MIMD parallel computer. The algorithms are tested against sequential algorithms on randomly generated and worst-case instances. The results clearly show that the combination of a very simple problem...... and a commercial MIMD system results in parallel algorithms which are not competitive with sequential algorithms wrt. practical performance. 1 Introduction In 1962 the Stable Marriage Problem was....
EXTENSION EDUCATION SYMPOSIUM: reinventing extension as a resource--what does the future hold?

Science.gov (United States)

Mirando, M A; Bewley, J M; Blue, J; Amaral-Phillips, D M; Corriher, V A; Whittet, K M; Arthur, N; Patterson, D J

2012-10-01

The mission of the Cooperative Extension Service, as a component of the land-grant university system, is to disseminate new knowledge and to foster its application and use. Opportunities and challenges facing animal agriculture in the United States have changed dramatically over the past few decades and require the use of new approaches and emerging technologies that are available to extension professionals. Increased federal competitive grant funding for extension, the creation of eXtension, the development of smartphone and related electronic technologies, and the rapidly increasing popularity of social media created new opportunities for extension educators to disseminate knowledge to a variety of audiences and engage these audiences in electronic discussions. Competitive grant funding opportunities for extension efforts to advance animal agriculture became available from the USDA National Institute of Food and Agriculture (NIFA) and have increased dramatically in recent years. The majority of NIFA funding opportunities require extension efforts to be integrated with research, and NIFA encourages the use of eXtension and other cutting-edge approaches to extend research to traditional clientele and nontraditional audiences. A case study is presented to illustrate how research and extension were integrated to improve the adoption of AI by beef producers. Those in agriculture are increasingly resorting to the use of social media venues such as Facebook, YouTube, LinkedIn, and Twitter to access information required to support their enterprises. Use of these various approaches by extension educators requires appreciation of the technology and an understanding of how the target audiences access information available on social media. Technology to deliver information is changing rapidly, and Cooperative Extension Service professionals will need to continuously evaluate digital technology and social media tools to appropriately integrate them into learning and
The kpx, a program analyzer for parallelization

International Nuclear Information System (INIS)

Matsuyama, Yuji; Orii, Shigeo; Ota, Toshiro; Kume, Etsuo; Aikawa, Hiroshi.

1997-03-01

The kpx is a program analyzer, developed as a common technological basis for promoting parallel processing. The kpx consists of three tools. The first is ktool, which shows how much execution time is spent in program segments. The second is ptool, which shows parallelization overhead on the Paragon system. The last is xtool, which shows parallelization overhead on the VPP system. The kpx, designed to work for any FORTRAN code on any UNIX computer, is confirmed to work well after testing on Paragon, SP2, SR2201, VPP500, VPP300, Monte-4, SX-4 and T90. (author)
Speedup predictions on large scientific parallel programs

International Nuclear Information System (INIS)

Williams, E.; Bobrowicz, F.

1985-01-01

How much speedup can we expect for large scientific parallel programs running on supercomputers? For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N ≥ 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory.
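The reported figure can be put in context with a short calculation. The sketch below assumes Amdahl's law as the performance model, which the abstract does not state. Under that assumption, a speedup of 14.4 on 16 processors implies a parallel efficiency of 0.9 and an effective serial fraction below one percent. The function names are illustrative.

```python
def amdahl_speedup(serial_fraction, processors):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def implied_serial_fraction(speedup, processors):
    """Invert Amdahl's law: serial fraction consistent with a measured speedup."""
    return (processors / speedup - 1.0) / (processors - 1.0)


measured_speedup, processors = 14.4, 16      # figures quoted in the abstract
efficiency = measured_speedup / processors
serial_fraction = implied_serial_fraction(measured_speedup, processors)

print(f"efficiency      = {efficiency:.3f}")
print(f"serial fraction = {serial_fraction:.5f}")
# Extrapolation under the same (assumed) model, not a result from the paper.
print(f"predicted speedup on 64 processors = {amdahl_speedup(serial_fraction, 64):.1f}")
```

The extrapolated value is only as good as the assumption that the serial fraction stays constant as the processor count grows.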
Progress in parallel implementation of the multilevel plane wave time domain algorithm

KAUST Repository

Liu, Yang

2013-07-01

The computational complexity and memory requirements of classical schemes for evaluating transient electromagnetic fields produced by N_s dipoles active for N_t time steps scale as O(N_t N_s^2) and O(N_s^2), respectively. The multilevel plane wave time domain (PWTD) algorithm [A.A. Ergin et al., Antennas and Propagation Magazine, IEEE, vol. 41, pp. 39-52, 1999], viz. the extension of the frequency domain fast multipole method (FMM) to the time domain, reduces the above costs to O(N_t N_s log^2 N_s) and O(N_s^α) with α = 1.5 for surface current distributions and α = 4/3 for volumetric ones. Its favorable computational and memory costs notwithstanding, serial implementations of the PWTD scheme unfortunately remain somewhat limited in scope and ill-suited to tackle complex real-world scattering problems, and parallel implementations are called for. © 2013 IEEE.
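A rough feel for the scalings quoted in this abstract can be had from a few lines of arithmetic. The sketch below drops all constant factors, reads the logarithmic factor as a squared logarithm, and uses base 2; all three are assumptions, so the numbers indicate orders of magnitude only. The function names are illustrative.

```python
import math


def classical_cost(n_t, n_s):
    """Operation count of the classical scheme, up to a constant factor."""
    return n_t * n_s ** 2


def pwtd_cost(n_t, n_s):
    """Operation count of multilevel PWTD, up to a constant factor."""
    return n_t * n_s * math.log2(n_s) ** 2


def memory_ratio(n_s, alpha):
    """Classical memory N_s^2 relative to PWTD memory N_s^alpha."""
    return n_s ** 2 / n_s ** alpha


n_t, n_s = 1000, 2 ** 20            # assumed problem size
cost_ratio = classical_cost(n_t, n_s) / pwtd_cost(n_t, n_s)
print(f"cost ratio (classical / PWTD): {cost_ratio:.2f}")
print(f"memory ratio, surface (alpha = 1.5): {memory_ratio(n_s, 1.5):.0f}")
print(f"memory ratio, volume  (alpha = 4/3): {memory_ratio(n_s, 4 / 3):.0f}")
```

In practice the hidden constants of the fast algorithm are large, so the real break-even point sits at a larger N_s than this constant-free estimate suggests.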
Development of parallel Fokker-Planck code ALLAp

International Nuclear Information System (INIS)

Batishcheva, A.A.; Sigmar, D.J.; Koniges, A.E.

1996-01-01

We report on our ongoing development of the 3D Fokker-Planck code ALLA for a highly collisional scrape-off-layer (SOL) plasma. A SOL with strong gradients of density and temperature in the spatial dimension is modeled. Our method is based on a 3-D adaptive grid (in space, magnitude of the velocity, and cosine of the pitch angle) and a second order conservative scheme. Note that the grid size is typically 100 x 257 x 65 nodes. It was shown in our previous work that only these capabilities make it possible to benchmark a 3D code against a spatially-dependent self-similar solution of a kinetic equation with the Landau collision term. In the present work we show results of a more precise benchmarking against the exact solutions of the kinetic equation using a new parallel code ALLAp with an improved method of parallelization and a modified boundary condition at the plasma edge. We also report first results from the code parallelization using Message Passing Interface for a Massively Parallel CRI T3D platform. We evaluate the ALLAp code performance versus the number of T3D processors used and compare its efficiency against a Work/Data Sharing parallelization scheme and a workstation version
Practical parallel programming

CERN Document Server

Bauer, Barr E

2014-01-01

This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.
Optimisation of a parallel ocean general circulation model

Directory of Open Access Journals (Sweden)

M. I. Beare

1997-10-01

This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
Test generation for digital circuits using parallel processing

Science.gov (United States)

Hartmann, Carlos R.; Ali, Akhtar-Uz-Zaman M.

1990-12-01

The problem of test generation for digital logic circuits is an NP-Hard problem. Recently, the availability of low cost, high performance parallel machines has spurred interest in developing fast parallel algorithms for computer-aided design and test. This report describes a method of applying a 15-valued logic system for digital logic circuit test vector generation in a parallel programming environment. A concept called fault site testing allows for test generation, in parallel, that targets more than one fault at a given location. The multi-valued logic system allows results obtained by distinct processors and/or processes to be merged by means of simple set intersections. A machine-independent description is given for the proposed algorithm.
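The merge step mentioned at the end of this abstract lends itself to a small illustration. The sketch below assumes one common way of building such a logic system: each logic value is a non-empty subset of the four basic values 0, 1, D and D', which gives 2^4 - 1 = 15 values. The report's actual encoding is not given in the abstract, and the signal names and worker results here are hypothetical.

```python
# Basic fault-simulation values; a logic value is any non-empty subset.
BASIC_VALUES = frozenset({"0", "1", "D", "D'"})


def merge_partial_results(partial_results):
    """Intersect, per signal line, the value sets reported by each worker."""
    merged = {}
    for result in partial_results:
        for line, allowed in result.items():
            merged[line] = merged.get(line, BASIC_VALUES) & frozenset(allowed)
    return merged


def conflicting_lines(merged):
    """Lines whose intersection is empty: the workers' constraints clash."""
    return sorted(line for line, allowed in merged.items() if not allowed)


# Hypothetical constraints derived independently by two processes.
worker_a = {"n1": {"0", "D"}, "n2": {"1", "D'"}}
worker_b = {"n1": {"D", "1"}, "n2": {"0"}, "n3": {"1"}}

merged = merge_partial_results([worker_a, worker_b])
print({line: sorted(values) for line, values in merged.items()})
print("conflicts:", conflicting_lines(merged))
```

Because set intersection is commutative and associative, the partial results can be merged in any order, which is what makes the scheme convenient for independent processes.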

Development of Industrial High-Speed Transfer Parallel Robot

International Nuclear Information System (INIS)

Kim, Byung In; Kyung, Jin Ho; Do, Hyun Min; Jo, Sang Hyun

2013-01-01

Parallel robots used in industry require high stiffness or high speed because of their structural characteristics. Nowadays, the importance of rapid transportation has increased in the distribution industry. In this light, an industrial parallel robot has been developed for high-speed transfer. The developed parallel robot can handle a maximum payload of 3 kg. For a payload of 0.1 kg, the trajectory cycle time is 0.3 s (come and go), and the maximum velocity is 4.5 m/s (pick-and-place work, Adept cycle). In this motion, its maximum acceleration is very high and reaches approximately 13g. In this paper, the design, analysis, and performance test results of the developed parallel robot system are introduced.
Introduction to parallel algorithms and architectures arrays, trees, hypercubes

CERN Document Server

Leighton, F Thomson

1991-01-01

Introduction to Parallel Algorithms and Architectures: Arrays Trees Hypercubes provides an introduction to the expanding field of parallel algorithms and architectures. This book focuses on parallel computation involving the most popular network architectures, namely, arrays, trees, hypercubes, and some closely related networks. Organized into three chapters, this book begins with an overview of the simplest architectures of arrays and trees. This text then presents the structures and relationships between the dominant network architectures, as well as the most efficient parallel algorithms for
Leveraging human oversight and intervention in large-scale parallel processing of open-source data

Science.gov (United States)

Casini, Enrico; Suri, Niranjan; Bradshaw, Jeffrey M.

2015-05-01

The popularity of cloud computing along with the increased availability of cheap storage have led to the necessity of elaboration and transformation of large volumes of open-source data, all in parallel. One way to handle such extensive volumes of information properly is to take advantage of distributed computing frameworks like Map-Reduce. Unfortunately, an entirely automated approach that excludes human intervention is often unpredictable and error prone. Highly accurate data processing and decision-making can be achieved by supporting an automatic process through human collaboration, in a variety of environments such as warfare, cyber security and threat monitoring. Although this mutual participation seems easily exploitable, human-machine collaboration in the field of data analysis presents several challenges. First, due to the asynchronous nature of human intervention, it is necessary to verify that once a correction is made, all the necessary reprocessing is done in chain. Second, it is often needed to minimize the amount of reprocessing in order to optimize the usage of resources due to limited availability. In order to improve on these strict requirements, this paper introduces improvements to an innovative approach for human-machine collaboration in the processing of large amounts of open-source data in parallel.
The Parallel C++ Statistical Library ‘QUESO’: Quantification of Uncertainty for Estimation, Simulation and Optimization

KAUST Repository

Prudencio, Ernesto E.

2012-01-01

QUESO is a collection of statistical algorithms and programming constructs supporting research into the uncertainty quantification (UQ) of models and their predictions. It has been designed with three objectives: it should (a) be sufficiently abstract in order to handle a large spectrum of models, (b) be algorithmically extensible, allowing an easy insertion of new and improved algorithms, and (c) take advantage of parallel computing, in order to handle realistic models. Such objectives demand a combination of an object-oriented design with robust software engineering practices. QUESO is written in C++, uses MPI, and leverages libraries already available to the scientific community. We describe some UQ concepts, present QUESO, and list planned enhancements.
Facilitating arrhythmia simulation: the method of quantitative cellular automata modeling and parallel running

Directory of Open Access Journals (Sweden)

Mondry Adrian

2004-08-01

Background: Many arrhythmias are triggered by abnormal electrical activity at the ionic channel and cell level, and then evolve spatio-temporally within the heart. To understand arrhythmias better and to diagnose them more precisely by their ECG waveforms, a whole-heart model is required to explore the association between the massively parallel activities at the channel/cell level and the integrative electrophysiological phenomena at organ level. Methods: We have developed a method to build large-scale electrophysiological models by using extended cellular automata, and to run such models on a cluster of shared memory machines. We describe here the method, including the extension of a language-based cellular automaton to implement quantitative computing, the building of a whole-heart model with Visible Human Project data, the parallelization of the model on a cluster of shared memory computers with OpenMP and MPI hybrid programming, and a simulation algorithm that links cellular activity with the ECG. Results: We demonstrate that electrical activities at channel, cell, and organ levels can be traced and captured conveniently in our extended cellular automaton system. Examples of some ECG waveforms simulated with a 2-D slice are given to support the ECG simulation algorithm. A performance evaluation of the 3-D model on a four-node cluster is also given. Conclusions: Quantitative multicellular modeling with extended cellular automata is a highly efficient and widely applicable method to weave experimental data at different levels into computational models. This process can be used to investigate complex and collective biological activities that can be described neither by their governing differential equations nor by discrete parallel computation. Transparent cluster computing is a convenient and effective method to make time-consuming simulation feasible. Arrhythmias, as a typical case, can be effectively simulated with the methods
Parallelization of the model-based iterative reconstruction algorithm DIRA

International Nuclear Information System (INIS)

Oertenberg, A.; Sandborg, M.; Alm Carlsson, G.; Malusek, A.; Magnusson, M.

2016-01-01

New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelization of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelization of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelized using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelization of the code with the OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelization with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. (authors)
Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

Science.gov (United States)

Quealy, Angela; Cole, Gary L.; Blech, Richard A.

1993-01-01

The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.
On the Automatic Parallelization of Sparse and Irregular Fortran Programs

Directory of Open Access Journals (Sweden)

Yuan Lin

1999-01-01

Automatic parallelization is usually believed to be less effective at exploiting implicit parallelism in sparse/irregular programs than in their dense/regular counterparts. However, not much is really known because there have been few research reports on this topic. In this work, we have studied the possibility of using an automatic parallelizing compiler to detect the parallelism in sparse/irregular programs. The study with a collection of sparse/irregular programs led us to some common loop patterns. Based on these patterns new techniques were derived that produced good speedups when manually applied to our benchmark codes. More importantly, these parallelization methods can be implemented in a parallelizing compiler and can be applied automatically.
Survey on present status and trend of parallel programming environments

International Nuclear Information System (INIS)

Takemiya, Hiroshi; Higuchi, Kenji; Honma, Ichiro; Ohta, Hirofumi; Kawasaki, Takuji; Imamura, Toshiyuki; Koide, Hiroshi; Akimoto, Masayuki.

1997-03-01

This report intends to provide useful information on software tools for parallel programming through a survey of the parallel programming environments of the following six parallel computers: Fujitsu VPP300/500, NEC SX-4, Hitachi SR2201, Cray T94, IBM SP, and Intel Paragon, all of which are installed at the Japan Atomic Energy Research Institute (JAERI). Moreover, the present status of R&D on parallel software, namely parallel languages, compilers, debuggers, performance evaluation tools, and integrated tools, is reported. This survey has been made as a part of our project of developing basic software for a parallel programming environment, which is designed on the concept of STA (Seamless Thinking Aid to programmers). (author)
Parallel-Architecture Simulator Development Using Hardware Transactional Memory

OpenAIRE

Armejach Sanosa, Adrià

2009-01-01

To address the need for a simpler parallel programming model, Transactional Memory (TM) has been developed and promises good parallel performance with easy-to-write parallel code. Unlike lock-based approaches, with TM, programmers do not need to explicitly specify and manage the synchronization among threads. Instead, programmers simply mark code segments as transactions, and the TM system manages the concurrency control for them. TM can be implemented either in software (STM) or hardware (HT...
Argentina's experience with parallel exchange markets: 1981-1990

OpenAIRE

Steven B. Kamin

1991-01-01

This paper surveys the development and operation of the parallel exchange market in Argentina during the 1980s, and evaluates its impact upon macroeconomic performance and policy. The historical evolution of Argentina's exchange market policies is reviewed in order to understand the government's motives for imposing exchange controls. The parallel exchange market engendered by these controls is then analyzed, and econometric methods are used to evaluate the behavior of the parallel exchange r...
Programming parallel architectures - The BLAZE family of languages

Science.gov (United States)

Mehrotra, Piyush

1989-01-01

This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.
Parallel plate detectors

International Nuclear Information System (INIS)

Gardes, D.; Volkov, P.

1981-01-01

A 5×3 cm² (timing only) and a 15×5 cm² (timing and position) parallel plate avalanche counter (PPAC) are considered. The theory of operation and timing resolution is given. The measurement set-up and the curves of experimental results illustrate the possibilities of the two counters [fr
Parallel processing of neutron transport in fuel assembly calculation

International Nuclear Information System (INIS)

Song, Jae Seung

1992-02-01

Group constants, which are used for reactor analyses by the nodal method, are generated by fuel assembly calculations based on neutron transport theory, since one or a quarter of the fuel assembly corresponds to a unit mesh in the current nodal calculation. The group constant calculation for a fuel assembly is performed through spectrum calculations, a two-dimensional fuel assembly calculation, and depletion calculations. The purpose of this study is to develop a parallel algorithm to be used in a parallel processor for the fuel assembly calculation and the depletion calculations of the group constant generation. A serial program, which solves the neutron integral transport equation using the transmission probability method and the linear depletion equation, was prepared and verified by a benchmark calculation. Small changes to the serial program were enough to parallelize the depletion calculation, which has inherently parallel characteristics. In the fuel assembly calculation, however, efficient parallelization is not simple because of the many coupling parameters in the calculation and the data communications among CPUs. In this study, the group distribution method is introduced for the parallel processing of the fuel assembly calculation to minimize the data communications. The parallel processing was performed on a Quadputer with 4 CPUs operating in the NURAD Lab. at KAIST. Efficiencies of 54.3 % and 78.0 % were obtained in the fuel assembly calculation and depletion calculation, respectively, which lead to an overall speedup of about 2.5. As a result, it is concluded that the computing time consumed for the group constant generation can be easily reduced by parallel processing on a parallel computer with small CPUs
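
The efficiencies quoted above can be related to speedup through the standard definition, efficiency = speedup / number of processors. The short sketch below applies that relation with 4 CPUs, as stated in the abstract. The function name is illustrative only, and the overall figure of about 2.5 depends on how run time is split between the two phases, which the abstract does not give.

```python
def speedup_from_efficiency(efficiency: float, processors: int) -> float:
    """Standard relation: efficiency = speedup / processors."""
    return efficiency * processors


P = 4  # number of CPUs, as stated in the abstract

assembly_speedup = speedup_from_efficiency(0.543, P)
depletion_speedup = speedup_from_efficiency(0.780, P)

print(f"fuel assembly: {assembly_speedup:.2f}x, depletion: {depletion_speedup:.2f}x")
```

The two per-phase values bracket the reported overall speedup, which is consistent with it being a time-weighted combination of both phases.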
Parallel Application Development Using Architecture View Driven Model Transformations

NARCIS (Netherlands)

Arkin, E.; Tekinerdogan, B.

2015-01-01

To realize the increased need for computing performance, the current trend is towards applying parallel computing, in which tasks are run in parallel on multiple nodes. In turn, we can observe a rapid increase in the scale of parallel computing platforms. This situation has led to a complexity
Parallelization of ITOUGH2 using PVM

International Nuclear Information System (INIS)

Finsterle, Stefan

1998-01-01

ITOUGH2 inversions are computationally intensive because the forward problem must be solved many times to evaluate the objective function for different parameter combinations or to numerically calculate sensitivity coefficients. Most of these forward runs are independent of each other and can therefore be performed in parallel. Message passing based on the Parallel Virtual Machine (PVM) system has been implemented in ITOUGH2 to enable parallel processing of ITOUGH2 jobs on a heterogeneous network of Unix workstations. This report describes the PVM system and its implementation in ITOUGH2. Instructions are given for installing PVM, compiling ITOUGH2-PVM for use on a workstation cluster, preparing an ITOUGH2 input file under PVM, and executing an ITOUGH2-PVM application. Examples are discussed, demonstrating the use of ITOUGH2-PVM
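
The independence of the forward runs is what makes this kind of parallelization straightforward. The sketch below illustrates the idea only: `forward_model` is a hypothetical stand-in for one forward simulation, and a local thread pool replaces PVM message passing across workstations. None of these names come from ITOUGH2 or PVM.

```python
from concurrent.futures import ThreadPoolExecutor


def forward_model(params):
    """Hypothetical stand-in for one forward simulation run."""
    a, b = params
    return a * a + 3.0 * b


def sensitivities(params, step=1e-6, workers=4):
    """Forward-difference sensitivities; every perturbed run is independent."""
    perturbed = []
    for i in range(len(params)):
        p = list(params)
        p[i] += step
        perturbed.append(p)
    # The base run and all perturbed runs are submitted together.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        base, *runs = pool.map(forward_model, [list(params)] + perturbed)
    return [(r - base) / step for r in runs]


grads = sensitivities([2.0, 1.0])
print(grads)
```

In a real setting each forward run would be a separate process on its own machine; the thread pool here only shows that no run needs the result of another.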
An environment for parallel structuring of Fortran programs

International Nuclear Information System (INIS)

Sridharan, K.; McShea, M.; Denton, C.; Eventoff, B.; Browne, J.C.; Newton, P.; Ellis, M.; Grossbard, D.; Wise, T.; Clemmer, D.

1990-01-01

The paper describes and illustrates an environment for interactive support of the detection and implementation of macro-level parallelism in Fortran programs. The approach couples algorithms for dependence analysis with both innovative techniques for complexity management and capabilities for the measurement and analysis of the parallel computation structures generated through use of the environment. The resulting environment is complementary to the more common approach of seeking local parallelism by loop unrolling, either by an automatic compiler or manually. (orig.)
K.I.S.S. Parallel Coding (lecture 2)

CERN Multimedia

CERN. Geneva

2018-01-01

K.I.S.S.ing parallel computing means, finally, loving it. Parallel computing will be approached in a theoretical and experimental way, using the most advanced and used C API: OpenMP. OpenMP is an open source project constantly developed and updated to hide the awful complexity of parallel coding in an awesome interface. The result is a tool which leaves plenty of space for clever solutions and terrific results in terms of efficiency and performance maximisation.
Models of parallel computation: a survey and classification

Institute of Scientific and Technical Information of China (English)

ZHANG Yunquan; CHEN Guoliang; SUN Guangzhong; MIAO Qiankun

2007-01-01

In this paper, the state-of-the-art parallel computational model research is reviewed. We will introduce various models that were developed during the past decades. According to their targeting architecture features, especially memory organization, we classify these parallel computational models into three generations. These models and their characteristics are discussed based on this three-generation classification. We believe that with the ever increasing speed gap between the CPU and memory systems, incorporating non-uniform memory hierarchy into computational models will become unavoidable. With the emergence of multi-core CPUs, the parallelism hierarchy of current computing platforms becomes more and more complicated. Describing this complicated parallelism hierarchy in future computational models becomes more and more important. A semi-automatic toolkit that can extract model parameters and their values on real computers can reduce the model analysis complexity, thus allowing more complicated models with more parameters to be adopted. Hierarchical memory and hierarchical parallelism will be two very important features that should be considered in future model design and research.
Parallel ray tracing for one-dimensional discrete ordinate computations

International Nuclear Information System (INIS)

Jarvis, R.D.; Nelson, P.

1996-01-01

The ray-tracing sweep in discrete-ordinates, spatially discrete numerical approximation methods applied to the linear, steady-state, plane-parallel, mono-energetic, azimuthally symmetric, neutral-particle transport equation can be reduced to a parallel prefix computation. In so doing, the often severe penalty in convergence rate of the source iteration, suffered by most current parallel algorithms using spatial domain decomposition, can be avoided while attaining parallelism in the spatial domain to whatever extent desired. In addition, the reduction implies parallel algorithm complexity limits for the ray-tracing sweep. The reduction applies to all closed, linear, one-cell functional (CLOF) spatial approximation methods, which encompasses most in current popular use. Scalability test results of an implementation of the algorithm on a 64-node nCube-2S hypercube-connected, message-passing, multi-computer are described. (author)
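
To illustrate how a sweep can become a prefix computation, the sketch below assumes the sweep takes the form of an affine recurrence x[i+1] = a[i]*x[i] + b[i]. That form, the function names, and the sample coefficients are assumptions made for illustration, not the paper's formulation. Each step is an affine map, composition of affine maps is associative, and so all intermediate values follow from an inclusive prefix scan whose passes contain only independent updates.

```python
def compose(first, second):
    """Affine map equal to applying `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def prefix_scan(maps):
    """Inclusive Hillis-Steele scan; updates within one pass are independent."""
    result = list(maps)
    stride = 1
    while stride < len(result):
        previous = list(result)
        for i in range(stride, len(result)):  # could run concurrently
            result[i] = compose(previous[i - stride], previous[i])
        stride *= 2
    return result


def sweep_serial(maps, x0):
    """Reference: the cell-by-cell sequential sweep."""
    values, x = [], x0
    for a, b in maps:
        x = a * x + b
        values.append(x)
    return values


def sweep_by_scan(maps, x0):
    """Same values, obtained from the composed maps."""
    return [a * x0 + b for a, b in prefix_scan(maps)]


cells = [(2, 1), (3, 0), (1, 4), (2, 2), (1, 1)]  # hypothetical (a, b) per cell
print(sweep_serial(cells, 1))
print(sweep_by_scan(cells, 1))
```

The scan needs about log2(n) passes instead of n sequential steps, at the cost of more total arithmetic.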

Workplace Issues in Extension--A Delphi Study of Extension Educators

Science.gov (United States)

Kroth, Michael; Peutz, Joey

2011-01-01

Using the Delphi technique, expert Extension educators identified and prioritized those workplace issues they believe will be the most important to attract, motivate, and retain Extension educators/agents over the next 5 to 7 years. Obtaining and then utilizing a talented, highly motivated workforce during a period when many will be retiring will…
Data communications in a parallel active messaging interface of a parallel computer

Science.gov (United States)

Davis, Kristan D; Faraj, Daniel A

2013-07-09

Algorithm selection for data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including specifications of a client, a context, and a task, endpoints coupled for data communications through the PAMI, including associating in the PAMI data communications algorithms and ranges of message sizes so that each algorithm is associated with a separate range of message sizes; receiving in an origin endpoint of the PAMI a data communications instruction, the instruction specifying transmission of a data communications message from the origin endpoint to a target endpoint, the data communications message characterized by a message size; selecting, from among the associated algorithms and ranges, a data communications algorithm in dependence upon the message size; and transmitting, according to the selected data communications algorithm from the origin endpoint to the target endpoint, the data communications message.
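
The selection step described above amounts to a lookup from message size to the algorithm that owns the matching range. The sketch below shows one way to express that; the algorithm names, thresholds, and function name are hypothetical and are not part of the PAMI interface.

```python
from bisect import bisect_right

# Hypothetical table: each algorithm owns the sizes from its lower bound
# up to (but not including) the next entry's lower bound.
RANGES = [
    (0, "eager"),
    (4096, "rendezvous"),
    (1048576, "pipelined"),
]
LOWER_BOUNDS = [low for low, _ in RANGES]


def select_algorithm(message_size: int) -> str:
    """Return the algorithm whose size range contains message_size."""
    if message_size < 0:
        raise ValueError("message size must be non-negative")
    index = bisect_right(LOWER_BOUNDS, message_size) - 1
    return RANGES[index][1]


for size in (0, 4095, 4096, 2_000_000):
    print(size, select_algorithm(size))
```

Keeping the lower bounds sorted means the ranges cannot overlap, which matches the requirement that each algorithm is associated with a separate range.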
Scalability of Parallel Scientific Applications on the Cloud

Directory of Open Access Journals (Sweden)

Satish Narayana Srirama

2011-01-01

Cloud computing, with its promise of virtually infinite resources, seems well suited to solving resource-greedy scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications, such as matrix–vector operations and the NAS parallel benchmarks, and DOUG (Domain Decomposition on Unstructured Grids) on the cloud. DOUG is an open source software package for parallel iterative solution of very large sparse systems of linear equations. The detailed analysis of DOUG on the cloud showed that parallel applications benefit a lot and scale reasonably on the cloud. We could also observe the limitations of the cloud and its comparison with a cluster in terms of performance. However, for efficiently running scientific applications on the cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit the cloud resources, like the MapReduce framework. Several iterative and embarrassingly parallel algorithms are reduced to the MapReduce model and their performance is measured and analyzed. The analysis showed that Hadoop MapReduce has significant problems with iterative methods, while it suits embarrassingly parallel algorithms well. Scientific computing often uses iterative methods to solve large problems. Thus, for scientific computing on the cloud, this paper raises the necessity for better frameworks or optimizations for MapReduce.
The Performance of an Object-Oriented, Parallel Operating System

Directory of Open Access Journals (Sweden)

David R. Kohr, Jr.

1994-01-01

The nascent and rapidly evolving state of parallel systems often leaves parallel application developers at the mercy of inefficient, inflexible operating system software. Given the relatively primitive state of parallel systems software, maximizing the performance of parallel applications not only requires judicious tuning of the application software, but occasionally, the replacement of specific system software modules with others that can more readily respond to the imposed pattern of resource demands. To assess the feasibility of application and performance tuning via malleable system software and to understand the performance penalties for detailed operating system performance data capture, we describe a set of performance instrumentation techniques for parallel, object-oriented operating systems and a set of performance experiments with Choices, an experimental, object-oriented operating system designed for use with parallel systems. These performance experiments show that (a) the performance overhead for operating system data capture is modest, (b) the penalty for malleable, object-oriented operating systems is negligible, but (c) techniques are needed to strictly enforce adherence of implementation to design if operating system modules are to be replaced.
Parallelization of MCNP Monte Carlo neutron and photon transport code in parallel virtual machine and message passing interface

International Nuclear Information System (INIS)

Deng Li; Xie Zhongsheng

1999-01-01

The coupled neutron and photon transport Monte Carlo code MCNP (version 3B) has been parallelized in parallel virtual machine (PVM) and message passing interface (MPI) by modifying a previous serial code. The new code has been verified by solving sample problems. The speedup increases linearly with the number of processors and the average efficiency is up to 99% for 12 processors. (author)
Techniques applied in design optimization of parallel manipulators

CSIR Research Space (South Africa)

Modungwa, D

2011-11-01

the desired dexterous workspace," Robot. Comput. Integrated Manuf., vol. 23, pp. 38-46, 2007. [12] A.P. Murray, F. Pierrot, P. Dauchez and J.M. McCarthy, "A planar quaternion approach to the kinematic synthesis of a parallel manipulator," Robotica, vol... design of a three translational DoFs parallel manipulator," Robotica, vol. 24, pp. 239, 2005. [15] J. Angeles, "The robust design of parallel manipulators," in 1st Int. Colloquium, Collaborative Research Centre 562, 2002. [16] S. Bhattacharya, H...
High Performance Parallel Multigrid Algorithms for Unstructured Grids

Science.gov (United States)

Frederickson, Paul O.

1996-01-01

We describe a high performance parallel multigrid algorithm for a rather general class of unstructured grid problems in two and three dimensions. The algorithm PUMG, for parallel unstructured multigrid, is related in structure to the parallel multigrid algorithm PSMG introduced by McBryan and Frederickson, for they both obtain a higher convergence rate through the use of multiple coarse grids. Another reason for the high convergence rate of PUMG is its smoother, an approximate inverse developed by Baumgardner and Frederickson.
Graph Transformation and Designing Parallel Sparse Matrix Algorithms beyond Data Dependence Analysis

Directory of Open Access Journals (Sweden)

H.X. Lin

2004-01-01

Algorithms are often parallelized based on data dependence analysis, manually or by means of parallel compilers. Some vector/matrix computations with simple data dependence structures (data parallelism), such as matrix-vector products, can be easily parallelized. For problems with more complicated data dependence structures, parallelization is less straightforward. The data dependence graph is a powerful means for designing and analyzing parallel algorithms. However, for sparse matrix computations, parallelization based solely on exploiting the existing parallelism in an algorithm does not always give satisfactory results. For example, the conventional Gaussian elimination algorithm for the solution of a tri-diagonal system is inherently sequential, so algorithms designed specially for parallel computation are needed. After briefly reviewing different parallelization approaches, a powerful graph formalism for designing parallel algorithms is introduced. This formalism will be discussed using a tri-diagonal system as an example. Its application to general matrix computations is also discussed. Its power in designing parallel algorithms beyond the ability of data dependence analysis is shown by means of a new algorithm called ACER (Alternating Cyclic Elimination and Reduction algorithm).
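
As an illustration of an algorithm designed for parallelism rather than derived from sequential elimination, the sketch below implements parallel cyclic reduction for a tri-diagonal system. This is a classic textbook technique swapped in for illustration; it is not the ACER algorithm named in the abstract, and the function name and test system are assumptions. Row i of the system reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], with a[0] = 0 and c[n-1] = 0.

```python
def pcr_solve(a, b, c, d):
    """Solve a tri-diagonal system by parallel cyclic reduction."""
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, list(b), [0.0] * n, list(d)
        for i in range(n):  # every row update in a pass is independent
            lo, hi = i - stride, i + stride
            if lo >= 0:
                alpha = -a[i] / b[lo]  # eliminates the x[i - stride] term
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
                na[i] = alpha * a[lo]
            if hi < n:
                gamma = -c[i] / b[hi]  # eliminates the x[i + stride] term
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
                nc[i] = gamma * c[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # All off-diagonal couplings are gone; each row is now b[i]*x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]


# Hypothetical diagonally dominant test system with known solution 1..5.
lower = [0.0, 1.0, 1.0, 1.0, 1.0]
diag = [4.0] * 5
upper = [1.0, 1.0, 1.0, 1.0, 0.0]
rhs = [6.0, 12.0, 18.0, 24.0, 24.0]
solution = pcr_solve(lower, diag, upper, rhs)
print(solution)
```

Each pass doubles the distance between coupled unknowns, so about log2(n) passes replace the n dependent steps of sequential elimination.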
Directions in parallel processor architecture, and GPUs too

CERN Multimedia

CERN. Geneva

2014-01-01

Modern computing is power-limited in every domain of computing. Performance increments extracted from instruction-level parallelism (ILP) are no longer power-efficient; they haven't been for some time. Thread-level parallelism (TLP) is a more easily exploited form of parallelism, at the expense of programmer effort to expose it in the program. In this talk, I will introduce you to disparate topics in parallel processor architecture that will impact programming models (and you) in both the near and far future. About the speaker Olivier is a senior GPU (SM) architect at NVIDIA and an active participant in the concurrency working group of the ISO C++ committee. He has also worked on very large diesel engines as a mechanical engineer, and taught at McGill University (Canada) as a faculty instructor.
A Parallel Saturation Algorithm on Shared Memory Architectures

Science.gov (United States)

Ezekiel, Jonathan; Siminiceanu

2007-01-01

Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of firing events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the firing of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor dual core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, when compared to running our algorithm only on a single core.
Nuclear Energy Gradients for Internally Contracted Complete Active Space Second-Order Perturbation Theory: Multistate Extensions.

Science.gov (United States)

Vlaisavljevich, Bess; Shiozaki, Toru

2016-08-09

We report the development of the theory and computer program for analytical nuclear energy gradients for (extended) multistate complete active space perturbation theory (CASPT2) with full internal contraction. The vertical shifts are also considered in this work. This is an extension of the fully internally contracted CASPT2 nuclear gradient program recently developed for a state-specific variant by us [MacLeod and Shiozaki, J. Chem. Phys. 2015, 142, 051103]; in this extension, the so-called λ equation is solved to account for the variation of the multistate CASPT2 energies with respect to the change in the amplitudes obtained in the preceding state-specific CASPT2 calculations, and the Z vector equations are modified accordingly. The program is parallelized using the MPI3 remote memory access protocol that allows us to perform efficient one-sided communication. The optimized geometries of the ground and excited states of a copper corrole and benzophenone are presented as numerical examples. The code is publicly available under the GNU General Public License.
Massively parallel evolutionary computation on GPGPUs

CERN Document Server

Tsutsui, Shigeyoshi

2013-01-01

Evolutionary algorithms (EAs) are metaheuristics that learn from natural collective behavior and are applied to solve optimization problems in domains such as scheduling, engineering, bioinformatics, and finance. Such applications demand acceptable solutions with high-speed execution using finite computational resources. Therefore, there have been many attempts to develop platforms for running parallel EAs using multicore machines, massively parallel cluster machines, or grid computing environments. Recent advances in general-purpose computing on graphics processing units (GPGPU) have opened u
Nuclear plant life extension

International Nuclear Information System (INIS)

Negin, C.A.

1989-01-01

The nuclear power industry's addressing of life extension is a natural trend in the maturation of this technology after 20 years of commercial operation. With increasing emphasis on how plants are operated, and less on how to build them, attention is turning to maximizing the use of these substantial investments. The first studies of life extension were conducted in the period from 1978 to 1982. These were motivated by the initiation, by the Nuclear Regulatory Commission (NRC), of studies to support decommissioning rulemaking. The basic conclusions of those early studies, that life extension is feasible and worth pursuing, have not been changed by the much more extensive investigations that have since been conducted. From an engineering perspective, life extension for nuclear plants is fundamentally the same as for fossil plants
Design strategies for irregularly adapting parallel applications

International Nuclear Information System (INIS)

Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang; Sing, Jaswinder Pal

2000-01-01

Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance of dynamically adapting computations. In this work, we examine two major classes of adaptive applications, under five competing programming methodologies and four leading parallel architectures. Results indicate that it is possible to achieve message-passing performance using shared-memory programming techniques by carefully following the same high level strategies. Adaptive applications have computational work loads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementations of such adaptive applications are therefore a challenging task. This work examines the implementation of two typical adaptive applications, Dynamic Remeshing and N-Body, across various programming paradigms and architectural platforms. We compare several critical factors of the parallel code development, including performance, programmability, scalability, algorithmic development, and portability
SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

International Nuclear Information System (INIS)

Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

1999-01-01

In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs
SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

Energy Technology Data Exchange (ETDEWEB)

Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

1999-01-04

In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs.
Smuggling, non-fundamental uncertainty, and parallel market exchange rate volatility

OpenAIRE

Richard Clay Barnett

2003-01-01

We explore a model where smuggling and a parallel currency market arise, owing to government restrictions that prevent agents from legally holding foreign exchange. Despite such restrictions, agents are able to diversify their savings, holding both domestic and parallel foreign cash, basing their portfolio allocation on current and prospective parallel exchange rates. We attribute movements in parallel rates to non-fundamental uncertainty. The model generates equilibria with both positive and...
Applications of Parallel Processing in Mobile Banking

Directory of Open Access Journals (Sweden)

2007-01-01

The future of mobile banking will be represented by applications that support mobile banking, Internet banking, and EFT (Electronic Funds Transfer) transactions in a single user interface. In this way, mobile banking will be able to cover all the types of applications demanded by the market. The parallel processing of credit card bank transactions could be performed with the help of a grid network. Apart from some limitations, grid processing offers huge opportunities to exploit parallelism. For this reason, many applications of waiting queues in grid processing have been developed in recent years. Grid networks represent a distinctive and very modern field of parallel and distributed processing.
Parallel computational in nuclear group constant calculation

International Nuclear Information System (INIS)

Su'ud, Zaki; Rustandi, Yaddi K.; Kurniadi, Rizal

2002-01-01

In this paper, a parallel computational method for nuclear group constant calculation using the collision probability method is discussed. The main focus is on the calculation of the collision matrix, which needs a large amount of computational time. The geometry treated here is a concentric cylinder. The calculation of the collision probability matrix is carried out using a semi-analytic method based on the Bickley-Naylor function. To accelerate the computation, several computers are used in parallel to solve the problem. On Linux, we used parallelization with the PVM software and the C or Fortran language, while on Windows we used socket programming with Delphi or C Builder. The calculation results show the importance of an optimal weight for each processor when processors of different speeds are used
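
One simple way to assign a weight to each processor is to split the rows of the collision matrix in proportion to relative processor speed. The abstract does not say how the weights were chosen, so the proportional rule, the function name, and the sample speeds below are assumptions for illustration.

```python
def partition_rows(total_rows, speeds):
    """Split rows among processors in proportion to integer relative speeds."""
    total_speed = sum(speeds)
    counts = [total_rows * s // total_speed for s in speeds]
    # Rounding down can leave a few rows unassigned; give them to the
    # fastest processors first.
    leftover = total_rows - sum(counts)
    fastest_first = sorted(range(len(speeds)), key=lambda i: speeds[i], reverse=True)
    for i in fastest_first[:leftover]:
        counts[i] += 1
    return counts


print(partition_rows(100, [1, 2, 2, 5]))
print(partition_rows(10, [1, 1, 1]))
```

With equal weights the slowest machine would finish last and set the total run time; weighting by speed aims to make all processors finish at about the same moment.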
Negative plant-phyllosphere feedbacks in native Asteraceae hosts - a novel extension of the plant-soil feedback framework.

Science.gov (United States)

Whitaker, Briana K; Bauer, Jonathan T; Bever, James D; Clay, Keith

2017-08-01

Over the past 25 years, the plant-soil feedback (PSF) framework has catalyzed our understanding of how belowground microbiota impact plant fitness and species coexistence. Here, we apply a novel extension of this framework to microbiota associated with aboveground tissues, termed 'plant-phyllosphere feedback (PPFs)'. In parallel greenhouse experiments, rhizosphere and phyllosphere microbiota of con- and heterospecific hosts from four species were independently manipulated. In a third experiment, we tested the combined effects of soil and phyllosphere feedback under field conditions. We found that three of four species experienced weak negative PSF whereas, in contrast, all four species experienced strong negative PPFs. Field-based feedback estimates were highly negative for all four species, though variable in magnitude. Our results suggest that phyllosphere microbiota, like rhizosphere microbiota, can potentially mediate plant species coexistence via negative feedbacks. Extension of the PSF framework to the phyllosphere is needed to more fully elucidate plant-microbiota interactions. © 2017 John Wiley & Sons Ltd/CNRS.
